Topicalization in Asian Englishes: Forms, Functions, and Frequencies of a Fronting Construction 2018044610, 9781138549456, 9781351000437


English Pages [239] Year 2019



Author: Sven Leuckert

Table of contents
Cover
Half Title
Series Information
Title Page
Copyright Page
Dedication
Table of contents
List of Figures
List of Tables
Acknowledgements
Abbreviations
1 Introduction
Notes
2 Approaching topicalization
2.1 Topics
2.1.1 Origin and history
2.1.2 Givenness
2.1.3 Aboutness
2.1.4 Defining ‘topic’
2.2 Topicalization
2.2.1 Establishing a framework for topicalization
2.2.2 An expanded concept of topicalization
2.2.3 Topicalization as a non-canonical construction
2.2.4 Discourse functions of topicalization
Emphasis and contrast
Topic continuity and topic shifting
2.3 Summary
Notes
3 Topic-prominence in Asian contact languages
3.1 Word-order typology
3.2 Li and Thompson’s classification of languages
3.3 Ratings of individual languages and language families
3.3.1 Rating procedure
3.3.2 Indo-Aryan languages
Hindi
Bangla
Marathi
3.3.3 Dravidian languages
Telugu
Tamil
Kannada
3.3.4 Sinitic languages
Mandarin
Cantonese
3.3.5 Austronesian languages
Malay
Tagalog
3.4 Summary
Notes
4 Development and variety status of four Asian Englishes
4.1 Analysing World Englishes
4.1.1 Earlier models
4.1.2 The Dynamic Model
Phase 1
Phase 2
Phase 3
Phase 4
Phase 5
Demographics of India
The status of IndE
IndE in the Dynamic Model
Demographics of Singapore
The status of SinE
SinE in the Dynamic Model
Demographics of Hong Kong
The status of HKE
HKE in the Dynamic Model
Demographics of the Philippines
The status of PhilE
PhilE in the Dynamic Model
4.2 Introducing the varieties
4.2.1 Indian English
4.2.2 Singapore English
4.2.3 Hong Kong English
4.2.4 Philippine English
4.3 Summary
Notes
5 Corpus analysis: Data basis and methodology
5.1 The International Corpus of English (ICE)
5.2 Data selection
5.3 Coding and evaluation
5.4 Problematic cases and limitations
Notes
6 Forms, functions, and frequencies of topicalization
6.1 Frequencies of topicalization
6.1.1 Direct conversations
6.1.2 Phone calls
6.1.3 Classroom lessons
6.1.4 Comparison
6.2 Forms of topicalization
6.2.1 Constituent form
Noun Phrases
Adjective Phrases
Clauses
Prepositional Phrases
Adverb Phrases
6.2.2 Information status
6.2.3 Hanging topics
6.3 Functions of topicalization
6.3.1 Syntactic function
Direct and indirect objects
Subject and object complements
Adverbials
6.3.2 Interaction with further syntactic processes
6.3.3 Discourse function
Emphasis and contrast
Topic continuity and topic shifting
Topic persistence
6.4 Summary
Notes
7 Explaining topicalization frequencies
7.1 The role of language contact
7.1.1 Forming a hypothesis
7.1.2 Intensity of contact and borrowability
7.1.3 The relation of contact to variety status
7.2 Processes of second-language acquisition
7.2.1 Overview of SLA processes
7.2.2 SLA processes and topicalization
7.3 Topicalization, social identity, and politeness
7.3.1 Social identity and variety status
7.3.2 Topicalization as a politeness strategy
7.4 The multicausal nature of topicalization
Notes
8 Conclusion and outlook
8.1 Summary
8.2 Limitations and opportunities for future research
References
Index


Topicalization in Asian Englishes

This monograph is the first comprehensive study of topicalization in Asian second-language varieties of English and provides an in-depth analysis of the forms, functions, and frequencies of topicalization in four Asian Englishes. Topicalization, that is, the sentence-initial placement of constituents other than the subject, has been found to occur frequently in the English spoken by many Asians, but the possible reasons for this have so far never been scrutinized. This book closes this research gap by taking into account the structures of the major contact languages, the roles of second-language acquisition and politeness, and other factors in order to explain why topicalization is highly frequent in some varieties, such as Indian English, and much less frequent in others, such as Hong Kong English. In addition to exploring the major and minor forces involved in explaining the frequency of topicalization, the book assesses the forms and functions of the feature. Central questions addressed in this regard are the following: Which syntactic constituents tend to be topicalized most and least frequently? Which discourse effects does topicalization achieve? How can we approach topicalization methodologically? And, lastly, what influence do language processing and production have on topicalization?

Sven Leuckert is a lecturer in the Institute of English and American Studies at the Technical University of Dresden, Germany.


Routledge Studies in World Englishes
Series Editor: Ee Ling Low
National Institute of Education, Nanyang Technological University, Singapore, and president of the Singapore Association for Applied Linguistics

This Singapore Association for Applied Linguistics book series will provide a starting point for those who wish to know more about the aspects of the spread of English in the current globalized world. Each volume can cover the following aspects of the study of World Englishes: issues and theoretical paradigms, feature-based studies (i.e. phonetics and phonology, syntax, lexis) and language in use (e.g. education, media, the law, and other related disciplines).

Negotiating Englishes and English-speaking Identities: A Study of Youth Learning English in Italy
Jacqueline Aiello

World Englishes: Rethinking Paradigms
Edited by Ee Ling Low and Anne Pakir

EIL Education for the Expanding Circle: A Japanese Model
Nobuyuki Hino

Professional Development of English Language Teachers in Asia: Lessons from Japan and Vietnam
Edited by Kayoko Hashimoto and Van-Trao Nguyen

The Politics of English in Hong Kong: Attitudes, Identity and Use
Jette G. Hansen Edwards

Topicalization in Asian Englishes: Forms, Functions, and Frequencies of a Fronting Construction
Sven Leuckert

For a full list of titles in this series, visit https://www.routledge.com/Routledge-Studies-in-World-Englishes/book-series/RSWE


Topicalization in Asian Englishes

Forms, Functions, and Frequencies of a Fronting Construction

Sven Leuckert


First published 2019
by Routledge
2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN

and by Routledge
52 Vanderbilt Avenue, New York, NY 10017

Routledge is an imprint of the Taylor & Francis Group, an informa business

© 2019 Sven Leuckert

The right of Sven Leuckert to be identified as author of this work has been asserted by him in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Cataloging-in-Publication Data
Names: Leuckert, Sven, 1989– author.
Title: Topicalization in Asian Englishes : forms, functions, and frequencies of a fronting construction / by Sven Leuckert.
Description: New York, NY : Routledge, 2018. | Series: Routledge studies in world Englishes | Includes bibliographical references and index.
Identifiers: LCCN 2018044610 | ISBN 9781138549456 (hardback) | ISBN 9781351000437 (ebook)
Subjects: LCSH: English language–Asia. | English language–Study and teaching–Asia. | English language–Foreign countries. | English language–Study and teaching–Foreign countries. | English language–Topic and comment.
Classification: LCC PE3501 .L48 2018 | DDC 427/.95–dc23
LC record available at https://lccn.loc.gov/2018044610

ISBN: 978-1-138-54945-6 (hbk)
ISBN: 978-1-351-00043-7 (ebk)

Typeset in Galliard
by Newgen Publishing UK


To my parents
To my brother
To Christine


Figures

4.1 Kachru’s three circles model
6.1 Frequency of topicalization across varieties, direct conversations
6.2 Frequency of topicalization across varieties, phone calls
6.3 Frequency of topicalization across varieties, classroom lessons
6.4 Overall frequency of topicalization across varieties (totals)
6.5 Frequency comparison of topicalization across varieties and genres
6.6 Distribution of syntactic forms in percentages (across all corpora)
6.7 Information status of all topicalization tokens across varieties
7.1 Expected token frequencies of topicalization based on the degree of topic-prominence in the contact languages
7.2 Actual token frequencies of topicalization in the analysed ICE components
7.3 Chronology of variety emergence, nativization, and token frequency of topicalization per 100,000 words

Tables

2.1 Matrix of information statuses in Prince’s framework
2.2 Expanded functions of topicalization in Mesthrie’s study of SAIE
2.3 Canonical vs. non-canonical sentences
3.1 Distribution of word-order types in the languages of the world
3.2 Language types according to Li and Thompson (1976: 460)
3.3 Characteristics of ‘subject’ and ‘topic’
3.4 Characteristics of topic-prominent languages
3.5 Major Indo-Aryan languages spoken in India
3.6 Topic-prominence features in Hindi, Bangla, and Marathi
3.7 Major Dravidian languages spoken in India compared to Indo-Aryan languages
3.8 Topic-prominence features in Tamil, Telugu, and Kannada
3.9 Speaker numbers for the main languages spoken in Singapore
3.10 Hong Kong population aged 5 and over by usual language
3.11 Topic-prominence features in Mandarin and Cantonese
3.12 Topic-prominence features in Malay and Tagalog
3.13 Overview of the typological profiles with regard to topic-prominence
4.1 Tripartite classification of English(es) worldwide
4.2 Singapore’s population and ethnic groups in 2000 and 2010
4.3 Speaker numbers for the main languages spoken in Singapore
5.1 Structure and expected word count of the analysed ICE components
5.2 Annotation schema for tokens of topicalization
5.3 Explanation of the symbols in the regular expressions
5.4 Word counts in the analysed ICE components after normalization
6.1 Overall frequency of topicalization across the direct conversations
6.2 Overall frequency of topicalization across the phone calls
6.3 Overall frequency of topicalization across the classroom lessons
6.4 Overall frequency of topicalization across varieties (totals)
6.5 Syntactic forms of topicalized constituents across varieties (absolute frequencies and relative frequencies)
6.6 Topicalized anaphoric pronouns and demonstrative determiners across varieties
6.7 Possible information statuses of topicalized tokens compared to Gundel et al.’s (1993) framework
6.8 Information status of all topicalized constituents
6.9 Syntactic functions of topicalized constituents across varieties
6.10 Semantic categories of adjuncts according to the CGEL
6.11 Discourse functions of topicalization across varieties
6.12 Topic persistence across varieties
6.13 Rating of Mesthrie’s criteria for an expanded concept of topicalization (1992) based on the analysed ICE corpora
7.1 Main principles and strategies of SLA
7.2 Topic continuity tokens across the five analysed varieties (absolute and proportional figures)
7.3 Summary of explanatory parameters for frequency differences of topicalization in the four Asian varieties under consideration


Acknowledgements

This book is a revised version of my dissertation, which I handed in at the Faculty of Linguistics, Literature and Cultural Studies at Technische Universität Dresden in May 2017. Many people contributed to making this book a reality, and all of them deserve to be mentioned. Most of all, I would like to thank Claudia Lange: Many years ago, she was the first to show me in an introductory class to English linguistics that analysing language does not have to be dry but can be entertaining and inspiring. Since then, she has always supported me, not only in her roles as my supervisor and boss, but also as a mentor and friend. Only with her feedback, her assistance, and her trust in me was it possible for this book to ever see the light of day. My gratitude also goes to my colleagues and friends in Regensburg, namely Edgar Schneider, Sarah Buschfeld, Alexander Kautzsch, Theresa Neumaier, Thorsten Brato, and Julia Hubner, for giving me invaluable feedback on my work and making my year in Regensburg a wonderfully enriching experience. There are many people who helped in making it possible for me to finish this work. Susanne Wagner gave me time and freedom in the final stages of writing this book, for which I am very thankful. For commenting on various parts of this study, I want to thank Ashleigh Moeller, who makes coming to the office a delight; Sofia Rüdiger, who did a fantastic job on the thesis crisis hotline; and Beatrix Weber, whose alternative perspective on linguistics has always been fascinating to me. I would also like to thank Theresa Neumaier for her insightful comments and Sarah Buschfeld for discussing many difficult examples of topicalization with me. For giving me assistance in all things computer-related, I am thankful to Matthias Meyer, Christopher Koch, Martin Leuckert, and Stefan Hartmann. In addition, I would like to thank the GFF e.V. (Gesellschaft von Freunden und Förderern der TU Dresden) for financial support in the initial stages of my PhD work. 
Last but not least, I am grateful to the three reviewers of the manuscript for their comments, to Ee Ling Low for accepting this book in the Routledge Studies in World Englishes series, and to the people at Routledge for making this book come to life. Finally, I wish to thank my family and friends for supporting me in all of my endeavours. I am forever indebted to my closest friend, Christine, whose friendship is something I value the most, and who never fails to make me laugh and brighten my mood. Sascha and Christiane also deserve special mention; both of them listened to my lamentations during all stages of writing this book and managed to keep me sane. I also thank my friends Anne, Michael, Benjamin, Hannes, Matthias, and Martin for providing me with welcome diversions and encouraging words. Furthermore, I would like to thank Niclas for making my life easier and more enjoyable every day. My greatest gratitude, however, goes to my parents, Ute and Uwe, who have always supported me in every way and who accept that I would rather be a linguist than a blacksmith.


Abbreviations

L1: First language
L2: Second language
AdjP: Adjective Phrase
AdvP: Adverb Phrase
AmE: American English
BrE: British English
CGEL: Cambridge Grammar of the English Language
EFL: English as a Foreign Language
ENL: English as a Native Language
ESL: English as a Second Language
HKE: Hong Kong English
ICE: International Corpus of English
IndE: Indian English
LD: Left-dislocation
NP: Noun Phrase
PhilE: Philippine English
PP: Prepositional Phrase
SABE: South African Black English
SAIE: South African Indian English
SinE: Singapore English
SLA: Second-language acquisition
TOP: Topicalization
WALS: World Atlas of Language Structures


1 Introduction

Asian Englishes are “contact languages par excellence” (Lange 2012a: 33; emphasis in the original): they are the product of contact between English, brought to Asia in colonial times, and the indigenous languages spoken in the colonized countries. Mere intuition suggests that these varieties are not carbon copies of the imported English, and overhearing a conversation in one of the Asian varieties would, in all likelihood, reinforce this impression. While certain features of British English (or, in the case of the Philippines, American English) have been retained, there are also many differences in pronunciation, lexis, morphology, and syntax. Such differences usually occur more frequently in spoken than in written language: reading one of the local English-language newspapers is usually less revealing than, for instance, listening to people converse in the market or in the streets.1 One difference between spoken ‘traditional’ and spoken Asian varieties of English that has been noted repeatedly is the increased use of topicalization strategies in the latter; see, for example, Mesthrie (1992), Lange (2012a), and Winkle (2015). Many definitions of topicalization have been proposed; Lambrecht, for instance, provides the following definition:

Finally, we can mention the case of the topicalization construction, in which a non-subject constituent is “topicalized,” i.e. marked as a topic expression by being placed in the sentence-initial position normally occupied by the topical subject. (1994: 147)

The following three examples from different components of the International Corpus of English (ICE) serve to illustrate the phenomenon.2 In each example, the topicalized constituent is printed in bold.

(1.1) Okay one bridal bouquet uh one bridal bouquet one posy bridal bouquet is a hundred posy is about eighty hair pieces forty corsages he gave free (ICE-SIN:S1A-002#168–169)

(1.2) But I I don’t know how it come to because I I in pronunciations I was never checked (ICE-IND:S1A-010#79)

(1.3) But I will some of some of them I will cut and I think I will I only go on Tuesday (ICE-HK:S1A-045#138)

In this study, I analyse topicalization in Hong Kong English (HKE), Indian English (IndE), Philippine English (PhilE), and Singapore English (SinE). I also analyse topicalization in British English (BrE) in order to compare the Asian varieties of English to a European variety, which is the traditional target variety for all of the analysed varieties except PhilE. Despite the apparently increased use of topicalization in Asian varieties of English, the phenomenon is not exclusively Asian: it is well known that speakers of other varieties employ topicalization as well,3 which is why the question is not one of existence but one of frequency. The primary objective of this book is to identify the forms, functions, and frequencies of topicalization in four postcolonial Asian Englishes (and British English) and to explain potential differences between the varieties. For this purpose, I have read, tagged, and analysed parts of the spoken components of the ICE corpora for Hong Kong, India, the Philippines, Singapore, and Great Britain. The ICE corpora are an ideal source for comparing varieties: they have been (and are still being) compiled following a consistent structure; more precisely, each consists of roughly one million words with a similar distribution of spoken and written texts. Since topicalization alters the information structure of an utterance and is therefore sensitive to discourse-pragmatic decisions by speakers, spoken language was expected to show more tokens of the feature.
Furthermore, “oral performance is less constrained and less conservative than written styles, so this is where innovations are most likely to surface” (Schneider 2004: 247). For this reason, the corpus files containing direct conversations, phone calls, and classroom lessons were analysed.

Studies in information structure (or, to use another term, information-packaging; cf. chapter 16 in the Cambridge Grammar of the English Language, henceforth CGEL) are typically complex affairs because “grammatical analysis at this level is concerned with the relationship between linguistic form and the mental states of speakers and hearers” (Lambrecht 1994: 1). According to Lambrecht, this multifaceted nature of information structure necessitates an integrated approach:

Information-structure research neither offers the comfort which many syntacticians find in the idea of studying an autonomous formal object nor provides the possibility enjoyed by sociolinguists of putting aside issues of formal structure for the sake of capturing the function of language in social interaction. (Ibid.)
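Because the analysed ICE components are only roughly comparable in size, raw token counts of topicalization cannot be compared directly across varieties, which is why frequencies are reported per 100,000 words (see Figure 7.3). The following Python sketch illustrates that normalization step under simple assumptions: each corpus component is represented as a (topicalization tokens, word count) pair, and the function names and toy figures are hypothetical illustrations, not the author's procedure or results.

```python
# A minimal sketch (hypothetical helper, not taken from the book): normalise
# raw token counts to a rate per 100,000 words so that corpus components of
# different sizes can be compared.

PER_WORDS = 100_000


def normalized_frequency(tokens: int, word_count: int, per: int = PER_WORDS) -> float:
    """Return how often a feature occurs per `per` words of running text."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    # Multiply before dividing to keep the arithmetic exact for round numbers.
    return tokens * per / word_count


def compare_components(counts: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Map each component label to its normalised frequency (2 decimal places).

    `counts` maps a label to a (tokens, word_count) pair.
    """
    return {
        label: round(normalized_frequency(tokens, words), 2)
        for label, (tokens, words) in counts.items()
    }


if __name__ == "__main__":
    # Invented toy figures for illustration only; NOT results from the ICE corpora.
    toy_counts = {"Variety A": (120, 200_000), "Variety B": (45, 150_000)}
    for label, rate in compare_components(toy_counts).items():
        print(f"{label}: {rate} per 100,000 words")
```

With these invented counts, the two components come out at 60 and 30 tokens per 100,000 words, so the raw counts (120 vs. 45) would overstate the difference between them.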

In addition to the theory-internal complexities of the field, finding variety-specific (as well as potentially overarching) motives for the use of topicalization represents yet another largely unresolved issue. The origins of the differences between ‘newer’ and traditional varieties of English are frequently the subject of heated debate, but at least some influence from the indigenous languages, that is, the substrate languages, is often assumed for many of the features that diverge from the input variety (cf. Gut 2011: 201). In Sharma’s words, “[s]urface similarities across New Englishes can be skin deep, diverging dramatically upon closer examination, due to substrate systems or substrate-superstrate interaction” (2009: 190). In the case of topicalization, ‘topic-prominence’ in the contact languages represents a potential influence on topicalization in Asian Englishes. In highly topic-prominent languages, sentences are structured according to the topic-comment principle: in contrast to (primarily) subject-prominent languages, the topic occurs sentence-initially (see Li and Thompson 1976). Since ‘topicalization’ refers to the sentence-initial placement of constituents other than the subject, transfer from the substrate languages is a promising explanation for increased topicalization usage. While the Sinitic contact languages of HKE and SinE are considered to be prototypical topic-prominent languages (see, among others, Yip and Matthews 2011; Li and Thompson 1981), the status of the other contact languages is less clear; some publications have argued that important languages in the Indo-Aryan and Dravidian language families (as contact languages of IndE and SinE) and the Austronesian language family (as contact languages of PhilE and SinE) also show traits of topic-prominence (see Junghare 1988; Schachter and Otanes 1972). In addition to the replication of certain structures (cf. Matras 2009), substrate influence may extend to areas that are not of a purely formal nature. Bhatt, for instance, claims that the use of undifferentiated question tags in vernacular Indian English is a reflex of a culture “where the verbal behavior is constrained, to a large extent, by politeness regulations”; choosing a default question tag over the Standard English option is, in his view, a representation of “non-imposition [as] the essence of polite behavior” (2008: 553). Although this claim has been criticized (cf. Lange 2012a), investigating linguistic structures as reflections of cultural attitudes is of value. Establishing topic continuity has been cited as one of the primary functions of topicalization, and creating continuity in discourse is arguably another facet of being polite to the interlocutor(s).4 Thus, cultural habits can result in a preference for a certain structure or feature that may be used less (or not at all) in traditional varieties, and it is this phenomenon that d’Souza terms ‘grammar of culture’:

‘Grammar of culture’ is used here to mean the acceptable possibilities of behaviour within a particular culture. This includes notions of the kind of behaviour that is appropriate or expected in a given context. Since the use of language is included within the ‘acceptable possibilities of behaviour,’ some correlation may be found between socio-cultural factors and their linguistic manifestations. (1988: 160; emphasis in the original)

While substrate influence often seems a useful first explanation for the occurrence of a non-standard feature, analysing the substrate(s) alone and then concluding that a feature has been transferred is likely to be simplistic. In the acquisition of English as a Second Language (ESL) or English as a Foreign Language (EFL), “[m]any issues such as the social context, the learner’s age and gender, motivation and type of instruction combine in myriad ways that make the learning situations of individuals virtually unique” (Gut 2011: 108). Thus, research on a non-standard feature should not simply conclude once the presence or absence of a similar structure in the contact language has been determined.5 Instead, a much more promising approach takes into consideration language contact, variety status, cultural/linguistic identity, and processes of second-language acquisition (SLA) and acknowledges the complexity involved in feature selection. For the present study, I predict that several of the varieties’ substrate languages provide the pattern of topicalization and topic-comment structures. However, topicalization in all varieties is also influenced by individual speaker preferences, the input variety, the developmental phase of each variety, general processes of second-language acquisition, and cultural habits.

In assessing the possible forms and functions of topicalization, the question needs to be asked to what extent topicalization in the analysed varieties differs from that in the traditional varieties of English. In a study on South African Indian English (SAIE), Mesthrie (1992) identified six differences (referred to as ‘expanded functions’) between topicalization in SAIE and ‘mainstream’ varieties of English. Two of these differences are an increased frequency of topicalization in SAIE and the interaction of topicalization with questions and negation; however, the most important differences are the topicalization of constituents other than noun phrases (NPs) and the topicalization of information that is new to the discourse. This study shows that the analysed varieties (including BrE) fulfil most of the proposed criteria. For this reason, I suggest that spoken varieties of English in general tend to use topicalization creatively; the main difference between the analysed varieties is of a quantitative nature. Based on these deliberations, I address three major research questions in this project:

(1) What are the frequencies, forms, and functions of topicalization in HKE, IndE, PhilE, and SinE, and do they differ significantly from BrE?
(2) Do Mesthrie’s ‘expanded functions’ of topicalization (1992) apply to the analysed varieties of English?
(3) Which factors can explain different frequencies of topicalization in the four analysed Asian Englishes?

In order to answer these three questions, the book proceeds as follows. Chapter 2 establishes the terminological and theoretical framework for the present study. Most importantly, I discuss two core notions necessary for the subsequent chapters: ‘topic’ and ‘topicalization’. By giving a definition of ‘topic’ that combines both traditional and recent perspectives and recognizes both ‘givenness’ and ‘aboutness’ as relevant aspects of topic identification, the chapter provides a sound foundation for a definition of topicalization. Defining what is meant by topicalization in this book forms the final part of this chapter.

Chapter 3 serves as a link between chapter 2 and chapter 4 by looking at the role of topics in some of the major contact languages of the Asian varieties under investigation. More precisely, this chapter is concerned with topic-prominence, that is, the degree to which the topic-comment principle dominates word order. Building on the criteria for topic-prominence laid out in Li and Thompson’s 1976 paper, this chapter analyses to what extent several Indo-Aryan, Dravidian, Sinitic, and Austronesian languages can be called ‘topic-prominent’.

Chapter 4 provides a general introduction to Asian Englishes and, more specifically, to the varieties I chose to include in my analysis. In the first section, I introduce the main theoretical frameworks that have been proposed for the analysis of World Englishes, with a focus on Kachru’s Three Circles (1985) and Schneider’s Dynamic Model (2003, 2007). After an introduction to the main aspects of the Dynamic Model, the four varieties of interest are introduced. For each of the four varieties, the sociolinguistic setting and the general framework in which it is analysed receive attention.

Chapter 5 introduces the data and the applied method and describes the limitations of my study. First, the selected data are discussed in detail by introducing the ICE family of corpora and the reasoning behind the choice of the sub-corpora. Some methodological problems and differences between the three analysed corpus components, that is, the direct conversations, phone calls, and classroom lessons, are highlighted. The tags used for the annotation are introduced and illustrated with corpus examples, and the process of preparing the corpora for analysis and tagging is described.
Furthermore, critical cases that were not counted in the analysis are discussed in this chapter. Chapter 6 presents the results of the empirical analysis from quantitative and qualitative points of view. Based on the findings from the ICE corpora, the first section of this chapter addresses whether topicalization is a pan-Asian feature or, in other words, a feature that is shared between several Asian varieties of English. For this purpose, I present the frequency of topicalization in each of the varieties and identify variety-specific preferences for topicalizing certain constructions. In addition to frequencies, the forms and functions of topicalization are analysed. The general label ‘forms’ incorporates syntactic forms (phrases and clauses) and the information status of topicalized constituents. The label ‘functions’ encompasses the syntactic functions of topicalized constituents as well as the discourse functions of topicalization. Chapter 7 serves as a link between the theoretical considerations of the first chapters and the data evaluation. It defines topicalization as a feature whose frequency and usage patterns cannot be explained by language contact or second-language acquisition alone; instead, its frequency and usage patterns are multicausal. Chapter 8 concludes the book by giving a summary of the main findings and an outlook on possible future studies. In particular, the three research questions

6

6 Introduction mentioned in this introduction and the results presented in chapters 6 and 7 are revisited, and some questions that remain open for further research are presented.

Notes
1 For an in-depth discussion of spoken versus written data, see chapter 5 on methodology, which also discusses my choice of data.
2 These short excerpts are only supposed to give a first impression. For this reason, any speaker information and the surrounding discourse are deliberately left out.
3 Filppula (1986) discusses topicalization in Irish English, and Speyer (2010) and Winkle (2015) account for dialects of American and British English.
4 Valentine, referring to strategies such as topicalization as ‘elliptical repetitions’, comments that “Elliptical repetitions help to please the positive face of the hearer and to reduce any uncertainty” (1995: 233).
5 This is an important point also raised by Thomason (2001).

2 Approaching topicalization

Topicalization is the feature of interest in this book, but how can it be defined, and how have linguists analysed it thus far? Generally speaking, topicalization is an ‘information-packaging’ construction serving to manipulate the information structure of an utterance (see Ward et al. 2002; Winkle 2015). Information structure, in turn, is concerned with “how information is presented, in contrast to the information itself” (Krifka and Musan 2012b: 1). In this book, topicalization is understood as the marked fronting of constituents: By placing objects, complements, (obligatory) adverbials or embedded subjects in sentence-initial position, speakers of Asian Englishes (and other varieties) turn these constituents into topics (and/or foci) and employ a discourse-pragmatic strategy to create contrast, emphasize constituents, shift the topic or establish topic continuity (see Lange 2012a). Since the word ‘topicalization’ contains the term ‘topic’, what is meant by ‘topic’ and how the term has been discussed in the linguistic literature deserve attention. For this reason, the first part of this chapter analyses ‘topic’ as a linguistic notion. Taking into account historical as well as recent descriptions of the term, I place particular emphasis on the topic-defining criteria ‘givenness’ and ‘aboutness’. In the discussion of ‘topic’, I also address the relation between information structure and syntax. In the next sub-chapter, I turn to definitions of topicalization. I first discuss the frameworks of Prince (1992), Lambrecht (1994), Birner and Ward (1998), and Ward and Birner (2004) to set the stage and then focus on Mesthrie (1992, 1997). Most importantly, I suggest that several aspects of Mesthrie’s definition of topicalization make it preferable for the analysis of World Englishes and vernaculars, in particular because it includes potentially unexpected patterns.
At the end of the chapter, I discuss to what extent topicalization can be considered a ‘non-canonical’ construction and which discourse functions are associated with topicalization.

2.1 Topics

2.1.1 Origin and history

A good starting point for a definition of ‘topic’ is a look at the term’s origins and its terminological counterparts. The first important distinction that needs to be made is that between subject and topic. Although in many European languages (among them, the majority of the Germanic and Romance languages) subject and topic are often realized by the same expression, this overlap is by no means a given for every sentence. This is because the two pairs, subject and predicate on the one hand and topic and comment on the other, operate on different levels: While subject and predicate indicate grammatical relations, topic and comment are concerned with the information structure of an utterance. Documented discussions of this problem date back as far as the middle of the nineteenth century, when Henri Weil differentiated between syntax and what he called a ‘marche de la pensée’, a ‘progression of thought’. The following quotes, the original French text from 1844 and Charles W. Super’s English translation from 1887, give insight into Weil’s ideas:

On a fait sentir qu’il y a une marche de la pensée qui diffère de la marche syntactique, puisqu’elle en est indépendante et qu’elle reste la même sous les diverses transformations de la phrase et même dans la traduction en une langue étrangère. (1844: 22–23)

We are made to feel that there is a progression of thought which differs from that of syntax, because it is independent thereof and because it remains the same amid the diverse transformations of the sentence, and even when we translate into a foreign tongue. (1887: 28)

In the course of his thesis, Weil sketches a grammatical category (i.e., syntax) and a cognitive-pragmatic category (i.e., the progression of thought) and, in doing so, he determines that syntax is less flexible than information structure, which can be shaped according to the individual’s needs.1 However, he also notes that the syntax of a language directly influences the extent to which the information structure of a sentence may be varied.
This is apparent if we consider how the options for word-order manipulation differ between languages, sometimes even between languages of the same family. A well-known example is the contrast between SVX order in English and the topic-based verb-second (‘V2’) word order of German main clauses.2 The examples in (2.1) and (2.2) illustrate the greater flexibility of German word order compared to the relatively fixed word order of English.3

(2.1) Der               Hund  beiß-t    den               Mann.
      ART.DEF.NOM.M.SG  dog   bite-3PS  ART.DEF.ACC.M.SG  man
      Den               Mann  beiß-t    der               Hund.
      ART.DEF.ACC.M.SG  man   bite-3PS  ART.DEF.NOM.M.SG  dog
      (example my own)

(2.2) The dog bites the man.
      #The man bites the dog.
      (Example my own)

In German, the case system allows the relatively free movement of both arguments and adjuncts in main clauses: Grammatical relations remain intact, and a (mostly) unambiguous subject-object distinction is possible because of case marking. This, in turn, allows speakers to choose freely which constituent (other than the finite verb) becomes the sentence-initial topic. However, only the first sentence in (2.1) can be translated directly into English without encountering problems: As soon as the direct object and the subject swap positions, the sentence loses its intended meaning. The formal identity of subjects and objects in English is one of the outcomes of historical processes that have been discussed extensively under the heading of the ‘Middle English creolization hypothesis’ (see Danchev 1997; Maroldt 2010). As English lost many of its synthetic traits in the transition from Old English to Middle English, an increasingly fixed word order compensated for the loss of nominal inflection and, consequently, objects could no longer be moved into sentence-initial position in canonical sentences. English then required speakers to make use of non-canonical constructions, such as topicalization, if they wished to turn a constituent (other than the subject) into the topic.4 In the majority of sentences, however, the need for doing so is debatable: Givón (1979: 210) estimates from a cross-linguistic point of view that subject and topic fail to correlate in only 10–20 per cent of cases.
For German, Engel (1972: 44) claims that approximately 40 per cent of simple sentences have a topic not realized by the subject.5 Welke explains the strong correlation of subject and topic with the “inherent thematicity of the subject” (1992: 57).6 As a short summary of the above, the following two points shall suffice: (a) Weil, using the expression “une marche de la pensée” in 1844, was, to my knowledge, the first (Western) scholar in modern times to talk about information structure; and (b) he correctly observed that the flexibility with which the information structure of a sentence can be modified depends (in part) on a language’s syntax, as exemplified by the German-English contrast above. Another seminal work dealing inter alia with the problem of the subject-topic distinction is Georg von der Gabelentz’s Die Sprachwissenschaft (Gabelentz ²1901). In this book, von der Gabelentz distinguishes between a grammatical and a psychological subject and predicate. The first pair corresponds to the traditional notions of subject and predicate, whereas the second pair can be interpreted as realizations of Weil’s ‘progression of thought’. Von der Gabelentz offers (2.3) as an example.

(2.3) Mit  Speck  fäng-t     man  Mäuse.
      COM  bacon  catch-3PS  one  mouse.PL.ACC
      (Gabelentz ²1901: 369–370; gloss added)

A set of possible translations into English could look like the sentences given in (2.4).

(2.4) a. One catches mice with bacon. / You catch mice with bacon.
      b. With bacon one catches mice.
      c. Bacon is used to catch mice.
      d. Mice are caught with bacon.
      (Examples my own)

In the German sentence, the indefinite pronoun man (Engl. one/you) is the grammatical, but certainly not the psychological, subject. According to von der Gabelentz, the psychological subject (the entity the hearer focuses on) is the sentence-initial prepositional phrase Mit Speck (Engl. with bacon). As in examples (2.1) and (2.2), we can see that the syntax of the German main clause is much less restrictive than that of its English translation. An intuitive translation might look like (2.4a). A word-by-word translation as in (2.4b) requires the use of topicalization, and an adequate translation that does not violate SVO order would require the addition or removal of lexical material, as in (2.4c). Only in (2.4b) and (2.4c), however, is the unambiguous status of bacon as the topic maintained. What we can see here (and I take this to be von der Gabelentz’s first major point) is another instance of the grammatical and the psychological subject not being realized by the same phrase. The second important point is that speakers of English (and, depending on its typological structure, of any other language) must wilfully employ strategies such as inversion or other word-order manipulations if they wish to form a sentence in which a constituent other than the pre-determined one, usually the grammatical subject, serves as the psychological subject. The similarities between Weil’s and von der Gabelentz’s contemplations are striking, since both scholars emphasize the separation of preconditioned grammatical structure and modifiable information structure. Von der Gabelentz’s theory and terms were later adopted by the neogrammarian Hermann Paul.
Adding to previous findings, Paul elaborated on some aspects of psychological subjects and predicates (the interplay of word order and intonation, among others) that Weil and von der Gabelentz did not discuss in detail. For a comprehensive discussion of all these aspects, I refer to von Heusinger (1999: 108–111); for my purposes, I only discuss two points from von Heusinger’s list: the potential overlap of the psychological subject and predicate and the role of intonation. These two aspects are inextricably tied together because an overlap is created by means of intonation. Paul illustrates this with the sentence Karl fährt morgen nach Berlin (Karl goes to Berlin tomorrow), each constituent of which may be the psychological predicate (see 1880/²1886: 236). Placing tonal emphasis on the different constituents results in different psychological predicates. As von Heusinger points out (1999: 109), Paul does not make clear whether the phenomenon he tries to explain is intra-sentential or spans longer stretches of discourse. For the moment, however, the relevant issue is the possibility of overlapping psychological subjects and psychological predicates. Von der Gabelentz thought intonation to be independent of word order (²1901: 376), but Paul correctly observed that intonation also factors into information structure. Moving away from the terminology of psychological subjects and predicates, scholars of the Prague School published several works on information structure using the terms ‘theme’ and ‘rheme’, which had previously been introduced by Ammann (1928/²1962).7 Mathesius, one of the most prolific authors on the issue and founder of the Prague Linguistic Circle, found that “[i]n English […] the subject has to a considerable extent acquired thematic function, i.e. the function of expressing the agent of the action has been appreciably weakened in favour of the function to express the theme of the utterance” (1975: 101). For him, this arises out of the necessity of a sentence to follow the theme-rheme (or topic-comment) structure. If English word order were less fixed (like the word order of Czech, which Mathesius uses for comparison), the subject would not be required to carry thematic function to the same extent (ibid.: 103). In Czech, on the other hand, the so-called functional sentence perspective dominates: Instead of the subject, other constituents may occupy the initial position and fulfil the thematic role. Brief discussions of the Prague School’s core ideas in German and English can be found in Wüest (2011) and von Heusinger (1999). In Western Europe, the publication of Ferdinand de Saussure’s Cours de linguistique générale by his students Charles Bally and Albert Sechehaye brought about a paradigm shift in linguistic scholarship.
Although Saussure never explicitly mentions topic and comment, he does bring up a contrast akin to the grammatical/psychological distinction:

But we must realize that in the syntagm there is no clear-cut boundary between the language fact, which is a sign of collective usage, and the fact that belongs to speaking and depends on individual freedom. In a great number of instances it is hard to class a combination of units because both forces have combined in producing it, and they have combined in indeterminable proportions. (1916/1959: 125)

Saussure distinguishes between a conventionalized, fixed category (“le fait de langue, marque de l’usage collectif”) and a category of individual language realization (“le fait de parole, qui dépend de la liberté individuelle”) (1916: 173). His trichotomy of langue, langage, and parole is, of course, standard material in introductory linguistics courses. In the context of information structure, however, it is interesting to see that Saussure acknowledges, not unlike Weil, von der Gabelentz, and Paul before him, that individual choices and syntagmatic rules coexist but often combine in ways that make them difficult to tell apart. This holds only for ‘concrete’ language, that is, for parole, which was not Saussure’s concern. The last scholar whose work I (briefly) wish to introduce in this historical outline is Halliday, who was the first to use the term ‘information structure’ and followed the tradition of the Prague School in using the terms theme and rheme (always written with initial capitals in his works, cf. 1985/⁴2014: 89). ‘Theme’, according to him, is “the element that serves as the point of departure of the message; it is that which locates and orients the clause within its context” (ibid.). ‘Rheme’, on the other hand, represents “[t]he remainder of the message, the part in which the Theme is developed” (ibid.). In a statement reminiscent of Saussure’s remarks on the syntagm, Halliday comments on the influence of individual preference in choosing a topic: “[T]he speaker/writer is selecting the desired Theme – […] there can be variation in what is chosen as the thematic element in the clause” (ibid.: 90). Most importantly, Halliday is credited with establishing a system that fully incorporates intonational phrasing (see von Heusinger 1999: 118). What has become evident in this brief historical survey is that information structure is not a purely syntactic category: Weil described it as a subjective movement, a progression of thought, and von der Gabelentz and Paul spoke of psychological subjects and predicates. More recent definitions also take the multidimensional nature of information structure into account, as Krifka and Musan’s definition reveals: “[W]ith the term information structure we understand aspects of natural language that help speakers to take into consideration the addressee’s current information state, and hence to facilitate the flow of communication” (2012b: 1). Even though there can be no doubt that the syntax of a language determines the scope for variation in information structure, there must be some kind of (conscious or subconscious) motivation for choosing non-canonical constructions such as topicalization or, in typologically more flexible languages, for making anything but the subject the topic (Reinhart 1981: 53).
We must assume that speakers, either consciously or subconsciously, decide on a specific topic-comment sequence that facilitates communication. Speaking in Gricean terms, one could say that speakers usually follow the maxim of relation (Grice 1975: 46–47): By picking up a topic, they ensure continuity and secure the relevance of their contribution. This is not to say that speakers do this all the time. If they did, we would, indeed, be in “a philosopher’s paradise” (Levinson 1983: 102). Still, the reality is that “in most ordinary kinds of talk these principles are oriented to, such that when talk does not proceed according to their specifications, hearers assume that […] the principles are nevertheless being adhered to at some deeper level” (ibid.). Another problem we learn about from older as well as newer publications is the multitude of terms used to describe the two parts of a sentence’s information structure. Von Heusinger lists, among others, the pairs ‘topic’ and ‘comment’, ‘topic’ and ‘focus’, ‘theme’ and ‘rheme’, and ‘psychological subject’ and ‘psychological predicate’ (1999: 102). It would be presumptuous to take all of these terms as referring to exactly the same thing, especially considering that some of them stem from schools of thought that could not be more different from each other. Nevertheless, the core idea is probably the same in many of the approaches: The ‘classical’ notions of subject and predicate have been deemed too restrictive to account for speaker-hearer involvement in producing and processing discourse and the resulting altered sentences, so that a new set of terms was required. For the purposes of this book, I use two of the abovementioned pairs. The first is ‘topic’ and ‘comment’, which is relevant whenever I discuss information structure without reference to suprasegmental features. The second pair is ‘topic’ and ‘focus’, with ‘focus’ denoting both new information and the tonally highlighted word. In the following three sub-chapters, my goal is to define ‘topic’ by reviewing (primarily) more recent literature. My main concern here is the role of the two properties of givenness and aboutness in understanding topics, which is why 2.1.2 is devoted to the former and 2.1.3 to the latter. These two sections are followed by a sub-chapter that functions both as a summary of my deliberations in 2.1 and as an attempt at my own definition.

2.1.2 Givenness

Both givenness and newness are fundamental concepts in any study of information structure and, unlike Speyer, I do not think of either of the two as “rather self-explanatory” (2010: 4). There is a wide range of publications tackling the nature of given and new information, and the role of givenness in defining topics has been controversial (at least) since Reinhart (1981). Because of the overwhelming amount of literature on the phenomenon, considering all or even the majority of publications dealing with givenness is impossible in this chapter.8 Therefore, I mainly focus on the theories proposed by Gundel (1985, 2010, 2012), Gundel et al. (1993), Gundel and Fretheim (2004), and Krifka and Musan (2012b), since they represent some of the better-known, but also some of the more controversial, frameworks. Gundel distinguishes between (a) relational givenness-newness and (b) referential givenness-newness (2010: 222–223).
Of course, the names of the two strands already reveal the fact that both are concerned with the distinction between given and new information. There is, however, a substantial difference in their scope since only relational givenness-newness is intra-sentential, whereas referential givenness-newness can be extra-linguistic as it includes, inter alia, visual and aural stimuli, general knowledge, shared knowledge among the interlocutors, and so forth (see Chafe 1976: 31). The first of the two types, that is, relational givenness and newness, is concerned with the internal structure of a sentence. Rather than pertaining to anything –be it linguistic or extra-linguistic –outside of the individual sentence, the relation between givenness and newness in this type is exclusively intra-sentential. The first part X is given in relation to the second part in that it establishes something that can be commented on or questioned about, and so forth, without being dependent on the second part Y (Gundel and Fretheim 2004: 177). The second and new part, on the other hand, is relationally new in that it provides information, and/or asserts or asks something about the first part. For Gundel and Fretheim, it is this pair of terms that corresponds to the notions of topic and focus, topic and comment, theme and rheme, and so forth (ibid.).

In contrast, referential givenness-newness “involves a relation between a linguistic expression and a corresponding non-linguistic entity in the speaker/hearer’s mind, the discourse (model), or some real or possible world” (Gundel 2012: 587). This type is independent of relational givenness-newness and does not necessarily correlate with topic or focus in any way (Gundel and Fretheim 2004: 179). One of the more frequent problems in the literature is that givenness is sometimes not defined in a way that accounts for subtle differences in the degree of givenness in individual utterances. Differentiating between ‘given’ and ‘new’ without also considering salience yields a simplistic understanding, which is why a “definition of givenness must be such that it allows for saying that an expression is given to a particular degree” (Krifka and Musan 2012b: 22). Gundel et al. (1993) tackled this problem with the ‘Givenness Hierarchy’: Because information may be more or less present in the interlocutors’ minds (something may be in focus, or it may only be known to exist), speakers choose from a variety of linguistic means (mostly pronouns and determiners such as it, that, the, a, etc.) in order to express the degree of givenness (Gundel et al. 1993: 275). Even without specialist knowledge, it is clear to proficient speakers of English that an expression such as this car refers to a car that is either visible or (at least presumably) shared knowledge among the interlocutors at the moment of the utterance. The phrase a car, on the other hand, does not automatically delimit the number of cars the speaker could be referring to (unless additional information such as like this one right here is added in the utterance or the discourse). In order to illustrate the two kinds of givenness-newness, Gundel and Fretheim (2004) offer the brief exchange given in (2.5).

(2.5) A: Did you order the chicken or the pork?
      B: It was the PORK that I ordered.
      (Gundel and Fretheim 2004: 177; emphasis in the original)

In B’s response, PORK is referentially given: It is activated or in focus since it clearly entered the discourse in A’s utterance. Relationally, however, it is new, since only the fact that B ordered some kind of meat is given information. Another approach to givenness is based on the ‘common ground’. Established by linguists and philosophers such as Stalnaker (1978) and Clark and Brennan (1991), common ground was initially “considered as a distributed form of mental representation and adopted as a basis on which successful communication is warranted” (Kecskes and Zhang 2009: 332). In this traditional view, the common ground is seen as “the field on which a language game is played” (Stalnaker 2002: 720). The participants in this language game bring their own individual beliefs into the common ground but, at the same time, they are thought to make assumptions about the beliefs and intentions of the other person(s) involved. Stalnaker bases his theory on the social and conventional nature of communication, as speakers (a) make assumptions and (b) accommodate others (ibid.). In recent research, common ground has been defined as more dynamic and related to cognitive processes. Unlike Stalnaker, whose theory focuses heavily on the idea of the Cooperative Principle (Grice 1975), researchers involved in developing a newer theory believe that speakers are much more egocentric than had previously been assumed (see Keysar 2007: 81–82). Kecskes and Zhang (2009, 2013) criticize both the traditional and the more recent view because, according to them, these perspectives fail to incorporate socio-cognitive factors to a sufficient degree. Whereas the traditional (or, according to Kecskes and Zhang, ‘pragmatic’) view holds cooperation to be the major force in communication, and the more recent, cognitive view sees egocentrism as the driving force, the socio-cognitive view regards “communication [as] the result of the interplay of intention and attention motivated by the socio-cultural background” (2009: 338). This socio-cognitive approach offers several advantages over the pragmatic and the cognitive views. First and foremost, it understands speakers and hearers as individuals who may enter a conversation with completely different cognitive statuses (Kecskes and Zhang 2013: 378). This implies that making assumptions about the interlocutor’s beliefs (Stalnaker 2002) does not necessarily have to happen before an utterance is made. As individuals with individual mindsets and intentions, participants in a conversation may choose to ignore information that is already part of the common ground or to force the conversation in a certain direction. This is particularly relevant to less conservative definitions of topicalization, as this view entails that topicalized information does not have to represent ‘old’ information.
Another advantage of the approach is its acknowledgement of the individual interpretation of an utterance by each involved speaker and hearer. Although Kecskes and Zhang do not mention this explicitly, it is safe to assume that both cooperation and egocentrism factor into how an utterance is interpreted but, ultimately, some combination of socio-cognitive factors is decisive. Another aspect of their theory is the distinction between core common ground and emergent common ground (2009, 2013). The former refers to “the relatively static, generalized, common knowledge that belongs to a certain speech community as a result of prior interaction and experience”, whereas the latter “refers to the relatively dynamic, actualized and particularized knowledge co-constructed in the course of communication that belongs to and is privatized by the individual(s)” (2013: 379). This distinction is similar to Krifka and Musan’s framework, in which they describe common ground as mutually shared information and differentiate between (a) common ground content and (b) common ground management (2012b: 1). Common ground content refers to the truth-conditional information of the common ground, whereas common ground management concerns the development of the common ground (ibid.: 4). Common ground content contains information that is either assumed to be shared or has already been explicitly verbalized and, thus, added to the common ground. Krifka and Musan present different strategies for introducing entities into the common ground as well as for shaping and referencing it; amongst others, indefinite NPs, pronouns, and definite NPs can be used for this purpose (2012b: 1–3). One example is given in (2.6).

(2.6) I had to bring my cat to the vet because it was sick.
      (Krifka and Musan 2012b: 2; emphasis added)

In this example, the knowledge that the speaker has a cat is introduced into the common ground by my cat. Thus, fully introducing the cat again in the next part of the utterance is not necessary; instead, the speaker may opt to refer to it by means of a pronoun, since their ownership of the cat has become part of the common ground. In other words: Whenever a speaker introduces new information into the common ground, the hearer (or addressee) adopts it, and it becomes mutually shared information. Combining Gundel’s referential givenness and Krifka and Musan’s framework, we arrive at the following working definition: A topic can be considered given when the information it represents is stored in the common ground. Topics may enter the common ground via either linguistic or extra-linguistic means. Linguistic means are explicit expressions in the discourse or the opening of a so-called poset, of which the element of interest is a part. Posets (‘partially ordered sets’) are defined by Ward and Birner in the following way:

The notion of a poset subsumes both coreferential links, where the linking relation between the preposed constituent link and the corresponding poset is one of simple identity, and non-coreferential links, where the ordering relation is more complex. (2004: 159)

For example, this means that pronouns as identity links, but also complex type/subtype relations, may link different expressions, all of which would be categorized as belonging to the same poset and, therefore, to the same topic.
As previously mentioned, extra-linguistic means include stimuli of all sorts as well as shared knowledge; however, the degree to which information is part of the common ground may vary. As I show in the second part of this chapter, givenness plays an important role in defining topicalization. 2.1.3 Aboutness I noted before that the discussion of what constitutes a topic mostly revolves around the two concepts of givenness and aboutness. Hockett, who coined the terms ‘topic’ and ‘comment’ in the 1960s, provides a description for ‘topic’ involving aboutness: “The speaker announces a topic and then says something about it” (1958: 201). More recently, Dalrymple and Nikolaeva define topic as “the entity that the proposition is about” (2011: 48). They exemplify aboutness with the brief exchange in (2.7):

(2.7) What is Bill doing? or What about Bill?
He is eating pizza in the kitchen. (Dalrymple and Nikolaeva 2011: 50; emphasis in the original)

In this example, the speaker assumes or perhaps even expects the addressee to take the utterance to be about Bill, and therefore also expects them to respond with a comment on the topic Bill. Making a case for the importance of aboutness, Reinhart provides the examples in (2.8) and (2.9) to show that the given-new distinction in a referential sense is not sufficient to explain topics.

(2.8) A: Who did Felix praise?
B: Felix praised MAX. (Reinhart 1981: 72; emphasis in the original)

(2.9) A: Who did Felix praise?
B: Felix praised HIMSELF. (Reinhart 1981: 72; emphasis in the original)

According to Reinhart, the first pair of sentences is unproblematic: Felix praised x is given information, and Max, the value filling the variable x, represents the new information. In the second pair, however, the referent of both the given and the new information is Felix. Reinhart argues that, under the givenness criterion, “this person is simultaneously in and not in the participants’ immediate awareness or general consciousness, which is a plain contradiction” (ibid.). She is particularly critical of the givenness criterion and, to an extent, I share her caution about treating it as the only criterion. However, I do not think that anyone would seriously postulate that Felix in the example is in and not in the immediate awareness at the same time. The example, like any other, needs to be seen in context: Whom Felix praised is unknown; it remains a variable, and himself is only one of many potential answers. It is, of course, not Felix as a person who is absent from the immediate awareness; it is the fact that he praised himself.
Only an overly formal analysis would arrive at the conclusion that he is both in and not in the immediate awareness, which is not the approach taken here. The crucial role of aboutness in topic identification has, to the best of my knowledge, never been called into question. Endriss and Hinterwimmer, for instance, state that “most linguists agree that an aboutness-relation holding between the topic and the rest of the clause is a necessary ingredient in the definition of topicality” (2007: 83; emphasis in the original). Similarly, Krifka and Musan observe that ‘topic’ is most commonly defined as the element about which the comment provides information (2012b: 25). Even when Gundel and Fretheim discuss relational givenness and newness, they refer to aboutness at the same time, to the point of explicitly using the word ‘about’.

Overall, the importance of aboutness in defining topics should now be clear. The question remains, however, how the notions of aboutness and givenness can be brought together in a definition of ‘topic’.

2.1.4 Defining ‘topic’

To join givenness and aboutness and arrive at my own definition of ‘topic’, I first turn to a paper by Jacobs (2001), which, I believe, presents one of the more convincing accounts of how to determine topic and comment. Jacobs begins by correctly stating that topics may be marked lexically, morphologically, syntactically, or by prosodic means (2001: 641).9 He notes that topics can be syntactically integrated into a sentence to different degrees: A topic may overlap with a grammatical function; it may be situated outside of a clause but be taken up by another element within the sentence or clause (as in cases of left-dislocation); or it may not be integrated at all (ibid.). The central problem, in his opinion, is that scholars have failed to recognize the absence of a “unitary functional notion” behind all of the different forms a topic can take (ibid.: 643). This failure has resulted in a number of problems, most notably conceptual ones. The advantage of newer approaches by, for instance, Gundel (2010, 2012), Kecskes and Zhang (2009, 2013), Krifka and Musan (2012b), and others is that, in one way or another, they all capture the fact that given and new are highly complex concepts and, more importantly, that they are not mutually exclusive with aboutness. As I noted before, there is a dynamic side to shared information: Information may be given as an extra-linguistic property (such as a house in sight of the interlocutors or a dog’s bark), or knowledge may be continuously developed and shaped in the course of a conversation (as suggested by Kecskes and Zhang’s emergent common ground and Krifka and Musan’s common ground management).
In his approach, Jacobs tries to capture the complex nature of topic-comment (TC) structures by distinguishing four dimensions of TC, which he calls ‘informational separation’, ‘predication’, ‘addressation’, and ‘frame-setting’ (2001: 643–644). Informational separation is already alluded to in Hockett’s definition, which Jacobs also quotes (ibid.: 645): “The speaker announces a topic and then says something about it” (1958: 201). In this quote, the sequential character of the topic-comment structure is indicated by the words and then. ‘Addressation’, in turn, is Jacobs’s version of aboutness: “[A]n address is a constituent that – via its reference […] – identifies one of these mental files [i.e., the topic]: it refers to the entity that is the subject of the file” (Jacobs 2001: 651). The last criterion, frame-setting, corresponds to inference to an extent. However, English, as a subject-prominent language, does not primarily use topics as frame-setting devices. Chafe comments that the topic in topic-prominent languages provides a frame for the sentence rather than indicating directly what it is about (1976: 51). Jacobs does not claim that a topic has to fulfil all four of the criteria he lists. Instead, a topic-comment sequence can be analysed using these four criteria, and any combination may apply, depending on the language
and the example at hand. This is also where the major advantage of Jacobs’s approach lies: Cross-linguistic comparison becomes feasible because differences in how information is structured can be accounted for. Before I turn to my own definition, I would like to cite Krifka and Musan’s definition of topic. According to them, “[t]he topic constituent identifies the entity or set of entities under which the information expressed in the comment constituent should be stored in the common ground content” (2012b: 28). Like Krifka and Musan, I consider aboutness a strong criterion for topic identification. Givenness (in a referential sense), on the other hand, is a relatively weak criterion: Topics cannot be defined on the basis of referential properties alone, whereas they may be understood solely in terms of aboutness. Two further points are worth noting. First, understanding the basics of information structure and topics is an essential ingredient in understanding topicalization. Second, delimiting the meaning of ‘given’ is a helpful tool in contextualizing topics: The deliberate choice of a topic that is not given at all might reveal speaker attitudes or tell us something about discourse cohesion. In summary, I propose the following working definition of topic: A topic is usually that part of a sentence about which something is said, asserted, or questioned, but it can also be a frame-setting device for the sentence. A topic does not necessarily have to be part of the common ground, but it will be in the vast majority of cases. For the specific case of English, I assume that the topic is usually the sentence-initial constituent. Finally, a topic may overlap with the focus.

2.2 Topicalization

The previous section introduced the notion of ‘topic’, a term that has been defined in a variety of ways. The situation with ‘topicalization’ is similar; however, there appears to be at least core agreement that it involves the fronting of constituents. As for the specifics of this process, various approaches with differing degrees of consensus have been proposed. In this chapter, some of these approaches are presented and compared. One of my main claims is that some of the more widely received accounts of topicalization in English do not take into consideration the full range of patterns found in varieties beyond traditional L1 Englishes, which makes them partially unsuitable for the analysis at hand. Throughout this chapter, I concentrate on the role of information status, the relation between topicalization and other non-canonical constructions, and the application of definitions of topicalization to L2 varieties of English. The chapter concludes with a summary and my own approach to topicalization.

2.2.1 Establishing a framework for topicalization

The theoretical foundation for much of the subsequent work on topicalization was, at least in part, laid by Prince (1981a, 1981b, 1992). In her 1992 article, Prince focuses on the information status of discourse entities and differentiates (a) between entities that are
old and new and (b) between their information status with regard to the discourse and with regard to the hearer. Examples (2.10) and (2.11), given by Prince, illustrate the possible combinations.

(2.10) a. I’m waiting for it to be noon so I can call someone in California.
b. I figure she’ll be up by 9, her time. (Prince 1992: 309; emphasis in the original)

(2.11) a. I’m waiting for it to be noon so I can call Sandy Thompson.
b. I figure Sandy/she’ll be up by 9, her time. (Prince 1992: 309; emphasis in the original)

The information of interest in the examples is in italics in the original text and is placed into the relevant combination of hearer-status and discourse-status in Table 2.1. Someone in California in (2.10) is ‘brand-new’: It has not been mentioned in the discourse (as indicated by the indefinite pronoun someone) and is hearer-new, because there would be no need to speak of someone in lieu of an explicitly named person unless the speaker wants to hide this information from the hearer or, more likely, the person is simply unknown to the hearer. A brand-new entity may be unanchored or anchored; the latter case refers to discourse entities that are, in some way, linked to another discourse entity (Prince 1981a: 236). The combination of discourse-old and hearer-new is considered impossible, because information that has been mentioned in the discourse cannot technically be new to the hearer.10 Sandy Thompson in (2.11), on the other hand, is discourse-new but hearer-old. The speaker must assume that the hearer knows her (or must specifically want to trigger a question about who Sandy Thompson is) because, otherwise, giving a full name would lead to unsuccessful communication. The final option is what Prince calls ‘evoked’. Information is considered to be evoked when it is both hearer-old and discourse-old, meaning that it has been mentioned in the preceding discourse (which automatically makes it hearer-old information).
Another scenario not captured in the table involves inferrable information, which is illustrated by the examples in (2.12).

Table 2.1 Matrix of information statuses in Prince’s framework

|            | Discourse-new                        | Discourse-old                                 |
|------------|--------------------------------------|-----------------------------------------------|
| Hearer-new | Brand-new: someone […] (2.10a)       | D.N.A.                                        |
| Hearer-old | Unused: Sandy Thompson (2.11a)       | Evoked: she (2.10b); Sandy, she (2.11b)       |

Source: Prince 1992: 309; reprinted by permission from John Benjamins.

(2.12) a. He passed by the door of the Bastille and the door was painted purple.
b. He passed by the Bastille and the door was painted purple. (Prince 1992: 305; emphasis in the original)

In the first sentence, the door is introduced in the first of the two main clauses and is, thus, discourse-old when it recurs in the second. The door in the second sentence, on the other hand, is not introduced in the first half of the sentence, but it cannot be called discourse-new either, because the ‘trigger entity’ (ibid.: 305), namely the Bastille, makes it clear to the hearer that the door belongs to the Bastille. The speaker assumes that the hearer has a stored mental representation to the effect that a “building (generally/plausibly) has associated with it a particular door, namely the main door used for entering and leaving” (ibid.), which allows them to process the information successfully.

With regard to fronting constructions, Prince (1981b) differentiates between topicalization, focus movement, and Yiddish movement. These three types of fronting differ not in their form and word order but in their relation to the preceding discourse (Prince 1981b: 250). Examples (2.13), (2.14), and (2.15) illustrate the three types.

(2.13) Topicalization
Most of the time I make biscuits for my kids. Cornbread you got to make. I don’t mean the canned kind. (Prince 1981b: 254; heading and emphasis added)

(2.14) Focus movement
Now they’re coming out with a hydraulic crane. Cherry pickers they’re called. They’re so very easy to upset […] (Prince 1981b: 259; heading and emphasis added)

(2.15) Yiddish movement
She works with me. Twenty years we’ve been here almost. They demand more from a hairstylist and you get more money from your work. (Prince 1981b: 260; heading and emphasis added)

The first example involves the topicalization of cornbread, which is “part of an (inferrable) set of ‘breads’ and is salient in the discourse” (Mesthrie 1992: 111).
The second example, by contrast, involves focus movement because “cherry pickers […] specifies the value of the attribute ‘be called X’” (ibid.). The following sections show that a major difference between topicalization and the fronting of foci frequently lies in the intonation of the constituents: Cherry pickers in (2.14) would receive tonic stress, whereas a preposed topic in the narrow sense would not. The last example involves Yiddish movement (i.e., the fronting of
new information that has not been mentioned in the preceding discourse); in this case, the unexpected temporal information twenty years is preposed. To define topicalization in the narrow sense, Prince stipulates that the topicalized NP must be discourse-old or, at least, linked to an entity that is inferrable (1981b: 251). Although her expression ‘saliently inferrable’ leaves open a small window for cases in which the fronted entity is not repeated in identical form, this criterion is still fairly restrictive, not least because she discusses only NPs and no other phrases. By noting that contrast is not a necessary function of topicalization (ibid.: 256), however, Prince slightly widens the scope.11 Following Prince, Lambrecht (1994) provides a frequently cited definition of topicalization. This definition served as a general introduction to the phenomenon in the first chapter and is repeated here as a first step into a discussion of his terminological framework. Lambrecht speaks of topicalization as a construction “in which a non-subject constituent is ‘topicalized’, i.e.[,] marked as a topic expression by being placed in the sentence-initial position normally occupied by the topical subject” (1994: 147). Disregarding the fine-tuning that can be found in his book, this definition has the advantage of being broad enough to apply to a wide range of contexts. A distinction made by Lambrecht, which is also found in Prince (1981b) and Ward and Birner (2004), is that between the preposing of topics and the preposing of foci. In this regard, Lambrecht writes that the “‘topicalized’ phrase may stand either in a topic relation or in a focus relation to the proposition expressed by the sentence” (ibid.: 31). According to him, this relation is marked on the intonational level but not on the syntactic level.
Thus, distinguishing between topicalization and focalizing, that is, Prince’s ‘focus movement’, is only possible via suprasegmental means. Apart from separating topicalization from focus movement, Lambrecht (2001) also calls for a separate treatment of topicalization and left-dislocation. In sentences featuring left-dislocation, the internal syntactic structure is retained by means of a resumptive pronoun ‘standing in’ for the fronted constituent. Two scholars who have also published widely on topicalization are Birner and Ward (e.g., Birner and Ward 1998; Ward and Birner 2004). They refer to fronting as ‘preposing’, which they define as “a sentence in which a lexically governed phrasal constituent (NP, AP, PP, VP) appears to the left of its canonical position, typically sentence-initially” (1998: 3). They consider left-dislocation to be outside the realm of preposing because, unlike preposing constructions, left-dislocation allows for discourse-new and hearer-new initial constituents (Ward and Birner 2004: 162). In their publications, they repeatedly emphasize that preposed constituents must have a link to the preceding discourse:

Felicitous preposing requires that the referent or denotation of the preposed constituent be anaphorically linked to the preceding discourse (see Reinhart 1981, Vallduví 1992). (1998: 32)12

[T]he constraint on preposing and postposing constructions is absolute (e.g., in preposing the preposed constituent must represent discourse-old information regardless of the status of the information represented by the rest of the sentence). (Birner and Ward 2009: 1172)

The relation holding between the preposed constituent and the preceding discourse does not have to be one of identity but includes “type/subtype, entity/attribute, part/whole, identity, etc.” (1998: 32; Ward and Birner 2004: 159). Ward and Birner refer to the sets of entities ordered by any of these relations as ‘posets’. Thus, in an example such as (2.16), it could be argued that the superordinate category ‘sports’, rather than football, is the topic.

(2.16) G: Do you watch football?
E: Yeah. Baseball I like a lot better. (Birner and Ward 1998: 38; emphasis in the original)

In addition to dismissing preposed constituents that do not have a link to the preceding discourse, Birner and Ward limit their “analysis of preposing to those phrasal constituents that are lexically governed by the matrix verb”; accordingly, “subcategorized NPs, APs, VPs and PPs are included, while various adverbials and adjuncts are not” (1998: 31).13 To show how lexically governed constituents are constrained differently from free adjuncts and adverbials, Birner and Ward give the examples in (2.17) and (2.18).

(2.17) *In a basket, I put your clothes.
I put your clothes in a basket. (Birner and Ward 1998: 31)

(2.18) In New York, there’s always something to do.
There’s always something to do in New York. (Birner and Ward 1998: 31)

Without further context, both orderings are acceptable only in (2.18); the preposed version in (2.17) is infelicitous.
This is true regardless of whether givenness refers to discourse-old or hearer-old information status, because even in the latter case, a speaker’s motivation needs to be assessed or, at the very least, hypothesized about for a preposing like this to make any sense. In an important aside, Birner and Ward stress that their understanding of topicalization is only loosely connected to the concept of ‘topic’ (1998: 38). Furthermore, they use the term for “a pragmatically defined type of preposing in particular” rather than “to refer […] to NP preposing in general” (ibid.). Although pointing out terminological vagueness in the discussion of definitions is a well-worn topos in academic literature, topic and its related terms
are, as mentioned before, notoriously hard to characterize. For this reason, the previous section followed the two most frequently encountered criteria, aboutness and givenness, as central concepts in identifying what does and does not constitute a topic. In doing so, I did not intend to discard other approaches: It is certainly legitimate to define topic in primarily syntactic or primarily discourse-pragmatic terms, or any mixture of these; whatever the approach, however, the definition should serve the research question at hand. As pointed out above, Ward and Birner focus strongly on the importance of discourse links and are thus concerned with givenness (in the abovementioned sense), a criterion of topichood that had already been at the centre of several articles before the publication of Birner and Ward’s 1998 monograph. Their insistence that the preposed constituent requires a link to the preceding discourse is one of the restrictions that I do not follow as strictly in this study. Instead, unused and brand-new topics without such a link are also accepted. This is in line with my understanding of ‘topics’, which may or may not contain information that is part of the common ground (irrespective of immediate accessibility).14 In this regard, my understanding of topicalization is closer to Mesthrie’s approach than to Ward and Birner’s, because Mesthrie (1992) also counts as topicalization cases in which information that is clearly shared knowledge between two speakers is placed in sentence-initial position without having been evoked in the discourse, either directly or by means of a poset. Mesthrie’s example for this is Your tablet you took? (1992: 113), which is, presumably, a relevant question in its discourse context.
Possible scenarios for such a question include, for instance, a conversation before the addressee goes to bed or is about to embark on some type of journey; these are only two of many imaginable situations in which it might be important that they have taken their medicine. Two factors that possibly influence preposings that are hearer-old but not discourse-old, or brand-new, are relevance and processing. Relevance has been a hotly debated concept in pragmatics, with Sperber and Wilson’s work (1986, 1995) being pivotal in the field. A complete explanation of their theory goes well beyond the scope of this book. Instead, the following, fairly recent definition by Clark shall suffice: “[A] stimulus or other phenomenon is relevant to an individual to the extent that it has positive cognitive effects for that individual and to the extent that the effort involved in deriving them is small” (2013: 329; emphasis removed). In this context, the qualifier ‘positive’ accounts for “the possibility that some effects derived on the basis of false assumptions might be disadvantageous and, in fact, lead to a stimulus being less relevant rather than more” (ibid.). Psycholinguistic research has analysed the role of word order in syntactic processing for L1s and L2s (e.g., Kaan 1997, 2001), and, generally, the subject of the sentence (in the sense of the grammatical subject) is considered to be highly accessible in processing (cf. Gompel and Pickering 2007: 299; Kaan 2001).15 With regard to relevance, processing is a particularly interesting factor: If there is indeed no link to the preceding discourse but the preposed constituent is only hearer-old (and, in some cases, potentially only assumed by the addresser to be shared knowledge),
the question arises whether the information in the topicalized constituent is relevant enough to justify a (presumably) higher cognitive processing load. With regard to the processing effort involved in marked syntactic structures, Kaiser and Trueswell note that non-canonical constructions require a higher processing effort than their canonical counterparts but that these constructions are also subject to discourse-pragmatic factors (2004: 115). The most obvious problem for processing in English is ambiguity because, apart from certain pronouns, subject and object are not formally distinguished in English.16 This means that the hearer cannot distinguish one from the other from the very beginning of the expression on the basis of form alone, which, in turn, means that parsing requires more work when the object is preposed. Hearer-old or discourse-old information at the beginning of a sentence may diminish this effect to some extent because, while it may not necessarily be expected, it will at least be recognized quickly. Some studies suggest that “new information should attract the listener’s attention […] [and] result in a ‘deeper’ processing […] than that of the old information” (Engelkamp and Zimmer 1983: 122) and, in addition, that “it should rather be expected that new information is processed before the old, simply because it attracts the listener’s attention” (ibid.: 123; emphasis in the original). This may be the case in canonical sentences but is debatable for non-canonical structures. An in-depth analysis of this related problem would require a close investigation of overlapping topics and foci, at which point a finer-grained distinction such as Ward and Birner’s would be of use.
In addition, the question of givenness comes into play again, since, as Kaiser and Trueswell mention (2004: 115–118) and as intuition would suggest, a member of a previously evoked poset should be easier to process than information that is only hearer-old (or not part of the common ground at all). However, since audio files are not available for all of the ICE corpora and preposings are frequently ambiguous to begin with, such an endeavour is not feasible for the study at hand and must be left to future research. Although I follow Birner and Ward’s framework only to a limited extent, their decade-spanning insistence on thinking of information structure as a context-sensitive, discourse-pragmatic phenomenon is admirable. Their 1998 monograph in particular sets out to challenge previous claims that pragmatic factors influence word order only to a limited extent, if at all. Hawkins’s framework, for instance, considers syntactic weight “the major determinant of word-order variation, and indeed of all word order, while informational notions play only a subsidiary role” (1994: 111). Ward and Birner, on the other hand, comment “that context and information status are crucial to the felicity of the noncanonical-word-order constructions we are considering” (1998: 27), that is, various types of preposing, postposing, and argument reversal. This ties in with their claim that few previous studies, that is, studies from before they published their book at the end of the 1990s, dared to look beyond the sentence level in the analysis of topicalization (ibid.: 39). By looking at constructed examples instead of examples from actual spoken or written language and by not moving past the
sentence level, such studies frequently ignored the discourse-pragmatic component that figures so prominently in topicalization and other information-packaging phenomena. On a less positive note, it should be pointed out that Ward and Birner discard any examples of topicalization without a linking relation to the preceding discourse. Since they worked on a corpus of American English, however, their data called for different analytical criteria than data from L2 and learner varieties of English. Nevertheless, their exclusion of preposings with no link to the preceding discourse makes their framework only partially applicable to the study at hand. As the next section shows, overly restrictive approaches to topicalization cannot and do not cover innovative usage patterns found in many spoken varieties of English.

2.2.2 An expanded concept of topicalization

In his study on SAIE, Mesthrie calls topicalization

rare (or highly marked) in formal manifestations of standard English, but quite common in informal English, particularly in pidgins, creoles, L2s (nativized or non-nativized) and some social and regional dialects. (1992: 110)

Unlike other definitions discussed in this chapter, his understanding of topicalization is a broader one that includes both fronting and left-dislocation. Fronting, in his framework, can be further divided into Prince’s categories described above: (a) topicalization, (b) focus movement, and (c) Yiddish movement. Some 16 years after the publication of his monograph on SAIE, Mesthrie and his co-author Bhatt extended the scope of what they consider to be topicalization even further to include, in addition to fronting and left-dislocation, clefting constructions (Mesthrie and Bhatt 2008: 81–82). In establishing his expanded framework, Mesthrie largely builds on Prince’s theories described above.
However, after attesting all three types in SAIE, Mesthrie claims that this variety “goes beyond the functions associated with fronting and dislocation in mainstream varieties of English” (1992: 112). Several of these ‘functions’ or differences could also be found in the ICE corpora for the four Asian varieties I investigated, which makes them particularly relevant for the present study. The six additional functions and differences identified by Mesthrie are summarized in Table 2.2. While all the findings in Table 2.2 merit closer inspection and validation with regard to other varieties, (1) and (3) in particular differ from established notions of topicalization in English. Finegan and Besnier, for instance, write that “[i]n English, the main function of fronting is to mark givenness [and] [t]he fronted noun phrase must represent given information” (1989: 224). This may be the case for ‘traditional’ L1 varieties of English, but it does not capture the complexity found in other varieties of English, many of which are more or less heavily influenced by different communicative demands and by their specific, often bi- or multilingual, linguistic environments.

Table 2.2 Expanded functions of topicalization in Mesthrie’s study of SAIE

|   | Function/Difference | Example(s) in SAIE |
|---|---------------------|--------------------|
| 1 | unanticipated fronting, i.e., topic has no referent in the preceding discourse and does not serve a contrasting purpose | Your tablet you took? (= ‘Have you taken your tablets?’) |
| 2 | higher frequency of fronting and dislocation compared to other L1 varieties of South African English | / |
| 3 | fronting and dislocation of constituents other than NPs | temporals, locatives, genitives, comitatives, instrument, goal, beneficiary, source, dative of purpose, dative to, comparative NPs |
| 4 | topics in embedded clauses, extraction of topics from such clauses, and stacking of topics | An’ then, just two months ago now, ‘nother one wedding, my next-door daughter got married, in Umzinto, that wedding, I seen him there. |
| 5 | interaction with further syntactic processes such as yes-no and wh- questions, negation, indefinite NPs, pro-drop | Alone you came? (= ‘Did you come alone?’) |
| 6 | shift from initiated canonical SVO order to topicalization in basilectal speech | We paid seventy-six cents we paid. |

Source: Adapted from Mesthrie 1992: 113–120; reprinted by permission from Cambridge University Press.17

In addition to listing the different functions and criteria, Mesthrie effectively highlights the typological side of the matter: “In simply fronting a salient (but not necessarily given or contrastive) element, SAIE appears to be closer to the ‘pure’ topic mode than the mainstream English mode” (1992: 113). The notion of ‘topic mode’ roughly corresponds to topic-prominence, which is discussed with regard to the relevant contact languages in the next chapter. Despite a clear interest in typological matters, Mesthrie claims that

[t]he predilection for topicalisation […] is not substrate-induced. The Indic and Dravidian languages do not appear to use a particularly striking proportion of topicalized sentences (no more than standard English, say). Once again we see universals of discourse structure playing a greater role than transfer. (ibid.: 157)

The claim in the second sentence of this quotation is puzzling. A close analysis of some of the most widely spoken Indo-Aryan and Dravidian languages reveals that they show traits of both subject- and topic-prominent languages and frequently employ topicalization. Thus, I follow Lange (2012a: 151) in her recommendation to take into account the structure of contact languages in order to explain

28

28 Approaching topicalization forms, functions, and frequencies of topicalization in varieties of English. This approach, rooted in contact linguistics, compares typological configurations and is applied to all varieties analysed in this study. As far as the aforementioned issue of givenness is concerned, the terminological question arising is, ultimately, one of scope. If a definition of ‘givenness’ includes shared (world) knowledge and not just the immediate discourse, then Your tablets in the abovementioned example might well be considered as being ‘given’, since both the speaker and hearer are most likely aware of the fact that the addressee has to take some sort of medication. However, one of the intriguing aspects of World Englishes research is the identification of linguistic features and structures that might have previously been unheard of or, at least, are rare in the ‘traditional’ varieties. This naturally encompasses the necessity of approaching Asian varieties with an open mind and a framework that is not too restrictive; otherwise, many interesting cases might be overlooked. 2.2.3 Topicalization as a non-canonical construction An aspect that, thus far, has received little attention in this chapter is the status of topicalization as a non-canonical construction. In this regard, it is illuminating to consult descriptions of topicalization in the major grammars of English. The CGEL elaborates on topicalization in a chapter written by Ward, Birner, and Huddleston (2002). The title of the chapter gives away the authors’ stance, as it is called Information packaging, which suggests a deviation from ‘regular’, canonical information structure. Indeed, the CGEL defines information- packaging constructions as “a number of clause constructions […] which differ syntactically from the most basic, or canonical, constructions in the language” (2002: 1365). 
The proximity of the canonical/non-canonical distinction to the unmarked/marked pair becomes particularly evident when the authors note that “information-packaging constructions characteristically have a syntactically more basic counterpart differing not in truth conditions or illocutionary meaning but in the way the informational content is presented” (ibid.). The three pairs shown in Table 2.3 serve as examples. As in the framework by Birner and Ward discussed above, Ward et al. (2002) refer to ‘preposing’ rather than ‘topicalization’ (ibid.: 1366) and note that the non-canonical versions are (a) less frequent and (b) subject to pragmatic constraints that do not hold for the canonical versions (ibid.: 1367).

Table 2.3 Canonical vs. non-canonical sentences

canonical version | non-canonical version
a. Kim wrote the letter. | b. The letter was written by Kim.
a. Two doctors were on the plane. | b. There were two doctors on the plane.
a. We rejected six of the applications. | b. Six of the applications we rejected.

Source: Ward et al. 2002: 1365; reprinted by permission from Cambridge University Press.


In the Longman Grammar of Spoken and Written English by Biber et al. (1999), the pre-nuclear placement of constituents is referred to as ‘fronting’ (1999: 900). Although the authors do not explicitly describe fronting as a non-canonical or marked construction, they claim that “fronting of core elements is virtually restricted to declarative main clauses, and is relatively rare in English” (ibid.). They found mostly predicative fronting in academic prose and the news, while object fronting dominated in fiction and conversation (ibid.: 910). That fronting occurs in academic prose, news, and fiction at all is intriguing, since these are (in the case of Biber et al.’s corpus) written texts. One caveat, however, lessens the impact of this finding: the examples that Biber et al. (1999) found in fictional texts seem to be almost exclusively attempts by the respective authors to recreate spoken language. The last major grammar to be consulted, Quirk et al. (1985), presents ‘fronting’ in a very different light. According to them, “fronting is in no way confined to colloquial speech” (1985: 1377) and can be found in both spoken and written language. Overall, the major grammars clearly understand topicalization (or preposing, or fronting) and its frequency rather differently. Whereas Biber et al. (1999) treat it as an implicitly non-canonical, relatively infrequent feature used for different reasons across different registers, Quirk et al. (1985) claim that it occurs rather frequently in both spoken and written language. Ward et al. (2002), in turn, think of topicalization as an important information-packaging construction subject to specific discourse-pragmatic constraints.
For the remainder of the book, it is assumed that topicalization is indeed a ‘marked’ construction in most varieties and contexts, in the sense that it deviates from the unmarked, canonical SVX pattern. IndE and SinE are discussed as two varieties that feature topicalization to an extent that suggests (pragmatic) unmarking, which calls for a reassessment of how topicalization should be approached.

2.2.4 Discourse functions of topicalization

The last aspect to be covered in this chapter is which specific functions topicalization may have in discourse. For the present analysis, emphasis, contrast, topic continuity, and topic shifting were differentiated as the four main discourse functions of topicalization. This is, to an extent, a simplification, but all cases identified in this study could be adequately accounted for with these four functions. The first three functions have previously been noted by Biber et al. (1999: 900), who list “organizing information flow to achieve cohesion”, “expressing contrast”, and “enabling particular elements to gain emphasis” as the main discourse functions of topicalization. Topic shifting is not mentioned by Biber et al. (1999), which might be attributed to the fact that very few tokens of topicalization fulfil this specific function.

Emphasis and contrast

Emphasis is notoriously difficult to define, since it overlaps with other concepts important to information structure and is (formally) often indistinguishable from other functions, particularly contrast (Matthews 1997: 113). Contrast is created when one alternative is picked over others and the number of alternatives is not infinite (see Chafe 1976). Biber et al. add that contrast occurs when “elements are in focus” (1999: 897) and that “[o]bject fronting is typically chosen when there is a communicative need to emphasize or contrast a clause element. Both the fronted element and the verb are strongly focused” (ibid.: 904). Identifying the focus of a sentence is, therefore, not sufficient to distinguish contrastive from emphatic elements. Part of the idea of emphasizing a constituent is that the constituent should serve as the point of attention, which was identified earlier in this chapter as one of the main criteria defining foci. Biber et al. give example (2.19) from the conversations in their corpus:

(2.19) Right you are! (Biber et al. 1999: 904; italics removed)

In such cases of emphasis, “[t]he fronting has an intensifying effect, which is often strengthened by the choice of words (horrible, bloody amazing, etc.), or by emphatic stress when spoken (reflected by exclamation marks)” (ibid.; emphasis in the original). In addition, focus particles such as only and also have been noted as further means of giving prominence to an element in the discourse in Indian English (see Lange 2007; Bernaisch and Lange 2012; Fuchs 2012). For the present study, the taxonomy suggested by Callies (2009) was applied in order to distinguish between emphasis and contrast. Callies differentiates between intensification on the one hand and contrast on the other. Intensification, referred to in this book as ‘emphasis’, occurs when information is highlighted as important but there is no contrastive function, whereas contrast refers to cases where explicit or implicit alternatives are present (Callies 2009: 23).
The main difference from Callies’s framework lies in the absence of an explicit focus analysis in the present study. This means that emphasis was accepted as a discourse function even when the probability of one of the foci falling on the topicalized constituent was relatively low. For contrasts, focal highlighting of the contrastive element is generally taken for granted. The problem with the analysis of contrasts lies elsewhere: there is no clear-cut set of criteria for measuring degrees of contrastiveness, and contrastiveness is gradient rather than analysable in simple terms of being present or absent (cf. Lambrecht 1994: 290; Molnár 2002). Consequently, whether a token is classified as contrastive is, to a degree, a matter of personal judgement, particularly when the analysis has to rely on the written representation of spoken language with little further information on speakers and context. An example of a token that was counted as contrastive is given in (2.20).

(2.20) A: Oh we never knew what happened to her
The other we knew ah but Vidya we didn’t knew (ICE-IND: S1A-021#58–59)


In such a case, where both contrasting elements are overt, classifying the discourse function was fairly straightforward.

Topic continuity and topic shifting

Topic continuity and topic shifting are, like emphasis and contrast, relatively closely related functions. Topic continuity is understood here in the sense of Lange, who found for Indian English that “[m]any examples display an explicit discourse-linking function, where the immediately preceding topic is taken up again” (2012a: 134). An example of topic continuity is given in (2.21).

(2.21) Z: It’s not easy to uh witness to your own people
A: I know
Z: Sometimes it takes a long long time
A: What is your suggestion
Z: Patience and prayer
A: Yeah and prayer I have (ICE-HK:S1A-052#279–284)

In this example, speaker A picks up the topic introduced by speaker Z and repeats it, clearly establishing a discourse link and topic continuity. Topic shifting, by contrast, refers to cases where topicalization is used to draw attention to a completely different topic. An example of a potential case of topic shifting is given in (2.22), with after graduation being the relevant part.

(2.22) B: The frontal view frontal view or I don’t know what’s the where the bridge is located
A: By the way after graduation what are your plans
B: I plan to have a short gathering a sort of celebration (ICE-PHI:S1A-045#67–69)

In this example, after graduation is uttered unexpectedly and steers the conversation in a new direction. Topic shifting is, as mentioned previously, very rare, but is included nonetheless. Topic continuity, on the other hand, is one of the major functions of topicalization.

2.3 Summary

In this chapter, I presented and evaluated approaches to topics, information structure and status, and topicalization. This is far from a comprehensive overview of the literature on topicalization, but it provides insight into some major studies that have influenced or directly contributed to research on non-canonical syntax in general and on World Englishes in particular. A major outcome of this chapter is that Mesthrie’s framework emerges as the most applicable to the data presented in this study. This is hardly surprising, since he also analysed a variety with a colonial background and had to account for forms, functions, and frequencies of topicalization that are not necessarily part of ‘traditional’ L1 varieties of English. Coming back to the role of givenness and discourse links in topicalization, all of the studies analysed apart from Mesthrie’s restrict their understanding of topicalization to such an extent that not all cases I found in ICE can be adequately dealt with. The complexity of topicalization usage patterns and frequencies in Outer Circle varieties (and possibly in the Expanding Circle) is best captured by examining each token individually and deriving from these tokens an overall account of which constituents may be topicalized in Asian varieties of English and which constraints hold. Among the tokens identified in ICE, unused and brand-new constituents in topicalized position are rare or fairly marked, but nevertheless present. I am, therefore, in favour of not insisting that information be discourse-old or linked to the preceding discourse for topicalization to be felicitous. Shared knowledge, world knowledge, or other information that has entered the common ground or is ‘available’ to enter it by whatever means should be considered sufficient. Topic-comment sentences and topicalization processes are often basic ‘components’ of the Asian contact languages of English and possibly allow for usage patterns in English that go beyond what is commonly expected (at least in L1 varieties). As my findings will show, topicalization of hearer-old and even brand-new information occurs and does not bring a conversation to a sudden halt through infelicitous preposing.
In summary, it makes sense to repeat the definition given in the introduction to this chapter: I understand topicalization as the marked fronting of constituents.18 By placing objects, complements, (obligatory) adverbials, or embedded subjects in sentence-initial position, speakers of Asian Englishes (and other varieties) turn these constituents into topics (and/or foci) and, in doing so, employ a discourse-pragmatic strategy to create contrast, emphasize constituents, shift the topic, or establish topic continuity (see Lange 2012a). Furthermore, topicalization needs to be analysed in the context of both the individual sentence and the discourse as a whole: it is a strategy that modifies the structure of a sentence, but this is usually done in order to shape the discourse at large. Often, topicalization refers back to previously mentioned entities. In some cases, it introduces new entities into the discourse or highlights certain constituents. These functions are not idiosyncratic; rather, they help to structure the discourse, create cohesion, and convey speaker attitudes. It is therefore not helpful to restrict the analysis to either the sentence level or the discourse level; rather, the individual utterance should be considered in relation to the discourse.

Notes
1 Neither in the quotes nor in the remainder of Weil’s thesis does his theory fully acknowledge the fact that information structure, depending on the options available in a language, can be manipulated by syntactic, morphological, suprasegmental, and/or lexical means, although he tackles prosody in his third chapter. At any rate, the pool of options for modifying the information structure of a sentence (cross-linguistically) reinforces his idea.
2 ‘Verb-second’ and ‘V2’ refer to the fact that “in an independent declarative clause the finite verb is in second position” (Frey 2004: 1).
3 The German examples are glossed according to the conventions described in the Leipzig Glossing Rules.
4 Topicalization is also a non-canonical, but far less marked, construction in German.
5 The text does not make clear on which data Engel (1972) bases his claim.
6 Original quote in German: “inhärente Thematizität des Subjekts” (translation by the author).
7 Ammann defines Thema as “den Gegenstand der Mitteilung” (“the topic of the message”, translation by the author) and Rhema as “das was ich dem Hörer über das Thema zu sagen habe” (1928/²1962: II, 3; English: “what I have to tell the hearer about the theme”, translation by the author).
8 See, for instance, Gundel (2012: 587) for an overview of literature on different terms that relate in some way to referential givenness-newness.
9 See also Lambrecht (1994: 334).
10 This does not take into account certain health-related circumstances such as transient global amnesia, which, while possibly producing situations where information is both discourse-old and hearer-new, are probably too rare to factor into a generalising overview.
11 It is duly noted that my own findings as well as previous studies (cf. Lange 2012a) confirm this claim.
12 In later texts, they added Horn (1986) to the references in this quotation.
13 Verbs of direct discourse also fall into this category because their complements “appear [freely] in preposed position” (1998: 31).
14 It should be mentioned again that discourse-old topics are still assumed to be the typical case, which my findings confirm.
15 This explains the tendency of the world’s languages to place the subject first in a transitive clause (cf. Dryer 2007; see also chapter 3) and the tendency of languages with flexible word order to prefer sentence-initial (grammatical) subjects (cf. Kaiser and Trueswell 2004).
16 Ambiguity in topicalization has also been attested for Dutch; see Erteschik-Shir (2007: 155).
17 As Lange (2012a: 150) points out, SAIE is a shift variety. This limits a direct application of Mesthrie’s findings to the Asian varieties under consideration, although the advantage of taking a more inclusive perspective remains unchallenged.
18 As mentioned before, however, gradual unmarking appears to be underway in some varieties.


3 Topic-prominence in Asian contact languages

How can we explain frequencies of topicalization in an Asian variety of English that differ notably from those found in other varieties? A first and very attractive explanation is substrate transfer, that is, the transfer of a feature from a contact language to English. Taking the notions of ‘topic’ and ‘topicalization’ from chapter 2 as a foundation, this chapter addresses the question of to what extent language contact and, consequently, typological interference can explain topicalization frequency and usage patterns in Asian Englishes. For this purpose, I first discuss the role of word order and establish (potential) criteria for subject-prominence and topic-prominence based on Li and Thompson’s 1976 paper. In a second step, I give typological overviews of some of the major contact languages of English in India, Singapore, Hong Kong, and the Philippines, namely Hindi, Bangla, Marathi, Tamil, Telugu, Kannada, Mandarin, Cantonese, Malay, and Tagalog. These languages were selected because of their importance in the countries analysed, their speaker numbers, and the availability of detailed grammatical descriptions. Each of these languages is evaluated with regard to Li and Thompson’s criteria (1976) and then placed on the continuum of subject- and topic-prominence. It should be noted from the outset that this typological discussion is based solely on a review of the literature and not on personal familiarity with the languages.1 At the end of the chapter, it will be established that all of the analysed contact languages, albeit to different degrees, show traits of both subject-prominence and topic-prominence.

3.1 Word-order typology

Before the focus of this chapter moves to subject- and topic-prominence and the position of individual languages on the continuum, a brief overview of word-order typology serves as the basis for subsequent considerations. Subject- and topic-prominence are situated at the interface of information structure and sentence grammar, but since traditional Western linguistics has primarily described sentence structure in terms of the position of the three major constituents (subject, verb, and object), typological descriptions typically also account for word order.


Table 3.1 Distribution of word-order types in the languages of the world

Type | Word order | No. of languages
1 | Subject-Object-Verb (SOV) | 497
2 | Subject-Verb-Object (SVO) | 435
3 | Verb-Subject-Object (VSO) | 85
4 | Verb-Object-Subject (VOS) | 26
5 | Object-Verb-Subject (OVS) | 9
6 | Object-Subject-Verb (OSV) | 4
7 | Lacking a dominant word order | 172

Source: Dryer 2005a: 330; reprinted by permission from Oxford University Press.

In the World Atlas of Language Structures (WALS, Haspelmath et al. 2005), perhaps the most comprehensive work of typological comparison available today, Dryer distinguishes six possible types of word order and an additional category for languages with no dominant word order. Table 3.1 lists the different types and the number of languages assigned to each type among the 1,228 languages analysed. The most obvious finding to be taken from this table is that SOV and SVO are by far the most frequent word-order types in the sample: 932 of 1,228 languages (ca. 75.9 per cent) are SOV or SVO; another 172 languages lack a dominant word order, and only 124 languages, that is, about 10 per cent of the sample, are VSO, VOS, OVS, or OSV. Even without investigating subject- and topic-prominence in more detail, this distribution of word-order types has implications for the analysis of language contact as an influence on topicalization. If topicalization is defined as the fronting of the object in the pattern SVO (or SOV), then a contact language in which the object appears first would represent an intriguing, but even more complex, case. Since none of the languages investigated here is object-initial, however, this is of no further relevance. Dryer’s contribution to WALS (like any other approach to a matter as complex as word order) needs to be treated with caution, since his understanding of the terms ‘subject’ and ‘object’ is not identical to traditional notions.2 For the analysis at hand, this makes little difference. His approach is, however, laudable because it allows the inclusion of languages where traditional notions are less applicable or do not hold at all. Perhaps the most substantially ‘different’ language considered in this analysis (from an Indo-European point of view) is Tagalog, for which there is still substantial disagreement concerning the function of certain particles.
Considerations of this sort show that even fundamental concepts such as word order are ultimately simplifications of highly complex matters, and the division into subject-prominence and topic-prominence is no different in this regard. On the other hand, one cannot do away with categorization and classification (even if that implies a certain degree of simplification).
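The shares cited for Table 3.1 can be recomputed directly from Dryer’s counts. The following Python sketch is purely illustrative: the dictionary and the helper `share` are hypothetical names introduced here and are not part of Dryer’s (2005a) study; only the seven counts are taken from the table.

```python
# Counts from Table 3.1 (Dryer 2005a: 330); names are illustrative only.
WORD_ORDER_COUNTS = {
    "SOV": 497,
    "SVO": 435,
    "VSO": 85,
    "VOS": 26,
    "OVS": 9,
    "OSV": 4,
    "no dominant order": 172,
}


def share(types, counts=WORD_ORDER_COUNTS):
    """Return (number of languages, percentage of the whole sample) for the given types."""
    n = sum(counts[t] for t in types)
    total = sum(counts.values())
    return n, round(100 * n / total, 1)


total = sum(WORD_ORDER_COUNTS.values())
sov_svo = share(["SOV", "SVO"])
minor = share(["VSO", "VOS", "OVS", "OSV"])
no_dominant = share(["no dominant order"])

print(total)        # 1228
print(sov_svo)      # (932, 75.9)
print(minor)        # (124, 10.1)
print(no_dominant)  # (172, 14.0)
```

Run as a script, the sketch confirms the sample size of 1,228 languages and shows that the four minority orders together account for about 10.1 per cent of the sample, while languages without a dominant order make up about 14.0 per cent.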


Turning back to Dryer’s study, two further distinctions need to be pointed out. The first is the division between languages with rigid order and languages with flexible order: languages belonging to the first group “can be assigned straightforwardly to one of the six types, because all orders other than one are either ungrammatical or used relatively infrequently and only in special pragmatic contexts” (ibid.). In languages with flexible order, on the other hand, constituents are not assigned a position as rigidly, although position may be influenced, or even determined, by certain pragmatic rules. Secondly, Dryer distinguishes between languages with and without a dominant word order. The label ‘dominant’ encompasses languages with rigid order as well as more flexible languages with a predominant order (ibid.). This is why languages such as German and Dutch are considered languages with no dominant word order: since there are clauses with SVO (main clauses without an auxiliary) and SOV (subordinate clauses and main clauses with an auxiliary), and both could be seen as the ‘common’ type, Dryer decided not to consider either as dominant (ibid.: 331). In the next section, I present the idea behind the terms ‘subject-prominence’ and ‘topic-prominence’ put forward by Li and Thompson (1976). Looking at sentence structure from a more general typological point of view, the authors are interested in the difference between two types of language: those that form sentences based on subject and predicate and those that structure their sentences based on topic and comment. As I show throughout this chapter, certain word-order preferences appear to correlate, to an extent, with subject- and topic-prominence.
Describing word-order variation as a “fruitful area in which to explore discourse-marking structure” (2007: 271), Masica confirms the necessity of studying the connection between surface structure, that is, word order, and the underlying principles (often rooted in the realm of discourse-pragmatics) that influence word order. He pointed out this necessity in 2007, more than thirty years after Li and Thompson attempted to describe the four guiding principles of sentence structuring, which I discuss in the following section.

3.2 Li and Thompson’s classification of languages

Li and Thompson claim that languages “may differ in their strategies in constructing sentences according to the prominence of the notions of topic and subject” (1976: 459). Based on the general differentiation between subject and topic, they identify four types of languages: (1) languages that are subject-prominent, (2) languages that are topic-prominent, (3) languages that are both subject- and topic-prominent, and (4) languages that are neither subject-prominent nor topic-prominent. In languages of type (1), sentences are structured according to the subject-predicate principle. Languages belonging to the second type are based on the topic-comment principle, while languages in the third group have constructions of both kinds. In the fourth category, subject and topic “have merged and are no longer distinguishable in all sentence types” (ibid.). Perhaps the most important point to acknowledge with regard to this distinction is made early on in their paper: being topic-prominent does not mean that a language has no subjects, and, likewise, subject-prominent languages (can) have topic-comment constructions. According to Li and Thompson, the latter holds true for all the languages they investigated. Topicalization in English is a case in point: English is certainly a subject-prominent language, but by means of processes such as left-dislocation and topicalization, sentences that highlight the topic-comment distinction can be construed. Li and Thompson’s point is, therefore, that the concept of topic and the concept of subject are regarded as ‘basic’ or fundamental in most languages (ibid.: 460). As the next pages show in more detail, subject- and topic-prominence are best thought of as two poles on a continuum rather than as a clear-cut division with languages falling into one group or the other. For their investigation, Li and Thompson (1976) had a sample of languages available that included Indo-European languages as well as several languages from other families spoken primarily in Asia, which makes their overview intrinsically interesting for the present discussion. Table 3.2 illustrates Li and Thompson’s four language types and the languages assigned to these types (see 1976: 460). The assignment of these languages to the different types should be seen as a first attempt by the authors and certainly not as conclusive. The fact that a small sample of languages was used represents the first problem.
Li and Thompson justify this by claiming that “a careful investigation of the syntactic structures of a language is necessary” (1976: 460) in order to test for topic-prominence. Such a careful investigation requires a lot of time and, at the time of the publication of Li and Thompson’s paper, descriptive grammars were not as easily accessible as they are today. Even today, some major languages have not been described in detail at all linguistic levels. In addition, there was (and is) a tendency in descriptive grammars to explain sentence structure based on the threefold distinction between subject, verb, and object, which turns an investigation of the criteria of topic-prominence into a challenging task. These caveats make some of the disputable decisions in Table 3.2 understandable because, as will be seen later, the Indo-Aryan languages (as part of the Indo-European language family) are not exclusively subject-prominent.

Table 3.2 Language types according to Li and Thompson (1976: 460)

Subject-prominent languages: Indo-European, Niger-Congo, Finno-Ugric, Semitic, Dyirbal (Australian), Indonesian, Malagasy, …
Topic-prominent languages: Chinese, Lahu (Lolo-Burmese), Lisu (Lolo-Burmese), …
Subject-prominent and topic-prominent languages: Japanese, Korean, …
Neither subject-prominent nor topic-prominent languages: Tagalog, Ilocano, …

Source: Reprinted by permission from Elsevier.

In order to make a case for the typological classification of languages into their four types, Li and Thompson (1976) set out a number of differences between subject and topic properties. These differences are summed up in Table 3.3.

Table 3.3 Characteristics of ‘subject’ and ‘topic’

Criterion | Subject | Topic
Definiteness | Indefinite or definite | Always definite
Selectional relations | Always has selectional relation with some predicate | Need not be an argument of a predicative constituent
Role of the verb | Verb determines subject | Verb does not determine topic
Functional role | (a) Empty (dummy subjects) or (b) role definable within the confines of a sentence | Constant functional role across sentences (setting the framework)
Verb agreement | Agreement between subject and verb (in many languages) | No agreement between topic and verb (or extremely rare)
Sentence-initial position | Position may vary | Surface coding of topic involves sentence-initial position
Grammatical processes | Crucial role in processes such as reflexivization, passivization, equi-NP deletion, verb serialization, imperativization | Not involved in grammatical processes such as the ones given in the subject column

Source: Adapted from Li and Thompson 1976: 461–466; reprinted by permission from Elsevier.

For a thorough understanding of the individual differences assumed by Li and Thompson (1976), it is essential to read the original text. For this discussion, some additional remarks on their theoretical framework shall suffice. One problem, noted by Kiss (2001), is the lack of a proper definition of topic-prominence in Li and Thompson’s paper. Although the authors outline properties that supposedly make a language topic-prominent, there is no definition that could be quoted as such. Another problem worth investigating is raised by Dryer, according to whom “evidence in the tradition of Givón (1983) shows that many languages seem to place what can be described as topical elements late in clauses, casting doubt on the notion that there is a universal pragmatic preference for topics to occur early in sentences” (2005b: 335). Although subject and topic frequently appear to overlap in, for instance, the Germanic languages, larger samples that include more languages seem to deviate from this tendency. Surface coding of the topic by placing it sentence-initially can be found in several languages. However, particles, affixes, and other morphological or even prosodic means can also mark the topic, ultimately rendering additional marking by position redundant. Nevertheless, it seems more reasonable for the topic to occur early in the sentence, because the amount of processing necessary to understand the sentence is, at least intuitively, lowest in this configuration. This is also Givón’s belief: “Information that attracts more attention is memorized, stored and retrieved more efficiently. The great universality of the pragmatic use of word order attests to its solid cognitive underpinning” (Givón 2001b: 250). This assumption might be contested, however, if the topic is in fact, as mentioned in the previous chapter, a dislocated constituent that would otherwise follow later in the sentence. In the case of English, this would be an object, a complement, or an (obligatory) adverbial. Particularly for topicalized constituents that have not been prepared by the preceding discourse, the notion of efficient processing of the topic is questionable. A detailed investigation of this problem with psycholinguistic methods, such as eye-tracking experiments, could help resolve this issue, but this lies beyond the scope of this book. Certainly, though, I agree with Givón that “the pragmatic principle that controls word order has relatively little to do with fronting old information, but rather with fronting important information” (ibid.: 257; emphasis in the original). This statement not only underlines my claim that aboutness is more important than givenness in topicalization (at least for spoken varieties), but it is also indicative of an alternative ‘logic’ of sentence structuring in topic-prominent languages.
In addition to distinguishing subjects from topics, Li and Thompson also outline eight characteristics typical of topic-prominent languages, which are listed in Table 3.4. With minor modifications (explained below), these criteria are applied to ten major contact languages of the four Asian varieties of English under consideration. Surface coding (a), the passive construction (b), dummy subjects (c), V-final languages (f), constraints on the topic constituent (g), and the basicness of topic-comment sentences (h) are largely self-explanatory. “Dummy” subjects serve as an example: if a language is topic-prominent, it has no need to fill the semantically empty slot of a dummy subject in sentences such as It is raining (Li and Thompson 1976: 467). Criteria (d) and (e), however, are less self-evident. Double subjects (d) are constructions in which the psychological subject (the topic) and the grammatical subject occur right next to each other in a sentence. Examples from Japanese (3.1) and from Mandarin (3.2) illustrate the construction.

(3.1) Sakana wa tai ga oisii
fish TOP red.snapper SUBJ delicious
‘Fish (topic), red snapper is delicious’. (Li and Thompson 1976: 468)


Table 3.4 Characteristics of topic-prominent languages

Characteristic | Description
(a) Surface coding | Coding of the topic by position and/or morphological marking in topic-prominent languages
(b) The passive construction | No or marginal passivization in topic-prominent languages
(c) “Dummy” subjects | Absence of dummy subjects in topic-prominent languages
(d) “Double subject” | Pervasive double-subject constructions in topic-prominent languages
(e) Controlling co-reference | Topic controls co-referential constituent deletion in topic-prominent languages
(f) V-final languages | Topic-prominent languages tend to be verb-final languages
(g) Constraints on topic constituent | No constraints on what the topic may be in topic-prominent languages
(h) Basicness of topic-comment sentences | Topic-comment structures are the basic structure in topic-prominent languages

Source: Adapted from Li and Thompson 1976: 466–471; reprinted by permission from Elsevier.

(3.2) Nèike shù yèzi dà
that tree leaves big
‘That tree (topic), the leaves are big’. (Li and Thompson 1976: 468)

What is interesting about these examples is that their translations resemble the hanging topic construction more than ‘regular’ cases of topicalization because the topics fish and that tree are not syntactically linked to the remainder of the sentence. To understand the construction better, four criteria given by Li and Thompson (1976: 468) prove helpful: (a) the topic and the subject both occur and are distinguishable; (b) the topic has no selectional relationship with the verb; (c) no argument could be made for a movement rule; (d) a topic-prominent (Tp) language has sentences of this type, while no purely subject-prominent (Sp) language does. The fourth criterion (d) is more a statement about the occurrence of double subjects than a strictly testable criterion. In the analysis, the basicness of topic-comment (TC) sentences was therefore used as an indicator when judging this aspect. “Controlling co-reference”, criterion (e) in Table 3.4, also needs some explanation. Li and Thompson point out that “[i]n a Tp language, the topic, and not the subject, typically controls co-referential constituent deletion” (ibid.: 469); see example 3.3.


(3.3) Nèike shù yèzi dà, suǒyi wǒ bu xǐhuān Ø
that tree leaves big so I not like
‘That tree (topic), the leaves are big, so I don’t like it’.3 (Li and Thompson 1976: 469)

In this example, the pronoun it, which is obligatory in the English sentence, is omitted in the Mandarin sentence (marked Ø). This slot is ‘controlled’ by the topic in Mandarin and not by the grammatical subject. Unfortunately, information on controlling co-reference could not be obtained for the majority of the languages under investigation. For this reason, I decided to omit this criterion from the remainder of the analysis.

3.3 Ratings of individual languages and language families

The previous sections introduced word order and topic-prominence from a more theoretical perspective. In this section, these theoretical insights are applied by asking to what degree the major languages in contact with English in Hong Kong, India, Singapore, and the Philippines can be considered topic-prominent. The choice of languages was motivated by two central factors: the importance of a language in the country or region and the availability of sufficiently detailed grammatical descriptions. Furthermore, this is merely an overview of a topic that could fill its own book series; ideally, a corpus analysis of topic-first constructions would accompany each language description. Such an in-depth analysis, however, was beyond the scope of this study; the overview should instead be approached as a starting point for follow-up studies and as a foundation for discussing a contact hypothesis in this book.

3.3.1 Rating procedure

In the typological analysis, the following procedure was applied:
(1) Review of grammatical descriptions;
(2) Search for indirect or direct statements about any of the topic-prominence features listed in Table 3.4;
(3) Rating of each criterion’s presence or absence, based on at least one source and, if possible, several, with the following options available:
(a) ✓: the feature can be attested unambiguously;
(b) (✓): the feature is present to a limited extent (e.g. if there are some syntactic or pragmatic constraints);
(c) (✕): the feature is present to a very limited extent (e.g. if it is highly marked or several constraints are in place);
(d) ✕: the feature is absent;
(e) ?: none of the consulted sources explicitly or implicitly elaborated on the feature, or the information was so ambivalent that no rating could be determined.


Once the features were rated, all ratings were counted with the goal of placing the contact languages on the continuum of subject- and topic-prominence. Because I lack personal insight into the inner workings of the languages under scrutiny, I decided not to ‘weight’ the different criteria in any way. In the following sub-chapters, the typological profiles with the individual ratings as well as typological descriptions of the languages are presented. The conclusions drawn from this analysis are discussed in section 3.4. Not every topic-prominence feature is discussed for every language; instead, a selection of particularly controversial or interesting aspects has been chosen for discussion.

3.3.2 Indo-Aryan languages

The Indo-Aryan languages represent one of the major language families of India and of several other South Asian countries, such as Pakistan, Bangladesh, Nepal, Sri Lanka, and the Maldives (Masica 1991: 8). While they are not the only languages spoken there, in each of these countries their speakers outnumber those of other language families (ibid.). Table 3.5 lists the major Indo-Aryan languages spoken in India by absolute speaker numbers; percentages have been added to indicate their proportion of the total population (given in the bottom line of the table). As a brief aside, it is interesting to consider the word order of Sanskrit, the language from which the other Indo-Aryan languages developed. In a comparison with Latin and Greek, Sanskrit has been described as an inflectional language with great freedom of word-order variation (Aralikatti 1991: 13). Changing the word order of a sentence such as The dog bites a man results in a completely different meaning in English; this is not the case in Sanskrit.
However, despite the flexibility of constituent order in Sanskrit, an unmarked SOV order exists (ibid.). This finding has been substantiated in other contributions, for instance Schäufele (1991). Schäufele focuses on classical Sanskrit, not on its modern forms, and describes it as a language that, at least in unmarked sentences, generally follows SOV, with adverbs occurring somewhere between the subject NP and the VP (1991: 153). The tendency towards verb-final word order has persisted and can be found in all three of the Indo-Aryan languages analysed. Another interesting finding is that Vedic Sanskrit allowed topicalization of both complete phrases and isolated lexical items (ibid.: 170). Cardona adds that “[f]ronting is possible” and “[s]tylistic factors can play a role in different word orders” (2003: 155). Whether this tendency can also be found in the major Sanskrit-derived languages is discussed in the following pages, starting with Hindi.

Table 3.5 Major Indo-Aryan languages spoken in India

Language | Speakers | % of total population
Hindi | 422,048,642 | 41
Bengali | 83,369,769 | 8.1
Marathi | 71,936,894 | 7
Urdu | 51,536,111 | 5
Gujarati | 46,091,617 | 4.5
Oriya | 33,017,446 | 3.2
∑ population of India | 1,028,737,436 | 100

Source: Census of India 2001. The Indian census data from 2001 are available online at www.censusindia.gov.in/.

Hindi

Hindi is, as shown above, a highly important language in terms of speaker numbers in India and represents one of the major mother tongues indicated by the speakers in ICE-India. For this reason, Hindi is analysed first and receives particular attention in this chapter. Singh points out Hindi’s relatively low sensitivity to word order, which is based on its case system (1994: 217; see also Kachru 2006). Explicit marking of syntactic constituents can result in word-order flexibility because an additional fixed position (as developed in English during the transition from Old English to Middle English) would be redundant. Mixed cases occur, and position is always a major criterion for information structuring, but, formally speaking, a rich case system is sufficient to identify constituent functions. This is why several constituents can be moved in Hindi for the overarching goal of structuring information in a certain way (Shapiro 2003: 276). Fronting is often highlighted as a favoured strategy in this regard (cf. Kachru 1990: 67; Shapiro 2003: 276; Kachru 2006: 246). Both examples (3.4) and (3.5) feature topicalized constituents.

(3.4) is tarah kī hindī yahā̃ log nahī̃ bolte4
this kind of Hindi here people not speak
‘People don’t speak this kind of Hindi here’. (Shapiro 2003: 276)

(3.5) hai to vah hindustānī lekin lagtā hai jāpānī
is to he Indian but seems is Japanese
‘[It’s true] he’s Indian, but he seems Japanese’.5 (Shapiro 2003: 276)

Despite this wide-ranging flexibility, there appear to be at least some constraints on word order. Hindi is a verb-final language, and while many constituents can be moved around freely, the position of the verb is relatively fixed (Kachru 1990: 67; Kachru 2006: 159). Examples deviating from this, such as (3.5), are highly marked. In chapter 2, I pointed out that subject and topic frequently coincide, and Hindi is no different in this regard (cf. Kachru 1990: 71). However, in the examples above, the direct object and even the copula are moved to sentence-initial position in order to mark them as topics of their respective sentences. Indeed, in her grammar of Hindi, Kachru claims that adverbials (2006: 246), conjunctive participle phrases (ibid.), and complements (ibid.: 247) may all be fronted. Example (3.6) provides the basis for a discussion of double subjects in Hindi.

(3.6) kitab to mɛ̃ne ʃipra ko de dī
book.F.SG PTCL I.AG Shipra.F DAT give give.PERF.F.SG
‘The book I gave (it) to Shipra’. (Kachru 2006: 247; emphasis removed)

The topic, that is, the book, stands right before the grammatical subject in the Hindi sentence, meaning that the first of Li and Thompson’s requirements for double subjects could be considered met. This is also true of the second criterion because the topic does not exert any control over the verb. While there is no clear sign of movement apart from the particle to and the position of kitab, it needs to be acknowledged that this absence results from the optional case marking of inanimate objects in Hindi (cf. Vasishth 2003: 11–12). Li and Thompson’s final criterion for double subjects, “Tp languages have sentences of this type” (1976: 468), is therefore difficult to judge. Although evidence points to double subjects and a basicness of topic-comment sentences in Hindi, SOV is frequently highlighted as the language’s basic word order. For this reason, I decided to give a restricted negative rating. Considering all of the factors discussed in this section, Hindi appears to be closer to the topic-prominent pole of the continuum than to the subject-prominent pole; the other features given in the survey below (see Table 3.6) corroborate this assumption.

Bangla

The second Indo-Aryan language of interest is Bangla, also referred to as Bengali. Word order in Bangla generally follows SOV and is in line with other Indo-Aryan languages in this regard (cf.
Dasgupta 2003: 375; Conners and Chacón 2015: 249). Although its impoverished case system has resulted in a relatively fixed word order in modern Bangla, different means are available for indicating the topic. First, there are several particles that can turn a constituent into a topic (Dasgupta 2003: 384).6 Second, the subject or object can be placed in sentence-initial or sentence-final position “to highlight different aspects of discourse-relevant information – such as new or old information – or to background or foreground information” (Conners and Chacón 2015: 250). The preverbal position, by contrast, is reserved for the focus and may be filled by any argument or adjunct (ibid.: 251). While Bangla does have a construction resembling the passive, Dasgupta argues that it does not have a ‘true’ passive (2003: 376). The rating for double subjects is largely negative because an object affix suggests movement (see example 3.7) and topic-comment sentences are rather marked.


(3.7) birenke bohudin dhore brotin-o to cene [[…dhoriyā…]]
Biren-Obj long-time for Brotin-emph of-course knows
‘Biren, of course, now knows Brotin too’. (Dasgupta 2003: 384)

Overall, Bangla shows traits of both subject- and topic-prominence and is therefore similar to the other Indo-Aryan languages discussed in this chapter.

Marathi

The last language of interest in this sub-chapter is Marathi, an Indo-Aryan language spoken primarily in the state of Maharashtra (Pandharipande 2003: 699). Marathi shares several characteristics with other Indo-Aryan languages, most prominently a tendency towards SOV with a certain degree of flexibility in its word order (cf. Dhongde and Wali 2009: 195). Marathi’s most important traits for the present discussion are the presence of two particles marking the topic (namely tar and mhaṇdʒe) and the sentence-initial position of topics (Pandharipande 2003: 716); see example 3.8.

(3.8) tudzha dzaṇa tar/mhaṇdʒe āwaśyak hota
your going emph necessary was
‘It was indeed necessary for you to go’. (lit: ‘Your going indeed was necessary’.) (Pandharipande 2003: 716)

While topic marking can be achieved by sentence-initial placement and by the two abovementioned particles, emphasis may generally also be achieved by prosodic means, by moving the respective constituent to post-verbal position, by repetition, or by adding other particles (ibid.). The two particles used for topic marking can also be used to emphasize parts of the sentence. As in Hindi, “the subject is treated as topic” (Pandharipande 1997: 252) in unmarked SOV word order. According to Pandharipande, any constituent may be topicalized by means of various devices, but topicalization is described as merely optional in Marathi (ibid.: 254). Double subjects are therefore not a particularly frequent phenomenon in Marathi. Topics are marked by particles, position, or even prosody in Marathi, which suggests a positive rating.
On the other hand, there is an accusative marker, which means that topicalized direct objects can be identified as having been moved to initial position. The rating for the basicness of topic-comment sentences in Marathi is therefore partially positive, while surface coding received a completely positive rating. The individual ratings for the major Indo-Aryan languages spoken in India are summarized in Table 3.6. This comparison reveals a high degree of topic-prominence in Hindi, Bangla, and Marathi. For Hindi, six features indicative of topic-prominence could be attested, while Bangla and Marathi received completely or partially positive ratings for five features each. For all three languages, the decisive criterion for the surface coding of topics by position and/or morphological means could be attested unambiguously.

Table 3.6 Topic-prominence features in Hindi, Bangla, and Marathi

Criterion | Hindi | Bangla | Marathi
(a) Surface coding | ✓ (Kachru 1990: 71; Kachru 2006: 245–251) | ✓ (Lehmann 1989: 370; Dasgupta 2003: 384; Conners and Chacón 2015: 250) | ✓ (Pandharipande 1997: 252; Nayudu 2008: 30–32)
(b) Lack of passives | ✕ (Kachru 1990: 69–70; Masica 1991: 356–358; Kachru 2006: 93–94, 176–178) | (✓) (Dasgupta 2003: 376; Thompson 2012: 224–225) | ✕ (Junghare 1988: 313–314; Dhongde and Wali 2009: 185–190, 201)
(c) No “dummy” subjects | ✓ (Junghare 1988: 312–313; Kachru 2006: 167) | ✓ (Thompson 2012: 221; Conners and Chacón 2015: 284) | ✓ (Junghare 1988: 312–313; Pandharipande 1997: 133)
(d) “Double subject” | (✕) (Masica 1991: 363; Kachru 2006: 245–251) | (✕) (Dasgupta 2003: 384) | (✕) (Pandharipande 1997: 252–253)
(e) V-final languages | ✓ (Kachru 1990: 67; Shapiro 2003: 271; Kachru 2006: 159) | ✓ (Dasgupta 2003: 375; Thompson 2012: 185; Conners and Chacón 2015: 249) | ✓ (Pandharipande 1997: 138; Nayudu 2008: 49; Dhongde and Wali 2009: 195)
(f) No/few constraints | ✓ (Kachru 1990: 67; Schmidt 2003: 340; Kachru 2006: 160, 246–247) | (✓) (Conners and Chacón 2015: 249) | ✓ (Junghare 1988: 314; Pandharipande 1997: 254)
(g) Basicness of TC-sentences | (✓) (Kachru 2006: 159–160) | (✕) (Junghare 1988: 326) | (✓) (Junghare 1988: 315; Pandharipande 1997: 254)

As far as the basicness of topic-comment sentences is concerned, there seems to be some disagreement in the literature. Although Bangla is topic-prominent (at least to some degree), Junghare believes that it is less topic-prominent than Hindi and Marathi (1988: 371). This perspective is based partly on findings from an earlier paper in which Junghare investigated definiteness and drew the following conclusion:

The more topicalization and word order variation a language allows, the less restricted its marking of definiteness. If a language can use already available devices (such as topicalization and word order variation) for marking definiteness, there is no need for that language to invent new ones (such as markers or articles). On the marking scale of definiteness, Marathi stands at the low end, Bengali at the upper end (marking the definite status most) and Hindi near the middle. (1983: 126)

In their much more recent Descriptive Grammar of Bangla, Conners and Chacón describe Bangla as a language that allows scrambling, that is, the (relatively) free movement of constituents to different positions for discourse-pragmatic purposes (2015: 249). Since no conclusive statements can be made here, it shall suffice to assume that Bangla might be slightly less topic-prominent than Hindi but is still far from being a purely subject-prominent language. All three languages clearly have both subject-based and topic-based constructions, which, contrary to what Li and Thompson suggest, places them firmly in the “both subject-prominent and topic-prominent” group of languages.

3.3.3 Dravidian languages

According to Steever, the Dravidian language family “is, in terms of speakers, the fourth or fifth largest in the world [and] comprises at least twenty-three languages spoken primarily in South Asia by as many as 220 million people” (1998b: 1).
While Dravidian languages are most prominently spoken in the central and southern parts of India, speakers can also be found in Bangladesh, Nepal, Pakistan, and Sri Lanka (ibid.).7 The Dravidian languages represent one of the major groups of contact languages in India, second only to the Indo-Aryan languages. Furthermore, Tamil is one of the four official languages of Singapore, which makes it the only language of the Indian community with official status in the country. In India, the Eighth Schedule from 1951 lists Kannada, Malayalam, Tamil, and Telugu as 4 of the 22 official languages. Unfortunately, the most recent official speaker numbers for these four languages stem from the 2001 census. Although significant changes have occurred since 2001, the data still provide an impression of the language distribution across the country. In Table 3.7, the three major Indo-Aryan languages and their speaker numbers have been added for comparison; the Dravidian languages are highlighted in bold. Again, percentages have been added.


Table 3.7 Major Dravidian languages spoken in India compared to Indo-Aryan languages

Language | Speakers | % of total population
Hindi | 422,048,642 | 41
Bengali | 83,369,769 | 8.1
Telugu | 74,002,856 | 7.2
Marathi | 71,936,894 | 7
Tamil | 60,793,814 | 5.9
Kannada | 37,924,011 | 3.7
Malayalam | 33,066,392 | 3.2
∑ population of India | 1,028,737,436 | 100

Source: Census of India 2001.
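The percentage column of Table 3.7 is each language's share of the total population, that is, speakers divided by 1,028,737,436 and multiplied by 100. The short Python sketch below recomputes that column from the census figures in the table. It is an illustration rather than the author's procedure. The choice to round to one decimal place is an assumption inferred from the printed values, and the names `SPEAKERS` and `population_share` are hypothetical.

```python
"""Recompute the '% of total population' column of Table 3.7 (Census of India 2001)."""

# Speaker figures exactly as listed in Table 3.7.
SPEAKERS = {
    "Hindi": 422_048_642,
    "Bengali": 83_369_769,
    "Telugu": 74_002_856,
    "Marathi": 71_936_894,
    "Tamil": 60_793_814,
    "Kannada": 37_924_011,
    "Malayalam": 33_066_392,
}
TOTAL_POPULATION = 1_028_737_436  # bottom line of Table 3.7


def population_share(speakers: int, total: int = TOTAL_POPULATION) -> float:
    """Share of the total population in percent (assumed rounding: one decimal)."""
    return round(speakers / total * 100, 1)


# Hypothetical helper mapping: language -> recomputed percentage.
shares = {language: population_share(n) for language, n in SPEAKERS.items()}

if __name__ == "__main__":
    for language, share in shares.items():
        print(f"{language:<10} {share:>5.1f} %")
```

Running it gives 41.0, 8.1, 7.2, 7.0, 5.9, 3.7, and 3.2, which agrees with the table. For Hindi and Marathi, the table simply omits the trailing ".0".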

For the present analysis, I focus on Telugu, Tamil, and Kannada.

Telugu

Telugu, the Dravidian language with the highest number of L1 speakers in India, follows SOV order and is similar to many other Dravidian languages in this regard (Krishnamurti 1998: 227). According to Krishnamurti, a “simple declarative sentence has two constituents: NP (topic); Pred (comment). The predicate may be a noun phrase (NP) or a verb phrase (VP)” (ibid.: 228). In contrast to descriptions of Tamil and Kannada, however, Krishnamurti’s chapter specifically points out the option of turning a constituent into the topic by placing it in sentence-initial position (ibid.: 229). The resulting constructions are considered marked; in “unmarked word order, the subject NP occurs as the first constituent of a sentence” (ibid.). In his traditional grammar, Arden simply describes the word order of Telugu as SOV (1905: 110) and does not mention topicalization.

Tamil

Tamil, like Telugu, belongs to the group of SOV languages (cf. Annamalai and Steever 1998: 117). As in other Dravidian languages, relatively free constituent movement is possible, with the exception of the verb: “Though explicit noun morphology allows some freedom in word order, verbs remain at the right end of their clause: since they mark the clause boundary, they are displaced from that position only in marked circumstances” (ibid.). Steever points out the possibility of clefting in Tamil and confirms the fixed position of the verb (1998b: 31) but provides little information on other discourse-pragmatic constructions.

Kannada

Sentences in Kannada, the third Dravidian language under investigation, are also typically SOV (cf. Schiffman 1983: 95). However, deviations from SOV for stylistic or pragmatic reasons are possible. Judging from Schiffman’s description of the Kannada subject as “an important structural element [… that] plays a crucial role in many grammatical processes in the language” (ibid.), there appears to be no basis for the assumption that topic-comment sentences are basic in Kannada. Still, several criteria suggesting topic-prominence in Li and Thompson’s framework could be attested for Kannada.

Table 3.8 presents the results of the typological analysis of Telugu, Tamil, and Kannada. The typological investigation of the three Dravidian languages revealed that all of them show some features typical of topic-prominent languages and some typical of subject-prominent languages. Five features associated with topic-prominence could be attributed to each of Telugu, Tamil, and Kannada. Because of the word-order flexibility of the Dravidian languages (apart from the verb, which almost exclusively occurs in final position), topicalization is readily available to speakers. However, as noted for Telugu by Krishnamurti (1998: 229), placing a constituent other than the subject in sentence-initial position is a strategy that needs to be employed consciously; the resulting construction is usually considered marked or non-canonical. Of course, surface coding of topics and the basicness of TC-sentences go hand in hand: if the topic cannot be identified by position, morphological marking, a specific particle, or prosodic means, interlocutors will not be able to recognize it. In the Dravidian languages, the sentence-initial position appears to be reserved for topics, but, as discussed earlier, the grammatical subject is often also the topic. Although the option of topicalization is mentioned in some grammatical descriptions of Dravidian languages (see, for instance, Krishnamurti 1998), the languages are never discussed as topic-prominent languages but as languages in which the subject-predicate principle is dominant (cf. Schiffman 1983). Based on the findings in Table 3.8, I consider Telugu, Tamil, and Kannada to be mixed languages that show traits of both subject- and topic-prominence.
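The rating procedure in section 3.3.1 amounts to an unweighted tally. Each criterion receives one of five symbols, and the symbols are then counted. The Python sketch below makes this count explicit for the Dravidian ratings reported in Table 3.8. It is a hypothetical illustration, not the author's tooling: the names `DRAVIDIAN`, `count_attested`, and `rating_profile` are invented. It also assumes that only ✓ and (✓) count as an attested topic-prominence feature. That assumption was chosen because it reproduces the "five features" reported for each language.

```python
from __future__ import annotations

from collections import Counter

# The five rating options of section 3.3.1, written as in Tables 3.6 and 3.8.
FULL, LIMITED, VERY_LIMITED, ABSENT, UNKNOWN = "✓", "(✓)", "(✕)", "✕", "?"

# Assumption of this sketch: only ✓ and (✓) count as an attested feature,
# and every criterion carries the same (unit) weight, i.e. no weighting.
ATTESTED = {FULL, LIMITED}

CRITERIA = (
    "Surface coding",
    "Lack of passives",
    'No "dummy" subjects',
    '"Double subject"',
    "V-final languages",
    "No/few constraints",
    "Basicness of TC-sentences",
)

# Table 3.8 assigns the same rating profile to all three languages.
_PROFILE = (LIMITED, ABSENT, FULL, LIMITED, FULL, FULL, VERY_LIMITED)
DRAVIDIAN = {
    language: dict(zip(CRITERIA, _PROFILE))
    for language in ("Tamil", "Telugu", "Kannada")
}


def count_attested(ratings: dict[str, str]) -> int:
    """Unweighted number of criteria rated ✓ or (✓)."""
    return sum(1 for symbol in ratings.values() if symbol in ATTESTED)


def rating_profile(ratings: dict[str, str]) -> Counter:
    """Frequency of each rating symbol for one language."""
    return Counter(ratings.values())


if __name__ == "__main__":
    for language, ratings in DRAVIDIAN.items():
        print(language, count_attested(ratings), dict(rating_profile(ratings)))
```

Because no criterion is weighted, a single (✓)/(✕) decision can shift a language's position on the subject-/topic-prominence continuum. A weighted variant would multiply each symbol by a per-criterion factor, which is exactly the step the analysis deliberately avoids.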
Table 3.8 Topic-prominence features in Tamil, Telugu, and Kannada

Criterion | Tamil | Telugu | Kannada
(a) Surface coding | (✓) (Lehmann 1989: 370) | (✓) (Krishnamurti 1998: 229) | (✓) (Schiffman 1983: 127)
(b) Lack of passives | ✕ (Annamalai and Steever 1998: 123, 127; Kausen 2013: 651) | ✕ (Arden 1905: 127; Subbarao 2004: 173) | ✕ (Hodson 1864: 107; Jensen 1969: 87; Sridhar 1990: 214–217)
(c) No “dummy” subjects | ✓ (Lehmann 1989: 175–176; Annamalai and Steever 1998: 117; Subrahmanyam 2009: 299) | ✓ (Subrahmanyam 2009: 299) | ✓ (Rau 2007: 6; Subrahmanyam 2009: 299)
(d) “Double subject” | (✓) (Shibatani 1999: 46) | (✓) (Subbarao 2012: 190) | (✓) (Jensen 1969: 147–149)
(e) V-final languages | ✓ (Zvelebil 1990: 43; Annamalai and Steever 1998: 117; Kausen 2013: 652) | ✓ (Zvelebil 1990: 43; Krishnamurti 1998: 227) | ✓ (Schiffman 1983: 95; Zvelebil 1990: 43; Steever 1998b: 146)
(f) No/few constraints | ✓ (Lehmann 1989: 176–179) | ✓ (Krishnamurti 1998: 227) | ✓ (Schiffman 1983: 95; Steever 1998b: 146)
(g) Basicness of TC-sentences | (✕) (Lehmann 1989: 176–180; Annamalai and Steever 1998: 118; Kausen 2013: 652) | (✕) (Krishnamurti 1998: 229) | (✕) (Steever 1998b: 146)

3.3.4 Sinitic languages

Sinitic contact languages play a major role in Singapore and Hong Kong. Mandarin is widely spoken in Singapore, whereas Cantonese is the main L1 in Hong Kong. While English is becoming more and more important in Singapore, Mandarin was still the most widely spoken first language in Singapore in the 2010 census. Table 3.9 compares speaker percentages across the last decades.

Table 3.9 Speaker numbers for the main languages spoken in Singapore

Language | 1980 | 1990 | 2000 | 2010
English | 11.6% | 18.8% | 23.0% | 32.3%
Mandarin | 10.2% | 23.7% | 35.0% | 35.6%
Chinese dialects | 59.5% | 39.6% | 23.8% | 14.3%
Malay | 13.9% | 14.3% | 14.1% | 12.2%
Tamil | 3.1% | 2.9% | 3.2% | 3.3%
Others | 1.7% | 0.8% | 0.9% | 1.1%

Source: Leimgruber 2013: 7; reprinted by permission from Cambridge University Press.

In Hong Kong, the situation is much less diverse, as Cantonese is spoken by the vast majority of the city’s population. Table 3.10 gives an overview of the ‘usual language’ from the 2011 population census and compares absolute and relative speaker numbers for people aged 5 and over.

Table 3.10 Hong Kong population aged 5 and over by usual language

Usual language | 2001 No. | 2001 % | 2006 No. | 2006 % | 2011 No. | 2011 %
Cantonese | 5,726,972 | 89.2 | 6,030,960 | 90.8 | 6,095,213 | 89.5
Putonghua | 55,410 | 0.9 | 60,859 | 0.9 | 94,399 | 1.4
Other Chinese dialects | 352,562 | 5.5 | 289,027 | 4.4 | 273,745 | 4.0
English | 203,598 | 3.2 | 187,281 | 2.8 | 238,288 | 3.5
Others | 79,197 | 1.2 | 72,217 | 1.1 | 106,788 | 1.6
Total | 6,417,739 | 100.0 | 6,640,344 | 100.0 | 6,808,433 | 100.0

Source: 2011 Population Census (2012).

Mandarin and Cantonese are clearly the most important Sinitic contact languages in Singapore and Hong Kong. In addition, they are also the best-documented Sinitic languages. Accordingly, they were selected for analysis.

Mandarin

Lin (2001: 123) describes Mandarin as an SVO language, the same type as English.8 Having stated this, she quickly acknowledges the fact that “basic word order can change in certain constructions […], and it can interact with pragmatics and discourse” (ibid.). She refers to Li and Thompson (1981), who classify Mandarin as a topic-prominent language, and subsequently explains the subject-topic relation in Mandarin: subject and topic may overlap, but they do not have to, and sentences in Mandarin may well have both separately (ibid.: 124). Furthermore, she points out that the topic needs to occur sentence-initially in Mandarin (ibid.). As hinted at in Lin (2001), Li and Thompson also stress the importance of the topic in Mandarin: “[T]he concept of subject seems to be less significant [compared to English], while the concept of topic appears to be quite crucial in explaining the structure of ordinary sentences in the language” (1981: 16). Unlike subjects, which are not marked by position, agreement, or case, Mandarin topics share two formal properties: (1) they occur in sentence-/clause-initial position and (2) they “can be separated from the rest of the sentence (called the comment) by a pause or by one of the pause particles – a (or its phonetic variant ya), me, ne, or ba – although the use of the pause or the pause particle is optional” (ibid.: 86; emphasis in the original).


Further evidence for the basicness of topic-comment sentences in Mandarin can be found in Sun (2006: 186) and Cheng and Sybesma (2015). The latter provide two examples, given here as (3.9) and (3.10), to show that various parts of the sentence may function as topics in Mandarin.

(3.9) cóng zhè-jiā yínháng, tì Zhāng Sān, wǒ zhīdao wǒmen kéyǐ jièdào hěnduō qián
from this-CLF bank for Zhang San 1SG know 1PL can borrow much money
‘From this bank, for Zhang San, I know we can borrow a lot of money’. (Cheng and Sybesma 2015: 1548)

(3.10) Zhāng Sānᵢ, cóng zhè-jiā yínháng, wǒ zhīdao wǒmen kéyǐ tì tāᵢ jièdào hěnduō qián
Zhang San from this-CLF bank 1SG know 1PL can for 3SG borrow much money
Lit.: ‘Zhang Sanᵢ, from that bank, I know that we can borrow a lot of money for himᵢ’. (Cheng and Sybesma 2015: 1548)

Additionally, they claim that Mandarin’s status as the prototypical topic-prominent language rests on the fact that it has “so-called ‘aboutness’ topics […] which are not related to any element or constituent in the sentence, but which only have a relation with the sentence as a whole” (2015: 1547). As the previous examples show, constructions of this kind resemble double subjects.

Cantonese

The second major contact language belonging to the Sinitic family is Cantonese, which is widely spoken across Hong Kong. While its word order is generally SVO, “[i]t would be more accurate […] to say that while Cantonese can be treated in this way – this order normally works – departures from it play an important role in the language” (Yip and Matthews 2000: 115). In their grammar of Cantonese, Yip and Matthews make several such observations, which characterize Cantonese as a topic-prominent language. After noting that a sentence may begin with the object if the sentence is ‘about’ this object, they comment on the syntactic discrepancy between Cantonese sentences and their English translations: frequently, “the most natural English translation does not put the object first; this illustrates how the Cantonese syntax ‘prefers’ the topicalized version” (ibid.: 116). Sentences such as (3.11) show that the topic and the subject do not have to be grammatically related.

(3.11) Gwo hói àh, deihtit jeui faai
cross sea SFP underground most fast
‘For crossing the harbour, the underground is fastest’. (Yip and Matthews 2011: 77)

Topic-prominence in Asian languages

Double subjects in Cantonese – like those in Mandarin – are thus only loosely connected to the following predicate, which makes it appear as though there were two subjects in the sentence; see also example (3.12).

(3.12) Hēunggóng làuhga gwai dou séi
Hong Kong flat-price expensive until die
‘Flat prices in Hong Kong are ridiculous’. (Yip and Matthews 2011: 86)

Despite the loose connection, Cantonese, like Mandarin, uses pauses or topic particles to mark the topic (ibid.: 78). Interestingly, Yip and Matthews’s book is one of the very few sources that explicitly mention the problem of controlling co-reference (see 2011: 78). They claim that subject and object pronouns may be omitted if these refer back to a previously mentioned or implicit topic (ibid.). Regarding the basic sentence structure of Cantonese, Yip and Matthews repeatedly point out that topic-comment sentences are the default (2011: 84). In addition, they present various kinds of hanging topics in Cantonese:

(1) The topic sets a location in time or space.
(2) The topic sets up a whole, of which an element later in the sentence represents a part.
(3) The topic states a general category of which the subject or object represents a particular type.
(Adapted from Yip and Matthews 2000: 117–118)

Information on topicalization in Cantonese can also be found in literature that is primarily concerned with HKE. In their monograph on HKE, for instance, Setter et al. mention the influence of Cantonese topic-comment structures on the variety (2010: 56). For the present section, their observation reinforces the assumption that Cantonese is a topic-prominent language.

The ratings for Mandarin and Cantonese are summed up in Table 3.11. Unsurprisingly, both Mandarin and Cantonese show many traits of topic-prominent languages. With the exception of passives and verb-final word order, all criteria for topic-prominence are met at least to a certain extent: topics are coded on the surface in both languages, double-subject constructions are common, and topic-comment sentences abound.

3.3.5 Austronesian languages
The last group of languages to be analysed in this chapter comprises the Austronesian languages; in particular, detailed analyses are provided of Singapore’s national language, Malay, and of Tagalog, the majority language spoken in the Philippines.
Malay is another important language in Singapore, with 12.2 per cent indicating it as their mother tongue in the 2010 census (see Table 3.9). Tagalog, in turn, is the most widely spoken first language in the Philippines, with 24.4 per cent (or, in speaker numbers, 22.5 million) of the population naming it as their mother tongue in 2010 (Philippine Statistics Authority 2017).

Table 3.11 Topic-prominence features in Mandarin and Cantonese

Criterion | Mandarin | Cantonese
(a) Surface coding | ✓ (Li and Thompson 1981: 16; Ross and Sheng Ma 2006: 352–354; Shyu 2014: 100) | ✓ (Yip and Matthews 2000: 115; Yip and Matthews 2011: 78)
(b) Lack of passives | ✕ (Lin 2001: 150; Sun 2006: 211–212; Cheng and Sybesma 2015: 1537) | ✕ (Killingley 1993: 39; Yip and Matthews 2000: 115; Yip and Matthews 2011: 168–172)
(c) No “dummy” subjects | ✓ (Hendricks 2003: 297) | ✓ (Killingley 1993: 37; Setter et al. 2010: 58; Yip and Matthews 2011: 83)
(d) “Double subject” | ✓ (Cheng and Sybesma 2015: 1547) | ✓ (Setter et al. 2010: 56–58; Yip and Matthews 2011: 86–88)
(e) V-final languages | (✕) (Lin 2001: 123; Cheng and Sybesma 2015: 1541; Paul 2015: 7–52) | ✕ (Killingley 1993: 41; Yip and Matthews 2000: 115; Yip and Matthews 2011: 78)
(f) No/few constraints | (✓) (Paul 2015: 31–32) | ✓ (Yip and Matthews 2000: 115; Yip and Matthews 2011: 77–78)
(g) Basicness of TC-sentences | (✓) (Li and Thompson 1981: 16; Kausen 2013: 725) | ✓ (Yip and Matthews 2000: 115; Kausen 2013: 725)

Malay
Varieties of Malay are spoken most prominently in Indonesia (called ‘Bahasa Indonesia’, with Bahasa meaning ‘language’, cf. Othman 2012) and in Malaysia (‘Bahasa Malaysia’), but also in Singapore. According to Kausen, the differences between the major varieties of Malay are so minor that we can call them varieties of one language rather than distinct languages (2014: 522). SinE was mostly influenced by Baba Malay9 and, to a somewhat lesser extent, by the former lingua franca Bazaar Malay. Both of these forms of Malay, however, are losing influence in the country because of the continuous rise of English (cf. Ho and Platt 1993; Bao and Aye 2010). Regarding word order, Bao and Aye (2010: 156) describe Malay as SVO, while Lee (2014: 255) describes it as predicate-final, with the predicate usually being a verb. Since most examples I could find are SVO, a restricted negative rating was given for V-final word order. Malay has the kena as well as the kasi passive (Lee 2014: 223–225), with the kena passive being a prominent transfer feature found in SinE (Bao and Wee 1999; Fong 2004; Bao 2010). In both Baba Malay and Bazaar Malay, topicalization plays an important role. While Lim (1988) calls topic-comment structures the basic sentence form in Baba Malay, Lee suggests that “it may be preferable to assert that topicalization often happens in BM [= Baba Malay]” (2014: 274). As constituents that may be topicalized, Lee (2014) lists objects (3.13), adjective phrases (3.14), and adverbial clauses (3.15).

(3.13) [Ikan kuning], tarok asam
fish yellow put tamarind
‘Yellow fish, put tamarind (on it)’. (Lee 2014: 275)

(3.14) [Betol], ini mas
real this gold
‘Real, this gold is’. (Lee 2014: 275)

(3.15) [Betol lawa], dia pakay
real stylish 3SG wear
‘Really stylish, he dresses’. (Lee 2014: 276)

Bazaar Malay has been heavily influenced by Chinese and, as a result, has acquired topic-comment structures of different types (Bao and Aye 2010: 160). Bao and Aye give numerous examples; one of them is provided in (3.16).

(3.16) [Kerja punya pasal]TOP [dia tanya e]S10
work MOD matter 3SG ask
‘(A) work-related matter, he asked (about)’. (Bao and Aye 2010: 160)

The question of whether there are double subjects in Malay is difficult to answer. Lee claims that “it is common for subjects to not be explicitly expressed, leaving verb phrases to already front the clause” (2014: 274). For such cases, a rating would be impossible. Judging from the examples given by Lim (1988: 45) and Bao and Aye (2010), double subjects seem to occur in some constructions, which is why a restricted positive rating was given. The same applies to the basicness of topic-comment sentences. According to Bao and Aye (2010), sentences featuring topicalization occur frequently in Bazaar Malay. Regarding Baba Malay, Lee “hesitates to state that BM is more Topic-Comment than it is Subject-Predicate, simply because there are many instances of Subject-Predicate examples” (2014: 274). However, this seems to be an incomplete evaluation. It may be a bold assumption, but there is probably no language in which subject-predicate and topic-comment structures never overlap. The typological profiling of the languages in this chapter confirms this, as does the theoretical reasoning in chapter 2. Thus, the basicness of TC sentences also receives a restricted positive rating. For the sake of completeness, it should be mentioned again that the varieties of Malay spoken in Singapore have themselves been influenced significantly by Chinese (cf. Lim 1988; Bao and Aye 2010).11 Considering the ongoing multilingualism in the country, this is hardly surprising. However, the influence of Malay on SinE – while far from negligible – is, on the one hand, not as far-reaching as that of Chinese; on the other hand, Malay also serves as a ‘carrier’ of originally Chinese features (see Bao and Aye 2010).
Tagalog
The other Austronesian language to be analysed in this chapter is Tagalog, spoken by the majority of Filipinos.12 In the analysed ICE corpora, 78 per cent of speakers indicate Tagalog as their mother tongue or as one of their native languages. The syntax of Tagalog has been debated rather extensively (see the dispute between Schachter 1976 and Drossard 1984; Himmelmann 2005a), and to this day there appears to be no agreement regarding the amount of control the agent and the topic exert in Tagalog sentences. One aspect of Tagalog word order, however, seems undisputed: Sentences in Tagalog are predicate-initial, with the predicate being either a verb, a noun, an adjective, or a prepositional phrase (Schachter 2015: 1658). The construction at the heart of the debate over whether Tagalog is a TP language consists of the particle ang and whichever constituent follows it. Ramos, for instance, describes Tagalog sentences as being composed of a topic and a comment and considers ang a marker indicating which element of the sentence is the topic (1971: 79–81). Such descriptions are frequently found in grammatical descriptions of Tagalog written in the last century, with Schachter and Otanes (1972) being the most extensive of these. Newer publications, however, are critical of this perspective. In one of the most recent publications touching upon the problem, Schachter (2015) refers to ang as a ‘Trigger’ rather than some sort of topic marker; see (3.17).

(3.17) Mag-aabot ang babae ng laruan sa bata.
AT.will:hand T woman TH toy D child
‘The woman will hand a toy to a/the child’. (Schachter 2015: 1659)
AT = Actor-Trigger affix; T = Trigger marker; TH = Theme marker; D = Direction marker

The Trigger refers to the subsequent phrase and, as a whole, such constructions are frequently translated as definite (as in the example above) (Schachter 2015: 1659). According to Schachter, “[i]t is the regular association of the Trigger with definiteness that has led some previous analysts (e.g. Schachter and Otanes 1972) to identify the Trigger as a topic” (ibid.). However, not all Triggers fulfil ‘typical’ functions of topics. I established in chapter 2 that a major function (and inherent characteristic) of topics is their ‘aboutness’, which means that topics indicate what the rest of the sentence will be about. Tagalog Triggers and their referents do not always fulfil this function, which led Schachter to the abovementioned interpretation in his recent work (cf. ibid.: 1659). Himmelmann (2005a, 2005b) provides yet another perspective and calls ang a “specific article” which, together with the element it occurs with, forms the subject (2005b: 355–356). Drawing conclusions from these deliberations is difficult because there is clearly no consensus. The more recent sources, however, lean more towards ang being a construction relating to the subject. For this reason, coupled with Schachter’s claim that some constructions do involve topic marking, the basicness of topic-comment sentences receives a restricted negative rating. Because the ang-complex is difficult to comprehend (apparently even for scholars working with the language), I decided to leave a question mark for the double-subject criterion. The results for Malay and Tagalog are summed up in Table 3.12. In conclusion, neither Malay nor Tagalog represents a clear-cut case. For some criteria of topic-prominence, there are strongly contradictory positions in the literature. However, Malay clearly exhibits features suggesting both subject-prominence and topic-prominence, whereas the situation in Tagalog is rather opaque.
Counting which features are attested in Tagalog (and which are not) is just as possible for this language as for any other, but the focal point in the literature, the ang-construction, is subject to debate. For this reason, I agree with Li and Thompson’s position that Tagalog cannot be categorized properly in their framework (although my reasons for this claim are different).

Table 3.12 Topic-prominence features in Malay and Tagalog

Criterion | Malay | Tagalog
(a) Surface coding | (✓) (Lee 2014: 274) | (✓) (Katagiri 2006)
(b) Lack of passives | ✕ (Lim 1988: 46; Lee 2014: 223–225) | ✕ (Möller 2013: 109)
(c) No “dummy” subjects | ✓ (Lim 1988: 43) | ✓ (Himmelmann 2005b: 365; Möller 2013: 116)
(d) “Double subject” | (✓) (Lim 1988: 45; Bao and Aye 2010) | ?
(e) V-final languages | (✕) (Bao and Aye 2010: 156; Lee 2014: 255) | ✕ (Himmelmann 2005b: 355; Katagiri 2006: 5)
(f) No/few constraints | (✓) (Lee 2014: 274) | (✕) (Katagiri 2006: 23)
(g) Basicness of TC-sentences | (✓) (Lee 2014: 274) | (✕) (Schachter and Otanes 1972: 60; Himmelmann 2005b: 356; Katagiri 2006: 23)

3.4 Summary
Based on the criteria described by Li and Thompson (1976), the results of the typological analysis are visualized in Table 3.13. For each language, a rating sums up the findings: The label SP/TP indicates both subject-prominence and topic-prominence, the label TP indicates topic-prominence, and the label NN indicates that the language is neither subject- nor topic-prominent. For some of the languages, Li and Thompson’s categorization from 1976 could be confirmed. While Mandarin and Cantonese do not meet every single requirement for being considered purely topic-prominent languages, they show all of the important traits of topic-prominence. Since a classification is simultaneously based on, and interested in, showing tendencies, I did not make decisions solely on the basis of how many features could be attested, but rather on the overall picture. In the case of the Sinitic languages, the overall picture very clearly points towards topic-prominence. In addition to the Sinitic languages, I also found Li and Thompson’s classification of Tagalog (as a language that does not fit into the continuum) to be the most appropriate. There is still no consensus regarding some of the most basic structures in Tagalog; therefore, a definitive description of what is occurring in the language is beyond my grasp. While I could not find a recent and comprehensive grammatical description of Malay (Bahasa Melayu or Bahasa Malaysia), the various texts providing information on the language do suggest a status similar to that of the Indo-Aryan and the Dravidian languages.


Table 3.13 Overview of the typological profiles with regard to topic-prominence

Criterion | Hindi | Bangla | Marathi | Tamil | Telugu | Kannada | Mandarin | Cantonese | Malay | Tagalog
(a) Surface coding | ✓ | ✓ | ✓ | (✓) | (✓) | (✓) | ✓ | ✓ | (✓) | (✓)
(b) Lack of passives | ✕ | (✓) | ✕ | ✕ | ✕ | ✕ | ✕ | ✕ | ✕ | ✕
(c) No dummy subj. | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓
(d) “Double subject” | (✕) | (✕) | (✕) | (✓) | (✓) | (✓) | ✓ | ✓ | (✓) | ?
(e) V-final languages | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | (✕) | ✕ | (✕) | ✕
(f) No/few constraints | ✓ | (✓) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | (✓) | (✕)
(g) Basicness of TC | (✓) | (✕) | (✓) | (✕) | (✕) | (✕) | ✓ | ✓ | (✓) | (✕)
Rating | SP/TP | SP/TP | SP/TP | SP/TP | SP/TP | SP/TP | TP | TP | SP/TP | NN


The Indo-Aryan group is, in turn, where I disagree most strongly with Li and Thompson’s classification. The authors lump all Indo-European languages together and do not differentiate between them, which in this case led to overgeneralization. Instead, I agree with Junghare in that “the application of Li and Thompson’s (1976) categories indicates that Indo-Aryan languages such as Marathi and Hindi are subject-prominent as well as topic-prominent” (1988: 316). In fact, all the Indo-Aryan languages as well as the Dravidian languages I investigated show numerous traits of subject- and topic-prominence, which lends substantial support to the claim that topicalization in IndE (and SinE) is, at least to an extent, a contact-induced feature. Although Masica does not consider topic-comment to be the basic structure of Indo-Aryan languages, he calls “violations of normal order in the form of meaningful displacements of constituents […] an important syntactic feature” (1991: 394; emphasis removed). Overall, two main findings can be drawn from this chapter. First, it has shown, by identifying high degrees of topic-prominence in several contact languages of HKE, IndE, PhilE, and SinE, that language contact and, consequently, transfer need to be taken into consideration in an analysis of topicalization in Asian Englishes. This has frequently been hinted at (cf. Winkle 2015), but thus far has not been investigated in detail. Second, for IndE in particular, it was shown that some of its most prominent contact languages often employ topicalization and are far from being purely subject-prominent. For the remainder of this book, this typological overview serves two purposes: It represents the foundation for the comparison of actual corpus examples with structures found in the contact languages in chapter 6, and it is taken into account as an argument for a contact hypothesis in chapter 7.

Notes
1 In addition, Conners and Chacón note a general lack of research “on scrambling particularly […] for Bangla and other South Asian languages” (2015: 249–250), with scrambling referring to the movement of constituents (such as in topicalization). This finding extends to various Asian contact languages of English.
2 Dryer offers the following description: “The terms subject and object are used here in a rather informal semantic sense, to denote the more agent-like and more patient-like elements respectively. Their use here can be defined in terms of the notions S, A, and P, where the S is the single argument in an intransitive clause, the A is the more agent-like argument in a transitive clause, and the P is the more patient-like argument in a transitive clause. For the purposes of this map [i.e., the typological overview], then, the term subject is used for the A while the term object is used for the P” (Dryer 2005a: 330; emphasis in the original).
3 The example was slightly modified by replacing a line given at the end of the original sentence with the sign for a zero form.
4 Italics were removed in all examples in this chapter. Glosses were left untouched and no other modifications were made unless indicated otherwise.
5 Note that no direct translation for the Hindi particle to is given; according to Kachru, “[o]ne of the functions of this particle is to mark the thematic element in sentences” (2006: 246).
6 These particles are “/to/ ‘of course, as you know’, /nāki/ ‘apparently’, /ki/ ‘Q’ (turns what without it would be a declarative clause into a yes-no question), /je/ ‘TopCompC’ (marks a clause as a topicalized complement clause)” (Dasgupta 2003: 384).
7 Steever points out that there are also speakers of Dravidian languages, of Tamil in particular, in Fiji, Indonesia, Malaysia, Martinique, Mauritius, Myanmar, Singapore, South Africa, and Trinidad (1998b: 1). The role of Tamil in Singapore is discussed separately in the chapter on SinE; the remaining countries are mentioned merely for the sake of completeness.
8 Li and Thompson (1981: 26) observe and discuss the possibility of Mandarin undergoing a shift from SVO to SOV, but ultimately conclude that Mandarin is best described as a language that cannot easily be classified as one type or the other. Kausen (2013: 724) mentions that SOV occurs in certain contexts, while Paul (2015) is critical of the assumed change to SOV. For the typological profile, a restricted negative rating was chosen.
9 Baba Malay is the variety of Malay spoken by the Peranakan Chinese settlers in Singapore (cf. Bao and Aye 2010: 155–156; Lim 2015).
10 The form e is a resumptive pronoun (Bao and Aye 2010: 159).
11 Another common line of reasoning is that Malay played a disproportionate role in the (re-)shaping of SinE (cf. Ansaldo et al. 2007). This argumentation is based on Mufwene’s ‘Founder Principle’ (1996, 2001), which claims that “structural features of creoles have been predetermined to a large extent […] by characteristics of the vernaculars spoken by the populations that founded the colonies in which they developed” (Mufwene 1996: 84). Bao, however, questions the founder effect, as in his opinion no convincing evidence has been presented so far (2015: 30).
12 The language is referred to both as Tagalog and Filipino.


4 Development and variety status of four Asian Englishes

This chapter introduces the four Asian Englishes under consideration: Hong Kong English (HKE), Indian English (IndE), Philippine English (PhilE), and Singapore English (SinE). The term ‘Asian Englishes’ is commonly used to group together the different varieties of English found on the Asian continent. Delineating what we mean when we speak of Asian Englishes or English in Asia is a primary concern of Braj B. Kachru’s book Asian Englishes: Beyond the Canon (2005). One of the most telling statements occurs early in the book, as Kachru claims that “English in one way or another has a presence in the most vital aspects of Asian lives including our cultures, our languages, our interactional patterns, our discourses, our economics, and, of course, our politics” (2005: 10). Yet, the most important role of English has been in “altering our identities as individuals, societies, and the identities of Asian languages” (ibid.). Contact varieties, which Asian Englishes inherently are, tend to reflect these alterations in both their usage contexts and their typology. A well-developed and well-researched example of this is Singlish (Singapore’s basilectal variety of English): Mixing various features from the contact languages involved, it is now an integral part of a distinctly Singaporean identity (see Lim 2009 and Lim 2015). In many parts of Asia, however, conditions have been less favourable than in Singapore for anything other than Standard English (or for English in general), and even in Singapore the government tried for a long time to block the development of a vernacular variety. Only more recently, in an increasingly globalized world, has the importance of learning English come to be acknowledged more widely. Partly because of this (gradually) more liberal attitude towards the language, ESL and EFL varieties with their own distinctive features are developing rapidly throughout Asia.
In addition, intense language contact and the fact that “[t]he functional dynamics of Asian Englishes – as indeed of other Asian languages – are in constant change” (Kachru 2005: 1) make these varieties not only comparable to some extent, but also highly interesting objects of research in their own right. As mentioned above, I focus on HKE, IndE, PhilE, and SinE. This selection is based on the data available in the ICE family of corpora and proves to be an interesting mixture. With the exception of the Philippines – a former American colony (see Schneider 2007) – all the countries where the selected varieties are spoken were British colonies, and all of them gained independence or experienced a shift in governance in the second half of the twentieth century. The individual developments of the varieties, however, differ extensively from each other. In this chapter, I proceed as follows: After a brief description of earlier models of World Englishes (with a focus on the Three Circles introduced by Kachru 1985), I outline Schneider’s (2003, 2007) “Dynamic Model of Postcolonial Englishes” and comment on recent additions and modifications to the model. The following sections introduce the selected varieties in more detail. After a brief look at some demographic aspects, the developmental stage of each variety is discussed. In all cases, the functional distribution between English and its contact languages is taken into account.

4.1 Analysing World Englishes

4.1.1 Earlier models
The first systematic classifications of native and non-native varieties of English go back to the early 1970s. Quirk et al.’s (1972: 3–4) threefold distinction between ENL (English as a Native Language), ESL (English as a Second Language), and EFL (English as a Foreign Language) represents the first influential account in this regard. The terms ENL, ESL, and EFL, first conceptualized by Strang (1970),1 are still useful labels, but they are far too static to explain the complex linguistic situations in many countries. As Schneider remarks, the separation into these categories

fails to acknowledge the presence of non-native-speaking groups, whether indigenous or immigrant, in ENL countries: there is no room reserved in this framework for, say, French Canadians, Native Americans, Australian Aboriginals, or Pakistani communities in Britain. (2007: 12)

Another model that still has its uses but has also been the subject of criticism is Kachru’s (1985) Three Circles model, depicted in Figure 4.1. The model maintains a differentiation amongst three categories but calls them ‘Inner Circle’, ‘Outer Circle’, and ‘Expanding Circle’. The Inner Circle refers to endonormative L1 varieties in the UK, the United States, Australia, and New Zealand, while the Outer Circle encompasses numerous countries in Asia and Africa in which English retained various functions after being introduced as the language of the settlers (cf. Bruthiaux 2003: 159–160). The Expanding Circle includes countries where English is taught as a foreign language and has few other functions. Without the context provided by Kachru, the terms Inner, Outer, and Expanding Circle simply appear to be rebrandings of the terms ENL, ESL, and EFL (see Edwards 2016: 3).
Although Kachru has played a crucial role in shaping World Englishes research, the claim that his model “has outlived its usefulness” (Bruthiaux 2003: 161) appears justified given the development of more complex, dynamic approaches. A fair evaluation of Kachru’s Circles, however, should also mention that Kachru was well aware of many shortcomings of his model (see Buschfeld 2011: 71). Previous approaches had dismissed varieties beyond the Inner Circle as inferior, whereas Kachru did not regard Outer and Expanding Circle varieties as defective. His influence is also reflected in the fact that, although several other models of World Englishes have been proposed, none of them – with the exception of the Dynamic Model – has had quite the same impact as Kachru’s.2 Table 4.1 is an adaptation of Edwards’s (2016: 3) survey of ‘traditional’ models of World Englishes and provides a sociolinguistic account of how English has been perceived through these models. The table underscores that earlier approaches did not perceive English as a dynamic, developing language. Shifts from, for example, an exonormative towards an endonormative orientation are neither considered to a sufficient extent nor can they be adequately explained by traditional models. An outline of a much more comprehensive model, namely Schneider’s (2003, 2007) Dynamic Model of Postcolonial Englishes, follows.

Figure 4.1 Kachru’s three circles model: Inner circle (e.g. USA, UK; 320–380 million), Outer circle (e.g. India, Singapore; 300–500 million), Expanding circle (e.g. China, Russia; 500–1,000 million). Source: Crystal 2003: 61; reprinted by permission from Cambridge University Press.

Table 4.1 Tripartite classification of English(es) worldwide

ENL/Inner Circle | ESL/Outer Circle | EFL/Expanding Circle
English as a native language | English as a second language | English as a foreign language
users | users | learners
norm providing / endonormative | norm developing / exo- to endonormative | norm dependent / exonormative
English acquired at home, at school and in wider society | English acquired at school and in wider society | English acquired at school
intranational communication | intranational communication | international communication
StdE | innovation | error
USA, UK, Australia, New Zealand | Nigeria, India, Philippines | Italy, Morocco, Brazil

Source: Edwards 2016: 3; reprinted by permission from John Benjamins.

4.1.2 The Dynamic Model
There are numerous overviews of Schneider’s (2003, 2007) Dynamic Model, both brief and comprehensive (see Buschfeld 2011), as well as applications of the model to varieties in the Outer Circle, the Expanding Circle, and the various stages in-between (see, for instance, Buschfeld 2011, 2013 on English in Cyprus; Edwards 2014, 2016 on English in the Netherlands). For this reason, I provide only a brief summary of the model in the following paragraphs. One of the Dynamic Model’s major advantages over previous models is its multidimensional approach to the development of Englishes in postcolonial settings. Although it could be argued that the Circles model is not quite as static as the ENL–ESL–EFL distinction, it does not account for historical developments and multilingual environments. This gap was closed with the introduction of the Dynamic Model. A central aspect of the model is the differentiation of five phases in the development of a variety: ‘Foundation’, ‘Exonormative Stabilization’, ‘Nativization’, ‘Endonormative Stabilization’, and ‘Differentiation’. These five phases chronicle the typical (expected) development of a postcolonial variety, but Schneider (2014) acknowledges that not all of these phases are necessarily part of every variety’s development. For each of the five phases, the Dynamic Model takes into account and describes the history and politics of English, identity construction, sociolinguistic conditions and attitudes, and linguistic developments. The model further distinguishes between developments in the settler (STL) strand and the indigenous (IDG) strand. The next pages describe some major developments during each of the phases.

Phase 1
During the foundation phase, “English is brought to a new territory by a significant group of settlers, and begins to be used on a regular basis in a country which was not English-speaking before” (Schneider 2007: 33). The STL and the IDG strands become aware of each other’s existence, but a sense of unity has yet to emerge. Because of dialect contact within the STL stream and interactions between settlers and indigenous people, a “complex contact situation emerges” (ibid.: 34). In this phase, bilingualism is limited to a small group of IDG people who function as mediators “interact[ing] with the immigrants as traders, translators, or guides, or in some political function” (ibid.: 35). On the linguistic level, koinéization, incipient pidginization, and toponymic borrowings are typical of the foundation phase.

Phase 2
The name of the second phase is ‘exonormative stabilization’. ‘Stabilization’ refers to the fact that “colonies or settlers’ communities tend to stabilize politically, normally under foreign, that is, mostly British, dominance” (ibid.: 36). During this time, English is used regularly in official contexts, such as administration, education, and law. In contrast to the foundation phase, identities begin to shift and boundaries soften: Both the settlers and the indigenous people continue to value their own original identity but also begin to develop (positive) attitudes towards the other group; moreover, the first generations of mixed ethnicity are born (ibid.: 37). Most linguistic changes during this phase affect the lexicon, although initial modifications of morphology and syntax can also be observed at later stages of the second phase (ibid.: 39).
Phase 3
The third phase, nativization, is described by Schneider as “the most interesting and important, the most vibrant one, the central phase of both cultural and linguistic transformation” (2007: 40). Typically, the colonized countries gain independence during this phase, and the gap between the settlers and the indigenous people is reduced (ibid.: 41). Contact between the groups is now common and, in spite of differences in how the two groups accommodate to each other, “the readiness to accept localized forms, gradually also in formal contexts, increases inexorably” (ibid.: 42–43). Linguistically, the phase of nativization entails large-scale changes resulting in a distinct variety. Most importantly, the areas of morphology and syntax undergo significant change and develop “constructions peculiar to the respective country” (ibid.: 44). A decisive point, according to Schneider, is

that the communities are moving closer toward each other. Mutual negotiation results in a shared variety which is a second language for some and a first language, incorporating erstwhile L2-transfer features, for others. (Ibid.: 45)

The grammatical and lexical inventory of the community is enriched, and pragmatic features may be transferred (ibid.: 46–47); such transfer is often culturally motivated and plays a role in explaining frequencies of topicalization.

Phase 4
Endonormative stabilization, the fourth phase described in the model, usually “follows and presupposes political independence” (ibid.: 48). Gaining independence is important for this phase to commence because it provides the authority to make important decisions with regard to language planning. Sometimes, an ‘Event X’ initiates the fourth phase and introduces a shift in how the country of origin is seen. In terms of identity, the settlers now see themselves as inhabitants of a newly born nation of which the indigenous people are part. As a result, “the role of ethnicity, and ethnic boundaries themselves, will tend to be redefined and regarded as increasingly less important” (ibid.: 49). Consequently, the new variety of English is gradually accepted and used as a carrier of the new identity. The phase of endonormative stabilization marks a shift from “English in X” to “X English” and is a time of emerging literary creativity involving the new variety. A high degree of homogeneity and codification of the new variety of English signal, on the one hand, “a new language variety which is recognizably distinct in certain respects from the language form that was transported originally, and which has stabilized linguistically to a considerable extent” (ibid.: 51), and, on the other, “the acceptance of earlier spoken realities as appropriate to formal and written contexts” (ibid.: 52).

Phase 5
As there is now external stability, internal differentiation becomes possible.
In the fifth phase, the focus is therefore no longer on being a unified new nation that is separate from the colonizing nation; rather, individual subgroups with their own identities and social networks emerge and become more important (ibid.: 53). Consequently, this is the period of dialect birth (ibid.: 54). Recently, new models have been proposed to explain the development of varieties that cannot easily be described by means of the Dynamic Model. Both Schneider’s (2014) model of ‘Transnational Attraction’ and Buschfeld and Kautzsch’s (2016) ‘Extra- and Intra-Territorial Forces’ model take into account processes of globalization and general developments that shape the development of varieties, including those in non-postcolonial contexts. For an analysis of the four Asian varieties under scrutiny, however, the Dynamic Model remains an ideal starting point: While there are some important differences between the varieties (e.g., the Philippines as a former colony of the United States), all four share a colonial background.

4.2 Introducing the varieties
In the following sections, I introduce the four Asian varieties under consideration. After providing some geographical and demographic information, I discuss each variety’s status in the Dynamic Model. For reasons of comparability, the order of the following sections roughly corresponds to the order in which the contact languages were presented in chapter 3.

4.2.1 Indian English
Demographics of India
India is a country of enormous proportions with a diverse population. The decennial census shows that, between 1991 and 2001, the number of Indians rose above one billion, with the latest census, in 2011, indicating a population of 1.21 billion (see Census of India 2011). Mumbai and Delhi are the South Asian country’s biggest metropolitan areas, and New Delhi, one of Delhi’s districts, serves as the nation’s capital. Because of the size and diversity of the country, the structures and functions of English in India, as well as attitudes towards it, are manifold and demand careful examination.

The status of IndE
In an innovative account of the history of India, Lange (2012a: 65–70) locates English in India in a ‘communicative space’, a concept put forward by Oesterreicher (1995), Koch and Oesterreicher (1996), and Oesterreicher (2001). Oesterreicher defines ‘communicative space’ in the following way:
It is necessary to consider the scenario in which the communicative space of a society or nation is not limited to varieties of one language, but different languages are assigned to certain functions. (Oesterreicher 1995: 9; my translation)3
Examples of such communicative spaces can be found throughout the history of English,4 but they also characterize many Asian countries today. Clearly, the functional division between languages is also a major aspect of Schneider’s (2007) Dynamic Model and needs to be taken into consideration whenever varieties of English are in focus.
In the case of India, the functional distribution among Hindi, English, and the main regional languages is of particular importance. According to Lange (2012a), there are opposing viewpoints regarding the role of English in India’s communicative space. While “one camp sees English as an ‘outsider’ and as a threat to the Indian communicative space”, the other camp “embraces English as an asset” (2012a: 70) and acknowledges the importance of English in finding employment. Since a positive attitude towards English is a primary catalyst for its development and a requirement for plurifunctionality, mixed attitudes may slow the development of the variety. Indeed, the status of IndE, even the very existence of a variety called ‘Indian English’, “has been fiercely contested both by linguists who happen to be interested in New Englishes and by more or less everyone concerned with the teaching of English in India” (Lange 2012a: 1). Given the extraordinary historical, social, and linguistic factors contributing to the complex situation of English in India, such statements come as no surprise. The country’s linguistic abundance, with 447 living languages according to the Ethnologue (Lewis et al. 2016), further complicates the issue. A persistent assumption in theorizing about IndE has been the lack of a supraregional variety: This line of reasoning holds that a monolithic variety, ‘Indian English’, is implausible when numerous sub-varieties may exist, each linked to particular groups and mother tongues in India (Lange 2012b: 133). Lange points out, however, that a lack of time means fewer opportunities for “internal differentiation and dialectological boundaries” to develop, and, furthermore, that “accommodation processes [= dialect levelling and koinéization] generally favour reduced variation” (2012b: 134). Diatopic variation is, therefore, likely at the phonological level but unlikely or insignificant at the syntactic level.

IndE in the Dynamic Model
The status of IndE in the Dynamic Model has been subject to debate. Whereas Schneider (2007) and Sharma (2009) describe IndE as a Phase 3 variety, Mukherjee (2007) and Schneider (2014) suggest that the variety shows signs of endonormative stabilization.
In the following paragraphs, I outline the main events and developments of the first three phases. Then, I discuss some arguments put forward for IndE having progressed into the fourth phase. Phase 1, the foundation of IndE, can be dated to 1600, when Queen Elizabeth I granted a charter to the East India Company (Schneider 2007: 162; Sedlatschek 2009: 8). Although English gained a foothold in India much earlier than in many other territories, it did not begin to play a major role in the country until the “second half of the eighteenth century, when the motivation shifted from purely economic interests to a strive for political authority” (Schneider 2007: 163; cf. also Kachru 1994: 502). Phase 2 was set in motion by the spread of English-language teaching and of bilingualism in local languages and English (Schneider 2007: 164), a consequence of the increasing political power of the British in India. Although knowledge of English was mostly confined to an elite, certain members of the middle class were also required to have (at least) functional proficiency in English. At the beginning of the twentieth century, English was also acquired by parts of the middle and lower classes (Kachru 1983: 23). Phase 3 followed as the remainder of the twentieth century saw a transition from an exonormative orientation towards nativization, with India’s independence in 1947 as a major turning point (Schneider 2007: 166). As Sedlatschek notes, “English passed through an eventful history of rejection and acceptance after 1947 before it would gain permanent official recognition” (2009: 18). Today, English is still commonly associated with an elite, although it is also used for purposes beyond administration, education, and the media. Phase 4 might be incipient in India, but the status of IndE as a potentially endonormative variety is controversial. A major proponent of the view that IndE has reached the phase of endonormative stabilization is Mukherjee (2007): He identifies the anti-Hindi language riots of 1965 in the state of Tamil Nadu (see Lange 2010) as Event X, that is, the event that set the transition into the fourth phase in motion (2007: 168). The reason for this, he claims, is the effect the riots had on how English was viewed in the country. In the riots’ aftermath, “the political parties readjust[ed] their stance on language policy and ensure[d] the continuing use of the English language in India” (ibid.). The consequence was the amendment of the Official Language Act in 1967, which stabilized the role of English as an official language alongside Hindi. In addition, English is a regular subject at schools.5 Regarding language attitudes, Kachru’s (1983, 2005) idea of ‘linguistic schizophrenia’ appears to be prevalent in India: Although English is widely accepted for specific purposes, many speakers are critical of their own variety of English and/or consider their native languages to be of more importance. As far as the structural level is concerned, Mukherjee points out the presence of a ‘common core’ in the areas of vocabulary and grammar (a core shared between IndE and British English) but also progressive forces enabling innovations and the development of variety-unique features (ibid.: 170–171).
In this regard, Mukherjee notes that the vocabulary of IndE shows the most innovation and creativity.6 In spite of the presence of several features of endonormative stabilization, Mukherjee believes that it is an integral characteristic of IndE to retain certain traits of the third phase; according to him, the variety is in a state of equilibrium in which progressive and conservative forces operate and coexist. A crucial aspect that will show whether IndE develops further in terms of the Dynamic Model is its potential as an identity carrier; however, both Schneider (2007) and Mukherjee (2007) doubt that English will become an identity carrier in India in the near future. In the present study, I treat IndE as a Phase 3 variety. However, it needs to be acknowledged that IndE is by far the ‘oldest’ of the four varieties under consideration and has played a major role in South Asia for centuries. Linguistic features at all levels have had more time to spread and become part of a supraregional variety (possibly to the extent of becoming a local norm). This aspect will also form part of the explanation for the high frequency of topicalization in the Indian component of ICE.

4.2.2 Singapore English
Demographics of Singapore
Singapore is an island state consisting of 53 islands, with a resident population of 3.9 million in 2016 (Department of Statistics Singapore 2016).7 Situated at the southern tip of the Malay Peninsula, Singapore covers some 710 km².

Table 4.2 Singapore’s population and ethnic groups in 2000 and 2010

Census | Total pop. | Chinese | Malays | Indians | Others
2000 | 3,263,209 | 2,505,379 (76.8%) | 453,633 (13.9%) | 257,791 (7.9%) | 46,406 (1.4%)
2010 | 3,771,721 | 2,793,980 (74.1%) | 503,868 (13.4%) | 348,119 (9.2%) | 125,754 (3.3%)

Source: Leow 2001; Wong 2011.

In terms of ethnicity, Singapore’s population is characterized by a significant degree of diversity; see Table 4.2 for census data from 2000 (Leow 2001) and 2010 (Wong 2011) on the country’s ethnic groups. Although the Chinese are clearly in the majority, roughly a quarter of the population is of Malay, Indian, or other ethnicity (the category ‘Others’ comprises Eurasians and Europeans of various origins; cf. Wee 2008: 261). This diverse composition is particularly interesting from both a linguistic and a political point of view, but it has also proven to be a challenge for the Singaporean government. What Leimgruber calls the “thorny issue of linguistic diversity” (2013: 6) was addressed in 1963, when the constitution was written. As stated in §153A of the constitution, “(1) Malay, Mandarin, Tamil and English shall be the 4 official languages in Singapore” (The Law Revision Commission under the Authority of the Revised Edition of the Laws Act Chapter 275 1999). Formally, these languages are accorded the same status in the constitution. However, this equality is reflected neither in speaker numbers nor in the different roles the languages play in the country; Table 4.3 reproduces Leimgruber’s (2013: 7) overview of speaker percentages. The table reveals striking changes in speaker numbers, all the more remarkable because it covers a span of only 30 years. While a mere 11.6 per cent spoke English in Singapore in 1980, by 2010 it was spoken by almost as many people as Mandarin (32.3 vs. 35.6 per cent). At the same time, the other Chinese dialects, which were dominant in 1980 with 59.5 per cent of the population speaking one of them, had dropped to 14.3 per cent by 2010.
The status of SinE
Understanding the role of English in Singapore entails understanding the ongoing tension, within both the population and the government, “between, on the one hand, accepting Singlish as a legitimate part of Singapore’s linguistic ecology, and on the other, rejecting it in favor of a more standard variety” (Wee 2008: 264). Although the government acknowledges English as an instrument beneficial to areas such as commerce and international communication, there is an internal struggle arising from the differences between a standard variety of English and Colloquial Singapore English (CollSgE, also known as Singlish).

Table 4.3 Speaker numbers for the main languages spoken in Singapore

Year | English | Mandarin | Chinese dialects | Malay | Tamil | Others
1980 | 11.6% | 10.2% | 59.5% | 13.9% | 3.1% | 1.7%
1990 | 18.8% | 23.7% | 39.6% | 14.3% | 2.9% | 0.8%
2000 | 23.0% | 35.0% | 23.8% | 14.1% | 3.2% | 0.9%
2010 | 32.3% | 35.6% | 14.3% | 12.2% | 3.3% | 1.1%

Source: Leimgruber 2013: 7; reprinted by permission from Cambridge University Press. Leimgruber uses the following sources for his numbers: Foley (1998: 221) citing Lau (1993: 6) for 1980, Leow (2001) for 1990 and 2000, and Wong (2011) for 2010.

This is partly because of its history: Rather than being spread in a ‘natural’ way by colonial settlers, English had been introduced and advocated as a means to make the country more competitive in science, technology, trade, and so forth (Lim and Foley 2004: 4). With the emergence of a colloquial variety, English shifted from being used solely for practical purposes to being a language that Singaporeans actually use in everyday contexts. This development was not received as well by the government as it has been by the public. On the one hand, “[t]he existence of the colloquial variety is felt by the state to undermine the development of proficiency in the standard and, hence, to threaten that economic competitiveness” (Wee 2008: 263); on the other, it is considered detrimental to maintaining a uniquely Singaporean, and therefore Asian, identity (see Lim et al. 2010: 6). The state’s viewpoint is exemplified by its division of the population into four ethnic groups. Three of the four official languages can be directly linked to a corresponding ethnic group: Mandarin, the main variety of Chinese in Singapore, is linked to the Chinese; Malay is used by Malays and is also the state’s national language; and Tamil is spoken by most Indians in Singapore (Wee 2008: 263). English, however, has no direct ethnic counterpart apart from ‘Others’, a category that lumps together (mostly) Europeans and Eurasians who do not belong to any of the remaining three groups (Wee 2013: 105). One of the government’s key measures for getting Singaporeans to speak an internationally intelligible variety of English while maintaining their local identity has been to “consistently encourage […] Singaporeans to be bilingual in English and a mother tongue that is officially assigned to them on the basis of their ethnicity” (ibid.: 107). However, English no longer serves only as a lingua franca for inter-ethnic communication in the country.
In scholarship, various approaches to analysing SinE have been proposed. Five of the more prominent approaches and their proponents are (1) the lectal continuum (Platt and Weber 1980); (2) the Expanding Triangle model (Pakir 1991); (3) diglossia (Gupta 1994); (4) the Cultural Orientation Model (Alsagoff 2010); and (5) the Indexicality approach (Leimgruber 2013). A description of each of these approaches would go far beyond the scope of this chapter. However, a few remarks on Alsagoff’s “Cultural Orientation Model” (2010: 340), which encompasses and advances previous ideas, are in order. Basing her ideas on a twofold distinction between English used in global contexts and English used in local contexts in Singapore, Alsagoff defines culture as dynamic and fluid as opposed to static and predefined (ibid.: 340). The determination to compete internationally, and to enable Singaporeans to do so, is one of the main reasons for the government’s intent to provide Singaporeans with an education in ‘good’ English. There is no obvious reason to assume that Singaporeans would position themselves against this, but, at the same time, English has become far more than an instrument for international competition. Rather, it is a language that is spoken in many households, and, as the census reveals (cf. Wong 2011), the number of native speakers of English has been rising for many years. This specific situation has resulted in a “duality of the forces of the global and the local”; a duality “founded in the cultural perspectives and orientations of Singaporeans” (Alsagoff 2010: 337).

SinE in the Dynamic Model
Phase 1 began when English came to the country in 1819, with Singapore becoming a trading outpost for the British East India Company (Schneider 2007: 153). Because of Singapore’s favourable location, “a continuing massive influx of more traders, colonial agents, and contract laborers of predominantly Chinese and Indian origin but also others from a variety of Asian, European, and mixed backgrounds” (ibid.: 154; cf. also Gupta 1998) characterized its subsequent development.
It was also during this time that the “capitan” system, the precursor of the government’s present-day division of the population into the ethnic groups mentioned above, was established (see Lim 2004). Phase 2 started in 1867, when Singapore’s status changed to that of a crown colony, and was marked by the country’s continued growth (Schneider 2007: 154). Europeans were in charge during this time, and identities, while collaborative in nature, were still ethnicity-based. Linguistically, ‘practical’ borrowings of vocabulary dominated the first two phases. Phases 3 and 4 were set in motion by the policy of ethnicity-based bilingualism, which Schneider considers to be the major cause of the rapid development of Singaporean English after 1970: “English is the only common bond shared by everybody [… and] the language education policy has had the indirect, and probably unintended, effect of alienating children from the varieties spoken by their parent and grand-parent generations” (2007: 156). In addition, everyday contact between the various languages has made code-switching a common speech pattern in the country (especially between English and the individual mother tongues), further promoting the development of a uniquely Singaporean variety of English (Bokhorst-Heng and Caleon 2009: 236). An important aspect is the undeniable existence of a local norm, and, as Schneider (2007: 160) sums up, formal recognition of this norm has been called for and envisaged (cf. Ooi 2001: xi). In the present study, SinE is treated as the sole Phase 4 variety. Much like the other ICE components, ICE-Singapore features mostly educated speakers who are often able to switch from an acrolectal variety in formal contexts to basilectal Singlish in informal ones. Code-mixing does occur in the analysed ICE files, but only rarely. The context of the recordings (speakers were aware of being recorded) implies that the Singaporean data are neither fully fledged Singlish nor overly formal English, so the variety represented is best analysed as mesolectal, with occasional shifts in both a more formal and a more informal direction.

4.2.3 Hong Kong English
Demographics of Hong Kong
Hong Kong, one of China’s two Special Administrative Regions (the other being Macau), is located on the southern coast of China and has a population of about 7.3 million.8 In addition to Hong Kong Island, the city comprises the Kowloon Peninsula and the New Territories (Setter et al. 2010: 1). Since more than 95 per cent of its population is ethnically Chinese, Hong Kong can be considered a monoethnic society (Wong 2013: 548), a fact with notable implications for the linguistic situation in the region. It also stands in sharp contrast to societies such as India and Singapore, which are characterized by ethnic and linguistic diversity.

The status of HKE
The current language situation in Hong Kong can be described as “trilingual and biliterate” (Setter et al. 2010: 4): It is trilingual because there are three official spoken languages, namely Cantonese, English, and Putonghua (Mandarin Chinese), and it is biliterate because most Hongkongers are able to write in Standard Chinese and English. Though not quite as common as in some other Asian societies, mixing between languages does occur (Kachru and Nelson 2006: 171). The “trilingual and biliterate” situation results from both the ethnic composition of the region and the historical and political events of the last century. Referring to the strict division between Cantonese-speaking and English-speaking parts of the population, Luke and Richards described Hong Kong as “an example of societal bilingualism supported by two largely monolingual communities coexisting in relative social isolation” and as a community lacking any sense of pan-ethnic identity (1982: 47). Of course, these claims were made at a time when Hong Kong was still a British colony, and the need for international communication (especially in English) was not as strong as it became at the turn of the century under the pressures of globalization.

The situation for Hong Kong changed when its territory was transferred from Britain to China in 1997 in the ‘Handover’ (Joseph 2004: 150), an event that brought about several changes in language policy and usage. As Hong Kong was no longer ruled by an English-speaking country, a shift from English to Mandarin as the dominant language of official matters was expected. This led to claims that Hong Kong linguists were more worried about the conflict between Cantonese and Putonghua than about the conflict between English and Cantonese (see Bolton 2000a: 270). These worries stemmed from the clash between the role of Cantonese in Hong Kong and the plans of the People’s Republic of China to encourage the use of Putonghua in official contexts such as education (ibid.: 271). However, English had already been established as one of the two languages of government and law as well as “the dominant language of higher education and the business community” (Bolton 2012: 235). Whereas the implementation of Putonghua in Hong Kong is often described as a rather heavy-handed process, and the language remains tertiary in importance, Cantonese and English each fulfil specific functions. Cantonese is the language “of solidarity and community ties” (Wong 2013: 549), while English is important for success in business and education (see Evans 2015). Overall, Cantonese remains the dominant and most widely spoken language in Hong Kong, spoken by roughly 90 per cent of the population (2011 Population Census 2012). For a long time, the absence of English in everyday communication and the separation of the Cantonese and English speech communities made the very existence of a distinct variety of English in Hong Kong the topic of heated debate (see Groves 2009: 55).9 Today, however, the existence of a Hong Kong variety of English cannot be denied.
This is supported by the proportion of speakers who speak English as a second language, which Hickey (2014: 148) puts at about 43 per cent. Nevertheless, the variety cannot be described as an identity carrier in Hong Kong, which is also reflected in its position in the Dynamic Model.

HKE in the Dynamic Model
Phase 1 began in 1841–1842 with the establishment of Hong Kong as a colony and lasted throughout the nineteenth century, during which English spread mostly through missionary activities that brought it into the educational sector (Bolton 2000a: 267). Some typical characteristics, such as limited bilingualism and the borrowing of indigenous place names, hold for Phase 1 in Hong Kong, while others, such as dialect mixture, are fairly limited or absent (Schneider 2007: 135). Phase 2 started with the Treaty of 1898, under which the New Territories were leased to the UK for 99 years. Although the colony stabilized after the treaty, English remained accessible only to a minority and was taught with a clear exonormative orientation (ibid.). Phase 3 can be dated to “late British colonialism” (Bolton 2000a: 268), starting in the 1960s. Initially a poor city, Hong Kong evolved into one of Asia’s major economic centres, a development with clear implications for English. The importance of being able to communicate in English in order to become and remain a major power in the world was not lost on Hong Kong’s inhabitants or government, and other developments, such as a service economy, a local-born population, and wider access to education, also contributed to a continuous rise of English. In terms of the Dynamic Model, Hong Kong is unusual “in that it did not gain independence but was turned over to another power” (Schneider 2007: 136). The Handover to China in 1997 (Joseph 2004: 150) meant that ties to the colonial power were weakened. At the same time, an awareness of globalization resulted in generally positive attitudes towards the English language. Certain linguistic developments also indicate that HKE has reached Phase 3: At the turn of the millennium, a special issue of World Englishes on English in Hong Kong featured several articles concerned with unique phonological (Hung 2000) and syntactic (Gisborne 2000) features of HKE. Moreover, English frequently serves as the language of choice in written communication (e.g., emails and chats; see Schneider 2007: 137), for business interactions, and “also at times as a lingua franca between speakers of Cantonese and Putonghua” (ibid.; cf. also Bolton 2003: 201–203). Groves (2011: 35) explains that the two most important processes of Phase 3, identity reconstruction and linguistic innovation, are clearly observable in the development of HKE: With the Handover, locals developed “a stronger ‘Hong Konger’ identity nested within a broader ‘Chinese identity’ ” (ibid.). This new identity entailed a more positive attitude towards English and an increased use of a mixed English–Chinese code, which developed “its own distinctive phonological, lexical, and grammatical features” (ibid.). Because of the uncertain future of HKE, “Hong Kong may become an interesting test case for the predictive implications of the Dynamic Model and the inherent power of the developmental dynamism which it describes” (Schneider 2007: 139).
Since ICE-Hong Kong was compiled before the Handover in 1997, it is assumed for the present study that the recorded variety of HKE was located somewhere between Phase 2 and Phase 3 of the Dynamic Model.

4.2.4 Philippine English
Demographics of the Philippines
The Republic of the Philippines comprises 7,107 islands and has 100.6 million inhabitants (Bolton and Bautista 2008; Lewis et al. 2016). There is an Austronesian majority in the country, with Tagalogs, Cebuanos, and Ilocanos being the three major groups (Bolton and Bautista 2008: 2). Because of the country’s history, there are also various mixed ethnicities, most notably the Philippine-Spanish, Philippine-Chinese, and Philippine-American groups. Unlike Hong Kong, India, and Singapore, the Philippines were not a British but an American colony.

The status of PhilE
Today, English in the Philippines is in a difficult position, and there has been an ongoing conflict between the roles and functions of English and Filipino. In this respect, Gonzalez reports on a younger generation of Filipinos who “condemned the ‘miseducation’ of the Filipino in a foreign language” (2008: 22) and intended to revitalize the use of Filipino in areas that had become dominated by English. However, print media and higher education remained domains in which English was the preferred language. In addition, strengthening Filipino at the expense of English is at odds with the ever-increasing demand for labour in the international market, since overseas workers are expected to have at least a satisfactory proficiency in English (cf. Gonzalez 2008: 22; Bernardo 2008). According to Tupas, the importance of this requirement is generally acknowledged by both the upper and the lower classes (2008: 76). However, the lack of proper education in English for the lower classes tends to prevent these parts of society from acting on it. The quality of education in English in the Philippines tends to be directly linked to the level of education a person can access (Gonzalez 2008). As a consequence, established structures of wealth and distribution of power between “the upper classes of Philippine society (‘the oligarchs’) and the lower classes of the cities and provinces (the masa, or ‘masses’)” (Bolton and Bautista 2008: 3) are reinforced. The consensus in several papers in Bautista and Bolton’s (2008) anthology on PhilE is that the government will have to make wise decisions in order to turn English into “a positive resource in the education of Filipinos” (Bernardo 2008: 44). The overall objective should be the coexistence of English and Filipino as languages with different functions but of equal importance; furthermore, providing similar (educational) conditions for people of all classes is necessary.
Gonzalez and Bautista (1985) and Gonzalez (2008) argue in favour of looking at sub-varieties of PhilE as ‘edulects’, “since the levels are a function of the education of the speaker and the kind of English language tuition he/she received in school” (Gonzalez 2008: 20). McFarland (2008: 144), also in favour of differentiating between different varieties of Philippine and Standard Philippine English, surmises that three factors influence local varieties: (1) the speaker’s first language, (2) proficiency, and (3) “the difference between the kind of English that would be used in an all-Filipino setting –where everyone understands both Tagalog and English –and in a mixed setting –where part of the group does not understand Tagalog”. A speaker’s first language is most certainly a significant factor, especially in an ESL setting as found in the Philippines. With more than a hundred languages spoken in the country (ibid.: 131),10 individual characteristics are bound to develop. His second criterion has been discussed in the context of edulects. For the third factor, the all-Filipino setting is particularly interesting because ‘Taglish’ (a mixture of Tagalog and English) is common and shows features that have been identified as being typical of PhilE. McFarland makes it a point not to consider Taglish as a speech variety in its own right but rather as the general process of mixing Tagalog and English, which can produce different results (ibid.: 144). PhilE in the Dynamic Model Phases 1 and 2 saw the presence of a large IDG strand but only a small settler strand. These two phases “seem to have practically merged and progressed very
rapidly” (Schneider 2007: 140). The Spanish-American War of 1898 ended with the United States in control of the Philippines, which led to the swift establishment of English as the official language. With the goal of providing fast and widespread English education, 523 teachers were sent to the Philippines in 1901 (Thompson 2003: 21).11 After a Filipino revolt for independence was defeated in 1902, this educational endeavour proved very successful: English spread rapidly across the country, and the percentage of speakers almost tripled between 1939 and 1980 (Bolton 2000b: 97). Martin attributes this success to a welcoming attitude towards the Americans: “Having experienced 300 years of oppressive Spanish colonial rule, the Filipinos were only too happy to have the Americans on their shores” (2014: 71). Linguistically, the first two phases are characterized by typical developments at the level of vocabulary, as names for places, plants, animals, and cultural objects were borrowed (Bolton and Butler 2004: 95, 97). Phase 3, according to Schneider, can be dated to roughly a decade before independence in 1946, which came “eleven years after the Philippines were granted limited sovereignty under a ‘commonwealth’ status” (Schneider 2007: 140–141). A Philippine variety of English was first recognized by Llamzon at the end of the 1960s (1969: 1–2), and innovations in PhilE signalling nativization have been identified at all linguistic levels. Early signs of Phase 4 can be observed, but a fully endonormative orientation has not yet developed in PhilE. A useful criterion for determining the beginning of endonormative stabilization is the aforementioned Event X, which marks linguistic and, often, cultural independence from the colonizer(s).
In this context, Borlongan suggests that “the ratification and implementation of two post-World War II acts may serve as Event X in the development of Philippine English” (2011: 4), that is, the Tydings Rehabilitation Act of 1946 and the Bell Trade Relations Act of 1946. These acts provided for the payment of reparations to war victims and opened free trade between the Philippines and the United States. Perhaps more importantly, a post-Event X development signalled increased independence: the Philippine Senate rejected a military base agreement, whereupon American military installations were withdrawn (ibid.: 5). According to Martin, the latter event “signaled a strong desire of a former colony, by that time governed by the woman who overthrew the U.S.-supported Marcos dictatorship, to sever remaining ties from its former colonial master” (2014: 78). In other areas, however, Martin does not see the requirements for endonormative stabilization as fulfilled in the Philippines. Codification, for instance, has begun (cf. the Anvil-Macquarie Dictionary of Philippine English), but such codifications “are few and far between” (2014: 80) and do not suggest “an explicit declaration of linguistic independence” (Schneider 2007: 125). In addition, Martin notes that a Philippine variety of English is not an identity carrier in the way Singlish is (2014: 80). Rather, (Philippine) English is often viewed critically, particularly among the lower classes. Taking into account the arguments presented in Martin (2014) and the compilation years of ICE-Philippines (between 1990 and 2004; see Bautista 2004), I treat PhilE as a Phase 3 variety in this study.

4.3 Summary

In this chapter, I presented the theoretical framework for the analysis of postcolonial varieties in Asia and the development of English in the three countries and Hong Kong. While there are several differences in how the varieties emerged and developed, certain similarities could also be identified: all the varieties are in continuous contact with other languages. Additionally, understanding and evaluating the functions of English and its contact language(s) represents or has represented a struggle in each of the regions covered. While English in Singapore, for instance, is an identity carrier, its role remains largely functional in Hong Kong. This is a key factor in the development of varieties: for an endonormative standard to develop, there has to be widespread acceptance of the language, and the use of English needs to extend to the private domain. With the exception of SinE, I consider all varieties to be somewhere in the phase of nativization.12 HKE, however, is located at a very early point in this phase, while PhilE and IndE in particular have reached a more advanced stage. Variety status plays an important role in the analysis of topicalization, since the lack of nativized structures and the lack of intense contact between English and Cantonese in natural contexts in Hong Kong are put forward as reasons for the low frequencies of topicalization in ICE-Hong Kong. The only variety in the process of endonormative stabilization is SinE: linguistic features (such as topicalization) are used creatively in this variety and occur more frequently. Despite its status as a Phase 3 variety (which it shares with HKE), IndE has had more intense contact with the local languages.

Notes
1 Strang differentiates A, B, and C speakers.
2 Some of these models and approaches are McArthur’s (1987) Circle of World Englishes, Görlach’s (1990) Circle of International English (coined by McArthur 1998: 98), and Gupta’s (1997) classification of output types. For a comprehensive overview, see chapter 3 in Buschfeld (2011).
3 Original quote in German: “Sodann gilt es kurz auch den Fall zu betrachten, wo der Kommunikationsraum einer Gesellschaft oder Nation gar nicht mit Varietäten einer Sprache besetzt ist, sondern verschiedene Sprachen auf bestimmte Funktionsbereiche verteilt sind” [‘Next, the case must also briefly be considered in which the communicative space of a society or nation is not occupied by varieties of one language at all, but in which different languages are distributed across particular functional domains’].
4 Medieval England comes to mind, where French, Latin, and English each served different functions at different levels (Moessner and Schaefer 1987: 151; Schaefer 2012).
5 Acknowledging the two official languages in addition to the most important regional languages, the Three-Language Formula demands that Hindi, English, and a regional language be taught (see Biswas 2004).
6 The well-known usage guides by Nihalani et al. (1979) and Nihalani et al. (2004) lend weight to this argument.
7 The Singaporean census department differentiates between residents and non-residents. If the non-residents are added, Singapore’s population was 5.53 million in 2016 (cf. Department of Statistics Singapore 2016).
8 This number is taken from Hong Kong’s census population estimate for the end of 2014. The official population census from 2011 indicated a population of 7.07 million (cf. 2011 Population Census 2012: 7).
9 Luke and Richards (1982) denied its existence, and Tay (1991: 327) “argued that there has been no motivation for the indigenization of English in Hong Kong” (Poon 2006: 23). Less willing to take a clear stand, Platt (1982) pointed out undeniable differences between English in Hong Kong and English in Singapore. A decade later, little had changed about the controversial status of the variety. In an article from the beginning of the 1990s, Tay claimed that “[i]t should be clear from the preceding paragraphs that there is no social motiviation [sic!] for the indigenisation of English in Hong Kong” (1991: 327). Three years later, Johnson wrote that “[t]here is no social or cultural role for English to play among Hong Kong Chinese; it only has a role in their relations with expatriates and the outside world” (1994: 182).
10 According to McFarland (2008), the precise number is difficult to pin down. Depending on the source, numbers ranging from 118 (McFarland 1980) to 163 (Grimes 2002) are given. The most recent version of the Ethnologue lists 183 living languages (cf. Lewis et al. 2016).
11 The teachers arrived aboard the USS Thomas and are referred to as ‘Thomasites’.
12 This is assumed for the representation of the varieties in the ICE corpora and not for the present day.

5 Corpus analysis: Data basis and methodology

The present study is concerned with topicalization in Asian varieties of English. To this end, I chose to follow in the footsteps of many other publications on non-standard features of World Englishes by working with corpora of spoken language, namely the spoken sections of the International Corpus of English for Hong Kong, India, the Philippines, and Singapore, as well as for Great Britain. In this chapter, I present my data, describe the procedure for coding and evaluating topicalization, and discuss some caveats of my approach. I also mention problematic cases found in the corpora that were excluded from further analysis but should nonetheless be recognized.

5.1 The International Corpus of English (ICE)

The data analysed in this study come from the International Corpus of English, commonly abbreviated as ICE. With the goal of providing parallel corpora that can be used “as the basis for international comparisons” (Greenbaum 1996b: 5), corpora for different varieties of English were (and are still being) compiled under comparable conditions and with a similar distribution of spoken and written texts (Nelson 1996: 27; see also Hundt 2015: 382). Each ICE corpus contains 500 texts of approximately 2,000 words each, yielding a total of about one million words per corpus. Of these, 300 files contain spoken texts, divided into dialogues and monologues; the other 200 files contain written language. While the structure and the treatment of data are similar across the corpora, variation in speakers’ sex, age, educational level, occupation, relationships, and so forth must be seen as a necessary by-product of a project as ambitious as ICE. As of June 2018, 14 corpora have been made available on the ICE website, with several others currently in development.1 While a broad range of speakers is featured in the corpus, certain criteria were put in place to decide whether a person was suitable for inclusion. The relevant criteria are given by Nelson (1996: 28):
(1) the speaker must be aged 18 or over;
(2) the speaker must have been educated through the medium of English to at least the end of secondary schooling;
(3) the speaker must have been born in the country under consideration or have moved there at an early age.

Exceptions to these rules are sometimes made when a person’s “public status makes their inclusion appropriate”, so that, for instance, with regard to the second rule, “the British component of ICE includes speech from the Queen and Prime Minister John Major, neither of whom would be admitted under a strict application of this criterion” (Greenbaum and Nelson 1996: 5): the Queen was home-schooled and John Major left school at the age of 16, but both are certainly worthy of inclusion in ICE (cf. Edwards 2016: 116). For ICE, a unified markup system has been developed. Textual markup is used to identify indigenous words, pauses, laughter, and overlapping speech directly in the text files. In this book, all markup has been removed for improved readability unless it is of particular relevance to an example (or is the only contribution by a speaker in an utterance). In addition to the markup, biographical information on age, educational level, L1, and so forth is stored in a separate file for some of the corpora. Unfortunately, this information is not available for every variety. The Singapore metadata, for instance, cannot be accessed for scholarly research, which makes a typological comparison for that particular corpus much more problematic. As Hundt notes for ICE in general, “documentation of existing ICE components remains poor even though background information is often important for the interpretation of individual corpus findings” (2015: 400). Since this study focuses strongly on contact between typologically distinct languages, the lack of metadata on speakers’ L1s requires generalization in some cases. An important factor that should not go unnoticed in any present-day analysis using ICE is the point in time at which the data were compiled.
For the first generation of ICE corpora, the texts were recorded or published between 1990 and 1994 (Nelson 1996: 28). As the previous chapter has shown, developments after 1994 in the four countries under consideration brought about drastic social and political changes, with Hong Kong’s 1997 Handover to China as a major case in point. In addition, sample collection for some varieties required longer periods of time, which entails diachronic variation (cf. Hundt 2015: 384). For topicalization (and for other features found primarily in spoken language), corpora with data from different points in time would be a valuable addition to research on World Englishes. Until such corpora are available, however, an analysis of a spoken corpus remains, ultimately, an analysis of a ‘snapshot’ of language (or perhaps of a small series of successive snapshots).

5.2 Data selection

As mentioned above, ICE consists of written and spoken sections. Spoken language is assumed to be more revealing than written language as far as non-canonical sentence structure is concerned, which is why the spoken sections were analysed. ICE is particularly well suited for this purpose, since

[o]ne of its strengths, quite clearly, is the size of its spoken components, given that oral performance is less constrained and less conservative than written styles, so this is where innovations are most likely to surface. (Schneider 2004: 247)

Thus, in sociolinguistic terms, analysing spoken interactions in the selected ICE components allows ‘tapping into the vernacular’: no interviewer is present; the prestige of the vernacular is not a major issue (Vaux and Cooper 1999; but see 5.4 for counterexamples); and the topics can be chosen freely (Milroy and Gordon 2003). Very often, the speakers in the ICE recordings are friends or, in some cases, relatives, which also helps to reduce any anxiety or any perceived need to speak in a very formal, standard-like way (see Tagliamonte 2006: 26). From a sociolinguistic perspective, the selected spoken components of ICE were therefore expected to contain relatively natural and unconstrained data. As for written language, the files in the written components of ICE are, for the most part, relatively formal. While topicalization can be used as a stylistic device in writing, it has been noted mostly as a feature of spontaneous spoken language in Asian Englishes. In addition, influence from the contact languages was assumed to become noticeable first in spoken language. Accordingly, a study of topicalization in written English might be worthwhile, but it was not deemed a priority for the present book. It could be added that the written sections of ICE contain mostly what Koch and Oesterreicher termed ‘language of distance’, which is less insightful for the research questions addressed in this study than ‘language of immediacy’ (Koch and Oesterreicher 1985/2012).
Both types of language may be realized in spoken or written form, but in ICE the written sections pertain mostly to language of distance (the sections containing academic and instructional writing are cases in point). Koch and Oesterreicher define language of immediacy as the locus of progression and innovation and therefore widen the scope by including certain forms of written language (1996: 64).2 Thus, innovation and conservatism may be found in both spoken and written language, but innovation is expected to appear in language of immediacy first (ibid.: 67). Consequently, the files involving interactional settings with the highest expected degree of spontaneity and immediacy were selected: open-topic direct conversations, telephone calls, and classroom lessons. Although a high degree of spontaneity was expected in the classroom lessons, they certainly represent a setting that differs from the direct conversations and telephone calls. Whereas the relations in the latter kinds of discourse are usually highly dynamic and subject to negotiation, a classroom situation traditionally involves a comparatively static teacher–student relationship, with the teacher doing most of the talking. Often, “the three-phase exchange (initiate-respond-feedback) in classroom discourse […] restricts learners to the performance of a relatively narrow range of speech acts associated with the responding role assigned to them”, which, in turn, “may limit the development of full sociolinguistic competence and the extended linguistic repertoire that this requires”
(Ellis 1997: 173).3 On the other hand, the opportunities for students to speak depend on many factors, such as the topic of the session, the number and kind of questions asked by the teacher, the method applied, and so forth. These differences are clearly reflected in the ICE classroom data.4 Some of the analysed corpora feature very lively teacher–student interaction, while others are dominated by teacher monologue. Even though the direct conversations and telephone calls in ICE are not free from similar issues, the comparability of the classroom lessons appears to be heavily influenced by the educational setting and its dynamics. Table 5.1 shows the expected word count per variety and the expected total number of analysed words. The following section gives an overview of the coding and evaluation procedure and also provides information on the actual word counts after the corpus files were cleaned up.

Table 5.1 Structure and expected word count of the analysed ICE components

Text group | Text type | ICE files | No. of words
ICE S1A: Dialogues, private | Direct conversations | S1A-001 to S1A-090 | 180,000
ICE S1A: Dialogues, private | Telephone calls | S1A-091 to S1A-100 | 20,000
ICE S1B: Dialogues, public | Classroom lessons | S1B-001 to S1B-020 | 40,000

Total analysed word count = 5 × 240,000 words = 1,200,000 words
Source: Greenbaum & Nelson 1996.
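The expected totals in Table 5.1 follow from the nominal ICE design of roughly 2,000 words per text. The Python sketch below reproduces that arithmetic; the number of texts per type is derived from the file ranges in the table, and the variable and function names are illustrative only.

```python
# Expected word counts for the analysed spoken sections of one ICE component,
# assuming the nominal ICE design of about 2,000 words per text file.
WORDS_PER_TEXT = 2_000

# Number of text files per analysed text type, derived from the file ranges.
TEXTS_PER_TYPE = {
    "Direct conversations": 90,  # S1A-001 to S1A-090
    "Telephone calls": 10,       # S1A-091 to S1A-100
    "Classroom lessons": 20,     # S1B-001 to S1B-020
}

# The five analysed components: ICE-GB, ICE-HK, ICE-IND, ICE-PHI, ICE-SIN.
N_COMPONENTS = 5


def expected_counts(texts_per_type, words_per_text=WORDS_PER_TEXT):
    """Return the expected words per text type and the per-component total."""
    per_type = {name: n * words_per_text for name, n in texts_per_type.items()}
    return per_type, sum(per_type.values())


per_type, per_component = expected_counts(TEXTS_PER_TYPE)
grand_total = N_COMPONENTS * per_component

for name, words in per_type.items():
    print(f"{name}: {words:,}")
print(f"Per component: {per_component:,}")
print(f"All {N_COMPONENTS} components: {grand_total:,}")
```

The per-component figure of 240,000 words is the value multiplied by five in the table’s total line.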

5.3 Coding and evaluation

All corpus components listed in Table 5.1 were read and coded manually for each variety. Each unambiguous instance of topicalization was given a first tag during the initial read-through. A second tag was used for all cases that required evaluation by a second rater. Once the tagging procedure was completed, all tags were retrieved and exported into a table for further analysis with the help of AntConc (Anthony 2014). Every sentence bearing the first tag was examined closely a second time in order to decide whether it was a clear-cut instance of topicalization that counted towards the quantitative analysis or whether it had to be excluded from the final frequency count. Cases that had received the second tag were analysed again by myself and by a second rater; based on the overall assessment, a decision was then made as to whether the example at hand was an instance of topicalization, a case that required discussion in the empirical chapter, or a case that could be dismissed. All tokens that counted as cases of topicalization were subsequently annotated with regard to the six criteria given in Table 5.2. The structure of the results
presented in chapter 6 follows the order of the criteria laid out in this table.

Table 5.2 Annotation schema for tokens of topicalization

Criterion | Possible categories (tag)
Syntactic form | Noun phrase (NP); Adjective phrase (AdjP); Adverb phrase (AdvP); Prepositional phrase (PP); Clause (CL)
Information status | Evoked (E); Unused (U); Brand-new (B)
Syntactic function | Subject (embedded) (S); Subject complement (CS); Object complement (CO); Direct object (OD); Indirect object (OI); Adverbial (A)
Expanded functions (Mesthrie 1992) | None (0); Token is not evoked (1); Constituent is not an NP (2); Interaction with further processes (3); Topic extraction from embedded clause (4)
Discourse function | Emphasis (EMP); Topic continuity (CON); Contrast (COT); Topic shifting (SHF)
Topic persistence | None (no); Poset (PS); Identity (ID); Pronoun (PN); Zero (implicit) (Zero)

Tokens representing a ‘typical’ and uncontroversial case of topicalization, that is, an NP functioning as a direct object, received one tag, whereas ‘non-traditional’ tokens, that is, all others, received a different tag. Syntactic form and syntactic function were annotated to find out which constituents are topicalized in each variety. Syntactic form was also annotated to test whether constituents other than NPs are topicalized. This was motivated by Mesthrie and Bhatt’s (2008: 82) claim that topicalization occurs only with NPs, which marks a considerable departure from Mesthrie’s earlier position. Taking his 1992 work as a basis, however, I accepted a variety of phrases and clauses and annotated them within a general syntactic framework. For the classification of information status, Prince’s threefold distinction between evoked, unused, and brand-new (cf. 1992) was used. Since even these three tags already involve interpretation, the choice was made not to use a more
differentiated set of information statuses. The annotation of information status was helpful in making decisions about discourse function and pertains to another of Mesthrie’s criteria for an expanded concept of topicalization: the topicalization of ‘new’ information. The tags used in the annotation of information status are illustrated with the following three examples: the evoked topic durian cake is repeated in (5.1); with my family in (5.2) represents shared knowledge but has not been mentioned in the preceding discourse and was therefore rated as ‘unused’; and, finally, the book in (5.3) is topicalized without any preparation and received the label ‘brand-new’:

(5.1) Evoked (‘E’)
A: Durian cake durian cake is nice
B: Yous you’ll hear a lot of background
A: But durian cake you must have word the season nah
(ICE-SIN:S1A-006#91–93)

(5.2) Unused (‘U’)
A: So then you are coming to Goa
B: Yes shortly
A: Let me know now
B: With my family I’ll come
(ICE-IND:S1A-001#66–69)

(5.3) Brand-new (‘B’)
B: I don’t want to attend any class
A: You have to listen You have to attend The book ah you have to chop
(ICE-SIN:S1A-084#112–115)

Whenever a token fitted one of Mesthrie’s (1992) expanded functions of topicalization, it was annotated accordingly. Higher frequency, another criterion listed by Mesthrie, was treated differently in this analysis. Comparing frequencies across South African varieties of English, Mesthrie notes that “fronting and dislocation occur quite frequently in SAIE, at a much higher frequency than other first-language varieties of English in South Africa” (1992: 113).5 Given Mesthrie’s restriction to South African Englishes, this criterion needs some reshaping to be applicable here. In the context of this book, it is more sensible to compare input varieties with postcolonial varieties.
In order to be able to talk about input varieties, ICE-Great Britain was analysed in addition to the four Asian varieties. Since no spoken ICE data are available for the United States, this corpus could not be included. The tags used for the annotation of Mesthrie’s expanded functions of topicalization are illustrated with the examples given in (5.4–5.7).
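The annotation schema in Table 5.2 lends itself to a simple record structure. The Python sketch below is one possible encoding, not the tool used in this study: the class, field, and function names are hypothetical, the tag abbreviations are those of Table 5.2, and it assumes that a token may carry several of Mesthrie’s expanded functions at once. It also encodes the distinction between ‘typical’ tokens (an NP functioning as a direct object) and ‘non-traditional’ tokens.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum


class Form(Enum):
    NP = "noun phrase"
    AdjP = "adjective phrase"
    AdvP = "adverb phrase"
    PP = "prepositional phrase"
    CL = "clause"


class InfoStatus(Enum):
    E = "evoked"
    U = "unused"
    B = "brand-new"


class SynFunction(Enum):
    S = "subject (embedded)"
    CS = "subject complement"
    CO = "object complement"
    OD = "direct object"
    OI = "indirect object"
    A = "adverbial"


class Expanded(Enum):  # Mesthrie's (1992) expanded functions, tags 0-4
    NONE = 0
    NOT_EVOKED = 1
    NOT_NP = 2
    FURTHER_PROCESSES = 3
    EMBEDDED_EXTRACTION = 4


class Discourse(Enum):
    EMP = "emphasis"
    CON = "topic continuity"
    COT = "contrast"
    SHF = "topic shifting"


class Persistence(Enum):
    no = "none"
    PS = "poset"
    ID = "identity"
    PN = "pronoun"
    Zero = "zero (implicit)"


@dataclass(frozen=True)
class Token:
    """One annotated token of topicalization (hypothetical record layout)."""
    source: str  # corpus reference of the excerpt
    form: Form
    info: InfoStatus
    function: SynFunction
    discourse: Discourse
    persistence: Persistence
    expanded: frozenset = field(default_factory=frozenset)

    @property
    def is_typical(self) -> bool:
        # 'Typical' topicalization: an NP functioning as a direct object.
        return self.form is Form.NP and self.function is SynFunction.OD


def tally(tokens, attr):
    """Count tokens per category of one criterion, e.g. tally(tokens, "form")."""
    return Counter(getattr(t, attr) for t in tokens)


# Invented tokens for illustration only; these are not annotations from the study.
demo = [
    Token("demo-1", Form.NP, InfoStatus.B, SynFunction.OD, Discourse.EMP,
          Persistence.no, frozenset({Expanded.NOT_EVOKED})),
    Token("demo-2", Form.CL, InfoStatus.E, SynFunction.OD, Discourse.CON,
          Persistence.PN, frozenset({Expanded.NOT_NP})),
]
print([t.is_typical for t in demo])  # [True, False]
print(tally(demo, "form"))
```

Using enumerations rather than free strings means that a mistyped tag fails immediately, which is one way of keeping a manual annotation consistent across five corpora.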

(5.4) Token is not evoked (‘1’)
B: I don’t want to attend any class
A: You have to listen You have to attend The book ah you have to chop
(ICE-SIN:S1A-084#112–115)

(5.5) Constituent is not an NP (‘2’)
A: The neighbours did not know what to make of the young man That he was a college lecturer and scientist they knew And he did experiments in small laboratory he had set up at his house
(ICE-IND:S1B-003#282–284)

(5.6) Interaction with further processes (‘3’)
A: Very nice movie it is just entertaining it is
(ICE-IND:S1A-052#247)

(5.7) Topic extraction from embedded clause (‘4’)
B: So now you see they have to immediately ship fifty k g
A: Immediately they say it’s not possible
(ICE-IND:S1A-094#81–82)

Most of these ‘functions’ are self-explanatory and require no further comment. However, it should be noted that the example in (5.6) shows a shift from an initiated SVX order to topicalization and stands in for various possible interactions with further syntactic processes, namely negation, wh-questions, yes/no-questions, copula deletion, and pro-drop. Another criterion in the annotation scheme is discourse function. For the present study, four major functions of topicalization were distinguished: emphasis, contrast, creating topic continuity, and topic shifting; for descriptions of these functions, see chapter 2. The corpus excerpts in (5.8–5.11) each illustrate one of these discourse functions.

(5.8) Emphasis (‘EMP’)
B: People I know
A: Don’t think so
(ICE-SIN:S1A-025#32–40)

(5.9) Topic continuity (‘CON’)
C: Well we sound like sore losers laughter
[…]
A: Hello That I will never agree with you
(ICE-PHI:S1A-073#307–311)

(5.10) Contrast (‘COT’)
A: Oh we never knew what happened to her The other we knew ah but Vidya we didn’t knew
(ICE-IND:S1A-021#58–59)

(5.11) Topic shifting (‘SHF’)
C: Is it a restaurant
E: It is a restaurant
C: And they have satay
D: They have satay Recently that man I saw
(ICE-SIN:S1A-037#153–157)

The last criterion in the annotation was topic persistence. Topic persistence refers to “the number of times the referent persists as an argument in the subsequent 10 clauses following the current clause” (Givón 1984a: 908). Unlike Givón (1984a) and Gregory and Michaelis (2001), who restrict the analysis of persistence to the next ten and five utterances, respectively, I did not limit it to a fixed window, nor did I use a scoring system. Instead, I noted how (if at all) the topic persisted in the entirety of the subsequent discourse. This was possible because all corpora were read in full, which means that persistence could be analysed and tagged across entire conversations. Topic persistence is a useful factor for testing whether or not topicalization contributes to establishing a topic in the discourse. Gregory and Michaelis (2001) comment on Givón’s idea of referential continuity (1984a), noting that “[t]he model makes sense, since the mere proffering of a topic by the speaker does nothing to ensure ratification of that topic by the hearer” (2001: 1693). Rather, “speaker-hearer consensus alone determines topic persistence, any attempt at topic establishment is subject to failure” (ibid.). This claim could be tested by a close annotation of topic persistence using the tags listed in Table 5.2, illustrated with examples (5.12–5.15). In each example, the topicalized constituent is in bold and the form in which the topic persists is underlined. This was not possible for implicit topic persistence, since there is no direct link to the topicalized constituent.
(5.12) Persistence via identical repetition (‘ID’)
B: The the one dollar I give you
Z: Uhm in other way I put the one dollar
(ICE-HK:S1A-061#847–848)

(5.13) Persistence via poset (‘PS’)
A: And also uhm Mandarin I don’t know
Z: Yeah And English
(ICE-HK:S1A-056#485–487)

(5.14) Persistence via pronoun (‘PN’)
A: Yeah Peking restaurant you’ve tried before And you don’t like it you said that
(ICE-HK:S1A-056#220–222)

(5.15) Implicit topic persistence (‘Zero’)
A: Preparation of the marriage I’m talking about
[…]
B: We’ve not discussed anyhing yet
A: You have not discussed
B: No no
A: And have you booked the hall?
(ICE-IND:S1A-095#165–171)6

The annotation allowed for in-depth qualitative and, to some extent, quantitative analysis. For quantitative comparisons across the corpora, however, the token distribution had to be normalized. Although scholars who compile ICE components are asked to adhere to the word count indicated in Table 5.1, identical word counts (of spoken data in particular) are impossible to achieve without setting artificial boundaries. The figures were therefore normalized in three steps. First, all corpora were cleared of unneeded markup. For this purpose, every file was opened in Notepad++ and cleaned by entering the regular expressions (RegEx) given in (5.16) in the search-and-replace window.

(5.16)
(a) Extra-corpus text (in … ) RegEx: ]*>(.*?)
(b) Untranscribed text (in …) RegEx: ]*>(.*?)
(c) Editorial comments (in …) RegEx: ]*>(.*?)
(d) All remaining (in-line) tags RegEx:

Table 5.3 indicates the functions of the symbols in the regular expressions. For convenience, both the tag symbols present in ICE and the symbols that have a specific function in RegEx are explained. The regular expressions reflect the pattern found in the corpora: they look for the opening tag, anything in between, and the closing tag. Every match found by the three regular expressions in (5.16a), (5.16b), and (5.16c) was checked manually because inaccurate tagging during compilation can result in missing slashes, which, in turn, would mean that a match extends to the next correctly formed closing tag.
In such a case, a potentially long and relevant part of the discourse would be deleted if all matches were deleted automatically.
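The clean-up step, and the risk posed by a missing slash, can be illustrated with Python’s `re` module. This is a hedged sketch rather than the procedure actually used (the files were cleaned in Notepad++): the three content patterns are a hypothetical reconstruction modelled on the ICE tag symbols X, O, and & listed in Table 5.3, `re.DOTALL` is assumed so that a tagged passage may span line breaks, and the ICE-style sample strings are invented.

```python
import re

# Hypothetical reconstruction of (5.16a-c): an ICE tag pair plus everything
# between the tags. The lazy (.*?) stops at the first matching closing tag.
CONTENT_TAGS = {
    "extra-corpus (X)": re.compile(r"<X[^>]*>(.*?)</X>", re.DOTALL),
    "untranscribed (O)": re.compile(r"<O[^>]*>(.*?)</O>", re.DOTALL),
    "editorial (&)": re.compile(r"<&[^>]*>(.*?)</&>", re.DOTALL),
}
# Counterpart of (5.16d): any remaining in-line tag, without surrounding text.
ANY_TAG = re.compile(r"<[^>]*>")


def matches_for_review(text):
    """List every (a)-(c) match so that it can be checked by hand before deletion."""
    return [
        (label, m.group(0))
        for label, pattern in CONTENT_TAGS.items()
        for m in pattern.finditer(text)
    ]


def clean(text):
    """Delete tag pairs with their content, then bare tags; tidy the whitespace."""
    for pattern in CONTENT_TAGS.values():
        text = pattern.sub("", text)
    text = ANY_TAG.sub("", text)
    return " ".join(text.split())


ok = "<$A> <#> We went there <O>laughter</O> yesterday <&>note</&> ok"
print(clean(ok))  # We went there yesterday ok

# A missing slash ("<O>" instead of "</O>") lets the lazy match run on to the
# next well-formed closing tag, swallowing real discourse in between.
broken = "<#> <O>laughs<O> we agreed <#> really <O>coughs</O> fine"
for label, span in matches_for_review(broken):
    print(label, "->", span)
```

In the second sample, the only match runs from the first `<O>` to the final `</O>`, so deleting it automatically would remove "we agreed" and "really". This is why each match was inspected before deletion.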

Table 5.3 Explanation of the symbols in the regular expressions

Symbol | Function
Tag symbols
< > | Indicate a tag
/ | Indicates the closing tag
X, O, & | Symbols used in ICE annotation
RegEx symbols
\s | Matches any kind of whitespace
* | Indicates that the preceding element may occur zero or more times
. | Matches any single character (by default, except a line break)
[^] | Negated character class: matches any character not listed after the caret, which ensures that any text in addition to the tag symbol is identified
() | Groups a set of symbols
? | Makes the preceding quantifier lazy, so that the search stops at the first matching closing tag

Only the first three tags required the deletion of additional material between the opening and the closing tags. The tags <X> and </X>, for instance, indicate extra-corpus text, which often comes from speakers who were recorded and whose utterances were transcribed for the sake of completeness but who are not speakers of the variety represented by the corpus (for example, an Australian speaker in ICE-Hong Kong). Information between <O> and </O> would also critically distort the word count; laughter, for instance, is often indicated between these tags as laughs, laughter, and so forth. Finally, <&> and </&> and everything in between were deleted, since these mark editorial comments such as translations, additional information, and so forth. Once these tags had been deleted, the remaining unnecessary tags could be removed with a simple RegEx (5.16d), since no further text occurring between tags had to be deleted. Indigenous and unclear words (but not the tags marking them) were kept in the text because they could contribute to the discourse and deleting them would distort the actual word count. After cleaning, the words were counted and used as the basis for normalization. The standard formula in (5.17) only required an adjustment of values: because of the relatively low number of tokens, the normalization was based on 100,000 words instead of a million words.

$$ f_{100{,}000} = \frac{n(\text{tokens})}{n(\text{total words})} \times 100{,}000 \qquad (5.17) $$

The resulting number indicates the expected frequency of topicalization in 100,000 words; an alternative would have been to measure the number of utterances that feature topicalization against those that do not. The word counts obtained after cleaning are given in Table 5.4.
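Formula (5.17) amounts to a one-line function. In this sketch, the function name is hypothetical and the token count in the usage line is invented; only the word count (the ICE-Great Britain total) is taken from Table 5.4.

```python
def per_100k(n_tokens: int, n_words: int, base: int = 100_000) -> float:
    """Normalized frequency following (5.17): tokens per `base` words."""
    if n_words <= 0:
        raise ValueError("the word count must be positive")
    # Multiply before dividing so that round figures stay exact.
    return n_tokens * base / n_words


# Invented token count; 245,339 is the cleaned ICE-Great Britain word count.
print(round(per_100k(100, 245_339), 2))
```

Because the denominator is each corpus’s own cleaned word count, the resulting rates can be compared across components of slightly different sizes.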

Table 5.4 Word counts in the analysed ICE components after normalization

ICE corpus | Direct conversations | Telephone calls | Classroom lessons | Total
ICE-Great Britain | 183,373 | 20,271 | 41,695 | 245,339
ICE-Hong Kong | 212,136 | 28,293 | 49,466 | 289,895
ICE-India | 193,618 | 23,465 | 43,707 | 260,790
ICE-Philippines | 193,372 | 23,929 | 45,929 | 263,230
ICE-Singapore | 179,805 | 20,012 | 40,430 | 240,247

In total, the word counts amount to 1,299,501 words, which is slightly more than the estimated 1.2 million words indicated in Table 5.1.

5.4 Problematic cases and limitations

Deciding whether a token is a case of topicalization is not a trivial task, particularly in non-standard varieties of English. For this reason, this section discusses some difficult cases as well as certain limitations of the dataset. Many tokens were tagged at first, but a considerable number had to be left out of the analysis because they were ambiguous or problematic in some other way. Consider, for instance, the discourse excerpt in (5.18).

(5.18)
A: word scared of all these
B: Ah
B: The roller coaster I dunno lah Maybe I when I’m going to take I feel scared lah
(ICE-SIN:S1A-085#271–274)

Examples such as (5.18) are ambiguous from a discourse-pragmatic perspective. I dunno has been noted as a discourse-pragmatic feature, defined by Pichler as


a formally heterogeneous category of syntactically optional elements which make little or no contribution to the truth-conditional meaning of their host units and – depending on their scope, linguistic co-text as well as sequential, situational and cognitive context – perform one or more of the following macro-functions: to express speaker stance; to guide utterance interpretation; and to structure discourse. (2013: 4)

Discourse-pragmatic features or discourse markers can serve subjective functions, textual functions, or a combination of both. Subjective functions relate to speaker attitudes and face (cf. Coates 1996: 156; Pichler 2007: 178); textual functions include “repair, hesitation and turn-exchange devices” (Pichler 2007: 179).7 In terms of their form, ambiguous tokens that could be mistaken for instances of topicalization are frequently sentence-final comment clauses. Clauses of this kind “are loosely connected to the main clause, they normally lack an explicit link, and they are usually short and can appear in a variety of positions” (Biber et al. 1999: 197). Moreover, they are “usually in the present rather than past tense, first or second rather than third person, and comment on a thought rather than the delivery of a wording” (ibid.).

There are several possible interpretations of (5.18). The first is to consider I dunno as a comment clause functioning as a discourse-pragmatic device. In this particular example, it would indicate hesitation: Since speaker B finishes their thought in the next line, it could be interpreted as a gap-filler. The second is to analyse The roller coaster I dunno lah as a non-canonical sentence with the structure OSV, followed by the discourse particle lah. An interpretation of (5.18) as a case of topicalization is plausible, since speaker B apparently does not know the roller coaster from first-hand experience; as the next line indicates, they have not ridden it before. Understanding I dunno as a gap-filler remains possible, however. It seems intuitively plausible that the roller coaster generally serves to set the frame but is, technically speaking, a subject standing in isolation. To complicate things even further, left-dislocation is also an option. In that case, the utterance following the critical expression would have to be understood as laid out in (5.19):

(5.19)
A: word scared of all these
B: Ah
B: The roller coaster I dunno lah Maybe I when I’m going to take Ø I feel scared lah
(ICE-SIN:S1A-085#271–274)

If the example is indeed a case of left-dislocation, the resumptive pronoun it picking up the fronted constituent is omitted. All in all, (5.18) leaves three possible interpretations:


(a) as a case of topicalization with the pattern OSV;
(b) as a discourse marker filling a gap/indicating hesitation;
(c) as a case of left-dislocation.

Finding a solution to problems of this nature is challenging. Access to the audio versions of the analysed transcriptions would have helped, because the intonation of I dunno could reveal its intended function in the utterance. Regarding the intonation of topics and comments, Wells writes that “[t]he topic is typically said with a non-falling tone (a dependent fall-rise or rise), the comment with a falling tone (a definitive fall)” (2006: 72). In addition, there is a tendency to deaccent old information. In (5.20), for example, the accent falls on adore (the apostrophe marks the stressed syllable) and not on the previously mentioned dogs:

(5.20) D’you object to dogs? No, I a’dore dogs. (Wells 2006: 109)

In ICE, the tags for pauses indicated by and may at times help to decide whether topicalization is likely or unlikely, since a longer pause might indicate hesitation (Nelson 2002: 4). Because of the amount of subjective interpretation necessary, (5.18) was excluded from further analysis.

Another frequent problem is the omission of prepositions. The example in (5.21) shows a case of ambiguity caused by such an omission:

(5.21)
B: Uh exam day ba
A: Uh-huh
B: Aside from today
A: Yeah
B: Is uh is Wednesday
A: So so you mean to say Monday you don’t have
B: No I don’t have
(ICE-PHI:S1A-013#162–168)

Speaker A wants to ask whether speaker B means that they do not have exams on Monday, but omits both the preposition and the direct object in doing so. This leads to two possible interpretations: Monday could simply be an adverbial, but it also looks like a topicalized direct object and could, accordingly, be interpreted as such. In such cases, the context was taken into consideration to identify the more likely option.
In this particular conversation, Monday was labelled as an irrelevant case. In some cases, indigenous words in the transcriptions also led to problems of interpretation. Since the annotation provides no translation, examples like the second utterance by speaker C in (5.22) could only be treated as irrelevant cases. However, the preceding discourse strongly suggests that poem was picked up as a topic with the pronoun that. Additionally, unlike in other cases, there is no alternative interpretation as likely as topicalization. Again, a case-by-case examination was necessary to decide whether such examples could be counted or should be dismissed.

(5.22)
C: Ay but that reminds me of the poem
B: What poem
C: That you’ll ‘no yung kanina
(ICE-PHI:S1A-056#159–161)

One of the most critical phenomena in the interpretation of topicalization is ellipsis. In (5.23), for instance, speaker A omits several words before cinema:

(5.23)
B: Tsk it’s true
A: Tell me about it I would not gossip about it Cinema you can hor
B: Excuse me everybody in the cinema was sitting word
(ICE-SIN:S1A-021#329–333)

Biber et al. describe ellipsis as “a pervasive feature of conversational dialogue” (1999: 1099). It means that certain parts of an utterance are omitted to make communication more efficient. By “build[ing] on the content of what a previous speaker has said” (ibid.), the conversation moves at a quicker pace and repetition is avoided. For a case such as (5.23), this is problematic: Since cinema stands in sentence-initial position and you is the subject, cinema could be interpreted as a topicalized direct object. A more likely scenario, however, is that cinema represents a shortened version of in the cinema. In that case, topicalization is no longer the strongest interpretation, since a fronted in the cinema is unmarked in this utterance. In general, cases that involved ellipsis were discarded. All unclear cases were evaluated with a second rater.

In addition to the problematic cases discussed above, some methodological problems concerning the quality of the data and the data analysis need to be discussed. A well-known problem tied to working with spoken data is the ‘Observer’s Paradox’. As mentioned above, non-standard features are much more likely to occur in spoken language (or, more accurately, language of immediacy). Research questions such as the ones posed in this study can therefore only be answered properly when the data is as natural as possible. The methodological problem arising from this desideratum was most famously noted by Labov (1972), who points out that


the aim of linguistic research in the community must be to find out how people talk when they are not being systematically observed; yet we can only obtain these data by systematic observation. (1972: 209)8

The ICE corpora are among the best sources for comparative analyses of World Englishes. Even so, they clearly include speakers who are actively aware that they are being recorded (examples 5.24 and 5.25) or who engage in other forms of meta-discourse about being recorded (examples 5.26 and 5.27).

(5.24)
A: Yeah actually I’m the subject of the interview today
(ICE-HK:S1A-089#3)

(5.25)
C: You’re going to be transcribing all of this are you
B: Yes Yes Some poor little soul’s got to do it
C: So to say any nasty words then
B: No no We’re not to mention We’re not to ment mention the fact the tape recorder’s on or it ruins things And talk fairly quickly yeah
(ICE-GB:S1A-053#137–145)

(5.26)
A: You know what
B: What
A: I feel so embarrassed doing this thing
B: Yeah
A: Recording our voice
B: A lot of people are looking at us
A: Oh my gosh Okay anyway continue continue
B: Don’t mind them
(ICE-PHI:S1A-039#278–286)

(5.27)
C: Starting now
A: Thirty minutes of this
C: So you said you’re gonna throw laughs
A: Thanks a lot
B: Start that now
A: Should we make sense
B: Do we have to make sense
C: I don’t know do we have to
B: We can just babble for thirty minutes and I’ll make sense just a little babble babble like what I’m doing now laughs
(ICE-PHI:S1A-055#3–11)

Effects of the Observer’s Paradox potentially taint the data, because the speakers’ awareness of being recorded might influence their speech, at the very least on a subconscious level. As a result, speakers may attempt to speak in a more ‘standard’ manner and avoid many of the features of interest to linguistic analysis.

Another important issue is the comparatively high degree of subjectivity in reading and tagging a corpus manually. In his introduction to corpus linguistics, Mukherjee (2009: 25) discusses the issue of ‘exhaustivity’ (or thoroughness) when working with corpus data. The use of computer-based software allows for an exhaustive analysis of corpora that is nearly free of human error; Mukherjee thus calls an automated corpus analysis ‘reliable’ (ibid.). Many questions (in particular those concerned with lexical items or unambiguously identifiable morphosyntactic features) benefit greatly from the ever-growing possibilities offered by modern technology. Automated analysis is often viewed as a completely objective approach; however, subjectivity affects the work of all scholars to some degree. In automated analysis, it is present in, for example, selecting the data, deciding on an approach, choosing specific software, choosing a theoretical framework, and so forth. Computer-based software is a highly useful resource for corpus analysis, yet it is (thus far) incapable of identifying all cases of topicalization: This non-canonical construction is syntactically complex and can only be identified by reading the transcriptions in their full context. Therefore, manual analysis was the method of data evaluation for this study.

Notes

1 Currently, there are corpora for Canada, East Africa, Great Britain, Hong Kong, India, Ireland and SPICE Ireland, Jamaica, New Zealand, the Philippines, and Singapore, as well as the written components for Nigeria, Sri Lanka, and the United States. Projects currently in progress are the ICE corpora for Australia, the Bahamas, Ghana, Gibraltar, Malta, Namibia, Pakistan, Scotland, South Africa, Trinidad & Tobago, and Uganda, as well as the missing components for partially available corpora.

2 In the present day, chat communication (for instance via WhatsApp, Facebook, and similar platforms) represents a strong case of written communication close to the ‘language of immediacy’ pole of Koch and Oesterreicher’s continuum.

3 Ellis (1997) refers primarily to the SLA classroom. However, his statement extends to bilingual classrooms involving students with varying proficiency in English.

4 It should be mentioned that the ICE classroom lessons were recorded in institutions of higher education.

5 The constraint (at the time of Mesthrie’s 1992 monograph) was the lack of data available for comparison. Over the last 25 years, the data situation has clearly improved in this regard.

6 Possible mistakes in the transcriptions (such as anyhing in the example) were not ‘corrected’.


7 Pichler explicitly links these functions to her corpus of English in Berwick-upon-Tweed.

8 A call for developing methods to overcome the challenge of the Observer’s Paradox has been made by Rickford (1987: 154) and many others. Recently, Rüdiger (2016) proposed the “cuppa coffee” data collection method for sociolinguistic interviews: “[T]he simple act of framing the sociolinguistic interview as new acquaintances drinking a cup of coffee together helps to avoid a language learning and teaching framework, puts participants in a more relaxed mindset and finally results in more ‘naturalistic’ and richer conversational data” (2016: 49).


6 Forms, functions, and frequencies of topicalization

Previous research has hinted at (comparatively) high frequencies of topicalization in Asian Englishes (Lange 2012a; Winkle 2015) and innovative usage patterns in L2 varieties of English (Mesthrie 1992). This chapter builds on this research and looks at topicalization in Asian Englishes in more detail by answering two central questions of this book: (1) What are the frequencies, forms, and functions of topicalization in HKE, IndE, PhilE, and SinE, and do they differ significantly from BrE? (2) Do Mesthrie’s ‘expanded functions’ of topicalization (1992) apply to the analysed varieties of English? For this purpose, I present the results of the corpus analysis by combining quantitative and qualitative methods. First, the frequencies of topicalization across varieties are investigated. This is followed by an analysis and a discussion of the syntactic forms and the information status of the identified topicalized constituents. Finally, the syntactic functions and the discourse functions of topicalization are discussed. These aspects, together with the distribution according to Mesthrie’s criteria for an expanded concept of topicalization, reinforce my claim that the understanding of topicalization needs to be expanded in order to account for all usage patterns in spoken (non-standard) varieties of English.

6.1 Frequencies of topicalization

The first major section of this chapter considers the frequencies of topicalization by asking the following three questions: (1) How often do speakers of HKE, IndE, PhilE, and SinE use topicalization in ICE, and how do these frequencies compare to British English? (2) What is the ratio of TOP1 (tokens of topicalization in a ‘traditional’ sense) to TOP2 (tokens of topicalization in an ‘expanded’ sense) in each variety? (3) Are there differences depending on genre, that is, between direct conversations, phone calls, and classroom lessons?


In order to answer these questions, the frequencies in each genre are first presented individually. A comparison then offers a perspective on differences between varieties and across genres.

6.1.1 Direct conversations

Files containing direct conversations between speakers represent the bulk of the spoken files in ICE. ICE is designed to allow direct comparisons between varieties. Even so, the actual word count in each corpus sometimes differs considerably from the figures given as orientation for researchers and from the other ICE corpora. Raw token counts therefore indicate tendencies but are unsuitable for direct statistical comparison. For this reason, the figures were normalized according to the procedure described in section 5.3. Table 6.1 gives both the actual number of tokens and the normalized figures in the conversation files. Figure 6.1 shows a stacked bar chart, in which the lower bars represent ‘traditional’ tokens (TOP1) and the upper bars represent cases according to Mesthrie’s expanded concept of topicalization (TOP2).1 The differences in topicalization frequencies between the varieties are highly significant (X-squared = 121.16, df = 4, p-value < 2.2e-16).2 This is not surprising, given that ICE-India has a much higher number of tokens, both in absolute and normalized figures and for TOP1 and TOP2 alike. ICE-Singapore comes in second place, with figures somewhat above half of ICE-India’s count. Interestingly, the lowest number of tokens was identified not for ICE-Great Britain but for ICE-Hong Kong. Overall, the numbers in Table 6.1 and the size of the bars in Figure 6.1 suggest a relatively similar distribution of TOP1 and TOP2 across varieties. SinE, however, shows the largest share of tokens that are only acceptable if an expanded definition is applied (64 of 105, or 60.95%). HKE is the only other variety in which such tokens form a majority (21 of 38).

Table 6.1 Overall frequency of topicalization across the direct conversations

| | ICE-GB | ICE-HK | ICE-IND | ICE-PHI | ICE-SIN |
|---|---|---|---|---|---|
| TOP1 tokens | 30 | 17 | 110 | 19 | 41 |
| TOP1 normalized | 16.36 | 8.01 | 56.81 | 9.83 | 22.80 |
| TOP2 tokens | 25 | 21 | 97 | 18 | 64 |
| TOP2 normalized | 13.63 | 9.9 | 50.1 | 9.31 | 35.59 |
| Overall tokens | 55 | 38 | 207 | 37 | 105 |
| Overall normalized | 29.99 | 17.91 | 106.91 | 19.13 | 58.39 |
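Formula (5.17) is simple enough to check directly against the tables. The sketch below is an illustration rather than the author's code. The dictionary and function names are made up for the example. The inputs are the cleaned word counts of the direct conversations from Table 5.4 and the overall token counts from Table 6.1.

```python
# Formula (5.17): normalized frequency = n(tokens) / n(total words) * 100,000.
BASE = 100_000

# Cleaned word counts of the direct conversations (Table 5.4).
DC_WORDS = {
    "ICE-GB": 183_373,
    "ICE-HK": 212_136,
    "ICE-IND": 193_618,
    "ICE-PHI": 193_372,
    "ICE-SIN": 179_805,
}

# Overall topicalization tokens (TOP1 + TOP2) in the direct conversations (Table 6.1).
DC_TOKENS = {"ICE-GB": 55, "ICE-HK": 38, "ICE-IND": 207, "ICE-PHI": 37, "ICE-SIN": 105}


def normalize(tokens, total_words, base=BASE):
    """Expected number of tokens per `base` words."""
    return tokens / total_words * base


normalized = {v: round(normalize(DC_TOKENS[v], DC_WORDS[v]), 2) for v in DC_WORDS}
print(normalized)
```

This reproduces the ‘Overall normalized’ row of Table 6.1 for four of the five varieties. For ICE-SIN, the formula yields 58.40 against the reported 58.39. The difference lies in the last digit and presumably stems from rounding.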


Figure 6.1 Frequency of topicalization across varieties, direct conversations.

6.1.2 Phone calls

For the most part, the overall frequency of topicalization and the distribution of TOP1 and TOP2 in the phone calls are similar to those in the direct conversations. Table 6.2 gives the number of tokens found in the phone calls, while Figure 6.2 is a graphic representation of the normalized TOP1 and TOP2 counts.

Table 6.2 Overall frequency of topicalization across the phone calls

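| | ICE-GB | ICE-HK | ICE-IND | ICE-PHI | ICE-SIN |
|---|---|---|---|---|---|
| TOP1 tokens | 4 | 4 | 17 | 2 | 8 |
| TOP1 normalized | 19.73 | 14.14 | 72.45 | 8.36 | 39.98 |
| TOP2 tokens | 2 | 4 | 5 | 0 | 7 |
| TOP2 normalized | 9.87 | 14.14 | 21.31 | 0 | 34.98 |
| Overall tokens | 6 | 8 | 22 | 2 | 15 |
| Overall normalized | 29.6 | 28.28 | 93.76 | 8.36 | 74.96 |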


Figure 6.2 Frequency of topicalization across varieties, phone calls.

Statistically, the difference between the varieties is, as in the conversation files, highly significant (X-squared = 108.84, df = 4, p-value < 2.2e-16). However, some differences in the distribution of tokens can be observed. ICE-India again shows the highest frequency of topicalization, but ICE-Singapore comes much closer to the Indian frequencies than in the conversation files. In the direct conversations, the Singapore corpus reaches just over half (about 55 per cent) of the Indian normalized frequency, whereas the figure is about 80 per cent in the phone calls. Another major difference can be seen for ICE-Philippines, where no TOP2 tokens and few TOP1 tokens were found. The very low overall number of tokens in the phone calls, however, suggests that these results should be treated with caution. Although the normalized figures technically predict a complete absence of TOP2 tokens, it is rather unlikely that none would occur in a corpus of 100,000 words. Moreover, the phone calls have the lowest word count of the three text types, at approximately 20,000 words per variety. Bautista also notes for ICE-Philippines that the phone calls were recorded by students as part of their graduate class rather than in a natural setting (2004: 12). This might have affected the data even more than would normally be expected in corpus compilation.

6.1.3 Classroom lessons

The classroom lessons represent a different kind of setting compared to the direct conversations and phone calls. The relation between the teacher and the students is relatively static, and discussions vary more in how spontaneous and interactive they turn out to be. In spite of this, some trends identified in the other two sub-corpora are corroborated. Again, most tokens were found in the Indian corpus, while the lowest numbers were found in ICE-Hong Kong. Interestingly, the classroom lessons in ICE-Philippines show only a slightly lower normalized frequency of topicalization than those in ICE-Singapore, which differs markedly from the findings for the conversations and phone calls. The numbers for topicalization across all analysed classroom lessons in ICE are shown in Table 6.3 and the distribution is visualized in Figure 6.3.

Table 6.3 Overall frequency of topicalization across the classroom lessons

| | ICE-GB | ICE-HK | ICE-IND | ICE-PHI | ICE-SIN |
|---|---|---|---|---|---|
| TOP1 tokens | 5 | 2 | 18 | 7 | 6 |
| TOP1 normalized | 11.99 | 4.04 | 41.18 | 15.24 | 14.84 |
| TOP2 tokens | 3 | 3 | 9 | 2 | 2 |
| TOP2 normalized | 7.2 | 6.06 | 20.6 | 4.35 | 4.95 |
| Overall tokens | 8 | 5 | 27 | 9 | 8 |
| Overall normalized | 19.19 | 10.11 | 61.78 | 19.59 | 19.79 |

Figure 6.3 Frequency of topicalization across varieties, classroom lessons.


Table 6.4 Overall frequency of topicalization across varieties (totals)

| | ICE-GB | ICE-HK | ICE-IND | ICE-PHI | ICE-SIN |
|---|---|---|---|---|---|
| TOP1 tokens | 39 | 23 | 145 | 28 | 55 |
| TOP1 normalized | 15.89 | 7.93 | 55.6 | 10.64 | 22.89 |
| TOP2 tokens | 30 | 28 | 111 | 20 | 73 |
| TOP2 normalized | 12.23 | 9.66 | 42.56 | 7.6 | 30.39 |
| Overall tokens | 69 | 51 | 256 | 48 | 128 |
| Overall normalized | 28.12 | 17.59 | 98.16 | 18.24 | 53.28 |

Once more, the difference in frequency between the varieties is highly significant (X-squared = 63.571, df = 4, p-value = 5.146e-13). In the classroom lessons, there are generally fewer TOP2 tokens than in the other sub-corpora. HKE is an exception: In its lessons, TOP2 tokens (3) outnumber TOP1 tokens (2). They also narrowly do so in the conversations (21 vs. 17), while the two are identical in the phone calls. PhilE stands out by showing a much higher frequency of topicalization in the classroom lessons than in the other files. This finding can certainly be attributed, in part, to the very heterogeneous and unpredictable nature of interaction in the classroom.

6.1.4 Comparison

The previous sections compared the frequencies of topicalization for quite different genres, but what happens when the overall frequencies are compared? As a first step towards an answer, the figures for all tokens across all three sub-corpora are given in Table 6.4 and visualized in Figure 6.4. Figure 6.5 gives a potentially more revealing overview. It compares varieties and frequencies across the three genres, as well as ‘traditional’ (TOP1) versus ‘expanded’ (TOP2) topicalization. The x-axis indicates TOP1 and TOP2 as well as genre (DC = direct conversations, PC = phone calls, CL = classroom lessons). In all three sub-corpora, IndE consistently shows the highest frequencies of topicalization; the only exception is the number of TOP2 tokens in the phone calls. The strong tendency of IndE speakers to use topicalization is in line with previous cross-varietal investigations (Winkle 2015) and confirms studies focusing on the syntax of IndE (Lange 2012a). The precise numbers differ, but the overall picture remains unchanged in this regard.
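Research question (2) asks for the ratio of TOP1 to TOP2 tokens in each variety. A minimal sketch can express that ratio as the share of TOP2 tokens among all tokens, using the totals from Table 6.4. The names are illustrative and not taken from the study. The same helper yields the 60.95% reported for the Singaporean conversations (64 of 105 tokens).

```python
# TOP1 and TOP2 token counts summed over all three genres (Table 6.4).
TOKENS = {
    "ICE-GB": (39, 30),
    "ICE-HK": (23, 28),
    "ICE-IND": (145, 111),
    "ICE-PHI": (28, 20),
    "ICE-SIN": (55, 73),
}


def top2_share(top1, top2):
    """Percentage of tokens that qualify only under the expanded (TOP2) definition."""
    return 100 * top2 / (top1 + top2)


shares = {variety: round(top2_share(*counts), 2) for variety, counts in TOKENS.items()}

# List varieties from the highest to the lowest TOP2 share.
for variety, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{variety}: {share:.2f}% TOP2")
```

Across all genres, SinE and HKE lie above 50 per cent on this measure. The other three varieties lie between about 42 and 43.5 per cent.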


Figure 6.4 Overall frequency of topicalization across varieties (totals).

Figure 6.5 Frequency comparison of topicalization across varieties and genres.


With high numbers in all three sub-corpora, SinE is relatively consistent. PhilE is an unusual case: topicalization occurs more frequently in the PhilE conversations than in the HKE conversations, but in the phone calls PhilE shows the lowest numbers identified for any variety in any setting. In the classroom lessons, on the other hand, it nearly ranks second. HKE also shows low frequencies compared to IndE and SinE, whereas BrE ranks in the middle in almost every sub-corpus. The differences in frequency between the varieties are significant in all the sub-corpora as well as when combined (X-squared = 107.44, df = 4, p-value < 2.2e-16). It should be noted again, however, that the classroom lessons cannot be lumped together with the other files without running into methodological problems. The distribution between TOP1 and TOP2 is fairly even in the conversations, but there are fewer TOP2 tokens in the classroom lessons. The major findings from this first section are, accordingly, that IndE features particularly high frequencies of topicalization and that BrE, perhaps surprisingly, does not feature significantly lower frequencies than the Asian varieties. HKE stands out in that it features comparatively few tokens of topicalization, while PhilE behaves inconsistently across the different corpus components.
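The reported test statistics can be reproduced from the tables. This suggests, although the text does not say so, that each test was a chi-squared goodness-of-fit test on the five overall normalized frequencies against a uniform expectation. That is the default of R's `chisq.test` when it is given a single vector. The sketch below re-runs this computation in plain Python; the function names are illustrative. The p-value uses the closed-form upper tail that exists for an even number of degrees of freedom, which covers df = 4 here.

```python
import math


def chi_square_uniform(observed):
    """Goodness-of-fit statistic against equal expected values; returns (X-squared, df)."""
    expected = sum(observed) / len(observed)
    stat = sum((o - expected) ** 2 / expected for o in observed)
    return stat, len(observed) - 1


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-squared variable; this closed form is valid for even df only."""
    if df % 2:
        raise ValueError("this closed form only holds for an even number of degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


# Overall normalized frequencies per 100,000 words in the order GB, HK, IND, PHI, SIN
# (Tables 6.1 to 6.4).
OVERALL_NORMALIZED = {
    "direct conversations": [29.99, 17.91, 106.91, 19.13, 58.39],
    "phone calls": [29.6, 28.28, 93.76, 8.36, 74.96],
    "classroom lessons": [19.19, 10.11, 61.78, 19.59, 19.79],
    "all genres": [28.12, 17.59, 98.16, 18.24, 53.28],
}

for genre, values in OVERALL_NORMALIZED.items():
    stat, df = chi_square_uniform(values)
    p = chi2_sf_even_df(stat, df)
    print(f"{genre}: X-squared = {stat:.2f}, df = {df}, p = {p:.3e}")
```

Because the test runs on rates rather than raw counts, the statistic scales with the chosen base. Expressing the same rates per 1,000,000 words would multiply every X-squared value by ten. R prints p-values below roughly 2.2e-16 as ‘< 2.2e-16’, which is why only the classroom lessons receive an exact p-value in the text.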

6.2 Forms of topicalization

The previous section introduced the frequencies of topicalization, which establishes how often topicalization occurs in the selected varieties. Building on this information, this section addresses the question of which form(s) topicalization takes in Asian Englishes. In this context, ‘form(s)’ is used as an umbrella term, since the analysis in this sub-chapter encompasses syntactic form as well as information status. Again, a series of questions guides the section: (1) Which syntactic forms do topicalized constituents have (and how often do they occur)? (2) Why are clauses and long constituents rarely topicalized? (3) What is the information status of topicalized tokens, that is, is the topicalized information new or old in the discourse? (4) How do so-called hanging topics, that is, syntactically loosely connected topics, fit into the study of topicalization? Based on the answers to these questions, this sub-chapter lends support to Mesthrie’s (1992) idea of an expanded concept of topicalization. It does so by presenting tokens that are not NPs, not evoked, and, in the case of hanging topics, not even syntactically related to the remainder of the sentence. From here onwards, I do not differentiate between the three analysed sections of ICE unless necessary, in which case this is stated accordingly.


6.2.1 Constituent form

This section focuses on the syntactic form of constituents and the correlation between the length of constituents and topicalization. Mesthrie lumps function and form together when he speaks of “subject NPs” (1992: 113), which are the most frequently left-dislocated constituent in SAIE. He further identifies topicalized objects as well as constructions with the semantic roles of temporals, locatives, accompaniment, genitives, dative of purpose, goal, instrument, means, cause, beneficiary/recipient, dative to, comparatives, and others (Mesthrie 1992: 113, 120). In order to avoid mixing up syntactic and semantic categories, I annotated all tokens for their syntactic form and function as well as their function in the discourse. Prototypical topicalization involves an object, frequently an NP, in lieu of the subject in sentence-initial position. Any formal realization other than an NP was therefore treated as a deviation from this pattern relevant to this criterion. Table 6.5 gives the absolute numbers, normalized figures (for comparison), and percentages of the syntactic forms across all varieties. Figure 6.6 depicts the mean values of the percentage distribution of all forms from most to least frequent. The percentages in the table show how frequent each category is in a variety compared to its relative frequency in the other varieties. As expected, most topicalization occurs with NPs. In fact, around three quarters or more of all constituents are NPs in each variety. PhilE stands out in that it has, by far, the highest share of NPs, at almost 90 per cent. Because of this high share of NPs, it has notably fewer topicalized AdvPs and PPs than the other varieties. Across all other varieties, the distribution is fairly similar. As can be seen in the table, topicalized AdjPs, AdvPs, PPs, and clauses could be identified as well, although they occur much less frequently than NPs.

Table 6.5 Syntactic forms of topicalized constituents across varieties (absolute frequencies and relative frequencies)

| Form | Measure | ICE-GB | ICE-HK | ICE-IND | ICE-PHI | ICE-SIN |
|---|---|---|---|---|---|---|
| NP | absolute | 52 | 38 | 193 | 43 | 95 |
| NP | normalized | 21.19 | 13.11 | 74 | 16.34 | 39.54 |
| NP | % | 75.36 | 74.51 | 75.39 | 89.58 | 74.22 |
| AdjP | absolute | 4 | 5 | 20 | 3 | 12 |
| AdjP | normalized | 1.63 | 1.72 | 7.67 | 1.14 | 4.99 |
| AdjP | % | 5.8 | 9.8 | 7.81 | 6.25 | 9.37 |
| AdvP | absolute | 3 | 2 | 7 | 0 | 4 |
| AdvP | normalized | 1.22 | 0.69 | 2.68 | 0 | 1.66 |
| AdvP | % | 4.35 | 3.92 | 2.73 | 0 | 3.13 |
| PP | absolute | 7 | 6 | 28 | 2 | 17 |
| PP | normalized | 2.85 | 2.07 | 10.74 | 0.76 | 7.08 |
| PP | % | 10.14 | 11.76 | 10.94 | 4.17 | 13.28 |
| Clause | absolute | 3 | 0 | 8 | 0 | 0 |
| Clause | normalized | 1.22 | 0 | 3.07 | 0 | 0 |
| Clause | % | 4.35 | 0 | 3.13 | 0 | 0 |

Figure 6.6 Distribution of syntactic forms in percentages (across all corpora).

In the following sections, I provide details on each topicalized constituent form and discuss noteworthy examples.

Noun Phrases

Topicalized NPs may be repetitions of previously mentioned entities, newly introduced entities, or anaphoric pronouns. If a speaker wishes to refer to a previously mentioned entity, a demonstrative determiner or an anaphoric pronoun may be used. Examples (6.1) and (6.2) from ICE-India contain topicalized anaphoric pronouns:

(6.1)
C: Uh have you been to the temple which are on the beach?
B: Yes on the beach uh I don’t know One Maha
C: Called Mahabalipuram
B: Yeah that I’ve seen
(ICE-IND:S1A-029#135–139)


(6.2)
A: But if you need a for the passport you need duplicate do
C: No I don’t suppose but you need always originals If she has a passport it’s well and good otherwise great problem yaar
A: Yeah That she has I think
(ICE-IND:S1A-037#243–247)

Anaphoric pronouns are located at the ‘activated’ end of the Givenness Hierarchy (Gundel et al. 1993) and would count as activated on Lambrecht’s (1994) scale. This is because anaphoric pronouns necessarily represent given information, with ‘given’ explicitly referring to discourse-old information in this case. An analysis of information status confirms this: every anaphoric pronoun in sentence-initial position found in the corpora was annotated as containing evoked information. The situation is slightly different for demonstrative determiners, as they may also refer to extralinguistic entities. An NP such as this car may refer to a car already mentioned in the discourse (‘This car [you were just talking about]’). It may also refer to a car that is, for instance, in the immediate field of vision (‘This car [right next to me]’). Gundel et al. (1993) thus differentiate between this N in activated mode and this N in referential mode, which is closer to the indefinite end of the scale. In the analysed corpora, the vast majority of demonstrative determiners represent evoked information. The exchange in (6.3) shows such a case, while (6.4) features a demonstrative determiner introducing discourse-new information.
(6.3)
C: In Singapore at least none of the I mean tech officers will go down and confirm that that is the writing of someone
D: But this kind of things they will only do if it’s something real serious like
C: Yah (ICE-SIN:S1A-005#305–307)

(6.4)
C: Is it a restaurant
E: It is a restaurant
C: And they have satay
D: They have satay Recently that man I saw (ICE-SIN:S1A-037#153–157)

In (6.4), speaker D starts talking about that man without any announcement or antecedent in the preceding discourse. Although the determiner that suggests an unusually high degree of definiteness not commonly associated with new information, the man appears to be related, in the speaker’s mind, to the fact that the restaurant offers satay. Interestingly, the following discourse makes no further mention of the man, which means that the speaker


Table 6.6 Topicalized anaphoric pronouns and demonstrative determiners across varieties

Variety | Anaphoric pronouns (n) | % of all tokens | Demonstrative determiners (n) | % of all tokens | % of demonstratives evoked
ICE-GB | 3 | 4.35 | 2 | 2.9 | 100
ICE-HK | 5 | 9.80 | 4 | 7.84 | 100
ICE-IND | 33 | 12.89 | 18 | 7.03 | 88.9
ICE-PHI | 6 | 12.5 | 8 | 16.67 | 100
ICE-SIN | 2 | 1.56 | 14 | 10.94 | 85.71

either failed to shift the topic, if that was their intention, or the interlocutors accepted it as additional information without further inquiry.

Table 6.6 gives an overview of the frequencies of anaphoric pronouns and demonstrative determiners across varieties. It also indicates their share of all topicalized tokens and, for the demonstrative determiners, the percentage of evoked tokens. As can be seen in the table, almost all combinations of a demonstrative determiner and a noun are evoked; in the two varieties where this is not the case (ICE-India and ICE-Singapore), only two non-evoked tokens each could be identified (one of which is shown in (6.4)). A regular case, involving direct repetition of an NP, can be seen in (6.5). In this exchange, speaker A is astonished by B’s culinary adventures and, by topicalizing the NP snakes, simultaneously creates a discourse link and emphasizes it.

(6.5)
B: Yeah in Hong Kong they serve that
A: Snakes
B: Yeah they were offering snake’s blood as a drink in that same Chinese restaurant
A: Yeah
B: And that same kind of stuff huh
A: Snakes you tried What about snake bile (ICE-PHI:S1A-023#267–272)

The topicalized NP in (6.5) is typical, not only for ICE-Philippines but also for the other varieties: it contains evoked information, functions as a direct object, and creates topic continuity.

Adjective Phrases

The next phrase type, the adjective phrase, ranges from as little as 6 per cent of all tokens in ICE-Great Britain to as much as 10 per cent in ICE-Hong Kong. As in (6.6), AdjPs typically function as subject complements. In this example, the speaker emphasizes the excellence of Stanley Bay as a place to live by topicalizing the adjective phrase. In this discourse excerpt in particular, the use


of topicalization is probably motivated by the speaker’s need to quickly utter an idea that has spontaneously occurred to them. The quality of Stanley Bay is the most important piece of information here, and it has a clearly identifiable link to the preceding discourse, in which the living conditions of other places have been discussed.

(6.6)
Z: The living condition is very good I think
A: Uhm uhm
Z: Around U hall
A: Yes
B: Yes
Z: Silent place
A: Uhm Excellent Stanley Bay is Unless you you you not face that kind of that kind of uhm something frightening (ICE-HK:S1A-051#446–454)

Clauses

In addition to NPs and AdjPs, 11 clauses in topicalized position could be identified in the corpora. The low number of topicalized clauses can be explained by production and processing requirements. All eleven clauses are at least three words long, and most of them exceed five words. Because hearers expect a sentence-initial subject, processing a direct object in the form of a clause is particularly demanding. The results of initial sentence processing, which sets in very quickly in on-line parsing (cf. Pickering 1999: 124), are misleading when a clause that is neither the subject nor an adverbial comes first. Consequently, the hearer is required to ‘start over’. A similar problem is posed by ‘garden-path’ sentences, in which the hearer is also misled by the first part of a sentence; see examples (6.7a–b):

(6.7)
(a) Because Bill left the room seemed empty. (Ferreira and Anes 1994: 35)
(b) The city council argued the mayor’s position forcefully. (Beach 1991: 646)

Sentence (a) is a garden-path sentence and is more difficult to process than (b) because additional effort is required to untangle its structure. In actual conversation, however, processing can often be eased by prosodic means.
Ambiguity in (a), for instance, would “[a]uditorily […] be far less apparent, or even eliminated” (Ferreira and Anes 1994: 35). Similarities between garden-path sentences and ambiguity in topicalized clauses can be illustrated with excerpts such as (6.8) from ICE-India.


(6.8)
C: And I took a Belgaum and Dharwad bus that is beautiful new bus infact And it took two two hours thirty minutes to reach Dharwad
A: Uh Uhn
B: My God
C: And I was the first man to go to driver and to scold him because see whether you are a driver or not first of all I asked him (ICE-IND:S1A-017#11–16)

Speaker C topicalizes an object clause but invites the expectation of a canonical SVX sentence, because subject and object clauses cannot be distinguished purely on the basis of their surface form. The hearer(s) must initially assume the clause to be a subject clause, which could, for instance, be followed by a copula (e.g., ‘Whether you are a driver or not is actually important’). This is not the case here, as the speaker seemingly begins narrating a sequence of events (indicated by first of all). This impression is not confirmed either: the subject and verb follow only after the object and the parenthetic adverbial.

A concept relevant to clauses and longer constituents in general is the ‘end-weight principle’ (Quirk et al. 1985; Biber et al. 1999) or ‘principle of increasing complexity’ (Dik 1989).3 In essence, this principle is based on the aforementioned notion of syntactic weight, that is, the concept “measured in terms of the length (number of syllables or words) and/or the morphosyntactic complexity of sentence constituents” (Callies 2009: 17). The end-weight principle postulates that shorter and ‘simpler’ constituents precede longer and more complex constituents in sentence production (ibid.).4 Regarding the effort that goes into an utterance, weight can be seen from the perspective of the hearer and from that of the speaker. Earlier studies tended to focus on the hearer’s perspective, arguing that parsing and sentence comprehension explain the principle (ibid.).
Wasow (1997a, 1997b) sheds light on the speaker’s perspective and argues that “weight effects exist primarily to facilitate utterance planning and production” (1997a); thus, the speaker is effectively “buying time to plan the remainder of the sentence” (Callies 2009: 18). Kaltenböck (2015) sees the primary link between the end-weight principle and information packaging in the interplay between constituent length and information status. More precisely, given constituents tend to be shorter because they can be pronouns or other short expressions (Kaltenböck 2015: 121). New information, on the other hand, “often needs to be stated more fully and can therefore be expected to be longer” (ibid.). Applied to the clauses identified in the corpora, the end-weight principle generally seems to hold, since clauses are the least frequently topicalized constituent type. On the speaker’s side, production planning is complicated, and, for the hearer, processing becomes more difficult because of potential ambiguity. According to Kaltenböck (2015), topicalizing given constituents is more likely (and expected to be felicitous), since they


are generally shorter than constituents conveying new information. Extending this reasoning, it is reasonable to assume that if longer constituents are topicalized, they will usually contain information that is easily accessible to the hearer and important enough to the speaker to be worth the complex production planning. It is therefore worth analysing further examples with regard to information status. Generally speaking, clauses fit less neatly into the threefold distinction of evoked, unused, and brand-new than most other constituents do. This is because a clause may contain previously mentioned information alongside newly added information. Still, whichever of the three categories was considered the most appropriate was selected in the annotation. The information status of the finite clause in (6.9), for instance, was annotated as ‘evoked’, since ragging and the cardinal directions are mentioned by speaker B at earlier points in the discourse.

(6.9)
B: So those are the differnces [sic] you can find like how badly they are ragging in north and south South how lenient they are
A: So what you like? Do you think that there should be ragging or not?
B: Well in my point of view okay ragging means what they are doing in the south I really appreciate Unless you rag your junior by asking so many I mean whatever you want to ask that junior never bothers to say hi even like in their later life like when the college goes on smoothly like So in that point of view ragging is good (ICE-IND:S1A-090#129–135)

As in (6.8) and (6.9), the majority of clauses indeed contain evoked information. However, cases with unused and brand-new information do occur. In example (6.10), an excerpt from the classroom lessons of ICE-India, the speaker begins to read a text that is introduced as a lecture.
Irrespective of whether this text was written by the instructor or by someone else, the topicalized clause in the second utterance contains information that the teacher cannot assume to be stored in his audience’s minds.

(6.10)
A: The neighbours did not know what to make of the young man That he was a college lecturer and scientist they knew And he did experiments in small laboratory he had set up at his house (ICE-IND:S1B-003#282–284)

Of the eleven clauses, four contain unused or brand-new information and seven contain evoked information; with only about one clause in three not evoked, clauses typically contain evoked information. Moreover, only two of the four clauses where this is not the case contain brand-new information.


Prepositional Phrases

In addition to the constituent forms discussed so far, several topicalized prepositional phrases were found in ICE. With the exception of PhilE, prepositional phrases are, in fact, the second most frequently topicalized structure in all varieties. Topicalized PPs usually represent highly marked or obligatory adverbials, an example of which can be seen in (6.11). In this exchange, speaker A employs topicalization to structure and emphasize their thoughts.

(6.11)

Z: But How do they get on
A: They
B: No one cares
A: Yah and they will run th they will run through the window yeah so through the window they can go inside the train (ICE-HK:S1A-062#337–341)

In some cases (although this is not restricted to PPs), making sense of a topicalization requires additional knowledge. See (6.12) as a case in point: while the PP is formally identical to the one in (6.11), understanding it requires knowing the meaning of ‘MSG’.

(6.12)
A: Every day I think that Jollibee earns more than McDonalds
B: Maybe it’s because they have more variety
A: Uh uhm and we have tastier food
B: Through the M S G they’re putting
A: What
B: They’re strictly Filipino
A: Yeah
B: The ingredients are strictly Filipino (ICE-PHI:S1A-038#32–39)

Without knowledge of the abbreviation MSG, it is not possible to understand what speaker B means, as A’s puzzled reaction also suggests. MSG refers to monosodium glutamate (Freeman 2006), a flavour enhancer commonly used in Chinese restaurants. Speaker B takes a critical stance and suggests that the only reason speaker A’s cuisine tastes better is that a (possibly unnatural) flavour enhancer is being added to it. Thus, the PP underscores speaker B’s point by immediately continuing her argument without a direct syntactic connection.

Adverb Phrases

Topicalized adverb phrases are much rarer than PPs in the data and only slightly more frequent than clauses. They can occur in copular sentences (6.13) or as marked adverbials (6.14).


(6.13)
D: The land still owns the government still owns the land
C: Yah yah
D: Embassy
C: Embassy no It’s part of the
A: But overseas it is isn’t it
C: Yeh (ICE-SIN:S1A-005#338–344)

(6.14)
A: So in yoga there are uh relaxation technique
B: Uhm
A: So we have to relax full our body And then I will give you instruction according to that you do then concentration on breathing
B: Accha
A: If you will do all these you will feel relaxed and uh you will yourself will be able to
B: Mentally I want to be fit
A: Yeah you will be and your uh that capacity capability to work will increase (ICE-IND:S1A-043#31–38)

In light of the various constituent forms found in topicalized position in the five spoken varieties of English, one of Mesthrie’s (1992) ‘expanded functions’ of topicalization can be confirmed: topicalization, not only in Asian Englishes but also in British English, goes beyond the fronting of NPs, which is why I disagree with narrow conceptualizations such as the one put forward by Mesthrie and Bhatt (2008). It is not surprising that NPs are the most frequent kind of phrase in topicalized position, since objects typically take this form. However, PPs, at around 10 per cent (except in PhilE), are also rather frequent, and AdjPs, AdvPs, and clauses can be found to a limited extent as well.

At this point, I would like to return to the question of constituent length. As mentioned above, topicalized clauses and other long constituents are relatively rare because of increased production and processing demands, meaning that both the speaker and the hearer(s) have to put in additional effort. A count of all tokens with a length of four words or more confirms that such tokens are indeed relatively rare in topicalized position: of all 552 tokens, 68 (12.3%) consist of four or more words.
Slightly more than a third of these (n = 26, 38.2%) have a length of four words, and 16 (23.5%) have a length of five words, meaning that the remaining 26 tokens (38.2%), little more than a third of this already rather small group, are longer than five words. A methodologically sound analysis of constituent length would require information on the average length of constituents in their ‘regular’ position. Since this is difficult to establish, I can only put forward the assumption that longer constituents are generally not preferred in topicalized position. According to Erdmann (1988: 337) and Yngve (1960, 1961), the immediate


memory can store a maximum of seven items. This might be why “the grammar of English is so constructed that excessively deep constructions are actively prevented, and alternatively constructions of lesser depth are provided” (Yngve 1961: 136). Yngve’s approach is concerned exclusively with the syntactic context. However, both of the theories mentioned in this chapter, Yngve’s as well as the end-weight principle, hold for the topicalization of longer constituents (which rarely exceed six words).5 This section has shown that innovation in form, meaning the topicalization of constituents other than NPs, can be found in all the analysed spoken varieties, including British English. The next sub-chapter looks in detail at information status, another area in which some of the varieties show noteworthy tendencies.

6.2.2 Information status

Information status, sometimes also called cognitive status (Gundel et al. 1993: 275), refers to assumptions made by a cooperative speaker “regarding the addressee’s knowledge and attention state in the particular context in which the expression is used” (ibid.). The major categories distinguished in the scholarship were outlined in chapters 2 and 5, where I explained my preference for a simplified distinction of evoked, unused, and brand-new status for the present study. The decision not to go into further detail was based mainly on various methodological constraints, such as missing discourse context and unavailable audio files. Returning to Mesthrie, ‘unanticipated fronting’ is the first criterion in his list of ways in which SAIE departs from ‘traditional’ varieties of English with respect to topicalization. The relevant quote is repeated here for convenience: “In simply fronting a salient (but not necessarily given or contrastive) element SAIE appears to be closer to the ‘pure’ topic mode than the mainstream English mode” (1992: 113).
As pointed out repeatedly, givenness is often highlighted as a defining characteristic of topics and topicalization. Ward and Birner (2004), for instance, consider a link to the preceding discourse to be a requirement for felicitous topicalization. To test this criterion for the five varieties, the preceding discourse of each token was taken into consideration, and one of three ratings was assigned on this basis. Table 6.7 reintroduces the three labels, compares them to Gundel et al.’s (1993) framework, and describes what each status encompasses. It should be noted that the distinction of evoked, unused, and brand-new does not map precisely onto the frameworks proposed by others; the indicated relation is an approximation, visualized by placing the categories closer to the top or bottom of each cell. Table 6.8 indicates the frequencies and percentages of evoked, unused, and brand-new tokens across all corpora. As in the analysis of syntactic form, all sub-corpora were included. The percentages are visualized in a stacked barplot in Figure 6.7, with the percentages on the y-axis and the varieties on the x-axis. Since the


Table 6.7 Possible information statuses of topicalized tokens compared to Gundel et al.’s (1993) framework

Annotation | Category | Gundel et al. (1993) | Description
E | evoked | in focus / activated | The constituent is ‘given’ in the sense that it has been explicitly mentioned in the discourse before or is part of a previously established poset.
U | unused | familiar / uniquely identifiable | The constituent is not explicitly given in the sense of E, but is presumed to be shared knowledge between the speakers or general world knowledge.
B | brand-new | referential / type identifiable | The constituent is brand-new; the speaker cannot assume that the hearer is aware of the information.

Table 6.8 Information status of all topicalized constituents

Variety | Evoked (n) | Evoked (%) | Unused (n) | Unused (%) | Brand-new (n) | Brand-new (%) | Total (n)
ICE-Great Britain | 56 | 81.16 | 5 | 7.25 | 8 | 11.59 | 69
ICE-Hong Kong | 41 | 80.4 | 4 | 7.84 | 6 | 11.76 | 51
ICE-India | 223 | 87.11 | 19 | 7.42 | 14 | 5.47 | 256
ICE-Philippines | 40 | 83.33 | 5 | 10.42 | 3 | 6.25 | 48
ICE-Singapore | 95 | 74.22 | 16 | 12.5 | 17 | 13.28 | 128

distribution in the figure is based on percentages, the decision was made to order the varieties alphabetically in this case. The barplots show the relative frequencies of the three categories assigned to the tokens. Although evoked topics clearly dominate in each variety, ICE-Singapore stands out with regard to information status. In all other varieties, more than 80 per cent of the topics are evoked, whereas in SinE 74 per cent of the tokens are evoked, 12 per cent are unused, and 13 per cent are brand-new. With the exception of ICE-Singapore (where the two are almost even, at 16 and 17 tokens) and ICE-Great Britain and ICE-Hong Kong (where it is the other way round),



Figure 6.7 Information status of all topicalization tokens across varieties.

unused tokens are more frequent than brand-new tokens. This is unsurprising, as brand-new tokens are far more demanding for the hearer than unused tokens. Interestingly, ICE-India behaves relatively ‘conservatively’ with regard to information status: of all corpora, it has the highest proportion of evoked tokens (87%). An explanation for this might be found in the fact that topic continuity plays a particularly important role in IndE (Lange 2012a: 137). As topic continuity normally requires a constituent to be discourse-old, a correlation seems plausible here. For the purpose of illustration, some examples with different information statuses are discussed in the following paragraphs. Example (6.15) from the conversation files of ICE-Singapore is interesting in that the discourse topic {items to buy} runs through the conversation for a long time. The excerpt shows only a brief segment of the overall discourse; the speakers have previously discussed purchasing a wrapped clock and getting food from a bakery. The topicalized constituent flowers is clearly evoked: not only are the flowers part of a previously established poset, but they are also mentioned only seconds before speaker B employs topicalization.

(6.15)
B: I don’t know whether to go I need to go to my mum’s but no flowers
A: You should have You want to go you never plan in advance one nuh
B: But flowers you buy too early


It’s useless what It will it will die off (ICE-SIN:S1A-007#64–69)

Example (6.16) from ICE-India contains a topicalized comitative realized by a prepositional phrase. With my family, the topicalized constituent in this example, was rated as ‘unused’ for two reasons: first, there is no antecedent in the preceding discourse; second, ‘having a family’ can be assumed to be knowledge shared by the speakers. Furthermore, the family is not referred to by means of a pronoun, which would strongly suggest that it is given. The speaker can assume the hearer to be aware of her family, so topicalization is felicitous. This is confirmed by the reaction of speaker A, which is (presumably affirmative) laughter.

(6.16)
A: So then you are coming to Goa
B: Yes shortly
A: Let me know now
B: With my family I’ll come
A: Laughs
B: I’ll inform you I’ll get your address and I’ll inform you (ICE-IND:S1A-001#66–72)

The conversational exchange in (6.17) shows a token rated as ‘brand-new’, although a case could also be made for unused status.

(6.17)

B: Go China Why don’t you go China
A: China is the last place on earth I wanna go
B: Why
A: So difficult getting a now that so difficult to get a clean toilet uh The mosquitoes not to mention (ICE-SIN:S1A-050#194–199)

In one line of reasoning, the poset {reasons for not going to China} could be seen as running through the conversation as soon as speaker A decides to respond to speaker B’s question. The first member of this poset would then be the difficulty of finding a clean toilet, and the mosquitoes would be the second. However, one could ask how far a poset may be generalized and at what point it becomes too broad to actually be considered ‘given’. Reasons for not going to a specific place range from sanitary reasons, as in this example, to very personal reasons of which the interlocutor might be entirely unaware. Thus, the connection here is rather loose regardless of the actual answer given by speaker A, and, although speaker B certainly expects some answer, it is unlikely that they expected this specific one. Since each case was annotated individually, the decision regarding an entity’s information status was always made


based on the discourse context and the likelihood of an entity being (assumed) shared knowledge. In terms of the frequency of unused and brand-new tokens, SinE clearly shows the highest percentages. IndE, in contrast, is the variety with the highest proportion of evoked tokens. I come back to this in the discussion of discourse functions, since creating continuity has been called one of the primary motivations for topicalization in IndE (although it is evidently not the only function). The establishment of cohesion and topic continuity requires links to the preceding discourse, so a larger share of evoked tokens is a necessary by-product. Considering Mesthrie’s criteria again, it becomes evident that Asian varieties of English do not differ fundamentally from British English in terms of information status. I identified unused and brand-new tokens in all five corpora, which means that the differences are purely quantitative. In conclusion, the frequently proposed constraint that topicalized tokens must have a link to the preceding discourse does not hold for any of the analysed spoken varieties of English.

6.2.3 Hanging topics

Another way of looking at forms of topicalization is by analysing so-called hanging topics, which can be discussed in terms of both their syntactic form and their information status. Hanging topics, also known as ‘dangling topics’, are sentence-initial topics without a syntactic link to the remainder of the clause (see Hole 2012: 57). Important properties of hanging topics are (a) that the information that follows tells something about them; (b) their pragmatic accessibility; and (c) the lack of a syntactic link to the rest of the clause (Lambrecht 1994: 193). An example is given in (6.18).

(6.18) (From a TV interview about the availability of child care)
That isn’t the typical family anymore. The typical family today, the husband and the wife both work.
(Lambrecht 1994: 193; emphasis added)

Formally, hanging topics are identical to topicalization and left-dislocation (cf. Nolda 2004) in the sense that they represent a topic marked by dislocation to the left. The difference, as mentioned above, is the lack of a syntactic relation to the remainder of the clause. Function-wise, hanging topics strongly tend to create cohesion in the discourse. This function is also associated with topicalization and left-dislocation, although their range of possible functions includes several other options. Hanging topics were tagged in ICE for the present study, but a quantitative analysis would not be particularly revealing because of the very low number of tokens. Instead, this chapter provides some examples and typological comparisons (to show yet another possible sub-category of topicalization in Asian Englishes), starting with example (6.19), a brief dialogue at the onset of a recorded conversation in SinE.


(6.19)
B: Is that That’s not enough for you ah
A: Thirsty I’ll go and buy a drink do you want do you need another drink
B: No (ICE-SIN:S1A-001#1–3)

Thirsty, in this exchange, is a good example of an ambiguous case: it could be considered either a case of ellipsis (‘I am thirsty, I […]’) or a (formally rather unusual) hanging topic. This particular example meets the primary criterion of hanging topics, the lack of a syntactic link to the following discourse. In addition, the information in the hanging topic is pragmatically accessible, and the remainder of the sentence is semantically related to it. Typologically, hanging topics are frequently highlighted as a common construction in Chinese. Kausen calls topics that precede the subject but do not have a syntactic link to the clause ‘typical’ of modern Chinese (2013: 725), providing example (6.20) as a case in point.

(6.20)

Zhangsang   tóu    téng
Zhangsang   head   hurts
‘Zhangsang, (his) head hurts’

(Kausen 2013: 725)

Following Chafe (1976), such syntactically unconnected topics in Chinese are referred to by Bao and Min as ‘Chinese-style topics’ (2005: 274). Chinese-style topics are the counterpart to English-style topics (ibid.):

a. English-style: TOPIC [S … e …]
b. Chinese-style: TOPIC [S … ]

The difference between these two ‘styles’ is the ‘e’ that is mandatory in the English-style construction, which Bao and Min call a ‘place-holder’ (ibid.). This place-holder is usually a resumptive pronoun in left-dislocated structures, but it may also be phonologically null (ibid.). Two Chinese examples given by the authors can be seen in (6.21), with (6.21a) containing an English-style and (6.21b) a Chinese-style topic.

(6.21)

a. shuiguo,   wo   xihuan   e
   fruit      I    like
   ‘Fruits, I like’ (Bao and Min 2005: 274)

b. shuiguo,   wo   xihuan   li
   fruit      I    like     pear
   ‘As for fruits, I like pear (lit. Fruits, I like pear)’ (Bao and Min 2005: 274)

In Chinese-style topic constructions, “[t]he topic does not enter into [a] selectional relationship with the main verb” (ibid.: 276). According to Bao and


Min, both kinds of topics may be found in SinE, but only Chinese-style topics resemble the hanging topic construction. Example (6.22) shows a Chinese-style topic in SinE given by Bao and Min, and (6.23) is an example identified in ICE-Singapore.

(6.22) One test, I got zero. (Bao and Min 2005: 280)

(6.23)
B: She sings better in Cantonese
A: word Cantonese She’s got better voice Her Chinese don’t know what she’s saying don’t understand (ICE-SIN:S1A-007#321–324)

A major methodological problem related to identifying hanging topics can be seen in Bao and Min’s example: hanging topics often formally resemble elliptical constructions. If the preposition on is added at the beginning of Bao and Min’s example sentence, the construction becomes not a regular case of left-dislocation but a topicalized adverbial, with a clear gap in the comment left by the adverbial’s regular position. The decision as to what kind of construction we are looking at needs to be based on what we can see; anything else would be tampering with the data. However, the route by which a speaker arrives at a specific construction would be rather interesting: did the speaker truly use a hanging topic construction as they would have in their mother tongue, or did they omit a preposition for some reason? The example in (6.23) is one of the few in ICE where the link between the topic and the comment is loose enough to speak of a hanging topic and where ellipsis is not an appealing alternative interpretation. In addition, her Chinese meets all three of Lambrecht’s criteria given above: the following words are indeed a comment on her Chinese, it is a pragmatically accessible topic, and there is no syntactic link of any kind to the comment. This brief excursus on hanging topics shows yet another option available to speakers for topicalizing a constituent, particularly if a contact hypothesis is assumed.
Hanging topic constructions are prevalent in the Sinitic languages and could be considered a potential factor in allowing for similar constructions in HKE and SinE. However, hanging topics are relatively rare in the ICE corpora, which means that any interpretation in the direction of contact influence would be tentative.

6.3 Functions of topicalization

The previous section presented the forms of topicalization from a variety of perspectives. This section, in turn, considers the functions of topicalization from two perspectives by asking which syntactic functions and which discourse-related functions topicalization fulfils in the varieties under scrutiny. The following questions are answered in the course of the section:


(1) Which syntactic functions do the topicalized constituents fulfil in the clause or in the utterance as a whole?
(2) Which of Mesthrie’s (1992) criteria for an expanded concept of topicalization are realized in the five analysed varieties?
(3) Which discourse functions do the topicalized constituents fulfil?
(4) How much does topicalization contribute to establishing a topic in the conversation, that is, how (if at all) does it contribute to topic persistence?

6.3.1 Syntactic function

This section introduces the syntactic functions of topicalized constituents. After a cross-varietal overview, objects, complements, and adverbials are discussed individually. While there is a certain degree of overlap with syntactic form, this section offers additional perspectives. One important component is the presentation of structures found in the contact languages that might be the basis for pattern replication.6 Thus, I provide further evidence for the (potential) role played by typological interference.7 Furthermore, special attention is given to adverbials, as they represent a controversial category. Instead of taking the normalized figures as a basis, the relative frequencies of all syntactic functions of topicalized constituents were calculated to find out whether there are any notable differences between the varieties. The absolute frequencies of each syntactic function as well as the relative frequencies (given here in percentages) can be seen in Table 6.9. In the table (and in the following analyses), all sub-corpora are included. The results suggest that the distribution of each function is consistent across varieties; a chi-squared test indicates that the differences are not statistically significant (X-squared = 15.176, df = 16, p-value > 0.5).
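The relative frequencies in Table 6.9 are each count divided by its variety’s total, and the reported test is a Pearson chi-squared test of independence on a varieties-by-functions table. The following Python sketch illustrates both computations on the counts as printed in Table 6.9; the function and variable names are hypothetical, and the code is not the script used in the study. It is not expected to reproduce the reported X-squared: five varieties by six functions give (5 - 1) × (6 - 1) = 20 degrees of freedom, whereas the reported df = 16 implies five function categories, and which categories were merged or excluded is not stated. The sketch also returns the smallest expected cell count, because several cells here (indirect objects, object complements, subjects) have expected counts well below 5, a common rule-of-thumb threshold for trusting the chi-squared approximation.

```python
# Sketch (hypothetical names): row percentages and a Pearson chi-squared
# test of independence for the counts printed in Table 6.9.
# Rows are varieties, columns are syntactic functions.

VARIETIES = ["ICE-GB", "ICE-HK", "ICE-IND", "ICE-PHI", "ICE-SIN"]
FUNCTIONS = ["DO", "IO", "SC", "OC", "Adv", "Subj"]

TABLE_6_9 = [
    [50, 1, 6, 1, 11, 0],    # ICE-GB  (n = 69)
    [34, 0, 6, 0, 11, 0],    # ICE-HK  (n = 51)
    [175, 3, 32, 1, 45, 0],  # ICE-IND (n = 256)
    [37, 0, 3, 1, 7, 0],     # ICE-PHI (n = 48)
    [84, 0, 15, 0, 28, 1],   # ICE-SIN (n = 128)
]


def row_percentages(table):
    """Express each count as a percentage of its row (variety) total."""
    return [[100 * cell / sum(row) for cell in row] for row in table]


def pearson_chi_squared(table):
    """Return (statistic, degrees of freedom, smallest expected count).

    Expected counts are row_total * column_total / grand_total.
    Rows or columns summing to zero are dropped, because they would
    produce an expected count of zero (division by zero).
    """
    rows = [row for row in table if sum(row) > 0]
    col_totals = [sum(col) for col in zip(*rows)]
    keep = [j for j, total in enumerate(col_totals) if total > 0]
    rows = [[row[j] for j in keep] for row in rows]
    col_totals = [col_totals[j] for j in keep]
    row_totals = [sum(row) for row in rows]
    grand_total = sum(row_totals)

    statistic = 0.0
    smallest_expected = float("inf")
    for row, row_total in zip(rows, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            smallest_expected = min(smallest_expected, expected)
            statistic += (observed - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return statistic, dof, smallest_expected


if __name__ == "__main__":
    print("variety", FUNCTIONS)
    for variety, row in zip(VARIETIES, row_percentages(TABLE_6_9)):
        print(variety, [round(p, 2) for p in row])
    stat, dof, smallest = pearson_chi_squared(TABLE_6_9)
    print(f"X-squared = {stat:.3f}, df = {dof}, "
          f"smallest expected count = {smallest:.2f}")
```

Run as a script, it prints each variety’s row percentages, which match the rounded values in Table 6.9, followed by the statistic for the six-column layout. With expected counts this small, merging sparse categories or using an exact or Monte Carlo test (for example, `simulate.p.value = TRUE` in R’s `chisq.test`) would be common alternatives.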
Table 6.9 Syntactic functions of topicalized constituents across varieties (absolute frequencies, with relative frequencies in per cent in parentheses)

Syntactic function     ICE-GB       ICE-HK       ICE-IND      ICE-PHI      ICE-SIN
Direct objects         50 (72.46)   34 (66.67)   175 (68.36)  37 (77.08)   84 (65.62)
Indirect objects       1 (1.45)     0 (0)        3 (1.17)     0 (0)        0 (0)
Subject complements    6 (8.7)      6 (11.76)    32 (12.50)   3 (6.25)     15 (11.72)
Object complements     1 (1.45)     0 (0)        1 (0.39)     1 (2.08)     0 (0)
Adverbials             11 (15.94)   11 (21.57)   45 (17.58)   7 (14.58)    28 (21.88)
Subjects               0 (0)        0 (0)        0 (0)        0 (0)        1 (0.78)

Direct objects are by far the most frequently topicalized constituents, with percentages ranging from 65.62 per cent in SinE to 77.08 per cent in PhilE. Obligatory adverbials account for between ca. 15 per cent (in PhilE) and ca. 22 per cent (in SinE) of topicalized constituents. Indirect objects and object complements are topicalized very rarely, while the topicalized subject represents a hapax legomenon, which is discussed as a unique case later.

Direct and indirect objects

The first syntactic functions I examine in detail are topicalized direct and indirect objects, taking into account previous research on the respective Asian varieties of English as well as structures found in the substrate languages. In HKE, 34 topicalized direct objects could be identified. Prototypical examples look like (6.24), in which the topicalized direct object is an NP that is both discourse-old and hearer-old (i.e., evoked).

(6.24) A: Uh the only thing I think they th they know nothing about even they st have stu studied for six years
Z: Uhm
A: No concept they obtained fo acquire they did not acquire any uh any knowledge from the school (ICE-HK:S1A-026#494–496)

This example is unambiguous and received a TOP1 rating in the analysis. The poset {knowledge} is opened early in the conversation and is continuously referred to by all participants in the discourse. The direct object No concept is stressed by the speaker and, simultaneously, represents a link to the preceding discourse. According to Yip and Matthews (2000), who, as noted in chapter 3, describe Cantonese as a highly topic-prominent language, sentence-initial direct objects such as No concept in the example are common in Cantonese. The sentences in (6.25) and (6.26) from Yip and Matthews’ study both feature direct objects in sentence-initial position.

(6.25) Nī go yàhn ngóh gin-gwo (lit. this person I have seen)
‘I’ve seen this person before’. (Yip and Matthews 2000: 115)

(6.26) Póutūng-wá ngóh sīk síu-síu (lit. Putonghua I know a little)
‘I know a little Putonghua’. (Yip and Matthews 2000: 115)

Examples with a pronoun as a topicalized object are, as mentioned previously, notably rare in HKE: only four such tokens could be found. One of them is shown in (6.27).


(6.27) Oh this I made because uh his was the first time I brought some stuff to the laundry And when I took it I mean it seems like new (ICE-HK:S1A-011#171–172)

For this example, the preceding discourse is not included because it contains several unclear words and (seemingly) unrelated expressions. In addition, the referent is not made clear by the preceding utterances. However, picking up a topic by means of an anaphoric pronoun (cf. Lange 2012a: 129) implies that the referenced information represents shared knowledge; otherwise, the communicative effort would fail, as the addressee(s) would be unaware of what the speaker is talking about. I specifically mention the term ‘shared knowledge’ again because examples such as (6.27) suggest that the discussion involves an extralinguistic entity, meaning that the corpus transcript does not tell the whole story. The pronoun in (6.27) could therefore be interpreted as a deictic expression, possibly accompanied by gestures. Previous studies of topic-comment structures in HKE found few instances of topicalized objects (or of any other topicalized constituent, for that matter). Setter et al. (2010: 77), for instance, found only one example in their data.8 Left-dislocation (6.28) and constructions interpretable as either left-dislocation or hanging topics (6.29), on the other hand, occurred rather frequently, although the authors provide no concrete numbers.

(6.28) [TOPIC the the fish] [COMMENT they are not very sensitive to the shining hook] (Setter et al. 2010: 76)

(6.29) [TOPIC Vancouver] [COMMENT they have high-rise buildings they have […] relatively good food] (Setter et al. 2010: 77)

Setter et al.’s analysis is further evidence of the terminological confusion that has prevailed in the literature. They describe (6.28) and (6.29) as examples showing topics “mostly related to the subject of the comment clause in some way” (2010: 77) and use no terminology other than ‘topic-comment sentences’ to describe the phenomena they identify. Based on the examples they provide, topicalization as understood in this book would be a hapax legomenon in their data. Setter et al. (2010) do not mention indirect objects in any capacity, and no such cases could be found in the present study, either. Taken together, the findings from both corpora underline that topicalization in general is relatively rare in HKE (at least as represented in ICE and in Setter et al.’s 2010 corpus).

In the Indian component of ICE, 175 topicalized direct objects were found; (6.30) features a discourse-deictic anaphoric pronoun representing both speaker B’s question what you’re going to have to eat and A’s response what do you want.


(6.30) B: And what you’re going to have to eat
A: Now what do you want it’s upto you That you can decide You want mutton you want fish you want chicken (ICE-IND:S1A-003#100–103)

Lange (2012a: 135) notes that anaphoric pronouns in sentence-initial position can be cases of focus preposing or topicalization (as defined by Ward and Birner 2004). In chapter 2, I mention that these two notions cannot be clearly differentiated in the present study: firstly, because of a lack of information on prosody (and the resulting ambiguity); and, secondly, because thinking of topicalization as being both rhematic and thematic brings the two concepts together. Where a clear separation is attempted, distinguishing between focus preposing and topicalization in a narrow sense is clear-cut in some cases and essentially impossible in others. Example (6.31), given as a case in point by Lange (2012a: 134), illustrates the problem.

(6.31) B: Particularly in Diwali days they have to do Laxmi pooja and all that
A: Uhm  there
B: So so really if you started seven till nine pooja and decoration and everything
A: That you must do previous day na
B: No previous day we never do (ICE-IND:S1A-065#188–192)

In A’s second response to B, the focus falls on previous day. In the following utterance by speaker B, however, the situation is much less obvious: “[I]f the utterance carries another nuclear accent on never, then we might be dealing with another topicalization construction; if not, we have another instance of focus preposing where the preposed focus constituent forms an identity link to the prior discourse” (Lange 2012a: 136; emphasis in the original). The question must, therefore, remain unresolved. However, anaphoric pronouns are explicit links in each instance and were analysed as tokens of topicalization whenever they occurred in sentence-initial position.

Typologically, it is interesting to note that speaker A, who creates topic continuity in example (6.31), is a native speaker of Marathi. Although I could not find Marathi examples with topicalized pronouns, there are examples such as (6.32) which show that the topicalization of objects is possible in Marathi.

(6.32) te pustak mi rāmsāthi ghetla
that book I Ram for bought
‘That book, I bought for Ram’. (Junghare 1988: 315; emphasis removed)

Another interesting but rare phenomenon occurs when speakers topicalize in two successive utterances, as in (6.33).


(6.33) A: I have joined a lending library but I enjoy and take a book only in the month of May
B: One or two words  Uh
A: Only in the month of May I’ll take About ten to twenty books I’ll read that’s all
C: I never I can never resist a book somehow (ICE-IND:S1A-030#125–130)

First, the adverbial only in the month of May is topicalized; in the second sentence, about ten to twenty books is topicalized. Because of the unusual structure of the first utterance, the topicalized direct object in the second might also be interpreted as an argument of take rather than of read. The information contained in the first adverbial, only in the month of May, is repeated in order to emphasize the month as well as the fact that speaker A will read only in May. Since speaker B’s response is not transcribed in ICE-India, it is unclear whether the repetition is a direct reaction to her, but the repetition of the adverbial and the subsequent specification by speaker A support this interpretation. A case of direct contrast created by a topicalized direct object can be seen in (6.34).

(6.34) A: Okay fine did you go around Mysore
B: Not much uh you see we have such busy schedules it’s so many we get less time you have a little socializing to do and then on one word  so Not much but okay I have been to a few places and uh and the remaining few I plan to see later (ICE-IND:S1A-014#10–19)

In this exchange, speaker B responds to speaker A’s question about whether she has visited Mysore, a city in the southern Indian state of Karnataka. Having explained the scheduling difficulties, B then contrasts the places she has seen with the ones she has not. She begins the sentence canonically but then resorts to topicalization in the second main clause, creating a mirror-like effect. This contrastive, emphatic use of topicalization is common in all English varieties and is also well-attested for Bangla, speaker B’s mother tongue.

Junghare provides example (6.35) to explain the use of topicalization in Bangla to specify an object that would be indefinite in its canonical position. By changing the position of the object, the ‘original’ meaning of ækta, ‘a’, becomes ‘one’.

(6.35) ækta boi por̨eche še
one book read he
(Junghare 1983: 124)


Interestingly, the more recent publication on Bangla syntax by Conners and Chacón (2015) provides more elaborately glossed examples but is far less certain about the ‘motivation’ for topicalization in Bangla. Noting that “[m]uch more work needs to be done to understand the full nature of scrambling, particularly its semantic and pragmatic constraints” (2015: 250), they give example (6.36) for leftward movement of the direct object.

(6.36) gari gaṛi-ta to ami caliye-ch-Ø-i gɔtɔkal
car car-CLF INT 1SG.NOM drive.PRF-PRS-1 yesterday
‘The car. The car I drove yesterday’. (Conners and Chacón 2015: 250; emphasis in the original)

The precise reasons for topicalization in Bangla might elude us, but it is apparent that direct objects may occur in initial position in Bangla. As expected, topicalization also occurs in the English of many of the speakers in ICE-India who grew up with Hindi as their mother tongue. One example is (6.37), in which speaker A adds a new member to a previously opened list of tests. For comparison, an example of a topicalized direct object in Hindi is given in (6.38).

(6.37) A: He suggested these tests he said first thyroid test
B: Uhm
A: For this anxiety thing one word one word
B: Yeah Uhm
A: Then he said another test we will take
B: Uhm  uhm
A: And if necesary [sic] we will do angiography (ICE-IND:S1A-068#142–149)

(6.38) vah kitāb maine rāmkeliye kharīdī
that book I Ram for bought
‘That book, I bought for Ram’. (Junghare 1988: 315; emphasis removed)

While direct objects are the constituents most frequently found in sentence-initial position in IndE, indirect objects are rarely topicalized. One of the few examples found in ICE-India is shown in the discourse excerpt in (6.39), in which two people discuss the fact that private matters are easier to discuss with friends than with parents.

(6.39) A: Friends are like more closer than parents in hostel
B: Parents means you can’t tell each and everything
A: Yeah
B: Isn’t  it
A: Friends we can tell (ICE-IND:S1A-054#202–206)


Speaker A’s mother tongue is Kashmiri, which was not covered in chapter 3. Several dialects of Kashmiri, an Eastern Dardic language belonging to the Indo-Aryan branch, “are strongly subjected to the influence of the neighbouring Indo-Aryan languages” (Edelman 1983: 298). For this reason, Kashmiri and its neighbouring languages can be expected to share certain patterns. As in the Indo-Aryan languages investigated above, there is a relatively high degree of freedom in constituent movement in Kashmiri (Koul and Wali 2006: 136). Sentence (6.40) is an example of indirect object fronting in Kashmiri, suggesting that structures of this kind are present in the speaker’s repertoire. Emphasis has been added in the example to illustrate the different positions of the indirect object in Kashmiri, on the one hand, and in the (canonical) English translation, on the other.

(6.40) mohanas dits aslaman kita:b ra:mini khə:tri ra:th gari
‘Aslam gave Mohan a book for Ram yesterday at home’.9 (Koul and Wali 2006: 136)

For Hindi, the example in (6.41) gives insight into the possibility of fronting indirect objects:

(6.41) anjum-koᵢ yuusuf soc-taa hai [ki nuur-ne tᵢ kitaab di-i]
Anjum-DAT Yusuf.M think-HAB.M.SG be.PRS.3SG that Nur-ERG book.F give.PFV-F
‘Anjum, Yusuf thinks that Nur gave a book to’. (Bhatt 2016: 508)

Topicalization of indirect objects is, therefore, part of the linguistic repertoire of many Indians, but it may simply not occur frequently in IndE for various reasons. A first and very important factor in this regard is that indirect objects are, generally speaking, less frequent than direct objects. It will be interesting to see in which ways ‘new’ ditransitives (Mukherjee and Hoffmann 2006) interact with topicalization in the future.10 In the present study, however, the number of topicalized indirect objects was far too small to make any claims about this.

As in the other corpus components, the direct object is the most frequently topicalized constituent in ICE-Philippines. PhilE is, however, the only variety in which direct objects account for more than three quarters of all tokens. This finding differs from the frequencies given by Winkle (2015), who identified a higher percentage of objects in both ICE-Singapore and ICE-Hong Kong (2015: 161). An important point to note is that the data from ICE-Philippines are biased: one speaker repeatedly uses a construction similar to that much in (6.42) and is responsible for six of the 37 direct objects (16.2% of all topicalized direct objects in the variety). The use of topicalization might therefore be idiolectal in the case of this speaker.


(6.42) B: That was his hobby
A: Taking pictures
B: Uh we we used to live there in the Boulevard So we would walk across along the bay and then we would take a pose for him He’d take pictures that that was the normal routine That much I remember but I no I guess you’ll have to ask me more more questions na lang or when we get to the generally it was the fact that he was around when we wanted him (ICE-PHI:S1A-005#44–49)

This speaker clearly wants to maintain cohesion and refers back to the immediately preceding discourse, creating a narration-like effect. Interestingly, Biber et al. (1999) found a similar construction (although with a complement clause in fronted position) in a fictional text in their corpus; see (6.43).

(6.43) What it was that changed this conclusion, I don’t remember. (Biber et al. 1999: 901)

In Tagalog, speaker B’s L1, some movement of constituents is allowed. Although the verbal complex is the first constituent of a canonical Tagalog sentence, other constituents, such as subject complements, may occur in initial position. However, I could not find an example in the consulted texts on Tagalog that represents a convincing equivalent to topicalized objects in English. Topicalized indirect objects were not found in ICE-Philippines.

ICE-Singapore has the lowest proportion of topicalized direct objects (65.62 per cent), but they still represent the most frequently topicalized constituent in this variety. An example from the direct conversations in the corpus is shown in (6.44).

(6.44) A: So he’s she’s doing stage management Now he’s going to collect all the props for me lah
B: Uhm
A: The only problem we have is like word I forgot to call Mandarin Uh to ask them whether they have a whiteboard and a flip chart We want to mount the you know the carpark sign on the
B: Uhm uhm uhm
A: flip chart you know
B: Ya
A: And then use their whiteboard to pin up the world map
B: I thought flip chart flip chart we can still bring but whiteboard will be rather big quite difficult (ICE-SIN:S1A-015#172–182)


In the example, two speakers discuss the preparations regarding the “stage management”. Speaker A opens up the poset {stage props}, which continually receives new members throughout the discussion, for example, whiteboard and flip chart. Thus, flip chart is well established in speaker B’s final utterance by means of a poset relation. Typologically, Mandarin’s status as a topic-prominent language means that examples of nominal constituents as sentence-initial topics abound in grammatical descriptions. For example, (6.45) features a direct object in clause-initial position in Mandarin.

(6.45) nàben shū wǒ kàn guò
that-CLS book I look EXP
‘I have read that book’. (Lin 2001: 124)

Sentences with a syntactically detached nominal topic can also be found in many grammatical descriptions of Mandarin. Examples (6.46) and (6.47) show cases of topicalized NPs which, judging by the English translation (provided under the gloss), could be interpreted as hanging topics.

(6.46) [[Top Zhe-jia yinhang] [Comment [Subj. xinyu] bucuo]].
this-Cl bank reputation not-bad
‘(As for) this bank, (its) reputation is good’. (Pan 2015: 192)

(6.47) nà-chǎng huǒ, xìngkuī xiāofángduì lái-de-kuǎi.
that-CLF fire luckily fire.brigade come-DES-fast
‘As to that fire, fortunately the fire brigade arrived quickly’. (Cheng and Sybesma 2015: 1548)

It becomes clear from grammatical descriptions of Chinese that different constituents may be positioned sentence-initially as topics in both Mandarin and Cantonese. Bao and Min (2005: 272) not only note the presence of topicalization in SinE, but even go so far as to call SinE a topic-prominent language as well. Topic-prominence in SinE, they claim, can largely be attributed to Chinese influence (2005; cf. also Platt and Ho 1993) and extends not only to left-dislocation but also to structures without a referential pronoun in the main clause (2005: 279–280). These structures, as mentioned before, are called ‘Chinese-style topics’ by Bao and Min (2005) and in some cases resemble hanging topics more than they do ‘regular’ cases of topicalization. One of the examples they provide, however, would be included under the heading of object topicalization; see (6.48).

(6.48) (on school experience) Trouble-makers, I have a lot actually. (Bao and Min 2005: 280)


Objects also represent the bulk of identified constituents in topicalized position in Winkle’s (2015) analysis of SinE. Indirect objects in SinE, however, are mentioned neither by Bao and Min (2005) nor by Winkle (2015), and none could be identified in this study either. Setting Tagalog aside, all major contact languages of the four varieties permit direct objects in sentence-initial position. Topicalization of direct objects is a well-established phenomenon in the Indo-Aryan, Dravidian, and Sinitic languages and can be considered part of the repertoire of the speakers recorded in ICE. However, direct objects are also the most frequently topicalized constituent in ICE-Great Britain, which means that the historical input variety for HKE, IndE, and SinE also features this kind of topicalization. Unlike direct objects, topicalized indirect objects occur rarely. Neither British English (as an input variety) nor the Asian varieties show noteworthy frequencies of indirect object topicalization. Whether this is due to the general scarcity of indirect objects compared to direct objects or to other reasons is difficult to judge, although the former is likely part of the explanation.

Subject and object complements

The next constituents to be analysed are subject and object complements. In copular clauses, a copula links the subject with the subject complement, whereas object complements occur in complex-transitive constructions as complements of the (direct) object. Examples (6.49a–b) are canonical sentences with subject complements, and (6.49c–d) are sentences with object complements; all examples are adapted from Quirk et al. (1985). The complement in each example is given in italics.

(6.49) Sentences with subject complements
a. The children are happy.
b. That you need a car is obvious.
(Quirk et al. 1985: 417; emphasis in the original)
Sentences with object complements
c. I find him careless.
d. He made the children happy.
(Quirk et al. 1985: 417; emphasis in the original)

NPs, AdjPs, PPs, and clauses may function as subject complements or as obligatory adverbials following the copula (cf. Quirk et al. 1985: 54–56, 1170–1171). The most frequent copular verb is be, but other verbs can function as copulas as well.11 Variation in the topicalization of subject complements may occur with regard to the structure of a copular sentence, since the verb may be located either directly after the topicalized complement or after the subject. For the present study, subject complements are considered topicalized irrespective of the position of the copula. An example in which the copula follows the subject can be seen in (6.50), one of six instances of topicalized subject complements in HKE.

(6.50) Z: It’s kind of like half of A level
A: Uh we have the same AS level yeah it is called
Z: Uhm Yes Yes Yeah Advanced Supplementary you know that kind of (ICE-HK:S1A-042#952–958)

Here, an affirmative discourse particle follows the complement; then the subject follows, and, finally, the verbal complex completes the utterance. A similar structure found in ICE-India is shown in (6.51).

(6.51) B: Oh God it was ordinary Yamaha and then we went there and we bought cloth and all and we’ve gone to eat something you know both of us were sleeping there in the restaurant So tired we were we couldn’t talk also anything but while coming home it was quite nice you know the journey (ICE-IND:S1A-040#70–71)

Examples identified in ICE-India are frequently of this sort, with speakers using topicalization to reinforce or emphasize a previously mentioned statement. In this particular example, the state of being tired has not been mentioned explicitly before but is clearly implied by overeating and falling asleep. As mentioned above, object complements were the least frequently topicalized constituents (with the exception of the single topicalized subject, which needs to be treated separately). In ICE-India, the sole topicalized object complement is Ekka in (6.52).

(6.52) B: You see in Amrawati you will find people lying you know drunk heavily drunked drunk lot and lots of people know specially those rickshaw puller I mean lower strata public
A: Is it so I had totally different ideas about Amravati Yeah
B: I mean from the lower strata
A: Ekka we call that sort of rickshaw (ICE-IND:S1A-046#83–89)

Because of a lack of information on the movement of object complements in any of the major contact languages, no typological comparison can be made for this syntactic function. However, topicalization of subject complements is mentioned in several grammars and is permitted in Hindi; see (6.53a–b) from Montaut (2015).

(6.53) a. ye log besharam hain
these people shameless are
‘These people are shameless’. (Montaut 2015: 272)
b. besharam to āp hain!
shameless to you are
‘Shamelessness is rather yours!’ (Montaut 2015: 272)

In (6.53b), “[t]he term besharam ‘shameless’, formerly part of the comment, is promoted […] in the theme position, but a theme re-qualified in its relation with its referent since the referent is now the opposite group” (Montaut 2015: 272). Although confined to a specific context, this example indicates that subject complements can be placed in sentence-initial position. Apart from the different position of besharam and the added thematic particle to, there are no differences with regard to this specific complement between (6.53a) and (6.53b). For this reason, the author’s translation can be misleading. Regardless of whether besharam with the added particle to is indeed “redefined as a different type of shamelessness in relation to the new subject” (ibid.), the constituent is, syntactically, still a subject complement. Sentence-initial complements and obligatory adverbials in copular sentences are also permitted in other Indo-Aryan and Dravidian languages. In some cases, complements occur initially only because the clause lacks a dummy subject, as is found, for instance, in impersonal constructions such as weather descriptions (6.54) or indications of time (6.56). Although the subject often remains the initial constituent in the Dravidian languages, the complement usually follows immediately after it and precedes the verb (6.58). The examples given in (6.55) and (6.57) represent further instances of topicalized constituents in copular clauses.

(6.54) Marathi
aj phar thənḍi / gərmi wat-t-e
today very cold-F / hot-F feel-IMPF-FSG
‘It feels very cold/hot today’. (Dhongde and Wali 2009: 201)

(6.55) Marathi
teblāvar ek pustak āhe
table on one book is
‘There is a book on the table’. (Junghare 1988: 313)


(6.56) Tamil
maNi pattu aagudu
hour ten is
‘The time is ten’. (Annamalai 2004: 13)

(6.57) Tamil
meesemeele (oru) pustagom irukku
table-on a book is
‘There is a book on the table’. (Annamalai 2004: 12)

(6.58) Tamil
kumaar mandiri aanaaru
Kumar minister became
‘Kumar became a minister’. (Annamalai 2005: 91)

It is evident in examples (6.54) through (6.58) that subject complements and obligatory adverbials, that is, the constituents required by a copula, may be topicalized in some of the major Indo-Aryan and Dravidian languages. From a typological point of view, this means that Indian speakers have topicalization of these constituents in their general linguistic repertoire but do not always draw on it when they form English sentences. In ICE-Philippines, three subject complements and one object complement could be identified as topicalized constituents. The subject complement in (6.59) stands out because of its length, which far exceeds the average length of other complements in topicalized position.

(6.59) A: Why marry somebody who has a different language
B: laughter But uh you can marry a Filipina with a different dialect also
A: Uh no problem Dialect no problem but no
B: Not language Customs customs
A: Language Very different in culture
B: Culture
A: Very close to Filipino culture is Thai
B: Uh
A: They ha they almost have the same culture (ICE-PHI:S1A-028#121–130)

Topicalization of a lengthy constituent is felicitous in this example because speaker A has mentioned a contrasting expression shortly before topicalization occurs. The utterance very different in culture is formally very similar to very close to Filipino culture and, in terms of meaning, the two form a contrasting pair. Thus, the complement itself is salient in the discourse even though the verb and subject that follow might be unanticipated. For this reason, both production and processing costs are manageable. Moreover, speaker A reinforces the content of his previous statement by rewording it as a canonical sentence (They ha they almost have the same culture). In addition to ensuring that speaker B has properly understood what A said, this also emphasizes the perceived similarity between Filipino and Thai culture. A possible alternative interpretation would be to consider very close to Filipino culture as a strategy to ‘repair’ the previous utterance, which (in the eyes of the speaker) was started incorrectly. However, the first interpretation described in this paragraph has been favoured. Although some topicalized subject complements could be found in ICE-Philippines, their low frequency in the corpus is rather surprising considering that a similar structure is permitted in Tagalog for NPs, AdjPs, and locatives. This structure can be considered part of the speakers’ linguistic repertoire; see (6.60–6.62) for examples given by Schachter (2015).

(6.60) Istudyante ang bata
student T child
‘The child is a student’. (Schachter 2015: 1666; emphasis removed)

(6.61) Matalino ang bata
intelligent T child
‘The child is intelligent’. (Schachter 2015: 1666; emphasis removed)

(6.62) Nasa iskwela ang bata
at school T child
‘The child is at school’. (Schachter 2015: 1666; emphasis removed)

The sole topicalized object complement in ICE-Philippines was found in the phone calls and is shown in (6.63).

(6.63) B: Meron kaming ano we have this weekly prayer Couple’s prayer we call it (ICE-PHI:S1A-094#166–167)

Immediately after introducing the prayer into the discourse, the speaker specifies its name. Thus, the topicalized object complement serves both to create continuity by instantly picking up the topic {prayer} again and to give further detail on the nature of the prayer. Because of the salience of the complement in the discourse and the relatively low complexity of the sentence as a whole, topicalization is felicitous in this context.


136 Topicalization form, function, frequencies ICE-Singapore shows roughly the same distribution as ICE-Hong Kong with regard to syntactic functions, meaning that slightly less than 12 per cent of topicalization occurred with subject complements. Two examples from the direct conversations in ICE-Singapore are given in (6.64) and (6.65). (6.64) B: Cute and the daughter it’s cute uh A: Ya quite cute B: Very cute the way she talks (ICE-SIN:S1A-041#96–98) (6.65) A: Very funny your system very bad your your office B: Why A: I I I I called up two times right The second time I got you already Then they still ask you for word  calls (ICE-SIN:S1A-080#48–52) In both utterances that feature topicalization, the copula be is deleted. SinE is known for frequent copula deletion as a result of contact with Mandarin Chinese (cf. Platt et al. 1984; Gupta 1994; Deterding 2007), while occasional copula deletion in HKE can be explained by contact with Cantonese (cf. Gisborne 2009; Kortmann and Lunkenheimer 2013). If, in addition to copula deletion, a cataphoric pronoun has been omitted, the two utterances with the highlighted complements could also be interpreted as ellipses with right-dislocation in the original construction (6.66–6.67). (6.66) It is very cute, the way she talks. (6.67) It is very funny your system, […] Considering the typological background, however, topicalization is likely in both cases. The two examples in (6.65) are similar to the first in (6.64) in having a fronted subject complement and an omitted copula. They are, however, outstanding because they are not anticipated by the preceding discourse in any way. There is no anticipation of either very funny or very bad or at least none is visible to the reader of the corpus. The information status of the topics can be classified as ‘unused’, though, because the characteristics of B’s office and the processes in it are most certainly stored knowledge for B. 
Discourse-wise, an explanation follows immediately as speaker A explains their criticism. Object complements in sentence-initial position could not be identified in ICE-Singapore. Indeed, object complements are among the least frequently topicalized constituents and occur in sentence-initial position significantly less often than subject complements. As with indirect objects, this could be due to the comparatively low number of complex-transitive verbs relative to copular constructions. In addition, topicalizing a constituent that is already part of a complex construction might be too demanding for speakers and too difficult for hearers to process. Since native and non-native speakers alike do not topicalize object complements (rare exceptions aside), the similarity between ICE-Great Britain and the Asian corpora lends weight to this argument. Subject complements, on the other hand, are topicalized relatively frequently in all varieties. In terms of their form, topicalized subject complements are predominantly realized by adjective phrases: roughly 65 per cent of all subject complements in sentence-initial position are AdjPs. This may be explained by the fact that AdjPs in initial position are much less likely than NPs to create ambiguity; for a hearer, the main alternative to interpreting them as dislocated constituents would be to consider them as potential ellipses. Typologically, this section has reinforced that the mere existence of a structure in a contact language does not automatically result in higher frequencies of a similar structure in the English variety. Whether or not a structure is replicated depends on a complex selectional process involving numerous factors.
Adverbials
Whereas the status of objects and complements as ‘topicalizable’ constituents in English is largely undisputed, adverbials remain controversial in this regard. The relation between topicalization and adverbials therefore deserves special attention, because the two commonly assumed constraints on topics (aboutness and givenness) are often believed not to hold for adverbials; aboutness, at least, might even be considered impossible for most adverbials to satisfy.
I believe that this issue is best dealt with by using more precise terminology. As shown in the CGEL (Huddleston and Pullum 2002: 665–666), some adverbials are much closer to being (mandatory) complements than others. Perhaps the biggest problem with this distinction is that some cases are vague and that formally identical constructions may be omissible in one sentence but not in another. Table 6.10 reproduces a scale from the CGEL that illustrates how differently specific kinds of adverbials, despite belonging to the same overarching category, may be integrated into the sentence. The terms ‘adjunct’ and ‘complement’ can be used to describe such differences syntactically. However, it is the semantic and discourse-pragmatic aspects that explain why very eloquently in i cannot be omitted without losing important information, while the sentence in xxvi ‘loses’ very little information when moreover is removed. The Cambridge Grammar claims that “elements belonging to the later categories in this list are less tightly integrated into the structure of the containing clause than the earlier ones” (Huddleston and Pullum 2002: 680).12 Effectively, the table shows a continuum with complements at one end and adjuncts at the other; the two labels cannot and should not be treated as a clear-cut dichotomy. Hasselgård offers the following definition of adjuncts and other kinds of adverbials:


Roughly, adverbials that contribute to referential meaning are called adjuncts or circumstantial adverbials; those that convey the speaker’s evaluation of something in the proposition are called disjuncts or modal adverbials, and those that have mainly text-organizing and connective functions are called conjuncts or conjunctive/linking adverbials. (2010: 19)
Applying Hasselgård’s definition to the semantic categories in the CGEL, the lowest members, for example connectives such as moreover, would be classified as conjuncts. Disjuncts such as fortunately are located slightly higher in Table 6.10, and adjuncts (e.g., adverbials of manner such as very eloquently) make up the bulk of the higher-placed categories. Although the topicalized adverbials identified in ICE do not all fall into the group of adjuncts, the vast majority do. However, contrasting sentences such as The stew is in the oven versus We had breakfast in the kitchen (Huddleston and Pullum 2002: 680; emphasis in the original) show that a given example cannot always be easily labelled as a complement or an adjunct: the adverbial in the first sentence is obligatory even though it is formally identical to the adverbial in the second sentence.
Table 6.10 Semantic categories of adjuncts according to the CGEL
No. | Example | Semantic category
i | She presented her case very eloquently. | manner
ii | They opened it with a tin-opener. | instrument
iii | We solved the problem by omitting the section altogether. | means
iv | I foolishly omitted to lock the back-door. | act-related
v | He slept in the TV room. | spatial location
vi | He hurried from the scene. | source
vii | She went to New York for Christmas. | goal
viii | We made the mistake of travelling via Heathrow. | path
ix | I crawled towards the door. | direction
x | They walked five miles. | extent
xi | I woke up at five. | temporal location
xii | Ken slept for ten hours. | duration
xiii | It was already light. | aspectuality
xiv | I often read in bed. | frequency
xv | She read the book for the third time. | serial order
xvi | We enjoyed it very much. | degree
xvii | He left the door open in order to allow late-comers to enter. | purpose
xviii | They had to walk because of the bus-strike. | reason
xix | As the sun sank, the light intensified so that the hills glowed. | result
xx | I’ll come along, though I can’t stay very long. | concession
xxi | We’ll get there before dinner if the train is on time. | condition
xxii | Technically, he did not commit an offence. | domain
xxiii | The accident was probably due to a short-circuit. | modality
xxiv | Fortunately, we got there on time. | evaluation
xxv | Frankly, I’m disappointed. | speech act-related
xxvi | There is, moreover, no justification for making an exception. | connective
Source: Adapted from Huddleston and Pullum 2002: 665–666; emphasis in the original; reprinted by permission from Cambridge University Press.
To illustrate the connection between adverbials and information structure, Huddleston and Pullum (2002) provide the three examples in (6.68).
(6.68) a. I saw your father at the window.
b. I saw your father in London.
c. I saw your father on the bus. (Huddleston and Pullum 2002: 681)
While the location indicated in (b) describes the situation in its entirety, at the window in (a) rather locates the theme, your father. The sentence in (c) is ambiguous: it may describe either the speaker’s location at the time of seeing the father or the father’s location when the speaker saw him, in which case the theme interpretation would hold again. Further evidence for the topicality of certain adverbials can be found in Hasselgård’s (2010) monograph. According to Hasselgård, “adjuncts can take various roles in the information structure of a clause” (2010: 293). Adjuncts in initial position may be contrastive, as this position carries “the potential of thematic focus” (ibid.). The two examples in (6.69) illustrate such contrastive adjuncts:
(6.69) a. ‘She means to hurt me,’ Emma thought […]. Aloud, she said: ‘I don’t know what it’s like to be husband-less but I can imagine’.
b. but even further behind Hembury Hall was Rocky Romance (Hasselgård 2010: 293; emphasis in the original)
It comes as no surprise that adverbial adjuncts do not usually represent “the informational peak of the message” (ibid.); this is also assumed to be the case for the majority of tokens found in ICE.
Taking into account the discussion in the CGEL and in specialized studies such as Hasselgård (2010), it becomes evident that some adverbials can occur in topicalized position and function as topics. The degree of markedness varies and depends on both the discourse context and the kind of adverbial, as Table 6.10 suggests. In the present study, whether an adverbial is an obligatory or an omissible constituent was decided by assessing the semantic integrity of the clause when the adverbial is omitted and by taking the preceding and the following discourse into consideration. Critical cases were discussed with a second rater. Since adverbials are very often realized by AdvPs or PPs, many of them fall into the TOP2 group. This means that, with regard to syntactic function, the frequency distribution given at the beginning of this chapter keeps the categories neatly apart, since potentially controversial adverbials are not mixed with uncontroversial cases (such as direct objects). Turning to concrete examples from ICE, the next paragraphs discuss some adverbials that were included in the frequency count and some that were not. As in the previous sections, relevant patterns from some of the contact languages are presented.


The discourse excerpt in (6.70) comes from the direct conversations in ICE-Philippines and shows a token that was not counted as ‘relevant’.
(6.70) A: I don’t think too much about it anymore because uh I don’t know I’ve found some uh I think I think what I am fighting for now is not just because I’m crazy about this person but because I’m married and everything is legal anyway
B: Uh huh
A: You know so unlike in cases like this in cases like this everything’s just wrong wrong wrong In my case morally there’s no problem (ICE-PHI:S1A-006#203–206)
Morally might receive tonal stress, which would make a contrastive reading likely. On this reading, the intended meaning could be that there might not be a moral problem but a problem of a different kind. Another interpretation would be to consider morally as reinforcing the information that there is no problem. The first interpretation would make a pause after morally plausible, but judging from the ICE annotation, no long pause follows. Semantically, the two readings reflect the difference between complements and adjuncts outlined above: as a contrastive adverbial, morally would not be omissible and would have to be considered a case of topicalization. However, a direct opposite of morally can be found neither in the preceding nor in the subsequent discourse, nor are alternatives implied. If morally is not a contrastive adverbial here, the sentence loses no significant meaning when it is deleted. In terms of the semantic categories given in the CGEL, morally would most likely be classified as a domain adverbial, which would place it closer to the adjunct pole than to the complement pole.
Because no second member of a contrastive pair can be found in the subsequent discourse and no pause follows morally, I decided not to count this as a relevant token of topicalization. In contrast, an example that was counted is given in (6.71). Through the M S G in this excerpt is clearly an argument of the verb and cannot be omitted:
(6.71) A: Every day I think that Jollibee earns more than McDonalds
B: Maybe it’s because they have more variety
A: Uh uhm and we have tastier food
B: Through the M S G they’re putting
A: What
B: They’re strictly Filipino (ICE-PHI:S1A-038#32–37)
Tagalog, the dominant contact language in the Philippines and the predominant L1 indicated by the speakers in ICE-Philippines, allows all manner of adverbials in sentence-initial position. These adverbials are usually followed by a particle such as ay. However, according to Kaufman (2006), this particle does not simply mark them as adverbials but functions as a topic marker, as seen in (6.72).
(6.72) a. Kadalasan ay hindi siya pa~pasok sa klase.
usually TOP NEG 3SG.SUB PROG~enter OBL class
‘Usually, he doesn’t come to class.’ (Kaufman 2006: 157)
b. Malamang ay nan-daya sila.
probably TOP AV.PRF-cheat 3PL.SUB
‘They probably cheated.’ (Kaufman 2006: 157)

The positions of the verb and of the particle (which may or may not follow the adverbial) are integral to understanding the meaning of the adverbial (cf. ibid.: 158–159). For the present study, however, what matters most is that speakers of Tagalog are familiar with adverbials in topic position and that adverbials interact directly with syntactic position in Tagalog. This is especially interesting given the lack of clarity regarding the topicalization of other constituents in Tagalog. Another interesting but also problematic case found in ICE-Philippines is shown in (6.73):
(6.73) B: Have you gone to the States Or you’re planning to go somewhere for the summer
A: No I’m planning to go to Toulouse France this summer
B: What is Toulouse
A: Toulouse I I’ve never been (ICE-PHI:S1A-041#80–85)
Toulouse in speaker A’s utterance can be interpreted either as a PP with an omitted preposition or as an NP that is rather loosely connected to the remainder of the clause. In its present form, the constituent cannot be analysed as a hanging topic in the narrow sense because its semantic and syntactic integration is still very strong. If Toulouse were deleted, however, the utterance would still be perfectly understandable because the hearer can easily infer that the speaker is talking about the city. The topic is thus not only fully evoked but is, essentially, an exact repetition. In a sense, it could therefore also be understood as an echo construction. Another example of an adverbial that was counted as a token of topicalization is given in (6.74) from ICE-Hong Kong.
(6.74) Z: But what what subject do you want to teach if you become a primary school teacher
A: Yes If in a primary teachers I think that except except English every every subjects I like (ICE-HK:S1A-012#269–271)


Prepositional phrases headed by except differ from many other kinds of PPs in that their complement is licensed by the matrix clause and is therefore not independent. The CGEL calls such expressions “matrix-licensed complements” (2002: 642), an example of which is the underlined PP in “Everyone liked it except Kim” (ibid.; emphasis in the original). This means that such ‘adverbials of exception’ (Quintero 2002: 99) are, by definition, non-omissible complements. The corpus example underscores this, since deleting the highlighted constituent except English would result in a completely different meaning. The major contact language of HKE, Cantonese, permits sentence-initial adverbials. Two examples from Yip and Matthews (2000) are given below. In both examples, bold print was added to emphasize the position of the adverbial in the Cantonese sentence and in its English translation.
(6.75) Hēunggóng jeui gwai haih jōu uk (lit. Hong Kong most expensive is rent house)
In Hong Kong the biggest expense is rent (Yip and Matthews 2000: 117)
(6.76) Seuhnghói ngóh yáuh pàhngyáuh, Bākging jauh móuh
I have some friends in Shanghai, but not in Beijing (Yip and Matthews 2000: 117)
Topicalized adverbials can also be found in ICE-India. In (6.77), an AdvP is topicalized and sufficiently marked to be counted as a relevant example.
(6.77) B: But basically English language as it is I don’t think we can go for transform very much to, branding of Indian English as much as we brand American English
A: American English There is much difference in American
B: Because uh they specifically they took a lot of efforts they changed it
A: Intentionally they did it
B: Uhm I mean they created their own dictionary (ICE-IND:S1A-015#106–111)
The example in (6.78) features a topicalized adverbial and is also marked, but requires further discussion:
(6.78) A: They trace everything out They just tell you some you see
B: Yes with pleasure I’ll hear it
A: Smile I always like this (ICE-IND:S1A-001#123–126)


With-PPs represent a particularly difficult case: while they may be object comitatives in some constructions, they may be (omissible) adjuncts in others. A gapping test has been proposed to differentiate between these two options (see Zhang 2007: 143); however, applying this test goes beyond the scope of this book, since very few topicalized tokens beginning with with could be identified. In (6.78), the PP was considered sufficiently marked to be counted as a topicalized adverbial but not syntactically important enough to be rated as a necessary argument of the verb (i.e., as an object). In Indo-Aryan languages, locative and temporal adverbials tend to be mentioned before the arguments (Kachru 2006: 159). While English translations of such sentences may sometimes keep the adverbials in the same position as the original, the pre-clausal positioning in the Indo-Aryan languages is more systematic. Three examples, two from Hindi (6.79–6.80) and one from Marathi (6.81), illustrate the difference between the Indo-Aryan languages and English.
(6.79) Hindi
kəl ghər pər koi nəhī̃ tha.
yesterday house at anyone not be.PAST.M.SG
‘No one was at home yesterday.’ (Kachru 2006: 159)
(6.80) Hindi
ʃukrəvar ko laibrerī me͂ ʃyam se mulaqat hogī.
Friday ACC library in Shyam with meeting.F happen.FUT.F.SG
‘(I) will meet with Shyam in the library on Friday.’ (Kachru 2006: 159)
(6.81) Marathi
gharātSyā tShaprāwar to baslā hotā
house-POSS roof-on he sit-PST-3SM was
‘On the roof of the house, he was sitting.’ (Pandharipande 1997: 252)
Interestingly, Hindi adverbials in sentence-initial position may also be promoted to topics (or ‘themes’ in Kachru’s framework) by means of a particle (Kachru 2006: 246); see example (6.82).
(6.82) aj (to) həm ʈenis zərūr khele͂ge.
today (PTCL) we tennis certainly play.FUT.M.PL
‘Today we will definitely play tennis.’ (Kachru 2006: 246)

An adverbial found in ICE-Singapore that was counted as obligatory and, therefore, as topicalized, is given in (6.83):


(6.83) A: We bought you something
C: And he didn’t like it
A: Ya actually
C: Ya
B: I really like it
C: You like it
B: Really Tomorrow I’ll wear it
C: With that shirt ya Red and green Really clashes ya Yuck On a on a white shirt it’ll look good (ICE-SIN:S1A-056#396–408)
Mandarin, one of the main contact languages of English in Singapore, permits placing the adverbial in initial position depending on its information status; compare (6.84) and (6.85) from Yiu (2014).
(6.84) Tā zài hēibǎn shang xiě zì.
he at blackboard Localizer write character
‘On the blackboard, he writes characters.’13 (Yiu 2014: 84)
(6.85) Tā xiě zì zài hēibǎn shang.
he write character at blackboard Localizer
‘He writes characters on the blackboard.’ (Yiu 2014: 84)

The topic of each sentence differs depending on whether the blackboard or the writing represents old information. Summing up, adverbials functioning as sentence topics are prevalent in many of the major contact languages of English in Asia. They may be marked as topics by particles, by sentence-initial placement, or by both. However, subtle differences in the frequencies of topicalized adverbials cannot be explained solely by considering contact structures. The question of whether adverbials can be complements, and which criteria they need to fulfil to be regarded as such, is answered differently depending on the theoretical framework. As Austin et al. point out, “various approaches differ as to which adverbials are considered to be complements, and as regards the syntactic (and semantic) argumentation that supports these assumptions” (2004: 8). The syntactic and semantic integration of an adverbial is difficult to judge because speaker intentions are not always clear to the reader; in some cases, a speaker may intend a sentence-initial adverbial as a topic without a reader of the corpus interpreting it as such. The two primary criteria in deciding whether to treat an adverbial as a token of topicalization were (a) its syntactic integration (i.e., whether it is a complement or an adjunct) and (b) the subsequent discourse (i.e., whether the discourse revolves around the information in the adverbial). The rating of an adverbial as obligatory or omissible was often confirmed by its syntactic-semantic integration, and this was also supported by the second rater. First and foremost, this section shows that adverbials can be topicalized and that their status in the utterance is subject not only to syntactic but also to semantic and discourse-pragmatic constraints. The frequency of topicalized adverbials in the four Asian varieties is certainly influenced by the presence of similar structures in the contact languages, but the factors that determine the frequency of the other topicalized constituents also shape the topicalization of adverbials. At any rate, the topicalization of adverbials is a promising area for studying innovative structures emerging in second-language varieties of English. Innovation and creativity in the use of topicalization are also the focus of the next section.
6.3.2 Interaction with further syntactic processes
Topicalization in the ‘traditional’ sense is the fronting or preposing of discourse-old information in the form of NPs functioning as direct objects. As I mentioned earlier, such an approach is far too narrow to describe the reality of topicalization usage in Asian Englishes. To lend further weight to this argument, this section presents and discusses tokens that interact in specific ways with other syntactic processes. The first part analyses topic extraction from embedded clauses, and the second and third parts show tokens of topicalization interacting with wh- and yes/no-questions as well as with negation and copula deletion. Finally, cases where speakers shift from an initiated canonical SVX sentence to topicalization are presented and discussed.
This section thus shows that the expanded functions of topicalization that Mesthrie identified for SAIE in his 1992 monograph also apply to the Asian varieties under consideration and, to a certain extent, to BrE. One of the expanded ‘functions’ of topicalization that Mesthrie identifies for SAIE is the extraction of topics from embedded clauses (1992: 114). Since ambiguity and the clipped utterances in ICE do not allow for an analysis of stacked topics, I chose to focus exclusively on the extraction of topics from embedded clauses. Overall, extracted topics occurred rarely in the analysed ICE corpora: in all files combined, only seven instances could be identified. This amounts to 1.27 per cent of the 552 tokens across the three sub-corpora and is notably lower than Mesthrie’s 6.5 per cent for SAIE (cf. ibid.: 120). However, even these relative numbers cannot be directly compared: Mesthrie includes left-dislocation in his analysis, which allows cases such as (6.86) to be counted. In addition, he includes the stacking of topics (example 6.87) in this count as well.
(6.86) Because Hindu religion and culture, I feel it’s too beautiful. (Mesthrie 1992: 120)


(6.87) Most of the children, English films they like. (Mesthrie 1992: 121)
On the other hand, Mesthrie states that the majority of his examples, 73 tokens to be precise, were cases of topic extraction, which is still far more than could be found in ICE. Despite the low numbers, however, some of the most interesting constructions involving topicalization emerged in the analysis of this criterion. Consider example (6.88) from ICE-Singapore:
(6.88) B: How about the other time I we try the AR Pioneer is not a Proton Uh Proton I don’t think is a good buy Doesn’t lood a job uh
A: Proton is a maturer manufacturer than Will not They make products that are good (ICE-SIN:S1A-018#331–337)
This excerpt builds on a larger discussion of sound-system manufacturers. Although B mentions Proton for the first time here, it is well established in the discourse via a poset: the poset {manufacturers} is the topic for a long stretch of the discourse, and other members such as Typhoon and Spando are mentioned before Proton. In the utterance featuring topicalization, Proton is extracted from the embedded clause in I don’t think [that Proton is a good buy]. Proton thus functions as the subject of the embedded clause (realized here without the subordinator that) and is moved from its regular initial position in the subordinate clause to the very beginning of the sentence. Apart from being one of the few examples of extraction, this is also the only instance of a topicalized subject (which, of course, requires extraction, because canonical subjects naturally occur sentence-initially). In the discourse, this example creates continuity while also serving a contrastive function. Another highly interesting case, which is also a hapax legomenon in the data, was found in the phone calls in ICE-India.
In this excerpt, given in (6.89), A and B take quick turns discussing the details of a freight shipment.
(6.89) A: Air freight it takes about three days na?
B: But it takes about ten days for clearance
A: Is that?
B: Yeah
A: Haan haan [Hindi: ‘yes yes’]
B: So now you see they have to immediately ship fifty k g
A: Immediately they say it’s not possible
B: Haan [‘yes’]
A: See they can uh earliest they can dispatch before the end of next week (ICE-IND:S1A-094#76–84)


The information that the 50 kilograms need to be shipped immediately (and not, for instance, tomorrow or next week) is of the highest importance in the discourse. Even in B’s utterance preceding the relevant token, immediately is emphasized by being mentioned before the information about which goods need to be shipped. Speaker A then emphasizes the temporal information further, not merely by moving immediately within the embedded clause but by placing it at the very beginning of the entire utterance. The canonical word order of the sentence (‘They say [that] it is not possible immediately’) is thus completely overturned. An alternative interpretation with immediately modifying the main clause as a whole (‘Immediately, they say it is not possible’) can be ruled out on the basis of the preceding discourse. In addition to this interesting violation of canonical word order, the example also displays the potential of adverbials as topics. The utterance is about the point in time when the shipment needs to be delivered, and the adverbial contains evoked information. For this reason, even a narrow definition of topicalization would likely include this example, as it fulfils both the aboutness and the givenness criteria for topics. Typologically, it is interesting to note that both the mother tongue and the additional language indicated by speaker A in example (6.89) allow for topic extraction; see example (6.90) from Marathi and (6.91) from Hindi.
(6.90) āilā, hī kāltSīts gosta āhe kī rameś dɔktarkaḍe gheūn gelā
Mother-ACC this yesterday’s-EMPH matter is COMP Ramesh doctor-TO take go-PST
‘Mother, it is only yesterday’s matter, that Ramesh took (her) to the doctor.’ (Pandharipande 1997: 252)
(6.91) dukaane͂ meraa khayaal hai ki tᵢ nau baje khul jaa-tii haĩ
stores.F my idea be.PRS.SG that tᵢ 9 o’clock open GO-HAB.F be.PRS.PL
‘The stores, I think, open at 9 o’clock.’ (Gambhir 1981: 303–304)
Considering (6.90), Pandharipande comments that Marathi allows for “the elements in the subordinate clause [to] be moved to the front of the matrix clause” (1997: 252). Indian speakers of (at least) Marathi and Hindi are therefore familiar with this structure, although the speakers recorded for ICE-India do not use it particularly frequently in English. In summary, the extraction of topics from embedded clauses is found in all varieties except HKE. With only seven tokens, however, it is not particularly frequent, even though topic extraction can result in highly interesting constructions, as the discourse excerpts above illustrate. In addition to the extraction of topics, Mesthrie (1992) observes topicalization in wh- and yes/no-questions in SAIE; he gives the examples in (6.92–6.93).


(6.92) Alone you came? (= ‘Did you come alone?’) (Mesthrie 1992: 114)
(6.93) Your car where you parked? (= ‘Where did you park your car?’) (Mesthrie 1992: 114)
In the analysed ICE corpora, topicalization in questions occurs only occasionally. With the exception of PhilE, all varieties feature at least one question with a topicalized constituent. Most questions involving topicalization resemble the excerpts in (6.94) from ICE-India and (6.95) from ICE-Singapore. In these two yes/no-questions, a direct object realized by an NP is topicalized.
(6.94) A: As soon as the exams were over evaluation and then uh I just finished evaluation and came here
C: I didn’t go anywhere no Just I kill the time in summer reading doing nothing
B: Evaluation you have done? (ICE-IND:S1A-031#296–299)
(6.95) A: Ya I think they should have flip chart
B: So they should have
A: So these are the only two things uh that Mandarin has to provide lah The rest we can bring right
B: Uhm
A: Prop uh costuhmes and uh what else huh The guns and all that (ICE-SIN:S1A-015#192–198)
ICE-India features the sole example of a topicalized obligatory adverbial in a question; see (6.96). The information contained in the topicalized adverbial contrasts with other locations discussed in the discourse. At the same time, it creates continuity by highlighting the location that was in focus in speaker A’s previous utterance.
(6.96) A: Did you celebrate this Daserra in uh your place
B: Yeah
A: In Andhra you are celebrating
B: Yes in Andhra means only Telangana Telangana only one day (ICE-IND:S1A-002#131–134)
Example (6.97) features predicative topicalization in what appears to be a rhetorical question and is the only instance in which topicalization occurs in a question with an interrogative word. In addition to deleting the copula, the speaker topicalizes the predicative adjective. The canonical version of the utterance featuring topicalization would be ‘How many more months are left?’:
(6.97) A: Haven’t done anything haven’t got results Haven’t got my clones for that one yet you know So how Very worried you know
B: Left how many more month But you submitted that abstract already right (ICE-SIN:S1A-020#279–284)
In conclusion, topicalization in questions occurred rarely in the analysed ICE components. Fewer than ten questions featuring a topicalized constituent in initial position could be identified, which means that it is not a major phenomenon. Still, these tokens provide further evidence of the creative use of topicalization in spoken varieties. Typologically, the interaction of topicalization with questions is not explicitly mentioned in the consulted grammars. A typological comparison would therefore have been fragmentary at best and was not included in this section.
In some rare cases, topicalization interacts with negation and deletion. Deletion is understood here as an umbrella term that includes both copula deletion and pro-drop. Copula deletion is not mentioned by Mesthrie (1992) as one of the processes co-occurring with topicalization, but it could be identified in some of the corpus files analysed for the present study. Some of Mesthrie’s examples for negation (6.98 and 6.99) and pro-drop (6.100 and 6.101) from his SAIE data are given below:
(6.98) I’m here fourteen years; not with one neighbour I had problem. (Mesthrie 1992: 114)
(6.99) No slang an’ all we used to use. (Mesthrie 1992: 114)
(6.100) Rajend never see long time. (= ‘We haven’t seen Rajend for a long time’) (Mesthrie 1992: 114)
(6.101) Skabeni Hill must walk now. (= ‘I have to walk up Skabeni Hill now’) (Mesthrie 1992: 114)
In the four Asian ICE components, topicalization interacts with negation and pro-drop in some cases. Example (6.102) from ICE-Singapore is interesting in that it involves both negation and pro-drop, although the referent of the dropped pronoun must be inferred from the context.

(6.102)
A: Go and learn
B: Don’t want See first ah Sometime you got the urge to take back organ
A: Organ don’t want
(ICE-SIN:S1A-085#85–89)

A case more in line with Mesthrie’s examples, in which the negation is actually part of the topicalized constituent, can be seen in example (6.103) from ICE-India:

(6.103)
B: And that’s the thing
A: That is the problem with me also
B: Even I face the same problem but I try to tackle it you know with uh I’ve got some problems whenever I have Actually not in this earlier school no I had problems like this most of the students no they are just like hooligans and all that
(ICE-IND:S1A-085#86–89)

This example is particularly interesting because the dislocation of the negative particle entails the omission of the auxiliary do, which is no longer present in the sentence.

An example from ICE-Singapore that involves copula deletion is given in (6.104). Here, speaker B omits a form of be and gives only the subject complement so boring in initial position, followed by the subject the lecture:

(6.104)
B: That means you want to skip the lecture You also
C: Cannot cannot If I skip also then nobody is going to go there
B: You you send one rep
A: Aye just like the case of last year that time at first so many then we count
B: So boring the lecture
C: It’s not terribly boring
(ICE-SIN:S1A-069#95–102)

The syntactic interpretation of this example is not straightforward. At least three readings of the utterance are possible. In two of them, the copula is restored in different positions: is could occur right after so boring (‘So boring is the lecture’) or after the lecture (‘So boring the lecture is’). Of these two, the latter would be the stronger case of topicalization, since the subject keeps its canonical position before the verb. As an alternative to these interpretations, it could be assumed that the entire sentence is elliptical.
Under this reading, several reconstructions are conceivable. The lecture may have been added by the speaker as a clarification so as not to confuse the other participants in the discussion; in this case, a fully realized utterance might have been something along the lines of ‘It’s so boring, I mean the lecture’ or, in a stronger case of right-dislocation, ‘It’s so boring, the lecture’. However, given the tendency of Singaporean speakers to delete the copula in informal speech (cf. Deterding 2007), a scenario in which the copula is deleted and the complement is topicalized for emphasis is plausible (see also Leuckert and Neumaier 2016 for a discussion of the copula in various Asian contact languages of English). The only token in ICE-Great Britain that was found to be relevant for the criterion of deletion, given in (6.105), is similar to (6.104) in that the subject complement could be interpreted as topicalized, while other interpretations are also plausible:

(6.105)
A: You went to Greece though didn’t you
C: Yes Uh we went we went to uh Hagios Lindos
A: Lindos
C: Yes Villa Rainbird
D: That’s right Yes
C: Very good that I’m sure
D: It was great
(ICE-GB:S1A-063#32–41)

In this excerpt, very good could be understood as speaker C’s initial (presumably elliptical) response. The speaker then expands this response rather awkwardly, possibly without having planned to do so at the start of the utterance.

In a cross-varietal comparison, the interaction of topicalization with negation and deletion is infrequent; the examples discussed above already represent most of the less ambiguous tokens. In conjunction with the other phenomena identified, however, they further strengthen the argument that topicalization is employed creatively in the four Asian varieties of English as well as in ICE-Great Britain.

The last of Mesthrie’s criteria for an expanded concept of topicalization to be analysed in this section is the shift from initiated SVX to topicalization.
Mesthrie identifies this tendency of SAIE speakers in basilectal speech: “[T]he drive towards topicalisation seems so strong in the basilect as to operate even when speakers have already begun with canonical SVO order” (1992: 115). Of course, the analysed corpus texts can hardly be described as ‘basilectal’. Although the direct conversations and phone calls in ICE are largely informal, they are designed to capture the standard(ized) variety. For the present analysis, shifting from SVX to TOP was annotated regardless of register. In addition, the analysis was not restricted to SVO; instead, any initiated canonical pattern (e.g., SVC or SV plus obligatory adverbial) was counted. That shifting from initiated SVX to topicalization may occur even in a relatively formal setting can be seen in example (6.106).

(6.106)
A: And therefore in that sense it’s isolated it seems to be and that is isolated from society so that is the second meaning of isolation
(ICE-HK:S1B-008#96)

In this example, taken from the classroom lessons in Hong Kong, the canonical SVX order is first realized in full: the clause it’s isolated is completed with isolated as the subject complement following the copula. However, the speaker then continues with it seems to be, so that isolated also functions as the topicalized complement of this clause (canonically, it seems to be isolated). It should be mentioned that it seems to be might also be a repair; based strictly on the written transcript, however, topicalization is the likely reading.

In contrast to (6.106), the example shown in (6.107) from the conversation files in ICE-India occurs in a more casual setting. Again, the sentence features a subject complement that is part of both the regular clause and the topicalized clause. In this example, the mark-up for the pauses was not removed to show that speaker A does not pause before continuing with the emphatic it is after the subject complement just entertaining.

(6.107)
A: The other day you saw that film uhn
B: Which
A: Uh solwan saal
B: No
A: This one was there yaar Waheeda Rehman and Guru Dutt
B: No
A: Nice movie yaar that song is there no hai apna dil to awara
B: I didn’t see
A: Very nice movie it is just entertaining it is
(ICE-IND:S1A-052#239–247)

In (6.108) from ICE-Singapore, speaker A’s utterance featuring topicalization is complex:

(6.108)
B: So because uh if say right now uh if the user say our user the account user they they printer got problem they will still call us you know
A: Uhm uhm uhm
B: But next time this one should be solved by the word
A: What you mean is that uhm we’ll sort of act as a buffer
B: Uh the interface uh
A: Solve the the minor minor problems we’ll solve But when it comes to a more the more specialised programmes we’ll look you’ll look into that
(ICE-SIN:S1A-045#30–36)

The initiated utterance lacks a subject in canonical position, which means that it does not begin as a ‘regular’ SVX sentence. Since pro-drop is a common feature of Asian Englishes, however, the sentence could still be included in the present analysis.¹⁴ In addition, speaker A has already introduced the subject we in their previous statement and mentions it again at a later point. Apart from the missing subject in initial position, the sentence unfolds canonically with the verb solve and the direct object the minor problems. Following this structure, speaker A then provides the subject and the modal verb. The speaker repeats the main verb either as reinforcement or to complete the verbal complex; the sentence is ‘complete’ because all grammatically required constituents are present.

The example given in (6.109) from the direct conversations in ICE-Hong Kong is highly interesting because of the apparent ‘reverse’ direction of shifting: instead of turning to topicalization after initiating a canonical SVX sentence, speaker A either repairs or completes their utterance by (seemingly) switching from topicalization to SVX. Since it is more likely that the speaker intended to form a canonical sentence, as their repetition of only suggests, this example cannot be considered as fulfilling Mesthrie’s criterion of shifting from SVX to topicalization. However, it shows a possible alternative to the function suggested by Mesthrie.

(6.109)
B: But can you can you see the match uh the the live match on the Internet
A: The the only only the scores I can only read the scores
(ICE-HK:S1A-083#8–9)

Shifting from SVX to topicalization is not limited to the Asian varieties.
In the direct conversations of ICE-Great Britain, four out of 55 tokens exhibit this pattern. Two of them are shown in (6.110) and (6.111).

(6.110)
B: He tends to tends to work with Harry Beckett he plays with quite a lot
(ICE-GB:S1A-058#264–265)

(6.111)
A: I mean I love American crap especially comedies like crap comedies that everybody thinks are crap I like
(ICE-GB:S1A-041#123)

In summary, shifting from initiated SVX to topicalization occurs in all varieties but is a relatively rare phenomenon. It is not limited to basilectal speech, for it was also found in the rather formal setting of the classroom, where mesolectal or acrolectal speech is expected.

Summing up, this section has shown that interactions with syntactic processes such as interrogation, negation, and topic extraction from embedded clauses, as well as shifting from initiated SVX to topicalization, occur in most of the analysed varieties. Speakers employ topicalization creatively in different contexts and sometimes even seem strongly inclined to do so. This is most apparent in instances of topic extraction from embedded clauses and of shifting to topicalization after initiating a canonical SVX clause. Again, BrE does not differ markedly from the Asian varieties. This lends further weight to the argument that Mesthrie’s suggested expanded functions of topicalization apply to spoken varieties of English in general and are not limited to L2 (or learner) varieties.

6.3.3 Discourse function

In addition to syntactic function, topicalization can also be analysed with regard to the four discourse functions described previously: emphasis, contrast, continuity, and shifting. This section presents examples and indicates the frequency of each function. Additionally, information status is correlated with discourse function.

An overview of the frequency of each discourse function across varieties is given in Table 6.11. In addition to the four discourse functions and their frequencies, the table indicates the combined percentages of emphasis and contrast, on the one hand, and of topic continuity and topic shifting, on the other. This was done to highlight the proportion of tokens that foreground a constituent as opposed to those that affect cohesion in some way. The table shows that topic continuity is the dominant function in ICE-India and ICE-Philippines, whereas emphasis and contrast combined account for more than half of all tokens in the other three corpora.
Table 6.11 Discourse functions of topicalization across varieties

                ICE-GB         ICE-HK         ICE-IND        ICE-PHI        ICE-SIN
Function           n       %      n       %      n       %      n       %      n       %
A  Emphasis       16   23.19     14   27.45     58   22.66      6   12.50     35   27.34
B  Contrast       20   28.99     12   23.53     29   11.33      8   16.67     35   27.34
   A+B (%)             52.18          50.98          33.99          29.17          54.68
C  Continuity     33   47.83     25   49.02    169   66.02     33   68.75     57   44.53
D  Shift           0    0.00      0    0.00      0    0.00      1    2.08      1    0.78
   C+D (%)             47.83          49.02          66.02          70.83          45.31
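The percentages in Table 6.11 follow directly from the raw token counts. The short Python sketch below is an illustration added here, not part of the study’s own procedure: it recomputes each function’s share per corpus, the combined highlighting (A+B) and cohesion (C+D) shares, and which grouping dominates. The dictionary layout and the function names (shares, grouped, dominant_group) are hypothetical.

```python
"""Recompute the shares in Table 6.11 from the raw token counts.

Illustrative sketch only; the data structure and function names are
hypothetical and not part of the study's annotation workflow.
"""

# Raw token counts per corpus, copied from Table 6.11.
COUNTS = {
    "ICE-GB": {"emphasis": 16, "contrast": 20, "continuity": 33, "shift": 0},
    "ICE-HK": {"emphasis": 14, "contrast": 12, "continuity": 25, "shift": 0},
    "ICE-IND": {"emphasis": 58, "contrast": 29, "continuity": 169, "shift": 0},
    "ICE-PHI": {"emphasis": 6, "contrast": 8, "continuity": 33, "shift": 1},
    "ICE-SIN": {"emphasis": 35, "contrast": 35, "continuity": 57, "shift": 1},
}


def shares(counts):
    """Return each discourse function's share of all tokens, in per cent."""
    total = sum(counts.values())
    return {function: 100 * n / total for function, n in counts.items()}


def grouped(counts):
    """Combine functions that highlight a constituent (A+B) and
    functions that affect cohesion (C+D)."""
    pct = shares(counts)
    return {
        "highlighting": pct["emphasis"] + pct["contrast"],
        "cohesion": pct["continuity"] + pct["shift"],
    }


def dominant_group(counts):
    """Name the grouping with the larger combined share."""
    groups = grouped(counts)
    return max(groups, key=groups.get)


if __name__ == "__main__":
    for corpus, counts in COUNTS.items():
        groups = grouped(counts)
        print(
            f"{corpus}: n={sum(counts.values())}, "
            f"A+B={groups['highlighting']:.2f}%, "
            f"C+D={groups['cohesion']:.2f}%, "
            f"dominant={dominant_group(counts)}"
        )
```

Because the sketch sums unrounded shares, its A+B figures can differ from the table in the second decimal place (52.17 rather than 52.18 for ICE-GB, 33.98 rather than 33.99 for ICE-IND, 54.69 rather than 54.68 for ICE-SIN); the table appears to add the already-rounded percentages.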

Although I agree with Winkle in her claim that “the creation of topic continuity cannot be seen as the sole motivating factor for Indian English speakers to use fronting constructions” (2015: 155), my findings are more in line with Lange (2012a), who found creating topic continuity to be of great importance in IndE. In fact, despite a largely similar database, Winkle found only 32.4 per cent of topicalizations in IndE to be motivated by topic continuity (ibid.: 156), compared with 66.02 per cent in the present study. This difference can be attributed, in part, to a different definition of continuity, since the definition applied in this book is more encompassing.

Emphasis and contrast

The first two discourse functions to be discussed in some detail are emphasis and contrast. The following examples illustrate how emphasis and contrast are employed in the Asian varieties of English. Some tokens that were classified as emphatic rather than contrastive are shown in (6.113), (6.114), and (6.115).

(6.113)
A: And he said sometimes if you find a kid who is particularly rebellious Everything you ask he is doing opposite to you and you always think that why is he not a good kid you know
(ICE-HK:S1A-033#166–167)

(6.114)
B: I mean no group just individuals
A: Ya
B: Just can’t imagine
A: Ya can Aiyah drama people mah Very easy one uh You just you just
B: People I know
A: Don’t think so
(ICE-SIN:S1A-025#32–40)

(6.115)
B: Are you asleep yet
A: Hi Naw Going to A little bit more I would have knocked off
B: Okay
(ICE-SIN:S1A-098#3–8)

In all three examples, the topicalized constituents emphasize information but do not suggest a contrastive reading, either explicitly or implicitly. Everything you ask in (6.113) addresses the problem connected to the rebellious personality of the aforementioned child, while speaker B in (6.114) picks up the previous topic and emphasizes their perceived knowledge of human behaviour. In (6.115) from the phone calls in ICE-Singapore, a little bit more is new information that highlights the fact that the speaker wants to sleep a little longer.

As in the analysed Asian components of ICE, emphasis is also an important discourse function in ICE-Great Britain. One example from the corpus is given in (6.116).

(6.116)
B: Well I’ve been doing a lot of research into this and everybody that cooks I ask how they make pastry you see And they all say well it’s very difficult
(ICE-GB:S1A-057#131–132)

That the line between emphasis and contrast is not always easy to draw is illustrated by the topicalized comitative with ceramic in (6.117) from ICE-Great Britain:

(6.117)
B: You really do need to uhm If you’re looking at the uh the microwave properties then uh it’s essentially you need to fashion a cavity That’s that’s with ceramic of course that’s not too difficult to make That’s why applications involving those things have already been carried out
(ICE-GB:S1A-089#34–37)

On one reading, with ceramic has a contrastive function: while the described goal can easily be achieved if ceramic is used, other materials may not be as successful. On the other hand, the properties of ceramic as the ideal choice could be emphasized without any intent to downplay the potential of other materials.

Not all instances of topicalization are ambiguous, however. Tokens that clearly serve to create a contrast were found in all corpora, and some of these cases occur in pairs.
An example from IndE (6.118), also identified by Lange (2012a: 132), gives two contrasting set members:

(6.118)
A: Oh we never knew what happened to her The other we knew ah but Vidya we didn’t knew
(ICE-IND:S1A-021#58–59)

In this example, an overlap of functions can be observed. The other and Vidya are clearly contrasted with each other, just as the positive verb form in the first clause is contrasted with the negated verb form in the second; at the same time, however, the two objects are clearly emphasized and directly connected to the preceding sentence. The contrast between the topicalized constituents is a case of ‘explicit’ contrast, that is, the alternatives are “explicitly mentioned, contrasted or denied in the same stretch of discourse” (Callies 2009: 23). Similar examples were also found in other sub-corpora; see, for instance, (6.119) from the conversation files in ICE-Singapore:

(6.119)
A: Each one do one only is it
B: Ya this one I have to return Seok Mee and this one consider as ah for word tutorial
(ICE-SIN:S1A-069#31–32)

Explicit contrast may also occur without both alternatives appearing in topicalized position; (6.120) from ICE-Hong Kong is a case in point. For convenience, the non-topicalized contrastive partner is underlined.

(6.120)
Z: Yeah so it’s like how many subjects are you doing French English
A: Uhm I take uhm totally s six courses this year and two whole courses in French and other four ha and other courses on English And English you have to prepare but I don’t think it is not […]
Z: English is easy
A: Not that harsh as yours Quite easy You you just have to study
(ICE-HK:S1A-017#268–275)

Without the added context, English could be interpreted as either contrastive or emphatic. Taking the preceding and the following discourse into consideration, however, a contrastive function seems more likely. In the utterance that features the topicalized object, speaker A explains that English requires preparation but is not as challenging a subject as French. The surrounding discourse is needed to grasp this meaning. Since French as a subject is explicitly mentioned in the sentence preceding the topicalization, this contrast is also explicit. Another example from the classroom lessons in ICE-Singapore, given in (6.121), shows a similar scenario.
Here, the speaker contrasts the first-mentioned point in time, now, with the topicalized in the past:

(6.121)
A: Right so the point that he was trying to say is that even though now we don’t pronounce persychology laughs In the past maybe they did but I think the ‘k-n’ one know ya they might have pronounced ‘k-n’ and then we stop pronouncing the ‘ker’ or something like that okay but then the spellings do show the previous phonological system
(ICE-SIN:S1B-002#150–151)

For topicalized (or, in their terminology, ‘preposed’) AdjPs, the CGEL states that they always have a contrastive function (Huddleston and Pullum 2002: 1375). Some of the tokens found in ICE fit this constraint; see, for instance, example (6.65), repeated in part here as (6.122).

(6.122)
A: Very funny your system very bad your your office
(ICE-SIN:S1A-080#48)

Across all topicalized AdjPs found in the analysed corpora, however, this constraint does not hold in every case. Of the 45 topicalized AdjPs, only 8 were identified as serving a primarily contrastive function. This figure should be treated with caution, since discourse functions may overlap and their identification is, to an extent, subject to individual interpretation.

Regarding the overlap of functions, the excerpt in (6.123) from ICE-Singapore shows a token that simultaneously creates continuity and contrast:

(6.123)
C: And he charged you quite okay
B: Flowers quite reasonable Flowers uhm okay lah compared to other shops we’ve seen How much is it Okay one bridal bouquet uh one bridal bouquet one posy bridal bouquet is a hundred posy is about eighty hair pieces forty corsages he gave free
(ICE-SIN:S1A-002#164–169)

Corsages in the discourse excerpt is a new member of the poset {flower arrangements} and directly continues the list of items provided by speaker B. Simultaneously, the corsages stand in direct contrast to the previously added members, since speaker B switches from naming the price after the item to giving a full sentence telling speaker C that the corsages were free. A similar example that serves both to create contrast and to establish topic continuity is given in (6.124), which is also taken from the conversation files in ICE-Singapore.
(6.124)
E: So so small group discussion and so going to see how what happens in life
F: Or where whether are we in the mood
E: We usually have interesting topics laughter
F: Followed by food No food we cannot discuss
(ICE-SIN:S1A-088#302–307)

Speaker F addresses the poset {interesting topics} opened by speaker E by first suggesting food as a potential topic for discussion. After laughing about it, they say that food is not an appropriate topic. In this utterance, food is topicalized and establishes cohesion in the discourse, but it is also contrasted with other alternatives. These alternatives are practically endless, since the poset {interesting topics} could contain any number of members.

Topic continuity and topic shifting

The next discourse functions of topicalization to be analysed are the establishment of topic continuity and its direct counterpart, topic shifting. Topic continuity is roughly understood here in the sense of Lange, who found for IndE that “[m]any examples display an explicit discourse-linking function, where the immediately preceding topic is taken up again” (2012a: 134). Instead of only looking at the immediately preceding topic, I also count as continuations those cases in which a topic had been established at an earlier point in the discourse. For instances of topic continuity, see examples (6.125) from HKE, (6.126) from SinE, and (6.127) from PhilE.

(6.125)
Z: It’s not easy to uh witness to your own people
A: I know
Z: Sometimes it takes a long long time
A: What is your suggestion
Z: Patience and prayer
A: Yeah and prayer I have
(ICE-HK:S1A-052#279–284)

(6.126)
B: I think most people do that but my sister Mary she loves buying clothes for them She insist on wearing the same clothes you know She keeps buying clothes for them I think most of their clothes they wear two three times only
(ICE-SIN:S1A-048#176–180)

(6.127)
C: Well we sound like sore losers laughter
B: No
A: No Hello That I will never agree with you
(ICE-PHI:S1A-073#307–311)

In all three examples, a topic that was mentioned in one of the preceding utterances (immediate or otherwise) is referred to again by means of topicalization. In (6.125), for example, speaker A repeats speaker Z’s suggestion to turn to prayer for help. By repeating prayer, speaker A establishes topic continuity and therefore discourse cohesion. In addition to exact repetition, the demonstrative pronoun that, as seen in (6.127) from ICE-Philippines, can establish topic continuity. Since there must be an identity link between a discourse-deictic pronoun and its referent, cohesion is implied when such pronouns are used. Finally, (6.126) is an excerpt in which a topicalized constituent creates cohesion within a monologue by a single speaker. This shows that topic continuity can also be employed to structure a longer passage uttered by the same person and, thereby, make it more accessible to the hearer. In addition to establishing topic continuity, the topicalized constituent in (6.126) also emphasizes the NP most of their clothes.

While the establishment of topic continuity is one of the primary functions of topicalization in the ICE corpora, topic shifting by means of topicalization is very rare (only two tokens across all corpora; see Table 6.11). In the exchange given in (6.128), speaker A introduces a topic shift with the discourse marker by the way, but the transition still feels relatively ‘rough’ and unanticipated.

(6.128)
B: The frontal view frontal view or I don’t know what’s the where the bridge is located
A: By the way after graduation what are your plans
B: I plan to have a short gathering a sort of celebration
(ICE-PHI:S1A-045#67–69)

As an indicator of temporal location, after graduation falls between adjuncts and complements in the CGEL classification. In the context of the discourse, however, it is marked; speaker A wants to shift the topic with an emphasis on the point in time. The pauses were kept in the example to show that the speaker does not pause after bringing the new topic into the conversation. The shift is successful, since the following utterances revolve around B’s graduation, which is now salient in the discourse. A similar example of an unanticipated topic shift by means of topicalization has already been quoted above and is given here with more of the following discourse in (6.129).

(6.129)
C: Is it a restaurant
E: It is a restaurant
C: And they have satay
D: They have satay Recently that man I saw
C: word
E: It’s quite good Is it
D: The satay is good
(ICE-SIN:S1A-037#153–161)

Speaker D either utters a spontaneous thought or wants to add information that only seems unrelated to the reader/hearer.
Whatever the case may be, it is evident that in the following discourse, with the possible exception of the unclear passage, there is no further reaction to the newly introduced topic that man. Besides being one of only two cases that clearly serve to shift the topic, this is also one of the rare cases in which an attempted topic change is unsuccessful. This assessment reflects only an outsider’s perspective, however; speaker D’s intention cannot be determined from the quoted discourse segment alone.

Overall, the paucity of shifting tokens may be explained by the tendency of speakers to be polite to their interlocutors. Communication is generally considered a cooperative process, even though speakers may, of course, manipulate a conversation in many ways. Shifting a topic by means of topicalization could easily be interpreted as an expression of disinterest or impoliteness. Politeness is often pointed out as a defining characteristic of many Asian societies, which would explain, to an extent, why an act that could be interpreted as rude is generally avoided. Furthermore, the fact that speakers are often aware of being recorded could add to this effect, as they may perceive a risk of losing face. The next section considers topic persistence, which is not a discourse function in the narrow sense but a related concept.

Topic persistence

Topic persistence is defined by Givón as “the number of times the referent persists as an argument in the subsequent ten clauses following the current clause” (1984a: 908; cf. also 2001a: 457). An influential article dealing with topic persistence is Gregory and Michaelis (2001), who compared the persistence of topics that were placed in sentence-initial position (a) via left-dislocation and (b) via topicalization. They found that “[t]he majority of the preclausal-NP denotata in TOP are (a) previously mentioned and (b) fail to persist as topics” (2001: 1696), while topic persistence of LD-topics is much stronger. Consequently, they conclude that left-dislocation is a more effective topic-establishing construction than topicalization (ibid.). This is where topic persistence relates to discourse function, since topicalization is seen, in this sense, as one of many possible means of establishing a topic. Topic establishment was not tagged in this study as an alternative function to emphasis, contrast, and so forth. Instead, topic persistence was analysed for every single token.

For the present study, I did not apply a scoring system such as those used by Givón (1984a) and Gregory and Michaelis (2001). The latter gave zero points when a topic did not persist at all, one point when an NP was repeated, and two points when the topic returned in the discourse in the form of a pronoun (2001: 1689). I analysed whether or not the topic persisted in the discourse and, if it did, in which way it persisted. If the topic was referenced again beyond the following ten clauses, it was still counted as persistent. In addition to explicit topic persistence, for example, when the topic is referred to by a pronoun or by repetition, I also counted cases in which speakers continued to talk about the topic without referencing it directly. In the following paragraphs, I exemplify implicit and explicit topic persistence and compare topic persistence between varieties.

A possible way for a topic to persist in the discourse is by being repeated. Such cases were labelled ‘identity’. For easier readability, all the following discourse excerpts indicate the topicalized constituent in bold, while the form in which the topic reappears in the discourse is underlined. Excerpt (6.130) is an example of a topic being repeated in exactly the same form as it appeared in topicalization at an earlier point in the conversation¹⁵:

(6.130)
B: The the one dollar I give you
Z: Uhm in other way I put the one dollar Oh
A: Uh
B: Oh thank you
Z: In my pocket
(ICE-HK:S1A-061#847–852)
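The difference between graded persistence scores and the binary coding adopted in this study can be made concrete with a short Python sketch. It assumes a hypothetical annotation in which each clause following a topicalization is labelled with the way the topic reappears in it ("identity", "pronoun", "poset" or "implicit", mirroring the categories described above) or None if it does not reappear. The window count only approximates Givón’s measure, and the point values follow the description of Gregory and Michaelis (2001) given above; summing those points over the window is an assumption of this sketch, as are all names used.

```python
"""Contrast graded topic-persistence scores with a binary coding.

Hypothetical sketch: each list item stands for one clause after the
topicalization and records how the topic reappears in it (None = not at
all). Labels mirror the categories used in the text.
"""

from typing import List, Optional, Tuple

Link = Optional[str]  # "identity", "pronoun", "poset", "implicit" or None

WINDOW = 10  # Givón's window: the ten clauses following the current one

# Points as described for Gregory and Michaelis (2001): 0 for no mention,
# 1 for a repeated NP, 2 for a pronoun. Summing them over the window is an
# assumption made for this sketch.
POINTS = {"identity": 1, "pronoun": 2}


def givon_count(clauses: List[Link]) -> int:
    """Count clauses within the window in which the referent recurs
    explicitly (as a repeated NP or a pronoun)."""
    return sum(1 for link in clauses[:WINDOW] if link in POINTS)


def gm_score(clauses: List[Link]) -> int:
    """Sum the graded points over the window."""
    return sum(POINTS.get(link, 0) for link in clauses[:WINDOW])


def binary_coding(clauses: List[Link]) -> Tuple[bool, List[str]]:
    """Binary coding as described in the text: no window limit, and
    poset or implicit links also count. Also returns the link types."""
    links = sorted({link for link in clauses if link is not None})
    return bool(links), links


if __name__ == "__main__":
    # Invented annotation, not taken from ICE: a repetition, a pronoun,
    # and a poset link that only occurs after the ten-clause window.
    sample = [None, "identity", None, "pronoun"] + [None] * 8 + ["poset"]
    print(givon_count(sample), gm_score(sample), binary_coding(sample))
```

For the invented sample, the script prints 2 3 (True, ['identity', 'poset', 'pronoun']): the poset link lies outside the ten-clause window, so only the binary coding registers it.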

Example (6.131) from the Philippine conversation corpus shows the topic snakes, which re-enters the discourse by means of topicalization and is referred to several times in the following discourse, even by yet another topicalization. Like the one dollar in (6.130), snakes is repeated once in identical form and four times in the singular (twice as a premodifier in endocentric nominal compounds).

(6.131)
A: Snakes you tried What about snake bile What about snake uh-huh you’ll get poisoned laughs Rattlesnake’s bile
B: That’s gross Yeah but I think it’s really true that they cook everything […]
A: Snakes I mean
B: Oh my God
A: Yuck
B: Funny how those people think of different ways to present food in more groceries
A: Yeah If you were offered to eat if you had nothing to eat and you only had let’s say what what’s gross
B: I don’t know you asked
A: Uh uhm snake would you
(ICE-PHI:S1A-023#272–317)

As noted in chapter 2, poset relations technically include identity (Ward and Birner 2004: 159). To be more specific, I differentiated identity links, pronouns (which would fall into the category of identity links in Ward and Birner’s framework), and non-referential links within the overall category of posets. Ward and Birner (2004: 159) list type/subtype, entity/attribute, part/whole, and other relationships among these non-referential links. A poset relationship between a topicalized constituent and another member of the same poset is shown in (6.132) from ICE-Hong Kong. In this excerpt, speaker A topicalizes Mandarin. Speaker Z, who is not from Hong Kong, then picks up the poset {languages} by adding English. The (overall) topic therefore persists in the discourse, since it is referenced again in a poset relationship.

(6.132)
A: Fion say that he is better than us
Z: Uh ha
A: And also uhm Mandarin I don’t know
Z: Yeah And English
(ICE-HK:S1A-056#483–487)

A similar example was also found in the Singaporean conversations, shown in (6.133): speaker B opens the poset {national mythology} and discusses which national mythologies a man named Thomas is familiar with. Following three other examples, Chinese stuff is topicalized and quickly followed by yet another poset member. It should be mentioned that the persisting topic is not introduced via topicalization in this excerpt but had already entered the discourse before being topicalized by speaker B.

(6.133)
A: Is Thomas interested in this sort of myths
B: Don’t know word stuff Scandinavian stuff The English stuff okay Uh Chinese stuff he knows a bit But not Singaporean stuff
(ICE-SIN:S1A-030#271–282)

Topic persistence also plays an important role in the classroom, not only in exchanges between teachers and students and among the students themselves, but also in teacher presentations. Excerpt (6.134) shows part of a teacher monologue in which the teacher opens the poset {new inventions} and adds several new members (such as {wooden plough} and {this basket}), but sometimes also refers to the overarching category ({new inventions} and {new things}).

(6.134)
A: And the other weapons also he made out of stones And then in the agricultural stage he started making use of wood Wooden plough and wooden instruments he started preparing […] He started inventing new things and discovering new lands and other things […] They started preparing machines New inventions and they are specially machines they started you know that There were industries and they were producing so many things One one example of the same kind this basket and this these steel cups I have now here (ICE-IND:S1B-008#49–77)

In this excerpt, as in (6.135), the discourse topic continually receives new members and is repeatedly referred to. By adding new members and mentioning both overarching categories (e.g., new inventions) and specific members (e.g., this basket), the teacher makes the presentation coherent and dynamic and, possibly, more accessible to the students. Excerpt (6.135), which is also taken from the classroom component of ICE-India, shows a similar teaching strategy. Here, the teacher topicalizes four days. The poset {days} has already been evoked and continues to be referenced in different ways throughout the teacher’s following utterances:

(6.135)
A: Now if this is the case where will you put these days Four days we have now One this here here Put it this way one two three and four Where will you put twenty-second December? What happens on twenty-second December? Where does the vertical uh ray of the sun strike (ICE-IND:S1B-002#282–288)

Topics may also be picked up by discourse-deictic pronouns in the subsequent discourse. The topicalized constituent Peking restaurant in the conversational exchange in (6.136) from ICE-Hong Kong, for instance, is picked up again by the personal pronoun it. In (6.137) from ICE-Philippines, it refers back to this game.

(6.136)
A: Yeah Peking restaurant you’ve tried before And you don’t like it you said that
Z: Peking I like Peking duck (ICE-HK:S1A-056#220–222)

(6.137)
A: But but of course this game uhm I I didn’t play so I’m not I’m not sure if we won or anything
B: Oh was it hard trying out for it (ICE-PHI:S1A-034#130–131)

The topic Allan Tomkin runs through the discourse excerpt in (6.138) from the phone calls in ICE-India. While speaker B regularly adds new information, Allan Tomkin is referred to several times with the personal pronoun he.

(6.138)
B: May be and then you know I contacted uh Tomkin
C: Ah  Allan
B: Allan Allan Tomkin I telephoned ah and I even he even wrote to me […]

B: Oh then actually you know that is the place from where you can even directly order things
C: Haan
B: He told me […]
B: Anyway I have paid and ordered books for S S C Board
C: Haan
B: And he promised to send it at a cost of very marginal charge (ICE-IND:S1A-091#138–165)

In addition to personal pronouns, demonstrative pronouns can also refer back to the topic. The pronoun that in the phone-call excerpt from ICE-India shown in (6.139) shares its referent with the topicalized constituent. As mentioned before, pronouns are referential links but were treated separately from identity (understood as exact repetition) in the annotation.

(6.139)
C: Same pack uh she has given you
B: I see and
C: That’s for the arthritis (ICE-IND:S1A-091#51–53)

The last kind of topic persistence is implicit topic persistence. A complex case of a topic persisting implicitly (i.e., not by being picked up via repetition, a pronoun, or other linguistic means) is given in (6.140). Such cases were tagged ‘zero’.

(6.140)
A: What about the preparation?
B: Yeah yeah exactly they they were all
A: Preparation of the marriage I’m talking about
B: That
A: Ha
B: We’ve not discussed anyhing [sic] yet
A: You have not discussed
B: No no
A: And have you booked the hall? (ICE-IND:S1A-095#163–171)

The conversational exchange following the topicalized NP leaves room for interpretation. The pronoun that in speaker B’s first response could be considered part of the following statement, which, taken together, might be understood as ‘Regarding that, we’ve not discussed anything yet’. Alternatively, that might be a false start by speaker B, who restarts the utterance differently after speaker A’s short reaction. The latter interpretation, which was favoured in the annotation, means that the topic persists implicitly: it can easily be retrieved and is left unexpressed solely because it is highly salient in the discourse (possibly because it has been topicalized). The utterance We’ve not discussed anything yet does not need any addition, because the hearer can fill in the unexpressed modifier ‘regarding the preparation of the marriage’ without any difficulty. The success of the exchange is evident in the following remarks, which continue to revolve around the preparations for the marriage, first in general and then in more detail.

A second example of implicit topic persistence is given in (6.141). Speaker B topicalizes the NP long working hour, which remains the topic for numerous turns in the following conversation. As in example (6.140), the topic is sufficiently salient that there is no need to express it verbally. However, it could be ‘added’ rather easily: for instance, it could be attached to the compensation as a postmodifier in the form of ‘for them/for it’.

(6.141)
B: Perhaps they work a long for a long time Long working hour I mean
Z: Yeah yeah I see that’s true
B: Yeah The compensation is the salary and the bonus
Z: Yah I haven’t seen it ant yet though
B: Do you got a bonus for this
Z: Yah
B: Uh that’s cos you are just in here for three months
Z: No when they paid out I only join in December
B: Aw that’s why (ICE-HK:S1A-010#628–639)

The previous examples were all taken from the Asian varieties. However, all types of topic persistence were also found in ICE-Great Britain. In (6.142), the more specific poset member a few lunches, which is the topicalized constituent, is replaced in the subsequent discourse by the more general category meals.
(6.142)
B: I did cook occasionally when they were out And so a few cu a few lunches I cooked for myself That usually happens in our family Up here it’s time thing
A: What
B: to try and set as usually set aside for meals Would mealtimes at home be congenial (ICE-GB:S1A-059#171–177)

Exact repetition is shown in (6.143), in which the topicalized constituent museums is mentioned again numerous times in the following discourse.

(6.143)
A: Do you have the background information on all or any of these
B: I mean museums I have uh some kind of background in just because the work that I was doing was quite closely affiliated with museums and stuff […]
B: Well yes I mean the idea is I’m not interested really in uh museums more generally I mean I wouldn’t be interested in a more sort of
A: OK so it’s a limited number of museums (ICE-GB:S1A-066#55–65)

Similarly, the topic three of them is picked up by means of an anaphoric pronoun in (6.144):

(6.144)
A: So uh this week and what I’d like people to do is uh give brief summaries to the group about the contents of their essays Now three of them I only got this morning so I haven’t had time to look at uhm but I still might ask someone to do them (ICE-GB:S1B-016#2–4)

Finally, a case of implicit persistence in the British component is shown in (6.145). Although the speaker does not refer back to the screenplay by means of a pronoun or a poset member, the topic remains active as a cognitive focal point in the following discourse:

(6.145)
B: And again what happens then is that you sort of lose the skill you lose you forget the reason that you wanted to write whatever you were writing to begin with and you kind of lose steam on it and uhm the screenplay that I was writing I was actually co-writing with another woman uhm and it completely destroyed her They the sort of relationship that we had both as friends and as writers because it’s just such a The B B C were like this third party that kind of came in and there were all sorts of dictates from above what they needed that year and uhm and what and what the viewers wanted to see and uhm and so what whatever kind of things have been important to your to your script were were trivialized really in in order for them to get what they wanted to get (ICE-GB:S1A-058#209–216)

Thus, there are no notable qualitative differences between BrE and the Asian varieties in terms of how topic persistence can be realized. In order to test whether there are quantitative differences between ICE-Great Britain and the Asian varieties (as well as among the Asian varieties themselves), the percentages of each kind of topic persistence were calculated. The cross-varietal comparison of topic persistence is provided in Table 6.12.

Table 6.12 Topic persistence across varieties

Topic persistence, n (%)   ICE-GB        ICE-HK        ICE-IND       ICE-PHI       ICE-SIN
None                       26 (37.68%)   21 (41.18%)   94 (36.72%)   22 (45.83%)   56 (43.75%)
Poset                      13 (18.84%)   13 (25.49%)   84 (32.81%)    6 (12.5%)    35 (27.34%)
Identity                   12 (17.39%)   13 (25.49%)   47 (18.36%)   14 (29.17%)   22 (17.19%)
Pronoun                    17 (24.64%)    1 (1.96%)    19 (7.42%)     2 (4.17%)     4 (3.13%)
Zero (implicit)             1 (1.45%)     3 (5.88%)    12 (4.69%)     4 (8.33%)    11 (8.59%)

The most substantial difference between the Asian varieties and British English, as represented in ICE, lies in topic persistence via pronouns (24.64 per cent in ICE-Great Britain, compared with between 1.96 and 7.42 per cent in the Asian components). Apart from that, each variety shows a slight tendency towards one of the categories. In ICE-Philippines, for instance, almost half of all topics disappear from the discourse after they have been topicalized. If they do persist, chances are high that they reappear in an identity relation. Again, the frequent occurrence of the construction that much influences the data in this context, since this construction was rated as an identity link. Topic persistence by means of a poset relation other than identity was most frequent in ICE-India, where almost a third of all topics are picked up by a non-referential link. Topic persistence was analysed in this section in a slightly different way than by Givón (1984a, 2001a, 2001b) and Gregory and Michaelis (2001): since I read the corpora in their entirety, I could trace topic persistence across longer stretches of discourse (and not only across the ten utterances following an instance of topicalization). I found that explicit means of topic persistence are usually restricted to the discourse close to the token of topicalization, while implicit topic persistence may continue for longer. This section had several aims: it sought to find out how ‘successful’ topicalization is as an instrument for establishing topics, which strategies are preferred in the varieties, and how the varieties differ from each other in this regard. Although there are some differences in how topics persist across the varieties, the general tendencies are very similar.
The chances of a topic disappearing from the discourse after being topicalized ranged from 37 per cent in ICE-India (closely followed by 38 per cent in ICE-Great Britain) to 46 per cent in ICE-Philippines. ‘Topicalized topics’ are more frequently picked up by a pronoun in ICE-Great Britain than in the Asian varieties of English. In future studies, it would be very interesting to see whether and how topics persist after being affected by other constructions. It should be noted, however, that sometimes there is no obvious driving force for a topic to persist, which explains the high frequency of topics that do not persist at all.
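The percentages in Table 6.12 are per-corpus shares: each count is divided by the total number of topicalized tokens annotated in that corpus (69 in ICE-GB, 51 in ICE-HK, 256 in ICE-IND, 48 in ICE-PHI, and 128 in ICE-SIN), so that each column sums to 100 per cent. As a consistency check, the following minimal Python sketch recomputes these shares from the raw counts in the table. The data layout, the function names, and the half-up rounding to two decimals are illustrative assumptions, not the procedure actually used in this study (according to note 1, the chapter's graphs were produced in R).

```python
from decimal import Decimal, ROUND_HALF_UP

# Raw token counts per persistence type and corpus, transcribed from Table 6.12.
# The dictionary layout and the key names are illustrative choices.
COUNTS = {
    "ICE-GB":  {"None": 26, "Poset": 13, "Identity": 12, "Pronoun": 17, "Zero": 1},
    "ICE-HK":  {"None": 21, "Poset": 13, "Identity": 13, "Pronoun": 1, "Zero": 3},
    "ICE-IND": {"None": 94, "Poset": 84, "Identity": 47, "Pronoun": 19, "Zero": 12},
    "ICE-PHI": {"None": 22, "Poset": 6, "Identity": 14, "Pronoun": 2, "Zero": 4},
    "ICE-SIN": {"None": 56, "Poset": 35, "Identity": 22, "Pronoun": 4, "Zero": 11},
}


def percent(part, total):
    """Share of part in total, in per cent, rounded half-up to two decimals."""
    share = Decimal(100 * part) / Decimal(total)
    return float(share.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))


def column_percentages(counts):
    """For each corpus, convert raw counts into percentages of that corpus's total."""
    result = {}
    for corpus, row in counts.items():
        total = sum(row.values())  # all topicalized tokens annotated in this corpus
        result[corpus] = {kind: percent(n, total) for kind, n in row.items()}
    return result


def extremes(percentages, kind):
    """Return the (corpus, share) pairs with the lowest and highest share of one type."""
    pairs = [(corpus, shares[kind]) for corpus, shares in percentages.items()]
    return min(pairs, key=lambda p: p[1]), max(pairs, key=lambda p: p[1])


if __name__ == "__main__":
    pct = column_percentages(COUNTS)
    for corpus, shares in pct.items():
        print(corpus, "n =", sum(COUNTS[corpus].values()), shares)
    print("Non-persisting topics (lowest, highest):", extremes(pct, "None"))
    print("Pronoun persistence (lowest, highest):", extremes(pct, "Pronoun"))
```

Run as a script, the sketch reproduces every cell of Table 6.12 and shows that the share of non-persisting topics is lowest in ICE-India (36.72 per cent) and highest in ICE-Philippines (45.83 per cent), while pronoun persistence ranges from 1.96 per cent (ICE-HK) to 24.64 per cent (ICE-GB). Half-up rounding via `decimal` is used because Python's built-in `round()` rounds exact ties to even and would report 3.12 rather than 3.13 for pronouns in ICE-SIN.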

6.4 Summary

This empirical chapter surveyed the forms, functions, and frequencies of topicalization. In the process, findings relevant to all research questions were discussed. This section sums up these findings and provides answers to the first two research questions. First of all, differences in topicalization usage between Asian Englishes and British English are most notable in terms of frequency and much less pronounced in terms of the forms and functions of topicalization. In its first section, this chapter gave an overview of how topicalization is distributed in the direct conversations, phone calls, and classroom lessons collected in ICE-Great Britain, ICE-Hong Kong, ICE-India, ICE-Philippines, and ICE-Singapore. One of the first major findings is that it is not ICE-Great Britain but ICE-Hong Kong that has the lowest overall number of tokens, a fact that requires investigation in the next chapter. Overall, Mesthrie’s claim that “the capacity for topic formation in colloquial forms of all English varieties is probably greater than commonly believed” (1992: 123; emphasis added) could be confirmed. At the same time, some varieties show a notably stronger tendency towards topicalization: at least in the ICE corpora, speakers of IndE and SinE employ topicalization relatively frequently, with IndE showing by far the highest frequencies. My findings also substantiate another claim by Mesthrie, namely that the overall frequencies of topicalization are low compared to the overall number of words (1992: 123). In this regard, Mesthrie writes that “topics are easily outnumbered by grammatical subjects in SAIE” (ibid.). While the idea is directly applicable to my findings, it should be noted again that topic and subject often overlap in English: if the grammatical subject and the topic coincide, there is no need for topicalization.
The syntactic forms and functions of topicalized constituents were relatively consistent across all five varieties, with only minor differences. NPs dominate in ICE-Philippines, where almost 90 per cent of tokens belong to this category, so that other forms are correspondingly less frequent in this corpus. Because of production and processing costs, topicalization of clauses is rare across all corpora. In terms of syntactic function, most topicalized constituents are direct objects. Topicalized adverbials and subject complements also occur relatively frequently, whereas indirect objects and object complements are hardly ever topicalized. In addition to syntactic function, the discourse function of each token of topicalization was analysed. The direct comparison of varieties shows that topic continuity occurs most frequently in the Indian and Philippine components of ICE, a finding that is in line with Winkle’s analysis (2015). When combined and analysed under the heading of ‘emphasis’ (cf. Callies 2009), intensification and contrast are more frequent than topic continuity in ICE-Great Britain, ICE-Hong Kong, and ICE-Singapore. Topic persistence, finally, also behaves relatively consistently in the five varieties. The chance of a topic recurring as a pronoun was highest in ICE-Great Britain, which was the main difference from the Asian varieties. The second research question (i.e., whether Mesthrie’s expanded functions of topicalization can be applied to the varieties under consideration or not) can be answered in the affirmative in practically all respects. Most importantly, topicalization does not always require a link to the preceding discourse. Depending largely on discourse function, brand-new and unused information may also occur in topicalized constituents. That is, if a speaker intends to emphasize or newly introduce a topic, hearer-new and discourse-new information may be topicalized. Contrasting set members or establishing topic continuity, on the other hand, typically require a link to the preceding discourse and co-occur significantly with evoked tokens. In terms of which formal constituents may be topicalized, the analysis clearly shows that all varieties feature topicalized constituents that are not NPs. Although NPs are the most frequently topicalized phrasal type, topicalized AdjPs, AdvPs, PPs, and clauses could also be identified. The additional criteria, namely the extraction of topics, interactions with other syntactic processes, and shifts from initiated SVX to topicalization, were also attested in (almost) all corpora. Furthermore, a case could be made for hanging topics as another possible form of topicalization employed in vernacular speech. Table 6.13 summarizes the attestations of the six functions and differences described by Mesthrie (1992). Because ICE-Great Britain was the basis of comparison, the first criterion (higher frequency) was not rated for this corpus. When only one token could be found for a criterion, the tick is given in parentheses. Since ICE-Great Britain does not deviate significantly from the other corpora with regard to these functions, I cannot substantiate Mesthrie’s claim for SAIE “that topicalisation […] goes well beyond that of mainstream English varieties, in terms of both syntax and pragmatics” (Mesthrie 1992: 115).
Instead, I agree with his statement quoted above, according to which topicalization is subject to considerably more variation than is generally assumed. As a next step, I would suggest a closer analysis of different modes of communication, since speakers employ topicalization creatively in (mostly) spontaneous, undirected conversations. Comparing patterns and frequencies of topicalization in written and spoken language and, more importantly, in the language of distance and the language of immediacy,

Table 6.13 Rating of Mesthrie’s criteria for an expanded concept of topicalization (1992) based on the analysed ICE corpora

1 2 3 4 5 6

Function / Difference

ICE-GB

ICE-HK

ICE-IND ICE-PHI ICE-SIN

High(er) frequency Token is not evoked Constituent is not an NP Interaction with further processes Topic extraction from embedded clause Shift from initiated SVO to TOP

/ ✓ ✓

✕ ✓ ✓

✓ ✓ ✓

✕ ✓ ✓

✓ ✓ ✓

(✓)

✕

✓

✓

✓

(✓)

✕

(✓)

(✓)

✓

✓

✓

✓

(✓)

✓

will be necessary in order to better understand the contexts in which topicalization and similar constructions are favoured. Following the empirical analysis provided in this chapter, the reasons for variation in the frequency of topicalization need to be analysed. For this purpose, the next chapter discusses numerous potential influences on topicalization usage in the four Asian varieties of English.

Notes
1 All graphs in this chapter were created using the ggplot2 (Wickham 2009) and reshape2 (Wickham 2007) packages in R (R Development Core Team 2015).
2 This p-value given by R equals p