
Experience Counts: Frequency Effects in Language

linguae & litterae

Publications of the School of Language & Literature
Freiburg Institute for Advanced Studies
Edited by Peter Auer, Gesa von Essen, Werner Frick
Editorial Board: Michel Espagne (Paris), Marino Freschi (Rom), Ekkehard König (Berlin), Michael Lackner (Erlangen-Nürnberg), Per Linell (Linköping), Angelika Linke (Zürich), Christine Maillard (Strasbourg), Lorenza Mondada (Basel), Pieter Muysken (Nijmegen), Wolfgang Raible (Freiburg), Monika Schmitz-Emans (Bochum)

Volume 54

Experience Counts: Frequency Effects in Language
Edited by Heike Behrens and Stefan Pfänder

ISBN 978-3-11-034342-7
e-ISBN (PDF) 978-3-11-034691-6
e-ISBN (EPUB) 978-3-11-038459-8
ISSN 1869-7054

Library of Congress Cataloging-in-Publication Data
A CIP catalog record for this book has been applied for at the Library of Congress.

Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de.

© 2016 Walter de Gruyter GmbH, Berlin/Boston
Typesetting: epline, Kirchheim unter Teck
Printing: Hubert & Co. GmbH & Co. KG, Göttingen
♾ Printed on acid-free paper
Printed in Germany
www.degruyter.com

Contents

Stefan Pfänder and Heike Behrens
Acknowledgements  VII

Stefan Pfänder and Heike Behrens
Experience counts: An introduction to frequency effects in language  1

Nikolay Hakimov
Explaining variation in plural marking of German noun insertions in Russian sentences  21

Ulrike Schneider
Hesitation placement as evidence for chunking. A corpus-based study of spoken English  61

Christian Schwarz
Recency as a factor of phonological variation  91

Tom Ruette, Katharina Ehret, and Benedikt Szmrecsanyi
Frequency effects in lexical sociolectometry are insubstantial  111

Karin Madlener
Input optimization. Effects of type and token frequency manipulations in instructed second language learning  133

Malte Rosemeyer
Modeling frequency effects in language change  175

Holger Diessel
Frequency and lexical specificity in grammar: A critical review  209

Nick C. Ellis
Frequency in language learning and language change. The contributions to this volume from a cognitive and psycholinguistic perspective  239

Acknowledgements

This volume presents research that originated from work in the Graduiertenkolleg (GRK) 1624 “Frequenzeffekte in der Sprache: Frequenz als Faktor in gebrauchsbasierten Modellierungen von Sprachwandel, Sprachverarbeitung und Spracherwerb” (Research Training Group (RTG) 1624: ‘Frequency effects in language: Frequency as a determinant in usage-based models of language change, language processing and language acquisition’). This research group is funded by the Deutsche Forschungsgemeinschaft (DFG) and was supported by the Freiburg Institute for Advanced Studies (FRIAS), sponsored by the German government to promote excellent research universities. We would like to thank the DFG and the FRIAS for their generous support. This volume would not exist if it were not for the intense discussions, collaborations and numerous research visits from guests all over the world facilitated by both the DFG and the FRIAS. The papers in this edited volume bear witness to the wide variety of topics and aim at developing interdisciplinary research that crosses the boundaries of established fields and enables us as researchers to profit from insights beyond our own realm of study. The introduction by Pfänder and Behrens and the afterword by Ellis fan out the fundamental questions in linguistics, psycholinguistics and cognitive science covered by the articles in the present volume.

The RTG on Frequency Effects started in 2009 and will continue until 2018. The original group of investigators and supervisors consisted of Peter Auer, Heike Behrens, Daniel Jacob, Rolf Kailuweit, Lars Konieczny, Bernd Kortmann, Christian Mair, Stefan Pfänder, Gerhard Strube (in alphabetical order), with the editors of this volume as speakers. Some of us profited from sabbaticals awarded by FRIAS that allowed us to work together more intensely than would have been possible in a typical, busy academic year. Benedikt Szmrecsanyi and Martin Hilpert were Junior Research Fellows at the FRIAS at the time, and provided valuable theoretical and methodological input throughout.

We would also like to thank the (anonymous) reviewers of this volume for their critical feedback, and the Series Editor and former FRIAS director Peter Auer for encouraging us to put together this volume. Julia Vagg gave critical content and stylistic editorial feedback on all papers. Special thanks go to Michael Schäfer, Monika Schulz, Julia Voegelin, Lisa Pütz and Catherine Diederich for their dedicated, precise and skilled checking of every formal aspect of this volume throughout the production process.

Nikolay Hakimov, Karin Madlener, Malte Rosemeyer and Ulrike Schneider are four of the ten successful PhD students of the first generation whose work is represented in this volume. They all impressed us with their enthusiasm for learning about new theories and fields of research, as well as for putting theory into practice by addressing their research questions with state-of-the-art methodology. Holger Diessel, Nick Ellis and Tom Ruette were guests of the RTG and FRIAS. They, as well as many other guests, provided high-frequency input on frequency effects in a number of linguistic domains. They helped us create and sustain an atmosphere of intellectual excitement that kept the mind fresh and alert, even when one was tired from extensive reading and from trying to understand, learn and apply new and complicated statistical methods. We owe all of them and many others our heartfelt thanks for this intense but exciting period of our academic lives!

Freiburg and Basel, August 2015

Stefan Pfänder and Heike Behrens

Stefan Pfänder (University of Freiburg) and Heike Behrens (University of Basel)

Experience counts: An introduction to frequency effects in language

1 Experience counts: Towards a unified account of frequency effects in language learning, language processing, and language change

Frequency effects are ubiquitous in virtually every domain of human cognition and behavior, from the perception of facial attractiveness […] and the processing of musical structure […] to language change […] and adult sentence processing […]. [F]requency effects are ubiquitous also in children’s first language acquisition. […] We argue, very simply, that frequency effects constitute a phenomenon for which any successful theory must account. (Ambridge, Kidd, Rowland, and Theakston 2015: 240)

The mission statement by Ambridge et al. (2015) quoted above presents a very strong claim, supported by research from different fields, namely language acquisition, language change, and language production and perception (cf. Behrens 2009a, 2009b; Bybee 2010; Ellis 2002, 2012a, 2012b, 2012c; Pfänder et al. 2013; Diessel 2014; among others). Researchers across all these fields have argued that experience counts. This entails that the frequency of the experience of an item matters for learners, listeners, and speakers alike (for other recent surveys on frequency effects see e. g. the papers in Gries and Divjak (2012) and Divjak and Gries (2012); as well as Divjak and Caldwell-Harris (2015) and Hilpert (submitted)).

But what exactly does the term frequency effects refer to, given the controversies in the field about the role of frequency when linguistic structures are being learned or undergo language change? For the present volume, frequency is not argued to be the most decisive factor in language, nor do we want to argue that all processes are driven by the most frequent item(s). In some domains, type and/or token frequencies have a direct effect (see below), whereas in other constellations frequency interacts with other processing factors such as recency and salience. Frequency in terms of high transition probability, for example, leads to entrenchment and automatization (Diessel, this volume). High frequency can protect certain forms from errors because of their entrenchment, but it may also lead to errors in contexts where a lower frequency form is the target (Ambridge et al. 2015). In language change, the most frequent type may attract new items and thus trigger change; but highly frequent tokens may also resist these attractor effects (conservation effect; Bybee 1985).

These seemingly contradictory effects can be reconciled if (a) we assume that frequency interacts with other processing factors such as recency or salience, and if (b) we make a clear distinction between type and token frequency effects. Token frequency or repetition alone does not lead to generalization but to entrenchment (cf. Bybee 2006, 2010; Ambridge et al. 2015). Type frequency or variation, on the other hand, is needed to provide the basis for possible schema formation and generalization (Langacker 1987; Bybee 2006, 2010; Ambridge et al. 2015). Last but not least, we have to take into account that (c) frequency effects operate at different levels of representation, from small and concrete linguistic units like sounds¹, morphemes² or words³ to more abstract and larger units such as multiword phrases⁴, sentential constructions⁵ or even conversational structure⁶. Thus, if defined properly, and all other things being equal and controlled for, frequency can be shown to have a certain effect on the processing of linguistic units and ultimately on the language system itself. Consequently, the investigation of frequency effects requires a very fine-grained analysis of the usage conditions of linguistic structures, as well as the application of sophisticated statistical methods.

1 Over the last five years, frequency effects have been shown, for example, in phonetic reduction (Hollmann and Siewierska 2011; Lorenz 2012, 2013a, 2013b; Schäfer 2014), in phonological variation (in a broader perspective, among others, Coetzee and Kawahara 2013), in phonological height harmony (Archangeli, Mielke, and Pulleyblank 2012), and in the area of syllable frequency (Cholin, Dell, and Levelt 2011).
2 In morphology, frequency effects have been shown ranging from morphemes in Dutch L1 (Verhoeven and Schreuder 2011) over inflectional morphology in L1/L2 Spanish (Bowden, Gelfand, Sanz, and Ullman 2010) to more complex linguistic units such as compound constructions (Baayen, Kuperman, and Bertram 2010; Dye, Walenski, Prado, Mostofsky, and Ullman 2013).
3 Word frequency has been investigated in areas as diverse as experimental psychology (Brysbaert, Buchmeier, Conrad, Jacobs, Bölte, and Böhl 2011), speech recognition (Dufour, Brunellière, and Frauenfelder 2013; Johns, Greunenfelder, Pisoni, and Jones 2012), discriminative learning perspectives (Baayen 2010 and Baayen, Hendrix, and Ramscar 2013), L2 corpus linguistics (Crossley, Salsbury, Titak, and McNamara 2014), reading tasks for dyslexic children (Grande, Meffert, Huber, Amunts, and Heim 2011), and sociolinguistic approaches to syntactic variability (Erker and Guy 2012).
4 Frequency effects have been shown to be operative in multi-word phrases in Arnon and Snider (2010); Arnon and Priva (2013); Janssen and Barber (2012); Siyanova-Chanturia, Conklin, and van Heuven (2011), among others.
5 In syntax, frequency effects have been shown for tense/aspect (Ellis 2013), verb-argument constructions (Ellis, O’Donnell, and Römer 2014), and different sentential complements (for example, Kidd, Lieven, and Tomasello 2010).
6 Conversational phenomena such as hesitation (Schneider 2013, 2014, this volume) and repair (Kapatsinski 2010; Pfeiffer 2015) have only recently been investigated in relation to transitional probabilities and entrenchment.

In the present volume we present research that was carried out in the context of the research training group “Frequency effects in language: Frequency as a determinant in usage-based models of language change, language processing and language acquisition” that has been funded by the Deutsche Forschungsgemeinschaft since 2009 (DFG GRK 1624). The goal of this group is to bring together interdisciplinary expertise relevant to frequency-related phenomena in language, to train cohorts of graduate students in the relevant linguistic and psycholinguistic theories and methods, and to stimulate research across disciplines by looking at topics that bear the potential for a transfer of theoretical insights and/or empirical methods (see also Ellis, this volume, for the common denominator of all the studies from the perspective of cognitive psychology).

In terms of theory we proceed from the central claim of usage-based linguistics that linguistic structures emanate from usage-events (Langacker 1987; Bybee 2006). The research training group’s work tackles a central problem of usage-based linguistics, namely the question of how and to what degree frequency distributions in language use impact on the processing by speakers and hearers or on the emergence of linguistic structure in language acquisition, processing and change (for more results from this group see the articles in Franceschini and Pfänder 2013).

The group investigates frequency effects at work in several areas. On the one hand, research is being done in areas where frequency effects are to be expected, as, for example, in the phonetic reduction of high-frequency words or chunks (Lorenz 2013a, 2013b), in the resistance of morphological irregularities to analogical leveling (Krause, in progress), or in the speedy production and retrieval of compounds (Schmid, forthcoming). On the other hand, research is also being carried out in areas where frequency effects have received hardly any attention so far, as, for example, in code-switching (see Hakimov, this volume), or where experimental investigation has faced considerable logistical obstacles, as in the calibration of input in instructed L2 acquisition in classroom settings (see Madlener, this volume).

Coming back to our motto from Ambridge et al. (2015), it has been demonstrated in many fine-grained empirical analyses that theories of language change, language learning, and language processing cannot ignore frequency effects. In our view, the time has come to start working on those topics that cross the boundaries of learning, processing and changing language, and to investigate whether similar processes are at work in these areas. Ellis (2008a) provides a compelling sketch of the interaction of processing, acquisition and change within a dynamic, emergentist concept of language: Language use leads to language change because frequency-induced processes like reduction lead to the erosion of certain morphemes. Language change affects perception, and perception affects learning because reduced elements are less salient, and therefore harder to learn. This, in turn, affects the language system. In second language learner varieties, for example, non-salient elements are often omitted. Consequently, we observe differences in, for instance, the grammar of the varieties referred to as “English as a second language” and the varieties of English spoken by native speakers. While first language learners have enough exposure to ultimately turn their attention to fine morphological details, second language learners often omit them because they are hard to perceive and functionally redundant, as the respective grammatical information is often also coded by other elements (Ellis 2008a).

For the present volume we have invited contributions that illustrate four possible lines of research for cross-domain studies of frequency effects:
– The role of entrenchment in language change, learning, production and processing (Section 2).
– The interaction of frequency with other processing factors such as recency and salience (Section 3).
– The differential impact of type and token frequencies in processing, learning and change (Section 4).
– The automatization of activation as a function of linear order (Section 5).

In the remainder of this introduction we will sketch out those strands of research and position the contributions to the present volume in the context of these overarching topics.

2 Entrenchment in language change, learning, production and processing

If language use has an influence on how language is represented cognitively, then quantitative changes in one’s language use will lead to changes in the language system itself. Repeated encounter leads to entrenchment, the strengthening of memory traces (cf. Blumenthal-Dramé 2012; Divjak and Caldwell-Harris 2015; Hilpert and Diessel, forthcoming; Schmid, forthcoming). The entrenchment of linguistic units gives rise to a number of interesting processes. Entrenchment increases the stability of representation and facilitates retrieval because high frequency units are typically more readily accessible. The frequent co-occurrence of several units may result in chunking:⁷ The larger, chunked unit is learned and/or processed as a whole.

7 This issue has been the subject of extensive debate in corpus linguistics (cf. for instance Stefanowitsch and Gries 2003; Geeraerts, Grondelaers, and Bakema 1994).

More complex processes have been identified as well, for example when co-occurrence likelihoods or transitional probabilities change. One of these more complex processes is grammaticalization, where lexical elements take on a grammatical function. Grammaticalization not only affects single words, but can also occur at the boundaries of previously independent items. The highly frequent co-occurrence of independent items may cause their morphological boundaries to become blurred (phonetic reduction effect, see Bybee and Thompson 1997), resulting in the fusion, compounding and automatization of formerly separate elements. These types of fusion processes exhibit an increase in the occurrence of phonological processes across morphological boundaries, in particular in high frequency combinations. Palatalization at the morphological boundaries between verbs and the pronoun you, for example, occurs more frequently in don’t you than in good you (came) (cf. Krug 2003; Bush 2001; Cooper and Paccia-Cooper 1980). This leads to an increase in the opacity of the internal structure of high-frequency, morphologically complex expressions (cf. Bybee 1985; Krug 1998; Boyland 1996 for you and I; Bybee 2001 for French liaison; Bybee 2010).

This frequency effect goes beyond mere phonetic erosion. The eroded high frequency units will also increasingly be perceived as fused units and may become autonomous, meaning they are no longer perceived as consisting of the original units. This process can be illustrated by the change of the phrase let’s in English, which – at least for some speakers – changed from the combination of an imperative and an object pronoun into an adhortative element. In this function, we observe double marking of the subject because speakers have fused the ’s with let and do not see it as a reduced form of us, as in the colloquial let’s you and him fight (cf. Hopper and Traugott 2003: 12–16).

Recently, research in the area of language change has become considerably more corpus-based and frequency effects have been integrated into various models of language change. While frequency was mentioned only in passing in the first edition of Hopper and Traugott’s (1993) authoritative compendium (for instance in connection with the phonological by-products of morphologization, p. 146), the new edition dedicates an entire chapter to it, discussing the issue in a much more differentiated manner (Hopper and Traugott 2003: 126–130). The role of frequency of usage in general and of entrenchment in particular has been studied with respect to the building and dismantling of forms and constructions – i. e. in the initial phase of processes of grammaticalization, especially those in present-day language (cf. the English allegro forms gonna < going to and gotta < have got to and their emergence in overseas varieties of English; Lorenz 2012, 2013a). Increases in frequency after grammaticalization processes are well attested, due to the fact that grammatical elements tend to be more frequent than lexical items (Zipf 1936). Is it possible, however, to find evidence for frequency as a trigger for grammaticalization processes? The work of Rosemeyer (2013, this volume) suggests that such evidence exists. This type of research requires fine-grained statistical analyses of the time course of shifts in usage frequencies. The focus of investigation shifts from the expected increase in frequency as a result of grammaticalization, which occurs with a delay due to the predominantly written source material provided by corpora, to usage frequency as one of the possible triggers of grammaticalization processes (cf. Rosemeyer 2013, 2014, this volume).

In the present volume we present two studies that investigate frequency effects at work between or across linguistic units. Hakimov (forthcoming, this volume) provides a frequency-based account of code-switching and language change in multilinguals. Schneider (this volume) is concerned with the (non)occurrence of hesitation phenomena as an indicator of planning processes and units in language production. Both Hakimov (forthcoming, this volume) and Schneider (2013, 2014, this volume) provide evidence for the existence of chunks not just in language learning but also in language production and processing. They show that code-switching and hesitation phenomena can be predicted by the degree of entrenchment of concrete multi-morphemic or multi-word units. Their contributions also illustrate that the same theoretical model (usage-based theory) and empirical methods (cohesion and predictability measures) can be used to study language production phenomena in two previously distinct fields of research: hesitation phenomena in speech production and code-switching in bilinguals. In both domains, frequently used multi-word sequences turn into chunks that strongly disfavor interruption, either by a code switch or by a hesitation marker. In addition, both contributions demonstrate that the phenomenon of chunking is not limited to being tested experimentally, but can also be investigated using appropriate statistics on corpus data.
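The cohesion and predictability measures mentioned above are, at their core, simple corpus statistics over adjacent units. As a purely illustrative sketch (written in Python, not code from either study, and using an invented toy token list), one way to estimate the transitional probabilities on which such chunk-based accounts rely is the following:

from collections import Counter

def transition_probabilities(tokens):
    # P(next | current), estimated as bigram count divided by the count of the first word.
    bigrams = Counter(zip(tokens, tokens[1:]))
    firsts = Counter(tokens[:-1])
    return {(w1, w2): n / firsts[w1] for (w1, w2), n in bigrams.items()}

# Toy token sequence standing in for a spoken corpus.
tokens = "i do n't know i do n't think so i know that".split()
probs = transition_probabilities(tokens)
print(probs[("do", "n't")])   # 1.0   -> a highly cohesive, chunk-like transition
print(probs[("i", "know")])   # ~0.33 -> a looser transition

On such an account, sequences whose internal transitions approach 1.0 behave like single units and should rarely be interrupted by a hesitation marker or a code switch.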

3 The interaction of frequency with other processing factors such as recency and salience

Experience does count and, crucially, goes beyond the mere perception of usage frequencies. The strength and nature of our experience is also influenced by processing factors such as the context in which a unit occurs, its perceptual salience, and memory-related factors such as recency. There are, of course, other factors beyond those relevant for processing, including but not limited to contextual factors such as the social setting or the attitude of the participants; these factors are not yet on the horizon of our empirical research, however.

Recency: Frequency effects are dependent on the time frame in which they occur. A more recent speech event has a stronger influence on how a current speech event is processed than one that is less recent. This effect can be derived immediately from the structure of memory and has been documented repeatedly in language processing (Szmrecsanyi 2006; Poplack and Tagliamonte 1996; Ellis 2012c). This is particularly relevant when two variants of a construction or category are in free variation in a particular context. Three factors are important here. Firstly, the amount of time that has elapsed since the last occurrence of the item: the more recently an item has occurred, the stronger its activation (Szmrecsanyi 2006). Secondly, the frequency with which an item has occurred in a specific amount of time: the more often an item has occurred recently, the stronger its activation. Thirdly, the overall frequency of the item: the lower the general token frequency of an item, the stronger its recency effects (Jacoby and Dallas 1981; Schwenter 2013; Rosemeyer 2014). Recency effects are boosted by low token frequency because the occurrence of items with a low token frequency causes a higher rate of surprisal (see below), which leads to stronger activation.

Recency effects are also modulated by the timing of the exposure. There is a domain-general advantage of spaced learning over massed learning in memory retention. Exposure to a high number of tokens in a short period seems to be less effective than a more distributed exposure where the experience is re-activated (see the meta-analysis by Janiszewski, Noel, and Sawyer 2003; Ambridge, Theakston, Lieven, and Tomasello 2006; as well as Madlener, this volume, for acquisition).

Schwarz (this volume) investigates frequency effects in the realm of phonological variation. According to him, recency can be defined as the tendency of the speaker to repeat identical phonological items within a speaking sequence. Speakers are more likely to repeat identical phonological items if the time span between two utterances is short. On the basis of a large corpus, Schwarz is able to show that the variation between dialect and standard realizations of vowels in the Alemannic area is best described as an interaction effect of recency and frequency. In a nutshell, the less frequent variant has stronger recency effects, and vice versa. Schwarz concludes that recency does not explain why an innovative phonological item enters the repertoire of a dialect speaker, but it does account for its spread.

Salience: Frequency effects are also influenced by the salience of the respective grammatical or phonetic unit (Tomlin and Myachykov 2015). Salience can, for example, be defined as prosodic salience. Prosodically more pronounced, i. e. more rhythmical, speech events activate words more strongly, for instance, than those that have less pronounced prosody. Morphosyntactic salience is studied primarily in research on dialect contact (Trudgill 1986; Kerswill and Williams 2000, 2002), in perceptual dialectology (Lenz 2010; Long and Preston 2002; Kerswill 2002) and in sociolinguistics (Elmenthaler, Gessinger, and Wirrer 2010; Patterson and Connine 2001).⁸ Ruette, Ehret and Szmrecsanyi (this volume) investigate the interaction of frequency, salience and surprisal. Using methods from sociolectometry, they quantify lexical distances between varieties of English as a function of the number of different lexical encodings of one and the same semantic concept (such as soda and pop). Their quantitative lexical sociolectometry study does not find a frequency effect in favor of the most frequent lexical alternatives. The lexical distances between informative and imaginative corpora of American and British English remained similar when either the high-frequency or low-frequency variables were boosted in the statistical model. It must be noted that these models do not study human perception or processing. Overall, however, the impact of the frequency of lexical items on distances between varieties studied in isolation seems to be very modest if not non-existent.

In this introduction we focus on recency and salience as competing factors. There are other factors, however, that also influence the processing of exemplars, especially the social context. Ambridge et al. (2015) show how situational, social and individual aspects of interaction affect the intake of the input. For future research we have to keep in mind that linguistic experience has many dimensions beyond the linguistic signal: “The cognitive neural networks that compute the associations binding linguistic constructions are embodied, attentionally – and socially – gated, conscious, dialogic, interactive, situated, and cultured” (Ellis 2012c).
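The three recency-related factors listed above can be read as jointly determining how activated an item currently is. One minimal way to operationalize this, offered here only as a hedged illustration (the decay parameter and time units are invented, and this is not a model used by any of the contributors), is to let every past occurrence leave a trace that decays exponentially with elapsed time:

import math

def activation(occurrence_times, now, decay=0.1):
    # Each past occurrence contributes a trace that fades as time passes;
    # summing the traces combines recency with recent frequency.
    return sum(math.exp(-decay * (now - t)) for t in occurrence_times if t <= now)

# A form heard three times long ago vs. a form heard once just now:
print(activation([10.0, 20.0, 30.0], now=200.0))  # ~0.0, little residual activation
print(activation([199.0], now=200.0))             # ~0.9, boosted by recency

In such a toy model the single recent occurrence outweighs the three distant ones, which is one way of capturing the observation that low overall token frequency makes recency effects (and surprisal) more visible.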

4 The differential impact of type and token frequencies in processing, learning and change

The most basic distinction with regard to the concept of frequency is that between type and token frequency (cf. Ellis 2012c). Token frequency refers to the number of occurrences of a concrete form (or of a lemma) in a corpus or in the input in general: how often does, for example, the word form played appear in a corpus? High token frequency does not typically lead to high productivity of a construction, but rather to the entrenchment of a particular instantiation of this construction.

8 It is noteworthy, however, that these studies have focused on phonology, i. e. that features of dialect grammar are largely excluded (with the exception of Cheshire 1996 and e. g. Kerswill’s work), and that barely any systematic work has been done on either the relevance of (high or low) frequency in explaining perceptual salience (or non-salience) or the exact interaction of frequency and salience in various scenarios (but see Rácz 2012, 2013).

Type frequency, on the other hand, refers to the number of distinct items that can fill a slot in a particular construction: how many different lexical items can appear, for example, with the English past tense construction VERB-ed? Measuring the type frequency of a construction is crucial for determining its productivity (Bybee and Hopper 2001; Barðdal 2008): high type frequency correlates with high productivity. The English VERB-ed construction, for example, can be said to be far more productive than the past tense construction which involves an i-a vowel change (as in sink-sank), since it appears with a far greater number of verbs and also readily extends to new verbs. In contrast, “irregular” and/or unproductive patterns of low type frequency may survive if their token frequency is sufficiently high, or if they are highly analogous to other forms.

We need to look at the different effects of type and token frequency in order to explain why high frequency units tend to remain stable due to their entrenched nature in some situations (“conservation effect”; Bybee 1985; Bybee and Thompson 1997) but seem to drive linguistic change in other situations. In morphology and syntax, forms that occur frequently have stronger cognitive anchoring than those that are less frequent and are therefore less affected by processes of regularization. Thus the strongest irregularities of verbal paradigms (i. e. apophony, suppletion) apply to the verbs that are used most frequently, such as modal and auxiliary verbs (cf. for instance Croft and Cruse 2004: 293 on the verb to be; see also Nübling 2000; Lieberman, Michel, Jackson, Tang, and Nowak 2007).

In the field of morphosyntax, Hilpert (2008) shows that both the absolute frequency of adjectives (in the positive form) and the occurrence ratio of adjectives in the positive and the comparative form influence the usage of the analytical or synthetic comparative: while many adjectives occur in both forms (e. g., prouder and more proud), a frequent adjective such as easy is mostly inflected as easier, whereas the same does not hold for the low-frequency queasy. Likewise, some adjectives are used more often in the comparative than in the positive (e. g., humble). These adjectives are more likely to be used in the morphological comparative (humbler) than those adjectives that are mainly used in the positive (e. g., able).

In a re-analysis of Poplack’s (1992) corpus-based study of the subjonctif in Canadian French, Bybee and Thompson (1997) suggest that the preservation of certain constructions can be traced to the high frequency of V + complement combinations in constructions such as faut que je lui dise, while the subjonctif form in general is becoming increasingly rare. We may also assume that the preservation of SVO syntax with negation and yes/no questions containing English auxiliary verbs is due to their high frequency in Middle English (Krug 2003: 29).

An underexplored area of research where the type-token ratio plays a crucial role is the decrease in frequency that occurs when grammatical constructions disappear from the system (cf. Traugott 2012: 189). In addition to a theory of how grammar “emerges”, a theory is needed that explains how grammar “disappears”. In simple cases, inspections of type-token relations may be sufficient: at what point are the tokens of a particular construction restricted to few enough constructional (sub)types that the construction can be called “unproductive”? The concept of “statistical pre-emption” may be useful here. Statistical pre-emption has been used to explain why speakers do not make generalizations they might be expected to make, i. e. how speakers learn “what not to say” (Boyd and Goldberg 2011). While the question of how speech communities learn “what not to say any more” is certainly slightly different, statistical pre-emption may be a helpful concept here as well. In any case, when looking at the decrease in frequency of a construction, the contexts a construction (still) occurs in are of crucial importance. Otherwise, the seemingly paradoxical relation of disappearance from the system and continuous existence in functional niches that can be found for a lot of “disappearing” grammatical constructions would not receive a principled explanation.

While all new processes start out with a first exemplar (token and type), the actual effect of this exemplar depends on our previous representation of related exemplars, as well as on the nature of subsequent experiences of this unit. Whether this unit acquires high token or type frequency leads to different outcomes – stabilization in the first case, variation and possibly productivity in the latter.

Madlener (2015, this volume) and Rosemeyer (2013, 2014, this volume) discuss these issues in language acquisition and in language change. Their studies are also relevant because they examine the effect of the concrete timing of experience. In her training study of a productive but lesser-known German construction with advanced learners of German, Madlener found that, depending on the learners’ previous knowledge, different type-token ratios in the input had different effects (learners with low previous knowledge profited more from increased token frequencies, for example). The temporal resolution of the input, however, mattered as well: some learners profited from skewed input where they received the same number of types, but some types occurred with higher token frequencies. In particular constellations, a Zipfian distribution in type-token frequencies can also give rise to prototype effects (Ibbotson and Tomasello 2009; Ellis, O’Donnell, and Römer 2013).

A close investigation of both token and type frequencies has proven most fruitful for Rosemeyer’s investigation of language change in Spanish auxiliaries (cf. Rosemeyer, this volume). As in other Romance languages, there is a choice in compound tense auxiliaries between haber ‘to have’ and ser ‘to be’. Conditions for the selection of the auxiliary have been changing from the Middle Ages until today. Rosemeyer models the general tendency towards the choice of have with perfects and pluperfects as an effect of type frequency. The fact that several verbs resist this general tendency can be explained by a conservation effect caused by the high token frequency of these lexemes. Both Madlener and Rosemeyer thus show that both type and token frequencies have to be considered as constitutive parameters in speakers’ experience with language, both in language learning and in language change. Interestingly, both contributions show that the effects of type and token frequencies may be active at different, though overlapping, time spans. In a nutshell, then, different frequency effects will be modelled at different moments in time, and as separate processes that may converge over time.
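To make the type/token distinction concrete, the following sketch counts both quantities for the V-ed slot in a small, invented list of past-tense tokens; it is a toy illustration of the definitions above, not an analysis of Madlener’s or Rosemeyer’s data:

from collections import Counter

# Invented (lemma, form) pairs for past-tense tokens in a mini-corpus.
past_tokens = [
    ("play", "played"), ("play", "played"), ("play", "played"),
    ("walk", "walked"), ("laugh", "laughed"), ("sink", "sank"),
]

regular = [(lemma, form) for lemma, form in past_tokens if form.endswith("ed")]

token_freq = Counter(form for _, form in regular)  # occurrences of each concrete form
type_freq = len({lemma for lemma, _ in regular})   # distinct verbs filling the V-ed slot

print(token_freq["played"])  # 3 -> entrenchment of this particular instantiation
print(type_freq)             # 3 -> the schema's type frequency, a proxy for productivity

Skewed input of the kind Madlener discusses would correspond to a distribution in which one or two types (here ‘played’) account for most of the tokens while overall type variation is kept constant.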

5 The automatization of activation as a function of linear order

The assumption that both learners and fluent speakers build their knowledge on memorized exemplars is gaining acceptance in the field of usage-based linguistics. Diessel shows, though, that the ‘bare’ exemplar model cannot account for the relations between lexemes and constructions. In Diessel’s view, “the central idea behind exemplar theory is that categories are emergent from individual tokens of experience that are grouped together as exemplars, i. e. clusters of tokens with similar or identical features, which are then used to license the classification of novel tokens. Since emerging token clusters can be more or less complex, exemplar representations vary across a continuum of generality and abstractness” (Diessel, this volume, 210).

The exemplar model was first developed in the field of phonetics and phonology (Pierrehumbert 2001, 2006). If applied to syntax, it should be able to account, for example, for the clustering of constructions as in:

(a) Peter gives John a letter / tells John a story.
(b) Peter brings / takes a letter to John.

All of the verbs illustrated in the examples above (give/tell/bring/take) can be used both in a ditransitive construction and with a prepositional phrase. There is a strong statistical bias, though. Both give and tell tend to be used frequently in ditransitive constructions, whereas bring and take are most frequently used with a prepositional phrase. Even if we do find semantic and/or pragmatic explanations for this tendency of use (Goldberg 1995; Haspelmath 2012), these options are still learned as such and not independently of probability. Nonetheless, Diessel argues, “irrespective of the fact that the speaker’s choice of particular words is semantically and/or pragmatically motivated, there is evidence that the lexical biases of verb-argument constructions are also represented in memory. Speakers ‘know’ that the ditransitive construction typically occurs with particular verbs (and particular nominals) because they have experienced this construction so frequently with certain words”.

Going beyond the ‘bare’ exemplar idea, Diessel claims that the way in which speakers have “experienced this construction” is intrinsically related to the sequential character of syntax. This can best be explained by citing another example from Diessel’s paper. If we hear the sequence “Peter donated …” online, in the real-time course of conversation (Auer 2007, 2009), we expect or ‘project’ the sentence to continue with a direct object and a prepositional phrase (c) and not with the ditransitive construction (d):

(c) Peter donated money to the Red Cross.
(d) *Peter donated the Red Cross money.

Diessel argues that the general assumption of exemplar theory as it has been developed in phonetics and phonology, namely that categorization starts from particular items, can be transferred to syntax but needs to be modified. Syntactic constructions contain a lot of lexical information that speakers are aware of. Speakers know that verbs not only have certain semantic and pragmatic properties but also tend to be used in certain types of constructions. As syntax unfolds in real time, the automatization of constructional strings does not only depend on speakers’ experiences with lexically specific usage patterns, but also on transitional probabilities. In other words, the specific strings expected after particular lexemes trigger syntactic projections in real time.
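The lexical biases Diessel describes can be stated as conditional probabilities of a construction given a verb. The sketch below uses invented counts (they are not corpus figures) merely to show how such a bias might be estimated:

from collections import Counter

# Invented (verb, construction) observations: 'dit' = ditransitive, 'prep' = prepositional dative.
observations = [
    ("give", "dit"), ("give", "dit"), ("give", "dit"), ("give", "prep"),
    ("bring", "prep"), ("bring", "prep"), ("bring", "dit"),
    ("donate", "prep"), ("donate", "prep"),
]

pair_counts = Counter(observations)
verb_counts = Counter(verb for verb, _ in observations)

def p_construction_given_verb(construction, verb):
    # Relative frequency of the construction among all recorded uses of the verb.
    return pair_counts[(verb, construction)] / verb_counts[verb]

print(p_construction_given_verb("dit", "give"))    # 0.75 -> 'give' favors the ditransitive
print(p_construction_given_verb("dit", "donate"))  # 0.0  -> 'donate' resists it, as in (d) above

On a usage-based account, such conditional expectations, built up incrementally as the string unfolds, are part of what drives the syntactic projections discussed here.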

6 Summary and outlook

The usage-based model assumes that “grammar is held responsible for a speaker’s knowledge of the full range of linguistic conventions, regardless of whether these conventions can be subsumed under more general statements” (Langacker 1987: 494). About two decades later, Joan Bybee (2006) coined the often-cited passage on the impact that frequency, as one crucial factor of speakers’ “experience” with previously heard utterances, has on language:

A usage-based view takes grammar to be the cognitive organization of one’s experience with language. Aspects of that experience, for instance, the frequency of use of certain constructions or particular instances of constructions, have an impact on representation that is evidenced in speaker knowledge of conventionalized phrases and in language variation and change. (Bybee 2006: 711)

This implies that we need to rethink the basis of the analysis of language structure or grammar: what de Saussure has been said to call parole is not just the instantiation of langue or the language system, but the very basis for its existence. The speakers’ experience with acts of parole makes them develop probabilistic expectations. What speakers of any language or variety hear during their everyday interactions is what “counts” as their linguistic experience. Or, as Bybee continues in her argument that the language system is built from usage:

The proposal presented here is that the general cognitive capabilities of the human brain, which allow it to categorize and sort for identity, similarity, and difference, go to work on the language events a person encounters, categorizing and entering in memory these experiences. The result is a cognitive representation that can be called a grammar. This grammar, while it may be abstract, since all cognitive categories are, is strongly tied to the experience that a speaker has had with language. (Bybee 2006: 711)

From our perspective, the contributions in this volume provide relevant insights towards an integrative and experience-based theory of language acquisition, processing and change. In addition, they are examples of how recent linguistic research integrates insights and methods from cognitive psychology (see the afterword by Nick Ellis, this volume). The main results along the four lines of research outlined above show how frequency contributes to the emergence or stability of linguistic systems that are thought of as being temporary states in a dynamic process:

Firstly, one of the crucial concepts of an experience-based account of language acquisition, processing and change that several different fields agree on is the concept of entrenchment. Entrenchment relies heavily on transitional probabilities based on speakers’ experiences with previously heard utterances. In language processing, entrenched units are less prone to be interrupted by hesitation phenomena (Schneider, this volume), and they may be used as chunks in language contact situations, which leads to chunk-based language mixing independent of morpho-syntactic boundaries (Hakimov, this volume).

Secondly, while the degree of entrenchment seems to directly affect certain processes in language learning and processing (for example, the conservation and reduction effects discussed above), other frequency effects are modulated by the interaction with processing factors such as context, recency and salience (cf. Ellis, this volume; Ruette, Ehret and Szmrecsanyi, this volume; and Schwarz, this volume). Schwarz investigates the variability in vowel production in Alemannic areas and finds that variability is best predicted by an interaction effect of recency and frequency: the less frequent variant has stronger recency effects, and vice versa. Ruette, Ehret, and Szmrecsanyi test frequency-based models in lexical sociolectometry and find no difference in the lexical distances between varieties of English and different genres. Thus, frequency does not seem to have an effect on lexical variation when the lexicon is studied in isolation, and in corpus data alone.

Thirdly, the way the linguistic system is shaped by usage depends on the distribution of types and tokens over time, in interaction with the speaker’s current representation of the system. Madlener (this volume) shows that “input flooding is not the whole story”. When learning a construction, beginning learners profit more from moderate type variation with statistical skewing that helps them to identify the function of a construction, while advanced learners are able to profit from increased type frequency that allows them to expand the category. Rosemeyer (this volume) traces such interactions over time in the historical change of the Spanish auxiliary system, which is characterized by bursts in frequency of one construction and the fading off of the other. In order to establish the effect of frequency on a certain (intermediate) state, we thus need to be precise not only about what is being counted, but also about when we count it.

Fourthly, one of the central findings of usage-based linguistics is the importance of the lexical specificity of grammatical constructions. Diessel (this volume) argues that the linear order in which speakers process linguistic strings plays an important role in predicting which construction is activated. Diessel argues that the processing of concrete strings in their linear order does not become redundant when speakers have made more abstract generalizations, because the automatization we observe is based on the bottom-up processing of the incoming information, rather than being rule-governed top-down.

We hope that the studies on language change, language learning and language perception presented here convince our readers that experience counts!

References

Ambridge, Ben, Evan Kidd, Caroline F. Rowland and Anna L. Theakston 2015: The ubiquity of frequency effects in first language acquisition. Journal of Child Language 42(2): 239–273. Ambridge, Ben, Anna Theakston, Elena V. M. Lieven and Michael Tomasello 2006: The distributed learning effect for children’s acquisition of an abstract grammatical construction. Cognitive Development 21: 174–193. Archangeli, Diana, Jeff Mielke and Douglas Pulleyblank 2012: Greater than noise: Frequency effects in Bantu height harmony. In: Bert Botma and Roland Noske (eds.), Phonological Explorations: Empirical, Theoretical and Diachronic Issues, 191–222. Berlin/New York: de Gruyter. Arnon, Inbal and Neal Snider 2010: More than words: Frequency effects for multi-word phrases. Journal of Memory and Language 62: 67–82. Arnon, Inbal and Uriel Cohen Priva 2013: More than words: The effect of multi-word frequency and constituency on phonetic duration. Language and Speech 56: 349–372.

Auer, Peter 2007: Syntax als Prozess. In: Heiko Hausendorf (ed.), Gespräch als Prozess. Linguistische Aspekte der Zeitlichkeit verbaler Interaktion, 95–124. Tübingen: Narr. Auer, Peter 2009: On-line syntax: Thoughts on the temporality of spoken language. Language Sciences 31: 1–13. Baayen, Harald R. 2010: Demythologizing the word frequency effect: A discriminative learning perspective. In: Gonia Jarema, Gary Libben and Chris Westbury (eds.), Methodological and Analytic Frontiers in Lexical Research (Part I). Special issue of The Mental Lexicon 5(3): 436–461. Baayen, Harald R., Victor Kuperman and Raymond Bertram 2010: Frequency effects in compound processing. In: Sergio Scalise and Irene Vogel (eds.), Compounding, 257–270. Amsterdam/Philadelphia: Benjamins. Baayen, Harald R., Peter Hendrix and Michael Ramscar 2013: Sidestepping the combinatorial explosion: An explanation of n-gram frequency effects based on naive discriminative learning. Language and Speech 56(3): 329–347. Barðdal, Jóhanna 2008: Productivity: Evidence from Case and Argument Structure in Icelandic. Amsterdam/Philadelphia: Benjamins. Behrens, Heike 2009a: Usage-based and emergentist approaches to language acquisition. Linguistics 47: 383–411. Behrens, Heike 2009b: Konstruktionen im Spracherwerb. Zeitschrift für Germanistische Linguistik 37(3): 427–444. Blumenthal-Dramé, Alice 2012: Entrenchment in Usage-Based Theories: What Corpus Data Do and Do Not Reveal About The Mind. Berlin/New York: de Gruyter. Bowden, Harriet Wood, Matthew P. Gelfand, Christina Sanz and Michael T. Ullman 2010: Verbal inflectional morphology in L1 and L2 Spanish: A frequency effects study examining storage versus competition. Language Learning 60(1): 44–87. Boyd, Jeremy K. and Adele E. Goldberg 2011: Learning what NOT to say: The role of statistical preemption and categorization in a-adjective production. Language 87(1): 55–83. Boyland, Joyce Tang 1996: Morphosyntactic Change in Progress: A Psycholinguistic Approach. Unpublished dissertation. UC Berkeley. Brysbaert, Marc, Matthias Buchmeier, Markus Conrad, Arthur M. Jacobs, Jens Bölte and Andrea Böhl 2011: The word frequency effect: A review of recent developments and implications for the choice of frequency estimates in German. Experimental Psychology 58(5): 412–424. Bush, Nathan 2001: Frequency effects and word-boundary palatalization in English. In: Joan L. Bybee and Paul J. Hopper (eds.), Frequency and the Emergence of Linguistic Structure, 255–280. Amsterdam/Philadelphia: Benjamins. Bybee, Joan L. 1985: Morphology: A Study of the Relation between Meaning and Form. Amsterdam/Philadelphia: Benjamins. Bybee, Joan L. 2001: Frequency effects on French liaison. In: Joan L. Bybee and Paul J. Hopper (eds.), Frequency and the Emergence of Linguistic Structure, 337–359. Amsterdam/ Philadelphia: Benjamins. Bybee, Joan L. 2006: From usage to grammar: The mind’s response to repetition. Language 82(4): 711–733. Bybee, Joan L. 2010: Language, Usage, and Cognition. Cambridge: Cambridge University Press. Bybee, Joan L. and Paul Hopper (eds.) 2001: Frequency and the Emergence of Linguistic Structure. Amsterdam/Philadelphia: Benjamins. Bybee, Joan L. and Sandra Thompson 1997: Three frequency effects in syntax. In: Proceedings of the Twenty-Third Annual Meeting of the Berkeley Linguistics Society: General Session
and Parasession on Pragmatics and Grammatical Structure: 378–388. Berkeley: Berkeley Lunguistics Society. Cheshire, Jenny 1996: Syntactic variation and the concept of prominence. In: Juhani Klemolam, Merja Kytö and Matti Rissanen (eds.), Speech Past and Present: Studies in English Dialectology in Memory of Ossi Ihalainen, 1–17. Frankfurt a. M.: Peter Lang. Cholin, Joana, Gary S. Dell and Willem J. M. Levelt 2011: Planning and articulation in incremental word production: Syllable-frequency effects in English. Journal of Experimental Psychology: Learning, Memory and Cognition 37(1): 109–122. Coetzee, Andries W. and Shigeto Kawahara 2013: Frequency biases in phonological variation. Natural Language and Linguistic Theory 31: 47–89. Cooper, William E. and Jeanna Paccia-Cooper 1980: Syntax and Speech. Cambridge, MA: Harvard University Press. Croft, William and D. Alan Cruse 2004: Cognitive Linguistics. Cambridge: Cambridge University Press. Crossley, Scott, Tom Salsbury, Ashley Titak and Danielle McNamara 2014: Frequency effects and second language lexical acquisition. International Journal of Corpus Linguistics 19(3): 301–332. Diessel, Holger 2014: Frequency effects in language development. In: Patricia J. Brooks and Vera Kempe (eds.), Encyclopedia of Language Development, 222–224. Thousand Oaks: Sage. Divjak, Dagmar and Stefan Th. Gries (eds.) 2012: Frequency Effects in Language Representation. Berlin/Boston: de Gruyter. Divjak, Dagmar and Catherine L. Caldwell-Harris 2015: Frequency and entrenchment. In: Ewa Dabrowska and Dagmar Divjak (eds.), Handbook of Cognitive Linguistics, 53–75. Berlin/ New York: de Gruyter. Dufour, Sophie, Angèle Brunellière and Ulrich H. Frauenfelder 2013: Tracking the time course of word-frequency effects in auditory word recognition with event-related potentials. Cognitive Science 23: 489–507. Dye, Christina D., Matthew Walenski, Elizabeth L. Prado, Steward Mostofsky and Michael T. Ullman 2013: Children’s computation of complex linguistics forms: A study of frequency and imageability effects. PLOS One 8(9): 1–13. Ellis, Nick C. 2002: Frequency effects in language processing: A review with implications for theories of implicit and explicit language acquisition. Studies in Second Language Acquisition 24(2): 143–188. Ellis, Nick C. 2008a: The dynamics of second language emergence: Cycles of language use, language change, and language acquisition. Modern Language Journal 92: 232–239. Ellis, Nick C. 2008b: The psycholinguistics of the Interaction Hypothesis. In: Alison Mackey and Charlene Polio (eds.), Multiple Perspectives on Interaction in SLA: Second Language Research in Honor of Susan M. Gass, 11–40. New York: Routledge. Ellis, Nick C. 2012a: Formulaic language and second language acquisition: Zipf and the phrasal teddy bear. Annual Review of Applied Linguistics 32: 17–44. Ellis, Nick C. 2012b: Second language acquisition. In: Graham Trousdale and Thomas Hoffmann (eds.), Oxford Handbook of Construction Grammar, 365–378. Oxford: Oxford University Press. Ellis, Nick C. 2012c: What can we count in language, and what counts in language acquisition, cognition, and use? In: Stefan Th. Gries and Dagmar S. Divjak (eds.), Frequency Effects
in Cognitive Linguistics, Vol. 1: Statistical Effects in Learnability, Processing and Change, 7–34. Berlin/Boston: de Gruyter. Ellis, Nick C. 2013: Frequency-based grammar and the acquisition of tense-aspect in L2 learning. In: Rafael Salaberry and Llorenc Comajoan (eds.), Research Design and Methodology in Studies on Second Language Tense and Aspect, 89–118. Berlin/Boston: de Gruyter. Ellis, Nick C., Matthew Brook O’Donnell and Ute Römer 2013: Usage-based language: Investigating the latent structures that underpin acquisition. Language Learning 63: 25–51. Ellis, Nick C., Matthew Brook O’Donnell and Ute Römer 2014: The processing of verb-argument constructions is sensitive to form, function, frequency, contingency, and prototypicality. Cognitive Linguistics 25(1): 55–98. Elmenthaler, Michael, Joachim Gessinger and Jan Wirrer 2010: Qualitative und quantitative Verfahren in der Ethnodialektologie am Beispiel von Salienz. In: Markus Hundt, Alexander Lasch and Christina A. Anders (eds.), Perceptual Dialectology: Neue Wege der Dialektologie, 111–149. Berlin/New York: de Gruyter. Erker, Daniel and Gregory R. Guy 2012: The role of lexical frequency in syntactic variability: Variable subject personal pronoun expression in Spanish. Language 88(3): 526–557. Franceschini, Rita and Stefan Pfänder (eds.) 2013: Frequenzeffekte. Zeitschrift für Literaturwissenschaft und Linguistik 43. Stuttgart/Weimar: Metzler. Geeraerts, Dirk, Stefan Gronelaers and Peter Bakema 1994: The Structure of Lexical Variation: Meaning, Naming, and Context. Berlin/Boston: de Gruyter. Goldberg, Adele 1995: Constructions. A Construction Grammar Approach to Argument Structure. Chicago: University of Chicago Press. Grande, Marion, Elisabeth Meffert, Walter Huber, Katrin Amunts and Stefan Heim 2011: Word frequency effects in the left IFG in dyslexic and normally reading children during picture naming and reading. Neuroimage 57: 1212–1220. Gries, Stefan Th. and Dagmar Divjak (eds.) 2012: Frequency Effects in Language Learning and Processing. Berlin/Boston: de Gruyter. Hakimov, Nikolay forthcoming: Effects of frequency and word repetition on switch-placement: Evidence from Russian-German code-mixing. In: Justyna A. Robinson and Monika Reif (eds.), Cognitive Perspectives on Bilingualism. Trends in Applied Linguistics (TAL). Berlin/ New York: de Gruyter. Haspelmath, Martin 2012: Explaining the ditransitive person-role constraint. Constructions 2: 1–72. Hilpert, Martin 2008: The English comparative: Language structure and language use. English Language and Linguistics 12(3): 395–417. Hilpert, Martin submitted: Frequencies in diachronic corpora and knowledge of language. In: Marianne Hundt, Simone Pfenninger and Sandra Mollin (eds.), The Changing English Language: Psycholinguistic Perspectives. Cambridge: Cambridge University Press. Hilpert, Martin and Holger Diessel forthcoming: Entrenchment in construction grammar. In: Hans-Jörg Schmid (ed.), Entrenchment, Memory and Automaticity. The Psychology of Linguistic Knowledge and Language Learning. Berlin/Boston: American Psychology Association and de Gruyter. Hollmann, Willem B. and Anna Siewierska 2011: The status of frequency, schemas, and identity in Cognitive Sociolinguistics: A case study on definite article reduction. Cognitive Linguistics 22(1): 25–54.


Hopper, Paul J. and Elizabeth C. Traugott 1993: Grammaticalization. Cambridge: Cambridge University Press. Hopper, Paul J. and Elizabeth C. Traugott 2003: Grammaticalization. 2nd edition. Cambridge: Cambridge University Press. Ibbotson, Paul and Michael Tomasello 2009: Prototype constructions in early language acquisition. Language and Cognition 1(1): 59–85. Janiszewski, Chris, Hayden Noel and Alan G. Sawyer 2003: A meta-analysis of the spacing effect in verbal learning: Implications for research on advertising repetition and consumer memory. Journal of Consumer Research 30, 138–149. Jacoby, Larry L. and Mark Dallas 1981: On the relationship between autobiographical and perceptual learning. Journal of Experimental Psychology: General 110: 306–340. Janssen, Niels and Horacio A. Barber 2012: Phrase frequency effects in language production. PLOS One 7(3): 1–11. Johns, Brendan T., Thomas M. Greunenfelder, David B. Pisoni and Michael N. Jones 2012: Effects of word frequency, contextual diversity, and semantic distinctiveness on spoken word recognition. The Journal of the Acoustical Society of America 132. Kapatsinski, Vsevolod 2010: Frequency of use leads to automaticity of production: Evidence from repair in conversation. Language & Speech 53(1): 71–105. Kerswill, Paul 2002: A dialect with ‘great inner strength’? The perception of nativeness in the Bergen speech community. In: Daniel Long and Dennis Preston (eds.), Handbook of Perceptual Dialectology, Vol. 2, 155–175. Amsterdam/Philadelphia: Benjamins. Kerswill, Paul and Ann Williams 2000: ‘Salience’ as an explanatory factor in language change: Evidence from dialect levelling in urban England. Reading Working Papers in Linguistics 4: 63–94. Kerswill, Paul and Ann Williams 2002: Dialect recognition and speech community focusing in new and old towns in England: The effects of dialect levelling, demography and social networks. In: Daniel Long and Dennis R. Preston (eds.), Handbook of Perceptual Dialectology, Vol. 2, 173–204. Amsterdam/Philadelphia: Benjamins. Kidd, Evan, Elena Lieven and Michael Tomasello 2010: Lexical frequency and exemplarbased learning effects in language acquisition: Evidence from sentential complements. Language Sciences 32: 132–142. Krause, Anne in progress: Morphological Change in German: Formation of Imperatives. Dissertation. University of Freiburg. Krug, Manfred 1998: String frequency: A cognitive motivating factor in coalescence, language processing, and linguistic change. Journal of English Linguistics 26(4): 286–320. Krug, Manfred 2003: Frequency as a determinant in grammatical variation and change. In: Günter Rohdenburg and Britta Mondorf (eds.), Determinants of Grammatical Variation in English, 7–67. Berlin/Boston: de Gruyter. Langacker, Ronald W. 1987: Foundations of Cognitive Grammar, Vol. 1: Theoretical Prerequisites. Stanford: Stanford University Press. Lenz, Alexandra N. 2010: Zum Begriff der Salienz und zum Nachweis salienter Merkmale. In: Markus Hundt, Alexander Lasch and Christina A. Anders (eds.), Perceptual Dialectology: Neue Wege der Dialektologie, 89–110. Berlin/New York: de Gruyter. Lieberman, Erez, Jean-Baptiste Michel, Joe Jackson, Tina Tang and Martin Nowak 2007: Quantifying the evolutionary dynamics of language. Nature 449: 713–716. Long, Daniel and Dennis R. Preston (eds.) 2002: Handbook of Perceptual Dialectology, Vol. 1–2. Amsterdam/Philadelphia: Benjamins.




Lorenz, David 2012: The perception of ‘gonna’ and ‘gotta’: A study of emancipation in progress. In: Antonis Botinis (ed.), Proceedings of the 5th ISEL Conference on Experimental Linguistics, 77–80. University of Athens and International Speech Communication Association. http://conferences.phil.uoa.gr/exling/proceedings.html. Lorenz, David 2013a: On-going change in English modality: Emancipation through frequency. In: Rita Franceschini and Stefan Pfänder (eds.), Frequenzeffekte. Zeitschrift für Literaturwissenschaft und Linguistik 43: 33–48. Stuttgart/Weimar: Metzler. Lorenz, David 2013b: Semi-modal Contractions in English: Emancipation through Frequency. New Ideas in Human Interaction (NIHIN). Freiburg: Universitätsbibliothek. Madlener, Karin 2015: Frequency Effects in Instructed Second Language Acquisition. Berlin/ Boston: de Gruyter. Nübling, Damaris 2000: Prinzipien der Irregularisierung. Eine kontrastive Analyse von zehn Verben in zehn germanischen Sprachen. Linguistische Arbeiten 415. Tübingen: Niemeyer. Patterson, David and Cynthia M. Connine 2001: Variant frequency in flap production: A corpus analysis of variant frequency in American English flap production. Phonetica 58: 254–275. Pfänder, Stefan, Heike Behrens, Peter Auer, Daniel Jacob, Rolf Kailuweit, Lars Konieczny, Bernd Kortmann, Christian Mair and Gerhard Strube 2013: Erfahrung zählt. Frequenzeffekte in der Sprache – ein Werkstattbericht. In: Rita Franceschini and Stefan Pfänder (eds.), Frequenzeffekte. Zeitschrift für Literaturwissenschaft und Linguistik 43: 7–32. Stuttgart/ Weimar: Metzler. Pfeiffer, Martin 2015: Selbstreparaturen im Deutschen. Syntaktische und interaktionale Analysen. Berlin/Boston: de Gruyter. Pierrehumbert, Janet 2001: Exemplar dynamics: Word frequency, lenition and contrast. In: Joan L. Bybee and Paul Hopper (eds.), Frequency and the Emergence of Linguistic Structure, 138–157. Amsterdam/Philadelphia: Benjamins. Pierrehumbert, Janet 2006: The next toolkit. Journal of Phonetics 34: 516–530. Poplack, Shana 1992: The inherent variability of the French subjunctive. In: Christiane Laeufer and Terell A. Morgan (eds.), Theoretical Analyses in Romance Linguistics, 235–263. Amsterdam/Philadelphia: Benjamins. Poplack, Shana and Sali Tagliamonte 1996: Nothing in context: Variation, grammaticization and past time marking in Nigerian Pidgin English. In: Philip Baker and Anand Syea (eds.), Changing Meanings, Changing Functions. Papers Relating to Grammaticalization in Contact Languages, 71–94. London: University of Westminster Press. Rácz, Péter 2012: Operationalising salience: Definite article reduction in the North of England. English Language and Linguistics 16(1): 57–79. Rácz, Péter 2013: Salience in Sociolinguistics. Berlin/Boston: de Gruyter. Rosemeyer, Malte 2013: Tornar and volver: The interplay of frequency and semantics in compound tense auxiliary selection in Medieval and Classical Spanish. In: Jóhanna Barðdal, Elly van Gelderen and Michela Cennamo (eds.), Argument Structure in Flux: The Naples-Capri Papers, 435–458. Amsterdam/Philadelphia: Benjamins. Rosemeyer, Malte 2014: Auxiliary Selection in Spanish: Gradience, Gradualness, and Conservation. Amsterdam/Philadelphia: Benjamins. Schäfer, Michael 2014: Phonetic Reduction of Adverbs in Icelandic: On the Role of Frequency and Other Factors. New Ideas in Human Interaction (NIHIN). Freiburg: Universitätsbibliothek.


Schmid, Hans-Jörg (ed.) forthcoming: Entrenchment, Memory and Automaticity. The Psychology of Linguistic Knowledge and Language Learning. Berlin/Boston: American Psychology Association and de Gruyter. Schneider, Ulrike 2013: CART Trees and Random Forests in linguistics. In: Janne Schulz and Sven Hermann (eds.), Hochleistungsrechnen in Baden-Württemberg. Ausgewählte Aktivitäten im bwGRiD 2012. Beiträge zu Anwenderprojekten und Infrastruktur im bwGRiD im Jahr 2012, 67–81. Karlsruhe: KIT Scientific Publishing Verlag. Schneider, Ulrike 2014: Frequency, Chunks, and Hesitations: A Corpus-based Analysis of Chunking in English. New Ideas in Human Interaction (NIHIN). Freiburg: Universitätsbibliothek. Schwenter, Scott A. 2013: Strength of priming and the maintenance of variation in the Spanish past subjunctive. Paper presented at NWAV 2012 Pittsburgh. Siyanova-Chanturia, Anna, Kathy Conklin, and Walter J. B. van Heuven 2011: Seeing a phrase ‘time and again’ matters: The role of phrasal frequency in the processing of multiword sequences. Journal of Experimental Psychology: Learning, Memory, and Cognition 37(3): 776–784. Stefanowitsch, Anatol and Stefan Th. Gries 2003: Collostructions: Investigating the interaction between words and constructions. International Journal of Corpus Linguistics 8(2): 209–243. Szmrecsanyi, Benedikt 2006: Morphosyntactic Persistence in Spoken English: A Corpus Study at the Intersection of Variationist Sociolinguistics, Psycholinguistics, and Discourse Analysis. Berlin/New York: de Gruyter. Tomlin, Russell S. and Andriy Myachykov 2015: Attention and salience. In: Ewa Dabrowska and Dagmar Divjak (eds.), Handbook of Cognitive Linguistics, 31–52. Berlin/New York: de Gruyter. Traugott, Elizabeth Closs 2012: Review of: Geoffrey Leech, Marianne Hundt, Christian Mair and Nicholas Smith (eds.) 2011 Change in Contemporary English. English Language and Linguistics 16(1): 183–193. Trudgill, Peter 1986: Dialects in Contact. Oxford: Blackwell. Verhoeven, Ludo and Rob Schreuder 2011: Morpheme frequency effects in Dutch complex word reading: A developmental perspective. Applied Psycholinguistics 21: 483–498. Zipf, George Kingsley 1936: The Psychobiology of Language. London: Routledge.

Nikolay Hakimov (University of Freiburg)

Explaining variation in plural marking of German noun insertions in Russian sentences

Abstract: In bilingual speech, as evidenced by extensive contact linguistics literature, nouns from one language (embedded language) are regularly inserted into sentences framed by the other language (matrix language). When marked for the plural, these nouns either receive plural markers from the matrix language or retain embedded-language plural markers. Although several explanations of this variation have been proposed, none of them has been studied systematically thus far, nor have they been analyzed jointly as competing probabilistic explanations. The purpose of this article is to systematically investigate plural marking on noun insertions in bilingual speech involving two fusional languages, Russian and German. The choice between Russian and German plural markers with German noun insertions depends on the interaction of three factors: the frequency with which the plural form is used in the embedded language, i. e. German; its phonetic shape; and the morphological case projected by the Russian matrix structure on the slot the noun is inserted into.

1 Introduction

Variation lies at the heart of many patterns of code-mixing. Explaining code-mixing therefore necessitates tackling the issue of variation. Variation in code-mixing is determined by an intricate interplay of structural, psycholinguistic and sociolinguistic factors (Muysken, Kook, and Vedder 1996: 487). It therefore comes as no surprise that one of the first accounts of code-mixing originated within the framework of variation theory (Poplack [1980] 2006). As attested in numerous corpora of bilingual speech (see below), lone code-mixed nouns combine with plural markers to form three possible patterns: noun with matrix language plural marking, noun with embedded-language plural marking, and, on rare occasions, noun with double plural marking. Although patterns of plural marking on code-mixed nouns are often discussed in the literature (see Backus 1996, 1999, 2003; Boumans 1998; Muhamedowa 2006; Myers-Scotton 1993, 2002), the need to explain, or even predict, this variation remains.


The purpose of this paper is determined by the following considerations: Firstly, research on code-mixing has focused on universally applicable principles, and inherent properties of the languages involved have often been neglected, especially those on the level of morphophonology. Yet, as I will demonstrate, morphophonological regularities of these languages influence the distribution of code-mixing patterns. Secondly, though introduced as a factor determining bilingual online production already in Backus (1996), the factor ‘frequency’ has neither been subject to quantitative analysis nor empirical validation in any of the code-mixing studies reviewed and has been generally understudied in the field to date. This circumstance is surprising given the amount of work examining frequency as a major factor determining human linguistic behavior, and language organization carried out in other branches of linguistics (Barlow and Kemmer 1999; Bybee 2007; Bybee and Hopper 2001; Divjak and Gries 2012; Gries and Divjak 2012; see Diessel 2007; Ellis 2002 for reviews). Thirdly, struc­ tural approaches to code-mixing outlined below examine variation in bilingual speech and indicate possibilities for code-mixing. However, they do not make any assumptions regarding the probability that one or another variant will be used in bilingual production. The question I address is whether variation in patterns of plural marking of code-mixed nouns as observed in Russian-German code-mixing can be effectively predicted by the relevant structural properties of Russian and German, and the frequency distribution of the plural and singular forms of the inserted German nouns. In an attempt to predict this variation, the current study draws on assumptions stemming from usage-based approaches. A usage-based approach not only aims to provide a cognitively real account of language processing and organization, it also allows for an integration of two kinds of factors: factors based on discreet categories, and factors distinguished by gradience. In order to test the interplay of these probabilistic factors and predict its outcome, a statistical model is applied. The first hypothesis of this study is that the frequency distribution of singular and plural forms in the embedded language, German, will influence patterns of plural marking of German nouns in Russian sentences. Words that are often used in their plural forms in one language are inserted as plurals into another language (Backus 1996, 1999, 2003). Highly entrenched German plurals will thus retain German plural markers in Russian sentences. Another consequence of the frequency distribution of singular and plural forms in the embedded language is that nouns frequent in the singular in German will receive their plural marking from Russian, the matrix language. As a matrix language, Russian restricts the possibilities for accommodation of German noun insertions on the morpho­ phonological level. The Russian nominal system favors stems that have con­ sonants in the final position. Therefore, the second hypothesis is that German




noun stems exhibiting vowels stem-finally will retain the German plural marker. That is, the phonological shape of German nouns is a factor determining the choice of language for marking the plural on inserted stems. Another motivation behind the observed variation is a mismatch between the nominal systems of German and Russian: the Russian system exhibits a lesser degree of syncretism and con­ tains more morphological cases than German. The final hypothesis is therefore that an insertion will receive a Russian plural affix if the slot of the insertion requires a non-core case. This is of particular interest in a situation of contact between Russian and German, two fusional languages, as previous research on variation in plural marking of noun insertions has thus far only involved pairs of agglutinative and analytical languages. The paper is organized as follows: Section 2 presents a survey of patterns of plural marking on code-mixed nouns by drawing on instances observed in corpora of bilingual speech involving various language pairs. Section 3 reports the state of the art of code-mixing research concerned with explaining the dis­ tribution of these patterns in mixed sentences. Section 4 then offers an overview of the systems of plural marking in Russian and German. In Section 5, I describe the corpus of Russian-German bilingual speech utilized in this study and the analyzed data. Section 6 is concerned with the analysis of the factors determin­ ing the studied variation: these include frequencies of the singular and plural forms of the inserted lexical items as distributed in the embedded language, phonetic shape of the German noun insertions, including the segment featured stem-finally, and the morphosyntactic context in which the nouns appear, i. e. the morphological case required by the slot in which they are to be inserted. Each of these factors is a tendency rather than an absolute rule, therefore they are evaluated statistically in Section 7, which describes the statistical model testing the interplay of factors when determining the use of the patterns of plural marking. Section 8 summarizes, and discusses the results.

2 Typology of marking plural on code-mixed nouns The structure of a code-mixed clause can be analyzed by determining the division of labor between the languages involved. In the case of insertional code-mixing (Auer 1999, 2014; Muysken 1995, 2000), an asymmetry is observed between the roles the languages play in contributing to the core structure of the clause. In models of insertional code-mixing (see Auer and Muhamedova 2005; Backus 1996; Boumans 1998; Muhamedowa 2006; Muysken 1995, 2000; Myers-Scotton 1993, 2002), the language responsible for the core structure of the clause is called the matrix language (ML); the other, dominated language, whose role is usually


restricted to the supply of lexical items, is the embedded language (EL). The terms “matrix language” and “embedded language” are used in the sense of Boumans (1998), namely as labels for the languages in an asymmetric relation and without reference to the implications of Myers-Scotton’s Matrix Language Frame model, with which these terms are usually associated. Code-mixed nouns can receive plural marking in three different ways: the first possibility occurs when marking results from the morphological processes of the matrix language, i. e. [a [b N b]-pl a]; the second possibility to mark the plural on code-mixed nouns is to retain the plural marker used with the word in the embedded language, or the language of the inserted stem, i. e. [a [b N-pl b] a]; the final option is double plural marking, which refers to the case when the noun receives two plural markers: one from the embedded language and the other from the matrix language, i. e. [a [b N-pl b]-pl a].

2.1 Type 1: A morphological process of the matrix language Marking the plural by the morphological process of the matrix language is espe­ cially frequent in situations in which an agglutinative language is the matrix language. In (1), the Dutch noun activiteit ‘activity’, embedded into a Turkish clause, is used with the Turkish plural suffix -ler-, and in (2), the English noun rule acquires the Finnish stem formant -i and the plural marker -t. (1)

Turkish-Dutch1 (Backus 1996: 150) activiteit-ler-i yapacağız dedik activity-pl-acc do-fut-1pl said-1pl ‘we said we were going to activities’

(2) Finnish-English (Halmari 1997: 60) joo missä kummassa ne rule-i-t on? yeah where ever 3pl-sf-pl are ‘Yeah, where on earth are those rules?’

1 In the examples, italics are used for highlighting German material. The code at the end of a sample utterance indicates the position in the corpus. The morphosyntactic glosses are provided according to the Leipzig Glossing Rules.




The high frequency of this kind of insertion in language pairs with an agglutinative language as the matrix language, and its rarity in pairs of languages with fully fusional Arabic (Boumans 1998: 180; Nortier 1990: 189) and moderately fusional French and Dutch as the matrix language (Treffers-Daller 1994) led Muysken to hypothesize that fusional languages are resistant to this pattern (2000: 77). However, from as early as Hasselmo (1972: 265–266) there is documented evidence in the literature against this claim (see Budzhak-Jones 1998: 174 for UkrainianEnglish; Hlavac 2003: 73 for Croatian-English; Stenson 1990: 180 for Irish-Eng­ lish; Szabó 2010: 352 for German-Hungarian). In line with this evidence are the following examples of lone code-mixed nouns in Russian sentences: (3) Russian-English (Benson 1960: 173) ves’       place by-l zaparkova-n car-ami whole place[nom.sg] be-pst[sg.m] park-part.sg.m -instr.pl ‘The whole place was parked full of cars.’ (4) Russian-Hebrew (Naiditch 2008: 48) kogda u vas nač-n-ut-sja bagrut-y? when with 2gen.pl begin-perf.fut-3pl-refl exam-nom.pl ‘When will your exams begin?’ The noun car in (3) and the noun bagrut ‘exam’ in (4) take Russian inflectional affixes expressing plural and morphological case, which is also a typical pattern employed in the Russian-German code-mixing data analyzed below.

2.2 Type 2: A morphological process of the embedded language The second possibility to mark the plural on code-mixed nouns is to retain the plural marker used with the word in the embedded language. A consequence of this process is the emergence of so-called embedded-language islands. In (5), the noun window is inserted with its English plural marker -s; similarly in (6), we can see the insertion of the plural form dvarim ‘things’ of the Hebrew noun davar ‘thing’. The noun dvarim is not marked for the Russian genitive case, and thus presents an embedded-language island, just like the noun windows in (5). (5) Russian-English (Benson 1960: 173) ona poš-l-a clean-ova-t’ window-s -ipfv-inf -pl 3sg.f go-pst-sg.f ‘She went to clean windows.’


(6) Russian-Hebrew (Naiditch 2008: 48) est’ neskol’ko dvar-im kotor-ye my dolžn-y zna-t’ be.prs several pl\thing-pl which-acc.pl 1pl obliged-1pl know-inf ‘There are several things you have to know.’ (7) Russian-Estonian (Mürkheim 1970: 117, quoted in Verschik 2004: 437) ja boj-u-s’ vaim-u-sid 1sg fear-1sg-refl ghost-sf-partitive.pl ‘I am afraid of ghosts.’ In (7), the Estonian stem vowel and the partitive affix are retained, possibly due to a mismatch in the category of morphological case: not only is the Russian partitive limited to a specific semantic class, and devoid of a plural form, but the verb bojus’ also requires the genitive. Some agglutinative languages, such as Tamil and Kazakh (Muhamedowa 2006: 67), may combine the embedded-language plural marker with the matrix language case marker, as in (8). This is indicative of the gradience of the morphosyntactic integration of inserted forms, which needs to be taken into consideration in models of code-mixing. (8) Tamil-English (Sankoff, Poplack, and Vanniarajan 1990: 81) only kalaimagaLLataan movie-s-e patti peece ille only proper.name.loc.only movie-pl-acc about talk neg ‘Only in KalaimagaL there is no talk about the movies.’

2.3 Type 3: Double plural marking Double plural marking is probably the most frequently described case of double morphology documented in the literature and has therefore attracted the most interest (Backus 1992: 96, 1996: 151, 1999: 98–99; Boumans 1998: 90–91; Muhamedowa 2006: 152–156; Myers-Scotton 1993: 132–135, 2002: 91–93; MyersScotton and Jake 1995). For example: (9) Kazakh-Russian (Muhamedowa 2006: 152) sonday ülken zvanij-a-lar-ï bar such big title-pl.nom-pl-pos3sg exist.3sg ‘He has many titles.’ In this example, the Russian plural noun zvanija ‘titles’ additionally acquires the Kazakh plural marker -lar-. Another example comes from Moluccan Malay-Dutch




code-mixing in the Netherlands. This instance involves the use of reduplication as a productive morphological process of plural marking in Moluccan Malay (Voigt 1994: 50–56): (10) Moluccan Malay-Dutch (Voigt 1994: 52) kalau minum terus ada di punya pitje-s-pitje-s-nya if drink continue exist det seed-pl-seed-pl-det ‘If I continue drinking there’ll be seeds.’

3 Previous explanations Prior studies have predominantly focused on double plural marking in mixed nouns, and double morphology in general, since the latter was considered an exceptional case in the early version of Myers-Scotton’s influential Matrix Language Frame (MLF) model (1993). The proponents of the MLF model declared double morphology to be motivated by erroneous access in production (MyersScotton 1993: 132–136; Myers-Scotton and Jake 1995: 1000). Although MyersScotton (1993: 134) suggests possible scenarios for double morphology, it is not clear what factors determine its occurrence in bilingual corpora. A structural explanation of the phenomenon was suggested by Boumans (1998), who considers a mismatch in the morphological marking processes to be a crucial factor in determining the emergence of double morphology; he claims that “The likelihood of double marking appears to increase when each language marks the same feature in a different manner, for instance, by means of prefixes and suffixes” (Boumans 1998: 91). This approach sheds light on a number of cases of double morphology present in bilingual corpora, for instance, the Moluccan Malay-Dutch Example (10). The morphological processes of plural marking mismatch in this case because Malay uses reduplication whilst Dutch employs inflectional suffixes. This leads to a juxtaposition of both markings and thus the emergence of a form with the plural marked twice. Boumans’ proposal has been widely echoed in the literature. Indeed, it not only holds for double plurals, but also for other cases of double morphology (see Myers-Scotton 2002: 91 for double marking of determiners and infinitives; Muysken 2000: 104 for doubled adpositions; Muhamedowa 2006: 152–156 for doubled adpositions; Szabó 2010: 346–352 for doubled adpositions and determiners). Myers-Scotton (2002: 150) extends this principle to the case of plural marking on inserted nouns by the embedded language. It can, for example, aptly account for the emergence of the internal embedded-language island in (6). In contact between Hebrew,


which employs both a stem change and a suffix to mark plural, and Russian, which uses only inflectional suffixes, the Hebrew plural form is produced. This factor, however, has a restricted explanatory power because it cannot account for the use of double morphology in pairs of languages whose mor­ phological processes for marking a particular feature do not diverge, as Example (9) revealed. On the other hand, even when a mismatch in processes of plural marking is observed, embedded-language plural markers and matrix language plural markers can alternate, as in (11): (11) English-Hebrew (Olshtain and Blum-Kulka 1989: 70) Mother: What do I have to do when I go to your gan (nursery.school)? Son (15): Wait, they are on strike, the gan-im (nursery.school-pl)? Mother: No, the ozrot (pl\assistant) are on strike. In order not to close down the gan the mothers are taking tor-ot (turn-pl) to help the ganenet-s (nursery.teacher-pl). The assistants to the ganenet-s (nursery.teacher-pl). They’re young women who have taken a year’s course to be an ozeret (assistant) for gan-im (nursery. school-pl). Son (15): So why are they going on strike and not the gananot (pl\nursery. teacher)? This example demonstrates high variability of plural marking within a short piece of conversation. Hebrew morphology is retained in most of the plural forms (ganim ‘nursery schools’, ozrot ‘assistants’, torot ‘turns’ and gananot ‘nursery teachers’), possibly due to a mismatch in the morphological processes of plural marking: the plural forms ozrot and gananot employ a stem change, whereas English uses suffixation. Nevertheless, the item ganenet ‘nursery teacher’ receives the English plural marker. In discussing the same phenomenon, Backus (1996: 151) posits an alternative proposal. Within a Cognitive Linguistics framework, he provides a usagebased explanation of double marking. Emphasizing the role of entrenchment of certain plural forms, he observes that some words occur in their plural form more often than in their singular form. The high degree of entrenchment of these plural forms leads to their activation in production, and is attributed to a mis­ match in the frequency distribution of the plural and singular forms of nouns. Backus further elaborates on this idea in the following way: “[… forms with the embedded-language plural morphemes] are established lexical units for the speakers who use them. This means they are chunks […]: they are lexical units that consist of more than one morpheme.” (1999: 98) This assumption is based on




the idea that cognitive representations on various linguistic levels are functions of frequency of use, and therefore in line with usage-based theory (Bybee 1985, 2006, 2010). Positing a rich memory for language (Bybee 2006, 2010; Langacker 1987, 1999; Tomasello 2003) allows for both plural and singular forms of same nouns to exist independently in the lexicon (Backus 1999: 99). Baayen, Dijk­ stra, and Schreuder (1997) present evidence for the independent storage of highfrequency regular plurals by speakers of Dutch. Although Backus’ reasoning is by and large plausible, it cannot be viewed as a principle determining the emergence of embedded-language plural markers. If this were the case, any noun that appeared with an embedded-language plural morpheme would be granted the status of a lexical unit. When modeling the probability of activating a plural form in production, we need to consider the competition between the plural and singular form of the inserted noun as conditioned by entrenchment, as well as speaker variation. If these intricacies are ignored, variation in the plural marking of code-mixed nouns, such as in (11), cannot be exhaustively explained. Thus far, phonotactic restrictions and morphophonological regularities have been neglected as possible motivations for the use of particular plural marking patterns. However, as will be shown below, they are among the requirements the matrix language imposes on an item that is to be integrated in cases of insertional code-mixing. In sum, the appearance of embedded-language plural markers either as part of internal embedded-language islands, or as constituents with double mor­ phology can be explained as the result of either a mismatch in the processes of marking a plural, or an asymmetry in the token frequency of the singular and plural forms of a lexical item. Further factors such as morphophonology do not appear to have been considered thus far. A limitation that pervades all the approaches to double morphology discussed above is their sole reliance on individual factors and therefore a tendency to overlook some of the intricacies of the matter. In order to account for the variation observed in the data, the possible driving forces behind it should be analyzed in their interaction.

4 Plural marking in Russian and German Russian-German code-mixing provides a good testing ground for the interaction of factors that contribute to variable patterns of plural marking because they are fusional languages. Both inflect nouns for number, gender and case, but dem­ onstrate varying degrees of syncretism and systematicity in the encoding of these grammatical notions.


4.1 Russian

Russian noun declension fuses case, number, and gender marking. The system is stem-based and synthetic, and can be best described by stipulating declensional classes. However, there is no consensus in the literature regarding the number of the declensional classes and the determinants for distinguishing them (cf. Corbett 1991, 2003). In this paper I adopt Zaliznjak’s approach to the Russian nominal inflection ([1967] 2002, [1977] 2009). The following declensional classes are relevant for the presentation of the data below: the ‘masculine’, the ‘feminine’, and the zero-marked class. Traditionally, three principal classes are differentiated, each being prototypical for the respective gender: the masculine, the feminine, and the neuter. This treatment is based on the gender and phonological type of the stem. The three classes are presented in Table 1. Among these classes, the ‘masculine’ and ‘feminine’ ones are vital for the integration of noun stems into Russian sentences since they are the most productive ones in Russian (Zaliznjak [1967] 2002: 218; Timberlake 2004: 148). This tendency is observed in Russian-Hebrew (Naiditch 2008) as well as Russian-German code-mixing data. It is necessary to note that the distinctions between the declensional classes in the plural, when compared to those in the singular, are minimal: each morphological case has one inflection per class except for the genitive and the nominative, whose forms coincide with those of the accusative. The last relevant class relies on zero paradigms and is restricted to a group of loan words ending in vowels, except unstressed {-a} preceded by a consonant (Timberlake 2004: 148–149), such as alibi, boa ‘feather boa’, kafe ‘café’, kenguru ‘kangaroo’, kino ‘cinema’. Neither of the nominal grammatical categories is marked overtly on the nouns of this class, like in ljubimoe kafe ‘a favorite café’ vs. raznye kafe ‘different cafés’.

Table 1: Inflections of the Russian nominal declension (adapted from Zaliznjak [1977] 2009: 26). The inflections in brackets present a subtype used with stems featuring a soft final consonant

                       singular                          plural
                 m         n         f             m           n         f
nominative       Ø         -o        -a            -y (-i)     -a        -y (-i)
genitive         -a        -a        -y (-i)       -ov (-ej)   Ø         Ø
dative           -u        -u        -e            -am         -am       -am
accusative
  inanimate      Ø         -o        -u            -y (-i)     -a        -y (-i)
  animate        -a        -o        -u            -ov (-ej)   Ø         Ø
instrumental     -om       -om       -oj           -ami        -ami      -ami
prepositional    -e        -e        -e            -ah         -ah       -ah

Russian employs the genitive singular to express plurality when nouns follow the cardinal numerals ‘two’, ‘three’, and ‘four’, for example dv-a dom-a ‘two-m




house-gen.sg.m’ and dv-e knig-i ‘two-f book-gen.sg.f’. If we assume conceptual plurality, the instances of German nominal insertions in dv-a mensch-a ‘two-m person-gen.sg.m’ and dv-e flasch-i ‘two-f bottle-gen.sg.f’ should be considered cases of plural marking.

4.2 German

Like Russian, German noun inflection fuses case, number, and gender marking. Furthermore, the nominal system is also traditionally analyzed in terms of declensional classes (Eisenberg 2006: 158; Duden 04 2009: 229). Because the plural demonstrates a high degree of syncretism, some scholars handle the patterns of plural marking and the patterns of the nominal inflection in the singular separately, given that overt case marking is the exception in the paradigm (Flämig 1991; Helbig and Buscha 2001; Hentschel and Weydt 2003). In the plural, the only case opposition, characteristic of most patterns of plural marking, is between the dative, which is marked by the suffix -(e)n, and the non-dative. Moreover, German lone nouns used in otherwise Russian sentences with German plural markers are not marked for the dative case in my data set. In other words, German plurals inserted into Russian sentences do not exhibit German case morphology. Four mechanisms of plural marking are employed by German (Flämig 1991: 480): (1) zero marking (Schüler ‘pupil[sg]’ – Schüler ‘pupil[pl]’), (2) the use of umlaut, or vowel alteration (Garten ‘garden[sg]’ – Gärten ‘pl\garden’), (3) suffixation (Arm ‘arm[sg]’ – Arm-e ‘arm-pl’), and (4) a combination of umlaut and a plural suffix (Buch ‘book[sg]’ – Büch-er ‘pl\book-pl’). The patterns of German plural marking with the various inflections are given in Table 2.

Table 2: Patterns of German plural marking (adapted from Flämig 1991: 480)

Pattern number   Example of singular     Suffix   Umlaut   Example of plural
1                Lehrer ‘teacher’        -Ø       –        Lehrer
2                Kloster ‘cloister’      -Ø       +        Klöster
3                Tag ‘day’               -e       –        Tage
4                Kopf ‘head’             -e       +        Köpfe
5                Kind ‘child’            -er      –        Kinder
6                Mann ‘man’              -er      +        Männer
7                Name(n) ‘name’          -n       –        Namen
8                Mensch(en) ‘person’     -(e)n    –        Menschen
9                Auto ‘car’              -s       –        Autos


As the inflectional categories are compatible between the languages, it is of inter­ est to see where code-mixing occurs and why.

5 Data and methodology 5.1 Corpus This study utilizes a corpus of Russian-German bilingual speech recorded amongst Russian-speaking communities across Germany (Freiburg im Breisgau, Hanover, Lahr/Black Forest) (cf. Hakimov, to appear). The speech of 21 speakers is rep­ resented. The speakers migrated from the former Soviet Union to Germany with their parents, ethnic Germans, and are a group officially called russlanddeutsche (Spät-)Aussiedler ‘Russian-German late repatriates’ (Brehmer 2007; Meng 2001). One of the subjects, born in Germany to a Russian-speaking family of German descent, can be classified as a second-generation speaker, whereas the others qualify as the so-called intermediate generation owing to their early age at immigration (cf. Backus 1996: 58). The participants of the study are young adults between the ages of 18 and 35. Though five were exposed to German before immigration, Russian was the first language that they acquired. The age of acqui­ sition of German, as measured by the age of immigration to Germany, varies: four speakers began learning German before the age of seven, fifteen started between eight and twelve, and one speaker began at the age of fifteen. All but one partici­ pant has lived in Germany for at least ten years prior to the data collection. Though the exception participant has stayed in Germany only for three years, she started learning German before her immigration and was living in a half-German family, being richly exposed to German. The other control factor for bilingual capacity is school education in Germany: the subjects had either finished school in Germany or were still attending school in Germany. These criteria ensured that all inform­ ants are fluent bilinguals. The total size of the corpus is approximately 28 hours of recorded speech. One half of the corpus contains casual conversations that occurred between one of the subjects and their peers or family members. The other half of the corpus includes informal group interviews that I conducted with those subjects who were unwilling to record their private conversations. All interviews were carried out in groups consisting of at least two subjects, who were very familiar with each other, being either classmates or friends. The familiarity of the subjects with each other was intended to enhance naturalness of the interaction. The subjects were not informed that the phenomenon of interest for data collection was code-mixing. The relevant information concerning the language biographies of the speakers represented in the corpus was made available after recording.




Pluralized noun insertions were extracted from the corpus; all of them present German nouns inserted into the Russian matrix structure. No instances of Russian pluralized nouns inserted in German sentences were registered.

5.2 Patterns of plural marking on code-mixed German nouns in the sample In the bilingual corpus, a total of 153 instances of German noun insertions in Russian sentences were identified as marked for plural. The German nominalized adjectives Mehrsprachige ‘multilinguals‘, Russlanddeutsche ‘Russian Germans’ and Verwandte ‘relatives’ were not considered nouns on the grounds that they employ the adjectival declensional system and do not have stable plural forms. Russian plural markers are registered with 73 of the insertions. The plural inflections of these forms express all possible morphological cases. In (12), for instance, the German noun Augenarzt ‘ophthalmologist’ is used with the Russian inflection of the genitive plural. (12) malo augenarzt-ov (LA050310) few ophthalmologist-gen.pl ‘There are few ophthalmologists.’ In the corpus, 69 instances of German noun insertion in Russian sentences appear with German plural markers, as in (13). (13) čo na nej za klamotte-n ode-t-y (LG050311) what on 3prep.sg for rag-pl clothe-part-pl ‘What rags is she wearing?’ Here, the colloquial German noun Klamotten ‘rags’ (referring to ‘clothes’) is inserted into the Russian matrix clause in its plural form. Assignment of German noun insertions to one of the two patterns – [a [b N b]-pl a] and [a [b N-pl b] a] – is uncomplicated for most forms. On the other hand, attrib­ uting certain insertions to one of these patterns on the basis of their shape is not always so straightforward. German nouns featuring /r/ in the stem-final position present such a case. This phoneme has two typical phonetic realizations in the corpus: a near-open central vowel [ɐ] or an alveolar vibrant consonant [r]. It is necessary to note that the consonantal /r/ is phonologically real in German. For example, the phoneme /r/ is realized as a vowel in the German suffix {-er} when the phoneme occurs in the word-final position, as in Lehrer [ˈleːʀɐ] ‘teacher.m’ and jüng-er [ˈjʏŋɐ] ‘young-comp’. However, when another suffix is added to {-er}


so that the phoneme occurs in an intervocalic position, /r/ is realized as a con­ sonant2, as in Lehrer-in [ˈleːʀəʀın] ‘teacher-f’ and jüng-er-e [ˈjʏŋəʀə] ‘young-compsg’. This variability has direct consequences for the morphological integration of forms with /r/ in the stem-final position. In order to become integrated into the Russian declensional system, the noun stem usually has to feature a consonant in the final position (see Section 6 for further details). Integration in the Russian morphological system is therefore unambiguous when the consonant realization of /r/ is selected. The following eight examples are evidence for this: auslände[r]ov ‘foreigner-gen.pl’, baue[r]-á ‘peasant-nom.pl’, hamste[r]-y ‘hamster-nom.pl’, hauptsemina[r]-y ‘advanced.seminar-acc.pl’, opfe[r]-y ‘loser-nom.pl’, penne[r]-y ‘tramp-nom.pl’, pflaste[r]-ah ‘plaster-prep.pl’, studiengebüh[r]-y ‘tuition. fee-acc.pl’. At the same time, the data also contain instances of German noun insertions featuring the near-open central vowel [ɐ] stem-finally, these include anfänger ‘beginner’, aschenbecher ‘ashtray’ (two tokens), dinosaurier ‘dinosaur’ (two tokens), finger, inliner ‘rollerblade’, kleiderständer ‘coat-stand’, mitarbeiter ‘employee’, obstbecher ‘fruit cup’, zuschauer ‘spectator’, zigeuner ‘gipsy’. As the singular and plural forms of these nouns coincide in German, the plural is marked only syntactically, i. e. on the noun phrase but not on the nominal stem: ein artiger Schüler ‘a good pupil’ vs. viele artige Schüler ‘many good pupils’ (see pattern 1 in Table 2). Russian also allows for this strategy, as there is a small group of Russian stems with vowels in the final position whose plural and case forms are marked by zero. Therefore, the plural on German noun insertions whose stems exhibit the near-open central vowel stem-finally is marked syntactically in Russian sentences as well, for example: (14) ty ne naš-l-a tak-ie kleiderständ[ɐ] (LB071401) 2sg neg find-pst-sg.f such-acc.pl coat.stand[pl] ‘Have you found such coat stands?’ Here, the plural is marked on the adjective takie, and the noun kleiderständer is analyzed as a Russian indeclinable. In the sample, there is one instance of syn­ tactic marking of the plural on a stem featuring a vowel in the final position: the German noun LKW [ɛlkaˈve:] ‘lorry’ (15) is not marked for the plural overtly although its singular and plural forms differ (LKW vs. LKWs). In this case, it is assumed that the word must be integrated into the Russian zero declension, as otherwise the sentence is ungrammatical.

2 In German, at least four consonantal realizations of the phoneme /r/ are distinguished: [r], [ɾ], [ʀ], and [ʁ]. These variants occur in free variation, which can be attributed to different regional varieties of German (Kohler 1995: 165–166; Ramers and Vater 1991: 37–38, 110–112).




(15) nu kogda èt-i [ɛlkaˈve] grëban-ye proezža-jut ptcl when this-nom.pl lorry[pl] jiggered-nom.pl pass.by.prs-3pl ‘But when those jiggered lorries pass by.’ (LR07141) This example supports the interpretation that German nouns with a vowel in the stem-final position should be treated as zero-marked plurals in Russian, when used in plural contexts. But why should any of the stems ending in /r/ become marked for plural overtly in the first place if the more economic strategy would be to handle them as zero-marked? Note that there are eight tokens featuring the coronal vibrant that receive overt Russian plural markers and ten instances of stems with the vocalic /r/ at the end. This latter tendency may be explained by the idea that plurality is expressed because it satisfies speakers’ intentions (Myers-Scotton 2002: 150). In order to be more explicit, speakers may prefer an overt marker because the default case in Russian is to mark plural overtly (apart from the small class of zero-marked nouns, cf. Zaliznjak [1967] 2002: 218). Furthermore, consideration must be given to inter-speaker variation in the pronunciation of /r/, i. e. whether or not they tend to vocalize the coronal vibrant in all possible German contexts. Unfortunately, these issues cannot be studied in depth in the current paper due to the scarceness of the data. Nonetheless, in the following analysis, the nominal stems with the final vocalized /r/ will be considered as possible candidates for overt plural marking. As discussed above and shown in Section 6.1, they can either retain their German plural marked by zero or take a Russian overt plural marker. The data examined here contain no instances of double plural marking. The distribution of the patterns of plural marking on code-mixed German nouns in the sample is given in Table 3. The following section will discuss the factors determining this variation in the data. The subsequent analysis will be concerned with the two prototypical patterns: the use of either Russian or German plural inflections with the nouns in focus.

Table 3: Distribution of patterns of plural marking with code-mixed German nouns in the sample

Plural marking                              Tokens      %
morphological (overt)
  Russian: [r [g N g]-pl r]                     73    47.7
  German: [r [g N-pl g] r]                      69    45.1
syntactic (covert): [r A-pl [g N g] r]          11     7.2
Total                                          153
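The percentage column in Table 3 follows directly from the token counts. A minimal sketch of that arithmetic, using only the counts reported in the table (nothing beyond them is assumed), is given below.

```python
# Shares of the plural-marking patterns in the sample (counts from Table 3).
counts = {
    "Russian plural inflection [r [g N g]-pl r]": 73,
    "German plural marker [r [g N-pl g] r]": 69,
    "syntactic (covert) marking [r A-pl [g N g] r]": 11,
}

total = sum(counts.values())  # 153 pluralized noun insertions

for pattern, n in counts.items():
    print(f"{pattern}: {n}/{total} = {100 * n / total:.1f}%")
# -> 47.7 %, 45.1 % and 7.2 %, as reported in Table 3.
```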


6 Determinants of overt plural marking on German code-mixed nouns The analysis of the variation of plural marking on German noun insertions in Russian sentences proceeds from three important observations. First, Russian morphophonology restricts the possibilities of using a German stem with Russian declensional inflections. Second, the frequency distribution of the singular and plural forms of an item in German can predict which of the two patterns will be selected in bilingual production. Third, the mismatch in the nominal systems of German and Russian is considered a further factor determining the language of plural-markers on German noun insertions.

6.1 Morphophonological restrictions on overt Russian plural markers with German nominal stems Both the phonetic shape of a lexical item to be inserted and the morpho­ phonological restrictions of the matrix language determine whether the item undergoes morphological integration into the matrix language or features the embedded-language morphology. In the following I will show that the Russian declensional system restricts the use of Russian plural inflections with certain German nouns, depending on their phonetic shape. German lexical items feature either a vowel or a consonant at the end of their base form, that is the nominative singular. A major portion of German nouns receiving Russian plural inflections has stems with consonants in the final position, such as Geschenk-i ‘present-nom.pl’, Netz-y ‘net-acc.pl’, and Beispiel-ej ‘example-gen.pl’. The data set contains 57 tokens of this type. Noun stems with vowels in the final position are much less frequent: there are only 16 instances of such forms in the sample. These are classified into two groups depending on whether they have a stressed or unstressed vowel in the stem-final position. Of the 16 instances, 14 stems exhibit an unaccented vowel in the final position. The most frequent lexical item of this type, Sprache ‘language’, receives Russian plural inflections four times. In contrast, there are only two instances of a German noun with a stem-final accented vowel. In each of these instances, the speakers use the lexeme LKW ‘lorry’ in the plural, though its realizations differ. Let us consider the first group, which has unstressed vowels in the stem-final position. In order for the stem to receive the corresponding Russian inflection (as was shown above, the initial phonemes of Russian nominal inflections are




vowels), the unstressed final vowels are deleted: -CV + -V(C) > -CV(C). The forms that result from this process are listed below:

(16) Flasche + -i > (tri) Flasch-i ‘(three) bottle-gen.sg’ (LR0125)
     Grippe + -y > Gripp-y ‘flu-nom.pl’ (LAR022411)
     Konto + -y > Kont-y ‘account-nom.pl’ (LA05036)
     Kunde + -ov > Kund-ov ‘customer-gen.pl’ (LA022415)
     Kunde + -am > Kund-am ‘customer-dat.pl’ (LA05038)
     Sache + -i > Sach-i ‘thing-nom.pl’ (LB110526)
     Sprache + -i > Sprach-i ‘language.(course)-acc.pl’ (LB110526, LV022410)
     Sprache + -ah > Sprach-ah ‘language.(course)-prep.pl’ (LV022410)
     Sprache + -ami > Sprach-ami ‘language.(course)-instr.pl’ (LV022410)
     Türke + -i > Türk-i ‘Turk-nom.pl’ (LR0316)
     Zwetschge + -i > Zwetsch[k]-i ‘plum-nom.pl’ (LR0714)
     Zwetschge + -Ø > Zwetschek ‘gen.pl\plum[gen.pl]’ (LR0714)

The vowel sound at the end of the stems in (16) is almost always a schwa; in Konto, however, it is a peripheral vowel that is subject to deletion. We can assume from this that stem-final vowels are deleted regardless of their quality. With regard to the form Zwetschek, it is similarity in the phonetic shape that enables the speaker to produce it: this form results from the Russian [ɪ] ~ zero alternation ([tʃk] ~ [tʃɪk]) as a genitive plural marker, a process employed by some Russian feminine nouns whose stems end in [tʃk], such as pečk-i ‘stove-nom.pl.f’ ~ peček ‘gen.pl.f\stove[gen.pl.f]’ or točk-a ‘point-nom.sg.f’ ~ toček ‘gen.pl.f\point[gen.pl.f]’. The plural form Sprachi of the German word Sprache ‘language’ is used to refer to ‘language courses’; this lexical item might thus be regarded as an established loan, whose meaning diverges from the original semantics. One exception to stem-final deletion is found in the sample: together with the Russian plural inflection -i, the lexical item Baby receives the consonant [k] so that the form babyki arises. Two possible explanations can be proposed for this case. First, if the consonant is not inserted, the inflection will not be acoustically perceivable (cf. *baby-i). Second, the combination -ik- corresponds to the Russian diminutive suffix occurring with masculine nouns (e. g. dom ‘house’ – dom-ik ‘little house’); in this case, the semantics of babyki remains congruent with the meaning of the German Babys. Note that another speaker uses a further very productive diminutive suffix -ičk- (Švedova [1980] 2005: 209–210) with the same lexical item, which allows her to effectively integrate the item into the Russian declensional system: babyčka (HO1007). Here, as well as in the aforementioned case, the final vowel -i undergoes a reanalysis and becomes part of the Russian suffix, i. e. Baby + -čk-a > bab-ičk-a ‘baby-dim-nom.sg’.


Regarding inserted German items with accented vowels in the stem-final position, instead of deleting the stressed vowel, speakers insert a suffix element, as shown in (17). The noun LKW ‘truck’ is the only noun affected: (17) [ɛlkaˈveː] + -am (dat.pl) > [ɛlkaˈveʃkəm] (truck-dim-dat.pl) (LJ07141) [ɛlkaˈveː] + -Ø (gen.pl)    > [ɛlkaˈveʃɪk] (gen.pl\truck-dim[gen.pl]) (LR07141) Again, reanalysis is involved: the stressed vowel [e] is treated as part of the suffix -ešk-, an allomorph of the diminutive suffix -k- (Švedova [1980] 2005: 2010). Interestingly, a similar strategy is observed in vernacular Russian: loan words, which are treated as indeclinable nouns within the zero declensional class in standard Russian (like the use of the word LKW in 15), receive (diminu­ tive) suffixes in order to undergo pluralization and declination. For instance, indeclinable nouns of standard Russian such as kafé ‘café’, pjuré ‘purée’ and sidí ‘CD’ are inflected as kaféška, pjuréška and sidíška in the Russian vernacular. In sum, stems with final consonants are easily integrated into the Russian declensional system with the standard addition of an inflection. The integration of stems with unaccented final vowels is achieved through the deletion of these vowels or the insertion of a suffix element such that the stems feature a con­ sonant in the final position. The stems with accented final vowels are problematic because they can only be integrated into the declensional system if consonantal suffixal elements are added (see Table 4). Although this process is attested only with one noun in the sample, it appears that integration into the declensional system in general and pluralization by inflections in particular are only possible if the stem ends in a consonant, that is direct use of overt Russian plural inflections with this type of stems is restricted and depends on the phonemic shape of the lexical item. This assumption accounts for why nouns such as Schlittschuh [ˈʃlɪtˌʃuː] ‘skate’ (LJ1221), Presswehe [ˈpʀɛsˌveːə] ‘pushing contraction’ (LV022408), and CD [ˌtseːˈdeː] (LD0405) are used in the corpus exclusively with German plural markers (cf. *Schlittschuh-i3, *Press-weh(e)-i, *CD-i).

3 Interestingly, the form šui [ˈʃuɪ] is not impossible in monolingual Russian: it corresponds to the genitive singular form of the toponym Šuja (a city in Central Russia). Nevertheless, the un­ derlying form of this noun is [ˈʃuj-], not [ˈʃu-] (which is evidenced by the derived anthroponym Šujskij). Likewise, the underlying form of the word idé-i ‘ideas’ is /idej-/ (see Itkin 2007: 246 for this stem alteration).




Table 4: Morphophonological processes allowing the pluralization of German noun stems ending on vowels

Stem-final vowel   Morphophonological process        Tokens   Types
unaccented         deletion                              12       8
unaccented         insertion of a suffix element          1       1
accented           insertion of a suffix element          2       1

As such, analysis of morphophonological regularities of a matrix language can be fruitful for explaining the variation of plural marking with code-mixed nouns. However, the restriction formulated above is limited only to the three mentioned instances (Schlittschuhe ‘skates’, Presswehen ‘pushing contractions’, and CDs) because only few German stems end in an accented vowel. Furthermore, it is pos­ sible that the use of these German plural forms in Russian sentences could be explained by their frequencies in the embedded language. In German, the word Presswehe ‘pushing contraction’ is more common in the plural than in the singular (Duden online 2013). Since the influence of the frequency of plural forms in the embedded language seems to be more pervasive than the morphophonological restriction outlined above, we must establish the role frequency plays in choosing between German and Russian plural markers for German noun insertions.
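The regularities summarized in this section and in Table 4 amount to a small decision procedure for attaching an overt Russian plural inflection to a German stem. The following sketch is purely illustrative: the orthographic vowel set, the function name and the suffix element chosen for the accented case are simplifications introduced here and are not part of the chapter’s own coding.

```python
# Illustrative sketch of the integration regularities in Section 6.1 / Table 4:
# consonant-final stems take the Russian inflection directly; an unaccented
# stem-final vowel is deleted first; an accented stem-final vowel requires an
# additional suffix element (here a rough -šk-) before the inflection.
VOWELS = set("aeiouyäöü")  # simplified, orthography-based (an assumption of this sketch)

def attach_russian_inflection(stem: str, final_vowel_accented: bool, inflection: str) -> str:
    if stem[-1].lower() not in VOWELS:
        return stem + inflection          # e.g. Geschenk + -i > Geschenki
    if not final_vowel_accented:
        return stem[:-1] + inflection     # e.g. Sprache + -i > Sprachi
    return stem + "šk" + inflection       # roughly LKW [ɛlkaˈveː] + -am > [ɛlkaˈveʃkəm]

print(attach_russian_inflection("Geschenk", False, "i"))   # Geschenki
print(attach_russian_inflection("Sprache", False, "i"))    # Sprachi
```

Stems such as Schlittschuh, Presswehe or CD, whose accented final vowels resist this kind of integration, are precisely the items that appear in the corpus only with German plural markers.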

6.2 Factors determining the language for plural marking: coding and modeling As illustrated above, when analyzing the plural marking on German noun insertions in Russian sentences, it is not always possible to attribute the choice between German and Russian overt markers to a single factor. Rather, the use of one of the patterns can be seen as an outcome of several factors interacting online in bilingual production. These factors include: (1) a mismatch in the frequencies of the plural and singular forms, (2) the stem final segment of the base form (nominative singular), and (3) the morphological case required by the slot the noun is inserted into. Each of these could be analyzed to account for some part of the data, yet it is impossible to say how relevant a factor is in terms of the overall variation observed because their effects differ in strength. It is thus necessary to perform statistical modeling in order to first disentangle the impact of each in determining overt plural marking and then to predict the outcome of the competition between them. To do this, I utilize the generalized linear mixed model.
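To make the modeling setup concrete before the individual factors are discussed, here is a minimal sketch of how a data set with the three predictors and the binary response could be assembled and analysed. Everything in it is hypothetical: the column names, the synthetic data and the coefficients are invented for illustration, and the plain logistic regression used here is only a fixed-effects approximation of the generalized linear mixed model that the chapter actually fits.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 150  # roughly the size of the analysed sample

# Synthetic stand-in for the coded data set (one row per pluralized insertion).
data = pd.DataFrame({
    # log(Fpl / Fsg) in German, cf. the odds ratio introduced in (18) below
    "log_odds_plural": rng.normal(0.0, 1.5, n),
    # final segment of the base form (nominative singular)
    "stem_final": rng.choice(["consonant", "vowel"], n),
    # morphological case required by the Russian slot
    "case_slot": rng.choice(["core", "non_core"], n),
})

# Simulated response: 1 = German plural marker retained, 0 = Russian inflection.
linpred = (0.8 * data["log_odds_plural"]
           + 1.0 * (data["stem_final"] == "vowel")
           - 0.7 * (data["case_slot"] == "non_core"))
data["german_plural"] = rng.binomial(1, 1 / (1 + np.exp(-linpred)))

# Fixed-effects logistic regression; the chapter itself uses a generalized
# linear mixed model (e.g. with a random effect for speaker or lexeme).
model = smf.logit(
    "german_plural ~ log_odds_plural + C(stem_final) + C(case_slot)",
    data=data,
).fit()
print(model.summary())
```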


6.2.1 Frequencies of singular and plural forms In research to date, a mismatch in the frequencies of the singular and plural forms of a noun has not been investigated thoroughly as a factor determining the use of the embedded-language plural marking with code-mixed nouns as presented above. Backus mentions it in his monograph (1996), but does not explore its potential as a determinant of plural marking on inserted nouns empirically. Support for this idea comes from a study by Baayen, Dijkstra, and Schreuder (1997) in which lexical decision tasks were used to demonstrate the storage of Dutch high-frequency noun plurals. Following this hypothesis, German nouns inserted in otherwise Russian sentences should retain the embedded-language, i. e. German, plural marking if they are more frequently used as plurals than singulars in German. In contrast, if a German noun is not more commonly used in the plural, it will more likely be marked with a Russian plural marker as a result of the pressure the matrix language exerts to produce Russian constituents. To test these assumptions, the frequencies of singulars and plurals have to be measured. The bilingual corpus utilized here is of too modest a size to adequately examine the plural-singular distributions of the nouns under investigation. Therefore, the relevant frequencies have been obtained from deWaC, which is a 1.6 billion word corpus of German (Baroni and Kilgariff 2006). Given that this corpus is primarily based on written language, the measured frequencies can only be considered as rough approximations of spoken language. Unfortunately, no corpus of spoken German is available which matches deWaC in size. However, considering the age and education of the participants in this study, we can assume that they have received large portions of their German input from written sources as well. The first step in this analysis was to use deWaC to determine the frequencies of the singular and plural forms of the inserted German nouns extracted from the bilingual corpus. Nouns that exhibit divergent morphophonemic shapes in their singular and plural forms present the clearest and prototypical case, either adding an affix or employing umlaut to mark the plural. For example, the singular form of the noun Situation occurs 228,379 times in the corpus, whilst its plural form marked by inflection -en occurs 48,579 times. Differentiating between the singular and plural of nouns with coinciding singular and plural forms is impossible because the corpus is not tagged for morphological number. There are two groups of nouns for which no automated count of singular vs. plural forms could be achieved: one is distinguished by the inflection -n or -en, which appears both in the singular and the plural in all cases except for the nominative singular (patterns 7 and 8 in Table 2); the other group includes masculine nouns with stems ending in -r, -n and -l (Pattern 1 in Table 2), which are devoid of overt plural marking except in the dative plural. All sentences which include these ambiguous nouns were extracted from the deWaC corpus and further parsed automatically




by the mate-tools pipeline (Björkelund et al. 2010) to obtain their morphological number. Frequencies of the items Ballerinas ‘ballerina-shoes’ and Obstbecher ‘fruit cups’ also could not be measured in deWaC because the former item also refers to ‘ballet-dancers’, and the frequencies of the two meanings could not be differentiated in an automated corpus analysis, and the latter word was not attested in the corpus at all. In total, 151 items were used for the analysis. The competition between the plural and singular forms of the nouns examined is modeled by employing the odds ratio (see Fahrmeir et al. 2007: 119). Odds is the ratio of the likelihood that an event will happen to the likelihood that an event will not happen. For plural-singular competition, this ratio is formulated in the following way: (18)

    

odds = Fpl / Fsg

where Fpl is the frequency of the plural form of a noun and Fsg is that of the singular form. The ratio expresses the relation between the strengths of representation of the plural and singular forms of a noun. In other words, plurals are character­ ized as more strongly or weakly represented mentally than their corresponding singulars. When the odds ratio equals 1, the representations of both forms are regarded to be equally strong. If the odds ratio is larger than 1, the plural form is more strongly entrenched than the singular form, and is thus more likely to become activated as a unit in production. Congruently, the value of the odds below 1 indicates a stronger entrenchment of the singular form. If we assume that the odds ratio models the competition between the rep­ resentations of plurals and singulars, we can hypothesize that with odds ratios larger than 1 the plural form should become activated as a whole and produced as a unit. This allows for the following prediction concerning bilingual production: When the embedded-language and the matrix language plural marking compete, the embedded-language plurals should be produced if they have odds ratios greater than 1. As such, the variation in the patterns of code-mixing could be explained by taking into account the frequency of the inflected form. Table 5 presents examples of the singular and plural forms of lexical items under investigation with their respective frequencies and odds values. The items were selected according to the pattern of plural marking they exhibit and their corresponding odds values, which are either higher or lower than 1. The first two items present nouns which form their plurals by suffixation, the following two employ both suffixes and umlaut to mark plural, and the last four are homophones. Taking the odds ratios reported here, we can assume that the plural forms Studien­gebühren ‘tuition fees’ and Bundesländer ‘federal states’


Table 5: Plurals and singulars of some of the lexical items studied in the deWaC corpus with respective frequencies of occurrence and values of odds. Overt plural markers are in bold

Singular        Translation        Fsg        Plural             Fpl        Odds
Situation       'situation'        228,379    Situationen        48,579     .213
Studiengebühr   'tuition fee'      895        Studiengebühren    23,474     26.228
Parkplatz       'car park'         13,839     Parkplätze         8,230      .595
Bundesland      'federal state'    16,857     Bundesländer       69,728     4.136
Türke(n)        'Turk'             14,432     Türken             3,811      .264
Kunde(n)        'customer'         77,580     Kunden             132,604    1.709
Pflaster        'plaster'          3,491      Pflaster           1,533      .439
Ausländer       'foreigner'        12,128     Ausländer          42,238     3.483

are represented more strongly than their singular counterparts. The words Situation and Parkplatz ‘car park’ demonstrate a reverse distribution: they are more common and thus more strongly entrenched in the singular form. As mentioned above, the nouns Türke ‘Turk’ and Kunde ‘customer’ take the inflection -n not only to mark the plural but also the non-core cases in the singular. Hence, the forms Türken ‘Turk(s)’ and Kunden ‘customer(s)’ are homophonous. Consequently, we can hypothesize that the corresponding exemplar clusters are stronger than those linked to the forms of the nominative singular Türke and Kunde. However, as there are only nine tokens of this type in the bilingual corpus, this issue cannot be addressed here. Future work focussing on the cognitive representation of homophones in their relation to morphosyntax is clearly needed. In order to maintain consistency, counts of these nouns in deWaC were carried out according to the general procedure: case distinctions are ignored, and the instances of plural forms are counted separately from singular forms. The zero-marked form of the nominative singular and that inflected by -(e)n for the genitive, accusative and dative singular are taken together. The same method was employed with the second group of nouns exhibiting homophonous forms, i. e. masculine nouns featuring the stem-final /r/. These forms are exemplified by the two items at the bottom of Table 5: Pflaster ‘plaster’ and Ausländer ‘foreigner’. After the plural-singular ratios were calculated for the German lexical items marked for the plural, the logarithm of their values was taken in order to avoid skewing in the distribution (Baayen 2008: 31). These values are given in Figure 1. Figure 1a depicts the ordered values of the odds ratio, whilst Figure 1b shows its quartiles. As can be seen in both, the data points are distributed more sparsely around the extreme values than around the median. The values on both ends of the scale present outliers: these include the items Grundkenntnis ‘basic knowledge’ (log(odds) = 4.66) and Grippe ‘flu’ (log(odds) = –4.86), the former being extremely





rare as a singular, and the latter as a plural. In order to enhance normality, the outliers were removed from the dataset. The number of the discarded data points amounts to 1.3 % of the sample. The distribution of values of the plural-singular ratio without outliers is represented in Figure 1c, and its quartiles, in Figure 1d. A comparison of Figure 1a with Figure 1c indicates that the distribution in Figure 1c is more centered around zero, and therefore closer to normality.
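The computation behind these values is straightforward. The following Python sketch illustrates it with two of the deWaC counts from Table 5; the small frequency dictionary and the outlier cutoff are illustrative assumptions of the sketch, not part of the original analysis.

    import math

    # Hypothetical deWaC frequency counts (singular, plural) for two items from Table 5;
    # further items would be added in the same way.
    frequencies = {
        "Situation": (228_379, 48_579),
        "Studiengebühr": (895, 23_474),
    }

    log_odds = {}
    for noun, (f_sg, f_pl) in frequencies.items():
        odds = f_pl / f_sg                 # plural-singular odds ratio, formula (18)
        log_odds[noun] = math.log(odds)    # log transform to reduce skewing
        print(f"{noun}: odds = {odds:.3f}, log(odds) = {log_odds[noun]:.2f}")

    # Illustrative outlier screening, in the spirit of discarding Grundkenntnis (4.66)
    # and Grippe (-4.86); the cutoff value itself is a hypothetical choice.
    OUTLIER_CUTOFF = 4.5
    kept = {noun: value for noun, value in log_odds.items() if abs(value) < OUTLIER_CUTOFF}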


Figure 1: The ordered values of the plural-singular odds ratio on the logarithmic scale (left) and its quartiles (right); before removing the outliers (above) and after removing outliers (below)

Figure 2 reports the relationship between the plural-singular ratio on the logarithmic scale and the language of the overt plural marker. The binary variable overt plural marker is on the vertical axis and has the values of one and



Figure 2: The relationship between the plural-singular ratio (on the logarithmic scale) and the language of the overt plural marker. Russian is the matrix language (ML), German is the embedded language (EL)

zero, which represent the overt plural marker language, Russian, as the matrix language, and German, as the embedded language; the values of the odds ratio are on the horizontal axis. The line depicting the relationship between the two variables is a LOWESS (locally weighted scatterplot smoothing) curve, which is a local regression model (Cleveland and Devlin 1988) defining the deterministic part of the variation in the sample. As indicated in Figure 2, the curve is virtually symmetrical around the logarithmic value of –0.3, which corresponds to the odds value of 0.740. In the plot we can also see that the line has two points of inflection, separating the central interval of the curve between the logarithmic values of –1.3 and 0.6. The central interval, also symmetrical around –0.3, stands for a gradual, transitional area. Interestingly, a comparison of the inflection points 0.277 and 1.822 at opposite ends of the spectrum also shows a symmetrical distribution of the lexical items in the data set: there are as many nouns more common in the singular than in the plural as there are nouns more frequent in the plural. As is




evident from the shape of the curve, inserted items with the embedded-language plural markers present nouns that commonly occur in the plural, whereas nouns featuring the matrix language overt plural markers correspond to nouns typically used in the singular. The symmetry of the curve indicates that the amount of the former in the sample is approximately equal to that of the latter. Generally speaking, the higher the plural-singular ratio of an embedded lexical item, the higher the tendency for it to be selected in production as a unit.
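The LOWESS fit in Figure 2 can be reproduced in outline with standard tools. The sketch below uses the lowess smoother from the Python statsmodels package as a stand-in for the local regression of Cleveland and Devlin (1988); the simulated data points merely mimic the shape of the observed distribution and are not the actual 151 observations.

    import numpy as np
    from statsmodels.nonparametric.smoothers_lowess import lowess

    rng = np.random.default_rng(1)
    log_ratio = rng.uniform(-3, 3, 151)                        # log plural-singular ratios (simulated)
    p_russian = 1 / (1 + np.exp(3 * (log_ratio + 0.3)))        # more Russian marking at low ratios
    russian_marker = (rng.uniform(size=151) < p_russian).astype(int)   # 1 = Russian (ML), 0 = German (EL)

    smoothed = lowess(russian_marker, log_ratio, frac=2/3)     # columns: sorted x, locally fitted proportion
    for x, y in smoothed[::30]:
        print(f"log ratio {x:+.2f} -> estimated P(Russian marker) = {y:.2f}")

The smoothed proportion plays the role of the curve in Figure 2: it falls from values near one at low plural-singular ratios to values near zero at high ratios.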

6.2.2 Stem-final segments of inserted lexical items as a predictor of the language of plural marking

The phrase 'stem-final segment of an inserted lexical item' refers to the stem-final segment observed in the nominative singular, or the base form. For example, the stem-final segment of the word Türke 'Turk' is a vowel. As shown in 6.1, there are a number of compromise strategies when adding Russian plural inflections to German stems featuring vowels stem-finally. One strategy is to delete the vowel, if it is unaccented, regardless of its quality, as in (16); another option is to insert a consonant or a consonant cluster, which as a rule is related to highly productive Russian (diminutive) suffixes. Both of these strategies result in a stem-final consonant. Occasionally, the tendency to avoid vowels in the stem-final position of the German noun insertions triggers the use of German plural markers with stems featuring vowels in that position. As discussed above, this is observed with stem-final accented vowels. In contrast to the morphophonological constraint formulated in Section 6.1, the assumption here is that even German nominal stems featuring unaccented vowels in the final position tend to retain German plural markers. In other words, if the stem-final segment of the inserted German lexical item is a vowel, the chance that this item will be used with its German plural marker is high; vice versa, if the stem-final segment of the inserted German lexical item is a consonant, the preference for pluralizing this item with a Russian plural inflection will be strong.

The data were coded for the presence or absence of a vowel in the stem-final position of German nominal insertions. In the case of the stem-final /r/, its consonantal and vocalized realizations were considered separately and were coded accordingly. Figure 3 displays the relationship between the sound at the stem end and the language of the plural marker. According to Figure 3, the proportion of embedded-language plural markers is skewed depending on whether stems feature final vowels. This is in line with the hypothesis that noun insertions with vowels in the stem-final position favor German plural markers, while Russian plural markers are more common with stems that end in consonants. Whether the



Figure 3: The choice of the language for overt plural marking on German nominal insertions and the final sound of the stem. Russian is the matrix language (ML), German is the embedded language (EL)

segment in the stem-final position of a German insertion has explanatory power regarding the variation in the data will be discussed in Section 7.

6.2.3 The morphological case of the slot

The mismatch in the nominal systems of German and Russian could be considered an additional motivation to resort to the richer inflectional system of Russian when inserting German lexical items into Russian sentences. The two languages are nonequivalent in two respects: Firstly, the number of morphological cases in the Russian nominal system exceeds that in the German system. Secondly, in German, case is often not marked morphologically on the stem, but rather syntactically on the determiner (see Section 4.2). In these data, insertion of fully-fledged noun phrases is not observed, and the inserted German plurals




do not add the only German case suffix used in the plural, -(e)n; these insertions therefore remain unmarked for morphological case. In contrast, Russian plural inflections do not show such a high degree of syncretism. In order to mark the case explicitly, as is required by the nominal system of Russian, speakers can use Russian plural markers, especially if the case projected on the slot is a non-core case, i. e. neither the nominative nor the accusative.

The nominative case and the accusative case have a special status in the Russian declensional system. In the data set, the nominative plural is generally expressed by the inflection -i and its variant -y, with the exception of the inflection of the masculine plural -á, occurring in the form bauer-á 'peasants' in analogy with such Russian nouns as professor-á and traktor-á. The same inflections are used for the accusative plural of inanimate nouns. Note that no animate nouns are attested in the data set in the form of the accusative plural (which require inflections identical with the genitive plural). In other words, apart from one instance of the inflection -a, the nominative plural and the accusative plural appear both to be marked solely by the inflection -i and its allomorph -y. As such, the inflection -i (-y) is regarded as a prototypical plural marker of Russian nouns; the inflection -y also marks plural on predicative 'short' adjectives, as in umn-y 'clever-pl', and the inflection -i marks plural on verbs in the past tense, as in by-l-i 'be-past-pl', thus reinforcing their status as prototypical plural markers. Additional evidence for this assumption comes from language acquisition. As the most frequent plural inflection in the declensional system of Russian nouns, -i (-y) is the first plural marker to be acquired by Russian-learning children, and is the one most often generalized in the process of acquisition (Gagarina and Voeikova 2009).

As previously outlined, a similar situation can be observed in German: plural inflections of German nouns exhibit a high degree of syncretism. It thus appears that the similarity in the status of German plural markers and the Russian plural inflections referring to the core cases facilitates the insertion of German plural forms. Accordingly, we can hypothesize that if the slot requires the nominative case or the accusative case, German plurals can easily be inserted. The opposite is also assumed to hold, i. e. German noun insertions will take Russian plural inflections if the slot in which the noun is inserted has a non-core case.

These hypotheses determined the coding of the data with respect to case. The examined German noun insertions were analyzed for the morphological case that the slot they were inserted in required. Following the argumentation above, the items were then grouped according to the case of the slot: one group included the items in nominative and accusative slots, and the other was made up of the items in the non-core cases (the genitive, the dative, the instrumental and the prepositional). The predictor 'case of the slot' was thus coded as a binary variable.



Figure 4: The relationship between the choice of the language for overt plural marking on German (EL) nominal insertions and the case projected by the slot. Russian is the matrix language (ML)

The relationship between the case projected on the inserted German nouns and the language of the plural marker is displayed in Figure 4. The case of the slot is on the vertical axis; the language of the plural marker is on the horizontal axis. The data in Figure 4 reveal that overall, the nominative and the accusative are more frequent than all the other cases. Additionally, the proportions of the embedded-language and matrix language plural markers are asymmetrical in terms of the case required by the slot. The large proportion of German plurals in the nominative and accusative slots indicates that these slots, as expected, seem to accommodate German plurals easily. The explanation for this accommodation is as suggested above: the status of the Russian plural inflection -i and its variant -y is similar to that of German plural inflections. Finally, as anticipated, inflections of the matrix language are favored when a morphological case other than the




nominative or accusative is required. Given these results, the case projected by the slot will be included in the statistical model below as one of the factors determining the language of plural markers.
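Before turning to the model, it may help to see how the three predictors come together for an individual token. The function below is a hypothetical illustration of the coding scheme, not the original annotation script; in particular, it treats any orthographic final vowel as 'vowel', whereas the actual coding distinguished consonantal and vocalized realizations of stem-final /r/ by hand.

    import math

    VOWELS = set("aeiouäöüy")                 # simplified orthographic check (assumption)
    CORE_CASES = {"nominative", "accusative"}

    def code_token(stem: str, slot_case: str, f_sg: int, f_pl: int) -> dict:
        """Code one inserted German noun for the three predictors used in the model."""
        return {
            "pl_sg_ratio": math.log(f_pl / f_sg),           # continuous, log scale
            "stem_final_vowel": stem[-1].lower() in VOWELS,  # binary
            "core_case_slot": slot_case in CORE_CASES,       # binary: nominative or accusative slot
        }

    print(code_token("Türke", "nominative", 14_432, 3_811))   # vowel-final stem, core case, low ratio
    print(code_token("Parkplatz", "dative", 13_839, 8_230))   # consonant-final stem, non-core case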

7 Predicting the language of overt plural markers of German noun insertions in Russian sentences by statistical modeling

In this section, the factors presented thus far will be investigated regarding the extent to which they compete with or assist one another in determining the plural marking on German nouns inserted into a Russian matrix structure. The issue of the explanatory power of each of the competing factors is also addressed. These matters are studied through the application of the generalized linear mixed model (see Baayen 2008: 278–284; cf. Bresnan et al. 2007; Tagliamonte and Baayen 2012). Probabilities of binary outcomes (the choice between the languages providing plural markers: Russian as the matrix language and German as the embedded language) are determined on the basis of the predictor variables, i. e. the coding and modeling of factors as described in Section 6.2. Significant associations between the predictor variables and the dependent variable 'the language of the plural marker of code-mixed German nouns' provide tangible evidence for the relevance of the factors.

7.1 Model fitting

In a regression model with mixed effects, the joint contribution of all factors is computed by testing each factor individually, while the others remain constant (cf. Szmrecsanyi 2013). In order to obtain a minimal adequate regression model, the common procedure was employed (Baayen 2008; Szmrecsanyi 2013). I began by fitting the maximal model with the three factors outlined above as main effects: the plural-singular odds ratio based on the frequencies of occurrence in the embedded language of singular and plural forms of the lexical items under analysis; the stem-final segment of these items, i. e. a vowel or a consonant; and the morphological case of the slot in which the item is inserted. Additionally, the maximal model included the interactions between these factors. The speakers' individual differences in marking plural on German nominal insertions, i. e. the tendency to either retain the German plural marker or to add Russian plural inflections, were measured using the variable 'speaker'. The variable 'speaker' was handled as a by-subject random effect. Unfortunately, adding 'item' as a random effect was


impossible due to the high variation in this variable: the 151 pluralized German noun insertions occur with 110 different lexical items. The model was thus run without the by-item random variable. The model simplification consisted in the exclusion of factors and interactions without any significant contribution to the explanatory power of the model. Following Baayen (2008: 281), the estimation of explanatory power of the interaction terms and main effects is based on the calculation of the C index of concordance. As a consequence, the procedure of model reduction excluded all interaction terms from the model (plural-singular odds ratio × stem-final segment, plural-singular odds ratio × case of the slot, stem-final segment × case of the slot). The final, minimal adequate model is given in Table 6.

Table 6: Predicting the language of the overt plural marker: minimal adequate generalized linear mixed model. Predicted odds ratios are for Russian (or matrix language) overt plural markers

                                         odds ratio    b         p-value
model intercept                          2.147         .764      .096 .
Plural-singular ratio                    .443          –.815     .000 ***
Stem-final segment ('vowel')             .339          –1.082    .014 *
Slot case (accusative or nominative)     .383          –.961     .031 *

Random effect: Speaker (intercept, N = 12, variance: 0.439, σ = 0.662)

Summary statistics:
N                                        151
% correct predictions (% baseline)       79 (81)
C index of concordance                   .838
Somers' Dxy                              .676

Significance levels: p < 0.001 (***), p < 0.05 (*), 0.05 < p < 0.1 (.).
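For readers who wish to reproduce this kind of analysis, a minimal sketch of the fixed-effects part of such a model is given below, using Python's statsmodels. The by-speaker random intercept reported in Table 6 would additionally require a mixed-effects routine (e.g. glmer in R's lme4), which is omitted here, and the file and column names are hypothetical.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # One row per pluralized insertion; 'russian_marker' is 1 for a Russian (ML) plural
    # inflection and 0 for a retained German (EL) plural marker (hypothetical file).
    df = pd.read_csv("insertions.csv")

    fit = smf.logit(
        "russian_marker ~ pl_sg_ratio + stem_final_vowel + core_case_slot",
        data=df,
    ).fit()

    print(fit.summary())
    print(np.exp(fit.params).round(3))   # exponentiated coefficients, comparable to the odds ratios in Table 6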

7.2 Model evaluation and model discussion

After the model is fit to the data, the fit can be evaluated. The minimal adequate model reported in Table 6 is of high quality. The model correctly classifies 79 % of the data overall. Regarding the categorical prediction of the plural marker language, i. e. when always guessing one variant, the model correctly predicts 77 % of German plural markers and 81 % of Russian plural inflections. Furthermore, the fit is estimated by the measure of predictive accuracy which relates




the observed realizations of the plural marker to the predicted outcome of the model (Bresnan et al. 2007). The outcome of the model is the choice of the German or Russian plural marker, denoted by 0 and 1 respectively. The accuracy measure counts any probability > 0.5 as correct for the Russian plural marker. The measure of predictive accuracy is provided in Table 7. The model correctly predicts the use of German plural markers with the lexical items analyzed in 85 % of the cases, but it has more difficulty predicting the use of Russian plural markers, achieving only 72 % correct predictions. The C index of concordance between the predicted probability and the observed binary outcome is 0.838, which indicates that the model has real predictive power. The performance indicator Somers' Dxy, a rank correlation coefficient between predicted probabilities and the observed binary response, is 0.676, which also attests to some predictive capacity of the model.

Table 7: Model accuracy. Classification table for the minimal adequate model. (The table representation is based on Bresnan et al. 2007)

              Predicted 0    Predicted 1    % correct
Observed 0    67             12             85
Observed 1    20             52             72
Overall                                     79
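The evaluation measures reported here are easy to compute from the fitted probabilities. The sketch below shows one way to obtain the C index, Somers' Dxy and the classification rule behind Table 7; the arrays y_observed and y_prob stand for the observed outcomes and the model's fitted probabilities and are assumptions of the illustration.

    import numpy as np

    def concordance_index(y_observed, y_prob):
        """C index: share of (0, 1) pairs in which the predicted probability is higher for the 1."""
        y_observed, y_prob = np.asarray(y_observed), np.asarray(y_prob)
        pos = y_prob[y_observed == 1]
        neg = y_prob[y_observed == 0]
        wins = sum(float(p > n) + 0.5 * float(p == n) for p in pos for n in neg)
        return wins / (len(pos) * len(neg))

    # c = concordance_index(y_observed, y_prob)
    # somers_dxy = 2 * c - 1                     # e.g. C = .838 corresponds to Dxy = .676
    # predicted = (y_prob > 0.5).astype(int)     # cut-off used for the classification in Table 7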

Regarding the random effect 'speaker', the variance and standard deviation of the by-subject effect reveal minimal variation. The speakers in the sample do not exhibit any considerable individual differences when choosing the language of the plural marking with German noun insertions. In other words, neither language is favored by the subjects for marking plural. The main effects in the model offer persuasive evidence in favor of the hypotheses formulated above. The signs of the regression coefficients (b) in Table 6 reveal the direction of the adjustment to the intercept. Given this, we can conclude that the factors high plural-singular ratio, vowel as a stem-final segment, and the nominative or accusative case of the slot disfavor the use of Russian plural markers with German code-mixed nouns. Conversely, any of the non-core cases projected by the slot, a consonant as a stem-final segment, and a low plural-singular ratio demonstrate a preference for Russian plural markers. Consider the odds ratios listed in Table 6. Both the stem-final segment and the case of the slot have comparable effect sizes, that is, the odds for using a Russian plural marker decrease by approximately 66 % if the stem features a vowel in the final position, and by 62 % if the case of the slot is a core case, i. e. the nominative or accusative. The strongest effect is


exhibited by the plural-singular ratio: the odds for Russian overt plural markers fall by 56 % with every one-unit increase in the plural-singular ratio on the logarithmic scale. A one-unit increase on the logarithmic scale corresponds to multiplying the ratio by approximately 2.7 on the linear scale. In other words, each time the plural-singular ratio increases by a factor of 2.7, the odds for using a Russian plural marker fall by 56 %.


Figure 5: Importance of factors in model: decrease in Akaike Information Criterion (AIC) if factor removed. The table representation is based on Szmrecsanyi 2013

The overall importance of the factors is given in Figure 5 by plotting the decrease in the Akaike Information Criterion of the model when a factor is removed from the minimal model. According to Szmrecsanyi (2013), more sizable decreases in the AIC criterion of a factor stand for its greater overall importance. Thereby, when predicting the language of the overt plural marker used with 151 German noun insertions in Russian sentences, the most important factor is the plural-singular ratio. The second most important predictor is the phonetic shape of the stem, to be more exact the presence or absence of a vowel in the stem-final position. The case projected by the slot on the inserted lexical item is ranked last. It should be noted that in slots that project the accusative or nominative case (coded as one level of the binary variable ‘case of the slot’), the proportions of German and Russian plural markers on noun insertions are equally large, and the asymmetry exhibited is due to the number of instances of the other cases projected. In this




case, German noun insertions do not take German plural markers as frequently as Russian ones. This analysis shows that the frequency-based plural-singular ratio, a continuous variable, and structural factors such as the case of the slot and the stem-final segment of the inserted lexical item, when modeled as binary variables, can reliably predict the language of the overt plural marking on German lexical items inserted in Russian matrix clauses. The most important predictor in the variation is the plural-singular ratio. Furthermore, the examined variation in the speakers' preferences to use either Russian or German plural markers was taken into consideration and found to be negligible.
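The factor ranking in Figure 5 rests on comparing information criteria of nested models. Continuing the hypothetical statsmodels sketch from Section 7.1 (and again ignoring the random effect), such a comparison could be carried out as follows.

    import statsmodels.formula.api as smf

    # assumes the data frame 'df' from the sketch in Section 7.1
    predictors = ["pl_sg_ratio", "stem_final_vowel", "core_case_slot"]
    full_aic = smf.logit("russian_marker ~ " + " + ".join(predictors), data=df).fit(disp=0).aic

    for dropped in predictors:
        kept = [p for p in predictors if p != dropped]
        aic = smf.logit("russian_marker ~ " + " + ".join(kept), data=df).fit(disp=0).aic
        # A larger AIC penalty for dropping a predictor indicates a more important factor.
        print(f"without {dropped}: AIC changes by {aic - full_aic:+.1f}")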

8 Conclusions and discussion

This study addresses the question of whether in a situation of marking plural on code-mixed lone nouns, these nouns retain the plural morphology of their language, i. e. the embedded language, or receive plural markers from the matrix language: [r [g N-pl g] r] or [r [g N g]-pl r]. Most prior research has focused on suggesting explanations for the use of either double plural morphology or embedded-language plural morphology with code-mixed nouns, and attributed the use of these patterns to structural factors (Boumans 1998: 91; Myers-Scotton 2002: 91, 150), erroneous access in production (Myers-Scotton 1993: 132–136; Myers-Scotton and Jake 1995: 1000) or high frequency of some plurals in the source language (Backus 1996: 151, 1999: 97–99, 2003: 93–100). However, these assumptions have neither been systematically examined nor tested on monolingual material. In this study I have analyzed the extent to which the choice of the language for plural marking on code-mixed nouns is determined by the frequencies of the plural forms of the inserted items in the embedded language and the structural requirements imposed by the matrix language. The research questions are addressed through corpus analyses and statistical modeling.

This investigation reveals three main findings. The first concerns a frequency effect: the frequency with which a noun plural occurs in the embedded language, i. e. German, appears to determine the language of the plural marker on code-mixed nouns. In a situation of competition between the singular and plural forms of a lexical item during online production, an item commonly used as a plural will tend to be selected as a whole, and inserted into a matrix clause retaining its plural marker. If the competition between the plural and singular forms of a lexical item is too intense, or it is the singular whose representation is stronger, only the stem of the lexical item – which corresponds to the singular form in these data – will be activated in production, and will receive the plural marker from the


matrix language, i. e. Russian. This provides compelling evidence for the effect of frequency assumed earlier in Backus (1996, 1999, 2003). The entrenchment of high-frequency plurals observed in the Russian-Ger­ man code-mixing data concurs with similar findings from previous studies in language processing, first-language acquisition and typology. A series of experi­ ments conducted by Baayen, Dijkstra and Schreuder (1997) reveal that speakers of Dutch store high-frequency noun plurals. In the speech of children acquiring Russian, the first plural forms are nouns that are usually used as plurals, for example glaz-a ‘eye-nom.pl’, jagod-y ‘berry-nom.pl’ and grib-y ‘mushroom-nom. pl’ (Gagarina and Voeikova 2009: 198). Interestingly, children have been found to use these forms with reference to singular entities (Ceitlin 2000: 91), which reinforces Baayen, Dijkstra and Schreuder’s assertion (1997) that high-frequency plurals are learned and stored as holistic units. Furthermore, in a corpus-based cross-linguistic analysis of number marking, Haspelmath and Karjus (2013) propose semantic groups of lexemes which are more frequent as plurals than singulars, these include paired body-parts, paired items, small animals, fruit/ vegetables, people, and ethnic groups. It is indeed striking that many of the insertions with the embedded-language plural markers in both my data and other bilingual corpora fall into one of the suggested categories. The frequency effect observed in my data could thus also be approached from a semantic perspective. The fact that plurals distinguished by high frequency are selected as wholes and produced as units indicates that morphological forms related to each other – such as a stem and its plural – seem to have separate mental representations if the frequency of the inflected form is high. This result is in line with findings of experi­ mental studies on language change in progress, which assert that phonetically reduced forms are represented separately from full forms when the former are highly frequent (Lorenz 2013; Schäfer 2013). These findings also hold for multi-word sequences, which are produced as holistic units, given their high frequency (see Schneider this volume). These observations provide solid evidence for the reality of entrenchment, its reliance on frequency of use, and its importance in language processing and language change. The second finding of this study concerns the relevance of the phonetic shape of the inserted word for receiving an inflectional suffix from the matrix language. In the data, German lexical items with an accented vowel in the stemfinal position cannot take Russian inflectional suffixes directly as the Russian declensional system depends on having stems with consonants in the final position. As such, if the embedded German noun has an accented vowel in the stem-final position, either a compromise strategy is employed, such as the use of epenthetic consonants, or German plural forms are produced. The mor­ phophonological regularities of the Russian declensional system thus impose




a restriction on the phonetic shape of German lexical items to be inserted. In the subsequent statistical analysis, this absolute constraint was reevaluated as a probabilistic factor because it was revealed that the presence of a vowel in the stem-final position, whether accented or not, results in favoring the use of German plural markers. The third result relates to the other structural factor: the mismatch in the systems of case marking in the plural between German and Russian. When the matrix structure projects a non-core case on the slot in which a German lexical item is inserted, the tendency is to employ Russian inflections, characterized by a fusion of the plural and case. This finding can be interpreted as a manifestation of the pressure exerted by the matrix language to produce well-formed Russian constituents. However, there are many occurrences of German nouns retaining their German plural markers in slots requiring the nominative or accusative case. This situation is explained by the fact that the functions of German plural inflections and the Russian inflection of the nominative and accusative case -i (-y) coincide: both express plural rather than case, owing to the form syncretism. This kind of equivalence results in the ease of inserting German plurals in these slots.

The findings of this study have far-reaching consequences for models of code-mixing because they clarify the emergence of the so-called internal embedded-language islands. Firstly, the embedded-language plural markers are regarded as syntactically inactive (Myers-Scotton 2002: 92) or parts of chunks (Backus 1999: 98). If we interpret the idea of syntactic inactivity in terms of holistic storage, we can assert that the embedded-language plural markers are syntactically inactive inasmuch as they are part of the representations of plural forms. These plurals are so strongly entrenched in the mental lexicon/grammar that they are retrieved as wholes in bilingual production. If a plural is weakly entrenched, the matrix language plural marker is produced. Secondly, this work contributes to the field of contact linguistics, in that the application of a multi-factorial analysis enabled the modeling of three factors in their collaborative work of determining the outcome of competition between Russian and German in bilingual production. By handling both categorical factors – the phonetic and the morphosyntactic context – as probabilistic tendencies rather than absolute rules, the study manages to provide greater understanding of the idiosyncrasies observed in bilingual language production. The results here are encouraging, and should be validated in further studies of other bilingual corpora.


Acknowledgments

I would like to express my deep gratitude to Peter Auer and Heike Behrens for their invaluable advice and perspicacious criticism that enabled me to considerably improve this paper. I am also grateful to the graduate training group "Frequency effects in language" for all kinds of support. The analysis of frequencies in the deWaC corpus would have been incomplete without the technical assistance by Uli Held, to whom I owe a debt of gratitude. Furthermore, I would like to thank those who responded to my questions concerning various languages: Netta Abugov for Hebrew, Ad Backus for Turkish, Lilya Molchanova for Japanese, Francesca Moro for Malay and Pekka Posio for Finnish. Any remaining errors are my own.

References Auer, Peter 1999: From codeswitching via language mixing to fused lects: Toward a dynamic typology of bilingual speech. International Journal of Bilingualism 3(4): 309–332. Auer, Peter 2014: Language mixing and language fusion: When bilingual talk becomes monolingual. In: Juliane Besters-Dilger, Cynthia Dermarkar, Stefan Pfänder and Achim Rabus (eds.), Congruence in Contact-Induced Language Change, Language Families, Typological Resemblance, and Perceived Similarity, 294–334. Berlin/Boston: de Gruyter. Auer, Peter and Raihan Muhamedova 2005: “Embedded Language” and “Matrix Language” in insertional language mixing: Some problematic cases. Journal of Italian Linguistics 17(1): 35–54. Baayen, Harald R. 2008: Analyzing Linguistic Data. Cambridge (UK)/New York: Cambridge University Press. Baayen, Harald R., Ton Dijkstra and Robert Schreuder 1997: Singulars and plurals in Dutch: Evidence for a parallel dual-route model. Journal of Memory and Language 37(1): 94–117. Backus, Ad 1992: Patterns of Language Mixing: A study in Turkish-Dutch Bilingualism. Wiesbaden: Harrassowitz. Backus, Ad 1996: Two in One: Bilingual Speech of Turkish Immigrants in the Netherlands, Tilburg: Tilburg University Press. Backus, Ad 1999: Evidence for lexical chunks in insertional codeswitching. In: Bernt Brendemoen, Elizabeth Lanza and Else Ryen (eds.), Language Encounters across Time and Space: Studies in Language Contact, 93–109. Oslo: Novus. Backus, Ad 2003: Units in codeswitching: Evidence for multimorphemic elements in the lexicon. Linguistics 41(1): 83–132. Barlow, Michael and Suzanne Kemmer (eds.) 1999: Usage-Based Models of Language. Stanford, CA: Center for the Study of Language and Information (CSLI). Baroni, Marco and Adam Kilgariff 2006: Large linguistically-processed web corpora for multiple languages. In: Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2006), Trento, Italy, 87–90. Morristown, NJ: Association for Computational Linguistics.




Benson, Morton 1960: American-Russian speech. American Speech 35(3): 163–174. Björkelund, Anders, Bernd Bohnet, Love Hafdell and Pierre Nugues 2010: A high-performance syntactic and semantic dependency parser. In: Proceedings of the 23rd International Conference on Computational Linguistics, 33–36. Beijing: Chinese Information Processing Society of China. Boumans, Louis 1998: The Syntax of Codeswitching: Analysing Moroccan Arabic/Dutch Conversation, Tilburg: Tilburg University Press. Brehmer, Bernhard 2007: Sprechen Sie Qwelja? Formen und Folgen russisch-deutscher Zweisprachigkeit in Deutschland. In: Tanja Anstatt (ed.), Mehrsprachigkeit bei Kindern und Erwachsenen: Erwerb, Formen, Förderung, 163–185. Tübingen: Attempto. Bresnan, Joan, Anna Cueni, Tatiana Nikitina and Harald R. Baayen 2007: Predicting the dative alternation. In: Gerlof Bouma, Irene Krämer and Joost Zwarts (eds.), Cognitive Foundations of Interpretation, 69–94. Amsterdam: Edita-KNAW-Royal Netherlands Academy of Arts and Sciences. Budzhak-Jones, Svitlana 1998: Against word-internal codeswitching: Evidence from UkrainianEnglish bilingualism. International Journal of Bilingualism 2(2): 161–182. Bybee, Joan L. 1985: Morphology: A Study of the Relation between Meaning and Form. Amsterdam/Philadelphia: Benjamins. Bybee, Joan L. 2006: From usage to grammar: The mind’s response to repetition. Language 82(4): 711–733. Bybee, Joan L. 2007: Frequency of Use and the Organization of Language. Oxford/New York: Oxford University Press. Bybee, Joan L. 2010: Language, Usage and Cognition. Cambridge, UK: Cambridge University Press. Bybee, Joan L. and Paul J. Hopper (eds.) 2001: Frequency and the Emergence of Linguistic Structure. Amsterdam/Philadelphia: Benjamins. Ceitlin, Stella N. 2000: Jazyk i rebjonok: Lingvistika detskoj reči [Language and Child: Linguistics of Child Speech]. Moscow: VLADOS. Cleveland, William S. and Susan J. Devlin 1988: Locally weighted regression: An approach to regression analysis by local fitting. Journal of the American Statistical Association 83(403): 596–610. Corbett, Greville G. 1991: Gender. Cambridge, UK: Cambridge University Press. Corbett, Greville G. 2003: Types of typology, illustrated from gender systems. In: Frans Plank (ed.), Noun Phrase Structure in the Languages of Europe, 289–334. Berlin/New York: de Gruyter. Diessel, Holger 2007: Frequency effects in language acquisition, language use, and diachronic change. New Ideas in Psychology 25: 108–127. Divjak, Dagmar and Stefan Th. Gries (eds.) 2012: Frequency Effects in Language Representation. Berlin/Boston: de Gruyter. Duden online 2013: Das Wörterbuch Duden. Bibliographisches Institut GmbH (ed.). May 16, 2013. http://www.duden.de/node/764273/revisions/1149030/ view Eisenberg, Peter 2006: Gundriss der deutschen Grammatik, Vol. 1. Stuttgart: Metzler. Ellis, Nick C. 2002: Frequency effects in language acquisition: A review with implications for theories of implicit and explicit language acquisition. Studies in Second Language Acquisition 24: 148–188.


Fabricius-Hansen, Cathrine, Peter Gallmann, Peter Eisenberg, Reinhard Fiehler and Jörg Peters (eds.) 2009: Die Grammatik: Unentbehrlich für richtiges Deutsch, Band 4. Mannheim/Wien et al.: Dudenverlag. Fahrmeir, Ludwig, Rita Künstler, Iris Pigeot and Gerhard Tutz 2007: Statistik: Der Weg zur Datenanalyse. Berlin/Heidelberg et al.: Springer. Flämig, Walter 1991: Grammatik des Deutschen: Einführung in Struktur- und Wirkungszusammenhänge. Berlin: Akademie. Gagarina, Natalia and Maria D. Voeikova 2009: Acquisition of case and number in Russian. In: Ursula Stephany and Maria D. Voeikova (eds.), Development of Nominal Inflection in First Language Acquisition: A Cross-Linguistic Perspective, 179–217. Berlin/New York: de Gruyter. Gries, Stefan Thomas and Dagmar Divjak (eds.) 2012: Frequency Effects in Language Learning and Processing. Berlin/Boston: de Gruyter. Hakimov, Nikolay to appear: Effects of frequency and word repetition on switch-placement: Evidence from Russian-German code-mixing. In: Justyna A. Robinson and Monika Reif (eds.), Cognitive Perspectives on Bilingualism. Boston/Berlin: de Gruyter. Halmari, Helena 1997: Government and Codeswitching: Explaining American Finnish. Amsterdam/Philadelphia: Benjamins. Haspelmath, Martin and Andres Karjus 2013: Corpus-Based Universals Research: Explaining Asymmetries in Number Marking. Presented at the 35th Tagung der Deutschen Gesellschaft für Sprachwissenschaft, AG7: Usage-based approaches to morphology, Potsdam. Hasselmo, Nils 1972: Code-switching as ordered selection. In: Evelyn Scherabon Firchow, Kaaren Grimstad, Nils Hasselmo and Wayne A. O’Neill (eds.), Studies for Einar Haugen, 261–280. The Hague/Paris: Mouton. Helbig, Gerhard and Joachim Buscha 2001: Deutsche Grammatik: ein Handbuch für den Ausländerunterricht. Berlin/New York: Langenscheidt. Hentschel, Elke and Harald Weydt 2003: Handbuch der deutschen Grammatik. 3rd ed. Berlin/ New York: de Gruyter. Hlavac, Jim 2003: Second-Generation Speech: Lexicon, Code-Switching and Morpho-Syntax of Croatian-English Bilinguals. Bern/Oxford: Peter Lang. Itkin, Il’ja B. 2007: Russkaja morfonologija [Russian morphophonology]. Moscow: Gnozis. Kohler, Klaus J. 1995: Einführung in die Phonetik des Deutschen. Berlin: Erich Schmidt. Langacker, Ronald W. 1987: Foundations of cognitive grammar, Vol. 1: Stanford, CA: Stanford University Press. Langacker, Ronald W. 1999: A dynamic usage-based model. In: Michael Barlow and Suzanne Kemmer (eds.), Usage-Based Models of Language. Stanford, CA: Center for the Study of Language and Information (CSLI). Lorenz, David 2013: Contractions of English Semi-Modals: The Emancipating Effect of Frequency. NIHIN studies: New Ideas in Human Interaction. Freiburg: Albert-LudwigsUniversität: Universitätsbibliothek. Meng, Katharina 2001: Russlanddeutsche Sprachbiografien: Untersuchungen zur sprachlichen Integration von Aussiedlerfamilien. Tübingen: Narr. Muhamedowa, Raihan 2006: Untersuchung zum kasachisch-russischen Code-Mixing (mit Ausblicken auf den uigurisch-russischen Sprachkontakt). München: Lincom Europa.




Muysken, Pieter 1995: Code-switching and grammatical theory. In: Lesley Milroy and Pieter Muysken (eds.), One Speaker, two Languages: Cross-Disciplinary Perspectives on CodeSwitching, 177–198. Cambridge, UK/New York: Cambridge University Press. Muysken, Pieter 2000: Bilingual Speech: A Typology of Code-Mixing. Cambridge, UK/New York: Cambridge University Press. Muysken, Pieter, Hetty Kook and Paul Vedder 1996: Papiamento/Dutch code-switching in bilingual parent-child reading. Applied Psycholinguistics 17(04): 485–505. Myers-Scotton, Carol 1993: Duelling Languages: Grammatical Structure in Codeswitching. Oxford/New York: Clarendon Press/Oxford University Press. Myers-Scotton, Carol 2002: Contact Linguistics: Bilingual Encounters and Grammatical Outcomes. Oxford: Oxford University Press. Myers-Scotton, Carol and Janice L. Jake 1995: Matching lemmas in a bilingual language competence and production model: evidence from intrasentential code-switching. Linguistics 33: 981–1024. Naiditch, Larissa 2008: Tendencii razvitija russkogo jazyka za rubežom: russkij jazyk v Izraile [Tendencies in the development of Russian abroad: Russian in Israel]. Russian Linguistics 32(1): 43–57. Nortier, Jacomine 1990: Dutch-Moroccan Arabic Code Switching among Moroccans in the Netherlands. Dordrecht/Providence: Foris. Olshtain, Elite and Shoshana Blum-Kulka 1989: Happy Hebrish: Mixing and switching in American-Israeli family interactions. In: Susan M. Gass, Carolyn Madden, Dennis R. Preston and Larry Selinker (eds.), Variation in Second Language Acquisition, 59–83. Clevedon: Multilingual Matters. Poplack, Shana [1980] 2006: Sometimes I’ll start a sentence in Spanish y termino en español: Toward a typology of code-switching. In: Li Wei (ed.), The Bilingualism Reader, 2nd ed., 213–243. New York: Routledge. Ramers, Karl Heinz and Heinz Vater 1991: Einführung in die Phonologie. Hürth-Efferen: Gabel. Sankoff, David, Shana Poplack and Swathi Vanniarajan 1990: The case of the nonce loan in Tamil. Language Variation and Change 2(1): 71–101. Schäfer, Michael 2014: Phonetic Reduction of Adverbs in Icelandic. NIHIN Studies: New Ideas in Human Interaction. Freiburg: Albert-Ludwig-Universität: Universitätsbibliothek. Stenson, Nancy 1990: Phrase structure congruence, government, and Irish-English codeswitching. In: Randall Hendrick (ed.), The Syntax of the Modern Celtic Languages, Syntax and Semantics, 167–197. San Diego: Academic Press. Szabó, Csilla Anna 2010: Language shift und Code-Mixing: Deutsch-ungarisch-rumänischer Sprachkontakt in einer dörflichen Gemeinde in Nordwestrumänien. Frankfurt a. M./Berlin et al.: Peter Lang. Szmrecsanyi, Benedikt 2013: The great regression: Genitive variability in Late Modern English news texts. In: Kersti Borjars, David Denison and Alan Scott (eds.), Morphosyntactic Categories and the Expression of Possession, 59–88. Amsterdam/Philadelphia: Benjamins. Švedova, Natalija Jul’evna (ed.) [1980] 2005: Russkaja grammatika [Russian grammar]: Fonetika, fonologija, udarenie, intonatsija, slovoobrazovanije, morfologija [Phonetics, phonology, stress, intonation, word-formation, morphology], Vol. 1. Moscow: Institut russkogo jazyka im. V. V. Vinogradova.


Tagliamonte, Sali A. and Harald R. Baayen 2012: Models, forests, and trees of York English: Was/were variation as a case study for statistical practice. Language Variation and Change 24(2), 135–178. Timberlake, Alan 2004: A Reference Grammar of Russian. Cambridge, UK: Cambridge University Press. Tomasello, Michael 2003: Constructing a Language. Cambridge, MA/London: Harvard University Press. Treffers-Daller, Jeanine 1994: Mixing two Languages: French-Dutch Contact in a Comparative Perspective. Berlin/New York: de Gruyter. Verschik, Anna 2004: Aspects of Russian-Estonian codeswitching: Research perspectives. International Journal of Bilingualism 8(4): 427–448. Voigt, Herman A. 1994: Code-wisseling, taalverschuiving en taalverandering in het Melaju Sini [Code-switching, language shift and language change in Melaju Sini]. Unpublished MA Thesis, Tilburg University. Zaliznjak, Andrej Anatol’evič [1967] 2002: Russkoe imennoe slovoizmenenie [Russian nominal inflection]. In: “Russkoe imennoe slovoizmenenie” s priloženiem izbrannyh rabot po sovremennomu russkomu jazyku i obščemu jazykoznaniju [“Russian nominal inflection” with a supplement of selected papers on Modern Russian and general linguistics]. Moscow: Jazyki slavjanskoj kul’tury. Zaliznjak, Andrej Anatol’evič [1977] 2009: Grammatičeskij slovar’ russkogo jazyka: slovoizmenenie [Dictionary of Russian grammar: inflection], 5th ed. Moscow: AST-Press Kniga.

Ulrike Schneider (Johannes Gutenberg University Mainz)

Hesitation placement as evidence for chunking

A corpus-based study of spoken English

Abstract: Hesitations are not scattered throughout speech at random; they occur in predictable positions. Previous studies have shown that structural factors such as constituent boundaries influence hesitation placement. This study accounts for hesitation placement from a usage-based perspective through the analysis of hesitations in prepositional phrase contexts in the Switchboard NXT corpus of American English. Using this method, I demonstrate that oft-used sequences are less likely to be interrupted by hesitations than low-frequency word pairs. Furthermore, results show that complex measures of the associations between words can not only predict hesitation behavior, but also model semantic unity. These results strongly point to frequency effects in speech processing. Co-occurrence patterns of words are mentally tracked, so that with increasing frequency or likelihood of co-occurrence words become easier to retrieve as a sequence.

1 Introduction

Since the mid-20th century, linguistic interest in hesitations has soared. While most studies center on the function and number of hesitations in spoken language, some researchers have also taken an interest in their placement. The majority of these analyses look at structural determinants of hesitation placement (cf. Maclay and Osgood 1959; Boomer 1965; Goldman-Eisler 1968; Cook 1971; Beattie and Butterworth 1979; Shriberg 1994; Clark and Wasow 1998; Bortfeld et al. 2001; Clark and Fox Tree 2002; Schilperoord and Verhagen 2006) and find that hesitations such as filled and unfilled pauses predominantly occur at the boundaries between intonation units (cf. e. g. Clark and Fox Tree 2002: 94), or between major constituents (cf. e. g. Clark and Clark 1977: 267–268) such as sentences (cf. e. g. Shriberg 1994: 149–151; Cook 1971: 138) and phrases (cf. e. g. Maclay and Osgood 1959: 33; Goldman-Eisler 1968: 95; Bortfeld et al. 2001: 138). It has furthermore been shown that filled and unfilled pauses are typically more likely to precede lexical words than function words (cf. e. g. Maclay and Osgood 1959: 32–33; but see Schilperoord and Verhagen 2006: 145 for partially conflicting evidence).


Claims that hesitations are preferentially placed at constituent boundaries or early on within the constituent (cf.  Boomer 1965; Clark and Clark 1977) are supported by traditional formal views of speech planning, which postulate that speakers first commit to at least parts of a syntactic frame before filling it wordby-word from the mental lexicon (cf. Carroll 1953; Chomsky 1957). Interestingly, as early as 1954, Lounsbury already put forward some hypotheses which suggest that other, usage-based factors might be able to explain hesitation placement more accurately, not least because these can also account for the location of hesitations within constituents: 1. Hesitation pauses correspond to the points of highest statistical uncertainty in the sequencing of units of any given order. [...] 2. Hesitation pauses and points of high statistical uncertainty correspond to the beginnings of units of encoding. [...] 3. Hesitation pauses and points of high statistical uncertainty frequently do not fall at the points where immediate-constituent analysis would establish boundaries between higher-order linguistic units [...]. (Lounsbury 1954: 99–100) Lounsbury continues that his hypotheses imply that within “easy oft-repeated combinations [...] hesitation pauses will tend to be eliminated” (Lounsbury 1954: 100–101). While Lounsbury lacked the computational possibilities to test his hypotheses, noting that even within the confines of a small corpus, “the calculation of all transitional probabilities [...] would be an impossible task” (1954: 99–100), modern technological development and the availability of large corpora have now made this possible. Thus several more recent studies (e. g. Pawley and Syder 1983; Kapatsinski 2005; Bybee 2007b; Erman 2007; Tily et al. 2009) show that oft-repeated words or combinations of words are indeed uttered more fluently than rarely used words or sequences. All of these studies are based on a usage-based view of language process­ ing which can explain why there is a correlation between co-occurrence patterns of words and hesitation placement. In contrast to most generative grammars, usage-based accounts do not make a rigid distinction between grammar and the lexicon and assume that through repeated use, word-combinations and structures become more strongly mentally represented and thus easier to retrieve as wholes (cf. Langacker 1987; Goldberg 1995; Bybee 2006; Beckner et al. 2009). This means that the mind keeps track of which words tend to be used together, so that frequently used combinations are mentally more “unit-like” than rarely used ones. Such a model of the mind predicts that frequently occurring sequences of words are more easily retrievable, or “chunked”, than rarely used or novel




sequences. Consequently, hesitation placement should be predictable from the co-occurrence patterns of words in the sense that frequent combinations of words, i. e. strong chunks, should be less likely to be interrupted by hesitations than rarely used combinations, i. e. very weak chunks. The research questions addressed in the present paper follow from Louns­ bury’s hypotheses and the results and methodology of subsequent usage-based studies. Thus, the paper will further investigate the effects that co-occurrence patterns of words have on hesitation placement. The main objective of this study is to determine which of the currently advocated determinants of chunking best explains hesitation placement. Currently, the formation of chunks is modeled in one of two manners: either increasing frequency of combined use is supposed to strengthen connections between the nodes representing the words in a sequence (cf. e. g. the “distributed account” discussed by Kapatsinski and Radicke 2009), or word-combinations are modeled as receiving a combined entry in the mental lexicon, which can then be accessed as a unit. Such entries are sometimes claimed to be created only for high-frequency sequences (cf. e. g. Erman and Warren 2000; Wray 2002), though others hold that they also exist for low-frequency combinations (cf. e. g. Bybee 2007a, 2010). In the latter case, the more often a speaker uses two words together, the stronger their combined representation is supposed to become, thus rendering the combined node more easily accessible. There are also mixed accounts, such as Kapatsinski and Radicke (2009), who find that with rising co-occurrence frequency the mental connections between words are strengthened until – once a certain frequency threshold is reached  – they are so strongly associated that the sequence is moved to the lexicon and henceforth no longer compositional (though still analyzable). These models make different predictions concerning hesitation placement. A model which assumes that the connection between mental nodes is strengthened by increasing frequency of combined use predicts that chunking is a gradual process and thus the chance of a chunk getting interrupted by hesitations should be inversely related to its frequency. A model like Wray’s (2002), on the other hand, which only conceptualizes highly frequent pairs as chunked, in the sense of holistically stored, predicts an abrupt change in hesitation behavior: all combinations of words below the threshold for chunking should be equally likely to be interrupted while all combinations above the threshold should be unlikely to be interrupted, because they are accessed as single units. Finally, in a system as described by Bybee (2010), where both the parts and the whole are stored irrespective of usage frequency, we should find competition between the parts and the whole: the more often a sequence is used, the more likely speakers are to access it as a whole, no longer assembling the parts to form it. However, the parts


must, of necessity, always be at least as frequent as the whole, which should lead to increased numbers of hesitations in sequences consisting of highly frequent parts.

Apart from this primary disagreement concerning holistic storage, there is another controversy: Chunking is mostly linked to absolute co-occurrence frequency, yet there are indications that statistical (un)certainty, which is evidenced by measures of the relative chance of words to co-occur, is a better predictor of chunking (cf. Stefanowitsch and Gries 2003). However, modeling chunking based on the relative chance of co-occurrence means that some highly frequent sequences such as in the are rated rather low and thus weakly chunked. Some researchers, such as, for example, Bybee (2010: 97; see also Bybee and Eddington 2006) remain doubtful that this "devaluation" of such chunks and high-frequency constructions in general also happens in the mind. The present study focuses on the question whether absolute co-occurrence frequency or a measure of the relative associations between words is a better predictor of hesitation placement and consequently of chunking. The study conceives of chunking as a gradual process, but makes no prior assumption as to whether chunks are cognitively represented in the form of holistic units or in the form of mutual activation between nodes. Thus, chunking is operationally defined as the process whereby the cognitive representation of a sequence of words becomes stronger. This process is expected to depend on usage patterns. The stronger a chunk, the easier it is to retrieve as a unit and thus the less likely to be interrupted by hesitations. This means that a speaker should prefer to utter (2) rather than (1), because the hesitation in (1) appears to be placed inside a strong chunk.

(1) ?I wish you a Merry uh Christmas.
(2) ?I wish you a uh Merry Christmas.

Not only is the word pair Merry Christmas used frequently (at least around a certain season), but the transitional probability when moving from Merry to Christmas is also very high: merry, these days, is likely to precede Christmas. Roughly one third of all merrys in the 450-million-word Corpus of Contemporary American English (COCA) precede Christmas. Therefore, we would expect a speaker to rarely hesitate between these words. Rather, we would expect speakers to know that once they have chosen merry, Christmas will follow, or, in other words, that Merry Christmas is a strong chunk. Should speakers for some reason – be it syntactic planning or lexical search – need to buy time for planning, we would expect them to hesitate before or after the chunk, where the associations between words are not as strong. Based on this reasoning, I posit the following hypothesis:




1. Locations of hesitation placement in spoken English are predictable from usage-based factors, i. e. from chunking.

Many usage-based models assume that the most important determinant of chunking strength is absolute co-occurrence frequency. However, frequency can, in fact, be overruled by semantics, because idioms with non-transparent semantics, such as kick the bucket, need to be modeled as mental units despite low usage frequencies (cf. Biber et al. 1999: 988; Mel’čuk 1998; Moon 1998). At the same time, the closer a sequence is to representing a single semantic concept, the more the words in the sequence attract each other. This means that with increasing semantic unity comes a greater likelihood of co-occurrence. Thus, measures of the relative likelihood of co-occurrence, such as transitional probabilities, should be able to explain the chunkiness of expressions such as kick the bucket, which could previously only be explained by the additional factor of semantics. Therefore, I conclude that absolute co-occurrence frequency is not the best predictor of chunking, and I hypothesize that:

2. Measures of relative associations between words, such as transitional probabilities or the mutual information score, which are based on information besides absolute frequency of co-occurrence, best reflect how strongly word pairs are chunked.

Testing the second hypothesis requires the explicit contrast of absolute co-occurrence frequency with other, more complex, measures of chunking strength. The latter rely on a calculation of attraction which goes beyond counting frequencies. I will approach these issues with the help of a corpus-based, bottom-up case study, in which the central focus is on the best implementation of chunking strength. This will be achieved through contrasting simple co-occurrence frequency with more complex measures of association.
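To make the contrast between the two hypothesized predictors concrete, the following minimal sketch computes absolute bigram frequency, direct transitional probability and pointwise mutual information for two word pairs. The counts are invented purely for illustration (they are not figures from COCA or Switchboard), and the snippet uses the standard pointwise MI formula rather than any particular implementation from the literature.

```python
import math

# Invented illustrative counts: corpus size, unigram and bigram frequencies.
N = 1_000_000                                              # hypothetical token total
counts = {
    ("in", "the"):          {"bigram": 4_000, "w1": 20_000, "w2": 60_000},
    ("merry", "christmas"): {"bigram": 90,    "w1": 270,    "w2": 800},
}

for (w1, w2), c in counts.items():
    tpd = c["bigram"] / c["w1"]                            # P(w2 | w1)
    mi = math.log2(c["bigram"] * N / (c["w1"] * c["w2"]))  # pointwise MI
    print(f"{w1} {w2}: freq = {c['bigram']}, TPD = {tpd:.2f}, MI = {mi:.1f}")

# Output: "in the" has by far the higher absolute frequency (4,000 vs. 90),
# but the lower TPD (0.20 vs. 0.33) and MI (1.7 vs. 8.7). Relative measures
# thus "devalue" frequent but unselective combinations, which is exactly the
# contrast that hypothesis 2 is designed to test.
```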

2 Data and methodology

2.1 Data

The data best suited for the analysis of hesitation placement and chunking are recordings of natural conversations. The Switchboard NXT corpus of American English (NXT Switchboard Corpus Public Release 2008) is therefore ideally suited for the purpose. It is a subset of the larger Switchboard corpus (Godfrey, Holliman, and McDaniel 1992), a 2.9-million-word sample of telephone


conversations between previously unacquainted adults (Godfrey and Holliman 1997). Switchboard is the single most widely used corpus for studies of hesitation phenomena (cf. e. g. Acton 2011; Bell et al. 2003; Clark and Wasow 1998; Kapatsinski 2010; Shriberg 1994, 1996; Shriberg and Stolcke 1996; Stolcke and Shriberg 1996; Tily et al. 2009). The corpus has also been extensively used for the study of frequency effects (cf. e. g. Arnon and Snider 2010; Bybee 2007b, 2010; Gregory et al. 1999). This corpus has been transcribed several times. The transcript in Switchboard NXT that I am using is based on the transcript released in the LDC's Treebank3 (Marcus et al. 1999). Of all the conversations included in Treebank3, 642, comprising a total of ca. 830,000 words, form the basis for Switchboard NXT (cf. Calhoun et al. 2010: 394). Switchboard is also one of the most highly annotated spoken corpora available to date. The present study makes use of the fact that the transcript is time-aligned, POS-tagged and parsed (cf. Calhoun et al. 2010: 393–395; Marcus, Marcinkiewicz, and Santorini 1993: 314). Both the POS-tagging and parsing of Switchboard NXT were drawn from earlier publications of the corpus where the mark-up had been inserted semi-automatically, i. e. with hand-corrections (cf. Calhoun et al. 2010: 393; Marcus, Marcinkiewicz, and Santorini 1993: 320).

2.2 Hesitations

I define hesitations as elements which are not part of the emergent syntactic structure, do not contribute to the propositional meaning of the utterance, and are related to planning problems (cf. also Fox Tree 1995: 709). This definition applies to unfilled pauses, the filled pauses uh and um, and the discourse markers well, like, you know and I mean. For unfilled pauses, minimum and maximum pause lengths were adopted to exclude as many non-hesitation cases as possible. The minimum pause length was set to 0.2 seconds (cf. Boomer 1965: 150; Goldman-Eisler 1968: 12; Ford and Thompson 1996: 146), and the maximum to one second. This upper limit was chosen to reduce noise, as it would be unlikely for a speaker in a telephone conversation to pause for longer than one second without filling that pause with some kind of floor-holding or hesitation device (cf. Jefferson 1989) unless he/she was actually interrupted by the other speaker or by other people potentially present in the room – both of which are not annotated in the corpus. Furthermore, unfilled pauses at utterance boundaries were excluded, because these cannot be faithfully attributed to one of the speakers in the conversation.
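As a rough illustration of how these pause criteria can be operationalized, the sketch below keeps only silent gaps between 0.2 and 1 second that fall inside an utterance. The record format (speaker, word, start and end time in seconds) and all values are my own assumptions for illustration, not the Switchboard NXT annotation scheme.

```python
# Hypothetical time-aligned tokens: (speaker, word, start_sec, end_sec).
utterance = [
    ("A", "we",   0.00, 0.15),
    ("A", "went", 0.15, 0.42),
    ("A", "to",   0.92, 1.02),   # 0.50 s of silence before "to"
    ("A", "the",  1.02, 1.10),
    ("A", "park", 1.12, 1.55),   # 0.02 s gap: below the 0.2 s threshold
]

MIN_PAUSE, MAX_PAUSE = 0.2, 1.0  # thresholds adopted in the study

def unfilled_pauses(tokens):
    """Return (position, duration) of gaps counted as hesitation pauses.
    Only utterance-internal gaps are considered, since silence at utterance
    boundaries cannot be attributed to a single speaker."""
    pauses = []
    for i in range(1, len(tokens)):
        gap = tokens[i][2] - tokens[i - 1][3]
        if MIN_PAUSE <= gap <= MAX_PAUSE:
            pauses.append((i, round(gap, 2)))
    return pauses

print(unfilled_pauses(utterance))   # -> [(2, 0.5)]
```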




Discourse markers such as well and you know were included because it has been shown that some discourse markers can be used to mark ongoing lexical and content search. This hesitation function has been most frequently attributed to the discourse markers well, like, you know and I mean (cf. Jucker 1993: 438–447; Fung and Carter 2007: 423–424; Müller 2005: 109, 158, 208; Levey 2006: 426). The finding that discourse markers like you know can be used instead of the fillers uh and um in American English (cf. Tottie and Svalduz 2009) and Schiffrin's observation that “several markers – y'know, I mean, oh, like – can occur quite freely within a sentence at locations which are very difficult to define syntactically” (Schiffrin 1987: 32) lend support to this decision.

Repetitions and self-corrections are excluded. Example (3) illustrates reasons for this choice.

(3) He went to the sh- to the supermarket.

Two points are needed to describe the self-correction in (3): both the point where the speaker cuts off – the interruption point – and the point where he restarts – the retraction point (cf. Pfeiffer 2010; Kapatsinski 2005). The previously described hesitations merely constitute interruptions; there is no retraction. Thus, repetitions and self-corrections do not fit the given framework.

2.3 Design

Notwithstanding the aim to extract general regularities and tendencies, it was important to limit the scope of the analysis to specific syntactic environments, as absolute and relative co-occurrence frequencies are not comparable across the board. First of all, each word class holds characteristic relations to the words in its surroundings. Nouns, for example, tend to be preceded by determiners, and hence are claimed to form tighter bonds to their left context than their right (cf. Bybee 2007b: 318–323), whereas verbs are often followed by a restricted set of prepositions and may thus form tighter bonds to their right. Secondly, previous studies of hesitation placement have shown that hesitations are preferentially placed at phrase boundaries or before the first content word in a constituent (cf. Maclay and Osgood 1959; Goldman-Eisler 1968; Clark and Clark 1977; Shriberg 1994; Biber et al. 1999; Bortfeld et al. 2001). Therefore, it is particularly interesting to investigate whether phrase boundaries have an independent effect on hesitation placement which cannot be explained by patterns of co-occurrence (for a claim refuting this view see Bybee 2007b). For these reasons, comparing associations between entirely different word-pairs is of limited usefulness.


The optimal solution is to compare only stretches of speech that consist of a set sequence of word classes.

The contexts chosen for the present study are prepositional phrases. Table 1 shows the selected set of phrases. All hesitations occurring in the context of a structure whose beginning matches one of the six phrase types were selected for analysis. The word preceding the preposition was also extracted. If the phrase occurred sentence-initially, the data-point was excluded. Prepositional phrases have the advantage that they are frequent and allow for stepwise expansion through optional elements. Additionally, they offer a number of competing structural factors for hesitation placement: the prepositional phrase boundary is followed by a noun phrase boundary, which in half of the cases is then followed by the transition to the first content word in the phrase. Previous studies, such as Maclay and Osgood (1959), have shown that there is considerable variation in hesitation placement in prepositional phrases, which cannot be fully explained by structural factors.

Table 1: Hesitations occurring in or directly preceding the following types of prepositional phrases were selected for the study, where n = the number of hesitations or clusters which occur in each phrase type

Phrase Type       Example                                     n
Prep N            at home (sw3586.A.s110)                 1,231
Prep Det N        to the park (sw3324.B.s82)              1,440
Prep N N          before spring break (sw2092.A.s186)       553
Prep Det N N      in the winter time (sw3124.B.s109)        494
Prep Adj N        with low mileage (sw2299.B.s72)           431
Prep Det Adj N    in the low forties (sw3377.B.s104)        575

Chunking strength in these contexts and, of course, the influence of chunking strength on hesitation placement will be analyzed on the so-called bigram level, which means that only the absolute and relative frequencies of two-word sequences will be taken into consideration. This restriction is in place because the Switchboard NXT corpus is comparatively small by today's standards. Given that the longer a sequence, the rarer it is (cf. Bybee 2010: 35), and consequently the fewer the number of tokens present in a given corpus, a smaller corpus is not a reliable database for the assessment of the frequency of and associations within longer sequences. However, no larger spoken corpus offers such accurate tagging and parsing, as well as time alignment. Bigram frequencies, as well as any figures further needed to calculate measures of association, were calculated based on co-occurrence patterns in




Switchboard NXT. Bigrams were defined as two consecutive words not crossing sentence boundaries. A word, in turn, was defined as any word-form except hesitations with its attached part-of-speech-tag, separated by spaces from other word-forms. The constructed example in (4) illustrates this definition. In these two sentences, it's, 's crazy, I do, don't and so on would be considered bigrams, while 's uh, uh crazy and crazy I would not.

(4) it_PRP 's_BES uh_UH crazy_JJ. i_PRP do_VBP n't_RB believe_VB it_PRP.
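A minimal sketch of this bigram definition is given below. The token format mirrors example (4), but the helper name and the restriction to the fillers uh and um are my own simplifications; the full study also skips unfilled pauses and the discourse markers listed in Section 2.2.

```python
# Tagged tokens as in example (4): word-form plus POS tag, joined by "_".
# Sentence-final punctuation is omitted for simplicity.
sentences = [
    ["it_PRP", "'s_BES", "uh_UH", "crazy_JJ"],
    ["i_PRP", "do_VBP", "n't_RB", "believe_VB", "it_PRP"],
]

HESITATIONS = {"uh", "um"}   # simplified list of hesitation word-forms

def bigrams(tokens):
    """Adjacent word pairs within one sentence: hesitations are skipped,
    and sentence boundaries are never crossed."""
    kept = [t for t in tokens if t.split("_")[0].lower() not in HESITATIONS]
    return list(zip(kept, kept[1:]))

for sent in sentences:
    print(bigrams(sent))
# Sentence 1 -> [('it_PRP', "'s_BES"), ("'s_BES", 'crazy_JJ')]:
# "'s uh" and "uh crazy" are not counted as bigrams, and "crazy i" never
# arises because bigrams are built per sentence.
```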

2.4 Predictors

My central hypothesis is that sequences which are likely to occur in speech are easier to retrieve and produce than comparatively improbable sequences, because the former are more strongly chunked. Consequently, speakers should not interrupt highly frequent pairs or combinations of words which are strongly attracted and instead prefer to halt before or after them if they need to hesitate. Testing this hypothesis requires the measurement of associations between adjacent words in stretches of connected speech. If hesitation placement is predictable from the associations in the surrounding context, the hypothesis is confirmed.

I furthermore advocate that we explicitly compare and combine different measures of association, instead of relying on a single one, because we do not yet know which measure most accurately predicts chunking and thus hesitation placement. My second hypothesis particularly requires that absolute frequency of co-occurrence be compared to more complex measures which take into account other parameters besides frequency to represent relative chances of co-occurrence. To avoid randomly choosing a single measure, a range of commonly used measures will be tested as predictors of hesitation placement:

– Bigram frequency (bi.freq) measures how often each two-word combination occurs in the corpus. Bigram frequency is commonly used as a simple measure of chunking strength (e. g. Bybee 2007a).
– Direct transitional probability (TPD) measures how likely the first word is to be followed by the second (cf. Kapatsinski 2005: 6–7). Direct transitional probability is unidirectional in that it only looks from the first word to the second and not vice versa. The measure is also known as “Conditional Bigram (Probability)” (e. g. Gregory et al. 1999) and “Forward Bigram Probability” (e. g. Tily et al. 2009).
– Backwards transitional probability (TPB) measures how likely the second word is to be preceded by the first (cf. Kapatsinski 2005: 6–7). Backwards transitional probability is also a unidirectional measure, and is sometimes


known as “Reverse Conditional Bigram (Probability)” (e. g. Gregory et al. 1999) and “Backward Bigram” (e. g. Tily et al. 2009).
– The mutual information score (MI) assesses how strongly the two words in a bigram attract by calculating how much more often they occur together than would be expected by chance (cf. Manning and Schütze 1999; Oakes 1998). It is a bidirectional measure of association because, unlike the transitional probabilities, it takes associations from left to right as well as from right to left into account. My calculation follows the formula used by Wiechmann (2008: 264–265).
– Lexical gravity G (G) is a relatively new measure of attraction, which assesses how likely among all possible combinations of words the combination in a given bigram is (cf. Daudaravičius and Marcinkevičienė 2004). This means that unlike transitional probabilities and the mutual information score, it does not make an “assumption of complete independence” (Gries and Mukherjee 2010: 3). Instead, lexical gravity G recognizes that due to semantic and syntactic constraints, not all word combinations are possible in English. Like the mutual information score, G is a bidirectional measure of association.

The performance of these predictors can be expected to increase from top to bottom, because from the transitional probabilities down to lexical gravity G, measures are based on increasing amounts of information and thus become more and more complex. However, it is also possible that a combination of these measures might predict hesitation placement even more accurately. For example, word pairs displaying both high frequency and a high mutual information score might be the least likely to be interrupted by hesitations. (A brief computational sketch of these measures is given at the end of this section.)

Additionally, the following control factors were included in the analysis:

– Word frequencies (w.freq) can show whether an apparent effect of chunking is in fact caused by the frequency of only one word in the bigram (cf. Biber et al. 1999; Kapatsinski 2010; Stolcke and Shriberg 1996). For example, a hesitation placed before word Y could be placed there because Y is more strongly associated with following Z than with preceding X, or simply because Y is infrequent.
– Hesitation type (hes.type) is included as a predictor because filled pauses, unfilled pauses and discourse markers may be placed differently. Thus, variation in placement could merely be an effect of hesitation type. Importantly, a large proportion of hesitations do not occur alone, but in clusters like you know uh [pause], well [pause] or uh uh. Due to the large number of combinations, including each as a separate predictor in the analysis would mean




raising the number of hesitation types to over eighty, all with very low token frequencies. It is therefore preferable to group different types of hesitation clusters in order to end up with low type and high token frequencies. A clustering analysis presented in Schneider (2014b) shows that clusters and individual hesitations are placed very similarly and particularly that there is no significant difference in the placement of uh and um. Based on this study, the following groupings were chosen:

– pause: Individual unfilled pauses.
– u: The fillers uh and um and all clusters consisting of one or more instances of uh/um and optional pauses, irrespective of their order (e. g. uh um, [pause] uh uh, uh [pause] um).
– dm: The discourse markers like, well, you know and I mean and all clusters which consist of one or more discourse markers and possible fillers or pauses (e. g. [pause] I mean, you know like uh, I mean [pause] well).

Based on these definitions, each data-point was coded for type of hesitation, as well as for all frequency-based predictors listed above. This means that in an example like (5), the frequency of all individual words as well as the bigram frequency, transitional probabilities and so forth of all word-pairs – die at, at this and this moment – will be taken into consideration.

(5) die [pause] at this moment (sw2039.A.s52)
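The sketch below shows how the frequency-based predictors can be derived from unigram and bigram counts. The toy corpus and counts are invented, the MI line uses the standard pointwise formula rather than reproducing Wiechmann's (2008) exact calculation, and lexical gravity G is omitted because it additionally requires type counts of the possible collocates of each word.

```python
import math
from collections import Counter

# Toy corpus of already cleaned sentences (hesitations removed, POS tags
# dropped for readability).
corpus = [
    ["i", "went", "to", "the", "park"],
    ["we", "drove", "to", "the", "store"],
    ["she", "stayed", "at", "home"],
]

unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter(b for sent in corpus for b in zip(sent, sent[1:]))
n_bigrams = sum(bigrams.values())

def association(w1, w2):
    f12, f1, f2 = bigrams[(w1, w2)], unigrams[w1], unigrams[w2]
    return {
        "bi.freq": f12,                                          # absolute frequency
        "TPD": f12 / f1,                                         # P(w2 | w1)
        "TPB": f12 / f2,                                         # P(w1 | w2)
        "MI": round(math.log2(f12 * n_bigrams / (f1 * f2)), 2),  # pointwise MI
    }

print(association("to", "the"))
# -> {'bi.freq': 2, 'TPD': 1.0, 'TPB': 1.0, 'MI': 2.46}
```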

2.5 Recursive partitioning

A statistical model bringing together the hypotheses, the data and the predictors must be able to accommodate complex demands:

– Testing the first hypothesis requires the model to determine whether there is a negative correlation between association strength and interruption rate in bigrams. In other words, the model has to determine whether speakers' hesitation behavior can be successfully predicted from the associations between words.
– The second hypothesis requires that the model can explicitly contrast the predictive power of different predictors, which means that it needs to be multi-factorial, i. e. able to consider several factors at once.
– The question of whether relative measures of association are apt to reflect semantic relations can be addressed by grouping bigrams according to their association patterns and their likelihood of being interrupted. We can then


observe whether statistically attracted pairs are more likely to constitute semantic units than others.
– Most of the selected prepositional phrases are more than two words long, which means that for all but the first structure, decisions are multinomial, i. e. the speaker has more than two options. In (5), for instance, the speaker could have placed the pause either before at, before this or before moment.
– Some of the predictors correlate because bigram and word frequencies are included in the calculations of all of the probabilistic measures.

Generalized linear models and other regression approaches commonly applied in linguistics do not meet these requirements, particularly because they cannot handle multinomial outcomes or correlated predictors. Therefore, I use a methodology common in disciplines such as bioinformatics and statistics, which has only recently been applied to linguistics (cf. Tagliamonte and Baayen 2012). The methodology combines Classification and Regression Trees (CART trees; Hothorn, Hornik, and Zeileis 2006) and random forests (Hothorn, Hornik, and Zeileis 2006; Strobl et al. 2007; Strobl et al. 2008). These algorithms “grow” trees through recursive binary partitioning of the data, with the aim of creating ever purer “branches”, i. e. subgroups of the data (cf. Baayen 2008: 148–149; Strobl, Maley, and Tutz 2009). In this way, they can handle multinomial outcomes and complex interactions as well as collinear predictors and large numbers of predictors (cf. Tagliamonte and Baayen 2012: 161, 171). While CART trees rely on a single tree per dataset, random forests grow thousands of trees using only a random selection of data points and predictors in each tree (cf. Tagliamonte and Baayen 2012: 159; Strobl, Maley, and Tutz 2009: 15–16).

Figure 1 shows an exemplary CART tree. At each split, the algorithm aims to divide the set into groups with more homogeneous hesitation behavior. Nodes (i. e. splits and leaves) are numbered from one through nine. At each split, the predictor and splitting point are listed. At the first split, for instance, the algorithm selects bi0.freq – i. e. the frequency of the X + Preposition word pair – as the splitting criterion, and a value of 469 as the splitting point. The bar graphs at each terminal node show the distribution of outcomes in the leaf. The highest bar indicates the model's prediction for this leaf. In node 4, for example, the model predicts that all hesitations occur at position one, which means before the preposition. In this way, CART trees provide an informative graphical representation of the data that not only visualizes the most influential predictors from the quantitative analysis, but also suggests groups for later qualitative analyses and linguistic description. To assess the quality of the model, the numbers of correct and false predictions are compared to a baseline model “that simply predicts the most likely realization for all data points” (Baayen 2008: 153) in a chi-square test of significance.

Figure 1: Classification and Regression Tree for the Structure “Preposition Determiner Adjective Noun”

Word frequencies: w0 = word preceding the preposition, w1 = preposition, w2 = determiner, w3 = adjective, w4 = noun
Bigram measures: bi0 = X + preposition, bi1 = preposition + determiner, bi2 = determiner + adjective, bi3 = adjective + noun
Abbreviations: w.freq = word frequency, bi.freq = bigram frequency, TPD = direct transitional probability, TPB = backwards transitional probability, MI = mutual information score, G = lexical gravity G

Furthermore, the residuals of the chi-square test are examined to determine whether the number of correct classifications in the ctree model significantly exceeds those of the baseline model. If the value of the residuals exceeds two, the two models' performance can be considered statistically significantly different.

Reliance on a single tree is not ideal though, because all splits in the tree are only locally optimal. The algorithm chooses predictors and splitting points based on what provides the most purification at any given point in the tree, regardless of the consequences of the split. Therefore, a split that does not sort the data as perfectly but would lead to better results in subsequent splits is not chosen (cf. Strobl, Maley, and Tutz 2009: 333). This issue is resolved by relying instead on random forests, which generate an ensemble of trees (cf. Strobl, Maley, and Tutz 2009: 331), each based on a random subsample of data points and predictors (cf. Strobl, Maley, and Tutz 2009: 332–333). In this way, splits emerge that may not have been the locally optimal ones had all predictors been considered (cf. Strobl, Maley, and Tutz 2009: 332–333). Furthermore, predictors that might never appear in a single tree because they are always marginally outperformed by another correlated predictor are given a chance to perform, which then allows for an objective comparison of their predictive powers. Finally, each tree gets to “vote” on the most likely response for a given data-point (cf. Tagliamonte and Baayen 2012: 161).

For the present study, forests of 3,000 trees were selected, because model performance increased up to ca. 3,000 trees, but did not further improve with larger forests. The model was furthermore set to consider five predictors at every split. In theory, any number of predictors between two and twenty is possible, but in practice, models considering other than three or five (the default) predictors per split are rarely seen, and in the present case models performed better with five (cf. Schneider 2014b). Model performance is evaluated in the same way as the performance of CART trees, i. e. by comparing the number of correct predictions to those of a baseline model. Additionally, forests offer a more conservative and therefore realistic estimate of the model's performance (cf. Strobl, Maley, and Tutz 2009: 335), as they “come with their own ‘built-in’ test sample, the out-of-bag observations” (Strobl, Maley, and Tutz 2009: 341). Each tree in the forest is based on a random subset of data-points and ignores the remainder of the data. As such, all assumptions (i. e. splits) are based only on the “in-bag” data. The “out-of-bag” observations can then serve as a control group, by testing whether the model's assumptions can be generalized to the unseen data. Throughout this chapter, the random forest results given are therefore based on out-of-bag observations. Finally, random forests score all predictors based on their predictive power. These scores serve as a “relative ranking” of predictors within the same model (Shih 2011: 2; see also Strobl, Maley, and Tutz 2009: 336).
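As a rough analogue of this setup, the sketch below fits a single tree and a forest with out-of-bag validation in Python using scikit-learn. This is only an approximation under stated assumptions: the study itself uses conditional inference trees and forests (ctree/cforest from the R party package; Hothorn, Hornik, and Zeileis 2006), which differ from scikit-learn's CART-style trees in their split criteria and offer permutation-based variable importance, and the feature matrix below is random placeholder data, not the Switchboard predictors.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Placeholder data standing in for the real predictors (word and bigram
# frequencies, TPD, TPB, MI, G, hesitation type) and the multinomial
# outcome "position of the hesitation within the phrase".
X = rng.random((575, 20))            # e.g. the size of the Prep Det Adj N subset
y = rng.integers(0, 4, size=575)     # four possible hesitation positions

# A single classification tree (rough analogue of one CART/ctree model).
tree = DecisionTreeClassifier(min_samples_leaf=30, random_state=0).fit(X, y)

# A forest of 3,000 trees considering five candidate predictors per split,
# evaluated on the out-of-bag observations, as in the study.
forest = RandomForestClassifier(
    n_estimators=3000, max_features=5, oob_score=True, random_state=0
).fit(X, y)

baseline = np.bincount(y).max() / len(y)   # always predict the modal position
print(f"baseline accuracy:   {baseline:.3f}")
print(f"forest OOB accuracy: {forest.oob_score_:.3f}")
print("impurity-based importances (first five):",
      forest.feature_importances_[:5].round(3))
```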




3 Results

3.1 Frequency effects on hesitation placement

The placement of hesitations in the analyzed data is extremely varied. Table 2 shows that hesitations occur in all kinds of transitions with no single preferred position. Only the position before the second content word in longer noun phrases is obviously dispreferred. Overall, the pattern proposed by earlier findings (cf. Maclay and Osgood 1959; Goldman-Eisler 1968; Clark and Clark 1977; Shriberg 1994; Biber et al. 1999; Bortfeld et al. 2001) is confirmed: hesitations occur predominantly at the phrase boundary as well as before the first lexical word in the phrase. Structural descriptions cannot, however, explain the broad variation in placement.

Table 2: Distribution of hesitations in the six prepositional phrase types

Phrase Type       before Prep   before Det   before Adj   before N   before second N
Prep N                596             –            –          635            –
Prep Det N            847           301            –          292            –
Prep N N              165             –            –          316           72
Prep Det N N          142           107            –          183           62
Prep Adj N            122             –          251           58            –
Prep Det Adj N        241            95          143           96            –

Results also reveal that the more frequent a pair of words, or the more strongly attracted the words within it, the less likely hesitations are to occur within it. Table 3 shows the results of the analyses based on CART trees that used absolute frequency and the measures of association outlined above, as well as the predictors hesitation type and word frequency. The first line reads as follows: A CART tree fitted to the dataset of 1,440 hesitations occurring in prepositional phrases of the type “Preposition Noun” classifies 72.5 % of data points correctly, corresponding to a misclassification rate of 27.5 %. This result represents a highly significant improvement when compared to the corresponding baseline model (see Section 2.5 above), thus indicating highly significant frequency effects. These effects are confirmed by the test's residuals; both values highly exceed two, showing that the number of correct predictions highly significantly exceeds the baseline model's, whilst the number of misclassifications is highly significantly lower.
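The evaluation logic can be illustrated with a small numeric sketch: the model's and the baseline's correct and false predictions form a two-by-two table, a chi-square test is run on it, and Pearson residuals above two in the relevant cells indicate that the model's correct predictions significantly exceed the baseline's. The counts below are invented for illustration and are not the figures behind Table 3.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Invented counts. Rows: CART model, baseline model;
# columns: correct, incorrect predictions.
observed = np.array([
    [720, 280],   # hypothetical tree model: 72 % correct
    [520, 480],   # hypothetical baseline: always predicts the modal position
])

chi2, p, dof, expected = chi2_contingency(observed)
residuals = (observed - expected) / np.sqrt(expected)

print(f"chi-square = {chi2:.1f}, p = {p:.2g}")
print("Pearson residuals:")
print(residuals.round(2))
# A residual above 2 in the model's "correct" cell (and below -2 in its
# "incorrect" cell) marks a significantly better-than-baseline performance.
```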


Table 3: Performance of the Classification and Regression Trees. Misclassification rates (MCR), p-values based on chi-square tests and the residuals of the chi-square tests are given

Phrase            MCR       Sig. level    Residuals
Prep N            27.5 %
Prep Det N        35.8 %
Prep N N          37.4 %
Prep Det N N      55.9 %
Prep Adj N        41.1 %
Prep Det Adj N    50.3 %

p