Selected papers from the Eighth International Conference on Language Variation in Europe (ICLaVE 8), Leipzig, May 2015 9789027234995, 902723499X

Language: English. Pages: 237 [255]. Year: 2017

Language Variation – European Perspectives VI

John Benjamins Publishing Company

Studies in Language Variation

Edited by Isabelle Buchstaller Beat Siebenhaar

Studies in Language Variation (SILV) issn 1872-9592 The series aims to include empirical studies of linguistic variation as well as its description, explanation and interpretation in structural, social and cognitive terms. The series will cover any relevant subdiscipline: sociolinguistics, contact linguistics, dialectology, historical linguistics, anthropology/anthropological linguistics. The emphasis will be on linguistic aspects and on the interaction between linguistic and extralinguistic aspects — not on extralinguistic aspects (including language ideology, policy etc.) as such. For an overview of all books published in this series, please see http://benjamins.com/catalog/silv

Editors

Peter Auer, Universität Freiburg
Frans Hinskens, Meertens Instituut & Vrije Universiteit, Amsterdam
Paul Kerswill, University of York

Editorial Board

Jannis K. Androutsopoulos, University of Hamburg
Arto Anttila, Stanford University
Gaetano Berruto, Università di Torino
Paul Boersma, University of Amsterdam
Jenny Cheshire, University of London
Gerard Docherty, Newcastle University
Penny Eckert, Stanford University
William Foley, University of Sydney
Peter Gilles, University of Luxembourg
Barbara Horvath, University of Sydney
Brian Joseph, The Ohio State University
Johannes Kabatek, Universität Zürich
Juhani Klemola, University of Tampere
Miklós Kontra, Károli Gáspár University of the Reformed Church in Hungary
Bernard Laks, CNRS-Université Paris X Nanterre
Maria-Rosa Lloret, Universitat de Barcelona
K. K. Luke, Nanyang Technological University, Singapore
Rajend Mesthrie, University of Cape Town
Pieter Muysken, Radboud University Nijmegen
Marc van Oostendorp, Meertens Institute & Leiden University
Sali Tagliamonte, University of Toronto
Johan Taeldeman, University of Gent
Øystein Vangsnes, University of Tromsø
Juan Villena Ponsoda, Universidad de Málaga

Volume 19

Language Variation – European Perspectives VI Selected papers from the Eighth International Conference on Language Variation in Europe (ICLaVE 8), Leipzig, May 2015

Edited by

Isabelle Buchstaller Beat Siebenhaar University of Leipzig

John Benjamins Publishing Company Amsterdam / Philadelphia

The paper used in this publication meets the minimum requirements of the American National Standard for Information Sciences – Permanence of Paper for Printed Library Materials, ANSI Z39.48-1984.

DOI 10.1075/silv.19
Cataloging-in-Publication Data available from Library of Congress: LCCN 2017007924 (print) / 2017032191 (e-book)
ISBN 978 90 272 3499 5 (Hb)
ISBN 978 90 272 6557 9 (e-book)

© 2017 – John Benjamins B.V. No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher. John Benjamins Publishing Company · https://benjamins.com

Table of contents

Introduction
Isabelle Buchstaller and Beat Siebenhaar

Plenaries

Analytic and synthetic: Typological change in varieties of European languages
Martin Haspelmath and Susanne Maria Michaelis

A case for clustering speakers and linguistic variables: Big issues with smaller samples in language variation
Miriam Meyerhoff and Steffen Klaere

Dynamics, variation and the brain
Jürgen Erich Schmidt

Individual chapters

Aggregate analysis of lexical variation in Galician
Xulio Sousa

Inter-individual variation among young children growing up in a bidialectal community: The acquisition of dialect and standard Dutch vocabulary
R. J. Francot, K. Van den Heuij, E. Blom, W. Heeringa and L. Cornips

The unruly dialect variant [a]: The case of the opening of (ɛ) in the traditional Torsby dialect
Jenny Nilsson and Lena Wenner

Vowel raising and vowel deletion as sociolinguistic variables in Northern Greek
Panayiotis A. Pappas

Between local and standard varieties: Horizontal and vertical convergence and divergence of dialects in Southern Spain
Juan-Andrés Villena-Ponsoda and Matilde Vida-Castro

Syntactic doubling and variation: The case of Romani
Aurore Tirard

Variation in style: Register and lifestyle in Parisian French
Aria Adli

A corpus-based study of concessive conjunctions in three L1-varieties of English
Ole Schützler

Variation in the structure of conjunctions in Luxembourgish German in the 19th century: An interplay of language-internal and contact-induced variation
Rahel Beyer

Geolinguistic documentation of multilingual areas: VerbaAlpina and the challenges of digital humanities (DH)
Susanne Oberholzer and Markus Kunzmann

Variation in Croatian: The verbal behaviour of rural speakers in an urban speech community
Ivana Škevin

Index

Introduction

Isabelle Buchstaller and Beat Siebenhaar
Leipzig University

The International Conference on Language Variation in Europe (ICLaVE) addresses all aspects of linguistic variation observed in languages spoken in present-day Europe. The series aims to bring together scholars of European languages or language varieties with the purpose of discussing empirical, methodological and theoretical issues in the study of language variation and change on the European continent. As such, it is intended to provide a platform for scholars interested in historical linguistics, psycholinguistics, dialectology, sociolinguistics, language acquisition, phonetics, grammatical theory or any other point of view that considers issues that pertain to language variation and change in European languages.

The 8th ICLaVE conference took place from May 27th to May 29th 2015 at Leipzig University. Leipzig’s location at the very heart of Europe, at the crossroads of the Germanic- and Slavic-speaking areas, was seen as ideal for attracting a maximally encompassing range of languages. Indeed, the conference call was taken up with much enthusiasm: We received 181 abstracts featuring thirty-three languages spoken and written in a great wealth of dialects. Out of these submissions, we accepted 151 papers and posters, amounting to an acceptance rate of 83%. We decided not to group the papers by language or language family but by thematic focus, which resulted in fifteen sessions on areas as diverse as language borders, complexity in morpho-syntax, language and migration, acoustic phonetics, and language and the media.
These sessions were complemented by five thematic panels, including Quantitative and qualitative approaches to language (de)standardization (organised by Steff Grondelaers and Jürgen Jaspers), Living on the border between conflicting communities of practice (organised by Corinne Seals), Koines and regional standard varieties (organised by Frans Hinskens, Stavroula Tsiplakou and Juan Villena Ponsoda), Community-based language change (organised by Isabelle Buchstaller and Suzanne Evans Wagner), and Minority languages in Europe (organised by Anne-José Villeneuve and Nanna Haug Hilton). This edited volume contains a selection of these papers.

doi 10.1075/silv.19.int © 2017 John Benjamins Publishing Company

It is time-honoured ICLaVE tradition to invite at least one plenary speaker from the hosting country as well as one from further afield. We were delighted to be able to attract Jürgen Erich Schmidt (Forschungsstelle Deutscher Sprachatlas, Marburg University), whose plenary built on historical German dialect atlas data to examine “Dynamics, variation and the brain”. We also relished the opportunity to tap into the typological expertise at the (since defunct) department of linguistics at the Max Planck Institute for Evolutionary Anthropology by inviting Susanne Maria Michaelis and Martin Haspelmath to speak to us on “Analytic and synthetic: Typological change in varieties of European languages”. Last but not least, we were thrilled that Miriam Meyerhoff (Victoria University of Wellington) agreed to make the long journey to Germany in order to explore “The large and the small of it: Big issues with smaller samples in the study of language variation”. We are particularly happy that all plenary speakers have agreed to publish their contributions as chapters in this volume.

Scope of this volume

The submissions to this collection exemplify the breadth and the variability of research on European languages. The papers encompass languages from north to south (Swedish to Greek) and from west to east (Galician to (again) Greek). The language families represented in this volume include large ones, such as Germanic (varieties of Dutch (Francot, Van den Heuij, Blom, Heeringa and Cornips), German dialects (Schmidt), Swedish dialects (Nilsson and Wenner) and English dialects (Schützler)) as well as Romance (including French (Adli) and Spanish dialects (Villena-Ponsoda and Vida-Castro)). We were particularly delighted to see small languages so well represented in this volume, including Škevin’s chapter on Croatian, Beyer’s research on Luxembourgish, Sousa’s report on Galician, and Tirard’s analysis of Romani. Haspelmath and Michaelis as well as Oberholzer and Kunzmann add a comparative angle, and Meyerhoff investigates English-based contact varieties.

The diversity of methodologies used in the research represented here to explore the many issues related to European languages was not entirely expected, yet highly propitious; the methodologies in this collection span the gamut of approaches represented in the field of linguistics and its allied disciplines. Haspelmath and Michaelis’ analysis is situated in a Greenbergian comparative-typological tradition. Geographical models for analysing patterns of language use feature in a number of chapters, including Villena-Ponsoda and Vida-Castro’s investigation of data derived from the Pasos (Sociolinguistic Patterns of Castilian Spanish) project, Sousa’s work on the Galician atlas (Atlas Lingüístico Galego), Schmidt’s chapter on old data from the Digitaler Wenker-Atlas (DiWA), as well as Oberholzer and Kunzmann’s project, which is based on atlases of the three language families of the Alpine area. Surveys are also well represented, encompassing questionnaires which collect perception data (Škevin, Francot et al.) as well as a sociocultural questionnaire (Adli). While the quantitative analysis of speech collected via relatively unmonitored (semi)structured sociolinguistic interviews forms the basis of a range of submissions (Schützler, Pappas, Meyerhoff, Nilsson and Wenner), the research represented in this volume also relies on other, often highly innovative, types of data such as speech collected via a detective game (Adli), the parallel text corpus consisting of bilingual German/French public notices issued by the City of Luxembourg in the 19th century (Beyer), the newly-developed Limburgish dialect word production task (Francot et al.) and an experiment that asked informants to manipulate and describe a range of objects (Tirard).

Notably, this volume hosts a wealth of research which combines different methods to various ends (see Meyerhoff 2016, Soukup 2016). Adli, for example, analyses production data on the basis of a categorisation derived from survey-based methods. Meyerhoff and Klaere rely on constrained correspondence analysis (CCA), a method not often used in sociolinguistics, to assess patterns across space and speaker status. Francot et al. pair a standardized receptive vocabulary test in Dutch with a newly-developed Limburgish dialect word production task to compare children’s receptive knowledge and use of standard and dialect vocabulary. Schmidt combines a range of older dialect atlases with state-of-the-art brain imaging EEG and ERG.

Given the many languages and dialects represented in this volume, and given the wealth of methodological approaches represented, we decided to group the chapters loosely by the level of linguistic structure investigated.
The collection starts with the three plenary chapters (Haspelmath and Michaelis, Meyerhoff and Klaere, Schmidt), followed by investigations into the lexicon (Sousa, Francot et al.). These are followed by a section on phonetics and phonology (Nilsson and Wenner, Pappas, Villena-Ponsoda and Vida-Castro) and morphology / morphosyntax (Tirard, Adli, Schützler, Beyer). The volume concludes with an investigation that spans different linguistic levels (Oberholzer and Kunzmann) and with the perceptual research reported by Škevin. In the following we will briefly summarize each chapter in turn.

Overview of individual chapters

Plenaries

Haspelmath and Michaelis take a macro-comparative perspective on typological variation across European languages. They argue that a synchronic distinction between analytic and synthetic patterns is problematic because it rests on the concept of the “(auxiliary) word”, which is not well-defined and therefore fundamentally inconsistent across individual languages/linguistic descriptions. The authors propose to reconceptualise the typological contrast as a diachronic process whereby lexical or other concrete material comes into functional competition with (and tends to replace) older (synthetic) patterns. This “analyticising” or “refunctionalising” process is cross-linguistically very widespread and involved in a substantial number of salient grammatical innovations. On the basis of the Atlas of Pidgin and Creole Language Structures, the authors illustrate analyticising developments in a range of creole languages, which all show drastic loss of inflectional markers, their replacement by new function items, and/or the development of novel function items, mostly from earlier lexical roots. Haspelmath and Michaelis argue that this process can be explained on the basis of general principles of contact-induced grammatical change, not only for creole languages, but also for other high-contact varieties of the major language families of Europe. Analyticisations occur in social situations with many adult second-language speakers, when people need to make an extra effort to make themselves understood, i.e. they need to add extra transparency.

Meyerhoff and Klaere’s chapter examines the relationship between variation in the individual and in the group. Since many linguistic sub-fields rely on data from a restricted number of observations and individuals, research that is successful in scaling up our questions across multiple variables and speakers can make small data sets meaningful to the field.
A possible method for scaling up is exemplified via the analysis of the “short fat Bequia corpus”, which contains a lot of information from relatively few (18) speakers coming from three different settlements. Constrained Cluster Analysis, which groups speakers on the basis of multiple variables, reflects the mechanisms by which interlocutors categorise and perceive each other as simultaneously members of groups and as individuals. The question of how intra-individual variation transcends intergroup differences brings to the fore the tensions between the notion of the speech community (which emphasises coherence) and the analysis of individual speaker style (which emphasises individual agency). Meyerhoff and Klaere argue that finding the mechanisms by which variation that is a property of the individual speaker is amplified across speakers to become recognisable as the characteristics of a group, and eventually to differentiate entirely distinct languages, lies at the heart of linguistic enquiry.

Schmidt uses neurolinguistic tests to explain the change and stability that can be found in old linguistic maps. Large-scale collections of German dialect data allow the author to trace back linguistic changes in space and time over a period of more than 130 years. On the basis of these data, Schmidt considers stability, sound change which proceeds one word at a time, as well as cases where phonemes change as a whole. Stable situations can be characterized by a systematic correspondence of the sounds to those of neighbouring dialects, whereas in an unstable case, a conflicting sound distribution in a neighbouring dialect results in word-for-word-change, whereby speakers replace the dialectal form with one closer to the standard. An electroencephalogram (EEG) suggests that in the stable situation the deviant dialectal forms were hardly ever noticed, since the EEG shows hardly any difference between the standard and deviant realizations. In the conflicting case on the other hand, informants noticed phonetic differences in the words of their dialect as compared to the one they hear: they try to process them on the basis of their own phonological system, but the attempt fails. In the case of change of a phoneme as a whole the dialectal system remains intact and maintains its difference from the standard. A neurolinguistic test revealed that this development is due to overlapping phonemes in coexisting varieties (i.e. varieties that speakers switch between in their everyday language usage). This overlap interferes with the ability of the phonemes in the subordinate variety to distinguish meaning. The combination of old dialect maps and neurolinguistic measurements allows Schmidt to show that sound change happens as word-for-word-change as well as phoneme-change.
The strategy which is favoured depends on the linguistic situation, on whether there are phonological conflicts between neighbouring dialects, and their speakers’ neurolinguistic reflexes.

Individual chapters

The recognised boundaries of many language areas are drawn on the basis of phonetic variables. Sousa’s contribution presents an aggregate analysis of lexical variables collated from over one hundred maps from Atlas Lingüístico Galego with the purpose of identifying and characterising the main lexical areas in Galician dialects. Aggregate dialectology makes it possible to identify behavioural patterns which help to account for territorial nuclei of spatial distribution. Cluster analyses identify a set of dialectal areas showing internal similarity and contrast with other lexical spaces or areas within the Galician linguistic territory. Beam maps are used to discover areas of linguistic proximity to other zones belonging to adjacent, closely related linguistic domains within Galician. Overall, there are clear correlations between the distribution of these lexical areas and the findings of previous analyses using traditional procedures based on morphological and phonetic features (Zamora 1953; Carballo 1966; Fernández 1994). Sousa’s study provides solid evidence of the usefulness of quantitative dialectology for studying lexical data. It also supports the author’s claim that the geolinguistic analysis of lexical variation ought to be incorporated into descriptions of language varieties within the domain of Galician.

Francot et al. rely on a unique combination of production and perception tasks. Their study has both a scientific and an applied goal and sets out (i) to explore whether it is possible to distinguish between monolingual and bidialectal children and (ii) to assess whether children raised with a dialect in Limburg experience more problems acquiring standard Dutch vocabulary than monolingual Dutch-speaking children. Their results reveal that each child shows a unique pattern of responses to the dialect production task, so it is not possible to draw a clear distinction between the bidialectal and monolingual children. Furthermore, there is no significant correlation between children’s propensity to use dialect vocabulary and their vocabulary knowledge in Dutch. These findings suggest that being raised in a dialect neither hinders nor facilitates the knowledge of standard Dutch vocabulary.

Nilsson and Wenner present a real and an apparent time investigation of the variable (ɛ), focusing on the use of the dialectal variant [a] in the small rural village of Torsby. Contrary to the levelling of most other dialect features, [a] has increased significantly in Torsby over the last 70 years, spreading both across the linguistic system and through the speech community.
Examining the social indexicality of [a] suggests that the opening of short /ɛ/ expresses tradition and authentic local identity in Torsby and has become enregistered (see also Johnstone et al. 2006). This is contrary to the opening of the variable (ɛ:), which signals urbanity and is found in large parts of Sweden, including the nearby town of Karlstad (Leinonen 2010; Svahn and Nilsson 2014). By triangulating production data with fieldwork notes, ethnographic interviews and data collected via questionnaires, the authors find that Torsby citizens are proud of their heritage, and that speaking the dialect is a very important part in maintaining an authentic local identity. They hypothesize that the motivation to signal tradition and local identity in Torsby may be an effect of increasing contact with other Swedish varieties over the past 70 years, similar to the processes Labov (1972) found in Martha’s Vineyard.

Pappas’ research investigates the linguistic effects of de-urbanization as a result of severe economic recession in Greece. Comparing two groups of middle-aged speakers on the island of Thassos who are differentiated by whether or not they had lived for a substantial period in an urban centre, Pappas demonstrates that urbanisation leads to dialect attrition and loss. The chapter focuses on unstressed vowel deletion and vowel raising, both of which are socially embedded in Northern Greek. Even though the usage of standard forms is very close to categorical, Pappas presents quantitative evidence that the use of standard variants indexes more advanced education and an orientation towards an urban lifestyle. Deletion in particular carries more stigma than raising. Therefore, speakers who have moved to an urban centre, and who are more positively orientated towards the standard, lead in the avoidance of the most stigmatized feature of the dialect, i.e. deletion. Since the stigma against raising is not as strong, it is avoided the most by those speakers who, typically, lead in the adoption of the standard: Women who are less attached to their local community have been shown to be leaders of such changes (cf. Labov 1972, 2001).

Villena-Ponsoda and Vida-Castro’s chapter reports on the research project ‘Sociolinguistic Patterns of Castilian Spanish’ (Pasos). Horizontal levelling of varieties in urban Andalusia has resulted in convergence towards the national standard – particularly in Eastern Andalusia, which is far from the influence of the urban regional standard of Seville. At the same time, this vertical process has brought Andalusian and central Castilian varieties – as well as transitional dialects – closer by eliminating the most vernacular features. The chapter provides evidence of the formation of an intermediate regional variety between central and standard Castilian Spanish on the one hand, and southern innovative dialects on the other. The resulting intermediate variety, which has gradually been emerging in the urban centres of east Andalusia, is a koine blending innovative non-standard phonological traits with standard features. It thus maintains some of the Andalusian phonologically unmarked features affecting codas, but diverges from the Seville regional standard since it adopts overt-prestige marked features from the national standard.
This intermediate variety is relatively stable.

Tirard analyses polydefiniteness in the Albanian Romani NP, where full doubling of the article is possible only with definite articles and in the presence of a postposed attributive adjective. The author presents results from a task which was designed to trigger doubling constructions (dnda) in contrastive contexts. The language contact history of the Albanian Romani varieties suggests that polydefiniteness can be interpreted as a pattern replication from Greek that resulted in a new order. On the basis of these findings, Tirard postulates that the Romani doubled construction (dnda) is a bridge from the canonical word order (dan) to a new one (dna). Notably, the community is split into subgroups who are involved in different patterns of language change: Mečkar and Čergar groups experience a pattern of stability since they have already completed the change toward dnda and dna. The Arli community on the other hand is split by age: the older Arli are stable but middle-aged Arli seem to exhibit a pattern of lifespan change, since the speakers of this cohort have individually changed in the direction of the rest of the community. The patterning of the younger Arli, finally, can be interpreted either as age grading or indeed as generational change. Real-time data from the next generation will be needed to substantiate one or the other interpretation.

Adli’s contribution explores the usefulness of Bourdieu’s (1979) sociocultural theory for language variation in Parisian French. A complex questionnaire allows the author to stratify the informant pool based on their preferences in the areas of leisure, media, and clothing, which results in a complex composite index. Differences in terms of a person’s lifestyle are then correlated with their use of subject doubling and subject-verb inversion in spontaneous speech. This mixed methodology allows for a consistent explanation of the data, which reveals a clear distinction between the groups. While the excitement-seeking, down-to-earth lifestyle (orthodoxy) defends the norms of standard French, refraining from doubling and making frequent use of the formal inverted interrogative variant, the educated liberal lifestyle (heterodoxy) does the contrary. While neither education nor occupation turns out to be a significant variable in an ANOVA, an analysis that considers Bourdieu’s notions of lifestyle and cultural capital allows Adli to demonstrate the salient complementary pattern of inversion questions and subject doubling across the two lifestyle types.

Schützler’s chapter presents an analysis of the concessive conjunctions although, though and even though in British, Canadian and New Zealand English. The investigation, which is based on the International Corpus of English, highlights the underexplored semantic characteristics of the conjunctions, each of which is associated with clear preferences concerning the semantic types of concessives they encode.
Notably, the difference between although and though is significant only in the oldest variety, BrE, which suggests that different semantic patterns between conjunctions may take a long time to evolve. The paper also provides evidence for inter- and intra-varietal stability: the semantic properties of conjunctions are very similar, not only in the three varieties under investigation, but also in speech and writing. Introducing a scale of subjectivity allows the author to suggest that constructions that are generally high in subjectivity tend to be even more subjective in speech, and constructions that are low in subjectivity tend to be even less subjective in speech. This synchronic pattern may be a symptom of ongoing semantic change led by spoken registers.

Beyer’s contribution examines the role of contact between different varieties of German and French in the standardization of Luxembourgish German. Particular focuses of this project are contact-induced variation, interlingual transfer and norm selection, i.e. the increasingly standardised choices made amongst competing conjunctions (Im Fall(e)(,) dass/falls ‘in case (that)’, and the use of wann ‘when’ as a conjunction). The analysis relies on a rare corpus of bilingual public notices of the City of Luxembourg which span the years 1795–1920. Mining this corpus allows Beyer to show the influence of political events on language; for example the “French period” (1795–1814), during which the Grand-Duchy was governed by a ruler with a clear language preference, shows the most variation. Subsequent periods show increasing standardisation, a concomitant decrease of the overall number of alternatives and less frequent use of these alternatives. These findings suggest that replication processes are sources of increased variation. However, (i) even during the French period, structural patterns are transferred only occasionally and (ii) the exploitation of the identified correspondences can only be observed as long as one of the languages clearly has a higher status and more prestige than the other.

Oberholzer and Kunzmann report on the VerbaAlpina project, which aims to overcome a problem in traditional European geolinguistics: monolingual atlases, dictionaries and monographs usually survey only one particular dialectal region, and thus do not allow the exploration of linguistic phenomena that transgress dialectal and linguistic borders. Differing transcription systems and inconsistent conceptual descriptions have further hindered the direct comparison of different linguistic atlases. The VerbaAlpina project represents the three language families used in the Alpine area (Germanic, Slavic and Romance) with their corresponding dialects in one consistent environment, offering an interlingual overview of the Alpine region based on a wealth of different sources. The chapter focuses on the methodological challenges of transferring different repositories into one consistent and homogeneous structure.
Innovations include: (i) a relational database which connects etymological roots (“basic type”) with various morpho-lexical types, to explore historical linguistic relations, and (ii) the use of Beta Code (a graphematic transcription based on ASCII-symbols), which enables the comparison of data from different sources, provides a coherent transcription system for all data sources, and ascertains the project’s sustainable compatibility with a wealth of repositories. Škevin’s questionnaire-based study investigates the responses of forty-five young rural university students from (island, coast, and hinterland) areas surrounding the Croatian coastal town of Zadar. Appealing to Le Page and Tabouret-Keller’s (1985) acts of identity and the markedness model, Škevin demonstrates that participants dynamically structure their verbal behaviour according to their interlocutors and the interactional situation they find themselves in, positioning themselves relative to each other in establishing a new relationship, or in already-established relationships (see also Coupland 2007: 113–4). Moreover, well-known language ideologies structure linguistic behaviour: island respondents in particular report that they accommodate more, because they perceive their own

variety as different from standard Croatian, and they know that their dialects are perceived as “funny”. Moreover, women report avoiding their hinterland variety and preferring regional Dalmatian forms or the standard (see Eckert and McConnell-Ginet 2003; Trudgill’s 1972: 179 concept of ‘covert prestige’).


Plenaries

Analytic and synthetic
Typological change in varieties of European languages

Martin Haspelmath and Susanne Maria Michaelis
MPI-SHH Jena / Leipzig University

It has long been observed that the modern European languages use more function words compared to earlier inflectional patterns, and this trend seems to have increased even further in creoles and other non-standard varieties. Here we make two arguments: First, we note that the terms synthetic and analytic are based on the “word” concept, which is not well-defined, so that these concepts cannot be used in a synchronic typology. But we can define a notion of “analyticization”, i.e. the replacement of an earlier pattern by a new, more elaborate pattern based on lexical or concrete items. Second, we observe that such analyticizations are particularly common in creole languages (when viewed as continuations of their lexifiers), and we hypothesize that this is due to the extra transparency that is required in situations with many adult second-language speakers.

Keywords: analytic, synthetic, function word, grammaticalization, creole

doi 10.1075/silv.19.01has © 2017 John Benjamins Publishing Company

1. The macro-comparative perspective: Language typology and language contact

Since the early 19th century, linguists have sometimes tried to understand language change from a broader perspective, as affecting the entire character of a language. Since A. W. von Schlegel (1818), it has been commonplace to say that Latin was a synthetic language, while the Romance languages are (more) analytic, i.e. make more use of auxiliary words and periphrastic constructions of various kinds. In this paper, we adopt a macro-comparative perspective on language variation in Europe, corresponding to our background in the world-wide typology of contact languages (Michaelis et al. 2013) and general world-wide typology (Haspelmath et al. 2005). While variationist studies typically ask for patterns of variation within


a single language, we ask whether there is a “big picture” in addition to all the details, in the tradition of A. W. von Schlegel. In particular, the topic of this paper is the replacement of synthetic patterns by analytic patterns that has interested typologists and historical linguists since the 19th century and that has recently been the focus of some prominent research in variationist studies of English and English-lexified creoles (Szmrecsanyi 2009; Kortmann and Szmrecsanyi 2011; Siegel et al. 2014; Szmrecsanyi 2016). By way of a first simple illustration, consider the examples in Table 1, where the symbol “>>” means that the newer pattern on the right-hand side competes with and tends to replace the older pattern on the left-hand side. Most of the changes that we will talk about are well-known and have been widely discussed: development of a new prepositional genitive (as in German), development of a new auxiliary-based past tense (as in French), development of a new particle-based comparative form (as in Modern Greek); the loss of the old plural form in creoles such as French-based Seychelles Creole, and its replacement by forms such as bann (from French bande ‘group’) is less well-known, but also falls into the class of analyticizations. (The replacement of the definite article la by the demonstrative sa can also be seen as an example of this.)

Table 1. Some illustrative cases of synthetic and analytic patterns

                            synthetic (old)                                  analytic (new)
German                      des Hauses ‘the house’s’                     >>  von dem Haus ‘of the house’
French                      Edith chanta ‘Edith sang’                    >>  Edith a chanté ‘Edith has sung’
Modern Greek                oreó-tero ‘nicer’                            >>  pjo oréo ‘more nice’
French > Seychelles Creole  la femme / les femmes ‘the woman/the women’  >>  sa fanm / sa bann fanm ‘the woman/the women’

In this paper, we make three main points. First, we discuss the basic question of how to distinguish analytic and synthetic patterns in the first place, noting that the distinction, if understood synchronically, rests on the concept of “(auxiliary) word”, which is not well-defined except in a trivial orthographic sense (§ 3). But there is no question that a diachronic process of “analyticizing” or “refunctionalizing” is widespread and is involved in a substantial number of salient grammatical innovations (§ 4). Second, we highlight the strongly analyticizing developments in creole languages, based on examples from the Atlas of Pidgin and Creole Language Structures (Michaelis et al. 2013). Compared to other Romance varieties, especially of course

the standard varieties, all creoles show drastic loss of inflectional markers, their replacement by new function items, and/or the development of novel function items, mostly from earlier lexical roots (§ 5). Third, we propose an explanation of these developments on the basis of the contact history of these languages, invoking general principles of contact-induced grammatical change (§ 6). The basic idea is that analyticizations are due to the increased need for clarity when a language has many speakers who learn it as adults. We go on to ask whether similar differences can be found within some of the major language families of Europe (e.g. with French being more analytic than Spanish, or Bulgarian more analytic than Russian), or even within the major languages (with some vernacular varieties being more analytic than the standard varieties). Before getting to these three main points in §§ 3–6, we briefly discuss the history of the analytic/synthetic terminology.

2. A short history of the analytic/synthetic terminology

The terms analytic and synthetic, as they are still used today, were coined by von Schlegel (1818). He conceived of them as two subtypes of the class of inflecting languages (comprising all Indo-European languages), which was opposed to the classes of agglutinating languages (such as Turkish) and isolating languages (such as Chinese). In the early 19th century, there was generally a value judgement associated with language classification. More highly inflected languages were regarded as superior to agglutinating and isolating languages. From a modern, strictly linguistic point of view, analytic patterns are not really different from isolating patterns, but in earlier times there was a general feeling that the situation in Romance languages should not be compared to Chinese (after all, Romance languages still have fairly rich verb inflections), so the terms analytic and synthetic survived.¹

Typological considerations had little prestige in the second half of the 19th century, but linguists became increasingly skeptical about the value judgements that were originally associated with them. Otto Jespersen even claimed that the modern analytic languages (such as English) were superior to the cumbersome classical languages (most clearly expressed in his 1894 book Progress in Language). In the early 20th century, Sapir (1921) tried to apply the typological notions from the 19th century to the North American languages that he had studied, but typology became popular again only with Joseph Greenberg’s work in the 1960s.

1. The morphological types analytic, agglutinating etc. have also been closely linked to the idea of a general cyclic or spiral-like development, as discussed from a modern perspective in Haspelmath (2017).


Greenberg became best-known for the word-order correlations that he discovered and popularized, but in 1960 he also published the first paper that approached typological distinctions quantitatively, by computing an analyticity index for different languages. The idea that whole languages could be classified into categories such as analytic or agglutinating was gradually abandoned in the latter part of the 20th century, but linguists still needed to distinguish between patterns such as des Hauses and von dem Haus, so they described the latter as analytic constructions. Since the 1990s, interest in the development of such constructions that use grammatical words has increased greatly, but generally under the rubric of grammaticalization (e.g. Lehmann 2016[1982]; Hopper and Traugott 1993). The term analytic construction has remained in use, but not prominently, and there has been very little general research on analyticization that uses this term (in fact, our dynamic term analyticization is likely to be unfamiliar to most linguists). In the 2000s, it became popular to consider whole languages from the point of view of “complexity” (e.g. McWhorter 2001; Miestamo et al. 2008), which is not unrelated to analyticity.

3. Analytic/synthetic as a synchronic notion

Before moving on, we need to point out that the term pair analytic/synthetic is problematic because it is based on the notion of a “word”, which is itself poorly defined. In Matthews’s (1997) Concise Dictionary of Linguistics, the terms are defined as in (1a) and (1b). In addition to presupposing a notion of “word”, they presuppose “inflection” (1c), a concept that is itself based on “lexical unit” (= “word”).

(1) a. analytic form
       “form in which separate words realize grammatical distinctions that in other languages may be realized by inflections”
    b. synthetic form
       “form in which grammatical distinctions are realized by inflections”
    c. inflection
       “any form or change of form which distinguishes different grammatical forms of the same lexical unit”

That the notion of “word” cannot be defined consistently across languages (other than orthographically, in languages with spaces between words) has long been known and was recently highlighted again by Haspelmath (2011) and Michaelis (2015). Spelling systems are arbitrary (historically accidental) to a substantial degree, so comparative linguists can hardly rely on them. But while the difference

between the written unit “letter” and strictly linguistic units such as ‘segment’ and ‘phoneme’ is universally recognized, linguists often still seem to think that a written unit separated by spaces (“word”) must reflect an important grammatical unit in (spoken) languages. Siegel et al.’s (2014) study of analyticity and syntheticity in creoles, for example, attempts to base its analysis on the distinction between “free grammatical markers” and “bound grammatical markers” (along the lines of Greenberg 1960). The authors claim that the writing systems of the languages they are considering “generally represent a free marker as a separate word and a bound marker as a part of another word. Thus, we use this conventional orthography as a basis for our analysis.” They are aware that there is a problem, but they do not actually do anything about it:

    Of course, this is not ideal, as there is not necessarily an unequivocal relationship between spelling conventions and language structure (Haspelmath 2011). However, as a detailed phonological and morphosyntactic analysis of each language’s texts would not be feasible, it is the best option. (Siegel et al. 2014: 53)

But we feel that the problem is actually deeper than they imply, because there is no rigorous definition of “word” and “affix” that would coincide with linguists’ intuition and also be based on “a detailed phonological and morphological analysis of languages”. Apparently our intuitions are based on the best known spelling systems, and these are not based on anything truly systematic.² (Note also that “bound” cannot be equated with “affixal”, because clitics are generally regarded as bound elements that are not affixes.) However, in the next section we will see that the tendency to replace older synthetic patterns by newer analytic patterns is real, despite the definitional difficulty noted here.

2. A reviewer asked whether the notion of phonological word might be helpful, but as Schiering et al. (2010) have shown, the diverse criteria for identifying phonological word domains typically do not coincide, so there are as many phonological words as criteria. Another reviewer asked whether the synchronic notion synthetic could not be based on some other notion such as uninterruptability. The answer is found in Haspelmath (2011): “No other notion gives a good match with the folk concept of the “word”; in particular, most function items (e.g. prepositions and auxiliaries) are uninterruptable in the sense that they cannot be separated from the form they occur next to.”


4. Synthetic/analytic in diachrony

Even if we cannot say whether a construction is synchronically synthetic, due to our inability to define “word” across languages, we can often determine whether it is diachronically old or freshly created from new material. Thus, the English -er comparative (e.g. likeli-er) is clearly old, while the more comparative (more likely) is clearly innovative; and the French simple synthetic future (e.g. elle chante-r-a ‘she will sing’) is old, while the aller future (e.g. elle va chanter ‘she is going to sing’) is clearly innovative. Thus, from a diachronic perspective it is possible to define an analytic pattern as in (2).

(2) analytic pattern: a morphosyntactic pattern that was created from lexical or other concrete material and that is in functional competition with (and tends to replace) an older (synthetic) pattern

This means that if we are dealing with a language whose history is totally unknown, we cannot classify its patterns as synthetic or analytic. Since we know the history of French, we can perhaps say that it is “analytic” in comparison with Latin, but for a language such as Hup in Amazonia (Epps 2008) or Oko in Nigeria (Atoyebi 2010), about whose history we know very little, we cannot say whether they are analytic or synthetic. We are thus primarily talking about a diachronic process of analyticization or “refunctionalization”. The latter term is intended to refer to the new creation of a functional morpheme to express approximately the same grammatical notion or construction that has earlier been expressed by some other construction.³

3. Of course, there is no need for a new grammatical pattern to express exactly the same meaning as an older one, and indeed we typically seem to find meaning differences, e.g. between the older French synthetic future and the newer aller future (cf. Reinöhl and Himmelmann 2017), or between the German genitive and the possessive construction with von, or between the English definite article and the Sranan definite article a (cf. § 5.1 below). The “identity” of the older and the newer patterns is not more than a relationship of typological matching, in terms of typological comparative concepts (Haspelmath 2010). The notion of “replacement” must be seen in this light, and it cannot be inferred that the newer construction is completely identical in function with the older construction.

This restriction of the term analytic to a diachronic context may strike some readers as odd, but there is a general tendency in language typology to view typological generalizations as primarily diachronic (e.g. Bybee 2006; Cristofaro 2012), so even if we are not optimistic about pursuing Greenberg’s (1960) research programme of a quantitative synchronic typology in terms of degrees of analyticity, we are convinced that there are strong general tendencies toward analyticization in language change. Thus, the term analytic should be understood as roughly meaning “freshly re-grammaticalized”. This definition works, because all patterns that have traditionally been called “analytic” are known to have been created from lexical or other concrete material; there does not seem to be any other way in which such patterns can come about. This definition is somewhat broader than the traditional purely synchronic definition, in that it also includes cases like the English past-tense marker -ed as in play-ed, which is generally thought to be a much newer pattern than the old pattern represented by ablauting verbs such as sing/sang, write/wrote (e.g. Lahiri 2000), and cases like the Sranan definite article a (deriving from da < dat < that), which is an analytic form when compared to English the, because it is based on a refunctionalization. On the other hand, our definition of analytic is somewhat narrower than the synchronic view in that grammatical morphemes with no earlier counterparts, such as the Germanic and Romance definite articles, cannot be regarded as analytic. Even though the rise of definite articles in Romance and Germanic languages (and more generally in European languages, cf. Heine and Kuteva 2006: Chapter 3) has often been seen as part of the general tendency toward analyticity, we think that they should be treated differently, because otherwise we would have to include all kinds of other grammatical elements (e.g. discourse particles, focus particles, the new conditional mood) as well. Changes whereby an earlier more compact pattern is replaced by a newer pattern based on a new periphrasis are quite widespread in European languages. A few types of changes are listed in (3), where the symbol “>>” means that the (approximate) function of a pattern tends to be replaced by a new pattern based on lexical or other concrete material.

(3) a. cases >> prepositions
    b. comparative suffixes >> analytical comparatives
    c. synthetic past tense >> analytic past tense
    d. synthetic future tense >> analytic future tense
    e. person-number suffixes on verbs >> personal pronouns
    f. infinitive >> person-indexed subjunctive forms (Balkan languages)

In all these cases, the new analytic forms (originally) include transparent components, i.e. they arise from grammaticalization of concrete or lexical items. In a next step, analytic forms may then become compact again, be written together and thus become “synthetic again” (a process which can be called anasynthesis, cf. Haspelmath 2017), e.g.


(4) anasynthesis in the Romance future (Spanish):
    Latin canta-bit ‘will sing’ >> Romance cantare habet ‘has to sing’ > cantare ha > cantar-á ‘will sing’

(5) anasynthesis in the Romance adverb (Italian):
    Latin fidel-iter ‘faithfully’ >> Romance fideli mente > fedelmente ‘faithfully’

As mentioned earlier, analytic forms never arise in any other way; in particular, they do not arise from “desyntheticization” of a synthetic form (“antigrammaticalization” does not exist, with very few exceptions; Haspelmath 2004). Just as new grammatical patterns can arise without any precursors (e.g. definite articles), old synthetic forms can disappear without any replacement, e.g. the old gender inflection in English adjectives, or the dual of older Germanic languages. While analyticizations are really common in European languages, we will see in the next section that they are found even more widely in creole languages based on European languages.⁴

5. Analyticizations occur very commonly in creoles

In this section, we will give examples of analyticizations in creole languages, from the database of the Atlas of Pidgin and Creole Language Structures (APiCS, Michaelis et al. 2013). The creoles we mention are all based on Germanic languages (English, Dutch) or Romance languages (Portuguese, Spanish, French). Table 2 lists the creoles from which our examples come, together with their major lexifier and the corresponding APiCS contribution.

4. Note that we do not follow authors such as Bickerton (1981) and Thomason and Kaufman (1988) in regarding creole languages as “new languages” with no historical continuity with their lexifier languages. Our position in this paper is more in line with authors such as Mufwene (2001) and Ansaldo et al. (2007), who stress the similarities between creolization and other kinds of contact-induced historical change.

Table 2. Creoles and their lexifiers in APiCS

creole language           lexifier    author in APiCS
African American English  English     Green 2013
Batavia Creole            Portuguese  Maurer 2013
Bislama                   English     Meyerhoff 2013
Creolese                  English     Devonish & Thompson 2013
Diu Indo-Portuguese       Portuguese  Cardoso 2013
Guadeloupean Creole       French      Colot & Ludwig 2013
Guinea-Bissau Kriyol      Portuguese  Intumbo et al. 2013
Guyanais                  French      Pfänder 2013
Haitian Creole            French      Fattier 2012
Jamaican                  English     Farquharson 2013
Kriol                     English     Schultze-Berndt & Angelo 2013
Mauritian Creole          French      Baker & Kriegel 2013
Negerhollands             Dutch       van Sluijs 2013
Palenquero                Spanish     Schwegler 2013
Papiá Kristang            Portuguese  Baxter 2013
Principense               Portuguese  Maurer 2013
Santome                   Portuguese  Hagemeijer 2013
Seychelles Creole         French      Michaelis & Rosalie 2013
Sranan                    English     Bruyn & van den Berg 2013
Sri Lanka Portuguese      Portuguese  Smith 2013
Tayo                      French      Ehrhart & Revis 2013
Ternate Chabacano         Spanish     Sippola 2013
Tok Pisin                 English     Siegel & Smith 2013
Vincentian Creole         English     Prescod 2013

The APiCS numbers following the subsection headings below are the feature numbers where the relevant information can be found.

5.1 Definite articles (APiCS 28, 9)

(6) Sranan a (e.g. a pikin ‘the child’) < da < English that (Bruyn 2009)

(7) Kriol thet/thad (e.g. thad lif ‘the leaf’) < English that:
    i. Thad lif  pat  bla  mukarra,       im  gud-wan  bla so.
       dem  leaf part poss river.pandanus 3sg good-adj dat sore
       ‘The leaf of the river pandanus is good for sores.’ (Schultze-Berndt and Angelo 2013)

(8) Haitian Creole =la (e.g. nouvel=la ‘the news’) < French là ‘there’


5.2 Indefinite articles (APiCS 29, 10)

(9) Sranan wan < English one (Bruyn 2009), also in other English-lexified creoles

(10) Guinea-Bissau Kriyol utru < Portuguese outro ‘other’
     i. utru omi musulmanu
        a    man Muslim
        ‘a Muslim man’ (Intumbo et al. 2013)

5.3 Plural markers (APiCS 22, 23)

(11) Seychelles Creole bann < French bande ‘group’
     i. Tou sa  bann landrwa mon ’n  ale.
        all dem pl   place   1sg prf go
        ‘It’s to all these places that I have been.’ (Michaelis and Rosalie 2013)

(12) Tok Pisin ol < English all

(13) Diu tud < Portuguese tudo ‘all’ (also Tayo tule < tous les)

5.4 Genitive markers (APiCS 38, 37)

(14) Vincentian Creole fo ‘of’ < English for (also in other English-lexified Caribbean creoles)
     i. di  pikni fo  di  woman
        art child for art woman
        ‘the woman’s child(ren)’ (Prescod 2013)

(15) Tok Pisin bilong < (that) belong (to) (also Bislama blong, Kriol bla)

(16) Seychelles Creole pour, Tayo pu < French pour ‘for’

5.5 Personal pronouns in subject or possessor function (APiCS 62)

(17) Santome obligatory subject person forms (cf. Portuguese optional subject pronouns)
     i. Bô  na  sêbê kuma bô  so  kota mu      mon  fa?
        2sg neg know comp 2sg foc cut  1sg.obj hand neg
        ‘Don’t you know that it was you who cut my hand off?’ (Hagemeijer 2013)

(18) Palenquero obligatory subject person forms (cf. Spanish optional subject pronouns)
     i. Ele e  ta trabahá.
        he  he is working
        (Schwegler 2013)

(19) Diu Indo-Portuguese d-el ‘his’, lit. ‘of him’ (cf. Portuguese possessive pronoun seu/sua)

(20) Guadeloupean Creole timoun an mwen [child of me] ‘my child(ren)’

5.6 Accusative markers (APiCS 57)

(21) Batavia Creole kung, Papiá Kristang ku (< Portuguese com ‘with’), Ternate Chabacano con (< Spanish con ‘with’)

(22) Sri Lanka Portuguese -pa (< Portuguese para ‘for’) (cf. also Afrikaans vir < voor ‘for’)
     i. eev vosa    kuɲaadu-pa         kada  ɔɔra ki-lembraa
        1sg 2sg.gen brother.in.law-acc every time hab-think.of
        ‘I often think of your brother-in-law.’ (Smith 2013)

5.7 Dative markers (APiCS 60, 61)

(23) Bislama long (< English along), cf. also Kriol langa, la

(24) Mauritian Creole avek/ek (< French avec ‘with’)
     i. (av)ek ki  sanla    to’n    don  larzaṅ la?
        with   who that.one 2sg.pfv give money  def
        ‘To whom have you given the money?’ (Baker and Kriegel 2013)

(25) Diu Indo-Portuguese pe (< Portuguese para)

(26) Papiá Kristang ku, Batavia Creole kung, Chabacano con/kon (cf. § 5.6)

5.8 Future tense markers (cf. APiCS 48)

(27) Negerhollands lo < loo ‘go’ < Dutch lopen ‘run’
     i. Morək    mi  lō  lō.
        tomorrow 1sg fut go
        ‘Tomorrow I will go.’ (van Sluijs 2013)

(28) Seychelles Creole pou < French (être) pour

(29) Tok Pisin bai < English by and by


5.9 Past tense (or anterior) markers (APiCS 45)

(30) Seychelles Creole ti < French était ‘was’

(31) Jamaican wehn < English been (also in many other English-lexified creoles)

(32) Principense tava < Portuguese estava ‘was’

(33) Batavia Creole dja (perfective marker) < Portuguese já ‘already’
     i. fala kung ile ki   eo  dja teng aki
        tell obj  3sg comp 1sg pfv be   here
        ‘tell him that I have been here’ (Maurer 2013a)

5.10 Imperfective aspect markers (APiCS 46, 47, 48)

(34) Early Sranan de (prog) < English there (also in other Atlantic English-lexified creoles)
     i. Hangri de   killi mi.
        hunger prog kill  1sg
        ‘I am hungry (lit. Hunger is killing me).’ (Bruyn & van den Berg 2013)

(35) Tok Pisin i stap (prog) < English stop

(36) Seychelles Creole pe, Haitian Creole ap (prog) < French (être) après ‘near, about (to do)’

(37) Haitian Creole konn (hab) < French connaître ‘know’

(38) Palenquero asé (hab) < Spanish hacer ‘do’ (cf. Gullah duhz)

5.11 Causative construction

(39) Seychelles Creole fer < French faire ‘do’
     Mon fer Zan manze vs French Je fais manger Jean ‘I make Jean eat’

The Seychelles construction in (39) uses the same lexical construction as the older French construction, and the older construction is not normally regarded as “synthetic”. But by our definition, the Seychelles Creole pattern qualifies as analytic because it is clearly a new creation, as can be seen from the word order, where the causee (Zan) stands between the causative verb fer and the caused verb (manze). If the Creole construction continued the French construction, this ordering would not be possible. Thus, we see that creole languages have a substantial additional number of analyticizations. In the next section we propose an explanation for this and link it to language contact in more general terms.

6. Analyticization is generally favoured by language-contact situations

The idea that analyticization is favoured by language contact is not new. In fact, in a sketchy form it is found in the very first work that discussed the analytic/synthetic distinction, August Wilhelm von Schlegel’s (1818) work on the Provençal language (and its literature):

    Mais cette transition au système analytique a lieu bien plus rapidement, et, pour ainsi dire, par secousses, lorsque, par l’effet de la conquête, il existe un conflit entre deux langues, celle des conquérans et celle des anciens habitans du pays. Voilà ce qui a eu lieu dans les provinces de l’empire occidental, conquises par les peuples germaniques, et en Angleterre lors de l’invasion des Normands. De la lutte prolongée de deux langues, dont l’une étoit celle de la grande masse de la population, l’autre celle de la nation prépondérante, et de l’amalgame final des langues et des peuples, sont issus le provençal, l’italien, l’espagnol, le portugais, le françois et l’anglais.⁵

More recently, Carlier et al. (2012) express it in the following way:

    The more languages spread over large populations and involve frequent language contact between individuals who are related to each other by weak ties, the faster languages may evolve by regularizing mechanisms, ultimately also reducing their morphological and grammatical systems. (Carlier, De Mulder & Lamiroy 2012: 292, citing Lupyan & Dale 2010; Trudgill 2011; see also McWhorter 2007)

But what explains increased analyticization in situations of increased contact? We can contrast two possible explanations for the increased tendency to analyticize in situations of language contact, which we call the “Loss-and-Repair Hypothesis” (cf. 40) and the “Extra-Transparency Hypothesis” (cf. 41). We will argue below that the second hypothesis is the more adequate explanation. But the Loss-and-Repair Hypothesis does not seem implausible either at first blush. In fact, the idea that languages tend to undergo “decay” and therefore need fresh material to reconstitute their grammar is quite old, going back at least to Schleicher (1860).

5. “But this transition to the analytic system took place more rapidly, and, so to speak, by jolts, when, due to conquest, a conflict between two languages arises, the language of the conquerors and the language of the earlier inhabitants of the country. This took place in the provinces of the Western Roman empire which were conquered by Germanic peoples, and in England after the Norman invasion. The extended struggle between two languages, one of which was the language of the great majority of the people, and the other the language of the ruling group, and the eventual merging of the languages and the peoples gave rise to Provençal, Italian, Spanish, Portuguese, French and English.”


Martin Haspelmath and Susanne Maria Michaelis

(40) Loss-and-Repair Hypothesis (e.g. Siegel 2008: 65–66; Good 2012)
In the transmission bottleneck of pidginization, inflectional and other non-salient grammatical markers are lost, because they cannot be acquired by adult learners. This leaves a void, and when pidgins turn into fully-fledged languages again, they need to fill the gaps by new material deriving from content words.

This hypothesis is similar to the therapeutic view of grammaticalization, which has been shown to be wrong for grammaticalization in general (cf. Lehmann 1985; Haspelmath 2000). It simply cannot be the case, for a number of reasons, that grammatical forms first reduce and then need to be strengthened again. Thus, we favour the following hypothesis:

(41) Extra-Transparency Hypothesis
In social situations with many (or even mostly) adult second-language speakers, people need to make an extra effort to make themselves understood – they need to add extra transparency. This naturally leads to the overuse of content items for grammatical meanings, which may become fixed when more and more speakers adopt the innovative uses.

This is similar to the extravagance-based view of grammaticalization, which offers the best account of unidirectionality (Haspelmath 1999). Grammaticalization in ordinary situations is explained as due to occasional extravagant language use, when no special social circumstances are present. But in high-contact situations, no appeal to extravagance is necessary, and extra clarity can explain the stronger tendency to functionalize content items. Creolists have recently tended to focus on transparency (Seuren and Wekker 1986; Leufkens 2013), simplification (McWhorter 2001; 2007; Parkvall 2008) or on the uniqueness of creole languages (Bakker et al. 2011), not on particularly fast grammaticalization. But the idea that many of the changes observed in creoles can be seen as accelerated grammaticalization has been expressed earlier (cf. Plag 2002), and despite some problems in distinguishing between true innovative grammaticalization and simple constructional calquing (cf. Bruyn 2009), we think that it is basically correct. It is clear that simplification by adult learners cannot be invoked in all cases of analyticization, because this also occurs when the older synthetic form is simple to begin with (e.g. the >> that in § 4.1, faire manger Jean >> fer Zan manze in § 4.11, a >> para (dative marker) in Brazilian Portuguese, de >> pou (genitive marker) in § 4.4). And analyticization also occurs in languages that do not have a high number of adult second language speakers, so something like extravagance needs to play a role in any event.

Analytic and synthetic 17

If extra transparency is the explanation for the very high degree of analyticization in creoles, then we may expect to find evidence for this also in languages that have undergone less extreme language contact changes, so this is what we briefly consider in the next section.

7. Further examples of increased analyticity in European varieties

Increased analyticity is apparently also found in some varieties of European languages that have undergone more contact influence than closely related varieties. In this section we give a few examples. Much more in-depth study would be required to really establish this, but we would like to include these examples because we believe that the creoles of APiCS are not completely unique, but are just more extreme cases of a kind of phenomenon that is also found elsewhere. In particular, increased analyticity is found in a range of constructions in two varieties that have been called “semi-creoles”, Afrikaans and Colloquial Brazilian Portuguese (cf. Holm 2004).

7.1 Increased analyticity in Afrikaans

In Afrikaans, the old past tense disappeared and was replaced by the ‘have’ perfect (42), the dative is expressed by the preposition vir (43), and the genitive is exclusively expressed by the preposition van (44). For discussion, see Holm (2004).

(42) past tense
     Ek het geskryf.
     ‘I wrote, I have written’

(43) vir-dative
     Hy het dit gister vir sy broer gewys.
     he has this yesterday to his brother shown
     ‘He showed it to his brother yesterday.’

(44) possessive van
     de werken van Vondel
     ‘Vondel’s works’ (cf. Dutch Vondel’s werken)


7.2 Increased analyticity in Brazilian Portuguese

Holm (2004) also discusses Brazilian Vernacular Portuguese, where independent pronouns regularly occur in addition to subject inflection on verbs, or even replace it (45), independent pronouns replace object clitics (46), and relative clauses are used with resumptive pronouns (47).

(45) eu parto     ‘I leave’
     você parte   ‘you leave’
     ele parte    ‘he leaves’
     nós parte    ‘we leave’
     eles parte   ‘they leave’

(46) ela chamou eu
     she called me
     ‘she called me’ (cf. Portuguese chamou-me, with object clitic)

(47) o aluno que eu conheço o pai dele
     the student that I know the father of.him
     ‘the student whose father I know’ (lit. ‘…that I know his father’)

7.3 Increased analyticity in Bulgarian

Among the Slavic languages, Bulgarian and Macedonian show the most drastic changes away from the Proto-Slavic patterns. Most strikingly, genitive and dative case are replaced by the preposition na (originally ‘on’) (48–49), and the old comparative degree forms are replaced by the new particle po- (50). Hinrichs (2004) even claims that Bulgarian is a creolized form of Old Bulgarian.

(48) Petar dade kniga-ta na Ivan.
     Petar gave book-def on Ivan
     ‘Peter gave the book to Ivan.’ (cf. Russian Ivan-u [Ivan-DAT])

(49) kola-ta na Marija
     car-def on Marija
     ‘Marija’s car’ (cf. Russian Mari-i [Maria-GEN])

(50) po-umna
     compr-smart
     ‘smarter’ (cf. Russian umn-ee [smart-compr])

There are probably more cases of increased analyticity in European languages. As mentioned briefly above, Carlier et al. (2012) link differences in the pace of grammaticalization in various Romance languages (French, showing more advanced grammaticalization, compared to Italian and Spanish) to language contact. The Eastern Scandinavian languages (notably Swedish and Danish) also show higher degrees of analyticization than western languages (notably Icelandic and Faroese).


8. Conclusion

In this paper, we briefly reviewed the history of the distinction between analytic and synthetic patterns and observed that it cannot be based on a synchronic definition, because we cannot define words and affixes in a consistent way. We therefore proposed a diachronic definition of an analytic pattern as a morphosyntactic pattern that was created from lexical or other concrete material and that is in functional competition with (and tends to replace) an older (synthetic) pattern. The main empirical observation is that analyticization is particularly frequent in European-based creole languages, and we proposed an explanation in terms of extra transparency: in social situations with many adult second-language speakers, people need to make an extra effort to make themselves understood, i.e. they need to add extra transparency. This naturally leads to the overuse of content items for grammatical meanings, and thus to analyticization. It remains to be seen to what extent this explanation can account for differences within European languages (some relevant observations were made in § 7), and whether it can also account for developments in languages outside Europe.

The support of the European Research Council (ERC Advanced Grant 670985, Grammatical Universals) is gratefully acknowledged.

References

APiCS: Michaelis et al. (2013)
Ansaldo, Umberto, Stephen Matthews and Lisa Lim (eds.). 2007. Deconstructing Creole. Amsterdam: Benjamins. doi: 10.1075/tsl.73
Atoyebi, Joseph Dele. 2010. A Reference Grammar of Oko: A West Benue-Congo Language of North-Central Nigeria. Köln: Köppe.
Bakker, Peter, Aymeric Daval-Markussen, Mikael Parkvall and Ingo Plag. 2011. “Creoles Are Typologically Distinct from Non-Creoles.” Journal of Pidgin and Creole Languages 26(1). 5–42. doi: 10.1075/jpcl.26.1.02bak
Bickerton, Derek. 1981. Roots of Language. Ann Arbor: Karoma Publishers.
Bruyn, Adrienne. 2009. “Grammaticalization in Creoles: Ordinary and Not-So-Ordinary Cases.” Studies in Language 33(2). 312–337. doi: 10.1075/sl.33.2.04bru
Bybee, Joan. 2006. “Language Change and Universals.” In Linguistic Universals, ed. by Ricardo Mairal and Juana Gil, 179–194. Cambridge: Cambridge University Press. doi: 10.1017/CBO9780511618215.009
Carlier, Anne, Walter De Mulder and Béatrice Lamiroy. 2012. “Introduction: The Pace of Grammaticalization in a Typological Perspective.” Folia Linguistica 46(2). 287–301. doi: 10.1515/flin.2012.010
Cristofaro, Sonia. 2012. “Cognitive Explanations, Distributional Evidence, and Diachrony.” Studies in Language 36(3). 645–670. doi: 10.1075/sl.36.3.07cri


Epps, Patience. 2008. A Grammar of Hup. (Mouton Grammar Library, 43). Berlin: Mouton de Gruyter. doi: 10.1515/9783110199079
Good, Jeff. 2012. “Typologizing Grammatical Complexities, or: Why Creoles May Be Paradigmatically Simple but Syntagmatically Average.” Journal of Pidgin and Creole Languages 27(1). 1–47. doi: 10.1075/jpcl.27.1.01goo
Greenberg, Joseph H. 1960. “A Quantitative Approach to the Morphological Typology of Language.” International Journal of American Linguistics 26(3). 178–194. doi: 10.1086/464575
Haspelmath, Martin. 1999. “Why is Grammaticalization Irreversible?” Linguistics 37(6). 1043–1068. doi: 10.1515/ling.37.6.1043
Haspelmath, Martin. 2000. “The Relevance of Extravagance: A Reply to Bart Geurts.” Linguistics 38(4). 789–798. doi: 10.1515/ling.2000.007
Haspelmath, Martin. 2004. “On Directionality in Language Change with Particular Reference to Grammaticalization.” In Up and Down the Cline: The Nature of Grammaticalization, ed. by Olga Fischer, Muriel Norde and Harry Perridon, 17–44. Amsterdam: Benjamins. doi: 10.1075/tsl.59.03has
Haspelmath, Martin. 2010. “Comparative Concepts and Descriptive Categories in Crosslinguistic Studies.” Language 86(3). 663–687. doi: 10.1353/lan.2010.0021
Haspelmath, Martin. 2011. “The Indeterminacy of Word Segmentation and the Nature of Morphology and Syntax.” Folia Linguistica 45(1). 31–80. doi: 10.1515/flin.2011.002
Haspelmath, Martin. 2017. “Revisiting the Anasynthetic Spiral.” In Grammaticalization and Language Typology, ed. by Bernd Heine and Heiko Narrog. (to appear)
Haspelmath, Martin, Matthew S. Dryer, David Gil and Bernard Comrie. 2005. The World Atlas of Language Structures. Oxford: Oxford University Press.
Heine, Bernd and Tania Kuteva. 2006. The Changing Languages of Europe. Oxford: Oxford University Press. doi: 10.1093/acprof:oso/9780199297337.001.0001
Hinrichs, Uwe. 2004. “Ist das Bulgarische kreolisiertes Altbulgarisch?” In Die europäischen Sprachen auf dem Wege zum analytischen Sprachtyp, ed. by Uwe Hinrichs, 231–242. Wiesbaden: Harrassowitz.
Holm, John. 2004. Languages in Contact: The Partial Restructuring of Vernaculars. Cambridge: Cambridge University Press.
Hopper, Paul J., and Elizabeth C. Traugott. 1993. Grammaticalization. Cambridge: Cambridge University Press.
Jespersen, Otto. 1894. Progress in Language: With Special Reference to English.
Kortmann, Bernd and Benedikt Szmrecsanyi. 2011. “Parameters of Morphosyntactic Variation in World Englishes: Prospects and Limitations of Searching for Universals.” In Linguistic Universals and Language Variation, ed. by Peter Siemund, 264–290. Berlin: De Gruyter Mouton. doi: 10.1515/9783110238068.264
Lahiri, Aditi. 2000. “Hierarchical Restructuring in the Creation of Verbal Morphology in Bengali and Germanic: Evidence from Phonology.” In Analogy, Levelling, Markedness: Principles of Change in Phonology and Morphology, ed. by Aditi Lahiri, 71–123. Berlin: Mouton de Gruyter. doi: 10.1515/9783110808933.71
Lehmann, Christian. 1985. “Grammaticalization: Synchronic Variation and Diachronic Change.” Lingua e Stile 20(3). 303–318.
Lehmann, Christian. 2015[1982]. Thoughts on Grammaticalization. 3rd edition. Berlin: Language Science Press.


Leufkens, Sterre. 2013. “The Transparency of Creoles.” Journal of Pidgin and Creole Languages 28(2). 323–362. doi: 10.1075/jpcl.28.2.03leu
Lupyan, G. and R. Dale. 2010. “Language Structure Is Partly Determined by Social Structure.” PLoS ONE 5(1). e8559. doi: 10.1371/journal.pone.0008559
Matthews, P. H. 1997. The Concise Oxford Dictionary of Linguistics. Oxford: Oxford University Press.
McWhorter, John H. 2001. “The World’s Simplest Grammars Are Creole Grammars.” Linguistic Typology 5(2–3). 125–166.
McWhorter, John H. 2007. Language Interrupted: Signs of Non-Native Acquisition in Standard Language Grammars. Oxford: Oxford University Press. doi: 10.1093/acprof:oso/9780195309805.001.0001
Michaelis, Susanne. 2015. “Inflectional Complexity in Creole Languages: Evidence from the Atlas of Pidgin and Creole Language Structures.” Paper presented at the SLE conference in Leiden, September 2015.
Michaelis, Susanne, Philippe Maurer, Martin Haspelmath and Magnus Huber (eds.). 2013. The Atlas of Pidgin and Creole Language Structures. Oxford: Oxford University Press. (apics-online.info)
Mufwene, Salikoko S. 2001. The Ecology of Language Evolution. Cambridge: Cambridge University Press. doi: 10.1017/CBO9780511612862
Parkvall, Mikael. 2008. “The Simplicity of Creoles in a Cross-Linguistic Perspective.” In Language Complexity: Typology, Contact, Change, ed. by Matti Miestamo, Kaius Sinnemäki and Fred Karlsson, 265–285. Amsterdam: Benjamins. doi: 10.1075/slcs.94.17par
Plag, Ingo. 2002. “On the Role of Grammaticalization in Creolization.” In Pidgin and Creole Linguistics in the 21st Century, ed. by Glenn Gilbert. New York: Peter Lang.
Reinöhl, Uta and Nikolaus Himmelmann. 2017. “‘Renewal’: A Figure of Speech or a Process Sui Generis?” Manuscript, University of Cologne.
Sapir, Edward. 1921. Language: An Introduction to the Study of Speech. New York: Harcourt Brace.
Schiering, René, Balthasar Bickel and Kristine A. Hildebrandt. 2010. “The Prosodic Word Is Not Universal, but Emergent.” Journal of Linguistics 46(3). 657–709. doi: 10.1017/S0022226710000216
Schlegel, August Wilhelm von. 1818. Observations sur la langue et la littérature provençales. Paris: Librairie grecque-latine-allemande.
Schleicher, August. 1860. Compendium der vergleichenden Grammatik der indogermanischen Sprachen. Weimar: Böhlau.
Seuren, Pieter and Herman Wekker. 1986. “Semantic Transparency as a Factor in Creole Genesis.” In Substrata Versus Universals in Creole Genesis, ed. by Pieter Muysken and Norval Smith, 57–70. Amsterdam: Benjamins. doi: 10.1075/cll.1.05seu
Siegel, Jeff. 2008. The Emergence of Pidgin and Creole Languages. Oxford: Oxford University Press.
Siegel, Jeff, Benedikt Szmrecsanyi and Bernd Kortmann. 2014. “Measuring Analyticity and Syntheticity in Creoles.” Journal of Pidgin and Creole Languages 29(1). 49–85. doi: 10.1075/jpcl.29.1.02sie
Szmrecsanyi, Benedikt. 2009. “Typological Parameters of Intralingual Variability: Grammatical Analyticity Versus Syntheticity in Varieties of English.” Language Variation and Change 21(3). 319–353. doi: 10.1017/S0954394509990123
Szmrecsanyi, Benedikt. 2016. “An Analytic-Synthetic Spiral in the History of English.” In Cyclical Change Continued, ed. by Elly van Gelderen, 93–112. Amsterdam: Benjamins.


Thomason, Sarah Grey and Terrence Kaufman. 1988. Language Contact, Creolization, and Genetic Linguistics. Berkeley: University of California Press.
Trudgill, Peter. 2011. Sociolinguistic Typology: Social Determinants of Linguistic Complexity. Oxford: Oxford University Press.
WALS: Haspelmath et al. (2005).

A case for clustering speakers and linguistic variables
Big issues with smaller samples in language variation

Miriam Meyerhoff and Steffen Klaere
Victoria University of Wellington / University of Auckland

We undertake a detailed analysis of a sample of over 10,000 utterances from 18 speakers in a corpus of Bequia English and apply constrained cluster analysis to discern patterns that identify the linguistic signatures for different villages and to see how individuals pattern in relation to the rest of their village. The analysis of multiple variables provides a richer picture of both group and individual than any one variable does and holds promise for better understanding the mysterious mechanisms by which variation between individuals scales up to variation between groups.

Keywords: Bequia, St Vincent and the Grenadines, creole, group and individual, multiple variables, cluster analysis, scaling up, historical linguistics

doi 10.1075/silv.19.02mey © 2017 John Benjamins Publishing Company

1. Introduction

A number of years ago, a sociolinguistics professor was heard to reproach a class of graduate students in the following way. The class was dissecting someone’s research paper, and they critiqued the authors for the paucity of their data in some areas of the analysis. The professor cut off the discussion abruptly, saying “Don’t tell me what this study needs is ‘more data’. Everyone can always use ‘more data’; the question is what can you do with the data you’ve got”. The comment sets out challenges for all of us. It serves firstly as a reproach (don’t dismiss someone else’s work too quickly and easily), and secondly as a licence to think big, even if the data we have may be wanting – sometimes wanting in more ways than one.

This chapter talks about some of the senses in which variationists can go big even if they are working on small data sets. It reviews some promising methods for scaling up the data (even from relatively small data sets) by looking at multiple variables across the speech of a number of speakers. And it argues in favour of doing this by addressing the need to scale up our research questions. If we are successful in scaling up our questions and interests, then we believe small data sets can make meaningful contributions to the field.

The need to scale up is important at this moment in the history of linguistics for several reasons. First, it is important because if we can find a way to make the most of relatively little, it means we can benefit most fully from the datasets that people working on less well documented languages have. It even holds out the possibility that the work students undertake under the intense time pressure of a one-year or one-semester course might be able to make meaningful contributions to what we know about variation, language change and sociolinguistics. Second, it is important because there is a pressing need to draw explicit connections between sociolinguistic studies of variation and studies of language variation that intersect with phylogenetic and typological methods.

The chapter begins with some comments about some of the big issues associated with small numbers. Much of this is well-trodden ground. It then outlines a slightly larger issue, which relates to how variationist work is a small, but still crucial, part of the broader intellectual enterprise of understanding how languages change and how new ones develop. Next, the chapter turns to a big issue that requires us to stay true to the details of how people talk to each other but also explores how we can link individuals to larger levels of sociolinguistic structure. This is an old problem, outlined by Guy (1980), but one that remains an unsolved mystery for the field of variationist sociolinguistics.
We suggest that a focus on how our data might speak to bridging the gap between individuals and groups is a productive step because (i) it reasserts core problems that give direction to the sociolinguistic analysis of variation and (ii) it makes the case for variationist sociolinguistics as an integral component of research that focuses on linguistic variability writ large, specifically, research on language evolution and language change.

Much of the last section is exploratory work, undertaken as part of ongoing interdisciplinary collaboration between a variationist sociolinguist and a statistician. 1 We see this collaboration not just as a way to add new or more complex statistical methods to the analysis of linguistic variation (though we are certainly open to that). We see it as providing an opportunity for both fields. By drawing on the complex nature of natural language datasets, statistical methods and tools are stretched – datasets based on tokens extracted from spontaneous speech don’t always look like the kinds of datasets the statistics were designed to deal with. From the other side of the collaboration, we may be able to pose different kinds of questions about natural language datasets than variationists typically do when they analyse data using more familiar methods of multiple regression. We are particularly interested in questions of scale and relatedness: How are the variable speech patterns of individuals related to those of groups? How do the variable patterns of groups demonstrate closeness or distance between groups? We will attempt to explore these questions using methods that enable us to observe how individual speakers pattern in relation to one another with respect to multiple linguistic variables.

1. This collaboration follows on from a larger interdisciplinary foray into such collaboration: “Methods ‘n Models: A language variation workshop” held at Victoria University of Wellington, 17–19 February 2015. We are grateful to the other participants at the workshop for their comments and feedback – Richard Arnold, Andreea Calude, Aymeric Daval-Markussen, Michael Dunn, Russell Gray, Evan Hazenberg, Naomi Nagy, Nancy Niedzielski, Meredith Tamminga, James Walker. We also thank Richard Arnold, Dick Smakman, Bodo Winter and Brian McArdle for useful additional discussions about quantitative methods appropriate to posing big questions with small data sets. Lily Trinh and Eva Maria Brammen were immensely helpful and provided great insights during their respective summer scholarships on the project funded by the University of Auckland.

The detailed analysis of individual variables has undoubtedly provided variationist sociolinguistics with a solid foundation and some essential generalisations about the relationship between synchronic variation and change (Labov 1994, 2001, 2010) or between synchronic intergroup variation and the strategic choices of individual speakers (cf. Jaffe 2009 and papers therein). However, we believe the field is sufficiently mature that we can now afford to lift our heads from the detailed analysis of single variables to consider how interlocutors categorise each other based on multiple variables realised within a single speech event (Guy 2013). When we talk to someone it seems highly likely that we engage with all the information available in their utterances, not just with the information associated with a single sociolinguistic variable.
In order to explore the manner in which interlocutors might make use of the wide variety of sociolinguistic information contained in even a short discourse, we need new methods (cf. Hinskens and Guy 2016) – just as new approaches were required to explore how linguistic variation fits in with other behaviours in a more holistic and ethnographically rich analysis of speaker style or stance (Eckert 1996, 2000, Levon and Holmes-Elliott 2013). At the end of the chapter, we will outline some further big questions that are raised by the methods we use to explore how speakers cluster in relation to each other and in respect of multiple linguistic variables. In doing so, we suggest the broader potential inherent in these methods.

2. Going big from small (samples)

A time-honoured way of extracting the maximum information from even one utterance or one conversational exchange is for the sociolinguist to move into qualitative analysis. The qualitative analysis of one token can be seen as a refunctionalisation of the quantitative paradigm – as Schegloff (1993) pointed out years ago, “one is also a number” (1993: 101). One token is only trivial if the denominator is a very large number; if a variable is comparatively rare and therefore the denominator is small, then one token may say quite a lot. Clearly, we would not argue that large numbers of tokens are essential for us to be able to comment on the ways in which variation informs us about the social structure of a community – Meyerhoff (1999) uses very few tokens of the lexeme “sorry” produced by men in Vanuatu as the basis for an analysis of language and gender. More recently, Buchstaller (2014) and Cheshire (2013) have shown us that very small numbers of tokens of morphosyntactic variables are worthwhile objects of study; they may capture the start of a change in progress (Cheshire 2013 on the use of man as a pronoun by some speakers of urban London English) or may provide vital clues to the ways in which speakers understand or analyse the variation underlying ongoing grammaticalisation in their speech community (Buchstaller 2014 on the infrequent, but telling, combination of be like and complementiser that in British English). As Buchstaller and Cheshire show, one or very few tokens of a variant may be informative not just for their ability to highlight the social, interactional or emotional dimensions of sociolinguistic variation, but they also have the potential to provide insights that address more formal and theoretical questions.

It is not uncommon for quantitative analyses to shift into qualitative ones. Walker and Meyerhoff (2013) argued, in an analysis of copula presence and absence in the English spoken on Bequia island in the Eastern Caribbean, that whether speakers treat adjectives like verbs or nouns for copula deletion can be taken as diagnostic of community membership – the fundamental division between the villages was whether adjectives pattern with verbs or with nouns.
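As a rough illustration of the kind of tabulation that lies behind such comparisons, the sketch below computes zero-copula rates by following grammatical category for each speaker and then asks whether a speaker’s adjective rate lies closer to the verbal (V-ing) or the nominal (NP) rate. The tokens, the speaker labels S1 and S2, the helper names and the nearest-rate heuristic are all assumptions made for illustration; they are neither the Bequia data nor the analysis reported by Walker and Meyerhoff (2013).

```python
from collections import defaultdict


def make_tokens(speaker, category, n_zero, n_overt):
    """Build hypothetical tokens: (speaker, following category, copula is zero)."""
    return [(speaker, category, True)] * n_zero + [(speaker, category, False)] * n_overt


# Invented toy data, NOT taken from the Bequia corpus.
toy_tokens = (
    make_tokens("S1", "V-ing", 2, 1)
    + make_tokens("S1", "Adj", 2, 2)
    + make_tokens("S1", "NP", 1, 3)
    + make_tokens("S2", "V-ing", 3, 1)
    + make_tokens("S2", "Adj", 0, 4)
    + make_tokens("S2", "NP", 1, 3)
)


def zero_copula_rates(tokens):
    """Return {speaker: {category: (n_zero, n_total, rate)}}."""
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for speaker, category, is_zero in tokens:
        cell = counts[speaker][category]
        cell[0] += int(is_zero)  # zero-copula tokens
        cell[1] += 1             # all tokens in this cell (the denominator)
    return {
        spk: {cat: (z, n, z / n) for cat, (z, n) in cats.items()}
        for spk, cats in counts.items()
    }


def adjectives_pattern_with(speaker_rates, verb="V-ing", noun="NP", adj="Adj"):
    """Crude heuristic: is the adjective rate nearer the verbal or the nominal rate?"""
    d_verb = abs(speaker_rates[adj][2] - speaker_rates[verb][2])
    d_noun = abs(speaker_rates[adj][2] - speaker_rates[noun][2])
    if d_verb == d_noun:
        return "tie"
    return "verbs" if d_verb < d_noun else "nouns"


rates = zero_copula_rates(toy_tokens)
for spk in sorted(rates):
    print(spk, rates[spk], adjectives_pattern_with(rates[spk]))
```

The raw counts are kept alongside each rate so that the size of the denominator stays visible: a rate based on four tokens deserves a closer qualitative look before it is read as a difference in grammar.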
Figure 1 (Meyerhoff and Walker 2007: 356) shows the patterns of copula deletion in at least four grammatical contexts for three villages on the island and it plots one individual in each village against the group norms. These are people who have been away from Bequia for some period in their life, and then returned to their home village – the ‘urban sojourners’. By and large, the individuals match the patterns of their village community. The exception to this is perhaps the Hamilton speaker, who seems to deviate from the community norms and almost never uses zero copula before adjectives; he seems to be treating them like nouns, though for all other following grammatical categories, his pattern looks very similar to the other Hamilton speakers.

[Figure 1: three panels (Mt Pleasant speakers, Paget Farm speakers, Hamilton speakers); y-axis: percentage of zero copula, 0–100; x-axis: following grammatical category (V-ing, Adj, Loc, PP, NP); series: stay-at-home speakers and urban sojourner.]

Figure 1. Percentage frequency of zero copula with different following grammatical category in three Bequia villages (group trends against the patterns of an urban sojourner in each village). Source: Meyerhoff and Walker (2007: 356).

Walker and Meyerhoff (2013) argue, however, that the super low rate of deletion before adjectives in the speech of this Hamilton person does not necessarily put him at odds with the community norm. They note that in his speech, the adjectives are predominantly three lexemes (weak, strong, right) and they occur mainly in a section of his interview where he was trying to persuade his listeners about the merits of homeopathy (Walker and Meyerhoff 2013: 192). In other words, they suggest that the data was marked both stylistically (topic) and formally (lexically restricted forms). So the quantitative divergence of the Hamilton urban sojourner from the generalisation about the stability of community norms across individuals must be tempered by this information about the marked quality of the data from him. What Walker and Meyerhoff did with this analysis is not uncommon in sociolinguistic studies of variation – they started out with a quantitative analysis but used more qualitative means of exploring data that in a purely statistical analysis might be written off as an “outlier” (this relates of course to the principle of accountability in sociolinguistics which requires that all applications of a rule should be analysed fully).

As a field, variationists come honestly by their predilection to mix up the quantitative and the qualitative. The foundation of variationist sociolinguistics involved anthropological as well as historical linguistics, and this heritage allows variationists to shift comfortably from quantitative analysis into qualitative analysis and sometimes even back again. There are certainly tensions created by this dual heritage, but those tensions are some of the things that have made sociolinguistics, and specifically variationist sociolinguistics, so alluring to many researchers in the field.

3. Language variation in context

An awareness of this blended heritage and how the field has successively negotiated its legacies from historical and anthropological linguistics can be a real asset. We would argue that they should feed into other branches of ongoing research into language variation and change, some of which show little awareness of variationist traditions.
The specialist knowledge sociolinguists possess and our hard-won insights into the dynamics of language change are crucial, if poorly understood, components of the new research going on in historical linguistics, where people are drawing on insights from evolutionary biology about change in complex systems to pose big questions about language variation across language families. For example, Gray, Greenhill and Atkinson (2013) posed three big questions that they argued historical linguistics still must address: What drives language diversification? What drives linguistic disparity (a measure of the overall amount of variation between varieties)? Can we identify cultural and linguistic homelands? (Gray et al. 2013: 287).

Bromham et al. (2015) tackle the diversification question in more depth. This paper applies some of Trudgill’s claims (2004, 2011) about how community size and community density favour or disfavour certain kinds of complexity, accelerating or retarding language change, to the facts of language diversification in the Polynesian language family. Polynesian is a good test case for this, since there is a lot of well-checked data on vocabulary relatedness, and this tells us how the Polynesian languages are related to each other. Scholars also very largely agree about the settlement dates for the language communities, and this allows us to infer how long each language has been developing on its own. Finally, there is good information about how big the geographic spread of each language’s home territory is. There is some dispute about the population sizes for the languages at the time of European contact, but even with respect to this, it is possible to define a range most scholars are happy with, so this gives us estimates for each language community’s size at two points in history – European contact and today.

Bromham et al. conduct pairwise comparisons of closely-related Polynesian languages – sisters on their branch of the family tree – to see what effect population size has on the gain or loss of cognates in 210 items of basic vocabulary. They find a significant effect for population size when you look either at vocabulary gains or losses. Languages with smaller speaker populations tend to have higher rates of word loss in the basic vocabulary than languages with larger speaker populations do. Conversely, languages with larger speaker populations in the Polynesian language family tend to gain new words in the basic vocabulary more than languages with smaller speaker populations do. The modelling also factors in how long the languages in each pair are believed to have been separated from each other, thereby controlling for the amount of time required for change in a language’s basic vocabulary to take place. Figure 2 shows the key results.
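The logic of a sister-pair comparison can be sketched in a few lines. The example below is only a crude stand-in, assuming invented loss counts for anonymous pairs and a simple one-sided sign test on the direction of the difference within each pair; it is not the phylogenetically structured model that Bromham et al. (2015) fit, and the names `toy_losses` and `sign_test_smaller_loses_more` are hypothetical.

```python
from math import comb

# Invented counts of basic-vocabulary losses for hypothetical sister pairs:
# (losses in the larger-population language, losses in the smaller-population language).
toy_losses = {
    "pair_A": (12, 30),
    "pair_B": (8, 15),
    "pair_C": (20, 18),
    "pair_D": (5, 22),
    "pair_E": (10, 10),  # tie: carries no directional information
}


def sign_test_smaller_loses_more(pairs):
    """One-sided exact sign test of 'the smaller population loses more words'.

    Returns (k, n, p_value): k pairs in the predicted direction out of n
    informative (non-tied) pairs, and the probability of observing k or more
    such pairs if each direction were equally likely.
    """
    informative = [(big, small) for big, small in pairs.values() if big != small]
    n = len(informative)
    k = sum(1 for big, small in informative if small > big)
    p_value = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return k, n, p_value


k, n, p = sign_test_smaller_loses_more(toy_losses)
print(f"{k} of {n} informative pairs in the predicted direction, one-sided p = {p}")
```

Comparing only within sister pairs means that each comparison draws on a separate stretch of the family tree, which is one simple way of keeping shared ancestry from being counted more than once. With so few pairs, a sign test has little power, which is one reason a model that uses the actual counts and separation times is more informative.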
In Figure 2, the range of expected gains for each pairwise comparison is plotted on the left as a normal distribution and the range of expected losses is plotted on the right. The actual attested rates of gain or loss for the two languages being compared are then shown with solid bars, yellow for the language with the larger population and blue for the one with the smaller population. The generalisation that smaller languages tend to lose basic vocabulary more than languages with larger speaker communities is very clear when we look down the graphs on the right. The situation is muddier when we look across the figures for vocabulary gains.² Bromham et al. are very clear that they just consider this a first step in the enquiry, not a definitive answer to the sociolinguistic and historical linguistic problem of how community size impacts on the long-term relatedness of languages within the same family.

2. In private correspondence, Bromham and Greenhill say that there are several reasons why we don’t see more of the blue lines to the right of the red lines in the figures comparing vocabulary gains. One is that the model factors in how long the languages have been separated.

It is heartening that people outside of sociolinguistics are drawing


Miriam Meyerhoff and Steffen Klaere

[Figure 2, panels by language pair: Samoan and East Uvea; Ifira-Mele and West Futuna; Tahitian and New Zealand Maori; Rarotongan and Penrhyn; Marquesas and Mangareva; Tikopia and Vaeakau-Taumako; Rennellese and East Futuna; Kapingamarangi and Nukuoro; Takuu and Sikaiana; Emae and Anuta. Vertical axis: probability density. Horizontal axes: number of gains (left), number of losses (right). Legend: expected and observed number of changes in the larger population and in the smaller population.]

Figure 2. Observed and expected numbers of gains and losses of cognates from basic vocabulary in 10 language pairs under the best-fitting model (phylogenetically structured, constant population size, no founder effects). Reproduced with permission from Bromham et al. (2015).

on concepts and research done in the variationist paradigm to pose themselves new challenges, but these papers pose an implicit challenge to sociolinguistics. If research on historical and typological linguistics is drawing on sociolinguistics to pose big questions for their fields – what are the equivalent big questions for people working on language change within the variationist paradigm?


In an unpublished talk at the New Ways of Analyzing Variation conference in Chicago, Guy (2014) outlined at least one big question that he feels is implicit, but unarticulated, in modern sociolinguistics. He proposed that there is an underlying tension between the notion of the speech community, which emphasises coherence, and the analysis of individual speaker style, which emphasises individual agency. His resolution (Guy 2013) to this was to suggest that suites of variables that correlate with each other comprise the units or components of the linguistic system that afford us the sense of coherence that underpins the speech community, while the ones that are uncorrelated are the aspects of the linguistic system that allow for individual agency.

4. Addressing a big problem for variationist sociolinguistics

Guy correctly identifies a big problem for variationist sociolinguistics and provides a very clever resolution to it, though it is debatable whether it is primarily an issue about coherence versus agency. The question, it seems to us, is not: “How do we resolve an apparent contradiction between individual agency and group cohesion?”, but “What is the relationship between the group and the individual in sociolinguistics?”, a question dating back to a much earlier paper of his (Guy 1980). Sociolinguists may have various understandings about the relationship between the individual’s linguistic repertoire and that of the groups they are part of (this might be a series of nesting relationships drilling down from speech community to community of practice to individual), but this still dodges the fundamental sociolinguistic problem, namely: in what way does the variation occurring in the speech of an individual scale up to group norms? Labov (2001) argues that we can’t understand the individual without understanding the dynamics and structure of the group first.
Although Labov expressly rejects the possibility that social psychological approaches to the individual might be able to assist with the analysis of variation in the speech community, Labov’s opinion that we need to understand the group to understand the individual actually lines up rather well with a lot of social psychological research on individuals. Recent work in the field seems to show that our identification with a group enhances how good we feel about ourselves, and conversely, that how good we feel about our group memberships depends on aspects of our individual personality (Wilson, Bulbulia and Sibley 2013). Labov’s position is a top-down approach: start with the variation within and among groups and you will be able to understand the variation in the individual. This has philosophically shaped the field’s traditional methods, specifically, the practice of grounding an analysis of variation in the behaviour of social groups, not individual speakers, that is it provides a philosophical motivation for the practice


of drawing quantitative generalisations over groups and then adding qualitative analysis of individuals’ behaviour afterwards, as mentioned earlier. It has also played a role in debates over the merits of using quantitative models that treat individual speakers as a random effect. Proponents of mixed effects models in which speakers are random effects (Baayen 2008, Johnson 2009) argue that this reduces the chance of Type I statistical errors (that is, it avoids reporting significant effects for groups such as age or sex where the influence of individual speakers on the variation means that such social groupings are not, in fact, significant). Recently, however, questions have been raised about the appropriateness of models in which speaker is treated as a random effect for small data sets. Roy and Levey (2014), drawing on research by Moineddin et al. (2007), have urged caution with this method of analysis for sociolinguistic datasets with few speakers or few tokens of the variable under investigation in any given cell. Small datasets, especially those based on the painstaking description of an undocumented language, are almost guaranteed to trip these triggers. Caution and some creative thinking about the kinds of inferential statistics we use on such datasets are perhaps in order. The methods we present in the next section are an attempt to respond creatively not only to the big question of the group and the individual, but also to issues with the samples we may be working with.

Labov’s stance that the primary object for analysis is the group, not the individual, is a good starting point for questions about how variation diffuses once it has acquired some kind of social meaning within a speech community. Ironically though, it is not clear whether it helps us “bring people back into the field of sociolinguistics” (Labov [1966] 2006: 157), as he has exhorted the field to do.
It seems that the mystery of the relationship between variation in the individual and in the group – a mystery which lies at the empirical heart of our field – is something we have been able to set aside for many years while we contest methodologies.

5. The missing link – a sociolinguistic Higgs-Boson particle?

Suppose we were to decide to pursue this question seriously: what would we be looking for? One possibility is that we will be trying to define a mechanism (e.g. a complex constraint on the combinatorial relationship between variables); another is that we will be seeking to identify an independent variable (e.g. a constant in the speech of an individual which maps the behaviour of individuals onto the behaviour of groups). These are very different goals or targets, but sociolinguistics is in a position to make headway in such an endeavour. We now have decades of descriptive variationist work which provides a rich context for our enquiry, and we are also seeing a growing willingness to tackle problems in language variation that arise from situations of language or dialect contact. If we consider how the fields


of historical linguistics and creolistics have been reinvigorated in recent years by the posing of bold and difficult questions, it seems clear that a concerted effort to tackle this fundamental question could only add to the vitality of variationist sociolinguistics. We are essentially proposing that variationist sociolinguistics embark on a search for our field’s equivalent of the Higgs-Boson particle. We don’t know at the outset of the enquiry whether we are seeking evidence for a mechanism or variable that unifies the group and the individual; indeed, it may prove not to be a single mechanism or variable. What we are looking for may or may not be something that we can view directly – it might, for instance, be something that only emerges when several other conditions are met. The link between individual and group may be something that manifests itself in finite or infinite ways. At this point in the programme, it is perhaps less important to specify clearly what the nature of the end product is likely to be than it is to specify these kinds of questions about the end product. A conventionally structured chapter would, at this point, have its “reveal”, perhaps presenting a neatly organised exploration of a dataset that throws into relief what is still unknown against that which is known. Our purpose in this chapter is more modest: we propose to outline a process by which we might helpfully move towards identifying what constitutes the unknown.

6. Individuals and groups in Bequia (St Vincent and the Grenadines)

We approach this with a sub-corpus of the recordings from Bequia island (St Vincent and the Grenadines) that James Walker (York University, Toronto) and Meyerhoff have been working with for a number of years. The Bequia corpus (Meyerhoff and Walker 2013a) is both quite big and quite small. For an undescribed language, for which there are next to no historical recordings, the corpus of transcribed recordings is non-trivial.
By comparison with corpora of standardised and well-described languages (such as English, Spanish, Arabic), it is nugatory. As we noted at the start of the chapter, we could probably do with more data, but since that is vacuously true for all samples, the more interesting question is: what can we do with what we’ve got? Walker and Meyerhoff extracted from the total corpus of interviews a sub-sample of 18 out of the total of 60 Bequia speakers that had been recorded.³

3. They refer to this sub-sample as a “short-fat” sample, in that it codes exhaustively for a range of variables in the interviews of a smaller number of speakers from the total corpus. This contrasts with a “long-thin” sub-sample which would sample a smaller number of tokens from all speakers in the corpus.

For


each of these speakers, they extracted and coded heavily from each speaker’s interview. This meant that although there was a small number of speakers, there was quite a lot of data from each of them. This short-fat corpus comprised 10,621 tokens of finite clauses coded for 18 linguistic variables and six aspects of social structure. The Bequia corpus is well-suited for this kind of enquiry because there is a very clear sense among people living on the island that they have a distinct way of speaking, what they refer to amongst themselves as Dialect. A visitor from outside the Caribbean might not be able to differentiate the English creole spoken on Bequia from the English creoles spoken elsewhere in the Eastern Caribbean, but we have plenty of evidence from the people we worked with in Bequia, and from visitors from other Caribbean islands, that within the region, a distinct variety associated with the island is recognised. Nevertheless, Bequians also have a strong sense of place within their island. Although the island is small, and the population is also quite small (at the time of our fieldwork in the early 2000s, the population had held steady for several decades at about 5000 people), people identify quite strongly with their home village on the island. This is particularly so for the historically white village, Mount Pleasant, but also for some of the villages of mixed or predominantly African descent. In our work, and in Agata Daleszynska’s work with teenagers on the island about 5 years after us (Daleszynska 2011), we focused on Paget Farm (an ethnically mixed fishing village on the south coast) and Hamilton (a mainly Black village on the former site of a plantation near the main harbour) and Mount Pleasant (the historically White village in the hills above the main harbour). 
Both Paget Farm and Hamilton villages are quite populous by Bequia standards, and both have competing claims as representing real, or authentic, or traditional Bequia life and social values (Daleszynska 2013 discusses this particularly well). Given that Bequians attentively track village membership, and given that, as Daleszynska shows, the values and attributes associated with each village are actively negotiated in intergroup and interindividual contexts, it’s not surprising that Bequians simultaneously claim to all share their one speech variety, Dialect, that distinguishes them as Bequian, and also claim that they can differentiate amongst themselves and tell what village someone comes from, just from the way they speak. Below this level of sociolinguistic structure, Meyerhoff and Walker (2007) have also discussed the interesting case of the people they call ‘urban sojourners’. These were the individuals contrasted against the larger community in Figure 1. Urban sojourners are people who lived in the village where they had grown up (often the same village that one or more of their parents and one or more of their grandparents had also lived and grown up in), but who had spent some time working overseas – in Canada or the UK. Obviously, every individual speaker on Bequia has their own individual speech signature in some way or another, but the urban sojourners turned out to have a particular signature that has showed up over


several variables examined (Meyerhoff and Walker 2007, Meyerhoff and Walker 2013b, Walker and Meyerhoff 2013). In general, they sound rather different from the rest of the people in their village and this is partly because of their pronunciation of Dialect but also partly because the frequencies with which they use local or supra-local variants tend to the extreme. That is, they show variation between the same local and supra-local variants that their peers who stayed in their village use, but they much more strongly favour one variant than their stay-at-home peers do. Nevertheless, when we look at how they are using the local and supra-local variants, we find that their use of the variants still seems to be constrained by the same linguistic factors that we found constrain the other speakers in their village. Meyerhoff and Walker conclude that speakers’ mobility may affect the raw frequencies with which a form occurs, but the grammar of their variation (as evidenced by the sharing of constraints on the variable) remains largely unaltered.⁴

To recap, the short-fat corpus consists of a number of speakers from one island, who can further be identified according to their home village and who within each village can further be identified as individuals, including the people we have called urban sojourners. This means that we have a corpus that is well-suited to the kind of scaling up exercise necessary for addressing the question of how variation in the individual is related to variation in the group (here, the village and the island).

7. Linguistic features in the Bequia corpus

Our short-fat corpus contains 10,621 grammatically encoded lines from interviews with 18 speakers coming from the 3 Bequia settlements (Hamilton, Mount Pleasant, Paget Farm). Individual speakers contributed between 300 and 823 lines to the corpus. Apart from the community, we recorded four social variables to potentially explain certain observations in the data.
These were age (between 42 and 100) and sex of the speaker, whether they left their community for a significant amount of time in their life (urban sojourner), and how many people were present during the interview. For each encoded line, the states of 13 grammatical variables were obtained. Each variable had between 2 and 21 categories, making the space of potential combinations practically impossible to explore. Thus, we regarded only six variables: form of the subject, form of the copula, verbal auxiliary or modal, form of verbal negation, negative concord, tense-aspect marking on the verb.

4. Whether we should speak of variation as a grammar is another big question that even small data sets might be applied to answering.

These variables were


selected as the test case because they should allow us to see a signature that unifies groups of speakers in how predication is handled. They suit our overall purpose well because they often multiply mark a single utterance, allowing us to grapple with the question of how clusters of features might characterise different speakers or groups of speakers on Bequia. We treat the combinations of states of these six grammatical features as profiles, and wish to investigate their joint discriminative power with respect to the communities. The motivation behind this comes from the idea that investigating the utterances as a whole, rather than each feature by itself or in pairs, will use more information, including information about profiles not observed. The simultaneous analysis of variables has been previously proposed in Guy (2013) and Oushiro and Guy (2014) and the differences between our approaches will be made clear in the conclusion. The categories within each feature range between 5 and 16 levels. Assuming independence of features (which is rather naïve) we would expect 748,800 unique profiles. In practice, we found 908 unique profiles of which 440 occurred exactly once. We reduced the number of profiles further by categorising the statistics of each feature into one of four categories: Creole-like; English-like; undetermined or ambiguous with respect to its form (i.e. it occurs in both Standard English and English-based creoles of the Caribbean); and not applicable in the utterance (for example, coding for negative concord does not apply in an utterance in which either there is no negation or there is no indefinite NP argument alongside the negation). There was no line without a Subject Type, no Creole Form of the Auxiliary Modal, and no English form of the Negative Concord. Consequently, we had 3 variables with 4 levels, and 3 variables with 3 levels, yielding an overall number of 3³ × 4³ = 1,728 unique potential profiles.
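The size of the reduced profile space can be checked with a few lines of arithmetic. The sketch below uses only figures stated in the text; the dictionary keys are our own shorthand labels for the six features, and which three features lose a level follows from the sentence about Subject Type, Auxiliary Modal and Negative Concord.

```python
from itertools import product
from math import prod

# Levels per feature after recoding. Three features lack one of the four
# categories (see text); the feature labels are shorthand, not corpus codes.
levels_per_feature = {
    "subject": 3,          # every line has a subject: no "does not apply"
    "aux_modal": 3,        # no Creole form attested
    "neg_concord": 3,      # no English form attested
    "copula": 4,
    "verbal_negation": 4,
    "tense_aspect": 4,
}

n_potential = prod(levels_per_feature.values())  # 3**3 * 4**3

# Enumerating every combination of levels gives the same count.
n_enumerated = sum(
    1 for _ in product(*(range(k) for k in levels_per_feature.values()))
)

# Share of the unreduced space that was actually observed (figures from the text).
share_unreduced = 908 / 748_800

print(n_potential, n_enumerated)  # 1728 1728
print(f"{share_unreduced:.2%} of the unreduced profile space was observed")
```

The product rule only gives the number of profiles that are logically possible. It assumes the features combine freely, which the text itself flags as naive, so the count is an upper bound on what a grammar would actually permit.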
We observed 213 unique reduced profiles of which 61 occurred exactly once. The motivation behind this transformation is to generate a state space with fewer states but more frequently observed profiles. This simplification of the data is not ideal for a whole lot of reasons – not least of which is the deterministic notion of Creole vs non-Creole, which a linguist would ultimately like to be part of the enquiry. Eventually, our goal is to be able to run models that incorporate the full richness of a heteroglossic speech community like the one in Bequia, and which allow us to identify the impact that specific forms have on the way in which a speech community divides up, and how this interacts with an individual’s speech patterns. Unfortunately, even the short-fat sample of individuals on Bequia doesn’t provide enough data to extract a clear signal from the noise, which suggests that some questions really cannot be put except to corpora that are even larger than the 10,000-plus tokens we have in this one. Our ultimate goal is to represent each speaker by a proportion vector indicating how often the speaker used any of the 213 profiles in the transformed corpus, and then to compare the proportion vectors to find commonalities within the


villages. An initial observation about these 213 signatures: very few patterns are unique to individuals (with the largest number being 5% of all utterances of Speaker 1). This is important because it tells us that the differentiation of the individuals based on these combinations of variables is mainly due to how often they used combinations of variables within the larger community pool, not because they are doing something completely on their own.

8. Clustering speakers with respect to multiple linguistic features

The approach that we adopt in analysing the data at this point is one that is not often used in linguistics. It is related to principal components analysis (cf. Horvath and Sankoff’s (1987) analysis of the Sydney speech community). Principal components analysis is the simplest eigenvector-based multivariate transformation method. It infers latent variables which capture parts of the data variation in decreasing order. Depending on the question asked it might simply be inadequate (see, e.g., Legendre and Legendre 2012, for an overview of transformation methods and their suitability). There are alternatives that are more widely used in other sciences. Latent cluster analysis, for instance, is used by Wilson, Bulbulia and Sibley (2013) to identify five latent categories in a sample of over 4000 people. These categories, which they call “faith signatures”, emerged in their responses to a range of statements about the paranormal, extraordinary life forms and the spiritual. These five faith signatures emerge independently of how respondents might align themselves with larger social classifications, such as religious, believers in the paranormal or sceptics (though naturally the data would allow respondents to be grouped in this manner as well if one chose to).
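The speaker proportion vectors described above can be sketched in a few lines. The tokens below are invented, the function names are our own, and the chi-square distance is an assumption made for the sketch: it is the distance that correspondence analysis works with, so it seemed a natural choice, but it is not presented here as the computation carried out on the Bequia data.

```python
from collections import Counter
from math import sqrt

# Hypothetical tokens (speaker, profile label); invented for illustration,
# not drawn from the Bequia corpus.
toy_tokens = [
    ("S1", "P1"), ("S1", "P1"), ("S1", "P2"), ("S1", "P3"),
    ("S2", "P1"), ("S2", "P2"), ("S2", "P2"), ("S2", "P3"),
    ("S3", "P2"), ("S3", "P3"), ("S3", "P3"), ("S3", "P3"),
]


def proportion_vectors(tokens):
    """Return the sorted profile labels and one proportion vector per speaker."""
    profiles = sorted({profile for _, profile in tokens})
    counts = {}
    for speaker, profile in tokens:
        counts.setdefault(speaker, Counter())[profile] += 1
    vectors = {
        speaker: [c[p] / sum(c.values()) for p in profiles]
        for speaker, c in counts.items()
    }
    return profiles, vectors


def column_masses(tokens, profiles):
    """Share of all tokens that falls in each profile, pooled over speakers."""
    pooled = Counter(profile for _, profile in tokens)
    return [pooled[p] / len(tokens) for p in profiles]


def chi_square_distance(u, v, masses):
    """Chi-square distance: squared differences are weighted by 1 / profile mass,
    so rare profiles count for more than common ones."""
    return sqrt(sum((a - b) ** 2 / m for a, b, m in zip(u, v, masses)))


profiles, vectors = proportion_vectors(toy_tokens)
masses = column_masses(toy_tokens, profiles)
d12 = chi_square_distance(vectors["S1"], vectors["S2"], masses)
d13 = chi_square_distance(vectors["S1"], vectors["S3"], masses)
print(vectors)
print(round(d12, 3), round(d13, 3))
```

In this toy example all three speakers draw on the same pool of profiles and differ only in how often they use each one, which mirrors the observation that very few patterns are unique to individuals.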
The individual speaker effect can overshadow the effect of location (Johnson 2009) and reduced study sizes may make it difficult to guarantee homogeneous language communities (Cedergren and Sankoff 1974). Thus major sources of variation observed in the data may not be attributable to our variable of interest but due to individual variation, thus making unsupervised clustering like PCA unsuitable for our study. Constrained correspondence analysis (CCA) takes analyst-specific constraints into account when inferring the latent axes from the data (ter Braak 1986; Legendre and Legendre 2012). In our case, we constrain the cluster analysis by village membership, since the Bequia speakers themselves have told us that although they perceive themselves as grouped together as speakers of Dialect, they can differentiate amongst themselves according to what village someone comes from. In other words, the speakers themselves understand the variation in their speech community to be constrained in this manner. The questions we can pose are therefore what evidence of this grouping can we find looking purely at the


linguistic data? And if we find evidence of clear groupings, what linguistic features account for this structure? We used the implementation of CCA from the R-package vegan (Oksanen et al. 2016) to visualise the constraint groupings within the data. Figure 3 shows the speakers and their numbers and ellipsoids indicating the range of the speakers for each settlement. We find that most speakers from Hamilton lie more or less on the horizontal line for CCA2 = 0, but their spread appears wider due to Speaker 27, who is an urban sojourner, and clusters more closely with the Paget Farm community. There are two further urban sojourners, Speaker 24 (from Paget Farm), and Speaker 313 (from Mount Pleasant). Their relative location to the Paget Farm ellipsoid leads to the speculation that Paget Farm might have the most hybrid spoken language of the three communities. CCA2 seems to have more discriminative power to distinguish the Paget Farm community (CCA2 > 0) from the Mount Pleasant community (CCA

[…]

[Y] and [ɑː] > [aː]. Also, in the adolescents’ speech the adjective agrees with the subject in number and gender as in standard Swedish, whereas it does not in the speech of the adults). Furthermore, dialect variants with few tokens in the data, e.g. /sl/ > /ʃl/, i.e. /slʉːta/ > /ʃlʉːta/, ‘stop’, and the prefix /ʉː/, i.e. /uːlik/ > /ʉːlik/, ‘different’ (standard Swedish /uː/), are only

104 Jenny Nilsson and Lena Wenner

used by the NORM speakers (recorded in the 1940s and 2011). In other words, the general tendency is that variants used infrequently by the more traditional speakers are used only sporadically, or not at all, by the more standardized and especially the adolescent speakers. Overall, it seems that the dialect repertoire in Torsby is shrinking.

5. The unruly [a]

The dialect variant [a] has a long history in the area: it is found in legal texts from the 13th century in western Sweden (Collin et al. [1827] 1976:IV; Broberg [1972] 2001: 76 and dialect maps 5:7, 8:3; Götlind & Landtmanson 1940–41: 260). In our 1940s data, informants use it sporadically, and only in front of liquids (/r/ and /l/). It thus seems that this dialect variant was undergoing change towards the standard variant in the mid-20th century. If this variant followed the same pattern as the other dialect variants investigated in this paper, it should be replaced by the standard variant [ɛ] in the newer recordings, at least among the adolescents. This was predicted in the 1970s: Broberg ([1972] 2001) notes that this dialect variant was undergoing change in Värmland, and that the short (ɛ) in front of /l/ and /r/ was realised more like the standard Swedish pronunciation [ɛ], rather than the dialect variant [a]. However, as was stated in the introduction, this dialect variant appears to be behaving in a rather unruly fashion. Not only has it not disappeared, its use has increased significantly and all speakers in the 2011 recordings use it frequently. Adding to the picture of what could almost be described as a ‘phonetic explosion’ is the fact that speakers use it in front of all phonemes, an expansion of which we found no trace in the recordings from the 1940s, nor did Broberg in the 1970s. Some time between 1970 and 2011 something must have happened to this dialect variant. In order to investigate the use of (ɛ), words that contained the vowels /ɛ/ and /a/ were measured in all informants’ speech.
In total, we have measured approximately 150 tokens of the variable short (ɛ) and for comparison approximately 80 tokens of the variable short (a) (in order to see how similar the realisations of the two variables are) in the three datasets from Torsby and Karlstad. The measurements were made in Praat (Boersma & Weenink 2013). Figure 3 illustrates the vowel phonemes of standard Swedish, and how /ɛ/ opens towards /a/. In Figure 4, a NORM speaker recorded in Torsby in the 1940s illustrates the traditional system, i.e. the phonemes are kept apart except from in front of /l/ (and for some people /r/) where the phoneme /ɛ/ sometimes opens towards /a/. Although the NORM informants recorded in the 1940s use the dialect variant only to a small extent in front of /r/ and /l/, all the newly recorded informants use

The unruly dialect variant [a] 105


Figure 3. The vowel phonemes of Central Standard Swedish, after Engstrand (1999: 140). The arrow illustrates the opening of /ɛ/ towards /a/.

[Figure 4 plot. Axes: F1 (400–800) against F2 (2000–800). Legend: Standard /ɛ/; Dialect /ɛ/>/a/; Comparison /a/. Words plotted: läpparna ‘lips’, järpen ‘hazel hen’, snäll ‘kind’, i stället ‘instead’, rädd ‘afraid’, hjälp ‘help’, kälke ‘sledge’, fallet ‘case’, hade ‘had’, alltså ‘you know’, arbetet ‘work’, aldrig ‘never’.]

Figure 4. One NORM-informant’s pronunciation of /ɛ/ and /a/ illustrates the traditional system in Torsby in the 1940s.

it frequently. It is very noticeable and it is not only found in front of /r/ and /l/, but in all consonant contexts, i.e. /f/ (/hɛftit/>/haftit/, ‘cool’), /g/ (/vɛgː/>/vagː/, ‘wall’), /j/ (/sɛʝəɹ/>/saʝəɹ/, ‘says’), /k/ (/slɛkt/>/slakt/,


‘relatives’), /m/ (/ɛmnən/>/amnən/, ‘subjects’), /n/ (/çɛnstən/>/çanstən/, ‘the position’), /ŋ/ (/tɛŋt/>/taŋt/, ‘thought’), /s/ (/hɛst/>/hast/, ‘horse’) and /t/ (/lɛtː/>/latː/, ‘simple’). In Figure 5, a 24-year-old woman serves as an example of the realization of (ɛ) and (a) in Torsby in 2011. The pronunciations of the word promenadhäst ‘recreational riding horse’ and the name Anna are displayed and illustrate how close the dialect realisation of /ɛ/ in häst is to that of the open /a/ in Anna in the dialect. The dialect speaker is marked with a square in the vowel chart. For comparison, we have made a recording of a standardized speaker’s realisation of the same words. As the vowel chart shows, the distance between /ɛ/ and /a/ is shorter in the speech of the Torsby informant than for the more standardized speaker.

[Figure 5 plot. Axes: F1 (600–1200) against F2 (2000–1400). Legend: Standard; Dialect. Words plotted for each speaker: promenadhäst, Anna.]

Figure 5. Vowel chart illustrating the distance between the pronunciations of /ɛ/ and /a/ for a dialect speaker (recorded in 2011) and a standardized speaker (recorded in 2015).
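The notion of distance between two vowel realisations in a chart like Figure 5 can be made concrete as a Euclidean distance in the F1/F2 plane. The sketch below is illustrative only: the formant values are invented placeholders, not readings taken from Figure 5, and the helper `formant_distance` is a hypothetical name. Working directly in Hz is also a simplification, since an auditory scale or speaker normalisation may be preferable when comparing different speakers.

```python
from math import hypot


def formant_distance(v1, v2):
    """Euclidean distance between two vowels, each given as (F1, F2)."""
    return hypot(v1[0] - v2[0], v1[1] - v2[1])


# Hypothetical (F1, F2) values in Hz, NOT measurements from Figure 5.
dialect = {"häst": (850.0, 1600.0), "Anna": (900.0, 1550.0)}
standard = {"häst": (650.0, 1850.0), "Anna": (950.0, 1500.0)}

d_dialect = formant_distance(dialect["häst"], dialect["Anna"])
d_standard = formant_distance(standard["häst"], standard["Anna"])

print(f"dialect speaker:  {d_dialect:.0f}")
print(f"standard speaker: {d_standard:.0f}")
# With these placeholder values the dialect speaker's two vowels lie closer together.
```

Comparing the two within-speaker distances, rather than raw formant values across speakers, sidesteps some of the differences in vocal tract size between the two people recorded.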

We have also listened to data from Karlstad as well as some smaller villages in Värmland (see Section 3). In Köla, Gräsmark and Skillingmark, all within a 100 km radius of Torsby, we found frequent use of this variant among all ages (regardless of gender or social background). In Karlstad, we have also found some speakers who open /ɛ/ very clearly (again without any correlation to any social variables). In locations further away from Torsby (i.e. Södra Finnskoga, Dalby, Gåsborn and Hammarö), we found no trace of the dialect variant [a] and it seems this dialect variant is restricted to an area between Torsby, Skillingmark and Karlstad (see Figure 1 above).


6. Discussion

In this article, we have described the increased use of the unruly dialect variant [a]. It is unruly in the sense that it does not follow the typical pattern for traditional dialect features: in our data, dialect variants that are uncommon in older data are used even less frequently, or are absent, in today’s speech in Torsby. However, the dialect variant [a] is used more frequently today than 70 years ago, and is found in front of all phonemes, in contrast with the 1940s when it was only used in front of /r/ and /l/. In the newly recorded data, informants of all ages use the [a], regardless of how much traditional dialect they speak, and regardless of age, gender or social background. As a comparison, we have also investigated the use of [a] in a dataset from the area’s largest village, Karlstad, as well as in short speech samples from seven small villages in Värmland. In Karlstad, Köla, Skillingmark and Gräsmark, we also found this unruly variant, but not in locations further away in the region. In Karlstad we also found single cases of the opening of the variable (ɛ), i.e. [ɛː].

As mentioned in Section 2, previous research has described the opening of /ɛː/ towards /aː/ in Swedish, and has suggested that this is an ongoing change in the Swedish vowel system in general, as this can be noted in large parts of Sweden (Leinonen 2010; Svahn & Nilsson 2014). As it is thought to spread from urban areas, it can be said to signal urbanity. Could it simply be that /ɛ/ follows the same pattern as /ɛː/? The answer to this must be no. We have found no evidence of the open /ɛː/ in Torsby, nor any other traces of other new urban linguistic features, regardless of age or how levelled towards the standard the informants are. Furthermore, Torsby informants do not seem to orient towards urban lifestyles to any great extent; indeed, it is rather the opposite.
As was mentioned in Section 3, the interviews and questionnaires reveal that these informants have close-knit social networks and identify strongly with local values. Instead, we believe that the long /ɛː/ and short /ɛ/ are two separate processes with separate social meanings (see also Figure 6): The opening of long /ɛː/, we suggest, indexes (Silverstein 2003) modernity and urban lifestyles, whereas the opening of short /ɛ/ rather expresses tradition and authentic local identity and is linked to place (see also Johnstone et al. 2006; Johnstone & Kiesling 2008; Johnstone 2009, 2014). The motivation to signal tradition and local identity in Torsby may be an effect of increasing contact with other Swedish varieties over the past 70 years, similar to the processes Labov found in Martha’s Vineyard where his informants diverged from tourists by speaking more traditional dialect and thus marking their sovereignty as locals (cf. Labov 1972). Even though Torsby is homogenous and isolated compared to many other locations in Sweden, citizens of Torsby of course have more contact with other parts of the world today than 70 years ago,

108 Jenny Nilsson and Lena Wenner

[Figure 6: two scales. Opening of long /ɛː/ to [æː] or [aː]: "standard and neutral" versus "modern / urban". Opening of short /ɛ/ to [a]: "standard and neutral" versus "traditional / local".]

Figure 6. The social meaning of opening /ɛː/ and /ɛ/.

and linguistically signalling who is in-group and who is out-group may be of importance (cf. Giles & St Clair 1979: 11, Silverstein 1998, Johnstone et al. 2006). From the fieldwork, interviews and questionnaires, it is apparent that Torsby citizens in general are proud of their heritage, and that speaking dialect is a very important part in maintaining an authentic local identity. Studies by Røyneland (2005) and Johnstone et al. (2006) (see also Johnstone 2009) demonstrate similar trends in eastern Norway and Pittsburgh, USA. The question remains how this dialect variant gained such ground in this area, both language-internally and in the speech community. Why this variant and not another? Did this variant at some point between 1940 and 2011 have certain indexical values (Silverstein 2003) that would speed up this process? From our data, it seems it did not in the 1940s, and from Broberg’s ([1972] 2001) report it did not in the 1970s either. Unfortunately, we only have access to data from NORM speakers from the 1940s, and no data from the 1970s–1990s, which could have shed some light on where this use emerged, as well as who used it. However, as the traditional dialect variant [a] was used in a much larger geographical area (see e.g. Götlind & Landtmanson 1940–41), and was not typical for Torsby (opposite to several other traditional salient dialect variants), it is hard to see what this specific variant would have indexed in the 1940s. It is possible that [a] at some point between 1970 and today did have certain indexical values (cf. Johnstone et al. 2006). According to the data from 2011, it is apparently used by most in the speech community in question today, and it is possible that this traditional dialect variant has become enregistered (Agha 2007, Johnstone 2009). A potential meaning (Eckert 2008) could then be “authentic local citizen”.
This is however only one possible interpretation, made by outsiders, and in order to fully understand the social meaning of this marker for the users of it we would need to further investigate the Torsby citizens’ perceptions and attitudes towards different realizations of (ɛ). In the meantime, we can at least conclude that the linguistic variable (ɛ) is very common in Swedish speech, and the frequency with which this dialect variant is used adds to the impression of an individual being a traditional dialect speaker,


which we suggest is perceived as positive by most in Torsby. The frequency in itself may also partly explain the rapid spread in the speech community; as speakers would be exposed to it in almost any short conversation, convergence processes, and ultimately change, could be faster than for a less frequent variant (see e.g. Auer & Hinskens 2005; Giles & Smith 1979). Other reasons for why the use of the traditional dialect variant [a] has increased so rapidly, and not another previously more locally salient dialect variant, remain a mystery.

Acknowledgements

The research reported here was funded by the Swedish Academy with support from the Knut and Alice Wallenberg Foundation, as well as by the Swedish Research Council (VR) and the Institute for Language and Folklore. We would like to express our gratitude to Therese Leinonen and Eivind Nessa Torgersen for their insightful comments on an earlier draft of this manuscript, and to two anonymous reviewers and the editors for their valuable suggestions. We are also grateful to Bengt Edqvist for help with the figures, to Tobias Müller for help with the map illustration, to Rachel Hartell for proofreading, and to the project Interaction and Variation in Pluricentric Languages for providing us with additional data.

References

Agha, Asif. 2007. Language and social relations (No. 24). Cambridge: Cambridge University Press.
Auer, Peter. 1998. “Dialect Levelling and the Standard Varieties in Europe”. Folia Linguistica 32 (1–2): 1–10. doi: 10.1515/flin.1998.32.1-2.1
Auer, Peter. 2005. “Europe’s sociolinguistic unity, or: A typology of European dialect/standard constellations”. In Perspectives on Variation: Sociolinguistic, Historical, Comparative, ed. by N. Delbecque et al., 7–42. (Trends in Linguistics. Studies and Monographs 163.) Berlin: Mouton de Gruyter. doi: 10.1515/9783110909579.7
Auer, Peter, and Hinskens, Frans. 2005. “The role of interpersonal accommodation in a theory of language change”. In Dialect change: convergence and divergence in European languages, ed. by P. Auer, F. Hinskens, and P. Kerswill, 335–357. New York: Cambridge University Press. doi: 10.1017/CBO9780511486623.015
Boersma, Paul and Weenink, David. 2013. Praat: doing phonetics by computer [Computer program]. Version 5.3.51, retrieved 2 June 2013 from http://www.praat.org/.
Broberg, Richard. [1972] 2001. Språk- och kulturgränser i Värmland. En översikt och några synpunkter. (Svenska landsmål och svenskt folkliv B:67.) Uppsala: Språk- och folkminnesinstitutet.
Chambers, Jack K. and Trudgill, Peter. 1980. Dialectology. Cambridge: Cambridge University Press.
Collin, H. S., Schlyter, C. J. (eds.). [1827] 1976. Westgöta-lagen. Lund: Ekstrand.
Eckert, Penelope. 2008. “Variation and the indexical field”, Journal of Sociolinguistics 12 (4): 453–476. doi: 10.1111/j.1467-9841.2008.00374.x


Engstrand, Olle. 1999. Handbook of the International Phonetic Association. A Guide to the usage of the International Phonetic Alphabet. Cambridge: Cambridge University Press.
Finnur Friðriksson. 2008. Language change vs. stability in conservative language communities: A case study of Icelandic. Göteborg: Institutionen för lingvistik.
Giles, Howard and Robert N. St. Clair, (eds.). 1979. Language and Social Psychology. Oxford: Blackwell and Baltimore.
Giles, Howard and Philip Smith. 1979. “Accommodation Theory: Optimal Levels of Convergence”. In Language and Social Psychology, ed. by Howard Giles and Robert N. St. Clair, 45–65. Oxford: Blackwell and Baltimore.
Götlind, Johan and Samuel Landtmanson. 1940–50. Västergötlands folkmål. Vol. 1–4. (Skrifter utg. av Kungl. Gustav Adolfs Akademien för folklivsforskning 6.) Uppsala: Lundequistska bokh.
Ivars, Ann-Marie. 2003. “Lokalt och regionalt i svenskan i Finland. Tendenser i språkutvecklingen i norr och söder”. In Nordisk dialektologi, ed. by Gunnstein Akselberg et al., 51–81. Oslo: Novus.
Johnstone, Barbara. 2009. “Pittsburghese shirts: commodification and the enregisterment of an urban dialect”, American Speech 84 (2): 157–175. doi: 10.1215/00031283-2009-013
Johnstone, Barbara. 2014. ““100% Authentic Pittsburgh”: Sociolinguistic authenticity and the linguistics of particularity”. In Indexing Authenticity: Sociolinguistic Perspectives, ed. by T. Breyer, J. Leimgruber, and V. Lacoste, 97–112. (Linguae & litterae 39). Berlin, München, Boston: DeGruyter.
Johnstone, Barbara, Jennifer Andrus, Andrew E. Danielson. 2006. “Mobility, Indexicality, and the Enregisterment of ‘Pittsburghese’”, Journal of English Linguistics 34 (2): 77–104. doi: 10.1177/0075424206290692
Johnstone, Barbara and Scott F. Kiesling. 2008. “Indexicality and experience: Exploring the meanings of /aw/-monophthongization in Pittsburgh”, Journal of Sociolinguistics 12 (1): 5–33. doi: 10.1111/j.1467-9841.2008.00351.x
Labov, William. 1972. Sociolinguistic patterns. Philadelphia: University of Pennsylvania Press.
Leinonen, Therese. 2010. An Acoustic Analysis of Vowel Pronunciation in Swedish Dialects. GRODIL 83. Groningen: Rijksuniversiteit Groningen.
Nilsson, Jenny. 2009. “Dialect change?”. Nordic Journal of Linguistics 32: 207–220. doi: 10.1017/S0332586509990047
Nilsson, Jenny. 2015. “Stabilitet och förändring i norra Värmland – dialekten i Torsbyområdet 1940-tal och 2010-tal”. Folkmålsstudier 53: 167–198.
Norrby, Catrin, Camilla Wide, Jenny Nilsson, and Jan Lindström. 2015. “Address and Interpersonal Relationships in Finland-Swedish and Sweden-Swedish Service Encounters”. In Address Practice as Social Action, ed. by Catrin Norrby and Camilla Wide, 75–96. Houndmills, Basingstoke, Hampshire; New York: Palgrave Macmillan.
Pedersen, Inge Lise. 2005. “Processes of standardization in Scandinavia”. In Dialect change. Convergence and Divergence in European Languages, ed. by Peter Auer, Frans Hinskens and Paul Kerswill, 171–195. New York: Cambridge University Press.
Røyneland, Unn. 2005. Dialektnivellering, ungdom och identitet. Ein komparativ studie av sprakleg variasjon og endring i to tilgrensande dialektområden, Røros og Tynset. (Acta Humaniora 231.) Oslo: Det humanistiske fakultet, Universitetet i Oslo.
Sandøy, Helge. 2004. “Types of society and language change in the Nordic countries.” In Language Variation in Europe. Papers from the Second International Conference on Language Variation in Europe, ICLaVE 2 Uppsala University, Sweden, June 12–14, 2003, ed. by Britt-Louise Gunnarsson et al., 53–76. Uppsala: Department of Scandinavian Languages, Uppsala University.


Silverstein, Michael. 1998. “Contemporary Transformations of Local Linguistic Communities”, Annual Review of Anthropology 27 (1): 401–426. doi: 10.1146/annurev.anthro.27.1.401
Silverstein, Michael. 2003. “Indexical Order and the Dialectics of Sociolinguistic Life.” Language and Communication 23: 193–229. doi: 10.1016/S0271-5309(03)00013-2
Svahn, Margareta and Jenny Nilsson. 2014. Dialektutjämning i Västsverige. Göteborg: Dialekt-, ortnamns- och folkminnesarkivet i Göteborg.
Thelander, Mats. 1979. Språkliga variationsmodeller tillämpade på nutida Burträsktal. (Acta Universitatis Upsaliensis. Studia Philologiae Scandinavicae Upsaliensia 14:1 & 2.) Uppsala: Uppsala universitet.
Wenner, Lena. 2010. När lögnare blir lugnare. En sociofonetisk studie av sammanfallet mellan kort ö och kort u i uppländskan. (Skrifter utg. av Institutionen för nordiska språk vid Uppsala universitet 80). Uppsala: Uppsala University.

Vowel raising and vowel deletion as sociolinguistic variables in Northern Greek

Panayiotis A. Pappas
Simon Fraser University

The research reported here is part of a longitudinal case study into the linguistic effects of de-urbanization, which is occurring in Greece due to severe economic recession. The overall aim of the analysis is to explore the way in which de-urbanization is affecting the evaluation and production of dialects in rural communities. In this paper, I present evidence that the features of unstressed vowel deletion and vowel raising are socially embedded in Northern Greek. Even though the usage of standard variants is very close to categorical, a quantitative comparison of the linguistic patterns of urban in-migrants who have returned to the rural community against the usage of speakers who have never left reveals that the use of standard variants indexes more advanced education and an orientation towards the urban lifestyle.

Keywords: high vowel deletion, mid vowel raising, rural dialects, dialect contact, de-urbanization, Modern Greek

doi 10.1075/silv.19.07pap
© 2017 John Benjamins Publishing Company

1. Introduction

One of the major causes of language change has been the large movement of populations. For example, Labov (2001: 342) credits the vast mobilization that was the result of WWI as the cause for changes in the vowel system of Philadelphia. Other than the two world wars, the demographic process that has undoubtedly shaped the linguistic map of the developed world the most in the 20th century is urbanization. Bailey and Maynor (1989) argue that the rise of the automobile industry in Detroit was a catalyst in the divergence of African-American English from its Anglo-American origins, as the creation of thousands of jobs in the northern cities in the early 1900s attracted many members of these communities, both Anglo and African. In the new urban environments of Detroit, Chicago, and Cleveland, however, these workers lived in segregated communities, and the ensuing deepening of the geographical and cultural separation led to the divergence of the two

114 Panayiotis A. Pappas

varieties of English. Similar effects of population movement have been noted in other regions such as Tennessee (Fridland 2003), Ohio (Dodsworth 2008), and in the U.K. (Britain and Trudgill 2005). Labov (1963) showed that the process of urbanization can result in the weakening of rural dialects, since younger dialect speakers who aspire to find success in the urban environment perceive the urban variety as prestigious and begin to adopt it, while at the same time aspects of the rural dialect are stigmatized. Hence, local varieties begin to give way to more urban forms of speech, through the process of dialect contact (Chambers and Trudgill 1998, Britain and Trudgill 1999, Kerswill and Williams 2000). This type of language change has been confirmed by several studies, (Trudgill 1972, Nichols 1983, Trudgill 1986, Kerswill 1993, Wolfram and Schilling-Estes 1995, Britain 1997, Hazen 2002), and has been labeled by Labov (2006: 203) as “change from above.” Hazen (2002) classified residents in a rural county of North Carolina into two categories: those who identified with their local community only, and those who identified both with the local community and with the urban communities surrounding it. He shows that the latter tend to use stigmatized features of their dialect less frequently, and argues that this is correlated with their attitudinal orientation. On the other hand, some studies (Bailey et al. 1993, Fridland 2003) have demonstrated that when speakers of a standard variety move en masse into a community where a non-standard variety is spoken, local speakers may resist this change by preserving a few dialectal features as markers of an authentic local identity. Wolfram and Schilling-Estes (1996) have shown that this can also be the case for smaller communities as they transition into a post-insular phase. 
According to Bourdieu (1977), such phenomena of dialect shift or maintenance are viewed as the linguistic expression of power struggles within a community, because language is imbued with symbolic capital. Typically, urban life is associated with a higher standard of living, better education and more symbolic capital than life in rural areas. As a result, urban varieties have overt prestige when compared to rural ones. An important question that has not been investigated, however, is whether this process of dialect attrition through urbanization can be reversed if a change in the socioeconomic conditions renders life in urban centers less desirable than life in rural communities. The reason for the lack of such investigations is that there have not been many such cases of de-urbanization in the developed world, until the last decade. In this project, I intend to conduct a study that examines the effects of such a reversal in urbanization in Greece. After WWII, Greece, like many European countries, underwent a long period of urbanization that did not abate until a few years ago. By the beginning of the 21st century, the two major urban centers of

Vowel raising and deletion in Northern Greek 115

Greece (Athens and Thessaloniki) comprised 50% of the population of Greece (roughly 5 million inhabitants, HRMI 2006). This period of urbanization led to the decrease of dialect speakers and the emergence of an urban Greek vernacular (based mostly on the Athenian koiné), which quickly became the prestige variety in the first decade after the restoration of democratic rule in the country (1974–1984, cf. Frangoudaki 1992). Pappas (2008) explores the different attitudes of speakers in a rural community in Greece towards their local variety and especially towards the palatal pronunciation of the sounds /l/ and /n/, which is stigmatized in popular culture. The study reveals that this stigmatization was affecting younger speakers the most, particularly those who planned to pursue education and career opportunities in Athens or Thessaloniki. These speakers used the standard pronunciation almost exclusively and expressed negative attitudes both towards their local community and their dialect. However, since the beginning of the economic crisis in 2008, unemployment in Greece has reached 25.6% for the general population, and over 53.2% for workers under the age of 24 (ELSTAT 2015). As expected, the urban workforce has been much more affected by this turn of events. In rural communities, especially in areas where the tourism industry is very robust, the effect is mitigated. News articles (Shorto 2012) have documented the exodus of young city dwellers to their ancestral villages in search of better job opportunities. A study published by the Greek government (Kapa Research 2012) shows that 19% of the participants were exploring ways to move out of the two major cities, while 68% thought that such a move would be beneficial to their standard of living.
The press release for the study comments that this is “a period of reversal of the process of urbanization” (HRΜDF 2012), whose biggest effect is that young people in rural Greece no longer desire to move to a city to seek employment. The larger project to which this study belongs is a longitudinal study which aims to assess whether in-migrants who are returning to rural communities converge back to rural dialect norms over time, or if they continue to use the standard at the same rate as when they first arrived. The research questions guiding this project are as follows:

i. Will the pronunciation of speakers who have returned from the cities converge over time with that of speakers who have not left the village?
ii. Will their attitudes towards the local variety change over time?

In order to answer these questions I plan to: (i) Analyze speakers’ usage and evaluation of certain diagnostic variables in order to establish what differences, if any, exist between the two groups, and whether gender and education play a role; (ii) Repeat interviews after a period of five years with as many speakers as possible and determine what changes, if any, have occurred.


The first set of interviews took place in 2012 in a rural community in Northern Greece (the village of Limenaria on the island of Thasos), where I recorded semi-structured interviews with two groups of speakers: eleven speakers between the ages of 25 and 35 who had never left their community, and thirteen speakers of the same age group who had lived in an urban center but returned. The former group includes five women and six men, while the latter has seven women and six men. In terms of education, all those who have not left the island completed high school only, while of the in-migrants nine have some form of post-secondary education. The island of Thasos was chosen because: (i) There are published descriptions of the traditional dialect (Tompaidis 1967), confirming the presence of defining characteristics of Northern Greek; (ii) While tourism is vigorous and the local economy is robust, the island also maintains a stable population during the off-season months, so it would be easy to find participants; (iii) It is located near the mainland, so while there is a certain degree of isolation, travel and contact with urban centres is quite frequent. The first necessary step in the project is to verify that the in-migrant speakers have indeed adopted a more standard pronunciation. I will present evidence that there is a significant difference between speakers who have remained on the island for the most part and those who are returning after a substantial period of living in a major urban centre and that, furthermore, education and gender play a role. Thasos Greek (Contossopoulos 2001: 61–71, Trudgill 2003: 53) belongs to the Extreme Northern variety in which unstressed high vowels are deleted and unstressed mid vowels are raised in all word positions. Typically, one hears [piˈði] for /peˈði/ (‘child’), [ˈtosu] for /ˈtoso/ (‘this much’), [ˈpaʎ] for /ˈpali/ (‘again’), or [ˈkti] for /kuˈti/ (‘box’).
Phonologically, the rules are arranged in a counter-feeding order (Newton 1972: 186): [piˈði] (‘child’) does not also undergo deletion to become [pði]. In terms of acoustic studies, there is only Topintzi and Baltazani (2012), which examines the deletion pattern of one elderly speaker from Kozani in northwestern Greece. They find that /i/ is much more resistant to deletion than /u/ and that in both cases the deletion is gradient and variable. However, their findings are based on only one speaker, who is performing dialect (Schilling-Estes 1998). He is reading a story that he himself has written based on local experiences and in which he attempts to represent the vowel deletion orthographically. As performance speech tends to exaggerate the occurrence of dialectal features, a direct comparison of the results of Topintzi and Baltazani (2012) with the results of this report would not be particularly informative.
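The counter-feeding order mentioned above can be made concrete with a small sketch. The Python below is only an illustration under simplifying assumptions: words are hand-segmented lists of (segment, stressed) pairs, the rule functions and the `word` helper are hypothetical names introduced here, and palatalisation (as in [ˈpaʎ]) is not modelled. Applying deletion before raising yields the attested forms, whereas the reverse (feeding) order would also delete the raised vowel.

```python
# Toy model of the two Northern Greek rules (illustrative only).
HIGH_VOWELS = {"i", "u"}
RAISING = {"e": "i", "o": "u"}


def delete_unstressed_high(segments):
    """Drop every unstressed high vowel."""
    return [(seg, stressed) for seg, stressed in segments
            if stressed or seg not in HIGH_VOWELS]


def raise_unstressed_mid(segments):
    """Raise unstressed /e/ to [i] and unstressed /o/ to [u]."""
    return [(seg if stressed else RAISING.get(seg, seg), stressed)
            for seg, stressed in segments]


def derive(segments, order="counter-feeding"):
    """Apply both rules in the given order and return the surface string."""
    if order == "counter-feeding":
        rules = (delete_unstressed_high, raise_unstressed_mid)
    else:  # hypothetical feeding order, for comparison
        rules = (raise_unstressed_mid, delete_unstressed_high)
    for rule in rules:
        segments = rule(segments)
    return "".join(seg for seg, _ in segments)


def word(*segments):
    """Helper: a leading apostrophe marks the stressed vowel, e.g. "'i"."""
    return [(s.lstrip("'"), s.startswith("'")) for s in segments]


CHILD = word("p", "e", "ð", "'i")      # /peˈði/
BOX = word("k", "u", "t", "'i")        # /kuˈti/
THIS_MUCH = word("t", "'o", "s", "o")  # /ˈtoso/

print(derive(CHILD))                   # raising only
print(derive(BOX))                     # deletion only
print(derive(THIS_MUCH))               # raising only
print(derive(CHILD, order="feeding"))  # unattested over-application
```

Stress marks are dropped from the output strings; the point is only that the raised [i] of ‘child’ survives because deletion has already applied.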


2. Methodology and results

All speakers were recorded during semi-structured interviews, in which the main topics were family and relatives, life on the island vs. life in the city, the impact of the economic crisis, and unique features of Thasos, including its dialect. Each interview was at least 25 minutes long. For each vowel, 20 tokens were extracted from the portion of the interview after the fifth minute and before any discussion of dialectal features, in order to avoid, as much as possible, self-conscious speech. If a word contained more than one possible instance of deletion or raising, only the first vowel was examined in order to mitigate any effects of priming (cf. Tilsen 2009). For the vowels /i/, /e/ and /o/, which are very frequent, I restricted the selection by imposing a maximum of one token for each type (word) per speaker in order to ensure the lexical variability of the dataset. This was not possible for /u/, because, as Topintzi and Baltazani (2012: 393) note, Protopapas et al. (2010) have shown that /u/ is the least frequent among vowels in Greek (4% only). As a result, the bulk of unstressed /u/ tokens come from six types, most notably the pronoun /mu/ (indirect object or possessive, 1st sg.) at 27% (127/480), and the noun /ðuˈlia/ (‘work’) at 16% (67/480). For the raising of unstressed mid vowels, several words with /e/ and /o/ in word final position were excluded from the dataset because the vowels in question were not raised but deleted. A word was excluded if at least two speakers demonstrated this pattern of deletion, which is mostly seen in the ending of the 1st person plural active of a verb. For example, /ˈpame/ (‘let’s go’) is frequently realized [ˈpam] instead of the expected [ˈpami].
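The sampling constraints described above can be sketched as a simple filter. This is a hypothetical illustration, not the procedure actually used: the `Token` record, its field names and the `dialect_talk_start` cut-off are assumptions introduced here; only the constraints themselves (tokens after the fifth minute, 20 tokens per vowel, at most one token per word type, first eligible vowel in a word only) come from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    word: str         # word form, used as the "type"
    vowel: str        # target vowel: "i", "u", "e" or "o"
    time_s: float     # onset time in the interview, in seconds
    vowel_index: int  # 0 = first eligible unstressed vowel in the word


def select_tokens(tokens, vowel, dialect_talk_start, n=20,
                  one_per_type=True, start_s=5 * 60):
    """Pick up to n tokens of one vowel from the eligible stretch of an interview."""
    chosen, seen_types = [], set()
    for tok in sorted(tokens, key=lambda t: t.time_s):
        if tok.vowel != vowel or tok.vowel_index != 0:
            continue  # only the first eligible vowel of a word is examined
        if not (start_s <= tok.time_s < dialect_talk_start):
            continue  # skip the first five minutes and any talk about dialect
        if one_per_type and tok.word in seen_types:
            continue  # at most one token per word type
        chosen.append(tok)
        seen_types.add(tok.word)
        if len(chosen) == n:
            break
    return chosen
```

For /u/, where the one-token-per-type restriction is reported as not feasible, the same sketch would be called with `one_per_type=False`.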
For this reason, I excluded all instances of a 1st person plural active verb form; the form /ˈine/ (3rd singular or plural of ‘be’), which often is pronounced [ˈin] as well as [ˈini]; and the adverbs /ˈkato/ (‘down’) pronounced [ˈkat] as well as [ˈkatu], and /ˈpano/, [ˈpan] as well as [ˈpanu]. All tokens were coded impressionistically as to whether the pronunciation of the vowel was dialectal or standard. Spectrograms of a sample of the tokens (10%) were examined in Praat 5.4.2 (Boersma and Weenink 2015) in order to verify the accuracy of the coding. Overall, 1920 tokens were extracted and coded, 480 (24 speakers × 20 tokens) for each variable. Table 1 shows the results for use of the two variants for each of the vowels. The frequency of standard usage for /i/ is 86.5%, 89.6% for /e/, 81% for /o/, while for unstressed /u/ it is the highest, at 99%. For this reason, /u/ is not included in the regression analysis. Although these results show that standard usage is at near categorical levels for all variables, I will show that there is still significant social embedding at play (consider Meyerhoff, this volume, for a discussion of how to deal conceptually with low frequency forms). A preliminary Goldvarb (Sankoff et al. 2015) analysis demonstrates that ‘type of vowel’ is indeed a significant factor group in terms of


Table 1. Distributional results for vowel deletion and raising in Thasos Greek

Vowel      Standard pronunciation     Northern Greek pronunciation
           %         N                %         N
/u/        99%       475              1%        5
/i/        86.5%     415              13.5%     65
/e/        89.6%     430              10.4%     50
/o/        81%       389              19%       91
Total N              1709                       211

probability of standard pronunciation (see Table 2). Furthermore, a ΔG (difference of deviance) comparison shows that the model with the three vowels as separate factors (log likelihood = −535.4) fits significantly better than the model in which the vowels /e/ and /o/ are grouped together (log likelihood = −542.4), with a χ² value of 14, which is significant at p < […]

[…] > 55 years)
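As a worked check on the comparison reported for the Thasos vowel models, the deviance difference is twice the difference between the two log likelihoods, i.e. 2 × (−535.4 − (−542.4)) = 14. The sketch below recomputes it together with a p-value. It assumes one degree of freedom (collapsing /e/ and /o/ removes one factor level), which the text does not state explicitly, and it uses the closed-form chi-square tail for 1 d.f. rather than any Goldvarb routine; the function names are illustrative.

```python
import math


def deviance_difference(ll_full, ll_reduced):
    """Likelihood-ratio statistic: twice the gain in log likelihood."""
    return 2.0 * (ll_full - ll_reduced)


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 d.f.

    For 1 d.f., P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


# Log likelihoods as reported for the two models.
delta_g = deviance_difference(-535.4, -542.4)
p_value = chi2_sf_1df(delta_g)  # assumes 1 degree of freedom
print(f"delta G = {delta_g:.1f}, p = {p_value:.5f}")
```

With more than one degree of freedom the closed form above no longer applies, and a general chi-square survival function would be needed instead.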

                      […]               Secondary         University        Total
                      N       %         N       %         N       %         N       %
[…]      Split        2021    62.8      1633    80.5      2764    74.0      6418    71.5
         Merger       1195    37.2      395     19.5      972     26.0      2562    28.5
         Total        3216              2028              3736              8980
[…]      Split        10773   75.7      12677   86.2      9338    86.1      32788   82.4
         Merger       3460    24.3      2033    13.8      1504    13.9      6997    17.6
         Total        14233             14710             10842             39785

Results from Granada, as discussed by Moya-Corral and Sosiński (2015), are particularly striking and support the hypothesis of a nearly accomplished change among the youngest and most educated speakers (Figure 5).

[Figure 5: chart with a vertical axis from 0 to 100, comparing 1995 and 2014 for age groups I (20-34), II (35-55) and III (>55) within each education level: University, Secondary school, Primary school.]

Figure 5. Effect of the speaker’s age and education on dental split in Granada town. Source: Moya-Corral and Sosiński (2015: 56, translated from Spanish)

3.3 Erosive changes

Though obstruent consonants in onset position are relatively stable in central-Castilian dialects (with the apparent exception of intervocalic /d/), this is not true with regard to obstruents in the coda (Table 4). Actually, lenition and deletion of both syllable-final /s/ and intervocalic /d/ are old changes from below, variably

Between local and standard varieties 135

affecting every variety of Spanish. Even though both varieties – central-Castile and Andalusia dialects – seem to share the same constraints on /d/ and /s/ variation, changes are less frequent and act more slowly in the former than in the latter. In fact, frequency and social spread of these changes follow the same order as the one shown in Table 4 (standard >> Castilian >> Eastern >> Western).

Table 4. Erosive changes. Dialect lenition and deletion of syllable-final /s/ and intervocalic /d/. Source: adapted from Villena-Ponsoda and Vida-Castro (2015)

Word sets           Standard      Castilian dialects        Eastern                                Western                   Gloss
niñ-o-s/niñ-o       [ˈniɲos̺]      [ˈniɲos̺], [ˈniɲɔh]        [ˈniɲɔ]                                [ˈniɲo]                   kids/kid
com-e-s/com-e       [ˈkomes̺]      [ˈkomes̺], [ˈkomɛh]        [ˈkɔmɛ]                                [ˈtu ˈkome]               you eat/(s)he eats
pata/pasta          [ˈpas̺ta]      [ˈpas̺ta], [ˈpahta]        [ˈpatːa], [ˈpatha], [ˈpatsa]           [ˈpatha], [ˈpatsa]        leg/dough
castillo/cachillo   [kas̺ˈtiʝo]    [kas̺ˈtiʝo], [kahˈtiʝo]    [kaˈtːiʝo], [kaˈthiʝo], [kaˈtsiʝo]     [kaˈthiʝo], [kaˈtsiʝo]    castle/little piece
hablado             [aˈβlaðo]     [aˈβlao], [aˈβlaðo]       [aˈβlao]                               [aˈβlao]                  spoken
comido              [koˈmiðo]     [koˈmiðo]                 [koˈmio]                               [koˈmio]                  eaten
dedo                [ˈdeðo]       [ˈdeðo]                   [ˈdeðo], [ˈdeo]                        [ˈdeðo], [ˈdeo]           finger

3.4 Cross-dialect variation. Syllable-final /s/

The increase of weakened and deleted syllable-final /s/ reveals a northern to southern geolinguistic continuum where some implicational relationships can be observed (Table 5). Variation includes sibilant (Castilian) or aspirated allophones with several stages of assimilation and resyllabification (Andalusian) – as well as deletion. Thus, deletion of syllable-final /s/ is very advanced among Andalusian and Canary Islands varieties overall, where sibilant allophones are almost non-existent, whereas it is unusual among central Castilian varieties. However, retention or deletion of /s/ is not only a matter of geography. Linguistic constraints, including either word position (final or internal) or grammatical status, strongly affect /s/ variation. For instance, syllable-final /s/ tends to be realised in a different way depending on whether it occurs word-internally or in word-final position (see 3.4). Bearing in mind that as we head south, frequency of sibilant [s]

136 Juan-Andrés Villena-Ponsoda and Matilde Vida-Castro

Table 5. Syllable-final dialect variation in Spain. Source: adapted from Vida-Castro (2004) and Molina-Martos (2015)

Areas (conservative to innovative)            s       h       0
Madrid downtown (Molina-Martos 2015)          82.0    16.0    1.0
Vallecas (Molina-Martos 2015)                 67.6    29.5    2.8
Alcalá de Henares (Blanco-Canales 2004)       64.0    34.0    2.0
Getafe (Martín-Butragueño 1991)               59.0    35.0    6.0
Toledo (Molina-Martos 1998)                   52.0    36.0    12.0
Las Palmas (Samper-Padilla 1990)              3.0     64.3    32.7
Linares (Jaén) (Gómez-Serrano 1994)           2.0     48.1    49.9
Malaga (Vida-Castro 2004)                     1.5     31.1    67.5
Granada (Tejada-Giráldez 2015)                1.4     28.1    70.5

pronunciation decreases and aspirated [h] and deleted variants increase, it is also apparent that external and internal factors interact (Table 6). Retention of segmental allophones ([s] and [h]) of /s/ in word-medial position ([ˈpahta], [ˈpatha], ‘dough’) vs. deletion in word-final position ([loˈniɲo], [lɔˈniɲɔ], ‘the kids’) typically occurs in the innovative areas, whereas conservative dialects make no such sharp difference between the two environments.

Table 6. Effect of word position in syllable-final /s/ variation. Source: adapted from Vida-Castro (2004), Molina-Martos (2015) and Tejada-Giráldez (2015)

Areas (conservative to innovative)            Word position   s       h       0       Retention
Madrid downtown (Molina-Martos 2015)          Internal        87.0    11.0    2.0     98.0
                                              Final           80.0    15.0    2.0     97.0
Alcalá de Henares (Blanco-Canales 2004)       Internal        81.0    18.6    0.5     99.6
                                              Final           64.0    34.0    2.0     98.0
Getafe (Martín-Butragueño 1991)               Internal        62.4    36.4    0.3     98.8
                                              Final           20.0    69.8    8.5     89.8
Toledo (Molina-Martos 2008)                   Internal        56.0    43.0    2.0     98.0
                                              Final           52.0    35.0    12.0    87.0
Granada (Tejada-Giráldez 2015)                Internal        1.1     90.5    8.4     91.6
                                              Final           1.5     7.3     91.2    8.8
Malaga (Vida-Castro 2004)                     Internal        0.3     90.1    9.6     90.4
                                              Final           1.8     14.8    83.4    16.6
Las Palmas (Samper-Padilla 1990)              Internal        0.5     96.2    2.6     96.7
                                              Final           3.6     53.6    42.6    57.2
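The contrast between the two word positions in Table 6 can be summarised as a single number per area: the difference between word-internal and word-final retention. The sketch below is only an illustrative recomputation; the retention figures are copied from the Retention column of Table 6, while the `position_gap` helper is a name introduced here, and treating the four central-Castilian areas as conservative and Granada, Malaga and Las Palmas as innovative is an assumption made for the comparison.

```python
# Retention (%) of syllable-final /s/ as (internal, final), from Table 6.
RETENTION = {
    "Madrid downtown": (98.0, 97.0),
    "Alcalá de Henares": (99.6, 98.0),
    "Getafe": (98.8, 89.8),
    "Toledo": (98.0, 87.0),
    "Granada": (91.6, 8.8),
    "Malaga": (90.4, 16.6),
    "Las Palmas": (96.7, 57.2),
}


def position_gap(retention):
    """Internal minus final retention, in percentage points, per area."""
    return {area: round(internal - final, 1)
            for area, (internal, final) in retention.items()}


gaps = position_gap(RETENTION)
# List areas from the smallest to the largest internal/final gap.
for area, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{area:<18} {gap:5.1f}")
```

On these figures the gap stays small in the central areas and becomes large in the southern and Canary Islands varieties, which is the pattern the paragraph above describes.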


3.5 Near-Andalusian Castilian

Since the converging Andalusian variety could be qualified as “near-Castilian” due to its increasing acquisition of standard features (3.1–3.2), the inverse might also be true. Near-Andalusian Castilian varieties resist trends of standardness and reinforce the use of non-standard features. Urban southern Castilian working-class speakers and, particularly, speakers with a migrant background follow patterns of /s/ and /d/ pronunciation that are close to those from the southern areas. Moreover, data from transitional dialects shows that this is also the case for Murcia and Extremadura (Hernández-Campoy and Villena-Ponsoda 2009). However, this trend is not uniform and young speakers from central Castile lead a reaction against southern pronunciation, not only in Madrid (Molina-Martos 2015) but also in its surrounding area (Martín-Butragueño 1991). This pattern of resistance is possibly motivated by local loyalty and a reaction to the alleged excessive migration from southern areas (Blanco-Canales 2004). This conservative change seems to have progressed considerably in recent years. Martín-Butragueño’s (1991) study of Getafe supports this idea of a reaction against the southern patterns by young speakers including young migrants with an Andalusian background (Table 7). More recent studies have confirmed his results (Molina-Martos 2015).

Table 7. Effect of background and age on sibilant /s/ retention among immigrants and native speakers in Getafe. Source: adapted from Martín-Butragueño (2002)

               Immigrants                        Natives
               I       II      III     IV        I       II      III     IV
N              413     738     985     213       387     809     954     507
%              72.0    64.0    57.0    25.0      68.0    70.0    58.0    59.0
Probability    .640    .555    .484    .188      .594    .624    .496    .500

4. Conclusion

Social and demographic changes occurring as a consequence of urbanisation, globalisation and intensive migration have affected dialect use all over Europe and led to local dialect attrition. Local dialect features disappear as local identities vanish and, at the same time, regional varieties emerge where features of wider geographical dispersal have greater potential for survival. However, levelling does not necessarily lead to convergence towards the standard variety, but may also produce koineisation towards supra-local, non-standard norms (Hinskens 1997; Auer 2005). The emergence of relatively stable and coherent regional varieties in the standard/dialect continuum (Auer’s 2005 type-C constellation) is common today. Horizontal levelling of varieties in urban Andalusia has resulted in convergence towards the national standard – particularly in Eastern Andalusia, which is far from the influence of the urban regional standard from Seville. However, at the same time, this vertical process has brought Andalusian and central Castilian varieties – as well as the transitional dialects – closer, by eliminating the strongest vernacular features. Some of the mergers, which are common among the most diverging vernacular varieties, become unusual as co-occurrence restrictions within the reorganised phonemic system avoid them (see 3.2). The resulting intermediate variety maintains some of the Andalusian phonological unmarked features affecting codas but diverges from the Seville regional standard since it adopts overt-prestige marked features from the national standard. This intermediate variety is nowadays relatively stable and coherent.

References

Alarcos-Llorach, Emilio. 1950. Fonología española. Madrid: Gredos.
Alvar, Manuel. 1974. “Sevilla, macrocosmos lingüístico”. In Estudios Filológicos y Lingüísticos, ed. by L. Quiroga-Torrealba, M. Torrealba and P. Díaz-Seijas, 13–42. Caracas: Instituto Pedagógico.
Ariza-Viguera, Manuel. 1997. “De la aspiración de /s/”. Philologica Hispalensis 13: 49–60.
Auer, Peter. 1997. “Co-occurrence Restrictions between Linguistic Variables. A Case for Social Dialectology, Phonological Theory and Variation Studies”. In Variation, Change and Phonological Theory, ed. by Frans Hinskens, Roeland van Hout and Leo Wetzels, 69–99. Amsterdam: John Benjamins. doi: 10.1075/cilt.146.05aue
Auer, Peter. 2005. “Europe’s Sociolinguistic Unity; Or, a Typology of European Dialect/Standard Constellations”. In Perspectives on Variation, ed. by N. Delbecque et al., 7–42. Berlin: Mouton de Gruyter. doi: 10.1515/9783110909579.7
Auer, Peter and Frans Hinskens. 1996. “The Convergence and Divergence of Dialects in Europe. New and Not So New Developments in an Old Area”. Sociolinguistica 10: 1–30 (special issue on The Convergence and Divergence of Dialects in Europe, ed. by Peter Auer and Frans Hinskens).
Bellmann, Günter. 1997. “Between Base Dialect and Standard Language”. Folia Linguistica 32/1–2: 23–34.
Blanco-Canales, Ana. 2004. Estudio sociolingüístico de Alcalá de Henares. Alcalá: Universidad.
Bortoni-Ricardo, Stella-Maris. 1985. The Urbanization of Rural Dialect Speakers. A Sociolinguistic Study in Brazil. Cambridge: Cambridge University Press.
Britain, David, Reinhild Vandekerckhove and Willy Jongenburger. 2009. Dialect Death in Europe? International Journal of the Sociology of Language 196/197. Berlin and New York: Mouton de Gruyter.


Cerruti, Massimo and R. Regis. 2014. “Standardization Patterns and Dialect/Standard Convergence: A North-Western Italian Perspective”. Language in Society 43/1: 83–111. doi: 10.1017/S0047404513000882
García-Amaya, Lorenzo. 2008. “Variable Norms in the Production of /θ/ in Jerez de la Frontera, Spain”. IUWPL7. Gender in Language: Classic Questions, New Contexts, ed. by Jeff Siegel et al., 49–71. Bloomington: IULC.
Gómez-Serrano, Antonio. 1994. Aspectos del habla de Linares (Jaén). Málaga: Universidad.
Guy, Gregory. 2013. “The Cognitive Coherence of Sociolects: How Do Speakers Handle Multiple Sociolinguistic Variables?” Journal of Pragmatics 52: 63–71. doi: 10.1016/j.pragma.2012.12.019
Guy, Gregory and Frans Hinskens. 2016. “Linguistic Coherence; Systems, Repertoires and Speech Communities. Introduction”. In Linguistic coherence: Systems, Repertoires and Speech Communities, ed. by Gregory Guy and Frans Hinskens. Lingua 172–173: 1–9.
Hernández-Campoy, Juan-Manuel and Juan-Andrés Villena-Ponsoda. 2009. “Standardness and Non-Standardness in Spain: Dialect Attrition and Revitalization of Regional Dialects of Spanish”. International Journal of the Sociology of Language 196/197: 181–214.
Hinskens, Frans. 1997. “Dialect Levelling: A Two-Dimensional Process”. Folia Linguistica 32/1–2: 35–51.
Hinskens, Frans, Peter Auer and Paul Kerswill. 2005. “The Study of Dialect Convergence and Divergence: Conceptual and Methodological Considerations”. In Dialect Change. Convergence and Divergence in European Languages, ed. by Peter Auer, Frans Hinskens and Paul Kerswill, 1–48. Cambridge: Cambridge University Press.
Hualde, José. 2005. The Sounds of Spanish. Cambridge: Cambridge University Press.
Kerswill, Paul. 1994. Dialects Converging: Rural Speech in Urban Norway. Oxford: Clarendon Press.
Kerswill, Paul. 2001. “Koineisation and Accommodation”. In The Handbook of Language Variation and Change, ed. by J. K. Chambers, Peter Trudgill and Natalie Schilling-Estes, 669–702. Oxford: Blackwell.
Kerswill, Paul and A. Williams. 2005. “New Towns and Koineization: Linguistic and Social Correlates”. Linguistics 43: 1023–1048. doi: 10.1515/ling.2005.43.5.1023
Lasarte-Cervantes, Mª.-Cruz. 2010. Formación de dialectos en el contexto urbano. Convergencia y divergencia dialectal en Málaga. Ph.D. diss., Málaga, Universidad.
Martín-Butragueño, Pedro. 1991. Desarrollos sociolingüísticos en una comunidad de habla. Ph.D. diss., Madrid, Universidad Complutense.
Martín-Butragueño, Pedro. 2002. Variación lingüística y teoría fonológica. México: El Colegio de México.
Mattheier, Klaus J. (ed.). 2000. Dialect and Migration in a Changing Europe. Frankfurt: Peter Lang.
Molina-Martos, Isabel. 1998. La fonética de Toledo. Alcalá: Universidad.
Molina-Martos, Isabel. 2015. “La variable sociolingüística /-s/ en el distrito de Vallecas”. In Patrones Sociolingüísticos de Madrid, ed. by Ana-María Cestero-Mancera, Isabel Molina-Martos and Florentino Paredes-García, 91–116. Bern: Peter Lang.
Molina-Martos, Isabel and Florentino Paredes-García. 2015. “La conservación de la dental /-d-/ en el distrito de Salamanca”. In Patrones Sociolingüísticos de Madrid: 63–89. Bern: Peter Lang.
Moya-Corral, Juan-Antonio and Sosiński, Marcin. 2015. “La inserción social del cambio. La distinción s/θ en Granada. Análisis en tiempo aparente y en tiempo real”. Lingüística Española Actual 37/1: 33–72.


Moya-Corral, Juan-Antonio and Emilio García-Wiedemann. 1995. El habla de Granada y sus barrios. Granada: Universidad.
Navarro-Tomás, Tomás, A. M. Espinosa and L. Rodríguez-Castellano. 1933. “La frontera del andaluz”. Revista de Filología Española 20: 225–277.
Pascual, J. M. 1998. “El revolucionario conservadurismo del español norteño. A propósito de la evolución de /s/ implosiva”. In Estudios de lingüística y filología española. Homenaje a Germán Colón, ed. by I. Andrés-Suárez and L. López-Molina, 387–400. Madrid: Gredos.
Penny, Ralph. 2000. Variation and Change in Spanish. Cambridge: Cambridge University Press. doi: 10.1017/CBO9781139164566
Recaño-Valverde, Joaquín and Marta Roig-Vila. 2003. “Internal Migration and Inequalities: The influence of Migrant Origin on Educational Attainment in Spain”. European Sociological Review 19/3: 299–317. doi: 10.1093/esr/19.3.299
Regan, Brendan. 2016. “Sociolinguistic Analysis of Ceceo (De-)Merger in Western Andalusia (Huelva)”. Unpublished manuscript (under revision). University of Texas at Austin.
Samper-Padilla, Juan-Antonio. 1990. Estudio sociolingüístico del español de las Palmas de Gran Canaria. Las Palmas: La Caja de Canarias.
Siegel, Jeff. 1985. “Koines and Koineisation”. Language in Society 14/3: 357–378. doi: 10.1017/S0047404500011313
Siegel, Jeff. 2001. “Koine Formation and Creole Genesis”. In Creolization and Contact, ed. by Norval Smith and Tonjes Veenstra, 175–197. Amsterdam: John Benjamins. doi: 10.1075/cll.23.08sie
Tejada-Giráldez, Mª.-Sierra. 2015. Contribución al estudio de los patrones sociolingüísticos del español de Granada. Ph.D. diss., Granada. Universidad. Hispánicos 2: 185–217.
Trudgill, Peter. 1986. Dialects in Contact. New York: Basil Blackwell.
Tuten, Donald. 2003. Koineization in Medieval Spanish. Berlin/New York: Mouton de Gruyter. doi: 10.1515/9783110901269
Vida-Castro, Matilde. 2004. Estudio sociofonológico del español hablado en la ciudad de Málaga. Alicante: Universidad.
Villena-Ponsoda, Juan-Andrés. 1996. “Convergence and Divergence in a Standard-Dialect Continuum: Networks and Individuals in Malaga”. Sociolinguistica 10: 112–137. doi: 10.1515/9783110245158.112
Villena-Ponsoda, Juan-Andrés. 2001. La continuidad del cambio lingüístico. Granada: Universidad.
Villena-Ponsoda, Juan-Andrés. 2005. “How Similar Are People Who Speak Alike? An Interpretive Way of Using Social Network in Social Dialectology Research”. In Dialect Change. Convergence and Divergence in European Languages, ed. by Peter Auer, Frans Hinskens and Paul Kerswill, 303–334. Cambridge: Cambridge University Press.
Villena-Ponsoda, Juan-Andrés. 2008. “Sociolinguistic Patterns of Andalusian Spanish”. International Journal of the Sociology of Language 193–194: 139–160.
Villena-Ponsoda, Juan-Andrés and Antonio Ávila-Muñoz. 2014. “Dialect Stability and Divergence in Southern Spain. Social and Personal Motivations”. In Stability and Divergence in Language Contact. Factors and Mechanisms, ed. by K. Braunmüller, S. Höder and K. Kühl, 207–238. John Benjamins: SILV 16.
Villena-Ponsoda, Juan-Andrés and Matilde Vida-Castro. 2015. “Maintenance or Loss of Dialect Andalusian Features: Internal and External Factors”. Unpublished manuscript.

Syntactic doubling and variation
The case of Romani

Aurore Tirard
Inalco & Lacito-CNRS, Paris

This paper analyses a case of syntactic doubling in Romani: the full doubling of the definite article in NPs including an adjective. This structure (dnda) is similar to the Greek polydefiniteness and displays the same grammatical optionality. A task was designed to trigger its use and submitted to Albanian Romani native speakers. The results show that an evolution in the nominal constituent order has taken place in contrastive contexts, whereby the community is still split into subgroups experiencing different patterns of language change. This doubling (dnda) has been used as a kind of bridge from the canonical word order (dan) to a new one (dna). Social factors show that this process has been favoured by contact with Albanian and/or Greek.

Keywords: syntactic variation, language change, language contact, doubling, polydefiniteness, Romani, Balkans

1. Introduction

This paper deals with variation in nominal morphosyntax in the Romani varieties spoken in Albania, involving different constituent orders and the optional full doubling of the definite article in NPs involving an adjective. Romani is a minority language of Indic origin spoken on all continents and especially in Europe. It shows substantial dialectal variation because of its geographical extension and, given the absence of monolingual speakers, because of the multiple language contact scenarios in which it is involved. The language has not yet been standardised: various attempts have been made in different countries, leading to the emergence of competing norms (Matras 2005, Leggio 2013: 36–44). Definiteness is marked in Romani by a definite article, a free morpheme placed in the first slot of the nominal phrase (Table 1).

doi 10.1075/silv.19.09tir © 2017 John Benjamins Publishing Company


Table 1. Linear layout of the noun phrase: principal slots (Matras 2002: 166)

[preposition] + [determiner] + [quantifier] + [adjective] + noun + [options]

The positioning of the Romani adjective has not yet been extensively discussed. The descriptions of individual varieties (e.g. Tenser 2005) generally assert that a is placed before n. This is indeed the case in most varieties – and it is also the canonical word order in Indic languages (Masica 1993: 370). Therefore, we can consider the dan sequence the inherited and canonical word order. A dna structure is documented in varieties spoken in the Balkans (e.g. former Yugoslavia) or in contact with Romance languages (Romanian, Italian, Spanish). Some authors explain its occurrence as a particular semantic and/or pragmatic device: the adjective “is exposed as an afterthought” (Matras 2002: 167) or as a “comment” (Boretzky 1993: 41). Others explain it as a consequence of language contact with a language postposing a (Soravia 1972: 38). However, another non-canonical structure is attested in some Balkan varieties: two identical definite articles determining a unique head noun in the presence of a postposed a, as in example (1).

(1) Istanbul, kaj ćer-ena o film-e o bar-e?
    Istanbul where make-3pl def.art.pl film-pl def.art.pl big-pl
    ‘Istanbul, where they make the great films?’
    (Female Arli speaker, age 16, Korça, July 2014)

This structure is very similar to a phenomenon labelled Determiner Spreading (Androutsopoulou 1995) or polydefinite NP (Kolliakou 2004). Lekakou and Szendrői (2012: 108) define it as “instances of an adjective modifying a noun where the noun and the adjective are each accompanied by their own determiner”. In Romani, this third syntactic variant, dnda, involves a “focus” (Boretzky 2000: 42) or an “appositional function” (Matras 2002: 97). dna and dnda can be considered innovations because a is postposed to n, being thus placed within the postnominal ‘option’ slot. The question remains: what can trigger such optional threefold variation?

2. Data and method

Because of the rarity of dnda (28 tokens in 28 hours of conversational speech), an experiment was designed by Evangelia Adamou to elicit this construction by adapting the Static Localization Task n°8 of the QUIS (Skopeteas et al. 2006: 93). Pairs of native speakers were asked to manipulate culturally appropriate real-life objects and to describe this manipulation. All informants were given 28 objects contrasting in shape, colour and size. The aim was to lead the speakers to contrast the objects through a restrictive use of attributive adjectives (a feature of Greek polydefinites according to Campos and Stavrou 2004: 141, Lekakou and Szendrői 2012: 125–129). The task was mostly conducted at the participants’ houses in Korça, Albania, in 2014. Since the investigated structure seems to exist only in the Balkan varieties, those spoken in this border region (Figure 1), in the heart of the Balkans, were perfect candidates for this study. The task was submitted to thirty-four Romani native speakers stratified by gender, age, education and linguistic variety. Twelve of them can speak Greek because they live or have lived in Greece, or work or have worked with Greek people. Twenty-two of them do not know Greek but some have relatives (mostly grandparents) who do or did.

[Map of south-eastern Europe with Korça, Albania, marked. Designed by J. Picard, CNRS, LACITO, 2016]

Figure 1. Location of the fieldwork for this study, Korça, Albania


Little is known about the Roma in Albania. Bakker (2001) estimated their numbers at about 90,000 during the 1990s. Only 8301 persons declared themselves as Roma in the 2011 census in Albania, whereas the Council of Europe estimated them at between 80,000 and 150,000, i.e. 3.59% of the total Albanian population (CAHROM 2012).¹ It is not possible to select a representative sample of the Albanian Roma population because the demographics of the population in terms of mean age and age pyramid, professions and differences between the groups remain unknown. I could thus only attempt to get a representative sample of the community as I observed it during my fieldwork (Table 2).

Table 2. Overview of the sample stratification

Variables   Levels         N of informants   Comments
gender      Women          16
            Men            18
age         ≤ 15 y. old    4                 Speakers who are not married, do not work and are considered “children” by Roma and non-Roma societies.
            16–39 y. old   16                Speakers who tend to be married, work and are considered young adults by the Roma society.
            40–59 y. old   8                 Speakers who are married, usually work, have grandchildren and are considered older adults by the Roma society.
            ≥ 60 y. old    6                 Speakers who are considered seniors by Roma and non-Roma societies.
education   0–5 years      13                Speakers who received primary education or no education at all.
            6–12 years     14                Speakers who received junior high education or high-school education (secondary school).
            12+ years      7                 Speakers who received higher (postsecondary) education.
variety     Mečkar         6                 (see Section 3.2)
            Arli           15
            Čergar         2
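The stratification in Table 2 can be sanity-checked with a few lines of Python. This is only an illustrative sketch: the dictionary layout and function name are hypothetical, while the counts are those given in the table. Section 3.2 reports that eleven bivarietal speakers were left out of the variety comparison, which is why the variety stratum is expected to sum to 23 rather than 34.

```python
# Counts transcribed from Table 2 (N of informants per level).
SAMPLE = {
    "gender": {"Women": 16, "Men": 18},
    "age": {"<=15": 4, "16-39": 16, "40-59": 8, ">=60": 6},
    "education": {"0-5 years": 13, "6-12 years": 14, "12+ years": 7},
    "variety": {"Mečkar": 6, "Arli": 15, "Čergar": 2},
}

TOTAL_SPEAKERS = 34        # thirty-four native speakers took part in the task
EXCLUDED_BIVARIETAL = 11   # bivarietal speakers excluded from the variety stratum


def stratum_totals(sample):
    """Sum the number of informants over the levels of each variable."""
    return {variable: sum(levels.values()) for variable, levels in sample.items()}


if __name__ == "__main__":
    for variable, total in stratum_totals(SAMPLE).items():
        print(f"{variable}: {total}")
```

Gender, age and education each account for all 34 informants, whereas the variety stratum covers the 23 speakers who remain once the bivarietal speakers are set aside.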

1. For further sociological information about the Roma in Albania, see De Soto, Beddies and Gedeshi (2005).


3. Results

3.1 Linguistic factors

Definite NPs containing an attributive adjective were counted according to the following three categories:
– monodefinite NP with preposed adjective (dan)
– monodefinite NP with postposed adjective (dna)
– polydefinite NP (dnda)

Table 3 shows the overall distribution of the definite NPs: the three structures appear in roughly equal proportion. This finding confirms the optionality of the dnda structure. The proportion of 38% dnda is very high, which suggests that the pragmatic requirements of the task are indeed highly conducive to the realisation of dnda.

Table 3. Distribution of the definite and indefinite NPs including an adjective

             dan   dna   dnda   Total (def)   ind a n   ind n a   ind n ind a   Total (ind)
N tokens     337   316   411    1064          18        32        0             50
Percentage   32    30    38     100           36        64        0             100
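The percentages in Table 3 follow directly from the token counts. The sketch below, with hypothetical function names, recomputes them to one decimal place and also derives the share of postposed adjectives (dna plus dnda) among the definite NPs.

```python
# Token counts transcribed from Table 3.
DEFINITE = {"dan": 337, "dna": 316, "dnda": 411}
INDEFINITE = {"ind a n": 18, "ind n a": 32, "ind n ind a": 0}


def shares(counts):
    """Percentage of each structure, rounded to one decimal place."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


def postposed_share(definite):
    """Share of definite NPs whose adjective follows the noun (dna + dnda)."""
    total = sum(definite.values())
    return round(100 * (definite["dna"] + definite["dnda"]) / total, 1)


if __name__ == "__main__":
    print(shares(DEFINITE))
    print(shares(INDEFINITE))
    print(postposed_share(DEFINITE))
```

With one decimal the definite shares come out at 31.7, 29.7 and 38.6, which the table reports as whole numbers summing to 100, and the postposed share is 68.3.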

On the other hand, the absence of polyindefinite constructions (with indefinite NPs) is striking. This could of course stem from the task itself: since all the objects were on the table, the speakers were more prone to use definite articles. Nevertheless, among the few indefinite NPs that were produced, no polyindefinite structures occurred. Nor did they occur in the spontaneous speech I collected, which corresponds to the pattern observed in Greek (Lekakou and Szendrői 2012: 109). Definiteness is therefore a relevant linguistic factor.

Regarding the monodefinite NPs, the similar proportion of dan (32%) and dna (30%) is of particular interest.² Postposition of the adjective is thus not marked in Albanian Romani. Moreover, if we include the polydefinite NPs, which also rely on a postposed adjective, postposition represents more than two thirds of the tokens (68%). This suggests that postposition is the unmarked position of the adjective, at least in the context of the task. This finding is also striking for indefinite NPs: ind n a represents 64% of them.

A salient difference with Greek (allowing both dadn and dnda) is that the adjective must be postposed in Romani polydefinite NPs (allowing dnda but not *dadn). This suggests a strong correlation between nominal constituent order and polydefiniteness. Thus, contrary to Greek (allowing dan but not *dna), the Romani adjective is less flexible in polydefinites than in monodefinite NPs (allowing both dan and dna).

2. This count excludes the NPs with both preposed and postposed adjectives (dana).

3.2 Social factors

The distribution of the definite NPs including an attributive adjective according to gender is represented in Figure 2.

[Figure 2: bar chart (0–100%) of dnda, dna and dan per gender. Women: dnda 51% (N = 294), dna 18% (N = 106), dan 31% (N = 179). Men: dnda 24% (N = 117), dna 43% (N = 210), dan 33% (N = 158).]

Figure 2. Distribution of the definite NPs within each gender
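The gender comparison rests on three derived quantities per group: the share of each structure, the split between mono- and polydefinites, and the share of postposed adjectives. A minimal sketch, using the token counts shown in Figure 2 and hypothetical names, makes the arithmetic explicit.

```python
# Token counts per gender, as labelled in Figure 2.
COUNTS = {
    "Women": {"dan": 179, "dna": 106, "dnda": 294},
    "Men": {"dan": 158, "dna": 210, "dnda": 117},
}


def summarise(counts):
    """Whole-number percentages for one group of speakers."""
    total = sum(counts.values())

    def pct(n):
        return round(100 * n / total)

    return {
        "total": total,
        "dan": pct(counts["dan"]),
        "dna": pct(counts["dna"]),
        "dnda": pct(counts["dnda"]),
        "monodefinite": pct(counts["dan"] + counts["dna"]),
        "polydefinite": pct(counts["dnda"]),
        "postposed": pct(counts["dna"] + counts["dnda"]),
    }


if __name__ == "__main__":
    for gender, counts in COUNTS.items():
        print(gender, summarise(counts))
```

On these counts, women split almost evenly between mono- and polydefinites (49% against 51%), men favour monodefinites (76%), and the two groups postpose the adjective at nearly the same rate (69% and 67%).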

Women mostly used polydefinite constructions (51%, N = 294) followed by monodefinite constructions with preposed adjectives (31%, N = 179). They produced very few monodefinites with postposed adjective (18%, N = 106). Men on the other hand mostly used monodefinites with a postposed adjective (43%, N = 210). On the whole, women used as many monodefinites as polydefinites, which suggests that they preferred to postpose their adjectives. Men used more monodefinites than polydefinites but postposed their adjectives as much as women did.

The distribution of the definite NPs including an adjective according to age is represented in Figure 3. We can see that the youngest speakers almost exclusively used the supposedly non-canonical dna (84%, N = 128), while the oldest speakers almost exclusively used the canonical dan (80%, N = 125). Speakers in the middle age range (15–59 years old) show a more balanced usage but favoured dnda (with 43%, N = 112 and 52%, N = 259). On the whole, the youngest and the oldest age brackets almost exclusively used monodefinites. Speakers in the middle age range tended to employ as many mono- as polydefinites, hence postposing rather than preposing the adjective.

[Figure 3: bar chart (0–100%) of dnda, dna and dan per age level. –15 yrs old: dnda 10% (N = 16), dna 84% (N = 128), dan 6% (N = 9). 15–39 yrs old: dnda 52% (N = 259), dna 24% (N = 117), dan 24% (N = 120). 40–59 yrs old: dnda 43% (N = 112), dna 24% (N = 63), dan 32% (N = 83). +60 yrs old: dnda 15% (N = 24), dna 5% (N = 8), dan 80% (N = 125).]

Figure 3. Distribution of the definite NPs within each age level

While dan was used by all speakers over 15, those under 15 hardly used it. dna was likewise used by all speakers under 60, whereas those over 60 hardly used it. The systematic decrease in the use of the canonical dan from the oldest to the youngest speakers is striking, as is the systematic increase in the use of non-canonical dna from the oldest to the youngest speakers. This finding can be interpreted as evidence of a change towards increasing postposition of the adjective. Polydefinites, crucially, are most frequently used by middle-aged speakers. Moreover, with the exclusion of the youngest generation, the data reveal a clear slope with age, whereby younger speakers use increasingly more of the dnda construction.

Following Sankoff and Blondeau (2007: 562), the data could reflect any of the following scenarios:
1. The slope with age could be interpreted as an indication of age grading, whereby change occurs across the life-time of individuals and is cyclical in character: speakers are unstable across their life-time, but no long-term change takes place across the whole community.
2. The findings could be indicative of a generational change, whereby “individuals may retain their childhood patterns” and a long-term change takes place across the community.
3. Finally, the pattern could reveal lifespan change, whereby “individual speakers change over their lifespans in the direction of a change in progress in the rest of the community”. Such individual patterns mirror an ongoing historical (long-term) change in the community.

The distribution of the definite NPs including an adjective according to education is represented in Figure 4.

[Figure 4: bar chart (0–100%) of dnda, dna and dan per education level. 0–5 years: dnda 34% (N = 105), dna 11% (N = 35), dan 55% (N = 173). 6–12 years: dnda 39% (N = 211), dna 39% (N = 212), dan 21% (N = 115). 12+ years: dnda 45% (N = 95), dna 32% (N = 69), dan 23% (N = 49).]

Figure 4. Distribution of the definite NPs within each education level

Speakers with lower education mostly used dan (55%, N = 173). Middle- and highly-educated people, however, used the three structures more evenly, with a preference for dna/dnda (both 39%, N = 212 and N = 211) and for dnda (45%, N = 95) respectively. On the whole, low-educated speakers favoured monodefinites more than polydefinites, and preposed rather than postposed adjectives. Middle-educated speakers also used monodefinites more than polydefinites, but they preferred to postpose the adjective. Highly-educated speakers used mono- and polydefinites about equally, hence postposing more than preposing their adjectives. The overall pattern is thus for more educated speakers to use polydefinite NPs more frequently.

Albanian Roma are split into several communities, but no research is available on the internal differences within those communities. The following description is therefore based on my own field observations and ethnographic interviews. The three main varieties of Romani spoken in Albania correspond to three main sub-groups:
– The Mečkar were the first group to settle in Albania several centuries ago, according to informants from all groups. Their variety has been in contact with Albanian longer than any other Romani variety. Greek also greatly influenced the language during a previous stage. Mečkar are said to be generally well off and educated.
– The Čergar are thought to have arrived later in the country, namely between the 19th and early 20th centuries. Their variety shows less Greek influence but Balkan Romance and extensive South-Slavic lexical influence. The Čergar are far less numerous than the other two groups and are generally considered well off and educated.
– The Arli also arrived between the 19th and early 20th centuries from Greece and – formerly – from a Turkish speaking area of the Balkans. The Arli variety thus shares an important Greek and Turkish lexical legacy. Indeed, some older people still know Greek but not Turkish. The Arli are said to be less wealthy and educated than the other two groups.

All three varieties have been in contact with Greek at various times in the past and are now in contact with Albanian. As far as I know, the three main groups are spread all over Albania and inter-marry with each other. Only those speakers whose parents speak the same variety as themselves were taken into account. Eleven bivarietal speakers were excluded from the sample. The distribution of the definite NPs including an adjective according to variety is represented in Figure 5.

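[Figure 5: bar chart (0–100%) of dnda, dna and dan per variety. Mečkar: dnda 65% (N = 116), dna 32% (N = 57), dan 3% (N = 6). Čergar: dnda 51% (N = 25), dna 43% (N = 21), dan 6% (N = 3). Arli: dnda 30% (N = 111), dna 13% (N = 49), dan 56% (N = 205).]

Figure 5. Distribution of the definite NPs within each variety.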
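Because every social factor partitions the same set of definite NPs, the token counts shown in Figures 2 to 5 can be cross-checked against the totals in Table 3. The following sketch does so; the data layout and function name are hypothetical, and the counts are those read from the figure labels. The variety breakdown is expected to fall short of the Table 3 totals, since bivarietal speakers were excluded from it.

```python
# Totals per structure from Table 3.
TABLE_3 = {"dan": 337, "dna": 316, "dnda": 411}

# Token counts per group, as labelled in Figures 2-5.
FIGURES = {
    "gender": {
        "Women": {"dan": 179, "dna": 106, "dnda": 294},
        "Men": {"dan": 158, "dna": 210, "dnda": 117},
    },
    "age": {
        "-15": {"dan": 9, "dna": 128, "dnda": 16},
        "15-39": {"dan": 120, "dna": 117, "dnda": 259},
        "40-59": {"dan": 83, "dna": 63, "dnda": 112},
        "60+": {"dan": 125, "dna": 8, "dnda": 24},
    },
    "education": {
        "0-5 years": {"dan": 173, "dna": 35, "dnda": 105},
        "6-12 years": {"dan": 115, "dna": 212, "dnda": 211},
        "12+ years": {"dan": 49, "dna": 69, "dnda": 95},
    },
    "variety": {
        "Mečkar": {"dan": 6, "dna": 57, "dnda": 116},
        "Čergar": {"dan": 3, "dna": 21, "dnda": 25},
        "Arli": {"dan": 205, "dna": 49, "dnda": 111},
    },
}


def variant_totals(groups):
    """Add up the tokens of each structure over all groups of one factor."""
    totals = {"dan": 0, "dna": 0, "dnda": 0}
    for counts in groups.values():
        for variant, n in counts.items():
            totals[variant] += n
    return totals


if __name__ == "__main__":
    for factor, groups in FIGURES.items():
        totals = variant_totals(groups)
        print(factor, totals, "matches Table 3:", totals == TABLE_3)
```

Gender, age and education each reproduce the 1064 definite NPs of Table 3, while the variety breakdown covers 593 tokens.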

Mečkar and Čergar speakers strongly favoured dnda (65%, N = 116 and 51%, N = 25) but hardly used the canonical dan (3%, N = 6 and 6%, N = 3), unlike the Arli who mostly used it (56%, N = 205). Mečkar and Čergar almost never preposed the adjective. Arli speakers used more monodefinites than polydefinites (30%, N = 111) and slightly preferred to prepose the adjective (56%, N = 205). Indeed, it seems that dan is almost exclusively used by Arli. dna and dnda are used by every group – but less so by Arli.

4. Discussion

In order to understand such syntactic variation, we first have to determine why the repetition of d has been adopted. Here too, we can postulate two different hypotheses:
A. The pattern was borrowed from Greek (which is prestigious and displays a similar polydefinite structure) or from Albanian (the current dominant contact language).
B. It is an internal innovation that fills a communicative gap.

In the following, I will briefly explore both hypotheses.

4.1 The language contact hypothesis

Roma everywhere are a minority and Romani is always in a subordinate position outside their community: it is not used by the majority (Non-Roma) and their institutions (police, school, etc.). Historical and recent migrations have therefore exposed the varieties spoken by the different sub-groups to the influence of different languages. Matras (2002: 195) proposes a threefold layering of the contact languages (L2) that influence any given Romani variety:
– the older L2 has heavily influenced the forerunner of the variety but is no longer spoken by the community.
– the recent L2 is only used by the parent or grandparent generation.
– the current L2 is the main language of everyday interaction with the non-Romani majority and often within the family alongside Romani.

On the basis of my fieldwork observations and my informants’ statements, I have reconstructed the distribution of L2s presented in Table 4.

Table 4. Contact history of the three Albanian Romani varieties³

             Mečkar                              Čergar                       Arli
Older L2     Albanian, Greek, (South-Slavic)³    Romance                      Turkish, Greek
Recent L2    Albanian, (Greek)                   South-Slavic, (Greek)        Greek, (Turkish)
Current L2   Albanian, (Greek), (Italian)        Albanian, (Greek), (Italian) Albanian, (Greek), (Italian)

3. Contact with languages in brackets was occasional and, for Current L2, due to contemporary migrations and/or media exposure.

Importantly for the present discussion, the adjective occurs in different positions in those contact languages:


Table 5. Possible word orders in definite NPs in the various contact languages

               0 d⁴               1 d                                2 d+
Albanian                          n=d a_class1⁵                      n=d det a_class2⁶
                                  (a_class1=d n)⁷                    (det a_class2=d n)
Greek                             dan                                dnda
Turkish        a n⁸
Romance                           n=d a⁹                             n=d det a¹¹
                                  (a=d n)¹⁰                          (n=d a=d, n=d det a=d)¹²
South-Slavic   a n (Serbian)¹³    a=d n (Macedonian, Bulgarian)¹⁴

4. d stands for definiteness morpheme.
5. The Albanian definite article is a (second-position phrasal) postposed suffix/clitic (Lyons 1999: 71, 75–76) or ending (Androutsopoulou 2001: 162).
6. det, originally a definite article (Lyons 1999: 79–80), is an adjectival article/determiner obligatorily occurring before class-2 adjectives in definite and indefinite NPs (see Campos (2008) for an extensive account on Albanian det).
7. Structures in brackets are possible only with certain adjectives or in particular contexts.
8. Turkish has no definite article (Enç 1991: 9, see also Lyons 1999: 50, 96).
9. The Romanian definite article is a postposed clitic (Lyons 1999: 74–75) or suffix (Cornilescu and Nicolae 2012: 1075–1076). The same holds for Megleno-Romanian (Tomić 2006: 153) and Aromanian (Tomić 2006: 168).
10. See Lyons (1999: 75) and Tomić (2006: 127–128) for Romanian, Tomić (2006: 155) for Megleno-Romanian. Aromanian is the only Romance language that unmarkedly preposes the adjective (Tomić 2006: 169).
11. See Lyons (1999: 80–82), Campos and Stavrou (2004: 161) and especially Cornilescu and Nicolae (2012) for an extensive account of Romanian det, originally a demonstrative that is today a free-standing adjectival article. See Tomić (2006: 156) for Megleno-Romanian and Tomić (2006: 171) for Aromanian.
12. Only in Aromanian (Campos and Stavrou 2004: 138).
13. Serbian has no definite article (Tomić 2006: 108).
14. Macedonian and Bulgarian definite articles are postposed clitics (Lyons 1999: 73–74). For an extensive account of this topic, see Tomić (2006: 55–63, 88–94).


Table 5 shows that both Greek and Albanian can display a structure with two determiners:¹⁵ in Greek through full doubling of d, and in Albanian through a mixed structure involving a postposed clitic and an adjectival article. Albanian adjectives are almost always postposed to the noun (Androutsopoulou 2001: 163–164) and most of them require the presence of an adjectival article (Campos 2008). I suggest that Mečkar prefers to postpose a since it has been in contact with Albanian for a longer period than the other varieties. Čergar’s similar preference can be seen as the effect of Romance influence that unmarkedly postposes a.¹⁶ Arli’s lower frequency of postposition is probably a consequence of longer contact with languages that mostly prepose a (Turkish and Greek): its presence could also be due to the recent influence of Albanian.

4.2 Socio-linguistic account of the variation

Table 6 predicts which kind of speaker is more prone to favour which variant.

Table 6. Socio-linguistic profile of the most plausible speaker for each syntactic variant

        gender    age               education   variety
dan     (equal)   ≥ 60 y. old       0–5 years   Arli
dna     men       ≤ 15 y. old       6+ years    Čergar, Mečkar
dnda    women     15 to 59 y. old   6+ years    Mečkar, Čergar

The hypothesis that dan is a retention of an older form is confirmed by its higher frequency amongst older speakers. Speakers in the middle age range had more opportunity to attend school during the “communist” era than older Roma. Schooling results in deepening the speaker’s contact with Albanian (which prefers postposed adjectives), because it is not Romani but Albanian that is taught and practiced at school. That is why less-educated people (regardless of age) are expected to behave like older speakers (favouring dan). The more educated a person is, the more intensive contact they have with the dominant language. Since the level of education has severely decreased since the regime’s fall (De Soto, Beddies and Gedeshi 2005: 53–61), we would expect younger people to favour dan – but they do not. A possible explanation is the impact of television that dramatically increased the exposure of all the speakers (regardless of gender, age and education) to Albanian (Foulkes and Docherty 1999, the papers in Androutsopoulos 2014). Internet and cell/smartphones may today reinforce this effect.

Contact with Albanian must also be differentiated according to gender, a complex construction that “interacts with other social identities” (Meyerhoff 2011: 232). Almost all women 40+ years old in my sample did not attend school; many women work/ed as housewives and do not often go out of their house. Consequently, they generally use Albanian only in occasional or commercial interactions with Non-Roma people. Men, on the other hand, usually have a broader use of Albanian, experiencing it in more informal situations. For these reasons, I expected women to favour conservative dan – but they did not. A possible explanation is that “women seem to lead men in the use of the incoming, non-standard variant” (Meyerhoff 2011: 225). The fact that women favour dnda instead of dna may be explained by the larger geographical distribution of dnda (which occurs with most Albanian adjectives and in Greek polydefinites) than dna (which occurs with few Albanian adjectives). According to Foulkes and Docherty (1999: 16), women tend to be sensitive to the geographical extension of the forms, preferring non-local to local (here conservative Romani) ones.

In order to explain how dnda has emerged and why it is used more by Mečkar, Čergar and speakers in the middle age range, I propose the scenario described in Figure 6: a change in the nominal constituent order seems to have taken place in Albanian Romani. The initial word order was dan for all communicative functions and is still used by older/low-educated/Arli speakers. dnda was probably borrowed from Greek, a language that displays an identical structure with the specific function of restricting the set of the noun’s denotation. It is a case of pattern replication (Matras and Sakel 2007, Matras 2009) in contrasting contexts: full doubling of the definite article and postposition of the adjective are its pivotal features. This resulted in a long-term change in the replica language Romani, enriching its structural inventory with a calque-like dnda. dnda may also have been borrowed from Albanian, in which case the pivotal features were a blank second determiner and adjectival postposition.

The more innovative order, dna, is used in my data by younger/educated/Mečkar and Čergar speakers. dna was probably a further pattern replication from Albanian – rather than from Greek, which does not display any dna at all. The pivotal feature is the mere postposition of the adjective, available in both Albanian class-1 and class-2 adjectives and in Romani dnda structures. The full doubling of the definite article could thus have been used as a kind of bridge from dan to dna. If dna then extends from its primary contrasting function to other communicative functions, it would be a case of pragmatic unmarking, a contact-induced grammaticalisation process “from pragmatic to syntactic marking” (Heine 2008: 54).

15. The similarity between Greek and Albanian structures has been discussed by Androutsopoulou (2001), Campos (2008: 1024–1027) and Alexiadou (2014: 84–90).
16. The South-Slavic influence seems to have been less significant on the syntactic than on the lexical level.

Syntactic doubling and variation 153


[Figure 6. Overview of the diachronic scenario. Synthesis for contrastive contexts in four stages, from Balkanic Romani through a transition stage to Albanian Romani. Each stage rates the orders D A N, D N D A and D N A as dominant, frequent or rare; the speaker groups shown are Arli (older), Arli (middle-aged), Mečkar (all), Čergar (all) and Arli (younger).]

5. Conclusion

Albanian Romani displays polydefiniteness whereby full doubling of the article is possible only with definite articles. The analysis of the contact history of the Albanian Romani varieties shows that a change in the nominal constituent order has taken place. The community appears to be split into sub-groups:

– Mečkar and Čergar experience a pattern of stability since they have already completed a change from dan toward dnda and dna.
– Older Arli also experience a pattern of stability since no change has occurred.
– Arli speakers in the middle age range seem to exhibit a pattern of lifespan change (scenario 3), since the speakers of this cohort have individually changed in the direction of the rest of the community (Mečkar and Čergar).
– Younger Arli seem to be experiencing a change that can be either interpreted as an age grading (scenario 1) if, by getting older, they increase their use of dnda – or as a generational change if the next generation follows them by preferring dna (scenario 2).

Since such a synchronic study can only provide a snapshot of Albanian Romani, future longitudinal restudies are needed to check this analysis. I have suggested that the canonical sequence dan ceased to be appropriate in contrastive contexts or did not trigger a restrictive interpretation any more. To better suit their communicative needs in contrastive contexts (hypothesis B), Mečkar and Čergar speakers innovated by using dna. This was possible because dnda was available and took over a bridging function between initial dan and target dna. Polydefiniteness in Romani is a case of pattern replication from Greek that then enabled a second pattern replication from Albanian, a new order dna (hypothesis A).


Acknowledgements

This paper presents the results that are laid out in detail in my PhD dissertation at Inalco, Paris. I would like to acknowledge support from Inalco and SeDyL (CNRS UMR 8202) for two field trip grants to Albania (2013, 2014), and from Inalco and Lacito (CNRS, UMR 7107) for attending ICLaVE 8. I am indebted to my PhD supervisor, Evangelia Adamou, for her precious advice on the present study. This work could not have been carried out without the Albanian Roma consultants and Marcel Courthiade who put me in touch with them. I am grateful to Daniele Viktor Leggio, François Jacquesson, Walter Breu and the audiences of ICLaVE 8 and Alexandre François’ seminar for the inspiring discussions on this research. My thanks are also due to Isabelle Buchstaller, Ciara R. Wigham and three anonymous reviewers for proofreading earlier drafts of this paper.

References

Alexiadou, Artemis. 2014. Multiple Determiners and the Structure of DPs. Amsterdam, Philadelphia: John Benjamins. doi: 10.1075/la.211
Androutsopoulos, Jannis (ed.). 2014. Mediatization and Sociolinguistic Change. Berlin, Boston: De Gruyter. doi: 10.1515/9783110346831
Androutsopoulou, Antonia. 1995. “The Licensing of Adjectival Modification”. Proceedings of the WCCFL 14, 17–31.
Androutsopoulou, Antonia. 2001. “Adjectival Determiners in Albanian and Greek”. In Comparative Syntax of Balkan Languages, ed. by María Luisa Rivero and Angela Ralli, 161–199. Oxford, New York: Oxford University Press.
Bakker, Peter. 2001. “Romani in Europe”. In The Other Languages of Europe: Demographic, Sociolinguistic and Educational Perspectives, ed. by Guus Extra and Durk Gorter, 293–313. Clevedon: Multilingual Matters.
Boretzky, Norbert. 1993. Bugurdži: Deskriptiver und historischer Abriß eines Romani-Dialekts [Bugurdži: Descriptive and Historical Outline of a Romani Dialect]. Wiesbaden: Harrassowitz Verlag.
Boretzky, Norbert. 2000. “The Definite Article in Romani Dialects”. In Grammatical Relations in Romani: The Noun Phrase, ed. by Yaron Matras and Viktor Elšík, 31–64. Amsterdam, Philadelphia: John Benjamins. doi: 10.1075/cilt.211.05bor
CAHROM – Council of Europe. 2012. “Estimates and official numbers of Roma in Europe”. Available at http://rm.coe.int/CoERMPublicCommonSearchServices/DisplayDCTMContent?documentId=0900001680088ea9
Campos, Héctor. 2008. “Some notes on adjectival articles in Albanian”. Lingua 119: 1009–1034. doi: 10.1016/j.lingua.2008.09.014
Campos, Héctor, and Melita Stavrou. 2004. “Polydefinite Constructions in Modern Greek and in Aromanian”. In Balkan Syntax and Semantics, ed. by Olga Mišeska Tomić, 137–173. Amsterdam, Philadelphia: John Benjamins. doi: 10.1075/la.67.09cam
Cornilescu, Alexandra, and Alexandru Nicolae. 2012. “Nominal Ellipsis as Definiteness and Anaphoricity: The Case of Romanian”. Lingua 122 (10): 1070–1111. doi: 10.1016/j.lingua.2012.05.001


De Soto, Hermine, Sabine Beddies, and Ilir Gedeshi. 2005. Roma and Egyptians in Albania: From Social Exclusion to Social Inclusion. Washington D.C.: The World Bank. doi: 10.1596/0-8213-6171-6
Enç, Mürvet. 1991. “The Semantics of Specificity”. Linguistic Inquiry 22 (1): 1–25.
Foulkes, Paul, and Gerard J. Docherty. 1999. “Urban Voices – Overview”. In Urban Voices, ed. by Paul Foulkes and Gerard J. Docherty, 1–24. London: Edward Arnold.
Heine, Bernd. 2008. “Contact-induced word order change without word order change”. In Language Contact and Contact Languages, ed. by Peter Siemund and Noemi Kintana, 33–60. Amsterdam, Philadelphia: John Benjamins. doi: 10.1075/hsm.7.04hei
Kolliakou, Dimitra. 2004. “Monadic Definites and Polydefinites: Their Form, Meaning and Use”. Journal of Linguistics 40: 263–333. doi: 10.1017/S0022226704002531
Leggio, Daniele Viktor. 2013. Lace Avilen ko Radio. Romani Language and Identity on the Internet. Manchester: University of Manchester.
Lekakou, Marika, and Kriszta Szendrői. 2012. “Polydefinites in Greek: Ellipsis, Close Apposition and Expletive Determiners”. Journal of Linguistics 48 (1): 107–149. doi: 10.1017/S0022226711000326
Lyons, Christopher. 1999. Definiteness. Cambridge: Cambridge University Press. doi: 10.1017/CBO9780511605789
Masica, Colin P. 1993. The Indo-Aryan Languages. Cambridge: Cambridge University Press.
Matras, Yaron. 2002. Romani: A Linguistic Introduction. Cambridge: Cambridge University Press. doi: 10.1017/CBO9780511486791
Matras, Yaron. 2005. “The future of Romani: Toward a policy of linguistic pluralism”. Roma Rights Quarterly 1: 31–44.
Matras, Yaron. 2009. Language Contact. Cambridge: Cambridge University Press. doi: 10.1017/CBO9780511809873
Matras, Yaron, and Jeanette Sakel. 2007. “Investigating the Mechanisms of Pattern Replication in Language Convergence”. Studies in Language 31 (4): 829–865. doi: 10.1075/sl.31.4.05mat
Meyerhoff, Miriam. 2011. Introducing Sociolinguistics. 2nd ed. New York: Routledge.
Sankoff, Gillian, and Hélène Blondeau. 2007. “Language Change across the Lifespan: /r/ in Montreal French”. Language 83 (3): 560–588. doi: 10.1353/lan.2007.0106
Skopeteas, Stavros et al. 2006. Questionnaire on Information Structure: Reference Manual. Potsdam: Universitätsverlag Potsdam.
Soravia, Giulio. 1972. “Italian Influences on the Dialect of the Gypsies of Abruzzi”. Journal of the Gypsy Lore Society 3 (51): 34–39.
Tenser, Anton. 2005. Lithuanian Romani. München: Lincom Europa.
Tomić, Olga Mišeska. 2006. Balkan Sprachbund Morpho-Syntactic Features. Dordrecht: Springer. doi: 10.1007/1-4020-4488-7

Variation in style
Register and lifestyle in Parisian French

Aria Adli

Universität zu Köln

This study presents a sociolinguistic analysis of two linguistic variables of French, subject doubling and subject-verb inversion in wh-questions. First, factor and cluster analyses led to a grouping of the sample into four distinct lifestyle types. Then, statistical tests show that lifestyle, gender, and age are significant external factors, and that lifestyle exhibits the most salient effect. While the lifestyle associated with orthodoxy correlates with a high inversion rate (formal linguistic style) and low doubling rate (informal linguistic style), the group associated with heterodoxy demonstrates the inverse pattern. It stands to reason that sociolinguistic studies can uncover more patterns of variation if they go beyond the standard sociodemographic variables (such as age, gender, etc.) and a ‘narrow’ concept of class.

Keywords: French, subject doubling, subject-verb inversion, wh-question, dialogue data, lifestyle, social class, norm, factor analysis, cluster analysis

1. Introduction

It is well-known that the operationalization of social class remains an important challenge in variationist sociolinguistics (Kerswill 2007). Sociolinguists mostly use single demographic or economic indicators such as neighborhood, education, income, occupation, etc., or rely on somewhat ad-hoc indices (e.g. the socioeconomic class index in Labov 1966, 2001). While these measures capture certain basic elements of an individual’s social position, they are far from reflecting a broader picture, for example in terms of social and cultural standing. The distinction between ruling and non-ruling classes includes many factors beyond economic and socio-demographic facts (this objection might even be more pronounced in some European societies as compared to North America).

doi 10.1075/silv.19.10adl © 2017 John Benjamins Publishing Company


One framework that addresses the social changes from early capitalist to post-industrialized Western societies is Bourdieu’s (1979) sociocultural theory (see Bourdieu 1984 for the English translation). Essentially, Bourdieu argues that taste and lifestyle are key elements of social power, which leads him to postulate an extended, post-Marxist notion of (economic, social, and cultural) capital and an exchange mechanism between these different forms of capital. The notion of lifestyle and its embedding in a theory of capital combines micro- and macrosociological perspectives in an interesting way: a person’s choices that reflect her/his taste, for example in the fields of leisure, media, clothing, and values, build her/his cultural capital, which translates into differences in terms of social and finally economic capital. In this chapter I will demonstrate the usefulness of this theory by considering two linguistic variables: subject doubling and subject-verb inversion, which are interesting test cases for the relation between linguistic style and lifestyle. 1

2. Data

Sgs (www.sgscorpus.com) is a multilingual sociolinguistic project started in 2004, which includes data from three Western and non-Western metropolises (essentially megacities): Paris (European French), Barcelona (Catalonia Spanish and Catalan), and Tehran (Persian). The data has been collected using the same protocol: first, spontaneous speech data was recorded in a specifically-designed game between interviewer and interviewee, 2 in which the interviewee had to solve a fictive murder case by speaking freely with the native and well-trained interviewer. Generally, interviewees chose a rather colloquial, non-formal register during this task. Unlike the classic sociolinguistic interview, which results in a mostly declarative set of sentences, this approach elicits both declarative and interrogative sentences from the interviewee. Second, the interviewees gave gradient acceptability judgments on selected constructions. Third, they filled out a sociocultural questionnaire, inspired by Bourdieu (1979: 599–605), and adapted to contemporary society in the respective metropolis.

The present chapter relies on the recordings from Paris with 102 French native speakers, gender balanced (56% women, 44% men) in the age range 19–49 (average 29), which resulted in a corpus of 27 hours of transcribed and annotated speech.

1. I want to thank Josina Gausepohl, who gave me valuable comments on the interpretation of the factors and the clusters and who proof-read the manuscript.
2. Spontaneous speech data was not collected for Catalan, which is not reported on here. Instead, a game task was conducted with the Catalan participants.


3. Subject-verb inversion in wh-questions and subject doubling in French

The first variable investigated in this chapter is subject-verb inversion in wh-questions. In syntax research, this word order has been called ‘stylistic inversion’ when co-occurring with an inverted non-pronominal subject as in (1a) (Kayne and Pollock 1978; Drijkoningen and Kampers-Manhe 2008), while it is often referred to as ‘subject-clitic inversion’ with an inverted weak subject pronoun as in (1b) (Auger 1994; Elsig 2009). (2) is the variant in which the interrogative pronoun remains in its canonical position (declarative word order) and which is considered as colloquial, (3) is the variant with the optional polar question particle est-ce que, and (4) is the variant without subject inversion, which is also considered as colloquial.

(1) a. où va ton mari? [whVSlex]
       where go your husband
       ‘Where does your husband go?’
    b. où vas-tu? [whVSclit]
       where go-you
       ‘Where do you go?’
(2) tu vas où? [wh-in-situ]
    you go where
(3) où est-ce que tu vas? [wh-ESQ]
    where est-ce que you go
(4) où tu vas? [whSV]
    where you go

This study includes both (1a) and (1b), which are stylistically marked as formal interrogative variants. Elliptical or otherwise incomplete wh-questions and questions with a non-referential subject pronoun (mainly expletives) were excluded from the envelope of variation. The dependent variable is the relative frequency of subject-verb inversion (which consists of all tokens of whVSclit and whVSlex, i.e. variants (1a) and (1b)). It is calculated as in (5) for each interviewee i, taking into consideration all four interrogative variants productive in contemporary spoken French. Other forms such as clefts or so-called complex inversion, which hardly occur in spontaneous speech (see also Elsig 2009), are not included. The total rate of subject inversion for the entire corpus is given in (6) and it builds on a total of 1477 extracted wh-questions. The low average frequency of 8% is in line with the formal stylistic value of the inverted variants. Given the normativity pressure in French (there is a long tradition of astonishing stigmatization of colloquial grammatical forms), subject-verb inversion in wh-questions is a suitable sociolinguistic variable (for other studies on this variation, see also Druetta 2008; Elsig 2009; Coveney and Dekhissi 2013). Indeed the variable has even been used as a diagnostic for diglossia: based partly on well-known stylistic differences between French interrogative variants, French native speakers have been argued to be diglossic (Zribi-Hertz 2010) or even bilingual (Meisel et al. 2011).

(5) $$h_{\mathrm{whVS},i} = \frac{N_{\mathrm{whVS},i}}{N_{\mathrm{whVS},i} + N_{\mathrm{whinsitu},i} + N_{\mathrm{whESQ},i} + N_{\mathrm{whSV},i}}$$

(6) $$h_{\mathrm{whVS},tot} = \frac{N_{\mathrm{whVS},tot}}{N_{\mathrm{whVS},tot} + N_{\mathrm{whinsitu},tot} + N_{\mathrm{whESQ},tot} + N_{\mathrm{whSV},tot}} = \frac{112}{112 + 873 + 242 + 250} \approx 0.08$$
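The calculation in (5) and (6) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `inversion_rate`, the variant labels used as dictionary keys, and the assignment of the four corpus totals to the variants in the order of the denominator of (6) are choices made here, not part of the original study's tooling.

```python
# Variant labels follow examples (1)-(4); "whVS" pools whVSlex and whVSclit.
VARIANTS = ("whVS", "wh-in-situ", "wh-ESQ", "whSV")


def inversion_rate(counts):
    """Relative frequency of subject-verb inversion, as in formula (5).

    `counts` maps a variant label to its number of tokens for one
    interviewee (or for the whole corpus). Returns None when the
    interviewee produced no wh-question inside the envelope of variation.
    """
    total = sum(counts.get(variant, 0) for variant in VARIANTS)
    if total == 0:
        return None
    return counts.get("whVS", 0) / total


# Corpus totals as read off formula (6); the mapping to labels is an assumption.
corpus_totals = {"whVS": 112, "wh-in-situ": 873, "wh-ESQ": 242, "whSV": 250}
overall = inversion_rate(corpus_totals)
print(sum(corpus_totals.values()))  # 1477
print(round(overall, 2))            # 0.08
```

Applying the same function to each interviewee's own counts yields the per-speaker dependent variable; returning `None` for speakers without relevant tokens keeps them from being treated as speakers with a rate of zero.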

The second variable investigated is subject doubling. The dialogue fragment from sgs in (7) shows the variation between the simple weak subject pronoun (elle) and the sequence of adjacent strong and weak subject pronoun (elle elle).

(7) A: la porte était entre-ouverte. Il y avait pas eu d’effraction.
       the door was half open. there had not been break-in.
    B: donc, a priori, elle l’a ouvert?
       so a priori she[weak] it has opened
    A: ouais, ouais.
       yeah yeah
    B: et elle elle était où?
       and she[strong] she[weak] was where

Doubling is generally considered to be colloquial. Subject doubling has been amply discussed in variationist studies (among others Nadasdi 1995; Nagy et al. 2003; Coveney 2005; Culbertson 2010; Zahler 2014), which have mainly concentrated on sentences with a lexical subject, with or without an optional coreferential weak pronoun, as in (8a).

(8a) et la voisine elle était où?
     and the neighbour she[weak] was where

The present study excludes lexical subjects and concentrates on pronominal referents, i.e. sentences with a weak subject pronoun, with or without an optional coreferential strong pronoun: (moi) je [1sg], (toi) tu [2sg], (lui) il [3sg, masc], (elle) elle [3sg, fem], (nous) on [1pl], (nous) nous [1pl], (vous) vous [2pl], (eux) ils [3pl, masc], (elles) elles [3pl, fem]. Circumscribed in this way, variable doubling in French is functionally similar to the much investigated variable subject pronoun in Spanish. This will allow cross-linguistic variationist studies (in future work).


Instances of doubling in which strong and weak subject pronouns are not adjacent as in (8b) have also been excluded. These cases require a different syntactic analysis (Culbertson 2010) and might not be variants of simple pronominal forms.

(8b) et elle, hier soir, elle était où?
     and she[strong] yesterday evening she[weak] was where

Elliptical, fragmentary, or otherwise incomplete utterances were excluded. The dependent variable is again the relative frequency of pronominal subject doubling for each interviewee. The overall doubling rate in the entire corpus is 3%.

(9) $$h_{\mathrm{doubl},tot} = \frac{N_{\mathrm{doubl},tot}}{N_{\mathrm{doubl},tot} + N_{\mathrm{simple},tot}} = \frac{238}{238 + 7434} \approx 0.03$$
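The adjacency criterion and the rate in (9) can be illustrated with a toy Python sketch. It is not the annotation procedure of the study: the function names, the whitespace tokenisation and the lookup table are assumptions made for illustration. The table only encodes the strong/weak pairs listed above, and the check is deliberately naive: it would, for instance, misread a reflexive sequence such as nous nous or vous vous as doubling, and it ignores elided forms such as j’.

```python
# Strong pronoun -> weak subject pronouns it can double (pairs listed in the text).
PAIRS = {
    "moi": {"je"}, "toi": {"tu"}, "lui": {"il"}, "elle": {"elle"},
    "nous": {"on", "nous"}, "vous": {"vous"}, "eux": {"ils"}, "elles": {"elles"},
}


def is_adjacent_doubling(tokens):
    """True if a strong pronoun is immediately followed by a matching weak one."""
    lowered = [token.lower() for token in tokens]
    return any(weak in PAIRS.get(strong, ())
               for strong, weak in zip(lowered, lowered[1:]))


def doubling_rate(n_doubled, n_simple):
    """Relative frequency of pronominal subject doubling, as in formula (9)."""
    return n_doubled / (n_doubled + n_simple)


print(is_adjacent_doubling("et elle elle était où".split()))                # True
print(is_adjacent_doubling("et elle , hier soir , elle était où".split()))  # False
print(round(doubling_rate(238, 7434), 2))                                   # 0.03
```

The second call corresponds to the non-adjacent pattern in (8b), which the study excludes from the envelope of variation and which the toy check accordingly does not count as doubling.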

At first sight, one might ask whether the inversion and doubling rates are sufficiently high to be considered sociolinguistically relevant. However, these variants are markers that do not require higher rates to show consistent style effects. Furthermore, their frequency should not be underestimated either. Given that questions and subjects are frequent in everyday discourse, the low rate of the respective marked variants does not mean that they represent sparse, barely produced constructions.

4. Lifestyle and Bourdieu’s sociocultural theory

Theories of class were predominant among sociologists in western societies, most notably the class theory of Karl Marx. The lifestyle concept used in this chapter emerged from the shortcomings of the traditional notion of class which does not have the same explanatory power in postmodern societies as it had in continental European societies of the 19th century. Bourdieu’s (1979) sociocultural theory has the merit of reconnecting social theories of inequality with cultural sociology. 3 One of the fundamental principles in Bourdieu’s theory concerns the relation between the individual and society. The characteristics of the social structure shape collective representations and social classifications which, in turn, become manifest in apparently genuine individual patterns of personality, such as cultural preferences, judgments of taste, and lifestyle.

3. More details of Bourdieu’s theory (including on his predecessors, the notion of capital, as well as the concept of habitus) can be found in the web-appendix at http://sociolab.phil-fak.uni-koeln.de/index.php?id=26329.


Choosing according to one’s tastes is a matter of identifying goods that are objectively attuned to one’s position and which ‘go together’ because they are situated in roughly equivalent positions in their respective spaces, be they films or plays, cartoons or novels, clothes or furniture; this choice is assisted by institutions – shops, theatres (Left Bank or Right Bank [of the river Seine, which divides Paris]), critics, newspapers, magazines – which are chosen on the same principles and which, being defined by their position in a field, have to exhibit themselves distinctive marking. (Bourdieu 1984: 232) [revised translation, A.A.]

Bourdieu develops a system of sociologically relevant ways of cognition. He distinguishes between different perceptions and attitudes towards the environment on the one hand, and systematic forms of misperceptions on the other. He calls doxa ordinary, non-scrutinized schemas of thought, perception, and judgment which are typically perceived as natural. Bourdieu (1993: 51) describes them as “everything that goes without saying, and in particular the systems of classification determining what is judged interesting or uninteresting, the things that no one thinks worthy of being mentioned, because there is no demand”. Two other ways of perception, orthodoxy and heterodoxy, express deviation from the natural, non-scrutinized doxa. Orthodoxy stands for a systematic, scrutinizing and conscious cognition, claiming legitimacy and normativeness. Orthodoxy represents a conservative view based on a value system supporting and calling for normative agreement. Heterodoxy also presupposes a systematic, scrutinizing cognition. However, in this type of perception, the cognition is used to corroborate an alternative interpretation, opposed to the orthodoxy. It stands for the deviating, critical voice, which is, similarly to orthodoxy, capable of sophisticated judgment. Both orthodoxy and heterodoxy build on a sufficient amount of cultural capital and are therefore a privilege. A distinguished lifestyle representing special social position and success, contains a smaller proportion of doxa. The fine-grained differences between different forms of distinguished lifestyles decide whether distinction is primarily realized by orthodoxy, heterodoxy, or by a skillful, unique and therefore particularly individual combination of both ways of cognition. Bourdieu’s close relation between class and lifestyle is expressed by the fact that in the Paris area of the late 1960s, he attributes orthodoxy primarily to social and cultural climbers, and heterodoxy to the established bourgeoisie. 
Both possess an aesthetical competence which builds on the knowledge of form and style. The lifestyle dimensions used in the present study are leisure, media (consisting of the subdimensions book genres, newspapers, magazines, TV, music, internet, radio, preferred news sources), clothing, and values. Although I am aware that lifestyle has more aspects than these four dimensions (see Bourdieu 1979: 599–605), they represent core information on lifestyle which is presumably suitable for populations of numerous post-industrialized metropolises around the world.


5. Operationalization of lifestyle

The essence of the operationalization of lifestyle is a meaningful data reduction of the multitude of information encoded in the answers to the single items of the sociocultural questionnaire. Figure 1 summarizes the data reduction process.

Figure 1. Data reduction of lifestyle: items (204) reduce to factors (29), which reduce to clusters (4), i.e. lifestyle clusters 1 to 4

dimension   items   example items                        factors
leisure     45      visiting friends, doing sports …     7
media       113     newspaper XYZ, TV channel XYZ …      16
clothing    28      brand name cloth, perfumes …         3
values      18      religious values, partner choice …   3
total       204                                          29

5.1 Data reduction

In a first step, four different factor analyses are calculated (one each for leisure, media, clothing, and values), reducing the original number of 204 lifestyle items to 29 factors. In a second step, these 29 factors are taken as the input for the cluster analysis. The rationale for data reduction is that it is not feasible to interpret the complex result pattern of 204 items and give equal importance to each item (if one were to interpret this complex result pattern, one would inevitably opt for a heuristic that would give emphasis to some elements while ignoring others). At the initial stage, each of the 102 subjects of the study is thus characterized by her/his answers on 204 questions. Mathematically, each subject is a (uniquely identifiable) point in a 204-dimensional (hyper)space.

5.2 Factor analysis

Factor analysis, also known under the label PCA (principal component analysis), helps us to find a representation with which p variables can be expressed by q factors, with q < p. We obtain a result in which each subject is characterized by her/his values on 29 factors, i.e. s/he represents a point in a space reduced to 29 dimensions. Table 1 shows the interpretations of the factor analysis for leisure items. Leisure is a classic core element of lifestyle. It builds on very diverse areas like visiting friends, doing sports or engaging in political or social activities. Table 2 shows the interpretation of factor analysis for media items. They cover preferences for book genres, newspapers, magazines, TV programs (and video), music, internet, radio, and speakers’ preferred sources of information for daily news. The items on newspaper and radio constitute a selection of different, partly high circulation media. Media are granted a proportionally high weight in this operationalization: I assume that this weighting corresponds to a general social trend towards an increasing relevance of media preference for lifestyle. Table 3 summarizes the interpretation of the factor analysis for clothing items. It is based on questions about the speakers’ attitudes concerning clothing and fashion, on the functions of clothing, on expenses for selected accessories (such as perfume, underwear, shoes), and on preferred sources of supply. Finally, Table 4 lists the factors for value items. They cover criteria for partner choice, social perception regarding the social position of others, and religiosity. 4

Table 1. Lifestyle factors for the leisure dimension

F1 leis.   Sociability and going out
F2 leis.   Activities promoting health, well-being and uplift
F3 leis.   Cultural, political, and intellectual (elite) activities
F4 leis.   Activities requiring low initial effort (e.g. eating fast food, sedentary activities) and fictional entertainment (games, computer, comics)
F5 leis.   Pastime activities with the family (mainly inside one’s home)
F6 leis.   Relaxing during mental stimulation (especially reading)
F7 leis.   Internet-abstinent, practical/aesthetic pastime activities (e.g. handicrafts, theatre)
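The idea of expressing p items by q < p factors can be sketched as follows. The chapter does not state which software, extraction method or rotation was used, so this is only an assumption-laden illustration: it extracts principal components by power iteration with deflation in plain Python, and all function names as well as the simulated questionnaire data are hypothetical.

```python
import random


def center(rows):
    """Subtract each column mean so that every item has mean 0."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]


def covariance(rows):
    """Sample covariance matrix of already-centred rows."""
    n, p = len(rows), len(rows[0])
    return [[sum(r[a] * r[b] for r in rows) / (n - 1) for b in range(p)]
            for a in range(p)]


def principal_axes(cov, q, iterations=200):
    """Approximate the q leading eigenvectors by power iteration with deflation."""
    p = len(cov)
    cov = [row[:] for row in cov]               # work on a copy
    axes = []
    for _ in range(q):
        v = [1.0 / (j + 1) for j in range(p)]   # fixed, non-zero start vector
        for _ in range(iterations):
            w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:                     # nothing left to extract
                break
            v = [x / norm for x in w]
        cv = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        eigenvalue = sum(v[a] * cv[a] for a in range(p))
        axes.append(v)
        for a in range(p):                      # deflation: remove the found axis
            for b in range(p):
                cov[a][b] -= eigenvalue * v[a] * v[b]
    return axes


def factor_scores(rows, q):
    """Project each subject (row) onto the q leading principal axes."""
    centred = center(rows)
    axes = principal_axes(covariance(centred), q)
    return [[sum(x * a for x, a in zip(r, axis)) for axis in axes]
            for r in centred]


# Simulated data (not from the study): 102 subjects, 5 items driven by one
# shared latent preference plus a little item-specific noise.
random.seed(0)
rows = []
for _ in range(102):
    latent = random.gauss(0, 1)
    rows.append([latent + random.gauss(0, 0.1) for _ in range(5)])

scores = factor_scores(rows, q=2)
print(len(scores), len(scores[0]))  # 102 2
```

In the set-up described in the chapter, such a reduction would be run once per dimension (45, 113, 28 and 18 items), and the resulting 7, 16, 3 and 3 scores per subject would be concatenated into the 29-dimensional input for the cluster analysis.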

Table 2. Lifestyle factors for the media dimension

F1 med.    Entertainment on TV
F2 med.    Downloading documents/music, E-Commerce, and online information (Internet)
F3 med.    Political media (mainly newspapers)
F4 med.    Information-oriented, popular radio
F5 med.    Sports media
F6 med.    Mainstream music (radio)
F7 med.    Classical music and intellectual radio
F8 med.    Literature and art media
F9 med.    Sex and erotic media, computer game magazines, techno music
F10 med.   Free newspapers; satellite TV
F11 med.   Multiple news sources, independently syndicated radio and newspaper
F12 med.   Confidence in internet and foreign media; radio
F13 med.   Diversion and coziness (e.g. detective novels, women’s magazine, decoration, talk radio)
F14 med.   Orientation towards practical application and applied topics (e.g. economy, cooking, information technology)
F15 med.   Pop/rock/mainstream music
F16 med.   Trivial entertainment (TV shopping-channels, love (dime) stories, internet chat, music videos)

4. Technical details on the selected factor and cluster solutions, one factor matrix (which served as the basis for the leisure factors) and some background on these techniques are given in the web-appendix at http://sociolab.phil-fak.uni-koeln.de/index.php?id=26329.

Table 3. Lifestyle factors for the clothing dimension

F1 cloth.   Fashion as a tool for distinction
F2 cloth.   Frequency of purchasing accessories (perfume, underwear, shoes, sun glasses)
F3 cloth.   Spending money for clothes and accessory

Table 4. Lifestyle factors for the values dimension

F1 val.   Obvious status symbols as an indicator for a person’s social position (place of residence, appearance, furniture and decoration)
F2 val.   (Acquired) social status as criteria for partner choice (socio-economic status, education, common roots, beauty)
F3 val.   Religiosity

5.3 Cluster analysis

In the second step, cluster analysis finds the best possible grouping solution for the subjects according to the criterion of highest within-group homogeneity and highest between-group heterogeneity. It takes the (hyperdimensional) scatterplot consisting of 102 points in the 29-dimensional space and suggests a way of separating this scatterplot into a limited number of non-overlapping clouds of points, which are the clusters. Figure 2 shows a solution with four clusters, each represented by its prototype (i.e. the center of the cluster). Factor values that are clearly above or below the sample mean of 0 are characteristic features of a lifestyle cluster. For example, lifestyle cluster 2 shows a high value on the fifth media factor (F5 med.), which means that the persons belonging to this cluster consume sports media much more often than the sample average. Further details on the cluster analysis can be found in the web-appendix.

The first cluster consists of 22 persons and is interpreted as a lifestyle oriented towards social conventions and conservative values. The speakers rarely engage in activities requiring low initial effort or in fictional entertainment (F4 leis.), but prefer internet-abstinent, practical/aesthetic pastime activities (F7 leis.). They rarely use the internet for downloads, e-commerce, and online information (F2 med.), which is probably also due to their low confidence in the internet (and foreign media, F12 med.). Compared to the other clusters, this group spends the most money on clothes and accessories (F3 cloth.) and shows a high religiosity (F3 val.). The mean age is 27, and the majority are women (82% women, 18% men).
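The grouping step described in this section can be illustrated with a minimal sketch. It is not the procedure used in the study, whose details are given in the web-appendix: k-means is assumed here purely for illustration, the data are random numbers standing in for the 102 subjects' values on the 29 factors, and the cut-off of 0.5 for a "characteristic" factor value as well as all function names are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(points):
    """Component-wise mean of a non-empty list of points (the cluster center)."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means: assign each point to its nearest center, then recompute centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k distinct subjects serve as starting centers
    labels = [0] * len(points)
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = mean_vector(members)
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels, centers


def characteristic_factors(center, names, threshold=0.5):
    """Factors whose cluster mean lies clearly above or below the sample mean of 0.

    The threshold is a hypothetical choice, not a value taken from the study.
    """
    return [(name, round(v, 2)) for name, v in zip(names, center) if abs(v) >= threshold]


# Factor labels as in Tables 1-4: 7 leisure, 16 media, 3 clothing, 3 values factors.
names = (
    [f"F{i} leis." for i in range(1, 8)]
    + [f"F{i} med." for i in range(1, 17)]
    + [f"F{i} cloth." for i in range(1, 4)]
    + [f"F{i} val." for i in range(1, 4)]
)

# Synthetic stand-in data: 102 subjects, standardized scores (mean 0) on 29 factors.
data_rng = random.Random(42)
data = [[data_rng.gauss(0.0, 1.0) for _ in names] for _ in range(102)]

labels, centers = kmeans(data, k=4, seed=0)
sizes = [labels.count(c) for c in range(4)]

for c, center in enumerate(centers, start=1):
    print(f"cluster {c}: n = {sizes[c - 1]}", characteristic_factors(center, names))
```

Each printed line corresponds to one prototype of the kind plotted in Figure 2: the cluster size and those factors on which the cluster center deviates from the sample mean by at least the assumed cut-off. With random input the resulting groups carry no meaning; the sketch only shows the mechanics of reading a cluster profile.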

Figure 2. Four lifestyle prototypes based on the factor values

[Figure: factor values of the four cluster centers (vertical axis, from -1.0 to 1.5) on the 29 lifestyle factors (horizontal axis: F1 to F7 leis., F1 to F16 med., F1 to F3 cloth., F1 to F3 val.). Legend: 1. Lifestyle oriented towards social conventions and conservative values; 2. Excitement-seeking, but down-to-earth lifestyle (oscillating between welcoming change and searching for security); 3. Educated, liberal lifestyle; 4. Lifestyle marked by internet-affinity, conservative values, and low estimation of aesthetics]

The second lifestyle cluster contains 18 persons and is interpreted as an excitement-seeking but down-to-earth lifestyle, oscillating between welcoming change and searching for security. Speakers in this cluster often engage in activities geared towards the promotion of health, well-being and uplift (F2 leis.), but are much less often involved in cultural/intellectual activities (F3 leis.) and activities that require low initial effort (F6 leis.). They often consume sports media (F5 med.) and listen to information-oriented popular radio (F4 med.) as well as mainstream music radio (F6 med.), but they stay away from literature and art media (F8 med.). These speakers frequently purchase accessories (F2 cloth.) but do not use fashion as a tool for distinction (F1 cloth.). They rely on (acquired) social status as a criterion for partner choice and have an above-average religiosity. The mean age is again 27, but this time the majority is made up of men (78% men, 22% women).

The third cluster contains 33 persons and exhibits an educated, liberal lifestyle. This group is the one which most often practises cultural, political, and intellectual (elite) activities (F3 leis.) and likes to relax during mental stimulation, especially by reading (F6 leis.). They also have the highest consumption of political media, mainly newspapers (F3 med.), and of classical music and intellectual radio (F7 med.). The speakers who make up this cluster clearly employ fashion as a tool for distinction (F1 cloth.), although they do not have higher than average expenses for clothes and accessories (F3 cloth.). They are rather unreligious (F3 val.). In this cluster, we find the highest average age (mean value 33), and its gender distribution is balanced (56% women, 44% men).

Cluster 4 contains 29 persons and is interpreted as a lifestyle marked by internet-affinity and low estimation of aesthetics and conservative values. This subgroup rarely engages in activities geared towards the promotion of health, well-being and spiritual uplift (F2 leis.) or in internet-abstinent, practical/aesthetic pastime activities, such as handicrafts or theatre (F7 leis.). They are much more prone to be involved in downloads, e-commerce, and online information (F2 med.) but rarely consume media oriented towards practical application and applied topics, such as economy, cooking or information technology (F14 med.). The speakers in this group rarely purchase accessories (F2 cloth.) and spend only little money on clothing and accessories (F3 cloth.). They do not rely on (acquired) social status and common points as criteria for partner choice (F2 val.), and they are not very religious (F3 val.). The average age of this cluster is 29, and it is relatively gender balanced (59% women, 41% men).

Viewed in the light of Bourdieu’s sociocultural theory, the educated, liberal lifestyle (cluster 3) is the critical, deviant voice endowed with cultural capital and can be interpreted as the heterodoxy in the sense of Bourdieu (1993: 51). Based on commonplace stereotypes, it might seem that the lifestyle oriented towards social conventions and conservative values represents orthodoxy.
However, having conservative values, as is the case for both the first and the fourth lifestyle group, does not suffice for being orthodox. Rather, mere orientation to social conventions and less scrutinized orientation towards conservative values represents the doxa (clusters 1 and 4), in Bourdieu’s (1993: 51/52) words: “What is most hidden is what everyone agrees about, agreeing so much that they don’t even mention them, the things that are beyond question, that go without saying. […] It is what informants don’t say, or say only by omission, in their silences.” The excitement-seeking, down-to-earth lifestyle (cluster 2) represents more clearly the orthodoxy pattern. These individuals are guardians of the norms. Their purpose of conserving the status quo, and their strategies for doing so, are based, according to Bourdieu (1984: 426), on their incorporated characteristics of distinction and culture: “They have spontaneously the bodily hexis, diction and pronunciation to suit their words; there is an immediate, perfect, natural harmony between the speech and the spokesman”.

Both orthodoxy (cluster 2) and heterodoxy (cluster 3) are privileged groups with a relatively high level of cultural capital. The level of education of the persons belonging to lifestyles 2 and 3 is on average one year above that of the persons belonging to lifestyles 1 and 4, and their income after taxes is also more than 200 euros higher per month on average.

6. The effect of lifestyle and other social variables on inversion and doubling

6.1 Statistical results

In order to choose meaningful values for the error probabilities α and β and the effect size ε for the given sample size, I ran power analyses using G*Power 3 (Faul et al. 2007). Balanced error risks have been employed with α = β = 0.1, which allows for detecting an effect size in the range between medium and large.⁵

In a first step, I calculated two one-way ANOVAs with lifestyle, two with gender, and two with occupational category, in each case one for the inversion rate and one for the doubling rate. The occupational category was created by recoding the occupational categories of the French office of statistics (INSEE 2003) into three categories: (i) farmer, intermediate occupation, employee or worker, to which n1 = 29 individuals belong; (ii) executive, intellectual activity, trader or artisan, comprising n2 = 40 individuals; (iii) non-working, e.g. student but not retiree, comprising n3 = 28 individuals. Two bivariate correlations (Pearson’s r) were calculated with age (which is a metric variable). Education, which has been measured in years of schooling/educational qualification, has been analyzed by two Kendall’s τb correlation tests for rank orders (because it is an ordinal variable).

Table 5. Results of inversion and doubling for five social variables (stars denote significance)

Lifestyle   Gender   Occupation   Education   Age

inversion rate

doubling rate

p