Corpus-based Studies of Lesser-described Languages : The CorpAfroAs corpus of spoken AfroAsiatic languages [1 ed.] 9789027268891, 9789027203762


English Pages 344 Year 2015


Corpus-based Studies of Lesser-described Languages

Studies in Corpus Linguistics (SCL) ISSN 1388-0373

SCL focuses on the use of corpora throughout language study, the development of a quantitative approach to linguistics, the design and use of new tools for processing language texts, and the theoretical implications of a data-rich discipline. For an overview of all books published in this series, please see http://benjamins.com/catalog/books/scl

General Editor
Elena Tognini-Bonelli, The Tuscan Word Centre / The University of Siena

Consulting Editor
Wolfgang Teubert, University of Birmingham

Advisory Board
Michael Barlow, University of Auckland
Douglas Biber, Northern Arizona University
Marina Bondi, University of Modena and Reggio Emilia
Christopher S. Butler, University of Wales, Swansea
Sylviane Granger, University of Louvain
M.A.K. Halliday, University of Sydney
Yang Huizhong, Jiao Tong University, Shanghai
Susan Hunston, University of Birmingham
Graeme Kennedy, Victoria University of Wellington
Michaela Mahlberg, University of Nottingham
Anna Mauranen, University of Helsinki
Ute Römer, Georgia State University
Jan Svartvik, University of Lund
John M. Swales, University of Michigan
Martin Warren, The Hong Kong Polytechnic University

Volume 68 Corpus-based Studies of Lesser-described Languages. The CorpAfroAs corpus of spoken AfroAsiatic languages Edited by Amina Mettouchi, Martine Vanhove and Dominique Caubet

Corpus-based Studies of Lesser-described Languages The CorpAfroAs corpus of spoken AfroAsiatic languages Edited by

Amina Mettouchi EPHE (LLACAN), Paris

Martine Vanhove

CNRS (LLACAN), Paris

Dominique Caubet

INALCO (LaCNAD), Paris

John Benjamins Publishing Company Amsterdam / Philadelphia


The paper used in this publication meets the minimum requirements of the American National Standard for Information Sciences – Permanence of Paper for Printed Library Materials, ANSI Z39.48-1984.

Cover design: Françoise Berserik
Cover illustration from original painting Random Order by Lorenzo Pezzatini, Florence, 1996.

doi 10.1075/scl.68
Cataloging-in-Publication Data available from Library of Congress: LCCN 2014046968 (print) / 2014047717 (e-book)
ISBN 978 90 272 0376 2 (hb)
ISBN 978 90 272 6889 1 (e-book)

© 2015 – John Benjamins B.V.
No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher.
John Benjamins Publishing Co. · P.O. Box 36224 · 1020 ME Amsterdam · The Netherlands
John Benjamins North America · P.O. Box 27519 · Philadelphia PA 19118-0519 · USA

Table of contents

Preface
  Amina Mettouchi, Martine Vanhove and Dominique Caubet

Part 1. Phonetics, phonology and prosody

Representation of speech in CorpAfroAs: Transcriptional strategies and prosodic units
  Shlomo Izre’el and Amina Mettouchi
Tone and intonation
  Bernard Caron

Part 2. Interfacing prosody, information structure and syntax

The intonation of topic and focus: Zaar (Nigeria), Tamasheq (Niger), Juba Arabic (South Sudan) and Tripoli Arabic (Libya)
  Bernard Caron, Cécile Lux, Stefano Manfredi and Christophe Pereira
Quotative constructions and prosody in some Afroasiatic languages: Towards a typology
  Il-Il Malibert and Martine Vanhove

Part 3. Cross-linguistic comparability

Glossing in Semitic languages: A comparison of Moroccan Arabic and Modern Hebrew
  Ángeles Vicente, Il-Il Malibert and Alexandrine Barontini
From the Leipzig Glossing Rules to the GE and RX lines
  Bernard Comrie
Cross-linguistic comparability in CorpAfroAs
  Amina Mettouchi, Graziano Savà and Mauro Tosco
Functional domains and cross-linguistic comparability
  Zygmunt Frajzyngier and Amina Mettouchi

Part 4. Language contact

Language contact, borrowing and codeswitching
  Stefano Manfredi, Marie-Claude Simeone-Senelle and Mauro Tosco

Part 5. Information technology

ELAN-CorpA: Lexicon-aided annotation in ELAN
  Christian Chanard

Language index

Subject index

Companion website can be found at: http://dx.doi.org/10.1075/scl.68.website

Preface

Amina Mettouchi, Martine Vanhove and Dominique Caubet
EPHE (LLACAN) / CNRS (LLACAN) / INALCO (LaCNAD)

When the CorpAfroAs project was submitted to the French Agence Nationale de la Recherche in 2006, only one website, the Semitisches Tonarchiv, provided online data in Afroasiatic languages, in the form of sound files accompanied by transcriptions in PDF format. Another project, the Corpus of Spoken Israeli Hebrew (CoSIH), was at a standstill, with no data available online. Other language families were better represented on the web, in archives such as LACITO, DoBeS or ELDP, among others. At the time, however, even the richest repositories of lesser-described languages had not integrated systematic prosodic segmentation into their transcription of the data, and none had chosen a systematic annotation schema with typological research in view. In this context, CorpAfroAs appeared as a pioneering endeavor, a status that is still valid at the time we are releasing the data and publishing the accompanying volume.

The project involves the collection of one hour of spontaneous speech per language, 60% monologal and 40% dialogal, in thirteen Afro-Asiatic languages; the sound-indexed transcription, annotation, and translation into English of those thirteen subcorpora; the elaboration of grammatical sketches for each language; and the development of a lexicon-assisted annotation tool in the software ELAN, named ELAN-CorpA. The aim of the project is not only to provide data, but also to offer a methodology for the creation of corpora in lesser-described languages, from data collection, through analysis, to online dissemination. All stages of the process have been documented in a Manual available online at http://dx.doi.org/10.1075/scl.68.website.

CorpAfroAs is characterized by its integrated dimension:
– a common layout for the annotation of sound files,
– a unified list of abbreviations, allowing searches across languages,
– an accompanying grammatical sketch per language, where glosses are given language-internal definitions.
doi 10.1075/scl.68.00pre 2015 © John Benjamins Publishing Company


The corpus is searchable online, within and across languages. Ultimately, the pilot corpus is designed to grow and become a reference corpus, as well as to inspire initiatives for other language phyla. The languages represented in the online corpus are:

Kabyle, Tamashek (Berber), Hausa and Zaar (Chadic), Afar, Beja, Gawwada, Ts’amakko (Cushitic), Wolaytta (Omotic), Moroccan and Libyan Arabic, Hebrew (Semitic), Juba-Arabic, an expanded Arabic-based pidgin.

In its pilot form, the corpus is not designed to present a balanced sample of languages. It covers all branches, and different types of languages, in order to provide technical and scientific solutions for all potential types: tonal and intonational, concatenative and non-concatenative, endangered as well as rather well-described languages, with or without codeswitching, etc. The project is organized along two axes, prosody and morphosyntax, which are linked to the nature of the materials and to the aim of the project, namely crosslinguistic comparability. Our first research question bears on the prosodic structure of the languages of the project, and more precisely, on the type of segmentation relevant for our data. This task is one of the main innovative aspects of our project. We decided to index the recordings on intonation units, a level that is largely recognized as useful for grammar, discourse and conversation analysis, and which was for instance used in the C-ORAL-Rom corpus of spoken Romance languages, developed by Cresti and Moneglia (2005). We therefore analyzed the prosodic units of our languages into minor (nonterminal) and major (terminal) units, using the software Praat. No other specification (tones, contours etc.) is given to those boundaries, but the fact that the transcription is indexed to the sound will ultimately allow more in-depth prosodic studies on the available data. The second research question concerned the morphosyntactic organization of the languages of the corpus. The corpus is not only translated, but also interlinearly glossed. For this purpose, we have developed a format allowing several annotation tiers. The purpose of those tiers is the automatic retrieval of a number of relevant queries concerning Afroasiatic languages: pronominal systems, case systems, nominal predicates, aspect, ideophones, demonstratives, verbal derivation, etc. Here are the various tiers and their contents:




ref        identifier for the annotation unit (time-aligned)
tx         transcription in broad phonetics into phonological words (Symbolic Association)
mot        intermediary tier allowing the segmentation into morphosyntactic words (Symbolic Subdivision)
mb         morphophonological transcription into morphemes (Symbolic Subdivision)
ge         morpheme-by-morpheme gloss of mb according to the Leipzig Glossing Rules, expanded within the project (Symbolic Association)
rx         part-of-speech and other information relevant for retrieval purposes (Symbolic Association)
ft or mft  free translation into English (Symbolic Association)

The annotation system we chose is based on the Leipzig Glossing Rules, developed jointly by the Department of Linguistics of the Max Planck Institute for Evolutionary Anthropology (Bernard Comrie, Martin Haspelmath) and by the Institute of Linguistics of the University of Leipzig (Balthasar Bickel), in order to promote convergence in glossing systems. The list of morphemes being open, and the rules devised more for readability than for automatic retrieval, one of the tasks we achieved is the establishment of a completed list, with rules adapted to computer retrieval. The full list is available online on the project website at http://dx.doi.org/10.1075/scl.68.website, and it is possible to suggest additions through the use of an online form.

Each member of the project collected, transcribed, segmented and annotated the data in their language, on the basis of language-internal consistency, but also in view of cross-linguistic comparison. Our main concern was to find the optimal degree of unification of the annotations, in order to both respect the specificities of languages, and provide a comparative basis for typology.
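The tier layout listed above can be pictured as a small data structure. The following Python sketch is purely illustrative: the class AnnotationUnit, the function find_units_by_gloss, the toy English sample and the rx labels in it are assumptions introduced here, not part of ELAN-CorpA or of the corpus, and the flat lists simplify the parent-child relations between ELAN tiers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AnnotationUnit:
    """One time-aligned unit with the tiers named in the preface (hypothetical model)."""
    ref: str        # identifier of the annotation unit
    tx: List[str]   # phonological words, broad phonetic transcription
    mot: List[str]  # morphosyntactic words
    mb: List[str]   # morphemes
    ge: List[str]   # one gloss per morpheme in mb
    rx: List[str]   # part-of-speech or other retrieval label per morpheme
    ft: str         # free translation into English


def find_units_by_gloss(units: List[AnnotationUnit], label: str) -> List[str]:
    """Return the ref of every unit whose ge tier contains `label`.

    A gloss such as "PFV.1SG" is treated as dot-separated components,
    so searching for "1SG" or for "PFV" both match it.
    """
    hits = []
    for unit in units:
        if any(label in gloss.split(".") for gloss in unit.ge):
            hits.append(unit.ref)
    return hits


# Toy data (invented English placeholders, not corpus material).
sample_units = [
    AnnotationUnit(ref="TOY_001", tx=["dogs"], mot=["dogs"],
                   mb=["dog", "s"], ge=["dog", "PL"], rx=["N", "NUM"], ft="dogs"),
    AnnotationUnit(ref="TOY_002", tx=["cat"], mot=["cat"],
                   mb=["cat"], ge=["cat"], rx=["N"], ft="cat"),
]

print(find_units_by_gloss(sample_units, "PL"))
```

Because every language uses the same abbreviation list, a single label lookup of this kind can in principle be run over all subcorpora, which is the point of unifying the glosses.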
It turned out in the course of the project that cross-linguistic comparison could not rely directly on the corpora, even with a list of abbreviations and definitions, but had to be mediated through grammatical sketches, which were therefore added to the deliverables of the project. Those sketches are online, and give the end-user sufficient insight into the definition of the categories involved in order to find the relevant typological matches in the other languages of the corpus. Ethical and technical aspects have also been largely dealt with in this project. As the data are made available online to the community, a thorough reflection process was initiated before data collection, concerning the ethical aspects of the project. Thus, anonymization procedures, as well as control over sensitive data, have been implemented when needed. At the same time, all the relevant information was gathered, in order to provide rich metadata on the recordings, in the


IMDI format. These metadata follow the requirements of OLAC (Open Language Archives Community) and the TEI (Text Encoding Initiative). The recordings were all digital, with strict requirements as to the format: non-compressed .wav files, recorded at 44.1 kHz / 16 bits, with high-quality microphones and pre-amplifiers. The high quality of the recording is necessary, not only because one of our scientific aims is to conduct a prosodic analysis of the data, but also for conservation purposes. The software tools used for the analysis of the data were Praat for segmentation, and ELAN-CorpA, or Toolbox followed by ELAN, for primary annotation. Indeed, before the development of ELAN-CorpA within the project, it was necessary to import the Praat TextGrid into ELAN, then export the resulting file into Toolbox, then annotate in Toolbox, then reimport the Toolbox file into ELAN in order to have a sound-indexed, searchable file. ELAN-CorpA, a lexicon-aided annotation tool, made it possible to do without the complex process of data treatment and annotation via Toolbox. Once the corpus was fully annotated, a website for online queries was devised, in collaboration with another ANR project of our research department, Sénélangues. This tool, ELAN-WebSearch, is similar to the search module of ELAN, but is based on a PostgreSQL database, and includes additional functionalities (concordances and lists). The technical dimension of the project of course requires the participation of an engineer on a permanent basis (Christian Chanard, of the LLACAN research unit), as well as a developer specially hired for the project, Coralie Villes. Regular collaboration with Han Sloetjes of the Max Planck Institute at Nijmegen has guaranteed the adaptation of ELAN-CorpA to the general architecture of ELAN.
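The recording requirements stated above (uncompressed .wav, 44.1 kHz, 16 bits) can be checked mechanically. The sketch below uses Python's standard wave module; the function names meets_recording_spec and make_silent_wav are hypothetical helpers written for this illustration and are not part of the project's tooling.

```python
import io
import wave


def meets_recording_spec(source, rate_hz=44100, sample_width_bytes=2):
    """Return True if `source` (a path or binary file object) is an
    uncompressed PCM WAV at the given sampling rate and sample width.

    16 bits correspond to a sample width of 2 bytes.
    """
    with wave.open(source, "rb") as wav:
        return (wav.getframerate() == rate_hz
                and wav.getsampwidth() == sample_width_bytes
                and wav.getcomptype() == "NONE")


def make_silent_wav(rate_hz, sample_width_bytes, n_frames=10):
    """Build a tiny mono WAV in memory so the check can be tried without a file."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(sample_width_bytes)
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00" * sample_width_bytes * n_frames)
    buffer.seek(0)
    return buffer


print(meets_recording_spec(make_silent_wav(44100, 2)))
print(meets_recording_spec(make_silent_wav(22050, 2)))
```

The check covers only the file format; microphone and pre-amplifier quality cannot be verified from the file header.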
Once the corpus was released in its Beta version, it appeared desirable to publish an accompanying volume where various central aspects of the CorpAfroAs project would be developed, and some scientific results published.

The first part of this volume is dedicated to phonetic, phonological and prosodic aspects of the project. Izre’el and Mettouchi’s survey of the phonetic and transcriptional aspects of CorpAfroAs sketches a portrait of the Corpus according to the choices that were implemented. First of all, priority was given to the close relationship between the tx tier and the sound file, mirroring the structure of the software, in which tx is indexed to the sound file represented by the waveform window in ELAN. The transcription in tx was therefore meant to reproduce as faithfully as possible the spoken monologue or interaction, allowing the end-user to recognize the elements of the speech continuum. However, the length of the corpus does not allow detailed phonetic representation; therefore, a degree of phonologization of the transcription was introduced, resulting in a broad phonetic transcription. In this tier, words are phonological (as opposed to morphosyntactic). The segmental string was segmented into prosodic units, defined by




their boundaries and by their coherent internal contour. Intonation units were chosen over syntactic units (clauses or phrases) because they are the organic units of spontaneous speech. At a later stage, the corpus can be further segmented into other units if needed for further research on the correspondence between syntactic and prosodic units. The tx tier was in turn further morpho-phonologized so that the mot tier should be composed of morphosyntactic words, morphemically transcribed. This level opens the way for a tokenization into morphemes in the mb tier. Those morphemes are then glossed in ge and rx. Finally, a free translation was given, which is currently aligned with respect to Intonation Units in most languages of the corpus, and to larger units (paratones) for the SOV languages of the sample, thus allowing better alignment between the source language and the target language (English) for the translation. The comparison between tx and mot allows the systematic study of sandhi and other similar phenomena, and of the syntax/prosody interface. The segmentation into prosodic units allows the study of various interfaces: syntax, information structure, discourse. The paper provides a detailed discussion of the various units of speech, and arguments for the decisions made by the team regarding segmentation and transcription. Bernard Caron’s paper specifically broaches the question of intonation in tone languages, with data from Zaar (Chadic). He shows how pitch plays a role in the intonation of a three-tone language, through the observation of the variations between post-lexical tones as they are perceived and transcribed by native speakers, and their acoustic realisation as represented by Praat and Prosogramme, and how surface tones accounted for and/or predicted by postlexical tonological rules undergo further variations. 
Those variations fall under two main categories: (a) declination; and (b) intonemes, defined as the minimal units of distinctive intonation contours associated with particular functions (pragmatic, information structure…). These are further divided into terminal intonemes (fall, rise, level and high-rise) and initial intonemes (step-down and step-up). The study allows the author to classify Zaar as a mixed language as regards Bearth’s (1998) typology: a language which both stacks intonation patterns over lexico-grammatical tones and expresses intonation at the periphery of the utterance, i.e. with both internal and peripheral intonation.

The second part is dedicated to studies of interfaces between prosody, information structure or syntax. Bernard Caron, Cécile Lux, Stefano Manfredi and Christophe Pereira analyze the correlation between prosody, sentence types, morphology and information structure categories, with a special emphasis on topic, focus and frames (i.e. left-dislocated circumstantials). After having examined in detail the various prosodic patterns for four languages of the CorpAfroAs corpus


(Zaar, Tamasheq, Juba Arabic and Tripoli Arabic), they conclude that (i) despite their different phonological pitch systems, and some differences in the correlations between prosodic contours and information structures, strong common tendencies emerge concerning the default intonation patterns of thetic sentences, and of topics (with the exception of Tamasheq); (ii) the variations in the intonation patterns of polar questions, focus and Wh-Questions follow a rule: a lack of a specific intonation pattern for a specific information structure is supplemented by morpho-syntactic marking, in other words the more a structure relies on morphosyntax, the less it relies on intonation. Il-Il Malibert and Martine Vanhove’s paper investigates, in a crosslinguistic perspective, the relationship between prosodic contours and direct and indirect reported speech (i.e. without or with deictic shift) in four typologically and genetically different Afroasiatic languages of the CorpAfroAs pilot corpus: Beja (Cushitic), Zaar (Chadic), Juba Arabic (Arabic based pidgin) and Modern Hebrew (Semitic). The descriptive tools and analysis of Genetti (2011) for direct speech report in Dolakha Newar (Tibeto-Burman) are used as a starting point and adapted to the annotation system of CorpAfroAs. Each language section investigates the prosodic cues and contours of direct speech reports, in relation to their quotative frame and their right and left contexts. The same prosodic features are also investigated for the three languages in our corpus which have indirect reported speech (Zaar, Juba Arabic and Hebrew). It is shown that speech reporting as a rhetorical strategy varies a lot from one language to another and is more frequent in the three unscripted languages of the sample. 
Even if speech reports show a wide range of prosodic behaviors, there are nonetheless clear tendencies that become apparent and which are related to various factors: speech report types, types of constituents of the quotative frame, genres, and typological features of the languages in question. A preliminary typology of the interface between prosody and speech reporting is proposed.

The third part of the volume deals with the problem of cross-linguistic comparability. This issue is relevant on several levels: the choice and type of glosses, the role of the accompanying grammatical sketches, the kinds of queries allowed by the chosen layout, which is based not on one, but on two annotation tiers: ge and rx. The ge line provides morpheme-by-morpheme glossing in terms of the function of each form. The rx line is devoted to part-of-speech information, as well as any gloss that the researcher considers helpful for the retrieval of relevant phenomena in the corpora (syncretism, verb class, noun class, syntactic relations, etc.). Ángeles Vicente, Il-Il Malibert and Alexandrine Barontini show how a project such as CorpAfroAs can challenge existing traditions in terms of annotation, and develop proposals for a standard in the domain of Semitic linguistics. Indeed,




Semitic studies traditionally used no interlinear glossing, translated examples were the norm, with occasional morphological information (SG, PL…) interspersed inside word-by-word translations. Those habits hamper the diffusion of Semitic studies outside the circle of language specialists. They also prevent the development of morphosyntactically-annotated corpora, and possible family-internal comparisons. The authors of the paper provide a series of annotation proposals that take into account the complex morphology characterizing this language-family. This endeavor also paves the way for the comparison of similar categories in Moroccan Arabic and Spoken Hebrew. The paper by Comrie addresses the issue of glossing examples in unfamiliar languages, with particular reference to the CorpAfroAs project. He first introduces the Leipzig Glossing Rules (LGR), which were devised with a very specific purpose in mind, namely to standardize the notations used by linguists, especially typologists, in presenting the morphological structure of example sentences in languages unfamiliar to the reader. While LGR form a suitable basis for annotation in projects like CorpAfroAs, such projects have a higher level of requirements, in particular the need to be able to retrieve particular categories and structures from corpora in various languages. The article discusses with examples the enrichment of LGR that is needed for this purpose, in particular the addition of extra tiers to capture such notions as part of speech and grammatical relation. 
Amina Mettouchi, Graziano Savà and Mauro Tosco’s paper tests the potential for cross-linguistic comparability of CorpAfroAs through the study of three morphosyntactic phenomena represented in several languages of the corpus (‘ventive’ extensions, gender, and case); they show that CorpAfroAs indeed allows the retrieval of a body of data amenable to cross-linguistic comparison, within the AfroAsiatic phylum and beyond, but that, given the annotation scheme of the corpus, the retrieval of relevant data also relies on information given in the accompanying grammatical sketches. This being taken into account, the paper proves that such automatic cross-linguistic investigations are powerful enough to allow the retrieval of complex data for gender; they can provide quite precise characterizations of lexical information concerning the types of verb that co-occur with directional extensions; and they contribute to the debate on the function of case-marking and syntactic roles in several languages of the corpus. In this respect, CorpAfroAs, as a pilot corpus, can indeed serve as a basis for typological investigations. Zygmunt Frajzyngier and Amina Mettouchi’s paper carries further the discussion on cross-linguistic comparability, by showing how the corpus can be completed by a comparative database that allows empirical analysis of the data. In this proposal, comparison is no longer based directly on the labels used for annotation, but it is mediated by the establishment, by the author of the corpus and specialist of the language, of the functional domains of that language. And it


is those domains that are compared, along several dimensions: type and number of oppositions, formal means used for the encoding of functions and predications. The domain considered for illustration is that of Reference, in Kabyle (Berber) and Mina (Chadic). This proposal is currently being implemented in a new ANR-funded project coordinated by Amina Mettouchi, CorTypo.

The fourth part concerns two different contact-induced phenomena found in several languages of the CorpAfroAs corpus: codeswitching and borrowing. Stefano Manfredi, Marie-Claude Simeone-Senelle and Mauro Tosco provide new linguistic evidence for the identification of codeswitching in contrast to lexical borrowings. The innovative dimension of this paper lies in the use of prosodic information for the analysis of those phenomena. The authors show that variation in intonation contours provides a good test for telling them apart. They show in addition that different types of codeswitching have different prosodic segmentation patterns: intersentential codeswitching is systematically related to monolingual intonation units, while intrasentential codeswitching tends to occur at the end of bilingual intonation units; on the other hand tag-switching is regularly highlighted by prosodic prominence. This result can be taken as a new constraint for distinguishing codeswitching from borrowing.

The volume ends with the presentation, by Christian Chanard, of the software development conducted within the CorpAfroAs project on the basis of the software ELAN (Max Planck Institute, Nijmegen). This development, resulting in the ELAN-CorpA software, involves the addition of an internal parser linked to a lexicon, for semi-automatic interlinearization purposes.
This addition was made necessary due to several obstacles encountered in the course of the project realization, namely lack of text-to-sound indexation in Toolbox, the most widely-used software in African field linguistics, problems of compatibility for Mac-users, lack of complex search functions in current software. ELAN-CorpA is a good example of the way bottom-up proposals made by field linguists in lesser-described languages can make their way into information technology, thus facilitating the treatment of data, both at the source, and for the end-user. In conclusion, the collection of papers in this volume is a quite unique endeavor in that it provides data, analyses and discussions on several aspects of corpus linguistics that are rarely studied together. By combining research on lesser-described languages, cross-linguistic comparison within a phylum, the introduction of prosodic segmentation and several layers of transcription and annotation, the use of first-hand dialogal and monologal recordings, the elaboration of accompanying grammatical sketches and standardized glosses, CorpAfroAs provides a rich integrated basis for corpus-based and corpus-driven investigations.




Acknowledgements

We are most grateful to our colleague David Roberts who reviewed the volume from the point of view of a native English speaker.

References

The Semitisches Tonarchiv.
Corpus of Spoken Israeli Hebrew (CoSIH).
Archives (now named Pangloss) at LACITO.
DoBeS.
ELDP.
Mettouchi, Amina, Vanhove, Martine & Caubet, Dominique (eds). Corpus-based Studies of Lesser-described Languages: The CorpAfroAs Corpus of Spoken AfroAsiatic Languages.
ELAN-CorpA.
CorpAfroAs Manual.
CorTypo.
Bearth, Thomas. 1998. Tonalité, déclinaison tonale et structuration du discours - Un point de vue comparatif. In Les unités discursives dans l’analyse sémiotique: La segmentation du discours, Gustavo Quiroz, Ioanna Berthoud-Papandropoulou, Evelyne Thommen & Christina Vogel (eds), 73–87. Bern: Peter Lang.
Cresti, Emanuela & Moneglia, Massimo (eds). 2005. C-ORAL-ROM: Integrated Reference Corpora for Spoken Romance Languages [Studies in Corpus Linguistics 15]. Amsterdam: John Benjamins. DOI: 10.1075/scl.15


Part 1

Phonetics, phonology and prosody

Representation of speech in CorpAfroAs
Transcriptional strategies and prosodic units*

Shlomo Izre’el and Amina Mettouchi
Tel Aviv University and LLACAN, Paris

This paper surveys the transcriptional aspects of CorpAfroAs, a spoken corpus of Afroasiatic languages, with a focus on the representation of phonemes, morphemes, words, and longer units. We discuss the distinction between prosodic, phonological and morphosyntactic word, as well as that between intonation unit, paratone and period. Segmentation and transcription choices are analyzed and their outcome in terms of scientific breakthroughs is presented: the comparison between phonological and morphosyntactic word allows the systematic study of sandhi and other similar phenomena, and of the syntax/phonology interface. The segmentation into prosodic units allows the study of interfaces with syntax, information structure, and discourse.

* We thank the CorpAfroAs team for fruitful discussions and constant input, especially Alexandrine Barontini, Bernard Caron, Il-Il Malibert-Yatziv, Stefano Manfredi, Christophe Pereira, Graziano Savà, Mauro Tosco and Ángeles Vicente. We have further benefitted from responses by the audience in a lecture given by Izre’el at Tel-Aviv University in March 2011, for which we are grateful. Lastly, we thank Emanuela Cresti for her insightful review of our paper and for some useful suggestions.

doi 10.1075/scl.68.01izr 2015 © John Benjamins Publishing Company

1. Introduction

The spoken medium is acoustic, linear and temporally extended. Therefore, visual transmission is necessary in order to enable research, except, perhaps, for studies focused on individual, small units. Even in this latter case, one needs to transfer sound into the visual medium in order to publish the results. The linguist must therefore use a transcript of the spoken text.

Transcribing a text is not a trivial undertaking, as has been noted time and again by those who have attempted an accurate transmission of speech into the written medium, i.e., its visualization. Transcribing a recording is a time-consuming endeavor: depending on the nature of the speech, such as rate of speech, number of speakers, setting (naturally occurring or spontaneous), environment and genre, an hour of recorded speech may take at least many dozens of hours of painstaking work to transcribe, and in some cases the amount of time invested will climb to hundreds of hours.

While orthographic transcription is often used in languages with a written system and written tradition, transcription in the standard orthography has by its very nature a very limited range of uses for the analyst, and indeed seems to be most useful for lexical studies and discourse analysis, yet even there only with at least minimal prosodic notation. Other domains of linguistic analysis can hardly profit from a transcription of speech in the standard orthography alone, without access to the sound stretch itself. This applies not only to phonetic or phonological analyses, but practically to all other domains, such as morphology, morphophonology, prosody, and even syntax. Notably, the standard orthography of a language is by definition related to only one (demographic or contextual) variety of linguistic forms used by speakers of that language. Moreover, the vast majority of the languages represented in CorpAfroAs have no orthographic standards or orthographies at all, which implies that other transcription systems should be used.

Therefore, CorpAfroAs is so constructed as to present to its end users both sound and transcription, linked and aligned to the extent that each relevant unit of language can be easily retrieved and accessed together with its sound. The first (tx) tier presents a broad phonetic transcription, whereas the second tier (mot) brings forth a basically phonological representation of the same stretch. It is this second tier that forms the basis of all upper-level analyses, i.e., morphological, POS, and beyond.
Both the phonetic and the phonological levels take cognizance of the prosodic structure of the language in terms of intonation units; the phonetic tier further exhibits lower level units, i.e., phonological words. One should note, however, that phonological words are not necessarily a lower level in the prosodic hierarchy, as will be clarified in section 3.1. In what follows, we shall first discuss the representation of the segmental strings in different tiers. Then we shall discuss in some detail the theoretical basis of segmentation into prosodic units and its implications.

2. Visualization of the spoken: Phones and segmental phonemes

The first (tx) tier presents a broad phonetic transcription of the speech stretch as actually perceived by the transcriber. In terms of sound, this tier conveys the sound segments at the surface level, i.e., after all phonological rules have applied. Such rules may involve the creation of allophones, assimilation (total or partial), elision, lengthening, shortening, etc. Analysis at the tx level is thus mostly phonetic, although it has much to do with the phonology of the language, as each represented segment actually stands for a class of phones which are related on both the phonetic and the phonological level (Wells 2006; Esling 2010: 680). Units at the tx level are phonological words (see below, §3.1).

A more abstract level of representation is presented at the mot tier. Each character at this level thus ideally represents a phoneme. This transcription line does not represent any abstraction beneath the morphophonological level, i.e., it represents phonemic strings following the operation of morphophonemic rules. Analysis at this level is thus purely phonological, as allophonic variation, sandhi phenomena and the like are usually not shown. Ex. 1 from Gawwada will serve well to demonstrate the differences between the broad phonetic representation in the tx tier and the morphophonemic transcription represented in the mot tier:1

(1) tx: ħaːmos kujaʕte ogaːjb ano raːdonaba aqasi raːdonesi apaqasana
    mot: ħaːmosí kujaʕte ʔokaːjpa ʔano raːtonepa ʔapaqasí raːtonesí ʔapaqasana
    mb: ħaːmo=s-í kujaʕ-t-e ʔokaːj-i=pa ʔano raːton-e=pa ʔapaq-a=s-í raːtone=s-í ʔapaq-a=s-a=n-a
    ge: Haamo=DEICT-SPEC day-SING-F come-PFV.1SG=LINK IDP.1SG radio-F=LINK listen-IPFV.1SG=DEICT-SPEC radio-F=DEICT-SPEC listen-IPFV.1SG=DEICT-GEN=MOV-OUT
    ft: ‘This Haamo came as I was listening to the radio. And as I was listening to the radio,’ (GWD_MT_NARR_003_012)

In Ex. 2 from Moroccan Arabic, there are two occurrences of the definite article /əl/, one in each of the two words in this example. In the first occurrence, the vowel that usually precedes the consonant is found following it: ləħbəq. This order is duly represented in the tx tier, whereas it has been reversed in the mot tier (əlħbəq), thus following the accepted representation of the Arabic definite article. In the second occurrence, the article shows the morphophonemic change of /l/ to /s/, which occurs when the article is adjacent to a following word beginning with /s/. As this change is morphophonemic, it is represented in both the tx and the mot tiers. The underlying phonemic string /əl/ is represented in this case only in the mb tier.

(2) tx: ləħbəq wussuːsaːn
    mot: əlħbəq wəssuːsaːn
    mb: əl=ħbəq w=əl=suːsaːn
    ge: DEF=basil and=DEF=lily
    ft: ‘the basil and the lily’ (ARY_AB_narr_1_004)
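The relation between the mb and mot tiers in Ex. 2 can be sketched in a few lines of Python. This is only an illustration, not part of the CorpAfroAs tool chain: the function name `mb_to_mot` is hypothetical, and the rule set is deliberately limited to the single change shown in the example (the /l/ of the article before a morpheme beginning with /s/).

```python
import re

# Hypothetical constant: the underlying form of the definite article in Ex. 2.
ARTICLE = "əl"

def mb_to_mot(mb_word: str) -> str:
    """Join the morphemes of one mb-tier word, assimilating the article's /l/.

    Assumption of this sketch: only the /l/ -> /s/ change of Ex. 2 is modelled.
    """
    morphemes = re.split(r"[=-]", mb_word)  # '=' marks clitics, '-' affixes
    out = []
    for i, morpheme in enumerate(morphemes):
        following = morphemes[i + 1] if i + 1 < len(morphemes) else ""
        if morpheme == ARTICLE and following.startswith("s"):
            morpheme = "əs"  # /l/ assimilates to the following /s/
        out.append(morpheme)
    return "".join(out)

mb_tier = "əl=ħbəq w=əl=suːsaːn"
mot_tier = " ".join(mb_to_mot(word) for word in mb_tier.split())
print(mot_tier)  # əlħbəq wəssuːsaːn
```

The output matches the mot tier of Ex. 2; the tx forms (ləħbəq, wussuːsaːn) are not derived, since they reflect surface phonetics that this sketch does not attempt to model.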

1.  In this section, we dispense with the prosodic notation of boundaries, which will be dealt with in §3.2.


Divergences from the principled system as characterized above can be discerned in some treatments of the languages represented in CorpAfroAs, notably with regard to vocalic epentheses, where theories may differ regarding their actual status. As Ex. 2 from Moroccan Arabic demonstrates, the theoretical premise that lies behind the representation /əl/ for the definite article in Moroccan Arabic is that the initial schwa is part of the phonemic string that forms this morpheme. In Hebrew, epenthesis usually takes the form [e]. However, scholars differ in their analysis and representation of various morphemes as regards the status of this vowel in the morphemic string, notably in the domain of prepositions. Note Ex. 3 from Hebrew, where the vowel [e] in the preposition [be] is interpreted as epenthetic:

(3) tx: beotobusim
    mot: beotobusim
    mb: b=otobus-im
    ge: in=bus-M.PL
    ft: ‘By buses.’ (HEB_IM_NARR_7_SP1_0948)

While strict methodology would require the representation of /b/ as [b] rather than [be] in the mot tier, the reading of such a string would be misleading: [botobusim]. Therefore, it has been decided to copy the epenthetic [e] to the mot tier as well. Note also Ex. 4, exhibiting epenthetic vowels in Kabyle:

(4) tx: ikkrəd jufad jəssisulaʃiθənt
    mot: ikkərdd jufadd jəssis ulaʃitənt
    mb: i-kkər=dd j-ufa=dd jəssi-s ulaʃ=tənt
    ge: SBJ3SG.M-stand_up\PFV=PROX SBJ3SG.M-find\PFV=PROX daughter\PL-KIN3SG NEGEXS=ABSV3SG.M
    ft: “The father woke up and found that his daughters were no longer there” (KAB_AM_NARR_01_0902)

A strictly accurate representation of the phonemic string of the first word would yield /ikkrdd/. The long cluster of consonants would be hard to interpret. In this case, the final morpheme, /dd/, is represented as a cluster, still immediately following the final consonant of the verbal stem. It will be interpretable when compared to the mb tier. However, the mb tier cannot provide readability to the consonant cluster of the verbal stem, which has therefore been represented in all tiers along with an epenthetic vowel. The represented form, kkər, will serve as the basic allomorphic representation of the morpheme (= verbal stem) /kkər/, which also has the variants əkkr and kkr. The epenthesis can therefore appear in various places in the tx tier, but in the mot tier the stem is represented in a single form, as above. From the technical point of view, only one record (= stem representation) will thus be used in the ELAN lexicon, but the other forms will appear as variants of that record. In a similar vein, a representation of the phonological structure of the absolutive clitic /tnt/ in Kabyle (at the end of this IU) would be unreadable, so that the epenthetic schwa, which is usually used in the pronunciation of this clitic, [θǝnt], has been kept in the mot tier as well: /tǝnt/.2

3. Prosodic segmentation: Prosodic units and their representation

3.1 Phonological word

A phonological word is a unit consisting of one syllable or more which has at least one defining property chosen from the following areas: (1) Segmental features: internal syllabic and segmental structure; phonetic realization in terms of this; word boundary phenomena; pause phenomena. (2) Prosodic features: stress (or accent) and/or tone assignment; prosodic features such as nasalization, retroflexion, vowel harmony. (3) Phonological rules: some rules apply only within a phonological word; others (external sandhi rules) apply specifically across a phonological word boundary (Dixon & Aikhenvald 2002: 13).

There is no consensus over the definition of either a phonological word or a prosodic word. Definitions differ among linguistic schools, as well as within schools. For example, scholars of the generative school “differ in how function and content words are parsed into Prosodic Words, and also in how different types of morphemes are parsed into Prosodic Words” (Shattuck-Hufnagel & Turk 1996: 216–218). The issue of cliticization is often taken into account in determining the scope of the notion of prosodic word, without there being a consensus about its relevance to the definition of the notion of phonological word (op. cit., §§3.1; 3.2.4; Aikhenvald 2002; Vogel 2006: 532–3). Yet cliticization in itself is a complex feature in that the behavior of clitics should be regarded as language specific (Aikhenvald 2002; Schiering, Bickel & Hildebrandt 2010).
Furthermore, cliticization is not invariable, and either content words or function words may have, under different conditions, both full and reduced versions (Zwicky 1977, 1995; Aikhenvald 2002: 72–75; Anderson 2005: §4). In any case, the relationship between prosody and morphosyntax plays a large role in the determination of prosodic words (Vogel 2006).

2.  The fricative [θ] is a phonetic realization of the phoneme /t/, which is therefore used in the mot tier.

Units at the tx level are phonological words. Units at the mot level are morphosyntactic words (or, as commonly called, “grammatical words”), preparing the ground for morphological and morphophonological analyses, which are carried out when moving down to the mb tier, representing the morphemic structure of the language under scrutiny. Whereas a phonological word may be defined in phonological or prosodic terms, a morphosyntactic word is defined in morphosyntactic terms as follows: it consists of a morpheme or several morphemes that (1) always occur together (rather than scattered through the clause); (2) occur in a fixed order; (3) have a conventionalized coherence and meaning (following Dixon & Aikhenvald 2002: 19). As noted by Julien (2006: 619), the rather commonly used term “grammatical word” to denote the nonphonological and nonlexical meaning of “word” is not strictly correct because phonology is, of course, also a part of grammar. Therefore, we will use the term “morphosyntactic word” instead (cf., e.g., Vogel 2006; Matthews 2007 s.v.; Crystal 2008: s.v.).

Ex. 5 from Hebrew is a clear illustration of the difference between morphosyntactic words (in the mot tier) and phonological (or prosodic) words (in the tx tier). Boundaries between either phonological words or morphosyntactic words are represented by spaces on the relevant tier. The vertical lines in Figure 1 show the boundaries between phonological words.

(5) tx: χalomʃlanu zʃtijelanu galeʁja
    mot: χalom ʃelanu ze ʃetihje lanu galeʁja
    mb: χalom ʃel=anu ze ʃe=t-ihje l=anu galeʁj-a
    ge: dream of=POSS.1PL DEM.SG.M NMLZ=3SG.F-be\NFCT to=POSS.1PL gallery-F
    ft: ‘Our dream is that we will have our (own) gallery.’ (HEB_IM_CONV_3_SP1_041)

Figure 1.  Phonological words in Hebrew (Ex. 5)
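Since word boundaries are encoded as spaces on each tier, and morpheme boundaries as = and - on the mb and ge tiers, the relation between the tiers of Ex. 5 can be checked mechanically. The following Python sketch is offered only as an illustration under those assumptions; the helper names `split_morphemes` and `align` are hypothetical and do not come from the CorpAfroAs or ELAN tools.

```python
import re

def split_morphemes(word: str) -> list:
    """Split a word on the clitic (=) and affix (-) boundary symbols."""
    return re.split(r"[=-]", word)

def align(mb: str, ge: str) -> list:
    """Pair each mb morpheme with its ge gloss, word by word."""
    mb_words, ge_words = mb.split(), ge.split()
    if len(mb_words) != len(ge_words):
        raise ValueError("mb and ge tiers differ in word count")
    aligned = []
    for morph_word, gloss_word in zip(mb_words, ge_words):
        morphs = split_morphemes(morph_word)
        glosses = split_morphemes(gloss_word)
        if len(morphs) != len(glosses):
            raise ValueError(f"mismatch in {morph_word!r} vs {gloss_word!r}")
        aligned.append(list(zip(morphs, glosses)))
    return aligned

# The four tiers of Ex. 5, copied from the text.
tx = "χalomʃlanu zʃtijelanu galeʁja"
mot = "χalom ʃelanu ze ʃetihje lanu galeʁja"
mb = "χalom ʃel=anu ze ʃe=t-ihje l=anu galeʁj-a"
ge = r"dream of=POSS.1PL DEM.SG.M NMLZ=3SG.F-be\NFCT to=POSS.1PL gallery-F"

print(len(tx.split()), "phonological words")      # 3
print(len(mot.split()), "morphosyntactic words")  # 6
for word in align(mb, ge):
    print(word)
```

The three phonological words of the tx tier thus correspond to six morphosyntactic words on the mot tier, each of which has the same number of morphemes and glosses on the mb and ge tiers.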

However, there are cases where morphosyntactic words and phonological words do not coincide: a phonological word need not show morphosyntactic unity, and vice versa, a mismatch which is commonly attested in some languages (cf. Caink 2006: 492). An interesting case is exhibited by Juba Arabic, an expanded pidgin of Southern Sudan. Given the lack of inflectional morphology in that language, phonological words often coincide with grammatical words. If a prosodic word is defined as a stretch with only a single (main) stress, then reduplicated items can be seen as morphosyntactic words consisting of two prosodic (= phonological) words; e.g., bigídu~gídu ‘pierce repeatedly’ (JA_SM_CONV_2_SP2_299). On the other hand, a single prosodic word may consist of two morphosyntactic words; e.g., [jaʃán] /ja aʃán/ ‘then because’ (JA_SM_CONV_2_SP2_372).

Ex. 5 is, admittedly, an ideal representation of the tiers in the CorpAfroAs tier template. In practice, the mot tier exhibits a compromise between the morphosyntactic structure of the phonemic string, its actual pronunciation and its intermediary status between the transcription proper, on tx, and the morphemic analysis on mb. Moreover, there are problems in determining phonological words and in segmenting a text into them. Such problems are not only the result of the diversity of languages represented in CorpAfroAs, or of divergences in theoretical orientation among the respective schools involved and among individual scholars, but are also inherent to the very issue of the definition of “phonological word”, “prosodic word”, and the relationship between those entities. Ex. 6 and Figure 2 exhibit cliticization in Moroccan Arabic:

(6) ħʃuːmiːja=u daːxla suːq ṛaːs=ha=u
    shyness=and come_in market head=her=and
    ‘(She was always in) modesty and minding her own business and …’ (ARY_AB_narr_1_029–030)

Figure 2.  Encliticization of the connective in Moroccan Arabic (Ex. 6)

The connective /u/ is usually regarded as having the tendency to cliticize to the following word (or unit).3 However, in the two occurrences of the connective in this example, it is decisively cliticized to the preceding word. As suggested by the gloss, the analyst considers the connective to be a clitic not only on the phonological level, but also on a morphosyntactic level. Another sort of problem can be illustrated by Ex. 7 (with Figure 3) from Ts’amakko:

3.  Thus, in a way, following the tradition of written Standard Arabic.


(7) ˈqajto ˈχumɓiɣa ˈfugaɗɛ
    q’ajto χumɓi=ka pug-aɗ-aj
    time all=CONTR inflate-MID-IPFV.2SG
    (What is that you eat and) ‘you always get satiated?’ (TSB_NARR_001_051)

Figure 3.  Prosodic or phonological words (Ex. 7)

There are three content words in this intonation unit: q’ajto, χumɓi and pugaɗaj. From the prosodic point of view, they seem to form three prosodic words, where the contrastive focus marker is cliticized to the second content word, thus forming with it a single phonological word. This is indicated by the lack of stress on the clitic, as well as by the voicing and fricativization of its first consonant (k→ɣ/V_). One should, however, take into consideration the possibility that the stress on the first content word, namely q’ajto, is a secondary stress, with the consequence that it would be regarded as forming a single prosodic word with the following χumɓiɣa. The decision bears on the analysis of the information structure of this string, i.e., whether the phonological compound as a whole is focused or only the second content word. From the perceptual point of view, the level of accent of the first word seems as prominent as that of the third word, with only the second word showing more prominence. Therefore, the conclusion seems to be that the focal point of this intonation unit is on the second word, which conforms to the position of the segmental contrastive focus marker /ka/.

As we have seen, the initial consonant of the element /ka/ is fricativized in the process of cliticization. An interesting question then arises when one looks at the fricativization of the initial consonant of the last word, i.e., p→f. Should this change be interpreted as the result of cliticization or of prosodic proximity between the second and the third word? In our opinion, this change can hardly suggest that we should regard the second and the third word as forming together a single phonological word, let alone a single prosodic word. We should allow ourselves the liberty to interpret word-initial assimilation such as this one as an external sandhi phenomenon. Boundaries between prosodic words in particular and phonological words in general are not easy to detect (cf. Dixon & Aikhenvald 2002: 16; Fletcher 2010: §2; Basbøll 2000). As is clear from this example, sandhi phenomena may pose difficulties also in drawing morphosyntactic boundaries.




Given the difficulties in boundary notation and in the segmentation into prosodic words in particular and phonological words more generally, we propose that the units represented in the tx tier need not necessarily be regarded as a lower level than intonation units in the prosodic hierarchy, although a rather widespread consensus holds that they are, because prosodic and phonological units are usually not distinguished:

The Phonological Word (or Prosodic Word) is located within the phonological hierarchy between the constituents defined in purely phonological terms (i.e., mora, syllable, foot) and those that involve a mapping from syntactic structure (i.e., clitic group, phonological phrase, intonational phrase, utterance). (Vogel 2006: 531)

The annotation of prosody in CorpAfroAs stops at the indication of boundaries. In their essence, the words contained in the tx tier are phonological and not strictly prosodic. The issue of determining prosodic or phonological words must be subject to further research.

3.2 Intonation unit

The units of the next level are intonation units. It has long been recognized that spoken language organizes itself in segments of speech that can be accounted for by their suprasegmental structure. The suprasegmental unit according to which segmentation of the spoken language can be made has been conceived to be dependent mainly on tone, or rather pitch, and has therefore been termed “tone group”, “intonation group”, “tone unit”, “intonation(al) phrase”, “intonation unit”, or the like (e.g., Beckman & Pierrehumbert 1986; Halliday 1989; Selkirk 1984; Chafe 1994; Cruttenden 1997; Brazil 1997; Hirst & Di Cristo 1998; Fox 2000; Halliday 2004), where the identified prosodic stretch may be identical or different in some respects among the various approaches. Different paths have been used to explain the concept. Whatever approach is taken, it seems that there is a wide consensus that the intonation unit (henceforth: IU) encapsulates a functional, coherent segmental unit, be it syntactic, semantic, informational, or the like. IUs are therefore the first level of units where alignment of sound and transcription is made in CorpAfroAs.

It seems commonly accepted that an IU is a coherent intonation contour, and some would define the IU in these terms (Chafe 1994; Du Bois et al. 1992, 1993; Tao 1996; etc.). An example of a prototypical coherent intonation contour can be seen in the pitch curve in Figure 4, depicting the intonation contour of a single IU from Beja cited as Ex. 8:


(8) ˈhoːɖaːbiˈjaːjiːha
    hoːj ɖaːbiˈjaj iːha
    hoːj ɖaːb-iˈja-j iː-ha
    3ABL run-PFV.3SG.M-SUFX.PROG AOR.3SG.M-be
    ‘He managed to run away from there.’ (BEJ_MV_NARR_01_shelter_092)

Figure 4.  A coherent intonation contour (Ex. 8)

“A coherent intonation contour”, while quite easily perceivable, is rather hard to define in itself in acoustic, formal terms, nor is it easy to define an IU by any other internal criteria (Cruttenden 1997). In practice, segmentation of a discourse flow into IUs is made by detecting their boundaries, whereas internal criteria are brought into consideration only secondarily (Cruttenden 1997). This practice has been used successfully in transcribing large corpora (Du Bois et al. 1992, 1993; Du Bois 2004; Cresti & Moneglia 2005; cf. also Cheng, Greaves & Warren 2005, following the methodology of Brazil 1997). Theory has also inclined towards the delimitation of the intonation unit (or “intonational phrase”) by reference to “boundary tones”: “Each intonational phrase provides an opportunity for a new choice of tune, and … some parts of the tune serve to mark the phrase boundaries” (Pierrehumbert & Hirschberg 1990: 272); “Rappelons que le rapport de dominance dépend uniquement des tons finals; il est insensible aux éléments intonatifs apparaissant ailleurs dans le groupe” [‘Let us recall that the dominance relation depends solely on the final tones; it is insensitive to intonational elements appearing elsewhere in the group’] (Blanche-Benveniste et al. 1990: 172). A useful account of the study of prosodic structures will be found in Fox 2000; see also Beckman and Venditti 2010.

Segmentation into IUs in CorpAfroAs was carried out applying both external and internal criteria, i.e., by detecting the boundaries of IUs and by looking at the internal structure of the pitch contour. Following previous research in various languages, we have decided to use four major perceptual and acoustic cues for boundary recognition, as follows: (1) final lengthening; (2) initial rush; (3) pitch reset; (4) pause (cf. Cruttenden 1997; Du Bois et al. 1992; Hirst & Di Cristo 1998). It should be noted that the threshold above which a pause is considered significant has been set between 100 and 200 ms, depending on genre, language and rhythm of speech (see the CorpAfroAs manual). The internal criteria used (apart from an impressionistic-perceptual conception of a contour) were: (1) declination (Cruttenden 1997: §§4.4.4.4, 5.5.1; Wichmann 2000: §5.1.1; Fox 2000: §5.5.5; also called “downdrift”, Fox 2000: §4.2.2.3); (2) tonal parallelism, or isotony (Wichmann 2000: §4.3; Du Bois 2004).

One may perhaps note at this juncture that the number of (morphosyntactic) words within an IU as exhibited in the CorpAfroAs texts is small, ranging between 1 and 7 (in extreme cases), with an average of ca. 2 to 4, depending on language and genre (for other languages see, inter alia, Chafe 1994: 64–65, 148).

None of the four cues for prosodic boundaries is in itself a necessary or sufficient cue for the existence of an IU boundary, and languages may differ in their most prominent cue for delimitation of IUs (Hirst & Di Cristo 1998). This is the case also with the Afro-Asiatic languages represented in CorpAfroAs. Previous research on Hebrew has shown that tempo, notably final lengthening, is the highest in the hierarchy among acoustic features present at an IU boundary, whereas pause occupies the last position in this hierarchy (Amir, Silber-Varod & Izre’el 2004; endorsed in the CorpAfroAs research). Pauses, however, have been shown to be a prominent cue in the perception of IU boundaries in both Hebrew and Kabyle (Mettouchi et al. 2007), as is the case with some other languages in the CorpAfroAs sample (e.g., Ts’amakko, Juba Arabic). Some CorpAfroAs researchers have noted different hierarchies among acoustic features for their languages while working on transcription and segmentation; e.g., in the Ts’amakko and Juba Arabic subcorpora, pitch reset is the most frequent cue, whereas pause is the most perceptually prominent; the Moroccan Arabic subcorpus seems to favor pause as its most frequent cue, whereas the most perceptually prominent cue is pitch reset. Minor (= non-terminal) boundaries and major (= terminal) boundaries may differ in this hierarchy.
Furthermore, pause may be interpreted as indicating a major boundary, thus overpowering the final tone movement in some cases. Genre or style of speech, among other features, may also exhibit divergent hierarchies.

In Ex. 9 (and Figure 5) from Hebrew, the boundary between the first and the second IUs shows all four cues: lengthening of the last syllable of the first IU, fast-rate production of the first syllables of the following IU, pitch reset from the level of 240 Hz at the end of the first IU to 145 Hz at the beginning of the second IU, and a 210 ms pause between the two units. The first three cues are present also at the boundary between the second and the third IUs, but in this case there is no pause. As for the internal criteria, this stretch rather clearly exhibits declination of the F0 contour on the second and third IUs, as well as, with some complication, on the first IU. The final tone being high for the first two IUs, declination naturally stops before the respective final rises. One should further note that declination affects not only any single IU, but also a sequence of IUs forming together, as in this case, a paratone (see below).
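Two of the four boundary cues, pause and pitch reset, lend themselves to a simple numerical illustration using the values reported for the first boundary in Ex. 9. The Python sketch below is only that: the function names are hypothetical, the 150 ms default is an arbitrary point within the 100 to 200 ms range mentioned above, and expressing the pitch reset in semitones is an assumption of the sketch (the text reports Hz only). Since no single cue is necessary or sufficient, such checks could at most flag candidate boundaries for perceptual verification.

```python
import math

def pause_is_significant(pause_ms: float, threshold_ms: float = 150.0) -> bool:
    """True if a silent interval reaches the chosen threshold.

    The threshold is restricted to the 100-200 ms range given in the text;
    the default of 150 ms is illustrative only.
    """
    if not 100.0 <= threshold_ms <= 200.0:
        raise ValueError("threshold outside the 100-200 ms range")
    return pause_ms >= threshold_ms

def pitch_reset_semitones(f0_end_hz: float, f0_start_hz: float) -> float:
    """Interval between the F0 at the end of one IU and at the start of the next.

    A positive value means that the following IU starts lower.
    The semitone scale is an assumption of this sketch.
    """
    return 12.0 * math.log2(f0_end_hz / f0_start_hz)

# Values reported for the boundary between the first and second IUs of Ex. 9.
print(pause_is_significant(210))                  # True
print(round(pitch_reset_semitones(240, 145), 1))  # about 8.7
```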


(9) χaʃuv ʃehu javin / ʃemeoto ʁega ʃehu halaχ / hakvutsa niʁet tov joteʁ //
    ‘It is important that he understand, that since the minute he left, the group looks better.’
    (OM[=Omer 4.2: 1350”-1354”; CoSIH text)

Figure 5.  Intonation units: boundaries; declination (Ex. 9)

Isotony (Du Bois 2004), or tonic parallelism (Wichmann 2000), can be used to perceive an intonation contour, as it repeats itself in two or more adjacent IUs. This structure occurs notably in lists, but is found not infrequently also elsewhere, as in Ex. 10 (and Figure 6) and in Ex. 11 (and Figure 7), the first from Hebrew, the second from Ts’amakko:

(10) haaχot baa / taktak nagaba / anijodaatma / tiplaba / vze /
     ‘The nurse arrived, / just touched her / whatever / took care of her / and so on, / …’ (C514_309’’-402’’; CoSIH text)

Figure 6.  Isotony (tonic parallelism) in Hebrew (Ex. 10)

(11) miːntea kalːatʃːo / kinːu ʃabbaje kiː //
     miːnt-e=ʔa kalːatʃː-o kinːu ʃab~b-a=je kij-i
     forehead-F=DEF anus-M M.POSS.3SG.F tie~PUNCT-JUSS.3SG.M=EMPH say-PFV.3SG.M
     ‘He said: “Let me tie your anus on the forehead.” ’ (TSB_GS_NARR_001_128)



Figure 7.  Isotony (tonic parallelism) in Ts’amakko (Ex. 11)

The final tone of an IU carries with it functional load in terms of discourse structure and information structure, with implications for syntax. For C-ORAL-ROM, the basic structural unit of spoken language is an “utterance”, which is defined operatively as follows: “The operative definition of the utterance is such that every expression marked by a prosodic terminal break is an utterance” (Cresti & Moneglia 2005: 210). An utterance can include more than a single IU (referred to as an information unit), where the non-final IUs end with a “non-terminal” break. For the C-ORAL-ROM project,

a prosodic break is considered terminal if a competent speaker assigns to it, according to his perception, the quality of concluding a sequence … a prosodic break is considered non-terminal if a competent speaker assigns to it, according to his perception, the quality of being non-conclusive. (Cresti & Moneglia 2005: 17)

The reasoning behind this choice is the same as the one determined for CorpAfroAs:

[T]he annotation of terminal and non-terminal breaks does not describe the prosodic movement that actually occurs in correspondence with a specific speech segment, but rather it selects the specific segment where, according to perception, a significant movement occurs. At the same time the annotation does not specify which proper speech act is performed by a sequence of word, but rather, specifies which sequence of words performs an act, for prosodic reasons. … Once the relevant domain for prosodic movements and speech acts is determined, this will probably allow a better interpretation of both the relevant prosodic movements and the functional, dialogical value of the speech event. The same consideration can hold for syntactic features. Utterances cannot be identified and defined on the basis of syntactic properties as clauses can, for instance, but once an utterance is identified on the basis of a terminal break, any kind of morpho-syntactic and lexical evaluation can be driven on it. (Cresti & Moneglia 2005: 20)

It must be noted, however, that while Cresti and Moneglia have based their segmentation into prosodic units on speech act theory (op. cit.: 15 and note 17 on p. 67; 210), CorpAfroAs deliberately remains non-aprioristic in theoretical persuasion, leaving its creators and end users free to pursue further research according to their own individual stances.


CorpAfroAs concurs with the functional dichotomy between major and minor prosodic breaks, indicating terminal and continuing boundary tones by perception. Indicating boundary tones or breaks by perception has been proven reliable for C-ORAL-ROM (Cresti & Moneglia 2005: §1.2 and Appendix; Danieli et al. 2004). As it is not based solely on acoustic features but rather indicates the functionality of the respective boundary tones as perceived by the annotator, the notation adopted for CorpAfroAs seems to be the best method for determining functional breaks, without any aprioristic ideas about the type of function involved. Still, for most subcorpora of CorpAfroAs, a concomitant acoustic check was carried out during the segmentation process and backed the perceptual indication of boundary breaks. In some cases the acoustic check served to refine the prosodic notation; in other cases, it was an essential tool in the process, which was carried out using Praat TextGrids (see the CorpAfroAs manual, and Mettouchi & Chanard 2010).

It should be noted that distinguishing between minor and major boundaries is sometimes not so easy, as there are cases where the final tone seems to be ambiguous. Major boundaries are usually better perceived than minor ones. On the other hand, syntax and discourse structure tend to influence this perception (cf. Mettouchi et al. 2007).

CorpAfroAs indicates minor boundaries by a single slash (/) and major boundaries by a double slash (//). Questions are indicated at the rx tier by the notation Q, irrespective of their segmental or prosodic structure. In Ex. 12 (Figure 8) from Hebrew, the first IU presents a minor boundary, and the second and the third each present a major boundary, where the first of these two carries a rise indicating a yes/no question and the last one carries a falling tone:

(12) ma /(Q) ataʁaχavta alsus //(Q) ʃloʃa jamim //
     sp2: “What? You rode on a horse?” sp1: “Three days.” (HEB_IM_NARR_7_SP2_098 – SP1_0334)

Figure 8.  Minor and major notations; notation of a question (Ex. 12)
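The slash notation makes it straightforward to recover IUs and their boundary types from a transcribed string. The following Python sketch illustrates this on Ex. 9; the function name `split_ius` is hypothetical, and the sketch assumes that boundary marks are separated from words by spaces, as in that example (the Q notation of Ex. 12 is not handled). Note that the counts obtained here are of space-delimited units in the phonetic transcription, not of the morphosyntactic words to which the averages cited above refer.

```python
def split_ius(transcript: str) -> list:
    """Split a boundary-annotated string into (text, boundary_type) pairs.

    '/' closes an IU with a minor boundary, '//' with a major one.
    Material not followed by any mark is returned as 'unmarked'.
    """
    units, words = [], []
    for token in transcript.split():
        if token in ("/", "//"):
            kind = "major" if token == "//" else "minor"
            units.append((" ".join(words), kind))
            words = []
        else:
            words.append(token)
    if words:
        units.append((" ".join(words), "unmarked"))
    return units

# Ex. 9, as transcribed in the text.
ex9 = "χaʃuv ʃehu javin / ʃemeoto ʁega ʃehu halaχ / hakvutsa niʁet tov joteʁ //"
for text, kind in split_ius(ex9):
    print(kind, len(text.split()), text)
```

For Ex. 9 this yields two minor IUs of three and four words, followed by a major IU of four words.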




Not all IUs fall into the two categories terminal vs non-terminal. Indeed, there are incomplete IUs in all of the subcorpora of CorpAfroAs. IUs that have not come to completion can be of two types:

1. An IU that has been truncated abruptly and can be perceived as such by prosodic cues like the shortening of a syllable or of a part of a syllable, or an additional glottal stop, along with a perceivable incompletion of a coherent intonation contour. Many an IU of this type will also end with a truncated word. This type of IU will be termed fragmentary (or truncated) and is marked in CorpAfroAs by a double crosshatch sign (##); a truncated word is indicated by a single crosshatch sign (#). In Ex. 13 from Moroccan Arabic, a fragmentary IU which ends with a truncated word is indicated by both a single crosshatch sign and the double crosshatch sign, as explained in the CorpAfroAs manual:

(13) ɣaːtəlqa fiːha tlaːta djə# ##
     ɣaː=t-lqa f=ha tlaːta djə# ##
     FUT=2-find\IPFV in=OBL.3SG.F three djə# ##

     ɣaːtəlqa fiːha rəbʕa dəlkudjaːt //
     ɣaː=t-lqa f=ha rəbʕa d=əl=kudj-aːt //
     FUT=2-find\IPFV in=OBL.3SG.F four of=DEF=hill-PL
     ‘You will find in it three of th- You will find in it four of the hills.’ (ARY_AB_narr_1_406–407)

In Ex. 13, the speaker has corrected the number from ‘3’ to ‘4’, having noticed her mistake only after she had already started to utter the following word. The truncation of the (phonological) word dəlkudjaːt is accompanied and perceived by the palatalization of the dental stop [d] (/d/ ‘of’) and by a glottal stop following the schwa (not indicated in the transcription), which is the first segment of the definite article əl.

2. An IU that seems to have been meant to continue and therefore shows a non-abrupt intonation contour, mostly (or always) carrying a continuing boundary tone. Still, the following IU seems not to be a continuation of this IU but starts a new stretch of speech. This new stretch of speech can be perceived as such by some prosodic cues (notably a long pause or hesitation phenomena; cf. Silber-Varod 2010; 2011; with previous references), by its syntactic structure, or by its semantic or pragmatic contents. In such instances, speakers may continue the stretch of speech, restart it or some part of it, or start a stretch of speech similar to the one already found in the suspended unit or any other unit before it, for example by rephrasing it. Alternatively, they can start a new sequence altogether. An IU of this type will not be regarded as truncated or fragmentary, but has been termed “suspended”.

We should notice that the prosodic structures of the so-called suspended IUs seem not to differ from the prosodic contours of minor IUs. In fact, speakers tend at times to use suspension also as a discourse strategy, and therefore it would be a mistake to look at such IUs as representing cognitive failure. In Ex. 14, uttered by the same speaker who has contributed our Ex. 13, the first IU is truncated, the second is suspended.

(14) ɣiːṛ hiːja kaːt# ## (pause = 0.515 sec)
     ɣiːṛ hiːja kaː=t-# ##
     only 3SG.F REAL=3F-# ##

     kaːnt diːːːma fə fəːː /
     kaːn-ət diːma fə fə /
     be\PFV-3F always in in /
     ‘She was always in in …’ (ARY_AB_narr_1_026–027)

One should note that suspension is not a prosodic feature and is not recognized by prosodic cues. As mentioned above, the stretch of speech following a suspended unit cannot always be regarded as a direct continuation of the discourse presented in the suspended unit, either syntactically or semantically. In such cases, the discourse can resume (although it does not have to), either in close proximity to the suspended unit or at some distance from it, e.g., after a false start or after a short or long parenthesis. This is why we have preferred the term “suspension”, or “suspended (unit)”, over other terms such as “abandoned (unit)”.

Segmentation into IUs was implemented in CorpAfroAs because text-sound indexation was a necessity. Several options were available: interpausal indexation, indexation of episodes, random chunking, periods, paratones, etc. Intonation Units were favored in view of their possible correlation with morphosyntactic issues, one of the aims of the CorpAfroAs project being to analyze prosody and morphosyntax in several Afroasiatic languages. This type of segmentation proved very valuable for the study of grammatical relations: Mettouchi (to appear) shows that in Kabyle (Berber), the grammatical relations subject and object are transparently coded only within the intonation unit containing the verb. In the same paper, the author also shows that IU boundaries, linear order and state interact as the building blocks of information structure constructions in Kabyle. Other papers using CorpAfroAs data (see for instance Malibert & Vanhove, this volume; Caron, Lux, Manfredi & Pereira, this volume) use the IU as a unit for the study of information structure and syntactic dependency. This kind of segmentation is therefore relevant for syntactic and pragmatic studies.



Representation of speech in CorpAfroAs

3.3 Paratone
The next level is a paratone. The term “paratone”, or “paratone group”, coined on the analogy of the term “paragraph”, has been used by some authors for the idea of a coherent formal sequence of intonation units (Crystal 2008 s.v.). Fox (1973 and subsequent studies), along lines suggested by Palmer (1922: section XI; 1924: 21–23), conceives a paratone (or a paratone group) as a larger prosodic unit than a tone group (in our terminology: intonation unit), where “one or more major tone-groups are optionally preceded and/or followed by minor tone-groups” (Fox 2000: 318). Brown (1977: §5.2.1), who worked on read-aloud news items, has defined a “paratone” on the basis of the organizational pattern of tone groups:

If we go on to study the organization of a whole news item we shall find that the final tonic syllable in the complete item is marked by an even bigger pitch movement. So all the tonic syllables of what we might call the ‘paratone’, after the model of ‘paragraph’, are grouped together. The function of this patterning is to signal to the listener which tone groups are joined together in some larger structure and where the end of the larger structure comes. (Brown 1977: 86–7)

As analyzed and exemplified, Brown’s notion of “paratone” suggests a sequence of IUs forming a sentence-like stretch (Brown 1977: §5.1; cf. Brown, Currie & Kenworthy 1980: §2.3). The “paratone” is further defined as a prosodic unit that encompasses a discourse where a new topic is being introduced (Brown, Currie & Kenworthy 1980: §2.3 and §3.6.ii; Brown & Yule 1983: §3.6.2). According to Brown (1990: 92), “[t]he most obvious phonetic cues [for the recognition of a paratone] are the high placing of the onset of a paratone, the brevity of the pauses within it, and the gradual drift down in overall pitch height towards a low ending”. In these terms, Brown’s “paratone” is closer to the notion of the oral “paragraph” as described by Wichmann (2000; cf. her discussion of Brown’s “paratone” in §5.2.1) and to the notion of “period” as suggested below, §2.4.⁴ Noticing this ambiguity in Brown’s definition and criteria, Yule (1980) has suggested the notion of “major paratone” for a single-topic related stretch, whereas the notion of “minor paratone” has been left somewhat ambiguous (see further Brown, Currie & Kenworthy, 71, who define the difference between major and minor paratone by the strength of their respective prosodic cues).

4.  The term ‘période’ is employed in the French tradition for the notion of a unit that is larger than a clause or a comparable unit of the spoken language, but the definition of this unit has differed among scholars (Avanzi, Benzitoun and Glikman 2007). Work in computational linguistics has come up with a set of parameters to detect périodes automatically (Lacheret and Victorri 2002). It seems to us that this set of parameters may fit, mutatis mutandis, a prosodic unit which is located in the hierarchy between an intonation unit and what we have defined below as ‘period’. However, the relationship between a ‘période’ defined in these or similar terms and a ‘paratone’ as defined here is still to be sought.

In CorpAfroAs, a “paratone” can be defined as one or more IUs ending in a major (terminal) final boundary, where any (optional) previous IU carries a minor (continuing) boundary tone. In this we follow the path of C-ORAL-ROM, for which a similar sequence has been defined as “utterance”, as we have seen above. As the paratone frequently conveys a unified and coherent idea, and as translation may need to capture the whole idea conveyed by a paratone rather than by any individual IU internal to this paratone, the current ft tier has for some languages been supplemented with an mft tier aligned on paratones rather than on intonation units. A comparative study of the respective merits of those two alignments still has to be done. Empirically, it appears that a paratone-aligned translation is highly desirable for V-final languages, but no theoretical investigation of the reasons behind this preference has been conducted yet.

At the time of writing this report, no significant theoretical studies on paratones have been conducted on any of the CorpAfroAs languages apart from Hebrew. In a preliminary study based on data from The Corpus of Spoken Israeli Hebrew (CoSIH), Izre’el (forthcoming) suggests an interface between prosodic, discursive and syntactic units where the paratone, which encapsulates a discourse unit termed ‘utterance’, is the default domain of the clause, rather than the IU, as is accepted by many authors (e.g., Chafe 1994: 65–6; Kibrik & Podlesskaya 2006). This suggestion is based on empirical research on spontaneous, everyday speech, mostly conversations, where a significant percentage of the attested IUs encapsulate only part of the components within a clause, whereas the paratone can neatly be regarded as the default domain of a clause, although it can consist of several clauses forming together a single discourse unit.
Table 1 summarizes the interface between prosodic, discursive and syntactic units in spoken Hebrew.

Table 1.  Prosodic, discursive and syntactic units in Hebrew.

Syntactic units                   Discourse units                                      Prosodic units
Clause/Clause Cluster             Utterance                                            Paratone
Phrase/Clause (/Clause Cluster)   Speech Group (one of two or more in an utterance)    Prosodic Group (one of two or more in a paratone)

A prototypical paratone can be seen in Ex. 15 (Figure 9):

(15) tːpaˈʃut mɔˈʁaχtala / ɛjˈtɔnː / ˈdɛvɛk //
at paˈʃut mɔˈʁaχat al=ha= / itɔn / ˈdɛvɛk //
youSGF simply spreadSGF on=the=newspaper glue
‘Do you just spread glue on the newspaper?’ (C714_sp1_20–22; CoSIH text)

Figure 9.  A prototypical paratone (Ex. 15)
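As an illustration of how the definition adopted in CorpAfroAs (one or more IUs ending in a major boundary, any preceding IU carrying a minor boundary) applies to a transcription such as Ex. 15, the grouping can be sketched in a few lines of Python. This is a minimal sketch and not part of the CorpAfroAs toolchain: the class and function names are hypothetical, and it assumes that `/` marks a minor boundary, `//` a major boundary and `##` a truncated IU, as in the examples of this chapter. Suspended IUs end in a minor boundary and are recognized by non-prosodic criteria, so a procedure of this kind cannot detect them.

```python
from dataclasses import dataclass

MINOR, MAJOR, TRUNCATED = "/", "//", "##"  # boundary symbols assumed from the examples


@dataclass
class Paratone:
    units: list      # the intonation units, in order, without boundary symbols
    complete: bool   # True only if the last IU ends in a major boundary


def group_paratones(tx_line):
    """Group the IUs of a transcription line into paratones (hypothetical helper)."""
    paratones, units, words = [], [], []
    for token in tx_line.split():
        if token not in (MINOR, MAJOR, TRUNCATED):
            words.append(token)
            continue
        if words:                      # a boundary symbol closes the current IU
            units.append(" ".join(words))
            words = []
        if token != MINOR and units:   # major boundary or truncation closes the paratone
            paratones.append(Paratone(units, token == MAJOR))
            units = []
    if words:                          # material left without any boundary symbol
        units.append(" ".join(words))
    if units:                          # keep an unfinished stretch as an incomplete paratone
        paratones.append(Paratone(units, False))
    return paratones


ex15 = group_paratones("tːpaˈʃut mɔˈʁaχtala / ɛjˈtɔnː / ˈdɛvɛk //")
ex13 = group_paratones("ɣaːtəlqa fiːha tlaːta djə# ## ɣaːtəlqa fiːha rəbʕa dəlkudjaːt //")
print(len(ex15), len(ex15[0].units))   # one paratone made of three IUs
print([p.complete for p in ex13])      # a truncated stretch, then a complete paratone
```

Applied to Ex. 15, the sketch yields a single complete paratone of three IUs; applied to the two lines of Ex. 13, it separates the truncated IU from the complete paratone that follows it.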

Another typical paratone is shown in Ex. 16 (Figure 10). As is the case with the preceding example, Ex. 16 too exhibits a paratone composed of three IUs and consisting of a single clause:

(16) vuuaˈsa ɛtaisuˈmim / mbχinatsfiˈʁɔt / vɛaˈkɔl vɛaˈkɔl //
v=hu=aˈsa ɛt=ha=jisuˈmim / mi=bχinat=sfiˈʁɔt / v=ha=ˈkɔl v=ha=ˈkɔl //
and=he_did acc=the=applications from=aspectF−of counts and=the=all and=the=all
‘And he made the applications as regards counting and all?’ (C612_4_sp2_115–7; CoSIH text)

Figure 10.  A typical paratone (Ex. 16)

Ex. 9, already cited above and cited again here as Ex. 17, consists of three IUs which include more than a single clause, forming together a discourse unit:

(17) χaʃuv ʃehu javin / ʃemeoto ʁega ʃehu halaχ / hakvutsa niʁet tov joteʁ //
‘It is important that he understand, that since the minute he left – the group looks better.’ (OM[=Omer 4.2: 1350”-1354”; CoSIH text)

Figure 11.  A paratone consisting of three intonation units including more than a single clause (Ex. 17)

Although much further research on the concept of paratone and its discourse parallel, the utterance, is still needed, a few further comments may be in order. Although in general a paratone is delineated by a perceivable major boundary, there are cases where a stretch of speech does not seem to carry a perceivable terminal tone, yet the following IU does not readily form part of one and the same paratone with that stretch. These are usually cases of fragmentary or suspended IUs. As explained above, a fragmentary IU is one that ends abruptly and has one or more perceivable prosodic cues for truncation; a suspended IU is one that seems like a coherent minor IU, yet the following IU does not seem to be its direct continuation, either in prosodic terms or from the point of view of syntax, semantics or pragmatics. Therefore, paratones can also be perceived as either fragmentary or suspended.

While the end boundary of a paratone is easy to delineate, determining its beginning is somewhat more complex. As is obvious from the above, a new paratone may follow another paratone that, as defined, carries a major boundary tone, i.e., it follows a major boundary. A paratone can further start after a fragmentary or suspended paratone (=IU), as in Ex. 18 from Hebrew:

(18) veaz / kʃeχazaʁnu le / ulanbatoʁ / lakaχnu ta / tʁansibiʁit χazaʁa le / (pause)
‘And then / when we returned to / Ulan Bator / we took the / Trans-Siberian back to /’ (suspension)
lo jaʁadnu ##
‘We didn’t get off’ (truncation)
lo imʃaχnu lebejdʒin / jaʁadnu po bebe# datong /
‘We did not continue to Beijing; we got off here, in Be- Datong,’ (HEB_IM_NARR_7_SP1_0815–0823)

Ex. 18 exhibits two new starts. The suspended paratone and the new start following it are recognized by a pause and by a change in rhythm (lengthening at the suspension point and rush in the following IU), the following IU being itself fragmented, with an immediate restart of yet another paratone with a change in the lexicon. The truncated prosodic unit is recognized as a separate unit only by a pitch reset at its right boundary, so its independent status is somewhat questionable.

In contrast to the above, a suspended unit can be shown to be an integral part of a single paratone, albeit not necessarily a coherent one. In Ex. 19, again from Hebrew, the speaker continues with a topic very similar to the one she was speaking about. Further, the speaker repeats the last word of the suspended unit and continues from there both syntactically and semantically, and in some way also prosodically. Moreover, the suspended unit ends with a level boundary tone, which signals stronger continuation than a rising tone (Silber-Varod 2011).

(19) ma jaani hem baim ## (pause) baim beeze ʃaloʃ babokeʁ e / aχaʁe miklaχat //
what meaning they come ## (pause) come like three in_the_morning uh / after shower //
‘What? You mean, they come like three in the morning after shower?’ (OCD6/1_41’:20’’-41’:23’’; CoSIH text)

Of course, the beginning of a discourse or a conversational turn will also start with a paratone. While this seems an obvious conclusion from the definition of a paratone, there are cases where a single paratone will be divided between interlocutors (Lerner 1996, 2004). Ex. 20 from Hebrew presents such a case.⁵

(20) sp2: az ze lo haja madʁiχ / ze haja paʃut em /
‘So, this was not a guide, it was just uh …’
sp1: miʃehu ʃe [ose et ze //
‘someone who [does this.’
sp2: [miʃehu / ʃe hovil otχem //
‘[someone who took you.’
(HEB_IM_NARR7_sp2_166–167; sp1_0651)

This is an especially interesting case, as the speaker who started the paratone also continues it, but his interlocutor cuts in midway and continues the same paratone himself.

5.  An opening bracket [ indicates the starting point of an overlap. (CorpAfroAs does not indicate overlaps as they are visually represented by the sound-transcription alignment.)

A significant prosodic cue for delineating paratones is the seemingly universal feature of declination. As declination is apparently a natural feature, it is discernible also in IUs (see above, §2.3). However, declination transcends IUs and is observable also in paratones, as well as in periods (see below, §2.4). In such cases, a pitch reset may occur between the IUs comprising the paratone, but the overall curve will usually be lower in each IU than in the one preceding it. Ex. 9 above nicely shows the feature of declination as it is observable in the paratone depicted there and in each of the three IUs that comprise this paratone.

Special cases are paratones with the insertion of parenthetical units. Some parentheses end with a major boundary, but they still show some prosodic cues, like low pitch or reduced loudness, that may enable us to regard the following units as a continuation of an ongoing paratone. Ex. 21 (and Figure 12) from Hebrew illustrates the case:

(21) [a] jeʃ ʃam paʁk / (pause) [b] lo jodea ma // [c] kama dunamim tovim / (pause) [d] male male gumχot / im male / psalim ktanim / bealafim / pesel eχad anak / tsiv# tsavua / lo tsavua / hakol budot //
‘[a] There is a park over there, – [b] I’m not sure – [c] (the size of it is) a good number of acres; [d] (it has) many many alcoves, with many small statues, by the thousand; (there was) one huge statue; (there was another one) col- colored, (still one other) not colored; all (these are) Buddha(-statue)s.’ (HEB_IM_NARR7_sp1_0837–0850)

[Figure: IUs [a] to [d] of Ex. 21, with the two pauses and a stretch of creak marked]

Figure 12.  Paratone with a parenthesis inside (Ex. 21)

The second IU [b] (italicized), lo jodea ma // ‘I am not sure’, ends in a major boundary, yet it is marked as a parenthesis by a low (and descending) pitch. The following unit [c], kama dunamim tovim / ‘a good number of acres’, is still uttered in a low pitch, yet it rises at the end of the unit, indicating a return to the paratone stretch by a continuing (=minor) boundary. Indeed, the following IU [d] continues the paratone in both segmental and prosodic terms (note the declination throughout the paratone from the first IU [a]). Parentheses in general, and the relationship between paratones and parenthetical units in particular, deserve special research (cf., inter alia, Barth-Weingarten, Dehé & Wichmann 2009; Debaisieux & Martin 2009).

Summing up, a paratone may be recognized by the following internal (1, 2) or external (3, 4) cues:




1. If a paratone consists of either a single IU or of more than a single IU, it will show declination of the intonation curve throughout the entire stretch of the paratone. A change in the downdrift direction may occur if the last (or only) IU is an interrogative one (“yes/no” question) or other prosodically marked stretches such as exclamations or commands.
2. If a paratone consists of more than a single IU, each of the non-final IUs composing this paratone will carry a minor boundary tone.
3. A paratone begins following an IU ending in a major boundary tone; at the beginning of a discourse or at the beginning of a turn (unless shared by two interlocutors); following a fragmentary or a suspended IU (and therefore recognized mostly by non-prosodic features).
4. A paratone ends in a major boundary tone. If fragmentary or suspended, the final boundary of a paratone can be discerned by prosodic cues (e.g., a long pause) or by noticing a new start in non-prosodic terms.
5. A parenthesis ending in a major boundary may under certain conditions be inserted into a paratone.

The notion of paratone, as well as the prosodic and segmental criteria for defining paratone, still need much further research. It may perhaps be noted at this juncture that the number of IUs in a paratone as exhibited in the CorpAfroAs texts is usually small, depending on language and genre. In a significant number of cases, a paratone will consist of only a single IU; e.g., in the Hebrew part of CorpAfroAs, 37% of the paratones in the narrative texts and 49% of the conversational texts consist of only a single IU.

3.4 Period
A Period is the highest level in the prosodic hierarchy. A period will be defined as a speech stretch that shows declination along its paratones (“supradeclination” according to Wichmann 2000: §5.2.2), as well as by other prosodic means, e.g., isotony at specific defined stretches (cf. Martin 2009: §4.3).
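The declination criterion can be made concrete with a small check. The following Python sketch assumes that one representative pitch value per unit (for instance a mean F0 in Hz) has already been measured; the function name, the tolerance parameter and the numbers are invented for illustration and are not measurements from CorpAfroAs. The same check would apply to the IUs of a paratone and to the paratones of a period.

```python
def shows_declination(pitch_levels, tolerance=0.0):
    """Return True if no unit is higher than the unit before it.

    pitch_levels: one representative pitch value per unit, in temporal order.
    tolerance: how much higher a unit may be and still count as declining
               (same scale as pitch_levels; hypothetical parameter).
    """
    pairs = zip(pitch_levels, pitch_levels[1:])
    return all(later <= earlier + tolerance for earlier, later in pairs)


# Invented values in Hz for three successive units
print(shows_declination([210.0, 190.0, 175.0]))   # True
print(shows_declination([210.0, 230.0, 175.0]))   # False
```

A final interrogative IU, which may reverse the downdrift, would have to be exempted before applying a check of this kind.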
Contrary to the paratone, the period does not require that internal unit boundaries be continuing (minor) ones. A period encapsulates a “passage” in segmental terms (i.e., it shows some unity in syntactic, pragmatic or discursive structure, which is larger than an utterance). In a way, then, a spoken period can be compared to a written paragraph (Yule 1980; Brown & Yule 1983: §3.6.2; Wichmann 2000; cf. the discussion of “paratone” and “major paratone” in §2.3 above). There is no reference to periods in the texts compiled and analyzed for CorpAfroAs, and the question remains a research topic for the future. Still, the following two examples, Ex. 22 (Figure 13) from Libyan Arabic and Ex. 23 (Figure 14) from Hebrew, illustrate what can be referred to as a period.

(22) haːda / ssaħləb / (pause) əːːː jəʃəṛbuːh lamma fəʃʃte // (pause) ṣagaʕ //
‘This / salep – / they drink it during the winter. // Cold. //’ (AYL_CP_narr_003_068–072)

Figure 13.  Period in Libyan Arabic (Ex. 22)

(23) ʃam amʁu ʃebenladen joʃev // lədaati biʃvilze hu nasa / χaʃvu ʃejitfesu oto // basof lo tafsu oto //
‘They said Ben Laden is residing there.// I think that this is why he went there. / They thought he will be caught. / At the end they did not catch him. //’ (HEB_IM_NARR_7_SP2_021–024)

Figure 14.  Period in Hebrew (Ex. 23)

4. Conclusions
This survey of the phonetic and transcriptional aspects of CorpAfroAs allows us to sketch a portrait of the corpus in terms of the choices that were implemented. First of all, priority was given to the close relationship between the tx tier and the sound file, mirroring the structure of the software, in which tx is indexed to the sound file represented by the waveform window in ELAN. The transcription in tx was therefore meant to reproduce as faithfully as possible the spoken monologue or interaction, allowing the end-user to recognize the elements of the speech continuum. However, the length of the corpus does not allow detailed phonetic representation; a degree of phonologization of the transcription was therefore introduced, resulting in a broad phonetic transcription. In this tier, words are phonological (as opposed to morphosyntactic).

The segmental string was segmented into prosodic units, defined by their boundaries and by their coherent internal contour. Intonation units were chosen over syntactic units (clauses or phrases) because they are the only organic units of speech. At a later stage, the corpus could be further segmented into other units if needed for further research on the correspondence between syntactic and prosodic units.

The tx tier was in turn further morpho-phonologized so that the mot tier should be composed of morphosyntactic words, morphemically transcribed. This level opens the way for a tokenization into morphemes in the mb tier. Those morphemes are then glossed in ge and rx. Finally, a free translation was given, which is currently aligned with Intonation Units but should ideally be aligned with paratones, because individual intonation units often provide translation chunks that are too small and are difficult to assemble into a coherent translation in the target language, English.

The process which led us to those decisions was based on some assumptions about the nature of speech, and on the research questions that interested us: the comparison between tx and mot, for instance, allows the systematic study of sandhi and other similar phenomena, and of the syntax/prosody interface. The segmentation into prosodic units allows the study of various interfaces: syntax, information structure, discourse.
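The layering of tiers summarized above can be pictured as a small data structure. The sketch below is an illustration in Python, not the ELAN file format or any CorpAfroAs software: the class and method names are hypothetical, the rx tier is left out, and the data are taken from the second IU of Ex. 13 (with the gloss written as in=OBL.3SG.F). It derives a morpheme-level (mb) segmentation from mot by splitting on the separators = and -, and pairs each morpheme with its gloss from ge.

```python
import re
from dataclasses import dataclass


@dataclass
class AnnotatedUnit:
    tx: str    # broad phonetic transcription (phonological words)
    mot: str   # morphosyntactic words, morphemically transcribed
    ge: str    # glosses, word-aligned with mot and using the same separators
    ft: str    # free translation

    def morphemes(self):
        """Split mot into morphemes (the mb level) and pair each with its gloss."""
        pairs = []
        for word, gloss in zip(self.mot.split(), self.ge.split()):
            forms = re.split(r"[=-]", word)     # '=' clitic boundary, '-' affix boundary
            labels = re.split(r"[=-]", gloss)
            if len(forms) != len(labels):
                raise ValueError(f"misaligned word and gloss: {word!r} / {gloss!r}")
            pairs.extend(zip(forms, labels))
        return pairs


unit = AnnotatedUnit(
    tx="ɣaːtəlqa fiːha rəbʕa dəlkudjaːt",
    mot="ɣaː=t-lqa f=ha rəbʕa d=əl=kudj-aːt",
    ge=r"FUT=2-find\IPFV in=OBL.3SG.F four of=DEF=hill-PL",
    ft="You will find in it four of the hills.",
)
for form, label in unit.morphemes():
    print(form, label)
```

Keeping tx and mot side by side in one record is what makes comparisons between phonological and morphosyntactic words, such as the study of sandhi mentioned above, straightforward to automate.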

References
Aikhenvald, Alexandra Y. 2002. Typological parameters for the study of clitics, with special reference to Tariana. In Word: A Cross-linguistic Typology, Robert M. W. Dixon & Alexandra Y. Aikhenvald (eds), 42–78. Cambridge: CUP.
Amir, Noam, Silber-Varod, Vered & Izre'el, Shlomo. 2004. Characteristics of intonation unit boundaries in spontaneous spoken Hebrew: Perception and acoustic correlates. In Speech Prosody 2004, Nara, Japan, March 23-26, 2004: Proceedings, Bernard Bel & Isabelle Marlien (eds), 677–680.
Anderson, Stephen R. 2005. Aspects of the Theory of Clitics [Oxford Studies in Theoretical Linguistics 11]. Oxford: OUP. DOI: 10.1093/acprof:oso/9780199279906.001.0001
Avanzi, Mathieu, Benzitoun, Christophe & Glikman, Julie. 2007. Comment se comprendre sans se méprendre? L'exemple de trois termes problématiques: Période, parataxe et subordination inverse. In Actes du 4ème Colloque Doctorants et Jeunes Chercheurs en Sciences du Langage (Coldoc’07): Le vocabulaire scientifique et technique en Sciences du Langage, Nanterre, 20-21 juin 2007.

Barontini, Alexandrine. 2012. ‘Moroccan Arabic Corpus’. Corpus recorded, transcribed and annotated by Alexandrine Barontini. In Amina Mettouchi & Christian Chanard (eds). The CorpAfroAs Corpus of Spoken AfroAsiatic Languages. DOI: http://dx.doi.org/10.1075/scl.68.website. Accessed on 10 January 2012.
Barth-Weingarten, Dagmar, Dehé, Nicole & Wichmann, Anne (eds). 2009. Where Prosody Meets Pragmatics [Studies in Pragmatics 8]. Bingley: Emerald.
Basbøll, Hans. 2000. Word boundaries. In Morphologie: ein internationales Handbuch zur Flexion und Wortbildung = Morphology: An International Handbook on Inflection and Word-formation [Handbücher zur Sprach- und Kommunikationswissenschaft – Handbooks of Linguistics and Communication Science 17(1)], Gert Booij, Christian Lehmann & Joachim Mugdan, in collaboration with Wolfgang Kesselheim & Stavros Skopeteas (eds), #40, 377–388. Berlin: Walter de Gruyter.
Beckman, Mary E. & Pierrehumbert, Janet B. 1986. Intonational structure in Japanese and English. Phonology Yearbook 3: 255–309. DOI: 10.1017/S095267570000066X
Beckman, Mary E. & Venditti, Jennifer J. 2010. Tone and intonation. In The Handbook of Phonetic Sciences, 2nd edn [Blackwell Handbooks in Linguistics], William J. Hardcastle, John Laver & Fiona E. Gibbon (eds), 603–650. Chichester: Wiley-Blackwell. DOI: 10.1002/9781444317251.ch16
Blanche-Benveniste, Claire, Bilger, Mireille, Rouget, Christine & van den Eynde, Karel. 1990. Le français parlé: Études grammaticales, Participation de Piet Mertens [Sciences du Langage]. Paris: CNRS Éditions.
Brazil, David. 1997. The Communicative Value of Intonation in English. Cambridge: CUP.
Brown, Gillian. 1977. Listening to Spoken English [Applied Linguistics and Language Study]. London: Longman.
Brown, Gillian. 1990. Listening to Spoken English, 2nd edn [Applied Linguistics and Language Study]. London: Longman.
Brown, Gillian, Currie, Karen L. & Kenworthy, Joanne. 1980. Questions of Intonation. London: Croom Helm.
Brown, Gillian & Yule, George. 1983. Discourse Analysis [Cambridge Textbooks in Linguistics]. Cambridge: CUP. DOI: 10.1017/CBO9780511805226
Caink, Andrew D. 2006. Clitics. In Encyclopedia of Language and Linguistics, 2nd edn, Keith Brown (ed.), 491–495. Oxford: Elsevier. DOI: 10.1016/B0-08-044854-2/00110-3
Chafe, Wallace. 1994. Discourse, Consciousness, and Time: The Flow and Displacement of Conscious Experience in Speaking and Writing. Chicago IL: The University of Chicago Press.
Cheng, Winnie, Greaves, Chris & Warren, Martin. 2005. A Corpus-driven Study of Discourse Intonation: The Hong Kong Corpus of Spoken English [Studies in Corpus Linguistics 32]. Amsterdam: John Benjamins. DOI: 10.1075/scl.32
CoSIH: The Corpus of Spoken Israeli Hebrew (CoSIH):
Cresti, Emanuela & Moneglia, Massimo (eds). 2005. C-ORAL-ROM: Integrated Reference Corpora for Spoken Romance Languages [Studies in Corpus Linguistics 15]. Amsterdam: John Benjamins. DOI: 10.1075/scl.15
Cruttenden, Alan. 1997. Intonation, 2nd edn [Cambridge Textbooks in Linguistics]. Cambridge: CUP. DOI: 10.1017/CBO9781139166973
Crystal, David. 2008. A Dictionary of Linguistics and Phonetics, 6th edn. Oxford: Blackwell. DOI: 10.1002/9781444302776



Danieli, Morena, Garrido, Juan María, Moneglia, Massimo, Panizza, Andrea, Quazza, Silvia & Swerts, Marc. 2004. Evaluation of consensus on the annotation of prosodic breaks in the Romance corpus of spontaneous speech ‘C-ORAL-ROM’. In Speech Corpus Production and Validation, LREC 2004: Fourth International Conference on Language Resources and Evaluation, 24th May, 2004, Lisbon, Christoph Draxler, Henk van den Heuvel & Florian Schiel (eds), 1513–1516. <http://www.lrec-conf.org/proceedings/lrec2004/pdf/371.pdf>
Debaisieux, Jeanne-Marie & Martin, Philippe. 2010. Les parenthèses: Étude macrosyntaxique et prosodique sur corpus. In La parataxe, Tome 1: Entre dépendance et intégration, Tome 2: Structures, marquages et exploitations discursives, Marie-José Béguelin, Mathieu Avanzi & Gilles Corminboeuf (eds). Bern: Peter Lang.
Dixon, Robert M. W. & Aikhenvald, Alexandra Y. (eds). 2002. Word: A Cross-linguistic Typology. Cambridge: CUP.
Du Bois, John W., Cumming, Susanna, Schuetze-Coburn, Stephan & Paolino, Danae. 1992. Discourse Transcription [Santa Barbara Papers in Linguistics 4]. Santa Barbara CA: Department of Linguistics, University of California, Santa Barbara.
Du Bois, John W., Cumming, Susanna, Schuetze-Coburn, Stephan & Paolino, Danae. 1993. Outline of discourse transcription. In Talking Data: Transcription and Coding in Discourse Research, Jane A. Edwards & Martin D. Lampert (eds), 45–89. Hillsdale NJ: Lawrence Erlbaum Associates.
Du Bois, John W. 2004. Representing Discourse. Part 2: Appendices and Projects. Santa Barbara CA: Linguistics Department, University of California.
Esling, John H. 2010. Phonetic Notation. In The Handbook of Phonetic Sciences, 2nd edn [Blackwell Handbooks in Linguistics], William J. Hardcastle, John Laver & Fiona E. Gibbon (eds), 678–702. Chichester: Wiley-Blackwell. DOI: 10.1002/9781444317251.ch18
Fletcher, Janet. 2010. The Prosody of Speech: Timing and Rhythm. In The Handbook of Phonetic Sciences, 2nd edn [Blackwell Handbooks in Linguistics], William J. Hardcastle, John Laver & Fiona E. Gibbon (eds), 523–602. Chichester: Wiley-Blackwell.
Fox, Anthony. 1973. Tone sequels in English. Archivum Linguisticum 4 (new series): 17–26.
Fox, Anthony. 2000. Prosodic Features and Prosodic Structure: The Phonology of Suprasegmentals. Oxford: OUP.
Halliday, Michael A. K. 1989. Spoken and Written Language, 2nd edn. Oxford: OUP.
Halliday, Michael A. K. 2004. An Introduction to Functional Grammar. 3rd edn revised by Christian M. I. M. Matthiessen. London: Arnold.
Hirst, Daniel & Di Cristo, Albert (eds). 1998. Intonation Systems: A Survey of Twenty Languages. Cambridge: CUP.
Izre'el, Shlomo. Forthcoming. Basic units of language: Prosody, discourse and syntax. In Researching Spoken Hebrew, Einat Gonen (ed.). (In Hebrew; English version in preparation).
Julien, Marit. 2006. Word. In Encyclopedia of Language and Linguistics, 2nd edn, Keith Brown (ed.), 617–624. Oxford: Elsevier. DOI: 10.1016/B0-08-044854-2/00130-9
Kibrik, Andrej A. & Podlesskaya, Vera I. 2006. Problema segmentacii ustnogo diskursa i kognitivnaja sistema govorjashchego (Segmentation of spoken discourse and the speaker’s cognitive system). In Kognitivnye issledovanija, Vol. 1, Valerij D. Solovyev (ed.), 138–158. Moscow: Institut psixologii RAN. English summary: Discourse as a kind of cognitive activity: The principles of segmentation. In The Second Biennial Conference on Cognitive Science, June 9-13, 2006, St. Petersburg, Russia, Abstracts, Vol. 2, 501–503.
Lacheret, Anne & Victorri, Bernard. 2002. La période intonative comme unité d'analyse pour l'étude du français. In Verbum 24/1-2: Y a-t-il une syntaxe au-delà de la phrase?, Michel Charolles, Pierre Le Goffic & Mary-Annick Morel (eds), 55–72.
Lerner, Gene H. 1996. On the ‘semi-permeable’ character of grammatical units in conversation: Conditional entry into the turn space of another speaker. In Interaction and Grammar, Elinor Ochs, Emanuel A. Schegloff & Sandra A. Thompson (eds), 238–276. Cambridge: CUP. DOI: 10.1017/CBO9780511620874.005
Lerner, Gene H. 2004. Collaborative Turn Sequences. In Conversation Analysis: Studies from the First Generation [Pragmatics & Beyond New Series 125], Gene H. Lerner (ed.), 225–256. Amsterdam: John Benjamins. DOI: 10.1075/pbns.125.12ler
Malibert-Yatziv, Il-Il. 2012. ‘Hebrew Corpus’. Corpus recorded, transcribed and annotated by Il-Il Malibert-Yatziv. In Amina Mettouchi & Christian Chanard (eds). The CorpAfroAs Corpus of Spoken AfroAsiatic Languages. DOI: http://dx.doi.org/10.1075/scl.68.website. Accessed on 10 January 2012.
Manfredi, Stefano. 2012. ‘Juba Arabic Corpus’. Corpus recorded, transcribed and annotated by Stefano Manfredi. In Amina Mettouchi & Christian Chanard (eds). The CorpAfroAs Corpus of Spoken AfroAsiatic Languages. DOI: http://dx.doi.org/10.1075/scl.68.website. Accessed on 10 January 2012.
Martin, Philippe. 2009. Intonation du français [Collection U • Linguistique]. Paris: Armand Colin.
Matthews, Peter H. 2007. Oxford Concise Dictionary of Linguistics, 2nd edn [Oxford Paperback Reference]. Oxford: OUP.
Mettouchi, Amina. 2012. ‘Kabyle Corpus’. Corpus recorded, transcribed and annotated by Amina Mettouchi. In Amina Mettouchi & Christian Chanard (eds). The CorpAfroAs Corpus of Spoken AfroAsiatic Languages. DOI: http://dx.doi.org/10.1075/scl.68.website. Accessed on 10 January 2012.
Mettouchi, Amina. To appear. The Interaction of State, Prosody and Linear Order in Kabyle (Berber): Grammatical relations and information structure. In Data and Perspectives in Afroasiatic, Alessandro Mengozzi & Mauro Tosco (eds). Amsterdam: John Benjamins.
Mettouchi, Amina & Chanard, Christian. 2010. From fieldwork to annotated corpora: The CorpAfroAs project. Faits de Langues – Les Cahiers 2: 255–266.
Mettouchi, Amina, Lacheret-Dujour, Anne, Silber-Varod, Vered & Izre'el, Shlomo. 2007. Only prosody? Perception of speech segmentation. In Nouveaux cahiers de linguistique française 28: Interfaces discours – prosodie: Actes du 2ème Symposium international and Colloque Charles Bally, 207–218. Sound files and transcriptions: <http://clf.unige.ch/annexe.php?article=108>
Palmer, Harold E. 1922. English Intonation; with Systematic Exercises. Cambridge: Heffer & Sons.
Palmer, Harold E. 1924. A Grammar of Spoken English: On a Strictly Phonetic Basis. Cambridge: Heffer & Sons.
Pereira, Christophe. 2012. ‘Tripolinian Arabic Corpus’. Corpus recorded, transcribed and annotated by Christophe Pereira. In Amina Mettouchi & Christian Chanard (eds). The CorpAfroAs Corpus of Spoken AfroAsiatic Languages. DOI: http://dx.doi.org/10.1075/scl.68.website. Accessed on 10 January 2012.
Pierrehumbert, Janet & Hirschberg, Julia. 1990. The meaning of intonational contours in the interpretation of discourse. In Intentions in Communications [Systems Development Foundation Benchmark Series], Philip R. Cohen, Jerry Morgan & Martha E. Pollak (eds), 271–311. Cambridge MA: The MIT Press.
Savà, Graziano. 2012. ‘Ts’amakko Corpus’. Corpus recorded, transcribed and annotated by Graziano Savà. In Amina Mettouchi & Christian Chanard (eds). The CorpAfroAs Corpus of Spoken AfroAsiatic Languages. DOI: http://dx.doi.org/10.1075/scl.68.website. Accessed on 10 January 2012.
Schiering, René, Bickel, Balthasar & Hildebrandt, Kristine A. 2010. The prosodic word is not universal, but emergent. Journal of Linguistics 46: 657–709. DOI: 10.1017/S0022226710000216
Selkirk, Elisabeth. 1984. Phonology and Syntax: The Relation between Sound and Structure. Cambridge MA: The MIT Press.
Shattuck-Hufnagel, Stefanie & Turk, Alice E. 1996. A prosody tutorial for investigators of auditory sentence processing. Journal of Psycholinguistic Research 25: 193–247. DOI: 10.1007/BF01708572
Silber-Varod, Vered. 2010. Phonological aspects of hesitation disfluencies. Proceedings of Speech Prosody 2010, Chicago. <http://www.speechprosody2010.illinois.edu/papers/100020.pdf>
Silber-Varod, Vered. 2011. The SpeeCHain Perspective: Prosodic-Syntactic Interface in Spontaneous Spoken Hebrew. PhD dissertation, Tel-Aviv University.
Tao, Hongyin. 1996. Units in Mandarin Conversation: Prosody, Discourse, and Grammar [Studies in Discourse and Grammar 5]. Amsterdam: John Benjamins. DOI: 10.1075/sidag.5
Tosco, Mauro. 2012. ‘Gawwada Corpus’. Corpus recorded, transcribed and annotated by Mauro Tosco. In Amina Mettouchi & Christian Chanard (eds). The CorpAfroAs Corpus of Spoken AfroAsiatic Languages. DOI: http://dx.doi.org/10.1075/scl.68.website. Accessed on 10 January 2012.
Vanhove, Martine. 2012. ‘Beja Corpus’. Corpus recorded, transcribed and annotated by Martine Vanhove. In Amina Mettouchi & Christian Chanard (eds). The CorpAfroAs Corpus of Spoken AfroAsiatic Languages. DOI: http://dx.doi.org/10.1075/scl.68.website. Accessed on 10 January 2012.
Vogel, Irene. 2006.
Phonological words. In Encyclopedia of Language and Linguistics, 2nd edn, Keith Brown (ed.), 531–534. Oxford: Elsevier. DOI: 10.1016/B0-08-044854-2/00043-2 Wells, John C. 2006. Phonetic transcription and analysis. In Encyclopedia of Language and Linguistics, 2nd edn, Keith Brown (ed.), 386–396. Oxford: Elsevier.  DOI: 10.1016/B0-08-044854-2/00014-6 Wichmann, Anne. 2000. Intonation in Text and Discourse: Beginnings, Middles and Ends [Studies in Language and Linguistics]. Harlow: Pearson Education. Yule, George. 1980. Speakers’ topics and major paratones. Lingua 52: 33–47.  DOI: 10.1016/0024-3841(80)90016-9 Zwicky, Arnold M. 1977. On Clitics. Bloomington IN: Indiana University Linguistics Club. Zwicky, Arnold M. 1995. What is a clitic? In Clitics: A Comprehensive Bibliography, 1892-1991 [Library & Information Sources in Linguistics Series 22], Joel Ashmore Nevis, Brian D. Joseph, Dieter Wanner & Arnold M. Zwicky (eds), xii–xx. Amsterdam: John Benjamins. DOI: 10.1075/lisl.22

41

Tone and intonation*

Bernard Caron

Llacan (UMR 8135): Inalco, CNRS, PRES Sorbonne Paris-Cité

Most of the literature on intonation derives from pioneering studies on English intonation. These authors and their followers have identified the exponents of intonation as F0, rhythm (including length and pauses) and intensity. The difficulty when studying intonation in ‘tone languages’ is that F0 is already mobilised by the lexicon and the morpho-syntax.1 The question is then: does pitch play a role in the intonation of tone languages, and how? Is this role comparable to that of pitch in non-tonal languages? This problem became crucial in the transcription and segmentation of the tonal languages represented in the CorpAfroAs corpus of Afroasiatic languages, viz. Hausa and Zaar, two Chadic languages, on the one hand, and Wolaytta, an Omotic language, on the other. The objectives of this study are (i) to identify the basic components of pitch that can be isolated from tone and attributed to intonation; (ii) to establish them as the elements that must be accounted for in the transcription of an oral corpus. These components are meant to be available for typological studies of the relationship between these elements as they are employed for marking lexical and grammatical distinctions on the one hand, and intonation on the other. To address this problem, this study leans heavily on Zaar, a Chadic tone language spoken in the South of Bauchi State, Nigeria. Our hypothesis is that the role of pitch in Zaar intonation can be observed in the variation between post-lexical tones as they are perceived and transcribed by the native speaker and their acoustic realisation as represented by Praat and Prosogramme. These variations, i.e. the way intonation influences the realisation of post-lexical tones, fall under the following categories: (a) Declination; (b) Intonemes, which are divided into Initial intonemes (Step-down and Step-up) and Terminal intonemes (Fall, Rise, Level and High-Rise). These prosodic features (declination and intonemes) are illustrated in the first part of the paper. In the final part, an intonation pattern exemplifying the combination of these features is analysed. The examples quoted in the paper are extracted from the Zaar CorpAfroAs corpus.

*  I wish to thank Raymond Boyd, Shlomo Izre’el, Alexis Michaud, and an anonymous reviewer for their help. Any error remains mine entirely.
1.  Tone-languages use pitch variation for morpho-syntactic purposes on a scale that cannot be compared with what obtains in a stress-language like English for example, where stress combines variation in pitch, loudness and length in the same way as intonation, but where the often quoted opposition between verb and noun relying on stress (e.g. a ‘record / to re’cord) is at best marginal and not systematic (e.g. a/to ‘cover in both cases). The only place where stress manifests itself as a prominent grammatical feature concerns its association with certain suffixes, e.g. -ics, -aphy, -ition, etc., and it is not distinctive.

doi 10.1075/scl.68.02car 2015 © John Benjamins Publishing Company

1. Introduction

Zaar is a tone language with three phonemic tones: High (written with an acute accent: á), Mid (left unwritten: a) and Low (written with a grave accent: à). Two contour tones result from the combinations High-Mid and Low-Mid on a single syllable, i.e. resp. Falling (written with a circumflex accent: â) and Rising (written with a caron: ǎ). Tones are important to identify lexemes (lexical tone), but play a part in morphosyntax too. The influence of morphosyntax on the surface realisation of lexical tones is explained e.g. in the suprasegmental theory of tone by post-lexical rules.2 Let us take the verbs vər ‘give’ and ʧet ‘tell’. Tense, aspect and mood are expressed in Zaar by “subject pronouns” which can affect the tone of the following verb. If we take the subject pronouns tàː (3PL.PFV) and tə́ (3PL.AOR), and combine them with these verbs, we get the following sentences, where the tones of the verbs vary according to their lexical tone classes and the subject pronouns preceding them:
– tàː vər, ‘they have given (it)’; tə́ və̀r, ‘they gave (it)’
– tàː ʧet, ‘they have said (it)’; tə́ ʧêt, ‘they said (it)’3

2.  The tone marking in the transcription has been done by Marvellous S. Davan, a language assistant who consistently marks postlexical tones in grammatical words, whatever their acoustic realisation. Some of the recordings of the interviews used for this study were made in 1999. At that time, Zaar was a language without grammatical tradition or orthography, and up till now, it is not taught at school. The transcriber had just been trained in marking lexical tones on individual words and was given the recordings to transcribe as a first exercise. Without hesitation, he marked post-lexical tones right from the beginning, and has never varied in his transcriptions. Some passages that had not been faithfully transcribed (the hesitations, syntactic mistakes, etc. had been eliminated in the original transcription) were revised 10 years later. The same passages were transcribed again with exactly the same tones. 3.  The two verbs vər and ʧet both belong to the same [Mid] tone class. The Low tone on və̀r is due to the depressing effect of the initial voiced consonant /v/. See (Caron 2005: 212 ff.) on tone classes in Zaar.




Verbs can affect the tone of direct object pronouns through tone spreading, as can be seen in the 2nd person singular direct object pronoun =kə in the following sentences:
– tàː vər=ɣə,4 ‘they have given you’; tə́ və̀r=ɣə̀, ‘they gave you’;
– tàː ʧet=kə, ‘they have told you’; tə́ ʧét=kə̀, ‘they told you.’
However, these “surface tones” accounted for and/or predicted by post-lexical tonological rules undergo further variations. This can be heard when listening to recordings of natural speech, and it can be represented and measured through instrumental acoustics. Our hypothesis is that the pitch component of intonation in tone languages lies in this variation.

1.1 Tone variation

Pitch varies all along the syllables. Even non-modular tones are rarely realized by a plateau. The measurement of pitch in the study of tone has recently used the notion of “target”, defined as follows by Akinlabi & Liberman (2001): “the phonetic target value of a tone [is] the highest F0 of a High tone, or the lowest F0 of a Low tone”. In Yoruba, this target “is found at the end of the span of time corresponding to the associated tone-bearing unit”. For Zaar, this is true for Low and Mid tones, but not for High tones, where the phonetic target is at the beginning of the tone-bearing unit. After a long process of trial and error, I have found that the phonetic target in Zaar is situated at the intensity peak of the syllable.

Another method, which I have used for this study, is to transcribe prosody using pitch contour stylization based on a tonal perception model and automatic segmentation, as done e.g. by “Prosogramme”. The system has been implemented by Piet Mertens as a Praat script (Mertens 2004).5 Prosogramme follows four steps:
– Calculate acoustic parameters: F0, intensity, voicing.
– Obtain segmentation. Select the relevant units (e.g. vowels, syllables). Select the voiced portion of these units that has sufficient intensity/loudness (using difference thresholds relative to the local peak).
– Stylize the F0 of the selected time intervals.
– Determine pitch range used in speech fragment. Plot stylized pitch and some annotation tiers (text, phonetic transcription). Use a musical (semitone) scale and add calibration lines at every 2 semitones for easy interpretation of pitch intervals (Mertens 2002).6

The advantage of this method is twofold: first, it provides in a simple way images of Intonation Units, plotted against a time scale and customized annotation tiers; second, as it is automatic, it is reproducible and objective. One major drawback comes from the way vowel devoicing often prevents Prosogramme from representing F0 utterance-finally and sometimes utterance-internally, e.g. last syllable ni and antepenultimate syllable àː in example (1). Taking into account this major drawback, I have used Prosogramme as a first approximation which remains useful to identify at a glance the general shape of a melody, a register shift, or a freakish phenomenon such as the strong Rise-Fall associated with sentence-final adverbials (cf. § 3.2.4). See example (1) below for a comparison of both methods, i.e. the Hz representation given by Praat and the semitone representation given by Prosogramme. When a syllable is missing in Prosogramme, I have made an approximation using the Praat Hz calculation. This approximation is represented with a dotted line in the diagram, as opposed to the Prosogramme representation, which uses continuous lines.

4.  /k/ is realised as [ɣ] in the contexts V_V and r_V.
5.  “Praat” is a tool for acoustic and phonetic research, written by Paul Boersma and David Weenink, of the Institute of Phonetic Sciences in Amsterdam.
6.  The analysis of pitch intervals done by Prosogramme is based on the glissando threshold G, or auditory threshold for pitch variation. This depends on the amplitude (extent) and the duration of the F0 variation. Since the work of J. ’t Hart, it is usually expressed in ST/s (semitones per second). ST use a logarithmic scale to give a better approximation of the way F0 is perceived and interpreted by the human ear. For convenience reasons, the ‘Automatic detection of syllabic nuclei’ has been selected for this work in the Prosogramme settings. This method uses “a segmentation into local peaks in the intensity of band-pass (300–3500 Hz) filtered speech, adjusted on the basis of the intensity (full bandwith)” (Mertens 2002). The other methods require manual time-plotting of the syllable nuclei on a Praat Textgrid.

(1) séː sarkinpáːda tə̀ və́rtə hàkuri //
    séː sarkinpáːda tə̀ və́r =tə hàkuri //
    then Sarkin_Pada 3SG.SBV give =3S.OBJ patience //
    Sarkin Pada should tell him to be forgiving. (SAY_BC_CONV_02_SP1_077)



[Figure for example (1): upper panel, F0 in Hz (Praat) against time, 0–1.779 s; lower panel, Prosogramme stylisation on a semitone scale (settings: asyll, G = 0.16/T2, DG = 20, dmin = 0.035; file SAY_BC_conv_02_SP1_078; Prosogram v2.7).]
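The semitone scale and the glissando threshold mentioned in note 6 can be made concrete with a short sketch. It is only an illustration and not Prosogramme's own code: the function names are hypothetical, the threshold formula G = 0.16/T² is read off the settings line of the figure for example (1), and the frequency values in the usage lines are invented.

```python
import math


def interval_in_semitones(f_start_hz, f_end_hz):
    """Signed pitch interval between two F0 values, in semitones (ST).

    The scale is logarithmic: doubling the frequency adds 12 ST.
    """
    return 12.0 * math.log2(f_end_hz / f_start_hz)


def glissando_threshold(duration_s, coefficient=0.16):
    """Glissando threshold G in ST/s for a pitch movement lasting duration_s.

    Assumption: G = coefficient / T**2, with coefficient 0.16 as shown in
    the figure settings ("G=0.16 / T2").
    """
    return coefficient / duration_s ** 2


def is_perceived_as_glide(f_start_hz, f_end_hz, duration_s, coefficient=0.16):
    """True if the rate of F0 change (ST/s) exceeds the glissando threshold."""
    rate = abs(interval_in_semitones(f_start_hz, f_end_hz)) / duration_s
    return rate > glissando_threshold(duration_s, coefficient)


# Invented values, for illustration only (not measurements from the corpus).
octave = interval_in_semitones(100.0, 200.0)
clear_glide = is_perceived_as_glide(150.0, 180.0, 0.1)
near_level = is_perceived_as_glide(150.0, 151.0, 0.1)
```

Under these assumptions, a movement from 150 to 180 Hz within 100 ms counts as a glide, whereas a 1 Hz drift over the same duration is treated as a level tone.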

In the main body of the article, in order to save space, I have represented only F0 as calculated by Praat. Our hypothesis is that the role of pitch in Zaar intonation can be observed in the variation between post-lexical tones as they are perceived and transcribed by the native speaker and their acoustic realisation as represented by Prosogramme. These variations, i.e. the way intonation influences the realisation of post-lexical tones, fall under the following categories:
a. Declination;
b. Intonemes,7 which are divided into:
   – Terminal intonemes: Fall (↓), Rise (↑), Level (→) and High-Rise (↑↑);
   – Initial intonemes: Step-down (!) and Step-up (¡).

In the first part of the paper, I will illustrate these prosodic features (declination and intonemes). Then in the final part, I will analyse an intonation pattern exemplifying the combination of these features. But before doing this, let us clarify some terminological issues.

1.2 Intonation Unit

The Intonation Unit (IU) is what “encapsulates a functional, coherent segmental unit, be it syntactic, semantic, informational, or the like” (Izre’el & Mettouchi, in this vol.). In other words, it is “that part of a discourse text that the speaker by his voice wished to identify as an informational unit.” (Markus 2006: 112) An intonation unit (henceforth IU) is characterized by a combination of the following elements:
– overall: declination;
– final: pause, creaky voice; lengthening of final vowel or consonant;
– initial: (upward or downward) pitch adjustment, acceleration.

1.3 Paratone

The paratone corresponds to an utterance, i.e. a functionally complete speech act.8 A canonical paratone is followed by a pause and a pitch reset, ends in a Fall, and is characterized by overall declination.9 Paratones can consist of one or more IU’s. What distinguishes IU’s from paratones is the fact that they do not necessarily correspond to a complete speech act. The end of an IU is transcribed by a single slash (/). The end of a paratone is transcribed by a double slash (//) corresponding to the completion of the speech act. In a transcribed text, a paratone is delimited by two double slashes: it ends in a double slash and begins after the final double slash of the preceding paratone.

1.4 Period

A period is the highest unit of the prosodic hierarchy, defined as “a speech stretch that shows declination along its paratones (‘supradeclination’ according to Wichmann 2000: §5.2.2)” (Izre’el & Mettouchi in this vol.).

2. Declination

2.1 The general frame

For both tone and non-tone languages, declination has been presented as a universal tendency due to physiological constraints,10 linked to the energy used to expel pulmonic air through the vocal organs. This creates the background for a “neutral” intonation against which variations of pitch by the speaker can be interpreted as meaningful patterns of deviations.11 How is this compatible with what obtains in tone languages where the constraints of lexico-grammatical tones may influence the melody in an upward movement, contrary to the general downward movement of declination? Bearth (1998) sketches a typology of declination in tone languages, falling into 3 categories illustrated by Chinese (where intonation is superimposed on lexico-grammatical tonology), Akan languages (where declination is phonologised into tone downstep, and intonation is added to the periphery) and Toura (where declination is neutralised, and intonation is added to the periphery). Zaar belongs to the Chinese type in Bearth’s typology, with no downstep phenomenon, and a declination observable from the Intonation Unit up to the Period as a gradual lowering of the pitch over the intonation unit. This is noticeable especially in High tones. The highest tone in an IU is the first High tone of this unit. Each following High tone is pronounced lower than the preceding one. In example (2), the first three High tones read at 251 (á), 249 (mí) and 243 (ŋáː) Hz respectively, with the last High tone of the utterance (lí) reading at 172 Hz. The same declination is observed in the final Low tones reading at 175 (mə̀) and 169 (jè) Hz. Utterance-final Falls are added to declination, e.g. the lexically Mid tone of the last syllable of the paratone (oː) which bears the utterance-final Fall from 161 (lower than the preceding Low tone) to 140 Hz.12

7.  Intonation literature uses the terms “falling/rising tone” to refer to phonetic cues characterised by a reduction/increase in pitch. In the linguistic description of tone languages, “tone” refers to the pitch variations used to characterize lexical and grammatical oppositions alongside vowels and consonants. I want to preserve the terms “falling/rising tone” for these phonologically distinctive suprasegmental units. For the reduction/increase in pitch working as intonation acoustic cues, I will use the terms “Fall/Rise intoneme” rather than tone.
8.  Speech act: declaration (positive and negative), question (including rhetorical questions), injunction, exclamation, etc.
9.  See below for the way genre (e.g. tales) or gender (e.g. women’s speech) can interfere with this canonical definition.
10.  The phenomenon of declination has to be distinguished from downstep. Downstep occurs in some tone languages and is set off by a succession of High and Low tones. It results in the automatic lowering of a High tone following a Low tone. As a consequence, in a succession of High-Low-High tones, the second High is pronounced with a lower pitch than the first one, resulting in what has been called terraced-level tone languages (Clements 1979). On the other hand, declination is a gradual, progressive lowering of F0 occurring over an utterance, whatever the succession of tones, and can be observed even in utterances with all-High or all-Low tones. As stated by Ladd (1996), “(…) F0 tends to decline over the course of phrases and utterances, both in tone languages and in languages like English or Dutch.” (p. 73ff.), and “[…] even when nothing is ‘happening’ phonologically in the contour, F0 continues to go down slightly […]” (p. 18)
11.  However, Bearth (1998) presents data from Toura, a four-tone African language, where declination is limited to local tonal downstep: where two tones of the same phonological level are separated by one or several lower tones, the second tone tends to be realised lower than the first. This lowering is then immediately locally readjusted and the following tones resume the general framework of the language, where all high tones of a unit will be pronounced at approximately the same level. Intonation is then expressed at the periphery of the IUs. « C’est la dernière more (…) de l’énoncé qui est le point de contact entre la [tonalité lexico-grammaticale et l’intonation périphérique] à partir duquel se constitue un paradigme énonciatif chargé de caractériser l’énoncé des points de vue notamment de son statut en tant qu’acte illocutif, de sa complétude ou incomplétude, de l’expression de la subjectivité et de l’émotivité, ainsi que du positionnement social des interlocuteurs. » (Bearth 1998: 80–1)

(2) á lə̌ːrmí ŋáːwôs mə́nɗi mə̀ jèlì òː //
    á lə́ːr =mí ŋáː =wôs
    3SG.AOR.SBJ bring =1PL.OBJ son =3SG.POS
    mə́n -ɗi mə̀ jel -i -oː
    BEN -DIR 1PL.AOR.SBJ see -SPCF FCT
    He has brought his son for us to see. (SAY_BC_CONV_02_SP2_029)

[Figure for example (2): F0 in Hz (Praat) against time, 0–1.907 s.]

This is representative of the canonical declarative intonation of Zaar. The same intonation pattern is found in WH-Questions, as in example (3):

(3) tòː zəgì ʧi gòs dzàŋ gjòː //
    tòː zəgì ʧi gòs dzàn gjòː //
    well Ziggy 3SG.SBJ.be 3SG.POS day which //
    Well, Ziggy, his own, which day (was it)? (SAY_BC_CONV_03_SP1_703)

[Figure for example (3): F0 in Hz (Praat) against time, 0–0.853 s; syllable labels: tô, ʧi, gòs, dzàŋ, gjò //.]
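The declination reported for example (2) can be checked with a few lines of arithmetic. The sketch below is only an illustration: the Hz values are those quoted in the text for example (2), while the function names and the use of a semitone measure for the overall drop are my own assumptions, not part of the original analysis.

```python
import math

# F0 readings quoted in the text for example (2), in Hz.
HIGH_TONES_HZ = [251, 249, 243, 172]  # first three High tones, then the last one
LOW_TONES_HZ = [175, 169]             # the two final Low tones


def drop_in_semitones(first_hz, later_hz):
    """Size of the lowering from first_hz to later_hz, in semitones."""
    return 12.0 * math.log2(first_hz / later_hz)


def is_declining(values_hz):
    """True if every value is lower than the one before it."""
    return all(later < earlier for earlier, later in zip(values_hz, values_hz[1:]))


high_declines = is_declining(HIGH_TONES_HZ)
low_declines = is_declining(LOW_TONES_HZ)
high_drop_st = drop_in_semitones(HIGH_TONES_HZ[0], HIGH_TONES_HZ[-1])
low_drop_st = drop_in_semitones(LOW_TONES_HZ[0], LOW_TONES_HZ[-1])
```

With these figures, both tone series decline monotonically, and the High tones lose roughly six and a half semitones between the first and the last High tone of the utterance.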

To compensate for declination, each IU starts with an initial pitch reset, also called ‘declination reset’ (Ladd 1996: 279). When IU’s are integrated into a paratone, or paratones into a period, declination applies inside larger units as well, e.g. in example (4):

12.  This final lowering explains why the assertive particle oː has been transcribed with a Low tone by the language assistant.




(4) mjáːni ma mbútni / ká ŋgjâk gàːlí ɣá / ɓasəm // (776) tòː / ká ŋgjêrtə̀ / dón vwàrɲì / tə̀ náː ɓasəm // (1070) dón vwàrɲì jáː náː ɓasəm mjâːn ma daːfá gə̀tn ɓáːtkə̂n //
    mjáːni ma mbút -ni / ká ngjak
    1SG 1SG.FUT lie_down -INCH / 2PL.FUT hold_fast
    gàːl -íː ká / ɓas =əm // tòː / ká nger =tə /
    cow -RES at / PosL =1SG.OBJ // well / 2PL.FUT cut =3S.OBJ /
    dón vòràŋ -i / tə̀ naː ɓas =mə //
    because blood -INDF / 3SG.SBV remain PosL =1SG.OBJ //
    dón vòràŋ -i jáː naː ɓas =mə
    because blood -INDF 3SG.COND remain PosL =1SG.OBJ
    mjáːni ma daːfá gə̀tn ɓaːt -kə́nì // káwâj -oː //
    1SG 1SG.FUT continue 3SG.POS lick -NMLZ // merely -FCT //
    I will lie down and you will hold the cow over me. Then, slaughter it so that the blood should stay on me. Because if the blood stays on me, me, I will keep licking myself. (SAY_BC_NARR_02_SP1_110)

[Figure for example (4): three panels of F0 in Hz (Praat) against time: 0.4659–3.09 s, 3.632–6.257 s and 8.968–11.43 s.]

In this example, gradual declination of High tones can be observed over the first three paratones, from 169.7 to 150.6 Hz, from 152 to 142.4 Hz, and from 162.3 to 135.7 Hz, with final High tones getting gradually lower.

2.2 Variations in declination

Declination, however, is neutralised in some conditions, e.g. exclamation (example 5), in some forms of story-telling, and in some women’s speech (example 6).

(5) à múrín múrín //
    á múr -ə́n múr -ə́n //
    eh man -PROX man -PROX //
    Really, this man! (SAY_BC_CONV_02_SP2_178)

[Figure for example (5): F0 in Hz (Praat) against time, 0–0.8031 s.]

Example (6) is part of a woman’s speech detailing her daily chores. It shows a period consisting of two paratones where the second one finishes roughly at the same level as, if not slightly higher than (155 Hz), the first one (150 Hz). This cancellation of declination gives a feeling of vehemence to the way some women’s speech is perceived.

(6) tôː / mjàːní / lə̂p jáː ɬǎːj / tôː mə́ ɬǐː nə́ ŋamtsə́ nátkə́nì // mə́ nât ŋamtsə́ɗi / tôː mə́ mánì mə́ mán tsə̌tnnì //
    tòː / mjàːní / lə̂p jáː ɬaː -íː /
    well / 1PL / place 3SG.COND cut -RES /
    tòː mə́ ɬə -íː nə́ ŋamtsə́ nat -kə́nì //
    well 1PL.AOR go -RES for wood tie -NMLZ //
    mə́ nat ŋamtsə́ -ɗi /
    1PL.AOR tie wood -CTP /
    tòː mə́ máni mə́ man tsə́tn -ni //
    well 1PL.AOR come 1PL.AOR come sit -INCH //
    Well, we, when the day breaks, well we go and collect wood. After collecting wood, we come back and sit down. (SAY_BC_CONV_02_SP1_014–019)



[Figure for example (6): two panels of F0 in Hz (Praat) against time: 0.05857–3.386 s and 3.404–5.445 s.]

Apart from those exceptions, declination helps identify the limit of speech units through pitch reset. Against this general background, intonemes operate both at the initial of IUs (affecting the whole of the unit) and at the end of paratones, in what Bearth (1998) calls ‘peripheral intonation’.

3. Intonemes

Intonemes are defined as the minimal units of distinctive intonation contours associated with particular functions.

3.1 Initial: Step-up and Step-down13

Initial lowering (Step-down, noted !) or raising (Step-up, noted ¡) consists in a noticeable change in the register of an intonation unit compared to the preceding one. This initial pitch adjustment creates a break in the gradual lowering of the pitch induced by declination. Both Step-up and Step-down are associated with specific functions: Step-up is associated with topicalisation, emphasis of adverbials and emotional statements. Step-down is associated with parenthesis and comments following a (stepped-up) topic.14

13.  The terms Step-up and Step-down are borrowed from (Crystal 1969: 143–52) to avoid any confusion with downstep, as characterised in (note 11) above.
14.  Lowering and raising of register linked to informational factors such as emphasis or parenthesis, here described as Step-down and Step-up, are associated with and may be described

Example (7) shows an emphasis of the final adverb kawai through a Step-up of 73 Hz, from 138 to 211 Hz, which is remarkable for a male speaker.

(7) kèrèŋkéːʃe ʧi ɓastə jélɣə̂n / ¡káwâj //
    kèrènkéːʃe ʧi ɓas =tə jel -kə́nì / káwâj //
    Kerenkeshe 3SG.SBJ.be PosL =3S.OBJ see -NMLZ / merely //
    Kerenkeshe was merely watching him. (SAY_BC_NAR_02_SP1_155–6)

[Figure for example (7): F0 in Hz (Praat) against time, 0–1.687 s.]

In example (8), a Step-down separates the temporal frame (‘since I started’) from the assertion (‘I haven’t been there’). The two IUs average at 101 and 87.75 Hz respectively, with their respective nuclei measuring at 111 and 89 Hz.

(8) túnɗan mə ŋgúp / !bàː máː ɬə teː ɗánǐŋ //
    túnɗan mə ngúp / bàː máː ɬə teː ɗáni hə́ŋ //
    since 1SG.AOR start / NEG1 1SG.PFV go at there NEG2 //
    Since I started, I haven’t been there. (SAY_BC_CONV_03_SP1_135)

[Figure for example (8): F0 in Hz (Praat) against time, 0–1.153 s.]
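The register shifts reported for examples (7) and (8) can be compared on a common scale by converting them to semitones. The sketch below is a hedged illustration only: the Hz values come from the text, but the function names and, above all, the 2-semitone cut-off are hypothetical. The text defines Step-up and Step-down as a "noticeable change" of register and gives no numeric criterion.

```python
import math


def register_shift_st(previous_hz, current_hz):
    """Signed register shift from the previous unit to the current one, in semitones."""
    return 12.0 * math.log2(current_hz / previous_hz)


def classify_initial_intoneme(previous_hz, current_hz, threshold_st=2.0):
    """Return "¡" (Step-up), "!" (Step-down) or "" (no marked shift).

    threshold_st is an arbitrary illustrative cut-off, not a value
    taken from the Zaar analysis.
    """
    shift = register_shift_st(previous_hz, current_hz)
    if shift >= threshold_st:
        return "¡"
    if shift <= -threshold_st:
        return "!"
    return ""


# Example (7): Step-up from 138 to 211 Hz on the final adverb.
step_up_st = register_shift_st(138, 211)
# Example (8): nuclei of the two IUs at 111 and 89 Hz.
step_down_st = register_shift_st(111, 89)

label_7 = classify_initial_intoneme(138, 211)
label_8 = classify_initial_intoneme(111, 89)
```

Expressed this way, the Step-up of example (7) is a rise of a little over seven semitones, and the Step-down of example (8) a fall of just under four.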

In example (9), after an initial IU corresponding to the introduction of a new topic (a new example to prove the speaker’s case), a Step-down accompanies some backgrounded elements where the speaker reminds her audience of the theme of the conversation (women keep running about, overworking themselves, whereas men stay idle in the compound, chatting with their friends). This long paratone is characterized by ample declination and a clear change of register at the beginning of the last two IUs.

14. (continued)  as compression and expansion of register. Level and span are intimately linked, insofar as raising the voice involves expanding the pitch span from the bottom up while the bottom of the speaking range remains more or less constant. “[…] broadly speaking, the higher the level the wider the span.” (Ladd 1996: 260).

(9) mǎːm móːmi kúmáːːː / !ɮàmɗì gòsɗìːːː / ʧáː fini gòs / < koyarwa > makaranta //
    maːm kə́ móːmi kúmá -ːːː / ɮam -ɗi gòs -ɗi -ːːː /
    mum PosL Momi also -LENGTH / return -CTP 3SG.POS -CTP -LENGTH /
    ʧáː fi -ni gòs / < koyarwa makaranta > //
    3SG.IPFV do -INCH 3SG.POS / < teaching school > //
    As for Momi’s mum, the place where she goes, what she does, is to teach children in school. (SAY_BC_CONV_02_SP1_023–26)

[Figure for example (9): F0 in Hz (Praat) against time, 0–3.485 s.]

3.2 Terminal intonemes

These terminal intonemes are the Fall, the Rise, the Level, and the High Rise.

3.2.1 Fall

The Fall intoneme (transcribed with the sign “↓” in the annotation) consists in a distinctive lowering of the pitch at the end of the paratone. It characterises canonical assertions and Wh-questions. In Zaar, contrary to what obtains e.g. in French and in other Afro-Asiatic languages such as Hausa (Newman 2000: 613) and Bole (Schuh, Gimba & Ritchart 2010: 236), it is found at the end of Y/N-Questions as well.15

15.  Cf. Caron et al. in this volume.

(10) ʧáː ɬə́ git məːri ɣá makaranta ↓//
     ʧáː ɬə git məːri ká makaranta //
     3SG.IPFV go show child.PL at school //
     She goes to teach children in the school. (SAY_BC_CONV_02_SP1_028)

[Figure for example (10): F0 in Hz (Praat) against time, 0–1.619 s.]

3.2.2 Rise

This final intoneme (transcribed ↑) is mostly associated with exclamation, as can be seen in example (5) and here in example (11), where the final High tone on máː is measured at 255.5 Hz while the second syllable of soːséj, the paratone nucleus, peaks 12 Hz below, at 243.6 Hz only:

(11) àː sòːséj máː ↑//
     àː sòːséj máː //
     ah quite even //
     Ah quite so! (SAY_BC_CONV_01_SP2_052)

[Figure for example (11): F0 in Hz (Praat) against time, 0–0.806 s.]

3.2.3 Level

This final intoneme (transcribed →) cancels declination. It is often associated with lengthening and induces the only (rare) cases of plateau realization of flat tones. This intoneme can be observed twice in example (9), at the end of the first two IU’s. The intonation of this example can now be transcribed as follows: mǎːm móːmi kúmáːːː →/ !ɮàmɗì gòsɗìːːː →/ ʧáː fini gòs / < koyarwa > makaranta ↓//. As is the case here with the first two IU’s, the Level intoneme often identifies the limit and relationship between a topic and a comment. It is also associated with hesitation, e.g. in example (13) at the end of the paper.

3.2.4 High Rise

High Rise (transcribed ↑↑) is characterised by a sharp rise of F0 to a level beyond the speaker’s usual range of high tones. It is systematically associated with emphasis on negation, ideophones and assertion particles. It can be followed by a Fall when occurring at the end of a paratone. In example (12), we have two occurrences of this intoneme. The first High Rise occurs at the end of an intonation unit, but paratone-internally. It is borne by the last syllable of the word kìmsə́. The second High Rise occurs at the end of the paratone, and is followed by a Fall.

(12) […] mjáːnaː tul gìp kìmsə́ ↑↑/ (816) káwâj màːʃîn / (314) fi mə́ɣə́p ↑↑↓//
     mjáːnaː tul gìp kì =mə -sə /
     1SG.CONC arrive inside 2PL.SBJ =1SG.OBJ -PL /
     káwâj màːʃîn / fi mə́kə́p //
     merely motorbike / do stop //
     […] we had just entered Kimseh when the motorcycle stopped. (SAY_BC_CONV_03_SP1_400–4)

[Figure for example (12): F0 in Hz (Praat) against time, 0–3.451 s.]

4. Intonemes combine into intonation structures

To illustrate how these various intonemes can be used to characterise a complex intonation structure, let us have a look at example (13), spoken by a mature woman describing her daily routine.

(13) mjàːní ɗaŋgəní →/ (511) mjàːní guɗi ɗaŋgəníːːː →/ ɗaŋːːː →/ mì ɗûːn ↑↓/ (420) mjǎː ɬǐː lə̂p jáː ɬǎːj tôː / (371) mə́ ŋgâː ʒàɗì ↓//
     mjàːní ɗangəní / mjàːní guɗi ɗangəní ɗan mì ɗúːni /
     1PL now / 1PL woman.PL now REL2 1PL.SBJ here /
     mìká ɬə -íː lə̂p jáː ɬaː -íː tòː /
     1PL.CONT go -RES place 3SG.COND cut -RES well /
     mə́ ngaː ʒà -ɗi //
     1PL.AOR fetch water -CTP //
     We now, we women now… who… are here, we go when the day breaks well we fetch water. (SAY_BC_CONV_02_SP1_001–3)

[Figure for example (13): two panels of F0 in Hz (Praat) against time: 0.002672–3.565 s and 3.985–6.777 s.]

This example consists of two paratones divided into many IU’s. IU #1 is a topic, finishing in a Level intoneme; IU’s #2 and #3 finish in hesitation, marked by final vowel lengthening and a Level intoneme. From a functional point of view, IU #2 is the development of the Topic expressed in IU #1. The hesitation introduces IU #3, which consists in a relative pronoun announcing a further development of the second Topic. However, this IU finishes in the same hesitation as IU #2. These hesitations, meanwhile, introduce some ambiguity as to the interpretation of the rest of the paratone. IU #4 is expected to achieve the completion of the relative clause announced in IU #3, and of the second Topic at the same time. However, the combination of Rise and Fall at the end of this unit could lead us to interpret it as a whole thetic paratone, and IU #3 as the end of an aborted paratone. In this case, IU’s #5 and #6 make a second paratone with IU #5 functioning as a conditional frame for the rest of the utterance. The alternative interpretation sees the whole utterance as a completed paratone, with IU #6 as the comment of the two initial topics (IU #1 and IU’s #2 to #4) and IU #5 as the conditional frame of IU #6. In this case, the Rise and Fall intonemes at the end of IU #4, quite unusual for a topic, could be interpreted as a reflex from the speaker to compensate for the preceding hesitations, giving an unnecessary assertive power to the topic. The two competing structures are:

(a) mjàːní ɗaŋgəní →/ (511) mjàːní guɗi ɗaŋgəníːːː →/ ɗaŋːːː →/ mì ɗûːn ↑↓/ (420) mjǎː ɬǐː lə̂p jáː ɬǎːj tôː / (371) mə́ ŋgâː ʒàɗì ↓// We now, we women now… who… are here, when the day breaks, we go and fetch water.



Tone and intonation

(b) mjàːní ɗaŋgəní →/ (511) mjàːní guɗi ɗaŋgəníːːː →/ ɗaŋːːː ## mì ɗûːn ↑↓// (420) mjǎː ɬǐː lə̂p jáː ɬǎːj tôː / (371) mə́ ŋgâː ʒàɗì ↓// We now, we women now… who… Here we are. When the day breaks, we go and fetch water.

Structure (a) consists of one long complex paratone; structure (b) consists of a period with three paratones, the first one being aborted. This type of ambiguity is not uncommon in natural speech, and the identification of the tonal exponents of intonation in Zaar has enabled us to describe it precisely.

5. Conclusion

Two conclusions can be drawn from a preliminary examination of the few elements of Zaar intonation discussed in this paper.

First, if I refer to the typology sketched by T. Bearth (1998: 80–1), which distinguishes between two types of languages, i.e. (i) those that stack intonation patterns over lexico-grammatical tones and (ii) those that express intonation at the periphery of the utterance, Zaar would be a mixed language, with both internal intonation (with Step-up and Step-down inducing pitch-raising or lowering over whole intonation units) and peripheral intonation (with Rise and Fall final intonemes). This could be further developed if the final intonemes are confirmed to be correlated with anticipatory Rises and Falls inside the intonation units.

Second, beyond the variations in the location of intonemes (whether peripheral or contiguous with the whole intonation unit), the same general pragmatic interpretation of intonation contours seems to hold for Zaar, as well as for Toura, and English for that matter: a stepped-down Intonation Unit either bears some background information, or a repetition of something that has already been said.16

16. Markus 2006: 117.

References

Akinlabi, Akinbiyi & Liberman, Mark. 2001. Tonal complexes and tonal alignment. In Proceedings of the North East Linguistic Society [NELS 31], Minjoo Kim & Uri Strauss (eds), 1–20. Amherst MA: GLSA.
Bearth, Thomas. 1998. Tonalité, déclinaison tonale et structuration du discours - Un point de vue comparatif. In Les unités discursives dans l’analyse sémiotique: La segmentation du discours, Gustavo Quiroz, Ioanna Berthoud-Papandropoulou, Evelyne Thommen & Christina Vogel (eds), 73–87. Bern: Peter Lang.
Caron, Bernard. 2005. Za:r (Dictionary, grammar, texts). Ibadan (Nigeria): IFRA.
Clements, George N. 1979. The description of terraced-level tone languages. Language 55: 536–558. DOI: 10.2307/413317
Crystal, David. 1969. Prosodic Systems and Intonation in English. Cambridge: CUP.
Ladd, D. R. 1996. Intonational Phonology. Cambridge: CUP.
Markus, Manfred. 2006. English and German prosody - A contrastive comparison. In Prosody and Syntax: Cross-linguistic Perspectives [Usage-based Linguistic Informatics 3], Yuji Kawaguchi, Ivan Fónagy & Tsunekazu Moriguchi (eds), 103–124. Amsterdam: John Benjamins. DOI: 10.1075/ubli.3.07mar
Mertens, Piet. 2002. Prosogramme, v. 2.9. (7 February 2011).
Mertens, Piet. 2004. Le prosogramme. Une transcription semi-automatique de la prosodie. Cahiers de l’Institut de Linguistique de Louvain 30: 7–25. DOI: 10.2143/CILL.30.1.519212
Newman, Paul. 2000. The Hausa Language: An Encyclopedic Reference Grammar [Yale Language Series]. New Haven CT: Yale University Press.
Schuh, Russell G., Gimba, Alhaji Maina & Ritchard, Amanda. 2010. Bole intonation. UCLA Working Papers in Phonetics 108: 226–248.
Wichmann, Anne. 2000. Intonation in Text and Discourse. Beginnings, Middles and Ends. Harlow: Longman.

Part 2

Interfacing prosody, information structure and syntax

The intonation of topic and focus Zaar (Nigeria), Tamasheq (Niger), Juba Arabic (South Sudan) and Tripoli Arabic (Libya) Bernard Caron, Cécile Lux, Stefano Manfredi and Christophe Pereira

LLACAN (UMR 8135), Inalco, CNRS, PRES Sorbonne Paris-Cité / DDL-Lacito / SeDyL (UMR 8202), Inalco, CNRS / LACNAD, INALCO

A follow-up of the CorpAfroAs project, this paper presents a typologically oriented study of the intonation of Topic and Focus in four Afroasiatic languages (Zaar, Tamasheq, Juba Arabic and Tripoli Arabic), in relation to their phonological and information structures. The different prosodic systems represented in the study (i.e. the demarcative accent system of Berber, the lexical stress system of Tripoli Arabic, the pitch accent system of Juba Arabic, and the tone system of Zaar) give ground to the study of the correlation between these prosodic systems and their intonation structures; and more particularly, how declination, which seems to be a universal of the intonation of declarative sentences, interacts with other sentence types, such as Yes/No-Questions, WH-Questions, Exclamations, etc. Likewise, the paper explores the correlation between the prosodic systems and the intonational exponents of Topic and Focus. The paper starts by setting up the concepts and typological frame used for the study. Then, it presents a case study of the four languages, examining their prosodic systems, and the prosodic exponents of topic and focus. Finally, the paper compares the four systems, drawing conclusions from a typological point of view. A general rule seems to emerge from the study: lack of a specific intonation pattern for a specific intonation structure is supplemented by morpho-syntactic marking. In other words, the more a structure relies on morpho-syntax, the less it relies on intonation.

doi 10.1075/scl.68.03car 2015 © John Benjamins Publishing Company

64 Bernard Caron, Cécile Lux, Stefano Manfredi and Christophe Pereira

1. Topic and Focus

In this preliminary study of the relationship between sentence type, information structure and intonation1 in a selection of AfroAsiatic languages, one feature has emerged as characteristic of oral corpora, viz. the statistical prominence of topics, more especially in the case of conversations. This has led us to study the correlates of topics, i.e. foci, as contrasted with utterances that have neither a topic nor a focus. To reflect this partition, we have developed a three-pronged terminology introducing a typological division into thetic, topical and focal utterances. Topical and focal utterances are based on a dichotomy between two elements: topic and comment on the one hand; focus and preconstruct on the other hand. Thetic utterances are not based on such a dichotomy, and correspond to one single unit, expressing logically simple judgements.

A parallel can be established with Lambrecht’s (1994) typology (e.g. § 5.2.1, pp. 221 ff.), with Topical = Predicate-focus structure; Focal = Argument-focus structure; Thetic = Sentence-focus structure. As can be seen from this parallel, we have narrowed down the use of the term “focus” to Argument-focus in order to stress the difference in the structure and nature of the relationship between the two constituents of topical and focal utterances. In our terminology, a focus only appears in a structure where it is related to a preconstruct. ‘Preconstructed’, as opposed to ‘presupposed’, refers to a pragmatic element that has a linguistic manifestation as a clause-level construction, whereas what is ‘presupposed’ tends to refer to cognitive notions such as ‘representation of the world’ and ‘knowledge’ (Lambrecht 1994: 55).

1.1 Thetic

In thetic utterances, the assertion is presented as a whole, corresponding to a logically simple judgement. They correspond to event-reporting sentences, e.g. the English “Here is ME.” or the French “Maman, y’a Pierre qui m’embête !”.
Cornish (2005) characterises these utterances as follows: “[…] according to the philosophers Brentano and Marty, thetic propositions involve a ‘single judgement’ (the state of affairs denoted by the proposition is presented in one piece, so to speak) rather than a ‘double’ judgement, in which an object, a (logical) proposition or a state of affairs is first identified and then, in a second step, something is predicated of it (this latter situation would correspond to a ‘topic-comment’ utterance, or to one with contrastive focus, involving a ‘categorical’ judgement).” (p. 76; translated from the French)

1.  We understand the words “information structure” in compliance with Knud Lambrecht’s definition: “That component of sentence grammar in which propositions as conceptual representations of states of affairs are paired with lexicogrammatical structures in accordance with the mental states of interlocutors who use and interpret these structures as units of information in given discourse contexts.” (Lambrecht 1994: 5)

We use the term ‘thetic’ to refer negatively to utterances that don’t have a topic or a focus. 1.2 Topic Topics appear in topical utterances. A minimal topical utterance is characterized by a division into two intonation units: < Topic / Comment >. We use the word “topic” for what (Lambrecht 1994) defines as the ‘Topic Expression’: “A constituent is a topic expression if the proposition expressed by the clause with which it is associated is pragmatically construed as being about the referent of this constituent” (p. 131) A topic states the referent on which the comment, characterized by the notion of “aboutness”, states what is asserted: “Jean [Topic], il est drôlement costaud [Comment]”. In this work, we will further restrict the use of the word ‘topic’ as a short-cut for Argument-topic, i.e. the “disjoint lexical support” of the utterance (Morel & Danon-Boileau 1998). The Argument Topic, or Topic proper, is to be differentiated from left-dislocated circumstantials which include Time and Place adverbials, conditions, etc. In the literature, these are often treated together with topics, but they merely set the circumstantial frame for the following predication. In this study, we will use the term “Frame” (as a short-hand for Frame-setting Topic) to set them apart from the Topic (as a short-hand for Argument Topic). A topic need not be integrated syntactically into the predication, e.g. the following examples taken from (Furukawa 1996: 25) where the topic is italicized: “Oh, tu sais, moi, la bicyclette, je n’aime pas me fatiguer”; “Oh, euh, mais tu sais, le metro, avec la carte orange, tu vas n’importe où.” A more complex topical utterance will have either more than one topic and/or include a focus inside the comment, e.g. moi in the following example “Non, la cuisine, c’est moi qui la fais.” (Lambrecht 1994: 293). 
1.3 Focus A focal utterance is a complex syntactic construction where a predication is given as a preconstruct falling outside the scope of the assertion. Out of this predication, an element is selected and identified as the relevant element that fills the gap created by the extraction out of the predication. As a result we have two predications that are syntactically linked: a qualitative identification of the focus expression;


and a ‘classical’ predication which is preconstructed (Caron 2000; Robert 1993). The assertion of the utterance bears on the identification of the focus expression, e.g. “C’est jean qui est venu.”, where “qui est venu” = “( ) est venu” is the preconstruct (“someone has come”), and “C’est jean” identifies “Jean” with the “someone” who has come. As stated in Lambrecht (1994: 224), the word “argument” in “argument focus” is used as a cover term for any non-predicative term in the proposition, including place, time and manner. Included in focal utterances are Wh-Questions, as opposed to Yes/No-Questions.

1.4 Summary

As a summary, our typology distinguishes between:

1. Topical utterances, divided into a topic expression and a comment; the assertion bears on the predication inside the comment; the topic is merely stated;
2. Focal utterances, divided into a focus and a preconstruct; the assertion bears on the identification of the focus expression; the predication is preconstructed;
3. Thetic utterances, which don’t have a focus or a topic; the assertion bears on the whole utterance.

For the sake of feasibility, we will limit the scope of this paper to the study of Foci, Topics (disjoint lexical supports) and Frames (left-dislocated circumstantials), as these share certain intonational properties with Topics.

2. The intonation of Topic and Focus in Zaar

2.1 Zaar prosodic system

Zaar is a tone language with three phonemic tones: High (written with an acute accent: á), Mid (left unwritten: a) and Low (written with a grave accent: à). Two contour tones result from the combinations High-Mid and Low-Mid on a single syllable, i.e. resp. Falling (written with a circumflex accent: â) and Rising (written with a caron: ǎ).
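The tone-marking convention just described can be made explicit with a short sketch. It is only an illustration of the orthographic convention, not a tool used in the study: the helper names `syllable_tone` and `word_tones` are hypothetical, the input is assumed to be already divided into syllables, and a syllable carrying more than one tone diacritic is labelled by the first one found.

```python
import unicodedata

# Combining diacritics used in the transcription, mapped to tone labels.
TONE_MARKS = {
    "\u0301": "High",     # acute accent: á
    "\u0300": "Low",      # grave accent: à
    "\u0302": "Falling",  # circumflex accent: â
    "\u030C": "Rising",   # caron: ǎ
}


def syllable_tone(syllable: str) -> str:
    """Return the tone label of one transcribed syllable (hypothetical helper)."""
    # NFD splits precomposed letters such as "á" into base letter + combining mark,
    # so precomposed and decomposed spellings are treated alike.
    for char in unicodedata.normalize("NFD", syllable):
        if char in TONE_MARKS:
            return TONE_MARKS[char]
    # The Mid tone is left unwritten.
    return "Mid"


def word_tones(syllables):
    """Label each syllable of an already syllabified word (assumed input format)."""
    return [syllable_tone(s) for s in syllables]


# The proper noun of example (1), syllabified as in its pitch track.
print(word_tones(["kè", "rèŋ", "kéː", "ʃe"]))
```

Normalising to NFD first is a design choice that makes the lookup independent of how the transcription happens to be encoded.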




2.1.1 Neutral intonation pattern and declination

For both tone and non-tone languages, declination has been presented as a universal tendency due to physiological constraints,2 linked to the energy used to expel pulmonic air through the vocal organs. This creates the background for a “neutral” intonation against which variations of pitch by the speaker can be interpreted as meaningful patterns of deviation.3 In Zaar, declination can be observed from the unit up to the period, as a gradual lowering of the pitch over the utterance. The “neutral” intonation pattern is characterized in Zaar by a combination of declination and a final fall. This intonation pattern obtains for all types of sentences: assertions (see (ex. 1) for positive assertion, and (ex. 2) for negative assertion), Yes/No-Questions (see ex. 3), Wh-Questions (see ex. 4):

(1)

sə̂m gwón tu kèrèŋkéːʃe // sə̂m gón tu kèrènkéːʃe // name.POS some OPN Kerenkeshe // One was named Kerenkeshe. (SAY_BC_NARR_02_SP1_05)

[Figure: pitch track for example (1), with syllable labels səm gwón tu kè rèŋ ké ʃe; axes: Pitch (Hz) against Time (s), 0–1.52 s.]

(2)

á lə̌ːrmí ŋáːwôs mə́nɗi mə̀ jèlí oː // á lə́ːr =mí ŋaː =wôs mə́n -ɗi 3SG.AOR.SBJ bring =1PL.OBJ son =3SG.POS BEN -DIR mə̀ jel -i -oː // 1PL.AOR.SBJ see -SPCF FCT // He has brought his son for us to see. (SAY_BC_CONV_02_SP2_029)

2.  “ (…) F0 tends to decline over the course of phrases and utterances, both in tone languages and in languages like English or Dutch.” (Ladd 1996: 73ff.) 3.  However, see (Bearth 1998) on Toura, a four-tone African language where declination is limited to local tonal downstep.


[Figure: pitch track for example (2), with syllable labels; axes: Pitch (Hz) against Time (s), 0–1.904 s.]

(3) mə̀ ʧì kêngájaː //
mə̀ ʧi kéni =káj =aː //
1PL.SBJV eat forward =ANAPH =NASS //
Shall we go on? (SAY_BC_CONV_02_SP2_091)

[Figure: pitch track for example (3), with syllable labels; axes: Pitch (Hz) against Time (s), 0–0.9675 s.]

This neutral pattern remains unchanged by the specialized assertion markers which can appear at the end of the utterances: -oː for emphatic assertions (ex. 2); -aː for non-assertions, e.g. Yes/No-Questions, which are always associated with this particle (ex. 3); -eː for Wh-Questions (ex. 4).

(4)

má fítə̀ wúrɣəníjeː //
má fi =tə wúri =kəní -eː //
1PL.FUT do =3S.OBJ how? =COP2 -QUEST //
How shall we do? (SAY_BC_CONV_02_SP2_157)

[Figure: pitch track for example (4), with syllable labels; axes: Pitch (Hz) against Time (s), 0–1.207 s.]

Simple Wh-Questions (without the final -eː particle seen above in ex. 4) are merely characterised by declination, without a final fall, e.g. (ex. 5):

(5) má fí wuri //
má fi wuri //
1PL.FUT do how //
How shall we do? (SAY_BC_NARR_01_SP1_683)

[Figure: pitch track for example (5), with syllable labels; axes: Pitch (Hz) against Time (s), 0–0.7254 s.]

2.1.2 Exceptions to declination

There are three types of exceptions to declination in Zaar, which are associated with: (i) suspensive intonation; (ii) utterance-final (ideophonic) adverbials; (iii) rhetorical questions expressing surprise or irony. Suspensive intonation, characterized by the absence of final fall and a plateau at the end of the unit, can be observed, e.g. in exclamations:

(6) à múrín múrín //
á múr -ə́n múr -ə́n //
ah man -PROX man -PROX //
Ah, this man, this man! (SAY_BC_CONV_02_SP2_178)

[Figure: pitch track for example (6), with syllable labels; axes: Pitch (Hz) against Time (s), 0.0026–0.7415 s.]

Suspensive intonation can be observed at the end of units in utterances consisting of lists, such as (ex. 7) with a list of proper nouns: (7)

sə̂m gón tu dàːgùláw / (461) sə̂m gón tu vwàːgàní / (1446) < ʃíː kêː náŋ > // sə̂m gón tu dàːgùláw / sə̂m gón tu vwàːgàní / name some COMP Dagulau / name some COMP Vwagani / < ʃíːkèːnán > // < shikenan > // One was named Dagulau, one was named Vagani, that’s it. (SAY_BC_ NARR_02_SP1_07–11)

[Figure: pitch track for example (7), with syllable labels and pauses of 461 ms and 1446 ms between the units; axes: Pitch (Hz) against Time (s), 0–5.317 s.]

The intonation pattern characteristic of utterance-final adverbials finishes in a strong rise (High-rise), realised as an extra-High tone, pronounced outside the range of normal pitch variation. This is used for adverbials such as ideophones and adjectival ideophones, e.g. (ex. 8), with extra-high pitch at 231 Hz on the second syllable of ʧolʧól, ‘very smooth’, compared to an average of 168 Hz for the rest of the utterance.

(8) tôː ɮǐːwôs ʧolʧól //
tòː ɮìː =wos ʧolʧól //
well body =3SG.POS very_smooth //
Well, his body is very smooth. (SAY_BC_CONV_02_SP1_118)

[Figure: pitch track for example (8), with syllable labels tò ɮì =wos ʧol ʧól; axes: Pitch (Hz) against Time (s), 0–1.334 s.]

In rhetorical questions expressing surprise and/or irony, the utterance finishes in a Rise-Fall, i.e. an extra-High tone followed by a Fall, e.g. (ex. 9), in the rhetorical question ʧikâː, ‘Is that so?’, pronounced by a male speaker, culminating at 245 Hz, compared to an average of 146.5 Hz over the next utterance tu èː, ‘He said yes.’, pronounced by the same speaker in the same example.

(9)

ʧiɣâː // tu èː // ʧík =aː // tu èː // thus =NASS // OPN yes // Is that so? He said yes. (SAY_BC_NARR_01_SP1_046–7)

[Figure: pitch track for example (9), with syllable labels; axes: Pitch (Hz) against Time (s), ending at 3.216 s, with a boundary between the two utterances at 2.482 s.]

2.1.3 Register

We have seen in the previous section that sentence types (assertions, questions) have no influence on the “normal” intonation pattern of Zaar, which consists of a declination ending in a fall. Yet we need to explore the possibility of a difference in register between, for example, assertions and questions, with questions having a higher overall register than assertions.4 This might be the case if we compare the average pitch of the question (221 Hz) to that of the answer (171.8 Hz) in (ex. 10):

(10) á ndârá ŋâː // á ndârá //
á ndará ŋâː // á ndará //
3SG.AOR be_proper QUEST // 3SG.AOR be_proper //
Isn’t it good? It’s good. (SAY_BC_CONV_02_SP2_032/SP1_037)

[Figure: pitch track for example (10), with syllable labels; axes: Pitch (Hz) against Time (s), 0–1.307 s.]

However, comparing the average pitch of a few declarative and interrogative sentences from the same female speaker in the second file of the corpus (SAY_BC_Conv_02) gave a negative answer. The pitch of questions varies between 209 and 221 Hz, whereas assertions vary between 172 Hz and 312 Hz, for an overall average of 226.76 Hz. The three assertions above the average (240 Hz, 306 Hz, 312 Hz) all relate to a passage where the female speaker gets carried away when criticizing the laziness of men, compared to the excessive workload of women. The high pitch of the question in (ex. 9) should not be attributed to the fact that it is a question, but to the ironic content of the rhetorical question. Overall differences in register in whole utterances are associated with emotional, inter-subjective relationships rather than with sentence types.

However, differences in register are used utterance-internally, as a demarcative exponent setting the limit between intonation units. See (ex. 11), consisting of four units: the first with a relatively high register and hardly any declination (259 Hz, compared to the average pitch of the speaker: 226.76 Hz); the second with a much lower register (176.5 Hz); the third (206.7 Hz) and fourth (194.5 Hz) following a normal declination after a pitch reset.

(11) mǎːm móːmi kúmá / ɮàmɗì gòsɗìːːː / ʧáː fini gòs / < koyarwa makaranta > //
maːm kə́ móːmi kúmá / ɮam -ɗi gòs -ɗi -ːːː /
mum POS Momi too / return -CTP 3SG.POS -CTP -LENGTH /
ʧáː fi -ni gòs / < koyarwa makaranta > //
3SG.IPFV do -INCH 3SG.POS / < teaching school > //
As for Momi’s mother, the place where she goes, what she does is, teaching children in school. (SAY_BC_CONV_02_SP1_023–6)

4.  See for example (Newman 2000: 613) on the intonation of Y/N-questions in Hausa, characterised by suspension of declination and an overall higher pitch.

[Figure: pitch track for example (11); axes: Pitch (Hz) against Time (s), 0–3.485 s.]
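The register comparison for (ex. 11) can be restated as a small calculation. The sketch below takes only the mean pitch values quoted in the text and expresses each intonation unit's distance from the speaker's overall average, both in Hz and in semitones. The semitone conversion, 12·log2(f/ref), is a standard way of comparing pitch intervals and is an assumption of this illustration, not a measure reported in the study; the function names are hypothetical.

```python
import math

# Values quoted in the text for (ex. 11).
SPEAKER_MEAN_HZ = 226.76                    # overall average pitch of the speaker
IU_MEANS_HZ = [259.0, 176.5, 206.7, 194.5]  # mean pitch of the four intonation units


def semitones(f_hz: float, ref_hz: float) -> float:
    """Interval between f_hz and ref_hz in semitones (12 semitones = one octave)."""
    return 12.0 * math.log2(f_hz / ref_hz)


def register_offsets(unit_means, speaker_mean):
    """Return (offset in Hz, offset in semitones) for each unit (hypothetical helper)."""
    return [(f - speaker_mean, semitones(f, speaker_mean)) for f in unit_means]


for index, (hz, st) in enumerate(register_offsets(IU_MEANS_HZ, SPEAKER_MEAN_HZ), start=1):
    print(f"IU {index}: {hz:+.2f} Hz, {st:+.2f} semitones relative to the speaker mean")
```

On these figures only the first unit lies above the speaker's average, and the largest step separates the first unit from the second, which is the register change described above.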

Together with pause, length and pitch reset, change in register is one of the exponents of the intonation associated with topics.

2.2 Focus

Focus is expressed in Zaar through a cleft construction involving left-dislocation, and identification of the focus with either of the two ‘be’ copulas: the independent particle nə (Foc1), or the enclitic particle =kən or one of its allomorphs (=kəndí, =kəndá) (Foc2), or both (Foc3). The relativizer ɗan can optionally be associated with the Foc1 construction. This gives four different syntactic structures, the last of which combines Foc1 and Foc2:

1. Foc1a : < nə NP > Predication
2. Foc1b : < nə NP ɗan > Predication (ex. 12)
3. Foc2 : < NP=kən > Predication (ex. 13)
4. Foc3 : < nə NP=kən > Predication (ex. 14)

These structures have a negative counterpart when combined with the sentence-final negative particle hə́ŋ, which can be completed by the optional loanword bàː, borrowed from Hausa, preceding the particle nə. The result is the structure (bàː) nə … hə́ŋ, as can be seen in (ex. 14). The resulting focal utterances are realized as a single intonation unit, with the standard pattern characterised by declination and final Fall. There is no intensity stress on the focus, or pause between the left-dislocated focus and the predication.

(12) nə mjàːní ɗaːŋ fu sə́mwòpíːjoː //
nə mjàːní ɗan àː fu sə̂m =wopm -íː -oː //
COP1 1PL REL2 3SG.PFV say name =1PL.POS -RES -ASS //
We are the ones whose name he called. (SAY_BC_CONV_02_SP2_221)

[Figure: pitch track for example (12); axes: Pitch (Hz) against Time (s), 0–1.721 s.]

(13)

tákwâːràs ŋátá mán tum / tákwâːràs =kən átâ man tu =mə / Takwaras =COP2 3SG.REM come meet =1SG.OBJ / […] it’s Takwaras who came to meet me […] (SAY_BC_CONV_03_ SP1_695)

[Figure: pitch track for example (13); axes: Pitch (Hz) against Time (s), 0.048–1.057 s.]

(14)

bàː nə maláːri gòpíːɣəndá tə̀tà wul vìː gíː hə́ŋ // bàː nə malâːr -i gòpm -íː =kən NEG1 COP1 Malar -INDF 1PL.POS -RES =COP2 tə̀tà wul vìː gíː hə́ŋ // 1PL.REM say mouth DIST NEG2 // […] it’s not our people of Malar who are speaking like this. (SAY_BC_ CONV_03_SP1_698)

[Figure: pitch track for example (14); axes: Pitch (Hz) against Time (s), 0–1.875 s.]

2.3 Topic Two types of topics exist in Zaar: specified topics which are followed by a topic particle (called modal particle in Chadic linguistic tradition), and unspecified topics. 2.3.1 Unspecified topics Unspecified topics are left-dislocated, and correspond to an intonational unit characterized by various exponents separating the topic from the comment. The two main exponents that are always present are: suspension of declination, followed by a pause. These two exponents can be reinforced by a lengthening of the last segment of the topic, pitch reset and/or change of register. In (ex. 11), the second and third topics, ɮàmɗì gòsɗìːːː ‘the place where she goes’ and ʧáː fini gòs ‘what she does’ are unspecified topics, separated by a change in register. The third topic is followed by a pitch reset. The first topic mǎːm móːmi kúma ‘as for Momi’s mother’ is a topic specified by the discourse particle kuma ‘as for’. 2.3.2 Specified topics When topics are followed by a modal particle, e.g. kàm, ‘indeed’; máː, ‘even’, kúma, ‘too, as for’, there appears none of the elements characterizing unspecified topics, and the utterance constitutes a single major unit with overall declination and final fall. This is the case in (ex. 15) where the topic kotá lǎː mə́mmonʧí ‘all the men’s work’ is specified by the discourse particle máː ‘even’:




(15) kotá lǎː mə́monʧí máː mjǎː fíɣə́nì bát //
kotá laː kə́ mə́monʧí máː mìká fi -kə́nì bát //
all work POS man \PL even 1PL.CONT do -NMLZ all //
All the men’s work even, we do it all. (SAY_BC_CONV_02_SP2_278)

[Figure: pitch track for example (15); axes: Pitch (Hz) against Time (s), 0–2.12 s.]

As can be seen, the intonation of specified topics is not different from that of either thetic or focal utterances. This could mean that modal particles introduce some sort of focus in the sentences where they appear. However, it should be emphasised that specified topics are different from focus. The argument comes from a neighbouring language, Hausa, which has developed a split in the verbal system between TAMs which are ± compatible with focus. Wh-Questions and Argument-focus require a TAM that is compatible with focus, whereas topics, whether specified or not, do not require such TAMs. This can be transposed to the function of topic particles in Zaar, which have all been borrowed from Hausa.

2.3.3 Frames

The intonation of frames (circumstantial elements which belong to the left-dislocated part of the utterance) is the same as that of unspecified topics, i.e. they are accompanied by suspension of declination and followed by a pause. A good example of a frame is provided in (ex. 16) by the condition jâːn nə ŋaː gə̀t ‘if it is a girl’:

(16)

jâːn nə ŋaː gə̀t / (247) wò somŋgə nə́ wút bàɬkə̀nì // jâːn nə ŋaː gə̀t / if COP1 young woman / wò som =kə nə́ wul =tə baɬ -kə́nì // 3SG.FUT help =2SG.OBJ for say =3S.OBJ tend -NMLZ // If she is a girl, she will help you with minding the fire. (SAY_BC_ CONV_02_SP2_290–2)

[Figure: pitch track for example (16); axes: Pitch (Hz) against Time (s), 0–2.44 s.]

2.4 Conclusion

Zaar intonation can be characterized by the following elements:

1. There is no correlation between sentence types, i.e. affirmative or interrogative, and intonation patterns. Sentence types are marked by syntax and morphology, not by intonation.
2. Register does not play a role in identifying sentence types, but is used as a demarcation device for intonation units.
3. Two generic intonation patterns emerge: the first one is associated with thetic and focal utterances, characterized by declination and final fall; the second one is associated with unspecified topics and frames. It is minimally characterized by suspension of declination and a pause between the topic and the comment.
4. Minor intonation patterns have been identified. These are List-intonation and Exclamation, characterized by suspension of declination at the end of the unit; High-rise, associated with utterance-final ideophones and quality adverbials; Rise-Fall, associated with rhetorical questions conveying surprise or irony.

3. The intonation of Topic and Focus in Tamasheq

Tamasheq is a Berber language (Afroasiatic phylum), spoken in the most arid desert parts of five different countries: mainly in Mali, the Niger Republic and Algeria, and, to a lesser extent, Libya and Burkina Faso.




This paper, and more generally the Tamasheq part of the CorpAfroAs project, describes Tawellemmet, a variety of Tamasheq spoken in the West of the Niger Republic: it is based on data collected near Abalak (West Niger). Although Tamasheq is a fairly well-described language, its prosodic elements are largely underdescribed, as are the information structures of focus and topic. Our aim here is to give an outline of the accentual and intonational systems in Tamasheq, and to see the part played by intonation in focus and topic.

3.1 Tamasheq prosodic system

3.1.1 Accent and general intonational contour

Tamasheq is one of the few Berber languages which has an accent, essentially demarcative, i.e. used to identify accentual unit boundaries. It regularly appears on the antepenultimate syllable of nouns or verbs (including clitic elements).5 However, this demarcative accent can change position due to lexical or morphological constraints. For example, if a noun ends with a consonant, the last syllable becomes bi-moraic and the accent therefore falls on the penultimate syllable. Morphological considerations can also affect the rules of default accentuation: for example, resultative and imperfective verbal aspects have their own accentual patterns; possessive clitics, moreover, attract accent, and disturb the default demarcative rules. Finally, some nouns have their own accentual pattern, which is lexical.

As for the other languages of this study, the most frequent intonational pattern in Tamasheq is a falling one, characterized by a regular lowering of the pitch, typical of declarative statements, whether positive (ex. 17) or negative, and Wh-Questions (ex. 18):

(17)

tssɐn ˈɣaz ˈdat awa // t- əssɐn ɣas dat awa // 2SG- know\PFV only before SG.M.PROXb.IDP // You knew that before. (TAQ_CL_NARR_03_003)

5.  For further precisions, cf. (Heath 2005), (Louali & Philippson 2005), (Lux & Philippson 2010).

[Figure: pitch track for example (17); axes: Pitch (Hz) against Time (s), 0–0.9383 s.]

To be more precise, in an unmarked declarative utterance, F0 normally rises up to the first demarcative accent, the pitch peak of the whole utterance, and then begins to fall regularly down to the end of the IU. In the example above, the pitch peak of the whole utterance is the vowel of the first accented word: the adverb ɣas ‘only’. In (ex. 18), a Wh-Question, the pitch peak is the second syllable of the utterance, the lexical part zdɐj of the verb izdɐj ‘he recognized’. (18)

iˈzdɐj aˈdəriz ˈmani wa //
i- əzdɐj adəriz mani wa //
3SG.M- recognize\PFV track\ABS.SG.M which.Q SG.M.PROXb //
Which track did he recognize? (TAQ_CL_NARR_03_085)

[Figure: pitch track for example (18), from file TAQ_CL_NARR_03_MOUSSATRACES, with syllable labels i zdɐj a də riz ma ni wa; axes: Pitch (Hz) against Time (s), 91.91–92.52 s.]

This general intonational pattern interacts with the perception of accent, as the power of accentuation follows the general intonational contour of the IU: the higher the F0 in the intonational pattern, the more audible the accent is, and vice-versa.




For a neutral declarative sentence, i.e. with an intonation pattern characterized by declination over the IU, F0 rises sharply with the first demarcative accent, which constitutes the pitch-peak of the IU. The next demarcative accents follow the declination of the intonation pattern, and, if further rises in F0 are possible, their absolute value is lower, and they get weaker and weaker, becoming almost inaudible at the end of the IU: there is a sort of accentual declination (see ex. 17 & 18).

3.1.2 Particular accentual contours

While in Tamasheq the neutral intonational pattern is falling (a universal trend in languages), we can often notice a rise of F0 at the end of an intonation unit, which corresponds to a suspensive intonation. In that case, the general intonational pattern is falling, but F0 rises at the very end of the IU, usually on the last syllable of the last term of the IU (or on the last two syllables), as we can observe in (ex. 19):

(19)

uˈhun nəˈggiːːːlɐt za / uhun n- əggilɐt za / then 1PL- move_on\RSLT hence / So, we were moving… (TAQ_CL_NARR_005_15)

[Figure: pitch track for example (19); axes: Pitch (Hz) against Time (s), 0–0.9766 s.]

We find the same intonational shape for enumeration, in which the rise of F0 at the end of the intonation units is accompanied by a lengthening of the last vowel of the IU, as we can see here in (ex. 20): (20)

təẓẓurt ən ˈtədiːːːst / təẓẓurt ə̃ ˈɣɐssaːːːn / 438 əˈruruːːː / t- əẓẓur -t n t- ədis -t / F- pain\ABS -F.SG GEN F- belly\ANN -F.SG / t- əẓẓur -t n ɣɐs -an / əruru / f- pain\abs -f.sg gen bone\ann -pl.m / back\ann.sg.m / Pain of stomach, pain of bones, (of) back… (TAQ_CL_CONV_02_05)

Bernard Caron, Cécile Lux, Stefano Manfredi and Christophe Pereira

[Figure: F0 curve for (ex. 20); axes: Pitch (Hz) against Time (s).]

Another exception to declination is found in Yes/No-questions, where F0 rises at the end of the unit, as we can see in (ex. 21):

(21) tsɐla //
     t- əsla -ɐ //
     2SG- hear\PFV -2SG //
     Do you understand? (TAQ_CL_CONV_01_102)

[Figure: F0 curve for (ex. 21); axes: Pitch (Hz) against Time (s).]

Finally, among falling IUs, we can distinguish a sub-category which is typical of narratives: regularly, the entire group beginning an utterance is pronounced in a very high-pitched voice, and only then does F0 begin to go down. In that case, we can no longer identify a single pitch-peak for the IU, as a whole group plays this role (ex. 22). What we have here is a device meant to make the text livelier, and it is used by all our speakers. It somewhat modifies the neutral intonational pattern, even if the general curve remains a falling one.

(22) əˈglaːndu ˈttagɐn ˈəʃʃahɐj //
     əglaː -ɐn =du ad t- aggu -ɐn əʃʃahɐj //
     leave\RSLT -3PL.M =PROX POT IPFV- do\IPFV -3PL.M tea\ABS.SG.M //
     Then, they left to make some tea. (taq_cl_narr_03_073)

[Figure: F0 curve for (ex. 22); axes: Pitch (Hz) against Time (s).]

In (ex. 22) the general intonational contour is falling, but the first part of the utterance, əglaːndu ‘they left’, is pronounced at an extra-high level (F0 average of 360Hz for this chunk; 271Hz min. / 434Hz max.) and is separated from the rest of the IU, which is pronounced in a ‘normal’, much lower voice (F0 average of 182Hz for this chunk; 113Hz min. / 294Hz max.).

Thus, Tamasheq has several intonational patterns, depending on sentence type, but the neutral intonational pattern is falling. Specific constructions, notably focus constructions, may also change this basic pattern. These constructions are marked by their intonation as well as by their morphology and syntax. However, although Tamasheq uses different morpho-syntactic structures to differentiate between subject, object and predicate focus, these structures share the same intonation pattern.

3.2 Focus

3.2.1 Subject and Object Focus in Tamasheq

In Tamasheq, focus is expressed by morphological, syntactic and intonational means. Syntactically, the focus is left-dislocated: while neutral word order in Tamasheq is considered to be VSO, the normally post-verbal nuclear arguments move to clause-initial position when focused (Heath 2005). Morphologically, three exponents are associated with focus. The first exponent is the morpheme a, originally a neutral demonstrative pronoun, which follows the focused term. The second exponent affects the verb: when the subject is focused, the verb appears in a dependent form, traditionally called ‘participle’, mainly used in subject relative clauses. This use of a dependency marker can be understood as an exponent of the preconstructed status of the predication.


The third exponent is the use of the ‘absolute state’. When the subject argument of a verb appears in its default position after the verb, and if its morphological shape allows it, it carries a mark of dependency. If, on the contrary, it is placed before the verb, this mark of dependency disappears and the noun occurs in the ‘absolute state’. Thus, when the subject argument is focused, it is placed before the verb and occurs in the absolute, rather than annexed, state. The morphosyntactic exponents of argument focus in Tamasheq are summed up in the following table:

Non Focus        V                      S[Annexed State]    O
Object Focus     O a                    V                   S[Annexed State]
Subject Focus    S[Absolute State] a    V[Participle]       O

Last but not least, in Tamasheq, focus has a specific intonational contour. We notice a pitch peak on the focused term, where a regular demarcative accent would be expected, combined with peaks in the intensity curve, both on the accented syllable of the focused term and on the morpheme a. (Ex. 23) illustrates these three types of features for the subject argument focus fɐlan: morphological (underlined in the glosses), syntactic and intonational.

(23) ˈfɐlan a eṃosɐn ɐˈwedɐn əʔ# /
     fɐlan a i- əṃos -ɐn ɐwedɐn əʔ# ##
     fellan foc sg.m.rel.sbj- be\pfv -sg.m.rel.sbj person\abs.sg.m əʔ# ##
     It’s Fellan who was someone… (TAQ_CL_NARR_02_02)

[Figure: F0 curve for (ex. 23); axes: Pitch (Hz) against Time (s).]

As far as intonation is concerned, we observe that the pitch-peak of the IU is the first syllable of the focus fɐlan ‘Fellan’, combined with intensity peaks on the same syllable of the word fɐlan and on the morpheme a. In this case, as quite often, the left-dislocated focus is the first element of the IU, so that the pitch-peak on the focus can also be explained by its position in the intonational contour. But this is not always the case: in spontaneous discourse, even though the focus is left-dislocated, we can find a preamble preceding the focus in the same IU, such as the introductory verb orde ‘I think’ in (ex. 24), where the subject is focused.

(24) orˈde ˈʃʃis a eˈṃɔsn aməˈggergɨs //
     ordaː -ɐ as ʃi =s a i- əṃos -ɐn aməggergəs //
     believe\RSLT -1SG that father\SG.M =3SG.M.POSS.KIN FOC SG.M.REL.SBJ- be\PFV -SG.M.REL.SBJ rich_man\ABS.SG.M //
     I think that his father was rich. (TAQ_CL_NARR_03_127)

[Figure: F0 curve for (ex. 24); axes: Pitch (Hz) against Time (s).]

In this case, orde ‘I think’ bears a pitch-peak (233Hz) as the first term of the IU, but the highest pitch-peak is on the focus ʃis ‘his father’ (247Hz), even though it appears in second position in the IU. Moreover, while the mean intensity of the utterance is 61dB, we notice an intensity peak of 68.03dB on ʃis ‘his father’ and another peak of 68.81dB on the focus morpheme a. Thus, focus is marked not only by heavy morpho-syntactic means, but also by prosodic means (an increase in F0 and intensity), even when the focus is not at the beginning of the IU. Focus supersedes the declination characterizing the neutral affirmative intonational pattern.

Finally, when we compare the behaviour of focus and intonation across Berber languages, we find that although specific intonational contours are a common means of expressing focus, the conditions under which F0 varies are specific to each language. According to Mettouchi (2003), in Kabyle the rise of F0 in contrastive focus constructions occurs on the focus morpheme a, and not on the focused term as in Tamasheq. This type of difference is to be expected, as each Berber language has its own prosodic system (unlike Tamasheq, Kabyle has no demarcative accent, for example), and it confirms the need for prosodic systems to be described accurately for each Berber language.
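The prominence figures reported for (ex. 24) can be put on speaker-independent scales: a pitch interval in semitones and an intensity excursion above the utterance mean. The following is only a minimal sketch of that arithmetic, not the authors' procedure; the function names `semitones` and `excursion_db` are hypothetical, and the only data used are the values quoted in the text.

```python
import math


def semitones(f_high_hz: float, f_low_hz: float) -> float:
    """Interval between two F0 values in semitones (12 per octave)."""
    return 12.0 * math.log2(f_high_hz / f_low_hz)


def excursion_db(peak_db: float, mean_db: float) -> float:
    """How far an intensity peak rises above the utterance mean, in dB."""
    return peak_db - mean_db


# Values quoted in the text for (ex. 24); nothing else is assumed.
PREAMBLE_PEAK_HZ = 233.0    # pitch-peak on orde 'I think'
FOCUS_PEAK_HZ = 247.0       # pitch-peak on the focus 'his father'
MEAN_INTENSITY_DB = 61.0    # mean intensity of the utterance
FOCUS_PEAK_DB = 68.03       # intensity peak on the focused term
MORPHEME_A_PEAK_DB = 68.81  # intensity peak on the focus morpheme a

focus_over_preamble_st = semitones(FOCUS_PEAK_HZ, PREAMBLE_PEAK_HZ)
focus_excursion = excursion_db(FOCUS_PEAK_DB, MEAN_INTENSITY_DB)
morpheme_excursion = excursion_db(MORPHEME_A_PEAK_DB, MEAN_INTENSITY_DB)

print(f"focus vs preamble: {focus_over_preamble_st:.2f} semitones")
print(f"focus intensity excursion: {focus_excursion:.2f} dB")
print(f"morpheme a intensity excursion: {morpheme_excursion:.2f} dB")
```

On these figures the focus peak lies about one semitone above the preamble peak, while both intensity peaks exceed the mean by roughly 7 to 8 dB. A semitone scale is used here only because it makes intervals comparable across speakers with different pitch ranges.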


3.2.2 Predicate Focus: Different construction, same intonation

Predicate focus in Tamasheq uses a construction that is similar to, but more complex than, argument focus. In this case, the semantic value of the action is carried by a focused verbal noun followed by the conjugated verb iga ‘to do’, which has lost its semantic value. In these constructions, the morpheme a, a morphological mark of focus, is not obligatory. The main characteristics of predicate focus are summed up in the following table, set against Non-Focus sentences:

Non Focus          V                                  S[Annexed State]    O
Predicate Focus    Verbal Noun[Absolute State] (a)    V iga               O

Robert (1993: 45) presents an interesting analysis of this construction for various languages, arguing that this kind of construction shows “une dissociation entre deux fonctions généralement confondues dans le verbe : celle de centre syntaxique (ici assumée par le pro-verbe) et celle de centre assertif (noyau rhématique exprimé par la forme nominale du verbe)” [a dissociation between two functions that are generally merged in the verb: that of syntactic centre (here taken on by the pro-verb) and that of assertive centre (the rhematic nucleus expressed by the nominal form of the verb)]. The pro-verb iga ‘do’ takes on the function of syntactic nucleus of the sentence, while the assertive function is moved out of the verbal nexus and taken on by the morpheme a and by the intonational stress highlighting the rhematic value of the Verbal Noun. However unusual the predicate-focus structure may be for Tamasheq as compared to argument-focus structures, it still shares with them the same intonation pattern, e.g. in (ex. 25), with a rise of F0 on the accented syllable of the focused item abzug ‘madman’ and two intensity peaks, one on the accented syllable of the focused item (77.80dB) and one on the morpheme a (79.20dB), for an average intensity of 70.47dB.

(25) kɐjju ˈabzug a tiˈgɐ ˈnnɐnas //
     kɐjju abzug a t- iga -ɐ ənna -ɐn =as //
     2SG.M.SBJ.IDP madman\ABS.SG.M FOC 2SG- do\PFV -2SG say\PFV -3PL.M =3SG.DAT //
     You, a madman you are, they said to him. (TAQ_CL_NARR_03_109)



[Figure: F0 curve for (ex. 25); axes: Pitch (Hz) against Time (s).]

We can see that focus constructions are heavily marked in Tamasheq, by morphological and syntactic elements as well as by intonation. Yet, although three morpho-syntactic features enable us to distinguish between three slightly different focus constructions, depending on the syntactic function of the focus (Predicate, Subject, Object), these three structures share the same intonation pattern. Contrary to focus constructions, topic constructions in Tamasheq are lightly marked from a morpho-syntactic point of view. Moreover, as appears in the pilot corpus, they carry no specific intonational pattern.

3.3 Topic

As already mentioned in the introduction, very few descriptions of Berber languages give an overview of topicalization. However, in the available literature (Heath 2005; Lafkioui 2010; Kossmann 2011), the topic is minimally characterized by left-dislocation: “topicalized elements are put in the left periphery of the sentence” (Kossmann 2011: 132). In Heath’s opinion, the topic is even “almost external to the clause” (Heath 2005).

Apart from left-dislocation, two more elements are associated with the topic. First, as it is left-dislocated, the topic, like the focus, occupies a preverbal position. Thus, if it is a subject argument, it appears in the ‘absolute state’ form, as opposed to non-topicalized post-verbal subjects, which are in the ‘annexed state’. However, this morphological distinction between topicalized and non-topicalized subjects is possible only if the topic is a subject and if the morphological category of the noun allows that change. Second, according to Tamasheq grammars, the topic is very often taken up by a pronominal element in the second part of the utterance. However, the use of such a resumptive pronoun is optional for topics, and it is very rare in the Tamasheq CorpAfroAs corpus of spontaneous discourse.

As for intonation, unlike in other Berber languages, topic in Tamasheq cannot be associated with specific intonation patterns. For example, according to Lafkioui (2010), in Rifian Berber the topic is necessarily followed by an intonation cleft and by an inversion of the melody. For Tamasheq, this seems to be less obvious, even if Heath (2005) states that a topicalized element “may have a comma intonation”, which probably corresponds to a break or a suspensive intonational contour. In the CorpAfroAs data for Tamasheq, most of the time, the topic is not correlated with a specific intonation, whether a break, an inversion of the melody or a suspensive contour, as we can see in (ex. 26), where tazdit ‘the fact of recognizing’ is topicalized from an initial post-verbal object position.

(26) ˈtazdit əˈmmɐdu tɐˈga ˈɣurna //
     t- azdi -t əmmɐj =du t- ɐga -ɐ ɣur =ənɐɣ //
     F- recognition\ABS -SG.F when.Q =PROX 2SG- do\PFV -2SG at =1PL.PREP //
     This vision, when did you have it in our place? (TAQ_CL_NARR_03_096)

[Figure: F0 curve for (ex. 26); axes: Pitch (Hz) against Time (s).]

In this example, tazdit ‘recognition’ is left-dislocated, and this is the only evident exponent of its topical status. From an intonational point of view, the accented syllable of the topic constitutes the pitch peak of the whole utterance, but this is the regular behaviour of the first term of a neutral IU. Topics are often the pitch peak of their utterances, but since they are also the first terms of IUs, this is the kind of intonation that is expected. As far as intensity is concerned, there is no evidence of regularity for topics either: in these examples, either the intensity curve is quite flat, or the intensity peaks do not underline the topic, contrary to what is observed for focus. However, an optional intonational mark can be associated with topic constructions: in some topicalized constructions in our data, we have noticed a significant change of register between the topicalized term and the comment, e.g. (ex. 27).

(27) ˈaləsa taɣɐˈranet //
     aləs =a t- aɣɐra =net //
     man\abs.sg.m =proxa f- characteristics\abs.sg =3sg.m.poss //
     This man, these are his characteristics. (TAQ_CL_NARR_03_096)

[Figure: F0 curve for (ex. 27); axes: Pitch (Hz) against Time (s).]

In this topicalized nominal sentence (‘This man, these are his characteristics.’) the topic aləsa ‘this man’ is taken up by the resumptive possessive pronoun =net ‘his’. The pitch average is 252Hz for the topic, whereas it is only 174Hz for the comment. We find the same register shift in (ex. 28), which is an example of a very frequent kind of topicalization where the topic is an independent subject pronoun.

(28) ˈnɐkk ˈzinnasɐn ɐˈgɐdo ɣurwɐn ˈtazdit //
     nɐkk za i- ənna =asɐn ɐga -ɐ =du ɣur =əwɐn t- azdi -t //
     1sg.sbj.idp hence 3sg.m- say\pfv =3pl.m.dat do\pfv -1sg =prox at =2pl.m.prep F- recognition\abs -sg.f //
     As for me, he said to them, I had a vision there. (TAQ_CL_NARR_03_092)

[Figure: F0 curve for (ex. 28); axes: Pitch (Hz) against Time (s).]

In this example, apart from the register shift, we also notice the presence of the morpheme za ‘hence’ and the quotative innasɐn ‘he said to them’ between the topic and the comment.⁶ Even if za ‘hence’ is not a morpheme of topicalization, it very often appears after the topic, especially with this kind of topicalization in which the topic is an independent subject pronoun. The insertion of these different elements supports the interpretation of the topic as external to the clause in Tamasheq, as Heath stated in his grammar (Heath 2005). In this precise case, the register shift occurs after the whole group topic-za-innasɐn.

To sum up, in Tamasheq the topic is marked by various exponents: left-dislocation; the so-called ‘absolute state’ if the topic is a subject; resumptive pronouns; discourse particles, e.g. za; and an optional intonational mark, i.e. a register shift. However, among those elements, the resumptive pronouns, discourse particles and register shift are all optional. The other elements (left-dislocation and ‘absolute state’) are shared with the focus. The only defining elements for the topic are negative, i.e. the absence of the elements accompanying left-dislocation in the focus structure, which are the focus morpheme a and the heavy stress on the focused item. These differences between the topic and focus structures are probably linked to the frequency of these two constructions. Focus constructions are not frequent in Tamasheq, and they are heavily marked, while topic constructions are much more frequent and, consequently, lightly marked. This illustrates the fact that the more frequent a construction, the less marked it is (and vice versa); this is true both for morphosyntax and for intonation.

6. The pitch peak of F0 on the second syllable of the dative clitic asɐn is due to someone else speaking at the same time.

3.4 Topic and focus in the same utterance

As a last example, I want to show how topic and focus can coexist in the same utterance. First of all, the topic, as it is almost external to the clause, remains the first term of the IU, whereas the focus, much more integrated into the clause, follows the topic. We can see this in (ex. 29), in which two place names (agɐrɐjgɐrɐj ‘Agaraygaray’ and ajɐṛ ‘Ayr Mountains’) are contrasted through two topicalized independent subject pronouns (kɐjju ‘you’ and nəkkəni ‘us’).

(29) ˈkɐjjan agɐriˈgɐrɐj / ənəˈkkəne ˈɑjɐṛ ˈdu nəfəl //
     kɐjju ijan agɐrɐjgɐrɐj / nəkkəni ajɐṛ =du n- əfəl //
     2SG.M.SBJ.IDP one\SG.M Agaraygaray / 1PL.M.SBJ.IDP Ayr =PROX 1PL- leave\AOR //
     You, you are an Agaraygaray (from the middle), and we, we come from Ayr. (TAQ_CL_NARR_03_115)

[Figure: F0 curve for (ex. 29); axes: Pitch (Hz) against Time (s).]

In this example the accented syllables of the two topicalized independent subject pronouns (kɐjju⁷ and nəkkəni) at the beginning of each intonation unit constitute the pitch-peak of the occurrences, which is the regular intonation for the first term of a neutral declarative occurrence. In the second intonation unit, we notice a remarkable rise of pitch on the accented first syllable of ajɐṛ ‘Ayr Mountains’, which would be expected to be lower in a neutral declarative utterance. This rise in pitch is combined with a steeply rising intensity curve:⁸ we recognize here the typical intonational contour of focus. Moreover, ajɐṛ is left-dislocated: in a neutral word order the object would be placed after the verb. We also note the migration of the directional particle du to a position before the verb, something that is found in regular focus constructions. Even if the focus morpheme a is absent in this example, we can consider ajɐṛ ‘Ayr Mountains’ to be focused. Intonation on the one hand and left-dislocation on the other are enough to express focus: being the only two obligatory elements, syntax (left-dislocation) and intonation seem to be the two most important cues for identifying a focus in Tamasheq.

7. The sharp pitch decrease on the first syllable of kɐjju is due to a background noise.
8. While mean intensity for this IU is 70dB, the peak on this word reaches 79.8dB.

3.5 Conclusion

As far as intonational patterns are concerned, we saw that Wh-Questions and assertive sentences are not differentiated by intonation in Tamasheq: they share the same falling intonational pattern, which has been defined as the neutral one. Only suspensive intonation, including enumeration, and Yes/No-Questions present a different intonational pattern, with a rise of F0 at the end of the unit. Focus constructions have a particular intonational contour too, which is obligatory and plays an important part in their identification: these constructions, heavily marked morpho-syntactically, are also heavily marked from an intonational point of view. Topics, on the contrary, the other extraction process parallel to focus, have no specific intonation, apart from an optional register shift that can also be used in other constructions, and are only marked negatively by morpho-syntactic means. In Tamasheq, intonation, together with morpho-syntax, seems to underline the less frequent and more intricate constructions: the topic is clearly peripheral to the clause and can be easily identified; the focus, on the other hand, is more tightly integrated into the clause and has to be underlined quite heavily, by different means, so as to be perceived.

4. The intonation of Topic and Focus in Juba Arabic

4.1 Juba Arabic prosodic system

Juba Arabic is a pitch accent language in which, unlike in modern Arabic dialects, vowel length is not phonologically distinctive, whereas the position of the pitch accent distinguishes both lexical (e.g. sába ‘seven’ vs. sabá ‘morning’) and grammatical (e.g. kátulu ‘to kill’, katúlu ‘the action of killing’, katulú ‘to be killed’) meanings (Manfredi and Petrollino 2013; Manfredi and Tosco 2014).

4.1.1 Declarative sentences

The unmarked status of declarative utterances in Juba Arabic is signaled by a global declining pattern of F0. More precisely, the gradual lowering of the intonation curve in declarative sentences regularly fluctuates with the lexical high pitch accents included in a given intonation unit. In these conditions, the sentence accent corresponds to the highest pitch accent of the intonation curve. In (ex. 30), the sentence accent falls on the first syllable of the main verb dúgu ‘beat’, culminating at 149.3Hz, while the bottom of F0 corresponds to the first, non-accented syllable of the last content word kalás ‘definitely’, which falls to 78.4Hz. The following example forms a single intonation unit.

(30) u dʒa dúgu telefóːn ligó madáːris báda henák kaláːs //
     úo dʒa dúgu telefón ligó madáris báda henák kalás //
     3SG do_after beat telephone find schools start there definitely //
     He called (since) the school year has already started there. (PGA_SM_NARR_1_113)



[Figure: F0 curve for (ex. 30); axes: Pitch (Hz) against Time (s).]

4.1.2 Yes/No-Questions

In most cases, Yes/No-Questions are morphologically and syntactically unmarked.⁹ Prosody is therefore fundamental in differentiating Yes/No-Questions from declarative utterances. In contrast with the neutral intonation contour of declarative sentences, Yes/No-Questions are associated with an overall rise of the intonation curve, in which the sentence accent falls on the first syllable of the last content word of the utterance. In (ex. 31), the sentence accent corresponds to the first syllable of the phonological word [taːnuː] (resulting from the agglutination of the 2SG independent pronoun íta with the following verb ájnu ‘see’) and it culminates at 169.7Hz.

(31) minːːː wára dʒébel kudʒúr taːnuː /
     min wára dʒébel kudʒúr íta ájnu /
     from behind mountain Kujur 2SG see /
     From behind Mount Kujur, do you see? (PGA_SM_CONV_1_SP1_303)

[Figure: F0 curve for (ex. 31), with syllable labels; axes: Pitch (Hz) against Time (s).]
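The contrast between the declining contour of declaratives and the overall rise of Yes/No-Questions can be illustrated with a very crude check on an F0 track: the sign of its least-squares slope. This is a sketch under stated assumptions, not the method used in the chapter. The function names, the 5 Hz/s tolerance and both sample tracks are hypothetical; only the extreme values 149.3Hz and 78.4Hz (ex. 30) and 169.7Hz (ex. 31) come from the text, and all time stamps and intermediate F0 values are invented for illustration.

```python
from statistics import mean


def f0_slope(samples):
    """Least-squares slope, in Hz per second, of (time_s, f0_hz) pairs."""
    t_bar = mean(t for t, _ in samples)
    f_bar = mean(f for _, f in samples)
    num = sum((t - t_bar) * (f - f_bar) for t, f in samples)
    den = sum((t - t_bar) ** 2 for t, _ in samples)
    return num / den


def classify_contour(samples, tolerance=5.0):
    """Label a track 'declining', 'rising' or 'level' from its overall slope.

    `tolerance` (Hz/s) is an arbitrary illustrative threshold, not a value
    taken from the chapter.
    """
    slope = f0_slope(samples)
    if slope < -tolerance:
        return "declining"
    if slope > tolerance:
        return "rising"
    return "level"


# HYPOTHETICAL tracks. Only 149.3 and 78.4 (ex. 30) and 169.7 (ex. 31)
# are reported in the text; every other number is made up.
declarative_track = [(0.0, 120.0), (0.3, 149.3), (0.8, 125.0),
                     (1.3, 105.0), (1.8, 78.4)]
question_track = [(0.0, 110.0), (0.4, 120.0), (0.8, 135.0), (1.2, 169.7)]

print(classify_contour(declarative_track))
print(classify_contour(question_track))
```

A global slope deliberately ignores the local excursions caused by lexical pitch accents, so it can only separate the two overall shapes; it says nothing about where the sentence accent falls.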

9.  As observed by Manfredi and Tosco (2014), in Juba Arabic polar questions have the same SV(O) order as declarative utterances and they can be optionally introduced by the sentence initial interrogative particle hal, which is absent in our corpus.


4.1.3 Wh-Questions

As opposed to polar questions, Wh-Questions are marked by the same declining intonation pattern as declarative sentences. However, since all the question words bear a lexical high pitch accent on their last syllable (i.e. munú ‘who’, ʃenú ‘what’, jatú ‘which’) and generally occur sentence-finally, the bottom of the intonation curve corresponds to the penultimate syllable of the question word. In (ex. 32), the sentence accent corresponds to the monosyllabic subject tin [tiːn] ‘mud’, which reaches 171Hz, while the bottom of the intonation curve coincides with the first syllable of the interrogative pronoun ʃenú ‘what’ at 89Hz. The lexical pitch accent on the second syllable of the interrogative pronoun reaches 120Hz.

(32) tiːn wála ʃenú
     tin wála ʃenú //
     mud or what //
     Is this mud or what? (PGA_SM_CONV_1_SP1_431)

[Figure: F0 curve for (ex. 32), with syllable labels; axes: Pitch (Hz) against Time (s).]

4.2 Focus

Morpho-syntax and intonation are complementary in marking contrastive focus in Juba Arabic, which is characterized by the presence of at least two different contrastive focus markers. These are:

1. zátu (ge FOC1; rx PTCL.FOC), expressing contrastive focus proper.
2. mà= (ge FOC; rx INTF.FOC), expressing counter-assertive focus.

These focus particles normally act within a single intonation unit, and they correlate with different prosodic contours, as well as with different syntactic configurations.

4.2.1 The contrastive focus particle zátu

In Juba Arabic, contrastive focus (i.e. the focus selected among presupposed alternatives) on arguments, predicates and sentences is marked by the independent focus particle zátu, which is diachronically related to the Sudanese Arabic 3SG.M emphatic reflexive *zaːt=u ‘himself’. In Juba Arabic, zátu can be considered a focus marker, since its use is not obligatory in simple declarative clauses and it entails a contrast between the focused item and other entities that might fill the same syntactic position.




When zátu marks a contrastive focus on arguments and predicates, it follows the focused item. In these cases, zátu often occurs as the last content word of an intonation unit, where it appears at the bottom of the intonation curve, while the focused item bears a higher pitch point induced by the following focus marker.¹⁰ In (ex. 33), the first syllable of the focused verb dówru ‘walk’ (realized as [doːru]) culminates at 128.5Hz, and the focus particle zátu (realized as a monosyllable [zaːt]) reaches 114.6Hz. In this context, it can be noticed that the sentence accent does not correspond to the focused item, since it falls on the non-focused subject íta (realized as a monosyllable [ta]), which reaches 154Hz.

(33) íta ma bágder doːru zaːt //
     íta ma bágder dówru zátu //
     2SG NEG can walk FOC1 //
     You cannot even walk. (PGA_SM_CONV_1_SP2_279)

[Figure: F0 curve for (ex. 33), with syllable labels (ta, ma, bág, der, dó, ru, zát); axes: Pitch (Hz) against Time (s).]

When zátu marks a sentence focus, it is subject to different prosodic and syntactic constraints. In this case, zátu precedes the whole focused sentence and it bears an extra-high pitch accent. In (ex. 34), the sentence accent falls on the first syllable of zátu, which is the first content word of the major intonation unit, and reaches 170.7Hz. The rest of the focused utterance hajá bikún kweːs is characterized by a sharp fall of the intonation curve.

(34) jan zátu hajá bukún kweːs //
     jáni zátu hajá bi= kun kwes //
     that_is_to_say FOC life IRR= be good //
     Life can indeed be good. (PGA_SM_CONV_2_SP1_89)

10. It should be stressed that zátu can still be used as an emphatic reflexive pronoun (ex. PGA_SM_CONV_2_SP2_290 úo zátu biwónusu ma ána sáwa ‘he himself would talk together with me’). In these cases, zátu bears a pitch accent that is higher than that of the preceding noun phrase it is associated with.


[Figure: F0 curve for (ex. 34), with syllable labels; axes: Pitch (Hz) against Time (s).]

4.2.2 The counter-assertive focus particle mà=

Counter-assertive focus is a sub-type of contrastive focus in which the speaker is at odds with the hearer: the speaker considers that the embedded proposition should already be part of the mutual knowledge of the conversation, and presupposes that the hearer does not treat it as such (Zimmerman 2007: 150). As a consequence of morpho-syntactic interference with Sudanese Arabic, Juba Arabic speakers tend to integrate the proclitic mà= to mark a counter-assertive focus.¹¹ The prosodic contours of counter-assertive utterances are easily identifiable, since the focused element is emphasized by a high pitch on its first syllable, while the preceding focus operator mà= is pronounced with a low pitch accent. In (ex. 35), the speaker points out the obviousness of the fact that the house door is habitually left open in Khartoum. He thus puts in focus the adjective fáti ‘open’, whose first syllable reaches 132.5Hz. Prosodic prominence is also associated with a longer realization of the first syllable of the embedded element [faːti]. The counter-assertive marker mà= is associated with a lower pitch (113.2Hz) than both the preceding subject and the following focused attribute.

(35) baːb < màfaːti > //
     baːb < mà= fáti > //
     door < FOC= open > //
     The door is open (PGA_SM_CONV_2_SP2_602)

[Figure: F0 curve for (ex. 35), with syllable labels (bab, mà, fá, ti); axes: Pitch (Hz) against Time (s).]

11.  See (Manfredi 2009) for a description of the pragmatic functions played by mà= in the Baggara Arabic dialect of Western Sudan.




4.3 Topic

Juba Arabic marks topicalization by means of syntax and prosody. In addition, as in Zaar, we can distinguish between unspecified and specified argument topics. In Juba Arabic, unspecified topics do not present any marker of definiteness and they typically begin with the invariable existential copula fi, which marks the introduction of a new referent into the universe of discourse (Manfredi and Tosco, forthcoming). Unspecified topics are left-dislocated and they constitute a separate intonation unit ending with a suspension of declination and (optionally) a pause. As we can see in (ex. 36), the prosodic contour of the intonation unit containing an unspecified topic is characterized by an emphatic high pitch rise on the existential copula fi (reaching 149Hz), as well as by a sharp rise of the intonation curve on the last syllable of the last content word (in this case the adverb henák, whose pitch reaches 145Hz in the topic, against 93.5Hz for the same adverb as final content word of the comment). The comment is characterized by a gradual declination of the intonation curve, where the prosodic prominence falls on the first syllable of the main verb géni, which reaches 119.5Hz. The syntactic role of the nominal topic is marked in the comment by a resumptive 3SG independent pronoun úo in oblique position.

(36) fi akú tái táni henák / 315 ána bra géni mo henák //
     fi akú tái tani henák / 315 ána bi= rówa géni ma úo henák //
     EXS brother POSS.1SG other there / 315 1SG IRR= go stay with 3SG there //
     There’s a friend of mine there, I will go (to stay) with him there. (PGA_SM_NARR_1_SP1_95–97)

[Figure: F0 curve for (ex. 36), with syllable labels; axes: Pitch (Hz) against Time (s).]

Specified topics are also left-dislocated but, unlike unspecified topics, they are delimited by a final default proximal demonstrative pronoun de. In prosodic terms, specified topics typically constitute an intonation unit followed by a pause. (Ex. 37) shows a complex topicalized utterance where the first intonation unit corresponds to an intrinsically specified topicalized 3SG personal pronoun úo, which is followed by a second intonation unit consisting of the specified topic mustéʃfa ta dʒúba de. This second topic functions as an apposition to the initial independent pronoun úo. In this case, the syntactic function of the two topics is manifested in the comment by a resumptive 3SG independent pronoun úo in subject position.

(37) uːːː / mustésfa ta dʒuba da // 147 u tabán //
     úo / mustéʃfa ta dʒúba de // 147 úo tabán //
     3SG / hospital POSS Juba PROX.SG // 147 3SG tired //
     It, the Juba hospital, it is poor. (PGA_SM_CONV_2_SP2_2–5)

[Figure: F0 curve for (ex. 37), with syllable labels; axes: Pitch (Hz) against Time (s).]

4.4 Frames

In Juba Arabic, as in Zaar and Tripoli Arabic, the intonation of frames is the same as that of specified topics. This means that the left-dislocated frame is prosodically marked by a suspension of declination followed by a pause. In syntactic terms, frames are distinguished from topics by the absence of resumptive pronouns, as they do not function as an argument of the predicate. (Ex. 38) shows a locative frame-setting topic forming an intonation unit that ends with a sharp rise of F0 to 172Hz on the first syllable of the word dʒúba ‘Juba’. The frame is followed by an argument topic and then by a comment. The argument topic is linked with a rising intonation culminating on the last content word zol ‘man’, while the comment is characterized by an emphatic high pitch on the first syllable of múʃkila ‘problem’, reaching 164Hz.

(38) fi dʒúba / 262 aʃanán bwodáʃara álle zoːl / 236 múʃkila //
     fi dʒúba / 262 aʃán ána bi= wodí áʃara alf le zol / 236 múʃkila //
     in Juba / 262 in_order_to 1SG IRR= give ten oh to man / 236 problem //
     In Juba, in order to give ten thousand (pounds) to someone, it’s a problem. (PGA_SM_CONV_1_SP1_174–178)

[Figure: F0 curve for (ex. 38), with syllable labels; axes: Pitch (Hz) against Time (s).]

4.5 Conclusion

All things considered, Juba Arabic intonation is characterized by the following elements:

1. There is an unmistakable correlation between sentence types and intonation patterns: declarative sentences and Wh-Questions follow a declining intonation pattern, while that of Yes/No-Questions is rising.
2. With regard to contrastive focus, we can observe different prosodic configurations related to different focus types: the first is that of a contrastive focus marked by zátu on predicates and arguments, in which there is no prosodic marking of contrastiveness; the second is that of contrastive focus on sentences, which is instead characterized by an extra high pitch on the focus particle zátu and a fall of F0 over the rest of the focused sentence; the third is related to counter-assertive focus (a subtype of contrastive focus), where the focus marker mà= is systematically associated with a low pitch while the focused element receives prosodic prominence.
3. Unspecified and specified topics are both syntactically and prosodically differentiated. In syntactic terms, unspecified topics are introduced by an existential copula fi, while specified topics are followed by the proximal singular demonstrative pronoun de. Prosodically speaking, both unspecified and specified topics are separated from the comment by a suspension of declination and a pause. Unspecified topics are also characterized by the presence of an extra high pitch on the existential copula fi.
4. Frame-setting topics have the same intonation contour as specified argument topics. However, they are distinguished from argument topics by the absence of anaphoric elements in the comment.

5. The intonation of Topic and Focus in Tripoli Arabic

5.1 Tripoli Arabic prosodic system

Tripoli Arabic has a lexical stress system, i.e. every content word is stressed on one of its syllables. The place of the stress is not fixed but it is predictable (Pereira 2010: 88–89). The prosodic structure of Tripoli Arabic also involves sentence-level prominence: within a sentence, some words are more prominent than others, and stress thus falls on one particular syllable which is perceived as the most prominent in the sentence.

5.1.1 Neutral intonation pattern and declination

In Tripoli Arabic, positive assertions (ex. 39), negative assertions (ex. 40), and Wh-Questions (ex. 41) are characterized by declination (a gradual lowering of the pitch over the intonation unit). Yes/No-Questions (ex. 42) show a different intonation pattern, characterized by a pitch rise occurring on the penultimate syllable of the intonation unit.

Positive assertion

(Ex. 39) illustrates the basic structure of intonation in a positive assertion. At the beginning of the utterance, on the first syllable, the pitch is 104.92Hz. The maximum pitch of the whole utterance is 119.15Hz, indicating its nuclear stress: the maximum of the curve is situated on the second syllable [ħaʃ]. Then, the curve is characterized by the gradual lowering of the pitch down to 96.53Hz.

(39) staːħǝʃna ǝlbaziːn //
staːħǝʃ -na ǝl= baːziːn //
miss\PFV -1PL DEF= barley_flour_gruel //
We’re short of barley flour gruel. (AYL_CP_NARR_01_003)

[Figure: pitch contour of (ex. 39); pitch (Hz) against time (s)]

Negative assertion

(Ex. 40) illustrates the basic structure of intonation in a negative assertion. At the beginning of the utterance, the pitch is 188Hz and it rises to 217.30Hz, indicating its nuclear stress. Then, the last syllable of the utterance [maʕʃ] is characterized by the declination of the curve down to 131.75Hz.

(40) ma tǝsmǝʕʃ /
ma t- smǝʕ =ʃ /
NEG 3F- hear\IPFV =NEG /
She can’t hear. (AYL_CP_NARR_06_073)

[Figure: pitch contour of (ex. 40); pitch (Hz) against time (s)]

5.1.2 Wh-Questions

The prosodic contour of Wh-Questions is characterized by a rapid fall-off from high pitch (from 168.67Hz down to 113.52Hz), occurring after the nuclear stress of the utterance situated on the vowel [iː] of the interrogative [kiːf] “how” (see ex. 41). The rest of the prosodic contour shows a gradual lowering of the pitch down to 93.83Hz for the last syllable.

(41) kif əlburdim ahwa //
kiːf ǝl= buːrdiːm aːhuːwa //
how DEF= steamed_meat PROX.M //
How is this steamed meat (cooked directly in a hole in the ground)? (AYL_CP_NARR_04_SP1_006)

[Figure: pitch contour of (ex. 41); pitch (Hz) against time (s)]

5.1.3 Exceptions to declination: Yes/No-Questions

Yes/No-Questions in Tripoli Arabic have the same SVO form as declarative sentences. The difference between them is signalled by intonation. Whereas the intonative contour of a declarative sentence (an assertion) is characterized by the gradual lowering of the pitch over the intonation unit, Yes/No-Questions are characterized by a rise of the pitch on the penultimate syllable, leading to the raising of the pitch register. The rise on the penultimate syllable of the utterance is related to the final lexical stress. Moreover, the vowel of the last segment is usually lengthened. In (ex. 42), the pitch decreases from 144.13Hz at the beginning of the utterance to 103.16Hz, and then suddenly rises to 129.43Hz on the penultimate syllable [si].

(42) tsaʒʒlu ʃʃwaṛaʕ ṛṛajsija //
t- sǝʒʒǝl -u ǝl= ʃwaːrǝʕ ǝl= ṛaːjsi -a /
2- record\IPFV -PL DEF= street\PL DEF= main -F /
Have you recorded the main streets? (AYL_CP_NARR_05_SP1_006)

[Figure: pitch contour of (ex. 42); pitch (Hz) against time (s)]




5.2 Topic

In Tripoli Arabic, the topic is obligatorily marked by syntactic and prosodic means. From a prosodic point of view, topics in Tripoli Arabic correspond to an intonation unit. This unit is shaped like a bell: it begins with a rise in pitch, followed by a considerable lengthening of the last syllable, a moderate declination, and a lowering of the pitch at the end of the intonation unit that does not reach the lower end of the speaker’s vocal range. This first unit is separated from the comment by a more or less perceptible pause, and what follows (another topic or a comment) begins with a pitch reset, which occurs at the boundary between intonation units.

5.2.1 Subject and Object Topics

In (ex. 43), the subject hadika lǝṃṃwajjaːːː “that water” is dislocated to sentence-initial position. The prefix t- of the third person feminine singular of the verb tabda “she/it begins” is coreferential with the detached constituent. In this example, there are two intonation units: the topic <hadika lǝṃṃwajjaːːː> and its comment <tabda tabda taqi# ə dgila>. The topic is marked by the raising of the pitch contour up to 246.78Hz on the penultimate syllable [ṃṃaj] and above all by the considerable lengthening of the vowel of its last syllable [jaːːː] (310 ms). A pause then separates the two intonation units: the topic is separated from the comment by a substantial 270 ms pause. After the pause, i.e. at the boundary between the intonation units, a pitch reset occurs: from 170.25Hz up to 223.83Hz.

(43) hadika ləṃṃajja / (270) tabda tabda taqi# ə dgila //
haːdiːka ǝl= ṃṃeːj -a / 270
DIST.F DEF= water\DIM -F / 270
t- bda t- bda taqiː# ǝ tgiːl -a /
3F begin\IPFV 3F begin\IPFV taqiː# ǝ heavy -F /
That water, it becomes, it becomes heavy, er, heavy… (AYL_CP_NARR_07_SP1_051)

[Figure: pitch contour of (ex. 43); pitch (Hz) against time (s)]

In (ex. 44), the object əlgasʕaːːː “the tray” is dislocated to sentence-initial position. In Tripoli Arabic, objects are expressed by clitic pronouns and, in this utterance, the pronoun of the third person feminine singular =ha “her, it” is directly bound to the verbal form tsəmmi ‘you name’ and is coreferential with the detached constituent. This first topic is followed by a second one introduced by the adverb ʕadatan “actually” followed by the independent pronoun hijaːːː “she, it”, coreferential with the object əlgasʕaːːː “the tray”. In this example, there are three intonation units: the first topic “the tray”, the resumption of the topic “actually she/it”, and the comment “you can call her/it big tray”. Each topic is characterized by a lengthening of the vowel of its last segment, followed by a pause. After each pause, a pitch reset occurs: from 107.92Hz to 119.99Hz, and from 101.32Hz to 128.71Hz.

(44) ǝlgasʕaːːː / ʕadatan hijaːːː / (558) təkdər tsammiha ṣuniːja kbiːra //
ǝl= gǝsʕ -a / ʕaːdatan hiːja / 558 t- gdǝr
DEF= tray -F / usually 3SG.F / 558 2- can\IPFV
t- sǝmmi =ha ṣuːniːj -a kbiːr -a /
2- name\IPFV =OBL.3SG.F tray -F big -F /
The tray, as for it, you can call it big tray… (AYL_CP_NARR_02_SP1_063–066)

[Figure: pitch contour of (ex. 44); pitch (Hz) against time (s)]

In Tripoli Arabic, topics can also be introduced by the prepositional phrase b=nǝsb-a l= “concerning” and the particle ǝmma “as for”:12

12. This example is quoted from unrecorded personal data not transcribed for the CorpAfroAs project.



(45) bnǝsba li ṛabiːːːʕ / ahu sre ʃagga u xṭǝb / amma nafaːːːʕ / mazal ʃweja //
b=nəsb-a li ṛabiːʕ / aː=hu
by=conformity-F to Rabia / PRST=OBL.3.SG.M
ʃra ʃəgg-a u xṭǝb /
buy\PFV[3SG.M] apartment-F and ask_for_hand\PFV[3SG.M] /
ǝmma naːfaʕ / maːzaːl ʃweːja //
as_for Nafa / still a_little_bit //
Concerning Rabia, he bought an apartment and asked for a girl’s hand; as for Nafa, not yet.

This utterance is composed of two topics: the first one, <bnǝsba li ṛabiːːːʕ>, is introduced by a prepositional phrase and is followed by its comment <ahu sre ʃagga u xṭǝb>; the second one, <amma nafaːːːʕ>, is a contrastive topic introduced by a particle. From a syntactic point of view, those are subject topics. The first topic is identified in the comment through the clitic pronoun of the third masculine singular =hu. The second topic is not syntactically integrated into the predication.

More complex topicalized utterances can have more than one topic. In (ex. 46),13 there are two topics: the first one “me” is the subject topic and the second one “this one” is the object topic; the comment is “I took it for one hundred”. The syntactic function of each topic is indicated in the comment: the suffix of the first person singular -t of the verb xdeːt is coreferential with the topic aːna “me”, and the bound clitic =h is coreferential with the demonstrative pronoun aːhuːwa “this one”. The speaker pronounced this utterance so quickly that the vowel lengthening and the pauses are barely perceptible. Nevertheless, each intonation unit begins with a pitch reset: from 144.91Hz to 146.80Hz and from 131.91Hz to 135.85Hz.

13. ibid.

(46) ane / ahwa / xdetah bmija /
aːna / aːhuːwa / xdeː -t =h
1SG / PROX.M / take\PFV -1SG =OBL.3SG.M
b= mi:j -a
by= hundred -F
Me, this one (this mobile phone), I bought it for one hundred (Libyan Dinars)…

[Figure: pitch contour of (ex. 46); pitch (Hz) against time (s)]
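The pitch resets reported for (ex. 43), (ex. 44) and (ex. 46) can be set side by side. The Python sketch below is an illustration only and not part of the authors' method. It assumes that each reported pair "from X to Y" gives the F0 just before and just after an intonation-unit boundary; the names `Boundary` and `reset_hz` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Boundary:
    """One intonation-unit boundary, with the F0 values quoted in the text."""
    example: str
    f0_before_hz: float  # assumed: F0 at the end of the preceding unit
    f0_after_hz: float   # assumed: F0 at the start of the following unit

    @property
    def reset_hz(self) -> float:
        # Size of the pitch reset: how far F0 jumps up across the boundary.
        return self.f0_after_hz - self.f0_before_hz


# Values copied from the prose accompanying examples 43, 44 and 46.
boundaries = [
    Boundary("43", 170.25, 223.83),
    Boundary("44, first boundary", 107.92, 119.99),
    Boundary("44, second boundary", 101.32, 128.71),
    Boundary("46, first boundary", 144.91, 146.80),
    Boundary("46, second boundary", 131.91, 135.85),
]

for b in boundaries:
    print(f"ex. {b.example}: reset of {b.reset_hz:.2f} Hz")
```

Under this reading, every boundary shows an upward reset, but the two resets in the rapidly spoken (ex. 46) stay below 4 Hz, whereas the reset in (ex. 43) exceeds 50 Hz.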

5.3 Frames

In Tripoli Arabic, the same prosodic contour as that of topics can be used to express other left-dislocated elements, called préambule by Morel & Danon-Boileau (1998: 37–44). This contour applies to adverbial phrases of place, adverbial phrases of time, and conditionals, which are detached elements also placed at the beginning of the sentence. (Ex. 47)14 shows a complex topicalized utterance with a topic, followed by an adverbial phrase of time, and then by a comment. There are three intonation units: the topic “my group of mine”, the adverbial phrase of time “at that time”, and the comment “they were studying with me in the same class”. Each left-dislocated element is characterized by lengthening of the vowel of its last segment. A pause separates the intonation units. After each pause, i.e. at the boundary between intonation units, a pitch reset occurs.

14. ibid.

(47) gṛubi ʕaneːːː / zmaːːːn / (119) fiːːː / nafs əlqism jəgṛu mʕaj //
gṛuːp =i aːna / zmaːn / 119
group =POSS.1.SG 1SG / period_of_time / 119
fi nǝfs ǝl= qism j- gṛa -u mʕa =j //
in soul DEF= class 3- read\IPFV -PL with POSS.1.SG //
My group (of friends) of mine, formerly, they used to study with me in the same class.

[Figure: pitch contour of (ex. 47); pitch (Hz) against time (s)]

5.4 Focus

5.4.1 Focus Marker: The particle ṛaː

In Tripoli Arabic, the particle ṛaː is used to focus the predicate or the entire predicative relation.15 This morpheme is used with a clitic pronoun that is coreferential with the subject of the utterance. The combination results in the following paradigm:

1sg    ṛaː=ni            1pl  ṛaː=na
2sg.m  ṛaː=k             2pl  ṛaː=kum
2sg.f  ṛaː=ki
3sg.m  ṛaː=h ~ ṛaː=hu    3pl  ṛaː=hum
3sg.f  ṛaː=hi

The third person masculine singular forms ṛaː=h and ṛaː=hu are grammaticalized and invariable. In Tripoli Arabic, they can replace the forms referring to any other person and thus precede any predicate. The marker ṛaː is not combined with a specific intonative contour. The prosodic contour of the utterances is that of a declarative sentence, i.e. a descending one.

15. This morpheme comes from the Arabic verb raʔaː “he saw”. It also exists in other Arabic varieties and it has already been described as having an intensive value in Moroccan Arabic (Caubet 1992) and as a focus marker in Yemeni Arabic (Vanhove 1996).


Predicate Focus

The particle ṛaː followed by a cataphoric pronominal clitic can be placed before a conjugated verb, as well as before a noun or an adjective in a nominal sentence.

Verbal Predicate Focus

ṛaː can precede a verb in the perfective:16

(48) ṛak xallet ǝḍḍej maftuħ //
ṛaː=k xǝlleː-t ǝl=ḍeːj mǝftuːħ
FOC=2SG.M leave\PFV-2SG DEF=light open\PTCP.PASS.M
(Really / actually) You have left the light on!

ṛaː can also precede a verb in the imperfective:

(49) ṃṃǝssǝx ʕadnan / ṛah biḍayyʕǝk / ħaṛṛ halba //
mwǝssǝx ʕadnaːn ṛaː=h
be_dirty\PTCP.PASS.M Adnan FOC=3SG.M
b=i-ḍǝyyǝʕ=k ḥaːṛṛ haːlba
FUT=3-destroy\IPFV=OBL.2SG hot a_lot
Adnan is a dirty bastard, this is really going to destroy you, it’s very spicy!

ṛaː can also precede a verbo-nominal form, i.e. an active participle:

(50) ṛani ʕaṛǝfkum //
ṛaː=ni ʕaːṛəf=kum
FOC=OBJ.1SG know\PTCP.ACT.SG.M=OBL.2.PL
I do know you / I do know who you are! (and I do know how bad you can be)

Nominal Predicate Focus

In a nominal sentence, ṛaː can precede a noun or an adjective and thus focus the nominal or the adjectival predicate.

(51) ṛani mriḍ //
ṛaː=ni mṛiːḍ
FOC=OBJ.1SG ill
I am so / really ill!

(52) libja ṛahi fi wuṛṭa //
liːbja ṛaː=hi fi wuṛṭ-a
Libya FOC=OBL.3SG.F in impasse-F
Libya is in such a dead-end!

16. Those examples are quoted from unrecorded personal data.




Predicative Relation Focus

The invariable grammaticalized forms of the third person masculine singular ṛaː=h and ṛaː=hu can appear at the end of an utterance and thus focus the entire predicative relation.

(53) dajra libja kullha gdima ʃweja ṛah //
daːjər-a liːbja kull=ha gdiːm-a
do\PTCP.ACT-F Libya every=OBL.3SG.F old-F
ʃweːja ṛaː=h
a_little_bit FOC=OBL.3SG.M
It has (already) been through all Libya; it’s a little bit old indeed! (“the whole of Libya” knows it already!)

Even when ṛaː is situated at the end of the utterance, it is not combined with a specific intonative contour and the prosodic contour of the whole sentence is a descending one.

5.4.2 Focus expressed by intonative contours

In Tripoli Arabic, focus can be expressed by intonative contours only. A correlation also exists between the intonative contours and phonetic marks (a considerable vowel lengthening) as well as syntactic marks (inversion of the word order and cleft sentences).

Focus Expressed by Intonative Contours only

In Tripoli Arabic, contrastive focus can be expressed by intonative contours only: the element the speaker wants to focus is intonatively marked with a sharp rise in pitch. After the focused element, there is a sharp fall in pitch indicating the boundary between the focused element and the rest of the assertion. Let’s consider the utterance in (ex. 54) that precedes the focus in the narration. The speaker was talking about the city of Tripoli and giving the number of its inhabitants. He first asserted that in Tripoli there were 1,800 inhabitants (instead of 1,800,000). In this utterance, the pitch varies between 90.26Hz and 128.56Hz.

(54) alf u tamən mijt nasama //
aːlf u tǝmn miːj -t nasam -a //
thousand and eight hundred -F\CS person -F //
One thousand and eight hundred persons. (AYL_CP_NARR_05_SP1_034)

[Figure: pitch contour of (ex. 54); pitch (Hz) against time (s)]

When the speaker realized he was wrong, he immediately uttered the sentence in (ex. 55) to correct himself. He emphasized the word [məljon] “one million” (as opposed to “one thousand”), which is marked by a sharp rise in pitch from 113.58Hz at the beginning of the syllable [məl] to 147.36Hz, the maximum of the curve, which is situated on the vowel [o] of the syllable [jon]. The word [məl.jon] is then followed by a sharp fall in pitch to 114.97Hz over the rest of the assertion, beginning with the vowel [u].17

(55) məljon u təmn mijt alf nasama //
mǝljoːn u tǝmn miːj -t aːlf nasam -a //
million and eight hundred -F\CS thousand person -F //
It’s one million and eight hundred thousand people! (and not one thousand and eight hundred). (AYL_CP_NARR_05_SP1_036)

[Figure: pitch contour of (ex. 55); pitch (Hz) against time (s)]
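The rise and fall around the focused word in (ex. 55) are reported in Hz. To compare such excursions across speakers with different registers, one common option is to convert them to semitones with the standard formula 12 · log2(f2 / f1). The sketch below is an added illustration under that assumption; the chapter itself reports values in Hz only, and the function name `semitones` is hypothetical.

```python
import math


def semitones(f_from_hz: float, f_to_hz: float) -> float:
    """Signed pitch interval from f_from_hz to f_to_hz, in semitones."""
    return 12 * math.log2(f_to_hz / f_from_hz)


# F0 values quoted in the prose for (ex. 55).
onset_hz = 113.58      # beginning of the syllable [məl]
peak_hz = 147.36       # maximum of the curve, on the vowel [o] of [jon]
post_focus_hz = 114.97 # level reached after the fall

rise = semitones(onset_hz, peak_hz)
fall = semitones(peak_hz, post_focus_hz)

print(f"rise onto the focus: {rise:+.1f} semitones")
print(f"fall after the focus: {fall:+.1f} semitones")
```

On these figures the rise and the following fall are of similar size, each between four and five semitones, which is consistent with the rising-falling movement described for the focused element.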

17.  This confirms what has been described for three other varieties of Arabic (Moroccan, Yemeni and Kuwaiti) where “the shared strategy used to convey contrastive focus consists of a rising-falling movement” and “the accented syllables of focused words stand out clearly from the surroundings. This is brought about by considerably raising of F0 of the focused syllables and diminishing the F0 deflections on succeeding and preceding stressed syllables” (Yeou, Embarki & Al-Maqtari 2007, 322).




Focus expressed by intonative contours + vowel lengthening

The sharp rise in pitch can coincide with a considerable vowel lengthening. In (ex. 56), there are two focused elements: ħasan “Hassan” and ħusseːːːn “Husseyn” (the names of the speaker’s twin brothers). As for the first element, its first syllable [ħa] is marked by a rise of F0 from 85.66Hz to 131.06Hz, followed by a sharp rise on its second syllable up to 145.43Hz. This first focused element precedes a sharp fall of the pitch from 145.45Hz to 114.08Hz when the conjunction [u] “and” and the first syllable [ħus] of the second focused element are uttered. Then, the intonative contour is marked by a sharp rise in pitch from 114.15Hz at the end of the syllable [ħus] to 190Hz, the maximum of the curve, situated on the vowel [e] of the last syllable [seːːːn]. Moreover, this syllable undergoes a considerable lengthening of its vowel (363 ms) and a fall of the intonative contour.

(56) ħasan u ħusseːːːn //
ħasan u ħusseːn /
Hassan and Husseyn /
It’s Hassan and Husseyn! (For real, I swear it’s them both) (AYL_CP_CONV_07_SP2_086)

[Figure: pitch contour of (ex. 56); pitch (Hz) against time (s)]

Focus expressed by intonative contours and word order inversion

The sharp rise in pitch can also coincide with an inversion of the word order. In (ex. 57), the focused element ħusseːːːn “Husseyn” is marked by a sharp rise in pitch from 142.24Hz at the end of its first syllable [ħus] to 203.73Hz on the second syllable [seːːːn], which is also marked by a considerable lengthening of its vowel [eːːː] (392 ms). Moreover, the word order is inverted and the focused element precedes the demonstrative pronoun aːhuːwa “this one” (lit. ‘Husseyn is this one’), as opposed to the canonical order aːhuːwa ħusseːn “This one is Husseyn”. The nominal predicate ‘Husseyn’ is focused by putting it before the demonstrative pronoun, thus inverting the order of the constituents and insisting on the fact that this was Husseyn and nobody else.

(57) ħusseːːːn ahwa //
ħusseːn aːhuːwa /
Husseyn PROX.M /
This one is Husseyn! (and not somebody else) (AYL_CP_CONV_07_SP2_089)

[Figure: pitch contour of (ex. 57); pitch (Hz) against time (s)]

In (ex. 58) the focused element waːːːtja “ready” is marked by a sharp rise in pitch from 137.40Hz to 196.08Hz, by a considerable lengthening of the vowel [aːːː] of its first syllable and by its anteposition before the verb tabda “she/it is”, as opposed to the canonical order tabda waːtja “she/it is ready”. This serves here to direct one’s attention to the fact that the hole has been dug once and for all.

(58) laːːː lħufṛa hadi fħufṛat əlburdim waːːːtja tabda //
la ǝl= ħufṛ -a haːdi f= #
no DEF= hole -F PROX.F in= #
ħufṛ -t ǝl= buːrdiːm waːti -a t- bda /
hole -F\CS DEF= fire_hole ready -F 3F- begin\IPFV /
No! This hole… The hole to cook the steamed meat is already ready! (It has been dug once and for all) (AYL_CP_NARR_04_SP1_088)

[Figure: pitch contour of (ex. 58); pitch (Hz) against time (s)]



Focus expressed by intonative contours and clefting

Combined with a sharp rise in pitch, cleft sentences can also focus an element. In (ex. 59) the utterance is composed of two intonation units: the first one “a good idea” is the focus and the second one “we gave you about the baːziːn (barley flour gruel)” is the pseudo-relative clause. The register of the cleft sentence, i.e. the focus, is much higher than that of the pseudo-relative clause: whereas the average pitch of the first intonation unit is 140.33Hz, the average pitch of the second intonation unit is 107.50Hz. The first intonation unit ends with the nuclear stress of the assertion, which peaks at 212.2Hz. Then, there is a sharp fall in pitch to 109.84Hz, indicating the boundary between the focus and the pseudo-relative clause, which begins at 109.84Hz and finishes at 105.61Hz.

(59) fikra kwajjsa / ʕaṭenaha ʕalbaziːn //
fikr -a kwǝyyǝs -a /
idea -F good -F /
ʕṭe: -na =ha ʕ= ǝl= baːziːn //
give\PFV -1PL =OBL.3SG.F on= DEF= barley_flour_gruel //
It’s a good idea we’ve given you about the baːziːn (barley flour gruel). (AYL_CP_NARR_03_SP1_265)

[Figure: pitch contour of (ex. 59); pitch (Hz) against time (s)]

5.5 Conclusion

Tripoli Arabic intonation can be characterized by the following elements:

1. There is a correlation between sentence types and intonation: since Yes/No-Questions and assertions share the same SVO form, the difference between them is signalled by intonation.
2. Three intonation patterns emerge:
a. The first one is associated with declarative sentences and is characterized by declination; this intonation pattern is shared with predicate focus marked by the morpheme ṛaː.
b. The second one is associated with topics and frames and is marked by a considerable lengthening of the vowel of the last segment of the intonation unit containing the left-dislocated element, followed by a pause and a pitch reset.
c. The third one is associated with argument foci and is marked by a sharp rise in pitch on the focused element (which coincides with the nuclear stress of the utterance), followed by a sharp fall in pitch for the rest of the utterance. As expected, Wh-Questions share the same prosodic contours as argument foci, since the interrogative pronoun in a Wh-Question is syntactically focused. Moreover, in Tripoli Arabic, this third intonation pattern can be combined with phonetic and/or syntactic marks, viz. (i) a considerable vowel lengthening; (ii) inverted word order (the anteposition of the focused element); (iii) clefting. Anteposition can combine with a considerable vowel lengthening.

6. Closing remarks

Strong individual features have emerged from the survey of our four languages:

Zaar only has two basic intonation patterns: one for unspecified topics, and a default pattern, characteristic of thetic sentences, which is shared by all other assertion and information structures.

Tamasheq has a very complex system of morpho-syntactic exponents of focus, depending on the syntactic function of the focused item. Each of these structures corresponds to a specific intonation pattern identifying the focus. Topic, on the other hand, has only two compulsory syntactic exponents: left-dislocation and the use of the absolute state for the topicalized item. Apart from those two, all other exponents, whether morphological (the particle za) or intonational (pause, register shift, vowel lengthening), are optional.

Juba Arabic uses the morpheme zatu in two different positions to differentiate between argument and predicate focus, and ma for counter-assertive focus. These three types of focus structures are associated with three different intonation patterns. As for topic, Juba Arabic differentiates between specified topics, marked with the morpheme de, and unspecified topics, marked with the existential copula fi. Again, these two types of topics each have their own intonation pattern.

Tripoli Arabic has the same opposition between predicate focus, marked with ṛaː, and argument focus, with no morphological marker. Each of these two structures is associated with its specific intonation pattern. Finally, Wh-Questions in Tripoli Arabic are clearly a case of focused utterance, since their question-word is left-dislocated and they share the same intonation pattern as argument focus. This is different from the other three languages, where the question words remain in situ.

Despite those differences, strong tendencies emerge from this first survey of the intonation of topic and focus in our four AfroAsiatic languages with different phonological pitch systems.

The first tendency concerns the default intonation of thetic sentences. It is characterized by a bell-shaped curve with a peak on the nucleus of the utterance (whether the first high tone of the first content word for Zaar, or the sentence nuclear stress for the other languages), followed by a continuous declination down to a final fall.

The second one concerns the intonation of topic. It is characterized by (i) left-dislocation; (ii) a boundary consisting of any or all of the following elements: vowel lengthening, pause, change of register. These two elements define the topic as an initial intonation unit.

The other intonation patterns, characterizing the structures of Yes/No-Questions, Focus and Wh-Questions, exhibit a series of variations. These variations however follow a rule: lack of a specific intonation pattern for a specific information structure is supplemented by morpho-syntactic marking. A good example is given by Yes/No-Questions, where the general pattern is that of a specific intonation with no morpho-syntactic marking. This intonation pattern is characterized by a sharp rise towards the end of the utterance. In Zaar however, the intonation pattern of Yes/No-Questions is the same default intonation pattern as that of thetic sentences in all four languages, but this absence of specific intonation pattern is supplemented by a sentence-final -aː suffix characterizing this type of assertion. Likewise, in Tripoli Arabic, predicate focus is marked by the morpheme ṛaː, and the intonation pattern is that of default thetic sentences. Argument focus on the other hand has no equivalent to the ṛaː morpheme of predicate focus, but has a specific intonation pattern characterized by a sharp rise in pitch on the focused element, followed by a sharp fall for the rest of the utterance. This is summed up in the formula: the more a structure relies on morpho-syntax, the less it relies on intonation. Tamasheq, however, is an exception to the extent that topic has very few exponents, whereas focus combines a specific intonation with heavy morphological and syntactic exponents.

References

Bearth, Thomas. 1998. Tonalité, déclinaison tonale et structuration du discours - Un point de vue comparatif. In Les unités discursives dans l’analyse sémiotique: La segmentation du discours, Gustavo Quiroz, Ioanna Berthoud-Papandropoulou, Evelyne Thommen & Christina Vogel (eds), 73–7. Bern: Peter Lang.
Caron, Bernard. 2000. Assertion et préconstruit: Topicalisation et focalisation dans les langues africaines. In Topicalisation et focalisation dans les langues africaines, Bernard Caron (ed.), 7–42. Louvain: Peeters.
Caubet, Dominique. 1992. Deixis, aspect et modalité: Les particules hā- et ṛā- en arabe marocain. In La deixis, Mary-Annick Morel & Laurent Danon-Boileau (eds), 139–149. Paris: Presses Universitaires de France.
Cornish, Francis. 2005. Une approche pragmatico-discursive des phrases ‘thétiques’. In La syntaxe au coeur de la grammaire, Frédéric Lambert & Henning Nølke (eds), 75–84. Rennes: Presses Universitaires de Rennes.
Furukawa, Naoyo. 1996. Grammaire de la prédication seconde. Formes, sens et contraintes. Louvain: Duculot.
Heath, Jeffrey. 2005. Grammar of Tamashek (Tuareg of Mali). Berlin: Mouton de Gruyter. DOI: 10.1515/9783110909586
Kossmann, Maarten. 2011. A Grammar of Ayer Tuareg (Niger) [Berber Studies 30]. Köln: Rüdiger Köppe.
Ladd, D. R. 1996. Intonational Phonology. Cambridge: CUP.
Lambrecht, Knud. 1994. Information Structure and Sentence Form: Topic, Focus, and the Mental Representations of Discourse Referents. Cambridge: CUP. DOI: 10.1017/CBO9780511620607
Lafkioui, Mena. 2010. La topicalisation en berbère: Formes et structures. In Etudes Berbères V (Actes 5. Bayreuth-Frankfurt-Leidener Kolloquium zur Berberologie 8-11 Octobre 2008), Harry Stroomer (ed.), 121–133. Cologne: Rüdiger Köppe.
Louali, Naïma & Philippson, Gérard. 2005. Deux systèmes accentuels berbères: Le siwi et le touareg. Faits de langues 26: 11–22.
Lux, Cécile. 2012. ‘Tamasheq Corpus’. Corpus recorded, transcribed and annotated by Cécile Lux. In Amina Mettouchi & Christian Chanard (eds). The CorpAfroAs Corpus of Spoken AfroAsiatic Languages. DOI: http://dx.doi.org/10.1075/scl.68.website. (= TAQ_CL)
Lux, Cécile & Philippson, Gérard. 2010. L’accent en tetserret et en tamacheq: Contacts et contrastes. In Etudes Berbères V (Actes 5. Bayreuth-Frankfurt-Leidener Kolloquium zur Berberologie 8-11 Octobre 2008), Harry Stroomer (ed.), 133–164. Cologne: Rüdiger Köppe.
Manfredi, Stefano. 2009. Counter-assertive focus in Kordofanian Baggara Arabic. Studi Maghrebini VI: 183–194.
Manfredi, Stefano. 2012. ‘Juba Arabic Corpus’. Corpus recorded, transcribed and annotated by Stefano Manfredi. In Amina Mettouchi & Christian Chanard (eds). The CorpAfroAs Corpus of Spoken AfroAsiatic Languages. DOI: http://dx.doi.org/10.1075/scl.68.website. (= PGA_SM)
Manfredi, Stefano & Petrollino, Sara. 2013. Juba Arabic. In The Atlas of Pidgin and Creole Structures III: The Language Surveys, Susanne Michaelis, Philippe Maurer, Magnus Huber & Martin Haspelmath (eds), 54–65. Oxford: OUP.
Manfredi, Stefano & Tosco, Mauro. 2014. The morphosyntax and prosody of topic and focus in Juba Arabic. In Arabic-based pidgins and creoles. Special issue of Journal of Pidgin and Creole Languages (29:2): 319–351. Stefano Manfredi & Mauro Tosco (eds).
Mettouchi, Amina. 2003. La focalisation contrastive dans les clivées en kabyle (berbère). In Mémoires de la Société Linguistique de Paris: Fonction et moyens d’expression de la focalisation à travers les langues [Nouvelle Série Tome III], Anne Lacheret-Dujour & Jacques François (eds), 81–97. Leuven: Peeters.
Morel, Mary-Annick & Danon-Boileau, Laurent. 1998. Grammaire de l’intonation. L’exemple du français oral. Paris: Ophrys.
Newman, Paul. 2000. The Hausa Language: An Encyclopedic Reference Grammar. New Haven CT: Yale University Press.
Pereira, Christophe. 2010. Le parler arabe de Tripoli (Libye). Zaragoza: Instituto de Estudios Islámicos y del Oriente Próximo.
Robert, Stephane. 1993. Structure et sémantique de la focalisation. Bulletin de la Société Linguistique de Paris LXXXVIII: 25–47. DOI: 10.2143/BSL.88.1.2013041
Vanhove, Martine. 1996. Les particules qad et ra’ dans un dialecte arabe de Yafi’ (Yémen). In Proceedings of the 2nd International Conference of l’Association Internationale pour la Dialectologie Arabe held at Trinity Hall in the University of Cambridge, 10-14 September 1995. Cambridge: University of Cambridge.
Yeou, Mohamed, Embarki, Mohamed & Al-Maqtari, Sallal. 2007. Contrastive focus and F0 patterns in three Arabic dialects. Nouveaux Cahiers de Linguistique Française 28: 317–326.
Zimmermann, Malte. 2007. Contrastive focus. In Interdisciplinary Studies on Information Structure 6, Caroline Féry, Gisbert Fanselow & Manfred Krifka (eds), 147–159. Potsdam: Universitätsverlag Potsdam.

Quotative constructions and prosody in some Afroasiatic languages
Towards a typology

Il-Il Malibert and Martine Vanhove
INALCO & LLACAN (Paris)

This chapter investigates, from a crosslinguistic perspective, the relationship between prosodic contours and direct and indirect reported speech (i.e. without or with deictic shift) in four typologically and genetically different Afroasiatic languages of the CorpAfroAs pilot corpus: Beja (Cushitic), Zaar (Chadic), Juba Arabic (Arabic-based pidgin) and Modern Hebrew (Semitic). The descriptive tools and analysis of Genetti (2011) for direct speech reports in Dolakha Newar (Tibeto-Burman) are used as a starting point and adapted to the annotation system of CorpAfroAs. Each language section investigates the prosodic cues and contours of direct speech reports, in relation to their quotative frame and their right and left contexts. As contradictory claims (e.g. Coulmas 1986; Klewitz & Couper-Kuhlen 1999; Jansen et al. 2001) have been made concerning the prosodic features of indirect reported speech, for example in English, the same prosodic features are also investigated for the three languages in our corpus which have indirect reported speech (Zaar, Juba Arabic and Hebrew). It is shown that speech reporting as a rhetorical strategy varies considerably from one language to another and is more frequent in the three unscripted languages of the sample. Even though speech reports show a wide range of prosodic behaviors, clear tendencies nonetheless emerge, which are related to various factors: speech report types, types of constituents of the quotative frame, genres, and typological features of the languages in question. A preliminary typology of the interface between prosody and speech reporting is proposed.

1. Introduction and theoretical background

The topic of direct and indirect speech has attracted a lot of attention from linguists. Recently their prosodic treatment has been the focus of several studies for better-known languages such as English, especially from the perspective of conversational

doi 10.1075/scl.68.04mal 2015 © John Benjamins Publishing Company


discourse analysis (e.g. Klewitz & Couper-Kuhlen 1999; Jansen et al. 2001) and computational linguistics (e.g. Oliveira & Cunha 2004). It is only more recently that an in-depth analysis of a lesser-known language, Dolakha Newar, a Tibeto-Burman language spoken in Nepal, was provided by Genetti (2011). The present study builds on the methodology and analysis developed in her study, with some adaptation to the annotation system of the CorpAfroAs project. It aims at providing first-hand information on other lesser-described languages of another language phylum, namely Afroasiatic, and at paving the way for a typology of the interaction between prosody and reported speech.

This chapter analyses from a crosslinguistic perspective the relationship between prosodic contours and direct and indirect reported speech (i.e. without or with deictic shift) in four typologically and genetically different Afroasiatic languages of the CorpAfroAs pilot corpus: Beja (Cushitic), Zaar (Chadic), Juba Arabic (Arabic-based pidgin) and Modern Hebrew (Semitic), in this order. The study is limited to direct and indirect speech reports marked by quotative frames which contain the most basic speech verbs of each language under study, i.e. ‘say’ verbs, excluding verbs such as ‘ask’, ‘demand’, ‘shout’, and frames which contain just a complementizer. Our analysis is based on an extension of the theoretical approach proposed by Genetti (2011) for Dolakha Newar: we applied her concept of the “prosodic integration cline” of direct speech reports within the quotative frame of a single language to several typologically different languages and also to indirect speech reports. The prosodic integration cline is characterized and explained as follows:

  A number of features are used to mark discourse as direct speech, including the relative positioning of prosodic and syntactic boundaries, patterns of terminal contours, and changes in loudness, pitch range, register, and timing. As many of these features are scalar, direct speech reports can be placed on a cline from prosodically independent to prosodically integrated with respect to elements of the quotative frame. This variable prosodic behavior can be attributed to competition among discourse functional, syntactic, and production factors. (Genetti 2011: 55)

Genetti further shows that this cline has two distinctive endpoints,

  one with IU boundaries at both sides of the speech report, shifts in pitch and loudness, and the production of terminal contours, alone or in sequence, typical of prosodically independent units in other types of discourse … On the other end is the necessarily shorter speech report, which is fully integrated into the quotative frame, and receives no prosodic marking whatsoever … Between this, direct speech reports can vary from having more or fewer markers of independence and greater or lesser degrees of variation in pitch or loudness. (Genetti 2011: 72)

Section 2 provides the conventions and methods for the prosodic analysis. Section 3 analyses intonation contours of speech reports in Beja, Section 4 in Zaar,




Section 5 in Juba Arabic and Section 6 in Modern Hebrew. The theoretical extension constitutes the basis of the concluding section of our study, where we compare our findings in the four languages and propose the preliminary basis for a crosslinguistic typology of the prosodic treatment of speech reports, also in relation to the morpho-syntactic typological profiles of the languages. We also discuss, when relevant, the degree of prosodic integration with the adjacent narrative or conversational context.

2. Conventions and methods of prosodic analysis

The corpus on which our analysis is based consists of hour-long recordings for each language, except Juba Arabic, for which only 46 minutes were available at the time of writing this chapter. The prosodic segmentation of the data, as for the entire CorpAfroAs project, is based on intonation units (henceforth IU) defined in their most commonly accepted sense of “a coherent intonation contour” (cf. e.g. Chafe 1994; Du Bois et al. 1992; Tao 1996) which “encapsulates a functional, coherent segmental unit, be it syntactic, semantic, informational, or the like” (Izre’el & Mettouchi this volume; cf. Cruttenden 1997). As for the whole CorpAfroAs corpus, four major perceptual and acoustic cues were used to recognize the boundaries between IUs: (1) final lengthening; (2) initial rush; (3) pitch reset; (4) pause. In addition, two internal criteria were used: (1) declination (also called ‘downdrift’); (2) tonal parallelism (Wichmann 2000), or isotony (Du Bois 2004; cf. Izre’el & Mettouchi, this volume). A distinction is made between minor (or continuous, i.e. signaling ‘more to come’) and major (or terminal, i.e. signaling ‘nothing more to say’) boundaries, which basically follows the difference, based on speech act theory, between terminal break and non-terminal break as used in Cresti and Moneglia (2005) for the C-ORAL-ROM project:

  a prosodic break is considered terminal if a competent speaker assigns to it, according to his perception, the quality of concluding a sequence … a prosodic break is considered non-terminal if a competent speaker assigns to it, according to his perception, the quality of being non-conclusive (Cresti & Moneglia 2005: 17).

The precise shapes of the final contours of IUs (such as high-fall, mid-fall, rise, etc.), which depend on pragmatics and on the modal category of the utterance (e.g. assertion, interrogation), are not specified in the annotation. It is important to recall here the reasoning behind such a choice, since it explains the difference between our annotation and Genetti’s (2011) more precise one, which is based on Chafe (1994) and Du Bois et al.’s (1992) annotation and is sensitive to prosodic movement:


[T]he annotation of terminal and non-terminal breaks does not describe the prosodic movement that actually occurs in correspondence with a specific speech segment, but rather it selects the specific segment where, according to perception, a significant movement occurs. At the same time the annotation does not specify which proper speech act is performed by a sequence of word, but rather, specifies which sequence of words performs an act, for prosodic reasons. … Once the relevant domain for prosodic movements and speech acts is determined, this will probably allow a better interpretation of both the relevant prosodic movements and the functional, dialogical value of the speech event. The same consideration can hold for syntactic features. Utterances cannot be identified and defined on the basis of syntactic properties as clauses can, for instance, but once an utterance is identified on the basis of a terminal break, any kind of morpho-syntactic and lexical evaluation can be driven on it. (Cresti & Moneglia 2005: 20)

In addition to minor and major intonation boundaries, the annotations indicate the duration of pauses in milliseconds and mark breath intakes (because in some languages they play a role in narratives). The table below sums up the prosodic transcription conventions used in CorpAfroAs (and in this chapter):

Table 1.  Prosodic transcription conventions

Intonation unit boundaries
  minor boundary                    /
  major boundary                    //
  breath intake (during a pause)    BI_

Duration of pauses (in milliseconds)
  short pause                       (100) to (200)
  medium pause                      (300) to (600)
  long pause                        (700) and over
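
The conventions of Table 1 can be read mechanically off a transcription line. The following Python sketch is purely illustrative and is not part of the CorpAfroAs tooling: the function names `classify_pause` and `parse_iu_line` are hypothetical, and the treatment of durations that fall between the ranges of Table 1 (for instance 246 ms, counted here as medium) is an assumption made for the example.

```python
def classify_pause(ms):
    """Map a pause duration in milliseconds to a Table 1 label.

    Assumption: durations up to 200 ms count as short, durations of
    700 ms and over as long, and everything in between as medium.
    """
    if ms <= 200:
        return "short"
    if ms >= 700:
        return "long"
    return "medium"


def parse_iu_line(line):
    """Split one transcription line into words, boundary type and pause."""
    words, boundary, pause_ms, breath = [], None, None, False
    for token in line.split():
        bare = token.strip("()")  # pauses may be written as (162) or 162
        if token == "//":
            boundary = "major"
        elif token == "/":
            boundary = "minor"
        elif token.startswith("BI_") and token[3:].isdigit():
            # Breath intake during a pause, followed by its duration.
            breath, pause_ms = True, int(token[3:])
        elif bare.isdigit():
            pause_ms = int(bare)
        else:
            words.append(token)
    return {
        "words": words,
        "boundary": boundary,
        "pause_ms": pause_ms,
        "pause": classify_pause(pause_ms) if pause_ms is not None else None,
        "breath_intake": breath,
    }


print(parse_iu_line("naː fiːr=eː a-nfariːd / 162"))
print(parse_iu_line("t=ʔar=t=i eː-bi=hoːb / BI_365"))
```

The two sample lines are taken from the Beja transcriptions quoted in this chapter; only the transcription tier is parsed, since the gloss tier carries no prosodic symbols.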

Thus where Genetti (2011) has five terminal contours, high-falling (\\), mid-falling (\), level (_), rising (/), marked-rising (//) and rise-fall (/\), we only note a continuing contour (/) and a terminal one (//). Unlike Genetti, we do not annotate “normal” and “emphatic” phrasal accents (i.e. pronounced at a higher pitch than the narrator’s average), but on the other hand, we systematically quantify the duration of pauses. Nevertheless, prosodic movements are discussed for each of the four languages. We tried as much as possible to stick to the widely adopted convention (cf. Chafe 1994) of writing each new IU on a new line. However, this was not always possible because of the length of certain glosses or IUs which forced




us to write one IU on two or three lines. In these cases, the second and third lines are indented towards the end of the line to signal that they form one IU with the preceding line; the first occurrence of /, //, (duration of pause in milliseconds), or BI_ (duration of pause in milliseconds) indicates the end of the IU.

3. Beja quotatives

3.1 Elements of syntax and prosody

Beja, a North-Cushitic language spoken in Sudan, is, like a good number of Cushitic languages, strongly SOV, and is the sole language of this kind in our sample, although the linear order may vary and be determined by information structure. When the object is a clitic pronoun, the order changes to SVO. The language has three basic nominal cases: nominative, accusative and genitive. Grammatical subjects are marked on noun determiners with a nominative case characterized by a vowel uː (SG) / aː (PL). The object category is marked by an accusative case, characterized by a vowel oː (SG) / eː (PL), on noun determiners,¹ which is used with patients of transitive verbs as well as with patient and recipient arguments of ditransitive verbs. Recipient arguments of transitive verbs, among them the quotative verb di ‘say’, are introduced by the directional postposition dhaːj / =da / =d ‘towards’, which licences the genitive case, characterized by a vowel -i (SG) / -eː (PL) suffixed to its nominal or pronominal host. The use of the postposition is systematic when the recipient is a noun or an independent pronoun, but impossible with an enclitic object pronoun, which is in the accusative case.

In complex sentences, the relative clause is usually embedded in the matrix clause; the nominal head most often precedes the relative clause (it may follow it for pragmatic reasons), and the rest of the embedding clause follows it. Headless relative clauses and object complement clauses (as well as other types of dependent clauses) usually precede the matrix clause, and thus precede, directly or indirectly, the verbal head. All three sentence types are most often introduced by the same clitic markers (enclitics are directly attached to the verb, proclitics to the first constituent of the relative clause). Clitic markers are not compulsory with non-restrictive relative clauses. Complement clauses may instead use the Simultaneity or the Manner converb, without a clitic marker (for further details see Vanhove 2012a).

1.  Only some syllabic structures of nouns license these vowels. For the others, case is not overtly marked and the determiner has an invariable vowel i. See Hamid-Ahmed (2013) for a detailed analysis.


Except for the linear order, quoted speech does not pattern with the above-mentioned complex sentences: speech reports are never introduced or followed by a complementizer. Beja is the sole language of our sample in which reported discourse is always direct, i.e. without a deictic shift to the perspective of the narrator: the speech is reported as told by the character.² Direct speech reports are syntactically quotative complements,³ objects of the quotative verb: they take up the same syntactic slot as a nominal object, which may also be the object argument of the quotative verb. Like relative clauses, the quoted speech is embedded within the quotative frame (i.e. the matrix clause). When overtly expressed (which is rare in the CorpAfroAs data), the subject and the recipient, i.e. the addressee (unless it is an enclitic object pronoun), precede the quoted speech in this order, although the CorpAfroAs data do not contain a single occurrence of both together in this order; the quotative verb di ‘say’ follows the quoted speech (in some rare instances, it can also occur before the completion of the reported speech).

(1) oː=jaːs-i=d                 hus   ak-a⁴       eː-jad-na
    DEF.SG.M.ACC=dog-GEN.SG=DIR voice be-IMP.SG.M 3-say\IPFV-PL
    ‘they tell the dog: Shut up!’ (BEJ_MV_NARR_18_Adam_devil_221)

The use of the quotative verb is not compulsory:

(2) uː=tak           halak hasara
    DEF.SG.M.NOM=man cloth ADVS
    jhak-s-aː=b                     ki=i-ki
    get_up-CAUS-CVB.MNR=INDF.M.ACC  NEG.IPFV=3SG.M-be\PFV
    ‘the man (says): But, I have not taken any (warm) clothes!’ (BEJ_MV_NARR_07_cold_22)

To sum up, the quotative frame enfolds the speech report; it is most often reduced to the quotative verb, which follows the report. The quoted speech may be used without

2.  We follow Genetti (2011: 56–57), who uses “the term “narrator” to refer to the speaker who produced the narrative text and the term “character” to refer to the speaker whose words are reported.”

3.  One reviewer mentioned, for a different view, Thompson (2002), who shows that in English spontaneous conversations, complement-taking predicates (in the sense of Noonan 1985) are rarely if ever “complements”, and what is usually considered as the complement clause of complement-taking predicates is in fact better “understood in terms of epistemic/evidential/evaluative formulaic fragments expressing speaker stance towards the content of a clause.” (Thompson 2002: 125). Note however that speech reports introduced by the quotative verb ‘say’ are highly marginal in her data and that our data contain only a minority of spontaneous conversations, and none in the Beja data under discussion here.

4.  As in Genetti (2011), the speech reports are highlighted in bold.




the quotative verb⁵ (typically in a series of dialogues) and only be signalled by the absence of deictic shift and by prosodic cues.

Prosodically, Beja is a language with both lexical and grammatical stress, realized as a high pitch, except in a few nouns where singular and plural forms are opposed by a rising contour and a falling contour on the accented final syllable. In words used in isolation, stress assignment rules are partly conditioned by the syllabic structure and partly depend on the grammatical category, as well as on the presence or absence of affixes and clitics. Position of stress may also be the only means to distinguish two homophones (for details see Vanhove 2012a: 8). In verbs, stress depends on the syllabic structure of the inflectional morphemes and on the verb class. For nouns, stress is lexically assigned and unpredictable in most cases, except for the penultimate stress of disyllabic nouns ending in a short vowel. In continuous speech, stress assignment depends in addition on pragmatics, speech tempo and intonation contours, which may produce stress shifts as well as “emphatic” as opposed to “normal” accents, in the sense of Genetti (2011: 160): “I distinguish two types of phrasal accent: normal and emphatic. Normal phrasal accent results in prominence which is noticeable but unremarkable. Emphatic phrasal accent, by contrast, has significant pitch excursions.” One IU may have several “normal” stresses before the final break, but rarely more than one “emphatic” phrasal accent.

In declarative utterances, the prosodic contour of a minor continuing break is either rising or level, with a possible pharyngealization or lengthening of the vowel of the last syllable; prosodic contours of major terminal breaks are falling or mid-falling in utterances which follow a regular declination contour, and final vowels are often devoiced. In questions, final breaks are either level or mid-fall, sometimes high-fall, rarely rising; they are regularly rising with the polar question marker han in final position and with the interrogative verb keː ‘to be where?’. In addition, the overall contour of a question is marked at the onset by a much higher pitch than the rest of the question, most often on the first syllable of the question, rarely the second one; F0 then drops strongly and usually continues with a level intonation (rarely ending on a final rise). Exclamative and imperative utterances also usually start at a high pitch, but contain more prosodic variation over the whole utterance; they usually end in a rising or high-fall contour. Topics are in a separate IU, most often followed by a pause. Focusing does not necessarily entail a higher pitch on the focused element.

The CorpAfroAs data on which this analysis is based, contrary to those of the other languages of the project, only contain narratives (17 traditional tales and 1 personal narrative; 17 were told by the same male speaker, and one by a female speaker); conversations were impossible to record because of social rules of politeness and honour. Direct speech report with a quotative verb is a frequent rhetorical strategy in Beja narratives; a total of 317 utterances were studied for this paper (leaving aside those without a quotative frame, or reduced to ‘yes’, ‘no’ or ‘that’s fine’, which are numerous and have been counted only once).

5.  The absence of a quotative verb or frame is common crosslinguistically for both written and oral registers. This has been particularly discussed within the frame of conversational analysis (see Klewitz & Couper-Kuhlen 1999).

3.2 Prosodic integration cline in Beja

3.2.1 Speech reports and quotative verb

Direct speech in Beja is rarely set off from the quotative verb by intonation-unit boundaries. The prosodic integration of the direct speech reports within the same IU as the quotative verb concerns the vast majority of the speech reports (almost 90% of the 317 examples). The quotative verb belongs either to the same IU as the whole quoted speech (90 examples), or, if the reported speech is split into several IUs, to the last IU of the quoted speech (175 examples), or to an internal IU (12 examples). As shown by these figures, the production of a speech report across multiple IUs concerns a large majority (68%) of the utterances. In the corpus, a single direct speech report can be as long as 13 IUs of various durations and with different pitch variations, separated or not by short, medium and long pauses. Quite often the quotative verb cliticizes to the speech report, is uttered at a very rapid tempo and is pronounced at such a low pitch that it does not show on the pitch trace provided by the Praat software. When the quotative verb is in the 3SG.M Perfective ini and does not bear any clitic element, it may even be phonetically reduced to a single vowel, often devoiced. Thus, comparatively, the direct speech reports are very often, but not always, louder and at a higher average pitch than the quotative verb. In a reported conversation, the quotative verb itself appears most often at the end of each character’s turn, rarely inside the quoted speech, and never several times for the same stretch of speech report of a character.

Below are three typical examples, one for each of the above categories. Example (3) shows a rather long speech report, which is set off from the previous and next IUs by medium pauses, and which includes the quotative verb in the same prosodic unit:

(3) aːlaʤ-an=hoːb       uː=jhaːm             d=heː
    tease-PFV.1SG=when  DEF.SG.M.NOM=leopard DIR=poss.1SG.ACC
      far-ija         i-ni            //
      jump-PFV.3SG.M  3SG.M-say\ipFV
    ‘When I teased it, the leopard jumped on me, he said.’ (BEJ_MV_NARR_15_leopard_052)

[Figure: pitch trace (Pitch in Hz against Time in s, 45.26–47.09 s) of example (3): alaʤanhob ujham dhe farija ini // ‘When I teased it, the leopard jumped upon me, he said.’]
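
The counts given at the beginning of Section 3.2.1 can be cross-checked with a few lines of arithmetic. The Python sketch below uses only the figures reported in the text (317 reports in total; 90, 175 and 12 integrated reports); the variable names and the helper `pct` are illustrative, and reading the 68% figure as a share of the integrated reports rather than of all 317 reports is an assumption, so both denominators are printed.

```python
# Counts reported for Beja in Section 3.2.1 (317 speech reports in total).
TOTAL = 317
same_iu = 90       # quotative verb in the same IU as the whole quote
last_iu = 175      # quotative verb in the last IU of a multi-IU quote
internal_iu = 12   # quotative verb in an internal IU of a multi-IU quote

integrated = same_iu + last_iu + internal_iu   # verb shares an IU with the quote
separate = TOTAL - integrated                  # verb in an IU of its own
multi_iu = last_iu + internal_iu               # integrated quotes spanning several IUs


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


print("integrated:", integrated, pct(integrated, TOTAL))
print("separate IU:", separate)
# Assumption: two possible denominators for the multi-IU share.
print("multi-IU share of integrated reports:", pct(multi_iu, integrated))
print("multi-IU share of all reports:", pct(multi_iu, TOTAL))
```

Under these assumptions the integrated reports amount to 277 of 317, which leaves 40 reports with the quotative verb in a separate IU.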

In example (4) the direct speech report is produced across five IUs:

(4) uːn            ani      eːt
    PROX.SG.M.NOM  1SG.NOM  PROX.PL.F.ACC
      t=ʔar=t=i                           eː-bi=hoːb            / BI_365
      DEF.F=child\PL=INDF.F=POSS.1SG.ACC  1SG-go\INT.IPFV=when
    naː    fiːr=eː      a-nfariːd      / 162
    which  face=ABL.PL  1SG-talk\IPFV
    kaːk  a-ndi         / 110
    how   1SG-say\IPFV
    kaːk  i-d=heːb               a-ndi         /
    how   3SG.M-say\PFV=OBJ.1SG  1SG-say\IPFV
    naː=t         mʔari  haːj  eː-bi            diːt         /
    which=INDF.F  food   COM   1SG-go\INT.IPFV  say\CVB.ANT
    ‘she says: “When I go to my daughters, how shall I have the guts to tell them? What shall I say? How shall I tell them what he told me? What shall I bring them to eat?”, and…’ (BEJ_MV_NARR_14_sijadok_138–145)

[Figure: pitch traces (Pitch in Hz against Time in s) of example (4), in three panels: 111.5–113.1 s, ‘When I go to my daughters,’; 113.1–115.6 s, ‘How shall I have the guts to tell them? What shall I say?’; 115.6–118.3 s, ‘How shall I tell them what he told me? What shall I bring them to eat?, she says and’.]

Example (5) is one of the rare examples of a direct speech report produced over two IUs with the quotative verb inserted in the quote itself, between the matrix clause and the complement clause, which are separated by a major boundary and a medium pause.

(5) dheːj-i=da         baː=hadiːd-aːna       i-ndi            eːn          // 381
    people-GEN.SG=DIR  NEG.PROH=talk-IMP.PL  3SG.M-say\IPFV   say\PFV.3PL
    damʔara=b         ni-mri=jeːt         toː=na              //
    gold=INDF.M.ACC   1PL-find\PFV=REL.F  DEF.SG.F.ACC=thing⁶
    ‘He tells them: “Don’t tell the people!” they said, “that we found gold!”’ (BEJ_MV_NARR_02_farmer_073–075)

6.  =jeːt toː=na is both a relative marker and a complementizer. Like all polyfunctional and polysemous items in the CorpAfroAs project, it is always glossed in the same way (in this case as “=REL.F DEF.SG.F.ACC=thing”), according to its prototypical and most frequent meaning.

[Figure: pitch traces (Pitch in Hz against Time in s) of example (5), in two panels: 67.52–69.26 s, bahadidana dhajida indi en // ‘He tells them: Don’t tell the people! they said,’; 69.29–71.05 s, (381) damʔarab nimrijet tona // ‘that we found gold!’]
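
The configurations illustrated in (3) to (5), together with the case where the quotative verb stands in an IU of its own, can be summarized as a small decision procedure. The Python sketch below is only an illustration of the categories used in this section, not the procedure followed for the chapter: the class `IU`, the function `classify_report` and the category labels are hypothetical, and treating any non-final quote IU that hosts the verb as “internal” (as in example 5) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class IU:
    """One intonation unit of a speech report and its quotative frame."""
    has_quote: bool   # the IU contains quoted material
    has_verb: bool    # the IU contains the quotative verb


def classify_report(ius):
    """Return the position of the quotative verb relative to the quote."""
    quote_idx = [i for i, iu in enumerate(ius) if iu.has_quote]
    verb_idx = [i for i, iu in enumerate(ius) if iu.has_verb]
    if not quote_idx:
        raise ValueError("a speech report needs at least one IU with quoted material")
    if not verb_idx:
        return "no quotative verb"
    v = verb_idx[0]
    if not ius[v].has_quote:
        return "separate IU"              # verb set off from the quote
    if len(quote_idx) == 1:
        return "same IU as whole quote"   # cf. example (3)
    if v == quote_idx[-1]:
        return "last IU of quote"         # cf. example (4)
    return "internal IU of quote"         # cf. example (5)


example_3 = [IU(True, True)]
example_4 = [IU(True, False)] * 4 + [IU(True, True)]
example_5 = [IU(True, True), IU(True, False)]

for name, report in [("(3)", example_3), ("(4)", example_4), ("(5)", example_5)]:
    print(name, classify_report(report))
```

Encoding each IU with two flags keeps the sketch independent of any particular transcription format; a fuller treatment would also have to record pauses and pitch resets, which the cline takes into account.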

In only 40 examples does the direct speech report occur in a different IU from the quotative verb; in 60% of these (24) it is also set off from the verb by a pause, most often short or medium. The non-integration of the direct speech report within the quotative frame occurs frequently (29 tokens) when the quoted speech consists of an exclamative utterance or contains an Imperative verb form or onomatopoeia (but these quote types may also occur, more rarely, in one IU together with the quotative verb). In one instance it occurs after a hesitation. These clause types typically start on a high pitch (of 200 Hz and above for the male speaker, far above his average pitch) before dramatically decreasing by some 100 Hz or more. The non-prosodically integrated quotative verb, even when not separated by a pause from the reported speech, starts with an upward pitch reset (of between 20 and 30 Hz) which sets it off from the speech report. In most cases there is in addition a rush on the quotative verb. Such dramatic falling contours, not characteristic of non-quotative narrative discourse, are commonly heard on exclamatory and vocative expressions. Below are a few examples.

(6) allaːj  bareːsoːkna=ka  /
    God     2PL.ABL=CMPR
    nhas=ka     nijaː=ju             // BI_541
    clean=CMPR  intention=COP.3SG
    i-ndi            eːn          //
    3SG.M-say\IPFV   say\PFV.3PL
    ‘God has better intentions than yours! he says, they said’⁷ (BEJ_MV_NARR_08_drunkard_077–080)

(7) beːn           kʷiːkʷʔaj  hiː-na       // 383
    DIST.SG.M.NOM  crow       give-IMP.PL
    ti-ndi=jeːb            oː=doːr            //
    3SG.F-say\IPFV=REL.M   DEF.SG.M.ACC=time
    ‘when it says: Give it to Crow!’ (BEJ_MV_NARR_16_Prophet_Fox_Crow_306–308)

(8) hawawawawa  /
    woof
    i-ndi=hoːb           /
    3SG.M-say\IPFV=when
    ‘when it says: Woof!’ (BEJ_MV_NARR_18_Adam_devil_207–208)

(9) jhaː  naːnaː=t=iː         ti-dir=i                  /
    ADRF  what=INDF.F=alb.sg  AOR.2SG.M-kill=OBJ.1SG
    ǝǝǝ  /
    er
    i-ndi=jaːt            //
    3SG.M-say\IPFV=COORD
    ‘Hey! Why have you killed me? er, it says and…’ (BEJ_MV_NARR_18_Adam_devil_160–162)

7.  The verb ‘say’ in the 3rd person plural of the Perfective, eːn, is very frequently used as a discourse marker which signals the end of a paratone, whether or not it is a reported speech. It is used even after a quotative verb. The literal translation ‘they said’ has been retained throughout the examples.


The remaining examples are long quotes (from 3 to 13 IUs) with rather complex syntactic organization, including quotes within quotes.

3.2.2 The onset of the speech report

As regards the context preceding a reported speech whose quotative frame is limited to the quotative verb, be it a narrative section or a previously quoted speech, it is most often set off from the speech report itself, namely in 98% of the 308 examples without an overt subject or lexical addressee at the onset of the quotative frame. Such a high percentage clearly indicates that the prosodic break is a marker of the onset of a quotation, even if it is not, of course, exclusive to this utterance type. Among these examples, 73% of the direct speech reports are separated from the previous utterance by a pause (10% short, 51% medium, 39% long), and among the remaining 27% of quotes, 34 are preceded by a major boundary with a terminal break, and 47 by a minor boundary with a continuative contour. In this last instance, the previous IU always contains a syntactically dependent clause, either temporal or causal, a coordinated clause with a finite verb, or a converbal clause. But it should be noted that these clause types may also be followed by major (final) boundaries or pauses of various lengths. Below is a typical example of a first coordinated clause with a continuative contour and a rush on the last word of the first IU, followed by the quote, which starts louder and at a higher pitch (an increase of 50 Hz).

(10) jiːn-a   i-sini=t               i=ʃiːtaːn
     day-PL   3SG.M-wait\PFV=COORD   DEF.M=devil
       ɖaːb-eː        dhaːj  jʔ-i=jaːt              /
       run-CVB.SMLT   DIR    come-AOR.3SG.M=COORD
     j=ʔar=aːn                      keː-jaːn
     DEF.M=child\PL=POSS.1PL.NOM    be_where-PFV.3PL
       i-ndi=hoːb            / 374
       3SG.M-say\IPFV=when
     ‘after some days, the devil comes back running and when he says: Where are our children? …’ (BEJ_MV_NARR_18_Adam_devil_298–299)

[Figure: pitch traces (Pitch in Hz against Time in s) of example (10), in two panels: 268.2–270 s, ‘after some days, the devil comes back running and’; 270.1–271.1 s, jʔaran kejan indihob / ‘when he says: Where are our children?’]
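
The proportions reported in Section 3.2.2 can be reconciled with the raw counts by simple arithmetic. The Python sketch below uses only figures given in the text (308 examples, 98% set off, 34 and 47 reports without a pause); the variable names are illustrative, and rounding 98% of 308 to a whole number of examples is an assumption.

```python
# Figures reported in Section 3.2.2 for the onset of Beja speech reports.
ONSET_EXAMPLES = 308     # frames without overt subject or lexical addressee
SET_OFF_SHARE = 0.98     # share set off from the preceding context

# Assumption: the 98% corresponds to a whole number of examples.
set_off = round(SET_OFF_SHARE * ONSET_EXAMPLES)

major_no_pause = 34      # preceded by a major boundary, no pause
minor_no_pause = 47      # preceded by a minor boundary, no pause
no_pause = major_no_pause + minor_no_pause
with_pause = set_off - no_pause

print("set off from previous context:", set_off)
print("without pause:", no_pause, round(100 * no_pause / set_off), "%")
print("with pause:", with_pause, round(100 * with_pause / set_off), "%")
```

On this reading, the 81 reports preceded only by a boundary correspond to the 27% mentioned in the text, and the rest to the 73% separated by a pause.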

3.2.3 Speech report, subject and addressee of quotative frame

The discussion of the prosodic integration cline of direct speech reports and their quotative frame in Beja would not be complete without mentioning the other elements of the frame, namely the subject and the recipient addressee, which co-occur only once in the corpus, and then not in the canonical word order.


3.2.3.1  Subject.  In the Beja data of the CorpAfroAs project, the syntactic subject of the quotative verb is rarely overtly expressed as an independent lexical or pronominal item: out of the 317 examples, only six have an overt lexical or pronominal subject. In all six cases, the subject is set off from the reported speech by prosodic boundaries, five times by a pause, and once by a minor boundary. Because finite verb forms contain a subject index coreferent with the lexical or pronominal subject and/or because the previous context is usually enough to make it clear who is talking, it seems that the narrator mainly feels it necessary to clarify which character is the author of the quote in cases of possible contextual ambiguity, as is the case in three of the examples. For instance in (11) below, which occurs in a complex narrative setting involving several characters (Adam, Lion, Dog, and the other animals), it cannot be assumed from the previous context which of them is the author of the speech report, and as it is not the last mentioned, the narrator needs to specify who is talking. In this context, the narrator hesitates several times before choosing the correct character, and pauses after he has found it, before uttering the speech report. Speech processing thus also plays a role in the separation of the subject from the speech report.

(11) ti=ɖhaniːni           kass=t=aː                  / 509
     DEF.F=wild_animal\PL  all=INDF.F=POSS.3PL.NOM
     aːn            hinin    / BI_1190
     PROX.PL.M.NOM  1PL.NOM
     oːn            oː=tak            niː-ʃibib     / 335
     PROX.SG.M.ACC  DEF.SG.M.ACC=man  1PL-look\aor
     niː-m-dir          eː-jad-na       eːn          //
     FUT.PL-RECP-kill   3-say\IPFV-PL   say\PFV.3PL
     ‘all the animals say: We are going to observe this man and fight against him, they said.’ (BEJ_MV_NARR_18_Adam_devil_191–197)

In addition to disambiguation, the subject may have the pragmatic function of a topic, which in Beja is prosodically followed by a continuing or a terminal break, followed or not by a pause. Example (12) below is a dialogue between two characters, Fox and Crow, and Crow’s quoted speech follows that of Fox. The subject is not integrated in the quote and is set off both prosodically, by a medium pause of 246 ms, and syntactically, by the dependent clause which describes the setting of the conversation (and does not belong to the quotative frame). The subject here functions pragmatically as a contrastive topic:

(12) ontʔa  kʷiːkʷʔaj  // 246
     now    crow
     i=sikka-i           hireːr-eː      a-haragʷi
     DEF.M=road-GEN.SG   walk-CVB.SMLT  1SG-be_hungry\PFV
       i-ndi=jeːb             oː=doːr            /
       3SG.M-say\IPFV=REL.M   DEF.SG.M.ACC=time
     ‘Now, when Crow, while walking on the road, said: I am hungry…’ (BEJ_MV_NARR_16_Prophet_Fox_Crow_166–168)

The subject of a speech report can also be pragmatically expressed as an afterthought topic shift, which occurs in a non-canonical position, after the quotative verb. Later on in the same tale as above, the narrator mentions an action carried out by the two characters. He then directly goes on with their dialogue. In the quote of the first character, the subject of the quotative frame appears after the quote itself, uttered in a low and rather flat pitch, typical of afterthoughts, as the speaker realizes somewhat late that the audience may not have understood which of the two characters is talking: (13)

t-haragʷi=jeːk soː-j-a=heːb / 151
2SG-be_hungry\IPFV=if CAUS-say-IMP.SG.M=OBJ.1SG
ti-ndi eːn // 553
3SG.F-say\IPFV say\PFV.3PL
lhaːweː=t //
fox=INDF.F
‘If you are hungry, tell me! Fox said, they said.’ (BEJ_MV_NARR_16_Prophet_Fox_Crow_159–163)

3.2.3.2  Addressee. The addressee (i.e. the recipient argument) is also rarely overtly expressed in the Beja data; in fact there are only eleven examples. Eight of them are enclitic 1st person object pronouns on the quotative verb (this low figure is at least partly due to the fact that enclitic 3rd person object pronouns are zero morphemes in Beja), and three have a lexical addressee at the onset of the quotative frame. The addressee may be integrated in the same IU as the speech report (or its beginning); this is the case in two instances. In the third one the addressee is set off by a long pause. Such a low figure provides no basis for explaining the various prosodic processes.

(14) oː=jaːs-i=d hus ak-a
DEF.SG.M.ACC=dog-GEN.SG=DIR voice be-IMP.SG.M
eː-jad-na eːn //
3-say\IPFV-PL say\PFV.3PL
‘they tell the dog: Shut up!, they said.’ (BEJ_MV_NARR_18_Adam_devil_221)


(15) ti=ndeː=t-i=da ja iraːni
DEF.F=mother=INDF.F-GEN.SG=DIR ADRF gosh
w=ʔoːr=oːk /
DEF.SG.M=child=POSS.2SG.ACC
bak tʔi-it=eːt hajʔaː=t=ib rh-an
thus resemble-VN=REL.F way=INDF.F=LOC.SG see-PFV.1SG
i-ndi=hoːb /
3SG.M-say\IPFV=when
‘when he tells the mother: Gosh I saw these things that happened to your son…’ (BEJ_MV_NARR_13_grave_073–075)

(16)

lhaːweː=t-i=dha // 950
fox=INDF.F-GEN.SG=DIR
lhaːweː=t / BI_624
fox=INDF.F
jhak-a ti-ndi eːn
get_up-IMP.SG.M 3SG.F-say\IPFV say\PFV.3PL
‘it says to Fox: Fox, get up! they said…’ (BEJ_MV_NARR_16_Prophet_Fox_Crow_203–207)

To conclude, in Beja, the direct speech reports are most often prosodically marked with respect to the quotative frame, but there is a radical asymmetry between the onset of the quote and its end, which integrates the quotative verb in the same IU as the (end of the) speech report in roughly 75% of the examples. Furthermore, there is almost always a prosodic boundary at the onset of the direct speech report, which is a clear prosodic cue of the beginning of a quote. Such a pervasive prosodic pattern might be linked to the syntactic properties of the language and to physical constraints, i.e. the SOV word order, the embedding of the quote within the quotative frame whose initial part is most often missing, the absence of a complementizer, and the fact that the short quotative verb most often occurs at the end of the speech declination. Only comparative work with other languages which share this combination of typological features could confirm or reject the hypothesis. Any comparison with Dolakha Newar can only be partial because even though it is also an SOV language, it does not share all the features of Beja.

4. Zaar quotatives

4.1 Elements of syntax and prosody

Zaar, a South-Bauchi Chadic language spoken in Nigeria, is strongly SVO, except in some specific morpho-syntactic contexts: it is obligatorily SOV with pronominal direct objects, and optionally so with nominal direct objects when the verb is in the Continuous tense. The linear order is the only cue that distinguishes between S and O. It is also the only cue that distinguishes between the recipient and the direct object of a ditransitive verb, among them the quotative verb wul ‘say’ (often reduced to [wû]); the recipient precedes the direct object. Recipients of other transitive verbs are marked by a special morpheme. In complex sentences, relative and complement clauses follow their nominal and predicative heads respectively. They may or may not be introduced by various non-specific markers and complementizers. Zaar makes use of direct and indirect speech reports which are both introduced by specific complementizers.8 The whole quotative frame, including the quotative verb wul ‘say’, precedes the quoted speech, which is introduced by one of the two complementizers (termed “Opener” in Caron’s (2012a) terminology), tu and wéj, the latter being a borrowing from the Hausa evidential particle. As described in Caron (2012a), Zaar is a three-tone language9 with no phonologized downstep and with stress overriding declination, stress meaning here a prosodic emphasis, i.e. the relative prominence of syllables within a word. The relative height of tones within an IU is linked to stress. Emphatic stress in Zaar is used to underline the rhematic status of lexemes. Initial intonemes, i.e. a set of distinct intonation contours associated with particular functions, are characterized by downstep or upstep, i.e. there is a noticeable change in the register of an IU compared to the preceding one. Both upstep and downstep are associated with specific functions: topicalization, polar questions, emphasis of adverbials and emotional statements for the former; parenthesis, comments following an (upstepped) topic, and contrastive focus for the latter. Final intonemes are either falling, rising, continuing, or high-falling.
The falling intoneme corresponds to canonical assertions and Wh-Questions. The rising intoneme is mostly associated with polar questions and exclamations. The continuing intoneme, which only occurs at the end of minor units, cancels declination, is often associated with lengthening and induces the only cases (and these are rare) of plateau realization of a flat tone. It is often associated with topicalizing morphemes. The rising-falling intoneme appears as a sharp downward fall preceded by a smaller rise. It is systematically associated with emphasis on negation, ideophones and assertion particles.

Direct reports are more frequent than indirect ones in the CorpAfroAs data: for a total of 125 speech reports, 66% are direct ones, and 34% are indirect. Each type is evenly distributed between conversations and narratives, which represent respectively 48% and 52% of the total duration of the Zaar corpus.10 The complementizer tu is more frequent than wéj (106 vs 20); the latter is limited to the conversation register (where codeswitching with Hausa is also more frequent). The use of the quotative verb is not compulsory, and the complementizer may be used on its own. There are 20 such examples with tu and 13 with wéj, both with direct and indirect quotes. The complementizer wéj may combine with tu, but this occurs only once in the CorpAfroAs data. Zaar quoted speech is thus always signaled segmentally by at least a complementizer, and is thus syntactically marked. But note that in Zaar, as in Beja, direct speech reports may also occur without a quotative frame, whereas this is never the case for indirect speech reports.

8.  Direct reported speech introduced by a complementizer is often called “semi-direct speech”.
9.  This paragraph is extracted from Caron (2012a: 43–45) in which further details and examples are provided.

4.2 Prosodic integration cline in Zaar

4.2.1 Speech reports and quotative frames

The prosodic integration of the direct speech reports within the same IU as the quotative frame is less frequent in Zaar than in Beja. The degree of the prosodic integration of the (onset of the) speech reports within the previous quotative frame varies with the type of constituents of the quotative frame. When the quotative frame contains both wul ‘say’ and the complementizer tu, the speech reports (or their first IU in cases where they are produced over several IUs) are prosodically integrated in almost 40% of the cases (41/106), less frequently for indirect speech reports (33%) than for direct ones (50%). When the speech reports are set off from the quotative frame, the prosodic boundary always occurs after the complementizer tu, except once (see below ex. 21). Prosodic boundaries may be a major or a minor break, followed or not by a pause of any size. Below are examples for direct (ex. 17–18) and indirect (ex. 19–20) speech reports.11 (17)

(direct speech; quotative frame and speech report in one IU)
kətá wuləm tu mə̀ ɓúp kə kímsə
kətá wul=mə tu mə̀ ɓup=kə kímsə
2SG.sbj.REM say=1SG.obj OPN 1PL.sbj.SBJV wait_for=2SG.obj Kimsẹ
‘You told me: we wait for you in Kimsạ.’ (BC_SAY_CONV_03_SP2_263)

10.  19 speech reports had to be left aside because of overlaps with another speaker which made the data unclear, either segmentally or prosodically. 11.  Because of the tone differences in Zaar between the surface and underlying tones, the phonetic transcription is also provided.

[Figure: pitch contour of ex. (17); pitch (Hz) against time (s), 494.6–495.7 s.]

(18)

(direct speech; quotative frame and speech report in two IUs)
èː jâːm mjáː sûːŋ wúlɣə tuːːː /
èː jáːni mjáː súː-ə́n wul=kə tu-ːːː /
yes 3SG 1SG.sbj.IPFV want-PROX say=2SG.obj OPN-length
ɗaŋ ka ɬə̂ːʃí kóː /
ɗan ka ɬə=ʃí kóː /
as 2SG.sbj.FUT go=3PL.obj or
bàː ʧík ŋâː nə lǎːn mə tájáː fuːɣə tún tún
bàː ʧík ŋaː nə́ laː-ə́n mətájáː fuː=kə tún tún
NEG1 thus vrt for work-PROX 1SG.sbj.IPFV.REM tell=2SG.obj since since
‘That’s why I want to tell you: When you go, isn’t it like this, for the work that I have been telling you since?…’ (BC_SAY_CONV_03_SP2_118–120)

[Figure: pitch contour of ex. (18), in two panels; pitch (Hz) against time (s).]

(19)

(indirect speech; quotative frame and speech report in one IU)
ka wu tu kə mân /
ka wul tu kə mân /
2SG.FUT say OPN 2SG.AOR come
‘you will say you have come…’ (SAY_BC_CONV_01_SP1_114)

[Figure: pitch contour of ex. (19); pitch (Hz) against time (s), 232.7–233.4 s.]

(20)

(indirect speech; quotative frame and speech report in two IUs)
kúmá á wû tu /
kúmá á wul tu / 451
also 3SG.AOR say OPN
ʧáː súː tə̀ vjáːj ɗaːmí /
ʧáː súː tə̀ vjáː-íː ɗa=mí /
3SG.IPFV want 3SG.SBJV spend_day-res at=1PL.obj
‘And he said he wants to stay in our village’ (BC_SAY_CONV_03_SP1_549–551)

[Figure: pitch contour of ex. (20); pitch (Hz) against time (s), 525.8–527.5 s.]

4.2.2 Prosodic integration of the complementizers

4.2.2.1  The complementizer tu.  As shown in the above examples, tu is usually included in the quotative frame; it thus marks in itself the onset of the speech report, and Zaar, contrary to Beja, does not need to resort to prosodic means for this purpose. This is probably one of the syntactic reasons why Zaar has more freedom regarding the prosodic integration or non-integration of the speech report itself within the quotative frame. Below is the sole example where the complementizer is prosodically integrated in the direct speech report instead of the quotative frame, from which it is set off by a medium pause of 295 ms. The wrestling of the dog is mentioned seven times before ex. (21). It seems the narrator hesitates and is repeating the same action in various syntactic ways to give himself time to remember the next line of the tale. Difficulties in speech processing here may explain this unusual position of the IU boundary. (21)

mbə́rgə̀ptəŋ wùl / 295
mbə́rgə̀ptə wul /
hyena say
tu tôː tə̀ ɲòm tə́ káɗi //
tu tòː tə̀ ɲom tə́ káɗi //
OPN well 3PL.SBJV take with dog
‘Hyena said: Well, let him wrestle with Dog.’ (SAY_BC_NARR_03_SP1_479–481)




When the quotative frame is reduced to the complementizer tu, both in direct and indirect quotes, it is systematically integrated within the same IU as (the onset of) the quoted speech, as in (22), where in addition it expresses the manner in which the previous utterance is accomplished. This example echoes Noonan’s (2006) findings about the various semantic functions of direct speech reports in Chantyal, a Tibeto-Burman language, adding one more meaning to his list, which consists of reason and causation, purpose and motivation, intention, attendant circumstance, and the listing of alternatives. (22)

tàːtá ŋgâː tə́ mârá sə̂mwòs /
tàːtá ngaː tə́ mará sə̂m=wos /
3PL.sbj.PFV.REM start 3PL.sbj.AOR spoil name=poss.3.SG
tu ʧáː ʧi mə̂ːr //
tu ʧáː ʧi mə̂ːr //
OPN 3SG.sbj.IPFV eat theft
‘they spoilt his reputation by saying that he was a thief.’ (BC_SAY_CONV_03_SP2_458–459)

[Figure: pitch contour of ex. (22); pitch (Hz) against time (s), 896.7–898.6 s.]

The sole exceptions occur in two utterances, one with an inceptive auxiliary verb, and the other with a lexical subject in the quotative frame: tu, as when used with a quotative verb, is integrated within the quotative frame (ex. 23 and 24).


(23)

tə̀tà ŋgúp tu ká /
tə̀tà ngúp tu ká /
1PL.REM start OPN disapproval
ka ɗu bóːlǐŋǎːn //
ka ɗu bôːl-íː ŋaː hə́ŋ //
2SG.FUT beat football-DEF vrt NEG2
‘they started (to say): What! won’t you play football?’ (BC_SAY_CONV_3_SP1_674–675)

(24)

bàsàjì gòs tu / 783
bàsàjì gòs tu /
Zaar poss.3SG OPN
ìndán in kjáː kárá ŋgə́tnɗi /
ìndán in kjáː kará ngə́tn-ɗi /
if if 2SG.COND beg thing-ctp
‘Zaar people (say) that: if you beg for something…’ (SAY_BC_NARR_02_SP1_444–446)

As far as the preceding clause is concerned, be it a narrative section or another speech report, tu is never prosodically integrated within it.

4.2.2.2  The complementizer wéj (glossed evd in Caron’s data), which introduces both direct and indirect discourse, on its own or in addition to wul tu or tu, only occurs in two of the three conversations, and not in the narratives. 20 tokens are introduced by wéj. In the five tokens where the quotative frame consists of wul tu wéj, the quote is in a different IU from the quotative frame, with one exception. Similarly to tu, when wéj stands alone (13 examples), it is most often integrated in the same IU as the speech report (11 examples). (25)

kətá wu tu wéj /
kətá wul tu wéj /
2SG.REM say OPN evd
nə núːɣəŋ átâjáː sop /
nə núː=kən átâjáː sop /
COP1 who=COP2 3SG.REM.IPFV court
duːkjôː ŋǎːn //
duːkíja=oː ŋaː hə́ŋ //
Dukiya=fct vrt NEG2
‘You were wondering: who is dating Dukiya?’ (BC_SAY_CONV_03_SP1_925–927)




4.2.2.3  End of quotes.  The end of both direct and indirect speech reports is almost always set off from the following IU by a major boundary, followed or not by a pause, very rarely by a minor boundary.

4.2.2.4  To conclude, in Zaar there are no major prosodic differences between direct and indirect speech reports, except that direct reports are more often integrated with the quotative frame than indirect ones (50% vs 33%). As the use of a complementizer systematically marks the onset of a speech report, the prosodic resources need not be used for this purpose. This allows more variation in the prosodic integration cline of the speech reports within the quotative frame than in Beja. The sole major distinction in Zaar is not linked to direct or indirect speech, but to the use of a quotative verb in the quotative frame: if the quotative verb is present, the complementizer always belongs to the same IU as the quotative frame; if not, the complementizer belongs to the same IU as the speech report.

5. Juba Arabic quotatives

5.1 Elements of syntax and prosody

Juba Arabic, an expanded Arabic-based pidgin spoken in South Sudan with hardly any morphology, is predominantly SVO. The linear order is the only cue that distinguishes between subject and object arguments. The basic constituent order may become SOV in the presence of contrastive topicalisation. Indirect objects follow direct objects, and the recipient of a ditransitive verb is signaled by the use of the dative preposition le. It often precedes the direct object. Relative clauses are head external and are introduced by an invariable relative marker al which follows the head, but in basilectal registers the relative marker is often missing. Headless relative clauses are introduced by the same marker. Subordinate complement clauses usually follow the primary verb directly, i.e. without a complementizer.
Speech reports have the same pattern as complement clauses and are introduced by any of the three general quotative verbs wonosu, kelim, and gale. In addition, the verb gale ‘say’ may function as a complementizer with verbs of speaking and thinking, including after the verb gale itself. Nothing can intervene between the verb and the complementizer except in acrolectal varieties, where the recipient may optionally separate the verb of speaking and the complementizer. In basilectal varieties the dative marker le is often omitted.

Juba Arabic is a pitch accent language where stress, realized as a high pitch, is distinctive both lexically and grammatically. Usually, stress falls on the first heavy syllable of a word or on the first syllable of a word with no heavy syllables. In a few cases, stress is lexically distinctive: ˈsaba ‘seven’ vs. saˈba ‘morning’. In verbs, stress distinguishes the active voice from the passive voice for verbs ending in -u: ˈkatulu ‘kill’ vs. katuˈlu ‘be killed’. Ambitransitive verbs always have a final stressed syllable. Stress on the penultimate syllable in verbs ending in -u is associated with deverbal nominal forms: ˈkuruju ‘cultivate’ vs. kuˈruju ‘cultivation’.12

In declarative utterances, pitch variation is not very large in the speech of the four male speakers of the CorpAfroAs data. Continuing terminal contours (minor breaks) are either level, with possible lengthening of the last vowel, or slightly rising. Final terminal contours are falling except in questions, where the most common contour is fall-rise or rising, more rarely rise-fall or high, preceded by a rather flat contour on the previous syllables of the IU. Exclamative utterances have a sharp rising final pitch, and the IU may in addition start with a large rising pitch reset. Very long, rapid IUs lasting up to some two seconds are not infrequent and can include very long stretches of some 14 syllables.

The 46 minutes of the CorpAfroAs Juba Arabic corpus consist of 40% narratives and 60% conversations. The data show that speech reports are a much more frequent rhetorical strategy in narratives than in conversations:13 out of a total of 70 speech reports, 77% occur in narratives, against only 23% in conversations. Juba Arabic has both direct and indirect speech reports. The vast majority are direct (66/70), of which only nine are introduced by the grammaticalized complementizer gale. The four indirect speech reports are only signaled by a deictic shift to the perspective of the narrator; none of them is introduced by the complementizer.
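The proportions behind these figures can be recomputed from the raw counts. The following Python lines are only an illustrative sketch: the variable names and the helper `share` are hypothetical and are not part of any CorpAfroAs tool; the counts themselves are those given in the paragraph above.

```python
# Counts reported in the text for the Juba Arabic speech reports.
TOTAL_REPORTS = 70      # all speech reports
DIRECT = 66             # direct speech reports
INDIRECT = 4            # indirect speech reports
DIRECT_WITH_GALE = 9    # direct reports introduced by the complementizer gale


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# The two types must add up to the reported total.
assert DIRECT + INDIRECT == TOTAL_REPORTS

direct_share = share(DIRECT, TOTAL_REPORTS)
indirect_share = share(INDIRECT, TOTAL_REPORTS)
gale_share = share(DIRECT_WITH_GALE, DIRECT)

print(direct_share, indirect_share, gale_share)
```

On these counts, direct reports make up about 94% of the Juba Arabic speech reports, and fewer than one in seven of them is introduced by gale.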
5.2 Prosodic integration cline in Juba Arabic

Juba Arabic presents a typological profile distinct from both Beja and Zaar: it is the sole language in our sample where all speech reports (except two), direct and indirect, are integrated (fully, or partially if produced across multiple IUs) within the same IU as the quotative frame. The quotative frame itself is most often set off prosodically from the previous utterance, but in a few rare instances it is part of the same IU. As in Zaar, the end of quotes, a mirror image of the onset of quotes in Beja, is systematically set off from the following discourse or narration either by a pause (53 examples), or a major (13) or minor boundary (1) without a pause. The three remaining examples are truncated utterances for which it is not possible to tell whether there is a terminal or a continuing boundary.

12.  The above three paragraphs are a summary of Manfredi & Petrollino (2013).
13.  One reviewer rightly points out that some conversational genres incorporate narratives within them, and it is possible that these would show higher levels of quoted speech. The other reviewer questions the fact that conversations including direct speech reports could be regarded as a rhetorical strategy, at least in some cultures.

5.2.1 Intonation units of direct speech reports

The prosodic integration of direct speech reports varies from the total length of the quote to just its initial word. As mentioned above, the Juba Arabic speakers (all males) of the CorpAfroAs corpus may have very long IUs in terms of duration and number of syllables, encompassing several syntactic clauses, including speech reports with their quotative frames. Ex. (26) below lasts almost two seconds and contains 14 syllables actually pronounced. Thus Genetti’s (2011: 72) claim that speech reports have to be “necessarily shorter” in order to be integrated within the same IU as the quotative frame may apply to Dolakha Newar, but it is certainly not the case for Juba Arabic, and her observation cannot be considered as a prosodic universal.

(26) úo gále ja áki úo ma bi=árifu kalám zej de //
3SG say VOC bro 3SG NEG IRR=know discourse like PROX.SG
‘The lion said: they would never know about this.’ (PGA_SM_NARR_2_SP1_595)

[Figure: pitch contour of ex. (26); pitch (Hz) against time (s), 567.6–569.3 s.]

At the other extreme are the direct speech reports whose first constituent is the sole element included within the same IU as the quotative frame. Nineteen direct speech reports are of this type; most of them start with a vocative element (9 examples), a discourse particle (4 examples) or an exclamation word (4 examples); there is also one instance with a pronoun, and one with an adverb.
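As a quick consistency check, the breakdown of these nineteen reports by type of first constituent can be tallied as follows. This is a minimal sketch under stated assumptions: the dictionary keys are labels chosen here for illustration, and the counts are those given in the preceding paragraph, together with the total of 66 direct speech reports reported in Section 5.1.

```python
# Type of the first constituent of the quote, for direct speech reports in
# which only that constituent shares an IU with the quotative frame.
first_constituent = {
    "vocative": 9,
    "discourse particle": 4,
    "exclamation word": 4,
    "pronoun": 1,
    "adverb": 1,
}
DIRECT_REPORTS = 66  # total number of direct speech reports (Section 5.1)

subtotal = sum(first_constituent.values())
assert subtotal == 19  # matches the figure given in the text

# Share of direct reports showing this minimal degree of integration.
minimal_integration_share = round(100 * subtotal / DIRECT_REPORTS, 1)
print(subtotal, minimal_integration_share)
```

The five categories do add up to nineteen, which corresponds to a little under 30% of the direct speech reports.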


Example 27 is one of these speech reports, with a minor boundary and a pitch reset at the beginning of the next IU. (27)

gále ja áki /
Abigo say VOC bro
badá éna //
2PL afterwards IMP-come_back\IPFV-3PL here
‘Abigo said: Bro, after that, you have to come back.’ (PGA_SM_NARR_2_SP1_544–545)

[Figure: pitch contour of ex. (27); pitch (Hz) against time (s), 518.8–520.8 s.]

Variations in pitch are often not very large in direct speech reports, which most often follow the natural declination of speech, even in exclamative contexts as in (28) below, where the pitch increase at the onset of the speech report is only 10 Hz as compared with the pitch of the quotative verb.

(28) gále waláhi ána ma bi=árifu bet de //
say by_god 1SG NEG IRR=know house PROX.SG
‘(She) said: I swear, I don’t know this house.’ (PGA_SM_CONV_2_SP1_533)

[Figure: pitch contour of ex. (28); pitch (Hz) against time (s), 767.8–768.9 s.]

Larger pitch variations may occur in exclamative contexts, together with a slight increase in loudness at the beginning of the speech reports, as in (29), where the stressed syllable of gubár is ten decibels higher than the previous vocative elements, as shown by the solid line which indicates intensity. (29)

gále gubár ja zol /
say dust VOC man
now sehí~sehí //
get_up\PTCP.ACT.SG.M type right~right
‘(She) said: dust, man, really a lot of (dust)!’ (PGA_SM_CONV_1_SP1_258–259)

[Figure: pitch and intensity contours of ex. (29).]

Typical of direct speech reports which contain exclamative or vocative words and imperative verbs is an isochronous pattern which includes the succession of pitch rises at the end of each IU, far above the average pitch of the speaker as in (30), followed or not by a pause: (30)

jála gále ja áki // 205
then Abigo say VOC bro
ja árnab //
VOC rabbit
/
2SG.M
légetu now al áfin de / 261
gather type REL rotten PROX.SG
‘So Abigo said: Bro! Rabbit! You! Gather the bad one!’ (PGA_SM_NARR_2_SP1_140–146)

[Figure: pitch contour of ex. (30), in two panels; pitch (Hz) against time (s), 123.6–127.5 s.]

The recipient argument of the quotative verb is rarely expressed in the corpus, and occurs only twice, each time with the direct speech report introduced by the complementizer gale. In both examples (with one lexical and one pronominal recipient) the dative preposition is not used, and the beginning or the entire speech report is also integrated in the same IU as the quotative frame. (31)

úo báda kélim ána gále ja weledí // 245
3SG start speak 1SG say VOC sonny
fi /
EXS
jaːni //
that_is_to_say
fi sarájr gi=adʒirú~adʒirú /
EXS beds PROG=rent\PASS~rent\PASS
‘He answered me: Sonny, there is, I mean, there are some beds that can be rented’ (lit. he started telling me that) (PGA_SM_NARR_1_259–263)

[Figure: pitch contour of ex. (31), in two panels; pitch (Hz) against time (s), 195.9–199.3 s.]

5.2.2 Intonation-unit boundaries in indirect speech reports

As mentioned before, the sole segmental difference between a direct and an indirect speech report in Juba Arabic is the presence of a deictic shift in indirect ones: the indirect speech is reported from the perspective of the narrator, not from that of the character. Only one of the four indirect speech reports is produced across multiple IUs; three follow the natural speech declination with minor pitch variations and a rather flat overall contour; one (34) has more pitch movements than the others and a sharper declination.

(32) bes rówa kan gále íta ázu wáhid //
only go ANT say 2SG want one
‘Just go and say that you want one bed.’ (PGA_SM_NARR_1_286)

[Figure: pitch contour of ex. (32); pitch (Hz) against time (s), 215.6–216.7 s.]

(33)

gále úmon gi=gum /
say 3PL PROG=get_up
úmon déru rówa /
3PL want go
‘(They) said that they are leaving; they want to go’ (PGA_SM_CONV_1_SP2_003)

(34) gále úo gi=rówa túrkja úo gi=rówa dʒíbu
say 3SG PROG=go Turkey 3SG PROG=go bring
afas-át ta mustéʃfa //
furniture-PL.N POSS hospital
‘(he) said that he is going to Turkey in order to bring the furniture of the hospital.’ (PGA_SM_CONV_2_SP1_042)

[Figure: pitch contour of ex. (34); pitch (Hz) against time (s), 72.57–74.48 s.]

This last example is interesting as it echoes findings about the various semantic functions, mentioned in Section 3.2.2, expressed by direct speech reports in Chantyal (Noonan 2006). The purposive meaning of ex. (34) is not a translation effect, and it is interesting to note that whereas Chantyal uses the sequential converb of the quotative verb for this particular meaning, Juba Arabic, a language with hardly any morphological devices, simply uses a multifunctional verb form. Still, it is not a surprising evolution as the grammaticalization of a quotative verb into a purposive marker is widely attested crosslinguistically, and in Sudan in particular (see e.g. Saxena 1995; Vanhove 2004; Güldemann 2008).

To sum up, direct and indirect quotes do not seem to behave differently in their cline of integration within the quotative frame, and in their final boundaries. The sole difference, the absence of an isochronous pattern in indirect speech reports, needs to be checked on a larger sample as four examples are too few to draw general conclusions.

6. Modern Hebrew quotatives

6.1 Elements of syntax and prosody

As in Zaar and Juba Arabic, the canonical constituent order of Modern Hebrew is SVO, but it may vary for reasons linked to information hierarchy. The subject argument is morphologically unmarked, but the direct object argument, when definite, is marked by a specific clitic preposition et= (with nouns), and its allomorph ot= (with pronouns). Recipient arguments of transitive verbs, among them the quotative verb lomaʁ ‘say’ and its suppletive form lehagid, are introduced by the preposition l= ‘towards’, which is clitic to its nominal or pronominal host (the preposition also cliticizes to the verb when the recipient argument is a pronoun). In complex sentences, the relative clause is usually embedded in the matrix clause with the nominal head preceding it and the rest of the embedding clause following it. Object complement clauses usually follow the matrix clause, and thus follow, directly or indirectly, the verbal head. Both clause types are introduced by an invariable clitic marker ʃe=.

Speech reports may be direct or indirect. Direct speech reports are syntactically quotative complements, direct objects of the quotative verb, but in most cases, unlike definite nominal objects, pronominal objects, and complement clauses, they are not introduced by a preposition or a complementizer. The quotative frame, i.e. the subject, the quotative verb and the recipient, expressed in this order, precedes the quoted speech. The recipient argument is often omitted, but, unlike in the other three CorpAfroAs languages discussed in this chapter, omission of the subject is rare: in fact there are only two subjectless quotative frames in the entire corpus. In some rare instances, the direct quoted speech is introduced by the similative14 marker keilu ‘like’ (a crosslinguistically frequent source of quotative markers, see Güldemann 2008), a construction typical of the younger generation (such as the female speaker 1 of CONV_1 and NARR_1 who is under 35). Indirect speech reports, on the other hand, like complement and relative clauses, are introduced by the clitic marker ʃe= ‘that’, and they are syntactically adapted to the narrator’s perspective by a deictic shift.

Modern Hebrew15 has a lexical accent system where word-final stress characterized by a higher pitch is the phonological “default” stress.
In addition, there is a smaller set of lexical and prosodic words which have penultimate stress (antepenultimate stress is mainly found in borrowings). There is an overt rhythmic play between stressed and unstressed syllables. Function words do not usually carry stress. The duration of the final syllable of an IU, be it a continuing or a terminal contour, is double the length of the other syllables. Major terminal contours are falling in declarative utterances, rising on the last accented item in questions and very strongly rising in exclamative utterances. Continuous minor boundaries are of five types according to Silber-Varod and Kessous (2008) (who used the Corpus of Spoken Israeli Hebrew [CoSIH], see Izre’el and Rahav 2004): continuous-rising tone, continuous-falling tone, continuous rising-falling tone, continuous-level tone, and continuous-elongated tone, the latter being by far the most frequent and the second and third the least frequent.

14.  It also functions as a discourse marker.
15.  This paragraph on prosody is mainly based on Mixdorff and Amir (2002), Amir, Silber-Varod and Izre’el (2004), Silber-Varod and Kessous (2008), and Silber-Varod (2011).

154 Il-Il Malibert and Martine Vanhove

In the CorpAfroAs corpus of Modern Hebrew,¹⁶ speech reports are a rare rhetorical strategy, far less frequent than in the other three languages of our sample. Indirect speech reports amount to a mere nine examples, while direct speech reports are slightly more than twice as numerous¹⁷ (21, of which only two are introduced by the similative marker keilu). Direct and indirect quotes are almost equally distributed between narratives and conversations, but indirect speech reports are twice as numerous in narratives (6) as in conversations (3). These low figures, and the fact that conversations represent 33% of the one-hour Modern Hebrew corpus, do not lend statistical significance to this observation, which would need to be confirmed on a larger corpus.

6.2 Prosodic integration cline in Modern Hebrew

6.2.1 Intonation-unit boundaries in direct speech reports

Fourteen of the 21 direct speech reports, i.e. a majority, are integrated within the same IU as the quotative frame (totally or partially when the quote is split into several IUs), including one of the two examples with the similative marker keilu. In the remaining seven examples, the quotative frame is set off from the direct speech, most often simply by a major or a minor boundary, and in two cases also followed by a pause. There is a clear tendency in the degree of prosodic integration according to genres. In narratives, the direct quote is integrated in the same IU as the quotative frame in six examples, and not integrated in the remaining example. The non-integration might not be significant because it concerns one utterance where someone else speaks at the same time as the narrator. Below is an example in a narrative where only the first word of the speech report is integrated in the same IU as the quotative frame because the narrator hesitates.
The very high rising pitch of 150 Hz on the quotative verb, far above the usual pitch rise before keilu in other utterances, is explained by pragmatic reasons: the narrator is trying to convince her interlocutor of something implausible.

(35)
ve=aːːː#
and=FS
anaʃim ʃe=omʁ-im keilu ani /
men\PL COMP=say\PTCP.ACT-PL.M like 1SG

16.  One text, NARR7, is taken from the CoSIH corpus (see Izre’el & Rahav 2004).
17.  Zuckermann (2006: 469) claims that the ratio of direct to indirect speech reports is just the opposite, but he does not mention what kind and quantity of data support his assertion.

Quotative constructions and prosody in some Afroasiatic languages 155





ani //
1SG
lemaʃal a# /
for_example FS
lemaʃal kʃe=ata mekabel stam //
for_example when=2SG.M obtain\PTCP.ACT[SG.M] whatever
notn-im dugm-a /
give\PTCP.ACT-PL.M example-SG.F
‘and the people who say like: I, I, for instance I… for instance when you get… they give there an example…’ (HEB_IM_NARR1_SP1_135–140)

[Figure: pitch trace for the first IUs of example (35), ‘and the people who say like: I,’. Axes: Time (s), 134.8–136.7; Pitch (Hz), 100–500.]

In conversations, the direct quote is more commonly set off from the quotative frame (eleven examples) than the reverse (two examples). One of these two (ex. 36) is split into several IUs, and its first element consists of the 1st person independent subject pronoun, and the quotative verb with which it is phonetically fused and reduced to [omʁanə]. A minor prosodic boundary follows it and the rest of the quoted speech follows with an initial pitch reset of over 100 Hz.

(36)
687 omʁ-im ani ko# kodemkol jisʁael-i /
say\PTCP.ACT-PL.M 1SG FS first_of_all Israel-ADJVZ
ve=axʁej ze jehud-i /
and=after anaph Jewish-ADJVZ
‘they say: I am first of all an Israeli, after that a Jew…’ (HEB_IM_CONV2_SP1_116–117)

[Figure: pitch trace for example (36), ‘they say: I am first of all an Israeli, after that a Jew’. Axes: Time (s), 216.3–218.5; Pitch (Hz), 100–300.]

The other example contains the similative marker keilu; a minor IU boundary follows the marker with a falling contour, and the pitch at the beginning of the direct speech report slowly rises again over the first syllables, before a sharp rise of 90 Hz on the accented syllable.

(37)
ve=hem omʁ-im keilu /
and=3PL.M say\PTCP.ACT-PL.M like
ze lo=meʃan-e im ze davaʁ katan //
INDF NEG=change\PTCP.ACT-SG.M if INDF thing small
‘and they say like: it does not matter if it is a small thing’ (HEB_IM_CONV1_SP1_127–128)

[Figure: pitch trace for example (37), ‘and they say like: it does not matter if it is a small thing’. Axes: Time (s), 189–190.8; Pitch (Hz), 100–450.]




Because of the scarcity of examples it is not possible to tell whether keilu usually belongs to the quoted speech or to its frame. In any case the number of direct quotes is too small to draw conclusions about the distribution of prosodic patterns across genres. This would require further research on a larger sample.

In association with the marking by IU boundaries, direct speech reports are also set off from the quotative frame by variations in pitch, and can either be pronounced at a lower pitch than the average pitch of the quotative frame after a falling terminal contour as in (38) or at a higher pitch after a continuing contour as in (39). On the pitch trace, a decrease or an increase of the fundamental frequency is clearly seen over the IU of the reported speech as compared with the quotative frame.

(38)
ve=az hem matxil-im l=hagid l=xa //
and=so 3PL.M begin\PTCP.ACT-PL.M to=say\INF to=2SG.M
ʁega ve=ex ani os-e et=ze //
one_moment and=how 1SG do\PTCP.ACT-SG.M OBJ.DEF=DEM
‘and then they start telling you: Wait a minute! How do I do it?’ (HEB_IM_CONV1_SP1_103–104)

[Figure: pitch trace for example (38), ‘and then they start telling you: Wait a minute! How can I do that?’. Axes: Time (s), 166.9–168.8; Pitch (Hz), 100–450.]

(39)
keilu at lo jexol-a l=ʃevet l=hagid /
like 2SG.F NEG can\PTCP.ACT-SG.F to=sit\INF to=say\INF
BI aj ex ani ʁots-a /
bi oye how 1SG want\PTCP.ACT-SG.F
‘like you cannot sit and say: I wish so much…’ (HEB_IM_CONV2_SP2_015–016)

[Figure: pitch trace for example (39), ‘like you cannot sit and say: I wish so much’. Axes: Time (s), 31.55–34.04; Pitch (Hz), 100–300.]

Of the three examples which are set off from the quotative frame by a pause, two are clearly due to the speaker’s hesitation as in (40) where the independent subject pronoun is repeated and lengthened; the very long pause in the third token (41) is more difficult to interpret but may be linked to speech processing issues as well:

(40)
keilu omʁ-im / 358
like say\PTCP.ACT-PL.M
ani /
1SG
ani xoʃev ʃe /
1SG think\PTCP.ACT[SG.M] COMP
‘as if they say: I, I think that…’ (HEB_IM_CONV1_SP2_121–124)

[Figure: pitch trace for example (40), ‘as if they say: I, I think that’, with the pause (358) after the quotative frame. Axes: Time (s), 282.6–285.5; Pitch (Hz), 100–250.]

(41)
ve=ata omeʁ / 1037
and=2SG.M say\PTCP.ACT[SG.M]
okej //
okay
‘and you say: Okay’ (HEB_IM_CONV1_SP1_099–101)

[Figure: pitch trace for example (41), ‘and you say: Okay’, with the pause (1037) after the quotative frame. Axes: Time (s), 164.5–166.6; Pitch (Hz), 100–400.]

When the speech report is (fully or partially) integrated into the same IU as the quotative frame, there is a clear prosodic declination which starts towards the beginning of the quoted material as in (42) and (43) below:

(42)
ve=at omeʁ-et ani
and=2SG.F say\PTCP.ACT-SG.F 1SG
mekabel-et mi=meni //
obtain\PTCP.ACT-SG.F from=1SG
‘and you say to yourself: I get it from myself’ (HEB_IM_CONV1_SP2_127)

[Figure: pitch trace for example (42), ‘and you say to yourself: I get it from myself’. Axes: Time (s), 290.9–292.3; Pitch (Hz), 100–400.]

(43)
amaʁ-nu na-ase kombin-a //
say\PFV-1PL 1PL-do\nFCT trick-SG.F
‘we decided to play a trick’ (lit. we say: let’s play a trick) (HEB_IM_NARR7_SP1_1152)

[Figure: pitch trace for example (43), ‘we say: let’s play a trick’. Axes: Time (s), 1119–1120; Pitch (Hz), 70–160.]

The above example reminds us of the intention meaning of reported speech constructions as described by Noonan (2006) for Chantyal, except that in Hebrew it is a finite verb form that is used instead of a converb. It is also highly possible that the optative form of the verb in the reported speech plays a role in this semantic interpretation.




Another feature which generally also sets off the direct speech report from the quotative frame is that in most cases, the latter is pronounced more rapidly than the former, as can be easily seen on all the pitch traces of this section, and as was already observed in some instances in Beja. As in Zaar and Juba Arabic, the ends of direct quotes are systematically set off from the next quote or narrative utterance by an intonation-unit boundary, a major one in most cases, and often also by a pause (in 15 examples).

6.2.2 Intonation-unit boundaries in indirect speech reports

As already mentioned, indirect speech reports are more frequent in narratives (seven examples) than in conversations (two examples) but the difference is not significant, because it almost matches the proportion of each genre in the corpus. Contrary to direct speech reports, indirect speech reports (or their beginnings) are never set off from the quotative frame by a prosodic boundary. The complementizer is clitic to the first constituent of the indirect quote (whereas it belongs to the same IU as the quotative frame in Zaar). As with direct speech reports, the ends of the indirect speech reports are always marked by a prosodic boundary, mainly major ones (seven examples) of which three are also followed by a pause. Again, these observations need to be checked on a larger sample. Indirect speech reports are more often uttered at a lower pitch than the quotative frame as shown in (44) where the quote follows a declination from 250 Hz to 150 Hz and ends on a falling terminal contour.

(44)
amaʁ=l=i me=ha=ʁega
say\PFV[3SG.M]=to=1SG from=DEF=one_moment
ʃe=hu pagaʃ=ot=i /
REL=3SG.M meet\PFV=OBJ.DEF=1SG
ʃe=ani e-heje moʁ-a l=joga l=jeled-im //
COMP=1SG 1SG-be\nFCT teacher-SG.F to=yoga to=child-PL.M
‘(my amazing teacher who is my mentor) told me from the first moment we met that I’ll be a yoga teacher for kids’ (HEB_IM_CONV3_SP1_210)

[Figure: pitch trace for example (44), ‘told me from the first moment we met that I'll be a yoga teacher for kids’. Axes: Time (s), 211.6–214; Pitch (Hz), 100–350.]

Indirect speech reports may also contrast with the quotative frame in the opposite way: they can be pronounced at a higher pitch than the quotative frame, with very little variation in pitch (as in 45), except when a strong emotion is associated with the pitch increase (as in 46).

(45)
haj-u omʁ-im al=ha ʃe=hi
be\PFV-3PL say\PTCP.ACT-PL.M on=3SG.F COMP=3SG.F
holex-et ben ha=tip-ot //
go\PTCP.ACT-SG.F between DEF=drop-PL.F
‘it was said about her that she walks between the raindrops’ (HEB_IM_NARR4_SP1_318)

[Figure: pitch trace for example (45), ‘it was said about her that she walks between the raindrops’, followed by laughter. Axes: Time (s), 373.9–375.6; Pitch (Hz), 70–150.]

(46)
ʃam amʁ-u ʃe=benladen joʃev //
there say\PFV-3PL COMP=Bin_Laden sit\PTCP.ACT[SG.M]
‘there it was said that Bin Laden stayed’ (HEB_IM_NARR7_SP2_018)

[Figure: pitch trace for example (46), ‘there it was said that Bin Laden stayed’. Axes: Time (s), 38.33–39.95; Pitch (Hz), 70–170.]

The use of indirect speech is clearly linked to modality and marks the stance the narrator is taking toward the speech report, bringing epistemic and evaluative values. Three main syntactic devices are associated with the use of indirect quotes in these modal contexts.

i. The formulation of the quotative frame as a question:

(47)
amaʁ-ti=l=exa ʃe=jom axʁej
say\PFV-1SG=to=2SG.M COMP=day after
ʃe=xazaʁ-ti mi=lajden haj-a minus eseʁ /
REL=come_back\PFV-1SG from=Leiden be\PFV-3SG.M minus ten
‘Did I tell you that one day after I came back from Leiden it was minus ten?’ (HEB_IM_NARR7_SP2_101)

ii. The inclusion of the quotative verb in a modal construction with an auxiliary verb as in (45) and (48):

(48)
haj-ti omeʁ ʃe=ha=tipul b b /
be\PFV-1SG say\PTCP.ACT COMP=DEF=care in in
b=ha=jelad-im //
in=DEF=child-PL.M
b=bajit-ej ha=jelad-im // 353
in=house-PL\CS DEF=child-PL.M
haj-a behexlet tipul tov haj-a /
be-PFV.3SG.M by_all_means care good be-PFV.3SG.M
‘I would say that the care of the children, in the children’s homes, it was definitely good care…’ (HEB_IM_NARR4_SP1_158–163)

iii. The omission of the lexical or pronominal independent subject of the quotative verb used in the 3rd person plural, as a pseudo-passive construction, as in (46) and (49), the latter of which has in addition an epistemic modal adverb in the indirect speech report:

(49)
amʁ-u=l=o ʃe=hu kaniʁʔe //
say\PFV-3PL=to=3SG.M COMP=3SG.M maybe
saviʁ l=haniax ji-ʃaeʁ /
reasonable to=suppose\INF 3SG.M-stay\nFCT
tsemax l=kol=ha=xajim //
plant to=every=DEF=life\PL
hu jaxol ʁak l=matsmets ki ze XXX //
3SG.M can\PTCP.ACT[SG.M] only to=blink\INF CSL INDF XXX
‘he was told that maybe it is possible that he will remain paralyzed for life, he will only be able to blink because…’ (HEB_IM_CONV1_SP1_041–043)

7. Conclusion: Towards a typology

The four languages of our sample differ radically in the extent to which they make use of reported speech as a rhetorical strategy: in Beja speech reports are almost three times as numerous as they are in Zaar, over three times as numerous as they are in Juba Arabic (for which the 70 examples attested in the 46-minute corpus could be extrapolated to some 90 examples had the corpus been an hour long like the others), and more than ten times as numerous as they are in Modern Hebrew. It is noticeable that the highest proportions of reported speech occur in the three unscripted (or very recently scripted) languages of the sample. The type and content of narrations and conversations (and the absence of the latter genre in Beja) may have introduced a bias in the quantitative results, but they may be indicative of a more general rhetorical profile in these languages.

The four languages also differ in the proportion of direct vs indirect speech reports, in favor of the former. Indirect speech report is unknown in Beja, while it is just incipient but still marginal in the Juba Arabic pidgin. Figures for Modern Hebrew are not statistically significant but nevertheless direct quotes are more than twice as numerous as indirect ones. In this language the hypothesis concerning the link between genres and types of speech reports needs to be checked on a much larger corpus. Zaar on the other hand clearly favors direct quotes which represent 66% of all speech reports.

Direct and indirect speech reports do not seem to differ greatly in our data in terms of prosody (unfortunately the Hebrew and Juba Arabic data are not statistically significant, for different reasons), but there are nevertheless some indications of quantitative differences that need to be checked on larger corpora: in Zaar indirect speech reports are less integrated than direct ones in the quotative frame, but it is the reverse in Modern Hebrew where the number of tokens is not significant; in Juba Arabic, indirect speech has less pitch variation than direct speech and may lack isochronous patterns.

The prosodic integration cline of speech reports also varies from language to language, according to different criteria, and we would now like to propose some preliminary hypotheses for a typology of the interface between prosody and speech report that could be further tested empirically on other languages and on larger samples. These claims concern the interface between morpho-syntax and intonation units:

1. If languages have no complementizer, the prosodic integration of speech reports within the same intonation unit as the quotative frame tends to be very high. (The prosodic integration concerns the end of speech reports in SOV languages, and their onsets in SVO languages; our small sample contains no VSO languages). Juba Arabic is the most representative language for this claim, with 100% of prosodically integrated speech reports, be they direct or indirect; Beja comes next with almost 90% (with the quotative verb only). Modern Hebrew direct speech reports (without a complementizer) show a majority of integrated tokens (65%), but this is statistically not significant because of the small number of examples. Further research on larger corpora is needed for this language.

2. Conversely, if languages have a complementizer, whatever their word orders, speech reports tend to be less integrated within the quotative frame. Zaar, with only 40% of integrated speech reports, is representative of this type.¹⁸ Modern Hebrew indirect speech reports might be a counterexample, but again there are too few examples in our data to be certain. Still, this may be indicative of a difference between direct and indirect speech reports that needs to be further investigated.

3. Non-clitic complementizers tend to be prosodically integrated within the quotative frame, but not exclusively, when a quotative verb is overtly expressed. Zaar is the prime example of this type: a prosodic boundary between the quotative frame and the speech report can only occur after the complementizer. Juba Arabic (in which all speech reports are integrated in the quotative frame) is also representative.

4. In general, it seems that the presence of an explicit morpho-syntactic cue allows for less prosodic integration.

5. In SOV languages where the quotative verb follows the speech reports, their onset is systematically set off from the previous intonation unit, a clear prosodic cue, marking the beginning of the speech report. In SVO languages it is the end of the speech report which is set off from the next IU.

This last claim seems to be a good candidate for a universal prosodic cue of speech reports, with a strong preference for major terminal boundaries and pauses. It concerns all four languages of our sample. The rare cases with a minor boundary occur when the adjacent utterance is a dependent clause, as in Beja. This is a strong indication that quoted speech is treated as independent of the narrative in which it is embedded.

There are few exceptions to the above claims in our data (often explainable by difficulties in speech processing), but this does not mean that they can be generalized without further empirical and typological studies, and we hope to have paved the way for further research in this domain. More attention should also be paid to the other prosodic cues of speech reporting such as loudness and rhythm, seldom mentioned in this chapter for lack of space.

18.  And maybe also Dolakha Newar, but Genetti (2011) does not provide any statistics.
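The comparison across corpora of different lengths in the conclusion rests on simple normalisation. The sketch below is only an illustration of that arithmetic, not the authors' procedure: it takes the two sets of figures quoted in the chapter (70 Juba Arabic examples in 46 minutes; 21 direct vs nine indirect speech reports in Modern Hebrew), and the function name `per_hour` is a hypothetical helper introduced here.

```python
def per_hour(count: int, corpus_minutes: float) -> float:
    """Scale a raw count to the number expected in a 60-minute corpus."""
    return count * 60.0 / corpus_minutes


# Figures quoted in the chapter; variable names are illustrative.
juba_examples, juba_minutes = 70, 46
hebrew_direct, hebrew_indirect = 21, 9

juba_per_hour = per_hour(juba_examples, juba_minutes)
hebrew_ratio = hebrew_direct / hebrew_indirect

print(f"Juba Arabic, extrapolated to one hour: {juba_per_hour:.1f}")  # 91.3
print(f"Modern Hebrew, direct/indirect ratio: {hebrew_ratio:.2f}")    # 2.33
```

The first value corresponds to the "some 90 examples" of the extrapolation, the second to direct quotes being "more than twice as numerous" as indirect ones. A linear extrapolation of this kind assumes that speech reports are spread evenly through a recording, which the chapter itself treats with caution.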

List of glosses

/  minor prosodic boundary
//  major prosodic boundary
code-switching
ABL  ablative
ACC  accusative
ADJVZ  adjectivizer
ADRF  address form
advs  adversative
anaph  anaphoric
ANT  anterior
AOR  aorist
CAUS  causative
CMPR  comparative
COM  comitative
COMP  complementizer
COND  conditional
COORD  coordination
COP  copula
CS  construct state
CSL  causal
CVB  converb
DEF  definite
DEM  demonstrative
DIR  directional
DIST  distal
EXCM  exclamation
EXS  existential
F  feminine
FCT  factual
fs  false start
FUT  future
GEN  genitive
IMP  imperative
IND  indicative
INDF  indefinite
INF  infinitive
INT  intensive
IPFV  imperfective
IRR  irrealis
LOC  locative
M  masculine
MNR  manner
N  nominal
NEG  negation
nFCT  non-factual
NOM  nominative
OBJ  object
OBL  oblique
opn  opener
PASS  passive
PFV  perfective
PL  plural
POSS  possessive
PROG  progressive
PROH  prohibitive
PROX  proximal
PTCP.ACT  active participle
RECP  reciprocal
REL  relator
REM  remote
res  resultative
SBJ  subject
SBJV  subjunctive
SG  singular
SMLT  simultaneity
VN  verbo-nominal
VOC  vocative
vrt  virtual

Acknowledgements

We are most grateful to Bernard Caron and Stefano Manfredi who have been kind enough to share their knowledge about Zaar and Juba Arabic respectively with us and to answer our numerous questions about these two languages. Our gratitude is also due to Dave Roberts who kindly corrected our non-native English, and to our two reviewers who helped us improve the first version of this chapter. Of course all remaining errors are ours.

References

Amir, Noam, Silber-Varod, Vered & Izre’el, Shlomo. 2004. Characteristics of intonation unit boundaries in spontaneous spoken Hebrew: Perception and acoustic correlates. In Proceedings of the 2nd International Conference on Speech Prosody - SP2004, Nara, Japan, 677–680.
Caron, Bernard. 2012a. A Grammatical Sketch of Zaar (a South-Bauchi Chadic language spoken in Nigeria). (10 June 2013).
Caron, Bernard. 2012b. ‘Zaar Corpus’. Corpus recorded, transcribed and annotated by Bernard Caron. In Amina Mettouchi & Christian Chanard (eds). The CorpAfroAs Corpus of Spoken AfroAsiatic Languages. DOI: http://dx.doi.org/10.1075/scl.68.website. Accessed on 10/06/2013. (= SAY_BC)
Chafe, Wallace. 1994. Discourse, Consciousness, and Time: The Flow and Displacement of Conscious Experience in Speaking and Writing. Chicago IL: The University of Chicago Press.
Coulmas, Florian. 1986. Reported speech: Some general issues. In Direct and Indirect Speech, Florian Coulmas (ed.), 1–28. Berlin: Mouton de Gruyter. DOI: 10.1515/9783110871968.1
Cresti, Emanuela & Moneglia, Massimo (eds). 2005. C-ORAL-ROM: Integrated Reference Corpora for Spoken Romance Languages [Studies in Corpus Linguistics 15]. Amsterdam: John Benjamins. DOI: 10.1075/scl.15
Cruttenden, Alan. 1997. Intonation, 2nd ed. [Cambridge Textbooks in Linguistics]. Cambridge: CUP. DOI: 10.1017/CBO9781139166973
Du Bois, John W. 2004. Representing Discourse, Part 2: Appendices and Projects. Santa Barbara CA: Linguistics Department, University of California.
Du Bois, John W., Cumming, Susanna, Schuetze-Coburn, Stephan & Paolino, Danae. 1992. Discourse Transcription [Santa Barbara Papers in Linguistics 4]. Santa Barbara CA: Department of Linguistics, University of California.
Genetti, Carol. 2011. Direct speech reports and the cline of prosodic integration in Dolakha Newar. Himalayan Linguistics 10(1): 55–71. Special issue in memory of Michael Noonan and David Watters.
Güldemann, Tom. 2008. Quotative Indexes in African Languages: A Synchronic and Diachronic Survey. Berlin: Mouton de Gruyter. DOI: 10.1515/9783110211450
Hamid-Ahmed, Mohamed Tahir. 2013. Les articles définis en bedja, dialecte du Gash. In Proceedings of the 5th International Conference on Cushitic and Omotic Languages. Paris, 16–18 April 2008, Marie-Claude Simeone-Senelle & Martine Vanhove (eds), 175–180. Cologne: Rüdiger Köppe.
Izre’el, Shlomo & Rahav, Giora. 2004. The Corpus of Spoken Israeli Hebrew (CoSIH); Phase I: The pilot study. In LREC 2004 Satellite Workshop; Fourth International Conference on Language Resources and Evaluation: Compiling and Processing Spoken Language Corpora (Lisbon, Portugal), Nelleke Oostdijk, Gjert Kristoffersen & Geoffrey Sampson (eds), 1–7. Paris: ELRA-European Language Resources Association.
Jansen, Wouter, Gregory, Michelle L. & Brenier, Jason M. 2001. Prosodic correlates of directly reported speech: Evidence from conversational speech. Ms. (10 June 2013).
Klewitz, Gabriele & Couper-Kuhlen, Elizabeth. 1999. Quote-unquote? The role of prosody in the contextualization of reported speech sequences. Pragmatics 9(4): 459–485.
Malibert, Il-Il. 2012. ‘Modern Hebrew Corpus’. Corpus recorded, transcribed and annotated by Il-Il Malibert. In Amina Mettouchi & Christian Chanard (eds). The CorpAfroAs Corpus of Spoken AfroAsiatic Languages. DOI: http://dx.doi.org/10.1075/scl.68.website. Accessed on 10/06/2013. (= HEB_IM)
Manfredi, Stefano. 2012. ‘Juba Arabic Corpus’. Corpus recorded, transcribed and annotated by Stefano Manfredi. In Amina Mettouchi & Christian Chanard (eds). The CorpAfroAs Corpus of Spoken AfroAsiatic Languages. DOI: http://dx.doi.org/10.1075/scl.68.website. Accessed on 10/06/2013. (= PGA_SM)
Manfredi, Stefano & Petrollino, Sara. 2013. Juba Arabic. In The Survey of Pidgin and Creole Languages, Vol. III: Contact Languages Based on Languages from Africa, Australia, and the Americas, Susanne Michaelis, Philip Maurer, Magnus Huber & Martin Haspelmath (eds), 54–65. Oxford: OUP.
Mixdorff, Hansjörg & Amir, Noam. 2002. The prosody of Modern Hebrew. A quantitative study. In Proceedings of Speech Prosody 2002. An International Conference, 515–518. Aix-en-Provence.
Noonan, Michael. 1985. Complementation. In Language Typology and Syntactic Description, Vol. 2, Timothy Shopen (ed.), 42–139. Cambridge: CUP.
Noonan, Michael. 2006. Direct speech as a rhetorical style in Chantyal. Himalayan Linguistics 6: 1–32.
Oliveira, Miguel Jr. & Cunha, Dóris A. C. 2004. Prosody as marker of direct reported speech boundary. Speech Prosody 2004, International Conference. http://sprosig.isle.illinois.edu/sp2004/PDF/Oliveira-Cunha.pdf (10 June 2013)
Saxena, Anju. 1995. Unidirectional grammaticalization: Diachronic and cross-linguistic evidence. Sprachtypologie und Universalienforschung 48(4): 350–372.
Silber-Varod, Vered. 2011. Dependencies over prosodic boundary tones in spontaneous spoken Hebrew. Depling 2011 - International Conference on Dependency Linguistics, Barcelona, September 5-7, 241–250.
Silber-Varod, Vered & Kessous, Loïc. 2008. Prosodic boundary patterns in Hebrew: A case study of continuous intonation units in weather forecast. In Proceedings of the Speech Prosody 2008 Conference, Plinio A. Barbosa, Sandra Madureira & César Reis (eds), 265–268. Campinas: Editora RG/CNPq.
Tao, Hongyin. 1996. Units in Mandarin Conversation: Prosody, Discourse, and Grammar [Studies in Discourse and Grammar 5]. Amsterdam: John Benjamins. DOI: 10.1075/sidag.5
Thompson, Sandra A. 2002. ‘Object complements’ and conversation: Towards a realistic account. Studies in Language 26(1): 125–164. DOI: 10.1075/sl.26.1.05tho
Vanhove, Martine. 2004. ‘dire’ et finalité en bedja: Un cas de grammaticalisation. Journal of African Languages and Linguistics 25(2): 133–153. DOI: 10.1515/jall.2004.25.2.149
Vanhove, Martine. 2012a. A Grammatical Sketch of Beja. In Amina Mettouchi & Christian Chanard (eds). The CorpAfroAs Corpus of Spoken AfroAsiatic Languages. DOI: http://dx.doi.org/10.1075/scl.68.website/BEJ/PDF/BEJ_MV_GRAMMATICALSKETCH.PDF.
Vanhove, Martine. 2012b. ‘Beja Corpus’. Corpus recorded, transcribed and annotated by Martine Vanhove. In Amina Mettouchi & Christian Chanard (eds). The CorpAfroAs Corpus of Spoken AfroAsiatic Languages. DOI: http://dx.doi.org/10.1075/scl.68.website. Accessed on 10/06/2013. (= BEJ_MV)
Wichmann, Anne. 2000. Intonation in Text and Discourse: Beginnings, Middles and Ends [Studies in Language and Linguistics]. Harlow: Pearson Education.
Zuckermann, Ghil’ad. 2006. Direct and indirect speech in straight-talking Israeli. Acta Linguistica Hungarica 53(4): 467–481. DOI: 10.1556/ALing.53.2006.4.5

Part 3

Cross-linguistic comparability

Glossing in Semitic languages
A comparison of Moroccan Arabic and Modern Hebrew*

Ángeles Vicente, Il-Il Malibert and Alexandrine Barontini
Universidad de Zaragoza / INALCO (Paris) / INALCO (Paris)

Interlinear morphemic glosses facilitate the comprehension and analysis of any described language, even one that is unfamiliar to the reader. In Semitic studies, most publications do not include glosses, forcing readers to analyse the examples in order to understand them. This paper examines Moroccan Arabic and Modern Hebrew, and proposes the use of interlinear morphemic glosses within the typological grammatical tradition, for Semitic linguistics in general.

1. Introduction

The interlinear morphemic glossing proposed in this article gives researchers a choice of labelling conventions for their morphological analyses, and enables readers unfamiliar with the language to identify the meaning and function of individual morphemes. Whenever possible, similar cross-linguistic categories are glossed with the same labels, in order to provide a morphemic glossing system for all Semitic languages. We start with overviews of the Arabic and Hebrew traditions with regard to this type of morphological analysis in Sections 2 and 3. Then in Section 4 we describe the common labels used to identify identical categories in Moroccan Arabic (henceforth MA) and Modern Hebrew (henceforth MH), and their application to the analysis of these genetically similar spoken varieties. Some conclusions are drawn in Section 5.¹

*  We would like to thank Amina Mettouchi, Martine Vanhove and Marie-Claude Simeone-Senelle for their help and suggestions.
1.  Six members of CorpAfroAs are working on Semitic languages. All of them participated in discussions to decide the most accurate labels and methodology for the interlinear glossing proposed in this article. Besides the three authors of this article, the other researchers are Dominique Caubet, Stefano Manfredi and Christophe Pereira.

doi 10.1075/scl.68.05vic
2015 © John Benjamins Publishing Company

2. Interlinear morphemic glossing in Arabic varieties

Interlinear morphemic glossing of linguistic data first appeared in the 1960s, and from the 1980s onwards it became standard in scientific publications (Lehmann 2004: 4). Nevertheless, its application to Arabic studies has been comparatively slow, and this is particularly true of vernacular Arabic varieties. The first Arabic grammars adopted a descriptive approach and did not include interlinear morphemic glosses at all; neither did grammars of other classical languages like Latin and Greek. This was because these very famous, although by now outdated, grammars were meant for the already initiated who did not need glossing (Lehmann 2004: 3). This is the case, for instance, with Fleisch (1979) for Classical Arabic, and Fleisch (1974) and Fischer & Jastrow (1980) for vernacular Arabic varieties. The first attempts at interlinear morphemic glossing appeared in studies of Classical Arabic, and vernacular Arabic studies have gradually followed suit. Nevertheless, even though it is on the increase, it remains an idiosyncratic practice rather than a widespread tradition.

Concerning vernacular Arabic varieties, the absence of glosses has partly been due to the choice of research methods, which have been more focused on collecting texts, describing isoglosses, and analysing diachronic data than on typological studies. Most of the better known and representative works in this field, even those that describe synchronic varieties, do not gloss the examples. This is the norm in both older and more recent studies (e.g. Grand’Henry 1972; Caubet 1993; Watson 1993; Yoda 2005; Gralla 2006; and De Jong 2011). This is surprising since glossing is useful for describing and comparing different languages and varieties which is, after all, the first aim of comparative dialectology.

This situation began to change some years ago. Descriptions from the last decades of the 20th century use some standard abbreviations, usually for gender (m or f), for number (SG or PL) and for some grammatical categories that are difficult to translate with a single word. These abbreviations were included in word-by-word translations of the examples, the aim being to facilitate comprehension more than to provide a morphological analysis, as this Maltese example from Vanhove (1993: 181) demonstrates (1):²

(1) saníftaḥ ḥanût
“Je vais ouvrir une boutique”

2.  Examples by other authors will be reproduced with their original transcriptions.

Glossing in Semitic languages 175



Such literal translations were the first steps towards the practice of glossing, but because most category labels had not yet been identified or were very limited, they did little to reveal the internal structure of the language. Only since the beginning of the 21st century has this practice become more widespread, though even now it continues to be more common in Classical Arabic than in vernacular Arabic varieties. Even quite recently, edited collections, such as Festschrifts in which authors represent various traditions (e.g. Haak et al. 2004), contain only a few contributions that include morphemic glossing, like Owens’s article on Nigerian Arabic (2, Owens 2004: 210):

(2) an-akal-at  rūs  al-qalla
    PS-eat-F    ID   DEF-grain
    “The grain got crunched up”

Similarly, the interlinear glossing included in an article on Kuwaiti Arabic (Tsukanova 2008: 448) is very much the exception in the proceedings of the various AIDA3 conferences (3):

(3) fa  ga‘   yigūl         luhum    ənna  “ ‘āna  miḥtār”
    so  PROG  he-says-IPFV  to=them  that  “I      confused-A.PTCP”
    “So he’s telling them that “I’m confused”

Of course, there are some exceptions. One of the first studies about a vernacular variety of Arabic that includes a morphological analysis of data by means of glossing was a study of MA-Dutch code-switching (4, Boumans 1998: 190):4

(4) ah,  maši  bḥal  l-mġaṛba         lli  ka-ne-ʕṛef-hŭm  ana  f   l-..  eh
    oh   neg   like  def-Moroccan.pl  rel  asp-1-know-3pl  1sg  in  def-  er
    omgeving     dyal-i
    environment  of-1sg
    “Oh, it is not at all like the Moroccans I know in my environment”

Practice is changing, with glossing becoming increasingly common in the most recent publications. But even these tend to cover only some functions of the analysed variety, and often provide only a word-by-word translation with some additional abbreviations.5 The result is a hybrid system, somewhere between a literal translation and a linguistic analysis.

3.  AIDA stands for Association Internationale de Dialectologie Arabe; it was created in 1993.
4.  Boumans’s article includes a list of category labels on p. 404.
5.  It is usual to find a list of abbreviations. For instance Vanhove (2004: 330), writing in French, uses: CONC = concomitant, COP = copule, FOC = focalisation. See also Henkin (2010: 1–2).

176 Ángeles Vicente, Il-Il Malibert and Alexandrine Barontini

Nevertheless, nowadays it is possible to find interlinear morphemic glosses in monographs or articles in edited collections referring both to vernaculars such as Egyptian Arabic (5, Esseesy 2010: 239) and to Classical Arabic (6, Solimando 2011: 77):

(5) katabtə-l-kum
    wrote.1sg-to-you.mpl
    “I wrote to you”

(6) yu-ṣīb-u al-qirṭā-s wa-llāh-i
    3ms hit.imperf-ind def-target-acc Oh-God-gen
    “It will hit the target, I swear by God”

The Encyclopedia of Arabic Language and Linguistics is an example of this new tendency where “in morph-by-morph translations, standard coding has been used” (EALL Versteegh et al. 2006: introduction, vi).6 Nevertheless, morphemic glossing is not extensively practiced in this publication and neither is it systematically used in descriptions of Arabic varieties (7–10):

(7) Entry “Agreement” (Bahloul 2006: 45–46):
    nāma         l-’awlādu
    slept-3.m.s  the-boys-nom
    “The boys slept”

(8) Entry “Auxiliary” (Haak 2006: 218):
    kāna     l-maliku  yamurru       bi-hi
    be:3Msg  the-king  pass:ind3Msg  by-him
    “The king was passing by him”

(9) Entry “Ellipsis” (Mughazy 2007: 19):
    ḥa:waltu   fa-lam        ’astaṭi‘
    tried.Ims  but-NEG.past  be able to.Ims
    “I tried but I could not”

(10) Entry “Functional grammar” (Moutaouakil 2007: 144):
     hal  tašrabu    š-ša:ya:
     Q    drink-you  the-tea-ACC
     “Do you drink tea?”

There are also examples of Arabic varieties (11–13):

6.  There are approximately 500 entries in four volumes published in 2006–2009.




(11) Entry “Aktionsart” (MA), (Reese 2006: 51):
     kemmel            l-makla
     complete:PFV:3Sm  ART-meal
     “He finished his meal”

(12) Entry “Construct state” (MA), (Benmamoun 2006: 479):
     a. l-mǝdras-a      dyal  nadya
        the-school-Fem  of    Nadia
        “Nadia’s school”
     b. mǝdras-t    nadya
        school-Fem  Nadia
        “Nadia’s school”

(13) Entry “Resumption” (Lebanese Arabic), (Aoun 2009: 81):
     kill   walad  she:f    m‘allimt-o
     every  boy    saw.3ms  teacher.f-his
     “Every boy saw his teacher”

Moreover, in EALL (Versteegh et al. 2006) morphemic glosses are more common in entries related to Classical Arabic. It is also important to note that there is no standardized system across the four volumes of the Encyclopaedia, each entry being influenced by the theoretical approach of the author in question.7

7.  This is not to say that the editors are always responsible for this situation.

3. Interlinear morphemic glossing in Modern Hebrew

Interlinear morphemic glossing for MH has been even less frequent than for vernacular Arabic varieties because of the special linguistic situation of this language. Hebrew was not used as a spoken language for 1700 years, so there is a lack of continuity between ancient Hebrew and MH in both structural and sociolinguistic terms. Glossing MH therefore involves a synchronic description of the language that is not based on previous descriptions of Hebrew. Moreover, descriptions and grammars of MH written in languages other than Hebrew, such as English, tend merely to translate the examples (e.g. Cohen & Zafrani 1968; Rosen 1977; Aronson Berman 1978; Berman 1997). The absence of glosses can be explained by the fact that researchers were first preoccupied with legitimizing the revived language. Some individual works that include glosses in contemporary Hebrew descriptions are grammars or PhD theses recently written in English. In these works the Hebrew examples are all translated into English. Besides the translation, the authors give some general information regarding morphosyntactic categories of the items, general meaning of special affixes, global meanings of TAM (tense, aspect, mood) markers, and some indication of register.

It was only in the late 1980s that a grammar of MH with glossing (Glinert 1989) made an appearance. Partial glosses can be found in Schwarzwald’s Modern Hebrew Grammar (2001). Two recent PhD theses (Dekel 2010; Silber-Varod 2012) include glosses based on the Leipzig Glossing Rules (Bickel, Comrie & Haspelmath 2008) with some additional labels. Table 1 summarizes these various approaches. In Schwarzwald’s Modern Hebrew Grammar (2001: iv), there is a list of abbreviations including glosses for special affixes, TAM markers, morphosyntactic categories, register indications and names of other languages to indicate the origin of loan words. Dekel (2010) adds some glosses that are not included in the Leipzig Glossing Rules, such as the seven Hebrew verbal stems. Besides the glosses and the translated words, the author provides information about the nature of the consonantal root (14):

(14) maXaR     at            oleXet
     tomorrow  you (2-F-SG)  go (hlk-Qal-PTCP-F-SG)
     abajta  aXaRe  abXina    (D-3-4-1: 102)
     home    after  the test
     “tomorrow after the test you will be going home” (Dekel 2010: 16)

Silber-Varod (2012: ii) uses a list of abbreviations that she refers to as “Parts of Speech tagging”. Like Dekel, she supplements the Leipzig conventions with her own tags for certain affixes, inflectional forms, verbs such as ‘to be’, existentials ([jeS8] “there is” and [en] “there is not”), and register indications (15, Silber-Varod 2012: 41):

(15) a. eldad  heXlit          S-uː     lo   mvateR
        Eldad  decided.3SG.M   that-he  not  gives-up
        “Eldad decided that he wouldn’t give up”
     b. bikaS-t         Seː   ni-kne       adaSim   ve   ke-  veː  Xalav
        ask.PST-2SG.F   that  1PL-FUT.buy  lentils  and  @9-  and  milk
        “you asked us to buy lentils and @- and milk”

8.  Silber-Varod (2012) employs this symbol to designate the voiceless fricative ʃ.
9.  Silber-Varod (2012) employs this symbol to designate a false start.



Table 1.  Glosses used in MH descriptions

Morphosyntactic categories
  Glinert (1989): ADJ – adjective; ADV – adverb; CONJ – conjunction; EMPH – emphatic particle; F – feminine; IND – indirect; INTERROG – interrogative; M – masculine; N – noun; OM – object marker; PREP – preposition; Pl – plural; V – verb; Q – question; QUANT – quantifier; S – singular; SUFF – suffix
  Schwarzwald (2001): adj – adjective; abs – nominal suffix; AC – accusative marker; adv – adverb; cns – construct state; f – feminine; m – masculine; n – noun; NP – noun phrase; pl – plural; PP – prepositional phrase; pref – prefix; V – verb; sg – singular; suf – suffix
  Dekel (2010): PRE – prefixed form; SUF – suffixed form
  Silber-Varod (2012): ACC-PRON – accusative marker with pronominal suffix; AL – negation lexeme; CONJ – conjunction; DM – discourse marker; LO – negation lexeme; MOD – modifier and quantifier; NUM – numeral expression; PN – proper noun; POSS-PRON – possessive particle; PREP – preposition; PREP-DEF – preposition with definiteness marker; PREP-PRON – preposition with pronominal suffix; PRON – pronoun; PRP – personal pronoun; SUB – subordinate particle; V – verb; EXT – existentials; POLITE – polite interjections

TAM
  Glinert (1989): PRES – present; PART – participle; FUT – future; IMP – imperative; INF – infinitive
  Schwarzwald (2001): pres – present and participle; fut – future; inf – infinitive
  Dekel (2010): AS – aspect; CONT – continuous; DEO – deontic; EPS – epistemic; HAB – habitual; MD – mood; TNS – tense

Register information
  Glinert (1989): C – casual usage; F – formal usage
  Schwarzwald (2001): coll – colloquial; lit – literally

Languages
  Schwarzwald (2001): BH – Biblical Hebrew; GE – general European stock; Gk – Greek; Hb – Hebrew; MishH – Mishnaic Hebrew; MH – Modern Hebrew; Sl – Slavic languages; Yd – Yiddish

Verbal stems
  Dekel (2010): Qal; Nifal; Piel; Hifil; Hitpael; Pual; Hufal



4. Interlinear morphemic glosses for Moroccan Arabic and Modern Hebrew in CorpAfroAs

Some of the glossing rules followed in CorpAfroAs are common to all the represented languages, but some are language specific and others are language-group specific.10 In Section 4.1, we define underlying forms in MA11 and in MH. Then, in Section 4.2, we identify common categories and define their labels with the aim of designing a useful system for glossing Semitic languages in general.12

4.1 The first step: Defining underlying forms

For parsing and retrieval purposes, CorpAfroAs proposes an analysis on six tiers. For each major word category (nouns, verbs and prepositions), it distinguishes between the \mot tier, which contains the morphophonological transcription into grammatical words, and the \mb tier, which contains the underlying form13 of the words segmented into morphemes (clitics and affixes). In MA, the differences between the allomorphs on the \mot and \mb tiers concern metathesis (16) and vowel elision (17), both of which avoid a short vowel (ə) in an open syllable:14

(16) \mot kanrətħu
     \mb  ka=n-rtəħ-u
     \ft  we rest (ARY_AV_NARR_01_009)

(17) \mot raːʒʕiːn
     \mb  raːʒǝʕ-iːn
     \ft  coming back (ARY_AB_NARR_01_098)

10.  CorpAfroAs adheres to the Leipzig Glossing Rules whenever possible, but it has been necessary to add some labels. These have been agreed collectively by the CorpAfroAs authors, and can be found on
11.  There are three MA corpora in CorpAfroAs, compiled by Alexandrine Barontini (Meknès), Dominique Caubet (Casablanca) and Ángeles Vicente (Jbala and Ceuta).
12.  The procedure described can be extended to any Semitic language, for instance, Libyan Arabic and the particular case of Juba Arabic, both of which are represented in CorpAfroAs.
13.  The underlying form of the word is the allomorph that has not undergone transformation due to metathesis or vowel elision.
14.  See Section 4.2.8.1 for the underlying form of verbs, and Section 4.2.11.1 for the underlying form of prepositions.


In MH, the difference between the allomorphs on the \mot and \mb tiers concerns only vowel elision (18):

(18) \mot sxuʁa
     \mb  saxuʁ-a
     \ft  rent (HEB_IM_CONV_3_SP1_033)
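The tier markers used in examples (16) to (18) lend themselves to simple machine processing. The following is a minimal sketch, not part of the CorpAfroAs tooling, of how a record written on one line could be split into its tiers. The function name `parse_record` and the regular expression are assumptions made for illustration; the tier names are those given in this section.

```python
import re

# The six tier names mentioned in this section. A marker only counts when it
# is preceded by whitespace (or starts the record) and followed by whitespace,
# so a gloss such as "health-F\CS" is never mistaken for a tier marker.
TIER_MARKER = re.compile(r"(?<!\S)\\(tx|mot|mb|ge|rx|ft)(?=\s)")

def parse_record(text):
    """Return a dict mapping each tier name to its content (hypothetical helper)."""
    tiers = {}
    matches = list(TIER_MARKER.finditer(text))
    for i, match in enumerate(matches):
        # The content of a tier runs up to the next marker or the end of the record.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        tiers[match.group(1)] = text[match.end():end].strip()
    return tiers

record = parse_record(r"\mot sxuʁa \mb saxuʁ-a \ft rent (HEB_IM_CONV_3_SP1_033)")
print(record["mb"])
```

Keeping the tiers in a dictionary makes it straightforward to compare the surface form on \mot with the underlying form on \mb for a given record.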

4.2 The second step: Defining common grammatical categories

The interlinear morphemic glosses are based on the \mb tier (with one cell per morpheme), and occupy two tiers. The \ge tier contains glossing with grammatical category labels whilst the \rx tier contains other relevant information for retrieval purposes.15

4.2.1 Nouns and adjectives

Nouns and adjectives are glossed by meaning on the \ge tier, and by part of speech (N or ADJ) on the \rx tier.

4.2.1.1 Gender. Since there is no overt mark for masculine gender in MA and MH, only the feminine will be glossed on the \ge tier with the standard tag F (see below examples 21, 22, 23 and 24). Nevertheless, in order to retrieve all masculine forms, the gloss M is provided on the \rx tier in MA, but not in MH (19 for MA and 20 for MH).

(19) \mot ʕərs
     \mb  ʕərs
     \ge  wedding
     \rx  N.M
     \ft  a wedding (ARY_AV_NARR_01_120)

(20) \mot ʁega
     \mb  ʁega
     \ge  one_moment
     \rx  N
     \ft  one_moment (HEB_IM_NARR_7_SP2_082)

15.  The other two tiers in CorpAfroAs data analysis are \tx (a broad phonetic transcription), and \ft (a free translation). For more about the definitions of the different tiers, see . In this article, the first tier \tx, will not be provided in the examples.




The morphological feminine (the suffix -a) is glossed as F on the \ge tier, even if there is not an equivalent masculine form (22, 24), to facilitate retrieval of all the feminine nouns. PNG (person, number, gender) on the \rx tier indicates that this morpheme provides gender information (21 and 22 for MA, 23 and 24 for MH).

(21) \mot qbiːla
     \mb  qbiːl-a
     \ge  tribe-F
     \rx  N-PNG
     \ft  a tribe (ARY_AB_NARR_01_020)

(22) \mot ħaːʒa
     \mb  ħaːʒ-a
     \ge  thing-F
     \rx  N-PNG
     \ft  a thing (ARY_AV_NARR_01_121)

(23) \mot kalba
     \mb  kelev-a
     \ge  dog-F
     \rx  N-PNG
     \ft  a female dog

(24) \mot ʃeela
     \mb  ʃeel-a
     \ge  question-F
     \rx  N-PNG
     \ft  a question (HEB_IM_NARR_7_SP2_064)
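Because the \ge and \rx tiers are built on the \mb tier with one cell per morpheme, the three tiers can be compared cell by cell. The sketch below illustrates one such consistency check. It assumes that hyphens and equals signs are the only cell separators, and that the backslash and the period stay inside a cell; the helper names `cells` and `aligned` are hypothetical. Glosses that fuse several values into one cell may legitimately fail this check, so it is a rough diagnostic rather than a validation rule of the corpus.

```python
import re

def cells(tier_line):
    """Split a tier into words, and each word into morpheme cells.

    '-' marks an affix boundary and '=' a clitic boundary; a backslash or a
    period does not open a new cell.
    """
    return [re.split(r"[-=]", word) for word in tier_line.split()]

def aligned(mb, ge, rx):
    """True when the three tiers have the same number of cells in every word."""
    def shape(tier):
        return [len(word) for word in cells(tier)]
    return shape(mb) == shape(ge) == shape(rx)

# Example (21): one word, two cells on each tier.
print(aligned("qbiːl-a", "tribe-F", "N-PNG"))
```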

Inherent feminine forms with no overt marking are glossed on the \rx tier, not on the \ge tier16 (25 for MA and 26 for MH).

(25) \mot daːṛ
     \mb  daːṛ
     \ge  house
     \rx  N.F
     \ft  a house (ARY_AB_NARR_01_133)

16.  Not all words ending in -a are feminine nouns. For this reason, masculine words ending in -a are glossed as N.M on the \rx tier.


(26) \mot deʁex
     \mb  deʁex
     \ge  way
     \rx  N.F
     \ft  way (HEB_IM_CONV_3_SP1_020)

The suffix -a can also have singulative value in MA. This is glossed as PNG.SING on the \rx tier, and F is preserved on the \ge tier because it remains a feminine marker17 (27):

(27) \mot ǝlbǝgra
     \mb  ǝl=bgǝr-a
     \ge  DEF=cow-F
     \rx  DET=N-PNG
     \ft  the cow (ARY_AB_NARR_01_300)

In Semitic languages, when a suffixed personal pronoun is added to a word, the result is an annexation or “construct state” (see Section 4.2.3). In such cases, the feminine suffix -a becomes -t in MA and -at in MH, which is glossed with the same grammatical category label (F) in both cases. The annexation is glossed with a backslash \ and labelled as CS (28 for MA and 29 for MH):

(28) \mot ṣəħħa     ṣəħħtu
     \mb  ṣəħħ-a    ṣəħħ-t=u18
     \ge  health-F  health-F\CS=OBL.3M
     \rx  N-PNG     N-PNG=PRO.PNG
     \ft  health    his health (ARY_AV_NARR_02_197)

(29) \mot doda     dodati
     \mb  dod-a    dod-at=i
     \ge  uncle-F  uncle-F\CS=ATTR.1SG
     \rx  N-PNG    N-PNG=PRO.PNG
     \ft  an aunt  my aunt
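The alternation shown in (28) and (29) can be stated as a small rule. The following sketch, written under the assumption that the feminine suffix has exactly the allomorphs described above, builds the \mb form and the \ge gloss of a feminine noun with or without a suffixed pronoun. The table `FEMININE` and the function `feminine_noun` are illustrative names, not part of the corpus tools.

```python
# Allomorphs of the feminine suffix as described in the text:
# free form -a; annexed form -t in MA and -at in MH.
FEMININE = {
    "MA": {"free": "a", "construct": "t"},
    "MH": {"free": "a", "construct": "at"},
}

def feminine_noun(stem, meaning, language, pronoun=None, pronoun_gloss=None):
    """Return the (mb form, ge gloss) pair for a feminine noun.

    Without a pronoun the free form is used; with a pronoun the construct
    allomorph is used and the gloss receives the CS label after a backslash.
    """
    suffixes = FEMININE[language]
    if pronoun is None:
        return f"{stem}-{suffixes['free']}", f"{meaning}-F"
    mb = f"{stem}-{suffixes['construct']}={pronoun}"
    ge = f"{meaning}-F\\CS={pronoun_gloss}"
    return mb, ge

# Example (28): MA "his health".
print(feminine_noun("ṣəħħ", "health", "MA", "u", "OBL.3M"))
```

The gloss F stays the same in both languages; only the form on the \mb tier and the added CS label change.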

17.  The gloss COL is used on the \rx tier for masculine and collective nouns, for instance, məʕz “goats”, \rx N.COL (ARY_AV_NARR_03_010).
18.  = clitic boundary.

4.2.1.2 Number. In nouns and adjectives, plural and dual are glossed, but not singular because there is no overt marking. Nevertheless, singular is glossed in participles,19 pronouns and verbal indexes. The standard tags (SG, DU, PL) are used on the \ge tier, preceded by a hyphen - when the dual or plural morphemes are suffixes, or by a backslash \ if the plural is marked by a morphophonological change of pattern in the lexical item, as is often the case in Semitic languages. PNG information is added on the \rx tier, because the morpheme provides information about number (30 and 31 for MA and 32 for MH).

(30) \mot jawmaːjn
     \mb  jawm-aːjn
     \ge  day-DU
     \rx  N.M-PNG
     \ft  two days (ARY_AV_NARR_01_242)

(31) \mot swaːq
     \mb  swaːq
     \ge  market\PL
     \rx  N
     \ft  markets (ARY_AB_NARR_01_165)

(32) \mot jeladim
     \mb  jeled-im
     \ge  children-PL.M
     \rx  N-PNG
     \ft  children (HEB_IM_CONV_3_SP1_442)

In MH, in cases where the number suffix appears but the singular form without this suffix does not exist as a word, the plural morpheme is not glossed on the \ge tier but the gloss PL is used on the \rx tier, as in example 33.

(33) \mot anaʃim
     \mb  anaʃim
     \ge  people
     \rx  N.PL
     \ft  people (HEB_IM_NARR_7_SP1_0087)
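The choice between hyphen and backslash in the number glosses of (30) and (31) can be summarized in a few lines. This is only a sketch of the convention as described above; the names `SEPARATOR` and `gloss_number` and the two marking types "suffix" and "pattern" are assumptions made for the illustration.

```python
# A hyphen introduces a number tag carried by a suffix; a backslash introduces
# a number tag expressed by a change of pattern inside the lexical item.
SEPARATOR = {"suffix": "-", "pattern": "\\"}

def gloss_number(meaning, number=None, marking="suffix"):
    """Build the ge gloss of a noun or adjective for a given number value."""
    if number is None or number == "SG":
        # Singular is not glossed on nouns and adjectives (no overt marking).
        return meaning
    return f"{meaning}{SEPARATOR[marking]}{number}"

print(gloss_number("day", "DU"))                 # suffixed dual, as in (30)
print(gloss_number("market", "PL", "pattern"))   # pattern plural, as in (31)
```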

19.  Both active and passive participles are derived morphologically from verbs. Because of this, both participles are inflected like adjectives and they can take feminine and plural morphemes. Concerning participles in Arabic, see Owens (2008: 541). See Section 4.2.9 for the glossing of participles.


4.2.2 Articles

In both MA and MH the definite clitic article is glossed with DEF on the \ge tier and DET on the \rx tier (34 for MA and 35 for MH).

(34) \mot əttaːmaːn
     \mb  əl=taːmaːn
     \ge  DEF=price
     \rx  DET=N.M
     \ft  the price (ARY_AV_NARR_01_276)

(35) \mot abajit
     \mb  ha=bajit
     \ge  DEF=house
     \rx  DET=N
     \ft  the house (HEB_IM_CONV_1_SP1_163)

The two types of indefinite article in MA are glossed with labels INDF1 and INDF2 on the \ge tier, and DET on the \rx tier. The first one is a clitic, its construction being similar to demonstrative determiners (see Section 4.2.6), but the second is not, because the particle (ʃi) and the noun are two phonologically independent words (36 and 37 for MA).

(36) \mot waːħdəṛṛaːʒəl
     \mb  waːħəd=əl=ṛaːʒəl
     \ge  INDF1=DEF=man
     \rx  DET=DET=N
     \ft  a man (ARY_AB_NARR_05_002)

(37) \mot ʃi     ħaːʒa
     \mb  ʃi     ħaːʒ-a
     \ge  INDF2  thing-F
     \rx  DET    N-PNG
     \ft  something (ARY_AV_NARR_01_150)

4.2.3 Annexation

This specific grammatical category of Semitic languages is glossed only when it is overtly marked morphologically. It is glossed with \CS on the \ge tier (see examples 28 and 29). For MA, this is the case with some feminine nouns (38) and numerals (39); for MH, it is the case with nouns (40).

(38) \mot ħkaːjti
     \mb  ħkaːj-t=i20
     \ge  story-F\CS=POSS.1SG
     \rx  N-PNG=PRO.PNG
     \ft  my story (ARY_AB_NARR_01_858)

(39) \mot sətt    ʕjaːl
     \mb  sətta   ʕjaːl
     \ge  six\CS  boys\PL
     \rx  N.NUM   N.M
     \ft  six boys (ARY_AV_NARR_02_083)

(40) \mot batej        malon
     \mb  bajit-ej     malon
     \ge  house-PL\CS  hotel
     \rx  N-PNG        N
     \ft  houses (of) hotel (= hotels) (HEB_IM_NARR_7_SP2_317)

4.2.4 Numerals

Gender and number suffixes are not glossed on numerals. Numerals are translated on the \ge tier and their label is N.NUM on the \rx tier when they are nouns (41 and 42 for MA). Since ordinal numbers are adjectives, their gloss is ADJ.NUM.

(41) \mb  tlaːta
     \ge  three
     \rx  N.NUM (ARY_AB_NARR_01_378)

(42) \mb  xəmsiːn
     \ge  fifty
     \rx  N.NUM (ARY_AV_NARR_01_173)

In MH gender suffixes are glossed (43 and 44). Masculine numerals have the suffix -a, but this does not function as a feminine marker in this context. The feminine for numerals is marked by the absence of a suffix.

20.  As example 28 shows, in MA the feminine suffix -a becomes -t when there is an annexation, in this case xǝdma “work” > xdǝmti “my work”.


(43) \mb  ʃaloʃ
     \ge  three[F]
     \rx  N.NUM
     \ft  three (HEB_IM_NARR_7_SP1_0434)

(44) \mb  ʃloʃ-a
     \ge  three-M
     \rx  N.NUM-PNG
     \ft  three (HEB_IM_NARR_7_SP1_0334)

4.2.5 Personal pronouns

In MA and in MH, independent personal pronouns are glossed with standard PNG labels on the \ge tier, and as PRO.IDP on the \rx tier (45 for MA, 46 for MH).

(45) \mot aːna     ɣaːnəmʃi
     \mb  aːna     ɣa=n-mʃi
     \ge  1SG      FUT=1SG-go\IPFV
     \rx  PRO.IDP  TAM=PNG-V
     \ft  me       I will go (ARY_AB_NARR_01_332)

(46) \mot ani      matsati
     \mb  ani      matsa-ti
     \ge  1SG      find\PFV-1SG
     \rx  PRO.IDP  V\TAM-PNG
     \ft  I found (HEB_IM_CONV_3_SP1_671)

Non-independent personal pronouns are enclitic because they cannot be used on their own (i.e. as independent words) and they depend on a specific context and syntax.21 In MA, since the first person singular has two suffixes for two different grammatical functions, there are two labels: OBJ for the direct object function (47), and POSS for the possessive function and with prepositions (48 and 49). One of these labels and PNG information are provided on the \ge tier. On the \rx tier, the label is PRO.PNG.

21.  Cf. The Online Dictionary of Language Terminology, s.v. ‘clitic pronoun’.




(47) \mot jʕaːwnuːni
     \mb  j-ʕaːwən-u=ni
     \ge  3-help\IPFV-PL=OBJ.1SG
     \rx  PNG-V.DER3-PNG=PRO.PNG
     \ft  they will help me (ARY_AV_NARR_02_354)

(48) \mot wliːdi
     \mb  wliːd=i
     \ge  boy\DIM=POSS.1SG
     \rx  N=PRO.PNG
     \ft  my boy (ARY_AB_NARR_01_365)

(49) \mot ʕəndi
     \mb  ʕənd=i
     \ge  at=POSS.1SG
     \rx  PREP=PRO.PNG
     \ft  at me (ARY_AV_NARR_03_059)

Regarding the second and third persons in MA, the label is always OBL plus PNG information on the \ge tier, and PRO.PNG on the \rx tier, for verbs, prepositions and possessive function (see 50 and 51).

(50) \mot qtəlha
     \mb  qtəl=ha
     \ge  kill\PFV=OBL.3SG.F
     \rx  V=PRO.PNG
     \ft  he killed her (ARY_AB_NARR_01_137)

(51) \mot djaːlna
     \mb  djaːl=na
     \ge  of=OBL.1PL
     \rx  PREP.POSS=PRO.PNG
     \ft  of us (our) (ARY_AV_NARR_01_014)
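The labelling of MA enclitic pronouns in examples (47) to (51) can be sketched as a small decision rule. The function below is an illustration only: its name and the host categories "verb", "noun" and "preposition" are assumptions, and the treatment of the first person plural as OBL is inferred from example (51) rather than stated as a rule in the text.

```python
def ma_bound_pronoun_gloss(png, host):
    """Return the ge gloss of an MA enclitic pronoun.

    png  -- person/number/gender value, e.g. "1SG", "3SG.F", "1PL"
    host -- category of the host word: "verb", "noun" or "preposition"
    """
    if png == "1SG":
        # Only the first person singular distinguishes object from possessive.
        label = "OBJ" if host == "verb" else "POSS"
    else:
        # All other values share a single label (assumption for 1PL, see lead-in).
        label = "OBL"
    return f"{label}.{png}"

print(ma_bound_pronoun_gloss("1SG", "verb"))   # as in (47)
print(ma_bound_pronoun_gloss("3SG.F", "verb")) # as in (50)
```

On the \rx tier the label is PRO.PNG in every case, so no comparable rule is needed there.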

In MH, there are two different paradigms, one for possessive pronouns with a nominal head and one for pronouns which are obligatorily introduced by a preposition. The forms are only different in the plural. Possessive pronouns are glossed for both numbers as ATTR with PNG indication (2nd and 3rd persons only for the latter). Personal pronouns affixed to prepositions are glossed only with PNG indication. As in MA, the gloss on the \rx tier is PRO.PNG for both series of pronouns (52 and 53 for MH).

(52) \mot axotenu
     \mb  axot=enu
     \ge  sister=ATTR.1PL
     \rx  N=PRO.PNG
     \ft  our sister

(53) \mot lanu
     \mb  l=anu
     \ge  to=1PL
     \rx  PREP=PRO.PNG
     \ft  to us (HEB_IM_NARR_7_SP1_0666)

4.2.6 Demonstratives: Pronouns, determiners and modifiers

For demonstrative pronouns in MA, near and far deixis are glossed as PROX and DIST respectively, plus gender and number information on the \ge tier. On the \rx tier the gloss is PRO.DEM (54 and 55).

(54) \mb  haːda    haːdi    haːdu
     \ge  PROX.M   PROX.F   PROX.PL
     \rx  PRO.DEM  PRO.DEM  PRO.DEM
     \ft  this     this     these
     (ARY_AB_NARR_01_134 / ARY_AB_NARR_01_022)

(55) \mb  haːdaːk  haːdiːk  haːduːk
     \ge  DIST.M   DIST.F   DIST.PL
     \rx  PRO.DEM  PRO.DEM  PRO.DEM
     \ft  that     that     those
     (ARY_AV_NARR_03_612 / ARY_AV_NARR_02_078 / ARY_AB_NARR_01_396)

MA demonstrative determiners are joined to the following article and noun because they are considered to be clitics: they are phonologically dependent on the article and noun. Near and far deixis are also distinguished (DEM.PROX / DEM.DIST) on the \ge tier, and the gloss is DET on the \rx tier (56 and 57).

(56) \mot haːdəlɣuːl
     \mb  haːd=əl=ɣuːl
     \ge  DEM.PROX=DEF=ogre
     \rx  DET=DET=N.M
     \ft  this ogre (ARY_AB_NARR_01_363)




(57) \mot diːkəlʒuːʒ
     \mb  diːk=əl=ʒuːʒ
     \ge  DEM.DIST=DEF=two
     \rx  DET.M/F22=DET=N.NUM
     \ft  those two (ARY_AV_NARR_02_097)

In MH, near and far deixis demonstrative modifiers are glossed as PROX and DIST respectively, plus gender and number information on the \ge tier. On the \rx tier the gloss is DEM (58 and 59).

(58) \mb  haze    hazot   haele
     \ge  PROX.M  PROX.F  PROX.PL
     \rx  DEM     DEM     DEM
     \ft  this    this    these

(59) \mb  hahu    hahi    hahem
     \ge  DIST.M  DIST.F  DIST.PL
     \rx  DEM     DEM     DEM
     \ft  that    that    those

In MA, when a pronoun follows a demonstrative determiner construction and agrees in gender and number with the noun, it is glossed as PRO.DEM on the \rx tier (60).

(60) \mot haːdəlbənt         haːdi
     \mb  haːd=əl=bənt       haːdi
     \ge  DEM.PROX=DEF=girl  PROX.F
     \rx  DET=DET=N.F        PRO.DEM
     \ft  this girl over here

In MH, demonstrative modifiers also follow the noun (61 and 62).

(61) \mot aatlas     azə
     \mb  ha=atlas   haze
     \ge  DEF=atlas  PROX.M
     \rx  DET=N      DEM
     \ft  this atlas (HEB_IM_NARR_7_SP2_003)

22.  diːk is used for masculine and feminine gender only in the Northern Moroccan variety.


(62) \mot haezoʁ    hahu
     \mb  ha=ezoʁ   hahu
     \ge  DEF=area  DIST.M
     \rx  DET=N     DEM
     \ft  that area (HEB_IM_NARR_7_SP2_289)

4.2.7 Relative pronouns

In MA, the relative pronoun is expressed by the form əlli (with an allomorph lli), which is invariable in gender and number and may refer to either persons or things (63). It is glossed as REL on the \ge tier, and PRO on the \rx tier. The label REL is also used in MH on the \ge tier, but on the \rx tier it is CONJ since the form also functions as a subordinator. This pronoun is a clitic in MH (64).

(63) \mb  əl=wəqt   əlli  dxəl
     \ge  DEF=time  REL   come_in\PFV[3SG.M]
     \rx  DET=N.M   PRO   V
     \ft  at the time he came in (ARY_AB_NARR_01_207)

(64) \mb  jeʃ  tsav    ʃe=ʃomeʁ
     \ge  be   turtle  REL=keep\PTCP.ACT[SG.M]
     \rx  EXS  N       CONJ=VN
     \ft  there is a turtle that keeps (HEB_IM_NARR_7_SP1_0189–0190)

4.2.8 Verbs

In MA, verbs are glossed with their meaning plus the labels \PFV (for perfective aspect), \IPFV (for imperfective aspect) and \IMP (for imperative). In MH, verbs are glossed with their meaning plus the labels \PFV (for perfective aspect or past tense), \nFCT (for imperfective aspect, non factual mood or future tense), \INF (for infinitive), and \IMP (for imperative).

4.2.8.1 Underlying forms for verbs. For regular verbs in both languages, the 3SG.M form is given as the underlying form on the \mb tier for all aspects and tenses, and their allomorphs appear on the \mot tier. From this it is possible to rebuild the structure of the word (65 and 66 for MA, 67 for MH).

(65) \mot nqədru
     \mb  n-qdər-u
     \ge  1-can\IPFV-PL
     \rx  PNG-V-PNG
     \ft  we will be able to (ARY_AV_NARR_01_036)

(66) \mot ʕəṛfət
     \mb  ʕṛəf-ət
     \ge  know\PFV-3SG.F
     \rx  V-PNG
     \ft  she knew (ARY_AB_NARR_01_696)

(67) \mot jiʃaeʁ
     \mb  j-iʃaeʁ
     \ge  3M-stay\nFCT[SG]
     \rx  PNG-V\TAM
     \ft  he will/shall stay (HEB_IM_CONV_1_SP1_042)

In MA, regarding hollow and defective verbs,23 two basic forms are used, one for perfective (PFV), the 3SG.M form, and another one for imperfective (IPFV) and imperative (IMP), the verbal root without personal affixes, for instance: gaːl (PFV) / guːl (IPFV/IMP), and mʃaː (PFV) / mʃiː (IPFV/IMP) (68 and 69 for perfective and 70 and 71 for imperfective).

(68) \mot kənt
     \mb  kaːn-t
     \ge  be\PFV-1
     \rx  V-PNG
     \ft  I was (ARY_AV_NARR_02_094)

(69) \mot mʃiːna
     \mb  mʃaː-na
     \ge  go\PFV-1PL
     \rx  V-PNG
     \ft  we went (ARY_AV_NARR_01_135)

23.  In Semitic languages, the basic root of the verbs is usually, but not always, a tri-consonantal root. Verbs whose second root element is a semiconsonant are called hollow verbs, and verbs whose third root element is a semiconsonant are called defective verbs. In both cases, certain morphophonemic rules apply when conjugating them.


(70) \mot nguːzu
     \mb  n-guːz-u
     \ge  1-pass\IPFV-PL
     \rx  PNG-V-PNG
     \ft  we will pass (ARY_AV_NARR_01_072)

(71) \mot kanəmʃiːw
     \mb  ka=n-mʃiː-u
     \ge  REAL=1-go\IPFV-PL
     \rx  TAM=PNG-V-PNG
     \ft  we will go (ARY_AV_NARR_01_044)
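The use of two basic forms for hollow and defective verbs can be pictured as a lookup followed by the addition of affixes and clitics. The sketch below uses only the two verbs cited in the text; the table `BASIC_FORMS`, the function `mb_form` and its parameters are hypothetical names chosen for the illustration.

```python
# Basic forms cited in the text: one for the perfective (the 3SG.M form) and
# one shared by the imperfective and the imperative.
BASIC_FORMS = {
    "say": {"PFV": "gaːl", "IPFV": "guːl", "IMP": "guːl"},
    "go":  {"PFV": "mʃaː", "IPFV": "mʃiː", "IMP": "mʃiː"},
}

def mb_form(verb, aspect, prefix="", suffix="", proclitic=""):
    """Assemble an mb-tier form from a basic form plus optional indexes.

    Affixes are joined with '-', a proclitic TAM marker with '='.
    """
    form = BASIC_FORMS[verb][aspect]
    if prefix:
        form = f"{prefix}-{form}"
    if suffix:
        form = f"{form}-{suffix}"
    if proclitic:
        form = f"{proclitic}={form}"
    return form

print(mb_form("go", "PFV", suffix="na"))                              # as in (69)
print(mb_form("go", "IPFV", prefix="n", suffix="u", proclitic="ka"))  # as in (71)
```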

In MH, two forms are also used for the same category of verbs, one for perfective (72) and another for the non-factual mood (73), as usual using the 3SG.M form.

(72) \mot dibaʁtem
     \mb  dibeʁ-tem
     \ge  talk\PFV-2PL.M
     \rx  V\TAM-PNG
     \ft  you talked (HEB_IM_NARR_7_SP2_161)

(73) \mot taazvu
     \mb  t-aazv-u
     \ge  2-leave\nFCT-PL
     \rx  PNG-V\TAM-PNG
     \ft  you will leave (HEB_IM_CONV_1_SP1_016)

4.2.8.2  Verbal indexes. For verbal indexes, the same rule applies: the basic form appears on the \mb tier and any allomorphs on the \mot tier. Tables 2 and 3 show their basic forms. These verbal indexes are affixes (they are phonologically and syntactically dependent on the verbal form). PNG information is glossed on the \ge tier and the PNG label is used on the \rx tier (74 and 75 for MA,24 76 for MH).

24.  As examples 74 and 75 demonstrate, the variants for the same verbal index (for instance, the case of the perfective 3SG.F in MA) are indicated on the \mot and \mb tiers because they are variants, not allomorphs.




Table 2.  Verbal indexes in MA

IPFV  n- (1)                                    -u (PL)
      t- (2 or 3F)                              -i (F) / -u (PL)
      j- (3M or 3)                              -u (PL)
PFV   -t (1SG)                                  -na (1PL)
      -t / -ti (2SG or 2SG.M / 2SG or 2SG.F)    -tu (2PL)
      ø ([3SG.M])                               -u (3PL)
      -ət / -aːt (3SG.F)

Table 3.  Verbal indexes in MH

nFCT  Vowel /Ø/25 (1SG)                         n- (1PL)
      t- (2 or 3F)                              -i (F) / -u (2PL)
      j- (3M or 3)                              -u (3PL)
PFV   -ti (1SG)                                 -nu (1PL)
      -ta / -t (2SG.M / 2SG.F)                  -tem / -ten (2PL.M / 2PL.F)
      Ø ([3SG.M])                               -u (3PL)

(74) \mot səktaːt
     \mb  skət-aːt
     \ge  be_quiet\PFV-3SG.F
     \rx  V-PNG
     \ft  she kept quiet (ARY_AB_NARR_01_574)

(75) \mot qaːlət
     \mb  qaːl-ət
     \ge  say\PFV-3F
     \rx  V-PNG
     \ft  she said (ARY_AV_NARR_02_182)

25.  In fact this verbal index is pronounced with a weak glottal stop.


(76) \mot avati
     \mb  avad-ti
     \ge  work\PFV-1SG
     \rx  V\TAM-PNG
     \ft  I worked (HEB_IM_CONV_3_SP1_099)

Following the Leipzig Glossing Rules,26 the 3SG.M person is glossed between square brackets (in both languages) on the \ge tier, because the gloss does not correspond to an overt element (77 and 78).

(77) \mot mʃa
     \mb  mʃaː
     \ge  go\PFV[3SG.M]
     \rx  V
     \ft  he went (ARY_AB_NARR_01_163)

(78) \mot jaʃav
     \mb  jaʃav
     \ge  sit\PFV[3SG.M]
     \rx  V\TAM
     \ft  he sat (HEB_IM_NARR_7_SP1_0247)
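The difference between overt indexes, which are glossed after a hyphen, and the non-overt 3SG.M, which is glossed in square brackets, can be sketched for the MH perfective as follows. The dictionary reproduces the PFV suffixes listed for MH in Table 3; the names `MH_PFV_INDEXES` and `gloss_mh_perfective` are assumptions made for the illustration, and the function expects an \mb form with at most one suffix.

```python
# Perfective suffixes of MH as listed in Table 3 (the 3SG.M has no overt index).
MH_PFV_INDEXES = {
    "ti": "1SG", "nu": "1PL",
    "ta": "2SG.M", "t": "2SG.F",
    "tem": "2PL.M", "ten": "2PL.F",
    "u": "3PL",
}

def gloss_mh_perfective(mb_form, meaning):
    """Return the ge gloss of an MH perfective verb given its mb form."""
    _stem, _sep, suffix = mb_form.partition("-")
    if not suffix:
        # No overt index: the 3SG.M value goes in square brackets.
        return f"{meaning}\\PFV[3SG.M]"
    return f"{meaning}\\PFV-{MH_PFV_INDEXES[suffix]}"

print(gloss_mh_perfective("jaʃav", "sit"))     # as in (78)
print(gloss_mh_perfective("avad-ti", "work"))  # as in (76)
```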

4.2.8.3 Derived verbal forms. In Semitic languages, some verbal forms are derived from a basic root (usually, but not always, a tri-consonantal root) by means of a morphophonological stem change. These forms are usually tagged with Arabic numerals in traditional grammars.27 Following this tradition, derived verbal forms are glossed on the \rx tier as DER2, DER3 etc. in MA (79).

(79) \mot tʒəwwəʒ
     \mb  t-ʒəwwəʒ
     \ge  2-get_married\IPFV
     \rx  PNG-V.DER2
     \ft  you will get married (ARY_AV_NARR_03_641)

26.  Leipzig Glossing Rules, Rule 6: “Non-overt elements. If the morpheme-by-morpheme gloss contains an element that does not correspond to an overt element in the example, it can be enclosed in square brackets”, Bickel, Comrie & Haspelmath (2008).
27.  This is true of grammars written in European languages, but not of grammars written in Arabic or Hebrew.




In MH, on the other hand, a general value for each verbal stem is provided, rather than the traditional name for these forms, to help make the corpus accessible to non-specialists. They are glossed on the \ge tier as CAUS (causative), REFL (reflexive) and PASS (passive) when they have one of these special derivational meanings, and on the \rx tier as DER (80):

(80) \mot tavi
     \mb  t-avi
     \ge  2-bring\CAUS\nFCT
     \rx  PNG-DER\TAM
     \ft  bring (it)! (HEB_IM_NARR_7_SP1_0013)

This system is not used in MA, since it is difficult to determine a single value for each form. For cross-linguistic comparison between Semitic varieties, derived verb forms can be retrieved by searching DER on the \rx tier.

4.2.8.4 Quadri-consonantal roots. These are glossed as V.4 on the \rx tier because they are not derived stems (81 for MA).28

(81) \mot tkərfəsna
     \mb  tkərfəs-na
     \ge  be_damaged\PFV-1PL
     \rx  V.4-PNG
     \ft  we were damaged (ARY_AV_NARR_02_007)
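Retrieval of derived forms through the \rx tier, as mentioned above, amounts to looking for a label DER, optionally followed by a stem number, among the labels of that tier. The following is a minimal sketch of such a search; it is not the search engine of the corpus, and the function name `has_derived_form` and the set of separators are assumptions. Note that it also matches verbal nouns and participles whose \rx label contains DER.

```python
import re

def has_derived_form(rx_line):
    """True if any label on the rx tier is DER or DER plus a stem number."""
    # Labels are separated by spaces, '-', '=', '.' or a backslash.
    labels = re.split(r"[\s\-=.\\]+", rx_line)
    return any(re.fullmatch(r"DER\d*", label) for label in labels)

rx_tiers = ["PNG-V.DER2", r"PNG-DER\TAM", "V.4-PNG", "V-PNG"]
print([rx for rx in rx_tiers if has_derived_form(rx)])
```

Searching on whole labels rather than on the raw string keeps plain quadri-consonantal verbs (V.4) out of the results while still retrieving both the numbered MA labels and the unnumbered MH label.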

4.2.9 Participles and verbal nouns

Active participles are glossed as PTCP.ACT and passive participles as PTCP.PASS on the \ge tier, and as VN (meaning verbo-nominal forms) on the \rx tier. Any allomorphs appear on the \mot tier (82 and 83 for MA, 84 and 85 for MH).

(82) \mot gaːləs
     \mb  gaːləs
     \ge  sit_down\PTCP.ACT.SG.M
     \rx  VN (ARY_AV_NARR_02_188)

28.  In case of quadri-consonantal roots, there is a single derived verb form, which is glossed as V.4.DER.


(83)

\mot məsħuːr \mb məsħuːr \ge bewitch\PTCP.PASS.SG.M \rx VN29 (ARY_AB_NARR_01_100)

(84)

\mot ovedet \mb oved-et \ge work\PTCP.ACT.SG.F \rx VN-PNG (HEB_IM_CONV_3_SP2_012)

(85)

\mot sguʁim \mb saguʁ-im \ge close\PTCP.PASS.PL.M \rx VN-PNG (HEB_IM_CONV_3_SP1_078)

Verbal nouns are glossed with their general meaning on the \ge tier and as N.V. (noun verbal) on the \rx tier (86 for MA, 87 for MH). (86)

\mot əttsaːriːja \mb əl=tsaːriːj-a \ge DEF=go_for_a_walk-F \rx DET=N.V.DER6-PNG (ARY_AV_NARR_01_262)

(87)

\mot ʁitsa \mb ʁitsa \ge race \rx N.V. (HEB_IM_NARR_7_SP1_1200)

29.  For participles of derived verb forms, the same labels are used as those in Section 4.2.8.3. 30.  Imperfectives without this TAM marker are translated as future tense in order to differentiate them.

4.2.10 TAM markers TAM markers behave differently in the two languages. In some Arabic vernaculars, a prefix is added to the imperfective to indicate aspect. For instance, in MA, ka / ta (and other allomorphs) are used with the imperfective to indicate present simple, repetition and concomitance.30 These markers are considered as clitics because they depend phonologically on the verbal form whilst being syntactically independent. They are glossed as REAL (realis mood) on the \ge tier, and as TAM on the \rx tier (88). (88)

\mot kaimʃiːw \mb ka=j-mʃiː-u \ge REAL=3-go\IPFV-PL \rx TAM=PNG-V-PNG \ft they go (ARY_AB_NARR_01_044)

Future particles in MA (ɣaːdi, reduced to ɣaːd / ɣaː; maːʃi, reduced to maːʃ, etc.) are glossed as FUT on the \ge tier, and as TAM on the \rx tier (89). In the case of reduced particles, they become clitics (90). (89)

\mot ʕaːdi ma tʒəbruʃi \mb ʕaːdi ma t-ʒbər-u=ʃi \ge FUT NEG 2-find\IPFV-PL=NEG \rx TAM PTCL PNG-V-PNG=CL \ft you will not find (ARY_AV_NARR_04_204)

(90)

\mot ɣaːtəwṣəl \mb ɣaː=t-wṣəl \ge FUT=2-arrive\IPFV \rx TAM=PNG-V \ft you will arrive (ARY_AB_NARR_01_385)

In MH, the only aspectual marker used is PFV (for perfective aspect), which also means past tense. The other verbal marker is nFCT (non factual). This is a modal marker but it can also mean future tense or imperative (91 and 92). (91)

\mot jaʃav \mb jaʃav \ge sit\PFV[3SG.M] \rx V\TAM \ft he sat (HEB_IM_NARR_7_SP1_0247)

(92)

\mot taazvu \mb t-aazv-u \ge 2-leave\nFCT-PL \rx PNG-V\TAM-PNG \ft they will leave (HEB_IM_CONV_1_SP1_016)


4.2.11 Prepositions, conjunctions and adverbs 4.2.11.1  Prepositions. The same general rule is applied, with the underlying form (i.e. the isolated form) appearing on the \mb tier, and its allomorphs on the \mot tier. Prepositions are also translated with a prototypical lexical gloss31 on the \ge tier, and as PREP on the \rx tier. Examples 93, 94 and 95 show how prepositions are glossed in MA: (93)

\mot ʕliːha \mb ʕla=ha \ge along=OBL.3F \rx PREP=PRO.PNG \ft along her (ARY_AV_NARR_02_265)

(94)

\mot djaːlhum \mb djaːl=hum \ge of=OBL.3PL \rx PREP=PRO.PNG \ft theirs (PL) (ARY_AB_NARR_01_838)

(95)

\mot ʕla ṣəħħtu \mb ʕla ṣəħħ-t=u \ge along health-F\CS=3M \rx PREP N-PNG=PRO.PNG \ft in good health (ARY_AV_NARR_02_197)

Examples 96 and 97 show how prepositions are treated in MH: (96)

\mot bagan \mb b=ha=gan \ge in=DEF=kindergarten \rx PREP=DET=N \ft in the kindergarten (HEB_IM_CONV_3_SP2_009)

31.  A single translation is chosen for each preposition, conjunction, adverb and other particles in order to retrieve them when performing cross-linguistic comparisons. This is a principle of CorpAfroAs: “Lexical glosses refer to basic stems only, irrespective of the semantic changes induced by derivational and other material”, (20 January 2013).

(97)

\mot laavoda \mb l=ha=avoda \ge to=DEF=work \rx PREP=DET=N.V. \ft to the work (HEB_IM_CONV_3_SP2_009)

4.2.11.2  Conjunctions, adverbs and interrogatives. These are translated on the \ge tier, and the part of speech (CONJ or ADV) is provided on the \rx tier. In MA, the potential (POT) and irrealis (IRR) moods are distinguished. Examples 98 and 99 show how conjunctions are glossed in MA: (98)

\mot iːla \mb iːla \ge if \rx CONJ.POT (ARY_AB_NARR_01_231)

(99)

\mot luːkaːn \mb luːkaːn \ge if \rx CONJ.IRR

Example 100 shows how a conjunction is glossed in MH: (100)

\mot im \mb im \ge if \rx COND (HEB_IM_CONV_3_SP1_234)

Examples 101 and 102 show how adverbs and interrogatives are glossed in MA: (101)

\mot təmma \mb təmma \ge there \rx ADV (ARY_AB_NARR_01_050)

(102)

\mot ʃənni \mb ʃənni \ge what \rx PRO.Q \ft what? (ARY_AV_NARR_02_214)

\mot kiːfaːʃ \mb kiːfaːʃ \ge how \rx ADV.Q \ft how? (ARY_AV_NARR_02_252)


In MH, interrogative adverbs are glossed separately (103 and 104): (103)

\mot efo \mb ejfo \ge where \rx ADV.Q (HEB_IM_CONV_3_SP1_235)

(104)

\mot mataj \mb mataj \ge when \rx ADV.Q (HEB_IM_NARR_7_SP2_226)

4.2.12 Negation Arabic vernaculars usually use two words to form the negation of verbal forms. In MA, these two particles are glossed as NEG on the \ge tier. Since the first part of the negation ma is a phonologically independent particle, it is glossed as PTCL on the \rx tier, and the second part of the negation (ʃi and its allomorphs) is glossed as CL (clitic) because it is phonologically dependent on the verbal form (105). (105)

\mot ma kaixərʒuʃ \mb ma ka=j-xrəʒ-u=ʃ \ge NEG REAL=3-go_out\IPFV-PL=NEG \rx PTCL TAM=PNG-V-PNG=CL \ft they do not go out (ARY_AB_NARR_01_567)
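Because the \ge and \rx tiers are segmented in parallel, the two parts of the MA negation can be told apart by reading both tiers together. The following Python sketch pairs each \ge gloss of example (105) with the \rx label in the same position. The function names are hypothetical, and the sketch assumes that both tiers contain the same number of words and of hyphen- or equals-delimited slots; it raises an error where they do not.

```python
import re

def morphemes(tier):
    """Split a tier into words, and each word into hyphen- or equals-delimited slots."""
    return [re.split(r"[-=]", word) for word in tier.split()]

def pair_ge_rx(ge, rx):
    """Return (ge gloss, rx label) pairs, slot by slot."""
    ge_words, rx_words = morphemes(ge), morphemes(rx)
    if len(ge_words) != len(rx_words):
        raise ValueError("tiers differ in number of words")
    pairs = []
    for g_word, r_word in zip(ge_words, rx_words):
        if len(g_word) != len(r_word):
            raise ValueError(f"misaligned word: {g_word} vs {r_word}")
        pairs.extend(zip(g_word, r_word))
    return pairs

# The ge and rx tiers of example (105).
ge_105 = r"NEG REAL=3-go_out\IPFV-PL=NEG"
rx_105 = "PTCL TAM=PNG-V-PNG=CL"

# Every slot glossed NEG, together with its rx label.
neg_pairs = [(g, r) for g, r in pair_ge_rx(ge_105, rx_105) if g == "NEG"]
print(neg_pairs)
```

The two NEG glosses come out paired with PTCL and CL respectively, which is the distinction between the independent particle ma and the clitic ʃ described above.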

In the case of nominal clauses, the negative particle is maːʃi. It is glossed as NEG.CONT32 on the \ge tier, and PTCL on the \rx tier (106). (106)

\mot maːʃi \mb maːʃi \ge NEG.CONT \rx PTCL (ARY_AV_NARR_01_279)

32.  Where CONT means ‘continuous’, the negative particle is formed by the two markers of verbal clause negation, as described in example 107.

In MH, there are four negative allomorphs (clitics or affixes) to express negation of verbal or non-verbal forms. Three of them are glossed as NEG on the \ge tier, and as PTCL on the \rx tier: the clitic lo for the negation of a verb (107), the clitic al with the nFCT conjugation to convey a prohibition (108), and the prefix i with the modal efʃaʁ meaning “possible” and with some adjectives (109). The fourth negative clitic, en, is used with existential sentences. It is glossed as NEG.EXS on the \ge tier, and PTCL on the \rx tier (110). (107)

\mot lonasati \mb lo=nasa-ti \ge NEG=travel\PFV-1SG \rx PTCL=V\TAM-PNG \ft I didn’t travel (HEB_IM_CONV_3_SP1_177)

(108)

\mot altoxli \mb al=t-oxel-i \ge NEG=2-eat\nFCT-SG.F \rx PTCL=V\TAM-PNG \ft don’t eat!

(109)

\mot iefʃaʁ \mb i-efʃaʁ \ge NEG-possible \rx PTCL-MOD \ft it’s impossible

(110)

\mot enkviʃ \mb en=kviʃ \ge NEG.EXS=road \rx PTCL=N \ft no road (HEB_IM_NARR_7_SP2_086)

5. Conclusion

An overview of the traditions of morphemic glossing in Arabic and Hebrew reveals the lack of a unified system for carrying out this kind of morphosyntactic analysis. Traditional linguistic studies never established this practice, and only the most recent publications contain a list of labels and some morphemic glosses. Our first aim has been to propose a cohesive procedure for applying morphemic glosses to data in Semitic languages. In addition, we have identified similar grammatical categories between two Semitic languages, MA and MH, in order to describe parallel morphemes and their similar functions cross-linguistically. By means of specific language examples, we have illustrated the decisions taken by the six CorpAfroAs researchers who are working on Semitic languages as they sought to retrieve their morphological particularities. This involved showing the differences between the underlying form of a grammatical word and its allomorphs for the major grammatical categories (verbs, nouns and prepositions). For each one, we identified a basic form and its possible allomorphs to allow retrieval of the modifications that a morpheme undergoes in any given context, for example: metathesis or vowel elision for nouns and verbs (16, 17, 18, 65, 66), apophony for verbs (68, 69, 72) and vowel lengthening for prepositions (93). Moreover, specific language-group categories such as the construct state or annexation, which is reflected only when there is an overt morphological mark (38, 39, 40), were given specific labels. Nevertheless, our analysis is not identical for all morphological categories, since each language needs specific treatment. Examples include the differences between non-independent personal pronouns in MA (47–51) and MH (52–53), numerals in both languages (41–44), the indefinite article that exists in MA (36–37) but not in MH, and the system of negation that is more complex in MH than it is in MA (105–110).33 To conclude, it is important to underline the value of these labels and this glossing procedure for a more precise morphosyntactic analysis of MA and MH independently as well. Linguistic research with glosses in Semitic languages is necessary to achieve more accurate results. This article is a first attempt at establishing a standard in the domain.

33.  A complete list of these glosses can be found on the CorpAfroAs website: http://dx.doi.org/10.1075/scl.68.website

References

Aoun, Joseph. 2009. Resumption. In Versteegh, Eid, Elgibali, Woidich & Zaborski (eds), IV, 80–85.
Aronson Berman, Ruth. 1978. Modern Hebrew Structure. Tel Aviv: University Publishing Projects.
Bahloul, Maher. 2006. Agreement. In Versteegh, Eid, Elgibali, Woidich & Zaborski (eds), I, 43–48.
Barontini, Alexandrine. 2012. ‘Moroccan Arabic Corpus (Meknes)’. Corpus recorded, transcribed and annotated by Alexandrine Barontini. In Amina Mettouchi & Christian Chanard (eds). The CorpAfroAs Corpus of Spoken AfroAsiatic Languages. DOI: http://dx.doi.org/10.1075/scl.68.website. Accessed 10/06/2013. (= ARY_AB)
Benmamoun, E. 2006. Construct State. In Versteegh, Eid, Elgibali, Woidich & Zaborski (eds), I, 477–482.
Berman, Ruth A. 1997. Modern Hebrew. In Semitic Languages, Robert Hetzron (ed.), 312–333. New York NY: Routledge.
Bickel, Balthasar, Comrie, Bernard & Haspelmath, Martin. The Leipzig Glossing Rules. Conventions for Interlinear Morpheme by Morpheme Glosses. Revised version of February 2008. Leipzig: Max-Planck-Institut für Evolutionäre Anthropologie. (17 July 2012).
Boumans, Louis. 1998. The Syntax of Codeswitching. Analysing Moroccan Arabic/Dutch Conversation. Tilburg: Tilburg University Press.
Caubet, Dominique. 1993. L’arabe marocain, Tome 1: Phonologie et morphosyntaxe. Tome 2: Syntaxe et catégories grammaticales, textes. Leuven: Peeters.
Cohen, David & Zafrani, Haïm. 1968. Grammaire de l’hébreu vivant. Paris: PUF.
Dekel, Nurit. 2010. A Matter of Time: Tense, Mood and Aspect in Spontaneous Spoken Israeli Hebrew. PhD dissertation. (23 April 2013).
Esseesy, Mohssen. 2010. Grammaticalization of Arabic Prepositions and Subordinators. A Corpus-based Study. Leiden: Brill. DOI: 10.1163/9789004187634
Fischer, Wolfdietrich & Jastrow, Otto. 1980. Handbuch der arabischen Dialekte. Wiesbaden: Harrassowitz.
Fleisch, Henri. 1974. Études d’arabe dialectal. Beyrouth: Dar el-Machreq.
Fleisch, Henri. 1979. Traité de philologie arabe. Beyrouth: Dar el-Machreq.
Glinert, Lewis. 1989. The Grammar of Modern Hebrew. Cambridge: CUP.
Gralla, Sabine. 2006. Der arabische Dialekt von Nabk (Syrien). Wiesbaden: Harrassowitz.
Grand’Henry, Jacques. 1972. Le parler arabe de Cherchell (Algérie). Louvain-la-Neuve: Université Catholique de Louvain-Institut Orientaliste.
Haak, Martine. 2006. Auxiliary. In Versteegh, Eid, Elgibali, Woidich & Zaborski (eds), I, 216–221.
Haak, Martine, de Jong, Rudolf & Versteegh, Kees (eds). 2004. Approaches to Arabic Dialects. A Collection of Articles Presented to Manfred Woidich on the Occasion of his Sixtieth Birthday. Leiden: Brill.
Henkin, Roni. 2010. Negev Arabic. Dialectal, Sociolinguistic, and Stylistic Variation. Wiesbaden: Harrassowitz.
de Jong, Rudolf E. 2011. A Grammar of the Bedouin Dialects of Central and Southern Sinai. Leiden: Brill. DOI: 10.1163/ej.9789004201019.i-440
Lehmann, Christian. 2004. Interlinear morphemic glossing. In Morphologie. Ein internationales Handbuch zur Flexion und Wortbildung, Geert Booij, Christian Lehmann, Joachim Mugdan & Stavros Skopeteas (eds), 1834–1857. Berlin: Walter de Gruyter.
Malibert, Il-Il. 2012. ‘Spoken Hebrew Corpus’. Corpus recorded, transcribed and annotated by Malibert Il-Il. In Amina Mettouchi & Christian Chanard (eds). The CorpAfroAs Corpus of Spoken AfroAsiatic Languages. DOI: http://dx.doi.org/10.1075/scl.68.website. Accessed 01/06/2013. (= HEB_IM)
Moutaouakil, Ahmed. 2007. Functional Grammar. In Encyclopaedia of Arabic Language and Linguistics, Versteegh, Eid, Elgibali, Woidich & Zaborski (eds), II, 143–150.
Mughazy, Mustafa A. 2007. Ellipsis. In Versteegh, Eid, Elgibali, Woidich & Zaborski (eds), II, 18–21.
Owens, Jonathan. 2004. Remarks on ideophones in Nigerian Arabic. In Approaches to Arabic Dialects. A Collection of Articles Presented to Manfred Woidich on the Occasion of his Sixtieth Birthday, Martine Haak, Rudolf de Jong & Kees Versteegh (eds), 207–220. Leiden: Brill.
Owens, Jonathan. 2008. Participle. In Versteegh, Eid, Elgibali, Woidich & Zaborski (eds), III, 541–546.
Reese, Johannes. 2006. Aktionsart. In Versteegh, Eid, Elgibali, Woidich & Zaborski (eds), I, 50–53.
Rosen, Haïm. 1977. Contemporary Hebrew. The Hague: Mouton. DOI: 10.1515/9783110804836
Schwarzwald, Ora R. 2001. Modern Hebrew. Munich: Lincom.
Solimando, Christina. 2011. Ellipsis in the Arabic linguistic thinking (8th-10th century). In The Word in Arabic, Giuliano Lancioni & Lidia Bettini (eds), 69–82. Leiden: Brill. DOI: 10.1163/9789004206427_006
The Online Dictionary of Language Terminology (ODLT), (16 March 2013).
Tsukanova, Vera. 2008. Discourse functions of the verb in the dialect of sedentary Kuwaitis. In Between the Atlantic and Indian Oceans. Studies on Contemporary Arabic Dialects, Stephan Procházka & Veronika Ritt-Benmimoun (eds), 447–455. Wien: Lit Verlag.
Vanhove, Martine. 1993. La langue maltaise. Études syntaxiques d’un dialecte arabe ‘périphérique’. Wiesbaden: Harrassowitz.
Vanhove, Martine. 2004. Deixis et focalization: La particule ta en arabe de Yafi‘ (Yémen). In Approaches to Arabic Dialects. A Collection of Articles Presented to Manfred Woidich on the Occasion of his Sixtieth Birthday, Martine Haak, Rudolf de Jong & Kees Versteegh (eds), 329–342. Leiden: Brill.
Varod-Silber, Vered. 2012. The SpeeCHain Perspective: Prosody‐Syntax Interface in Spontaneous Spoken Hebrew. PhD dissertation, Tel Aviv University.
Versteegh, Kees, Eid, Mushira, Elgibali, Alaa, Woidich, Manfred & Zaborski, Andrzej (eds). 2006–2009. Encyclopaedia of Arabic Language and Linguistics. 4 vols. Leiden: Brill.
Vicente, Ángeles. 2012. ‘Moroccan Arabic Corpus (Ceuta)’. Corpus recorded, transcribed and annotated by Ángeles Vicente. In Amina Mettouchi & Christian Chanard (eds). The CorpAfroAs Corpus of Spoken AfroAsiatic Languages. DOI: http://dx.doi.org/10.1075/scl.68.website. Accessed 15/05/2013. (= ARY_AV)
Watson, Janet C. E. 1993. A Syntax of Ṣan‘ānī Arabic. Wiesbaden: Harrassowitz.
Yoda, Sumikazu. 2005. The Arabic Dialect of the Jews of Tripoli (Libya). Wiesbaden: Harrassowitz.

From the Leipzig Glossing Rules to the GE and RX lines
Bernard Comrie

Max Planck Institute for Evolutionary Anthropology and University of California Santa Barbara

The Leipzig Glossing Rules (http://www.eva.mpg.de/lingua/resources/glossing-rules.php) were devised with a very specific purpose in mind, namely to standardize the notations used by linguists in order to present the morphological structure of example sentences in language structures unfamiliar to the reader. While they form a suitable basis for annotation in projects like CorpAfroAs, such projects have a higher level of requirements, in particular the need to be able to retrieve particular categories and structures from corpora in various languages. The article discusses with examples the extensions of the LGR that are needed for this purpose.

1. From translation to glossing Suppose I have a text in L1 and want to make it accessible to speakers of L2 who have no knowledge of L1. An obvious answer is that I translate the text into L2 and give this version to my audience. If my sole aim is to convey the information contained in L1 to the speakers of L2, then this is probably sufficient. But let us suppose that I want to go further, and have my audience understand the structure of the text in L1, so that they know not only what information is conveyed by the text, but also how the text comes to convey the information that it does. The difference between the two tasks can be seen by examining the history of the decipherment of the Rosetta Stone. When the Rosetta Stone was discovered, the Greek text showed clearly what information was intended to be conveyed, and researchers could reasonably conclude that the Ancient Egyptian text, in its hieroglyphic and demotic variants, conveyed the same information. But it was nonetheless several decades before linguists were able to decipher the Ancient Egyptian text, i.e. to work out how it comes to convey the information that is also conveyed by the Greek text, by means of its own lexical and grammatical resources. doi 10.1075/scl.68.06cor 2015 © John Benjamins Publishing Company


The answer to the question posed in the first paragraph of Section 1 is that the L1 text needs not (only) to be translated, but (also) to be glossed, i.e. to be provided with explanations that enable someone ignorant of L1 but familiar with the glossing conventions used to follow what is going on linguistically — lexically and grammatically — in the original text. Let us step back for a moment. There are actually several levels at which one can make a text linguistically accessible to an audience. In the simplest case, the audience already knows the language in which the text is written, as for instance in a grammar written for native speakers of a language. Examples can be presented without any glossing (or, of course, translation), and the necessary analysis can be provided in the rest of the text (including, possibly, diagrammatical representations of the sentence to illustrate particular aspects of structure). Sometimes, in order to highlight particular aspects of linguistic structure that are not explicit in the normal representation used, this representation might be “enriched”, for instance by adding a (partial) syntactic bracketing to illustrate the two interpretations of (1) in (2) and (3), or by adding subscripts to indicate different coreference relations of (4) in (5) and (6).

(1) old men and women1



(2) old [men and women]



(3) [old men] and [women]



(4) John saw Bill, and then he ran away.



(5) John_i saw Bill_j, and then he_i ran away.



(6) John_i saw Bill_j, and then he_j ran away.

Second, an example in L1 might be presented to an audience who are not native speakers of L1 but nonetheless already have a sufficiently advanced knowledge of L1 that they can be assumed to understand most of the structure of the example in question and only have their attention directed to some more advanced point that is of interest. A learner of Standard Arabic who has got to the point of confronting relative clauses in that language might well be presented with examples (7) and (8) with appropriate translations and an explanation that the relative pronoun ((ʔa)llaðiː) is used with definite antecedents, but not with indefinite ones.

1.  For examples in languages that use the Latin alphabet, the usual orthography is used; in the case of Latin and Old English examples, the spelling of the original manuscript. Examples from other languages are given in IPA modified according to the guidelines of the CorpAfroAs project; in particular, a subscript dot is used to indicate so-called emphatic consonants.




(7) raʔajtu l-walada llaðiː taraka ʔabaːhu
‘I saw the boy who had left his father.’

(8) raʔajtu waladan taraka ʔabaːhu
‘I saw a boy who had left his father.’

(See (20) for a fuller glossing of (7), and (12) for a glossing of waladan.)

In Anglo-Saxon England, higher office-holders in monasteries were faced with the problem of making the Latin text of the gospels accessible to their rank-and-file, and their solution was to write translations of the individual words underneath the original. Examples (9)–(11) are taken from the Old English Rushworth Glosses, with the three lines representing the Latin original, the Old English gloss, and a Modern English translation.

(9) populi
ðæs folces
‘of the people’

(10) ad illas
to him
‘to them’

(11) venient dies
cymeð dagas
‘days will come/there will come days’

Example (9) is quite successful at communicating the structure of the Latin, since a Latin genitive case is glossed by an Old English genitive, the only difference being that the Old English has an added definite article, a category absent from Latin. In (10), the two-word combination is in a sense correctly translated into English, and the structure of preposition + complement is retained, but there are also structural differences. The Latin preposition ad requires the accusative case, while the Old English preposition to requires the dative, so that the gloss does not provide full information; in particular, it does not identify the case of the Latin pronoun. In fact, the situation is even worse: dative him could in principle be plural of any gender, or masculine or neuter singular, while Latin illas is unequivocally feminine plural; from the context, it is clear that the reference is to a group of women. In (11), the verb forms are particularly interesting. Latin venient is future tense, third person plural. The Old English form cymeð is unequivocally plural, but is uninformative as to person, so only the fact that its subject has to be interpreted as ‘days’ guarantees the interpretation as third person plural. Old English lacks a distinct morphological future tense, and in fact the Old English verb form is simply non-past, and only the context leads the reader to assign future time reference to the coming of the days. Thus, while the glosses provide an indication of which Latin word corresponds to which English word (with occasional one-to-many correspondences), they do not do justice to the morphological categories expressed in the Latin original and rely a lot on context and other inferencing. Anyone attempting to write a grammar of Latin based on the Old English glosses would be foolhardy indeed!

With the rapid development of linguistic typology in the second half of the twentieth century, the issue of glossing became ever more important. A typical typological article might include examples from dozens of languages, while monographs might take this into the hundreds, all or nearly all likely to be unfamiliar to all or nearly all of the audience. And since these examples were being provided for linguistic purposes, it was essential to devise a means of representing the structure of examples in the language investigated in such a way that linguists could readily see exactly how the example in question comes to mean what it does, and how it illustrates the particular grammatical points that it is meant to represent. During this period glossing practices were gradually improved, with the Leipzig Glossing Rules (hereafter: LGR; accessible on-line at http://www.eva.mpg.de/lingua/resources/glossing-rules.php; last consulted on 2014 August 06) being an attempt to draw together the best practice that had emerged as a consensus by the early 21st century. In what follows, I will present some of the most important features of LGR.

Glossing aims to identify both lexical and grammatical morphemes of the object language, with lexical morphemes conventionally glossed by lower-case words in the meta-language (here, English), grammatical morphemes by abbreviations in upper case (for esthetic reasons, small capitals are usually used). When a word in the object language can be readily segmented into morphemes, i.e. in cases of concatenative morphology, then the formal side of glossing is relatively straightforward. The word in the object language is shown with hyphens indicating the morpheme boundaries, and in the gloss line both the order of glosses and the hyphens correspond to those in the object language line, as in (12), for one of the words from example (8).

(12) walad-a-n
boy-acc-indf

Here, the lexical item is glossed as ‘boy’. The first suffix is identified as accusative, the second as indefinite, using the abbreviations set out in LGR for these grammatical categories.2 It should be emphasized that LGR provides a representation and presupposes an analysis, i.e. the linguist must first carry out and bear responsibility for the analysis. If, for instance, the linguist doubts that Arabic nunation (the last morpheme in (12)) is adequately characterized as indefiniteness (after all, it occurs with inherently definite proper names like ħusajn-u-n ‘Husein’, here with the nominative suffix -u), then criticism is appropriately directed at the analysis underlying (12), not at LGR. Note that where a morpheme is a clitic rather than an affix, it is appropriately set off from the rest of the word, both in the object language and in the gloss, by means of an equal sign.

Things are less straightforward when morphology is non-concatenative, which of course frequently arises in analyzing Afroasiatic languages, with the extreme case being so-called root-and-pattern morphology, where the exponence of lexical and grammatical categories is inextricably intertwined. My aim here is not to go into all the details of LGR, since the rules are readily available and hopefully clearly formulated, but rather to give particular examples illustrating especially those notations that have proven useful in the CorpAfroAs project. Where it is impossible or undesirable to segment a word or part of a word in the object language, then the constituent morphemes can be listed in the gloss separated by a period. Thus, a radical word-and-paradigm analysis of the verb form raʔajtu in (7)–(8), which is first person singular of the perfective of the verb ‘see’, might simply be as in (13):

(13) raʔajtu
see.pfv.1sg3

But the criterion of “desirability” clearly leaves a certain amount of leeway, and the linguist might prefer at least to segment off the person-number marker, as in (14):

(14) raʔaj-tu
see.pfv-1sg

While the period is the default means of indicating nonsegmentability, LGR allows numerous options for more specific representations, and some of these have been put to good use in CorpAfroAs.

2.  LGR provides a set of abbreviations for only a limited number of grammatical categories, based on an appraisal of those most frequently used in the linguistic literature, leaving it up to the individual linguist to create abbreviations for other grammatical categories and provide a list thereof. In projects like CorpAfroAs it is of course necessary to have a homogeneous list of abbreviations across the whole project, including those not included in LGR.
3.  Strictly speaking, one might argue that “1sg” actually conflates two grammatical categories, person and number, and should therefore more properly be written “1.sg”. However, even setting aside theoretical arguments over whether the so-called plural of a speech act participant pronoun really is its literal plural, the conflation of person and number into a single formant is so frequent cross-linguistically that the unified representation “1sg” is recommended by LGR (Rule 5).


Thus, LGR (Rule 4D) allows use of a backslash to separate the representation of a grammatical category that is indicated by means of a morphophonemic alternation. The gloss with the backslash may precede or follow the gloss it qualifies, according to convenience. In Standard Arabic, the difference between the perfective and imperfective stems of a verb can be treated as a morphophonemic alternation, and thus an even more explicit glossing of the verb in (7)–(8) would be as in (15). (15) raʔaj-tu see\pfv-1sg
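The requirement that hyphens (and equal signs for clitics) in the gloss line mirror those in the object language line lends itself to a mechanical check. The Python sketch below is an illustration only; the function name `align` is hypothetical and nothing here is prescribed by LGR itself. It pairs each morph with its gloss and rejects a word whose boundary symbols do not match, while leaving periods and backslashes inside a gloss untouched, since these do not mark segmentable boundaries.

```python
import re

def align(obj_word, gloss_word):
    """Pair each morph of a segmented word with its gloss.

    Raises ValueError when the two strings do not contain the same
    sequence of boundary symbols (hyphen for affixes, equals for clitics).
    """
    boundaries = re.compile(r"[-=]")
    if boundaries.findall(obj_word) != boundaries.findall(gloss_word):
        raise ValueError(f"boundary mismatch: {obj_word!r} vs {gloss_word!r}")
    return list(zip(boundaries.split(obj_word), boundaries.split(gloss_word)))

print(align("walad-a-n", "boy-acc-indf"))   # example (12)
print(align("raʔaj-tu", r"see\pfv-1sg"))    # example (15)
```

Under this check the unsegmented analysis in (13) is also well-formed, because neither line contains a boundary symbol, whereas pairing the segmented form raʔaj-tu with the unsegmented gloss of (13) would be rejected.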

Infixation occurs in a number of Afroasiatic languages, and LGR provides a way of representing it (Rule 9), namely by surrounding the infix by angled brackets in the object language and then putting the gloss, also surrounded by angled brackets, before or after the morpheme into which it is infixed, as convenient. Thus, the VIII form of the verb in Arabic might be glossed (partially)4 as in (16). (16) ʤ<t>amaʕa <viii>gather

LGR also provides a means of notating reduplication, as in (17) from Hebrew (Rule 10). In the object language, the copied element is separated by a tilde, likewise in the gloss, with the reduplicated element of the object language corresponding to the grammatical category or other effect of the reduplication in the gloss. In (17), the reduplicated element is rak, and this kind of reduplication expresses attenuation (att = attenuative); the final m.pl is ‘masculine plural’. (17) jerak~rak-im green~att-m.pl ‘greenish ones’

Another notational device provides a specific way of glossing non-overt elements, such as the fact that Maltese kiteb, without any overt indication of person-number, is specifically third person singular masculine. Square brackets can be used to enclose the gloss corresponding to the absence of overt exponence in the object language (Rule 6),5 as in (18).

4.  I have not attempted to characterize form VIII any further than to gloss it as “viii”. The gloss in (16) does not include tense-aspect-mood or person-number. 5.  Rule 6 provides an alternative, writing the non-overt category as a zero morph in the object language and then glossing concatenatively, i.e. kiteb-Ø and “pfv\write-3sg.m”. In practice, this alternative seems to be generally avoided by more empirically and functionally oriented linguists.




(18) kiteb write\pfv[3sg.m] ‘he wrote’
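The delimiters introduced so far (backslash, angled brackets, tilde and square brackets) can be read off a gloss mechanically, which is what makes such glosses searchable. The following Python sketch decomposes the gloss of a single morph into the pieces these symbols set apart. The function `analyse_gloss` and the keys of the dictionary it returns are hypothetical names chosen for illustration, and periods are deliberately left alone, so that a bundle such as “3sg.m” stays in one piece.

```python
import re

def analyse_gloss(gloss):
    """Decompose the gloss of one morph (no hyphens or equals signs)."""
    non_overt = re.findall(r"\[([^\]]+)\]", gloss)   # Rule 6: [3sg.m]
    rest = re.sub(r"\[[^\]]+\]", "", gloss)
    infixes = re.findall(r"<([^>]+)>", rest)         # Rule 9: <...>
    rest = re.sub(r"<[^>]+>", "", rest)
    first, *reduplication = rest.split("~")          # Rule 10: tilde
    parts = first.split("\\")                        # Rule 4D: backslash
    return {
        "parts": parts,
        "reduplication": reduplication,
        "infix": infixes,
        "non_overt": non_overt,
    }

print(analyse_gloss(r"write\pfv[3sg.m]"))   # gloss of kiteb in (18)
print(analyse_gloss("green~att"))           # first element of (17)
```

For the gloss in (18) the sketch separates the lexical gloss and the alternation-marked category from the non-overt person-number-gender bundle. Since the backslash gloss may precede or follow the gloss it qualifies, the order within "parts" is simply the order found in the input.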

Especially in cases where a single lexical morpheme in the object language requires two or more words in translation into the meta-language, one might reasonably argue that this is not really conflation of two morphemes into a single formant in the object language, but rather a lexical gap in the metalanguage — or at least that the analysis should not be biased either way. In such cases, one can of course use the general notation of the period between the two words in the gloss, but LGR also allows a more specific solution of using an underline (Rule 4A). Thus, the two Arabic translations of ‘uncle’ might be glossed as in (19). (19) ʕamm xaːl paternal_uncle maternal_uncle

On the basis of the above, an alternative representation of (7) would be as in (20) (where rel = relative), giving one possible glossing that reflects the morphological categories of the sentence. Note the convention of left-aligning the beginnings of words. (20) raʔaj-tu l-walad-a llaðiː tarak-a ʔab-aː-hu see\pfv-1sg the-boy-acc rel.m.sg leave\pfv-3sg.m father-acc-3sg.m.poss ‘I saw the boy who had left his father.’
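The convention of left-aligning the beginnings of words is easy to lose when examples are stored or transmitted as running text, but it can be restored automatically as long as the object line and the gloss line contain the same number of words. The Python sketch below pads each word to the width of the longer member of its pair; the function name `interlinear` is hypothetical. One caveat is stated as an assumption: widths are counted in code points, so transcriptions that use combining diacritics may come out slightly misaligned.

```python
def interlinear(obj_line, gloss_line, gap=2):
    """Return the two lines with word beginnings left-aligned."""
    obj, gloss = obj_line.split(), gloss_line.split()
    if len(obj) != len(gloss):
        raise ValueError("object and gloss lines differ in number of words")
    # Column width = longer of the two items (counted in code points).
    widths = [max(len(o), len(g)) for o, g in zip(obj, gloss)]
    sep = " " * gap
    top = sep.join(o.ljust(w) for o, w in zip(obj, widths)).rstrip()
    bottom = sep.join(g.ljust(w) for g, w in zip(gloss, widths)).rstrip()
    return top + "\n" + bottom

# Object and gloss lines of example (20).
obj_20 = "raʔaj-tu l-walad-a llaðiː tarak-a ʔab-aː-hu"
gloss_20 = r"see\pfv-1sg the-boy-acc rel.m.sg leave\pfv-3sg.m father-acc-3sg.m.poss"
print(interlinear(obj_20, gloss_20))
```

The free translation is not part of the alignment and would simply be printed on a third line.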

Although glossing, as with LGR, provides an enormous advance in the ability to present an analysis of the object language understandable to a linguist unfamiliar with the language, there are still some gaps, some of which will be covered by the extensions provided in CorpAfroAs in the GE and RX lines, to be discussed in Section 2. In addition, there are others that are not specifically addressed by CorpAfroAs since they are not particularly germane to its aims. For instance, the kind of ambiguity illustrated in (1) is not representable using the LGR conventions, as can be seen equally in a comparable Arabic example with full glossing in (21).

(21) fiː bajt-i r-rajul-i l-kabiːr-i
in house-gen the-man-gen the-big-gen
‘in the big (old) man’s house’, or ‘in the man’s big house’

The genitive adjective in (21) can be construed as an attribute either of ‘the man’ (in the genitive as possessor) or of ‘the house’ (in the genitive because it is governed by the preposition). More generally, while LGR provides a viable means of presenting the morphological analysis of an object-language example, it is not

214 Bernard Comrie

designed to provide a syntactic analysis. An LGR glossing of an English sentence like the farmer killed the duckling will not, given the lack of case marking or verb agreement, enable the user to identify subject/agent or object/patient, although the translation into the metalanguage will give such basic information. In the transition to the discussion of GE and RX lines of CorpAfroAs in Section 2, two further points may be noted. First, LGR does provide a means of annotating inherent categories, such as the gender of a noun, by using parentheses (round brackets), as in (22), where the information that the noun riʤl ‘foot’ is feminine in Arabic, though lacking the usual -a(t) suffix of feminine nouns, is important for understanding why the attributive adjective is also feminine.

(22) riʤl-u-n          ṣaɣiːr-at-u-n
     foot(f)-nom-indf  small-f-nom-indf
     ‘a small foot’

Here, CorpAfroAs has a more general solution, exploiting the use of several lines (tiers) of analysis, something that is not available in LGR. Second, LGR does allow the creation of “negative” categories, such as “non-singular”, “non-past”, by prefixing an “N”, i.e. “NSG” and “NPST”, but this can lead to confusion where there are other abbreviations that happen to start with “N”; in the particular case of “NSG”, this is also sanctioned as an abbreviation for ‘neuter singular’ in a language with frequent gender-number combinations (Rule 5A). Here, CorpAfroAs adopts a clearer notation by using lower-case “n”, i.e. “nSG”, “nPST”; the other upper-case letters make it clear that this is a grammatical category, while the lower-case initial is clearly distinguishable from any letters that form part of the base grammatical category label. This is a useful addition that might well be incorporated into future versions of LGR.

2. CorpAfroAs: From glossing to the GE and RX lines

While glossing following LGR typically uses only three lines, as in the examples in Section 1, namely the object language, the glossing, and the free translation, CorpAfroAs uses at least six lines (or tiers) in order to represent an utterance, namely (from top to bottom): TX, MOT, MB, GE, RX, FT. Of these, FT corresponds more or less to the LGR free translation line, with the exception that translation is by intonation groups rather than by sentences. In languages with, for instance, constituent orders different from English, this can mean that this line sometimes has to be split into two, so that the free translations of the individual intonation groups can in turn be joined together into a grammatical English sentence; this is done in the MFT line. However, since this does




not concern glossing or retrievability (see below) as such, we will have nothing further to say about it here. The first three lines (TX, MOT, MB) all correspond to the LGR object language line, but reflect different levels of analysis. MB is essentially the same as the object line as illustrated in Section 1, with hyphens indicating morpheme boundaries and equal signs indicating clitic boundaries. MOT is a stage of analysis before the identification of such boundaries, corresponding roughly to a phonological transcription. In fact, the possibility of two levels of representation of the object language corresponding to MOT and MB is envisaged by LGR; this point is mentioned in Rule 2. The line TX reflects the oral nature of the CorpAfroAs corpus, and is a broad phonetic transcription. Again, issues of glossing and retrievability do not arise, so these three lines will also not be discussed further. This leaves the two lines GE and RX. Before discussing them in further detail, it is necessary to consider the different requirements of glossing as presented in Section 1 and a corpus project such as CorpAfroAs. As noted in Section 1, the main purpose of glossing as used, for instance, in linguistic typology is to enable the reader to understand the morphological structure of elements in the object language, in order to see at least some of the principles that govern how the original text comes to mean what it does. By contrast, one of the main aims of establishing a corpus is to enable searches to be made, so that one can, for instance, search for all instances of a particular lexical item, or all instances of a particular grammatical category. This means that the glossing possibilities of LGR need to be enriched in order to achieve this more encompassing aim. GE corresponds more or less to the glossing line as illustrated in Section 1, but needs in certain respects to be more explicit, or perhaps more accurately to ensure that it is fully explicit.
As has often been noted, even best practice in glossing for typological purposes has typically adopted the practice that where there is a clear markedness relation between two category values, especially if the unmarked versus marked distinction correlates with absence versus presence respectively of an overt indicator of the grammatical category in question, then only the marked value is glossed. A simple example would be the distinction between the singular and plural in the two English words in (23).

(23) cat  cat-s
     cat  cat-pl

Similarly, one might overtly gloss subjunctive but not indicative, in particular if there is no overt segmentable morph identifiable as the exponent of “indicative”. In glossing, this is usually no problem, since linguists are sufficiently familiar with


the convention.6 A similar example can be provided by the number (singular, plural) and gender (masculine, feminine) forms of a regular Arabic adjective (with a sound masculine plural), as in example (24), where the first form is masculine singular, the second feminine singular.7

(24) ɣaːʔib  ɣaːʔib-at  ɣaːʔib-uːna  ɣaːʔib-aːt
     absent  absent-f   absent-m.pl  absent-f.pl

The first two forms in (24) are both singular, but are not explicitly marked as such. This has the effect that one can easily do a search for “plural” in examples glossed as in (24), but it is difficult if not impossible to do a search on “singular”; certainly, one cannot directly search for this category value. Moreover, in the singular the unmarked masculine is not explicitly indicated as such, although the marked feminine is. In the plural, the suffixes -uːna and -aːt are both portmanteau morphs combining the expression of plural number and (masculine or feminine) gender, and are thus conventionally glossed in a way that expresses both number and gender overtly. Overall, this has the strange result that one can readily retrieve plural or feminine, but that a search on “masculine” will yield masculine plurals but not masculine singulars! While this feature of best-practice glossing in typology, incorporated into LGR, does keep glosses shorter and more readable, it is not adequate as a basis for retrieving categories or category values. Within CorpAfroAs, a gloss along the lines of (25) would be essential.

(25) ɣaːʔib        ɣaːʔib-at    ɣaːʔib-uːna  ɣaːʔib-aːt
     absent[m.sg]  absent-f.sg  absent-m.pl  absent-f.pl
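The retrieval asymmetry between (24) and (25) can be made concrete with a small sketch. It is only an illustration, not the CorpAfroAs search engine: the function names are hypothetical, and the parsing assumes that the lexical gloss comes first in each word gloss and that category labels are separated by hyphens, periods or square brackets.

```python
import re

def categories(gloss):
    """Return the category labels in one word gloss.

    Assumption of this sketch: the first element is the lexical gloss,
    and everything after it (split on '-', '.', '[' and ']') is a
    grammatical category label.
    """
    parts = [p for p in re.split(r"[-.\[\]]", gloss) if p]
    return set(parts[1:])

def search(forms, glosses, label):
    """Return the object-language forms whose gloss contains the label."""
    return [f for f, g in zip(forms, glosses) if label in categories(g)]

forms = ["ɣaːʔib", "ɣaːʔib-at", "ɣaːʔib-uːna", "ɣaːʔib-aːt"]
gloss_24 = ["absent", "absent-f", "absent-m.pl", "absent-f.pl"]
gloss_25 = ["absent[m.sg]", "absent-f.sg", "absent-m.pl", "absent-f.pl"]

masc_24 = search(forms, gloss_24, "m")   # only the masculine plural
masc_25 = search(forms, gloss_25, "m")   # masculine singular and plural
sg_24 = search(forms, gloss_24, "sg")    # nothing can be retrieved
sg_25 = search(forms, gloss_25, "sg")    # both singular forms
```

With the glossing of (24), the search for “m” returns only the plural form and the search for “sg” returns nothing; with the fully explicit glossing of (25), both searches return every relevant form.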

Thus, the GE line corresponds closely to the glossing line in LGR, but needs to be maximally explicit. The GE line in this way enables retrieval of all the morphological categories of the language, while the MB line allows retrieval of all lexical items (through the use of forms of lexical items stripped of morphology in the object language

6.  There are, however, potential problems. Thus, the glossing of English cat in (23) is in appearance no different from that of Japanese neko, although Japanese does not have an obligatory number distinction in nouns, and neko is number-neutral (translating as ‘cat’ or ‘cats’ according to context) rather than specifically singular. The consumer of glossed typological examples needs to be aware of such distinctions. One might argue that the really best practice in glossing even for typological purposes should overtly gloss all category values implied by a word, although this would have the effect that glosses grow even longer than they currently are. 7.  For ease of presentation, case and nunation suffixes are omitted, likewise the fact that -uːna is specifically nominative (versus accusative-genitive -iːna).




representation)8; this is one of the advantages of separate MOT and MB lines. However, there are still many other categories that one might want to search for but which do not form part of the GE line. For these, the RX line has been created. The RX line is essentially a repository of all other grammatical information that is not included in the GE line. This is a very broad characterization, and as has been noted in the practice of CorpAfroAs this leaves the set of such information somewhat open-ended, depending on the state of the art in our investigation of particular linguistic phenomena, and on developing demands made by researchers who want to exploit retrievability in corpora. What we can reasonably do here is to indicate some of the information that can profitably be included in RX, with special reference to CorpAfroAs. One such item is the part of speech (word class) of words. A reasonable demand of a corpus is to be able to retrieve, for instance, all verbs, e.g. as the initial stage of a corpus-based study of verb morphology. Since most languages do not have a morpheme that identifies all and only verbs, LGR provides no assistance here. Part of speech must, therefore, be identified in the RX line. Only if a language actually has a morpheme that identifies all and only members of a particular word class would the situation be treated differently. I am not aware of natural languages with this property, but in Esperanto all and only nouns have the suffix -o, all and only adjectives have the suffix -a, so presumably this information will be included in the GE line (and in a conventional LGR glossing) for Esperanto, as in (26).

(26) la   brun-a     hund-o
     the  brown-adj  dog-n
     ‘the brown dog’
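The division of labour between the GE and RX lines for part-of-speech retrieval can be sketched as follows. The words and glosses are those of example (20); the RX values (“v”, “n”, “rel”), the class name and the function name are assumptions made for this illustration and are not taken from the corpus itself.

```python
from dataclasses import dataclass

@dataclass
class Word:
    mb: str  # MB tier: object-language form with morpheme boundaries
    ge: str  # GE tier: morphological gloss
    rx: str  # RX tier: here only a part-of-speech label (assumed values)

# Example (20), with hypothetical RX labels added for illustration.
words = [
    Word("raʔaj-tu", r"see\pfv-1sg", "v"),
    Word("l-walad-a", "the-boy-acc", "n"),
    Word("llaðiː", "rel.m.sg", "rel"),
    Word("tarak-a", r"leave\pfv-3sg.m", "v"),
    Word("ʔab-aː-hu", "father-acc-3sg.m.poss", "n"),
]

def by_rx(words, label):
    """Return the MB forms of all words carrying the given RX label."""
    return [w.mb for w in words if w.rx == label]

verbs = by_rx(words, "v")
nouns = by_rx(words, "n")
```

Nothing in the GE strings identifies raʔaj-tu and tarak-a as verbs as such; the query succeeds only because the word class is stored on a separate tier.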

While identification of parts of speech is perhaps the most obvious use of RX, it can also be used for identifying other classes. Thus, while the GE line identifies individual tense-aspect-mood categories and category values such as “perfective” in (20), each of these will be identified as “tam” in the RX line. The RX line thus provides a general solution for grouping things together that are identified only individually in the GE line, for instance all affixes can be identified as “affx”, or all suffixes as “sufx”. Another important use of the RX line is to identify grammatical categories other than part of speech that are covert in the given language, or in the given item, i.e. which do not find expression in any overt morphological marker of the relevant item in the object language. An obvious example here would be gender.

8.  However, homonyms (homographs) would still be problematic.


Of course, if gender is indicated morphologically, as with the -at suffix in Arabic,9 then it will be glossed in the GE line, as in (27), where the second example is clearly related to masculine malik ‘king’ (here, again, case and nunation are omitted).

(27) kalim-at  malik-at
     word-f    king-f
     ‘word’    ‘queen’

But for words like ʔumm ‘mother’ and riʤl ‘foot’, which are feminine but do not contain any morpheme or morphological process that can be identified as indicating feminine gender, the RX line is the appropriate place to express this information. The same applies, incidentally, to inflectional classes, for instance the fact that the Arabic male given name Sulajmaːn is a diptote (with accusative = genitive), in contrast to the triptote (accusative ≠ genitive) Zajd. It was noted in Section 1 (see the discussion of (22)) that LGR does provide a means of including information on covert categories, one that is surely sufficient for typological purposes; the RX line, however, forms part of a more general solution that fits in well with the overall aim of CorpAfroAs to enable retrievability of all grammatical categories. A further important type of information that is included in the RX line is grammatical relations, such as subject (sbj) and object (obj). Thus, the RX line for example (20) would minimally identify walad ‘boy’ as being a noun (n), as being of masculine gender (m), and as being object (obj) of its clause. The open-endedness of the RX line means that it can also be put to corpus-specific uses. Thus, the texts that comprise the CorpAfroAs corpus include frequent code-switching, and such switching from the base language can be indicated in the RX line, e.g. by using “csw.fra” to indicate code-switching by a speaker of Moroccan Arabic into French. (Here, the first part of the abbreviation identifies code-switching in general, the second part the target language of the switch in the particular case at issue.) The domain of the RX line thus transcends grammar.

3. Conclusion

The kind of glossing represented by LGR is adequate for the purposes of linguistic typology, namely to make the morphological structure of an example sufficiently explicit so that readers unfamiliar with the object language can understand how it comes to mean what it means. It is, however, insufficient for a corpus, where the retrievability of information on grammatical categories is crucial, and here the RX line that characterizes the CorpAfroAs corpus is essential.

9.  At least assuming that one accepts this analysis of -at, despite the existence of the handful of words like xaliːf-at ‘caliph’ which include this suffix but are masculine. It cannot be emphasized enough that the GE line can only represent analyses; it is not a way of deciding among competing analyses.

Cross-linguistic comparability in CorpAfroAs
Amina Mettouchi, Graziano Savà and Mauro Tosco
LLCAN, Paris / LLCAN CNRS / University of Turin

One of the aims of CorpAfroAs is to allow queries within and across the language samples composing the corpus. Through the study of phenomena represented in several languages of the corpus (directional morphemes, case, and gender) we show that CorpAfroAs indeed allows the retrieval of a body of data amenable to cross-linguistic comparison, within the Afroasiatic phylum and beyond. However, given the annotation scheme of the corpus, the retrieval of relevant data has to rely on information given in the accompanying grammatical sketches.

Introduction

When the CorpAfroAs project was submitted in 2006, one of the aims underlying the creation of a corpus composed of several single-language corpora within AfroAsiatic was to provide a basis for cross-linguistic comparison. In order to provide such comparable annotations, homogenization was necessary because descriptive traditions diverged a lot in their terminology and their perspective (see Barontini et al. this volume), not to mention the variation linked to the language in which the analysis was previously conducted by members of the project (in our case French, Italian, Spanish, English, and Hebrew). The annotations chosen in CorpAfroAs are based on form, and they are language-internal in the sense that categories are defined within each language and are not comparative in essence (for the distinction between the two types of categories see Lazard (1975), Comrie (1979), Bybee (1985), Haspelmath (2010), among others). Only morphosyntactic information is provided in the first annotation line, ge, while other types of information (semantic or morphological verb class, syncretism, etc.) are given on the second annotation line, rx. The basis of the morphosyntactic annotation is a form/function pairing, where a form coding a function, regardless of its many contextual interpretations, is always annotated in the same way. For instance, the s- derivation in Berber is consistently annotated as

doi 10.1075/scl.68.07met 2015 © John Benjamins Publishing Company


Causative (CAUS) despite the fact that it often has a transitivizing function when applied to an intransitive verb, and is sometimes used to derive a verb of sound from onomatopoeia. The same is true for lexical items: the same verb, whatever its contextual interpretations, is annotated in the same way. For instance, in Kabyle, the verb xdəm is always annotated as ‘make’, even if in some contexts it can be translated as ‘work’ (physical activity or employment). This allows the verification of hypotheses that may emerge in the study of corpora: for instance, is the interpretation of the lexical item as ‘work’ limited to intransitive uses of the verb xdəm? An automatic search involving the retrieval of the structures containing this verb shows that this is indeed the case. One of the assumptions underlying the annotation process in CorpAfroAs was that there is some degree of resemblance between a language-internal category and a comparative one (cf. Haspelmath 2010 among others). Thus, Perfective in language A is basically comparable with Perfective in language B, regardless of the fact that Perfective in a language that only has a binary opposition with Imperfective does not have the same properties as Perfective in a ternary system also involving an aorist for instance. The effect of this assumption is that retrieval of bodies of data for the verification of hypotheses is conducted directly on the corpus, through a search interface on the website that allows complex queries based on labels (available as a list of glosses and their abbreviations; see the paper by Chanard in the present volume). For instance, it is possible to retrieve all the negative clauses containing a Perfective, in all the languages of the corpus that have the category Perfective and Negation, by using the abbreviations NEG and PFV. Indeed, homogenization was necessary, but not sufficient to conduct an informed cross-linguistic study.
Relying only on labels may lead to ineffective searches in the corpus: for instance subject in Kabyle is a bound pronoun, whereas in Beja it is sometimes a noun, sometimes a nominal extension, sometimes a pronoun. Moreover, without indications about the criteria used for subjecthood assignment, it is difficult to consider a priori that we are dealing with the same category. Comparing Subject in the two languages cannot be done without the preliminary examination of the way this category has been used by the annotators of the various single-language corpora. This is why we decided to provide an accompanying grammatical sketch for each language, in which the labels used by each linguist of the project are defined: in each sketch, a complete list of labels is provided, and information on the definition of most glosses1 is given.
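The kind of label-based query described above can be sketched in a few lines. This is not the CorpAfroAs search interface; it is a minimal illustration under stated assumptions. The ge strings and identifiers are copied from examples (2), (3), (4) and (6) of this chapter, the labels PFV and PROX are used in place of NEG and PFV simply because they occur in those examples, and the function names and the set of separator characters are hypothetical.

```python
import re

# (identifier, ge tier) pairs copied from examples (2), (3), (4) and (6).
units = [
    ("SAY_BC_CONV_01_SP2_171", r"3SG.FUT return =3S.OBJ -CTP /"),
    ("KAB_AM_NARR_01_0003",
     r"POT =ABSV2SG.F =PROX bring\AOR -SBJ1SG tale\ABS.SG.F /"),
    ("KAB_AM_NARR_02_028", r"SBJ3SG.M- grasp\PFV =PROX apple\ABS.SG.F /"),
    ("TAQ_CL_NARR_01_026",
     r"3SG.M- squeeze\PFV =ACC.3SG.M PROX night\ANN.SG.M /"),
]

def labels(ge):
    """Split a ge string into its individual labels and lexical glosses."""
    return {t for t in re.split(r"[\s\\=.:/\[\]()-]+", ge) if t}

def find_units(units, required):
    """Return identifiers of units whose ge tier contains every label."""
    required = set(required)
    return [uid for uid, ge in units if required <= labels(ge)]

def by_language(ids):
    """Group identifiers by their leading language code."""
    grouped = {}
    for uid in ids:
        grouped.setdefault(uid.split("_")[0], []).append(uid)
    return grouped

hits = find_units(units, {"PFV", "PROX"})
grouped = by_language(hits)
```

Matching whole labels rather than substrings is a deliberate choice in this sketch: a substring search for PFV would also match a label such as NEGPFV. Whether two hits from different languages instantiate the same category still has to be checked against the grammatical sketches, as argued above.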

1.  The complete list of all glosses used in the various languages composing the corpus is available on the project’s website http://dx.doi.org/10.1075/scl.68.website.




The use of corpora for cross-linguistic comparison is thus mediated through a grammatical description (and possibly several, since the end-user can also use other sources before searching the corpus data). This paper illustrates the potential for cross-linguistic comparison of the CorpAfroAs corpus, through examples of searches concerning three phenomena: directional morphemes, case and gender. Each study is based on automatic searches in the corpus, after prior analysis of information given in the corresponding grammatical sketch, and some grammars of the languages under consideration. Those searches can be replicated by accessing the online corpus at the following address: http://dx.doi.org/10.1075/scl.68.website.

1. Directional verbal extensions in Chadic, Berber and Cushitic

Some Afroasiatic languages have grammaticalized a system of bound morphemes that originally indicate directionality of the movement denoted by the verb. Often, those morphemes are used for all kinds of verbs, and their meaning is extended to such notions as benefit for the speaker, or resultativity (Mettouchi 1997 for Western Kabyle (Berber)) or to affected argument, non-controlling argument, or point of view of the predicate (Frajzyngier 2012b for Wandala (Chadic)). The following description aims to show how data from CorpAfroAs can be the basis for a cross-linguistic study of those directional elements.

1.1 Distribution

Six languages of the corpus have such directional morphemes: Hausa, Zaar, Tamasheq, Kabyle, Gawwada, and Ts’amakko. The Hausa Ventive morpheme is glossed DIR (Directional) in the corpus, and corresponds to verb class 6 (glossed V6 in ge). This is the Grade 6 conjugation of Newman (2000). It “indicates action in the direction of, or for the benefit of the speaker” (Caron 2012).


(1) an saːmoː tà neː2
    an saːmoː tà nèː
    4.PFV.NFOC get.DIR 3SG.F COP1.NFOC3
    PNG.TAM V6 PRO.OBJ PTCL.SYNT
    “We got it” (HAU_BC_CONV_02_SP2_260)

Zaar has the suffix -ɗi, which attaches to pronouns or verb complexes, and is glossed as CTP (Centripetal) in ge and PTCL (particle) in rx.

(2) wò sutə́ɗi /
    wò su =tə -ɗi /
    3SG.FUT return =3S.OBJ -CTP /
    PNG.TAM V =PRO -PTCL /
    “He will come back” (SAY_BC_CONV_01_SP2_171)

In Western Kabyle there are two clitics, Proximal =dd (glossed PROX in ge and PTCL in rx) and Distal =n (glossed DIST in ge and PTCL in rx), which attach to verbs of all kinds (not only motion verbs) and, like pronominal clitics, climb to Mood-Aspect-Negation particles, or relativizers.4

(3) amidawiɣ θamaʃaɦut͡s /
    ad =am =dd awi -ɣ tamaʃaɦuƫ /
    POT =ABSV2SG.F =PROX bring\AOR -SBJ1SG tale\ABS.SG.F /
    PTCL PRO PTCL V14 PRO N.OV /
    “I will offer you a tale” (KAB_AM_NARR_01_0003)

(4) jəddməttaʦəffaħt /
    i- ddəm =dd taƫəffaħt /
    SBJ3SG.M- grasp\PFV =PROX apple\ABS.SG.F /
    PRO- V23 =PTCL N.OV /
    “He took an apple” (KAB_AM_NARR_02_028)

(5) antrˤuħ arʃʃixiw /
    ad =n t- ṛuħ ɣr ʃʃix -iw /
    POT DIST SBJ3SG.F- go\AOR to teacher\ANN.SG.M -POSS1SG /
    PTCL PTCL PRO V24 PREP N.COV PRO /
    “She would go to my teacher” (KAB_AM_NARR_03_0478)

2.  Examples have the following layout: the first line contains a phonetic transcription with prosodic words; the second line contains a morphophonological transcription involving grammatical words with morpheme breaks; the third line, named ge, is the morphosyntactic glossing tier; the fourth line, named rx, contains information about parts of speech, syntax, semantics, etc. The translation is followed by the identifier of the example within the corpus. This identifier always has the same syntax: ISO code of the language, initials of the author, genre (conversation or narration), number of the file, speaker (if more than one speaker is involved), number of the intonation unit in the file. Single or double slashes signal a prosodic boundary, non-terminal (/) or terminal (//). See the general introduction to the volume for more details.
3.  The list of abbreviations is available at http://dx.doi.org/10.1075/scl.68.website. It is an expanded version of the Leipzig Glossing Rules, and its extension has been supervised by Bernard Comrie within the CorpAfroAs project (see the Introduction in this volume for more details).
4.  Clitic climbing in Kabyle and Tamasheq is obligatory in front of Mood-Aspect-Negation particles, relativizers and some conjunctions.
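The fixed syntax of the example identifiers (language code, author initials, genre, file number, optional speaker, intonation unit number) lends itself to mechanical parsing. The following is a minimal sketch; the regular expression is an assumption inferred from the identifiers printed in this chapter (three-letter language code, genre written CONV or NARR, speaker written SP plus a digit), and the function name is hypothetical.

```python
import re

# Pattern inferred from identifiers such as HAU_BC_CONV_02_SP2_260 and
# KAB_AM_NARR_01_0003; the exact field shapes are an assumption.
ID_PATTERN = re.compile(
    r"(?P<iso>[A-Z]{3})_"          # ISO code of the language
    r"(?P<author>[A-Z]+)_"         # initials of the author
    r"(?P<genre>CONV|NARR)_"       # conversation or narration
    r"(?P<file>\d+)_"              # number of the file
    r"(?:(?P<speaker>SP\d+)_)?"    # speaker, only in multi-speaker files
    r"(?P<unit>\d+)"               # number of the intonation unit
)

def parse_example_id(identifier):
    """Split an example identifier into its named components."""
    match = ID_PATTERN.fullmatch(identifier)
    if match is None:
        raise ValueError(f"unrecognized identifier: {identifier!r}")
    return match.groupdict()

hausa = parse_example_id("HAU_BC_CONV_02_SP2_260")
kabyle = parse_example_id("KAB_AM_NARR_01_0003")
```

File and unit numbers are kept as strings so that the zero-padding of the original identifier is preserved; the speaker field is None when the identifier has no speaker component.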

In Tamasheq (Berber) there are two clitics, Proximal =du (glossed PROX in ge and PTCL in rx) and Distal =in (glossed DIST in ge and PTCL in rx), which attach to verbs of all kinds (not only motion verbs), and like pronominal clitics, can climb to Mood-Aspect-Negation particles, or relativizers.

(6) iḍgɐẓtid ɐhɐḍ /
    i- əḍgɐẓ =tu =du ɐhɐḍ /
    3SG.M- squeeze\PFV =ACC.3SG.M PROX night\ANN.SG.M /
    PNG V.IA1/TAM PRO PTCL N.OV /
    “The night surprised him” (TAQ_CL_NARR_01_026)

(7) uhun oṣadin /
    uhun oṣa =in /
    then arrive\PFV[3SG.M] DIST /
    CONJ V.IA10/TAM.PNG PTCL /
    “Then he went there” (TAQ_CL_NARR_02_71)

The situation in Ts’amakko and Gawwada is more complex due to the number of verbal extensions. In Ts’amakko, =na is an assertive element marking the actual existence of an entity, or reality of a fact, which appears after nouns and verbs. After verbs, it is glossed ASS in ge and V.CL in rx; =nu is a Dative or Ablative after noun phrases, and a complementizer marking a conditional clause after verbs, where it is glossed DAT in ge and CONJ.V in rx. In Gawwada, -na and -nu are decomposed into n-, glossed MOV (mover), i.e. the element to which -a, -u (and marginally -í) need to be affixed in order to act as adpositions (differently from their use with nouns), followed by either -a, glossed CFG (Centrifugal), or -u, glossed CTP (Centripetal). They can both attach to nouns and verbs. With locative nouns, -a and -u attach directly to the noun stem with no intervening =n (cf. Tosco 2012a).


1.2 Distribution and functions

The distribution of those bound morphemes is variable across the corpus. First of all, Zaar and Hausa only have one extension, the Ventive. Among the languages that have at least two extensions, there is no necessary balance between the two in terms of frequency of use. Whereas in Tamasheq the proportion between distal and proximal is roughly 40% / 60%, in Western Kabyle it is 0.1% / 99.9%.5 The difference within Berber is especially striking since the Proximal and Distal extensions are of the same diachronic origin (>*d; >*n) throughout the language family. Ts’amakko and Gawwada also share historically identical morphemes -na and -nu. In Ts’amakko the proportion between complementizer and assertive is roughly 3% / 97%, in Gawwada the proportion between centripetal and centrifugal is roughly 21% / 79%. Complementizer and Centripetal are of the same origin, as are Assertive and Centrifugal. Chadic and Berber languages tend to use the Proximal more extensively than the Distal. The latter for instance has disappeared in Eastern Kabyle dialects. The table in Frajzyngier (1987) shows that for a sample of thirty Chadic languages, all of them have Centripetal extensions, but only fourteen also have Centrifugal extensions. In Gawwada and Ts’amakko on the contrary, the Distal/Centrifugal is used more extensively than the Proximal/Centripetal. In Western Kabyle, the Centrifugal extension is used in a limited number of contexts:

(8) innajas lliʦin ðinəβgirˤəppwi /
    i- nna =jas lli =ț =in d inəbgi n ṛbbi /
    SBJ3SG.M- say\PFV =DAT3SG open\AOR(IMP2SG) =ABSV3SG.F =DIST COP guest\ABS.SG.M GEN god /
    “He said ‘open it (the door), I’m (lit. it is) a beggar’.” (KAB_AM_NARR_01_0677)

(9) aːːːʕmar sərsijin /
    a aʕmar sərs =iji =n /
    VOC aʕmar be_placed\CAUS.AOR.IMP2SG =ABSV1SG =DIST /
    “Amar please put me down!” (The ogress was put on a donkey by Amar) (KAB_AM_NARR_02_760)

5.  Those counts are indicative, since they are based on different amounts of data, but they correspond to the overall distribution of the two extensions in the languages under consideration.
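The counting step behind such proportions can be sketched as follows. The ge strings are copied from examples (3) to (7) of this chapter purely to show the mechanism; five intonation units are far too few to reproduce the proportions reported for the full corpus, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# (language code, ge tier) pairs copied from examples (3) to (7).
ge_lines = [
    ("KAB", r"POT =ABSV2SG.F =PROX bring\AOR -SBJ1SG tale\ABS.SG.F /"),
    ("KAB", r"SBJ3SG.M- grasp\PFV =PROX apple\ABS.SG.F /"),
    ("KAB", r"POT DIST SBJ3SG.F- go\AOR to teacher\ANN.SG.M -POSS1SG /"),
    ("TAQ", r"3SG.M- squeeze\PFV =ACC.3SG.M PROX night\ANN.SG.M /"),
    ("TAQ", r"then arrive\PFV[3SG.M] DIST /"),
]

def count_directionals(ge_lines, wanted=("PROX", "DIST")):
    """Count the wanted ge labels per language."""
    counts = defaultdict(Counter)
    for lang, ge in ge_lines:
        for token in ge.split():
            label = token.lstrip("=-")  # drop clitic and affix boundary marks
            if label in wanted:
                counts[lang][label] += 1
    return counts

def shares(counter):
    """Convert raw counts into proportions of the total."""
    total = sum(counter.values())
    return {label: n / total for label, n in counter.items()}

counts = count_directionals(ge_lines)
taq_shares = shares(counts["TAQ"])
```

As footnote 5 stresses, proportions obtained in this way are only indicative when the single-language corpora differ in size.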




Mettouchi (2011) proposes that the function of the Distal clitic is to indicate that the process is construed relative to the deictic center of the addressee. Distance is not at play, since in (9), Aʕmar is holding the donkey, and in (8) the door is in front of the speaker. Viewpoint is more important: the speaker could have used a proximal clitic in examples (8) and (9), thus making the command more peremptory. In both examples, the use of the Distal clitic subordinates the speaker’s viewpoint to the addressee’s, with politeness side-effects. This shows that the distinction here is not motivated by direction of a movement, but by modal viewpoint/stance. The same holds for (5), where the verb could have been used without a directional clitic. Movement towards the addressee is a possible interpretation, but politeness is also at stake in (5). Spatial directionality cannot therefore be considered as a core function since most examples involve no movement, and no spatial distance from the addressee. In Tamasheq, the distal extension is used mostly with motion verbs (‘come’, ‘arrive’, ‘go’, ‘be on the point of arriving’) as well as verbs of saying.

(10) ikkain hartin oṣa /
     i- əkka =in har=tu =in oṣa /
     3SG.M- go\PFV =DIST until=ACC.3SG.M =DIST arrive\PFV[3SG.M] /
     PNG- V.IA9/TAM =PTCL CONJ=PRO =PTCL V.IA10/TAM.PNG /
     “He went in this direction (to see it)” (TAQ_CL_NARR_01_088)

(11) ənnɐɣasin ɐṛɐt /
     ənna -ɐ =as =in ɐṛɐt /
     say\PFV -1SG =DAT.3SG =DIST thing\ABS.SG.M /
     V.IA9/TAM -PNG =PRO =PTCL N.COV /
     “I said something to him” (TAQ_CL_NARR_05_21)

The proximal extension is also used with motion verbs and verbs of saying, as well as other types of verbs.

(12) ʒɐˈrɐkkɐntɐdˈdu muˈdɐrɐn /
     ʒɐrɐkkɐt -ɐn =tɐt =du mudɐr -ɐn ʒɐrɐkkɐt /
     dig_up\PFV -3PL.M =ACC.3SG.F =PROX animal\ANN. -PL.M dig_up\PFV /
     V.XA2/TAM -PNG =PRO =PTCL N.OV -PNG V.XA2/TAM /
     “Wild animals had dug her up” (TAQ_CL_NARR_01_095)

(13) iḍgɐẓtid ɐhɐḍ /
     i- əḍgɐẓ =tu =du ɐhɐḍ /
     3SG.M- squeeze\PFV =ACC.3SG.M =PROX night\ANN.SG.M /
     PNG- V.IA1/TAM =PRO =PTCL N.OV /
     “The night surprised him” (TAQ_CL_NARR_01_026)


Unlike Western Kabyle, in Tamasheq the use of directionals with motion verbs is widespread, as well as the interpretation in terms of location of a situation close to the speaker or far from him or her. The proximal and distal meanings are still central, even though the general function of each marker is larger as is shown by their use with verbs of saying, where they involve stance, and with other types of verbs, where we find some of the dimensions noticed in Kabyle: completion, present relevance. Languages that use extensions very frequently, such as Western Kabyle, are likely to use them with a large variety of verbs, not only motion verbs. Indeed, the distribution of verbs in the Western Kabyle corpus of CorpAfroAs is consistent with findings in Mettouchi (1997), where beside motion verbs, the proximal clitic was also encountered with change of state verbs, and with verbs of saying, handling (‘take’, ‘hold’, etc.), finding, among others. Almost any verb is possible, since the proximal clitic has lost its original directional value, and more generally organizes the utterance around the deictic center of the (direct or reported) speaker or protagonist (Mettouchi 2011), with modal or aspectual dimensions as well as purely spatial ones. In example 14, we can see the use of Proximal in two contexts. One is a verb of handling with motion (‘take away’) where the Proximal clitic is motivated by the focus on completion of the action, underlined by the conjunction alamma ‘until’: it is only when the bread is taken off the shelf that the father will know that his youngest daughter is old enough to feed herself if her stepmother neglects her. The other context is negative and involves a verb that is not usually associated with a Proximal clitic. The motivation for the use of the Proximal clitic here is modal: the utterance is organized around the speaker’s viewpoint and underlines stance: it is a categorical statement, almost an oath. 
This is reinforced by the use of the Negative Perfective with future time reference, usually in contexts of solemn oaths.

(14) urdəzwiʤəɣ / alamma θəkksədd / fatˤima θuħrˤiʃθ / aɣrum əgðəkkwan //
     ur =dd zwiğ -ɣ / alamma t- kks =dd / Fatima tuħṛiʃt / aɣrum g udəkkan //
     NEG =PROX marry\NEGPFV -SBJ1SG / until SBJ3SG.F- take_away\PFV =PROX / Fatima clever / bread:ABS LOC shelf:ANN //
     PTCL =PTCL V23 -PRO / CONJ PRO- V23 =PTCL / NP ADJ / N.OV PREP N.OV //
     “I won’t marry until clever Fatima manages to take the bread from the shelf” (KAB_AM_NARR_01_0142)

The Hausa corpus also shows that motion verbs are not the only ones to be associated with Ventive extensions: beside ‘return’, ‘leave’, ‘enter’ or ‘go’, we find ‘carry’, ‘get’, ‘do’, ‘sell’, ‘take’, ‘catch’ (verbs of handling). In Zaar, the Centripetal extension is associated with motion verbs (‘return’, ‘go’, ‘arrive’, ‘leave’, ‘enter’, ‘thrust’, ‘pass by’), as well as verbs of handling (‘take’, ‘hold’, ‘bring’, ‘weave’, ‘tie’, ‘rub’, ‘dig’, ‘gather’, ‘fetch’). It is remarkable that the same semantic subsets are associated with proximal/ventive/centripetal extensions in the three languages (Hausa, Zaar, and Kabyle). (15)

mə́ ngâː ʒàɗì //
mə́ ngaː ʒà -ɗi //
1PL.AOR fetch water -CTP //
PNG.TAM V N -PTCL //
“We fetch water” (SAY_BC_CONV_02_SP1_007)

Gawwada and Ts’amakko use the Centripetal affix with various types of verbs, not necessarily motion verbs. As for the numerous Centrifugal affixes, they are mostly used with verbs of saying and telling: the proportion of affixation to verbs of saying compared to the two next most frequent verbs (‘go’ in Gawwada and Ts’amakko, ‘be there’ in Gawwada, ‘run’ in Ts’amakko) is 8 to 1 in Ts’amakko, and 3 to 1 in Gawwada.6 Other verbs used with the Centrifugal are ‘return’, ‘arrive’, ‘run’, ‘jump’ in Gawwada, and ‘arrive’, ‘tend cattle’, ‘eat’ and ‘come’ in Ts’amakko.

In Western Kabyle, contrary to Hausa where the Ventive remains attached to the verb, Proximal particles are subject to clitic climbing with Mood-Aspect-Negation preverbal particles, or relativizers, and must attach to those preverbal morphemes (this is also the case for Absolutive or Dative pronouns) (see ex. 16). As a result, the list of hits for a search involving the Proximal or Distal clitics cannot directly provide a list of associated verbs. Partial searches are necessary to recover all examples, after which the visualization of those examples makes it possible to retrieve the contextual elements at play in the interpretation of meaning: types of verbs, but also types of pronouns, presence of modal markers, types of aspect-mood used, etc. The precise study of those contexts highlights the frequent use of this Proximal clitic with Dative pronouns: 19% of clauses containing a Proximal particle (85 out of 451) also contain a Dative pronoun, the proportion of clauses with Dative pronouns in the whole corpus being 13%. (16)

aɣiddəħku səʦʦi θimuʃuɦa //
ad =aɣ =dd t-ħku səƫƫi timuʃuɦa
POT =DAT1PL =PROX SBJ3SG.F-tell\AOR grandmother\SG tale\ABS.PL.F
PTCL =PRO =PTCL PRO-V13% N.KIN N.OV
“My grandma would tell us folktales” (KAB_AM_NARR_03_0245)
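Because of clitic climbing, the Proximal clitic and the verb it relates to are not adjacent, so a search has to work at clause level. The following is a minimal sketch of that idea in Python, not the CorpAfroAs query tool itself: the `Token` structure, the function names and the rule "an rx tag starting with V is a verb" are assumptions made for illustration, and the data is example (16) typed in by hand.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    form: str  # segmented form
    ge: str    # gloss tier
    rx: str    # part-of-speech tier


def verbs_in_proximal_clauses(clause: List[Token]) -> List[str]:
    """Return the glosses of the verbs of a clause that contains a Proximal
    clitic, whatever morpheme that clitic is attached to."""
    if not any(tok.ge == "=PROX" for tok in clause):
        return []
    verbs = []
    for tok in clause:
        # A token such as t-ħku carries several hyphen-separated glosses.
        for ge_part, rx_part in zip(tok.ge.split("-"), tok.rx.split("-")):
            if rx_part.startswith("V"):  # assumed convention for verb tags
                verbs.append(ge_part)
    return verbs


def has_dative(clause: List[Token]) -> bool:
    """True if any gloss in the clause contains a Dative label."""
    return any("DAT" in tok.ge for tok in clause)


# Example (16), entered manually for illustration.
example_16 = [
    Token("ad", "POT", "PTCL"),
    Token("=aɣ", "=DAT1PL", "=PRO"),
    Token("=dd", "=PROX", "=PTCL"),
    Token("t-ħku", "SBJ3SG.F-tell\\AOR", "PRO-V13%"),
    Token("səƫƫi", "grandmother\\SG", "N.KIN"),
    Token("timuʃuɦa", "tale\\ABS.PL.F", "N.OV"),
]

print(verbs_in_proximal_clauses(example_16))
print(has_dative(example_16))
```

Working on whole clauses rather than on the word hosting the clitic is what allows the verb ‘tell’ to be retrieved here even though =dd is attached to the preverbal particle ad.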

6.  Counts are based on 100 verbs for each language.


This finding is consistent with the tendency of Zaar to have the centripetal extension attached to Benefactive markers (7% of clauses containing a centripetal extension also contain a Benefactive marker, the proportion of clauses with Benefactive markers in the whole corpus being 1%). (17)

àː lə́ːrmí mə́nɗi ʧiɣə́j /
àː lə́ːrmí mə́nɗi ʧikə́j
àː lə́ːr =mí mə́n -ɗi ʧík -íː
3SG.PFV bring =1PL.OBJ BEN -CTP thus -RES
PNG.TAM V =PRO PTCL -PTCL ADV -ASP
“as he brought [him] to us like this” (SAY_BC_CONV_02_SP2_044)

Finally, in Zaar, we notice the regular association of Resultative (glossed RES in ge) and the Centripetal extension:7 14% of clauses containing a centripetal extension also contain a resultative marker, the proportion of clauses with Resultative markers in the whole corpus being 10%. This may suggest that, as in Western Kabyle, movement towards the deictic center of the speaker can be associated with Completed or Perfect aspects, or the attainment of a goal. (18)

ŋgwôɣŋ tùlíːɗi /
ngôkn tùlíːɗi
ngôkn tul -íː -ɗi
he_goat arrive -RES -CTP
N V -ASP -PTCL.EXT
“He-goat arrived” (SAY_BC_NARR_03_SP1_653)
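The co-occurrence figures quoted above for Western Kabyle and Zaar all compare a percentage inside clauses with a directional marker to the corpus-wide baseline. The sketch below simply recomputes that comparison as a ratio; the function names are hypothetical, the ratio is an illustrative measure rather than the authors' own, and it is not a significance test.

```python
def percentage(part: int, whole: int) -> float:
    """Share of `part` in `whole`, in percent."""
    return 100.0 * part / whole


def ratio_to_baseline(cooccurrence_pct: float, baseline_pct: float) -> float:
    """How many times more frequent a marker is in the subset than overall."""
    return cooccurrence_pct / baseline_pct


# Figures taken from the text: (percentage in clauses with the directional
# marker, percentage in the whole corpus).
kabyle_pct = percentage(85, 451)  # Proximal clauses that also have a Dative
figures = {
    "Kabyle PROX + DAT": (kabyle_pct, 13.0),
    "Zaar CTP + BEN": (7.0, 1.0),
    "Zaar CTP + RES": (14.0, 10.0),
}

for label, (pct, baseline) in figures.items():
    ratio = ratio_to_baseline(pct, baseline)
    print(f"{label}: {pct:.1f}% vs {baseline:.1f}% baseline (ratio {ratio:.2f})")
```

On these figures the Benefactive association in Zaar stands out much more sharply than the Resultative one, which is worth keeping in mind when reading the interpretation that follows.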

This is interesting, since Western Kabyle, which did not grammaticalize the function ‘resultative’, regularly uses the Proximal particle to convey this meaning (Mettouchi 1997), as shown in example (14). On the other hand, Tamasheq, which has a Resultative aspect (glossed RES in ge), does not show any correlation between that aspect and the Proximal or Distal clitics.

Those qualitative findings serve as a basis for a larger cross-linguistic comparison of directional morphemes, should other Berber, Chadic and Cushitic languages be added to CorpAfroAs. They help formulate heuristic hypotheses on centripetal/proximal extensions in Chadic and Berber: once the markers start to be used outside a strictly spatial domain, it seems that the notion of direction towards a deictic center is extended to impact on the situation (with resultative meaning), or on the participants (with beneficial/detrimental meaning). It can even, as in Western Kabyle, take on modal values, such as viewpoint (especially with verbs of saying, or in irrealis or negative contexts). On the other hand, the findings concerning the Cushitic languages Gawwada and Ts’amakko show that extension of grammaticalization can also concern the Centrifugal extension. The very strong co-occurrence pattern with verbs of saying indicates that what is probably at stake, apart from direction of motion, or localization, is that the function of the particle is modal. And indeed, the centrifugal is glossed Assertive in Ts’amakko.

7.  Ader Hausa, not represented in CorpAfroAs, has a combination of centripetal and resultative in the form of Grade 4 suffix -ikkee (Caron 1989: 147).

2. Case in AfroAsiatic

The second study is about cross-linguistic comparison of case in some languages of the CorpAfroAs corpus. It presents a typology of case values and a discussion on marking of syntactic roles in general. The languages taken into consideration are Kabyle and Tamasheq for Berber, Hausa for Chadic, Hebrew and Moroccan Arabic for Semitic, Wolaytta for Omotic and Afar, Gawwada and Ts’amakko for Cushitic.

2.1 Defining case in CorpAfroAs

According to a common definition:

Case is a system of marking dependent nouns for the type of relationship they bear to their heads. Traditionally the term refers to inflectional marking, and, typically, case marks the relationship of a noun to a verb at the clause level or of a noun to a preposition, postposition or another noun at the phrase level (Blake 2001: 1).

Case as defined above is one of the possible coding means of syntactic roles. However, any marking of a syntactic property of nouns and pronouns is often defined as case. While case is sometimes reduced to syntactic role marking, some theories expand its functional properties and use it to indicate more abstract semantic roles (Fillmore 1968). This is because often, but not always, case and syntactic role correspond to some semantic characteristics. For example, a Nominative noun encoding the Subject in a sentence often acts as the agent. Case is a form associated with a syntactic-marking function, and typically a case system is organized in case declensions with suffixes as case markers. Latin, Greek and Turkish are languages with such a system. However, other approaches to case allow case to be marked by clitics on the noun or the phrase and by pre-/post-positions. This is because sometimes pre-/post-positions, noun and phrase clitics and inflectional case markers are connected on a grammaticalization line and in fact may express the same function. The degree of boundedness of the case marker can also go in the other direction, so that case is marked by word suppletion and the whole form changes according to the expressed case. This is typical of case-determined pronominal paradigms.

Typical labels such as Nominative, Accusative and Dative are used in CorpAfroAs to indicate case. Syntactic role marking labels are Subject, Direct Object, Indirect Object etc. and semantic roles are agent, patient, recipient etc. The use of these labels in the corpus reflects the kind of analysis applied and correlates only loosely with segmental properties. The preference goes to general syntactic role marking if the element is not considered ‘case-like’. Unbound elements such as pre-/post-positions tend to receive lexical glosses such as ‘to’, ‘on’, ‘with’. However, they can be interpreted as grammatical role markers and glossed accordingly: one preposition in Kabyle is glossed DAT because the function of this element is considered similar to the one typically coded by Dative case. Semantic roles can be coded by any bound or unbound form. It should be added that syntactic roles can also be inferred, among other coding means, from agreement and the position of the word in the clause.

In the languages of the CorpAfroAs corpus analyzed in this paper, case systems are rather poor. One language has a suffixal case system. In others case is coded by different forms, which are integrated in one system. Other languages have cases only in pronouns.

2.2 A description of case marking in AfroAsiatic

2.2.1 Case suffixes and apophony

The only language of the CorpAfroAs corpus with an exclusive series of case affixes creating a declension is Wolaytta (Omotic). The declension applies to both nouns and pronouns. Eight nominal case suffixes operate in this language: Nominative (NOM), Accusative (ACC), Genitive (GEN), Dative (DAT), Locative (LOC), Directive (DIR), Instrumental (INS), Comitative (COM).
The Nominative in Wolaytta, and in several Ethiopian languages, is typologically interesting because it is not part of a system that can be defined as Nominative-Accusative or Ergative. It marks the Subject in an intransitive clause and the agent in a transitive clause and indicates the Subject of a copula clause. In the last case the predicative element, i.e. noun, pronoun or adjective, is marked by the Accusative case. Nominative and Accusative case affixes are gender-sensitive. Therefore, M or F precede, separated by a dot, the case glosses (see 3.6. below for more details). In example (19) the masculine noun gaammo ‘lion’, is marked by the Nominative case -i:



(19)

gaammóy ʔissí ʔíndé míizza laaggíis //
gaammóy ʔissí ʔíndé míizza laaggíis //
gaammó -í ʔissó -í ʔindé miízza laagg -iis //
lion -M.NOM one -LINK female.old cow drive -3MSG.PAST.AFF.DECL
N -CASE NUM -CONNECT ADJ N V1 -TAM
“the lion drove one old cow” (WAL_AA_NARR_05_lion_15)

Gawwada and Ts’amakko, both Cushitic languages of the Dullay cluster, have only one suffixal case: Associative (ASSOC). The case is actually expressed by three gender-sensitive forms according to the three-gender distinction of these languages. As for the meaning, the three suffixes indicate both a location in a sentence and a possessor in a noun phrase. See example (20) from Gawwada, where the noun kolle ‘river’ is marked as locative by the Associative feminine case -atte after deletion of the final Feminine gender marker -e. (20)

ʃeːtte ʕagaba gollaj / gollatte /
ʃeːtte ʕakapa kollaj# kollatte
ʃeːt -t -e ʕak -a =pa kollaj# / koll -atte /
girl -SING -F be_there -IPFV.3SG.M8 =LINK kollaj# / river -ASSOC.F /
N -PNG -PNG V -TAM.PNG =CONJ FS / N -PNG /
“There was a girl at the…at the river” (GWD_MT_NARR_07_012–013)

In Afar, Nominative (NOM), whose characteristics are similar to those described above for Wolaytta, and Genitive (GEN) are marked by apophony and movement of the accent to the word-final syllable. In fact, only masculine nouns in the unmarked Absolutive (ABS) case that end in a vowel are marked for NOM and GEN. For both cases the apophony is a > i and the accent moves to the last syllable of the word. If the word-final syllable is underlyingly accented, the case is marked by apophony only.

8.  The M agreement of an F noun is caused by the loss of agreement between the subject and the verb. This is due to Gawwada’s Subject focusing strategy in the formation of thetic sentences (Tosco 2010: 325).

2.2.2 Case clitics

Other syntactic roles of nouns and pronouns in Gawwada, Ts’amakko and Afar are indicated by a series of clitics. In Gawwada and Ts’amakko the domain of case marking by clitics is not the noun but the noun phrase. They attach after the last element of a noun phrase and do not replace the last vowel of a modified noun, in contrast to what happens with the Associative case. The clitics in Ts’amakko are Dative (DAT), Diffusive (DIFF), Comitative-Instrumental (COM) and Locative (LOC). It is to be noted that DAT in Ts’amakko marks both a recipient-receiver and a source-provenance. Gawwada glosses differ in that there is no LOC clitic and the Ts’amakko Dative =nu corresponds to a combination of the Mover (MOV) morpheme =n followed by the Centripetal (MOV-IN) affix -u. The Gawwada =n-u is opposed to =n-a, Mover-Centrifugal (MOV-OUT), and =n-í, Mover-Specific (MOV-SPEC). This means that Gawwada has two additional case clitics =n-a (MOV-OUT) and =n-í (MOV-SPEC). The description is summarized in the following table:

          IN -u    OUT -a    SPEC -í
MOV =n    =n-u     =n-a      =n-í

In the following example from the Ts’amakko corpus, the Diffusive =ma follows the modifier linq’e ‘clean’ since it marks the whole Noun Phrase rather than the Head Noun ɗoːllo ‘skin mat’. (21)

bagannaŋki qawko ɗoːllo / liːːːnq’e aɠiːppi / garmitto //
bagaɗnanki q’awko ɗoːllo linq’ema ɠiːppi / garmitto
bagaɗ -n -anki q’awk -o ɗoːll -o / linq’e=ma ɠiːf ~p -i / garm -itt -o //
run.P -FUT -IPFV.1PL man -M skin_mat -M / clean=DIFF go_to_sleep ~SEMELF -PFV.3SG.M / lion -SING -M //
V -TAM -TAM.PNG N -PNG N -PNG / ADJ=CASE.CL V ~V.der -TAM.PNG N ~n.der -PNG //
“We’ll run. The one who sleeps on the clean mat is a lion”. (TSB_GS_NARR_001_SP1_248–250)
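The phrase-final placement of case clitics described above can be sketched as a small function. This is only an illustration of the attachment rule under simplifying assumptions: a noun phrase is represented as a plain list of word forms in Head-Modifier order, and the function name is hypothetical.

```python
from typing import List


def attach_case_clitic(noun_phrase: List[str], clitic: str) -> List[str]:
    """Attach a case clitic to the last element of a noun phrase.

    The noun phrase is a list of word forms in Head-Modifier order, so the
    last element is the head noun only when there is no modifier.
    """
    if not noun_phrase:
        raise ValueError("a noun phrase needs at least a head noun")
    return noun_phrase[:-1] + [noun_phrase[-1] + clitic]


# Head noun 'skin mat' followed by the modifier 'clean', as in example (21):
# the Diffusive =ma lands on the modifier, not on the head.
print(attach_case_clitic(["ɗoːllo", "linq’e"], "=ma"))

# With no modifier the same rule puts the clitic on the noun itself
# (a constructed input, shown only to illustrate the rule).
print(attach_case_clitic(["ɗoːllo"], "=ma"))
```

The point of modelling the clitic as a property of the list rather than of the head is that a search for case-marked nouns in these languages cannot rely on the clitic being adjacent to the noun.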

Gawwada and Ts’amakko case clitics also attach to pronouns. They attach to the Object pronouns, labeled OBJ in CorpAfroAs, following directly the pronominal morpheme. The other main pronominal paradigm is Subject (SBJ). Gawwada glosses differ here in preferring Oblique (labelled OBL) for the Ts’amakko Object and in using the Subject paradigm only for the participants, while non-participants use the aforementioned Specific (SPEC) -í or a Generic (GEN) -a. Therefore, only non-core case is marked by case clitics on pronouns.




In Ts’amakko, when a pronoun is marked for locative, the case clitic =ta rather than the Locative case is used. This is shown in the following example, where the 1SG.OBJ pronoun ʔeː is followed by the Locative =ta: (22)

eta sabbete ita maɠɠi //
ʔeːta sabbete
ʔeː =ta sabb -ete ʔita maɠɠi //
1SG.OBJ =LOC top -LOC.P away go_away.IMP.SG //
PRO.IDP CASE.CL N.LOC CASE ADV.LOC V //
“Get away from me” (TSB_GS_NARR_006_SP1_35)

Afar also makes use of case clitics for nouns and pronouns. These are =h Centripetal (CPT), =k Centrifugal (CFG), =l Instrumental (INS) and =t Locative (LOC).

2.3 Syntactic roles marking in pronouns

In the rest of the languages under analysis, i.e., Hausa and Zaar (both Chadic), Kabyle and Tamasheq (both Berber), Hebrew and Moroccan Arabic (both Semitic), case and syntactic roles are only indicated in pronouns. Case marking in the Berber languages is Accusative (ACC) and Dative (DAT) in Tamasheq and Absolutive (ABS) and Dative (DAT) in Kabyle. The following glosses are also used in Berber: SBJ for pronominal Subject and ABSL (Absolute) and ANN (Annexed). The latter two do not indicate case but the state of the noun in the context of the clause and the phrase. How the two states are selected according to the syntactic context in which they appear is one of the big questions of Berber linguistics (see Mettouchi and Frajzyngier (2013), for the most recent hypothesis that has an impact on general typology). Other pronominal series that indicate syntactic roles are the Object (OBJ) pronominal clitics and Subject (SBJ) independent pronouns in Hebrew and the Possessive (POSS) and Object (OBJ) pronominal clitics of Moroccan Arabic. Case syncretism between POSS and OBJ in Moroccan Arabic is analyzed as Oblique case and labeled OBL. Finally, Hausa has Object (OBJ), Benefactive (BEN) and Possessive (POSS) pronominal paradigms, while what in the other languages is presented as a subject pronominal paradigm here is labeled IDP, i.e. “Independent”.

2.5 Cross-linguistic queries on case in CorpAfroAs

The description presented above shows that in the CorpAfroAs corpus case is poorly expressed and case systems largely integrate morphological marking of syntactic role. The only exception is Wolaytta with its full-fledged case declension. When conducting queries in the CorpAfroAs corpus, therefore, one should be aware of the fact that syntactic roles may or may not be indicated by case glosses. For example it is noteworthy that the core case glosses NOM and ACC are used for case suffixes and less so in pronominal paradigms. The syntactic role labels SBJ and OBJ are preferred for pronominals.

The corpus also shows that case marking is not necessarily a modification of a word. Ts’amakko and Gawwada show a case concord system where the domain of case marking is the noun for case suffixes, but the noun phrase for case clitics. The noun is marked by the clitic if it is the only element of a noun phrase. The structure of those languages being Head-Modifier, if any modifier, including a relative clause, follows the Head Noun, the clitic attaches to the modifier. If there is more than one modifier, the case marker will still follow the rightmost modifier. This is not valid in the case of pronouns, which are directly followed by the case clitics.

According to one of the principles of the CorpAfroAs methodology, a single gloss is associated with each grammatical form and each gloss reflects the meaning and the function of the form. The choice of the gloss is therefore an outcome of the language-internal analysis suggested by the grammatical system of each language. This is visible also in the glossing of case.

3. Gender in AfroAsiatic

3.1 Overview

Both gender and number are robust categories in AfroAsiatic languages in general, and in this respect the languages of our corpus are good representatives of the language family as a whole: gender is marked in all of them with the exception of Zaar, and Juba Arabic (a creole/pidgin). Number (which will be tackled here only insofar as it interacts with gender) seems to be marked in all languages of the project. Moreover, gender and number interact in many interesting and different ways, as will be shown below.
The robustness of gender in AfroAsiatic is shown in agreement with a gendered nominal head on modifiers, as well as on the verb, where the gender of the subject (be it overtly expressed as a noun, pronominalized, or contextually given) governs agreement on the form of the verb. The correlation of grammatical gender with sex in animates may be weak, and sometimes it is non-existent. The Afroasiatic gender system is based upon a binary Masculine (M) vs. Feminine (F) distinction, with the latter generally being the marked member of the opposition.

Number is based minimally upon a Singular (SG) vs. Plural (PL) opposition, with the latter being again the marked member. Against these family-wide generalities, a number of deviations are observed. Within the gender system, F is, occasionally, the unmarked member: such a situation has been described for Zayse and Zargulla (Omotic; Hayward 1989) but is not represented in the corpus. Variation within the number system is more widespread and diversified and involves both the number of elements in opposition and their markedness value. A common departure from the basic SG vs. PL opposition involves a Collective from which a nomen unitatis, or Singulative (SING) is derived: in this case, the markedness values are reversed, with SING often being marked. Other variation may involve the presence of a separate Dual (not represented in the corpus). More restricted variations may yield a Plurative alongside a Singulative, and the reanalysis of Plural as a third gender (in the sense of a partially lexically-specified classification of nouns; see below 3.6.). Gender and number may interact in agreement as well as in the actual shape of the exponents.

3.2 Categories affected by gender

Among the languages in our corpus, nouns, personal pronouns and verbs favor the expression of gender. Adjectives too are often, but to a lesser degree, gender-marked. Moreover, a few languages (represented in our corpus by Afar) may lack the category of adjectives altogether. Other categories mark gender in at least a subset of their members. The conditions affecting the marking of gender may be lexical or morphosyntactic; e.g., demonstratives in Kabyle do not show gender variation when they occur as affixed nominal modifiers, but they do as pronouns. Cf. (23) vs. the pronominal use in (24).

(23) a-rgaz-agi “this man”
     ABSL.SG-man-PROX
     t-a-qʃiʃ-t-agi “this girl”
     F-ABSL.SG-child-SG.F-PROX

(24) wagi “this one (M)”
     PROX.SG.M
     tagi “this one (F)”
     PROX.SG.F
     wigi “these ones (M.PL)”
     PROX.PL.M
     tigi “these ones (F.PL)”
     PROX.PL.F

Berber languages have gendered numerals; when the native numerals have been superseded by (Arabic) loans, as in Kabyle, gender is marked on the inherited numbers ‘one’ and ‘two,’ and absent in the Arabic-derived numerals from ‘three’ onwards:

(25) jiwən “one (M)”    jiwət “one (F)”
     sin “two (M)”      snat “two (F)”

Where original numerals have been retained (as in some other Berber varieties) gender-agreement applies to the whole category of numerals. Similar restrictions operate in other languages of the AfroAsiatic phylum and in the corpus. In Table 1, a language will be considered as marking gender on the relevant category if it marks it in a subset (minimally, one element) of the members of that category:

Table 1.  Gendered categories in the CorpAfroAs languages

Language          Family    Noun  Pers. Pro. Adj.      Dem.  Num.  Poss.  Def.     Verb
Afar              Cushitic  +     +          missing9  −           +      missing  +
Arabic: Moroccan  Semitic   +     +          +         +           +               +
Arabic: Tripoli   Semitic   +     +          +         +           +               +
Arabic: Juba      Semitic
Beja              Cushitic  +     +          +         +     +     +      +        +
Gawwada           Cushitic  +     +          +               +     +      missing  +
Hausa             Chadic    +     +          +         +           +      +        +
Hebrew            Semitic   +     +          +         +     +     +               +
Kabyle            Berber    +     +          +         +     +     +      missing  +
Tamasheq          Berber    +     +          +         +     +     +      missing  +
Ts’amakko         Cushitic  +     +          +               +     +      missing  +
Wolaytta          Omotic    +     +                    +           +      +        +
Zaar              Chadic

9.  “Missing” implies that the corresponding word-class does not exist in the language in question. In the case of Afar (and other East Cushitic languages not represented in CorpAfroAs), the semantic class of “adjectives” is represented by different categories of verbs.

The defining characteristic of gender is agreement, and evidence for gender must be found outside nouns: a language may be said to have a gender system only if different agreement patterns are found on various target categories, and these ultimately depend on controllers (typically, nouns) of different types (cf. Corbett 1991, 2006). The following sections will provide evidence of the morphological expression of gender on nouns (3.3.) and pronouns (3.4.) before discussing gender agreement (3.5.) and the interaction of gender with number (3.6.).
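The inclusion criterion stated for Table 1 (a category counts as gendered as soon as at least one of its members marks gender) amounts to an existential test. The sketch below is a hedged illustration of that criterion only: the function name and the dictionary layout are assumptions, and the entries for ‘three’ and ‘four’ stand for the Arabic-derived Kabyle numerals, which the text describes as not marking gender.

```python
from typing import Dict


def category_marks_gender(members: Dict[str, bool]) -> bool:
    """A category is gendered if at least one member marks gender."""
    return any(members.values())


# Kabyle numerals: only the inherited 'one' and 'two' mark gender (ex. 25).
kabyle_numerals = {"one": True, "two": True, "three": False, "four": False}

# An illustrative category in which no member marks gender.
ungendered_category = {"one": False, "two": False}

print(category_marks_gender(kabyle_numerals))
print(category_marks_gender(ungendered_category))
```

Under this criterion a ‘+’ in Table 1 says nothing about how much of the category is gendered, which is why Kabyle numerals receive the same value as a language where every numeral agrees.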

3.3 Gender in nouns

As anticipated in 3.1. and as is common in gender systems, little if any relationship is found between grammatical gender and natural sex. The following are two Cushitic examples among many. As will be further expounded in 3.6 below, Gawwada and Ts’amakko often overtly mark number (in (26), the Singulative) before gender on nouns:

(26) hisk-att-o / hesk-att-o “woman” (Gawwada/Ts’amakko)
     woman-SING-M

(27) loʔ-o “cow” (Gawwada/Ts’amakko)
     cow-M

Conflicts between morphological (gender-assigned) and semantic (sex-determined) agreement are not uncommon; e.g., Gawwada hisk-att-o ‘woman,’ morphologically M, governs agreement with the verb in the 3F form when subject, whereas an agreeing possessive or adjective always follows morphological agreement and occurs in the M. Languages without gender marking, such as Juba Arabic, may express the sex of animate entities lexically, for example with the word mára ‘woman;’ e.g., ásed ‘lion,’ ásed ábu mára ‘lioness’ (where ábu, literally ‘father,’ is used, as in Arabic, as a relative marker).

Languages where one gender only is marked on the head are very common; in such a case, the unmarked member is the M, with F being marked by a suffix, a prefix, or both. In Moroccan Arabic (Semitic), only F is in general overtly marked. The marker is suffixal:

(28) əl=ħbəq “the basil” (ARY_AB_narr_01_004)
     ART=basil[-M]
     DET=N.M

(29) əl=qbiːl-a “the tribe” (ARY_AB_narr_01_020)
     ART=tribe-F
     DET=N-PN

(28) further shows that whenever a category (in this case, and most typically in the domain of gender, M) is not formally marked in the language, it is not per se retrievable from the glosses (in (28) M is added in brackets for comparative purposes). Often, both genders are overtly marked, for example, in languages of the Cushitic group. In Gawwada, affixal -o and -e mark, respectively, the M and F gender (as well as, for -e, the PL, as detailed in 3.6. below):

(30) paʃ-o “field” (GWD_MT_NARR_011_019)
     field-M
     N-PNG

(31) pij-e “land” (GWD_MT_NARR_011_017)
     land-F
     N-PNG

Also in Wolaytta, M and F nouns have different endings, generally followed by gender-sensitive determiners and case markers:

(32) gaammó-a “the lion” (WAL_AA_NARR_05_lion_05)
     lion-DEF.M.ACC
     N-PNG-CASE

(33) ʔindé-ó “the old one” (WAL_AA_NARR_05_lion_26)
     female_old-F.ACC
     ADJ-PGN

Covert (zero) gender marking is by no means rare. E.g., Moroccan Arabic daːṛ ‘house’ is unmarked as F; the agreeing adjective that follows is duly marked as F by -a: (34)

f=əl=daːṛ waːħəd-a
in=DEF=house a_single-F
PREP=DET=N.F ADJ-PNG
“In one house” (ARY_AV_NARR_02_398)

Or, in the following example, by the verbal form, which is again marked as F: (35)

əl=daːṛ ʕaːdi t-ṭiːħ ʕla=na
DEF=house FUT 3F-fall\IPFV along=OBL.1PL
DET=N.F PTCL PNG-V PREP=PRO.PNG
“The house will fall on us” (ARY_AV_NARR_02_044)

In Beja (Cushitic), gender is recovered inter alia from gender-sensitive (in)definite markers, as shown below:

(36) i=taktʔi “the scarecrow” (BEJ_MV_NARR_09_jewel_48)
     DEF.M=scarecrow
     DET=CN.M

(37) tiː=koːba “the container” (BEJ_MV_NARR_09_jewel_43)
     DEF.F=container
     DET=N.F

Gender marking may be affected by a following modifier, as in the Semitic status constructus, represented in the corpus by Arabic varieties. In this construction, the head precedes a (nominal or pronominal) modifier in genitival constructions; a F head is in this case followed by the affixal F gender -t which is dropped in isolation and in other syntactic configurations: (38)

ħkaːj-t haːjna
story-F\CS Hayna
N-PNG NP
“The story of Hayna” (ARY_AB_NARR_01_014)

Although gender tends to be marked suffixally, it can also be expressed by a prefix or by both a prefix and a suffix (a circumfix), as in one of the Kabyle examples in (23), t-a-qʃiʃ-t-agi (‘F-ABSL.SG-child-SG.F-PROX’) ‘this girl.’ One and the same language can use both prefixes and suffixes in different word classes or subclasses. E.g., in Gawwada, while gender is marked on nouns by a final vowel, it is marked by a prefixal consonant on, inter alia, the possessives, where it marks the gender of the head noun:

(39) kaf-k-o h-aːju
     family-SING-M M-POSS.1SG
     N-PNG-PNG PNG-PRO.POSS
     “My family” (GWD_MT_NARR_002_009)

(40) pij-e t-aːni “our land” (GWD_MT_NARR_002_209)
     land-F F-POSS.1PL
     N-PNG PNG-PRO.POSS

This is further coupled for a few persons (in Gawwada, 2SG and 3SG) with gender agreement with the possessor:

(41) harɠ-ú=sa h-iːsi
     hand-M\DEM=DIST M-POSS.3SG.F
     M-PRO.DEM=PTCL.DEM PNG-PRO.POSS
     “That hand of hers” (GWD_MT_NARR_009_101)

3.4 Gender in personal and other pronouns

Gender agreement in the personal pronouns is very widespread among the languages in the corpus. The most common situation is the presence of three forms for the non-participants, a M.SG and a F.SG one, and a gender-indifferent PL one. Other languages have much richer systems, where gender is present also in the forms for the addressee, sometimes both Singular and Plural.

The following table shows the independent (emphatic or nominative according to the language) pronominal forms in a subset of the languages of the corpus:

Table 2.  Independent personal pronouns in selected CorpAfroAs languages

          Beja             Gawwada   Hausa  Kabyle        Zaar
1SG       (un)ˈani         ʔano      niː    nəkk(i(ni))   mi
2SG.M     (um)baˈruːk                kai    kəʧʧ(i(ni))
2SG                        ʔato                           ki
2SG.F     (um)baˈtuːk                keː    kəmm(i(ni))
3SG.M     (um)baˈruː       ʔiso      ʃiː    nəʦʦa
3SG                                                       ʧi
3SG.F     (um)baˈtuː       ʔise      ita    nəʦʦat
1PL.M                                       nəkkni
1PL       (an)hiˈnin       ʔine      muː
1PL.F                                       nəkknti
2PL.M     (am)baˈraːk(na)                   kunwi
2PL                        hune      kuː
2PL.F     (am)baˈtaːk(na)                   kunnəmti
3PL.M     (am)baˈraː                        nutni
3PL                        ʔusunɗe   suː                  ʧì
3PL.F     (am)baˈtaː                        nutənti

The simplest system in Table 2 is represented by Zaar, where gender does not play a role at all in the personal pronouns (the same happens in Juba Arabic). A very widespread pattern is exemplified by Gawwada, which has gender-specific forms for the 3SG only (the same obtains, among the languages of the corpus, for Afar, Ts’amakko and Wolaytta). Other languages show different stages of complexity: Hausa opposes M and F forms in 3SG and 2SG, but not in the PL, while Beja has separate M and F forms both in the SG and PL and for both the 2nd and 3rd persons. The same is true in Hebrew. Arabic dialects vary between these possibilities, while, among the languages in our corpus, Kabyle represents the farthest development in gender marking, with separate M and F forms for all the persons except 1SG. Gender-marking in pronouns therefore proceeds along the following cline:

Ø → 3SG → {2SG, 3SG} → {2, 3} → {1PL, 2, 3}
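The position of a language on this cline can be read off mechanically from the row labels of its paradigm in Table 2. The following is a minimal sketch under stated assumptions: the label inventories are transcribed from Table 2, the function names are hypothetical, and {2, 3} is interpreted here as gender in both the singular and the plural of the 2nd and 3rd persons.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional

# Stages of the cline, from no gender marking to the fullest system.
CLINE: List[FrozenSet[str]] = [
    frozenset(),                                      # Ø
    frozenset({"3SG"}),                               # 3SG
    frozenset({"2SG", "3SG"}),                        # {2SG, 3SG}
    frozenset({"2SG", "3SG", "2PL", "3PL"}),          # {2, 3}
    frozenset({"1PL", "2SG", "3SG", "2PL", "3PL"}),   # {1PL, 2, 3}
]


def gendered_slots(labels: Iterable[str]) -> FrozenSet[str]:
    """Person-number slots that have both an .M and an .F form."""
    label_set = set(labels)
    return frozenset(
        label[:-2]
        for label in label_set
        if label.endswith(".M") and label[:-2] + ".F" in label_set
    )


def cline_stage(labels: Iterable[str]) -> Optional[int]:
    """Index of the matching stage in CLINE, or None if off the cline."""
    slots = gendered_slots(labels)
    return CLINE.index(slots) if slots in CLINE else None


# Row labels of the filled cells of Table 2, per language.
PARADIGMS: Dict[str, List[str]] = {
    "Zaar": ["1SG", "2SG", "3SG", "3PL"],
    "Gawwada": ["1SG", "2SG", "3SG.M", "3SG.F", "1PL", "2PL", "3PL"],
    "Hausa": ["1SG", "2SG.M", "2SG.F", "3SG.M", "3SG.F", "1PL", "2PL", "3PL"],
    "Beja": ["1SG", "2SG.M", "2SG.F", "3SG.M", "3SG.F", "1PL",
             "2PL.M", "2PL.F", "3PL.M", "3PL.F"],
    "Kabyle": ["1SG", "2SG.M", "2SG.F", "3SG.M", "3SG.F", "1PL.M", "1PL.F",
               "2PL.M", "2PL.F", "3PL.M", "3PL.F"],
}

for language, labels in PARADIGMS.items():
    print(language, cline_stage(labels))
```

Returning None for paradigms that match no stage keeps the sketch honest: a language that gendered, say, only the 2SG would be flagged as a counterexample rather than forced onto the cline.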

Gender in other pronominal categories is subject to heavy language-specific constraints: sometimes object (or oblique) pronouns follow the distribution patterns of independent pronouns, often with a few reductions; e.g., the Kabyle pronominal clitics do not oppose 1PL.M and 1PL.F; the dative clitic of 3SG is likewise gender-neutral:

Table 3.  Gender in Kabyle pronominal clitics

          Dative clitic   Absolutive clitic
1SG       (i)yi           (i)yi
2SG.M     (j)ak           (i)k
2SG.F     (j)am           (i)kəm
3SG.M                     (i)t
3SG       (j)as
3SG.F                     (i)ʦʦ
1PL       (j)aγ           (j)aγ
2PL.M     (j)awən         (i)kwən
2PL.F     (j)asənt        (i)tənt
3PL.M     (j)asən         (i)tən
3PL.F     (j)asənt        (i)tənt

Sometimes different patterns emerge: in Gawwada, 2SG Oblique (used as direct objects and with postpositions) and Associative pronouns have separate M and F forms (in this as in other Cushitic languages there are no 3rd person object pronouns): (42)

ho            he
2OBL.SG.M     2OBL.SG.F
hola          hela
2ASSOC.SG.M   2ASSOC.SG.F

Gender may further affect other pronominal categories, such as the Interrogative pronouns of Gawwada and other Cushitic languages (both M and F forms contrasting with a single PL form): (43)

h-ú-nka “which one (M)?”
M-M\DEM-which
t-í-nka “which one (F)?”
F-F\DEM-which
h-í-nka “which ones?”
PL-PL\DEM-which

3.5 Gender agreement

Given the huge typological differences between and within each Afroasiatic language group (cf. Frajzyngier 2012a), it is no wonder that agreement patterns are very diversified, too. A selection of the main features is exemplified below.

3.5.1 Gender and gender agreement in Adjectives

One of the simplest and most widespread agreement patterns involves the presence of the same (or a similar) allomorph of the head noun on the modifier, as in the following examples from Moroccan Arabic: in (44) a Ø-marked M.SG noun is followed by a Ø-marked adjective, while in (45) a F.SG noun is followed by an agreeing F.SG adjective. The same pattern is used in plural nouns: in Hebrew (Semitic; 46) a M.PL head is similarly followed by an agreeing adjective. In (44) M.SG, being the unmarked value for gender and number, is not overtly marked on either the head or the modifier: (44)

təqliːd ʕaːdi tradition(-M.SG) common(-M.SG) N.M ADJ.SG.M “A common tradition” (ARY_AB_NARR_01_275)

(45)

əl=məṛṛ-a əl=taanj-a DEF=time-F DEF=second-F DET=N.F-PNG DET=ADJ-PNG “The second time” (ARY_DC_NARR_01_SFCC_068)

(46)

anaʃ-im umlal-im man-M.PL unfortunate-M.PL N-PNG ADJ-PNG “Miserable people” (HEB_IM_CONV_2_SP1_065)

Agreement with the Head operates across an intervening noun modifier in a genitival construction. In the following example from Hebrew, the Adjective (meuʁgan-et) agrees with its F Head noun (xevʁa-t), which is further modified by the noun (jelad-im) immediately following it: (47)

xevʁa-t jelad-im meuʁgan-et society-F.SG child-M.PL organize\ACT.PTCP.F.SG N.F-CS N-PNG V-PNG “An organized children’s company” (lit.: “a society of children, an organized one”); (HEB_IM_NARR_4_SP1_076)

Verb-final languages (such as the Cushitic and Omotic languages of the Horn of Africa) may have either Head-Modifier clause order (as represented in the corpus



Cross-linguistic comparability in CorpAfroAs 245

by Gawwada and Ts’amakko) or Modifier-Head (as in Afar and Wolaytta). In Wolaytta the adjectives do not agree in gender, number and case with the head noun:

(48) woggá góda-í
     big chief-M.NOM
     ADJ N-CASE
     “the big chief” (WAL_AA_NARR_05_lion_43)

(49) sagi mhiːn eː-stʔeː
     big place 3SG.M-sit_down\REFL.IPFV
     ADJ N.M PNG-DER.V1
     “He stays in a remote place” (BEJ_MV_NARR_09_jewel_54)

Identification of a separate category of Adjectives is more problematic for other languages (such as, among the languages of the corpus, Gawwada, Ts’amakko, and especially Afar), where adjectival concepts may be conceived of in verbal terms. Gender agreement is nevertheless found, as in the following examples from Gawwada:

(50) ʃiːn-am-k-o piʕ-a=tta=kka ʔan=woʔ-i
     smear-PASS-SING-M white-M=INS=CONTR SBJ.1=want-PFV.1SG
     v–V.DER-PNG-PNG ADJ-PNG=CASE=PTCL PRO.SBJ=V-TAM.PNG
     “I want the white butter” (GWD_MT_NARR_006_033)

(51) haːr-itt-e=si ɗaʕamm-aj
     fish-SING-F=PROX big\INT-F
     N-PNG-PNG=DEICT ADJ-PNG-PRO
     “This very big fish” (GWD_MT_NARR_004_071)

3.5.2 Gender and gender agreement in definite markers, demonstratives and other nominal modifiers

A few languages possess definite markers. In Arabic and Hebrew they are invariable for gender and number, but in other languages (in)definite markers are gender-sensitive, as in Beja:

(52) i=tarab=eː ti=balami=t=eː firʔa-tiːt
     DEF.M=half=3PL.ACC DEF.F=supply=INDF.F=poss.3PL.ACC go_out-CVB.ANT
     DET=N.M=PRO DET=N.F=DET=PRO V1.PNG
     “They shared their food supply and” (BEJ_MV_NARR_07_cold_14)

(53) naː=t ka=soː-j-a
     thing=INDF.F NEG.IPFV=CAUS-say\PFV-3SG.M
     N.F=DET PTCL=DER.V1-V1.IRG
     “He did not tell him anything (else)” (BEJ_MV_NARR_07_cold_75)

Demonstratives are likewise gender-marked in a few languages, such as in Moroccan Arabic, where haːdaːk (DIST.M) and haːdiːk (DIST.F) contrast with a gender-neutral form diːk. Also in Wolaytta both the Distal and Proximal demonstratives have different gender-sensitive forms:

(54) he-ge-á ʔússa
     DIST.DEM-M.NMLZ-DEF.M.ACC heifer
     N.M
     “That (group of) heifers” (WAL_AA_NARR_05_lion_20)

(55) ha-nn-ó-kka ʔeh-éetí
     PROX.DEM-F-F.ACC-INCL bring-2PL.PRES.AFF.Q
     DEICT-PGN-PGN-[ABSENT] V1-TAM
     “Just this one (F) you bring?” (WAL_AA_NARR_05_lion_34)

Gawwada and Ts’amakko have no Definite markers and their Demonstratives are invariable; they have instead a special class of pronominal heads (‘the one which…’). They are formed by a prefix gender marker (h-/k- for M and PL, t- for F) followed by the suffix gender markers of nouns, yielding semantically empty words. The combination of prefixes and suffixes unambiguously differentiates M, F, and PL, as shown in Table 4 and exemplified in (56):

Table 4.  The gendered pronominal heads of Gawwada and Ts’amakko

          Gawwada    Ts’amakko
M-M       h-o        k-o
F-F       t-e        t-e
PL-PL     h-e        k-e

(56) h-o ɗaːmm-a maːtt-a
     M-M big-M Maatta-M
     PNG-PNG ADJ-PNG NP-PNG
     “The big one is (called) Maatta” (GWD_MT_NARR_002_021)
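The claim that prefix and suffix only jointly distinguish the three values can be checked mechanically. The following Python fragment is a minimal sketch, not part of the CorpAfroAs tools: the dictionary layout and the function names `pronominal_head` and `is_unambiguous` are assumptions made for illustration, while the forms themselves are those of Table 4.

```python
# Prefix gender markers of the pronominal heads (forms from Table 4).
PREFIX = {
    "gawwada": {"M": "h", "F": "t", "PL": "h"},
    "tsamakko": {"M": "k", "F": "t", "PL": "k"},
}
# Suffix gender markers shared with nouns (forms from Table 4).
SUFFIX = {"M": "o", "F": "e", "PL": "e"}
GENDERS = ("M", "F", "PL")


def pronominal_head(language, gender):
    """Return the pronominal head as PREFIX-SUFFIX (hypothetical helper)."""
    return f"{PREFIX[language][gender]}-{SUFFIX[gender]}"


def is_unambiguous(language):
    """True if the three gender values yield three distinct forms."""
    forms = [pronominal_head(language, g) for g in GENDERS]
    return len(set(forms)) == len(forms)


for language in PREFIX:
    forms = {g: pronominal_head(language, g) for g in GENDERS}
    print(language, forms, is_unambiguous(language))
```

Taken alone, the prefix conflates M with PL and the suffix conflates F with PL; only the pair of markers keeps the three values apart.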

In a few languages (e.g., Beja) numerals agree in gender with the head they modify; in others, gender agreement is restricted to lower numerals, and minimally to ‘one,’ as in the following example from Gawwada:

(57) haːr-itt-e toʔ-ott-e=si
     fish-SING-F one-SING-F=PROX
     N-PNG-PNG N.NUM-PNG-PNG=DEICT
     “This one fish” (GWD_MT_NARR_004_056)

In other cases, gender agreement is limited to ‘one’ and ‘two,’ as in Kabyle (cf. (25) above), or in the following example from Hebrew:

(58) ʃn-ej jelad-im
     two-M.PL child-M.PL
     N-CS N-PNG
     “two children” (HEB_IM_NARR_4_SP1_017)

Plurality is often not marked on the noun:

(59) qaw-h-o lakki
     man-SING-M two
     N-PNG-PNG NUM
     “two men” (GWD_MT_NARR_005_090)

The number ‘one’ has separate M, F, and PL forms (the latter meaning ‘some, a few’) in Gawwada and Ts’amakko:

Table 5.  Gendered ‘one’ in Gawwada and Ts’amakko

       Gawwada      Ts’amakko
M      toʔ-okk-o    do-okk-o     (“one-SING-M”)
F      toʔ-ott-e    do-ott-e     (“one-SING-F”)
PL     toʔ-okk-e    do-okk-e     (“one-SING-PL”)

3.5.3 Gender and agreement in verbs

As anticipated, in AfroAsiatic subject nouns command gender-agreement on the form of the verb, although this is rare in the PL (and a fortiori, where existent, in the Dual). Gender-agreement for the addressee (the 2nd person) in the verbal form is found only in Kabyle and Tamasheq among the languages represented in the corpus; much rarer, and not found in our corpus, is gender-agreement for the speaker (the 1st person). In contrast, gender agreement for a non-participant (the 3rd person) in the SG is almost universal, with different M.SG and F.SG forms:

(60) jhaːm dhaːj jʔ-i=t
     leopard(-M.SG) DIR come-AOR.3SG.M=COORD
     SBJ.N.M POSTP V2.IRG-TAM.PNG=CONJ
     “A leopard came towards them and” (BEJ_MV_NARR_15_leopard_016)

(61) hiːja ma=lqːa-t ma=t-diːr
     3SG.F NEG1=find\PFV-3SG.F NEG1=3F-do\IPFV
     PRO.IDP PTCL.NEG=V-PNG PTCL.NEG=PNG-V
     “She did not know what to do” (ARY_AB_NARR_01_120)

(62) ka=i-sərħ-u=h
     REAL=3-take_to_graze\IPFV-PL=OBL.3SG.M
     TAM=PNG-V-PNG=PRO.PNG
     “They take them to graze” (ARY_AB_NARR_01_273)

(63) ẓid-it idammən-iw ad=tn
     be_sweet\PFV-QLT.PL blood\ABSL.PL.M-POSS1.SG POT=ABSV3SPL.M
     V.QLT-PRO N.COV-AFFX PTCL=PRO
     t-sw-mt
     SBJ2-drink\AOR-SBJ2PL.F
     CIRC1-V13%-CIRC2
     “My blood attracts you and you will drink it?” (KAB_AM_NARR_01_M_340)

Gender-agreement is also found in participial forms, as in Hebrew:

(64) Hi haj-ta koʁ-et
     3F.SG be\PFV-3F.SG name\ACT.PTCP-F.SG
     PRO.IDP V-AFFX.PNG V-PNG
     “She used to call” (HEB_IM_NARR_4_SP1_097)

Subject gender marking may appear on phonologically separate morphemes, as in Hausa:

(65) tà ʤeː gidaː à gàɽiː-n-sù
     3SG.F.AOR go home at town-GEN-3PL.GEN
     PNG.TAM V0 N PREP N-SYNT-PNG
     “She went home to their town” (HAU_BC_NARR_02_SP1_021)

On the other hand, different paradigms in the same language may show various syncretism patterns, whereby gender and/or number oppositions are lost. For example, negative paradigms in Cushitic usually have a single form in the Past or Perfect; in Gawwada, a single form is used for all Singular subjects in the Negative Past.

3.6 The interaction of gender, number and case

In a few case-rich languages core cases may have different case forms for gender and number. In Wolaytta this happens for the Nominative, the Accusative, and the Definite and Indefinite Genitive. An affix -í marks the Nominative case in SG.M nouns and in PL nouns irrespective of gender, while -á signals SG.F nouns. The same syncretism of the M gender with the PL number is found in the Accusative, with affixal -á marking both SG.M nouns and all PL nouns, and affixal -ó being reserved for SG.F nouns.

In Afar, gender plays a role together with the phonological shape of the word in conditioning the expression of the Subject and Genitive case: only vowel-final M nouns change their final vowel of the Basic (or Absolute) case form into -í. F nouns, as well as consonant-final M nouns (and a few exceptions among the vowel-ending ones), do not have overt case-marking. The accented nature of the M case affix causes a change in the accent pattern, which becomes the sole marker of case for M i-final nouns:

Table 6.  Subject/Genitive case-marking on V-final M nouns in Afar

Absolute case    Subject and Genitive case
áwka             awkí                         “boy”
abbatíːnu        abbatiiːní                   “authority”
absíːse          absiːsé                      “supervisor”
gínni            ginní                        “demon”

Also, in Beja the Definite markers are case-sensitive for the Nominative (and the Accusative) cases:

(66) uː=mha
     DEF.SG.M.NOM=morning
     DET=SBJ.N.M
     “the morning” (BEJ_MV_NARR_09_jewel_59)

(67) tuː=tiji
     DEF.SG.F.NOM=snake
     DET=SBJ.N.F
     “the monster” (BEJ_MV_NARR_09_jewel_49)

The Associative (or Locative) case, which is the only morphological case of Gawwada and Ts’amakko (Cushitic), also has different gendered case forms:

Table 7.  The gendered associative case in Gawwada and Ts’amakko

             Gawwada    Ts’amakko
ASSOC.M      -ito       -ilo
ASSOC.F           -atte
ASSOC.PL          -ete

The association between gender and number marking is pervasive; a good example of a typical relation between gender and number marking is the Tamasheq verbal conjugation pattern. Basically, SG is marked by a prefix (generally j-/i-, but Ø in certain verb classes, for M, and t- for F) but no suffix. PL is instead marked by different gendered suffixes but no prefix:

Table 8.  The interplay of gender and number marking in Tamasheq verbs

          prefix       suffix
3SG.M     i-, j-, Ø    Ø
3SG.F     t-           Ø
3PL.M     Ø            -ɐn, -Vn
3PL.F     Ø            -nɐt

Gender switch coupled with number is common in many Cushitic languages, such as Somali (not represented in the corpus). In such a system, usually called ‘gender polarity,’ the gender of a noun of specific noun classes is reversed in the PL. The latter is usually marked by a suffix, but certain noun classes may be marked by gender switch alone. Again in Cushitic, while gender is an inherent property of nouns, number is often not an obligatory category and may be seen as a matter of derivation (cf. Mous 2012: 361–363). A special situation is provided by two closely-related languages of the Dullay branch of the Cushitic group (Gawwada and Ts’amakko; cf. Savà 2005), which are analyzed in the corpus as having a three-fold gender system, with PL alongside M and F, and a three-fold number system: preternumeral (or basic), SING, and Plurative (PLUR). Like M and F nouns, PL nouns are marked by a final vowel (typically, -o for M, and -e for both F and PL). Number marking may or may not be present in the shape of a noun. The internal morphological composition of nouns may be captured by the following template

STEM ± NUMBER MARKING + GENDER MARKING

In short, number marking always precedes gender marking, and while overt expression of number may be absent, the marking of gender is always part and parcel of a noun form. While the vast majority of count nouns are M or F in their basic form, a few are PL. Many mass nouns are PL. As anticipated, the gender of nouns denoting inanimate countable entities is not semantically motivated: they may be Masculine, Feminine, or (in a minority of cases) Plural. Number derivation operates from a basic noun, with the addition of either a Singulative or a Plurative affix before the gender marker. In contrast to the free gender association of basic nouns, Singulative nouns may only be either M or F in gender, and Plurative nouns are always PL in gender.
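The template and its two gender restrictions can be restated as a small procedure. The Python fragment below is a minimal sketch under stated assumptions: the function name `build_noun` and its parameters are hypothetical, the gender vowels are only the typical ones mentioned above (-o for M, -e for F and PL), and the number affix is passed in by the caller rather than predicted, since its allomorphy is not described here. Pluratives formed by gemination are not covered. The stems and affixes used are taken from forms cited in this section.

```python
# Typical gender vowels named in the text; other vowels occur in the data.
GENDER_SUFFIX = {"M": "o", "F": "e", "PL": "e"}


def build_noun(stem, gender, number=None, number_affix=None):
    """Assemble STEM ± NUMBER MARKING + GENDER MARKING (hypothetical helper).

    number is None for a basic noun, "SING" for a Singulative,
    "PLUR" for a Plurative; number_affix is the segmental affix used.
    """
    if gender not in GENDER_SUFFIX:
        raise ValueError(f"unknown gender: {gender}")
    if number not in (None, "SING", "PLUR"):
        raise ValueError(f"unknown number: {number}")
    if (number is None) != (number_affix is None):
        raise ValueError("number and number_affix must be given together")
    if number == "SING" and gender not in ("M", "F"):
        raise ValueError("Singulative nouns are M or F in gender")
    if number == "PLUR" and gender != "PL":
        raise ValueError("Plurative nouns are PL in gender")
    parts = [stem]
    if number_affix is not None:
        parts.append(number_affix)      # number marking precedes gender
    parts.append(GENDER_SUFFIX[gender])  # gender marking is always present
    return "-".join(parts)


print(build_noun("har", "M"))                  # basic noun, M
print(build_noun("har", "F", "SING", "itt"))   # Singulative, F
print(build_noun("ker", "PL", "PLUR", "aɗɗ"))  # Plurative, PL
```

Gender is the only obligatory slot in this sketch, which mirrors the statement that number may be unexpressed while gender never is.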

The interplay of gender and number in Gawwada is graphically illustrated in Table 9:

Table 9.  The interaction of gender and number in Gawwada

SING     basic    PLUR
M        M
F        F
         PL       PL

The simplest case probably involves a number-unmarked (or basic) noun, semantically both a singular and a generic and either M or F in gender, and a number-derived Plurative expressing a plural:

(68) paʃ-o      field-M          “field”
     paʃ~ʃ-e    field~PLUR-PL    “fields”

In (69) the referent is a sex-differentiated animate, and a Singulative Feminine form is further derived:

(69) har-o        dog-M          “dog”
     har-itt-e    dog-SING-F     “bitch”
     har~r-e      dog~PLUR-PL    “dogs (bitches)”

For many nouns, having either animate or inanimate referents, no number-unmarked form is found: a Singulative acts both as a singular and a generic, against which a Plurative form acts as a plural:

(70) ʔasp-itt-e    storm-SING-F     “storm”
     ʔasp-iɗɗ-e    storm-PLUR.PL    “storms”

Even a morphological Singulative may act as a semantic generic or collective, from which a further, or second, Singulative (with a singulative meaning) can be derived:

(71) ʔinn-akk-o        fly-SING-M         “fly; flies”
     ʔinn-att-akk-o    fly-SING-SING-M    “a single fly”

Not infrequently, the morphologically simplest (i.e., not number-marked) form is a semantic plural, from which a Singulative is derived:

(72) ʔilk-e        tooth-PL        “teeth”
     ʔilk-akk-o    tooth-SING-M    “tooth”

As expected, semantics plays a role in the selection of gender, but not a decisive one; while (72) above may give the (partially correct) impression that the Plural gender is mostly selected for collective entities or mass nouns (from which a Singulative acts as a nomen unitatis), exceptions are by no means uncommon:

(73) ker-e        headrest-PL         “headrest”
     ker-aɗɗ-e    headrest-PLUR-PL    “headrests”

(74) minn-e        house-PL         “house”
     minn-aɗɗ-e    house-PLUR-PL    “houses”

Finally, (75) shows a Plural (and semantic collective) noun for an animate entity against which both a pair of gendered Singulatives (reflecting natural gender opposition) and a Plurative are derived:

(75) ʔorr-e        potter-PL         “potters”
     ʔorr-itt-o    potter-SING-M     “a potter (man)”
     ʔorr-itt-e    potter-SING-F     “a potter woman”
     ʔorr-aɗɗ-e    potter-PLUR-PL    “(many) potters”

3.7 AfroAsiatic languages as gendered languages par excellence?

Apart from Chadic, where many languages have no gender at all, Afroasiatic languages are ‘gendered’ languages par excellence: a few gender morphemes, foremost among them the F marker -t, show an extraordinary persistence across time and space, and may be seen as a shibboleth for the whole phylum. Also the gender system as a whole, with its binary distinction between a Masculine and a Feminine, is very persistent: no addition to the system of genders is observed (except for the possible use of Plural as a gender; cf. 3.6 above). Conversely, absence of gender is found only in Chadic and in typologically ‘deviant’ languages, such as the Arabic-derived Juba Arabic and Ki-Nubi creoles. Gender is marked in a number of lexical categories and subcategories and plays a central role in agreement. On the other hand, gender in AfroAsiatic is not only a means of reference, but has acquired semantic functions such as diminutive, sometimes pejorative (Frajzyngier 2012a: 522). As mentioned above, gender, alone or in combination with an affix, may come to mark number in the so-called ‘gender polarity’ of certain Cushitic languages. As we have tried to show in this paper, all or most of these properties and values are evidenced and can be neatly investigated in CorpAfroAs.

Conclusion

The three studies conducted in this paper show that a corpus-based analysis can lead to interesting discoveries concerning features of Afroasiatic languages, provided some information is given in the grammatical sketch of the corresponding language. Automatic retrieval of directional particles in the corpus allows a quick assessment of the distribution of those morphemes, as well as the semantic types of associated verbs. Contexts facilitate the analysis of discourse factors and modal dimensions. It appears that for the six languages under consideration, the directional morphemes have grammaticalized outside the domain of space and motion, and have acquired aspectual, modal or interactional dimensions. A thorough comparative study of those morphemes within AfroAsiatic is yet to be conducted, on the basis of this preliminary exploration.

The analysis of labels pertaining to the domain of Case shows that case systems largely integrate morphological marking of syntactic role. Various morphological means are used to mark Case, depending on the languages, and the corpus allows the end-user to retrieve the relevant forms, within their context. Thus, it is also possible, as was done in this paper, to investigate one case label (Nominative) across the corpus, and thanks to the associated grammatical sketches, conduct an informed comparison. However, the limits of a comparison based on labels and grammatical sketches are apparent in the fact that each case label has to be considered within a system. The paper by Frajzyngier and Mettouchi in this volume proposes an alternative solution for cross-linguistic comparison, to be implemented in a project funded by the Agence Nationale de la Recherche for 2013–2016, CorTypo.10

Finally, Gender is shown to be a pervasive category within AfroAsiatic, and CorpAfroAs provides rich and varied examples illustrating not only the morphological marking of Gender, but also its uses in agreement, for reference-tracking, and for semantic distinctions. Further, more fine-grained comparisons, for instance the cross-linguistic comparison of the use of gender for diminutive marking, are yet to be conducted, on a larger corpus for which CorpAfroAs provides a pilot version.

10.  DOI: http://dx.doi.org/10.1075/scl.68.website.

References

Azeb Amha. 2012. ‘Wolaytta Corpus’. Corpus recorded, transcribed and annotated by Azeb Amha. In Amina Mettouchi & Christian Chanard (eds). The CorpAfroAs Corpus of Spoken AfroAsiatic Languages. DOI: http://dx.doi.org/10.1075/scl.68.website. Accessed on 12/07/2013. (=WAL_AA)

Barontini, Alexandrine. 2012. ‘Moroccan Arabic Corpus (Meknes)’. Corpus recorded, transcribed and annotated by Alexandrine Barontini. In Amina Mettouchi & Christian Chanard (eds). The CorpAfroAs Corpus of Spoken AfroAsiatic Languages. DOI: http://dx.doi.org/10.1075/scl.68.website. Accessed on 12/07/2013. (=ARY_AB)

Blake, Barry. 2001. Case. Cambridge: CUP. DOI: 10.1017/CBO9781139164894

Bybee, Joan L. 1985. Morphology: A Study of the Relation between Meaning and Form [Typological Studies in Language 9]. Amsterdam: John Benjamins. DOI: 10.1075/tsl.9

Caron, Bernard. 1989. The verbal system of Ader Hausa. In Current Progress in Chadic Linguistics [Current Issues in Linguistic Theory 62], Zygmunt Frajzyngier (ed.), 131–169. Amsterdam: John Benjamins. DOI: 10.1075/cilt.62.08car

Caron, Bernard. 2012a. Zaar grammatical sketch. ANR CorpAfroAs: A Corpus for Afro-Asiatic Languages. http://dx.doi.org/10.1075/scl.68.website (18 April 2013).

Caron, Bernard. 2012b. ‘Zaar Corpus’. Corpus recorded, transcribed and annotated by Bernard Caron. In Amina Mettouchi & Christian Chanard (eds). The CorpAfroAs Corpus of Spoken AfroAsiatic Languages. DOI: http://dx.doi.org/10.1075/scl.68.website. Accessed on 12/07/2013. (=SAY_BC)

Caron, Bernard. 2012c. Hausa grammatical sketch. ANR CorpAfroAs: A Corpus for Afro-Asiatic Languages. http://dx.doi.org/10.1075/scl.68.website (18 April 2013).

Caron, Bernard. 2012d. ‘Hausa Corpus’. Corpus recorded, transcribed and annotated by Bernard Caron. In Amina Mettouchi & Christian Chanard (eds). The CorpAfroAs Corpus of Spoken AfroAsiatic Languages. DOI: http://dx.doi.org/10.1075/scl.68.website. Accessed on 12/07/2013. (=HAU_BC)

Caubet, Dominique. 2012. ‘Moroccan Arabic Corpus’. Corpus recorded, transcribed and annotated by Dominique Caubet. In Amina Mettouchi & Christian Chanard (eds). The CorpAfroAs Corpus of Spoken AfroAsiatic Languages. DOI: http://dx.doi.org/10.1075/scl.68.website. Accessed on 12/07/2013. (=ARY_DC)

Comrie, Bernard. 1975. Aspect: An Introduction to the Study of Verbal Aspect and Related Problems. Cambridge: CUP.

Corbett, Greville G. 1991. Gender. Cambridge: CUP. DOI: 10.1017/CBO9781139166119

Corbett, Greville G. 2006. Agreement. Cambridge: CUP.

Fillmore, Charles J. 1968. The case for case. In Universals in Linguistic Theory, Emmon Bach & Robert T. Harms (eds), 1–88. New York NY: Holt, Rinehart, and Winston.

Frajzyngier, Zygmunt. 1987. Ventive and centrifugal in Chadic. Afrika und Übersee 70(1): 31–47.

Frajzyngier, Zygmunt. 2012a. Typological outline of the Afroasiatic phylum. In The Afroasiatic Languages, Zygmunt Frajzyngier & Erin Shay (eds), 505–624. Cambridge: CUP.

Frajzyngier, Zygmunt. 2012b. A Grammar of Wandala. Berlin: Mouton de Gruyter. DOI: 10.1515/9783110218411

Hayward, Richard J. 1989. The notion of ‘default gender’: A key to interpreting the evolution of certain verb paradigms in East Ometo, and its implication for Omotic. Afrika und Übersee 72: 17–32.

Lazard, Gilbert. 1975. La catégorie de l’éventuel. In Mélanges E. Benveniste, 347–358. Paris: Société de linguistique.

Lux, Cécile. 2012. ‘Kabyle Corpus’. Corpus recorded, transcribed and annotated by Cécile Lux. In Amina Mettouchi & Christian Chanard (eds). The CorpAfroAs Corpus of Spoken AfroAsiatic Languages. DOI: http://dx.doi.org/10.1075/scl.68.website. Accessed on 12/07/2013. (=TAQ_CL)

Malibert, Il-Il. 2012. ‘Modern Hebrew Corpus’. Corpus recorded, transcribed and annotated by Il-Il Malibert. In Amina Mettouchi & Christian Chanard (eds). The CorpAfroAs Corpus of Spoken AfroAsiatic Languages. DOI: http://dx.doi.org/10.1075/scl.68.website. Accessed on 12/07/2013. (=HEB_IM)

Mettouchi, Amina. 1997. La particule D en berbère (kabyle): Transcatégorialité des marqueurs énonciatifs. In Proceedings of the 16th International Congress of Linguists, Paper No 0270. Oxford: Pergamon.

Mettouchi, Amina. 2011. The grammaticalization of directional clitics in Berber. Paper presented at the workshop ‘Come and Go off the grammaticalization path’, convened by Jenneke van der Wal and Maud Devos at the 44th Annual meeting of the Societas Linguistica Europaea.

Mettouchi, Amina. 2012. ‘Kabyle Corpus’. Corpus recorded, transcribed and annotated by Amina Mettouchi. In Amina Mettouchi & Christian Chanard (eds). The CorpAfroAs Corpus of Spoken AfroAsiatic Languages. DOI: http://dx.doi.org/10.1075/scl.68.website. Accessed on 12/07/2013. (=KAB_AM)

Mettouchi, Amina & Frajzyngier, Zygmunt. 2013. A previously unrecognized typological category: The state distinction in Kabyle. Linguistic Typology 17(1): 30–59. DOI: 10.1515/lity-2013-0001

Mous, Maarten. 2012. Cushitic. In The Afroasiatic Languages, Zygmunt Frajzyngier & Erin Shay (eds), 342–422. Cambridge: CUP.

Newman, Paul. 1983. The efferential (alias causative) in Hausa. In Studies in Chadic and Afroasiatic Linguistics, Ekkehard Wolff & Hilke Meyer-Bahlburg (eds), 397–418. Hamburg: Buske.

Newman, Paul. 2000. The Hausa Language: An Encyclopedic Reference Grammar. Yale CT: Yale University Press.

Savà, Graziano. 2005. A Grammar of Ts’amakko. Cologne: Rüdiger Köppe.

Savà, Graziano. 2012. ‘Ts’amakko Corpus’. Corpus recorded, transcribed and annotated by Graziano Savà. In Amina Mettouchi & Christian Chanard (eds). The CorpAfroAs Corpus of Spoken AfroAsiatic Languages. DOI: http://dx.doi.org/10.1075/scl.68.website. Accessed on 12/07/2013. (=TSB_GS)

Tosco, Mauro. 2010. Why contrast matters: Information structure in Gawwada (East Cushitic). In The Expression of Information Structure. A Documentation of its Diversity across Africa [Typological Studies in Language 91], Ines Fiedler & Anne Schwarz (eds), 315–348. Amsterdam: John Benjamins. DOI: 10.1075/tsl.91.12tos

Tosco, Mauro. 2012a. The grammar of space of Gawwada. In Proceedings of the 6th World Congress of African Linguistics, Cologne, 17-21 August 2009, Matthias Brenzinger & Anne-Maria Fehn (eds), 523–532. Cologne: Rüdiger Köppe.

Tosco, Mauro. 2012b. ‘Gawwada Corpus’. Corpus recorded, transcribed and annotated by Mauro Tosco. In Amina Mettouchi & Christian Chanard (eds). The CorpAfroAs Corpus of Spoken AfroAsiatic Languages. DOI: http://dx.doi.org/10.1075/scl.68.website. Accessed on 12/07/2013. (=GWD_MT)

Vanhove, Martine. 2012. ‘Beja Corpus’. Corpus recorded, transcribed and annotated by Martine Vanhove. In Amina Mettouchi & Christian Chanard (eds). The CorpAfroAs Corpus of Spoken AfroAsiatic Languages. DOI: http://dx.doi.org/10.1075/scl.68.website. Accessed on 12/07/2013. (=BEJ_MV)

Vicente, Ángeles. 2012. ‘Moroccan Arabic Corpus (Ceuta)’. Corpus recorded, transcribed and annotated by Ángeles Vicente. In Amina Mettouchi & Christian Chanard (eds). The CorpAfroAs Corpus of Spoken AfroAsiatic Languages. DOI: http://dx.doi.org/10.1075/scl.68.website. Accessed on 12/07/2013. (=ARY_AV)

Functional domains and cross-linguistic comparability*

Zygmunt Frajzyngier and Amina Mettouchi

University of Colorado, Boulder, EPHE and CNRS-LLACAN

This paper investigates a strategy other than the one currently implemented in the CorpAfroAs project, allowing cross-linguistic comparison among multiple-language corpora. It involves comparing a database of functional domains and subdomains across languages. The underlying principle is that cross-linguistic comparison should be conducted on the basis of meanings/functions actually encoded in the grammatical systems of individual languages rather than on the basis of aprioristic categories. Such a study yields reliable information regarding the differences and similarities between grammatical systems.

1. Introduction

This paper proposes a theory and methodology aimed at overcoming a longstanding difficulty in choosing the proper object in cross-linguistic comparison. The fundamental assumption is that languages may differ significantly in the functional domains and subdomains encoded in their grammatical systems and also in the specific meanings and functions belonging to those grammaticalized domains. The term ‘grammaticalized’ in the present study indicates the encoding of a given meaning through some formal means, e.g. inflectional means, auxiliaries, linear order, etc. It has long been assumed that even if languages encode the same meanings or functions, the formal means by which they are encoded may differ considerably, even among related languages. Once we can isolate specific meanings grammaticalized in a given language, we will have a better tool for the comparison of the formal means used. This paper adapts a previously developed theoretical approach to linguistic typology (Frajzyngier & Shay 2003; Frajzyngier 2010, 2011, 2013, submitted), and considers in detail one of the issues for which the CorpAfroAs project provides an important source of information. To a certain extent, this paper complements the contribution by Mettouchi, Savà and Tosco, in this volume. The domain that will illustrate the theory and methodology is the reference system; the languages examined are Kabyle (Berber) and Mina (Central Chadic).

*  Frajzyngier’s work on this study was supported by the Chaire de chercheur étranger des pays de la Loire. This support is gratefully acknowledged. Frajzyngier’s previous work on Mina and Hdi was supported by grants from the National Science Foundation.

doi 10.1075/scl.68.08fra
2015 © John Benjamins Publishing Company

2. Basic question

What should be the starting point in typological comparison? The proper object of linguistic typology depends very much on the linguist’s aims and the aspects of the language targeted for study. Therefore, there is no single answer to this question. When it comes to the comparison of what is often referred to as syntactic structures, linguists are as divided with respect to typology as they are with respect to other major theoretical issues. Some linguists agree that the proper object of linguistic typology should be functions (Lazard 2004; Seiler 1995; Haspelmath 2007, 2010); Newmeyer (2007) maintains the notion that formal structures are the starting point. Even for those who favor functions, exactly which functions should be chosen remains a controversial issue. For Frajzyngier and Shay (2003), only functions actually encoded in the language should be targeted. For Haspelmath (2010), following Lazard (2004), one should make an abstraction from the functions actually encoded and instead postulate some canonical functions, which they refer to as ‘comparative concepts’. In practice, the selection of functions is largely intuitive, informed by linguists’ experience with a variety of languages, as evidenced by many chapters in the World Atlas of Language Structures (Haspelmath et al. 2005) and by discussions in Dixon (2010), where a large inventory of functions is given without explicit explanation of why certain categories were selected rather than others.

3. Theoretical assumptions of the present approach

Every grammatical system encodes a finite number of grammaticalized meanings. Those meanings are organized into a system of functional domains, such as Aspect, Reference, Noun modification, etc. The number of functional domains is finite for each language (Frajzyngier & Mycielski 1998). At any given time, the number of grammaticalized meanings in a language is also finite, although languages may grammaticalize new meanings and lose others.



For the purpose of synchronic analysis, the constant diachronic change may be ignored. Each grammaticalized meaning may be realized by one or several constructions, and those constructions do not have to share formal characteristics. One formal means, e.g. reduplication, may encode functions belonging to different domains such as aspect (progressive, habitual or perfective depending on the language), quantification, or plurality. Portmanteau morphemes by their nature encode functions belonging to different domains.

Grammaticalized meanings are organized in a system of functional domains and subdomains. Members of the same subdomain are in complementary distribution: if one occurs, the other cannot. Members of different domains can co-occur with each other. The discovery of the functional domains, subdomains, and grammaticalized meanings constitutes the discovery of the semantic structure of the language (Frajzyngier, submitted). Once one has discovered individual forms, including constructions, one is able to discover functional domains and subdomains through the study of the co-occurrence of various forms. This done, the individual grammaticalized meanings can be described through contrast with other grammaticalized meanings belonging to the same domain.

Given the common physiological, cognitive, and psychological characteristics of humans, and their common social needs, one can expect some degree of overlap in the meanings that have been grammaticalized in individual languages. One can also expect a number of differences across languages with respect to the domains that have been grammaticalized and with respect to individual grammaticalized meanings.

4. How to discover grammaticalized meaning

4.1 General methodology

In order to discover grammaticalized meanings one must first have a full list of the formal means of encoding available for individual languages. Then, one can proceed with the discovery of which formal means of encoding are in complementary and which are in contrastive distribution, very much in the manner of discovering underlying segments in phonological analysis. Forms that can co-occur encode meanings belonging to different domains, unless those forms are components of one means of encoding. Forms that cannot co-occur most probably encode meanings belonging to the same domain. Once again, the description of the grammaticalized meaning is accomplished through comparison with other meanings encoded in the given domain.

Consider the distribution of definite and indefinite articles in English. The fact that a and the cannot simultaneously determine a noun indicates that they belong to the same domain. Extending the argument, the deictics ‘this’ and ‘that’ are also in complementary distribution with definite and indefinite articles. Hence they also belong to the same domain of reference. Consider now the co-occurrence of articles and deictics/demonstratives with possessive pronouns. Determiners and possessive pronouns cannot occur in any sequence with a noun in English: *a my book, *the my book, *this my book, *that my book, *my a book, etc. The constraints on co-occurrence in a sequence may imply that possessive pronouns and determiners also belong to the same domain. But that is actually a false conclusion. In English there exists a construction that is specifically dedicated to accommodating the encoding of functions subsumed under the definite and indefinite articles and the determiners with nominal or pronominal possessors. The construction has the form: DET N of Possessor, followed by a significant pause. The possessor may be nominal or pronominal. If it is pronominal it is marked by a specific set of independent possessive pronouns, e.g. mine, yours, etc., as in (1) (all English examples are from COCA):

(1) a. A friend of mine called me,
    b. All right, Jay-Z, of course, he’s probably a friend of yours.
    c. This friend of his was seen taking $5,000 in cash away from the home as well as a Rolex.
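The co-occurrence test applied to these English forms can be sketched programmatically. The Python fragment below is a minimal illustration, not the authors' procedure: the function name `candidate_domains` and the list of attested pairs are assumptions made for this example. Only a friend of mine, a friend of yours and this friend of his come from (1); the remaining determiner + possessor combinations are assumed to be attested as well. Forms are linked when no construction combines them, and each connected group of linked forms is treated as a candidate domain, which simplifies the actual analysis considerably.

```python
from itertools import combinations


def candidate_domains(forms, attested_pairs):
    """Group forms by complementary distribution (hypothetical helper).

    Two forms are linked when no attested construction combines them;
    each connected group of linked forms is one candidate domain.
    """
    attested = {frozenset(pair) for pair in attested_pairs}
    excluded = {form: set() for form in forms}
    for a, b in combinations(forms, 2):
        if frozenset((a, b)) not in attested:
            excluded[a].add(b)
            excluded[b].add(a)

    domains, seen = [], set()
    for start in forms:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first walk over the "cannot co-occur" links
            form = stack.pop()
            if form in component:
                continue
            component.add(form)
            stack.extend(excluded[form] - component)
        seen |= component
        domains.append(component)
    return domains


determiners = ["a", "the", "this", "that"]
possessors = ["mine", "yours", "his"]
# Assumption: every determiner can combine with every independent
# possessive pronoun in the "DET N of Possessor" construction.
attested_pairs = [(d, p) for d in determiners for p in possessors]

domains = candidate_domains(determiners + possessors, attested_pairs)
print(domains)
```

With these inputs the determiners and the independent possessive pronouns fall into two separate groups, in line with the reasoning in the text. A real corpus study would also have to decide how many attestations count as evidence of co-occurrence.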

So possessive pronouns in English can co-occur with determiners in the same construction, and therefore they belong to different domains. Note that this conclusion has been stated only with respect to English. The structure of other languages may be quite different.

4.2 Formal means of encoding

Languages differ significantly with respect to the types of formal means available and with respect to the composition of each type. We do not yet have a comprehensive list of formal means of encoding available across languages or for any individual language. The following list is illustrative rather than exhaustive.

1. Lexical categories and subcategories. Languages differ in the types of categories they exhibit. Some languages have adjectives; others do not. Some languages have adverbs; others do not.
2. Free grammatical morphemes (prepositions, complementizers, subordinators, etc.).
3. Auxiliary verbs and the use of nouns as an encoding means in the grammatical system (e.g. spatial specifiers in a number of Chadic languages).



Functional domains and cross-linguistic comparability 261

4. Linear orders (a notion slightly different from ‘word order’ as it is used in functional typology: linear orders include different types of encoding through position, relative order and variation on the default order, as described in Frajzyngier 2011a).
5. Tone, stress, various types of vowel harmony, and other phonological means including rhythm, pause, intonation, creaky voice, intensity.
6. Inflectional means on all lexical categories. The inflectional means include affixation, phonological changes of underlying segments, gemination, reduplication, apophony.
7. Serial verb constructions, i.e. the use of verbs to encode grammatical meaning. This formal means is widely attested in languages of Africa, South-East Asia, and Australia.
8. Repetition of phrases comprising a noun and a verb, hence quite distinct from reduplication (Gidar, Frajzyngier 2008).

5. Examples of functional domains

The notion of functional domain has been informally recognized in many traditional grammars, when talking about the tense system, aspectual system, or relative clauses, but the composition of a functional domain is generally not explicitly justified. Thus, in many traditional and contemporary works there is a good deal of hesitation regarding assignment of a given form to a given domain. Typical examples involve tense and aspect, even in some languages with a long and rich tradition of research (consider the early statement by Chaim Rabin about Semitic languages quoted in Izre’el (2002) that some tenses behave like aspects and vice versa). In the present approach, the notion of functional domain is crucial for determining grammaticalized meaning. The functional domain is a set of functions that all share one semantic characteristic, and the forms that realize them are in complementary distribution within a relevant constituent. Thus, if a language has grammaticalized the domain of tense, one should not have two tenses in the same clause (a tense can, however, be realized by a complex construction involving several morphemes, such as inflexion on an auxiliary plus a participial form). If a language has also grammaticalized the domain of aspect, a clause may contain a tense and an aspect, but not two aspects. It may happen that two forms individually encode different functions, and are combined to encode a third function. The following are some of the functional domains grammaticalized in many language families:


– Semantic relations among sentences in discourse;
– Semantic relations among clauses in sentences;
– Mood;
– Aspect;
– Tense;
– A variety of relations between nouns or noun phrases;
– A variety of relations between noun phrases and the predicate;
– Reference system;
– Equational predications;
– Social relationships between the speaker and the addressee(s);
– Spatial relationship between the place of speech and an event;
– Information structure.
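The complementary-distribution criterion stated before this list can be sketched as a simple check: within one constituent, no functional domain should be realized by more than one form. The Python fragment below is only an illustration. The form-to-domain table is hypothetical, the domain label "possession" is an assumption made for the example (the chapter only states that English possessive pronouns belong to a different domain from determiners), and the sketch ignores cases where one meaning is realized by a construction of several morphemes.

```python
from collections import defaultdict

# Hypothetical form-to-domain assignment for a few English forms,
# following the discussion of articles, deictics and possessives.
# The label "possession" is an assumption made for this sketch.
DOMAIN_OF = {
    "a": "reference",
    "the": "reference",
    "this": "reference",
    "that": "reference",
    "mine": "possession",
    "yours": "possession",
}


def domain_clashes(constituent):
    """Return the domains realized by more than one form in a constituent."""
    found = defaultdict(list)
    for form in constituent:
        domain = DOMAIN_OF.get(form)
        if domain is not None:
            found[domain].append(form)
    return {d: forms for d, forms in found.items() if len(forms) > 1}


print(domain_clashes(["a", "friend", "of", "mine"]))  # {}
print(domain_clashes(["the", "a", "book"]))  # {'reference': ['the', 'a']}
```

An empty result means that every domain is realized at most once, as in "a friend of mine"; a non-empty result flags forms that the criterion predicts cannot co-occur.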

An utterance may consist of several grammaticalized meanings. Some meanings may be concatenated and others may be embedded in a hierarchical structure. It will indeed be rare for an utterance to consist of just one grammaticalized meaning. Even a single-word utterance may represent two or more grammaticalized meanings. For example, the imperatives of intransitive verbs consist of the imperative modality and the intransitive predication. Moreover, structures consisting of the same form and the same lexical categories, e.g. coordinating conjunctions of adjectives or verbs, may represent different grammatical meanings across languages (Frajzyngier 2012).

6. Similarities and differences among languages

6.1 Languages may encode different functional domains

An examination of individual languages indicates that they may vary significantly in the functional domains they have grammaticalized. Thus some languages have grammaticalized the domain of aspect, others have grammaticalized the domain of tense, and still others have grammaticalized both. Some languages (e.g. Hausa, West Chadic, Nigeria) have grammaticalized the domain which indicates location of the event with respect to the place of speech, distinguishing between ventive and andative, whereas other languages (e.g. French) have not grammaticalized such a distinction.




6.2 Languages differ in the internal structure of the functional domain

Even if languages have grammaticalized the same functional domains, the internal structure of such domains may differ significantly across languages. This fact has been well documented for the domains of tense and aspect (see Comrie 1976; Dahl 1985; Cohen 1989; Bybee 1994).

6.3 Languages differ in their grammaticalized meanings

Even if languages have grammaticalized the same functional domains and the same subdomains, they may differ in their grammaticalized meanings. If two languages have a subdomain of future, but in one language this consists of two or three tenses, while in another it consists of only one, then the tenses in the subdomains of future in the two languages have different values. Let us consider a specific example. Although English and Hdi (Central Chadic, Nigeria) both have a tense whose time reference is past with respect to the time of speech, the English past tense does not specify what kind of past it is, while the Hdi past tense refers to a past that must be accompanied by a specific time reference (2).

(2) ká-xə̀n mántsá, sí ndá gá ká ndá rvíɗìk
comp-3pl like that past assc where 2sg assc night
‘and they said: where were you last night?’

If the time of the event is past but not specific, the form sí is not used and there is no marking of tense. The clause below was taken from a narrative referring to an event in the past, but in another context it could have meant a habitual or ongoing event (3):

(3) lá-ghà pákáw ghúvì kà mná-n-tá krì
go-d:pvg hyena seq tell-3-ref dog
‘And Hyena said to Dog, go!’ (Frajzyngier with Shay 2002)

Thus, the English past tense does not encode the same time-reference as the Hdi referential past.

7. Potential meanings within the domain of reference

The domain of reference enables the listener to identify the referent of a nominal, pronominal, verbal, or propositional form. The range of subdomains encoded is quite large, and as with all other subdomains, differences across languages are quite significant. The traditional distinction between reference to the environment


of speech (deixis) and reference to the elements in discourse (anaphora) is not sufficient to account for the varieties of forms in various languages. Languages from different families make an additional distinction between the de dicto and de re domains (Frajzyngier 1991, 1997). The de dicto domain includes hypothetical situations and reported speech. The de re domain includes actual rather than hypothetical referents. In some languages, the de dicto domain encodes fewer distinctions than the de re domain. Thus, in Mupun (West Chadic, Nigeria) there is a distinction between the second person masculine and feminine in the de re domain, but there is only one gender, masculine, in the de dicto domain. Here is an example illustrating the de dicto domain, where the masculine pronouns refer to an obviously feminine referent (4): (4)

gaskiya, get kadan ka kə ak ɓe
truly (H.) past if (H.) 2m with pregnancy SEQ
ba də mo pə ıal ɗik n-ka
neg past 3pl prep marry prep-2m
‘Truly, in the past if you (masc.) were pregnant they wouldn’t marry you.’ (Frajzyngier 1993: 88)

7.1 Mere formal categories are not good indicators of function

The system of reference is one of the many systems where formal categorial similarity across languages is not a good predictor of function. Let us consider the category ‘subject pronoun’ whose existence can be established through standard distributional analysis, the analysis of morphological forms, linear orders and other discovery techniques. The functions of subject pronouns in English are quite different from those of Polish (Frajzyngier 1997). In English, subject pronouns sharing the features of gender, person, and number with the preceding subject noun encode coreference with the preceding subject. Subject pronouns in Polish, on the other hand, always encode switch reference or contrastive focus with respect to the preceding subject, regardless of how the preceding subject is marked (Polish examples from Polish National Corpus, NKJP). In example (5), the third person plural subject pronoun oni is deployed even though the verb in the complement clause also encodes the third person plural masculine subject. The reason for the deployment of the pronoun is that the preceding subject is second person singular (5): (5)

I pomyślałaś, że oni nie chcieliby,
conj think:prf:past:2m:f comp 3pl:m neg want:past:hyp:3pl:m
żebyś była takim tchórzem, który
comp:hyp:2sg be:3f:past such:instr coward:instr rel:m




boi się pójść na studia.
fear refl go on study:pl
‘And you thought that they wouldn’t want you to be such a coward not wanting to go on to higher studies’

Coreference in Polish is encoded by the marking of gender, person, and number on the verb. Consider the following example where the complementizer is not followed by a pronoun, and the verb encodes the first person singular subject just like the verb in the matrix clause (6):

(6) Właściwie nie mogę powiedzieć, że ją znam.
rightly neg be able:1sg:pres say comp 3f know:1sg
‘To tell you the truth, I cannot say that I know her.’

One of the important factors for predicting whether identical categories have the same function is the analysis of the functional domain to which the category belongs, and in particular, whether or not there are other forms belonging to the same domain. Thus, for the examples above, the mere fact that Polish has an agreement system in addition to subject pronouns and English does not should tell us that subject pronouns in English and Polish must differ in their function. Consider now a different case. English has only one set of subject pronouns. The third person subject pronoun in the complement clause of the verb of saying encodes coreference with the subject pronoun of the matrix clause (7):

(7) After that, he said he wanted to do some different moves on me…

Mupun, on the other hand, has two sets of subject pronouns, each with its own functions, and neither of which has the same function as the subject pronouns in English. One of the third person subject pronouns in Mupun encodes coreference with the subject of the verb of saying in the matrix clause; the other encodes switch reference with respect to the subject of the matrix clause. The logophoric pronouns, to use Hagège’s (1974) term for pronouns encoding coreference, are morphologically different from the pronouns of the matrix clause. Pronouns encoding switch reference are morphologically identical with the pronouns of the matrix clause (Frajzyngier 1985, 1993). Some of the potential meanings encoded in the systems of reference are listed in the next section (7.2), based partially on Frajzyngier (2011b). The use of a determiner or a deictic with a noun does not necessarily mean that the form involved belongs to the domain of reference. Thus, in many Chadic languages, topicalization (i.e. establishing the topic of a paragraph or of a sentence) is frequently encoded by adding a determiner to a noun. In Mina, this is the standard means of topicalizing a noun, as in example (8), which is the first clause


of a narrative establishing the topic, thus providing the necessary evidence that the determiner does not have a deictic or an anaphoric function:

(8) hìd-yíì wà í tə̀tə̀ màkáɗ
man-PL DEM 3PL 3PL three
‘There were three men.’

7.2 Selected subdomains and grammaticalized meanings within the domain of reference system

In this section we include the subdomains of deixis and anaphoric reference, as well as individual grammaticalized meanings, e.g. a marker indicating that the listener should deduce the identity of a referent from previous discourse, when the referent has not yet been mentioned. The deduced reference marker instructs the listener to identify the referent through a process of deduction using knowledge from various sources, including the listener’s cognitive system, the speech environment, and previous discourse. In Mina (Central Chadic) the deduced reference marker tá may be the only component of a noun phrase or it may be a determiner, modifying another noun or a quantifier (9a–b).

(9) a. žíŋ ngùl-yíi pár sùlúɗ tàn
then man-PL other two DED
í nd-áhà bàhá
3PL go-GO again
nd-á mábàr mbír bàhá kə́ mə̀l tàŋ
go-GO lion leap again INF seize DED
‘Later, when the two men arrived, the lion jumped to catch them.’

b. mbígìŋ wàcíŋ í ɗál ngàm mə̀ts
mbiguin DEM 3PL do because sickness
kə̀ ɗál nə̀ hàyák í hóynə̀ tàŋ
INF do PREP village 3PL calm (F.) DED
‘This mbiguin [a ritual], they do it because there is sickness in the village. They cure it.’

One piece of evidence for the proposed function of the marker tá is that its antecedent need not have been mentioned previously. In fact, the presence of tá explicitly tells the listener that the referent is not the noun marked by tá but some other referent associated with that noun. In the last line of the following fragment (10), tá follows the noun báy ‘chief ’, which has already been mentioned several times. However, the form tá does not identify the chief himself but rather his court, an entity that has not been mentioned in the discourse at all (10):




(10) báy zá ngwáy bàhámàn bákà bá
chief COMP ‘People’ Bahaman today still
dzán-á nòk mí
find-GO 1PL what
‘The chief said, “People, what else did Bahaman find for us today?” ’

hí ndə̀ lùw-á-ŋ mə́ ndà-hà
2PL go say-GO-3SG DEB go-GO
‘ “Go tell him to come here.” ’

ndá yà í y-ù
go call 3PL call-3SG
‘Someone went to call him.’

tíl á nd-á á r báy tàŋ
go 3SG go-GO PRED PREP chief DED
‘He went to the chief’s [court].’ (Frajzyngier 2005: 335)

Such grammaticalized meanings can constitute parts of a subdomain together with other grammaticalized meanings or they may make up a subdomain of their own. Note here the importance of the notion ‘grammaticalized’. In addition to the implication that there are specific formal means that encode a function, it also implies that the function must be encoded if the event referred to contains referents that meet the criteria for the given function. Here is the list of grammaticalized meanings within the domain of reference in some Chadic languages:

– Deixis referring to entities: proximate and remote, with speaker, listener, or neither as point of reference, e.g. Hausa (Jaggar 1994);
– Locative deixis as distinct from locative anaphora, e.g. Mupun (Frajzyngier 1993), Hdi (Frajzyngier with Shay 2002);
– Previous reference to entities, with the distinction between proximate and remote as distinct from deixis and not overlapping with the English definite, e.g. Hdi, Mupun;
– Locative anaphor, proximate and remote, as distinct from locative deixis, e.g. Mupun;
– Deduced reference as described above, e.g. Hdi;
– Instruction to the listener to identify the referent in any way they can, e.g. definite article in English;
– Unspecified, indefinite member of a set;
– Disjoint/conjoined reference (also called switch reference and same reference) in sequential and complement clauses;
– Logophoricity with subject and object in its scope (e.g. Mupun, Frajzyngier 1993), confined to a limited number of matrix clause predicates;


– Encoding the referent as known rather than previously mentioned (first described by Ebert 1971 for Frisian, also present in Mina, Frajzyngier 2005).

8. A proposal for the structure of a database

The theoretical model of Systems Interactions (Frajzyngier & Shay 2003) could be translated into a configuration amenable to software implementation via a database allowing the investigation of the CorpAfroAs corpus. Once the languages under consideration have been examined, the identified grammaticalized meanings would be grouped into subdomains within domains, and the encoding means (i.e. the forms) by which they are expressed would be listed. Constraints on the occurrence of those grammaticalized meanings would be given, as well as the appropriate labels reflecting grammaticalized meanings. If the same functions were found to be encoded in other languages, so much the better. If such parallels were not found, this would contribute to the creation of a non-aprioristic typology. Such an approach would be useful for linking the actual language-internal labeling of the discovered grammatical meanings to their equivalents in various linguistic or typological theories, or in different analytical traditions. Each form belonging to a grammaticalized meaning would be retrievable in the corpus by means of a query pointing to annotations in the two glossing lines “ge” and “rx”.¹ Thus, each file in the database would correspond to a grammaticalized meaning in a particular language, and would contain instructions for retrieval of the associated structures or forms in the CorpAfroAs corpus. If the names of domains and subdomains were shared among languages, this would ensure the possibility of retrieving grammatical meanings (with their associated encoding means) among several languages of the corpus. If the names of domains and subdomains were different, that would demonstrate the extent of cross-linguistic differences.
For example, in Kabyle, ‘identifiability of the head of a relative clause’ is a grammaticalized meaning that is encoded by the use of relator i just after the head noun. This relator has been encoded as REL.REAL on the ge tier, and as PTCL on the rx tier. The following file in the database allows the identification of the grammaticalized meaning, of its insertion in a subdomain and a domain, and the retrieval of the relevant sequences in the Kabyle corpus.

1.  For more details about the contents of ge and rx, see the Introduction to this volume, as well as Mettouchi and Chanard (2010).


Domain | Reference
Subdomain | Reference in relative clause
Grammaticalized meaning | Identifiability of the head of a relative clause
Form | relator i
Constraints | Cannot appear with irrealis mood in the verb of the relative clause
Annotation in ge | REL.REAL
Annotation in rx | PTCL
Retrieval instruction | “look for REL.REAL in ge and PTCL in rx, both annotations vertically aligned on the same morpheme (mb) cell”

(11) ufsinara jinjinənni /
ur fsi-n ara jinjən-nni /
NEG melt\NEGPFV-SBJ3PL.M POSTNEG hearth_stone\ANN.PL.M-CNS /
PTCL V24-AFFX N.INDF N.OV-DEM /

iθəxðəm akkən fəlkanun //
i t-xdəm akkən af lkanun //
REL.REAL SBJ3SG.F-make\PFV thus on hearth\ABSL.SG.M //
DEMPRO PRO-V23 ADV PREP N.COV //
‘the fireplace stones didn’t melt, which she had thus put in the fireplace.’ (KAB_AM_NARR_02_420–422)
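A retrieval instruction of this kind amounts to a search over vertically aligned annotation cells. The following Python sketch shows one way such a query could work; it is an illustration under stated assumptions, not the CorpAfroAs query tool. The `Cell` structure and the `find_aligned` function are hypothetical names, and the split of t-xdəm into two cells is an assumption based on the hyphenation in example (11). The rx label is passed as a parameter because the record lists PTCL for the relator, whereas its cell in example (11) is printed as DEMPRO; the call below uses the label as printed in the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One morpheme (mb) cell with its vertically aligned annotations."""
    mb: str
    ge: str
    rx: str


def find_aligned(cells, ge, rx):
    """Return the cells that carry both the given ge and rx annotations."""
    return [cell for cell in cells if cell.ge == ge and cell.rx == rx]


# Cells transcribed from the second line of example (11).
# Splitting t-xdəm into two cells is an assumption of this sketch.
unit = [
    Cell("i", "REL.REAL", "DEMPRO"),
    Cell("t-", "SBJ3SG.F", "PRO"),
    Cell("xdəm", "make\\PFV", "V23"),
    Cell("akkən", "thus", "ADV"),
    Cell("af", "on", "PREP"),
    Cell("lkanun", "hearth\\ABSL.SG.M", "N.COV"),
]

hits = find_aligned(unit, ge="REL.REAL", rx="DEMPRO")
print([cell.mb for cell in hits])  # ['i']
```

Requiring both labels on the same cell is what "vertically aligned" means here: a REL.REAL gloss whose rx cell carries a different label would not be returned.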

In Mina (see Section 9.1) relative clauses distinguish between two types of referentiality. If the head is not pronominal, the referentiality of the head of the relative clause is marked by the clause-final demonstrative wàcín or wàhín. If the head is pronominal, hence inherently referential, the clause-final demonstrative does not occur. The non-referentiality of the head of the relative clause is unmarked, and does not have a clause-final demonstrative. The grammaticalized meaning ‘referentiality of the head of the relative clause’ seems close to the ‘identifiability of the head of the relative clause’ of Kabyle. Mina is not part of CorpAfroAs, but had it been, the encoding means of the grammaticalized meaning, the relative clause marker wàcín, would have been annotated as DEM in ge and DEM.REL in rx. So it would have been systematically retrievable with its context, thus allowing a thorough investigation of possible similarities between Kabyle and Mina.

Domain | Reference
Subdomain | Reference in relative clause
Grammaticalized meaning | Referentiality of the nominal head of a relative clause
Form | clause-final demonstrative wàcín or wàhín
Constraints | If the head is pronominal (inherently referential), the clause-final demonstrative does not occur
Annotation in ge | DEM
Annotation in rx | DEM.REL
Retrieval instruction | “look for DEM in ge and DEM.REL in rx, both annotations vertically aligned on the same morpheme (mb) cell”
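Because the Kabyle and Mina records share the same domain and subdomain names, they can be retrieved together. The Python sketch below illustrates that lookup; the `MeaningRecord` structure and the `by_subdomain` function are hypothetical names chosen for the example, and the field values are copied from the two records given in this section (the constraints field is omitted for brevity).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeaningRecord:
    """One database file: a grammaticalized meaning in one language."""
    language: str
    domain: str
    subdomain: str
    meaning: str
    form: str
    ge: str
    rx: str


RECORDS = [
    MeaningRecord("Kabyle", "Reference", "Reference in relative clause",
                  "Identifiability of the head of a relative clause",
                  "relator i", "REL.REAL", "PTCL"),
    MeaningRecord("Mina", "Reference", "Reference in relative clause",
                  "Referentiality of the nominal head of a relative clause",
                  "clause-final demonstrative wàcín or wàhín",
                  "DEM", "DEM.REL"),
]


def by_subdomain(records, domain, subdomain):
    """Map each language to its records filed under a domain and subdomain."""
    result = {}
    for rec in records:
        if rec.domain == domain and rec.subdomain == subdomain:
            result.setdefault(rec.language, []).append(rec)
    return result


shared = by_subdomain(RECORDS, "Reference", "Reference in relative clause")
for language, recs in shared.items():
    for rec in recs:
        print(f"{language}: {rec.meaning} (ge={rec.ge}, rx={rec.rx})")
```

A subdomain that returns records for only one language would, in the terms of this proposal, point to a cross-linguistic difference rather than to a retrieval failure.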

A representation of a relative clause in the CorpAfroAs convention would look something like (12a) for a non-referential head:

(12) a. hìdì mə̀ ɓám mə̀kwádàk gə̀r kə́ nzə̀ kə́
man REL eat vulture search INF be like
N Ptcl V N V PTcl V PREP

ngámbə̀-n skù
friend-1SG NEG
N-PRO PRT
‘The man who ate/eats vulture cannot be a friend of mine.’

A representation of the relative clause with the referential head would have the form (12b), where the clause-final demonstrative wàcíŋ codes the existential status of the head, its referentiality:

(12) b. séy hìdì mə̀ ɮà kə̀sáf wàcíŋ
so man REL cut grass DEM
PTcl N PTcl V N DEM

à zá wàcíŋ tá nàŋ
3SG COMP DEM GEN 1SG
PRO PRT DEM PRED PRO
‘The man who cuts grass said, “This is for me.” ’

9. Application of the database to the reference system in Mina and Kabyle

9.1 The domain of reference in Mina

The data in this study comes from Frajzyngier et al. (2005), in some cases with an updated analysis. Mina does not have gender, one of the frequent formal means for encoding reference (Frajzyngier & Shay 2003). Neither is the plurality of nouns obligatorily marked. These two factors reduce the potential use of pronouns as reference markers. The third person pronoun, unmarked for gender, can refer to any noun in the previous discourse. Hence, one can reasonably expect the presence of some other means of encoding to enable reference across discourse. The encoding means relevant for the domain of reference in Mina include:

– Overt use of a noun;
– Absence of a noun;
– A noun followed by a determiner;
– Previous mention markers;
– Deictics;
– A deduced reference marker;
– Independent pronouns;
– Obligatory subject pronouns;
– Object pronouns.

Table 1 lists some grammaticalized meanings in the domain of reference and their associated encoding means in Mina.

Table 1.  Subdomains in the Mina system of reference

Subdomain | Grammaticalized meaning | Encoding means
Deixis | Proximate | (N) wà (can occur alone)
Mention in discourse | First mention (non-topicalized) | Noun alone
 | Remote previous mention | N (-POSS) nákáhá
Known | Instructs the listener to consider preceding noun as known | N wà (can follow remote previous mention)
 | Locative | mè-hín (for inherently nonlocative antecedents); màcín (only for inherently locative antecedents)
Deduced reference | | (N) ta
Coreference/switch reference | Switches reference from immediately preceding antecedent to another, previously mentioned. The antecedent may be a noun phrase or an event | (PREP) mbí, or use of the third person singular subject pronoun
 | Coreference | No subject pronoun
Indefinite | Always in the domain de re, ‘one of a set’ | N ɗáhà
Unspecified | Human | ví ‘who’, í ‘3PL’
 | Non-human | ú (object)
 | Place | váy (directional), tíkì (stative), ngíɗ ‘place other than the place of speech’
Head of relative clause | Non referential | No determiner after the relative clause
 | Referential | Rel. Clause wàcín or wàhín

9.2 The domain of reference in Kabyle

The encoding means relevant for the domain of reference in Kabyle include:

– Gender marking on nouns and adjectives;
– Number marking on nouns and adjectives, both singular and plural;
– Overt use of a noun;
– Absence of a noun;
– Independent pronouns;
– Deictics;
– Obligatory subject pronouns;
– Optional object pronouns;
– Interrogative pronouns;
– Relators;
– Unspecified nouns.

Table 2 lists some grammaticalized meanings in the domain of reference and their associated encoding means in Kabyle.

Table 2.  Subdomains in the Kabyle system of reference

Subdomains | Grammaticalized meanings | Encoding means
In all types of clauses
Deixis | Proximal Deixis | Affixed demonstratives: -a ; -agi ; -agini
 | Distal Deixis | Affixed demonstratives: -in ; -inna ; -ihin ; -ihinna
 | Known reference | Affixed demonstrative -nni
 | Coreference | bound pronouns
 | Switch reference | independent pronouns
Specification | Unspecified entity nouns | jiwən (‘one’, animate), aʃəmma (‘thing’), ara (‘thing’ in negative contexts)
In relative clauses
Identifiability | Identifiability of the head in relative clause | relator i following the head
Specification | Unspecified antecedent | animate pronoun (with gender and number distinctions: wid, tid, win, tin); non-animate pronoun (no gender-number distinctions: ajən)
Identification | Unidentified referent | interrogative pronouns: animate (with gender and number distinctions: anwa, anta, anwi, anti), non-animate (no gender or number distinctions: aʃu), place (anda)

All of the grammaticalized meanings that constitute this domain are retrievable in the corpus, with the relevant search instructions. For instance, -agi can be found by looking for [proxb in ge and AFFX in rx]; win by looking for [the_one\SG.M in ge and indf.PRO in rx]; relator i by looking for [REL.REAL in ge and DEM.PRO in rx], etc. In this way, it is possible to examine in greater detail not only the general context of occurrence of the different encoding means and their frequency, but also their combination with other predications.

10. Comparison of the domain of reference in Mina and Kabyle

We can now compare the structure of the domain of reference, and the grammaticalized meanings in the domain of reference in two languages belonging to the same phylum. At this stage of analysis, we cannot be absolutely sure that the subdomains as listed above are indeed properly delimited nor can we be sure that all the grammaticalized meanings have been properly identified. To give an example of the issues involved, Frajzyngier et al. (2005) analyze the marker ta (phrase-final form tàŋ and táŋ) in Mina as a remote deictic marker for entities and a deduced reference marker. However, upon closer scrutiny, no natural language evidence emerges for the remote deixis function of ta. All examples that were taken to represent the remote deictics could equally well be analyzed as representing the deduced reference marker, which is why only the latter has been retained in Table 1. The table demonstrates an asymmetry in the subdomain of deixis: there is only the grammaticalized meaning of proximate deixis and there is no remote deixis.


The cross-linguistic comparison has at least two obvious benefits. The first is the possibility of discovering ways in which languages are similar and different, with respect to a given domain. The second benefit — much more important — is heuristic: it provides an indication of possible directions for future research. If there are differences across related languages, even remotely related ones, one would like to find the reasons for those differences, similarity among languages of the same family being the default assumption. A comparison of the subdomains and the individual grammaticalized meanings within the domain of reference between Mina and Kabyle reveals the following similarities and differences:

Subdomain of deixis

The subdomain of deixis is present in both languages. While Mina has only a proximal deictic marker, Kabyle has two series, proximal and distal, each differentiated for gender and number. A possible avenue for future research would be to explore whether it is really the case that Mina does not have a remote deictic marker.

Mention in discourse

Kabyle does not appear to have the domain ‘Mention in discourse’. Mina, on the other hand, has a specific form restricted to the first mention in discourse and remote previous mention. Possible research questions to be pursued are: is it indeed the case that (a) Mina encodes remote previous mention, and that (b) Kabyle has not grammaticalized any meanings related to previous mention?

Known reference

Both Kabyle and Mina have grammaticalized the category ‘Known reference’. Although sharing of the category between two related languages is the default expectation, one should nevertheless be careful here given the fact that this category is otherwise quite rare across languages. One of the early descriptions of such a category was Ebert’s (1971) work on Frisian. In Mina, the category ‘known’ is marked by the form wa, with its clause-final variants wàcín or wàhín. The marker indicates that the referent is to be treated as a known entity, regardless of whether the listener actually knows the referent. The source of the knowledge could be previous mention in discourse, regardless of the distance between the previous mention and the current mention, or it could be some other source of knowledge (see Frajzyngier et al. 2005). The source of knowledge could also be what is generally expected from any speaker of the language. The category ‘known’ is different




from ‘previous mention’. All previous mentions are known, but not every known element has a previous mention as its source, as the following Mina examples illustrate. In (13), baboon is mentioned in the immediately preceding sentence. In (14), the teacher was the topic of the previous paragraph, but last mention as màllúm was five sentences earlier. In between there were several other participants mentioned. In (15) the noun mìšìl was mentioned five clauses earlier, the act of stealing two clauses earlier.

(13) kwáyàŋ à ndíŋ bə̀ làkáf wàcíŋ
squirrel 3SG fear ASSC baboon DEM
‘The squirrel was afraid of that baboon’

(14) nd-á ɗéw ká á bə́r màllúm wàcíŋ
go-GO sit POS PRED side marabout DEM
‘He came to sit next to this teacher.’

(15) í k̀ə́ mə̀l zá á n mìšíl wàhín
3PL INF seize EE PRED PREP theft DEM
‘They arrested him for stealing.’

Consider the following fragment, where in the first sentence the noun fòrə́m ‘horn’ is encoded by the remote previous mention marker nákáhà. In its mention in the next sentence, it is followed by the form wà:

(16) í hók rə̀ wàcíŋ séy wàl wà
3PL lift D.HAB DEM then wife DEM
ɓə̀t á ɓə̀t fòrə́m nákà bə́ vènjéh
take 3SG take horn REM ASSC pepper
ɗíyà á ɗí kə́ nə̀ mà
put 3SG put in PREP mouth
‘When they were lifting [the stones], the wife took the horn which contained pepper and put it in her mouth.’

íf á íf-é tə́ n fòrə́m wà dàp
blow 3SG blow-GO GEN PREP horn DEM just
‘She just blew out what was in the horn.’

Mina, unlike Kabyle, has grammaticalized reference to a known locative argument, with further differentiation between reference to inherently locative and inherently non-locative noun phrases. The facts in Mina are consistent with the existence of locative predication in this language.

276 Zygmunt Frajzyngier and Amina Mettouchi

Deduced reference

Mina has the category of deduced reference and Kabyle does not. The obvious research task is to verify that the analysis of the marker ta in Mina really holds. Another is to explore whether natural discourse clauses containing this marker should instead be analyzed as containing remote deixis, a category found in Kabyle but not in Mina.

Coreference and switch reference

Both languages have grammaticalized the distinction between coreference and switch reference. The interesting and not too difficult research question here is why Kabyle and Mina use different means to encode the two grammaticalized meanings within this subdomain.

Unspecified reference

Both languages have grammaticalized the notion of unspecified reference. Kabyle makes a distinction between animate and inanimate, and Mina makes a distinction between human, non-human, place, and time. In addition, there is a further distinction in Mina between various types of unspecified place. Again this is consistent with the existence of locative predication in Mina.

Reference in relative clauses

Both languages encode the referential status of the head of the relative clause. Mina marks the non-referentiality of the head through the absence of a determiner after the relative clause, and referentiality through the presence of the determiner wàcín after the relative clause. Kabyle has a richer system, which encodes identifiability of the head: an unspecified head, and an unidentified head of the relative clause. The research question that emerges here is why the two languages encode different categories, and why the categories are encoded by specific means. The answer to the first question appears to be related to the number of encoding means in relative clauses in each language. Kabyle has a rich system of relative and interrogative pronouns encoding the gender and the number of the head of the relative clause. Mina, on the other hand, does not have a gender distinction, and the relative marker is the same for all heads of relative clauses. This comparison of the various encoding means in Mina and Kabyle indicates a certain complementarity. Kabyle has gender and a robust number encoding on nouns; Mina does not have a gender system and productive encoding of number




on nouns is limited to humans and larger domestic animals. Hence pronouns in Kabyle, which encode gender and number, are a better means of encoding reference in discourse. Since Mina lacks gender and has only a weak encoding of number, it has grammaticalized other means of encoding, namely the specific switch reference marker for the third person, and the deduced reference marker.

11. Conclusion

The study contributes to the theory and methodology of non-aprioristic typological research. By choosing a functional domain that is encoded in at least two of the languages studied, we can provide a systematic account of the typological differences between the languages under investigation. The differences pertain to grammaticalization of domains, grammaticalization of subdomains, and grammaticalization of specific meanings within each domain. The proposed theory and methodology have considerable heuristic value, as they indicate what should be investigated for each domain studied in order to explain the differences among related languages. Moreover, they also indicate the potential cause-and-effect relationships behind the differences among languages. The proposed theory will be implemented in a follow-up project to CorpAfroAs, also funded by the Agence Nationale de la Recherche, entitled CorTypo.2 The work done in CorpAfroAs will provide the basis for the retrieval of forms in context and will allow complex language-internal searches to be conducted; the database will allow non-aprioristic cross-linguistic comparison among the languages of the corpus.
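The comparison summarized in this chapter can be sketched as a small data structure. The Python fragment below is only an illustration and is not part of the CorpAfroAs or CorTypo tooling: the category labels and the names GRAMMATICALIZED and compare are hypothetical shorthand for the subdomains of reference discussed above, and the sets record only what the text states about Kabyle and Mina.

```python
# Illustrative sketch: which subdomains of reference the chapter reports as
# grammaticalized in each language. Labels are shorthand invented for this
# example, not CorpAfroAs tags.
GRAMMATICALIZED = {
    "Kabyle": {
        "known reference",
        "remote deixis",
        "coreference vs switch reference",
        "unspecified reference",
        "referential status of relative-clause head",
    },
    "Mina": {
        "known reference",
        "known locative argument",
        "deduced reference",
        "coreference vs switch reference",
        "unspecified reference",
        "referential status of relative-clause head",
    },
}


def compare(lang_a, lang_b, table=GRAMMATICALIZED):
    """Return the shared categories and those found in only one language."""
    a, b = table[lang_a], table[lang_b]
    return {
        "shared": sorted(a & b),
        f"only_{lang_a}": sorted(a - b),
        f"only_{lang_b}": sorted(b - a),
    }


result = compare("Kabyle", "Mina")
for key, categories in result.items():
    print(key, categories)
```

A shared category in this sketch says nothing about the encoding means, which, as the chapter stresses, differ between the two languages even where the category itself is shared.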

List of abbreviations

ASSC   associative
COMP   complementizer
CONJ   conjunction
D      dependent
DEB    debitive
DED    deduced reference
DEM    demonstrative
EE     end-of-event
F      feminine
M      masculine
NEG    negative
PAST   past tense
PL     plural
PRED   predicator
PREP   preposition
PRF    perfective
pro    pronoun
ptcl   particle

2. 

F.     Fula
GEN    genitive
GO     goal
H.     Hausa
HAB    habitual
HYP    hypothetical
PVG    point of view of goal
REF    referential
REFL   reflexive
REM    remote
SEQ    sequential
SG     singular

References

Bybee, Joan L., Perkins, Revere Dale & Pagliuca, William. 1994. The Evolution of Grammar: Tense, Aspect, and Modality in the Languages of the World. Chicago IL: University of Chicago Press.
COCA - Corpus of Contemporary American English.
Cohen, David. 1989. L’aspect verbal. Paris: Presses Universitaires de France.
Comrie, Bernard. 1976. Aspect: An Introduction to the Study of Verbal Aspect and Related Problems. Cambridge: CUP.
Dahl, Östen. 1985. Tense and Aspect Systems. Oxford: Blackwell.
Dixon, Robert M. W. 2010. Basic linguistic theory, Vol 2: Grammatical topics. Oxford: OUP.
Ebert, Karen. 1971. Referenz, Sprechsituation und die bestimmten Artikel in einem nordfriesischen Dialekt [Studien und Materialien 4]. Bredstedt: Nordfriisk Instituut.
Frajzyngier, Zygmunt. 1985. Logophoric systems in Chadic. Journal of African Languages and Linguistics 7: 23–37. DOI: 10.1515/jall.1985.7.1.23
Frajzyngier, Zygmunt. 1991. The de dicto domain in language. In Approaches to Grammaticalization, Vol. 1 [Typological Studies in Language 19], Elizabeth Closs Traugott & Bernd Heine (eds), 219–251. Amsterdam: John Benjamins. DOI: 10.1075/tsl.19.1.11fra
Frajzyngier, Zygmunt. 1993. A Grammar of Mupun. Berlin: Reimer.
Frajzyngier, Zygmunt. 1997. Pronouns and agreement: Systems interaction in the coding of reference. In Atomism and Binding, Hans Benis, Pierre Pica & Johan Rooryck (eds), 115–140. Dordrecht: Foris.
Frajzyngier, Zygmunt. 2008. A Grammar of Gidar. Bern: Peter Lang.
Frajzyngier, Zygmunt. 2010. Cross-linguistic comparison as a heuristic device: What are object pronouns good for? In Essais de linguistique générale et de typologie linguistique, Frank Floricic (ed.), 63–86. Lyon: ENS Éditions.
Frajzyngier, Zygmunt. 2011a. Les fonctions de l’ordre linéaire des constituants. Bulletin de la Société de Linguistique de Paris 107(1): 7–37.
Frajzyngier, Zygmunt. 2011b. Grammaticalization of the reference systems. In Handbook of Grammaticalization, Bernd Heine & Heiko Narog (eds), 625–635. Oxford: OUP.
Frajzyngier, Zygmunt. 2012. Theoretical bases for differential marking of grammatical and semantic relations of noun phrases: The proper domain for argument-adjunct distinction. Paper given at SLE Conference in Stockholm, September.
Frajzyngier, Zygmunt. 2013. Non-aprioristic typology as a discovery tool. In Functional-Historical Approaches to Explanation: In Honor of Scott DeLancey [Typological Studies in Language 103], Tim Thornes, Erik Andvik, Gwendolyn Hyslop & Joana Jansen (eds). Amsterdam: John Benjamins. DOI: 10.1075/tsl.103
Frajzyngier, Zygmunt. Submitted. Semantic prerequisites for typology of functional categories.




Frajzyngier, Zygmunt, Johnston, Eric with Edwards, Adrian. 2005. A Grammar of Mina. Berlin: Mouton de Gruyter. DOI: 10.1515/9783110893908
Frajzyngier, Zygmunt & Mycielski, Jan. 1998. On some fundamental problems of mathematical linguistics. In Mathematical and Computational Analysis of Natural Language [Studies in Functional and Structural Linguistics 45], Carlos Martin-Vide (ed.), 295–310. Amsterdam: John Benjamins. DOI: 10.1075/sfsl.45.27fra
Frajzyngier, Zygmunt & Shay, Erin. 2003. Explaining Language Structure through Systems Interaction [Studies in Language Companion Series 55]. Amsterdam: John Benjamins. DOI: 10.1075/tsl.55
Hagège, Claude. 1974. Les pronoms logophoriques. Bulletin de la Société de Linguistique de Paris 69(1): 287–310.
Haspelmath, Martin. 2007. Pre-established categories don’t exist: Consequences for language description and typology. Linguistic Typology 11: 119–132. DOI: 10.1515/LINGTY.2007.011
Haspelmath, Martin. 2010. Comparative concepts and descriptive categories in cross-linguistic studies. Language 86(3): 663–687. DOI: 10.1353/lan.2010.0021
Haspelmath, Martin, Dryer, Matthew S., Gil, David & Comrie, Bernard. 2005. The World Atlas of Language Structures. Oxford: OUP.
Izre’el, Shlomo. 2002. Preface. In Semitic linguistics: The state of the art at the turn of the twenty-first century, Shlomo Izre’el (ed.). Tel-Aviv: Eisenbrauns.
Jaggar, Philip J. 1994. The space and time adverbials NAN/CAN in Hausa: Cracking the deictic code. Language Sciences 16(3-4): 387–421. DOI: 10.1016/0388-0001(94)90010-8
Lazard, Gilbert. 2004. On the status of linguistics with particular regard to typology. Linguistic Review 21: 389–411. DOI: 10.1515/tlir.2004.21.3-4.389
Mettouchi, Amina & Chanard, Christian. 2010. From fieldwork to annotated corpora: The CorpAfroAs project. Faits de Langues-Les Cahiers 2: 255–265.
Newmeyer, Frederick. 2007. Linguistic typology requires cross-linguistic formal categories. Linguistic Typology 11(1): 133–157. DOI: 10.1515/LINGTY.2007.012
NKJP - Narodowy Korpus Języka Polskiego.
Seiler, Hansjakob. 1995. Cognitive-conceptual structure and linguistic encoding: Language universals and typology in the UNITYP framework. In Approaches to Language Typology, Masayoshi Shibatani & Theodora Bynon (eds), 273–326. Oxford: Clarendon Press.

Part 4

Language contact

Language contact, borrowing and codeswitching

Stefano Manfredi, Marie-Claude Simeone-Senelle and Mauro Tosco

SeDyL (UMR 8202), Inalco, CNRS / LLACAN (UMR 8135), Inalco, CNRS, PRES Sorbonne Paris-Cité / Università di Torino

Within the larger rubric of language contact we will analyze in this chapter the two phenomena of lexical borrowing and codeswitching as represented in the languages of the CorpAfroAs database. After establishing a theoretical background concerning the difficult distinction between borrowing and codeswitching (§ 1), the study analyzes the semantic, phonological and morphological integration of lexical borrowings in different languages of the corpus (§ 2). The core of the paper (§ 3) focuses on the relation between morphosyntactic and prosodic constraints of codeswitching in CorpAfroAs. Finally, the study argues (§ 4) that, even though syntactic constituency admittedly tells us a great deal about the types of boundaries where speakers are likely to codeswitch, prosodic segmentation plays a pivotal role in the definition of codeswitching. Furthermore, we will show that variation in intonation contours provides a good litmus test for telling the two phenomena of borrowing and codeswitching apart.

1. Theoretical framework

One of the very first issues faced in CorpAfroAs was the presence of “foreign” elements in the languages of the corpus. In the context of our work, the problem mainly centered around the morphosyntactic status of such material and, therefore, its glossing: should such “aliens” be treated according to the rules and glosses of the recipient language? Or rather according to those of the donor language? Or should they not be glossed at all, marking their “foreign” status through the very absence of glossing? The first solution would imply that the foreign element is, all things considered, not that foreign at all (because a glossing in the target language is possible); the second choice would mean, if taken to its logical conclusion, that two separate morphosyntactic systems co-exist within one and the same language.

doi 10.1075/scl.68.09man 2015 © John Benjamins Publishing Company


This in turn would entail the momentous implication that these systems qualify as mixed languages (along the lines of such well-known and disputed cases as Michif or Media Lengua; cf. Bakker and Matras 2003 for a general discussion). As to the third solution, it not only looks like an easy escape hatch: it could also be taken to imply that foreign material is unanalyzed in the target language, and therefore retains its grammatical independence. An even more basic question then comes to the fore: just how “foreign” must material be to be considered foreign? And who is to assess its foreignness?

Generally speaking, linguists view borrowing (hereafter BORR) and codeswitching (hereafter CSW)1 as forming a continuum, with codeswitching providing the means by which new words can be introduced into the recipient language (Heath 1989; Romaine 1989; Myers-Scotton 1992). We consider lexical borrowing as a synonym of loanword (cf. Tadmor 2009) and an integral part of the language in which it occurs, without delving into its etymology and its age. For instance, in English both fellow (ultimately from Old Norse fēlagi and attested before the 12th century) and kamikaze (from Japanese and first attested in English in 1945) are part of the same wide category of lexical borrowings. The distinction between older and newer borrowings, as well as the degree of their adaptation in the recipient language, is therefore largely immaterial, even when the alien status of such foreign material is morphophonologically apparent and well known to the speaker. Lexical borrowings are therefore simply glossed according to the rules of the recipient language, with the optional addition of the label BORR in the rx tier (see § 2). Obviously, the impact of foreign material may be considerable: for example, 40% of the Kabyle lexicon is thought to be of Arabic origin, and made up of both ancient and recent borrowings.
The same is true of Hausa, again with Arabic as the donor language, and possibly other languages of the corpus.

As to CSW, it has been variously defined as ‘the juxtaposition within the same speech exchange of passages of speech belonging to two different grammatical systems or subsystems’ (Gumperz 1982: 59), or as ‘the alternative use by bilinguals of two or more languages in the same conversation’ (Milroy & Muysken 1995: 7). Matras, for his part, simply defines CSW as ‘the alternation of languages within a conversation’ (Matras 2009: 101). While the alternation of material from different codes is certainly part of any definition of CSW, the weakness of Matras’s definition rests on a poorly-defined concept of ‘conversation.’ It could for example encompass the case of a conversation among speakers of closely related languages or

1. We use the abbreviation CSW for codeswitching, rather than the more usual CS, because CSW is the label for codeswitched material in the rx line of CorpAfroAs. Moreover, CS is actually used as a gloss for the Construct Case in different Semitic and Berber varieties of the corpus.




varieties of the same language where each speaker uses her/his code and a certain degree of reciprocal accommodation is made: if such an accommodation consists, for example, in avoiding words and constructions which are foreign to each other’s variant and does not involve the use of the other’s variety within one’s own utterances, by our definition there is no CSW.

Any definition of CSW must further take into account its relationship with BORR. The notion of a continuum between BORR and CSW is stressed by Matras, who proposes the following multi-dimensional scale (Figure 1). In Matras’s view, there is therefore no theoretical boundary between BORR and CSW. There are certainly many borderline cases between CSW and BORR, as the following sections will exemplify with data from the corpus. Still, in contrast to Matras, we maintain that a distinction is necessary and possible (on both theoretical and heuristic grounds, as detailed below). The definition of CSW as a ‘linguistic or discourse practice in which elements and items from two or more linguistic systems, or codes – be they different languages or varieties of a language – are used in the same language act or interaction’ (Mejdell 2005: 414) is inadequate to distinguish it from BORR: the crux of the matter lies, of course, in the use of the English preposition from, which can imply either a synchronic or a diachronic transfer. For the same reason, the even broader definition of CSW as ‘the juxtaposition of

Bilinguality: bilingual speaker ↔ monolingual speaker
Composition: elaborate utterance/phrase ↔ single lexical item
Functionality: special conversational effect, stylistic choice ↔ default expression
Unique referent (specificity): lexical ↔ para-lexical
Operationality: core vocabulary ↔ grammatical operations
Regularity: single occurrence ↔ regular occurrence
Structural integration: not integrated ↔ integrated
codeswitching ↔ borrowing

Figure 1.  A bidirectional Codeswitching-Borrowing Continuum (Matras 2009: 111)


elements from two (or more) languages or dialects’ (McCormick 1994: 581) is also insufficient. For our part, we propose to define CSW broadly as follows:

CSW is the presence of lexical or sentential material belonging to different linguistic systems, provided that its different origin is still transparent in the speaker’s output in one or more grammatical domains.

The definition implies that the distinctive feature of CSW is the simultaneous presence of two (or more) codes (Matras’s ‘alternation of languages’) but it makes no references to a necessary boundary as to the stretch of linguistic material (the sentence, the utterance, or the conversation) encompassed by CSW; in order to distinguish CSW from BORR, reference is made here to the native or foreign status of the material in the speaker’s output, and not to its origin. The speaker’s output, in turn, can be assessed on the basis of the morphophonological integration of the foreign material, which is generally considered pivotal in distinguishing CSW from BORR: BORR is morphophonologically integrated in the recipient language; CSW is typically not. The borrowing process can affect the recipient language, causing it to change its phonomorphological rules (i.e., the canonical shape of words). Therefore, we do not make a distinction between integrated and non- (or partially) integrated borrowing. Some degree of integration is a necessary component of BORR, but only a possibility in CSW, where the phonology of the switched elements can be influenced by the speaker’s native language (see § 3.1.2., § 3.2.).2 It must be noted that our definition does not make reference to discourse and sociolinguistic conditions, which are nevertheless crucial in the rise of CSW as a social phenomenon. Switched elements are part and parcel of another language, with which the speaker must be at least partially conversant: while lexical borrowings can be used by monolinguals, CSW is always the production of (at least partial) bi- and multilinguals. Of course, bi- and multilingualism is a necessary but not sufficient condition for CSW. In our corpus, Beja represents a case of widespread societal bilingualism with Sudanese Arabic and very little CSW (s. also § 3.1.1.). 
We further assume that, unlike BORR, CSW is provisional and determined mainly by pragmatic factors: it is at least in principle the result of a choice. It has social and psychological values, and these values are at least partially shared by

2. It should also be remarked that a former BORR might become the object of CSW when contact with the donor language persists or is renewed. For example, in ‘Afar of Djibouti, the very toponym Djibouti (whose etymon is contested between Somali, ‘Afar and Arabic) is phonologically integrated as /gebo:ti/. When it is pronounced /ʤibu:ti/ it could instead be considered a CSW, even if very probably a very common one. The same applies of course to many toponyms, and not only in ‘Afar.




the community of speakers: this is the reason why a (at least partially, imperfectly) bilingual community is necessary in order to have CSW. The “community” itself may be minimal, consisting just of the participants in a dialogue, provided they share the same language or variety. This is also why CSW is, by our definition, a less common phenomenon than BORR. BORR, of course, does not exclude individual and societal bilingualism; it simply does not require it. Even a minimal degree of exposure to a foreign culture and its language (hardly qualifying as bilingualism) may trigger a significant amount of cultural borrowing. The flood of American loans in Western European languages in the post-WW2 period (although hardly stemming from a “minimal degree of cultural exposure”) is a case in point, as it certainly was not triggered by massive bilingualism. This is also evident in the case of the only pidgin language in our corpus, i.e., Juba Arabic (and this notwithstanding that for a part of the community of speakers Juba Arabic is a creole (Tosco and Manfredi 2013)): the speakers of a pidgin are by definition (first, or native) speakers of “something else.” In our case, the Juba Arabic corpus was collected from speakers who are trilingual in Juba Arabic, Sudanese Arabic, and their ethnic, and presumably native, Nilotic language. The case of Arabic “dialects” in the corpus is similar but not identical, because Arabic diglossia is a matter of functional register. Still, we agree with Mejdell (2005) in considering Modern Standard Arabic as a separate language and it therefore follows that diglossia in Arabic dialects is an instance of CSW.

Table 1. Borrowing and codeswitching in CorpAfroAs

Language            (Mainly) borrowing from      (Mainly) switching towards
(Djibouti) ‘Afar    Arabic, French               French
Beja                Arabic                       Arabic
Gawwada             Amharic                      Amharic
Hausa               Arabic, English              English, Arabic
Hebrew              English, Arabic
Juba Arabic         English, Bari                English, Arabic
Kabyle              Arabic                       French, Arabic
Moroccan Arabic     French, Spanish, English     French, Modern Standard Arabic, Spanish
Tripoli Arabic      Italian, Turkish, English    Modern Standard Arabic
Ts’amakko           Amharic, Hamer               Hamer
Wolaytta            Amharic
Zaar                Hausa, Arabic                Hausa, English
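As a reading aid, the contents of Table 1 can be held in a small lookup structure and queried. The Python sketch below is an illustration only: the name TABLE_1 and the function switch_only_targets are hypothetical, and it assumes that the single entry extracted for Hebrew and for Wolaytta belongs to the borrowing column, with the switching column left empty. It lists, for each language, the main switching targets that are not also among its main borrowing donors.

```python
# Table 1 as a dictionary: language -> (main borrowing donors, main switch targets).
# Assumption: for Hebrew and Wolaytta the only entry is read as "borrowing from".
TABLE_1 = {
    "(Djibouti) ‘Afar": (["Arabic", "French"], ["French"]),
    "Beja": (["Arabic"], ["Arabic"]),
    "Gawwada": (["Amharic"], ["Amharic"]),
    "Hausa": (["Arabic", "English"], ["English", "Arabic"]),
    "Hebrew": (["English", "Arabic"], []),
    "Juba Arabic": (["English", "Bari"], ["English", "Arabic"]),
    "Kabyle": (["Arabic"], ["French", "Arabic"]),
    "Moroccan Arabic": (
        ["French", "Spanish", "English"],
        ["French", "Modern Standard Arabic", "Spanish"],
    ),
    "Tripoli Arabic": (["Italian", "Turkish", "English"], ["Modern Standard Arabic"]),
    "Ts’amakko": (["Amharic", "Hamer"], ["Hamer"]),
    "Wolaytta": (["Amharic"], []),
    "Zaar": (["Hausa", "Arabic"], ["Hausa", "English"]),
}


def switch_only_targets(table=TABLE_1):
    """Map each language to switch targets that are not main borrowing donors."""
    result = {}
    for language, (donors, targets) in table.items():
        extra = [t for t in targets if t not in donors]
        if extra:
            result[language] = extra
    return result


mismatches = switch_only_targets()
for language, targets in mismatches.items():
    print(language, "->", ", ".join(targets))
```

Under these assumptions the query singles out the Arabic varieties switching towards Modern Standard Arabic, which matches the treatment of diglossia as CSW described above.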


Table 1 shows the main donor of (at least recent) borrowings and the main target language in CSW for each language of the CorpAfroAs corpus. In this context, early lexical borrowing in ancient layers of the languages is largely disregarded. The following sections will show how different cases of CSW and BORR were handled in the languages of the corpus.

2. Lexical borrowing

In CorpAfroAs as a whole there are very few lexical items glossed as BORR and in some of the corpora (e.g. Gawwada, Ts’amakko, Wolaytta, Tamasheq and Tripoli Arabic), none has been singled out as such. This is not because these languages never integrated ‘foreign’ lexical elements, but rather because we decided that, being a diachronic phenomenon, lexical borrowing did not represent a retrieval priority and we thus chose to optionally mark it by means of the label BORR. Furthermore, the identification of a borrowed item always depends on both structural and socio-historical assessments. With regard to this, it should be remarked that all the languages of CorpAfroAs belong to the same genetic phylum, and that some of them have been in contact with each other for centuries. Given this overall situation, it may be very difficult to evaluate the degree of morphophonological integration of a given lexical item from one Afro-Asiatic language to another, and therefore to identify it as BORR. In the Kabyle corpus, it is noteworthy that the verb ṛuħ ‘leave’, which is borrowed from the Arabic imperfective paradigm *i-ṛūħ, is largely used by monolingual speakers and has the same paradigm as any Berber verb. That being so, it can be considered as an instance of lexical borrowing. However, it is never marked as BORR because it belongs to an ancient layer of the recipient language.
In contrast, the Arabic vernacular phrase la bās ‘all right’, which has been integrated in Kabyle as a verb labas ‘be in good health’ that can be normally inflected with personal affixes (see KAB_AM_NARR_01_994), is overtly marked as BORR due to its relatively recent integration into the recipient language. This means that, independently from the degree of morphophonological integration, the criteria for labelling borrowings as such mainly depended on the linguist’s evaluation of the different socio-historical dynamics related to their integration. The vast majority of borrowed lexical items in CorpAfroAs are nouns. In point of fact, nouns are more easily borrowed than any other word class (Haspelmath 2008: 50). With regard to this, Matras (2009: 150) has observed that “the high borrowability of nouns is a product of their referential functions since nouns cover the most differentiated domains for labelling concepts, objects, and roles”. These nominal lexical fillers are also called ‘non-core borrowings’ (Myers-Scotton 2002: 239)




because they are typically related to semantic references associated with previously unknown objects and concepts. Non-core borrowings are necessary, as they fill a gap in the mental lexicon of the speaker (or rather, the loanword filled it when it was first incorporated and established in the target language). Examples of such nominal non-core borrowings are ṛaḍjo in Moroccan Arabic (ARY_DC_NARR_3_31, borrowed from French radio), doktorá ‘PhD’ in Juba Arabic (PGA_SM_CON_2_SP1_270, borrowed from French doctorat, through Sudanese Arabic) or fíːber in ‘Afar ‘boat with a fibreglass hull’ (borrowed from English fiberglass, through Yemeni Arabic or maybe, in Djibouti, directly from French fibre de verre).3 The morphophonological integration of a borrowed item in the recipient language may be accomplished to a greater or lesser extent. On a very low degree, a given borrowed word can still display very different phonetic realizations. In Moroccan Arabic of Ceuta, the Spanish noun esparteña ‘sandals with rope-soles’ is unpredictably realized as spardiːna (ARY_AV_NARR_04_626) or sbərdiːna (ARY_AV_NARR_03_635) by the same speaker. In the first case, the presence of the voiceless bilabial /p/, which is absent in the consonant system of the recipient language, signals a lower degree of phonological integration. In the second case, the voicing of /p/ together with vowel centralization affecting the first syllable of the borrowed item, illustrate a complete phonological integration. This variation exemplifies an on-going process of integration of a lexical borrowing and it also shows how blurred the border can be between BORR and CSW. However, it should be noted that in both occurrences the syllabic structure of the lexical borrowing is modified by an aphaeresis of the initial-vowel of the first syllable. Other borrowings are subject to a higher degree of integration, to such a point that the word is hardly detectable as a foreign word. 
This is the case, for example, with the verb hiːjad ‘sew’ in Beja.

(1) tx  tithiːjad /
    mot tithiːjad /
    mb  ti= t- hiːjad /
    ge  DEF.F= 3F.SG- sew\IPFV /
    rx  REL= PNG- V1.BORR.ARA /
    ft  that she sews
    BEJ_MV_NARR_17_shoemaker_224

3. This compound noun is truncated and is integrated into the ‘Afar lexicon with a different meaning, since it is used to refer to the whole object made of fibre, through a synecdoche expressing pars pro toto. Furthermore, stress displacement clearly characterises this noun as a borrowed item. If this noun is borrowed from English, then the long vowel [iː] is explained by the absence of diphthongs in ‘Afar. If, on the contrary, it is borrowed from French, the item is integrated with a syllabic change due to the impossibility of final consonant clusters (CvCC > CvvCvC).


This non-core borrowing presumably finds its etymology in the Arabic verb *xajaṭ.4 The word is completely integrated into the Beja consonant system, in which both the voiceless velar fricative /x/ and the voiceless dental emphatic /ṭ/ are absent. Furthermore, the borrowed item is incorporated with a long vowel [iː], which is absent in the original Arabic verb. The integration of borrowings also passes through their assimilation into the derivation system of the recipient language. In Hebrew, as well as in other Semitic languages, a borrowed item can be subjected to intra-categorical derivation for expressing different meanings from the same root. For example, the borrowed root filosof shapes a feminine noun related to the abstract notion of ‘philosophy’ if followed by the gender marker -ja as in example (2).

(2) tx  giliti ʃefilosofja zelo kazenoʁa //
    mot giliti ʃefilosofja kaze noʁa //
    mb  gili -ti ʃel= filosof -ja kaze noʁa //
    ge  discover\PFV -SBJ.1SG COMP= philosophy -F.SG like_this terrible //
    rx  V -PNG CONJ= N.BORR -G PRO.ADJ ADJ //
    ft  I discovered that philosophy is not such a terrible thing.
    HEB_IM_CONV_3_SP1_317

However, the same borrowed root can also express the concrete meaning of ‘philosopher’ if marked by the plural masculine suffix -im as in example (3).

(3) tx  jeʃam exad ʃeafilosofim ʃeomeʁ ʃe /
    mot jeʃʃam exad
    mb  ʃel= ʃam exad
    ge  be there one
    rx  exs ADV.DEICT NUM
    mot ʃeafilosofim ʃeomeʁ ʃe /
    mb  ʃel= ha= filosof -im ʃe= omeʁ ʃe /
    ge  COMP= DEF= philosophy -PL.M NMLZ= say\ACT.PTCP NMLZ /
    rx  CONJ= DET= N.BORR -PNG REL= V1\TAM REL /
    ft  There is also one of the philosophers that says that
    HEB_IM_CONV_1_SP1_229
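Because the label BORR sits in the rx tier and the mb, ge and rx tiers are aligned token by token, borrowed morphemes can be retrieved mechanically. The following Python sketch illustrates the idea on the tiers of example (2); it is not the CorpAfroAs search tool, and the function name borrowed_morphemes as well as the whitespace-based tokenization are assumptions made for this illustration.

```python
def borrowed_morphemes(mb, ge, rx):
    """Return (form, gloss, rx label) triples whose rx label contains BORR.

    Assumes the three tiers are whitespace-separated and aligned token by
    token, as in the printed examples.
    """
    mb_tokens, ge_tokens, rx_tokens = mb.split(), ge.split(), rx.split()
    if not (len(mb_tokens) == len(ge_tokens) == len(rx_tokens)):
        raise ValueError("tiers are not aligned")
    return [
        (form, gloss, label)
        for form, gloss, label in zip(mb_tokens, ge_tokens, rx_tokens)
        # rx labels are dot-separated, e.g. N.BORR or V1.BORR.ARA
        if "BORR" in label.split(".")
    ]


# Tiers copied from example (2), HEB_IM_CONV_3_SP1_317.
mb = "gili -ti ʃel= filosof -ja kaze noʁa //"
ge = r"discover\PFV -SBJ.1SG COMP= philosophy -F.SG like_this terrible //"
rx = "V -PNG CONJ= N.BORR -G PRO.ADJ ADJ //"

hits = borrowed_morphemes(mb, ge, rx)
print(hits)
```

Splitting the rx label on dots rather than searching for the substring keeps the match tied to the label itself, so a donor-language suffix such as ARA or SPA does not affect retrieval.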

CorpAfroAs also displays a minor number of ‘core borrowings’, i.e. ‘words that duplicate elements that the recipient language already has in its word store’ (Myers-Scotton 2002: 240). Contrary to non-core borrowings, these lexical items are systematically marked as BORR in CorpAfroAs. For instance, Moroccan Arabic has

4. Beja people are originally camel herders and dressmaking is not a part of their traditional activities. For this reason, the Beja language does not possess a verb for expressing the meaning of ‘sew’.




borrowed the word semana ‘week’ from Spanish, which is used nowadays alongside Arabic *(u)sbuːʕ also as a result of the influence of Modern Standard Arabic.

(4) tx  ʃi seːmaːna iːjəh ʃi seːmaːna laʔ //
    mot ʃi siːmaːna iːjəh ʃi siːmaːna laʔ //
    mb  ʃi siːmaːn -a iːjəh ʃi siːmaːn -a laʔ //
    ge  INDF2 week -F yes INDF2 week -F no //
    rx  DET N.BORR.SPA -PNG ADV DET N.BORR.SPA -PNG ADV //
    ft  Every other week.
    ARY_AV_NARR_1_214

Irrespective of the semantic reference expressed by siːmaːna in Moroccan Arabic, example (4) shows that the original term is also remodelled on the phonology and morphosyntax of the recipient language. More particularly, the Spanish vowel /e/, absent in Moroccan Arabic, is replaced by /i/ (even if it is phonetically realized as [e]). Furthermore, the syllabic structure of the borrowed noun varies from the original Spanish (CvCvCv > CvːCvːCv). The borrowed noun follows the agreement rules of Moroccan Arabic: the noun is integrated as feminine because of the presence of a final -a, and it is determined by the indefinite article ʃi. In example (5), the same borrowed item is modified by the definite article əl-, which is assimilated to the first alveolar consonant (s‑), as in Arabic.

(5) tx  baːʃ məlli kaːjəmʃi diːksseːmana /
    mot baːʃ məlli kaːiːmʃi
    mb  baːʃ məlli kaː= j- mʃiː
    ge  in_order_to when REAL= 3- go\IPFV
    rx  CONJ ADV TAM= PNG V
    mot diːkssiːmaːna
    mb  diːk= əl= siːmaːn -a /
    ge  DEM.DIST= DEF= week -F /
    rx  DET.M/F= DET= N.BORR.SPA -PNG /
    ft  In order to go that week.
    ARY_AV_NARR_3_240
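The segmental side of integration discussed for spardiːna / sbərdiːna and for siːmaːna can be made concrete with a very rough check. The Python sketch below is an illustration under stated assumptions, not a phonological analysis: the set ABSENT_IN_RECIPIENT contains only the two segments that the text explicitly reports as absent from Moroccan Arabic (/p/ and /e/), the function name foreign_segments is hypothetical, and forms are compared character by character, which only makes sense for phonemic (mot tier) transcriptions in which each segment is one symbol.

```python
# Segments the text reports as absent from Moroccan Arabic; illustrative only.
ABSENT_IN_RECIPIENT = {"p", "e"}


def foreign_segments(form, absent=ABSENT_IN_RECIPIENT):
    """Return the sorted set of symbols in `form` that the recipient lacks."""
    return sorted({symbol for symbol in form if symbol in absent})


for form in ("spardiːna", "sbərdiːna", "siːmaːna", "semana"):
    found = foreign_segments(form)
    status = "retains " + ", ".join(found) if found else "no foreign segments"
    print(form, "->", status)
```

An empty result does not by itself establish BORR status; it only shows that the form contains none of the listed segments, and the chapter's other criteria (morphological integration, prosody, socio-historical assessment) remain necessary.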

As a final remark, it is noteworthy that CorpAfroAs also displays some instances of verb borrowing. The relatively low number of borrowed verbs in the corpus finds good parallels crosslinguistically: in the World Loanword Database,5 over 31% of all nouns are loanwords, whereas the figure for verbs is less than half of this (14%), and only two out of 41 languages in the sample have a smaller proportion of loan nouns than loan verbs (Tadmor 2009: 61). Structural constraints certainly play a role in disfavoring the borrowing of verbs crosslinguistically (isolating languages, on the contrary, can apparently borrow verbs quite easily). The difficulty of integrating verbs into the complex inflectional/derivational system of Afro-Asiatic languages would thus explain both the small number of borrowed verbs in our corpus and the fact that the few borrowed verbs are usually integrated as bare forms to which personal indexes of the recipient language are added. The high degree of morphosyntactic integration of borrowed verbs does not allow us to differentiate them from any other verb of the recipient language. In ‘Afar, this is the case of the verb jitriħeːnih in example (6).

5.  World Loanword Database, (22 August 2013).

(6)

tx badak addal jitriħeːnih ħaban tiː //
mot badak addal jitriħeːnih
mb bada -k adda -l j- itriħe -eni -h
ge sea -at depth -in 3- drop\PFV -PL -COOR
rx N.M -POSTP N.F -POSTP PNG- V1.TAM -PNG -POSTP
mot ħaban tiː //
mb ħaba -n ti //
ge let\IPFV -3PL thing //
rx V2.TAM -PNG N //
ft Something that is let and dropped offshore.
AAR_MCSS_NARR_1_

The ‘Afar verb root itriħe ‘drop’ is etymologically related to the Arabic derived verb form *aṭraħ ‘put’. In the previous example, it is noteworthy that /t/ is the reflex of the Arabic emphatic dental plosive */ṭ/, which is absent in ‘Afar. Morphologically speaking, the borrowed item belongs to a verbal sub-class in which person and gender markers are prefixed (and number indexes are suffixed). As with any ‘Afar verb, aspect is marked by vowel apophony: the open vowel a marks the Imperfective while close vowels signal the Perfective. Here, itriħe is used with the meaning ‘drop’, but its use is limited to the semantic sphere related to fishing techniques (‘Afar already possesses a verb to express the notion of ‘throw, drop’).6 Apart from some ‘Afar/Arabic bilingual fishermen, the word is not recognized as a foreign lexeme. Similarly, Beja displays several instances of verb borrowing from Arabic (see also example 1). In example (7), the verb aːmal ‘do’ is borrowed from Arabic *ʕamal. This verb is phonologically well integrated, as shown by the elision of the initial pharyngeal consonant ʕ that has given rise to a long open vowel aː in Beja. In morphosyntactic terms, the verb also takes the same TAM and agreement markers as any Beja verb.

6.  In ‘Afar, as previously noted, the technical vocabulary concerning sea life (names of boats, material, etc.) is borrowed from English or French. By contrast, the vocabulary related to fishing techniques and fish names is borrowed from Arabic.

(7)

tx oːdheːj daːjib aːmalin akoː /
mot oːdheːj daːjib aːmaliːna
mb oː= dheːj daːji =b aːmal -iːna
ge DEF.SG.M.ACC= people good =INDF.M.ACC do -AOR.3PL
rx DET= N.M ADJ =DET V2.BORR.ARA -TAM.PNG
mot akoː /
mb akoː /
ge be\CVB.SMLT /
rx PTCL /
ft Since the people have good relationships with him.
BEJ_MV_NARR_2_farmer_308
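Because borrowings are flagged in the \rx tier with labels such as N.BORR.SPA or V2.BORR.ARA, they can in principle be retrieved automatically. The following is a minimal sketch, not part of the CorpAfroAs tooling: it assumes that the \mb and \rx tiers are available as whitespace-separated strings aligned token by token, and that a borrowing label has the shape "part of speech + .BORR. + three-letter donor code". The function name `borrowed_items` is hypothetical.

```python
import re

# Assumed label shape: <part of speech>.BORR.<donor language code>,
# e.g. "N.BORR.SPA" or "V2.BORR.ARA" as found in the \rx tier.
BORR_LABEL = re.compile(r"^(?P<pos>.+)\.BORR\.(?P<donor>[A-Z]{3})$")


def borrowed_items(mb_tier, rx_tier):
    """Return (morpheme, part of speech, donor) for each borrowed morpheme."""
    mb_tokens = mb_tier.split()
    rx_tokens = rx_tier.split()
    if len(mb_tokens) != len(rx_tokens):
        raise ValueError("tiers are not aligned token by token")
    hits = []
    for morpheme, label in zip(mb_tokens, rx_tokens):
        match = BORR_LABEL.match(label)
        if match:
            hits.append((morpheme, match.group("pos"), match.group("donor")))
    return hits


# First block of example (7): Beja verb borrowed from Arabic.
mb7 = "oː= dheːj daːji =b aːmal -iːna"
rx7 = "DET= N.M ADJ =DET V2.BORR.ARA -TAM.PNG"
print(borrowed_items(mb7, rx7))
```

Applied to the second block of example (5), the same function would return the Spanish noun stem siːmaːn. Since core borrowings carry the BORR label and non-core borrowings do not, such a search retrieves only the former.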

3. Codeswitching

In this section we aim to present a syntactic and prosodic overview of CSW in CorpAfroAs. Broadly speaking, the annotation system allows a linear syntactic analysis of CSW, while prosodic segmentation makes it possible to investigate the prosody-syntax interface of switched clauses. As to the syntactic grammaticality of CSW, several studies (Poplack 1980; Sankoff & Poplack 1981; Muysken 1995) have shown that there are preferential syntactic sites for language switch. Given that CSW is, among other things, a discourse phenomenon, we hold the view that its investigation needs to take prosody into account in order to explain the occurrence of switched utterances in natural discourse. We will thus follow the analytic approach of Shenk (2006) and Mettouchi (2008), who integrated the well-known (although disputed) opposition between intersentential and intrasentential CSW with a prosodic investigation based on the distinction between monolingual and bilingual Intonation Units (§ 3.1). It has also been demonstrated that CSW serves as a contextualization cue that tends to build up contrasts in discursive contexts (Auer 1998). In this regard, Gumperz (1982: 98) has already noted that ‘codeswitching signals contextual information equivalent to what in monolingual settings is conveyed through prosody or other syntactic and lexical processes’. Zentella (1997: 96), for her part, affirms that ‘what monolinguals accomplish by repeating louder and/or slower, or with a change of wording, bilinguals can accomplish by switching languages’. Bearing this in mind, we will also analyze the intonation contours of intrasentential CSW and compare them with those related to the presence of lexical borrowings (§ 3.2).


3.1 Prosodic segmentation and codeswitching

3.1.1 Intersentential codeswitching

Intersentential CSW involves a switch at a clause or sentence boundary, where each clause or sentence is either in one or the other language(s) (Romaine 1989). Intersentential CSW represents by far the minority of CSW types in CorpAfroAs. This is probably because the production of an entire sentence in a ‘foreign’ language requires a relatively high degree of bilingual proficiency, and in CorpAfroAs this holds true for both narrative and conversational texts. It is not a coincidence that intersentential CSW mainly concerns Moroccan Arabic / French bilingual speakers who have been much more exposed to the use of the former colonial language than other linguistic communities represented in CorpAfroAs. In syntactic terms, we can define any instance of intersentential CSW as a syntactically coherent clause that lacks syntactic obligatoriness with regard to the preceding sentence. We agree with Watson, Breen and Gibson (2006: 1047) that syntactic obligatoriness is a better predictor than semantic closeness for the definition of prosodic boundaries. It follows that intersentential CSW is prosodically isolated and systematically yields monolingual Intonation Units. Example (9) shows a clear instance of Moroccan Arabic / French intersentential CSW in which the embedded clause contains the French idiomatic sentence ‘qui se ressemble, s’assemble’ [ki sə Rəsãbl sassãbl] (i.e., ‘birds of a feather flock together’). The embedded clause is separated from the previous Moroccan Arabic Wh-question by a major prosodic boundary and covers two French monolingual Intonation Units.7

(9)

tx mʕa mən ʃəbbəhtək // < ki sə Rəsãbl > / < sassãbl > //
mot mʕa mən ʃəbbəhtək //
mb mʕa mən ʃəbbəh -t =ək //
ge with who compare\PFV -1SG =OBJ.2SG.M //
rx PREP PRO V.TAM -PNG =PRO.PNG //
ft With whom did I compare you?
mot < qui se ressemble > / < s’assemble > //
mb < qui se ressemble > / < s’assemble > //
ge < qui se ressemble > / < s’assemble > //
rx < CSW.FRA CSW.FRA CSW.FRA > / < CSW.FRA > //
ft < birds of a feather > < flock together >.
ARY_DC_NARR_04_59–61

7.  Unlike borrowed items, CSW is not glossed in the \ge tier. Instead, CSW is highlighted by means of chevrons < … >. With regard to intrasentential CSW (§ 3.1.2), it should be noted that chevrons do not signal the embedded language as against the matrix language; rather, they indicate the presence of elements that are ‘foreign’ with regard to the sample corpus. See the online Manual for further information about the annotation of CSW in CorpAfroAs.
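The chevron convention lends itself to simple automatic retrieval of switched stretches. The sketch below is only an illustration under stated assumptions: it treats a tier as a plain string and takes every stretch between "<" and ">" to be switched material. The function name `switched_spans` is hypothetical and does not belong to any CorpAfroAs tool.

```python
import re

# A switched stretch is assumed to be whatever stands between "<" and ">",
# with optional spaces next to the chevrons.
CHEVRON_SPAN = re.compile(r"<\s*(.*?)\s*>")


def switched_spans(tier):
    """Return the chevron-delimited stretches of a tier, in order."""
    return [match.group(1) for match in CHEVRON_SPAN.finditer(tier)]


# \tx tier of example (9), as printed in the source.
tx9 = "mʕa mən ʃəbbəhtək // < ki sə Rəsãbl > / < sassãbl> //"
for span in switched_spans(tx9):
    print(span)
```

Since chevrons flag material that is ‘foreign’ to the sample corpus rather than the embedded language as such, the output of such a search still has to be interpreted against the matrix language of each utterance.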




Example (9) illustrates that intersentential CSW may embed a complete verbal phrase. However, the majority of syntactically coherent switched clauses in CorpAfroAs include just a single self-standing constituent. Example (10) shows an instance of Beja / Arabic intersentential CSW taken from a narrative text in which the narrator is reproducing a conversation between two speakers. The switch towards Arabic is the reaction of one of the speakers to a preceding Beja exhortative form. More specifically, the embedded clause shows an emphatic reiteration of the Arabic self-standing adverb abadan ‘never’ and is prosodically produced as a monolingual Intonation Unit with a major prosodic boundary. However, the narrative register used by the speaker might have favoured this prosodic segmentation of the switched material.

(10)

tx oːn wʔoːr nijaːwa // < abadan abadan > //
mot oːn wʔoːr nijaːwaj //
mb oːn w= ʔoːr ni- jaw -aj //
ge PROX.SG.M.ACC DEF.SG.M= child 1PL- give\IPFV -so //
rx DEM DET= N PNG- V1.IRG -PTCL //
ft “So, let’s give it to that boy!”
mot < abadan abadan > //
mb < abadan abadan > //
ge < abadan abadan > //
rx < CSW.ARA CSW.ARA > //
ft < abadan abadan > //
ft < “it’s out of the question” >.
BEJ_MV_NARR_10_rabbit_59–61

Other examples of syntactically self-standing constituents occurring as intersentential CSW are related to the use of Arabic formulaic expressions by Hausa speakers. In example (11), the embedded clause corresponds to the Arabic religious formula alḥamdu liḷḷahi ‘God be praised’ that, being integrated into the Hausa segmental and tonal system, is realized as [àlhamdùlìllaːhì]. Also in this case, intersentential CSW covers a monolingual Intonation Unit with a major prosodic boundary.8

8.  In this connection, it should be stressed that the same Arabic religious formulas are considered as genuine borrowings in other languages (i.e. Kabyle, Beja). The different treatment of these complex items mainly depends on the different degree of bilingualism characterizing the Muslim communities represented in CorpAfroAs.

(11)

tx sun ɗàuki wani zaːmàniː / sun ɗoːɽoː wà ɽàːjuwansù // < àlhamdùlìllaːhì > //
mot sun ɗàuki wani zaːmàniː /
mb sun ɗàukaː -i wani zaːmàniː /
ge 3PL.PFV.FOC take -ACC1 some.M present_time /
rx PNG.TAM V2 -ACC1 DET.INDF N /
ft They took the period when they lived
mot sun ɗauɽoː wà ɽàːjuwansù //
mb sun ɗauɽo wà ràːjuwː -n -sù //
ge 3PL.PFV. tie.DIR DAT existence -GEN L.GEN //
rx PNG.TAM V6 PTCL.SYNT N -SYNT -PNG //
ft and they created their own destiny
mot < alḥamdu liḷḷahi > //
mb < alḥamdu liḷḷahi > //
ge < alḥamdu liḷḷahi > //
rx < CSW.ARA > //
ft < God be praised >.
HAU_BC_CONV_4_SP2_328–330

3.1.2 Intrasentential codeswitching

Intrasentential CSW involves a switch within the clause or sentence boundary and also includes switching within word boundaries (Romaine 1989). Intrasentential CSW indubitably represents the most common type of CSW within CorpAfroAs. The syntactic standpoint we use for the analysis of intrasentential CSW is that of the Matrix Language Frame proposed by Myers-Scotton (1993, 2001), in which the matrix language (ML) is defined as the language from which the greatest number of morphemes is drawn, while the embedded language(s) (EL) refers to the other language(s) used in the conversation. The underlying assumption is that ML sets the frame for the different types of morphosyntactic constituents occurring in intrasentential CSW.9 As far as CorpAfroAs is concerned, intrasentential CSW is mainly enacted through the insertion of single, high-frequency lexical items from the embedded into the matrix language (a phenomenon also referred to as tag-switching; cf. Caron 2002). In contrast to intersentential CSW, intrasentential CSW embeds obligatory constituents that would be syntactically incoherent if placed outside the sentence in which the language switch occurs. Therefore, if we accept the idea that syntactic obligatoriness is an operational predictor for the occurrence of prosodic boundaries (§ 3.1.1), intrasentential CSW is more likely to give rise to bilingual Intonation Units. Moreover, in line with Mettouchi’s (2008: 185) observation for Berber / French CSW, we note that in CorpAfroAs too intrasentential language switch tends to occur at the boundaries of bilingual Intonation Units. Example (12) shows two instances of intrasentential CSW with Moroccan Arabic as ML and French as EL. In this case, the switches take place within a complex sentence covering four Intonation Units. Both the EL constituents les thèmes [le teːm] and les morceaux [le moṛsoː] occur at the end of two minor Moroccan Arabic / French bilingual Intonation Units.

9.  According to Myers-Scotton (2001), there are three types of constituents in sentences showing intrasentential CSW: (1) ML+EL constituents that show morphemes from the two or more participating languages. (2) ML islands that are composed entirely of ML morphemes. (3) EL islands that are instead composed entirely of EL morphemes.

(12)

tx nnaːs ʕəʒbuːhum < le teːm > / ʕəʒbuːhum < le moṛsoː > / u ʕəʒbaːthum əlḥəḍṛa / ʕəʒbaːthum əlmuːsiːqa //
mot nnaːs ʕəʒbuːhum < les themes > /
mb əl= naːs ʕʒəb -u =hum < les themes > /
ge DEF= people please\PFV -3PL =OBJ.3PL < les themes > /
rx DET= N.PL V.TAM -PNG =PRO.PNG < CSW.FRA CSW.FRA > /
ft People appreciated < the themes >,
mot ʕəʒbuːhum < les morceaux > /
mb ʕʒəb -u =hum < les morceaux > /
ge please\PFV -3PL =OBJ.3PL < les morceaux > /
rx V.TAM -PNG =PRO.PNG < les morceaux > /
ft they liked < the tracks >,
mot u ʕəʒbaːthum əlḥəḍṛa /
mb w ʕʒəb -at =hum əl= ḥəḍṛ -a /
ge and please\PFV -3SG.F =OBJ.3PL DEF= discourse -F /
rx CONJ V.TAM -PNG =PRO.PNG DET= N.F -PNG /
ft and they liked what we had to say,
mot ʕəʒbaːthum əlmuːsiːqa //
mb ʕʒəb -at =hum əl= muːsiːq -a //
ge please\PFV -3SG.F =OBJ.3PL DEF= music -F //
rx V.TAM -PNG =PRO.PNG DET= N.F -PNG //
ft they liked the music.
ARY_DC_NARR_2_11–14

The same correspondence between intrasentential CSW and prosodic boundaries can be observed in example (13), where Hausa is the ML and English the EL. In this case, the EL constituent later, which is phonologically integrated as [lettì], occurs at the boundary of a bilingual Hausa / English Intonation Unit. However, this occurrence might also have been favoured by the syntactic overlap between Hausa and English, in which adverbs of time usually appear sentence finally.

(13)

tx àkwai wata ɽaːnaː / 1277 mukà jiwoː //
mot àkwai wata ɽaːnaː / 1277
mb àkwai wata ɽaːnaː / 1277
ge COP3 some.F day / 1277
rx PTCL.SYNT DET.IND N / 1277
ft One day, 1277
mot mukà jiwoː < late > //
mb mukà jiwoː < late > //
ge 1PL.PFV.FOC do.DIR < late > //
rx PNG.TAM V6 < CSW.ENG > //
ft we were < late >.
HAU_BC_NARR_3_SP2_2–4

It should be remarked that the tendency for intrasentential language switch to correspond to the boundaries of bilingual Intonation Units is not always realized, owing to the intervention of discourse and/or pragmatic factors. In fact, intrasentential CSW can also occur within monolingual Intonation Units. This is typically the case of embedded discourse markers, which usually occur as prosodically independent words. Example (14) shows an instance of Juba Arabic / English intrasentential CSW due to the insertion of the English discourse marker so, which alone constitutes a minor English monolingual Intonation Unit.

(14)

tx < so > / min henák //
mot < so > / min henák //
mb < so > / min henák //
ge < so > / from there //
rx < CSW.ENG > / PREP ADV.LOC //
ft < So > after that.
PGA_SM_NARR_2_SP1_505–506

Like discourse markers, hesitation too can cause intrasentential CSW to correlate with monolingual Intonation Units. Example (15) shows another instance of intrasentential CSW with Juba Arabic as ML and English as EL. In this case, hesitation is signalled by vowel extra-lengthening on the preposition le [leːːː] occurring at the boundary of a Juba Arabic monolingual minor Intonation Unit. The embedded clause which follows is made up of the English nominal phrase displaced schools [displeːsi skuːl] and covers an Intonation Unit ending with a major prosodic boundary.

(15)

tx medrése de tában tábe leːːː / < displeːsi skuːl > //
mot medrésa de tában tábe le /
mb medrésa de tában tábe le /
ge school PROX.SG obviously belong to /
rx N PRO.DEM ADV V PREP /
ft This school belongs to
mot < displaced schools > //
mb < displaced schools > //
ge < displaced schools > //
rx < CSW.ENG CSW.ENG > //
ft < the displaced schools >.
PGA_SM_NARR_1_39–40

Example (16) displays a more complex switch, in which English represents the ML and Juba Arabic the EL. The first Intonation Unit encloses both ML and EL constituents, the latter conforming to the general tendency of intrasentential CSW to occur at prosodic boundaries. Here, hesitation is signalled by the Juba Arabic discourse marker jáni ‘that is to say’ which makes up a separate Intonation Unit. As for the predicative adjective green [griːl] which follows, it covers an English monolingual Intonation Unit with a major prosodic boundary.

(16)

tx u < iz olmos > / jáni / < griːl > //
mot úo < is almost > / jáni /
mb úo < is almost > / jáni /
ge 3SG < is almost > / that_is_to_say /
rx SBJ.PRO.IDP < CSW.ENG CSW.ENG > / DM /
ft It < is almost >, I mean,
mot < green > //
mb < green > //
ge < green > //
rx < CSW.ENG > //
ft < green >.
PGA_SM_NARR_2_SP1_119–120

In contrast with example (16), in which prosodic segmentation is induced by the presence of the discourse marker jáni ‘that is to say’, if intrasentential CSW covers more than one Intonation Unit, monolingual Intonation Units with a major prosodic boundary tend to start in the same language as the one in which the preceding minor bilingual unit ends.10 Such prosodic constraints are evident in example (17) in which French is the ML and Moroccan Arabic the EL. The first minor bilingual Intonation Unit ends in French. This is followed by a French monolingual Intonation Unit ending with a major prosodic boundary, signalled in turn by a pause and an intake of breath.

10.  Mettouchi (2008: 187) observed instead that ‘the tendency is for (bilingual and monolingual) intonation units to start consistently in the same language as the beginning of the preceding one, with occasional switches that are pragmatically motivated’.


(17)

tx waːlaːkin < ila fe boku dʃoːz e il fo ã > / < ã minimõ də ʀəkonesɑ̃ːs > // BI_278 ldaːkʃi lli daːr //
mot waːlaːkin < il a fait beaucoup de choses
mb waːlaːkin < il a fait beaucoup de choses
ge but < il a fait beaucoup de choses
rx CONJ < CSW.FRA CSW.FRA CSW.FRA CSW.FRA CSW.FRA CSW.FRA
ft But < he has done a lot
mot et il faut un > / < un minimum
mb et il faut un > / < un minimum
ge et il faut un > / < un minimum
rx CSW.FRA CSW.FRA CSW.FRA CSW.FRA > / < CSW.FRA CSW.FRA
ft and you need > < a minimum
mot de reconnaissance > // BI_278 l= daːk əl=
mb de reconnaissance > // BI_278 l= daːk əl=
ge de reconnaissance > // BI_278 to= DIST.M DEF=
rx CSW.FRA CSW.FRA > // BI_278 PREP= DEM DET=
ft of recognition > BI_278
mot ʃi lli daːr //
mb ʃi lli daːr //
ge thing that do\PFV.2.SG.M //
rx N.M REL V.TAM //
ft he has done.
ARY_DC_NARR_5_57–60

Lastly, we should also consider the sporadic presence of intra-word CSW, in which the switch does not only occur within clause boundaries, but also within word boundaries. In view of this, the occurrence of intra-word CSW does not systematically coincide with prosodic boundaries, since it can appear anywhere within a bilingual Intonation Unit. However, the switched component is generally signalled by an emphatic pitch rise (§ 3.2). As example (18) shows, in CorpAfroAs intra-word CSW is mainly realized through the insertion of lexical verbs from the EL (i.e., the English verb work [woːk]), to which TAM markers and/or personal indexes from the ML are affixed (i.e., the Juba Arabic irrealis preverbal marker bi=).11 This represents an instance of language switch rather than an integrated verb borrowing because ‘to work’ is regularly expressed by the verb istákal in Juba Arabic (see for example PGA_SM_CONV_1_SP2_446).

11.  It should be remarked that, for retrieval purposes, all the phonological words affected by intra-word CSW are enclosed between chevrons (including affixes and clitics), but only the switched component is marked by the label CSW in the \rx tier.




(18)

tx éta kan ma < biwoːk > de éta maːːː / 238 ámin henák dámman be henák //
mot íta kan ma < biwork >
mb íta kan ma= < bi= work >
ge 2.SG if NEG= < IRR= work >
rx SUBJ.PRO.IDP CONJ.POT PTCL.NEG= < TAM= CSW.ENG >
ft If you don’t < work >,
mot íta ma / 238 ámin henák dáman
mb íta ma= / 238 ámin henák dáman
ge 2.SG NEG= / 238 safe there up_to
rx SUBJ.PRO.IDP PTCL.NEG= / 238 ADJ ADV.LOC ADV
ft you won’t be safe at all there (in Juba).
mot be henák //
mb be henák //
ge by there //
rx PREP ADV.LOC //
PGA_SM_CONV_1_SP2_193–195
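The annotation convention for intra-word CSW (chevrons around the whole phonological word, CSW label on the switched component only) can be turned into a retrieval heuristic. The sketch below is illustrative and rests on assumptions: the \mb and \rx tiers are whitespace-separated strings aligned token by token, chevrons appear as separate tokens in both tiers, and a chevron group counts as intra-word CSW when it mixes CSW-labelled and non-CSW-labelled morphemes. The function name `intra_word_switches` is hypothetical.

```python
def intra_word_switches(mb_tier, rx_tier):
    """Return chevron groups that mix CSW-labelled and other morphemes."""
    mb_tokens = mb_tier.split()
    rx_tokens = rx_tier.split()
    if len(mb_tokens) != len(rx_tokens):
        raise ValueError("tiers are not aligned token by token")
    results = []
    inside = False
    group = []
    for morpheme, label in zip(mb_tokens, rx_tokens):
        if morpheme == "<":
            inside, group = True, []
        elif morpheme == ">":
            inside = False
            labels = [lab for _, lab in group]
            has_switched = any(lab.startswith("CSW.") for lab in labels)
            has_other = any(not lab.startswith("CSW.") for lab in labels)
            if has_switched and has_other:
                results.append(group)
        elif inside:
            group.append((morpheme, label))
    return results


# First block of example (18): Juba Arabic bi= attached to English work.
mb18 = "íta kan ma= < bi= work >"
rx18 = "SUBJ.PRO.IDP CONJ.POT PTCL.NEG= < TAM= CSW.ENG >"
print(intra_word_switches(mb18, rx18))
```

Under these assumptions, a chevron group made up only of CSW-labelled items, such as the discourse marker so in example (14), is not reported, which keeps intra-word switches apart from inserted words and phrases.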

3.2 Intonation and codeswitching

We have seen in the preceding paragraphs that there is a sizeable correlation between types of CSW and prosodic boundaries. We now want to show that variation in intonation contours can represent a further structural discriminant for the identification of intrasentential CSW. Even though many authors (Gumperz 1982; Zentella 1997; Karrebæk 2003) have already noted a connection between intonation and CSW, there is a lack of research on this matter. In this regard, Olson and Ortega-Llebaria (2010) showed that the perceptual relevance of CSW is more significant in the absence of other contextualization cues, such as narrow focus intonation. However, our data from CorpAfroAs indicate that intrasentential tag-switching systematically correlates with some form of intonation emphasis. Example (19) shows an instance of Juba Arabic / English intrasentential CSW occurring within an Intonation Unit delimited by a major prosodic boundary. With regard to the intonation curve, it is noteworthy that the declination of F0 related to the declarative status of the utterance12 is suspended by an emphatic pitch rise occurring on the first stressed syllable of the switched item ‘master’. This word is well integrated into the Juba Arabic syllable structure as [masat]. This emphatic high pitch corresponds to the highest point of the intonation curve and reaches 125.1 Hz.

12.  See the chapter ‘The intonation of Topic and Focus in Zaar, Tamasheq, Juba Arabic and Tripoli Arabic’ for a description of the intonation curve of declarative clauses in Juba Arabic.

(19)

tx laːna bíga < másat > ta boːr sudaːn zátu //

[Figure: F0 curve of the utterance, aligned with its syllables; vertical axis from 75 to 130 Hz, horizontal axis from 0 to 1.58 s]

mot le ána bíga < master > ta boːr sudaːn zátu //
mb le ána bíga < master > ta boːr sudaːn zátu //
ge to 1SG become < master > POSS Port Sudan FOC1 //
rx PREP SUBJ.PRO.IDP V < CSW.ENG > PTCL N.PR N.PR PTCL //
ft Until I became < the master > of Port Sudan
PGA_S_NARR_1_447

Example (20) displays an instance of Moroccan Arabic / French intrasentential CSW occurring in a bilingual Intonation Unit with a major prosodic boundary. In this case, as before, the two EL constituents are phonologically integrated into the consonant system of the ML and they correlate with an emphatic high pitch, reaching 70 Hz max and 204.3 Hz respectively, on the last switched word.

(20)

tx məlli katxrəʒ mən < ṣaːl > < lṣaːl > //

[Figure: F0 curve of the utterance, aligned with its syllables; vertical axis from 75 to 225 Hz, horizontal axis Time (s) from 0 to 1.4]

mot məlli katxrəʒ mən < salle >
mb məlli ka= t- xrəʒ mən < salle >
ge when REAL= 2- go_out\IPFV from < salle >
rx CONJ TAM= PNG- V PREP < CSW.FRA >
ft When you go from < hall >
mot < lsalle > //
mb < l= salle > //
ge < to= salle > //
rx < PREP= CSW.FRA > //
ft < to hall >.
ARY_DC_CONV1_SP1_20

A similar change of fundamental frequency can also be observed in Hausa / English intrasentential CSW. Example (21) contains three instances of English tag-switching occurring in a complex sentence covering seven Intonation Units. Each switched item (i.e. federal, ABU and students) corresponds to a new intonation unit, with an emphatic high pitch on the second syllable of English ‘federal’, and, in each case, with a suspension of F0 downdrift.

(21)

tx àːːː / tʃikinːːː / < feːdaral > / 117 iɽìn zuwàː sei / < eːbìːjù > hakà / takàn rarràbaːmà / < stuːdèn > //

[Figure: F0 curve of the utterance; vertical axis from 75 to 175 Hz, horizontal axis from 0 to 3.804 s]

mot àːːː / tʃikinːːː / < federal > /
mb àːːː / tʃikin -ːːː / < federal > /
ge FILL / inside -LENGTH / < federal > /
rx FILL / PREP -LENGTH / < CSW.ENG > /
ft (She would take it) aah, in < federal > (college)
mot 117 iɽìːn zuwàː sei / < ABU > hakà /
mb 117 iɽìː -n zuwàː sei / < ABU > hakà /
ge 117 type -GEN going to / < ABU > like_this /
rx 117 N -SYNT N.V0 PREP / < CSW.ENG > ADV /
ft 117 and she can go up to < ABU > (university) for example
mot takàn rarràbaː mà / < students > //
mb takàn rarràbaː mà / < students > //
ge 3F.SG.HAB distribute DAT / < students > //
rx PNG.TAM V1.PL PTCL.SYNT / < CSW.ENG > //
ft she would distribute (it) to the < students >.
HAU_BC_CONV_1_SP2_42–49

The preceding analysis of the CSW intonation supports the idea that ‘the most common function of tag-switching is pragmatic: highlighting an event, setting off a personal reaction to what has been said’ (Caron 2002: 22)13 and that CSW functions as a contextualization cue among other cues such as narrow focus intonation (Auer 1998; Olson and Ortega-Llebaria 2010). In this regard, example (22) shows an instance of Tripoli Arabic / English tag-switching CSW co-occurring with BORR and narrow focus. The highest pitch falls on the focused quantifier kull and reaches 137 Hz. The high pitch on the first syllable of the switched element ‘fucking’ barely reaches 83 Hz, and the lexical borrowing maːsəʒ (from English message) does not display any intonation prominence. Therefore, contrastive focus appears to be more prominent prosodically than CSW.

(22)

tx maːsəʒ kulla < fakiŋ > //

[Figure: F0 curve of the utterance; vertical axis from 80 to 140 Hz, horizontal axis from 0 to 0.8729 s]

mot maːsəʒ kulla < fucking > //
mb maːsəʒ kull =h < fucking > //
ge message all =OBJ.3SG.M < fucking > //
rx N.BORR.ENG QNT =PRO.PNG < CSW.ENG > //
ft The message was all < insult >
HAU_BC_CONV_1_SP2_42–49

The majority of the previous examples show that, contrary to what is generally assumed, morphophonological integration cannot be used to pinpoint the distinction between BORR and CSW, since switched items may well be integrated into the phonology of the speaker’s native language. Against this background, the change of F0 linked to the occurrence of the switched items is of particular interest when compared to the ordinary intonation contours related to BORR, as already seen in example (22). Example (23) further shows the Arabic nominal borrowing xawaːdʒa ‘westerner’ in Beja. That the borrowed item is not integrated into the phonology of the recipient language is shown by the retention of the Arabic voiceless velar fricative /x/ and by the absence of variation in the declining pattern of the first two Intonation Units.

13.  See also Ziamari (2010) for an analysis of the pragmatic implications of Moroccan Arabic / French CSW.

(23)

tx onoːjham / deːr ajihob / xawaːdʒajda //
mot oːn oːjhaːm /
mb oːn oː= jhaːm /
ge PROX.SG.M.ACC DEF.SG.M.ACC= leopard /
rx DEM DET= N.M /
ft The leopard
mot deːra anihoːb /
mb deːr -a a-ni =hoːb /
ge kill\INT -IMP.SG.M say\PFV.1SG =when /
rx der.V1 -TAM.PNG V1.IRG =CONJ /
ft When I said: ‘Kill it!’
mot ixawaːdʒajda // BI_425
mb i= xawaːdʒa -i =da // BI_425
ge DEF.M= foreigner -GEN.SG =DIR // BI_425
rx DEM= N.M.BORR.ARA -CASE =POSTP // BI_425
ft to the foreigner // BI_425
BEJ_MV_NARR_6_foreigner_12–14

4. Conclusions

In this chapter, we have argued for a neat separation between BORR and CSW: while the former is part and parcel of the recipient language, and is glossed and analyzed as such, CSW is an instance of one or more alien elements entering a recipient language without, at least in principle, being integrated into the latter. In the CorpAfroAs corpus as a whole, few borrowings have been indicated because, on the very basis of our definitions, all of them are integrated as a part of the lexicon of the recipient language, often, obviously, with semantic changes. On the morphosyntactic level, borrowings in CorpAfroAs frequently, but not always, keep the word category of the donor language: borrowed nouns are integrated as nouns and verbs as verbs in the recipient language. The degree of integration in the system of the recipient language is very variable, and it does not seem to depend on either the donor or the recipient language. BORR is always predictable because in any given context it is the only tool for speakers to express what they want to say. Between CSW and BORR runs a very thin line, especially when adjustment to the recipient language is only phonetic. Different types of CSW have different prosodic segmentation patterns: intersentential CSW is systematically related to monolingual Intonation Units, while intrasentential CSW tends to occur at the end of bilingual Intonation Units. In a minority of cases, intrasentential CSW can also occur in monolingual Intonation Units due to the intervention of discourse factors. Tag-switching is regularly highlighted by prosodic prominence (emphatic high pitch). This is a major result of our investigation and can be taken as a new constraint for distinguishing CSW from BORR. Of course, all these conclusions will have to be confirmed through further quantitative analysis.
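The quantitative confirmation called for above could start from a simple count of Intonation Unit types. The following is a minimal sketch under stated assumptions, not the procedure used in this chapter: it reads a \tx tier as a string, takes "/" and "//" as minor and major prosodic boundaries, takes chevrons as delimiting ‘foreign’ material, and skips purely numeric tokens on the assumption that they are pause durations. The names `classify_units` and `label_unit` and the three category labels are hypothetical.

```python
BOUNDARIES = {"/": "minor", "//": "major"}


def label_unit(flags):
    """Label a unit from the per-word 'foreign' flags it contains."""
    if all(flags):
        return "monolingual-foreign"
    if not any(flags):
        return "monolingual-corpus"
    return "bilingual"


def classify_units(tx_tier):
    """Split a \\tx tier into Intonation Units and label each of them."""
    # Detach chevrons from adjacent words so they become separate tokens.
    spaced = tx_tier.replace("<", " < ").replace(">", " > ")
    units, words, flags, foreign = [], [], [], False
    for token in spaced.split():
        if token == "<":
            foreign = True
        elif token == ">":
            foreign = False
        elif token in BOUNDARIES:
            if words:
                kind = label_unit(flags)
                units.append({
                    "words": words,
                    "kind": kind,
                    "boundary": BOUNDARIES[token],
                    # True when a bilingual unit ends in switched material.
                    "switch_final": kind == "bilingual" and flags[-1],
                })
            words, flags = [], []
        elif token.isdigit():
            continue  # assumed to be a pause duration, not a word
        else:
            words.append(token)
            flags.append(foreign)
    # Words after the last boundary mark are ignored in this sketch.
    return units


# \tx tier of example (12): two bilingual units with unit-final switches.
tx12 = ("nnaːs ʕəʒbuːhum < le teːm > / ʕəʒbuːhum < le moṛsoː > / "
        "u ʕəʒbaːthum əlḥəḍṛa / ʕəʒbaːthum əlmuːsiːqa //")
for unit in classify_units(tx12):
    print(unit["kind"], unit["boundary"], unit["switch_final"])
```

Counting the resulting labels per corpus would give a first, rough measure of how often switches coincide with the end of bilingual units, although pause codes and other non-lexical tokens would need more careful handling than in this sketch.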

References

Auer, Peter. 1998. The pragmatics of codeswitching: A sequential approach. In One Speaker, Two Languages: Cross-disciplinary Perspectives on Code-switching, Lesley Milroy & Peter Muysken (eds), 115–135. Cambridge: CUP.
Bakker, Peter & Matras, Yaron (eds). 2003. The Mixed Language Debate: Theoretical and Empirical Advances. Berlin: Mouton de Gruyter.
Caron, Bernard. 2002. Zaar, Hausa, English codemixing. In Lexical and Structural Diffusion. Interplay of Internal and External Factors of Language Development in the West African Sahel, Robert Nicolaï & Petr Zima (eds), 19–25. Nice & Prague: Publications de la Faculté de Lettres, Arts et Sciences Humaines de Nice et de la Faculté des Etudes Humaines, Université Charles de Prague.
Caron, Bernard. 2012. ‘Hausa Corpus’. Corpus recorded, transcribed and annotated by Bernard Caron. In Amina Mettouchi & Christian Chanard (eds). The CorpAfroAs Corpus of Spoken AfroAsiatic Languages. DOI: http://dx.doi.org/10.1075/scl.68.website. Accessed on 01/07/2013. (= HAU_BC)
Caubet, Dominique. 2012. ‘Moroccan Arabic Corpus’. Corpus recorded, transcribed and annotated by Dominique Caubet. In Amina Mettouchi & Christian Chanard (eds). The CorpAfroAs Corpus of Spoken AfroAsiatic Languages. DOI: http://dx.doi.org/10.1075/scl.68.website. Accessed on 01/07/2013. (= ARY_DC)
Gumperz, John J. 1982. Discourse Strategies. Oxford: OUP. DOI: 10.1017/CBO9780511611834
Haspelmath, Martin. 2008. Loanword typology: Steps toward a systematic cross-linguistic study of lexical borrowability. In Aspects of Language Contact: New Theoretical, Methodological and Empirical Findings with Special Focus on Romancisation, Thomas Stolz, Dik Bakker & Rosa Salas Palomo (eds), 43–62. Berlin: Mouton de Gruyter.
Haspelmath, Martin. 2009. Lexical borrowing: Concepts and issues. In Loanwords in the World’s Languages. A Comparative Handbook, Martin Haspelmath & Uri Tadmor (eds), 35–54. Berlin: Mouton de Gruyter.
Heath, Jeffrey. 1989. From Codeswitching to Borrowing. A Case Study in Moroccan Arabic. London: Kegan.
Heath, Jeffrey. 1994. Borrowing. In The Encyclopaedia of Languages and Linguistics, Vol. 1, Ron E. Asher (ed), 383–394. Oxford: Pergamon Press.
Karrebæk, Martha Sif. 2003. Iconicity and structure in codeswitching. International Journal of Bilingualism 7: 407–411. DOI: 10.1177/13670069030070040401
Malibert, Il-Il. 2012. ‘Hebrew Corpus’. Corpus recorded, transcribed and annotated by Il-Il Malibert. In Amina Mettouchi & Christian Chanard (eds). The CorpAfroAs Corpus of Spoken AfroAsiatic Languages. DOI: http://dx.doi.org/10.1075/scl.68.website. Accessed on 01/07/2013. (= HEB_IM)
Manfredi, Stefano. 2012. ‘Juba Arabic Corpus’. Corpus recorded, transcribed and annotated by Stefano Manfredi. In Amina Mettouchi & Christian Chanard (eds). The CorpAfroAs Corpus of Spoken AfroAsiatic Languages. DOI: http://dx.doi.org/10.1075/scl.68.website. Accessed on 01/07/2013. (= PGA_SM)
Manfredi, Stefano & Pereira, Christophe. 2013. Arabic Youth Languages in Africa: A Comparative Overview. Paper presented at the African Urban and Youth Languages Conference, University of Cape Town, July 5-6.
Matras, Yaron. 2009. Language Contact. Cambridge: CUP. DOI: 10.1017/CBO9780511809873
McCormick, Kay M. 1994. Code-switching and mixing. In The Encyclopaedia of Languages and Linguistics, Vol. 2, Ron E. Asher (ed), 581–587. Oxford: Pergamon Press.
Mejdell, Gunvor. 2005. Code-Switching. In Encyclopaedia of Arabic Language and Linguistics, Vol. 1, Kees Versteegh, Mushira Eid, Alaa Elgibali, Manfred Woidich & Andrzej Zaborski (eds), 414–421. Leiden: Brill.
Mettouchi, Amina. 2008. Kabyle/French codeswitching: A case study. In Berber in Contact: Linguistic and Sociolinguistics Perspectives, Mena Lafkioui & Vermondo Brugnatelli (eds), 187–198. Cologne: Rüdiger Köppe.
Mettouchi, Amina. 2012. ‘Kabyle Corpus’. Corpus recorded, transcribed and annotated by Amina Mettouchi. In Amina Mettouchi & Christian Chanard (eds). The CorpAfroAs Corpus of Spoken AfroAsiatic Languages. DOI: http://dx.doi.org/10.1075/scl.68.website. Accessed on 01/07/2013. (= KAB_AM)
Muysken, Peter. 1995. Code-switching and grammatical theory. In One Speaker, Two Languages: Cross-disciplinary Perspectives on Code-switching, Lesley Milroy & Peter Muysken (eds), 177–198. Cambridge: CUP. DOI: 10.1017/CBO9780511620867.009
Myers-Scotton, Carol. 1992. Comparing codeswitching and borrowing. Journal of Multilingual and Multicultural Development 13(1-2): 19–39. DOI: 10.1080/01434632.1992.9994481
Myers-Scotton, Carol. 1993. Duelling Languages: Grammatical Structure in Codeswitching. Oxford: OUP.
Myers-Scotton, Carol. 2001. The matrix language frame model: Developments and responses. In Codeswitching Worldwide II, Rodolfo Jacobson (ed.), 23–58. Berlin: Mouton de Gruyter.
Myers-Scotton, Carol. 2002. Language Contact: Bilingual Encounters and Grammatical Outcomes. Oxford: OUP.
Olson, Daniel & Ortega-Llebaria, Marta. 2010. The perceptual relevance of Code Switching and Intonation in creating narrow focus. In Selected Proceedings of the 4th Conference on Laboratory Approaches to Spanish Phonology, Marta Ortega-Llebaria (ed), 57–68. Somerville MA: Cascadilla Proceedings Project.
Poplack, Shana. 1980. Sometimes I’ll start a sentence in Spanish y termino en español: Toward a typology of codeswitching. Linguistics 18(7-8): 581–618. DOI: 10.1515/ling.1980.18.7-8.581
Romaine, Suzanne. 1989. Bilingualism. Oxford: Blackwell.
Sankoff, David & Poplack, Shana. 1981. A formal grammar for codeswitching. Papers in Linguistics: Journal of Human Communication 14(1): 3–45. DOI: 10.1080/08351818109370523
Shenk, Petra Scott. 2006. The interactional and syntactic importance of prosody in Spanish-English bilingual discourse. International Journal of Bilingualism 10(2): 179–205. DOI: 10.1177/13670069060100020401
Tadmor, Uri. 2009. Loanwords in the world’s languages: Findings and results. In Loanwords in the World’s Languages. A Comparative Handbook, Martin Haspelmath & Uri Tadmor (eds), 55–75. Berlin: Mouton de Gruyter.
Tosco, Mauro and Stefano Manfredi. 2013. Pidgins and creoles. In The Oxford Handbook of Arabic Linguistics, Jonathan Owens (ed.), 495–519. Oxford: Oxford University Press.
Vanhove, Martine. 2012. ‘Beja Corpus’. Corpus recorded, transcribed and annotated by Martine Vanhove. In Amina Mettouchi & Christian Chanard (eds). The CorpAfroAs Corpus of Spoken AfroAsiatic Languages. DOI: http://dx.doi.org/10.1075/scl.68.website. Accessed on 01/07/2013. (= BEJ_MV)
Vicente, Ángeles. 2012.
‘Moroccan Arabic Corpus (Ceuta)’. Corpus recorded, transcribed and annotated by Ángeles Vicente. In Amina Mettouchi & Christian Chanard (eds). The CorpAfroAs Corpus of Spoken AfroAsiatic Languages. DOI: http://dx.doi.org/10.1075/ scl.68.website. Accessed on 01/07/2013. (= ARY_AV) Watson, Duane, Breen, Mara & Gibson, Edward. 2006. The role of syntactic obligatoriness in the production of intonational boundaries. Journal of Experimental Psychology: Learning Memory and Cognition 32(5): 1045 -1056. DOI: 10.1037/0278-7393.32.5.1045 Zentella, Ana Celia. 1997. Growing up bilingual. Oxford: OUP. Ziamari, Karima. 2010. Moroccan Arabic-French codeswitching and information structure. In Information Structure in Spoken Arabic, Jonathan Owens & Alaa Elgibali (eds), 189–206. London: Routledge.

Part 5

Information technology

ELAN-CorpA: Lexicon-aided annotation in ELAN

Christian Chanard

CNRS LLACAN, UMR 8135

For a long time, Toolbox has been the most widely used text annotation software in the community of field linguists, especially in African linguistics. However, its limitations, together with the growing need to pair text with sound, have made it important to find another solution for text annotation. This paper, aimed at a readership of information technology specialists, presents the software development conducted within the CorpAfroAs project on the basis of ELAN, a tool developed by the Max Planck Institute in Nijmegen. The result of this development, the ELAN-CorpA software, adds an internal parser linked to a lexicon for semi-automatic interlinearization.

The CorpAfroAs project

One of the main objectives of CorpAfroAs¹ was to create a one-hour corpus of speech recordings in each language involved in the project, prosodically segmented, transcribed and morphosyntactically annotated. These corpora provide the basis for further cross-linguistic research. From the outset, we were aware of a variety of tools that could help us perform at least some of the tasks involved. Among these, ELAN² was the only tool capable of integrating the whole process. However, its capacity for multitiered annotation of media files lacks one significant function that is widely exploited by field linguists using Toolbox:³ lexicon-aided annotation. Since ELAN is open source, we decided to add this function to it.

The challenge was to manage an external lexicon from within ELAN. To do this, we decided to create two panels, one to interface with the lexicon and the other to display the segmentations proposed by the parser. Each of these areas had to be capable of responding to a mouse-click to validate the user’s choices. A new tab, Interlinearize, was added to the main window; it opens a new panel displaying a menu and two sub-panels, lexicon and segmentation (see Figure 1).

1.  DOI: http://dx.doi.org/10.1075/scl.68.website (9 September 2013). The project ended in June 2012 but the website and corpus are still in progress.
2.  ELAN is developed by the Max Planck Institute for Psycholinguistics in Nijmegen; (9 September 2013).

doi 10.1075/scl.68.10cha
2015 © John Benjamins Publishing Company

Figure 1.  Segmentation of the word amoːk

The lexicon panel contains a grid to display the content of the lexicon, and two buttons to Insert a new record or Edit an existing one. Both these buttons let the user toggle between a record and a variant (i.e. the alternate form of a record).

The segmentation panel contains two tabs:
– Parse & Annotate: this tab is where the full process of parsing the words into morphemes and annotating them takes place. The area displays the following options:
    – Full Lexicon removes the filter masking the lexicon entries;
    – Interlinearize launches the process of segmenting and annotating the word;
    – Auto-interlinearize launches the same process and moves on from word to word, stopping only when ambiguities such as homonyms or multiple glosses have to be resolved by the user;
    – A checkbox allows the use of another lexicon containing already processed words to speed up the automatic process;
    – A grid displays the segmentation proposed by the parser.
– Annotate: this tab is for the process of annotation when the segmentation line already exists or if there is no need to parse the words into morphemes. The area displays the following options:
    – Annotate simply annotates words or morphemes without segmentation;
    – Auto-Annotate performs the annotation process and moves on from word to word, stopping only when ambiguities such as homonyms or multiple glosses have to be resolved by the user.

3.  Toolbox was developed by SIL International; (9 September 2013).

The interlinearize process

The interlinearize process begins by searching for the whole word in the lexicon. If the word is not found, the process searches for prefixes and suffixes recursively: each time a prefix or a suffix is isolated, it searches again in the remaining part of the word, until this remaining part is empty or can no longer be segmented. The different possible segmentations of the word, relative to the content of the lexicon, are then suggested to the user. If the remaining part of a segmentation is not empty, it is preceded by an asterisk to show that it is a guessed root not yet listed in the lexicon.

The rootParser looks for the word or the remaining segment in the lexicon. The prefixParser and the suffixParser search all the possible lexicon entries (prefixes and suffixes) to find a match with the beginning or end of the target word or of the remaining segment. These methods are applied to a string that represents the current segmentation of the word in the form:

(prefix‑)*(remainingSegment)(-suffix)*

where the remaining part of the target word is surrounded by two special delimiters, e.g. internationalization => inter-[nation]-al-iz-ation
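To make this string representation concrete, here is a minimal Python sketch that splits such a state into its three parts. The function name `split_state` and the use of square brackets as the two delimiters are assumptions made for this illustration; the text does not say which delimiter characters ELAN-CorpA actually uses.

```python
def split_state(state, open_mark="[", close_mark="]"):
    """Split 'pre-[rest]-suf' into (prefixes, remaining segment, suffixes).

    The bracket delimiters are illustrative only.
    """
    left, rest = state.split(open_mark, 1)
    remaining, right = rest.split(close_mark, 1)
    # Affixes keep their hyphen so that their position stays visible.
    prefixes = [p + "-" for p in left.split("-") if p]
    suffixes = ["-" + s for s in right.split("-") if s]
    return prefixes, remaining, suffixes


parts = split_state("inter-[nation]-al-iz-ation")
start = split_state("[internationalization]")
print(parts)
print(start)
```

At the start of the process the whole word sits between the delimiters, so both affix lists are empty.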

Specific lists from the lexicon

When the ELAN-CorpA lexicon is opened, various arrays are created depending on the type of lexicon entry: rootForms, prefixForms, suffixForms and stemForms. These lists are used by the parser to find a match with the analysed word or part of it. The stemForms array contains the same type of lexicon entries as the rootForms array except that the entries appear in the lexicon preceded or followed by an underscore to show that they are not full words in the same way as roots are. Since the parser must ignore these underscores in the matching process, they are listed in a special array.

Underlying forms are lexicon entries that are analysed by the user as compounds of different entries from the lexicon. Since there can be underlying forms for each type of lexeme (roots, prefixes and suffixes), three corresponding arrays are created: rootUnderlyingForms, prefixUnderlyingForms and suffixUnderlyingForms.
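As a rough illustration of how such lists might be built, the Python sketch below sorts lexicon entries by type and strips the hyphen or underscore before storing the form. The function name `build_form_lists`, the routing of the `lem` and `wf` types to rootForms, and the sample entries (in particular the underscore-marked stem) are assumptions made for this example, not details taken from the ELAN-CorpA source.

```python
from collections import defaultdict

# Entry types as listed in the lexicon description (typ = lem|stem|wf|pref|suf).
# Sending both "lem" and "wf" to rootForms is an assumption of this sketch.
LIST_FOR_TYPE = {
    "lem": "rootForms",
    "wf": "rootForms",
    "stem": "stemForms",
    "pref": "prefixForms",
    "suf": "suffixForms",
}


def build_form_lists(entries):
    """Group (form, typ) pairs into the arrays consulted by the parser.

    Affix hyphens and stem underscores are removed so that the stored
    string can be compared directly with a piece of the analysed word.
    """
    lists = defaultdict(list)
    for form, typ in entries:
        lists[LIST_FOR_TYPE[typ]].append(form.strip("-_"))
    return dict(lists)


# Hypothetical sample entries.
lexicon = [("tuː-", "pref"), ("-t", "suf"), ("taf", "lem"), ("_goːj", "stem")]
form_lists = build_form_lists(lexicon)
print(form_lists)
```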

The parser algorithm

The rootParser method is the entrance point to the interlinearize process. It applies to the whole word (in the script below: annotation), but throughout the process of morphological analysis, when affixes are gradually isolated from the word, the remaining part is surrounded by two specific delimiters that let the parser know which part of the segmented word is to be processed (below: X or Y). The following algorithm explores all possible segmentations based on the lexicon affixes and roots.

rootParser(annotation)
    annotation = prefix* + X + suffix*
        /* X represents the remaining segment surrounded by 2 delimiters */
    if (X) not null
        foreach rootUnderlyingForm
            if match(X, rootUnderlyingForm)
                /* the root that matches is a compound of other roots */
                X => root*
                AddParseToList(prefix* + root* + suffix*)
                Exit    /* only one underlying form for a root */
        foreach rootForm
            if match(X, rootForm)
                /* the rest is a word in the lexicon */
                AddParseToList(prefix* + rootForm + suffix*)
                foundRoot = True
        if foundRoot = False
            /* the root is not in the lexicon */
            /* search for suffixes */
            suffixParser(annotation, False, True)
            /* search for prefixes */
            prefixParser(annotation, True, False)
    else
        /* annotation cannot be segmented anymore */
        AddParseToList(annotation)

prefixParser(annotation, (boolean) foundSuffix, (boolean) foundPrefix)
    annotation = prefix* + X + suffix*
    foreach prefixUnderlyingForm
        if match(X, prefixUnderlyingForm)
            /* the prefix that matches is composed of other prefixes */
            X = (prefix+) + Y
            rootParser(prefix* + Y + suffix*)    /* parse the remaining segment Y */
    foreach prefixForm
        if match(X, prefixForm)
            /* a prefix matches the beginning of the remaining segment X */
            foundPrefix = True
            X = prefixForm + Y
            partialSegmentation.add(prefix* + Y + suffix*)
    if foundPrefix = True
        foreach partialSegmentation
            rootParser(partialSegmentation)    /* parse the remaining segments Y */
    else
        /* no prefix available */
        if foundSuffix = True
            /* coming from a previous suffix segmentation or start of process
               => search for other suffix in X */
            suffixParser(annotation, False, False)
        else
            /* matching no suffix nor prefix for the remaining segment X */
            guessedRoot = "*X"
            AddParseToList(prefix* + guessedRoot + suffix*)

suffixParser(annotation, (boolean) foundSuffix, (boolean) foundPrefix)
    annotation = prefix* + X + suffix*
    foreach suffixUnderlyingForm
        if match(X, suffixUnderlyingForm)
            /* the suffix that matches X is composed of other suffixes */
            X => Y + (suffix+)
            rootParser(prefix* + Y + suffix*)
    foreach suffixForm
        if match(X, suffixForm)
            /* a suffix matches the end of the remaining segment X */
            foundSuffix = True
            X => Y + suffixForm
            partialSegmentation.add(prefix* + Y + suffix*)
    if foundSuffix = True
        foreach partialSegmentation
            rootParser(partialSegmentation)    /* parse the remaining segments Y */
    else
        /* no suffix available */
        if foundPrefix = True
            /* coming from a preceding prefix segmentation or start of process
               => search for other prefix in X */
            prefixParser(annotation, False, False)
        else
            /* matching no suffix nor prefix for the remaining segment X */
            guessedRoot = "*X"
            AddParseToList(prefix* + guessedRoot + suffix*)

AddParseToList

Adds a possible new parse if and only if it is not already in the list.
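The same exploration can be approximated in a few lines of Python. The sketch below is a simplified variant written for illustration, not the ELAN-CorpA implementation: it ignores underlying forms, replaces the foundPrefix/foundSuffix bookkeeping with a single exhaustive recursion, and never strips an affix that would leave an empty remainder. The function name `segmentations` and the English sample lexicon are assumptions.

```python
def segmentations(word, roots, prefixes, suffixes):
    """Return every segmentation of `word` licensed by the three lists.

    A remaining segment that is not a known root and cannot be split
    further is kept as a guessed root, marked with a leading asterisk.
    """
    parses = []

    def add_parse(parse):
        # Same rule as AddParseToList: keep each parse only once.
        if parse not in parses:
            parses.append(parse)

    def explore(done_prefixes, rest, done_suffixes):
        if rest in roots:
            # A known root ends the search on this branch.
            add_parse(done_prefixes + [rest] + done_suffixes)
            return
        progressed = False
        for p in prefixes:
            if p and rest.startswith(p) and len(rest) > len(p):
                progressed = True
                explore(done_prefixes + [p + "-"], rest[len(p):], done_suffixes)
        for s in suffixes:
            if s and rest.endswith(s) and len(rest) > len(s):
                progressed = True
                explore(done_prefixes, rest[:-len(s)], ["-" + s] + done_suffixes)
        if not progressed:
            add_parse(done_prefixes + ["*" + rest] + done_suffixes)

    explore([], word, [])
    return parses


parses = segmentations(
    "internationalization",
    roots=["nation"],
    prefixes=["inter"],
    suffixes=["al", "iz", "ation"],
)
for parse in parses:
    print(" ".join(parse))
```

Because every licensed affix is tried, the result may contain competing analyses, some with a guessed root; this is the situation in which the user is asked to choose the correct segmentation.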

Displaying the results of the parser

The list of morphological segmentations of the target word is then displayed in the segmentation panel grid (see Figure 2).

Figure 2.  Possible segmentations of the word adifhoːb

The user chooses the correct segmentation, then validates each segment of it with reference to the lexicon where one morpheme may have several values (either in terms of the gloss and/or the grammatical category). The lexicon displayed in the lexicon area is filtered so as to display only those entries that correspond to the morphemes of the various segmentations (see Figure 3):

Figure 3.  Lexicon for the segmented word adifhoːb

Validating the morphemes of a segmentation

To select the correct annotations, a double-click on the morpheme in the segmentation area filters the lexicon to the corresponding entry with each value displayed on a different line. A double-click on the correct line in the lexicon validates the morpheme annotation. Then the next unit in the segmentation area is selected and the validation process continues. When the last morpheme is validated, these annotations are copied into the annotation area, underneath the target word.




Adding a new entry to the lexicon from the segmentation area

A remaining segment in the chosen segmentation line is preceded by an asterisk. The segment can be entered into the lexicon by right-clicking on it. A pop-up menu offers a choice between recording it as an entry or a variant. Depending on the choice, the Insert Lexicon Data window or the Insert Variant window opens.

If the parser fails to give the correct segmentation for a word, a right-click on an incorrect morpheme in the segmentation area lets the user access the Insert Lexicon Data or the Insert Variant window, and edit it accordingly. Then the interlinearize process may be relaunched to take this entry into account.

ELAN-CorpA Lexicon Structure

The lexicon used in the interlinearize process is an XML file that is linked to ELAN through its associated .pfsx parameter file:

C:\myProject\myELAN\LexBej.eafl

The ELAN-CorpA lexicon contains both lexical and grammatical morphemes. A lexical entry has three main elements: the citation form of the morpheme, its contextual forms (variants and underlying forms), and its meaning(s). These are organized in the lexicon structure according to a DTD in which elements with an initial uppercase letter have no child element.

The lexicon is composed of lexical entries, each of which has an identifier number and a date of creation/modification. Each entry has three elements:
– the form element, whose value is the entry form and which has a type attribute: root, stem, wordForm, prefix or suffix (typ = lem|stem|wf|pref|suf);
– the alternate-forms element, which contains the various alternate forms (Variants) the entry can present in text, and possibly their Underlying forms;
– the meaning element, which contains the various Glosses of the entry and their associated categories (tierX).

The alternate-forms element may contain several alternate forms. Each alternate form:
– must have a word form element containing the value of that alternate form; this element may have a der attribute identifying the derivation of the alternate form with regard to the entry form;
– may have an underlying form composed of various lexicon entries.

yór H- yor

The meaning element contains the various glosses of the entry with their associated categories tierX.
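The entry structure described above can be pictured with a small data model. The following Python sketch uses dataclasses rather than XML; the class and field names (`LexicalEntry`, `Variant`, `Gloss`, `entry_id`, `tier_x`) are hypothetical, as are the identifier value and the choice of `lem` as the type of the sample entry. Only the kinds of information stored (identifier, date, typed form, variants with optional derivation and underlying form, glosses with a category each) follow the description in the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Gloss:
    value: str    # e.g. "like"
    tier_x: str   # category attached to this particular gloss, e.g. "V2"


@dataclass
class Variant:
    form: str                  # alternate form as it appears in text
    der: Optional[str] = None  # optional derivation label
    underlying: List[str] = field(default_factory=list)  # component entries


@dataclass
class LexicalEntry:
    entry_id: int   # identifier number
    date: str       # date of creation/modification
    form: str       # entry (citation) form
    typ: str        # one of: lem, stem, wf, pref, suf
    variants: List[Variant] = field(default_factory=list)
    glosses: List[Gloss] = field(default_factory=list)


# Sample entry; the identifier 1 and the type "lem" are invented.
entry = LexicalEntry(
    entry_id=1,
    date="14/Jan/2010",
    form="ʔareː",
    typ="lem",
    variants=[Variant("areː")],
    glosses=[Gloss("like", "V2"), Gloss("wish", "V2"), Gloss("want", "V2")],
)
print(entry)
```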




like wish want

Importing a Toolbox dictionary

It is possible to import a lexicon from Toolbox and save it as an ELAN-CorpA lexicon (see Interlinearize menu). The interface allows the user to associate ELAN-CorpA lexicon entities with their corresponding Toolbox fields. The most common associations are:

                   Toolbox    ELAN-CorpA
lexeme             \lx        Lexeme
gloss              \ge        Gloss
part-of-speech     \ps        TierX
alternate form     \a         Variant
underlying form    \u         Underlying-form

TierX versus Part of Speech

In the Toolbox lexicon structure, a part-of-speech \ps covers all the glosses \ge under it.

\ps V2
\ge like
\ge wish
\ge want

So in cases of multiple glosses for a particular lexeme within the Toolbox parser, the category of a specific gloss is implicitly the part-of-speech that precedes that gloss in the lexicon. In the ELAN-CorpA lexicon, however, each Gloss has its own explicit category, tierX, which is not simply a part-of-speech category but can be a label of any kind necessary for the search engine to retrieve a morpheme. When a lexicon is imported from Toolbox, each Gloss will have its tierX attribute filled with the part-of-speech it depends on.
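The propagation of the part-of-speech to each gloss can be sketched as follows. This is an illustrative Python reading of the rule just described, not the ELAN-CorpA importer: the function name `import_toolbox_record` and the dictionary layout are assumptions, and fields other than \lx, \a, \ps and \ge are simply skipped here.

```python
def import_toolbox_record(text):
    """Turn one Toolbox record into a dict with one category per gloss."""
    entry = {"lexeme": None, "variants": [], "glosses": []}
    current_ps = ""
    for line in text.strip().splitlines():
        marker, _, value = line.strip().partition(" ")
        value = value.strip()
        if marker == "\\lx":
            entry["lexeme"] = value
        elif marker == "\\a":
            entry["variants"].append(value)
        elif marker == "\\ps":
            # The part-of-speech applies to every \ge that follows it.
            current_ps = value
        elif marker == "\\ge":
            entry["glosses"].append({"gloss": value, "tierX": current_ps})
    return entry


record = r"""
\lx ʔareː
\a areː
\ps V2
\ge like
\ge wish
\ge want
"""
imported = import_toolbox_record(record)
print(imported)
```

After import, each gloss carries its own copy of the category, so a later edit can change the tierX of one gloss without affecting the others.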


Problem of identifiers when importing the underlying form field (\u)

In the ELAN-CorpA lexicon, the underlying form of an entry or of its variant consists of a sequence of other lexicon entries, which must have identifier numbers (ref attribute). When a lexicon is imported, however, these underlying segments do not yet have identifiers, so they are imported with an empty id attribute. For this reason, during the interlinearize process on any word with an underlying form, the user will have to select from the lexicon the entries that match these different segments. This updates the ref identifiers of the underlying segments of that entry.

Residual fields

During importation, any Toolbox fields not associated with an ELAN lexicon field are saved inside a separate element of the entry.

Example of a simple root entry (no alternate form)

taf take

Example of a prefix entry imported from Toolbox

tuː- \gn art.f.sg.n DEF.SG.F.NOM

Example of a variant (altForm) with multiple glosses

ʔareː




areː like wish want

Example of a main entry with a derived form (altForm) and its function (der) (tikʷ = go_down > atkʷan = go_down\PFV.1SG )

tikʷ atkʷan go_down

Note that, in the case of a verb (V in the tierX field), if there is a nominalised derivation, a new entry has to be made for it, rather than a variant. This is because, even if a variant of an entry may have a derivation category (VN in the der field) which is added to the mb and ge values of the entry in the corresponding tiers, the category that appears in tierX is that of the entry itself (V). So any variant must stay in the same category as that of the entry itself, otherwise a new entry must be created for it.

Example of an entry with a variant and its underlying form

yor yór H- yor

yór H- yor Inac- s’arrêter mv- v


s’arrêter

Exportation of the ELAN-CorpA lexicon to Toolbox

ʔareː areː like wish want

\lx ʔareː
\a areː
\ge like
\rx V2
\ge wish
\rx V2
\ge want
\rx V2
\dt 14/Jan/2010

yor yór H- yor s’arrêter

\lx yor
\a yór
\u H- yor
\ge s’arrêter
\rx v
\dt 12/May/2013
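The field order of these exported records can be reproduced with a short Python sketch. The function `export_toolbox_record` and its argument layout are hypothetical; the sketch only mirrors what the two examples show, namely that each \ge field is followed by its own \rx field and that the record ends with \dt.

```python
def export_toolbox_record(lexeme, variants, underlying, glosses, date):
    """Serialise one entry as a Toolbox record (one field per line)."""
    lines = ["\\lx " + lexeme]
    lines += ["\\a " + variant for variant in variants]
    if underlying:
        lines.append("\\u " + " ".join(underlying))
    for gloss, tier_x in glosses:
        lines.append("\\ge " + gloss)
        lines.append("\\rx " + tier_x)  # one category per gloss, no \ps field
    lines.append("\\dt " + date)
    return "\n".join(lines)


exported = export_toolbox_record(
    "yor", ["yór"], ["H-", "yor"], [("s’arrêter", "v")], "12/May/2013"
)
print(exported)
```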




Parse Lexicon Structure

The ELAN-CorpA parse lexicon is used to speed up the interlinearize process, as it contains already parsed and annotated words. If a parse lexicon is loaded, the user can choose to search there first rather than launching the parsing process directly. The parse lexicon is an XML file that is linked to ELAN through its associated .pfsx parameter file, e.g.

C:\myProject\myELAN\parseBej.eafp

Its structure is as follows. The top-level element is composed of parsed words and has a date of creation attribute dt. Each parsed word has:
– a word form element which contains the value of the word;
– a morpheme element containing the list of morphemes that compose the word. Each morpheme contains the value of the morpheme, and has the attributes ge ‘gloss’ and rx ‘category’.

agoːjt a- goːj -t
ahagit a- hagit
...

The grid for the lexicon

The lexicon is displayed on a grid, with one or more lines per entry. The fields of the grid are:
– Nr: the line number;
– Lexeme: the value of the lexical entry;
– Variant(s): alternate forms of the Lexeme, separated by commas if there is more than one;
– Gloss: when an entry has different glosses, there is one line with the same Lexeme for each of them (for this reason the line number (Nr) is not the same as the identifier number (id) of the lexical entry);
– TierX: the category corresponding to the gloss of the lexical entry;
– Underlying-Form: underlying Segments of a compound entry, separated by a colon.
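The relation between lexicon entries and grid lines can be sketched in Python as follows. The function `grid_rows`, the dictionary layout and the identifier values 7 and 8 are assumptions for illustration; the column names are those listed above.

```python
def grid_rows(entries):
    """Flatten lexicon entries into grid lines, one line per gloss."""
    rows = []
    for entry in entries:
        for gloss, tier_x in entry["glosses"]:
            rows.append({
                "Nr": len(rows) + 1,  # running line number, not the entry id
                "Lexeme": entry["form"],
                "Variant(s)": ", ".join(entry["variants"]),
                "Gloss": gloss,
                "TierX": tier_x,
                "Underlying-Form": ":".join(entry["underlying"]),
            })
    return rows


# Hypothetical identifiers; forms and glosses come from the examples in the text.
entries = [
    {"id": 7, "form": "ʔareː", "variants": ["areː"], "underlying": [],
     "glosses": [("like", "V2"), ("wish", "V2"), ("want", "V2")]},
    {"id": 8, "form": "yor", "variants": ["yór"], "underlying": ["H-", "yor"],
     "glosses": [("s’arrêter", "v")]},
]
rows = grid_rows(entries)
for row in rows:
    print(row)
```

With three glosses on the first entry, the second entry lands on line 4, which shows why Nr and id diverge.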

Sorting

Clicking any column header sorts the Lexicon grid by that field.

New entry

When a new entry is added to the lexicon, it appears at the end of the grid. The user may click on the Lexeme header to see that entry in its place in the lexicon grid.

Editing an entry

When an entry is edited and saved, it is deleted from the lexicon, and saved as the most recent entry.




The segmentation area

This area has two functions. Firstly, a grid displays potential morphological analyses for the target word, with one analysis per line and one morpheme per column. Secondly, when an analysis has been selected, and the selected segmentation morphemes have been validated in the lexicon, a grid shows the glosses. The cells in the grids respond to three MouseEvents:
– Left-click: filters the lexicon in the first grid to show all the entries concerned by the segmentation the cell belongs to;
– Double-click: selects the segmentation, opens the second grid and filters the lexicon to show the entries corresponding to the value of the segment;
– Right-click: opens a pop-up menu with a choice between Insert a Record and Insert a Variant into the lexicon, with the value of the cell as the entry (which can be edited).

Ergonomics of the morpheme glossing in the segmentation area

When a morpheme is selected in the segmentation grid, double-clicking on the corresponding item in the lexicon copies the Gloss and TierX values into the cells under the morpheme in the segmentation grid. Then the next morpheme in the segmentation area is selected (see Figure 4).

Figure 4.  Annotation process

If the morpheme to be glossed is the last item in the segmentation area, selecting its gloss in the lexicon with a double-click launches the process that copies the glossing annotations of the morphemes from the segmentation area to the annotation area, underneath the active word.


The process of creating the annotations under the word to be interlinearized

Once the parser has given a set of possible morphological segmentations for the target word, the user selects the correct one (by double-clicking on one of its morphemes) and chooses the correct annotations from the lexicon. These annotations then have to be created under the word in the annotation area. Since this action modifies the annotation file, there is an option to return to a previous stage if necessary. The process of creating new annotations:
– deletes the existing annotations (i.e. children of the word being interlinearized) with the ELAN undoable command ELANCommandFactory.DELETE_MULTIPLE_ANNOS;
– creates the mb annotations under the current word, using the ELAN undoable commands ELANCommandFactory.NEW_ANNOTATION_VAL for the first child, then ELANCommandFactory.NEW_ANNOTATION_VAL_AFTER for the next ones;
– creates the ge and rx annotations using the same undoable command, ELANCommandFactory.NEW_ANNOTATION_VAL.

When this process is finished, it:
– selects the next word in the annotation area;
– removes the filter from the lexicon and clears the segmentation area.
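The order of these steps can be laid out as a simple plan. The Python sketch below does not call ELAN (which is written in Java); it only lists the command names quoted above as strings, in the order in which they would be issued for one word. The function `plan_annotation_commands`, the tuple layout, the choice to create all mb annotations before the ge and rx ones, and the placeholder gloss and category values are all assumptions.

```python
def plan_annotation_commands(morphemes):
    """List the undoable steps for one word, in execution order.

    `morphemes` is a sequence of (mb, ge, rx) triples validated by the user.
    Each step is a (command name, target tier, value) tuple.
    """
    plan = [("DELETE_MULTIPLE_ANNOS", None, None)]
    for index, (mb, _ge, _rx) in enumerate(morphemes):
        # The first child is created directly, the others after their sibling.
        command = "NEW_ANNOTATION_VAL" if index == 0 else "NEW_ANNOTATION_VAL_AFTER"
        plan.append((command, "mb", mb))
    for _mb, ge, rx in morphemes:
        plan.append(("NEW_ANNOTATION_VAL", "ge", ge))
        plan.append(("NEW_ANNOTATION_VAL", "rx", rx))
    return plan


# Segmentation taken from the parse lexicon example; glosses and
# categories are placeholders, not real annotations.
plan = plan_annotation_commands(
    [("a-", "GLOSS1", "CAT1"), ("goːj", "GLOSS2", "CAT2"), ("-t", "GLOSS3", "CAT3")]
)
for step in plan:
    print(step)
```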

Interlinearize menu

The interlinearize process in ELAN-CorpA requires opening or creating a lexicon, then identifying which tier to start on and which tiers to place the annotations on. This is the purpose of the Interlinearize menu, which contains:

Lexicon
– Create: creates a new lexicon with a .eafl extension;
– Open: locates and opens an ELAN lexicon, linking it to the ELAN file;
– Import: imports a Toolbox lexicon by displaying a frame with two windows letting the user match the Toolbox fields to the ELAN lexicon fields;
– Export: exports the ELAN lexicon in the Toolbox standard format. Instead of a \ps field, there is an \rx field after each \ge field (gloss);
– Save: saves the ELAN lexicon. This is needed because the ELAN file.save command (ctrl/s) does not save the Lexicon files, which are only linked to the ELAN file. When closing ELAN, if the lexicon has not already been saved, a save prompt appears before quitting.

Parse

A file containing the segmentations and annotations of the words of the current ELAN file can be created or merged with an existing one. This is called a Parse Lexicon (cf. Parse Lexicon Structure). A Parse Lexicon can be opened and linked to the ELAN file in order to speed up the interlinearize process. When a parse file is linked to the current ELAN file, a checkbox lets the auto-interlinearization process search that Parse Lexicon for the target word. If the word has already been analysed (and therefore already exists in the parse lexicon), the process copies the segmentations and annotations of the word directly, without asking, then moves on to the next word. It only stops if it finds an unknown word, or if a word appears in the parse lexicon with different annotations (homonyms in the parse lexicon); it then asks the user to make a choice. The parse menu contains three items:
– Export Parse Data: exports the segmentation and annotations of the words of the current ELAN file to a parse lexicon with the extension .eafp. The entries of the parse lexicon are unique unless the segmentation or gloss of a word has been analysed differently in the source file. In such cases there will be homonyms in the parse lexicon. The wordForms added to the parse lexicon are those of the current interlinear tier selected in the Parameters. So, for example, if the ELAN file contains a dialogue, an export & merge should be done to merge the segmented and annotated wordForms of the second speaker (after having changed the Parameters) with those of the first;
– Open Parse Data: locates and opens a parse lexicon, and links it to the current ELAN file, so that it will open automatically the next time the ELAN file is opened;
– Export & merge Parse Data: adds the segmentations and annotations of the words of the current ELAN file to an existing parse lexicon. This selection opens a directory window to locate and select an existing parse file.
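The behaviour of the parse lexicon during merging and automatic lookup can be summarised in a short Python sketch. It is a minimal model under stated assumptions: the parse lexicon is represented as a dictionary from word forms to lists of analyses, the functions `merge_parse_data` and `auto_lookup` are hypothetical names, the ge and rx values are left out for brevity, and the second analysis of ahagit is invented to show a homonym.

```python
def merge_parse_data(parse_lexicon, new_parses):
    """Add (word, analysis) pairs, keeping one copy of each distinct analysis."""
    for word, analysis in new_parses:
        analyses = parse_lexicon.setdefault(word, [])
        if analysis not in analyses:
            analyses.append(analysis)
    return parse_lexicon


def auto_lookup(parse_lexicon, word):
    """Return the stored analysis if it is unique, else None (ask the user)."""
    analyses = parse_lexicon.get(word, [])
    return analyses[0] if len(analyses) == 1 else None


store = {}
merge_parse_data(store, [
    ("agoːjt", ("a-", "goːj", "-t")),
    ("ahagit", ("a-", "hagit")),
])
# Merging the words of a second speaker: one duplicate, one new analysis.
merge_parse_data(store, [
    ("agoːjt", ("a-", "goːj", "-t")),
    ("ahagit", ("aha-", "git")),  # invented competing analysis
])
print(auto_lookup(store, "agoːjt"))
print(auto_lookup(store, "ahagit"))
```

A word with a single stored analysis is copied without asking; an unknown word or a homonym returns nothing, which corresponds to the points where the automatic process stops.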

Parameters (Setting up the Interlinearize or Annotation process)

Interlinearize is a dual process involving both morphological segmentation of a word and glossing of the morphemes. The annotation process has been isolated to allow simple annotation of the elements on a tier by using any ELAN-CorpA lexicon. When the annotation tab is used, the parser is by-passed. It is possible to add new pairs of annotation tiers (with ge and rx types) to a source tier by annotating this tier with a different lexicon.

The interlinearize and annotation processes can be launched on any tier, so the user must choose which one. By default, the names of the added tiers are mb (morpheme breaks), ge (glosses), and rx (categories) but these can be changed. The Parameters menu allows the configuration of each process. Three submenus are available to choose between creating new annotation tiers or using existing ones, and to define characters used to identify affixes in the lexicon.
– Interlinearize Tier Parameter: for setting up the Interlinearize process, i.e. which is the starting tier and which tiers will receive the segmentation and annotations.
    – Configure Interlinearize tiers: displays the Configure tiers window to create new tiers for interlinearization by choosing the segmentation tier from a dropdown list and editing the tier names to be created for annotation (mb, ge and rx by default). If the new tier name already exists, it is created again with a -cp extension;
    – Rename Interlinearize tiers: displays the Rename tiers window to change the current tier configuration for interlinearization.

For these settings to be permanent when closing and re-opening an ELAN file, the parameters are saved in the preferences file (.pfsx) associated with the ELAN file. The keys for these parameters are: Interlinear.Tier.Word, Interlinear.Tier.Parse, Interlinear.Tier.Gloss and Interlinear.Tier.Pos in the following syntax:

mot@SP

– Annotation Tier parameter lets the user set up the Annotation process (for which there is no parsing process):
    – Configure Annotation tiers displays the Configure tiers window to let the user choose the annotation tier from a dropdown list and edit the name of the tier to be created (ge and rx by default). If the new tier name already exists, it is created again with a -cp extension.
    – Rename Annotate tiers displays the Rename tiers window with the default names (mb, ge and rx), and allows the user to edit them. These settings are saved in the .pfsx parameters file in the following elements: Annotate.Tier.Parse, Annotate.Tier.Gloss and Annotate.Tier.Pos.
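The naming rule for newly created tiers can be stated in a couple of lines of Python. This is only a sketch of the rule as described; the function name `tier_names_to_create` and the existing tier names in the example are hypothetical, and the suffix is assumed to be written with a plain hyphen.

```python
DEFAULT_TIERS = ("mb", "ge", "rx")


def tier_names_to_create(requested, existing):
    """Return the tier names actually created, adding "-cp" on a name clash."""
    return [name + "-cp" if name in existing else name for name in requested]


# Hypothetical file that already has a tier called "mb".
created = tier_names_to_create(DEFAULT_TIERS, existing={"ref", "tx", "mb"})
print(created)
```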




– Morpheme break characters: to set up the special characters used in the lexicon to mark prefixes and suffixes (-), clitics (=) and stems (_). These delimiters can be reproduced on the subsequent tiers during the interlinearize process by checking the box transmit morpheme break characters to annotation tiers. This parameter is saved in the .pfsx parameters file as 1 if checked, and as 0 otherwise:

1

Note that, if the .pfsx file associated with an ELAN file is lost, the interlinearize setup will have to be re-done.

LinkedFiles

When an ELAN-CorpA Lexicon or a Parse Lexicon is opened to annotate an ELAN file, it is linked to that file, so that it will open the next time the ELAN file is opened. This menu shows the filenames of the Lexicon and ParseLexicon currently open. Unchecking a filename unlinks the associated lexicon from the current ELAN file, i.e. it deletes the Interlinear.LexiconSource and/or the Interlinear.ParseSource key(s) from the .pfsx file.

Various interfaces to Create or Edit a lexicon entry

During segmentation

The interlinearize process is interactive. The ELAN parser offers possible segmentations of a word with respect to the contents of the lexicon (words and affixes). When a segmentation is incomplete, that is to say a sequence remains that is not in the lexicon, it displays this sequence preceded by an asterisk in the segmentation area. By right-clicking on this segment, one can insert it in the lexicon as a Lexeme or a Variant (the asterisk then disappears).

From the lexicon area

The Insert Record button: this button is reversible. It can be changed from Insert Lexeme to Insert Variant by means of a dropdown list.


From the lexicon grid Once a line is selected in the lexicon grid area, a right-click on it lets the user choose show associated record which opens a show/edit window for this entry.

The Insert Lexicon data window

This window has two tabs (see Figure 5):

Figure 5.  Insert Lexicon Data

– Insert Record is for inserting a new Lexeme with a Gloss, a possible derivation function and a category (TierX). The gloss, derivation and category can all be multiple for the same entry;
– Insert Underlying Form is for entering an underlying form composed of other entries from the lexicon.

The Insert Variant window

This window has two tabs (see Figure 6):

Figure 6.  Insert Variant




– Insert Variant is for inserting an alternate form and selecting the lexicon entry it refers to from a dropdown list. A derivational function can be added in a separate cell which appears at the end of the morpheme and its gloss when it is annotated;
– Insert Underlying Form is for entering an underlying form composed of other entries from the lexicon.
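The relationships handled by these two windows (a Lexeme with several glosses, derivations and categories, variants that point to an existing entry, and underlying forms built from other entries) can be summarised as a small data model. The sketch below is only an illustration: the class and method names are hypothetical, and it does not reproduce the structure of the actual ELAN-CorpA lexicon file.

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    """One gloss with its optional derivation and category."""
    gloss: str
    derivation: str = ""
    category: str = ""


@dataclass
class Lexeme:
    form: str
    senses: list = field(default_factory=list)      # several per entry
    variants: list = field(default_factory=list)    # alternate forms
    underlying: list = field(default_factory=list)  # forms of other entries


class Lexicon:
    def __init__(self):
        self.entries = {}

    def insert_record(self, form, gloss, derivation="", category=""):
        """Add a Lexeme, or a further gloss for an existing one."""
        entry = self.entries.setdefault(form, Lexeme(form))
        entry.senses.append(Sense(gloss, derivation, category))
        return entry

    def insert_variant(self, variant, target_form):
        """Attach an alternate form to an existing entry."""
        if target_form not in self.entries:
            raise KeyError(target_form)
        self.entries[target_form].variants.append(variant)

    def insert_underlying_form(self, form, parts):
        """Record a form as composed of other entries in the lexicon."""
        missing = [part for part in parts if part not in self.entries]
        if missing:
            raise KeyError(missing)
        entry = self.entries.setdefault(form, Lexeme(form))
        entry.underlying = list(parts)
        return entry

    def lookup(self, surface):
        """Find the entry whose form or variant matches `surface`."""
        for entry in self.entries.values():
            if surface == entry.form or surface in entry.variants:
                return entry
        return None
```

Storing variants on the entry they refer to means that a lookup on an alternate form returns the same glosses as the main form, which is one way of obtaining the consistency across the corpus that the tool aims for.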

The Show/Edit window

This window has four tabs (see Figure 7):

Figure 7.  Show/Edit Lexicon Data

– Show Record shows the contents of the chosen Lexeme;
– Edit/Delete Record allows the user to edit the entry value, delete variants of the entry and edit or delete the Lexeme contents;
– In the Gloss values, glosses, derivations and categories can be edited, added or deleted;
– In the Parse values, segments can be changed or deleted.

The entry itself can be deleted by clicking the Delete button. The two other buttons let the user quit or save.

Conclusion

The new interlinearize function included in this version of ELAN-CorpA facilitates the morphological segmentation and annotation work of the field linguist, and above all ensures a certain consistency across the entire corpus. When a text has been annotated with ELAN-CorpA, the resulting file is the same as if it had been annotated by hand or imported from Toolbox. The structure of the ELAN file is not affected. Only the .pfsx parameter file (maintained by ELAN for each file) is different, as it contains keys other than those in the Max Planck Institute’s current release version. These keys let ELAN-CorpA know which lexicons are to be used, where they are located, and what the actual setup of the interlinearize or annotation process is for this file (word and morpheme breaks, glosses, categories and tier names). These keys do not prevent ELAN from opening the file correctly; they are simply ignored. Unfortunately, the interlinearize tool cannot currently be added as a plugin to ELAN, though we would eventually like to develop this. In the meantime, each new release of ELAN requires us to add our own package to its sources and compile it as our own version of ELAN-CorpA.

The main objective of the development reported in this article was to have a tool that would simulate the Toolbox interlinearize process to help researchers in their task. We are confident that we have achieved this goal. Looking ahead, we will be working towards greater flexibility, so as to further expand the audience for this indispensable tool.

Language index

A ‘Afar  231, 233, 234, 235, 237, 238, 238n9, 242, 245, 249, 286n2, 287, 289, 289n3, 292, 292n6 Afroasiatic  13, 28, 40, 43, 63, 76, 117, 118, 211, 212, 221, 223, 236, 244, 252, 253 Amer  287 Amharic  287 Arabic  286, 287 Arabic dialects  90, 242, 287 Libyan Arabic  181n12 Moroccan Arabic  15, 16, 19, 23, 27, 105n15, 108n17, 173, 175, 181, 191n22, 218, 231, 235, 238, 239, 240, 244, 246, 287, 289, 290, 291, 294, 297, 299, 302, 304n13 Sudanese Arabic  286 Tripoli Arabic  63, 96, 98, 100, 101, 102, 104, 105, 107, 112, 113, 287, 288, 301n12, 304 Modern Standard Arabic  287 B Bari  287 Beja  21, 41, 117, 118, 121, 122, 122n3, 123, 124, 131, 132, 133, 134, 136, 140, 143, 144, 161, 164, 165, 166, 222, 238, 240, 242, 245, 246, 249, 286, 287, 289, 290, 290n4, 292, 295, 295n8, 304 Berber  28, 63, 76, 77, 83, 85, 221, 223, 225, 226, 230, 231, 235, 237, 238, 258, 284n1, 288, 297

C Chadic  43, 74, 117, 118, 134, 223, 226, 230, 231, 235, 238, 252, 260, 265, 267 Central Chadic  258, 263, 266 West Chadic  262, 264 Chantyal  141, 152, 160 Cushitic  117, 118, 121, 223, 230, 231, 233, 238, 238n9, 239, 240, 243, 244, 248, 249, 250, 253 D Dolakha Newar  117, 118, 134, 165n18 E English  43, 43n1, 48n10, 59, 64, 67n2, 117, 122n3, 177, 208n1, 209, 210, 214, 215, 216n6, 221, 260, 263, 264, 265, 267, 284, 285, 287, 289, 289n3, 292n6, 297, 298, 299, 300, 301, 303, 304 F French  29n5, 55, 64, 175n5, 218, 221, 262, 287, 289, 289n3, 292n6, 294, 297, 299, 302, 304n13 Frisian  268, 274 G Gawwada  15, 223, 225, 226, 229, 231, 233, 233n8, 234, 236, 238, 239, 241, 242, 243, 245, 246, 247, 248, 249, 250, 251, 287, 288 Gidar  261 Greek  174, 179, 207, 231

H Hausa  43, 55, 71n4, 73, 75, 135, 136, 223, 226, 228, 229, 230n7, 231, 235, 238, 242, 248, 262, 267, 277, 284, 287, 295, 297, 303 Hdi  257n1, 263, 267 Hebrew  16, 18, 23, 24, 26, 30, 32, 33, 34, 35, 36, 160, 196n27, 203, 212, 221, 231, 235, 238, 242, 244, 245, 247, 248, 287, 290 Modern Hebrew  117, 118, 119, 152, 153, 154, 164, 165, 173, 177, 178, 179, 181 I Italian  287 J Juba Arabic  18, 23, 63, 90, 91n9, 92, 94, 95, 96, 97, 113, 117, 118, 119, 143, 144, 145, 150, 152, 161, 164, 165, 166, 181n12, 236, 239, 242, 252, 287, 289, 298, 299, 300, 301, 301n12 K Kabyle  16, 17, 23, 28, 83, 222, 223, 224, 224n4, 226, 228, 229, 230, 231, 232, 235, 237, 238, 241, 242, 243, 247, 258, 268, 269, 270, 272, 273, 274, 275, 276, 277, 284, 287, 288, 295n8 L Latin  174, 208n1, 209, 210, 231

M Mina  257n1, 258, 265, 266, 268, 269, 270, 271, 273, 274, 275, 276, 277 Mupun  264, 265, 267 N Nilotic  287 O Omotic  43, 231, 232, 237, 238, 244 P Polish  264, 265 S Semitic  117, 118, 173, 173n1, 181, 181n12, 184, 185, 186, 193n23, 196, 197, 203, 231, 235, 238, 239, 241, 244, 261, 279, 284n1, 290 Somali  286 South-Bauchi  134 Spanish  287

T Tamasheq  63, 76, 77, 79, 81, 82, 83, 84, 85, 86, 88, 89, 90, 112, 114, 223, 224n4, 225, 226, 227, 228, 230, 231, 235, 238, 247, 249, 250, 288, 301n12 Tawellemmet  77 Tibeto-Burman  117, 118, 141 Ts’amakko  19, 23, 24, 25, 223, 225, 226, 229, 231, 233, 234, 235, 236, 238, 239, 242, 245, 246, 247, 249, 250, 287 Turkish  231, 287 W Wolaytta  43, 231, 232, 234, 238, 240, 242, 245, 246, 248, 287, 288 Z Zaar  43, 44, 44n2, 44n3, 45, 47, 49, 50, 55, 59, 63, 66, 67, 69, 71, 72, 74, 75, 76, 95, 96, 112, 113, 117, 118, 134, 135, 136, 136n11, 140, 142, 143, 144, 152, 161, 164, 165, 166, 223, 224, 226, 228, 229, 230, 235, 236, 238, 242, 287, 301n12 Zargulla  237 Zayse  237

Subject index

A absolute state  82, 84, 85, 88, 112 absolutive  17, 229, 233, 235, 243 accent  (see stress) accusative  209, 210 acrolectal  143 active voice  144 addressee  122, 130, 131, 133, 227, 241, 247 (see also recipient) adjective  244, 245 adverb  20 adverbials  69, 70, 76 adverbial phrases of place  104 adverbial phrases of time  104 afterthought (see topic) agreement  238, 247 ambitransitive  144 anaphora  264 annexed state  82 annexation  184, 186 anteposition (see also word order)  110 apophony  204 articles  186, 205, 206 assertion  66 assertion marker  68 assertion particle  57, 135 negative assertion  99 positive assertion  98 assertive function  84 associative  277 B Background(ed)  54, 59 basic form (verbs)  194 basilectal  143 bell curve  101, 113 bilingualism  287

boundary  boundary break  26 boundary recognition  22, 33 boundary tones  22 major boundary  26 minor boundary  26 major prosodic boundary  166 minor prosodic boundary  155, 166 prosodic boundary 134, 136, 155, 161, 166, 169 break non-terminal break  25 major prosodic break  26 minor prosodic break  26 prosodic break  25, 119 C cataphoric (see pronoun) causative  197 character  122, 122n2, 124, 132 cleft construction  72 cleft sentences  107, 111 clitic marker  121, 153 comment  53, 56, 64, 65, 66, 76, 101, 102, 135 comparative  221 comparative concept  258 complement clauses object complement clauses  121, 153 complementary distribution  259 complementizer  118, 122, 127n6, 134, 135, 136, 140, 141, 142, 143, 144, 149, 153, 161, 165, 226 grammaticalized complementizer  144 conditional  104 conjunction  200n31, 201

constituent order (see word order) construct state  184 contextualization cue  293, 301, 304 continuous speech  123 Continuous tense  135 contour (see intonation) contrastive distribution  259 converb  121, 152, 160 converbal clause  130 conversation  135, 142, 144, 154, 155, 164 coordinated clause  130 coreference  265, 276 coreferential  101, 102, 103 covert  217, 218 D dative  143, 149, 209 de dicto  264 de re  264 declarative  78, 100, 123, 144, 153 declarative sentences  90 declarative utterances  123, 144, 153 declination (see also downdrift, pitch lowering)  48, 48n10, 67, 74, 75, 76, 98, 99, 101, 119, 135 defective verbs  193 definite  152, 153, 208, 209, 211, 260, 267 deictic  265, 266, 273, 274, 279 deictic center  227, 228, 230 deictic shift  118 deixis  264, 274 demonstrative  81, 95, 97, 103, 109, 166, 186, 190, 191, 269, 270, 272, 277

dependent clause  121, 130, 132, 166 derived verbal forms  196 detached elements  (see left-dislocated) determiner  190 deverbal nominal forms  144 devoiced  123, 124 diglossia  287 discourse marker  298, 299 discourse particle  145 ditransitive  121, 135, 143 downdrift (see also declination)  119 downstep  48n10, 49, 135 E elision  (of vowel)  204 embedded language  296 emotion  162 emotional statements  135 emphasis emphasis of adverbials  53, 53n15, 135 emphasis on negation  57, 135 enumeration  79 epistemic  163, 164 evaluative values  163 Evidential  135 exclamation  76, 135 exclamative  123, 128, 144, 146, 148, 153 exclamatory  129 F F0  (see fundamental frequency) fall  55 final fall  67, 68, 74, 76 fall-off  99 feminine  214 finite  130, 132, 160 focal  64, 66, 75, 76 focus argument focus  66, 82, 112, 113 contrastive focus  83, 92, 97, 107, 108n17, 135, 264, 304 contrastive predicate focus  93 counter-assertive focus  92, 94 focus intonation  301, 304 focus marker  92, 105 nominal predicate focus  106

object focus  82 predicate focus  84, 105, 106, 113 predicative relation focus  107 sentence focus  93 subject focus  82 verbal predicate focus  106 focusing  123 formal means  259 frame  75, 76, 96, 104 function word (see word) functional domains  257 fundamental frequency  90, 96, 97, 108n17, 109, 123, 157, 301, 303 future  199 G gender  182, 236 gender polarity  250 gender-sensitive  245, 246 grammatical gender 239 genitive  209, 213, 232, 233, 248, 249 genres  154, 157 grammatical sketch  222 grammatical word (see word) grammaticalized meaning  257, 267, 271 H head nominal head  121, 135, 153, 236, 269 predicative head  135 verbal head  121, 153 hesitation  56, 129, 158, 298 high-rise  56, 70, 76 hollow verb 193 I identifiability  268, 269, 272, 276 ideophone  57 Imperative  123, 128, 147, 192, 262 Imperfective  192 indefinite  260 indexes (verbal)  194 indirect speech (see speech report) indirect quote (see speech report)

infinitive  192 intention meaning  160 interference  94 interlinearize process  313 intersentential codeswitching  294 intonation continuing contour  120, 157 flat contour  144 intonation contour  118, 123, 135 intonation pattern 77, 78, 97 list-intonation  76 neutral intonation pattern  98 suspensive intonation  69, 79 terminal contour  118, 120, 144, 153 intonation unit  21 bilingual intonation unit, 296, 297, 298, 299, 300, 302, 306 intonation-unit boundaries  120, 150, 154, 161 monolingual intonation unit  293, 294, 295, 298, 299 suspended intonation unit  27, 28 truncated intonation unit  27 intonemes  53 initial intonemes 135 final intonemes  135 intrasentential code-switching  296, 301 intra-word code-switching  300 isochronous pattern  147, 152 isotony  23, 24, 25, 35, 119 IU (see intonation unit) L label  184, 186, 187, 188, 189, 192, 194, 214, 319 language-internal  221 left-dislocation  72, 81, 82, 85, 86, 89, 95, 96, 104, 112, 113 lengthening vowel extra-lengthening  298 vowel lengthening  79, 101, 103, 104, 110, 107, 109, 112, 113, 123 final lengthening  119 lexical word (see word) lexicon structure  317 linear order (see word order) LinkedFiles  329


logophoric pronoun (see pronoun) logophoricity  267 loudness  118, 147 M matrix language  296 Matrix Language Frame  296 mention in discourse  274 metathesis  204 modal particle  74 modal viewpoint  (see stance) modality  163 mood  192, 194 morphosyntactic word  (see word) mover  225, 234 N narrative  123, 124, 135, 142, 144, 154 narrator  122, 122n2, 144, 153, 163 natural sex 239 negation  57, 202 nominal clause 203 nominal sentence  106 nominative  232, 233 non-canonical position  133 (see also word order) number  184, 236, 237 numeral  187 O object  188, 218 object pronoun  (see pronoun) onomatopoeia  128, 222 onset  130, 133 opener  135 Optative  160 P paratone  29, 30, 48, 129n7 parenthesis  53, 53n15, 135 Parse Lexicon  323 Parser Algorithm  314 participle  197 particle  103 particle ṛa  105, 106 passive (voice)  144, 197 past  263 patient  121, 214, 232 pause  48, 75, 76, 120, 260, 261

Perfective  192 period  35, 48 pharyngealization  123 phonological word  (see word) pitch emphatic high pitch  95, 96, 303 emphatic pitch rise  300, 301 pitch lowering (see also declination)  98, 99, 100 pitch movement  119, 120, 151 pitch peak  78 pitch range  118 pitch reset  50, 101, 102, 103, 112, 129, 144, 146, 155 pitch rise  100, 101 pitch variation  124, 144, 146, 147, 151, 165 sharp fall in pitch  107, 109 sharp rise in pitch  107 plateau realization  135 plural  216 plurative  252 polar question  67, 80, 91, 91n9, 92, 98, 100, 113, 123, 135 possessive pronoun  (see pronoun) pragmatic function  132 pragmatics  123 preamble  83, 104 preconstruct  64, 65 predicative relation  105 preposition  200 prepositional phrase  102, 103 preverbal position (see also word order)  85 previous mention  275 pronoun cataphoric pronominal clitic  106 clitic pronoun  103, 121 independent pronoun  271, 272 logophoric pronoun  265 object pronoun  121, 122, 133 personal pronoun  188 possessive pronoun  189 relative pronoun  192, 208 resumptive possessive pronoun  87 subject pronoun  264

prosodic boundary (see boundary) prosodic break (see break) prosodic cue  134 prosodic integration cline  118, 124, 136, 144, 152, 154 prosodic movement (see pitch movement) prosodic notation  26 prosodic word (see word) prosody-syntax interface 293 pseudo-passive  164 purposive meaning  152 Q question  123, 144, 153 question word  92 quotative complements  122, 153 quotative frame  118, 122, 134, 144 R recipient (see also addressee)  121, 122, 131, 133, 135, 143, 149, 152, 153, 232, 234 reference (see also coreference, switch reference)  263 deduced reference  266, 276 known reference  274 previous reference  267 unspecified reference  276 reflexive  197 emphatic reflexive  92 register (see also pitch)  53n15, 71, 76, 86, 87, 88, 90, 118 expansion of register  53n15 pitch register 100, 111 register shift  55, 74, 86, 87, 135 relative clause  121, 143, 153, 208, 236, 268, 269, 276 pseudo-relative clause  111 non-restrictive relative clauses  121 headless relative clauses  121, 143 relative marker  127n6, 143, 239, 276 rhetorical question  69, 70, 76 rhetorical strategy  124, 144, 154, 164 rise  56

Rise-Fall  70, 76 rush  129, 130 initial rush  119 S search engine  319 segmentation  17, 21, 22, 23, 25, 26, 28, 114, 119, 312 semantic role  103 semi-direct speech (see speech report) similative marker  154, 156 singulative  252 span  53n15 speech act theory  119 speech processing  132 speech report direct speech report  117, 118, 122, 124, 134, 136, 141, 145, 153, 154, 165 indirect speech report 117, 118, 135, 136, 144, 150, 153, 154, 161, 164, 163, 165 semi-direct speech  135n8 speech tempo  123 stance  163, 227, 228 state 235 step-down  53 step-up (see also upstep)  53 stress 77 accent shift 233, 249 demarcative accent  77 emphatic phrasal accent 120, 123 emphatic stress  135 final lexical stress  100 grammatical stress  123 lexical accent system 153 lexical stress  63, 100 lexical stress system  98 normal phrasal accent 120, 123 nuclear stress  98, 99 phrasal accents  120 pitch accent 90 sentence accent 90

sentence-level prominence stress  98 stress assignment  123 subject pronoun  (see pronoun) suppletive form  153 suspension (see also intonation)  28, 32 switch reference  265, 276 syllables (per intonation units)  135, 144, 145, 153 syncretism  248 syntactic obligatoriness 294 syntactical nucleus  84 T tag-switching  301, 304 target  45 thetic  64, 65, 66, 75, 76, 112, 113 text-sound indexation 28 timing  118 tonal parallelism  (see isotony) tonic parallelism  (see isotony) tone  135 flat tone  135 topic  123, 132, 133, 135 afterthought topic shift  133 contrastive topic  103, 132 object topic  101, 102, 103 specified argument topic  74, 95, 98 subject topic  103 topic resumption   102 unspecified argument topic 95 unspecified topic  74 topical  64, 66 topicalization  135, 265 complex topicalized utterance  103, 104 contrastive topicalisation  143 topicalizing morpheme  135 transcription  13, 14, 15, 27, 33n6 broad phonetic transcription  14

morphophonemic transcription  15 orthographic transcription  14 phonological transcription  215 transitive  121, 135, 152 truncation  27, 32 turn-taking  124 U underlying form 181 unmarked  90 unmarked declarative utterance  78 unscripted languages  117, 164 unspecified  (see reference) upstep (see also step-up)  135 utterance  25 V verb-final  244 verbal noun 198 vocative  129, 145, 147 W Wh-Question  67, 78, 92, 98, 99, 135, 294 word function word  153 grammatical word  204 lexical word  153 morphosyntactic word  18 phonological word  17, 18, 19, 20, 21, 91 prosodic word  17, 19, 20, 21, 153 word order  107, 109, 112, 121, 134, 135, 143, 152, 165, 257, 261 canonical word order  131 inverted word order  109, 112 Y Yes/No-Question (see polar question)