Austroasiatic Syntax in Areal and Diachronic Perspective 9004396950, 9789004396951

This volume elevates historical morpho-syntax to a research priority in the field of Southeast Asian language history, t


English Pages 269 [352] Year 2020




Table of contents :
List of Illustrations and Tables
Notes on Contributors
Introduction: Austroasiatic Syntax in Diachronic and Areal Perspective • Paul Sidwell, Mathias Jenny and Mark Alves
Part 1: Syntactic Reconstruction
1 Verb-Initial Structures in Austroasiatic Languages • Mathias Jenny
2 Initial Steps in Reconstructing Proto-Vietic Syntax • Mark Alves
3 Nicobarese Comparative Grammar • Paul Sidwell
Part 2: Northern Austroasiatic Word Order
4 Word Order and the Grammaticalization of Gender in Khasian • Hiram Ring
5 Word Order in the Wa Languages • Atsushi Yamada
Part 3: Munda
6 Proto-Munda Prosody, Morphotactics and Morphosyntax in South Asian and Austroasiatic Contexts • Gregory D.S. Anderson
7 The Proto-Munda Predicate and the Austroasiatic Language Family • Felix Rau
8 Proto-Kherwarian Negation, TAM and Person-Indexing Interdependencies • Bikram Jora and Gregory D.S. Anderson
9 Relative Clauses in Santali: A Matching Analysis Approach • Mayuri Dilip, Rajesh Kumar, Kārumūri V. Subbārāo, G. Uma Maheshwar Rao and Martin Everaert
Part 4: Grammatical Lexicon
10 Austroasiatic Affixes and Grammatical Lexicon • Mark Alves, Mathias Jenny and Paul Sidwell


Austroasiatic Syntax in Areal and Diachronic Perspective

The Languages of Asia Series

Series Editor
Alexander Vovin (EHESS/CRLAO, Paris, France)

Associate Editor
José Andrés Alonso de la Fuente (Jagiellonian University, Kraków, Poland)

Editorial Board
Mark Alves (Montgomery College)
Gilles Authier (EPHE – École Pratique des Hautes Études, Paris)
Anna Bugaeva (Tokyo University of Science/National Institute for Japanese Language and Linguistics)
Bjarke Frellesvig (University of Oxford)
Guillaume Jacques (Centre de recherches linguistiques sur l'Asie orientale)
Juha Janhunen (University of Helsinki)
Ross King (University of British Columbia)
Marc Miyake (British Museum)
Mehmet Ölmez (Istanbul University)
Toshiki Osada (Institute of Nature and Humanity, Kyoto)
Pittawayat Pittayaporn (Chulalongkorn University)
Elisabetta Ragagnin (Freie Universität Berlin)
Pavel Rykin (Russian Academy of Sciences)
Marek Stachowski (Jagiellonian University, Kraków, Poland)
Yukinori Takubo (Kyoto University)
John Whitman (Cornell University)
Wu Ying-zhe (Inner Mongolia University)

volume 23 The titles published in this series are listed at

Austroasiatic Syntax in Areal and Diachronic Perspective Edited by

Mathias Jenny Paul Sidwell Mark Alves


Cover illustration: This photo of a stone face was taken at Bayon, Angkor Thom in Cambodia (courtesy of Mark Alves, May 23, 2019). Library of Congress Cataloging-in-Publication Data Names: Jenny, Mathias, editor. | Sidwell, Paul, editor. | Alves, Mark J., editor. Title: Austroasiatic syntax in areal and diachronic perspective / edited by Mathias Jenny, Paul Sidwell, Mark Alves. Description: Leiden ; Boston : Brill, [2020] | Series: The languages of Asia series, 2452-2961 ; volume 23 | Includes bibliographical references and index. Identifiers: LCCN 2020007392 | ISBN 9789004396951 (hardback) | ISBN 9789004425606 (ebook) Subjects: LCSH: Austroasiatic languages–Syntax. | Austroasiatic languages– Grammar, Historical. | Southeast Asia–Languages–Syntax. | Southeast Asia–Languages–Grammar, Historical. Classification: LCC PL4281 .A975 2020 | DDC 495.9/3–dc23 LC record available at

Typeface for the Latin, Greek, and Cyrillic scripts: “Brill”. See and download:‑typeface. ISSN 2452-2961 ISBN 978-90-04-39695-1 (hardback) ISBN 978-90-04-42560-6 (e-book) Copyright 2020 by Koninklijke Brill NV, Leiden, The Netherlands. Koninklijke Brill NV incorporates the imprints Brill, Brill Hes & De Graaf, Brill Nijhoff, Brill Rodopi, Brill Sense, Hotei Publishing, mentis Verlag, Verlag Ferdinand Schöningh and Wilhelm Fink Verlag. All rights reserved. No part of this publication may be reproduced, translated, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior written permission from the publisher. Authorization to photocopy items for internal or personal use is granted by Koninklijke Brill NV provided that the appropriate fees are paid directly to The Copyright Clearance Center, 222 Rosewood Drive, Suite 910, Danvers, MA 01923, USA. Fees are subject to change. This book is printed on acid-free paper and produced in a sustainable manner.

Contents

Preface
List of Illustrations and Tables
Notes on Contributors
Introduction: Austroasiatic Syntax in Diachronic and Areal Perspective • Paul Sidwell, Mathias Jenny and Mark Alves

Part 1 Syntactic Reconstruction
1 Verb-Initial Structures in Austroasiatic Languages • Mathias Jenny
2 Initial Steps in Reconstructing Proto-Vietic Syntax • Mark Alves
3 Nicobarese Comparative Grammar • Paul Sidwell

Part 2 Northern Austroasiatic Word Order
4 Word Order and the Grammaticalization of Gender in Khasian • Hiram Ring
5 Word Order in the Wa Languages • Atsushi Yamada

Part 3 Munda
6 Proto-Munda Prosody, Morphotactics and Morphosyntax in South Asian and Austroasiatic Contexts • Gregory D.S. Anderson
7 The Proto-Munda Predicate and the Austroasiatic Language Family • Felix Rau
8 Proto-Kherwarian Negation, TAM and Person-Indexing Interdependencies • Bikram Jora and Gregory D.S. Anderson
9 Relative Clauses in Santali: A Matching Analysis Approach • Mayuri Dilip, Rajesh Kumar, Kārumūri V. Subbārāo, G. Uma Maheshwar Rao and Martin Everaert

Part 4 Grammatical Lexicon
10 Austroasiatic Affixes and Grammatical Lexicon • Mark Alves, Mathias Jenny and Paul Sidwell

Index


Preface

This volume grew out of the Austroasiatic Workshop on Comparative Syntax held September 5–7, 2016, at the Myanmar Center, Chiang Mai University, Thailand. The workshop was conceived as a sister event to the International Conference on Austroasiatic Linguistics (ICAAL), which is not held every year. At the 2015 ICAAL, held in Siem Reap (Cambodia), it was decided that it would be appropriate to organise working meetings in off-years, at which participants could present and discuss work tackling specific themes of programmatic and topical importance. This is in contrast to the open sessions of the ICAAL, at which individual scholars present their talks according to their personal priorities, without necessary reference to a common theme.

The workshop was held in Chiang Mai with the financial support of the Max Planck Institute Jena (Department of Linguistic and Cultural Evolution) and the University of Zurich (Department of Comparative Linguistics) in cooperation with Chiang Mai University (Myanmar Center). This generous support facilitated the participation of some 16 scholars, plus attendance by various local staff and students. Participants came from Australia, Germany, India, Japan, Singapore, Switzerland, Thailand, and the USA, and included scholars from outside Austroasiatic studies who have expertise in areal and typological linguistics and syntax, as well as history. This proved to be a successful strategy, allowing discussions to be grounded in a wider typological and historical context, in addition to Austroasiatic etymological and descriptive aspects, and generating lively and productive discussion.

The plan for the meeting arose when the question of Austroasiatic historical syntax was raised in Siem Reap in the context of Jenny’s (2015) groundbreaking chapter presenting evidence that VS/VAP word order may have been dominant in proto-Austroasiatic clauses. The novelty and audacity of the claim reminded us all just how much syntactic studies have been absent from, or poorly handled in, Austroasiatic language descriptions, and attention was galvanised around the prospects of integrating syntax into the investigation of Austroasiatic language history, including reconstructing syntactic structures of the proto-language.

The chapters that emerged, and were subsequently brought together for this volume, demonstrate a variety of investigations into syntactic change, including reconstruction of word order at phrasal and clausal levels, syntactic dependencies, and grammaticalizations, as well as their motivations in terms of internal dynamics and areal contexts. The authors use different methodologies in their approach to the task of explaining the syntactic diversity of the AA family.

Illustrations and Tables

Maps
0.1 Map of AA languages
3.1 Nicobar languages from Wurm & Hattori (eds.) (1981/83)

Figures
1.1 Distribution of verb-initial patterns in AA
2.1 Vietic sub-branches and languages (Sidwell 2015: 203–205)
3.1 Nicobar languages tree.
6.1 Classification of the Munda languages using lexical and grammatical data
7.1 Proto-Munda clause
7.2 The syntactic positions and reconstructed morphemes
7.3 The proposed prosodic structure associated with the predicate
7.4 Pinnow’s reconstruction of the proto-Munda predicate/clause
7.5 Zide & Anderson’s 2001 reconstruction of the proto-Munda verb
7.6 Anderson’s 2007 reconstruction of the proto-Munda verb
7.7 Comparison of Bahnaric and proto-Munda causatives
7.8 Morphological of the proto-Munda verb according to Anderson (2007)
7.9 The predicate in Kharia according to Peterson (2011, p. 335)
7.10 Prosodic structure of the verb and pre-verbal positions in Kharia
7.11 Proto-Munda pre-stem structure
7.12 The proto-Munda clause
7.13 The proto-Munda clause compared to other Austroasiatic languages
7.14 Prosodic structure of the proto-Munda predicate
7.15 proto-Munda clause
7.16 The syntactic positions and reconstructed morphemes

Tables
1.1 Verb-initial structures in AA
2.1 Sociocultural and linguistics aspects of Sinitic and Vietic in the Han Dynasty
2.2 Shared Vietic grammatical vocabulary
2.3 Key syntactic data sources for Vietic languages
2.4 Hypothesized features of the Proto-Vietic clause
4.1 Khasian varieties and verb-initial structures
4.2 Personal pronouns in Khasian Pnar
4.3 Summary of pronouns in Austroasiatic languages
4.4 The Man Noi Plang pronominal paradigm (Lewis 2008)
4.5 Some Pnar expressives and elaborate expressions
4.6 Schematic examples of gender marking development in Khasian
5.1 Clause structures
5.2 Word order and subject type
5.3 Information structure
5.4 Word order in subordinate clauses
5.5 Word orders in the Awa, Parauk, and Va languages
6.1 Distribution of negators in Munda
7.1 Causative morphemes in Munda languages
7.2 Verbs with noun incorporation
7.3 Reciprocal morphemes in Munda languages
7.4 Negation morphemes in Munda languages
7.5 Past/perfect morphemes in Munda languages
7.6 Future/imperfect morphemes in Munda languages
7.7 The position of person markers in Munda languages
7.8 Pronouns and subject markers in Munda languages
7.9 Pronouns and object markers in Munda languages
7.10 Common morphological slots with bound morphemes in Munda languages
7.11 The prefix domain in Munda languages
8.1 Subject-marking patterns in Kherwarian negative formations
8.2 Subject-marking patterns in Kherwarian negative formations
8.3 OBJ vs. SUBJ encoding in Kherwarian PRS (negative) copula forms
8.4 OBJ vs. SUBJ encoding in Kherwarian PST (negative) copula forms
9.1 Syntactic operations of five types of analysis for relative clauses
10.1 Proto Austroasiatic pronouns/demonstratives (Pinnow 1965)
10.2 Proto Austroasiatic pronouns/demonstratives (Pinnow 1966: 167)
10.3 Proto Austroasiatic Pronouns/demonstratives (Sidwell, 2015)
10.4 Referential terms
10.5 Interrogative terms
10.6 Locative terms
10.7 Grammatical(ized) verbs


Notes on Contributors

Mark Alves is a professor in the Department of Reading, ESL, and Linguistics at Montgomery College in Rockville, Maryland (USA). His research has focused on historical, comparative, and typological linguistics in Southeast Asia, especially Vietnamese and the Austroasiatic language family. Mark is also the Editor-in-Chief of the Journal of the Southeast Asian Linguistics Society (University of Hawaii Press).

Gregory D.S. Anderson, Founder and Director of the Living Tongues Institute for Endangered Languages, is a specialist in the Munda language family. He is currently surveying Munda to develop studies on the typology and reconstruction of the family, and grammatical, lexical and text materials of individual Munda languages.

Mayuri Dilip is a research scholar in linguistics in the Department of Humanities and Social Sciences at the Indian Institute of Technology Madras, Chennai. She is working on selected aspects of the syntax of Santali for her doctoral dissertation. Mayuri works on syntactic typology, where she compares the structure of Santali with some South Asian languages.

Martin Everaert is professor of Linguistics at Utrecht University. He works primarily on the syntax-semantics and the lexicon-syntax interface. His other areas of interest are language evolution and the history of linguistics. He is, among other things, co-editor of the Wiley-Blackwell Companions to Linguistics.

Mathias Jenny is a senior researcher and lecturer at the Department of Comparative Linguistics, University of Zurich, Switzerland. His main fields of interest are language contact and language change in Southeast Asia, with a special focus on the languages of Myanmar/Burma, on which he has conducted fieldwork and published widely over the past twenty years. Mathias was an editor of the 2015 Brill The Handbook of the Austroasiatic Languages.

Bikram Jora is a project coordinator for South Asia at the Living Tongues Institute for Endangered Languages. His research has mainly focused on Munda languages, especially the Kherwarian languages. He is an expert field linguist who has documented many Munda and Kho-Bwa languages spoken in India.

Rajesh Kumar teaches linguistics in the Department of Humanities and Social Sciences at the Indian Institute of Technology Madras, Chennai. The broad goal of his research is to uncover regularities underlying both the form (what language is) and sociolinguistic functions (what language does) of natural languages. He obtained his Ph.D. in linguistics from the University of Illinois at Urbana-Champaign.

G. Uma Maheshwar Rao is a Professor at the University of Hyderabad, India. His research interests cut across the fields of Language technology for Indian languages, Linguistic Genetics, Human Migrations, Linguistic Archaeology, Long Range Comparison of Central Asiatic language families, and empirical concerns of the economics of Indian languages.

Felix Rau, a researcher at the University of Cologne, has conducted extensive fieldwork on Gorum and has been working in the Koraput Area of Odisha (India) since 2002. His research interests include the Munda languages and the historical relationship of this branch to the rest of the Austroasiatic family.

Hiram Ring is a researcher at the Department of Comparative Linguistics, University of Zurich, Switzerland. His research has focused particularly on Khasian languages, but also extends more broadly across phonology, morphology, and syntax, and encompasses diachronic, synchronic and areal perspectives.

Paul Sidwell is an honorary associate at Sydney University, and a founding partner of the firm Language Intelligence (Canberra). His research career has focused on the reconstruction of Austroasiatic language history at the branch and family level, and wider implications for the history of Mainland Southeast Asia. Paul was an editor of the 2015 Brill The Handbook of the Austroasiatic Languages.

Kārumūri V. Subbārāo was formerly professor of Linguistics at Delhi University and chair professor at Hyderabad University. His research is focused on the syntactic typology of South Asian languages in general, and Austro-Asiatic and Tibeto-Burman languages in particular. At present, he is working on Mizo and Rabha grammars.

Atsushi Yamada is a professor of linguistics at Japan Health Care College. His research concerns documentary linguistics and linguistic anthropology, and he has conducted field research on minority groups in China and northern mainland South-East Asia. He is currently working on the project “New perspectives of Text Studies in Yunnan, China”.


Introduction: Austroasiatic Syntax in Diachronic and Areal Perspective
Paul Sidwell, Mathias Jenny and Mark Alves


1 Diachronic Syntax

Diachronic syntax has become a field of serious study only in the last couple of decades, much later than diachronic phonology and morphology. The study of diachronic syntax includes all aspects of syntactic change caused by language-internal and external factors. Some areas of historical syntax are better developed than others; for example, grammaticalization theory (Traugott & Heine eds. 1991, Hopper & Traugott 2003, Heine & Kuteva 2007, etc.) and areal and contact linguistics (Thomason 2001, Heine & Kuteva 2005, Matras & Sakel eds. 2007, Chamoreau & Léglise eds. 2012, etc.) have enjoyed decades of attention, and their assumptions and methods are widely understood among scholars. We see this applied to explaining the divergent word order typologies across Austroasiatic, as branches came into contact with other language families and underwent convergence within language areas. Grammaticalization theory underlies the reconstruction of the emergence of inflectional morphology in Munda and Nicobarese, and the widespread use of semantically bleached open class lexemes as grammatical functors among the more configurational Austroasiatic languages.

On the other hand, there remain especially challenging issues such as the nature and viability of syntactic reconstruction: not merely the modeling of syntactic change over time, which may be grounded in historical texts, but the application of ideas and practices grounded in the comparative method to recover the structures of proto-languages on the basis of synchronic descriptive data. There has been increasing discussion of the theoretical and empirical problems around comparative syntactic reconstruction (e.g. Pires & Thomason 2008, Barðdal & Eythórsson 2012, Walkden 2013), focussing on issues such as the units of comparison and level of abstraction, and testing the analogy with phonological reconstruction.
Importantly, such studies seek to go beyond the Indo-European grounded traditions that largely relied upon historical texts to establish earlier linguistic states and model their development over time. The rising wave of interest in historical syntax has been largely uncoordinated,

© koninklijke brill nv, leiden, 2020 | doi:10.1163/9789004425606_002


Map 0.1 Map of AA languages reproduced from van Driem (2001: 267) with gracious permission

although volumes like this one and Viti (ed.) (2015) are pulling together work under the rubric, and focussing attention on the various issues raised by the work being done.

Although historical syntax is necessarily grounded within a general theory of language, this volume focuses on Austroasiatic because of the special opportunities it presents for historical syntactic studies. The typological diversity and areality of the family are quite well understood, we have some 1500 years of written history for two of the branches (Mon and Khmer), the phonological and lexical reconstruction is well developed, we know much about the history of the Mainland Southeast Asian region, and useful (and in some cases extensive) textual and descriptive resources are available for all branches of the family.

Austroasiatic studies saw a general renaissance in the 2000s; regular meetings of the International Conference on Austroasiatic Linguistics resumed in 2006, in the same year Harry Shorto’s A Mon-Khmer Comparative Dictionary was posthumously published, and a substantial two-volume handbook (Jenny & Sidwell eds.) appeared in 2014. The recent era of documentary linguistics has seen a dramatic improvement in the extent and quality of descriptive resources being produced, and electronic data sharing and processing have made previous generations of work available and useful to researchers in ways never imagined by their authors. This favourable environment allows us to move beyond traditional historical-linguistic concerns with classification, phonology, and lexicon, and to elevate historical morpho-syntax to a research priority.

Below we briefly review Austroasiatic historical studies and lay out our programmatic perspective on the way forward for the historical investigation of syntax. This discussion grew directly out of the 2016 Chiang Mai workshop discussions, and we hope that it will help to frame readers’ thoughts as they consider the various chapters herein, as well as provide some clarity and direction for ongoing and future studies. If our ultimate goal is to understand and reconstruct as fully as possible an account of Austroasiatic language history, it is essential that some kind of accessible road map is sketched out for discussion and consideration.


2 Austroasiatic Historical Studies

The Austroasiatic language family has received considerable attention from comparative linguists since the late 19th century. The main methods employed in language classification have always been heavily lexical, and to a lesser extent typological, but a specific focus on syntactic features has been missing. The unity of the family was contested for decades, resulting at times in the separation of the Munda group and Vietnamese from the rest of the family, and the Nicobarese varieties also posed special problems in terms of figuring out how they fit into the family.

In 1904, Schmidt, relying on numerals, placed Munda with the Indo-China Austroasiatic languages. Serious divisions of opinion arose in the 1930s, and in 1942 Sebeok outlined a general case against a connection between Munda and other Austroasiatic languages, asserting that the lexical parallels are unconvincing and ought to be subordinate to consideration of structural features. By the 1960s, while the arguments remained largely lexical, scholars had begun increasingly to point out the structural differences between the rather consistent verb-final word order of the Munda languages of India and the rest of the family spoken in Southeast Asia (termed “Mon-Khmer” by many authors), with flexible verb-medial clause structures which also allow for pragmatic variation. So, while in 1959 Pinnow firmly re-established the credibility of Munda within Austroasiatic, again relying on lexical (and phonological) evidence, he also soon went on (1963) to raise the issue of “structural classification”, characterising the problem as follows:

On the basis of their syntactical framework the Austroasiatic languages may be classified into two groups: (a) the largely co-ordinating and analytic Khmer-Nicobar languages, and (b) the largely subordinating and synthetic Munda languages.1
Pinnow 1963: 145

This framework was further elaborated upon by Donegan and Stampe at the 1978 SICAL meeting and in various subsequent publications (1983, 1993, 2004), bringing in a new dimension in terms of the phonology/syntax interface. Yet we can say that, so far, when Austroasiatic syntax has been considered, the facts have typically been characterised very broadly, without the specifics that would allow for confident reconstructions of syntactic structures.

Much progress has been made in the late 20th and early 21st century in refining (and rearranging) the classification and development of the Austroasiatic languages (see Sidwell 2009 for an overview), and we now have a much clearer idea of the overall shape of the Austroasiatic language family. It has become evident that many of the differences between Munda and the rest of Austroasiatic are due to areal influences, although it is extremely problematic to distinguish between contact-driven and accidental convergence within the MSEAsian Linguistic Area (Sidwell 2015). Thus, Munda has converged structurally in many respects with dominant neighbouring Indo-Aryan and Dravidian languages, but not without leaving traces of older structures at several points. On the other hand, no shared innovations suggest that “Mon-Khmer” is actually a subgroup at the same level as Munda; the evidence rather points towards a flat tree structure, with Munda splitting off at the same level as the other branches (Sidwell & Blench 2011).

While the classification and reconstruction of Austroasiatic has been seriously pursued by scholars and has delivered real progress in understanding, similar energy has not been invested in typological, especially syntactic, studies of Austroasiatic languages.
For a long time the linguistic material available for comparative studies was restricted to wordlists of varying quality and (phonetic/orthographic) accuracy, or hidden in texts without much in terms of annotation and analysis (one laudable exception is Costello and Sulavan’s (1993) collection of Katu folktales, complete with transcription and word-by-word glosses). Fortunately, the tide has begun to turn and the 21st century has seen the publication of a number of excellent full grammars and grammar sketches of Austroasiatic languages written in modern linguistic terminology (e.g. Burenhult 2005, Kruspe 2004, Peterson 2011, Haiman 2011, Alves 2006, Jenny 2005, among others), as well as the two-volume Handbook of Austroasiatic Languages (Jenny & Sidwell 2015). With the availability of good descriptive material, it has become possible to look at the Austroasiatic languages also from a syntactic typological perspective (e.g. a typological overview of Austroasiatic by Jenny, Weber, and Weymouth 2015 and Austroasiatic typological morphology by Alves 2014 and 2015).

The family has potentially much to offer in different areas, from insights into the workings of language contact to internal syntactic change and the areality and spread of features. The superficial typological differences among Austroasiatic languages often reflect the history of migrations and contact with unrelated peoples and languages, which have left traces in the profile of the documented varieties. Behind these traces of contact-induced features lies the core of proto-Austroasiatic syntax and morphology. This original core can be worked out in much the same way as we strive to separate indigenous vocabulary from loans which entered all languages at different times and from different sources. Teasing apart inherited and contact-induced traits has to rely upon knowledge of: a) what is common to all or most languages within the family, b) corresponding structures in potential contact languages, both past and present, and c) consideration of local or sub-regional patterns, relevant to a) and b). Besides fitting the linguistic data, syntactic reconstruction, like lexical comparison, must be reconcilable with insights from history, geography, and population genetics.

1 This was apparently not meant to be a characterisation of the phylogenetic relations within the family, although the remark subsequently influenced thinking on this question.


3 Questions and Methodology

The basic questions to be answered seem simple enough: What is where, why, and when? What categories, structures and patterns are found in which languages of the family and in which areas within specific languages (clauses, phrases, subordinate clauses, etc.)? How can we explain the occurrence of these structures where we find them? Are there genealogical and/or areal patterns in the distribution? Can the structures be explained by biases in language processing and language use?

Simple as the questions ‘what, where, why, when?’ appear, there are a number of methodological and practical obstacles in answering them. How can we get around the lack of certain historical data in most cases when it comes to Austroasiatic languages, their speakers and migrations? How do we get past the lexical and phonological reconstructions of sub-branches and look at syntactic features? While sound changes are typically relied upon to be regular and systematic, it is generally regarded as problematic to expect syntactic changes to follow regular or predictable paths (Harris & Campbell 1995).

The general challenge is to ask the right questions in the right context and find a methodology or methodologies to answer them. The following points should be considered when looking at the development of Austroasiatic syntax:

– What are the most appropriate methods to reconstruct Austroasiatic syntax? How much can we say about the syntax of Proto-Austroasiatic or of later periods? What are the limits of what can be explained?
– What features are found across the family? Special importance is to be given to features that are found in widely dispersed languages and are neither explainable by language contact (areal convergence) nor typologically very common. This leads to the question “What were the syntactic traits of Proto-Austroasiatic and of later stages of Austroasiatic in various sub-branches?”
– (To what extent) are typological exceptions or unexpected structural features evidence of events in the history of the language?
– When changes occurred, under what circumstances, and due to what factors did they occur?
– How can we distinguish shared original structures from shared changes due to spread of features (e.g. the general SV/AVP structure in MSEA)? How can we distinguish external factors (language contact) from internal factors (change by language use, pragmatic variation, etc.)?
– To what extent can the timing of stages be established for historical syntactic developments in Austroasiatic?

3.1 Some Questions and Challenges in the Undertaking of the Syntactic Reconstruction of Proto-Austroasiatic

– There are many gaps in available data, and the available sources of descriptive materials vary greatly in quality and comparability. How can this situation be overcome? How can we weigh the reliability of inferences and analyses based on these data?
– Which features are retentions? Which are changes? Of changes, are they internally or externally motivated? It may be difficult to find reliable criteria by which to answer these questions in many or most cases, but this shouldn’t preclude the attempt to find answers.
– How can we reconstruct syntactic structures when there is uncertain status of some lexical categories and subcategories? Alternatively, if we take the view that word classes are emergent properties of syntactic structures, rather than the latter being constrained by them, do we gain more useful traction in modeling historical syntax?
– How can we distinguish SV/AVP from topic-comment constructions? Or more generally, how can we distinguish syntactic from pragmatic features in Austroasiatic grammar?

3.2 Time Depth to Consider within the Phylum

Other questions to consider involve historical timing. Can we assign time depths to the different genealogical levels of the Austroasiatic languages, as in the sample list below? How can we correlate the linguistic periods with historical periods? Do we actually need this information for a successful reconstruction of Austroasiatic syntax?

– Proto-Austroasiatic
– Main branch (e.g., Vietic, before and into early contact with Chinese)
– Sub-branch (e.g., Viet-Muong, possibly post-Chinese independence)
– Modern (e.g., Muong varieties)

3.3 Questions of Geography

The data should answer questions about the timing but also historical geography of the different Austroasiatic groups and the proto-language, and possible correlation with regional languages with some known history (e.g. Chinese, Cham, Pali, etc.).

– What can we say about the approximate center of dispersal of Austroasiatic?
– Is (the knowledge of) a geographic center important or not? Should it not be possible to reconstruct without explicit reference to dispersal area?
– What does the morphosyntactic data suggest regarding the separation of earlier Austroasiatic speech communities and issues such as community size, language shift, bottlenecking of populations, and so on?

3.4 Questions of Society

In many cases peripheral societies and peoples (‘residual zones’) tend to be more conservative than centralized societies (‘spread zones’) (Nichols 1996), or at least retain archaic features which are more recognisable as such. Language change is often triggered by L2 speakers, which suggests that peripheral languages with no or few L2 speakers retain archaic structures better than central languages (Næss & Jenny 2011).

– How can we use information about the social structure of a people to assess the status of their language?
– Are peripheral languages really more conservative than central languages?

4 Data Sources

The sources of data include modern, ancient, and comparative data. An important point to keep in mind is that in many cases grammatical descriptions are not sufficient as reliable sources of data. Often one has to resort to texts to find certain structures, which may be ‘minor use patterns’ (Heine & Kuteva 2005) in a language, but go back to older stages and may be retentions of the proto-language. Various sources are available, and the number of good quality linguistic descriptions and annotated and glossed texts in Austroasiatic languages is increasing steadily. The following can be considered the main types of data sources for the comparative study of Austroasiatic syntax, to which data gathered through fieldwork may be added, both elicitation and recordings of natural speech.
1. Grammatical descriptions of individual languages and features
2. Texts of individual languages (ideally glossed and annotated)
3. Historical documents and inscriptions (e.g. Mon, Khmer, Vietnamese)
4. Reconstructions of individual branches and sub-branches
5. Typological comparison within and outside the family


5 Syntactic Aspects to Consider

Diachronic syntactic studies can include not only core clause and phrase structures but also functional categories, lexical categories, and etymological sources and development of grammatical vocabulary. The following aspects may be worthy of more in-depth investigation in Austroasiatic languages.

5.1 Constituent Order in Clauses (Dependent, Independent)

Different word order types are found in Austroasiatic languages, at least to some extent coinciding with general areal patterns. The bulk of Austroasiatic languages on the SEA mainland between Myanmar and Vietnam are verb-medial, with the main word orders SV and AVP. In all or most of these languages, arguments can be freely dropped if they are known or retrievable from the linguistic or extra-linguistic context. For pragmatic reasons, arguments and peripheral elements can be fronted or placed after clauses as afterthoughts (Antitopics). This leads to great variation in the possible constituent orders in the eastern Austroasiatic languages. The variation is in most cases restricted to independent (main) clauses, with dependent clauses generally exhibiting the basic word orders SV and AVP. This can be explained by the fact that subordinate clauses are often presupposed and thus less or not at all accessible to pragmatic processes.



The Munda languages of India are rather consistently verb-final, or more generally head-final, a characteristic which can be considered a case of convergence towards the neighboring Indo-Aryan and Dravidian languages. The Nicobarese varieties, heavily affected by language contacts despite being geographically separated from the mainland, show generally verb-initial structures in main clauses and verb-medial in dependent clauses. The picture gets more complex when considering more peripheral constructions and geographically more peripheral languages. In Mainland SEA, verb-medial is the dominating word order in all major languages (apart from pragmatically triggered fronted constituents and afterthoughts). Languages without scripts or standardized grammars show more variation, though, and verb-initial clauses are not uncommon, e.g. in Katu (Costello & Sulavan 1993). In Rumai (Palaungic) most types of subordinate clauses are verb-initial, and in Wa verb-initial structures are found mainly in subordinate clauses, but also in main clauses. While Standard Khasi is mostly verb-medial, the less standardized varieties such as Pnar and War are mostly verb-initial. Even in generally verb-final Munda languages, agreement patterns and noun-incorporation show verb-initial structures which can be taken as more archaic than the verb-final clause structures. In Aslian, verb-initial clauses are found in different contexts, and are claimed to be the most frequent order in Semelai (Kruspe 2004). In Nicobarese main clauses are generally verb-initial and subordinate clauses verb-medial. While the verb-final patterns in Munda and verb-medial structures in mainland SEA can be explained as areal features, language contact cannot straightforwardly account for the verb-initial patterns found throughout the family (with the possible exception of Nicobarese and Aslian, where Austronesian influence may have played a role).
In light of this kind of variation, what can we conclude from the distribution of word order types in Austroasiatic languages? What can be attributed to areal convergence? One possible explanation is that the proto-language at least allowed verb-initial word order, possibly with pragmatically triggered variation.

5.2 Information Structure

The encoding of information structure is closely linked to the word order variation found in the family, though not restricted to it. Semelai (Aslian) is reported to have a tendency towards “new information before old information” (Kruspe 2004), which also seems to be the case for Nicobarese. Eastern Austroasiatic languages show topic-comment structure, which is also true if the topical element is the verb and the predicate a nominal (e.g. ‘left is only rice, the curry is all eaten’). Some languages use explicit markers indicating the information structural status/function of constituents (e.g. Mon, see Jenny 2006, 2009).



Thus, one must ask, what means are found in Austroasiatic languages to encode information structure? What can we say about the changes and development of information structure encoding? To what extent is the encoding of information structure mirrored in neighboring non-Austroasiatic languages? Can we draw any conclusions for the proto-language: can we suggest that pAA had a comment-topic structure which later changed to topic-comment in some languages?

5.3 Phrase Structure: Components of Verb Phrases and Noun Phrases

Different elements can be combined to make up noun phrases and verb phrases, from relatively simple to quite morphologically complex. While highly isolating Vietnamese does not seem to make a distinction between the word level and the phrase level, the situation is less clear in other languages. Khasi, Nicobarese and Munda make use of noun incorporation or compounding, which result in word forms syntactically distinct from phrases. Yet this is not clearly the case in Mon and Palaungic, for example. Verb phrases can consist of a number of verbal and non-verbal elements. Questions to consider include: Which elements in which languages can be attributed to language contact and areal convergence? What are the regularities in the ordering of elements? Can we say anything conclusive about the elements making up an NP and a VP in proto-Austroasiatic?

5.4 Subordinate Clauses: Form and Functions

Subordinate clauses are cross-linguistically not a unified category, neither in forms employed nor in functions covered. Adverbial, complement, and attributive clauses can behave very differently, and can take different markers within various branches and individual languages. A subordinate clause may or may not take an overt subordinator, which may occur in clause-initial or clause-final position, or within the subordinate clause, like the relativizer in Middle and Modern Mon.
Subordinate clauses have been said to be more conservative than independent clauses (Bybee 2001), so they may be useful in identifying older stages of the languages. Additionally, pragmatic word order variation is often restricted in subordinate clauses. Questions to consider include: What forms of subordinate clauses are found where in the family? What functions are covered by which forms? Are there overt subordinators? Where are they placed in the clause (and why)?

5.5 Complex Verbal Predicates (Secondary Verbs, Serial Verbs, TAM)

In many Austroasiatic languages, a verbal predicate can consist of more than one lexical verb. Secondary verbs in multi-verb constructions can appear in different forms, positions, and functions. Especially common are verbs of movement expressing aspectual and directional functions, such as ‘stay’ for ongoing events, ‘go’ or ‘arrive’ to indicate locative goals. More specific functions are often found expressed by verbs such as ‘eat’, ‘throw, discard’, ‘keep’, and many others. These constructions are typical of SEA languages in general, so it is not easy to attribute individual secondary verb constructions to Austroasiatic. But the presence of common constructions can be telling in terms of past contact scenarios, from which contact/migrations may be deduced. Questions to consider include: Which secondary verbs are found in Austroasiatic and surrounding languages? Where can contact influence be detected? On what bases can construction types be attributed to proto-Austroasiatic? Are there common sources of the expressions of specific functions?

5.6 Negation

In most Austroasiatic languages, negation is achieved by pre-verbal negators, but some languages have post-verbal negators. These can be free or bound morphemes, as in Vietnamese and Mon, respectively. Some negators have lexical origins as verbs (e.g. Car and Munda ʔət, Khmer ʔɒt), while others have no known etymology as anything but negators. In many or most Austroasiatic languages, negators can be used only with verbal elements, the negation of nonverbal constituents requiring special constructions, often involving a copula or a dummy verb. The position of the negator in complex verbal predicates is either determined by its scope (usually over the immediately following verb) or by structural restrictions. In some languages, negation can be reinforced by clause-final particles, which in turn can be reanalyzed as principal negators, leading to the loss of the original negation marker in some cases. Besides general negators, a number of Austroasiatic languages (e.g. Rumai, Khasi, Kammu, and Vietnamese) have a special ‘perfect/past tense negator’ meaning ‘not yet’.
In a number of Austroasiatic languages, negative imperative expressions make use of a special prohibitive marker, which may be a free or a bound morpheme. These may go back to lexical verbs meaning ‘stop’, ‘avoid’ or similar. Questions to consider include: Are there any common forms or shared developments of negators across the family? Is basic negation generally restricted to verbal elements? Is narrow scope of negation over the immediately following verb the norm? Can we postulate bound negators for the proto-language, or is all we see development from free (possibly verbal) negators to clitics or affixes? Is prohibitive a separate category? Is there a pattern in the development of prohibitive markers?



5.7 Nominal Classification (Classifiers, Noun Classes)

A number of Austroasiatic languages make use of classifiers when common nouns are combined with numerals (Adams 1991). The use of classifiers is not (or barely) attested in the early documented languages Old Mon and Old Khmer. Modern Mon and Khmer also do not use classifiers, but the structure of their quantifier phrases is reminiscent of classifier constructions: In Mon, measure words like ‘day’, ‘mile’, ‘kilo’, etc. may follow a numeral, while common nouns precede the numeral. There are thus the different constructions ‘two day’ vs. ‘friend two’, similar to languages with classifiers such as Thai, where the same expressions would be ‘two day’ and ‘friend two CLF’, respectively. Classifiers are usually seen as a typical feature of mainland SEA and EA languages (Chinese, Thai, Vietnamese, etc.), but they are also regularly used in non-core SEA languages, such as Khasian (which also has a grammatical gender system) and Nicobarese. At the same time, classifiers are all but absent from the core SEA languages Mon and Khmer. Questions to consider include: Are there common patterns in classifier constructions across the family? Can we reconstruct classifiers for the proto-language? If proto-Austroasiatic had no classifiers, how can the occurrence of classifiers in languages like Khasi and Nicobarese be explained?

5.8 General Linkers (Proto-Austroasiatic *ta and Similar)

A grammatical morpheme (called here ‘linker’) covering a range of functions, including marking of oblique objects, attributive/relative expressions, adverbials, and others is found in a number of Austroasiatic languages, often with a similar form ta or ti (e.g. Old Mon, Old Khmer, Car). Not all functions are present in all languages where the marker occurs, and in some languages the form of the marker is slightly different (e.g. Palaungic), but the range of functions remains similar.
Questions to consider include: What are the functions (and forms) we can reconstruct for the ‘linker’ in proto-Austroasiatic? How are the same functions expressed in the languages that have lost the linker or restricted its functions?

5.9 Grammatical Relations, Argument Coding and Alignment

Most Austroasiatic languages show accusative alignment in most or all construction types, that is, intransitive subjects are treated the same as transitive subjects (Agents), but differently from objects (Patients). Grammatical relations (GRs) can be marked by position (e.g. S/A preverbal, P postverbal), cross-referencing on the verb (e.g. agreement with S and/or A), case marking (affixes, adpositions), or others. Although claims have been made that ergative constructions are widespread in Austroasiatic (Diffloth n.d.), these are actually not found in many languages of the family. Aslian languages have special marking for Agents (as opposed to intransitive S), but the alignment is rather tripartite, treating S, A, and P differently. Some intransitive subjects in Mon occur after the verb, making them syntactically similar to Patients, but this is restricted to a small class of (topical) verbs (Jenny forthc.). Questions to consider include: What alignment patterns are found where in the family? Can external influences be seen as responsible for these patterns? What can we reconstruct for proto-Austroasiatic? What grammatical relations can be found in what construction types? Are there any patterns in GRs across the family? What means are there to overtly mark arguments?

5.10 Questions: Polar and Content

Austroasiatic languages have different means to form polar and content questions. Intonation may be used by some languages for the expression of the former, while frequently sentence particles are employed, either alone or in combination with intonation. Content questions typically contain an interrogative element, which may occur in situ or in a high-focus position in the clause, often in immediate preverbal or clause-initial position. A sentence particle may or may not be present in content questions. If there is a question particle, it is typically different from the polar question marker. Questions to consider include: What means are common in the expression of questions across the family? Are there common forms of interrogatives that can be traced back to the proto-language? Is there any regularity in the placement of interrogatives and question particles?

5.11 Sentence and Clause Particles

Sentence particles are used in many languages in SEA and beyond to express a wide range of functions, including speaker’s attitude, illocutionary force, and others, usually in sentence-final position.
Some of these particles in some languages can be traced to lexical origins, but in many cases they have no obvious lexical source. On the other hand, a number of sentence particles have lookalikes in non-related neighbouring languages (e.g. the emphatic particle Mon nah and Thai náʔ). Mutual influence may well play a role here, but the widespread use of sentence and clause particles in Austroasiatic languages suggests that this was a feature of the proto-language. Questions to consider include: Are there any common forms and functions of sentence particles across the family? Can we separate areal influences from inherited traits?



5.12 Prosodic Structure and Syntax

The prosody of an utterance is of eminent importance, but the connection between intonation and syntax in Austroasiatic languages has not received much attention in the past. Claims have been made that the major change that occurred in Munda languages was from rising to falling intonation patterns (Donegan & Stampe 2004), and that all other changes naturally followed from this. The prosody of at least some Munda languages (e.g. Kharia, see Peterson 2007; also Anderson this volume) shows falling intonation on the clause and phrase level, but rising on the word level. Similar patterns are found in Mon (also in close contact with a consistent verb-final/head-final language), though with much less effect on the syntactic structure (verb-final patterns are, however, increasingly found, see Jenny 2011). Questions to consider include: Can we say anything about the prosody of proto-Austroasiatic based on patterns found in the modern languages? Is there a connection between posited or observed syntactic changes and changes in prosody? Is the prosody more likely to converge with neighbouring languages than other aspects of a language?

5.13 Grammaticalization

Recurring grammaticalization clines are seen in the region, such as those noted in Matisoff (1991). Reconstructing grammaticalized sources in Proto-Austroasiatic is challenging, although a small start is made in the Appendix of Grammatical Lexicon in this volume. Within individual Austroasiatic sub-branches the prospects for reconstructing instances of grammaticalization are much clearer, and the task entirely necessary.
Many of these instances of grammaticalization of words and structures (e.g., deriving the comparative from ‘over’) are likely the result of language contact, although contact-induced patterns of grammaticalization of words and structures are still part of the linguistic history in the language family and must be described and as much as possible explained in terms of the paths and their likely sources, whether direct borrowing, more regional contact, or language internal innovations that are typologically common (e.g. those included in Heine and Kuteva 2002). The matter of grammaticalization is significant as it is relevant to almost all of the previously mentioned categories. Questions to consider include: What, if any, grammaticalized words can be reconstructed to the proto-Austroasiatic level? How many of such words are the result of regional or sub-branch developments? Are there places in time or geography in which certain forms emerged and spread in Austroasiatic? Which are the result of contact with Austroasiatic or other language groups? How can we determine the difference between preservations, innovations, and borrowings?



5.14 Other Topics to Consider

A number of other morphosyntactic topics promise insight into the structure of the Austroasiatic proto-language. The following list is by no means exhaustive, and as more material becomes available, more aspects can be studied in increasing depth and detail.
– Syntactic bases of word classes
– Pronominal systems
– Reference tracking (anaphoric and logophoric elements)
– Numeral systems
– Agreement (verbal and nominal)
– Indication of plurality (verbal and nominal)
– Indication of aspect, tense, or state
– Voice alternations (morphological and periphrastic)



The study of the history of Austroasiatic syntax is long overdue, and the opportunity to undertake it is now at hand. There is more data on more Austroasiatic languages than ever before, more typological context to consider, and recent decades of ideas and insights about language contact and change to take into account. It is, however, a tremendous task that will require coordinated effort and multiple stages, and we can expect varying degrees of success and failure. As this indicates, there are many more questions than answers at this point, although some are addressed to varying extents by papers in this volume. Yet further insights are likely to come and will hopefully contribute to the understanding of human history in the greater Southeast Asian region.

References

Adams, Karen Lee. 1989. Systems of Numeral Classification in the Mon-Khmer, Nicobarese and Aslian Subfamilies of Austroasiatic. Pacific Linguistics Series B-No.101. Canberra: Australian National University.
Alves, Mark J. 2014. Mon-Khmer derivational morphology. In Rochelle Lieber and Pavel Stekauer (eds.) The Oxford Handbook of Derivational Morphology. Oxford: Oxford University Press, 520–544.
Alves, Mark J. 2015. Morphological functions among Mon-Khmer Languages: Beyond the Basics. In Nick Enfield and Bernard Comrie (eds.). Mainland Southeast Asian Languages: The State of the Art. Berlin/New York: Mouton de Gruyter, 531–557.



Alves, Mark. 2006. A Grammar of Pacoh: a Mon-Khmer language of the central highlands of Vietnam. Canberra: Pacific Linguistics (PL580).
Barðdal, Jóhanna & Thórhallur Eythórsson. 2012. Reconstructing Syntax: Construction grammar and the comparative method. In Hans C. Boas & Ivan Sag (eds.) Sign-Based Construction Grammar. Stanford, CA: CSLI Publications, 257–308.
Burenhult, Niclas. 2005. A grammar of Jahai. Canberra: Pacific Linguistics.
Bybee, Joan. 2001. Main clauses are innovative, subordinate clauses are conservative. In Joan L. Bybee and Michael Noonan (eds.) Complex Sentences in Grammar and Discourse: Essays in honor of Sandra A. Thompson. Amsterdam/Philadelphia: John Benjamins Publishing Company, 1–17.
Chamoreau, Claudine and Isabelle Léglise (eds.). 2012. Dynamics of Contact-Induced Language Change. Berlin: De Gruyter Mouton.
Costello, Nancy A. and Khamluan Sulavan. 1993. Katu Folktales and Society. Vientiane: Institute of Research on Lao Culture and Society.
Diffloth, Gérard (n.d.). Austroasiatic languages. Encyclopædia Britannica Online. Retrieved 29 June, 2016, from‑langua ges.
Donegan, Patricia and D. Stampe. 2004. Rhythm and the synthetic drift of Munda. In R. Singh (ed.) The Yearbook of South Asian Languages and Linguistics. Berlin/New York: Mouton de Gruyter, 3–36.
Donegan, Patricia and David Stampe. 1983. Rhythm and holistic organization of language structure. In J. Richardson, M. Marks and A. Chukerman (eds.) Papers from the Parasession on the Interplay of Phonology, Morphology and Syntax. Chicago: Chicago Linguistic Society, 337–353.
Donegan, Patricia. 1993. Rhythm and vocalic drift in Munda and Mon-Khmer. Linguistics of the Tibeto-Burman Area 16.1: 1–43.
van Driem, George. 2001. Languages of the Himalayas: An Ethnolinguistic Handbook of the Greater Himalayan Region, containing an Introduction to the Symbiotic Theory of Language (2 volumes). Leiden: Brill.
Haiman, John. 2011. Cambodian (Khmer). Amsterdam/Philadelphia: John Benjamins.
Harris, Alice and Lyle Campbell. 1995. Historical Syntax in Cross-Linguistic Perspective. Cambridge: Cambridge University Press.
Heine, Bernd and Tania Kuteva. 2002. World Lexicon of Grammaticalization. New York: Cambridge University Press.
Heine, Bernd and Tania Kuteva. 2005. Language Contact and Grammatical Change. Cambridge: Cambridge University Press.
Heine, Bernd and Tania Kuteva. 2007. The Genesis of Grammar. Oxford: Oxford University Press.
Hopper, Paul and Elizabeth Traugott. 2003. Grammaticalization. Cambridge: Cambridge University Press.



Jenny, Mathias. 2011. Burmese in Mon syntax: external influence and internal development. In Sophana Srichampa, Paul Sidwell and Kenneth Gregerson (eds.) Austroasiatic Studies: papers from ICAAL 4. Dallas, Salaya, Canberra: SIL International; Mahidol University; Pacific Linguistics, 48–64.
Jenny, Mathias & Paul Sidwell (eds.). 2015. Handbook of Austroasiatic Languages. Leiden/Boston: Brill.
Jenny, Mathias. 2015. Syntactic diversity and change in Austroasiatic languages. In Carlotta Viti (ed.) Perspectives on Historical Syntax. Amsterdam: John Benjamins, 317–340.
Jenny, Mathias, Tobias Webber and Rachel Weymuth. 2015. The Austroasiatic Languages: a typological overview. In Mathias Jenny & Paul Sidwell (eds.) Handbook of Austroasiatic Languages. Leiden/Boston: Brill, 13–133.
Jenny, Mathias. 2005. The Verb System of Mon. Zurich: ASAS.
Kruspe, Nicole D. 2004. A grammar of Semelai. Cambridge: Cambridge University Press.
Matisoff, James A. 1991. Areal and Universal Dimensions of Grammatization in Lahu. In Elizabeth Closs Traugott and Bernd Heine (eds.) Approaches to grammaticalization. Volume 2: Focus on types of grammatical markers. Amsterdam/Philadelphia: John Benjamins, 383–453.
Matras, Yaron and Jeanette Sakel (eds.). 2007. Grammatical Borrowing in Cross-Linguistic Perspective. Berlin: Mouton de Gruyter.
Næss, Åshild & Mathias Jenny. 2011. Who changes language? Bilingualism and structural change in Burma and the Reef Islands. Journal of Language Contact 4(2): 217–249.
Nichols, Johanna. 1996. Linguistic diversity in space and time. 1992. Chicago: University of Chicago Press.
Peterson, John. 2011. A Grammar of Kharia. Leiden/Boston: Brill.
Pinnow, Heinz-Jürgen. 1959. Versuch einer historischen Lautlehre der Kharia-Sprache. Wiesbaden: Otto Harrassowitz.
Pinnow, Heinz-Jürgen. 1963. The position of the Muṇḍā languages within the Austroasiatic language family. In H.L. Shorto (ed.) Linguistic Comparison in Southeast Asia and the Pacific. London: SOAS, 140–152.
Pires, Acrisio and Sarah Thomason. 2008. How much syntactic reconstruction is possible? In G. Ferraresi and M. Goldbach (eds.) Principles of Syntactic Reconstruction. Amsterdam: John Benjamins, 27–72.
Schmidt, Wilhelm. 1904. Grundzüge einer Lautlehre der Khasi-Sprache in ihren Beziehungen zu derjenigen der Mon-Khmer-Sprachen. Mit einem Anhang: die Palaung-, Wa- und Riang-Sprachen des mittleren Salwin. Abh. Bayrischen Akademie der Wissenschaft 1.22.3: 677–810.
Sebeok, Thomas A. 1942. An examination of the Austro-Asiatic Language family. Language 18: 206–217.
Sidwell, Paul & Roger Blench. 2011. The Austroasiatic Urheimat: the Southeastern Riverine Hypothesis. In N.J. Enfield (ed.) The Dynamics of Human Diversity. Canberra: Pacific Linguistics, 315–343.
Sidwell, Paul. 2015. Local drift and areal convergence in the Restructuring of Mainland Southeast Asian Languages. In Nicholas Enfield and Bernard Comrie (eds.) Languages of Mainland Southeast Asia: the state of the art. Berlin: De Gruyter Mouton, 51–81.
Thomason, Sarah. 2001. Language Contact: An Introduction. Edinburgh: Edinburgh University Press.
Traugott, Elizabeth and Bernd Heine (eds.). 1991. Approaches to grammaticalization, vols. I & II. Amsterdam: John Benjamins.
Viti, Carlotta (ed.). 2015. Perspectives on Historical Syntax. Amsterdam/Philadelphia: John Benjamins Publishing Company.
Walkden, George. 2013. The correspondence problem in syntactic reconstruction. Diachronica 30.1: 95–122.

Part 1 Syntactic Reconstruction

Chapter 1

Verb-Initial Structures in Austroasiatic Languages

Mathias Jenny



The Austroasiatic (AA) languages can be divided into three typologically distinct zones, which coincide largely with the geographical distribution in Mainland Southeast Asia (SEA), central and eastern India, and on the Nicobar Islands. The Aslian languages of peninsular Malaysia are typologically somewhere between Mainland SEAn and the Nicobar languages, while the Khasian languages in Northeast India are closer to the SEAn languages. The great majority of AA languages belong to the general Mainland Southeast Asian type, with the Munda group at least superficially well integrated in the South Asian context. Socially isolated for the last few decades from the rest of the family and other languages except Hindi and English, the Nicobarese languages geographically form a typological group apart within the family. The picture that the AA languages in the three main groups superficially present is very divergent in many aspects of the linguistic structure, from phonology and lexicon to the grammatical organization in terms of morphology and syntax. One obvious difference is the divergent constituent orders on the clause level. It has long been noticed that the Munda languages differ from the rest of the family in having verb-final clause structure (Pinnow 1963, Donegan & Stampe 2004), while the bulk of the family is verb-medial. The syntax of the various Nicobarese varieties was until recently not widely known or well-described, but it has been noted by some authors that these languages are generally verb-initial or predicate-initial (e.g. Braine 1970; de Roepstorff 1884; Man 1889; Whitehead 1925). The present study takes a detailed look at the distribution of the verb-initial word order patterns found in AA languages. 
Verb-initial structures pop up in every corner of the family, either on the clause level or deeper down in the grammar, in phrases or morphologically complex words, or in peripheral constructions that are not frequently taken into account in general overviews. The widespread presence of verb-initial constructions or apparent traces thereof in the family needs to be explained, as language contact can in most cases not be seen as a (or the main) triggering factor, nor is a purely pragmatic explanation satisfying. Based on data from a wide range of AA languages, it is claimed that verb-initiality or predicate-initiality is inherited from the protolanguage and retained in some languages under certain circumstances, which include both linguistic and societal factors.1

© koninklijke brill nv, leiden, 2020 | doi:10.1163/9789004425606_003


Word Order in AA Languages—Diversity and Areal Convergence

Superficially, the picture of AA syntax is rather simple. The eastern AA languages of Mainland Southeast Asia, including Khasi in Northeast India and the Aslian languages in peninsular Malaysia, have the main word orders SV/AVP, that is, they are verb-medial. Changes in the main word order are possible and frequent, mostly for pragmatic reasons. These include fronting of topical or focal elements, both arguments and peripheral constituents, and right displacement of anti-topics or afterthoughts. Pragmatically triggered word order modifications are in most cases restricted to independent clauses and do not widely occur in subordinate clauses. The syntactic profile of the eastern AA languages is largely in accordance with the Southeast Asian area (see Enfield 2005; Comrie 2007). Sentence (1a) illustrates the default, pragmatically unmarked word order in Mon (Austroasiatic, Monic), while (1b) shows a fronted focal object.

(1) Mon
a. rɔ̀ə ʔuə məkɤ̀ʔ ciəʔ hənɔm.
   companion 1sg des eat noodles
   ‘My friend wants to eat noodles.’
b. pɤŋ ɗɛh hə-mòc ciəʔ.
   cooked.rice 3sg neg-des eat
   ‘Rice he doesn’t want to eat.’

The same sentences (2a–b) in non-related Thai (Tai-Kadai) show parallel mutations in the same contexts.

1 This study is partly based on research financed by the Swiss National Science Foundation project nrs. 100012_150136 “The Greater Burma Zone” and 100015_176264 “The Development of Verb-Initial Structures Cross-linguistically: Insights from Austroasiatic”. It may be more adequate to talk of ‘predicate-initial’ structures rather than ‘verb-initial’, as it seems that in most cases the order of elements follows the information structure comment-topic in the languages discussed in this paper. Verbs being the category prototypically encoding predicates, I will use the term ‘verb-initial’ here as a quasi-synonym of ‘predicate-initial’.
2 Where no source is given for the examples, the data are from the author’s own field notes.



(2) Thai
a. pʰɯ̂ən pʰǒm jàːk kin kǔəj.tǐəw.
   friend 1sg.m des eat noodles
b. kʰâːw (kʰǎw) mâj jàːk kin.
   rice 3hum not des eat

Similarly, subjects (S and A) can occur in postverbal position as afterthoughts or anti-topics. These are not part of the clause, as is seen from the possibility of having pronouns in situ in the clause. The patient is frequently fronted in these constructions. Sentences (3) and (4) illustrate the occurrence of afterthoughts, again in Mon and Thai.

(3) Mon
mùʔ (ɗɛh) hə-hɒm pùh, rɔ̀ə ʔuə.
what 3 neg-speak neg companion 1sg

(4) Thai
(kʰǎw) mâj dâj pʰûːt ʔəraj, pʰɯ̂ən pʰǒm.
3hum not get speak what friend 1sg.m

These pragmatically triggered word orders result in PAV and VPA or PVA constructions, respectively. Importantly, these alternations do not occur in subordinate clauses, which can be taken as an indication that SV/AVP is the unmarked constituent order in Mon and Thai, as well as in other Mainland SEAn languages.

The Munda languages, on the other hand, are believed to have adopted verb-final word order from the neighboring Dravidian and Indo-Aryan languages, forming part of the South Asian linguistic area (e.g. Masica 1976; Subbarao 2012). Examples (5) and (6) compare the clause structure of Sora (Austroasiatic, Munda) with that of Hindi-Urdu (Indo-European, Indo-Aryan).

(5) Sora
anin dɔŋ-ɲɛn dɑrəj-ən ə-tiy-ben idsɨm-tɛ ted.
3sg obj-1sg rice-art inf-give-inf want-3 not
‘He/she doesn’t want to give me rice.’ (Donegan & Stampe 2004)

(6) Hindi-Urdu
ham laḍḍũ khā-nā cāh-te hãĩ.
1pl sweets eat-inf wish-impv
‘We wish to eat sweets.’ (Subbarao 2012)



The verb-initial structures generally found in Nicobarese languages can be explained either as contact influence from Austronesian languages such as Toba Batak (Cumming 1984; Nababan 1981), Nias (Brown 2005), and Old Malay (Mahdi 2005), or as retention of archaic AA structures. The latter has been taken as evidence that Austroasiatic is at a deeper level related to Austronesian in the hypothesized Austric phylum, a view superficially supported by common morphological patterns mentioned in the literature (e.g. Reid 1994; see also Jenny 2015a). The clause structure of Car (Austroasiatic, Nicobarese) in sentence (7) is compared with that of Nias (Austronesian, Malayo-Polynesian) in (8).

(7) Car
tíːntə ŋam maʔãhãːŋa cin nə ʔurehekúːʔ meh.
send.down def one.who.invites 1sg 3.sub in.front.of 2sg.obl
‘I send my messenger before thy face.’ (Sidwell 2015)

(8) Nias
i-be zi=to-röi gö-da ba nasu ina-gu.
3sg.rls-give rel=res-leave.behind.mut food.mut loc dog.mut mother-1sg.poss
‘My mother gave our leftovers to the dog.’ (Brown 2005)


Verb-Initial Patterns in AA

The general image of the distribution of word order patterns in AA, based on the data usually found in the literature, is thus rather straightforward. A closer look at data from a wider range of AA languages and constructions in different languages suggests that verb-initial structures are more widespread. This section looks at the distribution in languages and language domains where verb-initial structures occur in AA. The presence of these structures has to be explained, either as contact-induced or language-internal development, or as retention of archaic features, going back possibly to proto-AA.

3.1 Distribution

3.1.1 Munda

The Munda languages, though strictly verb-final (or more generally head-final) in their clause (and mostly phrase) structure, exhibit some notable structures that suggest an earlier profile that was non-verb-final (head-initial). While the majority of head-initial constructions (e.g. synthetic possessives, compounds)



are not decisive as to whether proto-Munda had overall verb-medial or verb-initial structure, in a few cases verb-initiality must be assumed for some stage of the languages. Possible explanations and scenarios will be explored in section 4.

Synthetic Forms

It has been claimed that “today’s morphology is yesterday’s syntax” (Givón 1971), so one may assume that traces of older syntactic patterns can be found in morphological structures, though there is cross-linguistic evidence to the contrary. There has been some discussion and disagreement regarding the status of synthetic verb forms of pronominal affixes among Munda languages, but these show in many cases verb-initial patterns, as seen in the following examples from Sora. The Sora verb, sometimes with verbal markers like negation, regularly comes at the beginning of the verbal expression, with arguments following, resulting in VS and VPA/VAP-like patterns. For native speakers of Sora, a clause-initial subject NP or pronoun must be added in sentence (9), and the prefix ə- in (10) has the default reading ‘1pl.excl’, so in both cases it can be argued that the expressions are actually AVP with a person-agreement marker suffixed to the verbal expression.

(9) Sora
paŋ-ti-dar-iɲ-teːn
carry-give-cooked.rice-1–3.pst
‘He brought and gave me rice.’ (Anderson 2007)

(10) Sora
ə-ədn-əl-gə{b}rɔj-l-ɑy
pl-not-recip-{caus}feel.ashamed-pa-1
‘We (exclusive) didn’t shame each other.’ (Donegan & Stampe 2004)

Apart from the uncertainty about the adequate interpretation of the data, the position of pronominal elements in verb phrases or verbal expressions shows great variability cross-linguistically, so that no conclusive statement can be made about the Munda data. They can at best be taken as weak support for any claims about original verb-initial patterns in AA.
Compounds/Incorporation

Clearer support for verb-initial structures in Munda (and proto-AA) comes from instances of incorporated agents and subjects in South Munda verbs, which rather clearly suggest that at the time of their formation, the word order was



verb-initial. This is seen in verb-noun compounds as in the following examples (11) and (12) from Sora and Kharia. The orders of morphemes found here are VS and VAP.

(11) Sora
ɲam-kid-t-am ‘tiger will seize you’ [seize-tiger-npst-2]
saː-bud-t-am ‘bear will mangle you’ [mangle-bear-npst-2]
paŋ-sum-t-am ‘spirit will carry you away’ [carry-spirit-npst-2]
(Anderson 2007)

(12) Kharia
ajoˀɖ-ɖaʔ ‘dry up (of water)’ [‘dry.up-water’]
muʔ-siŋ ‘sunrise’ [‘come.out-sun’]
uluʔ-ɖaʔ ‘boil (of water)’ [‘boil-water’]
(Peterson 2011)

Morphological processes, including compounding and incorporation, are less easily affected by language contact and tend to be more conservative than higher levels of the grammar, such as clausal word order. This makes the South Munda incorporated forms good candidates for evidence of earlier verb-initial structures, even if they are restricted to some southern languages of the group. Assuming that it is more likely for the North Munda languages to have restructured the phrase structure along with the clause structure than for South Munda to have introduced verb-initial phrase structure, we can argue that the pattern must be old in the family. This argument is supported by the fact that there are no known verb-initial languages in contact with South Munda, past or present.

3.1.2 Aslian

Aslian languages are generally described as verb-medial (Kruspe 2004; Burenhult 2005; Wnuk 2016), though with some degree of free or pragmatic variation. However, in Semelai, “the most frequently employed constituent order is verb initial with either A and O, or S, placed after the verb” (Kruspe 2004), which is a basic comment-topic arrangement of information, similar to some western Austronesian languages like Tagalic, but also Old Malay with verb-initial tendencies (e.g. Mahdi 2005) and minority Austronesian languages on Sumatra, such as Toba Batak, among others.
Aslian languages such as Semelai and Jahai frequently show VS and VAP structures, besides the general SV/AVP pattern. No pragmatic difference is given by Kruspe (2004, and p.c.) for sentences (13a) and (13b). This suggests that with intransitive predicates, both SV and VS are equally possible and probably common.



(13) Semelai
a. kʰbəs ʔmaʔ=hn.
   be.dead mother=3poss
   ‘His mother died.’ (Kruspe 2004)
b. ʔmaʔ=hn kʰbəs
   mother=3poss be.dead
   ‘His mother died.’ (Kruspe 2004)

Sentences (14) from Jahai and (15) from Semelai show preverbal clitic agreement markers and postverbal full NPs expressing the A argument. While it is possible to analyze these clauses as containing a pronominal subject in preverbal and a full NP subject in postverbal (antitopic) position, this analysis is unlikely, at least synchronically, in the case of Semelai due to the presence of the patient after the agent in (15). Antitopical agents generally occur outside the clause as afterthoughts, which in this case would be after the patient argument. Thus, a VAP analysis does seem to apply in these sentences.

(14) Jahai
braʔ wa=muŋkɛr lagiʔ ka=ʔap ton.
neg irr.3sg=to.wake.up again sbj=tiger that
‘That tiger never woke up again.’ (Burenhult 2005)

(15) Semelai
ki=bukɒʔ la=knlək hn=pintuʔ.
3a=open a=husband o=door
‘The husband opened the door.’ (Kruspe 2004)

(16) Semelai
daʔ daʔ mandeh mə=ki=ca la=bsiʔ
neg exist what rel=3a=penetrate a=metal
‘There was nothing that the metal could penetrate.’ (Kruspe 2004)

Example (16) shows a subordinate clause with postposed agent. Subordinate clauses do not usually allow pragmatic alternations, so the analysis as antitopical agent is not convincing. Transitive verbs in Semelai obligatorily take the proclitic agent-marker, so the pattern here is best described as VAP.

While the possibility of verb-initial (or predicate-initial) clauses in Aslian may be explained as early influence from Austronesian contact languages such as Old Malay, the structural difference (VPA in Austronesian, VAP in Aslian) suggests that other explanations must be considered as well. It could well be the case that Aslian languages retained inherited verb-initial structures because they were in contact with verb-initial Austronesian languages from an early time. Language contact would then be a factor for retention of old structures, rather than a trigger for language change.

3.1.3 Palaungic

In Palaungic languages, including Shwe (Hsamlong), Rumai, and Wa varieties, verb-initial clause patterns are common. In Rumai and Shwe, they are mostly restricted to dependent clauses, while in Wa, verb-initial structures are also found in independent clauses, though less frequently than in subordinate clauses (Ma Seng Mai 2012). The structures found in Palaungic are VS, VAP, AUX SV/AVP, that is, the clause-initial position is occupied by a verb or an auxiliary. In transitive clauses, the agent precedes the patient, as seen in (17a–b).

(17) Shwe
a. ʔuː jaːm pʌ̆t ʔʌːn mjʌːm ɟuːŋ.
   one time pick 3sg tea rain
   ‘While he was picking tea leaves, it was raining.’ (Milne 1921)
b. jaːm lɔh kuːŋ biː ta bɤːŋ ta sʰaŋkʰāiŋ
   time go dig people obl hole obl cemetery
   ‘When they go to dig the grave.’ (Milne 1921)

In intransitive clauses, the subject may follow the verb also in independent clauses, as seen in (18).

(18) Shwe
kəɕeː biː tʰaːŋ ʔʌːn biː daːh kʰuːn ʔʌːn lʌgaː kəɕeː biː
ashamed people because 3sg people say father 3sg Naga ashamed people
‘The people are ashamed because of it, they say, its father is a Naga, they are ashamed.’ (Milne 1921)

Sentence (19) illustrates the use of the clause-initial auxiliary bɤːn ‘get to, have a chance to’,³ in this case modified with the future marker diː. The clause is not syntactically subordinate in Palaung, but juxtaposed to the matrix. The (semantic) matrix clause ‘he asked me’ shows AVP word order.

(19) Shwe
ʔuː jaːm diː bɤːn miː lɔh ta brī, ʔʌːn sʰʌr.mwɔ̆t ʔɔː.
one time fut get 2sg go obl forest 3sg ask 1sg
‘He asked me when I was going to the jungle.’ (Milne 1921)

3 See Enfield 2003; Jenny 2015c for grammatical functions of the lexical verb ‘get’ as auxiliary in SEAn languages.

In Wa, independent clauses can be either verb-initial or, more frequently, S/A-initial, but subordinate clauses, apart from conditional and complement clauses, are consistently verb-initial (Ma Seng Mai 2012). In this author’s field notes, verb-initial simple clauses are frequent. In transitive clauses, the agent precedes the patient, as seen in (20a). Alternatively, the word order AVP also occurs, as in (20b). Both sentences are from the same elicitation session with the same speaker, and there seems to be no semantic or pragmatic difference between the two word orders.

(20) Wa
a. sɔm ʔɤʔ ʔɯp.
   eat 1sg rice
   ‘I eat rice.’
b. ʔaŋ ʔɤʔ seʔ sɔm ʔɯp.
   neg 1sg tam? eat rice
   ‘I don’t eat rice.’

The Palaungic languages, spoken in northeastern Myanmar, northern Thailand, and southwestern China, show different degrees of influence from the dominant neighboring languages, Burmese, Shan, Thai, and Chinese. None of these have verb-initial patterns, and there is no evidence of other languages with verb-initial structures in the area in historical time. This makes language contact as the source of the Palaungic structures highly unlikely.

3.1.4 Khasian

The Khasian group of AA is spoken in the northeast Indian region, with its center in Meghalaya. External influence is mainly from dominant Indo-Aryan languages such as Assamese and Hindi and surrounding Sino-Tibetan languages, though the latter are not seen as dominant or prestigious in the area. Standard Khasi is mainly verb-medial. In contrast, more peripheral (that is, less standardized) varieties, such as Kudeng War, Amwi, and Pnar, show more or less



consistent verb-initial structures in both dependent and independent clauses. In sentence (21) from Kudeng War, the word order is VPA.

(21) Kudeng War
ə əri viɛr ʔi hun kə.
dcl leave lose 3pl child 3sg.f
‘She left her children for ever.’ (Anne Daladier, p.c.)

Sentence (22a) from Amwi shows VAP order, with a clausal argument in place of the patient. In sentence (22b) the order is VPA, with the agent expressed both by a full NP and a resumptive pronoun. The presence of the latter makes an analysis of this as an antitopical agent unlikely. If the presence of an extraclausal antitopic is assumed, it would most probably be the resumptive pronoun alone, rather than the full NP and the pronoun together.

(22) Amwi
a. tɔ hnta hə pərɔ̃m ŋə tiə jaʔlə beʔ ʃkɛ.
   good then part tell 1sg when go.together hunt deer
   ‘Well then, I will tell you about us going deer hunting.’ (Weidert 1975)
b. jaʔlə beʔ ʃkɛ ʔi hnthlɛ cuprəw ʔi.
   go.together hunt deer 1pl seven person 1pl
   ‘We went deer hunting together, seven of us.’ (Weidert 1975)

In Pnar, the word order is generally VAP, as seen in examples (23a–b).

(23) Pnar
a. tʃim u=bru ka=wat̪.
   take m=person f=sword
   ‘The man took the sword.’ (Ring 2015)
b. e kɔ ka=hu.kum ja o.
   give 3sg.f f=command ben
   ‘She gave a command to him.’ (Ring 2015)

As the Khasian varieties are surrounded and influenced mainly by verb-final Indo-Aryan and Sino-Tibetan languages, with possible influence from verb-medial Ahom (Tai-Kadai) during the existence of the Ahom kingdom in Assam, there is no obvious source for their verb-initial patterns in language contact. As in the case of Palaungic, pragmatic factors are not a likely explanation for the



VAP patterns. The pragmatic right movement of the agent to antitopic position would lead to VPA, rather than VAP, unless a second displacement of the patient is assumed.

The fact that standard Khasi is generally described as verb-medial and thus diverges from the rest of the group may have its explanation in the history and sociological structure of Khasi society. Khasi was standardized through the introduction of the Latin script together with the translation of the Bible. Interestingly, the Bible was translated not by a native speaker of Khasi, but by a Welsh missionary. This could have influenced the fixation of the word order SV/AVP, which spread only in the Khasi community, but not into more peripheral groups. Pnar, larger in terms of population size, may have proven more resistant to external influence (Hiram Ring, p.c.).

3.1.5 Katuic

Katu, although generally described as an SV/AVP language, shows a considerable number of verb-initial clauses in narratives, not all of which can be explained as pragmatic verb fronting. In many instances, the expressions sound rather formulaic, as in (24b), which may be taken either as archaic or as more flexible in terms of word order. In most cases, verb-initial patterns occur in intransitive (or low-transitive) clauses, as in (24a–b), and in transitive clauses with omitted patients, as seen in (24c). Katu allows a degree of flexibility in word order not found in the dominant neighboring languages like Lao and Vietnamese, as is shown by the position of the future marker ʔɛ in sentences (24b) and (24c), and the adverb ʔjɤʔ ‘more’ in (24a) and (25).

(24) Katu
a. cet bɤt mənɯjh, kah mənɯjh ʔjɤʔ.
   die all human neg.exist human more
   ‘All people died, there were no people anymore.’ (Costello & Sulavan 1993)
b. ʔənuʔ mɔ́ːn “ʔɛ bral ʔətaːw ʔəʔɤŋ.”
   dog speak fut arrive pn pn
   ‘The dog said “here come Attau and A’oeng.”’ (Costello & Sulavan 1993)
c. lɔ́ŋ kap bɤt hɛ, ɗah bɤt bɯəl ʔɛ.
   then bite all 1pl eat.meat all village fut
   ‘Then we can bite it, the whole village will eat it.’ (Costello & Sulavan 1993)



Besides VS/VA structures, Katu also has auxiliaries in clause-initial position, a pattern also found in Aslian and Palaungic languages. This pattern is illustrated in (25).

(25) Katu
bʌːn ku ʔʌːj, ku ʔjɤʔ buːj.tuj.
get 1sg answer 1sg more happy
‘I was able to answer; I was still happy.’ (Costello & Sulavan 1993)

Being spoken in eastern MSEA, an area generally favoring verb-medial clause structure with a great variation of constituent order for pragmatic reasons in independent clauses, Katu has no obvious source in language contact for verb-initial patterns. Since the presently available data lack transitive clauses with both arguments overtly expressed, it is not possible to determine whether the word order is VAP or VPA, so the possibility of pragmatic right-displacement of the subject must be taken into account. However, neither the context nor the translation or transcription of the above examples suggests pragmatic markedness.

3.2 Verb-Initial Patterns in AA: Summary (Preliminary)

Table 1.1 summarizes the verb-initial structures found in different groups of AA. The list is by no means exhaustive, but it reflects the current state of findings. As these structures are in many cases peripheral in a language, or found in peripheral (therefore often ill-described) languages, one has to look at actual texts, ideally natural language, including conversations and narratives, to detect the extent of verb-initiality in the languages. Better annotated and accessible corpora of more AA languages could well bring forth more verb-initial patterns in more languages. The most common verb-initial pattern found is VS/VAP, with Nicobarese being an obvious exception, showing more generally VPA word order. Clause-initial auxiliaries are found in at least two branches (Palaungic and Katuic).
The distribution of verb-initial patterns in AA languages (Figure 1.1) is telling in that they are widely dispersed in different sub-groups across a vast geographical area, suggesting a great time-depth of the structures. This excludes the possibility of internal influence and spread from one group to another through language contact. Areal influence is in most cases also excluded as the origin of verb-initial structures in AA languages, with the possible exceptions of Aslian (Old Malay, though this was not consistently verb-initial) and Nicobarese (AN languages of northern Sumatra, such as Nias and Toba Batak). In the case of Nicobarese, contact influence is a likely source of verb-initial structures, as these are found only in main clauses, while subordinate clauses are generally verb-medial (see Sidwell, this volume). If this were the case, we would have an interesting instance of a language (group) changing from verb-initial to verb-medial and back to verb-initial.⁴

The presently available data suggest that proto-AA was verb-initial, or at least had VS/VAP word order as a common alternative. This hypothesis needs to be further tested.

Figure 1.1 Distribution of verb-initial patterns in AA

Table 1.1 Verb-initial structures in AA
Car (Nicobarese): main clause
Sora, Kharia (Munda): VN-compounds
Semelai, Jahai (Aslian): frequent
Shwe, Rumai, Wa (Palaungic): subord. clauses
Pnar, War, Amwi (Khasian): general
Katu (Katuic): folktales, intr.

4 This change is attested also in Welsh, although for very different reasons (Poppe 2000).



Was Proto-AA Verb-Initial?

4.1 SEA and Proto-AA Structure

Judging from the present-day AA languages and other language families of SEA, the assumed geographical center of spread of AA (Diffloth 2005; Sidwell & Blench 2011), we can assume a degree of flexibility in constituent order, allowing for pragmatic variation, in the protolanguage as well. The typical pragmatic variants found in the region (as well as in other parts of the world) include fronting of topical or focal elements, and right displacement of constituents as afterthought. In the verb-medial (head-final, topic-comment) languages of SEA, verbs may be fronted if they are topical, or in thetic expressions. In the second clause of sentence (27) from Mon, the verb seh ‘be left over, remain’ is the topic, while pɤŋ ‘cooked rice’ gives the new information (comment, predicate).

(27) Mon
hwaʔ ʔɒt ʔa jaʔ, seh mɔ̀ŋ cʰaʔ pɤŋ.
curry all go nsit remain stay only cooked.rice
‘The curry is all gone, there’s only rice left.’

Sentences (28a–b) from Pacoh show thetic expressions with flat information structure. Verb-initial patterns are frequent in this context in many languages (Lambrecht 1994), especially in presentational expressions (‘there is …’), but not restricted to these.

(28) Pacoh
a. joːl do̰ːj ləjʔ.
   remain rice not
   ‘Is there still some rice left?’ (Alves 2006)
b. viː li.mɔː lam duŋ mo̰ːj mu-lam trɨəŋ.
   exist several clf house and one-clf school
   ‘There are several houses and a school.’ (Alves 2006)

Another common feature of AA and other SEA languages is the frequent dropping of known or retrievable constituents, including S, A and P. A minimal clause can consist of a verb alone if the arguments are known from the linguistic or extra-linguistic context.

If these features found in the present-day languages can be assigned to proto-AA, we get a rather flexible syntax, possibly with a general comment-topic information structure. This comment-topic structure at some point was changed to topic-comment, for reasons to be further explored (see section 4).

4.2 Contributing Evidence of Verb-Initiality in Proto-AA

The presence of verb-initial structures in a number of non-adjacent AA languages, together with the fact that in many cases contact influence cannot account for the presence of these structures, suggests that these patterns are inherited from the common protolanguage. But there are more independent factors supporting this hypothesis, as is laid out in the following sections.

4.2.1 Peripheral Parts of the Grammar

Word order changes are not uncommon under areal pressure, as can be seen in the Munda shift to verb-final under influence from the dominant neighboring Dravidian and Indo-Aryan languages. Similarly, Mon is showing an increasing tendency towards verb-final structures under Burmese influence, though to a much lesser extent than Munda (Jenny 2011). When language structure changes under contact influence, it is usually the higher structures that are affected first. That is, sentence structure as a whole is more readily influenced than clause, phrase, and word structure (in this order; see Aikhenvald & Dixon 2002, 2007). Munda syntax is verb-final (head-final) on the sentence and clause level, and to a great extent on the phrase level, but at the word level (compounds, synthetic forms), verb-initial structures are common, suggesting an earlier change from verb-initial to verb-final. This change first affected the sentence and clause structure, but only later (and incompletely) the phrase and word structure. Similarly, a number of AA languages are verb-initial in subordinate clauses, rather than in independent clauses. Subordinate clauses are believed to be more conservative in general (Bybee 2001), partly because they are less accessible to pragmatic variation.
It should be noted that pragmatic word order variation is heavily restricted in subordinate clauses in most languages of SEA, including Thai and Mon, which otherwise exhibit great freedom in their constituent order. In subordinate clauses, fronting and right displacement are generally not available, leaving SV/AVP as the sole word order in these two languages. If it is true that dependent clauses are more conservative, Palaungic should have retained the original word order in most subordinate clauses, while innovating in main clauses. Khasian varieties can then be seen as retaining the old order in all clause types.

A problem here is the Nicobarese languages, which are verb-initial only in independent clauses, while the putatively more conservative subordinate clauses are consistently verb-medial. Why would the suggested old word order



appear in main clauses, and the innovative one in subordinate clauses? Two solutions are at hand. First, the statement that subordinate clauses are more conservative is merely a tendency and not an absolute universal fact. Exceptions can be found but may need an explanation. Second, Nicobarese possibly changed to verb-medial (i.e. head-final) order together with the mainland SEA AA languages, but later came under AN influence and started changing back to verb-initial (head-initial). This change would be expected to show in main clauses first, as is the case in Nicobarese today. The details of the history of migrations to the Nicobar Islands and the development of Nicobarese languages are not known, but there is abundant evidence of language contact in Nicobarese. A possible scenario for Nicobarese begins with a change from proto-AA comment-initial (verb-initial) to topic-initial (verb-medial) along with most mainland languages. This constituent order was later changed to verb-initial through contact with neighboring Austronesian languages by generalizing anti-topical subjects, resulting in VPA order in main clauses. Subordinate clauses, not being accessible to pragmatic variation, were not affected by this change and retained the SV/AVP order.

4.2.2 Mostly Found in Peripheral Languages

A second important point is that verb-initial structures are found mostly in peripheral, rather than central languages. It has been suggested that peripheral languages are more conservative than central languages spoken by large numbers of speakers, including L2 speakers (Nichols 1992 uses the terms ‘retreat’ or ‘residual’ for the former, ‘spread’ languages for the latter). The big languages with a long history of writing, like Mon, Khmer, and Vietnamese, are all non-verb-initial, but verb-initial patterns are found in the smaller, less standardized languages of the region, like Katu.
Similarly, standard Khasi is mostly verb-medial, but the more peripheral varieties like War and Pnar are generally verb-initial. Languages with large numbers of speakers, especially many L2 speakers and people shifting from another language, and used in large areas, are more prone to change and levelling than peripheral languages. Central languages undergo more rapid change because more speakers, both native and non-native, use them (Croft 2000), which produces more opportunities for internal change to occur and spread. Attracting large numbers of L2 speakers, central languages are more exposed to non-standard (“wrong”) structures used by L2 speakers of different levels of proficiency, which may lead to these non-standard structures spreading to L1 speakers (Næss & Jenny 2011). A third reason why peripheral languages tend to be more conservative is that they are to different degrees isolated from mainstream sociocultural processes, including standardization of the grammar, in some cases based on globally dominant



languages such as Latin or English. The latter can be seen for example in the development of passive-like constructions and a tense-like system in Thai since the mid-19th century (Diller 2001).

4.2.3 VAP Not Due to Pragmatic Variation

We have seen that pragmatic variation is widespread in AA and SEA languages in general, leading to a wide array of word orders in different pragmatic contexts. This makes it difficult at times to decide on the “basic” word order of a language. In most cases, however, this basic word order is found in subordinate clauses, as these are opaque to pragmatic variation.

While pragmatic word order variation is undeniably an important feature of the present-day linguistic landscape of SEA and can be claimed to have been present in the AA protolanguage, it is difficult to see the verb-initial structures found in different sub-groups of AA as the result of grammaticalized word-order variation. First, it would be hard to explain these in subordinate clauses, which, as we have seen, are not usually accessible to word order variation. Second, fronting of verbs seems to be rather restricted, especially in transitive clauses, so getting from AVP to VAP is not a commonly expected phenomenon cross-linguistically. Third, the frequent process of right displacement to afterthought position, which is also found in verb-final languages, does not lead to VAP, but results in VPA; that is, the Agent is displaced and occurs after the Patient. The most common verb-initial pattern in AA languages is VAP, so afterthought is not a likely source for these structures, as it would involve successive displacement of Agent and Patient: AVP → VPA → VAP, a scenario that is not very likely and rarely, if at all, described in languages of SEA and elsewhere.

4.2.4 AVP → VAP Not Likely Typologically

The worldwide distribution of languages with verb-initial structures suggests that this word order is typologically dispreferred when compared with verb-medial and verb-final (Dryer 2013).
Only a small percentage of languages worldwide show consistent verb-initial order, which is mostly reconstructable to the protolanguage, as in the case of Austronesian and Semitic, rather than attributable to language change, be it internal or contact-induced. There are a few instances of languages changing from non-verb-initial to verb-initial by internal change, such as the Celtic languages (MacAulay 1992), but the development in this family was apparently triggered by the loss of preverbal particles in a system with basic verb-second word order, of the kind seen for example in German main clauses. Verb-initial structures are found in some Indo-European languages in some grammatical and pragmatic contexts, such as polar questions (German, English, French) and jokes



(German). Spontaneous language-internal development from non-verb-initial to verb-initial is not a likely explanation for the present state in AA languages.

There are, on the other hand, a number of documented examples of languages changing from verb-initial to non-verb-initial, like Malay and other western AN languages (Adelaar & Himmelmann 2005), Afroasiatic including Egyptian/Coptic (Loprieno 1995, 2000; Allen 2013) and Semitic with Hebrew and modern Arabic (Hetzron 1997). Verb-initial patterns survived in some more peripheral Semitic languages, such as South Arabian varieties (Simeone-Senelle 1997), as well as in a number of Austronesian languages, including the Formosan languages (Tsukida 2005; Zeitoun 2005).

4.3 Development of Proto-AA Syntax

The most consistent picture that emerges of proto-AA syntax at the present stage of research is the following:

– The word order, or rather information structure, in proto-AA was basically verb-initial or comment-topic, possibly with AVP as an alternative order, as stated in Greenberg’s Universal 6 (Greenberg 1963). Generally, comment-initial is realized syntactically as predicate- or verb-initial structure. It is not clear whether syntactic categories such as verb and noun can be assumed for the protolanguage, so it is safer to speak of predicate-initial until further evidence can be adduced.
– Later, for some reason, the AA languages changed to topic-comment arrangement (> S/A-initial); this involves the highest level of the grammatical structure (information structure), which is easily affected by language contact or spontaneous change due to variation in usage. Syntactically, topic-comment is realized as SV or AVP/APV. This process results in the verb being moved from clause-initial to clause-medial or -final position, the state found in the majority of present-day AA languages.
– Some peripheral languages retained the predicate-initial structures in some domains of their grammars, while the bulk of the family changed to verb-medial, with Munda becoming verb-final under areal pressure.

4.4 Nicobarese Word Order: Inherited or Innovated?

Reid’s (1994) paper takes Nicobarese as retaining the original structure of proto-AA, the islands allegedly having been isolated from the rest of the family and from outside influence for thousands of years. Arguments against this perspective come from both linguistics and the history of the Nicobar Islands. The Nicobars, far from being isolated from outside influence for several thousand years, served as an important port of call for traders between Southeast Asia, India, and Europe, probably since at least classical antiquity. The Nicobar Islands are possibly mentioned in Ptolemy’s 2nd century CE geography under the names Maniola for Car and Agathodaimonos for Great Nicobar (Murthy 2007). The foreign influence on the Nicobarese languages is reflected in numerous loanwords from different languages, including Malay varieties and Portuguese (Jenny 2015a).

The main linguistic argument against Nicobarese as archaic languages retaining proto-AA syntactic features is the fact that Nicobarese exhibits verb-initiality in main clauses but has verb-medial order in subordinate clauses, and that the position of the A argument is after the Patient (VPA), as in neighboring AN languages but unlike the VAP structures found in the other AA languages. This supports the view that verb-initial main clauses in Nicobarese are due to contact with AN languages of northern Sumatra (Jenny 2015a).

Though the evidence from Nicobarese may have to be (partially) discarded, proto-AA is still best reconstructed as predicate-initial, and the question regarding the connection with AN remains to be answered. On the level of syntax, although verb-initiality is not a typologically common word order, it is not a sufficient indicator of relatedness of two language families (or languages). Also, AN languages are mainly VPA, while the patterns found in AA suggest VAP as the original structure. On the other hand, there is some similar morphology, especially common prefixes and infixes, some even with similar form and function, such as the causative prefix pa- and the nasal infix marking attributive or nominalization. Infixes are typologically rare and not easily borrowed, and if two language families spoken in adjacent areas use similar sets of infixes, the question of a common origin naturally arises.
Convincing as these shared features may be, there is next to no shared lexicon (see Diffloth 1994), which means that the shared morphology would have survived in (almost) all branches in almost identical form, while at the same time almost the whole lexicon was replaced. Also, the morphology would have survived for several millennia from the protolanguage, but was then lost in almost all branches of AA within a relatively short time, while being retained in a number of AN languages. Affixation is only productive in a few AA languages, but lexical traces show that it was productive at the branch level in all branches. At the present state of research, it is difficult or impossible to prove or disprove the relationship between AA and AN, and reconstructing proto-AA as verb-initial does not necessarily add substantial evidence to the discussion. There could have been more or less extensive contact between the two language families, maybe already at the protolanguage level, and at different subsequent periods, leading to the superficial similarities. The question must therefore be left open, and it is not seen as relevant to the internal reconstruction of AA.

5 Explanations for Word Order Change in AA: An Outlook

The more pressing question we have to address is “why did the majority of AA languages change from predicate-initial to predicate-medial or -final order?” AA languages are the earliest known languages of Mainland SEA, so it is difficult to see contact influence as the trigger: if the change is the result of language contact, then contact from where? What languages could have been in contact with early AA? The answer is simply that we don’t know and will probably never know. What we do know is that the earliest Mon and Khmer inscriptions from SEA are dated well before the first contact with verb-medial Tai-Kadai speakers in Mainland SEA, but they are consistently verb-medial. If there is no conceivable contact scenario we can presently reconstruct to account for the major word order change in early AA, what other factors could have caused the change? An obvious alternative is a spontaneous language-internal change, favored by at least three processing biases.

5.1 Processing Biases

Several recent experiments and large-scale comparisons have shown that verb-initial word order leads to conflicts on at least three levels.
– “Given before new” principle (Chafe 1994; Lambrecht 1994; Junge et al. 2015): New information (comment) can be more easily processed if it is grounded in known information (topic). Topics, typically encoded as subjects in many languages, are most commonly placed before comments, typically encoded as verbs or verb phrases. Languages are therefore ideally S/A-initial.
– P adjacent to V (dependency length minimization; Hawkins 2004; Newmeyer 2005; Futrell et al. 2015): In many languages (and syntax theories), objects are seen as dependents of the verb within the verb phrase. In terms of configurationality, it is preferred to keep heads and dependents in adjacent positions, meaning that separating the object from its verb leads to increased cost in parsing. If a language is verb-initial, it should therefore preferably be VPA in order to keep the dependency length minimal.
– A before P (e.g. Sauppe 2016): Language processing experiments have shown that the human parser preferably interprets the first NP in a clause as S/A. If at a later point it is found that the first NP was in fact the P argument, the processing cost increases through the necessary reinterpretation. A verb-initial language should therefore ideally be VAP, which conflicts with the dependency length minimization principle.

In summary, a verb-initial clause necessarily violates at least two of these biases, namely ‘given before new’, and either ‘P adjacent to V’ or ‘A before P’.
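The count behind this summary can be checked mechanically. The following is a minimal sketch and not part of the original study: it approximates each of the three biases by a simple positional test over the letters A, V and P (these encodings, and the function name `violations`, are illustrative assumptions) and lists which biases each of the six logically possible orders violates.

```python
from itertools import permutations


def violations(order: str) -> list[str]:
    """Return the names of the processing biases that a word order violates.

    `order` is a three-letter string over A (agent), V (verb), P (patient).
    The positional tests below are simplifying assumptions for illustration.
    """
    a, v, p = order.index("A"), order.index("V"), order.index("P")
    violated = []
    # Given before new: the topic (approximated here by A) precedes the verb.
    if not a < v:
        violated.append("given before new")
    # P adjacent to V: nothing intervenes between the verb and its object.
    if abs(p - v) != 1:
        violated.append("P adjacent to V")
    # A before P: the first noun phrase in the clause is the agent.
    if not a < p:
        violated.append("A before P")
    return violated


if __name__ == "__main__":
    # Print every possible order with the biases it violates.
    for perm in permutations("AVP"):
        order = "".join(perm)
        print(order, violations(order))
```

Under these assumed encodings, both verb-initial orders (VAP and VPA) violate two biases each, while AVP and APV violate none, which is in line with the summary given above.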

Does it follow from these conflicts that verb-initial structures are less stable or dispreferred compared to verb-medial and verb-final structures? The worldwide rarity of verb-initial languages seems to speak in favor of this. Verb-initial (or predicate-initial) languages such as Tagalog and Malagasy are simply exceptions to an otherwise robust typological tendency. This might then explain the spontaneous change in early AA from comment-initial to topic-initial, a single change at the highest level of the discourse structure, which led to the restructuring of the syntax to predicate- or verb-medial or -final. This restructuring further led to resolving the conflicts in information structure and parsing biases inherent in verb-initial expressions. We may speculate that this change happened at a time when AA languages grew and spread, attracting more numerous L2 speakers and being used in larger areas. The original structures were retained in some peripheral languages and parts of the grammars, resulting in the present picture we get of the AA family.

5.2 Outlook

Reconstructing verb-initial clause structure for proto-AA is at odds with traditional diachronic explanations of the syntactic patterns found in the family. The evidence gathered from texts, rather than grammatical descriptions, is enough, though, to formulate such a hypothesis at this point. Taking verb-initial AA as a working hypothesis, further research can be directed towards supporting or falsifying it. In order to confirm the likelihood of proto-AA being verb-initial, more extensive data is needed from annotated corpora of natural language. Of special importance will also be data from the Khmuic, Bahnaric, and Katuic branches, especially small or peripheral languages.
These datasets, once collected and digitized in a common accessible format, can then be compared using traditional and newly developed methodologies of reconstruction to gauge the possibility of proto-AA indeed having dominant verb-initial structure. In spite of the challenges involved in the reconstruction of syntactic structures and patterns, it is certainly worthwhile pursuing this undertaking in AA. At the same time, the likelihood of a language internal development from verb-initial to non-verb-initial must be tested with documented similar developments in other language families. Austronesian and Afroasiatic would be natural candidates here. Additional evidence from language processing biases can be adduced to complement the emerging picture of historical AA syntax.



References

Adelaar, Alexander & Niklaus P. Himmelmann (eds.) 2005. The Austronesian languages of Asia and Madagascar. London: Routledge.
Aikhenvald, Alexandra Y. & R.M.W. Dixon (eds.) 2002. Areal diffusion and genetic inheritance: problems in comparative linguistics. Oxford: Oxford University Press.
Aikhenvald, Alexandra Y. & R.M.W. Dixon (eds.) 2007. Grammars in contact. A cross-linguistic typology. Oxford: Oxford University Press.
Allen, James P. 2013. The Ancient Egyptian language: an historical study. Cambridge: Cambridge University Press.
Alves, Mark. 2006. A Grammar of Pacoh: a Mon-Khmer language of the central highlands of Vietnam. Canberra: Pacific Linguistics (PL580).
Anderson, Gregory. 2007. The Munda verb. Typological perspectives. Berlin/New York: Mouton de Gruyter.
Anderson, Gregory D.S. 2008. Introduction to the Munda languages. In The Munda languages. Gregory D.S. Anderson (ed.), London/New York: Routledge, 1–10.
Braine, Jean Critchfield. 1970. Nicobarese grammar (Car dialect). PhD dissertation. Berkeley: University of California.
Brown, Lea. 2005. Nias. In Alexander Adelaar & Niklaus P. Himmelmann (eds.) The Austronesian languages of Asia and Madagascar. London and New York: Routledge, 562–589.
Burenhult, Niclas. 2005. A grammar of Jahai. Canberra: Pacific Linguistics.
Bybee, Joan. 2001. Main clauses are innovative, subordinate clauses are conservative. In Joan L. Bybee and Michael Noonan (eds.) Complex Sentences in Grammar and Discourse: Essays in honor of Sandra A. Thompson. Amsterdam/Philadelphia: John Benjamins Publishing Company, 1–17.
Chafe, William. 1994. Discourse, consciousness, and time. The flow and displacement of conscious experience in speaking and writing. Chicago/London: The University of Chicago Press.
Comrie, Bernard. 2007. Areal typology of Southeast Asia: what we learn from the WALS maps. Manusya special issue 13: 18–47.
Costello, Nancy A. & Khamluan Sulavan. 1993. Katu Folk Tales and Society. Vientiane: Institute of Research on Lao Culture and Society.
Croft, William. 2000. Explaining language change. An evolutionary approach. Harlow: Longman.
Cumming, Susanna. 1984. The syntax and pragmatics of prepredicate word order in Toba Batak. Studies in the structure of Toba Batak. Paul Schachter (ed.), 17–36. UCLA Occasional Papers in Linguistics 5.
Diffloth, Gérard. 1994. The lexical evidence for Austric so far. Oceanic Linguistics 33(2): 309–321.

Diffloth, Gérard. 2005. The contribution of linguistic palaeontology to the homeland of Austro-asiatic. In The Peopling of East Asia: Putting Together Archaeology, Linguistics and Genetics. Laurent Sagart, Roger Blench & Alicia Sanchez-Mazas (eds.), 77–80. Routledge/Curzon.
Diller, Anthony. 2001. Thai grammar and grammaticality. In Kniffka, Hannes (ed.) Indigenous grammars across cultures. Frankfurt am Main: Peter Lang, 219–244.
Donegan, Patricia & D. Stampe. 2004. Rhythm and the synthetic drift of Munda. In R. Singh (ed.) The Yearbook of South Asian Languages and Linguistics. Berlin/New York: Mouton de Gruyter, 3–36.
Dryer, Matthew S. 2013. Order of Subject, Object and Verb. In: Dryer, Matthew S. & Haspelmath, Martin (eds.) The World Atlas of Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology. (Accessed on 2016-04-07.)
Enfield, N.J. 2003. Linguistic epidemiology. Semantics and grammar of language contact in Mainland Southeast Asia. London/New York: RoutledgeCurzon.
Enfield, N.J. 2005. Areal linguistics and Mainland Southeast Asia. The Annual Review of Anthropology 34: 181–206.
Fischer, Wolfdietrich. 1997. Classical Arabic. In Robert Hetzron (ed.) The Semitic languages. London/New York: Routledge, 187–219.
Futrell, Richard, Kyle Mahowald, & Edward Gibson. 2015. Large-scale evidence of dependency length minimization in 37 languages. PNAS 112.33: 10336–10341.
Givón, Talmy. 1971. Historical syntax and synchronic morphology: an archaeologist’s fieldtrip. In Papers from the Seventh Regional Meeting, Chicago Linguistic Society. Chicago: Chicago Linguistic Society, 394–415.
Greenberg, Joseph H. 1963. Some universals of grammar with particular reference to the order of meaningful elements. In Universals of grammar. Joseph H. Greenberg (ed.), 73–113. Cambridge: MIT.
Hawkins, John A. 2004. Efficiency and complexity in grammars. Oxford: Oxford University Press.
Hetzron, Robert (ed.) 1997. The Semitic languages. London: Routledge.
Jenny, Mathias. 2015a. Syntactic diversity and change in Austroasiatic languages. In Viti, Carlotta (ed.) Perspectives on historical syntax. Amsterdam/Philadelphia: John Benjamins, 317–340.
Jenny, Mathias. 2015b. Modern Mon. In Jenny, Mathias & Paul Sidwell (eds.) Handbook of Austroasiatic Languages. Leiden/Boston: Brill, 553–600.
Jenny, Mathias. 2015c. The far West of Southeast Asia: ‘Give’ and ‘get’ in the languages of Myanmar. In Enfield, N.J. & Bernard Comrie (eds.) The languages of Mainland Southeast Asia. The state of the Art. Berlin/Boston: De Gruyter Mouton, 155–208.
Jenny, Mathias & Paul Sidwell (eds.) 2015. Handbook of Austroasiatic Languages. Leiden/Boston: Brill.



Junge, Bianca, Anna L. Theakston & Elena V.M. Lieven. 2015. Given-new/new-given? Children’s sensitivity to the ordering of information in complex sentences. Applied Psycholinguistics 36: 589–612.
Kruspe, Nicole D. 2004. A grammar of Semelai. Cambridge: Cambridge University Press.
Lambrecht, Knud. 1994. Information structure and sentence form. A theory of topic, focus, and the mental representation of discourse referents. Cambridge: Cambridge University Press.
Loprieno, Antonio. 1995. Ancient Egyptian. A linguistic introduction. Cambridge: Cambridge University Press.
Loprieno, Antonio. 2000. From VSO to SVO? Word order and rear extraposition in Coptic. In Sornicola, Rosanna, Erich Poppe & Ariel Shisha-Halevy (eds.) Stability, variation and change of word-order patterns over time. Amsterdam/Philadelphia: John Benjamins, 23–39.
MacAulay, Donald (ed.) 1992. The Celtic languages. Cambridge: Cambridge University Press.
Mahdi, Waruno. 2005. Old Malay. In Adelaar, Alexander & Niklaus P. Himmelmann (eds.) The Austronesian languages of Asia and Madagascar. London and New York: Routledge, 182–201.
Man, Edward Horace. 1889 [1975]. A dictionary of the Central Nicobarese language. New Delhi: Mittal.
Masica, Colin P. 1976. Defining a linguistic area: South Asia. Chicago: University of Chicago Press.
Milne, Leslie Mrs. 1921. An elementary Palaung grammar. Oxford: Clarendon Press.
Murthy, R.V.R. 2007. Andaman and Nicobar Islands: a geo-political and strategic perspective. New Delhi: Northern Book Centre.
Nababan, P.W.J. 1981. A grammar of Toba-Batak. Canberra: Pacific Linguistics.
Næss, Åshild & Mathias Jenny. 2011. Who changes language? Bilingualism and structural change in Burma and the Reef Islands. Journal of Language Contact 4(2): 217–249.
Newmeyer, Frederick J. 2005. Possible and probable languages. A generative perspective on linguistic typology. Oxford: Oxford University Press.
Nichols, Johanna. 1992. Linguistic diversity in space and time. Chicago: University of Chicago Press.
Peterson, John. 2011. A Grammar of Kharia. Leiden/Boston: Brill.
Pinnow, Heinz-Jürgen. 1963. The position of the Muṇḍā languages within the Austroasiatic language family. In: H.L. Shorto (ed.). Linguistic Comparison in Southeast Asia and the Pacific. London: SOAS, 140–152.
Poppe, Erich. 2000. Constituent order in Middle Welsh. The stability of the pragmatic principle. In Sornicola, Rosanna, Erich Poppe & Ariel Shisha-Halevy (eds.) Stability, variation and change of word-order patterns over time. Amsterdam/Philadelphia: John Benjamins, 41–51.

Reid, Lawrence A. 1994. Morphological evidence for Austric. Oceanic Linguistics 33.2, 323–344.
de Roepstorff, F.A. 1884. Dictionary of the Nancowry dialect of the Nicobarese language; in two parts: Nicobarese-English and English-Nicobarese. Edited by Mrs. de Roepstorff. Calcutta: Home Department Press.
Ring, Hiram. 2015. Pnar. In Mathias Jenny & Paul Sidwell (eds.) The handbook of Austroasiatic languages. Leiden/Boston: Brill, 1186–1226.
Sauppe, Sebastian. 2016. Verbal semantics drives early anticipatory eye movements during the comprehension of verb-initial sentences. In Frontiers in Psychology 7, article 95.
Sidwell, Paul. 2015. Car Nicobarese. In Mathias Jenny & Paul Sidwell (eds.) The handbook of Austroasiatic languages. Leiden/Boston: Brill, 1229–1265.
Sidwell, Paul & Roger Blench. 2011. The Austroasiatic Urheimat: the Southeastern Riverine Hypothesis. In N.J. Enfield (ed.) The Dynamics of Human Diversity. Canberra: Pacific Linguistics, 315–343.
Simeone-Senelle, Marie-Claude. 1997. The modern South Arabian languages. In The Semitic languages. Robert Hetzron (ed.), 378–423. London/New York: Routledge.
Subbarao, Karumuri V. 2012. South Asian languages. A syntactic typology. Cambridge: Cambridge University Press.
Tsukida, Naomi. 2005. Seediq. In Adelaar, Alexander & Niklaus P. Himmelmann (eds.) The Austronesian languages of Asia and Madagascar. London and New York: Routledge, 291–325.
Weidert, Alfons. 1975. I Tkong Amwi. Deskriptive Analyse eines Wardialekts des Khasi. Wiesbaden: Otto Harrassowitz.
Whitehead, G. 1925. Dictionary of the Car-Nicobarese language. Rangoon: American Baptist Mission Press.
Wnuk, Ewelina. 2016. Semantic specificity of perception verbs in Maniq. Nijmegen: Max Planck Institute for Psycholinguistics.
Zeitoun, Elizabeth. 2005. Tsou. In Adelaar, Alexander & Niklaus P. Himmelmann (eds.) The Austronesian languages of Asia and Madagascar. London and New York: Routledge, 259–290.

Chapter 2

Initial Steps in Reconstructing Proto-Vietic Syntax

Mark Alves



This paper presents preliminary reconstructions of early Vietic clause structure and noun phrase structure, with additional notes on other aspects of Vietic syntax, such as voice and verb complements. To do so effectively requires a reconstruction based on an understanding of historical language contact in the region and insights gleaned from textual data of early Vietnamese, Khmer, and Chinese. Typological convergence over two millennia through multiple periods of language contact (e.g. Viet-Muong convergence with Sinitic, convergence among Mainland Southeast Asian languages, convergence among languages in Vietnam, etc.) has restructured the syntax of Vietic languages to the point that it would be impossible to reconstruct patterns other than the modern typology without historical evidence and comparative evidence from languages other than Vietic. Ultimately, while available data can only allow the reconstruction of verb-medial clause structure (i.e. agent-verb-patient) with topic-comment tendencies, the same as in modern languages, the previously strictly right-branching Vietic noun phrase structure has been transformed into a blended one in which quantity terms precede nouns but modifying elements follow them.

An introductory background of Vietic follows. The Vietic sub-branch of Austroasiatic has one group with Chinese-like linguistic features, namely, the Việt-Mường group, and one group with features similar to many Austroasiatic languages, the highly conservative ‘Southern Vietic’1 group. While Vietnamese and Mường languages have complex tone systems, monosyllabicity and isolating typology, basically CVC syllable structure, a history of the use of Chinese writing, and many borrowed words from Chinese, Southern Vietic languages exhibit a range of partial tone and/or phonation

1 In this study, I use the broad geographic shorthand of ‘Southern Vietic’ (as in Ferlus 2004) for all highland Vietic languages outside Viet-Muong, that is, the Pọng-Toum and Chứt languages. These have been termed Pọng-Chứt languages by Nguyễn T.C. (1995), and that Vietnamese term, which is not used widely in English-language publications, is also used sometimes in this paper. Viet-Muong is effectively ‘Northern Vietic’, though I use the term Viet-Muong as it is well known.

© koninklijke brill nv, leiden, 2020 | doi:10.1163/9789004425606_004

Figure 2.1 Vietic sub-branches and languages (Sidwell 2015: 203–205)
– Việt-Mường: Vietnamese (various dialects), Mường Muồt, Mường Nàbái, Mường Chỏi, etc.
– Pong-Toum: Phong, Đan Lai, Hung, Toum and others
– Chut:
  – East: Mãliềng, Maleng, Arem, Kri, Chứt (Mày, Rục, Sách, Mụ Già)
  – West: Thavung, Pakatan

systems, polysyllabicity and agglutinative typology, CCVC syllable structure, no previous use of Chinese writing, and little lexical influence from Chinese (excluding recent borrowing of Sino-Vietnamese vocabulary via Vietnamese) (cf. Alves 2003). This broad sub-branching within Vietic, separating Viet-Muong from other Vietic languages, has been proposed by various researchers (e.g. Hayes 1992, Chamberlain 2003, Ferlus 1979 and 1989–1990, Nguyễn T.C. 1995). A more recent view of the Vietic sub-branches is that of Sidwell (2015), shown in Figure 2.1.

The lexical and phonological data of Vietic languages have given tremendous insight into Vietnamese language history, concretely connecting that currently tonal, isolating language with its non-tonal agglutinative origins. Now, however, a sufficient quantity of syntactic data and descriptions, as well as relevant historical linguistic, historical, archaeological, and ethnological data, can give some insight into the syntax of Proto-Vietic, or at least early Vietic just prior to the impact of contact with Sinitic.2

2 In this paper, the term ‘Sinitic’ is used to refer to the linguistic ancestor of modern varieties of Chinese. It is a speech community in the distant past, including potentially a number of closely related varieties of a language group. Using the term ‘Chinese’ is often ambiguous and also gives an incorrect association between modern varieties of Chinese and the extremely different language(s) of the ancestor speech community of over 2,000 years ago. Moreover, ‘Sinitic’ removes modern political and cultural associations. However, the term ‘Chinese’ is used in this paper with mention of later speech varieties or with reference to the Chinese culture broadly.

Previously, historical linguistic research on Vietic has largely focused on phonology. Studies of the phonology of Vietic languages began in the early 20th century, and for Vietnamese historical phonology, there was Maspero’s 1912 substantive monograph on the topic. But for other Vietic languages, there were primarily just wordlists with very limited phonological notes (e.g. Cadiere 1905, Chéon 1907, etc.). A reconstruction of the phonology of the Việt-Mường sub-branch came first (Thompson 1976), but it is only around the beginning of the 21st century that reconstructions of Vietic phonology and lexicons were attempted (Nguyễn T.C. 1995, Nguyễn N.S. 2003, Ferlus 2007, Trần 2011), in large part due to Vietic linguistic data gathered in fieldwork by Vietnamese, French, and Russian linguists in the last few decades of the 20th century, with some work continuing today.

In contrast, Proto-Vietic morphosyntax has been given very little attention. The Chut languages, like other Austroasiatic languages, have presyllabic material; consequently, the recovery of Vietic prefixes and infixes in modern monosyllabic Vietnamese words has been attempted (cf. Ferlus 1977 and 2009), and textual evidence of polysyllabic words in Vietnamese has yielded increasingly useful data (e.g. Shimizu 2000 and 2015, Shimizu et al. 2005). Regarding issues of historical syntax, access to many ancient Vietnamese writings has allowed studies of chronological developments in Vietnamese grammatical vocabulary, primarily in the past few decades (Nguyễn Đ. H. 1991, Lê 2002, Nguyễn N.S. 2003, Stankevitch 2006, Alves 2005, 2009a, and 2009b, Vũ 2006, Võ 2016, etc.). However, the history of noun phrase structure in early writings has been addressed only in a few recent publications (e.g. Vũ 2014 on Vietnamese noun phrase structure in the 1500s). Moreover, available syntactic data of other Vietic languages had been limited, but some increase in that area has occurred in the last couple of decades (see §1.3). What that data shows is surprising consistency in the patterns of all Vietic languages in Vietnam, which would lead to reconstructions that appear entirely the same as the languages spoken today.

Thus, the question is how to begin a reconstruction of Proto-Vietic syntax. For this task, key sources of syntactic data include (a) modern syntactic descriptions and data of the highly conservative Southern Vietic groups, primarily Rục (Nguyen V.L.
1993 and Solntsev, Solntseva, and Samarina 2001), Thàvựng (Srisakorn 2008), and Kri (Enfield and Diffloth 2009), (b) modern Vietnamese and its regional, dialectal variants, (c) vernacular Vietnamese in Chinese-based Nôm writings (samples in this study considered from texts back to the 1300s),3 and (d) 17th century Catholic missionaries’ early descriptions of Vietnamese grammar (de Rhodes 1651, as well as short writings and letters of other missionaries). Moreover, comparative data from modern Austroasiatic languages and ancient writings of Old Khmer (800s to 1300s CE) give further perspective (Jacob 1993, Sak-Humphrey 1993, Jenner and Sidwell 2010).

3 While epigraphic Nôm script samples date back to the 8th century, extant texts of use in syntactic studies are only from the 1200s and 1300s. However, during the brief Ming Dynasty political control of Vietnam from 1407 to 1427, the Chinese administration mandated destruction of non-Chinese writings, and thus, there are few texts from before that period (Taylor 2013: 179). Many early texts consist of poetry, rather than prose, and early Vietnamese writers were extremely familiar with Chinese writing: such matters are caveats for diachronic syntactic studies of Vietnamese from this early period, as Nguyễn N.S. notes (2003: 218–219).

Based on such data, Vietic clause structure and noun phrase structure can be reconstructed largely as that of syntax in modern languages in that Austroasiatic sub-branch: (a) basic clausal structure of agent-verb-patient, though with robust topic-prominent tendencies, and (b) right-branching NP structure of noun-modifiers, with the position of quantification somewhat less certain but still probably in post-nominal position but without classifiers (see § 2.1). All descriptions in this study must be considered preliminary working hypotheses pending additional data that will hopefully be gathered and made available in the future. Moreover, the reconstructions are general, with no attempt to pinpoint, for example, ordering of modifying elements in NPs. Nor will there be discussion of multi-clausal constructions, only patterns of basic sentences.

In subsections 1.1 to 1.3, historical sociolinguistic issues in northern Vietnam and shared grammatical vocabulary in Vietic are discussed to provide context for the data on NPs and clause structure in available syntactic data of Vietic languages. Sections 2 and 3 then present data about noun phrases and clauses respectively. See the Appendix for an outline of this study.

1.1 Overview of Historical Sociocultural Circumstances

Discussion of Vietic linguistic history must be placed in the wider context of the Southeast Asian Linguistic Area. Contact-driven convergence has affected all the regional language families (Sinitic, Tai-Kadai, Hmong-Mien, Austronesian, Austroasiatic), especially from the Han Dynasty (206 BCE–220 CE) onward.
All of these language groups have been hypothesized to come originally from nontonal languages with more complex syllable structure and probably more complex word-formation patterns, and all appear to have experienced a period of massive typological restructuring and convergence (cf. Haudricourt 1954, Ratliff 2010: 198–191 on tonogenesis in the region, and Delancey 2013 on the overall typological phenomenon). In addition, Sinitic and Vietic also both come from polysyllabic languages with affixes, lost completely in both Chinese and Việt-Mường languages, though retained in the Southern Vietic languages, a significant detail in the sub-branching within Vietic. This refutes the view that Vietnamese received its linguistic structure from Chinese as Sinitic itself was simultaneously undergoing restructuring. Add to this many subsequent eras of contact with surrounding Southeast Asian languages and we then have large periods of time during which language contact situations and history cannot be explicitly described. This makes claims about



developments of early Vietic clause and NP structure tenuous, or at least provisional pending additional data and analyses. Nevertheless, the language contact situation between Sinitic and Vietic can be offered in some detail to begin to understand what early Vietic syntax may have been like, as is done below.

In this section, we consider the sociolinguistic setting of Vietic in the early period of contact with Sinitic from around 200 BCE and the parameters of Sinitic-Vietic language contact by which linguistic change could have occurred. Moreover, aspects of regional typological convergence are described. To hypothesize about early Vietic syntax and possible developments, the following questions are addressed.
– In what area(s) were early Vietic speech communities in the last few centuries of the first millennium BCE, and how can we know that?
– What can be inferred about the sociocultural circumstances of Vietic speech communities before and during contact with Sinitic?
– What was the likely language typology of Vietic and Sinitic during initial contact?
– What are some of the stages of historical linguistic changes in Vietic and other languages, primarily Sinitic and Tai, with which Vietic experienced language contact?

1.1.1 Vietic Speech Communities in Vietnam in the First Millennium BCE

The Vietic homeland has been considered to be in north-central Vietnam (e.g. the region formerly called Bình Trị Thiên), where a high degree of linguistic diversity of both Vietnamese dialects (e.g. Alves 2007) and Vietic languages can be seen in general (Chamberlain 1998), including varieties of Mường and Pọng-Chứt languages, with some also in bordering areas of Laos. However, new evidence suggests a different scenario. Recent archaeogenetics studies have provided strong support that (a) Austroasiatic groups have been in the Red River Delta since approximately 2000 BCE (McColl et al. 2018) and (b) in the Dong Son period (c. 600 BCE to 200 CE), traces of both Vietic and Tai groups were identified (Lipson et al. 2018). A probable scenario then is that an Austroasiatic group settled in the region, over time becoming a distinct speech community, what became Proto-Vietic. Regarding Vietnamese dialectal diversity in north-central Vietnam, this need not indicate that Vietnamese specifically dispersed from this region. Reduction in diversity can be the result of the convergence of groups in a political center, as is the case for Sinitic (i.e. northern origins but less linguistic diversity in northern China than southern China).

On the linguistic side of data, it is challenging to associate Proto-Vietic words specifically with the Đông Sơn culture. Instead, helpful linguistic evidence

initial steps in reconstructing proto-vietic syntax


comes from early Sino-Vietnamese loanwords, that is, the hundreds of Chinese loanwords in Vietnamese borrowed in the first several centuries CE (cf. Wang 1948, Mei 1970, Phan 2013, Alves 2016, etc.), and some of these early Sinitic loanwords appear as well in Chứt languages (cf. Baxter and Sagart 2014). Such data is robust in terms of both the number of loanwords and the consistency of the phonological correspondences. Northern Vietic borrowed Sinitic words at a time when presyllables were still retained and when pre-tonal final glottal stops and fricatives were part of the phonology of both language groups (Haudricourt 1954). Recent hypotheses in Chinese historical phonology, based on comparative linguistic and textual evidence, support the claim that Sinitic only developed tones in the first few centuries CE (Zhu 2015). Such a scenario requires that sufficient Sinitic-Vietic bilingualism and language contact occurred in the very early centuries CE, and the best place for this to have occurred, based on Chinese historical records, was in Jiaozhi (an administrative region of northern Vietnam), not as far south as the region of Southern Vietic in north-central Vietnam.

The two main factors in understanding early Vietic syntax are contact with Sinitic and areal Southeast Asian typological convergence. The first of the following subsections focuses on Sinitic-Vietic sociolinguistic circumstances, while the second focuses on areal contact, with a note on Tai-Vietic contact.

1.1.2 Sociolinguistic Circumstances of Sinitic-Vietic Contact

This section presents possible sociocultural and sociolinguistic situations of Vietic at the time of contact with Sinitic, with a comparative summary of sociocultural and linguistic traits in Table 2.1.
In the mid-1st millennium BCE, assuming connection with Đông Sơn culture, the Vietic speech community was part of a developed agricultural, early Iron Age society having sophisticated bronze metallurgical techniques, and with an emerging state-level system (Kim 2013). There is no concrete evidence of an indigenous writing system. It is safe to assume surrounding chiefdoms and tribes as part of the extended speech community as well as contact with other speech communities, such as early Tai-Kadai groups. Nevertheless, some Vietic speakers were likely part of a community which had established sociopolitical status, an important factor to consider in the language contact scenario. In terms of linguistic typology, at the time of initial contact with Sinitic around the beginning of the first millennium CE, the Vietic language group must have had predominantly Austroasiatic typological traits, including being non-tonal and bisyllabic and having initial clusters and both prefixes and infixes, much as many of the modern Pọng-Chứt languages do. Regarding



morphosyntax, if Vietic was similar to the dominant morphological typology of Mainland Southeast Asian (MSEA hereafter) Austroasiatic languages,4 it likely had agglutinative, derivational (but not inflectional) morphology and right-branching noun-phrase structure, much as modern Southern Vietic languages do. As all modern Vietic languages with available descriptions, most Austroasiatic languages, and all nearby Austroasiatic sub-branches (i.e. Katuic, Bahnaric, Khmuic, Mangic, and Khmeric) are topic-prominent with SVO/AVP tendencies, it seems likely that Vietic was similarly so, though unfortunately, no data exists to confirm or refute this. Moreover, as Sinitic was of a similar typology, there is no contrasting point to evaluate this matter, unlike the issue of NP structure (i.e. noun-modifier in Vietic versus modifier-noun in Sinitic), as discussed below.

Some likely relevant sociocultural and linguistic characteristics of Sinitic in the period of initial contact are as follows. In the 3rd century BCE, at the earliest time of contact with Vietic, Sinitic was spoken in an Iron Age society with advanced agriculture and a developed state-level system consisting of multiple competing polities, including a mixture of Sinitic and non-Sinitic speech communities. The fully developed Chinese writing system was part of the means by which administration could be extended so far geographically and by which sociolinguistic status was exerted. The sociocultural and political status of the expanding culture from northern China evidently led to substantial lexical borrowing into non-Chinese languages, but Chinese social status could not prevent Sinitic’s eventual typological restructuring due to generations of multilingualism over a huge geographic area among many languages combined with imperfect acquisition.
And apparently, the many small groups of speakers of Southern Vietic languages lived far enough outside the cultural and linguistic Sinosphere to retain ancient linguistic features and generally experience much less sinification than the Việt-Mường languages did.

In terms of linguistic typology, in the Han Dynasty, Sinitic, as a sub-branch of Sino-Tibetan, was likely non-tonal (cf. Zhu 2015), still retained final glottal stops and fricatives, and was polysyllabic, with prefixes and suffixes and complex consonant clusters, very unlike the tonal CVC syllables of modern varieties of Chinese. Numerous Chinese writings from that period suggest that Sinitic at that time was a topic-prominent language, though with basic SVO structure (cf. Aldridge 2015). This is in contrast with widespread SOV verb-final patterns

4 Regarding MSEA Austroasiatic/Mon-Khmer morphology, see Alves 2014 on derivational issues and Alves 2015a on grammatical functions of Mon-Khmer morphology.



of many other Sino-Tibetan languages. At that time, Sinitic marked passive voice lexically via preverbal 見 jiàn (homophonous with ‘to see’) and agent-marking 於/于 yú (Meisterernst 2015), prior to the development of the passive 被 bèi construction (the etymological source of Vietnamese passive-marking bị), which became dominant only in Late Medieval Chinese of the 7th to 13th centuries (Peyraube 1996: 177). Classifiers had not yet been fully grammaticalized, but measure words moved from post-nominal to pre-nominal position in the Late Archaic period around the mid-first millennium BCE (Peyraube 1996: 196, Behr 2009). Early classifiers occurred in post-nominal position during the Han Dynasty, but eventually they matched the measure word pattern by the Tang Dynasty (Ibid.). It is possible that this development among Chinese languages was also the case in Chinese spoken in the region, thereby impacting NP structure in Việt-Mường.

Table 2.1 shows that Sinitic and Vietic at that time shared a number of key linguistic features.5 Later syntactic developments in Sinitic (primarily the emergence of classifiers, which somewhat impacted Vietic NP structure) did ultimately become part of Việt-Mường. Such structural influence can best be accounted for by assuming a large Sinitic community amidst Vietic, such as Phan’s (2013) proposed Annamese-Chinese. Still, as for overall syntactic properties, other than the pre-nominal position of quantity and measure words, language contact over a period of two millennia has not impacted the mostly right-branching Vietic syntactic structure. Whether Proto-Vietic clause structure was verb-initial at the time of Sinitic-Vietic contact, as per Jenny’s hypothesis for Proto-Austroasiatic (2015 and in this volume), cannot be demonstrated. However, since Sinitic was already an SVO language at the time of contact, it is at least hypothetically possible that it influenced Proto-Vietic clause structure.
The significant difference between early Vietic and Sinitic noun phrase structure, as well as other syntactic details,6 further highlights the typological differences between the two

5 Whether Sinitic had already started undergoing linguistic restructuring due to previous contact with non-Sinitic groups in southern China is an issue that cannot and need not be addressed here.

6 Despite the apparent impact of Sinitic on Vietic in terms of the lexicon, phonology, prosodic word structure, and noun phrase structure, it is nevertheless important to note how all Vietic languages, including Vietnamese, went down quite a different typological linguistic path from varieties of Chinese. Many syntactic features that are widespread among Chinese languages are lacking in Vietnamese, such as the Chinese A-not-A interrogative pattern, preverbal locatives and other adjuncts, preverbal adverbial constructions, prenominal modifiers, and NPs formed with clause- or phrase-final de 的 (or functionally comparable but etymologically distinct morphs in other varieties of Chinese) but without the semantic head noun. Annamese-Chinese was present in Vietnam at the time of the grammaticalization of the Chinese ba 把 disposal construction (cf. Peyraube 1996: 169), but this feature did not become a grammaticalized element in Vietnamese (though this is admittedly more of a northern Chinese feature). As for differences from neighboring varieties of Chinese, Yue and Pinghua (cf. Li 1998: 29–30), Vietnamese has neither of those varieties’ pronouns nor plural markers (e.g. ngo2 dei5 (1-PL) 我哋 ‘we’), the possessive marker (e.g., ge 嘅), or question words (e.g., mat/me 乜 ‘what’).

Table 2.1 Sociocultural and linguistic aspects of Sinitic and Vietic in the Han Dynasty

Sociocultural aspects (Sinitic / Vietic)
– Food production: State-level managed agriculture / Advanced agriculture
– Sociopolitical structure: Developed state-level with multiple states and large polities / Early state-level along with chiefdoms and tribes
– Literacy: Standardized writing system / No writing system
– Metal Age: Developed Iron Age / Early Iron Ageᵃ

Linguistic aspects (Sinitic / Vietic)
– Phonology: Non-tonalᵇ / Non-tonal
– CCVCC syllables / CCVC syllables
– Final *-ʔ and *-s / Final *-ʔ, *-h, and *-s
– Agglutinative / Agglutinative
– Non-inflectional / Non-inflectional
– Prefixes and suffixes / Prefixes and infixes
– Topic-comment / Topic-comment
– SVO / SVO
– NP: modifier-noun / NP: noun-modifier
– Marking of passive / Lacked marking of passive (?)
– Prior to classifier grammaticalization / Prior to classifier grammaticalization

a While the Iron Age may have begun in the Đông Sơn region in the 500s BCE, perhaps through trade, the number of iron items in Đông Sơn archaeological sites remained very small until well into the Han Dynasty (cf. Higham 2014: 207).
b The issue of tonality versus phonation and register, and whether early Sinitic and/or Vietic had phonation systems as in many Austroasiatic languages, is another significant matter, but beyond the scope of this study.



groups in that early period, all of which raises the question of whether Vietic was anything other than verb-medial. But again, all available sources of data on Vietic languages, including ancient textual data, show SVO word order, making a reconstruction of SVO for Proto-Vietic the only available option at this point.

1.1.3 Parameters and Details of Sinitic-Vietic Sociolinguistic Influence

Sections 1.1.1 and 1.1.2 show that there was Sinitic-Vietic language contact in northern Vietnam during the Han Dynasty. However, it is difficult to precisely characterize the intensity of the early Sinitic-Vietic bilingualism. In terms of social complexity/diversity, even among Sinitic-speaking communities in Jiaozhi, there was likely a mixture of ethnically northern plains people and non-Chinese people who constituted a relatively small percentage of the overall population but who were politically powerful enough to have significant social status and sociocultural influence. Their cultural and linguistic presence in the lowlands was, over a period of centuries, significant enough to lead to sub-branching in Vietic, while the highlands of north-central Vietnam and bordering parts of Laos, where Southern Vietic groups generally reside, are the geographic limit to the Chinese linguistic influence on Vietic.7

The population size of Sinitic speech communities can only be broadly inferred from ancient Chinese documents which note historical migrations, especially those which provide numbers in censuses. For instance, in the first century CE of the Eastern Han Dynasty, some 20 thousand Chinese soldiers were given land to live permanently in Jiaozhi (Taylor 1983: 45–46). It was during this period that administrative mandates purportedly imposed Chinese domestic and general cultural practices on locals in Jiaozhi.
There is lexical evidence in Vietnamese that corroborates some of these administrative requirements, such as terms related to marriage and to household accoutrements (Alves 2016). Phan (2013: 76) notes two primary points in the first millennium of Sinitic-Vietic contact, namely, the mid-1st century CE in the Eastern Han Dynasty and the 4th century, when political chaos in the north drove southward migration. However, while it is noted in historical records that upwards of a million people moved from the north to southern parts of China in the first part of the 300s CE (Gernet 1989: 180), I cannot find specific census numbers showing how many ended up in northern Vietnam.

7 The Southern Vietic languages’ partial tone systems may be best considered the result of regional influence and contact with tonal languages, such as Vietnamese and Laotian, and natural typological tendencies, and not the direct influence of Chinese.



There are, however, Chinese records from which to infer what influence might have occurred. The Han administration clearly attempted to promulgate Chinese culture, explicitly mandating adoption of Chinese household practices, clothing, marriage, and the like, and some of the Chinese administrators presumably had high social status in the community. Regarding the amount of institutional support of Chinese, early Sino-Vietnamese loanwords related to literacy (e.g. ‘read’, ‘pen’, ‘paper’, etc.) show that the socially powerful means of writing left an impact early. Moreover, it can be speculated that, as an administrative and trade language, and one that was likely central in the regional typological changes, Chinese (or whatever variety of Sinitic was dominant in that region at the time) must have, at times, served as a lingua franca. It could well be, for example, that bilingual Tai speakers used Sinitic with Vietic speakers, spreading elements of Chinese and further contributing to the regional language mixing and convergence.8

Altogether, this suggests the conditions under which the differentiation of Northern and Southern Vietic took place and how some of the structural changes in noun phrase structure occurred. Moreover, the amount of bilingualism with Sinitic, and presumably Tai groups at different times (likely ranging from full to partial/imperfect degrees of bilingualism), is a probable factor in the development of the monosyllabic, tonal, analytic language that Vietnamese is. That the Vietnamese system of kinship terms has been Sinicized (cf. Benedict 1947, Alves 2017), with a mixture of kinship terms from both ESV and SV vocabulary, as well as a number of ESV high-frequency verbs (e.g. ‘see’, ‘need’, ‘wait’, ‘sell’, ‘steal’, etc.) and grammatical vocabulary (cf. Alves 2007 and 2009b), supports the notion that there was language contact involving close interpersonal interaction and intermarriage, not just literacy (cf.
Chew 2015: 19), and not just trade or administrative influence. The overall consequence of this major typological transformation of the Northern Việt-Mường sub-branch was the division of Vietic into sub-branches (cf. Phan 2013: 5–10), which in turn allowed an impact on Vietic syntax.

8 The persistent effort of indigenous peoples in Jiaozhi at various times in the 1st millennium CE to rebel against Chinese administration shows an assertion of cultural identity and appears likely to have initially mitigated the linguistic impact of Sinitic. Indeed, after Vietnamese political independence from the Chinese administration in 939 CE, Phan’s hypothesized Annamese-Chinese essentially disappeared through shift to Việt-Mường, not the other way around. If early Vietic speakers had not had sufficiently strong social status, or if there had been a sufficiently large Sinitic population to culturally dominate, northern Vietnam could hypothetically still be a Chinese province with its own variety of Chinese. Instead, Việt-Mường groups have maintained ethnic identities and linguistic paths distinct from those of the Chinese.



1.1.4 Mainland Southeast Asian Convergence and Tai-Vietic Contact

A significant challenge to hypothesizing early Vietic syntax is the effect of long-term contact within the MSEA language area. Typological convergence among the MSEA languages (including Austroasiatic, Tai, Malayo-Chamic, Hmong-Mien, Tibeto-Burman, and Sinitic languages), resulting in a number of shared phonological and morphosyntactic traits, is a well-known phenomenon of the region (cf. a list of traits in Enfield and Comrie 2015: 7–8). Some features seen in Chinese, Vietnamese, and other MSEA languages include SVO sentences, obligatory use of semantically complex systems of classifiers, and sentence-final particles with complex modal and pragmatic functions (cf. Clark 1989). While bilingualism allows the transfer and convergence of linguistic features, it has been asserted that language features in MSEA have also diffused as a result of linguistic behaviors and typological tendencies, regardless of the degree of bilingualism in a community and without need for shared cognates, such as the multiple grammatical senses of ‘get’ verbs (Enfield 2003) or the ‘surpass’ comparative (Ansaldo 2010). Consequently, it is challenging to differentiate linguistic features which are reinforced retentions from features which may result from centuries or even millennia of language contact.

Regarding early language contact between Vietic and Tai-Kadai groups in northern Vietnam and southern China (cf. §1.1.1), we cannot use any possible Tai-Vietic contact during the Đông Sơn period to make claims about Proto-Vietic syntax. Both Vietic and Tai-Kadai experienced similar typological restructuring over several centuries from the time of Sinitic presence in the region. However, the lexical evidence of Tai-Vietic contact from that period is very limited. Previous studies presenting Vietnamese vocabulary supposedly related to Tai languages (e.g. Maspero 1912, Vương 1963: 187–190, Nguyễn T.C. 1995: 231–323, Nguyễn N.S.
2003: 137–140) are problematic in that they include (a) words originally Chinese in origin, (b) areal terms in multiple language families in the region, or (c) probable false assumptions based on insufficient and mixed historical linguistic data and misapplication of historical linguistic methodology. Even the types of Sinitic grammatical vocabulary borrowed into Tai (cf. Alves 2015b) are quite different from those in Vietnamese, suggesting distinct eras and scenarios of language contact with Sinitic. Finally, while there may eventually be some persuasive lexical evidence of early Tai-Vietic contact (e.g. Tai and Vietic ‘drum’ potentially from the Đông Sơn era of bronze drums (cf. Alves 2015b: 42, Churchman 2016: 33)), on grammatical matters, lacking studies of the diachronic syntax of Tai-Kadai, we can only make very general claims of interaction between Tai and Vietic and any shared regional typological tendencies.



1.2 Proto-Vietic Grammatical Vocabulary

In considering Proto-Vietic syntax, we can also consider grammatical vocabulary. Core Austroasiatic vocabulary in Vietnamese basic vocabulary is well established (Thomas and Headley 1970, Nguyễn N.S. 2003, Alves 2006, etc.), and the connection with Austroasiatic becomes even clearer in light of lexical, phonological, and morphological characteristics of the Pọng-Chứt languages. Consequently, Vietic languages naturally share a significant amount of basic functional vocabulary. This includes several Austroasiatic etyma, such as ‘you’ and ‘he/she’, ‘this’, basic numbers ‘one’, ‘two’, ‘three’, ‘four’, among others, as well as Vietic innovations, such as ‘I’, ‘no/not’, ‘several’, and a reciprocal word. In Table 2.2, Proto-Austroasiatic etyma are from Shorto 2006, while Mường and Southern Vietic language data are presented following published sources: Nguyễn, Bùi, and Hoàng 2003 for Mường, Nguyễn V.L. 1993 for Rục, Srisakorn 2008 for Thàvựng, and Ferlus 2007 for other Southern Vietic languages. Additional lexical materials consulted include that of the north-central Vinh dialect (Trần and Thái 1997) and a dictionary of ancient Vietnamese words (Vương L. 2002). The SEAlang Mon-Khmer Etymological Dictionary online was also consulted and contains much of the same or additional lexical data.

However, in contrast with such likely archaic retentions, substantial borrowing of grammatical vocabulary into Pọng-Chứt languages has also taken place, much of it rather recent. Rục has borrowed various core Vietnamese function words, while Thàvựng has borrowed heavily from Lao and/or Thai. Unlike actual early Vietic etyma, which are highly phonologically differentiated among Vietic languages, these items are transparently related to the source language forms.
Grammatical words in Thàvựng of Thai and/or Lao origin include wa1 ‘(say) that’ (Lao wāː ‘(say) that’), kap1 ‘and’ (Lao káp ‘with’), ja1 ‘will’ (Lao cá ‘will’), kua1 ‘more than’ (Lao kwāː), lɛw1 ‘already’ (Lao lɛ̑ːw), bɔʔ1 ‘question particle’ (Lao bɔ̄ː), all of which seem to have the neutral level tone regardless of the tone in Thai or Lao. This fact is important since some of these Tai words are originally from Chinese (cf. Alves 2015b), but their tones clearly show they are borrowed via Tai, not directly from Chinese. In Rục, grammatical loanwords include Vietnamese bị and được passive, đang progressive, của possessive, and others. Again, the passive items are Chinese etyma, but based on their phonological forms, they were borrowed via Vietnamese, not from contact with Chinese. Overall, such vocabulary has no bearing on the ancient historical situation of Vietic and only provides evidence of much more recent language contact.

It is worth considering the issue of passive voice as an example of the history of Vietic syntax. As is hypothesized in Section 3.3, Proto-Vietic likely had

Table 2.2 Shared Vietic grammatical vocabulary

Each entry gives the gloss (where preserved) and the Vietnamese form, followed by M(ường), S(outhern) V(ietic), and P(roto) A(ustroasiatic) forms.

– About to: sắp. M: khắpᵃ; SV: khrạp3 (Rục); PA: *srap ‘ready’
– hết. M: hết; SV: pahit3 (Rục) (similar forms in Mày, Sách, Malieng, and Kri); PA: *ʔət; *ʔəət; *[ʔ]it ‘used up, finished, lacking’
– each other: chắc (Nghe An dialect). SV: cak3 (Rục, Mày); PA: (possibly related *cak ‘body’)
– he/she: hắn. SV: han3 (Mày); han (Pọng); han3 (Tho); PV *hanʔ; PA: *han 3p/demonstrative (Sidwell 2015)
– tao. M: ho; SV: saw (Pọng [Liha and Toum]); soː (Pọng [Phong]); sòw (polite) (Thàvựng); ho:1 (Rục, Mày)
– many: nhiều. SV: (various comparable forms in Rục, Mày, Sách, Malieng, and Kri)
– this: này; ni (Central Viet.). PA: *niʔ, 91.B *nih
– nọ. PA: *nɔʔ, 92.B *nɔh
– no/not: chẳng; nỏ (Vinh). M: chăng; SV: nɔː (Pọng)
– several/how many: mấy. M: mal; SV: bil (Pọng [Phong]); bɔn (Pọng [Liha]); bʌj³ et (Tho [Cuoi Cham]); bʌj³ (Tho [Lang Lo]); bʌl (Pọng [Toum])
– what: gìᵇ; chi (Central Viet.). M: chi; SV: (similar forms in Malieng and Kri)
– mô (Central Viet.). SV: chamơ3 (Rục) (similar forms in Mày and Arem); PA: *m[o]ʔ; *m[o]h what (Proto-AA)

a While the etymon is listed in a Mường-Vietnamese dictionary with the original, pre-grammaticalized meaning ‘ready’, I could not find a sample in the dictionary of the word with the grammaticalized meaning. It is thus listed in the table in parentheses to indicate this uncertainty. An example of this word in Rục is in sentence (21) in § 3.2.
b This word, which has no apparent AA source, has some similarity to Old Chinese *[g]ˤaj, Chinese 何 hé ‘what’. However, the initial palatalization cannot be explained, and the word is short, so it is probably chance similarity.

no explicit marking of passive voice, whether through affixes or words, and instead probably utilized the mediopassive (cf. English ‘This car drives well’). Most Vietnamese passive markers are derived from Chinese morphs, as listed in dictionaries of Vietnamese pronunciations of Chinese characters: Vietnamese bị (adversative passive) from Chinese 被 bèi, do (neutral passive) from Chinese 由 yóu, and được (beneficiary passive) probably from 得 dé (cf. Middle Chinese *tok).9 However, Vietnamese does have a native morph phải with a less commonly used passive-marking function and which is homophonous with the word meaning ‘correct/right’.10 This is a regional typological feature seen in Khmer trəw and Thai thuùk,11 as the morphs both mean ‘correct’ and have passive marking functions in comparable syntactic constructions. Such is also the case for Mường, with phái, cognate with the Vietnamese form, and for Thàvựng, with cɔh1, an entirely native form meaning both ‘right’ and marking the passive voice, as seen in sentence (1). Thus, this pattern is seen in multiple language groups in the region and, as with borrowed lexemes, this regional pattern cannot be reconstructed to Proto-Vietic, but rather must be considered an example of a regional change.

(1) kunʔit2 cɔh1 luk1 kat2
    child correct snake bite
    ‘A child was bitten by a snake.’ (Thàvựng)

1.3 Sources of Data on Vietic Syntax

The key sources for this study include those in Table 2.3. Syntactic data for Vietic languages varies in quantity and depth. Vietnamese morphosyntax has been explored in numerous descriptive grammars and linguistics articles on Vietnamese syntax, and searchable corpora of Vietnamese are increasingly available. Thus, there is a tremendous amount of sentential data of modern Vietnamese and many older texts from a several-hundred-year literary tradition, though many older texts have yet to be made digitally available.

9 The word được is irregular in having a low rather than the expected high tone. There are, nevertheless, sufficient shared phonetic, semantic, and distributional features of this word in Vietnamese and varieties of Chinese to consider it a likely Chinese loanword. Moreover, the history of language contact and the use of this etymon with an abilitative function in early Vietnamese writing from the 1300s suggest this is a word likely borrowed well before that time. The slight phonetic irregularity may be the result of substantial time depth of a spoken, high function word, or it represents the timing of the borrowing of the word in relation to periods of voicing shifts.

10 This has been speculated to be an Early Sino-Vietnamese form of Chinese 被 (bèi), for which there are some reasonable phonological correspondences (syllable shape and tone category, but unexpected initial). However, lacking textual evidence of passive in early Vietnamese writings, we must assume this is an instance of a regional grammaticalization path.

11 The situation is complicated by related meanings of these words: in Thai, a homophonous form means ‘to contact’ (parallel to Malay kena, which also means ‘to contact’ and marks an adversative passive), and in both Vietnamese and Khmer, the same form means ‘must’. There thus appear to be related tendencies, but the timing and direction of spread, among other matters, remain unclear and are beyond the scope of this paper.

Table 2.3 Key syntactic data sources for Vietic languages

Vietnamese
– Numerous standardized grammars and linguistic analytical works of modern Vietnamese
– De Rhodes’ 1651 grammatical description of Vietnamese
– Written Nôm writings in Vietnamese since the late 1300s
  – Cư Trần Lạc Đào Phú (1300s)
  – Phật thuyết đại báo phụ mẫu ân trọng kinh (1400–1500s)
  – Truyện Văn Kiều (early 1800s)

Other Vietic languages
– Syntactic data of Mường (e.g. Nguyễn, Bui, and Hoàng 2002)
– Syntactic descriptions and/or data of highly conservative Southern Vietic languages
  – Rục (grammar by Nguyễn V.L. 1993 and hundreds of sentences Solntsev, Solntseva, Samarina 2001)
  – Thàvựng (full grammar by Srisakorn 2008)
  – Kri (partial sketch grammar Enfield and Diffloth 2009)

Other languages
– Old Khmer texts (Sak-Humphrey 1993)
– Ancient Chinese texts (

In contrast, there are (to my knowledge) no published grammars of Mường, though there is a 550-page Mường-Vietnamese dictionary with thousands of phrase and sentence samples. Of the two dozen Pọng-Chứt languages, only Rục, Mày and Thàvựng have been amply described in complete grammars, while Kri has some useful but limited coverage of its syntax. Moreover, Russian teams of linguists have published a collection of well over a thousand sentences and phrases in Rục and Mày (with Russian and Vietnamese translations). Crucially, however, all of these minority languages lack writing systems and historical documents to show the morphosyntax of the languages in the past.

Nevertheless, this collection of data sources of Vietic morphosyntax is the largest utilized to date. The data was not analyzed statistically, considering the state and limited amount of data (except for Vietnamese). Instead, the core features of sentence and NP structure were gained from descriptive grammars, and in cases where grammars are lacking, such as Mường, available data was considered, and representative sentences were selected. While all the listed sources were consulted, the sentential data throughout the paper consistently comes from the following sources: Rục from Nguyễn V.L. 1993 and Solntsev, Solntseva, Samarina 2001; Mày from Babaev and Samarina 2018; Kri from Enfield and Diffloth 2009; Thàvựng from Srisakorn 2008; Mường from Nguyễn, Bui, and Hoàng 2002; and Vietnamese from various textual sources.


2 The Vietic Noun Phrase

NP structure in modern Vietic languages is consistently right-branching in all available data,12 but the positions of quantifiers, measure words and classifiers show some variation, appearing mostly before nouns but after nouns in a minority of cases. The most common Vietic NP structure is a typological blend of MSEA and Chinese-type structures, as can be shown with comparative examples from Vietnamese, Cantonese, and Khmer. As in sentences (2) to (4), the semantic head noun is phrase-final in Cantonese, phrase-initial in Khmer (no classifier, which is optional), and phrase-medial in Vietnamese, with quantity before and modifiers after.

(2) ngo5 go2 di1 saam1 zek3 hak1-sik1 gau2
    我 嗰 啲 三 隻 黑色 狗
    1 dist pl 3 clf black dog
    ‘Those three black dogs of mine’ (Cantonese)

(3) ckae kmaw bəy rəbɔh kñom nuh
    dog black three poss 1 dist
    ‘Those three black dogs of mine’ (Khmer)


12 One exception is in lexical borrowing from Chinese into Vietnamese (sometimes secondarily into other Vietic languages) in which two-syllable compounds have the Chinese order adjective-noun. Such words are frequently unanalyzed by Vietnamese speakers except through awareness by explicit studying of Sino-morph meanings. Some compounds which have semantically recognizable elements to Vietnamese speakers have been reversed (e.g. Á Châu (asia-continent) ‘Asia’ from Chinese 亞洲 yà zhōu has come to follow Vietnamese NP order Châu Á (continent-Asia)).



(4) ba con chó đen đó của tôi
    three clf dog black dist poss 1
    ‘those three black dogs of mine’ (Vietnamese)

All documented Vietic languages in Vietnam (Vietnamese, Mường, Rục, and Mày) follow the same pattern, and so early textual data becomes important to consider. Among Vietic languages, only Vietnamese has a literary tradition and thus a means of looking into the past to see what its NP structure was. Vietnamese language texts from the 1300s show that the Vietnamese NP, with pre-nominal quantification, was already established. Sentence (5) comes from the Cư Trần Lạc Đào Phú (居塵樂道賦), a 14th century Buddhist scripture and one of the earliest complete texts in Nôm writing. Note that the unit noun đứa ‘individual’ is in modern Vietnamese a fully grammaticalized classifier used for children or for adults with a derogatory sense.

(5) những đứa ngây thơ
    仍 疑𦭟
    PL individual simple-minded
    ‘simple-minded people’ (Vietnamese)

While the text contains noun phrases with numerals, it does not contain pre-nominal classifiers, such as the modern Vietnamese default classifier cái. In the historical text, numbers (‘one’, ‘nine’, etc.) and quantity expressions (‘many’, ‘each’, etc.) occur directly before two classes of nouns, namely, humans and units of time (‘instance’ and ‘night’). A separate category of quantity-noun structure is of Chinese origin, namely, specialized Buddhist terminology (e.g. Vietnamese tam độc (three-poison) ‘the three poisons’, with Chinese words 三 毒 san dú, without a classifier or measure word, etc.), which ultimately stems from older stages of Chinese and cannot be considered representative of spoken Vietnamese. Even by the 1400s,13 in Phật Thuyết Đại Báo Phụ Mẫu Ân Trọng Kinh (佛説大報父母恩重經), a Buddhist Sutra, classifiers appear to have been optional (Vũ 2014: 6). However, in the 15th century Quốc Âm Thi Tập (國音詩集), a collection of poetry by Nguyễn Trãi, classifiers are quite numerous and occur in both pre-nominal and post-nominal position, as discussed in 2.1.2. Overall, this suggests that the evolution of classifiers in the Vietnamese NP occurred largely in the last several centuries (and thus long after the period of contact with Chinese in the second half of the first millennium CE), though more data is

13 It has been posited that this text was from the 1200s, but Shimizu (2015) notes lexical and visual features that place it in the 1400s instead.



needed to clarify this matter. However, if this is the case, one possibility is that this development in the Vietnamese noun phrase spread to other Vietic languages in this period. 2.1 Vietic NP Data: Positions of Quantification and of Modifiers In available syntactic data of all Vietic languages, all modifiers (i.e. adjectives, relative clauses, prepositional phrases, possessive nouns and NP s, and demonstratives) appear after head nouns. However, the position of quantifiers in NP s shows variation. Modern Vietnamese (unlike Vietnamese in ancient writings as considered in §2 and §2.1.2), Mường, and Southern Vietic languages in Vietnam (though only Rục and Mày have adequate descriptions) have quantifiermeasure phrases before nouns, while in Thàvựng, spoken in Laos and Thailand, quantification follows head nouns, as they do in Lao and Thai. Finally, data on Kri, spoken in Laos, demonstrates flexibility, with quantifier-measure phrases freely occurring before or after head nouns, as is the case in Vietnamese in texts of the 1400s. Clearly, understanding of the quantity-measure unit within the Vietic NP will require comparative data and consideration of the history of language contact. 2.1.1 Post-nominal Modification The post-nominal position of modifiers is consistent throughout all available Vietic data (see footnote 16 about words with Sino-morph compounds). In sentences (6) and (7), modifiers follow head nouns in NP s. Determining differences in the relative ordering of modifying elements would require additional investigation. (6) ʔuh1 ʔit2 laŋ1 ʔali1 house small clf prox ‘this small house’ (Thàvựng) (7) hal1 hon1 mêew2 tạng1 angaj3 ling1 tơang2 two clf cat prog run on road ‘two cats that are running on the road’ (Rục) Available data shows that relative clauses retain the SV/AVP word order but without resumptive pronouns, as in (7), (8) and (9). 
Regarding relative clause markers, Vietnamese has a relative clause marker mà, as in (8), which is still optional in some circumstances, but available data often shows no marker used in other Vietic languages, such as Mường, as in (9). Thus, etyma for relative clause markers are not reconstructable at this time.



(8) cuốn sách (mà) tôi đã đọc
    clf book rel 1 compl read
    ‘the book that I have already read’ (Vietnamese)

(9) môch pấc tlanh cỏ dả tlĩ lẳm
    one clf picture have value very
    ‘A picture which has great value’ (Mường)

Also, while demonstratives are frequently in NP-final position in other MSEA languages, such as Khmer, Sre, Thai, Nung, and Hmong (Clark 1989: 187), in Vietnamese and Rục, possessive nouns are NP-final, as in (10). Thàvựng, on the other hand, has essentially the same word order as Lao and Thai do, including the borrowing of grammatical vocabulary, such as the Lao-Thai possessive marker, as in (11). It is thus not possible to say anything concrete about the word order of modifier elements in the early Vietic NP, other than that they followed the nouns they modified.

(10) mấy cái nhẫn này của tôi (Vietnamese)
     bɣ̆l3 kɛ4 kɯcɛn1 ni1 kuə4 ho1 (Rục)
     pl clf ring prx poss 1
     ‘these few rings of mine’ (Vietnamese and Rục)

(11) cɔ2 lɔk1 han1 to1 khɔŋ1 kan1 lo1
     dog white two clf poss 1 prx
     ‘those two white dogs of mine’ (Thàvựng)

2.1.2 Mixed Order of Quantification

Unlike the consistently post-nominal position of modifiers, quantification in Vietic varies regionally. In available data, pre-nominal quantification occurs most often. In Vietnam, at least for Vietnamese, Mường, Rục, and Mày, NP order is consistently numeral-measure/classifier-head noun, as in (12).

(12) ba con dao này (Vietnamese)
     pa1 kɛ4 aɲɛ:n1 ni1 (Rục)
     three clf knife prx
     ‘these three knives’

However, in Vietnamese poetry, flexibility in NP word order has been noted, with the quantity-measure/classifier unit in post-nominal position (Nguyễn Đ. H. 1957: 126). Numerous instances can be seen in poems from the early 1400s (Nguyễn Trãi’s (1380–1442) collection of poems, the Quốc Âm Thi Tập ‘National Language Poetry Collection’), as in (13). These post-posed numeral-phrase units often appear in places where the shift to post-nominal position serves to facilitate rhyming in poems. However, such alternations also appear in prose texts such as Nguyễn Dữ’s (1497-?) Tân Biên Truyền Kì Mạn Lục Tăng Bổ Giải Âm Tập Chú, from around the 16th century (Nguyễn Tuấn Cường p.c.), such that rhyming cannot be the main factor. Still, as Vietnamese literati were well-read in Classical Chinese, in which there were periods when quantity expressions followed nouns (Peyraube 1996), it is not impossible that this represents a literary register in Vietnamese.14 Ultimately, additional research is needed to determine whether this represents an original archaic feature, an earlier state of flux or free alternation in earlier Vietic, or flexibility only within the poetic register, and thus whether this data can clarify the early Vietic NP.

(13) Thơ một hai thiên rượu một bình
     poem one two clf wine one bottle
     ‘One or two poems and a bottle of wine.’ (Vietnamese) (Quốc Âm Thi Tập no. 14)

14 I must thank Nguyễn Tuấn Cường of the Institute of Sino-Nom Studies (Vietnam Academy of Social Sciences) for identifying many samples of these post-posed phrases from the Nôm poetry and prose. He also noted that the Vietnamese authors of that writing would have had working knowledge of the early stages of Chinese prior to the emergence of classifiers and speculated that this might have been a factor in this word order.

Thàvựng, on the other hand, follows the Thai-Lao entirely right-branching NP pattern, as in (14). Lexical borrowing further suggests structural impact. The Thàvựng classifier in (14) is a probable Lao loanword for animals (cf. Lao to: ໂຕ, not likely from Thai tua ตัว).

(14) ka2 sak1 pa1 pon2 to1
     fish about three four clf
     ‘3 or 4 fish’ (Thàvựng)

However, it is worth noting that in MSEA, the post-nominal quantification pattern appears to be an ancient one. Written records of Old Khmer show the order noun-quantity (shown in sentence (16) in the next section) with classifiers very rarely used (Sak-Humphrey 1993: 19–21). Thus, while it is possible the Thàvựng NP was influenced by Lao, it is also possible that this order, regardless of the borrowing of a classifier, is a retention, whereas the Chinese pattern in Vietnamese represents change from the Proto-Vietic NP.

Lastly, the Kri NP shows flexibility with respect to quantification, with the quantity-measure/classifier unit potentially appearing before or after the semantic head noun, as shown in (15). This may represent competing influences of Lao and Vietnamese due to bilingualism in those languages (or other minority languages in the Vietnamese typological region), but it may also represent an earlier stage of flexibility.

(15) haar longq kadeeq
     two clf child
     ‘two children’

     kadeeq haar longq
     child two clf
     ‘two children’ (Kri)

Again, the varied position creates uncertainty about the original position of quantification in Vietic. Language contact is clearly a significant factor in the regional patterns, which probably developed over the past several centuries. However, comparative data from Old Khmer provides additional evidence that a post-nominal position of quantification was possible in Vietic, as discussed in § 2.2.

2.2 Summary of the Proto-Vietic NP

Again, while the Proto-Vietic NP most likely had post-nominal modifiers (a persistent feature in both Vietic and other Austroasiatic languages, despite Việt-Mường contact with Sinitic and its modifier-first NP structure), the data showing varied position of quantification in NPs increases the uncertainty about the original position of quantification in Vietic. Language contact is the likely cause of the regional variation. The likelihood that Vietnamese could have served as a sociolinguistic influence to shape NP structure in neighboring Mường and Southern Vietic languages (as well as possibly Katuic and Bahnaric and even varieties of Tai in Vietnam (cf. Jones 1970 on the issue in greater Southeast Asia)) further strengthens the notion that areal typological convergence is the cause.

Another question is what order the quantification and the modifiers took in the distant past. As the Vietic languages in Vietnam have pre-nominal quantifiers, as do Katuic and Bahnaric languages in Vietnam, there is no alternate model among them. But if one considers the NP in modern Khmer, and especially Old Khmer, it is certainly possible that quantity followed modifiers in early Vietic as well: *[noun-modifier-quantity-(measure)]. Jacob notes how in Old Khmer inscriptions, noun phrases show the modern order of noun + numeral (e.g., Old Khmer kon ber (child-2) ‘two children’), but they also frequently had the distinct order of noun + (unit) + numeral (e.g., prak samrap mvay (silver-set-one) ‘one set of silverware’) (Jacob 1993: 34). The Old Khmer sample phrase in (16) is certainly a viable model for what the NP in early Vietic (and Katuic and Bahnaric) may have been.

(16) ‘anak ‘aṅgana ponna
     people court four
     ‘four people of the court’ (Old Khmer) (Sak-Humphrey 1993: 20)

Overall, (a) the Old Khmer NP order is old enough to predate intensive contact with Tai and thus is not the result of Tai influence; (b) the grammaticalization of Chinese classifiers occurred only towards the end of the 1st millennium CE and could not have influenced Vietic NP structure until that point or later; and (c) Vietnamese writings from the 1300s to 1400s show mixed use of classifiers (flexible position, optional or limited usage, etc.) and only more stabilized usage in the 1500s. On this basis, it is reasonable to assume that in 200 BCE, Vietic had no classifiers (there is insufficient data to discuss general unitary measure words, which are assumed to have existed), and with right-branching NP structure, quantity words would have followed the head noun. Thus, the favored working hypothesis of early Vietic NP order is *[noun-(optional measure)-quantity] until additional data shows otherwise.


3 The Vietic Sentence

Early vernacular Vietnamese Nôm writings show no significant differences in general clause structure from modern Vietnamese (i.e. topic-comment structure with a tendency toward SVO patterns). It may thus be possible that, in overall parameters, Vietic clause structure before 200 BCE was largely the same as it is today. This is noteworthy considering the long period of language contact that has otherwise substantially transformed Vietic phonology, morphology, and NP structure, though whether more careful study of early Vietnamese writings could reveal further subtle differences remains to be seen. The features of clause structure noted in Table 2.4 are also common among languages of other families in MSEA, so long-term language contact may also have contributed to the current situation in Vietnamese. Language contact need not always lead to change: it could equally have helped maintain original language features. Taking account of shared features of all modern Vietic languages, the Proto-Vietic clause may have had the traits listed in Table 2.4.

Table 2.4  Hypothesized features of the Proto-Vietic clause

Topic-comment       Various sentence-initial semantic functions (e.g., preposed objects, conditional or causal elements, general topic focus, etc.)
Verb-medial         Tendency towards SV(O) pattern
Middle-passive      No full passive voice expressed with either morphological(a) or lexical marking
Right-branching VP  Generally right-branching complements (e.g. prepositional phrases or other time and location words)

(a) A few Austroasiatic languages in Mainland Southeast Asia have morphemes deriving stative verbs/adjectives with passive-like meanings (Alves 2015a: 543), but these appear to be lexicalized functions and not an indication of early passive structures in the past in Austroasiatic.

The verb-medial pattern is also shown in relative clauses in noun phrases, as noted in Section 2.1.1. Table 2.4 refers only to single-clause structures and does not deal with multi-clause constructions (e.g. conditional sentences, which can be considered an expansion of topic-comment structure) or with issues of interrogative sentences (in-situ questions in Vietnamese and Chinese but also many other MSEA languages), for which reconstructions are beyond the scope of this study. The aspects in Table 2.4 are discussed and exemplified below.

3.1 Topic-Comment Structures

Topic-comment structures and topicalization are a prominent part of Vietnamese grammar, as they are in other Vietic languages and in many other languages of the region. In the data of Vietic languages, topicalization can involve a fronted patient, as in (17).

(17) ʔaw2 ʔali1 kan1 cak2 wɨn1
     shirt prx 1 buy come
     ‘I bought this shirt.’ (Thàvựng)

Another type of topic-comment structure is one in which the sentence-initial NP has a more generalized semantic theme. In (18) and (19), the topics are not the subjects of the sentences; instead, they have semantic possessive relationships with the subjects (i.e. referring to the price of the pig and the window of the house), but these are expressed structurally in topic-comment patterns (e.g. ‘as for the pig, the price’ and ‘as for the house, the window’). It is also worth noting that the two nouns do not form a single noun phrase, as this would violate the strict head-modifier order of NPs (i.e., they cannot be single noun phrases meaning ‘the price of the pig’ or ‘the window of the house’).

(18) con củi nì dả cơ-nò
     clf pig prx price how-much
     ‘As for this pig, the price is how much?’ (Mường)

(19) nha2 ni1 tunmợh1 pạn pạn3
     house prx window short
     ‘As for this house, the window is short.’ (Rục)

Neither (18) nor (19) has lexical markers meaning ‘as for X’ to connect topics to comments, and grammars of these languages, while listing such vocabulary, do not provide enough information to understand their usage precisely in terms of frequency, pragmatics, or phrasal intonation. In contrast, topic-comment linkers in Vietnamese (e.g. thì, là, mới, mà, etc.), as in (20), are much better understood. In such topic-prominent constructions in Vietnamese, the roles of the topics and the semantic relationships between the theme and rheme can vary tremendously, though initial NPs occurring with lexical markers are interpreted as definite (Cao 1992).

(20) ông Ba thì tôi không biết nhà
     mister name (link) 1 no know home
     ‘As for Mr. Ba, I don’t know his address.’ (Vietnamese) (Cao 1992: 143)

There is not yet enough data about topic-comment linkers to reconstruct any for Vietic, and such particles are generally either borrowed (e.g. the forms ma and la are linkers in Rục and are also seen in the Katuic language Pacoh) or have probable grammaticalization paths that are rather recent. While not yet proven, the most high-frequency modern Vietnamese linker thì ‘… then …’ (which appears with a linking function even in the Cư Trần Lạc Đào Phú mentioned in section 2) may stem from the homophonous Sino-Vietnamese thì, the reading of Chinese shí 時 ‘time/when’, which still carries semantics of ‘time’.
Also, Vietnamese mới ‘only then’ (even more frequent in the abovementioned ancient Vietnamese text) is homophonous with mới ‘new’, which suggests a reasonable grammaticalization path: NEW > THEN.15 Altogether, such words are relatively recent, either as innovations or as loanwords, and this suggests that such particles were probably not present in early Vietic, though it is too early to make strong claims.

15 Compare Malay baru meaning ‘new’ and ‘only then’ and Cantonese sin 先 ‘first’ and linker ‘only then’, though admittedly, this is still insufficient to demonstrate a regional grammaticalization path.

3.2 Verb-Medial Structures

A correlate to topic-comment structures is that, in multi-argument clauses, Proto-Vietic most likely had a verb-medial pattern. It is not accurate to characterize Vietic as a strict SVO language, nor is there inflectional morphological marking in any Vietic language (as is the case in most Austroasiatic languages of MSEA) to test the subjecthood of NPs. Still, the SVO pattern is common enough to categorize modern Vietic languages as SV/AVP languages. Sentences (21) to (23) show subjects in sentence-initial position and objects in sentence-final position in a few non-Vietnamese Vietic languages.

(21) tlu1 kuơ4 hạn3 khrạp3 kưchit3
     buffalo poss 3 about die
     ‘His buffalo is about to die.’ (Rục)

(22) da tló-thay cảy chi đỉ
     2 point-hand clf what there
     ‘What are you pointing at there?’ (Mường)

(23) sòòk tzrôôh mleeng mee vềềk boo dêêh
     seek meet person do work pcl neg
     ‘Have (you) found somebody to do the work?’ (Kri)

One type of verb-medial sentence involves adjectival stative verbs with body parts as oblique objects and with animate subjects, as in (24). Such verbs generally cannot be progressive but can co-occur with intensifying adverbs, as in (25). In many cases, such stative verbs refer to pain, sickness, or other impediments. However, in Vietnamese, they also include neutral or positive meanings, such as being skilled/physically able (khéo tay (skilled-hand) ‘to be dexterous with one’s hand’) or sexually stimulated. This pattern has also led to metaphorical lexicalized constructions, as in tốt/xấu bụng (good/bad-stomach) ‘to be good/bad of heart’ (i.e. to be good-hearted or bad-hearted).



(24) hạn3 sot3 kulôok4
     3 hurt head
     ‘His head hurts.’ (Rục)

(25) tôi mỏi chân lắm
     1 tired leg/foot very
     ‘My foot is very tired.’ (Vietnamese)

This pattern is seen in many other MSEA languages, such as Thai, Khmer, Pacoh, and Malay, among others in the region, and it is quite distinct from the Sinitic pattern, in which verbs appear after the affected body parts, as in (26). This is thus yet another way in which Vietnamese and Vietic syntax patterns with the Southeast Asian typology and is distinct from the Sinitic one.

(26) ngo5 tau4 tung3
     我 頭 痛
     1 head hurt
     ‘My head hurts / I have a headache.’ (Cantonese)

These types of sentences are, in a sense, two-layered topic-prominent constructions, albeit with one of the topics after verbs (i.e. in (25), ‘as for my being tired, it is of the foot’). As this is clearly distinct from the Chinese type, it might be a retention in Vietic, though it is currently impossible to separate retention in Vietic or other Austroasiatic languages from the impact of regional typological convergence, pending evidence from ancient texts in Khmer or Vietnamese. Still, one hypothesis is that this construction could be a fossilized pattern from an even older verb-initial/comment-topic pattern in Proto-Austroasiatic (Jenny 2015 and in this volume), but one in which a more recent pre-posed topic has been added (e.g. in (24), the verb-noun pattern ‘hurt-head’ has a topic ‘he’).

3.3 Middle-Passive Constructions

Another effect of employing topic-comment structures and lacking inflectional morphology is the lack of explicit marking of passive voice, as in (27) to (29). The structural and semantic overlap when there is no lexical marking makes these sentence types similar to topic-comment constructions, but the combined tendencies of (a) frequent occurrence with inanimate nouns, (b) the consistent thematic role of patient rather than others, and (c) the use of distinct lexical markers other than topic-comment markers suggest that they be treated separately. Mediopassive constructions among Vietic languages (as well as many other languages in the region) are interpretable not by lexical or morphological marking but rather by interpretive semantics. The subjects/topics in such sentences tend to be inanimate and thus readily interpreted as patients despite the lack of morphemes to indicate their semantic roles. As noted, passive voice markers exist in Vietic languages, but in most cases, these can be shown to be innovated or borrowed lexemes (see § 1.2).

(27) na3 ni1 mưn2 pạŋ2 kơâj2
     crossbow this make from tree
     ‘This crossbow is made of wood.’ (Rục)

(28) nầng nhà lắp bản
     wall house assemble board
     ‘The walls of this house were assembled with boards.’ (Mường)

(29) cuốn sách này đã bán hết rồi
     clf book prx pst sell complete already
     ‘This book is already sold out.’ (Vietnamese)

3.4 Right-Branching VP (Complements and Adjuncts)

A verb-medial, topic-prominent language with SVO tendency corresponds with right-branching VP structure. This may also include serial verb constructions and prepositional adjuncts. An example can be seen in (30), a Kri verb phrase in which there appear both a post-verbal directional verb ‘exit’ and a source preposition ‘from’. In (31), the verb ‘enter’ functions as the directional indicator with the goal of the locational noun ‘inside’.

(30) lêêq kààjh lôôh dêêwq vòòngq
     take stone exit from pot
     ‘Taking a stone out from a pot.’ (Kri)

(31) han3 ʔaŋai3 lɔ̤:n2 lɔ:ŋ1 ɽaw1
     3s run enter inside forest
     ‘He ran into the forest.’ (Mày)

While some likely Proto-Vietic grammatical words have been identified (§ 1.2), no prepositions can be reconstructed for Proto-Vietic. Possible cognate prepositions are sometimes seen in more than one Vietic language. For example, Kri dêêwq ‘from’ in (30) may be related to Rục dêw4 ‘from’, but the somewhat similar-looking Vietnamese word từ ‘from’ is probably not related, as that is an Early Sino-Vietnamese word (cf. standard Sino-Vietnamese tự from Chinese 自 zì). Borrowing of these words from Vietnamese or Thai and Lao is also common. Grammaticalization of verbs into prepositions via serial verb constructions is a likely source of prepositional constructions. In (32) and (33), the directional verbs in both serial verb constructions probably belong to the same etymon. Regardless, it seems likely that Vietic used post-verbal directional terms, again part of the right-branching nature of the language, but this issue requires more exploration.

(32) hô1 palôon2 kús1 lôon2 nha2
     1 cause-enter firewood enter house
     ‘I brought the firewood into the house.’ (Rục)

(33) pa1 kɔn1 lɔn1 wat1
     take kid enter temple
     ‘The mother took children to the temple.’ (Thàvựng)

Proto-Vietic derivational morphology could have allowed for syntactic constructions not available to a robustly isolating language such as modern Vietnamese. Thus, one possibility is that the loss of affixes had an impact on some early Vietic syntactic patterns. Like other Austroasiatic languages, Rục has causative prefixes, one of which derives a ditransitive, causative verb, as in (34), while in Vietnamese, which lacks such affixes, causation requires a separate verb and the goal-marking preposition cho ‘for’, derived from the sense ‘give’, as in (35). However, until more data is gathered on Southern Vietic languages, little can be said about early Vietic morphosyntax.

(34) mêe4 pamơâk3 aw3 pakon1
     mother cause-wear clothes child
     ‘The mother had the child put on clothes.’ (Rục)

(35) mẹ mặc áo cho con
     mother put on clothes for child
     ‘The mother dressed the child.’ (Vietnamese)




4 Summary of Main Claims and Future Directions

While features that are common in a language group are often reconstructed for the proto-language, such data must be weighed against historical language contact and typological tendencies. Based on available modern and historical data, coupled with assumptions about long-term language contact and typological convergence, the following tentative hypotheses can be made about early Vietic clause and noun phrase structure.

The general properties of the early Vietic clause (see Table 2.4) appear to have changed little. Early Vietic was topic-prominent and verb-medial, as the languages are today. Its clause structure was likely right-branching, with complements after verbs. It may have had SVO tendencies, though identifying a period of grammaticalization of the subject is currently impossible. There is no evidence that Proto-Vietic marked the passive voice, a feature which appears to have developed much later in Vietic history largely through language contact, and so it probably employed middle-voice constructions, which overlapped structurally with topic-comment sentences.

The Early Vietic Clause
*[topic-verb-complement]

Evidence on the early Vietic NP suggests it had strongly right-branching structure without classifiers and with probable post-nominal quantification, possibly after modifiers, though this last detail is less certain. The pre-nominal position of quantifiers in Vietnamese, Mường, and Pọng-Chứt languages in Vietnam is likely due to language contact with Sinitic and its grammaticalization of classifiers in the late first millennium CE, while other Vietic languages appear to follow the Thai/Lao pattern. But comparative data on the Old Khmer noun phrase supports the possibility of a post-nominal position of quantifiers in Vietic as well.

The Early Vietic Noun Phrase
*[noun-modifier-quantity]

As noted, typological convergence could have completely masked original structures, making it impossible to be certain of these tentative reconstructions. Also, the limits of the data should not be understated. Long-term data gathered in naturalistic language settings could reveal other patterns giving additional insights. Other diachronic syntactic studies should include exploration of functional categories, such as negation, interrogative sentences, position of time and location words and expressions, comparison, and coordination and multi-clausal constructions.

References Aldridge, Edith. 2015 (online). Old Chinese syntax: Basic word order. In The Encyclopedia of Chinese Language and Linguistics, General Editor Rint Sybesma. Accessed 19 December 2016.‑7363_ecll_COM_00000307. Alves, Mark J. 2001a. What’s so Chinese about Vietnamese? In Papers from the Ninth Annual Meeting of the Southeast Asian Linguistics Society, Graham W. Thurgood ed. 221–242. Arizona State University, Program for Southeast Asian Studies. Alves, Mark J. 2003. Ruc and other Minor Vietic languages: Linguistic strands between Vietnamese and the rest of the Mon-Khmer language family. In Papers from the Seventh Annual Meeting of the Southeast Asian Linguistics Society, Karen L. Adams, Thomas John Hudak, and F.K. Lehman eds. Tempe, Arizona: Arizona State University, Program for Southeast Asian Studies. 3–19. Alves, Mark J. 2005. Sino-Vietnamese grammatical vocabulary and triggers for grammaticalization. The 6th Pan-Asiatic International Symposium on Linguistics. Hà Nội: Nhà Xuất Bản Khoa Học Xã Hội (Social Sciences Publishing House). 315–332. Alves, Mark J. 2006. Linguistic research on the origins of the Vietnamese language: An overview. Journal of Vietnamese Studies 1. 1–2:104–130. Alves, Mark J. 2007. A look at North-Central Vietnamese. In SEALS XII Papers from the 12th Annual Meeting of the Southeast Asian Linguistics Society 2002, Ratree Wayland, John Hartmann Paul Sidwell eds. Canberra: Pacific Linguistics. 1–7. Alves, Mark J. 2009a. Loanwords in Vietnamese. In Loanwords in the World’s Languages: A Comparative Handbook, Martin Haspelmath and Uri Tadmor eds. Berlin/Boston: De Gruyter Mouton. 617–637. Alves, Mark J. 2009b. Sino-Vietnamese grammatical vocabulary sociolinguistic conditions for borrowing. Journal of the Southeast Asian Linguistics Society 1. 1–9. Alves, Mark J. 2015a. Morphological functions among Mon-Khmer Languages: Beyond the basics. In Nick Enfield and Bernard Comrie eds. Mainland Southeast Asian Languages: The State of the Art. 
Berlin/Boston: Mouton de Gruyter. 531–557. Alves, Mark J. 2015b. Grammatical Sino-Tai vocabulary and implications for ancient Sino-Tai sociolinguistic contact. Paper given at the 48th Annual International Conference of Sino-Tibetan Linguistics, UC Santa Barbara, August 21–23, 2015. Powerpoint presentation. Alves, Mark J. 2015c. Historical notes on words for knives, swords, and other metal implements in Early Southern China and Mainland Southeast Asia. Mon-Khmer Studies 44:39–56.

initial steps in reconstructing proto-vietic syntax


Alves, Mark J. 2016. Identifying early Sino-Vietnamese vocabulary via linguistic, historical, archaeological, and ethnological data. The Bulletin of Chinese Linguistics 10. 1. Ansaldo, Umberto. 2010. Surpass comparatives in Sinitic and beyond: typology and grammaticalization. Linguistics 48. 4:919–950. Babaev, Kirill V. and Irina V. Samarina. 2018. May language: Materials of the RussianVietnamese linguistic expedition (in Russian). YASK Publishing House. Behr, Wolfgang. 2009. Classifiers, lexical tone, determinatives: a look at their emergence and diachrony in Old Chinese and beyond. Handout from Diachrony of CLF Workshop, NIAS Wassenaar, 2009. Benedict, Paul K. 1947. An Analysis of Annamese Kinship Terms. Southwestern Journal of Anthropology 3:371–392. Cadiere, Léopold. 1905. Les Hautes Vallées du Sòng-Gianh. Bulletin de l’ Ecole française d’Extrême-Orient 1:349–367. Cao, Xuân Hạo. 1992. Some Preliminaries to the Syntactic Analysis of the Vietnamese Sentence. Mon-Khmer Studies 20:137–152. Chamberlain, J.R. 1998. The origin of Sek: implications for Tai and Vietnamese history. In The International Conference on Tai Studies. S. Burusphat, ed. Bangkok, Thailand, 97–128. Institute of Language and Culture for Rural Development, Mahidol University. Chamberlain, J.R. 2003. Eco-Spatial History: a nomad myth from the Annamites and its relevance for biodiversity conservation. In X. Jianchu and S. Mikesell, eds. Landscapes of Diversity: Proceedings of the III MMSEA Conference, 25–28 August 2002. Lijiand, P.R. China: Center for Biodiversity and Indigenous Knowledge, 421–436. Chéon, M.A. 1907. Note Sur Les Dialectes Nguon, Sac Et Muong. Bulletin de l’ Ecole Francaise d’Extreme-Orient 7.1–2:87–100. Chew, Grace. 2015. Vietnamese Terms of Address and Person-References: Ideological Change and Stability. Doctoral thesis, University of Huddersfield. Clark, Marybeth. 1989. Hmong and areal Southeast Asia. In Papers in South-East Asian Linguistics No. 11: South-East Asian Syntax. 
David Bradley, ed. Canberra: Pacific Linguistics. 175–230. Churchman, Catherine. 2016. The People between the Rivers: The Rise and fall of a Bronze Drum Culture, 200–750 CE. New York: Roman and Littlefield. de Rhodes, Alexandre. 1991 (translation into Vietnamese from 1651 publication). Từ Ðiển Annam-Lusitan-Latinh (Thường Gọi là Từ Ðiển Việt-Bồ-La) [An AnnamesePortuguese-Latin Dictionary]. Ho Chi Minh City: Nhà Xuất Bản Khoa Học Xã Hội. Delancey, Scott. 2013. The origins of Sinitic. In Zhou Jing-Schmidt, ed. Increased Empiricism: Recent Advances in Chinese Linguistics, 73–99. Amsterdam: John Benjamins Publishing Company.



Enfield, Nick J. 2003. Linguistic epidemiology: semantics and grammar of language contact in Mainland Southeast Asia. London: Routledge.
Enfield, Nick J. and Gérard Diffloth. 2009. Phonology and sketch grammar of Kri, a Vietic language of Laos. Cahiers de Linguistique Asie Orientale 38.1:3–69.
Enfield, N.J. and Bernard Comrie. 2015. Mainland Southeast Asian language. In The languages of Mainland Southeast Asia: The state of the art. N.J. Enfield and Bernard Comrie, eds. Boston/Berlin: De Gruyter Mouton (Pacific Linguistics 649), 1–28.
Ferlus, Michel. 1977. L’infixe Instrumental -rn En Khamou Et Sa Trace En Vietnamien. Cahiers de Linguistique Asie Orientale 2:51–55.
Ferlus, Michel. 1979. Lexique Thavủng-Français. Cahiers de Linguistique Asie Orientale 5:71–94.
Ferlus, Michel. 1989–1990. Sur l’origine des langues Viet-Muong. Mon-Khmer Studies 18–19:52–59.
Ferlus, Michel. 2007. Lexique de racines Proto Viet-Muong (Proto Vietic Lexicon). Unpublished ms. Accessed 1 January 2017.
Ferlus, Michel. 2009. A layer of Dongsonian vocabulary in Vietnamese. Journal of the Southeast Asian Linguistics Society 1:95–108.
Gernet, Jacques. 1989 (reprint). A History of Chinese Civilization, Second Edition. Cambridge: Cambridge University Press.
Haudricourt, André G. 1954. Comment reconstruire le Chinois Archaïque. Word 10.2–3:351–364.
Hayes, La Vaughn H. 1992. Vietic and Viet-Muong: a New Subgrouping in Mon-Khmer. Mon-Khmer Studies 21:211–228.
Higham, Charles. 2014. Early Mainland Southeast Asia: from First Humans to Angkor. Bangkok: River Books Co. Ltd.
Jacob, Judith M. 1993. Notes on the numerals and numeral coefficients on Old, Middle, and Modern Khmer. In Cambodian Linguistics, Literature and History, David A. Smyth ed. 27–43. School of Oriental and African Studies, University of London.
Jenner, Philip N. and Paul Sidwell. 2010. Old Khmer grammar. Canberra: Pacific Linguistics.
Jenny, Mathias. 2015. Syntactic diversity and change in Austroasiatic languages. In Perspectives on Historical Syntax, Carlotta Viti, ed. John Benjamins. 317–340.
Jones, Robert B. 1970. Classifier constructions in Southeast Asia. Journal of the American Oriental Society 90.1:1–12.
Kim, Nam. 2013. Lasting monuments and durable institutions: labor, urbanism, and statehood in Northern Vietnam and beyond. Journal of Archaeological Research 21.3:217–267.
Lê, Đình Khẩn. 2002. Từ vựng gốc Hán trong Tiếng Việt (Words of Chinese Origin in Vietnamese). Ho Chi Minh City: Nhà Xuất Bản, Đại Học Quốc Gia Hồ Chí Minh.



Li, Rong (李榮). 1998. 南寧平話辭典 (A dictionary of Nanning Pinghua Chinese). Nanjing, China: 江蘇教育出版社.
Lipson, Mark et al. 2018. Ancient genomes document multiple waves of migration in Southeast Asian prehistory. Science 10.1126/science.aat3118.
Maspero, Henri. 1912. Études sur la phonétique historique de la langue Annamite: les initiales. Bulletin de l’École française d’Extrême-Orient 12:1–127.
McColl, Hugh et al. 2018. Ancient genomics reveals four prehistoric migration waves into Southeast Asia. Science 10.1126/science.aat3628.
Mei, Tsu-Lin. 1970. Tones and prosody in Middle Chinese and the origin of the rising tone. Harvard Journal of Asiatic Studies 30:86–110.
Meisterernst, Barbara. 2015 (online). Warring States to Medieval Chinese. In Encyclopedia of Chinese Language and Linguistics, General Editor Rint Sybesma. Accessed 16 December 2016.‑7363_ecll_COM_00000445.
Nguyễn, Văn Khang, Bùi Chi, and Hoàng Văn Hành. 2002. Từ điển Mường-Việt (A Mường-Vietnamese dictionary). Hà Nội: Nhà Xuất Bản Văn Hoá Dân Tộc.
Nguyễn, Ðình Hòa. 1957. Classifiers in Vietnamese. Word 13.1:124–152.
Nguyễn, Ðình Hòa. 1991. Seventeenth-century Vietnamese lexicon: preliminary gleanings from Alexandre de Rhodes’ writings. Austroasiatic Languages: Essays in Honour of H.L. Shorto. J.H.C.S. Davidson ed. 95–104.
Nguyễn, Ngọc San. 2003. Tìm hiểu Tiếng Việt lịch sử (Exploring the History of the Vietnamese Language). Ho Chi Minh City: Nhà Xuất Bản Đại Học Sư Phạm.
Nguyễn, Tài Cẩn. 1995. Giáo trình lịch sử ngữ âm tiếng Việt (Textbook of Vietnamese historical phonology). Hà Nội: Nhà Xuất Bản Giáo Dục.
Nguyễn, Văn Lợi. 1993. Tiếng Rục (The Rục language). Hà Nội: Nhà Xuất Bản Khoa Học Xã Hội.
Peyraube, Alain. 1996. Recent issues in Chinese historical syntax. In New Horizons in Chinese Linguistics. C.T. James Huang and Y.H. Audrey Li eds. Dordrecht: Kluwer, 161–213.
Phan, John. 2013. Lacquered Words: The Evolution of Vietnamese under Sinitic Influences from the 1st century BCE through the 17th Century CE. Dissertation. Cornell University.
Ratliff, Martha. 2010. Hmong-Mien Language History. Canberra: Pacific Linguistics.
Sak-Humphrey, Chhany. 1993. The Syntax of Nouns and Noun Phrases in Dated Pre-Angkorian Inscriptions. Mon-Khmer Studies 22:1–126.
SEAlang Mon-Khmer Etymological Dictionary. ctionary/. SEAlang Mon-Khmer languages project. Accessed 15 April 2016.
Shimizu, Masaaki. 2000. Khảo sát sơ lược về cấu trúc âm tiết tiếng Việt vào thế kỷ XIV–XV qua hai cứ liệu chữ Nôm (A brief survey of Vietnamese syllable structure in the XIV-XVth centuries in two Nom scripts). In Việt Nam Học—Kỷ Yếu Hội Thảo Quốc Tế Lần Thứ Nhất. Hà Nội: Thế giới. 252–265.



Shimizu, Masaaki. 2015. A reconstruction of ancient Vietnamese initials using Chữ Nôm materials. 国立国語研究所論集 (NINJAL Research Papers) 9:135–158.
Shimizu, Masaaki, Lê Thị Liên, and Shiro Momoki. 2005. A trace of disyllabicity of Vietnamese in the 14th century: Chữ Nôm characters contained in the inscription of Hộ Thành Mountain. Kobe City University of Foreign Studies 64:17–49.
Shorto, Harry L. 2006. A Mon-Khmer Comparative Dictionary. Paul Sidwell, Doug Cooper, and Christian Bauer eds. Canberra: Pacific Linguistics.
Sidwell, Paul. 2015. Austroasiatic classification. In The Handbook of Austroasiatic Languages, Paul Sidwell and Mathias Jenny eds. Boston: Brill. 144–220.
Solntsev V.M., Solntseva N.V., and Samarina I.V. 2001. Phonetics and Phonology. Field materials: Vocabulary and Grammar. In Ruc language: Materials of the Russian-Vietnamese linguistic expedition from 1986.
Srisakorn, Preedaporn. 2008. So (Thavung) Grammar. Dissertation. Thailand: Mahidol University.
Stankevitch, N.V. 2006. Vài nhận xét về các hư từ tiếng Việt thế kỉ 16. Ngôn Ngữ 9:1–9.
Taylor, Keith Weller. 1983. The Birth of Vietnam. Berkeley: University of California Press.
Taylor, Keith W. 2013. A History of the Vietnamese. Cambridge University Press.
Thompson, Laurence C. 1976. Proto-Viet-Muong phonology. In Austroasiatic Studies, Philip N. Jenner, Laurence C. Thompson, and Stanley Starosta eds. 1113–1204. Honolulu: The University Press of Hawaii.
Trần, Hữu Thung and Thái Kim Đỉnh. 1997. Từ điển tiếng Nghệ (A dictionary of Nghe An Vietnamese). Vinh, Vietnam: Nhà Xuất Bản Nghệ An.
Trần, Trí Dõi. 2011. Một vài vấn đề nghiên cứu so sánh-lịch sử nhóm ngôn ngữ Việt-Mường (A historical-comparative study of Viet-Muong group). Hà Nội: Nhà xuất bản Đại Học Quốc Gia Hà nội.
Võ, Thị Minh Hà. 2016. Lượng từ chỉ số lượng trong văn bản thư từ thế kỉ XVII–XIX của người công giáo (Quantity words indicating number in letters of Catholic missionaries from the 17th to 19th centuries). Ngôn Ngữ 1:64–77.
Vũ, Đức Nghiệu. 2006. Hư từ tiếng Việt thế kỉ XV trong Quốc Âm Thi Tập và Hồng Đức Quốc Âm Thi Tập (Vietnamese function words of the 15th century in the National Language Poetry Collection and the Hong Duc Anthology). Ngôn Ngữ 12:1–14.
Vũ, Đức Nghiệu. 2014. Cấu trúc danh ngữ tiếng Việt trong văn bản Phật thuyết đại báo phụ mẫu ân trọng kinh (Noun phrase structure in the text Phật thuyết đại báo phụ mẫu ân trọng kinh). Ngôn Ngữ 1:3–19.
Vương, Hoàng Tuyên. 1963. Các dân tộc nguồn gốc Nam Á ở miền bắc Việt Nam (The ethnic groups of northern Vietnam). Hanoi: Nhà Xuất Bản Giáo Dục.
Vương, Lộc. 2002. Từ Ðiển Từ Cổ (A Dictionary of Ancient Words, 2nd. Ed.). Hà Nội: Nhà Xuất Bản Ðà Nẵng, Trung Tâm Từ Ðiển Học.



Wang, Li. 王力. 1948. Hanyueyu yanjiu 漢越語研究 (Research on Sino-Vietnamese). Lingnan Xuebao 岭南学报 9.1:1–96. (reprinted in 1958). 292–401.
Zhu, Xiaonong. 2015 (online). Tonogenesis. In The Encyclopedia of Chinese Language and Linguistics, General Editor Rint Sybesma. Accessed 09 November 2016. http://‑7363_ecll_COM_00000427.

chapter 3

Nicobarese Comparative Grammar Paul Sidwell



Nicobarese is a small branch of Austroasiatic (AA) speech that is uniquely insular in distribution, with the closest neighbours speaking varieties of Austronesian (An) and Andamanese languages. Strikingly, there are various typological similarities between Nicobarese and An languages of Northern Sumatra and the Malay Peninsula, strongly suggestive of contact-driven convergence in prehistory. Despite typological changes, the AA heritage of Nicobarese is revealed by close examination of the basic lexical stock and oldest layers of morphology, as recognized over a century ago by Schmidt (1906). The present typological profile includes: VS/VPA order dominating in independent clauses, left-branching phrase structure, extensive use of affixation, and a strong tendency towards CV(C) syllables (with loss of onset clusters, and iambicity preserved only in lexical roots).

Through more than a century of active research, the problem of AA historical syntax has been largely neglected in favour of lexical and phonological studies, due in some measure to the lack of comparable materials. To be sure, various grammars and texts for AA languages have been produced from the mid 1800s (beginning with Mason’s 1854 Talaing (Mon) sketch) through to the present, but it is only in recent decades that comparative AA syntax has become a viable research topic. While it can be relatively easy to make useful lexical and phonological comparisons even with disparate transcription systems and idiosyncratic wordlists once one has some hundreds of items, syntax is another matter. Produced over decades and within many different scholarly traditions, textual translations and grammatical categorizations vary in challenging ways: technical terminology is inconsistent, theoretical models are diverse and changeable, and there

* Work which contributed towards this chapter was supported in part by the Australian Research Council Future Fellowship (project ID: FT120100241), and by travel and accommodation support provided by the Max Planck Institute for the Science of Human History (Jena) and the Department of Comparative Linguistics at the University of Zurich. I am also especially grateful for comments on drafts of this paper provided by Mathias Jenny and Mark Alves.

© koninklijke brill nv, leiden, 2020 | doi:10.1163/9789004425606_005



is little tradition of standards for eliciting complex or otherwise interesting syntactic phenomena (e.g. it can be difficult to find examples of subordinate clauses or other types of recursive structures in the texts and grammars).

Dealing with Nicobarese languages on the basis of available publications and manuscripts, one runs into immediate practical problems; the most extensive sources belong to the colonial era and are linguistically naïve by the standards of contemporary typological linguistics. Further frustrating matters, the studies produced since the 1960s in more sensible structural-linguistic frameworks are in large part derived from colonial era studies, augmented to some extent by data and insights from a modest number of native-speaker informants. While it is quite understandable that researchers would begin with the legacy sources and seek to confirm their content with native speakers, this still has something of a skewing effect in regard to syntax, and we need to be mindful of this. Additionally, since the 1960s the Nicobar Islands have effectively been off limits to scholars who are not Indian nationals, and the tsunami of 2004 and more recent militarization of the islands by India mean that prospects to improve the available data are highly constrained.

The principal sources utilized for this study are Sidwell’s (2015a) sketch of Car, which is substantially based on Brain (1970) (née Critchfield), Radhakrishnan’s (1970) description of Nancowry, and to a lesser extent Rajasingh’s (2016) sketch of Muot/Nancowry and older legacy sources such as De Röepstroff (1884), Man (1889), Temple (1902), and Whitehead (1925). Note that example sentences taken from these sources have been reglossed to bring them into line with the recommended list at the ICAAL Projects site,1 since many of them only have rather free English translations in the original sources.
I have not attempted to impose a phonemicization on the older legacy source examples, but have kept the original orthographic transcriptions intact. The Car sentence examples provided by Brain (1970) and Radhakrishnan (1970) have been extracted, glossed, and are available online.2

Regrettably only Car and Nancowry could be relied upon for this study. There are other Nicobar lects, which are not only very small in speaker populations (perhaps only hundreds of speakers) but also only poorly represented in published works and cannot be effectively utilized for comparative syntactic study at this time. The languages of the Nicobars are conventionally reported to consist of six lects, mostly named after the islands on which they are spoken:

1‑book‑project/notational‑conventions. 2‑languages‑project.


Map 3.1 Nicobar languages, from Wurm & Hattori (eds.) (1981/83)




Car
Central: Nancowry/Müot, Camorta, Trinkat, Katchall
Southern: Great Nicobar, Little Nicobar, (?) Shompen

Figure 3.1 Nicobar languages tree

1) Car
2) Chowra
3) Teresa and Bompoka
4) Central (Nancowry/Müot,3 Camorta, Trinkat, Katchall)
5) Southern (Great and Little Nicobar, Pulo Milo, Kondull)
6) Shompen (interior of Great Nicobar Island)

In Sidwell (2014b), I presented a statistical analysis which found that the Nicobarese lects of the Central island group (Nancowry, Katchall, Camorta, Kondul, Pulo Milo, Teressa) form a coherent dialect grouping that coordinates with Car, forming a tree with two main branches. Additionally, there are phonological indications that Shompen and various Southern lects (Great and Little Nicobar) form a sister group with Central Nicobarese. Thus, the Nicobarese lects appear to fall historically into three main groups, essentially consistent with the geographical clustering of the islands (see Map 3.1). This relationship is diagrammed in Figure 3.1.

The above configuration also supersedes the study by Blench and Sidwell (2011), which hypothesized that Shompen may be more closely related to Aslian or otherwise represent a branch intermediate between Nicobarese and Aslian. Based on the 400+ item comparative word list provided by Man (1889),4 Shompen apparently shares substantial common Nicobarese vocabulary, plus a tendency to harden nasal codas in common with Great Nicobar. Similarly, van Driem (2008), discussing the data of Elangaiyan et al. (1995), reported a substantial proportion of Nicobarese lexicon in Shompen. On balance it seems reasonable to provisionally place Shompen in the southern group.

Taking the above family tree as our starting point, we note that Car and Nancowry fall across the two coordinating branches of Nicobarese, and we may reasonably adopt the working assumption that features found to be held in

3 ‘Müot’ is lately preferred in place of ‘Nancowry’ by V.R. Rajasingh (CIIL Mysore) who has been relatively active in Nicobarese research of late. 4 The Man comparative data is made available in a google sheet by this writer at: https://docs sp=sharing.



common may be reconstructed to the proto-Nicobarese level. It is acknowledged that we have no real sense of the extent to which Car and Nancowry have influenced each other after diverging from proto-Nicobarese, nor to what extent present or now extinct Nicobarese lects may have played a role in the history of the group, but we temper all the analyses by keeping general principles in mind and taking into account what we already know about AA.

In addition to the technical difficulties and limitations mentioned above, there is the further issue that word replacement by tabooing practices has accelerated the background rate of churn in the lexicon, resulting in a very low rate of shared cognate lexicon available for determining regular correspondences between segments and other levels of structure. Man (1889) describes this problem graphically:

The diversities of speech which have sprung into being among the four communities in question, are, moreover, no doubt in great measure ascribable to the operation of a superstitious custom, which here, as in various other remote regions, has effected constant changes in the language of the inhabitants; but in every instance of this kind such changes have been limited to the area of the particular community concerned. The practice referred to is based on a firm belief in an after-existence, and requires that the names of deceased relatives and friends shall be tabooed for a certain lengthened period—generally about one generation—for fear of summoning or offending the ghost of the person so named. Man (1889:viii)

It should be kept in mind that while it is already challenging to attempt comparative study with just two source languages, the problem of lexical churn also potentially hampers any attempts to find the provenance of grammatical morphemes. The Census of India counted approximately 6,000 inhabitants in some 158 villages in the Nicobars in the 1880s, implying an average of fewer than 40 persons per village, speaking some six named speech varieties. The social situation as described at that time by de Röepstroff indicates a lack of centralized political structures or hierarchical social relations, and a high mobility among males in particular. This suggests a general lack of opportunity to develop deep linguistic diversity; it is thus realistic to think of the situation as a small cluster of dialect chains or ‘linkages’ forming a language area of related lects which drifted along together over time. As a branch, it may have diverged tremendously from the rest of AA, but internally there may have been a tendency to leveling that militated against internal divergence. If this is correct, the proto-structures reconstructed in this study may be of either deep or shallow time depth and not at all helpful in trying to internally reckon the age of the branch.
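The village-size figure cited above follows from simple division. The following is a minimal sketch of that arithmetic, using only the rounded census figures quoted in the text; the variable names are illustrative, and the even spread of speakers across varieties is an assumption made purely for the sake of the calculation, not a claim about the actual populations.

```python
# Approximate figures quoted in the text for the 1880s Census of India.
inhabitants = 6000    # "approximately 6,000 inhabitants"
villages = 158        # "some 158 villages"
named_varieties = 6   # "some six named speech varieties"

# Average village size implied by the two census figures.
persons_per_village = inhabitants / villages

# Naive community size per variety, ASSUMING (for illustration only)
# that speakers are spread evenly across the six named varieties.
speakers_per_variety = inhabitants / named_varieties

print(f"{persons_per_village:.1f} persons per village on average")
print(f"{speakers_per_variety:.0f} speakers per variety (even-spread assumption)")
```

Even under the even-spread assumption, each named variety would have had a speech community of only about a thousand people, which is consistent with the picture of small, closely interacting lects sketched above.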


2 Word Order

Jenny (2015, and this volume) persuasively argues for VS/VAP basic word order in proto-AA, which is preserved variously in the geographical periphery of AA, while the geographically Indo-Chinese AA branches have restructured more or less completely to SV/AVP, which dominates the AA, An, HM, and Tai languages of Mainland Southeast Asia (for example, see WALS map 81A (Dryer 2013)). Car has relatively fixed word order, while in Nancowry there is great flexibility, although both favour verb initiality in independent clauses. Below we examine each language and suggest a historical synthesis of the patterns we find.

2.1 Car Word Order

The dominant word order in Car independent clauses is verb initial and subject final (VS/VPA); additionally there is significant use of passivization, demoted arguments and adjuncts, as well as pragmatic elision of arguments, all of which affect word ordering in speech. Arguments are realized with Nominal Phrases (NP), while predicates are characterized as consisting of Verbal Complexes (VC) and their dependent arguments. Adverbial phrases, generally indicating time or place, distribute peripherally (initially or finally), the apparent tendency being for time phrases to come first and location after. Within dependent/subordinate clauses, word order tends to follow SV/AVP, although data are not numerous enough to make strong generalizations on this.

According to Brain (pp. 27–28), intonation rather than word order distinguishes declarative from polar interrogative clauses. Normally declarative clauses have a falling pitch contour, while in polar questions a high pitch is maintained through the utterance.

A striking feature of Car grammar this author has noted relates to the nature of Car alignment and argument marking. The language can be analysed as somewhat pronominalizing, in a manner that creates partial Absolutive marking on common nouns, but not pronouns or proper nouns. The basic pattern is as follows.
– For intransitive clauses with lexical S (not a pronoun or proper noun), a 3rd person pronoun (glossed AGR) agreeing with S is cliticized to the main verb, and the S takes an obligatory DEM (most often the generic ŋam). For pronominal or proper noun S, there is no such agreement marking, although S incorporating a possessive pronoun appears to have optional agreement.



– For transitive clauses, agreement is required by lexically expressed P. However, there is a restriction such that AGR and DEM do not co-occur with transitives; consequently, if the P is definite, DEM is used, but in other cases, AGR is present. Alternately, a lexical Adjective (ADJ) can stand in the place of DEM.

This system is somewhat reminiscent of the agreement marking in Nias (an Austronesian language spoken on islands west of Sumatra, see Brown 2001 for a description), although in that language agreement marking is prefixed to verbs and there is mutation of onsets in case marked arguments, so the parallel is weak. It strikes me that for Car, a consequence of the agreement system is that there is almost always a pronoun indexing S immediately following V, which perhaps hints at an earlier VS preference. Example sentences illustrating structural occurrence of AGR follow:

Intransitive clauses with lexical S:

(1) kúˑn-ə=ʔan ŋam taɾík
    small-PERS=AGR DEM man
    ‘The man is small.’ (CB:122.3)

(2) póˑjti=ʔən ŋam tahɛ́ˑl
    much.width=AGR DEM river
    ‘It is a wide river.’ (CB:122.5)

(3) laɾák=ʔan ŋam kanúˑc
    split=AGR DEM pencil
    ‘The pencil is split.’ (CB:248.10)

Intransitive clauses with pronominal S:

(4) fɤ́ŋ-kə-ɾɛ cin
    burn-INTR-REFL 1S
    ‘I burn myself (accidentally).’ (CB:172.1)

(5) hu-kafɤ́t-ə ʔap
    EMPH-flick-PASS 3S.NONVIS
    ‘He (did it by) flicking.’ (CB:167.1)

Transitive clause with lexical P:



(6) ha-cát-ŋɛn=ʔək líˑpəɾɛ cin
    CAUS-lose-away=AGR book 1S
    ‘I lost the book.’ (CB:145.2)

(7) kéˑʔ-ə-ɾə=ʔəp kanúˑt cu man, ʔi kuj miˑs
    take-ATTR-toward=AGR comb 1S.OBL 2S, LOC head table
    ‘Go and get my comb on the table.’ (CB:202)

Transitive clause with pronominal P:

(8) mɨk ʔan cin, cú-ʔə
    see 3S 1S, 1S.OBL-REFL
    ‘I saw him, myself.’ (CB:148.1)

(9) hɤŋ=tiˑʔ ʔɔ cin
    wait=hand 3S.OBL 1S
    ‘I’m waiting for him (to do it).’ (CB:247.1)

For pronominal non-core arguments and adjuncts, there are special Oblique (OBL) forms for the personal pronouns and demonstratives.

There are no Copular or Existential verbs in Car (unlike Nancowry); instead, any lexeme can take V position as a derived verb and take a Stative (STAT) or Existential (BE) reading according to the inherent semantics or pragmatics of the event. The syntactic analysis of this construction is ambivalent; the derived verbs in these constructions still carry S-agreement, as if they are merely displaced lexical S, while the demonstrative arguably takes the archetypal S position. These appear to reflect a special case; we can speculate that historically lexical S was fronted, displacing a BE verb that has since fallen out of use (yet continues in Nancowry, see below). Examples:

(10) taŋɛ̃́ˑʔ=ʔən ŋih
     bone=AGR PROX.S.NHUM
     ‘This is a bone.’ (CB:132.6)

(11) kap=ʔan ŋamɔ́h
     tortoise=AGR DIST.S
     ‘That is a tortoise.’ (CB:141.2)

Generally, intransitive and detransitivized clauses are very common in the Car data. Brain recognises Passive (PASS) constructions in which A is demoted and



marked with the Linker tə (LINK) (labeled “grammatical relator” by Brain). V in these constructions is also usually marked with one of several phonologically and morphologically unrelated suffixes (-ə, -hu, -ijə/-i, -ləŋə, -ɲu), whose use is conditioned phonologically and lexically:

(12) ɲɛ́ˑk-ə cin, tə ɲanɛ́ˑk
     bind-PASS 1S, LINK cord
     ‘I’m bound by the cord.’ (CB:183.4)

(13) haʔã́h-ləŋə cáʔa, tə jik hól-ɾɛ
     feed-PASS 3PL.VIS, LINK 3PL.NONVIS friend-REFL
     ‘They were served by their friends.’ (CB:183.2)

(14) vɛ́ˑ-ɲu=naŋ cin, tə cɔˑn, huɻɤ́c
     tell-PASS=ear 1S, LINK PN, tomorrow
     ‘I’ll be told by John tomorrow.’ (CB:216.1)

The use of tə in Car has parallels in Western Austronesian languages, particularly the di- verb prefix in Malay and Indonesian, regarded as a marker of passive voice by Hopper (1983), and more recently treated as a general marker of reduced valency (e.g. Arka 2005). Contrast this to the situation broadly in Austroasiatic, “Few AA languages have special morphological or periphrastic passives. Frequently, unmarked verb forms can receive passive reading, depending on the context.” (Jenny, Weber & Weymuth 2014: 105).

Impersonal Passives, lacking an overt demoted S/A, but still morphologically marked, are also common. E.g.:

(15) laɾáˑk-ijə=ʔan ŋam pak-cóˑn
     split-PASS=AGR DEM branch-tree
     ‘The branch has (accidentally) split away.’ (CB:176.2)

(16) kɨhɨ́ˑt-ə líˑpəɾɛ cin5
     finish-PASS book 1S
     ‘My books are all taken.’ (CB:246.4)

Additionally, there is another type of detransitivized clause evident in the Car data. Brain regards as transitive various sentences in which P follows A. Yet in

5 Note the apparent optional agreement marking with S formed with possessive NP.



these cases, P is introduced with tə, and V may be marked with -ə or other suffixes, much like a passive construction, so these may instead be characterized as Anti-Passive (ANTIP) constructions.

(17) lakúk-ə=tiˑʔ cin, tə pilɤ́n
     break-ANTIP=hand 1S, LINK bottle
     ‘I broke a bottle.’ (CB:184.4)

(18) ʔət kahúl-l-uvə ʔəm, tə ɲáʔã
     NEG cook-upward-POSS 2S.SUB, LINK food
     ‘Don’t you have food to cook?’ (CB:190.1)

(19) haɻóh tum taˑk kahɛ́ˑʔ=tiˑʔ man, tə ɾupíˑʔ
     some number CLF take=hand 2S, LINK money
     ‘Give me some money.’ (CB:120.3)

It can be suggested that the high frequency of intransitive and detransitivized clauses indicates a strong preference for only one core argument per verb within main clauses, with elision and demotion as common strategies to achieve this preference.

Significantly, tə also derives adjectives, being analyzed by Brain as a prefix in such cases. This may have arisen from the use of tə to coordinate extra-clausal statives which were then reanalysed as NP constituents. Consider the following examples:

(20) lɤ́ktɛn ʔan, nə tə-ʔɔ́kə, minɛ́ˑʔ-ɲə, ŋam təŋə
     therefore 3S, SUB ADJ-drunk name-outward DEM
     manáh, tə-ʔɔ́kə
     meaning ADJ-drunk
     ‘Therefore it is called təʔɔ́kə, which means, “which is drunk”.’ (CB:230.3)

(21) kéˑʔ-tə tə-manúl man líˑpəɾɛ, ʔin cu
     take-toward ADJ-yellow 2S book, DIR 1S.OBL
     ‘Give me the yellow book.’ (CB:248.2)

The word order in Subordinate Clauses seems to favor SV/AVP structure. The first element is typically a subordinate pronoun nə (SUB), which indexes the S/A of the subordinate clause.



(22) kasál-ə mɛh cin, ʔəm tisɔ́k-ŋə kuj ŋih mak
     dare-TR 2S.OBL 1S, 2SG.SUB jump-away over PROX.S.INAN water
     ‘I dare you to jump across this well.’ (CB:146.3/240.1)

(23) han-sóˑŋ ʔan, nə sut ŋam ɲam
     EMPH-collide 3S, SUB kick DEM plaything
     ‘By kicking he played the ball.’ (CB:167.3)

(24) haɾún ŋam ɲiˑʔ, nə kuˑc
     train DEM child, SUB write
     ‘Train the child to write.’ (CB:213.7)

Prepositional Phrases, functioning as Adjuncts, always take a peripheral position to the right. There are only two prepositions, ʔɛl and ʔi~ʔin. The former has typically locative focus (LOC), while ʔi tends to be more directional (DIR), and the ʔin variant tends to be used for benefactive meanings (Brain regards ʔi and ʔin as morphological variants). The ʔi Adjuncts mostly require nə, while there is no such requirement with ʔɛl or ʔin.

(25) paɻúˑj-a ʔan ŋam mak, ʔɛl páˑlti
     murky-STAT AGR DEM water, LOC bucket
     ‘The water in the bucket is murky.’ (CB:192.1)

(26) ʔaˑm taka jip taɾík, ʔɛl ʔuɾɔ́hɔ
     how.many CLF 3PL.NONVIS person, LOC room
     ‘How many men are in the room?’ (CB:206.1)

(27) laʔóh-hət-və ʔək kúˑʔ-cɔ́k, nə ʔi ʔaláha haʔún
     break-inward-POSS AGR face-arrow SUB DIR body pig
     ‘The arrow point was broken in the pig’s body.’ (CB:184.3)

(28) ha-ɾɔ́ˑn-haka ʔan ŋam kanúˑc, nə ʔi ŋam líˑpəɾə
     CAUS-slant-CONT AGR DEM pencil, SUB DIR DEM book
     ‘The pencil is leaning against the book.’ (CB:193.1)

(29) kéˑʔ-ə-ɾə=ʔəp kanúˑt cu man, ʔi kuj miˑs
     take-ATTR-toward=AGR comb 1S.OBL 2S, DIR head table
     ‘Go and get my comb on the table.’ (CB:202)
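The distribution of the Car prepositions described above can be summarized as a small lookup. The sketch below is only an illustration of that description, not an analysis taken from Brain or any other source: the function name, the three function labels, and the `with_sub` flag (modelling the tendency, not a strict rule, for ʔi adjuncts to be introduced by nə) are all hypothetical.

```python
# Hypothetical summary of the description above; the labels
# "locative", "directional" and "benefactive" are illustrative names.
PREPOSITION = {
    "locative": "ʔɛl",
    "directional": "ʔi",
    "benefactive": "ʔin",
}


def mark_adjunct(function, noun_phrase, with_sub=True):
    """Return the words of a right-peripheral Car adjunct.

    with_sub models the tendency (not a strict rule) for ʔi adjuncts
    to be introduced by the subordinate pronoun nə; it has no effect
    on ʔɛl or ʔin adjuncts.
    """
    prep = PREPOSITION[function]
    words = [prep] + noun_phrase.split()
    if prep == "ʔi" and with_sub:
        words.insert(0, "nə")
    return words


print(mark_adjunct("locative", "páˑlti"))                       # cf. (25)
print(mark_adjunct("directional", "ʔaláha haʔún"))              # cf. (27)
print(mark_adjunct("directional", "kuj miˑs", with_sub=False))  # cf. (29)
print(mark_adjunct("benefactive", "cu"))
```

Treating nə as an optional flag rather than a fixed part of the ʔi entry keeps the sketch compatible with examples such as (29), where ʔi occurs without it.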



The benefactive use of ʔin marks what we might otherwise characterize as an Indirect Object (IO). The few examples found in the data are imperatives, such as (30) and (31) below. Similar constructions are also found without ʔin, such as (32) and (33); the difference appears to be that the latter lack the overt S.

(30) ɾahɛ́c-hə-ta man, tə cíˑni, ʔin cu
     little-INC.OBJ-toward 2S, LINK sugar, BEN 1S.OBL
     ‘You give me a little sugar.’ (CB:124.1)

(31) kéˑʔ-tə tə-manúl man líˑpəɾɛ, ʔin cu
     take-toward ADJ-yellow 2S book, BEN 1S.OBL
     ‘You give me the yellow book.’ (CB:248.2)

(32) kéˑʔ-tə kanúˑc cu
     take-toward pen 1S.OBL
     ‘Give me a pen.’ (CB:239.3)

(33) haɻóh tum taˑk kahɛ́ˑʔ=tiˑʔ man, tə ɾupíˑʔ
     some number CLF take=hand 2S, LINK money
     ‘Give me some money.’ (CB:120.3)

We observe various strategies above: P + Oblique IO, P + ʔin Adjunct with Oblique IO, and demoted P marked with tə while other arguments are elided. So far these data do not appear to show three arguments in a main clause, consistent with a general preference to minimize the number of arguments per verb. Our brief overview of Car clause structures supports the following abstraction of preferred word order:

Verb complex | Oblique~demoted~subordinated constituents; ʔi adjuncts | ʔɛl, ʔin adjuncts
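The preferred ordering abstracted above can be sketched as a simple linearization routine. This is a minimal illustration under stated assumptions: the slot names are hypothetical labels paraphrasing the abstraction, the `core_arguments` slot is added here for S/P/A on the strength of the VS/VPA description and does not appear in the abstraction as printed, and the relative order of the peripheral slots simply follows the left-to-right order in which they are listed.

```python
# Slot order assumed from the abstraction above. "core_arguments" is a
# hypothetical slot added here for S/P/A; the other labels paraphrase
# the text and are not the author's terms.
SLOT_ORDER = [
    "verb_complex",
    "core_arguments",
    "oblique_demoted_subordinated",
    "i_adjunct",
    "el_in_adjunct",
]


def linearize(constituents):
    """Order the constituents of a Car main clause by slot.

    constituents maps slot names to strings; slots that are absent
    (for example, elided arguments) are simply skipped.
    """
    unknown = set(constituents) - set(SLOT_ORDER)
    if unknown:
        raise ValueError(f"unknown slots: {sorted(unknown)}")
    return [constituents[slot] for slot in SLOT_ORDER if slot in constituents]


# Constituents of example (30), 'You give me a little sugar.',
# deliberately supplied out of order.
clause = {
    "el_in_adjunct": "ʔin cu",
    "oblique_demoted_subordinated": "tə cíˑni",
    "core_arguments": "man",
    "verb_complex": "ɾahɛ́c-hə-ta",
}
print(" | ".join(linearize(clause)))
```

Skipping absent slots rather than requiring all of them reflects the frequent elision and demotion of arguments noted in this section.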

2.2 Nancowry Word Order

The sources vary in their characterizations of word order in Nancowry. Man, briefly discussing syntax, writes:



The collocation of words comprising a sentence being in many cases identical with or very similar to that of English, the framing of simple sentences presents little or no difficulty. […] E.g. ane inôat lamang ten chüa, that knife belongs to me; homkwòm ten chüa ane pōwah, give (to) me that paddle. Man (1889:liii)

His short discussion on syntax also makes favourable comparisons to Malay and Burmese,6 and on the whole leaves the strong impression that SV/AVP is the basic word order. Rajasingh (2016: 40) asserts that, “The preferred syntactic pattern of the language is Verb–Object–Subject, although some re-ordering can occur.” Unfortunately, Rajasingh’s paper provides rather few transitive clauses, and while V initial clauses do dominate numerically, other word order patterns occur, and the significance of these is not explicitly discussed in that paper. Interestingly, the WALS atlas, citing Radhakrishnan (1970) as its source, characterizes Nancowry as having no dominant order. Radhakrishnan does not make a general statement about word order; however, he does present and discuss some 169 example sentences, focusing mainly on the syntax of ta, ʔin, and the fused form tin, all of which he generally labels as Particles (PTL) and likens these to the English articles, although without a functional analysis. It is apparent to this writer that ta, ʔin, and tin, are important to understanding issues of argument structure and alignment in Nancowry, and they feature heavily in the discussion below. Nancowry ta, cognate with the Car linker tə, has various functions. Within a main clause it marks non-human P; beyond the main clause, its functions are roughly parallel to its cognate in Car, namely, introducing clause peripheral constituents and deriving adjectives. ʔin is characterised as a demonstrative by Rajasingh, which appears to be correct etymologically, but on the basis of its distribution, Radhakrishnan finds that its basic function is to mark human core arguments. tin is identified by Radhakrishnan as a fusion of ta+ʔin; it mostly precedes human P in transitive clauses and human S in verbless constructions. 
In Rajasingh’s data, tin (written t̪iˑn) is generalised to an accusative marker for all P, but I regard this as a misanalysis based on a skewed data set, and for the present the discussion is focussed on the evidence of Radhakrishnan’s data, which is known to be drawn directly from native speakers and is much less reliant on scriptural translations.

6 Man’s comparison to Burmese is odd given the strong preference for V final ordering in that language, although his heavily prescriptive attitude to grammar is evidently Euro-centric.

nicobarese comparative grammar


Examples of intransitive clauses in Radhakrishnan’s data give us a clear indication of the placement of S in respect of V:

(34) rián ʔin cə̃/cʉ̃ə
     run PTL 1S
     ‘I am running.’ (RK:39)

(35) siáŋ ʔáné máŋka
     sweet that.VIS mango
     ‘That mango is sweet.’ (RK:60c)

(36) ruk ʔin ʔən məʔ
     come PTL 3S FUT
     ‘He will come.’ (RK:111)

In polar interrogative clauses, the S is repeated. S appears preverbally and post-verbally, and it may be elided post-verbally. The historical significance of this is ambiguous; it could be indicative of a change in progress towards an SV structure for polar interrogatives, or it might be a holdover from an earlier SV pattern:

(37) cə ruk ʔin cʉ̃a
     1S come PTL 1S
     ‘Am I coming?’ (RK:115)

(38) mɛ juáŋa ruk
     2S PST come
     ‘Did you come?’ (RK:118)

(39) ʔifɛ́ juáŋa ruk ʔin ʔifɛ́
     3PL PST come PTL 3PL
     ‘Did they come?’ (RK:120)

(40) cit ruk məʔ
     1S.NEG come FUT
     ‘I will not come?’ (RK:142)

Passives are also found in the Nancowry data, although unlike Car, the by-phrase is indicated with taj, which also functions as an instrumental preposition, apparently a Nancowry innovation, having grammaticalized the marker from lexical ‘hand’. This is both intuitively reasonable and has cross-linguistic parallels; Mithun (2002: 253) points out that in Numic (Uto-Aztecan) languages, “Verbs containing the instrumental prefix ma-, descended from the noun root for ‘hand’, can cooccur with independent instrumental nominals …”.

(41) ciáw-a-n mɛ taj ʔən
     call-a-PTL 2S INS 3S
     ‘You are called by him.’ (RK:82)

(42) juáŋa fĩaw-a-n ʔən taj mɛ
     PST beat-a-PTL 3P INS 2S
     ‘He was beaten by you.’ (RK:85)

(43) pa-ɲáp taj mic
     CAUS-die PTL lightning
     ‘(Someone) was killed by lightning.’ (RK:96)

Some examples of transitive clauses are provided below. We see non-human P unmarked (44, 47) when the order is VPA, and non-human P marked with ta with the order VAP (48, 49). The latter parallels the demotion of P noted above for Car, and can potentially be treated as Anti-Passive. In the other examples, we see human P marked with tin.

(44) jʉ́ʔ-si ʔɛ tin-ʔɛ̃h
     put-downward 3S.NHUM here (PTL-PROX)
     ‘Put it down here.’ (RK:22)

(45) ha-líap tin cə̃
     CAUS-learn PTL 1S
     ‘Teach me.’ (RK:36)

(46) juáŋa hew kəʔ ka ʔin cʉ̃ə
     PST see NVIS fish PTL 1S
     ‘I saw a fish.’ (RK:51)

(47) kalóʔ nɔt cən k-am-alóʔ
     steal pig 1S.POSS thief (⟨AGT⟩steal)
     ‘The thief stole my pig.’ (RK:55)



(48) hew cə tin ʔən na cim
     see 1S PTL 3S PTL cry
     ‘I see him crying.’ (RK:62)

(49) juáŋa hew cə ta ʔam na ʔu-hú
     PST see 1S PTL dog PTL REDUP-bark
     ‘I saw the dog barking.’ (RK:63)

(50) ciáw tin ʔə́n-in mɛ
     call PTL 3P-PTL 2S
     ‘You call him to come.’ (RK:79)

(51) fĩaw tin ʔə́n-in mɛ
     beat PTL 3P-PTL 2S
     ‘You beat him.’ (RK:84)

The above examples suggest that VPA is the least marked order (much like Car). It is interesting that in the VAP clause examples, the A is always first person singular and not marked with ʔin. Assuming that the basic function of ʔin is to mark human arguments, not grammatical relations, it would seem inconsistent not to mark them in such cases. However, it is also apparent that fronted/topicalised arguments consistently get neither ʔin nor ta regardless of their animacy or grammatical role. One can therefore analyse the immediately post-verbal A’s as fronted, dislocated from their otherwise basic position following P. While one might regard the immediate post-verbal position as basic for A given that it is morphologically unmarked, that would ignore the fact that in intransitive clauses human S is marked (with ʔin).7

7 Applying the same logic, we might regard the non-case marked preverbal A of Semelai, as described by Kruspe (2004), as extra-clausal.

Radhakrishnan provides several examples of ditransitive clauses. Note the uses of ta (in both free and fused forms) to mark the Indirect Object (IO):

(52) ʔum-kuám ta nɔt ʔin ʔən tin ʔijáw
     REDUP-give PTL pig PTL 3P PTL crocodile
     ‘He gives a pig to the crocodile.’ (RK:74)

(53) ʔum-kuám ta nɔt ʔin ʔən tin cʉ̃ə
     REDUP-give PTL pig PTL 3P PTL 1S
     ‘He gives me a pig.’ (RK:75)



(54) ʔum-kuám ta huŋʔõk ʔin ʔən ta caʔ nɔt
     REDUP-give PTL food PTL 3P PTL DAT pig
     ‘He gives food to the pig.’ (RK:76)

As alluded to above, one also finds frequent examples of medial V8 in the legacy sources (such as Man 1889) and the Nancowry Bible texts;9 consider the following from Man:

(55) chit akâh nga-shī an
     1.NEG know concerning 3S
     ‘I know nothing about it’ (Man1889:xxxix)

(56) an pöya hēang-e-chuk kaling
     3S sit among foreigner
     ‘He sits among the foreigners’ (Man1889:xxxix)

8 Such examples provided by Rajasingh (2016) appear to be lifted directly from these legacy sources.
9 The Gospel of Matthew was translated by Moravian missionaries about A.D. 1780, and published by de Roepstorff in 1884.

What is striking about such sentences is that the A is never marked with ʔin (en), consistent with the pattern found in Radhakrishnan’s data that fronted/topicalised elements are unmarked, which is also consistent with the general cross-linguistic tendency. It is also important that we do not really know how these early text collections were made. It is apparent that the early Bible translations, which have a strong tendency to follow typical European word orders, influenced the 19th century sources, so it may be inadvisable to rely too heavily on these to inform our analysis of Nancowry syntax. My suspicion is that the 18th and 19th century writers noted the availability of the AVP construction, which otherwise marks a topicalized S, and over-generalized it in translation by using it as a neutral transitive order.

Nancowry na, cognate with Car subordinator nə, has similar distribution and functions. There are not many examples in Radhakrishnan’s thesis, but the patterning is consistent. In RK:62, RK:63 and RK:xx below, we see na introducing subordinate clauses that consist of just a verb. In RK:73, na links a prepositional adjunct:

(57) hew cə tin ʔən na cim
     see 1S PTL 3S PTL cry
     ‘I see him crying.’ (RK:62)



(58) juáŋa hew cə ta ʔam na ʔu-hú
     PST see 1S PTL dog PTL REDUP-bark
     ‘I saw the dog barking.’ (RK:63)

(59) juáŋa hew tin kapríal na ta kuj ʔuɲíha
     PST see PTL person PTL PTL on tree
     ‘I saw Gabrial on the tree.’ (RK:73)

(60) ta-láʔ ʔõk cə-n ʔən na ʔuk-sák
     PTL-side back 1S-PTL 3S PTL REDUP-stand
     ‘He is standing behind me.’ (RK:xx)

We do not find clear indications of Absolutive marking in Nancowry, as we do in Car. Rather, the alignment appears to be Nominative-Accusative, with both configurational and morphological marking of grammatical relations. In the broader AA context, this strongly suggests that Nominative-Accusative alignment is inherited from proto-Nicobarese, with Car being the innovator of Absolutive marking. There are also a couple of examples in Radhakrishnan’s data of ta that appear to parallel the use in Car of tə- in deriving adjectives; Radhakrishnan glosses ta as a particle and appears to regard the lexical root as a stative:

(61) kəʔ mɛʔ ta lo
     NVIS goat PTL fast
     ‘The goat is fast.’ (RK:48)

(62) juáŋa hew kaʔ papúh ta karú-n-cə̃
     PST see EMPH person PTL big-PTL-1S
     ‘I saw a huge man.’ (RK:72)

Interestingly, Nancowry has more prepositions than Car, specifically: laʔ ‘direction’, ŋãl ‘above’, kuj ‘on’, caʔ ‘to (Dative)’, and ʔuál ‘in’ (cognate with Car ʔɛl LOC). In Radhakrishnan’s data, these adjuncts are always introduced with ta (unlike Car).

(63) juáŋa hew tin kapríal na ta kuj ʔuɲíha
     PST see PTL person PTL PTL on tree
     ‘I saw Gabrial on the tree.’ (RK:73)



(64) ʔuk-sə́k ta ʔuál riák
     REDUP-stand PTL in water
     ‘(Someone) is standing in the water.’ (RK:10)

(65) ʔuk-sə́k ta laʔ ʔok cən ʔən
     REDUP-stand PTL side back 1S.POSS 3S
     ‘He is standing behind me.’ (RK:109)

The above consideration of Nancowry word order supports the following summary template:


Verb complex









Demoted~subordinated constituents/adjuncts

Additionally, this order is disrupted by fronting/topicalization (particularly of A), and by demotion of A (passivization). In the basic positions, human10 arguments are marked by ʔin, and P (human and non-human) by ta, with fusion of these to tin in the case of human P. Obliques/Indirect Objects are generally marked with ta + ʔin or the relevant preposition (with or without fusion of the morphemes).
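The marking generalisations stated in this section can be restated as a small decision procedure. The following Python sketch is purely illustrative and is not part of the original analysis: the function name `nancowry_marker` and its parameters are hypothetical, and it encodes only the simplified pattern described above (fronted/topicalised arguments unmarked; human P with tin; non-human P with ta in VAP order and unmarked in VPA order; other human core arguments with ʔin). It ignores obliques, indirect objects and passives, and it does not attempt to account for every example in the data.

```python
from typing import Optional


def nancowry_marker(role: str, human: bool, fronted: bool = False,
                    order: str = "VPA") -> Optional[str]:
    """Return the particle expected before a core argument, or None.

    Hypothetical helper: a simplified restatement of the pattern
    described in the text, not a full grammar of Nancowry.
    role  : "S", "A" or "P"
    human : whether the argument counts as human (high animacy)
    order : "VPA" or "VAP" (only relevant for non-human P)
    """
    if role not in {"S", "A", "P"}:
        raise ValueError(f"unknown role: {role!r}")
    if order not in {"VPA", "VAP"}:
        raise ValueError(f"unknown order: {order!r}")

    # Fronted/topicalised arguments take neither ʔin nor ta.
    if fronted:
        return None

    if role == "P":
        if human:
            return "tin"  # fusion of ta + ʔin
        # Non-human P: marked with ta in VAP order, unmarked in VPA order.
        return "ta" if order == "VAP" else None

    # S and A in basic position: ʔin marks human arguments only.
    return "ʔin" if human else None
```

Under these assumptions, a human S as in (34) receives ʔin, a human P as in (45) receives tin, and a non-human P in a VAP clause as in (49) receives ta.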


Proto-Nicobarese Historical Word Order and Grammatical Relations Marking

The best documented Nicobarese languages, Car and Nancowry, show a strong tendency towards VS/VPA as their basic or least-marked word order in independent clauses. With respect to Car, word order quite strictly follows this pattern, while in the Nancowry data there is more variation, especially in the early Bible translations and colonial era sources with their stronger tendency for preverbal 1st person S/A. However, there are reasons to regard VS/VPA as basic, and therefore reconstructable to proto-Nicobarese.


10 The category of human in Nancowry also appears to include high animacy creatures such as crocodiles and sea turtles.



Both languages have passive constructions with demoted A marked with ta/tə;11 Car also allows for demotion of P (antipassive), and Nancowry allows the same but requires further investigation. With respect to three-place predicates, the languages differ somewhat in their marking of IO. In Car, pronominal IOs occur in OBL form either within the main clause or demoted with ʔin, and lexical IOs are also given a peripheral position but marked with tə. In Nancowry, all IOs take a peripheral position and are marked with ta (or the fused form tin if the nouns are human). This suggests that, at a minimum, proto-Nicobarese relegated IOs to the clause periphery with *ta marking. In terms of coding grammatical relations, both languages employ strategies of configuration (word order) and morphological marking. In terms of configuration, both are underlyingly quite similar once we abstract somewhat, but in relation to morphological marking some differences are notable. Car shows partial Absolutive marking that may be heading in the direction of a pronominalizing language, and on general and historical grounds we can suggest that this is innovative. However, the mechanism and motivation for this shift in Car are not clear, although it is tempting to invoke contact with one or more regional Austronesian languages. Various languages of Sumatra have V initial or highly variable word order, and Nias in particular (Brown 2001) has VS/VPA basic order, ergative alignment, and is pronominalizing, being perhaps the closest parallel to Car in respect of these features. This suggests some kind of areal or sub-stratum explanation, which would be the topic of another study.
11 The phonetic difference between Nancowry ta and Car tə is probably not significant. In both languages there are only three or four vowel distinctions in phonologically weak positions, so the real value of the vowel in this morpheme is short with variable timbre, which is not easily captured with a single IPA symbol, and we provisionally reconstruct *ta for now.

Car has an extensive, highly derived paradigm of personal pronouns and demonstratives (see Braine 1970, Sidwell 2014a) with Nominative, Oblique, Interrogative and Subordinate forms, otherwise unprecedented among AA languages, while Nancowry preserves a more or less typical AA paradigm of personal pronouns and demonstratives (see Rajasingh 2016: 35) lacking any case marked forms. So in the first place, we are inclined to propose that proto-Nicobarese was strongly configurational in marking core grammatical relations, and did not have an elaborate system of morphological marking of the same, but probably carried on and later elaborated upon what it inherited from proto-AA. The ta/tə morpheme, which has cognates elsewhere in AA (Old Khmer ṭa /ɗə/ subordinating conjunction, Old Mon ta /tə/ ‘to, toward, for, on, before’, and others), can be characterized functionally as linking non-core or peripheral constituents to the clausal core; thus it does duty linking demoted constituents, relative or subordinate clauses, and some adjuncts. We can propose that this was the essential function of proto-Nicobarese *ta: to mark constituents that fall outside of the clausal core (VPA) but are nonetheless linked to it. As already noted, Nancowry ta also marks non-human P, and this is likely to have been a secondary development. Similarly, the extension of the use of ta/tə in both languages to mark attributives within NPs, deriving adjectives or stative attribution, appears to be a secondary development. We can readily imagine the development in which a stative relative clause introduced with ta/tə (e.g. ‘the man who is big’) becomes reduced and takes a simple attributive reading. The fact that adjectives precede the noun in Car, and the equivalent construction follows the noun in Nancowry, suggests a parallel development rather than a single proto-Nicobarese innovation. Both languages also make use of na/nə to mark subordination, indicating a proto-Nicobarese *na 3rd person subordinate pronoun. Many AA languages do variously have 3p pronouns, demonstratives and conjunctions of the form nV (Shorto 2006 reconstructs PMK *niʔ ‘this’, *nɔʔ ‘this’, *nɔʔ ‘what, which’), and a phonologically weak form of one of these could yield proto-Nicobarese *na. To summarize, the present facts prompt us to reconstruct the preferred proto-Nicobarese clausal core as *VS/VPA, plus variants occasioned by topicalization, demotion, and deletion. Subordinate clauses were linked with *na, and other peripheral constituents linked generally with *ta.

References

Arka, I Wayan. 2005. The core-oblique distinction and core index in some Austronesian languages of Indonesia. Keynote paper presented at International ALT VI Association of Linguistic Typology conference, Padang Indonesia, July 2005.

Blench, Roger and Paul Sidwell. 2011. Is Shom Pen a Distinct Branch of Austroasiatic? In Sophana Srichampa, Paul Sidwell & Kenneth Gregerson (eds.) Austroasiatic Studies: papers from the ICAAL4: Mon-Khmer Studies Journal Special Issue No. 3. Dallas, SIL International; Canberra, Pacific Linguistics; Salaya, Mahidol University. pp. 90–101.

Braine, Jean Critchfield. 1970. Nicobarese Grammar (Car Dialect). Ph.D. Dissertation, University of California, Berkeley.

de Roepstorff, Frederik. 1875. Vocabulary of Dialects Spoken in the Nicobar and Andaman Isles. (2nd edition) Calcutta: Superintendent of Government Press.

de Roepstorff, Frederik. 1884. Dictionary of the Nancowry Dialect of the Nicobarese Language, in Two Parts: Nicobarese–English, and English–Nicobarese. Ed. by Mrs. de Roepstorff. Calcutta: Superintendent of Government Press.

Dryer, Matthew S. 2013. Order of Subject, Object and Verb. In Matthew S. Dryer & Martin Haspelmath (eds.), The World Atlas of Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology.

Elangaiyan, Rathinasabapathy et al. 1995. Shompen–Hindi Bilingual Primer Śompen Bhāratī 1. Port Blair and Mysore.

Hopper, Paul. 1983. Ergative, passive and active in Malay narrative. In F. Klein-Andreu (ed.), Discourse perspectives on syntax, 67–88. New York: Academic Press.

Jenny, Mathias. 2015. Syntactic diversity and change in Austroasiatic languages. In Carlotta Viti (ed.) Perspectives on historical syntax. Amsterdam/Philadelphia: John Benjamins. pp. 317–340.

Jenny, Mathias, Tobias Weber & Rachel Weymuth. 2014. The Austroasiatic Languages: A Typological Overview. In Mathias Jenny & Paul Sidwell (eds.) The handbook of Austroasiatic languages. Leiden/Boston: Brill. pp. 13–143.

Kruspe, Nicole D. 2004. A grammar of Semelai. Cambridge: Cambridge University Press.

Man, Edward Horace. 1889. A Dictionary of the Central Nicobarese Language. Reprint 1975, Delhi: Sanskaran Prakashak.

Mason, Francis. 1854. The Talaeng Language. Journal of the American Oriental Society 4: 277–288.

Mithun, Marianna. 2002. An invisible hand at the root of causation: the role of lexicalization in the grammaticalization of causatives. In Ilse Wischer and Gabriele Diewald (eds.) New Reflections on Grammaticalization. Amsterdam/Philadelphia: John Benjamins. pp. 237–258.

Radhakrishnan, R. 1970. A preliminary descriptive analysis of Nancowry. PhD dissertation. Department of Linguistics, University of Chicago.

Radhakrishnan, R. 1981. Nancowry Word, Phonology, Affixal Morphology and Roots of a Nicobarese Language. Carbondale and Edmonton: Linguistic Research Inc. (Published version of Radhakrishnan 1970, with syntax chapter removed).

Rajasingh, V.R. 2016. Mūöt (Nicobarese). Mon-Khmer Studies 45: 14–52.

Sidwell, Paul. 2014a. Car Nicobarese. In Mathias Jenny & Paul Sidwell (eds.) The handbook of Austroasiatic languages. Leiden/Boston: Brill. pp. 1229–1265.

Sidwell, Paul. 2014b. Austroasiatic Classification. In Mathias Jenny & Paul Sidwell (eds.) The handbook of Austroasiatic languages. Leiden/Boston: Brill. pp. 144–220.

Temple, Richard. 1902. A Grammar of the Nicobarese Language. Chapter IV, Part II, The Census Report on the Andaman and Nicobar Islands. Port Blair: Superintendent’s Press.

van Driem, George. 2008. The Shompen of Great Nicobar Island: New linguistic and genetic data, and the Austroasiatic homeland revisited. Mother Tongue 13: 227–247.

Whitehead, George. 1925. Dictionary of the Car-Nicobarese language. Rangoon: American Baptist Mission Press.

Wurm, S.A. and S. Hattori (eds.). 1981, 1983. Language atlas of the Pacific area. (Pacific Linguistics C-66, C-67). Canberra: Australian Academy of the Humanities in collaboration with the Japan Academy.

part 2 Northern Austroasiatic Word Order

chapter 4

Word Order and the Grammaticalization of Gender in Khasian Hiram Ring



Verb-initial order is reported for less than 9% of the 1,377 languages in the WALS word order database (Dryer 2013). This makes such order somewhat unusual among languages of the world, and leads us to question how verb-initial order develops or is lost diachronically. Many of the Khasian varieties prefer verb-initial clauses, a feature they share with Nicobarese lects but not with the rest of the phylum (Munda is largely SOV and AA languages in MSEA are largely SVO in terms of the WALS typology; see Jenny et al. 2015); yet this fact has been hidden from scholarship until recently due to the dearth of information about the Khasian varieties. The current chapter describes the state of knowledge about word order within this group, with the perspective that Khasian word order has interesting implications for the development of various subsystems of grammar such as gender marking in Khasian, and potentially for the history of Austroasiatic as a whole. The remainder of this introductory section presents evidence showing that Khasian varieties are largely verb-initial. While this does not necessarily identify a word order pattern for Proto-Khasian, I propose that there are other subsystems of grammar in these varieties that interact with word order. If we take a parsimonious position that Proto-Khasian was verb-initial or at least head-initial, a pathway becomes evident for the development of gender agreement from gendered pronouns at the Proto-Khasian or Pre-Proto-Khasian stage. If we assume instead a non-verb-initial profile, it becomes more difficult to motivate such a pathway. Accordingly, in §2 I present a brief discussion of gender grammaticalization in Proto-Khasian pronouns, linked with a semantic shift suggested by comparison with a Palaungic language’s pronominal system. I then outline a word-order-dependent hypothesis for how gender agreement developed in Khasian. In §3 I conclude with some comparative observations and suggest future directions for this line of research.

* Fieldwork and research for this chapter were supported in part by an MOE grant (MOE20121-100) at Nanyang Technological University in Singapore, and at the University of Zürich in Switzerland by an SNSF grant (SNSF100015_176264). I am grateful to the editors of this volume and two external reviewers for their insightful comments, as well as for the fruitful discussion and comments from attendees of the first Austroasiatic Workshop in Chiang Mai (2016) and attendees of ICHL 24 in Canberra (2019) which helped to clarify various aspects of this work. Remaining mistakes and omissions are my own.

© koninklijke brill nv, leiden, 2020 | doi:10.1163/9789004425606_006

1.1 Overview of Khasian Varieties1

The label ‘Khasian’ refers to a family of AA varieties spoken in North-East India whose geographically closest linguistic relatives are the Palaungic languages spoken in Myanmar, South-Western China, Northern Thailand, and Laos (Sidwell 2011; Nagaraja et al. 2013). Speakers of the Khasian varieties share many social and cultural practices, and their lects share a large number of lexical items. Their tribal super-group is Khasi, which is a “scheduled tribe” in India, a political designation enshrined in the constitution that has socioeconomic ramifications (for discussion see Kumar 1992; for a specific example see Kapila 2008). The written form of this larger designation is Standard Khasi (SK), which is taught from primary school onward and serves as a language of wider communication in the state of Meghalaya, along with English and, to some degree, Hindi. There is considerable variation among Khasian varieties in terms of grammar and pronunciation, but the shared vocabulary items and the learned bilingualism provided by instruction in SK allow the speakers of these different varieties to communicate and have a sense of shared ethnicity, which factors into (and is influenced by) the desire to maintain a united political front.
Modern Khasian varieties known to exist alongside SK are Pnar, War, Lyngngam, Mnar, and Maram (and perhaps Bhoi).2 While some of these varieties (SK, Pnar) are relatively well documented, for the others it is not fully clear how similar or different they are in relation to each other, and more work needs to be done to properly determine the internal structure of the Khasian (or Meghalayan) branch.3 Significant numbers of War, Pnar, and SK speakers also live in the state of Assam and in the neighboring country of Bangladesh. Other (non-Austroasiatic) languages in the state of Meghalaya with sizable speaker populations are Tibeto-Burman languages: Biate, Karbi, Garo, and Atong; this is important to keep in mind from a diachronic and comparative perspective in order to trace possible contact effects in future work.

The following subsections outline the current state of knowledge regarding Khasian varieties for which there is data,4 in terms of what is known about word order. Below I discuss Standard Khasi (§1.2), Lyngngam (§1.3), Bhoi (§1.4), Pnar (§1.5), War (§1.6) and Mnar (§1.7), with the following caveat. Due to the general lack of data, the example sentences come from sources which vary in terms of subject matter, degree of standardization, and domain. While some authors are explicit in identifying their language data as primarily spoken and resolvable to recordings or transcribed stories, others are not explicit and the majority of their data seems to be from elicited sentences or from normalized texts such as Bible translations.5 This raises various problems which cannot be fully dealt with here and highlights the need for more comprehensive text corpora based on transcribed speech in comparable genres and domains. Even with the existing spotty data, however, we can identify some common patterns and tendencies in the Khasian family. A general observation is that while word order is somewhat variable in each lect, each also seems to have a ‘basic’ word order in main clauses which follows a head-initial pattern. Variation in this word order also follows a common pattern in the group whereby new or ‘topical’ information is given utterance-initially.

1 The designation ‘language’ in the Khasian context is politically sensitive, even as a technical linguistic term. I therefore use the terms ‘variety’ and ‘lect’ somewhat interchangeably in this work to describe Khasian varieties.
2 The linguistic definition of the “Bhoi” in Re-Bhoi district of Meghalaya state is unclear. Nagaraja (1993) gives a few sentences, but Dikshit & Dikshit (2014: 361) highlight that they may be speakers of a Tibeto-Burman language (Karbi/Mikir) who have adopted Khasi cultural practices. Of two different “Bhoi” speakers I met, one indicated that Bhoi was a Khasi variety and one indicated that it was Karbi.
3 The current paper attempts to include most of the recent linguistic literature with example sentences available for the Khasian group, but note that Grierson (1904) only includes four varieties of “Khassi” [sic] languages in his survey: Standard Khasi, Pnar, War, and Lyngngam. Grierson’s data includes a transcription of two texts for each of these varieties, amounting to about 41 sentences each, which provide a point of comparison with the modern languages that is unfortunately outside the scope of the current paper.
4 There is currently no published data on Maram, for example, though recordings are in the process of being transcribed and translated by the author.
5 In the relevant subsections I have tried to provide contextual information regarding the various authors’ data sources, where available.

1.2 Standard Khasi Word Order

Standard Khasi is the Khasian variety most frequently described, with speakers located primarily in central and southern Meghalaya around the state capital of Shillong. As noted above, it is an administrative language of Meghalaya and one that children across the state learn in school. It is also the only variety with an official Bible translation. The primary linguistic work on SK is by Rabel (1961), followed by Nagaraja (1985 and subsequently). Rabel’s work was conducted in California through native speaker consultants from Cherrapunji (Sohra) and contains several glossed texts with roughly 250 sentences. Nagaraja’s work has been conducted largely in Shillong through native speaker consultants. He includes two texts in his 1985 grammatical description, amounting to 81 sentences, and his subsequent papers contain further examples.

Interestingly, there is variation in what is reported for SK word order. To start with the most recent source on SK, Nagaraja (1993)6 agrees with earlier work in demonstrating that SK tends to be SV in intransitive clauses (1) and AVP in transitive clauses (2).7 This is true whether the arguments are pronouns or full nouns (3), although note in the latter case the occurrence of a repeating subject pronoun pre-verbally.8

(1) u leyt
    3sg.m go
    ‘he goes’ (Nagaraja93-S1)

(2) u la ay ya ka=kot ha ka
    3sg.m past give obj f=book to 3sg.f
    ‘he gave the book to her’ (Nagaraja93-S3)

(3) ka=kɨnthey ka la leyt kloykloy
    f=woman 3sg.f past go slowly
    ‘the woman went slowly’ (Nagaraja93-S4)

In Nagaraja (1985), we find some word order variation in the included texts, such as in the story of ‘Nohkalikai’, about a well-known waterfall near Cherrapunji. In sentences 4 and 6 of the story (examples 4 and 5 below) an (intransitive) existential/copula clause is verb-initial (a common cross-linguistic tendency of “thetic” expressions), while in (5) the order AVP occurs in the dependent/relative clause ki šañ bad thaw ki=tyarnar ‘they forged and made iron implements’.

(4) la don taŋ ki=mawkɨnroʔ bad ki=noŋrim ka=noŋjaʔ
    past exist only pl=wall.stones conj pl=foundation f=Nongjah
    ‘There are now only walls and foundations of Nonjah’ (Nagaraja85-A2.1–4)

(5) ka la loŋ ka=šnoŋ ha ka ba [ki šañ bad thaw ki=tyarnar]
    3sg.f past be f=village at 3sg.f nml 3pl forge conj make pl=iron.tool
    ‘it was a village in which they forged and made iron implements’ (Nagaraja85-A2.1–6)

6 Nagaraja (2014) is largely a summary of earlier data, so here I cite the earlier work.
7 I here use ‘A’ to refer to the primary or most necessary argument of a transitive verb, and ‘P’ to refer to the secondary or least necessary (typically a semantic patient). ‘S’ refers to the single argument of an intransitive verb, and there is a clear alignment between S and A, such that the Khasian languages can be considered to have a ‘subject’.
8 Language examples use orthographic conventions of the respective authors, except in places where I clarify word/clitic boundaries and normalize glosses. In some cases this means (unfortunately) that it may be difficult for readers to resolve letters to actual underlying sounds, as for example with the character y, which in orthographies of various authors can be used to indicate central vowels (most often [ə]) or the palatal approximant [j] or a glottal stop [ʔ]. Full normalization awaits more access to native speaker consultants. Text and line/example numbers are given in brackets in the free translation line. Abbreviations largely follow Leipzig Glossing conventions, with AA language differences (see icaalprojects/aa‑book‑project/notational‑conventions).

Nagaraja’s analysis does not describe alternate word orders beyond topic fronting.9 In contrast, Rabel’s earlier analysis sets aside a section to describe these alternations, stating:

Often the verb and the subject are inverted. This may happen in major clauses introduced by /haŋta/ [‘then’] and other connectives and it may happen in minor clauses introduced by [other adverbs of time or manner]
Rabel 1961: 126

In example (6) below, the second example of her section 512, this is clearly seen, with haŋta ‘then’ introducing the clause and both arguments ‘the mouse’ (A) and ‘Naam’ (P) following the verb, in that order. (6) hangta la ˀoŋ ˀii=khnaay ya ka=Naam then past say dim=mouse to f=Naam ‘Then said the (little) mouse to Naam …’ (Rabel61–512.2) On the same page she notes that “following a minor clause, the major clause may have inverted word order” as in example (7).

9 ‘Topic fronting’ refers to a process whereby topical/focus elements (referents or verbs) are spoken first in order to highlight them, occurring at the ‘front’ of a clause. This is quite common in languages of the world, and also occurs in the other Khasian varieties.



(7) katba ka naŋ khreʔ šet, ˀii la wan ˀii=wey ˀii=khnaay
    while 3sg.f prog prepare cook dim past come n=one n=mouse
    ‘While she was preparing to cook, a mouse came’ (Rabel61–512.6)

The verb-initial word order following adverbial constructions in SK is clear from Rabel’s texts. Example (8) is taken from a traditional story, with the verb ‘be pure’ being followed by the main argument ‘world’. Example (9) is taken from a personal story and the verb ‘be astonished’ is followed by the main argument ‘parents’ (an elaborate expression).

(8) haba daŋ hɔk ka=pərthej
    when cont be.pure f=world
    ‘When the world was still pure …’ [Rabel61-TFOTK_01]

(9) lŋ̩ŋɔʔ sa ki=kmi ki=kpa
    be.astonished pl=mother pl=father
    ‘the parents (were) astonished’ [Rabel61-LWC_13]

An example from another story in Rabel’s data illustrates that full nouns may also follow verbs in relative clause constructions (10). Here, while the translation and the punctuation make it clear that this is a single complex construction, it is not completely clear whether the relative clause is ka ba la sngew shyrkhei ‘(sth.) which sounded terrible’ or simply ka ba la sngew ‘(sth.) which was heard’. If the former, this is an example of the head noun (ka=jinguot ‘the groaning’) following its dependent. If the latter, shyrkhei ka=jinguot ‘the groan was terrible’ (the second part of the complex construction) is another example of a verb-initial stative construction.

(10) ka ba la sngew shyrkhei ka=jinguot, …
     3sg.f rel past hear be.terrible f=groaning
     ‘… which sounded terrible the groaning, …’ [Rabel61-DNHAFC_37]

These examples from Rabel suggest that having main arguments follow the verb in SK is common in certain contexts. While the primary pattern is for AVP or SV word order, stative verbs or verbs that occur after time adverbials can have arguments follow them, with VS or VP word order. Example (6) shows a clear VAP order in SK. A count of word orders from Rabel’s story texts (as opposed to conversation) shows that verb-initial clauses are roughly 10% of 179 sentences and always either follow adverbials (of time/manner), occur with stative or intransitive verbs, or are idiomatic expressions (of which verb-initial patterns are the primary form).



This highlights an apparent difference between the data gathered by Rabel and that gathered by Nagaraja, indicating that either the two researchers worked with different varieties of SK (possibly Sohra vs. Shillong) or that there may be some variance in Nagaraja’s data that was not adequately dealt with. Lacking access to text corpora, the latter conjecture cannot be tested and the other possibilities await further information. What can be said, however, is that Standard Khasi word order is somewhat variable, with S, A, and P arguments both preceding and following the verb depending on clause type and potentially other (as-yet unknown) constraints.

1.3 Lyngngam Word Order

Lyngngam speakers are mainly located in western Meghalaya and the variety is distinctive within Khasian in part because (at least based on modern data) it does not seem to have classificatory gender clitics as the other Khasian varieties do.10 This is a characteristic shared with speakers of A’tong, Garo, and other TB languages with whom Lyngngam speakers are in close geographical proximity and have some cultural/marriage relations, though whether such contact has led to loss of gender marking cannot be determined at this time. Lexically, however, Lyngngam is quite close to Khasian (see Nagaraja et al. 2013), and initial inquiries by the author indicate that there are affinities with western Khasi varieties such as Maram. Textual data on Lyngngam is limited to Nagaraja (1996) and primarily unpublished data (van Breugel 2015; 2016 and p.c.; Baker 2013 and p.c.).

Lyngngam word order is reported by Nagaraja to be SV/AVP (11), and this is confirmed by van Breugel in simple matrix clauses (12), though there is variation in questions (13–14). In the case of questions this variation may be driven by other factors, with some elements potentially being extra-clausal. Unfortunately there is not enough textual data for such variation (or potentially other variation, such as might be expected in subordinate or adverbial clauses) to be accounted for.

(11) nə di-laʔ
     1sg go-past
     ‘I went’ (Nagaraja96-L9)


10 In Grierson’s texts (1904: 20–22) there are gender clitics on at least some nouns that mirror those of the other varieties, but Nagaraja (1996) states that there is no gender marking. Paul Sidwell (p.c.) notes that fossils of clitics seem apparent in at least some words, e.g. gum ‘water’ (Lyngngam) < ka=um (Khasi, Pnar).



(12) ny wan ʔam chahlang
     1sg come abl Shallang
     ‘I come from Shallang’ (vanBreugel)

(13) diʔ tynyt mi
     go where 2sg.m
     ‘Where are you going?’ (vanBreugel)

(14) mi hyk symyt s-ny
     2sg.m call why to-1sg
     ‘Why do you call me?’ (vanBreugel)

1.4 Bhoi Word Order

Nagaraja (1993) aims primarily to make data on Bhoi, spoken in the northern part of Meghalaya, accessible, and provides 22 sentences. Nagaraja (2018) is a second source of data, with 29 sentences/clauses. The data provides clear instances of verb-initial sentences, such as in examples (15) and (16). Here the primary order for pronouns is VS/VAP. If the subject is a full noun the order seems to shift to SV (17) or AVP (18); in the latter (transitive clause) case the post-verbal subject pronoun is retained.

(15) ley ŋa
     go 1sg
     ‘I go’ (Nagaraja93-B1)

(16) laʔ ay u ka=kot ha ka
     past give 3sg.m f=book to 3sg.f
     ‘he gave the book to her’ (Nagaraja93-B2)

(17) ka=kanthey ley panchayt
     f=woman go quickly
     ‘the woman goes quickly’ (Nagaraja93-B4)

(18) u=ksaw laʔ beʔ u ha ka=myaw
     m=dog past chase 3sg.m to f=cat
     ‘the dog chased the cat’ (Nagaraja93-B3)

These patterns are also attested by the examples in Nagaraja (2018), such that Bhoi verb-initial sentences occur with pronouns and not with full nouns. A caveat here is that it is not clear from the sources whether post-verbal subject nouns are permitted; future work could seek to determine this.



1.5 Pnar Word Order

Pnar is the subject of an MA thesis (Choudhary 2004) and two PhD dissertations by Bareh (2007) and Ring (2015). Ring’s textual data consists of 2,739 sentences transcribed from folktales and narratives,11 on the basis of which Ring (2015: 23) states that Pnar has “a VAO/VS basic constituent order.” Example (19) shows verb-initial word order in an intransitive clause, and (20–21) in transitive clauses.

(19) laj u
     go 3sg.m
     ‘He goes/went’ [Ring15PP05KO_013]

(20) khut u=siŋ u=nik
     call m=Singh m=Nik
     ‘Singh called Nik’ [Ring15KP_009]

(21) i=ji u=e mi ja ŋa
     n=thing nf=give 2sg.m ben 1sg
     ‘what will you give me?’ [Ring15PP05KO_044]

Variations in word order in Pnar, as in SK, are primarily related to pragmatic function, with fronting of arguments only occurring for the purpose of highlighting topical information. Ring (2015: 90–91) illustrates this with the following two corpus examples (22–23) where the first is the unmarked realization and the second is the ‘topic-fronted’ marked realization.

(22) jap [u=wɔʔ kiaŋ naŋbaʔ]S
     die m=hon Kiang Nangbah
     ‘Mr. Kiang Nangbah died’ [Ring15KNI_006]

(23) [u=wɔʔ kiaŋ naŋbaʔ]S.TOP jap uS
     m=hon Kiang Nangbah die 3sg.m
     ‘Mr. Kiang Nangbah, he died’ [Ring15-KNI_010]


11 The complete dataset is available for download as interlinear glossed texts and linked audio (Ring 2017).



1.6 War Word Order

The War variety is spoken along the southern and south-eastern slopes of the Meghalaya plateau, and the primary recent researcher has been Anne Daladier. Of her recent work, Daladier (2005) and Daladier (2011) contain 84 sentences in total. Weidert’s (1975) grammatical description of the Amwi variety of War is an earlier source (in German), primarily concerned with phonology, morphology, and lexicon, and written within the tagmemics framework in a manner that is sometimes difficult to follow. He makes no explicit mention of constituent order, but does include two glossed texts (121 sentences), from which information can be extracted. Examples from these texts are given in (24–27) with glosses in English.

The majority of clauses in Weidert’s data are verb-initial, though there is variation. In (24) the ‘3pl’ pronoun jə follows the verb ‘say’ and the verb ‘come’. It could be argued here that both clauses are subordinated, the first with the conjunction ‘but’ and the second with the relative marker bə. However, a similar pattern emerges for example (25), where jə follows the verbal constructions for ‘win’ and ‘conduct war’. This pattern is also seen in (26) with the ‘1pl’ pronoun, and (27) shows that a full noun ʔu khlɛn ‘snake’ can immediately follow the verb as the agent of ‘bite’.

(24) hnrej chu ʔɔŋ jə bə-ə wan jə di liaŋ ʃlɔʔ ʃɛ̃
     but just say they that come they to side come.out day
     ‘It is only said that they came from the East.’ (Weidert75-Text1.04)

(25) cɔp mə cɔp jə kat chaʔ lə jaʔliaʔ thmi jə
     win and win they as nearby go conduct war they
     ‘They won wherever they went to war’ (Weidert75-Text1.12)

(26) jaʔlə beʔ ʃkɛ ʔi hnthlɛ cuprəw ʔi
     go.together hunt deer we seven people we
     ‘We went hunting for deer with seven men’ (Weidert75-Text2.02)

(27) tə ʔɔŋ jə, ʔə hit ʔu khlen
     so say they pres bite the snake
     ‘They said, “The serpent bit him (the dog).” ’ (Weidert75-Text2.18)

In Daladier’s (2005; 2011) data there is also variation in word order, and representative examples are given below. In (28) the noun ʔu=hun ‘boy’ occurs pre-verbally and a post-verbal pronoun refers to the same referent, while in examples (29) and (30) all arguments occur in post-verbal position.



(28) ʔu=hun bɔ tsi ʔu
     m=boy eat rice 3sg.m
     ‘the boy eats rice’ (Daladier05-4)

(29) e ngem u ti ka=ˀam
     dcl immerse 3sg.m in f=water
     ‘he immersed in the river’ (Daladier11-3.2-11)

(30) themphue ke di u=kwijang
     greet 3sg.f inst m=necklace
     ‘she greeted (him) with a necklace’ (Daladier11-3.2-39)

This data indicates that War constituent order is primarily verb-initial with either pronouns or full nouns following the verb, but with variation that can be explained as ‘topic-fronting’ or highlighting of arguments for pragmatic purposes, similar to Pnar.

1.7 Mnar Word Order

Mnar is spoken in northern Meghalaya along the border with Assam. The single publication with grammatical data that exists for Mnar (otherwise known as Jirang) is a short paper by Koshy & Wahlang (2010) with a 39-sentence text (the biblical story of the prodigal son) included as an appendix. They state (p. 157) that “the Basic Word Order of Mnar is SVO but there are possible indications that at the deep structure level it is VSO.” They suggest that all SVO sentences may be ‘derived’ from underlying VSO structures like example (31) and give example (32) as evidence that VS/VAO structure is quite natural in Mnar.

(31) liʔ iːet u=ɟɔn ha ga=meri
     past love m=John acc f=Mary
     ‘John loved Mary’ (KoshWah10-7)

(32) da liʔ pɨnlut u badkleŋ ga=thai aʔ u, laʔ
     after past finished 3sg.m all f=nom poss 3sg.m come
     waʔ i=mi ga=nemsniau imoʔ ŋa=tu ŋa=chnoŋ
     conseq n=one f=famine all f=dist f=place
     ‘After he had finished everything, a famine came to the place’ (KoshWah10-8)



This claim seems to be supported by the textual data they include, but more than 39 sentences would be necessary to fully confirm or reject these claims, and further investigation awaits a more comprehensive text corpus.12

1.8 Summary

So far, we have looked at basic constituent order in the six Khasian varieties for which data is available. We find that there is quite a bit of variability in word order, yet for particular lects a dominant word order can be identified, and the order may change depending on whether the referent is a pronoun or a noun. Out of the varieties investigated here, SK and Lyngngam have a primary word order of SV/AVP. For Lyngngam, verb-initial structures only turn up in interrogative constructions (see 13), and for SK verb-initial structures are only observed (by Rabel 1961) following time adverbials, in dependent clauses, and in interrogatives.13 Bhoi, Pnar, War, and Mnar are largely verb-initial, and their word order variation can be understood as conditioned by pragmatics, with initial sentence position used to highlight or focus referents for various pragmatic purposes (salience, new information, switch reference; the exact uses of this initial position within discourse in these varieties have yet to be fully worked out). These observations are summarized in Table 4.1 where ‘Y’ indicates that the pattern occurs, and ‘+’ and ‘-’ indicate that the pattern is major or minor, respectively. Crucially, all of the lects examined here allow multiple orders of referents with respect to the verb, which makes it difficult to reconstruct a particular word order simply on the basis of the frequency with which particular orders occur in main clauses.

Although there is not enough data for most of these varieties to make a full assessment of the constraints on such realizations, we still need to consider whether some varieties changed from verb-initial order (attested in Pnar/War/Bhoi/Mnar), from verb-medial order (attested in SK and Lyngngam), or from verb-final order (unattested in modern Khasian varieties). If we assume a similar time depth for the branching of all six varieties from the proto language, the most parsimonious explanation is that Proto-Khasian was largely verb-initial in main and subordinate clauses, and that SK and Lyngngam changed to verb-medial in main clauses. This explanation also seems to make the most sense for understanding the grammaticalization of other subsystems in the Khasian family. To illustrate this, in the following section I focus specifically on the grammaticalization of gender agreement in the Khasian group.

Table 4.1 Khasian varieties and verb-initial structures

Variety          SV/AVP   VS/VAP   Notes
Standard Khasi   Y+       Y-       V-initial limited to cop/dep/adv/int clauses
Lyngngam         Y+       Y-       V-initial may be limited to interrogatives
Bhoi             Y-       Y+       AVP limited to full nouns (new info?)
Pnar             Y-       Y+       AVP always require post-V pronoun
War              Y-       Y+       AVP always require post-V pronoun
Mnar             Y-       Y+       AVP always require post-V pronoun

12 Paul Sidwell has provided me with some of Harry Shorto’s notes on Mnar, which were collected when Shorto visited Weidert in Shillong in 1978. These include 4 pages of Mnar sentences; unfortunately there was not enough time to analyze this data for the current work.

13 There are also indications of verb-initial order being the default for copula/existential clauses, though this is crosslinguistically common, as noted above; consider expressions like “There is a bird on the tree” where “there is” could be considered an existential verbal construction (which is how it is encoded in many languages, including the Khasian varieties).
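The parsimony reasoning in §1.8 can be made concrete with a toy count. The sketch below is an illustration only and not part of the original analysis: it assumes a star-shaped tree (all six varieties branching at a similar time depth, as the text stipulates), reduces each variety to the dominant main-clause order reported in §1.8, and counts how many varieties must have changed under each candidate proto-order. The names `DOMINANT_ORDER`, `changes_required`, and `most_parsimonious` are hypothetical.

```python
# Toy parsimony count. Assumptions (for illustration only): a star phylogeny
# and one dominant main-clause order per variety, as summarized in section 1.8.
DOMINANT_ORDER = {
    "Standard Khasi": "verb-medial",
    "Lyngngam": "verb-medial",
    "Bhoi": "verb-initial",
    "Pnar": "verb-initial",
    "War": "verb-initial",
    "Mnar": "verb-initial",
}

CANDIDATE_PROTO_ORDERS = ["verb-initial", "verb-medial", "verb-final"]


def changes_required(proto_order, attested=DOMINANT_ORDER):
    """Count the varieties whose dominant order differs from the proto-order."""
    return sum(1 for order in attested.values() if order != proto_order)


def most_parsimonious(candidates=CANDIDATE_PROTO_ORDERS):
    """Return the candidate proto-order that requires the fewest changes."""
    return min(candidates, key=changes_required)


if __name__ == "__main__":
    for candidate in CANDIDATE_PROTO_ORDERS:
        print(candidate, changes_required(candidate))
    print("fewest changes:", most_parsimonious())
```

Under these assumptions a verb-initial proto-order requires two changes, a verb-medial one four, and a verb-final one six. The count deliberately ignores the fact that every variety allows more than one order, so it illustrates the shape of the argument rather than testing it.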


2 Word Order and Grammaticalization in Khasian

The constituent order in Khasian varieties has important implications for the development of various grammatical features of these lects. One prominent feature in this respect is the existence of a gender agreement system, which is not found in other Austroasiatic languages. Khasian varieties (with the possible exception of Lyngngam) very clearly show sex-based gender agreement within the noun phrase and with pronouns, as in (33). In this Pnar example the feminine clitic ka= marks wi ‘one, an’ as agreeing with the noun kn̩thaj ‘woman’, while the stative verb ‘be old’ serves as a property concept modifying the same noun.

(33) ɛm jap ka=wi ka=kn̩thaj tm̩mɛn
     have die f=one f=female be.old
     ‘there was one old woman died’ [Ring15-LS3J_007]

For the origin of these gender morphemes we need look no further than the pronominal paradigms. In §2.1 I present an example of the current Khasian pronominal system in relation to other AA languages and (through comparison with a Palaungic language) suggest a means by which a semantic shift toward gendered pronouns might have taken place. In §2.2 I take an existing gendered pronominal system as the basis of gendered marking within the noun phrase and identify the possible role of word order in the development of the agreement system.

Table 4.2 Personal pronouns in Khasian (Pnar)

Person   sg, Masc.   sg, Fem.    pl (gender N/A)
1st      ŋa, ɔ (no gender distinction)   i
2nd      me, mi      pha, phɔ    phi
3rd      o, u        ka, kɔ      ki

2.1 Development of Gender from Pronouns

Table 4.2 illustrates a typical Khasian pronominal system, as found in Pnar, with three persons, two numbers, and three genders in third person. The noun gender clitics are u= ‘masculine’, ka= ‘feminine’, and i= ‘neuter, diminutive’, with no gender distinction in plural, which is marked by ki=. Clearly these forms resemble the third person pronoun forms, which are likely the source. It is important here to note that sex-based (or ‘natural’) gender is not a feature of the pronominal paradigms of most other AA languages, nor is it reconstructed for any AA branch. This means that we must first explain how a sex-based distinction became a prominent part of the Khasian pronominal system before developing a hypothesis regarding the grammaticalization of gender agreement in the Khasian noun phrase.

With a careful comparison of pronominal systems in AA languages (Table 4.3; collated from descriptions in Jenny & Sidwell 2015, Diffloth 1994, Shorto 2006, Phillips 2012) a hypothesis of how sex-based gendered pronouns developed becomes relatively clear. While gender is almost non-existent in other AA pronominal systems (with the marginal exceptions of Semaq Beri [Aslian], Koho Sre [Bahnaric], and Kammu [Khmuic]), there is at least a three-number system (singular, dual, plural) and/or clusivity (inclusive, exclusive) represented in the pronouns of most language families within AA. This may be why Diffloth (1994) reconstructed Proto-Mon-Khmer pronouns with clusivity and sg/pl, while Shorto (2006) was apparently in the process of reconstructing Proto-Mon-Khmer with clusivity and a three-way sg/dl/pl number distinction in pronouns.
Table 4.3 Summary of pronouns in Austroasiatic languages
[Table body not recoverable: the ‘+’ and ‘(+)’ cells have lost their row and column alignment.]
Columns: Person (1, 2, 3); Clusivity (inc, exc); Number (sg, du, pl); Case (s/o); Gender (m, n); Politeness (pol, imp, frm, inf).
Rows: MK: Proto MK [D], Proto MK [S]; Mu: Santali, Gorum; As: Proto Aslian, Semaq Beri; Mo: Old Mon, Modern Mon; Pe: Chong; Kh: Old Khmer, Mod. Khmer; Ba: Koho Sre; Ka: Kui Ntua; Vi: Vietnamese; Km: Kammu; Ma: Bugan; Pa: Dara’ang Palaung, Danau; Ks: Khasi, Pnar; Ni: Nicobarese.

I propose that gender in the pronominal systems of AA languages where it does exist arose from a reanalysis of number/clusivity in the Proto-AA pronominal system, becoming associated with biological sex. In fact, this seems to be exactly what is happening with the pronominal system of Man Noi Plang (Table 4.4), a Palaungic language spoken in South-Western China and Northern Thailand. Lewis (2008: 20) notes that “The third person dual [ka], rather than the third person singular, is always used to refer to a female who has had children.” This suggests a possible shift whereby 3dl pronoun forms in (Pre-)Proto-Khasian were increasingly associated with female sex and 3sg forms were increasingly associated with male sex, becoming the 3sg.f and 3sg.m pronouns, respectively. A comparison of Man Noi Plang third person pronominal forms ʔɤn, ka, kɛ ‘sg, dl, pl’ with those of Pnar (u, ka, ki ‘sg.m, sg.f, pl’) is striking in this regard.14



14 Compare also Pnar mi ‘2sg.m’ / pʰa ‘2sg.f’ / pʰi ‘2pl’ and Man Noi Plang mi ‘2sg’ / pa ‘2dl’ / pɛ ‘2pl’. Sidwell (2015) notes that Man Noi Plang ɯ and ɛ are reflexes of Proto-Waic *uː and *iː respectively.

Table 4.4 The Man Noi Plang pronominal paradigm (Lewis 2008)

         Number
Person   sg     dl    pl
1        ʔɯ     ʔa    ʔɛ
2        mi     pa    pɛ
3        ʔɤn    ka    kɛ

2.2 Development of Agreement via Word Order

The facts above give us a working hypothesis of how gendered pronouns have developed in the Khasian family from an earlier pronominal system with expanded number. Given that Khasian varieties are spoken in an area into which Indo-Aryan languages also migrated, this shift could have been partially motivated or encouraged by contact with Indo-Aryan languages that had productive gender systems. It is still not clear, however, how Khasian gender agreement may have arisen. This is where an understanding and reconstruction of constituent order may play a pivotal role. There are several elements to the role of constituent order here, and below I first consider the position of the pronoun vs. the position of the gender clitic within clauses in Khasian (§2.2.1). I then look at how the processes of compounding and reduplication function with regard to ‘elaborate expressions’ (§2.2.2) and observe the use of gender clitics as nominalizers (§2.2.3) in these varieties.

2.2.1 Position of Pronoun and Gender Clitics in Khasian

There are several observations to be made regarding the order of pronouns and gender clitics in Khasian clauses, namely: 1) the position of the pronoun in relation to the verb, 2) the position of the pronoun in relation to relative/subordinate clauses, 3) the position of the gender clitic in relation to the noun or noun phrase, and 4) the position of the pronoun in relation to quantifier phrases. The Khasian varieties observed here follow a head-initial or head-modifier typological profile (see Tesnière 1959; Greenberg 1963). This means that adverbs follow the verb they modify, modifiers of nouns follow nouns, and so on. Following from a reconstruction of Proto-Khasian as primarily head-initial, variations in this order may signal change from an earlier head-modifier alignment.


The Position of the Pronoun in Relation to the Verb

In most Khasian lects described in §1 (with the exception of SK and Lyngngam), the subject pronoun (S or A) occurs following the verb (or verb complex). In cases where a full noun identifies the subject, it either replaces the pronoun in the subject slot or, if preceding the verb, takes a ‘topical’ focus slot while the pronoun remains in the subject slot immediately following the verb.

The Position of the Pronoun in Relative/Subordinate Clauses

As Khasian lects tend to follow a head-modifier profile, full pronouns precede relativized elements even in SK and Lyngngam (see example 10 above, repeated below as 34). In the case of subordinate clauses or relativized/nominalized verbs serving as NP modifiers, the pronoun serves as a head which references the gender of the element being modified. This means that relative clauses resemble matrix clauses in these varieties, but with the addition of a relativizer (ba in the SK example below) immediately after the pronoun and before the verb complex.

(34) ka ba la sngew shyrkhei ka=jinguot, …
     3sg.f rel past hear be.terrible f=groaning
     ‘… which sounded terrible the groaning, …’ [Rabel61-DNHAFC_37]

The Position of the Gender Clitic in Relation to the Noun or Noun Phrase

In all Khasian languages that have gender marking, the gender marker precedes the noun. If these languages follow a head-modifier pattern, then in some sense the gender marker may be more salient than the more contentful noun which follows. In terms of agreement and in terms of information structure it is possible that the categorization of a noun by gender allows the interlocutors to more quickly identify grammatical relationships within the clause/sentence (see Lew-Williams & Fernald 2007, 2010; Ferrer-i-Cancho 2017). This, combined with observation #2, indicates a means by which a clausal combination of pronoun-noun (head-modifier) could have become reanalyzed as a single constituent (gender_clitic-noun), though other motivations, including direct calques through borrowing or otherwise, cannot be ruled out at this time.

The Position of the Pronoun in Relation to Quantifier Phrases

In Khasian phrases with enumeration or quantification (which often require classifiers), the pronoun/gender marker often precedes the numeral, but in some cases it can be left off altogether (though it is currently unclear what motivates this difference in structure). This is unlike demonstrative phrases, in which the same gender clitic is required to mark both the deictic and the noun to which it refers. The variable realization in quantifier phrases suggests that quantification and numeral classifiers may serve a similar function as gender marking, namely to identify salient classificatory properties of the referent.15

Summary of Pronoun Order Properties

The fact that subject pronouns follow the verb in the majority of Khasian lects while gender clitics precede the noun which they mark leads to some possibilities of historical development. One such possibility is that the gender clitics were originally the pronoun head of a head-modifier construction, similar to the pronoun in current Khasian relative/attributive clauses. In this scenario, over time, the gender pronoun became increasingly associated with a classificatory function and combined prosodically with the following noun, similar to how compounds are formed via prosody in Khasian varieties. Further evidence supporting this possibility comes from elaborate expressions in Khasian (§2.2.2) and the use of NP-agreement gender clitics for nominalization functions (§2.2.3).

2.2.2 Elaborate Expressions in Khasian

The term ‘elaborate expression’ refers (since Haas 1964) to sets of words that are “intermediate in structure between ordinary compounds and reduplications … a compound containing four (usually monosyllabic) elements, of which either the first and third or the second and fourth are identical” (Matisoff 1973: 81–82; see also Haas 1964; Solnit 1995). In Khasian varieties such as Pnar these kinds of words are identified as part of an important set of kten kn̩nɔʔ or ‘sounding words’ which include other lexical items termed ‘expressives’ in the literature.
The linguistic term ‘expressive’ was used by Diffloth (1976; 1979; apparently following Jakobson 1960) to describe phonaesthetic form pairs (from Henderson 1965) containing onomatopoeia and sensory information to express intensity and colour.16 These distinctive lexical forms are found throughout AA languages and the term has been applied to other languages of South-East Asia for words that seem to similarly exhibit iconic properties. In Khasian lects, there are various types of expressives and elaborate expressions, which are exemplified (for Pnar) by Ring’s (2015: 189) Table 9.1, here Table 4.5.

For the present discussion, elaborate expressions are of greater interest than expressives. This is because elaborate expressions in Khasian varieties involve multiple grammatical morphemes: either gender clitics or case/mood particles that occur along with nouns/verbs. The use of multiple morphemes of different lexical types for productive compounding, as found in such elaborate expressions, opens the door for re-analysis of the functions of the morphemes involved, which I propose is what happened for these gender markers in noun phrases. What makes this scenario more compelling is the way that gender clitics are also involved in nominalization.

Table 4.5 Some Pnar expressives and elaborate expressions

Expressive         Gloss
nɔk nɔk            ‘limp’
khrot̪ khrot̪        ‘perfectly ripe’
tɛr tɛr            ‘etcetera’
lat̪-lod̪            ‘be free’
ml̩lu ml̩la          ‘such and such’
awri awra          ‘argue’

Elaborate expression    Gloss
i=pr̩thaj i=mn̩dɛr        ‘earth, world’, ‘all creation’
ki=mrad ki=mrɛŋ         ‘herbivores, omnivores’, ‘all animals’
u=sɛr u=skaj            ‘sambar deer, barking deer’, ‘deer’
ka=khnam ka=rn̩teʔ       ‘arrow, bow’, ‘archery’
da ʧat̪ da khiaʔ         ‘be healthy, be whole’, ‘well-being’
da piaʔ da pra          ‘break, scatter’, ‘break to pieces’

15 The function of numeral classifiers in Chinese may be instructive here: Cheng & Sybesma (2005) state that classifiers in Chinese are equivalent to a definite article, while Wu & Bodomo (2009) disagree. Chen (2004) suggests an alternate analysis, that classifiers are instead related to “identifiability”, a pragmatic communicative property. In all cases the assumption seems to be that classifiers serve to constrain reference of the noun phrase (one of the properties of definite markers in English), which is also the case for gender markers in Khasian.

16 It is worth noting that Banker (1964; Bahnar) and Watson (1966; Pacoh) referred to this lexical phenomenon under the term “descriptives”, while for Rabel (1961; 1976; Khasi) these were termed “ideophones”.

2.2.3 Gender Clitics as Nominalizers in Khasian

Ring (2014) states that gender clitics serve nominalization functions in Pnar when preceding verbs. The masculine gender clitic u= forms a purposive nominal or non-finite state from a verb (35), the feminine gender clitic ka= gives a resultative noun (36), and the neuter gender clitic i= creates an action nominal (37).

(35) biaŋ i=pn̩thɔr u=rɛp u=khiʔ
     enough n=farmland nf=farm nf=work
     ‘enough farmland to farm, to work’ [Ring15-PP04SKO_044]

(36) he-i=ʤoʔ i=pɔr man kɔ ka=khiʔ
     loc-n=same n=time happen 3sg.f res=work
     ‘at the same time it is work’ [Ring15-AIJ_072]



(37) i=ni hɛʔ i=khiʔ jɔŋ i
     n=prox only n=work gen 1pl
     ‘this is our only work’ [Ring15-AIJ_013]

Example (35) presents a nominalized elaborate expression u=rɛp u=khiʔ ‘to farm, to work’, which is formed from the respective verbs for ‘farming’ and ‘working’. The use of pronominal forms in conjunction with verbs or nouns to form compounds in this manner may thus have started simply to denote the referential status of the head, and could then have been re-analyzed as serving a classificatory function. When we further consider that a pre-clausal (or pre-head) slot is reserved for ‘topic’ arguments in all the Khasian varieties, it is not too far a stretch to think that a similar pattern used for referential denotation on nouns/verbs in elaborate expressions could have then been extended to nouns/verbs in prosaic language, giving rise to gender agreement.

2.3 Summary

The preceding discussion has presented the following multi-stage hypothesis of gender development:

1) Gender developed in the Proto-Khasian pronominal paradigm via reanalysis of a singular/dual/plural pronominal paradigm through a semantic shift associating dual number with female sex.
2) Proto-Khasian being primarily head-initial, pronouns were used as referential heads with content nouns or modifying elements (such as attributives) following.
3) Over time, and through reanalysis of their functional usage with regard to such creative lexicon as elaborate expressions, gendered pronouns cliticized to the following nouns and became bleached of their head status, beginning to mark other elements of the noun phrase as general referential markers and verbal nominalizers.

This development can be schematized as in Table 4.6, which is formulated slightly differently, such that the acquisition of pronominal gender occurs after the referential usage of the pronoun as head of a relative/attributive clause.
Stages one and two in the summary above may in fact have happened at the same time; since this is a continuing matter for investigation I do not currently make a strong claim for one stage to precede the other. Such a schematization outlines a series of interlocking hypotheses and stages for the development of gender marking in the Khasian lects. Importantly, positing a general verb-initial or head-initial order in Proto-Khasian gives us a more parsimonious explanation for the development of gender agreement as well as for the development of word order patterns in the different Khasian varieties. If Proto-Khasian was not verb-initial, we then have to explain or motivate a change to verb-initial order in the majority of the Khasian varieties for which there is data. Further, we would need to explain how the gendered referential pronoun developed a pre-nominal position (and became a clitic in all varieties) while the subject pronoun has a post-verbal position in the majority of lects.

Table 4.6 Schematic examples of gender marking development in Khasian

(1) head-modifier clause structure ([Pre-]Proto-Khasian)
    *jaB *bruː *wa *san
    v n [REL v]
    die person nml be.big
    ‘the person died who was big/important’

(2) pronoun inserted as head of REL clause
    *jaB *bruː *ka *wa *san
    v n [EM REL v]
    die person 3sg nml be.big
    ‘the person died, the one who was big/important’

(3) pronoun acquires gender
    *jaB *bruː *ka *wa *san
    v n [pro REL v]
    die person 3sg.f nml be.big
    ‘the person died, she who was big/important’

(4) pronoun inserted to mark N
    *jaB *ka *bruː *ka *wa *san
    v pro n [pro REL v]
    die 3sg.f person 3sg.f nml be.big
    ‘the female person died, she who was big/important’

(5) noun marker cliticizes to N (present day Pnar)
    jap ka=bru ka wa san
    v G=n [pro REL v]
    die f=person 3sg.f nml be.big
    ‘the woman who was big/important died’

3 Conclusions and Future Directions

The preceding sections have shown, first, that word order in Khasian lects is somewhat variable, but that verb-initial main clause structures are found in varieties throughout the family. The two Khasian lects which have primary verb-medial order in basic clauses (SK and Lyngngam) also have verb-initial structures in adverbial, subordinate, interrogative, or existential clauses. However, there is generally a lack of data on individual Khasian varieties and thus not enough examples to be certain of word order patterns in the array of clause types where word order should be investigated.

Second, word order has implications for the historical development of such features as gender, such that verb-initial word order allows for a clearer pathway of gender grammaticalization. This is a complex issue involving many different grammatical features and subsystems, not to mention phonological properties of word combination and many other linguistic facets (semantics, information structure, etc.). I have put forward a working hypothesis of how gender has developed in the Khasian group from a pronominal system with three numbers and how such a gendered pronominal paradigm gave rise to a gender agreement system. This seems justified based on the existing evidence.

One question that still remains is whether word order patterns in other AA languages would support the position that Proto-Khasian was verb-initial. Another is whether other AA languages also use pronominal elements in elaborate expressions or for emphatic purposes, and whether such uses align with how such elements function in Khasian. While this is a subject for future research, below I present a brief comparison with Man Noi Plang in terms of word order (§3.1) and realization of elaborate expressions (§3.2).

3.1 Comparison of Word Order in Man Noi Plang

Currently there is much still to be done in order to compare Khasian word order realizations with other AA languages. Some parallel patterns exist, however, such as those reported by Lewis (2008) for Man Noi Plang. Since the Khasian varieties are more closely related to Palaungic languages within the phylum (Sidwell 2011), comparisons of data from the two families may provide insight into shared historical syntactic developments. For example, Lewis (2008: 14) writes that “Plang pronoun syntax is unique: not only do pronouns occur in the default subject position”, that is, pre-verbally, “but they also seem to occur after the verb when they are referring to the subject participant or when they are coreferential with a subject.” She illustrates the multiple occurrence of pronouns in a single sentence with (38).



(38) lɪk hɔn lɔi ʔɯ ɔn kɛ mok kɛ juŋ khɔ
     pig big three 1sg that 3pl exist 3pl at pig.pen
     ‘Those three big pigs of mine are in the pigpen.’ (Lewis08-x125-Data.091)

Other examples in Lewis’s thesis reveal post-verbal pronominal order that in many cases aligns with the order found in Khasian lects. In (39) the existential predicate kui ka juŋ kuti ‘there was a village’ is structurally remarkably like the Pnar existential predicate ɛm ka=wi ka=ʧnɔŋ ‘there was a village’ (40), with the main difference in overall structure being the placement of the numeral within the noun phrase (before the noun in the Pnar example, after the noun in the Plang example). What Lewis calls here a ‘dummy subject’ may be functioning to identify the referential nature of the following noun juŋ ‘village’ and forming a nominal unit, ka juŋ ‘the village’.

(39) kui ka juŋ kuti muh mannoi
     have dummy.sub village one name Man.Noi
     ‘once there was a village called Man Noi.’ (Lewis08-x193-Data.051)

(40) ɛm ka=wi ka=ʧnɔŋ
     have f=one f=village
     ‘there was a village …’ (Pnar-fieldnotes)

There are other indications that in Man Noi Plang the postverbal position for pronouns is the older one. Citing (41) as an example, Lewis (2008: 93) notes “The sentence becomes ungrammatical when the clitic ɛ ‘1pl’ is replaced by a subject pronoun NP before the verb in a temporal adverbial clause”. Since these adverbial clauses do not allow the variable realization that is allowed in main clauses, it suggests that they are an older structure. This word order parallels that identified by Rabel (1961) for Khasi adverbial clauses which, like in Plang, have verb-initial word order, different from Khasi main clauses.

(41) cɤt ɛ mɔ lɛi phat hɔt a iŋ hɯɪt lɛi hɯɪt məŋhun
     finished 1pl then then drive car rf go arrive then arrive Menghun
     ‘When we finished, we then drove the car and arrived in Menghun.’ (Lewis08-x233-Trip.003)

3.2 Comparison of Elaborate Expressions in Man Noi Plang

Man Noi Plang also nominalizes verbs in various ways, and the main nominalizer is the morpheme ku which is glossed by Lewis as ‘nominalizer, particle’. In (42) this morpheme actually occurs as part of an elaborate expression, much as the gender clitics in the Pnar example (35) above.17 It is quite striking that the same pattern of construction, with a morpheme that similarly serves as a nominalizer, occurs in both Man Noi Plang and in Khasian languages.

(42) ʔa pun ti cɪm ku cɯ ku joŋ cɪm pun
     1du attained main.part get nml know nml know get can
     ti lat watkɔŋ
     main.part special.knowledge
     ‘We will be about to get knowledge and ability and can get the special knowledge.’ (Lewis08-x30-Brothers.020)

3.3 Final Notes

The hypothesis I have presented here is that word order in Proto-Khasian was verb-initial. This follows from a parsimonious explanation for current word order patterns among the Khasian varieties as well as the head-initial properties that would allow for a clear account of how gender agreement grammaticalized in the Khasian lects. While the most parsimonious account may not necessarily be the most accurate, it gives us a starting point from which to explore the interaction of other grammatical paradigms and patterns.

One promising direction of research is comparison with other AA languages closely related within the AA tree. The preceding comparison with Man Noi Plang has shown a striking similarity of grammatical structure, but such comparison should be expanded to other AA languages. Further, the origin of certain elements (such as nominalizers) needs to be investigated for each of these varieties, as does the interaction between arguments and salient referential properties.

To conclude, this chapter has given an overview of Khasian word order in relation to other linguistic subsystems, contributing to a historical reconstruction of Austroasiatic syntax. With more data on Khasian lects, we may be able to identify more parallel constructions and develop a clearer and more complete syntactic reconstruction of Proto-Khasian. With more data on other AA language families, we can better understand the grammaticalization of gender and classifiers in Khasian as well as related developments in other Austroasiatic languages.


17 Although we do not know the precise source of the Plang morpheme ku, Palaung has a form ku used as a classifier for humans in songs (see Shorto 2006: 73, item 20 *ɟkooʔ ‘body, self’; also Milne 1931).



References

Anderson, Gregory D.S. & Felix Rau. 2008. Gorum. In Anderson, Gregory D.S. (ed.), The Munda languages, Routledge Language Family Series 3. New York: Routledge. ISBN 0-415-32890-X.
Baker, Keren Joy. 2013. The Phonology Of Lyngngam: A Meghalayan Austro-Asiatic Language Of North-East India. Master’s thesis, The Australian National University, Canberra.
Banker, Elizabeth M. 1964. Bahnar affixation. The Mon-Khmer Studies Journal, 1: 99–117.
Bareh, Curiously. 2007. Descriptive analysis of the Jowai and Rymbai dialects of Khasi. Ph.D. thesis, North-Eastern Hill University, Shillong, Meghalaya, India.
Chen, Ping. 2004. Identifiability and definiteness in Chinese. Linguistics, 42(6): 1129–1184.
Cheng, Lisa Lai-Shen & Rint Sybesma. 2005. Classifiers in four varieties of Chinese. In Cinque, Guglielmo & Richard Kayne (eds.), The Oxford Handbook of Comparative Syntax, pp. 259–292. Oxford: Oxford University Press.
Choudhary, Narayan K. 2004. Word Order in Pnar. Master’s thesis, Jawaharlal Nehru University.
Daladier, Anne. 2005. Kinship and spirit terms renewed as classifiers of “animate” nouns and their reduced combining forms in Austroasiatic. Berkeley Linguistic Society, 28.
Daladier, Anne. 2011. A multi-purpose project for the preservation of War oral literature. North East Indian Linguistics, 4: 166–197.
Diffloth, Gérard. 1976. Expressives in Semai. In Jenner, Philip N., Laurence C. Thompson, & Stanley Starosta (eds.), Austroasiatic studies, Part I., No. 13 in Oceanic Linguistics Special Publications, pp. 249–264. Honolulu: University of Hawaii Press.
Diffloth, Gérard. 1979. Expressive phonology and prosaic phonology in Mon-Khmer. In Thongkum, Theraphan L., Pranee Kullavanijaya, & Vichin Panupong (eds.), Studies in Tai and Mon-Khmer Phonetics and Phonology in honor of Eugénie J.A. Henderson, pp. 49–59. Bangkok: Chulalongkorn University Press.
Diffloth, Gerard. 1994. The lexical evidence for Austric, so far. Oceanic Linguistics, 33(2): 309–321. University of Hawai’i Press.
Dikshit, Jutta K. & Kamal Ramprit Dikshit. 2014. North-East India: Land, People and Economy. Advances in Asian Human-Environmental Research. London: Springer Dordrecht.
Dryer, Matthew S. 2013. Order of Subject, Object and Verb. In Dryer, Matthew S. & Martin Haspelmath (eds.), The World Atlas of Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology. URL /81.



Ferrer-i-Cancho, Ramon. 2017. The placement of the head that maximizes predictability: An information theoretic approach. Glottometrics, 39: 38–71.
Ghosh, Arun. 2008. Santali. In Anderson, Gregory D.S. (ed.), The Munda Languages, pp. 11–98. New York: Routledge.
Grierson, George Abraham. 1904. Linguistic survey of India, Vol. 2: Mōn-Khmēr and Siamese-Chinese Families (including Khassi and Tai). Calcutta: Office of the Superintendent of Government Printing, India.
Greenberg, Joseph H. 1963. Some universals of grammar with particular reference to the order of meaningful elements. In Greenberg, Joseph H. (ed.), Universals of Language. Cambridge, Mass.: MIT Press.
Haas, Mary R. 1964. Thai–English student’s dictionary. Palo Alto: Stanford University Press.
Henderson, Eugénie J.A. 1965. Final -k in Khasi: A secondary phonological pattern. Lingua, 14: 459–466.
Jakobson, Roman. 1960. Closing statement: Linguistics and poetics. In Sebeok, Thomas A. (ed.), Style in Language, pp. 350–377. New York: Technology Press of MIT and Wiley and Sons, Inc.
Jenny, Mathias & Paul Sidwell (eds.). 2015. The Handbook of Austroasiatic languages. Leiden: Brill.
Jenny, Mathias, Tobias Weber, & Rachel Weymuth. 2015. The Austroasiatic languages: A typological overview. In Jenny, Mathias & Paul Sidwell (eds.), The Handbook of Austroasiatic Languages, Vol. 1, Ch. 2, pp. 13–143. Leiden: Brill.
Kapila, Kriti. 2008. The measure of a tribe: the cultural politics of constitutional reclassification in North India. Journal of the Royal Anthropological Institute, 14: 117–134. doi:10.1111/j.1467-9655.2007.00481.x.
Koshy, Anish & Maranatha G.T. Wahlang. 2010. Mnar morpho-syntax: a preliminary study. Indian Linguistics, 72: 153–171.
Kumar, Dharma. 1992. The affirmative action debate in India. Asian Survey, 32(3): 290–302. doi:10.2307/2644940.
Lew-Williams, Casey & Anne Fernald. 2007. Young children learning Spanish make rapid use of grammatical gender in spoken word recognition. Psychological Science, 18(3): 193–198.
Lew-Williams, Casey & Anne Fernald. 2010. Real-time processing of gender-marked articles by native and non-native Spanish speakers. Journal of Memory and Language, 63(4): 447–464.
Lewis, Emily Dawn. 2008. Grammatical studies of Man Noi Plang. Master’s thesis, Payap University.
Matisoff, James A. 1973. A Grammar of Lahu. University of California Publications in Linguistics, 75. Berkeley: University of California Press.
Milne, Leslie. 1931. A Dictionary of English-Palaung and Palaung-English. Rangoon: Superintendent, Government Printing and Stationary.



Nagaraja, Keralapura S. 1985. Khasi, A Descriptive Analysis. Ph.D. thesis, Deccan College, Pune.
Nagaraja, Keralapura S. 1993. Khasi dialects, a typological consideration. The Mon-Khmer Studies Journal, 23: 1–10.
Nagaraja, Keralapura S. 1996. The status of Lyngngam. The Mon-Khmer Studies Journal, 26: 37–50.
Nagaraja, Keralapura S. 2015. Standard Khasi. In Jenny, Mathias & Paul Sidwell (eds.), The Handbook of Austroasiatic Languages, Vol. 2, Ch. 19, pp. 1143–1185. Leiden: Brill.
Nagaraja, Keralapura S. 2018. Bhoi Khasi compared to Standard Khasi. Journal of the Southeast Asian Linguistics Society, 11(2): i–xii. doi: 52427.
Nagaraja, Keralapura S., Paul Sidwell, & Simon Greenhill. 2013. A lexicostatistical study of the Khasian languages: Khasi, Pnar, Lyngngam, and War. The Mon-Khmer Studies Journal, 42: 1–11.
Phillips, Timothy C. 2012. Proto-Aslian: towards an understanding of its historical linguistic systems, principles and processes. Ph.D. thesis, Universiti Kebangsaan Malaysia, Bangi.
Rabel, Lili. 1961. Khasi, a language of Assam. Louisiana State University Press.
Rabel-Heymann, Lili. 1976. Sound symbolism and Khasi adverbs. In Liem, N.D. (ed.), South-east Asian Linguistic Studies, Vol. 2, pp. 253–262. the Australian National University: Pacific Linguistics.
Ring, Hiram. 2014. Nominalization in Pnar. The Mon-Khmer Studies Journal, 43 (ICAAL 5): 16–23.
Ring, Hiram. 2015. A Grammar of Pnar. Ph.D. thesis, NTU, Singapore.
Ring, Hiram R. 2017. Replication Data for: A grammar of Pnar. DR-NTU (Data). doi:10.21979/N9/KVFGBZ.
Shorto, Harry L. 2006. A Mon-Khmer comparative dictionary. Canberra: Australian National University: Pacific Linguistics.
Sidwell, Paul. 2011. Proto-Khasian and Khasi-Palaungic. Journal of the Southeast Asian Linguistics Society, 4(2): 144–168.
Solnit, David. 1995. Parallelism in Kayah Li discourse: elaborate expressions and beyond. In Bilmes, Leela, Anita Lang, & Weeraw Ostapirat (eds.), Proceedings of the 21st Annual Meeting of the Berkeley Linguistics Society, BLS 21: Special session on discourse in Southeast Asian languages, pp. 127–140. Berkeley: University of California Press.
Tesnière, Lucien. 1959. Eléments de la Syntaxe Structurale. Paris: Klincksieck.
van Breugel, Seino. 2015. Journey to the Lyngams: People of Meghalaya, Northeast India. Humanities Journal, 22(1): 252–290.
van Breugel, Seino. 2016. A presentation and description of Lyngam kinship terms. Humanities Journal, 23(1): 179–211.



Watson, Richard L. 1966. Reduplication in Pacoh. Master’s thesis, Hartford Seminary Foundation, Hartford.
Weidert, Alfons K. 1975. I Tkong Amwi. Deskriptive Analyse eines Wardialekts des Khasi. Wiesbaden: Harrassowitz.
Wu, Yicheng & Adams Bodomo. 2009. Classifiers ≠ Determiners. Linguistic Inquiry, 40(3): 487–503.

Chapter 5

Word Order in the Wa Languages

Atsushi Yamada



Austroasiatic languages spoken in Mainland Southeast Asia are generally verb-medial, but a handful of languages are exceptional in having verb-initial basic word order (Jenny et al. 2015). The Wa languages, which genetically belong to the Palaungic branch (Sidwell 2015), are among these exceptions, showing two alternative word orders: VS (Verb-Subject) and SV (Subject-Verb). This paper aims to analyze these alternative word orders from synchronic and diachronic perspectives.

Wa is, in fact, not a single mutually intelligible language but an internally diverse group of numerous named languages (or dialectal variants) spread over a wide area that overlaps the Shan State of Myanmar, Yunnan Province of China, and Northern Thailand (Diffloth 1980). The total number of Wa-speaking people can only be estimated. Ethnologue (2017) lists three relatively large languages within it: Parauk (around 805,700 speakers), Awa (98,000), and Va (40,700).

Among these languages, Parauk (sometimes called Paraok or Praok) is regarded as the representative language of the Wa group, in that it has a considerably larger speaker population than the others and sometimes serves as a lingua franca for speakers of different Wa languages. In China, Parauk is treated as the standard language of the Wa (佤) nationality, and its phonological system is the basis of official transcriptions of the language. Information on Parauk has been provided by many researchers, including Zhou & Yan (1984), Watkins (2002), and Yamada (2008). Awa and Va, on the other hand, are little known, and linguistic data about them are still limited.

Historically, it is assumed that contacts with the prestigious Tai-Kadai and/or Han-Chinese cultures started from peripheral areas where the Va-speaking people and some parts of the Parauk-speaking people are living today. The Awa-speaking people were the last group to come into contact with the outside.

The following section first focuses on Parauk. The grammatical forms are presented, and the pragmatic differences between forms are discussed. Awa and Va are then taken into consideration, and the varieties of word orders in the Wa languages are analyzed from the perspective of areality and language contact.

© koninklijke brill nv, leiden, 2020 | doi:10.1163/9789004425606_007

2 Synchronic Analysis of Parauk

Parauk is characterized by two alternative word orders: VS and SV. Many studies refer to this fact, but only provide minimal discussion. This section aims to discuss Parauk word order in more detail. First, simple clauses and subordinate clauses are described in turn. Then, the alternation of word orders is analyzed from a pragmatic perspective.

2.1 Preceding Studies

Zhou & Yan (1984), Huang & Wang (1994), Yamada (2008), and Ma (2012) are the representative descriptive studies of Parauk grammar; Xiao (1981), Schiller (1985), Yan (1987), and Zhao & Zhao (1998) are also good references on Parauk word order. However, of these studies, only Schiller (1985), Zhao & Zhao (1998), Yamada (2008), and Ma (2012) go beyond describing the VS-SV alternation to discuss the basic order in detail. Schiller (1985) and Zhao & Zhao (1998) provide polar-opposite views on these phenomena and their relationship: Schiller (1985) regards VS as the basic order and SV as the result of language contact, while Zhao & Zhao (1998) suggest that SV is the basic order and that VS is used to achieve certain rhetorical effects. Yamada (2008) and Ma (2012) both also mention the possibility of differences based on clause types; however, these discussions remain tentative, as none of the studies provides sufficient analysis to securely establish its interpretations. Nevertheless, from these preceding studies, we can observe the following three points:

(i) The order of main clauses is flexible, but that of subordinate clauses is not.
(ii) The two word orders may be somewhat related to information structure.
(iii) The basic order cannot be clearly established.

The following sections will examine Parauk word orders in detail, beginning with these baseline observations. Linguistic examples of Parauk are from field data we collected in the Cangyuan Wa Autonomous County of Yunnan Province, China.

2.2 Simple Clauses

In simple clauses, both SV and VS orders are possible in many cases.1 In this section, simple verbs and verbal complexes are analyzed in turn, by predicate type.

1 In this paper, we treat the category of ‘subject’ as encompassing S and A arguments in all relevant constructions.



2.2.1 Simple Verbs

In clauses with intransitive verbs or stative verbs, the single argument (‘subject’) of the clause can precede or follow the verb.

(1) hu nɔh kah ca̲u̲ŋ tiʔ
    walk 3SG by foot REFL
    ‘He walked on his foot.’

(2) nɔh hu kah ca̲u̲ŋ tiʔ
    3SG walk by foot REFL
    ‘He walked on his foot.’

(3) mhɔm sibe̲ʔ maiʔ
    good cloth 2SG
    ‘Your cloth is nice.’

(4) sibe̲ʔ maiʔ mhɔm
    cloth 2SG good
    ‘Your cloth is nice.’

In clauses with transitive verbs, the order of constituents (Agent, Verb, and Patient) is AVP or VAP. There is no ditransitive verb in Parauk; constituents of ditransitive expressions such as Agent, Theme, and Goal/recipient are ordered as AVGT or VAGT, which is essentially the same as transitive expressions.

(5) suat ʔai ɡhok li̲k kah ɡɔn
    kill PN neck pig by knife
    ‘Ai stabbed the neck of the pig with a knife.’

(6) ʔai suat ɡhok li̲k kah ɡɔn
    PN kill neck pig by knife
    ‘Ai stabbed the neck of the pig with a knife.’

(7) tɔʔ ʔɤʔ loksu̲p mai su̲p kah maiʔ
    give 1SG pipe and cigar to 2SG
    ‘I gave the pipe and cigar to you.’

(8) ʔɤʔ tɔʔ loksu̲p mai su̲p kah maiʔ
    1SG give pipe and cigar to 2SG
    ‘I gave the pipe and cigar to you.’



In copular clauses, constituents (Subject, Verb, and Complement) can appear in VSC or SVC order, as illustrated in the following examples:

(9) mɔ̲h ʔan koŋthai khrauʔ
    be that plow new
    ‘That is a new plow.’

(10) ʔan mɔ̲h koŋthai khrauʔ
     that be plow new
     ‘That is a new plow.’

The possessive meaning is expressed by the verb koi, which may precede or follow the subject, as seen in examples (11) and (12). Similarly, the existential meaning is expressed by the verb ʔot, which again precedes or follows the subject, as seen in examples (13) and (14).

(11) koi yauŋ yiʔ ɲɛ̲ʔga̲ɯ̲la̲i̲
     have village 1PL school
     ‘Our village has a school.’

(12) yauŋ yiʔ koi ɲɛ̲ʔga̲ɯ̲la̲i̲
     village 1PL have school
     ‘Our village has a school.’

(13) ʔot la̲i̲ ʔan pi̲a̲ŋ phɯn
     exist book that on desk
     ‘That book is on the desk.’

(14) la̲i̲ ʔan ʔot pi̲a̲ŋ phɯn
     book that exist on desk
     ‘That book is on the desk.’

2.2.2 Verbal Complexes

Verbs form verbal complexes with auxiliary particles that express TAM (Tense, Aspect, and Modality). Verbs with auxiliary particles precede the subjects in the case of the VS order, as seen in the following examples:

(15) saŋ rhɯp nɔh rɔm taɯʔ
     IRR eat 3SG soup vegetable
     ‘He will have a vegetable soup.’



(16) nɔh saŋ rhɯp rɔm taɯʔ
     3SG IRR eat soup vegetable
     ‘He will have a vegetable soup.’

Some auxiliary particles, such as hɔik ‘PFV (perfective marker)’ and ʔaŋ ‘NEG (negative marker)’, can also occur initially by themselves, as seen in examples (18) and (21):

(17) hɔik hu ʔai
     PFV go PN
     ‘Ai has gone.’

(18) hɔik ʔai hu
     PFV PN go
     ‘Ai has gone.’

(19) ʔai hɔik hu
     PN PFV go
     ‘Ai has gone.’

(20) ʔaŋ ɲa̲ŋ to nɔh
     NEG not.yet run 3SG
     ‘He has not run yet.’

(21) ʔaŋ nɔh ɲa̲ŋ to
     NEG 3SG not.yet run
     ‘He has not run yet.’

(22) nɔh ʔaŋ ɲa̲ŋ to
     3SG NEG not.yet run
     ‘He has not run yet.’

In addition, Parauk has plenty of serial verb constructions. Yamada (2006) classified these into two categories, separate type and compound type, which correspond to ‘non-adjacent SVC’ and ‘adjacent SVC’ respectively in Aikhenvald & Dixon (2006). Examples (23) and (24) demonstrate the separate type, while examples (25)–(27) show the compound type. In the former case, only the first verb of the clause can occur initially, while in the latter type, either the first or both verbs of the clause can occur initially.

Table 5.1 Clause structures

VS order
    Verb–Subject–Object
    Verbal complex (Full/Head)–Subject–(Verbal complex (Dependent))–Object
SV order
    Subject–Verb–Object
    Subject–Verbal complex (Full/Head)–(Object)–(Verbal complex (Dependent))–Object

(23) li̲a̲k ʔai tiʔ / ɲi̲ ɲa̲ɯ̲ʔ plai
     buy PN REFL / PN drink liquor
     ‘Ai bought liquor to drink. / Ai bought liquor for Nyi to drink.’

(24) ʔai li̲a̲k tiʔ / ɲi̲ ɲa̲ɯ̲ʔ plai
     PN buy REFL / PN drink liquor
     ‘Ai bought liquor to drink. / Ai bought liquor for Nyi to drink.’

(25) hu la̲ik̲ ʔɤʔ laih kah kaɯŋdu̲m
     go 1SG market at PN
     ‘I went to Kaengdum to shop around the market.’

(26) hu ʔɤʔ la̲ik̲ laih kah kaɯŋdu̲m
     go 1SG market at PN
     ‘I went to Kaengdum to shop around the market.’

(27) ʔɤʔ hu la̲ik̲ laih kah kaɯŋdu̲m
     1SG go market at PN
     ‘I went to Kaengdum to shop around the market.’

2.2.3 Structure of Simple Clauses

From the analysis above, the structure of simple clauses in Parauk can be summarized as in Table 5.1. In VS order, either a full verbal complex or its head element can precede the subject.
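As a rough illustration of the templates summarized in Table 5.1, the following Python sketch enumerates candidate surface orders for a simple clause from its verbal complex, subject, and remaining material. The function name `simple_clause_orders` and the flag `head_can_stand_alone` are hypothetical labels introduced here for illustration and are not part of the analysis itself; the flag encodes the observation that only some auxiliary particles (such as hɔik and ʔaŋ) may precede the subject on their own. Serial verb constructions are not modeled.

```python
def simple_clause_orders(verbal_complex, subject, rest=(), head_can_stand_alone=False):
    """Enumerate candidate surface orders for a Parauk simple clause.

    verbal_complex: list of words, head element first (e.g. ["hɔik", "hu"]).
    subject: the subject word or phrase.
    rest: objects and other post-verbal material, kept in place.
    head_can_stand_alone: assumption flag; True only for particles such as
        hɔik 'PFV' or ʔaŋ 'NEG' that may precede the subject by themselves.
    """
    complex_words = list(verbal_complex)
    rest = list(rest)
    orders = {
        # The full verbal complex precedes the subject.
        "VS, full complex first": complex_words + [subject] + rest,
        # The subject precedes the whole verbal complex.
        "SV": [subject] + complex_words + rest,
    }
    if head_can_stand_alone and len(complex_words) > 1:
        # Only the head precedes the subject; dependents follow it.
        orders["VS, head first"] = [complex_words[0], subject] + complex_words[1:] + rest
    return {label: " ".join(words) for label, words in orders.items()}


if __name__ == "__main__":
    result = simple_clause_orders(["hɔik", "hu"], "ʔai", head_can_stand_alone=True)
    for label, clause in result.items():
        print(f"{label}: {clause}")
```

With the inputs shown, the three strings produced correspond to examples (17), (18), and (19).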


Table 5.2 Word order and subject type

Subject type          VS          SV
[…]                   frequent    frequent
Proper noun           frequent    frequent
Noun + determiner     frequent    frequent
Common noun           frequent    rare
Indefinite pronoun    frequent    rare

2.2.4 Pragmatic Analysis

Both word orders are possible in many simple clauses, but the two orders are not always interchangeable. Table 5.2 shows the relationships between the word order and type of subject in our data, including written documents, conversations, and narratives, which have partly been presented in Yamada (2007). As shown in Table 5.2, subjects with lower specificity tend to favor VS order. For example, indefinite pronoun subjects (e.g. interrogative pronouns) rarely occur in SV order.2

(28) chuh patiʔ tan
     move what there
     ‘What is moving there?’

(29) mɔ̲h mɔʔ pa ʔaŋ hoik kɔʔ
     be who NML NEG come yesterday
     ‘Who is the man who did not come yesterday?’

The subjects in examples (28) and (29) can be generalized as new information. By contrast, the SV order is chosen when the subject is obvious to both the speaker and the listener. Imperative and exclamatory sentences are typical cases in which this is so, as seen in examples (30) and (31).

2 There are only two examples of SV order with an interrogative subject in the literature. Their contexts are not clear, but they are possibly questions in which the speaker is ‘asking back’ to an interlocutor for clarification of a previous statement.

(a) mɔʔ saŋ la̲ik̲ veŋ
    who IRR town
    ‘Who will come into the town?’ (Zhou & Yan 1984: 55)

(b) mɔʔ tɯ̲i̲ kloŋ hu
    who take bowl go
    ‘Who did take the bowl away?’ (Zhao & Chen 2006: 53)



(30) ʔaʔ hu ma ta̲ɯ̲
     2DU go field together
     ‘Let’s go to the field together.’

(31) li̲k ʔin lu̲ ßk kluiŋ lɛʔ
     pig this really fat EXCLAM
     ‘(As I thought) that pig is really fat!’

Examples (32) and (33) are from the beginning of a story. In example (32), the subject pa̲u̲ʔʔaik ra kaɯʔ ‘two brothers’ is not specified, and so naturally the VS order is used. In contrast, in example (33), the subject ɲɛ̲ʔ kɛʔ ‘their house’ is clear from the context shown in example (32), and therefore the SV order is used.

(32) ʔot pa̲u̲ʔʔaik ra kaɯʔ ɲɛ̲ʔ tiʔ noŋ
     live brother 2 CLF house REFL alone
     ‘Two brothers live in their house without any other family members.’

(33) ɲɛ̲ʔ kɛʔ kin hot kin sibhɔm
     house 3DU very poor very hunger
     ‘Their house is very poor.’

2.2.5 Information Structure

Under the pragmatic analysis above, VS order appears to be the unmarked order. Moreover, the SV order is probably the result of argument topicalization. Information structure in Parauk is shown in Table 5.3. The Floating Topic slot, often separated from the rest of the sentence by a pause or particle, can include topicalized arguments or non-arguments (e.g., time or place). The Topic slot is limited to topicalized arguments. In examples (34)–(38), the floating topic and the topic are marked by (FT) and (T), respectively.

(34) hɔik ʔɤʔ li̲a̲k sibe̲ʔ
     PFV 1SG buy cloth
     ‘I have already bought the cloth.’

(35) kɔkɔʔ hɔik ʔɤʔ li̲a̲k sibe̲ʔ
     yesterday (FT) PFV 1SG buy cloth
     ‘Yesterday, I already bought the cloth.’

Table 5.3 Information structure

Floating Topic




(36) ʔɤʔ hɔik li̲a̲k sibe̲ʔ
     1SG (T/FT) PFV buy cloth
     ‘I have already bought the cloth.’

(37) sibe̲ʔ hɔik ʔɤʔ li̲a̲k
     cloth (T/FT) PFV 1SG buy
     ‘About the cloth, I have already bought it.’

(38) sibe̲ʔ ʔɤʔ hɔik li̲a̲k
     cloth (FT) 1SG (T) PFV buy
     ‘About the cloth, I have already bought it.’

In examples (36) and (37), particles such as nɛ̲h and kɔʔ can follow ʔɤʔ ‘1SG’ and sibe̲ʔ ‘cloth’ in the respective examples. In these cases, they can be regarded as floating topics.

2.3 Subordinate Clauses

In general, subordinate clauses are divided into three types: relative clauses, complement clauses, and adverbial clauses. Unlike simple clauses, subordinate clause word order is almost fixed in all of these clause types.

2.3.1 Relative Clauses

Relative (attributive) clauses in Parauk follow the head noun they modify, without any markers. Relative clauses are bracketed in the following examples.

(39) ʔaŋ ʔɤʔ tɔ̲ŋ pu̲i̲ [ bru̲k bru̲ŋ ]
     NEG 1SG know human ride horse
     ‘I do not know the man who rode the horse.’

The head noun as an object is seen in example (40), whereas an exocentric construction is seen in example (41). These examples show that the word order in relative clauses is VS.

(40) ʔɯp [ sɔm ʔeʔ ʔin ] ʔaŋ ɲɔ̲m
     cooked.rice eat 1PL this NEG good
     ‘This cooked rice which we have eaten was not delicious.’



(41) dɯ̲ [ li̲a̲k ʔeʔ sibe̲ʔ kah ] ʔot daɯʔ laih
     place buy 1PL cloth at exist in street
     ‘The place where we bought clothes is on the street.’

2.3.2 Adverbial Clauses

The main functions expressed by adverbial clauses include temporal, causal, and conditional functions. They can be either preceded or followed by main clauses. When occurring in front of main clauses, the adverbial clause’s word order depends on its meaning: temporal and causal clauses occur in VS order, while conditional clauses are in SV or VS order. When occurring after main clauses, adverbial clauses are consistently in VS order. (Clauses in the examples below are separated by //.)

Temporal Clauses

Either in front of or after the main clause, temporal meanings are expressed using VS order, which can be optionally preceded by conjunctional particles such as khaiʔ ‘after’ or ya̲m ‘when’; these particles have homonymous nouns khaiʔ ‘future’ and ya̲m ‘time.’ If a temporal clause occurs after the main clause, a prepositional particle, such as mai ‘while’, is obligatory.

(42) (khaiʔ) hɔik li̲h pu̲i̲ khaiŋ siɡaŋ // ʔaŋ ya̲u̲ʔ patiʔ
     (after) PFV come.out human from gourd NEG see something
     ʔih tiʔ
     eat REFL
     ‘After humans came out from the gourd, they could not see anything to eat.’

(43) ʔɤʔ lɔk hoik // khaiʔ sɔm tiʔ
     1SG soon come after have.meal REFL
     ‘I will come after having a meal.’

(44) to nɔh li̲h // mai yi̲a̲m tiʔ
     run 3SG out while cry REFL
     ‘He ran out while crying.’

Causal Clauses

When occurring in front of the main clause, causal meanings are expressed in VS order, optionally preceded by a conjunctional particle such as ja̲u̲ ‘because’ or khɯ ‘because.’ (The former has a homonymous noun ja̲u̲ ‘reason.’) If a causal clause occurs after the main clause, causal meanings are also expressed in VS order, and a prepositional particle such as kah ‘by’ is obligatory.



(45) (ja̲u̲) ti̲ŋ lhɛʔ // rɔm klɔŋ kɯm vhuan
     (because) large rain water river then expand
     ‘Because it rained heavily, the water of the river expanded.’

(46) (khɯ) mɔ̲h ʔaŋ nɔh hoik // ʔɤʔ ʔaŋ hu tan
     (because) be NEG 3SG come 1SG NEG go there
     ‘Because he did not come, I also did not go there.’

(47) yi̲a̲m nɔh // kah hu maiʔ
     cry 3SG by go 2SG
     ‘He cried because you had gone.’

Conditional Clauses

Conditional clauses always precede the main clause, and in most cases exhibit SV order. Additionally, conditional subordinator particles may optionally occur.

(48) (vi̲a̲ŋ) lhɛʔ ti̲ŋ // ʔɤʔ hu ma khɔm
     (though) rain large 1SG go field as.well
     ‘The rain is still large. I will go to the field as well.’

(49) (sin) maiʔ ʔaŋ hoik // ʔɤʔ kah ʔaŋ hu
     (if) 2SG NEG come 1SG also NEG go
     ‘If you will not come, I will not go either.’

(50) (kɛ̲h) ʔɤʔ koi ma̲ɯ̲ // ʔɤʔ ri̲a̲n tiʔ ɡɔ̲ nɔh
     (if) 1SG have money 1SG ready REFL help 3SG
     ‘If I have money, I will help him.’

However, there are some exceptional cases using VS order, as seen in example (51). The choice between the two word orders may be related to the kind of conditional subordinator particle used.

(51) (tɔʔ) ʔaŋ nɔh hoik // maiʔ ʔiŋ khaiŋ
     (if) NEG 3SG come 2SG go.back only
     ‘If he does not come, you can only go back.’

2.3.3 Complement Clauses

Complement clauses occur as the clausal arguments of perception and cognition verbs such as ya̲u̲ʔ ‘see’ and simɛ̲ ‘hope’. They can be divided into two types: VS order type and SV order type. The former type is optionally adjoined to the matrix clause with the complement marker kah,3 as seen in the following examples:

(52) ʔɤʔ ya̲u̲ʔ (kah) ʔih nɔh ʔan
     1SG see (by) eat 3SG that
     ‘I saw that he was eating that.’

(53) lhat ʔɤʔ (kah) ya̲u̲ʔ tiʔ nɔh
     be.afraid 1SG (by) see REFL 3SG
     ‘I was afraid that I had seen him.’

The latter type is directly adjoined to the matrix clause without any marker, as seen in examples (54) and (55).

(54) simɛ̲ ʔɤʔ nɔh dɯ̲i̲h hoik ca̲u̲
     hope 1SG 3SG return come early
     ‘I want him to come back early.’

(55) ʔaŋ nɔh cu̲ tiʔ hu
     NEG 3SG agree REFL go
     ‘He did not agree to go.’

Although complement clause word order remains unclear, it does appear, as Yamada (2008) and Ma (2012) reported preliminarily, that it is affected by the type of main verb used. Verbs that express subjective meanings such as simɛ̲ ‘hope’, kɤ̲t ‘think’, muih ‘love’, po̲n ‘get’, ci̲ʔ ‘be possible’, and khɔ ‘be correct’ show SV order, while verbs such as ya̲u̲ʔ ‘see’, tɔ̲ŋ ‘know’, and yi̲ ‘believe’ show VS order optionally marked by the complement marker kah.

2.3.4 Word Order in Subordinate Clauses

Based on the analysis above, we can summarize word order in subordinate clauses in Parauk as in Table 5.4. Almost all adverbial clauses show fixed VS order; conditional clauses are the only exception, allowing both orders.

3 The complement marker kah can possibly be regarded as a prepositional particle, as described in section

Table 5.4 Word order in subordinate clauses

Clause type    Word order
Relative       VS
Adverbial      VS (in temporal, causal)
               SV, VS (in conditional)
Complement     VS (optionally marked by kah)
               SV (without any marker)
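The distribution in Table 5.4 can also be sketched as a small lookup in Python. This is only an illustrative encoding under stated assumptions: the names `SUBORDINATE_ORDERS`, `SV_COMPLEMENT_VERBS`, `VS_COMPLEMENT_VERBS`, and `subordinate_orders` are hypothetical, main verbs are keyed by their English glosses from section 2.3.3, and the fallback for unlisted main verbs (both orders left open) is an assumption, since the text describes complement clause word order as still unclear.

```python
# Word orders per subordinate clause type, following Table 5.4.
SUBORDINATE_ORDERS = {
    ("relative", None): {"VS"},
    ("adverbial", "temporal"): {"VS"},
    ("adverbial", "causal"): {"VS"},
    ("adverbial", "conditional"): {"SV", "VS"},
}

# Main-verb glosses listed in section 2.3.3 (English glosses used as keys).
SV_COMPLEMENT_VERBS = {"hope", "think", "love", "get", "be possible", "be correct"}
VS_COMPLEMENT_VERBS = {"see", "know", "believe"}


def subordinate_orders(clause_type, subtype=None, main_verb=None):
    """Return the set of word orders expected for a subordinate clause."""
    if clause_type == "complement":
        if main_verb in SV_COMPLEMENT_VERBS:
            return {"SV"}  # adjoined without any marker
        if main_verb in VS_COMPLEMENT_VERBS:
            return {"VS"}  # optionally marked by kah
        # Assumption: for verbs not listed in the text, leave both orders open.
        return {"SV", "VS"}
    return set(SUBORDINATE_ORDERS[(clause_type, subtype)])


if __name__ == "__main__":
    print(subordinate_orders("adverbial", "conditional"))
    print(subordinate_orders("complement", main_verb="see"))
```

Keeping the complement-clause rule separate from the table lookup mirrors the text, where the order depends on the main verb rather than on the clause type alone.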

2.4 Summary

In conclusion, Parauk word order is as follows:

(iv) Main clauses alternate between VS and SV, where SV is probably the result of argument topicalization.
(v) VS is predominant in subordinate clauses, but SV is used depending on clause type and maybe type of main verb.
(vi) The VS order is possibly the unmarked order.


3 Comparative Analysis

Two languages closely related to Parauk will be taken into consideration in this section: Awa (Ava/Vo) and Va (Loi). Speakers of these languages are regarded as part of the Wa nationality in China.

3.1 Sociolinguistic Information
The Awa-speaking people live in the southern part of the region where the Wa nationality live. Their autonyms include ʔavɤʔ, rɤviaʔ, and vɔʔ. Their society is known to be conservative in relation to other societies in this area (Yamada 2009) and had little contact with other peoples before the middle of the twentieth century; as a result, monolingualism is still dominant even today. The Va-speaking people live in the northern part of the region where the Wa nationality live. Autonyms of the Va-speaking people are vaʔ, la, phirʌk, and mɔŋhom (probably from the name of the city called Menggong in Mandarin, 勐汞). They are often called benren (本人) by the Han-Chinese people and la kumɤŋ by Shan (Tai Nuea) peoples, both of which mean ‘original people (in

Table 5.5  Word orders in the Awa, Parauk, and Va languages

Grammatical category         Awa            Parauk         Va
Main clause                  VS, SV         VS, SV         SV
Subordinate clause           VS, SV         VS, SV         SV
Adposition                   Preposition    Preposition    Preposition
Adjective and Noun           N-Adj          N-Adj          N-Adj
Relative clause and Noun     N-Rel          N-Rel          N-Rel
Demonstrative and Noun       Dem-N-Dem      N-Dem          N-Dem
Numeral and Noun             N-Num-CLF      N-Num-CLF      N-Num-CLF
Degree word and Adjective    Adj-Deg        Deg-Adj        Deg-Adj
Negative and Verb            Neg-V          Neg-V          Neg-V

this area)’. They have little sense of a common ethnicity with the Parauk and Awa, and compared with the Parauk and Awa, the society of the Va-speaking people is more receptive to interaction with other societies (Yamada 2009); they have maintained contact with Tai-Kadai and/or Han-Chinese peoples for a long time. They tend to be multilingual, with knowledge of Tai-Kadai and/or Chinese languages, and not all of them still speak their own language.

3.2 Comparative Analysis of Word Order
Word orders in the three languages are presented in Table 5.5. Remarkable features are highlighted and discussed in this section. Linguistic examples of Awa and Va are from field data we collected in Menglian County and Cangyuan County, respectively, of Yunnan Province, China.

3.2.1 Word Order in Main Clauses
Parauk and Awa have two alternative orders, VS and SV, whereas Va has an SV order that is fixed in all cases.

[Parauk]
(56) li̲a̲k ʔɤʔ su̲p
     buy 1SG cigar
     ‘I bought a cigar.’

     ʔɤʔ li̲a̲k su̲p
     1SG buy cigar
     ‘I bought a cigar.’

[Awa]
(57) cɔk ʔɯʔ sop
     buy 1SG cigar
     ‘I bought a cigar.’

     ʔɯʔ cɔk sop
     1SG buy cigar
     ‘I bought a cigar.’



[Va]
(58) ʔʌ̲ʔ ve ɲɔ̲
     1SG buy cigar
     ‘I bought a cigar.’

3.2.2 Word Order in Subordinate Clauses
Parauk and Awa have two alternative orders, VS and SV, based on the functions of the subordinate clauses in which they appear. In contrast, Va has fixed SV order whatever the clause function may be.

[Parauk]
(59) ya̲u̲ʔ ʔɤʔ (kah) ʔih maiʔ taɯʔ ʔan
     see 1SG (by) eat 2SG dish that
     ‘I saw that you were eating that dish.’

(60) sin maiʔ ʔaŋ hoik // ʔɤʔ kah ʔaŋ hu
     if 2SG NEG come 1SG also NEG go
     ‘If you will not come, I will not go either.’

[Awa]
(61) yiuʔ ʔɯʔ phrɔʔ miʔ ʔoan taɯʔ ʔoan
     see 1SG eat 2SG that dish that
     ‘I saw that you were eating that dish.’

(62) khrim miʔ ʔaŋ hoik // ʔɯʔ ʔaŋ huan
     if 2SG NEG come 1SG NEG go
     ‘If you will not come, I will not go either.’

[Va]
(63) ʔʌ̲ʔ yauʔ bʌ̲ʔ ʔi̲h ta̲u̲ʔ ʔɔ̲n
     1SG see 2SG eat dish that
     ‘I saw that you were eating that dish.’

(64) phan bʌ̲ʔ ʔaŋ ho̲ik̲ // ʔʌ̲ʔ ʔaŋ hu̲
     if 2SG NEG come 1SG NEG go
     ‘If you will not come, I will not go either.’

3.2.3 Word Order in Demonstratives and Nouns
Demonstrative modifiers follow nouns in all three languages; in addition, in Awa, two demonstrative pronouns surround the head noun.

[Parauk]
(65) krak ʔin
     buffalo this
     ‘this buffalo’

     pu̲i̲ mhɔm ʔin
     human good this
     ‘this good man’

[Awa]
(66) ʔian krak ʔian
     this buffalo this
     ‘this buffalo’

     ʔian phui hmɔm ʔian
     this human good this
     ‘this good man’

[Va]
(67) kra̲k ʔi̲n
     buffalo this
     ‘this buffalo’

     phi bo̲m ʔi̲n
     human good this
     ‘this good man’

3.2.4 Word Order in Degree Words and Adjectives
Degree modification of adjectives is expressed in different ways across the languages. In Awa, the modifier follows the verb; in Parauk and Va, it is expressed by the modal marker in front of the verb.

[Parauk]
(68) pu̲i̲ kɛt mhɔm
     human very good
     ‘very good man’

[Awa]
(69) phui hmɔm khoat
     human good very
     ‘very good man’

[Va]
(70) phi nɤŋ bo̲m
     human very good
     ‘very good man’

3.3 Areality and Language Contact
It is difficult to account for the diversity of word orders on the basis of features of the Wa languages alone. Here, the influences of the neighboring languages Shan (Tai Nuea) and Chinese are taken into consideration. Examples (71)–(76) show sentences with the same meaning as examples (56)–(64).



[Shan (Tai Nuea)]
(71) kau6 sɯ4 ja3mom5
     1SG buy cigar
     ‘I bought a cigar.’

(72) kau6 han1 maɯ2 kin6 phak7 lai4
     1SG see 2SG eat dish that
     ‘I saw that you were eating that dish.’

(73) thɯŋ1va6 maɯ2 jaŋ5 ma2 // kau6 kɔ3 jaŋ5 ka5
     if 2SG NEG come 1SG also NEG go
     ‘If you will not come, I will not go either.’

[Chinese]
(74) wŏ măi yān
     1SG buy cigar
     ‘I bought a cigar.’

(75) wŏ kànjiàn nĭ chī nà dào cài
     1SG see 2SG eat that CLF dish
     ‘I saw that you were eating that dish.’

(76) rúguŏ nĭ bù lái // wŏ yě bú qù
     if 2SG NEG come 1SG also NEG go
     ‘If you will not come, I will not go either.’

Shan (Tai Nuea) has consistent SV order, which is the same as that of Va. Chinese also shows the same word order except for the order of numeral and noun. As mentioned in Section 3.1, the three dialects have different sociolinguistic backgrounds; Va was very likely the earliest group to come into contact with the prestigious Tai-Kadai and/or Han-Chinese cultures. Parauk came next after Va, and Awa was the last group to come into contact with outside cultures. This suggests one possible reason for the variation in word order, namely that the Wa languages were influenced by neighboring SV languages. Va may have been influenced the most, thereby changing to consistent SV order.

4



Parauk has two word orders, VS and SV, which are not always interchangeable. Based on our synchronic analysis, it can be suggested that the uses of SV are more limited than those of VS. In main clauses, VS functions as the syntactic default order, while SV functions as the pragmatically marked order. The first element of the SV word order is probably the result of argument topicalization (or other pragmatic factors). In subordinate clauses, word order is fixed. VS is predominant, and SV is restricted to conditional clauses and to complement clauses conditioned by the type of main verb. Comparisons with other Wa languages, Awa and Va, yield further suggestions for our analysis of Parauk above. Historically, the Awa-speaking people have been regarded as the most isolated of the three, while the Va-speaking people came into contact with neighboring Tai-Kadai and/or Han-Chinese peoples at a relatively early stage. Some variation in word order can be seen between the three languages. Parauk and Awa have default VS order and use SV depending on syntactic and pragmatic factors. Awa is characterized as the most consistent of the three in the use of head-initial constructions, while Parauk lacks them only in degree words modifying adjectives (Degree-Adjectives). Va shows consistent SV order and Degree-Adjectives constructions. It is difficult to draw diachronic conclusions at this stage, but sociolinguistic contexts imply that the VS order of Awa and Parauk is older and more indigenous, while the SV order of Va has been influenced by contact with neighboring SV languages. According to Shintani (2016), Siam (Hsem), a Wa language spoken by an ethnic group in Northern Thailand which is strongly influenced by Tai-Kadai languages, also seems to be consistent in using SV order. The diachronic changes analyzed in this paper will support the discussion of predicate-initial structures in the proto-Austroasiatic language (see Jenny 2015 and Jenny in this volume). Parauk can be conjecturally analyzed as being in an intermediate stage of change from a VS language to an SV language, where SV order is presently rather restricted in use. Of course, this remains a matter of speculation, and more detailed and comprehensive analysis must be done in future research.



References

Aikhenvald, Alexandra Y. & R.M.W. Dixon. 2006. Serial Verb Constructions: A Crosslinguistic Typology, Oxford: Oxford University Press.
Diffloth, Gérald. 1980. The Wa Languages. Berkeley: Department of Linguistics, University of California.
Ethnologue. 2017. Languages of the World, 20th Edition, SIL International, https://www
Huang, TongYuan & JingLiu Wang. 1994. A Brief Sketch of the Wa Language (in Chinese). In Wang, JingLiu, HuaPeng Zhang & YuFen Xiao (eds). Study of the Wa Language (in Chinese), 1–38. Kunming: Yunnan Minzuchubanshe.
Jenny, Mathias. 2015. Syntactic diversity and change in Austroasiatic languages. In Viti, Carlotta (ed.) Perspectives on historical syntax, Amsterdam/Philadelphia: John Benjamins, 317–340.
Jenny, Mathias, Tobias Weber & Rachel Weymuth. 2015. The Austroasiatic Languages: A Typological Overview, In Jenny, Mathias & Paul Sidwell (eds.) The Handbook of Austroasiatic Languages, Volume 1, 13–143. Leiden/Boston: Brill.
Ma, SengMai. 2012. A Descriptive Grammar of Wa, M.A. Thesis, Payap University.
Schiller, Eric. 1985. An (initially) surprising Wa language and Mon-Khmer word order, UCWIPL 1, 104–119.
Shintani, Tadahiko. 2016. The Siam (Hsem) Language, Linguistic Survey of Tay Cultural Area No. 107. Tokyo: The Research Institute for Languages and Cultures of Asia and Africa.
Sidwell, Paul. 2015. Austroasiatic Classification, In Jenny, Mathias & Paul Sidwell (eds.) The Handbook of Austroasiatic Languages, Volume 1, 144–220. Leiden/Boston: Brill.
Xiao, ZeGong. 1981. Word order of subject and predicate in Wa (in Chinese). Minority Languages of China 2.
Yamada, Atsushi. 2006. Serial Verb Constructions in Parauk Wa: with Special Reference to Constructions with Two Volitional Verbs (in Chinese). Studies of Eastern Eurasian languages, 222–233. Tokyo: Kohbun.
Yamada, Atsushi. 2007. Parauk Wa Folktales. Tokyo: The Research Institute for Languages and Cultures of Asia and Africa.
Yamada, Atsushi. 2008. Descriptive Grammar of Parauk Wa (in Japanese). Ph.D. Dissertation, Hokkaido University.
Yamada, Atsushi. 2009. Memories of Simganglih: Oral Traditions of the Wa in Yunnan Province, China (in Japanese). Tokyo: Yuzankaku.
Yan, QiXiang. 1987. Word order in Wa (in Chinese). Studies in Language and Linguistics 1.
Watkins, Justin. 2002. The Phonetics of Wa: Experimental Phonetics and Phonology, Orthography and Sociolinguistics. Canberra: Pacific Linguistics.
Zhao, FuRong & GuoQing Chen. 2006. A basic Course of the Wa language (in Chinese). Beijing: Central Nationalities University of China.
Zhao, YanShe & FuHe Zhao. 1998. Grammar of the Wa Language (in Chinese). Kunming: Yunnan Minzuchubanshe.
Zhou, ZhiZhi & QiXiang Yan. 1984. A brief Description of the Wa language (in Chinese). Beijing: Minzuchubanshe.

Part 3 Munda

Chapter 6

Proto-Munda Prosody, Morphotactics and Morphosyntax in South Asian and Austroasiatic Contexts
Gregory D.S. Anderson


Introduction and Overview*

The Munda languages appear to be perfect laboratories for the study of mutual influences in South Asia. Some features of originally non-Munda origin appear to have entered the Munda languages at various times, through borrowing or through (partial) structural accommodation leading to ‘metatypy’ (Ross 2007). As the comparative study of the Munda languages, independently and within Austroasiatic broadly, remains in its infancy, much remains preliminary and impressionistic (Pinnow 1959, 1960, 1963, 1966, Anderson 2004, Sidwell and Rau 2015). Further, since Munda languages have their linguistic cousins and origins to the east, they also have features reflecting shared history with various language groups of Southeast Asia at an earlier period in their history. It has been traditionally assumed that Munda accrued various features that it originally lacked from different South Asian sources, assuming also that its sister languages (or some of them) retain a more pure/original state. The exact nature and range of such varied areal and temporally defined influences, however, have not yet been adequately investigated. The history of Munda languages remains elusive. But investigators of South Asian languages have invoked, and continue to invoke, contact with, or influence from (or on), ‘Munda’ to ‘explain’

* Thanks to Opino Gomango, Dr. Bikram Jora and Dr. Luke Horo for assistance in the Munda Languages Initiative and also to the National Endowment for the Humanities for grant PD5002513 “Documentation of Hill Gtaʔ, an endangered Munda language of India”, the National Science Foundation for award 1500092 “Documentation of Gutob, an endangered Munda language of India”, award 1844532 “Sora typological characteristics: Towards a Reevaluation of South Asian Human History” and award 0853877 “Documentation of Remo (Bonda)”, the Genographic Legacy Fund grant for the “Ho Talking Dictionary”, and to Ironbound Films for in part making work possible on Sora, Remo, Juang, Santali, and Ho during filming of The Linguists. Other work on the following Munda languages was made possible under occasional funding to Living Tongues’ Munda Languages Initiative: Bhumij, Birhor, Gtaʔ, Remo, Gutob, Gorum, Juray, Sora, Korku, Santali, Kharia, Juang, Keraʔ Mundari and Tamaɽia Mundari.

© koninklijke brill nv, leiden, 2020 | doi:10.1163/9789004425606_008



a whole range of unexpected lexical or structural features or elements in non-Munda language groups of South Asia, without any basis for identifying what is old or what is new in Munda, or any other means of scientifically assessing such claims. Such studies include Bloch (1934), who attributed aspiration in Indo-Aryan to Munda influence, and Kuipers (1948, 1965), who attributed expressive reduplication and hundreds of lexemes in Indo-Aryan to Munda, subsequently largely refuted by Osada (2005). Other largely fanciful investigations are Witzel (2003) on ‘para-Munda’ vocabulary and Witzel (2009 [2012: 52]) on putative Sanskrit loans from Kharia. Another no longer viable conclusion in this realm comes from studies on the rise of object encoding in the verb in South Central Dravidian under alleged Munda influence (Burrow and Bhattacharya 1970, Israel 1979), a claim refuted by Steever (1993). Although neat, simple and appealing, claims that Munda languages show structures that are “opposite at every level” (Donegan and Stampe 2004: 3, 5) from their related sister languages (Donegan and Stampe 2004, Donegan 1993, and Stampe and Donegan 1983) represent an overgeneralization and a significant misrepresentation of the actual facts. A systematic comparative and synchronic study of the Munda language family has been underway since 2005 to help determine how exactly to situate the Munda languages within their broader South Asian areal context. It also seeks to determine the precise nature of their relationship to other Austroasiatic language groups, most of which themselves in turn now constitute core members of the Southeast Asian areal linguistic complex. Munda linguistic features examined here are thus situated in their broader South Asian and Southeast Asian comparative contexts. This includes features relating to the phonological and prosodic structure of Munda words and phrases, the nature of their lexica and nominal and verbal derivational systems (Anderson 2015a), and features of their inflectional and morphosyntactic systems (Anderson 2007, 2016). With this, we can begin to unravel and explain the complex layering of historical influences that these languages reveal. All Munda languages are typically SOV and proto-Munda was likely SOV too, with V Aux structure that gave rise to the verbal complexes. But there are postverbal slots generally filled by (pro)nominal subjects and topics, such that VS order is also found, and this constituent order is likely old in the AA language family (Jenny et al. 2015, Jenny this volume). There are both prefixes and suffixes (and infixes) that are morphotactically distinct from proclitics and enclitics in different individual Munda languages, but proto-Munda may have had only inflectional clitics (e.g., subject proclitics), except possibly for a single post-verb root slot for either TAM/(I)TR or OBJ suffixes. This requires more research into the actual morphophonology of these elements in the languages that preserve



them. Various relict prefixes found in Munda lexical items can be found in relict forms in other Austroasiatic branches, as can the putative case prefix on speech act participants. Noun incorporation was also found. Word-level prosody in proto-Munda was likely *[Weak-Strong]Word, while phrasal level prosody has become [Strong-Weak]Phrase in many Munda languages (Anderson 2015b, Ring and Anderson 2017). In short, proto-Munda, despite its verb-final syntax, otherwise seems very Austroasiatic indeed. With respect to the classification of the Munda languages, South Munda per se does not seem to exist as a valid taxon (Anderson 2012, 2015, 2016; Sidwell 2015a). Rather, this alleged group is at best a quasi-areal grouping of non-taxonomic status. Even its areality is questionable, as most areas where Kharia is spoken are dominated by Kherwarian North Munda speakers (or Indo-Aryan or Dravidian-speaking Kurukh Oraons). Both lexical and grammatical innovations support the North Munda branch, and each of its daughters, Proto-Kherwarian and Proto-Korku. Proto-Sora-Juray-Gorum and Proto-Gutob-Remo are likewise defensible taxa, but the remaining three languages form no higher-level taxon with other members, i.e., Proto-Juang, Proto-Kharia, and Proto-Gtaʔ (Figure 6.1). Lexically, considering each of these groups to be separate also appears to be justifiable, a conclusion reached independently by Sidwell (2015a) as well. Grammatical data gives more or less the same results with some notable exceptions. Juang has maintained a number of archaic features, and appears to have been on an independent path of development for some time. At some later, as yet undetermined historical point, (Proto-)Gtaʔ came under the influence of Proto-Gutob-Remo. Or perhaps the influence came more recently from just Remo itself, and various lexical doublets have arisen in Gtaʔ: one looks more Remo-like and thus likely originated as a borrowing, and one has an original Gtaʔ origin, e.g. nti versus tti ‘hand’ (cf. Remo titi), or nsig versus nsoʔ ‘banana’ (cf. Remo nsuɽaʔ > nsuaʔ > nsoʔ), etc. Note that these doublets crosscut the recognized dialect divisions of the language (Plains vs. Hill). In some Hill Gtaʔ villages, such Remo-like forms are often considered ‘women’s speech’ even as we have recorded men saying them. Conversely, grammatically speaking, Proto-Gutob-Remo and Proto-Kharia shared a number of innovations that suggest a period of common development. This includes the fundamental opposition between two verb classes, roughly active/middle or transitive/intransitive, each differentiated by formally cognate TAM markers encoding perfectivity or past tense. Such locally shared developments/inter-influencing in seemingly already quasi-differentiated groups reflect a situation that has been described as a proto-language linkage by Ross (1988, 1997), recently discussed by François (2014).



Figure 6.1 Classification of the Munda languages using lexical and grammatical data

The ‘tree’ of Munda in Figure 6.1 should be viewed with caution and considered still preliminary and impressionistic. The same must be said of all reconstructions published to date (Anderson 2001, 2004, 2015, Sidwell and Rau 2015) and of those attempted herein.


2 On the Prosody, Morphotactics and Syntax of Munda Languages

2.1 The Fallacy of Rhythmic Holism
In a series of papers, Donegan and Stampe (1983, 2002, 2004) and Donegan (1993) have argued for a strong position in the Indospheric versus Sinospheric debate (see also Post 2011). This work has been very widely received, which makes it all the more unfortunate that much of the core of what they assert does not actually accurately reflect what the Munda languages are really like. In many ways, the notions discussed are oversimplified and anachronistic, starting with the stipulation that Munda and an undifferentiated “Mon-Khmer”1 are “systematically opposite at every level” (Donegan and Stampe 2004: 3, 5). Among this full set of oppositions that covers virtually all ‘external’ aspects of language from phonetics to syntax, they enumerated the following (Donegan and Stampe 2004: 3):

(1) Domain        Munda/Indosphere                  Mon-Khmer/Sinosphere
    Grammar       “Synthetic”            versus     “Analytic”
    Words         Falling (Trochaic)     versus     Rising (Iambic)
    Consonants    Stable/Assimilative    versus     Shifting/Dissimilative
    Vowels        Harmonizing/Stable     versus     Reducing/Diphthongizing

1 Such a dichotomy implying a primary split between Munda and all other Austroasiatic groups in a so-called Mon-Khmer taxon is not widely held to be true in Austroasiatic linguistics today, neither the Munda: non-Munda split, nor the term ‘Mon-Khmer’.

Each of these claims is addressed in brief below. I will not belabor the lack of methodological soundness in the parameter of ‘synthesis’ versus ‘analysis’: There are many degrees of phonological integration and cross-constructional variation both within individual languages and within taxonomic units. This includes both Munda and other branches of Austroasiatic, as well as many other taxa from around the world. Many researchers have pointed out these facts before. However, given the impact that Donegan and Stampe (2004) continues to have both within Austroasiatic studies and more broadly, it is worth mentioning that some degree of synthesis in individual constructions has been reported for many so-called ‘Mon-Khmer’ branches, including Nicobarese (Radhakrishnan 1981), Aslian (Omar 1975, Benjamin 1976, 2011, Burenhult 2002, Matisoff 2003, Kruspe 2004), Khasian (Nagaraja 1993), Palaungic (Milne 1921), Khmuic (Svantesson 1983), Mangic (Li 1996, Li and Lou 2015), Khmeric (Bauer 1986, You Sey 1976, Schiller 1994, Thomas 1990, Jenner and Sou 1982), Monic (Jenny 2003, 2005, Bauer 1982, 1989), Katuic (Watson 1964, 1966; R. Watson 2011, Solntseva 1996, Costello 1966, 1998, 2001, Bauer 1990, Alves 2004), Bahnaric (Smith 1969, 1973, Gradin 1976, D. Thomas 1969, Bauer 1987–1988) and even Vietic (Enfield and Diffloth 2009). Indeed, as one might expect, there appears to be a cline: the more peripheral the taxon is within Austroasiatic, whether to the west, south or north, the more frequently one encounters traces, or active systems, of morphology. This is more or less explicitly stated by Alves (2013, 2014, 2015),2 and is at least in line with the spirit of Sidwell (2008). It is of course precisely on the periphery that one

2 There are of course also zones of innovations not only in the central zone but also in the peripheral areas as well (Alves 2015, personal communication).



expects to see archaic retentions in contrast with an innovative central zone in large taxa like Austroasiatic. This has already been known to be characteristic of residual zones in contrast to spread zones as discussed by Nichols (1992). Such developments are also entirely in accord with so-called wave theories of change in historical linguistic analysis. So we see a cline of degrees or range of synthetic and analytic structures both within Munda and Austroasiatic as a whole, and thus clear-cut distinctions of Munda/Indospheric/Synthetic and Mon-Khmer/Sinospheric/Analytic are overstated and indeed misleading. Given that Munda languages represent one taxon on an equal footing with every other one in Austroasiatic, it makes little sense methodologically to categorically exclude everything in Munda as secondary. Outmoded views of the phylogeny of Austroasiatic permeate the discussion in Donegan and Stampe (2004). Many recent papers on Austroasiatic classification (e.g. Sidwell 2015a, 2009, 2013) have rejected this anachronistic notion. But it nevertheless continues to resonate as a subtext in many contemporary studies on comparative Austroasiatic. I will not offer a detailed critique here of the claims that consonants and vowels are both stable in the Munda languages, nor that consonants are assimilative and vowels harmonizing typically, as has been claimed by Donegan and Stampe (2004: 3). Once again, Gtaʔ stands out in this regard. It shows both significant, and different, trends towards diphthongization and subsequent re-monophthongization of vocalic nuclei. It also shows considerable loss or reduction of word-final consonants, and indeed renewed fortitions in this position as well as across the various Gtaʔ lects. This can be seen in the various local instantiations of the word for ‘dog’ variably realized as gsu(ʔ), gəsu(ʔ), gusu(ʔ), gswi(ʔ), gsuj(ʔ), gswij(ʔ) and gsujg. This point is examined in detail with respect to the historical phonology of Gtaʔ in Anderson (in preparation). Armed with such a powerful explanatory mechanism, it is no wonder that no one felt the need to look into the details of the Munda languages as actually spoken or used, nor posit any alternative explanations. This appealingly comprehensive principle has also easily become part of the received canon of knowledge about Austroasiatic. Indeed, it does not appear to have ever been seriously challenged or scrutinized to see if it in fact accounts for the actual data attested, prior to a preliminary attempt in Anderson (2015), although some details of this were questioned or alluded to by Sidwell (2012) and Jenny et al. (2015) as well. I expand this critique below. Obvious misstatements in Donegan and Stampe have been left to ride without ever being questioned. For example, they assert that having the same functional elements appear before or after the verb necessarily entails that the “only obvious explanation … is that those elements were still free forms” (Donegan



and Stampe 2004: 8). One need only look within Munda itself, at the behavior of Kherwarian subject clitics, to know that this does not have to be the case. These synchronically vary in position as pre-verbal or post-verbal and they are quite clearly clitics that do not bear independent stress. Another unsubstantiated claim concerns the word rhythm of Turkic languages (2004: 17), since what they claim is demonstrably false for most Siberian Turkic languages, e.g. Xakas (Anderson 1998) or Tuvan (Anderson and Harrison 1999), and indeed for many varieties of Republican Turkish as well. Other demonstrably false statements they make include (2004: 10) that head-last structure necessitates synthesis (since many languages in Africa and Papua New Guinea are head-last and nearly isolating). They also claim that Sora (2004: 12) has no “foreign” phonemes when it clearly has phonemic retroflex ɽ (Arsenault 2012, 2016; Anderson and Harrison 2008b). Note that this phoneme is not included in the phonemic inventory of either the initial Ci or final Cf consonantism (nor medially either) of Proto-Austroasiatic (Sidwell and Rau 2015). Yet another unsupported claim is that (2004: 19) “(i)n Mon-Khmer, new affixes are prefixed; in Munda, they are suffixed”. Even the very data they introduce in their first example on the first page of their study disproves this, since the new objective marker dɔŋ is a prefix with pronominals. Note that precisely this distributional pattern characterized the case marker it replaced (see section 7). So, as is often the case in historical morphosyntax, the marker of the functional category can get renewed, but the prosodic structure or morphophonological distributional (or morphotactic) structure has been maintained or preserved.

So this brings us to perhaps the most enduring myth of Munda languages listed in (1) and one that has only recently begun to be questioned seriously: the uncritical acceptance of rhythmic holism, and the corresponding fallacy that Munda languages have falling word rhythms and have become just like the other core genetic units of mainland India, viz. Indo-Aryan and Dravidian. As Donegan and Stampe (2004: 5) state:

Munda and other South Asian languages have falling phrase rhythms (as in nóun+pòstposition) and, excepting some Indo-Aryan languages, also falling word rhythms (as in báse+sùffix). Mon-Khmer and South-East Asian languages have rising phrase rhythms (as in prèposition+nóun) and rising word rhythms (as in prèfix+báse).

They contrast these more widely, such that one can speak of “initial versus final accent in phrases and in words” and “falling versus rising rhythms” in the language as a whole. It is this overly powerful and indiscriminately applied



mechanism that accounts for the highly un-Austroasiatic and altered formations they described as “highly synthetic structures” in the Munda languages. Before I turn to how Proto-Munda may have appeared morphosyntactically, morphotactically, morphophonologically and prosodically, and where those structures fit in within the typology of Austroasiatic, I first discuss the nature of the prosody of phonological words and the concept of phonological versus morphological words as applied to the Munda languages. This is at the very core of the claims of Donegan and Stampe. Most revealingly, there is no support for these same claims when put under scrutiny. While Donegan and Stampe claim that all Munda languages have falling (trochaic) word structure, the actual data from many Munda languages does not support this. Ghosh (2008: 30) described the fixed second syllable stress of Santali. Weak-strong or iambic word prosody has been ascribed to Mundari (Osada 2008), Kharia (Peterson 2008, 2011), Remo (Anderson and Harrison 2008a), Gtaʔ (Anderson 2008, in preparation) or indeed Sora itself (Horo 2017, Horo and Sarmah 2015). Indeed, many Munda languages have maximally three-syllable phonological words, often with one extrametrical syllable. Thus, while sesquisyllabic structure no longer predominates except in some Gtaʔ varieties, the weak-strong word prosody it reflects endures strongly. Rhythmic holism is difficult to maintain in the face of these facts. Precise phonetic analyses of Munda languages such as Sora clearly refute the claims of an all-pervasive, nongradient rhythmic holism. It is very important to make a distinction, as Donegan and Stampe (2004) and everyone who relies on them have not done, between word and phrasal prosody in the Munda languages. This entails a distinction between prosodic phonological words versus prosodic morphological words (or phrases). Once one recognizes the level of prosodic phonological word in most Munda languages as distinct from morphological word, then the dichotomies or distinctions predicted by rhythmic holism between “Mon-Khmer” and Munda become less distinct, although of course clause-level constituent order nevertheless remains quite distinct. Such syntactic variation can arise in even relatively shallow branches of Austroasiatic, however, as demonstrated by the very distinct syntactic structures found in Khasian, where one finds standard Khasi SVO (Nagaraja 2015) but Pnar VSO (Ring 2015).3 To exemplify the concept of prosodic morphological word versus the phonological word, let us examine the following form from Santali: əguke’tkoae ‘he

3 Daladier’s (2011) somewhat fanciful model of the history of the Austroasiatic linguistic colonization of Meghalaya notwithstanding.

proto-munda prosody, morphotactics and morphosyntax


brought them’. At first glance this appears to be an example of the highly synthetic structures developed by the shift in rhythmic holism to falling rhythm. In the Santali grammatical tradition codified by Bodding (1922, 1929), this form has been analyzed as a single complex six-syllable word:

(2) Santali
əgu-ke-’t-ko-a-e
bring-tr.pfv-3pl.obj-ind-3sbj
‘He brought them.’

Speakers, however, actually say this string as in (3); in this notation, []pω surrounds the phonological words, which all have a weak-strong pattern, and these (two or)4 three phonological words have cliticized together into a morphological phrase-word, marked {}mω.

(3) {[əgú]pω=[ke-’t-kó]pω=[a=è](pω)}mω

Taking a hypothetical four-syllable sequence as an example, combining two two-syllable phonological words into a four-syllable morphological phrase-word can follow four logical paths. The phonological words could be left-headed or right-headed, as can the morphological phrase-words. Many Munda languages, contra Donegan and Stampe (2004), retain iambic phonological words, even if many have trochaic phrase prosody. Put differently, many Munda languages retain Austroasiatic phonological word templates and prosody, even while adopting more typical ‘South Asian’ phrasal prosody. Such a circumstance has led to the mismatch between phonological and morphological words that characterizes many Munda languages today. This may well have been characteristic of various intermediate proto-language levels or even of Proto-Munda itself. Formally speaking, phonological words are created by assigning a foot structure to syllabic sequences. In principle, this foot structure, i.e., the phonological prosodic word, can be trochaic (strong-weak) or iambic (weak-strong). In the case of many Munda languages, it is actually the iambic pattern that predominates at this level of phonological prosodic word. Such a pattern characterizes Kharia (Peterson 2008, 2011), Remo (Anderson and Harrison 2008a),

4 There is variation among speakers as to the status of the final two vowels in such sequences: whether they should be treated as metrically strong ‘prosodic’ words or as extrametrical parts of the preceding ‘word’.



Gtaʔ (Anderson in preparation), and indeed is also characteristic of many of the Kherwarian languages (Anderson and Jora 2016) even today. Thus, starting with a putative four-syllable sequence like a.ka.da.na, the following logical possibilities arise for deriving prosodic/phonological words and morphological words/phonological phrases, as in (4):

(4) Assign prosodic word foot structure ⟨R-headed (iambic), L-headed (trochaic)⟩

Left-headed [á.ka]pω [dá.na]pω
á. ka dá. na
* – * –

Right-headed [a.ká]pω [da.ná]pω
a. ká da. ná
– * – *

Larger units acquire macro-foot structure similarly, and the word and the phrase can each be either left-headed or right-headed, yielding four logical combinations. Consistency between the phonological word level and the phonological phrase or morphological word level would yield initial or final stress (for consistently left-headed or trochaic, or consistently right-headed or iambic, word and phrase prosody). In other words, the foot structure of the word and phrase match, as in (5):

(5) {[Left-headed]pω+[Left-headed]pω}mω
{[Á.kà]pω [dá.nà]pω}mω
Á. kà dá. nà
* – * –
** – * –
phrase/mω-initial stress

{[Right-headed]pω+[Right-headed]pω}mω
{[à.ká]pω [dà.nÁ]pω}mω
à. ká dà. nÁ
– * – *
– * – **
phrase/mω-final stress

Mismatches between the prominence pattern or the prosodic foot structure of phonological words versus morphological words or phrases yield other patterns, as in (6):

(6) {[Left-headed]pω+[Right-headed]}mω
{[á.kà]pω [dÁ.nà]pω}mω
á. kà dÁ. nà
* – * –
* – ** –
penultimate stress

{[Right-headed]pω+[Left-headed]}mω
{[à.kÁ]pω [dà.ná]pω}mω
à. kÁ dà. ná
– * – *
– ** – *
second-position stress

mω = morphological word (m-Word); pω = phonological word (p-Word)
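The four logical combinations in (5) and (6) can be sketched computationally. The following is a minimal illustration only, not part of the original analysis: it assumes strictly binary feet parsed from the left edge and a single phrase-level head, and the names `stress_grid` and `render` and the 0/1/2 prominence encoding are hypothetical choices made for this sketch.

```python
def stress_grid(syllables, word_head, phrase_head):
    """Return one prominence level per syllable.

    0 = weak, 1 = head of a phonological word, 2 = head of the whole
    morphological word/phrase. Headedness is "L" (trochaic) or "R" (iambic).
    Assumption for this sketch: feet are binary and parsed left to right.
    """
    if word_head not in ("L", "R") or phrase_head not in ("L", "R"):
        raise ValueError("headedness must be 'L' or 'R'")
    if not syllables:
        return []
    levels = [0] * len(syllables)
    heads = []
    # Step 1: group syllables into two-syllable phonological words and
    # mark the left or right syllable of each as its head.
    for start in range(0, len(syllables), 2):
        end = min(start + 2, len(syllables)) - 1
        head = start if word_head == "L" else end
        levels[head] = 1
        heads.append(head)
    # Step 2: the phrase promotes the head of its first or last word.
    main = heads[0] if phrase_head == "L" else heads[-1]
    levels[main] = 2
    return levels


def render(levels):
    """Draw a grid row: '*' per prominence level, '-' for a weak syllable."""
    return " ".join("*" * level if level else "-" for level in levels)


if __name__ == "__main__":
    sequence = ["a", "ka", "da", "na"]
    for word_head in ("L", "R"):
        for phrase_head in ("L", "R"):
            grid = stress_grid(sequence, word_head, phrase_head)
            print(word_head, phrase_head, render(grid))
```

Under these assumptions, right-headed words combined with a left-headed phrase place main prominence on the second syllable, which is the mismatch pattern shown last in (6).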

It is actually this last pattern that characterizes many Munda languages. In languages like Santali or Remo, with second-position stress, a two-syllable word has final stress. A three-syllable morphological word has second-syllable stress and typically an optionally extrametrical grammatical index in final position. And four-syllable morphological words (mω) are first parsed into phonological prosodic words (pω) in an iambic pattern:

(7) Remo
{[susúm]pω =[ɖen-t-ìŋ]pω}mω
‘I am eating.’
su. súm ɖen. tìŋ
– * – *
– ** – *

a-goi=tə-no
{[a-gói]pω =[tə-nò]pω}mω
‘You are not dying.’
a. gói tə. nò
– * – *
– ** – *
{[Right-headed]pω +[Left-headed]}mω

In even longer morphological words, the pattern is replicated. Take a Santali sentence like the one in (8):

(8) Santali
(iŋ) gidra {[ba=iŋ]pω [arub]pω-[aka-d]pω-[ko-a]pω}mω
(1sg.pron)5 baby neg=1sbj wash-perf-tr-3pl.obj-ind
‘I haven’t washed the babies.’

Syntactically such a sentence has the structure of S → NP⟨sbj⟩ VP; VP → NP⟨obj⟩ V’; V’ → neg=sbj Verb=tam-(i)tr=obj=fin. This large morphological verbal complex consists of four phonological words. Each retains a large degree of the historical input syntax, within a predicate nucleus that has moved to clause-final position. But it has taken with it almost the entire original linear syntax of its operators, tied in a clitic string with the semantic head (verb). That is, these Munda verbal complexes suggest earlier syntactic phrasal structures. The only really unusual aspect of this structure from an Austroasiatic perspective is the post-verbal tam operator, but a number of Austroasiatic languages actually utilize such constructions, see section 6 below.6 Otherwise, this structure of the phrasal verb or morphological verb is reasonable in a historical Austroasiatic light, just with the elements localized in a clitic string that has shifted to clause-final position. Another problem is that what Donegan and Stampe assert about Sora is not true of Sora either. Based on a series of phonetic studies, Horo and Sarmah have

5 There are nouns and pronouns in Santali and other Kherwarian languages that seem to violate the minimal word constraint. In actual pronunciation many such words have phonetically long vowels or diphthongs, and thus are usually only apparent violations.

6 Note that the declarative/indicative marker was clearly an innovation in Proto-North-Munda and does not project back farther than that stage.



in a number of papers (2013, 2014, 2015) significantly dismantled several claims about the structure of words and the vowel system of Sora made by Donegan and Stampe. They demonstrated that Assam Sora in fact has word prosody very similar to that of the Austroasiatic languages of MSEA. Based on their instrumental studies, Horo and Sarmah (2015: 78) have determined that “vowels (in Assam Sora) in the first syllables are more centralized” and “vowels in the second syllable are more representative of the canonical vowel space”. They go on to demonstrate that (2015: 80) “(t)he first syllable has statistically significant lower f0 and maximum f0 than the second syllable”. They also show that (2015: 82) “(t)he vowel space in initial syllables is reduced. … the average f0 and maximum f0 of the second syllables is higher”. And they further demonstrate “that in V.CVC words, even though the vowel in the second syllable is in a closed syllable, the vowel in the first syllable is still significantly shorter than the vowel in the second syllable” (Horo and Sarmah 2015: 79). All told, this amounts to the fact that (2015: 82) “the second syllable is stressed in a disyllabic word in Assam Sora, characterized by greater pitch, longer duration, and by change in vowel quality … (and that) the second syllable displays higher f0 and duration of the vowel … suggest greater prominence.” It is clear that the actual phonetic facts do not support the previous assertions about the falling word prosody of Sora and, by extension, Munda as a whole. This therefore calls into serious question (or indeed refutes) the central assertion of Donegan and Stampe’s (2004) thesis. Thus, Assam Sora appears to conform to old phonological word prosody. Horo (2017) demonstrates that the same holds for Odisha Sora too. This supports the observation, made in a very different context by Brunelle and Pittayaporn (2012: 426), “that language contact does not inevitably lead to changes in word shapes”.
In case one might think that Assam Sora is anomalous in this regard, even central Indian Sora of Odisha does not fit the model that they laid out. Horo and Sarmah (2015) and Horo (2017) demonstrated using instrumental phonetic data that this is true of both Assam Sora and Odisha Sora alike. Word prosody is iambic in Sora of Odisha (Horo 2017). Donegan and Stampe also claim that Sora has a phonemic inventory of nine vowels. Horo (2017) shows that Sora has six phonemic vowels, the five cardinal vowels and the central vowel [ə]. It is of course possible that the vowel system of Sora is in flux and has reduced the number of central vowels to one, but the data do not suggest this. This makes Sora seem more in line with other southern Munda languages, such as Gtaʔ.7

7 Gtaʔ also has a six-vowel system and phonetically a central to front vowel varying in pronunciation between [ə] ~ [ɨ] and even ~ [ɛ] ~ [æ].



Psycholinguistic intuitions and intonational patterns of Sora speakers do not always accord with the view of the language presented in Donegan and Stampe. So, testing the very forms offered from Sora in that paper, we got the following response to the first complex word presented, reproduced as (9). Here I first give the form in an undifferentiated sequence as I presented it to the speaker, who is literate in both IPA and Ramamurti’s orthography and has studied linguistics.

(9) Sora
ədməltijdariɲdae
əd-məl-tij-dar-iɲ-da-e
neg-des-give-rice-1.obj-aux-3sbj
‘He does not want to give me rice.’

The meaning was understood, and he acknowledged the archaic nature of the form (“this is how old people speak, not people today”), but he first wanted to add the subject pronoun anin and rejected the form without it. Next, when asked how many words this was (it was written as one), he first said one and then rejected or self-corrected this, subsequently proceeding to repeat it as in (10):

(10) Sora
anin {[əb-mə́l]pω =[(tijg)-dar-íɲ]pω =[dá-j]pω}mω
3sg.pron neg-des =give-rice-1.obj =aux-3sbj
‘He does not want to give me rice.’

That is, he decomposed this morphological verb word into three phonological words. He sees the negative and desiderative as forming one unit, the verb stem + incorporated object + pronominal object as another unit, and the clause-level tam and subject operator sequence as a third unit. Note that the vowel in this last element, /e/, coalesced with the preceding /a/ to form a diphthong [–aj] in this last unit. This is despite the fact that one does not expect diphthongization here, or in Munda generally, or so we have been led to believe. This interpretation offered by the speaker is not surprising given that the morphology involved here belongs to different historical layers, with the negative, object marker and subject markers belonging to an older layer than either the desiderative or tam auxiliary.
However, similar patterns emerge even with elements that predominantly belong to the oldest layer in Munda. Take another of Donegan and Stampe’s (2004) examples.



(11) Sora
əədnəlgəbrɔɉlaj
ə-ədn-əl-gə⟨b⟩rɔɉ-l-aj
1pl-neg-recp-shame⟨caus⟩shame-pst-1sbj
‘We did not shame each other.’

Native speaker intuition, and indeed prosodic word phonology, suggest a psycholinguistic reality of three separate phonological words in this morphological word or phrase sequence as well. In saying this, the speaker reduced the initial sequence to a short vowel.

(12) Sora
{[(ə)-ədn-ə́l]pω -[gə⟨b⟩róɉ]pω -[l-áj]pω}mω
(1pl)-neg-recp shame⟨caus⟩shame pst-1sbj
‘We did not shame each other.’

There appear to be some interactions among stress assignment, word structure, and syllable weight or morphemic structure (Horo and Anderson 2019), and also variation across Sora varieties, so much of the system remains to be fully worked out. In fact, Sora dialects are very under-described, and a Proto-Sora-Juray system will be necessary to fully inform further and higher comparative studies. This work is ongoing. Several things are noteworthy about (10) and (12), apart from their very clearly showing prosodic word phonology variants much more reminiscent of the typical Austroasiatic patterns categorically excluded by Donegan and Stampe for Sora. The subject prefix/proclitic, the negative and the reciprocal are all old elements in Munda, as are the causative infix and the l-initial perfective/anterior series tam element. Only the first person-cum-cislocative marker -aj is a secondary innovation in Sora; all other elements have parallels in other Munda languages.8 While the reciprocal, subject marker and negative have been traditionally analyzed as prefixes, the cognate reciprocal element is considered to be a free-standing pre-verbal element in Kharia by Peterson (2011). The three cognate elements in Gtaʔ (the subject markers, the negative and the reciprocal) all occur in the same order as in Sora.
In Gtaʔ these vary in their morphophonological behavior between being included in the phonological word or behaving more like extrametrical proclitics, depending on their immediate phonological host/context (Anderson in preparation).

8 And -aj has parallels in the closely related Gorum.



A full study of phonological and morphological words of Sora lies beyond the scope of the present study; this is the object of ongoing work (Anderson, Horo and Gomango in preparation). However, what we can say is that Proto-Munda seems to have been OV syntactically while still retaining iambic phonological word prosody. However, morphological word and phrasal prosody could assign stress or prominence to a syllable other than the second syllable of a word, even if that syllable is prominent or stressed in isolation. One is struck by the similarity of this putative Proto-Munda state, which endures to the present in languages as diverse as Santali, Kharia, Remo and Gtaʔ, to Sidwell’s assessment of Car Nicobarese (2015b: 1232): “morphemes are typically mono- or disyllabic and primary stress usually falls on the second syllable in a disyllabic morpheme, although at word level primary stress may fall on the ultima, penultimate or pre-penultimate syllables.”

2.2 Proto-Munda Syntax and Morphosyntax in Austroasiatic Comparative Light

The functional elements in other non-Munda Austroasiatic languages do not appear to be that different from what likely existed in Proto-Munda (see below), though they do differ as to whether they appeared before or after the predicate satellite or verb. The drift to post-verbal functional elements in languages with OV dominant word order is of course to be expected, so an increase in this in Munda is ‘natural’. But what did it shift from? To be sure, Proto-Austroasiatic may have had verb-initial order or VSO typology with, as is typical of verb-initial languages, an SVO variant (Jenny et al. 2015a, 2015b, 2016). Verb-noun compounds found in the Munda languages of Odisha, with cognates in other languages in the east, are internal evidence that Munda languages too were most likely of this type originally. Indeed such an optional order is still found in Gtaʔ narratives (Anderson in preparation) more than one would expect in an OV language from Eurasia. Conversely, recent work on comparative Austroasiatic syntax (e.g. Ring et al. 2019) shows evidence for apparent verb-final SV order in intransitive clauses in a number of non-Munda AA languages. However, the comparative method compels us to reconstruct dominant OV order for the Proto-Munda level. Various pre-verbal functional tam clitics may have originally had a variable distribution like that of the Kherwarian subject clitics, optionally before or after the host. The completive/perfective marker *la may have already been post-verbal in pre-Proto-Munda. Functionally similar elements often are post-verbal even in VO serializing languages (Anderson 2006) and in other Austroasiatic languages as well (not the formal element but its functional analog). Note that such a distribution is quite common, for example, across a number of different genetic units of Africa (Anderson



2011). In fact, this is expected based not only on its serial origin but on general iconicity principles operative in many Southeast Asian languages (Jenny, p.c.). Such a formation could have helped draw other functionally similar elements into this position (see section 6 below for examples). Proto-Munda was probably an accusatively aligned language with a primary object (Dryer 1986) patterning of both verbal argument encoding and NP case marking.9 The Proto-Munda clause therefore looked something like (13).

(13) NP⟨sbj⟩ [*(Ɂ)a1/2.obj=]NP⟨obj⟩ 1/2.sbj=neg=Verb.Stem=⟨tam⟩=obj=[3.sbj]

Note that such structures are far from being significant deviations from putative ancestral Austroasiatic structures. The restructuring or adding of some of the inflectional elements, such as the use of prosodically weak resumptive pronominal clitics and of tam (and negative) markers deriving from serialized verbs, is completely in line with a language that has undergone a shift from a VSO or SVO language to an OV one. Some examples from other AA languages and Munda are offered in sections below. So syntactically at the clausal level, there has been a significant restructuring, but the result is not as radically divergent from Austroasiatic sources as has been previously believed. To be sure, many of these Proto-Munda features likely have direct parallels in other branches, only with inflectional or functional clitics behaving in a manner consistent with the VSO (14) or SVO (15) dominant clausal constituent order.

(14) Proto-Austroasiatic VSO clausal constituent variant?
neg ⟨tam⟩ Verb.Stem NP⟨sbj⟩ [*(Ɂ)a1/2.obj=]NP⟨obj⟩

(15) Proto-Austroasiatic SVO clausal constituent variant?
NP⟨sbj⟩ neg ⟨tam⟩ Verb.Stem [*(Ɂ)a1/2.obj=]NP⟨obj⟩

Proto-Munda does have a case system at least with pronominals, and this was preserved in several of the southern Munda languages, and indeed may be preserved as well in certain unexpectedly a-initial pronominal forms in various Kherwarian languages like Ho.
For examples see section 7 below. In the following sections I give a brief and impressionistic set of phenomena found in various Austroasiatic subgroups that should be considered when

9 Juray, however, appears to be an actor-undergoer or split-S language (Anderson and Gomango 2016, 2017ms, 2019), and Sora has reflexes of this, but they appear to stand alone in the family in this regard.



attempting to find more nuanced explanations for the drift to verb-final structure that the Munda languages clearly underwent. Put differently, while it is certain that there has been a shift to OV order in Munda, it is not the case that no other Austroasiatic language or branch shows any constructions reminiscent of what is found in Munda. Thus, as with the wholesale shift to head-initial structure, to synthesis, or in rhythm, a more nuanced and gradient approach to the attested developments should be adopted. Among the features that are found in specific branches of Munda languages or the branch as a whole and that seem to have parallels in other Austroasiatic subgroups is the presence of prohibitive formations that are formally distinct from other negative systems, or of two parallel systems of negation, whatever the functional categories contrasting them are more broadly, discussed in section 3. Further features to consider are the use of subject proclitics (section 4) and reduplication to encode imperfectivity (section 5). Other features relating to verbal morphosyntax that bear mention include post-verbal aspectual and modal operators, addressed in section 6. Also included here are a few other instances of OV syntax found in individual constructions in Austroasiatic that suggest that post-nominal verbs or post-verbal operators are not as alien to Austroasiatic branches spoken outside of India as has been suggested. In section 7, I briefly examine two features of NP morphosyntax in Munda and other Austroasiatic groups, namely the use of case markers and the so-called reflexive or reflexive-possessive.


3 On Negation in Munda and in an Austroasiatic Perspective

As in many other Austroasiatic languages (Jenny et al. 2015), more than one type of formal negative scope operator is used in most of the Munda languages and their intermediate proto-languages, and most likely in Proto-Munda as well. The specific details of the functional categories involved in the opposition may vary across individual languages and subgroups.10 For example, Kherwarian North Munda languages typically contrast two series of negative markers. One is alo for prohibitive. The other is a default negator (Jora and Anderson 2017, this volume), which has several realizations, ka in Mundari and


10 The placement of subject clitics at the end of the complex in Kherwarian languages like Santali does not imply a priori that VPA order was original (Jenny et al. 2015: 38), as this appears to be a recent and secondary process in Kherwarian, despite this order typifying War varieties and Nicobarese. Of course, it may perhaps reflect an original VAP/VS order, à la Wa or Pnar (Ring 2015).



Mundari-esque lects like Ho or Bhumij, ba[n/ŋ] in Santali (as well as Korku) and me(ne)/mer(e) in Korwa. North-Munda internal comparative data favor ba[n/ŋ] as the general negator and alo as the prohibitive. The latter was lost in Korku, and the general negator ba[n/ŋ] extended into the prohibitive function there. In Mundari-Ho-Bhumij, a different negative scope operator of unknown function in the proto-language, ka, was innovated to replace the general negator ba[n/ŋ]. This ka negator, as in Ho (16), is possibly cognate with those that are found in Palaungic Danau (17) and Old Mon (18), among others.11

(16) Ho (Kherwarian)
kula sukri=ke ka=i goiʔ=ki-j=a
tiger pig=obj neg=3.sbj
‘The tiger did not kill the pig.’

(17) Palaungic Danau (Si 2015: 1114)
kɔʔ

(18) Old Mon (Jenny and McCormick 2015: 535)
kaḥ ‘not to’

The ba[n/ŋ] series in Santali (19) and Korku (20) (and proto-North Munda) in turn may have possible analogs in other Austroasiatic branches as well (21), or the putative correspondence may of course simply be chance.

(19) Santali
iŋ iɲa(Ɂ) ʤoʤom ti ba=iŋ
1sg.pron 1sg.pron:gen red~eat > right hand neg=1sbj
arub-aka-n-a
wash-prf-intr-ind
‘I didn’t wash my right hand.’

(20) Korku
kor-ku ɖusra-ku=ʈen ban munɖi
man-pl other-pl=abl neg hit
‘The men will not hit each other.’

(21) Bahnaric Kơho-Sre (Olsen 2015: 765)
baɲ proh


11 The negator ka is also found in Shwe Palaung. Thanks to an anonymous reviewer for this comment.



Like Korwa and Juang, several other Austroasiatic languages might rather reflect an m-initial negator. Some forms that might suggest this include the ones in (22)–(23).

(22) Bahnaric Bunong (Butler 2015: 739)
mo: neg preverbal

(23) Mangic Bugan (Li & Luo 2015: 1042)
mǝ neg preverbal

Possibly relevant to this same putative negative scope operator is the Standard Khasi negative clitic as well (24).

(24) Standard Khasi (Nagaraja 2015: 1177)
=m neg

It is even less clear, though, that the prohibitive in Chong (25) might belong here too.

(25) Pearic Chong (Premsrirat/Rojankul 2015: 614)
ma̰ːj proh preverbal

It is of course possible that these represent different m-initial negators in these Austroasiatic groups too. This is relevant to Munda because, in addition to the ba[n/ŋ] negator reflected in Santali and Korku (and thus likely in proto-North Munda too) and the ka negator seen in Mundari-Ho-Bhumij, there are one or two other m-negators found in preverbal position in various Munda languages as well. One of these is the negator um found in Kharia, which, as in the Kherwarian languages with which it is in contact in Jharkhand, serves as the host for subject clitics in negative conjugations (26). It is possible that this Kharia form is cognate with the negator am- in Juang as well.12

(26) Kharia
um=iɲ ter=e
neg-1sbj give=irr
‘I won’t give.’ (Peterson 2008: 463)

The other m-negator in Munda is ma-, found in Gtaʔ and in Juang. In both Juang (27) and Gtaʔ (28), the ma- negator occurs in non-finite or non-verbal forms.

12 As an anonymous reviewer pointed out, Khmu Ou and Khmu Cuang use the negator am, and ma, in the form aw V ma, occurs in Palaungic Rumai. Thanks for these comments. Such evidence might be used in support of the ‘Northern Mon-Khmer’ view by proponents of such a theory.



(27) Juang
apa a-ma-ɉim-ke ete aiɲ kikib ɉena
2du.pron 2du-neg-eat-prs because 1sg.pron red~do neg.cop
‘Because you don’t eat (it), I didn’t do it.’ (Patnaik 2008: 546)

(28) Gtaʔ
ma-bihæ=nǝ ngire
neg.attr=marry=attr
‘Unmarried young man, bachelor’ (Field Notes)

All negators in Munda enumerated above (ka, um, ba[n/ŋ]), except possibly ma- in its non-finite functions, may well have originated in serial verb constructions in a pre-Proto-Munda Austroasiatic dialect. Although this is plausible typologically and within the broader Austroasiatic context, there is no Munda-specific or Munda-internal evidence that suggests it. Thus, all negators must be simply considered negative particles (or prefixes) at all historical stages within Munda proper, attested or reconstructed. Such elements were drawn into the dependent operator functional clitic chain, usually in one of the two leftmost slots. Originally, they probably hosted the subject proclitics, an order reflected in Juang, Sora and Gtaʔ. This includes both the ma series in Juang and the one marked by a(r/d)- (see below) that has become the default or general negator in several Munda branches. This last-mentioned element shows some complex interactions with tam marking in many of them (Sora, Remo, Gutob, Gtaʔ), see below. The negative scope operator a(r/d)- is a prefix in all Munda languages that attest it. Historically, however, it appears that it might be cognate with a preverbal negator of serial origin found in Khmer and Nicobarese, ʔət (there may be some issues with the historical phonology, however); if so, it would putatively have derived from an original verb meaning ‘lack’, ‘be used up’, ‘finished’ (Shorto 2006: 274), reconstructed to Proto-Austroasiatic. Juang shows the simplest conjugational system associated with the element a(r/d)-, realized there typically as a- (29)–(32). In Juang this negative scope element simply attaches to the positive form and conveys negative scope over the tam marker.

(29) Juang
ne dʒandare lara-ke
prox laugh-prs
‘The woman is laughing.’



(30) Juang
ne dʒandare a-lara-ke
prox neg-laugh-prs
‘The woman is not laughing.’

(31) Juang
arokia baronoŋ a-goiʔ-ki-kia
3du.pron two.hum neg-die-prs-3du
‘Those two do not die, are not dying.’

(32) Juang
arokia baronoŋ a-goiʔ-joʔ-kia
3du.pron two.hum
‘Two of them did not die.’

In the past tense in Sora, a similar pattern is attested, and the negative prefix a-/ə- simply attaches to the positive past form (33)–(34) and negates it.

(33) Sora
anin-dʒi rban daʔa-n a-tij=l-əm-dʒi
3pron-pl yesterday water-n.sfx neg-give-pst-2.und-3pl.actr
‘Yesterday they didn’t give you water.’

(34) Sora
aman doʔŋ=ɲen a-giɟ-l-iɲ
2sg.pron obj-1sg.pron neg-see-pst-1.und
‘You have not seen me.’

However, things get more complex in the non-past in Sora. In the positive, a tam marker is used (35), but in the negative it is suppressed and realized as zero (36). Thus, the negative non-past is marked by only the negative prefix and no tam marking in Sora.

(35) Sora
ɲen giʔj-t-aj
1sg.pron see-npst-1
‘I (will) see.’



(36) Sora (Anderson & Harrison 2008b: 346, 331)
ɲen bazar-ɪn ə-jeːr-ej
1sg.pron market-n.sfx neg-go-1
‘I don’t, won’t go to the market.’

Sora, like Kherwarian, has a prohibitive marker separate from the general negator used in declarative sentences.

(37) Sora
tij=doŋ-iɲ
give-proh-1.obj
‘Don’t give me!’

Turning now back to the negator a(r)- in Munda languages of Odisha, we see that in Hill Gtaʔ, this negative element occurs in both declarative (39) and prohibitive (38) forms, with some complications as seen in Sora above.

(38) Hill Gtaʔ
a-næjŋ na-á-basoŋ=gɛ
obj-1sg.pron 2-neg-tell-proh
‘Don’t tell me!’

(39) Hill Gtaʔ
gubug a-goiʔ-tǝ
pig neg-die-neg.pst
‘The pig didn’t die.’

In the positive conjugation, in Hill Gtaʔ =tǝ typically encodes present (40), as it does in Sora in (35). In simplex predicates in the negative, on the other hand, the combination a-…-tǝ, i.e., what appears to be neg + prs, rather marks negative past tense in Hill Gtaʔ (41).

(40) Hill Gtaʔ
sela santa we-tǝ
girl market go-prs
‘The girl goes to the market.’

(41) Hill Gtaʔ
gubug a-we-tǝ
pig neg-go-neg.pst
‘The pig didn’t go.’



However, confoundingly, this same sequence a-…-tǝ encodes negative present progressive tense-aspect in complex predicates (42) in Hill Gtaʔ.

(42) Hill Gtaʔ
ɖiaŋkoj ɖiaŋkoj ho(ʔ)-barsoŋ a-ɽiŋ-tǝ
woman woman recp-speak neg-ipfv-npst
‘The women are not speaking to each other.’

That is, it retains its opaque etymological constructional meaning of (negative + past) with simplex predicates, while with complex predicates it has a new combinatorial/concatenative meaning of (negative + present). As in Sora (36), negative future forms in Hill Gtaʔ simply have the negator a- and no tam marker (43), while positive future forms require a future marker (44).

(43) Hill Gtaʔ
kine hãwe a-na n-a-biʔ
prox bow obj-2sg.pron 1-neg-give
‘I will not give you this bow.’

(44) Hill Gtaʔ
kine hãwe a-na m-biʔ-wɛ
prox bow obj-2sg.pron 1-give-irr/fut
‘I will give you this bow.’

As in Hill Gtaʔ, the same negator in Gutob is used in declarative sentences and in prohibitives, but Gutob shows a different but still very complicated system of tam+negation interdependencies. For example, the tense marker -gu marks past with class-I verbs (mainly intransitive and middle verbs), but when combined with the negative prefix ar- it marks prohibitive (45)–(46).

(45) Gutob
ser-gu
sing-pst
‘He sang.’

(46) Gutob
ar-ser-gu
neg-sing-pst.intr/mid
‘Don’t sing!’



Similarly, the TAM suffix -to encodes a habitual present in the positive (47), but when combined with the negative prefix ar-, a negative past tense is the result (48). Thus, like the Hill Gtaʔ negative past in simplex conjugations and in prohibitives, negation is constructional in Gutob, not combinatorial as in Juang.

(47) Gutob
ser-to
sing-npst/hab
‘She sings.’

(48) Gutob
ar-ser-to
neg-sing-neg.pst
‘She didn’t sing.’

With past copula formations, one finds ar-ɖu-gu in Gutob, with both the negative and past in their expected meanings; in other words, in copula forms the sequence ar- -gu is rather semantically combinatorial, not constructional. With non-copula main verbs, the same sequence encodes prohibitive in a construction (46).

(49) Gutob
niŋ=nu dʒoɽek ɖieŋ ɖu-gu
1sg.pron-gen two house aux-pst
‘I had two houses.’

(50) Gutob
niŋ=nu dʒoɽæk ɖien æɖ-ɖu-gu [ar > æɖ]
1sg.pron=gen two house neg-aux-pst
‘I did not have two houses.’

Finally, Remo shows a slightly different system but one with its historical origins in the ones discussed so far in the Munda languages of Odisha. The present forms in Remo show combinatorial semantics in the negative, with the default negator and the tense marker keeping their individual meanings (51)–(52).

(51) Remo
niŋ a-no dʒu-t-iŋ
1sg.pron obj-2sg.pron see-npst-1
‘I see you.’



(52) Remo
niŋ a-no a-dʒu-t-iŋ
1sg.pron obj-2sg.pron neg-see-npst-1
‘I don’t see you.’

However, in the negative past in Remo, one finds two new auxiliary markers having been innovated but maintaining a structure cognate with that in Gutob and Hill Gtaʔ. That is, the negative past in Remo shows the structure neg-aux-npst-sbj with what functions as the positive present tense marker on the auxiliary, i.e., it maintains the constructional nature of the negative past.

(53) Remo
niŋ a-no dʒul-oʔ-niŋ
1sg.pron obj-2sg.pron
‘I saw you.’

(54) Remo
niŋ a-no dʒu(l)-oʔ
1sg.pron obj-2sg.pron
a-boŋ-t-iŋ
‘I didn’t see you.’

In summary then, many Munda languages, like their sister languages to the east, make at least a formal distinction between several negators (Jenny et al. 2015). In Munda, at least four and possibly five different negative scope elements can be found: a(r/d)-, ka, ba[n/ɲ/ŋ], ma- and um/am, see Table 6.1. With the exception of the last one in Kharia and possibly Juang, the first four appear to have possible analogs in other Austroasiatic branches. The first three enumerated ones (a(r)-, ka, ba[n/ɲ/ŋ]) might have arisen in serial verb formations, while ma- may originally have been nominal or non-finite, in Munda at least. The origin of the possible fifth negator um/am remains unclear so far. Some branches attest to a formal opposition between prohibitive and declarative negative elements (Sora, Kherwarian), a contrast widespread in Southeast Asian languages (Jenny, p.c.). In other groups, there is rather a complicated interaction between tam markers and negation, parts of which might also need to be projected back to the proto-language. Among the patterns that seem potentially old in the Munda family are at least negator + Ø tam marking for the negative future (proto-North Munda, Sora, Gtaʔ), and possibly negator plus non-past tam marker for negative past (Gutob-Remo, Gtaʔ).

proto-munda prosody, morphotactics and morphosyntax

Table 6.1 Distribution of negators in Munda

ba[n/ɲ/ŋ] ka a(d/r)- um/am ma-
Santali Korku Mundari Ho Kharia Juang Gtaʔ Sora Gorum Gutob Remo

+ + -

+ + -

? ? ? ? + + + + + +

+ + ? -

+ + ? ? ? -

It is also possible that there was a finite versus non-finite or nominalized negation opposition as well, marked by ma- (Juang, Gtaʔ). In individual languages or sub-groups within Munda, various neutralizations seem to have occurred that have resulted in these oppositions being obscured or restructured. In other words, regularization or elimination of the more marked paradigms has occurred or is occurring. Lastly, most if not all of the Munda negative markers have parallels in other Austroasiatic branches. Tellingly, the syntax of these negative markers is not dissimilar either, as they consistently remain preverbal across the Munda languages (except the Sora prohibitive). New negative copular forms, some likely auxiliaries in origin, have been innovated in clause-final position in endangered languages like Juang and Gutob. All such developments are of course consistent with the verb-final syntax that appears to have been in place in Munda languages for some time already. I now turn very briefly to various other aspects of Munda verbal morphosyntax that have likely analogs in other branches of Austroasiatic.


Subject Proclitics

The Munda subject proclitics are assuredly an innovation at the Proto-Munda level, but prosodically weak resumptive pronouns have been used in other branches of Austroasiatic. Such forms have been described (Alves 2013: 5), for example, in various Aslian languages, e.g. Temiar (55) or Che Wong, which Kruspe et al. (2015: 436) describe as obligatory with dynamic verbs. They may even be found in Katuic Pacoh (56).

(55) Temiar
ye:ʔ ʔi-tɛrsǝg cɛp
1sg.pron 1sg-trap bird
‘I trapped the bird.’ (Benjamin 1976: 175)

(56) Pacoh
ʔi-taʔ pǝllo: ʔalɔ:ŋ
unspec-make tube wood
‘One makes a wooden tube.’ (Alves 2006: 39)

Note also the gender-cum-person subject proclitics in Khasi negative and future constructions (see (65) for an example of the latter).


Reduplicated Imperfective

As already described by Pinnow (1966), and as is synchronically active in as diverse a group of modern Munda languages as Santali and Hill Gtaʔ (57), a monosyllabic verb stem in Proto-Munda could be reduplicated to indicate imperfectivity, such as iterative or continuous action.

(57) Hill Gtaʔ
næjŋ a-ná kǝkéj n-a-ɖéj=tɛ
1sg.pron obj-2sg.pron red~see 1-neg-aux=npst
‘I’m not looking at you.’

The stem for ‘see’ is kej in Hill Gtaʔ, and the partial C(V)- reduplication pattern yields the reduplicant kǝ-. Similar structures can be found in other Austroasiatic branches as well, such as in the Aslian language Semaq Beri (58).

(58) Semaq Beri
kɛ gh-gǝh kweh
3sg.pron red.ipfv~not.share biscuit
‘He isn’t sharing his biscuits.’ (Kruspe 2015: 486)




Post-verbal Operators and OV-like Constructions in Other Austroasiatic Branches

As mentioned above, the verbal template of functional operators in the clitic chain of Proto-Munda seems to have had the order Verb-tam-obj; that is, the tam operator and the object were both in post-verbal position. It is certainly true that tam operators in post-verbal position are not widely distributed in other branches of the phylum. However, it is also not the case that Munda is unique among branches of Austroasiatic in having constructions of this type. Thus, it is possible that such structures existed in earlier stages of the language, a hypothesis that recent research appears to confirm (Jenny et al. 2019). As discussed in Anderson (2006, 2011), one of the most common diachronic sources of tam operators is serial verbs. Austroasiatic is no exception in this regard. Further, completive or perfective serial verb constructions typically involve a serial verb meaning ‘complete’, ‘finish’, ‘end’, etc., and even in VO languages this verb not infrequently follows the lexical head of the construction, in an iconic ordering. As such, post-verbal operators with completive/perfective semantics can be found in various Austroasiatic branches other than Munda, including lɛʔ in Bahnaric Bunong (59) or the postverbal [zojA2] ‘accomplished’ and [ɗaC2] ‘anterior completive’ particles in Vietnamese (60)–(61).

(59) Bunong
cha: klɛʔ nta̤ŋ ho: lɛʔ
eat tuber continually very complete
‘All (I) had to eat were tubers.’ (Butler 2015: 741)

(60) Vietnamese
Lan về quê rồi
Lan go.back hometown accmpl
‘Lan has gone back to her hometown.’ (Brunelle 2015: 944)

(61) Vietnamese
Lan về quê đã
Lan go.back hometown compl.ant
‘Lan first goes back to her hometown.’ (Brunelle 2015: 945)

In Katuic Kui Ntua the sequence of bǝ:n (62) or ro̤ːc (63) + the new situation aspect marker hǝ:j is found in post-verbal position to form a completive aspect construction.



(62) Kui Ntua
haj wuɒ bǝ:n hǝ:j
1pron make get nsit
‘I have done/completed it.’ (Bos & Sidwell 2015: 872)

(63) Kui Ntua
ksaj nɛŋ kuǝj stṳːŋ ro̤ːc hǝ:j
month ana.prox person plant.rice achieve nsit
‘This month the people have completed planting the rice.’ (Bos & Sidwell 2015: 872)

In Standard Khasi the perfective is also marked by a post-verbal tam element noʔ (64)–(65):

(64) Standard Khasi
u la ba:m noʔ
pst eat pfv
‘He ate (completely).’ (Nagaraja 2015: 1175)

(65) Standard Khasi
ka=n mareʔ noʔ
run.away pfv
‘She will run away.’ (Nagaraja 2015: 1175)

Other post-verbal tam markers found in non-Munda Austroasiatic languages include the durative aspect marker in Mangic Bugan (66) or Modern Mon (67).13

(66) Bugan
mɯ31 ȵu33 naŋ13(31) mǝ0dze55
2pron do dur what
‘What are you doing?’ (Li and Luo 2015: 1054)

(67) Modern Mon
ɗɛh rɔ̀p kaʔ kɤ̀ʔ
3 catch fish get
‘He may (is allowed to, can) catch fish.’ (Jenny 2015: 585)

13 To be sure, such formations may well reflect fairly recent influence from Burmese in modern Mon and Danau and from Chinese in Bugan. Word order follows iconicity principles in Mon (Jenny, p.c.).



According to Bisang (2015: 690), a range of constructions in modern Khmer arose from the resultative construction, including tam markers encoding categories such as cap or compl. Similarly, stative formations in Aslian Maniq yield what appear to be predicate/verb-final forms (68).

(68) Maniq
tieʔ pasɛl
ground be.dry
‘The ground is dry.’ (Kruspe 2015: 437)

According to Jenny (2015: 532–533), imperative forms in Old Mon could have the order N V and be followed by the focus marker da < ‘be’. Note, however, that this da is not the same as the post-verbal element of similar shape in Vietnamese mentioned above (Alves, Jenny, p.c.). Nor is it likely to be the same as the post-verbal existential element da in Semaq Beri of the Aslian branch (Kruspe 2015: 507). Stative formations in Palaungic Danau involve the copular form mǝ̄nɔʔ (69). The copula hɔʔ also appears in clause-final position in Danau (70).

(69) Danau
lɐik nī=nǝʔ ō pɐ-phɐ̄ mǝ̄nɔʔ
text prox=top 1sg.pron abil-read stat
‘I can read this.’ (Si 2015: 1113)

(70) Danau
ō tin hɔʔ
1sg.pron sleep cop
‘I am sleeping.’ (Si 2015: 1112)

Other OV-like formations found in various Austroasiatic languages spoken outside of India include pragmatically conditioned instances of object fronting in languages like Palaungic Dara’ang Palaung (71) or Khmuic Mlabri (72):

(71) Dara’ang Palaung
sanaʔ mɨ tu mə̌h
gun tcl neg have
‘(A) Gun, (I) don’t have (one).’ (Deepadung, Rattanapitak & Buakaw 2015: 1077)



(72) Mlabri
kap boŋ
duck eat
‘(We do) eat ducks.’ (Bätscher 2015: 1014)


Features of NP Morphosyntax

The last two features of Proto-Munda (morpho)syntax that I mention here as having apparent parallels in other Austroasiatic branches are the use of a case particle (7.1) and a reflexive element (7.2).

7.1 Case

Since case marking is typical in languages with constituent orders that have argument NPs on the same side of the verb,14 whether preceding or following it, it is not surprising that Proto-Munda with its SOV order appears to have used a proclitic particle to mark primary objects. There are retentions of this in several different sub-branches of the family. In Kherwarian it no longer has the function of case marking and simply appears as a lexicalized element in pronouns in languages like Ho. It may also be preserved as the applicative marker a that appears before object pronominals in a number of verb forms in Kherwarian and Korku, and thus in Proto-North Munda as well. Both possible reflexes can be seen in (73).

(73) Ho
aiŋ am ka=iŋ nel-a-me-a
1sg.pron 2sg.pron neg=1sbj see-appl-2obj-ind
‘I did not see you.’

The objective proclitic became a prefix and retains its function as an object marker in Hill Gtaʔ (74)–(75), the former repeating (44) above.

(74) Hill Gtaʔ
næjŋ a-na n-a-kej-tǝ
1sg.pron obj-2sg.pron 1-neg-see-neg.pst
‘I did not see you.’

14 Thanks to M. Jenny for reminding me of this.



(75) Hill Gtaʔ
kine hãwe a-na m-biʔ-wɛ
prox bow obj-2sg.pron 1-give-irr/fut
‘I will give you this bow.’

Case prefixes may be odd in an SOV language, but less so if they derive from a proto-language like proto-Austroasiatic that might have been VSO. It is thus worth noting that there are apparent possible cognates to the proto-Munda system in peripheral language groups in the eastern part of the family. Alves (2006, 2015) mentions dative forms of personal pronouns that take a prefix ʔa-. In form and function, this Pacoh formation (76) appears identical to a subset of the contexts of use that characterize the Gtaʔ formation.

(76) Pacoh
kɨ: pacɔ:m ʔa-maj kaŋ ʔaɲ
1sg.pron teach dat-2sg.pron language English
‘I teach you English.’ (Alves 2015: 889)

To be sure, different Austroasiatic languages make use of case particles of this sort in similar form and function. Whether any, or which, of these may be cognate with the forms above remains to be demonstrated. They are at least suggestive of such a hypothesis. Such case elements include Standard Khasi ja acc versus ha dat/loc (Nagaraja 2015: 1153), kaʔ loc in Che Wong or ba= goal in Jahai in the Aslian branch (Kruspe et al. 2015: 437), or haʔ loc in Semaq Beri (77)–(78). As a verb-medial or verb-initial language, Proto-Austroasiatic would likely have innovated such functional operators from a serial verb construction. The development would thus be something like what has happened in Modern Mon in the shift of kɒ ‘give’ to an adposition meaning ‘to’, in at least the form cited by Jenny (2015: 586). The development of the object case markers in the various Austroasiatic branches, including Munda, suggests a similar derivation. Note that recent grammatical case forms, except for Sora as mentioned above, all constitute borrowings from Indo-Aryan, at least for languages like Keraʔ Mundari and Plains Gtaʔ (Anderson and Jora 2016).

(77) Semaq Beri
ʔǝɲ knãl haʔ ja
1sg.pron know loc 2sg.f.pron
‘I know you.’ (Kruspe 2015: 507)



(78) Semaq Beri
kɛ nɔ̃t haʔ ʔǝɲ
3sg.pron see loc 1sg.pron
‘She looked at me.’ (Kruspe 2015: 507)

7.2 Reflexive-Possessive

Another feature of the nominal morphosyntax of Munda that has obvious parallels in other branches of Austroasiatic is a reflexive-possessive nominal marker. It occurs in a range of functions in specific individual languages. It tends to occur in possessive constructions, but has been generalized as a third person pronominal in modern Mon. Its use in formations like (79) in modern Mon has direct analogs in the functions of the seemingly cognate element in Munda Gtaʔ (80)–(81).

(79) Modern Mon
ʔǝpa ɗɛh jǝmùʔ kjɛ.láj
father 3.ref name pn
‘Father’s name is Kyae Lay’ (Jenny 2015: 578)

(80) Gtaʔ
mba=ɽæ(ʔ) mni(ʔ) lojkoŋ
father=3.refl name pn
‘His father’s name is Loikong.’

(81) Gtaʔ
ghæʔ wleʔ-ɖæ
rope leaf-3.refl
‘Rope-leaf creeper, a plant species’



Munda languages are more similar to the other branches of Austroasiatic in their prosodic and morphosyntactic structures than has previously been appreciated. While the predominant verb-final clausal constituent order is quite distinct within the family, a closer inspection of the details of the morphotactics and prosodic features of Munda languages reveals many familiar characteristics. In turn, a more nuanced look at non-Munda Austroasiatic languages shows many potential analogs to formations found in Munda. To be sure, peripheral areas retain more such features than the central areas do. Gradients of synthetic structures, morphological encoding of inflectional categories and even post-verbal functional operators reveal themselves, as do potential analogs to the case and possessive systems of Munda, in a range of other Austroasiatic branches. In turn, the prosodic structure of Munda turns out upon more careful inspection not to be as un-Austroasiatic as has previously been claimed. Work on the historical morphosyntax of Munda therefore should not continue to be excluded from comparative Austroasiatic studies, as it is simply not true that Munda languages are “systematically opposite at every level” from the other branches of Austroasiatic.

Abbreviations

accmpl Accomplished
actr Actor
ant Anterior
appl Applicative
mid Middle
NP Noun Phrase
pron Pronoun
sfx Suffix
tam Tense-Mood-Aspect
und Undergoer

References

Alves, Mark J. 2006. A Pacoh grammar. Canberra: Pacific Linguistics.
Alves, Mark J. 2013. Grammatical Functions in Mon-Khmer Morphology. Presented at SEALS 23.
Alves, Mark J. 2014. Mon-Khmer. In R. Lieber & Pavol Stekauer (eds.) The Oxford Handbook of Derivational Morphology, pp. 520–544. Oxford: OUP.
Alves, Mark J. 2015a. In N. Enfield and B. Comrie (eds.) Languages of Mainland Southeast Asia: The State of the Art. Berlin: de Gruyter, pp. 531–557.
Alves, Mark J. 2015b. Pacoh. In Jenny & Sidwell (eds.), pp. 881–906.
Anderson, Gregory D.S. 1998. Xakas. München: Lincom.
Anderson, Gregory D.S. 2001. A New Classification of Munda: Evidence from Comparative Verb Morphology. Indian Linguistics 62: 27–42.
Anderson, Gregory D.S. 2004. Advances in Proto-Munda Reconstruction. Mon-Khmer Studies 34: 159–184.



Anderson, Gregory D.S. 2006. Auxiliary Verb Constructions. Oxford: OUP. Anderson, Gregory D.S. 2007. The Munda Verb. Typological Perspectives. Berlin: Mouton de Gruyter (Trends in Linguistics, Studies and Monographs, 174). Anderson, Gregory D.S. 2008. Gtaʔ. In Anderson (ed.), pp. 682–763. Anderson, Gregory D.S. (ed.) The Munda Languages. London & New York: Routledge. Anderson, Gregory D.S. 2011a. Auxiliary Verb Constructions (and Other Complex Predicate Types): A Functional-Constructional Typology. Language and Linguistics Compass 5 (11): 795–828. Anderson, Gregory D.S. 2011b. Auxiliary Verb Constructions in the Languages of Africa. Studies in African Linguistics 40 (1–2): 1–409. Anderson, Gregory. D.S. 2012. What Munda languages are really like. Presented at ICOLSI, keynote address. Shillong. Anderson, Gregory D.S. 2015a. Overview of the Munda languages. In M. Jenny and P. Sidwell (eds.) The Handbook of Austroasiatic Languages. Volume 1. Grammar Sketches of the World’s Languages. Mainland and Insular South East Asia. Amsterdam: Brill. 364–414. Anderson, Gregory D.S. 2015b. Prosody, phonological domains and the structure of roots, stems and words in the Munda languages in a comparative/historical light. Journal of South Asian Languages and Linguistics 2 (2): 163–183. Anderson, Gregory D.S. 2016. Do Koraput Munda, Lower Munda, and even South Munda really exist? Once more on the still unresolved classification of the Munda languages. In S. Pattanayak, C. Pattanayak and J.M. Bayer (eds.) Multilingualism and Multiculturalism. Perceptions, Practices and Policy. Delhi: Orient Blackswan, pp. 313– 334. Anderson, Gregory D.S. In preparation. The Gtaʔ Language. Texts. Lexicon. Grammar. Anderson, Gregory D.S. and Opino Gomango. 2016. On the current status and state of Juray in the Sora-Juray cluster. Foundation for Endangered Languages XX. Bath, UK: FEL 104–109. Anderson, Gregory D.S. and Opino Gomango. 2017ms. 
Sociolinguistic and linguistic differences between Juray and Sora. Presented at the International Conference of the Linguistic Society of Sri Lanka. Colombo, November 2017. Anderson, Gregory D.S. and Opino Gomango. 2019. Grammatical case marking and referent indexing in Sora-Juray. Presented at 8th International Conference on Austroasiatic Languages (ICAAL 8). Chiangmai, Thailand. Anderson, Gregory D.S., Luke Horo and Opino Gomango. In preparation. A grammar of Sora. Anderson, Gregory. D.S. and K. David Harrison. 1999. Tyvan. München: Lincom. Anderson, Gregory. D.S. and K. David Harrison. 2008. Sora. In Gregory D.S. Anderson (ed.) The Munda Languages, London & New York: Routledge, pp. 299–380. Anderson, Gregory D.S. and Bikram Jora 2016. Borrowing and metatypy in the history



of the Munda languages. Presented at Conference on Indian Languages in Contact, Pune. Arsenault, Paul. 2012. Retroflex consonant harmony in South Asia. Ph.D. Dissertation. University of Toronto. Arsenault, Paul. 2016. Retroflexion in South Asia. Presented at workshop on South Asian linguistic areas. Uppsala. Bätscher, Kevin. 2015. Mlabri. In Jenny & Sidwell (eds.), pp. 1003–1030. Banker, Elizabeth M. 1964a. Bahnar affixation. Mon-Khmer Studies 1: 99–117. Banker, Elizabeth M. 1964b. Bahnar reduplication. Mon-Khmer Studies 1: 119–134. Bauer, Christian. 1982. Morphology and syntax of spoken Mon. Dissertation. University of London. School of Oriental and African Studies, Department of the Languages and Cultures of Southeast Asia and the Islands. Bauer, C. 1986. Recovering extracted infixes in Middle Khmer. Mon-Khmer Studies 15: 155–164. Bauer, Christian. 1989. The Verb in Spoken Mon. Mon-Khmer Studies 15: 87–110. Bauer, Christian. 1990. Reanalyzing reanalyses in Katuic and Bahnaric. Mon-Khmer Studies 16–17: 143–182. Benjamin, Geoffrey. 1976. An outline of Temiar Grammar. In P.N. Jenner, L.C. Thompson, and S. Starosta (eds.) Austroasiatic Studies Part I, 129–188, (Oceanic Linguistics Special Publication No. 13). Honolulu: University of Hawaii Press. Benjamin, Geoffrey. 2011. Temiar Morphology (and Related Features): A View from the Field. Fifth International Conference Austroasiatic Linguistics 5. Mahidol University, Salaya, Thailand. Bisang, Walter. 2015. Modern Khmer. In Jenny & Sidwell (eds.), pp. 677–716. Bloch, Jules. 1934. L’indo-aryen, du véda aux temps modernes. Paris: Adrien Maisonneuve. Bodding, Paul O. 1922. Materials for a Santali grammar, I (mostly phonetic). Dumka: Santal Mission of the Northern Churches. Bodding, P.O. 1929. Santal Grammar for Beginners. Dumka: Santal Mission of Northern Churches. Bos, Kees Jan & Paul Sidwell. 2015. Kui Ntua. In Jenny & Sidwell (eds.), pp. 837–880. Brunelle, Marc. 2015. Vietnamese, In Jenny & Sidwell (eds.), pp. 909–953. 
Brunelle, Marc and Pittayawat Pittayaporn. 2012. Phonologically-constrained change. The role of the foot in monosyllabization and rhythmic shifts in Mainland Southeast Asia. Diachronica 29 (4): 411–433.
Burenhult, Niclas. 2002. A Grammar of Jahai. Dissertation. Department of Linguistics and Phonetics, Lund University.
Burrow, Thomas and S. Bhattacharya. 1970. The Pengo language: grammar, texts and vocabulary. Oxford: Clarendon Press.
Butler, Becky. 2015. Bunong. In Jenny and Sidwell (eds.), pp. 719–745.



Costello, Nancy A. 1966. Affixes in Katu. Mon-Khmer Studies 2: 63–86. Costello, Nancy A. 1998. Affixes in Katu of the Lao P.D.R. Mon-Khmer Studies 28: 31–42. Costello, Nancy A. 2001. Aspect and Tense in Katu of the Lao P.D.R. Mon-Khmer Studies 31: 121–125. Daladier, A. 2011. Pnaric-War-Lyngngam and Khasi as a branch of Pnaric. Journal of Southeast Asian Linguistics 4: 169–206. Deepadung, Sujartilak, Ampika Rattanapitak and Supakit Buakaw. 2015. Dara’ang Palaung. In Jenny and Sidwell (eds.), pp. 1065–1103. Diffloth, Gérard. 1979. Jah-Hut, an Austroasiatic language of Malaysia. Southeast Asian Linguistic Studies, vol. 2: 73–118. Canberra: Australia National University. Donegan, Patricia J. 1993. Rhythm and vocalic drift in Munda and Mon-Khmer. Linguistics in the Tibeto-Burman Area 16 (1). 1–43. Donegan, Patricia J. & David Stampe. 1983. Rhythm and the holistic organization of language structure. In John Richardson, M. Marks, & A. Chukerman (eds.) The Interplay of Phonology, Morphology, and Syntax, 337–353. Chicago: Chicago Linguistic Society. Donegan, Patricia J. & David Stampe. 2004. Rhythm and synthetic drift of Munda. In R. Singh (ed.) Yearbook of South Asian Languages and Linguistics 2004, 3–36. Enfield Nick J. and Gérard Diffloth. 2009. Phonology and sketch grammar of Kri, a Vietic language of Laos. Cahiers de Linguistique—Asie Orientale (CLAO), 38(1): 3–69. François, Alexandre. 2014. Trees, Waves and Linkages. Models of Language Diversification. In Claire Bowern and Bethwyn Evans (eds.) The Routledge Handbook of Historical Linguistics, pp. 161–189. London: Routledge. Gradin D. 1976. Word affixation in Jeh. Mon-Khmer Studies 5: 25–42. Horo, Luke. 2017. A Phonetic Analysis of Assam Sora. Ph.D. Dissertation, Indian Institute of Technology, Guwahati. Horo, Luke and Gregory D.S. Anderson. Towards an intonational typology in Sora. Presented at workshop on the prosody of under-represented languages. International Congress of Phonetic Sciences. Melbourne, Australia. 
August, 2019. Horo, Luke and Priyankoo Sarmah. 2015. Acoustic analysis of vowels in Assam Sora. In L. Konnerth et al. (eds.) Northeast Indian Linguistics 7, 69–86. Canberra: ANU. Israel, Michael. 1979. A grammar of the Kuvi language. Trivandrum: Dravidian Linguistics Association. Jenner, P. and S. Pou 1982. A lexicon of Khmer morphology. Mon-Khmer Studies 9–10: 1–517. Jenny, Mathias. 2003. New infixes in spoken Mon. Mon Khmer Studies 33: 183–194. Jenny, Mathias. 2005. The verb system of Mon. Zurich: ASAS. Jenny, Mathias et al 2015b. Syntactic typology of Austroasiatic. Presented at ICAAL 6, Siem Reap. Jenny, Mathias and Patrick McCormick. 2015. Old Mon. In Jenny & Sidwell (eds.), pp. 519–552.



Jenny, Mathias. 2015. Modern Mon. In Jenny & Sidwell (eds.), pp. 553–600. Jenny, Mathias, Tobias Weber and Rachel Weymuth. 2015. The Austroasiatic Languages: A Typological Overview. In Jenny and Sidwell (eds.), pp. 13–143. Jenny, Mathias and Paul Sidwell (eds.). 2015. The Handbook of Austroasiatic Languages. 2 volumes. Leiden: Brill. Jenny, Mathias, Hiram Ring and Wei Wei Lee. 2019. On reconstructing Proto-Austroasiatic syntax. Methodology and implications. Presented at SEALS 29. Tokyo, Japan. May 2019. Jora, Bikram and Gregory D.S. Anderson. 2017. Towards the Proto-Kherwarian verb: Historical-comparative study of negation, TAM and person-indexing interdependencies. Presented at International Seminar on Munda linguistics held in Deccan College, Pune. Jora, Bikram and Gregory D.S. Anderson. this volume. Proto-Kherwarian negation, tam and person-indexing interdependencies. Kruspe, Nicole. 2004. A Grammar of Semelai. Cambridge University Press. Kruspe, Nicole 2015. Semaq Beri. In Jenny & Sidwell (eds.), pp. 475–516. Kruspe, Nicole, Niclas Burenhult and Ewelina Wnuk. 2015. Northern Aslian. In Jenny & Sidwell (eds.), pp. 419–474. Kuiper, F.B.J. 1948. Proto-Munda Words in Sanskrit. Amsterdam: Noord-Hollandsche Uitgevers Maatschappij. Kuiper, F.B.J. 1965. Consonant variation in Munda. Lingua 14: 54–87. Li, Jinfang. 1996. Bugan: A New Mon-Khmer Language of Yunnan Province, China. MonKhmer Studies 26: 135–159. Li, Jinfang and Yongxian Luo. 2015. Bugan. In Jenny & Sidwell (eds.), pp. 1033–1062. Matisoff, James. 1991. Sino-Tibetan linguistics: Present state and future prospects. Annual Review of Anthropology 20: 469–504. Matisoff, James. 2003. Aslian: Mon-Khmer of the Malay Peninsula. Mon-Khmer Studies 33: 1–58. Milne, Leslie. 1921. An elementary Palaung grammar. London: Oxford University Press. Nagaraja, K.S. 1993. Khasi dialects: a typological consideration. Mon-Khmer Studies 23: 1–10. Nagaraja, K.S. 2015. Standard Khasi. In Jenny & Sidwell (eds.), pp. 1145–1185. 
Nichols, J. 1992. Linguistic Diversity in Space and Time. Chicago: University of Chicago Press. Olsen, Neil H. 2015. Kơho-Sre. In Jenny & Sidwell (eds.), pp. 746–788. Omar, Asmah Haji. 1975. The verb in Kentakbong. In P.N. Jenner, L.C. Thompson, and S. Starosta (eds.) Austroasiatic Studies Part II, 951–960, (Oceanic Linguistics Special Publication No. 13). Honolulu: University of Hawaii Press. Osada, Toshiki. 2005. How many proto-Munda words in Sanskrit?–with special reference to agricultural vocabulary. 1–24. Presented at Harvard RIHN Roundtable.



Osada, Toshiki. 2008. Mundari. In Anderson (ed.), pp. 99–164. Patnaik, M. 2008. Juang. In Anderson (ed.), pp. 508–556. Peterson, John M. 2008. Kharia. In Anderson (ed.), pp. 434–507 Peterson, John M. 2011. Grammar of Kharia. Leiden: Brill. Pinnow, Heinz-Jürgen. 1959. Versuch einer historischen Lautlehre der Kharia-Sprache. Wiesbaden: Otto Harrassowitz. Pinnow, Heinz-Jürgen. 1960. Über den Ursprung der voneinanderabweichenden Struktur der Munda und Khmer-Nikobar Sprachen. Indo-Iranian Journal 4: 81–103. Pinnow, Heinz-Jürgen. 1963. The position of the Munda languages within the Austroasiatic language family H.L. Shorto (ed.). Linguistic Comparison in Southeast Asia and the Pacific. London: SOAS. 140–152. Pinnow, Heinz-Jürgen. 1966. A Comparative Study of the Verb in the Munda Languages. In: Norman H. Zide (ed.): Studies in Comparative Austroasiatic Linguistics. The Hague et al.: Mouton (Indo-Iranian Monographs, V). 96–193. Post, Mark W. 2011. Prosody and typological drift in Austroasiatic and Tibeto-Burman: Against “Sinosphere” and “Indosphere”. In S. Srichampa et al. (eds.) Austroasiatic Studies, 198–221. Presmsrirat, Suwilai and Nattamon Rojankul 2015. Chong. In Jenny & Sidwell (eds.), pp. 603–640. Radhakrishnan, R. 1981. The Nancowry word: phonology, affixal morphology and roots of a Nicobarese language. Edmonton: Linguistic Research, Inc. Reddy, J., 1979. Kuvi grammar Vol. 4. Mysore: Central Institute of Indian Languages. Ring, Hiram. 2015. Pnar. In Jenny and Sidwell (eds.), pp. 1186–1226. Ross, Malcolm D. 1988. Proto-Oceanic and the Austronesian languages of Western Melanesia. Canberra: ANU. Ross Malcolm D. 1997. Social networks and kinds of speech-community event. In Roger Blench and Matthew Spriggs (eds.) Archaeology and language. Volume 1: Theoretical and methodological orientations. London: Routledge, 209–261. Ross, Malcolm D. 2007. Calquing and metatypy. Journal of Language Contact, Thema 1: 116–143. Schiller E. 1994. 
Khmer nominalizing and causativizing infixes. In K.L. Adams and T.J. Hudak (eds.) Papers from the Second Annual Meeting of the Southeast Asian Linguistics Society, pp. 309–326. Arizona State University, Program for Southeast Asian Studies. Si, Aung. 2015. Danau. In Jenny & Sidwell (eds.), pp. 1104–1141. Sidwell, Paul. 2008. Issues in the morphological reconstruction of Proto-Mon-Khmer. in Bowern, Claire, Bethwyn Evans and Luisa Miceli (eds.), Morphology and Language History: In honour of Harold Koch, 251–265. Canberra. Sidwell, Paul. 2009. Classifying the Austroasiatic Languages: History and State of the Art. München: Lincom.



Sidwell, Paul. 2011. Proto-Khasian and Khasi-Palaungic. Journal of Southeast Asian Linguistics 4: 144–168. Sidwell 2012-ms. Diversity, discontinuity and asymmetry in the typological restructuring of Mainland Southeast Asian languages. Revised version published as Sidwell (2015c)? Sidwell, Paul. 2013. Issues in Austroasiatic Classification. Languages and Linguistics Compass 7/8: 437–457. Sidwell, Paul. 2015a. Austroasiatic Classification. In Jenny & Sidwell (eds.), pp. 144–220. Sidwell, Paul. 2015b. Car Nicobarese. In Jenny & Sidwell (eds.), pp. 1229–1265. Sidwell, Paul. 2015c. Local drift and areal convergence in the restructuring of MSEA. In N. Enfield and B. Comrie (eds.). Languages in Mainland South East Asia: The state of the art. Berlin: de Gruyter, pp. 51–81. Sidwell, Paul and Felix Rau. 2015. Austroasiatic Comparative-Historical Reconstruction: An Overview. In Jenny and Sidwell (eds.), pp. 221–363. Smith, Kenneth D. 1969. Sedang Affixation. Mon-Khmer Studies 3: 108–124. Smith, R.L. 1973. Reduplication in Ngeq. Mon-Khmer Studies 4: 85–111. Solntseva, Nina. 1996. Case-marked Pronouns in the Taoih Language. Mon-Khmer Studies 26: 33–36. Steever, S.B., 1993. Analysis to synthesis: The development of complex verb morphology in the Dravidian languages. Oxford: Oxford University Press. Svantesson, Jan-Olof. 1983. Kammu phonology and morphology. Sweden: CWK Gleerup. Thomas, D.M. 1969. Chrau affixes. Mon-Khmer Studies 3: 90–107. Thomas, Dorothy M. 1990. The Instrument/Locative and Goal Affix -N- in Surin Khmer. Mon-Khmer Studies 16–17: 85–98. Watson, Richard. 2011. A Case for Clitics in Pacoh. In Sophana Srichampa, Paul Sidwell, and Kenneth Gregerson (eds.), Austroasiatic Studies: Papers from ICAAL4. MonKhmer Studies Special Issue No. 3. Dallas, SIL International; Salaya, Mahidol University; Canberrra, Pacific Linguistics. pp. 222–232. Watson, Saundra. 1964. Personal pronouns in Pacoh. Mon-Khmer Studies 1: 81–97. Watson, Saundra. 1966. Verbal affixation in Pacoh. 
Mon-Khmer Studies 2: 15–30. Witzel, Michael. 2003. Linguistic Evidence for Cultural Exchange in Prehistoric Western Central Asia, Sino-Platonic Papers 129: 1–70. Witzel, Michael. 2009 [2012]. Origin and development of language in South Asia: Phylogeny versus epigenetics? Paper presented at Darwin and Evolution, mid-year meeting of the Indian Academy of Sciences, Hyderabad, India, July 3, 2009. http://‑3:HUL.InstRepos:8554510 You Sey. 1976. Some Old Khmer affixation. Mon-Khmer Studies 5: 85–95.

chapter 7

The Proto-Munda Predicate and the Austroasiatic Language Family

Felix Rau



The fact that most languages in the Munda branch of Austroasiatic have extensive verbal morphology has led to the widespread assumption that proto-Munda itself had a morphologically complex verb. Pinnow (1966) as well as Norman Zide and Gregory Anderson (Zide & Anderson 2001, Anderson & Zide 2001, Anderson 2004, Anderson 2007) tried to reconcile the diversity of affixes and clitics and the abundance of morphological structures in modern Munda languages by reconstructing complex verbal morphology in the common ancestor. Reconstructions along these lines set proto-Munda apart from other Austroasiatic languages and from what we know of the history of the Austroasiatic family. Furthermore, a morphologically complex proto-Munda locates many crucial morphological developments in a pre-proto-Munda stage and has proto-Munda emerge as an exceptional Austroasiatic language, apart from all other branches and with no explanation of how and when it changed so dramatically. My goal is to suggest an alternative hypothesis and restart the discussion of how the Munda branch developed and how it fits into the Austroasiatic language family.

The primary claim of this paper is that proto-Munda was a language with very few bound morphemes. In fact, these bound morphemes are well known from other branches of Austroasiatic, making proto-Munda a rather typical Austroasiatic language. The other side of this claim is that the extensive morphology found in Munda languages today can be shown to be a later development in branches of the Munda group and often even in individual languages. This paper argues that it is time to shift the focus of research from the static exceptionality of proto-Munda and the Munda languages towards the investigation of the process that made the Munda branch an exceptional group of Austroasiatic, an endeavour broadly in the spirit of Donegan & Stampe (1983, 2002, 2004) and Donegan (1993). The approach advanced here also entails a shift in scope from a narrow view of historical morphology towards the predicate position of proto-Munda and its place in the syntactic and prosodic organization of the Munda languages.

Recent works on individual Munda languages, especially Peterson (2011a, 2011b) and the papers in Anderson (2008a), have contributed crucial data supporting a reconstruction of proto-Munda with a morphologically simple verb and a set of pre-verbal particles. The morphological diversity in modern Munda languages is a central problem for explanations positing a morphologically complex proto-Munda. For the approach argued for in this paper, the diversity of affixes and clitics and the abundance of structures in Munda languages are not an inscrutable obstacle, but an integral part of the development of the sub-branches of Munda.

* I would like to thank the anonymous reviewer wholeheartedly for their detailed and thoughtful comments. While I considered all of the points they raised carefully, I did not address all of them here, but I hope this paper evokes more detailed and thoughtful reactions like these. This would be the discussion we need for the history of the Munda languages and their role in the Austroasiatic family.

© koninklijke brill nv, leiden, 2020 | doi:10.1163/9789004425606_009


Some General Considerations

Any account of the development of verbal morphology in the Munda languages has to present a reconstruction of proto-Munda and show how it relates to what we know about the history of the Austroasiatic family. Previous accounts by Pinnow (1966) as well as Norman Zide and Gregory Anderson (Zide & Anderson 2001, Anderson & Zide 2001, Anderson 2004, Anderson 2007) have reconstructed a morphologically complex verb in proto-Munda and explained the structures and morphemes in modern Munda languages by substantial remodelling of the morphological structure in the development of the different languages. Although the structures reconstructed for proto-Munda became less and less complex over time, the resulting proposals paint a typologically unlikely picture of morphological change, with constant loss and emergence of prefixes as well as suffixes and enclitics. The holistic shift proposal developed by Donegan & Stampe (1983, 2002, 2004) and Donegan (1993) offers a consistent model for the changes that it posits, but has difficulty explaining the particularities of Munda morphology and their development. Although I believe Donegan and Stampe are ultimately right to look for an explanation in terms of prosodic and rhythmic patterns and their development, this abstract account cannot explain the considerable differences among the Munda languages, nor does it account for the diversity of specific developments in the various languages.



The challenge is to develop a model that accounts for the differences as well as the similarities in the morphological templates of the individual Munda languages and posits a morphological and syntactic structure in proto-Munda from which the different structures developed. This paper constitutes a first step towards such a model. The underlying assumption of this paper is that morphology is “a reflection of the historical sequence of grammaticalization of affixes” (Mithun 2000, p. 232). For the templatic structures of the different Munda languages, the basic assumption is that these structures developed successively, as morphemes closer to the verb root were bound earlier than more peripheral verbal markers. If we take the morphological structure of Gorum as an example and focus on the suffix domain in particular, the basic assumption states that the ventive suffix -aj developed later than the undergoer suffixes1, represented in this example by the first person singular -iŋ, which in turn developed later than the tense markers, represented here by the past tense suffix -r(u).

(1) aɖaʔ-r-iŋ-aj
    thirsty-pst:act-1su-ven
    ‘I was thirsty.’

The assumption that the grammaticalization of person suffixes preceded the grammaticalization of the ventive -aj is supported by the fact that the ventive only occurs in the Sora-Gorum group, while object suffixes occur in several branches of Munda. The tense marker -ru can be reconstructed to proto-Munda *lə. However, there are specific paths that result in new affixes appearing in positions closer to the root than older affixes. In particular, one supplementary mechanism is crucial to explaining the diverse tense/aspect suffix inventories of the individual Munda languages. Examples (2) and (3) illustrate this situation with word forms from Gutob.

(2) sui-tu-niŋ
    plow-act.npst-1sg
    ‘I will plow.’ (cf. Griffiths 2008, p. 643)

(3) pi-loŋ-niŋ
    come-mid.npst-1sg
    ‘I will come.’ (cf. Griffiths 2008, p. 660)

The active non-past suffix -tu in (2) has cognates in several Munda languages and can be reconstructed for proto-Munda. The middle voice non-past suffix -loŋ in (3) is only attested in Gutob. The suffix -loŋ is probably derived from a verbal root, and the source construction for the current suffix in this position is a combination of a non-finite main verb (most likely the stem) followed by a finite auxiliary loŋ. Through this path, established templatic positions can become hosts for new affixes long after the initial grammaticalization of affixes in this location.

1 The undergoer suffixes mark the object on transitive verbs, but in a small group of intransitive patientive verbs these suffixes mark the only argument, mostly experiencers.
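The basic ordering assumption can be made concrete with a small sketch. The Python below is purely illustrative: the function name `relative_chronology` and the list encoding of the suffix domain are assumptions introduced for this example and are not part of the reconstruction itself. It takes the suffixes of the Gorum form in (1) in surface order and returns them with the relative age that the assumption predicts, innermost suffix first.

```python
# Illustrative sketch only: encodes the working assumption that suffixes
# closer to the root were bound earlier than more peripheral ones.
# All names here are hypothetical and chosen for this example.

def relative_chronology(suffixes):
    """Return (rank, form, gloss) triples; rank 1 is the predicted oldest.

    `suffixes` lists (form, gloss) pairs in surface order, starting with
    the suffix that stands next to the root.
    """
    return [(rank, form, gloss)
            for rank, (form, gloss) in enumerate(suffixes, start=1)]

# Suffix domain of Gorum aɖaʔ-r-iŋ-aj 'I was thirsty', example (1).
gorum_suffixes = [("-r", "pst:act"), ("-iŋ", "1su"), ("-aj", "ven")]

for rank, form, gloss in relative_chronology(gorum_suffixes):
    print(rank, form, gloss)
```

The sketch deliberately ignores the auxiliary path illustrated by Gutob -loŋ, where a younger form occupies an inner position, so its output is a default prediction rather than a result.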


The Proto-Munda Predicate

The reconstruction presented in this paper differs substantially from previous accounts of proto-Munda. Most significantly, it focusses on the predicate position with its particles and on the place of the predicate in the clause, as opposed to the purely morphological approaches of Pinnow (1966), Zide & Anderson (2001) as well as Anderson (2004, 2007). The reconstructed structure consists of a morphologically simple verb that could be augmented by a very small set of derivational prefixes and infixes, allowed for reduplication and probably featured incorporation of monosyllabic nominal forms: deriv-rdl:v-inc. This verb is set in a clausal structure that can be reconstructed as in Figure 7.1.

[subj]   caus [verb](=)asp:voice

Figure 7.1 Proto-Munda clause

The various syntactic positions of the predicate complex in proto-Munda could be occupied by a small set of markers. Some of these markers, such as the causative *Oˀp, the reciprocal particle *kƏl or the negators *əˀt and *Um, can be reconstructed with high certainty. The same is true for some of the aspect-voice formatives that occupied the postverbal position. The perfective *lə and imperfective *tə are well attested across the different Munda branches, as is the middle voice marker *n (see also Rau 2011 for a reconstruction of tense and voice in the proto-Munda predicate). The status of active voice *ˀt as well as of other aspect or voice markers is still problematic and needs further research. The most tentative part of the proto-Munda predicate is the pre-negator mood-aspect position, which is based on reflexes in Gorum and Gutob as well as on its similarity to phenomena in Khasic and Palaungic languages. The resulting reconstruction can be represented as:

mod/asp   neg   recp   caus   deriv-   rdl:   root   -inc   asp
*A *O *Vj *mO   *əˀt *Um   *kƏl   *lə perf : *n mid   *tə imperf : *ˀt act

Figure 7.2 The syntactic positions and reconstructed morphemes
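The linear order of the positions in Figure 7.2 can be sketched as a simple ordered template. The Python below is a minimal illustration under stated assumptions: the slot labels are taken from the figure, but the list and dictionary encoding and the function name `order_markers` are hypothetical, and only markers that the text names explicitly for a position are included. The function sorts markers into template order; it does not claim that any particular combination is a reconstructed proto-Munda clause.

```python
# Illustrative sketch: slot labels follow Figure 7.2; the data structure
# and the function name are assumptions made for this example.
SLOTS = ["mod/asp", "neg", "recp", "caus", "deriv", "rdl", "root", "inc", "asp"]

# Only markers that the text assigns to a position are listed here.
MARKERS = {
    "neg": ["*əˀt", "*Um"],
    "recp": ["*kƏl"],
    "caus": ["*Oˀp"],
    "asp": ["*lə", "*tə"],  # perfective, imperfective
}

def order_markers(markers):
    """Sort markers by the template position of their slot.

    Raises KeyError for a marker that is not listed in MARKERS.
    """
    slot_of = {form: slot for slot, forms in MARKERS.items() for form in forms}
    return sorted(markers, key=lambda m: SLOTS.index(slot_of[m]))

# Order only: this shows slot sequence, not an attested or reconstructed string.
print(order_markers(["*lə", "*Oˀp", "*kƏl", "*əˀt"]))
```

A flat list is used because the text treats the pre-verbal positions as a fixed sequence; a richer encoding would be needed to express the fused aspect-voice position.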

The existing evidence allows us to form a hypothesis about the prosodic structure of the proto-Munda predicate. The verb itself consists of a predominantly monosyllabic root that can be reduplicated or can incorporate a monosyllabic nominal combining form. Proto-Munda had a strong quantitative bias for mono- and bisyllabic stems. The pre-verbal particles were simple freestanding phonological words. This situation is still found, in a slightly different form, in Kharia (Peterson 2011a, 2011b) as well as in many other Austroasiatic languages. The causative *Oˀp immediately preceded the stem. Although all reflexes of *Oˀp are bound morphemes in the modern languages, the lack of any lexicalized reflexes in North Munda suggests that it was not yet bound to the stem in proto-Munda. The stem was immediately followed by a combined aspect-voice marker that was monosyllabic. The evidence from modern Munda languages suggests that this marker could already have been closely bound at the proto-Munda stage. This poses new challenges for any account of the development, as it has generally been assumed that the prefixes predate any suffixes. We can thus posit the following prosodic structure for the predicate complex of proto-Munda:

[Diagram: prosodic words (ω) and syllables (σ) of the predicate complex, with the stem labelled root, rdl, deriv and inc]

Figure 7.3 The proposed prosodic structure associated with the predicate



The structure reconstructed here is similar to that of modern languages in other branches of Austroasiatic. The crucial differences are the fused aspect-voice post-stem position and the potential boundedness of this post-stem syllable in proto-Munda. The details, evidence, and consequences of this reconstruction are discussed in the following sections.


Previous Accounts

The complexity of Munda verbal morphology is exceptional both in the Austroasiatic family and in South Asia. This has led to a considerable interest in historical verbal morphology of Munda. The seminal treatment of verbal morphology in Pinnow (1966) is still highly relevant and shows some similarity to the reconstruction argued for in this paper. Pinnow sums up his analysis with the statement that “only the simple and compound verb stem with the primary affixes [i.e. recp, caus, and refl, F.R.] and with the aspect affixes partially expanded by -ed, -en, -ug [i.e. recp, caus, and refl, F.R.], can be considered an old and genuine verb complex.” (Pinnow 1966, p. 180) This results in the following structure:

neg subj [recp/caus:root(:inc)-refl-asp:(in)trans] obj

Figure 7.4 Pinnow’s reconstruction of the proto-Munda predicate/clause

Pinnow also reconstructs morphemes for the different morphological positions. Besides the pronouns, he reconstructs the negators *kwam/kwom and *adro, the reciprocal *qəl-, the causative *əb- or *ab-, and a reflexive *-dom. Additionally, Pinnow reconstructs six aspect suffixes, which he groups in two categories: imperfective (called infective in Pinnow 1966, p. 179) and perfective. The imperfective suffixes are the progressive *-ta, habitual *-e, and durative *-ia. His perfective suffixes are the resultative *-oka and non-resultative *-le. These suffixes were closely joined with transitivity markers: transitive forms were zero marked or featured the suffix *-ed, intransitives were marked by *-en, and passive forms were marked by *-ug. Although some forms, such as the reciprocal *qəl-, the causative *əb-/*ab-, and the non-resultative perfective *-le, are virtually identical to the reconstructions proposed here, other forms seem difficult to justify for proto-Munda given our current understanding of the development of the different languages.

Another remarkable reconstruction was presented by Zide & Anderson (2001). They not only explicitly claim that proto-Munda had a complex verbal morphology, but also that parts of this morphology represent archaisms that have to be reconstructed for proto-Austroasiatic (Zide & Anderson 2001, p. 517). Their account of the proto-Munda verb assumes the most complex verbal morphology, with bound subject and object markers as well as a negative prefix along with causative, reciprocal, voice and tense morphology. Following Zide & Anderson (2001, p. 518), the resulting morphological structure of the proto-Munda verb comprises eight morphological positions and can be represented as:

subj-neg-[caus/recip-rdpl-root-pass/intr]-trans/tense-obj

Figure 7.5 Zide & Anderson’s 2001 reconstruction of the proto-Munda verb

The proto-Munda verb, as proposed by Zide & Anderson (2001), requires extensive loss, re-grammaticalisation of the same categories and considerable restructuring of the verbal morphology. In the case of the subject prefixes, they assume a remarkable case of de-grammaticalization in Kherwarian (following Anderson & Zide 1999) in which the prefixes detached from the verb and developed into preverbal enclitics, attaching to any material preceding the verb. Anderson (2007) is more cautious about the status of the subject and object markers and tentatively assigns them affix or clitic status. This results in the slightly modified structure:

subj= caus/recip- verb -inc -tns/asp/trans =obj

Figure 7.6 Anderson’s 2007 reconstruction of the proto-Munda verb

Even in this version, a reattachment of the subject proclitic or prefix is required to account for all structures attested in the different Munda languages. So far, all accounts posit a morphologically complex proto-language and then focus on the loss of morphology and sometimes even several cases of demorphologization. This places the development of bound morphemes in a pre-proto-Munda stage and leaves it unexplained. The approach advocated here reconstructs a basic set of morphemes and posits a morphological structure from which all current verb morphologies developed, without demorphologization and without defaulting to loss of bound morphemes as the preferred explanation.




Reconstruction of the Proto-Munda Morphemes

This section provides evidence for the reconstructions of the morphemes populating the template outlined in the previous section. Most of the morphemes are attested widely in the branch and can be reconstructed with a high degree of certainty.

5.1 Roots

Verbal roots in Munda are mostly monosyllabic with a (C)V(C) structure, although bisyllabic and exceptionally even trisyllabic roots may have existed in proto-Munda. We can reconstruct a number of verbal roots, but a thorough discussion of specific verb roots or hypotheses about phonological root structures or other patterns is beyond the scope of this paper. The six reconstructed roots in (4), taken from Sidwell & Rau (2014), are good examples of proto-Munda roots.

(4) *kaˀp ‘to bite’
    *ɟOm ‘to eat’
    *gEˀp ‘to burn’
    *gam ‘to speak’
    *uˀt/uˀk ‘to drink’
    *sEn ‘to go’

Additionally, a few polysyllabic roots have to be reconstructed for proto-Munda. Examples (5) and (6) were also taken from Sidwell & Rau (2014). The root in (5) is one of the few certain examples of this type. However, several of the putative polysyllabic roots may turn out to be polymorphemic structures, such as *(bə)ɡoˀj ‘to kill’ in (6). The form *(bə)ɡoˀj is best interpreted as a combination of the separately reconstructed root *goˀj ‘to die’ and a causative prefix *bə-, discussed in the following section.

(5) *gətaɟ ‘to sleep’
(6) *(bə)ɡoˀj ‘to kill’

5.2 Causative *Oˀp

The causative morpheme *Oˀp can be reconstructed for proto-Munda. This particular causative is widely attested in the Munda family and has generally been reconstructed as a prefix: Pinnow (1966, pp. 114 and 178), for example, reconstructs *əb- and *ab-, while Anderson (2004, p. 160) reconstructs *əˀb-. A closer look at the evidence in the different Munda languages and comparative evidence from other branches of Austroasiatic suggests that *Oˀp might have been a pre-verbal particle and not a morphologically bound prefix. Evidence for this is presented below. The following table illustrates the wide range of reflexes of *Oˀp.

Table 7.1 Causative morphemes in Munda languages

Language   Prefix
Gorum      ab-
Sora       ab-/əb-
Gutob      ob-
Remo       o-
Gtaʔ       aʔ-
Kharia     oˀb-, ob-
Juang      ab-, am-, ap-, (a-), (u-), (o-)
Korku      (a-)
Mundari    (a-)

Infixed forms: -ˀb-, -b-, -ʔ-b-

The reflexes of *Oˀp in Kharia and Juang as well as in the Sora-Gorum branch are mostly self-evident, although the allomorphy in Juang warrants a second look in the discussion of North Munda causatives. There is also hardly any doubt that Remo o- and Gutob ob- derive from proto-Gutob-Remo *ob-. Gtaʔ aʔ- is also a straightforward reflex of *Oˀp, since according to our current understanding of the development of Gtaʔ, proto-Munda *ˀp became ʔ in this language, at least in morpheme-final position. This causative also occurs as an infix with bisyllabic stems in several southern languages. This process of infixation is probably rather old, but its complete absence from the northern languages is indicative. In fact, the North Munda languages, that is, the Kherwarian branch as well as Korku, lack definite reflexes of *Oˀp altogether. These languages generally feature causative suffixes, such as Santali -oco (Neukom 2001, p. 139) and Korku -khej, -ej (Nagaraja 1999, p. 57). A causative prefix is only attested in frozen lexical forms. These lexemes contain a formative a- that has generally been considered a remnant of the same morpheme that gave rise to Sora ab-/əb- and Kharia oˀb-, the causative reconstructed as *Oˀp here. While this is certainly a possible explanation, an alternative explanation is proposed in the following. Lexical pairs such as Mundari ajom ‘to feed’ and jom ‘to eat’ found in North Munda are generally interpreted as lexicalized reflexes of *Oˀp. The lack of any reflex of *ˀp in these forms, and especially the apparent complete absence of any instance of a lexicalised *-ˀp-, make a reconstruction as **A- a phonologically more likely alternative. A loss of final /b/ (or *ˀp) is theoretically possible, as the pair Remo o- and Gutob ob- attests. However, it cannot be motivated by known sound changes in North Munda, and the lack of any fossilized instances of *-ˀp- makes this even less likely. On the other hand, the reconstruction of a separate causative **A- can be supported by data from other Munda languages as well as from Austroasiatic languages outside of Munda. The interpretation of North Munda *a- is crucial for the status of the causative *Oˀp in proto-Munda. The position taken here is that, with the currently available data, North Munda *a- is better interpreted as a reflex of an old causative prefix **A- than as a reflex of *Oˀp. Since *Oˀp is thus only attested as a bound morpheme in the southern Munda languages, it should be reconstructed as a preverbal particle in proto-Munda that became bound at a later stage, when North Munda had separated from the rest of the family. Another consequence of this hypothesis is that *Oˀp is not linked to proto-Austroasiatic *p- (Sidwell 2008, p. 260). Reflexes of pAA *p- can be found in some fossilised forms and are discussed in the following paragraph. Even though *Oˀp cannot be linked to pAA morphology, it may have cognates outside of the Munda branch. The similarity of the proto-Munda causative *Oˀp to the causative auxiliary op in the South Bahnaric language Chrau is especially striking. This auxiliary derives from the lexeme op ‘to make’ (Thomas 1971). Beyond reflexes of *Oˀp, Munda languages feature scattered reflexes of other formatives. These reflexes suggest the existence of three causative prefixes in proto-Munda: **A-, **bə-, and **tA-. Perhaps the most problematic is **A-, because of the closeness of its reflexes to the reflexes of *Oˀp, as discussed above. The lexicalized causatives found in North Munda (e.g. Korku and Mundari a-) are reflexes of a proto-Munda **A-. The lexically motivated variants a-, u-, o- of the causative ab- in Juang (Patnaik 2008, p. 536) may also be reflexes of **A-, since the loss of morpheme-final *b cannot be motivated phonologically.
For proto-Munda, we reconstruct the lexeme pair *goˀj ‘to die’ and *(bə)ɡoˀj ‘to kill’ (see Sidwell & Rau 2014). The prefixal *bə- in the latter lexeme is based on Gtaʔ bagweʔ ‘to kill’, which relates to gweʔ ‘to die’ in the same language. This is tentative evidence for a causative prefix *bə-, which in turn may be related to Khasi pɨn (Nagaraja 2014, p. 1155) and Golden Palaung pʌn (Mak 2012, p. 73), to one type of Bahnaric causative such as Pacoh pa- (Watson 1966, p. 17), and ultimately to proto-Austroasiatic *p- as reconstructed by Sidwell (2008). A similar verb pair from Gorum, kḭn ‘to die’ and takḭn ‘to kill’, suggests a fossilized causative *ta- in Gorum. This could be evidence for a proto-Munda causative prefix **tA-, which in turn could be related to a second type of causative in Bahnaric, such as Bahnar tơ- (Banker 1964, p. 105) and Chrau ta- (Watson 1969, p. 91). Interestingly, the causatives of the Bahnaric branch are particularly informative when reconstructing the causatives of proto-Munda. Bahnaric has causatives with an initial bilabial such as Pacoh pa-, with a dental such as Chrau ta-, and with an initial vowel or glottal stop such as Pacoh ʔa- (Alves 2006, p. 33) or Bahnar a- (Banker 1964, p. 105), as well as pre-verbal causative auxiliaries such as Chrau op. Munda shows fossilised remnants of the causative prefixes **bə-, **tA-, and **A-, and many Munda languages still possess productive prefixes reflecting proto-Munda *Oˀp. This results in the following correspondences. Whether this superficial similarity can be substantiated by advances in historical phonology remains to be seen, but the similarities are striking.

Bahnaric      pa-     ta-     (ʔ)a-   op
proto-Munda   **bə-   **tA-   **A-    *Oˀp

Figure 7.7 Comparison of Bahnaric and proto-Munda causatives

While the three proto-Munda prefixes have to be considered tentative, we can reconstruct the proto-Munda causative *Oˀp with a high degree of certainty. While it has generally been assumed that *Oˀp was already a bound morpheme in proto-Munda and a reflex of proto-Austroasiatic *p-, the hypothesis proposed here assumes three bound causative morphemes in proto-Munda (**bə-, **tA-, and **A-) with parallel prefixes in other branches of Austroasiatic, and a free pre-verbal particle *Oˀp with a possible cognate free form in Chrau (Thomas 1971).

5.3 Incorporation

Most Munda languages show substantial reflexes of post-root nominal incorporation. In some languages, most notably Sora (Anderson & Harrison 2008b), noun incorporation is still a productive process, which seems to resemble the original process quite closely. Nominal forms that were incorporated are generally monosyllabic roots (Zide 1976) that do not conform to the so-called bimoraic constraint (Anderson & Zide 2001b), which requires free nominal forms to consist of either one heavy syllable or more than one syllable. Anderson in this volume also discusses noun incorporation in the Munda languages. Productive or fossilized noun incorporation is attested in Sora (Anderson & Harrison 2008b, p. 351), Gorum (Zide 1976), Remo (Anderson & Harrison 2008a, p. 602), Gutob (Griffiths 2008, p. 662), Kharia (Peterson 2011a, p. 122), Juang (Patnaik 2008, p. 539), Gtaʔ (Anderson 2008b, p. 736), and, although much more sparsely attested, in Kherwarian languages (Anderson, Osada & Harrison 2008, p. 228). In all languages, the verb immediately precedes the incorporated noun. Table 7.2 illustrates the parallel morphological structures with analogous examples from seven Munda languages.

Table 7.2 Verbs with noun incorporation

Language   Compound                Verb             Incorporated noun   Source
Sora       abaːsi ‘wash hands’     abaː ‘wash’      < sˀiː ‘hand’       (Anderson & Harrison 2008b, p. 354)
Gorum      asiʔ ‘wash hands’       *a ‘wash’        < siʔ ‘hand’        (field notes)
Gutob      gujti ‘wash hands’      guj ‘wash’       < titi ‘hand’       (Griffiths 2008, p. 640)
Remo       guiti ‘wash hands’      gui ‘wash’       < titi ‘hand’       (Bhattacharya 1968, p. 68)
Kharia     gu’ɟ ‘wash hands’       gu’ɟ ‘wash’      < tiʔ ‘hand’        (Pinnow 1959, p. 23)
Juang      guidi ‘wash hands’      gui ‘wash’       < iti ‘hand’        (Patnaik 2008, p. 539)
Gtaʔ       gweʔti ‘wash hands’     gweʔ ‘wash’      < tti ‘hand’        (Anderson 2008b, p. 737)

The status of noun incorporation in proto-Munda is difficult to determine, but it is possible that some sort of incorporation was already present in proto-Munda and worked along the lines attested in modern Sora. Specific reconstructions of the incorporated nouns are not ventured here, but a position for incorporated nouns is reconstructed in the predicate template.

5.4 Reciprocal *kƏl

Proto-Munda had a pre-verbal reciprocal particle *kƏl. The existence of such a formative as well as its form are generally undisputed. Pinnow (1966, p. 116) already reconstructs *qəl-, and other writers follow him implicitly or explicitly. However, the claim made here is that its morphological status and its position in relation to other formatives were different from what is generally assumed. Pre-stem reciprocal morphemes can be found in four Munda languages (Sora, Juang, Gtaʔ, and Kharia), and there are remnants of such a morpheme in the lexicon of Gorum. In Sora, Juang and Gtaʔ the reciprocal is a prefix positioned before the stem and the causative prefix and after the negative prefix. The particle kol in Kharia is a phonologically independent word that is positioned in the syntax of the Kharia predicate complex between the verb (potentially with its causative prefix ob-) and the preceding negative particle um. Biligiri (1965) regards kol as a prefix, but Peterson (2011a, 2011b) argues convincingly that kol has to be regarded as a phonological word by itself, with the clear prosodic characteristics of a free phonological word. It directly precedes the verb, and only so-called incorporated nouns can intervene (Peterson 2011a, p. 128). The status of Kharia kol as an independent particle is crucially different from that of the prefixes in Sora, Juang, and Gtaʔ. Gorum displays isolated remnants in the lexicon such as al-pa’d ‘to mend’ relating to pa’d ‘to sew’. This formative is interpreted here as cognate with Sora əl- and allows us to reconstruct the prefix *əl- for proto-Sora-Gorum.

Table 7.3 Reciprocal morphemes in Munda languages

Language   Form       Status
Sora       əl-        prefix
Gorum      (al-)      lexicalized prefix
Juang      ko-/ku-    prefix
Gtaʔ       ho-        prefix
Kharia     kol        particle

Kharia kol, Juang ko- (with a variant ku-), Sora əl-, and the probable Gorum al- can confidently be derived from proto-Munda *kƏl. The connection of Gtaʔ ho- is less secure, but final *l was lost in Gtaʔ, as swa ‘fire(wood)’ from proto-Munda *səŋal and usa ‘skin’ from proto-Munda *usal demonstrate. Gtaʔ /h/ is one possible reflex of proto-Munda *k, although the sound laws that lead from *k to Gtaʔ /h/ and /k/ are not understood yet. However, it seems to be a viable hypothesis to posit a pre-verbal particle *kƏl in proto-Munda that accounts for Sora əl-, Gorum al-, Juang ku-, Gtaʔ ho-, as well as Kharia kol. Proto-Munda *kƏl can be connected to Golden Palaung kʌr (Mak 2012, pp. 71 and 100). Shorto (1963, p. 53) gives kər- for Palaung, which he genetically connects to Riang-Lang tər-, while he also lists a separate Riang-Lang kər-. Since Pacoh also has kar-/tar- reciprocals (Watson 1966, p. 20), the connection of *kVr and *tVr seems to be old and to link Munda with the Palaungic and Katuic branches of Austroasiatic, and with Palaung in particular. If the Jeh reciprocal ta- (Gradin 1976, p. 35), the Bahnar reciprocal tơ- (Banker 1964, p. 107), Sedang to- (Smith 1969, p. 115), the Kammu reciprocal tr- (Sidwell 2008, p. 262) and Katu ta- (Castello 1966, p. 70) can be connected to this complex, a reciprocal cluster *kVr/*tVr would be attested in Munda, Palaungic, Katuic, Bahnaric as well as Khmuic. This would suggest a very old pre-verbal reciprocal. If the reciprocal *kVr is not connected to *tVr, the /k/-initial reciprocal would still link Munda, Palaungic and Katuic. Three languages of the southern group lack the reciprocal despite featuring the preceding negative: Gutob and Remo as well as Gorum. In the case of Gutob and Remo no reflex can be found in the lexicon, while there is at least one clear lexical instance in Gorum.
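The one Gtaʔ sound change invoked above, the loss of final *l, can be checked mechanically. The sketch below is illustrative only: the function name `drop_final_l` is hypothetical, plain strings stand in for the reconstructed forms, and every other change that separates the proto-forms from the attested Gtaʔ words is deliberately left out.

```python
# Illustrative sketch: models only the loss of word-final *l reported
# for Gtaʔ. The function name is hypothetical, and all other sound
# changes between proto-Munda and Gtaʔ are ignored.

def drop_final_l(proto_form):
    """Strip the leading asterisk and delete a word-final l, if present."""
    form = proto_form.lstrip("*")
    return form[:-1] if form.endswith("l") else form

for proto in ["*usal", "*səŋal", "*kƏl"]:
    print(proto, "->", drop_final_l(proto))
```

On its own the rule turns *usal into the attested usa. For *səŋal it gives only an intermediate form that still needs further changes to reach swa, and for *kƏl it gives a vowel-final form, which is compatible with Gtaʔ ho- but leaves the initial consonant unexplained, as the text notes.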
These three languages employ an alternative strategy to form reciprocals, which involves reduplicating the verb stem and changing the active voice marking to middle voice marking. In these three languages, the simple stem of these verbs with the middle voice would be interpreted as a reflexive form; the reduplication changes this to a reciprocal interpretation. The Gorum lexicon still contains remnants of a prefix al-, as discussed above, showing that at least Gorum had reciprocal forms with a prefix derived from *kƏl. However, these forms were replaced by the reduplication and middle voice strategy, and only remnants survived in the lexicon. No reflexes of *kƏl are known from either Remo or Gutob, making it possible that it was never bound in this branch of Munda. However, a loss parallelling that in their direct geographic neighbour Gorum is possible. The Kherwarian languages (e.g. Santali, Neukom 2001, p. 130) and Korku (Nagaraja 1999, p. 56) feature a reciprocal infix -pV-. This infix has the appearance of an old morphological device, but its history is problematic. Although an infix *-p- has been reconstructed for proto-Austroasiatic (Sidwell 2008, p. 260), its function as a nominalising instrumental is very different from the reciprocal function of North Munda -p-. No parallels are known from other Munda languages. Unless a connection between pAA *-p- and the reciprocal forms in North Munda can be demonstrated, this reciprocal remains restricted to the North Munda branch.2 Peterson's (2011a, 2011b) demonstration that Kharia kol is an independent phonological word is crucial for our understanding of the development of the Munda languages. The fact that the reflex of *kƏl in Kharia is a free word suggests that the other reciprocals became bound only after Kharia separated from the rest. The lack of reflexes in North Munda is then not an instance of the loss of a prefix; rather, these languages in all likelihood never had a bound morpheme derived from *kƏl. This development is parallel to the development proposed for the causative.
The only difference is that, in the case of the causative, Kharia groups with the other southern Munda languages in having bound morphemes derived from *Oˀp. In the case of the reciprocal *kƏl, Kharia never developed a reciprocal prefix but kept the free reciprocal word kol, while the reciprocal words derived from *kƏl fell out of use in the Kherwarian languages and Korku.

5.5 Negation *əˀt / *Um

A set of negators or negative polarity markers can be reconstructed for the proto-Munda stage. The reconstructed forms are *əˀt, *Um and, more tentatively, *ka and *ban. Table 7.4 lists all negators related to the four reconstructed negative markers.

2 An anonymous reviewer points out that there might be reflexes of a bilabial retroflex in Gtaʔ as well. This would suggest that it might be possible to reconstruct a reciprocal *-p- for proto-Munda.

Table 7.4 Negation morphemes in Munda languages

Language   *əˀt         *Um     *ka    *ban
Sora       əd-
Gorum      ar-, or-
Gutob      ar-
Remo       a-
Gtaʔ       ar-, a-
Juang      ara-         ama-
Kharia     (*aʔ)        um
Ho                              ka
Turi                            ka
Santali                                baŋ
Korku                                  ban

As can be seen from this table, there is ample evidence from Gorum, Sora, Gtaʔ, Gutob, Remo, and Juang for a negator *əˀt or *ər. Even though the most widespread form of the negative prefix is /ar/, the variety of forms, especially those reported from Sora, suggests that /ar/ is not the form closest to the proto-form. The variation includes Sora aʔ-/əʔ-, aʔn-, aʔd-, ədn-, and əd-, but also Kharia *aʔ, discussed below. These variants suggest that Sora əd- is the more original form from which the other forms are derived. Sora əd- makes proto-Munda *əˀt the most likely proto-form. The fact that Kharia lacks a bound negative marker, and the fact that negation precedes the reciprocal, which is not bound in Kharia either, suggest that the negative morpheme was not attached to the verb until Kharia, and with it most likely also the North Munda languages, were separated from the southern languages Gorum, Sora, Gtaʔ, Gutob, Remo, and Juang. This means that we have to reconstruct *əˀt as a free word in proto-Munda. Comparative evidence from other branches of Austroasiatic suggests that the negative marker *əˀt is not an innovation of the Munda languages, but connects to negative markers in other Austroasiatic languages and ultimately to the verb *ʔət ‘to lack’ (MKCD 943). This etymology was already suggested by Donegan and Stampe (2004), who equate Sora əd- with Austroasiatic *ət, their equivalent of MKCD 943 *ʔət. Although its main negator um is not derived from *ʔət, Kharia has negative forms that could be interpreted as reflexes of *ʔət. Peterson (2011a, p. 338) takes the second person singular form abu, which is also used for third persons, as the base form. The internal structure of abu, if it has any, is not apparent, but two other forms of negation in Kharia are interesting: aʔbar ‘neg.2du/2hon’ and aʔpe ‘neg.2pl’. Elsewhere, the clitics =bar ‘=2du/2hon’ and =pe ‘=2pl’ are the bound forms of the personal pronouns ambar and ampe respectively, and in this case they seem to attach to a formative aʔ. This formative aʔ could reasonably be reconstructed as a reflex of *əˀt and would add Kharia to the languages in which *əˀt is attested. Unlike other reflexes of *əˀt, Kharia aʔ is not bound to the verb and in fact looks more like the base to which =bar ‘=2du/2hon’ and =pe ‘=2pl’ attached. This would add further support to the notion that negators were free forms in proto-Munda. Beyond reflexes of *ʔət, Juang possesses a prefix ama- (Patnaik 2008, p. 537), which cannot be explained as a reflex of *əˀt, but would have developed from a negator containing the bilabial /m/. The Kharia negative particle um (Peterson 2011a, p. 335) is the freestanding equivalent. If other freestanding negators such as the Gorum negative imperative ambu ‘Don’t!’ are connected, it seems justified to tentatively posit a negator *Um for proto-Munda. The Ho and Turi negative particle ka (Deeney 1978, p. 174; Anderson, Osada & Harrison 2008, p. 227) has no clear cognate form in branches other than Kherwarian. Santali baŋ, Chaibasa Ho ban and Korku ban attest a negation device *ban that cannot be traced beyond proto-North-Munda, the generally assumed common ancestor of Kherwarian and Korku. Reflexes of *ban have clear characteristics of a negative copula and suggest a more recent verbal origin than *əˀt and *Um. If we assume that both Kherwarian *ka and North Munda *ban are reflexes of negators already present in the proto-Munda stage, we would have to posit *kA and *bAn.
However, unless these formatives can be connected to etymons in wider Austroasiatic, positing these two negators for proto-Munda is not well supported.3

5.6 Tense and Mood Prefixes in Gorum

A set of prefixes occurring in the negative prefix position in Gorum encodes modal-aspectual semantics as well as negative polarity. The position can be occupied by one of three prefixes. Two of the prefixes encode negative polarity and mood/aspect, while the third prefix only encodes modality:

3 An anonymous reviewer suggests possible parallels for *bAn in Kơho-Sre and Khmer as well as Danau, Palaung, and Old Mon for *kA.



ar-   negative past, negative irrealis, negative conditional
or-   negative non-past, negative imperative
aj-   irrealis

This system of modal vowel alteration in the prefixes is paralleled by a similar vowel alteration in the negative copula:

iŋkaʔ   negative irrealis copula, negative conditional copula
iŋkoʔ   negative realis (negative indicative) copula
iŋkuʔ   non-finite copula

The negative prefix ar- is also part of the aspectual particle-like negative copula arlaŋ ‘not yet’, which derives from a (probably verbal) *laŋ. The Gutob middle voice non-past suffix -loŋ (Griffiths 2008, p. 654) could be a cognate of the laŋ part of Gorum arlaŋ. The two very distinct morphological positions of the modal vowel alterations, as fused parts of verbal prefixes and in the coda of the negative copula, characterise this phenomenon as an archaic feature. The semantic similarities between ar- as well as aj- and iŋkaʔ, and between or- and iŋkoʔ, suggest that the phenomena are indeed historically related and go back to a threefold distinction. The morphological distribution in modern Gorum characterises them as originally freestanding morphemes. The three morphemes can be reconstructed as:

*A    ‘realis + perfective’
*Aj   ‘irrealis’
*O    ‘realis + imperfective’

This phonological reconstruction of the three morphemes is highly tentative, and their actual phonological substance is impossible to determine with the available evidence. This shortcoming stems first and foremost from the fact that a single vowel in modern Gorum can correspond to three or more vowels at the proto-Munda level. Furthermore, consonants in some prefixes seem to have been weakened or deleted in the development of Gorum, making an original (C)V(C) structure of these morphemes conceivable. Beyond these crucial limitations, the evidence for pre-verbal mood-aspect morphemes which in part fused with the negative marker *əˀt and gave rise to the two negative prefixes or- and ar- as well as the irrealis prefix aj- is conclusive.
This observation is corroborated by the Gutob negatives ar- ‘not’ and mor- ‘not yet’ (Griffiths 2008, p. 659). These two prefixes display an aspectual distinction that corresponds to a difference in the vowels (and onset).



The different semantics and form of the negative prefixes or- and ar- in Gorum and Gutob ar- and mor- cannot be explained by a development from *əˀt alone. These forms are best explained as forms that arose from a combination of the negative *əˀt with preceding aspectual-modal markers. The only univocal reflex of these aspectual-modal markers is Gorum irrealis aj-, but the forms in modern Gorum and Gutob suggest that at least perfective, imperfective, and irrealis markers occurred in the position preceding the negative marker. The available evidence suggests the following five reconstructions:

*Aj          Gorum aj- ‘irrealis’
*A+*əˀt      Gorum ar- ‘negative past’ and Gutob ar- ‘negative’
*Aj+*əˀt     Gorum ar- ‘negative irrealis’
*O+*əˀt      Gorum or- ‘negative non-past’
*mO+*əˀt     Gutob mor- ‘negative imperfective (not yet)’

A common source of Gorum ar- ‘negative past’ and Gutob ar- ‘negative’ in the combination of a perfective *A and the negative *əˀt is possible. An interesting open issue is how to relate the supposed imperfective *mO-, posited as an explanation of Gutob mor- ‘not yet’, and the imperfective *O- that is currently posited to explain Gorum or- ‘negative non-past’. The similarity in semantics and the possible similarity of the posited vowel *O in *mO- and *O- make a common source of Gorum or- and Gutob mor- tempting. This reconstruction allows us to posit a pre-verbal modal-aspectual position that preceded the negation position. Since the only evidence from within the Munda branch comes from Gorum and Gutob, it is possible that it is an innovation of a subbranch of Munda. However, the archaic appearance of this feature and evidence from other branches of Austroasiatic suggest that it is an inherited feature. In Golden Palaung, aspect auxiliaries precede the negators in pre-verbal position (Mak 2012, p. 27). These auxiliaries comprise markers for perfective, progressive/durative, and inchoative, among others (Mak 2012, p. 84).4 Pnar has a pre-negator mood position. This syntactic position accommodates a passivizer and a deontic marker, but more crucially the realis marker da as well as the irrealis marker daw occur in this position (Ring 2013). The existence of a comparable pre-negator modal and aspectual position in Palaungic and Khasic languages further supports the hypothesis of a pre-negator mood-aspect position in proto-Munda.

4 See also Janzen (1976) for an analysis of a different Palaung variety, Pale.

Table 7.5 Past/perfect morphemes in Munda languages

Language    Past/perfect
Sora        -lə
Gorum       -ru
Santali     -let’/-len
Mundari     -le
Korku       (-le)

5.7 Tense/Aspect

Munda languages have a large inventory of tense/aspect markers. However, most of these morphemes cannot be reconstructed at the proto-Munda stage, but seem to have developed at later stages. This paper will focus on the two best attested tense/aspect markers: the perfective *lə and the imperfective *tə; see also Rau (2011) for a discussion of the history of tense/aspect and voice morphology. Two branches of Munda, Kherwarian and Sora-Gorum, feature clear reflexes of a perfective marker that can be reconstructed with some certainty as *lə. Given its occurrence in these very different branches, we can posit perfective *lə already for proto-Munda. The relevant markers are given in Table 7.5. In Sora-Gorum, perfective *lə developed into a general past tense marker. Nagaraja (1999, p. 74) describes the Korku past perfect as v-pst-ɖaːn, with intransitive past -en and transitive past -khe in the pst position. However, v-le and v-le-ɖaːn forms, as in (7) and (8), are given there as well. This suffix is not discussed by Zide (2008), but the position of the suffix -le and its contribution to the meaning of the verb form suggest a connection to *lə.

(7) ji-le
    give-pst
    ‘gave’ (Nagaraja 1999, p. 74)

(8) ji-le-ɖaːn
    give-pst-pst.perf
    ‘had given’ (Nagaraja 1999, p. 74)

Table 7.6 Future/imperfect morphemes in Munda languages

Language    Non-past/imperfective/progressive
Sora        -tə
Gorum       -tu
Gutob       -tu
Kharia      =te/=ta
Mundari     -ta

There is another very tantalizing word that bears some resemblance in form and function to the other reflexes of *lə. In Gtaʔ, post-verbal læʔ ‘to stay, to remain’ functions as a perfective auxiliary. Our knowledge of the historical phonology of Munda, and of Gtaʔ in particular, does not allow for a reliable reconstruction,5 but it is possible that it is connected to the perfective marker *lə. In this case, the reconstructed marker could not have existed as a clitic in proto-Munda, but would certainly have been a free verbal element *lə(ʔ) at this stage. However, all substantiated evidence points to *lə being a bound morpheme in proto-Munda. Outside of Munda, some etymons with a reported meaning ‘completed’ or ‘finished’ look promising as cognates. For example, Sidwell (2000) reconstructs *ləʔ ‘completed’ for proto-South-Bahnaric and later posits *lɛʔ ‘completed’ also for proto-Bahnaric (Sidwell 2011). The other well attested tense-aspect marker is the imperfective *tə/tɛ. It is better attested in the southern Munda languages than *lə, but not as well in North Munda. However, Mundari possesses a progressive suffix -ta (Osada 2008, p. 127) that could be related. If the active past markers Kharia =oʔ, Gutob and Remo -oʔ, and Juang -o are related to Korku active past -èʔ (Zide 2008, p. 273), this set should go back to proto-Munda as well. However, in this case the marker would seem to consist of a fused aspect and voice morpheme already at the proto-level.

5 The biggest problem is that læʔ is also reported as laʔ in modern Gtaʔ. It is possible to reconstruct Gtaʔ læʔ as *laˀc at the proto-Munda level. However, this is only one of several options and a purely mechanical one, as no other reflexes of such a hypothetical proto-Munda word exist.



Other tense/aspect morphemes attested in one or more branches may have existed in proto-Munda, but the evidence is so far not sufficient to reconstruct any of these at the proto-level.

5.8 Voice *-n

All Munda languages make a voice distinction between active and middle voice, sometimes called transitive and intransitive respectively. Although voice marking is fused with tense/aspect marking in most Munda languages, Rau (2011) reconstructs middle voice *-n for proto-Munda. As discussed there, Kherwarian suggests that the middle voice *-n was paralleled by an active voice marker *-ˀt. However, there is no evidence for *-ˀt outside of Kherwarian, making the reconstruction of the active voice morpheme considerably less reliable than that of the middle voice marker.6 In southern Munda languages outside of Sora-Gorum, tense/aspect morphemes are completely fused with voice morphemes into a single marker. There is no way to separate the active voice part from the past tense component in the Kharia active past clitic =oʔ or in the Gutob and Remo active past suffix -oʔ. This suggests a very complex history of tense/aspect and voice morphology in Munda, beyond the few morphemes reconstructable to the proto-Munda level.


Person Markers

The development of person markers in Munda languages has been a topic of discussion for some time, starting with Pinnow (1966) and most influentially Zide & Anderson (2001), Cysouw (ms.), and Anderson (2007). It is probably the most widely discussed topic in the historical morphology of Munda languages. In fact, the history of person marking is particularly complex in these languages and would warrant a separate extensive study. For the sake of brevity, this section focuses on first and second person markers and the etymologically related pronouns. The first and second person pronouns, in particular the first and second person singular, have good cognate forms in other branches of Austroasiatic, so that these grammaticalized person markers are ultimately connected to free and bound forms in other Austroasiatic languages. Third person markers are etymologically more diverse, developed from sources other than personal pronouns, and are excluded here.

6 Pinnow’s *-ed (Pinnow 1966, p. 115) is the equivalent in his reconstruction to this marker. He connects Korku -èʔ to this complex.

Table 7.7 The position of person markers in Munda languages

-O -O -O -O

=S =S

Gorum, Juang, (Sora) Gtaʔ Kherwarian Gutob, Kharia Kherwarian Gutob, Remo, Kharia Sora, Korku

Their particular etymologies and morpho-syntactic positions require a separate treatment, but do not change the overall picture developed in this section.

6.1 Person Marking Patterns in Munda Languages

At least seven different patterns of person marking can be distinguished in modern Munda languages (Table 7.7). The patterns result from combinations of subject prefixes, object suffixes, and subject enclitics preceding the verb or attached to the end of the verb.

6.2 Subject Markers

Six individual languages and the various Kherwarian languages feature subject marking in the predicate, but only four languages (Juang, Gtaʔ, Gorum, and Sora) have markers that are not homophonous with pronouns or not transparently derived from contemporaneous pronominal free forms. The markers of these four languages are also the only subject prefixes, as opposed to the clitics of the other languages. These prefixes are clearly old and worth considering as reflexes of older stages of subject marking. A closer look at the different paradigms in Table 7.8 reveals very different structures in the four languages. In Juang, Gtaʔ, and Gorum the structures mirror the paradigmatic structure of the free pronouns of the respective language. The exception among the four is Sora, the closest relative of Gorum. Sora only has a single prefix ə-. Unlike the prefixes in the other languages, which encode person and number information, ə- only marks plurality. The form of the prefixes in Juang, Gtaʔ, and Gorum suggests that they are historically related to pronominal forms, but their sometimes archaic and always very contracted form indicates some age. For example, Gorum first person singular ne- corresponds to the free pronoun miŋ ‘I’, but reflects *niŋ, an older stage of the pronoun. Despite this historical depth, the prefixes clearly reflect

Table 7.8 Pronouns and subject markers in Munda languages

1 SG

Juang Gtaʔ Gorum


V-/Øaiɲ nnæŋ nemiŋ

baniɲba niʔniaʔ lebileŋ






1 PL INCL nVniɲ (næʔ-) næʔ

2 SG

2 DU

2 PL

mVam nana momaŋ

aapa papa bomaiŋ/baiŋ

Vape pepe



=me/=m am

=ben aben




Pronominal clitics (subject and object marking): Santali =iɲ/=ɲ =laŋ =liɲ =bo(n) =le iɲ alaŋ əliɲ abo(n) ale


=pe ape

language-specific developments in the pronominal domain. The contrast in the first person singular subject prefixes, between Gtaʔ n- and Gorum ne- on the one side and Juang V-/Ø- on the other, is best explained by the corresponding lack of initial /n/ in the first person pronoun in Juang. The same is true for the /l/ in the Gorum first person plural prefix le- as opposed to Juang nV- and Gtaʔ næ. Again the prefixes reflect distinct developments in the pronominal domain of the different languages. Crucially, the differences cannot be explained by general sound changes, but only by lexeme-specific changes to the corresponding pronouns. The uniformity of the negation prefixes in these languages is in stark contrast to the heterogeneity of the subject prefixes. The differences in the subject prefixes and the fact that they parallel the pronominal paradigms of the different languages suggest that free pronouns which preceded the verb became bound in Juang, Gtaʔ, and Gorum (or alternatively proto-Sora-Gorum) separately. The patterns found in these four languages cannot be explained by an early, single grammaticalization event.

6.3 Object Markers

Juang, Gorum, Sora, and Korku as well as the languages of the Kherwarian branch feature object marking on the verb. Although the patterns for object marking are less diverse than the subject marking systems, the basic situation is similar. The object markers in the Kherwarian languages are phonologically identical to the subject clitics in these languages, but their morphosyntactic behaviour is slightly different. Just like the subject clitics, object clitics are regularly derivable from the corresponding free pronouns. In the other four languages (Juang, Gorum, Sora, and Korku) we find specialized object suffixes. The markers in Korku are very similar to the clitics in the closely related Kherwarian languages, but are generally described as suffixes, not enclitics (Nagaraja 1999, p. 67). The structure of the paradigm is identical to the pronominal paradigm in Korku and to the paradigms in Kherwarian languages. The three other languages display a decidedly different situation. The paradigms of Sora and Gorum are identical in structure, and the suffixes only differ between the two languages according to sound laws applying in the Sora-Gorum branch. This suggests that object marking was already present in proto-Sora-Gorum. Object marking in Juang is again different and is, in form and paradigmatic structure, more closely related to the free pronouns of Juang than to the object suffixes in the other languages.

Table 7.9 Pronouns and object markers in Munda languages

1 SG
Juang Gorum Sora Korku
-ɲ aiɲ -iŋ eniŋ -iɲ ɲen -(i)ɲ iɲ
1 DU INCL -ɲba niɲba -ileŋ enleŋ -lɛn anlen -laɲ alaɲ
1 PL INCL -ɲeniɲ niɲ
-liɲ aliɲ
-buɲ abuɲ
2 SG
2 DU
2 PL
-m am -om enom -əm amən -mi am
-pa apa -ibeŋ enbeŋ -ben ambeɲ -piɲ apiɲ
-pe ape
-pe ape
=me/=m am
=ben aben
=pe ape
-le ale
Pronominal clitics (subject and object marking): Santali =iɲ/=ɲ =laŋ =liɲ =bo(n) =le iɲ alaŋ əliɲ abo(n) ale

6.4 Development of Person Markers in Munda

The evidence suggests that the grammaticalization of person marking in Munda languages proceeded separately in the different branches. Most first and second person markers developed from pronominals, and to the degree that the pronouns are cognate across Munda languages, the person markers are ultimately related. However, the branch- or language-specific developments in the pronominal lexemes are reflected in the bound markers. This shows clearly that the markers developed after these changes occurred in the sub-branches or individual languages. In fact, the person markers of Kherwarian are simply bound or cliticized versions of the free pronouns and bear no close relation to any markers in other languages, with the exception of the object suffixes in Korku. In other branches, the grammaticalization may have happened at early stages of the branches. It is thus possible to reconstruct the object suffixes of proto-Sora-Gorum on the basis of the reflexes in the modern languages. The picture emerging from the available evidence suggests that while most Munda languages developed some sort of morphological person marking, proto-Munda had no bound person markers on the verb.


Comparative Evidence of Morphological Structure

Several templates describing the morphological structure of the proto-Munda verb have been proposed. The original proposal is Pinnow (1966), but the template of Anderson (2007) in Figure 7.8 can be considered the state of the art.

subj= caus/recip- verb -inc -tns/asp/trans =obj

Figure 7.8 Morphological structure of the proto-Munda verb according to Anderson (2007)

The slots in Table 7.10 are based on the morphological structure found in the languages of the Sora-Gorum branch as well as in Juang, because the languages of this subgroup are the closest to the reconstructed structure proposed by Anderson (2007) and other previous accounts of the proto-Munda verb. The structure underlying Table 7.10 represents a hypothesis for morpheme order in proto-Munda based on which morphemes can be reconstructed at the proto-Munda stage and in what order they are attested in modern languages. In the following, I will argue that this morpheme order is in fact the best hypothesis and examine the status of the morphemes in these positions, especially with regard to morphological boundedness at the proto stage.


Table 7.10 Common morphological slots with bound morphemes in Munda languages

prefixes stem suffixes
s mood recp caus rdl root inc asp voice o
Gorum Sora Juang Gtaʔ Gutob Remo Kharia Santali Korku

x (x) x x
x x x x x x
(x) x x x
x x x x x x x (x) (x)
x x x x x x (x) x x
x x x x x x x x x
x x x x x x x
x x x x x x x x x
x x (x) (x) (x) (x) (x) x (x)
x x x
(x) (x)

Pre-stem Positions

The prefix domain of Juang, Gtaʔ, Gorum, and Sora is the best available evidence for pre-verbal markers in proto-Munda. Examples (9) to (15) illustrate the current range of prefixes.

(9) Juang
    a-ku-buji-ri-kia
    neg-recp-like-prs-dl
    ‘They don’t like each other’ (Patnaik 2008, p. 517)

(10) Juang
    m-ab-soj-e
    2-caus-learn-fut
    ‘You will teach me.’ (Patnaik 2008, p. 530)

(11) Juang
    ni-kɔ-ɔɳ-se-na
    1pl-recp-see-prf-fut
    ‘We will see each other.’ (Patnaik 2008, p. 533)



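Table 7.11 The prefix domain in Munda languages

                subj   neg    recp   caus   root        suffixes
Juang   (9)            a-     ku-           buji        -ri-kia
        (10)    m-                   ab-    soj         -e
        (11)    ni-           kɔ-           ɔɳ          -se-na
Gtaʔ    (12)           a-     ho-           ba          -ke
        (13)    n-     ar-           aʔ-    ble
Gorum   (14)    ne-    r-            ab-    so’ɟ        -om
Sora    (15)    ə-     ədn-   əl-           gə{b}rɔj    -l-aj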


(12) Gtaʔ
    a-ho-ba-ke
    neg-recp-get-t/a
    ‘He did not meet (the cat)’ (Anderson 2008b, p. 731)

(13) Gtaʔ
    n-ar-aʔ-ble
    1-neg-caus-ripen
    ‘Shouldn’t I be cultivating (grass)?’ (Anderson 2008b, p. 712)

(14) Gorum
    ne-r-ab-so’ɟ-om
    1sa-neg-caus-learn-act:2su
    ‘I didn’t teach you.’

(15) Sora
    ə-ədn-əl-gə{b}rɔj-l-aj
    pl-neg-recp-{caus}:feel.ashamed-pst-1
    ‘We (exclusive) didn’t shame each other.’ (Donegan & Stampe 2004, p. 4)

The prefixes in these four languages all follow the same basic order, as demonstrated in Table 7.11. As shown in the first part of this paper, the negation, reciprocal, and causative prefixes of these languages are cognate. The North Munda languages lack these prefixes and any reflexes of the relevant morphemes. This could be explained by assuming that the prefixes were lost in the development of North Munda or by assuming that these morphemes were not morphologically bound at the proto stage. The crucial evidence comes from Kharia. The configuration illustrated in Table 7.11 is paralleled by the patterns found in the morphology and syntax of Kharia. Figure 7.9 gives the structure according to Peterson (2011a, p. 335).

neg=pers/num/hon
v2(s)=perf=tam/voice

Figure 7.9 The predicate in Kharia according to Peterson (2011a, p. 335)

The crucial part is the sequence neg recp caus-root. In contrast to Gtaʔ, Juang, Gorum and Sora, the reciprocal and the negator are not bound morphemes. The subject enclitic cannot head the syntagma in the first position and thus occurs in second position with the negative as its host, resulting in the sequence [neg=subj] recp caus-root. The crucial point for reconstructing the proto-Munda predicate is that the sequence in Kharia contains three phonological words, [neg=pers]ω [recp]ω [caus-lexeme]ω, while basically exhibiting the same morpheme sequence (neg+recp+caus) as Juang, Gtaʔ, Gorum, and Sora.

ω                    ω        ω
σ σ                  σ        σ σ (σ)
neg=pers/num/hon     recp     caus-lexeme

Figure 7.10 Prosodic structure of the verb and preverbal positions in Kharia

The prosodic structure in Figure 7.10 is the best evidence that the negator *əˀt and the reciprocal *kƏl were not bound in proto-Munda. Evidence for a pre-verbal subject position is present in every branch of Munda. The patterns favour an original position of the subject preceding the negative marker. In languages such as Juang, Gtaʔ, Gorum (or proto-Sora-Gorum), the subject pronoun developed into a subject prefix, while in Kharia and the Kherwarian languages it developed into an enclitic that has to follow a host, resulting in the general pattern [np=subj] [verb] in Kherwarian and the negation pattern [neg=subj] [recp] [caus-root …] in Kharia. This allows us to reconstruct the sequence of free words subj neg recp preceding the verb in proto-Munda. As discussed earlier, there is limited but intriguing evidence from Gorum and Gutob for an aspect or mood slot following the subject position but preceding the negator. Although the remaining evidence is tightly bound to the negative morpheme in Gutob and Gorum, there is no evidence that the aspect/mood morpheme was bound at the proto stage. We can thus tentatively expand the preverbal position to subj (asp/mood) neg recp.



The last pre-stem marker that can be reconstructed for proto-Munda is the causative *Oˀp. The causative immediately precedes the stem. While all current reflexes of *Oˀp are bound morphemes, the lack of any fossilized reflexes of *Oˀp in North Munda and evidence from outside of Munda discussed above raise the possibility that the *Oˀp causative was not bound in proto-Munda. Given its position relative to the stem and the fact that all its reflexes are bound morphemes, the probability that *Oˀp was already bound in proto-Munda is higher than with any other prefix. Fossilized remnants of at least three causative morphemes (**bə-, **tA-, and **A-) can be found in different Munda languages. These derivational morphemes were almost certainly bound, resulting in the following pre-stem structure:

subj (asp/mood) neg recp caus deriv-root

Figure 7.11 Proto-Munda pre-stem structure
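
The shared prefix order can also be checked mechanically. The following Python sketch is only an illustration, not part of the reconstruction: it takes the prefix glosses of examples (9) to (15) and tests whether each sequence respects the order subj neg recp caus. The names `TEMPLATE`, `SUBJECT_GLOSSES`, `slot_of`, and `follows_template` are hypothetical, and treating the glosses 1, 2, 1pl, 1sa, and pl as fillers of the subject slot is an assumption made for this sketch. The infixed causative of the Sora stem in (15) sits inside the stem and is therefore left out of the prefix sequence.

```python
# Pre-stem slot order as reconstructed in the text (asp/mood omitted,
# since none of the examples shows a separate marker in that slot).
TEMPLATE = ["subj", "neg", "recp", "caus"]

# Assumption of this sketch: these glosses all fill the subject slot.
SUBJECT_GLOSSES = {"1", "2", "1pl", "1sa", "pl"}

# Prefix glosses preceding the root in examples (9) to (15).
EXAMPLES = {
    "(9) Juang": ["neg", "recp"],
    "(10) Juang": ["2", "caus"],
    "(11) Juang": ["1pl", "recp"],
    "(12) Gtaʔ": ["neg", "recp"],
    "(13) Gtaʔ": ["1", "neg", "caus"],
    "(14) Gorum": ["1sa", "neg", "caus"],
    "(15) Sora": ["pl", "neg", "recp"],  # infixed {caus} is stem-internal
}


def slot_of(gloss):
    """Map a gloss to the template slot it fills."""
    return "subj" if gloss in SUBJECT_GLOSSES else gloss


def follows_template(prefix_glosses, template=TEMPLATE):
    """True if the glosses occur in strictly increasing template order."""
    positions = [template.index(slot_of(g)) for g in prefix_glosses]
    return all(a < b for a, b in zip(positions, positions[1:]))


RESULTS = {label: follows_template(glosses) for label, glosses in EXAMPLES.items()}

if __name__ == "__main__":
    for label, ok in RESULTS.items():
        print(label, "follows the template" if ok else "violates the template")
```

Under these assumptions every attested sequence passes the check, while a hypothetical order such as recp before neg would fail it.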

Post-stem Positions

In contrast to modern Munda languages, the post-stem domain in proto-Munda was minimal. There is substantial evidence for aspect morphemes directly following the stem. The evidence discussed in the first part of the paper mostly suggests that the aspect morphemes may have been already bound at the proto-Munda level. The aspect morphemes were followed by voice morphemes. The cohesion between the aspect morphemes and the voice morphemes is particularly strong. In several modern Munda languages, aspect and voice morphemes are consistently expressed as portmanteau morphemes. Even in the languages in which separate voice morphology can be identified, voice morphemes are phonologically minimal. The only morpheme that can be reliably reconstructed is middle voice *-n. The form of the middle voice marker makes it unlikely that it was a freestanding word even in proto-Munda. From this evidence the post-verbal morphology can be reconstructed as [verb]=asp:voice. The situation of the object markers is similar to that of the subject markers. The markers in the different languages are cognate, but the fact that language- and lexeme-specific changes in the pronouns are reflected in the markers is evidence that the object pronouns were bound later in separate events. This means that the whole post-verbal domain in proto-Munda can be reconstructed as [verb]=asp:voice obj.




Consequences of the Reconstruction

The structure in Figure 7.12 shows the complete reconstructed proto-Munda clausal core. The clause had a basic SVO structure with modal/aspectual, negation, reciprocal particles and a causative particle or auxiliary positioned between subject and verb. The verb was immediately followed by a combined aspect and voice marker.

[subj] (asp/mood) neg recp caus [verb]=asp:voice [obj]

Figure 7.12 The proto-Munda clause
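
As a reading aid, the linear notation of the clause template can be generated from a small slot list. The Python sketch below is an illustration under stated assumptions: the class `Slot`, its fields, and the function `render` are hypothetical names, and the reading of the notation (`=` for a bound element, parentheses for a tentatively reconstructed slot, square brackets for the subject, verb, and object positions) is inferred from the way the template is written in this chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    label: str
    bound: bool = False      # attaches to the preceding element with "="
    tentative: bool = False  # written in parentheses
    bracketed: bool = False  # written in square brackets


# Slot list following the prose description of the reconstructed clause.
PROTO_MUNDA_CLAUSE = [
    Slot("subj", bracketed=True),
    Slot("asp/mood", tentative=True),
    Slot("neg"),
    Slot("recp"),
    Slot("caus"),
    Slot("verb", bracketed=True),
    Slot("asp:voice", bound=True),
    Slot("obj", bracketed=True),
]


def render(slots):
    """Render a slot list in the linear template notation."""
    out = ""
    for slot in slots:
        text = slot.label
        if slot.bracketed:
            text = f"[{text}]"
        if slot.tentative:
            text = f"({text})"
        if slot.bound:
            out += "=" + text          # no space before a bound element
        else:
            out += (" " if out else "") + text
    return out


TEMPLATE = render(PROTO_MUNDA_CLAUSE)
FREE_WORDS = [s.label for s in PROTO_MUNDA_CLAUSE if not s.bound]

if __name__ == "__main__":
    print(TEMPLATE)
```

Keeping the bound or free status as a field on each slot makes it easy to try out alternative hypotheses, for example a bound causative, by changing a single flag and rendering the template again.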

While modern Munda languages differ considerably in syntax and morphology from other Austroasiatic languages, the clausal template in proto-Munda is very similar to clausal structures found in many Austroasiatic languages. Figure 7.13 compares the proto-Munda template to the clausal patterns of Palaung (Mak 2012), Pnar (Ring 2013), and Chrau (Thomas 1971).

pMunda (asp/mood) Palaung asp
Pnar Chrau
neg recp caus neg intent capability
mood neg asp mood/asp/neg aux intent.v
[deriv- root]
v+inc [caus- main.v]
=asp:voice directional adverbs reflexive pro adverbial

Figure 7.13 The proto-Munda clause compared to other Austroasiatic languages

Based on the comparative evidence and the reconstructed morphemes, we can try to reconstruct the prosodic structure of the predicate in proto-Munda. The stem consisting of a single root would be a monosyllabic word. Reduplicated roots, roots with derivational prefixes (in particular the proposed three causative morphemes **bə-, **tA-, and **A-), and bisyllabic roots form bisyllabic words. This mono- or bisyllabic verb is preceded by monosyllabic auxiliaries or particles. The word status of reciprocal *kƏl and negative *əˀt can be reconstructed with high confidence, based on the evidence from modern Kharia. The independent word status of causative *Oˀp is decidedly less certain, but it is assumed to be an independent particle or auxiliary in the reconstruction proposed here. The reconstructed modal-aspectual morphemes preceding the negation marker are not attested well enough to reconstruct their status in proto-Munda. Their status as independent words is a conjecture based on their position in the predicate complex and the fact that they also occur at the end of the negative copula in Gorum. Beyond the derivational prefixes, the reconstructed voice morphemes are reconstructed as bound morphemes with the highest certainty. Middle voice *-n and the tentative active voice *-ˀt are not syllabic and thus cannot form an independent word on their own. As discussed in the first part of this paper, the voice morphemes already formed a close unit with the aspectual morphemes in proto-Munda. Whether the unit of aspect and voice was an independent word or bound to the stem is a particularly interesting question with far-reaching consequences for any model of the development of morphology in the Munda languages and will be discussed below in more detail. The resulting structure in Figure 7.14 was framed by the preceding subject position and the final object position.

Figure 7.14 Prosodic structure of the proto-Munda predicate

The reconstruction of the proto-Munda clause presented here, and in particular the prosodic status of the different components, raises important questions for the development of verbal morphology in the Munda languages. Austroasiatic languages have a well-known preference for prefixes (Donegan & Stampe 2002). Although such a preference for prefixes is typologically unusual (Cysouw 2009, Himmelmann 2014), it is possible that this preference persisted for some time after the Munda branch formed. Modern Munda languages have a strong preference for suffixation. At some point in the development of the modern Munda languages, the preference must have changed from the general Austroasiatic prefixing to modern Munda suffixing. Donegan & Stampe (1983, 2002, 2004) and Donegan (1993) developed a proposal that postulates a holistic shift from a head-first, analytic language with rising rhythm to a head-final, synthetic language with falling rhythm during the development of the Munda branch. A scenario based on these premises would assume that the prefixes in Munda languages are older than the suffixes. Furthermore, prefixes closer to the stem should be the oldest bound morphemes, while the more peripheral prefixes are younger, but older than the suffixes close to the stem, leaving the peripheral suffixes as the youngest morphology.

While at first glance this scenario seems to broadly match the history of Munda verb morphology, the evidence presented in this paper suggests a more complex development. Even if we allow for this holistic switch to occur in the different branches of Munda at different times, the scenario is too simple to explain the attested patterns. The proposed archaic causative prefixes **bə-, **tA-, and **A- have cognate prefixes in other branches of Austroasiatic and thus seem to go back to a stage before the Munda branch separated from Austroasiatic. By contrast, all other prefixes attested in the Munda languages are restricted to subgroups of Munda and were probably morphologized at stages later than proto-Munda. The best candidate for a bound morpheme that defines proto-Munda and sets it apart from other branches of Austroasiatic is the combination of the perfective *lə or the imperfective *tə with the middle voice *-n or with the more putative active voice *-ˀt. The comparative evidence suggests that the sequences of aspect and voice morphemes already formed a close unit (asp:voice) in proto-Munda. Furthermore, this unit seems to have been already bound to the verb stem at the proto-Munda level.

After the development of the aspect-voice markers in proto-Munda, the North Munda languages continued to develop suffixes. This branch never developed any prefixes, and all morphological innovations in Kherwarian and Korku are suffixes or enclitics. The situation is different in the southern languages. The southern branches morphologized the proto-Munda causative *Oˀp as a prefix. If this constitutes a single event, it would be an argument for South Munda as a proper subgroup of the Munda languages.
The causative ob- is the last prefix that Kharia acquired. All other bound morphemes in this language are suffixes or enclitics. The other branches (Juang, Gtaʔ, Sora-Gorum, and Gutob-Remo) morphologized the reciprocal *kƏl and the negative *əˀt as prefixes, either in a single event or in separate developments.7 All subsequent innovations in Gutob-Remo are suffixes or enclitics. In a final development, Juang, Gtaʔ, and Sora-Gorum acquired subject marking prefixes. The differences in the paradigms and in the form of the prefixes suggest that these developed independently in each of the three branches. All later developments in Juang, Gtaʔ, and Sora-Gorum are suffixes or enclitics. In all modern Munda languages, every recent morphologization is either a suffix or an enclitic. However, while most of the suffixes and enclitics are comparatively recent and language or branch specific, the aspect-voice markers seem to be old and could predate all Munda prefixes, except the derivational prefixes inherited from Austroasiatic.

7 The lack of reflexes for *kƏl in proto-Gutob-Remo could be explained by a later replacement by another construction. However, it could be evidence that the reciprocal and the negative were not only bound in separate events, but also each independently in each of the four sub-branches. In this case, Gutob-Remo never morphologized the reciprocal, but only the negative prefix *ar-.


The Typological Shift

Donegan & Stampe (1983, 2002, 2004) and Donegan (1993) postulate a holistic shift from a head-first, analytic language with rising rhythm to a head-final, synthetic language with falling rhythm. This holistic typological shift entails a shift from the development of prefixes to the development of suffixes or enclitics. While it is indisputable that modern Munda languages have a preference for suffixes and enclitics, the notion of a holistic shift has to accommodate the diversity of patterns in modern Munda languages and the different developments in the individual branches. In particular, the fact that the bound aspect-voice morphemes can be reconstructed for proto-Munda, while the prefixes are innovations of subgroups of Munda languages, has to be reconciled with the notion of a holistic shift and the patterns in modern Munda. Furthermore, recent works, such as Peterson (2011b) and Ring and Anderson (2018), have cast doubt on the completeness of the shift towards a falling rhythm. While the emerging evidence points to a more complex development, the relation between morphological and prosodic patterns remains a central, but poorly understood, component of the development of modern Munda languages. The holistic shift model proposed by Donegan and Stampe seems inadequate to explain the evidence from the different branches of the Munda group. To develop a better model for the processes that gave rise to the diverse morphological structure attested in the individual Munda languages, more research on the prosody of individual languages and comparisons between prosodic structures of languages from different branches along the lines of Ring and Anderson (2018) are needed.





Proto-Munda can be reconstructed as an Austroasiatic language with an SVO word order, few bound morphemes and a range of syntactic slots for particles. Comparative evidence allows a reconstruction of the morphology and the core clausal syntax of proto-Munda. The proposed reconstruction of the clausal core (Figure 7.15) is strikingly similar to syntactic structures in modern languages of other branches of Austroasiatic.

[subj] mod/asp neg recp caus [verb](=)asp:voice [obj]

Figure 7.15 The proto-Munda clause
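The linear order in Figure 7.15 can be restated as a small sketch. The slot labels are those of the figure; the function name, the placeholder fillers, the choice to treat only the verb as obligatory, and the use of "=" to attach the asp:voice unit to the verb are illustrative assumptions, not part of the reconstruction.

```python
# Slot order copied from Figure 7.15; labels are the figure's own.
CLAUSE_SLOTS = ["subj", "mod/asp", "neg", "recp", "caus", "verb", "asp:voice", "obj"]


def linearize(fillers):
    """Return the given fillers in template order.

    Assumptions of this sketch (not claims of the chapter): only the verb
    slot is required, and the asp:voice unit is written attached to the
    verb with '=' (the figure marks this boundary as '(=)').
    """
    unknown = set(fillers) - set(CLAUSE_SLOTS)
    if unknown:
        raise KeyError(f"not a slot in the template: {sorted(unknown)}")
    if "verb" not in fillers:
        raise ValueError("this sketch requires a filler for the verb slot")

    words = []
    for slot in CLAUSE_SLOTS:
        if slot not in fillers:
            continue
        if slot == "asp:voice":
            # The verb is always the most recently added word at this point.
            words[-1] = words[-1] + "=" + fillers[slot]
        else:
            words.append(fillers[slot])
    return " ".join(words)


# Placeholder fillers only; no reconstructed forms are implied.
print(linearize({"obj": "OBJ", "verb": "VERB", "neg": "NEG",
                 "asp:voice": "PERF:MID", "subj": "SUBJ"}))
```

The fillers are passed in arbitrary order and come out in the order of the template, which is all the sketch is meant to show.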

The syntactic positions are connected with morphemes that can be reconstructed for proto-Munda. Several of the reconstructed morphemes in Figure 7.16 can be linked to morphemes in other branches of Austroasiatic.

Slots: mod/asp neg recp caus deriv- rdl: root -inc =asp
mod/asp: *A, *O, *Vj, *mO
neg: *əˀt, *Um
recp: *kƏl
caus: *Oˀp
deriv-: **bə-, **tA-, **A-
=asp: *=lə perf :*n mid; *=tə imperf :*ˀt act

Figure 7.16 The syntactic positions and reconstructed morphemes

Besides the reconstruction of proto-Munda, this paper has presented a proposal for the development of the modern languages from their common ancestor that accounts for the diversity of morphology in the different Munda languages. The details of this development call into question existing theories about the development of the Munda languages and the underlying mechanisms of the changes. The bound status of the post-verbal aspect-voice morphemes is especially intriguing, as it seems to contradict widespread assumptions about the sequence of grammaticalization events. Future research should produce a detailed model that accounts for the attested patterns of prefix and suffix morphologization as well as the changes to the internal organization of the different Munda languages.



References

Alves, Mark J. 2006. A grammar of Pacoh: a Mon-Khmer language of the central highlands of Vietnam. Canberra: Pacific Linguistics, Research School of Pacific and Asian Studies, Australian National University.
Anderson, Gregory D.S. 2004. “Advances in proto-Munda reconstruction.” Mon-Khmer Studies 34: 159–184.
Anderson, Gregory D.S. 2007. The Munda Verb: Typological Perspectives. (Trends in Linguistics: Studies and Monographs; 174) Berlin: Mouton de Gruyter.
Anderson, Gregory D.S. (ed.) 2008a. The Munda Languages. (Routledge Language Family Series) London: Routledge.
Anderson, Gregory D.S. 2008b. “Gtaʔ.” In: Gregory D.S. Anderson (ed.) The Munda Languages. (Routledge Language Family Series) London: Routledge. 682–763.
Anderson, Gregory D.S. & K. David Harrison 2008a. “Remo.” In: Gregory D.S. Anderson (ed.) The Munda Languages. (Routledge Language Family Series) London: Routledge. 557–632.
Anderson, Gregory D.S. & K. David Harrison 2008b. “Sora.” In: Gregory D.S. Anderson (ed.) The Munda Languages. (Routledge Language Family Series) London: Routledge. 299–380.
Anderson, Gregory D.S., Toshiki Osada & K. David Harrison 2008. “Ho and the other Kherwarian languages.” In: Gregory D.S. Anderson (ed.) The Munda Languages. (Routledge Language Family Series) London: Routledge. 195–255.
Anderson, Gregory D.S. & Norman H. Zide 1999. Recent advances in the reconstruction of the proto-Munda (Austroasiatic) verb. Chicago (Presented at ICHL XIV, Vancouver), ms.
Anderson, Gregory D.S. & Norman H. Zide 2001a. “The proto-Munda verb system and some connections with Mon-Khmer.” In: Peri Bhaskararao & Karumuri Venkata Subbarao (eds.) Tokyo Symposium on South Asian Language Contact, Convergence and Typology. (Yearbook of South Asian Languages and Linguistics; 4) New Delhi: Sage. 517–540.
Anderson, Gregory D.S. & Norman H. Zide 2001b. “Issues in proto-Munda and proto-Austroasiatic nominal derivation: The bimoraic constraint.” In: Marlys A. Macken (ed.) Papers from the 10th annual meeting of the Southeast Asian Linguistic Society. Tempe: Arizona State University. 55–74.
Banker, Elizabeth M. 1964. “Bahnar Affixation.” Mon-Khmer Studies 1: 99–117.
Bhattacharya, Sudhibhushan 1968. A Bonda Dictionary. Poona: Deccan College.
Biligiri, Hemminge S. 1965. “The Sora verb: a restricted study.” In: G.B. Milner & Eugénie J.A. Henderson (eds.) Indo-Pacific Studies, Part II: Descriptive Linguistics. (Lingua 15) Amsterdam: North-Holland. 231–250.
Costello, Nancy A. 1966. “Affixes in Katu.” Mon-Khmer Studies 2: 63–86.



Cysouw, Michael 2009. “The asymmetry of affixation.” Snippets (Special issue in honor of Manfred Krifka, ed. by Sigrid Beck and Hans-Martin Gärtner) 20: 10–14. Online:
Cysouw, Michael ms. A history of Munda person marking. URL: manuscripts_files/cysouwHISTMUNDA.pdf
Deeney, John 1978. Ho-English Dictionary. Chaibasa: Xavier Ho.
Donegan, Patricia J. 1993. “Rhythm and Vocalic Drift in Munda and Mon-Khmer.” Linguistics of the Tibeto-Burman Area 16(1): 1–43.
Donegan, Patricia J. & David Stampe 1983. “Rhythm and the holistic organization of language structure.” In: John F. Richardson et al. (eds.) Papers from the Parasession on the Interplay of Phonology, Morphology and Syntax. Chicago: CLS. 337–353.
Donegan, Patricia J. & David Stampe 2002. “South-East Asian Features in the Munda Languages: Evidence for the Analytic-to-Synthetic Drift of Munda.” In: Patrick Chew (ed.) Proceedings of the 28th Annual Meeting of the Berkeley Linguistics Society, Special Session on Tibeto-Burman and Southeast Asian Linguistics, in honor of Prof. James A. Matisoff. Berkeley: Berkeley Linguistics Society. 111–129.
Donegan, Patricia J. & David Stampe 2004. “Rhythm and the synthetic drift of Munda.” In: Rajendra Singh (ed.) The Yearbook of South Asian Languages and Linguistics 2004. 3–36.
Gradin, Dwight 1976. “Word affixation in Jeh.” Mon-Khmer Studies 5: 25–42.
Griffiths, Arlo 2008. “Gutob.” In: Gregory D.S. Anderson (ed.) The Munda Languages. (Routledge Language Family Series) London: Routledge. 633–681.
Himmelmann, Nikolaus P. 2014. “Asymmetries in the prosodic phrasing of function words: Another look at the suffixing preference.” Language 90(4): 927–960.
Janzen, Hermann 1976. “The System of Verb-Aspect Words in Pale.” In: Philip N. Jenner, Laurence C. Thompson & Stanley Starosta (eds.) Austroasiatic Studies Part I. (Oceanic Linguistics Special Publications; 13) 659–667.
Mak, Pandora 2012. Golden Palaung: A grammatical description. Canberra: College of Asia and the Pacific, the Australian National University.
Nagaraja, Keralapura S. 1999. Korku language: grammar, texts, and vocabulary. Tokyo: Institute for the Study of Languages and Cultures of Asia and Africa, Tokyo University of Foreign Studies.
Nagaraja, Keralapura S. 2014. “Standard Khasi.” In: Paul Sidwell & Mathias Jenny (eds.) The Handbook of Austroasiatic Languages. Leiden: Brill. 1145–1185.
Neukom, Lukas 2001. Santali. Munich: Lincom.
Osada, Toshiki 2008. “Mundari.” In: Gregory D.S. Anderson (ed.) The Munda Languages. (Routledge Language Family Series) London: Routledge. 99–164.
Patnaik, Manideepa 2008. “Juang.” In: Gregory D.S. Anderson (ed.) The Munda Languages. (Routledge Language Family Series) London: Routledge. 508–556.
Peterson, John 2011a. A Grammar of Kharia: a South Munda Language. Leiden: Brill.



Peterson, John 2011b. “‘Words’ in Kharia—Phonological, morpho-syntactic and ‘orthographical’ aspects.” In: Geoffrey Haig, et al. (eds.) Documenting Endangered Languages: Achievements and Perspectives. (Trends in Linguistics. Studies and Monographs; 240) Berlin: De Gruyter. 89–119.
Pinnow, Heinz-Jürgen 1959. Versuch einer historischen Lautlehre der Kharia-Sprache. Wiesbaden: Harrassowitz.
Pinnow, Heinz-Jürgen 1966. “A Comparative Study of the Verb in the Munda Languages.” In: Norman H. Zide (ed.) Studies in Comparative Austroasiatic Linguistics. The Hague: Mouton. 96–193.
Rau, Felix 2011. “Grammatical voice in Gorum.” In: Rajendra Singh & Ghansyam Sharma Annual Review of South Asian Languages and Linguistics. Berlin: De Gruyter Mouton. 125–158.
Ring, Hiram 2013. The Pnar Verbal Complex. Paper presented at International Conference on Austroasiatic Linguistics 5, Canberra.
Ring, Hiram & Gregory Anderson 2018. “On the prosody of Khasian languages in relation to Munda.” In: Hiram Ring & Felix Rau Papers from the Seventh International Conference on Austroasiatic Linguistics. Honolulu: University of Hawai’i. 1–35.
Shorto, Harry L. 1963. “The Structural Patterns of Northern Mon-Khmer Languages.” In: Harry L. Shorto (ed.) Linguistic comparison of Southeast Asia and the Pacific. London: School of Oriental and African Studies. 45–61.
Sidwell, Paul 2000. Proto South Bahnaric: A reconstruction of a Mon-Khmer language of Indo-China. (Pacific Linguistics; 501) Canberra: Research School of Pacific and Asian Studies, Australian National University.
Sidwell, Paul 2008. “Issues in the Morphological Reconstruction of proto-Mon-Khmer.” In: Claire Bowern, Bethwyn Evans & Luisa Miceli Issues in the Morphology and Language History: In honour of Harold Koch. Amsterdam: Benjamins. 251–265.
Sidwell, Paul 2011. Proto Bahnaric. ms. (Quoted after SEAlang Mon-Khmer Etymological Dictionary).
Sidwell, Paul & Felix Rau 2014. “Austroasiatic Comparative-Historical Reconstruction: An Overview.” In: Paul Sidwell & Mathias Jenny (eds.) The Handbook of Austroasiatic Languages. Leiden: Brill. 221–363.
Smith, Kenneth D. 1969. “Sedang Affixation.” Mon-Khmer Studies 3: 108–129.
Thomas, David 1971. Chrau Grammar. Honolulu: University of Hawaii Press.
Watson, Saundra K. 1966. “Verbal affixation in Pacoh.” Mon-Khmer Studies 2: 15–30.
Zide, Arlene R.K. 1976. “Nominal Combining Forms in Sora and Gorum.” In: Philip N. Jenner, Laurence C. Thompson & Stanley Starosta (eds.) Austroasiatic Studies Part II. (Oceanic Linguistics Special Publications; 13) 1259–1294.
Zide, Norman H. 2008. “Korku.” In: Gregory D.S. Anderson (ed.) The Munda Languages. (Routledge Language Family Series) London: Routledge. 256–298.



Zide, Norman H. & Gregory D.S. Anderson 2001. “The proto-Munda verb system and some connections with Mon-Khmer.” In: Peri Bhaskararao and Karumuri V. Subbarao (eds.) The Yearbook of South Asian Languages and Linguistics 2001. New Delhi: Sage. 517–540.

Chapter 8

Proto-Kherwarian Negation, TAM and Person-Indexing Interdependencies

Bikram Jora and Gregory D.S. Anderson


Introduction and Overview*

The present study represents the first attempt to unravel the synchronic complexities and diachronic origins of the systems of negation seen in the Kherwarian languages as a whole and within the broader North Munda and comparative Munda contexts. Like many other Munda languages, and Austroasiatic more broadly, Kherwarian languages have two formally distinct systems of negation: (i) a general negative form and (ii) a prohibitive marker. Based on our extensive comparative Kherwarian data set, some interesting features that project back to the Proto-Kherwarian or Proto-North Munda stages have come to light, in particular, complex interdependencies between negation, TAM marking and person indexing. In 2005, the authors began systematically surveying the Kherwarian languages. The goal is to create a massive comparative lexical and morphosyntactic database and to reconstruct the Proto-Kherwarian language. From each variety (to date, Keraʔ Mundari, Tamaɽia Mundari, Birhoɽ, Santali, Ho, and Bhumij), we collect a word list of roughly 5,500 entries and a phrase and sentence list with roughly 2,200 samples. In addition, in each language, we have collected between six and forty ethnographic texts, riddles, songs, aphorisms, personal

* Thanks to Opino Gomango for assistance in the Munda Languages Initiative, and also to the National Endowment for the Humanities for grant PD50025-13 “Documentation of Hill Gtaʔ, an endangered Munda language of India”; to the National Science Foundation for award 1500092 “Documentation of Gutob, an endangered Munda language of India”, award 0853877 “Documentation of Remo (Bonda)”, and award 1844532 “Sora typological characteristics: Towards a Reevaluation of South Asian Human History”; to the Genographic Legacy Fund for the grant for the “Ho Talking Dictionary”; and to Ironbound Films for in part making possible work on Sora, Remo, Juang, Santali, and Ho during the filming of The Linguists. Other work on the following Munda languages was made possible under occasional funding to Living Tongues’ Munda Languages Initiative: Bhumij, Birhoɽ, Gtaʔ, Remo, Gutob, Gorum, Juray, Sora, Korku, Santali, Kharia, Juang, Keraʔ Mundari and Tamaɽia Mundari, particularly a grant from the Zegar Family Foundation for work on Birhoɽ.

© koninklijke brill nv, leiden, 2020 | doi:10.1163/9789004425606_010

proto-kherwarian negation, tam and person-indexing


and traditional narratives on a range of topics to have a thematically coherent cross-language text collection and set of naturally occurring sentences structured within different types of narrative frames and contexts. The present study represents an initial attempt at the reconstruction of the system of negation in the Munda languages. Kherwarian and North Munda are among the only definable subgroups in the family consisting of more than a single language, the other two being Sora-Gorum and Gutob-Remo (Anderson 2016). Thus, it is important to get definitive reconstructions of all such branches in order to be able to compare these systems with those arrived at by internal reconstruction of the single language branches to have a clear picture of the Proto-Munda system of negation. Once we have achieved this understanding, we will begin to have a better understanding of how the Munda system developed out of its proto-Austroasiatic origins. The present study represents the first step on this journey.


Kherwarian Verb Structure and Negative Formations

Proto-Kherwarian had a complex verbal system like its daughter languages. Two inflectional series can be reconstructed for Proto-Kherwarian, roughly a perfective series (1) and an imperfective series (2), each with its own inflectional template, and each likely a morphological word complex consisting of more than a single phonological word [pω].

(1) Proto-North Munda maximal verb template [perfective series]
⟨[(neg)/X=sbj]pω⟩ [Verb.Stem]pω-[appl-tam-voice/valence=/-obj =ind]pω⟨=sbj⟩

(2) Proto-North Munda maximal verb template [imperfective series]
[(neg)/X=sbj]pω [Verb.Stem(-obj)]pω-[tam/voice/valence-ind]pω

The Proto-North Munda verb stem in turn was quite simple, a stem plus an optional reciprocal infix. The etymological causative prefix was preserved only in a lexically restricted manner, the functional category being renewed by various innovated auxiliary forms.

(3) Proto-North Munda Verb Stem → [⟨caus⟩-]Root[⟨/recp/⟩]


jora and anderson

The Proto-North Munda pattern endures in many Kherwarian languages today. Negative verb forms consist of a sequence of short phonological words, and indeed positive verb forms themselves are also morphological complexes, not single phonological words. Thus, the clitic chain forms phonological words combining sequences of two nuclei in a weak+strong pattern at the level of phonological word, but in long morphological words, the primary word stress can follow the pattern of strong+weak at the phrasal level. In other words, at the phonological/prosodic word level, verb complexes are right-headed/iambic but can be left-headed/trochaic or right-headed/iambic at the level of phrasal or morphological word prosody; see Anderson (this volume), Ring and Anderson (2017) for details.

(4) Santali
ʤʰuɽi iŋ [alo=m]pω [em-a]pω-[iɲ-a]pω
basket 1sg proh=2sbj give-ben-1obj-ind
‘Don’t give me the basket!’

(5) Keraʔ Mundari
sukri [ka=i]pω [gɔɔɉ]pω-[ka-n-a]pω
pig neg=3 kill=aor-intr-ind
‘The pig was not killed.’

(6) Tamaɽia Mundari
kula sukri=ke [ka=i]pω [goiˀ]pω-[k-i-a]pω
tiger pig-obj neg=3 kill=pfv-3-ind
‘The tiger did not kill the pig.’

Subject clitics in negative forms in most Kherwarian languages are hosted by the negative particle and appear before the remainder of the verbal complex. Thus, negative verb forms in Kherwarian languages typically consist of three phonological words.

(7) Santali
iŋ hola haaʈ [ba=iŋ]pω [tʃaláó]pω-[le-n-a]pω
1sg yesterday market neg=1sbj go:ipfv-ant-intr-ind
‘I did not go to (the) market yesterday.’

Note that pre-verbal subject clitics in Kherwarian attach enclitically to any word that occupies immediately preverbal position, including case-marked or unmarked pronouns or nouns as in (8) and (9a), and even overt subject pronouns themselves as in (9b).



(8) Ho
aiŋ hoˀ=ke=ŋ goiʔ-k-i-a
1sg man=obj=1sbj kill-prf-3obj-ind
‘I killed the man.’

(9) a. Santali
am iɲ=em ɖaɽ-oʧo-ki-d-iɲ-a
2sg 1sg=2sbj run-caus-tr.pfv-tr-1obj-ind
‘You made me run.’

b. Santali
onko=ko lǝi-e-d-a
they=3pl.sbj say-aor-tr-ind
‘They said (to him).’ (Ghosh 2008ms: 64)

Although this is their preferred locus, subject clitics do not obligatorily occur dislocated from the verb on the word immediately preceding it, as in (4)–(9) above. They can occur at the end of the verbal complex (10), even in prohibitive forms in Keraʔ Mundari (11), a pattern which other Kherwarian languages virtually never show.1

(10) Keraʔ Mundari
era-ku inini=se ɉagarɔ-r-a=ku
woman-pl each.other=purp/dat speak:ipfv-prog/prs-ind=3pl.sbj
‘The women are speaking to each other.’

(11) Keraʔ Mundari
aiŋ=ke alɔ kaɉ-iŋ=me
1sg=obj proh tell-1obj=2sbj
‘Don’t tell me!’

In prohibitive forms in Tamaɽia Mundari they occur on both hosts at the same time, with a double marking of subjects appearing on the prohibitive particle preceding the verb and also at the end of the verbal complex in (12).

1 Prohibitive forms in Kherwarian languages typically find the subject agreement clitics attaching to the prohibitive particle and appearing pre-verbally together with a finite-marked form of the verb as in (4), see also 3.3 below.



(12) Tamaɽia Mundari
aiŋ=ke kanʧi alo=m om-a-iŋ=me
1sg=obj basket proh=2sbj give-ben-1obj=2sbj
‘Do not give me the basket!’

These subject clitics appear to be extrametrical (they are never stressed in complexes larger than two syllables), except when prosodic minimal word constraints necessitate including them within a phonological word, e.g., with monosyllabic intransitive verb stems in singular imperative forms, where they are eligible for stress. So in Birhoɽ, the fully vocalized form of the subject clitic is necessary to satisfy the prosodic word constraint in the singular imperative (13), but not in the singular prohibitive (14): as the prohibitive particle has two syllables, the reduced form of the subject clitic is found instead.

(13) Birhoɽ
nir=me
run=2sbj
‘Run!’

(14) Birhoɽ
alo=m nir-a
proh=2sbj run-ind
‘Don’t run!’

Note that a formal opposition between a prohibitive and other negative forms is a common pattern attested across Austroasiatic as a whole (Jenny et al. 2015). All of the common default negators found in Kherwarian {ba(n/ŋ), ka, me(ne)} have parallels in non-Munda Austroasiatic languages. However, what the functional specification of these varied formal markers might have been in the original systems of either Munda or Austroasiatic more generally remains a topic of ongoing research.2
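The contrast between the full and the reduced second person clitic in the Birhoɽ forms (13) and (14) can be sketched as a toy rule. This is only an illustration under stated assumptions: the vowel set, the syllable count by vowel letters, and the two-syllable threshold are simplifications introduced here, and the function names are hypothetical. The sketch covers only the two hosts in (13) and (14) and is not meant to predict every form cited in this chapter.

```python
# Assumed vowel letters for a rough syllable count; not an inventory from the chapter.
VOWELS = set("aeiouɔɛə")


def syllable_count(form):
    """Count vowel letters as a rough stand-in for syllables."""
    return sum(1 for ch in form if ch in VOWELS)


def attach_2sg(host):
    """Attach the 2nd person singular subject clitic to a host.

    Toy rule: a host shorter than two syllables takes the full form 'me',
    so that host plus clitic reach two syllables; a longer host takes the
    reduced form 'm'.
    """
    clitic = "me" if syllable_count(host) < 2 else "m"
    return f"{host}={clitic}"


print(attach_2sg("nir"))  # imperative stem from (13)
print(attach_2sg("alo"))  # prohibitive particle from (14)
```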

2 Note that there is no Kherwarian- or Munda-internal evidence that the negative particles of Kherwarian originated as auxiliary verbs in any Munda-definable period, even if there are possible Austroasiatic parallels that would reflect such an origin. This historical path of development was proposed by Jenny et al. (2015). Their proposal appears to be based primarily on the typological origins of negators cross-linguistically and the placement of the subject clitics on these negative polarity elements, as well as evidence from non-Munda Austroasiatic languages.




Subject/Negative Interdependencies in Kherwarian

3.1 Inanimate Subjects

Turning from the morphophonology of Kherwarian verb forms, let us now examine some historically interesting details in the morphosyntax of negative constructions in the Kherwarian languages. While it was mentioned above that subjects may be doubly marked in negative formations in some Mundari varieties, subject agreement is in fact typically suppressed and thus absent in a range of instances as well. For example, inanimate singular subjects are almost invariably left unexpressed in positive sentences in all Kherwarian languages, as in Bhumij (15), Santali (16) and Ho (17)–(18).

(15) Bhumij
koto rəpud-ʤa-n-a
branch break-prf.itr-itr-ind
‘The branch broke.’

(16) Santali
ɖɛr rapud-e-n-a
branch break-prf.itr-itr-ind
‘The branch broke.’

(17) Ho
kɔtɔ rəpuɖ-ɔ-tən-a
branch break-itr-ipfv-ind
‘The branch broke.’

(18) Ho
kɔtɔ rəpuɖ-jə-n-a
branch break-prf.itr-itr-ind
‘The branch broke.’

However, Kherwarian languages are split in the treatment of encoding these inanimate singular subjects in negative formations. Languages like Bhumij and Ho parallel the positive forms (19)–(21) with subject marking absent.

(19) Bhumij
koto ka rəpud-ʤa-n-a
branch neg break-prf.itr-itr-ind
‘The branch did not break.’



(20) Ho
kɔtɔ ka rəpuɖ-ɔ-a
branch neg break-itr-ind
‘The branch does not break.’

(21) Ho
kɔtɔ ka rəpuɖ-jə-n-a
branch neg break-prf.itr-itr-ind
‘The branch did not break.’

In Santali, on the other hand, inanimate subject encoding is typically not suppressed in the negative, as in (22), in contrast with the corresponding positive forms as in (16).

(22) Santali
ɖɛr ba=i rapud-kan-a
branch neg=3sbj break-ipfv-ind
‘The branch isn’t breaking.’

3.2 Animate, Non-human, Singular Subjects

Animate non-human singular subjects also show distinct behavior in Kherwarian languages. Similar to Santali with inanimate singular subjects, Keraʔ Mundari (23, 24) and Tamaɽia Mundari (25, 26) show a pattern where subject clitics are present in negative formations but suppressed in positive conjugations.

(23) Keraʔ Mundari
sukri gɔʤ-e-n-a=e
pig kill-pfv-itr-ind=3anim.sbj
‘The pig was killed.’

(24) Keraʔ Mundari
sukri ka=i gɔʤ-ka-n-a
pig neg=3sbj kill-prf.neg-itr-ind
‘The pig was not killed.’

(25) Tamaɽia Mundari
kula sukri=ke goiˀ-k-i-a
tiger pig=obj
‘The tiger killed the pig.’



(26) Tamaɽia Mundari
kula sukri-ke ka=i goiˀ-k-i-a
tiger pig-obj neg=3sbj
‘The tiger did not kill the pig.’

Ho shows an identical pattern, with non-human animate subject marking often suppressed in positive forms as in (27), but overtly present on the negative marker ka in negative conjugations as in (28).

(27) Ho
kula sukri=ke goiʔ-ki-j-a
tiger pig=obj
‘The tiger killed the pig.’

(28) Ho
kula sukri=ke ka=i goiʔ-ki-j-a
tiger pig=obj neg=3.sbj
‘The tiger did not kill the pig.’

3.3 Imperative vs. Prohibitive

Imperative and prohibitive formations show different formal properties in Kherwarian languages. On the Santali (to Birhoɽ) end of the continuum,3 one finds full forms of the subject clitic with monosyllabic stems obligatorily in the imperative of intransitives as in (29) to (31).

(29) Birhoɽ
nir=me
run=2sbj
‘Run!’

(30) Birhoɽ
gitiʧ=me
sleep=2sbj
‘Go to sleep!’

3 Note that while several authors have claimed a Mundari orientation of Birhor in the Kherwarian language-dialect continuum, grammatically and phonologically it is clear that it belongs at least as much together with Santali (Jora and Anderson 2017, Anderson and Jora forthcoming).



(31) Santali
ɖaɽ=me
run=2sbj
‘Run!’

In the prohibitive on the other hand, the subject clitic attaches to the prohibitive particle but the verb is marked by the ‘indicative’/final clitic =a, unlike the corresponding imperative forms, which lack the clitic.

(32) Birhoɽ
alo=m nir-a
proh=2sbj run-ind
‘Don’t run!’

(33) Birhoɽ
alo=m gitiʧ-a
proh=2sbj sleep-ind
‘Don’t sleep!’

(34) Santali
alo=m ɖaɽ=a
proh=2sbj run-ind
‘Don’t run!’

Some forms in Tamaɽia Mundari and Ho show a similar pattern:

(35) Tamaɽia Mundari
nir=me
run=2sbj
‘Run!’

(36) Tamaɽia Mundari
alo=pe nir-a
proh=2pl run-ind
‘Do not run!’

(37) Ho
kaʤi=m
tell=2sbj
‘Tell (me)!’



(38) Ho
alo=m kaʤij-a
proh=2sbj tell-ind
‘Do not tell (me)!’

However, as mentioned above, in singular prohibitive forms in Tamaɽia Mundari one typically finds both double marking of subjects, as in (39), and no final clitic =a, as in (40).

(39) Tamaɽia Mundari
alo=m kaʤij-eŋ=me
proh=2sbj tell-1obj=2sbj
‘Do not tell me.’

(40) Tamaɽia Mundari
aiŋ=ke kanʧi alo=m oma-iŋ=me
I-obj basket proh=2sbj give:appl-1obj=2sbj
‘Do not give me the basket!’

Keraʔ Mundari also typically lacks the final clitic =a in prohibitive formations. Unlike Tamaɽia Mundari, however, subject marking in Keraʔ Mundari tends to occur only once. This appears on the lexical verb and not on the prohibitive particle as in items (41) and (42), in contrast with other Kherwarian varieties.

(41) Keraʔ Mundari
aiŋ=ke alɔ kaʤi-ŋ=me
I=obj proh tell-1obj=2sbj
‘Don’t tell me!’

(42) Keraʔ Mundari
alɔ nir=em
proh run=ind:2sbj
‘Don’t run!’

Bhumij, on the other hand, prefers just a single post-verbal element in imperatives, whether it encodes subject (in 43) or object (in 44).



(43) Bhumij
nir=em
run=2sbj
‘Run!’

(44) Bhumij
kaʤi-ŋe
tell-1obj
‘Tell me!’

In prohibitives, these tendencies converge in Bhumij, and one finds formations similar to those of Birhoɽ or Santali as in (45), but also to that of Tamaɽia Mundari with double subject marking as in (46). However, if there is an overt object, this appears instead of the second, pleonastic or redundant subject clitic (47).

(45) Bhumij
alo=m sen-a
proh=2sbj go-ind
‘Don’t go!’

(46) Bhumij
alo=m nir-em
proh=2sbj run-ind:2sbj
‘Don’t run!’

(47) Bhumij
alo=m kaʤi-ŋe
proh=2sbj tell-1obj
‘Don’t tell me!’

Ho shows yet a different tendency, but one that has various parallels to the previously discussed data. So intransitive imperatives behave in the expected fashion with an overt subject clitic in its full/vocalized form with monosyllabic stems:

(48) Ho
nir=me
run=2sbj
‘Run!’



Transitive imperatives encode both object and subject, in that order. The inflectional elements tend to stack on the verb, and not appear on the word immediately preceding the verb as is typical in declarative and prohibitive formations in Ho.

(49) Ho
ʈɔla ema-iŋ=me
basket give:appl-1obj=2sbj
‘Give me the basket!’

As might be expected, intransitive prohibitive formations in Ho follow the typical pattern, whereby subject clitics attach to the immediately preverbal prohibitive particle and the verb is marked with the final element -ja.

(50) Ho
alɔ=m nir-ja
proh=2sbj run-ind
‘Do not run!’

Unlike in Bhumij, in transitive prohibitives the object clitic appears together with the final clitic, and the subject clitic attaches in the expected immediately preverbal position.

(51) Ho
ʈɔla alɔ=m ema-iŋ-ja
basket proh=2sbj give:appl-1obj-ind
‘Don’t give me the basket!’

These findings on subject-marking patterns in Kherwarian languages and the putative reconstructed formations in Proto-Kherwarian and Proto-North Munda are summarized in Tables 8.1–2. It is straightforward to reconstruct Ø-subject marking in positive conjugations with inanimate singular subjects, and probably also in negative forms, for Proto-Kherwarian. The presence of agreement in Santali with negative ba (but not, importantly, with baŋ) is likely an innovation based on a parallel with animate (non-human) subjects. With these non-human animate subjects, on the other hand, we can safely reconstruct a pattern in Proto-Kherwarian in which subject marking is found in negative conjugations but not positive ones. Human animate subjects parallel the speech act participants and typically are marked in both positive and negative forms.
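The reconstruction just summarized for Proto-Kherwarian can be restated as a small decision function. This is a minimal sketch of the prose generalization only: the category labels and the function name are hypothetical, the inanimate negative case is reconstructed in the text only as probable, and the sketch says nothing about the individual daughter languages (for instance the Santali pattern with ba).

```python
def subject_marked(subject_class, negative):
    """Is subject agreement overtly marked in Proto-Kherwarian, as summarized above?

    subject_class: one of the hypothetical labels 'inanimate',
    'animate_nonhuman', 'human', 'speech_act_participant'.
    negative: True for a negative conjugation, False for a positive one.
    """
    if subject_class == "inanimate":
        # Zero marking in positive forms, and probably in negative forms too.
        return False
    if subject_class == "animate_nonhuman":
        # Marked in negative conjugations but not in positive ones.
        return negative
    if subject_class in ("human", "speech_act_participant"):
        # Marked in both positive and negative forms.
        return True
    raise ValueError(f"unknown subject class: {subject_class!r}")


for cls in ("inanimate", "animate_nonhuman", "human"):
    print(cls, subject_marked(cls, negative=False), subject_marked(cls, negative=True))
```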



Table 8.1 Subject-marking patterns in Kherwarian negative formations

Language   INAN   INAN.neg   ANIM   ANIM.neg   Preverbal enclitic NP   Preverbal enclitic PROH   Post-verbal enclitic VERB   Post-verbal enclitic IMP
Bhumij








Birhor Santali Keraʔ Mundari Tamaɽia Mundari Ho PKherw Korku PNM


Ø Ø √ba, Øbaŋ Ø Ø Ø

√ √ √

+ +

+ + -


+ + +








Ø *Ø Ø *Ø

Ø *Ø Ø *Ø

Ø *Ø Ø *Ø

√ *√ Ø *√

+ *+ *+

+ *+ *+

*-/+ *-

+ *+ *+

Key:
√   agreement is present
Ø   agreement lacking
INAN/INAN.neg   inanimate positive and negative
ANIM/ANIM.neg   animate positive and negative
human plural positive and negative
1sg/1sg.neg   pronominal subjects (e.g., 1st singular positive/negative)
+   locus of agreement
-   not locus of agreement
VERB   occurs on verb
NEG   occurs on negative particle

Table 8.2 Subject-marking patterns in Kherwarian negative formations

Language   1sg   1sg.neg   Preverbal enclitic NP   Preverbal enclitic PROH   Post-verbal enclitic VERB   Post-verbal enclitic IMP
Bhumij   Birhor   Santali   Keraʔ Mundari   Tamaɽia Mundari   Ho   PKherw   Korku   PNM

√ √ √ √

√ √ √ √

√ √VERB √ √

√ √NEG √ √

+ + +

+ + + -

-/+ -/+ +

+ + + +





√ *√ Ø *√

√ *√ Ø *√

√ *√ Ø *√

√ *√ Ø *√

+ *+ *+

+ *+ *+

*-/+ *-

+ *+ *+

Key: see Table 8.1



With respect to the development of Proto-Kherwarian from Proto-North Munda, based on comparative data with Korku (52), we must reconstruct *ba(N) as the default preverbal negative particle in Proto-North Munda.

(52) Korku
japai-ko ɖusra-ku=ʈen ban manɖi=lakken
woman-pl other-pl=obl neg speak=prog
‘The women are not speaking to each other.’

However, the functional specifications of the other negators that appear to have been present in earlier stages of Munda, as these languages developed from their Proto-Austroasiatic ancestor, remain to be determined by future research.


NEG.COP TAM/SBJ-OBJ Interdependencies in Kherwarian

We now turn to some curious patterns seen in negative copula formations in the Kherwarian languages. First let’s examine for comparison how positive and negative copula formations with animate possessa operate in the present. In Bhumij (53) to (54) and Ho (55) to (56), both positive and negative formations encode such referents as morphological objects in the present.

(53) Bhumij
iɲa(ʔ) bəria kuɽihon-kin mena(ʔ)-kin-a
1sg:gen two daughter-du cop-3du.obj-ind
‘I have two daughters.’

(54) Bhumij
iɲa(ʔ) bəria kuɽihon-kin baŋ-kin-a
1sg:gen two daughter-du neg.cop-3du.obj-ind
‘I don’t have two daughters.’

(55) Ho
aiɲa(ʔ) bəria kuihon-kin menaʔ-kin-a
1sg:gen two daughter-du cop-3du.obj-ind
‘I have two daughters.’



(56) Ho
aiɲa(ʔ) bəria kuihon-kin baŋ-kin-a
1sg:gen two daughter-du neg.cop-3du.obj-ind
‘I don’t have two daughters.’

Similar formations can be found in positive (sentence 58) and negative copula forms (sentences 57, 59 and 60) in the present tense throughout Kherwarian Munda languages, regardless of what the formal shape of the negative particle/copula is (e.g., banu (Santali), baŋ (Tamaɽia Mundari), or ka … li- (Keraʔ Mundari), etc.). So the pattern endures even while the formal markers of the negative copula (or negative copular construction) vary.

(57) Santali
iŋ-rin barija kuɽi gidra banu(ʔ)-kin-a
1sg-gen.anim.possessum two girl child neg.cop-3du.obj-ind
‘I don’t have two daughters.’

(58) Keraʔ Mundari
aiɲa(ʔ) du ʈʰɔ kuɽihɔn hen-kin-a
1sg:gen two clf daughter cop-3dl.obj-ind
‘I have two daughters.’

(59) Tamaɽia Mundari
aĩja(ʔ) barija honkuɽi-kin baŋ-kin-a
1sg:gen two daughter-du neg-3du.obj-ind
‘I do not have two daughters.’

(60) Keraʔ Mundari
aiɲa(ʔ) du ʈʰɔ kuɽihɔn ka li-kin-a
1sg:gen two clf daughter neg neg.cop-3du.obj-ind
‘I don’t have two daughters.’

In past negative copular formations, animate possessa are rather encoded as subjects. Such a pattern is attested across the Kherwarian languages, see (61) to (66), and thus can be safely projected back to Proto-Kherwarian. The subject agreement clitics attach as expected to the preverbal negative particle, whether this is the ka/kə series or the ba(ŋ) series.



(61) Birhor
iŋ-ren bəria majõ kə=kin təhiken-a
1sg-gen.anim.possessum two daughter neg=3du.sbj pst.cop-ind
‘I didn’t have two daughters.’

(62) Bhumij
iɲa(ʔ) bəria kuɽihon ka=kin taiken-a
1sg:gen two daughter neg=3du.sbj pst.cop-ind
‘I did not have two daughters.’

(63) Santali
iŋ-rin barija kuɽi gidra ba=kin taheken-a
1sg-gen.anim.possessum two girl child neg=3du.sbj pst.cop-ind
‘I did not have two daughters.’

(64) Keraʔ Mundari
aiɲa(ʔ) du ʈʰɔ kuɽihɔn ka=kin dɔhɔnken-a
1sg:gen two clf daughter neg=3du.sbj pst.cop-ind
‘I did not have two daughters.’

(65) Tamaɽia Mundari
aĩja(ʔ) barija honkuɽi-kin ka=kin taiken-a
1sg:gen two daughter-du neg=3du.sbj pst.cop-ind
‘I did not have two daughters.’

(66) Ho
aiɲa(ʔ) bəria kuihon-kin ka=kin taiken-a
1sg:gen two daughter-du neg=3du.sbj pst.cop-ind
‘I did not have two daughters.’

The particular interdependencies between subject and object agreement and negation in copular forms in the Kherwarian languages, and the putative reconstructed Proto-Kherwarian systems are presented in Tables 8.3–4. This putative Proto-Kherwarian system of copular formations has reflexes in Korku as well. However, in Korku, unlike Kherwarian, the present forms show the same split as the past ones do, and thus (67) all positive copular forms treat

Table 8.3 obj vs. subj encoding in Kherwarian prs (negative) copula forms

Language          cop                      neg.cop
Bhumij            mena(ʔ)             √    bano/bəno             √
Birhor            mena(ʔ)             √    bənu/o                √
Santali           mena(ʔ), Ø⟨anim⟩    √    banu/bano; baŋ; ba    √
Keraʔ Mundari     hen                 √    ka likna              √
Tamaɽia Mundari   mena                √    baŋ                   √
Ho                mena(ʔ)             √    baŋ                   √
PKherw            *mena(ʔ)            √    *ba(N)[o]             √

Table 8.4 obj vs. subj encoding in Kherwarian pst (negative) copula forms

Language          pst.cop          neg+pst.cop
Bhumij            taiken      √    ka taiken      √
Birhor            təhiken     √    ba təhiken     √
Santali           taheken     √    ba taheken     √
Keraʔ Mundari     dɔhɔken     √    ka le          √
Tamaɽia Mundari   taiken      √    ka taiken      √
Ho                taiken      √    ka taiken      √
PKherw                        √                   √

the animate possessa as objects (and thus can be encoded in the morphological verb-word), but in negative copular forms (68), they are encoded rather as subjects and thus remain unmarked in the verbal complex, as Korku lacks subject marking (Zide 2008).

(67) Korku
iɲ-en bari koɲje-kin ʈa-kin
1sg-gen/dat two daughter-du cop-3.du.obj
‘I have two daughters.’

(68) Korku
iɲ-en bari koɲje-kin bən
1sg-gen/dat two daughter-du neg.cop
‘I do not have two daughters.’



It seems likely, therefore, that Proto-Kherwarian reflects the original Proto-North Munda system, and that this was analogically extended to include present copular forms as well in Korku. That copula forms, which typically derive historically from intransitive predicates with meanings like ‘remain, stay’ or even ‘be’, can take object marking in the first place is of course noteworthy. A detailed picture of how and why this system arose must await further research. For a different view on the history of copula forms in Kherwarian, see Peterson (2017).


TAM/NEG Interdependencies in Kherwarian

In addition to interdependencies between argument encoding and negation (+TAM), there are also interdependencies seen in Kherwarian languages between negation and the formal markers of TAM. Many Kherwarian languages prefer different TAM markers in series-II or perfective series negatives than they use in the corresponding positive conjugations. This variation is complex and extensive, and teasing apart the various historical layers and synchronic semantico-pragmatic factors that determine it remains a subject of ongoing research. We offer one brief set of examples below. In Santali, perfective transitive forms prefer the TAM marker =ke-/=ki- (69). Corresponding negative forms prefer the marker =le-/=li- (70). However, as (71) shows, the same TAM marker as the positive conjugation is permitted in negative forms, and thus the opposition is a tendency or statistical preference, not an absolute.

(69) Santali
am iɲ=em ɖaɽ-oʧo-ki-d-iɲ-a
2sg 1sg=2sbj run-caus-tr.prf-tr-1obj-ind
‘You made me run.’

(70) Santali
am iŋ ba=m ɖaɽ-oʧo-li-d-iɲ-a
2sg 1sg neg=2sbj run-caus-tr.ant-tr-1obj-ind
‘You didn’t make me run.’

(71) Santali
am iŋ ba=m ɖaɽ-oʧo-ki-d-iɲ-a
2sg 1sg neg=2sbj run-caus-tr.prf-tr-1obj-ind
‘You didn’t make me run.’



With intransitives, there is an even stronger preference, but still not an absolute requirement, to use the l-series (‘anterior’) marker in Santali.

(72) Santali
iŋ hola haʈ ba=iŋ ʧalao-le-n-a
1sg yesterday market neg=1sbj go-ant-itr-ind
‘I did not go to market yesterday.’

Note that this contrasts with the default perfective form of intransitive predicates in the positive conjugation in Santali, which is rather -e-/-ja- (73, repeating 16).

(73) Santali
ɖɛr rapud-e-n-a
branch break-prf.itr-itr-ind
‘The branch broke.’

It also contrasts with the usual function of the anterior TAM marker in positive conjugations in Santali (74), which as an anterior implies ‘did X first (then Y)’ or ‘had already Xed’.

(74) Santali
iɲ am-ʈhɛn noa katha ləi-ləgit’=iɲ hɛc’-le-n=a
1sg 2sg-all dem.prox word tell-purp=1sg come-ant-itr/mdl=ind
‘I had come to tell you this (first)’ (Ghosh 2008: 36)

The result is that the contrast in the perfect TAM markers between transitive perfect (-ke-/-ki-) vs. intransitive perfect (-e-/-ja-) that typifies positive conjugations in the Kherwarian perfective series of inflections is neutralized in the negative. In the negative perfect form, a third default TAM marker is used (-le-/-li-), and one that has instead anterior or pluperfect functions when not under the scope of negation. Thus, Kherwarian languages attest TAM/negation interdependencies similar to the type also found in southern Munda languages of Odisha (Anderson, this volume).





Many Munda languages make at least a formal distinction between two types of formations with regard to their system of negative marking, often contrasting prohibitive with other negative conjugations. Similar phenomena have been attested in a wide range of Austroasiatic languages. Such a formal opposition in negative markers can be reconstructed for Proto-North Munda at least. Both appear before the verbal complex and typically serve as the host for subject clitics. In Santali, both markers frequently serve in this function. In Keraʔ Mundari, the default negative particle ka is more likely to serve as host for the subject clitics than the prohibitive one, alo. This may reflect a pattern copy from Dravidian Kurukh which shows a similar pattern for the locus of subject agreement, i.e., post-verbally. Note that Keraʔ Mundari speakers shifted to the Kherwarian Munda language relatively recently from Kurukh. In all the Kherwarian languages but Santali, inanimate subjects are unmarked in both positive and negative structures. Animate non-human singular subjects are typically unmarked in Bhumij, Birhor, and Tamaɽia Mundari in positive forms but marked in negative ones, while in Santali, this is variable, and in Keraʔ Mundari, subject marking is found in both positive and negative formations of this type. Speech-act participants, human singular, dual and plural subjects are typically encoded in both positive and negative formations in all languages, and these patterns can be safely reconstructed to Proto-North Munda. Subsequently, subject proclitics were lost in Korku. In imperative forms throughout Kherwarian, subject marking is found, while in prohibitive forms, subjects can be optionally doubly marked on both the prohibitive marker and the verb in Bhumij and Tamaɽia Mundari. Unlike other Kherwarian languages, in Keraʔ Mundari, prohibitive forms prefer subject marking on the verb rather than the prohibitive element itself.
There seems to be a preference for various valence-specific perfective series tam markers to be used as the default in the positive conjugations, one for transitive stems and one for intransitive ones, and yet a different one in the negative conjugations neutralizing this opposition, as in Santali. Note that similar negative/tam interdependencies are commonly found in Munda languages of southern Odisha (Anderson, this volume). Thus, unravelling these details from a pan-Munda perspective remains a priority for future research. The present study is merely the first step in a full-scale reconstruction and typology of negation and negative structures and constructions found throughout the Munda branch of Austroasiatic.



Abbreviations

anim   Animate
all    Allative
ant    Anterior
aor    Aorist
appl   Applicative
ben    Benefactive
caus   Causative
clf    Classifier
cop    Copula
dat    Dative
dem    Demonstrative
du     Dual
dl     Declarative
fin    Finite
gen    Genitive
hum    Human
imp    Imperative
inan   Inanimate
ind    Indicative
ipfv   Imperfective
itr    Intransitive
mdl    Middle
np     Noun Phrase
neg    Negative
obj    Object[ive]
obl    Oblique
pfv    Perfective
pl     Plural
proh   Prohibitive
prog   Progressive
prox   Proximal
prf    Perfect
prs    Present
pst    Past
purp   Purposive
recp   Reciprocal
sbj    Subject
sg     Singular
tam    Tense/Aspect/Mood
tr     Transitive

References

Anderson, Gregory D.S. 2001. A New Classification of Munda: Evidence from Comparative Verb Morphology. In Indian Linguistics 62: 27–42.
Anderson, Gregory D.S. 2004. Advances in Proto-Munda Reconstruction. In Mon-Khmer Studies 34: 159–184.
Anderson, Gregory D.S. 2007. The Munda Verb. Typological Perspectives. Berlin: Mouton de Gruyter.
Anderson, Gregory D.S. 2016. Do Koraput Munda, Lower Munda or even South Munda really exist? Once more on the still unresolved classification of the Munda languages. In Supriya Pattanayak, Chandrabhanu Pattanayak and Jennifer Bayer (eds.) Multilingualism and Multiculturalism: Perceptions, Practices and Policy, pp. 313–334. Delhi: Orient Blackswan.
Anderson, Gregory D.S. this volume. Proto-Munda prosody, morphotactics and morphosyntax in South Asian and Austroasiatic contexts.
Anderson, Gregory D.S. and Bikram Jora. forthcoming. Introduction to the templatic verb morphology of Birhor (Birhoɽ). To appear in Languages and Linguistics.
Anderson, G.D.S., T. Osada and K.D. Harrison. 2008. Ho and the other Kherwarian languages. In G.D.S. Anderson (ed.) The Munda Languages, 195–255. Abingdon/Oxford: Taylor and Francis.
Ghosh, A. 2008ms. Santali. Preprint version circulated of article that appeared in G.D.S. Anderson (ed.) The Munda Languages, pp. 11–89. Routledge Language Family Series. Abingdon/Oxford: Taylor and Francis.
Jenny, Mathias, Tobias Weber and Rachel Weymuth. 2015. The Austroasiatic Languages: A Typological Overview. In Jenny, Mathias & Paul Sidwell (eds.) The handbook of Austroasiatic languages, 13–143. Leiden/Boston: Brill.
Jenny, Mathias and Paul Sidwell (eds.). 2015. The Handbook of Austroasiatic Languages. 2 volumes. Leiden: Brill.
Jora, Bikram and Gregory D.S. Anderson. 2017. Spatial deixis, serialization and syntactic-semantic dependency mismatches in Birhor. Indian Linguistics 77 (3–4). 69–79.
Peterson, John. 2017. Jharkhand as a linguistic area. Language contact between Indo-Aryan and Munda in eastern-central South Asia. In Raymond Hickey (ed.) Cambridge Handbook of Areal Linguistics, 551–574. Cambridge: CUP.
Sidwell, Paul. 2015. Austroasiatic classification. In Jenny, Mathias & Paul Sidwell (eds.) The handbook of Austroasiatic languages, 144–220. Leiden/Boston: Brill.
Sidwell, Paul and Felix Rau. 2015. Austroasiatic comparative-historical reconstruction: an overview. In Jenny, Mathias & Paul Sidwell (eds.) The handbook of Austroasiatic languages, 221–363. Leiden/Boston: Brill.
Sidwell, Paul, Mathias Jenny and Mark Alves. This volume. Introduction: Austroasiatic syntax in diachronic and areal perspective, pp. 1–18.
Zide, Norman H. 2008. Korku. In G.D.S. Anderson (ed.) The Munda Languages, 256–298. Abingdon/Oxford: Taylor & Francis (Routledge).

Chapter 9

Relative Clauses in Santali: A Matching Analysis Approach

Mayuri Dilip, Rajesh Kumar, Kārumūri V. Subbārāo, G. Maheshwar Rao and Martin Everaert



The chapter presents a unified account of relative clauses in Santali, whose unmarked structure is the prenominal form of Externally Headed Relative Clauses. Our primary focus is to determine whether Santali has converged with neighbouring non-Austroasiatic languages, namely Indo-Aryan and Dravidian languages. With the help of typological analysis, we also look for evidence that Proto-Santali might have been a language with SVO word order. ‘A relative clause is a clause that modifies a phrasal constituent, generally a noun phrase. We call the noun phrase that is so modified the head of the relative clause’ (Riemsdijk 2006: 338; Subbarao 2012: 263). A typological analysis of Austroasiatic, Indo-Aryan and Dravidian languages shows that the prenominal form of the relative clause is an innovation due to the convergence of Santali with the neighbouring non-Austroasiatic languages. We compare Santali with the languages surrounding it and also with Austroasiatic languages outside South Asia. Apart from Santali, the languages we observe are Kharia (Munda, Austroasiatic), Hindi-Urdu, Oriya and Bangla (Indo-Aryan), Khasi (Mon-Khmer, Austroasiatic) and Telugu (Dravidian). Like Santali, Kharia is located within South Asia, surrounded by Indo-Aryan and Dravidian languages. Khasi is located in the north-eastern part of South Asia, geographically away from Indo-Aryan and Dravidian languages. We also compare Santali with Austroasiatic languages that are not surrounded by Indo-Aryan and Dravidian languages, which allows us to observe a structure different from that of present-day Santali. We survey various theoretical frameworks and opt for a suitable analysis in order to

* We are very grateful to the editors and the editorial board of Austroasiatic Syntax in Areal and Diachronic Perspective. We appreciate Sanat Hansda, Visva-Bharati University and Rimil Hembrom, Indian Institute of Technology Madras, for providing data and native judgments in Santali.

© koninklijke brill nv, leiden, 2020 | doi:10.1163/9789004425606_011



analyse relative clauses in Santali. The results obtained from the application of the Noun Phrase Accessibility Hierarchy (NPAH) (Keenan & Comrie 1977) show that relativization ceases when the relativized head is an ablative object. When the relativized head is a comitative object, the relative clause has an alternative relativization strategy. We analyse relative clauses in Santali by adopting free X’, the matching analysis and the D-complement hypothesis. The chapter starts by introducing the Santali language and its linguistic situation within South Asia, followed by the basic structure of relative clauses in Santali. We then present a typological analysis, the application of the NPAH and finally the syntactic configuration of relative clauses in Santali.


Structure of Relative Clauses in Santali

According to Ethnologue, citing the 2011 Census, there are 7,340,000 Santali speakers in South Asia. Typologically, Santali has Subject-Object-Verb (SOV) word order, postpositions, in-situ wh-words and pronominal number markers that index both subjects and objects; it is non-ergative and non-tonal. As mentioned in Ghosh (2008: 15), the speakers of Santali reside in Orissa, Jharkhand and West Bengal. As a result, the speakers also speak Bengali in West Bengal, Hindi-Urdu in Jharkhand and Oriya in Orissa. These speakers move to industrial areas in these states for employment and education. As a result, they learn and communicate in Indo-Aryan languages. Since it is the Santali speakers who moved into communities speaking non-Austroasiatic languages, Santali was influenced by the non-Austroasiatic languages and not vice versa. Previous studies such as Ghosh (2008), Peterson (2008), Abbi (1997) and Osada (1991) discussed the convergence of Santali with Indo-Aryan languages. In the present study, our primary focus is to observe convergence in Santali.

There are three types of relative clauses in the South Asian subcontinent:
(I) Internally-headed relative clauses (IHRCs),
(II) Externally-headed relative clauses (EHRCs),
(III) Relative-correlative clauses.
EHRCs are further classified into two groups (see Subbarao 2012: 264 for details):
(a.) the non-finite type, and
(b.) the sentential type.
Santali has the prenominal form of EHRC as in (1a), similar to Hindi-Urdu as in (1b) and Telugu as in (1c).



(1) a. [[baha-y Ø bεora-sakame kol-akaode] koṛagidrə-do bhəccən-ka-n-a]
Baha null letter send-pst ptcp boy-def friend-pst-intr-fin
‘The boy to whom Baha sent a letter is my friend.’
b. [øᵢ kone mẽ baiṭh-ī huī] pyārī baccīᵢ
corner in sit-pst ptcp cute girl
‘The cute girl sitting in the corner.’ (Subbarao 2012: 279)
c. [rāmuḍu øᵢ cadiv-in-a] pustakamᵢ
Ram read-pst-adjr book
‘The book that Ram read …’ (Subbarao 2012: 284)

Unlike Santali, Sangtam (Tibeto-Burman) has IHRCs, where the overt form of the relativized head occurs in the embedded clause and the null operator in the matrix clause, as in (2).

(2) [S2 nɨ-nɨ nistarɨ khaŋi šeti ṭhraʔ-ba-tsə S2] øᵢ (pro) khɨtaŋ tšɨŋle
you-nom person to letter write-nmlz-def very tall
‘The person you wrote a letter to is very tall.’ (Subbarao 2012: 270)

Relative-correlatives exist in Indo-Aryan languages and, less commonly, in Dravidian languages; a Hindi-Urdu example is given in (3a) below. Santali also has relative-correlatives, as in (3b). In these clauses, the relativized noun occurs in the embedded clause and the pronoun/demonstrative phrase in the matrix clause.

(3) a. jo ãdhī kəl aī thī vəh bəhut nuksan kər gəī
rel storm.f yesterday come.pst.prf.f.s that much damage.m do go.prf.f.s
‘The storm that raged yesterday did a great deal of damage.’ (Kachru 2006: 220)
b. [[je hilok’ uni-ny nyel-led-e-a] un hilok’ do sombar tahekan-a]
which day 3s-1s:subj see-plup:A-3s:obj-fin that day top Monday cop:pst-fin
‘The day I saw him was Monday.’ (Ghosh 2008: 84)



Of the EHRC and relative-correlative types in Santali, we focus on the analysis of the EHRC, comparing its structure in Santali with Indo-Aryan, Dravidian and Austroasiatic languages. The following section presents evidence from a typological analysis that bears on the origin of the prenominal form in Santali.


Convergence of the Relative Clauses with the Neighbouring Indo-Aryan Languages

In this section, the evidence shows that the prenominal form of the relative clause in Santali is an innovation and not a borrowing from the neighbouring Indo-Aryan languages. This innovation is due to the convergence of Santali, which caused a shift of the word order from SVO to SOV. To provide evidence, we compare the structure of Santali with the neighbouring Indo-Aryan languages, with Austroasiatic languages and with the implicational features of SVO/SOV order. The discussion is elaborated below. Below, there are five sets of relative clauses in different languages¹ grouped by geographical location. The grouping allows us to observe languages in different linguistic environments. By linguistic environment, we mean that the languages we compare are either in contact with Santali or belong to the same language family as Santali but are located away from it. Such grouping and comparison provide a sufficient basis for observing convergence. Set (i) has Santali and Kharia (Munda). Set (ii) has Oriya and Bangla² (Indo-Aryan). Set (iii) has Mon-Khmer and Khasi (Austroasiatic languages within South-Asia). Set (iv)³ has Ta’ang, Mlabri, Vietnamese, Pacoh and Semelai (Austroasiatic languages outside South-Asia). Set (v) has Nancowry (Nicobarese, Austroasiatic). We grouped Santali and Kharia together since they belong to one sub-group and both languages are surrounded by Indo-Aryan and Dravidian languages. The grouping of Oriya and Bangla allows us to compare the structure of Indo-Aryan languages with the other sets. Mon-Khmer and Khasi are in one group since they

1 The language families of the languages are based on the information in Ethnologue as of 12th October 2019.
2 I thank Gargi Roy and Manoranjan Sahoo, both from Indian Institute of Technology, for providing Bangla and Oriya data.
3 The data for Kharia, Khasic, Mon, Khmer, Ta’ang, Mlabri, Vietnamese, Pacoh, Semelai and Nancowry are extracted from Jenny (2011), and for Santali from Neukom (2001).



occur within South-Asia; however, they are not surrounded by Indo-Aryan and Dravidian languages. The Austroasiatic languages in set (iv) are located outside South-Asia, where they are not in contact with any South Asian languages. Set (v) is a group that has Nancowry. This language is located in a geographical area within South Asia, away from Indo-Aryan languages; however, it is surrounded by Tamil (Dravidian) speakers. Firstly, notice that the Munda languages in set (i) and the Indo-Aryan languages in set (ii) have the prenominal form of relative clauses. Further, notice that the prenominal form does not occur in any other AA languages, as in sets (iii), (iv) and (v); these languages have post-nominal or circumnominal relative clauses instead. These are the languages which never came in contact with Indo-Aryan languages, especially the AA languages in set (iv) that exist outside the South Asian sub-continent. Hence, we assume that the post-nominal/circumnominal feature is unique to Austroasiatic languages and that it has been retained without influence from the Indo-Aryan languages. Set (v) shows a post-nominal form, a non-prenominal structure similar to set (iv), irrespective of its contact with the neighbouring Dravidian language, Tamil. Therefore, we assume that the prenominal form of the relative clause might not be native to the Austroasiatic language family. Further, the prenominal form in Munda languages is either a borrowing or an innovation, which we discuss below.

Set i.
(4) a. Santali
[uni hɔpɔn-tɛt’ Anuə ləgit’-e idi-y-et’-tahɛ̃kan] khicɽi
that(an) son-3poss A. for-3s take-y-imp:act-cop:pst mixed
daka adɔ uni toyo-ge ɖher-tɛt’-dɔ-e
rice then that(an) jackal-foc much-emph-top-3s
jɔm-ket’-a
eat-pst:act-fin
‘The jackal ate most of the rice mixed (with dal) [that she was taking out to her son Anua].’ (Neukom 2001: 198)
b. Kharia
ho=ki [ho=kaɽ=te yo=ɖuʔ]RC dinu somtoʔ aw=ki
that=pl that=s.hum=obj see=ptcp day Monday cop=m.pst
‘The day they saw him on was Monday.’ (Peterson 2008: 488)



Set ii.
(5) a. Oriya
Khadya khaothiba pillaṭi mo bhai
food eat boy my brother
‘The boy who is eating food is my brother.’
b. Bangla
khabar khao-a chele-ṭa amar bhai
food eat-PFV ptcp-boy my brother
‘The boy who is eating food is my brother.’

Set iii.
(6) a. Khasic
ka-mēyd (ha-ka) [ba u-lam u-bōʔ ya-ka-kɔt]RC ka-laʔ-kdyaʔ.
f.s-table loc-3f.s adjr(rel) m.s-Lam 3m.s-put acc-f.s-book 3f.s-prf-break
‘The table on which Lam put the book is broken.’ (Subbarao and Temsen 2009)
b. Old Mon
pun dān [ma smiṅ pa]RC
merit donation rel king do
‘The acts of merit and charity which the king performed.’ (Jenny 2005)
c. Middle Mon
galān [smiṅ ma həm]RC
word king rel speak
‘The words which the king spoke.’ (Jenny 2005)
d. Spoken Mon
ʔərè [ɗɛh hɒm]RC kɔ̀h
language 3 speak medl
‘The words he said.’ (Jenny 2005)
e. Old Khmer
oy ta ʔji yeṅ [ta jmaḥ teṅ soṁ]RC [ta kvan teṅ pavitra]RC
give lnk ancestor 1pl lnk name teṅ s lnk child teṅ pn
‘[He] gave [it] to a forebear of ours named teṅ Soṁ, daughter of the teṅ Pavitra.’ (Jenner and Sidwell 2010)
f. Modern Khmer
khɲom khɤ̀ːɲ mənùs(s) nùh [dael lòːk-krù: baːn nìyìːəy pìː.msɤl]RC
1s see man that rel teacher get speak yesterday
‘I see the man about whom the teacher spoke yesterday.’ (Jacob 1968)

Set iv.
(7) a. Ta’ang
kwɔ̄n kan.nyɔ̄m [pɛ̄ pʌn mēn h̄ r.dīn]RC ʌ̄n ka.bɛ̄
child child 2PL rel yesterday 3 be.ill
‘The child you saw yesterday is ill.’ (Milne 1921)
b. Mlabri (Khmuic)
kheep [mɤm maʔ ʔoh ʔa noɲ]RC ʔa tac
slipper father give 1s prf finish prf break
‘The slippers father gave me are used (they broke).’ (Rischel 1995)
c. Vietnamese
tôi đã tìm thấy quyển.sách [mà anh nói hôm.nọ]RC
1sg ant seek see book sub 2s speak
‘I found the book you were talking [about] the other day.’ (Thompson 1987)
d. Pacoh
ʔa.cɔː [ʔən poːk ʔa.ɲaːʔ]RC ʔŋ.kɨː
dog rel go quickly poss.1s
‘The dog that goes fast is mine.’ (Alves 2006)
e. Semelai
jkɔs [mə=ki=jəl la=cɔ]RC paloh
porcupine rel=3A=bark A=dog flee
‘The porcupine that the dog barked at fled.’ (Kruspe 2004)



Set v.
f. Nancowry
[ka homkwòm meṅ pōwah ten chüa] shīna leät dähnga
rel give 2s paddle to 1s crel finish break
‘The paddle you gave me is broken.’ (Man 1889)

We draw evidence from Peterson’s (2008: 425) study of Kharia (Munda) and from “On the history of Munda prenominal relative” (n. d.), a study of Korku (Munda), and argue that the prenominal form in Santali is an innovation and not a borrowing. Peterson (2010) provides a step-by-step development of relative clauses and claims that the prenominal form is an innovation in Kharia, as in (8).

(8) fully finite circumnominal attributive clause → fully finite prenominals → partially finite prenominals → non-finite prenominals (masdars)

Among the various forms mentioned in (8), Santali has a circumnominal/IHRC form as in (9a) and a partially finite prenominal form as in (9b). We consider (9b) partially finite since the finite marker -a is absent in the embedded clause. Santali does not show the ordered sequence in (8). However, the presence of circumnominal and prenominal forms shows that the relative clause formation might be an innovation.

(9) a. [jãhã kɔlɔm-tɛ mɔlmɔl-akad-a]RC ona dɔ-rɛ oka-rɛ
any pen-ins-2s write-prf that top which-loc
‘Where is the pen which you have written with?’ (Ghosh 2008: 83)
b. [uni hɔpɔn-tɛt’ anuə ləgit’-e idi-y-et’-tahɛ̃kan]
that(an) son-3poss A. for-3s take-y-imp:act-cop:pst
khicɽi daka adɔ uni toyo-ge ɖher-tɛt’-dɔ-e
mixed.rice then that(an) jackal-foc much-emph-top-3s
jɔm-ket’-a
eat-pst:act-fin
‘The jackal ate most of the rice mixed (with dal) [that she was taking out to her son Anua].’ (Neukom 2001)

Pinnow (1966) and Anderson (2007) among others mention that Proto-Munda was likely to be SVO similar to Khasi. However, the word order of modern



Santali is SOV. Table (10) shows that the implicational features of Santali resemble those of Hindi-Urdu and not those of Khasi. The consistent pattern between Santali and Hindi-Urdu suggests that the prenominal form might have arisen as an innovation, in parallel with the other features associated with SOV word order. However, the embedded verb being finite does not resemble Hindi-Urdu, where the embedded verb is non-finite. If the prenominal form were a borrowing, it would have carried the morphological structure of the Indo-Aryan languages along with it. For example, relative-correlatives and the participial form of relative clauses exist in Kharia (see Peterson 2010: 408 & 417 for examples), and they possess a morphological pattern similar to Indo-Aryan languages. Recall that the relative clauses in the Bangla and Oriya sentences in (5) have an embedded verb suffixed with a morpheme indicating the participial form. In Santali, such participial suffixation is absent in this particular type of relative clause. Further, as mentioned in “On the history of Munda prenominal relative clause” (n. d.), Pinnow (1966: 171–172) claims that before the rise of the finite marker -a, relative clauses in Munda languages had a fully finite predicate with TAM morphology parallel to Korku, and that the finite marker is of recent origin. A similar kind of fully finite TAM morphology exists in both the prenominal and the circumnominal forms of relative clauses in Santali. We do not discuss Dravidian languages here since their structure is similar to that of Hindi-Urdu.

(10)
Santali                                   Hindi-Urdu                                Khasi
GN                                        GN                                        NG
AN                                        AN                                        NA
Postposition                              Postposition                              Preposition
DO-V                                      DO-V                                      V-DO
IO-DO                                     IO-DO                                     DO-IO
Comparative and superlative are absent    Comparative and superlative are absent    Comparative and superlative are present
Adverb-V                                  Adverb-V                                  V-Adverb
RelN                                      RelN                                      NRel
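The comparison in table (10) can be made explicit with a short sketch. The Python below transcribes the eight feature values from the table and counts how many values Santali shares with each of the other two languages. It assumes that the three columns of the table are Santali, Hindi-Urdu and Khasi in that order; the row labels and the function name `shared_features` are hypothetical names introduced here for illustration and are not part of the authors' analysis.

```python
# Feature values transcribed from table (10). The column-to-language mapping
# (Santali, Hindi-Urdu, Khasi) and the row labels below are assumptions.
ROW_LABELS = [
    "genitive-noun order",
    "adjective-noun order",
    "adposition type",
    "object-verb order",
    "object order",
    "comparative/superlative",
    "adverb-verb order",
    "relative clause-noun order",
]

PROFILES = {
    "Santali": ["GN", "AN", "Postposition", "DO-V", "IO-DO", "absent", "Adverb-V", "RelN"],
    "Hindi-Urdu": ["GN", "AN", "Postposition", "DO-V", "IO-DO", "absent", "Adverb-V", "RelN"],
    "Khasi": ["NG", "NA", "Preposition", "V-DO", "DO-IO", "present", "V-Adverb", "NRel"],
}


def shared_features(lang_a, lang_b):
    """Return the row labels on which the two languages have the same value."""
    return [
        label
        for label, a, b in zip(ROW_LABELS, PROFILES[lang_a], PROFILES[lang_b])
        if a == b
    ]


if __name__ == "__main__":
    for other in ("Hindi-Urdu", "Khasi"):
        shared = shared_features("Santali", other)
        print(f"Santali and {other} share {len(shared)} of {len(ROW_LABELS)} features")
```

Under these assumptions the count is eight of eight for Hindi-Urdu and none for Khasi, which is the asymmetry the discussion relies on.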

Summing up the discussion above, we demonstrated the prenominal form of EHRC, whose embedded verb is finite in Santali. A typological analysis shows that the prenominal form is an innovation. The reason for its innovation is



that the EHRC cannot be a borrowing, since the morphological structure of the embedded verb in Santali does not resemble the structure of Hindi-Urdu. If it were a borrowing, the morphological structure of the embedded verb would have been identical to that of Hindi-Urdu, along with the prenominal form. The prenominal occurrence in Santali might have been a result of the SOV word order in Modern Santali. That is, all the implicational features exhibit commonalities between Santali and Hindi-Urdu and not with Khasi. However, other Austroasiatic languages which never came in contact with Indo-Aryan languages possess post-nominal or circumnominal relative clauses, and this is evidence that these two types of relative clauses can be native to Austroasiatic languages. In the following section, we examine the structure of relative clauses based on the Noun Phrase Accessibility Hierarchy.


Relativization in Noun Phrase Accessibility Hierarchy

In this section, we investigate the relativization of various syntactic constituents based on Keenan and Comrie’s (1977) Noun Phrase Accessibility Hierarchy (NPAH). The NPAH states the following: (i) A language must be able to relativize subjects. (ii) A relativization strategy must apply to a continuous segment of grammatical functions/constituents on the NPAH scale. (iii) Strategies that apply at one point of the NPAH scale may in principle cease at any point on the scale. The continuous segment of grammatical/syntactic constituents consists of subject, direct object, indirect object, oblique objects (locative, instrumental, ablative and comitative objects), object of the genitive and object of comparison.4 To examine whether these constraints hold, we attempt to relativize various syntactic constituents in Santali. Below are sentences showing relativization of various syntactic constituents,5 namely the subject, direct object and indirect object, and adjuncts such as locative, instrumental, genitive and ablative objects (in that order), based on the NPAH.

4 The Object of Comparison does not exist in Santali and therefore, the relativization of this object is absent.
5 Santali data is provided by the native speakers of Santali.



(11) [[ø onḍe tingu-akan] koṛa d(o)] iɳ boiha-kan-a-y
     null there standing-prs boy emph my brother-prs-fin-sm
     ‘The boy who is standing there is my brother.’

(12) [[baha-y ø agu-aka-t] potob] aḍi cehrah-a
     Baha-sm null buy-pst-tr book very beautiful-fin
     ‘The book which Baha bought is beautiful.’

(13) [[bahā-y ø bεora-sakamε kol-aka(w)-de] koṛagidrə-do] iɲ-rεn bhəccən-ka-n-a
     Baha-sm null letter send-pst-tr boy-emph my-gen friend-pst-intr-fin
     ‘The boy to whom Baha sent a letter is my friend.’

(14) [iɳ-ɳ ø duba-kan] kursi] aḍi cehraha-a
     I-sm null sit-prs chair.loc very beautiful-fin
     ‘The chair on which I sat is beautiful.’

(15) [[jo get-akan churi] aḍi lafer-a]
     fruit cut-prs knife.ins very sharp-fin
     ‘The knife with which I cut the fruit is very sharp.’

(16) canḍbol get’akan pusi tuva ɲui-a-y
     tail cut-pst-intr cat milk drink.prs-fin-sm
     ‘The cat whose tail is cut drinks milk.’

As (11)–(16) show, relativization of various syntactic positions into an EHRC is possible. However, relativization ceases when the head is an ablative object. To confirm the non-existence of an EHRC with an ablative object, we apply a diagnostic test in which the grammaticality of ablative-object relativization is examined. In the test, we presented native speakers of Santali with a hypothetical EHRC-like construction that has an ablative object as the relativized head, as in (17). The speakers judged the construction ungrammatical.



(17) *ayo bagi-ed-ey-a gidrə-y raga-ey-a
     mother depart-pst-om-fin child-sm cry-om-fin
     *‘The mother departed child is crying.’

In the case of a comitative object as the relativized head, we find a variation in the strategy of relativization, as in (18). The unmarked structure, the EHRC, remains unchanged, but in addition a nominal number suffix (-kin, -kɔ) occurs at the location of the null operator ø of the relativized NP. This is not a usual location for a nominal number suffix. The relative clause is ungrammatical if the nominal number suffix is omitted, as in (18d). The nominal number suffix expresses the sum of the individuals involved in the action. That is, it indicates the individuals quantified in the subject position together with the individuals quantified in the comitative object position. The marker occurs in the preverbal position, which is a position of the subject marker.6 However, the marker is not just a subject marker, because of its unique quantification. To accommodate this quantification, the marker occurs with an additional morpheme -ta, which functions as a licensor of the nominal number suffix in an unusual position. We label it a licensor because it makes a pronominal marker eligible to occur in an unanticipated location, as in (18a)–(18c).7 Relativization of a comitative object would be expected to cease, as it does in several South Asian languages (Subbarao 2012). However, it does not cease in Santali, since the construction is rescued by the occurrence of an additional number marker in the embedded clause.

(18) a. baha_i ø-ta-kin_i+j sen-len kuṛigidrə_j-dɔ misra-ɲ ka-na-y_j
        Baha com-licensor-dual go-pst girl-def sister-my pst-intr-om
        ‘The girl who went to the market with Baha is my sister.’

     b. baha_i ø-ta-kɔ_i+j sen-len kuṛigidrə_j-dɔ misra-ɲ ka-na-kin_j
        Baha com-licensor-pl go-pst girl-def sister-my pst-intr-om
        ‘The two girls who went to the market with Baha are my sisters.’

     c. baha_i ø-ta-kɔ_i+j sen-len kuṛigidrə_j-dɔ misra-ɲ ka-na-kɔ_j
        Baha com-licensor-pl go-pst girl-def sister-my pst-intr-om
        ‘The girls (more than two) who went to the market with Baha are my sisters.’

     d. *baha_i ø sen-len kuṛigidrə_j-dɔ misra-ɲ ka-na-kɔ_j
        Baha null.com go-pst girl-def sister-my pst-intr-om
        ‘The girls (more than two) who went to the market with Baha are my sisters.’

Summing up the section above, relativization of the ablative object is absent. Further, a variant relativization strategy is adopted in the case of the comitative object, where a nominal number suffix occurs following the position of a null pronominal that corresponds to the comitative object. This nominal number suffix indicates the total number of individuals participating in the action, that is, the individuals indicated by the subject together with the individuals indicated by the comitative object. In the following section, we analyse the syntactic configuration of EHRCs in Santali, keeping in view their structural distribution as discussed so far.

6 A nominal number suffix occurs as a subject marker either to the right of the finiteness marker or suffixed to a preverbal constituent. Further, an object marker occurs to the left of the finiteness marker (see Ghosh 2008: 77 for sentences with subject and object marking).
7 The occurrence of -t-/-ta- with a function like that of a licensor of a nominal number suffix is also found in various constructions with an object indicating the possessor, as in (i), and with a subject indicating the possessor, as in (ii) below.
  i. baha iɲ-ag_i dal-t-iɲ_i-a-e
     Baha I-gen strike-licensor-om-[+FIN]-sm
     ‘Baha will strike mine.’
  ii. uni-rεn_i mɔṛεgɔṭāŋ gate_j mεnaʔ-kɔ_j-ta-e(y)_i-a
      five friend has-pl-licensor-SG-FIN
      ‘She has five friends.’
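The continuity condition of the NPAH and the Santali findings summarized in this section can be sketched as a small check. This is a minimal illustration, not part of the analysis itself: the scale follows the list given in the text, the per-position Santali values restate the results reported for (11)–(18), and the names `NPAH_SCALE`, `SANTALI_EHRC`, `is_continuous_segment` and `positions_without_plain_ehrc` are hypothetical.

```python
# Coarse NPAH scale as listed in the text (after Keenan and Comrie 1977).
NPAH_SCALE = [
    "subject",
    "direct object",
    "indirect object",
    "oblique",
    "genitive",
    "object of comparison",
]

# Outcome of EHRC relativization per position, as reported for (11)-(18).
# The three labels are illustrative shorthand, not the chapter's terminology.
SANTALI_EHRC = {
    "subject": "plain",
    "direct object": "plain",
    "indirect object": "plain",
    "locative": "plain",
    "instrumental": "plain",
    "genitive": "plain",
    "ablative": "unavailable",
    "comitative": "with number suffix",
}


def is_continuous_segment(positions):
    """True if the positions cover one unbroken stretch of NPAH_SCALE."""
    indices = sorted({NPAH_SCALE.index(p) for p in positions})
    return bool(indices) and indices == list(range(indices[0], indices[-1] + 1))


def positions_without_plain_ehrc():
    """Positions where the unmarked EHRC strategy is not available as such."""
    return [pos for pos, status in SANTALI_EHRC.items() if status != "plain"]


if __name__ == "__main__":
    print(is_continuous_segment({"subject", "direct object", "indirect object"}))
    print(is_continuous_segment({"subject", "indirect object"}))
    print(positions_without_plain_ehrc())
```

The Santali dictionary deliberately keeps the individual oblique roles apart, since the section shows that they do not behave alike: locative and instrumental heads allow the plain EHRC, the comitative requires the additional number suffix, and the ablative cannot be relativized.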


Syntactic Analysis of Relative Clauses in Santali

In this section, we analyze EHRCs in Santali based on the distribution observed in the previous sections. The investigation is elaborated below.

5.1 Typological Investigation of Theoretical Frameworks

Following Vries (2002), we consider five types of analysis: (i) the old standard theory, (ii) the revised standard theory, (iii) the raising analysis, (iv) the promotion theory and (v) the matching analysis. In the old standard theory (Smits 1988), the relative clause is right-adjoined to N’, and the determiner occurs in [Spec, NP], as in (19).

(19) [NP Det [N’ [N’ N_i] [CP wh_i … t_i …]]]



In the revised standard theory (Fabb 1990), the relative clause occurs as a complement of D. The idea that the modifying clause is a complement of D was first discussed in Smith (1964). The determiner projects its own phrase, DP. Further, the CP is a complement of N. The wh-element is co-indexed with the relativized N. The wh-element moves to [spec, CP], leaving a trace t_i, as in (20).

(20) [DP [D’ D [NP [N’ N_i [CP wh_i … t_i …]]]]]

In the raising analysis (Brame 1968, Schachter 1973, Vergnaud 1974/1985 and Bhatt 2002), the relativized head, together with the wh-element, is base-generated inside the relative clause and moves to [spec, CP]. The N in [spec, CP] then moves out of the phrase and projects as an NP, as in (21). The CP occurs as an adjunct of the projected NP.

(21) [S [comp [NP wh-det N]_i ] [S … t_i …]] → [NP_i [S’ [comp D-rel_i] [S … t_i …]]]

In the matching analysis (Lees 1961, Lees 1963, Chomsky 1965 & Bhatt 2002), a token corresponding to the external head is present in the embedded clause. In contrast to the raising analysis, the internal and the external head do not form a chain. Instead, a copy of the external head book occurs in the internal clause at the level of LF. At the level of PF, the phonetic form of the external head is retained and the internal head is deleted, as in (22).

(22) the [book] [CP [Op/which book]_i John likes t_i] → the [book] [CP [Op/which book]_i John likes t_i]

The antisymmetric promotion theory (Kayne 1994) combines the raising analysis, the D-complement hypothesis and a fixed spec-head-comp order, as in (23). In this analysis, the DP-rel, that is, the relativized head together with the wh-element, is base-generated in the internal IP, moves out of the IP and occupies [spec, CP]. Further, the NP in [D-rel NP] moves and occupies [spec, DP-rel]. The CP is the complement of D, as in (23).
(23) [DP [D’ D [CP [DP-rel NP_k [D-rel t_k]]_i [C’ … t_i …]]]]

Since the antisymmetric promotion theory has a fixed spec-head-comp order, the analysis of an N-final/prenominal relative clause differs from that of an N-initial/post-nominal relative clause. The structure in (23) is for a post-nominal relative clause. The structure of a prenominal relative clause is provided in (24), followed by a description of it.

(24) [DP IP_j [D0 [CP [NP picture] [C [t]_j]]]]

In (24), the IP, which is the modifying clause, is base-generated as a complement of C. The relativized NP that occurs within IP moves to [spec, CP], and the IP then moves to [spec, DP].

The question that arises is: which of the five types of analysis is suitable for relative clauses in Santali? When we tried to implement each analysis, we found that none is completely suitable for relative clauses in Santali. Hence, by adopting the suitable components from each analysis, we compose an operation suitable for the EHRC in Santali. Table 9.1 lists the operations that occur in each analysis; the operations that are suitable for Santali are highlighted in bold. The reasons for selecting these operations are elaborated below.

Table 9.1  Syntactic operations of five types of analysis for relative clauses

(A)                       (B) Free X’/     (C) NP/DP   (D) Adjunction/   (E) Base-generated head/
                          antisymmetry                 complement        raising/matching
old standard theory       free X’                      N-adjunction      base-generated head
revised standard theory   free X’                      D-complement      base-generated head
raising analysis          free X’                      N-complement      base-generated head
promotion theory          antisymmetry                 D-complement      raising
matching analysis         free X’                      D-complement      matching

Column B shows that an analysis assumes either free X’ or antisymmetry. Recall that the antisymmetric approach requires two different analyses depending on word order, as in (23) and (24). In contrast, because of the mirror-image effect in free X’, the same operation can be adopted for different word orders. To keep the structure simple, so that the analysis involves fewer operations, we adopt free X’.

Column C shows that the relative clause can be either a DP or an NP. We claim that the relative clause in Santali is a DP; the evidence comes from applying Bhatt’s (2002) diagnostic test, as in (25).



(25) a. We made headway.
     b. *(The) headway was satisfactory.
     c. The headway that we made was satisfactory.

As (25) shows, a nominal element such as headway occurs in a simple sentence without a determiner in the [spec, NP] position, as in (a) and (b). However, the same nominal element occurs with a determiner in a relative clause, as in (c). The determiner and the nominal element co-occur because the relativized N occurs in the [spec, CP] position and the CP occurs as a complement of D. Similarly, Santali has a structure such as (25b), where the relativized head is obligatorily definite. In (26a), the noun potob ‘book’ is preceded by the numeral mitṭaŋ ‘one’, which functions as an indefinite determiner. The same construction can also occur without a determiner, as in (26b). However, when potob is relativized, as in (26c), it cannot occur with the determiner mitṭaŋ. The obligatory non-occurrence of the indefinite determiner in (26c) shows that the relativized head is definite. The position preceding the relativized head thus does not allow a determiner with an indefinite feature.

The question that arises is: which element is responsible for the obligatory non-occurrence of the indefinite marking? The only two options are (i) a position internal to the relativized NP or (ii) a position external to the relativized NP. If the restriction were internal to the NP, the indefinite marking would also be excluded in simple sentences. However, this is not the case, since the determiner mitṭaŋ is optional in simple sentences, as in (26a) and (26b). If we adopt the second option, that is, a restriction on the determiner imposed from outside the relativized head, both the optionality of the determiner in simple sentences and its obligatory non-occurrence in a relative clause are accounted for. In other words, the modifier influences the relativized head by restricting the head’s definiteness features. Hence, when an NP is relativized, the NP is dominated by a DP whose head has a definite feature that disallows a determiner with an indefinite feature. Since Santali does not have a determiner equivalent to English the, the determiner indicating the definite feature is a zero morpheme. In other words, a relative clause with a zero morpheme is grammatical, and one with the indefinite determiner mitṭaŋ is not. Therefore, based on the analysis above, we state that the relativized head is a DP and not an NP.

(26) a. ṭebl cetan re mitṭaŋ potob doho-aka-n-a
        table on one.indef book keep-pst-intr-fin
        ‘A book is kept on the table.’



     b. ṭebl cetan re potob doho-aka-n-a
        table on book keep-pst-intr-fin
        ‘The book is kept on the table.’

     c. ṭebl cetan re (*mitṭaŋ) potob inyak-ka-n-a
        table on one.indef book mine-pst-intr-fin
        ‘{The/*a} book which is kept on the table is mine.’

Column D shows that the modifying clause is either an adjunct of N, a complement of N, or a complement of D. If the modifying clause were an adjunct, it would be optional, like any adjunct. However, the modifying clause cannot be optionally dropped. Hence, based on the obligatory occurrence of the embedded clause, we consider the embedded clause a complement of D.

Column E shows three types of operation: base-generated head, raising and matching. We adopt the matching analysis, since the base-generated head and the raising analysis face the following problems. In the case of a base-generated head, the head originates externally. As a result, the null counterpart remains unanalysed. In the case of raising, there is no form at the base-generated position. However, we require an element at the base-generated position in order to interact with the operations at that position. The attachment of the pronominal number suffix in the relativization of the comitative object is an operation that needs to be handled at the base-generated position of the relativized head. This is further explained below with a relative clause in Santali.

(27) [[baha nyel-akate] koṛigidrə-do aḍi cehra-a-y]
     Baha see-ptcp girl-def very beautiful-fin-sm
     ‘The girl whom Baha saw is very beautiful.’

In (27), the relativized head bears two thematic roles, one corresponding to the NP in the matrix clause, that is, the agent, and the other corresponding to the NP in the embedded clause, that is, the patient. The theta criterion states that thematic roles must be assigned uniquely; hence, one thematic role is assigned to only one NP.
However, if the analysis involves raising, there is only one token of the relativized head, moving from the internal to the external clause, and it would have to hold two thematic roles, which violates the theta criterion. To solve this problem, the relative clause requires a matching analysis, which proposes that there are two tokens of a single NP. Under the matching analysis, there are two tokens of the relativized head at the level of LF, one in the matrix clause and another in the embedded clause. The external token is assigned case, and as a result its phonetic form is retained. However, the embedded verb is not eligible to assign case to the internal counterpart of the relativized head, and as a result the phonetic form of the internal token is not realized. Hence, the internal token of the relativized head is deleted at the level of PF.

The result of the above investigation is as follows. The relative clause in Santali is a DP in which the modifying clause is an IP that occurs as a complement of C. Further, the CP occurs as a complement of the head D. The head D has a definite feature as its property. When the relativized head occurs in the scope of D, an indefinite determiner of the relativized head is therefore deleted. We adopt the matching analysis, in which the relativized head has two tokens, one in [spec, CP] and the other in the internal IP. The two tokens of the relativized head occur at the level of LF. The token that occurs inside the IP is not eligible to be assigned case. As a result, the internal token is deleted at the level of PF. Further, the matrix verb assigns case to the external token of the relativized head, and as a result the phonetic form of the external token is retained at the level of PF.

(28) LF: [DP D[+definite] [CP NP.external_i [C’ C [IP … NP.internal_i …]]]]
     PF: [DP D[+definite] [CP NP.external_i [C’ C [IP … NP.internal_i …]]]]

Summing up the discussion in this section, we considered five types of analysis of relative clauses: the old standard theory, the revised standard theory, the raising analysis, the promotion theory and the matching analysis. Since no single analysis was completely suitable for relative clauses in Santali, we selected certain operations from each analysis and combined them to form an analysis that is suitable for investigating relative clauses in Santali.
As a result, the analysis comprises the free X’ approach, the matching analysis and the treatment of the relative clause as a D-complement. In addition to these operations, we proposed that the head D has a definite feature, which prevents the relativized head from having an indefinite determiner in [spec, NP]. This restriction on the determiner arises because the relativized NP is in the scope of the head D, which has a definite feature.
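The mechanics proposed in this section can be illustrated with a toy model: two tokens of the head at LF, spell-out of only the case-marked token at PF, and a definite D that rejects an indefinite determiner. This is a minimal sketch under simplifying assumptions; the class `Token`, the functions `lf_tokens`, `spell_out` and `head_is_licensed`, and the treatment of case as a Boolean flag are all introduced here for illustration and make no claim about the formal implementation of the analysis.

```python
from dataclasses import dataclass
from typing import List, Optional

# Determiners carrying an indefinite feature; only mitṭaŋ 'one' is
# mentioned in the text, so the set contains just that item.
INDEFINITE_DETERMINERS = {"mitṭaŋ"}


@dataclass
class Token:
    form: str
    position: str        # "external" ([spec, CP]) or "internal" (inside IP)
    case_assigned: bool   # simplification: case is modelled as a flag


def lf_tokens(head: str) -> List[Token]:
    """LF: the relativized head has an external and an internal token."""
    return [
        Token(head, "external", case_assigned=True),   # case from matrix verb
        Token(head, "internal", case_assigned=False),  # no case from embedded verb
    ]


def spell_out(tokens: List[Token]) -> List[str]:
    """PF: only case-marked tokens keep their phonetic form."""
    return [t.form for t in tokens if t.case_assigned]


def head_is_licensed(determiner: Optional[str]) -> bool:
    """A [+definite] D rules out an indefinite determiner on the head."""
    return determiner not in INDEFINITE_DETERMINERS


if __name__ == "__main__":
    tokens = lf_tokens("potob")
    print(len(tokens), spell_out(tokens))
    print(head_is_licensed(None), head_is_licensed("mitṭaŋ"))
```

In this sketch the zero determiner is represented by `None`, mirroring the statement that the definite determiner in Santali is a zero morpheme.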

Conclusion


In this chapter, we discussed the syntactic configuration of the prenominal form of relative clauses in Santali. The unmarked structure of the relative clause is the partially finite Externally Headed Relative Clause, whose relativized head is preceded by the modifying clause. A typological analysis shows that the prenominal form in Santali corresponds to that of Kharia, which is likewise surrounded by both Indo-Aryan and Dravidian languages. Other Austroasiatic languages, which are either located away from the Indo-Aryan languages or outside South Asia, possess non-prenominal forms of the relative clause. We assume that the prenominal position of the EHRC is an innovation due to a word order shift from SVO to SOV. The relative clauses have features of both SOV and SVO. The evidence that the prenominal form is an innovation comes from the fact that the other word order features consistently correspond to the SOV order. Hence, the prenominal form is likely to be one of the word order features that shifted along with the SOV word order. Further evidence comes from the fact that the embedded verb has TAM morphology similar to that of the matrix verb, which is not a feature of the neighbouring Indo-Aryan languages. In other words, if the prenominal form had been borrowed from the Indo-Aryan languages, the morphological structure would have been borrowed along with the prenominal structure. The variation at the morphological level shows that Santali is not entirely an SOV language. Further, the TAM morphology matches the morphology of the circumnominal relative clause in Santali. The application of the Noun Phrase Accessibility Hierarchy showed that the EHRC is possible for all syntactic constituents except the ablative object, and we discovered an alternative relativization strategy for the comitative object. In the case of the ablative object, relativization ceases and no alternative relativization strategy was found.
In the case of a comitative object as the head, in addition to the structure of the EHRC, a nominal number suffix occurs at the location of the null operator. This nominal number suffix indicates the total number of individuals quantified in the comitative object and the subject. The subject marker indicating the relativized head is suffixed to the right of the matrix verb. Having described the basic structure of relative clauses in Santali, we analysed the syntactic configuration of relativization. For this purpose, we considered five types of theoretical framework: the old standard theory, the revised standard theory, the raising analysis, the promotion theory and the matching analysis. We found that no framework was completely suitable for analysing the relative clauses in Santali. We therefore chose the suitable operations from the above-mentioned frameworks and combined them to form an analysis suitable for relative clauses in Santali. The resulting analysis has free X’, the D-complement hypothesis and the matching analysis. Hence, we demonstrated that the modifying clause, which is an IP, occurs within a CP. Since we adopted the matching analysis, the relativized head has two tokens, one in the [spec, CP] position, which is the position of the external head, and another within IP, which is the internal head. The two tokens of the relativized head occur at the level of LF. The internal token is deleted at the level of PF, since it is not case-marked by the embedded verb. In contrast, the external token retains its phonetic form at the level of PF, since it is case-marked by the matrix verb. Further, the CP occurs as a complement of the head D, which has a definite feature. When the CP occurs as a complement of D, the relativized head occurs in the scope of D.

Appendix

(1) Santali: GN
    lɔkhɔn-(ic’) girdrə
    ‘Lakhan’s son’ (Ghosh 2010: 63)

    Hindi-Urdu: GN
    aji:t ka: baṛa: beṭa:
    Ajit gen-MS elder.son
    ‘Ajit’s elder son’ (Koul 2008: 165)

    Khasi: NG
    ka iyeŋ joŋ u jon
    PM house gen PM John
    ‘John’s house.’ (‘The house of John.’) (Sharma 1999: 146)

(2) Santali: AN
    sendra koṛa
    ‘hunting boy’ (Ghosh 2010: 119)

    Hindi-Urdu: AN
    choṭi: ila:yci:
    ‘small cardamom’ (Koul 2008: 74)

    Khasi: NA
    ka miej ba yoŋ
    PM table rel black
    ‘Black table’ (Sharma 1999: 142)

(3) Santali: Postposition
    dan-(tɛ) tiyok’-mɛ
    stick-with pull-imp
    ‘Pull down by the stick.’ (Ghosh 2010: 60)

    Hindi-Urdu: Postposition
    mez ko sa:f karo
    table (dat) clean do-imp
    ‘Clean the table.’ (Koul 2008: 41)

    Khasi: Preposition
    ŋa šoŋ ha kane ka iyeŋ
    I stay this in the house
    ‘I stay in this house.’ (Nagaraja 1985: 52)

(4) Santali: DO-V
    baha arel gidra ematkinay
    Baha Arel child gave
    ‘Baha gave the child to Arel.’

    Hindi-Urdu: DO-V
    mẽne aji:t ko kita:b di:
    I-erg Ajit-dat book-FS gave-FS
    ‘I gave Ajit a book.’ (Koul 2008: 215)

    Khasi: V-DO
    u la a:yya ka kot ha ka
    he pst give obj to her
    ‘He gave her a book.’ (Nagaraja 1993: 2)

(5) Santali: IO-DO
    baha arel gidra ematkinay
    Baha Arel child gave
    ‘Baha gave the child to Arel.’

    Hindi-Urdu: IO-DO
    mẽne aji:t ko kita:b di:
    I-erg Ajit dat book-FS gave-FS
    ‘I gave Ajit a book.’ (Koul 2008: 215)

    Khasi: DO-IO
    u la a:y ya ka kot ha ka
    he pst give obj f. book to her
    ‘He gave her a book.’ (Nagaraja 1993: 2)

(6) Santali: Comparative and superlative
    Absent

    Hindi-Urdu: Comparative and superlative
    Absent

    Khasi: Comparative and superlative
    barit ‘small’; ba kham rit ‘smaller’; ba kham rit-tam ‘smallest’ (Roberts 1891: 29)

(7) Santali: Adverb-V
    usdra kamime
    quickly do
    ‘do quickly’ (Ghosh 2010: 135)

    Hindi-Urdu: Adverb-V
    vah hameša: acchi: mehnat karta: hẽ
    he always good hard work do is
    ‘He always works very hard.’ (Koul 2008: 216)

    Khasi: V-Adverb
    u la leyt suki
    he pst go slowly (ADV)
    ‘He went slowly.’ (Sharma 1999: 141)

(8) Santali: RelN
    [uni hɔpɔn-tɛt’ anuə ləgit’-e idi-y-et’-tahɛ̃kan khicɽi daka
    that (an) son-3poss A. for-3s take-y-imp:act-cop:pst mixed rice
    ‘… the rice mixed (with dal) [that she was taking out to her son Anua].’ (Neukom 2001)
    The main clause is deleted for the sake of brevity.

    Hindi-Urdu: RelN
    [pūjā ke kal ke kharῑde hue S2] kapṛe
    Pooja gen yesterday gen bought (PFV ptcp) clothes
    ‘The clothes that Pooja bought yesterday …’ (Subbarao 2012: 268)
    The main clause is deleted for the sake of brevity.

    Khasi: NRel
    u briew uba wan
    PM human.being PM.rel come
    ‘The man who comes …’ (Sharma 1999)



Abbreviations

def     definite
fin     finite
tr      transitive
intr    intransitive
3       third person
act     active
acc     accusative
adjr    adjectivalizer
com     comitative
cop     copula
crel    correlative
dat     dative
emph    emphatic
erg     ergative
f       feminine
gen     genitive
imp     imperative
ins     instrumental
lnk     link
loc     locative
m       masculine
medl    medial
null    null operator
om      object marker
prf     perfect
pl      plural
poss    possession
PP      postposition
ptcp    participle
prs     present
prog    progressive
pst     past
rel     relative
s       singular
sm      subject marker
sub     subject
TB      Tibeto-Burman
top     topic

References

Abbi, Anvita. 1997. Languages in Contact in Jharkhand. In Abbi, Anvita (ed.), Languages of tribal and indigenous peoples of India: The Ethnic Space, 131–148. Delhi: Motilal Banarsidass.
Alves, Mark. 2006. A grammar of Pacoh: a Mon-Khmer language of the central highlands of Vietnam. Canberra: Pacific Linguistics.
Anderson, Gregory. 2007. The Munda verb: typological perspectives. Berlin: Walter de Gruyter.
Bhatt, Rajesh. 2002. The raising analysis of relative clauses: Evidence from adjectival modification. Natural Language Semantics, 10(1), 43–90.
Brame, Michael. 1968. A new analysis of the relative clause: evidence for an interpretive theory. Unpublished manuscript. Massachusetts: Massachusetts Institute of Technology.
Chomsky, Noam. 1965. Aspects of the Theory of Syntax (Vol. 11). Massachusetts: Massachusetts Institute of Technology Press.
Fabb, Nigel. 1990. The difference between English restrictive and nonrestrictive relative clauses. Journal of Linguistics, 26(01), 57–77.
Ghosh, Arun. 2008. Santali. In Gregory Anderson (ed.), The Munda languages, 11–88. London/New York: Routledge.
Ghosh, Arun. 2010. Santali: A Look into Santali Morphology. Delhi: Gyan Publishing House.
Jacob, Judith. 1968. Introduction to Cambodian. London: Oxford University Press.
Jenner, Philip, and Sidwell, Paul. 2010. Old Khmer Grammar. Canberra: Pacific Linguistics.
Jenny, Mathias. 2005. The verb system of Mon. Zürich: Universität Zürich.
Jenny, Mathias. 2011. “In search of Austroasiatic I: Relative clauses”. Presentation at the Annual Meeting of the Southeast Asian Linguistics Society XXI, Kasetsart University, Bangkok, May 11–13 2011. Downloadable at http:// Austroasiatic_1.pdf
Kachru, Yamuna. 2006. Hindi (Vol. 12). Netherlands: John Benjamins Publishing.
Kayne, Richard. 1994. The antisymmetry of syntax. Massachusetts: Massachusetts Institute of Technology Press.
Keenan, Edward and Comrie, Bernard. 1977. Noun phrase accessibility and universal grammar. Linguistic Inquiry, 8(1), 63–99.
Koul, Omkar. 2008. Modern Hindi Grammar. USA: Dunwoody Press.
Kruspe, Nicole. 2004. A grammar of Semelai. Cambridge University Press.
Lees, Robert. 1961. The constituent structure of noun phrases. American Speech, 36(3), 159–168. Durham: Duke University Press.
Lees, Robert. 1963. The grammar of English nominalizations (Vol. 26). Bloomington: Indiana University.
Man, Edward Horace. 1889. A dictionary of the Central Nicobarese language. New Delhi: Mittal Publications reprint 1975.
Milne, Mary Lewis Harper. 1921. An elementary Palaung grammar. Oxford: The Clarendon Press.
Nagaraja, Keralapura Shreenivasaiah. 1985. Khasi, a descriptive analysis. Pune, India: Deccan College Post-Graduate and Research Institute.
Nagaraja, Keralapura Shreenivasaiah. 1993. Khasi dialects, a typological consideration. Mon-Khmer Studies Journal 23, 1–10.
Neukom, Lukas. 2001. Santali. Munich: Lincom Europa.
Osada, Toshiki. 1991. Linguistic convergence in the Chotanagpur area. In Mullick, Bosu (ed.), Cultural chotanagpur: Unity in diversity, 99–119. New Delhi: Uppal Publishing House.
Peterson, John. 2008. Kharia. In Gregory Anderson (ed.), The Munda languages, 434–595. London/New York: Routledge.
Peterson, John. 2010. A grammar of Kharia: A South Munda language. Leiden: Brill.
Pinnow, Heinz-Jürgen. 1966. A comparative study of the verb in the Munda languages. In Norman Zide (ed.), Studies in Comparative Austroasiatic Linguistics, 96–193. The Hague: Mouton (Indo-Iranian Monograph, V).
Rischel, Jørgen. 1995. Minor Mlabri: a hunter-gatherer language of Northern Indochina. Copenhagen: Museum Tusculanum Press.
Roberts, Hugh. 1891. A Grammar of Khasi Language. New Delhi: Mittal Publications reprint 2015.
Schachter, Paul. 1973. Focus and relativization. Language, 19–46.
Sharma, Hanjabam Surmangol. 1999. A comparison between Khasi and Manipuri word order. Linguistics of the Tibeto-Burman Area, 22(1), 139–148.
Smith, Carlota. 1964. Determiners and relative clauses in a generative grammar of English. Language, 40(1), 37–52.
Smits, Reinier Johannes Charles. 1988. The relative and cleft constructions of the Germanic and Romance languages. Dordrecht: Foris Publications.
Subbarao, Karumuri Venkata and Temsen, Gracious. 2009. Comitative PP as head in externally-headed relative clauses in Khasi. In Sophana Srichampa and Paul Sidwell (eds.), Austroasiatic Studies: papers from the ICAAL4: Mon-Khmer Studies Journal special issue number 2, 184–201. Thailand: Mahidol University and SIL International.
Subbarao, Karumuri Venkata. 2012. South Asian languages: A syntactic typology. Cambridge: Cambridge University Press.
Thompson, Laurence. 1987. A Vietnamese reference grammar. Hawaii: University of Hawaii Press.
Van Riemsdijk, Henk. 2006. Free relatives. In Martin Everaert, Henk van Riemsdijk, Rob Goedemans and Bart Hollebrandse (eds.), The Blackwell companion to syntax (Vol. 2), 338–382. Oxford: Blackwell Publishing.
Vergnaud, Jean-Roger. 1974. French relative clauses. Doctoral dissertation. Massachusetts: Massachusetts Institute of Technology.
Vergnaud, Jean-Roger. 1985. Dépendances et niveaux de représentation en syntaxe. Amsterdam: John Benjamins Publishing.
Vries, Mark De. 2002. The syntax of relativization. Amsterdam: Landelijke Onderzoekschool Taalwetenschap.

part 4 Grammatical Lexicon

chapter 10

Austroasiatic Affixes and Grammatical Lexicon

Mark Alves, Mathias Jenny and Paul Sidwell

The editors of this volume collaborated to create this Austroasiatic grammatical lexicon as a resource for the investigation of the history of PAA syntax. It began as a simple compilation of grammatical and grammaticalised items extracted from Shorto’s (2006) reconstruction of Proto-Austroasiatic/MonKhmer, and was then augmented with data from the SEAlang Mon-Khmer and Munda Languages Project. Later, special sections on pronouns and morphology were added, extending beyond Shorto’s work with other published sources. As noted in the introduction, in the history of Austroasiatic research, morphosyntax has been somewhat neglected in favour of lexical and phonological studies, and this has affected the quality and quantity of available grammatical data and remains a serious ongoing problem. This is not to denigrate previous work; researchers legitimately prioritized those facets of language, especially phonology and lexicon that were important to them, and were largely consistent with their immediate professional milieu. In the second half of the 20th century, the diverse and highly dynamic world of grammatical theory often seemed remote from the concerns of those collecting primary data, and work presented within the constraints of particular theoretical approaches was not made more accessible by that fact. Consequently, we feel that it is appropriate to take a back-to-basics approach and present a broad index of grammatical items in etymological context. Austroasiatic reconstruction remains a maturing field. Thus, it is not possible to simply list proto-AA forms for any or all grammatical items, yet it is still often premature to set aside particular etyma that are not widely attested, so we must proceed carefully while always making clear our data sources and reasoning. The compilation presented here is to be regarded as a working document and resource in a highly contingent field of inquiry. 
Some reconstructions involve only a couple of branches (with a number of items moved to the final subsection of this paper as less likely PAA items or complete exclusions), while others appear in several branches and thus can be considered stronger candidates for original Proto-AA status rather than later innovations which spread across multiple branches. This data thus gives a sense of Proto-AA grammar, from personal and demonstrative pronouns, to negation and time, to location and comparison.

© koninklijke brill nv, leiden, 2020 | doi:10.1163/9789004425606_012


alves, jenny and sidwell

Missing from this list are preverbal modal verbs, sentence particles, and classifiers, which are common regionally and which have identifiable content-word sources that developed grammatical functions (although we do include a section of high-frequency generic verbs that have a strong tendency to grammaticalize). Items are listed in sections with reconstructions and discussion of their geographic distribution in AA and occasional comments on additional grammatical developments and issues of language contact. The main sections include (1) affixes and reduplication, (2) pronouns, (3) other referential terms, (4) interrogative terms, (5) locative terms, (6) grammatical/grammaticalized verbs, and (7) excluded items. The referencing of sources within entries requires some explanation. Unreferenced items are taken directly from Shorto (2006) and can be found under the Shorto entry numbers at the head of the relevant set; where Shorto items have been reassigned to other sets, this is noted. Most other additional data items were sourced via the online SEAlang Mon-Khmer Etymological Dictionary and the SEAlang Munda Etymological Dictionary and are provided with reference information from these sites. A small number of additional items are taken from other sources and are referenced conventionally.


Affixes and Reduplication

Discussion of pAA morphology necessarily begins with consideration of the constraints on word shape and word formation. Comparative reconstructions suggest that the pAA phonological word was highly constrained, in ways that also interacted with the morphology to condition tendencies among morphological processes. Sidwell & Rau (2015: 229) suggest that pAA lexical words were monosyllabic and sesquisyllabic/disyllabic iambs with preferred phonological templates *Ci(Cm)VCf and *(Cp(n/r/l))CiVCf respectively.1 These patterns precluded suffixing but readily accommodated prefixation and infixation, plus reduplication (and potentially compounding, although there is little evidence for that in pAA). Affixation and reduplication are regarded as likely active word-formational processes in pAA, and evidence for this is discussed below.2

1 The formalism reads as follows: Ci = mainsyllable onset, Cm = medial, Cf = mainsyllable coda, Cp = presyllable onset, V = mainsyllable nucleus. Additionally, vowels without phonological value may have functioned as nuclei of presyllables.
2 Additionally, there was probably a rich and dynamic expressive adverbial lexicon that functioned along ideophonic/sound-symbolic principles, but this is beyond the scope of our present discussion.
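The two templates can be checked mechanically against a segmented form. The following is a minimal sketch under stated assumptions, not part of the reconstruction itself: forms are segmented by hand, the vowel set `VOWELS` and the names `classes` and `matching_templates` are hypothetical, any consonant is allowed in the medial slot, and presyllable vowels (footnote 1) are ignored. The forms in the usage note are modern forms cited later in this chapter and are used only as word shapes.

```python
import re

# Hypothetical toy vowel inventory; every other segment counts as a consonant.
VOWELS = {"a", "aː", "e", "i", "o", "u", "ə", "ɛ", "ɔ", "ɨ"}
# Consonants that may fill the optional (n/r/l) slot after the presyllable onset.
AUGMENTS = {"n", "r", "l"}


def classes(segments):
    """Map each segment to V (vowel), A (n/r/l) or C (other consonant)."""
    out = []
    for seg in segments:
        if seg in VOWELS:
            out.append("V")
        elif seg in AUGMENTS:
            out.append("A")
        else:
            out.append("C")
    return "".join(out)


# *Ci(Cm)VCf and *(Cp(n/r/l))CiVCf, written over the class symbols.
# The presyllable onset Cp is required in the second pattern, because
# without it the shape is simply a monosyllable.
TEMPLATES = {
    "monosyllable": re.compile(r"[CA][CA]?V[CA]"),
    "sesquisyllable": re.compile(r"[CA]A?[CA]V[CA]"),
}


def matching_templates(segments):
    """Return the names of all templates that the segmented form fits."""
    shape = classes(segments)
    return [name for name, pat in TEMPLATES.items() if pat.fullmatch(shape)]
```

For example, `matching_templates(["p", "a", "r"])` fits only the monosyllable template and `matching_templates(["t", "r", "p", "o", "k"])` only the sesquisyllable one, while a CCVC shape such as `["p", "k", "aː", "j"]` fits both, since the templates as written do not by themselves distinguish a medial from a main-syllable onset preceded by a presyllable onset.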

austroasiatic affixes and grammatical lexicon


Typologically, the AA languages can be grouped into three distinct areas based on their morphological characteristics (see Jenny et al. (2015) for a synchronic profile of affixation across AA):
– The Munda languages in central and eastern India are consistently verb-final agglutinating languages, with a large number of affixes (including prefixes, suffixes, and infixes) expressing derivational processes as well as case relations with nominals, and tense-aspect and person with verbs.
– The Nicobarese languages, spoken on the Nicobar Islands in the Andaman Sea, are generally verb-initial and exhibit complex morphological processes, including prefixes, infixes, and suffixes.
– The rest of the family, mostly spoken in Mainland Southeast Asia, is generally verb-medial and predominantly isolating, yet shows inherited derivational morphology, prefixing and infixing, that varies from moribund to highly productive. Marginally there is prefixing of nouns for deictic distinctions and case-marking (see Alves 2015 for samples), but these show no indications of antiquity.
The available evidence lacks strong indication of any pAA inflectional morphology. All such morphological material attested synchronically in AA (e.g. Munda inflectional morphology, case-marking in Katuic, agreement marking and verb aspect in Aslian, etc.) appears to be innovative, i.e. to have appeared after the historical separation into distinct branches. Consequently, pAA morphology appears to have been principally derivational in function. This brings us to the likelihood that pAA affixation was more broadly consistent with what we still find in conservative eastern AA languages such as Katu, Bahnar, Khmu, as well as the written languages Old Mon and Old Khmer. Our working assumption is that pAA prefixes and infixes were underlyingly single consonants that may have triggered further morpho-phonological processes related to syllabification.
These included:
– vowel-consonant alternations as derived forms which variously resyllabified or resisted resyllabification,
– reduplicative infixation, in which the affix creates a well-formed new syllable by including a copy of the mainsyllable coda in the presyllable,
– prefix-infix alternations: allomorphs infixing into short monosyllables but prefixing to longer stems.
The last point above hints at an earlier general origin of infixes from prefixes. Some valuable preliminary efforts to survey morphology across the phylum are available (e.g. Alves 2014, 2015), and these studies find truly widespread common patterns of affixes and reduplication, although much of the revealed complexity likely belongs to later branch-level innovations, leaving only a modest core of material reconstructable to any great depth. To illustrate the kind of diversity of affixal forms employed for the same or similar functions, consider the following reciprocal forms compiled by Alves (2015):

Bahnaric: Chrau pəm ‘to hit’ > tampəm ‘to hit each other’ (Thomas 1971: 154), Rengao ʔwaj ‘to reside’ > taʔwaj ‘to live together’ (Gregerson 1979: 108)
Khmeric: Khmer cum ‘to surround’ > pracum ‘to join together’ (Jenner and Pou 1982: 85)
Khmuic: Khmu pok ‘to bite’ > trpok ‘to bite each other’ (Svantesson 1983: 39)
Palaungic: Palaung ʔɛːh ‘to scold’ > kərʔɛːh ‘to abuse each other’ (Milne 1921: 52)

1.1 Affixes
Sidwell (2008) offered a tentative reconstruction of several affixes by the method of direct comparison of similar forms across Eastern AA languages.

Prefixes
causative *p-/*pC-
reciprocal *t-/*tN-
stative *h-/*hN-

Infixes
nominalising *⟨n⟩
nominalising iterative *⟨r⟩/⟨l⟩
nominalising instrumental *⟨p⟩
nominalising agentive *⟨m⟩

The above are not all equally certain, and there were undoubtedly other pAA affixes; what is needed in particular is a comprehensive morphological comparison of Munda with Eastern AA (both Alves and Sidwell neglected to incorporate Munda data in their comparative morphological studies), and going forward we should heed Anderson’s advice that “… the totality of data must be considered …” (2004: 161). Below are some representative examples in support of these proto-affixes.

1.1.1 Prefixes: Causative *p-/*pC-
The causative *p-/*pC- is widespread in AA, including many fossilized pairs, testifying to its antiquity. It is often accompanied by a nasal or rhotic augment, but the significance of this is not clear. In Munda, the causative affix is often an a~ə prefix as well as a labial stop prefix~infix, leading Anderson (2004) to reconstruct pMunda causative **əˀb-/⟨ˀb⟩. Rau (this volume) argues that various Munda vocalic affixes reflect a separate etymon, and instead suggests that there were several pMunda causatives: prefixes **bə-, **tA-, and **A-, and *Oˀp-.

Aslian: Jahai gej ‘to eat’ > pjgej ‘to feed’ (Burenhult 2005: 106) (note reduplicative infixation of /j/)
Bahnaric: Bahnar dəŋ ‘to stand’ > pədəŋ ‘to make inanimate things stand up’ (Ban1979:C:2882-1)
Katuic: Ngeq doːm ‘ripe’ > padoːm ‘to ripen’ (The2001:C:Sid2005~1110-6)
Khasian: Khasi bam ‘to eat’ > pynbam ‘to feed’ (Nagaraja 1985: 27)
Khmeric: Khmer dac ‘torn’ > phdac ‘to break, separate’ (Bisang 2015: 685)
Khmuic: Khmu kaːj ‘to come’ > pkaːj ‘to cause to come’ (Premsrirat 1987: 26)
Monic: Mon tɒn ‘to go/come up’ > pətɒn ‘to raise’; Nyah Kur (Central/South) tun ‘to go/come up’ > pətun ‘to arm (a spring-trap), to tense (a cross-bow)’ (Diffloth 1984: 206)
Mangic: Mang θiːt6 ‘to die’ > paθiːt6 ‘to kill’ (Lợi et al. 2008: 141)
Munda: Gorum giʔu ‘to see’ > abgiju ‘to show’ (Anderson & Rau 2008: 404), Gtaʔ gweʔ ‘to die’ > bagweʔ ‘to kill’ (Rau, this volume)
Nicobaric: Car ɲaː ‘to eat’ > haɲaː ‘to feed’ (Das1977:C:260) (h < p)
Palaungic: Riang-lang lɛ ‘to go out’ > plɛ ‘to drive out’ (Shorto 1963: 53)
Pearic: Chong hoac ‘to die’ > pnhaoc ‘to kill’ (Isa2007a:C:25-1)
Vietic: Pong ceːt ‘to die’ > pceːt ‘to kill’ (Fer2xx7:C:765-8)

1.1.2 Prefix: Reciprocal *t-/*tN-
The evidence for this prefix is restricted to Eastern AA languages. It may occur with a nasal augment or iterative ⟨r⟩.

Aslian: Kensiu tɨʔ ‘to collide’ > taʔtɯ̃ʔ ‘to crash together’ (Bis1994:C:1248) (note reduplicative infixation of /ʔ/)
Bahnaric: Bahnar ɓɛːt ‘to stab’ > təɓɛːt ‘to stab each other’ (Ban1979:C:32441)
Katuic: Katu kap ‘to bite’ > takap ‘to bite each other’ (Costello 1966: 70)
Khmeric: Surin Khmer dom ‘lump, clod’ > tdom ‘to gather together’ (Dha1978:C:2879)
Khmuic: Khmu mel ‘to roll’ > tŋmel ‘to roll one’s self’ (Suw2002:C:3339)
Palaungic: Riang-Sak rak ‘to love’ > tərrak ‘to love each other’ (Sho2006:C:391-11)



1.1.3 Prefix: Stative *h-/*hN-
This is a speculative reconstruction, in particular since the tendency for various consonants in phonologically weak positions to lenite to /h/ makes it problematic to establish historical forms. For example, *h- may be from *s-, cf. Khmer tuːən ‘to repeat’ > stuːən ‘repetitious’, kiːəp ‘to scratch’ > skiːəp ‘scratchy, irritating’ and others (Jenner p.c.). The following have been suggested as supporting this reconstruction:

Katuic: Katu jur ‘to rise’ > hajur ‘to be raised’ (Costello 1966: 69)
Khmuic: Khmu paːŋ ‘to open’ > hmpaːŋ ‘opened’, hncaːk ‘torn’ (Suwilai 2002:lx)
Nicobaric: Car cək ‘pain’ > hacək ‘painful’ (Das1977:C:171)

However, *h-/*hN- is, on reflection, unsustainable. Very many AA languages have prefixes to derive stative verbs with resultative adjectival and/or passive-like meanings, but often there is remarkable diversity of said prefixal forms within individual languages. For instance, Khmu has a rich array of allomorphs in this category: /m-/, /n-/, /ŋ-/, /nt-/, /ŋk-/, /mp-/, and /tr-/ (Premsrirat 1987: 26). These facts suggest that a morphological stative is an old feature in AA. A sample of forms from several languages follows.

Bahnaric: Chrau rih ‘to tear’ > ta-rih ‘ripped’ (Thomas 1971: 153), Sre ha:l ‘to cut’ > gə-ha:l ‘to be cut’ (Manley 1972: 46)
Katuic: Katu ɟah ‘to break’ > ha-ɟah ‘to be broken’ (Nguyễn Hữu Hoành 1995: 241), Pacoh hɛ:ʔ ‘to tear’ > ti-hɛ:ʔ ‘to be torn’ (S. Watson 1966: 22)
Khmuic: Khmu làk ‘to split’ > hń-làk ‘split’ (Svantesson 1983: 71)

1.1.4 Infix: Nominalising *⟨n⟩
The *⟨n⟩ infix is securely reconstructed to pAA; it is attested in all branches of the phylum, and while it is no longer productive in many languages, we find cognate root and derivative pairs that we can reconstruct directly to pAA (such as ‘to eat’ > ‘food’, ‘to fly’ > ‘wing’ etc.). Sometimes only the derivative remains while the root has been replaced (see Khmuic and Vietic examples below).

Aslian: Semai ca ‘to eat’ > ca’na ‘food’ (Mea1987:C:1521)
Bahnaric: Bahnar par ‘to fly’ > pənar ‘wing’
Katuic: Katu caː ‘to eat’ > canaː ‘food’
Khasian: Khasi shong ‘to sit, dwell’ > shnong ‘place, village, town’
Khmeric: Khmer poəl ‘to try, test’ > pnoəl ‘wager, bet’
Khmuic: Khmu pnɨr ‘wing’ (Suw2002:C:711) ( […]
Monic: […] kənaʔ ‘food’
Munda: Kharia koi ‘to shave’ > konoi ‘razor’ (Donegan 1993: 19)
Nicobaric: Car fɔh ‘to sweep’ > fanɔh ‘whip, broom’ (Das1977:C:116)
Palaungic: Lawa sat ‘to comb’, cf. Hu nn̥àt ‘comb, to comb’ (Sva1991:C:112)
Pearic: Kasong kʰéːt ‘to comb’ > kʰnéːt ‘comb’ (Nop2003:C:493-3)
Vietic: Maleng sənam¹ ‘year’ (Fer2xx7:C:687-2) (< pAA *cam ‘to wait’)
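The base and derivative pairs above can be modelled as insertion of a single consonant after the initial consonant of the base. The sketch below is illustrative only: the function name `infix` and the `presyllable_vowel` parameter are hypothetical, the segmentation is supplied by hand, and the optional vowel simply stands in for whatever syllabification process yields forms such as Bahnar pənar; it is not a claim about the actual morpho-phonology of any of these languages.

```python
def infix(segments, affix, presyllable_vowel=""):
    """Insert a single-consonant infix after the initial consonant.

    segments: hand-segmented base, initial consonant first.
    affix: the infixed consonant, e.g. "n".
    presyllable_vowel: optional vowel placed between the initial
        consonant and the infix (an assumption used to mimic
        resyllabified outputs).
    """
    if len(segments) < 2:
        raise ValueError("need an initial consonant plus a rhyme")
    head, rest = segments[0], list(segments[1:])
    middle = [presyllable_vowel] if presyllable_vowel else []
    return "".join([head, *middle, affix, *rest])
```

With this sketch, `infix(["p", "oə", "l"], "n")` gives the Khmer derivative pnoəl, `infix(["p", "a", "r"], "n", "ə")` gives Bahnar pənar, and `infix(["c", "aː"], "n", "a")` gives Katu canaː.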

1.1.5 Infix: Iterative *⟨r⟩/⟨l⟩
Characterised as a nominaliser by Sidwell (2008), this infix apparently has a much broader range. It is reconstructed on the basis of the apparent high frequency of medial l~r in verbs denoting repetitive motion, nouns referring to things with multiple parts and/or performing repeated actions, and adverbials with similar reference. Often l~r appear to vary freely, while some branches favour one over the other.

Aslian: Jahai hrɲpeɲ ‘goose bumps’ (Bur2005:C:511), krntɛn ‘wrinkles’ (Bur2005:C:830)
Bahnaric: Sre krʔeːp ~ klʔeːp ‘centipede’ (cf. Bahnar kʔɛːp ‘id.’) (Dourne 1950), Bahnar klwɛk-klwɛk ‘noise of person eating’ (Ban1979:C:1275-1)
Katuic: Ta’Oi krwaːŋ ‘to roll up’ (FerMSND:C:Sid2005~124-3), Ong krdɔːc ‘to tickle’ (Fer1974b:C:Sid2005~356-7)
Khasian: Khasi byrhai ‘many and in order’ (Sin1906:C:279), kyrkieh ‘hastily’ (Sin1906:C:2670)
Khmeric: Khmer kiək ‘hold under arm’ > kliək ‘armpit’ (Sho2006:C:269-4), Surin Khmer trbɑːɁ ‘bundle of twenty betel leaves’ (Dha1978:C:105)
Khmuic: Khmu klʔaːk ‘crow’ (Suw2002:C:505), klʔus klʔas ‘cluttered’ (Suw2002:C:3747), klpoːm ‘to coil rope’ (Suw2002:C:2961)
Monic: Old Mon kilwa ‘bat’ (Sho2006:C:237-1) (cf. Khasi khwak ‘vampire bat’)
Palaungic: Lamet srʔɔːk ‘to sweep’ (Lin1978:C:338), Riang kərʔəʔ ‘to hiccup’ (Sho2006:C:9-8)

1.1.6 Infix: Nominalising Instrumental *⟨p⟩
Like *⟨n⟩ above, *⟨p⟩ is confidently reconstructed to pAA; it is not so widely distributed but is found in archaic vocabulary.

Bahnaric: Chrau liet ‘to lick, taste’ > ləpiet ‘tongue’ (Tho1961:C:Sid2000~1336)
Katuic: Ngeq tuc ‘to steal’ > tampuc ‘thief’ (Smi1970:C:3033)
Khmeric: Khmer dal ‘to pound’ > tbal ‘mortar bowl’ (Huf1971:C:3763-502-1) (widely diffused into other AA branches)
Khmuic: Khmu hmpɔːk ‘skin, bark’ (Suw2002:C:209) (< pAA *sɔːk ‘to peel’ Shorto §466.A)
Monic: Old Mon til ‘to plant’ > twil ‘cultivable land’ (Shorto 1971: 179)
Munda: Ho sopola ‘reconciliation’, sepeːɖ ‘young man’, gopoeʔ ‘fight, battle’ (Anderson et al. 2008: 214) (base forms not provided in the source)

1.1.7 Infix: Nominalizing Agentive Infix *⟨m⟩
Right across AA there are recurring old pairs of base and derivative indicating a pAA ⟨m⟩ infix, such as Bahnar kɔːn ‘child’, kəmɔːn ‘nephew’, Korku kon ‘child’, komon ‘nephew’. But by far the majority of examples of ⟨m⟩ infixation involve an agentive verb and a nominal derivative denoting the agent or tool of the action, e.g. Bahnar so:c ‘to sting’, hmo:c ‘ant’, Santali muˀɟ ‘ant’. Some more examples:

Bahnaric: Bahnar preh ‘to hit’ > bəmreh ‘dead stick used for hitting’ (Banker 1964: 103)
Khmeric: Old Khmer cuːəɲ ‘to trade’ > cmuːəɲ ‘trader’, rʊət ‘to run away’ > rmʊət ‘fugitive’ (Jenner pers. comm.)
Monic: Old Mon goṅ ‘be brave, dare’ > gmoṅ ‘(the) brave (one)’, jnok ‘be big’ > juṁnok ‘(the) big (one)’, pa ‘to do’ > puma ‘doer, the one who does’. Diffloth (1984: 264) suggests that the Agentive ⟨m⟩ is distinct from Attributive ⟨m⟩, though both may have a common origin in pre-Old Mon. There is not much evidence to support this claim, and the ⟨m⟩ infix can be seen as a nominalizer targeting the subject of the verb (with attributive or nominal interpretation). The infix was productive in Old Mon (also applied to Indian loans), but merges with other infixes in Modern Mon (all infixes now being realized as ⟨ə⟩). Lexicalized forms of ⟨m⟩ survive also in Nyahkur.

1.2 Reduplication
Beyond affixational morphology, another ubiquitous AA word-formation strategy is reduplication. Reduplication as a word-formation strategy is common worldwide, but the significant range of morphophonological patterns and the substantive degree of productivity of reduplication throughout AA is notable.



In addition to full reduplication, which constitutes a smaller percentage of reduplicants in AA, both partial reduplication (i.e. copying segments from parts of syllables or words) and especially alternating reduplication (i.e. copying of word and syllable templates with a mix of copied and alternating segments) are relatively productive strategies throughout the AA language family. The degree of productivity of the various types of reduplication varies according to language, but most AA languages have dozens to hundreds of documented reduplicants, and sometimes thousands (e.g. Vietnamese), and the languages generally have the capacity to produce new reduplicants. Moreover, undoubtedly, many more reduplicants exist in all the languages than the actual published items, due to the somewhat vague and even ephemeral nature of reduplicants as well as the challenges in gathering such forms through traditional word-collection methods. The semantico-syntactic functions of AA reduplication include (a) the creation of entirely new words, (b) the expression of fully grammatical functions or functions overlapping with grammatical functions (e.g. iterativity and progression, distributivity and plurality, etc.), and (c) general semantic embellishment of existing words. Reduplication in AA is frequently associated with the lexical category of “expressives,” which semantically overlaps with adjectives and adverbs but which does not fit in those syntactic slots (cf. Sidwell 2014). Also, in some branches, certain reduplicative processes function as inflectional morphology or can have grammatical-like functions. While no specific reduplicant forms can be reconstructed to the PAA level, alternating reduplication, often expressing semantically complex concepts, is prevalent enough to be considered a possible morphological process in Proto-AA (cf. Sidwell 2008).

1.2.1 Functions of AA Reduplication
AA reduplication may involve copying a base with semantic embellishment or can generate completely new words without an apparent lexical base. In some cases, the semantics of derived words involve intensification, lightening, or generalization of base words. In many other instances, the semantics of such words are impossible to translate with single words and instead require longer phrases, as the items, actions, and phenomena often involve vivid sensory, physical, and/or visual features (e.g. Mundari riti piti ‘very small leaves as those of tamarind’, Mundari keoŋ meoŋ ‘a feeling of loneliness and fear in the middle of the forest’ (Osada 2008: 140)). Consider further various Vietnamese reduplicants with the numerous senses related to ‘glittery’: lóng lánh ‘shine; glitter; sparkle’, nhấp nhoáng ‘glitter; shine, sparkle’, sáng soi ‘shine, glitter, sparkle, illuminate, illumine, light up’, and xán lạn ‘shine brightly; be resplendent, glitter; resplendent, bright, glittering’ (Bui 1992).



In the following samples for ‘butterfly’ (an insect with visually complex appearance and fluttery movement), a tremendous range of forms is observed. A small portion involve partial reduplication (e.g. Jeh (Yeh) pɯpɯk ‘butterfly’ (The2001:C:jeh-1531), in which only the CV is copied), while a majority involve alternating reduplication (e.g. Alak pɔŋ pɯk ‘butterfly’ (The2001:C:alk-1531), in which the entire CVC syllable template is copied but the rhyme alternates).

Bahnaric: Alak pɔŋ pɯk ‘butterfly’ (The2001:C:alk-1531), Jeh (Yeh) pɯpɯk ‘butterfly’ (The2001:C:jeh-1531), Cua pʌm pil ~ pʌŋ pil ‘moth, butterfly’ (Mai1981:C:3019), Laven (Juk) rak raːl ‘butterfly’ (The2001:C:Sid2003~356), Sedang reŋ reə ‘butterfly’ (Smi2000:C:421), Sedang tɛk tɛj ‘butterfly’ (Smi2000:C:422)
Katuic: Souei kan kɔŋ klaap ‘butterfly’ (Fer1974a:C:Sid2005~143-1), Kui mphlaːp mphlaːp ‘butterfly’ (Pra1978:C:1375), Ta’Oi paŋ pək ‘butterfly’ (Ngu1986:C:Sid2005~518-5), Bru ta̤ŋ ʔati̤r ‘butterfly’ (The1980:C:3637), Katu (An Diem) wiːk waːk ‘butterfly’ (Cos1971:C:443-1), Pacoh ʔa.paːŋ pɯk ‘butterfly’ (Wat2009:C:351)
Khmer: Khmer plak-plaat ‘butterfly’ (Hea1997:C:8253)
Khmuic: Tai Hat kup lup ‘butterfly’ (Fer1970:C:755), Phong kuːp pluːp ‘butterfly’ (Bui2000:C:755), Khsing-Mul kɨː tɨt ‘butterfly’ (Pog1990:C:2189)
Monic: Nyah Kur (Nam Lao) phòok-kǝphàak ‘butterfly’ (The1984:C:27985), Nyah Kur (Klang) phǝkphàak ‘butterfly’ (The1984:C:2798-4), Mon yɔk ye ‘butterfly’ (Sho1962:C:9116)
Munda: Juang kuŋkulaŋ ‘butterfly’ (matson1964grammatical:C:i696), Juang (Keonjhar) kɔŋkulaŋ ‘butterfly’ (dasgupta1978linguistic:C:c1.i202), Bondo (Plains) la?lap’ ‘butterfly’ (bhattacharya1968bonda:C:c1.p118.r4.i2362.s2359), Bondo (Hill) ləlap ‘butterfly, moth’ (fernandez1963standard:C:c1.i722.sN853), Bondo (Hill) saŋ-saŋləlap ‘yellow-collared moth, butterfly’ (fernandez1963standard:C:c1.i723.sN853a), Bodo-Gadaba zizi-moĩna ‘butterfly’ (zide1963gutob:C:i980.s10291)
Nicobarese: Car liː-la ‘a kind of butterfly’ (Whi1925:C:3411)
Palaungic: proto Palaungic *puŋ paaʔ ‘butterfly’ (Sid2010:R:900), Lawa (Umphai) (mboŋ)mbɯaŋ ‘butterfly’ (Sho2006:C:631-3), Wa (Praok) (puŋ)pɛŋ ‘butterfly’ (Sho2006:C:631-1), Lamet (Lampang) laj ləʔ ‘butterfly’ (Nar1980:C:347), Palaung paŋ pa ‘butterfly (in songs)’ (Mil1931:C:1641), Riang (Sak) puŋ¹ pɑʔ¹ ‘butterfly’ (Luc1964:C:RS-388), Lamet (Lampang) waːk wʌ́h ‘butterfly grub’ (Nar1980:C:567), Palaung wiəŋ wiəŋ ‘to fly (of butterfly)’ (Mil1931:C:2582)
Vietic: Vietnamese (Hanoi) bươm bướm ‘papillon, butterfly’ (Fer2xx7:C:729-10); Pong (Phong) pampaːm ‘papillon, butterfly’ (Fer2xx7:C:729-3), Thavung piŋ pɔːt ‘butterfly’ (Suw2000:C:1368), Tho (Lang Lo) pəmpɨam³ ‘papillon, butterfly’ (Fer2xx7:C:729-5), Muong (Bi) pɨəm¹ pɨəm³ ‘papillon, butterfly’ (Fer2xx7:C:729-9)

Grammatical functions expressed via reduplication, while not common, are seen in a number of AA branches, such as Aslian, Munda, and Khmuic, though these are rare and semantically diverse enough to suggest innovation (cf. Alves 2015). In some cases, reduplication with grammatical semantics is not fully productive, with semantics only partly overlapping with grammatical senses, but such forms nevertheless occur with some frequency throughout the family.

1.2.2 Forms of Reduplication (Alliteration, Rhyming, Chiming, etc.)
Alternating reduplication is seen in all branches of AA. In it, complete syllable and word templates are copied but with some segments copied and others alternated. These are manifested as alliteration (i.e. copying of the initial but alternation of the rhyme), rhyming (i.e. copying of the rhyme but alternation of the initial), chiming (i.e. copying of the initial and final but alternation of the vowel), and tonal alternation (i.e. copying of all segments but alternation of the tone). With polysyllabic words, this becomes even more complicated, but in each case, the complete word template is copied but with a mixture of copied and alternated segments. Again, many of these alternating reduplicants express complex, vivid semantics with some degree of iconicity indicated by this word-formation method.

Aslian: Kensiu cʌʔcõʔ ‘sound used to shame child; call animals’ (Bis1994:C:172), Semelai sɲepkẽp ‘sound of rice being winnowed by someone experienced’ (Kru2004:C:1054)
Bahnaric: Halang ʔlo̤ːŋ ʔlo̤ːt ‘to wobble; roll back and forth’ (Coo1976:C:2168), Sre (Koho) cup blup ‘to stumble; to fall head forward’ (Dou1950:C:503), Laven blip blaːŋ ‘bright light, flash’ (Fer1969:C:Sid2003~485)
Katuic: Ngeq kiʔ kiʔ koʔ koʔ ‘hobbling along a lame person on knees’ (Smi1970:C:1365), Bru ŋo̤ːʔ-ŋe̤ːʔ ‘tottering, unsteady’ (The1980:C:2066), Katu (An Diem) ʔapəʔ ʔapuoŋ ‘imitate’ (Cos1971:C:1571-1)
Khasi: Khasi kren kʰɲum kʰɲum ‘to mumble’ (Sin1906:C:2614), Khasi ktuːp ktuk ‘to mumble’ (Sin1906:C:2839)
Khmer: Khmer teel-taal ‘to be weaving from side to side, dodging, fluttering, (of a boat), tossing’ (Hea1997:C:4658), Khmer krɑvək-krɑvɑk ‘to be winding, sinuous, twisting and turning, meandering, zigzag’ (Hea1997:C:1254)
Khmuic: T’in (Mal) khuh khuh ‘to stumble’ (Fil2009:C:2016), Khmu (Yuan) kərwìːc-krwɯ̀al ‘winding, meandering’ (Lin1974:C:Sho2006~179431)
Monic: Nyah Kur (Nam Lao) múm-mám ‘sound of muttering, mumbling’ (The1984:C:2403-1), Nyah Kur (Huai Khrai) phlúm-phlám ‘sound of muttering, mumbling’ (The1984:C:2866-6uth)
Munda: Mundari hayam hayam ‘to talk in whispers’ (Osada 2008: 139), Mundari sar sor alternating ‘to eat away with a savage appetite’ (Osada 2008: 139)
Nicobarese: Car kiɲəː kərɛ ‘wiggle, wobble, shake’ (Das1977:C:693)
Palaungic: Riang (Lang) lak² lɔk² ‘flash of lightning’ (Luc1964:C:RL-261)
Pearic: Chong (Samre) pliːw plûːt ‘firefly’ (Por2001:C:990-4)
Vietic: Vietnamese móm mém ‘chew without teeth; mumble’, Chứt (Rục) cuon cuòn ‘dragonfly’ (Phu1998:C:163)
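The definitions of alliteration, rhyming, chiming and tonal alternation given above amount to a small decision procedure over two syllables of a disyllabic reduplicant. The following is a minimal sketch of that procedure, assuming each syllable has already been segmented by hand into onset, nucleus, coda and (optionally) tone; the names `Syllable` and `classify_reduplication` are hypothetical, and the segmentations in the usage note are our own illustrative parses of forms cited in this section.

```python
from typing import NamedTuple


class Syllable(NamedTuple):
    onset: str
    nucleus: str
    coda: str
    tone: str = ""  # empty when the source marks no tone


def classify_reduplication(a: Syllable, b: Syllable) -> str:
    """Label a two-syllable reduplicant by what is copied and what alternates."""
    same_onset = a.onset == b.onset
    same_nucleus = a.nucleus == b.nucleus
    same_coda = a.coda == b.coda

    if same_onset and same_nucleus and same_coda:
        # All segments copied: either identical, or only the tone differs.
        return "full" if a.tone == b.tone else "tonal alternation"
    if same_onset and same_coda:
        # Initial and final copied, vowel alternates.
        return "chiming"
    if same_onset:
        # Initial copied, rhyme alternates.
        return "alliteration"
    if same_nucleus and same_coda:
        # Rhyme copied, initial alternates.
        return "rhyming"
    return "other"
```

Under these assumptions, Alak pɔŋ pɯk comes out as alliteration, Tai Hat kup lup as rhyming, Riang (Lang) lak² lɔk² as chiming, Muong (Bi) pɨəm¹ pɨəm³ as tonal alternation, and Kui mphlaːp mphlaːp as full reduplication. Chiming is checked before alliteration because, on the definitions above, every chiming pair also copies the initial.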



Pronouns

AA languages tend to manifest two broad types of personal pronoun systems: (1) languages with coherent paradigms of purely grammatical pronouns with distinctions of person, number, and inclusivity, and (2) languages that avoid real pronouns in favour of kin terms or other social status terms. Both types manifest clines within these tendencies and overlap to some extent. Extreme examples of the latter type include Vietnamese, in which kinship-derived terms (several of Chinese origin) with pronominal functions are dominant in almost all socially polite discourse, while just a few native pronouns (including Austroasiatic etyma) are retained largely in intimate or profane usage. Similarly, Khmer notionally retains a coherent pronoun paradigm, yet speakers typically use status terms (such as khɲom ‘slave’ for ‘I, me’) in speech according to socially-conditioned pragmatic factors. It is clear that languages of the second type are generally innovative, reflecting the tendency for development of more hierarchical social relations with state formation and Indospheric and Sinospheric cultural influences. At the same time, among languages of the former type, it is significant that we do find cognate pronominal forms and structures attested in geographically distant branches. Consistent with the general wisdom that pronouns are among the most stable basic vocabulary cross-linguistically, this is strongly indicative that proto-AA was of the first type, and consequently, it is not difficult to suggest a coherent set of pAA personal pronouns. It is also apparent that from a historical perspective, we cannot clearly distinguish 3rd person personal pronouns (3PPs) and demonstratives, and must treat them together as part of a shared semantico-syntactic system. This is typologically unremarkable; although far from universal, demonstratives and 3PPs are often in complementary distribution in the world’s languages, such that they cannot be formally distinguished (see Höhn 2015 for a recent discussion). In respect to AA, we observe (see data below) just this kind of categorical confusion, such that it is not possible to reliably distinguish demonstratives and 3PPs once we begin reconstructing deeper than the branch level. The 1960s saw particular attention given to AA pronouns, especially by the Munda specialist Pinnow. His pioneering work included presentations of two impressionistic reconstructions of pAA pronouns, reproduced as Table 10.1 and Table 10.2 (reorganized into table form, from the lists in which they appear in the originals).

Table 10.1 Proto Austroasiatic pronouns/demonstratives (Pinnow 1965)

1st person: *iŋ, *ai(ŋ)/*ai(ɲ); *he(i) (incl.), *bɨ(n) (incl.), *je(h) (excl.), *le/ne (excl.)
2nd person: *me, *mo/mu, *ma, *nee, *hai; *be(n), *pe, *pa, *ji
3rd person/demonstrative: *a-*i/e-*u/o, *ha-*hi/he-*hu/ho, *na-*ni/ne-*nu/no, *an-*in/en-*un/on, *han-*hin/hen-*hun/hon, *di/de-*du/do, *ta-*ti/te-*tu/o, *ca-*ci/(ce)-*cu/co, *va-*vi/ve-*vu/vo, *ma(*mai)-*mi/me-*mu/mo


Table 10.2 Proto Austroasiatic pronouns/demonstratives (Pinnow 1966: 167)a

1st person: *liŋ (excl.), *liŋ (incl.); *le (excl.), *bɨ(n) (incl.)
2nd person: *me; *ben; *pe
3rd person/demonstrative: *eɟ (anim.), *a (inam.), *mai (gen.); *kɨn, *kin; *kɨ

a Regrettably, Sidwell (2015) mislabelled this table as being from Zide (1966).

Pinnow’s compilation attempted to account for all forms he found in the data, with a maximal paradigm filled out with vowel alternations of the type he believed he saw, especially in the Munda languages. This is an interesting exercise, but it delivered an unrealistic paradigm that did not take into account typology or directionality of change. At about the same time Zide presented his paradigm, which is more more typologically reasonable, although it is heavily weighted toward Munda. This remained the state of discussion for several decades until Sidwell (2015) proposed the reconstruction in Table 10.3 on more systemic etymological grounds. Comparisons supporting the forms in the table below are provided in the listing below. In some cases, these include etymologies not referenced by Shorto (2006), or which present radically different analyses to Shorto’s. Somewhat echoing Pinnow, Sidwell has a substantial listing of 3P/DEM forms, in this case a dozen. All of these have at least some etymological support, but it is difficult to sort out the likely semantics; there may have been further distinctions of gender, animacy, number or other functions. 1S Shorto: Sid2015: Aslian: Bahnaric: Khmeric: Khmuic: Monic:

(Shorto suggests that these are sandhi forms of § 2 *ʔiːʔ ‘person’) *ʔaɲ/*ʔiɲ 1S Che’ Wong, Semai I ʔiŋ, Semelai ʔəɲ, Semnam ʔĩːɲ (Bur2009:C: 320) Stieng ʔaɲ (Blood 1966), Sre aɲ, Chrau aɲ, iɲ, Bahnar iɲ Old Khmer añ, Modern Khmer ʔaɲ Pong ʔeːɲ (Bui2000:C:593), Mal ʔəɲ (Fil2009:C:1043), Khsing-Mul ʔaɲ (Pog1990:C:593) Nyah Kur ʔéːɲ (The1984:C:4061-1) (also ‘self’, which may be the basic meaning (see below ‘self’); may be influenced by Khmer)

austroasiatic affixes and grammatical lexicon


table 10.3 Proto Austroasiatic pronouns/demonstratives (Sidwell, 2015)




1st person



2nd person 3rd person/demonstrative

*miːʔ *ʔan *ʔɤːn *ʔɔːʔ *ʔuːʔ *ʔiːʔ *nVʔ, *nVh *tVʔ, *tVh *han *ga(ː)ʔ

*hɛʔ *(ʔ)jeːʔ *pɛʔ *giːʔ

Munda: Pearic: Comment:

1D Shorto: Sid2015: Mangic: Nicobaric: Palaungic:

Vietic: Comment:

Juang aɲ (Rajendran 2002), Ho aɲ (Kobayashi 2003), Mundari aɲ (Kobayashi 2003), Juray ɨɲ (Zide 1982) Samre ʔiɲ (Por2001:C:1406-4), Chong ʔiɲ (Huf1971:C:3140-421-19), Chong of Kompong Som ʔeɲ (Isa2007b:C:1406-2) Notwithstanding Shorto’s suggestion, pAA *ʔaɲ ‘I’, is confidently regarded as a secure reconstruction.

§1439 *ɟʔaːj ‘we two’ *ʔaːj 1D Bolyu ʔaːi⁵⁵ ‘we’ (Edm1995:C:1125) Central Nicobarese cəai ‘we two’, Car ai ‘we two (oblique case)’ (Whi1925:C:96) Palaung aj, Riang-Lang ɑj ‘we two’, Lamet ʔáj ‘both of us’ (Nar 1980:C:368), U ʔǎj ‘we (dual)’ (Sva1988:C:252), Hu ʔàj ‘we (dual)’ (Sva1991:C:133) Ruc ʔaːj ‘that’ (Phu1998:C:1358), Mương Hoa Binh ʔaːj¹ ‘who’ (Fer 2xx7:C:38-8), Vietnamese ấy ‘that’, ai ‘who/whoever’ Shorto posits initial ɟ- based on Nicobarese, but this is probably an error, and cəai is rather a fusion of *cə ‘I’ + *ʔaːj ‘we two’.


alves, jenny and sidwell

There are also similar pronominal forms in Katuic, e.g. Souei ʔaj ‘he’ (Huf1971:C:2828-380-12), but this could reflect Lao ʔȃːj ‘older brother’, which is widely used as an address term in Katuic.

1P
Shorto: §1.B *hiʔ ‘we’
Sid2015: *hɛʔ 1P (? incl.)
Aslian: Kensiu heʔ, Semai I hiːʔ, Semelai heːʔ ‘we (inclusive)’, Jahai hɛj ‘we (dual incl.)’ (Bur2005:C:462), Semelai hɛ ‘we (incl.)’ (Kru2004:C:1274)
Bahnaric: Stieng heːi, Biat hɛː ‘I’, Central Rölöm hiː ‘we (excl.)’, Bunör hɛː ‘we (incl.)’, Cua haj ‘we (incl.)’ (Mai1981:C:229), Sre Koho he ‘we’ (Boc1953:C:Sid2000~1112), Halang he̤ːj ‘we (incl.)’ (Coo1976:C:981)
Katuic: Kuy hai ‘we (dual)’, Ngeq heː ‘we (pl.)’, So haj ‘I, we’ (Gai1985:C:Sid2005~569-4), Ir hiɛ ‘we (incl.)’ (Chi1978:C:Sid2005~569-9), Katu An Diem hɛː ‘we (incl.)’ (Cos1971:C:3703-1), and others.
Nicobaric: Central Nicobarese hẽ ‘we (dual)’, heː ‘we (pl.)’, Car yē hī ‘if we’ (Whi1925:C:6613)

1P
Shorto: §150 *j(eː)ʔ ‘we’
Sid2015: *(ʔ)jeːʔ 1P (? excl.)
Aslian: Mintil jɛʔ, Semaq Beri jɛːh ‘we (pl. incl.)’, Semelai jɛ ‘I’ (Kru2004:C:593), Temiar jeːʔ, Kensiu jɛʔ
Bahnaric: Bahnar ɲiː ‘we (dual excl.)’
Katuic: Katu jiː ‘we (excl.)’ (Cos1971:C:3702-1)
Munda: Juang (niɲ)-ɟe ‘we’
Palaungic: Palaung jɛ ‘I and they elsewhere’, Praok ji ‘we’
Comment: Shorto grouped Kensiu, Temiar and Semaq Beri forms with § 2 *ʔiːʔ rather than §150, which appears to be a mistake.

1P
Shorto: §1.A *ʔiʔ ‘we’
Aslian: Temiar ʔɛːʔ ‘we (incl.)’, Semnam ʔɛːʔ ‘we (incl. pl.)’ (Bur2009:C:1201)
Khasic: War Amwi ʔi ‘we’ (Wei1975:C:744), Pnar ʔi ‘we’ (Cho2004:C:744)
Khmuic: Kammu-Yuan ìʔ ‘we (pl.)’, Khmu ʔiʔ ‘we (pl.)’ (Huf1971:C:6501-8478)
Nicobaric: Car ī-hö ‘we’ (Whi1925:C:1844)


Palaungic: Palaung ɛ, Riang-Lang eʔ¹, Praok e, Lawa ʔeʔ ‘we (pl.)’
Comment: This is apparently separate from § 2 3P *ʔiːʔ below.

2S
Shorto: §128 *mi(ː)ʔ; *miːh ‘you (singular)’
Sid2015: *miːʔ 2S
Aslian: Bateg Nong məʔ, Bateg Dek mɔh, Semnam miːh ‘you (singular)’
Bahnaric: Sre mi ‘you (singular)’, Stieng meːi, Chrau maːj, Biat mai ‘you (masculine)’, Bahnar ʔmih ‘you (dual)’.
Katuic: Kuy mài ‘you (familiar)’
Khasic: Khasi me
Khmuic: Kammu-Yuan mèː ‘you (masculine singular)’
Munda: Santali, Mundari 2P agreement suffix -me (and others) (Pinnow 1965: 26)
Nicobaric: Central Nicobarese me, mẽ ‘you (singular)’
Palaungic: Palaung mi, Riang-Lang miʔ¹, Praok maj, Lawa Bo Luang maiʔ, Lawa Umphai miʔ
Vietic: Vietnamese mày, Rục mi: (Phu1998:C:758)

2P
Shorto: §99 *piʔ ‘you (plural)’
Sid2015: *pɛʔ 2P
Bahnaric: Laven pɛː (Jac2002:C:1040) (possibly back-borrowed via Katuic)
Katuic: Pacoh ʔipɛː (Wat1979:C:Sid2005~568-5)
Khasi: Khasi phi ‘you’, Pnar pʰi ‘you (masc./fem. agreement marker)’ (Cho2004:C:458)
Munda: Kharia -pe, &c. (Pinnow 1959, 175a; Proto-Munda *-pɛ).
Nicobaric: Central Nicobarese (i)feː
Palaungic: Palaung pɛ, Riang-Lang peʔ¹, Praok pe, Lawa Bo Luang paiʔ, Lawa Umphai, Mae Sariang peʔ, U phé (Sva1988:C:163), Danaw pɤ¹ (Luc1964:C:D-823)
Vietic: Vietnamese bay

3P/DEM
Shorto: §26 *ge(ː)ʔ (?) deictic & 3rd person pronoun
Sid2015: *giːʔ 3P/DEM
Bahnaric: Röngao gɛː, geː, Kontum Bahnar giː ‘he, she’
Katuic: Kuy kɤi ‘that’
Khasic: Khasi ki ‘they’, ka feminine pronominal affix
Khmeric: Khmer kèː ‘one, someone, he, they’
Khmuic: Kammu-Yuan kìː ‘this’, Kammu-Yuan kə̀ː ‘he’, kɯ̀ʔ ‘many (people)’
Munda: Kharia -ki ‘plural suffix’ (Pinnow 1959 V074), Juang -ki ‘third person plural’ (Pinnow1960beitraege:C:i823)
Nicobaric: Nancowry kí ‘all’
Palaungic: Palaung ge, Praok ki ‘they’, Lawa Bo Luang keʔ ‘he, she’, Riang-Lang kəʔ¹ ‘they’, Riang-Lang ke¹ plural particle, pPalaungic *giːʔ ‘they’ (Sidwell 2015 §258)
Comment: Shorto also compares Röngao gaː & gaːr (!) ‘he, she’, Sre gə indefinite pronoun, but see *ga(ː)ʔ 3P/DEM below. Munda languages show a -ki agreement suffix with an unexpectedly devoiced onset, although it may also be compared to Shorto § 252 *kh(iː)ʔ ‘this, he, they’.

3P (doubtful pAA)
Shorto: §252 *kh(iː)ʔ ‘this, he, they’
Aslian: Semai II keːʔ, Semelai kəh ‘he’ (with k- by dissimilation).
Bahnaric: Sre khaj 3rd person pronoun, Chrau khəj pronominal plural particle
Khmuic: Thin khi ‘this, here, now’
Munda: Kharia -ki ‘plural suffix’ (Pinnow 1959 V074), Juang -ki ‘third person plural’ (Pinnow1960beitraege:C:i823)
Pearic: Chong kʰi̤ː ‘there’ (Sir2001:C:1828-5)
Comment: This comparison is highly problematic, as the aspirated onset is highly marked for AA and the attested forms are thinly spread. Potentially the forms are explained as devoiced variants of § 26 *ge(ː)ʔ (see preceding entry).

3P
Shorto: §2 *ʔiːʔ ‘person’
Sid2015: *ʔiːʔ 3P/DEM
Aslian: Semai ʔi ‘his, her, its’ (Mea1987:C:2552), Kensiu ʔi- prefix: woman’s name (Bis1994:C:1558)
Bahnaric: Stieng iː, Alak ʔiː ‘it’ (Huf1971:C:3228-432-7), Sapuan ʔiː ‘she’ (Jac1999:C:133), Laven ʔie ‘grandmother’ (Jac2002:C:21), Tampuon ʔeː ‘he’ (Huf1971:C:2828-380-25), ‘it’ (Huf1971:C:3228-432-9)
Khasi: Khasi i ‘he, she, it’, Amwi ʔi ‘we’ (Wei1975:C:744), Pnar ʔi ‘he, she’ (Bar2010:C:2417-1), ‘we’ (Bar2010:C:744-1)
Khmuic: Mlabri ʔi title of junior female (Ris1995:C:1427), Mal ʔiː ‘Mrs, Miss’ (Fil2009:C:1312)
Monic: Old Mon ’ey /ʔɔy/, Modern Mon oa, Proto-Nyah Kur *ʔə̱j/*ʔwə̱j (Diffloth 1984 N263)
Palaungic: Lawa Bo Luang ʔaiʔ ‘I’, Palaung i- (in ime ‘male’, ipən ‘woman’), Riang-Lang iʔ¹ ‘person, human being’, U ʔí ‘people, others’ (Sva1988:C:269), Danaw ʔɪ¹ ‘we (more than two)’ (Luc1964:C:D-316)

3P/DEM
Shorto: §1115 *(ʔ)anʔ 3rd person singular pronoun
Sid2015: *ʔan 3P/DEM
Bahnaric: Cua ʔan ‘this, which one’ (Mai1981:C:1645), Halang ʔan ‘here, now, this’ (Coo1976:C:8), Sre ʔan ‘why?’ (Dou1950:C:49), Laven ʔan ‘thing’ (Jac2002:C:2)
Katuic: Bru ʔan ‘he, this, they’, Kui ʔan ‘it’ (The1980:C:4226), Pacoh ʔan ‘that’ (Wat2009:C:279)
Munda: Sora an(in) 3rd person singular pronoun.
Nicobaric: Car ʔan ‘we two’ (Whi1925:C:170)
Palaungic: Riang-Lang an² clause subordinating particle, Palaung ən, Riang-Lang hnʔ¹ 3rd person singular pronoun, Hu ʔə́n ‘he, she, it’ (Sva1991:C:139)
Pearic: Chong ʔan ‘here’ (Isa2007a:C:8-1), ‘this’ (Huf1971:C:6020-790-12)
Vietic: Vietnamese hắn ‘he, she’ (doubtful here in light of h-)
Comment: Shorto suggests borrowing from Palaungic into Shan of ʔan, nan forms, but rather these are native Tai forms that have also influenced proximal AA languages.

3P/DEM
Shorto: §1115 *ʔən( ) 3rd person singular pronoun
Sid2015: *ʔɤːn 3P/DEM
Bahnaric: Stieng əːn ‘who, what, which?’, Tarieng ʔʌːn ‘they’ (The2001:C:tdf953)
Katuic: Pacoh ʔɤn relative pronoun ‘that’, ‘which’ (Wat2009:C:664)
Khmuic: Thin ʔən ‘I’ (Huf1971:C:3140-421-7), ‘we (pl.)’ (Huf1971:C:6504-8477), Khsing-Mul ʔɨn ‘he’ (Pog1990:C:2633)
Munda: Juray ɨn-lɛn ‘we, us’ (zide1982reconstruction:C:c2.p466.i1514), Sora anin ‘he, she’ (Anand2002savara:C:i221), əni ‘s/he’ Mahali (Anderson p.c.).
Nicobaric: Central Nicobarese ən 3rd person singular pronoun, Nancowry ʔə́n.
Comment: Shorto’s notes on this etymon suggest borrowing from Palaungic into Shan (hence Shan án ‘which’, nā̀n ‘that’, nán ‘thus’, nàn ‘there’), but this is unlikely given the widespread distribution of similar forms in Kra-Dai and other language families of the region.

3P/DEM
Sid2015: *han 3P/DEM
Bahnaric: Laven han ‘he, she, it’ (Jac2002:C:379), Muong han ‘there’ (Blo2005:C:1783)
Khasian: Lyngngam hən niɁ ‘here’ (Dal2009:C:117)
Munda: Mundari han ‘yonder’ (Pinnow1959versuch:C:c4.p71.i1178.sV016), Santali han-te ‘away, that way’ (Minegishi2001santali:C:c1.p190.r.i1240)
Pearic: Pear hen ‘she’ (Huf1971:C:5087-672-4), Chong heːn ‘he (familiar); they, (contemptuous)’ (Bar1941:C:166,167-1-1-cog-T)
Vietic: Vietnamese hắn ‘he, she’, Pong han ‘he, she’ (Fer2xx7:C:839-2), Ruc hán ‘him, her’ (Phu1998:C:248), pVietic *han ‘he, she’ (Fer2xx7:C:839-1) (See comments for § 1115 *(ʔ)anʔ.)

3P/DEM
Sid2015: *ga(ː)ʔ 3P/DEM
Bahnaric: Sre gə indefinite pronoun, Röngao gaː (& gaːr!) ‘he, she’, Sedang ga̰ ‘he, she, it’ (Smi2000:C:1332)
Katuic: Pacoh kaː ‘that which’ (Wat2009:C:833)
Khasian: Khasi ka ‘she’, Pnar ka ‘she’ (Bar2010:C:1118-1), Amwi kə ‘she’ (Wei1975:C:1117)
Khmuic: Khmu Cuang gəː ‘he, it’ (Suw2002:C:2243), Kammu Yuan kə̀ː ‘he’ (Lin1974:C:Sho2006~26-11)
Nicobarese: Nancowry ka ‘who, which, what, that (relative)’ (Man1889:C:2283), Car ka adjectival prefix (Whi1925:C:2104)

3P/DEM
Sid2015: *ʔɔːʔ 3P/DEM
Katuic: pKatuic *ʔɔː ‘he/she’; Ngeq ʔɔː ‘he’ (Huf1971:C:2830-380-15)
Khmuic: Khmu Cuang ʔoʔ ‘I’ (Suw2002:C:2239)
Khasian: Pnar ʔɔ ‘I’ (Bar2010:C:1017-1)
Monic: Nyah Kur ʔɔːʔ ‘this’ (Huf1971:C:6019-790-3)
Nicobarese: Car òṅ /ʔɔ̃/ ‘him, her’
Palaungic: Lawa ʔauʔ ‘I’ (Mit1972:C:Dif1980~8), Danaw ʔoʔ¹ ‘I’ (Luc1964:C:D326), Lamet ʔɔːʔ ‘I, my’ (Lin1978:C:251), Hu ʔɔ́ʔ ‘I’ (Sva1991:C:151), pPalaungic *ʔɔːʔ ‘I’ (Sidwell 2015 § 28)


3P/DEM
Sid2015: *ʔuːʔ 3P/DEM
Aslian: Jahai ʔoʔ ‘he’ (Bur2005:C:1770), Kensiu ʔuʔ ‘it’ (Pha2006:C:336-1)
Bahnaric: Tampuon ʔəu ‘other (people)’ (Huf1971:C:4068-537-12), Su’ ʔuː ‘thing’ (Sidxxxx:C:Sid2003~3622), Laven ʔuo ‘respect title for middle-aged or younger man’ (Jac2002:C:34)
Katuic: pKatuic *ʔuː ‘3 person pronoun’, Pacoh ʔuː ‘3S; respect’
Khasian: Khasi u ‘he’; Pnar ʔu ‘he, masc. singular accusative marker’ (Cho2004:C:1053)
Mangic: Bolyu ʔaːu⁵⁵ ‘I, me’ (Edm1995:C:1129) (? or < *ʔɔːʔ)

3P/DEM
Shorto: §65a *t₁iʔ ‘that, there’
Sid2015: *tVʔ, *tVh 3P/DEM
Aslian: Semang (ha’) teh ‘there’ (Skeat & Blagden 1906 T 54 (a)).
Bahnaric: Bahnar tiː, Sedang taj ‘up there’, Sre ti ‘that (spoken of, past)’, Chrau tiʔ (!) ‘there, yonder’
Katuic: Bru tih ‘there (far)’ (Mil1976:C:Sid2005~860-2), tih ‘up there’ (Smi1970:C:3463), Pacoh tih ‘up there, ahead’ (Wat2009:C:5309)
Khasian: Khasi -tei ‘that up there, the aforesaid’, -thie ‘that down there’
Monic: Middle Mon te’ ‘there, then, that, those’, Mon teʔ ‘there (yonder)’ (Huf1971:C:5932-777-3)
Pearic: Samre tih ‘there’ (Por2001:C:1829-4), teh ‘there (yonder)’ (Huf1971:C:5938-777-15)
Comment: This and the following set may be linked as a pair with vowel alternation correlating with difference in proximity/elevation from ego.

3P/DEM
Shorto: §66a *tɔʔ ‘that, there’
Sid2015: *tVʔ, *tVh 3P/DEM
Aslian: Semnam tuʔ ‘he, she, it’ (Bur2009:C:301)
Bahnaric: Stieng tɔːu, Biat tɔː ‘that, there’, Bahnar tɔː ‘that, there (far away)’
Katuic: Ngeq tuh ‘over there (same level)’ (Smi1970:C:3730), tɔh ‘over there, south, underneath’ (Smi1970:C:3306), Pacoh tuh ‘yonder’ (Wat2009:C:5588), tɔh ‘down there, downstream’ (Wat2009:C:5396)
Khasian: Khasi (hang)to ‘there (mooted, near at hand)’, (u)to ‘he, that (near)’
Khmeric: Khmer dɔː relative particle
Monic: Mon teʔ ‘there (yonder)’ (Huf1971:C:5932-777-3)
Pearic: Chong (of Ban Thung Saphan) teh ‘there (distant)’ (Huf1985:C:1829), Chong teh ‘there (yonder)’ (Huf1971:C:5938-777-15)
Vietic: Vietnamese đó ‘that, there’

Other Referential Terms

Note: AA aligns with the areal tendency for demonstratives with n- onset and high front vowel (compare Thai níː, Malay ini, Cantonese ni⁵⁵, and others, ‘this’). Additionally, there is a tendency within AA for paradigmatic and/or expressive vowel variation indexing proximal/distal degrees, with typically higher and fronter articulations indicating greater proximity, and vice versa. This suggests, for example, that the difference between Shorto’s *niʔ and *nɔʔ was one of proximity, with the latter indicating perhaps a medial proximity.

table 10.4 Referential terms
DEM PROX         §91.A *niʔ, 91.B *nih ‘this’
DEM (MED ?)      §92.A *nɔʔ, 92.B *nɔh ‘this’
‘self’           §87 *ɗeʔ; *ɗeh reflexive pronoun.
‘many, much’     §737.A *gluŋ, 737.B *gl(i)ŋ ‘much, many’

DEM PROX
Shorto: §91.A *niʔ, 91.B *nih ‘this’
Sid2015: *nVʔ, *nVh 3P/DEM (also subsumes the following set, assuming vowel alternation indexing proximity)
Aslian: Semai ne ‘just this’ (Mea1987:C:1959)
Bahnaric: Stieng neːi, niː ‘this, here’, Sre ne ‘there’, Bahnar ʔnɛj, ʔniː ‘that’, Laven neʔ, neː ‘this’ (Huf1971:C:6018-790-9), Alak neː ‘this’ (The2001:C:alk-564), Cua ʔɨŋ nɛʔ ‘this’ (Mai1981:C:3535), pBahnaric *niː ‘he, it, this’ (Sid2011:R:574)
Katuic: Kuy nìː, Ngeq neː ‘this’ (Smi1970:C:1965), Katu-Phuong neʔ ‘this, this one’ (Cos1971:C:Sid2005~675-2), pKatuic neʔ, neː ‘this’ (Sid2005:R:675)
Khasic: Khasi une, kane ‘this (near me)’, Amwi nǝ ‘this’ (Wei1975:C:1792), Pnar ʔu ni ‘this (masc.)’ (Bar2010:C:2427-1), pKhasic *ne ‘this’ (Sid2012:R:91.A), *ʔəni ‘this’ (Sid2012:R:740)
Khmeric: Old Khmer neh, Middle Khmer neh ṇɛḥ, nìh neḥ, Modern Khmer nih ‘this’ (Huf1971:C:6016-790-1)
Khmuic: Kammu-Yuan nìʔ ‘this near at hand’, Mal neː ‘this, here’ (Fil2009:C:2104), Mlabri niː ‘this’ (Ris1995:C:960), pKhmuic *niʔ ‘that’ (Sid2013:R:366)
Munda: Sora -ne- in e.g. ten-ne- ‘here’, Kurku ini ‘this’, &c
Nicobaric: Central Nicobarese əne ‘that (pronoun)’
Palaungic: Riang-Lang ni² ‘this’, U ní ‘this’ (Sva1988:C:147), Lamet neʔ ‘that’ (Nar1980:C:216), pPalaungic *neʔ ‘this’ (Sidwell 2015 § 545) (Note: Shorto > 91.B)
Vietic: Mương nì ‘this’; Ruc niː ‘this’ (Phu1998:C:843)
Comment: The -h coda of Khmer forms may be due to analogy with other demonstratives rather than any hypothetical pAA alternation. Vietnamese này ‘this’ also belongs in this entry as the diphthong corresponds to earlier Vietic *-i, as in more conservative varieties of Vietnamese. In Munda, most languages have a deictic of similar shape, e.g. Sora proximal -ne (Anderson p.c.).

DEM (MED ?)
Shorto: §92.A *nɔʔ, 92.B *nɔh ‘this’
Sid2015: *nVʔ, *nVh 3P/DEM
Aslian: Semelai nɔʔ ‘here’, nɔʔnɔʔ ‘this’, Temoq ʔanɔʔ ‘this’
Aslian: Semnam nɔh ‘this’
Bahnaric: Bahnar (ʔ)nɔh ‘here, this’
Bahnaric: Chrau nɔʔ (!) ‘there near at hand’, Bahnar ʔnuː, ʔnɔw, ʔnəw ‘here, this’
Katuic: Kuy nàu ‘he, she’, Bru nə̀w ‘here, she, who’
Khmeric: Old Khmer noḥ, Middle Khmer nɔh ṇoḥ ‘that, there’, nùh noḥ ‘that’
Khmuic: Kammu-Yuan nɔ̀ː pronoun 3 plural
Monic: Mon -nɔʔ ‘this’, Middle Mon ‘ano’ /ənɔʔ/, Modern Mon ənɔʔ ‘here’
Munda: Santali no-ko ‘these’, no-a ‘this.inanimate’ (Anderson 2008: 45)
Palaungic: Mae Sariang (saŋeʔ) nɔʔ ‘to(day)’, Praok nɔ pronoun 3 singular, pPalaungic *nɔʔ ‘this (prox.)’ (Sidwell 2015 § 543)
Vietic: Vietnamese nọ ‘this’
Comment: There are phonological irregularities among the coda consonants, in particular unexpected zero and -h, although these may reflect Khmer influence; *nɔʔ is essentially sufficient to account for these diverse reflexes.



Deictic (doubtful pAA)
Shorto: §1435a *ʔ(əj)ʔ; *ʔ(əj)h; *h(əj)ʔ deictic
Aslian: Kintaq Bong ʔəh ‘this’, ʔə̃h ‘here’
Bahnaric: Biat iː locative pronominal head (?), Bahnar ɛj ‘that near at hand’
Khmeric: Khmer (s)ʔɤj, ʔvɤj ‘what?’
Nicobaric: Central Nicobarese ẽh ‘near, close, this’, Nancowry ʔɛh ‘near’
Palaungic: Riang-Lang e¹ ‘that’, Lawa Bo Luang ʔəih, Lawa Umphai ʔeh, Mae Sariang ʔɛih ‘this (year)’
Vietic: Vietnamese ấy ‘that’
Comment: Shorto may have erred, being too keen to compare -j and -h codas, confusing potentially distinct or related etyma. The forms assembled under §1435a may well reflect two or three distinct etyma.

Deictic (doubtful pAA)
Shorto: §1475 *n₁aːj deictic.
Khmeric: Khmer nìːəj ‘on the far side, over there’
Khmuic: Kammu-Yuan nàːj ‘that’, Khmu Cuang naːj ‘that’
Vietic: Vietnamese này ‘this, these’
Comment: The correspondence between Khmer and Kammu forms is regular but trivial, so borrowing cannot be ruled out. Also the Vietnamese form may belong to *niʔ.

‘self’
Shorto: §87 *ɗeʔ; *ɗeh reflexive pronoun.
Bahnaric: Chrau dɛː belonging to, Bahnar ɗɛː indefinite pronoun, Bahnar dəh 3rd person possessive pronoun, Bahnar kədih reflexive pronoun
Khasic: Khasi (la)de
Khmuic: Kammu-Yuan teː general pronoun
Monic: Old Mon ḍeḥ /ɗeh/, Modern Mon deh 3rd person pronoun (low honorific form)
Munda: Gtaʔ =ɖæʔ reflexive (Anderson p.c.), Juang -ɖɛrɔ ‘reflexive pronoun, (self)’ (?) (Pinnow1960beitraege:C:i431)
Nicobaric: Nancowry dēde, rēre ‘self’
Palaungic: Palaung de, Riang-Lang dɛʔ¹, Praok ti reflexive pronoun, Lawa Bo Luang teʔ, Lawa Umphai tɛʔ



‘self’ (doubtful pAA)
Shorto: §483 *ʔeːŋ ‘self, oneself’
Monic: Mon ‘iṅ ‘oneself’, Thailand Mon ʔəɲ ‘oneself’ (Huf1971:C:4024533-4); Note: not attested in Myanmar Mon, likely borrowed from Thai (ultimately Khmer). Nyah Kur ʔéːɲ ‘oneself’, also ‘I, myself’ (see above) (The1984:C:4061-1)
Khmer: Khmer ʔaeŋ ‘self, oneself, this very’
Katuic: Kuy ʔeːɲ ‘(one)self, alone’
Bahnaric: Stieng iːŋ ‘oneself, alone, individual’, Biat eːŋ ‘in person, alone’
Comment: This etymon is found also in Thai and Lao, but is missing from Tai languages closely related but further afield such as Shan. Possibly the Khmer form diffused into SWTai and some AA groups.

‘many, much’
Shorto: §737.A *gluŋ, 737.B *gl(i)ŋ ‘much, many’
Bahnaric: Halang gəgluŋ gəglaŋ ‘many animals’ (Coo1976:C:867), kəliːŋ ‘very much’ (Coo1976:C:1776), Sapuan klaŋ klaŋ ‘much, extremely’ (Jac1999:C:208) (< Mon?)
Katuic: Ngeq klɨŋ ‘many’ (Smi1970:C:1402), Souei klɨŋ ‘many’ (Huf1971:C:3822-509-8), Pacoh klɯŋ ‘many (of people)’ (Wat2009:C:1887), Katu An Diem klɨəŋ klaːŋ ‘many’ (Cos1971:C:1911-1)
Khasi: Khasi kyllong ‘very big’
Monic: Middle Mon gluiṅ, Old Mon gluṅ, gloṅ /glɯŋ/, Modern Mon klàŋ ‘to be much, many’, Nyah Kur khlɤ̀ŋ ‘much, many’ (The1984:C:1203-1)
Comment: Shorto also lists Palaung lɯŋ ‘(animals) to be many’ but this is much more likely to be a borrowing of Shan lɤ̀ŋ ‘to be plentiful’.


Interrogative Terms

There are few interrogative terms reconstructable to pAA, but those below are widespread in the family and thus solid AA etyma.

table 10.5 Interrogative terms
Q     §46 *(ʔ)ciʔ relative/interrogative pronoun
Q     §136.A *m(o)ʔ ‘what’
Q     §136.B *m(o)h ‘what’

Q
Shorto: §46 *(ʔ)ciʔ relative/interrogative pronoun
Bahnaric: Sre chi ‘it, which’, Sre nchi ‘what, which’
Monic: Nyah Kur [Southern] cíiʔ ‘how many, how much’ (The1984:C:6493), Mon [Myanmar] mùʔ-ciʔ, [Thailand] mòa ciʔ ‘how much’ (Huf1971:C:3070-414-3)
Munda: Mundari, Birhor, Ho ci ‘what, interrogative suffix’ (Pinnow 1959: 119), Santali ce:ˀd ‘what’ (suresh2002santali:C:i246), Santali (Singhbhum) cet˺ ‘what’ (Minegishi2001santali:C:c1.p62.r.i397), Santali ceˀd ‘what’ (Pinnow1959versuch:C:c2.p119.i1407.sV222)
Nicobarese: Nancowry ciː ‘who’ (Rajasingh 2016: 39), chya ‘what, why?’ (Man1889:C:473)
Palaungic: Palaung se ‘what (relative/interrogative), anything’
Vietic: Mương chi, Vietnamese chi, gì ‘what; anything’
Comment: There are serious phonological problems with this etymology, yet the confluence of form and meaning across branches is striking. pAA onset *c shifted generally to /s/ in South Bahnaric, Munda, and Palaung, hence Shorto’s tentative preconsonantal glottal stop hypothetically blocking the lenition.

The following two sets are phonologically similar and may be etymologically related, suggestive of an alternation of codas ʔ~h similar to what we find among the Khmer demonstratives. However, this is speculative and we treat these separately below.

Q
Shorto: §136.A *m(o)ʔ ‘what’
Bahnaric: Chrau mɔʔ ‘what, why’, Biat mɔh ‘which, why?’, Alak mɔʔ c’ɔʔ ‘how, why’ (Huf1971:C:3041-410-11)
Khmuic: Kammu-Yuan mɔ̀ʔ ‘who’
Monic: Old Mon mu, mo’ /mɯʔ/, Modern Mon mòʔ ‘what?’, Nyah Kur múːʔ ‘what, why, how, anything, something’ (The1984:C:381-11)
Munda: Bondo (Plains) ma ‘what, whatsoever, any, something, nothing, what for’ (bhattacharya1968bonda:C:c1.p102.r1.i2041.s2036), Bondo (Hill) ma ‘what’ (fernandez1963standard:C:c1.i953.sN1317), Bondo ma ‘what’ (gopalakrishnan2002bonda:C:i246), Gta’ mɛe ‘what, which (of a number)’ (chatterji1963gata:C:i.s10661)
Palaungic: Palaung mɔ ‘what, which, where, when?’, Praok mɔ ‘who, which?’, pPalaungic *mɔ/ɤːʔ ‘any’

Q
Shorto: §136.B *m(o)h ‘what’
Aslian: Semai maːh, Semaq Beri hmɔh ‘what?’, Semelai hmɔh ‘what’ (Kru2004:C:1282)
Katuic: Souei mɔh ‘to ask’ (Fer1974a:C:Sid2005~1041-2), Ngeq Chatong tamuh ‘ask, ask for, question, greet’ (The2001:C:Sid2005~1041-8), Kantu timuah ‘ask, ask for, question, greet’ (The2001:C:Sid2005~1041-9), Bru ʔamɒ̤h ‘to ask, ask for’ (The1980:C:4210)
Khmuic: Kammu-Yuan məh ‘what?’, Khsing-Mul təmoh ‘where’ (Pog1990:C:724), həlmoh ‘what’ (Pog1990:C:351)
Palaungic: Palaung mɔh ‘any’

Q (Doubtful pAA)
Shorto: §1855 *laːw ‘which?, what?’
Aslian: Kensiu ləw, Temiar loʔ ‘what?’, Semnam loʔ ‘what?’ (Bur2009:C:1208)
Monic: Old Mon lhāw /lhaw/, Modern Mon lɒ ‘which?, what?’
Vietic: Vietnamese nào ‘which?’
Comment: It is not clear why Shorto included Viet. nào in this etymology, since the n:l correspondence is not regular. Assuming Vietnamese is unrelated, this is limited to Southern-AA. The Old Mon form (l)h(ā)w occurs only once in the corpus and the reading is not certain. Shorto (1971: 344) compares Sre -löh, Pangan -löō, Bah. liliö ‘what, how, why’.


Locative Terms

There are many items in this category, including various strong pAA candidates. In some cases, they have no source content words, while others are clear instances of grammaticalization from concrete words, such as body part terms. They have undergone additional semantic change and grammaticalization in various languages, sometimes obscuring the original semantic features.

‘middle’
Shorto: §#85 *ɗiːʔ ‘middle, in’
Aslian: Jahai padeʔ, Temiar pʌdɪʔ ‘middle’
Bahnaric: Sre də dative particle (locative particle, Dournes 1950), Biat diː (dɔl) ‘(in the) middle’, Bahnar aneʔ ‘in the middle’, Sedang tadḛj ‘middle’, Cua kadiː ‘half, middle’ (Mai1981:C:373), Laven kdɛj ‘middle’ (Jac2002:C:693), Tampuon ʔnti̤ʔ ‘middle’ (Cro2004:C:15811C), Nyaheun ttiː ‘middle’ (Jac2003:C:Sid2003~4672)
Katuic: Kuy nthìː, Bru ndiː ‘middle’, Katu-AnDiem tadiː ‘half’, Pacoh tər.ɗiː ‘in the middle, half full’
Khmeric: Khmer ptej ‘stomach’
Khmuic: Kammu-Yuan taː, təː ‘at’, Thin dă, də̆ ‘in’; Kammu-Yuan tərtìʔ ‘between’
Monic: Old Mon ḍey /ɗɔy/ locative particle, Nyah Kur dɤ́j ‘middle’, Old Mon pḍey ‘inside’, Middle Mon pḍay, Modern Mon doa ‘in’, Old Mon tirḍey, Modern Mon hədoa
Mangic: Mang ɗøː⁶ ‘stomach’ (Loi2008:C:288)
Munda: Sora tə’rɑːŋdiː-, tɑ’rɑːŋdiː-n ‘middle’ (Pinnow 1959).
Palaungic: Palaung kəndi ‘middle’, U thɛ̀ʕ ‘middle’, Lamet Lampang phɯn thiʔ ‘in the middle’
Pearic: Chong toŋ daːj ‘middle’ (loan?)
Other: Proto-Austronesian *di: Malay di- locative prefix, &c. (Dempwolff 1938: 40).
Comment: It appears that the original lexical meaning is ‘middle’, variously grammaticalizing to ‘in’, ‘inside,’ among other related senses.

‘in’
Shorto: §593 *( )n₁uŋ ‘in’
Aslian: Semelai knɛŋ ‘before’ (Kru2004:C:103)
Bahnaric: Cua neŋ ‘at’, Mnong-Rölöm nəːŋ ‘facing, in front of’, Brao nɨŋ ‘up at, on top of; located at’, Sapuan, Oi nɨŋ ‘north, above’, Stieng kənuːŋ ‘in, inside’, Biat knoŋ ‘in’, pWestBahnaric *knuŋ ‘inside’ (Sid2003:R:865) (< Khmer)
Katuic: Kuy nɤŋ ‘in’, Ngeq nuŋ ‘with/and’ (< Khmer)
Khasian: Khasi neng ‘above’ (Shorto #727.C *ləŋ ‘above, on’)
Khmeric: Khmer knoŋ ‘in’, nɨŋ ‘and’, Surin Khmer nɑŋ ‘by; with, and’
Khmuic: Khmu Cuang ʔnɨŋ ‘there, that one (higher level)’ (Suw2002:C:4255), pKhmuic *ʔnəŋ ‘above’ (Sid2013:R:607)
Palaungic: Palaung nɯŋ ‘above, upstream’, Lawa nɨŋ ‘on’ (Huf1971:C:4004531-2), ‘to, toward’ (Huf1971:C:6134-805-3), pPalaungic *nə(ː)ŋ ‘on, by’ (Sid2010:R:742)
Comment: Shorto includes forms with initial /l/, but these are grouped here with #727; otherwise, the distribution of initial /n/ forms suggests an older locative with meanings related to ‘above’ or ‘in front’.



table 10.6 Locative terms
‘middle’                       §#85 *ɗiːʔ ‘middle, in’
‘in’                           §593 *( )n₁uŋ ‘in’
‘in, at, towards’              §67 *t₁uːʔ ‘in, at’
‘above, on’                    §727.A *luŋ, §727.B *luəŋ ‘above, on’
‘belly’                        §1743 *dul; *duəl; *dəl ‘middle, belly’
‘belly’                        §735 *kluuŋ (& *kluŋ?); *(k)luəŋ ‘middle, belly’
‘back’                         §1844 *krawʔ ‘back, behind’
‘near’                         §2014.A *t₂ɗih, 2014.B *t₂ɗəh, 2014.C *t₂ɗiʔ ‘near’
‘to reach, towards’            §1483 *b(oː)j ‘direction, towards’
‘side, edge’ (? < ‘elbow’)     §504.A *ɟkiː(ŋ), §504.B *ɟkiə(ŋ), § 504.C *ɟkai(ŋ) ‘side, edge’

‘above, on’
Shorto: §727.A *luŋ, §727.B *luəŋ ‘above, on’
Aslian: Sakai (gua)-long ‘on top’ (i.e. Jah Hut; Skeat & Blagden 1906 A 9), Semelai leŋ ‘upwards, upstream’ (Kru2004:C:1241)
Katuic: Kuy lɔ̀ːŋ ‘(to go) high up, lofty’
Khmeric: Khmer laəŋ ‘to rise up’ (↔ Thai lə̌ːŋ ‘going up too high’)
Monic: Old Mon cloṅ /clɔŋ/ ‘highest point, spire’
Munda: Sora ’laŋkaː-n ‘above’, loːŋ- ‘inside’, Juang aliŋ ‘top’ (Pinnow1960beitraege:C:i51)
Palaungic: Riang-Lang lɔŋ² ‘in’, Praok loŋ ‘above’
Vietic: Ruc liːŋ¹ ‘top, upstream’ (Fer2xx7:C:932-3), lîŋ cɨ́ːt ‘mountain top’ (Phu1998:C:708), Pong leːŋ¹ ‘to mount’ (Fer2xx7:C:931-5).
Comment: Shorto includes forms with initial /n/, but these are grouped here with #593. Shorto also connects forms with initial /p/ by infixation, but this is unconvincing; they probably reflect a separate, unrelated etymon (see below).

‘above, on’
Aslian: Semnam kipaːŋ ‘upper side’ (Bur2009:C:1183), Jahai krpiŋ ‘above’ (Bur2005:C:833), Kensiu kɛpiŋ ‘on’ (Pha2006:C:419-1)
Bahnaric: Bahnar kəpəŋ ‘above, on top of’, Tampuon pəŋ ‘in, at’, Alak pəːŋ ‘in, at’, Halang pèːŋ ‘above’, Jeh pèːŋ ‘upper side’, Brao pɨːŋ ‘on top of’, Jru’ pɨəŋ ‘above, on top’, pWBahnaric *pɨːŋ ‘over; summit’ (Sid2003:R:1032), pBahnaric *paŋ ‘above’ (Sid2011:R:605)
Katuic: Kuy pɑːŋ ‘on, above’, Bru pəːŋ ‘above’, Ngeq pəːŋ ‘upstream’, pKatuic *pəːŋ ‘above, upstream’ (Sid2005:R:382)
Khmeric: Surin Khmer pʌːŋ ‘to lift up, raise’ (Dha1978:C:3080)
Khmuic: Mlabri pɤːŋ ‘to tilt, be oblique’ (Ris1995:C:1018)
Comment: The above comparisons strongly suggest an Eastern AA proto-form *pVŋ ‘above’ with some subsequent dialect borrowing confusing the regularity of vowel correspondences.

‘belly’
Shorto: §1743 *dul; *duəl; *dəl ‘middle, belly’
Aslian: Central Sakai pĕduâl ‘centre’, Temiar eijhdel ‘intestines, guts, bowels, entrails’ (Mea1998:C:774)
Bahnaric: Stieng kənduːl, Sre (kə)ndul, Chrau kəndɯl, Biat ndul ‘belly’, Biat (diː) dɔl ‘(in the) middle’, Stieng kənɔːl ‘middle’, Laven pdəl ‘stomach’ (Jac2002:C:1096), Tarieng ʔadil ‘stomach’ (The2001:C:kgc-169), pBahnaric *-dɨl ‘stomach’ (Sid2011:R:169)
Katuic: Ngeq tṳl ‘stomach’ (The2001:C:Sid2005~1374-1), pKatuic *-dul ‘stomach’ (Sid2005:R:1374)
Khmeric: Khmer tùl ‘belly flesh (of certain fish)’, kɔndaːl ‘middle’ (?, with secondary lengthening; if so, → Chrau kənɗaːl)
Monic: Middle Mon dor /dow/, Modern Mon tò ‘middle’
Munda: proto-Kherwarian *tɔl ‘middle’ (Munda1968proto:R:c3.p1218.i140-2)
Palaungic: Lawa Bo Luang tu ‘intestines’, Praok tu ‘belly’, U tû ‘belly, stomach’ (Sva1988:C:91), Lamet kətɨl ‘belly’ (Lin1978:C:68), pWaic *kdɤl ‘stomach’ (Dif1980:R:L11-1), pPalaungic *kɗəl ‘belly’ (Sid2010:R:477)
Vietic: Ruc kdəːl ‘belly’ (Phu1998:C:421), Tho (Cuoi Cham) duːl³ ‘belly, stomach’ (Fer2xx7:C:551-5), Malieng kudəl¹ ‘g’ (Fer2xx7:C:5511).2
Comment: Semantic development from ‘belly’ to ‘middle’ is likely on the basis of attested grammaticalization paths.

‘belly’
Shorto: §735 *kluuŋ (& *kluŋ?); *(k)luəŋ ‘middle, belly’
Aslian: Kenaboi bûlang ‘belly’ (Skeat and Blagden 1906 B 162)
Bahnaric: Stieng kluːŋ ‘(in the) middle’, Sre kluŋ ‘stomach’
Katuic: Bru kloŋ ‘inside’ (The1980:C:Sid2005~1385-2)
Khmuic: Khmu Cuang kluəŋ ‘in, inside’ (Suw2002:C:4312), O’du kluŋ ‘in, within’ (Dan1983:C:256)
Palaungic: Riang-Lang kluŋ¹ ‘belly, womb’
Vietic: Mương Hoa Binh kloːŋ⁴ ‘belly’ (Fer2xx7:C:1197-10), Pong kluŋ ‘belly’ (Fer2xx7:C:1197-7), Thavung khalûŋ ‘belly’ (Suw2000:C:643), Vietnamese trong ‘in’
Comment: Shorto compares Vietnamese lòng ‘intestines, heart’. While similar to Sino-Vietnamese trung ‘middle’, the vowel difference cannot be explained, and the connection of trong with Austroasiatic (as well as the attested *kl to /ʈ/) is the more likely source.

The development from ‘belly’ to ‘inside’ appears to be restricted to Khmuic.

‘back’
Shorto: §1844 *krawʔ ‘back, behind’
Aslian: Kensiu kijɔʔ, Temiar kərɯʔ ‘back’; Semnam kjoʔ ‘back’ (Bur2009:C:35), Jahai krɔʔ ‘back (of person)’ (Bur2005:C:817)
Monic: Old Mon krow /krɔw/ ‘behind, after’, Modern Mon krao ‘to be subsequent’, Nyah Kur ŋkráw ‘in the rear, behind’ (The1984:C:3834-1)
Palaungic: Lawa Umphai (ka)ŋgroʔ, Mae Sariang (ɣa)ŋgjoʔ ‘back’, Danaw təkjɛn² ‘back of body’ (Luc1964:C:D-729)
Vietic: Mương khau, Vietnamese sau ‘behind’
Comment: Shorto compares Burmese kro ‘back’, a comparison that does not hold, as the spelling in Burmese is kjo rather than kro (both /ʨɔ̀/).

‘near’
Shorto: §2014.A *t₂ɗih, 2014.B *t₂ɗəh, 2014.C *t₂ɗiʔ ‘near’
Aslian: Mendriq pədəh, Lanoh pələndɔh ‘near’
Aslian: Kensiu tədeh ‘near’, Semelai ddɛs ‘near’ (Kru2004:C:766)
Bahnaric: Sre, Biat dih ‘outside’ (?)
Bahnaric: Stieng dəːh ‘near’
Khmuic: Phong kdiəh ‘near’ (Bui2000:C:748)
Munda: Asuri aɖeː ‘near, towards’ (Konow1906asuri:C:i3), Koda hɛdɛ ‘near’ (Kim2010santali:C:c11.p109.r282)
Nicobaric: Car röh-ta ‘to be near’ (Whi1925:C:5099)
Palaungic: Praok de, Lawa Bo Luang sandaiʔ, Lawa Umphai, Mae Sariang sandiʔ ‘near’
Vietic: Tho (Cuoi Cham) dəː¹ ‘near’ (Fer2xx7:C:72-6), Pong sdəː ‘near’ (Fer2xx7:C:72-2)
Comment: Note Malay pada ‘on, at’ as a possible source of Aslian forms. Shorto also compares Kammu-Yuan les ‘near’, but this is unlikely to be related due to the l- onset.



‘to reach, towards’
Shorto: §1483 *b(oː)j ‘direction, towards’
Bahnaric: Bunör bəːj, Central Rölöm pəːj ‘at the point of’, Biat bəːi ‘nearly’
Khasian: Khasi poi ‘to reach’ (Sin1906:C:4432), Pnar pɔj ‘to reach’ (Bar2010:C:1099-1)
Monic: Old Mon boy /boj/ ‘direction, location, manner’, biboy ‘towards, in accordance with’, Modern Mon pòa adverbial phrase head.
Comment: We assume that the Khasian semantics indicate the original lexical meaning, with other senses demonstrating grammaticalization. Shorto also compared Vietnamese về ‘towards/return’, but it is rather from proto-Vietic *ve:r ‘to return’ (Fer2xx7:R:473) (see Shorto §1669 ‘to go round, to turn round’).

‘behind, after’ (doubtful pAA)
Shorto: §1505.A *ruj, 1505.B *ruːj ‘behind’
Katuic: Kuy krɔːi ‘afterwards, later’ (< Khmer?)
Khmeric: Khmer kraoj ‘behind, after’, Surin Khmer krɔːj ‘afterwards, later; backward, behind’ (Dha1978:C:846)
Palaungic: Palaung krɯj ‘(time) before’
Vietic: Vietnamese rồi ‘to finish; already, afterwards’ (both the phonological and semantic correspondence of the Vietnamese form make the connection doubtful; the usually observed etymological development is ‘be finished’ > ‘then’ > ‘afterwards’, which would be reversed in Vietnamese in this case)
Comment: Viet. rồi lacks initial k- (*kr- > ʂ-). Perhaps the kr- onset in the others is due to contamination by *krawʔ ‘back’ (§ 1844). It is also almost certain that the Kuy form is borrowed from Khmer, and this is also possible in respect of Palaung, as this etymon is apparently absent elsewhere in Palaungic. This leaves this etymon restricted to Khmer and (more doubtfully) Vietic.

‘above’ (doubtful pAA)
Shorto: §1703 *kaːl ‘in front, before’
Aslian: Temiar kal ‘later on, subsequently’ (Mea1998:C:1248)
Khmuic: Kammu-Yuan káːl ‘before’, Khmu Cuang kaːl ‘in front of’ (Suw2002:C:2583)
Palaungic: Praok ka ‘first, before, until’, Lawa Bo Luang, Lawa Umphai ka ‘in front, before’
Comment: Shorto adds Sakai kâl ‘tomorrow’ (i.e. Semai), (Skeat & Blagden 1906 M 178). The semantics of the Aslian comparison are somewhat problematic, so perhaps it is an accidental lookalike with an otherwise Northern AA etymon. Also note, the loss of coda -l is regular in Waic.

‘in, at, towards’
Shorto: §67 *t₁uːʔ ‘in, at’
Bahnaric: Stieng tuː ‘in, at, with’, Chrau tuː ‘at, to’, Bahnar təː ‘to, towards’, Sedang tɔ ‘also’ (Smi2000:C:64), Mnong Rölöm tɔː ‘at, to, with, by, for, pertaining to, and’ (Blo2005:C:5514)
Katuic: pKatuic *təʔ ‘come; at; more’ (Sid2005:R:396), Pacoh toʔ ‘in; at; to; until’ (Wat2009:C:5513), Kui tɔʔ ‘until’ (Huf1971:C:6345-829-6), Ngeq təʔ ‘at’ (Tho1978b:C:Sid2005~396-7), Bru tɤ̀ʔ ‘arrive’ (Huf1971:C:582-72-14), Katang tɨəʔ ‘to come’ (Mil1988d:C:Sid2005~409-7)
Palaungic: Palaung tə (in senses) ‘in (to)’, Riang-Lang tuʔ¹ (in senses) ‘in (to)’, Lawa Bo Luang tauʔ, Lawa Umphai, Mae Sariang toʔ ‘middle, in the middle of’, Praok daə ‘in (to)’, pPalaungic *tuːʔ ‘inside’ (Sidwell 2015 §839)
Comment: Katuic forms such as Bru and Katang with meanings ‘to arrive’, ‘to come’ perhaps indicate the original verbal meaning of this grammaticalized etymon. Note that forms included in Sidwell’s (2005) set §396 with meaning ‘more’ probably reflect a separate etymon.

‘side, edge’ (? < ‘elbow’)
Shorto: §504.A *ɟkiː(ŋ), §504.B *ɟkiə(ŋ), § 504.C *ɟkai(ŋ) ‘side, edge’
Bahnaric: Sre kiŋ ‘edge, direction’, Biat keːŋ (mɛːŋ), Jeh kiːŋ ‘edge’, Sre səkiŋ ‘on one’s side, to one side’, Chrau ŋkeːɲ ‘on one’s side’, Biat ŋkeːŋ ‘on one’s side; to lean over’, Halang kəniːŋ ‘edge’, Tampuon kɛːŋ ‘direction, side’ (Cro2004:C:617-1C)
Katuic: Kuy khɛː̀ɲ ‘on one’s side, to one side’, ŋkhɛː̀ɲ ‘to tilt, lean’, Bru sakèːŋ ‘to tilt’, Katu Triw paŋgɛːŋ ‘leaning to one side’ (The2001:C:Sid2005~593-4)
Khasic: Khasi kynring ‘by the side, towards the side’
Khmuic: Khmu NgheAn waːŋ skɛːŋ ‘place a thing on its side’ (Suw2002:C:2949), Phong taːʔ taŋ skʰɛːŋ ‘lie on the side’ (Bui2000:C:1735)
Munda: Sora ’sʔeːŋ-ən ‘side, direction’, Kharia si’niŋ ‘side, direction’
Palaungic: Lawa ŋgiəŋ ‘side’ (Huf1971:C:5197-687-1)
Vietic: Sách təkeːŋ² ‘on the side of’ (Fer2xx7:C:1070-2), Rục cəkeːŋ² ‘on the side of’ (Fer2xx7:C:1070-3), Thavung cakɛ̀ːŋ, cakhɛ̀ːŋ ‘to lean to one side, side’ (Suw2000:C:103), Pong tkɛːŋ ‘on the side of’ (Fer2xx7:C:1070-7)
Comment: Notwithstanding Shorto’s reconstruction of three phonologically distinct rhymes, most if not all of the above forms probably reflect a single AA etymon which is also attested in terms for ‘elbow’ under Shorto §891, also pKhmuic *kiəŋ ‘elbow’ (Sid2013:R:242), pKatuic *kɛɛŋ/*trkɛɛŋ ‘elbow’ (Sid2005:R:604), pBahnaric *kiəŋ ‘elbow’ (Sid2011:R:390), Nyah Kur kéːɲ ‘elbow’ (The1984:C:1594-1), Nancowry det-ongkēang ‘elbow’ (Man1889:C:630), Khmer kaeŋ daj ‘elbow’ etc.

‘to follow; according to’ (doubtful pAA)
Shorto: §1346 *(ʔ)t₁aːm ‘according to’
Bahnaric: Halang taːm ‘to run after and capture; to meet; overtake’ (Coo1976:C:3381), Nyaheun taːm ‘together’ (Fer1998:C:1158), ‘to catch up with swm.’ (Jac2003:C:Sid2003~443), Lavi taːm ‘on time, in time’ (The2001:C:Sid2003~440), Brao taːm ‘after, behind’ (Kel1977:C:Sid2003~445)
Khmer: Khmer taːm ‘to follow; according to’
Monic: Mon tāṁ ‘according to’ (listed in Halliday’s dictionary, marked as used by Mon in Siam); not attested in Myanmar Mon (likely a loan from Thai)
Munda: Sora tam-, a particle generally prefixed to berna:-n, as in tam-berna:-n ‘according to thy word’ (Ramamurti 1986: 276)
Pearic: Chong taːm ‘follow’ (Huf1971:C:2440-325-17)
Comment: Shorto suggests borrowing: Mon → Thai → Khmer; however, it is more likely that the non-Munda AA forms are all borrowed from Tai, e.g. Thai taːm ‘to follow; following along, according to’, pTai *tam ‘to follow, continue (as is)’ (Li 1977). The Munda comparison is dubious.


Grammatical(ized) Verbs

Only some of the verbs in this section were likely grammatical in pAA, but they all exhibit enough grammaticalization tendencies to warrant attention in studies of AA historical grammar.

Table 10.7 Grammatical(ized) verbs

‘to be’: §2046.A *muh, 2046.B *muəh, 2046.C *muʔ ‘to be’
‘to lack, cease’: §943.A *ʔət, §943.B *ʔəːt, §943.C *(ʔ)it ‘used up, finished, lacking’
‘to give’: §1119.A *ʔun, §1119.B *ʔuːn, §1119.C *ʔuən, §1119.D *ʔan, §1119.E *ʔaːn ‘to give’
‘to obtain’: §1179 *( )ɓ(ɯə)n ‘to get, obtain’
‘to pass’: §1200.A *lun, 1200.B *luən, 1200.C *lən (& *lan?) ‘to pass, to exceed’
‘to follow, accompany’: §1463.A *t₁uj, 1463.B *t₁uːj, 1463.C *t₁uəj ‘to follow, accompany’
‘finished’: §2066.A *lah, 2066.B (*lah-s >) *las, 2066.C *laːs ‘finished’
‘to grow’: §1219.A *hən, 1219.B *həːn ‘to grow, to increase’
‘plant, to begin’: §1343.A *t₂əm, §1343.B *t₂əːm, §1343.C *t₂am ‘plant, to grow; to begin’

‘to be’
Shorto: §2046.A *muh, 2046.B *muəh, 2046.C *muʔ ‘to be’
Aslian: Sakai moh ‘to be’, Semang moah ‘to be’, Temiar mɔʔ ‘there is’, ‘to have’ (Mea1998:C:1774)
Khmeric: Old Khmer moḥ /mùəh/ ‘that is’
Nicobarese: Car muh ‘there, thither’ (Whi1925:C:4086)
Palaungic: Palaung mɯh ‘to be’, (?) Lawa mah ‘to be’, Praok mɔ ‘to be’, Riang-Lang moʔ² ‘to remain, stay’

‘to lack, cease’
Shorto: §943.A *ʔət, §943.B *ʔəːt, §943.C *(ʔ)it ‘used up, finished, lacking’
Bahnaric: Sre ət ‘restrain, to hold (breath), suppress (cough &c.)’, Chrau ət ‘lacking, to hold (breath)’, Biat ɔt ‘to abstain from’, Bahnar ət ‘(wind) to stop, to hold (breath)’, Brao ʔɑt ‘without’ (Huf1971:C:6741-875-12), pBahnaric *ʔət ‘to stop doing’ (Sid2011:R:964)
Katuic: Kuy ʔat ‘to lack, to restrain, to hold (breath)’, Bru ʔəːt ‘to stop’ (Huf1971:C:5693-744-10)
Khmeric: Khmer ʔɔt, ʔɤt ‘to be without’
Khmuic: Khmu Cuang ʔɨt ‘suppress or hold back’ (Suw2002:C:1258)
Monic: Old Mon ’ut /ʔøt/ ‘all’, Modern Mon ɒt also ‘to be exhausted, have exhausted (or variant)’, Proto-Nyah Kur *ʔə̱t ‘used up, completely’ (Diffloth 1984 V124)
Munda: Sora (i) rə’jad- ‘to be exhausted, used up’, Gta’ aɗa- ‘to finish (doing something)’ (chatterji1963gata:C:i.s291)
Nicobaric: Car öt ‘not’, the negative particle with adjectives (Whi1925:C:4437)
Palaungic: Riang-Lang ət¹ ‘to cease’, pPalaungic *ʔɤt ‘to stop, refrain’ (Sidwell 2015 §37)
Khasic: Khasi jing-it, jynit ‘fast, abstinence from food’
Vietic: Mương hết (Barker 1966: 18), Vietnamese hết ‘to end, be finished, cease, to finish’
Comment: Shorto also compares Central Nicobarese leɛt ‘finished, to cease’, but this is doubtful considering the initial. Although Shorto reconstructs three proto-variants, a single form *ʔət probably explains all the above. Katuic forms are restricted to the western sub-branch, so they may be back-borrowings from Khmer. Semantic developments deserve further consideration (e.g. could the verbal meanings have developed from ‘to be all/complete’?).

‘to give’
Shorto: §1119.A *ʔun, §1119.B *ʔuːn, §1119.C *ʔuən, §1119.D *ʔan, §1119.E *ʔaːn ‘to give’
Bahnaric: Bahnar an ‘to give’, Tampuon ʔɔn ‘to give’ (Huf1971:C:2616-348-16); Chrau ɯn ‘to give’; Stieng, Chrau, Biat aːn ‘to give, permit’, pBahnaric *ʔaːn ‘to give, permit’ (Sid2000:R:66)
Katuic: Kuy ʔɑːn, Bru ʔɔ̃ːn ‘to give’, Souei ʔoːn ‘to give’ (Huf1971:C:2616-348-9), pKatuic *ʔoːn ‘to give’ (Sid2005:R:1115)
Khmuic: Kammu-Yuan ùːn ‘to give’; Thin ʔăn ‘to give’, Khmu-Cuang ʔan ‘to give’ (Suw2002:C:3808), pKhmuic *ʔan ‘to give’ (Sid2013:R:592)
Munda: Kharia un ‘to put, keep’
Comment: The vowel correspondences are certainly problematic, but they may be explained by various factors such as sandhi forms or postpositional phonological weakening in constructions.

‘to obtain’
Shorto: §1179 *( )ɓ(ɯə)n ‘to get, obtain’
Katuic: Kuy bɯːn ‘to get, obtain, to be able to’, Bru bəːn ‘able’ (Huf1971:C:332-39-13), Katu An Diem bʌːn ‘to have’ (Cos1971:C:Sid2005~373-14), Ta’Oi bɤːn ‘able to’ (The2001:C:Sid2005~373-6), pKatuic *ɓəːn ‘able, have, get’ (Sid2005:R:373)
Khmeric: Khmer baːn ‘to get, obtain; be able to’
Khmuic: Kammu-Yuan pɯ̀an ‘to get, to be able to’, Mlabri bɤːn ‘can; know how to; acquire’ (Ris1995:C:43), pKhmuic *bəːn ‘to get, be able’ (Sid2013:R:18)
Palaungic: Palaung bɯn, Riang-Lang bɔn¹, Praok pon ‘to get, obtain’, pPalaungic *ɓɤːn ‘to get, obtain’ (Sidwell 2015 §71)
Comment: The development ‘to obtain’ → ‘to be able’ is typologically common in multi-verb predicates.

‘to pass’
Shorto: §1200.A *lun, 1200.B *luən, 1200.C *lən (& *lan?) ‘to pass, to exceed’
Bahnaric: Biat lan ‘past, ago’
Khmeric: Khmer lùən ‘very, excessive(ly)’
Monic: Middle Mon l(w)on /lon/ ‘to elapse, be past, to surpass, exceed, exceedingly’, Modern Mon lòn ‘to go past’
Palaungic: Praok luan ‘to go past, to pass, escape’, Riang-Lang pluan¹ ‘to project’; Lawa Bo Luang loan, Lawa Umphai lɔn ‘very’, pWa-Lawa *lɒn ‘to go’ (Dif1980:R:N19-4)
Vietic: Vietnamese luồn ‘to pass, sneak (through), slip underneath’
Comment: Shorto suggested Palaung → Shan pū̀n ‘to exceed’ (due to regular loss of medial -l- in Shan), but this is unnecessary as it could have been borrowed, cf. Thai pʰón ‘beyond, exceed’. Shorto also unnecessarily complicated the phonological reconstruction; a pAA form *lɔn can probably account for all of the above reflexes.

‘to reach > until’ (doubtful pAA)
Shorto: §1740.A *dəl, 1740.B *dal ‘as far as; to reach’
Bahnaric: Biat dɔl ‘as soon as’, Bahnar dal ‘till’
Khmer: Old Khmer dāl, Modern Khmer tɔ̀əl ‘to go right through; as far as, till’, tùəl ‘(to reach) as far as, till’, dɔl ‘to arrive, reach; as far as’
Mon: Middle Mon duiw, Modern Mon tɜ̀ ‘as far as’
Munda: Bodo-Gadaba ɖel ‘to arrive’ (Zide1963gutob:C:i518.s5221)
Comment: If the Munda comparison is not sustainable, this set may be explained as a Mon or Khmer etymon that diffused locally.

‘to follow, accompany’
Shorto: §1463.A *t₁uj, 1463.B *t₁uːj, 1463.C *t₁uəj ‘to follow, accompany’
Bahnaric: West Bahnar həmɔːi, East Bahnar səmɔːi ‘in the same direction as, parallel to …’, Jeh katoːj ‘to accompany’, Brao *toːj ‘to follow’ (Huf1971:C:2447-325-14), Laven *toːj ‘to follow’ (Huf1971:C:2447-325-13), pWestBahnaric *toːj ‘to follow, accompany’ (Sid2003:R:666), Mnong *tuːj ‘to follow, imitate, according to’ (Blo2005:C:5725)
Katuic: Ngeq *toːj ‘to imitate, follow’ (Smi1970:C:3523) (? from Khmer)
Khmeric: Old Khmer toy, Modern Khmer daoj ‘to follow’ (→ Thai doːj ‘by, by means of’)
Monic: Old Mon tūy /tuj/ ‘be finished, completed’, adverbial of sequential action, ‘having …’, Modern Mon tɔe ‘finished’ also ‘then …’ (if connected, this could be an instance of degrammaticalization ‘then’ > ‘be finished’)
Nicobaric: Nancowry tój ‘next’
Palaungic: Riang-Lang tɔj¹ ‘to follow, accompany; following, along, after’, Palaung kərtɯj ‘to join (wood, cloth) together’, Riang-Lang tərtɔj¹ ‘together’, Praok sitoj ‘to be joined together, make a whole’, Palaung kərtuj ‘to join (wood, cloth) together’, pPalaungic *tɔj ‘to follow’ (Sidwell 2015 §807)
Comment: Shorto suggests Palaung → Shan tɔ́ɛ ‘(animals) to flock together’, although this is probably from Burmese twɛ̀ ‘to associate’. However, borrowing → Cham tuːj, Röglai tuj ‘to follow’ from AA is probable.

‘finished’
Shorto: §2066.A *lah, 2066.B (*lah-s >) *las, 2066.C *laːs ‘finished’
Monic: Old Mon blaḥ /blah/ ‘be relieved, come to an end; (that which) precedes or is finished, after (that), then’ (initial cluster makes the connection doubtful)
Khmuic: Khmu Cuang klah ‘to finish’ (Suw2002:C:4277)
Bahnaric: Stieng lɛh ‘finished’, klɛh ‘to finish, use up, finish (doing)’
Khasian: Pnar lat ‘to finish’ (Cho2004:C:981)
Mangic: Mang li³¹ ‘finished, late’ (Edm1995:C:163)
Munda: Gtaʔ conjunctive =la, Sora past and conjunctive =le, Kherwarian =lV anterior
Bahnaric: Bahnar klajh ‘to have finished (doing); then’, Sedang klɛj ‘end, finish, conclude, stop (preverb)’ (Smi2000:C:932), pBahnaric *klaːs ‘finished’ (Sid2011:R:404)
Comment: Perhaps also compare Juang (Munda) jela ‘to finish’. Although the phonological correspondences are problematic, the similarities strongly suggest that a pAA root *las is reflected here.

‘to grow’
Shorto: §1219.A *hən, 1219.B *həːn ‘to grow, to increase’
Aslian: Proto-Semai *hiidn ‘to grow taller’ (Diffloth 1977)
Bahnaric: pBahnaric *hɔːn ‘to grow’ (Sid2011:R:233)
Khasian: Khasi byrhien ‘(people) in large numbers’, pKhasian *həːn ‘to grow’ (Sid2012:R:1219.B)
Khmuic: Thin hɤn ‘more’, Ksing-Mul hɨːn ‘to rise, ascend’ (Pog1990:C:705), Tai Hat hɨːn ‘to go up’ (Fer1970:C:705), pPray-Pramic *hɨːn ‘to rise, ascend’ (Sid2013:R:pPP-705)
Palaungic: Palaung hən ‘to grow in height’, Riang-Lang han¹ ‘to be long’
Vietic: Mương hơn (Barker 1966: 12), Vietnamese hơn ‘to surpass, be more than’

‘plant, to begin’
Shorto: §1343.A *t₂əm, §1343.B *t₂əːm, §1343.C *t₂am ‘plant, to grow; to begin’
Aslian: Mintil toum ‘tree’
Bahnaric: Biat tɒːm, Sre, Chrau, Bahnar təːm ‘(foot or trunk of) tree, beginning’, Stieng taːm ‘to plant, sow’, Bahnar pətəːm, Hre basèm, Sedang pasiam ‘to begin’, Chrau, Biat nəːm ‘quantifier for trees’
Khmeric: Khmer daəm ‘ancient, original’, phdaəm ‘to begin’, Surin Khmer dʌːm ‘classifier for trees and plants; at the beginning, last time, before’, Angkor Khmer tāṁ /ɗam/ ‘to plant, sow, grow, erect’ (> Thai dam ‘to plant rice’)
Khmuic: Kammu-Yuan sərnɯ̀m ‘medicine’
Monic: Old Mon taṁ /tɔm/ ‘plant, tree, base, foot, beginning’ (Sho2006:C:1343-22), Modern Mon tɔm ‘base, foot, beginning’, pətɔm ‘begin’ (Sho2006:C:1343-23)
Palaungic: Palaung sɯm, səm ‘to plant’, Riang-Lang pəksəm¹ ‘to plant, lay out (garden &c.)’
Vietic: Vietnamese đâm ‘to grow, sprout’
Comment: The semantics of this root follow a consistent pattern across AA; the data indicate an original meaning of ‘a plant as something that begins (to grow once planted)’. The etymon has developed variously into nominals, verbs, co-verbs, and adverbials.

‘or’ (doubtful pAA)
Shorto: §2065 *lah ‘or’
Bahnaric: Stieng, Sre, Biat lah, Bahnar dah ‘or’
Katuic: Ngeq, Ong, Pacoh, Ta’Oi lah ‘if’
Khmeric: Old Khmer laḥ, loḥ ‘or’
Monic: Old Mon laḥ /lah/ ‘or’
Comment: This may be grammaticalized from §2063 *lah ‘to divide’. The distribution suggests diffusion from Old Khmer or Old Mon into Bahnaric, Katuic, etc.

Excluded Items

The following items have been excluded, and they are further divided according to their potential future use. For some, the data are too weak to support reconstruction to any depth in AA, but they may still have comparative value as evidence of inter-branch contact, or if additional supporting data are uncovered. In other cases, the data are so problematic that the reconstructions Shorto posited are untenable. Items are divided into two broad categories (with potentially some overlap): 1) etyma that appear to be later than pAA and have diffused across languages/branches, and 2) reconstructions that, due to difficulties of semantics and/or regularity of correspondences, appear unsustainable.

7.1 Local Etyma or Loanwords that Diffused Across AA Branches

1D
Shorto: §4 *( )ʔaʔ ‘we two’
Khmuic: Kammu-Yuan àʔ, Phong ʔeː ‘we’ (Bui2000:C:598), Khsing-Mul ʔeː ‘we’ (Pog1990:C:598), Mal ʔiaʔ ‘we (dual)’ (Fil2009:C:2298)
Palaungic: Praok a
Comment: Within Palaungic, the Praok (Wa) form is isolated, so it is best regarded as a loan into Waic from Khmuic.

‘other’
Shorto: §490 *cʔa(i)ŋ; *(c)ʔiiŋ ‘other’
Monic: Old Mon c’āṅ /cʔaiŋ/, Modern Mon həaiŋ, Nyah Kur chǝʔáːŋ (The1984:C:390-1)
Bahnaric: Stieng iːŋ, Biat eːŋ, Biat rʔeːŋ
Comment: The lexical distribution of this etymon strongly suggests a Mon innovation that was borrowed into South Bahnaric.

‘time, occasion’
Shorto: §1942 *( )las ‘time, occasion’
Bahnaric: Bahnar lah (!) ‘time, once’, Jeh (ku) llajh ‘at once’, Halang (mə)leh ‘(one) time’
Monic: Late Old Mon las ‘occasion’
Comment: This is arguably a Mon loan into Bahnaric.

‘all’
Shorto: §1943.A *las, §1943.B *laːs ‘intensive’
Bahnaric: Stieng lɛh ‘all’
Khmeric: Khmer (cbah)-lɔ̀əh ‘very (clear)’
Monic: (?) Mon lɛ̀h lɛ̀h ‘(not) at all’
Comment: Beyond Mon influence on Bahnaric, this set is phonologically irregular and thus dubious.

‘more’
Shorto: §1314 *gam (& *gəm?) ‘more’
Monic: Old Mon gaṁ /gɔm/ ‘more, further, other, besides’
Bahnaric: Sre gam ‘still, more’, Chrau gəm (vaː) ‘and’
Comment: This is notionally indicative of a Mon loan into Bahnaric.

‘equal’
Shorto: §149.A *smə( )ʔ, 149.B *sməh ‘equal, alike’
Bahnaric: Chrau səməː ‘same’, Stieng səmɯː ‘equal, similar’, Bahnar həmō ‘equal, similar, level’, Tampuon samaə ‘level, equal’ (Cro2004:C:1947-1C)
Katuic: Kuy mhəː, sməː ‘to be smooth, even, level’
Khmeric: Khmer smaə ‘equal’
Khmuic: Thin s(ə̆)mɤ ‘to be like; just like’
Monic: Old Mon smoḥ /smɔh/ ‘to be equal, alike’, Modern Mon hmuh (cɒt) ‘to agree’
Comment: Mon coda -h is not explained. Otherwise, Bahnaric, Katuic and Khmuic forms are likely borrowed from Khmer, also → Thai samə̌ː, Cham sāmū. Superficially similar Munda forms (e.g. Bodo-Gadaba soman ‘equal’) are borrowed from Indic.

‘equal’
Shorto: §1394.A *t₁rim, 1394.B *(t₁)rəm, 1394.C *t₁rəːm ‘level, equal’
Bahnaric: Sre ndrəːm ‘similar, equal’
Khmeric: Khmer trɤm ‘equal to, up to the same point as’, tùmrɔ̀əm (!) ‘from now until’
Khmuic: Kammu-Yuan trɯ́m ‘level’
Palaungic: Lawa Bo Luang ŋgrəum, Lawa Umphai ŋgreum ‘level’
Comment: Bahnaric, Khmuic, and Palaungic forms are likely borrowed from Khmer.

‘other’
Shorto: §2019 *t(rn)əh ‘other’
Monic: Middle Mon tanah, tanoh /tənɔh/, Modern Mon kənɔh
Katuic: Kuy nɑh, Bru kanɑh, Katu Phuong kanɔh ‘other, other person’ (Cos1971:C:Sid2005~1028-6)
Comment: The limited distribution of this etymon suggests a Mon etymon that was borrowed into Katuic.

Q
Shorto: §92a *nɔʔ ‘what, which?’
Katuic: Kuy nɑ̀ː ‘what?’
Khasic: Khasi -no ‘which?, some(one &c.)’
Vietic: Vietnamese nào
Comment: This set may consist of accidental lookalikes, since short forms with nasal onsets are common among referentials in AA. In particular, the Kuy form may be related to Vietnamese nào ‘which, any, every’, although nào cannot be the regular outcome of a glottal coda given the tone category.

‘above’
Shorto: §194 *lə(ː)ʔ ‘on top of, on’
Khmer: Khmer lɤː ‘on, above’ (→ Thai ləː)
Bahnaric: Stieng lɯː, Biat (aː) ləː, Chrau avləː ‘above’
Comment: Shorto compares to An: Malay (h)ulu ‘up-river, up-country’, Cham halɔw ‘head, source’. However, it is more likely an isolated Khmer etymon that was loaned into South Bahnaric.

‘above’
Shorto: §1533 *kw(əː)j ‘top, on top, above’
Khmuic: Khmu kwəːj ‘above’, Kammu-Yuan pərwə̀ːj ‘upper part, top’, pKhmuic *kwəːj ‘upper part, top’ (Sid2013:R:300)
Palaungic: Palaung kərvɯy ‘above, beyond, upper part of house, loft’, Lawa Bo Luang (haɯk) ʔawui, Lawa Umphai (hauk) rawui ‘eyebrow’, (?) Praok sivoj ‘in front, before’, pPalaungic *kʋɤj ‘above, upper part’ (Sidwell 2015 §875)
Comment: Sidwell (2015: 113) suggests borrowing from Palaungic into Khmuic, so this etymon is likely to be a Palaungic innovation.

‘able to, have to’
Shorto: §1472.A *ɗəːj, 1472.B *ɗəj ‘to have, to be obliged to, be in a position to, be about to’
Aslian: Jahai deʔ ‘to do’ (Pha2006:C:699-2), Kensiu diʔ ‘to make, to do’ (Bis1994:C:186) or < Malay?
Bahnaric: Bahnar ɗɛj ‘to have, possess; perfect auxiliary’, Central Rölöm dəːj ‘to be able to’, Biat dəːi ‘to be (un)able to’, Chrau diː- (!) ‘(in order) to’, Sre di ‘it is necessary’ (Dou1950:C:584)
Khasic: Khasi dei ‘must’
Munda: Santali -dae ‘responsibility to do’
Palaungic: Praok ti ‘(in order) to’, Palaung di (!) future prefix, Riang-Lang dəj² future prefix
Comment: The semantics of the various comparisons are quite diverse, so it is questionable whether this is more than a mix of phonologically similar words. Note also that Proto-Tai *ɗajᶜ (e.g. Thai dâj ‘to get’) has undergone grammaticalization in various modern languages, with senses including the abilitative.

‘to remain’
Shorto: §643 *dmɔːŋ ‘to remain, continue, be’
Aslian: Semai mong ‘to be’ (Mea1987:C:4068), Semelai kmɔŋ ‘to finish, end’ (Kru2004:C:432)
Bahnaric: Sre moːŋ ‘to be accustomed to’, Bahnar pəmɔːŋ ‘to be accustomed (to)’
Monic: Old Mon dmoṅ /dmɔŋ/ ‘to remain, be (located), reside, stay’, Modern Mon mòŋ ‘to remain, stay, continue, reside’
Comment: The Bahnaric forms are readily explained as borrowed from Mon, and the Aslian forms may be related as Mon loans or as part of a Southern AA etymon.

7.2 Unlikely or Disputed Reconstructions

‘each, every’
Shorto: §1700 *rʔəlh; by metathesis *rlʔəh ‘each, every’
Bahnaric: Sre dəh
Khmeric: Khmer rɔə̀l
Monic: Middle Mon ruih, Modern Mon rɜ̀h̀ (also verb ‘to count’)
Munda: Sora diː
Comment: Shorto’s reconstruction appears excessively contrived; it may be that none of the branch-level etyma are related.

‘to give’
Shorto: §1434 *ʔaːjh ‘to give’
Bahnaric: Sre aːj ‘to give’
Katuic: Kuy ʔɛː ‘to take, bring’
Khasi: Khasi ai ‘to give’
Khmeric: Old Khmer oy, Modern Khmer ʔaoj ‘to give’
Palaungic: Riang-Lang ɛ¹ ‘to give, cause to, allow to; let …!’; so that, Praok e adhortative particle, Palaung deh ‘to give’, pPalaungic *ʔe(ː)h ‘to give, take’ (Sidwell 2015 §30)
Comment: Phonological irregularities render this a very dubious set. It probably comprises several unrelated etyma. Note also that in Munda, -ai/aj is a cislocative or subject-orientation marker in Sora; common sources of such verbal deictics are ‘give’ and ‘come’.

‘all’
Shorto: §198 *klɔʔ ‘all’
Monic: Old Mon klo’ /klɔʔ/ ‘all’
Palaungic: Riang-Lang klɔʔ¹ ‘all’
Aslian: Semang nalo’ (Skeat & Blagden 1906 A 61), Temiar neigelok ‘to surround, to encircle, to be on all sides of’ (Mea1998:C:1831)
Comment: This set is geographically limited, suggesting regional sharing, perhaps from Monic into Palaungic; the Aslian forms perhaps only accidentally resemble Mon.

‘side, edge’
Shorto: §1974 *gah ‘side, edge, direction’
Katuic: Bru ka̤h ‘side’ (The1980:C:635), Kuy ŋàh ‘rim, edge’ (< pKatuic *ŋah ‘rim, mouth of container’ Sidwell 2005)
Bahnaric: Sre gah ‘side, border, edge’, Bahnar gah ‘direction, towards, side’ (Ban1979:C:1707-1)
Vietic: Vietnamese ngả ‘direction, way; lean, incline’
Comment: Shorto suggests borrowing into Chamic, cf. Cham kàːh, Jarai gaːh, Röglai, North Röglai gah ‘side’, but it is more likely that Chamic is the source of Bahnaric and Bru forms. Katuic *ŋah and Vietnamese ngả probably reflect an unrelated etymon.

‘to be’
Shorto: §1117 *ʔən ‘to be, exist’
Bahnaric: Stieng ən ‘to exist, to have’, Halang ʔan ‘here; now; this’ (Coo1976:C:8), Oi ʔəːn ‘have’ (Pra1995:C:Sid2003~4562)
Palaungic: Riang-Lang an² ‘to be the case, be true’
Comment: The data supporting this comparison are so weak that it is very doubtful.

‘time’
Shorto: §1171 *bənʔ ‘time’
Palaungic: Palaung bən ‘(future) time’, Praok pon ‘time of day’, pPalaungic *bɤn ‘time’ (Sidwell 2015 §59)
Vietic: Mương pận, Vietnamese bận ‘time (quantifier)’
Comment: Given the apparent lack of cognates beyond Palaungic and Vietic, this set appears to reflect accidental lookalikes.

‘time’
Shorto: §1511.A *l( )aj(ʔ), §1511.B *l( )aːj(ʔ) ‘again’
Palaungic: Riang-Lang ləj¹ ‘more, longer, else’, Praok laj ‘mark of continuous or habitual action’
Vietic: Mương lê, Vietnamese lại ‘again’
Comment: Shorto suggests → Shan lɑ̄i ‘again’; this also could be a loan from Burmese lɛ̀ ‘exchange’ or lɛ ‘turn’ (though the tone is problematic in both cases). Phonological irregularity and high likelihood of borrowing render this set doubtful.


References

Alves, Mark J. 2014. A survey of derivational morphology in the Mon-Khmer Language Family. In Pavel Stekauer and Rochelle Lieber (eds.), The Oxford Handbook of Derivational Morphology. Oxford: Oxford University Press. 520–544.
Alves, Mark J. 2015. Morphological functions among Mon-Khmer Languages: Beyond the Basics. In Nick Enfield and Bernard Comrie (eds.), Mainland Southeast Asian Languages: The State of the Art. Berlin: Mouton de Gruyter. 531–557.
Anderson, Gregory D.S. and K. David Harrison. 2008. Sora. In Gregory D.S. Anderson (ed.), The Munda Languages. New York: Routledge.
Barker, Milton E. 1966. Vietnamese-Muong Tone Correspondences. In Norman H. Zide (ed.), Studies in Comparative Austroasiatic Linguistics. Mouton & Co. 9–25.
Blood, Henry and Evangeline Blood. 1966. The Pronoun System of Uon Njun Mnong Rơlơm. Mon-Khmer Studies Journal 2: 103–111.
Burenhult, Niclas and Claudia Wegener. 2009. Preliminary Notes on the Phonology, Orthography and Vocabulary of Semnam (Austroasiatic, Malay Peninsula). Journal of the Southeast Asian Linguistics Society 1: 283–312.
Dempwolff, Otto. 1938. Vergleichende Lautlehre des austronesischen Wortschatzes. Bd. III: Austronesisches Wörterverzeichnis. Berlin and Hamburg: Dietrich Reimer.
Diffloth, Gérard. 1984. The Dvaravati Old Mon Language and Nyah Kur. Chulalongkorn University Printing House.
Dournes, Jacques. 1950. Dictionnaire Srê (Köho)-Français. Impr. d’Extrême-Orient.
Halliday, R. 1922. A Mon-English Dictionary. Bangkok: Siam Society (reprinted in 1955 by the Mon Cultural Section, Ministry of Union Culture, Government of the Union of Burma, Rangoon).
Höhn, George. 2015. Demonstratives and Personal Pronouns. Cambridge Occasional Papers in Linguistics 8.5: 84–105. ISSN 20505949.
Kobayashi, Masato, Ganesh Murmu and Toshiki Osada. 2003. Report on a preliminary survey of the dialects of Kherwarian languages. Journal of Asian and African Studies 66: 331–364.
Li, Fang-kuei. 1977. Handbook of Comparative Tai. Honolulu: University of Hawai’i Press.
Osada, Toshiki. 2008. Mundari. In Gregory D.S. Anderson (ed.), The Munda Languages. New York: Routledge. 99–164.
Pinnow, Heinz-Jürgen. 1959. Versuch Einer Historischen Lautlehre Der Kharia-Sprache. Wiesbaden: Otto Harrassowitz.
Pinnow, Heinz-Jürgen. 1965. Personal Pronouns in the Austroasiatic Languages: A Historical Study. In Indo-Pacific Linguistic Studies. Lingua 14–15. Translated by H.L. Shorto. 3–42.
Pinnow, Heinz-Jürgen. 1966. A Comparative Study of the Verb in the Munda Languages. In Norman H. Zide (ed.), Studies in Comparative Austroasiatic Linguistics. The Hague: Mouton (Indo-Iranian Monographs, V). 96–193.
Rajasingh, V.R. 2016. Mūöt (Nicobarese). Mon-Khmer Studies 45: 14–52.
Rajendran, S. 2002. Kharia. In B.P. Mahapatra (ed.), Linguistic Survey of India Special Studies: Orissa. Language Division, Office of the Registrar General. 372–398.
SEAlang Mon-Khmer Etymological Dictionary. SEALANG Munda Languages Project.
SEAlang Munda Etymological Dictionary. SEALANG Munda Languages Project. http://
Shorto, Harry L. 1971. A dictionary of the Mon inscriptions from the sixth to the sixteenth centuries. London: Oxford University Press.
Shorto, Harry L. 2006. A Mon-Khmer comparative dictionary. Ed. by Paul Sidwell, Doug Cooper, and Christian Bauer. Pacific Linguistics, Research School of Pacific and Asian Studies, Australian National University.
Sidwell, Paul J. 2005. The Katuic Languages: Classification, Reconstruction and Comparative Lexicon. LINCOM Europa.
Sidwell, Paul. 2008. Issues in the morphological reconstruction of Proto-Mon-Khmer. In Claire Bowern, Bethwyn Evans & Luisa Miceli (eds.), Morphology and Language History: In honour of Harold Koch. Amsterdam: Benjamins. 251–265.
Sidwell, Paul. 2014. Expressives in Austroasiatic. In Jeffrey P. Williams (ed.), The Aesthetics of Grammar: Sound and meaning in the languages of Mainland Southeast Asia. Cambridge University Press. 17–35.
Sidwell, Paul. 2015. The Palaungic Languages: Classification, Reconstruction and Comparative Lexicon. Munich: Lincom Europa.
Skeat, Walter W., and Charles Otto Blagden. 1906. Pagan Races of the Malay Peninsula. 3 Vols. London: Macmillan.
Zide, Arlene R.K. 1982. A Reconstruction of Proto-Sora-Juray-Gorum Phonology. PhD thesis.

Index Note: # marks data examples absolutive marking; see: argument marking (absolutive) adverbial clause/construction 112, 144–145 agreement 15 gender 107, 119ff. argument 27, 82ff., 236ff. Amwi #30 animacy 94, 96–97, 100, 241–243, 247–248, 252, 255 anti-passive 91, 96, 101 anti-topic 8, 27, 30–31, 36 areal convergence; see: convergence (areal) argument marking 27, 87–89, 94–99, 218– 222, 236ff.; see also: subject marking; object marking; agreement absolutive 87–89 Aslian languages 21, 26; see also individual language entries aspect; see: TAM attributive clause; see: relative clause Austroasiatic; see individual language group entries auxiliary 28, 32, 138–140, 182–183, 201, 215, 217, 227, 237, 240f.n Ava; see: Awa Awa; also: Ava, Vo 135, 147–152, #148–#150 Bahnaric languages; see individual language entries Bangla #263 Bhoi #114, 114–115 Bhumij #241, #246, #249, #251 bilingualism; see: multilingualism Birhoɽ #240, #243–244, #251 borrowing 52, 65–67, 74, 157, 189, 262; see also: loan Bugan #176, #186 Bunong #176, #185 Cantonese #62, #72 Car #24, #88–#93 case marking 188–190 and verb-initiality 189 Chinese #151

Chong #176 classifier (numeral) 12, 53–54, 57, 62–68, 123–125; see also: quantification clitic chain 177, 185, 238 complement clause 145–146 completive 172, 185 conditional clause 145 constituent order; see: word order contact with Austronesian 9, 24, 27–28, 36, 82, 88, 90, 101 with Chinese 7, 29, 46 ff., 135, 148, 151– 152 with Dravidian 4, 9, 23, 35, 163, 255, 258, 276 with Indo-Aryan 4, 9, 23, 29–30, 35, 122, 163, 189, 258 ff. with Sinitic 46 ff.; see also: Chinese with Sino-Tibetan 29–30; see also: Tibeto-Burman with Tai-Kadai 22–23, 30, 40, 51, 56–58, 135, 148, 150–152 with Tibeto-Burman 57, 109; see also: Sino-Tibetan convergence (areal) 4, 6, 22–24, 46 ff., 82, 258; see also: contact copular clause 110, 138, 187, 249–253 Danau #175, #187 de-grammaticalization 204 definiteness 88, 141 degree word 148, 150, 152 dependent clause; see also: relative; adverbial; conditional; complement clause and verb-initiality 28, 119, 143–147, 149 and word order 9, 22–23, 91–92, 143–147, 149 descriptive; see: elaborate expressions double marking (subject) 239, 245–246 elaborate expression 122–129 existential expression 110, 118 f.n, 129, 138, 187

336 fronting 22, 34 of arguments 97–98, 100, 115, 117, 126, 187 of verbs 31, 37 gender agreement 107, 119ff. grammaticalization 107ff. sex-based 119–126 Gorum #200, #224 grammaticalization 15, 70, 74, 95, 119ff., 200–201, 204, 221–222, 231; see also: de-grammaticalization grammaticalized verbs 320–326 of gender 107ff. of verbal markers 200–201 Gtaʔ #177, #190, #224 Gutob #180–181, #200–201 Gutob-Remo (proto) 159 head-initiality; see: verb-initiality Hill Gtaʔ #179–180, #184, #188–189 Hindi-Urdu #23, #260, #277–280 Ho #175, #188, #239, #241–251 holistic shift 160–172, 199, 228–230 ideophone; see: elaborate expression imperative 11, 93, 141, 187, 214, 243–247; see also: prohibitive imperfective 184, 201, 203, 214–217, 237 incorporation 10, 25, 26, 159, 201, 208–209 Indo-European; see: Indo-Aryan Indosphere 160–161 infix 39, 48, 54, 201, 206, 211, 237, 288–294 information structure, see: pragmatics; topic interrogative 13, 69, 75, 95, 118, 128, 141, 311– 313 Jahai #27 Juang #177–178, #223 Katu #31–32 Katuic languages 31–32; see also individual language entries Kharia #26, #176, #262, #265 Kharia (proto) 159 Khasi 109–113, #110–112, #123, #176, #186, #263, #277–280 Khasian languages 21, 29–31, 107ff.; see also individual language entries

index Khasian (proto) 107–108, 119–130 Kherwarian languages 236ff.; see also: Munda languages; individual language entries Kherwarian (proto) 236ff. Khmer (Modern) #264 Khmer (Old) #62, #68, #263 Khmuic languages; see also individual language entries Korku #175, #216, #249, #252 Kơho-Sre #175 Kri #67, #71, #73 Kui Ntua #186 language contact; see: contact lexical doublet 159 loan/loanword 39, 51, 56, 58, 66, 71; see also: borrowing locative 11, 92, 313–320 Loi; see: Va Lyngngam 113–114, #113–114 Man Noi Plang 121, 128–130, #129–130 Maniq #187 mediopassive 59, 72; see also: middle voice metatypy 157 middle voice 201, 210–211, 218, 226–229; see also: mediopassive Mlabri #188, #264 Mnar #117, 117–118 modality; see: TAM Mon (Middle) #263 Mon (Modern) #22–23, #34, #186, #190, #263 Mon (Old) #175, #263 multilingualism 52, 55–57, 67, 108, 148 Munda languages; see also: Kherwarian languages; individual language entries and Austroasiatic 198, 203, 227–231 and Mon-Khmer 160–164 bound morphemes 198 ff. classification 159–162 history 157–158 holistic shift 160–172, 199, 228–230 language contact 157–159 morphological complexity 198–203 morphological diversity 198–199 negation 159, 174–183, 236 ff. noun phrase 188–190

index person marking 236ff. relative clause 258ff. TAM 236ff. typology 21, 158–159 verb-final syntax 21, 174, 183, 185–188 verb-initiality 24–26 verbal morphology 198ff. Munda (proto) 157ff.; see also: protoMunda predicate argument marking 218–222 case marking 173, 188 clause structure 201, 231 prosody 172, 202 typology 198 word order 172–174 Mundari (Keraʔ) #238–239, #242, #245, #250–251 Mundari (Tamaɽia) #238–245, #250–251 Mường #65, #70–71, #73 Mày #73 Nancowry #95–100, #265 negation 11, 174ff. and person marking 236ff. and TAM markers 177–182, 253–254 in copular constructions 11, 249–253 parallel systems 159, 174–183, 236ff. proto-Munda 201, 211ff. proto-Kherwarian /proto-North Munda 249, 252 Nias #24 Nicobarese languages 82ff.; see also: individual language entries and Austroasiatic 82, 90 and Austronesian 24, 38–39, 82, 88, 90, 101 argument marking 87–89, 94–99 clause structure 93, 100 passive (and antipassive) voice 89–91, 95–96, 100, 101 typology 21, 82 verb-initiality 9, 21ff., 38–39, 87, 100 Nicobarese (proto) 86, 99, 100–102 North Munda; see: Kherwarian North Munda (proto) 26; see also: Kherwarian (proto) noun phrase accessibility hieararchy 267– 270

337 object marking 220–221, 245–247; see also: argument marking Oriya #263 OV syntax; see: verb-final syntax Pacoh #34, #184, #189, #264 Palaung (Dara’ang) #187 Palaung (Shwe) #28, #29 Palaung; also: Ta’ang #264 Palaungic languages 28–29, 128–130, 135 ff.; see also individual language entries passive voice 54, 59–60, 72, 89–91, 95–96, 203 perfective 185–186, 201, 203, 214–217, 237, 253 person marking; see: argument marking Pnar #30, 115, #115, #119, #125–126, #129 possessive construction 65, 69, 87, 138, 190 post-nominal verb; see: verb-final syntax pragmatics 7; see also: topic and dropping of arguments 34, 87, 91 and pronouns 298 and verb-initiality 22, 30–32, 35, 37, 118, 152 and word order variation 3, 6, 8–10, 22– 23, 26–27, 34, 43–35, 37, 87, 115, 118, 141–142, 152 Praok; see: Wa (Parauk) predicate type 137–140 predicate-initiality; see: verb-initiality prohibitive 11, 174–183, 236ff. pronoun/pronominal system 119ff., 298– 308 complex system 101 three-number system 120 proper noun 87, 141 prosody 160–172, 202, 238 proto-Munda predicate 198ff.; see also: Munda (proto) aspect-voice marker 202–203, 218, 227– 231 morphological complexity 198–203 derivational affixes 201 position 201–202 pre-verbal particles 199, 202 prosodic structure 202

quantification 12, 49, 62–68, 122–124; see also: classifier
quantifier; see: quantification
reciprocal 171, 201, 203–204, 209–211, 225, 227, 237
reduplication 184, 201–202, 210–211, 289, 294–298; see also: elaborate expression
reflexive-possessive marker 190
relative clause 69, 102, 111–112, 116, 122–124, 143–144, 258ff.
Remo #167, #181–182
rhythmic holism; see: holistic shift
Rục #64–65, #70–74
Sangtam #260
Santali #165, #168, #175, #238, #241–242, #244, #250–254, 258ff., #260–271, #277–280
Semaq Beri #184, #189–190
Semelai #27, #264
serial verb construction 10–11, 73–74, 139–140, 172–173, 177, 182, 185, 189
Shan (Tai Nuea) #151
Sinosphere 160–161
Sora #23, #25–26, #170–171, #178–179, #224
specificity 141–142
subject
  type of 141–142
  clitic 183–184, 236ff.
  marking; see: subject marking
subject marking 183–184, 219–220, 236ff.; see also: argument marking
  double 239, 245–246
  proto-Kherwarian 247–248
  proto-North-Munda 247–248
  suppressed 241–243
subordinate clause; see: dependent clause
Ta’ang; see: Palaung
tabooing (lexical) 86
TAM 138–140, 185–188, 216–218
  and negation 177–182, 213–215, 253–254
  and person marking 236ff.
Telugu #260
Temiar #184
temporal clause; see: adverbial clause
tense; see: TAM

Thai #23
Thàvựng #60, #64–66, #69, #74
topic
  floating topic 142–143
  topic-comment 7, 9–10, 26, 34 ff., 40 ff., 46, 54, 68–73, 75
  topic-prominent language 49, 52, 69–73, 75
  topicalization 69, 98, 100, 142, 147
Va; also: Loi 135, 147–152, #149–150
verb (perception and cognition) 145–146
verb-final syntax 21, 174, 183, 185–188; see also: word order
verb-initiality 21–45, 107 ff., 135ff.
  and language contact 21, 32, 36, 38–39, 41
  and pragmatics 22, 30–32, 34–35, 37, 118, 152
  and processing bias 40–41
  and typology 37–38
  Austroasiatic 21 ff.
  Austroasiatic (proto) 34 ff., 41, 152, 189
  geographical distribution 24 ff.
  Khasian languages 107 ff.
  Munda languages 158, 172
  Nicobarese languages 24, 38–39, 82 ff.
  Vietic languages 72
Vietic languages 46 ff.; see also individual language entries
  clause structure 68–75
  convergence (typological) 46 ff.
  directional verb 73–74
  language contact 46 ff.
  noun phrase 62–68, 75
  passive voice 58–60, 72–73
  Southern Vietic 46–47, 49
  topic-comment structure 69–71
  typology 46–47, 51–54, 72
  Việt-Mường 46–47, 49
Vietic (proto) 46 ff.
  clause structure 68–75
  grammatical vocabulary 58–62
  noun phrase 67–68, 75
  quantification 49, 62–68
Vietnamese 46 ff., #63–66, #70–74, #185, #264
VO syntax; see: verb-initiality
Vo; see: Awa

Wa (Parauk) 135ff., #137–150
Wa #29
War (Kudeng) #30
War 116–117, #116–117
word order; see also: verb-initiality; verb-final syntax
  alternative; see: word order (variability)
  and areal convergence 22ff.
  and pragmatics; see: word order (variability)
  and pronominal system 119 ff.
  and type of clause 136–147
  and type of subject 141–142
  and type of verb 145–147
  fixed 87, 143, 148–149
  flexible; see: word order (variability)
  in Austroasiatic 21–24
  variability 3, 6, 8–10, 22–23, 26–27, 34, 43–35, 37, 87, 115, 118, 141–142, 152
word prosody; see: prosody