The Oxford Handbook of Persian Linguistics (Oxford Handbooks) 0198736746, 9780198736745

This handbook offers a comprehensive overview of the field of Persian linguistics, discusses its development, and captur


English Pages 608 [528] Year 2018


The Oxford Handbook of




THE OXFORD HANDBOOK OF LINGUISTIC ANALYSIS Second Edition Edited by Bernd Heine and Heiko Narrog

THE OXFORD HANDBOOK OF CHINESE LINGUISTICS Edited by William S.-​Y. Wang and Chaofen Sun




THE OXFORD HANDBOOK OF HISTORICAL PHONOLOGY Edited by Patrick Honeybone and Joseph Salmons



THE OXFORD HANDBOOK OF DEVELOPMENTAL LINGUISTICS Edited by Jeffrey Lidz, William Snyder, and Joe Pater

THE OXFORD HANDBOOK OF INFORMATION STRUCTURE Edited by Caroline Féry and Shinichiro Ishihara

THE OXFORD HANDBOOK OF MODALITY AND MOOD Edited by Jan Nuyts and Johan van der Auwera



THE OXFORD HANDBOOK OF ERGATIVITY Edited by Jessica Coon, Diane Massam, and Lisa deMena Travis

THE OXFORD HANDBOOK OF POLYSYNTHESIS Edited by Michael Fortescue, Marianne Mithun, and Nicholas Evans


THE OXFORD HANDBOOK OF PERSIAN LINGUISTICS Edited by Anousha Sedighi and Pouneh Shabani-​Jadidi

For a complete list of Oxford Handbooks in Linguistics, please see pp. 575–​6






Great Clarendon Street, Oxford, OX2 6DP, United Kingdom

Oxford University Press is a department of the University of Oxford. It furthers the University’s objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries

© editorial matter and organization Anousha Sedighi and Pouneh Shabani-Jadidi 2018
© the chapters their several authors 2018

The moral rights of the authors have been asserted

First Edition published in 2018

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by licence or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above

You must not circulate this work in any other form and you must impose this same condition on any acquirer

Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016, United States of America

British Library Cataloguing in Publication Data
Data available

Library of Congress Control Number: 2018937418

ISBN 978–0–19–873674–5

Printed and bound by CPI Group (UK) Ltd, Croydon, CR0 4YY

Links to third party websites are provided by Oxford in good faith and for information only. Oxford disclaims any responsibility for the materials contained in any third party website referenced in this work.


Contents

Acknowledgements
List of Figures and Tables
List of Abbreviations
The Contributors

1. Introduction  Anousha Sedighi and Pouneh Shabani-​Jadidi


PART I  CLASSIFICATION AND HISTORY

2. From Old to New Persian  Mauro Maggi and Paola Orsatti


3. Typological approaches and dialects  Mohammad Dabir-​Moghaddam


PART II  SOUND SYSTEM

4. Phonetics  Golnaz Modarresi Ghavami


5. Phonology  Mahmood Bijankhan


6. Prosody  Arsalan Kahnemuyipour


PART III  SYNTAX

7. Generative approaches to syntax  Simin Karimi


8. Other approaches to syntax  Jila Ghomeshi



9. Specific features of Persian syntax  Pollet Samvelian


PART IV  LANGUAGE AND WORDS

10. Morphology  Behrooz Mahmoodi-Bakhtiari


11. Lexicography  Seyed Mostafa Assi


12. Academy of Persian Language and Literature  Mohammad Dabir-​Moghaddam


PART V  LANGUAGE AND PEOPLE

13. Sociolinguistics  Yahya Modarresi


14. Language contact and multilingualism in Iran  Shahrzad Mahootian


15. Persian as a heritage language  Anousha Sedighi


16. Teaching Persian to speakers of other languages  Pouneh Shabani-​Jadidi and Anousha Sedighi


PART VI  LANGUAGE, MIND, AND TECHNOLOGY

17. Psycholinguistics  Pouneh Shabani-Jadidi


18. Neurolinguistics  Reza Nilipour


19. Computational linguistics  Karine Megerdoomian


References
Name Index
Subject Index


Acknowledgements

Working on this volume has been by far the greatest honour and privilege of our professional lives. We are especially proud of this contribution as it sheds light on our beloved and ancient Persian language, for language encapsulates the soul of a culture and civilization. We would like to thank Oxford University Press for agreeing to take on this project and OUP’s Delegates for accepting our book proposal and for providing valuable feedback and advice. Additionally, we wish to thank Julia Steer, Vicki Sunter, and Karen Morgan for their help since we first started this project in 2014. Thanks are also due to the anonymous reviewers of our book proposal for their helpful and productive feedback.

A project of this magnitude cannot come together without the conscious efforts, support, and encouragement of many experts and scholars. Throughout the journey of editing this volume, we have had the pleasure of working with many world-renowned scholars and have enjoyed their invaluable insights and reviews. In particular, we would like to extend our appreciation to the following: Drs Paul Hirschbuhler, Mohammad Dabir-Moghaddam, Golnaz Modarresi Ghavami, Koorosh Safavi, Shahla Raghibdoust, Ahmad Sedighi, Arsalan Kahnemuyipour, María Luisa Rivero, John Jensen, Eric Mathieu, Ana Arregui, Sima Paribakht, Patricia Higgins, Stephen Millier, P. Oktor Skjaervo, Nicholas Sims-Williams, Martin Schwartz, Gernot Windfuhr, Agnès Lenepveu-Hotz, Carina Jahani, Ali Reza Abasi, Ash Asoudeh, Azita Taleghani, Christina Manouilidou, Ladan Hamedani, Corey Miller, Alireza Korangy, Hossein Samei, Linda Godson, Mehdi Marashi, Mohammad Vahedi-Langrudi, Nathan Hill, Nima Sadat-Tehrani, Reza Ghafar Samar, Saeid Atoofi, Saloumeh Gholami, Shervin Farridnejad, Touraj Daryayi, Asghar Seyed-Gohrab, and Franklin Lewis. We would also like to thank all the contributors to the volume for their expertise, responsiveness, and patience during this lengthy project.
On a more personal note, we would like to express our heartfelt gratitude to our family members Dr Ahmad Sedighi, Ladan Rahiminia, Farzin Tavakkoli, as well as Ali Shabani, Simin Aletaha, Shohreh Shabani, and Dr Marc de Jardins for their love, support, and encouragement. Last but not least, we wish to thank our loving sons, Kourosh Tavakkoli and Arian Mirhashemi, for bringing boundless joy and unparalleled love into our lives.

List of Figures and Tables

Figures

4.1 Waveform and spectrogram of a glottal stop in initial position


4.2 Waveform and spectrogram of a glottal trill 


4.3 Waveform and spectrogram of voiceless fricatives 


4.4 Glottal fricative [h] in intervocalic [ɑ__a] position


4.5 Spectrogram of nasals 


4.6 Waveform and spectrogram of [le] 


4.7 Waveform and spectrogram of [je] 


4.8 Waveform and spectrogram of [ɹe]


4.9 Waveform and spectrogram of Persian Simple Vowels


4.10 Vowel formant frequencies (Female speakers) 


4.11 Vowel formant frequencies (Male speakers) 


4.12 Vowel space of Female and Male Persian Speakers 


4.13 Waveform and spectrogram of Persian Complex Vowels 


4.14 Vowel space of Persian diphthongs 


4.15 Waveform, spectrogram, and the pitch contour of a statement 


4.16 Waveform, spectrogram, and pitch contour of [bale] ‘yes’ 


13.1 Percentage of vowel assimilation by class and style, female adults (Jahangiri 1980) 


13.2 Percentage of vowel assimilation by class and style, male adults (Jahangiri 1980) 


19.1 Sample language processing pipeline. Adapted from ParsiPardaz, a Persian language toolkit (Sarabi et al. 2013) 


19.2 Sample block diagram of a Persian–​English speech-​to-​speech system (Georgiou et al. 2006) 



Tables

2.1 Middle Persian phonemes (adapted from Durkin-Meisterernst 2014: 114ff.)


2.2 The early two-​case system of Middle Persian 


2.3 Manichaean Middle Persian endings of the present (after Durkin-​Meisterernst 2014: 232ff.) 


2.4 Middle Persian past tenses (PP = past participle) 


2.5 New Persian documents in different scripts (with abbreviations) 


2.6 Dialectal and chronological classification of ENP documents 


2.7 Phonological system of literary Contemporary New Persian 


2.8 Phonological system of Early and Classical New Persian 


3.1 Modern Persian’s word order typology 


3.2 Old Persian’s word order typology 


3.3 Middle Persian (Pahlavi) word order typology 


4.1 Consonants of Early New Persian 


4.2 Consonants of Standard Modern Persian 


4.3 VOT (ms) of voiceless plosives in initial and intervocalic positions 


4.4 Average VOT (ms) of voiced plosives in initial and intervocalic positions 


4.5 VOT values for Persian affricates (ms) 


4.6 Duration (ms) of Persian vowels 


5.1 Feature chart of Persian 23 consonants


5.2 Feature chart of Persian six vowels 


9.1 Telicity in Persian complex predicates 


10.1 Persian pronouns 


10.2 Basic possessive paradigms 


10.3 Cardinal numbers 


10.4 Verbal endings in Persian 


10.5 Basic paradigms for the verb bordan ‘to take’ 


10.6 Aspects, moods, and tenses: bordan ‘to take’ 


13.1 Percentage of final stop deletion in Persian by style 


13.2 Percentage of final stop deletion in Tehran Persian by gender, age, and style 


13.3 Percentage of final stop deletion by education and style in Tehran Persian 


13.4 Percentage of vowel harmony in casual speech of Persian speakers by education and gender 


13.5 Percentage of /u/ realization in the speech of Persian speakers in Tehran by age and style


13.6 Percentage of /u/ realization of Persian speakers in Tehran and Ghazvin by style


13.7 Social status of Persian during the last several centuries 


15.1 Differences between written and spoken forms of Persian (Tehrani dialect)


17.1 Examples of stimuli: Experiment 2a 


17.2 Priming of nominal constituent in non-​head/​word-​initial position 


17.3 Examples of stimuli: Experiment 2b 


17.4 Priming of head/​word-​final position 


18.1 Clinical linguistic batteries for assessing acquired language impairments in Iranian brain-​damaged patients 


18.2 Neurolinguistic studies on monolingual and bilingual Persian-​speaking brain-​damaged patients 


18.3 Mean scores, SDs of subtests, and AQ of Persian aphasic and epileptic patients 


18.4 fMRI studies in Persian 


19.1 Plural morpheme example in attached, detached, and separated forms in the Persian writing system 


19.2 Complex tokens consisting of two distinct but attached lexical categories 


19.3 Examples of Persian light verb constructions 


19.4 Examples of separable affixes in Afghan Persian (Dari) text 


List of Abbreviations

1  first person
2  second person
3  third person
A  adjective
ACC  accusative
ADV  adverbial
AFF  affectedness
AgrP  agreement phrase
AH  Latin: Anno Hegirae, ‘in the year of the Hijra’ (Islamic calendar)
AP  accentual phrase
ApplP  applicative phrase
AQ  aphasia quotient
ART  article
AspP  aspect phrase
ASR  automatic speech recognition
AVM  arterio-venous malformation
BAT  bilingual aphasia test
BDAE  Boston diagnostic aphasia examination
BG  basal ganglia
C  Consonant
CD  compact disc
CG  construction grammar
CL  (1) compensatory lengthening; (2) enclitic pronoun; (3) computational linguistics
CLAS I  Cross Language Aphasia Study I
CLC  clitic
CLEF  Conference and Labs of the Evaluation Forum
CLF  classifier
CLIR  cross-language information retrieval
CMP  classical modern Persian
CNJECT  conjectural
Comp  comparative
Comp  compiler/compiled by
COMPL  complete
CONT  continuous aspect marker
COP  copula
CP  complementizer phrase
CPr  complex predicate
CQ  cortical quotient

CVA  cerebral vascular accident
CVCC  consonant-vowel-consonant-consonant
CVR  criterion validity ratio
DAT  dative
DEF  definite
DIR  direct case
DO  direct object
DOM  differential object marking
DPA  Dorsal Place Assimilation
Dur  durative
DUR  durative
ELRA  European Language Resources Association
EZ  ezafe
F  feminine
F0  fundamental frequency
F1  first formant
F2  second formant
F3  third formant
FP  functional phrase
FT  fronto-temporal
FTP  fronto-temporal-parietal
G  genitive
GB  government and binding (theory)
GEN  genitive
HPSG  head-driven phrase structure grammar
Hz  hertz
IMP  imperfect
INCOMPL  incomplete
IND  indefinite
IND  indicative
Indef  indefinite
INF  infinitive
INST  instigation
IO  indirect object
IP  intonational phrase
IPA  International Phonetic Alphabet
IPFV  imperfective
IR  information retrieval
kHz  kilohertz
L1  first language
L2  second language
L BG  left basal ganglia
LFG  lexical functional grammar
LFT  left fronto-temporal
LH  left hemisphere
LIFG  left inferior frontal gyrus
LOC  locative

LP  left parietal
LQ  language quotient
LV  light verb
M  masculine
MLU  mean length of utterance
ms  milliseconds
MS  morphological structure
MT  machine translation
MWEs  multi-word expressions
N  (1) noun; (2) neuter
NEG  negative/negation
NG  negative/negation
NI  noun incorporation
NLP  natural language processing
NOM  nominative
NP  noun phrase
NPA  Nasal Place Assimilation
NPA  nuclear pitch accent
NV  non-verbal
NVE  non-verbal element
OBJ  object
OBL  oblique
OCR  optical character recognition
OM  object marker
OPT  optative
OT  optimality theory
OV  object–verb
P1  preposition group 1
P2  preposition group 2
PAB  Persian aphasia battery
PART  particle
PART  participle
PASS  passive
PCLD  Persian Clinical Linguistic Database
P-DAB  Persian diagnostic aphasia battery
PerDT  Persian Dependency Treebank
PF  phonetic form
PFP  perfect participle
PL  plural
PLDB  Persian Linguistic Database
PNI  pseudo noun incorporation
POS  part of speech
POSS  possessive
Post/POST  postposition
PP  preposition phrase
PredP  predicate phrase
Prep  preposition

PRS  present
PST  past
PTCP  participle
P-WAB-1  Persian and Western aphasia battery 1
R BG  right basal ganglia
REF  referential
REL  relative
RH  right hemisphere
RRC  reduced relative clause
RRG  role and reference grammar
RT  reaction time
SBJV  subjunctive
SF  surface form
SG  singular
SINF  short infinitive
SLP  speech–language pathology
SOV  subject–object–verb
Spec  specifier
Spec-IP/TP  specifier of infinitival phrase/tense phrase
SPR  superlative
Subj  subjunctive
Super  superlative
SVC  serial verb construction
SVO  subject–verb–object
TEP  Tehran English–Persian (parallel corpus)
TG  transformational grammar
TOPH  two object position hypothesis
TopP  topic phrase
TMC  Tehran monolingual corpus
TP  (1) tense phrase; (2) temporo-parietal
TTR  type/token ratio
UF  underlying form
UPDT  Uppsala Persian Dependency Treebank
UPSID  UCLA Phonological Segment Inventory Database
V  Vowel
VH  Vowel Harmony
VO  verb–object
VOC  vocative
VOS  verb–object–subject
VOT  voice onset time
VP  verb phrase
VSO  verb–subject–object
VVPE  verb-stranding VP-ellipsis
WAB-R  Western aphasia battery revised
WSD  word sense disambiguation
ZWNJ  zero-width non-joiner

The Contributors

Seyed Mostafa Assi  has a PhD in Linguistics (lexicography and computational linguistics) from the University of Exeter, 1989, and is Professor and Dean of the Faculty of Linguistics at the Institute for Humanities and Cultural Studies, Tehran, Iran. His areas of research are English and Persian linguistics and lexicography. He is the author and co-author of more than 70 papers and 18 books, such as A Selective Lexicon of Linguistics (1996), A Comprehensive Management Dictionary (1998), Persian Equivalents for Computer Terms (2002), and A Comprehensive Persian–English Dictionary (4 volumes) (2003). He is also the founder and director of the Persian Linguistic Database, available at http://

Mahmood Bijankhan  is a Professor of Linguistics in the department of General Linguistics at the University of Tehran. He received his BS degree in Mathematics from the University of Texas at Arlington and his MA and PhD in Linguistics from the University of Tehran. His research interests lie in the areas of phonetics, phonology, and corpus linguistics. In recent years, he has focused on Persian proficiency testing for non-Persian speakers.

Mohammad Dabir-Moghaddam  received his PhD in linguistics from the University of Illinois at Urbana-Champaign in 1982. He is Professor of Linguistics at Allameh Tabataba’i University (Tehran). He is a permanent member of the Academy of Persian Language and Literature. He is the author of Theoretical Linguistics: Emergence and Development of Generative Grammar, Studies in Persian Linguistics, Typology of Iranian Languages (2 volumes), and a number of articles.

Jila Ghomeshi  is Professor of Linguistics at the University of Manitoba. She has carried out research and published articles on many aspects of Persian syntax and morphology. In addition to her scholarly research, she has sought to bring linguistics to a more general audience with short radio columns and a book on prescriptivism entitled Grammar Matters. These efforts earned her a National Achievement Award from the Canadian Linguistic Association in 2014.

Arsalan Kahnemuyipour  received his PhD in Linguistics from the University of Toronto in 2004. He is currently an Associate Professor of Linguistics at the University of Toronto Mississauga. His areas of expertise are syntax, morphology, and the interface between syntax and prosody. He has worked on a number of languages including his native Persian, as well as English, Armenian, Turkish, and Niuean, among others. He has published a book with Oxford University Press and several articles in journals such as Lingua, Linguistic Inquiry, Natural Language and Linguistic Theory, and Syntax.

Simin Karimi  is a Professor in the Department of Linguistics at the University of Arizona. She has worked on various syntactic topics in Persian, including word order and scrambling, syntax–discourse interaction, complex predicates, and complex DPs. Her current research focuses on control constructions, ellipsis, and the syntax and semantics of complex predicates in various Iranian languages. She has published journal articles, book chapters, and one book-length monograph. She has also edited/co-edited five books and a special issue for the journal Lingua.

Mauro Maggi (PhD 1992)  is Associate Professor of Iranian philology at La Sapienza University (Rome) and was Associate Professor of Indo-Iranian philology at L’Orientale University (Naples) until 2008. His areas of expertise are the Khotanese language and literature, Central Asian Buddhism, the history of the Iranian languages, and early and sub-standard New Persian. He has published The Khotanese Karmavibhaṅga (1995), Pelliot Chinois 2928: A Khotanese Love Story (1997), and many scholarly articles. Among his edited books are The Persian Language in History (with Paola Orsatti, 2011) and Buddhism among the Iranian Peoples of Central Asia (with Matteo De Chiara and Giuliana Martini, 2013).

Behrooz Mahmoodi-Bakhtiari  received his MA (1999) and PhD (2004) in Linguistics from Allameh Tabatabaee University, Tehran, and is now an Associate Professor of Linguistics and Persian at the Faculty of Fine Arts, University of Tehran. His major publications include Tense in Persian (2002), Fārsi Biyāmuzim [Let’s Learn Persian] (2004), and Persian for Dummies (forthcoming). He is also the author of numerous articles on Persian linguistics, grammar, and its several dialects.

Shahrzad Mahootian  is Program Coordinator and Professor of Linguistics in the Linguistics Department at Northeastern Illinois University in Chicago. In addition to her attention to aspects of Persian and Iranian linguistics, her research interests and publications include topics in language contact, bilingual language acquisition, structural, cognitive and social aspects of code-switching and language choice, language and identity, endangered languages, and language documentation and maintenance.

Karine Megerdoomian  is a Principal Computational Linguist at MITRE, a federally funded research and development centre, and Adjunct Faculty at the Communication, Culture, and Technology department at Georgetown University. Karine’s expertise is in the domains of social media analytics and linguistically informed Natural Language Processing with a focus on Middle Eastern Languages. Her current research focuses on the relationship between language in online media and associated socio-political issues, with emphasis on sentiment analysis and automatic framing and narrative analysis. Karine’s linguistic research has focused on the study of complex predicates and the syntax–semantics interface.

Yahya Modarresi  received his PhD from the University of Kansas, USA and is currently a Professor of Linguistics at the Institute for Humanities and Cultural Studies (I.H.C.S.). He has taught sociolinguistics and anthropological linguistics in the Department of Linguistics, IHCS, and the Department of Anthropology, University of Tehran. His books include An Introduction to Sociolinguistics (1989) and Language and Migration (2015). He has written many articles in academic journals. At present, he is the Editor in Chief of the journal of the Linguistics Society of Iran. He has also been a member of the editorial board of different academic journals, including the International Journal of the Sociology of Language, for many years.

Golnaz Modarresi Ghavami  is a faculty member of the Linguistics department at A.T.U., Tehran, Iran. She teaches general phonetics, introductory and advanced phonology, Persian and English phonetics and phonology, acoustic phonetics, and historical linguistics. Her research is mainly focused on the phonetics and phonology of Persian. She is the author of Phonetics: The Scientific Study of Speech (2011, 2015) and A Glossary of Phonetics and Phonology (2015), both in Persian.

Reza Nilipour  is Emeritus Professor of Neurolinguistics and Clinical Linguistics and former Chairman of the Department of Speech Therapy, University of Social Welfare and Rehabilitation Sciences. He developed the first PhD programme in Speech Therapy in Iran. He has developed several clinical linguistic batteries and is the author and co-author of neurolinguistics book chapters and research articles in Brain and Language, Neurolinguistics, Aphasiology, and Basic and Clinical Neuroscience. He is a member of the Academy of Sciences, in charge of its Linguistics department. He was a guest professor at the European Masters of Clinical Linguistics, University of Potsdam, Germany, in 2005.

Paola Orsatti  (Laurea in Lettere, 1979) entered university as a researcher in 1983 and is currently Associate Professor of Persian Language and Literature at La Sapienza University (Rome), where she formerly also taught history of the Persian language. In addition to her Persian studies, she specialized as keeper of manuscripts at the Scuola Speciale per Archivisti e Bibliotecari of La Sapienza University in 1992. Her research focuses on the history of Persian, Persian classical literature, the history of Persian studies in Europe, palaeography, and codicology of Islamic manuscripts. Besides a number of scholarly articles, she has published Il fondo Borgia della Biblioteca Vaticana e gli studi orientali a Roma tra Sette e Ottocento (1996), Appunti per una storia della lingua neopersiana. 1: Parte generale, fonologia, la più antica documentazione (2007), Corso di lingua persiana (with Daniela Meneghini, 2012), and edited The Persian Language in History (with Mauro Maggi, 2011).

Pollet Samvelian  is Professor of Linguistics at Sorbonne Nouvelle University. She has published several books and articles on the syntax and morphology of Western Iranian languages, especially on complex predicates, bare objects, differential object marking, word order, verbal periphrases, and clitics. Her recent publications include La Grammaire des prédicats complexes. Les constructions nom-verbe (2012, Hermès-Lavoisier) and Approaches to Complex Predicates (2015, co-ed. with L. Nash, Brill).

Anousha Sedighi  is Professor of Persian and Persian Program Head at Portland State University. She received her PhD in Linguistics from the University of Ottawa in 2005. She has published on syntax, morphology, and teaching Persian as a heritage language. Her first book, Agreement Restrictions in Persian, was published in 2008 by Rozenberg and Purdue University Press and republished in 2011 by Leiden University Press and University of Chicago Press. Her second book, Persian in Use: An Elementary Textbook of Language and Culture, was published in 2015 by Leiden University Press and University of Chicago Press. She served as the President of the American Association of Teachers of Persian (2014–16) and she currently sits on the executive board of the National Council of Less Commonly Taught Languages.

Pouneh Shabani-Jadidi  is Senior Lecturer of Persian Language and Linguistics at McGill University. She holds a Ph.D. in Linguistics from the University of Ottawa (2012) as well as a Ph.D. in Applied Linguistics from Tehran Azad University (2004). She has taught Persian language and linguistics as well as Persian literature and translation at McGill University, the University of Oxford, the University of Chicago, and Tehran Azad University since 1997. She has published on morphology, psycholinguistics, translation, teaching Persian as a second language, and second language acquisition. Some of her representative publications are Processing Compound Verbs in Persian: A Psycholinguistic Approach to Complex Predicates (Leiden and University of Chicago Press, 2014), The Routledge Introductory Persian Course and The Routledge Intermediate Persian Course (2010, 2012, with Dominic Brookshaw), as well as What the Persian Media Says (Routledge, 2015). She is the translator of The Thousand Families: Commentary on Leading Political Figures of Nineteenth Century Iran, by Ali Shabani (Peter Lang, 2018, with Patricia Higgins). She serves as a reviewer for International Journal of Iranian Studies, International Journal of Applied Linguistics, Sage Open Journal, Frontiers in Psychology, International Journal of Psycholinguistic Research, and LINGUA. Currently, she is president of the American Association of Teachers of Persian (2018–2020).

Chapter 1

INTRODUCTION

Anousha Sedighi and Pouneh Shabani-Jadidi

In an ideal world, there is continuous communication and collaboration among all the subfields of linguistics. As such, linguistic theories are tested by experimental studies in order to be validated, and research on language acquisition feeds into the hands of the language curriculum developers and educators, who in turn provide feedback for the theorists. What inspired us to edit this volume is the vision of a world where scholars, researchers, and students in all the fields and subfields of Persian linguistics, both theoretical and applied, and in all the countries around the world are collaborating with one another. This handbook is a step towards creating such a world.

Modern Persian belongs to the Western Iranian branch of the Indo-Iranian group of the Indo-European language family. It is a descendant of Middle Persian, the official language of the Sasanian Empire (third–seventh century CE), and Old Persian, the language of the Achaemenid Empire (sixth–fourth century BCE). Currently, Persian is spoken by more than 110 million people, mainly in the three countries of Iran, Afghanistan, and Tajikistan. Persian is sometimes known by its endonym ‘Farsi’, which was the term used by all its native speakers until the twentieth century. Currently, due to political reasons, it is called ‘Dari’ in Afghanistan and ‘Tajiki’ in Tajikistan, while in Iran, the term ‘Farsi’ has remained the local name of the language. These three dialects of Persian are mutually intelligible to their native speakers. Persian of Iran has borrowed many words from Arabic and French. In Afghanistan, Dari Persian is closer to Middle Persian and has borrowed more words from English. Tajiki Persian, owing to Tajikistan's history as part of the former Soviet Union, has borrowed many words from Russian.

Since the Arab conquest of Persia in the seventh century CE, Persian has used the Arabic writing system, which is now utilized in Iran and Afghanistan. Tajiki Persian, on the other hand, uses the Cyrillic alphabet. Other than these three countries, where Persian is an official language, Persian is also spoken in many other parts of the world. Persian has influenced other languages such as the Turkic languages, Armenian, Georgian, and Urdu. It has also had some influence on Arabic, especially Bahraini and Kuwaiti Arabic. For centuries, Persian has been a prestigious cultural language in Central Asia, South Asia, and Western Asia, and Persian literature has been described as one of the great literatures of mankind.

2    Anousha Sedighi and Pouneh Shabani-Jadidi Research on Persian linguistics had previously centred on historical linguistics, mainly on ancient languages of the greater Iran, including Old Persian, Avestan, Pahlavi, and also Middle Persian. Within the past century, Modern Persian linguistics has become an active topic of research and many noteworthy volumes on different subfields have been published across the world. However, most of these volumes have been somewhat fragmented focusing on certain subfields or have been gathered in a conference proceeding format, which does not necessarily cover all the subfields of theoretical and applied linguistics. The present handbook is thus the first comprehensive attempt to bring together all subfields of historical, theoretical, and applied Persian linguistics into one cohesive volume. We have invited the internationally renowned leading scholars of major subfields of Persian linguistics to contribute to this project either as an author or a reviewer. At the same time and as much as possible, the volume aims to maintain accessibility to those outside the immediate specialization of the authors so that the book can also be informative for students and non-​specialist readers. In order to have a broader spectrum of contributors, we have included scholars from across the world to collaborate on this project. Of course, no road that is worth travelling is smooth, and we have had our share of challenges in the journey of editing this volume. One of the biggest challenges was to persuade the older generation of scholars, to whom Persian linguistics owes a great deal, to contribute to the volume. This caused our project to be postponed in several occasions, as a few of the original contributors that we had invited to write a chapter withdrew in the middle of the journey, and we had to roll the wheel from point zero again. Another challenge was to find contributors for certain chapters. 
As the readers might be aware, the most-​studied area of Persian linguistics is syntax, hence our having allocated three chapters to different topics in Persian syntax. However, several subfields of Persian linguistics are deeply under-​studied as reflected in the short overview of literature within those subfields. Hopefully the avid readers and students of Persian linguistics will ensure to fill the gaps in the near future. There are six parts to this volume, each of which contains several chapters. Transliterations and transcriptions may vary based on the focus and topic of each chapter. We have decided to honour the choice of terminology each author has chosen as some of the authors have strong feelings about them. For instance, some chapters use the term ‘Tehrani Persian’ and some use ‘Tehran Persian’. The same goes for the terms ‘Tajiki’ and ‘Tajik’. The discussions move from more general to more specific topics encompassing historical, theoretical, and applied linguistics. Part I is on the history and classification of the Persian language, discussing the linguistic change from Old Persian, to Middle Persian, and finally to the New Persian, as well as the typological approaches and dialects of Persian. In Chapter 2, Mauro Maggi and Paola Orsatti look at the evolution of Persian and provide a description of the most significant features of Old, Middle, and New Persian, with an analysis of the main changes over time. In Chapter 3, Mohammad Dabir-​Moghaddam describes the main morphosyntactic typological features of Modern Standard Persian and discusses various linguistic features of a number of currently spoken Persian dialects, namely Tajiki Persian, Afghan (Dari) Persian, Isfahani Persian, and Gha’eni Persian. Part II is on the sound system of Persian, encompassing phonetics, phonology, and the prosodic structure of the Persian language. In Chapter 4, Golnaz Modarresi Ghavami investigates the phonetic aspects of the sound system of Modern Persian. 
She introduces the phonemes of Persian and discusses their articulatory as well as acoustic properties. The chapter also discusses the phonetic aspects of the suprasegmental features of stress and intonation. In Chapter 5, Mahmood Bijankhan investigates the phonology of Modern Persian on the basis of formal and colloquial speech data from the Tehrani dialect. The chapter presents a phoneme inventory and discusses the syllable structure, the phonological processes and rules as well as the interaction of violating markedness and faithfulness constraints. In Chapter 6, Arsalan Kahnemuyipour discusses Persian prosody at the various levels and its interaction with information structure. He also considers the phonetic realization of prosodic prominence and intonation in Persian. Part III is focused on Persian syntax, which, as mentioned earlier, is the most widely studied area of Persian linguistics. Hence, we have assigned three chapters to cover all of the existing work and to capture the essence of Persian syntax and its language-specific features. In Chapter 7, Simin Karimi provides a descriptive overview of some of the major syntactic and morphosyntactic properties of Persian and introduces the major literature provided by grammarians and linguists inside and outside Iran. While Chapter 7 is mainly a description of generative approaches to Persian syntax, primarily from the perspective of the minimalist framework, other approaches to Persian syntax are introduced and discussed in Chapter 8. In this chapter, Jila Ghomeshi presents an overview of some of the other theoretical approaches to Persian syntax and showcases the way in which aspects of Persian syntax have been addressed within a number of different approaches. While the focus is on Persian, some of the discussion of this chapter extends to languages in the same family as well as contact languages. In Chapter 9, Pollet Samvelian discusses three controversial features of Persian syntax: the Ezāfe construction, the enclitic rā, and complex predicates. In this chapter, language-specific challenging facts in each of these three phenomena are described and accounted for.
These language-​ specific phenomena can also be the topic of cross-​linguistic investigation and hence the Persian data is of crucial importance. Part IV, is on language and words, encompassing topics such as morphology, lexicography, and the Academy of Persian Language and Literature. In Chapter 10, Behrooz Mahmoodi-​ Bakhtiari discusses Persian morphology, including lexical and functional morphemes, and looks at a large array of pronouns, nouns, adjectives, adverbs, as well as tense and verbal morphology. He also looks at processes such as compounding and minor word-​formation types in Persian. In Chapter 11, Seyed Mostafa Assi offers a concise chronological overview of Persian lexicographic tradition and discusses the recent advances and developments in the lexicographic publications. In Chapter 12, Mohammad Dabir-​Moghaddam introduces the Academy of Persian Language and Literature by giving an overview of the developments that led to the establishment of the first, second, and third Iranian academy for word-​selection and the activities and contributions of this academy. Part V focuses on language and people, including topics such as sociolinguistics, language contact and identity, as well as pedagogy. The Persian-​speaking countries, which presently include Iran, Tajikistan, and Afghanistan, create a great laboratory for sociolinguists. In Chapter 13, Yahya Modarresi introduces Persian sociolinguistics and discusses concepts such as dialect studies, social variations, contacts, borrowing, and code-​switching. While the official language of Iran is Persian, the mother tongue of a large group of Iranians is a language other than Persian. In Chapter 14, Shahrzad Mahootian focuses on the indigenous languages of Iran and discusses the struggles faced by minority languages similar to that of the other multilingual nations. The subsequent chapter looks at the other side of the same coin. 
The Persian-speaking countries have undergone much political turmoil within the last several decades, which has led to mass migration of Persian-speaking populations to the West. Thus the issue of language maintenance among second-generation Persian speakers gains importance. In Chapter 15, Anousha Sedighi examines the characteristics of heritage Persian speakers in terms of their linguistic and metalinguistic abilities, compares their profiles with those of native speakers and second language learners, and sheds light on the current challenges within this field. In Chapter 16, Pouneh Shabani-Jadidi and Anousha Sedighi explore teaching Persian to speakers of other languages from a variety of perspectives including language acquisition, pedagogy, assessment, and curriculum development. Part VI investigates language, mind, and technology through a more experimental lens. The relevant topics include psycholinguistics, neurolinguistics, and computational linguistics. In Chapter 17, Pouneh Shabani-Jadidi discusses existing studies in Persian psycholinguistics and provides the grounds for comparison between Persian psycholinguistics and psycholinguistic studies in other languages. In Chapter 18, Reza Nilipour discusses current neurolinguistics research from two major subfields of neurolinguistic studies on monolingual and bilingual speakers of Persian: the patholinguistic studies in brain-damaged speakers, and the first experimental fMRI studies on healthy monolingual and bilingual speakers of Persian. This volume would be incomplete without discussing the machine language. In the last chapter, Chapter 19, Karine Megerdoomian provides an overview of Persian computational linguistics by presenting the essential components of computational linguistic analysis, discussing the main challenges of Persian within this field, and showcasing some of the important resources and methodologies developed in the field.
Our intention as the editors of this volume has been to create a central reference for theoretical and applied linguists interested in Persian as well as for scholars specializing in comparative grammar. The contributions are also aimed to highlight crucial problems and to present potential solutions or suggest further lines of research. An important goal of this volume is to enhance communications and collaborations among the scholars of different subfields of Persian linguistics across the globe. Today we are living in a world where scholars are more and more specialized and less and less polymathic, which reduces dialogue and unity within the field. The editors hope that the present volume helps establish a platform where ideas of collaborations emerge. The Oxford Handbook of Persian Linguistics, thus, in one volume, gives critical expression to this language and builds a unifying bridge among its scholars.

Part I


chapter 2

FROM OLD TO NEW PERSIAN

Mauro Maggi and Paola Orsatti *

2.1  Over 2,500 years of Persian Persian had its cradle in and owes its name to the south-western region of Iran called Pārsa in Old Persian (Middle Persian Pārs, New Persian Pārs, Fārs) and Persis in Greek. Among the Iranian languages, which are conventionally divided into the three stages of Old, Middle, and New Iranian, Persian occupies a special position in that it is the only one to be substantially documented in all three periods as Old, Middle, and New Persian. This depends on its close connection with the main political centres for most of the time over the centuries. Old Persian was the language of the ruling dynasty of the Achaemenid empire from the sixth to the fourth century BC and, after the long interval of Greek and Parthian suzerainty over Iran, Middle Persian was the language of the ruling dynasty of the Sasanian empire from the third to the seventh century AD. Subsequently, New Persian was associated with Islamic powers: the Iranian Persian-speaking, Islamized armies that conquered eastern Iran and Transoxiana; the Tahirid, Saffarid, and Samanid courts under the Abbasid caliphate at the very origins of the New Persian literary language between the ninth and the tenth centuries (Lazard 1975a: 595–6, 601–2; Perry 2009a: 52–3); the non-Iranian Persianate dynasties from the end of the tenth century with the Ghaznavids to the early twentieth century with the Qajars; and finally the Persian Pahlavi ruling house in the twentieth century. Though the wide area where Persian was spoken underwent a significant reduction after the second half of the eleventh century due to the spread of Turkic peoples (section 2.21), Persian was an important literary and prestige language far beyond the Persian-speaking area throughout the Islamic period.
The Turkic dynasties that succeeded one another almost uninterruptedly for nine centuries in the Persian-speaking territories had a major role in spreading Persian culture and literature in large areas of Asia. Thus, an important chapter in the history of Persian literature consists of works produced in India from the late Ghaznavid dominion over north-western India in the eleventh and twelfth centuries; Persian kept its function as the learned and official language in India until 1834; and it was the language of official correspondence and diplomacy, as well as a literary language from Ottoman Turkey to Indonesia. In the present time, the three major varieties of Persian are official languages in the modern states of the Islamic Republic of Iran (Contemporary New Persian of Iran), Afghanistan (Afghan Persian, officially called dari, Pashto being another official language), and Tajikistan (Tajik Persian or tojikī) (see also Chapters 3, 11, 13, and 19). In addition, Persian is nowadays spoken by naturalized communities in neighbouring countries, including Pakistan, Turkey, the United Arab Emirates, Azerbaijan, Uzbekistan, and other Central Asian countries, as well as in Europe and North America. It is accordingly possible to follow the historical development of the Persian language over the centuries for more than 2,500 years.1

*  Sections 2.1–2.3 and 2.5–2.11 (Old and Middle Persian) by Mauro Maggi; sections 2.4 and 2.12–2.22 (New Persian) by Paola Orsatti.

2.2  Research on Old Persian A critical bibliography of linguistic studies on Old Persian in the last three decades is offered by Rossi (2008: 95–​111), and a quick general survey by Schmitt (2013: 233–​5). Schmitt (1989, 2004) and de Vaan and Lubotsky (2009) offer general presentations of the language. Schmitt (2009) has an up-​to-​date edition and translation of the entire Old Persian corpus, while Lecoq (1997) presents a complete translation of the inscriptions accompanied by a thorough treatment of Achaemenid culture. Schmitt (2016) classifies the stylistic phenomena of the inscriptions. For grammar, traditionally approached in a historical perspective, Kent (1953) and Brandenstein and Mayrhofer (1964) still prove useful. Skjærvø (2009b) gives a comprehensive updated overview of Old Persian grammar in the framework of Old Iranian (see also Testen 1997 on phonology and Skjærvø 2007 on morphology). Schmitt (2014) updates and summarizes information on Old Persian vocabulary. Hinz (1975) and Tavernier (2007) study the substantial Old Iranian element, including Persian, in other languages (Old Persian is further discussed in Chapters 3 and 11).

2.3  Research on Middle Persian Rossi (1975) offers a well-organized bibliography on Middle Persian but is limited to the years 1966–73. Nawabi (1987: 262–384) has an exhaustive bibliography on Middle Persian along with Parthian. More recently, Durkin-Meisterernst (2013) surveys the studies on Middle Persian in the framework of Middle Iranian. Sundermann (1989b) and Hale (2004) offer sketches of the language (see also Weber 1997 and 2007 on phonology and morphology). Klingenschmitt (2000) is important from the standpoint of historical linguistics. Durkin-Meisterernst (2014) now provides a thorough and up-to-date treatment of all aspects of the grammar of Middle Persian (see also Rastorgueva and Molčanova 1981) and Parthian (see also Skjærvø 2009a), including supplements to Brunner (1977) on syntax. Standard Middle Persian dictionaries are MacKenzie (1971) for Zoroastrian texts (see also Nyberg 1974), Durkin-Meisterernst (2004) for Manichaean texts, and Gignoux (1972: 3–39) for the inscriptions. Proper names are dealt with by Gignoux (1986, 2003) and Zimmer (1991). The compilation of a comprehensive Middle Persian dictionary is underway (Shaked and Cereti 2005). The sections on Middle Persian in the ground-breaking survey of Middle Iranian by Henning (1958: esp. 21–7, 30–7, 43–52, 58–79, 89–92, 97–104) contribute a wealth of information to the history of the language from its very first, sparse documentation in the third century BC onwards. See sections 2.9.1–2.9.4 for references and information on the corpus of Middle Persian writings (see Chapters 3 and 11 for more discussion on Middle Persian).

1  Because of space constraints, for the modern and contemporary periods this chapter is basically restricted to New Persian of Iran and does not cover the other national varieties. For dari and the Persian dialects of Afghanistan see Kieffer (1985: 505–10); Farhadi (1955); Farhadi and Perry (2011); Kieffer (2004). For tojikī see Lazard (1956); Rastorgueva (1964); Perry (2005); Perry (2009b). References in this chapter generally privilege recent publications or the most recent ones on a given subject, where references to earlier literature can be found.

2.4  Research on the history of New Persian A comprehensive history of the New Persian language is still a desideratum. However, there are studies on different stages or aspects of Persian, as well as good grammatical descriptions. A critical discussion of New Persian grammatical studies is given by Windfuhr (1979); a bibliography of linguistic studies is offered by Ahadi (2002), and a critical survey by Ludwig Paul (2013a). As to historical grammar, scholars have at their disposal only the one by Darmesteter (1883). Horn’s description of New Persian (1898–​1901) and Hübschmann’s work (1895) on Persian etymology and historical phonology are still useful. On New Persian etymology, a reference work has been recently provided by Ḥasandust (2014). Among the three main periods considered below (section 2.12), only the first one (Early New Persian) and the last one (Contemporary New Persian) have been studied to some extent from a purely linguistic perspective, while for the second period (Classical New Persian) one has mainly to rely on research based on a stylistic approach (Bahār 1942). Early New Persian has been further discussed in Chapters 3 and 4. For contemporary literary or standard New Persian of Iran, a comprehensive reference grammar is provided by Rubinčik (2001), to which the descriptions by Lazard (1989), Perry (2007), and Windfuhr and Perry (2009) should be added. Phillot’s grammar (1919) is still useful. Less attention has been paid to the spoken informal variety: apart from more or less detailed information in some grammars (e.g. Meneghini and Orsatti 2012: 255–​63) or independent studies (e.g. Alfieri and Barbati 2010), the reference work for the spoken informal variety is that by Lazard (1957). 
For literary Early New Persian, Gilbert Lazard’s ample description (1963) of the language of the most ancient prose texts of New Persian literature is destined to remain the standard authority for many years to come; for the language of ancient New Persian poetry see Lazard (1964: vol. 1, 41–6). In Persian, a comprehensive linguistic study of both prose and verse New Persian texts up to the mid-thirteenth century is given by Xānlari (1986). Very good studies of the language of single authors are offered by editors of classical texts like Maḥjub (1959: 33–56) in the preface to his edition of Gorgāni’s poem Vīs va Rāmīn, or Shafiʿi Kadkani (1987a: 181–209) in the preface to his edition of Asrār al-tawḥīd by Muḥammad b. Munawwar. For the language of the Šāhnāme, Wolff’s glossary (1935) is still a valuable research tool. As to studies more specifically related to the history of the language, a comprehensive and still useful reconstruction of the earliest attestations of New Persian was offered by Henning (1958: 77–81, 86–9). Orsatti (2007b: 102–72) provides a critical survey of the most ancient New Persian documents in Hebrew, Syriac, and Manichaean scripts. The verbal system of literary New Persian from the eleventh to the sixteenth centuries is the subject of a thorough study by Lenepveu-Hotz (2014), who also takes into account Early Judaeo-Persian documents. For the actual phonetic reality of Classical New Persian, Meier (1981) provides us with a true mine of information, mainly based on the analysis of rhymes in early and classical poetry. Telegdi (1955) offers an important historical, mainly lexical study of Persian verbs with the ‘prefixes’ bar, dar, farā, foru, and bāz ~ vā. Two volumes gathering articles on various aspects of the history of Persian have been recently edited by Paul (2003b) and Maggi and Orsatti (2011). Quite useful is the publication of a volume collecting Lazard’s articles on the formation of the New Persian language (1995) and a volume collecting Utas’s contributions to the history of Persian (2013). Finally, mention should be made of recent multi-author works such as the one edited by Karimi, Stilo, and Samiian (2008).

2.5  Old Persian: documentation, use, script, and parallel tradition Old Persian is documented in the inscriptions of the Achaemenid kings (558–330 BC). These epigraphic texts, which mark the beginning of writing in Iranian languages, are free from modifications due to textual tradition, but form a comparatively small corpus (Lecoq 1997; Schmitt 2009; cf. Huyse 2009: 73–83). Most inscriptions are from Fars, ancient Elam, and Media, that is, from the first regions which the Persians occupied and annexed in the seventh and sixth centuries BC after their immigration to south-western Iran and which formed the core of their empire. The inscriptions date from the period from Darius I (522–486 BC) to Artaxerxes III (359–338 BC), but most of them are from the times of Darius I and Xerxes I (486–465 BC). Later texts are short, repetitive, and mostly not accompanied by versions in other languages, unlike the earlier inscriptions, which display Elamite, Babylonian, and, if produced in Egypt, Egyptian texts beside the Old Persian ones as a mark of continuity with the previous powers whose territories had been incorporated into the Persian empire. For more information on Elamite, refer to Chapter 3. Though Old Persian was the Iranian dialect spoken in Fars and the native tongue of the Achaemenids, the language of the inscriptions is a formal language with many loanwords and an archaizing character. Old Persian as we know it from the inscriptions with its special features was meant as a means to promote the prestige of the kings and their feats. Its written use was, thus, very limited. The same holds true for the quasi-alphabetic writing system of the cuneiform Old Persian script (Hoffmann 1976; Lecoq 1997: 59–72, 285), which imitated earlier writing systems of the ancient Near East in the use of wedge-shaped marks and was not conceived for everyday usage, but as a prestige script for a prestige language. This is confirmed by the fact that, after an incubation period, the script was first adopted extensively for adding the Old Persian text (Schmitt 1991) to the original Elamite and Babylonian texts of Darius I’s Bisotun inscription aimed at royal self-portrayal and propaganda following his contested accession to the throne, and that it was employed for epigraphic texts that were partly inaccessible and, thus, not intended to be actually read. This is precisely the case, for instance, of the Bisotun inscription engraved into the rock more than 20 m above the closest point reachable by climbing and 60 m above the nearby caravan trail. Old Persian did not spread across the multiethnic Achaemenid empire, where a large number of languages were in use (Schmitt 1993). The language of the central administration, the official correspondence, and the local administration in some provinces was the so-called Official Aramaic, while Persian had virtually no role in the actual administration of the empire (see Chapter 3 for more information on Aramaic). Even for the court administration in Persepolis the language used was Elamite, just like Babylonian in Babylonia, Egyptian in Egypt, and Greek and other local languages in Asia Minor. A consequence of the multilingualism of the Achaemenid empire is the occurrence, in foreign language sources, of numerous Old Persian and other Iranian words and names that are not preserved in the comparatively small textual corpus and form the so-called parallel tradition (Hinz 1975; Tavernier 2007).

2.6  Old Persian innovations A number of innovations characterize Old Persian as against the other Iranian languages (Iranian languages are further discussed in Chapter 3). The most conspicuous phonological changes—​which enable one to distinguish in part genuine Persian words from loanwords (section 2.7)—​are the following (Schmitt 1989: 68–​70):

1) Old Persian ϑ, d, d, as against s, z, z in the other Iranian languages, from Iranian *ts, *dz, *dz resulting from the Indo-Iranian palatals *ć, *ȷ́, *ȷ́ʰ (cf. Vedic ś, j, h) < Indo-European *ḱ, *ǵ, *ǵʰ: for instance, Old Persian *daϑa ‘ten’ (> Middle and New Persian dah) indirectly attested in *daϑa-pati ‘decurion’, *daϑa-pa- ‘decury’, and *daϑa-hva- ‘one tenth’ of the parallel tradition (Tavernier 2007: 419, 451, 455), but Young Avestan dasa, cf. Vedic dáśa < Indo-Iranian *dáća; Old Persian present stem dā-nā- ‘to know’ (> Middle and New Persian dān-), but Avestan zā-nā-, cf. Vedic jā-nā́- < Indo-Iranian *ȷ́ā-nā́-; Old Persian adam ‘I’ (> Middle Persian an), but Avestan azəm, cf. Vedic ahám < Indo-Iranian *aȷ́ʰám.

2) Old Persian ç (= [ss]?) from Iranian *ϑr resulting from Indo-Iranian *-tr- and preserved elsewhere as such or in its continuations: for instance, Old Persian puça- ‘son’ (> Middle Persian pus and pusar [with -ar by analogy with other nouns of relationship] > New Persian pesar), but Avestan puϑra-, cf. Vedic putrá- < Indo-Iranian *putrá-; Old Persian xšaça- ‘kingdom, kingship, power’, but Avestan xšaϑra- (or the borrowed Middle Persian > New Persian šahr), cf. Vedic kṣatrá- < Indo-Iranian *kšatrá-.

3) Old Persian s, as against sp elsewhere apart from Khotanese and Wakhi š, from Indo-Iranian *ću̯ < Indo-European *ḱu̯: for instance, Old Persian asa- ‘horse’ (also in asa-bāra- ‘horseman’ > Middle Persian aswār > New Persian savār), but Avestan aspa- (or the borrowed Middle Persian asp > New Persian asb) and Old Khotanese aśśa- [aša-], cf. Vedic áśva- < Indo-Iranian *áću̯a-.

4) Old Persian šiy from the Iranian cluster *ϑi̯ resulting from Indo-Iranian *ti̯ and preserved elsewhere: for instance, Old Persian hašiya- ‘true’, but Avestan haiϑiia- < Iranian *haϑi̯a-, cf. Vedic satyá- < Indo-Iranian *sati̯á-.
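The four correspondences listed above can be collected in a small lookup table. The following is only an illustrative sketch: the class name `Correspondence`, the table `TABLE`, and the function `old_persian_outcome` are hypothetical names introduced here, and the table records nothing beyond the outcomes and example pairs already cited in points 1) to 4).

```python
from typing import NamedTuple, Tuple


class Correspondence(NamedTuple):
    source: str                    # reconstructed source sound or cluster
    old_persian: str               # Old Persian outcome
    elsewhere: str                 # outcome in the other Iranian languages
    example: Tuple[str, str, str]  # (Old Persian, Avestan, gloss)


# Rows follow points 1) to 4) above; nothing is added beyond the text.
TABLE = [
    Correspondence("*ts", "ϑ", "s", ("*daϑa", "dasa", "ten")),
    Correspondence("*dz", "d", "z", ("adam", "azəm", "I")),
    Correspondence("*ϑr", "ç", "ϑr", ("puça-", "puϑra-", "son")),
    Correspondence("*ću̯", "s", "sp", ("asa-", "aspa-", "horse")),
    Correspondence("*ϑi̯", "šiy", "ϑi̯", ("hašiya-", "haiϑiia-", "true")),
]


def old_persian_outcome(source: str) -> str:
    """Return the Old Persian outcome recorded for a source sound."""
    for row in TABLE:
        if row.source == source:
            return row.old_persian
    raise KeyError(source)


if __name__ == "__main__":
    for row in TABLE:
        op, av, gloss = row.example
        print(f"{row.source}: Old Persian {op} vs Avestan {av} ({gloss})")
```

A table of this kind only restates the correspondences; it is not a diagnostic for loanwords, which requires the philological argument given in section 2.7.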

In the nominal and pronominal inflections, the Indo-European and Indo-Iranian eight case system (nominative, accusative, vocative, genitive, dative, ablative, instrumental, locative), which is still preserved in Avestan, was reduced to six cases in Old Persian in that all functions of the dative were subsumed by the genitive endings and the ablative virtually merged with the instrumental. Moreover, several other originally differing endings came to coincide because of the loss of most final consonants, so that, for example, a single ending -āyā stands for the genitive-dative, ablative (< Iranian *-āyah), locative, and instrumental singular (< Iranian *-āyā) of the ā-declension nouns (see Chapter 9 for more discussion on case). Likewise, the verbal system exhibits restructuring with losses and innovations (Skjærvø 1985). Notably, there is no longer any opposition of aspect between the rare aorist forms and the prevailing imperfect, which denotes both progressive and completed action (see section 2.6.1 on the inherited perfect and a new periphrastic past tense), so that the formal third singular aorist active adā ‘he created’, preferred by Darius I and others in the solemn formula seen in (1a), interchanges in otherwise virtually identical contexts with Darius’s two occurrences of the more colloquial imperfect adadā (1b–c):2

(1)
a  haya šiyāti-m adā martiya-hyā (DNa 3–4 etc.)3
   who.M[NOM.SG] happiness-ACC.SG create.AOR[3SG] man-GEN.SG
b  haya šiyāti-m adadā martiya-hyā (DSe 4)
   who.M[NOM.SG] happiness-ACC.SG create.IPRF[3SG] man-GEN.SG
c  haya adadā šiyāti-m martiya-hyā (DNb 2–3)
   who.M[NOM.SG] create.IPRF[3SG] happiness-ACC.SG man-GEN.SG
   ‘(Ahuramazdā) who created happiness for man’.

Peculiar to Persian from its earliest stage are also some lexical items. Thus, the Indo-Iranian verbs for ‘to speak, say’ *u̯ač- and *mrau̯H-/mruH- are continued in Avestan as vac- and mrū- (cf. Vedic vac- and brav-), but are replaced in Old Persian by verbs that apparently were originally used with honorific force: θanh- ‘to say’ (cf. Middle Persian saxwan ‘word, speech’ > New Persian saxon, soxan) from an original meaning ‘to praise, announce’ witnessed by Avestan saŋh- ‘to announce, declare’ and Vedic śaṃs- ‘to praise, announce’; and gaub- ‘to say’, only attested in the middle diathesis with the meaning ‘to call oneself’ (> Middle Persian gō(w)-, guftan ‘to say, speak’ > New Persian gū(y)-, goftan), from an original meaning ‘to praise, announce’ witnessed by Sogdian γwβ- ‘to praise’, Choresmian γwβ(y)- ‘to praise oneself, boast, be proud’, etc. Similarly, Old Persian does not continue Indo-Iranian *ćrau̯-/ćru- ‘to hear’ (cf. Avestan sru- and Vedic śrav-),4 but has the vivid metaphorical ā-xšnu- ‘to hear’ ← ‘to sharpen (the ears)’ (> Middle Persian āšnaw-, āšnūdan, cf. New Persian šenav-, šenudan), whose original meaning is preserved in Avestan hu-xšnuta- ‘well-sharpened’ and Vedic kṣṇav- ‘to whet, sharpen’ (cf. Schmitt 1989: 84; Cheung 2007: 113–14, 334, 456–7).

2  Labels with specialized meanings and additional labels used in this chapter for glossing are: aor = aorist, ezf = ezafe (also for the Middle Persian relative particle, 2.10.2), gen = genitive-dative, hort = hortative particle, ins = instrumental-ablative, iprf = imperfect, iprt = imperative.
3  The now standard system of sigla for referring to the Old Persian inscriptions was introduced by Kent (1953) and expanded by others (Schmitt 2009: 7). Old Persian quotations in this chapter are basically from Schmitt’s edition (note r̥ = [ər]), but no dot is used in a.u = aʰu etc., as no ambiguity with the diphthong au̯ is possible, and accent marks are added when appropriate.

2.6.1 New perfect and pluperfect The old synthetic perfect occurs only once in the optative (caxriyā third singular active to kar-​‘to do’ in DB 1.50) and is actually replaced, in the indicative, by a new periphrastic formation with resultative value (Skjærvø 2009b: 144–​5). This new perfect consists of the -​ta-​ past participle and the auxiliary ah-​‘to be’, which is omitted in the third singular, and occurs with both intransitive (2) and transitive verbs (3): (2)

Pārsa-hyā martiya-hyā dūrai̯ r̥šti-š parāgmatā
Persian-GEN.SG man-GEN.SG far.away spear-NOM.SG arrive.PST.PTCP.F[NOM.SG]
‘the Persian man’s spear has arrived far away’ (DNa 43‒5).

(3)
Çūsā-yā paru fraša-m framāta-m,
Susa-LOC.SG much(NOM.SG.N) wonderful-NOM.SG.N order.PST.PTCP-NOM.SG.N,
paru fraša-m kr̥ta-m
much(NOM.SG.N) wonderful-NOM.SG.N do.PST.PTCP-NOM.SG.N
‘In Susa, much wonderful work has been ordered, much wonderful work has been executed’ (DSf 56–7).

When the copula is in the imperfect, the formation has pluperfect value (4): (4)

xšaça-m, taya hacā amāxam tau̯mā-yā
kingship-NOM.SG REL[NOM.SG.N] from 1PL.GEN family-INS.SG
parābr̥ta-m āha, ava adam
take.away.PST.PTCP-NOM.SG.N be.IPRF[3SG], that[ACC.SG.N] 1SG.NOM
patipada-m akunav-am
do.IPRF-1SG
‘I re-established the kingship which had been taken away from our family’ (DB 1.61‒2).

4  The conservative Old Persian past participle passive çuta-​‘famous’ ← ‘heard of ’ is only found as the first member in proper names attested in the parallel tradition (Tavernier 2007: 161–​2).

Since the -ta- past participle has a passive meaning with transitive verbs, their new perfect is also only passive and contrasts with the present and the imperfect that have both active and passive constructions. When an agent is expressed, this is in the genitive-dative (5):

(5)
utā taya manā kr̥ta-m utā
and REL[NOM.SG.N] 1SG.GEN do.PST.PTCP-NOM.SG.N and
taya=mai̯ piç-a kr̥ta-m,
REL[NOM.SG.N]=1SG.OBL father-GEN do.PST.PTCP-NOM.SG.N,
avas=ci Auramazdā pā-tu
that.ACC.SG.N=INDF Ahuramazdā[NOM.SG.M] protect-IPRT.3SG
‘and may Ahuramazdā protect whatever I have done and whatever my father has done’ (← ‘ ... what has been done by me and what has been done by my father’) (XPa 19‒20).

This construction with the agent in the genitive-​dative, often referred to as the ‘manā kr̥tam construction’, is the systematization of inherited expressions occasionally found in Avestan (Haig 2008: 23–​88; Jügel 2015: 68–​80, 322–​4, 571–​8) and is at the origin of the Middle Persian ergative construction (section 2.10.5). The so-​called ‘potential construction’ consisting of a past participle and the verbs kar-​‘to do’ (active) and bav-​‘to become’ (passive) expresses (successful) completion of an action in Old Persian (Filippone 2015).
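The properties of the new periphrastic formation described in this section can be summarized as a few rules. The sketch below is merely a schematic restatement under stated assumptions: the function name `describe_periphrastic` and its parameter names are hypothetical, and the rule that the auxiliary is dropped only with a present third singular copula is an inference from the statement that the auxiliary is omitted in the third singular together with the overt imperfect āha in example (4).

```python
def describe_periphrastic(transitive: bool, copula_tense: str,
                          person: int, number: str) -> dict:
    """Summarize the -ta- participle + ah- 'to be' formation.

    copula_tense is "present" (new perfect) or "imperfect" (pluperfect).
    """
    values = {"present": "perfect", "imperfect": "pluperfect"}
    if copula_tense not in values:
        raise ValueError("copula_tense must be 'present' or 'imperfect'")

    # Assumption labelled in the lead-in: omission applies to the
    # present third singular only; example (4) has overt imperfect aha.
    copula_overt = not (copula_tense == "present"
                        and person == 3 and number == "sg")

    return {
        "value": values[copula_tense],
        # The -ta- participle is passive with transitive verbs.
        "voice": "passive" if transitive else "non-passive",
        "copula_overt": copula_overt,
        # An expressed agent stands in the genitive-dative
        # (the 'manā kr̥tam construction').
        "agent_case": "genitive-dative" if transitive else None,
    }


if __name__ == "__main__":
    # Example (3): transitive kr̥ta-m 'has been done', no overt copula.
    print(describe_periphrastic(True, "present", 3, "sg"))
    # Example (4): parābr̥ta-m āha 'had been taken away'.
    print(describe_periphrastic(True, "imperfect", 3, "sg"))
```

The sketch deliberately leaves out the optative perfect and the 'potential construction', for which the text gives no comparable rule.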

2.6.2 Beginnings of the ezafe construction The relative pronoun haya-/taya-5 used to join a modifier to a usually preceding substantive (Kent 1953: 85; Skjærvø 2009b: 100–1) is another construction which is also found in Avestan and developed further as the Middle Persian relative particle (section 2.10.2) and the New Persian ezafe. For more discussion on ezafe, refer to Chapters 3, 6, 7, 9, and 19. The modifier can be an apposition, an adjective (6), or a modifying noun or pronoun (7):

(6)
hau̯=mai̯ ima xšaça-m frābara taya
he=1SG.OBL this[ACC.SG.N] kingdom-ACC.SG.N bestow.IPRF[3SG] REL[ACC.SG.N]
vazr̥ka-m taya uvasa-m
great-ACC.SG.N REL[ACC.SG.N] possessed.of.good.horses-ACC.SG.N
umartiya-m
ACC.SG.N
‘He bestowed this kingdom on me, great, possessed of good horses (and) good men’ (DSf 11–12).

(7)
kāra haya manā ava-m kāra-m
army[NOM.SG] REL.M[NOM.SG] 1SG.GEN that-ACC.SG.M army-ACC.SG.M
taya-m Vahyazdāta-hya6 aja vasai̯
REL-ACC.SG.M Vahyazdāta-GEN.SG beat.IPRF[3SG] greatly
‘my army beat that army of Vahyazdāta greatly’ (DB 3.45‒6).

With appositions, the case of the relative pronoun and the apposition is the same as that of the modified noun: nominative Gaumāta haya maguš (DB 1.44), accusative Gaumātam tayam magum (DB 1.49–50) ‘Gaumāta the Magian’.

5  The stem haya- is used for the nominative singular masculine and feminine, the stem taya- for all other cases. The Old Persian relative pronoun is an innovation which resulted from the univerbation of the Indo-Iranian demonstrative pronoun *sá-/tá- and relative pronoun *i̯á- (Avestan ha-/ta- and ya-).
6  So spelled.
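The distribution of the two stems stated in footnote 5 above can be put as a one-line rule. The following sketch only restates that rule; the function name `relative_stem` and the spelled-out feature labels are hypothetical choices made for illustration, and the function returns a stem, not an inflected form.

```python
def relative_stem(case: str, number: str, gender: str) -> str:
    """Pick the stem of the Old Persian relative pronoun.

    haya- serves the nominative singular masculine and feminine,
    taya- everything else (as stated in the footnote).
    """
    if (case == "nominative" and number == "singular"
            and gender in ("masculine", "feminine")):
        return "haya-"
    return "taya-"


if __name__ == "__main__":
    # Gaumāta haya maguš (nominative) vs Gaumātam tayam magum (accusative)
    print(relative_stem("nominative", "singular", "masculine"))
    print(relative_stem("accusative", "singular", "masculine"))
```

The neuter nominative singular falls under taya-, in line with the gloss REL[NOM.SG.N] for taya in the examples of section 2.6.1.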

2.7  Loanwords in Old Persian As there are Old Persian words and names in the parallel tradition, so there are a number of loanwords in the Achaemenid inscriptions. A Semitic word such as Aramaic mašk ‘skin’ with the emphatic state suffix -ā (rather than Babylonian mašku) was borrowed as the -ā- declension word maškā- (> Middle and New Persian mašk ‘leather bottle’) to refer to the ‘(inflated) skins’ used by Darius’s army as floats to cross the Tigris (DB 1.86: Schmitt 2014: 213). Most loanwords, however, concern kingship and administration and are probably of Median origin, though the phonological developments observed in these loanwords differ from the Old Persian ones but are not specifically Median: because the Persians had been subject to the Medes until the conquest of Media by Cyrus II (558–530 BC), it is only natural that the Persians regarded themselves as their political heirs and took up their political terminology. Thus, on the one hand, it is virtually certain that the epithets vispa-zana- ‘having all (kinds of) men’ and uv-aspa- ‘having good horses’ that qualify the empire (and contrast with everyday Old Persian visa- ‘all’ and asa- ‘horse’ < Indo-Iranian *u̯íću̯a- and *áću̯a-) come from Median because the outcome sp of Indo-Iranian *ću̯ is documented by the Median form spáka ‘bitch’ quoted by Herodotus (Histories 1.110.1; cf. Old Persian *saka- ‘dog’ > Middle and New Persian sag). On the other hand, one can only postulate a Median origin for xšāyaϑiya- ‘king’ (> Middle and New Persian šāh), with -ϑiy- instead of expected Old Persian -šiy- < Iranian *-ϑi̯-, because the Median outcome of Iranian *-ϑi̯- is not otherwise known (Schmitt 1984: 185–96). Part of the political terminology adopted from Median goes ultimately back to earlier Near Eastern formulas: for example, the expression vašnā Auramazdāha ‘by the greatness/might of Ahuramazdā’ (Skjærvø 2007: 903, 935, instrumental of *vazar-/vašn- ‘greatness’, cf. vazr̥ka- ‘great’ > Middle Persian wuzurg > New Persian bozorg) corresponds to Urartian Ḫaldinini alsuišini/ušmašini ‘by the greatness/might of Ḫaldi’; and the title xšāyaϑiya xšāyaϑiyānām ‘king of kings’ corresponds to Babylonian šar šarrāni. Both formulas betray a non-Iranian origin because the modifying genitives (Auramazdāha, xšāyaϑiyānām) follow the modified nouns instead of preceding them, as is commonly the case in Old Iranian. The regular word order of the title was restored in Middle Persian šāhān šāh > New Persian šāhan-šāh (Meillet and Benveniste 1931: 14–15; Colditz 2003: 63–4).

2.8  From Old to Middle Persian

Grammar and spelling mistakes frequently found in the inscriptions of Artaxerxes I (465–24), II (404–359), and III point to a language already approaching Middle Persian with confusion and loss of endings and conflation of different antecedents. Some of the mistakes are unsuccessful endeavours to restore the by then archaic forms (Kent 1953: 23–4; Schmitt 1989: 60): for example, the genitive-dative singular (formed by appending the a-declension ending -ahyā also to nominatives of other declensions) occurs instead of the nominative and vice versa in genealogies from Artaxerxes I onwards and, in an inscription of Artaxerxes III, the regular i-declension singular accusatives būmim ‘earth’ and šiyātim ‘happiness’ (e.g. DNa 2, 4) are replaced by būmām and šāyatām (A³Pa 2, 4) with the more common ā-declension ending. The latter mistake also reveals that the word had become šāt or the like by the mid-fourth century (cf. the historical Pahlavi spelling for the Middle Persian adjective šād ‘happy’ < Old Persian šiyāta-) because the -ā- resulting from -iyā- was erroneously restored as -āya- on account of the coincident outcome -ā- from earlier -āya- found, for instance, in xšāyaϑiya- ‘king’ > Middle Persian šāh (see Schmitt 1999: 59–118 for a detailed study of the features of ‘Late Old Persian’).

During the long Greek and Parthian domination of Iran by the Seleucids (305–125 BC) and the Arsacids (247 BC–224 AD), the documentation of Persian is scarce and provides little linguistic information: a damaged and hardly readable inscription on Darius I’s tomb at Naqš-e Rostam near Persepolis, where the words ḥšʾyty wzrk ‘great king’, mʾhy ‘month’, and possibly slwk ‘Seleucus’ have been recognized, is thought to have been written phonetically in Early Middle Persian in Aramaic script at the request of some noble Persian in the early Seleucid period (Boyce and Grenet 1991: 118–20); the legends on the third series of coins of the rulers of Fars, where the aramaeogram BRH (from Aramaic br-eh ‘his son’, with a suffixed pronoun inappropriate to the context, instead of the bar one would expect if the legends were actually written in Aramaic) must stand for Middle Persian pus ‘son’, attest to the use of aramaeograms (conventionally transliterated by capital letters) for writing Persian from about the end of the second century BC (Henning 1958: 25); and an inscription on a bowl from the time of Ardašīr II, king of Fars in the second half of the first century BC and a vassal of the Arsacids, is the first known and readable Middle Persian inscription (Skjærvø 1997a).

Under the Arsacid dynasty, Parthian gained a dominant position in Iran as the vehicle of Iranian culture, including oral epic poetry (Boyce 2003), and this caused a first batch of Parthian words, recognizable from phonological changes contrasting with the Persian ones, to enter Middle Persian (cf. section 2.9 on later Parthian loanwords), whence they then reached New Persian as in the case of such a political term as Parthian šahr → Middle Persian šahr ‘kingdom, country; city’ > New Persian šahr ‘city, town’ (as against Old Persian xšaça- < Iranian *xšaϑra-: see Tedesco 1921: esp. 198–9 on -hr- and cf. section 2.6, no. 2) or such a term common in military and epic contexts as Parthian asp → Middle Persian asp > New Persian asb ‘horse’ (as against Old Persian asa- < Indo-Iranian *áću̯a-: cf. section 2.6, no. 3).

2.9  Middle Persian: documentation and scripts

Middle Persian formed in post-Achaemenian times as a development of Old Persian and its use was confined to Fars until the rise of the Sasanian dynasty (224–651 AD), when it began not only to be more substantially employed in writing, but also to spread outside its region of origin, as it became the language of administration and communication in the Sasanian empire. Middle Persian continued to be used as a living language for a while in post-Sasanian times and as a church language by the Zoroastrians in Iran and India and the Manichaeans in Chinese Central Asia. Major Middle Persian texts date from the third century, on account of the connection of the language with the ruling dynasty, and the ninth century, when it enjoyed a revival due to the endeavour on the part of the Zoroastrians to preserve their religious tradition after the spread of Islam. The use of Middle Persian thus spans many centuries and extends well beyond Fars. This is the reason why various linguistic stages and developments are mirrored by the different text groups that document it (survey in Durkin-Meisterernst 2014: 15–25 with references).

2.9.1 Inscriptional Middle Persian

The first substantial text group consists of the inscriptions (Huyse 2009: 90–102). Most important and comparatively extensive are the third-century ones of the Sasanian kings Šābuhr I (241–72) on the Kaʿbe-ye Zardošt at Naqš-e Rostam and Narseh I (293–302) at Pāikūlī in Iraqi Kurdistan, the prominent Zoroastrian priest Kerdīr, and the court dignitary Abnūn. The other royal inscriptions (none are known after Šābuhr III, 383–8), the coin legends, and the inscriptions on seals, gems, bullae, and vessels provide less linguistic information.⁷ Similarly to what happened with the Achaemenids, only the first Sasanians Ardašīr I (224–40) and Šābuhr I produced trilingual inscriptions in Middle Persian as well as in Parthian and Greek in continuity with the previous imperial powers of the Seleucids and the Arsacids, while Šābuhr I already gives up Greek in a few inscriptions and Narseh in the Pāikūlī inscription uses only Parthian besides Middle Persian. Inscriptions by subsequent, fourth-century kings are in Middle Persian only.

2.9.2 Manichaean Middle Persian

A second text group, essential for the study of Middle Persian phonology, is the Manichaean literature in Middle Persian (Sundermann 2009) initiated in the third century by Mani himself (216–77), the founder of the Manichaean religion. He was at the court of King Šābuhr I and dedicated to him a description of his doctrine in Middle Persian titled Šābuhragān, which survives in comparatively extensive fragments. A number of other dogmatic, homiletic, and hymnic works composed by Mani and his disciples and followers are known from Middle Persian manuscript fragments recovered from Turfan in Chinese Central Asia. The bulk of them, either translations or original compositions, must go back to the time when Persian-speaking Manichaeans were still in Iran, before escaping persecution by the Zoroastrians, and bears witness to the language as spoken in the early Sasanian centuries (Durkin-Meisterernst 2014: 9–11), though a few texts may have been produced in Central Asia and some late features are occasionally detected (Durkin-Meisterernst 2003).

7  The same applies to the legal and administrative documents (a few third-century texts from Dura-Europos in present-day Syria, various papyri from seventh-century Egypt, documents on parchment and linen from seventh-century Iran (Weber 2008), and a number of ostraka from post-Sasanian Iran) and the late private inscriptions in cursive script mostly on tombs (Huyse 2009: 100–5), including the Middle Persian-Chinese one from Xi’an in China (Rezai Baghbidi 2011).


2.9.3 Zoroastrian Middle Persian

Since Narseh’s Pāikūlī inscription (between 293 and 296) is the last royal inscription with a Parthian version and none of the few short private Parthian inscriptions in Parthia is presumably later than the fourth century, it is likely that, at some point, the Sasanians imposed Middle Persian as the only official and written language of Iran with the consequence that it gradually spread everywhere from the fourth century on and even became the only recognized language of Zoroastrianism as the state religion of the Sasanian empire, thus marking the cultural triumph of Persia. The replacement of Parthian by Persian outside Fars brought about, by reaction, the introduction of a further number of Parthian loanwords in Persian (Boyce 1979: 116–17; see also section 2.14 with n. 22). This is why, whereas the Manichaean texts basically represent genuine Middle Persian in its provincial purity (Sundermann 1989b: 139), a large number of Parthian loanwords characterizes conversely the late Sasanian speech varieties mirrored in the literature in Zoroastrian Middle Persian (also called Book Pahlavi). Though no manuscript is earlier than the fourteenth century, the Zoroastrian books were produced in the ninth and tenth centuries also on the basis of earlier textual tradition and form the third and largest text group of Middle Persian, which comprises, besides translations of large portions of the Avesta, other religious, doctrinal, didactical, and juridical texts, as well as a few non-religious ones (Macuch 2009). It is noteworthy that, unlike later Zoroastrian Middle Persian, the early Avesta translations are linguistically conservative and preserve a morphology and syntax comparable to Inscriptional and Manichaean Middle Persian and the Pahlavi Psalter (section 2.9.4) and mirror a comparably early stage of the language (Cantera Glera 1999, who contrasts ‘Old Pahlavi’ with Book Pahlavi).
For more information on Pahlavi, refer to Chapters 3 and 11.

2.9.4 Christian Middle Persian

After their separation from the patriarchate of Antioch and the Western church chiefly for political reasons in the fifth century, Christians in Sasanian Iran used Middle Persian both in original texts and in translations before eventually abandoning it in favour of Syriac as their sole church language (Henning 1958: 77–8; Sims-Williams 1992: 534). The scant remnants of Middle Persian texts produced and used by Christians form a fourth, small text group consisting only of fragments of the so-called Pahlavi Psalter (ed. Andreas and Barr 1933) dated by scholars between the fourth and the seventh century (Durkin-Meisterernst 2006: 6–7)⁸ and a fragmentary list of Pahlavi aramaeograms, both found in Bulayïq (Turfan).

2.9.5 Middle Persian scripts

Apart from Manichaean texts, for which the clear and unambiguous Manichaean script is used, all other text groups are written in varieties of the highly conservative Pahlavi script derived from the script used in the Achaemenid period for writing Official Aramaic (Skjærvø 1996; Durkin-Meisterernst 2014: 29–74).

8  Carbon-14 dating of the Pahlavi Psalter now shows it to be not earlier than the late eighth or ninth century (Dieter Weber, lecture at a workshop in Berlin in 2010).

The Pahlavi script is characterized by a heterographic writing system that combines words and endings written phonetically with hundreds of frequently occurring words (verbal and nominal stems, pronouns, prepositions, adverbs, and conjunctions) written as aramaeograms (or heterograms), that is, written in their Aramaic or pseudo-Aramaic shape but read as the corresponding Persian words (Utas 1988), much like the Latin ligatures & and @, which one reads as ‘and’ and ‘at’ in English (cf. pus ‘son’ in section 2.8). For more information on heterograms, refer to Chapter 11. The script also abounds in historical spellings that mirror the phonology of the language in the last centuries BC and probably no longer correspond to the evolution of the language at the time when the texts were written from the third century on, as is shown by the contemporary Manichaean spellings: thus, the Manichaean spellings indicate that the word for ‘father’ was already pronounced pid, pidar (direct and oblique < Old Persian nominative pitā and accusative *pitaram, section 2.10.1, no. 1) with voiced postvocalic d in AD 300 ± 50, in contrast with the Pahlavi spellings with historical t, which are used besides the corresponding aramaeograms (see the groundbreaking article by MacKenzie 1967, and its implementation in MacKenzie 1971). In contrast to the conservative writing conventions of the Pahlavi script, which lasted unchanged until its demise and even introduced pseudo-historical spellings, its ductus underwent a process of cursivization, which increased the intricacies of the script in that the shapes of several letters came to coincide in the Zoroastrian books and especially the papyri and the ostraka (Henning 1958: 46–9).⁹

2.10  A new language type: survivals and innovations

In comparison with Old Persian, Middle Persian (Table 2.1) is characterized by phonological changes that resulted in a phonemic system very close to the Early New Persian one:

1) lenition of consonants in non-initial position through
   (a) voicing of the old voiceless occlusives p, t, k > b, d, g after voiced sounds¹⁰ (xšap- ‘night’ > šab; nominative brātā ‘brother’ > direct brād; bandaka- ‘vassal, follower’ > bandag ‘servant’);
   (b) voicing of old č > *ǰ after vowels and subsequent assibilation and depalatalization of secondarily voiced *ǰ and original ǰ > *ž > z in all positions (hacā ‘from’ > az; present stem jīva- ‘to live’ > zī(w)-);
   (c) spirantization of the old voiced occlusives b, d, g > w, y, y (naiba- ‘good’ > nēw; pāda- ‘foot’ > pāy; baga- ‘god’ > bay);
2) contraction of the old diphthongs ai̯, au̯ > ē, ō (dai̯va- ‘demon’ > dēw; gauša- ‘ear’ > gōš) and introduction of short e and o (dahyu- ‘land, district’ > deh ‘land; village’; Auramazdā- > Ohrmezd; Durkin-Meisterernst 2014: 131–2);


9  After the Pahlavi script was restricted to Zoroastrian circles in Islamic times, Middle Persian texts in Pahlavi script were occasionally transposed into Avestan script (Pāzand, twelfth century) or even Arabo-Persian script (Pārsī, twelfth and thirteenth centuries) with adaptation to contemporary spoken Persian and replacement of aramaeograms and difficult words (Durkin-Meisterernst 2014: 23–5). On the possibility of obtaining information on Early New Persian from Pāzand texts, see Lazard (1991); Klingenschmitt (2000: 195–6); and the criticism by de Jong (2003).

10  Cantillations and transcriptions in Sogdian script of Manichaean Middle Persian texts usually record the late allophones [β δ γ] of postvocalic /b d g/ that Middle Persian shares with Early New Persian (Durkin-Meisterernst 2014: 58, 116–17; cf. section 2.16.2).

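Table 2.1 Middle Persian phonemes (adapted from Durkin-Meisterernst 2014: 114ff.)¹¹

[The layout of the table is not recoverable from the source text. Surviving entries: vowels ī i, ū u, ə (?), ā a; consonants b, m, k, c, t, f, j (ž), x, xw, γ (?), h, ŋ (?), r, l; row labels Plosive, Affricate, Spirant, Nasal, Liquid, Approximant; column labels Labial, Labiodental.]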



3) general loss of vowel and coda in final syllables due to accentuation of the previous syllables (pati ‘in, at’ > pad; mártiya- ‘man’ > mard; genitive-dative plural martiyā́nām > oblique plural mardān).¹²

The last-mentioned change amounted to the loss of very many of the old nominal and verbal endings and brought about the disintegration of the Old Persian morphological system and its restructuring into a new Middle Persian one. Nouns and pronouns no longer distinguish gender (as against Old Persian masculine, feminine, and neuter) and their inflection is reduced first to two cases and then one case in the singular and plural, the dual number having vanished (see Chapter 9 for more discussion on case). Also the verbal inflection lost many forms and categories: the rare aorist, perfect, and future¹³ temporal stems, the middle diathesis, the dual number, and virtually all the secondary endings are not continued, modal forms are much reduced, and rare imperfect and -ya- passive forms survive for a short time. Each verb has just a present stem, from which analytical forms are obtained, and a past participle in -t/-d (< Old Persian -ta-), which combines with auxiliary verbs into past periphrastic formations. The loss of the old inflectional richness affected heavily the morphology and syntax of Persian, which greatly expanded the use of periphrastic verb forms (section 2.10.5) and resorted more and more to preverbal particles (bē lit. ‘out’, hamē lit. ‘always’) to express aspectual distinctions (Brunner 1977: 157–68; Durkin-Meisterernst 2014: 388–90) and prepositions (especially pad ‘to, at, in, on, etc.’, ō ‘to, at, etc.’, az ‘from, because of, etc.’, and the postposition rāy ‘for, for the sake of, etc.’) to express and disambiguate syntactical functions of nouns and pronouns, including those of the agent through pad (8) and az (9) and the direct object through ō (10), rarely used for inanimate objects (Paul 2003a: 188–90; Durkin-Meisterernst 2013: 251), and, in late texts, rāy (11), though agent and direct object were basically expressed by the oblique case alone, which was mostly endingless in the singular (see Brunner 1977: 116–55; Durkin-Meisterernst 2014: 298–359, 386, for the sources of the examples; cf. section 2.17.8, with n. 32):

(8)
ud  hān  rōšnī ud  xwašan īg  yazd-ān, ī     az   nox       pad āz ud
and that light and beauty EZF god-PL,  which from beginning by  Āz and
Ahremen ud  dēw-ān   ud  parīg-ān zad            būd,         ...
Ahremen and demon-PL and witch-PL smite.PST.PTCP be.PST.PTCP, ...
‘and that light and beauty of the gods, which was smitten in the beginning by Āz and Ahremen and the demons and the witches, ...’.

(9)
dēn      ī   xwarāsān pad wisp  šahr    ud  pāygoš āfur-īh-ā-nd
religion EZF east     in  every country and region bless-PASS-PRS.SBJV-3PL
jāydān   az wisp yazd-ān
for.ever by all  god-PL
‘The religion of the east should be for ever blessed by all the gods in all countries and regions!’

(10)
uzdēsparist-ān kē  parist-ē-nd         pahikar-ān
idolator-PL    who worship-PRS.IND-3PL image-PL
‘idolators who worship images’.

(11)
ābāyed ka¹⁴ ān    šagr-ān rāy zīndag ō  amāh āwar-ē-d
       that those lion-PL OBJ alive  to 1PL  bring-PRS.IND-2PL
‘It is necessary that you bring us those lions alive’.

11  In this chapter, vowels with macrons are used in the phonological and the conventional scholarly transcriptions for long vowels. Likewise, ’ is used for [ʔ], c j for [ʧ] [ʤ], š ž for [ʃ] [ʒ], and y for [j].

12  On the transformations of the accentual systems from Proto-Iranian to Old and Middle Persian, see Klingenschmitt (2000: 210–15); Huyse (2003: esp. 47–61, 95).

13  The only possible but debated future form in Old Persian is the ‘historical future’ patiyāvanhyai̯ ‘I will/was to implore’ in DB 1.55, if read correctly (Schmitt 2014: 275–6). However, later forms like Middle Persian paywah- (attested in Inscriptional and Manichaean spellings) ‘to implore, entreat’, without -n- and with -h- as part of the present stem, rather point to an Iranian root *u̯ah- ‘to venerate, implore, pray’, which also continues in Avestan, Parthian, and Bactrian, and seem incompatible with the idea that patiy-ā-van-hy-ai contains the future suffix -hy- (< Indo-Iranian *-si̯-) added to the root van- < Indo-Iranian *u̯an- ‘to desire’, otherwise unattested in Iranian (Cheung 2007: 405–6).


Word order is comparatively less free in Middle Persian and may contribute in part to making clear the relationships of words in a clause (Mękarska 1981–4; Durkin-Meisterernst 2014: 262–3). The language thus moved from the mainly synthetic morphological patterns of Old Persian to the decidedly more analytic type of Middle Persian (Henning 1958: 89–90).

2.10.1 Two-case declension and shift from case to number opposition

Table 2.2 The early two-case system of Middle Persian

[Only part of the table survives in the source text: columns Singular and Plural; rows Direct and Oblique for nouns of relationship, other nouns, and the first singular pronoun; oblique plural of nouns of relationship -arān (-arīn, -arūn); oblique plural of other nouns -ān (-īn, -ūn).]

14  See section 2.18 on the late confusion of ka ‘when, if’, kē ‘who, which’, and kū ‘where; that; than’.

Substantives, adjectives, and pronouns preserve conspicuous remnants of a two-case system (Table 2.2) with a direct case used for the subject and the predicate noun, and an oblique case used for direct object, indirect object, agent, to express possession, and with prepositions, the postposition rāy, and the relative particle (in the plural, either the direct or the oblique case could express the direct object). The two cases (Durkin-Meisterernst 2014: 197–203, 206–8; on the prehistory of the system, see Huyse 2003; Cantera 2009) are formally distinguished only in:

1) the singular and plural of the nouns of relationship in -dar < Old Persian -tar- (singular: direct brād < nominative brātā, oblique brādar < accusative *brātaram; plural: direct brādar < nominative *brātara, oblique brādarān < analogical genitive-dative *brātarānām) and pus ‘son’ (Old Persian puça-) with analogical pusar;
2) the first person singular pronoun: direct an < *anam < Old Persian nominative adam (Sims-Williams 1981: 166); oblique man < genitive-dative manā;¹⁵
3) the plural of all other substantives, adjectives, and non-personal pronouns (oblique -ān, more rarely -īn, -ūn < Old Persian genitive -ānām, *-īnām, -ūnām).
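The early two-case paradigm just described can be sketched in code. The following Python fragment is only an illustration under stated assumptions: the function name `decline`, the flag `kinship`, and the dictionary layout are hypothetical, and the endingless direct plural of ordinary nouns is inferred from the statement that such nouns mark only the oblique plural. The rarer oblique plural endings -īn and -ūn and the first person pronoun are left out.

```python
def decline(direct_sg, kinship=False):
    """Return early Middle Persian two-case forms of a noun (illustrative sketch).

    direct_sg: the direct singular, e.g. 'brād' or 'mard'.
    kinship:   True for nouns of relationship (and pus 'son'), which add -ar.
    """
    if kinship:
        # Nouns of relationship: oblique singular and direct plural share
        # the longer form in -ar; the oblique plural adds -ān to it.
        long_form = direct_sg + "ar"
        return {
            ("sg", "direct"): direct_sg,
            ("sg", "oblique"): long_form,
            ("pl", "direct"): long_form,
            ("pl", "oblique"): long_form + "ān",
        }
    # Other nouns: only the oblique plural is formally marked (-ān).
    # ASSUMPTION: the direct plural is endingless, like both singular cases.
    return {
        ("sg", "direct"): direct_sg,
        ("sg", "oblique"): direct_sg,
        ("pl", "direct"): direct_sg,
        ("pl", "oblique"): direct_sg + "ān",
    }


if __name__ == "__main__":
    for (number, case), form in decline("brād", kinship=True).items():
        print(number, case, form)
```

Calling `decline("mard")` yields the oblique plural mardān, and `decline("brād", kinship=True)` yields brād, brādar, brādar, brādarān, matching the forms cited in the list.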

The two-case system occurs coherently only in Inscriptional Middle Persian but is on the verge of dissolution and vanishes in time. Already in the Pahlavi Psalter the oblique plural is used in a few instances as a general plural form (Skjærvø 1983), as regularly happens in Manichaean Middle Persian, which only distinguishes two cases in the nouns of relationship and the first person singular pronoun (Sims-Williams 1981: 166–71). The two-case system is still functional in the early Avesta translations, as is particularly clear for the nouns of relationship (Cantera Glera 1999: 194–202). The further development during the Sasanian period resulted ultimately in the simplification of the system in Zoroastrian Middle Persian, where man is the only form of the first person pronoun, the old oblique -ān is the general plural ending (so that an opposition of number prevails over the opposition of case), and only the nouns of relationship keep the old singular direct and oblique forms (brād, brādar) but without any functional distinction. The next step will be taken by New Persian, where only originally oblique singular forms in -ar survive (singular barādar, plural barādarān). In late texts, both Manichaean and especially Zoroastrian, there also occurs the plural ending -īhā (Durkin-Meisterernst 2014: 201), the antecedent of New Persian -hā (cf. section 2.17.10, n. 35).

15  The enclitic personal pronouns (singular -m, -t, -š; plural -mān, -tān, -šān), most often suffixed to the first word in a clause, only function as oblique in all text groups (Durkin-Meisterernst 2014: 208–10, 291–6), as is still the case in New Persian.


2.10.2 Relative pronouns and relative particle

The Middle Persian relative pronoun and particle ī (Manichaean also īg with suffixal -g < -ka-) continues Old Persian haya-/taya- in both its values as a pronoun proper and as a device connecting substantives with modifiers (section 2.6.2). Middle Persian ī follows head nouns or pronouns and connects them to modifying dependent nouns or nominal phrases (12–13), prepositional phrases (14), adjectives and adjectival phrases (15), and even clauses (16) (see Boyce 1964: 28–9, 37–47; Durkin-Meisterernst 2014: 268–71, for the sources of the examples):

(12)
ō  wimand ī   Xūzestān
to border EZF Xuzestan
‘to the border of Xuzestan’.

(13)
pad nām  ī=š         pidar
in  name EZF=3SG.OBL father
‘in the name of his father’.

(14)
aw-īn   ī   andar diz
that-PL EZF in    fortress
‘those in the fortress’.

(15)
āstānag   ī   naxwistēn ī   abardar mahy    ud  mehmdar az   abārīg-ān
threshold EZF first     EZF higher  greater and broader than other-PL
‘the first threshold, higher, greater, and broader than the other ones’.

(16)
čē      ēn=iz    ī   nūn amāh ō  ēn   handēmān ī   yazd-ān
because this=too EZF now 1PL  to this presence EZF god-PL
mad           h-ē-m,          ...
come.PST.PTCP be-PRS.IND-1PL, ...
‘because this too, that we have now come to the presence of the gods, ...’.

This construction largely compensates for the loss of the Old Persian genitive-dative and other indirect cases, so that it occurs much more commonly in Middle Persian. It is the direct antecedent of the New Persian ezafe construction with -e (doxtar-e bāhuš ‘intelligent girl’, mardom-e Gilān ‘the people of Gilan’, etc.; cf. section 2.17.9), though the Middle and New Persian constructions have partly different functions and the New Persian one occurs even more frequently. In addition to the relative pronoun ī, Middle Persian, like the other Middle Iranian languages, also uses the inherited interrogative pronouns kē ‘who?’ and čē ‘what?’ with relative force, though, in this function, reference to living beings or inanimate things is not always distinguished (Durkin-Meisterernst 2014: 216, 415–30).

2.10.3 Present and modal forms

Middle Persian present stems continue Old Iranian present stems formed by means of a variety of suffixes, whose presence is partly obscured by phonological changes (e.g., with suffix *-nau̯-, Old Iranian *kr̥-nau̯- ‘to do’ > Old Persian ku-nau̯- > Middle Persian kun-; cf. section 2.6, no. 1 for an example [dān- ‘to know’] of the old suffix -nā- and section 2.10.4 on the old suffix -sa-). It is commonly held that, in the inflection of the present (Table 2.3), two suffixes came to prevail: Old Persian -aya- > Middle Persian -ē- for indicative and imperative; Old Persian -a- with the addition of the subjunctive suffix -a- (> -ā- > Middle Persian -ā-) for subjunctive and the optative suffix -i- (-ai̯- > Middle Persian -ē-) for optative, only attested in the third singular -ē < -ēh < *-ait (?) (Sundermann 1989: 148–50). Recently, Durkin-Meisterernst (2014: 241) has suggested that also the subjunctive contains the old suffix -aya- (-aya- + -a- > -ayā- > Middle Persian -ā-), which provides a unified historical explanation of the inflection of the present with parallels in Middle Indo-Aryan. A conspicuous exception is the third singular present of the copula ast ‘is’, which continues the suffixless inherited Old Persian form as-ti with the ending added directly to the root. The rest of the paradigm is levelled and based on the stem h- (hēm, hē, etc.). Differently from Inscriptional and Manichaean Middle Persian, the Pahlavi Psalter, and the early Avesta translations, later Zoroastrian Middle Persian only has subjunctive forms for the third persons singular and plural (Cantera Glera 1999: 177–87; Durkin-Meisterernst 2014: 232–9). In the absence of a specific form for the future, this is expressed by the indicative present as the mood of plain statements and the subjunctive present as the mood of wish and possibility. Combined with the particle ēw/hēb, the indicative acquires an exhortative meaning similar to the imperative and optative (Durkin-Meisterernst 2014: 377–81).

Table 2.3 Manichaean Middle Persian endings of the present (after Durkin-Meisterernst 2014: 232ff.)¹⁶

[The layout of the table is not recoverable from the source text. Surviving entries: Indicative, 1 singular -ēm (-am, -om), followed by -ēh > -ē and -ēd (-ad); 1 plural -om, -ēm (-am); further endings -ān, -Ø (-ē), -āy, -ād, -ēh > -ē, -ām, -ēd, -ād, -ānd.]

2.10.4 New verb suffixes: causatives, denominatives, ‘inchoatives’, and passive

New present stem formations that compensate for the loss or change of function of old ones (e.g. the suffix -aya- also forming causatives in Old Persian [Kent 1953: 72–3] but turned into a general present suffix in Middle Persian) are produced by a few suffixes with clearly defined functions (Durkin-Meisterernst 2014: 228–30):

1) -ēn- makes an intransitive verb transitive (rōzēn- ‘to make bright’ from rōz- ‘to shine’), changes a transitive verb into a causative (zāmēn- ‘to send’ from zām- ‘to lead’), and forms denominatives with causative meaning (pērōzēn- ‘to make victorious’ from pērōz ‘victorious, victor’);
2) the so-called ‘inchoatives’ add synchronically the suffix -s- (< Old Persian -sa-, no longer productive in its original inchoative value: Kent 1953: 71; Weber 1970) to the past participle without final -t to form intransitive verbs (hanzafs- ‘come to an end, become perfect’ from hanzām-, hanzaft- ‘to finish, fulfil’);
3) -īh- forms passives (dānīh- ‘to be known, recognized’ from dān- ‘to know, recognize’; kēšīh- ‘to be taught’ from a suffixless denominative *kēš- ‘to teach’ from kēš ‘(false) teaching’) and is possibly a transformed reflex of the Old Persian suffix -ya-.

16  Alternative endings only attested in Inscriptional and Zoroastrian Middle Persian are enclosed in parentheses.

2.10.5 Imperfect, periphrastic past tenses, ergative construction, and periphrastic passive

The old synthetic imperfect survives in only a few forms in the early inscriptions (ʾkylydy /akirīy/ ‘was made’ < Old Persian akariya; Durkin-Meisterernst 2014: 244–6) and the third singular anād, plural anānd of the verb ‘to be’ (if based ultimately on forms of the Old Persian imperfect with stem āh- < a-ah- developed by analogical and conflation processes: see Skjærvø 1991 and 1997b: 171–2). All other past tenses (Table 2.4) are expressed by periphrastic formations consisting of a past participle and inflected forms (including periphrastic ones) of the indicative and, more rarely, the subjunctive or the optative of the auxiliary verbs h- ‘to be’ (the third singular present being always omitted), ēst-, ēstād ‘to stand’, and baw-, būd ‘to become’ as follows:¹⁷

Table 2.4 Middle Persian past tenses (PP = past participle)

Preterite        PP + present of h-        šud hēm           I went, have gone
                                           šud               he went, has gone

Past preterite   PP + preterite of h-      šud būd hēm       I had gone
                                           šud būd           he had gone

                 PP + present of ēst-      šud ēstēm         I have gone
                                           šud ēstēd         he has gone
                                           nibišt ēstēd      it is (stands) written

                 PP + preterite of ēst-    šud ēstād hēm     I had gone
                                           šud ēstād         he had gone
                                           nibišt ēstād      it was (stood) written

17  Skjærvø’s terminology (2009a: 218–19) is adopted here for past tenses in Middle Persian.

The periphrastic past tenses are a development and an expansion of the new perfect and pluperfect of Old Persian (section 2.6.1). The periphrastic formation is further discussed in Chapter 3. On the one hand, periphrastic past tenses of intransitive verbs have an active meaning: for example, āmad hēm ‘I came’. On the other hand, when they occur with the passive past participle of transitive verbs (cf. the Old Persian ‘manā kr̥tam construction’), they have a passive meaning and the logical subject, if expressed, is grammatically an agent in the oblique case: thus, from paymōz-, paymōxt ‘to don, wear; dress’, paymōxt hēm ‘I was dressed’, paymōxt būd hēm ‘I had been dressed’, man paymōxt hēnd ‘I dressed them’ ← ‘they were (hēnd) dressed by me (man)’ (Sundermann 1989b: 152–3). This gives rise to a situation of split ergativity in that ergative alignment only occurs in the past of transitive verbs but not in the past of intransitive verbs and the present of all verbs (Haig 2008: 89–129; Durkin-Meisterernst 2014: 392–400; Jügel 2015: 81–95, 325–44, 626–806). Besides the synthetic passives in -īh-, a periphrastic passive present can be formed by combining a passive past participle with the present of baw- ‘to become’: paymōxt bawēm ‘I am (being) dressed’ (Sundermann 1989b: 152; Skjærvø 2009a: 221; cf. the Old Persian passive ‘potential construction’, section 2.6.1).
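The periphrastic formations of Table 2.4 are regular enough to be sketched in code. The Python fragment below is a minimal illustration, not a full conjugator: the names `past_forms`, `H_PRESENT`, and `EST_PRESENT` are hypothetical, only the first and third singular forms given in the table are covered, and the keys used for the two ēst- formations are descriptive labels chosen here rather than the chapter's terminology.

```python
# Auxiliary forms as given in Table 2.4 (first and third singular only).
# The third singular present of h- 'to be' is always omitted, hence "".
H_PRESENT = {"1sg": "hēm", "3sg": ""}
EST_PRESENT = {"1sg": "ēstēm", "3sg": "ēstēd"}


def _join(*words):
    """Join the non-empty words with single spaces."""
    return " ".join(w for w in words if w)


def past_forms(pp, person):
    """Build the four periphrastic past formations for a past participle.

    pp:     past participle, e.g. 'šud' or 'nibišt'.
    person: '1sg' or '3sg' (the only persons illustrated in the table).
    """
    return {
        # PP + present of h-
        "preterite": _join(pp, H_PRESENT[person]),
        # PP + preterite of h- (būd + present of h-)
        "past preterite": _join(pp, "būd", H_PRESENT[person]),
        # PP + present of ēst-
        "PP + present of ēst-": _join(pp, EST_PRESENT[person]),
        # PP + preterite of ēst- (ēstād + present of h-)
        "PP + preterite of ēst-": _join(pp, "ēstād", H_PRESENT[person]),
    }


if __name__ == "__main__":
    for person in ("1sg", "3sg"):
        for label, form in past_forms("šud", person).items():
            print(person, label, form)
```

Because the third singular copula is stored as an empty string, the bare participle šud falls out as the third singular preterite without any special case. With a transitive participle such as paymōxt the same strings are produced, but, as described above, they are then read passively.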

2.11  The linguistic situation in late Sasanian Iran An account by Ibn al-​Muqaffaʿ (d. 757 AD), a native of Fars who translated numerous works from Middle Persian into Arabic and may be accordingly regarded as a realiable witness, makes it possible to outline the linguistic situation in late Sasanian Iran. The account, which must refer to the end of the Sasanian period in the mid-​seventh century, has been transmitted by Ibn al-​Nadīm in his Fihrist (about 987 AD) and other early Arabic writers, and studied in detail by Lazard (1971a) in its implications for the subsequent history of Persian. According to it, five languages were then in use in Iran, including two non-​Iranian ones:18 soryāni, that is, Aramaic; xuzi, possibly a survival of Elamite in Khuzistan; pārsi, the language used in Fars and by the Zoroastrians priests (mowbad) and the learned people; dari, used at the royal court (dar) and in the east up to Balkh (present-​day Afghanistan); and pahlavi, used in the historical region of ‘Fahlah’ (Pahle, north-​western Iran). In this context, pahlavi refers to the Parthian language still spoken in north-​western Iran at that time (Middle Persian Pahlaw means ‘Parthia’), while pārsi and dari denote two varieties of Middle Persian, that must have coexisted during the formative period that preceded the origin of New Persian (section 2.14). On the one hand, pārsi was Persian proper, that is, the spoken language of Fars and southern Iran that also formed the basis of the written religious and literary language. On the other hand, dari was a more innovative variety that was spoken at the Sasanian court in Ctesiphon (al-​Madā’in) in Mesopotamia, but, as the prestigious language of the imperial capital, also spread east and was to form the basis of New Persian.


18  New Persian equivalents are substituted here for al-Nadīm’s arabicized terms.

From Old to New Persian    27

2.12  Chronological and other divisions of New Persian

The history of New Persian, or simply Persian, covers a period ranging from the time of the oldest documents assumed to be written in New Persian, in the eighth century AD, until now. This span of time can be divided on the basis of various criteria (linguistic, historical, or a blending of both), and various divisions have been proposed (Windfuhr 1979: 166; Paul 2013a: 258). The very beginning of the New Persian linguistic period is a controversial issue. It is usually connected with the historical change brought about by the conquest of Iran by Muslim Arabs and the end of the Sasanian empire in the mid-seventh century, but this is only a conventional starting point based on extra-linguistic data. Indeed, it is unlikely that such a change, however epochal, which afterwards also entailed a change of religion from Zoroastrianism to Islam and the adoption of the Arabic script to write Persian, could have had any immediate consequences for the languages spoken in Iran (see Chapter 11 for more on the influence of Arabic on Persian). A possible division of the history of New Persian reckons three major periods, which correspond to the traditional major periods in the history of Persian literature.

1) The first or archaic period, usually referred to as Early New Persian (Paul 2013b), lasts from the first attestations of New Persian to the beginning of the thirteenth century. It spans several historical epochs, from the inclusion of Iran into the Arabic caliphate to the first Mongol incursions into Iran.

2) The period of Classical New Persian begins with the blossoming of Persian classical literature in the thirteenth century, the century of Saʿdi, and is usually considered to reach the eve of modern Iran.¹⁹ Starting from the thirteenth century, literary New Persian reached a unitary form all over Iran, losing the dialectal features still present in Early New Persian texts and giving rise to a canon, to which the literary language would adhere for the centuries to come. In this period the literary language exerted an increasing influence on the old non-Persian dialects, which in some cases were even supplanted, or survived longer only among the religious minorities of Iran (Yarshater 1974). Literary New Persian also exerted a unifying influence on the spoken varieties of Persian, and the old Persian dialects were replaced by new dialects issued from the encounter of the literary language with the old dialectal substratum. An example is provided by the old dialect of Isfahan studied by Tafażżoli (1971) and the modern one (Smirnova 1978). For more discussion on Persian dialects, see Chapters 3, 13, and 14 and for more information on Isfahani, see Chapter 3.

3) Lastly, the period of Modern and Contemporary New Persian, from the mid-eighteenth century to the present day, is characterized by an increasing influence, on the development of the literary language as well as on Persian literature, of European culture and languages: French, English, and (in the Central Asian varieties of Persian) Russian.

19  Criticism of the concept of ‘Classical Persian’ as a term referring to any linguistically based definition of any period of the history of the New Persian language has been voiced by Paul 2002. However, in the absence of a better definition, the term has been retained here.

Within each of these periods it is appropriate to distinguish between literary and non-literary language varieties. Non-literary texts such as inscriptions, coins, and private documents (letters, legal documents, etc.) are particularly important because, compared to literary texts, they usually display linguistic features closely related to the everyday language of a certain region and time. Moreover, inscriptions, coins, and private documents are normally preserved in the original, while literary texts have mostly undergone a long transmission that may have altered their linguistic reality because of the normalizing intervention of copyists. A last distinction concerns the presence of a high (or written) and a low (or spoken) variety in Contemporary New Persian, as suggested by Jeremiás (1984) in a study on the interpretation of the contemporary linguistic situation of Iran in terms of diglossia, though this has been questioned by Perry (2003; summary of the matter in Rossi 2015). For more information about diglossia, see Chapters 13 and 19. In actual fact, a distinction between a spoken variety (with a further stylistic differentiation between a formal, official, or educated spoken sub-variety and an informal, familiar, or colloquial spoken sub-variety) and a literary or, for modern times, a standard variety should be taken into account not only for Contemporary New Persian, but, with the due differences, also for each period in the history of Persian (see Chapters 3, 4, 5, 6, 10, 11, and 15 for more on colloquial form).

2.13  Early New Persian texts in different scripts

Especially in the case of Early New Persian, it is important to take further into account the differences between varieties reflected in documents in various scripts (Table 2.5). Indeed, Early New Persian is documented not only by texts in Arabic script, but also by texts in other scripts, emanating from the Persian-speaking religious minorities spread all over Iran. For Early New Persian the following documents should be considered, besides the texts in Arabic script: Judaeo-Persian texts, that is, Persian texts in Hebrew script; Manichaean New Persian texts; Persian texts in Syriac script; Zoroastrian New Persian texts in Pahlavi script (on these, see de Blois 2000 and 2003). Judaeo-Persian undoubtedly represents the most important corpus, both for the quantity and quality of its documents, and for their antiquity. Apart from single studies and editions, an overall study of the language of the Early Judaeo-Persian texts, including some unpublished private letters, has been recently provided by Paul (2013c). Like New Persian in Syriac script, which, however, offers a much smaller corpus,²⁰ and unlike Manichaean and Zoroastrian New Persian, Judaeo-Persian has a continuation into later periods. Later Judaeo-Persian texts are less interesting from the viewpoint of linguistic history, however, as their language can be considered ‘an offshoot of Classical Persian’ (Shaked 2010: 321) and their orthography appears as a mere transliteration of Arabo-Persian orthography (Meier 1981: 108).

20  Christians used the Pahlavi script in Sasanian Iran (section 2.9.4). In the Islamic period, they normally used the Arabic script, even dating their manuscripts according to the Hegira, the Islamic era. On this phenomenon and the cultural dynamics among the various ethnic-religious minorities in ancient Iran, see Orsatti 2007a.

Table 2.5 New Persian documents in different scripts (with abbreviations)

Early New Persian in Arabic script
    Marriage contract (Scarcia 1963, 1966)
    Codex Vindobonensis (facsimile ed. Muwaffaq 1972)
    QQ: Qur’ān-i Quds (Revāqi 1985; Lazard 1990)
    AT: Asrār al-tawḥīd (Shafiʿi Kadkani 1987b)

Judaeo-Persian
    Ar: Argument (MacKenzie 1968; Shaked 1971a: 178–80; MacKenzie 1999: 671–3; MacKenzie 2011)
    Du1: Letter from Dandān Uiliq 1, Central Asia, northeast of the Khotan oasis (Utas 1968; Lazard 1988)
    Du2: Letter from Dandān Uiliq 2 (Zhang and Shi 2008)
    Ez1: Tafsir of Ezechiel, first part (Gindin 2007)
    Gen: Tafsir to Genesis (Shaked 2003)
    Kd: Karaite document (Shaked 1971b)
    Lr: Law report of Ahvaz (Asmussen 1965; MacKenzie 1966; Shaked 1971a: 180–2)
    Ta (A, B, C): three inscriptions of Tang-i Azao, western Afghanistan (Henning 1957)

Manichaean New Persian
    Ha: Bilawhar and Būdāsaf (Henning 1962: 91–8)
    Hb: qaṣīda (Henning 1962: 98–104)
    Lehrtext (Sundermann 2003)
    Manichaean New Persian fragments (Provasi 2011)

New Persian in Syriac script
    Baptism (Orsatti 2003a)
    Glosses (Maggi 2003)
    Matthew (Maggi 2005)
    Psalms (Sundermann 1974; Sims-Williams 2011: 353–61)

New Persian in Latin script
    Codex Cumanicus (Monchi-Zadeh 1969, Bodrogligeti 1971)

Five eighth-century Judaeo-Persian documents represent the earliest attestation of New Persian and go back to a period when the Arabic alphabet had probably not yet been adapted to writing Persian (section 2.19). These are three inscriptions from Tang-i Azao in western Afghanistan dated 1064 of the Seleucid era corresponding to 752 AD (Henning 1957)²¹ and two letters from Dandān Uiliq in the Khotan region in Chinese Central Asia, datable between 780 and 790 AD (Utas 1968 with references, and Zhang and Shi 2008). To these one may add a number of New Persian glosses in Syriac texts basically from the first half of the eighth century (Maggi 2003).

21  Rapp (1967: 55–6) unconvincingly questioned Henning’s dating and proposed the much later date of 1299–300 CE.
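The Seleucid-era dates quoted for the earliest documents can be checked with a small conversion. The following is a minimal sketch in Python, not part of the chapter: it assumes the Macedonian reckoning of the Seleucid era from autumn 312 BC, so that each Seleucid year overlaps two Julian years, and the function name `seleucid_to_ad` is hypothetical.

```python
def seleucid_to_ad(se_year: int) -> tuple[int, int]:
    """Return the two AD years overlapped by a Seleucid-era year.

    Assumption: the era is counted from autumn 312 BC (Macedonian
    reckoning), so Seleucid year n runs from autumn of AD n - 312
    to autumn of AD n - 311. Only valid for n > 312 (dates AD).
    """
    if se_year <= 312:
        raise ValueError("this sketch only handles Seleucid years after 312")
    return se_year - 312, se_year - 311


# Seleucid years cited in the chapter for dated Judaeo-Persian documents.
for se in (1064, 1262, 1332):
    start, end = seleucid_to_ad(se)
    print(f"{se} Seleucid = {start}/{end} AD")
```

Under this assumption the first year of each pair agrees with the AD dates the chapter gives for the Tang-i Azao inscriptions (752), the Karaite document (950), and the Law report of Ahvaz (1020).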

The Manichaean New Persian documents published so far (Henning 1962; Sundermann 1989a, 2003; Provasi 2011) can be dated to the tenth and eleventh centuries and come from the territory formerly occupied by the Sogdian colonies of Chinese Turkestan in Central Asia. The earliest original documents of New Persian in Arabic script, both literary and not, go back instead to the eleventh century. They are the so-called Codex Vindobonensis, a pharmacological treatise of the end of the tenth century by Abū Manṣūr Muwaffaq b. ʿAlī al-Hirawī, copied by the poet Asadī in Šawwāl 447/December 1055–January 1056 (facsimile editions: Muwaffaq 1972, 2009), and, among non-literary documents, the Marriage contract from Bāmiyān, Afghanistan, dated 470/1078 (Scarcia 1963 and 1966) as well as a deed concerning a sale of land from Khotan dated 501/1107 (Margoliouth 1903 with facsimile; Minorsky 1942 correcting the date as 501 instead of 401 of the Hegira).

2.14  Dialectal classification of the Early New Persian documents: pārsi and dari

A major dialectal division of Early New Persian is that between pārsi, ‘Persian’ tout court, diffused all over southern Iran and in the first centuries of Islam still linguistically close to literary Middle Persian, and dari ‘(the language) of the court’, which covered the regions of northern Iran from west to east (Lazard 1971a, 1975a, 1993; cf. section 2.11). Each of these major dialectal varieties of Early New Persian is divided into western and eastern sub-varieties (see Table 2.6): pārsi is known from documents originating from south-western (Khuzistan) and south-eastern Iran (Sistan) respectively; likewise, dari is known from documents originating from north-western or central Iran, and north-eastern Iran and Transoxiana (Lazard 2014). One of the most important dialectal features is the treatment of initial wi- (Lazard 1987: 174; 2014: 93–4), which is gu- in north-eastern dari and hence in literary New Persian, and bi- (still close to Middle Persian wi-) in documents from southern Iran.

Table 2.6 Dialectal and chronological classification of ENP documents

North/North-West Iran
    Tafsir of Ezechiel, first part (Ez1): eleventh century
    Tafsir of Genesis (Gen): eleventh century or after

North-East Iran
    Inscriptions of Tang-i Azao (Ta): 1064 Seleucid/752 AD
    Letters of Dandān Uiliq (Du1, Du2): datable to the second half of the eighth century
    Codex Vindobonensis: dated Šawwāl 447/December 1055–January 1056
    Marriage contract: dated 470/1078
    Matthew: Herat, datable to the eleventh century
    Psalms in Syriac script: before mid-thirteenth century

South-West Iran
    Glosses: first half of the eighth century
    Argument (Ar): tenth century or earlier
    Karaite document (Kd): 1262 Seleucid/950 AD
    Law report of Ahvaz (Lr): Ahvāz (Khuzistan, south-western Iran), 1332 Seleucid/1020 AD
    Baptism: before the thirteenth century

South-East Iran (Sistan)
    Qur’ān-i Quds (QQ): datable to the second half of the eleventh century

The glottonyms dari and pārsi have different meanings in different periods:

1) At the end of the Sasanian epoch dari may have referred to the oral register of Middle Persian, spoken at the Sasanian court (dar) and more broadly in the capital city of Ctesiphon, on the Tigris. It was a variety of Middle Persian endowed with prestige and probably more innovative compared to written or literary Middle Persian.

2) In the first decades of Islam, dari, the spoken variety of Middle Persian, received a strong impulse to expand thanks to the successive waves of conquest. Indeed, Persian was the language of the Islamic conquests towards Central Asia; and the glottonym dari came to refer to the northern and north-eastern varieties of Persian. In this movement towards the north and north-east, dari superseded other Iranian languages such as Parthian and Sogdian, also borrowing some features from them and thereby increasingly differing from the Middle Persian (pārsi) still spoken in southern Iran.²²

3) When, in the ninth century, literary New Persian arose in the courts of north-eastern Iran, dari was the linguistic variety at the basis of literary New Persian. Then pārsi-e dari, or simply dari, came to mean ‘literary New Persian’.

Correspondingly, pārsi as opposed to dari means:

1) The written register of Middle Persian, that is, literary Middle Persian.

2) The more conservative south-western variety of Persian, which continued to be spoken and written during the first centuries of Islam. When, by the beginning of the thirteenth century, the new unitary literary language originated in north-eastern Iran spread all over Iran, pārsi ceased to be attested.

3) Persian in general as opposed to its literary variety (dari or pārsi-e dari).

2.15  Later New Persian texts in other alphabets, the Codex Cumanicus, and Persian as a lingua franca in Asia

Later New Persian texts written in alphabets other than Arabic, for instance Armenian and Latin, are also relevant for reconstructing the earlier stages of Persian. The so-called Codex Cumanicus (Venice, Marciana Library, MS Lat. DXLIX 1597, dated 1330) is one of the most ancient New Persian texts in Latin script and furnishes rich linguistic material. Its first part contains a Latin–Persian–Cuman lexicon, whose original was probably composed by Genoese merchants in Solghat, Crimea, in 1324–5 (Drimba 1981; Drüll 1980 proposes a somewhat earlier date). It represents a kind of manual for interpreters (Ligeti 1981). The Persian linguistic material, whose dialectal characterization poses problems that are not easily solved (MacKenzie 1992), has been published and studied by Monchi-Zadeh (1969) and Bodrogligeti (1971). As to the reasons for the presence of Persian in a manual for interpreters to be used in fourteenth-century Crimea, a Turkish-speaking area, the commonly accepted theory is that Persian functioned as a lingua franca in the Black Sea region (see Vásáry 2005) as well as in large parts of Asia (for criticism against this theory, see Orsatti 2003b; Orsatti 2007b: 51–5; and Bausani 1969a: 517 for the sea-trade language in Asia). Transcriptions of Persian texts in Latin script made by Catholic missionaries or European travellers to Safavid Iran multiply in the seventeenth century and provide important linguistic data (Orsatti 1984; Perry 1996). The more or less occasional rendering of Persian through the Latin alphabet may be heavily conditioned by the spelling conventions the transcriber uses to write his own language (Italian, Portuguese, Spanish, etc.). A trivial example is the use of j (jaber for xabar) to write the uvular fricative x in the transcription of a Persian translation of the Koran in Latin script made by a Spanish missionary at the beginning of the seventeenth century (Vatican Library, MS Vat. Pers. 51: Bodrogligeti 1961).

22  See Lentz (1926) on Parthian elements in the Šāhnāme and Henning (1939) on Sogdian loanwords in New Persian.
If the occasional use of the Latin script entails the difficulty just described for reconstructing the linguistic reality of a text, other writing systems more stably used for writing New Persian and other Iranian languages pose two kinds of equally thorny problems: the presence of historical spellings, which is particularly cumbersome in Manichaean New Persian;²³ and the coexistence of different orthographic layers, whereby old and new spellings occur in the same text.

2.16  New Persian phonology in historical perspective

2.16.1 Contemporary New Persian of Iran

Before dealing with earlier stages of the phonology of Persian, it is appropriate to sketch the phonological system of Contemporary New Persian as a term of comparison. For literary Contemporary New Persian of Iran, Pisowicz (1985) establishes a phonological system of 6 vowel and 24 consonant phonemes (Table 2.7). For vowels, the distinctive character opposing /e/ to /i/, /a/ to /ā/,²⁴ and /o/ to /u/ is timbre, that is, the different tongue positions and rounding vs. non-rounding. A difference of length between the two series /e a o/ and /i ā u/ is only perceptible in an open unstressed syllable, whereas in a stressed position the length of all vowels is more or less identical. The diphthongs ey and ow are interpreted as a sequence of two phonemes each, /e + y/ and /o + w/, the latter phoneme only occurring after /o/.

As to consonants, the opposition between the two series of plosives /p/ ~ /b/, /t/ ~ /d/, /k/ ~ /g/, and affricates /c/ ~ /j/ is of tenseness rather than of voicing. This kind of opposition had already been recognized for plosives and affricates in colloquial educated Contemporary Tehrani New Persian by Provasi (1979). The realizations of /q/ (written ġeyn and qāf) as [q], [ɢ], or [ʁ] (= γ) are mere allophones conditioned by position. Pisowicz regards the palatal articulation of /k g/, that is, [kʲ gʲ], as their chief realizations in Contemporary New Persian of Iran.²⁵

Table 2.7 Phonological system of literary Contemporary New Persian
Vowels: i, u; e, o; a, ā
[Consonant chart garbled in the source. Surviving column labels: Labial, Labio-dental, Dental; row labels: Plosive, Affricate, Fricative, Nasal, Liquid, Approximant; surviving entries: b, f, m, n, l, r.]

23  To write New Persian, the followers of Manichaeism used the Manichaean alphabet. Because Middle Persian and Parthian were the languages of Manichaean liturgy, New Persian Manichaean orthography shows influence from both these languages’ orthography. On the adaptation of the Manichaean script to write New Persian, see Henning 1958: 73–5; Henning 1962: 90–1; Orsatti 2007b: 150–64.

24  The symbol ā indicates a mid, back, labialized vowel [ɒ], distinct from mid, central, unlabialized a. This symbol is used here in analogy with the symbol used in the scholarly transcription of Persian.

2.16.2 Classical and Early New Persian

For Classical and Early New Persian Pisowicz reconstructs a vocalic system consisting of three short vowels /i a u/, five long vowels /ī ē ā ō ū/, and two monophonemic diphthongs /a͡y a͡w/ (Table 2.8). As to consonants, there is no /v/ phoneme, because present-day v- at the head of a syllable was realized as /w/. The opposition between the two series of plosives, affricates, and fricatives was of voicing, not of tenseness as in Contemporary New Persian. The ancient ‘Iranian labiovelar’ /xw/, subsequently reduced to and merged into /x/, was still preserved. The voiced uvular fricative /γ/ contrasted with the plosive /q/ of Arabic and Turkish loanwords.

Table 2.8 Phonological system of Early and Classical New Persian
Vowels: ī, i, ū, u; ē, ō; ay, aw; ā, a
[Consonant chart garbled in the source. Surviving column labels: Labial, Dental, Velar, Uvular, Laryngeal; row labels: Plosive, Affricate, Fricative, Liquid, Approximant; surviving entries: t, k, q, ʔ, x, xw, γ, h, n, l, r.]

The methods Pisowicz follows for his reconstruction are diverse, first and foremost the study of Persian texts in Latin alphabet like the Codex Cumanicus (section 2.15) and of New Persian borrowings in other languages. The analysis of morphophonological alternations in the contemporary language is also important. For example, alternations like Xosrow ‘(King) Xosrow’ ~ xosravi ‘regal’ and miravam ‘I am going, I go’ ~ berow ‘go!’ attest to an ancient pronunciation aw of the present-day diphthong ow. The development aw > ow is confirmed by Arabic loanwords in Persian, such as Arabic dawr > Contemporary New Persian of Iran dowr ‘circle’. Comparison with present-day Persian of Afghanistan (also called Dari) is useful because of its conservative character. Dari retains a vocalic system with three short vowels /æ e o/ (the two last ones still articulated close to /i u/) and five long vowels /ī ē ā ō ū/ matching the Classical New Persian ones. Indeed, Afghan Persian still retains long /ē ō/ (the so-called majhul vowels ‘unknown’ to Arabic), which in New Persian of Iran have merged into /ī ū/ respectively (see below in this section on their possible outcomes ay and aw). As to consonants, the opposition of voicing recognized for Early and Classical New Persian between the two series of plosives, affricates, and fricatives is maintained in Afghan Persian. Moreover, Classical New Persian presents an opposition between /γ/ and the new phoneme /q/ introduced through the massive entrance of Arabic and Turkish loanwords. This opposition has disappeared from New Persian of Iran but is preserved in Afghan Persian. Dari also suggests that the final -e of Iranian and Arabic words in Contemporary New Persian of Iran was formerly -a. This is confirmed by spoken Contemporary New Persian of Iran, where final -e alternates with -a before the postposition -rā, for instance, xāne-rā ~ xuna-ro ‘the house (direct object)’. Words such as ke ‘who’, ce ‘what’, se ‘three’, by contrast, always retain final -e, thereby attesting to the original presence of a different vowel.

25  For colloquial educated Contemporary Tehrani Persian, Provasi (1979) reconstructs a system of only twenty-two consonantal phonemes, not recognizing as phonemes /ʔ/ and /ž/ of the literary variety. As to vowels, the distinctive character opposing /i u ā/ to /e o a/ is tenseness vs. laxness.
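The regular vowel correspondences between the Classical system and Contemporary New Persian of Iran described in this section can be summarized as a small mapping. The following Python sketch is only an illustration under simplifying assumptions: the function name `to_contemporary` is hypothetical, a sequence ay/aw is treated as a diphthong only when no vowel follows, and known exceptions (the delabialization of xw, short i retained in some environments, words where ē/ō merged with ay/aw) are deliberately ignored.

```python
import re
import unicodedata


def nfc(text: str) -> str:
    """Normalize to composed form so that each long vowel is one character."""
    return unicodedata.normalize("NFC", text)


# Classical New Persian vowel -> Contemporary New Persian of Iran
# (scholarly transcription, in which i, u, ā continue the old long series).
VOWEL_SHIFT = {nfc(k): nfc(v) for k, v in {
    "ay": "ey", "aw": "ow",   # diphthongs
    "ī": "i", "ē": "i",       # majhul ē merges with ī
    "ū": "u", "ō": "u",       # majhul ō merges with ū
    "i": "e", "u": "o",       # short high vowels are lowered
    "ā": "ā", "a": "a",       # kept in transcription (timbre changes only)
}.items()}

_VOWELS = "".join(k for k in VOWEL_SHIFT if len(k) == 1)
# A diphthong is assumed only when ay/aw is not followed by a vowel.
_PATTERN = re.compile("a[yw](?![" + _VOWELS + "])|[" + _VOWELS + "]")


def to_contemporary(word: str) -> str:
    """Apply the regular vowel correspondences to a transcribed word."""
    return _PATTERN.sub(lambda m: VOWEL_SHIFT[m.group(0)], nfc(word))


for classical in ("šēr", "šīr", "bō", "dawr"):
    print(classical, ">", to_contemporary(classical))
```

Applied to the chapter's own examples, the sketch maps both šēr ‘lion’ and šīr ‘milk’ to šir, bō to bu, and dawr to dowr.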
Meier (1981: 86–103) gives a complete survey of the development of the majhul vowels /ē ō/ of Early and Classical New Persian on the basis of the analysis of rhymes and especially of infractions of the rule which prohibits rhyming /ē/ with /ī/ and /ō/ with /ū/. He concludes that the merging of /ē ō/ into /ī ū/ (šēr ‘lion’ > šīr, now a homonym of šīr ‘milk’; bō ‘smell’ > bū) originated in western Iran and spread eastwards, without reaching Afghanistan, and that it was still in progress in central Iran in the first half of the thirteenth century. Starting from cases where ē rhymes with the diphthong ay and ō with aw (the latter rhyme being more frequent and even tolerated by theoreticians like Šams-e Qeys in the thirteenth century), Meier shows that the real pronunciation of the two diphthongs was very close to the majhul vowels and that /ē ō/ in some cases merged with /ay aw/, as is indicated by Middle Persian nō ‘new’ > Classical New Persian naw, or by double outcomes like Nēšābūr and Nayšābūr.

Lastly, Meier (1981: 127–56) draws up a list of suffixes and endings that had the form of long -ē in Classical New Persian: indefinite -i (yā-ye nakere); relative or determinative -i (yā-ye ešārat or taʿrif); the verbal suffix -i denoting unreal or habitual action (section 2.17.4); and diminutive -i. For more information on the indefinite, see Chapter 6. By contrast, the verbal ending and the copula of the second person singular, the gerundive ending, the suffix of abstraction (yā-ye maṣdari), and adjectival -i (yā-ye nesbat) were long -ī.

Early New Persian knows fricative allophones for postvocalic /b d g/. In texts in Arabic script, [β] is written ﭪ, afterwards abandoned. In ancient manuscripts up to the mid-thirteenth century, the allophone [δ] of /d/ is written with the letter ḏāl, also present in Arabic loanwords. Early New Persian δ is not generally considered as a phoneme. However, de Blois (2006: 94) thinks that, given the existence of a phoneme /ḏ/ in Arabic loanwords (presumably pronounced δ in Early New Persian, as in Arabic), postvocalic δ in Persian words should also be regarded as a separate phoneme.
In both the Arabic and the few Persian words which retained δ < d (such as gozaštan ‘to pass’ and gozāštan ‘to let pass, to put’) the ancient interdental fricative merged afterwards with /z/.²⁶ The fricative allophone of postvocalic /g/ is not attested in texts in Arabic script, presumably because the spelling with ġeyn would have given rise to confusion with the phoneme /γ/, but is attested in other scripts (Maggi 2003: 118). Delabialization of /xw/ > /x/ already occurred in the thirteenth century. Indeed, it appears already concluded in the Persian language reflected in the Codex Cumanicus (first half of the fourteenth century; section 2.15), as is revealed by spellings such as ‘pleasant’ for Early and Classical New Persian xwaš, Contemporary New Persian xoš. Delabialization of /xw/ entailed the change a > o, as in the preceding example, but had no effect before /ē/ or /ā/: Early New Persian /xwēš/, /xwāst/ > Contemporary New Persian /xiš/, /xāst/ (Pisowicz 1985: 121–3; Meier 1981: 74–85; Cipriano 1998: 293–365).

Several questions are still debated. First of all, the existence and phonological status of short e and o in Early New Persian, either as a possible continuation of Middle Persian /e o/ (section 2.10, no. 2), or as allophones of /i/ (or possibly of /a/) and /u/ respectively. For Early Judaeo-Persian the existence of [e] is generally accepted (Paul 2013c: 42–3, section 26). Another question concerns the time when the pairs /ī/ ~ /i/, /ā/ ~ /a/, and /ū/ ~ /u/ began to be contrasted through timbre, with an articulation shift of /i/ towards e and /u/ towards o, and with a back pronunciation of /ā/ contrasting with a slight palatalization of /a/. The Codex Cumanicus already shows the beginning of the moving of /i/ towards e, /u/ towards o, and /a/ towards æ, written e (Bodrogligeti 1971: 43–45), a phenomenon more widely attested in seventeenth-century Latin transcriptions of Persian. It is significant that, in spoken Contemporary New Persian of Iran, short i is still retained near /š k c j/ and in the proximity of syllables containing etymologically long i. As to the back articulation of /ā/, a pre-thirteenth-century text in Syriac script from South-West Iran provides an early example in rwos for rās(t) ‘openly, truthfully’ (Baptism 10), where the vowel is vocalized as o and final -t is lost as in the present-day spoken language (Orsatti 2003a: 166). Finally, the first instances of a closing and backing of /ā/ > o, u before nasals go back to the fifteenth century according to Pisowicz (1985: 79); but Ṣādeqi (1984) discusses earlier occurrences of this change in some toponyms (such as Bisotun/Behistun), which he traces back to the variety of Middle Persian spoken in Khuzistan between the third and the fifth centuries AD.

26  Recent summaries of the question of ‘dāl and ḏāl’ are de Blois 2006: 94–6; Orsatti 2007b: 94–8; Filippone 2011: 185–6. Orsatti (2018, forthcoming) regards the complementary distribution of dāl and ḏāl in literary manuscripts mainly as a rule intended to bring order in the multifarious dialectal realizations of /d/ in Early New Persian. Therefore, in what follows the fricative allophones of /d b g/ are mirrored in transcriptions of Early New Persian texts only if they are so recorded in writing.
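The delabialization of /xw/ discussed in this section, with rounding of a following short a but no effect before ē or ā, can be expressed as a two-step rewrite rule. The Python sketch below is a hedged illustration only: the function name `delabialize` is hypothetical, it operates on scholarly transcriptions rather than on any real corpus, and it models this single change, not the later vowel mergers (so xwēš yields xēš here, not yet Contemporary xiš).

```python
import re
import unicodedata


def delabialize(word: str) -> str:
    """Apply xw > x to a transcribed Early New Persian word.

    Before short a the lost labial element rounds the vowel (xwa > xo);
    before other vowels such as ē or ā it is simply dropped.
    """
    # Composed form keeps ā and ē as single characters distinct from a and e.
    word = unicodedata.normalize("NFC", word)
    # Step 1: xw + short a becomes xo.
    word = re.sub("xwa", "xo", word)
    # Step 2: any remaining xw loses its labial element.
    return word.replace("xw", "x")


for early in ("xwaš", "xwēš", "xwāst"):
    print(early, ">", delabialize(early))
```

Ordering the rounding step before the general deletion is what keeps xwaš (> xoš) apart from xwāst (> xāst).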

2.17  From Middle to New Persian: morphosyntactic continuity and innovation

The earliest documents of New Persian display a language still close to Middle Persian, but signs of later development are already visible. The most striking changes (some of them already attested in late Pahlavi texts influenced by New Persian) concern the verbal system.

2.17.1 Ergative construction

The ergative (passive) construction of transitive verbs in the past (section 2.10.5) gives way to an accusative (active) construction (ergativity is further discussed in Chapter 8). This happens in parallel with a gradual change of the value of the past participle of transitive verbs, mainly acquiring an active meaning: Middle Persian dīd hēm ‘I was/have been seen’ > New Persian dīdam ‘I saw/have seen’, with amalgamation of the ancient past participle, in its new function as past stem, with the auxiliary verb. Early examples of the New Persian active construction are (17) and (18):

(17) ʾn by hštym
     ān bi-hišt-ēm
     that ADV-abandon-1PL
     ‘we abandoned that one’ (Ar L4).²⁷

(18) wbr kwndwm²⁸ ʾny nbyšt bwdy
     u bar-xwānd-um ān=i nibišt būd-ī
     and ADV-read.PST-1SG that=which write.PST.PTCP be.PST-2SG
     ‘and I read what you had written’ (Du2 7).

Possible transitional forms (Paul 2008a: 192) are provided by combinations of an old past participle like nibišt in (18) with the verbal endings, followed by the third singular present or, more often, past of būdan ‘to be’ to obtain present perfect, as in (19–20), or past perfect (= pluperfect) forms of transitive and intransitive verbs, as in (21–22) (Paul 2013c: 132, section 164c and 134, section 168b; Lenepveu-Hotz 2014: 56–7, qq.v. for the sources of the examples):

(19) ʾb prystʾdnd hyst
     āb firistād-end hest
     water send.PST.PTCP-3PL be.PRS.3SG
     ‘They have sent water’.

(20) kwšk bwdynd hyst
     xušk būd-end hest
     dry be.PST.PTCP-3PL be.PRS.3SG
     ‘they have become dry’.

(21) mwlk gryptynd bwd
     mulk girift-end būd
     kingship take.PST.PTCP-3PL be.PST.3SG
     ‘they had seized kingship’.

(22) ʾbʾz ʾmdnd bwd
     abāz āmad-and būd
     back come.PST.PTCP-3PL be.PST.3SG
     ‘they had returned’.

27  This text also offers older constructions, as: kyš ʾyn kʾrhʾ ʾdst yšmʾ dʾd kē-š īn kārhā u-dast-i šumā dād ‘who gave these matters into your hands’ (lit. ‘who (by) him these matters (were) given into your hands’) (Ar P5–6), where traces of ergativity may be seen (cf. Orsatti 2007b: 121 with n. 197 and Paul 2013c: 127, section 156). On possible traces of ergativity in literary Early New Persian texts, cf. Lazard 1963: 257–8, sections 319–20 and Lenepveu-Hotz 2014: 57. Modern constructions, already attested in Early New Persian literary texts, like goft-eš ‘he said’ (also applied to intransitive verbs: raft-eš ‘he went (out)’) are considered as remnants of the ergative construction (Maḥjub 1959: 49).

2.17.2 Old subjunctive

There are only scant traces of the old subjunctive with long thematic -ā- (section 2.10.3) in subordinate clauses like rasād ‘it shall arrive’ in (23):

(23) ʾg(r) yn nʾmh by šmʾ rsʾd
     agar īn nāma bē (i)šmā²⁹ ras-ād
     if this letter to you.2PL arrive.PRS-SBJV.3SG
     ‘if this letter shall arrive to you’ (Du2 23).

In Early and Classical New Persian, only third singular forms of the old subjunctive with precative value, such as bād < baw-ād in (24), are found (Lazard 1963: 338–9, section 474, where a single occurrence of the third plural is recorded). Today, only bād survives in set combinations like zende bād ‘viva!’ or har ce bād-ā bād ‘what will be will be’.

(24) yʾr ʾš ʾw bʾd
     yār=aš ō bād
     helper=POSS.3SG he be.PRS.SBJV.3SG
     ‘May He be his helper’ (Ta A3).

28  Transliterations in the examples reproduce the word division found in the manuscripts, so that alignment with transcription and glossing is at times approximate.

29  In Du1 and Du2 the second person plural pronoun is išmā.

2.17.3 Towards a new present subjunctive Because of the virtual disappearance of the old subjunctive, Early and Classical New Persian have no formally marked subjunctive distinct from the present indicative: the new present subjunctive with grammaticalization of the prefix bi-​ did not appear before the sixteenth century (Lenepveu-​Hotz 2014: 249, 303). The prefix bi-​, used in Early and Classical New Persian with both present and past forms, has probably to be identified with the Middle Persian adverb and preverb bē ‘outside; out, away’ (Lazard 1975b). Its function ranged from cases where it retained its full lexical meaning (raft ‘he went’ ~ bi-​raft ‘he went away, left’) to cases where it seems to be only a means for emphasizing the verb (Lazard 1963: 298–​326, sections 394–​448). Some scholars rather recognize an aspectual perfective value in forms with bi-​(MacKinnon 1977; Josephson 1993 and 1995 for late Zoroastrian Middle Persian; to a different conclusion, coinciding with that by Lazard for New Persian, had come for Middle Persian Brunner 1977: 159–​60). Possibly as a consequence of the disappearance of the old subjunctive, which expressed future time reference in both main and subordinate clauses, in Early and Classical New Persian present forms with bi-​also acquired a future value (section 2.22.1). The bi-​+ present forms, as well as the unmarked present, i.e. present without bi-​nor (ha)mē, also developed, especially in subordinate clauses, a modal value as irrealis (Jahani 2008: 159–​60). It is from these values of the present forms with or without bi-​that the new subjunctive was at last developed. For Early and Classical New Persian only the present of ‘to be’ might show some clear modal opposition between the three different forms-​ast/​buwad/​bāšad ‘(s/​he, it) is’, the latter being a new form only developed in New Persian. 
Lenepveu-Hotz (2014: 251–68) has studied the values of these forms from the tenth to the sixteenth centuries and recognizes an opposition ‘permanent’ vs. ‘transitory’ between buwad and bāšad (some scholars had attributed a future meaning to bāšad), and just an opposition of emphasis between -ast and buwad (Lenepveu-Hotz 2014: 266). The form buwad disappeared almost completely in the fifteenth and sixteenth centuries (Lenepveu-Hotz 2014: 267).

2.17.4 Optative

A new optative came into existence in Early and Classical New Persian. It inherited the two values of unreal or habitual action (cf. English would) of the old optative (Lazard 1984a: 4–6, 10–11). Unlike the Middle Persian optative (section 2.10.3), the new optative has a complete conjugation obtained by the suffix -ē appended to the verbal endings combined with the past stem (unreal and habitual action) or, more rarely, with the present stem (only unreal action): duzdī kardam-ē ‘I used to be a thief’, agar mā dānistēm-ē ‘if we had known’.30 This verbal suffix, possibly originating from Middle Persian hy hē < Old Iranian *hait, third singular optative of ah- ‘to be’, also has a form -ēd in some Early New Persian texts in Arabic script from the region of Herat (Lazard 1963: 328, section 450). It is written -y and occasionally -yh in Judaeo-Persian (Paul 2013c: 115, section 137). In Manichaean New Persian the verbal suffix -ē is written with the numeral ‘one’ (I in transliteration), like the indefinite article -ē and the final vowel of the adverb and verbal prefix hmI hamē ‘always’. Subsequently, the verbal suffix -ē gradually fell into disuse and its two values, apparently beginning with that of habitual action, were subsumed by the prefix mi-. The suffix -ē survives today only in forms like bāyest-i ‘it was, would be, would have been necessary’ and, perhaps, in other fixed expressions like guy-i ‘one would say’ (Lenepveu-Hotz 2014: 157–62).

2.17.5 Passive

Some Early Judaeo-Persian forms attest to a survival of the old synthetic passive in -īh- (section 2.10.4, no. 3), with shortening of the vowel: ʾyʾryhynd ayār-ih-ind ‘they will be helped’, bwrhʾd bur-ih-ād ‘it may be cut’, gwyhyd gōw-ih-id ‘it is (being) said’ (Paul 2013c: 136, section 171; Lenepveu-Hotz 2014: 61–6). Periphrastic passives, formed mainly with the auxiliary āmadan ‘to come’, are very frequent both in Early Judaeo-Persian (Paul 2013c: 136–7, section 172) and in Early New Persian in Arabic script (Lazard 1963: 345, section 490). However, (25) provides an interesting instance of an analytic present passive (karda buwad) formed with a past participle still with passive meaning, and the auxiliary buwad as in Middle Persian:

(25) qʾr ʾy prmwdy ʾš skt qwnwm tʾ krdh bwd
kār=i farmūd-ī=aš saxt kun-um tā karda buw-ad
work=which order.PST-2SG=it hard do.PRS-1SG that do.PST.PTCP be.PRS-3SG
‘the work which you ordered, I shall work hard, so that it shall be done’ (Du1 29).

Already at the end of the eleventh century, the auxiliary āmadan lost ground in favour of šudan ‘to become’, originally ‘to go’, though the former still appears to be retained under certain circumstances (for example, to indicate a process vs. a state) or at a stylistically high level (Lenepveu-Hotz 2014: 66–72).

30  On these forms in Early New Persian, see Lazard 1963: 327–38, sections 449–72; Paul 2013c: 126, section 153 and 130–1, section 162. For occurrences of the verbal suffix -ē directly after the stem, with or without an enclitic personal pronoun, see Lazard 1963: 329–31, sections 452–4.

2.17.6 Hortative

Faint traces of the old hortative, consisting of the particle (h)ē (< Middle Persian hēb, ē(w)) before a present (Lazard 1984a: 6–8; cf. section 2.10.3) as seen in (26) and (27), are found in Manichaean New Persian and northern Early Judaeo-Persian (Du1, Gen), but are unknown to Early New Persian in Arabic script:

(26) hyb wyrʾyd
hē wīrāy-ad
HORT arrange.PRS-3SG
‘he should arrange’ (M 9011 /A/6, Provasi 2011: 143).

(27) yzyd kwdh ʾy yʾr bʾšd
īzid/ēzid xudah ē yār bāš-ad
god lord HORT helper be.PRS-3SG
‘may the Lord God help (us)!’ (Du1 1).31

Even in north-eastern Early Judaeo-Persian texts, the hortative seems to be marginal and perhaps stylistically marked. Compare ē ... bāšad in (27) with bād in (24), with the hortative alternating with the old subjunctive.

2.17.7 Causative and past stem suffixes

Causatives with the suffix -in- (written yn or simply n, Middle Persian -ēn-, section 2.10.4, no. 1), with the vowel probably already shortened, occur in the Early Judaeo-Persian texts from south-western Iran (Paul 2013c: 137, section 173): by ʾngyzynyd bi-angēz-in-īd ‘He (God) aroused’ (Ar E11). These causatives are an important dialectal feature distinguishing the southern (pārsi) variety from the northern and north-eastern (dari) one (section 2.14): texts in dari (and hence literary New Persian) show a northern dialectal causative form in -ān- instead of -ēn-. The causative form is also discussed in Chapter 3 and Chapter 7. The past stems with -ād- instead of -īd- also originate from the northern dialect, like firist-ād-an ‘to send’ as against f(i)rist-īd-an in south-western Judaeo-Persian (Henning 1933: 213, 222–3; Paul 2013c: 110, section 125b). The fragment from a Persian version of the Gospel of Matthew in Syriac script (eleventh century) edited by Maggi shows an interesting causative in -ān- formed from the past verbal stem instead of the present stem (košt-ān-īd-ēd ‘you have had murdered’, Matthew, l. 12), which is a noteworthy dialectal feature typical of north-eastern (dari) Persian (Maggi 2005: 639).

2.17.8 Prepositions, postposition -rā, and circumpositions

Besides the postposition -rā, Early Judaeo-Persian has a number of simple or compound prepositions and a variety of circumpositions or frame prepositions (Paul 2003a; Paul 2013c: 143–50, sections 180–4).


31  On this passage, cf. Lazard 1988: 205–9.

As a remedy to the loss of a formal distinction between subject and object in Middle Persian (at least in the singular), the direct object came to be marked, like the indirect object and the beneficiary, by the postposition -rā (Middle Persian rāy), as in (28), or by the directional preposition u/o ‘towards’, possibly also pronounced a (Middle Persian ō), as in (29).32 The preposition u, unknown to Persian texts in Arabic script, is mainly attested in south-western Judaeo-Persian texts (Lazard 2009), though it also occasionally occurs in northern texts: ʾpyš a-pēš ‘near, before’ (Du2 8).33

(28) kwdʾ mn rʾ frstyd
xudā man=rā f(i)ristīd
god I=OBJ send.PST[3SG]
‘God has sent me’ (Ar N1–2).

(29) ʾḥṣrʾ kyrd ʾdnyʾl
iḥżār34 kird u=Danyāl
summoning do.PST[3SG] OBJ=Daniel
‘she summoned Daniel’ (Lr3).

Other prepositions and circumpositions could also mark the direct object: u ... -rā (direct or indirect object), az mar ... (-rā) lit. ‘for, because of’ (30), or just mar ... (-rā) (31), the circumposition mar ... -rā being a north-eastern form (Lazard 1963: 382–4, sections 575–7):

(30) ʾgr ... ʾzmrš by kwšy
agar ... az mar=aš bi-kuš-ī
if ... OBJ OBJ=it ADV-kill.PRS-2SG
‘if ... you kill it’ (Ar C12–13).

(31) bzwurg kwunaδ xwδʾh mr drwyešāʾn rāʾ
buzurg kun-aδ xuδāh mar darwēšān=rā
great make.PRS-3SG god OBJ poor=OBJ
‘the Lord makes great the poor’ (Psalms IIr3–4).

32  Middle Persian rāy marked the cause, purpose, beneficiary, indirect object, and hence possession; its use for the direct object is a late development. Likewise, Middle Persian ō ‘to’ could also mark the indirect and, in late Manichaean Middle Persian, the direct object (Lazard 2009: 169–70). See (10) and (11), section 2.10.
33  The preposition u is well known in Persian dialects of western Iran (Filippone 2011: 198). Lazard (1986: 252) has suggested that a survival of this preposition, reduced to a short vowel and then lost from pronunciation, can be detected when a complement has no preposition in the contemporary spoken language: mira(va)m šahr ‘I am going to the town’.
34  For a discussion of this reading, see Orsatti (2007b: 111–13). The same reading is given by Paul (2013c: 144, section 180b).

After a long and slow evolution, among all these forms only -rā survived in the modern language. By losing (though not completely even now) its old values as a marker of the beneficiary and the indirect object, it specialized to express the direct object under certain grammatical (determinate vs. indeterminate; human vs. non-human; topicalization) and pragmatic constraints (Paul 2008b).

Besides u/o, Early Judaeo-Persian knows a directional preposition bē ‘to, towards’, probably from the Middle Persian adverb bē ‘out’ used to reinforce ō in the compound preposition bē ō ‘towards’. The preposition bē (Lazard 1986 and Shaked 1989) is attested in southern texts like Ar and QQ, and in northern texts, as can be seen from (23). It possibly merged afterwards with pa(d), later ba ‘to, at, in, on’, which also acquired a directional meaning (sections 2.18–2.19).

2.17.9 Ezafe as a relative pronoun

A characteristic feature of Early Judaeo-Persian is the preservation of the ezafe, a particle linking a noun with a following adjective or nominal determinant, in its old value as a relative pronoun (section 2.10.2), as in (18) and (25). In Manichaean New Persian, examples are to be found, inter alia, in Lehrtext: i xwad xuškī u sardī ‘which itself (is) dryness and cold’ (Lehrtext d2), and i tarī u sōzāgī ‘which (is) moisture and burning’ (Lehrtext d4). In Early New Persian in Arabic script there are only scant traces of this value of the ezafe, both written and not written (Lazard 1963: 490–1, sections 855–6). An instance of the ezafe in its value as a relative pronoun, written with the letter yā attached to the preceding word, is to be found in (32):

(32) Bū Saʿīd=rā bar suft gīr tā qurṣ=i bar
Abu Saʿid=OBJ on shoulder take.PRS[IMP.2SG] so.that round.loaf=which on
ān ṭāq=ast furō gīr-ad
that niche=be.3SG down take.PRS-3SG
‘take Abu Saʿid on your shoulders, so that he can bring down the round loaf which is on that niche’ (AT 17.24–18.1).

2.17.10 The plural ending -ihā/-hā

For the plural of nouns, continuations not only of the regular ending -ān (section 2.10.1), but also of the late ending -īhā of Middle Persian are well attested both in Early Judaeo-Persian (Paul 2013c: 73–6, sections 77–9) and in Early New Persian in Arabic script (Lazard 1963: 195–6, sections 149–52).35 As for the latter ending, in Judaeo-Persian the form -ihā (with a probably already shortened -i-) occurs mainly in religious texts, and -hā in non-religious texts (Paul 2013c: 73, section 78a). In Early New Persian in Arabic script the form -ihā, explicitly vocalized with short i, is seldom attested. The Marriage contract of 1078 from Bāmiyān shows a form -hāy apparently unattested elsewhere. The early distribution of the forms is not very different from today’s usage, with -ān being used, though not exclusively, for nouns denoting animate beings (full description of the classical and modern distribution of the endings in Moʿin 1977: 28–81). The ‘exceptions’ are so many that the original distribution of the old ending -ān and the late ending -hā might seem mainly a matter of use. When the Arabic loanwords entered Persian, however, they received the Persian endings -ān or -hā depending on the opposition human vs. non-human, which indicates that this opposition became at some time relevant in the choice of the ending. When, much later, the European loanwords entered the Persian language, only -hā was still a living and productive ending.

35  On the origin of Middle Persian -īhā, see Salemann 1895–1901: 282, 284–5; Horn 1898–1901: 100; Henning 1958: 81, 90, n. 1; Sundermann 1989b: 155.

2.17.11 Comparative and superlative

Middle Persian had the superlative suffixes -tom and -ist, and the comparative suffix -tar (also with superlative force); the material and identifying suffix -ēn (Darmesteter 1883: 139) could be added to obtain -tomēn and potentially *-tarēn when the adjectives were used attributively (Durkin-Meisterernst 2014: 203–5 with n. 103). Among these suffixes, literary New Persian only continues -tar and -tarīn, as well as -īn with elative force. In Early New Persian, there is no clear distinction yet between the comparative and superlative functions of -tar and -tarīn respectively, as these may express both meanings (Lazard 1963: 206–14, sections 181–99). Old ‘irregular’ comparatives like mih ‘bigger’, kih ‘smaller’, and bih ‘better’, were re-marked with -tar (mih-tar, kih-tar, bih-tar) and both old and new forms could receive the suffix -īn (mih-īn ~ mih-tar-īn, kih-īn ~ kih-tar-īn, bih-īn ~ bih-tar-īn), which is presumably also to be recognized in numerals (awwal-īn ‘first’, duwwum-īn/duyyum-īn ‘second’, etc.).36 Only later -tar specialized to express the comparative and -tarīn to express the relative superlative (‘the most ... ’), albeit with syntactic restrictions, because only -tar forms with superlative force are admitted when used predicatively.

36  The suffix -in co-occurred, and still partially co-occurs, with adjectival -i, by which it has gradually been replaced, in both material adjectives (Paul 2007) and numerals (Orsatti 2005: 791): bolurin ~ boluri ‘of crystal’, avvalin ~ avvali ‘first’. See (15) for Middle Persian examples of an -ist-ēn superlative and regular and irregular comparatives.

2.18  From Middle to New Persian: notable phonetic changes

Early New Persian documents from both north-eastern (Ta inscriptions) and south-western Iran (Glosses) show that by the mid-eighth century final -g had already disappeared after a long vowel: qy ʾyn nywy qnd kē īn niwē kand ‘who incised this inscription’ (Ta A2), with niwē ‘inscription’ < Middle Persian nibīg.37 As a consequence, the adjectival suffix -īg had become -ī as in gurgānī ‘[pistachio nut] of Gorgan’ (Glosses 4). Instead, -g was still retained after -a, as in Glosses 5 banušag ‘middle-sized pistachio nut’, 10 drīmag ‘wormwood’, and 14 jāmag ‘cup’ (for a thorough discussion of the matter, see Ciancaglini 2008: 54–7, 72–7).

The old form of the abstract suffix -īh is retained in Early Judaeo-Persian documents of south-western origin. In north-eastern Iran, the very conservative Manichaean New Persian orthography also keeps the Manichaean Middle Persian spelling -yẖ, but a tenth-century poem proves that the abstract suffix had actually become -ī: in Ha22, farāmōšīh ‘oblivion’ with final -h would not fit the meter. Conflation of the suffixes -īg and -īh is also occasionally attested by Manichaean Middle Persian texts (Durkin-Meisterernst 2014: 175–6). The metrics of Manichaean New Persian texts also shows that the third singular ending -yd had a short vowel (Manichaean New Persian /ad/ and Judaeo-Persian /ed/ or /id/).

Manichaean New Persian texts, from north-eastern Iran, appear innovative in many respects. The new form br bar ‘on, upon’, with loss of initial a-, alternates with older ʾbr abar, and bʾ bā ‘with’ alternates with ʾbʾ(g) abā (de Blois 2006 s.vv.), thereby indicating that new and old forms occurred together, unless the latter are mere historical spellings. In Early Judaeo-Persian, the new form yār ‘friend’ already occurs in Ta and Du1 instead of older ayār, as can be seen in (24) and (27). One of the Manichaean New Persian fragments (M 595a+; Provasi 2011: 161–62, 166) shows a curious inverse spelling for the verbal prefix bi-, written like the preposition pa(d), later ba < Middle Persian pad ‘to, at, in, on’. This may indicate that the scribe of this fragment perceived the two morphemes as homophonous and confused them, whereas south-western Judaeo-Persian texts still had pa(d) with voiceless initial consonant.

The spellings kʾ, ky, and kw originally corresponding to Middle Persian ka ‘when, if’, kē ‘who, which’, and kū ‘where; that; than’ still occur in Manichaean New Persian (where the spelling kʾ/qʾ prevails) as well as in the south-western Early Judaeo-Persian Argument; but they tend to interchange and conflate, probably on account of formal coalescence (de Blois 2006: 106 s.v. kʾ; Provasi 2011: 165–66 s.vv. kʾ, kw, ky; MacKenzie 1968: 252).

37  For a discussion of this word, cf. Henning 1957: 337; Provasi 2011: 150.

2.19  The birth of literary New Persian and the adoption of the Arabic alphabet

The main languages of culture in Iran in the first centuries after the conquest were Arabic and, still, literary Middle Persian (Zoroastrian Middle Persian literature was entrusted to writing in the first centuries of Islam). From a piece of information provided by the ninth-century Arab historian Balāḏurī, we know that Middle Persian in Pahlavi script was used for administration until the late seventh or the early eighth century in western Iran and even longer in eastern Iran, before being replaced by Arabic (Xānlari 1986: vol. 1, 307–14). In the same years, the coinage reform of 77/696 under caliph ʿAbd al-Malik, directed at removing all symbols associated with the former Sasanian rule, put an end to the so-called Arab-Sasanian coinage (Mochiri 1981: 168; Bates 1987), a very interesting example of co-occurrence of such literary languages as Arabic and Middle Persian in the early decades after the conquest.

The birth of literary New Persian, which entailed a new literature in the vernacular language of Iran, is more a major cultural than a merely linguistic issue. It is connected with the rise of courts more or less independent from the Arabic Abbasid caliphate in eastern and north-eastern Iran and the emergence of a new Persian ruling class not sufficiently assimilated to Arabic culture (Lazard 1971b, 1975a). The variety of Persian spoken in north-eastern Iran (dari) came, thus, to be the basis of the literary language (section 2.14).

We do not know when, where, and for what purposes (administration, literature, private documents, etc.) the Arabic script was first adapted for writing Persian. When, towards the mid-ninth century, the new poetry in the vernacular language of Iran emerged in the courts of eastern and north-eastern Iran, it was certainly written down in Arabic script. As this poetry consisted of substituting Persian for Arabic within the pattern of Arabic poetry (Bausani 1960: 307–11), one can suppose that the establishment of a New Persian orthography in Arabic script was a part of this experiment. What is sure is that New Persian in Arabic script is exempt from the historical spellings which hamper the study of New Persian texts in other scripts including, perhaps, Judaeo-Persian (one cannot exclude that an adaptation of the Hebrew alphabet to write Persian had already begun in Sasanian times, as was claimed by Bacher 1904). The Arabo-Persian orthography betrays a clear normalizing aim. Middle Persian ka ‘when, if’, kē ‘who, which’, and kū ‘where; that; than’ (section 2.18) merged in what had probably become a single new form ki, so that they were no longer distinguished in writing and were spelled ky or kh (or simply k- joined on to the following word, and -k after ān ‘that’) in early Arabo-Persian orthography. Likewise, of the prepositions bē ‘to, towards’ and pa(d) ‘to, at, in, on’ (sections 2.17.8 and 2.18), only the latter survived, also subsuming the directional meaning of bē. Its initial voiceless labial, perhaps also by influence of Arabic bi- ‘with, for, by’, became voiced and was written b- (generally joined on to the following word) even in manuscripts that use the four letters added to the Arabic alphabet for writing Persian phonemes (Lazard 1963: 387, section 582). The preposition u/o < Middle Persian ō, apparently not very frequent in dari (section 2.17.8), was dropped from pronunciation and from the literary language.
The suffixes -īh of abstract nouns and -īg of adjectives, which had formally merged (section 2.18), were both represented by -y -ī and the latter also merged, both formally and functionally, with the Arabic relation suffix -iyyun (-ī of nisba). The ezafe disappeared from writing (though of course not from pronunciation), apart from rare cases where it is written even after words ending in a consonant (Lazard 1963: 200, section 162). Though it is generally admitted that the ezafe had already been shortened in New Persian, these occasional spellings, as well as its metrical value as either short or long, point to the presence of a long variant of the ezafe in Early New Persian (Meier 1981: 131–2). The use of the ezafe as a relative pronoun (section 2.17.9), probably already marginal in north-eastern Persian (dari), was ousted from the literary language, though some memory of it may survive until now in such expressions as vaqt-i(-i) ānjā residam ... ‘(in) the time (in which) I arrived there ... ’ or in ce kār-i bud(-i) kardi ‘what kind of work was this (that) you did?’, where one can postulate the fall of a relative ezafe that is no longer written or pronounced. The Early New Persian conjunction u ‘and’ < Middle Persian ud, though a short vowel, was written as an independent word. However, in the non-literary Marriage contract of 1078 the conjunction was regularly written only at the beginning of a clause, where it supposedly began to be pronounced wa as in Arabic (Orsatti 2018, forthcoming).

2.20  The Arabic element in Persian

In the orthography of the earliest, eighth-century Judaeo-Persian documents (Ta, Du1, Du2), Persian /k/ is written q, so that kaph was left to represent Persian /x/, for which no special Hebrew letter was available.38 Only a couple of Arabic loanwords are to be found in Du2: hqym ḥakīm ‘doctor’ (Du2 4, 13) and hrb ḥarb ‘war’ (Du2 33). They are written without any attempt to transliterate their Arabic spelling by distinguishing Arabic emphatic ḥā from non-emphatic hā, as happens in later Judaeo-Persian.39 This suggests that Arabic loanwords had not yet massively entered the current Persian language in the eighth century.

The situation is significantly different from the tenth and eleventh centuries onwards, when texts in Hebrew (especially the legal documents Kd and Lr) and Manichaean scripts are full of Arabic loanwords. Their orthography shows a careful attempt to represent the original Arabic spelling by means of the possibilities offered by the relevant alphabets (Orsatti 2007b: 110–13, 158–63). A Manichaean New Persian text datable to the eleventh century (Sundermann 2003: 251) testifies to the spread, precisely in ‘this time’, of a new philosophical lexicon of Arabic origin, when it says that the body is dominated by jhl ‘ignorance, foolishness’ (Arabic jahl), ‘which the people of this time call lust (Arabic hawā) and temptations (Arabic waswās)’ (qš40 xlg ʾyg ʿyn zmʾng hwʾ ʾwd wswʾshʾ hmI xwʾnʾnd k-aš xalq-i īn zamāna hawā wa waswāshā hamē xwānand, Lehrtext, c10–11).

Scholars generally agree that the Arabic element entered Persian as learned loanwords from the written Arabic language (Telegdi 1973: 52; Bausani 1978: 13–14; Pisowicz 1985: 19). In Persian texts in Arabo-Persian script and, as far as possible, Hebrew, Syriac, and Manichaean scripts, Arabic loanwords retained their original spelling, though they were probably pronounced according to Persian phonology, as they are today. This seems to be evidence of their origin from books.
However, Perry has recently suggested that a number of Arabic loanwords, which he terms ‘pre-literate Arabisms’, entered Persian by way of speech and that the Arab settlements before and after Islam were a major contributing factor to the Arabicization of Persian (Perry 2009a: 54; Windfuhr and Perry 2009: 419). In his view, among Arabisms of this kind there are words assimilated to Persian morphology and phonology like mosalmān ‘Muslim’ (perhaps a plural with metathesis from Arabic muslim41) and such onomastic elements as mir from amir ‘prince’ or Bu from Abu ‘father’, which underwent the same loss of initial a- as Persian words at the beginning of the New Persian linguistic period (Perry 2014; cf. section 2.18).

38  Afterwards, Hebrew qōph came to be used for Arabic /q/, and kaph for both /k/ and /x/ with or without diacritic.
39  In Judaeo-Persian, Arabic ḥā is transliterated by Hebrew ḥēth and Arabic hā by Hebrew hē.
40  In Manichaean orthography, k and q alternate to write /k/, while a new letter was created for Arabic /q/. On the spelling hmI for hamē, see section 2.17.4.
41  On this word and other possible explanations of its origins, see Moʿin 1977: 80–1.

The percentage of Arabic lexicon varies according to the literary genre and increases over time at least until the twelfth century (Skalmowski 1961; Lazard 1965; Bausani 1969b; Telegdi 1973; Utas 1978). Lyrical poetry of the new type, i.e. composed according to Arabic prosody, is from the beginning rich in Arabic words and phrases referring to the common Islamic culture (exemplarily, Koranic quotations). What is considered one of the most ancient pieces of New Persian poetry, the six-line panegyric that Muḥammad b. Waṣīf presented to the Saffarid Yaʿqūb-i Lays in the aftermath of his victory in 251/865, preserved in the anonymous Tārīx-i Sīstān (eleventh century, with later additions), is already Arabicized in its lexicon and prosody (ed. Lazard 1964: vol. 2, 13–14). The vocabulary of epic poetry is less Arabicized. Though Arabic loanwords very often received Persian suffixes, with Arabic broken plurals even re-pluralized (Moʿin 1977: 81–87), the preservation of the original spelling of Arabic loans may have entailed the awareness of their non-Iranian origins, at least in a learned context. Indeed, poetry seems to betray a sort of artificial and scholarly pronunciation of Arabic loanwords. For example, the letters which in Persian have and probably had one and the same phonetic reference ( /z/, /s/, /h/, and /ʔ/) never rhyme together (Meier 1981: 103). Nowadays, Arabic words or expressions of common use like baʿd, baʿd az ‘after’ are felt as of a lower stylistic register in comparison to their Persian counterparts (pas, pas az).

The orthography of the Arabic loanwords has remained unchanged throughout the history of the New Persian written tradition, which suggests the idea of the Arabic vocabulary of Persian as an immutable set. Only one morphological class of Arabic loanwords has undergone a change since its embedding into New Persian. These are the Arabic loanwords with tā marbūṭa (about 1500 items), which entered Persian either with the ending -a (later -e) or -at, according to semantic features and stylistic choices: -a is felt as ‘more concrete’, -at as ‘more abstract’. A considerable part of the words originally borrowed with -at (about 200 out of 800) shifted to -a in the course of the past thousand years and some 40 items occur in both forms with different meanings: qovve ‘(military) force, (industrial) energy; faculty’ is felt as a concrete, countable noun, whereas qovvat ‘strength, power’ is felt as an abstract noun (Perry 1991, 1995). The massive entrance of Arabic loanwords has sometimes been considered responsible for the falling into disuse of the ancient Iranian verbs in New Persian, and their gradual replacement by ‘compound verbs’ or verbal periphrases formed by an Arabic noun and a Persian infinitive, as andišidan ~ fekr kardan ‘to think’.
However, both Telegdi (1950–​1951: 321) and Ciancaglini (2011: 3) have noticed that such periphrases are also based on Persian words, as in the cases of por kardan ‘to fill’ or kušeš kardan ‘to strive’, the latter alternating with the corresponding simple verb kušidan. Ciancaglini (2011) has shown that the verbal periphrases of the type noun + kardan are very ancient, and must be traced back to Indo-​Iranian. Compound verbs are further discussed in Chapters 3, 7, 8, 9, 10, 15, 17, and 19.

2.21  Turkish influence on Persian

Starting from the second half of the eleventh century, Turkish peoples moved from Central Asia to Iran, furnishing the basis for a long series of Turkish dynasties. This led to the Turkization of wide areas of Iran, particularly in western and north-eastern Iran, where different varieties of Turkish supplanted Persian first in rural areas and later also in towns, thereby gradually reducing the area where Persian was spoken. In Azerbaijan and eastern Transcaucasia, this process may be considered accomplished around the fourteenth century (Lornejad and Doostzadeh 2012: 18–19, 143–88).

Turkish was widely spoken in Iran. For the Safavid epoch, a number of European travelers attest to the diffusion of Turkish as a language spoken both at the court in Isfahan and largely by the population (Orsatti 2003b). Turkish loanwords, mainly relating to the domains of power, politics, and popular culture, are less numerous than the Arabic ones.42 However, the influence of Turkish languages and dialects on Persian of Iran, especially on its phonology, was very strong. The Turkish adstratum has been considered responsible for the replacement of the opposition of length between /i a u/ and /ī ā ū/ by an opposition of timbre, as well as for the fronting of /a/ (Pisowicz 1985: 90, 93); though the front articulation of /a/, represented by e in seventeenth-century European transcriptions in Latin alphabet, might also be due to the influence of the coeval dialect of Isfahan (Pisowicz 1985: 97–8; Smirnova 1978: 11–12). The Turkish adstratum has also been regarded as a contributing cause for the replacement of the opposition of voicing between /p/ ~ /b/, /t/ ~ /d/, /k/ ~ /g/, and /c/ ~ /j/ by an opposition of tenseness, and for the dephonologization of the opposition between /q/ and /γ/ (Pisowicz 1985: 106, 113).

Grammatical influences are more difficult to prove. The particular syntactic construction seen in (33) has been explained as a calque on Turkish (Pisowicz 1985: 91), but the unsoundness of the Turkish hypothesis has been shown by Rubinčik (2001: 355–58) in a thorough discussion of this construction in the framework of Persian syntax:

(33) ʿamu zan=eš mariż=e
uncle wife=POSS.3SG ill=be.PRS.3SG
‘the wife of (my) uncle is ill’.

However, Turkish influence can certainly be seen in expressions with the modifier before the head noun like Nāder Šāh as against Šāh ʿAbbās, Mirzā Ṣādeq as against ʿAbbās Mirzā, or juje kabāb as against kabāb-e barre (Perry 2001). In the case of lexical doublets like Persian xar ~ Turkish olāɣ ‘donkey’, kārd ~ cāqu ‘knife’, bānu ~ xānom ‘lady’, āheste ~ yavāš ‘slowly’, the Turkish loanwords tend to occupy a lower sociolinguistic register than their Persian counterparts (Perry 2001: 196).

42  On Turkish loanwords in Persian, see Doerfer 1963–75; for Turkish words in classical poetry, see Ganjei 1986, and Lornejad and Doostzadeh 2012: 93–108.

2.22  Post-classical developments

2.22.1 Periphrastic future

Apart from the loss of the New Persian optative (section 2.17.4), a major development in the verbal system of post-classical New Persian is the rise of a new periphrastic future with the auxiliary xwāstan ‘to want, will’. While the phrase xwāham raft(an) had both a volitional and a future force in Early and Classical New Persian, xwāham raft, with the shortened form of the infinitive (raft), is grammaticalized in post-classical New Persian to express only the future: ‘I will go’ (Jahani 2008; Lenepveu-Hotz 2014: 183–97).

In Middle Persian, future time reference was mainly expressed by the subjunctive in both main and subordinate clauses (Lazard 1984a: 2). The disappearance of the old subjunctive may have been the reason for the meaning of future to be expressed just by the present. On the one hand, Jahani (2008: 158–63) has shown that, of the three present forms of Early and Classical New Persian, namely unmarked present (without prefix),43 present with bi-, and present with (ha)mē, only unmarked forms and less often forms with bi- can have future time reference, whereas she found no examples of present with (ha)mē with future force in her corpus of Early and Classical New Persian. On the other hand, Lenepveu-Hotz (2014: 190) has shown that only present forms occur in subordinate clauses to express the future, the periphrastic forms with xwāstan being restricted to principal clauses. These remarks clearly suggest that the two concurrent ways of expressing the future in Early and Classical New Persian (present with or without bi-, and periphrasis with xwāstan) specialized in later Persian as the new subjunctive and the new future respectively.

43  ‘Non-past’ in Jahani’s terminology.

2.22.2 Progressive present and past

The post-classical creation of a progressive present and past with the auxiliary dāštan ‘to have’ is also relevant: dār-am mi-rav-am ‘I am going’, dāšt-am mi-raft-am ‘I was going’. This periphrasis probably originated from northern or central Persian dialects and is little attested in the literary language (Jeremiás 1993).

2.22.3 Possessive expressions

In Early and Classical New Persian, there occur various possessive expressions such as ān-i ... (34), az ... (35), ān-i ... -rā (36), ... -rā (37):

(34) īn kūdak ān=i kī=st?
     this boy that=of who=be.PRS.3SG
     ‘whose is this boy?’ (AT 17.17).

(35) īn az šumā=st
     this of you.2PL=be.PRS.3SG
     ‘this is yours’ (AT 24.25).

(36) ʾny mrʾ
     ān=i ma=rā
     that=of I=OBL
     ‘that belonging to me’ (Du1 4).

(37) dyinēʾ pāk̇ mšyiḥāʾ rāʾḣ
     dīn=i pāk mašīhā=rā
     religion=EZF pure Christ=OBL
     ‘the pure religion (is) of Christ’ (Baptism 4, 8, 12).

These expressions have been replaced in post-classical New Persian of Iran by an ezafe construction with māl ‘property’: māl-e man ‘property of me, mine’ (see Chapters 3, 6, 7, 8, 9, 10, and 19 for more discussion on this topic). The latter is still used with possessive meaning in standard Contemporary New Persian but, in the colloquial variety, also has other meanings such as in (38), where it expresses origin, and is being replaced by the complex preposition barāy-e ‘for’ to express possession as in (39):

(38) in gardanband māl=e Irān=e
     this necklace property=of Iran=be.PRS.3SG
     ‘this necklace is from Iran’.

(39) in ketāb barāy=e man=e
     this book for=EZF I=be.PRS.3SG
     ‘this book is mine’.

2.22.4 The birth of the ‘relative -i’

In Persian a bare noun without any article may refer either to a whole class of items (esm-e jens ‘generic noun’ in traditional grammar: ketāb ‘book’ instead of other things) or to a fully determinate referent (ketāb ‘the book’, already known or referred to). Very early, however, a need was felt for a more unambiguous reference. Already in Late Middle Persian (Josephson 2011: 36–7) and Early New Persian (Orsatti 2011: 75–80), the suffix -ē of the yā-ye nakere, the ‘indefinite article’, developed a strong individualizing meaning and was redundantly used with nouns and nominal phrases endowed with a determinate value to highlight the individuated or fully determinate meaning of the referent. This particular individualizing value of the yā-ye nakere (Daniel Paul 2008 for Contemporary New Persian), which can be identified as the yā-ye ešārat, the ‘deictic -i’ of Persian grammarians, probably originated in the spoken language as a means to emphasize the individuated or fully determinate reference to a denotatum. Indeed, it is only occasionally attested in literary texts, as in (40):

(40) īn nā-kardan=ē=t pindār=ast
     this NEG-do.INF=ART=POSS.2SG imagination=be.PRS.3SG
     ‘(it is) this your non-doing (that) is illusory!’ (AT 34.22).

In such usage, the suffix -ē had the same value as the modern suffix -e/-he of the spoken language, the so-called ‘definite article’, which redundantly indicates an individuated or fully determinate referent: ye tork-e ‘a Turk’ (not any Turk, but a certain one), doxtar-e ‘the girl’, ān āqā-he ‘that Sir’. An accented variant of the yā-ye nakere in this particular individualizing value still survives as an optional and stylistically marked suffix in pronominal or adverbial expressions of the spoken language like digar-i ~ digar ‘the other’, un-var-i ~ ān var ‘on the other side’, in-jur-i ~ in jur ‘in this way’, kodum-yek-i ~ kodām yek ‘which one’ (Orsatti 2005; 2011: 53–65).
At a later stage in the history of Persian the individualizing value of the ancient -​ē suffix gave rise to the ‘relative -​i’, the suffix marking the head-​noun of determinative relative clauses (Orsatti 2011: 81–​5). Indeed, it has been noted that in Early and Classical New Persian the ‘relative -​i’ before a determinative relative clause was less frequent than today: its usage was optional and, after a substantive with specific or determinate reference, especially if preceded by a demonstrative, it was altogether omitted: waqt-​ē ki ... ‘when’ (AT 19.20), dar ān waqt ki ... ‘at that time, when ... ‘ (AT 25.1). The grammaticalization process of suffix -​i ( [ɡoːhaɹ] ‘jewel’.

4.3  Consonants and vowels of Standard Modern Persian

In Standard Modern Persian, briefly referred to as Persian in this section, all speech sounds involve an egressive pulmonic airstream, i.e. they are produced with the body of air that goes out of the lungs. Speech sounds are divided into two major groups, consonants and vowels, depending on the amount of obstruction involved in their production and the position they occupy in a syllable.

4.3.1 Consonants

Phonetically speaking, consonants are those sounds that are produced with a relatively obstructed vocal tract. They appear in the margin of syllables, when considered from a phonological point of view. The sound system of Persian includes twenty-three consonants (Table 4.2) in nine places and six manners of articulation. The individual places of articulation can be summarized into

Table 4.2 Consonants of Standard Modern Persian

                      Bilabial  Labio-dental  Dental  Alveolar  Post-alveolar  Palatal  Velar  Uvular  Glottal
Plosives              p b                     t d                              c ɟ             ɢ       ʔ
Fricatives                      f v                   s z       ʃ ʒ                     x              h
Affricates                                                      ʧ ʤ
Nasals                m                               n
Central approximants                                  ɹ                        j
Lateral approximants                                  l

94   Golnaz Modarresi Ghavami

four main places, taking into consideration the active articulator involved in the production of consonants. The labial place of articulation includes bilabials (/p,b,m/) and labio-dentals (/f,v/), i.e. those sounds that use the lower lip as the active articulator. Coronals include dentals (/t,d/), alveolars (/s,z,n,ɹ,l/), and post-alveolars (/ʃ,ʒ,ʧ,ʤ/) which involve the tip/blade of the tongue (corona) as the main articulator. Palatals (/c,ɟ,j/), velar (/x/) and uvular (/ɢ/) consonants come under the dorsal place of articulation, as the tongue body (dorsum) is the active articulator involved in their production. For more discussion on dorsal sounds, see section below and Chapter 5. The laryngeal place of articulation includes sounds that involve the vocal folds and the space between them known as the glottis. The two consonants /h/ and /ʔ/ are the laryngeal (glottal) consonants of Persian.

Manners of articulation can be summarized under two main categories of obstruents and sonorants. Obstruents (plosives, fricatives, and affricates) are produced with a momentarily complete obstruction, a narrowing of the vocal tract to the degree that friction is made, or a combination of the two. Obstruents can be voiced or voiceless, i.e. the vocal folds vibrate or not in their production. Sonorants are articulated with a relatively open vocal tract that makes spontaneous voicing possible. So the members of this class are all voiced. This class includes nasals and approximants as far as consonants are concerned.

Table 4.2 represents all consonants of Standard Modern Persian. In each box, following the IPA convention, the symbol to the left represents a voiceless consonant and the one to the right represents its voiced counterpart. The articulatory as well as acoustic properties of these consonants are discussed below.
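The classification just described can be written down as a small lookup table. The sketch below is only an illustration: the tuple layout and the names CONSONANTS, MAJOR_PLACE, OBSTRUENT_MANNERS, is_obstruent, and count_by_major_place are assumptions made here, not part of the chapter. The symbols and their place, manner, and voicing values follow the text.

```python
from collections import Counter

# Each entry: (symbol, place, manner, voiced). Layout is an illustrative choice.
CONSONANTS = [
    ("p", "bilabial", "plosive", False),
    ("b", "bilabial", "plosive", True),
    ("t", "dental", "plosive", False),
    ("d", "dental", "plosive", True),
    ("c", "palatal", "plosive", False),
    ("ɟ", "palatal", "plosive", True),
    ("ɢ", "uvular", "plosive", True),
    ("ʔ", "glottal", "plosive", False),
    ("f", "labio-dental", "fricative", False),
    ("v", "labio-dental", "fricative", True),
    ("s", "alveolar", "fricative", False),
    ("z", "alveolar", "fricative", True),
    ("ʃ", "post-alveolar", "fricative", False),
    ("ʒ", "post-alveolar", "fricative", True),
    ("x", "velar", "fricative", False),
    ("h", "glottal", "fricative", False),
    ("ʧ", "post-alveolar", "affricate", False),
    ("ʤ", "post-alveolar", "affricate", True),
    ("m", "bilabial", "nasal", True),
    ("n", "alveolar", "nasal", True),
    ("ɹ", "alveolar", "central approximant", True),
    ("j", "palatal", "central approximant", True),
    ("l", "alveolar", "lateral approximant", True),
]

# The nine individual places grouped under the four main places (by active articulator).
MAJOR_PLACE = {
    "bilabial": "labial", "labio-dental": "labial",
    "dental": "coronal", "alveolar": "coronal", "post-alveolar": "coronal",
    "palatal": "dorsal", "velar": "dorsal", "uvular": "dorsal",
    "glottal": "laryngeal",
}

OBSTRUENT_MANNERS = {"plosive", "fricative", "affricate"}


def is_obstruent(manner):
    """Obstruents are plosives, fricatives, and affricates; the rest are sonorants."""
    return manner in OBSTRUENT_MANNERS


def count_by_major_place():
    """Number of consonants under each of the four main places."""
    return dict(Counter(MAJOR_PLACE[place] for _, place, _, _ in CONSONANTS))


if __name__ == "__main__":
    print(len(CONSONANTS), "consonants")
    print(count_by_major_place())
```

A table of this kind makes the counts given in the text (twenty-three consonants, nine places, six manners) easy to check mechanically.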
Obstruents

As mentioned, obstruents are produced with a momentarily complete obstruction or narrowing of the vocal tract to the degree that friction is made, or a combination of the two processes. This class includes plosives, fricatives, and affricates.

Plosives

Plosives, also known as (oral) stops, are produced with a complete obstruction of the vocal tract, followed by a sudden release or explosion of air. Persian has eight plosives, /b,p,t,d,ɟ,c,ɢ,ʔ/, seen in all four main places of articulation: (i) labial (/b,p/); (ii) coronal (/t,d/); (iii) dorsal (/c,ɟ,ɢ/); and (iv) laryngeal (/ʔ/). The labial plosives /b/ and /p/ are made with lower and upper lips as the active and passive articulators respectively. The coronal stops /t,d/ are dental in Persian, produced with the blade of the tongue as the active articulator and the back surface of the upper front teeth as the passive articulator.

Dorsal stops involve the tongue body as the active and the palate as the passive articulator. The palate itself can be divided into three main sections: hard palate, soft palate (velum), and the uvula. Consonants produced at the hard palate are called palatal. Likewise, consonants produced at the velum have a velar place of articulation and those produced at the uvula are called uvular. The dorsal plosives /c/ and /ɟ/ are palatal in Standard Modern Persian. They assimilate in place of articulation with the following vowel in syllable-initial position as in [caɹi] ‘deafness’, [ɟaɹi] ‘baldness’, [kɑɹi] ‘working, active’ and [ɡɑɹi] ‘carriage’. In other words, they acquire velar allophones before back vowels in syllable-initial position, but are pronounced as palatal syllable-finally irrespective of the preceding vowel, in words such as [nic] ‘good’, [pɑc] ‘clean, pure’, [diɟ] ‘cooking pot’, and [suɟ] ‘mourning’.
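The distribution of the palatal and velar allophones of /c/ and /ɟ/ can be sketched as a simple rule. This is a hypothetical illustration, not the author's formalization: the function name realize_dorsal_plosive and its arguments are invented here, and the rule assumes that all three back vowels [u, o, ɑ] trigger the velar allophone, whereas the examples in the text only show [ɑ].

```python
FRONT_VOWELS = {"i", "e", "a"}
BACK_VOWELS = {"u", "o", "ɑ"}  # assumption: every back vowel behaves like [ɑ]

# Velar counterpart of each palatal plosive.
VELAR_OF = {"c": "k", "ɟ": "ɡ"}


def realize_dorsal_plosive(phoneme, next_vowel=None):
    """Return the surface allophone of /c/ or /ɟ/.

    next_vowel is the vowel that follows in syllable-initial position;
    None means the plosive is syllable-final.
    """
    if phoneme not in VELAR_OF:
        raise ValueError(f"not a palatal plosive: {phoneme!r}")
    if next_vowel is None:
        # Syllable-finally the plosive stays palatal whatever precedes it.
        return phoneme
    if next_vowel in BACK_VOWELS:
        return VELAR_OF[phoneme]
    if next_vowel in FRONT_VOWELS:
        return phoneme
    raise ValueError(f"unknown vowel: {next_vowel!r}")


if __name__ == "__main__":
    # /cɑɹi/ surfaces with [k], /caɹi/ with [c], final /c/ in /pɑc/ stays [c].
    print(realize_dorsal_plosive("c", "ɑ"))
    print(realize_dorsal_plosive("c", "a"))
    print(realize_dorsal_plosive("c"))
```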

Phonetics   95

Figure 4.1   Waveform and spectrogram of a glottal stop in initial position

Besides the two dorsal plosives discussed above, there is another dorsal consonant in Persian produced with the back of the tongue as the active and the uvula as the passive articulator. This voiced consonant, represented as [ɢ] in IPA, appears in words such as [ɢalam] ‘pen’ and [ʔotɑɢ] ‘room’. This consonant has a voiced uvular fricative (free) variant in intervocalic position in words such as /ɑɢɑ/ [ʔɑʁɑ] ‘Mr.’ and /bɑɢi/ [bɑʁi] ‘remaining’ and a voiceless velar fricative allophone in the context of voiceless consonants in words such as /bɑɢʧe/ [bɑxʧe] ‘small garden’ and /vaɢt/ [vax(t)] ‘time’. This latter allophone is sometimes seen intervocalically in colloquial Persian in words such as /jaɢe/ [jaxe] ‘collar’ (see Chapters 2, 3, 5, 6, 10, 11, and 15 for more on colloquial speech).

A glottal stop, as its name indicates, is produced by the obstruction of the glottis and its abrupt opening. Thus, it shows up ideally as a silence gap (representing the obstruction phase) followed by a release burst (representing the release of obstruction) in spectrograms. Spectrograms are frequency (y-axis) by time (x-axis) representations of speech signals. Intensity is represented by shades; darker sections are more intense. Figure 4.1 shows the waveform and spectrogram of the glottal stop [ʔ] in word-initial position in [ʔɑn] ‘that’. The release phase is indicated by an arrow on the waveform and the obstruction phase is indicated by the flat horizontal line before the release. Phonetically speaking, this consonant can appear as a stop in all positions in careful speech, but can be replaced with a glottal trill (creaky voice) in intervocalic position (Yazarlou 2014). Creaky voice is characterized by irregular and slow vibration of the vocal folds. Figure 4.2 shows the waveform and spectrogram of a glottal trill in intervocalic position in [xɑneʔaʃ] ‘his/her home’.
The sparse and irregular glottal pulses observed in the boxed section represent a glottal trill as a variant of [ʔ]. The glottal stop (as well as the glottal fricative discussed below) can be deleted syllable-finally and result in the lengthening of the preceding vowel in words such as /baʔd/ [baːd] ‘after’, /ʤamʔ/ [ʤaːm] ‘addition’, and /maʔni/ [maːni] ‘meaning’. In terms of voicing, Standard Modern Persian has four voiced plosives, /b,d,ɟ,ɢ/, as in /bɑm/ ‘roof’, /dɑm/ ‘trap’, /ɟɑm/ ‘step’, and /ɢam/ ‘sorrow’, and three voiceless ones, /p,t,c/, as in /pɑc/ ‘pure’, /tɑc/ ‘grape’, and /cɑc/ ‘type of cookie’. The glottal stop is also


Figure 4.2   Waveform and spectrogram of a glottal trill (boxed section)

Table 4.3 VOT (ms) of voiceless plosives in initial and intervocalic positions (Nourbakhsh 2009 [1388]: 177)

voiceless, as its production involves the same articulators involved in voicing (the vocal folds) and these articulators cannot perform both actions simultaneously. All voiceless stops /​p, t, c [k]‌/​are aspirated word-​initially. This means that voicing of the following vowel is delayed after the release of plosives and the air that escapes the oral cavity comes out as an audible puff of air. The time lapse between the release of plosives and the beginning of voicing for the following vowel is called Voice Onset Time (VOT) measured in milliseconds (ms). Voiceless aspirated stops have an average VOT of 80 ms in initial position. This places the voiceless plosives of Persian in the category of heavily aspirated stops. Degree of aspiration reduces intervocalically to an average of 50 ms (Table 4.3). This observation is consistent with Samareh (1378 [1999]) who introduces partially aspirated allophones for voiceless stops in Persian in intervocalic position. This half-​aspirated allophone is considered by Samareh to be limited to unstressed intervocalic position. However, statistical analysis by Nourbakhsh (2009 [1388]: 142–​3) indicates that there is no significant difference in VOT as a function of stress. Voice Onset Time for voiced stops in initial and intervocalic position is reported in Table 4.4 below. Negative VOT values indicate that voicing is present during the closure phase of the plosive, while positive values indicate lack of voicing during this phase. As numbers indicate, [b, d, ɟ] are slightly voiced word-​initially, while [ɡ] and [ɢ] are totally voiceless in this position. Voicing is present in the production of [b,d,ɟ,ɡ] in intervocalic position as

Table 4.4 Average VOT (ms) of voiced plosives in initial and intervocalic positions

        Initial  Intervocalic
[ɢ]     +5       –

(Nourbakhsh 2009 [1388]: 177)

negative VOT values indicate. The uvular stop is realized as the fricative [ʁ] in intervocalic position, making VOT irrelevant. Voiced stops become partially or fully devoiced when adjacent to voiceless consonants as well as word-finally, as in [xɑb̥] ‘sleep’, [zud̥] ‘early’, [saɟ̥] ‘dog’, [hab̥s] ‘custody’, [ʔasb̥] ‘horse’, [had̥s] ‘guess’, [ɢasd̥] ‘intention’, [diɟ̥ʧe] ‘small pot’, and [mesɟ̥aɹ] ‘coppersmith’ (Samareh 1999 [1378]).

To summarize, Standard Modern Persian has two sets of oral plosives: aspirated /pʰ,tʰ,cʰ/ and /b,d,ɟ,ɢ/. The latter are only fully voiced in intervocalic position and are mainly voiceless phonetically in other positions. This observation has led some investigators (Lazard 1972; Modarresi Ghavami 2007 [1386]; Nourbakhsh 2009 [1388]; Bijankhan 2013 [1392]) to conclude that the main distinguishing characteristic of the two sets of stop consonants in Persian is not voicing, but aspiration.

Fricatives

Fricatives are sounds that are made with the narrowing of the vocal tract to the degree that the passage of air results in random noise. The sound system of Persian includes three voiced /v,z,ʒ/ and five voiceless /f,s,ʃ,x,h/ fricatives as in [vɑm] ‘loan’; [zaɹ] ‘gold’; [ʒaɹf] ‘deep’; [fɑm] ‘colour’; [saɹ] ‘head’; [ʃiɹ] ‘lion’; [xeɹs] ‘bear’, and [hes] ‘sense’. In terms of place of articulation, /f,v/ are labial, /s,z,ʃ,ʒ/ are coronal, /x/ is dorsal, and /h/ is laryngeal. These consonants can be divided into two main groups of sibilants /s,z,ʃ,ʒ/ and non-sibilants /f,v,x,h/. Sibilants have a hissing sound produced by air passing through the narrow passage made by the tongue blade and the alveolar ridge/post-alveolar region. Acoustically, this hissing sound shows as compact noise or concentration of energy in certain frequencies.
In Persian, the alveolar sibilants /s,z/ have a compact noise at frequencies above 5 kHz, while the post-alveolars /ʃ,ʒ/ have a compact noise between 3 and 5 kHz (Sepanta 1998 [1377]; Bijankhan 2013 [1392]). The non-sibilant oral fricatives /f,v,x/, on the other hand, are characterized by diffuse noise, i.e. noise is seen in their spectrogram in all frequencies. Figure 4.3 below shows the spectrogram of the oral voiceless fricatives of Persian in [fe], [se], [ʃe], and [xe] sequences. The noise part of each sequence is shown in boxes. As Figure 4.3 shows, [f] and [x] have diffuse noise seen in all frequencies, while [s] and [ʃ] have compact noise in certain frequencies characteristic of sibilant fricatives.

The special characteristic of [x] is that the formants of the following/preceding vowel are observable during its noise. This observation indicates that the tongue is in the position required for the production of an adjacent vowel during the production of [x]. As such, the fricative /x/ has three positional allophones: a front (velar) allophone that appears before and after front vowels, a back (uvular) allophone that appears before and after back vowels, and a third, post-velar allophone that appears post-consonantally in clusters in words such as [neɹx] ‘rate’, [tabx] ‘cooking’, [talx] ‘bitter’, etc. The velar allophone has a spectral peak at an average of 1,646 Hz in the context of front vowels. The post-velar allophone has a spectral peak at an average of 1,421 Hz word-finally in clusters and the uvular allophone has a spectral peak at an average of 785 Hz in the context of back vowels (Asadi 2012 [1391]). These results indicate that the phoneme /x/ is mainly a velar/post-velar consonant rather than a uvular.

Figure 4.3   Waveform and spectrogram of voiceless fricatives (boxed sections)

Figure 4.4   Glottal fricative [h] in intervocalic [ɑ__a] position. Vowel formants are indicated by dotted horizontal lines

The glottal fricative [h], as its name indicates, is produced at the glottis. The vocal folds are relatively open in the production of this consonant and the air that passes through

Table 4.5 VOT values for Persian affricates (ms) in initial and medial position (Nourbakhsh 2009 [1388])

them produces friction noise that resonates in the cavities above the larynx. Since the supralaryngeal cavity is either in neutral position or in the position for adjacent vowels during the production of this consonant, the formants of adjacent vowels are observable throughout the noise representing [h] (Figure 4.4). The glottal fricative [h] is basically a voiceless consonant, as it involves a relatively open glottis in its production; nevertheless, it can become voiced in intervocalic position in words such as [baɦɑɹ] ‘spring’ and [ʧɑɦɑɹ] ‘four’. In the production of the voiced glottal fricative [ɦ], part of the vocal folds vibrates, while part of them is open and the air that passes through them creates noise. This glottal fricative, like its stop counterpart [ʔ], can be deleted word/syllable-finally, resulting in the compensatory lengthening of the preceding vowel in examples such as /tehɹɑn/ ‘Tehran’ and /dah#tɑ/ ‘ten units’, which are pronounced as [tʰeːɹun] and [daː tʰɑ] in colloquial Persian respectively.

Affricates

Affricates involve a complex production in which a complete obstruction of air in the oral cavity is followed by a gradual escape of air through a narrow cavity. Persian has two affricates: /ʧ/ as in /ʧanɟɑl/ ‘fork’ and /ʤ/ as in /ʤomʔe/ ‘Friday’, produced in the post-alveolar region. As with plosives, VOT is important in distinguishing /ʤ/ and /ʧ/. The VOT values for these two consonants in initial and medial positions are given in Table 4.5. Numbers indicate that [ʧ] is heavily aspirated in initial position and aspirated medially. The other member of this contrast, i.e. [ʤ], is voiceless in initial position and voiced medially. As with plosives, this observation leads to the conclusion that the main distinguishing feature of the two consonants is aspiration rather than voicing.

Sonorants

Sonorants are articulated with a relatively open vocal tract that makes spontaneous voicing possible.
This class includes nasals, approximants, and vowels. Acoustically, sonorants are characterized by formants, which are resonating frequencies of the air in the vocal tract observable as dark horizontal bands in spectrograms. As the vocal tract is more constricted in the production of sonorant consonants relative to vowels, their resonating frequencies are less intense, hence consonantal formants are seen lighter in shade compared to the surrounding vowels.


Figure 4.5   Spectrogram of nasals

Nasals

Persian has two nasal consonants: /m/ and /n/. The oral cavity is closed in the production of nasal consonants and air escapes through the nasal cavity. Acoustically, these sonorant consonants are characterized by weaker formants compared to vowels (Figure 4.5). The first three formants of [m] are around 250, 1,000, and 2,700 Hz. The same formants are reported to be around 250, 1,500–1,600, and 2,800–3,000 Hz for [n] in Persian (Sepanta 1998 [1377]: 89–90). The bilabial nasal /m/ assimilates in place of articulation only with the following labio-dental fricatives as in /amvɑl/ [ʔaɱvɑl] ‘properties’ and /samfoni/ [saɱfoni] ‘symphony’. The alveolar nasal shows assimilation with all places of articulation, as in /man baɹ miɟaɹdam/ [mam baɹ miɟaɹdam] ‘I will return’; /anvaʔ/ [ʔaɱvɑ] ‘types’; /ɢand/ [ɢan̪d̪] ‘sugar cube’; /pance/ [paɲce] ‘fan’; /anɟuɹ/ [ʔaŋɡuɹ] ‘grape’; and /menɢɑɹ/ [meɴɢɑɹ] ‘beak’.

Approximants

In the production of approximants, articulators are close together to the degree that no friction is made by the air that passes through. Persian has /l,ɹ,j/ as approximants. /l/ is a lateral approximant and /ɹ,j/ are central approximants. The lateral approximant /l/ is produced by making a complete closure at the alveolar ridge with tip/blade of the tongue, keeping the sides of the oral cavity open. As with other sonorants, this consonant is also characterized by weak formant structure (Figure 4.6). The first three formants of this consonant are reported to be around 250, 1,500–1,600, and 2,450 Hz by Sepanta (1998 [1377]: 92); 321, 1,532, and 2,588 Hz by Bijankhan (2013 [1392]: 199); and 296, 1,944, and 2,547 Hz by Alinezhad and Hosseinibalam (2013 [1392]: 173). /l/ is devoiced word-finally. When this consonant appears after a voiceless consonant in word-final position, it is produced as a voiceless fricative.
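The place assimilation of the two nasals described above can be sketched as a lookup on the following consonant. This is an illustrative sketch only: the names ASSIMILATED_N and realize_nasal are invented here, and the entries for [p], [f], [t], [ɟ], and [k] are extrapolated by place of articulation from the examples in the text, which only attest [b], [v], [d], [c], [ɡ], and [ɢ] as triggers.

```python
# Surface form of /n/ before a given (surface) consonant.
ASSIMILATED_N = {
    "p": "m", "b": "m",     # bilabial
    "f": "ɱ", "v": "ɱ",     # labio-dental
    "t": "n̪", "d": "n̪",    # dental
    "c": "ɲ", "ɟ": "ɲ",     # palatal
    "k": "ŋ", "ɡ": "ŋ",     # velar
    "ɢ": "ɴ",               # uvular
}


def realize_nasal(nasal, following=None):
    """Return the surface form of /m/ or /n/ before the consonant `following`.

    /m/ only assimilates to labio-dental fricatives; /n/ takes the place
    of any following consonant listed in ASSIMILATED_N.
    """
    if nasal == "m":
        return "ɱ" if following in {"f", "v"} else "m"
    if nasal == "n":
        return ASSIMILATED_N.get(following, "n")
    raise ValueError(f"not a Persian nasal phoneme: {nasal!r}")


if __name__ == "__main__":
    # /man baɹ/ -> [mam baɹ]; /amvɑl/ -> [ʔaɱvɑl]; /menɢɑɹ/ -> [meɴɢɑɹ]
    for nasal, nxt in [("n", "b"), ("m", "v"), ("n", "ɢ")]:
        print(nasal, "before", nxt, "->", realize_nasal(nasal, nxt))
```

Keeping the rule as a table keyed on the surface consonant reflects examples such as /anɟuɹ/ [ʔaŋɡuɹ], where the nasal agrees with the velar allophone rather than with the underlying palatal.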
The glide /​j/​is produced by raising front of the tongue towards the hard palate and pushing the air through the narrowing with no friction. This consonant is characterized acoustically


Figure 4.6   Waveform and spectrogram of [le]

by weak formants with frequencies around 275, 2,100, and 2,650 Hz (Sepanta 1998 [1377]: 94). It has the same formant frequencies as the high front vowel [i] (Figure 4.7).

Figure 4.7   Waveform and spectrogram of [je]

Different types of rhotic (r-like) consonants are observed in the languages of the world. IPA has eight symbols for different types of rhotics. These sounds are observed in three places (alveolar, retroflex, and uvular) and four manners of articulation (trill, tap/flap, approximant, and fricative). Trills involve an articulation in which one articulator is held loosely near another so that the flow of air between them sets them in motion, alternately sucking them together and blowing them apart. This kind of production is seen in some forms of Scottish English, represented as /r/ in IPA. A tap is a sound made by a rapid movement of the tip of the tongue upward to the alveolar ridge, then returning to the floor of the mouth along the same path. This consonant is observed as an allophone of /t,d/ in American English in words such as ‘better’ and ‘butter’ pronounced as [bɛɾɚ] and [bʌɾɚ] respectively,


with a tap/flap as the intervocalic consonant. An approximant is an articulation in which one articulator is close to another, but without the tract being narrowed to such an extent that a turbulent airstream is produced (Ladefoged and Johnson 2011). Many forms of English have an approximant rhotic consonant represented as an alveolar /ɹ/ or retroflex /ɻ/ in IPA.

The only rhotic consonant of Persian is considered by the majority of linguists and grammarians to be an alveolar trill with a tap allophone in intervocalic position and a voiceless fricative allophone in word-final position. This latter allophone is also observed before voiceless consonants in clusters (Nye 1954: 15; Jazayeri and Paper 1961: 29; Kord-Zafaranlu-Kambuziya 2006 [1385]). Others such as Samareh (1999 [1378]), Majidi and Ternes (1999: 124–5) have introduced an approximant allophone for this consonant as well. Acoustic investigation of this consonant (Shekari and Nourbakhsh 2012 [1391]; Modarresi Ghavami 2016 [1394]) has shown that this consonant is basically an approximant in Persian, although tap allophones and voiceless fricative allophones are also observed. Trills are rarely observed and their occurrence is limited to positions of emphasis. Figure 4.8 shows a spectrogram of [ɹ] in initial position. As the spectrogram indicates, [ɹ] has the same formant structure as the following vowel except that it has less intense formants (indicated by a lighter colour) in comparison. This formant structure shows that this consonant is vowel-like in its production (an approximant) and not a trill or tap. Another feature of this consonant is that its third formant (F3) is lower in frequency compared to the adjacent vowel.

Figure 4.8   Waveform and spectrogram of [ɹe]

4.3.2 Vowels

Vowels are produced with no obstruction in the vocal tract. Standard Modern Persian has six simple and six complex vowels.

Simple vowels (monophthongs)

Standard Modern Persian has six simple vowels [i, e, a, u, o, ɑ]. This vowel system includes three front and three back vowels, [i, e, a] and [u, o, ɑ] respectively. In terms of height, [i, u] are high, [e, o] are mid, and [a, ɑ] are low.
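The two-way classification of the simple vowels by backness and height can be held in a small table. The following is a minimal sketch; the names VOWELS and vowels_with are illustrative choices made here, while the feature values are those given in the text.

```python
# Each simple vowel mapped to (backness, height), as listed in the text.
VOWELS = {
    "i": ("front", "high"),
    "e": ("front", "mid"),
    "a": ("front", "low"),
    "u": ("back", "high"),
    "o": ("back", "mid"),
    "ɑ": ("back", "low"),
}


def vowels_with(backness=None, height=None):
    """List the vowels matching the given backness and/or height."""
    return [
        v for v, (b, h) in VOWELS.items()
        if (backness is None or b == backness)
        and (height is None or h == height)
    ]


if __name__ == "__main__":
    print(vowels_with(backness="front"))
    print(vowels_with(height="high"))
    print(vowels_with(backness="back", height="low"))
```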

Figure 4.9   Waveform and spectrogram of Persian Simple Vowels

Figure 4.10   Vowel formant frequencies (Female speakers)

Vowel quality

Acoustically, vowels are characterized by strong formants seen as dark horizontal bars in the spectrographic representation of vowels. The spectrogram in Figure 4.9 shows the formants of the six simple vowels of Persian produced by a male speaker. The first four formants are represented as horizontal bands for each vowel. Each vowel has its own specific formant pattern which distinguishes it from other vowels. The frequencies of the first three formants of the six vowels of Persian are given in Figure 4.10 for female speakers and in Figure 4.11 for male speakers. The first formant (F1) is sensitive to vowel height. The higher the vowel, the lower the frequency of the first formant. As the values for the first formant in Figures 4.10 and 4.11 show,

F1 frequency increases as the front vowels [i,e,a] and the back vowels [u,o,ɑ] become more open. The second formant (F2) is sensitive to place of articulation. Front vowels have high F2 frequencies, while back vowels are characterized by low F2 values. The vowel space of female and male speakers is seen in Figure 4.12. Acoustically, the vowel space of female speakers is larger than that of male speakers. This is due to the fact that the female vocal tract is smaller and its resonant frequencies are hence higher.

Figure 4.11   Vowel formant frequencies (Male speakers)

Vowel quantity

Up to the sixteenth century AD the vowel system of Early New Persian was the same system observed in Middle Persian, which included five long (/ī, ū, ā, ē, ō/) and three short (/i, u, a/) vowels. The short vowels /i/ and /u/ have changed into /e/ and /o/ respectively in Standard Modern Persian. Moreover, the long vowels /ā/, /ē/ and /ō/ have been replaced by /ɑ/, /i, ey/ and /u, ow/ respectively (Sadeghi 1978 [1357]: 129–33). Thus, in the development of the Standard Modern Persian vowel system, the quantitative distinction between the vowels has been replaced by a distinction in quality between /i/ and /e/, /u/ and /o/, and /a/ and /ɑ/. Despite this development, vowels are still divided into two groups by many linguists: long /i,u,ɑ/ and short /e,o,a/ vowels. This division is reflected in writing by the fact that long vowels are generally represented by letters and short vowels by diacritics,2 which are omitted in writing after children master the writing system in the first grade. This distinction is also important in Persian metrics.
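The correspondences between the older eight-vowel system and the six vowels of Standard Modern Persian can be laid out as a mapping. The sketch below is an illustration under stated assumptions: the names OLD_TO_MODERN, sources_of, and traditional_class are invented here, and the entries for /ī/, /ū/, and /a/ (shown as unchanged in symbol) are inferred from the long/short grouping in the text rather than stated in it outright.

```python
# Older vowel -> modern reflex(es). Two reflexes mean the older vowel split.
OLD_TO_MODERN = {
    "ī": ["i"],          # inferred: stays in the long set
    "ū": ["u"],          # inferred: stays in the long set
    "ā": ["ɑ"],
    "ē": ["i", "ey"],
    "ō": ["u", "ow"],
    "i": ["e"],
    "u": ["o"],
    "a": ["a"],          # inferred: stays in the short set
}

# Traditional grouping of the modern simple vowels.
LONG = {"i", "u", "ɑ"}
SHORT = {"e", "o", "a"}


def sources_of(modern):
    """Older vowels that have `modern` among their reflexes."""
    return [old for old, new in OLD_TO_MODERN.items() if modern in new]


def traditional_class(modern):
    """Return 'long' or 'short' for a modern simple vowel."""
    if modern in LONG:
        return "long"
    if modern in SHORT:
        return "short"
    raise ValueError(f"not a simple vowel: {modern!r}")


if __name__ == "__main__":
    for v in ["i", "u", "ɑ", "e", "o", "a"]:
        print(v, traditional_class(v), "from", sources_of(v))
```

Read this way, modern /i/ and /u/ each continue two older long vowels, while each modern short vowel continues exactly one older short vowel.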
2  Short vowels are also represented by the letters ⟨و⟩ and ⟨ه⟩ word-finally as in [to], [na], and [ɡoɹbe].

There is a difference between the behaviour of short and long vowels in the phonology of Persian: short unstressed vowels can be deleted in rapid speech, while long vowels are almost never deleted (Lazard 1957: 12–13); in monosyllabic words with CVCC syllable structure, where a glottal consonant appears in the cluster or the final C is a nasal or a liquid, only short vowels appear in V position; short vowels undergo compensatory lengthening, while long vowels are never lengthened (Kord Zafaranlu Kambuziya and Hadian 2009 [1388]).

Figure 4.12   Vowel space of Female (dots) and Male (squares) Persian Speakers

Table 4.6 Duration (ms) of Persian vowels (Modarresi Ghavami 2015 [1393])

Some investigators (Sokolova et al. 1952; Hodge 1957; Lazard 1957; Rastorgueva 1964) have preferred to refer to long vowels as ‘stable’ and short vowels as ‘unstable’, due to the observation that long vowels are never deleted or shortened, while short vowels undergo both processes. Moreover, early acoustic investigations had shown that the length distinction between the two sets has disappeared in Standard Modern Persian except in open unstressed syllables (Sokolova et al. 1952; Hodge 1957; Mohammadova 1974). However, recent work on the duration of vowels in Persian has shown that the duration distinction is also present in closed stressed syllables (Modarresi Ghavami 2015 [1393]). Table 4.6 shows the duration of vowels in closed stressed syllables. As the numbers indicate, the long vowels [i,u,ɑ] are consistently longer than [e,o,a] respectively. The low back vowel [ɑ] is the longest and the mid

back vowel [o] is the shortest vowel in Persian. Moreover, vowels are consistently longer in the speech of women compared to men. Phonetic context can affect the duration of vowels. For example, all vowels are longest in CVCC syllables, stressed vowels are longer compared to their unstressed counterparts, and all vowels are lengthened before voiced codas (Samareh 1999 [1378]). The duration of short vowels increases when a following syllable-final glottal consonant, i.e. /ʔ, h/, is deleted, in a process known as compensatory lengthening. Examples are /daʔvɑ/ ‘fight’ and /tehɹɑn/ ‘Tehran’, which are pronounced as [daːvɑ] and [teːɹun] respectively in colloquial Persian. This phonetic increase in duration results in phonetically distinct pairs such as /baʔd/ [baːd] ‘after’ versus /bad/ [bad] ‘bad’ and /daʔvɑ/ [daːvɑ] versus /davɑ/ [davɑ] ‘drug, medicine’. Length also plays a distinctive role between loanwords such as [mɑ̆dɑm] ‘madam’ and [kŭɹi] ‘(Marie) Curie’ on the one hand and loan/native words such as [mɑdɑm] ‘as long as’ and [kuɹi] ‘blindness’ on the other. In such examples, the vowels in European loanwords seem to be shorter compared to native words or words borrowed from Arabic.

Complex vowels (diphthongs)

Diphthongs are sequences of vowels in a syllable. Persian has six complex vowels: [ei] as in [cei] ‘when’, [ai] as in [hai] ‘alive’, [ui] as in [ɹui] ‘zinc’, [oi] as in [xoi] ‘(the city of) Khoi’, [ɑi] as in [ʧɑi] ‘tea’, and [ou] as in [ʤou] ‘barley’. The spectrogram in Figure 4.13 shows these six complex vowels. As seen, the formants for each diphthong change in frequency during the production of these vowels. Since the second member of each diphthong is a high vowel, the frequency of the first formant reduces from the beginning to the end of the diphthong, as the first formant is sensitive to vowel height.
For the complex vowels that end in [i]‌, the second formant has a rising pattern, as [i] is a front vowel characterized by high F2 frequencies. The only diphthong that ends in [u], i.e. [ou] shows a reduction in F2 frequency, as [u] is a back vowel characterized by low F2 frequencies.
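The compensatory lengthening process described earlier in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: words are given as lists of IPA segments, a glottal counts as syllable-final when it is not followed by a vowel, and the function name `compensatory_lengthening` is hypothetical. Other colloquial changes, such as the [ɑ] to [u] shift in [teːɹun], are deliberately left out.

```python
# Assumed segment classes; the vowel sets follow the chapter's six-vowel system.
VOWELS = {"i", "e", "a", "u", "o", "ɑ"}
SHORT_VOWELS = {"e", "o", "a"}
GLOTTALS = {"ʔ", "h"}


def compensatory_lengthening(segments):
    """Delete a syllable-final glottal after a short vowel and lengthen that vowel."""
    out = []
    i = 0
    while i < len(segments):
        seg = segments[i]
        nxt = segments[i + 1] if i + 1 < len(segments) else None
        after = segments[i + 2] if i + 2 < len(segments) else None
        # Assumption: a glottal is syllable-final when no vowel follows it.
        if seg in SHORT_VOWELS and nxt in GLOTTALS and after not in VOWELS:
            out.append(seg + "ː")  # lengthened vowel
            i += 2                 # skip the deleted glottal
        else:
            out.append(seg)
            i += 1
    return out


# /baʔd/ 'after' versus /bad/ 'bad'
after_word = compensatory_lengthening(["b", "a", "ʔ", "d"])
bad_word = compensatory_lengthening(["b", "a", "d"])
```

Under these assumptions the two inputs come out as distinct surface strings, mirroring the [baːd] versus [bad] contrast cited in the text.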

Figure 4.13   Waveform and spectrogram of Persian Complex Vowels

Figure 4.14   Vowel space of Persian diphthongs

Acoustic investigation of the diphthong [ou] has shown that this vowel is mainly pronounced as a long vowel which can be represented as [oː] (Modarresi Ghavami 2010 [1389]). The diphthong variant of this vowel is seen in a minority of cases in careful exaggerated speech. Figure 4.14 shows the diphthongal vowel space of female (dots) and male (squares) speakers of Persian.

4.4 Suprasegmentals

Suprasegmentals are those aspects of speech that involve more than one segment. These include stress, tone, and intonation. The four acoustic properties of duration, intensity, frequency, and spectral characteristics, which are respectively known as length, loudness, pitch, and quality in the perceptual domain, are used to realize the suprasegmental features of stress, tone, and quantity in the lexical domain and intonation in the non-lexical domain (Hirst and Di Cristo 1998: 7). The two most relevant suprasegmental features in Persian are stress and intonation, discussed below.


4.4.1 Stress

Stress is the perceived prominence of a syllable compared to other syllables in the string of speech. Stressed syllables can be heard as louder, longer, and higher in pitch. In certain languages such as English, vowels are fully realized in stressed syllables, but are reduced to other vowels such as a schwa in unstressed ones. Earlier studies have indicated that Persian vowel space gets smaller when vowels are unstressed compared to when they are stressed (Ghara’ati 2010 [1389]; Alinezhad 2012 [1391]). In other words, vowels tend to shift to a central position similar to the position for a schwa in unstressed syllables. However, further acoustic investigation shows that, when all variables (including word length) except stress are held constant, vowel space does not change as a function of stress (Modarresi Ghavami 2014 [1392]), indicating that vowel quality does not change in unstressed syllables compared to stressed syllables.

Investigations of the acoustic correlates of stress in Persian (Sepanta 1975 [1354]; Natel Khanlari 1988 [1367]: 150–1; Vahidiyan Kamyar 2000 [1379]: 23–4; Mousavi 2007 [1386]) have found frequency (pitch) to be the main acoustic correlate, while the involvement of duration and intensity has been found to be minimal. Gender seems to play an important role in this issue, as women have been found to use duration to make a syllable more prominent, while frequency seems to be the main acoustic correlate of stress in the speech of men (Modarresi Ghavami 2014 [1393]).

4.4.2 Intonation

Intonation is defined as patterns of pitch changes used by speakers to convey linguistic as well as pragmatic meaning (see also Chapter 6). As such, the study of intonation from a phonetic point of view involves the investigation of changes in fundamental frequency (i.e. the frequency of vocal fold vibration) as the acoustic correlate of pitch. For example, in Figure 4.15 the variations in fundamental frequency (F0) in the phrase [bɑjad zudtaɹ baɹɟaɹdand] ‘they must return immediately’ are shown as a pitch contour superimposed on the spectrogram of this phrase.

Figure 4.15   Waveform, spectrogram, and the pitch contour of a statement

Two prominent peaks are observable in this figure: a less prominent one on the second syllable of [bɑjad] ‘must’ and a more prominent one on the second syllable of [zudtaɹ] ‘immediately’. Speakers can use pitch to make a word more prominent in a linguistic unit such as a phrase, clause, or sentence. The speaker of the sentence represented in Figure 4.15 has made two words more prominent in the phrase in order to highlight the importance of departing immediately. Using intonation to highlight words important in conveying the intended meaning is called tonicity. Tonic peaks fall on the stressed syllable of intended words. Tonicity is dependent on the intended linguistic and pragmatic meaning and is not predictable. The tonic peak could have fallen on the first syllable of [baɹɟaɹdand] ‘return’ (3rd pl.) to highlight ‘returning’ rather than any other action.

At the same time, we can see that the pitch contour falls at the end of the phrase in Figure 4.15. This is the typical intonation pattern of a statement in Persian. The use of intonation to mark the beginning and end of phrases, clauses, and sentences is called tonality.

Intonation can also be used to convey definiteness, incompleteness, objection, irritation, etc. without changing the lexical meaning. This use of intonation comes under the heading of tone. Figure 4.16 shows the waveform and pitch contour of the word [bale] ‘yes’.

Figure 4.16   Waveform, spectrogram, and pitch contour of [bale] ‘yes’

The first instance on the left represents the word in its neutral condition, i.e. expression of consent and agreement marked by a falling intonational pattern. The second instance, with a rising pattern, represents a question form of the word meaning ‘what?’ with a touch of irritation on the part of the speaker. The third case, which has a relatively long and prominent first syllable and shows a peak on the second syllable, conveys definite agreement and consent.
The last case still means ‘yes’ but conveys a sense of reluctance and frustration on the part of the speaker due to its high-​falling pitch contour pattern. Intonation plays many important roles in language: grammatical, pragmatic, attitudinal, sociological, psychological, etc. which are topics discussed in phonology.


4.5 Summary

This chapter was an overview of the phonetic aspects of the sound system of Modern Standard Persian. It started with a brief introduction to the sound system of Early New Persian spoken between the eighth and twelfth centuries AD and the development of this system into what we observe today in Modern Standard Persian. The articulatory as well as acoustic properties of Modern Standard Persian consonants were introduced in two main sections on obstruents (plosives, fricatives, and affricates) and sonorants (nasals and approximants). In discussing consonants, the place and manner of articulation of Persian consonants, as well as issues of voicing and VOT, were introduced. This section was followed by a review of the acoustics of simple and complex vowels of Persian. Simple vowels are not only qualitatively different, but also a difference in quantity is observable between short and long vowels acoustically. Phonetically complex vowels were also investigated acoustically, showing that at least five complex vowels are observable in Modern Standard Persian. The last section of the present chapter included a brief discussion on the acoustics of the suprasegmentals of stress and intonation in Persian.

Chapter 5

PHONOLOGY

Mahmood Bijankhan

5.1 Introduction

This chapter investigates the phonology of contemporary Persian according to the formal and colloquial speech data as spoken slowly by literates in the Tehrani dialect. First of all, a phoneme inventory is posited using identical and complementary distributions. Then Persian syllable structure is explained on the basis of the sonority hierarchy. Phonological processes in Persian are tested against feature geometry in order to posit the hierarchical structure of the phonetic features. Afterwards, Persian phonological rules are discussed as evidence for natural classes. Prosodic features of length and stress are also discussed. Finally, several appropriate processes are chosen to investigate the interaction of violating markedness and faithfulness constraints which lead to a laryngeal conspiracy and hierarchical Hasse diagram.

5.2 Phoneme inventory

When two words differ minimally in one sound, the two sounds can belong to two separate phonemes. Examples of how this works in the consonantal system of Persian are provided in (1). Noticeably, the pronunciations of these surface minimal n-tuplets follow -aɹ or -ɑɹ frames, except for [ʒaɹf] and [ɹɑn] in (1b). For instance, from the minimal 5-tuplet in (1a), five consonantal phonemes, i.e. /p/, /b/, /f/, /v/, and /m/, can be derived. Single quotation marks are used to represent the meaning of words. ‘ ̥’ stands for absence of voice.

(1)  a. labials:        pʰaɹ ‘feather’   b̥aɹ ‘on’   faɹ ‘glory’   vaɹ ‘side’   mɑɹ ‘snake’
     b. dentalveolars:  tʰaɹ ‘wet’   d̥aɹ ‘door’   saɹ ‘head’   zaɹ ‘gold’   ʃaɹ ‘evil’   ʒaɹf ‘profound’
                        ʧʰɑɹ ‘four’   ʤ̥ɑɹ ‘shout’   naɹ ‘male’   lɑɹ ‘name of a city’   ɹɑn ‘thigh’
     c. palatals:       cʰaɹ ‘deaf’   ɟ̥aɹ ‘bald’   jɑɹ ‘friend’
     d. velars:         kʰɑɹ ‘work’   ɡ̥ɑɹ ‘a suffix’
     e. uvulars:        ɢ̥ɑɹ ‘cave’   χɑɹ ‘thorn’
     f. glottals:       Ɂɑɹ ‘scandal’   hɑɹ ‘rabid’
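The minimal n-tuplet method used in (1) can be sketched computationally. The following Python fragment is a minimal illustration, not the author's procedure: it assumes words are tuples of segments, and the function name `minimal_tuplets` is hypothetical. It groups words that are identical except for one position, so that the segments filling that position are candidates for separate phonemes.

```python
from collections import defaultdict


def minimal_tuplets(words):
    """Group words (tuples of segments) that differ in exactly one position."""
    frames = defaultdict(set)
    for word in words:
        for i in range(len(word)):
            # Replace position i with a placeholder to obtain the frame.
            frame = word[:i] + ("_",) + word[i + 1:]
            frames[frame].add(word)
    # Keep only frames shared by at least two words.
    return {f: sorted(g) for f, g in frames.items() if len(g) > 1}


# The labial set from (1a), transcribed phonemically for simplicity.
labials = [
    ("pʰ", "a", "ɹ"),
    ("b", "a", "ɹ"),
    ("f", "a", "ɹ"),
    ("v", "a", "ɹ"),
    ("m", "ɑ", "ɹ"),
]
groups = minimal_tuplets(labials)
contrasting = sorted(word[0] for word in groups[("_", "a", "ɹ")])
```

With a strict one-position criterion, the -aɹ frame yields four contrasting onsets, while the 'snake' word sits in the neighbouring -ɑɹ frame. That is why the text speaks of -aɹ or -ɑɹ frames rather than a single frame.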

In order to classify Persian consonants, the articulatory gestures described by Catford (2003) and the articulator-based theory in phonology (Halle et al. 2000) were taken as criteria for a sound classification. Accordingly, consonants are phonetically classified into six groups: labials, dentalveolars, palatals, velars, uvulars, and glottals. For labials, for instance, the lower lip articulates with either the upper lip, as in /p/, /b/, and /m/, or the upper teeth, as in /f/ and /v/. In (1), subordinate examples of Persian consonant categories are headed by the place of articulation of the bolded first segments of each word. A closer examination of these examples encourages us to posit additional Persian consonants that are phonemically distinct. Controversial distinctions will be reviewed and discussed in (i) to (v) below.

(i) The laryngeal states responsible for contrast in pairs of homorganic stops and those responsible for fricatives are different. While scholars agree that pairs of homorganic fricatives are easily categorized as either voiceless or voiced phonemes, i.e. /f/ vs. /v/, /s/ vs. /z/, and /ʃ/ vs. /ʒ/, the situation for stops is not straightforward. Whether aspiration or voicing causes the contrast is significant. Experiments by Zavjalova (1961, cited in Windfuhr 1979) showed that voiceless stops are generally aspirated whereas voiced stops are never aspirated but may be (partially) devoiced or (partially) voiced in specific environments. Qarib (1965, cited in Windfuhr 1979) found that the voiceless stops are marked by various degrees of aspiration that are, however, entirely lost after voiceless fricatives. Lazard (1992: 8–9) claimed that voiced stops become voiceless in the final position but without being confused with unvoiced counterparts, which, by comparison, are more energetic and strongly articulated. Mahootian (1997) believes that voiceless stops are aspirated in the syllable-initial position but unaspirated at the end of a syllable. Samareh (1999) considered voiceless stops to be aspirated except when they precede stops and fricatives. Based on impressionistic judgement, he believes that all voiced obstruents lose their voice at the end of a word or when they occur before or after a voiceless consonant, and that voiced stops in the word-initial position lose their voice almost totally. According to VOT (Voice Onset Time) studies by Bijankhan and Noorbakhsh (2009), Persian uses mainly {voiceless unaspirated} and {voiceless aspirated} categories for voiced and unvoiced distinctions in the initial position and {voiced} and {voiceless aspirated} categories in the intervocalic position. Windfuhr (2009b) asserted that the distinctive feature of the pairs of stops and fricatives is still being debated. It may be identified either as voice or as tenseness, and tense stops are aspirated word-initially. UPSID (UCLA Phonological Segment Inventory Database) considers Persian voiceless stops as phonologically aspirated segments. It seems that the amount of space between vocal folds signals opposition for the homorganic stops, and the presence vs. absence of the vocal fold vibration (or tension) signals opposition for the homorganic fricatives. Some Persian scholars used the tense/lax distinction for voiced–voiceless opposition (Samareh 1999; Windfuhr 2009b; see also Jessen 1998 for German). However, the conventional IPA (International Phonetic Alphabet) transcription for voiced and voiceless stops is maintained here.

(ii) No general consensus has been reached for the place of articulation of /t/ and /d/. While some consider them to be dental (Windfuhr 1979; Pisowicz 1985; Lazard 1992; Samareh 1999; UPSID), others describe them as either dentalveolar (see Haghshenas 1990) or alveolar (Majidi and Ternes 1999). Mahootian (1997) described them as either apico-alveolar or apico-dental. Therefore, consistent with the opinion of most scholars, /t/ and /d/, whose constrictions are made by the tip and blade of the tongue (as active articulators) and the region including the upper teeth and alveolar ridge (as two passive articulators), could be dentally articulated. Affricates along with /ʃ/ and /ʒ/ are considered to be post-alveolar and thus are classified with dentalveolars (Majidi and Ternes 1999; UPSID), because their active articulator is the blade of the tongue. Windfuhr (2009b) classifies them as palatal on account of the passive articulator that forms the constriction. Like voiceless stops, voiceless affricates are aspirated (Bijankhan and Noorbakhsh 2009). /s/, /z/, /l/, and /ɹ/ are considered to be apico-alveolar.

(iii) There is no consistency in the recognition of the phonemic status of the palatal and velar stops. While some scholars consider them as velars or prevelars (cf. Mahootian 1997; Majidi and Ternes 1999; Windfuhr 2009b; UPSID), Pisowicz (1985) considers the palatal articulation as the chief one. Velars occur only in the syllable-onset position when the nucleus is a back vowel while palatals occur in all other positions. Therefore, posterodorso-velar stops [k, ɡ] should be considered as allophones of anterodorso-palatal stops /c, ɟ/.

(iv) There is also some debate as to whether the voiced uvular phoneme is a dorso-uvular stop or dorso-velar fricative. Pisowicz (1985), Mahootian (1997), Samareh (1999), and UPSID consider it to be a stop, while Windfuhr (1979, 2009b) and Majidi and Ternes (1999) classify it as a fricative. According to Windfuhr (2009b), while it is systemically a lax fricative, intervocalically it is a lax velar fricative, [ɣ]; in initial and final positions it is a lax uvular stop [ɢ]. Majidi and Ternes (1999) state that /ɣ/ is [ɢ] in the word-initial position, after nasals, and when geminated; otherwise it is postvelar. Sadeghi (2006) posited that the voiced dorso-uvular stop in Persian was borrowed from Arabic in the first centuries of the modern Persian era. Bijankhan and Noorbakhsh (2009) concluded that the voiced uvular should be treated as a stop, although in some positions (e.g. between two vowels or in the word-final position) it is converted into its fricative or sonorant allophone. For example, /ɢ/ in /ɑɢɑ/ ‘Mr.’ becomes [Ɂɑʁɑ]. Hayes (2009) considers the uvular stop as voiceless.

(v) There is also no consistency in the explanation of the phonemic status of glottal stops (Windfuhr 1979). /Ɂ/ is an Arabic loan similar to /ɢ/. When used at the beginning of words and before vowels, the main issue is whether [Ɂ] is predictable in being automatically inserted by Persian speakers or is distinctive there. A dual function can be suggested for the glottal stop (Haghshenas 1990): it is distinctive at the beginning of Arabic (and English) loan words though not at the beginning of the original Persian words, and it is a prosodic element (in Firthian terminology), like /j/, when resolving hiatus in accordance with the syllable-onset obligation.

Based on the IPA consonant chart, the Persian consonant system is tabulated in (2).

(2)
                        Labial   Dentalveolar   Palatal/Velar   Uvular   Glottal
Voiceless stops         p        t, ʧ           c                        Ɂ
Voiced stops            b        d, ʤ           ɟ               ɢ
Voiceless fricatives    f        s, ʃ                           χ        h
Voiced fricatives       v        z, ʒ
Nasals                  m        n
Liquids                          l, ɹ
Glide                                           j


Oral stops and affricates are symmetrical in terms of voicing and aspiration, except for the uvular stop. In terms of voicing, labial and dentalveolar stops and fricatives also form a complete symmetry. From the minimal 6-tuplet in example (3), six vowels can be derived: /i/, /e/, /a/, /u/, /o/, /ɑ/.

(3)  siɹ ‘garlic’   seɹ ‘numb’      saɹ ‘head’
     suɹ ‘feast’    soɹ ‘sliding’   sɑɹ ‘starling’

Based on Catford’s (2003) coding system for the cardinal vowels, Persian vowels can be coded as CV1 (/i/), CV2 (/e/), CV4 (/a/), CV8 (/u/), CV7 (/o/), and CV5 (/ɑ/), where CV stands for the cardinal vowels. The place of articulation is either front, i.e. anterodorso-palatal, or back, i.e. posterodorso-velar. Front vowels are unrounded and back vowels are rounded. The three degrees of vowel height are distinctive. Lip shape and nasality are not distinctive. Vowels lengthen before voiced consonants and clusters (Samareh 1999), and vowels are classified into two groups in terms of length: short vowels /e, o, a/, and long vowels /i, u, ɑ/. Scholars interested in diachronic phonology use the term ‘stable’ for long vowels because they retain their duration in all positions, and ‘unstable’ for short vowels because their duration varies in accordance with their position (Windfuhr 1979, 2009b; Lazard 1992). Length distinction is also the basis of rhythm in classical Persian verse. The Persian vowel system is displayed in (4).

(4)
         Front   Back
High     i       u
Mid      e       o
Low      a       ɑ

A partition of the vowels into the long vowels /i, u, ɑ/ surviving from the Middle Persian /e:, o:, a:/, and the short vowels /e, o, a/ surviving from the Middle Persian /i, u, a/ (Sadeghi 1978), is needed to understand some phonological patterning. Some minimal pairs like [ɢom] ‘name of a city’ vs. [ɢowm] ‘people’, [doɹ] ‘pearl’ vs. [dowɹ] ‘around’, and [hol] ‘push’ vs. [howl] ‘about’ may provide evidence of the diphthong /ow/ as a phoneme. However, it is one of the least frequently occurring sounds in Persian, being observed in few words and not participating in the nucleus position of CVCC syllables at all. [w] would be in complementary distribution with [v] when it occupies the position of the first consonant of the syllable coda, in which case /o/ is the only vowel that can occupy the nucleus position. Thus [w] emerges as an allophone of /v/ (Haghshenas 1990; Yarmohammadi 1995; Samareh 1999). Accordingly, the above-mentioned words can be phonemically transcribed as /ɢovm/, /dovɹ/, and /hovl/. In total, the Persian phoneme inventory contains 23 consonants and 6 vowels.

5.3 Syllable structure

A syllable is a formal and psychological entity to which phonological rules, phonotactic constraints, alternations, stress, and intonational patterns refer. A phonological word is parsed into syllables. In general, a syllable (σ) contains an obligatory nucleus (N) preceded by an optional consonantal onset (O) and followed by an optional consonantal coda (CO). The onset and the coda form the margin of a syllable. Rhyme (R) is another constituent of the syllable formed by a nucleus and a coda. Rhyme and onset are dominated by the syllable node. The hierarchical structure of Persian syllables is displayed in (5).

(5)
        σ
       / \
      O   R
         / \
        N   CO

The template of the Persian syllable is CV(C)(C). The onset must contain only one consonant. Thus Persian has three syllable structures: CV, CVC, and CVCC. A glottal stop occurs at the beginning of a vowel-initial word. Clear evidence in Persian supports a strong bond between the nucleus and the coda rather than the nucleus and the onset. For example, no onset constraint exists in CV words except that /ʒ/ cannot be followed by /o/. A syllable template makes syllabification more straightforward. Given that vowels alone comprise the nucleus in Persian, syllabification of words into sequences of syllables starts with the assignment of each vowel to a syllable root. Then each intervocalic consonant immediately before a vowel is assigned to the onset of the following syllable, and by joining the remaining intervocalic consonants to the coda of the preceding syllable, the process of syllabification ends. For example, syllabification of the word /ʃahɹvandi/ ‘citizenship’ proceeds as shown in (6). In the second step the vowels a, a, and i are identified as nuclei; the syllable boundary is indicated by a dot.

(6) /ʃahɹvandi/ → /ʃ a h ɹ v a n d i/ → /ʃahɹ.van.di/

Persian phonotactic constraints within the codas of the syllables violate the sonority sequencing principle (SSP) (Hayes 2009), which requires segments to progressively decrease in sonority as one proceeds outwards from the nucleus towards the end of the syllable. For example, the order of the segments in the coda of a monosyllabic word such as /sadɹ/ ‘top’ disagrees with the sonority hierarchy according to which liquids are more sonorant than stops. According to a statistical analysis of the Persian lexicon conducted by the author, it was found that 21.5 percent of the clusters in the coda position violate the SSP. Sequences of fricatives+nasals comprise the most frequent violations. Putting the sequences made of affricates aside, sequences of nasals+glides and liquids+glides do not exhibit any violations. This being the case, given a consonant cluster in disagreement with the sonority hierarchy in Persian, one will also find a corresponding cluster in agreement with it.
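The syllabification procedure and the sonority check described in this section can be sketched as follows. This is a minimal illustration under stated assumptions: words are lists of single-symbol segments, the function names `syllabify` and `coda_violates_ssp` are hypothetical, and the five-step sonority scale (stops and affricates < fricatives < nasals < liquids < glide) is a common textbook ranking assumed here rather than one given in the chapter.

```python
VOWELS = {"i", "e", "a", "u", "o", "ɑ"}

# Assumed sonority ranks; a higher number means more sonorous.
SONORITY = {}
for rank, segs in enumerate(
    ["p b t d c ɟ ɢ ʧ ʤ", "f v s z ʃ ʒ χ", "m n", "l ɹ", "j"], start=1
):
    for s in segs.split():
        SONORITY[s] = rank


def syllabify(segments):
    """Parse segments into CV(C)(C) syllables, separated by dots."""
    nuclei = [i for i, s in enumerate(segments) if s in VOWELS]
    if not nuclei:
        raise ValueError("a word needs at least one vowel")
    for n in nuclei:
        # Every nucleus needs exactly one onset consonant directly before it.
        if n == 0 or segments[n - 1] in VOWELS:
            raise ValueError("every syllable needs a consonantal onset")
    if nuclei[0] != 1:
        raise ValueError("the onset may contain only one consonant")
    starts = [n - 1 for n in nuclei]
    syllables = []
    for k, start in enumerate(starts):
        end = starts[k + 1] if k + 1 < len(starts) else len(segments)
        if end - start > 4:
            raise ValueError("the coda may contain at most two consonants")
        syllables.append("".join(segments[start:end]))
    return ".".join(syllables)


def coda_violates_ssp(coda):
    """True if sonority rises at any point moving away from the nucleus."""
    return any(SONORITY[b] > SONORITY[a] for a, b in zip(coda, coda[1:]))


citizenship = syllabify(list("ʃahɹvandi"))
```

On the chapter's own example the sketch places boundaries after the coda clusters, and the coda of /sadɹ/ is flagged because a liquid follows a stop.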

5.4 Distinctive features

Sound distinctions within a minimal n-tuplet can be characterized by phonetic features as basic units of phonological description. For example, the difference between /baɹ/ and /mɑɹ/, or /daɹ/ and /naɹ/, which results in identification of the phonemes /b/ vs. /m/, and /d/ vs. /n/, can be characterized by the distinctive feature [nasal]. Since [nasal] is a function of velic aperture and only two values, i.e. being closed or not, are used to differentiate the words, the distinctive feature [nasal] is binary: [+nasal] for /m/ and /n/, and [–nasal] for /b/ and /d/. Features may be single-valued or multi-valued depending upon how they function in the sound system. Features are, in fact, formal devices for the partitioning of phonological space into natural classes of sounds. Based on the methodology of McCarthy (1988), phonological processes like assimilation, reduction, and dissimilation operate on consistent subsets of features, resulting in a hierarchical organization of features, called feature geometry. In this section, some Persian phonological processes are tested against the feature geometry of Halle et al. (2000), as illustrated in (7).

(7) Feature geometry (tiers listed from the root downwards)
Root:                [cons], [son]
Class nodes:         Place, Guttural
Articulator nodes:   Lips, Tongue Blade, Tongue Body, Soft Palate
Terminal features:   [lat] [cont] [lab] [ant] [dist] [cor] [dors] [back] [high] [low] [nasal] [sg] [cg] [voice]

First of all, broad articulatory categories, i.e. vowels, glides, liquids, nasals, and obstruents, can be specified by two main articulator-free class features: [consonantal] and [sonorant] (8), abbreviated as [cons] and [son].

(8)
              [son]   [cons]   Natural class
Vowels        +       –        /i, e, a, u, o, ɑ/
Glides        –       –        /j, h, Ɂ/
Liquids       +       +        /l, ɹ/
Nasals        +       +        /m, n/
Obstruents    –       +        all stops, fricatives, and affricates
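The way the two class features in (8) partition the inventory can be made concrete with a short lookup. The sketch below simply encodes (8) as printed; the dictionary name `MAJOR_CLASSES` and the function `natural_class` are hypothetical, and ASCII "-" stands in for the minus sign.

```python
# Values copied from (8); "-" is used for the printed minus sign.
MAJOR_CLASSES = {
    "vowels": {"son": "+", "cons": "-"},
    "glides": {"son": "-", "cons": "-"},  # as printed in (8)
    "liquids": {"son": "+", "cons": "+"},
    "nasals": {"son": "+", "cons": "+"},
    "obstruents": {"son": "-", "cons": "+"},
}


def natural_class(**spec):
    """Return the categories whose feature values match every item in spec."""
    return sorted(
        name
        for name, feats in MAJOR_CLASSES.items()
        if all(feats[f] == v for f, v in spec.items())
    )


non_consonantal = natural_class(cons="-")
sonorant_consonants = natural_class(son="+", cons="+")
```

A query with a single feature value returns a broader class than a query with two, which is the sense in which fewer features define larger natural classes.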

While voiced and voiceless obstruents are phonemically distinct except for uvulars (see (2)), no such distinction exists for sonorants because they are all voiced in Persian. This justifies two natural classes: [+son] and [–son]. Vowels and glides, /j/ and [w], belong to the [–cons] class because they do not have a radical constriction in the supralaryngeal cavity. However, /j/ and [w] unlike vowels occupy the position of margins in Persian syllables. The laryngeals /h, Ɂ/, like /j/ and vowels, belong to the [–cons] class, because they do not have a radical constriction in the supralaryngeal cavity. In Persian, laryngeals and glides can be characterized as a natural class because they occur between two consecutive vowels in order to resolve hiatus. Changes in the values of the features of [son] and [cons] affect the entire segment, so they represent the root node. Deletion of /t/ and /d/ in the word-final position provides evidence for reduction of the major class features of [–son] and [+cons] in conjunction with the features of [–cont], [voice], and Place which constitute the whole segments of /t/ and /d/.

Distribution of laryngeal states in different contexts, as stated in section 5.2, suggests that aspiration is more fundamental than voicing in the distinction between voiced and voiceless stops. Therefore, the feature [spread glottis] differentiates between voiceless stops ([+spread glottis]) and voiced stops ([–spread glottis]), as in German (Jessen and Ringen 2002). (From now on, [spread glottis] is abbreviated as [sg].) Laryngeal features form a class node independent of place and manner features. Aspirated (voiceless) stops coming after voiceless fricatives deaspirate and reduce to unmarked [–sg] (see (9)), and voiced obstruents assimilate in [voice] to their adjacent voiceless consonant (see (11)). In Persian, an alveolar nasal is assimilated to the oral consonant following it, but it remains unchanged before laryngeals.
Taken from Halle et al. (2000), (7) shows the division of place nodes into three active articulators: [lips], [tongue blade], and [tongue body]. In Persian /n/ becomes a labial [m] before the bilabials /m/, /b/, and /p/ and it becomes a labio-dental [ɱ] before the labio-dentals /f/ and /v/. Both bilabials and labio-dentals employ the lower lip as an active articulator but differ in the passive articulator. Also, affricates become fricative before consonants sharing the same active articulator, i.e. the tip or blade of the tongue. The sequence /ʤd/ becomes [ʒd], as in [maʒd] ‘glory’, simply because /ʤ/ loses its [–cont] portion. In colloquial speech, the place of articulation of the palatal stops /c, ɟ/ in the morpheme/word-final position accommodates to the velar region when occurring before morpheme/word-initial velar or uvular consonants, [k, ɡ, ɢ, χ]. For example, the word /ɹocʰ-ɟu/ is pronounced as [ɹok-ɡu] ‘frank’ (see Chapters 2, 3, 4, 6, 10, 11, and 15 for more on colloquial form). Such accommodation could easily occur as a result of the palatal, velar and uvular sharing the same active articulator (i.e. the body of the tongue) in Persian.

Terminal binary features are responsible for finer distinctions within coronal (abbreviated as [cor]) and dorsal sounds (see Chapter 4 for more information about dorsal sounds). Place assimilation of the non-continuants /t/, /d/, and /n/ to the place of the coronals /s/, /z/, /ʃ/, /ʒ/, /ɹ/, and /l/ suggests the binary features [anterior] and [distributed], abbreviated as [ant] and [dist]. [+ant] coronals are articulated at the alveolar ridge or further forward, while [–ant] coronals are articulated behind the alveolar ridge. [+dist] coronals have a longer contact area than [–dist] coronals (Hayes 2009). /t, d, ʧ, ʤ, ʃ, ʒ/ are [+dist]. Additionally, place assimilation of the palatal stops /c, ɟ/ to the following back vowels /u, o, ɑ/, together with the three degrees of height contrast within the vowels, lead one to suggest the binary features [back], [high], and [low]. In Persian, the palatal, velar and uvular consonants should be considered as [–back, +high], [+back, +high], and [+back, –high], respectively.

However, there is no evidence for the unity of manner features under one class node in Persian. [continuant], abbreviated as [cont], distinguishes stops, affricates, and nasals, all involving airflow interruption, i.e. [–cont], from other segments, i.e. [+cont]. Depending upon their place of articulation, Persian stops have a tendency to convert to fricatives in colloquial speech. For example, the words /vaɢt/ ‘time’ and /χaste/ ‘tired’ are pronounced as [vaχ] ‘time’ and [χasse] ‘tired’, respectively. Affricates, as will be seen, should be regarded as a sequence of [–cont] and [+cont]. The only Persian [lateral] is alveolar. The frequent allophone of the liquid /ɹ/ is approximant- or spirant-like (Bijankhan 2014). However, it becomes a tap [ɾ] in the intervocalic position.
Some scholars consider the glides and liquids as [+approximant] according to their distribution in the syllable structure (Kenstowicz 1994; Gussenhoven and Jacobs 2011), though no evidence for this conclusion exists in Persian. In some languages, like English, sonorants occur only before obstruents in the coda clusters, whereas Persian sonorants, like other segments, occur before and after obstruents in the coda clusters. Tables 5.1 and 5.2 contain the feature specifications of twenty-​three consonants and six vowels in Persian, respectively. In the tables, phonemes are specified for features that are responsible for phonological distinctiveness and natural classes. If a feature specification does not help to differentiate one phoneme from another, no value for that feature is specified. For example, vowels are not marked for [lab] and [cor], simply because vowels are dorsal. Since the vowel distinctive features [back], [high], and [low] and their combination cannot specify vowels in terms of the length, the feature [long] can be specified for the classification of vowels into two natural classes, i.e. /​i, u, ɑ/​as [+long] and /​e, o, a/​ as [-​long]. The value of [voice] is specified for all stops and affricates because they form a natural class with their fricative counterparts. /​ʧ/​and /​ʤ/​are represented by a sequence of the values of minus and plus for the feature [cont]. Membership of laryngeals to the [+son] or [–​son] class is controversial in various languages (Gussenhoven and Jacobs 2011). Chomsky and Halle (1968) classify laryngeals as [+son]. Since laryngeals and /​j/​ form a natural class in Persian, specified as [–​cons] (see section 5.8), they are unspecified for [son].

Table 5.1 Feature chart of Persian 23 consonants. Unary features are marked by ✓

[The feature matrix is not legible in this copy. Column heads that survive: pʰ, b, tʰ, d, ɢ, ʧʰ, ʤ, f, v, s, z, ʃ, ʒ, χ, m, n, ɹ, j. Row labels that survive: cg, lab, dist, dors.]
Table 5.2 Feature chart of Persian six vowels

[The table body is not legible in this copy.]
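As an illustration of how the vowel features named in this section can separate the six vowels, the following sketch derives one possible specification from the front/back and height arrangement in (4) and from the [long] classification given in the text. It is an assumption-laden reconstruction for illustration, not a reproduction of Table 5.2, and the names `VOWEL_FEATURES` and `vowels_with` are hypothetical.

```python
# Assumed values: [back] and height from the grid in (4); [long] from the
# statement that /i, u, ɑ/ are [+long] and /e, o, a/ are [-long].
VOWEL_FEATURES = {
    "i": {"back": False, "high": True, "low": False, "long": True},
    "e": {"back": False, "high": False, "low": False, "long": False},
    "a": {"back": False, "high": False, "low": True, "long": False},
    "u": {"back": True, "high": True, "low": False, "long": True},
    "o": {"back": True, "high": False, "low": False, "long": False},
    "ɑ": {"back": True, "high": False, "low": True, "long": True},
}


def vowels_with(**spec):
    """Return the vowels matching every feature value in spec, in chart order."""
    return [
        v
        for v, feats in VOWEL_FEATURES.items()
        if all(feats[f] == val for f, val in spec.items())
    ]


long_vowels = vowels_with(long=True)
back_mid = vowels_with(back=True, high=False, low=False)
```

Under these assumed values [back], [high], and [low] already give each vowel a unique combination, so [long] adds no new contrast; it only groups the vowels into the two length classes, as the text notes.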



























5.5 Phonological rules

Phonological rules map the underlying form (UF) of morphemes onto either the intermediate representation or the surface form (SF) of the morphemes. One of their functions is to provide evidence for distinctive features and natural classes. In this section, first a descriptive generalization characterizing regularities for the data under review is provided for each phonological pattern. Then, to account for generalizations, phonological rules are formulated. Steps of derivation are not, however, shown for the data.

5.5.1 Deaspiration Voiceless stops lose their aspiration after voiceless fricatives or before obstruents. (9)

chaɹ ɟ̥aɹ thaɹ d̥aɹ

Initial ‘deaf ’ ‘bald’ ‘wet’ ‘door’

Medial ʃechaɹ ‘sugar’ Ɂaɟaɹ ‘if ’ χathaɹ ‘danger’ hadaɹ ‘waste’

Final χɑch ‘soil’ saɟ̥ ‘dog’ χath ‘line’ had̥ ‘limit’

-​-​-​C or C-​-​-​ zechɹ̥ ‘mention’ Ɂascaɹ ‘a name’ Ɂacsaɹ ‘most’ sathl̥ ‘bucket‘

Voiceless stops are aspirated in all positions except after voiceless fricatives and before obstruents. They are optionally unreleased before nasals. Unaspirated stops are fully voiced only in voiced environments; in all other positions they are voiceless or partially voiced. Aspiration is thus more distinctive than voicing in contrasting homorganic stops, and the relevant feature for differentiating them should be [sg]. The phonological rule in (10) can be postulated when [sg] is distinctive.

(10) [–cont, +sg] → [–sg] / { [–son, +cont] ___ , ___ [–son] }

Rule (10) implies that phonological distinction between aspirated and unaspirated stops is neutralized in contexts in which deaspiration occurs.
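The effect of rule (10) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each segment is a separate string, aspiration is written with a trailing "ʰ" (the running text prints it as a plain h), and the segment sets below are a simplified stand-in for the feature chart rather than the author's own inventory.

```python
# Assumed, simplified segment classes (not taken from Table 5.1).
VOICELESS_FRICATIVES = {"f", "s", "ʃ", "χ"}
OBSTRUENTS = VOICELESS_FRICATIVES | {
    "v", "z", "ʒ", "p", "b", "t", "d", "c", "ɟ", "ɢ", "ʧ", "ʤ",
}


def deaspirate(segments):
    """Remove aspiration after a voiceless fricative or before an obstruent."""
    result = []
    for i, seg in enumerate(segments):
        if seg.endswith("ʰ"):
            prev_seg = segments[i - 1] if i > 0 else None
            next_seg = segments[i + 1] if i + 1 < len(segments) else None
            after_fricative = prev_seg in VOICELESS_FRICATIVES
            before_obstruent = (
                next_seg is not None and next_seg.rstrip("ʰ") in OBSTRUENTS
            )
            if after_fricative or before_obstruent:
                seg = seg[:-1]  # [+sg] becomes [-sg]
        result.append(seg)
    return result


if __name__ == "__main__":
    # 'a name': aspiration is lost after the voiceless fricative [s].
    print("".join(deaspirate(["ʔ", "a", "s", "cʰ", "a", "ɹ"])))
    # 'deaf': word-initial aspiration is kept.
    print("".join(deaspirate(["cʰ", "a", "ɹ"])))
```

Because both contexts map to the same unaspirated output, the function makes the neutralization visible: an underlyingly aspirated and an underlyingly plain stop come out identical in these positions.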

5.5.2 Devoicing

Two kinds of devoicing are accounted for:

(i) Obstruents become voiceless in word-final position and before or after a voiceless consonant.

(11)
UF             SF            Gloss
juz-phalanɟ    juspalaɲɟ̥     ‘panther’
phitzɑ         phitz̥ɑ        ‘pizza’
Ɂathvɑɹ        Ɂatv̥ɑɹ        ‘manners’
ʧhub           ʧhub̥          ‘wood’
ʧhub-i         ʧhubi         ‘wooden’
ʧhub-ʃuɹ       ʧhub̥ʃuɹ       ‘pretzels’

The rule in (12) formalizes the devoicing process for Persian obstruents. Stops and affricates are also voiceless or devoiced in word-initial position.

(12) [–son] → [–voice] / { ___ #, ___ [–voice], [–voice] ___ }


(ii) Sonorants lose their voicing after aspirated stops and become spirant-like.

(13)
UF            SF           Gloss
sathl         sathl̥        ‘bucket’
machɹ         machɹ̥        ‘deception’
Ɂemthijɑz     Ɂemthjɑz̥


Rule (14) formalizes the devoicing process for Persian sonorants.

(14) [+son] → [–voice] / [–cont, +sg] ___
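The two devoicing patterns described for rules (12) and (14) can be sketched together. The segment classes below are assumptions made for the illustration, aspiration is written with a trailing "ʰ", and the function only reports which positions devoice instead of adding diacritics. Word-initial devoicing of stops and affricates is not modelled.

```python
# Assumed, simplified segment classes.
VOICED_OBSTRUENTS = {"b", "d", "ɟ", "ɢ", "ʤ", "v", "z", "ʒ"}
VOICELESS_CONSONANTS = {"p", "t", "c", "ʧ", "f", "s", "ʃ", "χ"}
SONORANT_CONSONANTS = {"m", "n", "l", "ɹ", "j"}


def is_voiceless(seg):
    """True for a voiceless consonant, aspirated or not."""
    return seg is not None and seg.rstrip("ʰ") in VOICELESS_CONSONANTS


def devoiced_positions(segments):
    """Return the indices of segments that surface devoiced."""
    hits = []
    for i, seg in enumerate(segments):
        prev_seg = segments[i - 1] if i > 0 else None
        next_seg = segments[i + 1] if i + 1 < len(segments) else None
        if seg in VOICED_OBSTRUENTS:
            # Obstruents: word-finally, or next to a voiceless consonant.
            word_final = next_seg is None
            if word_final or is_voiceless(prev_seg) or is_voiceless(next_seg):
                hits.append(i)
        elif seg in SONORANT_CONSONANTS:
            # Sonorants: only after an aspirated stop.
            if prev_seg is not None and prev_seg.endswith("ʰ"):
                hits.append(i)
    return hits


if __name__ == "__main__":
    print(devoiced_positions(["ʧʰ", "u", "b"]))        # 'wood'
    print(devoiced_positions(["ʧʰ", "u", "b", "i"]))   # 'wooden'
    print(devoiced_positions(["s", "a", "tʰ", "l"]))   # 'bucket'
```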



5.5.3 Degemination

Word-final geminates are reduced to singletons in isolation and before consonants.

(15)
Root            Before vowel            Before consonant
χath ‘line’     χat.th-i ‘linear’       χat-cheʃ ‘ruler’
doɹ ‘pearl’     doɹ.ɹ-i ‘pearl-like’    doɹ-dɑne ‘dear’
hes ‘sense’     hes.s-i ‘sensorial’     hes-ɟaɹ ‘sensor’

-i is an adjectival or nominal suffix. The alternation between a final single consonant and a doubled consonant must be considered in postulating the UF. The allomorphs in the root and before-consonant columns of (15) end in a single consonant, whereas the allomorphs before a vowel-initial suffix show consonant doubling. Let us assume that the root represents the UF. This might seem a reasonable assumption, because a single consonant occurs in more phonological contexts than a doubled one. If so, a rule that doubles final consonants before a vowel generates the forms in the second column. However, no such doubling takes place in a large number of Persian words. Therefore, the root with a final geminate is the better candidate for the UF, and a rule that reduces geminates generates the surface root and the third-column forms in (15).

Moreover, the conversion of a doubled consonant into a singleton suggests that the structural position of a phoneme, represented non-linearly by the X tier in (16), can be independent of the phoneme itself. In other words, X is a phonological position that represents the quantity of the consonant, entirely separate from its featural content (Kenstowicz 1994: 424). Delinking of the association line represents degemination. The final consonant of the geminate is not moraic, and its omission does not trigger lengthening.

(16) [Diagram: a consonant melody specified for [son] and [cons] is linked to the X tier; the association line to the final X slot is delinked before # or C.]
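The analysis that takes the geminate as underlying can be sketched as follows. The sketch assumes that a root-final geminate is stored as two identical segments (so it covers roots like the ones for ‘sense’ and ‘pearl’, not geminates whose two halves differ in aspiration), and the vowel set is a simplification.

```python
VOWELS = {"i", "e", "a", "u", "o", "ɑ"}


def degeminate(root, suffix=()):
    """Attach a suffix to a root, reducing a root-final geminate
    unless a vowel-initial suffix follows."""
    root = list(root)
    ends_in_geminate = len(root) >= 2 and root[-1] == root[-2]
    before_vowel = len(suffix) > 0 and suffix[0] in VOWELS
    if ends_in_geminate and not before_vowel:
        root = root[:-1]  # one timing slot is delinked
    return root + list(suffix)


if __name__ == "__main__":
    uf = ["h", "e", "s", "s"]                           # UF of 'sense'
    print("".join(degeminate(uf)))                      # isolation form
    print("".join(degeminate(uf, ["i"])))               # before a vowel
    print("".join(degeminate(uf, ["ɟ", "a", "ɹ"])))     # before a consonant
```

A single rule with one exception (a following vowel) covers both the isolation form and the pre-consonantal form, which is the economy argument made in the text for the geminate UF.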


5.5.4 Nasal place assimilation (NPA)

An anterior nasal assimilates in place to the following consonant. NPA suggests that [place], unlike the manner features, is a group feature. In Persian, an anterior nasal takes on the place of articulation of the following consonant, whatever that place is, but remains unchanged before laryngeals (17). This may further suggest that laryngeals need not be specified for place or manner features.

(17)
UF            SF            Gloss
ʤavɑnmaɹd     ʤavummaɹd     ‘chivalrous’
phanbe        phambe        ‘cotton’
senf          seɱf          ‘union’
dandɑn        d̪an̪d̪ɑn        ‘tooth’
Ɂensɑn        Ɂensɑn        ‘human’
ɹanɟ          ɹaɲɟ          ‘colour’
ʧhanɟɑl       ʧhaŋɡɑl       ‘fork’
menɢɑr        meɴɢɑr        ‘beak’
thanhɑ        thanhɑ        ‘alone’
sanɁath       sanɁath       ‘industry’

NPA suggests that the three oral active articulators, i.e. the lips, the tip or blade of the tongue, and the dorsum, act independently of each other. In feature geometry, one unary feature is proposed for each active articulator: [lab] for bilabials and labio-dentals, [cor] for dentalveolars, and [dors] for palatals, velars, and uvulars. All three are dependent on the group feature [place] (18). Since the coronality of a nasal consonant is overridden by whatever place the following oral consonant bears, [place] behaves as an autosegment. Thus the leftward spreading of [place] to a preceding anterior nasal, together with the delinking of the nasal's own [place], accounts for Persian NPA.

(18) [Feature-geometry diagram with the nodes Soft Palate and Tongue Blade [+ant]: the [place] node of the following consonant spreads leftward to the anterior nasal, and the nasal's own [place] is delinked.]
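The spreading analysis can be approximated by a lookup: the nasal copies the place of the next segment if that segment has one. The place labels and the nasal symbols below are assumptions chosen for the sketch; they are finer-grained than the three articulator features [lab], [cor], and [dors] used in the text. Laryngeals and vowels have no entry in the place table, so the nasal is left unchanged before them.

```python
# Assumed place-of-articulation table (illustrative, not exhaustive).
PLACE = {
    "p": "labial", "b": "labial", "m": "labial",
    "f": "labiodental", "v": "labiodental",
    "t": "dental", "d": "dental",
    "s": "alveolar", "z": "alveolar",
    "c": "palatal", "ɟ": "palatal",
    "k": "velar", "ɡ": "velar",
    "ɢ": "uvular",
}
NASAL_AT = {
    "labial": "m", "labiodental": "ɱ", "dental": "n\u032a",
    "alveolar": "n", "palatal": "ɲ", "velar": "ŋ", "uvular": "ɴ",
}


def assimilate_nasals(segments):
    """Spread the place of a following consonant onto a preceding /n/."""
    out = list(segments)
    for i in range(len(out) - 1):
        if out[i] == "n":
            place = PLACE.get(out[i + 1].rstrip("ʰ"))
            if place is not None:  # no place node: nothing spreads
                out[i] = NASAL_AT[place]
    return out


if __name__ == "__main__":
    print("".join(assimilate_nasals(["pʰ", "a", "n", "b", "e"])))   # 'cotton'
    print("".join(assimilate_nasals(["tʰ", "a", "n", "h", "ɑ"])))   # 'alone'
```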


5.5.5 Dorsal place assimilation (DPA)

The palatal stops assimilate in place to a following velar.

(19)
UF          SF          Gloss
ɟul         ɡul         ‘deception’
ɟol         ɡol         ‘flower’
chɑl        khɑl        ‘unripe’
sanɟchɑɹ    saŋkhɑɹ     ‘stoner’

(20) [dors, –cont] → [+back] / [σ ___ [+back]  or  ___ [–son, –cont, +back, +high]

The velar trigger in rule (20) can be either a back vowel or a velar stop.

5.5.6 Coronal assimilation

Apico-dentalveolar stops assimilate in place to a following coronal. Place assimilation of the non-continuants /t/, /d/, and /n/ to the place of the coronals /s/, /z/, /ʃ/, /ʒ/, /ɹ/, and /l/ suggests the binary features [cor] and [dist].

(21)
Context        /t/-words             /d/-words           /n/-words
___ /t, d/     hat̪t̪hɑ ‘even’         mod̪d̪at̪h ‘time’      χan̪d̪e ‘laugh’
___ /s, z/     Ɂatse ‘sneezing’      hads ‘guess’        t̪hanz ‘irony’
___ /ʃ, ʒ/     Ɂat̠ʃɑn ‘parched’      χad̠ʃe ‘flaw’        Ɂeṉʃɑ ‘essay’
___ /l, ɹ/     sathɹ̥ ‘row’           Ɂadl ‘justice’      fanlɑnd̪ ‘Finland’

The elsewhere allophones of /t/ and /d/ in Persian are dental. [t̪] and [d̪] are [+dist], just like the interdentals of English, because their constriction is more extended than that of /s, z, l, ɹ/. /n/ assimilates in place to /t, d/ and becomes apico-dental, i.e. [+dist, +ant]. /t, d, n/ assimilate in place to /s, z, l, ɹ/ and become apico-alveolar, i.e. [–dist, +ant]. However, rule application for /n/ is vacuous (Hayes 2009). /t, d, n/ also assimilate in place to /ʃ, ʒ/ and become lamino-post-alveolar, i.e. [+dist, –ant]. The diacritic ‘ ̠ ’ marks lamino-post-alveolar place. Rules (22) and (23) formalize the process linearly and non-linearly, respectively.

(22) [–cont, +ant] → [–αdist, αant] / ___ [–αdist, αant]

(23) [Non-linear diagram: the Blade node of the following coronal spreads leftward to the preceding [–cont] segment, whose own Blade [+ant] specification is delinked.]

5.5.7 Compensatory lengthening (CL)

Two kinds of CL are accounted for:

(i) Deletion or shortening of glottal consonants in the coda is compensated for by lengthening of the preceding vowel.

(24)
Formal      Colloquial    Gloss
ʃamɁ        ʃaːm          ‘candle’
manɁ        maːn          ‘prohibition’
thaɁmiɹ     thaːmiɹ       ‘repair’
ʃeɁɹ        ʃeːɹ          ‘poetry’
ʃahɹ        ʃaːɹ          ‘city’
sobh        soːb          ‘morning’
behthaɹ     beːthaɹ       ‘better’

The colloquial data in (24) show that segmental duration can be independent of the segments themselves. When the glottal consonant in the coda is deleted, the overall length remains constant through lengthening of the preceding vowel. This process does not occur if a glottal consonant in the onset is deleted. Since the CV tier treats onsets and codas alike, it is not suitable for explaining CL. Hayes (1989) proposed the mora (μ) tier as an intermediate level between segments and syllables to capture the asymmetry between the onset and the coda in CL. Onset consonants are attached directly to the syllable node, whereas, according to the rule of weight-by-position (Hayes 1989), a mora is assigned to each consonant in coda position. Hayes assigned moraic status to those consonants whose deletion triggers vowel lengthening. Vowels are always moraic. Darzi (1991) explained Persian CL in terms of the moraic phonology of Hayes (1989) and concluded that glottals in the coda are moraic. Shademan (2005) shows experimentally that deletion of the glottal consonant does not always result in lengthening of the vowel.

(ii) Lenition and then deletion of /v/ in the first coda position is compensated for by lengthening of the preceding vowel /o/ (25).

(25)
UF         SF               Gloss
dovɹe      dowɾe/doːɾe      ‘cycle’
nov        now/noː          ‘new’
novɁ       nowɁ/noː         ‘type’
fovt       fowt/foːt        ‘death’
movɹed     mowɾed/moːɾed    ‘case’
ɢovm       ɢowm/ɢoːm        ‘folk’

A question raised here is how the variation between [ow] and [oː] should be interpreted in order to arrive at a reasonable UF. If /oː/ underlies the variation, a rule is required to shorten the vowel and insert [w]. If /ow/ is taken as a diphthong, another rule is required to replace the glide portion with [o]. However, the lack of a CVCC in Persian cannot be justified if we accept /oː/ or /ow/ as a nucleus (Haghshenas 1990). If [ow] is taken as a sequence of a vowel and a consonant, then [w] could be regarded either as a phoneme or as an allophone of some phoneme. [w] is not distinctive in Persian, because it has a very restricted distribution. Since [w] does not appear initially and is conditioned by the nucleus /o/, and since [v] does not occupy the position of the first consonant after /o/, [v] and [w] are in complementary distribution. In that case, /v/ is underlying and [w] is an allophone (Yarmohammadi 1995; Samareh 1999). Further evidence for /v/ as underlying is that [v] is pronounced in the derived or inflected forms in (26), while the formal pronunciation of their roots ends in [w] (Samareh 2001; Kord-Zafaranlu-Kambuziya 2007).

(26)
Root                     Derived/inflected form
now ‘new’                nov-in ‘newest’
χosɹow ‘proper noun’     χosɹav-i ‘proper noun’
ɹow ‘go’                 ɹav-ɑn ‘fluid’

To account for the data in (25), a lenition rule converts /v/ to [w] (27a), and the delinking of [w] is then accompanied by the spreading of /o/ to the stranded mora (27b).

(27) a. [+cons, –son, +cont, +voice, lab] → [+son, dors, +back] / [+back, –high, –low] ___
     b. [Moraic diagram: [w] is delinked from its mora, and the preceding vowel [+back, –high, –low] spreads to the stranded mora.]



Data in (24) and (25) show that the subset {[w], /Ɂ/, /h/} forms a moraic natural class, because deletion of its members in the coda triggers vowel lengthening.
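The onset/coda asymmetry behind compensatory lengthening can be sketched with a syllable split into onset, vowel, and coda. The representation is an assumption made for the illustration: only coda members of the moraic class are deleted, and the freed mora is modelled simply by adding a length mark to the vowel. The sketch treats the process as obligatory, although the text notes that lengthening does not always accompany deletion.

```python
MORAIC_CODA = {"ʔ", "h", "w"}  # the moraic natural class named in the text


def compensatory_lengthening(onset, vowel, coda):
    """Delete the first moraic coda consonant and lengthen the vowel.

    Onset consonants are never inspected, so deleting them could not
    trigger lengthening in this model.
    """
    for i, consonant in enumerate(coda):
        if consonant in MORAIC_CODA:
            new_coda = coda[:i] + coda[i + 1:]
            return onset, vowel + "ː", new_coda
    return onset, vowel, coda


if __name__ == "__main__":
    print(compensatory_lengthening(["ʃ"], "a", ["m", "ʔ"]))  # 'candle'
    print(compensatory_lengthening(["f"], "o", ["w", "t"]))  # 'death'
    print(compensatory_lengthening(["h"], "a", ["d"]))       # 'limit'
```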

5.5.8 Epenthesis

Two kinds of epenthesis are accounted for: glide epenthesis and short vowel epenthesis.

(i) Glides are inserted to resolve hiatus at morpheme boundaries.

Phonetically, the glides /j/ and [w] belong to the [–cons] class, while phonologically they function as consonants because they occur between two vowels to avoid hiatus. The colloquial speech data in (28) consist of nouns suffixed with the plural marker -hɑ. When a noun ends in a consonant, /h/ is deleted and the final consonant of the noun is syllabified with /ɑ/. However, [j] or [w] is inserted when a noun ends in /i/ or /u/, respectively. [h] survives when a noun ends in any other vowel.

(28)
Formal         Colloquial     Gloss
chethɑb-hɑ     chethɑb-ɑ      ‘books’
faɹʃ-hɑ        faɹʃ-ɑ         ‘carpets’
bɑzi-hɑ        bɑ.zi-jɑ       ‘plays’
ʤɑɾu-hɑ        ʤɑ.ɾu-wa       ‘brooms’
sɑje-hɑ        sɑje-hɑ        ‘shadows’
sedɑ-hɑ        sedɑ-hɑ        ‘sounds’

In the CV phonology framework of Clements and Keyser (1983), glide insertion is interpreted as the spreading of the features of /i/ or /u/ to the C-position, which yields epenthetic [j] or [w] (29). The phonetic content of the glides is similar to that of the corresponding vowels (Catford 2003).

(29) [CV-tier diagram: the features of the stem-final vowel /i/ or /u/ (e.g. [+back]) spread to the C-position of the suffix-initial /h/ ([+sg]), yielding [j] or [w].]


(30)
PROG-VERB=AGR    Formal SF                Colloquial SF    Gloss
mi-ɑ=am          mi-Ɂɑ=jam                mi-jɑ=m          ‘I am coming’
mi-ɑ=i           mi-Ɂɑ=ji/mi-Ɂɑ=Ɂi        mi-jɑ=j          ‘You are coming’
mi-ɑ=ad          mi-Ɂɑ=jad                mi-jɑ=d          ‘He/She is coming’
mi-ɑ=im          mi-Ɂɑ=jim/mi-Ɂɑ=Ɂim      mi-jɑ=jm         ‘We are coming’
mi-ɑ=id          mi-Ɂɑ=jid/mi-Ɂɑ=Ɂid      mi-jɑ=jd         ‘You are coming’
mi-ɑ=and         mi-Ɂɑ=jand               mi-jɑ=n          ‘They are coming’

From the formal data in (30), which consist of verbal roots prefixed with the progressive morpheme mi- and suffixed with the agreement endings (agr), it is clear that [Ɂ] is inserted between the prefix and the verbal root, and [j] or [Ɂ] is inserted between the verbal root and the agreement endings to resolve hiatus. ‘=’ marks enclitics.

(31) Ø → [+cg] / ]morph1 ___ [verbal root
     Condition: morph1 is the progressive morpheme /mi-/.

(32) Ø → [–cons, –sg] / ]morph1 ___ [agr
     Condition: morph1 is a verbal root.

Therefore, the data in (28) and (30) suggest that laryngeals and glides form a natural class, because they function alike in occupying the floating C-position to prevent adjacent vowels.

(ii) Short vowels are inserted to break up clusters between the stem and the suffix.

(33)
UF (stem-suffix)    SF (phonological word)    Gloss
phɑs-bɑn            phɑ.s-e.-bɑn              ‘cop’
phaɹvaɹd-ɟɑɹ        phaɹ.vaɹ.d-e.-ɡɑɹ         ‘Lord’
mehɹ-bɑn            meh.ɹ-a.-bɑn              ‘kind’
aɹʤ-mand            Ɂaɹ.ʤ-o.-mand             ‘venerable’

There are lexical exceptions that are not subject to the above rule. /​e/​is the most frequent short vowel in epenthesis. No rule is formalized because the inserted short vowel cannot be predicted phonologically.
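The predictable part of epenthesis, the colloquial plural pattern in (28), can be sketched as a three-way choice on the final segment of the noun. The function name and the segment lists are assumptions made for the illustration. Short vowel epenthesis as in (33) is deliberately left out, since the text states that the inserted vowel cannot be predicted phonologically.

```python
VOWELS = {"i", "e", "a", "u", "o", "ɑ"}
GLIDE_FOR = {"i": "j", "u": "w"}  # each high vowel and its matching glide


def colloquial_plural(noun):
    """Return the noun followed by the colloquial form of plural -hɑ."""
    last = noun[-1]
    if last in GLIDE_FOR:
        # Hiatus after /i/ or /u/ is resolved by the matching glide.
        return noun + [GLIDE_FOR[last], "ɑ"]
    if last in VOWELS:
        # After other vowels, [h] survives.
        return noun + ["h", "ɑ"]
    # After a consonant, /h/ is deleted.
    return noun + ["ɑ"]


if __name__ == "__main__":
    print("".join(colloquial_plural(["f", "a", "ɹ", "ʃ"])))   # 'carpets'
    print("".join(colloquial_plural(["b", "ɑ", "z", "i"])))   # 'plays'
    print("".join(colloquial_plural(["s", "e", "d", "ɑ"])))   # 'sounds'
```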

5.5.9 Deletion

Post-fricative and post-nasal anterior stops are deleted at the end of a word in colloquial speech.

(34)
UF          SF        Gloss           UF          SF        Gloss
ɹafth       ɹaf       ‘went’          choʃth      khoʃ      ‘killed’
ɟeɹefth     ɟeɾef     ‘received’      ɟozɑʃth     ɡozɑʃ     ‘put’
dasth       das       ‘hand’          dusth       dus       ‘friend’
ɡuʃth       ɡuʃ       ‘meat’          doɹosth     doɾos     ‘right’
mozd        moz       ‘wage’          saχth       saχ       ‘hard’
ʧhand       ʧhan̪      ‘several’       boland      bolan̪     ‘tall’

Words that are infrequent in colloquial speech do not undergo this deletion. Rule (35) formalizes the process.

(35) [–son, –cont, cor, +ant] → Ø / [αson, –αcont] ___ #



5.5.10 Spirantization

Oral stops spirantize as a function of their place of articulation.

(36)
Formal        Colloquial    Gloss
ɑɢɑ           Ɂɑʁɑ          ‘Mr.’
maɢɑze        maʁɑze        ‘store’
noɢthe        noχte         ‘point’
ɁeʤtemɑɁ      Ɂeʃtemɑ       ‘community’
veʤdɑn        voʒdɑn        ‘conscience’
maʤnun        maʒnun        ‘mad’
χaste         χasse         ‘tired’
nazdich       nazzic        ‘near’

In colloquial speech, Persian oral stops tend to become fricatives while retaining their place of articulation. Dentalveolar stops tend to spirantize after alveolar fricatives (37), affricates spirantize before dentalveolar stops (38), and uvular stops tend to spirantize intervocalically and before dentalveolars (39).

(37) [–son, –cont, cor, +ant] → [+cont] / [–son, +cont, cor, +ant] ___

(38) [Rule not legible: an affricate loses its [–cont] portion before a dentalveolar stop.]

(39) [–cont, dors, +back, –high] → [+cont] / V ___ {V, [cor]}

Additionally, there are some free variations differing in /​p/​vs. /​f/​. For example, /​pɑɹsi/​vs. /​ fɑɹsi/​, ‘Persian’, and /​sephid/​vs. /​sefid/​, ‘white’, among others. Therefore, a binary distinction within obstruents between stops and fricatives results in the feature of continuancy. There are disagreements among scholars about how to treat Persian uvulars in terms of the stop-​fricative dimension, i.e. whether /​ɢ/​is distinctive, or /​ɣ/​or /​ʁ/​. There is no doubt that word-​initial uvulars are realized as stops. Given the data in (36), if either /​ɣ/​or /​ʁ /​were distinctive, then an implausible rule would be needed to convert either one to the stop [ɢ] in the word-​initial position. By accepting /​ɢ/​as distinctive, spirantization in postvocalic and intervocalic contexts would be cross-​linguistically reasonable (Kenstowicz 1994). A convincing argument as to why affricates should be regarded as a sequence of [–​cont] and [+cont] can be provided by the lenition of affricates in the postvocalic position before coronals, resulting in the loss of the [–​cont] portion (see Kenstowicz 1994: 32 for English), which means that the contrast between homorganic affricates and fricatives is neutralized. It is reasonable to take the formal forms of the data in (36) as the UF for colloquial forms.

5.5.11 Vowel harmony (VH)

Two kinds of VH are accounted for: within-morpheme and between-morpheme (Modarresi Ghavami 2011). This dichotomy is explained here in terms of vowel length where necessary (see Chapters 13 and 15 for more on vowel harmony).

Within-morpheme VH

Vowels in open syllables harmonize with following vowels under specified conditions.

(i) Short vowels in open syllables harmonize with a following low back vowel when the intervening consonant is glottal.

(40)
UF            SF            Gloss
bahɑɹ         bɑhɑɹ         ‘spring’
mohɑcheme     mɑhɑcheme     ‘trial’
Ɂemthehɑn     Ɂemthɑhɑn     ‘exam’
saɁɑdath      sɑɁɑdath      ‘bliss’
soɁɑl         sɑɁɑl         ‘question’
ɁeɹtheɁɑʃ     ɁeɹthɑɁɑʃ     ‘vibration’

The data in (40) provide evidence for backness and height harmony in a large number of Persian words. The mora tier is used to represent short vowels as monomoraic, i.e. linked to one µ slot, and /ɑ/ as bimoraic, i.e. linked to two µ slots. VH can then be characterized as the leftward spreading of the place node to the harmonizing vowel, together with the delinking of that vowel's original place. Since glottals are unspecified for place of articulation, they are transparent to the spreading (41).

(41) [Moraic diagram: the Place node (Body [+back]) of bimoraic /ɑ/ spreads leftward across the glottal to the monomoraic vowel, whose own Place (Body) node is delinked.]


(ii) Mid vowels in open syllables are raised to the corresponding high vowels before a following high vowel.

(42)
UF          SF         Gloss
doɹud       duɾud      ‘greeting’
sophuɹ      suphuɹ     ‘sweeper’
seɹiʃom     siɾiʃ      ‘isinglass’
cheʃich     chiʃich    ‘watch’
belith      bilith     ‘ticket’

The data in (42) provide evidence for height harmony in a large number of Persian words.

(43) [+son, –high, –low] → [+high] / ___ C [+son, +high]

Between-morpheme VH

Front vowels in the open syllable of a functional morpheme either convert to the following high vowel or are raised in height by one degree.

(44)
IMP-PRES_STEM    SF          Gloss
be-ɟiɹ           bi-ɟiɹ      ‘get’
be-ɟu            bu-ɡu       ‘say’
be-χoɹ           bo-χoɹ      ‘eat’
be-do            bo-do       ‘run’

NEG-IMPERF-PRES_STEM-PERSONAL    SF             Gloss
na-mi-ɟiɹ=ad                     ne-mi-ɟiɾ=e    ‘He/She does not get’
na-mi-ɹiz=ad                     ne-mi-ɾiz=e    ‘He/She does not pour’
na-mi-χoɹ=ad                     ne-mi-χoɾ=e    ‘He/She does not eat’

The data in (44) include the UF of imperative/subjunctive verbs with their surface representation in colloquial pronunciation, and the UF of negative imperfective verbs with their surface representation in formal pronunciation. The rules in (45) and (46) formalize the agreement of the /e/ of the imperative/subjunctive morpheme in height and backness with the vowels /i/, /o/, or /u/. The rule in (47) formalizes the partial agreement of the /a/ of the negative morpheme, which is raised by one degree before the vowel /i/ of the imperfective.

(45) [+son, –back, –high, –low] → [+high] / ___ ]morph C [–back, +high]
     Condition: Morph is imperative.

(46) [+son, –back, –high, –low] → [+back, αhigh] / ___ ]morph C [+back, –low, αhigh]
     Condition: Morph is imperative and C must not be labial.

(47) [+son, –back, +low] → [–high, –low] / ___ ]morph [+nasal, lab] [–back, +high]
     Condition: Morph is a negative marker.
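The imperative pattern in (44) can be sketched as a choice of prefix vowel conditioned by the first vowel of the stem. This is a reading of the data and of the stated conditions, not a transcription of the feature rules: the function assumes a consonant-initial stem whose second segment is its vowel, and the labial set is a simplification.

```python
LABIALS = {"p", "b", "m", "f", "v"}


def imperative_prefix(stem):
    """Return the surface shape of imperative be- before a CV-initial stem."""
    consonant, vowel = stem[0], stem[1]
    if vowel == "i":
        return "bi"  # raising before a high front vowel
    if vowel in {"u", "o"} and consonant.rstrip("ʰ") not in LABIALS:
        return "b" + vowel  # agreement in backness and height
    return "be"  # no harmony: the underlying vowel surfaces


if __name__ == "__main__":
    for stem in (["ɟ", "i", "ɹ"], ["ɟ", "u"], ["χ", "o", "ɹ"], ["d", "o"]):
        print(imperative_prefix(stem) + "-" + "".join(stem))
```

The labial check reflects the condition stated for rule (46) that C must not be labial: a stem such as the hypothetical sequence b-o-ɹ would keep be-.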

5.6 Rule interaction

Rules apply in order to derive the allophones from the UF in their specified contexts. Rule interaction results from the relationship between the structural descriptions of any two rules and the phonotactics of the data under analysis. Many consonant sequences, both within and across morphemes, require ordered rules to derive the desired SF. Typical examples are given in (48).

(48)
               ‘several’    ‘multi-functional’
               ʧhand        ʧhand-chɑɹe           : UF
First cycle    ʧhan̪d̪        ʧhan̪d̪-                : NPA (18)
               ʧhan̪         ʧhan̪-                 : Deletion (35)
Second cycle                ʧhan̪-chɑɹe
                            ʧhan̪-khɑɾe            : DPA (20)
                            ʧhaŋkhɑɾe             : NPA (18)

Regardless of the cycles, the ordering relation among the rules is given in (49).

(49) DPA >> NPA >> Deletion

NPA both precedes and follows word-final anterior stop deletion. To solve this problem, the ordering relation should be defined in two separate cycles: the first at the word level and the second at the compound (or phrasal) level. NPA and deletion are in a counterbleeding relation. DPA feeds NPA.

(50)
‘truthful’    ‘society’
rɑsth-ɟu      ɁeʤthemɑɁ     : UF
rɑsth-ɡu      -             : DPA (20)
rɑs-ɡu        -             : Deletion (35)
rɑsk-u        ɁeʧthemɑɁ     : Devoicing (12)
              ɁeʃthemɑɁ     : Spirantization (38)
              ɁeʃtemɑɁ      : Deaspiration (10)

The ordering relation among the rules for the data in (50) is given in (51).

(51) DPA >> Deletion >> Devoicing >> Spirantization >> Deaspiration

The ordering of spirantization and deaspiration can be optional if affricates are taken as a sequence of [–cont] and [+cont], because in that case the [+cont] portion of the affricate, i.e. the voiceless fricative [ʃ], precedes the aspirated stop. The sequence therefore satisfies the structural description of deaspiration.
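The two-cycle derivation of ‘multi-functional’ in (48) can be replayed with three small functions. Everything here is a simplified stand-in for the rules of the text: aspiration is written with "ʰ", the place table covers only the segments needed, and intervocalic tapping of /ɹ/ is not modelled, so the output keeps ɹ.

```python
PLACE = {"p": "lab", "b": "lab", "m": "lab", "t": "dent", "d": "dent",
         "c": "pal", "ɟ": "pal", "k": "vel", "ɡ": "vel"}
NASAL_AT = {"lab": "m", "dent": "n\u032a", "pal": "ɲ", "vel": "ŋ"}
ANTERIOR_NASALS = {"n", "n\u032a"}
FRICATIVES = {"f", "v", "s", "z", "ʃ", "ʒ", "χ"}
BACK_VOWELS = {"u", "o", "ɑ"}
VELAR_FOR = {"c": "k", "ɟ": "ɡ"}


def base(seg):
    """Strip the aspiration mark from a segment."""
    return seg.rstrip("ʰ")


def npa(word):
    """Nasal place assimilation: an anterior nasal copies the next place."""
    out = list(word)
    for i in range(len(out) - 1):
        if out[i] in ANTERIOR_NASALS:
            place = PLACE.get(base(out[i + 1]))
            if place:
                out[i] = NASAL_AT[place]
    return out


def deletion(word):
    """Delete a word-final anterior stop after a fricative or nasal."""
    if (len(word) >= 2 and base(word[-1]) in {"t", "d"}
            and word[-2] in FRICATIVES | ANTERIOR_NASALS):
        return list(word[:-1])
    return list(word)


def dpa(word):
    """Dorsal place assimilation: palatal stop to velar before a back vowel."""
    out = list(word)
    for i in range(len(out) - 1):
        b = base(out[i])
        if b in VELAR_FOR and out[i + 1] in BACK_VOWELS:
            out[i] = VELAR_FOR[b] + out[i][len(b):]
    return out


def derive(first, second):
    """First cycle on the word, second cycle on the compound."""
    stem = deletion(npa(first))      # word level: NPA, then deletion
    return npa(dpa(stem + list(second)))  # compound level: DPA feeds NPA


if __name__ == "__main__":
    print("".join(derive(["ʧʰ", "a", "n", "d"], ["cʰ", "ɑ", "ɹ", "e"])))
```

Swapping the second-cycle calls to `dpa(npa(...))` yields a palatal nasal before a velar stop, which is the sense in which DPA must feed NPA.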

5.7 Prosodic structure

In this section, the prosodic features of length and stress are discussed. There are word pairs in Persian whose opposition rests on a geminate consonant (Tashdid, in Arabic and Persian) in one word and a singleton in the other. Such pairs are realized phonetically with a quantity difference between a long and a short consonant, as illustrated below.

(52)
Geminate                   Singleton
ban.nɑ ‘builder’           ba.nɑ ‘building’
koɹ.ɹe ‘pet’               ko.ɾe ‘sphere’
mɑd.de ‘matter’            mɑ.de ‘female’
cham.mi ‘quantitative’     cha.mi ‘paucity’
hal.lɑl ‘solvent’          ha.lɑl ‘halal’

As an example, in the opposition between ban.nɑ and ba.nɑ, the /n/ of /ban/ contrasts with the zero of /baØ/. A phonetic contrast between [nː] and [n] thus emerges. There are also some Arabic loanwords with a final geminate, such as /sadd/ ‘dam’ vs. /sad/ ‘hundred’ and /chamm/ ‘quantity’ vs. /cham/ ‘little’.

A number of phonological patterns provide evidence for long and short vowels as natural classes. For example, the first consonant of clusters in CVCC syllables is strictly constrained by the length of the nucleus. While any consonant may be the first member of a cluster when the nucleus is a short vowel, only a small subset of clusters, mostly consisting of oral voiceless fricatives and /t/, can occur after long vowels. In this respect, words like /dasth/ ‘hand’, /cheɹm/ ‘worm’, and /sobh/ ‘morning’ can be compared with /mɑsth/ ‘yogurt’, /ɟuʃth/ ‘meat’, and /ɹiχth/ ‘poured’ (Samareh 1999). Consonant clusters within words and in loanwords are mostly broken up by short rather than long vowels (Lazard 1992). As seen above, short vowels agree in place and height with long vowels, and not vice versa.

Vowel length is also a major determinant of syllable weight. Light and heavy syllables are monomoraic and bimoraic, respectively. Based on Persian quantitative metrics, Hayes (1989) classifies Persian syllables into four types: light (CV), heavy (CVV, CVC), superheavy (CVVC, CVCC), and ultraheavy (CVVCC) (here, V stands for a short vowel and VV for a long vowel). Based on the compensatory lengthening process in colloquial speech, Darzi (1991) reduces Hayes’ classification to light (CV), heavy (CVVC, CVCC, CVVCC), and superheavy (CVVC).

There is a consensus among scholars that pitch is the main phonetic correlate of stress in Persian (Abolhasanizadeh et al. 2012). Traditional studies distinguish between verbal and non-verbal stress placement rules: while verbal stress changes as a function of tense, the last syllable of all non-verbal content words bears stress (Ferguson 1957; Lazard 1992). This was followed by a unified account of Persian stress, independent of lexical categories (Eslami 2005). Kahnemuyipoor (2003) theorized the unified account by arguing that the same stress rule applies to different syntactic categories at a certain level of the prosodic hierarchy. According to his analysis, Persian stress is assigned rightmost at the phonological word level, leftmost at the phonological phrase level, rightmost at the intonational phrase level, and leftmost at the utterance level.
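The alternating edge preferences of the unified stress account can be sketched over a nested prosodic structure. The nesting (utterance, intonational phrases, phonological phrases, words, syllables) and the placeholder syllable labels are assumptions made for the illustration; the function simply follows the head at each level.

```python
def main_stress(utterance):
    """Return the syllable that carries the strongest stress.

    utterance: list of intonational phrases
    intonational phrase: list of phonological phrases
    phonological phrase: list of phonological words
    phonological word: list of syllables
    """
    intonational_phrase = utterance[0]             # leftmost in the utterance
    phonological_phrase = intonational_phrase[-1]  # rightmost in the IP
    word = phonological_phrase[0]                  # leftmost in the phrase
    return word[-1]                                # rightmost in the word


if __name__ == "__main__":
    # One intonational phrase with two phonological phrases;
    # the syllable labels are placeholders, not Persian words.
    utterance = [[
        [["a1", "a2"], ["b1", "b2"]],
        [["c1", "c2"], ["d1"]],
    ]]
    print(main_stress(utterance))
```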

5.8 Optimality theoretic analysis

While rule-based phonology takes the UF as input and applies operations such as phonological rules to change it into an SF as output, optimality theory (OT), as a constraint-based phonology, usually takes the UF as input and generates a set of output candidates, from which one candidate is selected as the optimal SF. The selection is made by evaluating the members of the candidate set against a hierarchy of violable constraints. The candidate that is most favoured by the ranked violable constraints, and hence the most harmonic among the candidates, is the optimal SF. The most harmonic candidate is the one that performs best on the highest-ranking constraint that distinguishes it from its competitors (Prince and Smolensky 1993). OT distinguishes two types of constraints: markedness constraints, which prohibit an output from having some property, and faithfulness constraints, which prohibit differences between the input and the output (McCarthy 2002, 2008).

5.8.1 Glottal reduction: a conspiracy

To start an OT analysis, some of the data on deaspiration and devoicing in (9) are repeated here, and new data are added.

(53)
UF          SF          Competitor    Gloss
chaɹ        chaɹ        *caɹ          ‘deaf’
ɟaɹ         caɹ         *ɟaɹ          ‘bald’
saɟ         sac         *saɟ          ‘dog’
χɑch        χɑch        *χɑc          ‘soil’
ʃechaɹ      ʃechaɹ      *ʃecaɹ        ‘sugar’
Ɂadvɑɹ      Ɂad̥v̥ɑɹ      *Ɂatvɑɹ       ‘periods’
Ɂathvɑɹ     Ɂatfɑɹ      *Ɂatvɑɹ       ‘modes’
hads        hats        *hads         ‘guess’
aɟaɹ        Ɂaɟaɹ       *Ɂacaɹ        ‘if’
Ɂaschaɹ     Ɂascaɹ      *Ɂaschaɹ      ‘a name’
mesɟaɹ      mescaɹ      *mesɟaɹ       ‘coppersmith’
Ɂachsaɹ     Ɂacsaɹ      *Ɂachsaɹ      ‘most’
sathl       sathl       *satl         ‘bucket’
ɹabth       ɹapth       *ɹabth        ‘relevance’
ɹothbe      ɹotpe       *ɹothbe       ‘rank’
ɹezɢ        ɹesq        *ɹezq         ‘aliment’

For each UF in (53), two output candidates are given: the SF and a starred candidate, which is its nearest competitor. Returning to the rules in (10), (12), and (14), laryngeal distinctions are neutralized to plain voiceless in two ways: the aspiration distinction is neutralized after a voiceless fricative or before an obstruent, and the voicing distinction in obstruents is neutralized after or before obstruents. Another way to describe the rules is to state the constraints that prohibit aspirated and voiced stops in the SF in these contexts. In OT terminology (McCarthy 2008), aspirated stops are not allowed after voiceless fricatives or before obstruents. This requirement is enforced by a reduction in glottal spreading. In addition, voiced obstruents are not allowed after or before obstruents. This requirement is enforced by a reduction in vocal fold vibration. Six violable constraints are responsible for laryngeal contrast and neutralization: Ident(lar), *Lar, *[–sg], *Lar/Lar, *[+voice]#, and *#[+voice].

Ident(lar): Every laryngeal autosegment in the input remains unchanged in the corresponding output segment (Lombardi 2001; McCarthy 2008).
*Lar: Do not have laryngeal features (Lombardi 2001).
*[–sg]: An unaspirated stop is not allowed.
*Lar/Lar: Do not have laryngeal features for stops in obstruent sequences.
*[+voice]#: Word-final obstruents are not allowed to be voiced.
*#[+voice]: Word-initial stops are not allowed to be voiced.

Ident(lar) is a featural faithfulness constraint that prohibits changing the values of the laryngeal features in the output. *Lar is a context-free markedness constraint that prohibits a consonant from bearing marked laryngeal features, i.e. [+sg] and [+voice]. Lombardi (2001) uses *Lar just for the marked [+voice] feature. Unlike *Lar, *[–sg] prohibits an unmarked laryngeal feature. *Lar/Lar is an intersegmental, context-sensitive markedness constraint that prohibits a stop from bearing marked laryngeal features in an obstruent sequence. This constraint finds support in two organizational principles proposed by Browman and Goldstein (1986) for the glottal opening-and-closing gestures in word-initial onsets of Germanic languages: (1) glottal peak opening is synchronized to the midpoint of any fricative gesture and otherwise to the release of any closure gesture, and (2) there is at most a single glottal gesture word-initially. The same reasoning can hold for sequences in word-final and medial positions in Persian. *[+voice]# and *#[+voice] are intrasegmental markedness constraints that prohibit a word-final and a word-initial obstruent, respectively, from being voiced.

To construct an OT analysis, two kinds of input are taken into account: /chaɹ/ and /ɟaɹ/. Since Persian preserves aspiration in word-initial position and in sonorant environments, a faithful mapping results, and Ident(lar) dominates *Lar and *[–sg]. A priority relationship among constraints is established by means of constraint conflict, i.e. one constraint acts counter to another in favouring one of two competing output candidates. Ident(lar) dominates *Lar because it favours the winner [chaɹ], while *Lar favours the loser [caɹ] (see Tableau 5.1). Ident(lar) also dominates *[–sg] because it favours the winner [caɹ], while *[–sg] favours the loser [chaɹ] (see Tableau 5.2).

Tableau 5.1
/chaɹ/      Ident(lar)    *Lar
☞ chaɹ                    *
  caɹ       *W            L

Tableau 5.2
/ɟaɹ/       Ident(lar)    *[–sg]
☞ caɹ       *             *
  chaɹ      **W           L
Tableau 5.3 illustrates an unfaithful mapping in which the winner [caɹ] violates Ident(lar), while the loser [ɟaɹ] violates the higher-ranked markedness constraint *#[+voice].

Tableau 5.3
/ɟaɹ/      *#[+voice]    Ident(lar)    *Lar    *[–sg]
☞ caɹ                    *                     *
  ɟaɹ      *W            L             *       *

Since the rule of deaspiration is triggered by an output constraint that forbids stops in obstruent sequences from bearing the features [+sg] or [+voice], Ident(lar) is violated and dominated by *Lar/Lar (see Tableau 5.4). *Lar/Lar favours [Ɂascaɹ], while Ident(lar) favours [Ɂaschaɹ].

Tableau 5.4
/Ɂaschaɹ/     *Lar/Lar    Ident(lar)
☞ Ɂascaɹ                  *
  Ɂaschaɹ     *W          L

By the same reasoning, the output constraints *[+voice]# and *#[+voice] have higher priority than Ident(lar) and, together with *Lar/Lar, are only partially ordered with respect to each other, as shown in Tableaux 5.5 and 5.6. (54) represents the hierarchy of laryngeal constraints for Persian obstruents.

Tableau 5.5
/saɟ/      *[+voice]#    Ident(lar)
☞ sac                    *
  saɟ      *W            L

Tableau 5.6 /hads/: candidates ☞ hats and hads, evaluated against *[+voice]#, *Lar/Lar, Ident(lar), *Lar, and *[–sg]. [Violation marks not fully legible.]

(54) *[+voice]#, *#[+voice], *Lar/Lar >> Ident(lar) >> *Lar, *[–sg]

One could posit the generalization that the deaspiration and devoicing rules take part in a conspiracy: both reduce the laryngeal magnitude of the SF, whether in glottal spreading or in vocal cord vibration.
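Evaluation against a ranked hierarchy can be sketched as a lexicographic comparison of violation profiles. The profiles below are copied from the two-candidate comparisons of Tableaux 5.1, 5.4, and 5.5 and list only the violations shown there. The total order in `RANKING` is one linearization of (54), chosen as an assumption, since (54) leaves some constraints unranked with respect to each other. Aspiration is written with "ʰ".

```python
# One possible linearization of hierarchy (54), highest-ranked first.
RANKING = ["*[+voice]#", "*#[+voice]", "*Lar/Lar",
           "Ident(lar)", "*Lar", "*[-sg]"]


def evaluate(candidates, ranking=RANKING):
    """Return the most harmonic candidate.

    candidates maps each output form to a dict of violation counts;
    constraints that are not listed count as zero violations.
    """
    def profile(form):
        return tuple(candidates[form].get(c, 0) for c in ranking)
    # Tuples compare left to right, so the highest-ranked constraint
    # on which two candidates differ decides between them.
    return min(candidates, key=profile)


if __name__ == "__main__":
    tableau_5_1 = {"cʰaɹ": {"*Lar": 1}, "caɹ": {"Ident(lar)": 1}}
    tableau_5_4 = {"ʔascaɹ": {"Ident(lar)": 1}, "ʔascʰaɹ": {"*Lar/Lar": 1}}
    tableau_5_5 = {"sac": {"Ident(lar)": 1}, "saɟ": {"*[+voice]#": 1}}
    for tableau in (tableau_5_1, tableau_5_4, tableau_5_5):
        print(evaluate(tableau))
```

Reversing the order of Ident(lar) and *Lar in the ranking makes the unaspirated candidate win in the first comparison, which is how a ranking argument is tested.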

5.8.2 Syllable structure

According to Persian phonological structure, a syllable onset must contain one consonant. Thus ONSET, which forbids vowel-initial syllables, is undominated. Accordingly, when a phoneme string starts with a vowel, there are two ways to satisfy ONSET in syllabification: either the initial vowel is deleted or a consonant is inserted at the beginning of the syllable. Persian chooses the second and inserts an initial glottal stop. Thus DEP, as a faithfulness constraint, interacts with ONSET, resulting in a preference for the universally unmarked CV, as illustrated in Tableau 5.7.

ONSET: No syllable is allowed to start with a vowel.
DEP: Every segment of the output has a correspondent in the input.

Tableau 5.7 Input: VCV. [Tableau body not reproduced.]

Since Persian prefers initial consonant insertion to initial vowel deletion, the faithfulness constraint MAX, which penalizes deletion, should dominate DEP (see Tableau 5.8).

MAX: Every segment or autosegment of the input has a correspondent in the output.

Tableau 5.8 Input: VCV. [Tableau body not reproduced.]


Persian resolves hiatus, i.e. a sequence of two vowels, either by inserting an epenthetic consonant between the two vowels or by deleting one of them. Therefore, ONSET should dominate both MAX and DEP. For example, the formal and colloquial pronunciations of the verb /be+ɡu+am/ ‘say’ are [beɡujam] and [beɟam], respectively. Tableau 5.9 illustrates the interaction of ONSET, MAX, and DEP.

Tableau 5.9 Input: CVVC; candidates: ☞ CV·CVC, ☞ CVC, CV·VC. [Violation marks not reproduced.]

Since Persian assigns intervocalic consonants to the coda rather than the onset, *COMPLEX-ONSET has higher priority than *COMPLEX-CODA. Tableau 5.10 illustrates the syllabification of the VCCCV string.

*COMPLEX-ONSET: A consonant cluster is not allowed in the onset.
*COMPLEX-CODA: No coda is allowed to have more than one consonant.

Tableau 5.10 Input: VCCCV. [Tableau body not reproduced.]

Since the whole CVCC string is parsed into one syllable, i.e. .CVCC., NO-CODA is dominated by ONSET, MAX, and DEP, as shown in Tableau 5.11. The number of violations of NO-CODA is equal to the number of consonants in coda position.

NO-CODA: No syllable-final consonant is allowed.

Tableau 5.11 Input: CVCC. [Tableau body not reproduced.]

Tableaux 5.12 and 5.13 illustrate that loanwords with CCVC and CVCCC patterns can be syllabified as CV.CVC and CVC.CVC, respectively. For example, Persian speakers syllabify the English words /class/ and /lustɹe/ as /ce.lɑs/ and /lus.teɹ/.

Tableau 5.12 CVCCC ☞CVC·CVC CVCC CVCCC

Tableau 5.13 CCVC ☞CV·CVC CCVC CVC

Finally, the constraint hierarchies for Persian syllable structure are as follows:

ONSET >> MAX >> DEP >> NO-CODA
*COMPLEX-ONSET >> *COMPLEX-CODA
*COMPLEX-ONSET, *COMPLEX-CODA >> DEP
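The first of these hierarchies can be sketched as a small evaluator. The Python code below is a minimal illustration under stated assumptions, not the author's implementation: the `Candidate` class, the function names, and the candidate sets are hypothetical, syllables are written as strings of C/V symbols (with "ʔ" counted as a consonant), and only the chain ONSET >> MAX >> DEP >> NO-CODA is modelled. Strict domination is implemented as lexicographic comparison of violation profiles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    syllables: tuple      # e.g. ("ʔV", "CV"); "V" is a vowel, anything else a consonant
    inserted: int = 0     # output segments without an input correspondent
    deleted: int = 0      # input segments without an output correspondent


def onset(c):
    # One mark per syllable that starts with a vowel.
    return sum(1 for s in c.syllables if s.startswith("V"))


def max_(c):
    # One mark per deleted input segment.
    return c.deleted


def dep(c):
    # One mark per inserted output segment.
    return c.inserted


def no_coda(c):
    # One mark per consonant after the vowel (each syllable is assumed to have one "V").
    return sum(len(s) - s.index("V") - 1 for s in c.syllables)


RANKING = [onset, max_, dep, no_coda]   # ONSET >> MAX >> DEP >> NO-CODA


def profile(c, ranking=RANKING):
    return tuple(constraint(c) for constraint in ranking)


def optimal(candidates, ranking=RANKING):
    # Strict domination amounts to lexicographic comparison of profiles.
    return min(candidates, key=lambda c: profile(c, ranking))


# Hypothetical candidate sets for the inputs VCV and CVCC.
vcv = [
    Candidate(("V", "CV")),                 # faithful, violates ONSET
    Candidate(("ʔV", "CV"), inserted=1),    # glottal stop insertion
    Candidate(("CV",), deleted=1),          # initial vowel deletion
]
cvcc = [
    Candidate(("CVCC",)),                   # faithful, two coda consonants
    Candidate(("CVC", "CV"), inserted=1),   # vowel epenthesis
    Candidate(("CV",), deleted=2),          # coda deletion
]

print(optimal(vcv).syllables)
print(optimal(cvcc).syllables)
```

With these assumed candidates, the evaluator picks the glottal-stop candidate for VCV and the fully parsed syllable for CVCC, in line with the discussion of Tableaux 5.7, 5.8, and 5.11.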

5.8.3 Intersegmental constraint interactions

We saw in section 5.6 that the segmental features of voicing, aspiration, place, and manner interact with each other. In this section, rule interaction is interpreted in line with the OT approach. The UF and SF of two typical words are reiterated in (55).

(55)  UF             SF
      ʧhand-chɑɹe    ʧhaŋkhɑɾe, *ʧhandkhɑɾe, *ʧhaɲchɑɾe, *ʧhankhɑɾe   ‘multifunctional’
      ɁeʤthemɑɁ      ɁeʃtemɑɁ, *ɁeʧthemɑɁ, *ɁeʃthemɑɁ                ‘society’

In colloquial speech, an anterior stop, i.e. /t/ or /d/, is deleted when it is preceded by a fricative or an anterior nasal and followed by a consonant. In OT terminology, the triconsonantal sequence [Ct/dC] is not allowed. This requirement is enforced by [t/d] deletion. Since [t/d] deletion is triggered by a need to simplify the triconsonantal sequence Ct/dC, MAX should be dominated by a context-sensitive markedness constraint, namely *Ct/dC, as illustrated in Tableau 5.14. Coetzee (2004) proposed similar constraints for [t/d] deletion in English dialects.

*Ct/dC: A word-final anterior stop is not allowed if it is followed by a consonant and preceded by a fricative or anterior nasal.

Tableau 5.14 /ʧhand#chɑɹe/ ☞ ʧhaŋkhɑɾe ʧhaŋdkhɑɾe


The unfaithful mapping /ʧhand-chɑɹe/ → [ʧhaŋkhɑɾe], *[ʧhaɲchɑɾe] in (55) shows that a high dorsal stop is not allowed to differ from the following vowel in its value of [back]. This requirement is enforced by [back] spreading from the vowel to the stop. Thus AGREE (Back), which requires dorsal agreement, dominates faithfulness to the featural autosegment, as illustrated in Tableau 5.15.

AGREE (Back): High dorsal stops must have the same value of [back] as the following vowel.
IDENT: No feature in the input changes in the corresponding output segment.

Tableau 5.15 /ʧhand#chɑɹe/ ☞ ʧhaŋkhɑɾe ʧhaŋchɑɾe


Additionally, the same unfaithful mapping /ʧhand-chɑɹe/ → [ʧhaŋkhɑɾe], *[ʧhankhɑɾe] in (55) shows that the nasal is not allowed to differ in place from a subsequent consonant. This requirement is enforced by [Place] spreading from the consonant to the nasal. Thus AGREE (Place), which requires place agreement, dominates faithfulness to the featural autosegment, as illustrated in Tableau 5.16.

AGREE (Place): Non-labial nasals must have the same place of articulation as a subsequent consonant.

Tableau 5.16 /ʧhand#chɑɹe/ ☞ ʧhaŋkhɑɾe ʧhankhɑɾe


Given *[ʧhakhɑɾe] as the candidate having two deletions in relation to the input, Tableau 5.17 illustrates the priority of MAX over IDENT.

Tableau 5.17 /ʧhand#chɑɹe/ ☞ ʧhaŋkhɑɾe ʧhakhɑɾe


A summary ranking of constraints is shown in (56), as a hierarchical Hasse diagram.

(56)  [Hasse diagram over AGREE (Back), *Affricate/Stop, AGREE (Place), MAX, and IDENT]

Another unfaithful mapping in (55), ɁeʤthemɑɁ → ɁeʃtemɑɁ, *ɁeʧthemɑɁ, indicates that Persian reacts to the triautosegmental sequence [–cont][+cont][–cont], since the first two autosegments belong to an affricate and all three are attached to the same place node. Thus a sequence of a homorganic affricate and a stop is not allowed. This requirement is enforced by the spirantization of the affricate, i.e. deletion of the first [–cont]. Tableau 5.18 illustrates a ranking argument for the priority of *Affricate/Stop over MAX.

*Affricate/Stop: A sequence of a homorganic affricate and a stop is not allowed.

Tableau 5.18 /ɁeʤthemɑɁ/ ☞ ɁeʃtemɑɁ ɁeʧtemɑɁ


5.9 Concluding remarks

The Persian phoneme inventory includes 23 consonants and 6 vowels. The typical template of the Persian syllable is CV(C)(C). A glottal stop is inserted at the beginning of a vowel-initial word. Given a consonant cluster in disagreement with SSP in Persian, one will find a corresponding cluster in agreement with SSP. Testing the phonological processes against the feature geometry of Halle et al. (2000) led to the positing of 14 binary and 3 unary features. The laryngeal features responsible for the contrast in the stops and fricatives are different: [sg] for stops and [voice] for fricatives. Voicing is more robust in fricatives than in stops. [Voice] is the laryngeal feature that partitions the phonological space into voiced and voiceless obstruents. The sound resulting from the deaspiration of the voiceless stops is perceived as a corresponding voiced stop. [Tense] could also play the same role as [voice]. No consistent agreement is reported on the phonemic status of the dorsal obstruents. Production and perception experiments could resolve such an inconsistency. Segmental, autosegmental, and moraic phonologies were studied by means of 11 phonological rules accompanied by many typical examples. Rule interaction posited a hierarchy for a subset of cross-linguistic rules that should be completed as part of the Persian phonology. Furthermore, phonological analysis of the laryngeal features for stops in the framework of optimality theory resulted in the rules joining a conspiracy that supports reduction in the laryngeal magnitude of the SF, whether in glottal spreading or vocal fold vibration. Finally, an OT approach to derivational interactive rules led to a hierarchical Hasse diagram of some other posited violable constraints. Interpretation of the constraint interaction in the framework of harmonic serialism could lead to new findings.

Acknowledgements

I thank Mr Justin Cancelliere and Dr Parvaneh Shayestehfar for editing this chapter. I also wish to sincerely thank two anonymous reviewers for helpful comments.

Chapter 6

PROSODY

Arsalan Kahnemuyipour

6.1 Introduction

This chapter provides an overview of the prosody of the Persian language.1 The chapter starts with a discussion of word stress and builds on that to cover stress at the phrasal and clausal levels. In the process, we will briefly consider some accounts of Persian prosody at the various levels and its interaction with information structure. In the end, we will briefly consider the phonetic realization of prosodic prominence and intonation in Persian. The chapter is organized as follows. Section 6.2 deals with final stress at the word level and the divergence from this pattern. Section 6.3 provides an account for the non-final stress in the verbal domain. In section 6.4, we look at the phonetic correlates of prosodic prominence in Persian and provide a brief overview of Persian intonation. Section 6.5 concludes the chapter.2

6.2 Word stress

This section looks at stress at the level of the word in Persian with the aim of establishing a general rule which can account for the various stress patterns. While a general tendency for word-final stress had been noted by many linguists starting with Chodzko (1852), the superficial diversity found in Persian stress patterns had led many scholars to suggest various splits, in particular based on lexical categories. Looking at the examples in (1), it is easy to detect the divergence from word-final stress. We hope to be able to make more sense of this divergence by the end of this chapter.3

(1) a. ketā́b      ‘book’            f. xaríd      ‘s/he bought’
    b. divāné     ‘crazy’           g. xaríd-i    ‘you bought’
    c. divānegí   ‘craziness’       h. mí-xor-e   ‘s/he eats’
    d. ság-am     ‘my dog’          i. hamishé    ‘always’
    e. xéili      ‘very, a lot’     j. váli       ‘but’

1  While many of the generalizations made in this chapter may apply to various dialects of Persian spoken in Iran and other neighbouring countries, the discussion in this paper is based on the dialect spoken in Tehran, the capital of Iran. The data in this paper are largely based on the author’s native judgment. In addition, when needed, several other sources have been consulted for confirmation, e.g. Ferguson (1957); Lazard (1992); Mahootian (1997); Sadat-Tehrani (2007); Same’i (1996); Thackston (1993); Windfuhr (1979).
2  Much of what is discussed in sections 6.2 and 6.3 is based on my older work, particularly Kahnemuyipour (2003, 2009). I have put technical details aside and have focused mostly on descriptive generalizations here.

The first thorough discussion of Persian stress can be found in Chodzko (1852). He identifies word-final stress as the basic stress rule in Persian and attributes this pattern to simple, derived, and compound nouns and adjectives, as well as nominal verbs (a type of infinitive).4 For verbal stress, he suggests different rules for different tenses. Ferguson (1957) makes a distinction between verbal stress and the other categories. ‘It is certainly safe to say that in modern Persian the verb has recessive stress. This is in sharp contrast with the noun, where the stress tends to be near the end of the word’ (Ferguson 1957: 26–7). In a similar fashion, Lazard (1992) draws a line between non-verbal words and verbs, taking the former to have word-final stress and the latter ‘recessive stress’. Mahootian (1997) states that stress is word-final in simple nouns, derived nouns, compound nouns, simple adjectives, derived adjectives, infinitives, and the comparative and superlative forms of adjectives as well as in nouns with plural suffixes, and underlines verbal stress as one of the exceptions to this rule. Finally, the clearest divide between verbal and non-verbal stress in Persian comes in the work of Amini (1997), where an End Rule Right is proposed for all categories including non-prefixed verbs and an End Rule Left for prefixed verbs. While these accounts rely on a split between verbs and other categories to account for the stress patterns found in Persian, they also reveal that even such a split fails to capture the discrepancies observed in Persian, exemplified in (1). Just to illustrate the point, take the examples in (1f) and (1h). While both are verbs, stress is final in (1f) but initial in (1h). This problem arises irrespective of whether one considers the use of categorial splits to account for the diversity in stress patterns to be a desirable move from a theoretical perspective.
In order to decipher the rule governing the stress system of Persian, we need to look at the stress patterns exemplified in (1) more closely. Let us start with simple non-affixed words of varying lengths and categories, exemplified in (2).

(2) a. dást         ‘hand’        f. tálx         ‘bitter’
    b. kebrít       ‘match’       g. bozórg       ‘big’
    c. tasādóf      ‘accident’    h. divāné       ‘crazy’
    d. buqalamún    ‘turkey’      i. motenāséb    ‘proportionate’
    e. ráft         ‘went’        j. nevésht      ‘wrote’

3  I use the acute accent (´) to mark primary stress.
4  Persian long infinitives (what Chodzko 1852 referred to as nominal verbs) align themselves with nouns not only with respect to stress, but also when you consider their morphological behaviour: they take the nominal plural marker (e.g. xābidan-ā, sleeping-pl, ‘the acts of sleeping’) or take the suffix -i which is typically added to nouns to form adjectives (e.g. compare qānun-i, law-i, ‘legal’ with xundan-i, reading-i, ‘readable, reading-worthy’).


The generalization is very clear here: stress falls on the last syllable of the word. This generalization also covers (1a, b, f, i). The example in (1c) also shows word-final stress but it is different from the ones in (2) in that it contains a derivational affix. When we look at more examples of derived words, we note that all derivational suffixes take stress (see also Ferguson 1957). As a result, word stress falls on the last syllable in derived words in Persian. With derivational prefixes too, the stress falls on the last syllable of the whole word. Some examples involving derivational affixes are given in (3). The examples in (3e) and (3f) show the pattern very clearly. In (3e), with a derivational prefix bi ‘without’, stress is on the last syllable of the whole word, hence on the root. Once the derivational nominalizing suffix -i is added to the same form, stress falls on this suffix as the last syllable in the whole word.

(3) a. tasādof-í
       accident-adj.
       ‘accidental’
    b. talx-í
       bitter-nom.
       ‘bitterness’
    c. hushmand-āné
       intelligent-adv.
       ‘intelligently’
    d. nā-momkén
       neg.-possible
       ‘impossible’
    e. bi-ráhm
       without-mercy
       ‘merciless’
    f. bi-rahm-í
       without-mercy-nom.
       ‘mercilessness’

It is worth noting here that the plural marker and the comparative/superlative markers in Persian are also part of the stressed word with the stress falling on these suffixes, as shown in (4). While these affixes are typically treated as inflectional across languages, Kahnemuyipour (2000b, 2004) argues that they behave like derivational affixes in Persian, making their stress behaviour unsurprising.

(4) a. kebrit-ā́
       match-pl.
       ‘matches’
    b. divāne-tár
       crazy-comp.
       ‘crazier’
    c. divāne-tarín
       crazy-super.
       ‘craziest’

We can therefore maintain the generalization so far that stress is word-final in Persian, as long as we take the word to include derivational affixes. Kahnemuyipour (2003) formulates this generalization in the framework of Phrasal (or Prosodic) Phonology (Selkirk 1980a,b, 1981, 1984, 1986; Nespor and Vogel 1982, 1986; among others). In this framework, various prosodic domains such as the phonological word, the phonological phrase and the intonational phrase are derived from morphosyntactic constituents. The domain relevant to our discussion here is the phonological word, which is typically defined as the domain for word stress, phonotactics, and segmental word-level rules. For Persian, stress falls on the last syllable in the phonological word, which contains the root and all derivational affixes (including the plural and comparative/superlative markers).5

Unlike derivational affixes, inflectional ones are not part of the domain of word stress in Persian (see also Ferguson 1957). In (1d), for example, the first person singular possessive marker -am does not carry stress, with stress falling on the root sag ‘dog’.6 Similarly, stress falls on the last syllable of the preterite verb in (1f) and when the inflectional suffix -i marking second person singular agreement is added as in (1g), stress remains on the stem. More examples of words involving inflectional suffixes are given in (5). In (5a–c), we see several examples of nominal inflectional endings, in addition to the possessive marker we have seen so far. In (5d), we see another example of the verbal agreement markers not receiving stress.7

(5) a. ketā́b-i
       book-indef.
       ‘a book’
    b. māshín-o
       car-acc.
       ‘the car (accusative)’
    c. míz-e qahveyí8
       table-Ez brown
       ‘brown table’
    d. ráft-im
       went-1pl.
       ‘we went’

The different behaviour of derivational and inflectional affixes in Persian is not surprising. In many languages, affixes behave differently with respect to whether they are part of the phonological word or not (see, for example, Hall and Kleinhenz 1999). Dixon (1977a,b) refers to this distinction using the terms ‘cohering’ and ‘non-cohering’. Using this terminology, derivational affixes in Persian are ‘cohering’, while inflectional ones are ‘non-cohering’. The division in Persian seems to be particularly well behaved, given the plausibility of taking suffixes involved in derivation (i.e. a lexical process) to be part of the phonological word and inflectional suffixes that are often considered to have syntactic status to be outside the phonological word. It is worth noting that cohering suffixes are ordered before non-cohering ones, leading to the schema in (6), where ω marks the phonological word boundary, the domain of word stress assignment. In (7), we see examples of words involving both a cohering and a non-cohering suffix.

(6) (stem-cohering suffixes)ω non-cohering suffixes

(7) a. divāne-gí-ash
       crazy-ness-her/his
       ‘her/his craziness’
    b. ketāb-ā́-ro
       book-pl.-acc.
       ‘the books (accusative)’

Let us take stock of what we have covered so far. We started with the examples in (1), repeated below as an illustration of the range of stress patterns we can find in the Persian word. We then noted that the word-final stress rule can capture word stress in non-affixed words, exemplified in (1a), (1b), (1f), and (1i). With the additional distinction between cohering and non-cohering affixes and the proposal that the word-final stress rule applies to the phonological word which includes cohering suffixes, but excludes non-cohering ones, we managed to capture the stress pattern exemplified in (1c), (1d), and (1g). We are now left with three words in our list, namely (1e), (1h), and (1j). The following section is devoted to examples like (1h), where stress is on a prefix in a verb. It is precisely this type of example which had led most Persian linguists to suggest a categorial division with respect to word stress, as discussed above. Meanwhile, before turning to that case, I would like to take on examples (1e) and (1j) which exhibit stress on the first syllable in clear contradiction to the more general word-final stress in Persian.9

(1) a. ketā́b      ‘book’            f. xaríd      ‘s/he bought’
    b. divāné     ‘crazy’           g. xaríd-i    ‘you bought’
    c. divānegí   ‘craziness’       h. mí-xor-e   ‘s/he eats’
    d. ság-am     ‘my dog’          i. hamishé    ‘always’
    e. xéili      ‘very, a lot’     j. váli       ‘but’

5  Compounds are also treated as single words in Persian with stress falling on the final syllable of the whole compound, e.g. ketāb-xuné book-house ‘library’, bozorg-manésh great-attitude ‘magnanimous’. Morphologically, too, compounds behave like single words with no affix interrupting the two parts of the compound.
6  Most grammars of Persian treat the possessive marker (and a few other elements discussed in this paper) as enclitics rather than suffixes. (For more information about enclitics, refer to Chapters 3, 8, 9, and 10.) I am abstracting away from this distinction as it is not relevant for the discussion in this paper. The important point here is the inflectional status of these elements. I will refer to them as suffixes below.
7  It is worth noting here that agreement markers receive stress in present perfect forms in colloquial Persian, e.g. did-ím ‘we have seen’. The present perfect consists of the past participle plus the agreement marker. In its full form, the past participle ends in the vowel -e and stress falls on this vowel as expected: didé-im. In colloquial Persian, the vowel -e is dropped, giving its stress to the adjacent vowel in the suffix -im. Also, see Kahnemuyipour (2003) for a discussion of the difference in stress behaviour between verbal agreement markers in the past and present tenses. I am abstracting away from such details here.
8  In Persian, the noun is connected to its post-nominal modifiers via a vowel known as the Ezafe (marked in the example as Ez). For a more detailed discussion of this marker and different accounts of its syntax, see Ghomeshi (1997); Kahnemuyipour (2014); Larson and Yamakido (2008); Samiian (1994). Crucially, Ezafe behaves like an inflectional suffix and does not carry stress. The Ezafe construction is further discussed in Chapters 3, 6, 7, 8, 9, 10, and 19.

9  I am leaving aside a productive stress shift that occurs in Persian in the formation of the vocative, the only process of this kind in the language. While the stress on (proper) nouns is word-final, in the vocative, the stress shifts to the first syllable, e.g. rézā ‘Reza!’, dóktor ‘Doctor!’, xā́num ‘Ma’am!’. In Modern Persian, there are also some remnants of an older process of vocative and optative formation from Classical Persian using the non-cohering suffix -ā. Due to the non-cohering status of this suffix, the stress fell on the stem. As a result, these remnants show non-final stress: e.g. daríqā ‘alas’, xóshā ‘how pleasant is  . . .  ’, xodā́yā ‘Oh God’, bádā ‘how bad is  . . .  ’, bā́dā ‘may it be’ (see also Ferguson 1957).

The examples in (1e) and (1j) belong to a list of perhaps a handful of words with exceptional non-final stress. To be more precise, these are all two-syllable words with stress on the first syllable. While these exceptional cases have been noted in the literature (see, for example, Amini 1997; Hosseini 2014; Lazard 1992; Mahootian 1997), often a selected list is provided with no major discussion of possible generalizations, making further exploration of these cases worthwhile here. Ferguson (1957) has the most detailed discussion, where he has also attempted to provide an exhaustive list, but he is still missing a few words. Here, I will try to add those missing words and provide a complete list. Meanwhile, I am leaving out those words on Ferguson’s list which appear to be archaic or outdated. I am hoping, therefore, to provide an exhaustive list of words with non-final stress in contemporary Persian and discuss their distribution. In (8), we find the full list of these words with stress on the first syllable. The cases that are from Ferguson are marked with F in brackets.10,11

(8) a. ā́ri/báli/bále (F)              ‘yes’ (cf. colloquial āré)
    b. náxeir (F)                     ‘no’
    c. váli/ámmā/bálke/lā́ken (F)      ‘but, however’
    d. shā́iad (F)                     ‘may, perhaps, maybe’ (cf. bāyád ‘must’)
    e. ā́iā (F)                        ‘whether, yes/no question word’
    f. hámin/hámān (F)                ‘the same’
    g. háttā (F)                      ‘even’
    h. kā́shke/kā́shki (F)              ‘I wish  . . .  , it would be great if  . . .  ’
    i. mā́shāllā (F)                   ‘bravo’
    j. xéili (F)                      ‘very, a lot’
    k. ágar (F)                       ‘if’
    l. mágar (F)                      ‘unless’
    m. gúiā (F)                       ‘as if’
    n. bá’zi/bárxi (F)                ‘some’
    o. chérā (F)                      ‘why, yes (in response to negative question)’
    p. zírā (F)                       ‘because’
    q. mérsi12                        ‘thank you’ (cf. mamnún, sepā́s ‘thank you’)
    r. chónke13                       ‘because’
    s. ā́ffarin/bā́rikallā              ‘bravo’
    t. hárchand                       ‘even though’
    u. váqti (also váqtike)           ‘when’

10  For the words adapted from Ferguson (1957), transcriptions have been modified for consistency. Some translations have also been modified.
11  Ferguson (1957) also has a list of Arabic formulaic expressions which have been borrowed into Persian, often with their original stress that does not match that of Persian. He concedes that most of these expressions were no longer in use even in 1957. Of those, there are only two that are still commonly used: alhamdo lellāh ‘Thank God!’ and inshāllāh ‘God willing’. The first expression has two possible stress patterns, one with main stress on the second syllable (alhámdo lellāh) and one with stress on the final syllable (alhamdo lellā́h). The second example also has two possible stress patterns, one word-final (inshāllā́h) and one word-initial (ínshāllāh), when uttered in isolation in response to a statement. Also in this category, one might consider Arabic ordinal adverbials which are still commonly used, e.g. ávvalan ‘firstly’, sā́niyan ‘secondly’, sā́lesan ‘thirdly’.
12  This is a French borrowing but by far the most common word for expressing gratitude.
13  A variant of this form is chón with a single stressed syllable.

Many of the examples in (8) defy any generalization and may simply need to be accepted as lexical idiosyncrasies. Meanwhile, one might be able to provide some partial explanations or extract some general tendencies from the above facts. To begin with, a few of the words in (8) may consist of a root plus a non-cohering affix, which makes their initial stress unsurprising (see also Hosseini 2014). (8u) is a clear example of this kind with vaqt being a free root meaning ‘time’ and -i the indefinite marker, a non-cohering affix. One may be able to place (8j) and (8n) in the same category, while noting that xeil meaning ‘group’ has a marginal use in contemporary Persian and ba’z or barx are not used as free roots in Persian and their status as bound roots in this context is at best questionable. (For more information on the indefinite marker, refer to Chapters 2, 3, 7, 8, and 9.) Another example which can be easily broken down into a root plus a non-cohering suffix is given in (8o), where the Persian word cherā ‘why’ can be split into che ‘what’ and the non-cohering accusative marker -rā. While the synchronic status of this division may be questionable, this is certainly the historical reason for the stress pattern of this word. Similarly, one might classify zirā ‘because’ in (8p) with cherā even though zi does not have a root status in contemporary Persian (see Hosseini 2014 for a possible historical explanation).14 A number of other examples in (8) appear to consist of two (phonological) words, in which case the stress should not really be seen as non-final stress in a single word but main stress on the first word in a two-word phrase. This is particularly plausible in the context of a leftmost phrasal stress rule (see Hosseini 2014; Kahnemuyipour 2003). The words in (8f), for example, can be broken down into ham ‘also’ and in/ān ‘this/that’.
Similarly, chonke ‘because’ in (8r) can be split into chon ‘because’ (see footnote 13) and ke ‘that’. The examples in (8h) may also be broken down to kāsh ‘I wish’ and the complementizer ke ‘that’, with ki perhaps seen as a variant of ke (Hosseini 2014). Finally, harchand ‘even though’ in (8t) may debatably be taken to consist of har ‘every’ and chand ‘few’. Several of the examples in (8) are words that can be used as single-​word utterances raising the question of whether their initial stress may be related to their status as an utterance.15 The following fall under this category: (8a), (8b), (8d), (8i), (8o), (8q), (8s). The example in (8d) is particularly interesting, because of the two modals shāiad ‘may, perhaps’ and bāiad ‘must’, only the former can be used as a single-​word utterance and only that one has initial stress. While this pattern is interesting especially because it applies to quite a few forms, it can by no means be generalized to all one-​word utterances. The absence of generality in this regard can best be seen in the various forms for ‘thank you’ shown in (8q). Of those forms, only one, albeit the most common one mérsi has initial stress. Similarly, not all the forms for ‘yes’ in Persian show initial stress. In fact, as can be seen in (8a), the most common form in colloquial Persian āré exhibits word-​final stress.


14  The word guyā in (8m) may also fall in the same category of a root plus a non-cohering suffix, with -ā possibly a remnant of the Classical Persian optative marker, see footnote 9.
15  Relating the initial stress to utterance-level stress may be particularly plausible in the context of Kahnemuyipour’s (2003) proposal that stress at the utterance level is leftmost in Persian. That still leaves open the question of why utterance level stress which should apply to an utterance consisting of two clauses applies at the word level in these cases, choosing one syllable over another for the application of stress.

Finally, there is perhaps one category which shows initial stress without exception, namely clausal conjunctions.16 This category has the most examples in (8) and presents the strongest generalization in the sense that Persian clausal conjunctions (almost) never show final stress.17 The following examples from (8) fall into this category: all the forms for ‘but’ in (8c) as well as (8g), (8k), (8l), (8p), (8r), (8t), and (8u).18 We started this section by looking at a number of examples of Persian words in (1) which at first sight did not appear to follow a systematic rule. With closer inspection, we noted that a general word-final stress rule can account for the stress pattern of most of those examples, as long as the word is seen as the Phonological Word which consists of the root and all cohering affixes that are attached to the root. Crucially, non-cohering affixes (i.e. inflectional affixes) fall outside the domain of the Phonological Word and as a result words containing them will exhibit non-final stress. We then looked at a number of exceptional cases in Persian which do not seem to follow the general word-final stress rule. We noted that these 20–30 words seem to defy any comprehensive generalization to account for their unusual stress pattern. In the meantime, we underlined some general tendencies in the data which may pave the way for a better understanding of these examples in future. Of the words we started with in (1), we are left with only one example to account for, namely mí-xor-e ‘s/he eats’. In section 6.3, we will see that this example is part of a much more general pattern of non-final stress in the verbal domain. I will argue that this stress pattern can be accounted for if we consider stress at the level of the verb phrase.
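The word-level generalization summarized above can be expressed as a short Python sketch. This is an illustration under stated assumptions rather than the analysis of Kahnemuyipour (2003): the function name and its arguments are hypothetical, each word is supplied as hand-segmented, syllable-sized units grouped by morpheme class, resyllabification across morpheme boundaries is ignored, and neither the lexical exceptions in (8) nor the verbal prefixes are modelled. Stress is marked with "ˈ" before the stressed unit.

```python
def stress_phonological_word(stem, cohering=(), non_cohering=()):
    """Mark primary stress on the last syllable of the phonological word.

    stem, cohering, non_cohering: sequences of syllable-sized strings.
    The phonological word is stem + cohering suffixes; non-cohering
    suffixes are appended afterwards and never carry stress.
    """
    p_word = list(stem) + list(cohering)
    if not p_word:
        raise ValueError("the phonological word needs at least one syllable")
    p_word[-1] = "ˈ" + p_word[-1]          # word-final stress inside the domain
    return ".".join(p_word + list(non_cohering))


# Examples taken from (1), (5), and (7); the segmentation is an assumption.
print(stress_phonological_word(["di", "vā", "ne"]))                   # divāné
print(stress_phonological_word(["di", "vā", "ne"], ["gi"], ["ash"]))  # divāne-gí-ash
print(stress_phonological_word(["ke", "tāb"], (), ["i"]))             # ketā́b-i
print(stress_phonological_word(["ke", "tāb"], ["ā"], ["ro"]))         # ketāb-ā́-ro
```

The only design decision in the sketch is where the domain boundary falls: moving a suffix from `cohering` to `non_cohering` is what shifts stress off it.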

16  Some clausal conjunctions such as chon ‘because’ or pas ‘so, therefore’ are monosyllabic and as such cannot distinguish between word-initial or word-final stress.
17  An anonymous reviewer has brought two examples to my attention which seem to allow final stress: agarche and garche, both meaning ‘even though’. It should be noted that these bi-morphemic words, consisting of (a)gar and che, allow for both initial and final stress. Meanwhile, in light of the existence of these forms which allow final stress, I qualified the generalization with ‘almost’.
18  Some of the words with non-final stress in (8) contrast with other segmentally homophonous words based only on the stress pattern, e.g. váli ‘but’ vs. valí ‘guardian’, ā́ri ‘yes’ vs. ārí ‘devoid’ (see also Ferguson 1957). Note, however, that this contrastive pattern cannot be taken as the motivation behind the non-final stress as it is limited only to very few examples from the above list. Also, there are many homophonous words in Persian with no stress difference.

6.3 Non-final stress in the verbal domain

We ended the previous section with the only remaining seemingly exceptional example from (1), namely mí-xor-e ‘s/he eats’. If we take this example to constitute a single Phonological Word, one should expect to have main stress on the second syllable, *mi-xór-e, with the stress rule skipping the agreement marker -e, a non-cohering suffix, and falling on the stem. This pattern is not observed. In fact, with all verbal forms involving the durative marker mi-, stress falls on the durative marker. This is also true of the subjunctive marker be- and the negative marker na-/ne-, as shown in (9).

(9) a. mí-xor-am
       dur.-eat-1sg.
       ‘I eat.’
    b. bé-xor-i
       subj.-eat-2sg
       ‘that you eat (subjunctive)’
    c. ná-xord-im
       neg.-eat.past-1pl
       ‘We didn’t eat.’

At first glance, the stress pattern in (9) seems surprising, as stress seems to have shifted to the left for no obvious reason. Meanwhile, things start looking more systematic once we add more material within the verb phrase to the left of the verb. In (10a), we have added a non-specific object, and main stress falls on the object. In (10b), a measure adverb is added, and main stress falls on the measure adverb.

(10) a. kéik mi-xor-am
        cake dur.-eat-1sg.
        ‘I eat cake.’
     b. ziyā́d keik mi-xor-am
        a lot cake dur.-eat-1sg.
        ‘I eat too much cake. (lit. I eat cake a lot.)’

Kahnemuyipour (2009) argues that the correct generalization capturing the facts in (10) is that main stress falls on the leftmost element (or Phonological Word) in the verb phrase (see also Same’i 1996).19 Manner and measure adverbs are argued to mark the left edge of the verb phrase (see Holmberg 1986; Webelhuth 1992; among others), thus explaining the main stress on the measure adverb in (10b).20 It is worth noting that the same pattern obtains with the very productive Persian construction known as complex verbs (see, for example, Dabir-Moghaddam 1997; Karimi 1997; Megerdoomian 2001; Vahedi-Langarudi 1996). Some examples are shown in (11). (11c) shows that when a manner adverb is added, main stress falls on the manner adverb. For more discussion on complex verbs, see Chapters 2, 7, 8, 9, 10, 15, 17, and 19.
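The leftmost generalization for the verb phrase can be sketched in a few lines of Python. This is a simplified illustration, not the phase-based account of Kahnemuyipour (2009): the function names are hypothetical, the caller is assumed to supply each word together with a flag saying whether it lies inside the verb phrase, and the item "X" in the last example is a placeholder for any element outside that domain rather than an attested Persian word. The word carrying main stress is shown in square brackets.

```python
def main_stress_index(words):
    """Return the index of the leftmost word that lies inside the verb phrase.

    words: left-to-right list of (form, in_verb_phrase) pairs.
    """
    for index, (_, in_vp) in enumerate(words):
        if in_vp:
            return index
    raise ValueError("no word is marked as belonging to the verb phrase")


def mark_main_stress(words):
    """Bracket the word that receives main stress; leave the others unchanged."""
    target = main_stress_index(words)
    return " ".join(
        f"[{form}]" if i == target else form
        for i, (form, _) in enumerate(words)
    )


# (10a) and (10b) from the text; every word is inside the verb phrase.
ex10a = [("keik", True), ("mi-xor-am", True)]
ex10b = [("ziyād", True), ("keik", True), ("mi-xor-am", True)]
# Hypothetical case: "X" stands for an element outside the verb phrase.
outside = [("X", False), ("keik", True), ("mi-xor-am", True)]

print(mark_main_stress(ex10a))
print(mark_main_stress(ex10b))
print(mark_main_stress(outside))
```

The sketch deliberately leaves the hard part, deciding what counts as inside the verb phrase, to the caller, since that is a syntactic question the stress rule itself does not answer.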


19  Kahnemuyipour (2009) is written in a minimalist framework using the notion of phases and multiple spell-out (Chomsky 1995, 2001, and subsequent authors). In this context the relevant verbal domain is the phasal vP (where v is the functional head introducing the external argument) and manner/measure adverbs are taken to mark the left edge of this domain. I am abstracting away from these technical details in this overview.
20  An anonymous reviewer introduces a measure adverb, which does not receive the main stress of the sentence, as a possible counterexample to this generalization: hesābi ‘a whole lot (colloq.)’. While a thorough examination of this adverb is beyond the scope of this chapter, I should point out that even regular measure adverbs such as xub ‘well’ can sometimes appear without the main stress of the sentence (see Kahnemuyipour 2009). Meanwhile, there is a subtle semantic difference in such instances (something like: What he did well was . . . ), which may justify a higher syntactic position outside the verb phrase. I have the intuition that in a similar manner, hesābi may be in a higher syntactic position outside vP. If this intuition is on the right track, the mapping between the stress domain and the vP can be maintained.

(11) a. sedā́   mi-zan-e
        sound  dur.-hit-3sg.
        ‘S/He is calling.’
     b. geryé  mi-kard-im
        cry    dur.-did-1pl.
        ‘We cried.’
     c. xúb   dars    mi-xund-i
        well  lesson  dur.-read.past-2sg.
        ‘You studied well.’

We are now ready to revisit the facts in (9). Given the general leftmost rule at the level of the verb phrase, the stress on the prefixes in (9) can be seen as the result of the same rule. If we take the verbal prefixes to constitute separate phonological words, then the stress on these prefixes can be seen as the regular leftmost stress in the domain of the verb phrase.21 The fact that Persian verbal prefixes behave as independent phonological words is not surprising. Similar proposals have been made for some affixes in other languages in the Phrasal Phonology framework (e.g. Cohn 1989 on Indonesian; Kang 1992a,b on Korean; Nespor and Vogel 1986 on Italian; Selkirk and Shen 1990 on Shanghai Chinese). In particular, Rice (1993) has argued that the verb in Slave (a Northern Athabaskan language) is parsed as a phonological phrase. The verbal prefixes in Persian just appear to be another example of the same type of behaviour. Under this view, the seemingly ‘recessive’ stress pattern of Persian verbs should not be seen as an exception to the general word-final stress rule in Persian but rather as a consequence of the general leftmost rule at the level of the verb phrase.22

The generalization that main stress falls on the leftmost element in the verb phrase finds support from a contrast in the stress behaviour of non-specific and specific objects. In Persian, while non-specific objects receive main stress (12a), specific ones do not (12b).23 This contrast receives a straightforward account in the context of a widely accepted view that specific objects are in a higher syntactic position compared to their non-specific counterparts cross-linguistically (see de Hoop 1996 and Koopman and Sportiche 1991 for Dutch; Diesing 1992 and Enç 1991 for Turkish; Mahajan 1990 for Hindi; among others). This syntactic difference between specific and non-specific objects was proposed for Persian by Browning and Karimi (1994) (see also Ghomeshi 1996; Karimi 1996; Megerdoomian 2002). If we take the specific object to have moved out of the domain of the verb phrase, its failure to receive main stress is expected under the view presented here which takes main stress to fall on the leftmost element in the verb phrase. The idea that the specific object is outside the verb phrase is supported by the fact that it appears to the left of manner/measure adverbs (12d), which mark the left edge of the verb phrase. The non-specific object, however, appears on the right side of manner/measure adverbs (12c). In all the examples in (12), the main stress falls on the leftmost element in the verb phrase, indicated by the acute accent in the data.24

(12) a. Ali  [vP bastaní    xord]
        Ali      ice-cream  ate
        ‘Ali ate ice-cream.’
     b. Ali  bastani-sh-o        [vP xórd]
        Ali  ice-cream-his-acc.      ate
        ‘Ali ate his ice-cream.’
     c. Ali  [vP xúb   bastani    mi-xor-e]
        Ali      well  ice-cream  dur.-eat-3sg.
        ‘Ali eats ice-cream well.’
     d. Ali  bastani-sh-o        [vP xúb   mi-xor-e]
        Ali  ice-cream-his-acc.      well  dur.-eat-3sg.
        ‘Ali eats his ice-cream well.’

21  Kahnemuyipour (2003) treats the negative marker as inherently focused and its stress as focus stress (see below). In other words, the stress on the negative marker is obtained differently from the other two prefixes. Note that in the context of the negative marker, when more material is added to its left, unlike with the other two prefixes, stress does not shift to the left and stays on the negative marker. I argue that this contrast follows from the focus status of the negative marker.
22  The idea that the stress on the verbal prefixes (or any other element within the verb phrase) is the result of leftmost stress may pave the way for a different analysis of non-cohering suffixes, where non-cohering suffixes are taken to be independent phonological words with the absence of prosodic prominence on them attributed to a general leftmost rule at the phrase level. This is the path I took in Kahnemuyipour (2003), but I am no longer committed to it for several reasons. For one, there are reasons to believe the leftmost prominence rule in the verb phrase does not extend to the noun phrase (see Kahnemuyipour 2009), while the non-cohering suffixes are found in all categorial domains. Also, taking functional and prosodically weak elements to constitute independent phonological words is problematic (see Hosseini 2014). The crucial point here is that non-cohering suffixes are not part of the phonological word containing the root and are thus outside the domain of word stress.
23  It is worth noting that when the non-specific NP consists of more than a single word, then the main stress of the sentence falls on the word within the NP with the highest prominence (for more details on how the notion of leftmost element is implemented, see Kahnemuyipour 2009). For example, if a non-specific object such as se tā bastani-ye bozorg (three classif. ice-cream big) ‘three big ice-creams’ is used, main stress falls on the adjective bozorg within this leftmost phrase in the vP.
24  It is worth pointing out that the sentences in (12) can be reordered in Persian via a process known as vP-preposing, placing the whole vP at the beginning of the clause. The main stress still remains on the leftmost element in the verb phrase in accordance with the system laid out above. Also, various elements in the verb phrase can be topicalized outside of the vP, thus escaping the main stress of the clause (see Kahnemuyipour 2009).

This correlation between structural height and stress behaviour manifests itself in the domain of adverbs as well. As discussed above, manner adverbs are often argued to mark the left edge of the verb phrase, leading to their receiving the main stress in Persian. Meanwhile, other adverbs such as speaker-oriented or subject-oriented adverbs are typically taken to have higher structural positions (see, for example, Cinque 1999; Jackendoff 1972). Putting these two ideas together, the expectation is that these higher adverbs should not receive main stress in Persian. This prediction is borne out. This is best illustrated by lexical items which are ambiguous between a manner reading and a subject-oriented reading. In Persian, this difference is realized as a difference in stress, as shown in the examples in (13). When sexāvatmandāne ‘generously’ is used as a manner adverb, it is inside the vP and receives

main stress (13a) but when it is used as a subject-oriented adverb, it is outside the vP and the main stress falls on the leftmost element within vP, namely the non-verbal element in the complex verb in (13b).

(13) a. Ali  sexāvatmandāné  komak  kard
        Ali  generously      help   did
        ‘Ali helped generously.’
     b. Ali  sexāvatmandāne  komák  kard
        Ali  generously      help   did
        ‘It was generous of Ali to help.’
        (adapted from Kahnemuyipour 2009: 91)

Before we end this section, it is important to note that the stress pattern illustrated here and the corresponding generalization which places the main stress on the leftmost element within the verb phrase are formulated in the context of a focus-neutral sentence, i.e. a sentence which contains all-new information (Context question: What happened?). If an element is focused in a particular sentence in Persian, that element will receive the highest prominence (for more details, see Kahnemuyipour 2009). Let us take (12d) as an example. If this sentence were to be uttered with subject focus (Context question: Who eats his ice-cream well?) or with object focus (Context question: What does Ali eat well?), then the highest prominence would be on the subject and object, respectively, as shown in (14). In this example, underlining marks focus and the highest prominence in the sentence has been marked with an acute accent.25 Note that this highest prominence is realized on the vowel which receives word stress: i in Ali in (14a) and i in bastani-sh-o in (14b). Recall that the accusative marker -o is a non-cohering suffix and does not receive stress.

(14) a. Alí  bastani-sh-o        [vP xub   mi-xor-e]
        Ali  ice-cream-his-acc.      well  dur.-eat-3sg.
        ‘It is Ali who eats his ice-cream well.’
     b. Ali  bastaní-sh-o        [vP xub   mi-xor-e]
        Ali  ice-cream-his-acc.      well  dur.-eat-3sg.
        ‘It is his ice-cream that Ali eats well.’


We started this chapter with a question about stress at the level of the word in Persian. In trying to understand the seemingly exceptional behaviour of verbs in this regard, we explored prosody at the level of the verb phrase and learned that with a correct understanding of prosody at this level, the stress behaviour of verbs is unsurprising. We cannot complete this chapter, however, without a discussion of the phonetic realization of prosody and intonation in Persian, the topic of the next section.
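The generalizations of this section can be restated as a small procedure. The following Python sketch is only an illustration under simplifying assumptions: the `PWord` class, its `in_vp` and `focused` flags, and the function `main_stress` are hypothetical names introduced here, not part of any published analysis. Membership in the vP and focus are supplied by hand rather than derived from a syntactic parse, the verbal prefixes are entered as separate phonological words, and neither the special behaviour of the negative marker nor multi-word non-specific objects are modelled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PWord:
    """One phonological word in a clause (hypothetical representation)."""
    form: str              # transcription without stress marks
    in_vp: bool            # True if the word is inside the verb phrase (vP)
    focused: bool = False  # True if the word is the focus of the sentence


def main_stress(words: List[PWord]) -> Optional[int]:
    """Return the index of the word carrying the highest prominence."""
    # A focused element receives the highest prominence.
    for i, w in enumerate(words):
        if w.focused:
            return i
    # Focus-neutral default: the leftmost phonological word in the vP.
    for i, w in enumerate(words):
        if w.in_vp:
            return i
    return None  # no focus and no vP material


# (9a): the prefix mi- counts as its own phonological word inside the vP.
ex9a = [PWord("mi", True), PWord("xor-am", True)]
# (12a): non-specific object inside the vP.
ex12a = [PWord("Ali", False), PWord("bastani", True), PWord("xord", True)]
# (12d): specific object outside the vP, manner adverb at its left edge.
ex12d = [PWord("Ali", False), PWord("bastani-sh-o", False),
         PWord("xub", True), PWord("mi", True), PWord("xor-e", True)]
# (14b): the same string as (12d), but with focus on the object.
ex14b = [PWord("Ali", False), PWord("bastani-sh-o", False, focused=True),
         PWord("xub", True), PWord("mi", True), PWord("xor-e", True)]

for ex in (ex9a, ex12a, ex12d, ex14b):
    print(ex[main_stress(ex)].form)
```

Run on these four inputs, the sketch picks mi, bastani, xub, and bastani-sh-o, in line with the acute accents in (9a), (12a), (12d), and (14b). Checking focus before the default rule mirrors the statement above that the leftmost-in-vP generalization holds only for focus-neutral sentences.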


25  Kahnemuyipour (2009) argues that in a sentence with a focused constituent, while the focused element receives the highest prominence, the element which receives main stress by the default stress rule receives secondary stress.


6.4  The phonetics of prosody and intonation in Persian

In the previous sections, we looked at the distributional properties of Persian prosody without any reference to the phonetic realization of prosody in Persian. This section turns to the phonetics of prosody in Persian and provides a brief overview of the phonetic correlates of stress in Persian as well as the intonational structure of the language (see Chapter 4 for more discussion on intonation).

Stress has been known to have several possible phonetic correlates cross-linguistically. The phonetic correlates of stress in English, for example, are typically known to be pitch, intensity, and duration (see, for example, Ladefoged and Johnson 2015; Reetz and Jongman 2009; Rogers 2000). As for Persian, until recently, only some intuitive statements could be found with respect to the phonetic realization of stress. For example, Ferguson (1957) suggests that the phonetic property involved seems to be relative loudness or intensity. Lazard (1992) takes intensity to be a relevant phonetic correlate but adds pitch as an equally important property (see also Mahootian 1997). Other scholars (for example, Haghshenas 2001; Sepanta 1977; both cited in Hosseini 2014) identified pitch (a rise in f0) as the main correlate of acoustic prominence in Persian. Meanwhile, no experimental work was done to corroborate these impressionistic claims until the study by Abolhasanizadeh, Bijankhan, and Gussenhoven (2012), which I turn to below (see also Sadeghi 2012, cited in Hosseini 2014).

Abolhasanizadeh, Bijankhan, and Gussenhoven (2012) is the first study of its kind to provide an experimental basis for the phonetic correlates of prosodic prominence in Persian. For their experiment, Abolhasanizadeh et al. came up with minimal pairs of segmentally identical words, where one member consisted of a single root and the other a root plus a non-cohering suffix (or clitic). Recall that non-cohering suffixes are not part of the phonological word and do not receive stress. This provided them with words that were minimally distinct only in the position of stress. An illustrative pair of this kind is: tābésh ‘light’ vs. tā́b-esh swing-her/his ‘her/his swing’. These target words were then placed in carrier sentences with differing syntactico-semantic conditions. In (15), I show only the two conditions discussed here, namely focus-neutral (15a) and post-focal (15b), where underlining marks focus.26

(15) a. un    tābésh-e            un    tā́b-esh-e
        that  light-is            that  swing-her/his-is
        ‘That is light.’          ‘That is her/his swing.’
     b. un    tābésh-e            un    tā́b-esh-e
        that  light-is            that  swing-her/his-is
        ‘THAT is light.’          ‘THAT is her/his swing.’

26  The examples in (15) have been modified in line with the transcription and notational conventions used in this chapter. Also, in addition to the conditions in (15), Abolhasanizadeh et al. considered a focal condition, where the target words were focused and moved clause-initially. They also considered the three conditions in interrogative sentences. I am abstracting away from these details for convenience.

The sentences were read by twelve speakers, six male and six female, and the results were analysed using Praat (Boersma 2002). The authors compared a stressed syllable with its segmentally identical unstressed syllable, e.g. esh in the two words in (15a), for pitch accent (f0), duration, intensity, and spectral measures. They found that the only significant difference between the stressed and unstressed syllables was in pitch accent (f0) and the differences in the other factors were insignificant and could be attributed to side effects of pitch accent placement. Their conclusion is that prosodic prominence in Persian is only marked by pitch (f0).27

Abolhasanizadeh et al. also consider a second question related to the condition illustrated in (15b). Their question is whether post-focal words undergo complete deaccentuation. In other words, in the examples in (15b), is the distinction between the stressed and unstressed syllables esh lost when they are used after the focused un? They conduct both production and perception experiments to test this and find that while there is post-focal compression, i.e. the pitch range is reduced, no neutralization occurs. Put differently, the stressed syllables maintain some prosodic prominence (marked by pitch) even in the context of a prosodically more prominent focused element (see footnote 25).

It is worth noting, however, that Abolhasanizadeh et al.’s second claim with respect to lack of complete deaccentuation in post-focal contexts has been questioned in more recent work by Rahmani et al. (2016). They question the design of Abolhasanizadeh et al. with respect to how focus was implemented in their experimentation, with focused words simply being printed in bold letters. Rahmani et al. incorporated the notion of focus in their experimentation much more carefully to ensure that the subjects truly treated the focused words accordingly. They concluded that indeed, as suggested by some previous scholars (e.g. Eslami 2000; Sadat-Tehrani 2007), Persian word accents are deleted in post-focal contexts.

In short, while previous phonetic descriptions of prosodic prominence in Persian seemed to be inconclusive with respect to the relevant phonetic correlates, the experimental work by Abolhasanizadeh et al. seems to clearly show that prosodic prominence is marked by pitch and any other differences that may be found are statistically insignificant.28

27  It is worth noting that under some approaches, once it is established for a particular language that the only phonetic correlate of prosodic prominence in a word is pitch features, the use of the term ‘stress’ is considered inappropriate. Beckman (1986), for example, uses the term ‘non-stress accent’ for these instances and retains the term ‘stress accent’ for those cases where other phonetic cues such as duration or intensity are also involved. From a different perspective, ‘stress’ is the term used for the prosodic prominence which is obligatory on all content words regardless of how it is phonetically realized, pitch accent or otherwise (Hyman 2006). This paper is more in line with the latter approach. The term ‘stress’ is used here simply to refer to a high level of prosodic prominence.
28  In a recent study of the phonology and phonetics of prosody in Persian, Hosseini (2014) compares nuclear (final) and pre-nuclear (non-final) accents in Persian and finds that they are phonetically distinct. Hosseini finds that the two types of accents differ in the shapes of the pitch curves and where they fall and this seems to be the most significant factor based on his production and perception experiments. In the production experiment, he found that the syllable with the nuclear accent has a longer duration than one with pre-nuclear accent but this difference did not seem to play as significant a role in perception.

With this brief discussion of the phonetic correlates of prosodic prominence in Persian, we can now turn to an overview of intonation in Persian. The most comprehensive work on Persian intonation to date is that of Sadat-Tehrani (2007), which the discussion that follows

is largely based on.29 Sadat-Tehrani (2007) used 528 utterances involving different types of simplex and complex sentences read by eight native speakers of Persian and examined the pitch track of all these utterances using Praat. He then analysed these pitch tracks in the autosegmental-metrical framework (Bruce 1977; Ladd 1996; Liberman 1975; Pierrehumbert 1980; among others). According to this framework, the tonal structure consists of phonologically significant tonal events such as pitch accents and edge tones. In this model, there are two primitive tonal levels, H(igh) and L(ow).

Sadat-Tehrani takes the smallest unit of Persian prosody to be the Accentual Phrase (AP), which typically consists of a content word and all its clitics. Each AP is associated with the pitch accent pattern L+H*, where the asterisk shows association with the stressed syllable of the word (see also Mahjani 2003). According to Sadat-Tehrani, the L+H* representation has two variants, with L+H* the default, used in polysyllabic words with final stress and H*, the variant used for initially-stressed and monosyllabic words. Sadat-Tehrani takes the next level of Persian prosody to be the Intonational Phrase (IP), which dominates one or more APs. The right edge of an IP is marked by a low or high boundary tone (L% or H%) depending on the type of sentence involved, to be elaborated below.30 Sadat-Tehrani uses these basic notions to map out the intonational structure of Persian sentences in various types of simplex and complex clauses. Below, we review some of the basic patterns he discusses (see the original work for more details).

The intonational structure of a simple declarative sentence in Persian consists of one IP and one or more APs, with everything after the Nuclear Pitch Accent (NPA) (referred to in previous sections as main stress) being deaccented. Each AP is associated with L+H* and there is an L% boundary tone at the end. The declarative pattern is exemplified in (16) (adapted from Sadat-Tehrani 2007: 8, fig. 6).31 In this example, āli receives the NPA.

(16)  L+H*     L+H*       L%
      havā     ālí        mi-sh-e
      weather  excellent  dur.-become-3sg
      ‘The weather will become excellent.’

The tonal pattern of yes/no questions is very similar to declaratives, according to Sadat-Tehrani, with the only difference that they have a H% boundary tone in contrast to the L% in declarative clauses. The location of the NPA is still similar to a declarative sentence with following material being deaccented up to the H% boundary tone. An example is given in (17) (adapted from Sadat-Tehrani 2007: 9, fig. 7). Recall from the previous section that specific objects do not receive the main stress in the sentence, and as such the verb receives the NPA in this example.

(17)  L+H*         L+H*           L+H* H%
      shāgerd-ā    miz-ā-ro       āvórd-an?
      student-pl.  table-pl-acc.  brought-3pl.
      ‘Did the students bring the tables?’

29  For additional discussions of Persian intonation, see Eslami (2000); Hayati (1998); Lambton (1957); Mahjani (2003); Mahootian (1997); Sadat-Tehrani (2009); Towhidi (1974).
30  Sadat-Tehrani (2007) argues for another intermediate level of boundary tones, l (low) and h (high), which mark the right edges of APs. He suggests that the l boundary tone marks the edge of the AP with nuclear pitch accent (what was referred to as main stress of the sentence in previous sections), while all other APs are marked by the h boundary tone. He discusses some exceptions to this generalization. In this discussion, I am abstracting away from the intermediate boundary tones and will not show them in any of the examples.
31  The examples in this section are all from Sadat-Tehrani (2007), but they have been modified in accordance with the transcription conventions used in this paper. I am also not showing the pitch tracks here.

The tonal pattern of wh-questions is different from yes/no questions and similar to declaratives in that it has a L% boundary tone. Meanwhile, the wh-word attracts the NPA of the whole IP and causes deaccentuation up to the end of the clause. This is not surprising as the wh-word behaves like a focused phrase and receives the main stress of the clause (see section 6.3, example (14)). The similarity between the tonal patterns of focused phrases and wh-phrases is also confirmed by Sadat-Tehrani who illustrates the pitch tracks for both of these constructions. An example of a wh-question is given in (18) (adapted from Sadat-Tehrani 2007: 10, fig. 8).

(18)  L+H*        L+H*                L%
      bachche-hā  az    kojā́   ketāb  xarid-an
      child-pl    from  where  book   bought-3pl
      ‘Where did the children buy books from?’

Sadat-Tehrani also discusses the tonal pattern of several types of complex clauses in Persian. The first case he considers is coordinated sentences. He shows that in coordinated clauses, each clausal conjunct behaves like a regular IP in Persian. Meanwhile, only the last IP has a L% boundary tone, expected in Persian declarative clauses. All the other IPs are realized with an ‘incomplete’ intonation pattern ending with a H% boundary tone. An example of a coordinated clause is given in (19) (adapted from Sadat-Tehrani 2007: 11, (6)). In this example, the first clausal conjunct ends in a H% boundary tone, while the second one ends in L%. Note that the NPA of the second clausal conjunct is on zang and as a one-syllable word, the pitch accent realized on it is H*.

(19)  L+H*         L+H* H%       L+H*        H*    L%
      nāme-hé      resíd    va   be-hesh     záng  zad-am
      letter-def.  arrived  and  to-her/him  bell  hit-1sg.
      ‘The letter arrived and I called her/him.’

Subordinate clauses show a pattern very similar to the coordinated clauses discussed above in that there is a H% boundary tone before the embedded clause, which itself ends in a L% boundary tone (see Chapters 3, 7, and 8 for more information on subordinate clauses). An example of a clause subordinated under the main verb ‘say’ is given in (20) (adapted from Sadat-Tehrani 2007: 13, fig. 10).

(20)  L+H*  L+H*           H%         L+H*        L+H*           L+H* L%
      amin  gofté          bud  ke    shāgerd-ā   miz-ā-ro       āvórd-an
      Amin  say.past-part  was  that  student-pl  table-pl-acc.  brought-3pl
      ‘Amin had said that the students brought the tables.’

In this section, after a brief discussion of the phonetic correlates of stress in Persian, I provided an overview of some basic tonal structures of the language. A more detailed exploration of the phonetics of prosody and intonation in the Persian language is beyond the scope of this chapter and the interested reader is referred to the original works cited above.
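The tonal patterns reviewed for single-IP sentences can be summarized procedurally. The Python sketch below is a rough illustration, not Sadat-Tehrani's own formalization: the `AP` class, the `BOUNDARY_TONE` table, and the functions `pitch_accent` and `tonal_tier` are hypothetical names introduced here. The position of the stressed syllable in each AP and the index of the AP bearing the NPA are supplied by hand and reflect my own reading of the examples. Intermediate boundary tones and multi-IP structures such as (19) and (20) are left out.

```python
from dataclasses import dataclass
from typing import List

# Boundary tone at the right edge of the IP, by sentence type.
BOUNDARY_TONE = {
    "declarative": "L%",
    "yes_no_question": "H%",
    "wh_question": "L%",
}


@dataclass
class AP:
    """An Accentual Phrase (hypothetical representation)."""
    form: str
    stressed_syllable: int  # 1-based position of the stressed syllable


def pitch_accent(ap: AP) -> str:
    # H* for monosyllabic and initially-stressed words, L+H* otherwise.
    return "H*" if ap.stressed_syllable == 1 else "L+H*"


def tonal_tier(aps: List[AP], npa: int, sentence_type: str) -> List[str]:
    """Tones of one IP; `npa` is the index of the AP with the nuclear accent."""
    # APs up to and including the NPA keep their pitch accent;
    # everything after the NPA is deaccented.
    tones = [pitch_accent(ap) for ap in aps[: npa + 1]]
    tones.append(BOUNDARY_TONE[sentence_type])
    return tones


# (16): declarative, NPA on the second AP (assumed stress positions).
ex16 = [AP("havā", 2), AP("āli", 2), AP("mi-sh-e", 1)]
# (17): yes/no question, NPA on the verb.
ex17 = [AP("shāgerd-ā", 3), AP("miz-ā-ro", 2), AP("āvord-an", 2)]
# (18): wh-question, NPA on the wh-phrase.
ex18 = [AP("bachche-hā", 3), AP("az kojā", 3), AP("ketāb", 2), AP("xarid-an", 2)]

print(tonal_tier(ex16, 1, "declarative"))
print(tonal_tier(ex17, 2, "yes_no_question"))
print(tonal_tier(ex18, 1, "wh_question"))
```

For these three inputs the sketch returns the tone sequences shown above the transcriptions in (16), (17), and (18). A fuller model would need one such tier per IP, with H% on every non-final IP in coordinated and subordinate structures.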

6.5  Conclusion

In this chapter, I have provided a brief overview of the prosody of Persian. We took our starting point to be word stress and the seemingly unsystematic behaviour of the distribution of prosody at the word level. To begin with, it looked like several words in various categories defied the otherwise general word-final stress rule. We showed that these cases can be accounted for with a correct understanding of the morphosyntax of Persian and a distinction between cohering and non-cohering suffixes. Under this view, the word-final stress rule applies to the phonological word which includes cohering suffixes, but excludes non-cohering ones. We further explored the apparently non-final stress in prefixed Persian verbs and showed that their behaviour is by no means exceptional and falls within a more general pattern of leftmost stress in the verb phrase. This view of stress in the verb phrase coupled with a better understanding of the syntax of Persian enabled us to explain the stress pattern differences between specific and non-specific objects and different types of adverbials. This still left us with just a handful of words with exceptional non-final stress. Several possible generalizations were explored in this regard, while none enabled us to explain away the idiosyncratic nature of the stress in all of these forms. We ended the discussion of Persian prosody with a brief overview of the phonetic correlates of prosodic prominence and intonational patterns in Persian.

Part III


Chapter 7

Generative Approaches to Syntax

Simin Karimi

7.1  Introduction

Persian is a member of the Southwestern Iranian language family spoken in Iran (Farsi), Afghanistan (Dari), and Tajikistan (Tajiki). Various aspects of this language, specifically its syntactic properties, have attracted the attention and interest of traditional grammarians and modern linguists. The goal of this chapter is to offer a descriptive overview of some of the major syntactic and morphosyntactic properties of Persian, and to introduce the reader to the major literature provided by grammarians and linguists inside and outside Iran.

Persian grammars in the twentieth century were primarily written for the purpose of language learning, and consisted of descriptions of various aspects of this language. One of the first such grammars was authored by Mirza Habib Esfahani, published in 1910 in Istanbul (Natel-Khanlari 1986). The first grammar that was actually used in schools was a booklet written by Mirza Qarib in 1911 (Windfuhr 1979). One of the most influential grammars was a book known as dastur-e panj ostâd ‘grammar of five masters’, written by Qarib and four of his colleagues, published in 1950.1 This book offers descriptive discussions of various properties of the language, including the syllable structure, nouns, verbs, numbers, and the Ezafe morpheme.2 It also includes descriptions and examples of what they call goftar ‘complete sentence’ and soxan ‘incomplete sentence’. Almost all the data presented in this grammar are borrowed from the works of famous Iranian poets.

1  Qarib, Bahar, Foruzanfar, Homayi, and Rashidi are the authors of dastur-e panj ostâd ‘grammar of five masters’ mentioned above.
2  See section 7.4.1 in this article for a discussion of this element.

Another influential work within the grammarian tradition is Dastur-e Zabân-e Farsi ‘Grammar of Persian language’ by Natel-Khanlari (1984). This book is specifically devoted to a descriptive analysis of Persian morphology and syntax, including sentence types (interrogative, imperative, complex), as well as verbal and nominal morphology, and has a section on specific lexical variations such as bâyad, bâyast, and bâyesti, special phrasal modifiers, and

‘incorrect’ application of the morpheme -râ. The majority of data discussed in this work are also borrowed from famous classical texts and poetry.3

There are a number of Persian grammar books written by non-Iranian scholars, with the same pedagogical goal in mind. These works include Lambton (1953), Lazard (1992), and Thackston (1993). They are primarily based on Persian formal written language, although Thackston has examined aspects of the spoken language as well.

The first description of Persian syntax based on modern linguistics criteria is Bateni (1969). This work rests on a description and analysis of 11,000 Modern Persian sentences taken from newspapers, journals, various writings, as well as the colloquial language. The discussion includes analyses of the structure of main and subordinate sentences, verb phrases, noun phrases, and adverbial phrases. Another grammar based on modern linguistics is Meshkoatoddini (1987). This work addresses various aspects of Persian syntax within the framework of Transformational Grammar (Chomsky 1965). Mahootian’s (1997) widely cited work is a linguistics-based comprehensive analysis of contemporary conversational Persian, and consists of discussions of major grammatical aspects of this language, including its syntactic and morphological properties.

During the last half-century, specific aspects of Modern Persian have been analysed within various theoretical frameworks. Based on the number of dissertations and published work in the last several decades, it is evident that the syntactic and morphosyntactic properties of Persian have attracted more attention than other aspects of this language.

The organization of this chapter is as follows. Section 7.2 discusses the clausal and phrasal architecture of Persian, including word order variations, scrambling, and wh-constructions. One of the most discussed properties of Persian syntax is the so-called complex predicates (CPr). This topic is addressed in section 7.3. Persian noun phrases, including their simple and complex versions, as well as the Ezafe construction and classifiers are the topics of section 7.4. Another property of Persian, shared by some other languages such as Turkish, is the differential object marking. The nature of this property, and the role of the morpheme -râ, are reviewed in section 7.5. Passive, causative, and resultative constructions are subjects of section 7.6, followed by a discussion of raising and control in section 7.7. Section 7.8 is devoted to a number of topics that have not been explored widely in the literature: modality, aspect, and negation, as well as ellipsis and sluicing. Section 7.9 concludes this chapter. As for the data taken from other sources, I have retranscribed, reglossed, and retranslated them in some cases for the purpose of uniformity.

7.2  Clausal architecture, scrambling, and wh-constructions

A description of the basic clausal architecture of Persian is offered in 7.2.1, followed by a discussion of scrambling, one of the major syntactic properties of this language, in 7.2.2. Wh-constructions are discussed in 7.2.3. For further information on scrambling, see Chapter 3, and for wh-constructions, see Chapter 5.

3  Other grammar books within the grammarian tradition devoted to Persian morphology and syntax include Vazinpour (1976) and Mogharrabi (1993). Again, these works base their descriptions on formal written language and Persian poetry.

7.2.1 Clausal architecture

Persian is a head-initial language with the exception of the verb phrase.4 The word order in a discourse neutral sentence is the one exemplified by (1a) with a specific direct object5 and (1b) with a non-specific direct object.6

(1) a. mâdar  dâstân-e hezâr-o      yek shab-ro  barâ bachche-hâ tarif       kard
       mother story-Ez thousand-and one night-râ for  child-pl   description did
       ‘The mother told the children the story of a thousand and one nights.’
    b. mâdar  barâ bachche-hâ se-tâ    dâstân tarif       kard
       mother for  child-pl   three-Cl story  description did
       ‘The mother told the children three stories.’

As these data suggest, the verb appears in the final position. However, the complement clause to the verb follows the verb, as in (2).

(2) a. man be Parviz goft-am  [CP (ke) Kimea fardâ    mi-r-e]
       I   to Parviz said-1sg [   that Kimea tomorrow asp-go-3sg]
       ‘I told Parviz that Kimea will go tomorrow.’
    b. man did-am  [CP (ke) Parviz bâ   Kimea harf mi-zad]
       I   saw-1sg [   that Parviz with Kimea talk asp-hit-3sg]
       ‘I saw that Parviz was talking with Kimea.’

Note that the complementizer ke ‘that’ is optional in these examples. Also, the verbal concept in the embedded clause is realized as a complex unit. The same word order holds for subordinate clauses indicating indirect questions.

(3) man az Parviz porsid-am [CP (ke) Kimea key  mi-r-e]
    I   of Parviz asked-1sg [   that Kimea when asp-go-3sg]
    ‘I asked Parviz when Kimea will go.’

Persian word order changes drastically when the information conveyed by the sentence interacts with discourse phenomena. This is the subject of the next section. (For more discussion on word order, see also Chapters 3, 8, 10, and 15.)

4  See Karimi (1989); Darzi (1996); Mahootian (2007); among others, for discussions of Persian word order.
5  See section 7.5 for a descriptive discussion of the properties of specific and non-specific direct objects in Persian.
6  Abbreviations: Ez: Ezafe affix; -râ: Accusative marker for specific objects; sg: singular; pl: plural; asp: aspect; neg: negation; subj: subjunctive; Cl: classifier; rel: relative; ind: indefinite; emph: emphatic.


7.2.2 Scrambling

Scrambling is a syntactic property observed in many languages (Persian, Turkish, Japanese, Korean, Hindi, to name a few). In these languages, phrasal categories may appear in different positions in the clause. Karimi (1999, 2005) and Rasekh-Mahand (2003) offer elaborated discussions of this phenomenon in Persian. The following clausal architecture is based on Karimi’s (2005) proposal.

(4) [CP [TopP [FocP [TP [T’ [ [vP [v’ [XP [X’ ]] v ]]]]]]]]

Karimi suggests that both TopP and TP host the topic in this language; thus a focal element may precede or follow the topic (in Spec of TP), or appear between two topics. Consider the following data in which the scrambled element reveals either a focus or a topic interpretation, depending on the intonation. Unstressed scrambled elements reveal a topic interpretation, while their stressed versions receive a focus reading.7 The bold ‘e’ in these examples represents the original position of the moved element.

(5) Scrambling of the specific object over the subject
    ketâb-o Parviz e barâ Kimea xarid        O-râ S IO V
    book-râ Parviz   for  Kimea bought
    ‘As for the book, Parviz bought (it) for Kimea.’ Or ‘It was the BOOK that Parviz bought for Kimea.’

(6) Scrambling of the indirect object over the subject
    barâ Kimea Parviz ketâb-o e xarid        IO S O-râ V
    for  Kimea Parviz book-râ   bought
    ‘As for Kimea, Parviz bought (her) the book.’ Or ‘It was for KIMEA that Parviz bought the book.’

Note that the direct object and the indirect object may both scramble to precede the subject. In this case, one of them may receive a topic interpretation and the other a focus reading. These elements may appear in different orders depending on the position of the topic, as in (7a,b).

(7) Scrambling of the direct and indirect object over the subject
    a. [ketâb-o]₁ [barâ Kimea]₂ Parviz e₁ e₂ xarid        O-râ IO S V
    b. [barâ Kimea]₂ [ketâb-o]₁ Parviz e₁ e₂ xarid        IO O-râ S V




Non-​specific objects may also scramble. They usually receive a contrastive reading when scrambled, as in (8a), although they may also reveal a topic interpretation, as in (8b).


7  See also Ganjavi (2003), who claims that Persian scrambling is not driven by focus.

(8) a. KETAB Parviz barâ Kimea e xarid        O S IO V
       BOOK  Parviz for  Kimea   bought
       ‘Parviz bought BOOKS for Kimea.’ (as opposed to journals)
    b. gusht Kimea hichvaght e ne-mi-xor-e        O S Adverb V
       meat  Kimea never       neg-asp-eat-3sg
       ‘As for meat, Kimea never eats (it).’

Persian also exhibits long-distance scrambling. That is, elements of the subordinate clause, with the exception of the verb, may move into the main clause, as the following data reveal.

(9) Long-distance scrambling of the embedded subject
    Kimea man fekr    mi-kon-am  [(ke) e in   dâstân-o dust   dâr-e]
    Kimea I   thought asp-do-1sg  that   this story-râ friend have-3sg
    ‘As for Kimea, I think that (she) likes this story.’ Or ‘It is KIMEA I think that (she) likes this story.’ (as opposed to Parviz)

(10) Long-distance scrambling of the embedded specific direct object
     in   dâstân-o man fekr    mi-kon-am  [(ke) Kimea e dust   dâr-e]
     this story-râ I   thought asp-do-1sg  that Kimea   friend have-3sg
     ‘As for this story, I think that Kimea likes (it).’ Or ‘It is this STORY I think that Kimea likes.’

(11) Long-distance scrambling of the embedded indirect object
     be Kimea man fekr    mi-kon-am  [(ke) Arezu un   ketâb-o e dâd]
     to Kimea I   thought asp-do-1sg  that Arezu that book-râ   gave-3sg
     ‘To Kimea I think that Arezu gave that book.’ Or ‘It was to Kimea I think that Arezu gave that book.’

Similar to simple sentences, long-distance scrambling of multiple arguments is also possible, as exemplified by the following sentences. Again, the interpretation of the dislocated elements is based on their stress pattern.

(12) a. [ketâb-â-ro]₁ [be Kimea]₂ man fekr mi-kon-am [(ke) Arezu e₁ e₂ dâd]
     b. [be Kimea]₂ [ketâb-â-ro]₁ man fekr mi-kon-am [(ke) Arezu e₁ e₂ dâd]

Adjuncts undergo scrambling as well.

(13) tu sinamâ        man fekr    mi-kon-am  [(ke) Parviz Kimea-ro e did]
     in movie theatre I   thought asp-do-1sg  that Parviz Kimea-râ   saw
     ‘As for in the movie theatre, I think Parviz saw Kimea (there).’ Or ‘It was in the movie theatre I think that Parviz saw Kimea.’

Next, the displacement of wh-arguments and adjuncts will be discussed.

7.2.3 Wh-constructions

Persian does not exhibit structural wh-movement. That is, wh-phrases may stay in situ, and yet receive a wh-interpretation.8

(14) a. Parviz chi-ro  be Arezou dâd
        Parviz what-râ to Arezou gave
        ‘What did Parviz give to Arezou?’
     b. Parviz ketâb-o be ki  dâd
        Parviz book-râ to who gave
        ‘Who did Parviz give the book to?’
     c. Parviz ketâb-o key  be Arezou dâd
        Parviz book-râ when to Arezou gave
        ‘When did Parviz give the book to Arezou?’

Wh-arguments and adjuncts, however, may scramble. This movement has been suggested to place the wh-phrase in the Specifier of a focus phrase (Karimi 1999, 2005; Megerdoomian and Ganjavi 2000; Karimi and Taleghani 2007; Toosarvandani 2008).9

(15) a. chi-ro  Parviz e be Arezou dâd
        what-râ Parviz   to Arezou gave
        ‘What was it that Parviz gave to Arezou?’
     b. be ki   Parviz ketâb-o e dâd
        to whom Parviz book-râ   gave
        ‘To whom was it that Parviz gave the book?’
     c. key  Parviz ketâb-o e be Arezou dâd
        when Parviz book-râ   to Arezou gave
        ‘When was it that Parviz gave the book to Arezou?’

Note that scrambling of wh-phrases is subject to the superiority condition (Karimi 1999, 2005; Kahnemuyipour 2001; Lotfi 2003; Karimi and Taleghani 2007). That is, a wh-phrase may not cross another one in a higher position.

8  Dari, a variant of Persian spoken in Afghanistan, has an interesting property. While the wh-phrase is in situ, another wh-phrase appears in sentence initial position.

   (i) chi  fekr    mi-kon-i   [u   ki-râ  did]?
       what thought asp-do-2sg  she who-râ saw
       ‘Who do you think she saw?’ Lit: what do you think who she saw?

   See Karimi and Taleghani (2007) for an analysis.
9  Raghibdust (1994) suggests, however, that the dislocated wh-phrases represent topic.

(16) a. ki  chi  xarid?
        who what bought
        ‘Who bought what?’
     b. *chi ki


(16a) has a pair-reading interpretation. The answer is something like ‘Kimea bought a book, Parviz a shirt, Arezou a pair of shoes’ (Kahnemuyipour 2001; Lotfi 2003; Karimi 2005). Long-distance scrambling is also possible, and is subject to the superiority condition.10

(17) a. kiᵢ pro fekr    mi-kon-i   [CP eᵢ bâ   ki  be-raghs-e]
        who     thought asp-do-2sg        with who subj-dance-3sg
        ‘Who is it you think will dance with whom?’
     b. *[bâ ki]ₖ pro fekr mi-kon-i [CP ki tₖ be-raghs-e]

Note that the sentence in (17b) is grammatical if the wh-phrase in situ is not stressed. In that case, it is interpreted as an indefinite DP with no quantificational force, similar to ‘someone’ in English (Karimi 1999). Furthermore, the surface order of two scrambled wh-phrases is subject to a certain restriction: they must appear in the same order as those in situ.

(18) a. [ki]ᵢ [bâ   ki]ₖ pro fekr    mi-kon-i   [CP eᵢ eₖ be-raghs-e]
        who   with who      thought asp-do-2sg           subj-dance-3sg
        Lit: ‘Who with whom is it you think will dance?’
     b. *[bâ ki]ᵢ [ki]ₖ pro fekr mi-kon-i [CP eᵢ eₖ be-raghs-e]

The indirect object precedes the subject in (18b), rendering the sentence ungrammatical. The next section is devoted to a descriptive discussion of Persian complex predicates, one of the most discussed properties of Persian syntax.
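The ordering restriction illustrated in (16) and (18) can be pictured with a small Python sketch. It is only an illustration under simplifying assumptions: the function name `same_relative_order` is hypothetical, wh-phrases are treated as plain strings, and only linear order is checked. It does not model the structural notion of superiority, nor the indefinite reading of an unstressed wh-phrase noted for (17b).

```python
def same_relative_order(in_situ, surface):
    """Return True if the wh-phrases keep their in-situ relative order on the surface.

    in_situ: wh-phrases listed from structurally higher to lower (an assumption
             supplied by the caller, not computed here).
    surface: the surface string as a list of phrases; non-wh items are ignored.
    """
    rank = {phrase: i for i, phrase in enumerate(in_situ)}
    ranks = [rank[phrase] for phrase in surface if phrase in rank]
    return ranks == sorted(ranks)


# (16): 'ki' (who, subject) is higher than 'chi' (what, object).
print(same_relative_order(["ki", "chi"], ["ki", "chi", "xarid"]))   # True
print(same_relative_order(["ki", "chi"], ["chi", "ki", "xarid"]))   # False

# (18): both wh-phrases scrambled out of the embedded clause.
print(same_relative_order(["ki", "bâ ki"], ["bâ ki", "ki", "pro", "fekr mi-kon-i"]))  # False
```

The check deliberately reduces the condition to a comparison of two orders, which is all the data in (16) to (18) require; a fuller treatment would need a representation of hierarchical positions.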

7.3  Complex predicate constructions

Complex predicates (CPr) are verbal complex constructions consisting of more than one word which convey information that is normally expressed by a single verb in a language like English.11 In (19b), for example, the Persian complex predicate shekast dâd ‘defeat gave’ corresponds to the English verb defeated in (19a). These constructions typically consist of a light verb (LV) and a non-verbal element (NVE) (dâd and shekast, respectively, in the Persian example). Complex predicates are further discussed in Chapters 2, 3, 8, 9, 10, 15, 17, and 19.

10  Persian is a Null-Subject language, hence the presence of pro in the subject position.
11  These elements are also called compound verbs (e.g. Dabir-Moghaddam 1995) and phrasal verbs (e.g. Bateni 2007) in the literature.

(19) a. Mary defeated John.
     b. Mary John-ro shekast dâd.
        Mary John-râ defeat  gave
        ‘Mary defeated John.’

Complex predicates are particularly common in South Asia (among Turkic, Indic, and Iranian languages) as well as in Northern Australia and some parts of Papua New Guinea. Due to the syntactic and morphological peculiarities of these elements, this topic has received extensive attention in the last few decades. In this regard, the role of different components of the complex predicate, and their contribution to the meaning and argument structure of the whole, have been the focus of many research projects. Thus alternative views have been proposed regarding the relation between the NVE and the LV in these constructions. Some have interpreted a nominal NVE as the internal argument of the light verb (Lieber 1980). Others have considered the formation of complex predicates as syntactic incorporation, by which one semantically independent word comes to be inside the other (Baker 1988, 1996). Grimshaw and Mester (1988) suggested that light verbs are semantically deficient, and serve as a host for agreement and tense morphology. They argue that the nominal NVE of the complex predicate lends its arguments to the light verb, turning it into a theta marker. Finally, Mohanan (1997) suggests an argument-sharing theory based on Hindi.

In Persian, complex predicates have gradually replaced simple verbs since the thirteenth century. The tendency to form complex verbs has resulted in the existence of two sets of verbs, simple and complex, for a number of verbal concepts. In many cases, the application of the simple verb is restricted to the written and elevated language (Dabir-Moghaddam 1995; Karimi 1997). The productivity of CPr formation is such that it has completely replaced the former morphological rule of simple verb formation in this language (Bateni 1989). Thus it is not surprising that this topic has received enormous attention from linguists interested in Persian syntax.12

In this section, I discuss the properties of the light verb and the non-verbal element of Persian complex predicates in 7.3.1 and 7.3.2, respectively. I conclude this section with a brief introduction to an interesting construction that involves a special case of complex predicates.

7.3.1 Light verbs

The light verb of a Persian complex predicate ranges over a number of simple verbs (Karimi 1997; Megerdoomian 2012a). These verbs include, among others, kardan ‘doing, making’, shodan ‘becoming’, xordan ‘colliding’, gereftan ‘catching, taking’, zadan ‘hitting’, keshidan ‘pulling’, dâdan ‘giving’, dâshtan ‘having’, âmadan ‘coming’, raftan ‘going’, and bordan ‘carrying’. Following Grimshaw and Mester (1988), some authors have suggested that the Persian LV is semantically bleached (Mohammad and Karimi 1992; Karimi-Doostan 2005), and serves as a Case assigner (Karimi-Doostan 2005). Others have proposed that the choice of LV determines whether or not the CPr selects for an agent (Karimi 1997; Megerdoomian 2002a; Folli, Harley, and Karimi 2005).13 This is shown in the following contrasts: the light verb dâdan ‘to give’ is agentive, while xordan ‘to collide’ is unaccusative.

(20) a. dolat      mardom-o  farib     dâd
        government people-râ deception gave
        ‘The government deceived the people.’
     b. mardom farib     xord-an
        people deception collided-3pl
        ‘The people got deceived.’

Similarly, the causativity of CPr is suggested to be determined by the LV (Megerdoomian 2002b; Folli, Harley, and Karimi 2005).

(21) a. âb    be jush âmad
        water to boil came
        ‘The water boiled.’
     b. Nimâ âb-ro    be jush âvard
        Nima water-râ to boil brought
        ‘Nima boiled the water.’        (Megerdoomian 2002b)

12  Various aspects of complex verb constructions have been discussed by Moyne (1970); Tabaian (1979); Bashiri (1981); Barjesteh (1983); Karimi (1997, 2005); Heny and Samiian (1991); Mohammad and Karimi (1992); Sadeghi (1993); Ghomeshi and Massam (1994); Dabir-Moghaddam (1995); Vahedi-Langrudi (1996); Karimi-Doostan (1997, 2005, 2008, 2011); Megerdoomian (2002a, 2002b, 2012a); Goldberg (2003); Folli, Harley, and Karimi (2005); Family (2006); Samvelian (2006, 2012); Toosarvandani (2009); Sedighi (2009); Müller (2010); Pantcheva (2010); Shabani-Jadidi (2014); Samvelian and Faghiri (2013); among others.

Finally, it has been suggested that the LV is responsible for the eventiveness (Folli, Harley, and Karimi 2005) and the duration (Megerdoomian 2002a; Folli, Harley, and Karimi 2005) of the CPr. Compare the following data.

(22) a. be yâd    dâshtan
        to memory have
        ‘To have (something) in the memory.’
     b. be yâd    âvardan
        to memory bring
        ‘To remember.’

(23) a. dast zadan
        hand hit
        ‘to touch’
     b. dast keshidan        Duration
        hand pulling
        ‘to touch’

13  See Samvelian (2006a) and Samvelian and Faghiri (2013) for a different view regarding this issue.

I continue the discussion of the Persian CPr with an overview of some of the properties of their non-verbal elements.

7.3.2 Non-verbal elements

The Persian NVE ranges over a number of different elements.

(24) a. Nominal            dast zadan      (hand hitting)     ‘to touch’
     b. Eventive Nominal   shekast dâdan   (defeat giving)    ‘to defeat’
     c. Adjective          tamiz kardan    (clean doing)      ‘to clean’
     d. Adverbial          birun kardan    (out doing)        ‘to fire (someone)’
     e. Particle           pas dâdan                          ‘to return’
     f. PP                 be bâd dâdan    (to wind giving)   ‘to waste’

The Persian NVE seems to be syntactically independent of the LV, since it can be modified, scrambled, and elided.14 The example in (25) shows that the NVE kotak ‘beating’ is syntactically modified by the adjective bad ‘bad’ in an Ezafe construction. Note, however, that this element receives an adverbial interpretation, modifying the CPr kotak xord ‘got beaten’ (Karimi 1997; Megerdoomian 2012a; Karimi, Key, and Tat 2014). Thus the actual meaning of the sentence in (25) is ‘he was beaten badly’.

(25) kotak-e    bad-i   xord
     beating-Ez bad-ind ate
     ‘He got a bad beating.’        (Megerdoomian 2012a: 196)

Furthermore, the NVE may scramble away from the LV, as in (26). Again, even though che ‘what’ and saxti ‘hard’ syntactically modify the NVE zamin ‘earth’, semantically they modify the whole CPr, as the English translation indicates.

(26) Kimea [che  zamin-e  saxti]ᵢ diruz     [CV eᵢ xord]
     Kimea  what earth-Ez hard    yesterday        collided
     ‘What a hard fall Kimea had yesterday.’        (Karimi 1997: 304)

The NVE is also subject to ellipsis, as in (27).

(27) Kimea faghat man-o da’vat     karde, to-ro  ke   __ na-karde
     Kimea only   me-râ invitation did,   you-râ emph __ neg-did
     ‘Kimea has only invited me, not you.’        (Karimi 1997: 301)

14  See Karimi-​Doostan (2011) for a discussion of separability of the Persian nominal NVE. This author suggests that some nominal NVEs may function as the direct object of the verb, in which case they may be modified by an adjective, be relativized, scrambled, and focused.

Furthermore, the NVE is considered to determine the telicity of the complex predicate. That is, eventive nominals, adjectives, particles, and prepositional phrases are responsible for the telic (accomplishment and achievement) interpretation of the CPr, while nominal elements provide an atelic (activity or semelfactive) interpretation for them (Folli, Harley, and Karimi 2005).

Finally, one of the issues discussed in the literature is whether the nominal NVE and the non-specific object receive a uniform treatment, or whether they should be considered as belonging to two distinct categories. Ghomeshi and Massam (1994) argue in favour of the former, based on evidence suggesting that they both give rise to unbounded predicates, while the specific object reveals bounded properties. Compare the following data:

(28) a. man barâ-ye ye sâ’at/*ye sâ’ate bâ Parviz harf zad-am
        I for-Ez an hour/within an hour with Parviz letter hit-1sg
        ‘I talked with Parviz for an hour/*in an hour.’
     b. man barâ-ye ye sâ’at/*ye sâ’ate ketâb xund-am        Non-specific object
        I for-Ez an hour/within an hour book read-1sg
        ‘I read books for an hour/*in an hour.’
     c. man *barâ-ye ye sâ’at/ye sâ’ate un ketâb-o xund-am        Specific object
        I for-Ez an hour/within an hour that book-râ read-1sg
        ‘I read that book *for an hour/in an hour.’

However, as shown by Karimi-Doostan (1997), Folli, Harley, and Karimi (2005), and Megerdoomian (2012a), the nominal NVE may also appear in bounded predicates.

(29) kâr-esh *barâ-ye ye sâ’at/ye sâ’ate anjâm gereft
     work-her for-Ez an hour/within an hour accomplish caught
     ‘Her work was taken care of *for an hour/in an hour.’


Furthermore, the non-specific object and the verb can appear in an Ezafe construction, while this is not possible in the case of an NVE and LV, as the contrast between (30b) and (31b) reveals (Karimi 1997).

(30) a. Kimea be Râmin ketâb dâd
        Kimea to Ramin book  gave
        ‘Kimea gave (a) book to Ramin.’
     b. dâdan-e   ketâb be Râmin dorost na-bud
        giving-Ez book  to Ramin right  neg-was
        ‘Giving books to Ramin was not right.’

(31) a. Kimea be radio gush dâd
        Kimea to radio ear  gave
        ‘Kimea listened to the radio.’
     b. *dâdan-e  gush be râdio dorost na-bud
        giving-Ez ear  to radio right  neg-was

Finally, the non-specific object of a verb can undergo scrambling to the sentence initial position without the need to have a quantificational quality. This derivation is blocked with respect to the NVE of a complex predicate (Karimi 1997).

(32) a. Kimea mehmun da’vat     kard
        Kimea guest  invitation did
        ‘Kimea invited guests.’
     b. [mehmun]ᵢ Kimea eᵢ da’vat kard
     c. *[da’vat]ᵢ Kimea mehmun eᵢ kard

The data presented above indicate that the non-specific object of a heavy verb is syntactically distinct from the NVE of a complex predicate.

7.3.3 Impersonal complex predicates

There is a class of Persian complex predicates, generally known as impersonal constructions, that exhibit interesting properties. One of the peculiarities of these constructions is that the optional DP in the clause initial position is co-indexed with a clitic that is attached to the NVE of the CPr. Furthermore, the verb is always in third-person singular, regardless of the person and number of the optional DP. Karimi (2005) discusses two types of these constructions, calling them inalienable possessor constructions and inalienable pseudo-possessor constructions, exemplified by (33) and (34), respectively. The difference between the two is that the former has BE as its LV, while the latter has an unaccusative LV other than BE.

(33) Inalienable possessor constructions
     a. (manᵢ) gorosn-amᵢ-e
        I      hungry-me-is
        ‘I am hungry.’
     b. (Kimeaᵢ) sard-eshᵢ-e
        Kimea    cold-her-is
        ‘Kimea is cold.’

(34) Inalienable pseudo-possessor constructions
     a. (man) az in   rang   xosh-am     mi-yâ-d
        I     of this colour pleasure-me asp-come-3sg
        ‘I like this colour.’        (Karimi 2005: 78)
     b. (bachche-hâ) xeyli dard-eshun mi-yâd
        child-pl     very  pain-them  asp-come-3sg
        ‘The children are hurting badly.’        (Karimi 2005: 83)

Considering constructions involving psych verbs such as fear, Harley (1999) expresses the intuition that HAVE is in fact a prepositional element incorporated into a verbal be. Based on such analysis, Karimi (2005) suggests the structure in (35) for the inalienable possessor construction in (33): the adjective HUNGER moves into HAVE, providing the interpretation ‘possessing hunger’. Furthermore, the copy of the possessor DP in the Specifier of the PredP appears as phi-features on the root HUNGER, and is realized as a clitic pronoun attached to the root.

(35) [Tree diagram. Node labels: vP, v’, v BE, PredP, DP manᵢ ‘I’, ROOT, HUNGER + phi-features, HAVE; relation marked: Agree.]

As for the inalienable pseudo-possessor constructions in (34), Karimi proposes the structure in (36), similar to that in (35). The main difference between the two constructions is the choice of LV, as mentioned before.

(36) [Tree diagram. Node labels: vP, v’, v mi-yâd ‘comes’, NP, PP, manᵢ ‘I’, az in rang ‘of this color’, xosh + phi-features; relation marked: Agree.]

Yadgar Karimi (2013) defines the second type of impersonals, exemplified by (34), as a CPr consisting ‘of a psychological state nominal and an unaccusative verb which is predicated of an experiencer argument’.15 Note that some linguists have argued that impersonal constructions do not belong to the category known as complex predicates (Dabir-Moghaddam 1997; Sedighi 2005, 2009) (see also Chapters 3 and 15). As Karimi (2013) states, the structure one assumes for impersonal constructions, as well as the theoretical assumptions one holds to define the class of complex predicates, dictate whether or not impersonals form a subclass of the general CPr category.

7.4  Persian noun phrases

This section is devoted to an overview of scholarship on the structure of Persian noun phrases. The first subsection provides a discussion of the Ezafe construction and the literature on this phenomenon. Subsection 7.4.2 offers an examination of Persian complex noun phrases, followed by a brief discussion of classifier(s) in this language in 7.4.3. For more discussion on noun phrases, see Chapters 3, 8, and 9.

7.4.1 Ezafe construction

One of the most intriguing properties of the Persian noun phrase is the Ezafe construction, a morphosyntactic phenomenon that ranges over several constructions inside the DP. Thus it is not surprising that this subject has received enormous attention from grammarians and linguists over several decades. The study of the Ezafe construction goes back to Phillott (1919), Qarib et al. (1950), Lazard (1957), Palmer (1971), Natel-Khanlari (1972), and continues to the present day.16 I start this section with a descriptive introduction to the Ezafe construction, and continue with a review of some theoretical analyses of this construction. The Ezafe construction is further discussed in Chapters 2, 3, 6, 7, 8, 9, 10, and 19.

What is the Ezafe construction?

Literally, Ezafe means ‘addition’, and is derived from the Arabic idafa(t) (Karimi and Brame 2012). It has been suggested that its origin in Modern Persian can be traced back to the Old Persian relative/demonstrative hya/tya (Samvelian 2007, based on Darmesteter 1883). This element gives rise to an extremely common construction in Persian. The Ezafe affix, which surfaces as -e or -ye (following vowels), can attach to a wide range of category types, including count nouns, mass nouns, pronouns, adjectives, some prepositions, verbal nouns, past participles, and quantifiers. It cannot, however, be affixed to verbs, adverbs, certain prepositions, or conjunctions. The Ezafe construction is employed to express modification, possession, origin, material, specification, and more. Here are some examples.

(37) a. mard-e xub                  (Palmer 1971: 11)
        man-Ez good
        ‘Good man’
     b. otâq-e  Ali                 (Samiian 1983: 39)
        room-Ez Ali
        ‘Ali’s room’
     c. zir-e    miz                (Samiian 1983: 273)
        under-Ez table
        ‘Under the table’
     d. montazer-e Jiân             (Ghomeshi 2006: 27)
        waiting-Ez Jian
        ‘Waiting for Jian.’
     e. shahr-e Kermân              (Ghomeshi 2007: 730)
        city-Ez Kerman
        ‘The city Kerman’
     f. hame-ye doxtar-hâ           (Karimi and Brame 2012: 124)
        all-Ez  girl-pl
        ‘All (of) the girls’

15  See Yadgar Karimi (2013) for a different proposal representing the structure of the impersonal construction in (34).
16  The Ezafe construction has also been examined in other Iranian languages. See, for example, Holmberg and Odden (2005, 2008) for Hawrami; Larson and Yamakido (2006) for Zazaki; Samvelian (2008) for Kurmanji and Sorani; among others. In some of these languages, the form of the Ezafe varies depending on specific properties of the noun, including its phi features.

Several modifiers may appear in the same noun phrase, each one linked to the previous element by the Ezafe affix. The possessor, if present, is always the final nominal in these constructions. Since the Ezafe affix links two elements, it cannot attach to the last element within the noun phrase (38c).

(38) a. otâg-e  kuchik-e zir-e    shirvani-ye Ali
        room-Ez small-Ez under-Ez roof-Ez     Ali
        ‘Ali’s small room under the roof.’        (Samiian 1983: 39)
     b. mâshin-e man
        car-Ez   I
        ‘my car’
     c. ketâb-e ru-ye miz-(*e)
        book-Ez on-Ez table-(*Ez)
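The surface pattern described above (the affix appears as -ye after a vowel-final host and -e otherwise, links each element to the next, and never attaches to the final element) can be sketched in a few lines of Python. This is an illustration only: the function names and the vowel inventory of the transcription are assumptions made for the example, and the sketch does not check which categories may host the affix.

```python
import unicodedata

# Vowel letters of the transcription used in the examples (an assumption).
VOWELS = ("a", "â", "e", "i", "o", "u")


def ezafe_form(word):
    """Attach the Ezafe affix: -ye after a vowel-final word, -e otherwise."""
    word = unicodedata.normalize("NFC", word)  # keep 'â' as a single character
    suffix = "-ye" if word.endswith(VOWELS) else "-e"
    return word + suffix


def ezafe_chain(words):
    """Link each element to the next with Ezafe; the last element stays bare."""
    if not words:
        return ""
    linked = [ezafe_form(w) for w in words[:-1]]
    return " ".join(linked + [unicodedata.normalize("NFC", words[-1])])


# Rebuilds the string in (38a) from its bare elements.
print(ezafe_chain(["otâg", "kuchik", "zir", "shirvani", "Ali"]))
# otâg-e kuchik-e zir-e shirvani-ye Ali
```

Leaving the last element bare mirrors the restriction shown in (38c); the categorial restrictions discussed in the text would have to be added as a separate check.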

As mentioned before, the Ezafe affix may not attach to verbs, adverbs, and conjunctions.

(39) a. *raft-am-e
        went-1sg-Ez
     b. *hanuz-e
        still-Ez
     c. *vali-ye
        but-Ez        (Karimi and Brame 2012: 112)

Although the Ezafe affix may be attached to some prepositions, as in (37c) and (40b), it is not allowed with some others (40a).

(40) a. *be-ye Hasan
        to-Ez  Hasan
     b. tu-ye manzel
        in-Ez house        (Samiian 1994: 29)

I will come back to this issue in the next subsection, where I provide an overview of some of the specific analyses of certain properties of the Ezafe construction and the function of the Ezafe affix itself.

Distinct treatments of the Ezafe construction: an overview

Ghaniabadi (2010) suggests the following order within the Ezafe domain:

(41) N

The Ezafe affix appears between each one of the constituents in the post-nominal domain.

(42) N Ez A






Samiian (1983, 1994) considers the Ezafe affix as a Case marker. She argues that clauses (CPs) and real prepositions do not allow the Ezafe to precede them since they do not need a Case assigner. Following Samiian, Larson and Yamakido (2006, 2008) argue along the same lines.

Samiian (1983, 1994) divides Persian prepositions into two groups: one group (P1) does not allow the Ezafe affix (be ‘to’, az ‘from’, bâ ‘with’, dar ‘in’, bi ‘without’, tâ ‘until’), while the other one (P2) does (zir ‘under’, ru ‘on’, bâlâ ‘up’, kenâr ‘next to’, jelo ‘front of’, barâ ‘for’). She suggests that these two groups differ in three ways: (a) P1 is a real function word, while P2 has some semantic content; (b) P1, but not P2, is strictly subcategorized for a DP complement; and (c) P2, but not P1, displays some nominal properties (can take a demonstrative, be pluralized, and be marked by -râ). Samiian rejects, however, the idea that members of the P2 group are true nominals, on the grounds that they cannot be modified by an adjective or a relative clause.

Karimi and Brame (2012), on the other hand, suggest that prepositions that allow the Ezafe affix are in fact nouns. Their argumentation is based on pieces of evidence in addition to those discussed by Samiian. First, they show that these elements can in fact be modified by an adjective:

(43) zir-e    kasif-e  miz
     under-Ez dirty-Ez table
     ‘The dirty underspace of the table.’        (Karimi and Brame 2012: 132)

Second, these elements can be reduplicated for emphasis.

(44) zir   zir-â-ye    miz
     under under-pl-Ez table
     ‘Everywhere under the table.’        (Karimi and Brame 2012: 132)

Reduplication of nouns is fairly common as a grammatical or morphological device in a variety of languages. It is used, for example, as a means of pluralization in some languages, and for the purpose of diminutivization in others. Reduplication can also be found in conjunction with pronouns, verbs, and even adjectives. These authors state, however, that to their knowledge, reduplication is not found with prepositions.

As for the internal structure of the Ezafe construction, Ghomeshi (1996, 1997) provides a novel analysis of Persian common nouns by suggesting that these elements do not project. That is, they are of the category X⁰, and do not appear with a complement or a Specifier. She suggests that the fact that there is only one possessor inside the Persian noun phrase follows from this analysis: since nouns cannot project complement and Specifier positions, the only position available for the possessor phrase is the Specifier of the DP, which she suggests to be on the right of the phrase. The structure she provides for the Ezafe construction is the following.

(45) [DP [D’ [NP [N⁰ [N⁰ [N⁰ N⁰ N⁰ ] A⁰ ] P⁰ ] ] Ddef ] DPPoss ]
     (the NP constitutes the Ezafe domain)

Unlike Samiian and Larson and Yamakido, Ghomeshi argues that the Ezafe element is a linker attached to an X⁰. Ghaniabadi (2010), working within the framework of Distributed Morphology, adopts the same analysis with respect to the nature of the Ezafe, and suggests that the Ezafe insertion is a phonological rule that applies at the Late-Linearization stage at PF.

Based on empirical evidence, Samvelian (2007, 2008) argues against Ghomeshi’s proposal that the Persian nouns do not project. She further suggests that the Ezafe affix is neither a Case assigner, nor a linker inserted at PF.17 Samvelian (2007) states that the fact that relative clauses in Kurdish dialects and Zazaki can be introduced by the Ezafe affix further supports the claim that this element is not a Case assigner, since CPs are not assigned Case. She argues that this element has undergone a process of grammaticalization. That is, the Ezafe is best regarded as an affix which has the function of indicating dependency relations between the head noun, its modifiers, and the possessor NP. In other words, the Ezafe is a phrasal (inflectional) affix that occurs at the right edge of the nominal projections, similar to the determiner -i and personal enclitics.

Kahnemuyipour (2000a, 2006) builds the Ezafe construction primarily on Cinque’s (1994) assumption that the cross-linguistic asymmetry concerning the relative order of nouns with respect to their modifiers is the result of the syntactic head raising of the noun to a functional head within the noun phrase. Kahnemuyipour (2014) offers a treatment of the Ezafe construction based on the structure of DP proposed by Cinque (2010). Cinque argues that any word order variation is the result of a roll-up movement of phrasal elements. Extending this roll-up analysis to Persian DP, Kahnemuyipour argues that the Persian noun phrase is head-final, and that the surface word order is derived by the phrasal movement of the NP containing the head noun through the Specifiers of the intermediate functional heads.18 The structure in (46) represents this roll-up movement: XP and YP stand for the modifier phrases, and AgrP for the functional phrases intervening between the modifiers.

(46) [DemP [Numeral Numeral [AgrP [ AgrEz [XP AP [ X [AgrP [ AgrEz [YP AP [ Y [NP/Number ]]]]]]]]]]]

In (46), the NP (carrying the head noun and the plural marker, if present) moves cyclically into the Specifiers of the intervening AgrPs. The Ezafe functions as a linker, the realization of the inversion process. Kahnemuyipour’s analysis explicitly suggests that the Ezafe realization is directly correlated with the phrasal status of the elements it links with each other. This analysis thus explains why demonstratives and numerals, being heads, are never followed by the Ezafe element. This is in sharp contrast with Ghomeshi’s view, which considers the elements inside the Ezafe domain as X⁰. The next subsection offers an overview of complex DPs in Persian.

17  Furthermore, Kahnemuyipour (2015) provides data indicating that Ezafe may appear preceding pure prepositions in Persian, thus further undermining the Case assigning nature of the Ezafe affix.

7.4.2 Complex DPs

Persian sentential arguments of nouns and relative clauses exhibit an interesting property. That is, the morpheme -râ, known as specific direct object marker, may intervene between the head noun and the complement/relative clause, illustrated by ‘a’ and ‘b’ in (47).

(47) a. Hame [DP [ in edde’â ]-ro [CP ke Ramin bigonah-e]] mi-pazir-an.
        all this claim-râ that Ramin innocent-be-3sg asp-accept-3pl
        ‘Everyone accepts this claim that Ramin is innocent.’ (Karimi 2001: 63)
     b. Man [DP [ un ketâb-i ]-ro [CP ke Sepide diruz xarid]] be Kimea dâd-am.
        I that book-Rel-râ that Sepide yesterday buy-past-3sg to Kimea gave-1sg
        ‘I gave that book that Sepide bought yesterday to Kimea.’ (Karimi 2001: 64)

The structures of the data in (47a) and (47b) would be something like those in (48a) and (48b), respectively.

(48)




a. [DP [D in ] [NP [N’ [N edde’â ] [CP ke ... ]]]] râ
b. [DP [D un ] [NP [N’ [N’ [N ketâb-i ]] [CP ke ... ]]]] râ

Karimi (1996) suggests the following configuration for the Persian DP followed by -râ: KsP is a head-initial phrase, with -râ in the head position. The DP moves into the Specifier of this phrase for Case purposes, providing the surface word order.

18  Holmberg and Odden (2005), too, propose a ‘roll-up’ derivation of the Ezafe construction in Hawrami.

(49) [KsP Spec [Ks’ [Ks râ ] ti ]] (Karimi 1996: 185)

Adopting a revised version of Kayne’s (1994) theory of relative clauses, Karimi suggests that the demonstrative and the head noun reside in the Specifier of the DP containing the relative clause. The movement of these two elements as one constituent into the Specifier of KsP gives us the word order in (50). (50)

[KsP [Spec [un ketab-i]i ] [Ks’ râ [DP [Spec ti ] [D’ CP ]]]] (Karimi 2001: 77)

The clausal complement of N receives a similar treatment. As reported in Karimi (2001), constructions such as those in (51) and (52), where -râ follows the whole complex DP, are employed in mass media and by younger speakers. However, these forms are considered to be incorrect by prescriptive grammarians (for example, Najafi 1991).

(51) Hame [DP [ in edde’â ] [CP ke Ramin bigonâh-a]]-ro mi-pazir-an.
     all this claim that Ramin innocent-be-3sg-râ asp-accept-3pl
     ‘Everyone accepts the claim that Ramin is innocent.’

(52) Man [DP [ un ketâb-i ] [CP ke Sepide diruz xar-id]]-ro be Kimea dâd-am
     I that book-rel that Sepide yesterday bought-3sg-râ to Kimea gave-1sg
     ‘I gave the book that Sepide bought yesterday to Kimea.’

The treatment of complex DPs proposed in Karimi (2001) provides a simple solution for the generation of these forms: in both cases, the entire complex DP containing the CP moves into [Spec, KsP], as in (53):

(53) [KsP DPi [Ks’ râ DP ]] (Karimi 2001: 80)

As for relative clauses within the subject DP, there are some interesting data that need careful examination. Here are a few examples:

(54) a. [DP dâneshju-â-yi [CP ke bâhush hast-an]] xub dars mi-xun-an
        student-pl-rel that smart be-3pl well lesson asp-read-3pl
        ‘Students who are smart study well.’
     b. [DP dâneshju-â-yi ei ] xub dars mi-xun-an [CP ke bâhush hast-an]i
        student-pl-rel well lesson asp-read-3pl that smart be-3pl
        ‘(Those) students are smart who study well.’

There is a scope difference between these two sentences. The sentence in (54a) means that the set of students who are smart is a subset of the set of students who study well. In other words, among the students who study well, there are some who are not so smart. The sentence in (54b) means that the set of students who study well is the same as the set of smart students. That is, all students who study well are smart. Here is another set.

(55) a. [DP Iruni-yâ-yi [CP ke tu Lake Oswego zendegi mi-kon-an]] puldâr hast-an
        Iranian-pl-rel that in Lake Oswego life asp-do-3pl rich be-3pl
        ‘Iranians who live in Lake Oswego are rich.’
     b. [DP Iruni-yâ-yi ei ] puldâr hast-an [CP ke tu Lake Oswego zendegi mi-kon-an]
        Iranian-pl-rel rich be-3pl that in Lake Oswego life asp-do-3pl
        ‘(Those) Iranians are rich who live in Lake Oswego.’

Again, (55a) means that those Iranians who live in Lake Oswego are rich, but there are other rich Iranians as well who live elsewhere. In other words, the set of Iranians who live in Lake Oswego is a subset of the set of rich Iranians. (55b) means that the set of Iranians who live in Lake Oswego is the same as the set of rich Iranians. Are the ‘b’ sentences in (54) and (55) the result of a syntactic extraposition rule? If so, why should the extraposition of the relative clause in these and similar data provide a different interpretation? This is an interesting topic that I leave for future research.

7.4.3 Classifiers

Classifiers are functional morphemes which, in some languages including Persian, appear between a numeral and a noun (Gebhardt 2008, 2009). Following Samiian (1983), Ghaniabadi (2010) proposes the existence of three types of classifiers: true classifiers, measure nouns, and group nouns. True classifiers are used with count nouns, while measure nouns and group nouns are used with mass nouns.

(56) True classifiers
       -tâ ‘until’ used with all count nouns
       nafar ‘person’
       jeld ‘unit’ used for books
       pors ‘unit’ used for meals
     Measure nouns
       metr ‘metre’
       kilo ‘kilogram’
       litr ‘litre’
     Group nouns
       daste ‘bunch’
       guruh ‘group’
       fenjun ‘cup’
       ghâshogh ‘spoon’
     (Ghaniabadi 2010: 25)

The classifier tâ is the most common one and may replace the other ones. Ghaniabadi suggests that this classifier may not appear with the other ones.

(57) a. se jeld ketâb
        three unit book
        ‘Three books’
     b. *se tâ jeld ketâb
     c. se tâ ketâb
     (Ghaniabadi 2010: 26)

Furthermore, Persian classifiers are suggested to be optional (Qarib et al. 1950; Bateni 1969; Mahootian 1997; Megerdoomian 2002; Gebhardt 2009).19 Gebhardt (2009) provides a feature-driven theory of classifiers in several languages, including Persian, organized along the feature-geometric analysis of Harley and Ritter (2002). He suggests that the classifier tâ has the features [group, abs].20 This means that the appearance of tâ indicates plurality. The plural nature of tâ thus explains why it cannot appear with ye(k) ‘one’.

(58) *ye(k) tâ dânešju
     one Cl student

Gebhardt goes on to state that there are some elements in Persian that semantically agree with the noun with regard to shape, material, animacy, etc., such as nafar (for people), ghabze (for swords, rifles), adad (for smallish inanimate things like pencils), etc. He does not consider these elements as classifiers. For him, they function as modifiers of the classifier tâ when they appear together. Thus for him the string se tâ jeld ketâb (cf. 57b) is grammatical; jeld ‘volume’ restricts the semantic domain of the classifier tâ. Comparing jeld with tâ, he shows that the former may appear with ye(k), while the latter may not (cf. 58).

(59) ye(k) jeld ketâb
     one volume book
     ‘One volume (of) book’

The contrast between (58) and (59) might be due to the specific semantics of tâ and jeld: while the former is inherently plural, the latter is not. However, the order of these two elements is the same as the order of nouns and their modifiers. A reverse order of tâ and jeld makes the string ungrammatical.

(60) a. do tâ jeld ketâb
        two Cl Mod_book book
        ‘two (volumes of) books’
     b. *do jeld tâ ketâb
     (Gebhardt 2009: 273)

19  See Ganjavi (2007), however, who argues that optionality boils down to formal versus informal style: while classifiers are not employed in the former style, they are in fact obligatory in the latter. See also Lazard (1992) for a similar account of Persian classifiers. I tend to agree with this assessment.
20  Gebhardt uses ‘abs’ for ‘absolute’, a more specific quantity feature for numerals.

Although the phrase in (60a) sounds a little odd to me, it is far better than the one in (60b). Furthermore, the appearance of tâ with other elements that are classified as group noun classifiers by Samiian (1983) and Ghaniabadi (2010) sounds perfectly well formed.

(61) a. se tâ daste gol
        three Cl bunch flower
        ‘Three bunches (of) flowers.’
     b. se tâ fenjun ghahve
        three Cl cup coffee
        ‘Three cups (of) coffee.’
     c. se tâ ghâshogh shir
        three Cl spoon milk
        ‘Three spoons (of) milk.’

Finally, Gebhardt shows that the optional classifier tâ is in fact obligatory in partitive constructions, as in (62).

(62) pænj *(tâ) az pesar-hâ dir resid-aend
     five Cl of boy-pl late arrived-3Pl
     ‘Five of the boys arrived late’
     (Gebhardt 2009: 280)

Gebhardt argues that the obligatory presence of tâ in partitive constructions, such as the one in (62), poses a serious problem for previous accounts of classifiers (Chierchia 1998, Borer 2005). This is so because Chierchia’s theory states that in some languages all nouns are mass, and therefore, they need a classifier to convert them into predicates, which can then be used with numerals. However, the prepositional phrase az pesær-hâ is not a noun in the first place. Thus his theory cannot explain the obligatory presence of the classifier in these cases. Borer, on the other hand, argues that if a noun is the complement of a ‘divider’ (either a classifier or plural morphology), then it becomes a count noun. The problem for her is that the PP az pesar-hâ is not a noun, and therefore, it cannot be placed in an appropriate position within the DP to be divided by the classifier. Gebhardt then goes on to suggest that the classifier tâ is optional in Persian, since the divider feature can float on other elements such as numerals in this language. Borer’s theory can account for this optionality. However, the numeral subcategorizes for a Number Phrase. Since PPs are not Number Phrases in the sense of Ritter (1991), the classifier is the only option to save the derivation, and thus becomes obligatory in this case.

7.5 Marked and unmarked objects and the morpheme -râ

Differential object marking (DOM) is the morphological marking of direct objects in some languages based on one or more nominal hierarchies, such as definiteness or animacy. In some languages, marking is obligatory on definite objects and prohibited on non-referential objects, regardless of animacy, while referential indefinite objects are sometimes marked and sometimes not, based on specificity (Enç 1991; Karimi 2003; Key 2008). I discuss the unmarked object in 7.5.1, followed by an overview of the morpheme -râ, and the DPs marked by this element in 7.5.2.

7.5.1 Unmarked objects

The semantics of the unmarked object combined with the verb gives the impression that the nominal element is not an argument of the verb, but rather part of the predicate, as in (63).

(63) Kimea ketâb xund
     Kimea book read
     ‘Kimea did book-reading.’

In fact, it has been suggested that this element incorporates into the predicate (Dabir-Moghaddam 1997). Along the same lines, Ghomeshi and Massam (1994: 183–4) suggest that non-referential objects undergo Type I Noun Incorporation (NI) in the sense of Mithun (1984). Given Baker’s (1988, 1996) Noun Incorporation, this would imply that the unmarked object, incorporated into the verb, does not receive Case. As noted by Ganjavi (2007), however, bare objects may be modified by adjectives, and thus receive a phrasal status. In those cases, they cannot incorporate into the verb since this operation involves only X0 categories.21 A number of questions arise: is there any evidence that the unmarked object is a DP, saturating the internal argument of the verb, or an NP, lacking a clear argument status?22 Does it require Case? There are pieces of evidence suggesting that the unmarked object is syntactically independent of the verb, saturates its internal argument, and therefore, is a DP that requires Case. Evidence for this claim is provided by discourse constructions where the unmarked object is contrastively focused or topicalized as in (8) in section 7.2 of this chapter. Furthermore, the unmarked object may serve as the subject of a passive construction, thus confirming that it is in fact the internal argument of the verb.

(64) kelid dar in kârkhune sâxte mi-sh-e
     key in this factory made asp-become-3sg
     ‘Keys are made in this factory.’

(65) ketâb dar in sâ’at xunde mi-sh-e
     book in this hour read asp-become-3sg
     ‘Books are read at this hour.’

These data suggest that the unmarked object is in fact the argument of the verb, and thus has a DP status and needs to be checked for Case. The next subsection is devoted to an overview of the morpheme -râ and the DP it marks.

21  Ganjavi (2007) suggests that the bare object is a complement of the verb, and undergoes a Pseudo Noun Incorporation (PNI) in the sense of Massam (2001), and thus does not require Case. Discussing Niuean, Massam suggests that the object in a VSO order in this language is a full DP, while the less common VOS order involves an NP object that undergoes PNI. Ganjavi (2007, 2011) further suggests that the non-râ marked object is either an NP or a NumP.
22  Mahootian (1997: 203 fn.) suggests that Persian bare noun phrases are either indefinite or generic. Her analysis includes subjects as well. Modarresi and Simonenko (2007) and Modarresi (2014) suggest that the indefinite reading of the bare object is likely to be the result of a semantic incorporation. See also Ghomeshi (2008) whose analysis of bare noun phrases includes objects of locative prepositions, in addition to subjects and objects.

7.5.2 Marked objects and the nature of -râ

No other morpheme in Modern Persian has attracted as much interest on the part of grammarians, linguists and language teachers as the postpositional morpheme -râ. In Old Persian, -râ appears as râdi marking a cause with the meaning ‘for the sake of’. The same interpretation holds for rây, the reflex of râdi in Middle Persian. According to Brunner (1977), Middle Persian rây served other functions as well. It appeared as an illustration of purpose, reference, beneficiary, or indirect object (Karimi 1990). Traditional grammarians have assumed -râ to be a direct object marker (Natel-Khanlari 1986; among others). Some linguists have argued that this element has a secondary function as marking the object for definiteness (Phillott 1919; Sadeghi 1970; Vazinpour 1976; among others). Lazard (1957) distinguishes between polarized objects (NP+râ with transitive verbs) and quasi-polarized objects (NP+râ with intransitive verbs). Examining the diachronic development of -râ, Key (2008) suggests that this element marked animate, rather than definite, objects in early texts (e.g. Qabusnâme, eleventh century). Browne (1970) argued that -râ appears with definite as well as specific indefinite objects. The following examples represent both definite and specific indefinite objects.

(66) a. Kimea in ketâb-*(ro) barâ man xarid
        Kimea this book-râ for me bought-3sg
        ‘Kimea bought this book for me.’
     b. diruz unâ ye film-i (ro) tamâshâ mi-kard-an
        yesterday they a movie-Ind (râ) view asp-did-3pl
        ‘They were watching a movie yesterday.’

While -râ is obligatory with the definite object in (66a), it is optional in (66b). Its presence with the indefinite object provides a specific interpretation. Some linguists have considered -râ as a topic marker in Modern Persian (Peterson 1974; Windfuhr 1979; Ghomeshi 1997). In fact, the DP followed by -râ may receive a topic interpretation, as we will see below. However, it may also mark a contrastively focused DP. In terms of the position of the object, Karimi (2005) argued that the non-specific object as well as the specific (marked) object are both merged in the object position of the verb. While the former remains in situ adjacent to the verb in a discourse neutral sentence, the latter moves out of the VP (or Pred(icate)P) into the lower Specifier of vP. This movement can be considered an instance of object shift, an operation observed in some of the Germanic languages such as Icelandic and German (Holmberg 1986, 1999; Diesing 1996). The assumption is that the VP is the domain of novel/existential interpretation in the sense of Heim (1982), Kratzer (1995), Diesing (1992) and others, and thus the specific object, representing old information, has to move out of that domain.23

(67) [vP ... [v’ [PredP t_o ]]] (Karimi 2005: 109)

23  If we assume that the novel domain is not VP, but vP in a more modern sense, then it is not surprising that DP+râ may stay in the Specifier of vP if it reveals novel information such as emphasis or comparison. This is in fact borne out empirically.
(i) shomâ hanuz film-e tâze-ye FARHADI-ro / BEHTARIN film-ro na-did-in
    you yet movie-Ez new-Ez Farhadi-râ / best movie-râ neg-saw-2pl
    ‘You have not yet seen Farhadi’s new movie/the best movie.’
In this example, DP+râ follows the adverbial hanuz ‘yet’, an adverb that marks the vP edge.

DP+râ might move out of the vP into a higher position, presumably a topic position or a contrastive focus position, preceding the subject.24

Karimi and Smith (2015) argue that -râ is not a direct object marker, but rather a default case marker in the sense of Marantz (1991). Marantz argues that case assignment only indirectly reflects the syntactic structure, and that there is a special Morphological Structure (MS) of the grammar in which m(orphological)-case is assigned. He further suggests that the morphological realization of case obeys the disjunctive hierarchy in (68).

(68) Disjunctive case hierarchy
     i. Lexically governed case: assigned by specific verbs (quirky case (Icelandic)).
     ii. Dependent case: case is dependent upon the presence of some higher functional projection or a set of such projections (Accusative in Nom-Acc languages, Ergative in Erg-Abs languages).
     iii. Unmarked case: assigned when a DP appears embedded in a certain structural position (genitive in NPs, nominative in Spec-IP/TP).
     iv. Default case: assigned when no other licensing condition is met.
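The disjunctive ordering in (68) can be read as a priority rule: of all the case types whose licensing conditions hold for a DP, only the most specific one is realized, and default case is the fallback. The following is a minimal illustrative sketch of that reading only, not an implementation of Marantz's or Karimi and Smith's analysis. The names `CaseType` and `realize_case` are hypothetical, and the licensing conditions themselves are assumed to be determined elsewhere and passed in.

```python
from enum import IntEnum


class CaseType(IntEnum):
    """Tiers of the hierarchy in (68); a lower value is more specific."""
    LEXICALLY_GOVERNED = 1
    DEPENDENT = 2
    UNMARKED = 3
    DEFAULT = 4


def realize_case(licensed):
    """Return the most specific case type among those licensed for a DP.

    `licensed` is an iterable of CaseType values whose conditions hold
    (an assumption of this sketch: the caller decides what is licensed).
    Default case is always available, so it is returned when nothing
    more specific applies.
    """
    candidates = set(licensed) | {CaseType.DEFAULT}
    return min(candidates)
```

For instance, `realize_case([CaseType.DEPENDENT, CaseType.LEXICALLY_GOVERNED])` selects the lexically governed tier, while `realize_case([])` falls through to the default tier, which is the slot Karimi and Smith assign to -râ.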

Each type of case in this hierarchy is more specific than the case below it, and thus takes priority in case realization. Therefore, lexically governed case is always assigned over any other case in the hierarchy. This is motivated by the facts of Icelandic quirky case, in which the case assigned by the verb is retained on the quirky case marked DP regardless of its syntactic position. Karimi and Smith’s analysis is based on the well-known facts that the appearance of -râ is not specific to direct objects (cf. Lazard 1957, 1992), as the following data indicate. The data in (69) represent Classical Modern Persian (CMP), although they are still employed in the formal written language.

(69) -râ marking oblique case
     a. amir-râ zakhm-i zad-am
        king-râ wound-Ind hit-1sg
        ‘As for the king, I wounded (him).’
     b. loghmân râ porsid-and adab az ke âmuxt-i
        Loghman râ asked-3Pl politeness from whom learned-2sg
        ‘They asked (of) Loghman, whom did you learn politeness from.’
     c. in mehnat râ darmân-i andishide-am
        this suffering râ remedy-Ind thought-1sg
        ‘As for this suffering, I have thought (of) a remedy.’

(70) -râ marking possession
     a. va in-râ nâm shâhnâmeh nahâd-and
        and this râ name Shahname put-3pl
        ‘Its name they marked Shahname.’ Lit. ‘And this, they put the name Shahname on.’
     b. xalgh-râ xun be-rixt-and
        people-râ blood subj-shed-3pl
        ‘As for people, they shed (their) blood.’

The morpheme -râ also appears in a different possessive construction represented by the example in (71): bud ‘was’ is a copula, yet -râ appears following the DP pâdshâh ‘king’.

(71) pâdshâh-râ pesar-i bud
     king-râ son-ind was
     ‘As for the king, there was a son.’

24  Ganjavi (2007) suggests a similar situation for the marked direct object, by moving it out of the VP into the Specifier of a functional head.


Karimi and Smith argue that the presence of -râ is clearly not lexically governed, since it can appear with various types of verbs. It does not represent Marantz’s dependent case either, since its occurrence does not depend on the presence of a higher case (cf. 69–71). Nor can it be an unmarked case, since it does not appear with subjects. The only remaining case is the default case. These authors argue that -râ marks a specific DP when there is no other case available in a derived position, proposing the following generalization.

(72) Default case marking
     i. A specific DP is morphologically marked for Default case in a derived position postsyntactically.
     ii. The morpheme -râ represents the Default case.

The claim that -râ is a postsyntactic default case-marker even in Modern Persian is supported by the fact that this element may appear with intransitive verbs in non-possessive constructions when the DP is in a derived position revealing a discourse interpretation.

(73) shab-e pish-o aslan na-xâbid-am
     night-Ez last-râ at all neg-slept-1sg
     ‘As for last night, I didn’t sleep at all.’ Or: ‘It was last night (as opposed to some other time) that I did not sleep at all.’

The fact that subjects and objects of prepositions are not marked by -râ follows from this analysis, since those elements receive unmarked case (cf. iii in 68).

7.6 Passives, causatives, and resultatives

In this section, I discuss three different constructions in Persian: passives, causatives, and resultatives. Although the first two have been discussed by some grammarians and linguists, the last has not received much attention. I start with a discussion of passives in 7.6.1, followed by an examination of causatives and resultatives in 7.6.2 and 7.6.3, respectively. For more discussion on the causative form, see Chapters 2 and 3.

7.6.1 Passives

Passives may appear in various forms in Persian. The data in (74–6) provide three types of constructions that can be considered passives, at least semantically. For more discussion on the passive construction, see Chapter 3.

(74) a. bachche-hâ ghazâ-ro xord-an
        child-pl food-râ eat-3pl
        ‘The kids ate the food.’
     b. ghazâ xorde shod
        food eaten became-3sg
        ‘The food was eaten.’

(75) a. tim-e shomâ tim-e mâ-ro shekast dâd
        team-Ez you team-Ez us-râ defeat gave-3sg
        ‘Your team defeated our team.’
     b. tim-e mâ (az tim-e shomâ) shekast xord
        team-Ez us of team-Ez you defeat collided-3sg
        ‘Our team was defeated (by your team).’

(76) pro dar tâbestân madrese-hâ-ro tatil mi-kon-an
     in summer school-pl-râ suspense asp-make-3pl
     ‘(They) close the schools during the summer.’

In (74b), the participle form xorde ‘eaten’ is combined with the verb shod ‘became’ to provide a passive reading. In (75b), the unaccusative light verb xord ‘collided’ replaces the agentive light verb dâd ‘gave’ in (75a), providing a passive reading. In (76), there is a semantically understood, but syntactically unspecified and lexically empty, subject.

The discussion of passive constructions goes back to Phillott (1919), who suggests that this construction is not used productively in Persian. Some linguists postulate a transformational passive rule for Persian (Marashi 1970; Palmer 1971; Soheili-Isfahani 1976; Dabir-Moghaddam 1982b, 1985). Moyne (1974), on the other hand, argues that Persian lacks a passive construction, either morphological or syntactic, by suggesting that there is no underlying agent in so-called passive constructions in this language. He acknowledges, however, that there are examples such as those in (77) with an overt agent.

(77) emshab âvâz tavasot-e bânu parvâna xânde mi-shav-ad
     tonight song through-Ez Ms. Parvane sung asp-become-3sg
     ‘Tonight songs will be sung by Ms Parvane’ (Moyne 1974: 250)

Moyne suggests, however, that these agentive phrases are new and awkward in Persian. He concludes that there are no active–passive pairs in Persian, and those constructions with shodan ‘become’ are in fact inchoatives, and the agentive phrases are instrumental.
Dabir-Moghaddam (1982b, 1985) argues that, in addition to inchoative constructions, there are also structural passives in this language. This author suggests that the verb shodan ‘become’, although a motion verb in Middle Persian, has taken on a special function as a passive auxiliary in Modern Persian. He further shows that, in addition to its new function, shodan continued to appear in its earlier function in Classical Modern Persian (e.g. be Kerman shod ‘S/he went to Kerman’), indicating a functional transition at that period. In Modern Persian, he suggests, this element represents inchoative as well as passive constructions. As for the latter, Dabir-Moghaddam shows that the direct object of an active sentence becomes the structural subject of the corresponding passive, and the agent-phrase (used with an instrumental preposition) is optionally employed.

(78) a. ma’mur-ân-e sâvâk yek ostâd-e dâneshgâh-râ kosht-and
        agent-pl-Ez Savak a professor-Ez university-râ killed-3pl
        ‘The Savak agents killed a university professor.’
     b. yek ostâd-e dâneshgâh (tavasot-e ma’mur-ân-e sâvâk) koshte shod
        a professor-Ez university by-Ez agent-pl-Ez Savak killed became-3sg
        ‘A university professor was killed (by Savak agents).’ (Dabir-Moghaddam 1982b: 75)

Dabir-Moghaddam suggests that these constructions differ from inchoatives based on the following contrast: while (79a) represents a passive construction that requires an underlying agent, (79b) is inchoative, lacking such an element.

(79) a. *yek ostâd-e dâneshgâh xod be xod koshte shod
        a professor-Ez university self to self killed become
        *‘A university professor was killed gratuitously.’ (Dabir-Moghaddam 1982b: 77)
     b. âb xod be xod sard shod
        water self to self cold became-3sg
        ‘The water cooled/became cool by itself.’

Thus Dabir-Moghaddam suggests a passive rule that relates the underlying active sentence to the corresponding passive version. He further argues that this rule applies to verbs that express volitional force.

Folli, Harley, and Karimi (FHK) (2005) argue that the so-called passive constructions are instances of complex predicate constructions in which the past participle of the verb serves as an NV element.25 Based on their analysis of complex predicates, they suggest the structure in (81) for the sentence in (80). In this sentence, the past participle dâde has adjectival properties. The complement of the adjective moves into the subject position in this construction.

(80) ye gol be Papar dâde shod
     a flower to Papar given was
     ‘A flower was given to Papar’

(81) [vP [AP [PP be Papar ] [A’ [O ye gol ] [A dâde ]]] [v shod ]] [FHK 2005: 1395]
     be Papar ‘to Papar’, ye gol ‘a flower’, dâde ‘given’, shod ‘became’

25  See also Vahedi-Langrudi (1996) for a similar idea.

FHK suggest that this structure is similar to the regular unaccusative CPr consisting of an adjective as the NV element and an LV. Thus the sentence in (82) has the phrasal structure in (83).

(82) xune xarâb shod
     house destroyed became
     ‘The house was destroyed.’

(83) [vP [AP [O xune ] [A xarâb ]] [v shod ]] [FHK 2005: 1395–6]
     xune ‘house’, xarâb ‘destroyed’, shod ‘became’

These authors further suggest that their analysis predicts that there is no ‘passive’ of CPrs with a nominal NV element. That is, the light verb shodan ‘become’ selects for a predicative small clause complement where there is no room for both a nominal NV element and a deverbal adjective. The following data show that this prediction is in fact borne out.26

(84) a. kise keshidan (brush pulling) ‘to brush (body)’
     b. *kise keshide shodan (brush pulled become) ‘be brushed’ (intended)

Two of these authors (Harley and Karimi) have come across some (unproductive) data contradicting the generalized analysis presented in their 2005 work. That is, there seem to be data that allow both the nominal NV element and the deverbal adjective, in addition to the light verb.

(85) a. gharâr dâdan
        place giving
        ‘To place’
     b. gharâr dâde shodan
        place given becoming
        ‘To be placed’

(86) a. edâme dâdan
        continuation give
        ‘To continue’
     b. edâme dâde shodan
        continuation given becoming
        ‘To be continued’

There are also cases in which the NV element is a prepositional phrase, allowing the presence of the PP and the deverbal adjective, in addition to the light verb, as in (87–8).

(87) a. be khâk sepordan
        to soil entrusting
        ‘To bury’
     b. be khâk seporde shodan
        to soil entrusted becoming
        ‘To be buried’

(88) a. be kâr bastan
        to work closing
        ‘To employ’
     b. be kâr baste shodan
        to work closed becoming
        ‘To be employed’

26  See also Vahedi-Langrudi (1999) who argues that shodan is not an auxiliary, but a main light verb. He derives the complex predicate formation of the light verb and the NV element by a postsyntactic operation.

As for (85, 86) and similar cases, it seems that this kind of construction is only possible if there is no alternative LV construction available in the language. Compare the sets in (89–90) with those in (91–2).

(89) a. shekast dâdan
        defeat giving
        ‘To defeat’
     b. shekast xordan
        defeat colliding
        ‘To be defeated’

(90) a. tamiz kardan
        clean making
        ‘To clean’
     b. tamiz shodan
        clean becoming
        ‘To become clean’

(91) a. be xâter sepordan
        to memory giving
        ‘To memorize’
     b. *be xâter xordan/shodan
        to memory colliding/becoming
        Intended: ‘To become memorized’

(92) a. edâme dâdan
        continuation giving
        ‘To continue’
     b. *edâme xordan/shodan
        continuation colliding/becoming
        Intended: ‘To be continued’

As for (91) and (92), it seems that the passive form of these constructions is almost exclusively restricted to data that are used in elevated formal language. However, a careful analysis of these constructions is required.27

27  Examining these types of constructions is part of an elaborated NSF grant on complex predicates in a number of Iranian languages awarded to Karimi (PI), Harley and Carnie (Co-PIs) for the period of July 2015–December 2018.

7.6.2 Causatives

Persian distinguishes several types of causatives. One group, called labile causatives, retain their surface form, but take on an additional argument in order to be interpreted as causative, as in (93b). These forms are not very common in Persian.28

(93) a. panjare shekast
        window broke
        ‘The window broke.’
     b. Sima panjare-ro shekast
        Sima window-râ broke
        ‘Sima broke the window.’

Light verb alternating causative is another type which is achieved by changing the light verb of the CPr.

(94) a. barf âb shod
        snow water became-3sg
        ‘The snow melted.’
     b. xorshid barf-o âb kard
        sun snow-râ water did-3sg
        ‘The sun melted the snow.’

The third type is the periphrastic causative that is also formed as a bi-clausal complex predicate.

(95) Kimea Parviz-o tashvigh kard [ke PRO ketâb be-xun-e]
     Kimea Parviz-râ encouragement did-3sg [that PRO book Subj-read-3sg]
     ‘Kimea encouraged Parviz to read books.’

There are, however, periphrastic causative CPrs that are used with the LV shodan, and have no counterpart with kardan. These causative predicates are also biclausal.

(96) a. bâes shodan/*kardan
        cause become/do
        ‘To (become the) cause’
     b. sabab shodan/*kardan
        cause become/do
        ‘To (become the) cause’

Another type of causative is expressed semantically by verbs such as koshtan ‘killing’. These are direct causatives, in which the verb expresses the idea that the act is completed on a patient by an agent, yet there is no overt morpheme expressing causation (Nabors 2014). The sentence in (97) means that the hunter caused the bear to die.

(97) shekârchi xers-ro kosht
     hunter bear-râ killed-3sg
     ‘The hunter killed the bear.’

Finally, the morphological causative is formed by adding the affix -ân to a number of transitive and intransitive verbs.29

(98) a. Kimea xandid
        Kimea laughed-3sg
        ‘Kimea laughed.’
     b. Parviz Kimea-ro xand-un-d
        Parviz Kimea-râ laugh-cause-past-3sg
        ‘Parviz made Kimea laugh.’

28  For a detailed discussion of all types of causative constructions, see Dabir-Moghaddam (1982a, 1987). Lotfi (2008) examines various causative constructions as well. Soheili-Isfahani (1987) discusses morphological causatives. Nabors (2014) offers a theoretical analysis of morphological causatives within the framework of Distributed Morphology. She examines, among others, causatives with verbal roots (xor-ân-d-an ‘to make eat’) and those with nominal roots (tars-ân-d-an ‘to make scare’).
29  In colloquial Persian, â is pronounced as u preceding nasal consonants.

7.6.3 Resultatives
In Persian, change-of-state CPrs, revealing a resultative reading, are made up of a light verb plus a resultative NV element, as in (99). In this example, change-of-state results in yax ‘ice’ becoming âb ‘water’.

(99) yax âb shod
ice water became
‘Ice melted.’

English allows a secondary resultative predicate, as in the ice melted away, in which away is a secondary resultative predicate. Additional examples are provided in (100).

(100) a. John wiped the table clean.
b. I hammered the metal flat.

Persian does not allow a secondary resultative predicate in complex predicate constructions.

(101) a. Kimea felez-ro chakkosh zad
K metal-râ hammer hit
‘Kimea hammered the metal.’

b. *Kimea felez-ro sâf chakkosh zad
K metal-râ straight hammer hit
The intended meaning: ‘Kimea hammered the metal flat.’ (Folli, Harley, and Karimi 2005: 1393)

As argued in Folli, Harley, and Karimi (2005), sâf ‘straight’ cannot be a secondary resultative predicate in this sentence. To obtain a resultative reading, Persian adds a second clause.

(102) Kimea felez-ro chakkosh zad tâ pro sâf shod.
Kimea metal-râ hammer hit till pro straight became
‘Kimea hammered the metal till it became straight.’

Persian resultative constructions have not been thoroughly examined. This is one of the interesting syntactic areas that needs some attention.

7.7 Raising and control
The discussion of raising and control constructions within the area of generative linguistics goes back to the early stages of this theoretical framework. English examples of these two constructions are provided in (103a,b).

(103) a. Those children seem [e to be smart] (Raising)
b. Those children decided [e to leave] (Control)

The empty category (e) in (103a) is considered to be the trace/copy of the noun phrase children that has moved into the subject position of the main clause. The main verb does not subcategorize for an external argument (subject), and thus the subject position is empty. This movement is considered to be Case-driven. That is, the infinitive verb in the embedded clause cannot assign Case, and therefore the subject moves into the main clause to receive Nominative Case from the finite matrix verb. The empty category in (103b), on the other hand, is suggested to be PRO by Chomsky and Lasnik (1977) and subsequent work by others. That is, the main verb subcategorizes for an external argument which is co-indexed with PRO, the phonologically empty subject of the embedded clause. As for Persian, the following two sentences exemplify these two constructions.

(104) a. un bachche-hâ be-nazar mi-yâd [CP (ke) e bâhush bâsh-an]
that child-pl to view asp-come-3sg that smart subj-be-3pl
‘Those children seem to be smart.’

b. un bachche-hâ tasmim gereft-an [CP (ke) e be-r-an]
that child-pl decision took-3pl that subj-go-3pl
‘Those children decided to go.’

I start with an overview of the literature on raising constructions in 7.7.1, and continue with an examination of various properties of control constructions in 7.7.2.

7.7.1 Raising
Several authors have argued that Persian lacks raising constructions (Hashemipour 1989; Karimi 1999, 2005; Ghomeshi 2001). This assumption is based on the following facts: (i) the embedded subject may stay in situ (105a); (ii) the embedded verb agrees with the embedded subject even when the latter appears in the main clause (105b); and crucially, (iii) the main verb is inflected for the third-person singular, regardless of the number and person of the raised subject (105b).

(105) a. be nazar mi-yâ-d [CP (ke) bachche-hâ bâhush bâsh-an]
to view asp-come-3sg that child-pl smart subj-be-3pl
‘It seems that the children are smart.’

b. [bachche-hâ]i be-nazar mi-yâ-d/*mi-yâ-n [CP (ke) ei bâhush bâsh-an]
child-pl to view asp-come-3sg/*asp-come-3pl that smart subj-be-3pl

Furthermore, any other phrasal element may move out of the embedded clause into the matrix clause in these constructions:

(106) [ketâb-â-ro]i be nazar mi-yâ-d/*mi-yâ-n [CP (ke) bachche-hâ ei xunde bâsh-an]
book-pl-râ to view asp-come-3sg/*asp-come-3pl that child-pl read subj-be-3pl
‘As for the books, it seems that children have read (them).’

In (106), the object has moved into the matrix clause while the embedded subject is in situ. Similar to (105b), there is no agreement between the verb and the raised object DP. Karimi (2005) suggests that the derived DP in these constructions is in a topic position. This explains why the verb does not agree with it.30

7.7.2 Control
There are several types of control constructions in Persian, represented by the data in (107–10).

(107) Obligatory subject control
Kimeai tasmim gereft [CP ke ei / *Parviz be-r-e]
Kimea decision took-3sg that Parviz subj-go-3sg
‘Kimea decided to go.’

(108) Obligatory object control
Kimea Parviz-roi tashvigh kard [CP ke ei / *Arezou be-r-e]
Kimea Parviz-râ encouragement did that Arezou subj-go-3sg
‘Kimea encouraged Parviz to go.’

(109) Non-obligatory control
Kimeai mi-xâst [CP ke ei / Parviz be-r-e]
Kimea asp-want that Parviz subj-go-3sg
‘Kimea wanted herself / Parviz to go.’

(110) Arbitrary control
bâyad e haghighat-ro goft
must truth-râ said
‘One must tell the truth.’31

30 Darzi (1996) rejects the idea that there is no subject-to-subject raising construction in Persian. His analysis is partially based on the observation that the subject position of the matrix clause can be filled with the demonstrative in ‘this’, which he considers to be an expletive. He also provides evidence indicating that the moved subject has some matrix-subject-like properties with respect to binding and quantifier floating. However, lack of agreement poses a problem for his analysis.
31 For a discussion of Persian arbitrary control see Karimi (2008).

The discussion of control constructions in Persian goes back to Hashemipour (1989). Several other authors have discussed these constructions since then, including Ghomeshi (2001), Darzi (2008a), Pirooz (2008), Karimi (2008), Darzi and Motavallian (2010), Asudeh and Mortazavian (2011), and Ilkhanipour (2014). Two issues have specifically been the centre of attention with respect to these constructions. First, the size of the complement of the control predicate in obligatory and non-obligatory constructions: is this constituent a finite clause (CP), or a verb phrase (vP)? If the latter, what is the nature of the complementizer ke in those constructions? The second question has to do with the nature of the empty subject in the embedded clause in these constructions: is it PRO, as suggested for corresponding English constructions?

Ghomeshi (2001) proposes that the syntactic category of the Persian control complement is smaller than CP. Her proposal is based on several arguments, including the following: first, there is no independent tense in the embedded clause in these constructions, and therefore, there is no Tense Phrase (TP). Second, there is no indirect question in Persian control constructions, and therefore, there is no Complementizer Phrase (CP). Thus she suggests the following structure for Persian control constructions.

(111) [SUBJECTi [VERBcontrol [vP PROi [VERBsubjunctive ]]]]

As for the complementizer ke, she suggests that this element is a clitic in these constructions, hosted by the matrix control predicate. Darzi (2008a) and Karimi (2008) provide counterarguments to Ghomeshi’s analysis, and argue that the embedded constituent in Persian control constructions is CP. Asudeh and Mortazavian (2011) arrive at the same conclusion. As for ke, Darzi convincingly shows that this element is in fact a complementizer. For example, the following sentence, where the adverbial follows ke, would be ambiguous if this element were a clitic attached to the matrix verb. This prediction, however, is not borne out, since the adverbial hamishe can only be interpreted as modifying the embedded predicate.
(112) u hagh na-dâr-e ke hamishe to-râ dar moghâbel-e digarân sarzanesh be-kon-e
s/he right neg-have-3sg that always you-râ in front-Ez others blame subj-do-3sg
‘S/he does not have the right to always blame you in front of others.’ (Darzi 2008a: 114)

Furthermore, ke may lose its vowel in certain contexts. Darzi shows that, in those situations, this element is phonetically attached to the element following it, not the one preceding it.

(113) goft-i ke az . . . → goft-i kaz / *goft-ik az
said-2sg that from
‘You said that from . . .’

Ilkhanipour (2014) suggests that the complement of the control predicate is not a full CP, but a defective one. Employing Cinque’s (1999, 2004) universal hierarchy of functional phrases (FPs), and the idea that adverbials are base-generated in the Specifiers of relevant phrases, she argues that the complement constituent of the control predicate in Persian is a defective CP that lacks value(s) on mood and modal projections. This is evidenced by an example such as the one in (114), in which the complement of the control predicate is incompatible with an evaluative adverb.

(114) *to mi-tun-i (ke) xoshbaxtâne/moteassefâne nâhâr bo-xor-i
you asp-can-2sg (that) fortunately/unfortunately lunch subj-eat-2sg
‘*You can/are able to fortunately/unfortunately have lunch.’ (Ilkhanipour 2014: 330)

As for the nature of the empty category in the subject position of the embedded clause, it is not clear if it can be considered PRO. This is because these constructions do not uniformly exhibit lack of Tense. The presence of tense indicates the presence of Nominative Case, an issue that conflicts with the nature of PRO, since this element is considered to receive Null Case (Martin 2001) in the subject position of an infinitive clause.32

7.8 Other topics in Persian syntax
There are several interesting topics in Persian syntax that have not been studied as vigorously as some others in the past. I briefly discuss some of them in this section, and refer the reader to the existing literature on these topics. Section 7.8.1 is devoted to modality, and the interaction of modals with subjunctive mood. Section 7.8.2 discusses negation and its interaction with modals, followed by an overview of Persian aspect in 7.8.3. Two other topics, namely ellipsis and sluicing, are briefly introduced in section 7.8.4. For more information, refer to Chapter 9.

7.8.1 Modality
Taleghani (2006) provides an extensive analysis of the syntax and semantics of Persian modals. She divides these elements into two groups: verbal and adverbial. The first group is divided into two subgroups: auxiliary modals and complex modals. This is summarized in (115).

(115) Modals
  Verbal
    Auxiliary: bâyad ‘must’, shâyad ‘may’, tavânestan/tunestan ‘be able to’
    Complex: ehtiâj dâshtan ‘to need’, majbur budan ‘to be obliged’, momken budan ‘to be possible’, lâzem budan ‘to be necessary’, ehtemâl dâshtan ‘to be possible’
  Adverbial: motmaenan ‘certainly’, hatman ‘certainly’, ehtemâlan ‘probably’

32  Hornstein (1999) and work thereafter suggest that control and raising constructions have the same syntax. That is, the surface subject in both cases is merged in the embedded clause, and moves into the subject position of the matrix clause. This proposal has several problems, including the fact that the scope interpretation of the subject in these two constructions is quite different: while the subject in a raising construction may receive a wide-​or a narrow-​scope reading, the one in a control construction may only receive a wide-​scope interpretation, indicating that it could not have moved from a lower position to its surface position.

Semantically, Taleghani divides these modals into two major groups: root or event modals that express ability, permission or obligation, and epistemic modals that involve possibility and probability. Root modals consist of two subgroups: those related to obligation and permission from an external source are called deontic, while the ones related to internal ability and willingness are called dynamic. The data in (116) represent the two types of root modals in Persian.

(116) a. Sârâ mi-tun-e dar in emtehân movafagh be-sh-e (Root-ability)
Sara asp-can-3sg in this exam success subj-become-3sg
‘Sara can/is able to pass this exam.’

b. Sârâ mi-tun-e tu xune be-mun-e (Root-permission)
Sara asp-can-3sg in home subj-stay-3sg
‘Sara can/is permitted to stay at home.’ (Taleghani 2006: 12)

The following data represent epistemic modals in Persian.

(117) a. shâyad Sârâ be mehmuni bi-y-âd
perhaps Sara to party subj-come-3sg
‘Perhaps Sara will come to this party.’ (Taleghani 2006: 12)

b. Sârâ momken-e (ke) mariz bâ-sh-e
Sara possible-be-3sg that sick subj-be-3sg
‘Sara may be sick.’ (Taleghani 2006: 13)

One of the interesting aspects of modals is their interaction with the subjunctive mood, expressed by the prefix be- (or bo-, bi-, depending on the phonological context). Taleghani shows that verbal root modals are compatible only with present subjunctives, while verbal epistemic modals are compatible with both present and perfective subjunctives. The data in (116) exemplify the interaction of the present subjunctive with root modals. Those in (118) represent epistemic modals with both present and perfective forms of the subjunctive mood.33

(118) a. Sârâ momken-e be ta’tilât be-r-e
Sara possible-be-3sg to vacation subj-go-3sg
‘It is possible that Sara will go on vacation.’

b. Sârâ momken-e be ta’tilât rafte bâ-sh-e
Sara possible-be-3sg to vacation gone subj.perfect-be-3sg
‘It is possible that Sara has gone on vacation.’ (Taleghani 2006: 32)

Taleghani shows that adverbial modals are not compatible with the subjunctive mood, and can only appear with the prefix mi- (see section 7.8.3 for a discussion of this prefix).

33  For a discussion of the interaction of subjunctive mood with past tense, see Tavangar and Amuzadeh (2009). These authors examine the function of the simple past tense as a grammaticalized exponent of epistemic and deontic modality within a future-​oriented temporal framework.

(119) ehtemâlan Sârâ be in konferâns mi-y-âd
probably Sara to this conference asp-come-3sg
‘Probably Sara will come to this conference.’


Taleghani suggests that verbal modals take a clausal complement. Thus the subjunctive form be- appears on the embedded verb in those constructions. Adverbial modals, however, do not take a complement clause. Therefore the verb, as the matrix predicate, is incompatible with the subjunctive prefix in a non-imperative context.34, 35

7.8.2 Negation
Pollock (1989) proposes that negation is the head of its own functional phrase, similar to tense and agreement. Laka (1994) suggests that the position of the negation phrase within the clause is parameterized cross-linguistically. As for Persian, Taleghani (2006) proposes that the negation prefix na- appears in the head position of the negation phrase, which is located above the Tense Phrase (TP), and is in an Agree relation with a negative feature on the verb, which results in the surface realization of this prefix on the verb. Furthermore, the negation morpheme attaches to the prefix mi-, and is incompatible with the subjunctive morpheme be-.

(120) a. na-raft-am
neg-went-1sg
‘(I) didn’t go.’

b. ne-mi-raft-am
neg-asp-went-1sg
‘I wasn’t going.’

c. *ne-be-raft-am
d. *be-ne-raft-am

According to Taleghani, the underlying structure for (120b) is the one in (121), where AspP stands for Aspect Phrase: (121)

[NegP Neg [TP [AspP Asp [vP [VP V ] V ]]]]

The arrows represent an Agree relation between the verb, on the one hand, and negation, tense (past tense in this case) and aspect (thus the presence of mi-), on the other. As for the incompatibility of the subjunctive prefix with negation, Darzi (2008b) offers a morphosyntactic solution. He proposes the existence of a phrase (PolP) between TP and AspP in Persian. According to him, the head of this phrase is the locus of either the subjunctive or the post-modal negative feature, thus capturing the complementary distribution of the two. The interaction of modals with negation is another interesting topic.

(122) a. Sârâ na-bâyad be in konferâns be-r-e
Sara neg-must to this conference subj-go-3sg
‘Sara needn’t go to this conference.’

b. Sârâ bâyad be in konferâns na-r-e
Sara must to this conference neg-go-3sg
‘Sara must not go to this conference.’ (Taleghani 2006: 140)

34 Rahimian, Najari, and Hesarpuladi (2015) provide a report of the historical development of modal auxiliaries in Persian. They state that Old Persian lacked modals, while Middle Persian employed four modals which had their roots in Old Persian main verbs. They further report that two of those modals are still employed in Modern Persian: bâyestan ‘must’ and tavânestan ‘can’.
35 For a critical review of previous literature on modals, including Taleghani’s work, see Ilkhanipour (2012).

In (122a), the negation prefix is attached to the modal, and receives a wide scope. In (122b), it is attached to the main verb, and receives a narrow scope. Taleghani suggests that Persian negation may take wide or narrow scope over root modals, depending on its position, with the exception of the dynamic root modal ehtiyâj dâshtan ‘to need’, in which case negation takes only a wide scope. She reports inconsistencies with respect to the interaction of epistemic modals with negation.36

7.8.3 Aspect
In this section I concentrate on two elements with respect to aspect in Persian: the prefix mi- and the auxiliary dâshtan ‘to have’. As for mi-, Ghomeshi (2001) suggests that this element represents the ongoing nature of the event. Windfuhr (1979) considers it to refer to a habitual event, while Mahootian (1997) categorizes it as representing both habitual and imperfect aspects. Taleghani (2006) suggests that mi- is an aspect marker that refers to continuity and habituality of an action. Syntactically, she puts this element in the head position of the aspect phrase (AspP), which is realized on the verb by an Agree relation (cf. 121). The auxiliary dâshtan ‘to have’ represents progressive aspect in Persian colloquial language, and is fully inflected, along with the main verb, as in (123). This element requires the presence of mi- on the main verb, regardless of tense.

(123) a. Kimea dâr-e ketâb mi-xun-e
Kimea have-3sg book asp-read-3sg
‘Kimea is reading a book/books.’

b. Kimea dâsht ketâb mi-xund
Kimea had-3sg book asp-read-3sg
‘Kimea was reading a book/books.’



Nematollahi (2015) offers the following example as one of the instances of this paradigm.37

(124) zâheran dâsht-e sabzi mi-xarid-e (Evidential)
apparently have-perf-3sg vegetable asp-buy-perf-3sg
‘Apparently s/he was buying vegetables.’ (Nematollahi 2015: 103)

36 See chapter 5 in Taleghani (2006) for a syntactic analysis of the interaction of root and epistemic modals with negation.
37 Nematollahi (2015) reports that Zhukovskij (1888) was the first to mention this type of progressive form in Persian colloquial language. Nematollahi’s own study is based on data she collected from literary works published between 1907 and 2010. She shows that this progressive form is not specific to the colloquial language, and has been increasingly employed in the literary style as well. See also Dehghan (1972) for a comprehensive discussion of this phenomenon.

Taleghani (2006) suggests that the colloquial progressive construction with dâshtan is an instance of Serial Verb Constructions (SVC). This observation is based on Butt’s (1995) description of the properties of SVCs, some of which are repeated below in (125).

(125) From Butt (1995):
a. A single SVC complex describes a single conceptual event.
b. One verb is not embedded within the complement of the other.
c. The complex takes only one external argument.

The data in (123–4) have all the properties in (125): each sentence involves only one event, there is only one external argument shared by both verbs, and neither verb is embedded within the complement of the other. An intriguing property of Persian progressive constructions with dâshtan is that they may not co-occur with negation.

(126) a. *man na-dâr-am ketâb mi-xun-am
I neg-have-1sg book asp-read-1sg

b. *man dâr-am ketâb ne-mi-xun-am
I have-1sg book neg-asp-read-1sg
Intended meaning: ‘I am not reading a book/books.’

This is an issue worth close examination in future.

7.8.4 Ellipsis and sluicing
Although ellipsis and sluicing have been the subject of much attention with respect to English, the Persian counterparts of these constructions were, to my knowledge, untouched until recently. These two topics are briefly introduced in this section.

Ellipsis
Ellipsis refers to the omission of one or more words from a clause that are nevertheless recoverable in the context of the remaining elements in the sentence. In this section I introduce two articles and one work in progress on Persian ellipsis. Toosarvandani (2009) discusses ellipsis in Persian complex predicates. In this construction, the light verb survives, while the NVE and the object are elided, as in (127).

(127) Sohrāb piranâ-ro otu na-zad vali Rostam [NP piranā-ro otu] zad.
Sohrab shirt.Pl.râ iron Neg-hit but Rostam shirt.Pl.râ iron hit-3sg
‘Sohrab didn’t iron the shirts, but Rostam did.’ (Toosarvandani 2009: 65)

In this construction, the NV element, along with the object, is elided, leaving the light verb stranded. Toosarvandani suggests that this construction is similar to VP ellipsis in English. Eliding the subject and object is yet another type of ellipsis. Sato and Karimi (2015) discuss this kind of ellipsis and show, among other things, that Persian exhibits a subject–object asymmetry with respect to sloppy interpretation of null arguments in these elliptical constructions.

(128) a. Kimea moalem-esh-ro dust dâr-e.
Kimea teacher-3sg-râ friend have-3sg
‘Kimea loves her teacher.’

b. Parviz ham e dust dâr-e.
Parviz also friend have-3sg
Lit. ‘Parviz also loves e.’

The missing argument here allows both strict and sloppy interpretations. In other words, the sentence in (128b) means either that Parviz also loves Kimea’s teacher (the strict interpretation) or that Parviz also loves Parviz’s own teacher (the sloppy interpretation). Turning now to ellipsis of subjects, the example in (129b) illustrates a null subject construction in which the embedded empty subject is anaphoric to the overt subject in the full-fledged antecedent clause in (129a). Unlike null objects, however, null subjects disallow the sloppy interpretation; thus (129b) can mean that Parviz said that Kimea’s friend knows French, but it cannot mean that Parviz said that Parviz’s own friend knows French.

(129) a. Kimea goft [ke dust-esh farsi balad-e].
Kimea said that friend-3sg Farsi know-3sg
‘Kimea said that her friend knows Farsi.’

b. Parviz goft [ke e farânse balad-e].
Parviz said that French know-3sg
Lit. ‘Parviz said that e knows French.’


Based on various syntactic constructions in Persian, Sato and Karimi argue against analyses built on Verb-Stranding VP-ellipsis (VVPE) proposed by Goldberg (2005; among others) for this type of ellipsis in Persian. According to VVPE, the verb is stranded in these constructions, followed by VP deletion (similar to Toosarvandani’s analysis of the complex predicate construction discussed in this section). Sato and Karimi provide arguments showing that the only elements that are missing in these constructions are in fact the arguments themselves, and not a larger constituent. Smith et al. (2016) discuss NVE ellipsis in Persian. They argue that this type of ellipsis, contrary to Toosarvandani (forthcoming), is allowed in this language, although it is restricted by the specificity property of the direct object. The data in (130) exemplify this contrast.

(130) a. Bahâr miz-ro tamiz kard, vali panjera-ro tamiz na-kard
Bahar table-râ clean did-3sg but window-râ clean neg-did
‘Bahar cleaned the table, but didn’t clean the window.’

b. *Bahâr miz tamiz kard, vali panjera tamiz na-kard
Bahar table clean did-3sg but window clean neg-did
Intended meaning: ‘Bahar cleaned a table, but didn’t clean a window.’

Smith et al. argue that this restriction boils down to the fact that copies of specific objects (of type ) can convert into bound variables (Fox 2002; Sauerland 2004), thus permitting parallelism when specific objects scramble. Copies of non-specific objects may not convert, leading to a violation of parallelism that is required for ellipsis.

Sluicing
The sluice construction was introduced by Ross (1969). (131) represents a typical example.

(131) John met someone yesterday, but I don’t know who.

It has been argued that the wh-phrase in a sluice construction moves into the Specifier of CP, followed by a process of TP deletion (Ross 1969; Merchant 2001; among others). Thus (132) would be the underlying structure of (131).

(132) [CP who [TP John met (who) yesterday ] ]

Persian exhibits sluicing as well, as attested by the following data.

(133) a. râmin ye chiz-i xaride hads be-zan chi
Ramin a thing-ind bought-3sg guess subj-hit-2sg what
‘Ramin bought something. Guess what.’ (Toosarvandani 2008: 679)

b. kesi man-o hol dâd vali ne-mi-dun-am ki
someone me-râ push gave but neg-asp-know-1sg who
‘Someone pushed me, but I don’t know who.’ (Toosarvandani 2008: 680)

Persian is a wh-in-situ language, as discussed in section 7.2 of this article. Thus the underlying structure of the sluice construction seems to be a mystery at first glance, since the wh-phrase does not move into the Specifier of CP. Building on the focus construction proposed by Karimi (2005), Toosarvandani suggests the following underlying construction for a sentence like (133a).

(134) [FP wh F [TP … (wh) … ]] (Toosarvandani 2008: 700)

In (134), the wh-​phrase has moved into the Specifier of the FP, followed by the deletion of the TP which contains the rest of the sentence.

7.9 Conclusion
This chapter offered an overview of some of the major syntactic and morphosyntactic properties of Persian. I also referred to the existing literature on each topic without getting into the detailed theoretical discussions. Of the topics introduced in this chapter, three have been examined extensively by various grammarians and linguists over several decades: complex predicates (section 7.3), Ezafe constructions (section 7.4.1) and marked objects and the nature of -râ (section 7.5.2). Detailed discussions of these constructions are offered in Chapter 9. Due to the descriptive nature of this chapter, theoretical considerations were not thoroughly discussed, although they were briefly mentioned in some cases. I refer the reader to Chapter 8 for an examination of theories employed in the literature for Persian syntax. Some of the issues introduced in this chapter have not been thoroughly examined in the literature. For example, problems related to complex DPs, specifically with respect to extraposition of the CP out of the complex DP, require close attention. Furthermore, the nature of resultative constructions, and the reason why Persian does not allow secondary predicate constructions such as I hammered the metal flat, need to be examined. Finally, the topics briefly introduced in section 7.8 (modality, negation, aspect, ellipsis and sluicing) have been especially under-studied. Owing to space limitations, many other interesting topics in Persian syntax and morphosyntax were not even touched on in this chapter. There is no doubt that Persian syntax and morphosyntax offer interesting and exciting topics that cry out for further exploration and examination in future work.

chapter 8

OTHER APPROACHES TO SYNTAX
Jila Ghomeshi

8.1 Introduction
In Chapter 7, a description of Persian syntax was presented, primarily from the perspective of the Minimalist framework. This chapter presents an overview of some of the other theoretical approaches that have been taken to Persian syntax. It is not intended to be a comparison of the merits and flaws of the theories themselves, but rather to showcase the way in which aspects of Persian syntax have been addressed within a number of different approaches. The coverage is restricted to the scholarly literature in linguistics that has been written in English. While the focus is on Persian, some of the discussion extends to languages in the same family as well as contact languages. Data and interlinear glosses have been modified, where necessary, to maintain one standard format throughout the article.

8.2 Descriptive approaches to Persian
In the contemporary linguistics literature, theoretical approaches are often contrasted with descriptive approaches, which purport to present the facts about a language in a theory-neutral way. As Dryer (2006) notes, however, even descriptive works are written within a theoretical framework, meaning that there is no such thing as a purely atheoretical description. Dryer argues that the relevant distinction is not between theoretical and theory-neutral work, but between descriptive and explanatory theories. Descriptive theories tell us what languages are like, while explanatory theories aim to tell us why languages are the way they are (Dryer 2006: 207). Moreover, Dryer asserts that a single theory cannot serve both goals simultaneously.

One outstanding example of a truly descriptive, comprehensive, and linguistically informed grammar of colloquial contemporary Persian is Gilbert Lazard’s (1957) Grammaire du Persan contemporain, translated into English by Shirley Lyons as A Grammar of Contemporary Persian (1992). Lazard’s work is a rich resource for anyone interested in Persian, particularly Persian morphosyntax. It combines grammatical description with notes on register and on variation of use. Examples are drawn from both the spoken language and from written texts and are given in Persian orthography with accompanying translations. Lazard’s grammar has been followed by a number of others that have similarly been intended to serve primarily as linguistic description rather than as teaching tools (e.g. Mahootian 1997; Windfuhr 2009a; and Perry 2007 specifically on Persian morphology). A slightly different kind of contribution but still in the descriptive vein is Windfuhr’s (1979) book on Persian grammar with the subtitle ‘History and State of its Study’. Windfuhr presents the linguistic research literature up until the late 1970s, but situates this literature within a historical context and provides references for many of the ideas about Persian morphology and syntax that pre-date the modern linguistic period. Reflecting the growing interest in lesser-studied languages and language documentation, Windfuhr’s (2009b) volume on Iranian languages provides a descriptive chapter on Persian and Tajik (with John R. Perry), alongside grammatical sketches of Western Iranian languages such as Kurdish and Zazaki and Eastern Iranian languages such as the Pamir group.

It is worth mentioning at this point that the reference to ‘contemporary linguistics literature’ at the start of this section does not include research into Iranian languages within comparative-historical philology, a longstanding tradition of scholarship that continues to the present day, primarily in Europe. It has only been within the last decade or so that linguists working on Iranian languages within the North American ‘theoretical’ school and those working within the European comparative-historical tradition have begun to interact in a meaningful way.
This is due in large part to the inauguration of an International Conference on Iranian Linguistics that has taken place biennially since 2005 (see Borjian 2015 on this point). The diversity of scholarly research is reflected in the publications these meetings have produced (see Karimi et al. 2008; Korn et al. 2011; Haig and Jahani 2013). In this chapter, I will try to blur the line between the two traditions, although merely setting up an opposition between ‘descriptive’ and ‘theoretical’ work frames the ensuing discussion within the modern linguistic paradigm rather than the comparative-​historical one. Turning now to the focus of this chapter, which is on work in the more ‘theoretical’ (pace Dryer) area of syntax, we start with linguistic typology. Typological approaches bear some resemblance to traditional grammar in the sense that they do not purport to be a theory of the mind, but rather aim to capture generalizations about languages. Linguistic typology can reveal both what languages have in common and the dimensions along which they vary. Persian, as it turns out, provides rich territory for work of this kind.

8.3 Linguistic typology

The field of linguistic typology can be traced back to Greenberg (1963), who introduced the idea of word order correlations and their potential to reveal universal tendencies across languages (see also Chapters 3, 7, 8, and 15 for more on word order). One well-known set of correlations is based on whether the direct object precedes (OV) or follows (VO) the verb in a

Other Approaches to Syntax    207

given language. In OV languages complements tend to precede heads, so postpositions are expected, while in VO languages complements tend to follow heads and prepositions are expected. A number of other correlations are thought to be associated with OV vs. VO order, although complications arise depending on whether the relevant contrast is between heads vs. complements or between heads vs. phrasal modifiers (see Dryer 1992). Thus, within noun phrases the only robust correlations concern genitives and relative clauses: in VO languages these follow the noun and in OV languages they precede it. As it turns out, these three correlation pairs (adpositions with respect to their nominal complements, and genitives and relative clauses with respect to the nouns they modify) are sufficient to show that Persian is exceptional in its patterning. It is an OV language as shown in (1a), but it exhibits VO properties in that it has prepositions (1b), and genitives and relative clauses that follow the head noun (1c, 1d):

Modern Standard Persian
(1) a. Mina nāma-ro nevesht
       mina letter-om wrote.3sg
       ‘Mina wrote the letter.’
    b. minā be mo’allem gush=kard
       mina to teacher ear=did.3sg
       ‘Mina listened to the teacher.’
    c. barādar-e minā
       brother-ez mina
       ‘Mina’s brother’
    d. nāme-i ke minā nevesht
       letter-rel that mina wrote.3sg
       ‘the letter that Mina wrote’

Persian adpositions are further discussed in Chapter 3. To put Modern Standard Persian in a wider context, it is one of only fourteen OV languages listed in The World Atlas of Language Structures (WALS) that have prepositions, in contrast to the 472 OV languages that have postpositions (Dryer 2013b). Thus it is highly atypical, and this opens up a number of lines of enquiry: How long has Persian been atypical? Are its closest neighbouring languages also atypical in the same way? What other features of the language are implicated?
The answers to these sorts of questions can be found in the typological work pursued by Stilo (2005, 2006) and Dabir-​Moghaddam (2001, 2006, 2012), among others. Stilo (2005) considers geographic proximity as well as genetic affiliation to explain the ‘mixed’ properties of Iranian languages. More specifically, he proposes that languages of mixed typology are often found ‘sandwiched’ (geographically speaking) between languages of opposite syntactic types and thus demonstrate a kind of hybridization. In the case of Iranian languages, they are found in a buffer zone between Arabic and Mediterranean languages which are typically VO, and Turkic, North Caucasian, and Indic languages, which are typically OV, resulting in a mixed typology (Stilo 2005: 38). In Stilo (2006), he narrows his focus to adpositions, which vary across languages in the buffer

zone. He shows that some languages in the Iranian area have prepositions, some have postpositions or circumpositions, and some have more than one type (for more information on circumpositions in Persian, see Chapter 2). Again he demonstrates that areal factors are relevant given that languages to the north, such as those of the Turkic family, are consistently postpositional while languages to the south, including those of the Semitic language family, are consistently prepositional. For further discussion on areal typology, see Chapter 3. Geographic proximity and contact is one explanation for how typologically inconsistent languages arise; however, historical processes are also always at play. Dabir-Moghaddam (2001) looks back to Old and Middle Persian to consider the changes that have occurred over the passage of time. He notes that Persian has changed from being an inflectional language to an analytic one, and has gone from exhibiting relatively free word order to showing more configurational properties. He proposes that along with these changes, Modern Persian is in the process of changing from an OV to a VO language. In the same work, Dabir-Moghaddam briefly considers the expected correlations corresponding to OV and VO word order in three other Iranian languages: Gilaki and Mazandarani spoken in the North of Iran and Kurdish spoken in the Western province of Kurdistan.1 This comparison set is expanded in a later work (Dabir-Moghaddam 2006) to include an additional eight languages: Howrāmi, Vafsi, Laki, Lori, Delijan, Delvāri, Lāri, and Nāini. He takes twenty-four correlation pairs and determines how each language patterns with respect to each pair. From this he identifies the parameters along which all the languages pattern together, which he calls ‘pan-Iranian parameters’, and the parameters along which they vary, which he calls ‘parameters of variation’.
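The sorting of correlation pairs into shared and varying parameters described above can be sketched as a simple partition over a language-by-parameter table. The Python below is only an illustration under stated assumptions: the function name `partition_parameters`, the language labels, and the order values are hypothetical placeholders and are not taken from Dabir-Moghaddam (2006).

```python
from typing import Dict, List, Tuple


def partition_parameters(
    table: Dict[str, Dict[str, str]]
) -> Tuple[List[str], List[str]]:
    """Split parameters into those on which all languages agree and those on which they differ."""
    languages = list(table)
    parameters = list(table[languages[0]])
    shared: List[str] = []
    varying: List[str] = []
    for param in parameters:
        # Collect the distinct values this parameter takes across the languages.
        values = {table[lang][param] for lang in languages}
        if len(values) == 1:
            shared.append(param)
        else:
            varying.append(param)
    return shared, varying


# Hypothetical placeholder data: language labels and order values are invented
# for illustration and do not report any real language's word order.
toy_table = {
    "language_1": {"intensifier/adjective": "order_x", "noun/adjective": "order_x"},
    "language_2": {"intensifier/adjective": "order_x", "noun/adjective": "order_y"},
    "language_3": {"intensifier/adjective": "order_x", "noun/adjective": "order_x"},
}

shared, varying = partition_parameters(toy_table)
print(shared)   # ['intensifier/adjective']
print(varying)  # ['noun/adjective']
```

On this toy table the first list corresponds to what would count as a pan-Iranian parameter and the second to a parameter of variation.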
To take an example, we find that within the nominal domain all twelve languages have the same order of intensifier and adjective, but the order of the noun and the adjective can vary. Within the verbal domain, the order of the verb and the auxiliary verb corresponding to be able is fixed across all twelve languages but the order of a content verb and other auxiliaries can vary. (See also Mahmudweyssi and Haig 2009, who find a similar pattern in four West Iranian languages, where the modal can, like be able, shows a more fixed order with respect to the verb than other modals.) For more information about Kurdish, see Chapter 3. Linguistic typology not only affords us a way of determining the parameters along which languages can be compared and categorized, it has also led to the identification of new substantive concepts that can be used to describe languages. Concepts such as ergativity (further discussed in Chapter 2) or internally headed relative clauses would not be part of our descriptive apparatus if we were restricted to English and other European languages. One such concept, evidentiality, is the subject of a recent volume edited by Johanson and Utas (2000). The papers in this volume address the question of whether Iranian languages (along with Turkic and other contact languages) have such a category in their verbal systems. The category of evidentiality is coded by grammatical elements that serve, loosely speaking, to indicate the evidence a speaker has for an assertion. These markers can code direct evidentiality where the speaker has some sensory evidence (e.g. visual, auditory) for a statement made. Languages of this type cluster primarily among the indigenous languages

1 He does not specify the dialect or variety of Kurdish that he considers.

of North and South America (de Haan 2013). There are also languages in which markers can code indirect evidentiality where the speaker learns of an event after the fact. In this case the category is linked to resultativity and/or perfectivity as the situation described has to have reached an endpoint. There has been growing interest in whether Modern Persian has markers of ‘indirectivity’, and those who claim that it does link indirect evidentiality to the rich tense-aspect system in the language. Consider examples (2) and (3), which show eight ways of indicating ‘past’ tense in Modern Persian (adapted from Jahani 2000: 189–91):

(2) a. simple past                    kardé bud is not used here; see (2c)

As these examples show, Modern Persian has two sets of forms with past time reference. Those in (3a–​d) contain a participial form of the main or auxiliary verb and serve as the Perfect counterparts to the forms in (2a–​d), respectively. Several of the papers in the volume by Johanson and Utas (2000) suggest that the forms in (3) are connected in some way or another to indirectivity; that is, they are more likely to be used when the speaker is reporting on inferred information rather than directly experienced information. As Comrie (2000) notes in his introduction to the volume, Lazard (2000) makes the most definitive claim in this regard. Perry (2000) discusses what he calls the ‘epistemic’ function of the perfect tenses, not only in Persian but in Dari and Tajik as well. Jahani (2000a) reports on an empirical study, which indicates that there may be some variation depending on register, and Utas (2000) provides a historical view suggesting that indirectivity is an innovation in Modern Persian. Another example of an in-​depth look at a single phenomenon for the purposes of typological comparison is found in Haspelmath’s (2004) volume on coordinating constructions, which includes a comprehensive paper by Stilo on coordination in three Western Iranian languages: Vafsi, Persian, and Gilaki. This is a rich work containing information not only on types of coordination, but details regarding the historical origins of the relevant conjunctions, the stress and intonation patterns of the coordinating constructions, and careful attention to issues of style and register in the three languages considered. For example, from the list of the fifteen or so coordinators that Stilo covers, we learn that, despite being written identically in Persian, the unstressed enclitic =ò and the word va are of different origins: the enclitic

=ò is derived from Old and then Middle Persian while va is a loanword from Arabic (Stilo 2004: 273). In terms of their phonological properties, Stilo (p. 280) notes that =ò is encliticized to the word that precedes it and as such is part of the intonational contour of that word. Va is not encliticized and can therefore appear clause-initially. Moreover, =ò is never stressed while va may be.2 While =ò is clearly phonologically enclitic, it acts syntactically as if it is attached to the following word. Compare (4a) in which ye barādar ‘a brother’ and ye xāhar ‘a sister’ are coordinated, with (4b) in which ye xāhar ‘a sister’ is extraposed. We see that the enclitic is also extraposed and attaches to the preceding constituent, which is not part of the conjunctive construction:

Modern Standard Persian
(4) a. ye barādar=o ye xāhar
       one brother=and one sister
    b. xodā ye (dune) barādar dād beh=ésh=o ye xāhar.
       God one clf brother gave to=3sg.obl=and one sister
       ‘God gave him a brother and a sister.’ (Stilo 2004: 280.10)

This phenomenon of extraposition, or shifting, is common among coordinating constructions. For example, with the conjunctive bisyndetic3 coordinators ham . . . (=ò) ham . . . it is more common to extrapose the second coordinand, as shown in (5b), rather than to leave it beside the first coordinand as in (5a). The example in (5b) also involves ellipsis of the main verb:

(5) a. ham ruznāme ham majalle mí-xun-am.
       also newspaper also magazine cont-read-1sg
       ‘I read both newspapers and magazines.’ (Stilo 2004: 318.164)
    b. ham ín=o [mí-xā-m], ham ún=o [ ]
       also this=om cont-want-1sg also that=om
       ‘I want both this and that.’ (Stilo 2004: 319.170)

Similarly, with the disjunctive bisyndetic coordinators yā . . . yā . . . ‘either . . . or . . . ’, there is a preference for NP shifting and verb ellipsis as in (6b):

(6) a. yā barādár=esh yā xód=esh be man goft.
       or brother=3sg.obl or self=3sg.obl to I said
       ‘Either his brother or he himself told me.’ (Stilo 2004: 320.174)
    b. yā medā́d=et=o be man [bé-deh] yā qalám=et=o [ ].
       or pencil=2sg.obl=om to I sbjv-give or pen=2sg.obl=om
       ‘Give me either your pencil or your pen.’ (Stilo 2004: 320.175)


2 Stilo does not explain why he chooses to give the citation form of this enclitic with secondary stress on the vowel.
3 The term ‘bisyndetic’ means there are two coordinators, while ‘monosyndetic’ means there is only one.

Thus in Stilo’s documenting of one deceptively simple type of construction, namely coordination, we see two of the characteristics of Persian that have made it so interesting to study from a mainstream generative (Minimalist) perspective: the shifting of constituents (aka scrambling; see, for example, Karimi 2005) and the omission of constituents (aka ellipsis; see, for example, Toosarvandani 2009).

8.4 Construction Grammar

Construction Grammar differs from other formal theories of syntax in that it treats syntactic constructions as having the same status as lexical items, namely as elements of the grammar. Thus, Construction Grammar blurs the distinction between the grammar and the lexicon (for further elaboration, see, for example, Goldberg and Jackendoff 2004). Fried (2015) identifies at least three strands of Construction Grammar: the one originated by Fillmore and his colleagues and students in the 1980s (e.g. Fillmore 1988), the one that focuses on argument structure and language acquisition (see Goldberg 1995, 2006), and the one concerned with issues of typology (see Croft’s (2001) Radical Construction Grammar). In this section, I will discuss the way in which the notion of ‘construction’ has been used to describe aspects of the syntax of Persian, without delving too much into the issues that distinguish these three strands. One phenomenon that is highly amenable to being viewed from a Construction Grammar perspective is the issue of ‘alignment’ in Iranian languages. Alignment systems reflect the way in which the two arguments of a transitive predicate pattern with the single argument of an intransitive predicate. Following Comrie (1978) I will use the abbreviations S for the argument of an intransitive predicate, A for the agentive argument of a transitive predicate and P for the patient argument, though other scholars adopt different conventions for the same concepts; see Dixon (1994), or Haig (2008) for a discussion of alignment in Iranian languages. Alignment systems, i.e. the patterning of core arguments, are expressed in three ways. Their syntactic realization is through word order, but alignment is also expressed morphosyntactically via Case (often termed ‘flagging’) and/or via agreement and other types of person-agreement markers (termed ‘indexing’).
These three dimensions may themselves diverge so that a language may lose ergative Case marking but still exhibit reflexes of ergativity in its agreement marking. It is for this reason that Haig (2008) advocates for a Construction Grammar approach over others. In seeking to explain how languages, specifically Iranian languages, change over time, he notes that alignment changes can be gradual, taking centuries to take full effect. He argues for an accumulation of small changes, over generations, that can amount to an overall drift, rather than one large-scale change that takes place during language transmission from one generation to another. (The latter view is the one he attributes to Mainstream Generative Grammar.) The history of Modern Persian involves a rise and then fall of an ergative alignment pattern. The ergative alignment pattern arose during the Old Persian period and continued well into Middle Persian and Parthian but was restricted to transitive clauses formed with the past stem form of the main verb. This system remains evident in some

Iranian languages of the present day. For example, in Zazaki, a Western Iranian language, the subject of an intransitive predicate (S) and the object of a transitive predicate (P) both appear in the Direct case and determine agreement on the verb, while the subject of a transitive predicate (A) in the past tense appears in the Oblique case and does not trigger agreement:

Zazaki
(7) a. ti kām ē?
       2sg.dir who cop:pres:2sg
       ‘Who are you?’ (Paul 1998: 72, as given in Haig 2008: 7.3)
    b. wexto ki to āw-i šimit-ā
       at.time that 2sg.obl water-f:dir drink:pst-f:3sg
       ‘When you drank the water [ . . . ].’ (Paul 1998: 91, as given in Haig 2008: 7.4)

Of special interest to typologists is the fact that some Iranian languages exhibit an extremely rare form of alignment whereby the two arguments of a transitive predicate (A, P) share the same form in contrast to the single argument (S) of an intransitive predicate (see Comrie 2013; Dabir-Moghaddam 2012). Comrie (2016) gives the following examples from Payne (1980) to illustrate:

Roshani
(8) a. az=um pa Xaraɣ sut
       1sg.dir=1sg to Khorog go.pst.f
       ‘I went to Khorog.’ (Payne 1980: 158, as given in Comrie 2016)
    b. mu tā wunt
       1sg.obl 2sg.obl see.pst
       ‘I saw you.’ (Payne 1980: 156, as given in Comrie 2016)

Comrie (2013) notes that such examples challenge the functionalist perspective that Case marking serves to distinguish the core arguments that occur within a single clause. Moreover, he points out that the Iranian languages exhibiting this highly unusual system of ‘flagging’ or Case marking are neither genealogically related nor found within the same areal continuum. He concludes from this that there is a predisposition for this type of alignment in Iranian as a whole but that it cannot have arisen from a single historical event. His conclusion is in line with the way syntactic change is viewed within Construction Grammar whereby small-scale construction-specific changes can occur in parallel across languages.
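The logical space of flagging patterns discussed in this section can be made concrete with a short sketch. The Python below is illustrative only: the function name `classify_alignment` and the label strings (in particular `double-oblique` for the pattern in which A and P share a form) are assumptions introduced for the illustration, and the case values are simplified from the glosses in (7) and (8).

```python
def classify_alignment(s: str, a: str, p: str) -> str:
    """Classify a flagging pattern from the case forms of S, A, and P.

    S = single argument of an intransitive predicate,
    A = agentive argument of a transitive predicate,
    P = patient argument of a transitive predicate.
    """
    if s == a == p:
        return "neutral"         # no argument is distinguished
    if s == a:
        return "accusative"      # S and A pattern together against P
    if s == p:
        return "ergative"        # S and P pattern together against A
    if a == p:
        return "double-oblique"  # A and P pattern together against S (assumed label)
    return "tripartite"          # all three arguments are distinct


# Zazaki past-stem pattern as glossed in (7): S and P direct, A oblique.
print(classify_alignment(s="dir", a="obl", p="dir"))  # ergative

# Roshani pattern as glossed in (8): S direct, A and P both oblique.
print(classify_alignment(s="dir", a="obl", p="obl"))  # double-oblique
```

The sketch covers flagging only; indexing through agreement or clitics could be classified in the same way by passing the agreement pattern for each argument instead of its case form.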
Returning to the history of ergativity in the Iranian languages and its residual effects in their contemporary counterparts, Haig (2008) presents diachronic evidence supporting the hypothesis that ergative alignment arose out of a reanalysis of the External Possessor Construction rather than out of a Passive construction. This analysis shifts focus onto the way in which non-​core arguments of predicates, or what Haig terms ‘indirect participants’, have been encoded and interpreted in Iranian languages. Moreover, his work highlights the development of several aspects of Persian syntax that intrigue present-​day linguists. For example, he connects the loss of a rich system of nominal case, one that is reduced to a single

opposition between an unmarked Direct case and an Oblique case by Middle Iranian, and then disappears altogether in some languages such as Persian, to the subsequent rise of innovated case markers in those languages, including the accusative/object marker -rā from the Old Iranian postposition rādiy (Haig 2008: 95–6; see also Bossong 1985 cited therein). For those who work on -rā in Modern Persian the diachronic view contributes a relevant piece of information in that innovated object markers inevitably display differential object marking, meaning that these markers will target objects that rank highly on an animacy hierarchy first (Haig 2008: 159.134). See Chapter 9 for more discussion on -rā and differential object marking. Another area that provides rich territory for theoreticians of various persuasions is the occurrence and distribution of clitics and agreement. Haig (2008) suggests that the simplification of the case system from Old to Middle Iranian might have been ‘compensated for by the massive increase in the use of clitics’ (p. 105, see also pp. 334–8 for a brief overview of the way in which the syntax of clitics has changed throughout the history of Western Iranian, along with references cited therein). The distribution of pronominal enclitics differs widely from language to language within the Iranian group, yet there are persistent ‘family resemblances’. Pronominal enclitics are further discussed in Chapters 3, 6, 9, and 10. Clitics and agreement are both ways of indexing arguments, and indexing is one of the three ways (along with Case and word order) of determining the patterning of core arguments. Unsurprisingly then, Old Iranian not only had ergative case marking but also ergative agreement. Haig (2008) states that this agreement system was lost by Middle Iranian but has been retained somewhat in certain Kurdish dialects.
In Central Kurdish, the system of cross-referencing the A through a clitic pronoun is still robustly attested, while the verbal agreement with the P only surfaces in the absence of an overt P noun phrase in the clause. The example in (9) from the Suleimani dialect illustrates this (see Öpengin 2013; Öpengin 2016; and Haig 2017 for details):

Suleimani
(9) bāŋ=yān kird-im
    call=3pl.clc do.pst-1sg
    ‘They called me.’ (MacKenzie 1961: 109, as given in Haig 2008: 290.327)

In contrast, in Modern Persian agreement on the verb is with A and it is P that is cross-referenced via a clitic pronoun on the constituent preceding the verb. Thus, of the examples in (10), (10b) is closer to (9) than (10a) is in terms of the order of elements, but the arguments indexed by the clitics and agreement are the opposite of those in (9).

Modern Persian
(10) a. davat=am kard-an
        invite=1sg.clc do.pst-3pl
        ‘They invited me.’
     b. davat=ešun kard-am
        invite=3pl.clc do.pst-1sg
        ‘I invited them.’

These sorts of data are mystifying if they are viewed synchronically and in isolation from the rest of the language family. Moreover, the examples themselves are not necessarily representative of their respective languages as a whole. In many Iranian languages there are at least two different patterns of agreement as the history of ergativity in Iranian is also the history of various kinds of splits. Case and agreement patterns have historically been split in Iranian, such that one kind of alignment is found when the verb is in its past form and another is found when the verb is in its present form. Haig (2008: 9–10) takes care to point out that it is the form of the main verb that matters, not past or present time reference. This is evident from languages in which there can be past-time reference with constructions formed on the present stem, such as the Imperfect in the Awroman dialect of Gorani, a West Iranian language. In this case, despite the past-time reference, the alignment is the one associated with present stem forms, i.e. accusative. Given the existence of split systems, it is possible for a single language to have constructions similar to both the one exemplified in (9) and the one in (10). Dabir-Moghaddam (2012), in his survey of alignment systems in several different Iranian languages, as manifested through clitics and agreement, presents data from Talyshi, a North-Western Iranian language, showing precisely this. In (11) below we see that the verbal agreement on a present stem verb is with the single argument of an intransitive verb and the agentive argument of a transitive verb:

Talyshi
(11) a. az umæn-æm
        1sg come-1sg
        ‘I come.’ (Dabir-Moghaddam 2012: 65.122)
     b. az kitob-ə sæn-æm
        1sg book-obl buy-1sg
        ‘I buy the book.’ (Dabir-Moghaddam 2012: 65.123)

When the verb appears in its past stem form, however, there is no agreement on the verb and the agentive argument of a transitive is cross-referenced with a clitic whose host can be the direct object or the verb:

(12) a. man æv-ün=əm zənæ
        1sg 3sg-pl=1sg.clc knew
        ‘I knew them.’
        agentive clitic on direct object (Dabir-Moghaddam 2012: 66.124)
     b. man æv-ün zənæ=me
        1sg 3sg-pl knew=1sg.clc
        ‘I knew them.’
        agentive clitic on verb (Dabir-Moghaddam 2012: 66.125)
     c. æmæ=š zæ
        1pl=3sg.clc hit
        ‘He/she hit us.’
        agentive clitic on direct object (Dabir-Moghaddam 2012: 66.126)

We can note that the examples in (11) resemble Modern Persian, while the examples in (12a) and (12c) bear a closer resemblance to Suleimani in that the A-​marking clitic appears on the

preverbal constituent. However, the possibility of the A-marking clitic appearing on the verb as in (12b), along with the fact that it can co-occur with an overt pronoun, brings the past stem constructions back in line with constructions like (13) from Modern Persian:

(13) man un-ā-ro did-am
     1sg 3sg-pl-om see.pst-1sg
     ‘I saw them.’

The facts in (11)–(13) suggest that rather than looking for language-specific patterns, it may be preferable to look at construction-specific patterns. This point can be reinforced by considering a non-canonical construction in Modern Persian that is based not on the tense form of the verb but on the predicate type. The construction in question goes by an unusually large number of different names: the Experiencer Subject Construction, the Impersonal Construction (Ghomeshi 1996), the Psychological Predicate Construction (Sedighi 2005), the Inalienable Possessor Construction (Karimi 2005), Impersonal Complex Predicates (Karimi 2013), and Pronominal Complex Predicates (Kazaminejad 2014). In this construction, the ‘subject’ is expressed as a pronominal enclitic on the constituent preceding the main verb, and the verb itself takes third-person agreement (see also Chapters 3, 7, and 15):

Modern Standard Persian
(14) a. yad=esh raft
        memory=3sg.clc went
        ‘S/he forgot.’
     b. yad=esh mi-r-e
        memory=3sg.clc cont-go-3sg
        ‘S/he is forgetting.’

The superficial resemblance between this construction and the one in (12c) revolves around the lack of agreement on the verb and the fact that the pronominal enclitic must appear on the preverbal constituent. However, the Experiencer Subject Construction is not limited to the past stem (as (14b) shows) but rather to predicates of psychological or physical states (being warm or cold, liking something, etc.). And the clitics in these constructions mark ‘non-canonical subjects’ (in Haig’s terms) in that they index experiencers rather than agents. The type of analysis this construction has received within the formal/Minimalist literature is found in Sedighi (2011), to name but one example. Sedighi accounts for the properties of this construction by proposing that the experiencer is merged via an Applicative Phrase (ApplP), i.e. a projection that is used for merging applied (rather than core) arguments.
While such an analysis links the Experiencer Subject Construction in Modern Persian to similar phenomena in other languages (e.g. oblique marked subjects that are non-​agentive, see Cuervo 2003; Rivero 2004), it has less to say about the resemblance between this construction and the patterns (or remnants) of ergativity found in related languages. Notable exceptions to the charge that formal work looks neither far enough back in time, nor broadly enough across language varieties, are emerging however. Karimi’s (2013) analysis of the experiencer argument as undergoing Possessor Raising, for example, links the clitic-​licensing of the experiencer to the licensing of the external argument (i.e. the subject)

in past transitive clauses in Kurdish (see Karimi 2013: 121, fn. 6 and references therein; see also Kazaminejad 2014, who adopts a blend of formal, functional, and corpus approaches to propose that the Experiencer Subject Construction is an instance of Haig’s (2008) External Possessor Construction). Thus the true issue may be that many of the relevant insights from diachronic and typological syntax are simply too recent to have been incorporated into formal work yet. So far we have seen that Construction Grammar has been used to explain the types of changes that have led to the variation in case marking and indexing across Iranian languages. Haig (2008) argues that the separation of case, agreement, and cliticization into distinct components of the grammar, such that each can change independently of the other,4 leads to far greater variation across languages than if a single parameter is held responsible for a change from ergative to accusative alignment, say. An additional point being made here is that this framework is also a useful way to compare and contrast types of constructions in the languages as they are spoken today. Mapping the network of possible patterns enriches the information associated with each particular one. We now turn to another phenomenon that has been fruitfully discussed within a Construction Grammar approach: Persian complex predicates. One of the earliest works in this area is by Goldberg (1996, 2003), who argues that complex predicates can be lexically listed but syntactically non-atomic, hence their status as a ‘construction’. Her work in Construction Grammar in general and on Persian complex predicates in particular has resulted in a number of other publications within Head-Driven Phrase Structure Grammar (HPSG) and Construction Grammar (see, for example, Müller 2010). Complex predicates are further discussed in Chapters 2, 3, 7, 9, 10, 15, 17, and 19.
The focus of the next section, however, is on the intersection of Construction Grammar with Cognitive and Functional Linguistics.

8.5 Cognitive and Functionalist approaches to syntax

It should be noted at the outset that even though I have presented Construction Grammar in a separate section of this chapter from Cognitive and Functional approaches, these frameworks are not incompatible with each other and linguists often see themselves working within all three (see Tomasello 2014: vii–xiii, on this point). Cognitive Linguistics is situated firmly at the explanatory rather than descriptive end of the spectrum set out at the start of this chapter. It differs, however, from other explanatory theories in that it doesn’t view language as a separate or autonomous cognitive faculty (see, for example, Croft and Cruse 2004; Langacker 2008). The tools used to explain linguistic phenomena within Cognitive Grammar include the concepts of metaphor, metonymy, and polysemy, which hold not only of words but of morphemes, grammatical constructions, and of cognition in general. Key also to Cognitive Grammar is the idea of linguistic categorization that draws on the notion of prototypes and fuzzy boundaries for category membership (Taylor 2003).

4 The order of change is not free. Haig (2008: 303.347) suggests that case changes before agreement. This means that in a shift from ergative to accusative alignment, it is not possible for a language to retain ergative alignment in its agreement marking while having accusative alignment in its case marking.

Let us return to complex predicates, or light verb constructions, which have gradually been replacing simple verbs since the beginning of Modern Persian (Natel-Khanlari 1986, as cited in Family 2014: 15). Under a Minimalist approach these constructions raise questions regarding whether they are formed ‘in the lexicon’ or ‘in the syntax’, and among their perplexing properties is the fact that they exhibit a degree of syntactic transparency, albeit limited, regardless of whether they are semantically transparent or opaque. As noted at the beginning of section 8.4, Construction Grammar does not draw a sharp distinction between lexicon and grammar. Rather than attributing the building of words and phrases to different modules (e.g. lexicon vs. syntax) both are seen as generated by rules, but those rules can range from very general to completely idiosyncratic (Family 2014: 21). This cline is based in large part on semantic transparency. Thus, drawing on work in Cognitive Linguistics, the types of questions that might be asked about non-compositional light verb constructions under this approach revolve around productivity and predictability: How is it that some light verbs appear to be more productive than others? How do Persian speakers know which light verb to use when producing novel verbal notions? More generally, a cognitive approach also concerns itself with how the semantic space occupied by light verb constructions is organized. These and other issues are addressed clearly in Family (2014, see also Family 2011, 2006). She takes what she calls a bottom-up approach to studying light verb constructions in that she considers the hundreds of collocations possible with each light verb. Her lists are extensive and in themselves provide a valuable resource for those interested in the topic.
In considering the range of meanings that can be associated with constructions based on a single light verb, Family shows that the issue of compositionality is not black and white. Constructions can be more or less compositional with the meaning coming as much from the construction itself as from its individual components. Consider the following data in which zadan ‘to hit’ in (a) and keshidan ‘to pull’ in (b) are used to create a variety of meanings: Modern Standard Persian (15) a. dād zadan rang zadan chaman zadan gaz zadan jib zadan chaman zadan b. dād keshidan ghalyān keshidan alam keshidan abru keshidan

yell hit paint hit grass hit bite hit pocket hit grass hit shout pull hooka pull banner pull eyebrow pull

‘yell’ ‘paint’ ‘trim grass’ ‘bite’ ‘pick a pocket’ ‘trim grass’ (partial selection from Family 2014: 22.11) ‘shout’ ‘smoke a hooka’ ‘hoist a banner’ ‘draw in eyebrows’ (partial selection from Family 2014: 51.50)

The data in (15) show that there is considerable variation in the way in which a light verb construction can be non-​compositional. The meaning of the verb zadan ‘to hit’ is more or less evident in the various combinations given in (15a) and the resulting verbal notions can be transitive, as in rang zadan ‘to paint’, or intransitive as in dād zadan ‘to yell’. Moreover,

Family notes that these constructions cluster into groups based not only on the properties of their constituent parts but also on real-world knowledge. Thus zadan ‘to hit’ can combine with any substance that can be transferred into an object via a nozzle (see 16a), but when it combines with roghan ‘oil’, the construction has a different meaning, as oil is not added to a car through a nozzle:

(16) a. benzin zadan    gas hit       ‘pump with gas/petrol’
        gāsoil zadan    diesel hit    ‘pump with diesel’
        bād zadan       wind hit      ‘fill with air’ (e.g. a tyre)
        (Family 2014: 46.43)
     b. roghan zadan    oil hit       ‘spread oil on a surface’

Family (2014) refers to groups of collocations that express very similar meanings as clusters of productivity (p. 22, see also pp. 46–55). Clusters of productivity are groups that are based on a single light verb, the selectional restrictions on the preverbal element it occurs with, and the resulting constructional meaning. Gradience, a key notion of Cognitive Linguistics that can be applied to the semantic compositionality of clusters, is also evident when assessing their productivity. Consider the following two sets of clusters, again based on zadan ‘to hit’:

(17) a. harf zadan      letter hit       ‘speak, talk’
        laf zadan       speech hit       ‘vaunt’
        gap zadan       chat hit         ‘chat’
        ver zadan       gibbering hit    ‘gibber’
        zer zadan       drivel hit       ‘talk nonsense’
        (Family 2014: 87)
     b. gitār zadan     guitar hit       ‘play the guitar’
        piāno zadan     piano hit        ‘play the piano’
        violon zadan    violin hit       ‘play the violin’
        tombak zadan    tonbak hit       ‘play the tonbak’
        santur zadan    santur hit       ‘play the santur’
        (partial selection from Family 2014: 85)

In terms of token frequency, i.e. the number of times each of these constructions might be used in a corpus, harf zadan ‘speak’ is probably at the top of the list. However, the cluster it is a part of, one that encodes emission of speech (the set in (17a)), is of low productivity since new ways of referencing speech acts are not that common (Family 2011: 24). In contrast, the cluster represented in (17b) is of high productivity as any new noun, X, referring to a musical instrument can combine with zadan, to form the corresponding verb ‘play X’ (see also Family 2014: 214). As an additional point regarding the way principles of Cognitive Linguistics can shed light on light verb constructions, let us consider the meaning of zadan itself in (17a) vs. (17b). We could propose that the verb means ‘play’ in (17b) and could arrive at some comparable but different meaning of zadan for the examples in (17a) along with the examples in (15a). This leads to a list of lexical entries bearing a somewhat random and arbitrary connection to one another. Cognitive Linguistics starts with the notion that polysemy is the norm rather than

the exception and focuses on the way in which the meanings of lexical items can be extended in a motivated way to participate in novel patterns and structures. Family (2014: 71) gives a map of the semantic space of zadan such that its ‘emitting’ sense can be subdivided into the visual (barq zadan ‘shine’, lit. shine hit) and the aural (jez zadan ‘sizzle’, lit. sizzle hit) and its ‘piercing and transferring’ sense can be subdivided into a fuelling sense (benzin zadan ‘fill with gas’, lit. gasoline/petrol hit) and an injecting sense (āmpul zadan ‘get a shot’, lit. shot hit). Each of these divisions can be motivated on semantic grounds. The principle of polysemy alongside constructional meaning permits us to treat zadan as the same linguistic unit in all cases: one that participates in a variety of constructions, or clusters, contributing a more or less semantically articulated meaning.

We have seen how Cognitive Linguistics can handle gradient phenomena, but another area that presents a challenge to formal syntactic approaches is optionality, that is, where there are two seemingly equivalent grammatical expressions available to express a single meaning. Here, approaches that consider utterances in terms of their communicative function and their connection with other cognitive faculties have the potential to shed light on why speakers make the choices they do in particular contexts. Consider the following examples taken from Sharifian and Lotfi (2003) and (2007), respectively:

Modern Standard Persian
(18) a. shekar-ā-ro   rikht-i          ru   miz
        sugar-pl-om   spill.past-2sg   on   table
        ‘You spilled the sugar on the table.’ (Sharifian and Lotfi 2003: 235.13)
     b. shekar   dākhel-e   un     ghuti-e
        sugar    in-ez      that   box-3sg.cop
        ‘The sugar is in that box.’ (Sharifian and Lotfi 2003: 235.13)

(19) a. belakhareh   diruz       ketāb-ā   bā     post   resid-ø/resid-an
        finally      yesterday   book-pl   with   post   /arrived.pst-pl
        ‘Finally the books arrived yesterday.’ (Sharifian and Lotfi 2007: 796.13a)
     b. belakhareh   diruz       mehmun-ā   *resid-ø/resid-an
        finally      yesterday   guest-pl   * /arrived.pst-pl
        ‘Finally the guests arrived yesterday.’

In (18) we see that the mass noun shekar ‘sugar’ can sometimes appear with plural marking but without the kind of coerced plural reading that sugars in English would get (e.g. types of sugar, or defined quantities of sugar). In (19a) we see that the plural inanimate subject ketābā ‘books’ may or may not trigger plural agreement on the verb, while in (19b) we see that singular agreement is not possible if the subject is animate (human) and plural. Sharifian and Lotfi (2003, 2007) do not consider these data to exemplify true ‘optionality,’ but rather how speakers conceptualize events. Drawing on Langacker’s (1990) discussion of ‘schematicity’, or degrees of resolution, Sharifian and Lotfi argue that speakers may choose (18a) if conceptualizing sugar at a high level of resolution, i.e. in terms of its individual granules, perhaps because the situation involves a scattering of sugar. Similarly, a plural subject like ketābā ‘books’ may be conceptualized at a lower degree of construal resolution, i.e. as a whole rather than as individual parts, perhaps because books are typically sent in a parcel or a box.

This type of ‘conceptual-functional’ approach (as Sharifian and Lotfi call it) can be contrasted with formal approaches to the same phenomena. For example, Ghaniabadi (2012) considers plural marking and definiteness to be features that bundle together in Persian such that the appearance of plural on mass nouns may be marking definite quantities. Sedighi (2005), also taking a featural approach, proposes that a number feature may fail to be spelled out in the context of the feature marking inanimates. In both of these cases, the research looks at the way in which features interact in Persian using Minimalism and Distributed Morphology to formalize the principles at play. However, the use of features themselves is not ultimately incompatible with a conceptual-functional approach in that features may represent grammaticalized elements that in turn represent conceptual notions.

Returning to Sharifian and Lotfi (2003, 2007), they support their claim that the choice of plural marking on mass nouns and singular agreement with plural inanimate subjects is driven by the communicative intention and conceptualization of Persian speakers, by setting up a number of ‘tasks’ for groups of native Persian speakers to complete. Such tasks involve describing a picture or being presented with a scenario and being required to complete it with one final sentence. Their results suggest that for each construction there are contexts in which it is preferred and contexts in which it is dispreferred. In other words, the constructions are not in free variation with each other, as would be expected if we were dealing with true optionality. In a broader sense, the type of research Sharifian and Lotfi are pursuing opens the door to considering the other types of communicative functions (e.g. formality, respect) that might be at play when speakers opt for one formulation over another, something that is best determined by looking at linguistic behaviour over groups of speakers, as they do.

One more example of optionality in Persian involves the pronominal enclitics in their function of indexing the direct object when they appear on the main verb. Typically they appear when there is no overt direct object; however, they can co-occur with a direct object as well, and in this case their appearance seems to be related to information structure. Bahrami and Rezai (2014) explore the factors affecting the indexing of overt direct objects using Role and Reference Grammar, primarily because it is a theory that incorporates the way in which a speaker takes into account the addressee’s knowledge at the time of utterance. This in turn affects the way in which a noun phrase is coded in an utterance. Bahrami and Rezai use data gathered from a corpus of standard spoken Persian to look at the topicality of indexed direct objects and their order with respect to other constituents in the clause. They find that indexed direct objects are both highly topical and quite mobile within the clause. In the next section we will look at other work that similarly uses data drawn from corpora to look at properties of direct objects in Persian.

8.6  Corpus approaches

Corpus Linguistics is not so much an explanatory theory in Dryer’s sense (see introduction to this chapter) as it is a methodology and a commitment to working with data beyond what one native speaker-linguist alone can generate. Corpus linguistics involves using databases and corpora of all kinds along with questionnaires and experimental results. In some cases, a corpus of spoken language can be a source for documenting construction types. An early work of this type, before corpus linguistics was so named, was carried out by Frommer (1981), in which spoken data was analysed in order to determine how robustly Persian conforms to a verb-final pattern. The answer, based on statistical analysis, was that it is far ‘less verb-final’ than many grammars and textbooks would have us believe. More recently, Stilo’s (2010) article on ditransitive constructions in Vafsi draws on a corpus of spoken Vafsi generated by his own linguistic fieldwork in Iran. The corpus provides a resource for identifying the three ways in which ditransitives can be coded in Vafsi: via a double object construction, via an indirect object construction, and via the indexing of the recipient with oblique person-agreement marking on the verb. Stilo discusses each construction in detail using only naturally occurring utterances.

Corpora serve not only to document or describe constructions but also to test hypotheses and predictions. In recent work, Faghiri and Samvelian (2014; F&S hereafter) consider the relative order of direct and indirect objects in Persian using the Bijankhan corpus, a corpus collected from daily news and common texts (F&S 2014: 225 and references therein). Their findings provide a richer picture of the relationship between the verb and its objects (direct and indirect) than has been previously understood. For instance, it has been noted that Differential Object Marking (DOM) in Persian correlates with word order such that whether or not a direct object is marked with -rā affects its order with respect to an indirect object (see Chapter 9 for more discussion on DOM). Specifically, it has been claimed that -rā-marked objects precede the indirect object and those not so marked follow it (see Faghiri and Samvelian 2014: 224 citing Ghomeshi 1997; Karimi 2003; Ganjavi 2007 in this regard). F&S show that the facts are more nuanced in that it is not only marking by -rā, but the degree of ‘determination’ of the direct object which affects its order with respect to the indirect object. Thus indefinite objects, which in Persian may involve numerals or the indefinite suffix -i for example, are more likely to pattern with -rā-marked objects than with bare nominals. This scalar or continuum-like result is not easily accommodated within theories that categorically associate one type of nominal with one position (see also Ganjavi 2011, who shows that there is at least a three-way split in the syntactic patterning of definite, indefinite, and bare objects).

F&S (2014) also use their corpus research to show that there are length effects, primarily with indefinite objects. Contra the view, from sentence processing and production, that shorter constituents will precede longer ones, they find a tendency for ‘long before short’ order, which has similarly been posited for other head-final languages (see Faghiri, Samvelian, and Hemforth 2014: 208 and references therein). F&S argue that this preference shows the significance of conceptual factors in determining word order rather than categorial ones (i.e. factors related to the form of the object). In subsequent work they support their claims with an experimental study (see Faghiri, Samvelian, and Hemforth 2014).

Corpus approaches have become increasingly viable with advances in technology and easy access to data via the internet. They in turn have revealed far more variation in word order and sentence structure than had been previously assumed and/or predicted within purely theoretical models. This trend is bound to continue for the foreseeable future.
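The kind of tally that underlies corpus findings on object order can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the annotation format (a direct-object type paired with an observed order), the function name, and every record in the toy data are invented for the example. They are not drawn from the Bijankhan corpus and do not reproduce F&S's counts.

```python
from collections import Counter, defaultdict

# Hypothetical annotations: (type of direct object, observed order).
# "DO-IO" = direct object precedes the indirect object; "IO-DO" = the reverse.
# Every record below is invented for illustration only.
TOY_ANNOTATIONS = [
    ("ra-marked", "DO-IO"),
    ("ra-marked", "DO-IO"),
    ("ra-marked", "IO-DO"),
    ("indefinite", "DO-IO"),
    ("indefinite", "IO-DO"),
    ("bare", "IO-DO"),
    ("bare", "IO-DO"),
]


def order_proportions(annotations):
    """Return, for each object type, the share of each attested order."""
    counts = defaultdict(Counter)
    for object_type, order in annotations:
        counts[object_type][order] += 1
    proportions = {}
    for object_type, orders in counts.items():
        total = sum(orders.values())
        proportions[object_type] = {o: n / total for o, n in orders.items()}
    return proportions


if __name__ == "__main__":
    for object_type, shares in order_proportions(TOY_ANNOTATIONS).items():
        print(object_type, shares)
```

Reporting a proportion per object type, rather than a single categorical rule, is what allows a scalar or continuum-like pattern to become visible in the first place.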


8.7  Formal approaches

The formal approaches not yet covered in this chapter are of two types: theories that do not have many adherents who work on Persian; and Minimalism, which has comparatively many. The description of Persian syntax given in Chapter 7 provides ample reference to the Minimalist literature, so I will conclude this chapter with a brief survey of non-Minimalist approaches. Before doing so, however, it should be noted that Minimalism itself dates only from the early 1990s with the publication of Chomsky’s (1993) essay on the Minimalist Program for linguistic theory. The Minimalist Program evolved from Government and Binding (GB) Theory, in the sense that key concepts and ideas were reformulated. GB in turn replaced Transformational Grammar (TG), the theory originated by Chomsky (see Chomsky 1957, 1965 for the beginnings of Transformational Grammar; Chomsky 1981a&b for a key presentation of Government and Binding Theory).5 The one common denominator that has distinguished each iteration of what is now called the Minimalist Program from almost all other formal theories is that it relies on derivation, i.e. on the notion that structures can be transformed or elements can be moved in order to obtain a grammatical construction.

During the TG and GB periods, there were relatively few journal articles published on Persian, even though scholarly articles are perhaps the dominant way to disseminate research results in linguistics. This reflects the fact that the number of scholars who had been trained within the TG and, later, GB paradigms and who were working on Persian was small. Nevertheless, in the handful of articles that appeared in the 1970s, themes of future research were identified. Browne (1970) argued for -rā as a marker of specificity, rather than definiteness, an issue that became a focus of debate and enquiry for several decades. Moyne (1974a) argued that there is no passive in Persian and that constructions formed with shodan ‘to become’ are instead inchoative, and Moyne and Carden (1974) presented a transformational account of the doubling of subjects with an emphatic reflexive element, a phenomenon that has yet to receive its corresponding Minimalist account.

The history of generative formal work on Persian within TG/GB is to be found not in the leading journals of the time, but instead in the doctoral dissertations published in the 1970s and 1980s.6 Moyne (1970) wrote his dissertation on verbal constructions in Persian at Harvard University while the University of Illinois hosted a number of scholars who completed PhDs in Persian syntax. Soheili-Isfahani (1976) wrote a dissertation on noun phrase complementation, Hajati (1977) on ke-constructions, and Dabir-Moghaddam (1982) wrote his dissertation on causative constructions, in which he argued, among other things, that Persian does have a passive construction, contra Moyne (1974). One of the key developments between TG and GB theory was the introduction of X-bar theory (see Chomsky 1970; Jackendoff 1977). Samiian’s (1983) dissertation was one of the first to implement a strict X-bar theoretic approach for Persian and to explore the consequences, particularly for the Ezafe construction. Meanwhile the government and binding principles that gave GB theory its name were put to good effect by Karimi (1989) and Hashemipour (1989) respectively. Karimi (1989) explored, among other things, the limits that the principles of government imposed on Case and movement operations while Hashemipour (1989) considered the licensing of empty pronominal categories. Dissertations that were underway in the early 1990s, when early versions of the Minimalist Program were in circulation, continued to use a GB approach. Thus Darzi (1996) on raising and control constructions, Ghomeshi (1996) on Case, agreement, and the Ezafe construction, and both Vahedi-Langrudi (1996) and Karimi-Doostan (1997) on complex predicate constructions drew more on principles of GB than of Minimalism in the analyses they undertook.

We turn now to theoretical approaches that have developed and co-existed alongside Minimalism and its precursors. Role and Reference Grammar (RRG; see van Valin 1993, 2005) is a theory that incorporates information structure (Lambrecht 1994) and related pragmatic notions into the core of the grammar. This approach is particularly useful for research that seeks to go beyond sentences in isolation in order to consider the properties of connected discourse. In one such work, Roberts, Barjasteh, and Jahani (2009) analyse Persian narrative text looking at, among other things, the activation of referents in a discourse, the coding of participant reference, and the discourse-pragmatic structuring of sentences. Their analysis reveals the way in which syntactic structure is informed by discourse structure.

5  While nomenclature is not the main point here, the discussion would not be complete without noting that Government and Binding Theory is also known as the Principles and Parameters approach and that Transformational Grammar comprises three distinct periods: standard theory; extended standard theory; and revised extended standard theory.

6  In this and the paragraph that follows, I do not mean the list of dissertations that I provide to be exhaustive. There are many excellent PhD dissertations written between 1970 and the mid-1990s that are not mentioned here for reasons of space but that are easily found by searching a good university library.
In a similar vein, the paper by Bahrami and Rezai (2014), which was briefly discussed in section 8.5, shows that factors such as identifiability of the referent and topicality are at play when an overt object is ‘doubled’ or indexed by a pronominal clitic on the verb in Persian. (For other representative RRG work, see Rezai 2003.)

Head Driven Phrase Structure Grammar (HPSG; see Pollard and Sag 1994) is a formal theory in which lexical entries are formalized as highly articulated feature matrices. These features, along with a system of constraints on them, are employed to explain syntactic phenomena. Taghvaipour (2004, 2005a&b) uses HPSG formalism to account for gaps vs. resumptive pronouns in restricted and free relative clauses, respectively. Because of its focus on the features that comprise inflectional morphology, HPSG is well suited to handle questions regarding the status of Persian morphemes. This is reflected in a number of published works. For instance, Samvelian (2007) argues that the Ezafe vowel is best analysed as a phrasal affix and Samvelian and Tseng (2010) explore the question of whether object clitics in Persian are truly clitics or inflectional suffixes, both within the HPSG framework. Bonami and Samvelian (2015) use both HPSG and Paradigm Inflectional Morphology (Stump 2001) to provide a lexicalist account of periphrastic verbal constructions in Persian. These works are highly formal in that they employ detailed formalism to model pieces of inflection and use the resulting analyses to answer theory-internal questions regarding the division of labour between syntax and morphology, periphrasis vs. valence-reducing constructions, and clitics vs. affixes.

Lexical Functional Grammar (LFG; see Bresnan 1982, 2001; Dalrymple et al. 1995) is similar to HPSG in that it is a monostratal (i.e. non-transformational) generative theory with a richly structured lexicon. It observes the strong Lexicalist Hypothesis whereby there is a one-to-one mapping between words and nodes in a syntactic representation. That is, there cannot be two words under one node nor can there be empty nodes in the syntax. Within this framework, the research originating with Butt (1995) on Urdu complex predicates has inspired some similar investigation on Persian complex predicates (see, for example, Nemati 2010).

Optimality Theory (OT; see Prince and Smolensky 1993) began as a constraint-based theory of competition in which there was one winner, but has been adapted to permit the modelling of gradience in grammar. Adli (2010) uses gradient grammaticality judgements obtained via an experimental approach and statistical methods to formulate a set of preference constraints on Persian wh-questions, constructions that permit a wide range of possible word orders. OT is also insightfully used in Aissen (2003), a work that is not on Persian specifically but on Differential Object Marking (DOM) and which is therefore of great relevance to those interested in -rā (see Chapter 9). Aissen is able to capture the variation in DOM across languages by using ranked constraints that express two opposing principles: one of iconicity and the other of economy.

The Minimalist literature on Persian syntax is well-surveyed in Chapter 7 of this volume. Those who characterize the Minimalist approach as formal-mathematical (e.g. Tomasello 2014: xx) see, perhaps, a daunting formalism that is impenetrable to those who are not practitioners of the theory. Those within the theory working on clausal architecture, discontinuous dependencies, and/or features and their geometries, are keenly interested in questions of design and how best to formulate cross-linguistic principles that apply to all, not just a few, languages. In some sense, then, Minimalist work isn’t about Persian but about a theory of language to which the work on Persian contributes. Syntactically speaking, there is enough about Persian to make a significant and exciting contribution.
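The one-winner competition assumed in classic Optimality Theory, mentioned earlier in this section, can be sketched as follows. The sketch covers strict-domination evaluation only (no gradience), and the constraint names, the two candidates, and their violation counts are placeholders invented for the illustration; they are not Aissen's (2003) actual constraints or data.

```python
def evaluate(candidates, ranking):
    """Return the candidate with the best violation profile under strict domination.

    candidates: dict mapping a candidate label to {constraint name: violations}
    ranking:    constraint names ordered from highest- to lowest-ranked
    """
    def profile(label):
        # Violations listed in ranking order. Tuples compare left to right,
        # so a higher-ranked constraint always outweighs all lower-ranked ones.
        return tuple(candidates[label].get(c, 0) for c in ranking)

    return min(candidates, key=profile)


# Hypothetical candidate set: an object with an overt marker vs. a bare object.
# ECONOMY penalizes overt marking; ICONICITY penalizes leaving the object unmarked.
CANDIDATES = {
    "object+marker": {"ECONOMY": 1, "ICONICITY": 0},
    "object": {"ECONOMY": 0, "ICONICITY": 1},
}

if __name__ == "__main__":
    print(evaluate(CANDIDATES, ["ICONICITY", "ECONOMY"]))
    print(evaluate(CANDIDATES, ["ECONOMY", "ICONICITY"]))
```

Reversing the ranking of the two constraints reverses the winner, which is the basic mechanism by which re-ranking opposing principles can model variation across languages.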

8.8  Conclusion

In this chapter, I have surveyed theoretical approaches to Persian syntax while noting that the division between descriptive and theoretical work on a given language is not all that clear-cut. The survey is intended to show the kind of work on Persian syntax that has been undertaken within a number of different approaches. However, there are two other observations that emerge from looking at scholarly work across time and from outside the borders of any particular theory. First, this sort of wide view provides us the opportunity to understand the rise and fall of certain theories within a wider context. For example, we can see that the technological advancements of the last two decades have stimulated an increased interest on the part of researchers in corpus linguistics and other approaches that depend on quantitative analysis. On the other hand, the rise of Minimalism has led to a sharp decrease in interest in working within Government and Binding Theory. The second observation we can make is that what is interesting about a language is in large part theory-dependent. A cognitive linguist is more likely to be interested in the semantics of complex predicates and less interested in their scrambling possibilities than a formalist might be. A corpus linguist is more likely to investigate the statistical probability that one word order will occur over another than a typologist, who in turn might be more interested in how a given word order correlates with other properties of the language. What this reveals is that a healthy diversity in our theoretical approaches to a language is bound to yield a richer picture of the language itself.


Acknowledgements

I would like to thank Neiloufar Family and Geoffrey Haig for agreeing to read an earlier draft of this chapter and for their very helpful comments. I would also like to thank the editors of the volume and an anonymous reviewer for their feedback and direction. All errors and omissions are my own.

chapter 9


The Ezafe construction, differential object marking, and complex predicates

Pollet Samvelian

9.1  Introduction

Three main aspects of Persian syntax have received a great deal of attention for more than thirty years: the Ezafe construction, differential object marking with the enclitic =rā, and complex predicates. Why such enduring interest? Each of these phenomena involves language-specific challenging facts which need to be accurately described and accounted for. At the same time, each constitutes a topic of cross-linguistic investigation for which the Persian data can be of crucial interest.

The Ezafe construction, a specific feature of the noun phrase in many Western Iranian languages, sheds new light on the way dependency relationships, that is, complementation vs. modification, are realized within the noun phrase and the morphological correlates of these relationships with respect to head vs. dependent marking patterns. It also contributes to the debate on the nature of linkers in a variety of languages.

Differential object marking (DOM) with the enclitic =rā displays a complex interaction between various semantic and discourse parameters such as referentiality, topicality, and high transitivity. Modelling the interaction between these parameters in order to account for the occurrence of =rā has been, and still is, an interesting challenge for formal and theoretical studies on Persian. Cross-linguistically, rā-marking is of great interest for typological studies on DOM in the languages of the world, because of the way =rā has been grammaticalized to realize not only DOM but also topicalization, the range of grammatical functions that can be rā-marked, and the role of discourse parameters.

Finally, complex predicate formation, which is the main device for enriching the verbal lexicon in Persian, provides another theoretical and typological domain of investigation in order to highlight differences and resemblances between syntactic and morphological processes of lexeme formation and the way different syntactic components contribute to the makeup of a complex lexical unit. Persian complex predicates constitute an interesting case study for theories of predicate decomposition which postulate the same underlying structure for simplex and complex predicates.

This article is devoted to these three phenomena and is divided into three sections. Each section provides an overview of empirical facts and the way various studies have tried to account for them. While it was impossible to do justice to all influential studies because of the impressive amount of work on each topic, the article is nevertheless intended to be as exhaustive as possible and to maintain the balance between different theoretical approaches.

9.2  The Ezafe construction

9.2.1  Overview and historical facts

Ezafe, from Arabic idāfa ‘addition, adjunction’, designates an enclitic realized as =(y)e, which occurs within the noun phrase and links the head noun to its modifiers and to the possessor NP. The surface word order pattern is strongly head-initial within the Persian NP, as illustrated in (1) and exemplified in (2). A restricted class of determiners, quantifiers, classifiers, and adjectives precedes the head noun, while all modifiers and arguments follow it. The possessor NP comes last after attributive nouns, and adjectival and prepositional modifiers. All elements occurring between the head noun and the possessor NP are linked to the head noun and to one another by the Ezafe. The relative clause, on the other hand, is not introduced by the Ezafe and is placed after other modifiers and the possessor NP, outside the Ezafe domain. Argument prepositional phrases (PPs) are merely juxtaposed to the head noun and also occur outside the Ezafe domain if the head noun is followed by either modifiers or the possessor NP, as in (3). As shown by (4), multiple modifiers may occur within the Ezafe domain. In this case, the Ezafe enclitic is reiterated on each modifier except the last. The possessor NP, on the other hand, is unique. In other words, if a noun has two arguments (which may be the case with eventive nouns), only one of them, generally the second one or the Patient, can be introduced by the Ezafe, as in (5). The first argument (Agent) either is not realized or has a prepositional realization, (5c).1

(1)  (Det) N=(y)e A-N=(y)e AP=(y)e PP=(y)e NP(Poss) Rel/PP

(2)  ān    lebās=e   arusi=e     sefid=e   bi       āstin=e    maryam
     that  dress=ez  wedding=ez  white=ez  without  sleeve=ez  Maryam
     ke    pārsāl     kharid
     comp  last year  buy.pst.3sg
     ‘That white sleeveless wedding dress of Maryam that she bought last year.’

(3)  safar=e  tulāni=e  pārsāl=e      sārā  be  landan
     trip=ez  long=ez   last year=ez  Sara  to  London
     ‘Sara’s long trip to London last year.’

1  Glosses follow the Leipzig Glossing Rules (lingua/resources/glossing-rules).

(4)  ān    āvāz=e   zibā=ye       ghadimi=e  ghamgin=e  āsheghāne
     that  song=ez  beautiful=ez  old=ez     sad=ez     with love
     ‘that beautiful old sad love song’

(5)  a. * kharid=e   maryam=e   yek  khāne
          buying=ez  Maryam=ez  one  house
     b. * kharid=e   yek  khāne=ye  maryam
          buying=ez  one  house=ez  Maryam
     c.   kharid=e   yek  khāne  tavassot=e  maryam
          buying=ez  one  house  by=ez       Maryam
          ‘Maryam’s buying of a house’

The Ezafe is not restricted to the NP and may also occur within adjective phrases (APs) and some PPs to link the head to its unique complement:

(6)  a. āshegh=e    maryam
        in love=ez  Maryam
        ‘in love with Maryam’
     b. barā=ye  maryam
        for=ez   Maryam
        ‘for Maryam’
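The linking pattern described above (the enclitic on every element of the Ezafe domain except the last) can be sketched in Python. This is a minimal illustration, not an analysis: the function name is hypothetical, and the choice between =ye and =e is an assumption read off the transcribed examples in this section alone, where =ye follows final ā and e (zibā=ye, khāne=ye, barā=ye) and =e appears elsewhere (lebās=e, arusi=e). Other final vowels are not attested in these examples and simply fall under =e here. Word-order restrictions and the uniqueness of the possessor NP are not modelled.

```python
def ezafe_chain(elements, ye_after=("ā", "e")):
    """Link the elements of an Ezafe domain, head noun first.

    Every element except the last receives the enclitic. The form =ye is
    used after the final vowels listed in `ye_after` (an assumption based
    on the examples in the text); =e is used otherwise.
    """
    if not elements:
        return ""
    linked = []
    for word in elements[:-1]:
        enclitic = "=ye" if word.endswith(ye_after) else "=e"
        linked.append(word + enclitic)
    linked.append(elements[-1])  # the final element carries no Ezafe
    return " ".join(linked)


if __name__ == "__main__":
    # The Ezafe domain of example (4), without the determiner.
    print(ezafe_chain(["āvāz", "zibā", "ghadimi", "ghamgin", "āsheghāne"]))
```

Elements that stand outside the Ezafe domain, such as the determiner, the relative clause, or an argument PP, would be added before or after the returned string rather than passed to the function.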

The Ezafe construction is not specific to Persian and is found in a significant number of Western Iranian languages (Windfuhr 1989), like Kurdish dialects (MacKenzie 1961), Hawrami (Mackenzie 1966), Zazaki (Paul 1998), and Kermanian dialects (Lecoq 2002). Although neither the shape nor the properties of the Ezafe are identical from one language to another, all these languages display head-​initial word order in the NP.2 The correlation between the head-​initial word order pattern and the availability of the Ezafe can be accounted for on historical grounds. The enclitic Ezafe has been generally assumed to have its origins in a demonstrative morpheme in Old Iranian. In Modern Persian, it can be traced back to the Old Persian relative haya, hayā, taya (Darmesteter 1883; Kent 1944, 1953; Meillet 1931). Kent (1944) thoroughly argues in favour of a relative analysis of haya, hayā, taya, which (i) in most cases introduces a subordinate clause headed by a finite verb, (7a); (ii) takes its case from the relativized function in the subordinate clause rather than from its antecedent, accusative in (7a), since the relativized function is the direct object in the subordinate clause. Upon 388 available instances of haya, hayā, taya in OP, Kent (1944) classifies 276 of them as relatives. In most of these occurrences, haya, hayā, taya indeed introduces a subordinate containing a finite verb (219 instances). However, Kent (1944) also groups with relatives instances such as the one in (7b), where haya, hayā, taya is followed only by a predicative

2  Note that the term ‘inverse’ or ‘reverse’ Ezafe is sometimes used to refer to the ending, generally a vowel, occurring on pre-nominal adjectives in some Iranian languages with Adjective–Noun order, such as Gilaki, Mazandarani, and Balochi (Windfuhr 1979: 27–8).

Specific Features of Persian Syntax    229

noun phrase and the copula is lacking. The reason for this is the fact that haya is in the nominative case, as required by its function in the reduced relative clause (i.e. the subject of the copula), rather than in the accusative case of its antecedent NP. These copula-less constructions where haya, hayā, taya introduces a predicative noun phrase, as in (7b), (7c), (7d), or an adjective phrase, (7e), pave the way to its uses in Middle Persian and the emergence of the Ezafe construction.3

(7) a. ima taya adam akunavam
       this rel.neut.acc 1s.nom do.pst.1.s
       ‘this (is that) which I did’ (Kent 1944: 2, DB 1.72)
    b. dārayavaum haya manā pitā
       Darius.acc rel.m.nom my father.m.nom
       ‘Darius who (was) my father’ (Kent 1944: 2, XPf 23)
    c. gaumāta haya maguš
       Gaumāta.nom rel.nom Magian.nom
       ‘Gaumāta the Magian’ (Kent 1944: 3, DB 1.44)
    d. kāra hayā manā ( ... )
       army this mine
       ‘my army; the army which is mine’ (Meillet 1931: §407, B. II, 87)
    e. kāsaka haya kapautaka; kāsaka haya axšaina
       stone rel blue; stone rel dark
       ‘stone this blue; stone this dark’

Haya, hayā, taya becomes =ī in Middle Persian (haya > hyǝ > yǝ > =ī) and progressively loses its demonstrative/relative value to end up as a simple linker (cf. Jügel 2015: 290ff.). The possessor, as well as adjective modifiers, are introduced by the Ezafe particle ī in Middle Persian:

(8) pus ī maz ī Ardawān
    son ez oldest ez Artaban
    ‘the oldest son of Artaban’

As noted by Bubenik (2009), with the consolidation of the sequence Possessee-Possessor and Modified-Modifier we reach the New Persian state of affairs. This explains why the Ezafe construction is correlated with the head-initial order within the NP.


3  Haig (2011) argues, contra Kent (1944), that the primary function of haya, hayā, taya was in fact to introduce appositive phrases within the NP. He draws attention to a significant fact that went unnoticed in Kent (1944), namely the definiteness of the antecedent noun (i.e. the head noun) in most haya constructions. This entails that the supposed relative clause is not a restrictive relative clause (i.e. it does not contribute to the identification of the referent), but an appositive relative. Consequently, what appears to be a relative clause syntactically is functionally more of a loose appositive construction, an analysis which can in fact be extended to all uses of haya, hayā, taya. Determining whether haya, hayā, taya is an appositive ‘linker’ rather than a relative pronoun is beyond the scope of this paper. Therefore, I will not take a stance on this issue here.

These changes in the function of the relative/linker go hand in hand with a prosodic change. Like haya in OP, ī is an independent word in Middle Persian.4 However, given its constrained position, ī is generally the only element that intervenes between the head noun and the adjective or the genitive modifier (Estaji 2009). This regular adjacency prepares the ground for the change of ī to the enclitic =e in New Persian.

The Ezafe construction raises several issues in syntax and morphology and has thus been a particular focus of interest in numerous studies on Iranian languages (Ghomeshi 1997a; Haider and Zwanziger 1984; Haig 2011; Hincha 1961; Holmberg and Odden 2008; Kahnemuyipour 2014; Karimi and Brame 2012; Karimi 2007; Larson and Yamakido 2008; Palmer 1971; Samiian 1983, 1994; Samvelian 2006b, 2007, 2008; Schroeder 1999; among others). The most debated issue is the status of the enclitic Ezafe itself and its functions. As noted by Haig (2011), the Ezafe is not straightforwardly accountable in terms of available functional categories and conventional X-bar phrase structure. Closely related to this first issue is the internal structure of the Persian NP and the nature of dependency relationships within this syntactic domain.

The Ezafe enclitic has received various and sometimes diametrically opposed analyses. It has been considered as a:

• case marker (Hashemipour 1989; Karimi and Brame 2012; Larson and Yamakido 2008; Samiian 1994);
• phonological linker, that is, an element inserted in Phonological Form, with no proper value or meaning (Ghaniabadi 2010; Ghomeshi 1997a; Samiian 1983);
• marker associated with the syntactic movement of the noun and realizing a strong feature (Kahnemuyipour 2014);
• linker indicating subject-predicate inversion (Den Dikken 2006);
• head-marking affix adjoined to the head noun and its intermediate projections and marking them as waiting for a dependent (Samvelian 2007, 2008).

9.2.2  Ezafe as a case marker

The fact that the Ezafe occurs as many times as there are dependents within the NP has favoured its analysis as a case marker. Several studies within the generative framework have developed different variants of this analysis. Hashemipour (1989) considers the Ezafe as a structural case marker on nouns, adjectives, and some PPs. For Karimi and Brame (2012), the Ezafe structurally relates a head to phrases governed by the latter, by transferring the case of the head noun to its complements. Samiian (1994) provides one of the most detailed analyses of the Ezafe as a case marker. She considers the Ezafe as a dummy case assigner, comparable to of in English, which occurs within phrases with non-case-assigning heads, that is, NPs, APs, and some PPs, and thus enables the head to case-mark its complements. Note that in this view, the Ezafe structurally belongs to the modifier it precedes, while it is prosodically attached to the item it follows. The fact that the Ezafe occurs in both NPs and APs is expected, since nouns and adjectives

4  Estaji (2009) provides a set of arguments in favour of the analysis of ī as an independent word in MP.

share the feature [+N], and thus do not assign case.5 Its presence in PPs, on the other hand, is not expected, since prepositions are assumed to be case assigning. In order to overcome this problem, Samiian (1994) considers that those prepositions that occur with the Ezafe, P2s in her classification, constitute ‘a kind of in-between category, sharing some properties with “true” prepositions (P1s) and some with nouns’. This assumption is supported by a set of empirical facts, namely the semantic content of P2s, their subcategorization frame, the internal structure of the PP they project (specifically their ability to allow for a specifier), and their distribution. Adopting the Neutralization Hypothesis (Riemsdijk and Williams 1981), Samiian (1994) assumes that P2s are neutralized in their [-N] feature, which leaves them with the only feature specification [-V]. Therefore, P2s cannot assign case, since only [-N] categories directly assign structural case. Sharing their only feature with the category N, they behave like the latter with respect to their case-assigning properties and need the Ezafe to assign case. While nouns, adjectives, and P2s cannot assign case, Case Theory (Chomsky 1981a&b) requires all NPs to receive case. The Ezafe thus assumes the role of a dummy case-assigner, like of in English, in order to compensate for the inability of these categories to assign case. In order to account for the occurrence of the Ezafe before APs, Samiian (1994) extends the case-receiving categories to include the AP. To support this idea, she refers to the case marking of attributive adjectives in Latin and Sanskrit. Larson and Yamakido (2008) agree with Samiian’s (1994) analysis of the Ezafe as a case marker, but are not satisfied with the way it deals with modifiers.
Case marking (as opposed to agreement) is typically associated with argument status; however, at least some of the Ezafe-marked constituents are modifiers. So the question arises of why modifiers should need case and what their case-assigner is. To answer these questions, Larson and Yamakido (2008) extend the shell theory of the VP (Larson 1988) to the DP. The DP is projected from the thematic structure of determiners, which assign scope and restriction as thematic roles to their arguments. For instance, the NP that combines with D saturates the quantifier restriction of the D. Under this account, (most) nominal modifiers originate as arguments of D. Therefore, they are not modifiers or adjuncts, but ‘oblique complements’ which combine with the head prior to other arguments. All modifiers are base-generated in a post-head position. Those that bear case features, APs for instance, are required to move to a site where they can check case, that is, the prenominal position. PPs and relative clauses, on the other hand, remain in situ, since they do not bear case features. The fact that in Persian, and some other Iranian languages, APs also remain in situ is explained by the availability of the Ezafe, which acts as a ‘generalized genitive preposition’, inserted to check Case on [+N] complements of D inside the DP. Following this account, the Ezafe heads its own X-bar phrase, with the modifier as complement. However, for apparently purely prosodic reasons, it attaches phonologically to the preceding item. The analysis for (9) is given in (10). The determiner in ‘this’ checks its one Case feature on its restriction. The Ezafe is inserted and licenses the remaining modifiers in their base positions.


5  Samiian (1994) adopts the syntactic feature system suggested by Chomsky (1970) and subsequently developed by Jackendoff (1977), which classifies projecting lexical categories according to primitive syntactic features such as [+/-V] or [+/-N]. Adjectives and nouns are both [+N] while verbs and prepositions are [-N].

(9) in ketāb=e sabz=e jāleb
    this book=ez green=ez interesting
    ‘this interesting green book’ (Larson and Yamakido 2008: 60, (30))

(10) [DP Pro [D′ in [DP ketāb [D′ t [DP [XP é sabz] [D′ t [XP é jāleb]]]]]]] (Larson and Yamakido 2008: 60, (32))

Several studies claim that the analysis of the Ezafe as a case marker faces serious problems (Ghomeshi 1997a; Samvelian 2007; Haig 2011). In particular, PPs headed by P1s as well as adverbial phrases do occur within the Ezafe domain. However, all the studies mentioned in section 9.2.2 acknowledge that the latter do not need to be case-marked. Moreover, the conceptual problem remains of why APs and PPs would need case. Note that in many languages, adjectives follow the noun and are neither case-marked nor introduced by any specific device.

9.2.3  Ezafe as a phonological linker

Building on Samiian’s (1983) data, Ghomeshi (1997a) argues against the view of the Ezafe as a morpheme heading any sort of syntactic projection. The Ezafe is semantically vacuous and, although it iterates throughout the NP, it is not the expression of a concord between the head noun and its dependents. Ghomeshi (1997a) therefore suggests viewing the Ezafe not as a morpheme at all, but rather as an element inserted in Phonological Form, in a certain syntactic configuration. The need for the Ezafe results from the fact that, nouns being non-projecting in Persian, a ‘phonological linker’ must be present in order to indicate phrasing within the nominal constituent. This task is carried out by the Ezafe. The hypothesis of Persian nouns being non-projecting, which is the keystone of Ghomeshi’s analysis, is based on a set of restrictions on the Ezafe construction highlighted by Samiian (1983), see (i)–(iii), to which Ghomeshi (1997a) adds the restriction in (iv):

(i) Attributive noun phrases surface only as bare nouns in the Ezafe domain (11).
(ii) Adjectival modifiers cannot take either nominal (12a), prepositional (12b), or sentential (12c) complements when occurring within the Ezafe construction.
(iii) Prepositions may appear with a nominal complement within the Ezafe domain (13a), but sentential complements are excluded (13b).
(iv) NPs including a possessor are obligatorily construed as definite or presupposed, and possessors are in complementary distribution with the indefinite enclitic determiner =i (14).

(11) kif=e charm / * in charm
     bag=ez leather / this leather
     ‘leather bag’ / (putatively) ‘a bag of this leather’ (Samiian 1983: 45, (65); 53, (105))

(12) a. * mard=e negarān=e bachche-hā=yash=i vāred shod
          man=ez worried=ez child- entered become.pst
          (intended) ‘A man worried about his children entered.’
     b. * mardom=e khashmgin az ertejā=ye tehrān be-pā khāst-and
          people=ez angry at reactionaries=ez Tehran to-foot rise.pst-
          (intended) ‘The people of Tehran angry at the reactionary forces rose up.’
     c. * mardom-e khoshhāl ke shāh keshvar=rā tark kard=e irān jashn gereft-and
          people-ez happy that Shah country=ra desertion do.pst=ez Iran feast take.pst-
          (intended) ‘The people of Iran happy that the Shah left the country celebrated.’
          (Samiian 1983: 42, (47b), (48b), (49b))


(13) a. āftāb=e ba’d az bārun ghashang=e
        sun=ez after from rain
        ‘The sun after the rain is beautiful.’
     b. * āftāb=e ba’d az in-ke bārun bi-ād ghashang=e
          sun=ez after from this-that rain sbjv-come.prs
          (intended) ‘The sun after it has rained is beautiful.’
          (Samiian 1983: 57, (124), (125))

(14) * ketāb=e sorkh=i maryam
       book=ez red=indef Maryam
       (intended) ‘a red book of Maryam’

In order to account for these facts, Ghomeshi (1997a) assumes that:

(i) Persian nouns are inherently non-projecting. They never appear with filled specifier and complement positions and the NP node cannot dominate any phrasal material.
(ii) In spite of the fact that they are non-projecting, Persian nouns may still appear as NPs, provided they are selected by a projecting head, e.g. D°.
(iii) The Ezafe never attaches to a phrase, which implies that the Ezafe domain is the domain of X°s or bare heads.

On the basis of these assumptions, Ghomeshi (1997a) suggests the following structure for the Persian NP:

(15) The internal structure of the Persian NP (Ghomeshi 1997a: 780, (90))
[Tree diagram not recoverable from the extracted text. Surviving node labels: DP, D′, DPposs, D+def, NP, N°-e, P°.]

Since Persian nouns cannot dominate phrasal material, the possessor DP, which is fully phrasal, is base-generated as sister to D′, in [Spec, DP] position. An empty D-head bearing the feature [+def] is stipulated, whose validity is further supported by the constraint stated in (iv). The Ezafe insertion rule, operating in PF, inserts the Ezafe vowel on a lexical X° head that bears the feature [+N], when it is followed by phonetically realized, non-affixal material within the same extended projection. In the light of this analysis, the restrictions pointed out by Samiian (1983) are straightforwardly accounted for. The only cases that would seem to resist Ghomeshi’s analysis are PPs, which can occur within the Ezafe domain with a complement (13a). Ghomeshi (1997a) claims, however, that (13a) is not a counterexample to her analysis since the noun within the modifying PP is in fact an N° and not an NP (or DP). The combination of the lexical head P° with N° provides another P° and not a P″. To support this claim, Ghomeshi (1997a) contrasts (16a) with (16b), imputing the ungrammaticality of the latter to the fact that the complement of the preposition zir ‘under’ is a DP containing a possessor and not an N°.

(16) a. otāgh=e kuchik=e zir=e shirvuni=e ali
        room=ez small=ez under=ez roof=ez Ali
        ‘Ali’s small room under the roof’
     b. * otāgh=e kuchik=e zir=e shirvuni=e jiān=e ali
          room=ez small=ez under=ez roof=ez Jian=ez Ali
          (intended) ‘Ali’s small room, under Jian’s roof’
          (Ghomeshi 1997a: 743, (25a–b))
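The PF insertion rule summarized above can be sketched procedurally. The following Python fragment is only an illustration under stated assumptions, not Ghomeshi's own formalization: the `Head` class, its `nominal` and `affixal` flags, and the function `insert_ezafe` are hypothetical names introduced here; the input is assumed to list only the overt X° heads of a single extended projection, in surface order; and the Ezafe is written uniformly as =e, ignoring the =ye variant seen in forms such as khāne=ye.

```python
from dataclasses import dataclass


@dataclass
class Head:
    form: str              # surface form of the X° head
    nominal: bool          # assumption: True if the head bears [+N]
    affixal: bool = False  # assumption: True for affixal material such as =i


def insert_ezafe(heads):
    """Add '=e' to each [+N] head that is followed by non-affixal material."""
    out = []
    for i, head in enumerate(heads):
        nxt = heads[i + 1] if i + 1 < len(heads) else None
        needs_ezafe = (
            head.nominal
            and not head.affixal
            and nxt is not None      # something overt follows
            and not nxt.affixal      # and it is not an affix
        )
        form = head.form + ("=e" if needs_ezafe else "")
        if head.affixal and out:
            out[-1] += form          # affixal material leans on its host
        else:
            out.append(form)
    return " ".join(out)


# The three heads of example (9), without the demonstrative.
print(insert_ezafe([Head("ketāb", True), Head("sabz", True), Head("jāleb", True)]))
```

In this sketch the final head never receives the Ezafe, because nothing follows it, and a head followed only by the indefinite =i is left unmarked, in line with the requirement that the following material be non-affixal.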

This uniform analysis of modifiers as X°s in the Ezafe domain has been challenged in subsequent studies by Samvelian (2007, 2008) (cf. 9.2.4), Ghaniabadi (2010), and Kahnemuyipour (2014) (cf. 9.2.5). Ghaniabadi (2010) adopts a similar analysis with respect to the nature of the Ezafe, suggesting that it is inserted by a phonological rule in the Late-Linearization stage at PF. He assumes, however, that the modifiers occurring within the Ezafe domain may be either bare heads (A°s) or phrasal (APs and PPs), and suggests the following ordering of post-nominal modifiers within the Persian NP:

(17) N A° AP PP Possessor

The main argument of Ghaniabadi (2010) for this bipartition comes from an elliptic construction he refers to as the Empty Noun Construction, where the head noun is elided leaving behind one or more modifiers. He claims that this type of ellipsis is only possible with bare adjectives, (18a), and not with AP or PP modifiers, (18b) and (18c) respectively. In other words, a head noun can be elided (along with other head-adjoined elements) only if the remnant is a bare adjective (A) and not an AP or a PP.

(18) a. Sajjād pirhan=e ābi pushid, Sinā pirhan ghermez
        Sajjad shirt=ez blue wear.pst.3sg Sina shirt red
        ‘Sajjad wore a blue shirt, Sina a red one.’ (Ghaniabadi 2010: 61)
     b. * keshvar-hā [AP negarān=e afzāyesh=e gheimat=e naft] ettelā’iyye=i sāder kard-and
          country-pl=ez worried=ez increase=ez price=ez oil statement=indef issue do.pst-3pl
          (intended) ‘The ones worried about the increase of the price of oil issued a statement.’ (Ghaniabadi 2010: 69)
     c. * kafsh-ā [PP tu(=ye) vitrin=e maghāze] kheili ghashang=e
          shoe-pl inside=ez window=ez shop very beautiful=cop.3sg
          (intended) ‘The ones inside the shop window are very beautiful.’ (Ghaniabadi 2010: 70)

Samvelian (2007) and Kahnemuyipour (2014), on the other hand, argue that all post-nominal modifiers within the Ezafe domain are XPs.

9.2.4  Ezafe as a head-marking affix

Samvelian (2007, 2008) considers the Ezafe as a suffix attaching to the head and to its intermediate projections in NPs, APs, and some PPs, and marking them as awaiting a modifier or a complement. Comparable to some extent to the case-marking analyses, in that the Ezafe is considered a ‘morpheme’ marking a dependency relationship between a head and its dependents, Samvelian’s analysis nevertheless adopts the exact opposite standpoint in considering the Ezafe as marking the head and not its dependents and forming a constituent with the head both prosodically and functionally. Viewed as such, the Ezafe construction is an illustration of the head-marked pattern of morphological marking of grammatical relations (Nichols 1986) and reminiscent, all things being relative, of the Semitic construct state construction. This analysis entails that the Ezafe, which once grouped with the constituent it introduced, has undergone a process of reanalysis-grammaticalization, being thus reinterpreted as a part of the nominal head inflection. Samvelian (2007, 2008) builds on two sets of evidence:

(i) The restrictions on the Ezafe construction highlighted by Samiian (1983) and Ghomeshi (1997a) are either not well grounded or are not related to the Ezafe per se but to its co-occurrence with other enclitics such as the indefinite determiner =i or pronominal clitics.
(ii) The Ezafe’s morphological behaviour, especially its complementary distribution with the indefinite determiner =i and pronominal clitics, is typical of (phrasal) affixes, rather than of a post-lexical clitic (Miller 1992; Zwicky and Pullum 1983).

Example (19a) shows that an AP can be introduced by the Ezafe within an NP. Note that the AP is headed by the adjective negarān ‘worried’, which shows that the ungrammaticality of (12a) does not result from the phrasal status of the modifier headed by the adjective, but from the co-occurrence of the pronominal enclitic =ash and the indefinite enclitic determiner =i. Removing the latter makes (12a) perfectly grammatical. The same situation holds for PPs: that is, P1s, as well as P2s and P3s, can occur within the Ezafe domain even when they head phrasal projections, (19b) and (19c) respectively.

(19) a. abdoljalil ham bā rang=e paride va chashm-ān=e [AP
        Abdoljalil also with colour=ez fly.pfp and eye-pl=ez
        negarān=e forurikhtan=e divār=e khāne=ash] āntaraftar
        worried=ez crumble=ez wall=ez farther
        istāde ( ... ) bud
        stand.pfp ( ... ) be.pst
        ‘Abdoljalil also with his pale figure and his eyes worried about the crumbling of the wall of his house was standing farther.’ [M. Dowlatābādi, Ruzegār-e separi shode...]
     b. ruz=e [PP ghabl az dastgiri=e farmānfarmā va pesar-ān=ash]
        day=ez before of arrest=ez Farmānfarmā and son-
        ‘the day before Farmānfarmā’s and his sons’ arrest’ (Behnud 1995: 65)
     c. sekke-hā=ye [PP tu=ye jib=e shalvār=ash] oftād-and
        coin-pl=ez in=ez pocket=ez fall.pst-
        ‘The coins in his trousers’ pocket fell down.’

Samvelian (2007, 2008) then suggests accounting for the restrictions on the Ezafe in morphological terms. The Ezafe is viewed as a phrasal affix attaching to the head and its intermediate projections within the NP and indicating that the marked head or the intermediate projection is awaiting a dependent. Samvelian (2007) argues that the restrictions highlighted by Samiian (1983) and Ghomeshi (1997a) can be accounted for in terms of slot competition between the members of the class of phrasal affixes, to which belong the Ezafe affix itself but also the indefinite determiner =i and pronominal clitics. The latter may combine with word-level inflectional affixes, that is, the plural suffix -hā and the definite suffix -(h)e, but are in complementary distribution with the members of their own class and thus mutually exclude each other. The major argument in favour of the affixal view of these enclitics is provided by restrictions on their co-occurrence: any sequence containing two or more of these enclitics is excluded, even when their scope is not the same constituent. The incompatibility between the indefinite determiner =i and clitics is illustrated by (12a). The examples in (20), where a head noun is followed by an adjective, illustrate the same constraint on the co-occurrence of the Ezafe and =i. Note that the indefinite determiner =i can occur either at the edge of the NP, that is, on the adjective, as in (20b), in which case the modifier is introduced by the Ezafe, or on the head noun, between the head noun and the adjective, as in (20c), in which case the Ezafe is excluded, as shown by (20a). These facts have led some linguists to consider that the determiner =i in examples such as (20c) cumulates the function of both the indefinite determiner and the Ezafe. Perry (2005), for instance, uses the term ‘split Ezafe’ (p. 74) for the enclitic =i in these cases.
Lazard (1966: 257) expresses a similar opinion, noting that in addition to its role as a determiner, =i acts in such contexts as a linker, being thus comparable to the Ezafe.

(20) a. * khāne=i=e/=e=i digar
          house=indef=ez/ez=indef another
     b. khāne=ye digar=i
        house=ez another=indef
     c. khāne=i digar
        house=indef another
        ‘another house’

The examples in (21), in particular the ungrammaticality of (21b), further illustrate the fact that any combination of the three enclitics under discussion is excluded even when they have different scopes. In both (21a) and (21b) a reduced relative clause (RRC), introduced by the Ezafe,6 is embedded within the NP headed by ghahremān ‘hero’. The two NPs differ solely with respect to the constituent ordering within the reduced relative clause. In (21a), the PP az mihan-ash ‘from his homeland’ precedes the participial head of the modifier,

6  Note that contrary to finite relative clauses, which exclude the Ezafe, reduced relative clauses are always introduced by the Ezafe.

while in (21b) it follows the head. Though both constituent orderings within the RRC are grammatical,7 the addition of a possessor NP after the reduced relative is possible only in (21a) but not in (21b).

(21) a. ghahremān-e [RRC az mihan=ash rānde shode]=ye in romān
        hero=ez from drive.pfp become.pfp]=ez this novel
        ‘the hero of this novel, (who is) driven away from his homeland’
     b. * ghahremān=e [RRC rānde shode az mihan=ash]=e in romān
          hero=ez drive.pfp become.pfp from this novel

Samvelian (2007) claims that this contrast can be attributed to the fact that in (21b) the Ezafe is attached to the personal enclitic =ash, but not in (21a). Contrary to (20a), the Ezafe and the personal enclitic have two different scopes in (21b): the personal enclitic is attached to the NP mihan ‘homeland’, while the scope of the Ezafe is the whole N′ ghahremān=e rānde shode az mihan=ash. These facts are reminiscent of the haplology phenomena discussed by Zwicky (1987) and Miller (1992), which involve the English possessive (genitive) ’s and French weak functional words. Along the same lines of argumentation, Samvelian (2007) concludes that the Ezafe, the determiner =i, and personal enclitics are best regarded as phrasal affixes and outlines a morphological treatment of these items in terms of edge inflection (Klavans 1985; Lapointe 1990, 1992; Tseng 2003) dealt with by word-level morphology. The Ezafe is thus considered as an inflectional affix that adjoins to any nominal non-maximal projection and registers the presence of a syntactic dependent, a modifier or a single NP complement, within phrases headed by a nominal category: that is, nouns, adjectives, and nominal prepositions.8
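The slot-competition idea described above can be made concrete with a small check. The Python fragment below is a minimal sketch under stated assumptions, not Samvelian's HPSG formalization: the labels `"ez"`, `"indef"`, `"pron"`, and `"pl"` and the function name `is_licit` are hypothetical, and each input is simply the sequence of markers stacked on one host word, whatever their syntactic scope.

```python
# Hypothetical labels for the three members of the phrasal-affix class:
# the Ezafe, the indefinite determiner =i, and pronominal clitics.
PHRASAL_AFFIXES = {"ez", "indef", "pron"}


def is_licit(markers):
    """Return True if at most one phrasal affix occurs on the host.

    Word-level affixes such as the plural ("pl") are outside the competing
    class and may combine freely with a single phrasal affix.
    """
    competing = [m for m in markers if m in PHRASAL_AFFIXES]
    return len(competing) <= 1


print(is_licit(["pl", "ez"]))     # sekke-hā=ye in (19c)
print(is_licit(["indef", "ez"]))  # khāne=i=e in (20a)
print(is_licit(["pron", "ez"]))   # mihan=ash]=e in (21b)
```

Because the check looks only at what is stacked on a single host, it treats (20a) and (21b) alike even though the enclitics have the same scope in the former and different scopes in the latter, which is the point of the argument.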


7  More generally, within APs and RRCs, the prepositional complement can either precede or follow the head: nazdik be khāne/be khāne nazdik ‘close to home’. The difference between the two orderings is a matter of register. The second ordering, where the complement precedes the head, is rather formal or literary.

8  For the details of the formalization within Head-driven Phrase Structure Grammar (HPSG) (Pollard and Sag 1994) and the way the feature [+Ez] (which indicates the presence of the Ezafe) and the feature [+Dep] (which indicates that the head or the intermediate projection is awaiting a dependent) are introduced and percolated through the syntactic structure, see Samvelian (2007: 634–9).

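(22) The internal structure of the Persian NP (Samvelian 2007: 637, (57))
[Tree diagram not recoverable from the extracted text. Surviving node labels: N″, N′, N, P, with the feature specifications [–Ez, –Dep], [+Ez], and [–Ez]; surviving terminals: ān, mojgān=e, az, sangin=e, rimel.]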

9.2.5  Ezafe as the result of a roll-up movement

Kahnemuyipour (2014) develops a phrasal movement analysis of the Ezafe construction using what is known in the literature as roll-up movement (Cinque 2005, 2010). Contra Larson and Yamakido (2008), who take the basic word order of the Persian NP to be head-initial, Kahnemuyipour (2014) assumes a head-final ordering for Persian NPs. While in Larson and Yamakido’s account the presence of the Ezafe is the result of a non-movement, in Kahnemuyipour’s system modifiers involved in the Ezafe construction are uniformly merged in the specifiers of functional projections above the NP, regardless of whether they are bare or phrasal. Under this view, movement and overt morphology go hand in hand. When there is no movement, there is also no overt morphology. This implies that the pre-nominal order within the NP, i.e. the one observed in English, is the basic one. The movement that derives the postnominal order is accompanied by overt morphology, hence the existence of the Ezafe, which is seen as a reflex of the roll-up movement.
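The link between roll-up movement and the surface distribution of the Ezafe can be illustrated with a toy derivation. The Python fragment below is a deliberately simplified sketch under stated assumptions, not Kahnemuyipour's analysis itself: the function name `roll_up` and its parameters are hypothetical, syntactic trees are flattened to lists of words, the base order of modifiers (outermost first, as in English) is supplied by hand, and the Ezafe is written uniformly as =e.

```python
def roll_up(noun, modifiers, pre_nominal=(), ezafe="=e"):
    """Derive a post-nominal order from a head-final base order.

    `modifiers` lists the modifiers in their assumed base order, outermost
    first. At each step the material built so far moves across the next
    modifier up, and the step is spelled out as an Ezafe on its right edge.
    Elements in `pre_nominal` (e.g. demonstratives) sit above the point
    where movement stops, so they stay in front and never carry an Ezafe.
    """
    phrase = [noun]
    for modifier in reversed(modifiers):  # innermost modifier first
        phrase[-1] += ezafe               # movement is marked by the Ezafe
        phrase.append(modifier)           # the crossed modifier now follows
    return " ".join(list(pre_nominal) + phrase)


# Assumed base order for example (9): in > jāleb > sabz > ketāb.
print(roll_up("ketāb", ["jāleb", "sabz"], pre_nominal=["in"]))
```

With no modifiers nothing moves and no Ezafe appears, which mirrors the claim that movement and overt morphology go hand in hand.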

The backbone of Kahnemuyipour’s analysis is the near-perfect correlation between the order of the head noun and other constituents within the NP and the presence of the Ezafe, with the noun clearly demarcating the distribution of the latter: the Ezafe cannot occur on elements surfacing before the noun and is mandatory for every element following it. Pre-nominal elements are considered as heads, i.e. X°s, and are not involved in the roll-up derivation. Therefore, they do not need the Ezafe. Post-nominal elements, on the other hand, are phrases, whose surface position is the result of the roll-up derivation, leading to the appearance of the Ezafe marker. A crucial aspect of this analysis is that modifiers, whether bare or phrasal, are (part of) XPs located in the specifiers of functional projections above the noun: in accordance with Bare Phrase Structure (Chomsky 1995), a bare adjective is treated as A/AP and can occupy a structural position similar to that of an AP with a complement. Kahnemuyipour (2014) argues against Ghaniabadi (2010), who treats bare adjectives and phrasal modifiers in radically different ways. For Ghaniabadi, bare adjectives are heads which are head-adjoined to the noun, whereas AP and PP modifiers are phrasal elements in the specifiers of functional projections above the NP. As mentioned in 9.2.3, Ghaniabadi’s main argument for this bipartition comes from an elliptic construction he refers to as the ‘empty noun construction’, where the head noun is elided, leaving behind one or more modifiers. He claims that this type of ellipsis is only possible with bare adjectives, cf. (18a), and not with AP or PP modifiers, cf. (18b) and (18c) respectively. In other words, a head noun can be elided (along with other head-adjoined elements) only if the remnant is a bare adjective and not an AP or a PP.
Ghaniabadi claims that the ellipsis of the noun along with another bare adjective is possible, because these adjectives are recursively head-adjoined to the noun. This makes it possible to elide the noun with one or more bare adjectives as long as what is left behind is another bare adjective and not an AP. Kahnemuyipour first notes that pragmatic and lexical restrictions on these elliptical constructions undermine any strong conclusion about the head vs. phrasal status of the modifiers based on the ungrammaticality of a few examples involving one or the other type of remnant. He furthermore claims that there are grammatical examples of noun ellipsis with a modifying PP and AP as remnants and provides a few examples, like the one in (23).

(23) Context: There are two jars, one filled with wine and the other with vinegar. The speaker is contrasting the content of the jars, with serke ‘vinegar’ contrastively focused and prosodically prominent.
     Ali tong=e [AP por az sharāb]=o bardāsht, tong=e [AP por az
     Ali jar=ez full of wine=ra take.pst.3sg jar=ez full of
     serke]=ro gozāsht
     vinegar=ra leave.pst.3sg
     ‘Ali took the jar filled with wine, and left the one filled with vinegar.’ (Kahnemuyipour 2014: 13, (25a))

Based on these facts, a uniform treatment of bare adjectives and phrasal modifiers as XPs is adopted. The Persian DP is taken to be head-final, with the NP merged at the bottom of the tree structure and the APs residing in the specifiers of projections above

it. The demonstrative (Dem) and the Numeral (Num) are heads higher up in the tree structure in accordance with Cinque (2010). In addition, there are intermediate projections enabling the roll-up derivation. The relevant structures and roll-up movements are shown schematically in (24), where the projections hosting the APs are marked as XP, YP, etc. and the intermediate projections are marked as AgrPs. Under this view, the Ezafe can be seen as the surface realization of the suggested inversion process, i.e. a linker in the sense of Den Dikken (2006). The height of the movement corresponds to the realization of the Ezafe marker. The ‘overt’ movement stops below elements that are high in the universal schema such as numerals and demonstratives. Consequently, the Ezafe does not occur on the latter.

(24) Deriving the Ezafe construction via roll-up movement (Kahnemuyipour 2014: 17, (31))
[Tree diagram not recoverable from the extracted text. Surviving node labels: DemP, Dem, Numeral, Agrx, Agry, Ez(e).]




9.2.6 Concluding remarks on the Ezafe construction As mentioned in the introductory remarks, the Ezafe construction is a common feature of those Iranian languages which display a head-​initial word order within their NP. While this construction is assumed to have the same origin in all these languages, it has taken a different path from one language to another, resulting in a contrasting picture in modern Iranian languages. Thus, phonological, morphological, and syntactic properties of the Ezafe construction considerably cross-​linguistically vary. In some languages, the Ezafe particle is inflected and displays agreement features, e.g. Kurmanji and Zazaki, while in

others, such as Persian, it is invariable. Likewise, while it behaves rather like a post-lexical clitic in some cases, thus showing a certain degree of autonomy with respect to its host, in other cases it is more or less amalgamated with other nominal inflectional affixes. The same degree of variation is observed in the syntax of the Ezafe construction. While some languages treat finite relative clauses on a par with APs and PPs with respect to the Ezafe (e.g. Kurmanji), other languages exclude finite relatives from the Ezafe construction (e.g. Persian). Likewise, the order of the Ezafe-marked constituents may vary: while in some languages (e.g. Persian) the possessor NP closes the Ezafe domain, in others modifiers occurring after the possessor NP can be introduced by the Ezafe (e.g. Kurmanji and Sorani). Finally, the lexical head licensing the Ezafe construction may also be subject to variation. While the NP is the favoured domain for the realization of the Ezafe construction, adjectival and prepositional heads can also host the Ezafe construction in some languages (e.g. Persian), but not in all of them (e.g. Sorani).

Studies on the Ezafe construction have up to now focused on the description and modelling of the phenomenon in a single language, with a clear preponderance of Persian. Investigating the Ezafe construction in less-studied Iranian languages using cross-linguistic approaches can shed new light on the construction itself and on the nature of dependency relations within the NP. Another promising topic of investigation is the contact-induced Ezafe construction in non-Iranian languages spoken in the area, such as Aramaic languages. Finally, broader typological studies investigating the resemblances and differences between the Ezafe and linkers (loosely speaking) within the NPs in the languages of the world can constitute another fruitful vein of research.

9.3 Differential object marking

Differential object marking (DOM)9 is a rather common feature of Iranian languages and, according to Windfuhr (2009: 33), ‘a response to the loss of inflectional case marking’. It has been one of the main topics of interest, if not the main topic, in various descriptive and formal studies of Persian, and continues to generate interest despite the significant number of publications dedicated to the topic. In Modern Persian, DOM is realized by the enclitic =rā, in the formal register, and its colloquial variants =ro and =o (after a consonant). =Rā is obligatory with all definite objects. Historically, =rā is the phonological reduction of rāy in Middle Persian, which in turn comes from the Old Persian postposition rādi(y), ‘for (the sake of)’, ‘on account of’, ‘concerning’. The marker =rā developed as an indirect object marker in late Middle Persian and Early New Persian and became a direct object marker only progressively, over the course of several centuries.10 According to Paul (2003: 182), at this earlier stage, unlike in later classical and Modern Persian, it is not predominantly definiteness that determines when =rā occurs and when not, but animacy.

For more information on Ezafe, see Chapters 2, 3, 7, 8, 10, 12, 15, and 19 in this volume.
9 The term was coined by Bossong (1985), who provides a detailed account of the phenomenon in a variety of languages.
10 For a detailed discussion on =rāy in Middle Persian see Jügel (2015: 192–218, 340–2).

The enduring interest in =rā is due to the fact that a cluster of heterogeneous parameters seems to be at work in rā-marking, since =rā can also occur with indefinite direct objects and even other grammatical functions. Spotting the relevant or prevailing parameter(s) that determine(s) the presence of =rā has thus been the major issue in studies on DOM in Persian. Another issue that has been extensively investigated in generative studies is the syntactic consequence of rā-marking and whether rā-marked objects occupy a different syntactic position with respect to their non-marked counterparts.

9.3.1 Rā as a mark of specificity

Cross-linguistic studies on DOM (Aissen 2003; Bossong 1985; Comrie 1979, 1989; Dalrymple and Nikolaeva 2011; Hopper and Thompson 1980; Lazard 1982, 1984b; Malchukov 2008; Næss 2004, 2007; de Swart 2007; among others) have shown that animacy and definiteness (or specificity) are generally involved in DOM: animate and/or definite objects are more likely to be marked than inanimate and/or indefinite objects. Of these two semantic properties acting upon DOM cross-linguistically, the degree of determination (i.e. definiteness or specificity) prevails in Persian. Grammars and linguistic studies generally qualify =rā as the mark of the definite direct object (Gharib et al. 1994; Natel-Khanlari 1984; Lazard 1957; Mahootian 1997; Sadeghi 1970; among many others). All definite direct objects must be rā-marked. This implies that personal pronouns, proper nouns, and NPs introduced by a definite determiner (e.g. demonstrative, interrogative) must be marked, as in (25a). This also implies that all definite descriptions and NPs whose reference is unique, (25b), anaphoric NPs, and all those NPs whose reference is given by the context must be followed by =rā too, (25c). The omission of =rā in all these cases yields strict ungrammaticality.

(25) a. shomā=rā/ maryam=rā/ ān dokhtar=rā did-am
you=ra/ Maryam=ra/ that girl=ra see.pst-1sg
‘I saw you/Maryam/that girl.’
b. nakhost vazir=e farānse=rā did-am
prime minister=ez France=ra see.pst-1sg
‘I saw the French prime minister.’
c. ketāb=rā kharid-i?
book=ra buy.pst-2sg
‘Did you buy the book?’

However, although =rā is absolutely required with definite NPs, definiteness cannot account for the whole range of the distribution of =rā. In other words, while definiteness constitutes a sufficient condition for rā-marking of direct objects (DOs), it is not a necessary condition, since indefinite objects may also be rā-marked, (26).

(26) maryam zan=i(=rā) dar kuche did
Maryam woman=indef(=ra) in street see.pst-3sg
‘Maryam saw a woman in the street.’

Note that in this latter case, the omission of =​rā does not render the sentence ungrammatical.
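The generalization reached so far (definiteness is sufficient, but not necessary, for rā-marking) can be restated as a small Python sketch. This is only an illustration of the descriptive statement above, not an analysis proposed in the literature cited: the class `DirectObject` and the function `is_grammatical` are hypothetical names introduced here, and the sketch deliberately ignores the reading-dependent factors that the rest of this section discusses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DirectObject:
    """A direct object reduced to the single property discussed so far."""
    form: str        # transliterated form, for display only
    definite: bool   # True for pronouns, proper nouns, definite NPs


def is_grammatical(obj: DirectObject, ra_marked: bool) -> bool:
    """Descriptive rule: omitting =rā is excluded only on definite objects.

    Definite object   -> =rā obligatory (omission is ungrammatical).
    Indefinite object -> =rā possible, and its omission is also fine.
    """
    return ra_marked or not obj.definite


# Objects modelled on examples (25c) and (26) in the text.
ketab = DirectObject("ketāb", definite=True)    # 'the book'
zani = DirectObject("zan=i", definite=False)    # 'a woman'

for obj in (ketab, zani):
    for marked in (True, False):
        status = "ok" if is_grammatical(obj, marked) else "*"
        print(f"{status:>2} {obj.form}{'=rā' if marked else ''}")
```

The asymmetry is visible in the truth table: only the combination of a definite object with no =rā is ruled out, which is what "sufficient but not necessary" amounts to.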

Some authors have therefore suggested that specificity, rather than definiteness, is responsible for rā-marking (Browne 1970; Browning and Karimi 1994; Karimi 1990, 1996). In this view, the occurrence of =rā with indefinite NPs is not optional, but depends on the reading of the NP: all specific objects must be rā-marked, be they definite or indefinite. Note that there is no generally agreed definition of the notion of specificity. Informally speaking, the referent of a specific indefinite expression is identifiable to the speaker (but not to the addressee). A prototypical specific indefinite is generally assumed to have wide scope, a referential reading, and an existential presupposition.11 Karimi (1996) suggests that a specific NP must be rā-marked if it occurs in the syntactic configuration in (27):12

(27) [CP ... NP... [β ... α ... ] ... ]

In contrast with this categorical view, other studies insist on the fact that a cluster of features or properties, and not a single binary feature (be it definiteness or specificity), is involved in DOM. Lazard (1982, 1994) claims that, apart from definiteness, the presence of =rā can be triggered by factors such as animacy (or humanness), the semantic ‘contentfulness’ of the verb, the semantic ‘distance’ between the verb and the object, the relative weight of the syntactic constituents, and finally the information structure. Lazard’s approach thus combines a cluster of heterogeneous parameters, involving not only the inherent semantic properties of the object itself but also its relationship with the verb and particularly the way speakers organize their utterance in order to ‘polarize’ the object. Lazard coins the term ‘polarized object’ to designate rā-marked objects, as opposed to ‘depolarized’, i.e. non-rā-marked, objects. Because of the complex interaction between these factors, he concludes that it is impossible to formulate a categorical rule for rā-marking in Persian. Note that Bossong (1991) makes the same remark on DOM in a variety of languages,13 for example, Hindi, Kannada, and Ostyak, and claims that the rules of DOM in these languages must allow for a certain degree of variability across speakers and situations.

The fact that specificity is not sufficient by itself to account for the whole range of the uses of =rā has been noted by several other authors (Dabir-Moghaddam 1992; Ghomeshi 1997b; Meunier and Samvelian 1997; among others). The most obvious counterexample to such a generalization is provided by the use of =rā with generic objects:

(28) a. jāni=rā mojāzāt mi-kon-and
murderer=ra punishment ipfv-do.prs-3pl
‘Murderers are punished.’ (lit. ‘They punish murderers.’) (Lazard 1982: (43))
b. serke shir=rā mi-bor-ad
vinegar milk=ra ipfv-cut.prs-3sg
‘Vinegar curdles milk.’ (adapted from Phillott 1919: 455)

11 For a detailed discussion of the notion and controversies on specificity, see Fodor and Sag (1982); Hintikka (1986); Enç (1991); Farkas (1995, 2002).
12 Karimi suggests a different view of rā-marking in a recent work (Karimi and Smith 2015). See also Karimi in this volume (Chapter 7).
13 For a detailed discussion on this point, see also Dalrymple and Nikolaeva (2011).

c. mi-dān-id chetor gusfand=rā mi-kosh-and
ipfv-know.prs-2pl how sheep=ra ipfv-kill.prs-3pl
‘Do you know how sheep are killed?’ (lit. ‘Do you know how they kill sheep?’) (adapted from Phillott 1919: 459)
d. arusi-hā=rā injā zohr-hā mi-gir-and
wedding-pl=ra here noon-pl ipfv-catch.prs-3pl
‘Weddings are celebrated here at noon.’ (lit. ‘Here, they celebrate weddings at noon.’) (Modarres Sadeghi 1991: 74)

Phillott (1919) notes that the omission of =rā in (28c) would not change the interpretation of the sentence. A very convincing example in this sense is provided by Hincha (1961), quoted by Lazard (1982):

(29) arabi balad=i? ( ... ) torki=rā balad=i?
Arabic knowing=cop.2sg Turkish=ra knowing=cop.2sg
‘Do you know Arabic? And what about Turkish, do you know Turkish?’

As Lazard (1982) notes, (29) uncontroversially proves that referentiality is not the only trigger of DOM in Persian and in some cases does not play any role at all.

9.3.2 Rā as a mark of high transitivity

Another set of data showing that factors other than referentiality intervene in rā-marking is provided by the following examples:

(30) a. ru=ye yakh=rā kāh mi-pash-and
on=ez ice=ra straw ipfv-spread.prs-3pl
‘[They] spread straw on the ice.’ (Lazard 1982: (63))
b. zamin=e posht=e bāgh=rā lubiyā kāsht-am
ground=ez behind=ez garden=ra bean plant.pst-1sg
‘I have planted beans in the ground behind the garden.’ (Afghāni 1993: 337)
yeki digar mi-goft injā=rā behtar ast yek tappe be-kesh-id
one other ipfv-say.pst.3sg here=ra better be.prs.3sg one hill sbjv-draw.prs.2pl
‘Another one was saying: here, it would be better to draw a hill.’ (Takhti 1997: 158)
c. give=rā na-bāiad nakh bast
espadrille=ra neg-must string attach.sinf
‘One should not attach laces on espadrilles.’ (Dowlatābādi 1979: 89)

d. shāiad u=rā ham āzār-hā resānde bud-and
perhaps he=ra also persecution-pl convey.pfp be.pst-3pl
‘Perhaps they have also harmed him.’ (Sāri 1998: 124)
e. masalan be jā=ye pānsad nafar, emshab hezār nafar=rā dar masjed=e āzarbāijāni-hā shām be-dah-ad
for example instead=ez five hundred person tonight thousand person=ra in mosque=ez Azerbaijani-pl dinner sbjv-give.prs.3sg
‘For example, instead of 500 people, tonight, he’d better offer dinner to 1,000 people at the Azerbaijanis’ mosque.’ (Ettehādiye 1996: 68)

In these examples, the rā-marked constituent is not the DO properly speaking, but a locative or a ‘dative’ argument of the verb, which can also have a prepositional realization. In example (30b), for instance, zamin=e posht=e bāgh ‘the ground behind the garden’ is by preference introduced by a locative preposition such as dar ‘in’, as illustrated in example (31a). In example (30e), hezār nafar ‘1,000 people’, which is the goal or beneficiary argument of the verb dādan, is canonically introduced by the preposition be ‘to’:

(31) a. dar zamin=e posht=e bāgh lubiyā kāsht-am
in ground=ez behind=ez garden bean plant.pst-1sg
‘I have planted beans in the ground behind the garden.’
b. be hezār nafar shām mi-dah-ad
to one thousand person dinner ipfv-give.prs.3sg
‘He offers dinner to 1,000 people.’

Lazard (1982) claims that in (30a), the whole surface or space designated by the Ground argument, i.e. ru=ye yakh ‘on the ice’, is occupied by the result of the action. In other words, the rā-marked variant of the locative argument implies a holistic reading, while the prepositional variant gives only a locative indication. Therefore, (30b) and (31a) do not display the same truth conditions. Interestingly, the same remarks have been made for the spray-load alternation in English (Levin 1993), to spray paint on the wall vs. to spray the wall with paint. It has been claimed that in this latter case, the locative argument receives a holistic reading (Anderson 1971).

The alternation illustrated by (30e) and (31b), on the other hand, is comparable to the so-called ‘dative alternation’ in English (to give something to somebody vs. to give somebody something). The interaction between several parameters, such as definiteness, animacy, and discourse accessibility (or givenness), has been shown to favour the double object variant (Bresnan et al. 2007). It seems at first sight that some of these parameters also play a role in the preference for the double object construction in Persian, although the phenomenon needs to be thoroughly investigated.

Lazard (1982) coins the term ‘Polarized Quasi-Objects’ for the rā-marked constituents in (30). The role of =rā is thus to turn some oblique arguments, i.e. those that share some typical properties of rā-marked DOs, into objects. The use of =rā in this function is not limited to

locative (or Ground) and ‘dative’ (Goal) arguments but extends to a wide variety of cases, illustrated by the following examples:

(32) a. be khodā man hāzer=am sad farsakh rāh=ro bā to piāde bi-ā-m
to God I ready=cop.1sg hundred league path=ra with you on foot sbjv-come.prs-1sg
‘Believe me, I’m ready to walk one hundred leagues with you.’ (lit. ‘I’m ready to come one hundred leagues with you.’)
b. zohr, nāhār=rā bā yek dust ( ... ) dar restorān=e farid ( ... ) gharār o madār dār-am
noon, lunch=ra with a friend ( ... ) in restaurant=ez Farid ( ... ) appointment have.prs-1sg
‘At noon, for lunch, I have an appointment with a friend at Farid restaurant.’ (Fasih 1998: 199)
c. shab=rā dar ghahvekhāne manzel kard-im
night=ra in cafe stop-off do.pst-1pl
‘At night, we stayed at the café.’ or ‘We spent the night at the café.’ (Lazard 1982: (70))
d. man=e bichāre che be-gui-am ke zendegi=am=rā ranj bord-am ( ... )
I=ez miserable what sbjv-say.prs-1sg that life=cl.1sg=ra pain carry.pst-1sg
‘What can I say, I who am miserable and who have suffered all my life.’ (Lazard 1982: (71))

In (32a), the rā-marked constituent is a modifier denoting distance. In (32b), =rā occurs with an adjunct expressing purpose. In (32c) and (32d), =rā is attached to temporal modifiers. Concerning the presence of =rā with temporal modifiers, Lazard (1982) claims that a semantic effect comparable to the one observed with locative arguments is at play here as well. Unlike their non-rā-marked counterparts, rā-marked temporal modifiers serve not only to anchor the activity denoted by the predicate in time, but also to delimit it temporally: that is, the activity occupies the entire time interval. This contrast is illustrated by the difference of interpretation between examples (32d) and (33): (32d), but not (33), implies that the speaker has spent his life suffering.

(33) dar zendegi=am ranj bord-am
in life=cl.1sg pain carry.pst-1sg
‘I have suffered in my life.’

In accordance with Lazard (1982), Ghomeshi (1997b) assumes that these modifiers behave as prototypical direct objects in that they ‘measure out’ or delimit the event described by the verb (Ghomeshi and Massam 1994) and concludes that =rā is in fact a marker of high

transitivity, since the cluster of the properties triggering its presence all correlate with high transitivity in the sense of Hopper and Thompson (1980).

9.3.3 Rā as a mark of topicality

The idea that DOM in Persian depends not only on the inherent referential features of the object but also on the information structure has been defended in several studies (Dabir-Moghaddam 1992; Dalrymple and Nikolaeva 2011; Ghomeshi 1997b; Karimi 1990; Lazard 1982; Meunier and Samvelian 1997; Peterson 1974; Shokouhi and Kipka 2003; Windfuhr 1979; among others). It has been observed that rā-marked objects tend to be topics, while non-rā-marked objects display focus properties. The set of data in (34) unambiguously highlights the link between =rā and topicality. In these examples, =rā occurs with floating topics located at the left periphery of the sentence and cross-referenced by a clitic.

(34) a. u_i=rā gorosne va bi dud=esh_i mi-gozāsht
he=ra hungry and without smoke=cl.3sg ipfv-leave.pst.3sg
‘He would leave him hungry and without opium.’ (Lazard 1982: (83))
b. ān dokhtar_i=rā sad daf’e barā=yash_i nāme nevesht-am
that girl=ra hundred time for=cl.3sg letter write.pst-1sg
‘That girl, I have written one hundred letters to her.’ (Lazard 1982: (87))
c. portegāl_i=o bāiad pust=esh_i=o kand ba’d khord
orange=ra must skin=cl.3sg=ra tear.sinf then eat.sinf
‘As for oranges, one must peel them and then eat them.’ (Lazard 1982: (92))

The floating topic can correspond to a number of different grammatical functions. In (34a), the topicalized constituent is the DO doubled by a clitic in the sentence, in (34b) the prepositional argument of the verb, and in (34c) the complement of the head noun of the direct object (i.e. it does not bear a function with respect to the verb).

Since the referential properties of an NP are generally correlated to its discourse status (definite/specific NPs are more likely to be topics while non-specific NPs tend to be focuses), some studies have claimed that the main function of =rā is to mark topicality. Peterson (1974) suggests that specific DOs are rā-marked because topics are specific in nature. Dabir-Moghaddam (1990, 1992, 2009), following Windfuhr (1979), claims that =rā is the mark of secondary topics: all rā-marked objects are topics while all non-marked objects are part of the comment. Dalrymple and Nikolaeva (2011) adopt a less ‘strong’ version of this analysis. While they agree with Dabir-Moghaddam (1992) that the main function of =rā is secondary topic marking, they nevertheless disagree with the latter on two points: (i) =Rā can mark the primary topic as well. (ii) While the distribution of =rā on non-objects exclusively depends on topicality, the picture is more complex for direct objects and topicality is not the only relevant factor in determining rā-marking.

Following Reinhart (1982), Gundel (1988), and Lambrecht (1994), Dalrymple and Nikolaeva (2011) define topicality as a matter of ‘aboutness’: the topic is the entity that the proposition is about. Consequently, topicality has to do with the construal of the referent as pragmatically salient (or prominent: de Swart 2007) so that the assertion is made about this referent. A potential diagnostic for topic-hood is the ‘what-about’ or ‘as-for’ test (Gundel 1988; Lambrecht 1994; Reinhart 1982). Although topicality correlates with the role played by the referent in the preceding discourse, the correlation is imperfect. Topicality is mainly a question of saliency, and although definite or specific NPs are generally given more saliency because of their referential properties, non-specific NPs may also become salient if the speaker decides so in a given communicative context.

The topic role is not necessarily unique and several studies have acknowledged the existence of at least a secondary topic along with the primary topic (Givón 1984; Nikolaeva 2001; Polinsky 1995). Nikolaeva (2001) defines secondary topic as ‘an entity such that the utterance is construed to be ABOUT the relationship between it and the primary topic’. Topics are ordered with respect to saliency: the primary topic is more pragmatically salient than the secondary topic. Dalrymple and Nikolaeva (2011) note that =rā can mark a primary topic as well as a secondary topic. In (35a), the rā-marked DO is a secondary topic, the subject being the primary topic, while in (35b) it is the primary topic given that the subject is the focus and therefore cannot be the primary topic.

(35) a. maryam ketāb=o be ki dād?
Maryam book=ra to who give.pst.3sg
‘To whom did Maryam give the book?’
b. ki māshin=o did?
who car=ra see.pst.3sg
‘Who saw the car?’

The second disagreement is more important. Dalrymple and Nikolaeva (2011) claim topicality is not the only relevant factor in determining rā-marking on objects. It is a factor for some objects, i.e. indefinite objects. On definite objects, however, rā-marking is essentially motivated by definiteness, having to do with features of topic-worthiness: all definite objects must be marked, independent of their information status and even if they are in focus. In fact, Dalrymple and Nikolaeva (2011: 113) postulate two =rā’s or two functions for =rā in examples such as (36): the first marks the topicality of the temporal adjunct, while the second is licensed by definiteness.

(36) faghat in ye sā’at=o in ketāb=o be-khun
just this one hour=ra this book=ra imp-read
‘Read this book this one hour!’

Samvelian (2002) also defends the assumption that there are two =rā’s in Persian, noting a crucial difference between rā-marked arguments on the one hand and rā-marked floating topics and adjuncts on the other hand. While there can only be one rā-marked argument (i.e. object) in a simplex sentence, the number of rā-marked floating topics and adjuncts is not grammatically limited. The examples in (37) show that the nominal element of a complex predicate can be rā-marked if referential. The examples in (38) illustrate a ‘transitive’

use of the same predicate, so that the verb is preceded by two direct nominal dependents. The relevant point here is that although the nominal element of the predicate can still be modified and even determined, as in (38b), it cannot be rā-marked, (38c). The situation is identical in ‘double object’ constructions in (39). The complex predicate pādāsh dādan ‘to reward’ (lit. ‘reward give’) may take two direct objects (Theme and Goal), but only one of them can be rā-marked. When both arguments are definite, the dative (Goal) argument must have a prepositional realization.

(37) a. maryam ārāyesh karde bud
Maryam makeup do.pfp be.pst.3sg
‘Maryam wore makeup.’
b. maryam zibā-tarin ārāyesh=rā karde bud
Maryam beautiful-spr makeup=ra do.pfp be.pst.3sg
‘Maryam wore the most beautiful makeup.’

(38) a. maryam in arus=rā ārāyesh karde bud
Maryam this bride=ra makeup do.pfp be.pst.3sg
‘Maryam had made this bride up.’
b. maryam in arus=rā ārāyesh=e zibā=i karde bud
Maryam this bride=ra makeup=ez beautiful=indef do.pfp be.pst.3sg
‘Maryam made this bride up beautifully.’
c. *maryam in arus=rā zibā-tarin ārāyesh=rā karde bud
Maryam this bride=ra beautiful-spr makeup=ra do.pfp be.pst.3sg
(intended) ‘Maryam put the most beautiful makeup on this bride.’ or ‘Maryam made this bride up in the most beautiful way.’

(39) a. maryam=rā ketāb=i pādāsh dād-and
Maryam=ra book=indef reward give.pst.3pl
‘They rewarded Maryam with a book.’ or ‘They gave Maryam a book as a reward.’
b. be maryam in ketāb=rā pādāsh dād-and
to Maryam this book=ra reward give.pst.3pl
‘They gave this book to Maryam as a reward.’
c. *maryam=rā in ketāb=rā pādāsh dād-and
Maryam=ra this book=ra reward give.pst.3pl

By contrast, rā-marked floating topics and adjuncts can be multiple: (40) contains two rā-marked temporal modifiers and one rā-marked object.

(40) sāl=e pish=o, ruz-ā=ye jom’a=ro, bachche-hā=ro sinamā mi-bord-am
year=ez before=ra, day-pl=ez Friday=ra, kid-pl=ra theatre ipfv-take.pst-1sg
‘Last year, on Friday evenings, I used to take the kids to the movies.’
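The asymmetry that Samvelian (2002) is reported to draw, at most one rā-marked argument per simplex sentence but no grammatical limit on rā-marked floating topics and adjuncts, can be stated as a simple well-formedness check. The Python sketch below is only an illustrative restatement of that claim: the `Constituent` class, the role labels, and the function name are hypothetical and introduced here, and the clauses are hand-annotated simplifications of (39b), (39c), and (40).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Constituent:
    form: str
    role: str        # "argument", "adjunct", or "topic" (labels are this sketch's own)
    ra_marked: bool


def respects_single_ra_argument(clause: List[Constituent]) -> bool:
    """True if at most one argument of the simplex clause carries =rā.

    rā-marked adjuncts and floating topics are not counted, since their
    number is described as grammatically unlimited.
    """
    marked_arguments = [c for c in clause if c.role == "argument" and c.ra_marked]
    return len(marked_arguments) <= 1


# (39b): prepositional Goal, one rā-marked Theme -> well-formed.
ex_39b = [
    Constituent("be maryam", "argument", False),
    Constituent("in ketāb=rā", "argument", True),
]

# (39c): both Goal and Theme rā-marked -> excluded.
ex_39c = [
    Constituent("maryam=rā", "argument", True),
    Constituent("in ketāb=rā", "argument", True),
]

# (40): two rā-marked temporal adjuncts plus one rā-marked object -> well-formed.
ex_40 = [
    Constituent("sāl=e pish=o", "adjunct", True),
    Constituent("ruz-ā=ye jom’a=ro", "adjunct", True),
    Constituent("bachche-hā=ro", "argument", True),
]

for label, clause in (("39b", ex_39b), ("39c", ex_39c), ("40", ex_40)):
    print(label, respects_single_ra_argument(clause))
```

The check counts only constituents annotated as arguments, so the three occurrences of =rā in (40) pass while the two in (39c) do not. Deciding which constituents are arguments is exactly the analytical step the sketch takes as given.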

To sum up, topicality cannot be considered to be the unique trigger of =rā, as claimed by Dabir-Moghaddam (1990, 1992, 2009). In fact, it would be nothing less than surprising if an item whose presence is mandatory in some cases was exclusively triggered by information structure. Information structure is a matter of choice: the speaker decides how to ‘structure’ the utterance in order to convey information. The presence of =rā with definite objects is obligatory in Persian. So, whatever the choice of the speaker about the information or discourse status of a definite object, =rā must be there as a grammatical constraint. The price for the analysis of =rā as an exclusive topic-marker would be to admit that Persian speakers have no choice as to the discourse status of the definite direct object, which must be necessarily construed as a topic. If definite subjects can be non-topical in Persian, why would definite objects be denied this status? As has been noted by Karimi (1989, 1990), a definite DO can be a focus. In (41), the rā-marked object can be the answer to a who-question and hence is the focus of the utterance. Consequently, topicality is not a necessary condition for rā-marking.

(41) diruz, tu kuche Maryam=o did-am
Yesterday in street Maryam=ra see.pst-1sg
‘Yesterday I saw Maryam in the street.’

There remains one last problem with the analysis of =rā as a topic-marker. Karimi (1989) rightly claims that not all non-subject topics are rā-marked, as illustrated by (42), where the bare object gusht ‘meat’ is topicalized without being rā-marked.

(42) gusht mi-dun-am ke maryam hichvaght ne-mi-khor-e
meat ipfv-know.prs-1sg that Maryam never neg-ipfv-eat.prs-3sg
(lit.) ‘Meat, I know, Maryam never eats.’

More investigation is needed on the topicalization of ‘bare’ objects in Persian, but one can already affirm that not only is topicality not a necessary condition for rā-marking, it is not a sufficient condition either.

9.3.4 DOM, object positions, and word order

Several studies have noted that rā-marking has a syntactic correlate: rā-marked objects tend to precede prepositional arguments and even subjects and thus do not occur in the canonical position of DOs in Persian, i.e. adjacent to the verb. Based on this fact and some others that will be discussed later, many studies have suggested a dual-position account of the direct object depending on its markedness (Browning and Karimi 1994; Ganjavi 2007, 2011; Ghomeshi 1996, 1997b; Karimi 1990, 2003).14 It is assumed that rā-marked objects do not occupy the same syntactic position as their non-rā-marked counterparts and appear in a higher position than the latter. According to Karimi (2003), for instance, rā-marked DOs

14 Studies of phrasal syntax within the generative paradigm generally assume two distinct positions for objects, VP-internal and VP-external, and establish a correlation between object position and object marking (Diesing 1992; van Geenhoven 1998; Ritter and Rosen 2001; among many others): indefinite/non-specific objects are generally assumed to be VP-internal, while marked objects are VP-external.

occupy the position of the specifier of VP, while non-rā-marked DOs occupy a lower position, that is, the position sister to the verb (under the V′):15

(43) a. [VP DP[+Specific] [V′ PP V]]
b. [VP [V′ PP [V′ DP[−Specific] V]]]

The Two Object Position Hypothesis (TOPH) is built on the claim that rā-marked and non-rā-marked objects display consistent asymmetries with respect to word order, licensing parasitic gaps, and binding anaphors, and that they cannot be coordinated.

The unmarked word order asymmetry is the backbone argument of the TOPH. It is generally assumed that in unmarked, canonical or neutral word order in ditransitive constructions in Persian, rā-marked DOs precede the indirect object (IO) while non-rā-marked DOs follow the IO (Browning and Karimi 1994; Ganjavi 2007; Givi Ahmadi and Hassan 1995; Ghomeshi 1997b; Karimi 2003; Mahootian 1997; Rasekh-Mahand 2004; Roberts et al. 2009).

(44) a. kimeā aghlab barā mā she’r mi-khun-e
Kimea often for us poem ipfv-read-3sg
‘It is often the case that Kimea reads poetry for us.’
b. kimeā aghlab barā mā ye she’r az Hafez mi-khun-e
Kimea often for us a poem of Hafez ipfv-read-3sg
‘It is often the case that Kimea reads a poem by Hafez for us.’ (Adapted from Karimi 2003: 91–2)

(45) a. kimeā aghlab hame=ye she’r-hā=ye tāze-ash=ro barā mā mi-khun-e
Kimea often all=ez poem-pl=ez new=cl.3sg=ra for us ipfv-read-3sg
‘It is often the case that Kimea reads all her new poems for us.’
b. Kimeā aghlab ye she’r az Hafez=ro barā mā mi-khun-e
Kimea often a poem of Hafez=ra for us ipfv-read-3sg
‘It is often the case that Kimea reads a (particular) poem by Hafez for us.’ (Adapted from Karimi 2003: 91–2)

Samvelian (2001) questions this hypothesis and postulates a flat structure for the Persian VP:16 rā-marked and non-rā-marked objects occupy the same syntactic position. In a series of recent corpus-based and experimental studies, Faghiri et al. (2014) and Faghiri and Samvelian (2014) show that, with respect to word order, indefinite non-marked DOs group with marked DOs rather than with bare objects. Faghiri and Samvelian (2015b) and Faghiri (2016) further argue that the asymmetries between marked and non-marked objects do not need to be accounted for in terms of syntactic positions and are best accounted for by semantic and discourse considerations.
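The point of disagreement can be made concrete by setting the word order expected under the TOPH beside the preferences these studies report. The Python sketch below is a schematic restatement under simplifying assumptions, not a reproduction of the cited analyses or of their statistical models: the three DO labels, the two order labels, and all function and variable names are introduced here for illustration, and the "reported" preferences are coarse summaries of the findings as described in this section.

```python
# Order labels: "DO-IO" = the direct object precedes the indirect object;
# "IO-DO" = the direct object follows the IO, i.e. is adjacent to the verb.

def toph_predicted_order(ra_marked: bool) -> str:
    """Unmarked order expected under the Two Object Position Hypothesis."""
    return "DO-IO" if ra_marked else "IO-DO"


# Three DO types and whether each carries =rā (labels are this sketch's own).
DO_TYPES = {
    "ra_marked": True,
    "indefinite_unmarked": False,
    "bare": False,
}

# Coarse summary of the preferences reported in the corpus-based and
# experimental studies, as described in the text.
REPORTED_PREFERENCE = {
    "ra_marked": "DO-IO",
    "indefinite_unmarked": "DO-IO",   # groups with rā-marked DOs
    "bare": "IO-DO",
}

# DO types for which the TOPH expectation and the reported preference differ.
mismatches = [
    do_type
    for do_type, marked in DO_TYPES.items()
    if toph_predicted_order(marked) != REPORTED_PREFERENCE[do_type]
]

print(mismatches)
```

Under these assumptions the only mismatch is the indefinite non-rā-marked DO, which is the case the studies single out: a position-based account keyed to the presence of =rā places it with bare objects, whereas its reported ordering preference places it with rā-marked objects.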


15 In a more recent work, Karimi (2005) proposes a revised version of her Two Object Position Hypothesis (TOPH), in which both objects are base-generated in the same position, under the v′. The specific object shifts into the specifier of vP position in order to receive its interpretation.
16 Bonami and Samvelian (2015) also adopt a flat structure for Persian sentences.

Specific Features of Persian Syntax    253 Based on a corpus-​based study17 and experimental follow-​up studies, Faghiri and Samvelian (2014), Faghiri et al. (2014), and Faghiri (2016) investigate ordering preferences between the DO and the IO in the preverbal domain in Persian. Observing that non-​rā-​marked objects show more versatility with respect to word order, the authors resort to a more fine-​tuned classification of unmarked DOs, splitting them into bare DOs, ketāb ‘book’, bare modified DOs, ketāb=e kohne ‘old book’, and indefinite (unmarked) DOs yek ketāb(=e kohne) or ketāb(=e kohne)=i ‘an (old) book’. In addition to the realization of the DO, these studies take into account other potentially influential factors such as relative length, givenness, collocationality, and lexical bias, via mixed-​effect regression modelling, in line with key empirical studies on word order variations (Wasow 2002; among others). The data reveal that while rā-​marked DOs show a strong preference for appearing before the IO, among various non-​rā-​marked DOs, only bare nouns show a strong preference for adjacency to the verb. Interestingly, indefinite (non-​rā-​marked) DOs show a clear preference for the inverse, grouping with rā-​marked DOs. Moreover, extra syntactic factors such as relative length also play a significant role in these ordering preferences. Accordingly, Faghiri and Samvelian (2015b,a) argue that the ordering preferences observed for different types of DOs are best represented as a continuum based on the degree of conceptual and/​or discourse accessibility. The authors conclude that any structural account of word order preferences between DOs would lead to wrong predictions. Moreover, even if the TOPH was to be maintained, word order preferences speak in favour of an identical position for rā-​marked DOs and indefinite non-​rā-​marked DOs. 
To sum up, word order does not seem to constitute a conclusive criterion in favour of a configurational account of rā-marking. The behaviour of DOs with respect to licensing parasitic gaps is another argument in favour of the TOPH. According to Karimi (1999: 704), only rā-marked DOs can license parasitic gaps, see (46).

(46) a. Kimea [DP in ketāb=ro]i [CP ghabl-az inke pro ei be-khun-e] be man dād
        Kimea this book=ra before that sbjv-read.prs-3sg to I give.pst.3sg
        ‘Kimea gave me this book before reading (it).’
     b. * Kimea [DP ketāb]i [CP ghabl-az inke pro ei be-khun-e] be man dād
        Kimea book before that sbjv-read.prs-3sg to I give.pst.3sg

17  The study is based on the Bijankhan corpus, a corpus collected from daily news and common texts, in particular the newspaper Hamshahri, of about 2.6 million tokens, manually tagged for part-of-speech information. The corpus was created in 2005 by the DataBase Research Group at the University of Tehran and can be freely downloaded from their website.

Faghiri and Samvelian (2015a) note, however, that the examples in (47) are grammatical despite the fact that the non-rā-marked object licenses a parasitic gap. The oddness of (46b) may be due to the fact that the verb is in the past tense and the sentence denotes a specific accomplished event, where the DO is expected to be known to the speaker and hence a bare DO is not felicitous.

(47) a. man [hendune]i [ghabl-az inke pro ei maze kon-am] ne-mi-khar-am
        I watermelon before that taste do.prs-1sg neg-ipfv-buy.prs-1sg
        ‘I wouldn’t buy watermelon(s) before tasting it/them.’
     b. man [ye lebās]i [bedun=e inke pro ei emtehān kon-am] kharid-am va hichvaght ham na-pushid-am=esh
        I a cloth without=ez that try do.prs-1sg buy.pst-1sg and never too neg-wear.pst-1sg=cl.3sg
        ‘I only bought a piece of clothing without trying (it) and I never wore it.’

Unlike non-rā-marked DOs, rā-marked DOs have been claimed to be able to bind an anaphor (Karimi 2003: 102):

(48) a. man [se-tā bachche-hā=ro]i [be hamdige]i mo’arrefi kard-am
        I three-clf child-pl=ra to each-other introduce do.pst-1sg
        ‘I introduced the three children to each other.’
     b. * man [se-tā bachche-hā]i [be hamdige]i mo’arrefi kardam
        I three-clf child-pl to each-other introduce do.pst-1sg

Here again, Faghiri and Samvelian (2015a) and Faghiri (2016) claim that in a proper context, a non-​specific DO can bind the IO, as shown by the attested examples in (49), found on the web. (49) a. [chand varagh kāghaz]i [be hamdige]i mangane some sheet paper to each-​other staple mi-​kon-​am ipfv-​do.prs-​1sg (lit.) ‘I staple a few sheets of papers to each other.’ b. lidyā yeki=ro mi-​shnās-​e ke [dokhtar pesar]i Lidya someone=ra ipfv-​know.prs-​3sg that girl boy mo’arrefi mi-​kon-​e [be ham]i to each-​other introduction ipfv-​do.prs-​3sg ‘Lidya knows someone who introduces girls and boys to each other.’

The fact that non-rā-marked DOs and rā-marked DOs cannot be coordinated has also been invoked in favour of the TOPH (Karimi 2003: 103):

(50) * man diruz in aks=ro va ketāb kharid-am
       I yesterday this picture=ra and book buy.pst-1sg

Samvelian (2001) claims that coordination cannot be used as a test in favour of the TOPH, and Faghiri and Samvelian (2015a) give the following grammatical example in which a non-rā-marked and a rā-marked DO are coordinated:

(51) man diruz sham’=o ye rumizi=o in tāblo=ro kharid-am
     I yesterday candle=and a tablecloth=and this painting=ra buy.pst-1sg
     ‘Yesterday, I bought this painting, a tablecloth, and some candles.’

To sum up, although there is a consensus on the TOPH in the studies within the generative framework, at least some of the empirical facts supporting this hypothesis seem to be fragile. In particular, word order preferences do not allow for a clear-cut distinction between rā-marked DOs and their non-rā-marked counterparts as far as syntactic positions are concerned.

9.3.5 Concluding remarks on DOM

The facts addressed in this section show that despite the abundant literature on the semantic and pragmatic parameters triggering =rā, there is still a lot to investigate in order to draw a clear picture of the situation. Therefore, it is a safe bet that =rā will remain a popular issue in forthcoming studies on Persian. However, it seems reasonable to conclude, in line with Lazard (1982) and Ghomeshi (1997b), that rather than there being a single binary feature that can characterize rā-marking, be it specificity, topicality, or any other feature, the presence of =rā is determined by the interaction between several parameters that have been highlighted in various studies. Complex as it may seem, this situation is specific neither to Persian nor to DOM. Bossong (1991) concludes that in many languages the rules of DOM cannot be formulated precisely, but must allow for a certain degree of variability across speakers and situations. Variability is also observed for phenomena other than DOM, like word order, optional realization of some appositions, dative alternation, etc. A growing body of studies since Wasow (2002), Bresnan (2006), Bresnan et al. (2007), and Bresnan and Hay (2008), among others, accounts for grammatical phenomena that involve variation by resorting to a new approach which assumes that variation is part of grammar and can be statistically modelled. These methods can be very useful for the study of rā-marking in Persian, which involves various parameters whose complex interaction requires more reliable methods of investigation than traditional grammaticality judgements. This point has been clearly demonstrated by Faghiri and Samvelian (2014, 2015b,a), Faghiri et al. (2014), and Faghiri (2016) for the linear position of the rā-marked and non-rā-marked objects: generalizations based on grammaticality judgements turn out to be (partially) wrong when rigorous empirical methods are used. The same line of research can be applied to modelling the semantic and discourse dimensions of rā-marking. For more information on =rā, see Chapters 2, 3, 6, 7, and 8 in this volume.

9.4 Complex predicates

Persian has only around 250 simplex verbs, half of which are currently used by the speech community.18 The morphological lexeme formation process outputting verbs from nouns (khāb ‘sleep’ > khāb-idan ‘to sleep’, raghs ‘dance’ > raghs-idan ‘to dance’), though available, is not productive. When they need to refer to a new event type, speakers resort to complex predicates (CPrs), formed by a verb and a preverbal element, which can be a noun, harf zadan ‘to talk’ (lit. ‘talk hit’), an adjective, bāz kardan ‘to open’ (lit. ‘open do’), a particle, bar dāshtan ‘to take’ (lit. ‘particle have’), or a prepositional phrase, be kār bordan ‘to use’ (lit. ‘to work take’). These combinations are generally referred to as complex predicates, compound verbs, or light verb constructions. According to Telegdi (1951), the gradual elimination of simplex verbs and their substitution by ‘periphrastic expressions’ or ‘compound verbs’ is at least as old as Middle Persian. Korn (2013) argues that the rise of CPrs in Persian is linked to the development of the verb pair ‘do’ and ‘become’, which encode the features called Instigation [+INST] and Affectedness [+AFF], respectively.

A first issue when dealing with Persian CPrs is the delimitation of the category itself. As discussed extensively in Samvelian (2001, 2012), two facts drastically blur the boundary line between lexical and light verbs, and hence, between CPrs and ordinary object–verb combinations:

(i) The (expected) consequence of the limited number of simplex verbs in Persian is that most of them have a vague semantic content, which becomes specified only in the context of their combination with their arguments (Samsam Bakhtiari 2000; Samvelian 2012). In other words, most Persian verbs are de facto light verbs, so that, from a semantic point of view, deciding whether a noun–verb combination qualifies for being a CPr involves some degree of arbitrariness. For instance, in combinations like rang zadan ‘to paint’ (lit. ‘colour hit’), vāks zadan ‘to polish’ (lit. ‘polish hit’), or kare zadan ‘to butter’ (lit. ‘butter hit’), the verb zadan may be considered either a ‘bleached’ (light) verb or a lexical verb meaning ‘to apply, to put’, and accordingly the sequence can be considered as a CPr or an ordinary object–verb combination. The most striking piece of evidence illustrating this situation is the variability of the combinations listed in Persian dictionaries, which vary considerably from one dictionary to another.
(ii) From a strictly syntactic point of view, [bare object–lexical verb] combinations and [adjective–copula] combinations are in many respects comparable to N-V and A-V CPrs. For instance, a few of the criteria used to identify N-V CPrs, like single word stress and limited syntactic autonomy for the noun, also apply to [bare object–lexical verb] combinations, leading to a situation where sequences like māshin rāndan ‘to drive a car’ are considered ‘compound verbs’ in some dictionaries and in the literature on CPrs. Dabir-Moghaddam (1995), for instance, suggests that sequences like ruznāme xāndan ‘to read a newspaper’ are also compound verbs comparable to sequences like ersāl kardan ‘to send’ (lit. ‘sending do’).

18  Sadeghi (1993) gives the estimate of 252 verbs, 115 of which are commonly used. Natel-Khanlari (1986) provides a list of 279 simplex verbs. The Bijankhan corpus contains 228 lemmas.

Identifying the type of constructions that can be considered as CPrs has thus been one of the issues discussed in some studies (Dabir-Moghaddam 1995; Samvelian 2012; Sedighi 2009; among others). However, this issue is far from being resolved, probably because whether a sequence is a CPr or not is often a matter of usage and lexicalization rather than of the inherent properties of the sequence itself, except for uncontroversially clear combinations formed with a ‘real’ light verb such as kardan and a ‘predicative’ (or eventive) noun such as ersāl ‘sending’. The main body of work on CPrs has focused on the dual nature of these sequences, which exhibit both lexical and phrasal properties (Barjasteh 1983; Dabir-Moghaddam 1995, 1997; Family 2006, 2009; Folli et al. 2005; Goldberg 1996, 2003; Karimi 1997; Karimi-Doostan 1997, 2005; Lazard 2013; Megerdoomian 2001, 2002, 2012; Müller 2010; Pantcheva 2010; Samsam Bakhtiari 2000; Samvelian 2012; Samvelian and Faghiri 2013b, 2014; Shabani-Jadidi 2014; Tabaian 1979; Vahedi-Langrudi 1996; among others).
On the one hand, Persian CPrs display all properties of syntactic combinations, including some degree of semantic compositionality, while, on the other hand, they also have word-​like properties, since CPr formation has all the hallmarks of a lexeme formation process, such as lexicalization, idiomaticity, and the fact that the sequence can undergo morphological operations.

9.4.1 Words or phrases?

Whether Persian CPrs are ‘words’ or syntactic construals has been one of the most debated issues in the literature. The following arguments are generally put forward in favour of a lexical view of these sequences:

• The whole sequence bears a single lexical stress, like a word (52).
• CPrs can serve as input to derivational rules (53).
• The components of a CPr must be adjacent and can only be separated by a restricted set of elements, i.e. verbal inflectional prefixes, clitic pronouns, and the future auxiliary (54)–(55).19

19  This claim is questioned in several studies and will be discussed later.

(52) a. and‵ākht
        throw.pst.3sg
        ‘threw’
     b. be gery‵e andākht
        to cry throw.r2
        ‘made cry’
(53) Adjective derivation with the suffix -i
     a. khordan ‘to eat’ > khordan-i ‘edible’
     b. dust dāshtan ‘to love’ (lit. ‘friend have’) > dust dāshtan-i ‘lovely’
(54) a. maryam kheili gol dust dār-ad
        Maryam much flower friend have.prs-3sg
     b. # maryam gol dust kheili dār-ad
        Maryam flower friend much have.prs-3sg
        ‘Maryam likes flowers very much.’
(55) a. maryam gol dust na-dār-ad
        Maryam flower friend neg-have.prs-3sg
        ‘Maryam doesn’t like flowers.’
     b. maryam dust=ash dār-ad
        Maryam friend=cl.3sg have.prs-3sg
        ‘Maryam loves her/him/it.’
     c. maryam gol dust khāh-ad dāsht
        Maryam flower friend fut.aux-3sg have.sinf
        ‘Maryam will love flowers.’

Arguments in favour of a phrasal (i.e. syntactically construed) analysis of CPrs are the following:

(i) Not only inflectional material, but also syntactic material such as prepositional phrases, can intervene between the verb and the non-verbal element in a CPr (56).
(ii) The non-verbal element, if a noun, can be modified, quantified, determined, and rā-marked (57).

(56) a. hichvaght be ātish dast na-zan
        never to fire hand neg.imp-hit.prs.2sg
     b. hichvaght dast be ātish na-zan
        never hand to fire neg.imp-hit.prs.2sg
        ‘Never touch the fire.’
(57) a. maryam harf zad
        Maryam speech hit.pst.3sg
        ‘Maryam talked.’
     b. maryam in harf-hā=rā zad
        Maryam this speech-pl=ra hit.pst.3sg
        ‘Maryam told these words.’

These apparently contradictory properties have given rise to debates on the appropriate ‘place’ for CPr formation. ‘Lexicalist’ approaches20 claim that Persian CPrs are formed in the lexicon (Barjasteh 1983; Dabir-Moghaddam 1995, 1997; Karimi-Doostan 1997). Karimi-Doostan (1997: 193) suggests that Persian CPrs are lexically formed complex lexical entries which consist of two zero-level elements separable in syntax, and thus failing to display lexical integrity. Goldberg (1996) treats the Persian CPr as a construction represented in the lexicon, whose categorial status is V° by default. This guarantees that the verb and the preverbal element are unseparated and thus can undergo derivational processes. However, the V° status is a default status and can be overridden if there is a competing higher-ranked constraint. ‘Syntactic’ approaches, by contrast, consider CPr formation a syntactic process (Folli et al. 2005; Ghomeshi and Massam 1994; Megerdoomian 2002, 2012; Mohammad and Karimi 1992; Tabaian 1979; Vahedi-Langrudi 1996; among others). Mohammad and Karimi (1992) suggest that despite their syntactic construal, Persian CPrs display lexical properties for two reasons: (i) the impossibility for the light verb to assign thematic roles, and (ii) the existence in Persian of two distinct positions for objects. More recently, neo-constructionist studies on Persian CPrs (Folli et al. 2005; Megerdoomian 2002, 2012; Pantcheva 2010) adopt the predicate decomposition approach developed by Hale and Keyser (1992, 1993, 1997, 2002), Ritter and Rosen (1996) and Borer (1994), which blurs the classic distinction between simplex and complex predicates and analyses pairs like give a kick and kick in English as sharing the same syntactic representation: the ‘simplex’ verb kick is syntactically construed via the incorporation of the noun kick into an abstract light verb. Megerdoomian (2002) and Folli et al.
(2005) claim that Persian CPrs constitute a conclusive argument in favour of neo-​constructionist theories of argument structure, since they are unincorporated counterparts of English simplex verbs and thus reveal the universally complex underlying structure of predicates, be they morphologically simplex or not. On this view, whether CPrs are formed in the lexicon (i.e. morphologically) or in syntax is not relevant, since morphology is handled in syntax. Samvelian (2012) also considers the debate on the dual nature of Persian CPrs a false issue, not because of the lack of a boundary between syntax and morphology, but because of the numerous confusions surrounding the use of terms such as ‘formed in the lexicon,’ ‘word’, and ‘morphologically formed’. Saying that a sequence containing more than one word is ‘formed in the lexicon’ or is a ‘lexical unit’ can mean various things:

(i) The sequence is the output of a morphological operation and prototypically behaves like an atom in syntax.
(ii) The sequence is lexicalized (Aronoff 1993) for various reasons (token frequency, naming force, etc.) and must be stored. It is a listeme in the sense of Di Sciullo and Williams (1987).

These are two independent dimensions and must be carefully distinguished. Gaeta and Ricca (2009: 38) suggest a quadripartite typology, which allows for treating the properties of being a lexical/stored unit or the output of a morphological operation as independent degrees of freedom. Each dimension is represented by a binary feature, [±morphological] and [±lexical], giving rise to four logically possible combinations. This typology shows that the set of morphological words and the set of lexicalized items need not be coextensive. Turning now to Persian CPrs, the latter are certainly not words in sense (i), since even the most idiomatic Persian CPrs do not behave as atoms and are separable by lexical material. Most Persian CPrs, on the other hand, are lexicalized sequences and display lexeme-like (i.e. not word-like) properties (Bonami and Samvelian 2010; Samvelian and Faghiri 2013a). They are [–morphological], [+lexical] in the sense of Gaeta and Ricca (2009).

20  The term ‘lexicalist’ is ambiguous in the literature. Here, it must be understood as ‘morphologically formed’.

Samvelian (2012) furthermore argues that none of the arguments supporting the ‘wordhood’ of Persian CPrs are conclusive:

• Lexical accent. Bearing a single lexical stress is not specific to CPrs. Sequences formed by a bare object and a lexical verb bear a single stress and yet have not been claimed to be ‘words’.21
• Input to morphological operations. It has been argued that since Persian CPrs can undergo morphological operations, such as nominalization, ‘they must be treated as lexical or X° units’ (Megerdoomian 2002: 59). For Karimi-Doostan (1997), given that the agentive noun °konande ‘doer’ is not attested, the agentive noun pazira’i konande ‘entertainer’ must be derived from pazira’i kon, which requires that the latter be a word. According to Vahedi-Langrudi (1996) the suffix -i adjoins to the whole sequence eslāh kardan ‘to reform’, and not to kardan in order to form eslāh kardani ‘likely to be reformed’.
Samvelian (2006a: 162) argues, however, that this line of argumentation is flawed since it leads to the conclusion that mār zadan ‘snake beat’ is a lexical unit on the basis of the existence of the adjectival participle mār zade ‘snake beaten’. But mār zadan never occurs as a sequence in discourse. Bracketing paradoxes, i.e. cases where the semantic scope of an affix and its morphological attachment do not coincide (Pesetsky 1985; Spencer 1988; Sproat 1984; Williams 1981), are rather common in various languages. In the French agentive noun metteur en scène ‘director, producer’, the derivational affix -​eur is attached to the verbal stem, while its scope is the whole sequence. Note that the agentive noun °metteur ‘putter’ (from the verb mettre ‘to put’) is not attested, like °konande ‘doer’ in Persian. Yet, it has not been suggested to derive metteur en scène from mettre en scène. Such an analysis would imply that the affix -​eur behaves like an infix and interrupts a ‘word’. Therefore, the only option here is to derive metteur morphologically first. A similar analysis has been outlined for producing sequences such as pazira’i kon + -​ande ‘entertainer’ by Müller (2010) within the HPSG framework. A lexical rule applies to the stem of kardan ‘to do’ first and produces konande ‘doer’. Since the lexical entry for kardan specifies that it must combine with a preverbal element to form a CPr, konande inherits this information and combines in turn with pazirā’i. 21 

Ghomeshi and Massam (1994) mention also this fact in favour of a syntactic analysis of Persian CPrs.

• Inseparability. Several studies affirm that the components of a CPr can only be separated by a restricted set of items, which are either morphological material (affixes) or grammaticalized lexical material, i.e. the future auxiliary, comparable to inflectional affixes (Dabir-Moghaddam 1995; Goldberg 1996, 2003; Karimi-Doostan 1997). The insertion of ‘real’ syntactic items, these studies claim, is excluded and gives rise to ungrammatical or odd examples, as in (58b).

(58) a. ali=rā setāyesh kard-am
        Ali=ra adoration do.pst-1sg
        ‘I adored Ali.’
     b. ?? setāyesh ali=rā kard-am
        adoration Ali=ra do.pst-1sg
        (Goldberg 1996: 135)

Other studies, however, question this claim and affirm that the members of a CPr can be separated by syntactic or lexical items (Samiian 1983; Ghomeshi and Massam 1994; Ghomeshi 1996; Samvelian 2001, 2012). The following examples are from Ghomeshi (1996): (59) a. gush aslan ne-​mi-​kon-​e ear absolutely neg-​ipfv-​do.prs-​3sg ‘(S)he never listens.’ b. garm=esh tu madrese mi-​kon-​am warm=cl.3sg in school ipfv-​do.prs-​1sg ‘I’ll heat it at school.’ Samvelian (2012) also provides numerous attested examples where the prepositional argument of the CPr interrupts the latter (Samvelian 2012: 58–​60): (60) a. ān zan bargasht va sili be surat=am zad that woman turn.pst.3sg and slap to face=cl.1sg hit.pst.3sg ‘The woman turned around and slapped me.’            (Dāneshhvar 1969: 152) b. sedā=ye āhan ra’she be tan=am mi-​andāz-​ad sound=ez iron shiver to body=cl.1sg ipfv-​throw.prs-​3sg ‘The sound of the iron makes me shiver.’            (Mandanipur 1999: 139) To conclude, none of the properties of Persian CPrs provide a conclusive argument in favour of their analysis as ‘words’ or as being ‘morphologically’ formed. Persian CPrs display all typical properties of syntactic combinations and parallel object–​verb combinations in Persian. However, although not words, Persian CPrs are clearly multiword expressions and CPr formation has all the trappings of a lexeme formation process. The lexical properties of CPrs result from their being lexemes or ‘phrasal lexemes’ in the sense of Masini (2009).
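The two-feature typology of Gaeta and Ricca (2009) that underlies this conclusion can be made concrete with a small enumeration. The Python sketch below is purely illustrative: the names `Unit`, `label`, `ALL_TYPES`, and `PERSIAN_CPR` are hypothetical, and the only classification taken from the discussion above is that Persian CPrs are [–morphological], [+lexical].

```python
from itertools import product
from typing import NamedTuple


class Unit(NamedTuple):
    """A sequence described by two independent binary features (hypothetical names)."""
    morphological: bool  # output of a morphological operation, i.e. a syntactic atom
    lexical: bool        # lexicalized, i.e. stored as a listeme


def _sign(value: bool) -> str:
    """Return '+' for a positive feature value and '-' for a negative one."""
    return "+" if value else "-"


def label(unit: Unit) -> str:
    """Render a feature bundle in bracket notation."""
    return f"[{_sign(unit.morphological)}morphological, {_sign(unit.lexical)}lexical]"


# The four logically possible combinations of the two features.
ALL_TYPES = [Unit(m, l) for m, l in product((True, False), repeat=2)]

# Classification stated in the text: Persian CPrs are separable in syntax
# (so not morphological atoms) but are lexicalized sequences.
PERSIAN_CPR = Unit(morphological=False, lexical=True)

if __name__ == "__main__":
    for unit in ALL_TYPES:
        marker = "  <- Persian CPrs" if unit == PERSIAN_CPR else ""
        print(label(unit) + marker)
```

Keeping the two features as separate fields mirrors the point that the set of morphological words and the set of lexicalized items need not coincide.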


9.4.2 The syntactic status of the nominal element

On the basis of a thorough comparison between the nominal element of the CPr and the bare object of a lexical verb, Samvelian (2012) shows that the former is syntactically comparable to the latter in all respects. Like bare objects, the nominal element of a CPr:

(i) is generally adjacent to the verb and tends to follow adverbials and prepositional arguments;
(ii) rarely, if ever, appears in the postverbal position, (61);
(iii) can be fronted (extracted) and receive a topical reading, (62);
(iv) can be promoted and become the subject of the passive construction, (63);
(v) can be coordinated with the nominal element of another CPr, as in (64).

(61) ?? in lebās mi-dah-ad bu
     this dress ipfv-give.prs.3sg smell
     (intended) ‘This dress smells (bad).’
(62) dast man bār-hā be=het goft-am ke be ātish na-zan
     hand I time-pl to=cl.2sg tell.pst-1sg that to fire neg.imp-hit.prs.2sg
     ‘I’ve told you several times not to touch the fire.’
(63) a. matbu’āt be in mas’ale kheili ahammiyat dād-and
        press to this issue much importance give.pst-
        ‘The press gave much importance to this issue.’
     b. be in mas’ale kheili ahammiyat dāde shod
        to this issue much importance give.pfp become.pst.3sg
        ‘Much importance was given to this issue.’
(64) a. omid lagad va sili khord
        Omid kick and slap collide.pst.3sg
        ‘Omid was kicked and slapped.’

Samvelian (2012) concludes that the nominal element of the CPr has exactly the same syntactic status as a bare direct object.22 The differences between the latter and the former are a matter of semantics and not of syntactic construal. While the noun in a CPr is more cohesive with the verb than a bare direct object (in terms of word order, differential object marking, and pronominal affix placement), it is impossible to draw a categorical syntactic distinction between the two types of combinations.


22  For a detailed description of bare nouns in Persian, see also Modarresi (2014).


9.4.3 Compositionality, productivity, and idiomaticity

The compositionality of Persian CPrs has received a great deal of attention in recent literature. Although Persian CPrs are idiomatic, they are also highly productive. Several studies have suggested that compositionality is the key to this productivity and suggested hypotheses on how the contribution of the verb and the preverbal element must be combined to derive the meaning, or at least some of the semantic properties, of the CPr. Two main arguments have been invoked in favour of a compositional analysis of Persian CPrs:

(i) The predictability of their argument and event structure. (ii) The predictability of their lexical (referential) meaning.

In examples (65) and (66):

(i) The referential meaning of the CPr and the roles assigned to the arguments are determined by the nominal element, since the semantic participants of the CPr sili zadan ‘to slap’ (lit. ‘slap hit’) in (65b) are identical to those realized within the NP headed by sili ‘slap’ in (65a).
(ii) The verb, on the other hand, determines the argument mapping, since replacing zadan ‘to hit’ in (65b) with khordan ‘to collide’ in (65c) entails a change in the mapping of the participants to grammatical functions.
(iii) The verb also seems to determine some of the aspectual properties of the CPr, since the verb alternation in (66), dāshtan ‘to have’ vs. āvardan ‘to bring’, gives rise to an aspectual contrast.

(65) a. sili=e sārā be omid
        slap=ez Sara to Omid
        ‘Sara’s slap to Omid’
     b. sārā be omid sili zad
        Sara to Omid slap hit.pst.3sg
        ‘Sara slapped Omid.’
     c. omid az sārā sili khord
        Omid from Sara slap collide.pst.3sg
        ‘Omid was slapped by Sara.’
        (Samvelian and Faghiri 2014: 45 (1))
(66) a. maryam (hamishe) in ettefāgh=rā be yād dāsht
        Maryam always this event=ra to memory have.pst.3sg
        ‘Maryam (always) remembered this event.’ (durative reading)
     b. maryam (nāgahān) in ettefāgh=rā be yād āvard
        Maryam suddenly this event=ra to memory bring.pst.3sg
        ‘Maryam (suddenly) remembered this event.’ (punctual reading)
        (Samvelian and Faghiri 2014: 45 (1))
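The division of labour illustrated by (65) can be restated as a small lookup: the noun supplies the participants, and the verb decides which of them surfaces as subject. The Python sketch below is a hedged illustration rather than an analysis proposed in the literature; the role labels `agent` and `patient`, the tables `NOUN_ROLES` and `VERB_MAPPING`, and the function `map_arguments` are all hypothetical names, and the prepositions are simply read off (65b) and (65c).

```python
# Participants contributed by the nominal element (assumed role labels).
NOUN_ROLES = {"sili": ("agent", "patient")}

# Mapping contributed by the verb: which role becomes the subject, and which
# preposition introduces the other participant, as observed in (65b) and (65c).
VERB_MAPPING = {
    "zadan": {"subject": "agent", "oblique": ("be", "patient")},
    "khordan": {"subject": "patient", "oblique": ("az", "agent")},
}


def map_arguments(noun: str, verb: str, fillers: dict) -> dict:
    """Map the noun's participants to grammatical functions as fixed by the verb."""
    roles = NOUN_ROLES[noun]
    missing = [role for role in roles if role not in fillers]
    if missing:
        raise ValueError(f"no filler given for role(s): {missing}")
    mapping = VERB_MAPPING[verb]
    preposition, oblique_role = mapping["oblique"]
    return {
        "predicate": f"{noun} {verb}",
        "subject": fillers[mapping["subject"]],
        "oblique": f"{preposition} {fillers[oblique_role]}",
    }


if __name__ == "__main__":
    participants = {"agent": "sārā", "patient": "omid"}
    print(map_arguments("sili", "zadan", participants))
    print(map_arguments("sili", "khordan", participants))
```

The participants stay constant across both calls; only the verb entry changes, which is the pattern the alternation in (65b) and (65c) shows.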

Various approaches have been developed to account for these facts. Most studies within the generative framework adopt a fully compositional view, in the sense that they build on the assumption that the respective contributions of the components of a CPr are consistent through all their combinations and can be defined a priori. Projectionist approaches, for example Karimi-Doostan (1997), assume that the information stored in the lexical entries of the light verb and the non-verbal element combine to build a CPr, while constructionist approaches, for example, Megerdoomian (2001, 2002, 2012), Folli et al. (2005), and Pantcheva (2010), consider the syntactic and the semantic properties of a CPr to be derived from the syntactic construction in which the verb and the preverbal element are inserted. Alternative analyses have been developed in studies adopting a construction-based approach in the sense of Fillmore et al. (1988), Goldberg (1995), and Kay and Fillmore (1999). These studies account for the productivity of Persian CPrs either by adopting a non-compositional view (Family 2006, 2009, 2014) or by developing a different view of compositionality (Samvelian 2012; Samvelian and Faghiri 2013a,b, 2014).

Karimi-Doostan (1997) provides one of the first serious attempts to model the respective contributions of the verb and the non-verbal element in CPr formation. Based on Butt’s (1995) work on argument structure, Karimi-Doostan proposes an account in terms of argument ‘fusion’ or ‘reformation’. Following Grimshaw and Mester (1988), he assumes that the light verb (LV) does not assign theta-roles and therefore does not have an argument structure. However, it displays aspectual properties and assigns an aspectual role. Being thematically defective, the LV must combine with another element, namely the preverbal element of the predicate, to develop into a syntactically and semantically complete verb.
This combination gives rise to two kinds of CPrs, either compositional or non-compositional. The first kind results from the combination of the LV with a predicative noun, that is, a noun displaying an argument structure, such as sili ‘slap’ in (65). Non-compositional CPrs are formed when the LV combines with a ‘thematically opaque’ noun, i.e. a noun that does not display an argument structure. Yax zadan ‘to freeze’ (lit. ‘ice hit’), ghofl kardan ‘to lock’ (lit. ‘lock do’) and āb dādan ‘to water’ (lit. ‘water give’) are examples of non-compositional CPrs. CPr formation involves the fusion of the information encoded in the respective lexical entries of the verb and the noun. For more information about opaque versus transitive CPrs and their mental representations, refer to Chapter 17.

LVs are divided into three categories with respect to their aspectual properties: Initiatory, dādan ‘to give’; Transition, khordan ‘to collide’; and Stative, dāshtan ‘to have’. Some verbs may belong to more than one category, kardan ‘to do’ for example, which is either Initiatory or Transition, and thus have two lexical entries. The aspectual category of the LV determines the aspectual type of the CPr. Initiatory verbs form CPrs with at least one external argument, i.e. either unergative or transitive CPrs, and are compatible with nouns having at least one external argument that refers to the initiator of the action denoted by the CPr. Transition verbs form CPrs with a single internal argument, i.e. unaccusative predicates, and are compatible with nouns having at least one (internal) argument. The latter is mapped into the subject function and receives the Patient role. A mapping rule ensures the correct association between an LV and a preverbal element. For instance, a noun like shekast ‘defeat’, which assigns Agent and Patient thematic roles, can combine with either an Initiatory verb or a Transition verb. In the first case, its external argument (i.e. the Agent) is mapped into the subject function, (67a), while in the second case, it is the internal argument that becomes the subject, (67b):

(67) a. ali sāsān=rā shekast dād
        Ali Sasan=ra defeat give.pst.3sg
        ‘Ali defeated Sasan.’
     b. sāsān az ali shekast khord
        Sasan from Ali defeat collide.pst.3sg
        ‘Sasan was defeated by Ali.’

Since Karimi-Doostan (1997) postulates a categorical distinction between compositional and non-compositional CPrs, the latter are not accounted for by his treatment, even though he acknowledges that some of the regularities also hold for non-compositional CPrs.

Megerdoomian (2001, 2012) and Folli et al. (2005) are representative examples of constructionist approaches to Persian CPrs. Based on work by Hale and Keyser (1993, 2002) and Borer (1994), these studies claim that the syntactic and the semantic properties of a CPr are derived from the syntactic construction in which the verb and the preverbal element are inserted, and not from their respective lexical entries. A fully compositional approach is thus maintained, but the burden shifts from the lexicon to the syntax. According to Folli et al. (2005), the verb in the CPr realizes the v head in Hale and Keyser’s approach, as illustrated in (68)–(70). Persian CPrs are thus the non-incorporated counterpart of verbal constructions suggested by Hale and Keyser.

(68) Folli et al. (2005: 1374, (15b))

[Tree diagram; recoverable labels: vP, DP Kimea, V′, gerye kard ‘cry’]

(69) Folli et al. (2005: 1375, (16b))

[Tree diagram; recoverable labels: bidār ‘awake’, shod ‘become’]

(70) Folli et al. (2005: 1375, (17b))

[Tree diagram; recoverable labels: vP, DP Papar, AP, D Kimea=ro, A bidār ‘awake’, V kard ‘made’]

In this approach, the thematic role of Agent/Cause is assigned by v to its external argument (Kratzer 1996; Marantz 1997): kardan in (68) and (70) forms agentive predicates, but shodan in (69) does not. In other words, the LV, being the lexical realization of v, is responsible for the agentive properties of the CPr, while the non-verbal element plays no role here. Megerdoomian (2001: 69) argues along the same lines: ‘the choice of the light verb determines whether an external argument is projected’. This claim is supported by the fact that changing the verb in a CPr entails a change in the mapping of the arguments to grammatical functions, as illustrated in (67).

Following Bashiri (1981), the authors claim that the verb also determines some aspectual properties of the CPr, namely its dynamic vs. stative and durative vs. punctual aspect. This explains the aspectual contrast between (67a) and (67b). Like Agent selection, the aspectual/eventive properties of a given LV are assumed to be consistent through all its combinations to form a CPr. For instance, dāshtan ‘to have’ is always stative. The non-verbal element, on the other hand, is claimed to determine the Aktionsart properties, i.e. telicity, and the referential meaning of the CPr. If the non-verbal element of the CPr is a PP, a particle, an adjective, or an eventive noun, the CPr is telic; otherwise, that is, if the non-verbal element is a non-eventive noun, the CPr is atelic. Table 9.1, adapted from Folli et al. (2005), summarizes and exemplifies the contribution of each component in the makeup of the CPr.

Other constructionist analyses have been developed by Megerdoomian (2001, 2002, 2012) and Pantcheva (2010). Notwithstanding their differences, these approaches all build on the assumption that the respective contribution of the components participating in CPr formation is consistent through all their combinations and can be defined a priori.
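The telicity generalization reported above from Folli et al. (2005) can be pictured as a simple lookup. The following Python sketch is only an illustration: the function name `predicted_telicity`, the category labels, and the choice of examples are assumptions introduced here and are not part of the cited analysis. It encodes the claim as stated and makes no attempt to test it empirically.

```python
# Categories of non-verbal element that, on the generalization reported
# from Folli et al. (2005), yield a telic complex predicate (CPr).
# The string labels are hypothetical, chosen only for this sketch.
TELIC_CATEGORIES = {"PP", "particle", "adjective", "eventive noun"}


def predicted_telicity(nonverbal_category: str) -> str:
    """Return 'telic' or 'atelic' for a CPr, given only the category
    of its non-verbal element."""
    return "telic" if nonverbal_category in TELIC_CATEGORIES else "atelic"


# A few CPrs cited in the chapter, paired with the category of their
# non-verbal element.
EXAMPLES = [
    ("be donyā āmadan", "PP"),
    ("derāz keshidan", "adjective"),
    ("shekast khordan", "eventive noun"),
    ("dād zadan", "non-eventive noun"),
]

for cpr, category in EXAMPLES:
    print(f"{cpr}: {category} -> {predicted_telicity(category)}")
```

Note that the light verb is not an input to the function: on this view, telicity is assumed to depend on the non-verbal element alone.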
In a series of studies, Samvelian (2012) and Samvelian and Faghiri (2013a,b, 2014) develop an alternative view of compositionality, which they qualify as a posteriori in the sense of Nunberg et al. (1994) for idiomatically combining expressions, and outline a construction-based analysis of Persian CPrs, building on the following observations:

Table 9.1 Telicity in Persian Complex Predicates

Telic Complex Predicates
PP + LV: be donyā āmadan ‘to be born’ (lit. ‘world come’); be ātash keshidan ‘to put on fire’ (lit. ‘fire pull’)
Particle + LV: kenār āmadan ‘to get along’ (lit. ‘side come’)
Adjective + LV: derāz keshidan ‘to lie down’ (lit. ‘long pull’)
Eventive noun + LV: shekast khordan ‘to be defeated’ (lit. ‘defeat collide’); shekast dādan ‘to defeat’ (lit. ‘defeat give’)

Atelic Complex Predicates
Non-eventive noun + LV: dast khordan ‘to get touched’ (lit. ‘hand collide’); kotak khordan ‘to get beaten’ (lit. ‘beating hit’); dād zadan ‘to yell’ (lit. ‘scream hit’); dast andākhtan ‘to mock’ (lit. ‘hand throw’)

(i) Although there are consistent regularities in the makeup of the syntactic and semantic properties of CPrs, several examples show that the contribution of each component cannot be determined a priori, but is determined in combination with the other component of the CPr and the meaning of the construction as a whole.
(ii) While the idiomatic properties of Persian CPrs have been generally acknowledged, they have nevertheless been overlooked or minimized by studies that adopt a fully compositional approach.

Samvelian (2012), on the basis of extensive data, shows that the same verb can give rise to CPrs with different agentive and eventive properties. Likewise, the non-verbal element’s contribution can vary through its combinations with different verbs. For instance, the verb zadan ‘to hit’, generally considered agentive and eventive, can nevertheless participate in the formation of ‘unaccusative’ (or passive-like) CPrs such as yax zadan ‘to freeze’ (lit. ‘ice hit’) or zang zadan ‘to go rusty’ (lit. ‘rust hit’). The same holds for gereftan ‘to take’ and kardan ‘to do’, which, apart from agentive CPrs, also form ‘unaccusative’ CPrs: ātash gereftan ‘to catch fire’ (lit. ‘fire take’), ādat kardan ‘to get used to’ (lit. ‘habit do’), and dard kardan ‘to ache’ (lit. ‘pain do’).

The verbal contribution is not consistent either with respect to the eventive properties of the CPr. Again, the same verb can give rise to both stative and eventive (dynamic) CPrs. For instance, the verb dāshtan ‘to have’ is not invariably stative and can produce eventive predicates, such as ersāl dāshtan ‘to send’ (lit. ‘sending have’), taghdim dāshtan ‘to offer’ (lit. ‘offering have’), and e’lām dāshtan ‘to announce’ (lit. ‘announcing have’).23

23 Note that the examples discussed in this section are by no means isolated. For further examples illustrating the non-consistency of the verbal contribution to the agentive and eventive properties of Persian CPrs, see Samvelian (2012: 114–30).

The contribution of the non-verbal element also turns out to be inconsistent. For instance, adjectives and PPs can also form atelic CPrs: lāzem dāshtan ‘to need’ (lit. ‘necessary have’), penhān dāshtan ‘to keep hidden’ (lit. ‘hidden have’), and be maskhare gereftan ‘to make fun of’ (lit. ‘to mockery take’). Conversely, non-eventive nouns can give rise to telic CPrs: pust andākhtan ‘to slough off’ (lit. ‘skin throw’).

The non-predictability of the meaning of the CPr is another significant impediment to fully compositional approaches. In order for the latter to work, the meaning of the CPr must be derivable on the basis of the meaning of its components. However, as mentioned in several studies (Bonami and Samvelian 2010; Family 2006, 2009, 2014; Goldberg 1996; Karimi-Doostan 1997; Samvelian 2012; Samvelian and Faghiri 2013a; among others), numerous Persian CPrs are semantically opaque. Moreover, as shown by Samvelian (2012) and Bonami and Samvelian (2010), even in cases where a CPr is semantically transparent, it is hardly ever the case that its meaning is fully predictable from the meaning of its component parts. In other words, the meaning of Persian CPrs, even the transparent ones, is conventional in many cases and therefore has to be learned, in the same way as one has to learn the meaning of the simplex verbs in English, for instance. For a discussion of the processing of transparent and opaque CPrs in second language speakers of Persian, refer to Shabani-Jadidi (2016).

Relying on these observations, Samvelian (2012) and Samvelian and Faghiri (2013a,b, 2014) claim that Persian CPrs, at least the lexicalized ones, must be stored, exactly as lexemes are. They nevertheless argue that the need for an inventory is not incompatible with a compositional approach, provided compositionality is defined a posteriori. With respect to their compositionality, Persian CPrs are comparable to Idiomatically Combining Expressions, that is, ‘idioms whose parts carry identifiable parts of their idiomatic meanings’ (Nunberg et al. 1994: 496). This means that the verb and the non-verbal element of a CPr can be assigned a meaning in the context of their combination.
Thus, the CPr is compositional, in the sense that the meaning of the CPr can be distributed to its components, and yet it is idiomatic, in the sense that the contribution of each member cannot be determined out of the context of its combination with the other one. For instance, zadan ‘to hit’ can receive various interpretations according to the noun with which it combines: ‘to apply’ in rang zadan ‘to paint’; ‘to add, to incorporate’ in namak zadan ‘to salt’; ‘to wear’ in māsk zadan ‘to wear a mask’; or ‘to emit’ in dād zadan ‘to shout’. Given the meaning assigned to zadan and the meaning of the CPr as a whole, new combinations can be produced and interpreted. For instance, tag zadan ‘to tag’ (lit. ‘tag hit’), formed with the loanword tag, is created on the basis of barchasb zadan ‘to label’ (lit. ‘label hit’), tambr zadan ‘to stamp’ (lit. ‘stamp hit’), etc.

This view of Persian CPrs is then developed into a Construction-based approach:

• Each CPr corresponds to a Construction.
• CPrs can be grouped into classes according to their semantic and syntactic properties and each class can be represented by a partially fixed Construction.
• Constructions can be structured in networks, thus accounting for different semantic and syntactic relations between CPrs, such as synonymy, hyperonymy/hyponymy, and valency alternation.24

24 See Samvelian (2012) for an application of this analysis to the CPrs formed with zadan ‘to hit’. See also Müller (2010) for a partially comparable approach within the HPSG framework.

In this approach, the productivity of the Persian CPrs is accounted for via the analogical extension of the existing classes. It can be compositionality-based or not. In the first case, new combinations are created on the basis of the meaning assigned to the Construction as a whole and to its components. However, productivity is not always compositionality-based, and non-compositional Constructions (or classes) can also be productive. The productivity of Persian CPrs is also related to other parameters such as the coherence of the classes and their size.
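The a posteriori view of compositionality discussed in this section can be pictured, very roughly, as a stored inventory in which the light verb receives a sense only relative to the noun it combines with. The Python sketch below is an illustrative assumption, not the cited authors’ formalization: the names `ZADAN_SENSES` and `sense_of_zadan` are hypothetical, and the entries are limited to the senses of zadan ‘to hit’ quoted in the text.

```python
from typing import Optional

# Senses of zadan 'to hit' as quoted in the text, keyed by the noun it
# combines with. The inventory is deliberately tiny and illustrative.
ZADAN_SENSES = {
    "rang": "to apply",                # rang zadan 'to paint'
    "namak": "to add, to incorporate", # namak zadan 'to salt'
    "māsk": "to wear",                 # māsk zadan 'to wear a mask'
    "dād": "to emit",                  # dād zadan 'to shout'
}


def sense_of_zadan(noun: str) -> Optional[str]:
    """Return the sense assigned to zadan in combination with `noun`,
    or None when the combination is not in the stored inventory."""
    return ZADAN_SENSES.get(noun)


print(sense_of_zadan("rang"))
print(sense_of_zadan("ketāb"))
```

The lookup returns nothing for a noun that is absent from the inventory, which mirrors the point that the verb’s contribution is not fixed outside a particular combination.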

9.4.4 Concluding remarks on complex predicates

Like the Ezafe construction and DOM, Persian CPrs still have a lot to reveal about the many faces of predicate formation in the languages of the world and about key issues such as idiomaticity vs. compositionality or storage vs. online processing. For discussion of the online processing of CPrs, see Chapter 17 in this volume. Here again, there is a lot to gain from resorting to empirical methods, which can provide new insight into some crucial theoretical issues discussed since the late 1990s concerning CPr formation in Persian. The issue of the productivity of Persian CPrs, for instance, cannot be adequately investigated without taking into account data from usage and without resorting to quantitative methods comparable to those used in morphology (Baayen 1992). Likewise, the issue of whether Persian CPrs must be stored in the (mental) lexicon cannot receive a valid answer without psycholinguistic investigation (Baayen 2007). Recent studies such as Shabani-Jadidi (2014) and Sadat Safavi et al. (2016) have opened the way for other studies to come. For more information on complex predicates, see Chapters 7, 8, 17, and 19 in this volume.

9.5 Conclusion

The main purpose of this article was to offer an overview of the issues raised by three specific features of Persian syntax, namely the Ezafe construction, differential object marking, and complex predicate formation, and of the way various studies have tried to account for these issues. Because of space limitations and the impressive number of studies, it was impossible to go into the details and subtleties of all the studies presented in the article. Hopefully, nothing has been ‘lost in translation’ and the quoted authors’ positions have been rendered faithfully. Noting the enduring interest in these three phenomena in the literature since the early 1980s, one may (wrongly) assume that they have disclosed all their secrets and that almost everything worth saying has already been said. Along with the presentation of the huge amount of work already done, another aim of this article was to show that each of the three phenomena at stake still offers a challenging and promising area for empirical and theoretical investigation, as illustrated by a number of recent studies adopting new methodological approaches.

Part IV

chapter 10

MORPHOLOGY

Behrooz Mahmoodi-Bakhtiari

10.1 Introduction

In this chapter, Persian morphology is described. Here, by ‘Persian’, I mean the written form of the Persian of Iran, which is considered to be the most widespread variety of this language, and is regarded as the standard variety of Persian. Colloquial Persian has some specific morphological features which are not covered here, and our occasional references to those features are just for the sake of comparison (see Chapters 2, 3, 4, 5, 6, 11, and 15 for more on colloquial Persian).

(New) Persian is perhaps the best-known and most widely studied Iranian language. However, it does not represent the family of the Iranian languages in terms of morphology, and has an atypical morphological system among the Iranian languages, in the sense that it has almost completely lost the synthetic nominal and verbal inflection (together with their inflectional classes), as well as the inflectional distinction of case, number, and gender as well as of aspect, mood, tense, and voice, which would have been inherited from its older ancestors. Historically, Persian is a lineal descendant of Old Persian and Middle Persian. Old Persian (like its contemporaries Sanskrit and Classical Greek) was a typical inflected language. This inflectional nature underwent a radical reduction by the late Middle Persian period, to the extent that the language had almost the analytical structure of current New Persian. As a matter of fact, among all the inflectional structures of the previous stages of Persian, only the two categories of person and number, in the form of three persons in singular and plural, still survive in pronouns and personal endings.

In this chapter, after a general study of the Persian morphemes, we will deal with the nominal and verbal morphology of Persian, together with a description of the compounding process in both of these, and the other methods of word formation in Persian.

10.2 Persian morphemes: a general sketch

Persian makes use of both its lexical and functional morphemes in its word formation processes.


10.2.1 Lexical morphemes

Lexical morphemes in Persian may be ‘free’ (like all the nominal roots, and many of the simple adjectives and adverbs), with a meaning of their own, such as divār ‘wall’, farzand ‘child’, and mādar ‘mother’. However, there are also bound lexical morphemes in Persian, namely the present stems of verbs, as in mi-rav-am (incomplete aspect prefix/indicative mood marker-‘go’ (root)-1sg.) ‘I go’, which do not appear in Persian discourse unless they are fully conjugated.

10.2.2 Functional morphemes

Persian has a diverse set of functional morphemes. These morphemes may be classified as ‘free’ and ‘bound’ ones as follows.

Free functional morphemes

Among the free functional morphemes of Persian, two major groups can be identified: adpositions and conjunctions.

1) Adpositions: prepositions

Persian principally uses its prepositions to express case relations. New Persian, in its early stages, used to have circumpositions (be-daryā dar ‘in the sea’, be-khāne andar ‘inside the house’), which are no longer used. New Persian now has seven primary prepositions:

(i) be ‘to’: dative and directional, and formerly locative and instrumental as well (see Mashkur 1969: 162). It characteristically introduces the indirect object, as in the example ketāb rā be u dādam ‘I gave the book to him/her’. As a directional, it is used in both material and figurative contexts: u be kānādā mohājerat kard ‘(S)he migrated to Canada’, zahmat-hā-yam be hadar raft ‘my efforts were in vain’, pedar-ash be saratān mobtalā shode ast ‘His/her father is afflicted with cancer’.
(ii) dar ‘in(to)’: dar otāq ‘in the room’, dar chenin sharāyeti ‘in such circumstances’.
(iii) az ‘from’, denoting the source of something, as in the examples khāne-ye mā az injā kheili dur ast ‘our house is so far from here’ and in beit az hāfez ast ‘This verse is by Hāfez’.
(iv) bā ‘with’: comitative, as in the example bā dustam dars khāndam ‘I studied together with my friend’; instrumental, as in the example bā chāqu be u hamle kard ‘he attacked him with a knife’; and concessive, as in the example bā kamāl-e meil in rā mi-pazir-am ‘I accept this with all pleasure’.
(v) tā ‘to, until’: bāyad tā shab montazer bemānim ‘We have to wait until night’, az tehrān tā Tabriz cheqadr rāh ast? ‘How far is it from Tehran to Tabriz?’
(vi) (be)joz ‘except’: hame madrak gereftand, bejoz man ‘Everybody got their certificate except me’.
(vii) barā(-ye) ‘for’, as in the example in gol rā barāye to kharide-am ‘I have bought this flower for you’.
There are also two other prepositions, chun ‘like’ and bar ‘(up)on, over’, which are not commonly used, and appear chiefly in the literary and formal language.

1a) The postposition rā

The only postposition found in New Persian is rā, whose major function is marking specific direct objects for accusative case (see Chapter 9 for more discussion on rā). Since proper nouns, nouns modified by demonstratives or personal pronominals, and nouns marked by the enclitic -e or by the context are all definite NPs, rā may follow them when they act as direct objects: dust-i rā didam ‘I saw a friend’, dust-e mehraban-i rā didam ‘I saw a kind friend’, dust-e khub-am rā didam ‘I saw my good friend’, dust-e khub-e dorān-e dabirestān-am rā didam ‘I saw my good high school friend’. Although rā is basically known for its direct object marking, diachronic studies reveal that it did not have this function from the beginning. It used to denote ‘reason’, and the word cherā ‘why’ (lit. ‘what for’) in Modern Persian is a remnant of that usage. In more recent literature, the primary function of rā has been suggested to be marking specificity rather than accusative case. It also serves as a marker for topicalization in spoken Persian: māshin ro dar-esh ro be-band ‘the car, close its door’, from dar-e māshin ro be-band ‘close the door of the car’. It also serves to denote the meaning of some prepositions such as ‘for’, in nāhār ro esfāhānim ‘We will be in Esfahan for lunch’, or ‘to’ in gol-hā ro āb bede ‘(Give) water (to) the flowers’ (see Dabir-Moghaddam 1992).

2) Conjunctions

Persian conjunctions may be classified morphologically into simple or complex ones. Among the simple conjunctions, perhaps the most common is va ‘and’, which is used between sentences, as well as phrases: mā va shomā kār mikonim va pul dar miyāvarim ‘We and you work, and earn money’. The adversative conjunction ammā (or its commonly used Arabic counterpart, vali) means ‘but, however’, as in hāl-am khosh nist, ammā be sar-e kār mi-rav-am ‘I do not feel well, but I will go to work’.
The two causal conjunctions zirā(-ke) and chon(-ke) ‘because’ appear between two sentences or clauses as well. Another conjunction with a similar function is yā ‘or’, which acts as a simple conjunction when connecting two interrogative sentences (otherwise, when repeated, it may well be seen as a complex conjunction): shomā dāneshju hastid yā ostād? ‘Are you a student, or a professor?’ The conjunction agar ‘if’ is a marker of the embedded conditional clause, which is usually placed before the two clauses, and behaves almost identically to its English counterpart: agar pul-e raftan dāshtam, hargez dar injā nemimāndam ‘If I had money to go away, I would never stay here’. The two-morpheme conjunctions agar-che ‘even if, although’, har-chand (ke) ‘however’, and chonān-che ‘in case’ are the other major ‘initial’ conjunctions in Persian. The NP-following conjunction ham ‘also, too, even’ connects two sentences as well, although it is placed after the subject of the second sentence rather than on the border of the two sentences: shomā harekat konid, man ham be shomā molhaq mishavam ‘You set out, and I will join you too’.

Among the simple conjunctions, tā and ke serve several functions and denote different meanings. The polyvalent subordinator tā appears at the beginning of the initial clause, where it denotes time: tā dars-am tamām na-shav-ad, az injā ne-mi-rav-am ‘I will not leave here, unless I finish (as long as I have not finished) my education’, tā ma-rā did, pā be farār gozāsht ‘he ran away once/as soon as he saw me’. It may also be placed between two clauses, in which case it introduces a subsequent clause, either to denote time, as in the example sabr kardim tā qatār-e ba’di āmad ‘we waited until the next train came’, or to denote purpose, as in the example āmad-am tā shakhsan bā shomā sohbat konam ‘I came to talk to you in person’.

The other multifunctional conjunction, ke ‘that’, is generally known to be a complementizer, as in che khub shod ke shomā rā emruz didam ‘how nice that I met you today’, goft ke nemitavānad biyāyad ‘he said that he couldn’t (lit. ‘can’t’) come’. However, it also acts as a relative marker, and it introduces purpose clauses, as in talāsh kard-am ke movaffaq shav-am ‘I tried to be successful’; causal clauses, as in emruz az khāne birun na-rav-id, ke havā be-sheddat ālude ast ‘Don’t go out of your houses today, for the air is extremely polluted’; and temporal clauses denoting the interruption of an action, as in hanuz harf-am tamām na-shod-e bud ke sili-ye mohkam-i be surat-am zad ‘My sentence was not finished yet, when he slapped me hard on my face’. Ke also combines with other lexical items, as in pas az in-ke ‘after’, hamin-ke ‘as soon as’, az bas ke ‘so much that’, and vaqti ke ‘when’, with the following further combinations: az vaqti ke ‘since’, tā vaqti ke ‘until’, be mahz-e inke ‘as soon as’, and others.

On the other hand, the complex (or reciprocating) conjunctions ham ... ham ‘both ... and’, yā ... yā ‘either ... or’, na ... na ‘neither ... nor’, and che ... che ‘whether ... or’ link NPs as well as sentences: ham behruz, ham bahrām ‘both Behrooz and Bahram’, ham kār mi-kard-am, ham dars mi-khānd-am ‘I both worked and studied’, yā u rā birun kon, yā man rā talāq be-de ‘Either expel him, or divorce me’, na pul-at rā mi-khāh-am, na hozur-at rā tahammol mi-kon-am ‘I neither need your money, nor stand your presence’, che beravi, che bemāni, ozā’ taqyir na-khāh-ad kard ‘Whether you go or stay, the situation will not change’.
To the list of such symmetrical structures, we may add the negative adverbial na tanhā ‘not only’, together with the rhetorical adversative balke ‘but also’: na tanhā zibāst, balke puldār ham hast ‘She is not only beautiful, but also rich’.

Bound functional morphemes

The major bound functional morphemes in Persian may be classified as affixes and clitics.

1) Affixes

Among the different types of affixes, Persian makes use of prefixes, suffixes, and, to a much lesser extent, interfixes. More than one suffix is possible in Persian words, and each suffixal form usually has just one meaning. Infixation and circumfixation are not among the morphological processes of Persian. In many of the classical Persian grammars, the interfixes have been wrongly introduced as infixes (for example, see Vahidiān and Emrāni 2000; Kalbasi 2001). Since Persian is quite rich in terms of its derivational affixes, in the parts to come we will concentrate mainly on this type of affix, and will deal with the inflectional affixes within our discussion of the nominal and verbal morphology.

1a) Prefixes

Persian prefixes act both as inflectional and derivational affixes. The derivational ones, which, according to definition, change the parts of speech and generate new words, mainly generate adjectives and related nouns. Some of the major prefixes of Persian are as follows:

nā- ‘un’, which is added both to nominal and verbal stems and generates adjectives, such as nā-sepās ‘ungrateful’ (sepās ‘thanks’) and nā-tavān ‘disabled’ (tavān-est-an ‘to be able’). This prefix can also leave the part of speech unchanged, as in nā-zibā ‘unbeautiful, ugly’ (zibā ‘beautiful’) and nā-be-khrad ‘unwise’.
ham- ‘same, co-’: ham-peimān ‘ally’ (peimān ‘treaty’), ham-kelās ‘classmate’.
bā- ‘with’: bā-adab ‘polite’ (adab ‘politeness’), bā-savād ‘literate’ (savād ‘literacy’). Alongside this, we may note bi- ‘without’: bi-hush ‘unconscious’ (hush ‘wisdom’), bi-kas ‘alone’ (kas ‘person’).
be- ‘holding’: be-hanjār ‘standard’ (hanjār ‘discipline, order’), be-andām ‘well-figured’ (andām ‘body’), be-sāmān ‘well-organized’ (sāmān ‘organization’), be-rāh ‘obedient’ (rāh ‘way’), be-hush ‘alert’ (hush ‘wisdom’). Although old, this prefix is not much used, and is mainly seen in formal or literary discourse.

The two prefixes abar- ‘super, above’ and bish- ‘a lot/more’ are fairly new and not very productive. They have been introduced by the Academy of Persian Language and Literature for use in loan translations: abar-rasāna ‘superconductive’ (rasāna ‘conductive’), bish-fa’’āl ‘hyperactive’ (fa’’āl ‘active’).

1b) Suffixes

Although New Persian does not have many productive suffixes, suffixation is a principal method of nominal derivation in Persian.

(i) The major noun-making suffixes are as follows. The suffix -i makes abstract nouns from adjectives or type nouns, such as khub-i ‘goodness’ (khub ‘good’) and dust-i ‘friendship’ (dust ‘friend’). It also makes place names such as baqqāl-i ‘grocery shop’ (baqqāl ‘grocer’). It may refer to some actions, such as vazne-bardār-i ‘weight lifting’ and asid-pāsh-i ‘acid splashing (on somebody’s face)’. Some of the nouns made by -i have no corresponding stem without this suffix, such as parde-bardār-i ‘unveiling’ (curtain-remove-ing) and pārti-bāz-i (party-play-ing) ‘pulling strings’: *parde-bardār and *pārti-bāz, the probable agents of those actions, do not exist.

Another major noun-making suffix is -e, which attaches to numerals, adjectives, generic nouns, and abstract nouns, and usually yields nouns with a metaphorical or opaque semantic relationship to the stem, such as gush-e ‘corner’ (gush ‘ear’), cheshm-e ‘spring’ (cheshm ‘eye’), pust-e ‘shell’ (pust ‘skin’), dast-e ‘handle’ (dast ‘hand’), panj-e ‘claw’ (panj ‘five’), haft-e ‘week’ (haft ‘seven’). This suffix is also attached to verbal stems. Added to the past stem, it produces the past participle, such as shost-e ‘washed’ (shost-an ‘to wash’), bor-id-e ‘cut’ (bor-id-an ‘to cut’), and sukht-e ‘burnt, consumed’ (sukht-an ‘to burn’). It may also be added to present stems to yield nouns, such as khand-e ‘laughter’ (khand-id-an ‘to laugh’), āmuz-e ‘instruction’ (āmukhtan ‘to teach’), sanj-e ‘criterion’ (sanj-id-an ‘to measure’), and angiz-e ‘motivation’ (angikht-an ‘to encourage’). The suffix -esh is always attached to the present roots of verbs, and nominalizes them, as in bakhsh-esh ‘forgiveness’ (bakhsh-id-an ‘to forgive’) and bin-esh ‘(mental) view’ (from the irregular verb did-an ‘to see’). On the other hand, the suffix -ār attaches to past verbal stems to make nouns: nevesht-ār ‘writing’ (nevesht-an ‘to write’), goft-ār ‘speech’ (goft-an ‘to say’). Finally, the Arabic loan suffix -iy(y)at makes abstract nouns either from concrete or abstract nouns, such as jam’-iyyat ‘population’ (jam’ ‘group’), zedd-iyat ‘opposition’ (zed ‘opposed’).

(ii) Diminutive suffixes are -ak and -che. Although -ak mainly functions as a diminutive suffix (as in pesar-ak ‘little boy’), it sometimes produces words not necessarily of this type, such as sag-ak ‘catch’ (sag ‘dog’), tefl-ak ‘poor fellow’ (tefl ‘baby’), and cheshm-ak ‘wink’ (cheshm ‘eye’). However, -che seems to have only the diminutive function, with a much lower degree of productivity than -ak, in words such as ketāb-che ‘booklet’ (ketāb ‘book’) and bāq-che ‘house garden’ (bāq ‘garden’).

(iii) Agentive suffixes are -gar, -kār, -mand, -var, and -ande. The suffixes -gar and -kār attach to nouns and verb stems, as in kār-gar ‘worker’ (kār ‘work’) and jush-kār ‘welder’ (jush ‘weld’), while -mand and -var only attach to nouns, and -ande is only attached to (present) verb stems, as in honar-mand ‘artist’ (honar ‘art’), kherad-mand ‘wise’ (kherad ‘wisdom’), dān-esh-var ‘knowledgeable’ (dān-esh ‘knowledge’), dav-ande ‘runner’ (dav-id-an ‘to run’), and bāz-ande ‘loser’ (bākht-an ‘to lose’). The suffix -bān also refers to agents or occupations, such as mehr-bān ‘kind’ (mehr ‘kindness’) and pās-bān ‘police agent’ (pās ‘watch, care’), along with bāq-bān ‘gardener’ (bāq ‘garden’) and dar-bān ‘janitor’ (dar ‘door’). To this list we may add two loan suffixes from Turkish, -chi and -bāshi (the former very productive), which form occupation and profession nouns, such as shekār-chi ‘hunter’ and āsh-paz-bāshi ‘chef’.

(iv) Suffixes denoting places are -estān, -gāh, -kade, and the less productive -zār, as in farhang-estān ‘academy’ (farhang ‘culture’), dānesh-gāh ‘university’ (dān-esh ‘knowledge’), pazhuh-esh-kade ‘research centre’ (pazhuh-esh ‘research’), and gol-zār ‘flower garden’ (gol ‘flower’). It should be noted that the suffix -estān also forms the names of territories and countries on the basis of the inhabitants: arman-estān ‘Armenia’ (arman ‘Armenian’), torkaman-estān ‘Turkmenistan’ (torkaman ‘Turkman’). Finally, the suffix -dān denotes the container of something, as in gol-dān ‘vase’ (gol ‘flower’) and qan(d)-dān ‘sugar bowl’ (qand ‘cubic sugar’).
For a detailed study of the suffix -estān, see Paraskiewicz (2008).

(v) Adjective-forming suffixes are various in Persian. Perhaps one of the most commonly used ones is -i, which marks attribution, as in ālmān-i ‘German’ (ālmān ‘Germany’), taryāk-i ‘opium smoker’ (taryāk ‘opium’), khārej-i ‘foreign, foreigner’ (khārej ‘outside’), qahve-’i ‘brown’ (qahve ‘coffee’), and pārche-’i ‘made of cloth’ (pārche ‘cloth’). For the last function (denoting the material of which something is made), there is also the affix -in, as in āhan-in ‘made of iron’ and pulād-in ‘made of steel’, which is not very common, and is mostly used in formal discourse. Another suffix, -e, is added to numerical noun phrases, as in se-pāy-e ‘tripod’ (pā ‘leg, foot’), chahār-kār-e ‘for four functions’ (kār ‘work’), chand-manzur-e ‘multifunctional’ (manzur ‘purpose’), se-sāl-e ‘three-year-long, three years old’. It also denotes holding, as in diplom-e ‘diploma holder’. The suffix -āne is another attributive suffix, which yields adjectives such as bachche-g-āne ‘childish’. When attached to adjectives for animates, it produces adjectives with the same meaning for inanimates (e.g. a deed, speech, and the like), such as āqel-āne ‘wise (deed, work)’ and ahmaq-āne ‘silly’. This suffix also makes adverbs such as forutan-āne ‘humbly’ and mo’addab-āne ‘politely’. The two other adjective-making affixes with limited productivity are -nāk and -gin, as in nam-nāk ‘wet’ (nam ‘moisture’) and sharm-gin ‘sorry’ (sharm ‘sorrow, regret’). More examples and functions of all these suffixes may be found in Sādeghi (1991b) and Kashāni (1992).

1c) Interfixes

Interfixing in Persian is rare and not very productive. Long mistaken for infixes, the major Persian interfixes -ā- and -vā- are bound morphemes connecting two identical lexical items, in order to semantically express the importance or volume of the base word. These morphemes are believed to be allomorphs of tā ‘to’ or bā ‘with’. They yield both transparent structures such as rang-ā-rang ‘colourful’ (rang ‘colour’), sar-ā-sar ‘all over’ (sar ‘head’), jur-vā-jur ‘various’ (jur ‘kind’), and lab-ā-lab ‘totally full’ (lab ‘lip’), and sometimes less transparent ones such as kesh-ā-kesh and garm-ā-garm ‘midst’ (kesh ‘pull’, garm ‘warm’), dush-ā-dush ‘together’ (dush ‘shoulder’), tang-ā-tang ‘close, intimate’ (tang ‘tight’).

2) Clitics

The study of Persian clitics is rather new in the Iranian grammar tradition. Until recently, many of the endings now known to be clitics were treated as affixes. But now, in the light of new linguistic findings, we know that not only Persian, but also most of the contemporary Western Iranian languages make use of enclitic pronouns, to mark objects for their verbs, or possessors for their nouns. This will be dealt with in detail in section 10.3, under our study of the nominal morphology of Persian. Persian does not use proclitics. The major clitics of Persian comprise the nominal endings denoting possession, as in ketāb=am (book=1sg. poss.), and objects, as in zad-am=ash ‘I hit him’ (past of HIT-1sg.=3sg.). The other widely used clitic is the indefinite marker -i, but perhaps the most important of all is the Ezafe marker -e. The Ezafe construction is one of the specific syntactic properties of Persian, in which the head of a noun phrase (an element such as a noun or adjective) is connected to its modifier(s) by -e, as in ketāb-e simin ‘Simin’s book’ and ketāb-e mofid-e simin ‘Simin’s useful book’ (for more information and examples, see Perry and Sadeghi 1999, as well as Chapters 6, 7, 8, 9, and 19 in this volume).
The other clitics used in Persian are mainly contracted forms of free functional morphemes in colloquial Persian, and are thus not our major concern here. By way of introduction, they are -ā (the contracted form of the plural suffix -hā), -am for ham ‘also’, and -o, which is an allomorph both of rā (the marker of direct, specific objects) and of the conjunction va ‘and’ (see Shaghāghi 1995).

10.3 Persian nominal morphology

10.3.1 Pronominal morphology

As noted earlier, New Persian has almost completely lost the synthetic nominal and verbal inflection of its ancestors; the inflectional distinctions of case, number, and gender (together with aspect, mood, tense, and voice) therefore no longer exist in it. However, three persons and two numbers (singular and plural) are still distinguished in pronouns and personal endings. Persian personal pronouns are either independent or enclitic, as shown in Table 10.1. As can be seen, there is no systematic gender distinction in the third-person pronouns, and u (with the plural ān-hā) may denote both the masculine and the feminine. Likewise, the objective and possessive enclitics -ash and -eshān refer to humans (and non-humans) of both sexes. Independent pronouns are not pluralized, but colloquial Persian allows the addition of the plural suffix to the first- and second-person plurals for emphasis, as in mā-hā ‘we all’ and shomā-hā ‘you all’. On the other hand, in the polite form

280   Behrooz Mahmoodi-Bakhtiari Table 10.1 Persian pronouns 1sg.




















of Persian, the second- and third-person singular are addressed with the plural pronouns shomā and ishān respectively. The latter is now restricted to polite discourse and is no longer used to refer to the third-person plural. It is also possible to use the first-person plural for the singular to denote modesty on the part of the speaker, as in mā shāgerd-e shomā hast-im, which may mean either ‘we are all your students’ or, humbly, ‘I am your student’. Syntactically, independent pronouns behave identically to nouns: they can be the object of a transitive verb, appear as the second constituent of an Ezafe construction, and act as the object of a preposition, as in man u rā mi-shenās-am ‘I know him’, in ketāb-e shomā ast ‘this is your book’, and u be man negāh kard ‘he looked at me’. They can also serve as NP heads, as in mā khāhar va barādar-hā ‘we, (as) sisters and brothers’ and shomā zabānshenās-hā ‘you linguists’. They (except for the two third-person pronouns) also accept adjectives; the first- and second-person singular take them with an Ezafe marker, as in man-e bad-bakht ‘miserable me’ and to-ye bi-hayā ‘you shameless (one)’. For the first- and second-person plural, the adjective is also pluralized: mā bi-gonāh-ān ‘we, the innocent (people)’, shomā vahshi-hā ‘you savages’. The universal pronoun ‘one’ in Persian is ādam ‘human’, as in ādam haz mikonad ‘one is overwhelmed’ and ādam shākh dar-mi-āvar-ad ‘one grows horns (out of surprise)’. The indefinite counterpart of this pronoun is yeki ‘one’, as in yeki dar rā bāz kon-ad ‘somebody open the door!’. To denote possession with the independent pronouns, the word māl ‘belonging’ serves as the head of the Ezafe construction, with the pronoun as the modifier: in melk māl-e man ast ‘this property is mine’, māl-e shomā ān yeki ast ‘yours is that one’.

10.3.2 Pronominal enclitics

Pronominal enclitics (further discussed in Chapters 3 and 8) have four major functions in Persian. First, they act as possessive markers when attached to an NP: ketāb=etān besyār jāleb bud (~ ketāb-e shomā besyār jāleb bud) ‘your book was so interesting’ (see Table 10.2). Second, they may be attached to prepositions and act as the object of the preposition: ānhā barāy=etān pul ferest-ād-and (~ barāye shomā) ‘they sent money for you’; this, however, happens mostly in colloquial Persian: az=at khāhesh-i dār-am (~ az to) ‘I have a request of you’, be-h=emun mahal na-zāsht (~ be mā) ‘he took no notice of us’. Third, they attach to transitive verbs (in the case of compound verbs, to the preverbal element of the compound) and assume the function

Table 10.2 Basic possessive paradigms

Independent          Gloss
pedar-e man          ‘my father’
pedar-e to           ‘your (sg.) father’
pedar-e u            ‘his/her father’
pedar-e mā           ‘our father’
pedar-e shomā        ‘your (pl.) father’
pedar-e ānhā         ‘their father’

Suffixed (non-topical): [forms not preserved]

of the direct object, equivalent to an independent pronoun together with the direct object marker rā: mi-shenās-am=ash (~ u rā mi-shenās-am) ‘I know him’, be-zan-id=eshān (~ ānhā rā be-zan-id) ‘hit them!’. Finally, there is a small group of compound verbs in which the enclitic looks like an object marker but, because the overt subject does not induce agreement on these verbs, acts as the subject of the verb, as in sard-am ast ‘I feel cold’ (lit. ‘it is cold for me’), hers-am gereft ‘I got furious’ (lit. ‘fury got me’), and si sāl-am shod ‘I turned thirty’ (lit. ‘thirty year of me became’). These constructions are relatively few in number, and mostly express a physical or mental experience (see also Chapters 3 and 8). For a detailed study of these constructions, see Sedighi (2009 and 2011), where they are introduced as Psychological Verbs.
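The alternation between enclitic and independent pronouns described above can be sketched in code. The following is a minimal illustration, not an analysis from the chapter: the function names and the dictionary are hypothetical, the forms are those cited in the running text, and only consonant-final nouns are handled (vowel-final nouns need a glide before the Ezafe, which is not modeled here).

```python
# Person/number -> (independent pronoun, pronominal enclitic).
# The forms are those cited in the examples of the text above.
PRONOUNS = {
    "1sg": ("man", "am"),
    "2sg": ("to", "at"),
    "3sg": ("u", "ash"),
    "1pl": ("mā", "emān"),
    "2pl": ("shomā", "etān"),
    "3pl": ("ānhā", "eshān"),
}


def possessive(noun: str, person: str, enclitic: bool = True) -> str:
    """Possessed noun: enclitic form, or Ezafe plus independent pronoun.

    Simplification (assumption of this sketch): consonant-final nouns only.
    """
    independent, clitic = PRONOUNS[person]
    return f"{noun}={clitic}" if enclitic else f"{noun}-e {independent}"


def direct_object(verb: str, person: str, enclitic: bool = True) -> str:
    """Pronominal object: enclitic on the verb, or independent pronoun + rā."""
    independent, clitic = PRONOUNS[person]
    return f"{verb}={clitic}" if enclitic else f"{independent} rā {verb}"


print(possessive("ketāb", "2pl"))                  # ketāb=etān
print(possessive("ketāb", "2pl", enclitic=False))  # ketāb-e shomā
print(direct_object("mi-shenās-am", "3sg"))        # mi-shenās-am=ash
print(direct_object("mi-shenās-am", "3sg", enclitic=False))  # u rā mi-shenās-am
```

The sketch covers only the possessive and direct-object functions; the prepositional and experiencer-subject uses would need further lexical information.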

10.3.3 Other types of pronoun

Reflexive pronouns

Classical Persian had three reflexive pronouns, the last two highly literary. The pronouns khod, khish, and khishtan, all meaning ‘(one)self’, are restricted to formal and written discourse, and apply to all persons. They get their meaning from the context, and mainly from the personal endings of the verbs, as in khāne-y-e khod rā forukht-am ‘I sold my house’ in comparison with khāne-y-e khod rā forukht-and ‘They sold their house’. In today’s Persian, only khod is used, together with the personal enclitics, in the forms khod=am ‘myself’, khod=at ‘yourself’, khod=ash ‘him/herself’, and the like. This pronoun has the classical function of reflexives, i.e. denoting the identity of the subject and the direct object, as in u khod-ash rā kosht ‘he killed himself’ and mā hargez khod-emān rā ne-mi-bakhsh-im ‘we never forgive ourselves’. It also acts as an emphatic adjunct before a pronoun or a noun in an Ezafe construction: khod-e shomā ‘you yourselves’, khod-e rezā ‘Reza himself, Reza personally’. These two cases may also be expressed without an Ezafe construction, in the form of shomā khod-etān and rezā khod-ash: shomā khod-etān goft-id ke rezā khod-ash mikhāh-ad bā man sohbat konad ‘You yourself told me that Reza wanted to talk to me personally’. After prepositions, these pronouns again have an emphatic role: cherā az man mi-pors-id? az khod-ash be-pors-id. ‘Why do you ask me? Ask him

himself’, in mored be khod-at marbut ast ‘this case (solely) concerns you’, or in moshkel-e khod-at ast ‘this is your problem’.

Demonstrative pronouns

Persian has two demonstrative pronouns, in ‘this’ and ān ‘that’, both of which may be pluralized, as in-hā and ān-hā: in-hā dāneshju-hāye man hast-and ‘These are my students’. They are also used as adjectives, as in in lebās ‘this dress’ and ān afrād ‘those people’. In colloquial Persian, they are also used with the word yeki ‘one’ to denote a specific item: in yeki ‘this one’ and un yeki ‘that one’. As adjectives, the demonstratives may not be pluralized, even if they refer to a number of objects: in ketāb-hā ‘these books’, ān zan-ān ‘those women’. Demonstratives also appear in emphatic forms such as hamin and hamān, as in hamān ketāb ‘(exactly) that book’ and hamin al’ān ‘right now’. They also attach to words such as chon ‘like’, giving chon-in and chon-ān, with the emphatic forms in-chonin and ān-chon-ān. The words hamchonin ‘also’ and hamchonān ‘still, as before’ are further products of such combinations: hamchonin az shomā tashakkor mi-kon-im ‘we also thank you’, bārān mi-āmad, vali mardom hamchonān dar khiyābān bud-and ‘It was raining, but people were still on the street’.

Interrogative pronouns

The interrogative pronouns are ke ‘who’ and che ‘what’ (ki and chi in the colloquial form). They are not pluralized (although they are in colloquial Persian). Like the demonstrative pronouns, the interrogative pronoun che may also be used in phrases such as che kas-i ‘who, what person’, che chiz-i ‘what, what object’, or che mard-i ‘what man’. The pronoun ke does not have this property.

Indefinite pronouns

Indefinite pronouns include hame ‘all’, ba’zi and barkhi ‘some’, and the now obsolete hich ‘none, nothing’.
The traditional Persian grammar refers to these as zamāyer-e mobham ‘the vague pronouns’, since their exact referents are not clear (see, for example, Natel-Khanlari 1984: 201). The word hame may be used either alone or as the head of an Ezafe construction, as in hame raftand ‘everybody went’, hame-ye dāneshjuyān hāzer-and ‘all the students are present’, and hame-ye mā shomā rā dust dār-im ‘we all like you’. The combinations hame-kār ‘every work’, hame-jā ‘everywhere’, hame-jur ‘any kind’, hame-raqam ‘any type’, and the like are among the many compounds formed with this pronoun. Although it basically refers to something plural, it may undergo a further pluralization in the form hame-gān ‘all, everyone’. The word hame-gān-i ‘public, general’ is a well-known derivative of it. The pronouns ba’zi and barkhi ‘some’ function much like hame. They may be used alone, as in barkhi mo’taqed-and ke . . . ‘Some believe that . . .’, or as the head of an NP without the Ezafe marker (or sometimes with the preposition az ‘of, from’), as in ba’zi (az) afrād mi-guyand . . . ‘some (of the) people say . . .’. The pronoun ba’zi can be pluralized as ba’zi-hā, as in ba’zi-hā masā’el rā kheili sāde mi-engār-and ‘some people take issues very lightly’.

The pronoun hich is no longer used on its own, but its compounds are widely used, such as hich-chiz ‘nothing’, hich-kas ‘no one’, and hich-jā/hich-kojā ‘nowhere’. The word folān ‘so-and-so’ (now used independently as folāni) refers to a person known to the hearer, though not necessarily to everyone: bo-ro be-gu folān-i salām res-ānd ‘Go and say so-and-so said hi’. This word may also form compounds such as folān-kas ‘such-and-such a person’, folān-jā ‘such-and-such a place’, and folān-chiz ‘such-and-such a thing’ (see Mahmoodi-Bakhtiari 2006).

Reciprocal pronouns

The reciprocal pronouns are hamdigar and yekdigar ‘each other’, used as direct objects or as objects of prepositions: mā hamdigar rā mi-shenās-im ‘we know each other’, mā be hamdigar komak mi-konim ‘we help each other’. In the prepositional phrases of colloquial Persian, hamdigar can be reduced to ham, as in unā az ham motenaffer-an ‘They hate each other’.

Relative pronouns

Persian has no specific relative pronoun; the complementizer ke ‘that’ serves as the relative marker. Relative clauses are formed with this marker, which follows the head noun, itself marked by the enclitic =i, as in pesar=i ke man bozorg=ash kardam ‘The boy whom I raised’.

10.3.4 The noun

Persian nouns are not marked for case or gender. The sexes may be distinguished either lexically (quch ‘male sheep’, mish ‘ewe’) or by using a qualifier: khuk-e nar, khuk-e māde (male/female pig). Arabic has influenced pluralization in two ways: first, through its two plural markers, -in and -āt, used in addition to the Persian ones, as in motarjem-in ‘translators’ and il-āt ‘tribes’; and second, through its ‘broken plurals’, such as asbāb ‘tools’ (sg. sabab ‘reason’) and amāken ‘places’ (sg. makān ‘place’). This method of pluralization has even been extended by analogy to some native Persian words, such as khavānin ‘local landlords’ (sg. khān) and asātid ‘professors, masters’ (sg. ostād). Persian has no definite article, and expresses definiteness mainly by syntactic means. Non-specific and definite nouns are bare stems carrying the stress on their final syllable, such as ketầb (‘a book’, ‘books’ (as a generic noun), or ‘the book, the book in question’): ketầb behtarin dust ast ‘books are the best friends’, and ketầb rā barāy-at khar-id-am ‘I bought the book (in question) for you’. On the other hand, both indefinite and specific nouns are marked with =i: ketầb=i khar-id-am/ketāb-hầ=i khar-id-am ‘I bought a book/I bought some books’, in comparison with ketầb=i rā ke goft-i kharidam/ketāb-hầ=i rā ke goft-i khar-id-am ‘I bought the (certain) book you mentioned/I bought the (certain) books you mentioned’.


10.3.5 Numbers

Cardinal numbers

Persian numerals are historically native, except for sefr ‘zero’, milyun ‘million’, and milyārd ‘billion’. Large numbers are counted from the highest unit to the lowest, with the units joined by the conjunction -o-. Cardinal numbers have individual forms from one to twenty, but are formed very regularly thereafter. Table 10.3 provides a list of the Persian cardinal numbers. Enumerated items follow the numeral and are not pluralized: haft qalam ‘seven pens’. Numeratives or classifiers may also be used, though not obligatorily, when discrete items are counted. The most commonly used is tā ‘fold’, as in do tā bachche ‘two babies’. For humans, however, formal and literary language uses the words nafar or tan ‘body’: se tan shahid ‘three martyrs’. Distributives are expressed by repeating the cardinal together with tā: se tā se tā ‘three by three’; the only exception is ‘one’: yeki yeki ‘one after another, one by one’. Numbers such as ten, one hundred, 1,000, and the like may be pluralized to express large indefinite quantities: sad-hā nafar az dāneshjuyān ‘hundreds of students’, milyun-hā tumān hazine ‘millions of tomans of expenses’ (‘toman’ = Iranian currency). The adjectival suffix -gāne, added to numerals, produces words referring to the number of items included in the noun: dāstān-hāye se-gāne ‘the three-phase stories, the trilogy’. Numerals are also seen in many numerical compounds such as do-charkh-e ‘bicycle’ and hezār-pā ‘centipede’.

Ordinal and fractional numbers

Ordinal numbers are formed by suffixing -om or -omin to the cardinal numbers. The exceptions are the Arabic words avval ‘first’ (used alone, or even more than the Persian equivalent

Table 10.3 Cardinal numbers

[Rows for the lower numbers are not preserved.]

101 sad-o yek; 200 devist, 300 si-sad, 400 chahār-sad, 500 pānsad, 600 shesh-sad, etc.; 1,000 hezār; 1,957 hezār-o nohsad-o panjāh-o haft.

yek-om), ākhar ‘last’, and the numbers do-v(v)om ‘second’ and sev(v)om ‘third’, with a -v- inserted to break the hiatus. Such ordinals follow the noun in an Ezafe construction when formed with -om, and precede it without Ezafe when formed with -omin: ruz-e panjom-e hafte vs. panj-omin ruz-e hafte ‘the fifth day of the week’. Ordinals in -omin are mostly used in formal language. An ordinal pronoun may be formed by suffixing stressed -i to the ordinal: avval-i ‘the first one’, dovvom-i ‘the second one’, panjom-i ‘the fifth one’, ākhar-i ‘the last one’. Fractions are expressed with a cardinal number as the numerator and an ordinal as the denominator, as in yek panj-om ‘one fifth’. The Arabic word nesf is used for ‘half’ in many contexts, except in telling the time, where the Persian word nim is always used: sā’at se-o-nim shod ‘It is now 3.30’ (lit. ‘it became 3.30’). The Arabic word rob’ ‘one quarter’ is usually used in telling the time, as in panj-o rob’ ‘a quarter past five’, or yek rob’ be shesh ‘a quarter to six’.
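The ordinal and fraction patterns just described are regular enough to sketch in a few lines of code. The following is an illustrative sketch only: the function names are hypothetical, the -v(v)- insertion is hard-coded for do and se, its extension to the -omin forms is an assumption of the sketch rather than a statement from the text, and the Arabic alternatives (avval, ākhar, nesf, rob’) are not modeled.

```python
# Cardinals that take -v(v)- before the ordinal suffix (per the text: do, se).
V_INSERTING = {"do", "se"}


def ordinal(cardinal: str, formal: bool = False) -> str:
    """Ordinal in -om, or in -omin when formal=True."""
    suffix = "omin" if formal else "om"
    if cardinal in V_INSERTING:
        # Assumption: the same -vv- appears before -omin as before -om.
        return f"{cardinal}vv{suffix}"
    return f"{cardinal}-{suffix}"


def ordinal_phrase(noun: str, cardinal: str, formal: bool = False) -> str:
    """-om ordinals follow the noun with Ezafe; -omin ordinals precede it."""
    if formal:
        return f"{ordinal(cardinal, formal=True)} {noun}"
    return f"{noun}-e {ordinal(cardinal)}"


def fraction(numerator: str, denominator: str) -> str:
    """Cardinal numerator followed by an ordinal denominator."""
    return f"{numerator} {ordinal(denominator)}"


print(ordinal_phrase("ruz", "panj"))               # ruz-e panj-om
print(ordinal_phrase("ruz", "panj", formal=True))  # panj-omin ruz
print(fraction("yek", "panj"))                     # yek panj-om
```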

10.3.6 Adjectives

Adjectives are used as attributes and predicates, and are not pluralized within a noun phrase: dokhtar-e zibā ‘lovely girl’, dokhtar-ān-e zibā ‘lovely girls’. However, when substantivized, they may be pluralized, as in az ān gerd-hā mi-khāh-am ‘I want (some of) those round ones’. Comparison is expressed by suffixing -tar and -tarin to form comparative and superlative adjectives: zibā > zibā-tar > zibā-tarin ‘beautiful, more beautiful, the most beautiful’. The only exceptions are beh-tar ‘better’ for khub ‘good’ and bish-tar ‘more’ for ziyād ‘much’, although khub-tar and ziyād-tar are not ‘wrong’. Diachronically, beh and bish meant ‘better’ and ‘more’ by themselves, and acquired the comparative suffix over time. Attributive adjectives normally follow their heads in an Ezafe construction. Sometimes, however, the order is adjective–noun, as with cardinal numbers used as adjectives (haft ketāb ‘seven books’), ordinal numbers with the suffix -omin (panj-omin ketāb, vs. ketāb-e panj-om, ‘the fifth book’), and superlative adjectives (zibā-tarin dokhtar ‘the most beautiful girl’). To this list we may add phrases expressing the speaker’s sympathy, such as mazlum hosein-am ‘my oppressed Hussain’ and teflak dokhtar-ash ‘his poor daughter’, and colloquial sentences expressing the speaker’s emphasis on the adjective with an indefinite noun, such as bad zan-i gereft-e ‘(such a) bad woman he has married’ and khub jā-yi estekhdām shod-i ‘(what a) nice place you have been employed in’. Quantifiers such as kheili and besyār ‘very’ precede the adjective: pesar-e khub, pesar-e kheili khub ‘nice boy, very nice boy’.
However, in order to emphasize the adjective, it is possible to displace the position of the quantifier, such as u pesar-​e kheili khub-​i-​st/​ u kheili pesar-​e khub-​i-​st ‘he is a very nice boy, (such a) very nice boy he is’.
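As a rough illustration of the comparison pattern in this section, the sketch below builds the three degrees of an adjective and places the adjective relative to its noun. The names are hypothetical, the suppletive table contains only the two bases mentioned in the text, and the superlatives built on beh and bish are an extrapolation of the sketch (the text itself cites only beh-tar and bish-tar).

```python
# Suppletive comparative bases named in the text; khub-tar and ziyād-tar
# are also possible, hence the optional flag below.
SUPPLETIVE = {"khub": "beh", "ziyād": "bish"}


def degrees(adjective: str, suppletive: bool = True) -> tuple:
    """Return (positive, comparative, superlative)."""
    base = SUPPLETIVE.get(adjective, adjective) if suppletive else adjective
    return (adjective, f"{base}-tar", f"{base}-tarin")


def noun_phrase(noun: str, adjective: str, superlative: bool = False) -> str:
    """Plain adjectives follow the noun with Ezafe; superlatives precede it.

    Simplification (assumption): consonant-final nouns only.
    """
    if superlative:
        return f"{degrees(adjective)[2]} {noun}"
    return f"{noun}-e {adjective}"


print(degrees("zibā"))
print(noun_phrase("dokhtar", "zibā"))                    # dokhtar-e zibā
print(noun_phrase("dokhtar", "zibā", superlative=True))  # zibā-tarin dokhtar
```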

10.3.7 Adverbs Persian adverbs are either morphologically identical with nouns and adjectives, or derived from them. Adverbs of time and place do not have a specific morphology of their own, and

are expressed with the original nouns: fardā mi-āyad ‘He will come tomorrow’, raft bālā ‘he went upstairs’, aslahe=at rā pāyin biyandāz ‘drop (down) your gun’, bo-ro ‘aqab ‘go backwards’. Some adverbs of time are formed with the frozen prefixes di- ‘last (day)’, pari- ‘the day before last’, pār- ‘last’, pirār- ‘one before last’, and pas- ‘one after’, as in the words diruz ‘yesterday’, dishab ‘last night’, pariruz ‘two days ago’, parishab ‘two nights ago’, pārsāl ‘last year’, pirārsāl ‘two years ago’, and pasfardā ‘the day after tomorrow’. Adverbs derived from adjectives are mostly adverbs of manner. For example, the adjectives khub ‘good’ and tanhā ‘alone’ may also act as adverbs, as in u khub mi-dav-ad ‘he runs well’ and man tanhā kār mi-kon-am ‘I work alone’. The suffix -āne, which is basically used to form adjectives, also acts as an adverb marker, as in ma-rā āsheq-āne dar āqush gereft ‘he hugged me very tenderly’ (lit. ‘lover-like’). The suffixes -ān (after verbal stems) and -aki (after adjectives) are also used, in formal and colloquial Persian respectively, as in ānjā rā khand-ān tark kard ‘he left there smiling’ and yavāsh-aki farār kard ‘he fled quietly’. Adding the Arabic tanvin to some nouns and adjectives may also yield adverbs, such as rasm-an ‘officially’, shakhs-an ‘personally’, jam’-an ‘all together’, and movaqqat-an ‘temporarily’. Prepositional phrases also act as adverbs, as in be sor’at az ānjā raft ‘he left there quickly’ (lit. ‘with quickness’) or be khub-i kār rā be pāyān res-ānd ‘he finished the work well’ (lit. ‘with goodness’).

10.4 Persian verbal morphology

10.4.1 The verb

Persian has a very regular verbal morphology. Each verb has two stems, which are conjugated for three persons and two numbers. In terms of their structure, Persian verbs are either simple, prepositional, or compound. The stems are either present or past; there is no stem for the future, which basically has a periphrastic structure. The past stem is the infinitive without the final -an; most past stems therefore end in either -t or -d, as in bord-an ‘to take’, neshast-an ‘to sit’, and āmad-an ‘to come’. Deriving the present stem from the infinitive, however, is not as regular as deriving the past stem. Nevertheless, some regularities can be seen: past stems ending in -id regularly yield the present stem when -id is deleted, as in bor-id-an ‘to cut’, fahm-id-an ‘to understand’, and bakhsh-id-an ‘to forgive’, in which borid-, fahmid-, and bakhshid- are the past stems, and bor-, fahm-, and bakhsh- are the present stems. As the most productive formative of its kind, -id has been added to many nouns to yield numerous denominal (or ja’li ‘forged’) verbs such as jang-idan ‘to fight’ (jang ‘war’), raqs-idan ‘to dance’ (raqs ‘dance’), and bus-idan ‘to kiss’ (bus ‘kiss’). Apart from -id, the other ‘past stem morphemes’ attached to the present stem are -ād and -d, as in oft-ād-an ‘to fall’ (pr. stem oft-, ps. stem oftād-) and nah-ād-an ‘to put’ (pr. stem nah-, ps. stem nahād-). Examples for -d are setān-d-an/setān- ‘to get’, kan-d-an/kan- ‘to dig’, khor-d-an/khor- ‘to eat’, parvar-d-an/parvar- ‘to raise’, āvar-d-an/āvar- ‘to bring’, and rān-d-an/rān- ‘to drive’. The ‘irregular’ verbs, whose past stems are not derived from the present stem, are not many. Efforts have been made to regularize them according to their final consonant clusters, but the rules have almost the same number of exceptional cases as the

regular ones. Therefore, it may be more appropriate to treat them as irregular items to be memorized. The only Persian verb with two entirely distinct lexical items for its past and present roots is didan ‘to see’, with bin- and did- as its two roots. The other verbs show some similarity between their roots. Some of the most frequently used ones are as follows: zad-an/zan- ‘to hit’, āfarid-an/āfarin- ‘to create’, shenid-an/shenav- ‘to hear’, dād-an/dah- ‘to give’, kard-an/kon- ‘to do’, bord-an/bar- ‘to take’, mord-an/mir- ‘to die’, dukht-an/duz- ‘to sew’, rikhtan/riz- ‘to pour’, forukhtan/forush- ‘to sell’, shenākht-an/shenās- ‘to know’, khāst-an/khāh- ‘to want’, bastan/band- ‘to close’, shekastan/shekan- ‘to break’, neshastan/neshin- ‘to sit’, dāsht-an/dār- ‘to have’, kāsht-an/kār- ‘to plant’, kosht-an/kosh- ‘to kill’, gashtan/gard- ‘to search’, bāftan/bāf- ‘to knit’, shemord-an/shomār- ‘to count’, gereftan/gir- ‘to get’, raftan/rav- ‘to go’, goftan/gu- ‘to say’, āmadan/ā(y)- ‘to come’, and shodan/shav- ‘to become’.
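The stem relationships described in this section can be approximated as a lookup table of memorized irregulars plus the three regular ‘past stem morphemes’ -id, -ād, and -d. The sketch below is a simplified illustration under those assumptions; the function names are hypothetical, infinitives are written without hyphens, and the irregular table holds only a handful of the pairs listed above.

```python
# Irregular past stem -> present stem, copied from the list in the text
# (a small subset; a fuller table would be needed in practice).
IRREGULAR_PRESENT = {
    "did": "bin", "zad": "zan", "āfarid": "āfarin", "shenid": "shenav",
    "dād": "dah", "kard": "kon", "bord": "bar", "mord": "mir",
    "raft": "rav", "goft": "gu", "shod": "shav",
}


def past_stem(infinitive: str) -> str:
    """The past stem is the infinitive without its final -an."""
    if not infinitive.endswith("an"):
        raise ValueError(f"not an infinitive in -an: {infinitive}")
    return infinitive[:-2]


def present_stem(infinitive: str) -> str:
    """Look up irregulars first, then strip -id, -ād, or -d."""
    past = past_stem(infinitive)
    if past in IRREGULAR_PRESENT:
        return IRREGULAR_PRESENT[past]
    for ending in ("id", "ād", "d"):
        if past.endswith(ending):
            return past[: -len(ending)]
    # Past stems in -t and other unlisted patterns must be memorized.
    raise ValueError(f"present stem of {infinitive} must be listed")


for verb in ("fahmidan", "oftādan", "khordan", "didan", "kardan"):
    print(verb, past_stem(verb), present_stem(verb))
```

Note that the suffix-stripping rules over-apply to any irregular verb missing from the table, which mirrors the text's point that the irregular items are best memorized.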

10.4.2 Conjugation elements of the verbs

Verbs are formed on the basis of the stems, and different affixes yield different tenses. The infinitive without the ending -an provides the past stem; this form is also known as the ‘truncated infinitive’, a non-finite verb form that may be used in impersonal constructions as well as in the periphrastic future construction, as in khāh-am raft ‘I will go’ (raft-an ‘to go’).

Verbal stems

The simple past tense, or preterite, is formed by adding the personal endings to the past stem: did-am ‘I saw’ (did-an ‘to see’). Participles (either past or passive) are formed by adding -e to the past stem: nevesht-e ‘written, having been written’ (nevesht-an ‘to write’). These participles also serve as verbal adjectives. The present stem forms the present verbs in their indicative, subjunctive, and imperative forms: mi-bar-i ‘you take’; be-bar-i ‘(that) you take’; be-bar ‘take!’. Agentive nouns are likewise formed with the present participle suffixes -ān and -ande, added to the present stem: khān-ande ‘reader, singer’ (khān-d-an ‘to read, to sing’), khand-ān ‘laughing’ (khand-id-an ‘to laugh’).

Prefixes

Verb stems receive three types of prefixes: the indicative mi-, the subjunctive be-, and the negative na-/ne-. These prefixes form the first syllable of the verb and receive the primary stress, in contrast to nouns, where the stress basically falls on the final syllable.

1) mi-

The prefix mi- is attached to the past stem to express the imperfect past: mardom esterāhat mi-kard-and ke nāgahān zelzele shod ‘people were taking a rest when suddenly an earthquake took place’; habitual actions in the past: man be dāneshgāh-e tehrān mi-raft-am ‘I used

to go to the University of Tehran’; and counterfactual conditionals: agar mi-tavānest-am, hatman az injā mi-raft-am ‘If I could, I would surely leave here’. With the present stem, mi- denotes the general present: āb dar sad daraje mi-jush-ad ‘water boils at 100 degrees’. This may encompass habitual actions, as in har ruz sar-e kār mi-rav-am ‘I go to work every day’. It also denotes an action in the future: fardā bārān mi-bār-ad ‘It will rain tomorrow’.

2) be-

The prefix be- (with its allomorphs bi- and bo-) is now the marker of imperative and subjunctive verbs. In classical Persian, however, it was also attached to past stems to form the preterite, as in be-raft-am ‘I went’. This usage is no longer current. Imperative verbs require be- before the present stem, as in be-neshin ‘sit!’ and be-khān ‘read!/sing!’. For the subjunctive mood, it is placed before the stem of the conjugated verb, i.e. it replaces mi- in the present verb: be-rav-am ‘(that) I go’, be-band-and ‘(that) they close’. In compound verbs, be- is added to the verbal part, or light verb: jiq be-kesh-im ‘(that) we scream’ (jiq keshidan ‘to scream’, lit. ‘scream-pull’). In prefixing verbs, too, it is attached to the stem, as in dar-bi-āvar ‘take out!’. Prefixing verbs with bar- as their prefix, however, do not take be- in either the imperative or the subjunctive: bar-dār ‘pick up!’ (bar-dāsht-an ‘to pick up’), not *bar-be-dār. The prefix appears as bi- before the vowels a and ā, as in bi-āvar ‘bring!’ (āvar-d-an ‘to bring’) and bi-andāz ‘drop!’ (andākht-an ‘to drop’). It also appears as bo- before stems with the back vowels o and u, as in bo-kon ‘do!’, bo-khor ‘eat!’, and bo-ro ‘go!’, although not always: exceptions such as be-bor ‘cut!’ also occur.

3) na-

The negative prefix na-/ne- is the sole means of negating a verb.
Na- is prefixed to the past stem and ne- to the present verb, i.e. it precedes mi- in the present and the stem in the past, as in ne-mi-khor-am ‘I do not eat’ vs. na-khord-am ‘I did not eat’. In compound tenses such as the pluperfect and the periphrastic future, it is again placed at the beginning of the verb, as in na-raft-e bud-am ‘I had not gone’ and na-khāh-am raft ‘I will not go’. In compound verbs (where a light verb combines with a nominal part), however, it is attached to the verbal part, as in komak ne-mi-kon-am/komak na-kard-am ‘I do not/did not help’. In subjunctives and imperatives, na- is attached to the present stem, as in na-bar ‘don’t take away’ (bord-an/bar- ‘to take away’) and agar na-bar-i ‘if you do not take . . .’. In colloquial Persian, this prefix may also be attached to the second-person present subjunctive to give emphasis to the order: na-zan ‘don’t hit!’, na-zan-i ‘don’t you hit!’.

Endings

There are six personal endings attached to the verb stems, and they are almost identical for the past and present stems (except in the third-person singular). These endings suffice to indicate person, so the independent pronouns may be dropped or used with an emphatic function: man mi-rav-am, to be-mān ‘I go, you (may) stay’. The verbal endings and their different types are summarized in Table 10.4.
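The prefix rules described in this section (be- with its allomorphs bi- and bo-, and the negative na-/ne-) can be sketched as follows. This is a hedged illustration, not a full account: the function names are hypothetical, bo- is treated as a lexical list because the text notes exceptions such as be-bor, and the interaction of na- with vowel-initial stems, compound verbs, and bar- prefixing verbs is not modeled.

```python
# Stems cited in the text as taking bo-. The text notes exceptions to the
# back-vowel pattern (e.g. be-bor), so a list is safer than a vowel rule.
BO_STEMS = {"kon", "khor", "ro"}


def imperative(present_stem: str, negative: bool = False) -> str:
    """2sg imperative with be-/bi-/bo-, or na- when negated."""
    if negative:
        return f"na-{present_stem}"
    if present_stem.startswith(("a", "ā")):
        return f"bi-{present_stem}"
    if present_stem in BO_STEMS:
        return f"bo-{present_stem}"
    return f"be-{present_stem}"


def present_indicative(stem: str, ending: str, negative: bool = False) -> str:
    """mi- + present stem + ending; negation adds ne- before mi-."""
    form = f"mi-{stem}-{ending}"
    return f"ne-{form}" if negative else form


def simple_past(stem: str, ending: str, negative: bool = False) -> str:
    """Past stem + ending; negation adds na- before the stem."""
    form = f"{stem}-{ending}"
    return f"na-{form}" if negative else form


for stem in ("neshin", "āvar", "andāz", "kon", "bor"):
    print(imperative(stem))
print(present_indicative("khor", "am", negative=True))  # ne-mi-khor-am
print(simple_past("khord", "am", negative=True))        # na-khord-am
```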

Table 10.4 Verbal endings in Persian

Column headings: Endings; Present stem; Past stem; Perfect stem/Copula; Existential verb. [Cell contents are not preserved, apart from the existential form hast-Ø.]

10.4.3 Absolute tenses: past and present

The simple past tense basically refers to an action accomplished in the past, as in diruz ketāb-i khar-id-am ‘I bought a book yesterday’. However, it may also refer to something nearly done or in the process of completion: bachche-hā ostād āmad ‘Hey kids, the professor is coming’ (āmad ‘came’), komak konid, khāne-am sukht ‘help me, my house is burning’ (sukht ‘burnt’). Conversely, the past may also be expressed with present verbs (see Mahmoodi-Bakhtiari 2002). In colloquial Persian, some uses of the present for past events are recognized, such as the ‘historical present’ in tā ān jā barāyetān goftam ke dozd vāred-e manzel mi-sh-e o yek nafar ro mi-kosh-e ... ‘I told you up to the point when the thief enters the house and kills a person . . .’, and in some complaints about an action in the past: bachche-ye man=o mi-zan-i? neshun=et mi-d-am! ‘You hit my child? I will show you (the consequences)!’

10.4.4 Relative tenses (compound tenses)

There are two tenses which may be regarded as compound tenses: the perfect and the double compound past (pluperfect). These tenses express epistemic events, i.e. the speaker’s viewpoint bears on the truth value of the action denoted. The indicative (present) perfect tense is formed by the past participle followed by the enclitic forms of the verb ‘to be’: gofte=am, gofte=i, gofte=ast, gofte=im, gofte=id, gofte=and ‘I (have) said, you (have) said . . .’. In standard Persian, the stress falls on the final syllable of the past participle, but when the verb is negated (na-gofte=am), the negative prefix receives the stress. This tense denotes actions which have taken place in the past with implications for the present: polis khiyābān rā baste=ast ‘the police have blocked the street’; an action whose effects or traces remain: dārolfonun rā amirkabir sākht-e ast ‘Amir Kabir has built Dār ol-Fonun’; a presumption about an action in the past: ehtemālan mājarā rā be u gofte=and ‘Perhaps they have told him about the event’; or even an action that the speaker is sure will have happened by some point in the future: vaqti to beresi, man az gorosnegi morde=am ‘When you arrive, I will have already starved to death’. Semantically and syntactically parallel to the present subjunctive is the perfect subjunctive, which denotes an unrealized or desired state of an action. It is formed by the past participle together with the conjugated subjunctive form of budan

(bāsh-): enshallāh resid-e bāsh-and ‘God willing, they have arrived’, bāyad tā hālā tamām shode bāshad ‘It should have been finished by now’. The past perfect, on the other hand, is formed with the past participle plus the past tense of budan: rafte bud-am ‘I had gone’, koshte bud-and ‘They had killed’. Denoting an action completed before another past action, it deals with real, non-hypothetical facts: u rā qabl az ezdevāj=emān na-did-e budam ‘I had not met her before our marriage’. The stress pattern of the past perfect is the same as that of the present perfect. There is also a ‘durative perfect’ tense in Persian, which is not very commonly used in current Persian. Indicating an action considered in its duration in the past, with implications for the present, this tense is formed by attaching mi- to the present perfect: ānhā injā zendegi mi-karde=and ‘they used to live here’. Another compound tense is the now obsolete ‘double compound past tense’, referring to actions completed in the past and reported in the present. It is formed with the past participle plus the perfect tense of budan: raft-e bude=am ‘I had already gone’. The last compound tense is the so-called future tense, which basically has a periphrastic structure. It should be noted that this structure is used for emphasis or in formal speech; otherwise, speakers of Persian tend to use the present tense (with future adverbials) to refer to the future. This tense consists of the present of the verb khāstan ‘to want’ (without the prefix mi-) together with the truncated infinitive of the main verb, as in khāh-am raft ‘I will go’ (raftan ‘to go’) and khāh-i did ‘you will see’ (didan ‘to see’).
In compound verbs, khāstan appears before the light verb: kār khāh-im kard ‘we will work’ (kār kardan ‘to work’); in prefixing verbs, it stands after the prefix and before the main verb: bar khāh-am gasht ‘I will return’ (bar-gashtan ‘to return’). The negative prefix na- is attached to khāstan in all cases. Table 10.5 provides the basic paradigms for the verb bordan ‘to take’, and Table 10.6 provides its different aspectual, modal, and temporal forms.
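The formation rules for the perfect, the past perfect, and the periphrastic future can be restated as a short Python sketch. The function names, the ending lists, and the fully segmented transliteration (goft-e=am rather than gofte=am) are illustrative assumptions and not part of the original description; the sketch only concatenates the morphemes named above.

```python
# Person/number order throughout: 1sg, 2sg, 3sg, 1pl, 2pl, 3pl.
COPULA_ENCLITICS = ["am", "i", "ast", "im", "id", "and"]  # enclitic 'to be'
PAST_ENDINGS = ["am", "i", "Ø", "im", "id", "and"]        # as in bud-am ... bud-Ø
PRESENT_ENDINGS = ["am", "i", "ad", "im", "id", "and"]    # as in khāh-am ... khāh-ad


def past_participle(past_stem):
    """Past participle = past stem + -e (bord -> bord-e)."""
    return f"{past_stem}-e"


def present_perfect(past_stem):
    """Past participle followed by the copula enclitics."""
    return [f"{past_participle(past_stem)}={e}" for e in COPULA_ENCLITICS]


def past_perfect(past_stem):
    """Past participle followed by the past tense of budan."""
    return [f"{past_participle(past_stem)} bud-{e}" for e in PAST_ENDINGS]


def future(infinitive, preverb="", negative=False):
    """Present of khāstan (without mi-) + truncated infinitive.

    The truncated infinitive is taken to be the infinitive minus -an.
    A preverb or nominal element precedes khāstan; na- attaches to khāstan.
    """
    if not infinitive.endswith("an"):
        raise ValueError("expected an infinitive ending in -an")
    truncated = infinitive[:-2]
    lead = f"{preverb} " if preverb else ""
    neg = "na-" if negative else ""
    return [f"{lead}{neg}khāh-{e} {truncated}" for e in PRESENT_ENDINGS]


print(past_perfect("bord"))
print(future("kardan", preverb="kār")[3])
```

Calling future("gashtan", preverb="bar") yields the prefixing-verb pattern bar khāh-am gasht, and passing negative=True places na- on khāstan, as the text describes.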

Table 10.5 Basic paradigms for the verb bordan ‘to take’
Column headings: Present; Present perfect; Past perfect

Past perfect
1sg.  bord-e bud-am
2sg.  bord-e bud-i
3sg.  bord-e bud-Ø
1pl.  bord-e bud-im
2pl.  bord-e bud-id
3pl.  bord-e bud-and

Table 10.6 Aspects, moods, and tenses: bordan ‘to take’
Column heading: Indicative

Imperative (2sg.): be-bar
Imperfective: mi-bord-e ast
Aorist: Preterite: bord-e ast
Present perfect: bord-e ast; bord-e bāsh-ad
Past perfect: bord-e bud-Ø; bord-e bud; bord-e bud-e ast
Affirmative future: khāh-ad bord

10.4.5 Changing the valence of the verbs: passive voice and causatives

The passive form of a verb is formed from the past participle of the verb, together with the verb shodan ‘to become’ conjugated in the required tense. For example, the present passive is formed with the present tense of shodan: ān melk kharid-e mi-shav-ad ‘That property is (being) bought’; the past passive with its past tense: u dar jang kosht-e shod-Ø ‘he was killed in the war’; the future passive: ānjā forukht-e khāh-ad shod ‘That place will be sold’; the present subjunctive passive: agar kosht-e shav-ad heif ast ‘It is a pity if he gets killed’; and the like. Causatives, on the other hand, are formed by suffixing -ān to the present stem of the verb, as in fahm-id-an ‘to understand’, fahm-ān-(i)d-an ‘to convey, to teach’; ras-id-an ‘to arrive’, ras-ān-(i)d-an ‘to bring along, to give a lift’. For more discussion on passive voice and causative, see Chapter 7.
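As a minimal sketch of the two valence-changing patterns just described, the following Python fragment builds third person singular passives and causative infinitives. The names SHODAN_3SG, passive, and causative_infinitive are hypothetical, and the sketch covers only the four tense and mood combinations exemplified in the text.

```python
# Third person singular forms of shodan 'to become', as cited in the examples.
SHODAN_3SG = {
    "present": "mi-shav-ad",
    "past": "shod-Ø",
    "future": "khāh-ad shod",
    "present subjunctive": "shav-ad",
}


def passive(past_stem, tense):
    """Past participle (past stem + -e) followed by the matching form of shodan."""
    return f"{past_stem}-e {SHODAN_3SG[tense]}"


def causative_infinitive(present_stem):
    """Present stem + causative -ān + past marker -(i)d + infinitive -an."""
    return f"{present_stem}-ān-(i)d-an"


for tense in SHODAN_3SG:
    print(tense, "->", passive("kosht", tense))
print(causative_infinitive("fahm"))
```

Extending the sketch to other persons would only require replacing the fixed shodan forms with a full conjugation.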

10.4.6 Specific verbs and their morphological properties

There are some specific verbs in Persian which do not necessarily follow the normal patterns of verbal morphology, and deserve more attention. In what follows, we will study them in more detail.

1 It is noteworthy that the structure mi- + past stem + ending is used both to refer to a continuous action in the past (such as mi-bord-am ‘I used to take’), and to an action in a conditional clause ([agar] mi-bord-am ‘[if] I would take’). That is why this construction is placed both as indicative and non-indicative in Table 10.6.

‘To be’, ‘to have’, and the auxiliary verbs

1) budan ‘to be’

The verb budan ‘to be’ has several forms, many of which do not resemble its infinitive. Normally, the verb ‘to be’ is expressed by the copula enclitics =am, =i, =ast, =im, =id, and =and, which attach to the noun and make the independent pronoun optional, so that it is kept only for emphasis: (man) mariz=am ‘(I) am sick’, (to) hasan=i ‘(you) are Hassan’, (u) dānā=ast ‘(he) is wise’, (mā) āzād=im ‘(we) are free’, (shomā) mahkum=id ‘(you) are condemned’, (ānhā) qātel=and ‘(they) are murderers’. The existential and the subjunctive forms of budan are conjugated on the basis of the stems hast- and bāsh-, for which no separate infinitive is assumed: man hamishe āsheq-e to hast-am ‘I always love you’, lit. ‘I am always your lover’, chand nafar dar mehmān-i hast-and? ‘How many people are there at the party?’, agar ānjā bāsh-ad, hatman hamān-jā mi-mān-ad ‘If he is there, he will definitely stay right there’. The negative is also rather specific: it is built on ni-st- together with the personal endings, as in ni-st-am, ni-st-i, ni-st-Ø, ni-st-im, ni-st-id, ni-st-and. The cluster -st- is surely the shortened form of hast or =ast, which also appears when =ast is attached to a word ending in a long vowel: in rezā=st ‘this is Reza’, ketāb ān bālā=st ‘the book is up there’. The imperative is formed with bāsh-, as in sāket-bāsh ‘be quiet’ (sg.), and hamin-jā bāsh-id ‘stay right here’, lit. ‘be right here’ (pl.). Although bāsh- is also the basic stem for the subjunctives and conditionals, the simple past of budan may be used to denote counterfactual conditional states as well, such as kāsh man shohar=at bud-am ‘I wish I were your husband’, agar āqel bud-i, in-jā ne-mi-mānd-i ‘If you were wise, you would not stay here’.
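The three present-tense series of ‘to be’ described above (enclitic copula, existential hast-, and negative ni-st-) can be laid out side by side in a small Python sketch. The dictionary and function names are hypothetical, and the zero ending for the third person singular of hast- is an assumption made by analogy with ni-st-Ø; the contraction to =st after a long vowel is deliberately left out.

```python
ENCLITICS = {"1sg": "am", "2sg": "i", "3sg": "ast", "1pl": "im", "2pl": "id", "3pl": "and"}
# Personal endings for hast- and ni-st-; 3sg is zero, as in ni-st-Ø.
# Assumption: hast- takes the same zero ending in the 3sg.
ENDINGS = {"1sg": "am", "2sg": "i", "3sg": "Ø", "1pl": "im", "2pl": "id", "3pl": "and"}


def copula(predicate, person):
    """Predicate noun or adjective + enclitic copula (mariz=am)."""
    return f"{predicate}={ENCLITICS[person]}"


def existential(person):
    """Forms built on the stem hast-."""
    return f"hast-{ENDINGS[person]}"


def negative(person):
    """Forms built on ni-st-."""
    return f"ni-st-{ENDINGS[person]}"


for person in ENCLITICS:
    print(person, copula("āzād", person), existential(person), negative(person))
```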
Like other verbs, budan forms a participle, bud-e, and the perfect forms are conjugated on that basis, as bud-e=am, bud-e=i, bud-e=ast, etc. However, there is no pluperfect for this verb, either indicative or subjunctive.

2) dāshtan ‘to have, hold, keep’

The verb dāshtan/dār- is a rather complicated and unusual verb in Persian. It is one of those verbs with distinct past and present stems (dāsht- and dār- respectively). It does not take the present prefix mi- in its present conjugation (dār-am, dār-i, dār-ad, etc.), and although it is transitive, it does not undergo passivization. Its imperative is not (or rather, is no longer) formed lexically from the present stem, with or without be-, as dār or be-dār; instead it has a periphrastic form that combines its past participle with the imperative of the verb budan: dāsht-e bāsh ‘have!’ The periphrastic structure is also used for the subjunctive mood, with the personal endings, as in dāsht-e bāsh-id ‘(that) you have’: barāye in sākht-emān, bāyad kheili pul dāsht-e bāsh-id ‘You need a lot of money for this building’, agar pul-e kāfi dāsht-e bāsh-am, hatman be to komak mi-kon-am ‘If I have enough money, I will surely help you’. The negative marker for both the past and present stems is na-, which attaches to the stems: na-dār-am ‘I do not have’, na-dāsht-i ‘you did not have’, na-dāsht-e bāsh ‘Do not have!’ However, when the verb dāsht-an forms part of a compound or prefixing verb without its base meaning, the present prefix mi- is attached to the stem, as in ketāb rā bar-mi-dār-am ‘I pick up the book’ (bar-dāshtan ‘to pick up’, bar ‘up, over’), ān-hā rā hamin-jā negah-mi-dār-im ‘We keep them right here’ (negah-dāshtan ‘to keep, to stop’, negah ‘look’). The imperative is then also formed from the present stem, as in bar-dār ‘take!’, and negah-dār ‘stop!’.
The negative marker appears as na- in the past, present, and imperative of the compounds, as in dust=ash na-dār-am/dust=ash na-dāsht-am ‘I do not/did not love him’, ān rā bar na-dār ‘do not pick it up!’, and u rā dar in sharāyet negah na-dār-id ‘do not keep him in such conditions’. The other allomorph of na- (ne-) is seen before the prefix mi- in bar-ne-mi-dār-am/bar-ne-mi-dāsht-am ‘I do not/would not pick (it) up’. The verb dāsht-an is also used in a progressive construction in standard colloquial Persian, which is confined to positive statements and is not attested in literature and formal language. The construction consists of the verb dāsht-an placed before the indicative imperfective forms, whether the verb is simple, prefixing, or compound, such as dār-ad mi-rav-ad ‘(S)he is going now/is about to go’, dār-am bar-mi-gard-am ‘I am returning, about to return’, and dārand shenā mi-konand ‘They are swimming’. The past progressive is, as expected, formed with the past tense of dāshtan, so that the above sentences become dāsht-Ø mi-raft-Ø ‘(S)he was going/was about to go’, dāsht-am bar-mi-gasht-am ‘I was returning, was about to return’, and dāsht-and shenā mi-kard-and ‘They were swimming’.

3) Modal auxiliaries bāyad ‘must’ and shāyad ‘may’

The two auxiliaries bāyad and shāyad usually take verbs in the subjunctive mood: bāyad be-rav-id ‘you must go’, shāyad fardā bārān be-bār-ad ‘It may rain tomorrow’. Although bāyad and shāyad look similar in some respects, they differ in their origins. The auxiliary bāyad comes from the Middle Persian word abāyēd with almost the same meaning, while shāyad derives from the verb shāyestan ‘to be fitting, to be qualified’, which used to be conjugated as well. Bāyad can take a negative prefix, but shāyad cannot, although it did in classical Persian: na-bāyad be ānjā be-rav-i ‘you mustn’t go there’.
Bāyad also precedes past progressives, to denote an unfulfilled obligation: bāyad govāhināme mi-gereft-am ‘I ought to have got a driver’s licence’. The other function of bāyad may be seen in impersonal constructions. Like the verbs tavānestan ‘to be able to’ and shodan ‘to become’, bāyad takes the main verb in its truncated form, and the resulting construction serves for all persons: bāyad raft ‘one should go’, mi-tavān goft ‘one can say’, mi-shav-ad gorikht ‘one may escape’. For a detailed study of the functions of bāyad and shāyad, see Windfuhr (1979: 99–113), and for a study of their internal morphology, see Mahmoodi-Bakhtiari (2008).
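Two of the regularities described for dāshtan lend themselves to a short illustration: the choice between the negative allomorphs na- and ne-, and the colloquial progressive built with dāshtan. The Python sketch below is a simplified restatement under stated assumptions: the function names are hypothetical, forms are passed in as hyphen-segmented strings, and any preverb is supplied by the caller.

```python
PRESENT_ENDINGS = {"1sg": "am", "2sg": "i", "3sg": "ad", "1pl": "im", "2pl": "id", "3pl": "and"}
PAST_ENDINGS = {**PRESENT_ENDINGS, "3sg": "Ø"}  # past 3sg has a zero ending


def negate(verb_form):
    """Prefix the negative marker: ne- before mi-, na- elsewhere."""
    prefix = "ne-" if verb_form.startswith("mi-") else "na-"
    return prefix + verb_form


def progressive(present_stem, past_stem, person, past=False, preverb=""):
    """dāshtan + imperfective main verb, both agreeing with the subject."""
    endings = PAST_ENDINGS if past else PRESENT_ENDINGS
    aux_stem = "dāsht" if past else "dār"
    main_stem = past_stem if past else present_stem
    ending = endings[person]
    return f"{aux_stem}-{ending} {preverb}mi-{main_stem}-{ending}"


print(negate("dār-am"))                 # simple dāshtan, no mi-
print("bar-" + negate("mi-dār-am"))     # prefixing verb with mi-
print(progressive("rav", "raft", "3sg"))
print(progressive("gard", "gasht", "1sg", past=True, preverb="bar-"))
```

Since the construction is confined to positive statements, the sketch offers no way of negating a progressive form.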

10.4.7 Verbal derivations

Present stems take several suffixes and produce different deverbal nouns (or adjectives). The major derivational suffixes attached to present stems are the nominalizers -i in bāz-i ‘play’ (bākht-an ‘to lose’); -esh in rav-esh ‘method’ (raftan ‘to go’), kush-esh ‘effort’ (kush-id-an ‘to try’), and bin-esh ‘view’ (did-an ‘to see’); -(e)mān in zāy-emān ‘delivering the baby’ (zā-d-an ‘to give birth’) and sāz-mān ‘organization’ (sākht-an ‘to make’); and the nominalizer -āk, which is found in a few instances such as khor-āk ‘food’ (khor-d-an ‘to eat’), push-āk ‘clothes’ (push-id-an ‘to dress’), and suz-āk ‘gonorrhea’ (sukht-an ‘to burn’); the agentive marker -ande in dav-ande ‘runner’ (dav-id-an ‘to run’) and khān-ande ‘singer, reader’ (khān-d-an ‘to sing, to read’); and the adjective markers -ā in tavān-ā ‘mighty’ (tavān-est-an ‘to be able’), dān-ā ‘knowing’ (dān-est-an ‘to know’), and -ān in khand-ān ‘laughing’ (khand-id-an ‘to laugh’).

There are also several derivational suffixes which attach to past stems. The major ones include -e, which produces verbal adjectives and passive participles, like nevesht-e ‘written (text)’ (nevesht-an ‘to write’) and sukht-e ‘burnt’ (sukht-an ‘to burn’); and the nominal marker -ār in khar-id-ār ‘buyer’ (khar-id-an ‘to buy’) or nevesht-ār ‘(written) text’ (nevesht-an ‘to write’), which acts as an agentive marker in the former example and forms a concrete noun in the latter. The infinitive also takes the adjective-forming suffix -i, denoting the potential of the content of the verb, for example khord-an-i ‘edible’ (khord-an ‘to eat’), did-an-i ‘worth seeing’ (did-an ‘to see’). Sometimes it denotes the subject’s intention to do something, as in mehmān-hā-ye mā raftan-i nistand, mānd-an-i hastand ‘Our guests do not seem to be willing to go, they would stay’ (lit. ‘they are not going-like, they are staying-like’).
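The derivational patterns above amount to pairing a base type (present stem, past stem, or infinitive) with a set of admissible suffixes. A minimal Python sketch follows; the names SUFFIXES, derive, and potential_adjective are hypothetical, the suffix inventories list only the suffixes named in this section, and the sketch checks the base type only, not whether a given stem actually takes a given suffix.

```python
# Suffixes named in the text, grouped by the kind of base they attach to.
SUFFIXES = {
    "present": {"i", "esh", "emān", "mān", "āk", "ande", "ā", "ān"},
    "past": {"e", "ār"},
}


def derive(stem, suffix, base="present"):
    """Attach a derivational suffix to a present or past stem."""
    if suffix not in SUFFIXES[base]:
        raise ValueError(f"-{suffix} is not listed for {base} stems")
    return f"{stem}-{suffix}"


def potential_adjective(infinitive):
    """Infinitive + -i, e.g. khord-an -> khord-an-i 'edible'."""
    return f"{infinitive}-i"


print(derive("rav", "esh"))                   # rav-esh 'method'
print(derive("nevesht", "ār", base="past"))   # nevesht-ār '(written) text'
print(potential_adjective("khord-an"))        # khord-an-i 'edible'
```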

10.5 Compounding

Persian makes extensive use of compounding in its word formation. In what follows, we will deal with compounding in two parts: nominal compounding and verbal compounding.

10.5.1 Compounding in nominal morphology

Compounding in Persian nominal morphology takes place when (usually) two lexical morphemes are placed adjacent to each other and form an entirely new word. These compounds differ from phrases in that they act just like a single word: they do not admit any inflectional or derivational affix between their components, and the plural marker -hā has to attach to the whole word, not just to a part of it. Persian compounds fall into two major classes: non-syntactic and syntactic. The non-syntactic compounds are those whose components stand in no syntactic relationship to each other, such as arre-māhi ‘swordfish’ (lit. ‘sword-fish’), kot-shalvār ‘formal suit’ (lit. ‘jacket-pants’), gāv-mish ‘buffalo’ (lit. ‘cow-ewe’), sine-pahlu ‘pneumonia’ (lit. ‘chest-side’), and mādar-bozorg ‘grandmother’ (lit. ‘mother-big’). These compounds also encompass reduplications such as rāh-rāh ‘striped’, tond-tond ‘quickly’, hezār-hezār ‘in thousands’, kam-kam ‘gradually’, and tekke-tekke ‘in pieces’. Besides these, irreversible binomials, or ‘copulative compounds’ in the terms of Perry (2007), are compounds whose components are of almost equal weight and are joined by -o-, such as gard-o-khāk ‘dust’ (lit. ‘dust-and-soil’), kār-o-kāsebi ‘business’ (lit. ‘work-and-business’), kafn-o-dafn ‘burying ceremony’ (lit. ‘shroud-and-bury’). These words may sometimes contain meaningless items, such as bar-o-bachche-hā ‘guys’ (lit. ‘?-and-kids’), āt-o-āshqāl ‘junk’ (lit. ‘?-and-rubbish’). The meanings of such compounds, however, are not always as transparent as in these examples. Some opaque compounds of this kind are: gorg-o-mish ‘dawn’ (lit. ‘wolf-and-ewe’), cheshm-o-cherāq ‘dear’ (lit. ‘eye-and-lamp’), shākh-o-shāne ‘threat’ (lit. ‘horn-and-shoulder’).
Another type of non-syntactic compound is what Shaki (1964: 18) refers to as ‘Semi-syntactical bound phrases’, i.e. word-groups intermediate between the syntactical and non-syntactical word-groups, such as dast-be-jib ‘wealthy’ (lit. ‘hand-in-pocket’), halqe-be-gush ‘obedient’ (lit. ‘ring-in-ear’), dast-be-asā ‘cautious’ (lit. ‘hand-to-stick’), and khāk-bar-sar ‘miserable’ (lit. ‘dust-on-head’).

On the other hand, there are compounds in which a syntactic relationship can be traced between the components, such as lebās-khāb ‘nightgown’, shāgerd-avval ‘top student’, and pesar-amu ‘uncle’s son’ (all of which are lexicalized Ezafe constructions). Sometimes the compound is ‘possessive’, that is to say, the compound denotes an attribute of, or something possessed by, a third party, such as gardan-koloft ‘powerful’ (lit. ‘neck-thick’), āstin-kutāh ‘short-sleeved’ (lit. ‘sleeve-short’), and pāshne-boland ‘high-heeled’ (lit. ‘heel-tall’). When a part of the compound is a verbal stem, several syntactic relationships may hold between the noun and the verbal stem adjacent to it. These relations may be of the following types: nominative, as in naft-khiz ‘oil centre’ (lit. ‘oil-rising’); accusative, as in bomb-afkan ‘bomber’ (lit. ‘bomb-drop’); dative, as in āyande-negar ‘provident’ (lit. ‘future-look’); adverbial, as in gerān-forush (lit. ‘expensive-seller’). Some common verbal stems used in compounds are: -ālud or -ālu ‘polluted, stained (with)’, from āludan/ālā- ‘to pollute’: khun-ālud ‘bloody’, gusht-ālu ‘plump’; -āmiz ‘mingling (with)’, from āmikhtan ‘to mix’: mehr-āmiz ‘kind’; -angiz ‘arousing’, from angikhtan ‘to stimulate’: del-angiz ‘pleasant’; -āvar ‘bringing’, from āvardan ‘to bring’: khāb-āvar ‘boring’ (lit. ‘sleep-bringer’); -bakhsh ‘giver’, from bakhshidan ‘to bestow’: ārām-bakhsh ‘tranquilizer’; -pazir ‘accepting’, from paziroftan ‘to accept’: bāvar-pazir ‘believable’; -kosh ‘killing’, from koshtan: hashare-kosh ‘insecticide’; -khār ‘eater’, from khordan: giyāh-khār ‘vegetarian’.
We may also consider the adjectival compounds, which are lexicalized adjectival phrases that normally appear in Ezafe constructions, such as medād rang-i ‘coloured pencil’, gowje sabz ‘greengage’ (lit. ‘tomato-green’), gis borid-e ‘shameless (girl)’ (lit. ‘wigs-cut’), cheshm sefid ‘ungrateful’ (lit. ‘eye-white’), and siyāh bakht ‘miserable’ (lit. ‘black-fortune’), of which the first two are nouns and the rest are adjectives, again as a result of compounding.

10.5.2 Compounding in verbal morphology

As said before, Persian makes great use of compounding, in both nominal and verbal morphology. Compound verbal constructions may be classified into two groups: those resulting from compounding (where a noun combines with a light verb), and those resulting from incorporation (in which the nominal part is syntactically related to the verb, usually as a direct object). Incorporative verbs are numerous, common, and growing in number, such as qazā khordan ‘to eat food’ (lit. ‘food eating’), and lebās pushidan ‘to get dressed’ (lit. ‘clothes dressing’) (see Dabir-Moghaddam 1997). Compound verbs are further discussed in Chapters 2, 3, 7, 8, 9, 15, 17, and 19.

Compounding in Persian is historically new. Middle Persian did not have many complex verbs, and the only light verb used in verbal constructions was kardan ‘to do’. In the course of time, the number of compound verbs grew significantly, to the extent that simple verbs in Persian now number almost a hundred, many of which are not very actively used, and many of which are used in their nominalized form with a light verb. For example, it is now very uncommon to hear the verbs geristan ‘to cry’, āvikhtan ‘to hang’, or andudan ‘to coat’ in casual speech; rather, the compounds gerye kardan (lit. ‘cry-doing’), āvizān kardan (lit. ‘hanged-doing’), and andud kardan (lit. ‘coat-doing’) are used.

Although the verb most commonly used in the verbal part of compound verbs is kardan, there are a number of other ‘light verbs’ as well, such as dādan ‘to give’ in pās dādan ‘to pass (in sports)’; zadan ‘to hit’ in dād zadan ‘to cry’; raftan ‘to go’ in dar raftan ‘to escape, to be dislocated’; āvardan ‘to bring’ in kam āvardan ‘to get tired, to give up’; dāshtan ‘to have’ in negah dāshtan ‘to stop, to keep (from moving)’; and khordan ‘to eat’ in yekke khordan ‘to be surprised’. This method of verb formation is extremely productive, and is not restricted to Persian nouns in the compound. Loan words may also play a role in such verbs, such as clik kardan ‘to click’, telefon kardan ‘to call’, āsfālt kardan ‘to asphalt’, shāns āvardan ‘to be lucky’, and rofuze shodan ‘to fail (an exam)’. Many of the Arabic loan words which are basically infinitives take a light verb, typically kardan, in order to function as a Persian infinitive, such as qat’ kardan ‘to cut’ (lit. ‘cut doing’), moqāvemat kardan ‘to resist’ (lit. ‘resistance doing’), e’temād kardan ‘to trust’ (lit. ‘trust making’), emtehān kardan ‘to test’ (lit. ‘test doing’), and eshtiyāq dāshtan ‘to be interested’ (lit. ‘interest having’).

10.6 Minor word-formation types in Persian

As we have noted in this chapter, inflection, derivation, and compounding are the major word-formation methods in Persian. However, some minor word-formation processes, mostly recent and serving specific purposes, have also been adopted in the language. Here is a brief review of them.

10.6.1 Back-formation

Although back-formation (together with conversion) may be seen as closely related to derivational affixation, both are relatively unproductive, and we therefore prefer to introduce them here. Back-formation refers to a type of word formation in which a single word is taken to be a derived one, so that a new root may be extracted from it for further word-formation purposes. Persian examples are bāz-bin ‘reviewer’ from bāz-bini ‘review’, baste-band ‘packer’ from baste-bandi ‘packaging’, morq-dār ‘aviarist’ from morq-dāri ‘aviculture’.
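In the three examples just cited, the new word is obtained by removing a final -i from the longer form. A minimal Python sketch of that step is given below; the function name back_form is hypothetical, and treating the process as the removal of a final -i is an assumption drawn only from these examples, not a general rule stated in the text.

```python
def back_form(word, suffix="i"):
    """Extract a new base by removing a final suffix taken to be derivational."""
    if not word.endswith(suffix):
        raise ValueError(f"{word!r} does not end in -{suffix}")
    return word[: -len(suffix)]


for source in ("bāz-bini", "baste-bandi", "morq-dāri"):
    print(source, "->", back_form(source))
```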

10.6.2 Conversion

Conversion (also known as zero-derivation or functional shift) is a common word-formation process in Persian. As for verb-oriented conversion, one cannot, of course, expect conversion between verbs and other parts of speech to be as productive as it is in a language such as English, since Persian verbs are conjugated and so always undergo some change of form. However, verbal stems may be transformed into nouns, as in the past stems bākht ‘loss’ (bākht-an/bāz ‘to lose’), dar-yāft ‘receipt’ (dar-yāft-an/dar-yāb ‘to understand, to realize, to receive’), and sākht ‘structure’ (sākht-an/sāz ‘to make, to build’). Newly used words such as borun-raft ‘exit’ (borun-raft-an ‘to go out’) and dir-kard ‘delay’ (dir-kard-an ‘to be late’) are based on compound verbs. Present stems may also act as nouns, especially in binomial forms, as in jonb-o-jush ‘activity’ (jonb-id-an ‘to move’, jush-id-an ‘to boil’), pors-o-ju ‘query’ (pors-id-an ‘to ask’, jost-an ‘to search, to look for’).

Adjectives easily convert to adverbs, and there are plenty of such conversions, as khub ‘good, well’, sari’ ‘quick, quickly’, zibā ‘beautiful, beautifully’, qalat ‘wrong, wrongly’, ārām ‘slow, slowly’. Some nouns turn into adjectives in specific contexts, such as chaman ‘grass’, which is used as an adjective in zamin-e chaman ‘grass field’; an infinitive such as khord-an ‘to eat’ may likewise serve as an adjective, as in āb-e khord-an ‘drinking water’. The same holds true for jādu ‘spell’ in cherāq-e jādu ‘magic lamp’, and khun ‘blood’ in del-e khun ‘bloody heart, grieving heart’. Sometimes the names of materials are used as adjectives, for example shalvār-e katān ‘cotton pants’, medāl-e talā ‘gold medal’, jeld-e pelāstik ‘plastic cover’. Strictly speaking, all these items should take the adjective marker -i.

10.6.3 Clipping

New Persian has, to some extent, accepted the process of clipping, i.e. shortening a word while retaining its original meaning. As in some other languages (such as English), Persian clipped words do not create lexemes with new meanings; rather, they produce lexemes with a new stylistic value. Examples are motor (< motorsiklet ‘motorcycle’), super (< supermārket ‘grocery’), pās (< pāsport ‘passport’), sānt (