The Grammar of Interactional Language
 9781108481823, 9781108693707, 9781108741446

Table of contents :
Frontmatter
Contents
Figures
Tables
Conversation Boards
Acknowledgments
Abbreviations
Prologue
1 Interactional Language
1.1 Introducing Interactional Language
1.2 Toward a Grammar of Interactional Language
1.3 The Significance of Interactional Language
2 The Syntacticization of Speech Acts
2.1 Introduction
2.2 Classic Speech Act Theory
2.2.1 Situating Speech Act Theory
2.2.2 What Are Speech Acts?
2.3 The Linguistic Properties of Speech Acts
2.4 Syntacticizing Speech Acts Part I: The View from Generative Semantics
2.4.1 How to Encode Meaning in Syntax
2.4.2 Syntactic Underpinnings of the Performative Hypothesis
2.5 The Fate of the Performative Hypothesis
2.5.1 Arguments against Austin’s Performative Hypothesis
2.5.2 Arguments against the Empirical Evidence
2.5.3 Arguments against the Syntacticization of Pragmatics
2.6 Syntacticizing Speech Acts Part II: Functional Architecture
2.6.1 Theoretical Background
2.6.1.1 The Model
2.6.1.2 The Fate of S
2.6.2 A Dedicated Speech Act Structure
2.6.3 New Theories, New Problems
2.7 Conclusion
3 From Speech Acts to Interaction
3.1 Introduction
3.2 Philosophical Underpinnings
3.2.1 Assumptions about Conversations
3.2.2 Different Ways of Doing Things with Language
3.3 Dialogue-Based Frameworks
3.3.1 Conversation Analysis
3.3.1.1 Turn-Taking
3.3.1.2 Backchannels
3.3.1.3 Adjacency Pairs
3.3.1.4 Summary
3.3.2 Grounding Theory
3.4 Functional Grammar-Based Frameworks
3.4.1 Systemic Functional Linguistics
3.4.2 Functional Discourse Grammar
3.4.3 Longacre’s Grammar of Discourse
3.4.4 Interactional Linguistics
3.5 Formal Grammar-Based Frameworks: The Semantic Angle
3.5.1 Formal Semantics of the Truth-Conditional Kind
3.5.2 Common Ground and Dynamic Semantics
3.5.3 The Question under Discussion and Being Inquisitive
3.5.4 Toward a Formal Semantics of Dialogue
3.5.5 Expressive Dimensions and Other Forms of Language Use
3.6 Formal Grammar-Based Frameworks: The Syntactic Angle
3.7 Conclusion
Lesson 1: Integrate Different Types of Meaning
Lesson 2: Rethinking the Difference between Language Competence and Performance
Lesson 3: Rethinking the Primacy of the Sentence as the Unit of Analysis
Lesson 4: Rethinking Common Ground Updates
Lesson 5: Beyond the Sub-Discipline Divide
4 The Interactional Spine Hypothesis
4.1 Problems I Want to Address
4.1.1 The Empirical Problem: Confirmationals and Response Markers
4.1.2 The Analytical Problem: The Need for a Framework
4.1.3 The Theoretical Problem: What Does It All Mean?
4.1.4 The Methodological Problem: Interactional Data
4.2 The Framework: The Universal Spine Hypothesis
4.3 What I Propose: The Interactional Spine Hypothesis
4.3.1 Extending the Universal Spine with Interactional Functions
4.3.1.1 The Grammar of Grounding
4.3.1.2 The Grammar of Responding
4.3.2 Assumptions about the Normal Course of the Conversation
4.3.3 Methodology
4.3.4 Reporting Acceptability Judgments
5 Initiating Moves: A Case-Study of Confirmationals
5.1 Introduction
5.2 The Grammar of Initiating Moves
5.2.1 The Function of Confirmationals
5.2.2 Confirmationals on the Interactional Spine
5.2.3 The Core Meaning of Confirmationals
5.2.4 Predictions
5.3 The Role of the Host Clause: Target of Confirmation
5.3.1 Declaratives
5.3.2 Interrogatives
5.3.3 Imperatives
5.3.4 Exclamatives
5.3.5 Summary
5.4 The Articulated GroundP
5.4.1 The Argument from Interpretation
5.4.2 The Argument from Differences in Confirmationals
5.4.3 The Argument from Multiple Sentence-Final Particles
5.4.4 The Argument from Clause-Type Restriction
5.4.5 Summary
5.5 Confirmational Paradigms
5.5.1 A Paradigmatic Contrast Based on [+/−coin]
5.5.2 The Timing of Grounding
5.5.3 The Gradability of Beliefs
5.6 Confirmationals and Their Kin
5.6.1 Narrative vs. Confirmational Eh
5.6.2 When Attitudes Need Not Be Confirmed
5.6.3 What Makes Us Uncertain: The Role of Evidence
5.7 Conclusion
6 Reacting Moves: A Case-Study of Response Markers
6.1 Introduction
6.2 The Grammar of Reacting Moves
6.2.1 RespP in Initiation and Reaction: Similarities and Differences
6.2.2 The Core Meaning of Response Markers
6.2.3 The Target of Response
6.3 Associating Response Markers with the Interactional Spine
6.3.1 Answering: When Response Markers Associate with CP
6.3.1.1 The Syntax of Polarity
6.3.1.2 The Syntax of Polar Response Markers as Answers
6.3.2 (Dis-)agreement: When Response Markers Associate with GroundSpkrP
6.3.2.1 Responding to Different Speech Acts
6.3.2.2 Responding to Assertions and Negative Questions
6.3.2.3 Evidence from Complex Response Markers
6.3.2.3.1 Doubled Response Markers
6.3.2.3.2 Yeah no Is Not a Contradiction
6.3.3 Acknowledgment: When Response Markers Associate with GroundAdrP
6.3.4 Responding: When Response Markers Associate with RespP
6.3.4.1 Austrian jo
6.3.4.2 Response to Vocatives
6.3.4.3 Backchannels Again
6.3.4.4 English well
6.4 Reacting with Emotions
6.4.1 Reactions and Emotions
6.4.2 Emotive Content via Prosodic Modification
6.4.2.1 Expressing Intensity On and Off the Spine
6.4.2.2 Expressing Expectedness
6.4.3 Emotive Content via Complex Response Markers
6.4.3.1 Doubled Response Markers
6.4.3.2 Oh-Prefixed Response Markers
6.4.4 Emotivity Is Not a Spinal Function
6.5 Conclusion
7 The Grammar of Interactional Language
7.1 Introduction
7.2 The Syntacticization of Verbal Interaction
7.2.1 The Interactional Spine Hypothesis
7.2.2 Ingredients of the Interactional Spine Hypothesis
7.3 Toward a Formal Typology of Interactional Language
7.3.1 What Is a Formal Typology of Interactional Language?
7.3.2 Where and How and When
7.3.3 More Predictions of the Interactional Spine Hypothesis
7.3.3.1 Nominal Interactional Structure
7.3.3.2 Complex Moves
7.4 Exploring the Cognitive Underpinnings of the Grammar of Interaction
7.4.1 Evidence for Cognitive Underpinnings
7.4.2 Do Interactive Abilities Precede Linguistic Abilities?
7.4.3 Interactive Abilities Are Also Linguistic Abilities: The Bridge Model
7.5 Conclusions and Further Questions
7.5.1 Conclusion
7.5.2 Logophoricity
7.5.3 Genre, Style, and Subjectivity
7.5.4 Information Structure
7.5.5 The Role of Intonation
7.5.6 The Clause Type–Speech Act Mapping
Epilogue
Bibliography
Index


The Grammar of Interactional Language

Traditional grammar and current theoretical approaches toward modeling grammatical knowledge ignore language in interaction: that is, words such as huh, eh, yup or yessssss. This groundbreaking book addresses this gap by providing the first in-depth overview of approaches toward interactional language across different frameworks and linguistic sub-disciplines. Based on the insights that emerge, a formal framework is developed to discover and compare language in interaction across different languages: the Interactional Spine Hypothesis. Two case-studies are presented: confirmationals (such as eh and huh) and response markers (such as yes and no), both of which show evidence for systematic grammatical knowledge. Assuming that language in interaction is regulated by grammatical knowledge sheds new light on old questions concerning the relation between language and thought and the relation between language and communication. It is essential reading for anyone interested in the relation between language, cognition, and social interaction.

Martina Wiltschko is an ICREA Research Professor at the Universitat Pompeu Fabra, Barcelona.

The Grammar of Interactional Language Martina Wiltschko ICREA, Universitat Pompeu Fabra, Barcelona

University Printing House, Cambridge CB2 8BS, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
314-321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi - 110025, India
79 Anson Road, #06-04/06, Singapore 079906

Cambridge University Press is part of the University of Cambridge. It furthers the University's mission by disseminating knowledge in the pursuit of education, learning, and research at the highest international levels of excellence.

www.cambridge.org
Information on this title: www.cambridge.org/9781108481823
DOI: 10.1017/9781108693707

© Martina Wiltschko 2021

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2021

A catalogue record for this publication is available from the British Library.

ISBN 978-1-108-48182-3 Hardback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

To all the warriors!


Figures

2.1 The division of labor for speech acts (version 1)
2.2 The division of labor for speech acts (version 2)
2.3 Transformational grammar
2.4 Deriving interrogatives
2.5 Performative deletion
2.6 Decomposing F
2.7 The model of Standard Theory
2.8 Generative models of grammar
2.9 A Stalnakerian common ground update
3.1 Analyzing the interactional dimension
3.2 Situating assumptions about the normal course of conversation
3.3 Bühler's organon model
3.4 Adding the interactional dimension
3.5 The multi-dimensionality of the interactional dimension
3.6 The architecture of FDG (based on Hengeveld 2005)
3.7 A dichotomy of meaning
3.8 The interactional dimension
4.1 The normal course of an assertion
4.2 The normal course of a question
4.3 Disagreement
4.4 Ingredients for a conversation board
5.1 The use of a confirmational
5.2 Another use of confirmationals: "Confirm that you know"
5.3 Eh-interrogatives
5.4 Imperatives
5.5 Eh-imperatives
5.6 The normal course of an exclamation
5.7 Eh-exclamatives
7.1 The bridge model

Tables

2.1 Linguistic sub-disciplines
2.2 Syntactic aspects of illocutionary acts
5.1 Knowledge states for assertions and questions
5.2 Knowledge states for assertions, questions, and confirmationals
5.3 The paradigm of grounding particles in Mandarin
5.4 Variation in the interpretation of [+coin]
5.5 Contrasting coincidence valuation in GroundSpkr
6.1 The structural ambiguity of response markers
6.2 The target of intensification
6.3 The distribution of response markers in Austrian German

Conversation Boards

1. New dog. New info for Spkr. (Cx 1)
2. New dog. New info for Adr. (Cx 2)
3. Lecture. True question. (Cx 3)
4. Lecture. Confirm you have the same question. (Cx 4)
5. Beer. True imperative. (Cx 5)
6. Beer. Confirm that you have this desire. (Cx 6)
7. Surprise party. Confirm that you are surprised. (Cx 7)
8. Surprise party. True exclamative. (Cx 8)
9. New dog. Confirm that you know. (Cx 9)
10. Movie. Subjective judgment. (Cx 10)
11. Movie. Confirm that you have this evaluation. (Cx 11)
12. New dog. New belief. (Cx 14)
13. New dog. Old belief. (Cx 15)

Acknowledgments

Everything I did until now was in preparation for now. My research over the past 25 years has led me exactly to this stage, which allowed me to write this monograph. I started out as a syntactician. Already in my dissertation I was curious about the syntax–discourse interface. And moving to Vancouver was motivated in part by the desire to look at the discourse-orientation of Salish languages. But I realized very quickly that I first had to understand their syntax. I didn't even know how to begin thinking about the syntax–discourse interface. I now have a different understanding of syntax and the field has changed. So I felt ready to embark on this quest.

My empirical entry point was the observation that you can say I have a new dog, eh? but not I have a new dog, huh? This cast doubt on the common assumption that eh is just the Canadian equivalent of huh. It is not. The desire to understand the difference between eh and huh has led me to several funded research projects, the eh-lab, and ultimately to this monograph. In a way everyone who was ever with me on my path has contributed something to this monograph. And I am grateful to all of you.

I started writing the manuscript on my sabbatical from The University of British Columbia in 2017. I had planned to complete it within a year, but then life slowed me down and led me to finish it in my new personal and academic home in Barcelona. I finally finished it amidst the pandemic that forced everyone to slow down, that forced us to stop interacting in person, and that forced us to reflect on the world and the human condition that led to its state. During this time I wasn't sure whether this monograph would ever see the light of day, as it felt like the end of the world. I am grateful to everyone who was and continues to be part of this apocalyptic experience and who motivated me to persevere. Having been able to interact in isolation with old friends and with new ones has taught me the importance of human connections in profound ways. Life depends on interaction. And language shapes its expression.


Abbreviations

CA     conversation analysis
DRT    Discourse Representation Theory
D-S    Deep Structure
FDG    functional discourse grammar
ISH    Interactional Spine Hypothesis
IS     Information Structure
P&P    Principles and Parameters
POV    point of view
QUD    question under discussion
SFL    systemic functional linguistics
S-S    Surface Structure
TCU    turn-constructional unit
UoL    unit of language
USH    Universal Spine Hypothesis

Prologue

Much of what I have learned about language (and life) I learned during fieldwork. I got to know languages vastly different from those I was familiar with. And I got to do this through the experience of native speakers – the elders, the wise women. This opened up a new world, a new perspective, and a new quest. It forced me to let go of many assumptions and hence made way for new discoveries. Often what I learned came through their comments. And often I did not understand, but I had learned enough that I understood that they always knew what I needed to know. Two particular comments stuck with me. The first one came from my Halkomelem consultant. She kept saying it, so it was clearly important.

This is for when you are just saying it. This is for when you are telling someone.

The second one came from my Blackfoot consultant. On several occasions she corrected a sentence I offered and, again, I didn't know what to make of it.

You have to put yourself into the sentence.

I realize now that these comments were driving the questions that led to this monograph. And years later, I finally think I understand what they tried to teach me. The answers I found are nothing beyond what they already said. I have learned that when we use language to communicate, the language we use reflects this interactive mode. Hence there is a difference between just saying something and telling someone. When we “tell someone,” the language of interaction emerges. It is very personal; we do so much more than just exchange our knowledge about the world. We typically have attitudes and feelings about what we think is going on in the world and we might even have some ideas about how they might affect the person we are talking to. The language of interaction allows us to express and convey our attitudes toward what we are saying. It allows us to put ourselves into the sentence.


In trying to understand the comments of my consultants, I was led on a path to pursue the language of interaction; it opened a new world of data, a new way of collecting data, a body of research on interaction I didn’t know. It taught me a new way of thinking about the essence of language, thought, and the human condition.

1 Interactional Language

It’s not the language but the speaker that we want to understand. Veda, Upanishads

1.1 Introducing Interactional Language

Language allows us to communicate things about the world we live in, how we perceive it, how we think of it, and how we evaluate it. Language allows us to gain insight into one another’s mental worlds. There are thus two aspects of language: the role it plays in the thoughts about the world and the role it plays in the communication of those thoughts. This dual function sets up a dichotomy that has pervaded the study of language; it has defined different research agendas and methodologies. There are those traditions that take as their object of study the form of language used in the expression of thought, and there are those that take it to be its communicative function. A formal (generative) linguist takes language to be a computational system that produces an infinite set of hierarchically structured expressions interpreted by our conceptual-intentional (C-I) system (Chomsky 2008). For the formal linguist, the object of study is humans’ competence for language, rather than their performance. Exploring language competence makes necessary a particular methodology, unique to the generative enterprise: the elicitation of native speaker judgments. A functional linguist takes language to be a means for communication; linguistic form is analyzed for its communicative functions. The distinction between competence and performance does not play a role and the normal way to collect data is by exploring the way people use language, that is, in natural settings. A brief glance at language in interaction makes it clear that this dichotomy is spurious. Consider the interaction in (1) where I and R refer to the initiating and reacting roles, respectively. (1)

I: Gal Gadot was amazing as Wonder Woman, eh?
R: Yeah, I know, right?


I expresses their positive evaluation of Gadot’s performance. R indicates that they agree. I and R ’s utterances contain more than the expression of these thoughts; there are several units of language (henceforth UoL) that contribute to managing the interaction, rather than adding content. The sentence-final particle eh signals that I assumes that R shares the same belief and encourages R to respond. Following Wiltschko and Heim (2016), I refer to such UoLs as confirmationals, as they are used to request confirmation. In R ’s response, yeah indicates agreement; it doesn’t seem to add much to the content of the utterance (I know). The use of sentence-final right appears odd: why ask I whether it is “right” that R knows? This is not something we typically need confirmation for. But in this context it makes the agreement more enthusiastic. The thoughts that are expressed in (1) (i.e., the propositional content) are simple, but the interaction conveys much more. Consider the same interaction without these UoLs. (2)

I: Gal Gadot was amazing as Wonder Woman.
R: I know.

The same thoughts are expressed, but the interaction has a distinctly different flavor. Unlike in (1), it is not clear whether I cares about R's opinion and R's response could be taken as rude; it seems to indicate that I's contribution is redundant. Thus, the sentence-peripheral UoLs change the quality but not the content of the exchange; they affect the use of language in interaction. Thus, a strict dichotomy between the form of language to express thought and the way it is used to convey these thoughts cannot be maintained: there are forms that affect the use of language. The forms that regulate interaction (i.e., use) have formal properties as well. Interestingly, these UoLs have received little attention in linguistics, in either formal or functional approaches. The goal of this monograph is to fill this gap by exploring the formal properties of UoLs that regulate interaction, that is, the grammar of interactional language.

1.2 Toward a Grammar of Interactional Language

The core thesis I propose is that grammar not only configures the language used to convey thoughts, but also the language used to regulate interaction. Following generative assumptions, I take sentences to be hierarchically structured expressions, derived via a computational system. Utterances typically considered in grammatical analysis are sentences that convey thoughts about the world that can be true or false. I refer to the structure associated with such sentences as the propositional structure.


I propose that interactional language is derived by the same computational system and is therefore hierarchically structured: interactional structure dominates propositional structure. I argue that there are two core functions that characterize interactional structure: one serves to manage the common ground between the interlocutors: it is used to express things about mental worlds, rather than the world itself, and hence it plays an important role in the synchronization of minds. I refer to this function as the grounding function. The second function concerns the management of the interaction itself (e.g., turn-taking). It aids the interplay between initiating and reacting moves. I refer to this function as the responding function. This is the core proposal I develop here: the Interactional Spine Hypothesis (henceforth ISH). (3)

[Responding [Grounding [S propositional structure]]]   (interactional structure dominating the propositional structure S)

There are several arguments that interactional language is a part of grammar, in much the same way as propositional language is. I discuss and support each of these arguments in the course of this monograph. Interactional language is subject to well-formedness constraints; speakers have clear judgments about their use. This suggests that interactional language is part of competence. Interactional language shares much in common across languages, while displaying systematic variation in form, function, and distribution. It participates in paradigmatic contrasts and patterns of multi-functionality. In addition to these properties of interactional language, I submit that assuming that there is a grammar of interactional language and that this grammar is essentially the same as the grammar of propositional language is really the null hypothesis. While it is true that the language of interaction is typically realized at the periphery of utterances, it still must be the case that its form and meaning are computed. Sentence-final particles must be combined with the host clause and with each other. I take it to be the null hypothesis that the same system that is responsible for computing propositional language is also responsible for computing interactional language. Moreover, the language of interaction is characterized by some of the same properties as propositional language. For example, it is often prosodically integrated with the propositional


structure and may combine with the same intonational tunes as propositional structure: sentence-final rise (indicated below as ↗) can be realized on a bare clause (4a) or on a confirmational (4b), but in the presence of a confirmational, it cannot be realized on the host clause no matter whether the confirmational also bears a final rise (4c) or not (4d). (4)

a. You have a new dog↗
b. You have a new dog, eh↗
c. *You have a new dog↗, eh↗
d. *You have a new dog↗, eh

This indicates that there must be a computational system responsible for regulating the realization of intonational tunes that cuts across the distinction between propositional and interactional language. The task at hand is to model how UoLs (including intonational tunes) combine with each other to derive the meaning and function of the complex utterance. This is precisely the role of syntax. I submit that a syntactic approach toward interactional language is not only possible, it can also be viewed as a necessary heuristic for exploring the grammar of interactional language. Since interactional language of this type is a novel empirical domain for formal typologies, we require a novel standard of comparison. This is precisely what the structure of interaction illustrated in (3) is meant to be. I take its success to be measured in terms of its empirical coverage. The goal of this monograph is to explore the linguistic reality of the structure of interactional language. If linguistic reality can be established, we can then go on to think about its psychological reality. It is a way of understanding the cognitive underpinnings of the interactional aspect of language, the mechanism that allows humans to synchronize their mental worlds through communicative interactions. In this way, the classic dichotomy between language as a means to express thought vs. language as a means to communicate such thoughts implodes.

1.3 The Significance of Interactional Language

There are several reasons to explore the grammar of interactional language. First, for the description of a language to be exhaustive, interactional language should be included. Strikingly, it is rarely mentioned in descriptive grammars of a language. Second, it is necessary for the sake of developing a typology: just like propositional language, interactional language is subject to variation. The question about the range and limits of variation is familiar from the point of view of propositional language, but it has not been systematically investigated for interactional language. The ultimate goal of finding out about language universals and variation is to find out about the cognitive underpinnings that underlie these universals; in the context of interactional language, this concerns


the cognitive underpinnings of our communicative competence, which are responsible for the logic of human verbal interaction. A grammatical view on interactional language has its roots in a number of traditions (see Chapters 2 and 3). Recognizing the significance of interactional language was (in part) made possible by the recognition that language is typically embedded in an act of speech. Classic speech act theory (Austin 1962, Searle 1969) emphasizes that when we say things, we also do things: but what is the relation between what is said and what is done? The sentence itself is sometimes viewed as being enriched with meaning that regulates its use, namely force. For example, early on, Stenius (1967) argues that propositions by themselves are not units of communication; they need to be associated with illocutionary force. To capture its contribution, Stenius assumes that the sentences in (5) have the proposition (p) in common (the sentence radical), but that in addition, they combine with a modal element that signifies the force of the sentence (Stenius 1967 refers to this as mood). For an indicative, the proposition combines with an indicative modal element (I), for an imperative, there is an order (O), and for the interrogative, there is a question (?), as in (6). (5)

a. You live here now.
b. Live (you) here now!
c. Do you live here now?
(Stenius 1967: 254 (1–3))

(6)

a. I(p) or p
b. O(p)
c. ?(p)
(Stenius 1967: 255 (1ʹʹ–3ʹʹ))

Similarly, Lewis (1970) suggests that non-declarative clauses may be analyzed as being embedded in a (covert) performative clause: the question in (7a) can be rendered as in (7b). (7)

a. Who is Sylvia?
b. I ask who Sylvia is.

If what we do with sentences (their force) is part of sentence meaning (as in (8), where p stands for proposition and F for Force), this aspect of interpretation is put squarely in the purview of grammar and is not just a matter of use in context. (8)

F(p)
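To spell this schema out a little (the notation in the next two lines is my own shorthand for the proposals just cited, not the author's formalism), F can be read as a small set of force operators applying to the sentence radical, and Lewis's paraphrase in (7b) shows one such operator spelled out with an explicit speaker and a predicate of communication:

\[
F(p) \in \{\, \mathrm{I}(p),\ \mathrm{O}(p),\ \mathrm{?}(p) \,\} \qquad \text{(cf. Stenius's moods in (6))}
\]
\[
\mathrm{?}(p) \;\approx\; \text{I ask } p \qquad \text{(cf. Lewis's paraphrase in (7b))}
\]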

On this view, certain aspects of sentence use are encoded within the sentence. This changes the way we think about the relation between form and meaning. In early structuralist theorizing, the object of study was patterns of form and how


they relate to meaning. It was a matter of understanding the relation between a particular form and its meaning (Saussure’s program). This is also the focus in classic truth-conditional semantics (Frege’s program). By adding conditions on its use (i.e., their force), the meaning of a sentence is enriched in ways that now involve the speaker and their intentions. The relation between the form of a sentence and its interpretation is mediated by the speaker’s mind. Taking seriously the role of the speaker in the calculation of meaning goes hand-inhand with recognizing the importance of what we do when we say things. The former is typically associated with Grice (1957) and the latter with Austin and Searle. Assuming something along the lines of (8) integrates use into meaning and thus transcends this distinction. But what aspects of use are encoded and how? What is the makeup of F in (8)? Let us take as a starting point Lewis’s (1970) propositional rendering of F, as in (7) (I ask): it includes reference to the speaker and a predicate of communication (ask). For most speech act theoretic approaches, these are the main ingredients of F: the speaker (and their intentions) and the particular type of speech act, its force. However, speech acts do not occur in isolation; they are embedded in interaction. Thus, for the interpretation of speech acts, all interactants, the speaker and the addressee, are essential. According to Russell (1940: 204), there are three purposes of language: “(i) to indicate facts, (ii) to express the state of the speaker, (iii) to alter the state of the hearer.” Knowing who the addressee is, how they relate to the speaker, and what they know coming into the interaction affects what the speaker says and how they say it. And much of what we say comes with explicit instructions to the addressee as to what to do with what is being said. For example, some languages have dedicated mechanisms for indicating whether the addressee is higher or lower than the speaker on a social scale: this is reflected in forms of address, including the pronouns used to refer to the addressee. In addition, speakers are also sensitive to the knowledge states of their addressees. For example, in Bavarian German, the particle fei is used to indicate that the speaker believes that the addressee does not know p (Thoma 2016). Fei contrasts with doch, which signals that the speaker assumes that the addressee knows p (Thoma 2016). (9)

Martl is visiting Alex. Alex sets the dinner table for 2 and Martl assumes the second plate is for him. However, he has other plans that Alex doesn't know about.
a. I Hob fei koa Zeit zum Essn.
   I Have PRT no time to.DET eat
   'I don't have time to eat.'
b. *I hob doch koa Zeit zum Essn.

(10)

Martl and Alex chitchat. Martl tells Alex he doesn't have time to stick around for dinner since he's going to the movies. Alex sets the dinner table for 2 and Martl assumes the second plate is for him.
a. *I hob fei koa Zeit zum Essn.
b. I hob doch koa Zeit zum Essn.
(Thoma 2016: 123 (9/10))

Finally, many of our utterances include a request for a response, for example in the form of rising intonation (see Heim 2019a for a recent discussion). Consider the difference between a declarative with falling and rising intonation, respectively. (11)

a. He has a new dog. ↘ He’s so cute. b. He has a new dog. ↗ *If so, what breed?

Falling intonation is compatible with the speaker keeping their turn; rising intonation is not. A rising declarative is explicitly requesting a response and hence the speaker must end their turn. Thus, the addressee plays a role in the interpretation of speech acts; there are UoLs that are sensitive to the presence of an addressee. Thus, the addressee should be included in the makeup of F. The importance of the addressee is reflected in the classic speech act theoretic trichotomy (locution, illocution, perlocution). In work that follows in the footsteps of classic speech act theory, however, the perlocutionary aspect is hardly addressed (see Chapter 3), and neither is the role of the addressee. The absence of the addressee is also evident in Lewis’s (1970) propositional rendering of the speech act of questions in (7). The verb ask, which corresponds to the speech act, can also be used as a ditransitive verb (I ask you . . .). The propositional rendering of the illocutionary force, which includes both the subject and object of asking (I ask you), underlies the performative hypothesis of Ross (1970). He argues that every clause is dominated by a speech act structure that encodes this frame (I Vsay you). Propositional content is embedded in structure that encodes the illocutionary force but is not spelled out. The insight behind Ross’s analysis for declaratives is that even when we say things, we do things: we tell others about the world. These are the core ingredients of speech acts, according to Ross, and they are syntactically encoded (see Chapter 2). Ross’s original proposal was abandoned shortly after publication, but the syntacticization of speech acts is currently an active research agenda. It has been made possible by the rise of functional projections that define the clausal architecture. And this has opened a new empirical domain to be described and analyzed: the language of interaction, such as the sentence-final particles introduced above. I show in this monograph that there


is a systematicity to the language of interaction, indicating that it makes use of similar building blocks as the propositional grammar it embeds. Specifically, it participates in patterns of contrast and patterns of multi-functionality, two of the hallmarks of universal grammatical categories (Wiltschko 2014). There is a caveat, however. The broad type of information we are concerned with here, having to do with intentionality and interaction, can affect language in two ways: (i) it can be grammatically encoded, and (ii) it can come about via assumptions about the normal course of a conversation and the inferences that follow. It is the goal of this monograph to explore the grammatical underpinnings of interactional language. The monograph is organized as follows. In Chapter 2, I introduce the body of research that aims to syntacticize speech acts. The core problem, I argue, is that it neglects the interactional nature of speech acts. This sets the stage for Chapter 3, where I review various frameworks that take seriously the interactional aspect of language. In Chapter 4, I introduce the ISH, its formal properties, and its methodological implications. Chapters 5 and 6 are the core empirical chapters: I explore the form and function of confirmationals and response markers. I show that the same formal mechanisms that serve to classify confirmationals also serve to classify response markers. This provides evidence for an underlying system that regulates the use of these markers: the interactional spine. In Chapter 7, I conclude and outline empirical and theoretical questions raised by the ISH with the intention to establish it as a research program.

2 The Syntacticization of Speech Acts

Ultimately, life is a chemical interaction. (Heidi Hammel)

2.1 Introduction

The goal of this monograph is to explore the grammar of interactional language. I argue that UoLs dedicated to regulating communicative interaction are part of syntactic structure. There are two core sources for this idea: (i) the syntacticization of speech acts, discussed in this chapter, and (ii) the development of speech act theory into a (dynamic) theory of interaction (Chapter 3). I attempt to combine these two lines of research and to explore the consequences. I show that existing models for the syntacticization of speech acts are missing an important aspect of language, namely its interactional component. The chapter is organized as follows. I start with a brief introduction of classic speech act theory (section 2.2). I then discuss the relation between syntactic structure and speech acts. In section 2.3, I discuss approaches that take this relation to be a mapping relation: certain clause types are mapped onto certain speech acts via interpretive mapping rules. I then introduce and evaluate analyses according to which speech act structure itself is part of syntax (section 2.4). I introduce the initial instantiation of this idea, the so-called performative hypothesis, and I review arguments against it (section 2.5). But I also show that the syntacticization of speech acts can be upheld on the assumption that speech act structure is part of the functional architecture of natural language (section 2.6). I refer to this as the neo-performative hypothesis. I then argue that neo-performative hypotheses suffer from several weaknesses: most analyses fail to consider advances that have been made since classic speech act theory, namely the focus on the dynamic and interactional component of utterances. In section 2.7, I conclude.


2.2 Classic Speech Act Theory

When we talk to others, we not only say things, we also do things (Austin 1962). This insight triggered a large body of work exploring its consequences, both in philosophy and in linguistics.

2.2.1 Situating Speech Act Theory

At the time speech act theory was formulated, the dominant linguistic tradition was structuralism. It had introduced a focus on synchronic analysis and a distinction between language as a system (langue) and concrete instances of language use (parole). This echoes the division between language as a system for expressing thoughts and language as a means for communicating such thoughts. It also foreshadows the more cognitively oriented distinction between competence (what speakers know) and performance (what speakers do). Taking a synchronic approach paves the way for exploring the relation between language and its context of use. But it still took a while for contextual information to become part of linguistic investigation. In structuralist traditions (including generative models), words and sentences are the units of analysis and within classic semantic traditions, Frege’s principle of compositionality in (1) is the guiding principle for analysis. (1)

The meaning of a complex expression is determined by the meanings of its constituent expressions and the rules used to combine them.
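As a minimal illustration of how (1) is standardly put to work (the example and notation here are mine, not the author's): the meaning of a simple subject–predicate sentence is obtained by applying the meaning of the predicate to the meaning of the subject, with function application as the rule of combination.

\[
[\![\text{Anna smokes}]\!] \;=\; [\![\text{smokes}]\!]\big([\![\text{Anna}]\!]\big) \;=\; \mathrm{smoke}(\mathrm{anna})
\]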

It is mainly declarative denoting statements that are analyzed in this tradition; descriptions of facts about the world, which can be true or false. The focus on statements dates back to Aristotle, as does the significance of truth: “Not every sentence is a statement-making sentence, but only those in which there is truth or falsity” (Aristotle, De interpretatione (17a1–5), Edghill translation). The role of truth was formalized by Tarski in the 1930s. This in turn paved the way for truth-conditional definitions of meaning (Davidson 1967). Accordingly, to know the meaning of a sentence is to know its truth conditions, that is, what the world has to look like for the sentence to be true. Against this backdrop, it comes as no surprise that the role of context did not receive much attention and neither did questions about what speakers intend to do when they utter a sentence. Within the Fregean tradition, the absence of this aspect of meaning follows from the fact that its goal was to develop a language that was adequate for logical argumentation rather than to understand the intricacies of natural language. Nevertheless, even before Austin, we find approaches in which these notions played a role. For example, Ogden and Richards (1923) distinguish between symbolic and emotive meaning. Their notion of symbolic meaning corresponds


to the classic notion of meaning (also known as ideational, descriptive, propositional, representational, or referential). Emotive meaning, on the other hand, corresponds to an aspect of meaning rarely discussed in the Saussurian/Fregean tradition: namely interpersonal, expressive, non-propositional, affective, and stylistic aspects of meaning. Ogden and Richards (1923) were ahead of their time, and the additional dimension of meaning they postulate was not taken up again until the development of speech act theory in the 1950s and 1960s.

2.2.2 What Are Speech Acts?

Austin’s (1962) contribution to our understanding of meaning was the realization that much of what we say goes beyond making statements that can be true or false. Rather, when we say something, we also do something, which is especially clear in cases where the action coincides with the utterance (Koschmieder 1929), that is, performative acts, as in (2). (2)

‘I name this ship the Queen Elizabeth’ – uttered when smashing the bottle against the stem. (Austin 1962: 5 E(b))

By uttering a performative, speakers perform an action that lies outside of the realm of truth and is associated with conditions of felicity, that is, conditions that make an utterance appropriate in a given context. For example, to christen a ship, the speaker has to be authorized to do so. This differs from an utterance that is meant to describe the world (Austin’s 1962: 20 constative) and can be true or false. Austin (1962: 121) develops the distinction between constatives and performatives and argues that we do something with anything we say. By uttering a statement, speakers are telling something to their interlocutors thereby trying to change their beliefs, to synchronize their mental worlds. To capture the insight that any utterance, including a descriptive statement, is a speech act, Austin (1962) introduces the speech act trichotomy below. (3)

a. locution: what is said
b. illocution: what is intended by the speaker
c. perlocution: what is effected in the addressee

Classic sentence-based syntactic and semantic theorizing concerns itself with locution. Much of the literature on the syntacticization of speech acts takes the illocutionary force to be at the core of what needs to be encoded. Interestingly, perlocution has been neglected in all grammar-based approaches toward speech acts.


To get a better idea of what is meant by illocutionary force, consider Austin’s original classification into five illocutionary acts. (4)

a. verdictives (speaker gives a verdict, e.g., giving an estimate or appraisal)
b. exercitives (speaker exercises powers, e.g., appointing or voting)
c. commissives (speaker commits themselves to doing something; e.g., promising)
d. behabitives (concerns attitudes and social behavior, e.g., apologizing and cursing)
e. expositives (speaker explains how their utterance fits into the conversation, e.g., I reply, I argue)
(Austin 1962: 150)

This classification is problematic as there are speech acts that cut across these distinctions. To develop a full-fledged understanding of illocutionary acts, it is vital to establish criteria for classification. This is what Searle (1976) attempts to do. He assumes that the basic unit of linguistic communication is the illocutionary act (Searle 1976: 2; see also Vanderveken 1990: 1). He criticizes Austin’s classification for being largely based on the meaning of English verbs; but illocutionary force should not be confused with illocutionary verbs (Searle 1976: 2). Illocutionary force is part of any language (and hence part of some form of a universal grammar, however conceived); illocutionary verbs, however, are part of a specific language. Because Austin’s classification is based on language-specific verbs, it can hardly be universally valid. Searle decomposes illocutionary acts into several dimensions (Searle 1976: 8), the three most important ones are listed in (5): (5)

a. illocutionary point (purpose)
b. direction of fit between word and world
c. sincerity condition based on the speaker's attitude (psychological state)

Based on these dimensions, Searle identifies five major categories of illocutionary acts: (6)

a. representative: commits speaker to the truth of the proposition
b. directive: speaker attempts to get the addressee to do something
c. commissive: commits speaker to future course of action
d. expressive: expresses psychological state of speaker about p
e. declarations: their performance brings about the propositional content (i.e., Austin's performatives)

Illocutionary acts are defined by their illocutionary point (given in italics), which in turn determines their direction of fit. For example, in representatives, words are made to fit the world, whereas in directives, the addressee is expected to make the world fit the words. Moreover, each of these illocutionary acts can


be realized with different strength and style and hence can come in different guises. On this view, then, illocutionary acts, and by extension speech acts, are not primitives: they are decomposable and hence can be composed in different ways, thus allowing for variation. Searle's insight, that speech acts are not primitives, is at the heart of the proposal I develop in this monograph. Despite advancements in speech act theory, classic speech act theory is still influential: the syntacticization of speech acts draws mostly on Austin's insights (via Ross). To appreciate the syntacticization of speech acts, it is useful to review the linguistic properties of speech acts.

2.3 The Linguistic Properties of Speech Acts

Austin identifies the speech act as the unit of analysis for linguistic theory. Doing things with words differs from just saying those words. The purpose of this section is to explore the linguistic properties of speech acts. This serves as a prolegomenon to an empirically grounded theory of speech acts, which takes into consideration the linguistic properties of UoLs that serve to facilitate interaction. For the purpose of this discussion, I follow Austin's (1962) convention of identifying speech acts with illocutionary acts (F(p)). According to Austin (1962), and many following in his footsteps, speech acts are considered as a unit of analysis for pragmatics – the sub-discipline of linguistics responsible for exploring and explaining language use in context (Levinson 1983). At the time that Austin and Searle developed classic speech act theory, (descriptive and theoretical) linguistics was demarcated into five sub-disciplines, all with their own units of analysis, as summarized in Table 2.1.

Table 2.1 Linguistic sub-disciplines

Linguistic sub-discipline    Unit(s) of analysis
Phonology                    Sounds
Morphology                   Words
Syntax                       Sentences
Semantics                    Words and sentences
Pragmatics                   Speech acts

Even though within the generative tradition the way we view the relation between the different linguistic domains and their units of analysis has changed (largely initiated by Chomsky 1957), this classic division has guided much of the research on speech acts to this day. Limiting the discussion to declarative sentences, within the classic subdivision of linguistics, there is a

[Figure 2.1 The division of labor for speech acts (version 1): F(p), with p the domain of syntax and semantics and F the domain of pragmatics.]

clear division of labor for dealing with different aspects of speech acts, as illustrated in Figure 2.1: propositional structure (p) makes up the classic notion of a sentence and is thus the core area of analysis for syntax (its form) and semantics (its meaning); in contrast, the illocutionary act (F) is a matter of using the sentence in context and is thus the core area of analysis for pragmatics. What is the relation between the utterance of a sentence and the speech act it triggers? How does F come about? Is there a systematic relation such that we could (at least partially) predict the speech act of a sentence based on its linguistic properties? (7)

p → F(p)

Searle (1976) explores the relation between the syntax of p and the illocutionary force associated with it. Specifically, he proposes a correlation between the form of a sentence (its syntax) and the type of illocutionary act it triggers (summarized in Table 2.2). Searle recognizes, however, that there is no one-to-one correspondence between the syntactic form of a sentence and the speech act it triggers, partly because the interpretation of speech acts is dependent on contextual factors. Accordingly, the division of labor is reconceptualized: F is partly conditioned by grammar and partly conditioned by pragmatic factors (context and conventions). Since properties of F are partly grammatically conditioned, we can ask whether contextual factors interact with grammar, and if so, how. As we shall see, much of the more recent work within syntactic theory assumes that syntactic structure systematically makes available contextual variables, hence configuring the interface with pragmatics. A step in this direction is proposed by Allan (2006), who argues that the primary illocution of a speech act is a matter of semantics and can be detected at the sentence level: each clause type is associated with a unique primary illocution, which in turn gives an initial clue to the pragmatically determined


Table 2.2 Syntactic aspects of illocutionary acts

Illocutionary act    Illocutionary point           Syntactic form
Representative       commits S to truth of p       I verb (that) + S
Directive            S wants Ax to do something    I verb you + you Fut Vol Verb (NP) (Adv)
Commissive           commits S to future action    I verb (you) + I Fut Vol Verb (NP) (Adv)
Expressive           S's attitude toward p         I verb you + I/you VP ⇒ Gerundive Nom
Declaration          'performative'                I verb NP1 + NP1 be pred

Searle (1976) does not provide a gloss for his abbreviations. "S" = speaker and "Ax" = addressee; I assume that "Vol" stands for volition.

[Figure 2.2 The division of labor for speech acts (version 2): F(p), with p analyzed by syntax and semantics, and F determined partly by the syntax and semantics of the performative frame ([[I verb you] . . .) and partly by pragmatics (context and conventions).]

illocutionary point of an utterance (Allan 2006: 2). For example, the primary illocution of a declarative clause type is to denote a truth value (T) and is likely to be used as a representative in Searle’s sense. However, this is not a categorical mapping. For example, declaratives can be used to express almost all illocutionary points. If indeed there is a relation between clause type and speech act type, it follows that the study of speech acts cannot be restricted to pragmatics. We have to conclude that its exploration will involve syntax, semantics, and morphology. In addition, as also noted by Allan (2006), another cue that allows listeners to infer the speech act is prosody. Consider the examples in (8). (8)

a. John's gone to New York↘
b. John's gone to New York↗

Both sentences are declarative but they differ in their prosodic properties: (8a) has falling intonation (↘), while (8b) has rising intonation (↗). These prosodic


properties correlate with a difference in their illocutionary point. (8a) is a representative. (8b) is a question, and hence would be classified as a directive: the addressee is asked to provide information. Hence, the study of speech acts has to encompass all of the subdomains of linguistics. It is not only a matter of pragmatics. This is supported by the fact that there are UoLs that modify speech act interpretation. Consider the example in (9). It is identical to the declarative in (8a), with the additional sentence-peripheral discourse marker eh, which changes the illocutionary point of the utterance. (9) is a directive, not a representative: it serves as a request for confirmation. (9)

(9)  John's gone to New York, eh?

Thus, there are UoLs that modify the illocutionary act of an utterance. Unless we allow the pragmatic component to add UoLs, we have to admit that speech act interpretation is (at least in part) a function of the sentence itself. Recognizing the role of prosody, morphology, syntax, and semantics for our understanding of speech acts is also an important step toward exploring cross-linguistic variation. The necessity for a typological study of speech acts is implicit in Searle (1976: 22f.). It is implicitly assumed that there is a universal set of basic speech act categories (illocutionary points), which is supported by the fact that languages make use of a limited set of clause types that serve as indicators of illocutionary point (Sadock and Zwicky 1985, Portner 2005, Allan 2006). Specifically, all languages have at least declaratives, interrogatives, and imperatives as dedicated clause types, with some also distinguishing subjunctives and exclamatives. The universality of these clause types reflects the fact that there are certain basic things that people do with words. The impression that there is a large number of speech acts stems from two facts:

(i) Basic speech act categories receive different "flavors" depending on their context of use (i.e., their pragmatics) and the speaker's intention.
(ii) Basic speech act categories as determined by clause type may be modified by discourse markers and sentence prosody.

Given that clause typing and mood play an important role in the interpretation of speech acts, it comes as no surprise that speech acts have been analyzed syntactically.

2.4 Syntacticizing Speech Acts Part I: The View from Generative Semantics

Within linguistics, speech act theory was developed within pragmatics and semantics (see Harris, Fogal, and Moss 2017 for a recent overview). There is, however, a growing literature that explores the syntactic underpinnings of speech acts.


The core idea that unites this body of work is that speech act function is (in part) derivable from sentence form. In traditional linguistic description, the unit of analysis for syntax is the sentence, which – by hypothesis – encodes propositional content; illocutionary force is located outside the sentence proper, creating a separate unit of analysis for pragmatics. The major insight behind the syntacticization of speech acts is that illocutionary force is part of sentence structure, expanding the classic notion of the sentence to include what Sadock (1969a) refers to as the hypersentence. On this view, the relation between p and F(p) is partly syntactically conditioned: p becomes F(p) by virtue of extending the propositional structure (p-structure) to include speech act structure (SA-structure), as in (10).

(10)  [S SA-structure [S p-structure ]]

In this section, I introduce the original version of this idea (Sadock 1969a, Ross 1970), couched within the framework of generative semantics. I start with a brief discussion of the main tenets of generative semantics that paved the way toward the syntacticization of speech acts.

2.4.1 How to Encode Meaning in Syntax

Syntax is traditionally characterized as the domain where words are combined to form sentences. Chomsky's (1957) seminal monograph redefined the way we think about syntax. It initiated a decomposition of syntax into a series of syntactic structures including phrase structure, transformational structure, and morphophonemics (Chomsky 1957: 46 (35)). Katz and Postal (1964) argue that transformational rules do not change meaning. Hence the input for transformations (also known as Deep Structure; D-S) is also the input for interpretation. The application of transformational rules derives surface form, as in Figure 2.3. Counter-examples to the Katz-and-Postal hypothesis are not hard to come by. For example, subject–auxiliary inversion (SAI) changes the meaning of a sentence from declarative to interrogative, as illustrated in (11). Thus, clause type serves as a clue for illocutionary force.


Figure 2.3 Transformational grammar: Deep Structure (interpretation) is mapped to Surface Structure (form)

Figure 2.4 Deriving interrogatives: the Q-morpheme present at Deep Structure (Q you are here) triggers SAI, yielding the Surface Structure (Q are you here)

(11)  a. Yoshi can catch his ball.
      b. Can Yoshi catch his ball?

To capture the correlation between the transformational rule and the change in interpretation without giving up the assumption that transformational rules cannot change meaning, Katz and Postal postulate an abstract Q-morpheme present at D-S (see also Baker 1970). This Q-morpheme is responsible for the primary illocution and for the movement of the auxiliary, as in Figure 2.4. Foreshadowing Ross (1970) and Sadock (1969a), Katz and Postal (1964: 149) suggest that the marker that indicates illocutionary force (Q for question or I for imperative) is a full-fledged (albeit not pronounced) performative clause. Interrogatives thus come with an abstract silent performative clause, as in (12).

(12)  a. Can Yoshi catch his ball?
      b. [I ask you whether] Yoshi can catch his ball.


Thus, at the level of representation that determines interpretation, indicators of illocutionary force are present. This paves the way for the syntacticization of speech acts.

2.4.2 Syntactic Underpinnings of the Performative Hypothesis

The core insight behind Ross (1970) and Sadock (1969a, 1969b) is that illocution constitutes an integral part of the sentence and hence falls within the domain of syntactic analysis. Their analysis has the ingredients summarized in (13): propositional structure (S) of the familiar type is dominated by an abstract structure that contains three constituents: (i) a subject referring to the speaker (I), (ii) a verb of communication, indicating the illocutionary force, and (iii) an indirect object referring to the addressee (you).

(13)  [I verb you [S]NP]S

The structure that is interpreted is related to the structure that is spelled out via transformational rules of deletion, as in Figure 2.5. This idea changes the way we can think about the relation between speech act form and function, since part of the illocutionary force is contained within the sentence proper. In the original schema of Stenius, F(p) has to be decomposed into at least two components: F is partly conditioned by contextual factors (F(Cx)) and partly by grammar (F(G)). But even F(G) is not a primitive: it is decomposable into three core ingredients – speaker, addressee, and a performative verb – as in Figure 2.6.

Figure 2.5 Performative deletion: the Deep Structure [SA-structure I tell you that [p-structure I have a new dog]] is mapped to the Surface Structure [p-structure I have a new dog]
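The deletion mapping in Figure 2.5 can also be sketched programmatically. The following Python fragment is a toy illustration only (the names DeepStructure and performative_deletion are my own, not part of Ross's or Sadock's proposal): the interpreted structure retains the abstract performative clause, while the spelled-out form is obtained by deleting it.

from dataclasses import dataclass

@dataclass
class DeepStructure:
    # Abstract performative clause (speaker, verb of communication, addressee)
    # dominating the propositional structure (p-structure).
    speaker: str
    performative_verb: str
    addressee: str
    p_structure: str

    def interpreted_form(self) -> str:
        # The structure that feeds interpretation keeps the performative clause.
        return f"[{self.speaker} {self.performative_verb} {self.addressee} that [{self.p_structure}]]"

def performative_deletion(ds: DeepStructure) -> str:
    # The transformational rule of deletion: only p-structure is spelled out.
    return ds.p_structure

ds = DeepStructure("I", "tell", "you", "I have a new dog")
print(ds.interpreted_form())      # [I tell you that [I have a new dog]]
print(performative_deletion(ds))  # I have a new dog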

Figure 2.6 Decomposing F: F(p) splits into a pragmatic component F(Cx) and a grammatical component F(G), where F(G) and p are spelled out as [[I verb you]...[p]]

For Ross and Sadock, these structures are present even when the utterance does not overtly encode them. This idea is inspired by Austin's (1962: 32) claim that both sentences in (14) are performative even though only (14b) contains an explicit performative clause (I order you to).

(14)  a. Go!
      b. I order you to go. (Austin 1962: 32)

Thus, Austin (1962) already acknowledged that a grammatically explicit form is not a necessary condition for a sentence to be performative. Ross's (1970) insight is that this also holds for declaratives. This is the essence of the performative hypothesis: syntax provides the substance for the interpretation of a sentence. It allows all utterances to be treated alike, no matter their illocutionary point. Just as we are doing something when we ask someone or command them, so are we doing something when we tell someone something. And what we are doing with a sentence is part of its meaning. Aside from this conceptual advantage, Ross (1970) presents 14 empirical arguments. They support the key claims that the speaker, the addressee, and the illocutionary force are part of the sentence proper.

2.5 The Fate of the Performative Hypothesis

The performative hypothesis was short-lived, however. Almost immediately after Ross (1970), several papers were published that argued against it (Anderson 1971, Fraser 1974, Leech 1976, Mittwoch 1976, 1977). The arguments put forth can be grouped into three types:

(i) Arguments against Austin's performative hypothesis
(ii) Arguments against the validity of the empirical evidence
(iii) Arguments against the syntacticization of pragmatic phenomena


The goal of this section is to introduce the gist of these arguments. We shall see that they are not applicable to current instantiations of Ross's idea: speech act theory has advanced, new empirical evidence for the syntacticization of pragmatic phenomena has emerged, and syntactic theory has changed in ways that make the syntacticization of speech acts no longer vulnerable to the same criticism as in the 1970s (see Etxepare 1997).

2.5.1 Arguments against Austin's Performative Hypothesis

Ross's (1970) syntactic implementation of the performative hypothesis is based on Austin's (1962) theory of speech acts. Hence, criticism that applies to Austin's work also applies to Ross. While Austin's main insight that we do things with words has stood the test of time, his implementation has been criticized. We have already seen that the criteria for classifying speech acts into one of the five types of illocutionary acts are not sufficient (Searle 1976). Furthermore, Austin's reliance on social conventions for the interpretation of speech acts was criticized as leaving out an important aspect of interpretation, namely the speaker's communicative intention (Grice 1957, 1968, 1969). The kinds of examples Austin discusses involve highly ritualized settings (e.g., a wedding ceremony, a ship christening), where social conventions are indeed sufficient to interpret acts of speech. However, other speech acts are less conventionalized and their interpretation requires the recognition of the speaker's intention. According to the performative hypothesis, some of the speaker's intention is directly encoded in the sentence, namely via the verbs of communication. Moreover, Austin's speech act theory relies heavily on the lexical semantics of (performative) verbs. However, there is no formal way to distinguish performative verbs from other verbs, casting doubt on the linguistic significance of recognizing them as a class (Thomas 1995: 44). And the presence of a performative verb does not guarantee that the corresponding act is actually performed. For example, promise can be used in a non-performative way (15a), and sometimes a promise creates a threat, which is a different type of speech act (15b).

(15)  a. I promise too many things to too many people. (Searle 2002: 4)
      b. I promise to make you suffer.

It is not clear how the performative hypothesis can model this relation between utterance form and illocutionary force. Conversely, we can do things with utterances that do not involve performative verbs. For example, saying hello is an act of greeting that does not require a performative verb. Note that the last argument may be moot under Ross's (1970) version of the performative hypothesis, because even in the absence of an overt performative verb, we can postulate a deleted (hence silent) one.


Even so, the performative hypothesis has nothing to say about the relation between clause type and speech act: why is it that (almost) universally we find declarative, interrogative, and imperative clause types that appear to have a very tight connection to particular speech acts? Austin's classification of illocutionary acts does not reflect this. We thus know that sentence form alone is not enough to determine speech acts; rather, speaker intentions have to be taken into consideration.

2.5.2 Arguments against the Empirical Evidence

The other set of arguments against a Ross/Sadock-style analysis of speech acts has to do with the empirical evidence (see for example Fraser 1974). Specifically, the argument is that none of the data presented in Ross can be counted as evidence for a higher performative sentence because in each case an alternative analysis is available. Consider, for example, Ross's argument based on speech act modifiers such as frankly. Its use in (16a) modifies the abstract performative clause (I'm telling you frankly), rather than the propositional structure. But as Mittwoch (1977) observes, the distribution of frankly is not restricted to root clauses; it can also be used inside a subordinate because-clause, as in (16b).

(16)  a. Frankly, I don't trust Bill. (Mittwoch 1977: 177 (1a))
      b. I voted for John, because frankly, I don't trust Bill. (Mittwoch 1977: 178 (6a))

This is unexpected under the performative analysis, according to which frankly modifies the speech act structure. On the basis of data like this, Mittwoch (1977) argues that the performative clause is a parenthetical. Similarly, consider the argument from reflexives. Ross (1970) argues that first person reflexives can be used without an obvious antecedent. This was used as evidence for the presence of an abstract speaker subject in the embedding performative clause. However, a first person reflexive may also be used inside an explicit (overt) performative clause, as in (17).

(17)  You are hereby authorized by John and myself to buy that ship. (Newmeyer 1996: 118 (7a))

If the explicit performative clause (you are authorized) does indeed serve the function of the abstract performative clause, then the presence of the reflexive suggests that the licensing of this reflexive does not require an abstract higher clause. Another licensing mechanism must be in place. However, it has also been suggested that explicit performative clauses are not in fact performative, but instead count as assertions (as indicated by their declarative clause type).


Their performative character is an instance of an indirect speech act, while their primary illocution is an assertion. Translated into Ross's (1970) analysis, this suggests that there is indeed another layer of structure above the "explicit performative" in (17), as in (18).

(18)  [I'm telling you that] you are hereby authorized by John and myself to buy that ship.

This possibility leads to another problem: how do we know when to stop adding another layer that encodes the illocutionary force of the utterance? That is, we face the problem of a potential infinite regress. Interestingly, Gardiner (1932: 191) identified this problem even before Ross's formulation of the performative hypothesis: "the attempt to assert the quality of a sentence within that sentence itself does but involve us in an infinite regress." To resolve this problem, Gardiner concludes that those linguistic elements that indicate illocutionary force merely indicate rather than describe it. This is a step toward acknowledging the special character of the meaning involved: it is not part of the regular truth-conditional meaning of a sentence. For some, this type of meaning should not be considered part of the sentence proper, and thus should not fall within the domain of syntax and semantics. Instead, it belongs to the realm of pragmatics, which brings us to the last argument against the syntacticization of speech acts.

2.5.3 Arguments against the Syntacticization of Pragmatics

The final set of arguments against the performative hypothesis concerns a general rejection of including pragmatic functions in syntax. The first family of criticism of the syntactically based performative hypothesis stems from the general criticism of its framework: generative semantics. There was a general rejection of the idea that meaning is generated syntactically; instead, semantics was argued to be interpretive: syntax generates sentences, and semantics interprets them. Much of the empirical domain that generative semanticists tried to cover was considered not to be amenable to sentence-level analysis. This includes phenomena that critics of generative semantics considered to be part of performance. According to Newmeyer (1996: 120), the collapse of generative semantics had to do with its "practice of regarding any speaker judgment and any fact about morpheme distribution as a de facto matter of grammatical analysis." The syntacticization of speech acts falls squarely into this domain as well: speech acts are typically viewed as a pragmatic phenomenon, and pragmatics, as the study of language use, appears to belong to linguistic performance rather than competence. However, equating the distinction between competence and performance with the distinction between syntax/semantics and pragmatics is not logically necessary.


Rather, there are several traditions that view speakers' knowledge of how to have conversations as part of language competence (Campbell and Wales 1970, Hymes 1972, (Ochs) Keenan 1974, Mittwoch 1976, Chomsky 1978, 1980, Ginzburg 2012, among others). It is not a priori clear what type of phenomena should or should not be subsumed under a syntactic analysis. In fact, in recent years, work on the syntax–pragmatics interface has been on the rise. While the conceptualization of a syntax–pragmatics interface (what Ross 1975 called pragmantax) did not fit into the framework of the 1970s, things have changed. Indeed, syntax itself may be considered an interface domain: it mediates the relation between form and meaning. Within the generative tradition, this is reflected by models in which there is no direct relation between form (PF) and meaning (LF), as in the T-model of Government and Binding Theory and the Y-model of Minimalism. In the next section, I review some of the changes in syntactic theory since the Ross/Sadock version of syntacticizing speech acts that have paved the way for a theory that does not face the same problems as the original performative hypothesis. As McCawley (1985: 61) notes, "The problems [the performative hypothesis] was intended to deal with have not been solved so much as ignored" (see also Davison 1981). Current versions of the performative hypothesis are taking up the challenge, as I show next.

2.6 Syntacticizing Speech Acts Part II: Functional Architecture

Over the past two decades, there have been numerous updated versions of the Ross/Sadock performative hypothesis; I refer to these as the neo-performative hypothesis (Wiltschko and Heim 2016). These analyses differ in terms of the data they are meant to explain and in the details of the analysis. But they all share the idea that speech act structure is part of the extended functional architecture, as in (19).

(19)  [FP F [CP ... ]]   (FP = SA-structure; CP = p-structure)

Assuming a functional architecture above the propositional structure makes it possible to derive the interpretation by means of abstract functors (F) rather than lexical material.


Many of the current analyses preserve the same content as the original performative hypothesis, in that the functional projections contain at least the speech act roles (speaker and addressee). The abstractness of the functors makes it possible to model speech act structure without a deletion process. And if speech act structure is not deleted, we expect that it might be spelled out, at least sometimes. This is indeed the case, and many of the current analyses draw on overt evidence for speech act structure, such as sentence-final particles (Haegeman 2014) or vocatives (Hill 2014). However, there is no consensus on the nature of the heads that introduce these roles, which, I argue, raises some issues for the enterprise.

2.6.1 Theoretical Background

Current syntactic theory differs from the framework within which the original performative hypothesis was couched in two major ways. One has to do with the architecture of grammar; the other has to do with the conceptualization of the clause.

2.6.1.1 The Model
The neo-performative analysis is couched within a framework in which syntax is no longer viewed as the subdomain of linguistics responsible for combining words into sentences. Several innovations defined syntactic modeling in the early days of the generative tradition (see Wiltschko 2017a, 2018 for discussion). First, syntax is decomposed into a series of syntactic structures, including phrase structure, transformational structure, and morphophonemics (Chomsky 1957). Second, the building blocks of syntax are no longer just words: in Chomsky (1957), affixes are included among the building blocks. The rule of affix hopping, for example, refers to inflectional morphology. In addition, in the Standard Theory of the 1970s, D-S interfaces with meaning and Surface Structure (S-S) generates the overtly spelled-out form, as in Figure 2.7. If D-S serves as the input for meaning, it follows that transformational rules (TRs in Figure 2.7) cannot change meaning. This led to the postulation of abstract morphemes that are interpreted and yet may trigger a transformation. An example of such a morpheme is the abstract question feature (Q), which simultaneously gives rise to a question interpretation and triggers subject–auxiliary inversion. In this way, the abstract morphemes introduced in the Standard Theory can be viewed as precursors of the abstract syntactic features that play an important role in current syntactic theory. The decomposition of syntax into a series of syntactic structures made it possible to be much more explicit about the relation between syntax and the other subdomains of grammar.


In particular, one of the core assumptions introduced in Chomsky (1957) is the idea that the relation between form (phonology and phonetics) and meaning (semantics) is mediated by syntactic computation (transformational rules), as in Figure 2.7. This assumption is still central to the model of grammar currently assumed. Specifically, the model of the Standard Theory was replaced by the Principles and Parameters (P&P) model of the 1980s (model on the left in Figure 2.8) and the Minimalist model introduced in the 1990s (model on the right in Figure 2.8). Note that the relation between form and meaning is still mediated by syntactic computation, although it is more complex. Within the P&P model, there are two abstract levels of representation (D-S and S-S), where D-S interfaces with the lexicon (a list of words and morphemes) and S-S feeds both Phonetic Form (PF) and Logical Form (LF). These syntactic levels of representation (D-S and S-S) were abandoned within the Minimalist Program in favor of a model where the principles, filters, and constraints that played a role at these levels were built into the computation itself. What remains is the same core idea: syntax mediates the relation between form and meaning.

Figure 2.7 The model of the Standard Theory: D-S feeds meaning; transformational rules (TRs) map D-S to S-S, which feeds form

Figure 2.8 Generative models of grammar: in the P&P model (left), D-S is mapped to S-S via move alpha, and S-S feeds both PF (form) and LF (meaning); in the Minimalist model (right), syntactic computation directly feeds PF and LF


2.6.1.2 The Fate of S
One of the core assumptions of linguistic description and analysis is that the sentence is the major unit of analysis for both syntax and semantics. For modern semanticists working in the footsteps of Frege, the sentence is the domain where propositions are expressed and a truth value can be assigned. For syntacticians, it is the domain that contains (at least) a subject and a predicate. The primacy of the sentence as a unit of analysis for syntax is reflected in the fact that it was used as the label (S) for the root of sentence structure. But in some contexts, sentences extend beyond a simple subject–predicate structure. Consider for example the embedded clause in (20a) and the wh-question in (20b).

(20)  a. I know [that [prices slumped]S]
      b. Why did [prices slump]S

As indicated by the brackets, the string of words that corresponds to the subject–predicate domain (prices slumped) does not constitute the full clause. In (20a), the complementizer that precedes S, and in (20b), the wh-word (why) and the auxiliary inflected for tense (did) do. So, if there is material outside of S, how should the complex structure be labeled? Bresnan (1972) proposes the phrase-structure rule in (21), which combines an abstract functor COMP with S to return Sʹ.

(21)  Sʹ → COMP S, where COMP can be filled by that, for, and WH (= "Q") (Bresnan 1972: 13 (8))

The label Bresnan proposes (Sʹ) suggests that the core of the clause, the domain of subject and predicate, is extended; hence, it is no longer quite like S. Nevertheless, we still have the intuition that we are dealing with a sentence. The development that led to abandoning S as a label was the introduction of Xʹ-theory (Jackendoff 1977). All phrases based on lexical categories (N, V, A, and P) are endocentric: their category label is determined by a head of the same category. Since this holds for all categories, it can be generalized to the rule in (22).

(22)  XP → . . . X . . .

If, however, the sentence is a phrase, we have a problem: neither the rule for the composition of a (core) sentence (S) in (23a) nor the rule for the composition of the “extended sentence” (Sʹ) in (23b) is endocentric.


(23)  a. S → NP VP
      b. Sʹ → COMP S

For the rule that derives sentences to conform to Xʹ-theory, the second core innovation of Chomsky (1957) comes into play, namely the assumption that UoLs other than words belong to the building blocks of syntax. In addition to containing a subject and a predicate, a well-formed English sentence requires the verb to be inflected for tense and agreement. And crucially, the distribution of inflectional morphology fits the bill of a syntactic head: it is obligatory and it is unique. Based on this evidence, Travis (1984) introduces the idea that the head of the sentence is INFL(ection), hosting two features: tense and agreement. Similarly, to make the rule that derives extended clauses (Sʹ) conform to Xʹ-theory, Chomsky (1986) proposes that the head of Sʹ is COMP and hence its label should be CP. Thus, the full structure of the (extended) sentence according to the extended Xʹ-theory (Chomsky 1986) is illustrated in (24).

(24)  [CP Spec [Cʹ C [IP NP [Iʹ I [VP . . . ]]]]]
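The endocentricity requirement behind (22) and (24) can be illustrated with a small sketch. The Python fragment below is a toy model under my own assumptions (the names Head and Phrase are not standard notation): a phrase projects its label from its head, which is why the clause can be labeled IP or CP, whereas the headless rule S → NP VP provides nothing from which a label could be projected.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Head:
    category: str   # e.g., "I", "C", "V", "N"
    form: str = ""  # e.g., "-ed", "that"

@dataclass
class Phrase:
    head: Head
    specifier: Optional["Phrase"] = None
    complement: Optional["Phrase"] = None

    @property
    def label(self) -> str:
        # Endocentric labeling (XP -> ... X ...): the label is projected from the head.
        return self.head.category + "P"

# [CP C [IP NP [I [VP ...]]]]: IP is labeled by its head I, CP by its head C.
vp = Phrase(Head("V", "slump"))
np = Phrase(Head("N", "prices"))
ip = Phrase(Head("I", "-ed"), specifier=np, complement=vp)
cp = Phrase(Head("C", "that"), complement=ip)
print(ip.label, cp.label)  # IP CP
# The old rule S -> NP VP combines two phrases without a head,
# so there is no category from which a label could be projected.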

Given these innovations, the definition of the sentence is no longer a simple matter. Suppose we define a sentence as a phrase that contains both a subject and a predicate. We quickly observe that this definition will not suffice: not every string of words that fulfills this criterion can function as an independent sentence. To see this, consider the clauses in (25)–(26).

(25)  I saw [John walk his dog]

(26)  a. *John walk his dog
      b. John walks his dog.
      c. John walked his dog.

There are clauses – known as small clauses – that consist solely of a subject and a predicate. However, in English they are restricted to complements of verbs of perception, as in (25). Small clauses cannot be used as independent clauses (26a); instead, the verb needs to be inflected (26b). Note that clauses that can be used as independent sentences are not well formed in all environments either. For example, as complements of regret (27a) and wonder (27b), an embedded clause has to be introduced by a complementizer.

(27)  a. John regrets [that he walked his dog]
      b. John wonders [whether he walked his dog]

Thus, clauses differ in size depending on their context. Small clauses have been analyzed as bare VPs (Stowell 1983); independent declarative clauses are analyzed as IPs; and other clause types, such as questions, are analyzed as CPs.


Hence, we might conclude that some sentences are IPs and others are CPs. The explosion of functional architecture over the past few decades has made it even more difficult to identify the syntactic category of the sentence. For example, within the cartographic framework, Rizzi (1997) develops an analysis of the left periphery, arguing that the CP should be decomposed into (at least) four different functional projections, as in (28).

(28)  Force > (Topic) > (Focus) > Fin(iteness) > IP . . .

The (articulated) complementizer system serves as the interface between the propositional content (expressed in the IP) and the superordinate structure. The latter may be a higher clause or the articulation of discourse (Rizzi 1997: 283). The core of the system is Finiteness and Force. The former looks downward into the content of the propositional structure (IP): hence Fin-complementizers are sensitive to whether or not the IP contains a finite structure. The latter (Force) looks upward, and as Rizzi (1997: 283) observes: "Complementizers express the fact that a sentence is a question, a declarative, an exclamative, a relative, a comparative, an adverbial of a certain kind, etc., and can be selected as such by a higher selector. This information is sometimes called the clausal Type (Cheng 1997), or the specification of Force (Chomsky 1995)." Note that in this body of work – which marks the beginnings of a new wave of encoding speech act structure in syntax – clause typing is equated with force. This implies that the notion of force used in the syntactic literature is of a different nature than the notion of illocutionary force used in the philosophical and pragmatic literature (though this is not always acknowledged).

There is an important lesson to learn here. While the sentence was a primitive syntactic category (S) within the generative tradition up until the mid-1980s, its replacement by functional categories necessarily changes this view. As is clear from Rizzi's observation above, it is a common assumption that IP serves as the propositional structure, whereas the CP-domain serves to link the propositional structure to the higher structure. Thus, it seems that the sentence is not a well-defined syntactic category. Depending on one's analytical preferences, it may be identified as IP, CP, ForceP, or whatever other functional category one takes to define the root of the sentence. We may provide a contextual definition such that we identify it as the highest functional category associated with propositional structure. But if there is indeed evidence for the articulation of discourse in the form of functional architecture, then we have to consider this very structure to be part of the sentence. And if propositional structure is indeed demarcated by IP, then we might conclude that not every aspect of sentence meaning is truth-conditional.


2.6.2 A Dedicated Speech Act Structure

Speas and Tenny (2003) introduce the idea that above the propositional structure there is a dedicated speech act structure, the SpeechActPhrase (SaP). While other scholars had already proposed ideas along these lines (see Etxepare 1997, Rizzi 1997, Ambar 1999, Cinque 1999), what sets Speas and Tenny (2003) apart is that they explicitly argue that the root clause is dominated by a dedicated speech act structure, rather than analyzing it as part of an articulated CP. In this structure, force, speech act roles, and point of view are encoded as part of the functional architecture of (root) clauses. Speas and Tenny (2003: 315) argue that the configuration of these pragmatic notions is regulated by basic syntactic principles. Following the theory of argument structure in Hale and Keyser (2002), they propose that, like argument structure, speech act structure may have maximally three arguments: the speaker, the addressee, and the utterance content. Assuming that binary branching is an essential restriction on syntactic structures, a three-place predicate has to be decomposed into two two-place predicates. They thus propose the articulated speech act structure in (29a), akin to the articulated argument structure in (29b).

(29)  a. speech act structure: [saP Speaker [ sa [sa*P Utterance content [ sa* Hearer ]]]]
      b. argument structure: [vP AGENT [ v [VP THEME [ V GOAL ]]]]

This allows for a direct translation of Ross's (1970) analysis into functional architecture: the speaker corresponds to the agent, the addressee corresponds to the goal, and the utterance content corresponds to the theme (I give the utterance to you). Moreover, just as argument structure can be conceptualized as event structure by shifting the focus to the contribution of the verbal heads, so too can speech act structure be viewed as encoding the speech event (Etxepare 1997). The other empirical domain Speas and Tenny (2003) cover is the encoding of point of view (POV), as in evidentials, logophoric pronouns, long-distance binding, speaker-evaluative adverbs, and switch reference.


They propose an articulated POV structure, consisting of a domain for evidentiality (EvidP) and one for evaluation (EvalP). The arguments of each of these projections include the Seat of Knowledge (i.e., the POV holder), hosted in their specifier positions, and constraints on POV receive a syntactic analysis. Thus, Speas and Tenny syntacticize major aspects of the speech act–clause type mapping as well as constraints on POV, as in (30).

(30)  [ SA-structure [ POV-structure [ p-structure ]]]
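The layering in (29)–(30) can be made concrete with a toy representation in which each layer embeds the next and introduces its own arguments. The class names below (SpeechActLayer, POVLayer, PropositionalStructure) are my own illustrative assumptions, not Speas and Tenny's notation.

from dataclasses import dataclass

@dataclass
class PropositionalStructure:
    content: str  # p-structure, e.g., "I have a new dog"

@dataclass
class POVLayer:
    seat_of_knowledge: str              # the point-of-view holder
    complement: PropositionalStructure

@dataclass
class SpeechActLayer:
    speaker: str           # corresponds to the agent
    hearer: str            # corresponds to the goal
    complement: POVLayer   # the utterance content corresponds to the theme

utterance = SpeechActLayer(
    speaker="S",
    hearer="A",
    complement=POVLayer(
        seat_of_knowledge="S",
        complement=PropositionalStructure("I have a new dog"),
    ),
)
# The nesting mirrors (30): SA-structure > POV-structure > p-structure.
print(utterance.complement.complement.content)  # I have a new dog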

Including a domain dedicated to contextually determined interpretation of this type, as in Speas and Tenny (2003), is theoretically and empirically motivated. Theoretically, the argument is one of economy: since the same ingredients have been established for argument structure, the range of empirical phenomena they cover receives an explanation without adding new assumptions. The only new assumption is that there is an extra layer of structure that dominates the propositional structure and that this structure is concerned with phenomena that are typically viewed as pragmatic. The latter assumption is precisely what some take issue with (see section 2.5.3). That is, syntax has been argued to be context-free, and the domain of linguistics that deals with context-dependent meaning is taken to be pragmatics. However, over the past few decades, syntacticians have included abstract arguments in the syntactic representation whose interpretation is contextually determined. For example, tense can be analyzed as a syntactic head (T), which relates the reference and event time (encoded in Aspect and VP, respectively) to the utterance time (Demirdache and Uribe-Etxebarria 1997). As argued by Stowell (1996), the interpretation of these abstract arguments (sometimes conceived of as PRO) is restricted by familiar grammatical mechanisms (e.g., binding and control). Hence, adding linguistic objects whose interpretation is contextually restricted is an independently motivated assumption.


Empirically, the arguments for the neo-performative hypothesis are as follows. Speas and Tenny derive the restricted number of clause types across languages and the restricted number of speech act roles (though see Gärtner and Steinbach 2006 for some critical remarks). By proposing a syntactic analysis, they are able to address the longstanding problem of the mapping between clause type and speech act type. And finally, Speas and Tenny are able to account for a range of phenomena that involve the interaction between speech act type and point of view.

Besides the interpretive evidence for the neo-performative analysis, several scholars have since corroborated Speas and Tenny's claim based on a wide range of empirical phenomena. One of the most striking pieces of evidence – already discussed in Ross (1970) – is agreement with speech act participants, even when they are not part of the argument structure of the verb. Since Ross (1970), this phenomenon has been reported for several languages, including Galician (Uriagereka 1995), Magahi (Verma 1991), and Mupun (Frajzyngier 1989). Moreover, evidence has come forth that clearly situates this type of agreement within syntactic structure: for example, it is sensitive to clause-type restrictions (Oyharçabal 1993, Miyagawa 2017). Several authors show that agreement with speech act participants is syntactically conditioned, suggesting that these speech act participants have to be part of the syntactic representation (Zu 2013, Haddican 2015). Other empirical domains that have been analyzed as part of a pragmatic structure are vocatives (Hill 2007, 2013) and discourse particles (Munaro and Poletto 2002, Speas and Tenny 2003, Pak 2006, Davis 2011, Saito and Haraguchi 2012, Haegeman and Hill 2013, 2014, Saito 2015, Woods 2016a, among others). An analysis in terms of speech act structure allows for a straightforward account of their form, function, and interpretation. Their formal properties are consistent with the neo-performative versions of speech act structures: they are often derived from verbs (Cardinaletti 2011) and hence are compatible with the assumption that the speech act head is a predicate of sorts (Haegeman and Hill 2013, Haegeman 2014: 371).

Thus, over the past few decades, there has been significant work on the syntax of speech acts. This work draws on the main insight of Ross (1970) (and Sadock 1969b). The empirical evidence for the syntacticization of speech acts has broadened and now includes not only interpretive properties but also overt morphosyntactic ones. And assumptions about the role and architecture of syntax have changed since Ross. These changes, which define the neo-performative hypothesis, make it immune to some of the criticism leveled at the original performative hypothesis. Recall from section 2.5 that there are three types of arguments against the syntacticization of speech acts.


(i) Arguments against Austin's performative hypothesis
(ii) Arguments against the validity of the empirical evidence presented by Ross
(iii) Arguments against the syntacticization of pragmatic phenomena

These arguments are now addressed as follows. I start with the third argument. Proposing a syntactic speech act structure goes beyond syntacticizing pragmatic phenomena. Adding a layer above the propositional structure is motivated by the fact that utterances may contain linguistic objects that refer to the proposition expressed in the utterance: they may express an attitude toward this proposition or the speaker's assumptions about the addressee's attitude toward it. This suggests that these UoLs have to be combined with – and thus be higher than – the propositional structure. For some of these UoLs, especially those that appear at the sentence periphery, it might be argued that they are not part of sentence structure but are instead added outside the clause (Kaltenböck, Keizer, and Lohmann 2016). This, however, raises the question of what computational mechanism is responsible not only for combining these elements but also for prosodically integrating them. Syntax is precisely the module responsible for doing that: combining UoLs into a structure that can be interpreted and pronounced.

As for the validity of the empirical evidence, there are two points to be made. First, just because (some of) the phenomena discussed in Ross (1970) may receive a different explanation does not imply that the proposal itself is wrong. Second, one of the empirical arguments concerns the problem of infinite regress: nothing prevents the performative clause from being dominated by another one, and another one, and so on. However, this problem only arises if the performative clause is viewed as yet another instance of propositional structure, as in Ross (1970). Under the neo-performative analysis, the problem dissolves: the performative structure is part of the functional clausal architecture, and whatever is responsible for generating this functional architecture will prevent recursive application. That is, if functional categories (like TP, CP, etc.) are not subject to the problem of infinite regress, neither is the speech act structure. This requires us to assume that functional architecture is constrained in some way and that it goes beyond the application of Merge.

Moreover, one of the original problems raised by the performative hypothesis has to do with the mapping of clause types onto speech acts: while there are only a handful of (universal) clause types, there are potentially infinitely many speech act types. Speas and Tenny's (2003) proposal addresses this. Tenny (2006: 283) argues that "the grammaticized Speech Acts indicated syntactically within the Speech Act Projection do not correspond to all the different types of illocutionary acts that are possible – these must remain in the pragmatics of the language. Only a small set of basic speaker/addressee relations is grammaticized in a spare, stylized template, which the users of the language can then employ in creative ways to communicate with each other."


Finally, let us turn to the argument against Austin's performative hypothesis. Since Speas and Tenny's neo-performative hypothesis (along with most other proposals that follow in their footsteps) is directly built on Ross's (1970) insight, which in turn is based on Austin (1962), this argument does not disappear. This is one of the reasons I postulate a structure quite different in nature from existing analyses.

2.6.3 New Theories, New Problems

There are several problems with the neo-performative hypotheses, which I now discuss. First, speech acts are complex phenomena (Beyssade and Marandin 2006). They are best viewed as consisting of at least two components: (i) the speaker's attitude or commitment toward the utterance, and (ii) the speaker's call on the addressee. Both of these components play a role in the original speech act theory in the form of the speech act trichotomy (locution, illocution, perlocution). However, the discussion of speech acts and the literature that seeks to syntacticize them typically ignore the perlocutionary act: the locutionary act corresponds roughly to the propositional structure and the illocutionary act corresponds roughly to the speech act structure, but, with the exception of Beyssade and Marandin (2006), there is hardly any discussion of the perlocutionary act. Their notion of the call on the addressee is closely related to (though not the same as) Austin and Searle's notion of perlocution. Crucially, there are UoLs that appear to target just that: they serve to encode what the speaker wants the addressee to do with the utterance. For example, rising intonation encodes the speaker's request for the addressee to respond (Heim 2019a). Similarly, the use of please in English has been analyzed as a speech act head that marks the utterance as a request (Woods 2016b), and requests require a response from the addressee. If speech acts can indeed be decomposed, then they cannot be treated as primitives. We then expect that they can be composed in different ways within and across languages, giving rise to interesting typological questions. Even though Searle raised the question of variation early on, it has not been properly addressed within the generative tradition.

A second problem with the neo-performative hypothesis has to do with the proposed structure. Speas and Tenny (2003) propose (as Ross did) that the speaker argument is higher than the addressee argument. This analysis faces problems in light of certain ordering restrictions on sentence-final particles. If a language has both speaker- and addressee-oriented sentence-final particles, then the speaker-oriented one is linearized in a position closer to the propositional structure than the addressee-oriented one (see Lam 2014, and section 5.4.3 below), and speaker-oriented but not addressee-oriented particles show clause-type restrictions (Lam 2014).


This suggests that the speaker-oriented projection, but not the addressee-oriented one, is in a syntactic position close enough to the propositional structure to impose selectional restrictions (see section 5.4.4).

Finally, the third problem is one that holds more generally for functional architecture: it is not clear what determines the labels. This holds equally for categories inside the propositional structure and for speech act structure. The problem becomes clear when looking at the various proposals for syntacticizing speech acts. The most pervasive label used is simply speech act phrase (Speas and Tenny 2003, Hill 2007, Krifka 2013, Haegeman 2014, Servidio 2014, Kido 2015, Zu 2015, Corr 2016, Woods 2016a, among others). This label is inherently problematic, given the complex character of speech acts: if they are not primitives, "speech act" should not be used as a label. However, in the absence of clearly defined criteria for how to determine the label of a functional category, it is hard to know what the alternative should be. Other than speech act phrase, we find several other labels in the literature. For example, Cinque (1999: 106), within his cartographic framework, uses the label Mood subscripted with "speech act" (Mood[speech act]), along with other Mood heads that serve speech act-related functions (i.e., Mood[evaluative], Mood[evidential], and Mood[epistemic]). Others use the term ForceP (Rizzi 1997, Haegeman 2014, Paul 2014), either alone or in combination with other projections in the left periphery, reflecting the complexity of speech acts. Additional labels include Illocutionary Force (Coniglio and Zegrean 2012, Corr 2016), Illocutionary Act (Woods 2016b), C(lause)T(ype)P (Coniglio and Zegrean 2012), Disc(ourse)P (Benincà 2001, Garzonio 2004), Prag(matic)P (Hill 2007), AttitudeP (Paul 2014), and Part(icle)P (Haegeman 2014). This wealth of labels makes it clear that the field needs a principled way of determining such labels that allows us to account for universals and cross-linguistic variation.

2.7 Conclusion

The goal of this chapter was to set the stage for the grammar of interactional language. One of the core pillars of my proposal is firmly rooted in the body of literature that aims to syntacticize speech acts. There are two key insights behind this enterprise that remain relevant. First, according to classic speech act theory, when speakers utter a sentence, they not only say it, they do something with it (Austin 1962). Second, even though speech acts traditionally belong to the domain of pragmatics, there have long been proposals for understanding them from a syntactic point of view. In particular, Ross developed the performative hypothesis, according to which every sentence is embedded in a speech act structure.


Within this structure, the speech act interpretation is determined, even though in the original proposal the structure is not spelled out. With the demise of generative semantics, and due to a series of arguments against it, the performative hypothesis was largely abandoned. However, it has been resurrected over the past 20 years. This was made possible by the introduction of functional architecture and the inclusion of contextual information in syntactic structure. Current versions of the syntax of speech acts suffer from some drawbacks as well, however. Many of these syntactic analyses directly draw on Ross's original analysis. The problem is that speech act theory has since developed, but the insights of the pragmatic, semantic, and philosophical literature in this area have not made it into the literature on the syntax of speech acts. Many of the current approaches toward speech acts focus on the interactional dimension of speech acts (see Chapter 3): the speaker's communicative intentions are taken into consideration, and so is the role of the addressee. In contrast, in classic speech act theory the addressee matters only for the sincerity conditions. Hence, the absence of the interactional dimension in classic speech act theory is mirrored in the syntacticization of speech acts. While the addressee is encoded as the goal of the speech act, there is no room for interactivity: the transfer of the descriptive content of what is being said from the speaker to the addressee is viewed as non-negotiable. Simplifying a bit, the speech act model on which these analyses are based is as in Figure 2.9. In the case of assertions, the speaker says something, thereby doing two things.

Figure 2.9 A Stalnakerian common ground update: the asserted proposition p is added to the common ground, so that CG = {q, r} becomes CG = {p, q, r}


First, S gives A information. And second, this information becomes part of the common ground (CG). All of the things we do with language involve the addressee: we do something to and with others. While the role of the addressee is recognized in the informal description of what we do with words, it plays virtually no role in classic speech act theory.
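The Stalnakerian update in Figure 2.9 can be stated in a few lines. The sketch below is a deliberately simplified toy model (representing propositions as strings and the common ground as a set is my own simplification): asserting p simply adds p to the set of mutually accepted propositions, and nothing in the model requires the addressee to signal acceptance.

def assert_update(common_ground: frozenset, p: str) -> frozenset:
    # A Stalnakerian assertion of p: CG becomes CG ∪ {p}.
    # Note what is missing: there is no step at which the addressee reacts.
    return common_ground | {p}

cg = frozenset({"q", "r"})
cg = assert_update(cg, "p")
print(sorted(cg))  # ['p', 'q', 'r']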

3 From Speech Acts to Interaction

It takes two to tango.

3.1 Introduction

This chapter provides a review of current frameworks for understanding interactional language; it serves as the basis for developing a grammar of interactional language. As we saw in Chapter 2, classic speech act theory introduces a differentiation between what is said (the locutionary act), what is intended by the speaker (the illocutionary act), and what is (sometimes) effected in the addressee (the perlocutionary act). Speech acts are associated with particular conditions that regulate their felicitous use. For example, an assertion is felicitous only if (i) the speaker believes what they say and (ii) the speaker wants the addressee to also believe what they say. This is a step toward understanding the action of assertion. What is missing is the interactional dimension. Intentions do not exist in isolation; in linguistic interaction, intentions are addressee-oriented. Speakers intend their utterances to have a certain effect on their interlocutors, if only to be understood. In (1), I's utterance is a felicitous assertion, but since R does not respond, it is not a felicitous interaction.

(1)  Context: Anne spontaneously decided to buy a new dog. She didn't tell anyone. Next time she sees her friend Betty, Anne tells her the news:
     I  I have a new dog.
     R  #[doesn't respond]

If understanding is not signaled, the speaker's intention is not fulfilled. The absence of a response in (1) would probably prompt a follow-up request for a response by I (e.g., So what do you think?). Hence, for a felicitous assertion, it is not sufficient for the speaker to want the addressee to believe what they say. The addressee also has to indicate that – as a result of the assertion – they now believe what was said. This is commonly achieved with a response marker, as in (2).

(2)  I  I have a new dog.
     R  Oh, yeah? (Congratulations!)

The requirement for the addressee to indicate their acceptance has influenced the way speech acts are viewed in some frameworks. There are two ways the addressee plays a role in interaction: the speaker may involve the addressee by requesting a response, and the addressee gets involved by providing a response. It is these two types of acts (initiation and reaction) that constitute the essential ingredients of interaction. That is, linguistic interaction is not simply made up of a series of speech acts strung together like beads on a string; rather, these speech acts are systematically linked. Speech acts cannot be interpreted in isolation. The smallest unit of analysis for interactional language is the sequence of initiation and reaction (Weigand 2016). The main hypothesis I defend in this monograph is that the grammar of natural languages is configured to make interaction possible and that it is sensitive to the distinction between initiation and reaction.

The purpose of this chapter is to provide an overview of models that take the interactional dimension into consideration. As illustrated in Figure 3.1, these models can be classified into dialogue-based and grammar-based models; within the latter, we can distinguish between functional and formal approaches, and further between those that focus on semantic aspects and those that focus on syntactic aspects. We can glean insights into the empirical phenomena that define interactional language from dialogue-based approaches. However, these studies lack a framework for disentangling the various factors that contribute to the interpretation of discourse markers; consequently, it becomes impossible to develop a typology. In contrast, formal syntactic approaches, while well equipped for cross-linguistic comparison, have not explored interactional language and consequently they lack the empirical base necessary to develop this typology.

Figure 3.1 Analyzing the interactional dimension: ways to analyze the interactional dimension divide into dialogue-based and grammar-based approaches; grammar-based approaches divide into functional and formal, and formal approaches into semantic and syntactic


To be sure, I do not aim to provide a detailed critique of existing frameworks, nor do I intend to argue that a formal framework of the type developed here is superior to any of the existing frameworks. That would be beside the point: each of the frameworks reviewed, as well as the one developed here, has different perspectives and goals. What I want to do in this chapter is review existing frameworks in order to determine what ingredients an adequate theory of interactional language requires and what type of data it will have to account for. As a secondary goal, this chapter is the first attempt to survey these frameworks. There is little interaction among the different frameworks, despite the fact that they have a similar object of investigation. Thus, the purpose of this monograph is to lay the groundwork for a typology of interactional language.

3.2 Philosophical Underpinnings

Classic speech act theory emphasizes that by uttering a sentence, we not only say something, we also do something. This gives rise to a type of meaning that goes beyond propositional meaning, which had been the subject of logico-philosophical investigation up until Austin's lectures. Note, though, that Frege (1892, 1918) was already concerned with demarcating different types of meanings; specifically, he identified a class of meanings that do not "affect the thought" or "touch what is true or false." In this section, I review the philosophical roots of this dichotomy of meaning.

3.2.1 Assumptions about Conversations

There are aspects of meaning that come about not because of what is said but because of what the speaker means. For some aspects of speaker meaning, it is crucial to be aware of the context and to share assumptions about how conversations proceed. Consider the interpretation of indirect speech acts such as (3). What the speaker says is a question about the ability of the interlocutor. But this is not how this sentence is typically used; the speaker intends for the addressee to pass the salt.

(3)  Can you pass the salt? (Searle 1975: 61)

Speakers have no problem understanding the utterance as a request. But how can you say one thing and mean something else? According to Grice, it has to do with assumptions we make about conversations. There are three ingredients for understanding aspects of meaning that go beyond the literal meaning of an utterance.


(i) Interlocutors have to recognize the communicative intention of the speaker
(ii) Interlocutors rely on some mutually shared background information (Searle 1975)
(iii) Interlocutors rely on the principles of cooperative conversation (Grice 1975)

Now suppose the addressee is sitting next to the saltshaker, has a free hand, and is not obstructed from passing the salt. In this context, the addressee will recognize that the literal meaning cannot be what the speaker intends to convey: the answer would be obvious, and thus the speaker would not have to ask. So, there must be some other intention behind the speaker's utterance. And this is where Grice's cooperative principle in (4) comes into play.

(4)  Cooperative Principle
     Make your contribution such as is required, at the stage at which it occurs, by the accepted purpose or direction of the talk exchange in which you are engaged. (Grice 1975)

Thus, Grice divides meaning into two types: the meaning of what is said and the intended meaning that goes beyond it. The latter type of meaning is known as implicature, and it comes in two guises: conventional (which depends on particular lexical items) and conversational (which is based on the assumption that the speaker is cooperative). Implicatures arise because of what is done, not because of what is said. Grice offers an explicit account of effective communication: the interlocutors rely on assumptions about the world, about the social context, about the discourse context, and about the ongoing conversation, as well as on the rules of conversation. In short, conversational implicatures arise because of assumptions we make about the normal course of a conversation. These assumptions are not built into the grammar. They are a matter of pragmatic inferencing; they are situated in those aspects of the force of a sentence that are purely contextual and hence outside of grammar, as illustrated in Figure 3.2. Unless the interlocutors know each other well, it can be hard to detect the exploitation of a maxim (as in indirect speech acts). In this case, it behooves the speaker to mark that the normal course of the conversation is being violated, so that the addressee is led to look for the implicature. In English, discourse markers and intonational contours may be used for exactly this purpose. Take for example the use of well in (5).

I    Where is it?
R    Uh, well, I’m not sure about how much you know about China. Well, it’s a beautiful city.
(Esfandiari Baiat et al. 2013: 285 (3))

[Figure 3.2 Situating assumptions about the normal course of conversation. Labels: F(p); pragmatics: F(Cx), assumptions about the normal course of conversation; grammar: F(G), illocutionary force; p, propositional content]

R ’s response does not answer I ’s question. The use of well, sometimes classified as a delay marker (Li and Xiao 2012), signals that the response is relevant (despite appearances). According to Esfandiari Baiat et al. (2013: 284) “speakers may use some linguistic signals to hold the floor and to make the seemingly incoherent units of utterances into a coherent one; well is among such linguistic signals.” Specifically, well can signal (among other things) that the response is not the response requested, but that there is a reason for the delay; something else needs to be said first. Hence, well signals that the normal course of the conversation is violated: in this case, the requirement for an appropriate response. By acknowledging this violation, it is implied that there is a reason for it. Crucially, markers of this kind, which regulate the flow of the interaction, are integrated into the grammar. They are often words that have other functions as well (well is also an adverb), they display specific ordering restrictions, and they are integrated into the utterance prosodically. This suggests that there is a direct interface between grammar and the speaker’s intentions and their assumptions about the ongoing conversation. Grammar is configured to make such markers available, to linearize them, and for them to be interpreted. This is what I refer to as the grammar of interaction. 3.2.2

3.2.2 Different Ways of Doing Things with Language

It is recognized across different frameworks that there are different types of meanings and that these meanings have to somehow be integrated into the composition and computation of meaning. One of the key aspects of interactional language is that the addressee plays a crucial role. This is not typically reflected in mainstream philosophical discussions. There are some exceptions, however. Take for example Bühler’s model of communication – the organon model (Figure 3.3) – which recognizes three dimensions of meaning (representation, expression, and appeal).

[Figure 3.3 Bühler’s organon model. Labels: Objects and States of Affairs; Representation; Expression; Appeal; Sender; Receiver; S]

The representation function corresponds to propositional meaning, the expression function corresponds to the illocutionary act – it is used for encoding the speaker’s meaning – and the appeal function (which he also refers to as the triggering function) refers to the fact that an utterance needs uptake. It is in this way that with the use of language, we can change the world (Bühler 1913: 45). Bühler’s organon model foreshadows Sadock’s three-way classification of meaning into (i) a representational aspect, (ii) an effective, social aspect (what change does it bring about?), and (iii) an affective, emotive aspect to express the feelings and attitudes of the speaker. What is crucial for our purposes is the insight that the addressee plays an important role in the configuration of meaning. While Austin’s speech act theory contained elements of the addressee-oriented function (i.e., the perlocution), most of the discussion following speech act theory was about the speaker-oriented meaning (and the relation between locution and illocution). Much less work has been done on the perlocutionary aspect of speech acts (see Lee 1974, Gaines 1979, Marcu 2000, Weigand 2010 for some exceptions), including addressee uptakes and how they relate to the interpretation of speech acts. This is despite the fact that Austin makes it very clear that many speech acts require the addressee’s uptake for their felicitous use (e.g., a bet is only felicitous if the addressee accepts that bet). Thus, within the philosophical tradition there is a lack of discussion of the addressee-oriented dimension and consequently there is also a lack of discussion of the interactional dimension (without the addressee, there is no interaction).


This is in part due to the primacy of the sentence as the unit of analysis within this tradition. Especially in the context of theories that focus on truth-conditional meaning, the sentence is identified as the main unit of analysis because it is the domain where truth is assigned. The addressee’s uptake goes beyond the sentence, as it requires the addressee to accept the utterance in a new turn. Given the scheme in Figure 3.2, it is not clear how the addressee-oriented dimension of meaning relates to the construction of meaning. It is clearly part of the contextual aspect of meaning in that assumptions about the normal course of a conversation will necessarily include assumptions about the addressee’s role in this conversation. However, it is not immediately obvious whether grammatically encoded force also subsumes an addressee-oriented dimension of meaning. This is an empirical question, and I show in the course of this monograph that there is evidence that it does: interactional language is composed of speaker- and addressee-oriented ingredients that display systematic grammatical patterns. While this is not discussed in much of the scholarship in philosophy and formal linguistics, it is a core part of the empirical base of conversation analysis (CA), to which I turn next. 3.3

3.3 Dialogue-Based Frameworks

3.3.1 Conversation Analysis

The article by Sacks, Schegloff, and Jefferson (1974) demarcates the beginning of CA as a sub-discipline of linguistics. The object of investigation of CA is conversation. While the form and meaning of sentences can be explored in isolation, this misses important aspects of interactional language such as turn-taking management. That is, conversations are characterized by having multiple interlocutors who coordinate their speech acts with each other. Thus, CA differs significantly from the speech act tradition (including Grice’s intentionalist perspective). The former views language use and construction of meaning as a collaborative act, whereas the latter views it as an intentional act by a speaker. It is only when we take an interactive stance that we can properly understand UoLs used to regulate interaction. Hence, CA adds a number of new empirical domains to be explored, including turn-taking, backchannels, and adjacency pairs.

3.3.1.1 Turn-Taking

When interlocutors talk to each other, speakers face an immediate challenge: only one person can talk at any given time. Interlocutors take turns. So how do speakers know when to begin and end their turn? There is systematicity in the organization of conversations, which makes for effortless turn-taking: it is defined by minimal silence between turns on the one hand and


little overlapping speech on the other hand (the “no gap/no overlap” requirement; Sacks et al. 1974). While there are contexts where turn-taking is highly structured via convention (e.g., in ceremonies, interviews, and debates), everyday conversations lack explicit conventional rules and yet speakers know when and how to take turns. “Language use, and social interaction more generally, is orderly at a minute level of detail” (Stivers and Sidnell 2013: 2). While originally conceived of as a theory of social interaction, the core goal of CA is to uncover the structural underpinnings of everyday conversation, which makes it a linguistic enterprise, albeit an interdisciplinary one. Given that the systematicity of turn-taking is at the core of CA, its unit of analysis is defined in terms of turns. Specifically, Sacks et al. (1974) demonstrate that turns are constructed to indicate their natural endpoints: there are certain points in a turn where another interlocutor may take over. These are known as turn-constructional units (TCUs), which are self-contained utterances and recognizable as possibly complete (Clayman 2013: 151). Crucially, natural languages have means to mark or foreshadow (project) the points in a turn where another interlocutor may take over. These include syntactic, prosodic, pragmatic, or gestural clues (Ochs, Schegloff, and Thompson 1996). Given the systematic use of these UoLs, we can conclude that there must be a system that regulates them. In light of these findings, we have to adjust our model for decomposing force. We have to assume that there is a difference between illocutionary force (the speaker’s intention) and interactive force (the regulation of interaction with another interlocutor), as in Figure 3.4. Like illocutionary force, interactive force comes in two guises, one conditioned by context via assumptions about the normal course of conversations and one grammatically conditioned.

[Figure 3.4 Adding the interactional dimension. Labels: F(p); pragmatics: assumptions about the normal course of conversation; grammar: interactive force, illocutionary force, propositional content]

3.3.1.2 Backchannels

Backchannels are expressions like English uh-huh, yeah, and mhm (along with non-vocal gestures such as head nods) that indicate that the addressee is following and acknowledging the content of the preceding utterance without claiming a turn. The term backchannel was introduced by Yngve (1970; see also Kendon 1967, who uses the term accompaniment signal). Since backchannels indicate that the current speaker’s turn is continuing (Schegloff 1982), they are sometimes referred to as continuers (see Fries 1952: 49 for an early description of this phenomenon). The distribution and function of backchannels are well studied within CA, but have not been studied in grammar-centered frameworks. This is not surprising because backchannels can only be understood outside the sentential domain, and some of the core UoLs that serve as backchannels (e.g., uh-huh, mhm) do not even seem to be proper words of English. They do not contain descriptive meaning and their function is restricted to managing conversations. Consider the example in (6).

(6)
B: I’ve listen’ to all the things that chu’ve said An’ I agree with you so much. Now, I wanna ask you something. I wrote a letter (pause)
A: Mh hm,
B: t’the governor.
A: Mh hm ::,
B: -telling ’im what I thought about i(hh)m!
A: Sh:::!
B: Will I get an answer d’you think,
A: Yes
(adapted from Schegloff 1982: 82 (4))

The preface I wanna ask you something makes it clear that B’s turn is meant to contain a question and that there is a reason for this question. With the use of mhm, and sh:::, the addressee acknowledges what has been said: the information is now part of the common ground. Typically, backchannels are used around TCUs. In the absence of any marker of acknowledgment, the speaker doesn’t know whether they can assume that the propositional content is accepted and hence part of the common ground. In other words, grounding is not automatic: it is a process that requires the addressee. Interestingly, the backchannelling function can be fulfilled by repeating the primary speaker’s full utterance (echo-checking; see Traum 1994), as in (7), or part of it, as in (8). (7)

I    It would get there at 4.
R    It would get there at 4.
(Traum 1994: 4 (1))

1 I do not represent here all the details of the transcriptions of corpus data. See the sources for those details.

(8)
A: I got everything taken care of. I got insurance on it too.
B: how much
A: it under my name. eleven hundred a year.
B: eleven hundred.
A: three hundred dollars down
B: that’s cheap man.
(adapted from Clancy et al. 1996: 361 (3))

The possibility for using full phrases as backchannels suggests that the backchannelling function, too, is part of grammar. This is supported by the fact that backchannelling particles have all the hallmarks of conventionalized form–meaning pairs (Dingemanse 2020). Their form is conventionalized and thus language-specific. For example, Japanese backchannel particles include un, a=, ee, ha=, and in Mandarin we find uhm, a, ao, eh, hum, and so on (Clancy et al. 1996). In addition, the frequency and distribution of backchannels differ across languages (Clancy et al. 1996, Cecil 2010). The existence of cross-linguistic differences in the use of backchannels (and turn-practices more generally; Cecil 2010) is indicative of the grammatical nature of this dimension of language (Ginzburg and Poesio 2016). The systematicity in the use of backchannels is part of the speakers’ tacit knowledge of the language. And the variation suggests that there is some amount of variability built into the system; that is, it is subject to the kind of variation that has to be acquired based on the input language. Another indicator that we are dealing with a phenomenon that deserves grammatical analysis is the fact that backchannel particles are multifunctional, a hallmark of grammar. In Chapter 6, I develop an analysis for response markers that accounts for this multi-functionality.

3.3.1.3 Adjacency Pairs

CA, as a framework for exploring conversations, naturally looks at responses other than backchannels. Certain types of utterances display a preference for a particular response. These utterance-response sequences are known as adjacency pairs: questions are followed by answers (9a), commands by acceptances (9b), and greetings by counter-greetings (9c).

(9)
a. I  Who left?
   R  Bill.
b. I  Open the window please!
   R  Sure.
c. I  Hi!
   R  Hiya!
(Ginzburg 2016: 134, (3))

The types of responses in these adjacency pairs follow the normal course of a conversation, which speakers can depart from. For example, if the addressee


does not know the answer to a question, they will have to reply in ways that indicate that, and so on. Dispreferred responses tend to be avoided, but if they do occur, they are marked: they are often delayed by a pause, prefaced with well, and include an explanation for why a dispreferred response is given (Levinson 1983: 333).

3.3.1.4 Summary

We have now seen that some of the core tenets of CA are in line with those of generative grammar: there is a communicative competence, there are UoLs used to facilitate conversations, and they are systematic, conventionalized, and thus subject to cross-linguistic variation. Nevertheless, up until recently, there was no interaction between theories of social interaction and formal linguistics. Hence, the potential for developing a full-fledged theory of language competence, including communicative competence, did not arise (but see Ginzburg and Poesio 2016). This may have to do with methodological differences: within CA, spontaneous conversations serve as the empirical basis, whereas in generative grammar, it is well-formedness judgments of constructed sentences. The kinds of data explored in CA are deemed performance data, filled with disfluencies. Thus, despite the fact that both CA and generativism subscribe to the assumption that language is systematically structured, their empirical base is so different that collaboration seemed out of the question.

But this raises the question as to what language really is and hence what a theory of language competence should be able to account for. As Schegloff (1979: 263) observes, the types of sentences treated in mainstream grammar (generative or otherwise) are those that one would write rather than use in conversations. But the natural habitat for language is not its written incarnation. In fact, it is at the very core of the generative enterprise to get away from prescriptivism and to understand what people know when they know a language. And part of what people know is how to take turns and how to indicate understanding. To abstract away from actual performance factors, the study of conversation has to be enriched with different methodologies. Corpus studies are a great way to find the kinds of phenomena that one may want to explore, but they are not controlled enough to draw conclusions about what is possible, and what is not. One of the pioneers in approaching communicative competence from a variety of methodological angles is Clark (1996), who introduced grounding theory, to which I turn next.

3.3.2 Grounding Theory

Clark’s approach toward language in use (explicitly summarized in Clark 1996) is guided by three core tenets:

(i) Utterances, rather than sentences, are the unit of analysis.
(ii) Speaker meaning is primary.
(iii) Conversations (both speaking and listening) are a joint activity.


This addresses one of the problems of classic speech act theory: the role of the addressee has to be taken into consideration. It includes considerations of turntaking and grounding, two aspects of language that require coordination between both interlocutors. Crucially, these two aspects of language are explicitly encoded by UoLs: they show some universal properties and display systematic variation; hence, they have to be part of grammar, as illustrated in Figure 3.5. Clark uses a three-pronged approach toward the study of language use: intuition, experiment, and observation. Thus, he continues the tradition of CA and relies on observation (via corpus analysis); however, to gain a complete picture, he emphasizes the role of speaker’s judgments as well as experimental linguistics. In terms of the empirical domains, Clark’s work spans across all of the core topics of CA, and his contributions are an important stepping-stone toward the formalization of communicative competence. Consider first adjacency pairs. Within CA, adjacency pairs are constructionspecific: the speech act types for each of the pairs is specified as such (questions/answers, commands/acceptances, etc.). However, the patterns we observe are similar across different speech act types: the success of a speech act requires an appropriate response by the addressee. Hence, the logic of adjacency pairs can be generalized. This is what Clark and Brennan (1991) propose in their grounding theory: the common ground cannot be properly updated without the process of grounding, which involves the speaker and the addressee. Grounding minimally requires a sequence consisting of a presentation phase F(p)

[Figure 3.5 The multi-dimensionality of the interactional dimension. Labels: pragmatics: assumptions about the normal course of conversation; grammar: interactive force (turn-taking, grounding), illocutionary force, propositional content]

and an acceptance phase. This echoes Humboldt’s notion of Anrede (address) and Erwiederung (response) and Weigand’s (1991) notion of initiating and reacting utterance (the terminology that I adopt here). Since not all initiating moves have to be accepted, I refrain from Clark’s terminology. Grounding theory thus simultaneously recognizes the complexity of speech acts and the role of the addressee in the grounding process. Furthermore, it has consequences for the way we view the common ground (Stalnaker 1978, 2002). In particular, Clark and Brennan’s (1991) conceptualization of grounding as critically involving the speaker and the addressee provides a view of common ground that is based on interaction.

Another crucial innovation of grounding theory concerns its ontology. The common ground not only contains propositions and discourse referents, as in its more traditional incarnations, it also includes meta-communicative acts: things we do when we coordinate our conversations, such as turn-taking, and the grounding process itself (Clark and Schaefer 1989, Traum 1994, Brennan 2005).

Another important aspect of Clark’s work is his attention to UoLs that have not typically been studied in linguistics. For example, Clark and Fox Tree (2002) show that vocalizations that appear to be disfluencies (such as English uh and uhm) are conventional English words, as speakers plan their utterances and use them just as they would use any word (see Dingemanse 2020 for a recent analysis). Instead of being disfluencies, they are analyzed as marking disfluency, and they do this in systematic and language-specific ways. Hence they can be viewed as being part of the language system, rather than as performance errors (see also Schegloff, Jefferson, and Sacks 1977, Levelt 1983, 1989).
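To make the two-phase view of grounding concrete, consider the following minimal sketch. It is an illustrative toy model, not Clark and Brennan’s (1991) own formalization; all class and method names are invented for exposition. The point it encodes is simply that a contribution enters the common ground only once the addressee has signaled acceptance, for example with a backchannel.

```python
# A minimal sketch of grounding as presentation + acceptance, loosely following
# the description above. Names are illustrative, not a published formalism.

class GroundingState:
    def __init__(self):
        self.common_ground = []   # contributions accepted by both interlocutors
        self.pending = []         # presented but not yet acknowledged

    def present(self, speaker, content):
        """Presentation phase: the speaker puts a contribution forward."""
        self.pending.append((speaker, content))

    def acknowledge(self, addressee, signal="mhm"):
        """Acceptance phase: a backchannel (or repeat) grounds the oldest pending contribution."""
        if not self.pending:
            return None
        speaker, content = self.pending.pop(0)
        self.common_ground.append(content)
        return f"{addressee}: {signal}  (grounds: {content!r})"


state = GroundingState()
state.present("B", "I wrote a letter to the governor")
print(state.acknowledge("A"))   # A: mhm  (grounds: 'I wrote a letter to the governor')
print(state.common_ground)      # ['I wrote a letter to the governor']
```

The crucial design choice, on this view, is that grounding is not automatic: without the acknowledgment step, the contribution stays pending rather than entering the common ground.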

3.4 Functional Grammar-Based Frameworks

Grammar-centered frameworks can be divided into two subtypes: functional and formal. What all functional approaches have in common is that they take the function of language as the basic force behind its formal properties: function drives form. Since interpersonal communication is one of the basic functions of language, most functional linguists pay attention to aspects of language dedicated to communication. This contrasts with formal approaches, which take function to be secondary (form derives function). Consequently, functional approaches often take units other than sentences as their unit of analysis. In this section, I introduce several functional approaches toward the interactional dimension of language, with a focus on the ingredients that are missing in current syntactic theorizing.

3.4.1 Systemic Functional Linguistics

Systemic functional linguistics (SFL; Halliday 1985a) views language as a system that functions as a resource for creating meaning rather than as a system of symbols. There are three main functions of language that SFL recognizes: (i) ideational, (ii) interpersonal, and (iii) textual. The ideational function corresponds to propositional meaning. It is used to express our experience of the world. The interpersonal function is concerned with the way we construe interactions with other people and corresponds to the turn-taking aspect of interactional force. And finally, the textual function is concerned with the way the flow of information in communication is structured to create text. In part this relates to the grounding function, and in part it belongs to the sub-discipline of linguistics known as information structure. SFL takes a top-down approach in that the communicative function of language is assumed to be responsible for its formal properties. Many traditional grammatical elements are viewed as interactional in nature. The focus on the interpersonal function of language reflects the way SFL views the classic notion of speech acts. SFL identifies two acts of speech: giving and demanding and hence SFL recognizes the role of the addressee: the speaker is not only doing something, they are also asking the addressee to do something. Halliday and Matthiessen (2014: 135) suggest that an act of speaking is more appropriately called an interact: giving implies receiving, and conversely, demanding implies giving a response. Given the focus on the interpersonal function of language, it is perhaps surprising that UoLs that are used for interactional purposes do not play a central role within this framework. In fact, Halliday and Matthiessen (2014: 159) suggest that purely interactional UoLs, such as for example vocatives, are at the “fringe of grammatical structure.” Nevertheless, there is no question that such elements should figure in grammatical analysis, because “they participate fully in the intonation and rhythm of the clause” (Halliday and Matthiessen 2014: 159). Analyses of discourse markers are mostly couched within attempts to explain their use via grammaticalization (Brinton 1996) and pragmaticalization (Diewald 2011, Degand and Evers-Vermeul 2015). 3.4.2

3.4.2 Functional Discourse Grammar

Functional discourse grammar (FDG; Hengeveld 2004, 2005, Hengeveld and Mackenzie 2006, 2008) is the successor to Dik’s (1997) functional grammar. It aims to understand the grammatical properties of language but also to accommodate the communicative intentions of speakers and the contextual information available at the utterance situation. In line with the focus on discourse, it pursues the analysis of language in a top-down approach, starting with the speaker’s intention and moving down to the articulation of a sentence.

[Figure 3.6 The architecture of FDG (based on Hengeveld 2005). Components: conceptual, contextual, grammatical, acoustic]

As a grammar-based model, FDG is explicit about how thought (including a speaker’s intention), grammar, and context interact with each other. The proposed architecture (Figure 3.6) contains a conceptual component (akin to a language of thought), which feeds both the grammatical and contextual components. The grammatical component feeds the acoustic and contextual components. And the contextual component, which contains non-linguistic information pertaining to the discourse context, feeds both the grammatical and acoustic components. The grammatical component contains an interpersonal level (roughly pragmatics), a representational level (roughly semantics), a morphosyntactic level, and a phonological level. Thus, what is traditionally classified as pragmatics is here subdivided into the interpersonal level in the grammatical component and the contextual component. The contextual component contains information about the physical context within which communication takes place: this includes information about the speech situation, including properties of the speaker and addressee as well as the time and place of the speech situation. This accords with the division of labor I introduced above, according to which we need to distinguish between contextually conditioned and grammatically conditioned force. The unit of analysis of FDG is the discourse act (the minimal unit of communication). In addition to propositional structure (subject–predicate construction), the discourse act contains the illocution and a representation of the interlocutors (speaker and addressee). It is reminiscent of the TCU in CA in that it can provoke a backchannel and is embedded in a larger unit known as a move. Thus, the assumed configuration for turns is as in (10). (10)

[Move [Discourse Act  Illocution, Spkr, Addr  [Communicated content  propositional structure]]]

Each of these sub-structures is subject to the same principles of composition: they contain heads, (optional) modifiers, and operators. Thus, the interactional structure and the propositional structure are part of the same architecture. Hence this is a model, similar to the one I develop here, where grammar is assumed to be operative all the way up to include acts of discourse. What is especially interesting from our perspective is that FDG is a typological approach. It aims to develop a framework for the systematic description of all possible human languages. Its view of language typology is governed by both a formal and a functional outlook. The forms speakers can use in interaction vary across languages, but this variation is constrained by the range of communicative purposes all language users encounter, no matter which language they speak (Hengeveld and Mackenzie 2008).
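Purely as an illustration of the layered configuration in (10), the sketch below encodes a move containing one discourse act, with its illocution, speech participants, and communicated content, as nested data. The field names are chosen for readability and are not FDG’s official notation.

```python
# Illustrative encoding of the layered configuration in (10): a Move contains
# Discourse Acts; each Act specifies an illocution, the speech participants,
# and the communicated (propositional) content. Names are not FDG notation.

from dataclasses import dataclass, field
from typing import List

@dataclass
class DiscourseAct:
    illocution: str            # e.g. "DECL", "INTER", "IMP"
    speaker: str
    addressee: str
    communicated_content: str  # stands in for the propositional structure

@dataclass
class Move:
    acts: List[DiscourseAct] = field(default_factory=list)

turn = Move(acts=[DiscourseAct("INTER", "I", "R", "where are you going")])
print(turn.acts[0].illocution)   # INTER
```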

3.4.3 Longacre’s Grammar of Discourse

Another grammar-centered approach toward interactional language is Longacre’s grammar of discourse. The main thesis in this framework is that language is language only in context and hence it is important to study dialogues (Longacre 1996: 124). He observes that many utterances that appear to be ill-formed in isolation (e.g., fragment answers) are licensed in the context of a dialogue. Surprisingly, Longacre does not explore live conversations. Instead his empirical base comes from dramas or novels that include conversations. Strikingly, the examples Longacre presents and analyzes rarely contain the kinds of discourse markers we are interested in here. In the context of fiction, constructed dialogue is not always colloquial. One of the core assumptions of Longacre’s grammar of discourse is that discourse structure is part of grammar and that we see identical properties across all levels. He emphasizes the hierarchical nature of all levels from discourse to morphosyntax (see also Harris 1946). He analyzes the structure of dialogues in ways that are reminiscent of Clark and Brennan’s (1991) division of presentation vs. acceptance phase. He suggests that a dialogue can be analyzed as consisting minimally of an initiating and a resolving utterance (Klammer 1971), but he recognizes two more categories that arise in natural conversations. Dialogues do not always get resolved within two turns. Instead the first speaker may respond to the resolving utterance with a terminating utterance, whereas the responder may respond not with a resolving utterance but with a continuing utterance; the latter comes in three guises: counter-question, counter-proposal, and counter-remark. In (11), the initiating utterance consists of a question, but the addressee’s utterance does not provide a resolution. Instead, they follow up with a counter-question.


(11)
Initiating utterance:   Question:          Where are you going, Bob?
Continuing utterance:   Counter question:  Why do you want to know?
(Longacre 1996: 130 (15))

Recognizing continuing and terminating utterances captures the fact that conversations are not just simple pairs of initiation and resolution moves. However, it is not clear how to identify them as Longacre does not provide criterial diagnostics. If they have a linguistic reality, there should be linguistic expressions that serve to identify them. For the two core utterance types (initiation and resolution), confirmationals and response markers serve as diagnostics, as we shall see. In the absence of any evidence to the contrary, I assume that the two basic categories, initiation and reaction, are all we need to classify interactional moves. Other moves can be derived from them (see section 7.3.3.2.). When responding to an initiation move with another initiation, assumptions about the normal course of the conversation are violated and an interpretive effect can arise (e.g., that the interlocutor is making a counter-proposal). Similarly, a terminating utterance may be understood as a reaction move immediately following another reaction move. 3.4.4

3.4.4 Interactional Linguistics

Interactional linguistics is a framework dedicated to exploring the language of interaction (Selting and Couper-Kuhlen 2000, Couper-Kuhlen and Selting 2001). At its very core, it is a conceptualization of grammar as the knowledge of how to do things (Bybee 2002), and its emphasis on the interactional component of language makes it possible to uncover our knowledge of how to do things together (Thompson and Couper-Kuhlen 2005: 482). Hence, interactional linguistics is a grammar-centered framework ideally suited to capture Austin’s insight that led to the syntacticization of speech acts on the one hand, but also Clark’s (1992, 1996) thesis about the collaborative nature of language in use on the other.

Interactional linguistics brings together three distinct strands of research: CA, functional linguistics, and linguistic anthropology. I discuss each of these roots in turn. While CA originated as a sociological enterprise, interactional linguistics marks a turn toward a more linguistically oriented approach toward CA. Grammatical constructions are viewed as interactional practices that are chosen by speakers to achieve certain goals of interaction. Interactional linguistics is in some sense a reaction against the generative enterprise, which views grammar as an autonomous system, a system of knowledge that can be put to use.


Crucially for generative syntacticians, this abstract system is the object of investigation; for an interactional linguist, language in use is to be studied. The focus on real conversations also makes it necessary to take into account the roles and identities of the interlocutors and the setting within which the conversation is taking place. All these are variables that play no role within the generative enterprise. In terms of its linguistic roots, interactional linguistics lies within the tradition of functional (discourse) grammar: the function of language is assumed to drive its form. In the case of interactional linguistics, the function is equated with interaction: there is a motivated, non-arbitrary relation between linguistic form and its discourse function. Interactional linguistics explores real-life conversations and dialogues, whereas in classic functional discourse grammar it was mostly written language (albeit longer discourse stretches including narratives and stories were at the core of investigation). The influence of CA on interactional linguistics also differentiates it from functional discourse grammar. Specifically, the UoLs that are investigated are markedly different. Whereas in functional discourse grammar the classic notions of words, phrases, and sentences still define the units of investigation, interactional linguistics takes into consideration units and primitives of CA such as turn-taking, repairs, styles of interactions, and prosody. The third field that influences interactional linguistics is anthropology, which determines its cross-linguistic focus. While the interest in cross-linguistic investigation is not specific to anthropological linguistics, its emphasis on interactional language puts a unique spin on it, which ties it to anthropological studies. What is of interest is any culture-specific aspect of conversational practices. The integration of these three disciplinary traditions makes it possible to ask new questions about language: (i) how are languages shaped by interaction? and (ii) how are interactional practices molded through specific languages? (Couper-Kuhlen and Selting 2001). In terms of the analytical practices, there are two ways to approach a particular problem. On the one hand, the linguist may start with the interactional function they wish to explore and determine what linguistic resources are used to fulfill this function. On the other hand, the linguist may start with a particular linguistic form and determine what interactional function may be achieved with this form (Selting and Couper-Kuhlen 2000). For many of the form–function pairings, the assumption is that their recurrent use in interaction to fulfill specific interactional needs may have become “sedimented for the accomplishment of specific social actions in everyday interaction” (Ritva, Etelämäki, and Couper-Kuhlen 2014: 440).


3.5 Formal Grammar-Based Frameworks: The Semantic Angle

In this section, I introduce grammar-centered frameworks with a formal focus (as opposed to a functional one). Specifically, I will discuss formal semantic approaches toward the interpretation of interactional language.

3.5.1 Formal Semantics of the Truth-Conditional Kind

Formal semantic theory has its roots in Frege’s principle of compositionality, though it has not always been among the sub-disciplines of linguistics. The only semantic explorations in traditional (structuralist) frameworks concerned themselves with regularities in word meanings (see Geeraerts 2010). Frege’s work was not considered by linguists, despite the fact that it provided a key for solving a problem that structuralists identified as the stumbling block for investigating meaning: namely, that it is too elusive to be treated with any formal rigor. This is precisely the problem Frege tried to solve, albeit for different reasons. For Frege, the goal was to rid language of ambiguities and vagueness so as to be able to use it in logical argumentation. Richard Montague paved the way for the incorporation of Frege’s insights into linguistics (Montague 1970a, 1970b, 1973). He devised an algebra for the composition of meaning (semantics) that closely matches the composition of form (syntax). Compositionality constrains the relation of the semantics to the syntax: there needs to be a homomorphic mapping between the two (see Dowty, Wall, and Peters 1981). With this framework, known as Montague grammar, Montague brought mathematical methods into the study of meaning, just as Chomsky brought mathematical methods into the study of syntax (Chomsky 1957, 1965). There are two aspects of formal semantic theory that are crucial in the development of a semantics of interactional language: (i) the role of context and (ii) the approach to sentences other than declaratives. One of the key contributions toward incorporating contextual information into the calculation of meaning stems from work on indexicals. Kaplan (1989) argued that indexicals are interpreted relative to a particular context. Each context has associated with it at least an agent (the speaker), a time, a location, and a possible world. Kaplan extends this view to all linguistic expressions; all sentences are interpreted relative to this structured context. The second relevant ingredient for the treatment of interactional language is the inclusion of sentences other than declaratives. Conversations are not just a series of assertions: interlocutors may request information or tell each other what to do, and so on. Much of the research within formal semantics in the early days was about declarative sentences. Within the philosophical tradition, the meaning of a declarative is equated with its truth conditions: to know the


meaning of a sentence is to know what the world has to look like for it to be true (Tarski 1933, Davidson 1967). This view is still held within formal semantics: semantic composition of sentences has to result in a truth value. But it is not immediately clear how a truth-conditional analysis applies to questions or imperatives, as they are neither true nor false. Montague (1973: 32, fn. 3) hints at a possible solution: “In connection with imperatives and interrogatives truth and entailment conditions are of course inappropriate, and would be replaced by fulfilment conditions and a characterization of the semantic content of a correct answer.” This insight is formalized by Hamblin (1958, 1973), who proposes that questions denote sets of propositions, which count as possible answers. For yes/no questions, this set consists of the positive and negative propositions that can serve as answers. For content questions, the set of propositions is derived by the assumption that the wh-word (who, what, etc.) denotes a set of individuals and the application of the set of individuals to the predicate yields a set of propositions that could serve as answers to this question. Karttunen (1977) argues that the denotation of questions includes the set of true answers only. The analysis of questions in terms of answers has become the standard analysis for questions. From the perspective of the grammar of interaction, the Hamblin-style analysis is significant as it recognizes the importance of the answer for the interpretation of the question. It can be viewed as the first attempt to give a formal analysis of interaction, in this case a question–answer pair (see Ginzburg 2016). The other clause type not straightforwardly captured in a truth-conditional semantics is the imperative. Its meaning can be accommodated within formal semantic treatments in several ways. For example, Kaufmann (2012) argues that imperatives denote regular propositions with a modal component such that a sentence like Close the door is interpreted like a declarative containing a modal (You should close the door). Kaufmann further proposes a condition requiring imperatives to be interpreted as performatives – a step toward recognizing their interactional nature. Portner (2007) develops an analysis that is even more explicitly interactive. Following Hausser (1980), he argues that imperatives denote a property restricted to the addressee. This has prompted researchers (including Portner) to include the addressee in their representation. In sum, clause types other than declaratives require the integration of interactional components in truth-conditional semantic analyses. Next, I turn to the incorporation of interactional components in the interpretation of declaratives. 3.5.2

3.5.2 Common Ground and Dynamic Semantics

Within formal semantic frameworks, the interactional nature of assertions is commonly captured with the assumption of a common ground (Stalnaker 1978,


2002). Stalnaker develops a model of conversation that recognizes the role of both the speaker and the addressee to achieve mutual understanding. The common ground is the set of shared beliefs: it is accessed for propositions that are presupposed and updated with those that are asserted. Adding the proposition to the common ground affects the interpretation of following utterances because it is now presupposed. In this way, context (speaker, addressee, and their shared beliefs) is introduced into formal semantic treatments of assertions. Recognizing that assertions have a communicative function, namely to update the common ground, paves the way for a dynamic model of semantics. The context relative to which an utterance is interpreted is constantly changing, and some of this change is brought about via the ongoing conversation itself.

The concept of the common ground and the dynamic view of semantics have been developed and modified in various ways. Beginning in the early 1980s, Hans Kamp developed a dynamic theory of interpretation (Kamp 1981, Heim 1982). At its core is the insight that sentences are not interpreted in isolation, but relative to a particular discourse context. This model is known as Discourse Representation Theory (DRT, Kamp and Reyle 1993) and is part of a family of theories known as dynamic semantics, which all share the assumption that meaning simultaneously influences and is influenced by context. As such, dynamic theories of meaning equate meaning with the potential for context change (contrasting with the static conception of meaning that equates it with knowing the conditions for truth) (Groenendijk and Stokhof 1991).

So how is the common ground updated? The construction of common ground is not automatic: rather, grounding is a collaborative effort (Clark and Brennan 1991). In the next subsection, I introduce two frameworks that deal with this question.
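To make the idea of meaning as context change concrete, here is a minimal toy sketch (not a piece of any of the cited frameworks): a context is modeled as a set of live possibilities, and an assertion, once accepted, shrinks that set to the worlds compatible with the asserted content.

```python
# Toy illustration of a Stalnakerian/dynamic view: the context set is the set
# of worlds compatible with the common ground, and an accepted assertion
# updates it by intersection. The three-world model is illustrative only.

context_set = {"w1", "w2", "w3"}     # worlds still compatible with the common ground

def assert_and_accept(context, proposition):
    """Update the context set with an accepted assertion (set intersection)."""
    return context & proposition

it_is_raining = {"w1", "w2"}         # the proposition 'it is raining' as a set of worlds

context_set = assert_and_accept(context_set, it_is_raining)
print(context_set)                   # {'w1', 'w2'}: the asserted content is now presupposed
```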

3.5.3 The Question under Discussion and Being Inquisitive

To understand how common ground is established, it is useful to think about what drives conversations. According to Roberts (1996), discourse is structured via questions. Sometimes these questions are explicit (What time is it?). But according to Roberts (1996, 2012), there is an implicit question even if a conversation starts with an assertion. In particular, the Big Question humans are constantly trying to answer concerns the nature of the world: What is the way things are? Hence, there is always a question under discussion (QUD). Following Carlson (1983), Roberts maintains that resolving this eternal question is a game – a metaphor that goes back to Wittgenstein (1953). The goal of the game is to answer the QUD. Achieving this goal is constrained by the rules of the game. So what are the rules?


According to Roberts, there are two types of rules: conventional rules, which derive from the linguistic system, and conversational rules, which roughly correspond to the Gricean maxims and are not linguistic. They derive from rational considerations. The moves in the game come in two guises, according to Carlson (1983): set-up moves and pay-off moves. At any given point in the conversation, there may be more than one question; in which case they are ordered in a stack: the question on top of the stack is the (immediate) QUD that both interlocutors have accepted. When the question is answered, in ways acceptable to both interlocutors, that question is popped from the stack and the answer is added to the common ground (Roberts 2012: 17). By recognizing the QUD as the driving force behind conversation, it becomes possible to make a distinction between two types of propositional content: at-issue and not-atissue. At-issueness is defined as being relevant to addressing the QUD (Simons et al. 2010: 323; see Tonhauser 2012 for criteria to diagnose [not]-at-issueness). This approach addresses the question of how to represent questions and commands in the common ground by distinguishing between two aspects of interpretation associated with any given move: the presupposed content and the proffered content. Proffered content captures all non-presupposed content of assertions, questions, and commands (Roberts 2012: 5). We don’t only know what is the case, we also know what others want us to do. The discourse component associated with imperatives is known as the to-do-list (Portner 2007) or the Plan-set (Han 2000). Roberts’s (2012) QUD framework is mainly intended as a formal way to approach information structure, that is, the packaging of new and old information in the course of an exchange. It is thus not directly concerned with the nature of dialogue itself. But our thoughts about how the conversation should unfold have to be part of the game as well. The QUD is still relevant for our purpose as it has been influential for formal models of dialogue, specifically for inquisitive semantics (Groenendijk and Roelofsen 2009, Ciardelli, Groenendijk, and Roelofsen 2018), a semantic framework mainly intended for the analysis of linguistic information exchange, that is, the process of raising and resolving issues. The core unit of analysis is that of the issue and its resolution. One of the key insights of inquisitive semantics is the assumption that we need an integrated theory of questions and their answers (assertions): one type of speech act cannot be understood without the other (Ciardelli et al. 2018). The notion of inquisitiveness is comparable to Clark’s grounding theory in at least two respects. It recognizes the role of the addressee in dialogue. This is akin to Clark’s view that grounding is a joint effort between the speaker and the addressee. According to inquisitive semantics, linguistic discourse is about exchanging rather than providing information. This view affects the treatment of questions and answers as the basic units of information exchange and the


way declarative sentences are viewed. Propositions are viewed as proposals to update the common ground. Only bare declaratives uttered with a sentence-final fall, as in (12a), will lack inquisitive content. They are purely informative, though of course the addressee may still disagree no matter how convinced the speaker is. This is already acknowledged by Stalnaker (1978: 323), who suggests that the content of an assertion is added to the common ground “provided that there are no objections from the other participants in the conversation.” However, declaratives may be modified by means of particular intonational tunes such as rising intonation as in (12b), or sentence-peripheral particles such as eh, as in (12c). Such modified declaratives denote more than one alternative and are thus inquisitive (Farkas and Roelofsen 2017).

(12)
a. It’s four o’clock↘
b. It’s four o’clock↗
c. It’s four o’clock, eh↗

Another aspect of inquisitive semantics that has commonalities with grounding theory is that it distinguishes between two types of moves: raising an issue and resolving it. Unsurprisingly, with inquisitiveness at the heart of information exchange, assumptions about the normal course of a conversation will be different than in a purely speaker-oriented model, such as Grice (1975). This affects the way we view conversational implicatures. Within a framework that makes use of the common ground, recognizing (at least) two qualitatively different conversational moves calls for a more finegrained modeling of the process of common ground update. An explicit implementation of such a model is found in Farkas and Bruce (2010). Inspired by Roberts’s (1996) idea that the QUD drives conversations, they propose that common ground updates proceed via an intermediate step, which they refer to as the Table (Farkas and Bruce 2010: 86). They assume that speakers’ utterances are first put on a Table before they are placed into the common ground. On this Table are syntactic objects of previous moves paired with their denotation. In line with Roberts (1996), Farkas and Bruce (2010) assume that the items on the Table record what is at issue in the conversation and that they form a stack. The goal of the conversation is to empty the stack so that all issues are resolved. In sum, to model the exchange of information in the context of a conversation we need the following ingredients: the participants, the Table, discourse referents, and commitments (i.e., propositions and other types of content that the participants publicly commit to) (see also Krifka’s 2015 model of commitment space semantics). Inquisitive semantics provides us with an important step toward a fullfledged formal semantics of interactional language. The most complete and explicit model of this sort is that of Ginzburg (2012, 2016) to which I turn next.
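The following sketch is an informal rendering, with invented names, of the kind of bookkeeping just described for Farkas and Bruce (2010): the Table is treated as a stack of issues, an initiating move pushes an issue onto it, and a resolving move pops the issue and adds the accepted content to the common ground. It is a simplification for exposition, not the authors’ own model.

```python
# Informal sketch of a Table-style discourse record in the spirit of Farkas and
# Bruce (2010): issues raised by initiating moves are stacked on the Table, and
# resolving them moves the accepted content into the common ground.
# Names and simplifications are mine.

class DiscourseRecord:
    def __init__(self, participants):
        self.participants = participants
        self.common_ground = set()   # propositions accepted by all participants
        self.table = []              # stack of issues currently at stake

    def raise_issue(self, utterance, denotation):
        """An initiating move puts a (form, meaning) pair on the Table."""
        self.table.append((utterance, denotation))

    def resolve(self, accepted_proposition):
        """A resolving move pops the topmost issue and grounds the accepted content."""
        if self.table:
            self.table.pop()
            self.common_ground.add(accepted_proposition)

record = DiscourseRecord({"I", "R"})
record.raise_issue("Is it four o'clock?",
                   {"it is four o'clock", "it is not four o'clock"})
record.resolve("it is four o'clock")
print(record.table)          # []  (the stack is empty, the issue is resolved)
print(record.common_ground)  # {'it is four o'clock'}
```

The goal of the conversation, on this picture, is to empty the stack; what the sketch leaves out is precisely the negotiation between speaker and addressee over whether an issue counts as resolved.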

3.5.4 Toward a Formal Semantics of Dialogue

Formal grammar-based models acknowledge that linguistic interaction has to be taken into consideration in order to come to terms with the complexity of natural language meaning. Jonathan Ginzburg’s work can be characterized as turning this conclusion on its head: his goal is to characterize the form and function of dialogue. He takes dialogue to be the primary medium of language use. This unites him with the dialogue-centered approaches and some of the functional grammar-centered approaches. Yet, his work falls squarely within the tradition of formal semantics (Poesio and Traum 1997, Asher and Lascarides 1998, 2003, Poesio and Rieser 2011, Ciardelli and Roelofsen 2017).2 What sets him apart, however, is his attempt to develop an entire grammar of interaction rather than focusing on meaning only. For Ginzburg (2016: 1), there are two key problems that define the analysis of meaning in dialogue: (i) conversational meaning and (ii) conversational relevance. The notion of conversational meaning reflects Ginzburg’s conviction that language has to be understood in its natural habitat: in everyday conversations. It requires a careful description of context including the ongoing conversation and the larger situation. Ginzburg’s work incorporates insights from dialoguebased frameworks including CA and grounding theory. Within formal semantics, Ginzburg’s novel contribution is that he extends the domain of analysis to include interjections and non-sentential utterances. Conversational relevance covers what was previously thought to be in the realm of pragmatics in the form of Gricean implicatures. However, it goes beyond how Grice approached the problem in that Grice did not include the role of dialogical context. Grice pushed the field forward by proposing that assumptions about the normal course of a conversation influence the interpretive process. Hence if a particular utterance appears to be irrelevant, interlocutors know to still assume that the utterance is relevant and interpret it with this in mind. This is the role of Grice’s cooperative principle. What Grice did not make explicit is what counts as relevant or how to describe and classify a particular stage of the conversation (talk exchange). For Ginzburg, conversational relevance is sensitive to at least three dimensions: the illocutionary act, metacommunicative aspects of meaning, and the genre within which a particular utterance occurs. Moreover, he argues that any point in a conversation is characterized by a particular information state relativized to the individual conversation participants. Information states come in two parts: a private and 2

There are a few other formal frameworks that deal with the semantics of dialogue, including structured discourse representation theory (Asher and Lascarides 1998, 2003) and the dialogue theory of Poesio and Traum, known as PTT (Poesio and Traum 1997, Poesio and Rieser 2010, 2011). Both frameworks take as their starting point the dynamic semantics of DRT and include a rich discourse structure. I focus here on Ginzburg’s version as it is the framework that is geared toward linguists, whereas the others are concerned with computational modeling.


a public part. Linguistic analysis is necessarily concerned with the public part: the dialogue game board. According to Ginzburg, the game board includes the following ingredients: conversation participants, utterance time, facts (i.e., the common ground), moves (i.e., the content of conversational moves that have been grounded), and the question currently under discussion. This allows for the analysis of meta-communicative acts as expressed by discourse markers and interjections. Consider the example in (13). (13)

I    I think they should also respect the sanctity of the American home, whether it be in a house or in an apartment.
R    Yeah, yeah, no, I agree with you there.
(Tian and Ginzburg 2018: 1242)

The co-occurrence of two response particles of opposing polarity (yeah and no) suggests that these response particles don’t respond to the same content. Rather, response markers can respond to other dimensions of meaning (Tian and Ginzburg 2018: 1243). With the use of yeah in (13), R signals agreement with the propositional content. According to Tian and Ginzburg (2018: 1244), with the simultaneous use of no, R disagrees with I’s assumption that this belief is exclusive to I, thereby communicating that this belief is already shared. Hence, response particles not only operate over propositional content, but can also be used to manage the common ground (see Chapter 6).
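As a rough illustration of the ingredients attributed to the dialogue game board above (participants, utterance time, facts, grounded moves, and the question under discussion), the public part of an information state can be thought of as a structured record. The encoding below is a plain toy record for exposition, not Ginzburg’s actual formalization, and all field names are invented.

```python
# Rough illustration of the dialogue game board ingredients listed above:
# participants, utterance time, facts (common ground), grounded moves, and the
# question under discussion. This is not Ginzburg's actual formalization.

from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class GameBoard:
    participants: Set[str]
    utterance_time: str
    facts: Set[str] = field(default_factory=set)      # the common ground
    moves: List[str] = field(default_factory=list)    # grounded conversational moves
    qud: List[str] = field(default_factory=list)      # questions currently under discussion

board = GameBoard(participants={"I", "R"}, utterance_time="t1")
board.qud.append("should the sanctity of the home be respected?")
board.moves.append("I: assertion about respecting the sanctity of the home")
board.facts.add("I and R agree that the home deserves respect")
print(board.qud[-1])   # the question currently under discussion
```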

3.5.5 Expressive Dimensions and Other Forms of Language Use

One of the dimensions of meaning that does not contribute to truth-conditional meaning is the expressive dimension. UoLs such as ouch and oops serve to express emotions rather than describe them (Hayne 1956). Analyses of expressive meaning have led to the incorporation of non-truth-conditional meaning into formal semantic theory: sentence meaning is enriched with a dimension typed as use-conditional (Recanati 2004, Gutzmann 2013). Expressive meaning is already touched upon in Frege (1892, 1918) as independent of sense or reference, something he describes as coloring and shading of linguistic meaning. For example, while the word dog is neutral in terms of expressive content, the truth-conditionally synonymous form cur (German Köter) is associated with a negative connotation: the speaker reveals that they have a negative attitude toward the referent. Within the formal literature, expressive meaning was not discussed until the late 1990s (Kaplan 1999), though it played a major role in frameworks that took communication to be the primary function of language (Bühler 1934, Jakobson 1960). Kaplan (1999), in discussing the significance of expressives such as oops and ouch, challenges the traditional philosophical belief that “logic is immune to


epithetical colour.” According to Kaplan (1999), expressions can be expressively correct if what is expressed or displayed is indeed correct (Kaplan 1999: [2:40]).3 Expressives can be used in insincere ways (i.e., one can lie with them). For example, if you hurt yourself, and I exclaim ouch, I am not using the word sincerely. Kaplan (1999: [1:21:52]) concludes that “it seems to be quite possible to extend semantic methods, even formal model-theoretic semantics to a range of expressions that have been regarded as falling outside semantics, and perhaps even as being insusceptible to formalization.” The challenge to developing a formal semantic analysis for expressives is taken up by Potts (2007), who argues that they function as context-changing operators. For example, the expressive damn changes the context; it registers that the speaker views the referent negatively. Potts (2007) significantly contributes to our understanding of expressives by developing explicit criteria to diagnose expressive meaning. These diagnostics are summarized in (14): (14)

a. Independence: expressive content is independent from descriptive content
b. Nondisplaceability: expressives predicate something of the utterance situation
c. Perspective dependence: expressive content is evaluated from a particular perspective (i.e., the judge)
d. Descriptive ineffability: it is hard to paraphrase expressives in descriptive terms
e. Immediacy: like performatives, expressives achieve their intended act by being uttered
f. Repeatability: repetition leads to the strengthening of the emotive content rather than redundancy
(based on Potts 2007: 166 f.)

Since Potts's (2007) seminal work, we have seen an increase in work on expressive meaning (McCready 2008, Constant et al. 2009, Steriopolo 2009, Fortin 2011, Gutzmann 2011, among many others). This work has inspired others to analyze a variety of linguistic expressions with non-truth-conditional meaning in similar ways. For example, Gutzmann and Gärtner (2013) cover a number of phenomena that impose conditions on language use: discourse particles, formality distinctions in pronouns, ethical datives, intonational tunes, diminutive morphology, and certain syntactic constructions. A semantic analysis of expressive meaning (as opposed to a pragmatic one) is in part motivated by the fact that, despite the lack of truth-conditional meaning, these elements are still associated with conventional semantic content: speakers of a language agree on how to interpret these forms. For Davis and Gutzmann (2015), the use-conditional dimension of meaning is formulated in parallel to truth-conditional meaning, with two core differences. First, the
notion of a sentence being true is replaced by the notion of an utterance being felicitously used. Second, while truth is evaluated relative to (possible) worlds, felicity is evaluated relative to a given context. This is illustrated in (15):

(15) a. (T) Truth conditions: "Snow is white" is true iff snow is white.
     b. (U) Use conditions: "Oops!" is felicitously used iff the speaker observed a minor mishap.
     (Davis and Gutzmann 2015: 201)

Based on this insight, a use-conditional semantics is developed that allows for a compositional calculation of complex expressions.
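The division of labor in (15) can be pictured with a small Python sketch: an expression carries two independent dimensions of meaning, one evaluated against a world and one against a context of use. This is a minimal illustration under my own simplifying assumptions (worlds and contexts as plain dictionaries, one callable per dimension); it is not Davis and Gutzmann's compositional system.

from dataclasses import dataclass
from typing import Callable

# Toy representations: a "world" fixes facts about the world, a "context"
# fixes facts about the utterance situation (speaker, what just happened, etc.).
World = dict
Context = dict

@dataclass
class Meaning:
    truth_conditions: Callable[[World], bool]   # the (T) dimension
    use_conditions: Callable[[Context], bool]   # the (U) dimension

# "Snow is white": contentful in the T-dimension, trivial in the U-dimension.
snow_is_white = Meaning(
    truth_conditions=lambda w: w.get("snow_color") == "white",
    use_conditions=lambda c: True,
)

# "Oops!": trivial in the T-dimension, contentful in the U-dimension.
oops = Meaning(
    truth_conditions=lambda w: True,
    use_conditions=lambda c: c.get("speaker_observed_minor_mishap", False),
)

world = {"snow_color": "white"}
context = {"speaker_observed_minor_mishap": True}
print(snow_is_white.truth_conditions(world))  # True: truth-conditionally correct
print(oops.use_conditions(context))           # True: felicitously used

An expressive like damn would, on this picture, be trivial in the T-dimension and contentful in the U-dimension, mirroring (15b).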

3.6 Formal Grammar-Based Frameworks: The Syntactic Angle

Formal syntactic theory has not participated in the same shift in adding an interactional angle, at least not to the same extent. This is perhaps surprising given the fact that Chomsky’s teacher Zellig Harris published a paper on discourse analysis (Harris 1952) in which he develops a framework for the analysis of discourse that uses the same tools as for the analysis of sentences. According to Harris (1952), the sentence as the unit of analysis is an artifact that arose because many descriptive generalizations about the distribution of morphemes are defined over this domain. Nevertheless, he observes that there are also generalizations that require reference to domains that range beyond sentence boundaries. These phenomena can, according to Harris (1952), be studied in the same fashion as those that occur within the boundary of the sentence. The method Harris refers to is formal in the sense that distributional analysis is independent of meaning: it is concerned with the formal relation among linguistic expressions (Harris 1952: 5). Despite this early attempt at a formal discourse analysis that makes use of the same methodology that led to modern syntactic theory, there is to my knowledge only one framework available to date that takes linguistic interaction as the central empirical domain to be accounted for. This is dynamic syntax, developed by Ruth Kempson and her colleagues (Kempson, Meyer-Viol, and Gabbay 2000, Cann, Kempson, and Marten 2005). The central thesis of dynamic syntax is that knowledge of language should not be studied as an encapsulated form of knowledge independent of knowing how to use it. Grammar is taken to be a formalism designed to reflect directly the temporal dimension of natural language processing. Hence there is an

explicit focus on explaining how natural language is parsed: according to Cann et al. (2005: 1), "knowing a language is knowing how to parse it." This commitment calls for an incremental (dynamic) syntax, which is equated with a set of procedures for context-dependent interpretation. In this way, this framework simultaneously addresses both the problem of compositionality and the problem of context-dependence. But in so doing, it rejects the generative assumption that grammar has to be treated independently of its context of use. Consequently, dynamic syntax takes as its empirical domain spoken language in interaction, much like CA. One argument is that conversational data is what a child acquiring language is exposed to. It includes phenomena that only arise in interaction, such as turn-taking, repair strategies, fragment utterances, and the joint co-construction of a single sentence by more than one interlocutor, as in (16) and (17).

(16) A: Have you read
     B: any of your books? Certainly not.

(17) A: Are you OK? Did you burn
     B: myself? Fortunately not.
     (Kempson et al. 2011: 18, (8)–(9))

These phenomena, which in the generative tradition would be taken to be performance disfluencies, are the kinds of data that support an incremental dynamic syntax. Moreover, dynamic syntax also has a cross-linguistic angle. Specifically, the view that syntax provides speakers with the capacity to parse sentences in real time offers an interesting perspective on the differences between head-initial and head-final languages and on the differences between head- vs. dependent-marking languages. Strikingly, there is a line of research on syntax within interactional linguistics that shares some assumptions with dynamic syntax. Specifically, Auer (2007) argues that spoken language needs a syntax that is viewed as a process. There are three necessary properties of a syntactic theory of this sort: it has to be incremental and dialogical, and therefore, according to Auer (2007), it has to rely on constructions, as in construction grammar (Goldberg 2006), rather than being fully compositional. The first two properties are those shared with dynamic syntax. For Auer, incremental interpretation is achieved via a process of projection, which allows the interlocutors to process ongoing speech without delay. It applies to syntactic construction in the narrow sense (i.e., the construction of sentences), but also to the identification of TCUs. Because syntax and interaction make use of the same mechanism for speakers to predict what’s next, Auer (2007) argues that they have to be part of the same system. The process of projection allows for the co-construction of an utterance across different interlocutors, as in (16)–(17), because it allows the

interlocutors to predict what's next (see Falk 1979, Lerner 1991, Ferrara 1992, Szczepek 2000, Hayashi, Mori, and Takagi 2002). And that's precisely why Auer argues that a construction grammar view is called for. The online composition of meaning would not allow projection, whereas the assumption that constructions are memorized and hence can be retrieved does. And just like dynamic syntax, Auer's approach has a cross-linguistic angle in that he is interested in language-specific differences in the potential for projection: languages with rigid word order have a greater projection potential than languages with relatively free word order (Auer 2007: 3). Similarly, head-final languages have a lower potential for projection than head-initial ones, making Japanese a language with low projection potential. However, aside from word order, other properties of individual languages, such as intonation, may contribute to their projection potential. While the syntactic architecture for sentence structure differs significantly between Kempson's dynamic syntax and Auer's interactive syntax, their assumptions regarding the objective, function, and empirical scope of syntax are virtually identical. This is despite the fact that the former is in the tradition of formal theory, while the latter is developed within function-based interactional linguistics. The similarity between the two approaches is significant in itself: it suggests that the divide between formal and functional theories may not be as clear-cut as is typically assumed (but see Bjerre et al. 2008). In fact, Marten and Kempson (2002: 474) assert that dynamic syntax, while formal in implementation, is also functional in that it "aims at formulating the architecture underlying our ability to parse." In addition to the breakdown of a strict formal/functional divide, the investigation of interactive discourse phenomena within formal syntactic theory goes along with the breakdown of another division that has been assumed to be rather clear-cut until recently: namely, the distinction between competence and performance. According to Cann et al. (2005: 23), the articulation of a model for competence has to be developed such that it allows for the architecture of language to include aspects of performance. This echoes an assumption of CA, according to which there is a communicative competence. The proposal I develop in Chapter 4 endorses this conclusion.

3.7 Conclusion

The purpose of this chapter was to review analyses of interactional language. I reviewed dialogue-based frameworks and several grammar-based frameworks, both within functional and formal traditions. There are several reasons to include an overview of the findings within these frameworks, even though the proposal I develop is couched within a formal syntactic theory of the generative kind. First, as we saw in Chapter 2, generative syntactic analyses

have seen a recent rise in incorporating notions of speech act theory. There are various analyses that posit a layer above the propositional structure that encodes illocutionary force and contextual variables, including representations of the interlocutors. What is striking about these analyses is that they incorporate speech act notions from the time speech act theory was first developed. Current versions of the syntax of speech acts update Ross's original insight with the tools of modern syntactic theory. But they ignore everything that has happened following classic speech act theory. As we have seen, there have been significant developments in terms of the empirical domains of analysis, the analytical tools, and the theoretical tenets. Many of these developments share the insight that language is more than acts of speech: it involves interaction. If we accept the premise of the body of work that syntacticizes speech acts, namely that syntactic theory should include speech act theoretical notions, then there is no a priori reason why it should not also include notions that are defined in terms of the interactional dimension. And if so, it is crucial to establish what has happened in the wake of speech act theory, independent of the theoretical stance. To the best of my knowledge, this type of literature review has not been attempted before. In fact, with a few exceptions, the literature within each of the frameworks reviewed hardly references work within the other frameworks, despite the fact that they all share a similar goal: to understand and model the properties of language in interaction. There are several lessons to take away from this overview that form the foundation of the proposal I develop here.

Lesson 1: Integrate Different Types of Meaning

Linguists and philosophers alike have long recognized the existence of qualitatively different notions of meaning. These have been referred to in different ways, but a basic dichotomy emerges. Truth-conditional meaning is often equated with the basic context-independent meaning of sentence types. It contrasts with meaning that does not affect truth conditions but does have use conditions. This dichotomy is schematized in Figure 3.7.

What is said            What is done
descriptive             expressive
propositional           non-propositional
sentential              interactive
truth-conditional       use-conditional
conceptual              procedural

Figure 3.7 A dichotomy of meaning

What many frameworks have in common is that they advocate for an integrated theory of meaning, one that doesn't compartmentalize different types of meaning in ways that would assign them to different sub-disciplines of linguistics (semantics vs. pragmatics). Traditionally, grammar-centered frameworks take the type of meaning in the left column of Figure 3.7 to be the sole object of investigation. However, many formal and functional frameworks integrate the other type of meaning. The proposal developed here responds to this desideratum and adds another argument: a linguistic expression whose meaning is use-conditional must be integrated into complex linguistic expressions. If we view syntax as the module that is responsible for mediating the

relation between form and meaning and for the concatenation of UoLs, then a theory is needed that allows for the syntactic integration of both truth- and use-conditional meaning.

Lesson 2: Rethinking the Difference between Language Competence and Performance

The dichotomy between language competence and language performance had the effect that, within formal linguistic theory of the generative tradition, language in use was not considered an object of investigation. This means that spoken language phenomena as they occur in naturally occurring conversations have not been studied. Interestingly, researchers within CA have long argued that we have to recognize a communicative competence: one's knowledge of language includes knowledge of how to use language. While this has led some to conclude that the competence/performance distinction has to be abandoned, there is a different conclusion available. While it is true that language in use can be riddled with errors and disfluencies, there is nevertheless a system in place that lets speakers deal with such errors and disfluencies. It is precisely this system that can be viewed as being
part of our language competence. And, similarly, there is a system that allows speakers to negotiate mutual understanding and to manage interaction. There is no a priori reason to deny knowledge of this system the status of being part of language competence. And once communicative competence is admitted into the domain of language that grammatical theory ought to account for, we have to take into consideration linguistic expressions that are restricted to language in use, such as confirmationals, response markers, and discourse markers in general.

Lesson 3: Rethinking the Primacy of the Sentence as the Unit of Analysis

Once we take into consideration language in interaction, we have to look beyond the sentence. In conversations, speakers utter units that are either smaller or larger than sentences – the sentence has no privileged status. Units of analysis thus should be defined in terms of their function in interaction. For example, CA takes TCUs to be the basic unit: they are defined as units that can (in principle) function as a turn on their own. In addition, turns can be classified according to whether they are used as initiating or reacting moves.

Lesson 4: Rethinking Common Ground Updates

Classic speech act theory, combined with the Stalnakerian view of the common ground, paints a picture according to which the common ground is updated by virtue of a speaker uttering a sentence. This is reflected in Ross's (1970) speech act structure, which encodes, for declaratives, I tell you that S, and is echoed in Speas and Tenny's updated version. This way of thinking of speech acts implies a model for conversation as in Figure 2.9 (see p. 36): an utterance directed at an addressee automatically enters the common ground. However, the literature presented in this chapter has revealed that in actual conversation, the situation is more complex. Update of the common ground does not happen automatically but requires the interlocutor to accept the speaker's utterance. There are linguistic expressions dedicated to achieving just that, and linguistic models need to reflect the joint effort of updating the common ground. For example, drawing on the work by Clark and his colleagues, Farkas and Bruce introduce the notion of the Table, where discourse moves are presented for the addressee to accept. Thus, there are two distinct moves we have to distinguish: I here use the general terms initiating move and reacting move. As illustrated in Figure 3.8, the speaker's utterance is first tabled in the initiating move. The addressee can then decide to accept it. It is only at this point that the proposition enters the common ground. It thus follows that the common ground has to be decomposed into two separate grounds, one for the presenter
and one for the responder. It is the publicly accessible common denominator of the two grounds that serves as the common ground.

[Figure 3.8 The interactional dimension: the speaker's utterance p is tabled in the initiating move and enters both grounds only once the addressee accepts it in the reacting move.]

Lesson 5: Beyond the Sub-Discipline Divide

Finally, when considering the assumptions necessary to model language in interaction, we observe that the division of labor traditionally associated with different sub-disciplines of linguistics cannot be maintained. For example, we have seen that context-dependent meaning can be – and sometimes has to be – treated as part of semantics rather than pragmatics. Similarly, linguistic expressions that serve to manage ongoing interaction can be of different types, including words and morphemes, syntactic constructions, and intonational tunes. The analysis of these types of expressions belongs to different sub-disciplines, including the study of morphology, syntax, and phonology, respectively.

In Chapter 4, I introduce a proposal that combines insights from the literature on the syntacticization of speech acts with the lessons we have learned from models that explore language in interaction. Following this body of work, I assume that there is a system in place that regulates the way we use language in interaction, and furthermore that this system uses the very same ingredients as the system that is responsible for creating sentences in isolation that express propositional
content. That is, I argue that just as morphology is now often viewed as being distributed (Halle and Marantz 1993), so is syntax. In particular, according to distributed morphology, the same module, namely syntax, is responsible for the construction of words and the construction of sentences. In a similar vein, I argue that the same module, namely syntax, is responsible for the construction of sentences and the construction of linguistic interaction. So, syntax is not only to be viewed as the module that goes all the way down to (below) the word level, it also goes all the way up (and over) to the interactional level. And that’s what I turn to next: the syntacticization of interaction.

4 The Interactional Spine Hypothesis

Der Mensch spricht, sogar in Gedanken, nur mit einem Andren, oder mit sich, wie mit einem Andren. ['Man speaks, even in thought, only with another, or with himself as with another.'] W. von Humboldt

My core goal in writing this monograph is to explore the grammar of interactional language and to develop a framework that allows us to do so. In Chapter 3, I reviewed several approaches toward interactional language in an attempt to distill its essential properties. I argue that traditional sentence structure, which represents propositional meaning, is embedded within structure dedicated to language in interaction: the interactional spine. Specifically, I propose that the interactional spine consists of two layers: grounding and responding, as in (1). The core function of the grounding layer is for the speaker to configure the propositional content of the utterance so that the addressee can update their knowledge state to include it. The core function of the response layer is to manage the moves that serve to synchronize the interlocutors' knowledge states. I refer to this hypothesis as the Interactional Spine Hypothesis (ISH).

(1) [RespP [GroundP [propositional structure]]]

The chapter is organized as follows. In section 4.1, I make explicit the problems that I seek to address in the remainder of the monograph. In section 4.2, I give a brief overview of the theoretical framework I use to develop the ISH, namely the Universal Spine Hypothesis (USH). In section 4.3, I introduce the ISH in more detail so as to apply it to the analysis of confirmationals and response markers in Chapters 5 and 6.

4.1 Problems I Want to Address

One of the core insights that unites virtually all of the frameworks that deal with the language of interaction is that it is not confined to the traditional unit of the sentence. On the one hand, it includes UoLs that are sometimes considered to lie outside of the sentence proper, either because they are sentence-peripheral or because they are not integrated into the propositional content. On the other hand, interactional language is characterized by the back and forth between (at least) two interlocutors. Hence the units to be considered are discourse moves or turns. These have an added layer of complexity compared to sentences because they need to be sequenced. Verbal interaction is characterized by adjacency pairs. Generalizing over properties of specific turns, there are two basic moves we need to recognize: initiating and reacting moves, as in (2).

(2) Minimal turn-sequencing:
    a. Initiating move: I
    b. Reacting move: R

This sequence represents the minimal conversational unit. While in actual conversations moves can get more complex, there are reasons to assume that this complexity can be derived from these two basic moves (see section 7.3.3.2). With this in mind, I now turn to defining the problems that interactional language presents us with and that the ISH seeks to address.

4.1.1 The Empirical Problem: Confirmationals and Response Markers

Language allows us to talk about how the world is. But it also allows us to integrate what the world is like in our mental states in different ways. For example, propositions can be embedded inside other propositions expressing propositional attitudes, as in (3).

(3) a. [You have a new dog]
    b. [I believe [you have a new dog]]
    c. [Is it true that [you have a new dog]]

This reflects the fact that we can think and talk about facts (i.e., propositions) but also about the fact that we know, believe, or wonder about these facts. We can talk about our propositional attitudes. And we can talk about what we think about other people’s attitudes toward these facts. We can engage with others to talk about the validity of our beliefs and by doing so we can learn from each other. Crucially however, languages not only have verbs that express propositional attitudes (know, believe, wonder), but also have functional (closed-class) UoLs, including confirmationals and response markers that define the empirical basis for

this study. They define the essential moves of interaction – initiating and reacting moves – and hence they serve to regulate interaction. For example, utterance-final eh is a confirmational used to ask the interlocutor for confirmation. Once uttered, the initiator has to stop talking to make room for the reacting move it elicits. Utterance-initial yeah is used to indicate agreement with the initiator.

(4) I: Gal Gadot was amazing as Wonder Woman, eh?
    R: Yeah, I know.

Thus, confirmationals and response markers are ideal for the exploration of language in interaction since they type initiating and reacting moves, respectively.

(5) a. Initiation: I [[. . .] Confirmational]
    b. Reaction: R [Response marker [. . .]]

To appreciate the complexity of confirmationals, consider (6). The confirmational eh is sometimes considered a hallmark of Canadian English (Avis 1957, Denis 2013), equivalent to huh. Indeed, in (6a) the two confirmationals appear to be interchangeable. Both confirmationals are felicitous if the speaker wants the addressee to confirm the truth of the proposition, assuming that the addressee has authority to do so. However, there is a crucial difference between the two confirmationals, as shown in (6b). Here, eh can be used to confirm that the addressee knows that the speaker has a new dog, but huh is infelicitous in this context.

(6) a. You have a new dog, {eh/huh}?
    b. I have a new dog, {eh/*huh}?

Thus, the difference between eh and huh can only be described with reference to the knowledge states of the interlocutors relative to the propositional content of the host clause. How this difference comes about is one of the empirical questions I address in Chapter 5. Similarly, the English response markers yes and no come in different guises, and the differences often relate to the knowledge state of the interlocutors. Oh yeah? is used as a response to utterances whose propositional content is news to the interlocutor (7), while yup is used if the propositional content is already known to the interlocutor (8).

(7) I: Surprise, you're on candid camera.
    R: {Oh yeah?/*Yup}

(8) I: So I guess you got a new dog?
    R: {*Oh yeah?/Yup}

How the difference in form and function of response markers comes about is the empirical question I address in Chapter 6.

What these data reveal is that interactional language, just like propositional language, shows intricate patterns of form–function pairings with differences that need to be explored, and the first step of exploration is description. We need to know the lay of the land to be able to understand whether there is a system that underlies the patterns of variation. The empirical problem of describing the language of interaction is exacerbated for understudied languages. While the units of interactional language, such as confirmationals and response markers, are ubiquitous in mundane conversations, descriptions of these UoLs are rare. Hence, the systematic exploration of the form, function, and distribution of such discourse markers is still an outstanding empirical task.

4.1.2 The Analytical Problem: The Need for a Framework

Along with the empirical problem of describing the relevant UoLs that are part of interactional language comes an analytical one: how do we analyze these UoLs in a way that allows us to generalize over their properties? Generalization is necessary for the purpose of language comparison. Following one of the core tenets of the generative enterprise, the question that the units of interactional language raise concerns their similarities and differences. In other words, what is the range of variation? To answer this question, we need to know the comparison set. This is not a trivial task because the UoLs that comprise interactional language can be superficially very different. For example, some languages use sentence-final particles to encode the knowledge state of the interlocutors; others use different intonational contours (Wakefield 2011). Thus, an adequate framework for description and comparison needs to be able to compare different UoLs. But if formal properties (such as surface category) are not reliable points of comparison, what are? What is currently missing is a typology for interactional language that is not tied to surface properties. Analysis (and ultimately description) that allows for cross-linguistic comparison requires a theoretical framework that is abstract enough that language-specific details do not become the point of reference for comparison.

4.1.3 The Theoretical Problem: What Does It All Mean?

The fundamental question defining linguistics is the question about how form, meaning, and distribution of UoLs relate to each other. The same question applies to interactional language. I consider it to be the null assumption that we can use the same model for both propositional and interactional language. This is not to say that there are no domain-specific differences; however, I shall assume that the fundamental system that regulates the relation between form, meaning, and distribution is the same across both domains.

The theoretical task is thus to develop a theoretical framework that makes use of the same mechanism to model the sound–meaning relation and the system of composition, and this is what I set out to do here. We know that interactional language is characterized by forms of meaning that are intrinsically linked to the context of the ongoing conversation (see Chapter 3): it serves to facilitate the grounding of propositional content and to manage turn-taking and turn-sequencing. This differentiates propositional language from interactional language, and this difference has to be modeled as well. Thus, the theoretical goal of this monograph is to develop a framework that adheres to insights from both the generative grammatical tradition and frameworks that consider interactional language.

4.1.4 The Methodological Problem: Interactional Data

The assumption that interactional language is part of our grammatical language competence brings with it a methodological problem: how do we study it? How do we collect data? The approach I take here spans across the methodologies used in different frameworks, including generative grammar and frameworks dealing with interactional language. Within generative grammar, the study of language competence consists of well-formedness judgments by native speakers. Anyone conducting fieldwork knows that this is not always an easy task, even for propositional language. Eliciting well-formedness judgments for interactional language adds several layers of complexity because of its context-sensitivity. There are at least two interlocutors involved and not only does it matter what they are talking about, but it also matters who is talking to whom, what they know about each other, and when and where the conversation takes place. It is hard to set up this kind of context in typical elicitation. Moreover, when working with non-linguists as consultants, it is difficult to make it clear that linguists are interested in what people know when they know their language, not in what they ought to know as prescribed by conventions. This is especially problematic for spoken language, which is supposedly riddled with improper ways of talking. These problems are encountered with propositional language, but they are exacerbated with interactional language. Frameworks that have interactional language as their empirical domain typically use a different methodology. CA looks at natural conversation (recorded and transcribed). This methodology will not suffice to develop a formal typology for interactional language, because it does not provide us with negative data. That is, when developing theories based on natural language data, there will be predictions about what is expected to be well-formed and what is not. When relying on native speaker judgments, ungrammatical sentences can become part of the dataset, but when relying on corpus data, this is not possible. When a particular type of sentence is not attested in a corpus, we do not know why it is not attested: because it is judged ill-formed by native speakers or because it simply is not part of the corpus. The

minimal pairs needed for hypothesis testing are simply not available in a corpus of naturally occurring conversational data. The second problem with naturally occurring conversational data is the inverse of the first one: there is a difference between what people know about their language and how they use it. Language use can be riddled with errors due to all kinds of factors. This observation underlies the generative distinction between competence (what we know about a language) and performance (how we use language). While we have seen that there are good reasons to believe that competence includes interactional language, the same logic applies. There is a difference between knowing how to use language in interaction and actually using it. To tap into speakers' knowledge, it appears that native speaker judgments are essential. Hence, for the study of interactional language, it is essential to combine various methodologies (Clark and Bangerter 2004).

4.2 The Framework: The Universal Spine Hypothesis

According to the ISH, interactional language is regulated by the same formal system that regulates propositional language while still recognizing that there are differences. The ISH is based on a particular approach toward clause structure, namely the USH (Wiltschko 2014). The USH introduces a way to model the relation between the form, meaning, and distribution of UoLs. There is a universal spine that exists independently of the UoLs that associate with it, as in (9). The spine regulates the distribution of UoLs and also contributes to their interpretation. This is because each layer in the spine has a function, which affects the interpretation of the associated UoLs.

(9) The universal spine:
    [Linking [Anchoring [Point-of-view [Classification ]]]]
    (UoLs associate with positions along the spine)

The spine consists of four basic functions: classification, point-of-view, anchoring, and linking (Wiltschko 2014). Classification serves to classify events and individuals into sub-classes, point-of-view serves to introduce a point of view from which to describe the event or individual, and anchoring serves to anchor the event or individual to the deictic center, hence configuring a proposition. At this point, truth conditions may be assigned as the event is anchored in time and space. And finally, the linking function serves to link the propositional content to the ongoing discourse. These spinal functions do not directly translate to functional categories of the familiar type (e.g., AspP, TP, CP, MoodP). Rather, according to the USH, grammatical categories are always derived on a language-specific basis: they are composed of the spinal function and the UoL that associates with the head. For example, in English, the anchoring and point-of-view functions are associated with temporal content; hence, the categories are tense and aspect, respectively. But this is not universally determined. UoLs with different content may also associate with these functions. For example, in Blackfoot, point-of-view and anchoring are associated with UoLs that are based on person. This derives the direct-inverse system of Blackfoot (Bliss, Ritter, and Wiltschko 2014). Despite this difference, they nevertheless serve the same core function (anchoring and point-of-view) and they have the same general distribution: they interact with the domain where grammatical roles are assigned (the subject and object role) and they comprise the inflectional core of the language. Thus, distributionally they are akin to the temporal-based inflectional systems of the familiar type (Ritter and Wiltschko 2014). While the categories along the spine differ in their function, they have the same formal properties across these functions. Each spinal head comes with an unvalued coincidence feature [ucoin], which serves to relate two arguments: its complement and its specifier. These arguments are conceived of as abstract (typically silent) situation arguments that include the time and location of the situation and its participants, as in (10).

(10) [KP argsit [ K[ucoin] argsit ]]
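As an illustration of the schema in (10), here is a minimal Python sketch of how a head's coincidence feature might be valued by an associated UoL; it anticipates the tense example discussed directly below. The class names and the toy time-based representation of situation arguments are my own simplifications, not part of the USH formalism.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Situation:
    time: int  # a toy stand-in for the time coordinate of a situation argument

@dataclass
class SpinalHead:
    function: str               # e.g., "anchoring"
    coin: Optional[bool] = None  # [ucoin]: unvalued until a UoL associates

def associate(head: SpinalHead, uol: str) -> SpinalHead:
    # A UoL with temporal content values [ucoin] in the anchoring domain:
    # past marking -> [-coin], present marking -> [+coin].
    value = {"past": False, "present": True}[uol]
    return SpinalHead(function=head.function, coin=value)

def interpret(head: SpinalHead, complement: Situation, specifier: Situation) -> bool:
    # [+coin] asserts that the two situation arguments coincide (here: in time);
    # [-coin] asserts that they do not.
    if head.coin is None:
        raise ValueError("[ucoin] must be valued to be interpreted")
    coincide = complement.time == specifier.time
    return coincide if head.coin else not coincide

event, utterance = Situation(time=3), Situation(time=5)
anchoring = associate(SpinalHead("anchoring"), "past")
print(interpret(anchoring, event, utterance))  # True: the event time does not coincide with the utterance time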

[ucoin] has to be valued to be interpreted (Chomsky 1995). Depending on the content of the valuing UoL, different aspects of the situation arguments are activated. For example, morphological past tense marking assigns a negative value to the coincidence feature in the anchoring domain: it asserts that the

event time does not coincide with the utterance time. In contrast, present tense marking assigns a positive value: it asserts that the event time coincides with the utterance time. It is the substantive content of a given UoL that serves to value the grammatical feature. Thus, the coincidence feature functions as a bridge between conceptual and grammatical knowledge. The content determines whether the value will be positive or negative and what aspect of the abstract arguments will be ordered. In sum, according to the USH, grammatical categories are generated on a language-specific basis. They are not part of a universal repository. While subscribing to a universalist view, the USH denies the existence of universal grammatical categories. This predicts that languages use categories of different content for the same core function. In this way, the USH is a heuristic for the discovery and comparison of grammatical categories across languages. It serves as the basis for a formal typology. For any given UoL, one needs to ask three questions: where, how, and when it associates with the spine. The place of association can be gleaned from a combination of its linear order, scope properties, and the function it fulfills in the configuration of propositional meaning. There are two options for how a given UoL associates with the spine: as a head or as a modifier. If it associates as a head, it participates in contrastive marking since it values [ucoin], which is intrinsically binary [+/−coin]. Finally, the timing of association pertains to whether the UoL associates with the spine before or after syntactic computation. UoLs that associate early often display category-neutral properties since they can associate in various positions along the spine; UoLs that associate late often display categorial complexity (i.e., portmanteau properties). This defines the typological space of the USH and allows for a novel way to explore language variation. The assumption that grammatical categories are derived on a language-specific basis, but in ways that are constrained by the universal spine, reconciles a longstanding tension between the generative universalist approach and the functional typologist approach. According to the former, grammatical categories are universal; according to the latter, they are not. But if categories differ across languages, how do we even compare them and hence how is language typology possible? The universal spine provides a tool for comparison: through the abstract functions that define the layers of structure, grammatical categories can be compared to each other. Moreover, the USH provides a principled way to determine the labels of functional categories. Since the postulation of functional categories within the generative tradition, there has never been a criterion for labeling such categories and this is reflected in the diverse labels that have been proposed. But for a model that relies heavily on functional categories (or the features that comprise them), it seems essential to have a principled way to determine how to decide what can and what cannot be a functional category. According to the USH,

universal categories are defined by the abstract functions in (9), while the language-specific categories are variable. Finally, the USH allows us to make sense of the multi-functionality of UoLs. It is a pervasive property of natural languages that a given UoL may have several interpretations and that the difference in interpretation correlates with a difference in distribution. This follows straightforwardly from the USH because spinal functions contribute to the interpretation of a given UoL. In sum, there are three issues that the USH addresses in novel ways: (i) how to compare categories in light of variation, (ii) how to determine the label of a functional category, and (iii) how to deal with multi-functionality. The same issues present themselves for interactional language. There is no point of comparison because language variation in this domain has not been studied. While there is work on the syntax of speech acts, there is no consensus on the labels that are used in this domain. And finally, one of the core properties of UoLs that comprise interactional language is that they are multi-functional. Hence, the USH is an ideal framework to adopt for the analysis of interactional language. All that is needed is to extend the universal spine to include the functions that are essential for relating the propositional content to the interlocutors' knowledge states and the turns they take to exchange them. This is the essence of the ISH: it extends the spine to configure interactional language, while keeping constant the formal properties of categories. I turn to a more detailed description of the ISH in the next section.

4.3 What I Propose: The Interactional Spine Hypothesis

With language, people can exchange ideas about how they perceive the world. Linguistic interaction allows us to synchronize our minds. While social interaction need not involve language and social interaction is not the only function of language, it is undeniable that interaction is an important function of language. Hence, it does not come as a surprise that there are grammatical means to negotiate interaction. This is not to say that all utterances always contain UoLs dedicated to regulating interaction, but often they do, and sometimes they have to. The ISH seeks to model the system that regulates interactional language. It takes into consideration the lessons we have learned from decades of scholarship on the topic. In particular, we saw in Chapter 3 that there are a few key ingredients that make up interactional language. This is summarized in Figure 3.5 (see p. 49). Language in interaction is partly regulated by grammar and partly by pragmatic principles via assumptions about the normal course of a conversation. Thus, across several different frameworks, it is acknowledged that grammar not only configures the propositional content of an utterance and its illocutionary force, but it also plays an important role in the configuration of

the interactive force. In particular, grammar contributes significantly to two aspects of interaction: grounding and turn-taking. Grammar regulates the formal aspects of interactional language but does not determine when it has to be used; this is a matter of pragmatics. While the main goal of this monograph is to explore the grammatical underpinnings of interactional language, we cannot fully understand the system without an understanding of the assumptions regarding the normal course of a conversation. It is these assumptions that ultimately drive the use of interactional language, but grammar provides and regulates the means.

4.3.1 Extending the Universal Spine with Interactional Functions

In line with the lessons learned from other frameworks, I propose to extend the universal spine to include two more functions: one regulating grounding and the other regulating turn-taking, labeled as Responding, as in (1), repeated below.

(1) [RespP [GroundP [propositional structure]]]

4.3.1.1 The Grammar of Grounding

The term grounding is meant to capture the function that takes the utterance and relates it to a mental state. It is the grammatical foundation for integrating thoughts about the world into our knowledge states. It echoes Clark's concept of grounding and the Stalnakerian notion of the common ground. In what follows, I introduce my assumptions about the internal configuration of GroundP. Following the USH, I assume that all categories on the spine have the same internal makeup. Each head is a transitive function relating two arguments: its complement and its specifier. This relation is mediated by the coincidence feature [ucoin]. When valued positively, it is asserted that the argument in complement position coincides with the argument in specifier position; when valued negatively, it is asserted that the two arguments do not coincide. For the propositional spine, the arguments are situation arguments (e.g., the event
situation is related to the utterance situation). I propose that the same internal configuration is also found in the interactional spine. GroundP relates two arguments to each other via a coincidence feature. But what are these arguments? The complement of GroundP is the propositional structure embedded under it. For the specifier position, I propose that it corresponds to the ground, which in turn consists of the mental representations of our thoughts about the world.

(11) [GroundP Ground [ Ground[ucoin] [p-structure] ]]

Accordingly, the abstract argument introduced by GroundP is not a situation but a state: the mental state of the interactant. The spinal function of GroundP relates the propositional content to the knowledge states of the interlocutors so they can be communicated. Thus, an utterance embedded in the grounding structure not only encodes propositional content, it also adds a subjective component by asserting that the propositional content is or is not in the knowledge state of the interlocutor. For this proposal to work, it is crucial to assume that the ground is relativized to individual interactants. This differs from the original conceptualization of common ground (Karttunen 1974, Stalnaker 1978, 2002, Lewis 1979) as the mutually shared beliefs among the interlocutors that get updated in the ongoing conversation. The ground introduced by GroundP is not a set of shared beliefs. Instead, it represents the ground of a single interactant. Since interaction necessarily involves two people, we expect that grammar introduces a ground for each of the interactants. Thus, I propose an articulated grounding layer: one is speaker-oriented, while the other is addressee-oriented, as in (12).

(12) [GroundAdrP GroundAdr [ GroundAdr[ucoin] [GroundSpkrP GroundSpkr [ GroundSpkr[ucoin] [p-structure] ]] ]]
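To illustrate what the articulated grounding layer in (12) encodes, here is a minimal Python sketch in which each interactant has their own ground (a set of propositions) and a Ground head asserts whether the propositional content is, or is not, in that ground, depending on the value of the coincidence feature. The data-structure choices are my own and are only meant to make the intuition concrete; they are not part of the formal proposal.

from dataclasses import dataclass, field

@dataclass
class Interactant:
    name: str
    ground: set = field(default_factory=set)  # this interactant's own ground

def ground_assertion(holder: Interactant, p: str, coin: bool) -> bool:
    # GroundSpkr/GroundAdr with [+coin] asserts that p is in the holder's
    # ground; with [-coin], that it is not.
    return (p in holder.ground) if coin else (p not in holder.ground)

speaker = Interactant("speaker", {"p", "q"})
addressee = Interactant("addressee", {"q"})

# Schematically: p is grounded for the speaker but not (yet) for the addressee.
print(ground_assertion(speaker, "p", coin=True))    # True
print(ground_assertion(addressee, "p", coin=True))  # False

# The common ground is not itself represented; it can be read off as the
# publicly shared part of the individual grounds.
print(speaker.ground & addressee.ground)  # {'q'}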

On this view, the common ground does not have a linguistic reality, while the individual grounds of the interactants do. This assumption is consistent with proposals in the semantics literature. For example, Gunlogson (2003) assumes that we need to recognize the discourse commitments of each individual participant. Similarly, Asher and Lascarides (2003) assume a separate discourse representation for each of the cognitive states of the interlocutors (see also Rudin 2018). The ISH is, to my knowledge, the first syntactic representation of this idea. And given its syntactic nature, the question of the hierarchical organization of the two grounds arises. I propose that GroundAdr dominates GroundSpkr. While there is no obvious conceptual argument in favor of this configuration, in the course of the case studies on confirmationals and response markers, we will see empirical evidence for the structure in (12) (see also Haegeman 2014).

4.3.1.2 The Grammar of Responding

The second function of interactional language, which has been identified in various frameworks as being regulated by grammatical means, is that of turn-taking. This is what I propose to be the function of Resp(onse)P. In what follows, I introduce my assumptions about the internal configuration of RespP. RespP relates two arguments via the coincidence feature. The complement of RespP is GroundP. I further propose that RespP introduces the response set in its specifier position. The response set is a set of items that an interlocutor is meant to respond to. Hence, the function of RespP is to say whether or not the embedded utterance is in the response set of the interlocutor, as in (13).

(13) [RespP Resp-set [ Resp[ucoin] [GroundP] ]]

The response set roughly corresponds to the notion of the Table in the sense of Farkas and Bruce (2010) (see section 3.5.3): the discourse component to which the interlocutors are tending. To the best of my knowledge, the present proposal is the first attempt to syntacticize the notion of the Table. Though note that Farkas and Bruce (2010) emphasize that what is put on the Table has to be syntactic objects paired with their denotations. Within the ISH, this requirement comes for free: what is asserted to be in the response set is a syntactic object, namely the complement of Resp.

Now the question arises as to whether the response set is also indexed to the individual interlocutors, just as the ground is. I propose that it is and that this is what is at the heart of regulating turn-taking. Specifically, I propose that the response set can be indexed to either the speaker or the addressee and that this difference is a defining characteristic of initiating and reacting moves, at least of those that are marked as such. When the response set is indexed to the addressee, it is asserted that the utterance is or is not placed into the addressee's response set, as in (14). Thus, the positive valuation of the coincidence feature in this type of RespP asserts that the speaker requests a response from the addressee. This is a way to explicitly mark an utterance as an initiating move. When the coincidence feature is valued negatively, it is asserted that the utterance is not placed in the addressee's response set. This can still be an initiating move, but no response is required.

(14) a. [RespP Resp-setAdr [ Resp[+coin] [GroundP] ]]
     b. [RespP Resp-setAdr [ Resp[–coin] [GroundP] ]]

Conversely, the response set can also be indexed to the speaker, as in (15). In this case, it is asserted whether or not the utterance is in the speaker's response set. A positive valuation of the coincidence feature now asserts that the speaker's utterance addresses something that is in their response set. This is a way to explicitly mark an utterance as a reacting move. When the coincidence feature is valued negatively, it is asserted that the utterance is not addressing any item in the speaker's response set. This can still be a reacting move, but it is explicitly marked as a non-response.

(15) a. [RespP Resp-setSpkr [ Resp[+coin] [GroundP] ]]
     b. [RespP Resp-setSpkr [ Resp[–coin] [GroundP] ]]
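The two-by-two system in (14) and (15) can be summarized with a small Python sketch that types a marked move by whose response set is targeted and by the value of the coincidence feature. The function and its return strings are illustrative glosses of my own, not part of the formal proposal.

def type_marked_move(indexed_to: str, coin: bool) -> str:
    # Gloss the interactional import of a RespP, following (14)-(15):
    # addressee-indexed RespP marks initiating moves, speaker-indexed RespP
    # marks reacting moves; [+/-coin] says whether the utterance is (not)
    # placed in, or responding to, that response set.
    if indexed_to == "addressee":
        return ("initiating move: a response is requested" if coin
                else "initiating move: no response is required")
    if indexed_to == "speaker":
        return ("reacting move: responds to an item in the speaker's response set" if coin
                else "reacting move: marked as a non-response")
    raise ValueError("indexed_to must be 'addressee' or 'speaker'")

# 'Gal Gadot was amazing as Wonder Woman, eh?' -- schematically (14a)
print(type_marked_move("addressee", coin=True))
# 'Yeah, I know.' -- schematically (15a)
print(type_marked_move("speaker", coin=True))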

In a minimal turn sequence consisting of an initiating and a reacting move, each move, if typed as such, can only have one Response layer. The addressee-oriented RespP types the initiating move, while the speaker-oriented one types
the reacting move. In this monograph, I restrict myself to analyzing minimal turn sequences, but see section 7.3.3.2 for some discussion of complex moves and sequences of moves. How does the interactional structure map onto the actual moves in a conversation? While a RespP with an addressee-oriented response set has to be an initiating move and a RespP with a speaker-oriented response set has to be a reacting move, this is only a one-way implication. Not every initiating or reacting move has to be marked with the respective RespP. For example, the minimal turn sequence in (16) consists of a question and a (fragment) answer. The question is marked with rising intonation, which I assume to mark an initiating move (see Chapter 5), but the response is simply a noun phrase. There is no indication that it is marked as a reacting move.

(16) I: When did you go to bed?
     R: At midnight.

Similarly, in (17), I utters an exclamation, which is not obviously directed at the addressee. Yet once the addressee responds, the exclamation has served as an initiating move.

(17) I: Oh my gosh.
     R: I know.

Thus, utterances can serve as initiating or reacting moves without being explicitly marked as such. But this raises the question as to what, if anything, regulates the marking of these moves. In line with previous assumptions about the function of discourse markers, I argue that interactional language is only obligatory when the ongoing conversation violates certain assumptions about the normal course of a conversation. This is the purely pragmatic aspect of the ISH. While it is not the focus of the present monograph, it is still an important question to address.

4.3.2 Assumptions about the Normal Course of the Conversation

Linguistic meaning is not only comprised of what is encoded, but is often enriched by inferences. This is the essence of the Gricean program (section 3.2.1). Assumptions about conversations (based on the cooperative principle) are the driving force behind inferencing. The Gricean maxims concern assumptions about how to encode the things we want to say about the world: be informative, be truthful, be relevant, be clear, and so on. Crucially, interactional language is not about being informative, or truthful, or even relevant. Rather, interactional language is used to regulate the interaction itself: the way we synchronize our knowledge states and the way we take turns. To regulate interactional language, we have to assume that there are also assumptions

about the normal course of the process of conversation, not only about its content. I here adopt the following assumptions about the normal course of a conversation (see Sacks 1987, Farkas and Bruce 2010). (i) A conversation serves to increase the common ground (ii) Agreement is the desired outcome (iii) Agreement is to be reached as fast as possible The desire to learn things about the world (i.e., to increase the common ground) drives the fact that interlocutors place items on the Table. The desire to agree about how the world is (what Farkas and Bruce 2010 refer to as a stable state) drives the fact that interlocutors tend to the items on the Table. And the desire to do this as efficiently as possible (Sacks’s 1987 principle of contingency) drives the properties of turn-taking and sequencing. Of course, the world isn’t always like this: people don’t always agree and when they talk they are not always efficient. Neither grammar nor our assumptions about the normal course of a conversation can change the way the world is or how people relate to each other. Nevertheless, as Sacks (1987: 65) puts it, “it is not that ‘people try to do it’, it is that there is an apparatus that has them being able to do it.” The fact that language has a built-in preference to this effect is evidenced by the fact that disagreement is typically marked while agreement need not be; and similarly, a delay in response is typically marked while contingent reaction need not be. Every move that places an item on the Table is associated with a canonical way of removing that item from the Table (Farkas and Bruce 2010). Consider for example assertions. The normal course of an assertion is illustrated in Figure 4.1. The boxes above the interlocutors represent the initiator’s grounds: it consists of what the initiator knows (indexed with SELF) and what they know about their interlocutor’s ground (indexed with 27).1 In Figure 4.1, the initiator knows that p (among other things); they think that their interlocutor doesn’t know p (hence p is not in the speaker’s representation of 27’s ground). By uttering p, the initiator places p on the Table. Once their interlocutor signals that they have no objection, p is removed from the Table. The initiator knows that their interlocutor has accepted p, and hence they can update their assumption about 27’s ground. The initiating–reacting sequence illustrated in Figure 4.1 adheres to the normal course of an assertion: there is a perceived imbalance in the knowledge state of the interlocutors (only the initiator knows p); the purpose of asserting p

1

I represent the individual grounds as indexed to self and the index that corresponds to the individual who serves as the other interlocutor as a number. This is because the roles initiator and reactor as well as speaker and addressee are temporary roles that change in the course of the conversation. For ease of exposition, I keep the labels speaker and addressee to index the grounds and the response set in the syntactic representation.

4.3 What I Propose: The Interactional Spine Hypothesis

Initiation

Reaction

SELF

27 p, q, r

87

SELF

r

27 p, q, r

p, r

p 27 27

p

Figure 4.1 The normal course of an assertion

is to increase the common ground, in this case to fill the knowledge gap the initiator has identified in his interlocutor. Now consider questions, which come with different conditions, as in Figure 4.2. Among the mental states of the initiator is the fact that they do not know whether or not p is true. However, they have reasons to assume that their interlocutor knows whether or not p is true (this is represented by know (p∨¬p). Hence, they ask whether p is true and thus p∨¬p is put on the Table. Once 27 responds by asserting p, the initiator can place p into their ground; in addition, they also update what they know about their interlocutor’s ground: now they know that 27 knows p. Thus, in a way, questions are associated with inverse conditions to those of assertions: there is an imbalance in knowledge states, but now it is the initiator that lacks knowledge and they assume that their interlocutor can fill this gap. These are the assumptions that facilitate the use of canonical assertions and questions. As mentioned above, the world does not always follow our assumptions of what is normal and neither do our mental states. This is where discourse markers come into play. In case the canonical way cannot be followed, utterances are marked. UoLs that regulate the flow of the interaction are integrated into the grammar. They are often words that have propositional functions as well; they display specific ordering restrictions and they are integrated into the utterance prosodically. This suggests that there is a direct interface between grammar and the speaker’s intentions and at least some assumptions about the ongoing


Figure 4.2 The normal course of a question

conversation. That is, grammar is configured in a way that makes such markers available in the first place, which allows them to be linearized relative to the utterance and to be interpreted. This is what I refer to as the grammar of interaction. The ISH makes available a particular perspective on the interaction between the language system and assumptions about the normal course of conversation. In particular, we can assume that each spinal function is associated with a particular canonical way in which it is realized, driven by our assumptions about the world. For example, in the propositional domain, we make assumptions about the normal course of an event, and this drives how temporal aspect, a form of introducing perspective, is expressed. Similarly, in the interactional domain, we make assumptions about the normal course of a conversation and these assumptions drive how the spinal functions associated with the interactional spine are realized. The two spinal functions of the ISH, grounding and responding, are each associated with a particular canonical way they are to be realized. The driving force behind the responding layer is the preference for contiguity; the driving force behind the grounding layer is the preference for a shared common ground. The assumption that there is a preference for agreement implies that disagreement is an option but that it has to be marked. This is another argument for the separation of the ground into a speaker- and an addressee-oriented one. That is, as we have seen in Chapters 2 and 3, on classical assumptions about the common ground, assertions are used to update it. Without further assumptions,


this suggests that a speaker who asserts p automatically places it into the common ground, as in Figure 2.9 (see p. 36). But the interlocutor might disagree. This does not mean that what has been asserted won’t ever make it into the common ground. It means that what needs to be updated is the attitude of each of the interlocutors toward the propositional content. Hence, the common ground cannot just consist of propositions, it must consist of propositions and the publicly displayed attitudes toward these propositions. Disagreement is not detrimental to establishing a common ground, but it necessitates recording individual interlocutors’ attitudes toward the proposition. This is what the articulated grounding layer of the ISH grammaticizes. It serves to synchronize our knowledge states, but synchronization does not imply agreement. It just means that we know what our conversational partners believe. Consider Figure 4.3. Here the initiator has p in their ground and thinks that 27 does not. Upon uttering p, 27 responds by asserting the polar opposite: ¬p. Both p and ¬p are now on the Table, thereby creating a conversational crisis. However, this does not mean that the ground is not updated: both interlocutors know more after this exchange. The initiator, whose ground is represented in the SELF boxes in Figure 4.3, now knows that their interlocutor believes not p. Hence, they might change their representation of p as Bel(p), acknowledging that p might not be a truth after all. There are two options for how to proceed: the interlocutors may proceed to resolve the crisis, or they may agree to disagree. What is important for our purpose is that disagreement can be modeled and moreover that we expect,


Figure 4.3 Disagreement


based on the assumptions about the normal course of a conversation, that disagreement has to be marked. And this is indeed the case. When 27 utters that ¬p, there is a preference to mark the disagreement with a discourse marker such as actually, well, or uuhmm, as shown in (18). While it is not ungrammatical to have the bare assertion that ¬p, it will put the conversation in a crisis, more than if it is marked (see Sacks 1987 for discussion). (18)

I  John is leaving for Italy.
R  i.  He isn't.
   ii. {Actually, well, uuuhmmmm} he isn't.

Thus, to capture language in interaction, we have to recognize that the interlocutors keep track of their own and others' grounds separately. Moreover, any given utterance does not automatically lead to an update in the interlocutor's ground: this is what the Table model of Farkas and Bruce (2010) is meant to capture. Before anything can get into someone else's knowledge state, it has to be presented, and for the speaker to know if the addressee has accepted what is being presented, the addressee has to respond. These ingredients, which are necessary for understanding verbal interaction that leads to the synchronization of knowledge states, are part of the ISH: the grounding layer encodes what is part of the speaker's ground and what they assume to be in their addressee's ground; the responding layer captures the fact that these items are placed on the Table, inviting a reaction. Much of interactional language is used to regulate successful communication, and its use interacts with our assumptions about the normal course of a conversation.
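The bookkeeping described in this section can be made concrete with a small sketch. The following Python fragment is illustrative only: the class and function names are my own and not part of the ISH. It models an interlocutor's own ground, their representation of the other's ground, and the Table, and shows how asserting p, accepting it, or countering with ¬p updates these records.

from dataclasses import dataclass, field

@dataclass
class Ground:
    own: set = field(default_factory=set)     # what this interlocutor takes themselves to know
    other: set = field(default_factory=set)   # what they take their interlocutor to know

@dataclass
class Conversation:
    table: list = field(default_factory=list) # items currently under negotiation

def assert_p(initiator: Ground, conv: Conversation, p: str):
    initiator.own.add(p)        # p is in the initiator's ground
    conv.table.append(p)        # and is now placed on the Table

def accept(initiator: Ground, reactor: Ground, conv: Conversation, p: str):
    conv.table.remove(p)        # no objection: p comes off the Table
    reactor.own.add(p)          # the reactor's ground now contains p
    initiator.other.add(p)      # the initiator records that their interlocutor has accepted p

def counter(reactor: Ground, conv: Conversation, p: str):
    neg = "not " + p
    reactor.own.add(neg)        # the reactor holds the polar opposite
    conv.table.append(neg)      # both p and not-p are on the Table: a conversational crisis

# A asserts p; B accepts it. Both grounds end up synchronized with respect to p.
A, B, conv = Ground({"q", "r"}), Ground({"r"}), Conversation()
assert_p(A, conv, "p")
accept(A, B, conv, "p")

On this toy model, disagreement (counter) does not block the update of the individual grounds; it only prevents the Table from being cleared, which is the point made in the text.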

4.3.3 Methodology

Interactional language is a new empirical domain for generative linguistics and hence it comes with new methodological challenges. Much of the data that I report on are obtained via native speaker judgments, which has been the key methodology within the generative tradition and has stood the test of time (Sprouse and Almeida 2012, 2018). Given the necessity of obtaining minimal pairs and negative data, native speaker judgments are necessary for a thorough empirical basis. To get around the problem of context-dependence, I make use of the storyboard methodology (Burton and Matthewson 2015). Recall that a critical problem with standard elicitation techniques is the fact that it is difficult to elicit utterances in context. This is especially true for interactional language. To make the task simpler, storyboards are used: these are graphic illustrations that set up the context for a particular utterance. The focus of our investigation is language in interaction, which we study based on minimal turn-sequences (initiation and reaction). Thus, the kinds of illustrations I have used are better referred to as conversation


[Panels: Past, Conversation time, Initiation, Reaction]

Figure 4.4 Ingredients for a conversation board

boards. Conversation boards are scripted cartoons that set up the scenario for the discourse move. The ingredients of a conversation board are illustrated in Figure 4.4, though actual conversation boards may depart from this scheme. First, if it is necessary to make clear what the interactants bring to the conversation, a panel provides information that pertains to relevant past knowledge or experience (this may only concern one of the protagonists). Second, sometimes it is necessary to display what happens as the two interactants get together, even before the conversation begins. In Figure 4.4, this panel is labeled conversation time. Furthermore, there is a panel that depicts the initiation move in the form of a speech bubble and if the reaction move is under investigation, another panel depicts the reaction move. Depending on the elicitation set up, conversation boards may consist only of pictures or they may contain captions. These may be necessary if the elicitation is done in the absence of the fieldworker (e.g., online). With the use of conversation boards, we are able to overcome the problems we identified for both traditional elicitation tasks and corpus-based data collection. We are able to control the context by indicating the relevant knowledge states of the interlocutors. Furthermore, with the use of the speech bubble in a cartoon-like setting, it is made clear that informal, spoken language is appropriate. Furthermore, we are able to elicit minimal pairs and negative data. The conversation-board data included in this monograph have been collected both informally (with speaker linguists) and with naïve subjects in more formal experimental settings. However, elicitation through conversation boards is not the only methodology that has led to the generalizations reported here. In addition, corpora were consulted, mainly for two reasons. First, they help to establish the lay of the land. For example, we may look for strategies to request confirmation

92

The Interactional Spine Hypothesis

beyond eh and huh. This is done before hypotheses are developed and tested. In a second stage, after a hypothesis is developed and sufficiently tested via conversation-board elicitation, the corpus can be checked again to see whether the data confirm or falsify the hypothesis.

4.3.4 Reporting Acceptability Judgments

Interactional language is highly context-sensitive; hence grammaticality judgments, too, are context-sensitive. Context-sensitive judgments are typically treated differently from grammaticality judgments: grammaticality judgments are viewed as being independent of context and context-sensitive judgments are referred to as felicity (or acceptability) judgments. Accordingly, ungrammaticality is marked with an asterisk (*), whereas infelicity is marked with a hashmark (#). It is one of the core assumptions of the ISH that interactional language, just like propositional language, is part of grammar. If so, then the boundary between context-sensitive and context-insensitive parts of language is no longer clear-cut. This implies that the distinction between grammaticality and felicity cannot be maintained. It is for this reason that I adopt a different convention. As for the terminological convention, I use the terms acceptable or well-formed vs. unacceptable or ill-formed interchangeably. As for the notational convention, I follow Thoma (2016) and use an asterisk to mark any ill-formed utterances and (if necessary) I use a checkmark (✓) to indicate well-formedness. In order to be able to do justice to the context-sensitive nature of these acceptability judgments, I augment the asterisk and checkmark with an index for their context of use. In the remainder of this monograph, I follow this convention, unless felicity is not context-dependent, in which case, I simply use the standard convention of marking the sentence with an asterisk.

5 Initiating Moves: A Case-Study of Confirmationals

Inquiry is fatal to certainty.
Will Durant

5.1 Introduction

The goal of this chapter is to explore the grammar of initiating moves. Confirmationals such as eh and huh serve as a window into their grammatical properties. By using a confirmational, a speaker can request from their interlocutor confirmation for a particular belief; hence, they are ideally suited for initiation. When a speaker ends their utterance with a confirmational, they have to stop talking and let their interlocutor respond. For example, with the use of eh in (1), I indicates that they believe the propositional content (p) expressed in the host clause. Hence, the meaning of eh could be paraphrased as Confirm that p is true. In (1), R ’s response confirms I ’s belief: they start their reaction with a positive response marker (yes). (1)

I  You have a new dog, eh?
R  Yes. I got him last week.

As such, confirmationals are a perfect example of a UoL that encodes aspects of the interactional dimension. They serve to signal the speaker’s attitude toward the propositional content (unlike a bare declarative, they signal uncertainty) and they signal that the speaker requests a response. The functional complexity of confirmationals is perhaps somewhat surprising given that the forms that express them are often simplex, within and across languages. Canadian English confirmational eh is but one example, but confirmationals are equally found in other Englishes (2) and in other typologically and geographically unrelated languages. Below I show examples from Austrian German (3), Ktunaxa (4), Mandarin (5), and Medumba (6). (2)

You have a new dog, {eh, huh, right, no, don’t you/haven’t you}?

(3)

Du host an neichn Hund, geu.        Austrian German
you have-2SG a new dog CONF


(4)

hin ha't-i xa'ⱡȼin, qáqá?        Ktunaxa
2.SBJ have-IND dog CONF
'Please confirm that you have a dog.'

(5)

Ni xin yang le tiao gou, haH-L        Mandarin
you new keep ASP CL dog CONF
'You've got a new dog, haven't you?'
(= Confirm that it is true that you have a new dog.)

(6)

kʉ̀ lá Ú ɣʉ̀ ʉ́ ↓mbhʉ́ á        Medumba
CONF 2SG have dog CONF
'You have got a dog, don't you?'

With the exception of Medumba, where the confirmational is complex, the confirmationals in the above examples are simple particles. This tension between their simplex form and their complex function implicates the contribution of the syntactic spine. This chapter is organized as follows. In section 5.2, I introduce the grammar of initiating moves, including the syntactic analysis of confirmationals. The remainder of the chapter supports the ingredients of the analysis. In section 5.3, I discuss the relation between confirmationals and the propositional structure. In section 5.4, I provide empirical evidence that the grounding layer has to be articulated. In section 5.5, I present evidence that confirmationals can form entire paradigms. In section 5.6, I discuss UoLs that appear to be related to confirmationals in ways that are predicted by the ISH. In section 5.7, I conclude.

5.2 The Grammar of Initiating Moves

For any conversation to take place, someone has to initiate the interaction. There are numerous reasons for initiating conversations, but most, if not all, serve to increase the common ground: interlocutors may want to share some of their experiences or their knowledge and attitudes about the world, or they may want to inquire about what their interlocutors have experienced, what they know, or how they view things. In this section, I introduce my proposal about the grammatical properties of initiating moves through the lens of confirmationals.

5.2.1 The Function of Confirmationals

Confirmationals have not received much attention in the formal literature, though they have been discussed in functionally oriented frameworks. Take for example the confirmational eh. The list of functions that have been identified for eh is long and ranges over different types of descriptors, as shown below:


• reinforcement of various speech act types (Avis 1972, Love 1973, Gibson 1976, Gold and Tremblay 2006)
• affirmation and confirmation (Columbus 2010)
• checking (Columbus 2010)
• force/strength of statement (Columbus 2010)
• politeness (Columbus 2010)
• stylistic (Columbus 2010)
• equivalent of tag question (Avis 1972, Gibson 1976)
• narrative eh/anecdotal (Avis 1972, Gibson 1976, Gold and Tremblay 2006)
• request for repetition/pardon (Avis 1972, Gibson 1976, Gold and Tremblay 2006)

The proposed functions are different in nature. Some are based on syntactic context; others are pragmatic functions. With such a mixed bag, these functions are unlikely to contribute to a system of formal classification. This highlights a significant problem with the existing literature on confirmationals: while there are some excellent descriptions available, the generalizations provided are not useful for establishing a formal typology. I here propose an analysis based on the ISH with the goal of developing a formal typology that will serve to discover, diagnose, and compare confirmationals across languages. There are two core functions that define a typical confirmational, and these functions encompass the two core ingredients of the grammar of interaction: aspects of the interlocutors' attitudes toward the propositional content (their grounds) and aspects of turn-taking. In addition, contextual factors contribute to their interpretation in systematic ways. To determine their functions, it is useful to compare utterances containing confirmationals with bare declaratives that serve as assertions, on the one hand, and with polar questions, on the other. The normal course of an assertion is characterized by two conditions: the speaker knows p and has reason to believe that their addressee does not (see section 4.3.2). The first condition is an essential condition of the assertion and cannot be denied (7a). The second condition is not essential, but rather comes about via assumptions about the normal course of a conversation. Hence it can be denied (7b). (7)

a. He’s guilty. . . . *but I don’t know that. b. He’s guilty. You probably know that already.

If these two conditions are met, then an assertion is well-formed. It can be used by the speaker to propose that the addressee adopt the belief that the proposition is true. In contrast, the normal course of a polar question has the inverse conditions on use: the speaker doesn’t know whether p is true and has reason to believe that the addressee knows and can provide an answer (see section 4.3.2). Again, the first condition is essential and cannot be denied, while the second is not


Table 5.1 Knowledge states for assertions and questions

                      Belief states
                  S              A
Assertion         p              –
Polar question    p∨¬p           know(p∨¬p)

essential, but comes about via assumptions about the normal course of a conversation and can be denied (8b). (8)

a. Is he guilty? . . . *I already know. b. Is he guilty? You probably don’t know either.

Table 5.1 summarizes the knowledge states for normal assertions and questions. Recall that the addressee’s ground corresponds to the speaker’s perception of this ground. Furthermore, recall that I assume that knowledge amounts to having p in one’s ground without any propositional attitude associated with it. I use the prefix know preceding the denotation of the question to indicate that the speaker thinks that the addressee knows the answer to their question. I use this as a descriptive device without claiming theoretical significance. Now consider the use of the confirmational eh in (9). (9)

He’s guilty, eh?

Confirmationals are used if neither the normal course for assertions nor for questions holds. Eh can be used when the initiator is not certain about the truth of a proposition, but they believe it (to some extent). This is illustrated in Figure 5.1, where Bel(p) in the initiator’s ground represents uncertainty. That is, I assume that when propositions are in the ground without an explicit propositional attitude, they represent information that the ground-holder is certain about, that is, things they know; if a propositional attitude is associated with p, it correlates with uncertainty and subjectivity. This assumes that believing is a weaker attitude than knowing. Knowing implies complete conviction, while beliefs are associated with convictions that are somewhat short of complete (Mill 1865). This difference in representation of knowledge and belief mirrors Kant’s (1781) argument that knowing implies objective validity, while believing requires subjective validity. Here, the propositional attitude associated with the proposition introduces the subjective aspect. With the use of eh, the initiator puts on the Table that they have the (uncertain) belief that p. Furthermore, the initiator has reasons to assume that


Figure 5.1 The use of a confirmational

27 knows whether p or not p with certainty. In this respect, the tag question behaves like polar questions. In this context, neither unmarked assertions nor unmarked questions are felicitous: for an assertion to be felicitous, the speaker has to be certain; for a canonical question to be felicitous, the speaker can’t even have a belief. Polar questions are neutral; with confirmationals, inquiry comes with a bias. This difference is evidenced by the fact that only neutral polar questions, and not biased tag questions, can be followed by a statement that denies the belief. For polar questions, there is no belief to be denied (10a), but for tag questions, there is (10b). (10)

a. Is he guilty? I really have no idea. b. He is guilty, eh? *I really have no idea.

We can conclude that, unlike polar questions, inquiries with confirmationals encode the speaker’s propositional attitude toward the propositional content. Utterances containing eh place the speaker’s belief (Bel(p)) on the Table. Note further that both polar questions and tag questions can only be used if the speaker assumes that their interlocutor will be able to provide the answer. This is, however, not an essential condition, but seems to come about via assumptions about the normal course of a conversation. This is evidenced by the fact that this condition can be denied. (11)

a. Is he guilty? You probably don’t know either. b. He is guilty, eh? You probably don’t know either.


Table 5.2 Knowledge states for assertions, questions, and confirmationals

                      Knowledge states
                  S              A
Assertion         p              –
Polar question    p∨¬p           know(p∨¬p)
Confirmational    Bel(p)         know(p∨¬p)

Table 5.2 summarizes the differences between the conditions of use for assertions, polar questions, and utterances containing confirmationals. So how do these aspects of meaning come about? Are they part of the lexical meanings of the UoLs or do they come about via inferencing based on assumptions about the normal course of a conversation? And finally, given the ISH, we also have to ask which aspects of meaning are contributed by the (interactional) spine. Since the addressee-oriented condition is the same for confirmationals and polar questions, and since it is not an essential condition, it follows that it cannot be part of the lexical entry of the confirmational. The critical question for confirmationals concerns the bias that is introduced by the tag question, which sets it apart from both assertions and questions: how is the propositional attitude (Bel) introduced? In what follows, I propose that the grounding structure of the interactional spine contributes the attitudinal aspect of meaning.
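To see how the conditions in Table 5.2 discriminate between the three utterance types, here is a minimal sketch; the encoding is my own and purely illustrative. It treats the table as a lookup from utterance type to the required speaker state and (perceived) addressee state, and returns the types compatible with a given pair of states.

# Conditions of use from Table 5.2 (S = speaker's state, A = speaker's model of the addressee)
CONDITIONS = {
    "assertion":      {"S": "p",      "A": None},
    "polar question": {"S": "p∨¬p",   "A": "know(p∨¬p)"},
    "confirmational": {"S": "Bel(p)", "A": "know(p∨¬p)"},
}

def licensed(speaker_state, addressee_state):
    """Return the utterance types whose conditions match the given knowledge states."""
    return [u for u, c in CONDITIONS.items()
            if c["S"] == speaker_state and c["A"] == addressee_state]

print(licensed("Bel(p)", "know(p∨¬p)"))   # ['confirmational']
print(licensed("p", None))                # ['assertion']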

5.2.2 Confirmationals on the Interactional Spine

Confirmationals come in different guises. This is expected from the point of view of the USH: grammatical categories are constructed via language-specific UoLs and the way they are associated with the spine. This is true for confirmationals as well: they do not constitute a homogeneous class, and it is therefore impossible to provide a universal analysis for confirmationals. What is possible, however, is to develop a framework in which one can analyze confirmationals, and this is what the ISH provides. In this section, I introduce the analysis for the confirmational eh, which then serves as a reference point for analyzing other confirmationals. It makes predictions about other logical possibilities for the construction of confirmationals and hence serves as the basis for a formal typology of confirmationals.


There are two functions of eh: it introduces a propositional attitude (belief) and it signals that the initiator requests confirmation. In what follows, I propose an analysis for both these functions. Confirmationals introduce a propositional attitude toward the propositional content of the utterance: rather than putting the proposition on the Table, with the use of eh what is on the Table is the speaker’s belief that p. I propose that this is a function of the interactional spine. I propose that eh in (9) associates with GroundSpkr and values the coincidence feature as [+coin], thereby asserting that the utterance is in the speaker’s ground. (12)

[GroundSpkrP GroundSpkr [+coin] p]

In the absence of eh, a bare declarative is used to put p on the Table, without a propositional attitude. In this case, p is presented as a fact that is known by the initiator. It is only when GroundP is involved that the relation to p becomes a matter of a subjective attitude. I argue that this is because the fact that p is asserted to be in one’s ground via [+coin] automatically makes available the contrasting feature valuation [–coin]. That is, the spine makes available contrast, and contrast makes available sets of alternatives: if p is asserted to be in one’s ground, it makes available the alternative valuation, according to which p is not in one’s ground. And if p could be asserted to not be in one’s ground, it follows that p cannot be a known truth. Hence p must be a belief, a subjective attitude toward the propositional content (Kant 1781). Thus, the interactional spine is the source of the attitudinal content of the confirmational. Next consider the second function of eh: it signals a request for confirmation. This is the function that makes confirmationals ideally suited to marking an initiating move. I propose that this function is also a function of the interactional spine, namely the response layer. The topmost layer of the interactional spine (RespP) comes with the response set as its abstract argument in the specifier position. This response set roughly corresponds to what is on the Table. I argue that confirmationals serve to positively value the coincidence feature of Resp, thereby asserting that the complement of RespP, namely GroundP, is placed into the addressee’s response set. However, it is not the UoL eh itself that associates with Resp. Following Wiltschko and Heim (2016),


I assume that confirmational eh is complex. It consists of eh and rising intonation. The rising intonation associates with RespP, as in (13).1 (13)

[RespP Resp-set Resp [+coin] [GroundP Ground [eh: +coin] p]]
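The division of labor between the UoL and the spine in (12) and (13) can be sketched as follows; the class and function names are mine and serve only to make the composition explicit, under the assumptions stated in the text.

from dataclasses import dataclass

@dataclass
class Head:
    label: str          # e.g. "GroundSpkr" or "Resp"
    coin: str = "u"     # the coincidence feature, unvalued until a UoL associates with the head
    uol: str = ""

def associate(head, uol, value="+"):
    # a UoL (a particle or an intonational tune) values [ucoin] on a spinal head
    head.uol, head.coin = uol, value
    return head

def interpret(ground, resp, p):
    out = []
    if ground.coin == "+":
        out.append("the speaker presents '" + p + "' as being in their ground (a belief rather than a plain fact)")
    if resp.coin == "+":
        out.append("the utterance is placed in the addressee's response set, i.e. a response is requested")
    return "; ".join(out)

ground = associate(Head("GroundSpkr"), "eh")
resp = associate(Head("Resp"), "rising intonation")
print(interpret(ground, resp, "you have a new dog"))

Nothing in the entries for eh or the rising tune mentions belief or response on this sketch; as in the text, those interpretive effects come from which heads the UoLs value.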

In sum, I propose that crucial aspects of the interpretation of confirmationals derive from the interactional spine: the fact that it introduces a propositional attitude derives from GroundP and the fact that it introduces a request for confirmation derives from RespP. Hence, these aspects of meaning need not be encoded in the lexical entry for confirmationals. A key piece of evidence for this proposal comes from the multi-functionality of confirmationals. That is, according to the USH, multi-functionality is a hallmark of syntax: UoLs are interpreted differently depending on their syntactic position because the spinal functions contribute to interpretation. This is also observed with confirmationals. I assume that GroundP comes in two guises: one speaker-oriented and the other addressee-oriented. Thus, we predict that confirmationals can mark the propositional content as being part of the speaker’s ground (as in the examples we have seen thus far), but they can also mark the propositional content as being part of the addressee’s ground. This is indeed the case. Beside contexts where the initiator inquires about the validity of their belief that p, eh can be used with a different function, as in (14). (14)

I’m not guilty, eh?

Here the initiator can safely be assumed to know p with certainty since it is a statement about themselves. Thus, eh cannot serve to request confirmation for the speaker's belief. Instead, what the initiator requests confirmation for is that their interlocutor shares this belief. That is, what is put on the Table is the addressee's knowledge state. They are asked to confirm that they believe p (Bel(Adr,p)). Once the addressee signals that they know p, the initiator can update 27's ground.

1 For a discussion on the assumption that prosodic information behaves like any other UoL and thus can associate with the spine, see Truckenbrodt (2011), Heim (2019a), and Heim and Wiltschko (2020).


Figure 5.2 Another use of confirmationals: “Confirm that you know”

Here, too, the normal course of an assertion or question is not met. For a question, the speaker can't know; but for an assertion, the speaker has to assume that the addressee does not know. Neither of these conditions is met. Thus, with the use of confirmationals, knowledge states are marked that do not align with those of normal assertions or questions: this concerns both assumptions about the initiator's own knowledge state and the initiator's assumptions about their interlocutor's knowledge state. Hence, it is essential to recognize two separate grounds that interlocutors keep track of. According to the ISH, there are two separate syntactic positions that introduce these grounds. I argue that the multi-functionality of eh is syntactically conditioned: if it associates with GroundSpkr, it places the speaker's belief on the Table; if it associates with GroundAdr, it places the addressee's belief on the Table (see section 5.4 for empirical evidence). Assuming that syntax contributes core aspects of the meaning of confirmationals via the spinal functions, the question arises as to what the UoL itself contributes: what is the lexical entry for the confirmational?

5.2.3 The Core Meaning of Confirmationals

Given that the core functions of confirmationals (introducing a propositional attitude and requesting a response) stem from the interactional spine, it must be the case that the information associated with the UoL itself is minimal. A key assumption of the USH is that the content of the lexical entry serves to value the coincidence feature of the spine. We know that the confirmational serves to positively value the coincidence feature in Ground, and hence its content has to


be of the right kind to do so. While the precise lexical entry depends on the individual confirmational, there are clear patterns across languages regarding what type of UoLs can be used as confirmationals. They typically have to do with polarity. For example, in some languages, including English, polarity response markers can be used in this way (15a), as can predicates like right (15b), which predicate the truth of a given proposition. But there are also dedicated markers like eh, which, at least synchronically, do not have other functions. (15)

a. You have a new dog, yeah? b. You have a new dog, right? c. You have a new dog, eh?

These lexicalization patterns for confirmationals are consistent across languages: they relate to positive polarity and/or truth. This is the type of content that may serve to positively value the coincidence feature in Ground.2 There is nothing beyond the positive content that resides in the lexical entry of the UoL itself: it doesn't introduce a propositional attitude, nor does it introduce the request for a response. The confirmational functions derive from the interactional spine. By valuing the coincidence feature in Ground, the particle serves to assert that the propositional content coincides with the speaker's belief set. And by valuing the coincidence feature in Resp, rising intonation serves to express that the speaker wishes to place the utterance into the addressee's response set.

5.2.4 Predictions

In the remainder of this chapter, I demonstrate how this analysis applies to different confirmationals. I explore the predictions regarding the typological space for confirmationals. The formal typology based on the ISH allows us to compare confirmationals. Previous analyses and descriptions have been restricted to individual instances of confirmationals. Given that there are different forms, each of which may have different functions, it is not always easy to decide how to compare them to each other. The USH provides a basis for comparison: for each function, we are asked to identify its ingredients (both the UoL and the spinal category it associates with) and the way these ingredients relate to each other, i.e., the structure of the category. It is in this sense that a typology based on the USH is formal: it derives from the structure of categories, rather than their meaning. Thus, the comparison of confirmationals within and across languages will not be based on the UoL itself, but instead on the way it is composed. Specifically, if

2 Interestingly, the negative response marker no can also be used as a confirmational. See OsaGomez (2020) for a detailed analysis of the Spanish confirmational no. I submit that no negatively values the coincidence feature, thereby asserting a different type of bias than positive confirmationals. It is akin to the fact that complex syntactic tag questions come in two varieties: same vs. reverse polarity. They too differ in the type of bias that is introduced.


confirmationals are complex, we expect that the ingredients can be composed in different ways. Moreover, we are not restricted to comparing only those UoLs that are commonly referred to as particles. Rather, any UoL (including, for example, intonational tunes) will become part of the comparison set. That is, we expect that languages make use of different ingredients for the construction of confirmationals. The analysis of confirmationals in (13) makes the following predictions regarding the composition of confirmationals and related expressions. First, we expect that there are languages where both components are realized by means of particles, yielding a sequence of two particles – one associated with Ground, the other with Resp. Second, assuming that utterances may not always realize all of the layers of the spine, we expect to find UoLs that are attitudinal only and do not involve a response layer. In contrast, we do not expect there to be utterances that involve the response layer only, without manifesting grounding structure. This is because of the asymmetry of selection: RespP will always select for GroundP, but GroundP does not require the presence of RespP. A third prediction concerns the value of [ucoin]. We expect there to be confirmationals that associate with the head and serve to value the coincidence feature. This is what I propose for eh: it values [ucoin] as [+coin]. However, we also expect to find UoLs that value it as [–coin]. Hence, the ISH predicts the existence of paradigmatic contrasts in the realm of confirmationals. Finally, a note about linearization is in order. Based on the structure in (13), we expect that confirmationals will linearly precede the utterance and that intonational tunes are sentence-initial. This is contrary to fact, however: confirmationals and intonational tunes are usually sentence-final. There are two possibilities for addressing this issue. First, we could assume that grounding and responding structure are head-final, in which case they follow the propositional content. However, English is otherwise consistently head-initial. Hence a head-final structure is unexpected (see, however, Sheehan et al. 2017 for a principled discussion of changes in headedness). The alternative would be to assume that the propositional content moves to a position preceding the particle and the intonational tune (see Haegeman 2014). For the present purpose, the choice between these two options makes no difference.
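The first three predictions carve out a small typological space, which can be enumerated mechanically. The following sketch is my own illustration of that space, under the selectional assumption stated above (RespP selects GroundP, but not vice versa); the feature labels mirror the ISH, but the enumeration itself is not part of the proposal.

from itertools import product

GROUND = [None, "GroundSpkr", "GroundAdr"]  # which grounding head (if any) the UoL associates with
COIN = ["+coin", "-coin"]                   # how the UoL values [ucoin]
RESP = [False, True]                        # whether the response layer is also realized

def predicted_types():
    types = []
    for ground, coin, resp in product(GROUND, COIN, RESP):
        if ground is None and resp:
            continue    # ruled out: RespP always selects GroundP
        if ground is None:
            continue    # a bare clause with no interactional structure; not a confirmational
        types.append((ground, coin, resp))
    return types

for ground, coin, resp in predicted_types():
    suffix = "+ RespP (requests a response)" if resp else "(attitudinal only, no response requested)"
    print(ground, coin, suffix)
# Canadian English eh, on the present analysis, instantiates (GroundSpkr or GroundAdr, +coin, RespP).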

5.3 The Role of the Host Clause: Target of Confirmation

According to the analysis of confirmationals, the complement of Ground is asserted to be in the speaker’s ground. Thus far, we have only seen confirmationals following declaratives, which denote propositions. In this section, I show that confirmationals are not restricted to declarative host clauses. Depending on the complement clause, the function of eh appears to differ: a confirmational cannot request confirmation for the truth of a proposition if there is no proposition to begin with. This is the case for all clause types other than


declaratives, including interrogatives, imperatives, and exclamatives. With the use of a confirmational following any of these clause types, the initiator requests confirmation for the content of the complement. Thus, the function of confirmationals in these contexts provides us with the opportunity to explore the interpretation of clause types independently of the illocutionary force they are typically associated with: in the presence of a confirmational, all clause types become requests for confirmation. But what is it that is to be confirmed? Furthermore, the behavior of confirmationals in the context of different clause types also leads us to the conclusion that elements of discourse other than propositions can be put on the Table and be part of the ground. For the purpose of the discussion, I assume that the complement of Ground is the linking layer, which is realized as CP. I further assume that its core function is to encode polarity and/or clause type. Polarity will be relevant in Chapter 6; here I focus on the role of clause typing.

5.3.1 Declaratives

As a point of departure, let us again consider declarative clauses (realized with falling intonation). Declaratives are canonically used as assertions with two canonical conditions of use: the speaker knows p and the speaker assumes that their interlocutor does not. Hence the normal course for an assertion is for the speaker to inform their interlocutor. When a confirmational follows a declarative, the conditions of use change – the utterance no longer functions as an assertion. With the use of eh, the speaker indicates that they believe the proposition (Bel(p)), but that they are not certain. In addition, the speaker has to have reason to believe that their interlocutor can provide an answer (Know(Adr, p∨¬p)). Conversation Board 1 is a conversation board for eliciting confirmation for the belief that p. The first panel illustrates a time prior to the conversation. It serves to introduce the knowledge states of the initiator (Mary doesn’t know if John has a dog) and their social relation to their interlocutor (Mary is friends with John). The second panel illustrates the conversation situation. This is critical if the situation introduces new evidence that may lead the initiator to update their ground (Mary sees John walking a dog). Finally, the last panel illustrates the initiation itself.

Conversation Board 1. New dog. New info for Spkr. (Cx 1)


Cx 1: Mary and Greg are talking about their friend John. Greg wants to know whether John has a dog and Mary says she doesn’t know. The next day, Mary runs into John and she sees that he is walking a dog. She now assumes that John has a dog, but to be sure she wants him to confirm her suspicion (after all, John could be walking someone else’s dog). So she utters: (16)

a. ✓Cx1  You have a new dog, eh?
b. *Cx1  You have a new dog.

Declaratives followed by eh, but not bare declaratives, are licensed in this context. The bare declarative in (16b) requires a different context of use: a speaker doesn’t usually inform their addressee about things the addressee is likely to have privileged access to. However, there are exceptions, as for example in Cx 2, shown in Conversation Board 2. Here, the epistemic states required for a bare declarative hold: the speaker knows p and thinks that the addressee does not. In this context, the bare declarative is well-formed, but the eh-declarative is not.

Conversation Board 2. New dog. New info for Adr. (Cx 2)

Cx 2: Peter has wanted a new dog for a while but couldn’t make up his mind. His friend Betty takes fate into her hands and gets him a rescue dog. As she hands the dog to him she utters: (17)

a. ✓Cx2  Surprise! You have a new dog.
b. *Cx2  Surprise! You have a new dog, eh?

That eh introduces the propositional attitude of belief (rather than asserting knowledge of p) derives from the assumption that eh positively values [ucoin] in GroundSpkr; in addition, the rising intonation on eh is used to signal the request for a response by putting the GroundP into the response set of the addressee. (18)

[RespP Resp-setAdr [↗: +coin] [GroundP GroundSpkr [eh: +coin] [declarative p]]]


What is asserted to be in the ground and thus put on the Table derives from the content of the complement of Ground, which in the case of a declarative clause is a proposition. However, eh is not restricted to declarative hosts, as I now show.

5.3.2 Interrogatives

Eh can be used following interrogative clauses, both wh-interrogatives (19) and polar interrogatives (20). (19)

a. And who is to look after the horses, eh? b. What are you trying to say, eh? (Avis 1972, Love 1973)

(20)

a. Did that seem allright, eh? b. Isn’t that a corker, eh? (Love 1973)

Eh-interrogatives are interpreted as requests for confirmation, but the target for confirmation is not the truth or belief of a proposition: interrogatives do not denote propositions. So, what does the initiator expect their interlocutor to confirm? To establish the relevant felicity conditions for eh-interrogatives, consider first those for bare interrogatives, which are typically used as questions. A canonical question requires the speaker not to know the answer and to assume that the addressee does. A context where a canonical question in the form of a bare interrogative, but not an eh-interrogative, is licensed is illustrated in Cx 3 (Conversation Board 3), with the relevant examples in (21).

Conversation Board 3. Lecture. True question. (Cx 3)

Cx 3: Andy is in a public lecture, waiting for his friend Bob, who is late. As Bob arrives, halfway into the lecture, he wants Andy to summarize the content of the lecture thus far. Bob utters:

(21)
a. ✓Cx3  What's he talking about?
b. *Cx3  What's he talking about, eh?

Now consider Cx 4 (Conversation Board 4), which licenses eh-interrogatives in addition to bare interrogatives.

Conversation Board 4. Lecture. Confirm you have the same question. (Cx 4)

Cx 4: Andy and Bethany are in a public lecture. The lecture is rather obscure and Andy is not following, though he doesn’t think it’s his fault. Judging from Bethany’s face he determines that she also doesn’t understand (or much like) the lecture. Andy utters: (22)

a. ✓Cx4  What's he talking about?
b. ✓Cx4  What's he talking about, eh?

In (22), the initiator does not expect an answer to the question. The bare interrogative is used as a rhetorical question and the eh-interrogative is used as a request to confirm the validity of the question. Roughly the contribution of eh here can be paraphrased as Confirm that you have the same question. Within the ISH, this is analyzed as in (23): eh positively values [ucoin] in GroundSpkr, asserting that the interrogative is in the speaker’s ground, while the rising intonation on eh positively values [ucoin] in RespP, placing GroundP into the addressee’s response set. (23)

[RespP Resp-setAdr [↗: +coin] [GroundP GroundSpkr [eh: +coin] [interrogative ?]]]

But what is the content of the interrogative clause? What is asserted to be in the ground? According to the ISH, illocutionary force is not part of an interrogative CP and hence question force cannot be part of the interpretation of the CP. Thus, propositional-level content has to be able to denote questions, without introducing question force. This is what the Hamblin-style analysis reviewed in Chapter 3 does: a question denotes a set of propositions (which count as possible answers to the question). This can be represented as a disjoint set of propositions (p∨¬p) for polar questions and (p1∨p2∨p3∨ . . . ∨pn) for wh-questions.
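To make the Hamblin-style denotations concrete, here is a minimal sketch in the same illustrative style as above; the helper names are mine. A polar interrogative denotes the two-membered set {p, ¬p}, a wh-interrogative denotes the set of its possible answers, and it is this set, rather than a resolving answer, that an eh-interrogative asserts to be in the speaker's ground.

def polar_question(p):
    # Hamblin-style denotation of a polar interrogative: the set {p, ¬p}
    return {p, "¬(" + p + ")"}

def wh_question(answers):
    # Hamblin-style denotation of a wh-interrogative: the set of its possible answers
    return set(answers)

q = wh_question(["he is talking about p1", "he is talking about p2", "he is talking about p3"])

# A bare interrogative places q on the Table, to be resolved by the addressee.
# An eh-interrogative instead records a question attitude in the speaker's ground:
speaker_ground = {("Q", frozenset(q))}
print(polar_question("he is guilty"))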


Figure 5.3 Eh-interrogatives


questions are that we are trying to answer. That is, in interaction we not only strive to agree on what we all know and believe, as typically assumed in the Stalnakerian model of CG; we also have to agree on what we want to find out and what questions we are trying to resolve.

5.3.3 Imperatives

Next we turn to imperatives, which can also combine with eh. (24)

Listen, Harry, phone me before you go out tonight, eh? (Avis 1972)

Eh-imperatives are interpreted as requests for confirmation. To establish their felicity conditions, consider first those for bare imperatives, typically used as commands and requests. With a canonical command, an initiator tries to get their interlocutor to do something; they have to believe that their interlocutor is in a position to carry out the request and that without their request they might not do so (Searle 1969). If the latter condition is violated, bare imperatives are ill-formed, while eh-imperatives are well-formed. Again we observe that eh is used when the normal conditions for an imperative do not hold. In Cx 5, shown in Conversation Board 5, it is clear that the interlocutor has no intention of carrying out the action requested by the initiator. Here bare imperatives are well-formed, while eh-imperatives are not (25).

Conversation Board 5. Beer. True imperative. (Cx 5)

Cx 5: Billy and Andy are frat boys. Billy is a bully, and Andy a pushover. As Billy is sitting on the couch watching a movie, Andy sits down on the couch and tells Billy that he's tired and doesn't want to leave the couch anymore. Billy is used to his roommates serving him beer. So, he says to Andy: (25)

a. ✓Cx5  Get me a beer.
b. *Cx5  Get me a beer, eh?


In contrast, in Cx 6 (Conversation Board 6) the initiator has reasons to assume that their interlocutor already has the intention of carrying out the requested act. And in this context, bare imperatives are ill-formed whereas eh-imperatives are felicitous (26).

Conversation Board 6. Beer. Confirm that you have this desire. (Cx 6)

Cx 6: Boris and Annabelle are relaxing on the couch watching a movie. Annabelle gets up to go to the kitchen as she usually does during commercials. And she usually brings Boris a beer. Just to make sure that Annabelle is indeed planning on bringing Boris a beer, he utters: (26)

a. *Cx6  Get me a beer!
b. ✓Cx6  Get me a beer, eh?

Eh is used to request confirmation that the interlocutor already has the intention to do whatever they are requested to do via the imperative. Further evidence comes from the following examples. A bare imperative can be followed with a statement that the initiator does not believe that their interlocutor intends to perform the activity. In this context, a bare imperative is well-formed (27a), but an eh-imperative is not (27b). (27)

a. Get me a beer! (I know you don’t want to) b. Get me a beer, eh? (*I know you don’t want to)

Moreover, an eh-imperative is ruled out if it is clear from the context that the interlocutor is not aware of the speaker’s desire. In (28), I ’s question makes it clear that they do not intend to take the train; they have no idea as to how to get to San Francisco. In this context, eh is ruled out. (28)

[Strangers in the streets of Palo Alto.]
I  Excuse me, how do I get to San Francisco?
R  i.  Take the train that leaves from over there in 10 minutes. [points to train station]
       (adapted from Condoravdi and Lauer 2012: 40 (9))
   ii. *Take the train that leaves from over there in 10 minutes, eh? [points to train station]


Roughly the contribution of eh to an imperative can be paraphrased as Confirm that you share my desire. Within the ISH, this can be analyzed as in (29): eh positively values [ucoin] in GroundSpkr, asserting that the imperative clause is in the speaker’s ground, while the rising intonation on eh positively values [ucoin] in RespP, placing GroundP into the addressee’s response set. (29)

[RespP Resp-setAdr [↗: +coin] [GroundP GroundSpkr [eh: +coin] [imperative]]]

But what is the content of the imperative clause? The literature on imperatives is diverse: there are (at least) four different types of analyses. First, according to Portner (2005), imperatives encode properties, which are placed in the addressee's to-do list (see also Hausser 1980, Zanuttini 2008, Zanuttini, Pak, and Portner 2012, Roberts 2018). Second, imperatives can be viewed as modal propositions (roughly to be paraphrased as you must . . .) (Han 2000, Kaufmann and Schwager 2009, Kaufmann 2012). Third, within dynamic semantic frameworks, imperatives are taken to denote context-change potentials (Condoravdi and Lauer 2012, Murray 2014). Finally, imperatives can be analyzed as encoding intentions (Charlow 2014). It goes beyond the scope of this monograph to decide between the different ways of analyzing imperatives. For the purpose of the present discussion, we cannot directly adopt any of these analyses. This is because it is essential to the ISH that there is a clear separation between propositional content and speech act or interactional content: these two aspects of meaning are encoded in different structural positions. Thus, an analysis of imperatives is needed that clearly separates the propositional content from the interactional content. This might be possible for all of the above-mentioned analyses, but developing such an analysis is beyond the scope of the present discussion. For the purpose of this discussion, I assume that the content of an imperative is the desire for the propositional content to be true (Des(p)). In the normal course of a conversation, when uttering an imperative, the initiator has Des(p) in their ground, but does not assume that their interlocutor shares this desire. They are placing Des(p) on the Table. And once the responder indicates that they comply, the initiator will know that their interlocutor is or will be complying. Essentially, in this context putting Des(p) on the Table is interpreted as issuing a request. As for eh-imperatives, I assume that they minimally differ from bare imperatives in that the speaker assumes that their interlocutor shares their desire to make p true. In Figure 5.5, this is indicated by the fact that Des(p) is also in the initiator's representation of their interlocutor's ground. Thus, it is the denotation of the complement of Ground that determines the target of confirmation: a proposition for eh-declaratives, a question for eh-interrogatives, and a desire for eh-imperatives.
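The conclusion of this section, that the host clause fixes the target of confirmation, can be summed up in one small lookup; the encoding is mine and only restates the mapping argued for in the text (exclamatives are taken up in the next section).

# Denotation of the complement of Ground, by clause type, and the resulting paraphrase of eh
TARGET = {
    "declarative": lambda p: "Bel(" + p + ")",    # a belief about a proposition
    "interrogative": lambda p: "Q(" + p + ")",    # a question attitude (a set of possible answers)
    "imperative": lambda p: "Des(" + p + ")",     # a desire for p to become true
}

def paraphrase_eh(clause_type, content):
    return "Confirm that " + TARGET[clause_type](content) + " is in the relevant ground."

print(paraphrase_eh("imperative", "you get me a beer"))
# Confirm that Des(you get me a beer) is in the relevant ground.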


Figure 5.4 Imperatives


Figure 5.5 Eh-imperatives

5.3.4 Exclamatives

We now turn to eh following exclamatives, as in (30). (30)

Gee, what a night, eh? (Avis 1972)


Eh-exclamatives are interpreted as requests for confirmation but, again, the target for confirmation is not the truth or belief of a proposition: exclamatives do not denote propositions. So once again, we are faced with the question as to what the initiator expects their interlocutor to confirm. Consider first the conditions of use for bare exclamatives. With a canonical exclamative, the speaker expresses surprise and there are no restrictions concerning assumptions about the addressee’s knowledge state. Eh-exclamatives are used when the normal conditions for an exclamative do not hold. Consider the two contexts shown in Conversation Boards 7 and 8. In Cx 7, the speaker is surprised, while in Cx 8, the speaker is not. The eh-exclamative is used to confirm that the addressee is surprised. This is evidenced by the judgments in (31).

Conversation Board 7. Surprise party. Confirm that you are surprised. (Cx 7)

Cx 7: Anne has organized a surprise party for Charlie. Charlie was out with Bob, whose job was to distract Charlie. As Charlie enters the room and everyone shouts “surprise,” Charlie, who is genuinely surprised, utters . . .

Conversation Board 8. Surprise party. True exclamative. (Cx 8)

Cx 8: Anne has organized a surprise party for Charlie. Charlie was out with Bob, whose job was to distract Charlie. As Charlie enters the room, everyone shouts “surprise.” Observing Charlie’s surprised expression, Anne utters . . . (31)

a. ✓Cx7, *Cx8  What a surprise!
b. *Cx7, ✓Cx8  What a surprise, eh?

This establishes that the conditions of use for eh-exclamatives differ drastically from those of bare exclamatives: the speaker need not be surprised, but


they have to make assumptions about their addressee's knowledge state. It is only if the initiator assumes that their interlocutor is surprised that eh-exclamatives are well-formed: they are used to confirm this assumption. Thus, in order to understand eh-exclamatives, it is essential to recognize the role of the addressee's ground. I therefore propose that in this case eh associates with GroundAdr to assert that the speaker assumes that the utterance content is in their interlocutor's ground, as in (32). (32)

[RespP Resp-setAdr [↗: +coin] [GroundP GroundAdr [eh: +coin] [GroundSpkr [excl]]]]

But what is the propositional content of the exclamative: what is it that the speaker requires confirmation for? The literature on the propositional content of exclamatives is rich and diverse. Following pioneering work by Elliott (1971) and Dale (1971, 1974), several ingredients have been identified that make up their interpretation: (i) a high degree of involvement, (ii) emotional content, and (iii) factivity. That these are the essential ingredients of an exclamative is widely acknowledged; however, how to analyze them is controversial (see Villalba 2008 for an extensive literature overview). The denotation of an exclamative is often equated with that of an interrogative, that is, they are assumed to denote a set of propositions (Gutiérrez-Rexach 1996, d’Avis 2002, Zanuttini and Portner 2003, Abels 2004, Saebø 2005). Exclamatives have the added requirement that some of the propositions in this set are considered surprising in a given context (Zanuttini and Portner 2003). The difference between the two clause types is typically attributed to a difference in illocutionary force. For example, Gutierrez-Rexach (1996) argues that exclamatives are introduced by an illocutionary exclamative operator, which in turn includes a null emotive predicate. There are restrictions that the ISH places on the analysis of exclamatives. Specifically, we cannot associate the illocutionary force of exclamations with the clause type itself. Eh-exclamatives teach us that it may be used without the exclamatory force. And according to the ISH, force is always derived from the interactional spine. Thus, we need an analysis of exclamatives that separates propositional from interactional content. For the purpose of this discussion, I assume an informal and simplified analysis of exclamatives, which captures their main empirical properties. I propose that exclamatives (and exclamations more generally) are used when there is a change in the mental state of the speaker. This may be triggered by the observation of an unexpected eventuality. Unexpectedness triggers an emotional response and this is roughly what is expressed by means of the exclamative. As such, bare exclamatives are more akin to reacting moves: they are typically used to react to a non-verbal event, though of course verbal events too may trigger them. And they do not typically require a response – in fact,

Figure 5.6 The normal course of an exclamation (past: SELF {q, r}; conversation time: an unexpected event adds p, SELF {p, q, r}; initiation: SELF {p, q, r}, uttering p!)

I submit that exclamatives are not initiating moves; they are not used to place the propositional content on the Table. They serve as expressions of surprise. This is illustrated in Figure 5.6. At the time prior to the interaction, the speaker does not have p in their ground; at the time of conversation something unexpected happens (as indicated by the flash), which triggers them to adopt a new fact (p) into their ground. This leads to the utterance of the exclamative (p!). The conditions of use for exclamatives thus teach us about the importance of distinguishing between the time prior to the conversation and the time of the conversation. The fact that the speaker’s assumptions about the addressee play no role for the felicitous use of the exclamation is indicated in two ways: there is no ground for the interlocutor and there is no Table. On this view, exclamations are not interactional.

The conditions of use for an eh-exclamative are predictably different. Eh-exclamatives are interactional, like all utterances that contain confirmationals, and they are not interpreted as exclamations of the speaker’s own surprise. As shown in (31), eh-exclamatives can be uttered by a speaker who has no emotional investment. Rather, the contextual restrictions on eh-exclamatives require that the addressee rather than the speaker undergo a change of epistemic state. That is, an eh-exclamative is well-formed if the speaker has reason to believe that the addressee did not have p in their ground before the time of the conversation. Instead, the speaker must assume that p just entered their interlocutor’s ground and hence that they are surprised. By adding eh, the exclamation itself is put on the Table and the addressee is asked to confirm it. This is illustrated in Figure 5.7.

Figure 5.7 Eh-exclamatives (past: SELF {p, q, r}, interlocutor {q, r}; conversation time: p enters the interlocutor’s ground; initiation: the initiator utters p! eh?)

5.3.5 Summary

In this section we have seen that the multi-functionality of eh depends to some degree on its host clause. This is not surprising, given the principle of compositionality: eh is not interpreted in isolation but within its linguistic context. The host clause is interpreted as the target of confirmation for eh. This follows from the ISH: eh positively values the coincidence feature in Ground, and as a consequence the complement of GroundP (the denotation of the host clause) is asserted to be part of the speaker’s and/or addressee’s Ground. The denotation of the complement of Ground thus affects the interpretation of eh, as it determines what is being confirmed.

We have seen that confirmationals are not restricted to confirming the truth of a proposition. They can be used to confirm that an utterance may be an appropriate question, command, or exclamation. A syntactic analysis captures this by providing a structural analysis of how confirmationals combine with the other ingredients of an utterance (in interaction with assumptions about the normal course of a conversation). Without this structural analysis, it would be difficult to account for all of the different functions eh can acquire.


While it is possible to put these restrictions into lexical entries that are subsequently interpreted in a compositional way, lexicalist approaches of this sort fail to provide a restrictive framework for what constitutes a possible lexical entry. The present syntactic analysis contributes exactly that: restrictions on the functions of confirmationals.

By exploring the range of clause types that can serve as hosts for confirmationals, we have learned a number of things. First, the interlocutors’ grounds contain not only propositions but other mental constructs as well: questions, desires, and surprises. This is significant, because it means that speakers sometimes utter a question simply to have it acknowledged as valid. To validate a question that is put on the Table, it may be sufficient for the addressee to acknowledge that they have the same question; the question is now in the common ground. That is, some questions are not put on the Table to be resolved; they may become part of what we know about each other. The same holds for the other clause types that do not request confirmation of propositional truth. That is, when we synchronize our minds, we not only keep track of our knowledge and beliefs about what the world is like, but also of the questions and desires we have.

Second, confirmationals teach us that we have to separate clause type from illocutionary force. The canonical force associated with a given clause type is not available when it combines with eh: eh-utterances are always interpreted as requests for confirmation. This is precisely what the ISH captures: propositional content is encoded in the propositional structure, which ends at the CP layer, while its force is a function of the interactional structure dominating it. An analysis in which force is part of the CP is not tenable. If we were to assume that declaratives associate with their assertive force via an articulated CP projection (containing ForceP, as in Rizzi 1997), it is not clear how eh contributes the request for confirmation. A proponent of a ForceP analysis might argue that a ForceP associated with declaratives encodes assertive force, while eh might be analyzed as a modifier. If this were the case, then all of the interpretive properties of eh would have to be located within its lexical entry. This in turn would require us to assume multiple lexical entries because of the multi-functionality of eh. Under a lexicalist analysis of this kind, there is no constraint on what these possible lexical entries might be or how many of them there are. In contrast, under the ISH the interpretation of eh-clauses is compositionally derived and hence constrained by the syntactic spine.

Finally, we have also seen that it is crucial to distinguish between the speaker’s ground and the addressee’s ground. While this is acknowledged in the current literature on the common ground, to my knowledge the ISH is the first syntactic implementation of this insight. I turn to explicit arguments in favor of the syntacticization of individualized Grounds in the following section.


5.4 The Articulated GroundP

In this subsection, I motivate the proposal that GroundP is split into a speaker-oriented and an addressee-oriented layer, as in (33).

(33) [GroundAdrP GroundAdr [ucoin] [GroundSpkrP GroundSpkr [ucoin] [p-structure]]]

5.4.1 The Argument from Interpretation

Confirmationals can place restrictions on the knowledge state of the speaker, the addressee, or both. Consider again Cx 1, repeated below. With the use of eh in (34), I requests confirmation that their belief is correct. In this context, eh can be paraphrased as “confirm that p is true” and R can respond by confirming the truth of p with the positive polar response marker yeah.

Cx 1: Mary and Greg are talking about their friend John. Greg wants to know whether he has a dog again and Mary says that she doesn’t know. The next day, Mary runs into John and she sees that he is walking a dog. She now assumes that John has a new dog, but to be sure she wants him to confirm her suspicion (after all, John could be walking someone else’s dog). So, she utters:

(34)

I  You have a new dog, eh?
R  Yeah, I just got him last week.

Now consider Cx 9, shown in Conversation Board 9, and the corresponding conversation in (35). With the use of eh, I (Mary) wishes to confirm that her interlocutor has formed the belief that she has a new dog. Here, confirmation is not about the truth of the belief, but about Mary’s assumptions about the addressee’s belief. In this context, eh can be paraphrased as “confirm that you know p”. This is consistent with John’s response: no is a response to the contribution of eh (confirm that you know), rather than to the propositional content.


Conversation Board 9. New dog. Confirm that you know. (Cx 9)

Cx 9: Mary has a new dog, which John doesn’t know about. As Mary is walking her dog, she runs into John. They greet each other, but John does not mention the dog at all. So, Mary utters:

(35)

I  I have a new dog, eh?
R  No. I didn’t know that.

Based on the interpretation of eh-exclamatives, we have already seen that assumptions about the addressee’s knowledge state play a role in the interpretation of eh. But the above data show that this is also true for eh-declaratives. There are (at least) two contexts of use for eh-declaratives, and both depart from the normal conditions of use for an assertion. Eh can be used if the speaker requests confirmation for their belief that p; in this context, a regular declarative is not well-formed because the speaker is not certain about p. In contrast, in Cx 9 the assumption about the addressee does not align with the normal course of an assertion. Here I knows p with certainty, which would license an assertion, but they are not sure whether or not p is in their interlocutor’s ground. What is put on the Table is that the initiator thinks that their addressee might believe p (Bel (Adr, p)). Once the addressee signals that they know p, the initiator can update their interlocutor’s ground as illustrated in Figure 5.2 (see p. 101).

According to the ISH, much of the interpretation of eh is conditioned by the spine: the multi-functionality of eh is syntactically conditioned. The “confirm that p is true” interpretation is the result of eh associating with GroundSpkr; the “confirm that you know p” interpretation is the result of eh associating with both GroundSpkr and GroundAdr. In the latter case, eh asserts that p is in the speaker’s ground and in the addressee’s ground, and the addressee is asked to confirm that it is indeed in their ground. I assume that this double association arises via movement, as shown in (36):


(36) [GroundAdrP GroundAdr [eh: +coin] [GroundSpkrP GroundSpkr [eh: +coin] [p]]]
     (GroundAdrP: “Confirm that you know”; GroundSpkrP: “Confirm that I’m right”; eh originates in GroundSpkr and moves to GroundAdr)
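Purely as an expository aid, the structurally conditioned multi-functionality of eh just described can be restated in a few lines of Python. This is my own informal illustration, not part of the formal proposal, and all names in it are invented for the purpose:

# Informal sketch (my own): which Ground heads eh values [+coin] on determines its paraphrase.
SPKR, ADR = "GroundSpkr", "GroundAdr"

def interpret_eh(valued_heads):
    """Map the set of Ground heads valued by eh onto the resulting paraphrase."""
    if valued_heads == {SPKR}:
        return "confirm that p is true"   # (34), Cx 1: p is in my ground; is it true?
    if valued_heads == {SPKR, ADR}:
        return "confirm that you know p"  # (35), Cx 9: p is in my ground and, I assume, in yours
    return None                           # on this analysis eh must at least value GroundSpkr

print(interpret_eh({SPKR}))        # -> confirm that p is true
print(interpret_eh({SPKR, ADR}))   # -> confirm that you know p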

In sum, with the use of confirmationals, knowledge states are marked that do not align with those of normal assertions and questions: this concerns not only assumptions about the initiator’s own knowledge state, but also the initiator’s assumptions about their interlocutor’s knowledge state. For this reason, it is essential to recognize two separate grounds that interlocutors keep track of.

5.4.2 The Argument from Differences in Confirmationals

The interpretation of a confirmational depends in part on the context of use. If the addressee has authority over the truth of the proposition, then it makes sense for the speaker to request confirmation for the truth of the proposition from the addressee. In contrast, if the speaker has authority over the truth of the proposition, then it would be unexpected for them to ask for confirmation, and hence the alternative interpretation ensues. This is a purely contextual restriction that seems to determine which of the two uses of eh is appropriate.

This raises the question as to whether the interpretation of eh is determined by context alone. Specifically, one might propose that eh signals a general request for confirmation (e.g., “Confirm that this is an appropriate move”). However, this cannot be the full story: there are confirmationals whose interpretation is restricted to only one of the interpretations available for eh. Consider, for example, English huh. It functions like eh in that it allows for the confirmation of the truth of the proposition and hence can be used in contexts where the addressee has authority over the truth of the proposition, as in (37) and (39a). It cannot, however, be used to request confirmation that the addressee knows p ((38) and (39b)).

(37) You have a new dog, huh?

(38) #I have a new dog, huh?

(39) Context: An employee is in conversation with their boss after the employee made an unforgivable mistake:
a. Employee: I am fired, huh?   = confirm that p is true
b. Boss: *You are fired, huh?   = confirm that you know p

The contrast between eh and huh establishes that we cannot simply derive the difference in interpretation associated with eh from contextual inferencing. If that were the case, then huh should behave just like eh: we would not expect this variation in meaning. Instead, I propose that the reason huh is ill-formed in these contexts is that it associates with (and values) GroundAdr only, and not GroundSpkr (40). It can be paraphrased as “Confirm that you think so”. Huh is not compatible with contexts in which the speaker is committed to the truth of the proposition.

(40) [GroundAdrP GroundAdr [huh: +coin] [GroundSpkrP GroundSpkr [ucoin] [p]]]
     (GroundAdrP: “Confirm that you think so”)

This analysis accounts for the contrast in (37)–(39) as follows. In (37), the speaker might suspect that the addressee has a new dog, but this belief has not yet entered their ground: they are not committed. In (38), on the other hand, the context makes it clear that the speaker must believe the proposition (people usually know whether they have a dog). In this context, eh is used to confirm that the addressee has the same belief as the speaker; huh is not compatible with contexts in which the speaker is committed to the truth of the proposition.

While the analysis in (40) derives the facts, it runs counter to the assumption that unvalued features have to be valued, as assumed in current generative models, including the USH. But in the context of GroundP, the interpretation that arises in the absence of valuing [ucoin] is exactly what we want: it leaves the speaker’s attitude toward the proposition unmarked. Huh is used only when the speaker has no committed belief about the propositional content. If this reasoning is on the right track, we have to reconsider the validity of the assumption that [ucoin] has to be valued as a matter of UG. In the interactional structure, an unvalued feature is interpretable: it conveys the absence of grounding (Heim and Wiltschko 2020).


In sum, on the present analysis, the contextual restriction does not derive the interpretation; rather, it constrains the use of eh: if the addressee has authority over the truth of the proposition, then eh can only be associated with GroundSpkr; if the speaker has authority, then eh can only be associated with GroundAdr.

So what happens when neither of the interlocutors has authority over the propositional truth, as in contexts where truth is subjective? With predicates of personal taste, truth is relative to a particular individual, sometimes called the judge (Lasersohn 2005). To see this, consider Cx 10 (Conversation Board 10) and the corresponding example in (41).

Conversation Board 10. Movie. Subjective judgment. (Cx 10)

Cx 10: Liam and Monique are going to the movies. As they leave, they discuss the movie and Liam utters:

(41)

I  This was such a good movie
R  a. Yeah, I liked it, too.
   b. Really? I didn’t like it.

In (41), Liam conveys his personal judgment about the quality of the movie. Since this is an evaluative statement, there is no objective truth. In this situation, an interlocutor can agree with the evaluation, as in (41a), or they can disagree, as in (41b). Unlike what we find with objective statements (such as I have a new dog), disagreement here does not mean that the first interlocutor’s statement is challenged: the interlocutors can agree to disagree (what Kölbel 2004 refers to as faultless disagreement).

Now consider what happens if we add a confirmational in this context. Eh is well-formed, but huh is not.

(42)

a. This was such a good movie, eh?
b. #This was such a good movie, huh?


Eh is used to request confirmation for the evaluation by the speaker. Crucially, it is not asking whether the speaker’s evaluation is correct, but instead whether the addressee agrees with this evaluation. Hence, in this context, the contribution of eh can be paraphrased as “Confirm that you agree”. This follows from our analysis, according to which eh associates with both GroundSpkr and GroundAdr. The resulting interpretation is that eh asserts that the speaker holds the belief that the movie was good and at the same time requests confirmation from the addressee that they also hold this belief. In contrast, huh is ill-formed because it cannot be used when the speaker believes p.

Now consider Cx 11, shown in Conversation Board 11.

Conversation Board 11. Movie. Confirm that you have this evaluation. (Cx 11)

Cx 11: Monique drops Liam off at a movie while she teaches a yoga class. As she picks him up, Liam looks really excited and Monique concludes that the movie must have been really good. So she asks:

(43)

This was a good movie, {eh, huh}?

Here both eh and huh are felicitous. The difference between Cx 10 and Cx 11 has to do with who the judge is. In Cx 10, I (Liam) reports his own judgment, whereas in Cx 11, I (Monique) takes a guess at her addressee’s judgment. Hence the interpretation of (43) can be paraphrased as “Confirm that you have this judgment”. This is consistent with the proposal that huh is an addressee-oriented confirmational (and thus in GroundAdr). When huh is used, [ucoin] in GroundSpkr remains unvalued.

If the multi-functionality of eh were derived from context alone, this variation would be unexpected: it is not clear how to restrict huh so as to disallow multi-functionality.
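The difference between eh and huh can likewise be recast as a toy felicity check. The sketch below is merely illustrative and reduces the contextual restrictions discussed above to a single parameter (whether the speaker is committed to p); the function name is my own:

# Toy felicity check (illustrative only): huh values GroundAdr but leaves GroundSpkr [ucoin],
# so it is incompatible with speaker commitment to p; eh is compatible either way.
def felicitous(particle, speaker_committed_to_p):
    if particle == "huh":
        return not speaker_committed_to_p
    if particle == "eh":
        return True
    raise ValueError(particle)

print(felicitous("huh", False))  # (37): speaker merely suspects p            -> True
print(felicitous("huh", True))   # (38), (42b): speaker is committed to p     -> False
print(felicitous("eh", True))    # (42a): speaker is the judge and believes p -> True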


5.4.3 The Argument from Multiple Sentence-Final Particles

The third argument for an articulated grounding structure comes from languages that allow multiple sentence-final particles, such as Cantonese. In what follows, I review evidence presented in Lam (2014).3 Cantonese has many sentence-final particles that serve a variety of functions (see Sybesma and Li 2007). The two particles relevant here are me1 and ho2: me1 is speaker-oriented and encodes a negative attitude toward the propositional content (I don’t believe p); ho2 is addressee-oriented and encodes that the speaker assumes that the addressee believes the proposition.

To see this, consider (44), which illustrates the speaker orientation of me1. The host utterance is a question and the particle introduces a negative bias: the speaker doubts that the answer to the question is positive. Thus the me1 question cannot be followed by a statement that indicates the speaker’s positive bias (I think so) (44a). Conversely, the me1 question can be followed by a statement that confirms the negative bias (I don’t think so) (44b). Finally, (44c) shows that me1 is not sensitive to the addressee’s attitude and hence is compatible with a follow-up stating the speaker’s indifference toward the addressee’s attitude (I don’t care what you think).

(44)

a. #nei5 daai6 go3 neoi2 laa3 me1? ngo5 gok3dak1 hai6
   2SG big Cl girl Prt PrtQ-S   1SG think cop
   'Are you a big girl already? (I doubt it!) I think so.'
b. nei5 daai6 go3 neoi2 laa3 me1? ngo5 m4 gok3dak1 hai6
   2SG big Cl girl Prt PrtQ-S   1SG neg think cop
   'Are you a big girl already? (I doubt it!) I don’t think so.'
c. nei5 daai6 go3 neoi2 laa3 me1? ngo5 m4 lei5 nei5 dim2 lam2
   2SG big Cl girl Prt PrtQ-S   1SG neg care 2SG how think
   'Are you a big girl already? (I doubt it!) I don’t care what you think.'
(Lam 2014: 74 (28))

Next consider addressee-oriented ho2. It encodes that the speaker assumes that the addressee believes the propositional content but does not commit the speaker to the belief that p. In contrast to me1, ho2 is compatible with a follow-up that indicates that the speaker is positively biased toward the propositional content (45a) and with a follow-up that indicates a negative bias (45b). This shows that there is no speaker bias encoded by ho2. Rather, with the use of ho2, the speaker indicates that they assume that the addressee is positively biased. Therefore, it cannot be followed by a statement that contradicts this assessment (45c).

3 Lam (2014) labels the functional structure hosting the particles we discuss here as ForceP rather than GroundP.

(45)
a. nei5 daai6 go3 neoi2 laa3 ho2? ngo5 gok3dak1 hai6
   2SG big Cl girl Prt PrtQ-A   1SG think cop
   'Are you a big girl already? (I assume you think so, right?) I think so.'
b. nei5 daai6 go3 neoi2 laa3 ho2? ngo5 m4 gok3dak1 wo3
   2SG big Cl girl Prt PrtQ-A   1SG neg think Prt
   'Are you a big girl already? (I assume you think so, right?) I don’t think so.'
c. #nei5 daai6 go3 neoi2 laa3 ho2? ngo5 m4 lei5 nei5 dim2 lam2
   2SG big Cl girl Prt PrtQ-A   1SG neg care 2SG how think
   'Are you a big girl already? (I assume you think so, right?) I don’t care what you think.'
(Lam 2014: 74 (29))

Given that me1 is used to indicate the speaker’s disbelief, I analyze it as valuing [ucoin] in GroundSpkr negatively.4 In contrast, ho2 is used to indicate (the speaker’s assumption) that the addressee believes p and hence is analyzed as valuing [ucoin] in GroundAdr positively, as in (46).

(46)

[GroundAdrP GroundAdr [ho2: +coin] [GroundSpkrP GroundSpkr [me1: –coin] [p]]]

What is crucial for our purpose is that me1 and ho2 can co-occur. And if they do, their order is fixed in a way that is predicted by the analysis in (46): the speaker-oriented particle is linearized closer to the host clause than the addressee-oriented one, as shown in (47).

(47)

a. *daai6 seng1 zau6 dak1 gaa3 laa3 ho2 me1
   loud voice then okay Prt Prt PrtQ PrtQ
b. Scenario: Jimmy is the first in a long taxi queue. A taxi is coming, but someone not from the queue opens the door of the taxi, saying loudly that he is in a hurry. Everyone in the queue is angry. Jimmy whispers to the second person in the queue:
   daai6 seng1 zau6 dak1 gaa3 laa3 me1 ho2?
   big voice then okay Prt Prt PrtQ PrtQ
   'What, can one get by just by being loud? I assume you’d agree it’s a valid question, right?'
(Lam 2014: 64 (6))

4 I discuss negative valuation of Ground in more detail in section 5.5.


A lexical analysis of confirmationals makes no predictions about their ordering restrictions: these would have to be derived by some other constraint, whereas they are predicted without further assumptions by the ISH. Note further that the data in (47) support the proposal that the speaker-oriented category is lower than the addressee-oriented one, as in the ISH and unlike in Ross’s (1970) performative hypothesis and its current neo-performative instantiations (e.g., Speas and Tenny 2003).
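The ordering restriction in (47) follows from the hierarchy in (46) once the particles are linearized head-finally. A minimal sketch of that derivation, with my own helper name and purely for illustration:

# Sketch: head-final linearization of [GroundAdrP [GroundSpkrP [CP host] Spkr-Prt] Adr-Prt]
# yields host clause < speaker-oriented particle < addressee-oriented particle.
def linearize(host_clause, spkr_particle, adr_particle):
    ground_spkr_p = host_clause + " " + spkr_particle   # GroundSpkr head follows its CP complement
    ground_adr_p = ground_spkr_p + " " + adr_particle   # GroundAdr head follows GroundSpkrP
    return ground_adr_p

print(linearize("daai6 seng1 zau6 dak1 gaa3 laa3", "me1", "ho2"))
# -> "... me1 ho2" as in (47b); the reverse order "... ho2 me1" in (47a) is underivable.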

5.4.4 The Argument from Clause-Type Restriction

A final argument for the hierarchical organization of an articulated grounding structure comes from selectional restrictions. The ISH predicts that speaker-oriented confirmationals – but not addressee-oriented ones – may place selectional restrictions on the clause type of their host. This is because GroundSpkr is in a local enough relation with CP (namely sisterhood) to select categorial properties of CP. In contrast, GroundAdr is not. As Lam (2014) shows, this prediction is borne out in Cantonese: me1 is restricted to declarative host clauses, while ho2 can co-occur with all clause types.

(48)

A father finds that his 13-year-old daughter is drinking beer in her room.
a. nei5 daai6 go3 neoi2 laa3 ho2?
   2SG big Cl girl Prt PrtQ-A
   'Are you a big girl already? I assume you think so, right?'
b. nei5 daai6 go3 neoi2 laa3 me1?
   2SG big Cl girl Prt PrtQ-S
   'You aren’t a big girl yet, are you?'

(49)

Jimmy and Mandy have been training for a marathon race that takes place tomorrow.
a. ting1jat6 wui5 m4 wui5 lok6 jyu5 le1 ho2
   tomorrow Fut neg Fut down rain Prt PrtQ-A
   'Will it rain tomorrow? I assume you’d agree this is a valid question, right?'
b. *ting1jat6 wui5 m4 wui5 lok6 jyu5 le1 me1
   tomorrow Fut neg Fut down rain Prt PrtQ-S

(50)

Jimmy and Karl are in a shoe store, where a Thanksgiving sale is taking place. Both of them find two pairs of shoes that they like.
a. gam3 peng4, maai5 saai3 loeng5 deoi3 laa1 ho2?
   so cheap buy all two pair Prt PrtQ-A
   'It’s so cheap. Buy all the two pairs! You’d agree it’s the right action to take, right?'
b. *gam3 peng4, maai5 saai3 loeng5 deoi3 laa1 me1?
   so cheap buy all two pair Prt PrtQ-S

(51)
Scenario: Jimmy and Mandy were almost knocked down by a car. Jimmy is telling this story to their friend Karl. Mandy is listening while Jimmy is talking. Jimmy says this to Mandy.
a. zan1 hai6 hou2 him2 gaa2 ho2?
   real cop very dangerous Prt PrtQ-A
   'How dangerous! You also had this feeling, right?'
b. *zan1 hai6 hou2 him2 gaa2 me1?
   real cop very dangerous Prt PrtQ-S
(Lam 2014: 75 f. (31–34))

The distribution of me1 and ho2 is consistent with our analysis: me1, but not ho2, can place selectional restrictions on the clause type of the host clause. For completeness, note that the analysis does not predict that speaker-oriented confirmationals will always place selectional restrictions on their host clause, only that they can do so. Similarly, addressee-oriented confirmationals might place selectional restrictions on the type of GroundSpkr they may combine with, but this will not manifest itself as a restriction on clause type (see Ceong 2019 for evidence to this effect from Korean complementizers).
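The locality reasoning behind this prediction can be made explicit with a small sketch. Again, this is only an informal illustration of the sisterhood condition, with invented class and function names:

# Sketch: a head can impose a clause-type restriction only on its sister (its complement).
# GroundSpkr takes CP as its complement (so me1-type particles can select a clause type);
# GroundAdr takes GroundSpkrP, so ho2-type particles cannot see the clause type.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    label: str                         # "CP", "GroundSpkrP", "GroundAdrP"
    complement: Optional["Node"] = None

def can_restrict_clause_type(complement):
    return complement is not None and complement.label == "CP"

cp = Node("CP")
ground_spkr_p = Node("GroundSpkrP", complement=cp)
ground_adr_p = Node("GroundAdrP", complement=ground_spkr_p)

print(can_restrict_clause_type(ground_spkr_p.complement))  # True  (speaker-oriented, e.g. me1)
print(can_restrict_clause_type(ground_adr_p.complement))   # False (addressee-oriented, e.g. ho2)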

5.4.5 Summary

In this section we have seen evidence that the grounding layer is articulated: it consists of a speaker- and an addressee-oriented GroundP. Both these layers consist of the same building blocks: an unvalued coincidence feature in the head position establishes a relation between the propositional content of the utterance and the interlocutors’ grounds. According to the ISH, the individualized grounds of the interlocutors are linguistically real, whereas the classic common ground must be derived.

This raises a number of interesting questions. There are linguistic phenomena that have been analyzed as being sensitive to the common ground; definiteness is a case in point.5 Specifically, it has been argued that definite determiners mark referents that are already in the common ground, whereas indefinites mark new referents. On the current proposal, we expect that any reference to the common ground is relativized to either the speaker’s or the addressee’s ground. In other words, we predict access to individual grounds whenever the common ground is implicated. Note that there is indeed a body of literature that distinguishes between referents that are addressee-old vs. new (as opposed to speaker-old vs. new; Prince 1981). Moreover, Schaeffer and Matthewson (2005) argue that there is cross-linguistic variation relative to this distinction: in Salish languages, the construction of the common ground relies on the speaker’s beliefs only, whereas in English it relies on both the speaker’s and the addressee’s beliefs. The present proposal predicts this type of variation, whereas on the assumption that the common ground is a primitive, it is more difficult to model.

5 It is commonly assumed that the common ground not only contains propositional content, but also individuals (i.e., discourse referents) (Kamp 1981, Heim 1982).


5.5 Confirmational Paradigms

According to the USH, the spine not only contributes to the interpretation of UoLs via the spinal functions associated with each layer, but also to one of the essential characteristics of human languages: contrast and paradigmaticity. The unvalued coincidence feature associated with each spinal head needs to be valued, and thus the UoLs that serve to value these features enter into paradigmatic contrasts. The interactional spine is no different, and hence we expect to find paradigmatic contrasts in confirmationals. The purpose of this section is to demonstrate that this prediction is borne out.

5.5.1 A Paradigmatic Contrast Based on [+/−coin]

In this subsection, I present a system of sentence-final particles that instantiates the four logical possibilities predicted by the analysis: [+/−coin] in GroundSpkr and [+/−coin] in GroundAdr. Mandarin has particles that are used if the speaker is certain about the content of what is being said and particles that are used if the speaker is not. Similarly, there are particles that are used if (the speaker thinks that) the addressee is certain about the content of what is being said and particles that are used if (the speaker thinks that) the addressee doesn’t know anything about the content of what is being said.

Note that in Mandarin, there is a rich tradition of treating confirmationals syntactically, unlike in English (and other Indo-European languages). This difference in scholarly tradition reflects the fact that sentence-final particles in these languages are much more pervasive: they not only encode interactional meaning, but also traditional grammatical notions such as tense and aspect. Thus, sentence-final particles form part of a larger paradigm, and viewing them as part of grammar appears to be a matter of course. Mandarin sentence-final particles have been extensively studied within and outside the generative tradition (for the former, see Li 2006, Paul 2014, 2015, Simpson 2014, Deng 2015; for the latter, see Hu 1981, 1988, Li and Thompson 1981, Sun 1988, Chu and Chi 1999, Wang 2011, Yip and Rimmington 2015, among many others). Here I only consider four particles, with the purpose of filling out the typology predicted by the syntactic analysis.

We start with the speaker-oriented particles de and a. The former is used to convey that what is being said is in the speaker’s ground, more specifically that it has been known for a while. According to Li, Thompson, and Zhang (1998), de expresses certainty. Consider (52): the speaker needs to make clear that they are certain about the truth of the proposition, and in this context it is obligatory to use de.6


(52)

Cx 12: John was told that Mary drives to work. He wonders whether he can catch a ride. But he is not sure whether Mary drives every morning. He runs into Bob, Mary’s husband, and wants to know whether it really is true. Bob says:
a. Ta meitian zaoshang kaiche shangban de.
   she everyday morning drive work PRT
   '(I confirm that) she drives to work every morning.'
b. Ta meitian zaoshang kaiche shangban.
   she everyday morning drive work
   'She drives to work every morning.'

Next, we turn to a, which encodes that the content of what is being said is new to the speaker. It therefore conveys that the proposition was – up until now – not in the speaker’s ground. Consider the data in (53). A student thinks they are ready to graduate, but their supervisor tells them that they still need to publish a paper. This context makes it clear that the proposition is not in the speaker’s belief set; it got there only at the time of the conversation. In this context, a is used.

(53)

Student: Dou wancheng le. Wo xianzai deng zhe biye le
         'Everything is done. Now I am waiting for my graduation.'
Advisor: Buguo ni hai xuyao fabiao yi pian lunwen.
         'But you need to publish one more paper (before you graduate).'
Student: Shenme? Wo hai dei xie yi pian lunwen a
         what    I still must write one CL thesis PRT
         'What? I still have a thesis to write (I didn’t know that).'

This establishes that there is a contrast: de conveys that the speaker knows p, whereas a conveys that the speaker doesn’t know p. I analyze de as valuing [ucoin] in GroundSpkr positively (54a), and a as valuing it negatively (54b).

(54)

a. [GroundP GroundSpkr [de: +coin] [p]]
b. [GroundP GroundSpkr [a: –coin] [p]]

I now turn to the addressee-oriented particles ma and bei: ma is used if (the speaker thinks that) the addressee already knows what is being said, whereas bei is used if (the speaker thinks that) the addressee doesn’t already know. There are two ways in which the speaker might assume that the addressee already knows what they are saying. First, the speaker can have firsthand experience of the addressee witnessing the truth of the proposition, as in (55).

6 Unless otherwise noted, the Mandarin data reported in this section have been generously provided by Xiaodong (Merlin) Yang.


(55)

Mary gave John a puppy. After a month, John calls Mary to ask which kind of dog food is better for his dog. He says to Mary:
Ni shangci gei wo le tiao gou ma . . .
you last.time give me ASP CL dog PRT
Wo xiang wen ni nage paizi de gouliang hao.
I want.to ask you which brand of dogfood is good
'Remember you gave me a dog last time . . . Now I want to ask which food is good for him.'

The English translation contains remember, which has no direct correlate in the Mandarin utterance. It is simply an effect of the use of ma, which conveys that the speaker thinks that the addressee knows what is being said.

A second way in which a speaker can be sure that the addressee already knows what is being said comes about when what is said is something known by everyone. For example, Wang (2009) argues that ma is used to express the obviousness of a fact or state of affairs; and according to Chappell and Peyraube (2016: 323), ma is used for “situations which are viewed as highly evident in nature and which follow logically from the given facts.” This is shown in (56).

a. Diqiu weirao taiyang zhuan.
   Earth round sun turn
   'The Earth goes around the sun.'
b. Diqiu weirao taiyang zhuan ma.
   Earth round sun turn PRT
   '(It’s known by all that) the Earth goes around the sun.'

In contrast, bei can be used if the speaker thinks that the addressee does not already know the proposition as in (57). (57)

Cx 13: Mary knows that John doesn’t like cats. But one day, as they are shopping together in the supermarket, Mary observes that John is looking at cat toys, and the following conversation ensues.
Mary: Ni zenme kan mao de dongxi?
      'Why are you looking at the cat stuff?'
John: Wo erzi jian huilai yi zhi mao bei
      '(You don’t know that) my son picked up a cat somewhere'
      . . . yiding yao yang
      '. . . and wants to keep it anyway'

We can analyze ma and bei as UoLs that associate with GroundAdr such that ma values [ucoin] positively (58a), whereas bei does so negatively (58b).

(58)

a. [GroundP GroundAdr [ma: +coin] [p]]
b. [GroundP GroundAdr [bei: –coin] [p]]


Table 5.3 The paradigm of grounding particles in Mandarin

       Speaker-oriented           Addressee-oriented
old    de: GroundSpkr [+coin]     ma: GroundAdr [+coin]
new    a: GroundSpkr [–coin]      bei: GroundAdr [–coin]

Thus, the four Mandarin sentence-final particles instantiate the typology predicted by the current proposal: there are particles that are used to convey speaker-old and speaker-new information, and particles that are used to convey addressee-old and addressee-new information.
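For illustration only, the two-by-two typology in Table 5.3 can be stated as a simple lookup; the dictionary below merely restates the table and is not intended as an analysis:

# The Mandarin grounding paradigm of Table 5.3 as a lookup:
# (Ground layer, value of [coin]) -> particle
PARADIGM = {
    ("GroundSpkr", "+coin"): "de",   # speaker-old: p has been in the speaker's ground for a while (52)
    ("GroundSpkr", "-coin"): "a",    # speaker-new: p just entered the speaker's ground (53)
    ("GroundAdr", "+coin"): "ma",    # addressee-old: p assumed to be in the addressee's ground (55)-(56)
    ("GroundAdr", "-coin"): "bei",   # addressee-new: p assumed to be new to the addressee (57)
}

for (layer, value), particle in PARADIGM.items():
    print(layer, value, "->", particle)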

5.5.2 The Timing of Grounding

In English, the use of eh is insensitive to when the belief to be confirmed is established. However, this is not true for all confirmationals. In Conversation Board 12, the first panel makes it clear that the speaker (Mary) has no previous reason to believe that Paul has a new dog, but this changes at the time of conversation when she witnesses evidence that Paul has a dog. In this context, English eh is well-formed (59), but its Austrian German counterpart geu is not (60a). Instead, in Austrian German a sentence-internal discourse particle leicht is used in this context (60b) (Burton and Wiltschko 2015). It signals that the belief is new to the speaker. There is an element of surprise.

Conversation Board 12. New dog. New belief. (Cx 14)

Cx 14: Mary hasn’t seen her friend Paul in a while. She’s wondering how he’s doing these days. As she’s walking along, she runs into Paul by accident. Unbeknownst to Mary, Paul has gotten a new dog since they last saw each other. Mary, surprised to see Paul with a dog, is not quite sure whether it is in fact his dog.


(59) So, you have a new dog, eh?

(60) a. #Du host an neichn Hund, geu?
        You have a new dog, CONF
     b. Du host leicht an neichn Hund?
        You have PRT a new dog?

Now consider Cx 15, illustrated in Conversation Board 13. Here the first panel makes it clear that Mary has been told that Paul has a new dog. In this context, eh is well-formed (61) and so is Austrian German geu (62a), whereas leicht is infelicitous (62b).

Conversation Board 13. New dog. Old belief. (Cx 15)

Cx 15: Jimmy tells Mary that their common friend Paul got a new dog. The next time Mary runs into Paul, he is in fact walking his new dog. Even though she is pretty sure that the dog is in fact Paul’s, based on her previous secondhand evidence and the presence of the dog, she utters:

(61) So, you have a new dog, eh?

(62) a. Du host an neichn Hund, geu?
        You have a new dog, CONF
     b. #Du host leicht an neichn Hund?
        You have PRT a new dog?

The contrast between (60) and (62) establishes that confirmationals can be sensitive to the timing of when the belief was first established (i.e., the timing of grounding). In Austrian German, the contrast is realized by means of the confirmational geu (used for old beliefs) and the sentence-internal discourse particle leicht (used for new beliefs). Thus, it appears that we need to distinguish between different times of grounding.

But how can we model this? So far, I have argued that eh values [ucoin] positively: it is used to indicate that what the speaker says is in their ground. I argue that the observed sensitivity to the timing of grounding is not conditioned by temporality in the usual sense (something akin to tense). Rather, it arises via the interpretation of coincidence. Specifically, I propose that German and English differ as to how the positively


Table 5.4 Variation in the interpretation of [+coin]

                          German    English
belief established at t