Understanding Conversational Joking: A Cognitive-Pragmatic Study Based on Russian Interactions 2020017465, 9789027207357, 9789027260925

This book examines the diverse forms of conversational humor with the help of examples drawn from casual interactions am


English Pages [299] Year 2020



Table of contents :
Table of contents
List of figures
Transcription conventions
1. Introduction
2. Conversational joking from a discourse-analytic perspective
3. Humor as a cognitive phenomenon
4. Conversational humor from a discourse-semantic perspective
5. Conclusion
References
Appendix
Index


Understanding Conversational Joking Nadine Thielemann

John Benjamins Publishing Company

Understanding Conversational Joking

Pragmatics & Beyond New Series (P&bns) issn 0922-842X Pragmatics & Beyond New Series is a continuation of Pragmatics & Beyond and its Companion Series. The New Series offers a selection of high quality work covering the full richness of Pragmatics as an interdisciplinary field, within language sciences. For an overview of all books published in this series, please see benjamins.com/catalog/pbns

Editor
Anita Fetzer, University of Augsburg

Associate Editor
Andreas H. Jucker, University of Zurich

Founding Editors
Jacob L. Mey, University of Southern Denmark
Herman Parret, Belgian National Science Foundation, Universities of Louvain and Antwerp
Jef Verschueren, Belgian National Science Foundation, University of Antwerp

Editorial Board
Robyn Carston, University College London
Thorstein Fretheim, University of Trondheim
John C. Heritage, University of California at Los Angeles
Susan C. Herring, Indiana University
Masako K. Hiraga, St. Paul’s (Rikkyo) University
Sachiko Ide, Japan Women’s University
Kuniyoshi Kataoka, Aichi University
Miriam A. Locher, Universität Basel
Sophia S.A. Marmaridou, University of Athens
Srikant Sarangi, Aalborg University
Marina Sbisà, University of Trieste
Paul Osamu Takahara, Kobe City University of Foreign Studies
Sandra A. Thompson, University of California at Santa Barbara
Teun A. van Dijk, Universitat Pompeu Fabra, Barcelona
Chaoqun Xie, Fujian Normal University
Yunxia Zhu, The University of Queensland

Volume 310 Understanding Conversational Joking A cognitive-pragmatic study based on Russian interactions by Nadine Thielemann

Understanding Conversational Joking A cognitive-pragmatic study based on Russian interactions

Nadine Thielemann Vienna University of Economics and Business

John Benjamins Publishing Company Amsterdam / Philadelphia


The paper used in this publication meets the minimum requirements of the American National Standard for Information Sciences – Permanence of Paper for Printed Library Materials, ANSI Z39.48-1984.

DOI 10.1075/pbns.310
Cataloging-in-Publication Data available from Library of Congress: LCCN 2020017465
ISBN 978 90 272 0735 7 (Hb)
ISBN 978 90 272 6092 5 (e-book)

© 2020 – John Benjamins B.V. No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher. John Benjamins Publishing Company · https://benjamins.com

Table of contents

List of figures
Transcription conventions

Chapter 1 Introduction
1.1 Working hypothesis and research questions
1.2 Organization and structure
1.3 Data

Chapter 2 Conversational joking from a discourse-analytic perspective
2.1 Reconstructing laughables
2.1.1 The equivocal nature of laughter
2.1.2 Practices for the construction of humorous laughables
2.1.3 Problems and challenges with the laughable-approach
2.2 Contextualizing humor
2.2.1 What exactly is contextualized in joking?
2.2.2 Contextualization cues for humor in Russian conversations
2.3 Summary and interim conclusions

Chapter 3 Humor as a cognitive phenomenon
3.1 Humor as inferential communication
3.1.1 Humor as non-compliance with Gricean Maxims
3.1.2 Relevance theoretic approaches to humor
3.1.3 Summary
3.2 Humor and the restructuring of mental representations
3.2.1 SSTH and GTVH
3.2.2 Humor and the Graded Salience Hypothesis
3.2.3 Humor as de-automatized conceptualization
3.2.4 Summary
3.3 Humor as play with resources privileging interpretations
3.3.1 Linguistic conventions
3.3.2 Textual and discursive regularities
3.3.3 Genre
3.3.4 Social norms
3.3.5 World knowledge
3.4 Summary and interim conclusions

Chapter 4 Conversational humor from a discourse-semantic perspective
4.1 Langacker’s Current Discourse Space model
4.2 Clark’s joint action hypothesis
4.3 Fauconnier and Turner’s blending theory
4.3.1 Mental spaces and blends in interaction
4.3.2 Conceptual configurations characterizing humorous cognition in interaction
4.4 Summary and interim conclusions

Chapter 5 Conclusion

References
Appendix
Index

List of figures

Chapter 3
Figure 1. Set-theoretic presentation of the LM as mapping
Figure 2. sdat’ ego (‘return it’) vs. sdat’ na resajkling (‘return for recycling’)

Chapter 4
Figure 1. Langacker’s (2001: 145) Current Discourse Space
Figure 2. Bipolar usage event according to Langacker (2001: 146)
Figure 3. Layers of action according to Clark (1996: 354)
Figure 4. Mental space prompted by (3) according to Fauconnier (1997: 44)
Figure 5. Configuration of mental spaces triggered by (4) according to Coulson (2001: 23)
Figure 6. Conceptual integration network displaying the emergent conceptual structure of a blend according to Fauconnier and Turner (2003: 46)
Figure 7. Unfolding of a network of mental spaces and blends in conversation (Source: Ehmer (2011: 179), slightly adapted)
Figure 8. Network of mental spaces reconstructed from Example (5)
Figure 9. Network of mental spaces reconstructed from Example (6)
Figure 10a. Blend reconstructed from Example (9)
Figure 10b. Blend from Example (9) in its sequential context
Figure 11a. Blend reconstructed from Example (10)
Figure 11b. Blend from Example (10) in its sequential context
Figure 12a. Consistently sustained blends reconstructed from Example (11)
Figure 12b. Sequential blending reconstructed from Example (11)
Figure 13. Conceptual structure of a double grounded metaphor according to Brône and Coulson (2010: 216)
Figure 14. Sequential de-blending reconstructed from Example (12)

Transcription conventions

Transcription conventions adapted from Zemskaja and Kapanadze (ed.) 1978 (followed by NkruJa)

Signs indicating terminative intonation:
//                termination of an utterance or turn constructional unit
!                 exclamation
?                 question

/                 continuative intonation
[]                overlapping, simultaneous talk
…                 hesitation pause
Ma-aš, M-možno    lengthening
ugu               signal of affirmation, acknowledgment
gm                signal of doubt

Transcription conventions adapted from Selting et al. (1998)

[ [               overlapping, simultaneous talk
=                 latching between two units
(.)               micro pause
(-), (--)         shorter pauses (0.25–0.75 sec.)
(2.0)             estimated duration of pauses longer than 1 sec.
:, ::, :::        lengthening of a syllable, according to its duration
äh                hesitation marker
.h, .hh           breath in, according to its duration
h, hh             breath out, according to its duration
(h)               laughter particle inserted in speech
haha, hehe, hoho  syllabic laughter, according to the quality of the vowel
akCENT            strong accent, higher volume than surrounding speech

Pitch contour on the syllable carrying the intonation contour:
?                 high rising
,                 rising to a mid-level
-                 constant
;                 falling to a mid-level
.                 deep falling

((coughs))        para- and nonlinguistic actions and events
[sign missing in source]  para- and nonverbal actions and events accompanying speech and their scope
[sign missing in source]  interpretive comments and their scope
()                incomprehensible episode
(tak)             assumed wording

Chapter 1

Introduction

This book is about humor in interaction. This concept covers the most diverse forms and genres of play which emerge in talk-in-interaction (e.g., Bell (ed.) 2017; Norrick and Chiaro (eds.) 2009; Hay 2000; Kotthoff 1998; Norrick 1993a). Although some authors may have terminological preferences, conversational joking (e.g. Norrick 1993a, 1994, 2003) and conversational humor (e.g. Kotthoff 1999, 2006b) can be used interchangeably with humor in interaction (e.g. Norrick and Chiaro (eds.) 2009) as umbrella terms for variously shaped chunks of non-serious utterances triggering pleasure and amusement. In interaction, a unit of humor may consist of a single ironic or nonsense utterance, or an instance of wordplay, but it can also be built up by a collaboratively developed sequence of teasing, fictionalizing or otherwise non-serious turns. Alternatively, it can be couched in a longer narrative presented predominantly by a single interlocutor. Furthermore, humor in interaction may or may not involve a punch line. It may be purely playful but can also contain an element of aggression. It can be initiated deliberately or evolve unintentionally from a lapse. And, while it may be triggered by linguistic mechanisms, this need not be the case since humor can also rely on play with pragmatic or ontological knowledge. Moreover, a playful chunk can make a meaningful contribution to the conversation or serve merely to entertain. This great variety makes the analysis of conversational humor a challenging task, especially if we look – in a wider perspective – for a unifying concept of humor able to embrace and account for conversational humor in its various guises. Linguistic approaches to humor can be divided essentially into three groups, according to their particular perspective on the phenomenon as a whole and to the particular forms and genres of humor on which they focus. What is more, each group of approaches tends to be affiliated with a particular theoretical concept of humor. 
The first group of approaches starts from the assumption that humor can be attributed to the configuration of a verbal stimulus. Scholars who adopt this perspective are basically interested in wordplay or puns, which they describe in a preponderantly structuralist manner. Humorous effects are explained as resulting from the violation of rules concerning syntagmatic combinability or paradigmatic
exchangeability, on various levels of the linguistic system. A fundamental output of such analyses consists in taxonomies classifying wordplay according to the level of the linguistic system on which a norm is violated, or to the mechanism employed to achieve an aberration from the norm (e.g., Attardo 1994: 108–142; Bucaria 2004; and for Russian e.g., Ščerbina 1958; Freidhof 1984; Freidhof (ed.) 1990; Kosta and Freidhof 1987; Sannikov 2003; Norman 2006; Il’jasova and Amiri 2009). These analyses suggest that humor stems from a contrast made apparent by the violation of a linguistic norm, or more precisely, from the contrast between the norm-compliant option and the deviant realization. This contrast is taken to be an essential feature of humor. These approaches therefore ascribe verbal humor – preponderantly in the guise of wordplay – to the concept of incongruity.

Discourse analyses of joking sequences, which make up the second group of approaches, generally do not aim at describing the phenomenon under investigation in terms of any essential features. As a result, they are happy to skip attempts at defining such an ‘indefinable’ phenomenon as humor (e.g., Hay 2000; Lampert and Ervin-Tripp 2006; Coates 2007; Zemskaja 1983, 1995a). Instead, discourse-analytic approaches to humor prefer to classify humorous sequences in terms of play (cf. Huizinga 2004; Gruner 2000), focusing on the indicators of a playful modality observable on the discourse surface and characteristic of the performance. Such approaches are interested in the way in which a joking activity is designed and organized, and in its social impact on the interlocutors’ relationship. In this sense, joking can serve a wide range of social purposes “from bonding to biting” (Boxer and Cortes-Conde 1997: 275).
On the other hand, discourse-analytic approaches mostly avoid the idea that humor can be explained as emerging or resulting from a specific characteristic explicable in terms of a basic humor concept, such as superiority or incongruity, and applicable to diverse forms of joking. The features typically accompanying and singling out jocular utterances and sequences are treated as symptoms of humor and not as its essential traits.

Finally, cognitive approaches conceive of humor as a recipients’ phenomenon (Brock 2004a) that is essentially cognitive in nature, and build on the assumption that humor results from the way a stimulus is processed. Frequently, these approaches model the type of processing specific to humor theoretically on the basis of jokes that serve as test cases (e.g., Dynel 2018b; Ritchie 2018; Yus 2003, 2016; Giora 2003; Raskin 1985; Attardo and Raskin 1991; Coulson 2001; Suls 1972; Graesser, Long and Mio 1989; Long and Graesser 1988). Jokes are a prototypical genre of humor in which the humorous effect can be primarily attributed to a punch line that causes a cognitive disturbance, and it is such disturbances, or dissonances, that, on this view, characterize humor-specific processing. The perceived dissonance often triggers a cognitive reorganization which enables the recipient to find sense in nonsense (Freud 1905). This humor-specific processing
is therefore also referred to as incongruity-resolution. Defining humor in terms of disconfirmed expectations was a suggestion made by Kant or Schopenhauer, who can thus be regarded as precursors to the scholars who take this cognitive perspective today. As previously remarked, each of these three groups of approaches highlights a different concept of humor that matches the forms or genres of humor it focuses on. Not surprisingly, each analytic framework proves less capable of explaining forms of humor outside that focus. As a result, humor linguistics displays a kind of eclecticism which deserves attention, especially bearing in mind the diversity of conversational humor. Seemingly, there is no comprehensive concept wide enough to embrace conversational joking in all its diversity. In particular, discourse analysis, with its focus on observable symptoms, avoids the challenge of defining humor in terms of essential features or identifying a basic concept to which it can be traced back. This book, by contrast, endeavors to develop an analytical approach capable of encompassing conversational humor in terms of a basic concept that embraces joking in all its various guises. Indeed, that is the first of its two goals, in pursuing which its main argument will be that incongruity (and its momentary resolution) can be devised in such a way as to encompass diverse forms and genres of conversational humor, including those without a punch line. On the other hand, the work aims to scrutinize cognitive linguistic and pragmatic frameworks for their potential both to analyze conversational joking and to account for the specific features responsible for the humorous effects arising from its diverse forms. In the latter perspective, we advocate a cognitive-pragmatic approach to humor in interaction, an approach which explains how cognitive processes and utterance features interact and contribute to the humor-specific construal of meaning-in-interaction. 
In line with cognitive approaches, we start from the assumption that humor is essentially characterized by a disruptive or puzzling way of processing a stimulus, which is externalized in a measurable rise in processing effort (Coulson 2001; Giora 2003; Brône 2009). To anticipate the main line of argumentation: When processing a humorous stimulus, the recipient is forced to shift from a cognitively privileged and more easily accessible interpretation to one that is cognitively less privileged, and therefore less accessible, in the widest sense. In the case of conversational humor, this disruptive and costly processing is facilitated by the layout of the utterances concerned, yet also influenced or guided by pragmatic factors and by co-activated, encyclopedic background knowledge. In other words, linguistic, pragmatic and encyclopedic knowledge may all be involved in triggering the humor-specific contrast of interpretations. As a result, dissonance can exist on various levels of face-to-face interaction. In conversational humor, contrastive interpretations can be activated and triggered by various means (i.e., verbal, para- and
nonverbal input) and are enriched by pragmatic and encyclopedic knowledge. Humor – and especially conversational humor – is thus a phenomenon which is best analyzed within a framework which combines both cognitive and discursive or pragmatic perspectives. Consequently, in reconstructing humor as a cognitive-pragmatic category (cf. Kotthoff 1998; Brock 2004a; Zima 2013a; Brône 2009; Dynel 2009a), this work will combine models and analytical approaches from discourse analysis and cognitive linguistics in order to grasp the humor-specific cognition of jocular talk-in-interaction. Using data preponderantly stemming from casual conversations among Russian interlocutors, we will show, first, how various forms and genres of humor trigger contrastive interpretations on different levels of talk-in-interaction – each time clashing with a cognitively privileged option that matches the expectations nurtured by norms and normalcy. Second, we will demonstrate how formal cues and features setting apart humorous sequences from surrounding, non-humorous discourse facilitate and contribute to this process. This book is therefore also concerned, in a wider perspective, with the questions of how interlocutors interpret and understand utterances in interaction, and how they make meanings available to each other. This question is addressed by both discourse analysis and cognitive linguistics, and indeed plays a crucial role in each of these disciplines. From a discourse-analytic point of view, conversation analysis relies mostly on overt displays of how an utterance is understood and reacted to by its recipient(s) in the next turn. Cognitive linguistics, by contrast, provides models of how meaning, once triggered by linguistic input and enriched by pragmatic and encyclopedic knowledge, is mentally represented within one conceptualizing mind. Langacker, however, stresses that “an individual mind is not the right place to look for meanings. 
Instead, meanings are seen as emerging dynamically in discourse and social interaction” (2008: 28). Thus neither of the disciplines alone seems to be able to provide a fully satisfactory account of the construction of meaning in conversation. In combining and reconciling the two approaches to explain and analyze how meaning is jointly constructed and emerges in talk-in-interaction, we will follow the recently productive strand of research focusing on cognition in interaction (e.g., Langacker 2013b; Ehmer 2011; Zima 2013a; Oakley and Hougaard (eds.) 2008; Hougaard 2005; Langlotz 2010; Deppermann 2002; Imo 2011; van Dijk 2006; Kibrik 2011; Stadelmann 2012; Pascual 2014). In this context, working with conversational data provides a key advantage. Strictly speaking, cognitive phenomena, such as inferences or mental representations, remain linguists’ reconstructions or interpretations insofar as they are not externalized in some way (cf. Haugh 2017). In face-to-face interaction, however, meaning is jointly constructed, and recipients’ reactions and follow-up turns contribute fundamentally to the
meaning of an utterance. As Imo (2011: 274) succinctly puts it in a slightly modified quote from van Dijk (2006: 164): “Cognitions are not observable – but their consequences are.” And those consequences can be accessed using conversational data (cf. Deppermann 2018). In humor, we are dealing with creatively derailed cognition. Shifting from one interpretation to another motivates the creation of accounts for this shift and causes a restructuring of the cognitive environment. The processing of humorous utterances makes additional meanings available, establishes new links and breaches expectations about what can ‘normally’ be expected to happen. In this process, a contrast or incongruity is perceived which can be momentarily, but creatively resolved. Humor thus represents an interesting test case, which is a powerful argument against those who object that it is a topic too marginal to merit consideration. Summing up arguments why linguistics should be interested in humor, Brône (2009: 18–21) stresses that a seemingly marginal phenomenon relying essentially on aberration and deviance enables the detection and description of regularities that otherwise cannot be observed very distinctly or visibly. He points in particular to advances in cognitive linguistics which rely essentially on the analysis of ‘deviant’ phenomena such as metaphor or metonymy (2009: 18) and provides a quote from Fauconnier which tellingly stresses this additional benefit:

“Errors, jokes, literary effects, and atypical expressions use the same cognitive operations as everyday language, but in ways that actually highlight them and can make them more salient. As data, they have a status comparable to laboratory experiments in physics: things that may not be readily observable in ordinary circumstances, which for that reason shed light on ordinary principles.” (Fauconnier 1997: 125)

Similarly, both Il’jasova and Amiri (2009: 29–32) and Weiner and De Palma (1993: 183) point to the potential offered by humor and language play, as instances of aberration, to learn more about the ‘normal’ rules governing linguistic phenomena and language use. It is therefore not surprising that several recent publications (e.g., Arons 2012; Goatly 2012; Dubinsky and Holcomb 2011) employ humorous examples in order to explain regularities of language structure, norms governing language use or standard forms of language processing. In particular, humorous instances involving a norm breach serve to illustrate how the norm, or regularity, functions. Prime examples are the introductions by Goatly (2012) and Arons (2012), the former focusing on semantics and pragmatics, the latter on the levels of the linguistic system. Dubinsky and Holcomb’s (2011) Understanding Language through Humor even attempts to cover both these aspects, being designed as a comprehensive introduction to linguistics that covers topics ranging from phonetics and phonology, morphology and syntax, to semantics, pragmatics, variational linguistics, language acquisition and intercultural communication. Similarly, several chapters of Gorelov and Sedov’s (1997) textbook on psycholinguistics examine the violation of norms evident in jokes and other humorous texts in order to explain the ways in which utterances and texts are structured and processed ‘normally’.

1.1 Working hypothesis and research questions

Throughout this book we will pursue the central working hypothesis that conversational humor in its various guises can indeed be accounted for in terms of a unifying basic concept: incongruity. Starting from the assumption that humor results essentially from a clash of contrastive interpretations, either successive or simultaneous, differing in their cognitive accessibility, we will be particularly concerned with the resources that lend a cognitively privileged status to an interpretation. We suggest that these resources can be subsumed under the label of norm(s) in the widest sense, covering not only linguistic norms, pragmatic norms, and default ways of processing (e.g. default construal operations), but also expectations nurtured by general experiences of normalcy which acquire the status of knowledge. When joking in various ways, interlocutors breach and play with a very wide range of norms. The first challenge, then, is how to account for the norms and resources that establish expectations and privilege interpretations, and which are played with in conversational humor. Interlocutors’ playful deviation from interpretations and expectations which are favored by norms and normality assumptions serves, at the very least, to generate a communicative surplus of amusement and pleasure. The shift between, and combination of, different interpretive frames in joking sequences is often further exploited for the creation of novel senses and additional meanings, referring to richer cognitive processes that usually remain in the background. These come to the fore in joking, especially when interlocutors react to humorous utterances with more play. Thus, the second challenge is how to model the process of humor-specific cognition in interaction, including the emergent and interactive nature of talk-in-interaction, and thus explain the emergence of creative meanings and interpretations in humorous discourse.
Special attention will be devoted to the means by which speakers signal to each other that an utterance or a sequence of turns is not meant to be interpreted seriously. Such cues guide recipients’ interpretation of what is said and what is going on in interaction. Nevertheless, we argue that to treat all of them as mere signals indicating a humorous discourse modality is to adopt an excessively narrow perspective. So our third challenge is to establish the contribution of those features that single out jocular utterances and sequences from surrounding non-humorous discourse to the construal of humorous meaning-in-interaction.

1.2 Organization and structure

In order to pursue these research questions and to validate the working hypothesis, the analysis will test and combine methodological approaches from two fields; on the one hand, discourse linguistics, and, on the other, cognitive linguistics and pragmatics, which focus on how utterances are processed. This is reflected in the structure of the book, the body of which is divided into three major chapters. Chapters 2 and 3 discuss, respectively, the discourse-analytic and cognitive, or processing perspectives on humor, and include reviews of the relevant literature. Whereas discourse analysis predominantly focuses on observable “peculiarities of the performance” (Kotthoff 2006b: 299), modeling meaning and interpretation almost exclusively as a members’ category externalized in interlocutors’ reactions, cognitive-linguistic and pragmatic approaches concentrate on modeling how speakers understand utterances and build up conceptualizations. Chapter 4 then develops an alternative, discourse-semantic approach. Combining ideas from those previously discussed, this allows jocular utterances and sequences to be re-analyzed in terms of specifically structured mutual discourse representation(s) emerging in interaction. The overall structure of the book is thus dialectic. While the approaches discussed in Chapter 2 take as their starting point the layout or design features of humorous discourse, those treated in Chapter 3 model the way in which humorous utterances are processed. Chapter 4 provides a synthesis by developing an analytic framework which explains how both cognitive processes and the design features of utterances contribute to the construal of meaning-in-interaction characteristic of humorous discourse. The rest of this section provides the reader with a more detailed overview of the content of the remaining chapters, enabling him or her to skip a chapter if they so wish. Indeed, the chapters are designed to be read separately and thus in any order. 
Nevertheless, it is recommended to follow the order in which they are presented. Chapter 2 takes a discourse-analytic perspective on conversational joking which basically aims at reconstructing humor as a members’ category. Discourse analysis (including interpretive sociolinguistics and conversation analysis) provides heuristics for the identification of humorous utterances and sequences in face-to-face interaction. Its methodology helps in revealing the cues to which interlocutors themselves orient when interpreting an utterance as serious or non-serious. Consequently, discourse analysis offers a neutral procedure for identifying and extracting humorous utterances in face-to-face interaction, a procedure which
is applied in this research. Nevertheless, Chapter 2 also addresses the problems which deprive the procedure of its seemingly straightforward character. Laughter, for example, is a resource employed in various ways and does not exclusively occur in humorous contexts. Similarly, humor delivered deadpan, without any overt marking, poses a challenge to this procedure. Chapter 2 presents the cues to which Russian interlocutors orient when joking and which single out humorous utterances. It offers a semiotic classification which suggests that such cues contribute to the construction and interpretation of humorous turns and sequences in various ways. The crucial question thus turns out to be: What is actually signaled and contextualized by these cues and what exactly is their status for humor? This question will be taken up in Chapter 4, which offers a cognitive-pragmatic account of how cues guide and shape interlocutors’ interpretation of what is going on in discourse. Chapter 3 adopts a cognitive perspective on humor which localizes the phenomenon primarily in the way a stimulus is processed. Here, we advance the argument that humorous stimuli force the recipient to shift or oscillate between two interpretations or meanings that differ in their cognitive accessibility. What distinguishes conversational humor is that this cognitive contrast, or contrast in accessibility, can pertain to meanings and interpretations referring to extremely diverse aspects of talk-in-interaction. The chapter is therefore concerned with approaches which model how recipients arrive at a contextualized interpretation of an utterance, either in Gricean terms or in terms of restructured mental representations. The unexpected, remote additional meanings and interpretations that are generated when joking are explained by Gricean approaches in terms of inferences triggered by the breaching of pragmatic principles.
By contrast, approaches of the latter type map humor-specific processing onto a mental representation that is structured, or restructured, in a manner different from the default manner of conceptualizing, or which forces recipients to skip prioritized meanings or interpretations. Additional meanings and interpretations made available in humor accordingly result from the interaction of items and structure evoked and included in the mental representations, which are enriched by pragmatic and encyclopedic knowledge. Against this background, Chapter 3 is devoted to the mechanisms and resources which cognitively privilege an interpretation or conceptualization, and shows that they can be theoretically modelled in terms either of pragmatic constraints or of frames. The notion of norm is suggested as an overarching concept to capture the diverse resources privileging conceptualizations and shaping expectations in talk-in-interaction. Norms may be, inter alia, of a linguistic, textual, discursive or social nature, since humor can in principle rely on play with, or the breaching of conventions in any of these areas.

Chapter 4 combines and reconciles ideas from cognitive pragmatics and discourse analysis to develop a discourse-semantic approach that meets the demands of analyzing spontaneous verbal online creativity. In conversational joking, humor derives from the one-off or repeated establishment of tension and contrast resulting from the breaching of various norms. Consequently, the analytical account must also take a process perspective, consider that meaning in interaction is jointly negotiated and address the impact of utterances’ design on their understanding. Current discourse representation theories (e.g., Clark 1996; Langacker 2001, 2013b) consider aspects such as the emergent and shared character of a mental representation in talk-in-interaction to limited degrees only. Against this background, we advocate a discursive extension of blending theory (Fauconnier and Turner 2003; Coulson 2001; Coulson and Oakley 2000) which allows for the discourse-semantic analysis of conversational sequences and thus enables humorous cognition in interaction to be traced. The chapter provides an analysis of the conceptual configurations underlying humorous sequences. In doing so, it also reveals the varying impact of linguistic and multimodal input on conceptualization. The discourse-semantic framework that it introduces further allows for a re-assessment of the cues and features characterizing and accompanying humorous utterances, which constituted the focus of Chapter 2. Some of these merely indicate a mellow atmosphere, or perhaps facilitate the perception of an unexpected contrast by drawing attention to something peculiar going on. Others, however, go further. For they elicit and structure a mental representation of what is going on in discourse, and thus guide interlocutors’ interpretations and expectations – which can then be played with. 
Finally, Chapter 5 sums up the book’s main argument that conversational humor in its various guises subscribes to a basic humor concept, namely norm breaching, which can be traced back to incongruity-resolution, and that conversational humor is best analyzed in a discourse-semantic framework combining ideas from cognitive linguistics and discourse analysis.

1.3 Data

Video data can be regarded as a prime source for the analysis of joking activities that emerge in face-to-face interaction, and of the features characterizing them. Such data offer access to all potentially relevant cues for humor including facial expression, gesture and body language (cf. Ford and Fox 2010; Wu 2003). However, video-recording strongly influences a person’s behavior and very often causes excessive self-monitoring that deprives the data obtained of much of their naturalness. In order to limit the negative impact of this observer’s paradox, our analysis
relies mainly on audio data. Of course, audio-recording may also affect and influence the situation, but it can be regarded as less intrusive since the visual channel is omitted from observation. The data on which this research is based stem from different sources and in principle allow for the analysis of all potentially relevant non-visual humor cues. They come predominantly, but not exclusively, from casual conversations among adults living in larger cities with a relatively high educational level. Since socially determined preferences for specific forms and genres are beyond the scope of our research, this corpus structure seems acceptable. Unfortunately, not all the sources offer access to audio data but merely to transcripts. The corpus of conversations analyzed has three parts. The first is composed of tapes collected by the author from participating interlocutors. These were provided with digital voice recorders and asked to record gatherings with their friends, colleagues and relatives in casual situations, mostly in their own homes. They were also asked to secure informed consent to the recording well in advance. Interlocutors were aged between the early twenties and the early forties, and most were students or graduates resident in larger cities. The group included both Russian native speakers and heritage speakers (who had mostly completed at least some school education in a Russian-speaking country). The second part of the corpus consists of data from the One day of speech project (Odin rečevoj den’ / ORD) hosted at the State University of Saint Petersburg.1 The ORD corpus is a unique source of authentic spoken Russian comprising whole days of speech of Saint Petersburg residents (Sherstinova 2009; Asinovsky et al. 2009). Participants came from differing social and age groups, and had varying professions and educational backgrounds. They were equipped with digital voice recorders and asked to tape their speech days. 
As a result, ORD contains both private and institutional communication. Like the author’s own tapes, this source offers access to the audio data, something extremely valuable for the analysis of joking sequences. Unlike sources which provide transcripts only, these first two parts of the corpus allow iterative re-assessment of transcripts and repeated checking of the audio tape itself. This is especially important since ready-made transcripts generally prepared for other purposes restrict the ability to consider paralinguistic cues such as laughter, or changes in articulation or voice quality, which play an important role in framing an utterance as non-serious.

1. The author wishes to express her gratitude to Tatiana Sherstinova and the whole ORD team for granting her access to the ORD corpus, and to the University of Hamburg for its award of a scholarship for her research stay in Saint Petersburg in autumn 2013, during which ORD data were trawled for humorous episodes.

Nevertheless, ‘edited’ and transcribed collections of conversations were also consulted, and they make up the third part of the corpus. Some stem from the Russian National Corpus (NkruJa), which includes a subsection of spoken data. Others are drawn from edited collections providing transcripts of conversations conducted in the variety of colloquial Russian (russkaja razgovornaja reč’) habitually ascribed to well-educated urban dwellers in casual face-to-face interaction (cf. Koester-Thoma and Zemskaja (eds.) 1995). The dialogues documented by Zemskaja and Kapanadze (eds.) (1978) and Kitajgorodskaja and Rozanova (eds.) (2005) provide data of this kind originally taped in Moscow, and are for the most part included in the Russian National Corpus. Finally, use was made of a collection edited by Šalina (2011), which also includes conversations among working class people and blue collar workers originally recorded in the city of Ekaterinburg in the Urals. Their speech displays features of a non-standard variety of Russian (prostorečie). In presenting extracts from these sources, various transcription conventions were used. While the conventions of the edited collections and the Russian National Corpus were preserved (see Appendix), the author’s data were transcribed, and those taken from ORD adapted, according to the conventions given in the Appendix, where identifiers are again specified. Each extract is also provided with sufficient context to allow for contextualized understanding.

Chapter 2

Conversational joking from a discourse-analytic perspective

Discourse analysis provides methodological frameworks for the analysis of conversational humor in terms of a members’ category. This combines with the benefit of approaching conversational joking without any preconceived concept or theory of humor that necessarily defines it as an analyst’s category. For the sake of terminological clarity, discourse analysis serves as a cover term for various approaches to language above the sentence level (Schiffrin 1994). In this chapter, conversation analysis, interpretive or interactional sociolinguistics, and also the ethnography of communication, will all feature prominently under this label. These frameworks provide methods for identifying what is presented and perceived as funny by the interlocutors themselves, while focusing on the devices and strategies which make a stretch of talk non-serious or not entirely serious. Humorous utterances are usually set apart from the surrounding discourse by several design features. Methods allowing for the detection of these features enable the analyst to identify and select humorous instances from a corpus of authentic face-to-face interaction, and to do so from a participant’s perspective. At the same time, the instances gathered in the process of reconstructing conversational humor as a first-order phenomenon are subsequently available for re-analysis from an analyst’s perspective, designed to identify a second-order concept of humor wide enough to embrace conversational joking in its various guises. As the main argument of this book evolves, we will show how approaches focusing on first-order or participant-constructed categories and on second-order or analyst-constructed categories can inform each other with regard to conversational humor. 
A stance free from any preconceptions about the phenomenon of interest – in our case, initial neutrality concerning theoretical concepts of humor that try to trace it back to incongruity, superiority or relief  – derives particularly from conversation analysis and its interest in “emic social reality” (ten Have 1999: 36). Pike’s (1993) distinction between emic and etic perspectives on a phenomenon is matched by the dichotomy members’ category vs. analyst’s category. Ten Have (1999: 36) resorts to this dichotomy in explaining conversation analysis’s exclusive focus on what counts as relevant for the participants in an interaction when accomplishing a social action. He quotes Pike: “Emic descriptions provide an internal
view, with criteria chosen from within the system. They represent to us the view of one familiar with the system who knows how to function within it himself” (Pike 1967: 38). Although conversation analysis aims at describing “members’ knowledge-in-use, that is, members’ methods or ‘procedural infrastructure of interaction’” (ten Have 1999: 36), ten Have advises against an overly narrow understanding of a members’ category which runs the risk of equating it with members’ use of linguistic labels for that category. He (1999: 37) refers the reader to a similar caveat expressed by Goodwin, who suggests determining “emic phenomena in terms of how phenomena are utilized within specific systems of action, not with labels recognized by informants” (Goodwin 1984: 243). Accordingly, in Goodwin’s own research “structures that participants attend to within a strip of talk (…) have been specified, not by questioning the participants, but rather through study in detail of the actions they perform as the talk itself emerges” (1984: 243, quoted from ten Have 1999: 37). Similarly, in the context of our study, the touchstone for selecting humorous instances from the given corpus is not necessarily the attachment of a linguistic label to a particular discursive episode classifying it as an occurrence of a specific form or genre of humor or simply as funny. Rather, it is the mutual orientation of the interlocutors to a given feature signaling humorous intent.
This is in line with Eisterhold, Attardo and Boxer (2006: 1241–1242) who:

are in complete agreement with Kreuz (2000: 104) “the job of the listener is to recover the discourse goals of the speaker and not to identify some rhetorical label like irony or understatement.” It is probably both the existence of folk-taxonomies that distinguish between, say, irony and exaggeration, and the fragmented way in which the domain of indirect speech has been examined that has led to a loss of perspective, whereby phenomena such as irony, understatement, teasing, etc., have been seen as existing outside of the intentions of the speakers and outside the contexts in which their utterances take place.

This extensive quotation is justified by the authors’ strong plea for the implicit reconstruction of phenomena from a members’ perspective that lies at the heart of this chapter. Summing up, discourse-analytic approaches which aim at revealing what interlocutors orient themselves to when judging whether an utterance is humorous or not initially assume no theoretical concepts of humor. Prosodic, verbal, pragmatic and other features observable in humorous utterances are regarded as indicators or symptoms of humor. Thus, discourse analysis of conversational humor pursues the logic of abductive reasoning in defining its object. Recurrent forms or sequential patterns of joking are determined and characterized in an inductive procedure on the basis of their frequent and similarly designed occurrence. This sufficiently warrants the assumption that interlocutors obtain tacit knowledge
of how to participate in conversational joking and how to signal and recognize humorous intent. Discourse analytic methods aim at revealing this tacit knowledge of members’ methods. Nevertheless, deeper insight into members’ methods should lead to tentative or open categories of the humorous, allowing for further empirical testing and finally resulting in an empirically informed theoretical account of humor. Kotthoff (1998: 93–96) stresses the benefits of combining abductive and inductive methods in analyzing conversational humor, since both guarantee a theoretical perspective without any theoretical preconceptions and ensure an explorative methodology based on testing, validating and possibly substantiating initially open and tentative categories. In matching a dialogue of theory and empiricism, the same author (1998: 95) assigns abductive methods a crucial role in determining why people laugh and how joking sequences are accomplished in conversations. Abduction is the only way to generate hypotheses as explanations for recurrent effects – in Peirce’s terms ‘guessing’ – which can later serve to derive rules (Kotthoff 1998: 93).

As he [i.e. Charles Peirce (CP), N. Th.] says, “[a]bduction is the process of forming explanatory hypotheses. It is the only logical operation which introduces any new idea” (CP 5.172); elsewhere he says that abduction encompasses “all the operations by which theories and conceptions are engendered” (CP 5.590). Deduction and induction, then, come into play at the later stage of theory assessment: deduction helps to derive testable consequences from the explanatory hypotheses that abduction has helped us to conceive, and induction finally helps us to reach a verdict on the hypotheses, where the nature of the verdict is dependent on the number of testable consequences that have been verified. (Douven 2011)

Following the heuristics of discourse analysis sketched above, this chapter aims to reveal how the Russian interlocutors in our data signal to each other that they are joking. The features of humorous utterances to which recipients orient themselves when decoding the non-serious dimension of an utterance are crucial in reconstructing conversational humor as an emic category. These features also serve as pointers for the extraction of humorous sequences from the data corpus. In this context, accounts of humor within two discourse analytic frameworks are assessed: conversation analysis (2.1) and interpretive sociolinguistics (2.2). Due to its methodological rigor which constrains any preemption, conversation analysis avoids the notion of humor even terminologically and deals with it as a phenomenon intertwined, yet not coincident with the laughable (i.e., a turn eliciting laughter). The equivocal and ambivalent nature of laughter makes it a contingent indicator of humor. This, along with other aspects such as the occasional absence of laughter as a response, suggests interpretive sociolinguistics as an alternative discourse analytic framework utilizable for the analysis of conversational humor. Relying
essentially on contextualization theory, interpretive sociolinguistics promotes an understanding of conversational humor in terms of a discourse modality actively established by the interlocutors themselves using various contextualization cues. This chapter aims to evaluate and assess these two discourse analytic frameworks for the analysis of conversational humor, which differ mainly in their understanding of context, their handling of ethnographic background knowledge and the analyst’s involvement in the interpretation process. As a practical outcome, the features or cues of humor observable on the surface of discourse are available for thorough re-analysis that questions their precise status, ranging from mere indicators to iconic cues and inherently funny elements. In a further perspective, this raises the question of what exactly is contextualized by these elements and cues, which will be taken up in Chapter 4.

2.1 Reconstructing laughables

Jocular utterances and sequences are usually set apart from surrounding serious discourse by their design. Moreover, joking distracts interlocutors from the matter at hand and consequently affects conversational organization, from the sequential level up to the organization of larger discourse chunks and episodes (Norrick 1993a: 20–42). On the micro-level, joking and laughter are often treated as an adjacency pair, suggesting an underlying stimulus-response relationship between them. Yet this is a rather simplistic perspective owing to the ambiguous nature of laughter, the variety of possible reactions to humor and the fact that humorous utterances may be delivered deadpan. This subsection adopts a conversation-analytic perspective on laughter and its potential to serve as an indicator for humor, while also pointing to problematic aspects of this approach. Firstly, it is important to stress that conversation analysis avoids the notion of humor since this would require some kind of a priori concept, something that conversation analysis is generally reluctant to apply. Conversation analysis broaches the issue of joking in the context of investigation into the laughable (e.g., Schenkein 1972; Holt 2007, 2013b; Ford and Fox 2010). Yet laughter is not merely a reaction to humor but occurs in conjunction with a range of social actions and tasks in conversational management (e.g., Glenn 2003; Holt 2013a; Glenn and Holt (eds.) 2013; Wagner and Vöge (eds.) 2010; Vöge 2008; Jefferson 1979, 1984, 1985). Terminologically, the laughable is defined “retroactively to describe any referent that draws laughter or for which I can reasonably argue that it is designed to draw laughter” (Glenn 2003: 49). Scholars differ in determining the extension of what is laughable. In Holt (2013b: 73), the term “refers to a turn or component of a turn” eliciting laughter. Ford and Fox’s (2010: 349) analysis is based on a broader
understanding of “laughables as actions and sequences of actions which participants formulate with and respond to with possibly laughter relevant displays.” In the broadest sense, any stretch of talk – be it a segment of an utterance, a whole turn or a sequence of turns – which elicits laughter as a response is laughable. Conversation-analytic investigation into the laughable aims at detecting “potentially laughter relevant” (Jefferson 2010: 1478) features of turns and actions (i.e., features to which interlocutors orient by laughing). The interest is in how interlocutors design an utterance or a turn in order to present it for others as laughable. Glenn (2003: 33) stresses that conversation analysis does not ask why people laugh, in the sense of looking for inherent characteristics of utterances which would undoubtedly elicit laughter. Rather, the focus is “on what people display to each other and accomplish in and through their laughter” (Glenn 2003: 33). He is particularly pessimistic concerning the impact of humor-theoretical accounts and denies the possibility of an intrinsic characteristic trait, determinable a priori, which lends an utterance a humorous quality. Instead, he argues that:

[v]irtually any utterance or action could draw laughter, under the right (or wrong) circumstances. This fact dooms any theory that attempts to account coherently for why people laugh (…). Although speakers design some turns at talk specifically to provide for recipient laughter (for instance, the punch line of a joke), the distinction between what does and what does not count as laughable, or what makes some particular items humorous (a notion overlapping but not synonymous with laughable) remains elusive. (Glenn 2003: 49)

Accordingly, “[f]unniness becomes understood not as an inherent property of a message, or the internal state of a social being, but rather as a jointly negotiated communicative accomplishment” (Glenn 2003: 33). In a similar fashion, Holt (2013b: 70) treats “nonseriousness” as “a result of the negotiation and collaboration of participants”. Conversation analysis is therefore mainly interested in the cues, means and techniques by which interlocutors actively define something as funny and, in doing so, mutually accomplish laughability. Ford and Fox (2010: 365) consequently suggest “understanding these practices as deployed within and constitutive of the courses of action in which amusement is reciprocally enacted”. Thus, all these scholars agree in understanding joking and jocular utterances in terms of shared or, better, jointly enacted amusement. In this context, laughter remains a key indicator and display of funniness, nonseriousness, amusement or mirth – all these being alternative terms used to avoid humor. Yet laughter is also an uncertain indicator of funniness or humor. Before discussing the humor-relevant displays and practices so far observed by conversation analysts as means of constructing laughables, we will take a closer look at the ambivalent nature of laughter and, hence, the laughable.

2.1.1 The equivocal nature of laughter

Laughter is a regular resource in interaction that is utilized by interlocutors in various ways – not all of which are related to funniness or amusement. Conversation analysis does not work on the assumption that laughter functions “as a pure mood display” but determines the “meaning(s) of any particular bit of laughter (…) by characterizing the local, sequential environments in which it occurs” (Glenn 2003: 34). As a “non-speech sound” (Jefferson, Sacks and Schegloff 1987), it can be placed within a turn or sequence in various positions where it is regularly oriented and reacted to. Summarizing conversation-analytic research on laughter, Holt (2013a) and Glenn (2003) present various placements and uses of laughter, ranging from marking turn or topic termination, or modulating the strength of an action, to signaling social hierarchies. In doing so, they reveal the various meanings of laughter in face-to-face interaction. Laughter can be shared or unilateral (i.e., recipients can refrain from joining in). Early conversation-analytic studies detect a preference for the recipient laughing along with a first turn containing laughter or laughter particles (Jefferson 1979; Jefferson, Sacks and Schegloff 1987). Shared laughter may signal affiliation “with the conveyed stance of the prior speaker” (Holt 2013a: 2; cf. Jefferson 1984: 348–349). Yet there are context-sensitive deviations from this regularity. Jefferson (1984), for example, shows that speakers engaged in trouble-telling rely on laughter and laughter particles to display their trouble resistance. At the same time, she argues that when recipients do not also laugh in listening and responding to the trouble-telling, they signal their trouble receptiveness. 
Jefferson, Sacks and Schegloff (1987) make similar observations, with laughter within turns and sequences presenting improprieties (e.g., verbal or moral lapses, taboo topics) in order to pursue intimacy: If recipients react to such delicate narratives with laughter or escalating comments, they display their complicity by demonstrating that they share amusement about the same issues; intimacy is created. However, if recipients withhold laughter, the issue is usually not elaborated further since there is no overt display of affiliation. These regular uses and distribution patterns of laughter illustrate that laughter, as well as speech interspersed with laughter particles, can and often does acquire meanings different from funniness. Laughter is neither necessarily an indicator of, nor a reaction to something humorous going on in conversation (Attardo 1994: 10–14). At this point, it is worth stepping back and taking a wider perspective on laughter which does not narrow it down to an interactional resource. Laughter is a physiological reaction and yet a psychologically relevant display (Provine 2000; Poyatos 1993). It is a biological behavior that is, nevertheless, socially constrained (Schwitalla 2001: 326). Culture, social roles and situations, for example,
all influence the intensity of laughter, as well as whether laughter is considered appropriate at all. The funniness of the stimulus itself is not the sole influence on the quality or intensity of the laughter response (Attardo 1994: 11). Kotthoff (1998: 105–109) stresses the twofold nature of laughter. She defines it as an activity which is “at the same time bodily and symbolic” (“körperliche und symbolische Aktivität gleichzeitig” 1998: 105). Due to this twofold structure (“Doppelstruktur”), it is both emotive display and semiotic resource (Kotthoff 1998: 107). This enables the two perspectives, the social and the psychological, to be reconciled. By their laughter, interlocutors define something as worth being laughed about and point to a specific emotive stance that laughter expresses in itself. This turns laughter into an “iconic strategy for the transformation of meaning” (“ikonisches Verfahren der Bedeutungsveränderung” 1998: 195). It is this potential that lends laughter the character of an “index or indexalic icon” (“Index oder indexalisches Ikon” 1998: 195). As Kotthoff still treats laughter as a key indicator of humor, the corresponding emotive stance is exhilaration (“Erheiterung”). However, studies on the distribution of laughter and laughter particles in sequences of improper talk (Jefferson, Sacks and Schegloff 1987) and trouble-telling (Jefferson 1984) suggest that the concept of exhilaration does not adequately cover such instances since it contains a component of pleasantness. Chafe (2007) proposes an alternative approach to humor and laughter in tracing them both back to one and the same feeling: the feeling of nonseriousness. He explicitly challenges the idea that “laughter is subservient to humor” and instead argues that “laughter itself is subservient to a feeling, and that the feeling is what we ought to be treating as underlying laughter and humor both” (Chafe 2007: 1).
Chafe (2007: 65) ascribes a “pleasantness component” to this “complex feeling”, which inhibits “a default tendency to take things seriously – in other words, an unwillingness to incorporate such things into one’s knowledge of how the world really is.” Following his line of argumentation, humor is just one technique for deliberately eliciting the feeling of non-seriousness. Other than by humor, this feeling can also be elicited in other situations and even accidentally. The situations Chafe (2007: 73–87) proposes, however, mostly seem to lack a pleasant component. Essentially, he identifies undesirable and abnormal situations as those in which laughter, as an indicator of the feeling of non-seriousness, occurs in the absence of humor. Among the former type, Chafe mentions inappropriate choice of language (e.g., vulgar and obscene words), self-deprecation, criticism or talk about things that are disgusting or depressing (2007: 74–81). The abnormal situations in which laughter occurs are, inter alia, characterized as anomalous or surprising, or as containing something unexpected or awkward (2007: 82–95). Although these characterizations of non-humorous, yet laughter-triggering situations may not necessarily facilitate a clear-cut classification, they are nonetheless inclusive
of the regular placements of laughter particles in improper talk or trouble-telling mentioned above. This makes quite appealing the idea of deriving the occurrence of laughter from a feeling of non-seriousness. At the same time, laughter displays this stance. In a similar vein, Chafe motivates the occurrence of laughter in these situations psychologically – “the nonseriousness mitigates or ameliorates to some degree the undesirability or abnormality (…) adding that feeling, communicated through laughter, helped lessen whatever negative feelings were associated with what has been said” (2007: 85). This explanation is also consistent with most of the non-humorous uses and occurrences of laughter and smile voice articulation observed by Schwitalla (2001). He shows how laughter often softens the emotional or social impact of potentially threatening actions, delicate topics or surprising and paradoxical situations. In his data, laughter also occurs in addressing awkward topics and in confessions of moral, social and other lapses. Laughter further functions as a hedge, allowing the speaker to dissociate themselves from what is said (2001: 340). Schwitalla finally points to occurrences of laughter in modifying the potential of face threatening acts for recipients, that is in facework (e.g., mitigating as well as aggravating critique, ridicule) or protecting speaker’s face needs during presentations of lapses or failures (2001: 333–337). To sum up, laughter is an uncertain indicator of funniness and amusement, and does not necessarily point to something humorous going on. Bearing in mind the twofold nature of laughter as a semiotic resource and an emotive display, laughter in many cases nevertheless both indexes and expresses a specific emotional attitude towards a stretch of talk. 
Chafe’s suggestion that laughter goes back to a feeling of non-seriousness is definitely very comprehensive and has the potential to include cases which, as they lack pleasantness, are not prototypically classified as humor or amusement. However, these cases display features traditionally associated with humor (e.g., surprise, something paradoxical or unexpected) and frame a stretch of talk as not meant to be interpreted seriously, thus preventing harm being caused by delicate or undesirable issues or actions. Not surprisingly, conversation-analytic research into laughables continues to treat laughter as a “prime indicator of humor or play” (Glenn 2003: 8; cf. Partington 2006). Against the background of Chafe’s argumentation, however, it might seem reasonable to follow Holt’s lead and apply to the laughable the notion of non-seriousness, since not all instances necessarily testify to mirth, exhilaration or amusement. “[L]‍aughter, in a variety of sequential positions, is recurrently centrally bound up with notions of nonseriousness: it is often the clearest clue that some turns are being designed to be nonserious or are being rated as such” (Holt 2013b: 73).

2.1.2 Practices for the construction of humorous laughables

Conversation-analytic investigation into the laughable aims at revealing the recurrent design features of utterances which turn them into laughables and which are ratified by laughter responses (Holt 2011: 394). Although non-seriousness is the broad label encompassing the most diverse laughables, this subsection focuses on laughables co-constructed in the achievement of amusement: on humorous laughables, that is. These are essentially characterized by shared laughter, or by interlocutors’ affiliative and escalating utterances expressing mirth and testifying to extended and shared amusement (Ford and Fox 2010: 343; Hay 2001). Reviewing research on humorous laughables also draws attention to the shortcomings of a strictly conversation-analytic approach and clears the way for another discourse-analytic approach, namely for an analysis within the framework of interpretive sociolinguistics. Working on the assumption that interlocutors rely on particular design features in mutually accomplishing laughables, conversation analysis subsumes all features, cues, means or strategies involved in this process under the label of “practices for constructing laughables” (Ford and Fox 2010). Such practices stem from several semiotic systems and may vary in size and complexity. Linguistic, paralinguistic and more general, multimodal properties may turn an utterance into a laughable, just as pragmatic means or aspects concerning the content of the talk do. Practices may and frequently do cluster (Holt 2011: 397). Moreover, recurrent design features may regularly co-occur and form a holistic pattern. Probably the most prominent practice that elicits laughter belongs to the realm of genre – the joke (Glenn 2003: 91). Sacks (1974, 1978) analyzes the telling of a joke basically as a technique for triggering laughter.
He focuses on the sequential macro-structure of the telling, which breaks down into a preface sequence, a telling sequence and a response sequence. The preface sequence preceding and licensing the realization of a multi-unit turn such as a joke suspends the regular turn-taking mechanism and guarantees that the floor belongs to the teller. It includes an announcement of the joke in which the teller usually also denies authorship, specifies the character and quality of the joke, and signals “that laughter is desired in the response sequence and that it should be done on the recognition of a punch line” (Sacks 1974: 341). Despite his mention of tripartite lists and taboo topics, Sacks is less interested in the genre structure of the joke itself and its individual elaboration than in the techniques for triggering laughter at particular points. Obviously, the punch line plays a crucial role since it marks the completion of the telling sequence and the place where laughter is to be elicited (1974: 345). For Sacks, jokes function like understanding tests. The punch line must remain implicit so that the recipients of the joke display, through their laughter, that they have been able to resolve it. For this reason, the suspension of laughter after the punch line has been


delivered is quite problematic, and interlocutors have techniques at their disposal to elicit initially suspended laughter upon completion of a joke (1974: 347–350). One such strategy “for encouraging laughing, and for encouraging its non-delayed production” (1974: 349) is accomplished by the teller’s own laughter serving as a “candidate answer” which the interlocutors can join in.1 Puns or word play are another technique for inviting laughter. From a conversation-analytic perspective, it is irrelevant whether word play has been deliberately planned or whether it gradually evolves from an accidental lapse or slip of the tongue. If two or more meanings are evoked by a single one-word form, or by a word form and similar sounding or slightly modified formal structure, and if both meanings are not equally compatible with the given context, an unexpected contrast is available which can transform an utterance into a laughable. Yet it is important to stress that conversation analysis does not further specify the structure or technique of a word play triggering laughter. Sacks (1973) adduces an example in which the literal reading of a figurative expression is subsequently evoked and earns interlocutors’ laughter. Glenn (2003) observes how laughables resulting from verbal errors or misused expressions may launch extended word play. In “playing the error game” (2003: 134) a “grammatical error provides the starting point for an episode of turns devoted to speech play and mock-errors” (2003: 140). These practices for the construction of laughables coincide with folk-categories of humor about which the members of a given speech community have tacit knowledge. Other practices lack genre labels but, nevertheless, signal that a turn or sequence can be oriented to as laughable. Holt (2007) observes that “enactments” following turns with reported speech regularly trigger laughter. 
By an enactment, she means an utterance which formally pretends to render the speech of another, but does not claim authenticity. Enactments provide fictional utterances, ascribed to real or virtual figures, which further elaborate on the preceding reported speech. If interlocutors join in by providing further enactments, a fictional or, in Holt’s terms, a “joking hypothetical scenario” emerges. Glenn (2003: 82–83) describes similar phenomena such as “dramatizing talk” and “fantasy theme chaining” which are accompanied by laughter and other displays of appreciation, all in the achievement of “extended mirth” (2003: 83). The practices described thus far, to which interlocutors orient themselves as funny, represent complex and holistic structures.

1. This exclusive focus on jokes as a macro-sequential technique for eliciting laughter strongly contrasts with approaches interested in the working and more specifically in the cognitive effect of the punch line as well as with approaches interested in techniques facilitating the striking presentation and elaboration of the joke text itself. These aspects of jokes will be dealt with in Chapter 3.

Other studies (e.g., Holt 2011; Ford




and Fox 2010) decompose practices that trigger laughter into individual features which co-occur. Here, it is important to stress that we are dealing with an analytical decomposition since the features are perceived in conjunction. In particular, Holt (2011) points out that only an in-depth analysis of each extract that also considers how these features are sequentially embedded can reveal which of them interlocutors orient themselves towards with laughter, which action is actually accomplished by this practice and whether amusement is enacted at all. In her data, interlocutors’ laughter occurs adjacent to figurative phrases (mostly metaphorically motivated, some purely idiomatic) which are realized in a “dramatic, exaggerated or ‘overdone’ ” fashion (Holt 2011: 399). Yet an approach based on laughability cannot determine whether interlocutors orient themselves exclusively to the overdone and exaggerated character of the phrase, or to other accompanying features as well. Against this background, Holt conducts thorough analyses of several examples that reveal further features accompanying the figuratively overdone phrase to which laughter potentially orients itself, such as a “slow deliberate delivery” (2011: 399) or an “animated intonation” (i.e., lengthening of syllables, singsong intonation, modulation of pitch) (2011: 400). Furthermore, the inclusion of laughter particles and smile voice articulation in the delivery of a figuratively overdone phrase marks its laughability (2011: 402). Consequently, Holt arrives at the conclusion that “there are clusters of properties that the recipient may be orienting to in laughing” (2011: 402). Ford and Fox’s (2010) research is unique in analyzing multimodal practices for the construction of laughables. 
Using videotaped casual conversations of private gatherings, they present bodily-visual, vocal-phonetic and action-sequential practices contributing to the joint enactment of amusement, claiming “that it is through the coordination of these semiotic systems – sound, body and sequence – that laughables are co-constructed” (Ford and Fox 2010: 342). At the same time, they stress that the laughables in their data do not constitute units with clear boundaries that can easily be separated from surrounding discourse:

On the contrary, what constitutes and may be reciprocally oriented to as a laughable involves diffuse and cumulative practices rather than discrete and contrastive structural slots, segments, or units. Laughables are regularly distributed across strips of activity rather than discretely bounded in single units. (2010: 344)

Hence, the practices they observe are spread over sequences, transforming these into laughable stretches of talk. The phonetic practices they observe predominantly resemble laughter sounds. Ford and Fox mention aspiration or breath particles contributing to an overall impression of “breathiness”, smile voice articulation due to a spread of the lips and lengthening of vowels and fricatives, and laryngeal constrictions (2010: 350–354). They also note a modulation of pitch and loudness


giving the impression of “wobbling” (2010: 352) as a signal of laughability. Bodily practices are also closely associated with the physiological symptoms of laughing. Here, Ford and Fox (2010: 354–359) list the following “visual displays”: lip-spreading, leaning back and forth, throwing one’s head back, hand(s) covering one’s face and clapping hands, as well as beating on a surface, tension of the neck or face muscles, and shaking shoulders or body. The last group of practices concerns the sequential organization of actions. Responses offering unexpected and exaggerated upgrades, constituting incongruities or strong contrasts, or simply defeating sequentially elicited expectations: all these are identified as “action-sequential patterns” in the achievement of laughability (2010: 360–365). These practices modify the “normative, expectable and/or preferred progressions of sequences-in-progress or turns-in-progress, with the producer of a possible laughable adding and even savoring an upgraded and extreme description or an unexpected juxtaposition within an otherwise projectable trajectory of action” (2010: 360). Drew (1987) and Holt (2013b), finally, show that practices used in constructing laughability may also be neglected, and that laughables are indeed co-constructed by the interlocutors (i.e., it is up to the interlocutors whether they orient to a potential laughable). They draw attention to strategies that suspend the serious impact of potentially threatening actions, and so allow recipients either to react to the serious dimension or to join in joking and laughing. Drew (1987) analyzes teases as a form of specifically marked playful aggression, which prevents them seriously damaging the interlocutors’ relationship (cf. Glenn 2003: 97–98). When teasing, speakers render something an interlocutor has said or done but exaggerate the wording and content, or the illocution, of the original pattern (Drew 1987: 232).
The rendition “is therefore immediately recognizable as an exaggerated or extreme version, and by virtue of that is not meant seriously to apply” (1987: 231). Similarly, Holt (2013a) points to overstatements and extreme wordings as signals with the potential to inhibit serious reactions “where participants negotiate their way through potentially tricky sequences, often concerning delicate activities such as offers, requests, and invitations” (2013a: 84). This ambivalent quality of laughable utterances offers recipients the opportunity to react in various ways, thus allowing them to (re)‍define the interaction as serious, nonserious or to maintain its ambivalent character (Glenn 2003: 136–138). In Drew’s data, interestingly, more than half of the victims of teases react ‘po-faced’ (i.e., they address the serious dimension of the teasing turn(s)). Holt (2013b) observes all possible options in reacting to such equivocal laughables. Apart from treating them seriously or non-seriously, “second turns (responses) may” also offer “some combination of the two” (2013b: 73). This points to general problems with the laughable-based approach to humor.



2.1.3 Problems and challenges with the laughable-approach

The practices presented in the reviewed literature are actually symptoms of laughability which repeatedly occur in the sequential environment of laughter. Conversation analysis is not interested in revealing the reasons for laughter but focuses on practices which often attract laughter, and it understands responsive laughter as oriented to these practices. The laughable-based approach works well as long as the analysis can document that interlocutors’ laughter orients to these practices. It becomes problematic when recipients withhold laughter, and also when laughter is allocated to turns lacking any such features. The latter situation, sometimes termed deadpan delivery, has been identified as a challenge (e.g., by Holt 2013b: 73; Ford and Fox 2010: 364; Schegloff 2001: 1951f). Responses to it vary. Schegloff (2001: 1950) addresses the issue of “equivocal” marking of nonserious utterances, in which “various marks of ‘kidding’ – its overtness, its overstatement, its broad ‘aside’ delivery – are offset by the fully deadpan character of its delivery”, as a touchstone for conversation-analytic methodology. His analysis of the English discourse marker no shows that interlocutors utilize it as a transformation marker which in turn-initial position “can mark a transition from nonserious to serious talk” (2001: 1948). The working of no can be compared to a pivot signaling that the preceding discourse has “been a joke or in some other respect non-serious” and “that what is to come is not” (2001: 1954) – irrespective of the presence of other recurrent features marking the laughability of the preceding discourse. Consequently, a humorous or nonserious episode can also be marked ex post by a transition marker, and analysts can rely on this discourse marker “as a data-internal evidence for an otherwise not overtly marked ‘non-seriousness’ or ‘joke’ design for the speaker’s prior utterance” (2001: 1951f). 
Given its basic assumption that laughability is a joint accomplishment and a matter of interlocutors’ negotiation (see above), conversation analysis rarely attempts to trace laughability back to an utterance’s inherent properties. Schegloff (2001) avoids doing so by focusing on a resource-signaling transition to serious discourse, to which interlocutors orient themselves in the absence of overt markers of laughability. Ford and Fox (2010: 364, Fn 14), who also observe that interlocutors in their data “construct successful laughables without any notable phonetic or visual ‘marking’ at all”, seem somehow willing to break with this conversation-analytic tenet in assuming that inherent properties of these utterances are responsible for their laughability (cf. Bertrand and Priego-Valverde 2011: 350; Brock 2003: 361; Kotthoff 2006b: 274; Müller 1984: 118). “It may be that such ‘deadpan’ deliveries constitute forms of incongruity and contrast in themselves, doing something unexpected or inapposite as if doing seriously” (Ford and Fox 2010: 364, Fn 14). Interestingly, among the sequential practices for the construction of laughability,


they list “contrast and incongruity” (Ford and Fox 2010: 362–364), which may intrinsically elicit laughter. These practices thus have a different status to that of mere symptoms of laughability. More generally, not all of the above-mentioned practices contribute equally to the accomplishment of laughability. Some of them (e.g., puns, jokes, disconfirmation of sequentially established expectations) comprise a humor-specific contrast, providing a moment of incongruity said to trigger laughter anyway. Others (e.g., bodily-visual or phonetic-audible displays) point to and express a feeling of mirth, since they are often directly connected to the physiological reaction triggered by the feeling of non-seriousness. This particularly concerns laughter, aspiration or breath particles, and the smile voice articulation. These practices can be regarded as icons or iconic indexes of humor. Other practices for constructing laughables such as exaggeration or overstatement signal that a particular stretch of talk deviates significantly from the surrounding discourse and is to be interpreted differently. However, the tendency to gather all observable features of humorous stretches of talk under one umbrella also applies to other discourse-analytic approaches. Accordingly, we will sum up and address all such problems at the very end of this chapter. Aside from deadpan delivery, a lack of responses ratifying the laughability and the nonseriousness of laughables poses another challenge when strictly applying conversation-analytic methodology to detect humorous laughables within face-to-face interaction. The method’s advantages – approaching the data without preconceptions; not imposing the analyst’s prefabricated categories nor resorting to any kind of background knowledge beyond the sequential context – come at the expense of interpretive depth.
This is particularly striking when it comes to the range of possible reactions to a (potential) laughable, and to the various meanings which absence of laughter may acquire (Bell 2009; Hay 2001; Kotthoff 2003; Attardo 2002; Eisterhold, Attardo and Boxer 2006). Investigation into the reaction to humor shows that responses depend heavily on the form or genre of humor, and on diverse sociolinguistic variables (e.g., social background of the speakers, interlocutors’ relationship, situation or speech event). Reactions to irony, for example, resemble in some respects those to teases as observed by Drew (1987). Recipients have several options: They can adopt the ironic mode (Attardo 2002) and provide more ironic utterances (Hay 2001: 62–63), or start a joint teasing sequence (Kotthoff 2003: 1394–1396). Yet they can also address the message actually conveyed and react to – in Grice’s terms – what is meant (Kotthoff 2003: 1401–1404). Kotthoff (2003: 1396–1398) further observes in her data combined responses addressing both what is said and what is meant, which are necessarily ambiguous as to the trigger. Hay (2001: 66), again, argues that irony is a self-sufficient form of humor which can do without support (cf. Eisterhold, Attardo and Boxer 2006). Moreover, in the case of irony, the situation




seems to strongly affect the choice of response options. Comparing dinner talk and conversations on TV, Kotthoff (2003) finds that, at the dinner table, there is a preference for responses maintaining nonseriousness, whereas on TV interlocutors react to what is meant. Other types of laughables, too, seem to regularly prefer non-laughter responses. Hay (2001: 64–65), for example, argues that “self-depreciating humor” is followed by contradiction on the interlocutor’s part as a means of expressing sympathy with the one poking fun at themselves (cf. Jefferson 1984). Absence of typical “humor support strategies” such as laughter, playing along with an initiated joke or a conversational organization testifying to greater involvement (e.g., overlap, echo etc.) as listed by Hay (2001: 57–63, 64–65), can thus be variously motivated and is not necessarily evidence of failed humor. In fact, humor can fail in various ways, and it requires a model of how humorous utterances are processed and understood in order to specify why a humorous utterance has not been ratified. Hay makes an attempt in this direction. She stresses that such prime ways of ratifying a humorous laughable, such as supplying laughter or more humor, implicate recognition, understanding and appreciation of the jocular utterance at the same time (Hay 2001: 67). These three stages of humor support are modeled as scalar implicatures, suggesting entailment of the first within the latter and the last one (2001: 167). Reactions to humorous utterances can, accordingly, withhold appreciation but signal that humorous intent has been recognized and understood (by specific overt displays) (2001: 69–70). An outright lack of reaction to humorous utterances, however, does not allow identification of either the reason why the humor attempt failed, or the stage at which it did so (2001: 70).
Hay mentions a number of potential reasons why humorous attempts may not be supported:

(1) Insufficient contextualization
(2) Being too late, or reviving “dead” humor
(3) Assuming too much background knowledge
(4) Misjudging relation between speaker and audience
(5) Negatively teasing someone present
(6) Trying to gain membership of an exclusive sub-group
(7) Disrupting serious conversation
(8) Portraying oneself inappropriately for one’s status or gender.
(Hay 2001: 71)

Some of these testify to a lack of recognition or understanding (e.g., (1) or (3)), others point rather to a refusal of appreciation (particularly (4), (5), (6) and (8)). Interestingly, a lack of overt laughability markers is merely one reason (the first). In any case, a thorough classification rests upon the analyst’s interpretation and often relies on the inclusion of background knowledge (e.g., knowledge about social


and group-specific norms, a speaker’s position within a given group or situation). Conversation analysis, however, is reluctant to include the wider (e.g., social) context in determining any phenomenon, including humorous laughables, a tendency that points to another shortcoming of this methodology and its application to the analysis of humor. Conversation analysis’s diligence in minimizing and preventing the analysis from being influenced by analysts’ categories, including world view, interpretations and the like, is actually a benefit and drawback in one. The method starts from the basic assumption that utterances themselves contain enough information to be properly interpreted. Meanings and order are taken to be locally produced by the interlocutors themselves. The immediate sequential context is thus assumed to contain all required information for the analyst to reveal how a specific activity such as the laughable is jointly accomplished. Consequently, conversation analysis works with a narrow concept of context, restricted to the sequential co-text. Yet, especially with conversational joking in mind, the local sequential co-text may not always suffice to determine why a stretch of talk draws laughter – or fails to do so. Teasing, for example, relies on the playful violation of social norms, and whether a particular norm can be playfully violated or not depends on group-specific standards. From Hay, we further learn that joking may fail if interlocutors inappropriately assess their relationship and their position within a given group or situation. In fact, ethnographic background knowledge exceeding that available through the immediate sequential context is often crucial for the analysis of conversational joking (Branner 2003: 144; Kotthoff 1998: 96). A number of scholars are less skeptical about the inclusion of such information in the analysis of humor. 
They advocate approaches based on a wider understanding of context and allowing for a more interpretive analysis. Hay (1995), Branner (2003) or Kotthoff (1998), for example, apply an ethnographically enriched identification procedure and conduct an analysis of conversational joking which follows the framework of interpretive (elsewhere also termed interactional) sociolinguistics. This is in line with Deppermann (2000), who likewise argues for an ethnographically enriched discourse analysis (“ethnographische Gesprächsanalyse”), since conversation analysis in its pure form “misconstrues the indispensable role interpretation plays in the analysis of discourse. It therefore neglects the preconditions and the effects by which the analyst’s knowledge inevitably shapes the process and the results of conversation analytic studies” (Deppermann 2000: 96). The same author (2000: 101–102) stresses that ethnographic background information can contribute to the analysis particularly when there is a lack of the immediate overt displays which conversation analysis exclusively relies on. As we have seen in this subsection, overt displays for laughability may be lacking if the speaker initiates a joke ‘deadpan’ or the recipient(s) withhold laughter for various reasons. Deppermann (2000: 108–115)




illustrates how the inclusion of ethnographic knowledge (e.g., knowledge about social or situational habits and conventions) contributes to the analysis by bridging interpretive gaps, preventing misinterpretations or enriching and validating an analysis. Summing up, a discourse analysis of conversational joking that relies exclusively on overt displays of laughability (including humor support strategies in response to a laughable as well as overt displays and practices for the construction of a laughable) runs the risk not only of excluding (potential) humorous instances, but also of insufficiently accounting for their humorousness. In order to gain a deeper understanding of conversational joking, it is helpful to include ethnographic background knowledge in the analysis. In theoretical terms, this goes together with a shift from the strictly local understanding of context characteristic of conversation analysis to a wider understanding of context that includes social, situational and other information typically considered in interpretive sociolinguistics (Kotthoff 1998: 114–117; Branner 2003: 141–144). This discipline acts on the assumption that both social and cultural norms, situational roles, gendered identities and the like all affect and shape the way verbal interaction is conducted. It thus provides a methodological framework for detecting how linguistic and sequential or action patterns produce and reflect social, cultural and other meanings, a framework based on a dynamic notion of context. Context is seen as an actively constructed interpretive frame used by interlocutors to signal how an ongoing activity should be interpreted. The next subsection focuses on the specific understanding of context within interpretive sociolinguistics. It also introduces contextualization theory as a theoretical framework for analyzing conversational humor that can compensate – at least partially – for the shortcomings of the laughables-approach.

2.2 Contextualizing humor

Conversation analysis focuses exclusively on meaning or activities as a local accomplishment of the interlocutors and aims at revealing the practices involved in the mutual negotiation of meaning. It does not speculate about recipients’ interpretation but relies solely on interlocutors’ reaction in the next turn as a display of their understanding. Meaning is produced and reproduced locally. Applied to conversational joking, this approach results in an understanding of humor as jointly enacted amusement operationalizable in terms of ratified laughables. Interpretive sociolinguistics departs from these conversation-analytic assumptions and shifts the focus to how recipients understand utterances, how speakers employ various means of guiding their recipients’ interpretation and how both parties rely on their knowledge about what can be expected and done next in a speech event


of a given type. Contextualization theory was developed by Gumperz (1982) to account for this “situated and context-bound process of interpretation” (Gumperz 1982: 131), which he also terms conversational inferencing. The theory explains how interlocutors rely on various cues in signaling the type of activity they are engaged in. As regards humor, contextualization theory provides a framework describing how specific cues accompanying or characterizing humorous utterances signal that the stretch of talk marked by them is not meant to be interpreted seriously. Here, it is important to stress that these cues may coincide with practices for the construction of laughables. The two frameworks differ in the theoretical status ascribed to these practices or cues and in their overall account of jocular talk. We will therefore not only present the contextualization cues on which Russian interlocutors rely when framing a stretch of talk as humorous, but also discuss how contextualization theory conceives of humorous talk by asking what actually is contextualized when joking. Contextualization theory (Gumperz 1982, 1992a, 1992b; Auer 1986, 1992; Schmitt 1993; Auer and di Luzio (eds.) 1992) aims at explaining how people understand utterances in context. It works on the assumption that, when communicating, interlocutors actively construe the context relevant for the interpretation of their utterances. Hence, context is conceived of as an ongoing “accomplishment” (Schmitt 1993: 331–335). In other words, speakers “enact a context for the interpretation of a particular utterance” (Auer 1992: 25).
Auer (1992: 21) characterizes this conceptualization of context as flexible because “context (…) is continually reshaped in time” and as reflexive because the “relationship between context and text” is “one in which language is not determined by context, but contributes itself in essential ways to the construction of context.” The term contextualization has been coined to refer to this dynamic and emergent conceptualization of context as accomplishment or enactment.

In most general terms, contextualization therefore comprises all activities by which participants make relevant, maintain, revise, cancel…. any aspect of context which, in turn, is responsible for the interpretation of an utterance in its particular locus of occurrence (Auer 1992: 4; italics in original)

Such activities, which in contextualization theory are termed contextualization cues, cover “any feature of linguistic form that contributes to the signaling of contextual presuppositions” (Gumperz 1982: 131). They make the relevant context available and guide recipients’ inferences. Conversational inferencing is then understood as “the situated or context-bound process of interpretation, by means of which participants in an exchange assess others’ intention, and on which they base their responses” (1982: 153). Hence, “constellations of surface features of message form are the means by which speakers signal and listeners interpret what




the activity is, how semantic content is to be understood and how each sentence relates to what precedes or follows” (1982: 131). Contextualization cues derive this potential from “conventionalized co-occurrence expectations between content and surface style” (1982: 131). Initially developed in order to account for friction in intercultural communication, such cues were shown by Gumperz (1982: 130– 152) to result from the application of different and culture-specific contextualization conventions. On various levels of interaction ranging from conversational management to the organization of whole speech events, contextualization cues evoke relevant contexts, provide appropriate contextual presuppositions and guide conversational inferencing (Gumperz 1992a, b; Auer 1986, 1992; Schmitt 1993). They do so by evoking interlocutors’ background knowledge about how a particular activity is conducted. Gumperz himself relies extensively on Levinson’s ([1979] 1992) concept of the activity type, which models the way our knowledge about particular complex and interactively accomplished activities is structured in a schema- or frame-like fashion (cf. Schmitt 1993: 331; Auer 1992: 26). Contextualization cues, accordingly, signal what kind of activity the interlocutors are engaged in and evoke their knowledge about this activity, thus enabling them to get involved in it and structuring their expectations about what will happen next (Gumperz 1992b: 46). Gumperz himself also points to the role contextualization plays on the local level of discourse and stresses that contextualization cues help not only in assessing a turn’s illocution but also in conversational management (1992b: 46). Potentially, contextualization applies to all levels of interaction since it evokes the most varied aspects of context. 
Such an aspect of context may be the larger activity participants are engaged in (the “speech genre”), the small-scale activity (or “speech act”), the mood (or “key”) in which this activity is performed, the topic, but also the participants’ roles (the participant constellation, comprising “speaker”, “recipient”, “bystander”, etc.), the social relationship between participants, the relationship between a speaker and the information he conveys via language (“modality”), even the status of “focused interaction” itself. (Auer 1992: 4)

Consequently, research within the framework of contextualization theory aims at revealing how cues from different semiotic systems evoke the relevant contextual presuppositions at various moments in discourse. Auer (1986, 1992) and Gumperz (1982, 1992a, b), for example, illustrate how: – choices concerning style, register or terms of address, as well as code-switching phenomena, frame an activity and the relationship of the intertactants involved in a specific way; – formulaic language brings about speech genres;


– gesture and proxemic behavior facilitate the transition to another activity;
– vocal and prosodic cues signal turn-holding, turn-yielding as well as turn-termination.

The cues themselves are said to lack an inherent meaning (Gumperz 1982: 131, 1992b: 50; Auer 1992: 24–25). Rather, they mark and single out a stretch of talk. They “establish contrasts and influence interpretation by punctuating the interaction by these contrasts. (…) The only meaning a cue has (…) is to ‘indicate otherness’” (Auer 1992: 31). Gumperz (1992b: 50) compares cues to “indexical signs” belonging to “the realm of metapragmatics”. Nonetheless, unlike lexically based indexical signs, typical contextualization cues such as code-switching or prosodic patterning “convey information by setting off or establishing oppositions among sets of lexically coded strings. What is conveyed by such maneuvers are constraints on interpretation which are independent of what is conventionally called the propositional meaning of these strings” (Gumperz 1992b: 50). Contextualization theory thus contributes to discourse analysis by connecting various semiotic resources observable on the surface of discourse, on the one hand with contextual presuppositions and the corresponding socio-cultural background knowledge and, on the other, with interpretive constraints. Applied to the analysis of conversational joking, contextualization theory contributes to an understanding of humor as a specific framing that suspends serious interpretation of a particular stretch of talk. It shifts the focus to the means that set apart humorous talk from surrounding serious discourse and serve as contextualization cues (e.g., Norrick and Chiaro (eds.) 2009; Kotthoff (ed.) 1996). However, before presenting the cues by which Russian interlocutors signal to each other that a particular stretch of talk is not meant to be interpreted seriously, it is necessary to raise the question of what exactly is contextualized in jocular talk. 
This is a crucial issue, since we earlier identified the main benefit of discourse-analytic methods as being that they approach humor without any theoretical preconceptions. Linguistic discourse analysis has so far adopted several theoretical concepts, inter alia from other branches of discourse studies, in order to grasp conversational humor in terms of a specific framing. Surveying these concepts therefore also provides better insight into the discourse-analytic conceptualization of humor promoted in interpretive sociolinguistics and, to put it heretically, shows how a concept of humor is somehow brought in through the backdoor.

2.2.1 What exactly is contextualized in joking?

The superordinate concept required to account for context providing the humorous dimension of specifically marked utterances is that of a humorous framing




or play frame. However, this label covers various, slightly different conceptualizations. The notion of frame, which is common in linguistics, was adopted from several scientific disciplines and fields ranging from sociology and anthropology to artificial intelligence and cognitive science (cf. Tannen 1993; Bednarek 2005; Busse 2012). In discourse studies, frame analysis “is an approach to cognition and interaction that focuses on the construction, conveying, and interpretation of meaning” because “[f]‍rames affect the way in which we categorize, remember, and revise what we know, as well as what we say, how we mean it, how others hear it, and how we do things together linguistically and otherwise” (Telles Ribeiro and Hoyle 2009: 74). In the most general terms, frames refer to structured sets of expectations and formats, within which experience is organized and available as knowledge. Frames provide the background against which we interpret and understand an activity or a text  – “one organizes knowledge about the world and uses this knowledge to predict interpretations and relationships regarding new information, events, and experiences” (Tannen 1993: 16).2 In line with Ensink and Sauer (2003), Telles Ribeiro and Hoyle (2009) and Tannen and Wallat (1993), we can essentially distinguish two concepts of frame – interactive frames and knowledge frames. The latter have also been termed schemas or scripts if referring to dynamic events (cf. Schank and Abelson 1977). Knowledge frames or schemas describe “the way our knowledge is organized and how we use our knowledge in understanding” (Ensink and Sauer 2003: 4). They represent chunks of our knowledge about the world as these are stored in our long-term memory. The concept of interactive frame, by contrast, “refers to a sense of what activity is being engaged in, how speakers mean what they say” (Tannen and Wallat 1993: 60). 
2. Frames are involved in the interpretation of any activity or text: “We identify, both to ourselves and to our interactants, how to define a situation, how its parts fit into the whole, how the whole relates to larger structures of experience, and how what is unfolding at a given moment affects what will come next. Language users, whether conversationalists, readers, or viewers of a film, must decide whether a message – a statement, command, question, or laugh – is hostile, friendly, aggressive, or flirtatious; intended to amuse, inform, annoy, persuade; intended as a main point or a side remark. Framing, then, is a matter of conveyed meaning more than literal meaning.” (Telles Ribeiro and Hoyle 2009: 74).

Interactive frames are at work in conversations; they explain how interlocutors understand and interpret what is going on as a social encounter of a particular type. It is this latter understanding of frames as an interactive frame which features prominently in most of the examples provided by Gumperz. Scholars describing conversational humor as “talk in a play frame” (Coates 2007; cf. Norrick 1993b; Straehle 1993) mainly adopt frame concepts from anthropology or sociology. These differ slightly from those sketched above. The “anthropological/sociological view stresses frame as a relational concept rather than a


sequence of events; it refers to the dynamic relationship between people” (Tannen 1993: 19; italics in original). In the remainder of this subsection, we will discuss and review not only the corresponding concepts of frame and framing but also how each conceives of play, since in these approaches play regularly replaces or stands for humor. At this point it is also important to stress that not all authors use the term frame. Some introduce alternative terminology such as meta-message, keying or discourse modality. Nonetheless, following Tannen’s (1993) lead, frame or framing can be regarded as comprehensive umbrella terms. Gregory Bateson’s (1972) Theory of Play and Fantasy serves as a central point of reference in many analyses of conversational humor (e.g., Coates 2007; Norrick 1993b; Straehle 1993; Dynel 2011). Having observed that monkeys can engage in combat seriously or playfully, Bateson advances the idea that they orient to particular signals conveying a meta-message “This is play” which enables them to distinguish serious from playful nips. “Expanded, the statement ‘This is play’ looks something like this: ‘These actions in which we now engage do not denote what those actions for which they stand would denote’” (1972: 180; italics in original). In other words, “the message or signals exchanged in play are in a certain sense untrue or not meant” (1972: 183). The signals conveying the meta-message “This is play” enable the recipient to decide whether an action is what it stands for or “fictional” (1972: 182). So, in Bateson’s approach “a frame is metacommunicative” and “[a]ny message, which either explicitly or implicitly defines a frame, ipso facto gives the receiver instructions or aids in his attempt to understand the messages included within the frame” (1972: 188). His understanding of play is mainly that of overtly marked pretence or play-acting. It thus resembles Haiman’s (1990, 1998) account of sarcasm as alienated talk. 
Haiman argues that, when being sarcastic, speakers detach themselves from what they say by simultaneously communicating the meta-message “I don’t mean this”. This meta-message is conveyed by “stage separators” stemming from various semiotic systems that signal the sarcast’s dissociation from what they render (Haiman 1998: 28–60). These correspond to contextualization cues establishing a humorous frame. They “mark any text or any activity as fictional” and “indicate that whatever is acted out behind them is either ‘a play’ or ‘not serious’ in the purely formal sense and that it does not have an impact on the real world” (Haiman 1990: 182). Further parallels can be drawn to Goffman’s (1981) concept of footing, which elaborates on how speakers can align to what they say in various ways. Goffman distinguishes several participation roles. The principal is the originator of the ideas promoted, the author is responsible for the words in which the ideas are couched and the animator merely lends their voice in order to articulate the principal’s ideas and the author’s words (Goffman 1981: 144–146). Goffman refers to




instances in which speakers dissolve the usual unity of these participation roles in terms of shifted footing. This occurs if speakers lend their voice to another figure and render talk attributed to another author or principal and, as a result, become merely the animator of what they say. Similar to Haiman’s concept, Goffman’s idea of footing stresses that speakers can deny responsibility for what they say by using various cues indicating that they are not talking for themselves and thus contextualizing a shift of footing (cf. Couper-Kuhlen 1996, 1999). Thielemann (2013) adopts Goffman’s concept in order to account for humorously detached, rendered speech in which a speaker pretends to do something, or to hold a particular view, while clearly signaling that this is not the case (cf. Clift 1999; Kotthoff 2002). Bateson and Haiman both establish a direct nexus between a play frame evoked by signals conveying the meta-message “This is play/I don’t mean this” and play-acting within a theater frame. Kotthoff (1998: 167–168), however, advises against equating theater framing and humorous framing on the grounds that there are forms of communication within a theater frame which do not aim to elicit amusement or laughter (e.g., role play within therapy sessions). Instead, she (e.g., Kotthoff 1999, 2006a) prefers Hymes’s (1974) notion of keying in characterizing humorous talk. Hymes himself does not elaborate on humor but alludes to mock communication, in which the serious impacts of actions are suspended. Key or keying is introduced as one of the parameters for the analysis of speech events within Hymes’s SPEAKING acronym and refers to:

the tone, manner or spirit in which an act is done. It corresponds roughly to modality naming grammatical categories. Acts otherwise the same as regards setting, participants, message form, and the like may differ in key, as e.g., between mock : serious or perfunctory : painstaking. (Hymes 1974: 57)

Hymes stresses that keyings are “often conventionally ascribed” (1974: 57) to particular speech events, while also pointing to the possibility that other keyings are allotted to an activity. Nevertheless, if the keying of an activity “is in overt conflict with the content”, the key “often overrides the latter [as in sarcasm]” (1974: 58). So, keying denotes an additional layer of meaning which can transform a given activity. Hymes classifies the meaning of a key or keying as “social meaning (…) involving whether in its conventional intention the sentence is to be taken as mock or at face value” (1974: 181; italics in original). Later, he specifies that in “the mock case (…) the speaker is not purporting to have the intention” (1974: 182). This again is reminiscent of the conceptualization of play framing suggested by Bateson and maintained by Haiman, which also essentially relies on cancelling the face value of an action or message. Conceiving of joking as performing an activity in a humorous keying (Kotthoff 1999, 2006a), however, does not necessarily associate it with play-acting. The advantage of Hymes’s concept is that it allows for a layering of


framings in which keying represents merely one layer but possesses the potential to transform an activity. Similarly, Goffman adopts the musical term key to refer to the transformation of an activity “already meaningful in some primary framework” which “is transformed into something patterned on this activity but seen by the participants to be something quite else” (Goffman 1974: 44). In Goffman’s (1974) seminal Frame Analysis, the concept of frame acquires a rich variety of aspects. In this context, he also shows how different forms of humorous interaction can be accounted for in terms of specifically framed communication. Inspired by Bateson, Goffman closely associates playfulness with make-believe, which is achieved by a layered framing, that is, a keying that transforms a primarily framed activity (1974: 40–47). Although other forms of interaction also rely on a layered make-believe frame, such as fantasy or daydreaming and drama (Goffman 1974: 52–56), playfulness remains “the central kind of make-believe” (Goffman 1974: 48). Whereas these forms of interaction rely on overtly recognizable frames, Goffman also deals with deceptive framings, which he terms fabrications. In fabrications, not all interactants are initially aware of the framing (1974: 83–85). Goffman points to several forms of humorous interaction, such as “practical jokes” or “corrective hoaxing”, which represent playful fabrications (1974: 89–92). These forms of humor would fail if the person fooled could identify the playful frame in advance. Closely related to the concepts of key and keying is the notion of discourse or interaction modality, which originated in linguistic discourse analysis (e.g., Kallmeyer 1979; Müller 1984: 106–119; Karasik 2007). 
Kallmeyer (1979: 556) explains that “means and devices that lend a presentation, activity or situation a specific symbolic significance” (“Verfahren (…), die einer Darstellung, Handlung oder Situation eine spezielle symbolische Bedeutsamkeit verleihen”) constitute an interaction modality (“Interaktionsmodalität”), in that they relate what is going on to various categories of being and acting such as “play, dream or institutional situations” or to “knowledge and interactants’ intention” (“mit Bezug z.B. auf die Seinswelt wie Spiel oder Traum, auf Wissen und Intentionen der Beteiligten oder auf eine institutionelle Situation”). In order to illustrate how modality frames (“Modalitätsrahmen”) affect the symbolic significance of objects, he points to the meaning acquired by a stick or a loaf of bread in the context of play or ritual in contrast to the reality of everyday life (Kallmeyer 1979: 556–557). The interaction modality of play modifies the relation of activities to real-life interaction in suspending their consequences; what is done in a playful modality need not impact everyday life (“die Modalität Spiel wiederum bietet die Möglichkeit, die Hinübernahme von Konsequenzen aus der Spielwelt in die alltagsweltliche Handlungsrealität zu sperren”) (Kallmeyer 1979: 557). In a similar vein, Chafe (2007: 11–13) points to “nonseriousness as a safety valve” which offers interactants the option to withdraw the




face-value of their utterances, thus preventing serious harm to their relationships. Müller (1984: 109) adopts Kallmeyer’s concept as “modality of dealing with topics” (“Modalität der Themenbehandlung”) and mentions “seriousness, fun, dream and phantasy as existential modalities” (“Existenzielle Modalitäten (Ernst, Spaß, Traum, Phantasie…)”). He shows how shifting from the modality of seriousness to that of fun affects the way topics are chained in conversation. In serious talk, topics shift in accordance with real-life, ontological logic, whereas in joking sequences topics are strung together in a rather associative way, enabling the establishment of links otherwise inhibited by the norms or logic of real life (Müller 1984: 148–152). In Kallmeyer’s and Müller’s conceptualizations of fun or play modality, a crucial role is played by interlocutors’ detachment from reality. When conversing in a humorous modality, interlocutors need not conform to various norms of real-life communication. Creative connections usually inhibited by real-life logic can evolve, norms active in real-life interaction can be suspended, references to reality can be made looser. Such an understanding of playful communication also forms the basis of humorous genres which essentially rely on fictionalization and in which interlocutors create virtual scenarios by moving, turn by turn, away from reality (e.g., Bange 1985; Bergmann 1998; Kotthoff 2009b; Holt 2007; Winchatz and Kozin 2008; Thielemann 2012a, b). Yet the same understanding also paves the way for scholars to point out the propinquity of joking and poetic talk or verbal art, arguing that, in playful communication, attention shifts from the content or intentions conveyed by talk to its form and performance (e.g., Schwitalla 1994; Kotthoff 1999; Zemskaja 1983, 1995a). The latter aspect is particularly striking in word play, or rather language play, that violates or manipulates various norms and rules of the linguistic system. 
Brock’s (2003) attempt to determine and characterize playful communication (“ludische Kommunikation”) shows how the two aspects are intertwined, with both resulting from an understanding of humor in terms of norm-breaching. According to Brock (2003: 358), “playful communication is prototypically characterized by a suspension of real-life obligations and communicative norms which allows for the overt manipulation of rules and regularities of linguistic communication” (“Spielerische Kommunikation liegt prototypisch dort vor, wo außersprachliche Lebenszwecke sowie kommunikative Beschränkungen weitgehend suspendiert sind und so die Regeln und Bedingungen sprachlicher Kommunikation der freien Manipulation offen liegen”). He models playful communication in terms of a layered (gestuft) category lacking a core prototypical instantiation. Although interlocutors can play with virtually any norms, including those regulating the linguistic system, extreme violations of linguistic norms come at the expense of comprehensibility and, hence, run the risk of communicative breakdown (“weil eine völlige Befreiung von sprachlichen Regeln ein kommunikatives Chaos nach sich


zöge”, Brock 2003: 359). Hence the lack of a core prototype. Developing Brock’s idea, types of humor playing with linguistic norms and focusing on form and performance of the talk would be situated in a circle immediately surrounding the vacant core. Other types of humor that may also rely on play with linguistic norms without entirely cancelling any serious dimension (e.g., purpose or intention, real-life reference), and that may pursue purposes beyond amusement, would be placed in outer circles. It is important to note that “[t]he playful keying does not necessarily suspend any of the serious meanings created” (Kotthoff 2006b: 287). This would also be in accord with Zemskaja’s (1983, 1995a) distinction between two basic forms of language play: ‘empty’ joking, serving exclusively entertainment ends (balagurstvo), and sharp-minded joking, conveying serious messages (ostroumie). Obviously, in the latter case not all of an utterance’s real-life impacts are meant to be cancelled. In Russian linguistics, the concept corresponding to that of discourse or interaction modality is tonality (tonal’nost’). Karasik (2007, no pagination) describes tonality as the “emotional-stylistic format of an utterance” (“ėmocional’no-stilevoj format obščenija”), which influences and constrains the interpretation. Scholars of colloquial Russian (russkaja razgovornaja reč’), the variety habitually ascribed to well-educated urban dwellers in spontaneous face-to-face interaction, identify both serious talk and a sphere of communication in which a playful tonality (šutlivaja tonal’nost’) prevails (Zemskaja 1983, 1995a). Some speech genres are regularly performed in this tonality (Kitajgorodskaja and Rozanova 2010: 92–119). Earlier, the concept of tonality was also used to account for the heterogeneous nature of colloquial Russian, which varies across situations. 
Zemskaja (1995b: 52– 55) tended to model colloquial Russian as a variety guided by its own norms and forming a distinct linguistic system. Nevertheless, some colloquial Russian texts display the typical ‘systemic’ features of this variety to only a limited extent. This has subsequently been explained in terms of the different tonalities of colloquial Russian, which are responsible for variation within the system. Russian scholars highlight several characteristic aspects of talk conducted in a playful tonality. Karasik (2007, no pagination) stresses that it aims at establishing a close and emotional relationship between the interlocutors. According to Zemskaja (1983: 174), interlocutors employing a humorous tonality primarily aim at achieving amusing effects (ustanovka na komičeskij ėffekt). Various forms of language play that violate diverse linguistic and pragmatic norms essentially contribute to amusement and are indicative of humorous tonality (Mečkovskaja 2007; Zemskaja 1983, 1995a). The orientation towards form and sound, as well as the focus on language play, suggest an understanding of playful communication in which the poetic function of language (Jakobson 1960) gains in importance. Zemskaja (1995a: 268) even goes to such lengths as to describe humorous




utterances as utterances which lack a message and which are purely oriented to their unconventional form (“Eine solche Sprechhandlung ist, was den Inhalt bzw. die Information anbetrifft, leer und ganz auf die Form bzw. auf die Ungewöhnlichkeit der Form orientiert”). In a nutshell, the factors accounting for a humorous framing resemble an additional layer of meaning that transforms an activity or a meta-message conveyed in a “collateral track” (Clark 2004) commenting on how a message is meant to be interpreted in a way distinct from its primary meaning. In this regard, the concepts of framing applied are indeed “relational” (Tannen 1993) or “transformational” (Goffman 1974). With regard to the conceptualization of play, however, they vary slightly. Three clusters emerge, each of which emphasizes a different aspect of play. In the first, and in line with Bateson, Hymes or Goffman, playful communication is understood in terms of pretense or make-believe. Joking interlocutors act as if joking were on a par with play-acting. In the second cluster, following Kallmeyer, Müller or Brock, playful communication is based on a suspension of real-world conditions and norms regulating real-life communication. Joking favours fantasy and fictionalization. Of course, pretense and make-believe may involve the suspension of real-life obligations; the two clusters differ in accentuating one or other of these two aspects. The third concept of play actually coincides with, or results from, an extreme suspension of norms. In playful communication, speakers may violate or play with linguistic rules and regularities to such an extent that play with the linguistic form eclipses the transmission of the message. Some scholars (e.g., Kotthoff 1999; Schwitalla 1994; Zemskaja 1983, 1995a) therefore associate playful communication with verbal art, arguing that, in joking, play with linguistic form and the performance itself become the overriding concern. 
As focus is shifted from the message to its form, even at the expense of the content, the poetic function of language comes to the fore.

2.2.2 Contextualization cues for humor in Russian conversations

Humorous chunks of Russian discourse are set off from surrounding talk by various means which point to the speaker’s playful stance, and to which interlocutors orient when interpreting the utterances framed in this way as non-serious. These contextualization cues vary in several regards. They rely on different semiotic resources and they may potentially be drawn from any level of discourse, from paralinguistic, prosodic or verbal devices to pragmatic and variational strategies, and choices concerning the content of the talk. In principle, and as shown by Ford and Fox (2010), visual cues can also establish a humorous modality. However, such cues cannot be traced in our data as we rely exclusively on audio-taped data and edited transcriptions, so we will ignore them. Some of the cues we can trace are


complex and affect not merely a single interaction level, but several in conjunction. The ways in which these complex cues are arranged for presentation here can thus be disputed to some degree. Accordingly, whenever a cue could be differently classified, or the constitution of a complex cue involves or is affected by other levels of discourse, a cross-reference will be provided. The analysis of contextualization cues establishing a humorous modality further raises the question of the precise status and nature of contextualization strategies. Gumperz compares them to a knot in a handkerchief which has no meaning in itself but is intended as a reminder of something else. Nevertheless, contextualization theory also allows for “natural” contextualization cues (Auer 1992: 33–34) which iconically display what they indicate. Similarly, devices with an inherent lexical meaning, such as overt announcements of genre, can serve as contextualization cues (Auer 1992: 24–25; Schmitt 1993: 347–348). In humorous discourse, both these kinds of cues are at work alongside prime forms of contextualization cues, which are non-referential and non-lexical, and hence have only an “implicit meaning” (Gumperz 1982: 132). In addition, contextualization cues are divided into groups according to their placement or relation to the framed unit of talk (Auer 1992: 28; Clark 2004: 373– 381). They can be clearly separated from the contextualized discourse unit, preceding or following it (external anticipatory or retrospective cues in Auer’s terms), or inserted into it (insertions in Clark’s terms). Lastly, they can be realized within the framed talk. This final group can be divided into “concomitants”, which are accompanying signals “displayed at the same time as, but separate from a primary signal” (cf. Attardo et  al. (2003) on para-communicative alerts), and “modifications” of the primary signal, that is, the utterance itself (Clark 2004: 374). 
Auer (1992: 28) differentiates cues realized alongside according to their extension, intensity and placement within the framed chunk of discourse, and divides them into peripheral and non-peripheral cues. The latter further divide into singular, recurrent or permanent cues (1992: 28). As observed by Ford and Fox (2010), contextualization strategies for humor are dispersed throughout the jocular sequence and occur in various positions relative to the framed talk. Finally, contextualization cues are typically used redundantly and cluster in talk-in-interaction. Auer stresses that this facilitates the:

process of inferencing (…) not only in the sense that a negligent participant may fail to monitor co-participant’s behavior on all the levels in play and is still in a position to receive enough information, but also in the sense that the contextualization value of an individual cue, which may be ambiguous itself, is made less ambiguous by such a multiplicity of coding. (Auer 1992: 29–30)




This applies particularly to conversational joking, in which a multitude of cues co-occur and may combine with intrinsically humorous features (e.g., incongruities, disconfirmed expectations), so that it “is difficult to describe [conversational humor] without taking into account the context and the combination of prosodic cues with the other resources (like the lexicon for instance)” (Bertrand and Priego-Valverde 2011: 334). In decomposing clusters of cues, the following presentation of contextualization strategies for humor observed in the analyzed data accordingly provides, for the sake of clarity, what are in some sense artificial divisions.

2.2.2.1 Prosodic and paralinguistic cues

For all their ambivalence, laughter and related phenomena which affect the realization of speech remain prime cues for a humorous framing (see 2.1.1). Laughter (i.e., discrete syllabic laughter particles) can precede or follow a framed unit of talk, but it may also be inserted into or superimposed upon it. While laugh pulses may also exit through the nose, the following characterization by Chafe, who conducts a detailed phonetic analysis of laughter, seems apt.

[A] laugh consists of one or more spasmodic expulsions of air from the lungs, (…) these pulses pass through the larynx where they are usually though not always voiced, and (…) finally exit through the mouth, where they usually acquire the quality of a neutral or schwa-like vowel. (Chafe 2007: 23)

Practices in transcribing laughter vary (cf. Jefferson 1985). In this study, we have partially followed and adapted the GAT conventions for transcribing talk-in-interaction set out by Selting et al. (1998, 2009), rendering syllabic laughter with a schwa-like vowel as hehehe and nasal laughter with a closed mouth as mhemhem. Obviously, there are instances in the data of syllabic laughter in which the vowel quality deviates from schwa, the laughter concerned sounding more like hahaha, hohoho or hihihi. This information is retained in the transcripts because such realizations of laughter often serve as more than mere indexical icons pointing to amusement. They can stand for laughter, and mark quoted and ironically detached laughter which, however, is a distinct sort of cue (see 2.2.2.2). Moreover, laughter can be superimposed upon speech or co-occur with speech. Based on a thorough acoustic analysis of English data, Chafe distinguishes forms of “laughing while speaking” according to the “kinds of oscillation that occur as people talk” (2007: 47). He identifies creaky voice “produced in a relaxed larynx at something approximating 50 Hertz but in a highly irregular fashion” (2007: 49), and an articulation he terms tremolo, which is “slower at roughly 20 Hertz, though again with much variation”. Discrete “laugh pulses” are characterized by a still slower oscillation of about 5 Hertz (Chafe 2007: 49). Similarly, Ford and Fox observe “local modulations of loudness, sometimes accompanied with local modulation of


pitch” (2010: 353) as pointing to laughability. For the sake of clarity, and for those wishing to stick to musical terminology, tremolo refers to modulations of loudness, whereas modulations of pitch should be referred to as vibrato. Another laughter-related phenomenon which establishes a humorous framing is the so-called smile voice articulation. Smiling is a core visual cue of the feeling of nonseriousness (Chafe 2007: 51–57). Psychologists analyze and categorize smiles within the facial action coding system (FACS) by decomposing them into contractions of particular face muscles and describing types of laughter in terms of muscles involved (Kotthoff 1998: 109–110). A smile voice is a secondary effect of this visual cue achieved by muscle contractions of the face. Spreading of the lips while smiling also affects articulation and is responsible for an audible smile voice articulation. Nevertheless, not every laughter-related detail which can be exactly placed, precisely measured, and characterized in terms of acoustic phonetics is equally relevant to the participants. Perception is the decisive factor in a discourse analysis aimed at the reconstruction of emic categories. Chafe acknowledges this in stressing “that a listener may perceive a stretch of speech as being coextensive with laughter, even though the physical manifestations of that laughter are sporadic and varied and not sustained throughout the speech in question” (2007: 49). The accumulation of laughter and/or laughter-related phenomena (e.g., vibrato, tremolo, creaky voice, smile voice) throughout a sequence contextualizes humor because of the holistic impression it leaves for the interlocutors. Bearing that in mind, we refrain from an instrumentally verified, exact placement of each occurrence of such laughter-relevant cues. 
Instead, we mark the discourse chunk which perceptually contains these cues, because “the way we perceive the relation between laughter and speech often diverges significantly from the physical nature of that relation” (Chafe 2007: 49). In line with the GAT guidelines, we use square brackets to specify the quality and scope of these laughter-relevant cues. This practice is illustrated in Examples  (1) and (2). In (1), Olga and Lena talk about scholarships provided on condition that, after graduation, students work in the district administration for several years. Graduates who take another job must reimburse the grant. Olga’s talk in lines 5–8 is interspersed with laughter particles and realized laughingly, as she explains that, in this event, particularly successful students who receive higher grants also have to refund more money. Lena affiliates with the humorous modality initiated by Olga and articulates “vozmeščat’” (‘repay’) in a smile voice. The discrete laughter particles in line 12 are also attached to Lena’s turn. In adding that students who have not been awarded scholarships do not have to refund anything, Lena’s comment combines with Olga’s initial turn to reveal a strange reversal which seemingly discriminates against successful students relative to mediocre ones.



Chapter 2.  Conversational joking from a discourse-analytic perspective



(1) ORDs20_03 (06:02:59), Olga (Ol), Lena (Le)

1 2

Ol :

3 4 5 6 7 8 9

Le :

10 11 Ju : 12 Le :

[…] i:: eščё: vozmestit’ vsju stipendiju;= and in addition pay back the whole scholarship =vsju stipendiju kotoraja za pjat’ let the whole scholarship all five years of it vyplačivalas’; (1.5) = studies and got a higher scholarship =a kto voobšče ničё ne polučal; (-) and those who didn’t get anything at all togda ne nado ničё = then they don’t have to repay anything =[da; yes =[hehehe-

In (2), Olga signals, by means of a low frequency oscillation (tremolo) characterizing parts of her utterance in lines 4–5, that her assumption is not serious. She is merely suggesting, humorously, that their friend Anja cannot be reached because she is lying in an ambulance.

(2) RuBoFreiburg2005 (00:28:05–2), Olga (O), Oksana (Ok)

1

O :

2 3

Ok:

4

O :

5 6

Ok:

my choteli pozvonit’ ane-= we wanted to call Anja =u ani byl otključen mobil’nyj; Anja had her mobile phone switched off dva raza. twice dva raza. ambulance net; ne dumaju. no I don’t think so

To put it at its most general, deviations from an interlocutor’s normal way of speaking in terms of pitch, speech rate, volume or other voice qualities, as well as unusual intonational patterns, may indicate that an utterance should not be taken at face value and point to a need for additional inferences. Such deviations from a speaker’s default vocal and prosodic traits can thus be utilized to establish a humorous modality (e.g., Kotthoff 1998: 192; Attardo et al. 2003: 252). In Extract (3), for example, a lowering of F1’s voice signals that her criticism of I10’s behavior in line 7 is not meant to be interpreted seriously. Her jocular reprimand is an immediate response to I10’s turn, in which he relates that he ordered a particular dish for his flatmate to prepare for him. F1 jokingly criticizes him for treating his flatmate as a servant. In addition, there is adjacent laughter which also signals a humorous framing.

(3) ORDs10_05, F1 (female), I10 (male)

1 2

I10:

3 4 5 6 7

8 9 10

F1 :

[…] nu segodnja makSIMka varit. well today Maksim is cooking (2.25)

today I’ve sort of ordered Plov from him (---)

I say come on [ you’re neither ashamed nor have pangs of conscience (--)

(-)

In addition to deviations with regard to volume, unusual changes in the speech rate can similarly function as contextualization cues for humor. In (4), Olga relates her troubles in finding a couple therapist. She rejects the recommended psychologists as these specialize in working with drug and alcohol addicts. Oksana immediately attaches a humorous comment, claiming untruly that Olga has problems with alcohol and drug addicts (“.” – ‘you have problems with alcohol and with drug addicts’). In doing so, she significantly lowers her speech rate in order to suspend serious interpretations. This shift to a humorous modality is ratified by Sergej’s laughter in line 11, although he initially amends Oksana’s claim by changing the kind of problems humorously alluded to (“s alkogolikami i narkomanami” – ‘with alcoholics and drug addicts’).



(4) RuBoFreiburg2005 (00:14:16–8), Olga (O), Raisa (Ra), Oksana (Ok), Sergej (Se)

1 2

O :

3 4

Ra:

5 6 7 8

Ok:

9 10 Se: 11 12 Ra:

[….] začem mne nuž[na äh-] psi=psicholog; why do I need eh a psy=psychologist [hahaha] kotoraja zanimaetsja alkogolikami i narkomanami; who works with alcoholics and drug addicts esli u menja problemy v sem’e i ja choču .hh if I have problems in the family and if I want .hh rešit’ to solve problemy v sem’e,= the problems in the family =. drug addicts s alkogolikami i narkomanami; with alcoholics and with drug addicts [( )] mhehem mhehehem [s ( )] with

Similar alienating effects that set a chunk of talk off from the surrounding discourse and signal that additional meanings and intentions are conveyed can be achieved by overarticulation. The data shows evidence of two main features lending an overarticulated character to an utterance and also observable in the joking sequences of French, British or American speakers. These are the special prominence given to accentuated syllables and the lengthening of individual syllables (e.g., Bertrand and Priego-Valverde 2011: 341; Haiman 1998: 39; Attardo et al. 2003: 244–245; Zemskaja 1983: 268). If these features combine with pauses within phrasing units, the utterance as a whole has the potential to become rhythmized, as is illustrated in Example (5). Here Vasilisa, having discovered a sack of rice in her host’s kitchen, humorously interprets this as a sign of wealth. Her utterance in lines 3 and 4 exhibits a pattern of pronounced pitch accents. Although the turn-constructional units in each line show a pattern of similarly distributed pitch accents, structural parallelisms regarding initial wording and similar placement of the unit-internal pause, they are not completely isochronous. Nevertheless, Vasilisa’s humorous comment tends strongly towards rhythmization. Again, the humorous framing signaled by these cues is ratified by adjacent laughter.

(5) RuBoBerlin2009 (00:50:51), Vasilisa (Va), Afrosinja (Af)

1   Va:  oksana [pokupaet-]
         Oksana buys
2   Af:  [( )] celyj mešok [risa;
         a whole sack of rice
3   Va:  [ona=ž- (-) boGAčka;=
         of course she’s a rich woman
4        =ona=ž (.) pokupaet ris mešKA:mi.
         of course she buys rice by the sack
5   ? :  mhemhemhe

In Extract (6), particularly pronounced pitch accents and vowel lengthening are combined with smile voice and laughter pulses (superimposed upon speech and discrete) to give I1’s utterance a jocular character. These cues indicate that her exaggerated references to the omnipresence of culture in Saint Petersburg are to be taken with a grain of salt (lines 2–6).

(6) ORDs01_02 (19:30), I1 (female)

1 2

I1 :

3 4 5 6 7 8 9

.hh h= = = and culture THERE == =u menja (.) podružka äh sčitaetI have (.) a friend eh thinks

In establishing a humorous modality, interlocutors further rely on speech modulations affecting the intonation with which a phrasing unit is realized. Interestingly, both a melodic and a monotonous intonation can mark that something is uttered tongue-in-cheek. In both cases, however, it is important to stress that the effect is achieved in combination with the message itself. The unusual intonation patterns observed do not merely set an utterance apart from surrounding talk. They also contrast starkly with the message conveyed. Amusement results from the perceived mismatch between intonation and content (Attardo et al. 2003: 251–253). As already alluded to in (5), humorous utterances can be contextualized as such by a rhythmic intonation pattern. Haiman (1998: 37–38) and Müller (1983: 304) argue that a rhythmic patterning giving the impression of singsong or chant establishes a theater frame which allows speakers to detach themselves from what they are saying. Russian interlocutors also rely on chant-like rhythmic patterning as a contextualization cue signaling that their intention is not serious. In (7), the effect of rhythmic speech emerges from a series of phrasing units with a recurrent distribution pattern of pitch accents, and from accentual isochrony (lines 10–11 and 14–15). The phrasing unit in line 12 (‘I’ve been struggling my whole life’) is realized at a higher speech rate and does not isochronously match the surrounding rhythmic units. Nevertheless, in conjunction with the surrounding sequences of rhythmic units, it contributes to the overall melodic impression made by this part of Oksana’s utterance. In order to render rhythmic patterning in the transcript, we have adopted the conventions developed by Selting et al. (2009: 387–388), with slashes (/) used to bracket rhythmic units each containing one stressed syllable. The beat of a rhythmic unit is marked by capitalization.

(7) RuBoFreiburg2005 (00:11:09–5), Olga (O), Raisa (Ra), Oksana (Ok)

1   O :  […]
2        mama ne budet ponimat’-=
         mum won’t understand
3        =počemu ja nervnaja.=
         why I’m so on edge
4        =budet dumat’ čto ja vsegda na ljudej brosajus’.
         she will think that I’m always turning on people
5        .hh [nu ja prosto
         .hh well I simply
6   Ra:  [( )
7   Ok:  [ja skažu
         I say
8        brosaetsja?
         does she turn on?
9        brosaetsja. (-)
         she turns on
10       / vot taKAja /
         she’s such a
11       / vot skoTIna /
         such a beast
12       [
         I’ve been struggling all my life
13  Ra:  [mhemhem mhemhem]
14       / i skol’ko LET /
         and how many years
15       / uže straDAju /
         I’ve been suffering
16       u tebja ložka [vot tam est’;
         your spoon there it is
17  O :  [poka mama sčitaet čto:
         so far mum thinks that
18       ėto ty brosaeš’sja na ljudej.
         it’s you that turns on people
19  Ra:  mhemhemhem

Needless to say, Oksana’s melodic realization of her utterance is in stark contrast to the lament which it conveys. The chant-like enunciation signals that she has not seriously (or at least not entirely seriously) ‘been struggling’ and ‘suffering all her life’, or even ‘for years’, because her flatmate is ‘such a beast’ (lines 10–12, 14–15). Again, this example shows that several cues to, and features of an utterance combine in achieving humorous effects. Here rhythmic patterning, exaggeration (see 2.2.2.3) and a mismatch between intonation and message together contribute to the amusement ratified by Raisa’s laughter in line 13. Just as the singsong realization of serious or depressing messages elicits amusement, the laconic realization of emotional and positively evaluated messages can achieve this effect as well. In this case, too, it is not exclusively the unusual intonation pattern, but also the contrast between message and pattern, that contributes to the humorous effect. This kind of “intonational misfit” (Cruttenden 1984) – that is, realization of a positive evaluative message with a flat and monotonous intonation  – is typical of ironic or sarcastic utterances. Attardo et  al. (2003: 249–250) observe a “compressed pitch pattern (…) with very little pitch movement” as a prosodic cue for irony in sitcoms, when remote recipients in front of their TV sets might not recognize the irony without such cues. These authors also stress that a flat intonation does not of itself convey the meaning of irony; rather, it steers interlocutors by signaling that additional inferences are required. Similarly, Haiman (1998: 35–36) cites “total melodic monotony” as a signal of sarcasts’ detachment from what they render. He treats a flat intonation as a cue establishing a theater frame which cancels the speaker’s personal responsibility and allows them to dissociate themselves from the framed chunk of talk. 
Extract (8) provides an example of irony contextualized by a flat and monotonous intonation that prevents the utterance in lines 9–10 from being taken at face value. While listening to Olga’s report about her family disputes and domestic troubles (which was taped and used in linguistic research), Sergej intersperses non-serious comments (lines 9–10 and 13). These refer ironically to Olga’s backbiting as ‘wonderful text’ (“prekrasnyj tekst”) and allude to the fact that the talk is being taped and will later be heard by another party. Generally, irony can do without additional cues, provided that interlocutors or recipients have enough background knowledge to recognize that the speaker does not actually mean what they are saying and to draw the required inferences. Here, the flat intonation additionally indicates that inferences need to be triggered. In line 13, Sergej resorts to another, previously used prosodic strategy in signaling that he is not serious in classifying Olga’s talk as ‘wonderful text’. Overemphasizing and lengthening the syllable carrying the pitch accent (“preKRA:snyj”) lends his comment a drawling character, turning it into ironic praise and preventing it from being taken seriously.

(8) RuBoFreiburg2005, Olga (O), Sergej (Se), Oksana (Ok)

1 2

O :

3 4

Se:

5 6

Ok:

7

Ok:

8 9

Se:

10 11 Se: 12 Ok: 13 Se: 14 Ok:

[…] ona [vret]. she’s lying [mhem] [ej ot menja nužny tol’ko den’gi]. she only needs my money [hehe hehehe hehe hehe hehe hehe] i ona [zlitsja potomu čto ja ne idu rabotat’-= and she’s angry because I don’t work [.hh .]= you talk rubbish =[i poėtomu deneg netu. and so there’s no money =[; wonderful text [ja ne znaju kto iz vas ego pridumal; I don’t know which of you made it up [mhem hehehehe hehe hehe hehe hehe hehe hehe [. but it’s wonderful [hehe hehe hehe hehe hehe

Moreover, prosodic deviations from a speaker’s default way of talking can be indicative of mimicking another social role. This creates amusement, especially when the role imitated is incompatible with the context. Mimicking prosodic, as well as other linguistic and pragmatic, features habitually attributed to another person (be they real, fictional or stereotypical) is classified as a pragmatic strategy for the contextualization of humor (see 2.2.2.3 on animated speech) since it often involves a shift of deixis, which moves from the current speaker to the staged figure. Such a shift also includes the vocal deixis (Couper-Kuhlen 1999: 16): in other words, there is an observable deviation from the speaker’s usual idiosyncratic prosodic-paralinguistic properties. In Extract (9), Petr and Viktor significantly shift their vocal deixis, and even mimic foreign accents, in order to signal that they are not speaking for themselves and, hence, do not seriously mean what they say. Discussing the Russian-Georgian conflict in 2008, they ridicule the accompanying information war and render humorously the positions both of those who wish to switch off Russian broadcasting channels (lines 5–6) and of those rejecting the idea as undemocratic (lines 8–9). Petr speaks at a higher pitch level than usual and with a squeezed voice, while Viktor mimics the staccato rhythm associated with a Baltic accent.

(9) RuMaCharkov2008 (00:38:44–7), Viktor (V), Petr (P)

1 V : 2 3 P : 4 5 6 7 V : 8 9 10

tem čtoin doing that ne DAT’ im [( ) don’t let them ( ) [eh da.= eh yes =ograničit’ demokRAtiju, limit democracy i . watch Russian TV channels chotja NE n: although no n: . democracy (1.0)

The next extract shows a shift of vocal deixis slightly different from that just illustrated. In (10), Lena performs the action of requesting a drink in a humorous way by adopting a mode of talking unusual for her which, however, does not allude to any particular social role (lines 18 and 19). She merely speaks with a higher and louder voice, and lends prominence to particularly accented syllables of her utterance. These prosodic cues are sufficient to set her request off from the surrounding discourse and attune her interlocutors to the fact that this is not merely a request. The unusual inferences triggered by the shift of vocal deixis are, however, due to another aspect of the utterance. Earlier, Lena has explained that she once cured herself by drinking vodka with Tabasco (lines 2–6). In the following turns, interlocutors elaborate on the suggested understanding of drinking alcohol as a form of cure. In Olga’s turn (lines 10–12), offering a drink is understood as offering a treatment, which is also presented as a laughable. Lena’s request in lines 18 and 19 continues the comparison; she does not directly ask for a drink but to be given a treatment (“”). So the amusing effect is due also to the framing of drinking as cure, which usually does not apply to the context of a party with friends (see 2.2.2.4 on the evocation of other contexts as a contextualization cue regarding content). The unusual prosodic realization additionally draws interlocutors’ attention to another semantic dimension of this utterance; its serious intention (viz. to obtain a drink) is preserved. The prosodic cues do not cancel it, but add an additional layer marking the activity as being performed humorously.

(10) ORDs20_03 (15:02:33), Lena (Le), Olga (O), Daniil (Da)

1        ((pouring vodka into glasses))
2   Le:  =ja ej togda lečilas’;=
         I cured myself with this
3        =s tabasko vot pila,=
         I drank it like with tabasco
4        =[voobšče;=
         naturally
5   Ol:  =[aga;]
         aha
6   Le:  =klassno pomoglo; (-)
         it really helped
7   Ol:  da- (---)
         yes
8        ((…))
9   Ol:  […]
10       nikto ne boleet?
         is anybody ill?
11       (1.1)
12       a to by tak polečili,
         then they could be cured this way

13 14 15 16 Da : 17 Ol : 18 Le : 19 20 21 Da : 22 Ol :

mhemhe- (1.6) a ėto kakoj-t(o)and this is some [kakoj-to est’ (tam) eščёthere’s still some (there) [ (-) and why don’t you fill our glasses

well so = CUre me == =[hm=hm; =[da- (1.3) yes

For their presentation here, the prosodic contextualization cues have been examined in isolation. As illustrated in the examples, they may suffice for the contextualization of a humorous discourse modality or framing. Nevertheless, they often co-occur with other contextualization strategies with the aim of ensuring that interlocutors do not interpret a humorous utterance (exclusively) at face value even if they fail to notice one of the cues involved. As we have learned from some of the examples, they may also accompany utterances which contain an intrinsically funny moment (e.g., an incongruity, an unusual contrast). Furthermore, many of the prosodic cues, as well as laughter and laughing while speaking, are sufficient to provide a humorous keying. In doing so, they are prime instances of contextualization cues with no inherent meaning. As we will see in the remainder of this chapter, this status can be challenged in the case of vocal deixis shifts constitutive of the complex contextualization strategy of animated speech, in which another social role is evoked by mimicking prosodic and vocal features attributed to that role. In such instances, vocal and prosodic peculiarities associated with a particular social role accordingly index the social identity evoked rather than a humorous framing.

2.2.2.2 Linguistic cues

Contextualization strategies which essentially rely on verbal means are basically divided into three groups: strategies of poetic talk, a shift to a register or style incommensurate with the content or context of a given utterance, and interlocutors’ explicit framing of a stretch of talk as humorous. These are classified as linguistic
because they primarily affect the wording of utterances, although other levels of discourse may be involved, too. Humorous talk can be singled out from surrounding serious discourse and attract interlocutors’ attention due to various techniques of verbal art (Kotthoff 1999). These testify to a shift from the content of talk to its form, and to performance itself. In linguistics, the term performance has various meanings (Scharloth 2009a: 234-236). In the context of conversational joking, the understanding of performance promoted in linguistic anthropology applies. Linguistic anthropology deals with performance as a mode of communication in which a speaker is not held responsible for what they say but for how they say it: that is to say, for how they perform (Bauman 1977; Scharloth 2009a, b). In other words, speakers are assessed for the quality of their performance and not for the message conveyed. Aesthetic elaboration gains in importance. This tendency matches Zemskaja’s (1983, 1995a) understanding of conversational joking as play with linguistic form in achieving humorous and entertaining effects, and as a form of communication in which the aesthetic purpose is dominant. Indeed, many of the linguistic cues contextualizing humor, such as repetitions and structural parallelisms, or iconic strategies, focus on the poetic function of language which “projects the principle of equivalence from the axis of selection into the axis of combination” (Jakobson 1960: 358; italics in original), and which “promotes the palpability of signs” (Jakobson 1960: 356). Contextualization cues that involve wording and playing with the linguistic form of an utterance for aesthetic purposes clearly mark a deviation from serious talk. With spoken language in mind, structural parallelism of linguistic units primarily concerns the sound of speech (e.g., Jefferson 1996; Schwitalla 1994). The elements reiterated may differ in size and vary slightly in substance. 
Repetitions of words or phrasal units, as well as sequences of similar sounding words or slightly varied constructional units, all have an audible effect, which Jefferson (1996) terms “sound rows” or “sound sequences”, and which contribute to the “poetics of talk”. Exact or slightly varied reiteration of verbal elements or linguistic structures alerts interlocutors that something unusual is going on in conversation and can indicate an orientation to play and non-seriousness. In the analyzed data, both reiteration and variegated rendition of linguistic structures occur as verbal contextualization strategies for humor, which in addition may achieve rhythmic effects. Example (5) above illustrates how structural parallelism and variegated repetition can establish a humorous modality. And, as illustrated in Extract (11), sheer repetition can also shift interlocutors’ attention to the form of an utterance and elicit amusement. Oksana and Vasilisa each poke fun at the sequential routine preceding the telling of a joke by repeating three times a turn-constructional unit with no, or only slight, variation in wording and prosodic patterning (lines 5–7 and lines 8–10). As a result, there is a cluster of similar sounds.

(11) RuBoBerlin2009, Vasilisa (Va), Oksana (Ok) 1 2

Va:

3 4 5

Ok:

6 7 8

Va:

9 10 11 ? : 12 Af:

[….] ja ( ) chotela kupit’ sigaretyI ( ) wanted to buy cigarettes ( ) ; well I’ll tell a joke now rasskaži anekdot;= tell a joke =rasskaži anekdot; (.) tell a joke rasskaži anekdot; tell a joke , repeat it [ DE:tskij. for children

If such repetitions or other structural parallelisms occur across turns, they create extremely cohesive stretches of talk, which are also typical of joking sequences. Further instances of this contextualization strategy, which entails creating structural and formal links across turns, will therefore be provided in the subsection on pragmatic cues (2.2.2.3). Here, we confine ourselves to the presentation of “cross-speaker poetics” (Jefferson 1996: 28) in the service of word play. Similar sounding linguistic strings rendered by several interlocutors not only testify to aesthetic elaboration but can also be exploited for the purpose of word play, which once more stresses the intersection of humor with verbal art. Many of Jefferson’s (1996: 28–30) own examples show how linguistic units are related to each other by their similar sounds, how this is done at the risk of providing unusual links regarding semantic content, and how it frequently achieves amusement. This is just as in poetry, where “any conspicuous similarity in sound is evaluated in respect to similarity and/or dissimilarity in meaning” (Jakobson 1960: 372). Indeed, Jakobson explicitly points to the propinquity of poetry and word play. “In a sequence, where similarity is superimposed on contiguity, two similar phonemic sequences near to each other are prone to assume a paronomastic function. Words similar in sound are drawn together in meaning” (Jakobson 1960: 371).



This is exactly the case in (12), where Olga, in explaining that the mayor was replaced, makes a slip of the tongue (“mėr smeni”, line 6). This phrase not only remains unfinished but also contains a grammatical error since mėr (‘mayor’), as an animate male noun, requires an inflectional marker ({-a}) in this construction; grammatically correct would be mėra smenili (‘They replaced the mayor’). This slip provides the basis for a subsequent paronymic word play. Olga immediately accomplishes a self-initiated self-repair to clarify that not the mayor but the head of the administration had been replaced (lines 7–8), the potential misunderstanding being the product of an inappropriate referent and not the grammatical error. Nevertheless, her repaired phrase provides the opportunity for word play based on paronymy, which is used by Daniil (line 9) in rendering a similar sounding string (“mers smenili”, mers is the short form for Mersedes (a Mercedes car) in colloquial Russian), as a repair candidate. The meaning of this phrase (‘They replaced the Mercedes’), however, seems odd in this context. Daniil himself orients to this incompatibility regarding the content as if it were a laughable by continuing in a smile voice in line 11. Lena (Le), nevertheless, offers as a repair a grammatical correction of what Olga initially said, by rendering “mėra smenili” (line 12). This attempt at repair is not ratified by Daniil, who sticks to his phrasing and maintains the humorous modality “= “an’ crap” (Jefferson 1990: 69), she shows how list completers (i.e., third parts) can predominantly fit in formally, for instance, when there are “acoustic consonance and punlike relationships (…) between list completers and prior list items” (1990: 71). In the ‘cakes an’ candy’ example, the deprecatory list completer fits formally, by alliteration, but it does not match with the preceding list items in evaluative terms. In terms of content, it serves as a mundane generalized list completer.
Another way in which mismatching list completers can trigger laughter relies on their “expectable sameness provided by the adequate representativity feature exploited to design” a punchline (Jefferson 1990: 79). Recipients expect all items of a list to display a “feature of adequate representativity” (1990: 81): that is, to fit into a particular set or category. Furthermore, “a three-part list manages a three-step movement from one topic to the next” (1990: 80). Funny and surprising effects may arise from lists with third parts that are not in accord with the direction initiated and established by the first two list items, or alternatively from list completers inconsistent with the category opened by the preceding list items, which is often defined ad hoc. These features characterize bathetic lists as in Oksana’s explanation of a specific psychological disorder in Example (18). Here, the list completer ‘with the Komsomol’ (“s komsoMOlom”) breaks with the series starting from ‘identification with the family’ (“identifikacija s sem’ej”) and ‘with one’s profession’ (“identifikacija s professiej”). We suggest that the character of this violation differs substantially from the one in (17). In Giora’s terms, it relates to the relevance requirement, and not to the graded informativeness requirement.

(18) RUKoBerlin 2009 (01:11:52–1), Oksana (Ok), Daša (D)

1   Ok:     net;=ähm
            no ähm
2           čelovek kotorogo nosit s seksual’noj
            a person who has troubles with their sexual
3           identifikaciej;
            identification
4           čelovek kotoryj ne prinimaet sebja
            a person who doesn’t accept themselves

Chapter 3.  Humor as a cognitive phenomenon 153



5           prinimaet sebja
            accept himself/herself
6           zaputyvaetsja vo vsem;
            is confused about everything
7   [Ok:]   [čelovek kotorogo nosit äh so vsemi ostaln’nymi
            a person who has troubles äh with all other
8           kak by temami
            topics so to speak
9   D :     [mhm.]
10  [Ok:]   bud’-to äh otnošenie k sem’e;=
            whether it is äh their attitude towards the family
11       1  =identifikacija s sem’ej;=
            identification with the family
12       2  =identifikacija s professiej;=
            identification with one’s profession
13          =identifikacija tam ja ne znaju ( );=
            identification with I don’t know
14          =s čem uGODno;
            anything;
15       3  .hh s äh (-) bljad’ s komsoMOlom;
            hh with äh (-) shit with Komsomol
16  [Ok:]   [bez raznicy s čem; ponimaeš’?
            it doesn’t matter; do you understand?
17  D :     [mhm. mhehehehehehehehehehehehehe
18  [Ok:]   [to est’ ėto voobšče kakie-to tvoi i tvoi
            that means these are just some your and your
19          sobstvennye fakty-=
            own facts
20  D :     [.hh hehe .hh .hh .hh]
21  [Ok:]   =da,=
            yes
22  D :     =[mhm.
23  [Ok:]   =[to est’ vot čto čto takoe JA;
            that means this is is me
24          vot ėto JA
            this is me
25          =vot ėto ne JA.
            this is not me.

A list construction can also be continued by an interlocutor who delivers a punch line by providing a list completer that does not fit into the set initiated and redirects the series. In (19), Daniil supplies a third item which supplements a phrase that, in terms of prosody and syntax, is otherwise a completed coordinate phrase. The two items coordinated (“administracija (-) gatčinskogo rajona” in lines 4–5 and “i:: administracija gatčina (.) goroda” in line 8) can easily be subsumed under the category label ‘administrative units’. Daniil’s additional item ‘and the dude’ (“i djadja” in line 10) turns the construction into a list construction, while simultaneously delivering a list completer not characterized by sameness: that is, it lacks features that would make it a (good) representative of the category. In fact, this third list item picks up a story told earlier in the conversation about this same dude (djadja) offering services in exchange for bribes while working in an administrative unit. The recipients’ laughter thus reflects their recognition of the dude as a remote and untypical representative of the set of administrative units.

(19) ORDs20_03 (04:02), Olga (Ol), Lena (Le), Daniil (Da), Julija (Ju)

1   Ol:     […]
2           ne znaju kak u vas,=
            I don’t know how do you have it
3           =u nas (-) ob’’edinilas’;=
            we have it merged
4        1  =ran’še byla .hh administracija (-) gatčinskogo
            earlier there were the administration unit of
5           rajona-=
            gatchina district
6   Le:     =[hm=hm- ]=
7   Ol:     =[otdel’no];
            separately
8        2  i:: administracija gatčina (.) goroda;
            and the administration unit of gatchina town
9           (-)
10  Da:  3  i djadja
            and the dude
11          mhemhe[mhe
12  Ju:     [;
            and the dude
13  FM:     hehehehehe[hehe
14  Ol:     [(i ščas)
            (and now)
15  MM:
16  Ol:     u nas ščas (.) v kontore v našej
            we have now (.) in our office



Further interactional resources and structural mechanisms that are relied on by interlocutors when organizing interaction, and to which they orient themselves in interpreting utterances, concern organization in adjacency pairs, the relationship of conditional relevance and the preference organization of turns. Conversation analysis (CA) works on the assumption that the position of an utterance within a sequence and its structural design provide sufficient context for its interpretation as “it is through the knowledge of the place of an action in a sequence that one reaches an understanding of what the action was (or turned out to be)” (Bilmes 1988: 162). A particular first pair part of an adjacency pair projects specific second pair parts to complete and close the sequence. For example, a question makes an answer conditionally relevant; an invitation, an acceptance or a declination, and so on. In other words, the sequential context (i.e., the preceding turn) leads interlocutors to expect a specific reaction. Conversely, an utterance realized within a particular sequential slot is seen as an activity typically realized within that slot and projected by the preceding turns. “[S]tructural information about conversational organization (…) predisposes participants to see utterances fulfilling certain functions by virtue of their structural location” (Levinson 1992: 75). The organization in adjacency pairs, however, allows for options among conditionally relevant second parts. These divide into preferred and dispreferred turns. Pomerantz (1984), for example, shows how, in reaction to initial assessments, affirmative assessments take a preferred turn shape in comparison to disagreements. Nonetheless, two sometimes competing conceptualizations of preference exist in discourse analysis (cf. Bilmes 1988).
According to the understanding predominant within CA, which is favored by Sacks (1974), Pomerantz (1984) and Levinson (1990), preference is a purely formal feature of turns. Dispreferred turns are characterized by such features as delay, reluctance markers, additional accounts, or modifications (Pomerantz 1984: 70–77). This means that not only interlocutors, but also analysts who adopt their perspectives, learn from the turn design whether they are dealing with a preferred or dispreferred second part (e.g., with an affiliative or disaffiliative second assessment). Bousfield (2007: 9), by contrast, identifies a “second order concept” of preference which covers the “psycho-social aspect of preference” and which goes beyond the exclusively formalist and structural first order concept adopted by ‘hardcore’ CA. This concept of preference reflects “the psycho/social expectations (…) of the participants within the discoursal context” so that the preference organization can be rephrased as “‘expected – unexpected’ preference organization” (2007: 10; italics given). On this view, affiliative assessments, for example, would be preferred for social reasons. This latter aspect or understanding of preference organization introduces knowledge about the type of social encounter and the nature of the interlocutors’ relationship which sheds light on whether, in this case, agreement

or disagreement is (dis)‍preferred (cf. Bilmes 1993). Preference is turned into a context-sensitive mechanism. To pursue the same example, whereas agreement and affiliation are preferred in casual talk among peers, preference organization may shift to disagreement in a context of fierce argument, particularly if this is characterized by high-involvement conflict styles (cf. Thielemann (2010) on Russian and Ukrainian women’s high-involvement conflict style). Deviations exploiting these organizational resources or “rules of order” (Bilmes 1988: 163) can trigger laughter as recipients realize the failure of their standard procedures for interpreting what kind of action an utterance accomplishes. They are forced to look for a meaning or an account beyond the action conventionally expected in a given slot: that is, the meaning normally ascribed to an utterance within this sequential slot. This is the case when a (psycho-socially) dispreferred action is realized in a preferred turn design, a situation which often characterizes conversational irony (Clift 1999) or mock politeness (Bousfield 2007). In these cases, the turn design superficially signals accordance and matches what is conventionally expected within that particular slot, although the speaker actually dissociates themselves from the suggested position (cf. Clark and Gerrig 1984). Sanders (2013) shows that interlocutors do indeed treat such a mismatch as an interactional problem deserving of attention; if it is produced accidentally, they deal with it by repair. By contrast, he analyzes irony or sarcasm in terms of a deliberately produced discrepancy between generic speaker meaning (i.e., the meaning which is conventionally ascribed to a turn due to its sequential position) and the individual speaker meaning (i.e., the position actually held by the current speaker). Such instances, however, mostly contain additional cues pointing to a divergence between the two meanings. 
Otherwise, they would similarly require repair. Sanders’ analysis of irony or sarcasm is also consistent with approaches that explain irony in terms of double-voiced utterances, in which a speaker renders talk, or at least a position, attributable to another source, while simultaneously dissociating themselves from it (e.g., Sperber and Wilson 1981; Wilson and Sperber 1992; Kotthoff 2002; Christodoulidou 2012; Thielemann 2013; Clift 1999). Such an account is even more appealing in cases where ironic utterances cannot (or can hardly) be traced back to another source, as for example in (20). There, Sergej reacts ironically to his interlocutors’ relating how they tried to help a friendly couple with their marriage problems. Olga’s turn reporting that, despite all efforts, the couple finally divorced is immediately followed by Sergej’s ironic response in line 11 (“.=mhehe=”, ‘all in all very successful’).

(20) RuBoFreiburg2005 (00:21:09), Olga (O), Oksana (Ok), Sergej (Se)
 1  O:   a ok pytalas’ delat’ semejnuju terapiju stefanu i
         ah Ok tried to do therapy with Stefan and
 2       vere.
         Vera
 3       pomniš’?=
         do you remember?
 4  Ok:  =(ne);=ja [ne delala terapiju;
         no I didn’t do therapy
 5  O:   [ty govorila čto ty s nich den’gi budeš’
         you said you’re going to
 6       brat’;
         charge them
 7       ona s nimi delaet terapiju;=
         she does therapy with them
 8       =[končilos’ tem-[oni teper’ razvodjatsja;=
         it ended with -[they are getting divorced now
 9  Se:  =[ėto očen’ udač[no v summe.
         this is very successful all in all
10  Ok:  [(ja ne delaju terapiju);
         (I’m not doing therapy)
11  Se:  =[.=mhehe=
         all in all very successful
12  Ok:  =[ja-]
         I
13       =hehehe
14  Ok:  ja vse chorošo meždu pročim sdelala;=
         I did it well by the way
15       =oni zdes’ dralis’;
         they fought here

His utterance is characterized by a preferred turn design. There is latching and alignment, and the turn is rather short. These structural features turn his utterance into a receipt of Olga’s report, which is the appropriate action type for this slot. Superficially, his utterance signals consent or alignment and fits into the slot. Psycho-socially, however, his evaluative comment is dispreferred. For social reasons, expressions of pity would be preferred in reaction to a couple’s divorce. The additional smile voice articulation, however, serves as a cue (cf. 2.2.2.1) signaling that Sergej does not seriously evaluate the efforts as successful and actually detaches himself from what is said. His comment therefore provides a prime example of irony in

which “conversational expectations of what constitutes a next turn are fulfilled on the level of form, but undermined on the level of content” (Clift 1999: 529). Syntactic co-constructions across turns (Helasuvo 2004; Lerner 1991, 2002) are another format or resource in which conversational expectations and sequential projections can be exploited for the creation of humor. On the one hand, they establish strong formal ties between turns by supplying constituents, whether optional or obligatory, supplementing a syntactic construction started in the previous turn. On the other, they provide the potential to change the direction of the previous utterance, which turns them into powerful instruments in discourses with competitive agendas (Grenoble 2008). Yet they can also be used in a cooperative way and thus characterize collaborative conversational behavior (Grenoble 2013). In either case, the syntactic construction of the preceding turn projects (and delimits) possible options for supplementation or completion (cf. de Beaugrande and Dressler 1981: 155). The intention and content of the previous utterance often favors particular continuations. Humorous exploitations of this format tie up formally with the preceding construction while subverting the intention of the previous turn (cf. Zima 2013a, b). This is the case in (21), where Vlad realizes a final clause (line 5) constituting an optional complement to Alena’s turn in line 4. The group is having dinner at a friend’s home when Alena asks Dusja, who recently left the dining room, to bring her handbag from the other room. From the context, we deduce that she wants to present exhibition materials which are in the bag to the group. Her utterance ‘Dusja, please my black bag over there’ (“dusja požalujsta- u menja tam SUMka  černaja-”), though off-record, is therefore easily understood as a request to bring the bag. In syntactic terms, her utterance can be regarded as complete. 
However, the pending prosody does not signal closure and projects a completing element to come. Vlad’s retort ‘so that øthe bag doesn’t get lost’ (“čtoby ne propala;”), though syntactically optional, accomplishes prosodic completion. Such prosodic completions across turns are described as highly cooperative by Szczepek Reed:

In these instances, a first speaker has produced a contour that can be heard as incomplete, that is we expect it to go on, typically in the same TCU. If another participant takes over this contour and TCU bringing it to completion, the result is a collaboratively produced contour and turn. (2001: 31)

Thus, there are strong formal ties (regarding syntax and prosody) across these turns, which do not coincide with alignment concerning illocution and content. In the example, Vlad’s retort cancels a default inference, or an inference which is contextually coherent (bring the bag in order to show the catalogue), in favor of another (bring the bag in order to avoid its loss) that is rather remote given that they are at a friend’s home (cf. Brône 2008: 2048).

(21) RuVeHamburg2012 (11:56), Alena (Al), Vlad (Vl), Dusja (Du)
1  Al:  dusja ty po=
        Dusja you
2  Vl:  =ešče prideš´?
        will you come again?
3  Du:  mhm;
4  Al:  dusja požalujsta- u menja tam SUMka černaja-
        Dusja please – my black bag is over there
5  Vl:  čtoby ne propala;
        so it doesn’t get lost
6  Al:  [prinesi [po=hehe ja choču ėto=mhemhe
        bring it hehe I want this
7  FM:  [heheheehehehehehehehehehehe

The regularities concerning structure and composition of text and discourse discussed so far affect the way in which recipients process discourse as it proceeds in time. They spell out criteria that guide recipients’ expectations concerning informativity, sequentiality and alignment between turns. For instance, recipients expect informativity to increase gradually, first pair parts to project particular second pair parts, and preferred turn designs or co-constructions to characterize utterances aligned with previous turns. These normality assumptions can all be humorously undermined since jesters can rely on their recipients to expect a particular item to come. Thus, their humorous exploitations create a mismatch between what is favored by conventions, norms and regularities and what is actually accomplished (i.e., an incongruous contrast), a mismatch that becomes available subsequently. The last discursive or textual regularity to be discussed in this subsection differs slightly from those already discussed in this regard, since breaching it establishes a simultaneously perceived contrast. Recipients of text or discourse orient to a specific production format (Goffman 1981) as default. Adopting Goffman’s terminology, speakers are supposed, by default, to be principal, author and animator of their utterances. If this unity is dissolved, and speakers highlight one “production role” (Levinson 1988) and shift footing (Goffman 1981: 124–160), the deviation is specifically marked and meaningful. As we have seen in Chapter 2, a shift of footing in which the speaker is merely the animator of discourse attributed to another author and/or principal can serve as a pragmatic cue for a humorous discourse modality. It is mainly accomplished by speakers mimicking prosodic and linguistic features habitually ascribed to the source (cf. 2.2.2.3). In doing so, they signal to the recipients that they neither align to what they say, nor account for it. 
Instead, they merely animate another voice.

In humorous utterances, however, we are rarely dealing with a neutral rendition of other voices. Typically, speakers explicitly dissociate themselves from what they render, for instance by exaggerating vocal or linguistic features associated with the figure whose words are rendered. The animating speaker’s evaluative voice is superimposed on that of the source (Günthner 1999). Thus, in humorous utterances there is a layering of contrastive voices which relies on a shifted footing. The humorous effect of a simultaneously available contrast between the animating speaker’s evaluative stance and the position they render, quote or mimic is frequently utilized in ironic or sarcastic utterances (e.g., Clift 1999; Kotthoff 2002; Haiman 1990, 1998), while also characterizing other forms of humor (e.g., Thielemann 2013; Priego-Valverde 2012). In very broad terms, a speaker’s humorous detachment from what they say – irrespective of the linguistic, pragmatic and prosodic marking – is a characteristic of denigrating forms of humor. This is illustrated in (22). The group is talking about the Russian-Georgian conflict of 2008 and ridicules the overly patriotic attitude of the Georgians, which contrasts starkly with their military power. In this context Nadja, and subsequently also Saša, reproduce parts of a famous phrase from a Soviet movie that was originally meant as a positive evaluation of Georgians (malen’kij no očen’ gordyj narod – ‘a small but very proud people/nation’). Here, however, the speakers underline their detachment from the rendered talk by mimicking prosodic features habitually associated with a Georgian accent (e.g., avoidance of the qualitative reduction of /o/ to /a/ in unstressed syllables (akanje), velar pronunciation of palatal consonants).
As the quote can probably be assumed to function as a precedent text (Karaulov 1986) in the Russian speech community, the recipients retrieve the source context with its evaluative dimension while simultaneously noticing the animating speakers’ dissociative attitude. Utterances with a shifted footing, furthermore, have the potential to provide moments of interdiscursivity, as can also be seen in this same example.

(22) RuMaCharkov2008 (00:16:10), Viktor (V), Nadja (N), Saša (S), Petja (P)
1

V :

2 3

N :

4 5

S?:

=[. tank faina [-] it should be a number of people who =[(vsre- v srednem- v srednem) šest’ iz semi ] (on on average on average) six out of seven čelovek people protiv inostrancev. are against foreigners. i protiv neruss[kich. and against non-Russians. [a- ėto ešče normal’no; ah- this is still okay

Knowledge about how, when, by whom, and for what purposes activities or genres are usually performed is based on interlocutors’ communicative competence. At the same time, it enables them to quickly detect deviations from entrenched generic forms in any of these regards. Speakers can exploit this faculty for humorous purposes by deliberately digressing from genre norms. Thus they may stage a genre outside the setting with which it is usually associated, or combine a compositional and/or sequential structure, or use linguistic and stylistic genre patterns with inappropriate slot-fillers or content (cf. Examples (23) and (24)). In all such cases, the result can be a humor-specific cognitive dissonance since recipients perceive a mismatch between what is actually realized and what they had expected to be realized, a mismatch that can pertain to the setting, the participation roles, the purpose or logic of the activity performed or the linguistic or stylistic means chosen.

3.3.4 Social norms

A further resource to be played with in face-to-face interaction is provided by social norms. These cover not only established behavioral standards within a social group or culture, and conceptions of situationally appropriate behavior, but also interlocutors’ mutually shared knowledge of what counts as a taboo topic and the like. Playful violations of such behavioral norms typically characterize various



forms of teasing (i.e., playful aggression). Successful teasing fundamentally relies on a common recognition by speakers and recipients (i.e., victims) of behavioral standards, as these mark the behavior ‘normally’ expectable and so enable processing of the divergence in terms of humor. Furthermore, cues pointing to a humorous framing play an important role in preventing the violation from being taken at its (aggressive) face value. In this subsection, we will briefly sketch how concepts from politeness research can contribute to the characterization of habitualized social norms and behavioral standards. Our aim will be to illustrate how deviations from normal, expectable and appropriate behavior can be exploited for the construction of humor. In the search for a conceptual framework to account for interlocutors’ mutual orientation to normatively appropriate social behavior, politeness research represents a promising source. Admittedly, conceptualizations of politeness vary, as the field has grown to an almost unmanageable size (cf. Eelen 2001; Watts 2003). Nevertheless, this research framework points to ways in which culture-specific and situationally appropriate verbal behavior can be analyzed (cf. Ogiermann 2009; Brehmer 2009; and Thielemann (2010) for culture-specific and group-specific behavioral standards in the Russian speech community and modelled within politeness frameworks). In very general terms, politeness refers to linguistic and communicative means in the service of maintaining smooth interaction (Ide 1989: 231). Politeness is thus seen as a strategy for modeling the interpersonal relationship in a way that complies with the cultural and social norms. This understanding is in line with Sifianou (1992: 86), who equates politeness with “the set of social values which instructs interactants to consider each other by satisfying shared expectations”. 
This allows joint behavioral standards to be associated with politeness, irrespective of the way in which politeness is theoretically modeled (e.g., as behavior guided by specifically ranked meta-pragmatic principles (Leech 1983) or as strategy for the management of interlocutors’ face needs (Brown and Levinson 1987)) and culturally, socially, or situationally parametrized (cf. Ogiermann 2009). Watts (2003), however, warns against putting polite behavior on the same level as normatively appropriate behavior and introduces the concept of politic behavior. Politic behavior is defined as “linguistic behaviour which is perceived to be appropriate to the social constraints of the ongoing interaction” (Watts 2003: 19). It therefore differs from polite behavior, which is “perceived to be beyond what is expectable” (Watts 2003: 161). In other words, politeness provides a surplus, whereas politic behavior features normalcy and inconspicuousness. However, Watts undermines this distinction by combining politic behavior with Bourdieu’s concept of the habitus: that is, “the set of dispositions to act in certain ways, which (…) is acquired through socialization” (2003: 149). Accordingly, speakers tacitly orient to politic behavior and expect their interlocutors to do so too. Locher and Watts

(2008) therefore argue for “frames of expectation” (77), which represent what emerges as an (in)appropriate behavior in a given social practice. Such frames comprise interlocutors’ knowledge of the conventionalized or habitualized ways in which face needs are handled. They therefore specify what kind of behavior is considered normal and can be expected. A similar position is adopted by Terkourafi, who, however, sticks to the term politeness. She argues that “frame-based politeness (…) corresponds to Watt’s politic behavior since both aim to capture what is appropriate relative to a certain situation” (Terkourafi 2005: 252–253). Accordingly, “frames may be thought of as psychologically real implementations of the habitus” (Terkourafi 2005: 253). She further stresses that frame-based politeness points the way to a concept of politeness which bridges the gap between interlocutors’ own emic concept and analysts’ second order conceptualizations.

Norms are present (allowing interlocutors and the analyst alike to reason preemptively, i.e., predictively, about politeness) only to the extent that regularities of co-occurrence can be empirically observed between linguistic expressions and their contexts of use in these data. Such regularities can be described in terms of frames. That is, frames are first and foremost an analyst’s tool for describing the observed regularities. However, the fact that observable regularities can be detected in a large corpus of data by the analyst who takes an emic standpoint raises the possibility that these regularities are also available to be detected by interlocutors, and that interlocutors are sensitive to them when producing and interpreting discourse. (Terkourafi 2005: 253)

Understood in this way, frames representing the group-specific or situational habitus – what counts as politic behavior in the sense of Watts or as polite in the sense of Terkourafi – provide institutionalizations concerning social behavior which, in turn, serve as reference points for the assessment of any deviation, be it humorous or non-humorous. If the behavioral standard is associated with politeness, humor-related meaningful deviations from it assume essentially one of two shapes, depending on whether they are parasitic on politeness or on impoliteness. Mock impoliteness results in banter or teasing, whereas mock politeness results in sarcasm (e.g., Bousfield 2007; Haugh 2010; Furman 2013). Humorous deviations from habitualized behavioral standards pertain to various aspects of linguistic behavior, as is shown in Examples  (25), (26) and (27). The extract given in (25) illustrates how playful violations of conventionalized modes of addressing can be exploited to create humorous effects. The two interlocutors are fellow university teachers conversing in the faculty room, the female (I5) being younger than her male colleague (M3). I5 orients herself to addressing her colleague by first name and patronymic, which is rather conventional for this

constellation, but is unable to recall his actual names (lines 2, 4). Instead of repairing the problem by providing his correct first name and patronymic, M3 jocularly suggests that she address him as ‘Your highness’ (“vaše prevoschoditel’stvo”). This is recognized and appreciated by I5 as a humorous deviation from their habitualized way of addressing each other.

(25) ORDs05–07 (00:08:13), I5 (female), M3 (male)
1

I5:

2 3

M3:

4 5

[I5] M3:

6 7

[M3]

8

I5:

9

M3:

10

I5:

11 12

M3:

13 14

I5: M3:

ja vse vrema zaputajus’. I am always mixing up vladimir [valentinovič]- vadim valer’evič[zovite ( )] call ( ) vladimir vadimovič- äh [valerij vladimirovič; [( ) da- zovite menja ( ) just call me prosto vaše simply your prevoschoditel’stvo;= highness =hehe okay- agreed

simply classy hehehehe [] highness [ėto ėto ėto ėto ne tol’ko vy elen- ( ) ] it’s it’s it’s it’s not only you Elena[hehehehe ] [ne tol’ko] vy. not only you.

According to Brown and Levinson (1987), politeness refers to behavior which is oriented to interlocutors’ face needs. These relate, on the one hand, to protecting their private territory from intrusion (usually associated with negative face) and, on the other, to their positive self-presentation (associated with positive face) remaining intact, although cultures and groups may differ in the weights they assign to the two aspects. Extracts (26) and (27) show how speakers may try to playfully breach their interlocutors’ needs for positive self-presentation, and how such teasing can succeed (cf. (27)) or fail (cf. (26)). In doing so, they illustrate

that interlocutors discursively negotiate what is impolite, and accordingly mock-impolite (cf. Furman 2013). They furthermore highlight the importance of contextualization cues in signaling that the jester is not really being impolite. In (26), we see how mock impoliteness is potentially risky and can be reacted to in a “po-faced” manner (Drew 1987). Here, Saša humorously attacks an aspect of Dima’s masculinity (lines 20–27), relying mainly on exaggeration to signal a humorous framing. However, it appears that the topic touched upon (success with women) is too delicate to be joked about for Dima, whose belated reaction could indeed be described as po-faced (lines 29–31).

(26) RuKaMoskau2009 (00:04:19), Saša (S), Dima (D)
1

S :

2 3 4

5 6 7

D :

8 9 10

S : S :

11

D :

12

S :

13 ((…)) 14 D : 15

nu i čo;= so what =ladno. alright. s toj s kotoroj ty znakomthe girl you met ty smog s nej tam pravda razvit’ otnošenija, have you managed to develop a relationship with her ((sips)) ((clears throat)) podderžat’ smog; managed to maintain; nu posmo[trim-=tam ničo ne budet; well we’ll see-= it’s not gonna work out [] ; great odna tam kleilas’; one girl was hitting k tebe? on you? nu a ty č=o, and what about you n: net;= n: no; =ona uže togda kleilas’(často); she used to hit on me (a lot)

16 17

D :

18 19 20

S :

21 22 23 24

D : S :

25 26

S :

27 28 29

D : ?: D :

30 31

(a) sejčas ona čut čut čut ėto samoe (.) vypila; now she is a little bit like (.) tipsy . she’s become more active skol’ko ej let? how old is she? (2.0) no to est’ u tebja byl šans stat’ mužčinoj so it means that you finally had the chance to nakonec-to na: become a man at ėtoj svad’be,= that wedding = [da? yes? (0.5) [no ty im opjat’ ne vospol’zovalsja, but again you didn’t seize your chance [.hhh ((sips)) ty ne boiš’sja čto ot takich slov možet (.) aren’t you afraid of getting a numb tongue jazyk zatupit’sja;=ne? from such words; =no? nakonec-to; finally

In (27), by contrast, a mock impolite utterance (in line 13) successfully launches a banter sequence in which even the victim joins. The mother (F2) sneeringly praises the proficiency of her daughter (F1) in opening wine bottles and refers to her as an ‘old alcoholic with professional experience’ (“”, “so STAžem.”). A friend of the family (I18) humorously supports this (lines 14, 20) and admires F1’s skills (“=;”). Both adequately frame their utterances as not entirely serious by using exaggeration and prosodic cues. As a result, F1 is finally able to adopt the humorous perspective by saying that ‘one needs to practice since college days’ (line 25–26). In any case, it seems likely that, within this group, an allusion to one’s drinking habits is not assessed as a particularly severe attack on one’s positive face.

(27) ORDs18_01 (00:14:05–6), mother (F2), daughter (F1), female friend (I18) 1

I18:

2 3

F2: I18:

4 5

I18:

6 7 8 9 10 11

F2: I18:

12 13

F2:

14

I18:

15 16 17 18

F2: F2:

19 20

I18:

21 22 23

F2: I18:

24 25

F1: :

26 27

ėto ty vsju molodost’ otkryvala i u tebja užeso you opened them all the time during adolescence and now

[ruka nametana. you are a dab hand at it (-) = not like your daughter = she always opens all the bottles (-)

(.) da::. yes. (-)

an old alcoholic = yes. =hehe[he [ (---) so STAžem. with professional experience (.)

yes yes yes (--) = =; but how do you do it (.)

Figure 1.  Langacker’s (2001: 145) Current Discourse Space
[Figure not reproduced. Recoverable labels: S, H, Ground, Time, Context, Shared knowledge]

The conceptualization in the viewing frame relies on multimodal input. As shown in Figure 2, Langacker promotes a comprehensive view of conceptualization, as it can be triggered and shaped by verbal elements and strings, by their prosodic realization and by gestures. Accordingly, conceptualization is not restricted to the object or situation being construed, but also includes the information state of an element (e.g., new, given) and speech management issues (e.g., turn-taking, turn-holding). Diverse aspects thus together account for a usage event.



Chapter 4.  Conversational humor from a discourse-semantic perspective 189

Figure 2.  Bipolar usage event according to Langacker (2001: 146)
[Figure not reproduced. Recoverable labels: conceptualization channels (speech management, information structure, objective situation); vocalization channels (segmental content, intonation, gesture); viewing frame]

As discourse proceeds, speakers shift viewing frames with every upcoming usage event, whose multimodal structure triggers a new conceptualization, with linguistic (or multimodal) elements signaling the linkages between events. Langacker illustrates this, inter alia, using the examples of sentence-initial therefore and the filler uh. The former indicates that the proposition it introduces follows logically from another proposition in the immediately preceding usage event (Langacker 2001: 149). It thus points to the projective way in which the adjacent usage events are to be connected. The filler (uh) relates first and foremost to the conceptualization channel of speech management: It functions as a turn-holding device which signals to interlocutors that the speaker is merely hesitating and plans to continue their turn (2001: 148). Like therefore, it indicates a particular kind of upcoming usage event and prevents its recipients from taking the turn. Apart from that of the conceptualization channel, Langacker introduces a further dimension relevant for the “management of attention in the flow of discourse” (2001: 154), which affects discourse organization. Adopting ideas from Chafe (1994), he (2001: 154) argues that discourse is organized in chunks of “digestible size” which fit into “windows of attention”. What is within the scope of this attentional framing is “at a consciously accessible level” (2001: 154). Attentional frames are thus associated with intonation units (Chafe 1994: 53–69), which often coincide with clauses and satisfy Chafe’s (1994: 106–119) one new idea constraint. In the most general terms, Langacker’s CDS model is a sound follow-up framework that implements CG theoretically within a cognitive analysis of discourse. 
However, the model does not meet equally well the four basic demands on an approach towards the analysis of cognition in interaction set out above (i.e., the ability to account for the emergent and shared character of cognition in interaction, including possibly richer backstage cognition and acknowledgement of the differentiated impact of multimodal input on conceptualization). On the one hand, its dynamic and inherently temporal character helps in explaining discourse as an emergent process in which mental representations are constantly updated and during which only one viewing frame or conceptualization is consciously on stage at a time. On the other, though, the mutually shared, coordinated

and thus intersubjective character of the conceptualization within the viewing frame remains largely an idealization (Ehmer 2011: 45, 49; Zima 2013a: 54; Brône 2009: 89). According to Langacker, conceptualization “takes place in the minds of individuals but does not occur in isolation – the conception involved in language is both shaped by, and a primary vehicle of, social interaction” (2013b: 97–98). In essence, intersubjective cognition relies on “a simulation of the other’s experience, thereby apprehending the situation from the other’s vantage point” (2013b: 98). Although Langacker’s concept of the ground includes the interlocutors (S and H), it can hardly be assumed that their perspectives (including their social, cultural and situational background knowledge) completely coincide. The CDS model thus accounts rather inadequately for the shared character of cognition in interaction. After all, conceptualization is not conceived of as a joint process of negotiation in this framework but as a mental representation construed by one conceptualizer (Brône 2009: 89). Finally, the inclusion of multiple vocalization and conceptualization channels can again be regarded as a strength of the model as it takes into account the multimodal nature of face-to-face interaction and the variety of meanings relevant in conversation (including information packaging and interaction management). Although, as stressed by Brône (2009: 89), the illocutionary dimension of usage events is missing, the same author (2008) shows that the CDS model can be adopted in the analysis of teasing retorts from British TV comedy. Similarly, Zima (2013a, b) applies the model in her analysis of hecklings, which also subvert the communicative intent of a preceding turn. 
Both scholars thus confirm that the CDS model enables the tracing of humorous cognition in interaction when it comes to sequentially realized forms of humor, though in practice their analyses focus on conceptualization of the object, or objective situation, and neglect other input having an impact on the conceptualization. Both scholars explain humorous cognition as resulting exclusively from a sequentially realized de-automatization of the default construal mechanisms usually employed in conceptualizing an object or situation. This is generally in line with the CDS model, which essentially relies on the schematic reconstruction of conceptualization familiar from CG. As argued in Subsection 3.2.3, jesters frequently rely on social and cultural background knowledge which further enriches conceptualization, and thus contributes to rich and creative humorous cognition. Its impact on the conceptualization, however, can hardly be included in a schematic fashion (in the manner of CG). The role of knowledge from the wider (social and cultural) context – organized in knowledge frames and accounting for rich(er) backstage cognition – in the conceptualization thus remains vague and underspecified in the CDS model (as it does in CG and its construal schemes).



4.2  Clark’s joint action hypothesis

A further dimension that the CDS model treats in a cursory, somewhat unsatisfactory manner is the role of coordination in language use. By contrast, the next framework to be discussed here stresses the joint negotiation of meaning in discourse. It is Clark’s (1996) discourse representation model, which originates from his ideas on language use as a joint activity. In the process of this joint activity, interlocutors incrementally develop the mutually shared basis (common ground) which forms a prerequisite for (successful) communication. Clark’s approach is relevant because it integrates cognitive and social perspectives on discourse, pursuing the central claim that “the study of language use is both a cognitive and a social science” (1996: 25). It is based on several assumptions crucial for the analysis of face-to-face interaction, namely that language use is a “species of joint action” which “always involves speaker’s meaning and addressee’s understanding” and which is employed for “social purposes” (1996: 23–25). In this joint endeavor of language use, coordination is a task constantly to be dealt with. Not surprisingly, Clark devotes much attention to various coordination devices which assure mutually shared understanding in any kind of joint activity, language use being merely one form of this. These features make his approach another potential framework for the analysis of (humorous) cognition in interaction. The understanding of discourse as a joint activity requires a discourse model which displays how speakers “keep track of a discourse representation” (1996: 52) that is constantly updated, and which thus accounts for the inherently temporal nature and emergent character of discourse. According to Clark, both textual and situational representation amount to discourse representation (1996: 53). In this context, the concept of common ground is crucial. Clark defines it as follows.

For two people A and B, it is common ground that p if and only if:
1. A and B have information that some basis b holds;
2. b indicates to A and B that A and B have information that b holds;
3. b indicates to A and B that p. (1996: 66)
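Read as a definition, the three conditions can be checked mechanically. The following Python sketch is only an illustration under strong simplifying assumptions: the `Agent` class, the `both_informed` encoding of condition 2 and the window example are hypothetical and not part of Clark's account, and "indicates" is reduced to simple set membership.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    # Bases (e.g., a jointly perceived situation) the agent has information about.
    known_bases: set = field(default_factory=set)
    # What each basis indicates to this agent: basis -> set of propositions.
    indications: dict = field(default_factory=dict)

    def indicated_by(self, basis):
        return self.indications.get(basis, set())


def both_informed(basis):
    """Hypothetical encoding of 'A and B have information that basis holds'."""
    return ("both_informed", basis)


def is_common_ground(p, basis, a, b):
    """Check the three conditions for proposition p, given one basis."""
    cond1 = basis in a.known_bases and basis in b.known_bases
    cond2 = all(both_informed(basis) in x.indicated_by(basis) for x in (a, b))
    cond3 = all(p in x.indicated_by(basis) for x in (a, b))
    return cond1 and cond2 and cond3


# Hypothetical example: A and B look out of the same window together.
basis = "A and B look out of the same window"
p = "it is raining"
a = Agent("A", {basis}, {basis: {both_informed(basis), p}})
b = Agent("B", {basis}, {basis: {both_informed(basis), p}})
shared = is_common_ground(p, basis, a, b)

# If the basis does not indicate p to B, p is not common ground.
b_unaware = Agent("B", {basis}, {basis: {both_informed(basis)}})
not_shared = is_common_ground(p, basis, a, b_unaware)
print(shared, not_shared)
```

The sketch makes one point of the definition visible: common ground is always relative to a basis, so removing what the basis indicates to just one participant is enough to make the check fail.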

As discourse proceeds, speaker and hearer accumulate common ground (1996: 52) by virtue of utterances or other signals. This constantly expanding common ground then forms the context in which any upcoming utterance is embedded and interpreted, and thus the continually updated basis allowing for joint action. With regard to common ground, Brône (2009: 92) particularly stresses that the assumption of a shared mental representation remains an individual one about what the other participant believes, a view he finds warranted by Clark’s own assertion (1996: 96) that “we are in fact acting on our individual beliefs or assumptions about what is in our common ground”. To accomplish a joint activity,

interlocutors must therefore indicate to each other what they assume to be the common ground. This is where coordination devices such as convention, precedent or explicit agreement come into play (Clark 1996: 62–86). Interlocutors orient themselves to these in order to ascertain what is considered to be mutually shared common ground. Interlocutors also rely on “[s]‍ignaling systems” (1996: 75) which similarly point to the relevant context and indicate how their individual actions fit into the joint activity. Here, Clark (1996: 241–252; 2004) distinguishes between two types of signals used by interlocutors. Primary signals “refer to the official business” (2004: 373) (i.e., what the talk is actually about), while collateral signals (vocal or visual gestures, repair or side sequences, choices of style or register, etc.) serve meta-communicative purposes. The latter comment on the performance in the primary communicative track and signal how it refers to the business at hand. However, the groups of collateral signs adduced by Clark (2004: 373–381)  – inserts (e.g., side sequences), modifications (e.g., repair phenomena, phonetic variation), juxtapositions (e.g., incremental replacements) and concomitants (e.g., accompanying gestures) – are not consistently located in a separate channel. Not all of them constitute additive signals, as the terminological distinction between a primary and a collateral track might suggest. Modifications, in particular, include means which alter the form of the primary signal and can therefore not be separated from it. Indeed, Clark’s collateral signals generally resemble contextualization cues (Gumperz 1982, 1992a, b; Auer 1986, 1992), both in form and in function, as they point to the context in which an utterance should be understood (cf. Langlotz 2008: 358–361). Like contextualization cues, collateral signals help in mutually adjusting the contextual framing of a communicative event. 
Yet surprisingly, Clark (1996, 2004) himself does not point to the striking parallels between contextualization theory and his ideas on language use and coordination. Clark broaches the issue of humor twice. Once he touches upon it as an effect arising from coordination problems. For him, these testify to discrepancies in what is assumed to be the common ground. They can thus also be deliberately exploited for humorous purposes. Clark (1996: 68) adduces the example of riddles, which cause puzzlement (i.e., a cognitive disturbance) because interlocutors do not share the same contextual assumptions with regard to the solvability premises. In (1), for example, Ben is not able to guess the ‘right’ answer and Ann takes advantage of this. The riddle has a solution which Ann does not expect Ben to discover.

(1) example quoted from Clark (1996: 68)

Ann: When is a thought like the sea?
Ben: (after thinking a bit) I don’t know. When?
Ann: When it’s a notion.



Secondly, Clark (1996: 353–384) dwells on humor in the context of layering phenomena which cover inter alia fiction and drama, as well as various forms of non-serious utterances. These confirm that “[l]anguage use often has more than one layer of activity” (1996: 24). As illustrated in Figure 3, the first layer (layer 1) represents the actual communicative situation and the interlocutors involved in it. It can be associated with the ground from Langacker’s CDS model. The other layer (layer 2) comprises an imagined, staged or pretended situation evoked by the speakers.

Figure 3.  Layers of action (Layer 2, Layer 1) according to Clark (1996: 354)

According to Clark (1996: 369–376), irony (and similarly teasing) relies on joint pretense (cf. Haiman 1990, 1998). By this, he means that all interlocutors involved are aware of the layering and the staged character of the second layer. He explains humorous layering using examples like the one reproduced in (2). The extract stems from a couple’s conversation about the husband’s charges for private lessons and essentially pivots on the use of cheap.

(2) example quoted from Clark (1996: 353)

Ken: and I’m cheap, – – –
Margaret: I’ve always felt that about you,
Ken: oh shut up, (– – laughs) fifteen bob [shilling, a former British monetary unit] a lesson at home.-

Margaret’s utterance is not meant to be taken at its face value. She does not really feel about her husband that way but merely pretends to do so.1 Ken’s laughter indicates that he has noticed the pretense and appreciates it.

1.  Clark’s comprehension of humorous utterances as pretense which involves jesters “acting as if” (1996: 353) and staging their utterances resembles the analysis of humor (or at least some forms of humor) in terms of polyphony or Bakhtin’s heteroglossia (e.g., Kotthoff 2002; Priego-Valverde 2012; Thielemann 2013). However, he does not pay much attention to the means by which interlocutors indicate to one another that they are not speaking for themselves (cf. Haiman 1990, 1998 for an approach centering on this aspect).

Irony as a particular form of humor, however, is characterized not only by joint pretense but also by a specific configuration of the two layers. Clark accounts for this in his pretense theory of irony (cf. Clark and Gerrig 1984).

In the pretense theory, then, irony has two layers. A and B are at layer 1, and their implied counterparts Ai and Bi are at layer 2:
Layer 2  Ai is performing a serious communicative act for Bi.
Layer 1  A and B jointly pretend that the event in layer 2 is taking place.
(Clark 1996: 372)

Irony essentially owes its funniness to interlocutors’ “recognition of the contrast between the two layers” (1996: 372). Consequently, the humorous effect of nonserious, staged communication derives from “an unexpected incongruity between what might have been (scene 2) and what is (scene 1)” (1996: 372). In a sense, the humor-specific cognitive contrast is translated into a mismatch between the ‘reality’ layer (comprising the interlocutors and the actual situation, cf. Langacker’s ground) and the staged layer (comprising a pretended and somehow unusual event) that are linked by a relation of correspondence. Teasing resembles irony as it also relies on staged communicative acts: that is, on the pretense of actions, events or scenarios that do not match, exaggerate or run counter to those located on the ‘reality’ layer (1996: 374). In the context of teasing, however, exaggeration and overstatement are frequently employed to mark the pretense and thus point to the layering (1996: 375–376). By and large, Clark’s ideas meet several of the demands on a framework for the discourse-semantic analysis of humor in interaction. The assumption of an incrementally-updated common ground, which serves as a jointly presupposed basis in communication, kills two birds with one stone. On the one hand, the idea of a continuously evolving context against which any upcoming utterance is interpreted promotes an understanding of discourse as process. On the other, it explains the inherently contextualized nature of language use and understanding. Compared to Langacker, Clark seems to cope well with the jointly negotiated character of meaning in interaction, stressing the role of coordination. In distinguishing two communicative tracks, both of which contribute to the accomplishment of meaning in joint activities, he advances ideas reminiscent of contextualization theory. 
His approach incorporates the potential multi-modal design of a (humorous) stimulus and accounts for its impact on contextualized understanding as a matter of principle. Finally, in contrast to CG and the CDS model, Clark considers the impact of social and cultural background knowledge on understanding utterances in context, subsuming cultural facts, norms and procedures amounting to a shared stock of knowledge under the label of communal common ground (1996: 100–102).



Nevertheless, Clark’s model remains fuzzy in how input from these various sources is actually included and displayed in the concrete analysis of cognition in interaction. And that is by no means the only regard in which it falls well short of our requirements. For example, although Clark points to the way in which primary and collateral signals partake in the joint making of meaning, these are not at the center of his interest. Accordingly, apart from mentioning exaggeration and overstatement in the context of teasing, he does not elaborate on the contribution of collateral signals to layered activities, which is particularly relevant in the analysis of humor. Moreover, irrespective of its general explanatory power as a discourse-semantic model applicable to the analysis of several forms of humor (riddles, irony or sarcasm, teasing), Clark’s model largely avoids formalization. Indeed, Brône (2009: 97) criticizes its low granularity and the lack of sophistication in its semantic analysis, which cannot compete with the construal schemes externalizing conceptualizations in CG. And, last but not least, in contrast to Langacker’s CDS model, the cognitive status of Clark’s discourse representation model remains unclear.

4.3  Fauconnier and Turner’s blending theory

In Sections 4.1 and 4.2, we found that neither of the models discussed there adequately meets the demands on a framework for the analysis of humorous cognition in interaction set out at the start of this chapter. For neither takes adequate account of the multiple sources contributing to cognition in interaction (ranging from multimodal input and the immediate discursive and situational context to wider social and cultural background knowledge) while also striving toward a more sophisticated presentation and visualization of conceptualization. A more productive avenue is provided by a blending analysis of conversational humor that incorporates ideas from conversation analysis in order to display and visualize the peculiarities of humorous cognition in interaction. This subsection therefore begins by introducing the basics of blending theory including mental space theory (Fauconnier 1994, 1997; Sweetser and Fauconnier 1996; Hougaard and Oakley 2008; Coulson 2001), from which it originated. It will then move on to a blending and mental spaces analysis of conversational joking designed to show which sequentially deployed configurations and blends of mental spaces underlie or characterize humorous cognition in interaction. As essentially cognitive models of discourse representation, mental space and blending theory are able to explain (rich) conceptualization triggered by verbal and other input. Admittedly, their exclusively cognitive perspective has some minor drawbacks in the analysis of face-to-face interaction. However, both theories definitely benefit from the inclusion of ideas from discourse, and especially conversation

analysis which account for the temporal and interactive character of the data analyzed and which help to further validate a cognitive analysis (see Section 4.3.1). As a result, they enable the analysis of the configurations and blends of mental spaces characterizing humorous sequences, and of the multimodal and other resources contributing to humorous cognition in interaction (see Section 4.3.2). This will finally allow us to reveal, in Section 4.4, the diverging impact of formal features characterizing and singling out humorous discourse analyzed as contextualization cues in Chapter 2.

Blending theory (Fauconnier and Turner 2003) provides a cognitive semantic framework within which to grasp the emergent, novel, and in very general terms, creative conceptual structures characterizing human thinking across several modalities. It has been predominantly applied to the analysis of textual discourse, but also to multimodal items combining text and illustrations. More recently, analyses of conversational discourse have broadened their scope, demonstrating how blending theory can be turned into a model for the analysis of cognition in interaction (e.g., Hougaard 2005; Oakley and Hougaard (eds.) 2008; Ehmer 2011; Stadelmann 2012; Pascual 2014). As a result, blending theory has been converted into an “interactional cognitive semantics” (Hougaard 2005: 1656) suitable for the analysis of cognition in interaction that helps to externalize the conceptual structures specific to humorous discourse, analyzing them in terms of mental representations elicited by multimodal stimuli and enriched by co-activated background knowledge. These enhancements, which mostly reflect the integration of ideas from conversation analysis, pave the way for a blending analysis of humorous sequences capable of revealing which conceptual configurations are responsible for humor-specific disruptive processing. 
Furthermore, by allowing for all the different sorts of input that trigger and contribute to conceptualization, blending theory paves the way for a re-analysis of the performance features setting apart humorous utterances. Finally, with regard to the possibly heterogeneous nature of contextualization cues (see Section 2.3), mental space and blending theory help to determine more precisely their different impact on humorous cognition. To begin at the beginning, mental space theory is a cognitive semantic model of online meaning construction which describes how linguistic input interacts with background knowledge in the local process of understanding discourse. As does its offspring blending theory, the model relies essentially on the concept of mental spaces, which are defined by Fauconnier and Turner (2003: 40) as “small conceptual packets constructed as we think and talk, for the purpose of local understanding and action”. As such, they contain “a partial representation of the entities and relations of a particular scenario as perceived, imagined, remembered, or otherwise understood by a speaker” (Coulson 2001: 21). In other words, mental spaces are temporary cognitive structures triggered by variously structured input



possibly including or combining resources ranging from linguistic structure to drawings and gestures. The vast majority of research within this framework, however, focuses on linguistic input. The role of linguistic structure in giving rise to a mental space is reminiscent of cognitive linguistic ideas about conceptualization. Mental spaces are “constructs distinct from linguistic structures but built up in any discourse according to guidelines provided by linguistic expressions” (Fauconnier 1994: 17). Accordingly, words “do not directly refer to entities in the world” but merely function as “linguistic cues” which “prompt speakers to set up elements in a referential structure that may or may not refer to objects in the world” (Coulson 2001: 21). As a result, the scene represented in the mental space relies not only on the linguistic input but also on the evocation of background knowledge organized in frames or cognitive schemas, stored in long-term memory and “typically elaborated locally during any particular discourse” (Sweetser and Fauconnier 1996: 11). This knowledge substantially contributes to the conceptual structure comprised in the mental space and accounts for a conceptualization which is richer and more specific than the linguistic input. In this regard, the model is in line with one of the basic tenets of cognitive linguistics: meaning is conceptualization (Langacker 2008, 2013a; Evans and Green 2007; Croft and Cruse 2004). Consequently, a mental space “typically includes elements to represent each of the discourse entities, and simple frames to represent the relationships that exist between them” (Coulson 2001: 21). The “frame in this context provides a set of organizing relations among the elements in the space” (Fuji 2008: 185). This is illustrated in Figure 4, which displays the mental space triggered by sentence (3), an example given by Fauconnier (1997: 44).

(3) Achilles sees a tortoise.
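As a rough illustration of how a mental space holds elements and frame-supplied relations, sentence (3) could be encoded as follows. This is a minimal sketch, not a notation used in mental space theory: the `MentalSpace` class and its method names are assumptions introduced only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class MentalSpace:
    name: str
    elements: dict = field(default_factory=dict)    # label -> discourse entity
    relations: list = field(default_factory=list)   # (frame, role fillers...)

    def add_element(self, label, entity):
        self.elements[label] = entity

    def add_relation(self, frame, *labels):
        # A relation may only link elements already set up in this space.
        missing = [label for label in labels if label not in self.elements]
        if missing:
            raise KeyError(f"unknown elements: {missing}")
        self.relations.append((frame, *labels))


space = MentalSpace("space for (3)")
space.add_element("a", "Achilles")
space.add_element("b", "tortoise")
space.add_relation("SEE", "a", "b")   # the SEE frame: a = seer, b = seen

print(space.elements)
print(space.relations)
```

The space stores only what the sentence requires (two elements and one relation); the full background frame for seeing is deliberately left outside the data structure.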

Here, the frame activated by the verb to see supplies the structure (shown in the rectangle) of the mental space that specifies how the two elements (i.e., the discourse entities Achilles and a tortoise) are related to each other (i.e., one takes the role of seer and the other that of the seen). At this point, it is important to stress that mental spaces should not be equated with frames or conceptual domains. As partial representations, they include only what is required for the purpose of the given discourse extract. However, the frame, once activated, remains available in the background and can be referred to for the construction of further mental spaces (Hougaard and Oakley 2008: 4; Ehmer 2011: 45).

Figure 4.  Mental space prompted by (3) according to Fauconnier (1997: 44). Elements: a (name Achilles), b (tortoise); relation: SEE a b.

Mental spaces are set up in working memory and modified according to local demands as discourse proceeds. Consequently, discourse can be analyzed in terms of a specific configuration or “network of interconnected mental spaces” (Hougaard and Oakley 2008: 3). This can be illustrated using the relatively simple example of the story reproduced in (4), an example from Coulson (2001: 22).

(4) Arnold is an actor. He plays a mercenary on TV. But in real life, he’s a pacifist and a vegetarian.

The story triggers the construction of two interrelated mental spaces, each comprising a discrete scenario. One includes Arnold (a) in his real-life roles as actor, pacifist and vegetarian, the other a presentation of him (a′) playing the part of a mercenary in a TV movie. The lattice of spaces mentally representing the story (see Figure 5) thus reflects how the information given in the text is partitioned and how the properties of a discourse entity may change according to the context (Coulson 2001: 22–24).

Figure 5.  Configuration of mental spaces triggered by (4) according to Coulson (2001: 23). Base space: element a (Arnold); relations Actor (a), Pacifist (a), Vegetarian (a). Television space: element a′; relation Mercenary (a′).
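The partitioning in (4) can also be sketched in code: two spaces, each with its own element and properties, plus a link between the two counterparts of Arnold. The class, the `links` table and the function name below are hypothetical conveniences for this illustration, not part of Coulson's or Fauconnier's formalism.

```python
from dataclasses import dataclass, field


@dataclass
class MentalSpace:
    name: str
    elements: dict = field(default_factory=dict)   # label -> description
    relations: set = field(default_factory=set)    # (property, label) pairs

    def properties(self, label):
        return {prop for prop, element in self.relations if element == label}


base = MentalSpace(
    "Base",
    {"a": "Arnold"},
    {("Actor", "a"), ("Pacifist", "a"), ("Vegetarian", "a")},
)
television = MentalSpace(
    "Television",
    {"a'": "Arnold's TV role"},
    {("Mercenary", "a'")},
)

# Cross-space link between counterparts: (space, element) -> (space, element).
links = {("Base", "a"): ("Television", "a'")}
spaces = {space.name: space for space in (base, television)}


def counterpart_properties(space_name, label):
    """Follow the link from an element to its counterpart and read off its properties."""
    target_space, target_label = links[(space_name, label)]
    return spaces[target_space].properties(target_label)


print(sorted(base.properties("a")))
print(sorted(counterpart_properties("Base", "a")))
```

Keeping the properties inside separate spaces is what prevents a contradiction: the pacifist and the mercenary are never asserted of the same element, only of linked counterparts.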

The reconstructed network of mental spaces was initially designed as a theory of referential structure “concerned with the dynamic management of context and reference in discourse” (Langlotz 2008: 350). It visualizes the way in which a recipient efficiently identifies both representations of Arnold, although they occur in differently framed scenarios and are assigned different roles. This task is accomplished by means of a connector which links both counterparts across the mental spaces. The line connecting both elements indicates that the TV counterpart of Arnold (a′) is accessed via the representation of Arnold in the real-life space (a). “If two elements a and a′ are linked by a connector F (a′ = F(a)), then element a′ can be identified by naming, describing, or pointing to its counterpart a” (Fauconnier 1997: 41). This Access Principle, earlier also termed Identity Principle (Fauconnier 1994: 3), specifies the grounds on which links or mappings can be established across spaces. In the example, an identity connector is at work. Elements from different spaces, however, may also be linked in other ways, for instance by “similarity, analogy, metonymy, and other pragmatic functions” (Coulson 2001: 26) which enable rapid access when processing discourse.

Mental spaces can pertain to different kinds of scenarios. They can “represent such diverse things as hypothetical scenarios, beliefs, quantified domains, thematically defined domains, fictional scenarios and situations located in time and space” (Coulson 2001: 22). In the example, the first, base space pertains to ‘reality’ (ignoring, for the moment, the difficulties inherent in defining or determining what reality is), while the second space represents a virtual or fictional scenario (television). As discourse proceeds, the base space serves as a point of departure from which further spaces and elements therein are accessed.

It is important to notice that this space need not be true or real or actual in any way outside the cognizer’s (or cognizers’) understanding. Only as an interpreter’s understanding of something it is claimed to be very real. (…) The base space is thus a here-and-now space with respect to the unfolding discourse, not with respect to any real or possible world situation. This is also a major reason why mental spaces are cognitive. (Hougaard and Oakley 2008: 3; italics given)

The truth-conditional status of the base space is thus irrelevant since it is defined as “the speaker’s mental representation of reality” (Fauconnier 1994: 15). Given its decisive role as the space from which meaning construction at a given moment starts, it comes as no surprise that Ehmer (2011: 34) compares it to Langacker’s current discourse space. As discourse proceeds, elements and structure are modified and further spaces evolve from the base space. Mental spaces are thus “not just frozen images, states or relations; they can have extensions in time and change over time” (Hougaard and Oakley 2008: 5). In this manner, the model accounts for the inherently temporal and emergent character of discourse. The need to set up a new mental space can be indicated in different ways. Space builders are “overt mechanisms which speakers can use to induce the hearer to set up a new mental space” (Sweetser and Fauconnier 1996: 10). Fauconnier (1994: 17)

mentions several expressions and linguistic structures which serve as such triggers, including prepositional phrases (in the picture, …), adverbs (really, probably, …), connectives (if _, then _,…), or subject-verb patterns (X believes, hopes,… that _). Nevertheless, spaces can also be triggered implicitly or inferentially if the need to open “an alternative domain of reference” (Coulson 2001: 22) arises in discourse. Cienki (2008) broadens the scope, arguing that spaces can be triggered multimodally. He points, in particular, to nonverbal resources (e.g., pictures, gestures) which can serve as space builders. Using the example of staged and animated scenes in face-to-face interaction, Ehmer (2011) shows how gestures such as drawing quotation marks with one’s fingers can establish new spaces. Space builders in this sense may also specify the domain of reference: that is, whether a belief space, a fictional or hypothetical scenario, etc. has to be set up (Ehmer 2011: 37). As a rule, however, they do not provide conceptual structure (Ehmer 2011: 38). In sum, multimodal input fulfils several roles with regard to conceptualization. According to Fauconnier who focuses specifically on verbal input, linguistic expressions mainly accomplish three tasks. They can function as space builders and “establish new spaces”, they can introduce elements within them, and they can evoke frames which specify the “relations holding between elements” (1994: 17), and thus provide the conceptual structure of a mental space. Nevertheless, these three tasks can also be accomplished by nonverbal input (e.g., gesture, pictures). Accordingly, mental space theory accounts for rich backstage cognition (Fauconnier 1994) in several regards. Frames activated from background knowledge supply conceptual structure and contribute to a coherent cognitive representation construed, sometimes even from ‘poor’ linguistic input. As discourse unfolds, a complex network of mental spaces emerges. 
In this process, links between differently framed mental spaces are established. “Meaning construction thus consists of mapping cognitive models from space to space while keeping track of the links between spaces and between elements and their counterparts” (Coulson 2001: 24). Blending theory lifts meaning construction to yet another level and accounts for instances which creatively combine conceptual structure from differently framed mental spaces. Such creative conceptual fusions, also called blends, are analyzed in terms of conceptual integration networks. These represent the configurations of mental spaces which underlie and describe the “creative construction of meaning in analogy, metaphor, counterfactuals, concept combination and even the comprehension of grammatical constructions” (Coulson and Oakley 2000: 176). A prototypical conceptual integration network (see Figure 6) consists of at least two input spaces, a generic space and a blended space (or simply blend) (Fauconnier and Turner 2003: 40–50). In the process of blending, a new conceptual structure emerges through the selective projection of elements and structure from the two differently framed input spaces to the blended space (marked by dashed lines).



Blending further establishes mapping relations which link counterparts across input spaces (marked by solid lines). This mapping of elements is warranted by vital relations such as (dis-)analogy, representation, similarity, part-whole, role etc. (Fauconnier and Turner 2003: 89–102). The generic space finally contains a conceptual structure that is shared by the entire network. The conceptual structure which emerges in the blend is characterized by its own logic and differs substantially from the input spaces. This emergent structure “is generated in three ways: through composition of projections from the inputs, through completion based on independently recruited frames and scenarios, and through elaboration” (Fauconnier and Turner 2003: 48; italics in original). Composition refers to both forms of combining conceptual structure: the selective projection of elements or structure to the blended space and the mapping of counterparts onto each other across input spaces. The latter kind of projection (i.e., mapping counterparts onto the same element in the blend) results in “fusion” (Fauconnier and Turner 2003: 48). Completion again directly concerns the blended space, which is “unconsciously” supplemented by “background meaning” (white dots) provided by co-activated familiar frames that are “silently but effectively [recruited] to the blend” (Fauconnier and Turner 2003: 48). Completion consequently “occurs when structure in the blend matches information in long-term memory” (Coulson and Oakley 2000: 180). Lastly, elaboration refers to a process of emancipation which takes place when the novel and emergent logic of the blend is perpetuated: when it comes to a discursive and mental entrenchment of the logic of the blend, also called “running the blend” (Fauconnier and Turner 2003: 48).

Figure 6.  Conceptual integration network (generic space, input space I1, input space I2, blend) displaying the emergent conceptual structure of a blend according to Fauconnier and Turner (2003: 46)

Creative combinations of concepts occur in various guises, and conceptual integration networks may vary in their configuration and complexity. For the sake of brevity, two relatively simple blends are discussed here in order to illustrate how blending works across modalities. First, Coulson (2001: 115–118) gives the example of an ad hoc, invented game named trashcan basketball played by two students, who throw scrunched up pieces of paper into a trashcan. This blend combines two input spaces, one comprising background knowledge from the domain of basketball (about the ball, the basket(s), rules of the game, types of shot, etc.), the other framed by knowledge about waste disposal (kinds of waste such as scrunched up paper, trashcans, etc.). The conceptual integration network establishes cross-space mappings based on analogies between elements from the two input spaces: scrunched up paper and the basketball are mapped onto each other, likewise the trashcan and the basket. The blend thus suggests understanding waste disposal in terms of a sports game and turns it into a fun activity. If the two students involved in this novel activity try to make dunks or hook shots with their paper ‘balls’, they elaborate and run the blend (i.e., mentally simulate its logic). The blend just described is accomplished by performing the activity itself (i.e., playing trashcan basketball), and not merely by linguistic means. In other cases, the blend relies entirely on linguistic expressions, as in our second example. This is the blend underlying the phrase “Murdock knocks out Iacocca” analyzed by Fauconnier and Turner (2003: 126–131). 
Here, the input spaces involved are structured by background knowledge from the domains of boxing and business. Within this conceptual integration network, the CEOs Murdock and Iacocca are mapped onto two boxers. Fauconnier and Turner classify this example as a single scope network. By this, they mean the conceptual structure of “highly conventionalized source-target metaphors”, in which “[o]ne of the inputs but not the other supplies the organizing frame” (2003: 127). Blending theory thus embraces conceptual metaphors (Lakoff and Johnson 2011) as a particular kind of blend. Lakoff and Johnson’s source domain thereby corresponds to the input space largely supplying the frame or logic of a metaphoric blend, while their target domain corresponds to the input space which merely adds single items or agents to the blended space (Fauconnier and Turner 2003: 126–129). In this example, the [boxing] frame provides the structure and the CEOs from the [business] frame merely fulfill the roles of, or perform like, boxers. The blend thus suggests understanding economic competition in terms of physical confrontation. This blend can also be elaborated on by consistently sustaining the metaphoric mapping (e.g., by using



phrases such as hit someone below the belt or go the distance to refer to the realm of business competition). Summing up, blending refers to a particular form of creative meaning construction which “promote[s] novel conceptualizations, involving the generation of inferences, emotional reactions, and rhetorical force” (Coulson and Oakley 2000: 176). The frames complementing the input spaces involved remain available in the background for further meaning construction, while facilitating rich inferences. They provide more information than is required for a given purpose. A mental space only recruits from a frame the information necessary for that purpose. However, once activated, the frame remains available in the background and can be easily accessed for further conceptualization. The emotional and rhetorical effects of blending trace back mainly to the combination and interaction of specifically framed mental spaces and to the mapping of elements therein, which together give rise to the emergent logic of the blend. As will be shown in Subsection 4.3.2, humor is one of the emotional reactions deriving from specifically assembled networks of mental spaces. Returning to the requirements on an analytical framework set out at the beginning of this chapter, we see that mental space and blending theory, as cognitive semantic approaches, do indeed cope with many of the challenges listed there. The framework they provide is essentially designed to account for rich backstage cognition (Fauconnier 1994). The scenic representation of what is going on in discourse that is contained in a mental space is triggered by multimodal input and complemented by knowledge organized in frames. It is therefore naturally richer than what is suggested by the surface input. “Language, as we use it, is but the tip of the iceberg of cognitive construction. As discourse unfolds, much is going on behind the scenes” (Fauconnier 1994: xii). 
In this context, Fauconnier (1994: xii) stresses that “we do not suspect the extent to which vast amounts of prestructured knowledge, selected implicitly by context, are necessary to form any interpretation of anything. We notice only the tip of the iceberg – the words – and we attribute all the rest to common sense.” Though mental spaces and blending theory often focus on cognitive frames providing “relatively abstract and static habitualized contextual-background information” (Langlotz 2008: 349), frames pertaining to other kinds of background knowledge (social, generic, etc.) can also be considered to enrich conceptualizations and contribute to a coherent interpretation of what is going on. They can also be creatively combined and mapped onto each other. In such cases, the cognizer accordingly relies on knowledge about how, when and by whom an activity or genre is performed, when to use a particular style or register, what pragmatic strategy to choose, and so on, when construing a rich mental representation reflecting their understanding of what is currently going on in discourse (Section 3.3). Genres or

activity types, styles or registers, and pragmatic strategies indicate specific situations, social roles and ways in which the relationship between the interlocutors is modeled. This kind of knowledge is also active, at least in the background, and definitely available for further meaning construction. As already alluded to, mental spaces or blending theory is in principle also able to consider multimodal input relying on non-linguistic displays (e.g., bodily movements or activities, graphic elements) (e.g., Cienki 2008; Langlotz 2008, 2013; Williams 2008). Its applications, however, have so far not been very elaborate, particularly when it comes to those non-linguistic displays relevant in face-to-face interaction (e.g., gesture). The book-length study by Ehmer (2011), which discusses gestural display, constitutes an exception in this regard. Nonetheless, it is indisputable that multimodal input, co-text and context (including wider social and cultural background knowledge) contribute to the network of variously interconnected mental spaces visualizing what is going on in a cognizer’s mind and displaying their local online understanding of discourse (Oakley and Coulson 2008: 28; Ehmer 2011: 47). In contrast, the status of mental spaces is problematic. In the search for a framework for the analysis of cognition in interaction, the interconnected network of mental spaces triggered by discourse should ideally represent the interlocutors’ joint cognition. Yet, Hougaard and Oakley (2008: 23) rightly doubt whether reconstructed mental spaces and blends really deal with social cognition, wondering “whose conceptual operations are being modeled in a mental spaces analysis?” As analysts’ reconstructions, the networks of mental spaces or blends are based on the researcher’s introspection (Hougaard 2005).
From a cognitive linguistic point of view, they aim to display what is going on in an individual mind or, at best, in the mind of an “ideal speaker-hearer” (Hougaard and Oakley 2008: 23). Neither mental space nor blending theory, however, inherently accounts for intersubjective, shared cognition. Another, closely related point connects back to the framework’s potential to deal with the temporal character of (conversational) discourse. The emergent nature of discourse is generally considered in terms of a proliferating network of variously interconnected mental spaces, which unfolds as discourse proceeds. Diagrams such as those reproduced in Figures 5 and 6, however, instead give “an atemporal representation of an understanding that evolves incrementally” (Oakley and Coulson 2008: 47). Again, the reconstructed network of mental spaces is an ex post construct, mainly relying on the analyst’s introspection (Ehmer 2011: 48). In a way, the analysis seems to adopt a product perspective rather than a process one, thus adding to “blending’s post hoc problem” (Hougaard 2005: 1658). Whereas the model’s weak potential for visualizing the sequential or online character of cognition in interaction seems to be a matter of representation, the other demurrals



mentioned – that is, how to account for shared interactive cognition and how to empirically justify a reconstruction initially based on the analyst’s introspection – remain problematic. These two shortcomings, however, can be regarded as minor and do not render the analytic perspective sketched out in the foregoing inviable when it comes to tracing cognition in face-to-face interaction. On the contrary, talk-in-interaction provides data which (i) give access to interlocutors’ understanding of what is going on in discourse, (ii) show how interlocutors jointly construe and modify meaning as discourse proceeds, and (iii) comprise a multitude of multimodal cues which contribute to this process. Consequently, if mental space and blending theory are to be turned into a comprehensive framework for the analysis of (creative) cognition in interaction, their methodology needs to be complemented in such a way as to put the analysis on an empirical basis by taking advantage of the data which give access to interlocutors’ understanding. The next subsection shows how ideas from discourse and notably conversation analysis help cope with these challenges and how they can amend a cognitive analysis by putting it on firmer ground than solely the analyst’s introspection.

4.3.1 Mental spaces and blends in interaction

The main issue with cognitive discourse-semantic frameworks (e.g., Langacker 2001, 2013b; Fauconnier and Turner 2003) is their autonomous mind-view which “locates sense-making in the mental capacities of the individual” (Langlotz 2010: 178): that is, in the mind of an individual cognizer. In talk-in-interaction, however, we are dealing with distributed social cognition, also termed shared, coordinated or intersubjective cognition. Cognitivists try to live with this by working on the (idealistic) assumption that all cognizers share the same conceptual operations and construal mechanisms, and thus arrive at the same conceptualization (e.g., Langacker 2013b).
Accordingly, it comes as no surprise that analyst’s introspection is the method of choice. This may also reflect the common practice of analyzing (frequently decontextualized) samples from (mostly) written discourse. If the recipient(s) is/are remote and silent, the only way to access meaning is by speculating about what is going on in the mind of an individual (ideal) cognizer. Consequently, shared (or social) cognition has never formed the focus of cognitivists’ interest. In talk-in-interaction, however, any utterance receives an immediate uptake which reveals how it has been understood by the co-present recipient(s). Production and perception of talk take place simultaneously, and interlocutors jointly negotiate meaning in interaction. In conversation, therefore, we are dealing with “situated social cognition” (Langlotz 2010: 167). In other words, “while the

cognizing individual constitutes the medium of meaning-generation, the actual sense-making process is managed through the coordinated social interaction of two (or more) cognizers” (2010: 178). Several scholars (e.g., Hougaard 2005, 2008; G. Hougaard 2008; Ehmer 2011; Oakley and Coulson 2008) working within cognitive frameworks such as mental space and blending theory have made important suggestions which help to bridge the “cognition versus interaction gap” in modeling “the construction of interpersonal meaning (…) as a situated social process that is cognitively mediated by cognizing individuals” (Langlotz 2010: 178). In this context, it is important to stress that social cognition does not cover solely the mutually shared background knowledge shaping and enriching mental representations. For it necessarily includes the way in which context (ranging from situation and text to culture) is jointly accomplished and thus enables intersubjectivity and allows for coordinated action (Ehmer 2011: 47–48). A concept of social cognition must therefore also account for multimodal resources which contribute to, and facilitate, a shared understanding of what is going on in discourse (cf. Langlotz 2008, 2013). This subsection discusses how to “bridge the gap between cognitive/mentalist and social interactionist models of language” (Langlotz 2008: 356). It suggests that this can be achieved by combining ideas from discourse and conversation analysis with the cognitive approaches of mental space and blending theory to develop a framework for the analysis of humorous cognition in interaction. It mainly pursues two questions.

– How can we put on an empirical basis networks of mental spaces which are reconstructed by analysts and which claim to externalize a mutually shared mental representation?
– How can we account, in a sufficiently sophisticated manner, for the various impacts of the multimodal input triggering the reconstructed network?
The first question touches upon issues which can hardly be resolved in their entirety. The solutions remain idealistic, at least in some sense. The assumption of a shared mental representation is problematic for several reasons, as spelled out by Ehmer (2011: 49). Interlocutors’ position, for example, may hinder visual and/or auditory perception of what is going on (2011: 49). Furthermore, their foci of attention need not coincide since one may be distracted by a side sequence or another parallel incident (2011: 49). Moreover, Ehmer points to the generally indexical nature of talk-in-interaction, where much information is merely indicated. Discrepancies in understanding may thus derive from interlocutors being unable to access the same background knowledge (cf. Gumperz 1982). Ehmer (2011: 50) also rightly mentions the possibility that interlocutors may set up (and modify) mental spaces without disclosing and verbalizing what they think.



Rich backstage cognition in the sense of Fauconnier takes place predominantly in the black box of the human brain and remains largely unexternalized. Face-to-face interaction, however, provides a unique data source offering access to interlocutors’ understanding of what is going on. Obviously, this may not be the only way to interpret what is going on, but it is definitely the only empirically verifiable interpretation. As a result, interaction data are the only kind permitting a method of interpretation other than introspection (G. Hougaard 2008). Not surprisingly, Hougaard (2005, 2008), Cienki (2008: 236–237) and Ehmer (2011: 50–55) promote a methodological combination of conversation analysis with mental space and blending theory which provides “interactional evidence” for a “cognitive description” (Hougaard 2005: 1672). Conversation analysis relies essentially on methodological procedures which reveal and trace members’ methods in making sense of what is going on. A turn’s meaning is consistently determined by the interlocutors’ displayed understanding of it (next-turn-proof procedure). This is obviously just one possible understanding, though it is the only one jointly accomplished in interaction and provable from the data. Conversation analysis thus helps reduce the number of possible interpretations. In other words, while the reconstruction method characterizing mental space and blending theory generates “interactionally motivated hypotheses in the shape of possible runs of meaning construction processes” (Hougaard 2005: 1679), conversation analysis helps to “qualify and constrain ‘cognitive accounts’ on the basis of emic, micro-sociological analyses” (Hougaard 2008: 180). Conversation analysis relies on several procedures and mechanisms to which interlocutors themselves orient and which deliver insight into how these negotiate meaning.
Adopting a members’ perspective, a preceding or supplementing conversation analysis helps to reveal the intersubjectively relevant course of meaning construction and lends empirical support to a mental space or blending analysis. In this context, overt displays in which recipients immediately show how they understand a previous turn play a crucial role, as already stressed. Of course, interlocutors may also withhold reactions. Yet this does not necessarily doom the method to failure. For lack of overt reaction often testifies to another interpretive procedure employed by the recipient(s) which is referred to as ‘let it pass’. Though rooted in phenomenology, it makes a general appeal to common sense (Cicourel 1973). ‘Let it pass’ is based on the assumption that interlocutors may refrain from any overt displays of misunderstanding or indication of trouble, on the assumption that upcoming talk will bring about clarification or that redundancy will solve the problem as talk progresses. It is clearly not as reliable as conversation analysis’s orientation to overt displays, but it justifies the inclusion of less collaborative chunks of discourse containing the phenomenon under investigation,

since upcoming discourse helps in verifying the analysis. Ehmer (2011: 55) tacitly (i.e., without mentioning the ‘let it pass’ principle) follows this lead and includes fictional scenarios predominantly fleshed out by a single speaker in his mental-spaces analysis of imagination in discourse. Similarly, Hougaard (2008) analyzes summaries of preceding talk which are not rejected, commented on or reacted to in any other way as unproblematic, positing that the wrap-up offered is tacitly accepted and thus shared. Nonetheless, without interlocutors’ ratifying or elaborating turns, analysis is necessarily strongly influenced by the analyst’s subjective interpretation. Other interactional and organizational resources familiar from conversation analysis can be easily employed in order to validate and constrain a cognitive analysis, because they provide insight into how interlocutors jointly negotiate meaning. Schegloff (1991), for example, argues that repair sequences working on misunderstandings are an organizational structure designed to accomplish socially shared cognition; they allow for tracing how interlocutors adjust their understanding, and thus describe a sequential mechanism in which intersubjectivity is achieved. Hougaard again offers a mental space analysis of formulations in which the conversation analytic account informs the cognitive analysis in terms of “turn packing utterances as a type of conceptual compression” (Hougaard 2008: 197). From a conversation-analytic point of view, formulations represent candidate understandings of what participants have said earlier that are offered for ratification (Heritage and Watson 1979). As non-neutral summaries of preceding turns, they suggest merely a “particular interpretation” of the material “packed up” this way (Hougaard 2008: 194; italics in original). Accordingly, recipients either tacitly accept the offered upshot (and ‘let it pass’) or provide an alternative version.
Hence, formulations represent a practice by which interlocutors negotiate what they have been talking about. By virtue of their summarizing potential, Hougaard (2008: 200) refers to them as “turn-packing utterances” which, in cognitive terms, provide a compression of the variously interconnected mental spaces evoked by the turns summed up. Consequently, they testify to the “construction of a shared interactional memory chunk which may come about because of the interactional activity” (Hougaard 2008: 203). A (tacitly or explicitly) ratified formulation comprising the gist of several previous turns and mentally represented by a compressed network of mental spaces thus contains what is in the interlocutors’ ‘corporate’ mind at a given moment in discourse. Ehmer (2011: 52–54) stresses the capacity of projection, which is active on several levels of conversation and which can also be used to validate a cognitive analysis. On several levels of interaction, interlocutors orient to projections, which help them to anticipate what can happen next. Projection is active within turns, within sequences and with regard to larger and more complex formats (Selting



2000; Auer 2005). Within adjacency pairs, a specific first pair part raises expectations of a particular second pair part to come. A question, for example, makes an answer conditionally relevant. “Bigger projects” such as jokes or narratives are preceded by prefaces or pre-sequences in which an announcement or invitation elicits a ratification which licenses and projects the actual telling (Sacks 1992). Projection is also active within turns: that is, between turn constructional units (TCUs). Selting (2000) shows that some TCUs end in a transition relevance place, while others do not. These latter thus project turn continuation. In particular, Lerner (1991) demonstrates how interlocutors, when completing or supplementing each other’s turns, orient to and exploit the syntactic structures of TCUs by delivering an obligatory or optional constituent. Specifically, he shows how interlocutors orient to syntax in co-constructing turns. An inchoate syntactic structure projects a particular continuation, which can thus be used either to hold the turn or to take it, whether collaboratively or competitively (Auer 2005; Helasvuo 2004; Grenoble 2008, 2013; Lerner 2002). In other words, a syntactic construction projects, but also delimits, continuations, which can be realized either by the same speaker or by an interlocutor. Whichever option is realized, when processing utterances-in-progress interlocutors orient to the established projection (Lerner 1991). Thus, projections can serve as prompts to set up a mental space and simultaneously specify the conceptual structure of that space. The segmentation of turns into TCUs, and an interlocutor’s orientation to syntactic projections when processing unfolding utterances, are utilized by Hougaard (2005: 1658) in his analysis of “micro-development blending” to assist in combating “blending’s post hoc-problem”.
TCU by TCU, and even constituent by constituent, he traces gradually unfolding networks of mental spaces that are variously interconnected or blended, and thus shows how mental spaces are evoked and/or modified by successive units. For this purpose, “[u]tterances are divided into fragments, and corresponding stages in the meaning construction / interpretation process are described sequentially in mental space terms” (Hougaard 2005: 1660). This allows for an analysis of online meaning construction in interaction as it gradually evolves in very small steps. Mental spaces, and networks thereof, are meant to visualize what is in interlocutors’ minds at a given point in discourse. Variously interconnected networks of mental spaces also display how information in discourse is partitioned, and how interlocutors parse discourse when processing utterances-in-progress. Hougaard (2005) assumes that units such as TCUs and fragments of them set up and modify mental spaces. By contrast, in order to grasp what is within interlocutors’ focus of attention at a given moment, Oakley and Coulson (2008) or Langacker (2013b: 154) resort to the intonation unit (IU) (Chafe 1994). Though not stemming from conversation analysis proper, this concept provides empirical arguments in favor of a particular cognitive reconstruction

and accounts for the inherently temporal character of oral discourse and talk-in-interaction. Chafe (1994: 71–74) argues that an IU represents information which is active (i.e., in the focus of consciousness), as compared to semi-active, albeit still accessible information from preceding IUs. In other words, an IU represents an idea, and “each such idea is active, or occupies a focus of consciousness, for only a brief time, each being replaced by another idea at roughly one-or-two second intervals” (Chafe 1994: 66). Research into spoken language and talk-in-interaction shows that the units which emerge in discourse are defined by multiple criteria of syntactic, prosodic or semantic-pragmatic nature (e.g., Auer 2010; Kibrik and Podlesskaja 2006, 2009). Chafe’s (1994: 53–70) IU is such a unit. Initially conceived of as an idea unit (Chafe 1980), it is an essentially cognitive unit prototypically characterized by a distinct intonation contour, set apart by boundary signals (filled/unfilled pauses, creaky voice, etc.) and syntactically realized by a clause. In cognitive terms, such a substantive IU verbalizes “the idea of an event or state” (Chafe 1994: 66). Chafe’s concept is adopted by Kibrik and Podlesskaja (2009: 55–64), who coin the label elementary/basic discourse unit (ėlementarnaja diskursivnaja edinica) and attest to similar syntactic and prosodic features characterizing and setting apart such units in Russian oral discourse. They (2009: 56) also point to an early concept from Russian linguistics, the sintagma, used by Ščerba (⁵1955: 87–88) to stress the correlation between prosodically shaped, semantic discourse units and recipients’ focus of consciousness (“Fonetičeskoe edinstvo, vyražajuščee edinoe smyslovoe celoe v processe reči-mysli i moguščee sostojat’ kak iz odnoj ritmičeskoj gruppy, tak i iz celogo rjada ich, ja nazyvaju sintagmoj” [‘A phonetic unity which expresses a single semantic whole in the process of speech-thought, and which may consist either of one rhythmic group or of a whole series of them, I call a syntagma’]). This turns the IU (cf.
sintagma, ėlementarnaja diskursivnaja edinica) into an apt candidate for the unit or amount of information in an interlocutor’s mind at any given moment in discourse. It can be utilized as an indicator of when to set up a new mental space comprising a discrete scenario in the flow of discourse. Oakley and Coulson even argue that IUs are interactionally relevant and claim that interlocutors mutually orient to them in parsing discourse and partitioning information.

These units guide meaning construction because they possess prosodic instructions for understanding what information is prominent in the speaker’s consciousness at any given moment. Hearers then unpack that information according to the prosodic guidelines of the perceived speech. In our theory, then, an IU helps discourse participants to construct mental spaces and mental space networks that are sufficiently similar in their semantic and pragmatic facets to facilitate interaction. (Oakley and Coulson 2008: 33)

These authors adopt Chafe’s concept in its entirety and distinguish substantive IUs, which represent a distinct scenario or idea, from regulatory IUs (e.g., discourse



markers, backchannel behavior, continuers) and fragmentary IUs (i.e., truncated utterances), which serve discourse-structuring and organizational purposes. Their analysis of an extract from an interview, however, shows that “there is no one-to-one mapping between material in an IU and material in a single mental space” (Oakley and Coulson 2008: 34). In other words, upcoming substantive IUs need not necessarily set up a new mental space but may merely modify or elaborate on one previously established. This is in line with observations by Hopper and Thompson (1980) on the low information density of conversational discourse. They show, for example, that referents in discourse are often maintained through several clausal units, and that the introduction of new concepts is often delayed in favor of maintaining an idea already introduced. Nonetheless, a perceptible “terminal contour may signal the end of the structuring of the reference space and allow the listener to anticipate the need to activate structure from a novel domain” (Oakley and Coulson 2008: 38). In sum, the units into which talk-in-interaction is segmented can, at best, serve as clues that can be utilized to validate an analysis. The end of one unit and the start of another can trigger the build-up of a new mental space or signal that a previously established mental space is to be modified. But it may also merely maintain an existing mental space. Whatever the case, such a unit represents the amount of information which is consciously on stage at a given moment in discourse. In face-to-face interaction, the range of resources which can launch a mental space is wider still. Apart from linguistic constructions, or structures such as the IU, the TCU or still smaller units which are verbally realized and which are characterized by a prosodically and syntactically projected course and closure, it includes iconic or pointing gestures (e.g., Ehmer 2011).
Müller (2008), for example, shows how speakers verbalizing metaphors in their talk perform gestures which iconically represent or allude to the source domain of the metaphor. She adduces this as evidence that the metaphoric mapping in question remains active in thought, something that distinguishes conventional from ‘dead’ metaphors. Mental spaces may also be triggered implicitly, for example by the situation. One mental space evoked by the speech situation comprises the interlocutors, their real-life roles in the current encounter, and the goals or purposes of the ongoing interaction. As a mental representation of the interlocutors and their roles in the here-and-now, this space can be compared to Fauconnier’s (1997) base space, which it functionally resembles in serving as a point of departure for further meaning construction. At the same time, it is distinct in being rooted in the here-and-now speech situation. Ehmer (2011: 178) therefore labels this mental space reality space. Oakley and Coulson reinterpret this specifically structured base space typical of face-to-face interaction in terms of grounding (Langacker 2001; Clark 1996):

Grounding allows the theory of mental spaces to consider explicitly how situational knowledge contributes to the understanding and management of discourse. Grounding involves specifying (1) the discourse participants and their roles, (2) the rhetorical situation that serves as the immediate local context for the current communicative act, (3) the situational and (4) argumentative relevance of the mental space network. (Oakley and Coulson 2008: 30)

Conceptual knowledge evoked by grounding can also contribute to blending, or be exploited to achieve it. This is the case when interlocutors evoke discourse types or scenarios different from the one defined by the current speech situation. Ehmer (2011), for example, shows how interlocutors jointly imagine and stage virtual scenarios in interaction, during which they assume roles other than their real-life ones (cf. Sinha 2005). The mental spaces evoked, interconnected and blended in this context can be regarded as discourse spaces suggesting different roles and relationships between the participants in the scenarios the spaces contain (Langlotz 2008: 355). Mental spaces, accordingly, can comprise various kinds of conceptual structure. They can pertain to knowledge about discourse or activity types, about genres or social-communicative activities and about norms governing language use or politeness. Accordingly, such spaces can specify interlocutors’ understanding of the activity they are involved in, their relationships with other participants or objects, and so on. Consequently, resources such as pragmatic patterns (e.g., terms and formulae of address, greeting conventions), genre patterns, styles, registers or varieties, and quotes introducing interdiscursivity can also be utilized in order to evoke mental spaces in face-to-face interaction (Langlotz 2008, 2013; Fuji 2008). From an interactionist point of view, such resources (e.g., forms of address, styles) are taken to be contextualization cues indicative of a particular context (e.g., a situation of use or activity) and used by interlocutors to establish this context actively. Langlotz (2008, 2013), who seeks to reconcile cognitivist and interactionist approaches, therefore suggests re-interpreting them as space builders. 
As “heavily overlearned form-content associations that are structurally coupled with the [idealized cognitive models] of conventionalized social-communicative activities”, contextualization cues have the capacity to evoke a particular “usage-context in actual processing” (Langlotz 2008: 361). Usage-context here refers to the interactive frame providing knowledge about the speech event and thus giving conceptual structure to the corresponding mental space. Thus, strictly speaking, Langlotz draws no distinction between frames and mental spaces in reinterpreting contextualization cues in terms of space builders. Using the example of an extract from computer-mediated communication among students, he shows how contextualization cues, such as greeting conventions and quotes from literature, are employed by the users to creatively determine a leader within their study group. Specifically, the use of greeting conventions and forms of address different from those habitually used among the



students evokes mental spaces representing distinct discourse spaces; that is, mental spaces comprising a different kind of situation and thus suggesting a different framing of the activity. Conceptual structure from these spaces (pertaining, e.g., to the students’ roles and relationships) is subsequently blended with structures from the ‘reality’ space (cf. base space or grounding) comprising the students’ real-life roles within their study group. In Langlotz’s example, a potential team leader who addresses his fellow students with Hi groupies, for instance, blends his actual role with that of a rock star. When the same student quotes the well-known phrase One ring to rule them all from Tolkien’s Lord of the Rings, he blends his role as team leader with that of the all-powerful Lord of the Rings. “In this sense, contextualization cues can be re-interpreted as mental-space builders” (Langlotz 2008: 361). A further benefit of Langlotz’s suggestion derives undoubtedly from another characteristic feature of contextualization cues: namely, that interlocutors mutually orient to them since they are conventionalized within the given speech community. As long as a cue is sufficiently salient, and thus perceived by the interlocutors, it is likely to activate the corresponding background knowledge. As “conventional devices or signals that are directly connected to regularities in recurrent joint social activities” (e.g., greeting conventions) contextualization cues function as “mentally-stored conventional coordination devices” (Langlotz 2008: 358; italics in original). Thus in a sense, Langlotz confirms the cognitive roots of contextualization and contextualization theory, illustrating how cues matter in an occasionally “complex process of cognitive contextualization” (2008: 351; italics in original).
At the same time, his re-interpretation of contextualization cues as space builders is problematic, albeit in a rather subtle way that concerns its implementation in mental space theory more than its practical applicability. To be specific, the resources he lists (e.g., formulae of address, quotes) do more than merely prompt recipients to set up a new mental space and to specify its domain of reference. They also activate background knowledge about particular discourse types, social and communicative activities and situations of use. In doing so, they immediately specify the conceptual structure of the newly evoked mental space. By virtue of this capacity, they should be referred to as frame-setting devices rather than as space builders in the narrow sense (see above on Fauconnier’s understanding of space builders). This specific understanding traces back to an ongoing trend observable in research within the framework of blending theory, namely to abandon or neglect the distinction between mental spaces and frames. Irrespective of these subtleties, however, both understandings harmonize with the original outline of contextualization theory (Gumperz 1982). Summing up, in face-to-face interaction several multimodal resources (ranging from linguistic structures and gesture to discursive or genre patterns, pragmatic conventions, styles, varieties and registers or quotes) evoke mental spaces.

In this process, the conceptual structure and background knowledge enriching the scenario mentally represented, and often also evoked by these resources, rely on various discursively relevant frames. Mental spaces can recruit encyclopedic knowledge as it is stored and provided by cognitive frames. They can also comprise a “discourse space” structured by “knowledge about different discursive contexts with implied role models” (Langlotz 2008: 355) including social and behavioral norms. “In sum, frames give structure to the interpretation of events at semantic, interactional/relational and even socio-cultural levels” (Fuji 2008: 186), and elements and structure from within such variously framed mental spaces can be exploited for (creative) meaning construction in interaction. The methodological enhancements outlined above do indeed help in coping with many of the remaining challenges to be met when mental space and blending theory is applied to the analysis of interaction. Conversation analysis can help in constraining and reducing the number of possible cognitive reconstructions. Furthermore, it provides methods for tracing shared cognition since it reveals what interlocutors orient themselves to, and how meaning is jointly negotiated (Hougaard 2005, 2008; G. Hougaard 2008). Nevertheless, what is going on in the mind of an individual cognizer may be richer, although it need not be externalized in discourse. A fully fledged blending analysis conducted from a strictly cognitive point of view aims to display this rich cognition. In the following analyses we will therefore point also to the possibly richer backstage cognition visualized in the reconstructed conceptual integration networks, even if only a specific element of structure or mapping is ratified by the interlocutors. Discourse analysis mainly contributes to enhancing the approach in two ways. 
First, the idea of linguistically shaped cognitive units emerging in discourse, such as the IU (Chafe 1994) or the elementary discourse unit (Kibrik and Podlesskaja 2006, 2009), defines information which is in cognizers’ focus of attention at a given moment, and thus determines possible units of conceptualization (Oakley and Hougaard 2008; Langacker 2013b). Second, combined with contextualization theory, discourse analysis ultimately suggests a broad perspective on resources that may impact on conceptualization in evoking mental spaces and activating the frames structuring them, and thus contributing to a coherent interpretation (Langlotz 2008, 2013; Fuji 2008). In line with Langlotz, we further assume that particularly salient cues guiding interlocutors’ interpretation may be consulted in order to ensure that the respective mental representation is mutually shared. Against this background, the previously mentioned problem of mental space and blending theory in accounting adequately for the ephemeral character of conversation and cognition in interaction can be regarded as minor. The standard means of depicting blends or related networks of mental spaces represents them as products and deprives them of their sequential and temporal character. Coulson

(2005) and Ehmer (2011), however, find means of representation which account for the process character of blending in conversation, and which visualize how a network of variously interconnected mental spaces unfolds as discourse proceeds. Coulson (2005) generally avoids the typical representation of spaces in the diagram form seen in Figures 5 and 6. Instead, she arranges those mental spaces subsequently evoked, as well as blends combining elements and structure, into tables, in which the vertical order of the table elements reflects the sequential order in which they are evoked and blended in discourse. In contrast, Ehmer (2011) continues to represent mental spaces and related networks in the form of diagrams that are variously interconnected and arranged (see Figure 7).

[Diagram labels: t (time axis); Input space 1 (currently evolving in conversation); Input space 2; Blend; frame (×3)]

Figure 7.  Unfolding of a network of mental spaces and blends in conversation (Source: Ehmer (2011: 179), slightly adapted)

In order to display how a network or blend unfolds sequentially, Ehmer arranges the mental spaces along a temporal axis from left to right. Input spaces positioned on the timeline are explicitly evoked in conversation. Positioned farther from the timeline are input spaces from which blends recruit elements and structure not explicitly triggered in the preceding discourse but activated in the background. Frames supplying conceptual structure are located directly below the mental spaces to which they relate. This arrangement enables improved visualization of frames previously activated that remain active in the background, and thus available for further meaning construction. We adopt and adjust Ehmer’s mode of representation for our analysis of humorous cognition in interaction, which aims at revealing the conceptual configurations characterizing humor-specific processing and thus

typical of joking sequences. Consequently, any assembly of variously linked mental spaces should be read from left to right, in order to trace cognition in interaction and online meaning construction as it unfolds sequentially in conversation.

4.3.2 Conceptual configurations characterizing humorous cognition in interaction

In Chapter 3, we adopted an essentially cognitive perspective on humor, which characterizes the phenomenon in terms of its disruptive processing responsible for a measurable rise in processing effort. Accordingly, humor was conceived of as a form of cognitive creativity triggering rich and unusual inferences, and generating new meanings and senses originating from norm-breaching. A mental space and blending approach informed by discourse and conversation analysis will now allow us to analyze more thoroughly how interlocutors process humorous sequences as these evolve in interaction. It further allows for a comprehensive and more thorough description of humor-specific creative meaning construction, which can be depicted in the form of conceptual configurations arranged specifically for the purpose. Networks of mental spaces establishing mappings and connections between the individual spaces visualize how new meanings arise. The frames providing contextual knowledge and background information to the mental spaces facilitate rich inferences. They supply additional material, which can be utilized immediately, but also resorted to subsequently for further meaning construction, while the frames remain in the background. The following subsections will reveal which configurations and blends of mental spaces (including the frames supplementing the conceptual structure) account for the humor-specific (i.e., disruptive, effort-demanding and rich inferential) processing of joking sequences.
The analysis will consider not only the relationships between the frames involved (e.g., with regard to their evaluative, emotive and social load or their inherent logic). It will also cover the humor-specific ways in which elements and their relationships are connected and mapped onto each other across mental spaces. Moreover, the analysis will illustrate how these interconnected or blended networks of mental spaces evolve sequentially, and how interlocutors may take the chance to launch a joking sequence and start a humorous digression from otherwise serious talk. Such situations include both those in which one speaker predominantly initiates and accomplishes a (possibly also longer) humorous sequence, and others in which the sequence is developed collaboratively by several interlocutors. In the first scenario, the jester elaborates on a lattice of mental spaces they have initiated themselves by incrementally extending their turn. In the second, several interlocutors jointly unfold a network of mental spaces by adding humorous turns and TCUs to one another’s contributions.
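For readers who find a schematic helpful, the left-to-right arrangement of mental spaces and frames adopted from Ehmer (2011) can be pictured as a simple data structure. The following Python fragment is only an illustrative sketch and not part of the analytic apparatus: the class names (MentalSpace, Network), the field names and the labels "input 1", "frame A", etc. are hypothetical and chosen purely for exposition. It assumes that each space has a position on the temporal axis, that a space is either explicitly evoked in talk or merely recruited in the background, and that a frame, once activated, remains available.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class MentalSpace:
    label: str
    frame: str                      # frame supplying the conceptual structure
    evoked_at: int                  # position on the temporal axis (e.g., turn index)
    explicit: bool = True           # False: recruited in the background only
    speaker: Optional[str] = None   # who evoked the space in talk, if anyone


@dataclass
class Network:
    spaces: List[MentalSpace] = field(default_factory=list)
    links: List[Tuple[str, str, str]] = field(default_factory=list)

    def evoke(self, space: MentalSpace) -> None:
        self.spaces.append(space)

    def link(self, source: str, target: str, basis: str) -> None:
        # basis names the (hypothetical) relation licensing the connection
        self.links.append((source, target, basis))

    def timeline(self) -> List[str]:
        """Explicitly evoked spaces, read from left to right."""
        explicit = [s for s in self.spaces if s.explicit]
        return [s.label for s in sorted(explicit, key=lambda s: s.evoked_at)]

    def active_frames(self, at: int) -> List[str]:
        """Frames activated up to position `at`; they stay available afterwards."""
        seen: List[str] = []
        for s in sorted(self.spaces, key=lambda s: s.evoked_at):
            if s.evoked_at <= at and s.frame not in seen:
                seen.append(s.frame)
        return seen


# Hypothetical two-speaker sequence: A evokes a space, B adds a blend
net = Network()
net.evoke(MentalSpace("input 1", "frame A", evoked_at=1, speaker="A"))
net.evoke(MentalSpace("input 2", "frame B", evoked_at=2, explicit=False))
net.evoke(MentalSpace("blend", "frame C", evoked_at=3, speaker="B"))
net.link("input 1", "blend", "projection")
net.link("input 2", "blend", "projection")

print(net.timeline())        # ['input 1', 'blend']
print(net.active_frames(2))  # ['frame A', 'frame B']
```

In this sketch a single-speaker sequence and a collaboratively developed one differ only in the speaker values attached to the explicitly evoked spaces; the network itself unfolds in the same way.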



Lastly, the analysis will shed light on the varying impact of the (mostly linguistic) input on meaning construction in humorous discourse.² The way in which humorous talk is designed affects, guides and facilitates humorous cognition in different ways. In Chapter 2, the performance peculiarities setting apart humorous utterances from surrounding serious discourse and indicating a humorous framing were all subsumed under the label of contextualization cues. The new analytic perspective provides a framework for their re-analysis, and for a re-assessment of their impact on humorous cognition, by distinguishing space builders from other resources facilitating the humor-specific disruptive processing. In the remainder of the chapter, we will present the three basic conceptual configurations eliciting single or multiple humor-specific cognitive contrasts in conversation, using the example of variously patterned forms of conversational humor from the database. The cognitive contrast forcing recipients, or rather co-present interlocutors, to conduct a semantic re-analysis can be caused by the subsequent evocation of differently framed mental spaces (4.3.2.1), by the one-off or repeated blending of disparate mental spaces (4.3.2.2) or by a dissolution of entrenched blends that reintroduces a contrast between one of the input spaces involved and the cognitively prominent blended space (4.3.2.3).

4.3.2.1 Frame-shifting

Humorous effects in conversation can result from single or multiple frame-shifts which may be initiated spontaneously whenever interlocutors notice the opportunity to start a humorous digression from a serious, or at least non-humorous, task currently being addressed. These frame-shifts in a sense resemble those characterizing punch line-humor from jokes (e.g., Coulson 2001; Norrick 1986). In jokes the frame-shift makes available two alternative interpretations which differ in their cognitive accessibility (e.g., Forabosco 1992; Yus 2003, 2016; Giora 1991).
These interpretations offer alternative ways of contextualizing a given element, event, scenario, activity or other phenomenon that are in some way related. Humor theories refer to this relation, which warrants the frame-shift, in terms of (incongruity) resolution. In this context, it is important to stress that recipients of jokes are willing to accept as resolution mechanisms logical relations which might not hold beyond the realm of humor (Attardo, Hempelmann and Di Maio 2002; Ziv 1984; Hofstadter and Gabora 1989; Dynel 2009a).

2.  Since this research is predominantly based on audio-data (supplemented by already transcribed data, cf. 1.3), the visual channel is omitted from the analysis. Thielemann (2015), however, includes an example from videotaped media discourse which allows for a multimodal perspective in the analysis of conversational joking.

Conversational joking sequences analyzable in terms of frame-shifting are characterized by a preference for specific resolution mechanisms, as can be seen in Examples (5) and (6) below. The resolution mechanism that licenses the frame-shift can be associated with the connector linking or mapping elements across mental spaces, while the humorous sequences evoke a chain of ‘weakly’ linked scenarios structured and enriched by disparate frames. Interlocutors usually pick an element (a) from the currently evoked mental space and connect it to an element from a differently framed mental space (a’). In the case of frame-shifting in spontaneous conversational joking, the links connecting the two spaces often rely exclusively on remote associations (based, e.g., on polysemy, homonymy, formal cohesive ties, sound- or form-based similarity). Consequently, the two neighboring scenic representations resulting from the subsequent evocation of a second frame often share very few or no other links. The emerging discourse is thus characterized by a conceptual coherence different from that typical of serious (or at least non-humorous) discourse. The mental spaces subsequently evoked are connected in ways that make it difficult to perceive them as alternative interpretations of a scenario. In other words, the conversational sequences under consideration are essentially characterized by conceptual discontinuity, and are merely held together by associative coherence. The topical dimension prominent or focused on in non-humorous discourse fades into the background, and form-based aspects licensing the connection come to the fore. This also testifies to interlocutors’ increasing orientation to play and especially to play with form (cf. Subsection 2.2.1). The conversational episode reproduced in (5) illustrates how interlocutors jointly elaborate a humorous sequence which forces recipients to shift between frames linked merely by association and otherwise disparate.
The reconstructed network of mental spaces (Figure  8) visualizes how the sequential evocation of differently framed mental spaces unfolds in a manner characteristic of this humor-specific conceptual configuration. Since the triggering sequences are collaboratively constructed, the reconstruction can further be supposed to externalize shared cognition. The episode is spontaneously initiated by Petja and associatively interlaced in talk about a settlement on the outskirts of the interlocutors’ city, Kharkov. In doing so, Petja exploits two potential sources which allow for the creation of associative links and thus facilitate frame-shifting. These are the names of the settlement (Šiškovka) and of a friend who lives there (Medved). Both have the potential to introduce ambiguity which can be and is utilized for humorous purposes in (5). The friend’s name, although here pronounced with a non-palatalized final sound, is close to Russian medved’ ‘bear’. If it is a nickname, something not clear from the data, it can be taken to be figuratively or metaphorically motivated, which would further justify the claim that the concept of a bear is evoked at least in

the background. The settlement’s name shares the lexical root {šišk-} with šiška ‘cone’ and similarly gives rise to associations with the corresponding concept due to a “sub-lexical morphological level match” (Attardo, Hempelmann and Di Maio 2002: 15). In conjunction, the names make possible a humorous digression from the preceding serious discourse on the social environment of the settlement. For they facilitate a frame-shift from [urban settlements as habitat for people] to [forests as habitat for animals] based on a resolution mechanism which can generally be categorized as cratylism (Attardo, Hempelmann and Di Maio 2002: 18). In other words, whereas names as such are usually meaningless, both Medved and Šiškovka can be reinterpreted here as telling names capable of evoking the second frame.

(5) RuMaCharkov2008 (00:48:44), Petja (P), Viktor (V), Saša (S), Nadja (N)

1   P:  ži živja na TROEščine,
        li- living in Trojeshchina
2       koNEčno-
        sure
3       i nevychodja iz RAJOna-=
        and not leaving the neighbourhood
4       =on budet tebe rasskazyvat’ čto
        he is telling you that
5       (do fiGA gopnikov);
        (tons of white trash)
6       ty sprosi (.) čeloveka kotoryj živёt na SALtorke
        but ask a person who lives in Saltorka
7       naprimer.
        for example
8       (-)
9   V:  nu vot- u menja medVED est’.
        but well I have Medved (bear) there
10  S:  on na SALtorke živёt?
        does he live in Saltorka?
11  V:  nu- on on na ŠIŠkovke živёt;=
        well he he lives in Shishkovka
12      =ėto ešČЁ chuže.
        that is even worse
13      (-)
14      vot;
        yeah
15      i on do sich POR nachoditsja v [polnom
        and until now he has been in [full

16–36 (speakers, in order of appearance: P, S, N, P, S, V, P, V, P, V, N, V)
[hahahaha
.hh .hh . Shishkovka
( )
na ŠIŠkovke?
in Shishkovka
voa:,
.hh . in a bear’s den
hehehehehe
da:=
yes
=. bear and cones
.hh . in the cone forest
.hhe .hhe
oa:.=
=vot ON-
well he