Meaning, Mind and Communication: Explorations in Cognitive Semiotics 9783631657041, 3631657048

This volume constitutes the first anthology of texts in cognitive semiotics – the new transdisciplinary study of meaning


English Pages 482 [484] Year 2016





Table of contents :
1. Introduction: Cognitive Semiotics Comes of Age • Jordan Zlatev, Göran Sonesson & Piotr Konderak
Part I. Metatheoretical Perspectives
2. Mutual Enlightenment: A Phenomenological Interpretation of the Embodied Simulation Hypothesis • Carlos A. Pérez
3. A Cognitive Semiotic Perspective on the Nature and Limitations of Concepts and Conceptual Frameworks • Joel Parthemore
4. Agency in Biosemiotics and Enactivism • Morten Tønnessen
5. Design Semiotics with an Agentive Approach: An Alternative to Current Semiotic Analysis of Artifacts • Juan Carlos Mendoza Collazos
6. Towards a Cognitive Semiotics of Science: The Case of Physical Chemistry • Michael May, Karen Skriver & Gert Dandanell
Part II. Semiotic Development and Evolution
7. Meaning, Consciousness, and the Onset of Language • Lorraine McCune
8. The “Symbol Grounding Problem” Reinterpreted from the Perspective of Language Acquisition • Mutsumi Imai
9. Key Roles of Found Symbolic Objects in Hominin Physical and Cultural Evolution: The Found Symbol Hypothesis • Keith E. Nelson
10. Mindreading, Mind-travelling and the Proto-discursive Origins of Language • Francesco Ferretti & Ines Adornetti
11. From Conversation to Language: An Evolutionary Sensory-Motor Account • Alessandra Chiera
12. Protolanguage as Formulaic Communication • Serena Nicchiarelli
Part III. Meaning across Media, Modes and Modalities
13. From Mimesis to Meaning: A Systematics of Gestural Mimesis for Concrete and Abstract Referential Gestures • Cornelia Müller
14. Verbal and Nonverbal Markers of Impolite Behavior in Russian Language and Non-Verbal Code • Grigory Kreydlin & Lidia Khesed
15. Symmetrical Reasoning in Language and Culture: On Ritual Knots and Embodied Cognition • Jamin Pelkey
16. Cognitive Semiotics of Mental Disorders, with Focus on Hallucinations • Štěpán Pudlák
17. Pictorial Responses and Projected Realities: On an Elicitation Procedure and its Ramifications • Gisela Bruche-Schulz
18. Iconic Properties are Lost when Translating Visual Graphics to Text for Accessibility • Peter Coppin, Ambrose Li & Michael Carnevale
Part IV: Language, Blends and Metaphors
19. Deonstemic Modals in Legal Discourse: The Cognitive Semiotics of Layered Actions • Todd Oakley
20. Commutation of Cognitive Source Domains as a Semiotic Tool for Paradigmatic Analysis • Vlado Sušac
21. The Emergence of Multimodal Metaphors in Brazilian Political-electoral Debates • Maíra Avelar
22. “A Light in the Darkness”: Making Sense of Spatial and Lightness Perception • Marco Bagli
23. Performative Metaphor in Cultural Practices • Katherine O’Doherty Jensen
24. Objects and Nouns: An Account of the Vision-Language Interface • Francesco-Alessio Ursini
25. Linguistic Theory in the Framework of Cognitive Semiotics: The Role of Semio-Syntax • Per Aage Brandt
References
List of Contributors
Index: Subject


Jordan Zlatev / Göran Sonesson / Piotr Konderak (eds.)

Meaning, Mind and Communication Explorations in Cognitive Semiotics


Bibliographic Information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data is available in the internet at

Library of Congress Cataloging-in-Publication Data
Names: Zlatev, Jordan, editor. | Sonesson, Göran, 1951- editor. | Konderak, Piotr, editor.
Title: Meaning, mind and communication : explorations in cognitive semiotics / Jordan Zlatev, Goran Sonesson, Piotr Konderak (eds.).
Description: Frankfurt am Main ; New York : Peter Lang, [2017]
Identifiers: LCCN 2016042909 | ISBN 9783631657041 (Print) | ISBN 9783653049480 (E-PDF) | ISBN 9783631701300 (EPUB) | ISBN 9783631701317 (MOBI)
Subjects: LCSH: Semiotics. | Cognition. | Human evolution--Psychological aspects. | Evolution--Psychological aspects. | Language and culture.
Classification: LCC P99 .M3978 2017 | DDC 302.2--dc23
LC record available at

Cover Images: © Piotr Konderak

ISBN 978-3-631-65704-1 (Print)
E-ISBN 978-3-653-04948-0 (E-PDF)
E-ISBN 978-3-631-70130-0 (EPUB)
E-ISBN 978-3-631-70131-7 (MOBI)
DOI 10.3726/978-3-653-04948-0

© Peter Lang GmbH Internationaler Verlag der Wissenschaften Frankfurt am Main 2016
All rights reserved.
Peter Lang Edition is an Imprint of Peter Lang GmbH.
Peter Lang – Frankfurt am Main ∙ Bern ∙ Bruxelles ∙ New York ∙ Oxford ∙ Warszawa ∙ Wien

All parts of this publication are protected by copyright. Any utilisation outside the strict limits of the copyright law, without the permission of the publisher, is forbidden and liable to prosecution. This applies in particular to reproductions, translations, microfilming, and storage and processing in electronic retrieval systems.

This publication has been peer reviewed.


Jordan Zlatev, Göran Sonesson & Piotr Konderak

Chapter 1 Introduction: Cognitive Semiotics Comes of Age

0. Introduction

A decade after the appearance of the journal Cognitive Semiotics, and following the establishment of the International Association for Cognitive Semiotics (IACS) in 2013 and two successful international conferences, in Lund in 2014 and Lublin in 2016, cognitive semiotics can hardly be characterized as an “emerging” discipline anymore. It is already here. Yet we who are involved with this field are often pressed to answer: what is it really? The chapters in this volume, which originate from the IACS conference in Lund, are a kind of extensional answer to this question: this is what cognitive semiotics is like! The reader may note that the editors of the volume and authors of this introduction – who have served as “founding fathers” or, less pretentiously, as the main organizers of the first two international conferences – do not participate as authors in this anthology. Rather, we have taken up the role of “midwives”, for a proper mix of metaphors, of this important project, which in effect is the first published volume of explorations in cognitive semiotics.

This discipline of cognitive semiotics can be described as the study of meaning, mind and communication, as reflected by the title of this book. Admittedly, this is a broad object of study, but as this volume aims to show, there is both an internal coherence to, and an important mission for, cognitive semiotics: to help in “mending the gap between science and the humanities”, in the words of Stephen Jay Gould (2003). Crucially, this is to be done through mutual respect and methodological understanding, rather than a reductionist takeover from the side of natural science, or a postmodernist relativism from the side of the liberal arts.

In previous works, we have highlighted a number of features of the new discipline of cognitive semiotics (Sonesson 2009, 2012; Zlatev 2012, 2015). Not all of these features have to be fulfilled, but together they define a prototype kind of structure; as we will see, the chapters in this volume conform to this characterization by displaying at least the first two features, and in many cases more.

The first feature is that cognitive semiotics focuses on the study of meaning, and does so through a trans-disciplinary (implying tighter contact than “interdisciplinary”) combination of methods and concepts from at least semiotics, cognitive science, and linguistics. As these fields are interdisciplinary themselves, this opens the doors to a number of related fields such as anthropology (e.g. Pelkey, this volume), graphic reasoning (Coppin, this volume), cognitive development (McCune, this volume), language acquisition (Imai, this volume), and political discourse analysis (e.g. Avelar, this volume; Sušac, this volume).

The second feature is what we have referred to as the conceptual-empirical loop (Zlatev 2012) or, more adequately, spiral (Zlatev 2015). One particular variant of this is what we have called “the dialectics of phenomenology and experiments” (Sonesson 2013). The point is that cognitive semiotics takes a keen interest in the analysis of concepts, not unlike philosophy, starting with proverbially ambiguous notions such as meaning, sign, language, culture, and consciousness. But to do so adequately, it is necessary to plunge into empirical studies where these phenomena are studied through scientific (in the broad sense) methods, and then re-emerge, reinvigorated, on the conceptual side. Conversely, empirical investigations, for example on the difficult issues concerning how language emerges in evolution and development (discussed in Part II), inevitably become involved with conceptual issues, since “what”-questions constrain the answers to “how and why”-questions. This also has the advantage of making it possible to specify psychological experimental paradigms in order to answer specifically semiotic questions, and of integrating experimental results into discussions within semiotics, phenomenology, and other variants of philosophy (see Sonesson 2013; Pérez, this volume).

A third feature, more or less explicitly manifested in cognitive semiotic research, is to proceed in the study of the phenomenon in question – be it children’s symbolic play, the use of metaphors, symmetrical reasoning, hallucination etc. – by combining methods using a first-person perspective (of the analyst or the participant) and a third-person perspective of detached observation and experimentation (allowing quantification in many cases), united by a second-person perspective acknowledging that every form of scientific exploration is an act of communication, between experimenter and participant, analyst and informant, author and reader etc. The chapters by Parthemore, Bruche-Schulz and Pudlák discuss this procedure overtly, but it is present in many of the other chapters, especially in those influenced by phenomenology, which is in itself important enough to be given as a fourth feature of cognitive semiotics.

Characterized as the “scientific study of consciousness”, or more adequately as “the study of human experience and of the ways things present themselves to us in and through such experience” (Sokolowski 2000: 2), phenomenology is the philosophical tradition inaugurated by Edmund Husserl over a century ago, and continued by Maurice Merleau-Ponty and others up to the present day (e.g. Zahavi and Gallagher 2008). In cognitive semiotics it is especially useful for the study of subjectivity (e.g. perception, affect, sense-making) from the “inside” in a manner that is intersubjective, providing results that are, in a sense, objective (see Sonesson 2015a). It is also a very useful tool for analysing the difference between signs and other meanings (see Sonesson 1989, 2012). Further, as phenomenologists have been consistently anti-reductionist while open to science, concepts and methods from phenomenology have been very productive for researchers searching for “mutual enlightenment” (Gallagher 1997; see Pérez, this volume) between lived experience and detached experimentation. The chapters by Pérez, Parthemore, Mendoza Collazos, McCune, Pelkey, and Bruche-Schulz include explicit acknowledgments of this, but traces can be observed in many of the other chapters, such as in the analysis of gestures (Müller) or language (Brandt).

The last feature is that of meaning dynamism, or the emphasis on the study of the various forms of meaning, not as objects or structures, but as processes, on various time scales, from those of evolution and development (see Part II), to those of “the human level” of social interaction (the analysis of gestures, “multimodal metaphors” and language use in Parts III and IV), to the micro-level of “time consciousness” in retention and protention processes (in the chapters by Pérez and Pelkey).

So what kinds of topics has cognitive semiotics been applied to so far? If we look at some representative previous publications we can list: the relation between attention and rhetoric (Oakley 2001), dynamic linguistic semantics (Brandt 2004), the evolution of consciousness (Donald 2001), children’s gestures (Andrén 2010) and pictorial competence (Lenninger 2012), the theoretical integration of semiotics and phenomenology (Sonesson 1989, 2009, 2015a), intersubjectivity and mimesis in evolution (Zlatev 2008a) and ontogenetic development (Zlatev 2013, 2014). Most of these topics are also addressed in the chapters of this volume, which we have grouped into four parts, with each part and contribution presented in the remainder of this introductory chapter, followed by some brief concluding words.

1. Part I: Meta-theoretical perspectives

In the first part the authors all address issues concerning the relation between cognitive semiotics and its surrounding fields: phenomenology and embodied cognition (Pérez), cognitive science (Parthemore), biosemiotics and enactivism (Tønnessen), agentive semiotics (Mendoza Collazos) and the semiotics of science (May, Skriver and Dandanell). In doing so, they address important disciplinary questions, such as what, if anything, makes cognitive semiotics “special”. Further, these chapters delve into central conceptual issues concerning the nature of meaning (both subjective and intersubjective, both perceptual and categorical etc.), agency, representation, metaphor, and others – which also appear in the rest of the volume.

In chapter 2, Carlos A. Pérez takes on the task of elucidating the “mutual enlightenment” – in the words of Gallagher (1997) and Thompson (2007) – between phenomenology and embodied cognitive science. This is done by assessing Bergen’s (2012) embodied simulation hypothesis, according to which we understand language by performing (mostly unconscious) “mental simulations”. Bergen’s work is meant for a broader audience, but in effect it popularises many of the key ideas of leading cognitive linguists (e.g. Lakoff and Johnson 2009) and cognitive scientists (e.g. Barsalou 2009) on the embodiment of meaning. Pérez reviews a simple experiment meant to demonstrate such unconscious mental simulation. Participants first read sentences concerning everyday objects like nails and walls, and then, when given pictures of such objects, were shown to be faster in responding when the objects were presented in more common orientations than in less common ones. Pérez shows the ambiguity of the notion of “embodied simulation” from the perspective of phenomenology and proceeds to provide a more adequate analysis.1 Combining the analysis of language and speech acts of the early Husserl (where act, object and meaning are distinguished, and the intersubjective nature of the latter is emphasized) and the notions of presentification and protention of Husserl’s later genetic phenomenology, the author provides a convincing elucidation of the relation between linguistic and perceptual meaning, explaining the results of such experiments in a more coherent manner than by appealing to embodied simulation. This contribution is particularly important as it shows how a phenomenological cognitive semiotics differs from, and may offer advantages over, more mainstream accounts.

1 A similar critique of “embodied simulation” is provided by Blomberg and Zlatev (2014).

Joel Parthemore begins chapter 3 by providing a brief history of cognitive science, with some of its trials and tribulations, including the (in)famous “computer metaphor” of mind. On this basis, he proposes that cognitive science “needs periodically to re-invent itself – in light of the present age, in keeping with contemporary insights and discoveries” – and suggests that cognitive semiotics, with its considerable ontological and epistemological openness, can be seen as one such “reinvention”. Parthemore then proceeds to summarize his own theory of concepts (Parthemore 2011a, 2013), Unified Conceptual Space Theory (UCST), which goes beyond a number of classical dichotomies, such as that between know-what/know-how, and shows advantages compared to what he refers to as the “knowledge representation view” in traditional cognitive science. In a number of respects, Parthemore proposes that UCST naturally falls within the (broad) framework of cognitive semiotics, illustrating this claim with respect to some of the typical features of the new discipline. Through its “grounding in semiotics”, the theory demonstrates how concepts are both entwined with language and pull apart from it. Through its roots in phenomenology, the theory takes an informed view on the nature of “representations”. Through its focus on meaning as a dynamic process, it shows how concepts’ relative stability belies an underlying dynamics, and through its resonance with enactive philosophy, it shows how concepts impose seemingly sharp boundaries onto underlying continuities. Finally, Parthemore applies this perspective to debates concerning the nature of metaphor, and argues that the crucial distinction is not between literal and metaphorical meanings, but between meanings that call attention to themselves and those that do not (in agreement with e.g. chapter 21 by Avelar).

In chapter 4, Morten Tønnessen compares the notions of agency in one particular school of semiotics, biosemiotics, and in the enactive approach in cognitive science (e.g. Varela, Thompson and Rosch 1991). This is highly relevant, as biosemiotics partly overlaps with cognitive semiotics in seeking to bridge the gap between biology and meaning (see Sonesson & Zlatev 2009), while enactivism has been highly influential for cognitive semiotics (see Zlatev 2012; Parthemore, this volume; Mendoza Collazos, this volume). Tønnessen shows that neither biosemiotics nor enactivism is an internally unified framework with a single coherent conceptual system, and that both harbor different conceptions of subjectivity and agency. The notion of agency is intrinsically related to the conception of action, and the understanding of what constitutes action varies. Thanks to the variation in views, however, specific similarities between different approaches in biosemiotics and enactivism can be identified. Tønnessen discovers a clear affinity between the Uexküllian approach in biosemiotics and the “mind in life” approach of Thompson (2007). In particular, both theories find agency and thus (minimal) meaning in even the simplest organisms such as bacteria. At the same time, both are capable of distinguishing between this and the experience of agency, which presupposes self-awareness, and is only found in “higher” organisms (cf. Zlatev 2009a, 2009b). As far as cognitive semiotics is concerned, Tønnessen suggests that, since biosemiotics and enactivism are two of the most innovative and integrative contemporary approaches to the nature of life and living systems, cognitive semiotics, as a field devoted to the study of cognition, should look for its foundations in them. Or, as he formulates his point rather bracingly: “cognitive semiotics should therefore be conceived of as a subfield of biosemiotics”.
A different conclusion, however, might be that cognitive semiotics, being an even newer and (hopefully) even more integrative approach than either biosemiotics or enactivism, may learn from the strengths and weaknesses of both, as well as from other theoretical resources such as phenomenology and cognitive linguistics, in order to avoid using notions such as agency, semiosis, and meaning too broadly and, when necessary, to introduce clear subdivisions and “semiotic thresholds”.

In chapter 5, Juan Carlos Mendoza Collazos presents an overview of a new approach to signification, known as agentive semiotics, that “links achievements of logic, phenomenology and cognitive sciences” (Niño 2015). As the author states, this clearly aligns this approach with cognitive semiotics. Similarly, agentive semiotics is influenced by enactivism, though it takes a more specific stand on the notion of agency, defining an agent as a being that is animate, situated, and capable of paying attention. This means that artifacts, for example, have only derived agency, that is, a kind of agency that has been assigned by agents proper, which in the case of artifacts means designers.2 The bulk of the chapter applies the theory of agentive semiotics precisely to the semiotics of artifact design. Unlike traditional semiotic design analysis, the agentive approach implies focus not on the artifacts themselves, but on acts of production and response. Artifacts have significance (a network of potential responses) and signification (“the actual response an agent activates”), thus paralleling (one version of) the distinction between semantics and pragmatics. The chapter makes a strong case for the application of the theoretical corpus of agentive semiotics to design practice, allowing new insights into the actions and experiences of designers and users. Notions such as agenda, per-agenda, agentive scene, etc. are clearly explained and illustrated, showing how theoretical and “applied” cognitive semiotics can intermix.

2 This is reminiscent of the notion of remote intentionality, introduced by Sonesson (1999), seemingly for the opposite reason: to argue against the denial that any agency is involved when a camera is rigged up in front of the finishing line of a horse race to be triggered automatically when the horses cross the line.

In chapter 6, Michael May, Karen Skriver and Gert Dandanell argue for the urgent need for a “semiotics of science” that integrates “different literacy issues in science education and the interwoven conceptual issues in the history and didactics of science”. The authors point out that such a project was prophesied by Charles Morris nearly a century ago, but that despite some efforts from the direction of Hallidayan “social semiotics”, it was never realized. The authors then propose that by integrating semiotics, cognitive science, and cognitive linguistics, as well as first-, second- and third-person perspectives and methods, cognitive semiotics would be more suited than alternatives to establish such a discipline. Pointing to common themes in the history and didactics of physical chemistry, they illustrate the need for “semiotic literacy” with reference to several concrete cases occasioned by the use of such semiotic means as mathematical formulas and diagrams as representations of chemical reactions. In one case study, they show that a schema becomes misleading when interpreted as an image, in the Peircean sense of a bundle of qualities, rather than as a Peircean diagram, that is, as a rendering of relationships.
A second example concerns graphs showing enzymes “speeding up” chemical reactions, the misunderstanding of which is explained through the notion of force dynamics (Talmy 2000). The conclusion is, rather in the manner of the previous chapter, that a cognitive semiotics of science should not be a purely philosophical project, but an “empirical investigation of the individual sciences from the point of view of meaning, signification and experience”.

2. Part II: Semiotic development and evolution

The second part groups together chapters that deal with the emergence of sign use and language either in ontogeny or in evolution. While all of the authors apart from Nelson frame their explorations as primarily concerning the emergence of language, each investigation focuses on specific cognitive-semiotic preconditions for language to emerge: mental representation (McCune), abductive inference (Imai), symbolism (Nelson), navigation and mental time travel (Ferretti and Adornetti), affordance perception (Chiera) and goal-directed actions and pantomime (Nicchiarelli). Recurring themes include parallels between phylogeny and ontogeny, embodied meaning, and the stage-like progression from action to signification and language.

In chapter 7, Lorraine McCune, a prominent Piagetian developmental scholar, summarizes her comprehensive theoretical framework for the emergence of reference during the first two years of life (McCune 2008), relating it to concepts in cognitive semiotics. Consistent with Sonesson (2007), for example, McCune holds that “all experience of meaning (even sensation) can be considered semiotic (i.e. meaningful), but only some special kinds of meaning are signs”. Also, in line with both Piaget (1962) and many scholars in cognitive semiotics, McCune regards mental representation as the capacity for “contemplation beyond the here and now”, and thus crucially dependent on consciousness. McCune argues that semiotic development starts with relatively holistic experiences of perception and movement, and proceeds largely through processes of differentiation, between self and other, and between expression and content. McCune documents five levels of representational play, aligning these with stages in Piagetian theory, and thus charts the transition from pre-sign meanings to mental representation and sign use. Yet human language is predominantly vocal, and McCune proposes a theory for “embodying symbols in the vocal medium” consistent with the approach of Werner and Kaplan (1963). She extends this conception by hypothesizing that laryngeal vocalizations (“grunts”) are of particular importance, as they undergo a transition from automatic responses to “personal symbols”. The author reviews a study showing how such grunts co-occur with effort, attention and communicative use. The third crucial prerequisite for the transition to referential speech, according to McCune, consists of vocal motor schemes, deriving from babbling.

In chapter 8, the well-known cognitive scientist Mutsumi Imai addresses the so-called “symbol grounding problem” (Harnad 1990), i.e. how to connect meaningless “symbols” to experience, from the perspective of language acquisition by children.
While the problem was initially presented as a challenge for “symbolic AI”, the author argues that it is no less damaging for “connectionist” models, as children cannot acquire a lexicon by associating perceptual experiences with word-tokens. Rather, a realistic model of language development needs to account for the following cognitive-semiotic capacities and achievements: (a) understanding that words refer to concepts, (b) finding the referent in a particular context, (c) finding the semantic domain to which the word belongs, (d) generalizing the meaning of the word in the context of previous linguistic experiences and (e) acquiring “an adult-like representation of the domain as a whole”. Reviewing and combining much of her own research on these topics, Imai first shows that sound symbolism, i.e. non-arbitrary mappings between expression and content, may serve as a powerful “bootstrapping mechanism” for (a) and (b). Still, “children have to learn to infer the meaning of words when the help from sound symbolism is not available, and they have to do that in relation to other words they have already learned”, i.e. proceed to (c-e). To account for this, the author outlines a model grounded in empirical studies, in which fast-mapping (grasping rough word meanings) and slow-mapping (further adjustments of meanings of words in relation to other words) processes combine in constructing lexical systems. Finally, Imai proposes a cognitive function that may be crucial for these achievements: a uniquely human capacity for a “bidirectional reasoning bias”, related to Peircean abductive reasoning. In sum, without explicitly referring to cognitive semiotics, Imai’s chapter is an excellent illustration of the potential of combining methods and concepts from cognitive science and semiotics.

Chapter 9 by Keith Nelson turns the focus to evolution, but like earlier chapters uses developmental data (along with evidence from archaeology) to argue for the found symbol hypothesis. According to this proposal, the first “full symbols” (i.e. signs used intentionally to refer to categories of objects and acts) in hominin evolution were neither words nor gestures but rather natural objects such as the Makapansgat Pebble, dated to over 2 million years ago and resembling a human face (cf. Bednarik 1998). The precondition for this to arise in our ancestors (but not in any other species), according to Nelson, was a “tricky convergence” of cognitive capacities such as pattern detection-comparison processes and attention regulation, and social processes like cooperation and negotiation. Nelson provides two kinds of support for this hypothesis. The first is based on comparative neuroscience, where the author compares early hominids, non-human primates, and human infants, and concludes that using “found symbols” requires cognitive skills within the reach of early Homo species, but not of non-human primates. The second concerns evidence for symbol learning by children in the second year of life, such as that referred to by McCune and Imai, crucially involving not only words and gestures but also material objects (such as pictures) as symbols. The hypothesis is attractive, and could potentially provide a missing piece to other cognitive-semiotic models of symbol origins (e.g. Donald 1991; Deacon 1997; Tomasello 2008), as “communication with found symbols would not have required any tool use or complex planning … adequate for complex communication sequences in any sign, gesture, speech, art or multimodal forms”.
The hypothesis also helps explain certain apparent “anomalies” in the archaeological record (e.g. why brain size increased while tools remained more or less the same among the early Homo), and serves as the basis for predictions for future findings.

In chapter 10, Francesco Ferretti and Ines Adornetti argue, in line with the conceptual-empirical spiral (see the introduction), that the topic of language origins requires re-thinking the nature of language. They align their approach with the “action-oriented perspectives” in cognitive science (e.g. Clark 1997) and with cognitive pragmatics (Sperber and Wilson 1996). Thus, they reject syntax-centric perspectives in linguistics and the related “code model of communication”, also dominant in classical, structural semiotics. What both lack, the authors argue, is the ability to account for discourse-level coherence, which cannot be reduced to formal devices of text cohesion, as shown by neurolinguistic research, where coherence may be preserved without syntax (e.g. aphasia) and vice versa (e.g. schizophrenia and Alzheimer’s disease). On this basis, the authors propose that “language has a proto-discursive origin and the selection pressures that drive the evolution of language meet the needs of pragmatic concerns before grammatical concerns”, in apparent agreement with the influential relevance theory of Sperber and Wilson (1996), where the speaker gives “clues” as to his/her intention, and the audience draws inferences on the basis of “mindreading”. Such a model of language evolution has actually been proposed and argued for in some detail by Scott-Phillips (2014). However, Ferretti and Adornetti contend that “models such as these suffer from a serious difficulty: the exclusive attention paid to the speaker’s intentions leads one to exclude the temporal dimension from discourse”. In the remainder of the chapter, they propose a remedy by outlining a model where priority is given to events, and discourse is organized as a temporal chain of events. Such a revision requires additional cognitive abilities underlying discourse comprehension and communicative skills, which the authors link to navigation in space and time, and specifically to the proposal by Corballis (2011) that the uniquely human capacity for “mental time travel” underlies the ability to construct and interpret narratives. Research on this hypothesis is in progress, e.g. concerning schizophrenia, possibly supporting the authors’ conclusion that “the metaphor of navigation we have assumed as a key explanation of human narrative capacities is more than a simple metaphor”.

In chapter 11, Alessandra Chiera continues the theme from the previous chapter: the development of a “holistic model of language evolution”. Like Ferretti and Adornetti, she rejects bottom-up, computational approaches to language in linguistics and cognitive science. The motivations for the critique and the proposed solution are, however, somewhat different. The main problem of “modular” approaches in the spirit of Fodor (1983), according to Chiera, is that they imply rigid boundaries between semantics (“code”) and pragmatics (“context”, “inference”). The reason this is problematic is that empirical studies in language comprehension and production strongly suggest that contextual, top-down information influences language use in real time: i.e.
that “speakers build a cumulative representation of the global message conveyed by the conversation and that such a representation immediately constrains production and comprehension processes.” Consequently, the author seeks to ground the evolution of language primarily in the contingencies of online conversation (more than the coherence of narrative). In the next step, she asks where human capacities for perceiving socially relevant contextual and conversational features derive from, and proposes that these are essentially not a matter of “mind reading”, but of affordance perception: “in order to extract relevant contextual information, individuals rely on the same processes by which they interact with the physical environment, that is action and perception processes”. Such processes allow what she calls pragmatic alignment, in which communicative actions are coordinated, and successful conversations emerge. The advantage of such an account is that it allows successful communication “despite fragmentary and ambiguous coding”, while at the same time providing the frame for the gradual evolution of a conventional-normative system, in a dialectical manner. The author explicitly applies the conceptual-empirical spiral of cognitive semiotics in several cycles, offering original and productive definitions: “In this framework, conversation can be defined as a dynamic interactive exchange, situated within a jointly determined – and constantly evolving – semiotic system.”



The final chapter in this section (chapter 12) by Serena Nicchiarelli adds one more brick to the (putative) bridge between the linguistic and cognitive skills of the common ape-human ancestor and our own: a blend of practical, goal-directed actions and communicative acts that Nicchiarelli terms “communic-action”. The chapter starts with a discussion of the concept of protolanguage (a hypothetical semiotic system that helps span the gap between ape communication and human language), outlining the two major models that have been proposed: a lexical, “grammarless” protolanguage (e.g. Bickerton 2010) and a holistic protolanguage with utterances expressing complex, but non-compositional, communicative acts (e.g. Arbib 2005a). Nicchiarelli defends the latter view, presenting her specific communic-action hypothesis, according to which protolanguage evolved from the combination of (conventionalized) pantomimes and “vocal gestures”. This is consistent with Arbib’s (2012) evolutionary model, but the author differs by emphasizing the pragmatic dimension of such formulaic utterances as performatives (Austin 1962) and facilitators of social interaction and online conversations. Evidence for the hypothesis is twofold. First, building on work by Wray (2002), Nicchiarelli argues that the communic-action strategy serves today as a “living fossil” from the holistic protolanguage stage. Second, the chapter reviews neurolinguistic evidence that formulaic language, along with discourse and non-verbal communication, is dependent on the basal ganglia and subcortical prefrontal circuitry in the right hemisphere – areas responsible for activities such as planning, evaluating cues, and acting in an environment. As a capstone of the chapter, Nicchiarelli points out an intriguing dissociation: it is precisely these areas that are damaged and these capacities, including formulaic language use, that are impaired in Parkinson’s disease.
On the other hand, formulaic language and motor control are preserved in Alzheimer’s disease, while higher functions such as coherence and navigation are damaged, as noted in chapter 10. If so, then it appears that the cognitive-semiotic capacity targeted by this chapter (perhaps along with the alignment mechanisms studied by Chiera) is more basic, and evolutionarily more ancient, than the one explored by Ferretti and Adornetti. This is just one indication of how the ideas in the chapters of this section could possibly be combined in a more synthetic account.

3. Part III: Meaning across media, modes and modalities

The third part combines chapters that deal with “cross-modality” in a broad sense. In fact, this characterization can be given to most studies in cognitive semiotics, as a central tenet of the field is that the study of meaning cannot be restricted to a single semiotic resource (language, gestures, pictures), mode of presentation (perception, imagination, material artifacts) or sensory modality (hearing, vision, touch…). This idea is sometimes expressed by the term “multimodality” (e.g. Kress 2010), but we will refrain from using it as it conflates the distinctions just made. In any case, the chapters in this part deal explicitly with crossing semiotic borders: between gesture and mimesis (Müller), language and gestures (Kreydlin and Khesed), symmetry in visual patterns, embodied experience and language (Pelkey), indexicality in hallucinations and perception (Pudlák), felt qualities and gestalts across language and pictures (Bruche-Schulz) and iconicity across vision and hearing (Coppin et al.).

In chapter 13, Cornelia Müller, one of the leading scholars in gesture studies, asks how the fundamental human capacity for bodily mimesis (Donald 1991; Zlatev 2008b) shapes the structure and meaning of a central semiotic resource: gestures. She begins by reviewing the interest in mimesis in cognitive semiotics from a diachronic perspective, where it is applied in attempting to explain the ontogeny and phylogeny of sign use in general, and language in particular. She agrees with such approaches, but argues that the role of mimesis is greater than this, as it also has a key synchronic dimension since it “motivates the (embodied) semantics of referential gestures”, or what she terms gestural mimesis. (Müller thus echoes the argument from chapter 12 by Nicchiarelli concerning “living fossils”.) In her analysis, Müller takes as her point of departure Aristotle’s classical concept of mimesis and his claim that human beings are fundamentally mimetic beings. Following his three categories of mimesis in the arts, Müller analyses (a) bodily articulators as media, (b) concrete or abstract actions or entities displayed as objects, and (c) ways of displaying as modes of mimesis. Generalizing over previous analyses, she distinguishes between two fundamental modes: Acting (“enacting bodily actions and movements”) and Representing (“becoming bodily sculptures”), noting that only the first corresponds to mimetic schemas (Zlatev 2005). With multiple examples, she then shows that the relation between objects and modes is complex and flexible, allowing both entities and actions, either concrete or abstract, to be displayed in either mode.
For that reason, she is critical toward the distinction between “iconic” and “metaphoric” gestures (McNeill 1992), as both can use the same mode, and differ only with respect to their object. Müller’s analysis is fruitful not only theoretically for cognitive semiotics, distinguishing between processes of “sign formation (motivation)” and “local meaning”, but also for practical gesture analysis, as it “offers intersubjectively accountable descriptions of the particular form of conceptualization in… everyday gestures”.

In chapter 14, Grigory Kreydlin and Lidia Khesed take a very different approach to the analysis of gestures as “non-verbal sign units” embedded in a “corporal semiotic code”, which they argue is closely related to the semiotic code of language. Specifically, the authors focus on the category of impoliteness, as reflected in the social norms, linguistic categorization and gestures in modern Russian culture. The methodology used is informed by the dictionary-oriented Moscow Semantic School, with important works such as that of Apresyan (1995), extended to gesture analysis by Kreydlin (2002). Accordingly, the analysis begins with semantic analysis of the Russian adjectives грубый (grúbyj) ‘rude’, дерзкий (dérzkij) ‘impudent’ and хамский (hámskij) ‘caddish’, which are claimed to structure the semantic field of impoliteness. The authors emphasize features such as (a) culture-specificity, (b) historicity and (c) context-dependence of the categories involved.



Interestingly, however, when they proceed to the analysis of gestural signs of impoliteness, no clear correspondence with the linguistic categorization transpires. The features (a-c) are indeed illustrated, but none of the examples given appears specific to the Russian context. One group of gestures are “impolite in all contexts of their usage”, including cross-culturally familiar types such as a raised middle finger, and sticking out the tongue. A second class, whose members vary contextually and historically, such as hand-shaking and hand-kissing, is also fairly widespread, though, of course, by no means universal. The specific norms on pointing in Russian culture (“don’t point with your index figure at people”) are also well-known, and even more so (alas!) are the dominance-expressing “gestures of bosses” and gestures of male sexual harassment. Thus, without contradicting the analysis presented by the authors, interesting parallels with the previous two chapters emerge. In line with Nicchiarelli (chapter 12), the “impolite gestures” appear to function not so much as the compositional elements of language, but as embodied holophrases, and in line with Müller (chapter 13), their embodied meaning is grounded in mimesis, either enacting or representing offensive behaviors. In this view, language-specific semantics and culture-specific micro-conventions (e.g. the meaning of the OK gesture) are sedimented upon such meanings that are originally more holistic, embodied, and hence more universal.

In chapter 15, Jamin Pelkey investigates the phenomenon of symmetrical reasoning, as manifested in different semiotic resources.
In agreement with Imai (chapter 8), he argues that it is essential for language, but extends this to any full-fledged sign use, with reference to Helen Keller’s famous insight, which was “not so much an awakening to the existence of symbols as it is an awakening to the reflexive symmetrical potential of symbolic activity for modeling possible worlds.” But what is its fundamental nature and origin? Pelkey’s exploration focuses on symmetrical designs found in human material cultures around the world, showing striking resemblances between, for example, Celtic knot designs from medieval Ireland, ritual sand tracings of Vanuatu islanders in the South Pacific and “eternal knots” of Tibetan Buddhism. Even the earliest (relatively uncontested) symbolic expression from over 70,000 years ago, found in Blombos Cave, shows symmetrical criss-crossing X-figures. Semiotically, such patterns may be treated as “iconic legisigns”, i.e. diagrams, and formally described in anthropology with the help of mathematical group theory, but as Pelkey shows, such analyses tell us very little about their meaning. Nor do they help explain the riddle of their apparently universal appeal. To do so, he argues, it is necessary to regard such patterns not only as visual (or somehow derivative from language), but as bodily, in the sense of embodied phenomenology, which has shown that “we not only experience our bodies in movement but also project the feeling of our bodies onto other people, things and events”. Such ideas are reflected in much current thinking related to cognitive semiotics, the most important precursor of which, the author suggests, is M. Merleau-Ponty, who uncovered a fundamental form of (near-)symmetry between our sensed and sensing bodies, referring to it as a “chiasm” and representing it as the X-figure. Secondly, and more empirically, Pelkey reports on the development of a digital web app prototype in which participants are to observe and retrace intertwining patterns, and subsequently are tested for “cognitive benefits”. While pointing out that this is still ongoing research, Pelkey thus manages a bold foray into answering the riddle of symmetrical design and thought, thereby showing the potential of cognitive semiotics.

Štěpán Pudlák addresses in chapter 16 another, no smaller “mystery”: the nature of mental disorders, with a focus on schizophrenia and more specifically on hallucinations. He shows how these have been misrepresented in psychiatric theories either as “misinterpretations of the inner speech” or as “especially vivid images”, since such descriptions do not get at the core of their phenomenology: they are experienced as indexical (directed to their object as something contiguous in space and time) in the same manner as sensory perceptions. Using Peircean terminology, he explains that the hallucinatory nature of, for example, the utterance “There is a unicorn on the porch” does not have to do with the semantic content of the dicisign or proposition (see Stjernfelt 2014), or its mode (verbal or visual), but with the experience of indexicality: the (immediate) Object of the sign (the “utterer” of the sentence, or the “vision” of the unicorn on the porch) is indistinguishable from objects of perception in schizophrenic first-person experience. Such an analysis is obviously very close to one that can be provided by phenomenology, where the indexicality in question corresponds to the most fundamental kind of intentionality: the directedness of perception (cf. Sokolowski 2000). Accordingly, Pudlák finds resonances between his Peircean analysis and a study of mental illness by Gallagher (1997), but is critical toward the latter’s claim that hallucinations may be attributed to failure in distinguishing between self and other.
Still, to distinguish between a hallucination and a real worldly object, Pudlák uses the Peircean division between Immediate and Dynamic object: the hallucinated unicorn may only be the former (as it can be subjectively experienced) but not the latter, as it does not exist in intersubjective reality. Failures in indexicality may thus be related to failures in intersubjectivity. In any case, Pudlák’s chapter is another excellent illustration of the integrative potential of cognitive semiotics to unlock (if not yet fully open) the doors to long-standing mysteries.

In chapter 17, Gisela Bruche-Schulz analyzes the “pictorial responses” (see below) produced by student-participants from five different cultures and languages (English, Chinese, German, Russian and Turkish) when presented with a page of Saint-Exupéry’s Le Petit Prince, translated into their own language, and asked to “jot down whatever comes to mind” in the margins. All five groups produced approximately as many responses, but interestingly, the “English group” (actually Cantonese-English bilinguals living in Hong Kong) produced between 4 and 10 times more pictorial responses than the other groups. However, the author is less interested in accounting for any (cultural) differences between the groups than in finding common (experiential) structures. This is in part due to previous research (Bruche-Schulz 2014), which showed that all the responses were distributed into text segments in a comparable fashion across the five groups, with peaks and valleys that correlated with the content of the text, analyzed in terms of force gestalt patterns.



Hence, Bruche-Schulz proposes examining the pictorial responses as reflections of an intersubjective “felt reality”, inspired by the social phenomenology of Alfred Schütz. In the analysis, the author initially distinguishes between three different kinds of pictorial responses, depending on the degree of figurative detail: (a) scenes with at least some detail, (b) relatively abstract diagrams and (c) pictographs: “highly conventionalized and abstracted pictorial symbols” like circles and hearts. She presents and discusses several examples of each category. The boundaries turn out to be somewhat blurred, but (c) is, as admitted by the author, the most difficult to interpret, possibly due to the fact that abstract symbols afford less access to “the felt impact of the ‘halos’ of things”. As Bruche-Schulz was the person who conducted the study, engaging with the participants on their own turf, as well as the analyst who lives through the relevant “gestaltist patterns of experience”, her methodology can be described as a productive combination of the first-person and second-person perspectives in cognitive semiotic analysis.

Peter Coppin, Ambrose Li and Michael Carnevale start their exploration in chapter 18 with the observation that blind people are impaired when needing to access representations such as charts and graphs through linguistic descriptions. But what kind of information is “lost in translation” and how can the problem be redressed? They propose that sound would be the natural medium for communicating to blind persons, but “how could a designer identify appropriate mappings from iconic properties of visual graphics to those of sound to convey the same relations”?
In the spirit of the conceptual-empirical spiral, the authors first delve into a conceptual exploration, and begin with distinctions made by Shimojima (1999): graphic representations are presented two-dimensionally, allow the representation of relations with relations, are analogue (“dense”), and obey intrinsic constraints on processing. In contrast, linguistic representations are sequential, represent relations with symbols, are discrete and obey extrinsic constraints. The authors briefly relate these distinctions to semiotic analyses of such notions (e.g. Sonesson 1989; Stjernfelt 2000), pointing out both overlaps and a degree of terminological confusion. The challenge, as they see it, is to present a consistent model that integrates the relevant concepts with findings from cognitive science. They depart from a previous model (Coppin 2014), but extend it to make it more terminologically coherent with the semiotic literature. In brief, the main distinction of this extended model is between (a) iconic´ and (b) symbolic´ properties (using primes to indicate that their synthetic concepts are related but not identical with standard semiotic definitions). Applying these to the problem at hand, they propose that (a) correspond to features that in “both visual graphics and sound … [are] picked up by sensory receptors and processed by lower-level perceptual categories and simulators”. On the other hand, (b) build upon (a) but “cause the perceiver to have a simulation that falls under the author’s intended conceptual category”. As can be seen from these definitions, the authors rely heavily on the notion of (embodied) simulation (Barsalou 1999), criticized by Pérez (chapter 2) for being excessively broad. Still, such apparent disagreement could possibly be resolved by unpacking the authors’ use of the term with the help of phenomenology (e.g. their “low-level simulations” clearly correspond to protentions). Most importantly, the chapter answers the question of how sound could be recruited to preserve iconic properties, and proposes how their analysis can inform design.

4. Part IV: Language, blends and metaphors

The chapters in the fourth and final part all deal with “blends” and “mappings” across different kinds of meaning – and in this way are continuous with those in Part III – but they deserve to be grouped separately as they deal, more or less, with language. At the same time, each one explores to what extent the structures of language are continuous with other structures such as political institutions (Oakley), conceptual metaphors (Sušac), gestures and social background (Avelar), embodied experiences (Bagli) and non-discursive practices (Jensen). Finally, the chapters by Ursini and Brandt take up and answer – from very different theoretical perspectives – the question: what would a cognitive semiotic approach to language look like?

In chapter 19, Todd Oakley addresses a recurrent use of the English modal auxiliary verb must, which blends epistemic and deontic modality, referring to it as deonstemic modality. An example is the sentence “This tax must be unconstitutional” used by the Supreme Court of the United States (SCOTUS), in whose discourse Oakley documents extensive use of such blends, which in Searlean terms combine a mind-to-world (epistemic) and world-to-mind (deontic) direction of fit. In searching for an explanation of this phenomenon, Oakley argues that linguistic analyses, even those that emanate from cognitive linguistics (Sweetser 1999; Talmy 2000), which emphasize the link between the modalities, are lacking. The author suggests that the reason they have treated linguistic modalities as necessarily distinct is that they have not sufficiently appreciated the socially layered nature of language, and in particular its extra-cranial institutional nature.
Oakley proposes that a cognitive semiotics of institutional discourse would be better suited for addressing the puzzle, as “a proper understanding of the deonstemic modality comes sharply into focus when we examine these texts as multilayered artifacts, with each layer potentiating actions of different addressees.” To this end, he develops a descriptive six-layer model of SCOTUS discourse, building on notions from rhetoric (Bitzer 1968) and discourse analysis (Clark 1996). The first two layers are those of Decision and Rationale, stemming from the nature of the institution itself, whose deontic powers issue from the perception of its decisions being “prudential” and “reasonable”. The author points out that the model implies that language users must (deonstemic usage?) have a shared capacity for differentiating one layer of meaning from another in order to be able to appreciate the deonstemic category itself, “whose signal purpose is to blend our obligation with our reason”.



Vlado Sušac’s chapter 20 adopts a rather unusual cognitive semiotic synthesis between structural semiotics in the legacy of Barthes (1967) and conceptual metaphor theory (Lakoff and Johnson 1980) in order to investigate the possibility of replacing metaphorical concepts in political discourse with alternatives. Specifically, the aim is to apply the commutation test, where one signifier is substituted with another and the effect on signification assessed, to expressions from different “source domains” such as WAR and TRAVEL. This is challenging, as conceptual metaphors are specifically defined as “mappings across cognitive domains”, rather than across linguistic expressions. Methodologically and conceptually, Sušac resolves this with the help of work by Andrew Goatly (2007) and a database that links conceptual mappings (“root analogies”) with specific linguistic expressions in English. Sušac finds cases of “formal diversification” from the database, i.e. where more than one source domain maps to a single target domain (e.g. POLITICS IS WAR, POLITICS IS PATH), and applies these to a corpus of Croatian political discourse. One of the findings is that the use of the respective metaphor/mapping correlated with political ideology (i.e. the more right-wing party used the more “aggressive” metaphor). More importantly, he shows that the commutational replacement of source domains requires more than domain overlap alone: what is needed is a degree of “conceptual synonymy” between the alternative source domain expressions, as well as the target “lexical item”, which Sušac theorizes in terms of a shared ground (Richards 1991). For example, bashing on and keep on trucking can be commuted as they share a common aspectual meaning of CONTINUE ACTIVITY, while backlash and milestone cannot.
Having clarified the conditions for performing the commutation procedure on political metaphors, the chapter leaves it for further research to determine if this could and should be performed pro-actively as a form of “conceptual correctness”.

Chapter 21 by Maíra Avelar continues the topic from the previous two chapters – political discourse – and likewise uses it for the exploration of theoretical issues. In her case, this concerns studying metaphors not as static mappings, but as dynamic, multimodal metaphoricity, involving actual usage, socio-cultural context, speech and gesture. The descriptive model used, Multimodal Semiotic Blending, is itself an impressive cognitive semiotic blend, based on Brandt and Brandt’s (2005) enunciation-based extension of “classical” blending theory (Fauconnier and Turner 2002), combined with Kendon’s (2004) analysis of gestures and Müller and Cienki’s (2009) verbal-gestural metaphorical compounds. Using this model, the author analyses sequences from two presidential debates in Brazil, between Dilma and Serra (2010) and Dilma and Aécio (2014), focusing on the verbal and gestural resources used by the participants. In line with the dynamic, context-sensitive perspective on metaphor employed, Avelar proposes that multimodal metaphors will differ in their degree of “compression” (i.e. how much important information can be expressed in few expressions), and thus in their perceived “metaphoricalness” depending on their degree of conventionality. The limited amount of data (and degree of operationalization) does not allow her to test this as a true empirical hypothesis, but the distinction is clearly illustrated, with e.g. CONFUSING IS MAKING FOAM as non-conventional and highly compressed, and MOVING FROM X TO Y as conventional and less compressed, and can be further pursued in the future. In sum, the chapter clearly illustrates a major claim of cognitive semiotics (“our conceptual system is broader than our linguistic system”) and the tools presented contribute to an important goal: “to show in detail how each modality works, how they interact, and jointly lead to emergent metaphors”.

Chapter 22 by Marco Bagli addresses the topic of cross-domain mappings between spatial concepts (in/out), brightness (light/dark) and morality (good/bad) in a predominantly cognitive-linguistic framework, analyzing concepts in terms of image schemas and the metaphors as experiential mappings. Bagli’s main descriptive tool is conceptual blending theory (Fauconnier and Turner 2002), of which he presents a compact and useful summary in relation to conceptual metaphor theory (CMT; Lakoff and Johnson 1980). He points out that the light-good/dark-bad mapping can be explained as a (possibly) universal “primary metaphor” (Grady 1999), grounded in pan-human experiences that link darkness with danger. However, he asks, why should one kind of mapping, IN-LIGHT/OUT-DARK, be preferred to another, IN-DARK/OUT-LIGHT, when there seem to be experiences of both kinds (e.g. bright rooms, and dark caves)? To answer this question, he first investigates this particular mapping in two very different kinds of narratives: the classical Bildungsroman Demian by Hermann Hesse (1919) and The Rocky Horror Picture Show by Jim Sharman (1975), which is “iconic” for American youth culture. He points out that both “texts” (the second also in music and visual art) link LIGHT-IN-GOOD on the one hand, and OUT-DARK-BAD on the other (Rocky Horror rather ironically and subversively).
But these connections are made (and perceived) highly consciously, while cognitive linguistic analysis maintains that the relevant mappings are made above all within the “cognitive unconscious”. In an original manner, Bagli manages to address the issue by applying an experimental design from social psychology known as the Implicit Association Test, where, by measuring reaction times, “subtle forms of evaluative difference” can be shown. The results, with native English speakers of university age, showed a very high association strength between the categories of in and light, leading the author to conclude that this particular mapping is indeed the “default construal”. The extent to which such results generalize beyond Western culture remains to be shown, but Bagli proposes that this could further validate the thesis that embodied experience grounds human cognition and language. In chapter 23 Katherine O’Doherty Jensen proposes the original notion of performative metaphor, which concerns primarily neither language (as in traditional metaphor theory), nor conceptualization (as in CMT), but “non-discursive practices in which something is treated in terms of something else”. Her recurrent example is that of applause at a concert, where higher levels of applause correspond to higher levels of evaluation, and vice versa. As this example shows, and as the author repeatedly points out, a key feature of such non-discursive cultural practices is gradience rather than the (relative) discreteness typical for linguistic categorization. The dimensions that are mapped in performative metaphor are thus scalar, and the mapping itself
is best characterized in terms of structural analogy (Itkonen 2005). According to the author, performative metaphors give rise to shared gradient meanings and are reproduced when they “make sense” to social actors. Thus they have an implicit normativity and are truly “metaphors we live by”. O’Doherty Jensen then applies this notion to an illuminating analysis of a puzzle concerning food practices. There are apparently universal evaluative tendencies (e.g. meat > fruit & vegetables > cereals; alcohol > soft drinks), which are furthermore gendered: despite much cross-cultural variation, men tend to value the foods higher on the scales, and women those that are lower. The author considers both substantive metaphorical accounts (“men are (like) animals, women are like fruits”) and functional accounts based on social power and finds them lacking compared to an analysis in terms of the performative metaphor GRADES OF FOOD ARE GRADES OF PEOPLE. The analysis implies that “discernments of gender-appropriate consumption are … mediated … in a non-arbitrary fashion. In the measure such practices are recognized as appropriate, they become conventionalized, regulated by social norms and the sanctions that accrue to deviation from gendered role.” She concludes by emphasizing the difference between such non-discursive practices and meanings (which she regards as “mimetic”, using the term rather broadly) and those based on language, and, analogously to Müller (chapter 13), argues for the importance of the former not only as a “stage” in evolution and development, but as a crucial part of all concurrent human cultural sense-making. In chapter 24, Francesco-Alessio Ursini takes a more formal approach than is customary in cognitive semiotics in his investigation of systematic connections between linguistic and non-linguistic meaning.
He formulates his project as the quest for (a) “core cognitive modalities and their underlying ontologies, defined as the sets of basic categories and relations on which cognitive processes operate” and (b) “properties and relations underpinning these ontologies and their governing processes”, and focuses on commonalities between visual perception of objects and the semantics of noun phrases (NPs). An “ontology” for object perception is formulated on the basis of central sub-systems proposed in cognitive science; four categories with respective types are identified: motion, attributes, geometrical structure (shape), and quantity. In the case of NPs, four semantic categories are proposed, building on the literature in linguistic typology (i.e. the study of similarities and differences across languages): animacy/gender, qualities of the referent (e.g. mass/count nouns), shape, and number and quantity. Even from this simple listing, it appears that “the classification principles … work in parallel across the two domains”, but as Ursini points out, this parallel needs to be more formally elaborated. He proceeds to do so with the help of a formal theory of “information flow” (Barwise and Seligman 1997) that utilizes the notion of infomorphism. In a dense but clear section, the author defines the core concepts of the approach, including classifications for object types and noun phrase types, defining two symmetrical functions, from objects to noun phrases and vice versa. The final section provides an empirical application (serving as a justification) for Ursini’s infomorphic approach. First, as the theory allows for both one-to-one and
many-to-many functions, it is possible to precisely describe patterns of linguistic variation (e.g. an ambiguous object like “a lump of coins” can correspond to different types of NPs: coins vs. money), within and across languages. Second, flexibility with respect to patterns of reference, corresponding to the notion of construal in cognitive linguistics, can be accounted for. Third, and perhaps most relevant for cognitive semiotics, the symmetric functions between object categorization and nominal semantics are consistent with a dialectical relation between linguistic and perceptual categorization of objects, in agreement with findings in language acquisition (cf. Chapter 8 by Imai). In the final chapter of this section (chapter 25), and of the book as a whole, one of the pioneers of cognitive semiotics, Per Aage Brandt, turns back to one of the sources for cognitive semiotics, linguistic theory, and asks how it could be reinvigorated with the help of the new discipline. His proposal focuses on the core of linguistics: the nature of grammar. Brandt briefly considers both generative grammars in the tradition of Chomsky, and construction grammars (e.g. Goldberg 1995), which dominate cognitive linguistics, and finds both lacking: the first as they have very little to say about (sentence) meaning, and the second because they treat meaning/content and form/expression in terms of “pairings” or “mappings”, which may be appropriate for simple signs, but not for the limitless creative potential of language. Rather, Brandt’s proposed semio-syntax is conceived as a system that can link, in both directions, imagination and thought on the one hand, and linearized sequences of symbols on the other. The representations of grammar that mediate are “simulations of thought”, allowing “a relatively stable transfer” of meaning between speakers.
Admitting the vastness of the topic of re-conceptualizing grammar in semiotic terms – rather in the spirit of a recent proposal by Daniel Dor (2015) – Brandt proceeds to offer some ideas for this project, viewing it as a cognitive semiotic extension of construction grammars. One such idea is to ground grammar in enunciation, living language use. Another is to “find the organizing semantic principle of all constructions, in as many languages as possible”, for which he proposes a kind of semantic hierarchy, a “canonical sequence of operations”, that can account for the ungrammaticality of some constructions. Finally, he proposes a kind of semio-syntactic sentence tree corresponding to the “stemmas” of Tesnière’s dependency grammar and illustrates this with examples from English, Danish and French. He concludes, echoing Saussure (1916) one century later, that the principles of semio-syntax may be extended to culture as a whole as “cultural life presupposes semiotic simulations … of thoughts and concepts that circulate more or less anonymously, in order to form shared ideas”.

5. Concluding remarks

The reader who has had the patience to read through these summaries – on which we have labored extensively – should be able both to appreciate the richness of cognitive semiotics and to notice challenges and potential disagreements. For example, the notion of (mental) simulation is both criticized (chapter 2) and
utilized (chapters 18, 25). There are also differing interpretations of the notion of mimesis (chapter 13 vs. 23). A possible criticism of cognitive semiotics (and one which we have met at conferences) is that it lacks sufficient internal coherence (cf. the analysis of agency in biosemiotics in chapter 4). Our reply is, however, that such a criticism would misunderstand the difference between a discipline and a theory. Cognitive semiotics is the former and not the latter. While it would indeed (most likely) be incoherent if a single theory used both a structural and a Peircean sign concept, or a given theoretical term with different meanings, we cannot and should not expect different theories in cognitive semiotics to always agree. That would actually be stifling and “block the way of inquiry” (to quote the famous Peircean principle on what is not to be done). What we can and do expect is that disagreements are acknowledged and addressed with the necessary respect, asking questions such as: are the issues conceptual or empirical? If the former, can the concepts be calibrated? If the latter, what kind of evidence can be adduced to make an informed decision between the alternatives? In sum, cognitive semiotics has created a platform for asking such questions – with respect to what is arguably most central to our lives, and to the future of life on our planet: meaning. With this we step back, and urge the reader, and possible beginner in cognitive semiotics, to jump in and participate in the discussion!

Part I. Metatheoretical Perspectives

Carlos A. Pérez

Chapter 2
Mutual Enlightenment: A Phenomenological Interpretation of the Embodied Simulation Hypothesis

1. Introduction: phenomenology and cognitive semiotics

The relationship between phenomenology and semiotics is not new. Husserl’s influence on the Prague school and on Jakobson’s thought in particular has indeed been widely acknowledged (Holenstein 2005). However, due to the cognitive turn in semiotics, which we can trace back to the work of Sonesson (1989), as well as the phenomenological turn of enactivism and embodiment in cognitive science (Varela et al. 1991), it is particularly important to rethink this relationship. Given that cognitive semiotics and the embodied mind hypothesis share a phenomenological approach (cf. Zlatev 2012, 2015a), it seems that they share the same goals. Nonetheless, we need to be careful, since the terminology used in cognitive science is not always as clear as one may wish, and phenomenology is not always within its theoretical landscape. Varela, Thompson and Rosch’s book, The Embodied Mind (1991), re-opened the door to phenomenology in the study of mind and cognition. The book begins with the explicit recognition of the importance of taking into account the analysis of human experience in order to capture and comprehend the embodied nature of the human mind, and the situatedness of human cognition.
According to Varela et al., embodiment and enaction, the key terms that define their theoretical project, are two sides of the same coin: to be a cognitive agent is to be able to interact with and within an environment; and interaction cannot be properly understood without taking into account the role of the body and the particular ways in which it shapes cognition: By using the term embodied we mean to highlight two points: first, that cognition depends upon the kinds of experience that come from having a body with various sensorimotor capacities, and second, that these individual sensorimotor capacities are themselves embedded in a more encompassing biological, psychological, and cultural context. By using the term action we mean to emphasize once again that sensory and motor processes, perception and action, are fundamentally inseparable in lived cognition. (1991: 172, 173)

As for phenomenology, this intimate bond between organism, action and environment is presented by the authors as one of the original insights in the phenomenology of Merleau-Ponty, who is regarded as “one of the few whose work was committed to an exploration of the fundamental entre-deux between science and experience, experience and world” (1991: 15).1 For Varela, Thompson and Rosch, it is only possible to understand the embodied nature of human cognition and the situated character of the enactive engagement with the environment if one takes into account the phenomenological findings concerning the structure of human experience. And this is one of the points that differentiates their work from that of George Lakoff and Mark Johnson, who also called attention to the role of the body in the constitution of human thinking. In their most recent joint publication, Lakoff and Johnson (1999) declare that “the peculiar nature of our bodies shapes our very possibilities for conceptualization and categorization” (ibid: 19), an idea originally suggested in their famous book, Metaphors We Live By (Lakoff and Johnson 1980) and developed later in Lakoff’s Women, Fire, and Dangerous Things (Lakoff 1987) and Johnson’s The Body in the Mind (Johnson 1987). Far from phenomenology, they build up their defense mostly through linguistic analyses pointing to the fact that (a) the human conceptual repertoire is developed metaphorically, and (b) the grounding of our conceptual system, which gives human thinking its definite structure, is realized by image schemas that emerge from sensorimotor interactions with the environment. Without ruling out consciousness from the cognitive science domain, neither Lakoff nor Johnson takes phenomenology as a part of his theoretical frame.
Lakoff has been recently concerned with linguistic analysis and its neurological grounding (Dodge and Lakoff 2005), and Johnson has adopted a pragmatist philosophical position, influenced mostly by John Dewey (Johnson 2007), stepping aside from phenomenology and giving more importance to empirical findings in neurology and cognitive psychology.2 This stance has been critiqued from the perspective of cognitive semiotics (Sonesson 2007; Zlatev 2010), for reasons such as those discussed in the rest of this chapter.

1 At the time, the authors were not familiar with Husserl’s later phenomenology, and therefore expressed criticism: “The irony of Husserl’s procedure, then, is that although he claimed to be turning philosophy toward a direct facing of experience, he was actually ignoring both the consensual aspect and the direct embodied aspect of experience. (In this Husserl followed Descartes: he called his phenomenology a twentieth-century Cartesianism.)” (Varela et al., 1991: 17). Their attitude to Husserl within enactivism, nonetheless, changed with the years, and finally turned into a positive one (Thompson 2007).
2 Regarding phenomenology, Johnson declares: “We must keep in mind that phenomenological analysis alone is never enough, because image schemas typically operate beneath the level of conscious awareness. That is why we must go beyond phenomenology to employ standard explanatory methods of linguistics, psychology, and neuroscience that allow us to probe structures within our unconscious thought processes.” (Johnson 2005: 21)

2. Phenomenological elucidation and cognitive semiotics

The most common answer to the question of the relevance of phenomenology for the cognitive sciences is that of “mutual constraint” (Varela 1996) or “mutual enlightenment” (Gallagher 1997). Although neither Varela nor Gallagher was thinking of cognitive semiotics, but rather of a scientific study of cognition that acknowledges the central role of first-person accounts,3 the fruitfulness of such mutual enlightenment has been recognized from a cognitive semiotic perspective (Zlatev 2012). For Gallagher (2012: 33) “the convergence pertains to how phenomenology is put to use in the research fields of psychology and neurosciences. It’s a convergence on a methodological plane”. His own proposal, called “front-loaded phenomenology”, consists of introducing phenomenological insights into the design of experiments, so as “to allow the insights developed in phenomenological analyses to inform the way experiments are set up” (ibid: 38), making a fruitful interaction possible between first- and third-person methodologies. Certainly, semiotics’ main concern is not with experimental design, but with meaning; nonetheless, for cognitive semiotics the proper understanding of meaning and meaning construction calls for a fertile integration of methods and theoretical perspectives (Zlatev 2012). In what follows, I will try to show how phenomenology may help to clarify some fuzzy zones in the vast landscape of what could be called the cognitive sciences of meaning. First, I introduce the “embodied simulation hypothesis”, directing attention to an experiment that supposedly supports its central thesis. Then I present Husserl’s theory of meaning in the Logical Investigations, and, based upon these investigations, review the experiment. Finally, I present the theory of perception that can be found in the later works of Husserl, in order to make a final comment on the experiment and the embodied simulation hypothesis.

3 Against, for example, Dennett (2001), who claims: “First-person science of consciousness is a discipline with no methods, no data, no results, no future, no promise. It will remain a fantasy.”

2.1. The “embodied simulation hypothesis”

Benjamin Bergen presents and defends the embodied simulation hypothesis, according to which “meaning is something that you construct in your mind, based on your own experiences” (Bergen 2012: 13). For Bergen, “we understand language by simulating in our minds what it would be like to experience the things that language describes” (Bergen 2012: 17), and he supports this with a large amount of experimental data, but few, if any, first-person analyses. Bergen presents an experiment performed by Stanfield and Zwaan (2001) in which the participants are told to read a sentence on a screen, and then to determine, for each of a series of pictures shown on the screen one after another, whether the pictured item was mentioned in the sentence. For example, after reading the carpenter hammered the nail into the wall, a picture of an object pops up on a screen, and the participant
must decide as quickly as possible if it was mentioned or not. The key manipulation was that, along with pictures of objects not mentioned in the sentence (an elephant, for example), the screen also showed the picture of a nail, sometimes horizontally oriented, and sometimes vertically oriented. The results showed that “people were faster to answer correctly when the orientation of the object implied by the sentence matched the orientation of the picture” (Bergen 2012: 54). That is, participants took more time to identify the match between the linguistic expression and the picture when the orientation of the nail was not the one implied by the sentence. So, Bergen argues, “a very reasonable explanation is that when people read sentences, they construct visually detailed simulations of the objects that are mentioned” (ibid: 54). He describes simulation mostly in terms of brain activity; it is our brain that simulates, using the same areas involved in direct perception. Although Bergen recognizes that simulations can sometimes be conscious, as in the case of mental imagery, he claims that “many of the same brain processes are engaged, invisibly and unbeknownst to you” when consciousness is not involved (ibid: 14). Since “invisible” means inaccessible to consciousness, embodied simulation would have nothing to do with first-­person methodologies, and meaning would have nothing to do with consciousness. But this is, precisely, the opposite of the position held by phenomenology: meaning is potentially conscious meaning, and must be elucidated by means of phenomenological analysis. 
In fact, returning to our question about the relationship between phenomenology and cognitive semiotics, Sonesson (2009) insistently remarks on the intimate link between meaning and consciousness, not only for experimental design, but for theoretical precision: “Without the elucidation offered by the phenomenological method, semiotics and cognitive science risk indeed to end up forming an eclectic patchwork.” (ibid: 28) and “semiotics takes meaning as its perspective on the world. What this means, no doubt, will remain somewhat obscure until meaning has been phenomenologically elucidated.” (ibid: 35). Thus, following Sonesson, the main task for a cognitive approach to semiotics is to account for the way in which signs and meanings are accessible to consciousness, as the task for phenomenology is precisely to understand the ways in which consciousness relates to meaning.

2.2. Phenomenology, language and perception in the early Husserl

The theory of meaning presented by Husserl in Logical Investigations is, among other conceptions, appropriate for present purposes. On the one hand, it is in the Logical Investigations that Husserl studies language on its own, providing what Bundgaard (2010) calls “an ontology of language use”. On the other hand, the meaning-intention and meaning-fulfillment correlation, upon which Husserl builds his proposal, finely matches the kind of task effected during the experiment described above. According to the 1st Logical Investigation, a verbal utterance has a double function: on the one hand it serves to indicate the utterer’s will to express himself: every speech act is linked to the utterer’s intention to communicate something. On the
other hand, a verbal utterance expresses a meaning-­intention (also called by Husserl a meaning-­conferring act), which is the mental act that, so to speak, animates or gives life to a sign-­expression. The description of the meaning-­intention is given by Husserl in the 1st Logical Investigation, making a clear distinction between act, object and meaning. An act is the particular psychological mode in which an object is presented to consciousness. An object can be perceived, imagined, desired, feared, remembered, etc. As such, an act is a subjective and temporally framed conscious process. The intentional act must be distinguished from its object: every act is directed at something, is about something. I can perceive a dog, or imagine a dog or remember a dog; an act without an object is a phenomenological incongruity. Finally, the intentional meaning of the act is the specific way in which the object is intended. An object is always presented as something; in our example, the object is intended as a dog. Meaning and object are not the same, but meaning intentions are directed to an object through a meaning: An expression only refers to an objective correlate because it means something, it can be rightly said to signify or name the object through its meaning. An act of meaning is the determinate manner in which we refer to our object of the moment (2001 [1900]: 198)4

Notably, in Husserl’s account, while every act is subjective, meanings are intersubjective:5 What we assert (…) involves nothing subjective. My act of judgement is a transient experience: it arises and passes away. But what my assertion asserts, the content (…) neither arises nor passes away. (Husserl 2001 [1900]: 194)

For Husserl, meanings are ideal and not psychological entities, and, although the meaning of an expression is one, the ideal meaning is instantiated in each singular act. Specific utterances may be interpreted in different ways, as Langacker (2008: 463) illustrates: If someone says the cat is on the mat, you are likely to envisage a typical domestic feline reclining on a flat piece of woven material spread out on the floor. This is what we take as being the expression’s meaning. But does the sentence really mean this? It would, after all, be quite appropriate for describing other situations. Perhaps, for

4 Unless stated otherwise, all italics are in Husserl’s work.
5 To insist on the intersubjective nature of meaning is a challenge not only for the embodied simulation hypothesis, but for Cognitive Semantics in general. Most of the authors in cognitive linguistics don’t have a problem with accepting and defending an “intracranial” view on meaning, according to which meanings are constructions that reside inside our heads (for a Chomskian-inspired view, see Jackendoff, 2003; for a non-Chomskian view, see Lakoff and Johnson, 1999 or Gallese and Lakoff, 2005). While on the right track, I think, Husserl’s theory in the Logical Investigations is openly Platonist (or Fregean); intersubjectivity is not yet a central issue of his phenomenological descriptions.


example, (…) a large, voracious cat, having already devoured the curtains, is now eating the mat. Maybe the cat is a tiger in a cartoon, who has just lost a boxing match and is lying unconscious on the canvas. Or suppose we are using a light-colored mat as a makeshift screen for a slide show. To find where to place the projector and how to aim it, you put in a slide with the image of a cat. When the projector is finally positioned properly, I can let you know by saying OK, the cat is on the mat.

The sentence can take different meanings in different contexts, but in each case, the utterer knows exactly what meaning she has conferred to the expression, and, if communication is successful, so does the listener. In Husserl’s account, meaning is linked to the act, not to the sentence, and referring is an activity carried out by the subject in intentional acts, and not a property of signs themselves. But this is only the first part of Husserl’s account. Crucial for my discussion is the link between meaning-­bestowing acts and meaning-­fulfilling acts: If we leave aside the sensuous acts in which the expression, qua mere sound of words, makes its appearance, we shall have to distinguish between two acts or sets of acts. We shall, on the one hand, have acts essential to the expression if it is to be an expression at all, i.e. a verbal sound infused with sense. These acts we shall call the meaning-­ conferring acts or the meaning-­intentions. But we shall, on the other hand, have acts, not essential to the expression as such, which stand in the logical relation of fulfilling (confirming, illustrating) it more or less adequately, and so actualizing its relation to its object. These acts, which become fused with the meaning-­conferring acts in the unity of knowledge or fulfillment, we call the meaning-­fulfilling acts. (Husserl 2001 [1900]: 192)

When I say there is a hamster in the room next door, I express a meaning-­intention, according to which there is a furry little rodent in the room next door. Since the object is not directly given, the meaning-­intention is described by Husserl as an empty intention, as is the case for any act of intending to use a sign – a signitive intention – to direct consciousness to an object or state of affairs. But I can go to the room next door and see the hamster; this perceptual act would, then, fulfill the previous meaning intention. Of course, meaning-­fulfilling acts are not necessary to confer meaning to a sentence. It is possible for a meaning-­conferring act never to be fulfilled. The perceptual act is not a part of the signitive, empty intention, but – only – its fulfillment, and the fulfilling act is not the simple, direct presence of the perceptual situation, but “the experience of coincidence between an empty intention and its fulfilling object” (Moran 2012: 131). What Husserl is trying to show is that the meaning-­conferring act would remain phenomenologically unintelligible without taking into account its relation to the meaning-­fulfilling act. Now, although secondary and “not essential”, meaning-­fulfilling acts are important for understanding the relationship between perception and linguistic meaning. Following Husserl, perceptual acts must be regarded as the confirmation of the meaning intentions expressed through linguistic signs in speech acts. But, conversely, Husserl himself claims that the content of an expression is the content of the act that fulfils it.

The word ‘expression’ is normally understood – wherever, that is, we do not speak of a ‘mere’ expression – as sense-­animated expression. One should not, therefore, properly say (as one often does) that an expression expresses its meaning (its intention). One might more properly adopt the alternative way of speaking according to which the fulfilling act appears as the act expressed by the complete expression: we may, e.g., say that a statement ‘gives expression’ to an act of perceiving or imagining. (Husserl 2001 [1900]: 192)

If this is the case, (a) the specific structure of speech acts can be described independently from its fulfillment in the act of perception (the fulfilling act is not essential to the expression), but (b) in order to fulfill a meaning intention, perceptual meaning must be related to the semantic characterization of speech acts. The problematic relation between speech and perception is further addressed by Husserl at the beginning of the 6th Logical Investigation: I have just looked out into the garden and now give expression to my percept in the words: ‘There flies a black bird!’ What is here the act in which my meaning resides? I think we may say (…) that it does not reside in perception, at least not in perception alone. It seems plain that we cannot describe the situation before us as if there were nothing else in it – apart from the sound of the words – which decides the meaningfulness of the expression, but the percept to which it attaches. For we could base different statements on the same percept to which it attaches, and thereby unfold quite different senses. I could, e.g., have remarked: ‘That is black!’, ‘That is a black bird!’, ‘There flies that black bird!’, ‘There it soars!’, and so forth. (Husserl 2001 [1900]: 195)

A perceptual scene, by itself, doesn’t mean anything. In simple perception, the objects are given in a direct, immediate, temporal manner. On the other hand, to see that a state of affairs obtains is to capture it in a perceptual judgment. A meaningful perceptual act is, thus, founded in the sensuous content of perception, but is not derived from it. Rather, epistemic perception, that is, a perceptual act conceptually and propositionally framed, belongs to a different realm of intentional acts. For example, the table in front of me is captured from a particular point of view, and my perspective on the table changes with every movement I make. Nonetheless, the object picked out through a meaning in what Husserl calls a nominal act, that is, the object seen as a table, or the state of affairs intended by the perceptual judgment This table is black, is one and the same, independently of my particular point of view. We can say, thus, that one thing is to see x and another thing is to see that x; one thing is to – simply – see the perceptual thing, and another one is to see that this table is black. That perception, in this sense, has epistemic value, is clear when we realize that I can be right or wrong about what I see or judge to see: maybe it is not a table, but a really big chair, or maybe the table is not black, but brown, and, in both cases, I would be mistaken. In any case, the epistemic value of the perceptual act is independent of the sensuous perceptual act.

Intuition may indeed be allowed to contribute to the meaning of a perceptual statement, but only in the sense that the meaning could not acquire a determinate relation to the object it means without some intuitive aid. But this does not imply that the intuitive act is itself a carrier of meaning, or that it really makes contributions to this meaning. (Husserl 2001 [1900]: 196)

Perception as such, according to Husserl in the Logical Investigations, is deprived of meaning. In order to be saturated with meaning, perception must ground (and be accompanied by) another kind of act, either a propositional one (whose object would be a state of affairs), or a nominal one (whose object would be a meaningful presentation of an object).

2.3. Meaning and the embodied simulation hypothesis

The main difference between Husserl’s account and Bergen’s embodied simulation hypothesis is the ontological perspective of each proposal. Phenomenology is clearly committed to a first-person perspective on meaning. Conversely, in Bergen’s proposal, there is a blurry boundary between a first- and a third-person perspective; a blurry limit that, once dissipated, shows the limits of the hypothesis. Bergen identifies two kinds of simulations: conscious and unconscious. He gives some examples of the former, but he passes through them quickly since his main concern is the unconscious simulations. Regarding the conscious ones, he says:

[you simulate] when you imagine your parents’ faces (…) when you imagine sounds in your head, (…) [a]nd you can probably conjure up simulations of what strawberries taste like when covered with whipped cream (…). (Bergen 2012: 14)

In phenomenological terms, all of Bergen’s examples (to imagine or to remember something) are presentifications (Vergegenwärtigung), that is, intentional acts in which the intentional object is not directly and intuitively given. Presentifications are neither presentations (Vorstellungen), in which the object is given in its direct, lived presence (as in direct perception), nor representations (Repräsentationen), in which the intentional object is given through the mediation of signs.6 In fact, Husserl recognizes memory and fantasy as presentifications: in memory, I intend an object as having existed, while in fantasy the object is intended in the mode of the as if (that is, as non-existent). Now, contrary to memory and fantasy, signitive intentions linked with representations do have an intuitive character, since they also involve a perception, not of the intended object, but – in semiotic terms – of the signifier.7 However, Bergen conflates the levels of description when he proceeds to characterize simulation as a brain process. If simulation refers to a brain state, the term must be used to describe a process from a legitimate, objective, third-person perspective. And even though a benevolent reading of Bergen’s account could insist on showing that what is phenomenologically described as a presentification, from a first-person perspective, is correlated with some kind of brain process characterized from a third-person perspective, this doesn’t seem to be the case. Actually, Bergen talks about simulation both as a conscious act and as a brain state:

Simulation is an iceberg. By consciously reflecting (…) you can see the tip – the intentional, conscious imagery. But many of the same brain processes are engaged, invisibly and unbeknownst to you, beneath the surface during much of your waking and sleeping life. Simulation is the creation of mental experiences of perception and action in the absence of their external manifestation. (Bergen 2012: 14)

6 For the semiotic relevance of the distinction between presentation, presentifications and representation in Husserl’s final division of psychic acts, see Sonesson (2015).
7 The most complete characterization of signs within the limits of Husserlian phenomenology has been given by Sonesson (2012).

This last sentence is phenomenological nonsense. Bergen is claiming that it is possible to have a mental experience in the absence of an experience. But even if we keep Bergen’s assertions at their simplest and most modest level, and instead of creation of mental experiences we understand “simulation” as an unconscious cognitive process, either it is a process that eventually can become a conscious experience, or it is a brain state that must be regarded as such, without any relation to the mind or to meaning; the difference in the levels of description must be preserved. As Searle (1992: 152) has argued, “the notion of an unconscious mental state implies accessibility to consciousness.” To phrase this in Bergen’s terms, if simulation is an iceberg, the hidden part of it, the one that lies beneath the surface, must be accessible to consciousness if we want to admit it as part of our mental life. But, again, this is not the case for the “embodied simulation hypothesis”. To put it simply, simulation – as understood by Bergen – is not an intentional act, and thus, it has nothing to do with meaning, as understood by Husserl.

Now let’s return to the carpenter and the nail in the experiment reviewed in Section 2.1, and give it an interpretation according to Husserl’s theory of meaning. In the experiment, the participants were asked to identify whether an object shown on a screen had previously been named. In the terms just set out, the task consists in determining whether the object shown in the picture fulfills the term used in the previously read expression. From a phenomenological perspective, perceptual consciousness is different from image-consciousness. I cannot present Husserl’s account of images here,8 but it is not difficult to realize that it is one thing to perceive an object, and another to see its visual representation.9 Concisely, in direct perceptual intuition, objects are given directly from a determinate point of view.
Nonetheless, although we only have perceptual access to a given profile while the other sides of the object remain hidden, we perceive the object as a whole; when we move ourselves, the previously hidden profiles may appear, and there must be a temporal continuity between the intuited profile and the other one that shows itself in correlation with the movements of my own body. Such a thing is not possible for a two-dimensional representation of an object. However, let’s assume, for the sake of our discussion, that image-consciousness can fulfill the meaning-conferring act linked with the verbal expression previously shown. If this is the case, the verbal expression and the image are semantically related. In the specific scenario designed in the experiment, what is shown on the screen is not the image linked with the state of affairs that would fulfill the verbal meaning-intention, but an image whose intentional object is picked out by a previously read sentence. The important issue here is that the distinctive orientation of the nail is not constitutive of the nominal act related to the word, something that is presupposed and accepted by the experiment, since incorrect responses were dropped from the analyses (Stanfield and Zwaan 2001: 155). The relationship between conceptual meaning and its objects is one and the same, and is not affected by differences in the object. The experiment only measures the reaction time, which is something that falls outside the limits of Logical Investigations. As we will see, time is the central problem in the genetic analysis developed by Husserl around 1920, and it is in this respect that the phenomenological approach to the experiment may be illuminating.

8 Image-consciousness is discussed in the V Logical Investigation, § 14, and also in many of Husserl’s posthumous papers (see Husserliana, XXIII). On the latter, see Sonesson (1989, 2012).
9 In semiotics, the discussion about the iconicity of a visual fact is a matter of theoretical controversy (Groupe µ, 1993 [1992]: 127). On the other hand, Husserl (1973: 92) himself writes at length about the situation in which a man takes a mannequin for a real man.

2.4. Time and the dynamics of perception

The theory of meaning presented in Logical Investigations changed substantially from 1900 to 1913 (the year of the publication of Ideas, book I, and the second edition of the Logical Investigations), and then took a definite turn around 1916, when Husserl embarked on genetic analyses. The key issue in the genetic analyses is to capture the temporal essence of intentionality: lived experiences don’t appear out of nothing, nor do they disappear or vanish into nothingness. Rather, lived experiences emerge from and in a pre-given horizon of lived experiences. The intentional structure has its phenomenological history, its genesis. Perception, within the phenomenological frame of genetic analyses, must be understood not as an epistemic stance towards the world, but as a practical and embodied engagement with it. Around 1920, Husserl was disappointed with the way his early phenomenological analyses failed to take into account the experience of what he called the perceptual individuum, “the realm of concrete and discrete phenomena” (Husserl 2001 [1921]: 479). He realised that epistemic perception is actually grounded in a perceptual horizon infused with affective motivations, associations and motor displacements. Genetic analyses are Husserl’s attempt to capture phenomenologically not only the difference but also the relationship between the predicative and the pre-predicative spheres of experience. The predicative realm is the sphere of judicative knowing. The pre-predicative sphere, on the other hand, underlies the predications involved in judgments prior to any active thematization. A judgment asserts that something is the case, and involves a position taken on the part of the subject towards the world. But “every judging supposes that an object is on hand, that it is already given to us, and it is that about which the statement is made” (1973 [1939]: 14).

Perception is giving in an originary manner with respect to its immanent object, that is, its sense. But (…) this can only be the case insofar as it is an integration of pure acts of making-present and presentifications streaming along, which as phases of the stream are non-independent. We call the momentary, pure making-present of every perception, in which there is a new making-present in every moment, a primordial impression. Its accomplishment is the primordial institution of a new temporal point in the mode of the Now, filled with objectlike formations. The continua of presentifications of something that has occurred “just” now which belongs to every moment of perceiving, we take as retentions; they fuse into a unity of one retention, which however has a new mode in every phase of the continuum. Under close scrutiny – and this would be a necessary supplementation – we notice that a new sort of presentification still belongs to perception, what we call protention. Protentions are anticipations continually undergoing change and, from the very beginning, are constantly aroused by the course of retentions. (Husserl 2001: 611 [323])

For the sake of clarity, let’s consider this quote in some detail. In perception, an object never appears in its totality, but in a certain perspective. An object is seen as a whole, but appears and shows itself only perspectivally. And while it is possible to change my point of view and take a look at other profiles of the object, the object is never given at once in its totality. According to Husserl, the sensorial and directly given profile is immanent to consciousness, as long as it cannot be separated from my experience: the profile is an appearance insofar as it is given for me. But precisely because it is the profile of an object, the perceptual entity transcends the immanent flow of my subjective experience: every intentional act is always directed towards a transcendent, extra-­mental, object; consciousness is always consciousness of something outside of it. Hence, Husserl characterizes the constitution of intentionality as a transcendence in immanence. There is no profile without a point of view, and there is no profile without an object. The profile actually perceived is surrounded by horizons: while looking at the book lying on my desk, I know that there are hidden, unseen profiles that, nonetheless, can show themselves if I turn the book around. So, the perceived thing has a perceptual sense insofar as the present directly given profile (the primordial impression) points beyond itself to other hidden profiles of the object, as well as to the previously given but now hidden sides of it. Returning to our example, let’s suppose that I move my head to see the previously unpresented spine of the book. The spine is now the primordial impression that points back to the previously present but now hidden sides of the book, while pointing forward to the unseen sides (for example, the back of the book).



The absent sides of the object are, for Husserl, presentifications. As we saw earlier, there are different types of presentifications. Both memory and fantasy make present to consciousness an absent object, the former a previously perceived one, the latter an imagined one. But in perception, what is presentified is not an object, but a profile, which is intended in the form of a retention or a protention. When I (now) see the spine of the book, the previously present profile – the just-seen one – remains present (although not directly given) in my retentional consciousness. Correspondingly, the present, given profile, points to an upcoming profile, not yet perceived but expected in my protentional consciousness. When I move my head and the spine of the book shows itself, the protention is fulfilled, giving place to a new protention and a new retention. Primal impression, retention and protention configure an internal system of referential implications that is dependent upon a system of lived body movements: I can move my head and see the spine, I can manipulate the book and turn it around, etc. The appearances are kinaesthetically motivated, and there is a correlation between the series of kinaesthetic sensations by which I’m aware of my corporal movement:

Only as dependent upon kinaestheses can [the appearances] continually pass into one another and constitute a unity of one sense. Only by running their course in these ways do they unfold their intentional indicators. Only through this interplay of independent and dependent variables is what appears constituted as a transcendent perceptual object. (Husserl 2001: 52)

The temporal horizon that merges in the dynamic correlation between adumbrations and kinaestheses is organized in and through perceptual schemata (Welton 2000: 332), which are entirely different from the conceptual meaning instantiated in meaning-­conferring acts. First, conceptual meaning can subsume different objects independently of the differences between them. All the books in my library are, precisely, books. Perceptual schemata, on the other hand, change along with the object. Having in front of me three different books, my old edition of the Cartesian Meditations, the 1550 pages of Tolstoy’s War and Peace, and my little son’s favorite, Lost and Found, the perceptual sense of each object is not the same. Second, perceptual schemata are supported by bodily knowledge, while conceptual meaning has a referential function (whether something is the case or not) that has no relationship with bodily-­kinesthetic skills. The practical involvement with the world is possible because of the mastering of my bodily movement. Third, linguistic meanings are inherently intersubjective (Zlatev 2010), while perceptual schemata are subjective and embodied. Still, the question remains: how does the conceptual layer of meaning relate to the perceptual, practical dimension of sense making? And this is precisely the question that lies beneath Stanfield and Zwaan’s experiment, as I show in the following section.



2.5. Language and time

According to the experiment, “the recognition of an object mentioned in a sentence was influenced by the orientation of the object” (Stanfield and Zwaan 2001: 155). In what follows, I will try to show that such a result can be rendered comprehensible from the point of view of genetic phenomenology. For genetic phenomenology, conscious experience is always temporally framed. Thus, every act has its own temporal horizon. Speech acts are not (just) the mere expression of a logical meaning; as temporally situated acts, meaning-conferring acts are retentionally and protentionally determined. In his characterization of protentional intentionality, Dieter Lohmar (2002) distinguishes, among others, between two kinds of protention: rigid protention and movable intentional expectation. Rigid protention is motivated by immediate sensuous data and current retention, while the movable intentional expectation is subject to change according to previous experience. This can be easily understood following Lohmar’s example: when looking at a red traffic light, the just seen red light enters into retention, motivating, along with the present intuition of the red light, the protention (in continuity) of the red light. This would be the kind of experience of someone who has not seen a traffic light before. But for us, who are familiar with traffic lights, it is possible to “see” the yellow light coming while the traffic light is still red; we know that the yellow light is about to light up. According to Lohmar, this would be the kind of experience in which protention is not rigid but movable. It must be said that Lohmar’s analysis is meant to account for the temporal dimension of perceptual intentionality. What about speech acts? Let’s suppose that I am at my brother’s apartment and he tells me, trying to warn me about the situation: The dog is on the rug. His speech act is directed towards an object which, although not immediately present, is going to appear.
Conceptually, the specific, concrete mode of appearance of the dog on the rug is not and cannot be specified. The dog being on the rug must be regarded as an individual referent of an intersubjective, abstract meaning. But genetically, the interpretation of my brother’s speech act has a subjective and embodied character. I expect something, based on my previous experiences, just as the participants in the experiment expect a horizontal nail and not a vertical nail. This claim is phenomenologically intelligible keeping in mind the distinction between near and far protention introduced by de Warren (2009: 196):10

Within the three-fold declension of original time-consciousness, near protentions contribute directly to the overall constitution of the living present. Far protentions, by contrast, exceed the arc of the living present and, in this sense, are “layered-over” the near protentions. Perhaps we could also speak of far protentions as “asleep” in the sense of “protentional expectations” projected into the remote future, which are “pre-given” as horizons on the outer limit of the living present, lurking and waiting to be discovered.

10 Although not spelled out by Husserl himself (he wrote only about far retentions), the distinction between near and far protentions has been addressed by his commentators (Rodemeyer 2006; De Warren 2009; Lohmar 2002).

Like Lohmar, De Warren is trying to account for the temporal dimension of perceptual experience. But again, this characterization may be extended to speech acts as well: I anticipate in far protention the perceptual appearance of a situation. The situation appears as something that is about to happen, an appearance co-determined by my practical and embodied engagement with the world. And this holds even when the intended object is a past event, since it is possible to have in protention a past or an imagined event. Both fantasy and memory are lived experiences, and, thus, can be described according to the retention-protention structure. But protention is not fantasy, nor memory, but a constitutive feature of temporal consciousness. In other words, neither retention nor protention is an intentional objectifying act. Rather, protention is an associative connection which arises passively, and not an active anticipation of an event directed to the future, as in a prophecy. When I hear my brother’s utterance, I’m passively associating my current ongoing situation with the situation of the dog lying on the rug. Returning to the experiment, the sentence The carpenter hammered the nail into the wall can be fulfilled, for example, either by a carpenter standing upright and hammering a nail into the side of a wall (Figure 1, A), or by a carpenter hammering a nail into the top of a wall (Figure 1, B). That is, it can be fulfilled in epistemic perception. In each case, the orientation of the nail is different, but this is irrelevant for meaning-fulfilling acts.

Figure 1. Schematic representations of the meaning-fulfilling acts of hammering a nail on the side of a wall (A) and on the top of a wall (B)

However, when reading the sentence, I anticipate in protention the perceptual sense of an intended state of affairs. This means that I anticipate a particular, subjective, perceptual scene organized in terms of perceptual schemata, based on my own past experiences. As stated above, past experiences do not disappear into nothingness; the repetition of certain experiences becomes part of me through habitualization. This is why, when I interpret the sentence The carpenter hammered the nail into the wall, I protend something like Figure 1A more than 1B. Such a phenomenological analysis can account for the results of the experiment without assuming any kind of simulation.

3. Conclusions

Husserl’s account of meaning is rich and complex. My aim in this chapter was a modest one: to illustrate, with a particular example, the way in which phenomenology can clarify the results of empirical cognitive research. In doing so, I highlighted the richness of the phenomenological description of the temporal dimension of human experience. I also hope to have shown how fruitful it is to address the question of the relationship between linguistic meaning and perceptual sense, not least for evaluating the “embodied simulation hypothesis”. In summary, to claim that meaning is something that you construct in your mind, based on your own experiences, is phenomenologically misleading. Conceptual meaning is accessible to consciousness through conscious acts, but it has an intersubjective value. What you construct, based on your own experiences, is a system of practical implications that shapes your involvement with the world. The phenomenological description of such a system must take into account the insights provided by Husserl regarding time-consciousness. And finally, it must be clear that one of the most challenging questions for cognitive semiotics is the one about the relationship between perceptual sense and conceptual meaning.

Acknowledgements

I would like to thank Catalina Palomino for her insightful observations. I would also like to acknowledge the two anonymous reviewers for their careful reading of the chapter and their helpful comments.

Joel Parthemore

Chapter 3
A Cognitive Semiotic Perspective on the Nature and Limitations of Concepts and Conceptual Frameworks

1. Introduction: From cognitive science to cognitive semiotics

What philosophy of mind calls theories of concepts (for a representative listing of theories, see Section 3.2), cognitive science has long described as knowledge representation (Section 3.4). Regardless of the choice of terminology, the concern is with addressing how it is – and what precisely it means to say – that the thought patterns of non-infant human beings (if not others) are systematic and productive: systematic in that the same ideas can be applied in essentially the same way in each new context the agent encounters; productive in that a finite number of these ideas can be combined into unboundedly many complex structures that can, in the human case at least, be expressed in language: e.g., the notion of a politically left-leaning, possibly Asperger’s, science-fiction reading, computer-game playing, bicycle riding American philosopher of mind with a taste for hot chili peppers, a notion I came up with just now. Indeed, cognitive science – whose roots are often traced back to the Dartmouth conference of 1956 – has been seen by many as preoccupied, since its foundation, with what it means to have a (human) mind – presumably something more than a pure stimulus/response system – and whether and in what sense that mind could be said to be “computable”. Cognitive science, as traditionally practiced, has often and justly been subjected to criticism (Froese 2007, 2011, 2012) and even scorn (Dreyfus 1972, 1992) and ridicule. Consider this from Anthony Chemero (2011: 6):

Empirical propositions about the number of planets or about the history of cilia, it is typically thought, are not to be ruled out by logic or definition. Somehow, though, this attitude has not made its way into cognitive science, where conceptual arguments against empirical claims are very common. Indeed, one could argue that the field was founded on such an argument.

In offering his own critique, Jordan Zlatev (2012: 3) writes of cognitive science that it has often “from its onset in the 1950s adopted an explicitly physicalist (computational and/or neuroscientific) take on mind, connecting to the humanities quite selectively, and above all to philosophy of mind with a distinctly reductionist bent”.



Nevertheless, cognitive science in general and the mind-­as-computer metaphor in particular have a proud history of more than sixty years of conceptualizing and re-­conceptualizing the very structure of conceptually structured thought itself: as clearly demonstrated among human beings, as arguably found in varying degrees among a range of other species, as much more controversially attributed either to existing or potential future artefacts. Together, they have provided a distinctive mirror for the human species to hold up to itself and contemplate – in an inevitably self-­referential-paradox threatening fashion – the nature of its cognitive nature, along with that of other agents who share elements of what it means to be cognitively human. Cognitive science has been criticized by Tom Froese and others for all too often implying, if not actively embracing, some form of mind-­versus-matter Cartesian dualism.1 In case one thinks that Froese has a straw man in mind, Owen Holland – though not a substance dualist – in describing his Chronos/Simnos architecture (see e.g. 2007) openly embraces the sort of “Cartesian theatre” homuncular model that Daniel Dennett (1991) sees as intimately tied up with Cartesian dualism. That said, cognitive science has often if not generally been inclined toward a more pragmatic attitude, as exemplified so well by the work from the 1970s onward at the University of Sussex, UK, of such giants as Margaret Boden, with her hugely influential Artificial Intelligence and Natural Man (1977), which shaped many an aspiring cognitive science student; and Aaron Sloman, with his perhaps equally influential The Computer Revolution (1978). Sloman I see as representing some of the best in what has come to be known (often disparagingly!) as GOFAI, short for good old-­fashioned AI (Boden 2006, ch. 10), with his focus on AI as cognitive modeling (as opposed to, say, consciousness creation: see e.g. 
Sloman 1985) and his programmer’s/hacker’s mentality of trying things out just to see what they do. Both Boden and Sloman may be seen as proponents of a pragmatic kind of functionalism that assumed by no means a complete independence between surface behaviour and underlying mechanisms – which would imply, if not require, a form of Cartesian dualism – but only a relative independence. In particular, theirs was, and is, a form of functionalism that emerged in response to the worst excesses of the earlier behaviourist movement, with its positivist-inspired disregard for anything that could not directly be measured. This functionalism emerging out of the 1970s and ’80s stressed that if one were to have any hopes of getting one’s cognitive models right and explaining all the behaviour that could be directly perceived, then one had to make the right assumptions about all the underlying cognitive processes that could not be so perceived. For all their accomplishments, Boden and Sloman – and many others like them – stand on the shoulders of another giant, whose article in Mind, under the unassuming name of “Computing Machinery and Intelligence” (Turing 1950), aimed straight for the heart of one of the most basic of questions: what does it mean to be a thinking (conceptual) being? Turing was one of the first to see the powerful metaphorical connections between the new digital computers – remembering, of course, that the word “computer” originally applied to persons who performed calculations – and the conceptual structure of human thought, raising important questions about how far such thought could be reduced to some amount of computation, some set of algorithms, and where any such explanation must forcibly stop. Although his Imitation Game, introduced in that paper, has long been taken as providing something like a gold standard for establishing “true” intelligence (e.g. Whitby 1996), Blay Whitby and I have argued (Parthemore and Whitby 2014) that Turing never intended it as an operational test for intelligence but rather as a thought experiment to get his readers to think about thinking: where does simply going through some set of predefined motions end, and flexible conceptual thought begin? What is it that sets a thinking agent apart from an automaton? One further name bears mention with relation to the heady days of cognitive science research in the 1970s and ’80s: John Searle, with his Chinese room thought experiment (Searle 1980): a fascinating counterpoint to Turing’s Imitation Game. The Chinese room can be, and has been, taken as a lecture on the sin of disregarding embodiment and its role in cognition (e.g. Preston and Bishop 2002) – a concern I enthusiastically endorse.

1 Substance dualists are notoriously thin on the ground, but physicist Henry Stapp (2007) is one, though his “quantum” interactive substance dualism is meant to avoid the problems of earlier such accounts.
Nevertheless, I object to the unstated, yet critical, intuition/presupposition that Searle quietly slips in: namely that, “if one looks inside the Chinese room – or, by analogy, inside the skull – sees the operations there, and ‘knows’ that those operations could not even in principle produce intelligence/consciousness, then there is no intelligence/consciousness regardless of any observable behavior or indeed any measurable differences whatsoever” (Parthemore and Whitby 2013: 113). Consider the ethical minefield that this opens: if one “knows” that the agent is “really” an automaton, then that might seem to justify whatever one might do to it in terms of torture, abuse, and so on: after all, one “knows” that it feels nothing, despite all its perhaps frantic claims to the contrary.2 For Searle, there are no possible grey areas between being an automaton and a thinking agent, but rather (as for Descartes, in his own way) an absolute divide.

I am as convinced as Searle that intelligence and consciousness cannot be achieved by brute-force programming. Nevertheless, what Searle would place beyond possibility of doubt I prefer to leave open in principle to empirical discovery – in the end, my empiricist tendencies trump my rationalist ones. Like Sloman, I want to try things out and see what happens; so the theory of concepts I am developing (see Section 3) has been accompanied, from the beginning, by a software program as a direct translation of that theory: one that shows both where the theory works and where it still has gaps.

2 Precisely this territory is explored in the 2015 film Ex Machina.



Logical fiats – such as those Searle appears to offer – should always be viewed with suspicion. For all of their errors and omissions in other regards, the behaviourists had this much right: all one can go on, in the end, is what one observes. Echoing Whitby’s (2003) sentiments, the take-home message I wish to deliver is that, if cognitive science was never that day or week or month or year away from achieving “true” AI – as its more naive or unprincipled advocates declared – it was never quite so blind to situatedness or embodiment or neuroscience, nor so impoverished of accomplishments, as many contemporary researchers have seen fit, for their own rhetorical purposes, to portray it.3

Nevertheless, like most if not all intellectual endeavours, cognitive science faces the need to periodically re-invent itself – in light of the present age, in keeping with contemporary insights and discoveries. Cognitive semiotics – characterized by Zlatev (2012: 2) as “an emerging field dedicated to the ‘transdisciplinary study of meaning’, involving above all researchers from semiotics, linguistics, developmental and comparative psychology and philosophy” – offers one tantalizingly encouraging opportunity to do so. Cognitive science, he complains, has often been explicitly reductionist and physicalist; “[cognitive semiotics] is… considerably more pluralist in its ontological and methodological commitments” (ibid: 3).

For all I see as the enduring strengths of Turing’s insights, what is perhaps most striking to me about the mind-as-computer metaphor is how badly so many of both its fiercest advocates and opponents seem to misunderstand the nature of minds and computers. Computers do not simply “do as they are told” – even as existing models lack what could be called, in anything but the most metaphorical of terms, intelligence, never mind free will.
Despite what Roger Penrose (1994: 66) claims to the contrary, they are not equivalent to Turing machines – which are strictly idealized mathematical entities. The problem is not that computers are not embodied, and embedded in an environment with which they richly interact: they are. The problem is rather that they are embodied in the wrong way, not least because of the seemingly unavoidable dependence of cognition on life – even if that life is something different than traditionally biologically conceived (Zlatev 2001, 2002, 2009a). Meanwhile, for all I disagree with her on other matters, Patricia Churchland is right to observe – contra Penrose – that it is unclear whether anything in the universe is not, in principle, algorithmically describable or “computable”, in the very broad way of computability Penrose has in mind (Grush and Churchland 1995: 190). That does not mean that an algorithmic description is always desirable, or even possible. After all, some things may, and probably do, ultimately outstrip the capabilities of human cognition – including, almost certainly, its capacity to understand itself. Here again cognitive semiotics, with its openness to methodological and theoretical pluralism, shows its strengths, in allowing the possibility that there may be (and probably is) no one single “correct” way to look at the mind, and that the mind-as-computer metaphor, properly understood, has its role to play. The problem is not the metaphor; the problem, again, is the need for periodic renewal. To echo a concern from elsewhere in cognitive semiotics (see below), understanding how the human mind differs from a digital computer – and the gulf is wide indeed! – requires first having an understanding of those things we have in common.

3 Again, if one thinks that they are erecting a straw man, one need only look at the (growing) field of artificial general intelligence (AGI), which unabashedly attempts to revive an explicit-rule-based GOFAI-style approach. To be fair to AGI advocates, including Claes Strannegård, whom I cite below, they would certainly not describe themselves as “blind”; they would simply say that “true” AI can be achieved without necessarily needing to address one or more of these issues. See (Nizamani et al. 2015; Strannegård et al. 2013) to get a good feel for the general AGI style.

In Section 2, I offer my interpretation of the field of cognitive semiotics. In Section 3, I discuss the focus of my own research interests: namely, the nature of conceptually structured thought. Section 4 offers a cognitive semiotic perspective on concepts, looking at how cognitive semiotics improves over traditional cognitive science in offering insights into the nature of concepts and conceptual frameworks. Section 5 applies these ideas to metaphor theory. Throughout there will be echoes of both the relationship between cognitive science and cognitive semiotics, and the enduring power of the mind-as-computer metaphor. I close with some thoughts about the way forward both for theories of concepts and for cognitive semiotics.

2. The emergence of cognitive semiotics

[Cognitive semiotics] can be seen as called for by historical needs… the need to unify or at least to “defragment” our world-views, the need to come to terms with increasingly higher levels of dynamism and complexity, the need to understand better – and thus deal with – the dialectical relationship between individual freedom (autonomy) and collective dependence (sociality), etc. In other words, if Cognitive Semiotics did not exist, we would need to invent it (Zlatev 2012: 18–19).

Zlatev (2012, 2015a) describes how cognitive semiotics emerged as a distinctive field in the 1990s, first in Denmark and then in the US and Sweden. Its focus was, and remains, on the study of meaning and meaning making.4 In the spirit of enactivism (Thompson 2007; Maturana and Varela 1992; Varela et al. 1991) with its focus on agent/environment interaction and the irreducibility of that interaction, “a basic… tenet is that meaning is not ‘inside’ brains, minds, groups, or communities but is a result of processes of self/other/world interaction” (Zlatev 2012: 17). Meaning is fundamentally perceptually grounded. Thus, one of the leading voices of cognitive semiotics, Göran Sonesson, “has consistently argued for the primacy of perceptual meaning over other kinds of meaning – including signs” (Zlatev 2012: 6). Signs and sign use are the most obvious “outward” indicators of meaning.

4 Or sense making: see e.g. Thompson and Stapleton 2009; De Jaegher and Di Paolo 2007.

Cognitive semiotics finds some significant part of its origins in semiotics. That said, “it is not to be seen as a branch of the overall field of semiotics, defined either in terms of ‘domain’ (in the manner of e.g. biosemiotics, semiotics of culture or social semiotics), or ‘modality’ (e.g. visual semiotics, text semiotics)” (Zlatev 2012: 2). In keeping with this relationship, cognitive semiotics tends, when it talks about signs, to prefer a far narrower definition than that offered by Charles Sanders Peirce, for example, for whom a sign may be taken to include nearly anything to do with meaning (see e.g. Peirce 1981, in stark contrast to Sonesson 2007) – but a broader definition than one that restricts signs to language or human-language-like communication. For myself, I prefer to take signs as conceptually mediated attempts at communication that are at least semi-conventionalized, of which language is a subset; the intended target could be another agent, or it could be oneself (though the former is presumably primary). Signs and sign use presuppose communication and social context. Like concepts (see Section 3), signs have both expression and content, each of which may be distinguished from the other. Unlike concepts, expression and content must be distinguishable – at least normally – not just to an “outside” observer but to the sign-using agent herself. On such a view, concepts are prior to signs and sign use ontogenetically and phylogenetically, while signs and sign use are prior to language.

Zlatev (2012, 2015a) describes cognitive semiotics as intrinsically interdisciplinary – or, as he prefers, transdisciplinary. Cutting across rigidly delineated academic subdomains is, of course, hardly new to cognitive semiotics.
What is new is the particular mix of disciplines it brings and, in particular, the way it takes lessons from semiotics (regarding meaning), phenomenological philosophy (regarding experience), and linguistics (regarding language) to re-invigorate discussions that have been going on for half a century in cognitive science. Of course, interdisciplinarity is a two-edged blade: focus too narrowly on one’s specialization, and one risks facing endless terminological turf wars when meeting researchers who use the terms differently; focus too broadly, and one risks losing oneself in terminological and conceptual vagueness. Different disciplines use terminology differently for good reasons: because their needs for that terminology, though overlapping, are also different. Nevertheless, a certain disregard if not irreverence for traditional academic boundaries – as cognitive semiotics clearly shows – is hard to see as anything but a good thing; as is its refusal to be drawn either toward a theoretically strong but empirically weak perspective (all too common in philosophy), or an empirically strong but theoretically weak one (all too common in empirical science; see Zlatev 2012: 14).

Finally, taking both an onto- and phylogenetic perspective, cognitive semiotics attempts to discover what is uniquely human through a better understanding of what is not,5 with the understanding that what appears to be key, at first blush, may not be. So far, language – in the sense of a rich integration of syntax, semantics, and pragmatics and not merely innovative communication – seems to be. On the other hand, concepts and conceptually structured thought arguably are not (see e.g. Parthemore 2011a, 2013, 2014).

Cognitive semiotics is not an attempt to replace cognitive science. Neither is it, as Zlatev (2012: 3) considers and rejects, “a new and fancier name” for it. It is rather an attempt to shine new light on old phenomena: whether that be the origins of proto-language and language or of consciousness, or the limitations of AI and so-called machine consciousness – or the nature of conceptually structured thought.

3. Concepts and theories of concepts

3.1. Concepts: Attempt at a definition

Concepts are the framework underlying not just language but unspoken thought. They may roughly be taken equally either as units of structured thought or as the ability/-ies to possess and employ structured thought – I take the two formulations as equivalent; such that that thought is (a) systematic… (b) productive… (c) compositional… (d) intentional per Brentano … (e) re-identifiable… (f) “spontaneous” per Kant’s terminology… and… (g) subject to revision… (Parthemore 2014: 193–194).

What this definition amounts to is that a concept of x is one’s structured understanding of some concrete or abstract entity x, such that one might, but need not, be able to explain it even to oneself, let alone anyone else. After all, many of the skilled basic activities we engage in we feel like we “just do”, without any need to think about them (Felix 2015: Ch. 3). This claim – that concepts, in effect, bridge Gilbert Ryle’s (1949) knowledge how / knowledge that divide – will be controversial in certain circles; as will the claim that concepts must be relatively stable but not too stable: that they are not just intrinsically open to change but indeed to continuous, if incremental, revision.6 Concepts do not stand still. They must be stable to be reusable across unboundedly many contexts but not too stable so as to be unable to adapt to each new context – in some way, large or small, different in a thousand details from any context that preceded it and any that follow.

5 Compare: “…while CS [cognitive semiotics] practitioners indeed focus on what is specific about human forms of meaning-making, there is widespread agreement that this can only be properly understood in a comparative and evolutionary framework” (Zlatev 2012: 2).
6 Jerry Fodor would oppose it outright: in his Informational Atomism account (Fodor 2008), it is essential to the nature of concepts that they cannot change. Others would say merely that they can change.

The other properties I list are, by contrast, all but universally accepted. I have already explained systematicity and productivity. Compositionality – the idea that concepts can be added together (and at least some concepts can be taken apart) – falls out directly; as does re-identifiability: the ability to describe the same thing as the same thing – from many different perspectives, in unboundedly many contexts, by various sensory modalities.7 Intentionality8 is the deceptively simple idea that concepts are always concepts of: they always have some target (indeed, for someone).

All of this, however, is to describe concepts as if they were carefully pinned out lepidoptera. What are concepts in use? Concepts simplify (and, by simplifying, distort) in pursuit of understanding, reducing the complexity of the world to a level we can grasp. They allow us to approach the world in a distinctively flexible way at the price of removing us from that world – in favour of a conceptual model of it.9 That flexibility of response is the hallmark of both conceptually structured thought and consciousness. Indeed, conceptual agency and consciousness may be seen as two sides of a single coin (Parthemore and Whitby 2014), where the appropriate attribution of the one is the appropriate attribution of the other. On my reading, such a link is assumed by most contemporary theories of consciousness – Information Integration Theory (Tononi 2008; Balduzzi and Tononi 2008), Global Workspace Theory (Baars 1988, 1996), Dual Aspect Theory (Chalmers 1997), to name just three – but rarely if ever made explicit, an omission I wish to correct.

In keeping with arguments made by Douglas Hofstadter (2000 [1979]) with respect to Gödel’s Incompleteness Theorem (see also Parthemore 2011: 192–195), concepts logically entail their own limitations. What this means is – pace Penrose (1994) – that any given conceptual framework cannot be complete and consistent at the same time, as a conceptually well-structured argument derived from Gödel’s two proofs can be made to show. Otherwise, the very expressive power of conceptual frameworks allows the formation of paradox-inducing self-referential conceptual structures that threaten to bring down the entire framework. Indeed, as Hofstadter suggests, one might imagine – in a manner reminiscent of the ancient Greek Pyrrhonists (Lind 2013) – deliberately exploiting the paradoxes to achieve just that.

7 A formulation I owe to Ruth Millikan (personal communication).
8 Per Brentano. Philosophers also speak of intentionality in another way, as that which is willfully motivated: e.g. Felix (2008), following Davidson (2001), uses it in this sense.
9 By saying this, I reveal not just my phenomenological leanings but my antirealist and Kantian metaphysics: concepts are interpretation; the world is essentially always revealed to us interpreted, because we cannot set our conceptual frameworks aside. Neither can we thoughtfully reconstruct the uninterpreted world. There is no “literal” meaning that can simply be taken for granted (see Section 5).



Figure 1. “Video feedback at last night’s NYU IMPACT performance”: public-­domain image downloaded from Wikimedia Commons (

Putting this in another way, concepts portray the world as being one way and not another, with no conceptual perspective privileged above all others: a conceptual perspective is necessarily a limited one. That most pointedly applies, in paradox-threatening fashion, to a conceptual view on concepts themselves. For a good visual equivalent, consider what happens if you train a camera on its own playback monitor. If the focus is a bit off, one gets an amusing regress (see Figure 1); if it is straight on, one gets an attractive but uninterpretable abstract pattern.

At least for human conceptual agents (simpler conceptual agents may be less encumbered), concepts have a habit of getting very abstract very quickly, particularly when we stop to contemplate ourselves contemplating them, as opposed to merely getting on with using them non-reflectively. Consider the concept of grasp, highlighted by Vittorio Gallese and George Lakoff (2005) in making their claim that any concept amounts to nothing more than a specific sensorimotor engagement, albeit with parts of that engagement typically suppressed. Their assertion is oddly reminiscent of George Berkeley (1999 [1710]), with his claim that no one has a general concept of “triangle” that is not the image of a specific triangle, albeit with parts of that image ignored. Their choice of such a seemingly concrete concept is oddly telling, since their account is meant to apply to all concepts – even the most abstract. The problem is that even the concept of grasp is not so clearly an “actual” sensorimotor grasping as Gallese and Lakoff would have one believe. Pace their account, the concept of grasp is at least as unlike a specific instance of grasping as it is like. The very power of the concept – the power of concepts in general – is the way it abstracts away from any specific instance of application at the same time that it applies back to every conceivable application: both the cases of physical grasping and those – such as “eureka” moments – where the grasping is noticeably metaphorical/abstract.

3.2. Theories of concepts

Within philosophy of mind, theories of concepts fall within a subdomain, active at least since the late 1990s, whose leading figures include the following.

• Jerry Fodor, with his Informational Atomism account (1998), whereby most concepts are non-decomposable physical-symbol-based atoms that track their referents in the world in law-like fashion: e.g., a concept of gold is a concept of gold because it tracks all and only gold (and not, say, philosophers),
• Jesse Prinz, with his Proxytypes Theory (2004), which he describes as “informational atomism without the atomism” (ibid: 164), and
• Peter Gärdenfors with his Conceptual Spaces Theory (CST) (2004) – based, like Proxytypes Theory, on Eleanor Rosch’s work on prototypes (1975, 1999);

as well as, perhaps,

• Ruth Millikan, who offers no theory of concepts as such, but sees concepts less as representations (say) than abilities to form representations10 and stresses how accounting for them requires a teleological account (Millikan 1998, 2010), and
• Edouard Machery, who argues (2009) that concepts do not constitute a proper class of things at all, but several disjunct sets: one set belonging to psychology, another to philosophy, another to linguistics, etc.

My own Unified Conceptual Space Theory (UCST) (Parthemore 2011b, 2013, 2015; Parthemore and Morse 2010; see below) is largely based on and attempts to extend Gärdenfors’ work in a more algorithmically well-defined and empirically testable direction, offering a precise algorithmic “recipe” for the creation, evolution, destruction, and application of concepts – both at the level of the individual and that of the group or society. It comes with the aforementioned software program, intended to be used as the basis for a series of experiments designed, with a psychologist, to test the underlying theory (Parthemore 2015).

3.3. The Unified Conceptual Space Theory (UCST)

UCST attempts to fill in the gaps in the self-avowedly scaffold-like structure that is CST. In the process, it makes explicit the commitment to enactive philosophy that CST leaves implicit; talks of representations11 in a much narrower, more precise way, borrowed from Inman Harvey (1992), according to which agent A intentionally (with wilful motivation) uses B to stand in place of C for herself or another agent D; and, most importantly, provides a framework within which all of an agent’s (or, on another level, society’s) many conceptual spaces, as discussed by CST, can come together in a single, unified space of spaces.

UCST makes the following empirically testable claims (see Parthemore 2015):

1. All concepts can be assigned to one of three sub-categories (objects, events, and properties) that derive from three innate proto-concepts (proto-objects, proto-events, and proto-properties): so-called because they fail to fulfil all the properties of “true” concepts (in particular, they are too few in number to be, of themselves, productive; and, being innate,12 they are not under the agent’s endogenous control). These sub-categories align, more or less, with the English lexical categories of noun/pronoun, verb, and adjective/adverb.
2. All concepts can be oriented with respect to one of three dimensions that define the unified space and that represent the integral dimensions (see Gärdenfors 2004: 24–26) in common to all concepts: namely, an axis of generalization from most specific to most abstract, reprising the familiar “ISA” hierarchy whereby e.g. a cat is a mammal is an animal is an organism, etc.; an axis of abstraction from most concrete and seemingly non-conceptual/physical to most abstract and straightforwardly (higher-order) conceptual/mental, where all points along the axis represent one and the same target at different levels of abstractness; and, finally, an axis of alternatives, along which one finds all the possible variations of a concept at the same level of generality/specificity and concreteness/abstractness, derived by varying the values of any one or more of the integral dimensions defining the concept most specifically (e.g., hue, saturation, and brightness in the case of colour).
3. In addition to these proximal connections to “neighbouring” concepts within the unified space, concepts relate to distal (“non-neighbouring”) concepts in three possible ways:
   a. Some concepts (more concrete object and event concepts, but not abstract objects or events, or properties in general) have one or more necessary subcomponents of the same broad category (the subcomponents of object concepts are always object concepts, those of event concepts always event concepts).
   b. All concepts are defined by one or more integral dimensions.
   c. All concepts exist within a context of other concepts with which they are more or less routinely associated (co-occurring).

A useful metaphor is children’s building blocks, but a special kind where blocks affect not only directly adjacent blocks but ones that may be considerably removed in the pile.

10 A formulation I owe to Millikan, via personal communication.
11 Needless to say, this disallows talk of e.g. neural representations. Meanwhile, many would talk of mental representations as something ontologically distinct from so-called external representations, like paintings. UCST allows that such a distinction is, at best, a useful conceptual distinction and not a prior ontological one. What makes anything a representation is the perspective that an agent intentionally takes toward it.
12 In this way, UCST endorses a very modest nativism, in stark contrast to the radical nativism commonly associated with the early Fodor, whereby all – or nearly all – of our concepts are innate.
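As a rough illustration of how the three claims above fit together, the following Python sketch models a concept as a record with a category (claim 1), a position on the axes of generalization and abstraction together with a way of moving along the axis of alternatives (claim 2), and the three kinds of distal connection (claim 3). It is only a minimal sketch under stated assumptions and is not the software program that accompanies UCST: the names Category, Concept, add_subcomponent and vary, the integer encoding of the axes, and the example values are all hypothetical.

```python
from dataclasses import dataclass, field, replace
from enum import Enum


class Category(Enum):
    """Claim 1: every concept falls under one of three sub-categories."""
    OBJECT = "object"
    EVENT = "event"
    PROPERTY = "property"


@dataclass
class Concept:
    name: str
    category: Category
    # Claim 2: position on the axes of generalization and abstraction.
    # The integer encoding (0 = most specific / most concrete) is an assumption.
    generality: int = 0
    abstractness: int = 0
    # Claim 3b: the integral dimensions that define the concept.
    integral_dimensions: dict = field(default_factory=dict)
    # Claim 3a: necessary subcomponents of the same broad category.
    subcomponents: list = field(default_factory=list)
    # Claim 3c: names of routinely co-occurring concepts.
    associations: set = field(default_factory=set)

    def add_subcomponent(self, part: "Concept") -> None:
        # Claim 3a: properties have no subcomponents, and parts share the
        # whole's category. (The restriction to "more concrete" concepts is
        # not modelled here.)
        if self.category is Category.PROPERTY:
            raise ValueError("property concepts have no subcomponents")
        if part.category is not self.category:
            raise ValueError("subcomponents must share the parent's category")
        self.subcomponents.append(part)

    def vary(self, name: str, **new_values: float) -> "Concept":
        # Axis of alternatives: same generality and abstractness, with one or
        # more integral-dimension values changed.
        unknown = set(new_values) - set(self.integral_dimensions)
        if unknown:
            raise KeyError(f"not integral dimensions of {self.name}: {sorted(unknown)}")
        return replace(
            self,
            name=name,
            integral_dimensions={**self.integral_dimensions, **new_values},
            subcomponents=list(self.subcomponents),
            associations=set(self.associations),
        )


# Hypothetical example values, chosen only for illustration.
red = Concept("red", Category.PROPERTY,
              integral_dimensions={"hue": 0.0, "saturation": 1.0, "brightness": 0.5})
pink = red.vary("pink", saturation=0.4, brightness=0.9)

cat = Concept("cat", Category.OBJECT, generality=1, associations={"mouse"})
whisker = Concept("whisker", Category.OBJECT)
cat.add_subcomponent(whisker)
```

Here pink sits beside red on the axis of alternatives: it keeps the same generality and abstractness and differs only in its colour dimensions. Attempting cat.add_subcomponent(red) raises a ValueError, since an object concept cannot take a property concept as a subcomponent.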



3.4. Knowledge representation

Within cognitive science, the same discussions covered in philosophy of mind under theories of concepts have traditionally fallen under the rubric of knowledge representation (KR), with a focus on propositionally structured (or structurable) knowledge such as might be expressed in the following statement from the programming language Prolog, which may be read as saying that two persons are first cousins if each has a parent such that those parents are siblings:

    first_cousin(X, Y) :- parent(R, X), parent(S, Y), sibling(R, S).

What one might call the “KR view on concepts” is problematic for UCST because, like CST, UCST takes concepts to be beholden neither to knowledge that, as an “intellectualist” like Fodor or Jason Stanley (2013; see also Felix 2015: chs. 4–5) would have it; nor knowledge how. Concepts sit in the middle, being both knowledge how and knowledge that and, at the same time, neither. They are neither particularly propositionally amenable (as knowledge that is traditionally understood) nor propositionally averse (as knowledge how is portrayed by all but the intellectualists). Put another way, concepts are neither strictly representational (as the KR view suggests) nor strictly non-representational and associational (as Rodney Brooks might have it: see e.g. Brooks 1991a, b). Again, concepts sit in the middle: looking more like simple associations when pushed the one direction, more like iconic and symbolic representations when pushed in the other.

It follows, contra the KR view, that concepts are neither intrinsically linguistic nor – for linguistic agents like human beings – properly separable from language. Language transforms and extends our conceptual abilities, not only facilitating the sharing of concepts but allowing existing concepts to become more abstract and making possible the contemplation of abstract concepts that probably could not arise without language (Parthemore 2014: 197–198).
An agent who lacks language will have concepts more toward the knowledge how, more toward the non-representational and associational, more toward the first- as opposed to higher-order, more toward the concrete than the abstract – with all of these positioned along a continuum.

Most worrisome of all, perhaps, the KR view on concepts risks conjuring what might be seen as the worst renderings of the mind-as-computer metaphor: the mind as a physically implemented Turing machine, processing through an explicit set of linearly arranged instructions:13 the kind of soulless automaton imagined by Descartes for all non-human animals, extended to human beings by T.H. Huxley (1874; see also Kim 2007). Once again, cognitive semiotics offers a contrasting perspective.

13 An anonymous reviewer objected, at this point, that I was addressing an “outdated version” of cognitive science; I hope I have convinced the reader of reasons to think otherwise. AGI aside, I could offer as further evidence of the persistence of what I am calling the “KR view” – just to take three examples – SNePS (Shapiro 1978; Shapiro and Rapaport 1987), ACT-R (Anderson 1996), and Cyc (Lenat 2006): all deeply “symbolic” systems.
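To make the propositional style of the KR view concrete, the first-cousin rule quoted in Section 3.4 can be restated outside Prolog. The Python sketch below is an assumption-laden illustration only: the facts in PARENT and SIBLING are invented, the helper names are hypothetical, and treating the sibling relation as symmetric is an added assumption that the original rule does not state.

```python
# Hypothetical toy facts: (parent, child) pairs and one sibling pair.
PARENT = {("ann", "carl"), ("bob", "dina")}
SIBLING = {("ann", "bob")}


def siblings(r: str, s: str) -> bool:
    # Assumption: siblinghood is symmetric, so either ordering counts.
    return (r, s) in SIBLING or (s, r) in SIBLING


def first_cousins(x: str, y: str) -> bool:
    """True if x and y each have a parent such that those parents are siblings."""
    parents_of_x = {p for p, child in PARENT if child == x}
    parents_of_y = {p for p, child in PARENT if child == y}
    return any(siblings(r, s) for r in parents_of_x for s in parents_of_y)
```

With these facts, first_cousins("carl", "dina") holds because ann and bob are recorded as siblings. Everything the rule "knows" is spelled out as explicit propositions, which is the feature of the KR view that the surrounding discussion takes issue with.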

4. A cognitive semiotic perspective on concepts

As suggested in Section 2, cognitive semiotics provides a way to consider how concepts on the one hand and language on the other (or sign-based communication more generally) are both deeply intertwined and pull apart. Such a position is controversial in some circles, as concepts are often taken to equate to lexical concepts – either as a matter of logical stipulation or empirical consequence. Nevertheless, the opening for such a position is provided by the broad view cognitive semiotics is inclined to take toward communication and cognition, and the consequent “conviction that language – its nature, evolution and development – cannot be understood outside the context of a more general approach, taking both meaning and mind seriously” (Zlatev 2012: 7). Meanwhile, the arguments from philosophers in favour of “animal concepts” (Newen and Bartels 2007; Allen 1999) and the empirical evidence for complex cognitive capacities – in effect, conceptual abilities – in non-human species (including corvids and delphinids, as well as the great apes; see e.g. Jacobs et al. 2015; Osvath and Sima 2014; Osvath et al. 2014; Osvath and Persson 2013; Osvath and Karvonen 2012; Osvath 2009; Bugnyar 2011; Raby et al. 2007) continue apace.

Figure 2. A revised version of the semiotic hierarchy (cf. Zlatev 2009a)



Figure 2 shows my proposed revision or re-­interpretation of Zlatev’s (2009a) semiotic hierarchy, with its four successive levels in the organization of meaning: life, consciousness, sign function, and language, with each level building on the previous. Because of what I see as their intimate relationship, I place concepts on the same level as consciousness and add two additional levels, (intentional inter-­agent) communication between consciousness/concepts and sign use,14 and cognition between life and consciousness/concepts. The context for life itself is, of course, the umwelt – which becomes, for the conscious agent, the lifeworld. Drawing on its roots in phenomenology, which takes experience as a fundamental entity – ultimately explanans rather than explanandum – cognitive semiotics focuses attention on the conceptual distinction (as opposed to a prior ontological one) between concepts as we reflect on them and concepts as we possess and employ them (presumably most of the time) non-­reflectively. It is the difference between concepts at the level of pre-­reflective self-­awareness and that of fully reflective self-­consciousness (e.g. Gallagher 2007). Of course, there is no single school of phenomenology; in particular, there is a contrast to be made between those phenomenologists, following certain readings in Husserl, who would use phenomenological methods in general and the phenomenological reduction in particular to pass beyond experience to the world itself, and those who take what might be seen as a more Kantian line, seeing the mind-­independent world and the lifeworld, the Körper and the Leib, as ultimately inextricably intertwined: all that one sees, one sees from a perspective, and no perspective is privileged above the others. 
Regardless, “…the basic idea is to depart from experience itself, and to provide descriptions of the phenomena of the world, including ourselves and others, as true to experience as possible – rather than constructing metaphysical doctrines, following formal procedures, or postulating invisible-­to-consciousness causal mechanisms that would somehow ‘produce’ experience” (Zlatev 2012: 15). Given such an approach, it seems difficult if not impossible to give full credit to the way we experience our conceptually structured thought, and to allow that the way we experience it may not be revealing of its pre-­reflective nature. Table 1 gives some idea of how these two views on concepts – reflective and pre-­ reflective – give rise to a whole series of contrasting perspectives on the nature of concepts. Through its focus on meaning as dynamics and on meaning making in the context of dynamical systems, cognitive semiotics shows how concepts, though they are concepts to the extent they are relatively stable across contexts, can

14 I intend communication to be more-or-less equivalent to the stage Zlatev himself adds in the current, five-stage version of the model, which he terms “culture”, even if the emphasis is slightly different (Sonesson 2015b, Zlatev 2015b). Note that I am using “communication” in a narrow sense; Zlatev (2009b) applies the model as a whole to communication in the broad sense, showing which kinds of communication are possible at each level.

Concepts and Conceptual Frameworks


only function as concepts to the extent that they are able to change as needed to fit each new context: to revise the old proverb, one cannot fit an unchanging peg into a constantly changing hole; the essentially static nature of concepts is an illusion, and meaning only appears fixed as a necessary convenience (Parthemore 2014).

Table 1. Two contrasting yet complementary perspectives on concepts

Concepts as… | Concepts as…
things we reflect upon | things we possess and employ non-reflectively
intentionally imposed “top down” | activity derived “bottom up”
product of reason (rationalism) | product of discovery (concept empiricism)
objects of perception | means of perceiving objects
consciously accessible | partly or substantially not consciously accessible
knowledge that | knowledge how
symbolic entities | skilful abilities
“mental” representations | abilities to form representations
sub-propositional components of thought | subconscious components of interaction
abstract and “mental” | concrete and “physical”
abstracted from context | sensitive to context
“internal” to agent | “external” to agent – in the environment
discrete (individuable) |
easily tied to language | clearly distinct from language



The question of whether concepts change (Woodfield 1990) and, if so, how intrinsic is that capacity for change (do some change? all? must they change?) remains one of no small debate. Fodor’s concepts cannot change because concepts, on his account, are strictly public entities defined purely by their extension: my concept of gold is exactly the same as the ancient Greeks’ concept of gold, despite our differing understandings of and beliefs about gold, because the thing that gold


Joel Parthemore

refers to is the same. From a cognitive semiotic perspective, in contrast, beliefs are partly constitutive of concepts and not just (as Fodor and everyone else allows) concepts constitutive of beliefs: i.e., concepts are defined both by their extension and intension. Cognitive semiotics is open not just to the possibility but to the inescapable reality of conceptual change because of the way it “studies meaning on all levels – from perception to language, along with the various forms of ‘external’, cultural representations (theatre, music, pictures, film, etc.) – primarily as dynamic processes rather than static products…. Nearly all [cognitive semiotics] scholars have made the point that viewing meaning in purely static, structural terms is insufficient for understanding the essentially relational, subject-­relative, and (often) interpretive nature of semiosis” (Zlatev 2012: 16). Cognitive semiotics views meaning as fundamentally dynamic, because of the way it views not just life but the lifeworld itself as fundamentally dynamic. In alignment with enactive philosophy, which takes interaction as a fundamental entity and agent as continuous with (not ultimately separable from) environment, cognitive semiotics shows how concepts impose crisp boundaries on what are, from a conceptual perspective, underlying continuities. In psychological parlance, this is the phenomenon known as categorical perception. As Stevan Harnad explains (1990: ix), “categorical perception occurs when the continuous, variable, and confusable stimulation that reaches the sense organs is sorted out by the mind into discrete, distinct, categories whose members somehow come to resemble one another more than they resemble members of other categories”. He continues (1990: 3): “a [categorical perception] effect occurs when… a set of stimuli ranging along a physical continuum is given one label on one side of a category ‘boundary’ and another label on the other side…. 
In other words, in CP there is a quantitative discontinuity in discrimination at the category boundaries of a physical continuum…”. The implication is that, pace the natural kinds philosophers, so-called natural kinds concepts do not, to borrow a well-worn phrase from Plato’s Phaedrus, “carve nature at its joints”; if nature has joints at all, if the notion of natural “joints” is even coherent, then these are not the joints that matter conceptually. This is not to say that concepts (for the individual or the society) are free to draw lines where they want: they are not, any more than agents are free to believe whatever they wish about the world; only that even such basic things as our seemingly inescapable tendency to view the world in terms of objects, events, and properties may say more about us than it does about the world: we literally cannot imagine seeing the world any other way. With its refusal to shy away from paradoxical or even seemingly contradictory perspectives, cognitive semiotics shows how concepts hide the very world they reveal, beneath a veil of interpretation. They both vastly extend cognitive capacities and, at the same time, in fundamental ways limit them. On the one hand, they facilitate an active choice of response: faced with the “same” circumstances on different occasions, the same agent can choose differently, based on her reasoned consideration of what has gone before and what is yet to come. Her mind is not trapped in the present moment but free for mental time travel (see e.g. Suddendorf and Corballis 2007; Osvath and Gärdenfors 2007) into the past and the future. On the other hand, they rob the agent of a certain essential spontaneity: a fully in-the-moment experience that, perhaps, only strictly non-conceptual agents can have; at the same time, by presenting the world in one way rather than another, they blind agents to the range of possible alternative perspectives. Conceptual abilities are neither a straightforwardly good nor bad thing; they are something that sets us very much apart, though, from species that lack those abilities. Finally, through its theoretical and methodological pluralism, cognitive semiotics reveals how concepts are not one unified phenomenon. No single theory will ever capture conceptual abilities and frameworks in all the ways one might be interested in. Cognitive semiotics is in general open to the possibility that there is no one, single, “correct” perspective on many if not most matters of substance, including the endless tedious debates over representationalism and whether concepts are best seen as representations (as Fodor or Prinz would have it) or abilities (as Evans (1982) is commonly read, as Gottlob Frege (1951) doubtless intends, and as Alva Noë (2004) would have it; see also Parthemore 2011a: 20–30). Given its rejection – with the enactivists and phenomenologists – of unconscious representation but its openness to a kind of (conscious or reflectively self-conscious) representationalism, cognitive semiotics is amenable to the possibility that concepts both are (when we look at them) and are not (when we simply use them) representations (see again Table 1).
That is, cognitive semiotics advocates a measured representationalism/anti-representationalism and rejects the dogmatic anti-representationalism that many discern in radical enactivism (see Zlatev 2012: 12). Where does all this conceptual revisionism leave us in terms of the mind-as-computer metaphor? Just as digital computers were named after their human predecessors because of a certain striking resemblance – in particular, the computers’ capacity to perform calculations that previously only a human computer could do – so, too, may the human mind be seen to reflect the operations of a digital computer – in some ways – albeit a massively parallel one, with a storage and computing capacity that, in certain key respects, continues to put computers to shame. Moreover, certain living creatures, including at minimum human beings, are conceptual agents – something that, given all available evidence, no existing artefact can lay claim to. At the same time, the take-home message that cognitive semiotics invites us to draw from Brooks among others is that much of our vaunted intellectual capacities – supposedly so flexible, so subject to our free will, so demonstrative of our superiority over other species – are surprisingly mechanical, straightforwardly algorithmic, and utterly dependent on the environment being a certain way. Much of what we think of as conceptual thought is not under our endogenous control. Perhaps we shy away from the mind-as-computer metaphor because we see too much of the “mindless” computer in us.

5. A cognitive semiotic perspective on concepts: The case of metaphor

Before I close, I would like to offer a more specific and concrete example of just how cognitive semiotics can re-shape our thinking about concepts, by looking at how it can help us re-conceptualize not just this one metaphor but metaphor in general – not as a strictly linguistic phenomenon but, in keeping with Conceptual Metaphor Theory [CMT] (Lakoff and Johnson 2008 [1980]; see also Fusaroli and Morgagni 2013), first and foremost a conceptual one, underlying and making possible linguistic metaphor.15 CMT has undoubtedly contributed to the study of language and cognition with its recognition that what is metaphorical in language is first metaphorical in thought: again, language and systematically structured thought, though deeply intertwined, pull apart. Nevertheless, CMT remains burdened with conceptual baggage from its predecessors: most notably a tendency toward a strict ontological (as opposed to pragmatic and conceptual) distinction between literal and metaphorical and a consequent naive assumption that literal meaning can more or less be taken for granted, while metaphorical meaning requires explanation. This leads to many a tortuous debate over whether a given construction is or is not metaphorical and whether, if it is, it qualifies as “dead” metaphor. In keeping with CMT, and in alignment with the discussion above, I hold that metaphor is not a strictly linguistic phenomenon. In contrast to CMT, and again following on from the earlier discussion, I wish to suggest that metaphor is not definable in opposition to literal meaning, at least without a full explanation of what counts as literal meaning, which Allwood (1981) argues requires as much explanation as metaphor; that it is not clearly demarcated from non-metaphor; and that it is consequently not meaningfully separable into “living” and “dead”. Metaphor is not some type of natural kind, nor indeed any other reifiable entity.
It is not something one possesses a stable collection of so much as something one actively and continuously engages with and helps to create. What, then, is the relation between literal meaning and metaphor? The literal/metaphorical distinction – often useful, perhaps necessary – is a conceptual rather than a prior ontological one. Alternatively, and, I believe, equivalently, metaphor may be defined as a special case of contextual meaning determination, based on meaning potentials (Allwood 2003). From a UCST-inspired view, essentially all conceptual meaning involves the mapping across domains meant to be characteristic of metaphor, precisely because conceptual experience and a non-conceptual world do not cleanly pull apart (Parthemore 2011b). Consequently, metaphor needs to be defined in a different way, one that does not rely on the presupposed nature of the literal. What distinguishes metaphors is their tendency to map a more abstract domain onto a more concrete one and so give the abstract a faux concreteness. In place of a clear, crisp literal/metaphorical divide, one finds a continuum of meaning from primary to secondary, tertiary, and altogether novel meanings/usages, with no sharp lines at any point. Where one draws the lines at any given time is dependent on context: the lines are pragmatic, not absolute. The conclusion is not that all meaning is metaphorical – as some might misread the claim – but that some traces of the metaphorical remain in even the most standardized of meanings/usages and vice versa. A cognitive semiotics perspective on metaphor reveals metaphor as a special kind of sign, according to the definition of “sign” I offered in Section 2. That which is paradigmatically metaphorical is that which – with varying degrees of explicitness – calls attention to itself as metaphor, presenting itself not as an error in usage (either slip or failure in competence) but as a deliberate act, often taking something abstract and giving it more concrete expression. Linguistic metaphor is generally (though not always) for the benefit of another agent; conceptual metaphor is primarily for one’s own sake – while obviously serving to shape one’s language.

15 This is based on work I am currently doing with Jens Allwood at the University of Gothenburg, Sweden.
With its non-standard use of meaning and its more-or-less deliberate juxtaposition of domains that seem far removed from one another and are not normally juxtaposed, metaphor of whatever kind – conceptual or linguistic, “internal” or “external” – challenges the conceptual agent to ferret out the intended meaning, through some degree of pre-reflective or self-reflective contemplation, thereby experiencing her social and physical environment in new ways. Such a perspective on metaphor explores metaphor in all its phenomenological richness. Not limiting metaphor to one form or sensory modality, it shows how visual metaphor is as much metaphorical as linguistic or conceptual metaphor (see Figure 3) and opens up such possibilities as tactile, gustatory, and olfactory metaphor, along with various combinations. It finds little use, most of the time, in dividing metaphor between “living” and the “dead” but rather sees metaphor – being a subdomain of conceptual meaning – as deeply and intrinsically dynamic, so that even as that which was living may die, that which was dead may be given new life. Metaphor that ceases to exploit novelty or is no longer able to do so whatsoever – i.e., metaphor that becomes strictly conventionalized – ceases to be meaningfully interpretable as metaphor.16 Metaphor is not just dynamic but often deeply creative, lending itself to “thinking outside the box”, turning concepts and language, ideas and images, from just another tool into something playful. When one’s view of the world becomes too settled, too mundane, metaphor enters in.

Figure 3. “This is your brain on drugs”: centerpiece of a popular and widely memed campaign by the US government as part of its “war on drugs” in the late 1980s and early ’90s. Public-domain image downloaded from Wikimedia Commons (

16 Consider the English word haven, whose Nordic equivalent continues to retain its earlier meaning of “port” or “harbour”. In English, what had been the metaphorical meaning has become the primary meaning, and the earlier primary meaning has been all but completely forgotten.

This perspective on metaphor eschews talk of the literal and sees the seemingly sharp boundaries between the metaphorical and non-metaphorical in light of categorical perception. The upshot is that the sharp boundaries are a product of how we use and interpret metaphor and not part of its underlying ontological nature, whatever that might be. Such a move makes questions of what precisely is and is not metaphor highly dependent on context, and arguments over such – whenever some global perspective is assumed – ultimately meaningless. This perspective on metaphor reveals how metaphor both tremendously extends our cognitive and linguistic possibilities – allowing the juxtaposition of nearly any conceptual domain with any other domain – and, at the same time, limits them. It does so in part by opening the possibility to yet more confused communications and confused thoughts (just because any two domains can be juxtaposed does not mean that they should be: a good metaphor should not be “too” novel but just novel enough, in part by lending the abstract – the most common target of metaphor – a



seeming concreteness that it does not genuinely have). The lesson may be that much of what we think of as paradigmatically concrete may be far more abstract than we realize: not least our seemingly most concrete concepts. Finally, through its theoretical and methodological pluralism, cognitive semiotics shows how metaphor can usefully be considered through its own sets of contrasting perspectives (Table 2).

Table 2. Two contrasting yet complementary views on metaphor

Metaphor as… | Metaphor as…
conventionalized (based on degrees of conventionality) | novel (valued for its novelty)
conceptual (“internal”) | linguistic/visual/etc. (“external”)
tied to a certain form (spoken/written language) or modality (verbal/auditory) | agnostic toward form and modality; open to tactile metaphors, etc.
continuous with other metaphor and with the “literal” | discrete; easily distinguishable from other metaphor and from the “literal”

6. Conclusions

This chapter began from the premise that GOFAI was never so “bad” FAI as it is often made out to be but that, nevertheless, like any field of inquiry, cognitive science is up for periodic renewal, and cognitive semiotics – with its focus on the meaning of meaning; with its prioritizing of dynamic processes over stable or static entities; with its inspiration from enactivism, phenomenology, and semiotics; with its inclination toward theoretical and methodological pluralism – provides one highly promising opportunity to do so. A proper understanding of concepts is necessary for understanding both mind in general and consciousness in particular, where concepts and consciousness are two sides of one coin. However, too much of the thinking about concepts in “traditional” cognitive science remains mired in discussions about (generally symbolic – indeed generally propositional, explicit as opposed to implicit, etc.) knowledge representation. Cognitive semiotics has much to inform theories of concepts regarding e.g. the nature of phenomenal experience and the lifeworld but, even more basically, to break discussion of concepts out of the traps into which it has all too frequently fallen: allowing one to see concepts as dynamic rather than fixed, prior to and foundational to language rather than part-and-parcel with it, of one nature when we stop and reflect on them but logically of another when we possess and employ them non-reflectively, and so on. The hope is that a better understanding of concepts can, in turn, inform cognitive semiotics’ pursuit of a better understanding of the uniquely human place in the world. In the meantime, by working together, cognitive semiotics and theories of concepts can inform a better understanding of metaphor: one that is not based on antiquated notions of a strict and far-from-unproblematic distinction between literal and metaphorical.

Acknowledgments

The author gratefully acknowledges the financial support and creative intellectual environment of the Centre for Cognitive Semiotics at Lund University, where much of the work for the present chapter was conducted; his colleagues in the Department of Cognitive Neuroscience and Philosophy at the University of Skövde; as well as useful feedback received during his presentation at the International Association for Cognitive Semiotics (IACS) conference in Lund, Sweden, in September 2014, and from the reviewers of earlier versions of this text.

Morten Tønnessen

Chapter 4 Agency in Biosemiotics and Enactivism

1. Introduction: what is an agent?

Although there is currently no consensus in the biosemiotic community on what constitutes a semiotic agent, i.e. an agent in the context of semiosis (the action of signs), most respondents to a recent survey agree that core attributes of an agent include goal-directedness, self-governed activity, processing of semiosis and choice of action, with these features being vital for the functioning of the living system in question (Tønnessen 2015).1 In this chapter I seek to compare the biosemiotic understanding(s) of agency with the enactive understanding(s) of agency. Despite considerable overlap in views and outlook, there are in some cases sharp differences in how agency is understood in biosemiotics and enactivism (e.g. Varela et al. 1991). Mapping the differences in outlook and understanding is complicated indeed, given the diversity of views in both camps. Before we get into any intricacies, however, we should first ask: Why is it of interest to compare enactivism and biosemiotics, with regard to their respective notions of agency? And what does this have to do with cognitive semiotics? A partial answer to the first question is found in the fact that the phenomenon of agency is without doubt central to both enactivism and biosemiotics, and their respective perspectives on the nature of life. This is outlined below. But even if it makes sense to compare enactivism and biosemiotics and their respective views on agency, and even if commonalities in views are possible to identify, the second question remains: What does this discussion have to do with cognitive semiotics? My answer here is that given that biosemiotics and enactivism are two of the most innovative contemporary perspectives on the nature of life and living systems, cognitive semiotics, as a field devoted to the study of cognition, will likely depend

1 I will not attempt to define each of these four features. Some examples of their common usage are provided by Tønnessen (2015). Precisely defining each of the four features mentioned in this biosemiotic definition of agency would not necessarily clarify matters much, given that each new definition would in turn call for yet another definition, and so on ad infinitum. Most definitions are provisional and indicative, nothing more. However, let me say this much: The phrase “choice of action”, as it is understood in biosemiotics, does not refer only to conscious choices, since agency is taken to occur in all living systems. The “choices” being made refer to the (sign-based) taking of different paths among several alternatives, depending on “processing of semiosis” at any particular level of biological organization.



on work done in biosemiotics and/or enactivism as foundationally important. That remains to be seen, but for a start, it is at least a plausible hypothesis. In the programmatic article “Theses on biosemiotics: Prolegomena to a Theoretical Biology” (Kalevi Kull et al. 2009), the authors list “seven properties or conditions that must be met” in the interpretive architecture of an organism, which “the simplest model of the creation of a semiotic relationship should involve” (ibid: 171). Agency, described as “[a] unit system with the capacity to generate end-directed behaviours”, is the first criterion on the list. Where there is life, in consequence, there is agency – life is fundamentally agential. The most central term in enactivism, at least originally, is autopoiesis (Maturana and Varela 1980; Maturana and Varela 1998), the process by which a living system maintains and reproduces itself. “By definition,” writes W. Cameron (2001: 456), “agency is concerned with action in the world and an entity is characterized as having agency in so far as it manifests the capacity for action that is for it.” He claims that agency and autopoiesis, the “autonomous self-maintenance of autonomy” (ibid: 454), are complementary:

Autopoiesis does not logically require agency, but an autopoietic entity will only survive insofar as the dynamic adaptation of its structural coupling can maintain those variables essential to its autopoiesis within their essential limits in an unpredictable environment, and this requires agency. The capacity of autopoiesis is a necessary criterion for life and the capacity of agency is a necessary criterion for the ongoing maintenance of autopoiesis in an epistemically contingent world.

In another, unrelated discussion of the relation between autopoiesis and agency – and adaptivity – Ezequiel Di Paolo (2005: 450) remarks that “agency is not implied by autopoiesis and adaptivity”, and that there may, in the living realm, “be different degrees of agency measured by the organism’s capability to control and alter its body and environment.” Yet another somewhat enactivist author, Anthony Chemero (2009: 386), claims that “the agent and the environment are non-linearly coupled, they together constitute a non-decomposable system”. This is consistent with biologist Jakob von Uexküll’s (1864–1944) – a precursor to biosemiotics – understanding (von Uexküll 1928) that the living organism and its environment as perceived, the Umwelt, together make up one coherent unit. The history of biosemiotics goes back to the foundational work of Thomas A. Sebeok (1920–2001) in zoosemiotics (the study of animal sign exchange) in the 1970s, and the emerging acknowledgement of the relevance of von Uexküll’s Umwelt theory (1928, 1956) for semiotics.2 On the other hand, enactivism was first launched by Varela, Thompson and Rosch (1991). According to Thompson (2007: 15), the intention was to present a number of ideas as a coherent whole, including the

2 For the full vista, see Don Favareau (2009).



idea that “autonomous agents actively generate and maintain themselves thereby enacting or bringing forth their own cognitive domains”.3 Biosemioticians differ in their attitudes to enactivism. While some biosemiotic scholars see these two approaches as complementary and related to each other, others view enactivism as incompatible with and inferior to biosemiotics. The latter group of scholars will, unsurprisingly, typically criticize enactivism for not being able to account for the semiotic aspects of life processes. If living systems are intrinsically of a semiotic nature, as biosemioticians hold, enactivism is of course at a loss insofar as it does not relate to the semiotic nature of nature. But as we will see, enactivism envelops a whole range of different perspectives, and such dismissive attitudes cannot possibly apply to all existing versions of enactivism. As Andreas Weber (2001: 11) states, central arguments or views in enactivism “can be fruitful for a Biosemiotic approach to [the] organism”. Weber (ibid) further points out that “Varela himself already applies concepts as e.g. “signification”, “relevance”, “meaning” which are de facto Biosemiotic.” While Varela was personally “always reluctant about semiotics” due to its formal design, Weber thinks his work “shows a close association of the fields of embodied cognition, biological phenomenology, and semiotics, conflating in what could best be called a biology of subjective agents” (ibid: 12). Although Weber makes convincing points, it is worth remembering that the varieties of enactivism advanced by him, as well as by Varela and Thompson, only form one prominent kind. Nevertheless, this demonstrates that bridging enactivism and biosemiotics is possible in principle. The fact that enactivism is not a single theory can be illustrated by Daniel Hutto and Erik Myin’s recent criticism of another central enactivist, Alva Noë.
In general terms, Hutto and Myin (2013: 3) state that it “is certainly true that there hasn’t yet been a definitive articulation of the central and unifying assumptions of [enactivism]”, and observe (ibid: 14) that “when Enactivists speak of “embodied action” and their intellectual opponents talk of “action,” they are not operating with the same notion of action.” Noë (2008: 663), for his part, holds that “[p]erceiving is an activity of exploring the environment drawing on an understanding of the ways in which one’s movements affect one’s sensory relations to things” (italics in the original). In other words, his understanding of perception is tied to both understanding, or knowledge, and movement. For Noë, there is no sharp distinction between conceptual knowledge and practical ability – perception as such is conceived of as a thoughtful activity. In Hutto and Myin’s eyes (2013: 23), Noë’s work is representative of “sensorimotor enactivism”, which they regard as distinct from their own “radical enactivism”. In their interpretation, Noë’s “Sensorimotor Enactivist claims can be read as concerning essentially personal-level phenomena”, or in other words “interactive organismic engagements” (ibid), and this “version of Sensorimotor Enactivism remains significantly conservative. It embraces intellectualist ideas” (ibid). The terms “conservative” and “intellectualist” are here used in a derogative sense, in order to contrast Noë’s kind of enactivism with their own radical kind, which Hutto and Myin claim can revolutionize the cognitive sciences.4 In this chapter I will present biosemiotic understanding(s) of agency, before I proceed to discuss enactive understanding(s) of agency. In each case I will focus on works by selected scholars in the two respective fields. In the case of biosemiotics, I will present the perspectives of Jesper Hoffmeyer and Alexei Sharov (and Tommi Vehkavaara), whereas my treatment of enactivism will focus on the outlooks of Shaun Gallagher and Evan Thompson.5 In the final section, I identify some affinities between biosemiotics and enactivism. Drawing on the foregoing investigation, I also conclude by situating cognitive semiotics within a relevant, thematically more extensive field.

3 Cf. Varela’s statement in the Afterword of Maturana and Varela (1998: 255): “I have proposed using the term Enactive to […] evoke the idea that what is known is brought forth”.

4 I thank Paulo De Jesus for pointing out to me, in response to reading a draft of this chapter, that “autopoietic enactivism […] speaks of sense-making which is” in De Jesus’ opinion “almost equivalent to semiosis”. De Jesus holds that autopoietic enactivism “provides a more encompassing framework” than sensorimotor enactivism or radical enactivism, and that it can “subsume other strands of enactivism insofar as these are compatible.”

5 The selection of biosemiotic scholars presented here is intended to indicate some of the variation of biosemiotic perspectives, but as we will see the two perspectives overlap and are on most points not mutually exclusive. In a similar manner, the selected enactivist scholars have been chosen for study in order to indicate some of the most important (but certainly not all) variation in views within enactivism.

2. Biosemiotic understanding of agency

The aforementioned survey carried out in the biosemiotic community, in preparation for a review article (Tønnessen 2015), had 18 respondents, who were given a chance to provide an in-depth response in the form of citations etc. This survey must therefore be regarded as a qualitative study rather than a quantitative one. The conclusion, namely that most respondents agreed that the key attributes of an agent include goal-directedness, self-governed activity, processing of semiosis (i.e. sign use) and choice of action, is somewhat open to interpretation. Some biosemioticians would surely say that these four features are necessary but not sufficient conditions for what constitutes agency. In other words, they would hold that further features of agency can and should be identified – for instance traits such as anticipation, habits, self/other, or recursion (ibid: 127–128). From this perspective, the four-feature definition of agency is a pragmatic, minimal definition open to further elaboration. On the other hand, several non-biosemioticians – many enactivists and contemporary cognitive semioticians included – might think that some but not all of the four features described are necessary conditions for what constitutes agency. That would be the case if, for instance, you think that only human beings (or a somewhat more extensive class of organisms) make choices. This discussion can only be carried out in light of one’s understanding of the range of agency in nature: Are all living organisms agents, i.e. endowed with agency? Are there non-living agents as well? And if there are non-living agents too, are all these made by or controlled by living agents? As we will see in the presentations of three perspectives within biosemiotics, even though biosemioticians tend to operate with a wide range of agency in nature, there are some important differences in the understanding of agents even within the biosemiotic community. In this section, I outline biosemiotic understanding(s) of agency, by presenting firstly the views of Hoffmeyer, followed by those of Sharov (and Vehkavaara).

2.1. Hoffmeyer

Jesper Hoffmeyer, a Danish molecular biologist, is arguably the most emblematic contemporary biosemiotician. His book Signs of Meaning in the Universe (Hoffmeyer 1996) helped define biosemiotics as an emerging field, which since 2001 has had annual conferences (“Gatherings in biosemiotics”) and since 2005 a society (International Society for Biosemiotic Studies). Of the nine citations that were surveyed by Tønnessen (2015: 129–130), the most popular was a citation from a text by Hoffmeyer (1998: 38):

The question of how to account for the existence of natural autonomous agents […] leads us to the question of the origin of semiotic competence, which according to the American linguist and semiotician Thomas A. Sebeok is probably congruent with the question of the origin of life.6

The understanding that life is fundamentally agential, which is typical for biosemiotics, is consistent with Hoffmeyer’s (2009: 940) observation that [i]t is indeed possible to explain non-­human living systems as if they were deprived of semiosis, agency and true interests. This is exactly what biology has been doing for more than a century and often in ingenious ways. In the course of the last few decades, however, it has become clear that the reductionist assumptions that lead scientist[s] to pursue such an awkward approach are no longer supported by scientific evidence.

What Hoffmeyer is claiming here is that there is today overwhelming scientific evidence for the existence of agency and semiosis in living systems in general.7 This understanding presupposes a simple conception of signs, and of action, which can be applied widely. Hoffmeyer tends to make a distinction between semiosis – sign exchange – as a phenomenon that occurs throughout the living realm, and perception, which is in his understanding a very complex semiotic phenomenon. He clearly highlights perception in animals with a central nervous system – i.e. a brain and spinal cord – and sometimes writes as if this is the only kind of perception (i.e., that simpler organisms do not perceive). Furthermore, Hoffmeyer makes distinctions between different kinds of perception. In an interesting case study, presented below, he comments on snakes’ perception. This is related to his understanding of agency, since perception and action are intimately connected, and agency (in perceiving organisms) is thus tied to the specific perceptual capabilities of an animal.8 In the article “Uexküllian Planmässigkeit”, Hoffmeyer (2004: 90) writes:

The snake may well go on searching for the prey at the spot where it disappeared, but it will not calculate the eventual path the prey may have taken. The dog on the other hand will proceed away, guided by an anticipation of where the hare would be expected to turn up next. […] This lack of true intermodality in the snake makes it “hard to imagine that the snake can harbor some form of a concept of a mouse in its brain” (Sjölander 1995: 5). The snake apparently can not integrate its sense modalities to form a central construct.

6 In the full quotation there is a reference to Sebeok (1979: 26).

7 And consequently also for the assumption that living systems have interests, given that agency and semiosis – or, in one expression, semiotic agency – is a precondition for having interests.

In Hoffmeyer’s reading (ibid: 91), this “does not necessarily mean that snakes are totally deprived of an experienced world, but if indeed they have experiences, these must be lacking in inner coherence”. The behavioural decisions of small animals with simple activity schemes can in Hoffmeyer’s view perhaps “be accounted for in terms of instinctive patterns of sensomotoric reflex circles” (ibid). This would mean that there is “a direct connection between a stimulus and a corresponding behavioral act” (ibid). In effect, in the snake’s Umwelt “there is indeed no mouse, but only things to be searched for, things to be stroked, and things for swallowing, whereas for animals dealing with more complex patterns of challenges a direct coupling of stimulus and behavior is no longer sufficiently flexible.” As we see here, Hoffmeyer claims that whereas animals with sufficiently complex perceptual capabilities relate to objects, other animals – even among animals with a central nervous system – do not. In sum, Hoffmeyer regards both semiosis and agency as universal phenomena within the living realm, but holds that only some animals perceive, and that only some perceiving animals relate to objects. Agency, from this perspective, becomes more complex the more complex the semiotic and eventually perceptual capabilities of an animal are. Furthermore, for Hoffmeyer the individual level of biological organization, that of the organism, is not necessarily of greater interest than other levels, including the somatic and the ecological level. Life is fundamentally agential no matter the organizational or temporal perspective from which we study it.

8 With Hoffmeyer’s perspective, we could say that action is tied to either perception (in animals with a central nervous system) or simpler semiosis (in other animals, and living systems).

2.2. Sharov

Alexei Sharov, another central biosemiotician, holds that agents are “systems with goal-directed programmed behavior” (Sharov 2010: 1052), and that agents are “either living organisms or their products” (ibid), since only such systems can pursue goals. Notably, Sharov holds that humans are not the only species that recruit subordinate agents, or “subagents”, as he calls them. Various agents therefore tend to be interconnected, in Sharov’s perspective, and agents “are always produced by other agents of comparable or higher complexity” (ibid). Elsewhere, Sharov (2013: 345) defines an agent as “a system with spontaneous activity that selects actions to pursue its goals”. This reflects the aforementioned biosemiotic definition of agency as involving “choice of action” – and, relatedly, Sharov (2010: 1053) states that “agents select specific actions out of multiple options”. He thus indicates that for agents in general, whether these are living agents or abiotic agents controlled by humans, different actions are possible, and being an agent amounts, among other things, to choosing specific actions from the available options. Sharov acknowledges the existence of very simple agents, e.g. ribosomes and viruses (Tønnessen 2015, Appendix: 4).

An article of particular interest in our context, “Protosemiosis: agency with reduced representation capacity” (Sharov and Vehkavaara 2015), has just appeared. Here Sharov and Tommi Vehkavaara, a Finnish philosopher and biosemiotician, distinguish between “protosemiosis”, where “agents associate signs directly with actions without considering objects” (ibid: 103), and “eusemiosis”, where “agents associate signs with objects and only then possibly with actions” (ibid). Protosemiosis thus refers to simpler forms of sign exchange than eusemiosis does. The term “object” refers first of all to the object in Peirce’s triadic sign model, but simultaneously objects are also defined as “distinct components of the environment which can be addressed selectively and repeatedly by agents for sensing and action purposes” (ibid: 106). An implication of Sharov and Vehkavaara’s acknowledgement of protosemiosis is that Peirce’s triadic sign model is at work in some places in the living realm (namely, in eusemiosis), but not everywhere in the living realm. For protosemiosis, Sharov and Vehkavaara (ibid: 105) state that “[the] Peircean logical concept of sign should be substituted by a more general one”. Interestingly, Sharov and Vehkavaara refer to “actions” even in protosemiosis – which means that their notion of action is of wider application than their notion of proper signs. From the two biosemioticians’ perspective (ibid: 103), “[p]rotosemiosis started from the origin of life, and eusemiosis started when evolving agents acquired the ability to track and classify objects”. Protosemiotic agents – such as bacteria – do not have the capacity to recognize objects. Protosemiosis is thus “a primitive kind of semiosis that supports ‘know-how’ without ‘know-what’” (ibid). “Our conclusion that sensing in bacteria does not refer to objects”, Sharov and Vehkavaara (ibid: 113) observe, “contradicts […] the views of Hoffmeyer and [Claus] Emmeche (1991) […] who believe that categorization as well as Peircean triadic relationship between sign vehicle, object, and interpretant are universal for semiotic processes in all living organisms.”9

As we have seen, Sharov and Vehkavaara pinpoint specific disagreements with Hoffmeyer with regard to the range of what the two call eusemiosis. In their terminology, Hoffmeyer would hold that all biosemiosis (i.e., sign exchange in the living realm) is eusemiosis, and that there is no protosemiosis (or at least this is what they indicate as probable). Sharov and Vehkavaara (ibid: 106), on the other hand, stress that “[o]rganisms with a low level of functional organization (e.g., with [a] small number of sensors and/or weak information processing) tend to have reduced representations of objects around them”, and claim that “it seems reasonable to expect that there is some threshold of functionality below which objects are no longer represented by organisms”. A difficulty in understanding how exactly Sharov’s perspective differs from Hoffmeyer’s concerns the term “object” as it is applied by the two scholars. Notably, while Hoffmeyer uses snakes’ perception as a case study on relating to objects, Sharov and Vehkavaara discuss bacteria’s sensing – and thereby refer to a much simpler organism. They can hardly refer to “objects” in the exact same sense. In Sharov and Vehkavaara’s view, snakes must clearly relate to objects, in their sense of objects – the two indicate that even ticks relate to objects (ibid): “We cannot say that the object (deer) is entirely absent in this sign relationship, because after dropping down the tick expects to land on a warm and furry surface of an animal and is prepared for a certain sequence of actions.” In sum, this implies that, on the one hand, Sharov and Vehkavaara operate with a lower threshold for organisms’ relating to objects than Hoffmeyer does, but, on the other hand, they claim that they might differ from Hoffmeyer in acknowledging that Peircean semiosis (broadly speaking, eusemiosis) is not universal in the living realm.

3. Enactive understandings of agency

This section will focus on the work of Shaun Gallagher (2007d, 2012; Tsakiris, Schütz-Bosbach and Gallagher 2007) and Evan Thompson (2007). These two contemporary thinkers are arguably, along with figures such as Noë and Hutto, at the forefront of current enactivism.

9 Sharov and Vehkavaara (ibid) acknowledge that “this disagreement possibly results from differences in terminology”.

3.1. Gallagher

Based on a review of various theories and brain-imaging experiments, Gallagher (2007d) claims, in “The Natural Philosophy of Agency”, that “there is no consensus about how to define the sense of agency.” In Gallagher’s understanding, agency in its proper sense depends on the agent’s “consciousness of agency”. The “sense” of agency is explained as the awareness or experience of agency – in other words, having a sense of agency is here to be understood as experiencing, or being aware of, one’s agency. In the aforementioned article, Gallagher explores what is meant by sense of agency, and how it arises, i.e. how it is generated. Among other things he makes a distinction between sense of agency and “sense of ownership” for movement. While sense of ownership implies experiencing pre-reflectively that “I am the subject of the movement”, sense of agency implies experiencing pre-reflectively that “I am the cause or author of the movement” (ibid: 348). In cases of involuntary movement, he suggests (ibid: 350), “there is a sense of ownership and no sense of self-agency”. In describing sense of agency as a complex phenomenon, Gallagher (ibid: 354) makes the proposal that “the loss of the sense of agency in various cases – including schizophrenia, anarchic hand syndrome, obsessive-compulsive behavior, narcotic addiction, etc. – may in fact be different sorts of loss.” In other words, the sense of agency “might be disrupted in different ways depending on what contributory element is disrupted” (ibid). Whereas sense of agency as a first-order experience may be either “linked to intentional aspect (task, goal, etc.)” or “linked to bodily movement”, sense of agency can also be conceived of as second-order, reflective attribution (ibid: 354–355). In “Multiple Aspects in the Sense of Agency” (Gallagher 2012), pre-reflective, minimal or first-order sense of agency is referred to as “sense of agency1” and higher-order, reflective sense of agency as “sense of agency2” (ibid: 18). Together, the sense of agency and the sense of ownership are said to “contribute to a basic or minimal self-awareness” and thereby constitute aspects of the self. Gallagher’s understanding of agency as described here is obviously quite different from the common biosemiotic understanding of agency. 
Most importantly, his notion of “sense of agency” concerns other topics than most biosemiotic references to “agency” in living systems. While biosemioticians typically discuss the capacity a living system has e.g. for relating to objects, and through such semiotic agency to be a cause of events in the world (and in its own life), Gallagher is rather concerned with self-­awareness. This means that Gallagher’s area of interest is in this context considerably more limited than that of biosemioticians. Gallagher’s interest in the phenomenon of having a “sense of agency” is limited in at least two senses. Firstly, it is limited to individual agency, rather than to the agency of living systems, whether in our bodies or in the form of ecosystems, social organizations etc. Secondly, it is limited to the sense, the experience of having agency, rather than to agency in itself. Naturally, Gallagher’s topic is of great interest, but the biosemiotic concern is much wider thematically and, unlike Gallagher’s approach, applies to the entire living realm. Gallagher’s emphasis on being consciously aware of one’s agency makes it natural to address a few questions that concern the manners in which we relate to ourselves as someone who acts. Unlike in the biosemiotic context, “we” cannot refer here to “any living organism whatsoever”, but must rather refer to certain conscious animals that have the capacity for consciously relating to themselves.


Gallagher underlines that “in most cases of intentional action, [we] engage in such action without having an action plan” (ibid: 20), and acknowledges (ibid: 28) that [w]e do not attend to our bodily movements in most actions. We do not stare at our own hands as we decide to use them; we do not look at our feet as we walk; we do not attend to our arm movements as we engage the joystick.10 Most efferent motor control and body-­schematic processes are non-­conscious and automatic.

In attempting to explain how we relate to our own bodies, Tsakiris, Schütz-­Bosbach and Gallagher (2007: 645) suggest that “the coherent experience of the body depends on the integration of efferent information with afferent information in action contexts”. Here, efferent information, manifested as efferent signals, is derived from motor neurons, whereas afferent information, manifested as afferent signals, is derived from sensory neurons. “Overall”, the authors write (ibid), “whereas afferent signals provide the distinctive content of one’s own body experience, efferent signals seem to structure the experience of one’s own body in an integrative and coherent way.” Such an understanding, based on modern neurophysiology, should on the whole be compatible with Jakob von Uexküll’s (1928) model of the functional cycle. Reframed in Uexküllian terminology, we can in simplified terms state that efferent (motor) signals effectively constitute the Innenwelt (inner world) whereas afferent (sensory) signals effectively constitute the Umwelt (outer world) of an organism. What Tsakiris, Schütz-­Bosbach and Gallagher are claiming is in other words that conscious animals relate to their own bodies by (continuously, we must assume) integrating Umwelt and Innenwelt – and, as we might add, Merkwelt (perceptual world) and Wirkwelt (effective world, world of action).

10 Or, as we could claim, if we do these things, it is likely symptomatic of something being wrong or out of the ordinary, such as when we are sick or intoxicated.

3.2. Thompson

While Gallagher’s treatment of agency has largely been devoted to examining the phenomenon of having a sense of agency, Thompson’s approach to agency is a bit harder to summarize concisely. In this chapter I limit myself to aspects of his monograph Mind in Life: Biology, Phenomenology, and the Sciences of Mind (Thompson 2007). I generally agree with Thompson (ibid: x) that “we need scientific accounts of mind and life” that are informed by rich phenomenological accounts of the structure of experience, and that “[p]henomenology in turn needs to be informed by psychology, neuroscience, and biology.” From the perspective of a biosemiotician such as myself, whose chief source of inspiration is Jakob von Uexküll’s Umwelt theory, it is noteworthy that Thompson (ibid: 59) explicitly adopts Uexküll’s Umwelt concept: “This idea of a sensorimotor world – a body-oriented world of perception and action – is none other than von Uexküll’s original notion of an Umwelt.” While Thompson refers to sensorimotor worlds only, however, it is worth keeping in mind that Uexküll did not limit the application of the Umwelt notion to the sensorimotor worlds in which animals with a central nervous system live, but extended the notion even to unicellular organisms such as bacteria. At any rate, in his modified version of Uexküll’s functional cycle, Thompson (ibid: 60, Figure 3.3) depicts the co-emergence of autonomous selfhood and world, and refers to the “significance/valence” that results from the “domain of interactions”, which is in turn affected by the “identity” of the animal and its “operational closure”.

Later on in the book, Thompson (2007: 146–147) recounts how Varela in one of his final essays revised his view on whether or not “autopoiesis involves anything teleological” (ibid: 146). As we shall see, in this late text Varela also refers to Uexküll’s Umwelt notion as a given. Weber and Varela (2002) “argue that autopoiesis entails immanent purposiveness in two complementary modes” (Thompson 2007: 146), namely identity and sense-making. As for the latter, in Thompson’s words (ibid: 146–147) “an autopoietic system always has to make sense of the world so as to remain viable. Sense-making changes the physicochemical world into an environment of significance and valence, creating an Umwelt for the system.” In both Varela’s and Thompson’s view, sense-making “is none other than intentionality in its minimal and original biological form.”11 In the words of Weber and Varela (2002: 117),

because there is an individuality that finds itself produced by itself it is ipso facto a locus of sensation and agency, a living impulse always already in relation with its world. There cannot be an individuality which is isolated and folded into itself. There can only be an individuality that copes, relates and couples with the surroundings, and inescapably provides its own world of sense.

This world of sense can be conceived of in terms of Uexküll’s Umwelt notion, and is in Weber and Varela’s outlook (ibid) related to the organism’s intrinsic teleology associated with its “sense-­creation purpose whence meaning comes to its surrounding”, whereby the organism introduces a difference between environment and world. In short (ibid: 117–118): [b]y defining itself and thereby creating the domains of self and world, the organism creates a perspective which changes the world from a neutral place to an Umwelt that always means something in relation to the organism.

So far we have examined some similarities between biosemiotic thinking informed by Uexküll’s Umwelt theory and Thompson’s and Varela’s enactivism. A key term that has consistently appeared is that of “self”. In some sentences, Thompson appears to equate the term “self” with “agent”. For instance, he writes (2007: 260) that “[k]nowledge implies a knower or agent or self that embodies this knowledge.”

11 “Intentionality” is here used in the sense of directedness adopted in phenomenology (see Pérez, this volume).


Now, in the biosemiotic outlook, practically any organism is typically conceived of as having a self in semiotic terms. When Thompson (ibid) makes the claim that “[a]ccording to the enactive approach, agency and selfhood require that the system be autonomous”, it resonates quite well with biosemiotic thinking. In this context, Thompson discusses different examples, such as a missile guidance system, which he observes “has no genuine sensorimotor agency or selfhood, and […] cannot be said to be actively and normatively related to the world” (ibid: 261), and a replicating macromolecule, which “is not a basic autonomous system” (ibid: 260). He then makes a distinction between selfhood in a minimal and a narrower sense (ibid): “Whereas biological selfhood in its core cellular form is brought forth by the operational closure of the autopoietic network, sensorimotor selfhood results from the operational closure of the nervous system”. “Agency and meaning”, in short (ibid: 160), require autonomy; minimal agency and meaning require minimal autonomy, or in other words autopoietic organization. Yet again biosemiotics and enactivism appear to be reasonably compatible.

In contrast with the lucid treatment of minimal and proper selfhood, Thompson’s treatment of the notions of “mind” and “cognition” starts out in a confusing manner. “Where there is life there is mind”, he bravely claims in the book’s Preface (ibid: ix) – “[t]he self-producing or ‘autopoietic’ organization of biological life already implies cognition” (ibid).12 What, then, is mind? And what is cognition? Are not these concepts related to that of “consciousness”? Apparently not, for later on Thompson (ibid: 162) goes on to reject Lynn Margulis and Dorian Sagan’s view (1995: 122) that all life is conscious. He here states that “immanent purposiveness does not entail consciousness”, and reports that he finds it “unlikely that minimal autopoietic selfhood involves phenomenal selfhood or subjectivity” – this would require “the reflexive elaboration and interpretation of life processes provided by the nervous system” (Thompson 2007: 162). Here we are at a crossroads. On the one hand, Thompson has a nuanced and informed take on consciousness. On the other hand, however, biosemioticians would tend to respond that Margulis and Sagan’s (1995: 122) reference to a “level of awareness, of responsiveness owing to that awareness, [that] is implied in all autopoietic systems”, even the cell, should not be characterized as consciousness, but instead as semiosis or more specifically biosemiosis. In the tradition of Sebeok and Hoffmeyer, it is commonplace to hold that all living organisms, not just those with a nervous system, are capable of interpreting the signs they encounter. Moreover, in the tradition of Uexküll’s ‘subjective biology’, many biosemioticians would regard Umwelten of any organism, be it with or without a nervous system, as subjective worlds that are phenomenal by nature (without implying any consciousness) and that involve a measure of subjectivity and selfhood. As in the case of Gallagher, we see that biosemiotics ultimately has a wider conception of such notions than enactivism does, even in Thompson’s version.

12 Another statement in the Preface (ibid) is, despite its deliberately wide-reaching content, far more accurate: “The roots of mental life lie not simply in the brain, but ramify through the body and environment.”

4. Conclusion

We have seen in this chapter that there is considerable variation in views on agency within both biosemiotics and enactivism. This makes direct comparisons between the two fields difficult. However, a general tendency is that biosemiotics most often operates with a notion of action, and thereby of agency, that is of a wider range than enactivist notions of action and agency. In other words, biosemioticians tend to have a lower threshold for agency than enactivists do. In the work of Thompson, Varela and Weber, however, there is considerable overlap with the biosemiotic approach, since all these scholars make use of Jakob von Uexküll’s Umwelt theory. Overall the treatment of Hoffmeyer’s, Sharov’s, Gallagher’s and Thompson’s views on agency has shown that biosemiotics and enactivism are to some extent compatible.

Within a biosemiotic perspective, I think it is very important to distinguish between proper subjects and quasi-subjects (Tønnessen and Beever 2014). Proper subjects stand out because their lives have another dimension – namely, unified, cohesive experience of their surroundings – which quasi-subjects lack. Quasi-subjects such as plants, fungi and animals with decentralized bodies also have semiotic agency and quasi-experience, but only proper subjects have cohesive, integrated experience. In consequence, while biosemiotics can refer to “subjective experience” and “subjectivity” everywhere in the living realm, we should distinguish sharply between subjective experience proper and quasi-subjective experience.13

13 As we have seen, Hoffmeyer holds, in this terminology, that snakes and possibly reptiles in general, plus even simpler organisms, are quasi-subjects. Sharov and Vehkavaara could be said to hold that at least bacteria and plants and fungi are quasi-subjects.

As I promised in the introduction, I conclude by situating cognitive semiotics within a relevant, thematically more extensive field. The field I have in mind is none other than that of biosemiotics. Just as biosemiosis, i.e., sign exchange in the living realm, is the study matter of biosemiotics, the proper study matter of cognitive semiotics is cognitive semiosis, understood as biosemiosis related to the operation of consciousness. Since cognitive semiosis is a special case of biosemiosis, we can conclude that, in a similar vein, cognitive semiotics is a special case of biosemiotics. Cognitive semiotics should therefore be conceived of as a subfield of biosemiotics.

Contemporary cognitive semioticians and biosemioticians have different theoretical backgrounds, and to some extent different semiotic and philosophical sources of inspiration. Consequently, cognitive semioticians and biosemioticians have tended to disagree on the conception of sign, on the semiotic threshold(s), etc. Personally, I think that development of (or, further development of) common, shared theory, terminology and methodology could be beneficial for both fields. In fact, this is the only way forward if one takes seriously my claim that cognitive semiotics is actually a subfield of biosemiotics. Such theoretical and methodological integration will require bold and innovative thinking in both camps.

Acknowledgements

This work has been carried out thanks to the support of the research project Animals in Changing Environments: Cultural Mediation and Semiotic Analysis (EEA Norway Grants/Norway Financial Mechanism 2009–2014 under project contract no EMP151).

Juan Carlos Mendoza Collazos

Chapter 5 Design Semiotics with an Agentive Approach: An Alternative to Current Semiotic Analysis of Artifacts

1. Introduction: the need for a new design semiotics

Artifacts have played a central role in social relations since the dawn of history. Most of our relationships are mediated by artifacts. In recent centuries, humankind’s relationship with artifacts has taken on unprecedented importance. Human systems are held and operated by a complex network of devices that make life in society possible. In addition, these devices have evolved from simple instrumental tools to ones that have considerable symbolic and significative value. Artifacts are elements of symbolic mediation by means of which individuals create social ties and activate processes of self-identification and mutual recognition (Setiffy 2014). For this reason, the semiotic analysis of artifacts has been a field of study of great interest not only in sociology, archeology and anthropology, but also for professions that specialize in the design of artifacts such as industrial design, graphic design and architecture. In this chapter, agentive semiotics (Niño 2015) is presented as a new approach to the semiotic analysis of artifacts.

Design semiotics aims to explain how the products of professional design signify (Vihma 2007: 225). From the earliest output of design theory and research, semiotics has been a core interest. Cid Jurado (2002) establishes four stages in the development of design semiotics: (1) the stage of precursor studies on objects (artifacts) and their communicational aspects; (2) the pre-semiotic stage, a first approach to product semantics with tools and theories not informed by semiotic theory; (3) the trans-linguistic stage, which applies linguistic models to the study of artifacts; and finally (4) the semiotic stage, referring to the use of semiotic models for analysis, with in-depth explanations and specialized teaching on semiotic issues at design schools. The last two stages are still relevant today; indeed, a review of state-of-the-art design semiotics (Mendoza Collazos 2015) shows that the trans-linguistic stage remains prevalent.

Some problems in current design semiotics have been pointed out by Vihma (2007). Current design semiotics is limited to explaining artifacts through semantic descriptions, centered on the communicational aspects of the artifacts. As a result, design semiotics overuses linguistic terminology, despite some of these terms being inadequate or appearing forced in the case of product design. Indeed, it can be argued that “form also embraces characteristics other than those called language-like (if there are language-like qualities at all in the first place)” (Vihma 2007: 225) and “if the aforementioned shortcomings and reductions are to be overcome, a design semiotics of a different kind must be introduced” (Vihma 2007: 227).

Trans-linguistic explanations in design semiotics are mainly based on structuralism, on Peircean triadic classifications and on discourse theory. For example, Sánchez (2005) proposes to adopt the three levels of semiotic analysis introduced by Morris (1938): sema-analysis, sintag-analysis and pragm-analysis. Sema-analysis, for an artifact such as glasses, takes into account each part separately. Each part is a sema with a particular meaning related to its function. Eye wires/rims support the lenses, the bridge connects both symmetrical sides of the glasses, the nose pads hold the glasses near to the eyes, the temples hold the glasses to the ears, etc. Sintag-analysis refers to the relationships between each part of the glasses and their spatial arrangement: bilateral symmetry in front and upper view, asymmetrical in side view, etc. Finally, pragm-analysis refers to connotations and contextual relationships involving the glasses. Another example of the structural approach to design semiotics is based on the work of Barthes (1988 [1964]):

…analysis of everyday products can be divided into paradigmatic and syntagmatic aspects. Let us look at, for example, a table and chairs when they are composed as a set on the paradigmatic axis. They can be associated with dining (in a dining room context), children, office work and so on, on the basis of their form. The syntagmatic axis, in turn, shows the spatial arrangement and relationships between the components of the set (Vihma 2007: 226).

In these two examples we can see that the semiotic analysis is highly centered on the artifact. In consequence, the scope of analysis extends only to descriptions of the artifact. Issues about users and the signifying process are ignored.

The Peircean approach to design semiotics is mainly centered on the familiar classification of signs as icons, indexes and symbols. An advantage is that “Peirce’s philosophy does not stem from linguistics and is, therefore, not easily caught up in traps of verbal metaphors” (Vihma 2007: 227). Though the potential of Peircean semiotics has yet to be fully exploited, some of his notions are misused nonetheless. For example, Proni (2003) uses Peirce’s notion of interpretant not as a possible grounded response, but as a type of interpretation. Proni also omits the division of interpretants (immediate, dynamic and final), resulting in a narrow use of this notion. Moreover, notions that have great explanatory potential in design semiotics are overlooked, e.g. the notion of purpose. As we will see below, this notion is central to agentive semiotics.

Discourse theory is another approach used in current semiotic analysis of artifacts. Though discourse theory can be considered as an independent field of analysis, it is frequently linked to semiotics. This approach is centered on the general effect on society of a coordinated set of messages. Some applications deal with the narrative of the object, assuming that all artifacts tell their story (Vihma 2007: 224). This kind of analysis does not offer an explanation of users’ signifying processes, nor of how artifacts signify.

Design Semiotics with an Agentive Approach


At this point, it is possible to summarize the set of limitations in current design semiotics. Most explanations are centered on language-like and trans-linguistic analysis, and as a consequence the semiosis that emerges when agents use artifacts cannot be sufficiently analyzed. Users and their cognitive processes are overlooked. As a result, designers cannot obtain practical benefits and applications from current semiotic analysis, beyond mere descriptions of artifacts with a focus on semantics. As user/human-centered design tendencies testify, in-depth knowledge of how we think and of the mechanisms of semiosis is very important for designers, but semiotic analysis based on trans-linguistic explanations cannot fulfil this prospect. Within current perspectives, questions about how artifacts signify and what the conditions for semiosis are remain unresolved. Recent achievements of semiotic analysis still have to be incorporated into design semiotics before it can embrace a cognitive semiotic perspective. As argued here, agentive semiotics has the potential for overcoming these limitations.

2. What is agentive semiotics?

Agentive semiotics (Niño 2015) is a new approach to dealing with signification. It links together the achievements of logic, phenomenology, biology and cognitive sciences and puts them at the service of semiotics. Agentive semiotics can thus be considered a form of cognitive semiotics because it agrees upon and assumes (with some discussion) the conclusions and outcomes of recent cognitive perspectives in semiotics. For example, the embodiment thesis (Varela, Thompson and Rosch 1991; Gibbs 2006; Hutchins 2010) refers to the notion that the human body impacts on the way we think and that meaning depends on our bodily characteristics. In this way, rationality emerges from experiential body structures (Johnson 2007), whereas physicochemical/neural reductionism is rejected because it is an “embrained” perspective that overlooks the body as a whole. Signification depends on a co-constituting world-organism relationship, also known as enaction (Varela, Thompson and Rosch 1991; Thompson 2007; Gallagher 2005, 2009; Stewart, Gapenne and Di Paolo 2010). The notion of enaction may be adapted to refer to any kind of response that an organism gives to/with/in the environment. Thus, for agentive semiotics, enaction is a situated action and part of the ongoing agenda (Niño 2015: 23: 568) explained in what follows. Further, agentive semiotics takes a pragmatic perspective, inspired by Charles Sanders Peirce’s notion of purpose (Short 2007) and following Dov Gabbay and John Woods’ (2003) agenda/agent/agency notions. The principle of ontological continuum (James 1907; Gallagher 2005, 2007; Sheets-Johnstone 2008, 2010, 2011) is the main epistemological orientation of agentive semiotics, and it therefore rejects ontological (body/mind) dualism. Based on Gabbay and Woods (2003), Peirce (Short 2007) and James (1907), agentive semiotics places central importance on the relationship between signification and goal.
Signification emerges from an agent attempting to accomplish his/her agenda, whereby agent is an entity with a capacity for action, and agenda is the goal or type of outcome that an agent seeks to achieve. The agent/agenda relationship, that is, the relation between an agent’s actions and the type of outcome that the agent seeks to achieve through his or her actions (Niño 2015), establishes the agentive approach.

It can be noted that agentive semiotics has principles in common with biosemiotics, but some differences must be pointed out. Agentive semiotics focuses on human agency and it is centered on the explanations of the conditions that enable human beings to signify, whereas biosemiotics is centered on the biological basis of semiosis with more of a stress on primeval organisms rather than complex, high-level agents. According to agentive semiotics, the conditions of agency are animation (Sheets-Johnstone 2010, 2011), situatedness (Thompson 2007: 13; Hutchins 2010: 428) and attention (Goldstein and Naglieri 2014). These three conditions are mandatory if an entity is to be considered an agent. By contrast, within some approaches in biosemiotics (e.g. Sharov 2010), artificial devices such as automated machines and robots are also regarded as agents, as they may fulfill Sharov’s three key conditions: “(1) agents select specific actions out of multiple options, (2) these selected actions are useful in a sense that they help agents to reach their goals, and (3) agents do not emerge by chance, they are produced only by other agents of comparable or higher level of functional complexity” (Sharov 2010: 1053). These conditions are sound but do not go far enough because they do not address the situated, animated and attentive conditions of an agent, which are not present in artificial devices. In agentive semiotics, an agent acts only if it is situated. Its action must be embedded and embodied, and have a kineto-perceptual basis and a specific place and time. An agent’s action is always affective; it tends to interact with other agents and it is under several degrees of control (from unconscious to highly concentrated actions); thus, it is always attentive (Niño 2015: 92).
The agentive approach is centered on the agent/agenda relationship with a focus on the agent in context (Zlatev 2002), where “context” is understood as a sociohistorical and cultural construction regulated by agendas. The narrative character of a (complex) human agent and his volition, memory and capacity for action are comprehensively explained by agentive semiotics, which is based upon the following principles:[1]

a. Signification is an exclusive activity of agents.
b. The process of signifying is intrinsically apt to be assessed.
c. There is a distinction between significance and signification.

(a) Signification is an activity that is exclusive to agents, and this is the core idea in agentive semiotics. Therefore, things, words, and signs do not intrinsically have meanings, because they do not have inherent agency. In consequence, signification is not a property of signs. If signs or things do not have signification, they are not the starting point in semiotics. Understanding how agents construct meanings, as well as the cognitive processes that make possible the construction of meaning, are the central questions in cognitive semiotics. In addition, fundamental to agentive semiotics are questions about what an agent is, what an agenda is, how agendas are established and how they are structured, and what conditions allow signification to emerge from an agent’s actions. Along the lines of Zlatev (2002: 289) and in opposition to Sharov (2010), artifacts, machines or artificial devices are systems that do not fulfill the necessary conditions to be considered as agents. Nevertheless, agents can assign agency to objects without inherent agency. In this way, objects like artifacts have a special form of agency that we call derivative agency; that is, a kind of agency that has been assigned by agents (designers in this case) as functions (see Figure 1).

Figure 1. Derivative agency (prepared by author based on Niño 2015)

[1] The original work Elementos de Semiótica Agentiva by Niño (2015) can be consulted for further explanations.

For example, in the design process for a cell phone, the designer assigns functions to the artifacts. The cell phone acquires a derivative agency through the design process. But the cell phone cannot act for itself, even if it executes automatic functions. When users recognize the cell phone functions, they may attribute agency to the cell phone. Derivative agency depends on agents’ actions in order to obtain meaning. Only agents can detect derivative agency in the artifacts. For this reason, strictly speaking, an artifact does not interact with users. A blender, for example, has functions assigned to it by the designer (derivative agency). But a blender does not have the capacity to act for itself, nor to recognize other artifacts or the agency of users. The agency of a blender depends upon its users, and they alone can recognize its functions. Artifacts are not agents, even if they work automatically. The capacity to act requires enactive conditions such as animation, situatedness and attention. It also requires cognitive capabilities such as perception, memory or volition (Prinz et al. 2013), as well as special properties such as autopoiesis and an intrinsic value system (Zlatev 2002: 289). Interaction is only possible between agents; however, interaction between agents is often mediated by artifacts.


Derivative agency may look similar to the notion of functional information introduced by Sharov (2010). However, there is one notable difference: Sharov uses the notion of functional information in order to argue that artificial devices are agents, whereas in agentive semiotics only agents can recognize, detect, activate and benefit from derivative agency in artifacts. Hence, agents attribute meanings to artifacts. Artificial devices are not agents; they are only prosthetic incorporations that agents use in order to improve efficiency and attain their goals. Agents incorporate artifacts into their ongoing agendas. For this reason, artifacts do not have their own agendas. They are just a means to an end, and the agendas belong exclusively to the agents. For example, the agenda or goal of a comfortable divan is not that people sit there; the divan merely exists with the properties and functions assigned to it by the designers. It does not possess the will to persuade people to use it. Only agents who see the divan recognize its functions and properties (e.g. comfortable, safe, stable); thus, agents incorporate the divan into their agenda. Similar arguments can be made for advanced automated devices such as robots or CNC machines. This kind of artifact executes functions automatically, but it is not plausible to suggest that robots have their own agendas. The meaning of their functions is only of benefit to the agents.

The principles (b) and (c) are best explained jointly. Any kind of possible response that is grounded or justified forms part of a nodal network of potential responses, namely, significance. Signification, on the other hand, is an actual response that an agent activates (see Figure 2).

Figure 2. Significance vs. signification (prepared by author based on Niño 2015)

For example, a bicycle has a complex nodal network of possible responses that are justified or grounded by its significance conditions. Signification emerges when an agent actualizes a response or a set of responses. The agent’s potential responses prompted by the significance conditions of the bicycle can be actions, sensations, opinions, memories, etc. Any kind of possible response (potential or actual) is called responsivity. In brief, significance is a grounded semiotic responsivity, while signification is an active responsivity. In the case of the bicycle, the agent could execute the action of using it, activating the responsivity of the action. Alternatively, the agent could simply remember his childhood when he was starting to ride a bicycle, activating the responsivity of remembering. In this way, the potential significance of the bike becomes actual signification to the agent. The matching process between significance and signification can be assessed in relation to the agenda of the agent. Correction criteria can be introduced into the activity of signifying, but only within the framework established by the agent’s agenda.

Taking into account these theoretical foundations, the main goal of this chapter is to set out the applications of agentive semiotics in professional design practice. The theoretical structure that was built upon the foundations of Niño (2015) is linked with design semiotics, in order to analyze the significance conditions of artifacts. In addition, issues taken from Elementos de Semiótica Agentiva are developed and shown in the application proposal.

3. Application of the agentive approach to design semiotics

Describing and analyzing the artifact’s significance conditions enhances the design process, because the designer will be able to identify the conditions that determine the user’s response to an artifact. Therefore, the designer obtains tools to conceive of better and more efficient artifacts based on more thorough knowledge of the foundations of users’ signification processes. The conditions of significance of artifacts can be divided into three axes of analysis. The first axis comprises the conditions of agency, used to analyze the user; the second, the conditions of the agenda, required for the analysis of the design; and the third, the conditions of the material world, used to analyze the artifact. Significance conditions shape and determine the possible response of any agent.[2]

3.1. Conditions of agency

Conditions of agency take into account the user profile and his/her agency capabilities. The agent’s background is relevant in describing the user profile, and is composed of (a) the sociohistorical background, which refers to the user’s demographic profile, e.g. gender, age, social status, etc., in a specific context; (b) the biographical background, the description of the user’s self, that is, his personality as it has been shaped by experience; (c) the bodily-experiential background, which refers to the user’s kineto-perceptual patterns determined by the influence of the material world and the environment; and (d) the memory systems, the information stored in the user’s long-term memory. This information constitutes a system comprised of episodic memory, i.e. the user’s events and experiences; semantic memory, i.e. the user’s entrenched cognitive models and meanings; skill memory, i.e. the user’s miscellaneous skills acquired by training; and finally, emotional memory, i.e. the affective pattern that the user displays in his life span and for specific events or circumstances.

[2] See Mendoza Collazos (2015) for a more detailed exposition of these conditions.

The structure of the agendas is important for ascertaining the motivations for the user’s actions. Agendas are driven by the agents’ actions and goals; as we have seen, an agent is an entity with a capacity for action, and an agenda is the goal or type of outcome that an agent seeks to obtain. Agendas can be general and vague (e.g. calling) or specific (e.g. calling the police on a cell phone for help). Agendas can be complex (e.g. reducing the crime rate) or simple/basic (e.g. dialing a number on a cell phone). Agendas are embedded into one another. The structure of an agenda comprises the main agenda, sub-agendas, and goals. For example, the main agenda in calling someone by cell phone constitutes the reason for the call (e.g. asking for help); the sub-agenda is achieving effective communication (e.g. managing to talk to someone); and the goals are the simple actions inherent in the act of calling (e.g. holding the cell phone, dialing, etc.); see Figure 3.

Figure 3. Structure of the agendas (prepared by author based on Niño 2015)
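For readers who want to record such an analysis in a structured form, the embedding of agendas described above can be sketched as a small nested data structure. This is only an illustrative sketch and not part of Niño's (2015) framework: the class name `Agenda`, its fields and the `walk` method are hypothetical, and the example values are taken from the cell phone case in the text.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Agenda:
    """One node in an embedded agenda structure (hypothetical model)."""
    description: str
    level: str  # "main agenda", "sub-agenda" or "goal"
    embedded: List["Agenda"] = field(default_factory=list)

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, str, str]]:
        """Yield (depth, level, description) for this node and all embedded ones."""
        yield depth, self.level, self.description
        for child in self.embedded:
            yield from child.walk(depth + 1)


# The cell phone example from the text: reason for the call,
# effective communication, and the simple actions of calling.
calling = Agenda(
    "asking for help",
    "main agenda",
    [
        Agenda(
            "managing to talk to someone",
            "sub-agenda",
            [Agenda("holding the cell phone", "goal"), Agenda("dialing", "goal")],
        )
    ],
)

for depth, level, description in calling.walk():
    print("  " * depth + f"{level}: {description}")
```

The indentation of the printed lines mirrors the way goals are embedded in a sub-agenda, which is in turn embedded in the main agenda.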

Finally, agendas can be motivated by duty, will, or biological programming. Breathing is a biologically programmed agenda; dancing could be an agenda motivated by will or desire, and working could be an agenda motivated by duty. Insofar as agendas are aims that the agent seeks to achieve through his or her actions, they can be completely achieved, partially achieved, or not achieved at all.

In agentive semiotics there is a special kind of agenda known as per-agendas. The accomplishment of per-agendas depends on actions performed by other agents. For example, in product design, the designer has a per-agenda because his goals are achieved by users’ actions. An agent’s per-agenda seeks to persuade other agents to act in a certain way. For this reason, the technical term per-agenda employs the Latinate prefix per, used in words such as persuasion, to persuade. Thus, if the designer seeks to persuade users, for example, to dial a cell phone in a certain way, that designer’s per-agenda could be to dial using voice recognition rather than tapping the keys. In that case, the designer’s per-agenda is only accomplished if the user dials using the voice recognition tool.

Further, agentive semiotics introduces the notion of agentive scene. The agentive scene must be analyzed by designers in order to accurately evaluate the artifacts or improve the designer’s choices in the design process. The agentive scene is comprised of a base scene and a semiotic scene; a base scene is the frame in which the agentive meaning emerges, while the semiotic scene is where the agent focuses his/her attention in order to accomplish the agenda. The base scene comprises the environment or physical background, whereas the semiotic scene is the focus of attention and signification.[3] In the typical relationship between agent and environment, the base scene and semiotic scene are one and the same. That said, there are events in which the base scene (background, material world) is inactive and the agent focuses his/her attention on a “virtual” semiotic scene, for example when an agent remembers, imagines, or engages with a movie.
For example, when an agent uses a hammer, the base scene and semiotic scene are fused and they are the same, but when the agent stops hammering, say, to remember his girlfriend, the agentive scene breaks into two scenes. Also, when an agent is watching a movie, the base scene is inactive and the agent focuses his attention and signification on the movie’s actions. The agentive scene is divided because the physical actions of the agent (sitting on a cinema chair) in the base scene do not correspond to the actions to which he is giving signification. This division of agentive scene can be useful for the analysis of all interface types.

[3] However, the agentive scene is a phenomenologically indivisible unit, and its division into two scenes here is only for the purpose of facilitating the analysis and presenting noteworthy distinctions.

The final point to be considered in this analysis of the conditions of agency is the conditions of goal achievement, based on the notions developed by Austin and Searle (Austin 1962; Searle 1969, 1983; Searle and Vanderveken 1985). In agentive semiotics, these conditions are related to the agenda’s conditions and are expanded to all types of acts/actions, not only applied to verbal communication. Details about these adaptations are explained by Niño (2015: 96). The conditions of goal achievement describe the acts and the agent’s actions to accomplish his agenda. The conditions of goal achievement are made up of satisfaction conditions and success conditions; the former are the agent’s acts and actions, while the latter are the necessary conditions in the agent’s agency (see Figure 4). For example, the satisfaction conditions of calling by cell phone are to dial, hold the cell phone to one’s ear, talk, etc. Success conditions allow us to accomplish the agenda, and in this case could include, for example, not being mute.

Figure 4. Conditions of goal achievement (prepared by author based on Niño 2015)

3.2. Conditions of agential resolution

The conditions of agential resolution are the second axis for analyzing an artifact’s significance. These conditions refer to a designer’s proposal for agenda accomplishment. The designer’s agenda, or more specifically the per-agenda, is the way in which the designer seeks to efficiently contribute to the user’s agenda through artifacts. An analysis of the agenda conditions begins with agentive notions of the contexts. Contexts are inter-subjectively established sociohistorical constructions composed of roles, topics and anchors (see Figure 5). Contexts are regulated by agendas. Roles are the assignable behaviors that an agent ought to deploy in a specific context; topics are admissible dispositions; and anchors are the typical objects associated with a particular context. For example, the context of classroom has roles such as teacher and students. The specific role is a partaker, e.g. John (teacher). In the classroom, admissible dispositions are teaching, silence, order, attention, etc. Typical anchors of the classroom are student desks, tables, board, etc. A particular context is manifested in a specific place and establishes a circumstance.

Figure 5. Elements of context (author’s elaboration of Niño 2015)

The structure of the contexts is useful in analyzing the circumstances in which agents will use artifacts, whether it is an established context or a new context proposed by the designer. For example, the context of calls radically changes with the introduction of cell phones. As stated previously, a designer has a per-­agenda in that he seeks to persuade users about the actions that they ought to take in order to accomplish agendas with efficiency by using artifacts. The effects of a per-­agenda on the users are called per-­effects. A per-­effect is an actual effect on the user brought about by a per-­agenda. The designer can plan the per-­effects that he wants to bring about, and evaluate whether he was successful in doing so. Agentive semiotics establishes different types of per-­effects: continual, temporary, and emotional effects. For example, a designer may wish to obtain a continual effect on users so that they always use a cell phone rather than a landline telephone, or may plan on achieving an emotional effect on users through the artifact’s forms being suggestive and appealing. On the other hand, the designer achieves a temporary effect on users, for example, if they temporarily use the umbrella bag dispenser located at a building’s entrance. An important issue related to the notion of per-­effects is that all agendas of this type also have unexpected effects known as concomitant effects; for example, the artifacts may have non-­intended effects on the environment. Designer’s agendas and per-­agendas are predicated upon a main/complex agenda that should be common to all design professionals: improving quality of life and enhancing the efficiency of the human system without adversely affecting non-­human systems. Finally, the analysis of agenda includes the conditions of agential resolution. These relate to the way (the actions taken) that a designer seeks to accomplish the agenda. 
They are composed of attainment conditions and compliance conditions (see Figure 6). The former are the mode in which an agenda is ideally complied with, for example, by using an artifact established by the designer, while the latter are necessary conditions that must be fulfilled. The designer proposes the ideal way of accomplishing the agenda of taking notes, for example, using an electronic tablet to fulfill the attainment conditions; thus, he defines the time and resources required to accomplish the agenda of taking notes.

Figure 6. Conditions of agential resolution (prepared by author based on Niño 2015)

As for compliance conditions, the designer conceives of the mechanisms and technical issues related to the electronic tablet to fulfill the conditions that allow the agenda to be accomplished in the background. It should be noted that the conditions of goal achievement and the conditions of agential resolution comprise a whole (figures 4 and 6). The former are actual (agentive) realizations that are performed by an agent in order to accomplish his goals, and the latter are ideal (agential) conditions that should be accomplished by the agent and are formulated by the designer. This distinction is useful for allowing the designer to evaluate the significance conditions of artifacts, by means of comparisons between effective (actual) actions performed by users and the designer’s proposal for ideal achievement of an agenda.

3.3. Conditions of artifacts

The last of the axes in the agentive approach to semiotic analysis are the conditions of artifacts. As stated in Section 2, the designer has assigned derivative agency to artifacts by means of the design of their functions. Artifacts are analyzed with respect to the actions required to fulfill their functions. This is a novel perspective on analyzing an artifact’s functions. According to agentive semiotics, actions are broken down into kinetic actions, intellective actions and expressive actions. These types of actions are present and interlink with one another in any artifact, but any one of them may receive more emphasis depending upon the artifact’s main function. Thus, there are artifacts that encourage intellective actions, e.g. electronic tablets, books, and calculators. Others bring about kinetic actions, such as locomotion, manipulation and exploration. Locomotive actions are present in bikes, cars, skates; manipulative actions are prompted by joysticks, keyboards, or hammers; explorative actions are evident in binoculars, flashlights, maps, etc. Finally, expressive actions can be found in guitars, rings, speakers, medals, etc. This taxonomy is useful for classifying artifacts according to their functions.

Artifacts are described in the agentive approach on the basis of multimodal or intermodal ways of obtaining information from the environment. The information captured is related to the sensory organs and encompasses smell, sound, sight, touch, and taste. For example, visual information allows an artifact to be described in terms of colour, shape, texture, position relative to body, position relative to other objects, movement, etc. Thus, the agentive approach always focuses on the agents and differs from conventional design descriptions that employ technical terminology.

Finally, to establish the artifact conditions, a designer analyzes the usability conditions. These are related to an artifact’s characteristics and features. Usability conditions are composed of conditions of use and conditions of functioning. The former are actions that the user has to take in order to make effective use of the artifact, while the latter are required to achieve the artifact’s goal.
If the agent performs the tasks properly, but the conditions of functioning do not obtain, the agent has not achieved the artifact’s function. For example, the conditions of use for a washing machine include the following usage sequence: connecting to electricity, pressing the on/off button, putting the laundry in, pressing the wash cycle button, and so on. A washing machine’s conditions of functioning are technical features and mechanisms that actually make the machine work. When the conditions of functioning fail (the washing machine does not work) for any reason, the user cannot achieve the agenda (washing clothes) even if the conditions of use are followed correctly. It should be noted that the conditions of attainment (Section 3.2) coincide with the conditions of use, but there is a subtle difference: the conditions of attainment are the ideal way of fulfilling an agenda; the designer proposes a method for doing so (time and resources including artifacts), but following this proposal is optional and users can possibly find a better way based upon their own agendas. On the other hand, conditions of use are the prescriptive way of using a given artifact. These conditions are stated in the user manual or instructions.
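The relation between the two kinds of usability conditions can be made explicit with a short sketch: the agenda is achieved only when the conditions of use and the conditions of functioning both hold. The names `UsageAttempt`, `CONDITIONS_OF_USE` and `agenda_achieved` are hypothetical, the list of steps is limited to those the text mentions (the source ends the sequence with "and so on"), and requiring an exact match with the prescribed sequence is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import List

# Prescribed usage sequence for the washing machine, as far as the text lists it.
CONDITIONS_OF_USE = [
    "connect to electricity",
    "press the on/off button",
    "put the laundry in",
    "press the wash cycle button",
]


@dataclass
class UsageAttempt:
    steps_performed: List[str]  # what the user actually did, in order
    machine_functions: bool     # whether the conditions of functioning obtain


def agenda_achieved(attempt: UsageAttempt) -> bool:
    """The agenda (washing clothes) needs both kinds of condition to hold."""
    use_ok = attempt.steps_performed == CONDITIONS_OF_USE  # assumed strict match
    return use_ok and attempt.machine_functions


working = UsageAttempt(list(CONDITIONS_OF_USE), machine_functions=True)
broken = UsageAttempt(list(CONDITIONS_OF_USE), machine_functions=False)
print(agenda_achieved(working), agenda_achieved(broken))
```

The second attempt corresponds to the case described above, in which the user follows the conditions of use correctly but the machine does not work.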


Figure 7. Significance conditions network (prepared by author based on Niño 2015)

In summary, the conditions of significance in artifacts give rise to a complex network of conditions, shown in Figure 7. The conditions of agential resolution are the way that an agenda is accomplished, and are partially defined by the designer. Conditions of usability are the sequence of use, namely, tasks or actions required to use a specific artifact. They also include the ontological grounds of this artifact (materials, shapes, mechanisms) as shown in the middle axis of Figure 7. These conditions are entirely dependent on the designer or design team. In addition, the lower axis shows the conditions of goal achievement. These depend on the user’s actions and the conditions of his agency.

3.4. Enaction complexity and taxonomy of failures

An additional element in the agentive approach that can be useful for design semiotics is the notion of enaction complexity. For agentive semiotics an enaction is any kind of response that an organism gives to/with/in the environment, which is consistent with use of the term in the literature (Varela, Thompson and Rosch 1991; Thompson 2007; Gallagher 2005, 2009). Thus, enaction as situated action is part of the ongoing agenda (Niño 2015: 23: 568). Enaction complexity is a scale from ease to complexity of use in any artifact. This consists of the inverse proportion between agentive resources and cognitive economy. An artifact will be easy to use (low enaction complexity) if the agentive resources tend to decrease, increasing its cognitive economy. Inversely, if the deployment of agentive resources tends to increase, and the cognitive economy is reduced, the artifact is hard to use (high enaction complexity), as shown in Figure 8.

Figure 8. Enaction complexity as the inverse proportion of agentive resources and cognitive economy (prepared by author based on Niño 2015)

For example, an electronic tablet requires high consumption of agentive resources (attention, finely controlled manipulation, memory, inferences), mainly during initial use, with resultant high enaction complexity. Each new use of the tablet enables entrenchment of the form of use and the cognitive load is thus reduced, so fluidity of action is achieved and cognitive economy increases. Enaction complexity is also related to the sequence of normative/flexible stages or tasks that the user must perform and to the quantity thereof. The tasks required to use a washing machine are prescriptive and sequential and the number of tasks is considerable, so enaction complexity is high. On the other hand, opening a water bottle is a simple task requiring minimal usage of agentive resources; but if the bottle is poorly designed and is hard to open, the demand for agentive resources increases (greater physical effort, more concentration, etc.) and the complexity of the enaction rises.

The last element of agentive semiotics to be described is the taxonomy of failures. Failures occur when significance matching is mistaken, namely when the significations attributed to a semiotic item do not coincide with its significance (see Figure 2). This taxonomy, shown in Figure 9, is useful for explaining mistakes, errors, misunderstandings and confusion when agents use artifacts. Failures of responsivity occur when the actual response does not match the grounded responsivity, for example, when the user does not know how to drive a car (lack of thematic expertise) and carries out mistaken actions. Failure in agentive skill is when the user knows how to drive but is unable to do so because he/she is handicapped as a result of an accident. An example of failure in the agentive disposition is when the user knows how to drive and is able to do so, but does not want to perform the action correctly. In turn, attention failure is when the user does not concentrate and thus crashes the car.
Failures of foundation occur when the ontological foundation, that is the physical features of the artifact such as materials, functions, etc., fails. There is a failure by non-foundation when the artifact does not work. Failure by atypical foundation is when an artifact works sometimes or in an unexpected way. Agential failures occur when the agent uses artifacts incorrectly. Errors in function attribution are, for example, when the user presses the wrong key when typing a text. An error due to contextual non-pertinence is when artifacts are used improperly, e.g. the user makes a call by means of a cell phone in a concert hall when the concert is in progress. Finally, axiologically inadmissible failure is when the artifact is used, for example, to kill someone. This detailed classification of failures is useful to the designer because he or she can establish the cause and type of failure and can redesign the artifact accordingly.

Figure 9. Failures in matching the significance of use (prepared by author based on Niño 2015)

In this manner, the complex network of artifact significance conditions can be analyzed. An understanding of the complex network of significance conditions paves the way to a greater insight into human actions. This point is very important for designers, because they must have a profound knowledge of users in order to successfully anticipate their responses. Case studies that take this approach are being developed by industrial design undergraduates at the Universidad de Bogotá Jorge Tadeo Lozano and by postgraduates at the Universidad Nacional de Colombia. Applications have been devised for artifacts such as digital tables, cutting tools, and vehicles.

4. Conclusions

Design semiotics with an agentive approach provides a fresh insight into users’ actions and experience of using artifacts. As shown in this chapter, the approach offers semiotic tools that permit, for example, analysis of types of failures from the perspective of agents, understanding of the conditions of agency, and study of the operations of the agentive realization. Studying and describing the complex network of artifact significance allows for greater insight into how users act and signify. Concepts such as analysis of memory systems, the distinctions between significance and signification, between agential and agentive, multimodal descriptions of artifacts, the notion of responsivity, and enaction complexity enable in-depth understanding of the variables involved in the meaning-making that emerges when agents use artifacts. As these variables have not yet been sufficiently analyzed and systematized, designers have not been able to appreciate the semiotic complexity of their field.

The presentation of agentive semiotics in this chapter offers a solid foundation for design semiotics, because it takes into account a range of issues concerning what an agent is and how he or she signifies. As expressed by Vihma (2007: 229) “the foundations, which support an interpretation, actually become the most important issue with respect to design, and not, for example, efficient functioning or low cost, which also have to be taken into consideration.” The tools and procedures of agentive semiotics serve to enhance the design process, allowing the designer to obtain tools to conceive better and more efficient artifacts based on a more thorough understanding of the foundations of users’ signification processes.

Michael May, Karen Skriver & Gert Dandanell

Chapter 6
Towards a Cognitive Semiotics of Science: The Case of Physical Chemistry

1. Introduction: The theoretical landscape

1.1. Situating a cognitive semiotics of science among other approaches

Cognitive semiotics is an emerging interdisciplinary field combining theories and methods of cognitive science, phenomenology, linguistics and semiotics in the study of meaning (Sonesson 2012; Zlatev 2012). One of the key ambitions of cognitive semiotics is to bridge the gap between the humanities and the natural sciences, and one way of contributing to this bridge building is to move the investigation of meaning into the core of the natural sciences as disciplinary discourses by analysing issues of meaning and conceptualization in specific scientific topics and domains. Using the development of chemical reaction kinetics and chemical thermodynamics as an example domain, we argue here that didactic aspects of teaching and learning science should be understood in close relationship to the history and philosophy of the domain.

Another ambition of cognitive semiotics – following the methodological “triangulation” of first person “subjective”, second person “intersubjective” and third person “objective” perspectives (Zlatev 2012) – is to include the complexity of first, second, and third person knowledge and experience in the explication of reality, rather than enclosing science and philosophy within an imagined third-person objectivity. What is ultimately at stake here is the question of how to overcome the gap between phenomenology and cognitive science (Gallagher and Zahavi 2008). We will attempt to demonstrate in an example how perspectivization is necessary in order to understand the relation of students’ conceptualizations of physical chemistry to the construction of objectivity in physical chemistry as a science.
The study of meaning and signification in science has been elaborated within semiotics, linguistics and cognitive science, as well as within the history and philosophy of science, but seen synoptically these approaches can appear fragmented and not likely to contribute to a common interdisciplinary understanding of science and science teaching. The theoretical landscape will be briefly described below as potential contributions to a “semiotics of science” not yet integrated into a cognitive semiotics of science.



1.2. Science and science learning in semiotics

Charles S. Peirce (1839–1914) already formulated a semiotics of science insofar as his contributions to the pragmatic logic of inquiry and the philosophy of science became embedded in his conception of signs and signification (Hookway 1985). In a broad sense, his philosophy of inquiry was also taken into the domain of education through the influential work of John Dewey (1859–1952), who was a student of Peirce, although he did not follow Peirce’s semiotic conception of inquiry. Peirce wrote about the classification of the empirical sciences and contributed to the philosophy of mathematics, but he did not exploit his semiotic conception of inquiry in an empirical analysis of the “special sciences”. The main contribution of Peirce to a semiotics of science is his analysis of forms of representation according to the classification of signs and the process of inquiry, and more specifically the role of diagrammatic reasoning (Peirce 1906; Stjernfelt 2007).

The point of view presented by Charles W. Morris (1901–1979) in his Foundations of the Theory of Signs (Morris 1938), his contribution to the joint programmatic ambition of Otto Neurath (1882–1945) and Rudolf Carnap (1891–1970) towards a “Unified Science”, was that semiotics would provide an “organum” for the sciences. The ambition was that semiotics would serve as a conceptual instrument for the empirical sciences by showing how observations and empirical data have to rely on sign relations, and furthermore it would make it possible to investigate how specific sciences rely on natural language and mathematics for the expression of concepts, theories and models (Morris 1938: 56–57). This project was, however, never realized. Even today a “unification of the semiotical sciences” as stipulated by Morris would run aground on the many discrepancies in the conceptual foundations of semiotic research traditions (e.g. logical, structural, biosemiotic) – including the distortion of Peirce’s pragmatism towards logical empiricism brought about by Morris himself (cf. Rochberg-Halton and McMurtrey 1983).

1.3. Science and science learning in semiology and functional linguistics

Following the stipulation of Ferdinand de Saussure (1857–1913) in his foundational lectures on linguistics of a “semiology” that would study the extensions of the language system to other sign systems and be a part of social psychology (Saussure 1976 [1916]), the Danish linguist Louis Hjelmslev (1899–1965) developed a semiotic theory of language (“glossematics”) based on the double distinction between content/expression and form/substance. This follows Saussure’s conception of language, but Hjelmslev added the idea that semiotic systems can be structured on multiple layers (“planes”) of content and expression. According to Hjelmslev natural languages are the first objects of semiotic analysis, but the paradigmatic and syntagmatic analysis of languages can be extended to other types of semiotic systems. The systems that can be analysed on two singular levels, a plane of expression and a plane of content, are called denotative languages, whereas the more complex non-denotative systems can be analysed as either connotative or meta-semiotic. A connotative semiotic adds a secondary plane of content to a system that is itself a full language; that is, the plane of expression of this new system is a language. A meta-semiotic adds a secondary plane of expression to a language; the plane of content of this new system is a language (Hjelmslev 1943, 1963). This idea was later used (rather crudely) by Roland Barthes (1915–1980) to analyse myths and ideology (Barthes 1967): “ideology” would be the additional content plane of a rhetorical layer of connotation, whereas “science” would be a meta-semiotic adding a technical layer of expression on top of natural language.

In the context of a possible “semiotics of science”, the systemic functional linguistics of Michael A.K. Halliday (born 1925) and its extension into a social semiotics of multimodal communication by Gunther Kress (born 1940) have to be mentioned. However, it has been difficult to integrate the work within this approach to semiotics with other theoretical traditions because of its idiosyncratic use of novel concepts, as well as a lack of effort in relating systematically to other developments in linguistics (e.g. linguistic pragmatics, cognitive linguistics) or even its own foundations in Hjelmslev and structural linguistics (Bache 2010; Butler 1988; Taverniers 2011). Despite the relative lack of interest in developing a semiotics of science within other branches of linguistics, the contributions of Halliday, e.g. “On the Language of the Physical Science” (1988), “On the Grammar of Scientific English” (1997), and “Things and Relations: Regrammaticising experience as technical knowledge” (1998) should nevertheless be recognised (Halliday 2004; Martin and Veel 1998).
One of Halliday’s observations is – almost as an echo of Benjamin Lee Whorf¹ (1897–1941) – that “a natural language embodies, in its grammar, a theory of human experience”, whereas “a scientific theory differs from this in that it is a dedicated and partially designed semiotic subsystem which reconstrues certain aspects or components of human experience in a different way, in the course of opening them up to be observed, investigated and explained” (Halliday 2004: 59). Although this could be a promising point of departure, Halliday has mainly been interested in processes of nominalization that lead to the construction of the extended vocabulary of technical English. A related interest of Halliday is the possible “ideological” implications of “metaphorical grammar”, where experiences are re-phrased and meanings shifted when e.g. verb phrases are transformed into nominal phrases. The recent development of a social semiotics of multimodal communication elaborates on the systemic functional linguistics of Halliday (Kress 2010), but it maintains the unfortunate isolation from the wider community of linguistics and semiotics. Once again it is necessary to acknowledge the work being done, because it has established itself as the main approach involved in the empirical investigation of educational processes from a semiotic point of view (Kress, Jewitt, Ogborn, and Tsatsarellis 2014; Kress and Selander 2012). Our aim here is to indicate an alternative approach within the framework of cognitive semiotics.

¹ Whorf was a controversial American linguist researching Native American languages like Hopi, Nahuatl, Shawnee and the Maya language, and theorizing about the interconnected nature of language and thought (Whorf 1956). He anticipated cognitive linguistics in many ways (Lee 1996). Significantly, he also wrote about the language of science from this perspective, e.g. “Science and Linguistics” in 1940 (Whorf 1956).

1.4. Science and science learning in cognitive science

With the influence of cognitive science in the 1980s, a shift in focus towards constructivist learning theories occurred within educational studies. The focus on conceptual difficulties in student learning of e.g. physics and chemistry generated new types of didactic and cognitive research such as phenomenographic field studies of students’ individual partial conceptions of scientific concepts and models. A paradigmatic example was the analysis of the Mole concept² in chemistry (Lybeck, Marton, Strömdahl, and Tullberg 1988). The aim was to uncover the reasoning patterns and concepts implied by high school students in basic chemistry. From interviews, researchers constructed a collective map of the conceptual relations involved in students’ reasoning, as well as individual maps where some concepts or relations could be missing. Individual conceptions of the Mole in their relations to other concepts (molar mass, Avogadro’s number, atomic mass, volume) could a posteriori be constructed as partial representations of the complete map of conceptual relations constituting the concept.

There is, however, an unresolved tension in phenomenography between the epistemic issue of the Mole as a scientific concept, and the Mole as an empirically constructed collective representation distributed over a group of individuals. This tension is not unlike the one created by Saussure in his stipulation of linguistics as a science: that the language system (la langue), as opposed to individual acts of speech (parole), is not complete in anybody but “only exists perfectly within a collectivity” (Saussure 1976: 30), i.e. as distributed over the mental space of language users.
This tension is still an issue in modern linguistics, because this is where some forms of cognitive linguistics such as construction grammar will diverge from structural linguistics: there is no need to postulate high-level grammatical categories (as an epistemic object of linguistics) that do not actually appear as units of speech (Croft and Cruse 2004). In a similar way, we should not equate the collective map of the Mole concept with the scientific concept, because the process of gradually constructing relevant conceptual relations from the first-person perspective of a learner is different from the established third-person perspective of a scientific community of practice, and it will also be different from its gradual development within a dialogical second-person perspective in the history of science. What is common to those situations is not the “collective map” but the differential grounding of conceptual relations in actual problem situations. How these situations might be related is an empirical issue that can be investigated from the point of view of didactics, cognitive semiotics and the history of science.

² As a concept a mole refers to an amount of substance (not to be confused with weight). A mole is also a fundamental unit in chemistry and traditionally associated with Avogadro’s number (approximately 6.022 × 10^23). This is the number of particles, atoms, ions or molecules in 1 mole of a given substance, and it is important for the stoichiometric computations for the substances entering into and being produced in chemical reactions. The amount of substance in 1 mole is a fixed number: 1 mole of water will contain the same number of molecules as the number of hydrogen atoms in 1 mole of hydrogen, but 1 mole of water will be heavier than 1 mole of hydrogen. The name was introduced in chemical terminology by Wilhelm Ostwald and derived from the German word “Molekül” (molecule).

After the “cognitive turn” a secondary shift in pedagogical theories occurred towards more dialogical and social dimensions of science learning, and this has led to a renewed interest in John Dewey and Lev Vygotsky (1896–1934). It has also led to a recognition of semiotic aspects of science learning such as the scientific literacy problems associated with the acquisition and mastery of multiple representational forms within the disciplinary discourses of, for instance, physics (Linder 2013) and biochemistry (Schönborn and Anderson 2006). Educators and researchers find themselves dealing with didactic difficulties beyond the purely disciplinary content insofar as these difficulties require the linguistic forms of content knowledge (i.e. its lexical, grammatical, narrative and rhetorical forms) to be analysed and planned for in their own right. It is an important point that different representational forms have different disciplinary affordances for supporting student learning (Ainsworth 2006; Fredlund, Airey and Linder 2015; Fredlund, Linder, Airey, and Linder 2014).
In the following section selected aspects of the history and didactics of physical chemistry are introduced as a case study. After a historical introduction to the conceptual development of chemical reaction kinetics, we will turn to the issues of scientific language and imagination as conceptualized by the founding fathers of physical chemistry.

2. Chemical reaction kinetics: a historical introduction

Chemical reaction kinetics is the study of the rates of chemical reactions and their dependence on e.g. temperature and concentrations of the involved substances. As a topic it is a part of the domain of physical chemistry, and it arose in scientific communities during the second part of the 19th century. There were a number of inherent difficulties in arriving at valid rate laws for chemical reactions, and these difficulties reappear recurrently in chemistry teaching in high schools and universities. It is this entanglement of conceptual difficulties in the history of physical chemistry and the conceptual difficulties of students in learning the topic that is seen here as significant from the point of view of a cognitive semiotics of science, and it is taken as an argument to the effect that topics in science should be taught together with aspects of their historical development. Scientific literacy issues and representational competencies have to receive separate attention in the teaching of disciplinary content – which is generally not the case today (Osborne 2002).



Without assistance students do not always take notice of the literacy issues posed by the multiple representational forms they encounter in science learning. As one student (who did focus on the literacy issue) said in discussing the difficulties of learning biochemistry: “It is like learning a new language”.

2.1. The foundation of chemical kinetics

In Norway the mathematician Cato Gulberg (1836–1902) and his chemist brother-in-law Peter Waage (1833–1900) formulated a law of mass action according to which chemical reaction “forces” are proportional to the product of what they called the “masses” of the reactants (Waage and Gulberg 1986, 1864). Right away we should note the importance of reflecting on scientific language since the “forces” and “masses” referred to here can only be understood in the context of Newtonian mechanics as a dominant paradigm of science in the 19th century. The point of departure for Gulberg and Waage was a dissatisfaction with the available electrochemical and thermochemical theories for explaining chemical reactions, and based on an implicit analogy with mechanical physics they pointed a way forward for understanding the “forces” of chemical reactions by studying purely quantitative aspects of chemical reactions in the hope of discovering empirical regularities. Between 1862 and 1864 they conducted approximately 300 quantitative investigations of reaction rates under laboratory settings, and based on their observations they introduced the following conceptual abstraction:

If we maintain that for a given chemical process two opposing forces are in effect, one which strives to form new substances and one which strives to restore the original compounds from the new, it is enlightening that, when in the chemical process these forces become equally large, the system is in equilibrium. (Waage and Gulberg 1986, 1864: 1045)

Waage and Gulberg had realized that reaction rates were dependent on concentrations, but also that concentrations as well as reaction rates can change in the course of a chemical reaction. The role of “active masses” (concentrations) was already known at the time, but couched in a theory of chemical “affinity”. In 19th century chemistry, it was common to refer vaguely to “affinities” and “attractions” when speaking of the observed tendency of chemical substances to combine, and it was known that substances participated in chemical reactions according to their “affinities”, i.e. in given proportions. However, these “affinities” were not usually quantified, and no explanatory function was associated with the concept (Duncan 1996). The type of explanation that would count as scientific was, however, given by Newtonian mechanics, and it could accordingly take the form of a mathematical law or the form of a mechanical explanation. It was in this context that Waage and Gulberg concluded that the available knowledge about the role of electrical forces and “heat” in chemical reactions was insufficient to establish an explanatory theory, but that it would be possible to work out an empirical law for reaction rates and their dependence on the “active masses” of the involved substances.

They focused on the importance of the concept of chemical equilibrium: many chemical reactions take place as reversible processes, where forward and reverse reactions occur simultaneously, and when these processes find an equilibrium, it means that forward and reverse reactions occur at the same rate. In introductory chemistry teaching, equilibrium is sometimes misunderstood by students as a static state where no processes occur, and this was in fact a widespread belief in earlier chemistry (Laidler 2000). For a simple reaction with two substances A and B being transformed into the products C and D with stoichiometric coefficients a, b, c, and d, a reversible reaction can be written as in (1). The law of “mass action” would then predict an equilibrium constant Keq for this process to be as in (2), where the brackets indicate concentrations of the involved substances.

(1) $aA + bB \rightleftharpoons cC + dD$

(2) $K_{eq} = \frac{[C]^c [D]^d}{[A]^a [B]^b}$
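To make the shape of the equilibrium expression concrete, the following minimal Python sketch evaluates it for the generic reaction aA + bB ⇌ cC + dD: the product concentrations, each raised to its stoichiometric coefficient, are divided by the corresponding reactant terms. The function name `equilibrium_expression` and all concentrations are hypothetical and chosen only for illustration; they are not taken from the work of Waage and Gulberg.

```python
from math import prod

def equilibrium_expression(reactants, products):
    """Products over reactants, each concentration raised to its coefficient.

    Each argument is a list of (concentration, stoichiometric_coefficient) pairs.
    """
    numerator = prod(conc ** coeff for conc, coeff in products)
    denominator = prod(conc ** coeff for conc, coeff in reactants)
    return numerator / denominator

# Invented equilibrium concentrations (mol/L) for aA + bB <=> cC + dD
# with a = 1, b = 2, c = 1, d = 1. These numbers are assumptions.
k_eq = equilibrium_expression(
    reactants=[(0.50, 1), (0.20, 2)],
    products=[(0.10, 1), (0.40, 1)],
)
print(f"K_eq = {k_eq:.3f}")
```

Changing a coefficient changes the exponent of the corresponding concentration, which is the only place where the stoichiometry enters the expression.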

The “law of mass action” is still taught in high school chemistry, because it correctly states the equilibrium constant for simple reactions, but it is arrived at through an argument about rate laws that is not generally valid (Laidler, Meiser and Sanctuary 2003). In order to explain this, we will go back to the original discovery of reversible reactions and its relation to the concept of a chemical equilibrium. The idea of a chemical equilibrium in reversible reactions had been introduced at the beginning of the 19th century by Claude Berthollet (1748–1822), a professor of chemistry at the École Normale Supérieure in Paris. Berthollet knew a one-way reaction from the laboratory, where sodium carbonate and calcium chloride (both dissolved in water) would react and form calcium carbonate and sodium chloride (ordinary salt). This exothermic (heat producing) reaction can be written as in (3).

(3) $\mathrm{Na_2CO_3 + CaCl_2 \longrightarrow CaCO_3 + 2\,NaCl}$

In 1798, Berthollet accompanied Napoleon as a scientific advisor on an expedition to Egypt, and there he noticed large deposits of sodium carbonate (“soda”) around the edges of local salt lakes. This caused him to realize that these deposits of $\mathrm{Na_2CO_3}$ had been formed by a reverse process to the one he knew from the laboratory, and this process was caused by the excessive concentration of salt in the lake exposed to intense heat and evaporation of water. Concentrations could act as a kind of force “pushing” the reaction in the reverse direction. According to Berthollet, chemical “affinities” were a manifestation of a universal attraction, and in this way chemistry was articulated conceptually according to the ideal of Newtonian physics (Califano 2012). Back in France, Claude Berthollet published his findings in Essai de statique chimique (1803) (Holmes 1962; Weller 1999).

It was with a background in Berthollet’s discoveries that half a century later Waage and Gulberg investigated the equilibrium constants of different reversible processes and formulated the law of mass action. Conceptually, the idea of an equilibrium constant arose from the proportional dependency of both a forward and a reverse reaction. For both reactions viewed in isolation the reaction rate was formulated as proportional to what Waage and Gulberg called “active masses”, with each concentration raised to the power of its stoichiometric coefficient in the process, as in (4).

(4) rate of forward reaction $= k_f [A]^a [B]^b$
    rate of reverse reaction $= k_r [C]^c [D]^d$
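The two rate expressions in (4) can also be explored numerically. The sketch below is an illustration rather than a reconstruction of any historical calculation. It assumes a single-step reaction A + B ⇌ C + D (so a = b = c = d = 1), invented rate constants and starting concentrations, and a simple Euler integration; the function name `simulate` is likewise an assumption.

```python
def simulate(kf, kr, a0, b0, c0=0.0, d0=0.0, dt=1e-3, steps=20000):
    """Euler integration of A + B <=> C + D with mass-action rates."""
    a, b, c, d = a0, b0, c0, d0
    for _ in range(steps):
        forward = kf * a * b          # rate of forward reaction, k_f [A][B]
        reverse = kr * c * d          # rate of reverse reaction, k_r [C][D]
        net = (forward - reverse) * dt
        a -= net                      # reactants are consumed by the net change
        b -= net
        c += net                      # products are formed by the net change
        d += net
    # Report the final concentrations and the two rates at those concentrations.
    return a, b, c, d, kf * a * b, kr * c * d

# Hypothetical rate constants and initial concentrations (arbitrary units).
kf, kr = 2.0, 0.5
a, b, c, d, forward, reverse = simulate(kf, kr, a0=1.0, b0=1.0)

print(f"forward rate = {forward:.4f}, reverse rate = {reverse:.4f}")
print(f"[C][D]/([A][B]) = {c * d / (a * b):.4f}, kf/kr = {kf / kr:.4f}")
```

In such a run the forward and reverse rates end up equal but not zero, and the ratio [C][D]/([A][B]) settles at k_f/k_r. The concentrations stop changing although both reactions continue, which is one way of picturing an equilibrium that is static at the macroscopic level and dynamic at the microscopic level.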

At the point of equilibrium, the rate of the forward and reverse reactions will be exactly the same, and this is how the equilibrium constant Keq in (2) is derived as a fraction, i.e. by isolating the two rate constants kf and kr on the left side of the equation and isolating the “active masses” on the other side. There is a philosophically important point in the concept of chemical equilibrium with regard to the separation of a macroscopic level of observable or measurable phenomena and the microscopic level of the dynamics implied in explaining chemical reactions. Equilibrium is, in a sense, static at the macroscopic level, i.e. when it is attained it does not change unless something disturbs the chemical process, whereas it is dynamic at the microscopic level. This is what is sometimes misconceived by students in introductory chemistry, when they believe that equilibrium means that processes have come to a halt.

The problem with the formulation of Waage and Gulberg lies in the so-called rate laws in (4). They are true for simple reactions with no intermediate steps, where the stoichiometric coefficients will be numerically identical to the exponents of the concentrations in the rate laws, but this will not generally be the case. Since students, however, are introduced to the simple cases first, they are very likely to make the false assumption that the rate laws can be seen directly in chemical equations. Learning that reaction rates have to be measured experimentally, i.e. that they cannot be inferred directly from the stoichiometric expressions, is therefore a critical point. This turns into a literacy issue that is aggravated by some textbooks, because they repeat the familiar law of mass action (where reaction rates appear to be determined from reaction equations) while at the same time stressing that reaction rates have to be measured experimentally. If this discrepancy is not addressed explicitly, students will be confused, i.e.
they will not understand why the experimental procedure is necessary.

Other contributions to the growing field of physical chemistry were made by the French chemist Henry Louis Le Châtelier (1850–1936) who investigated the conservative nature of chemical equilibrium under disturbances (expressed in “Le Châtelier’s Principle”), and the Swedish chemist Svante Arrhenius (1859–1927) who investigated the temperature dependency of chemical reaction rates (expressed in “the Arrhenius equation”). Physical chemistry as a coherent and mathematically expressed domain was, however, established on the basis of these developments as well as the independent discoveries of the Dutch chemist Jacobus van’t Hoff (1852–1911), working in collaboration with the Latvian-German chemist Wilhelm Ostwald (1853–1932). Jacobus van’t Hoff published his research in chemical kinetics in an influential volume on Études de dynamique chimique (Studies in Chemical Dynamics) in 1884, and in 1887 he and Ostwald jointly published the first journal in the domain, the Zeitschrift für physikalische Chemie³. van’t Hoff linked chemical kinetics to thermodynamics, and he introduced the mathematics of chemical reaction orders, although the final clarification of reaction orders was due to Ostwald. It was also Ostwald who made the early contribution of Waage and Gulberg in chemical kinetics known to van’t Hoff. Wilhelm Ostwald made many significant contributions to physical chemistry including the important discovery that the role of catalysis in chemical reactions was only to change the rate of reactions and not alter the equilibrium.

The fundamentals of chemical thermodynamics had been established independently by the American engineer Josiah Willard Gibbs (1839–1903), but his work remained relatively unknown in Europe until it was translated into German by Ostwald in 1883, and the work of Gibbs had a different perspective on the topic of thermodynamics compared to van’t Hoff. Gibbs provided a foundation of physical chemistry in theoretical thermodynamics using abstract concepts of energy and entropy, whereas van’t Hoff was pragmatically oriented towards experimental measurements relevant for the working chemist (Kragh 2001; Van Houten 2001).

Figure 1. Left: Jacobus van’t Hoff (on the left) visiting Wilhelm Ostwald in his laboratory in the Department of Physical Chemistry at Leipzig University (around 1900). Right: van’t Hoff cardboard molecular models, Museum Boerhaave, the Dutch National Museum for the History of Science and Medicine, Leiden. Creative commons (Dragicevic and Jansen, 2012).
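Since reaction orders recur in this account as something that has to be measured rather than read off the chemical equation, a small numerical sketch may help fix the idea. It assumes a rate law of the simple form rate = k[A]^n and estimates the order n as the slope of log(rate) against log([A]) by least squares. The data are invented for a hypothetical reaction written as 2A → products, and the function name `estimate_order` is an assumption introduced for illustration only.

```python
from math import log

def estimate_order(concentrations, rates):
    """Least-squares slope of log(rate) against log(concentration)."""
    xs = [log(c) for c in concentrations]
    ys = [log(r) for r in rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Invented "measurements" for a hypothetical reaction written as 2A -> products.
# The data are generated with rate = 0.3 * [A]**1, so the underlying order is 1
# even though the stoichiometric coefficient of A is 2.
concs = [0.1, 0.2, 0.4, 0.8]
rates = [0.3 * c ** 1 for c in concs]

order = estimate_order(concs, rates)
print(f"estimated order with respect to A: {order:.2f}")
```

With these made-up numbers the estimated order is 1 while the coefficient in the written equation is 2, which is the kind of mismatch that makes experimental determination necessary.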

2.2. The role of imagination and language in science: J. van’t Hoff and W. Ostwald

From the point of view of a cognitive semiotics of science, the interdisciplinary interests of Ostwald and van’t Hoff are significant. Besides his chemical kinetics, van’t Hoff made important contributions to stereochemistry, and he pioneered the use of spatial models to represent molecular structures (Figure 1, right). His work in La Chimie dans l’espace (1878) was harshly criticised by one of the important contributors to organic chemistry, Hermann Kolbe (1818–1884), who found that his speculative use of spatial models was “quite incomprehensible to the sober chemist”, and Kolbe even ridiculed van’t Hoff’s teaching of applied chemistry to veterinary students in Utrecht. Kolbe claimed that the theory of spatial structures did not count as “exact chemical research”, but had to be the result of a preference for poetic speculation: according to Kolbe, van’t Hoff had found it “more convenient to mount Pegasus (borrowed, no doubt, from the Veterinary School) and to proclaim in his ‘La Chimie dans L’Espace’ how on his daring flight to the chemical Parnassus the atoms appeared to be arranged in space” (Benfey 1960). This personal attack actually contributed to making van’t Hoff famous in the European chemical community.

³ The Nobel prize in chemistry was given in turn to the founding fathers of physical chemistry: first to van’t Hoff in 1901, then to Arrhenius in 1903, and finally to Ostwald in 1909. Ostwald not only created the first journal of physical chemistry, but also headed a Department of Physical Chemistry at Leipzig University (Van Houten, 2002).

In an inaugural talk for a new position as professor of chemistry at the University of Amsterdam in 1878, van’t Hoff chose to speak – not directly of his contributions to physical chemistry – but of “The Role of Imagination in Science” (van’t Hoff 1878). This apparently surprising topic was motivated by Kolbe’s attack. Rather than excusing his “poetic” exploration of spatial structures, he listed a number of cases from the history of science, where the creative or even artistic production of scientists had played an important role in their scientific discoveries. Imagination (in the Dutch original, “De Verbeeldingskracht”) should be seen as necessary to scientific discovery, because observation of scientific facts is not even possible without an imaginative anticipation of events and recognition of similarities with past events, nor without a subsequent simplification and conceptualization of the observed phenomena.
Specifically, van’t Hoff addresses the role of imagination in explicating the relation of cause and effect in the sciences, and imagination is understood as “the capacity to visualize a particular thing so clearly, that all its properties can be recognized with the same certainty as if the object was directly observed” (van’t Hoff 1878). He stresses what Charles S. Peirce would have called abductive reasoning, i.e. the role of imagination in hypothesizing about possible causes behind observed regularities. One of van’t Hoff’s examples was the work of the German chemist Eilhardt Mitscherlich (1794–1863), who had noticed similarities between different salts such as ammonium arsenate (NH₄)₃AsO₄ and ammonium phosphate (NH₄)₃PO₄. These are different chemical compounds with different affordances and properties (e.g. ammonium arsenate is highly toxic and ammonium phosphate is highly unstable), but what was striking to Mitscherlich was that the structural analogy of the composition of the two compounds seemed to be approximately reflected in a similarity of their crystalline structures. The visual similarity of the salts as crystalline structures was therefore hypothesized by Mitscherlich to be an effect of an underlying structure at the molecular level, and he stated this as an (approximate) empirical law of isomorphism (Freund 1904; Ihde 1964). As stated by van’t Hoff, imagination was needed for this discovery to be made: “if the crystal form of the first was not vividly present to him during the observation of the second [compound], the correspondence between them could not have occurred to him” (van’t Hoff, 1878). More importantly, however, imagination had to play a role in the formation of the fundamental analogy between the microscopic level of structural organization and the macroscopic level of observable crystalline forms. The discovery of Mitscherlich was important for van’t Hoff, because it was a precursor to his own discovery of 1874, in which he explained the isomerism⁴ of certain organic compounds in terms of the stereochemical properties of carbon bonds (Ramsay 1981).

With his emphasis on the importance of imagination in scientific discovery, van’t Hoff implicitly recognized its importance for teaching and understanding chemistry. He noted that imagination would not always have a constructive role, but could find a “pathological” expression in superstition and hallucination⁵, and he also indicated that its role in modern science had shifted with the extensive use of collaborative laboratory work: “the imagination can now be replaced by the sacrifice of a great amount of labour” (Benfey 1960).⁶ As noted above, van’t Hoff pioneered the use of spatial models to represent molecular structures. He introduced diagrammatic representations of molecular structures and also produced physical models as aids to visualization (Figure 1, right). The latter were actually meant to assist his research colleagues (van der Spek 2006), but they would have also played a didactic role in teaching organic chemistry. The importance of molecular models in chemistry was underestimated or even “forgotten” in classical philosophy of science, although they played an important role in chemical scientific practice (Francoeur 1997). This is not the case in more recent cognitively oriented philosophy of science, where the role of conceptual tools and material artefacts in scientific modelling is seen, in different ways, as a central topic (Baird 2004; Giere 1992, 2006; Nersessian 2008).

Wilhelm Ostwald, for his part, was deeply involved in philosophical aspects of natural science.
He wrote a treatise on Natural Philosophy (Ostwald 1910), and he defended a form of “energetics” that denied the material existence of atoms. Stipulating energy as more fundamental than matter was not unusual at the time, but his opposition to a mechanical world view went further into a unitary foundation of all sciences including not only physics and chemistry, but also biology and sociology. As a chemist he paradoxically wanted a “Chemie ohne Stoffe”, i.e. chemistry without substances (Holt 1970). The “ideological” conception of an “energetics” opposed to mechanical physics can be seen as a part of the perceived “crisis of physics” at the end of the 19th century, but it was widely opposed by leading German physicists including Ludwig Boltzmann (1844–1906) and Max Planck (1858–1947) (Kragh 2015). His anti-atomism was effectively refuted by Albert Einstein’s famous 1905 paper on observable Brownian motion, where the jerky motion of small particles in liquid suspension was interpreted as a consequence of molecular motion according to the kinetic theory of heat (Renn 2005). From our present perspective it can be difficult to understand the position of Ostwald, but it should be noted that chemists debated the existence of atoms throughout the 19th century. This is an expression of a difference in the level of reality articulated by physics and chemistry: for the chemist at the end of the 19th century it could still make sense to treat atoms and molecules as mathematical fictions or conceptual artefacts, but this was at a time when experimental physicists like J.J. Thomson (1856–1940), Ernest Rutherford (1871–1937), Wilhelm Röntgen (1845–1923) and Marie Curie (1867–1934) were already doing things to atoms⁷ (Pais 1982).

As a chemist and natural philosopher Wilhelm Ostwald took a practical interest in science as a language, an interest that also grew out of his work in translating key papers in chemistry into German, as he had done with Guldberg and Waage, and later with the work of Gibbs on thermodynamics.

4 Isomerism had been introduced as a concept by the Swedish chemist Jöns Jacob Berzelius (1779–1848) to describe the existence of chemically different substances, isomers, that have the same composition but different structures and different chemical properties. A simple example is the chemical composition C₂H₆O, which has two isomers: the liquid we know as ethanol (or ethyl alcohol) with the structure CH₃CH₂OH, and the gas dimethyl ether with the structure CH₃OCH₃. Isomers are an important concept in organic chemistry because of the spatial combinations made possible by carbon bonds.

5 A more mundane expression would be the conceptual mistakes resulting from misleading analogies such as the one discussed in Section 3.2 on the concept of an activation energy barrier.

6 This could be read as a sarcastic reply to the criticism of Kolbe as an experimental scientist: a lack of imagination might be compensated by a massive amount of laboratory work, i.e. investigating “in all directions and by trial and error” might eventually produce results that could have also been arrived at through abductive reasoning!
Ostwald saw that science suffered from a language barrier created by the national languages used by his colleagues in chemistry, and he saw the need to establish an international language for science. Beyond the language barrier, however, Ostwald identified another problem: natural languages had evolved for practical communication in everyday situations, but they were inconsistent in their grammar and inherently vague in their semantics. Science needed a special language to supplement mathematics, i.e. a universal world language (“Weltsprache”) that had to be constructed for the specific purpose of scientific communication (Gordin 2015). This should remind us of the rationalist dream of inventing a logical language of conceptual calculations proposed by Gottfried Wilhelm Leibniz (1646–1716) in his reflections on a “characteristica universalis”, but also of the mythological conceptions of language in a perfect state of a transparent clarity and unity of language and thought in an “Adamic language” before Babel (Eco 1995).

Ostwald was a prominent member of the international language movement that debated the use of constructed languages (e.g. Volapük, Esperanto, Ido) for science communication and the standardization of scientific terminology and nomenclature. He contributed to a report on “Considerations on the Introduction of an International Language into Science” with the French mathematician Louis Couturat (1868–1914), the Danish linguist Otto Jespersen (1860–1943), the Austrian electrochemist Richard Lorenz (1863–1929), and the Austrian physical chemist Leopold Pfaundler (1839–1920). Ostwald’s contribution was a note on the importance of standardized chemical nomenclature (Couturat, Jespersen, Lorenz, Ostwald, and Pfaundler 1910). He optimistically indicated “that the problem of an international language has already been partly solved in science.” Ostwald argued that van’t Hoff had treated chemical reactions without naming the chemical substances, “considering that his meaning would be much better conveyed by the corresponding structural formulae.” His valid point is that chemical formulas are recognizable to the chemist without naming the substances in any particular natural language. However, he also claimed that this would work “without any such words existing at all”, and this cannot be true, since the chemical community would still need to identify and explain their topics in language in order to provide more information about them.

7 One should reference here the famous phrase of “entity realism” by Ian Hacking recalling an experiment at Stanford University that convinced him of the reality of electrons. He had observed the manipulation of a macroscopic object in an electrical field through a process of “spraying” electrons on the object, and how this affected the behaviour of the object: “So far as I’m concerned, if you can spray them, then they are real” (Hacking 1983). Although electrons might have started out as conceptual artefacts in physics, this changed when we began to use electrons as tools in experiments and engineering: they are considered real when we can use them instrumentally. This instrumental form of realism might, however, be too narrow to account for scientific practice, and in another sense it could be illusory, because manipulation without specific knowledge of what is being manipulated cannot be sufficient for a sound scientific realism (Gelfert 2003).
Expressions in mathematics, chemical symbols or diagrams will need natural language for science communication to work, so seeing chemical notation in itself as an international language of science is somewhat misleading. Ostwald discussed the necessity of “fixating” concepts of scientific language in a manner reminiscent of Peirce’s early ideas on “fixation of belief” (Peirce 1877), and he mentions that “the concepts of logic and the theory of cognition” require the same process of fixation. Ostwald must have known Peirce through William James during his stay at Harvard University, where he lectured in 1905–1906, and he might have been introduced to Peirce’s logic through Louis Couturat, who treated Peirce in his studies on algebraic logic. In the later part of the 20th century, English established itself as a de facto international language of science, thus making the quest for an international language apparently obsolete, although globalization by itself has not resolved the conceptual aspects of scientific literacy.⁸

As a natural extension of his interests in translation, publication, international language and nomenclature in chemistry, Ostwald involved himself in didactic transpositions of chemistry to the public such as his “Conversations on Chemistry” (1905). Furthermore, he worked actively on the foundation of a world archive for arts and sciences called Die Brücke (The Bridge), i.e. something like Wikipedia before the internet! Ostwald was explicitly pursuing a kind of combinatorial logic of knowledge with reference to Leibniz, but with the initiative of The Bridge he was also pragmatically proposing a chemical analogy to knowledge construction and science communication: he invented a standardized form for the reuse of small pieces of knowledge to be used individually, i.e. as what we today call “learning objects”; pieces that would also be a part of an illustrated encyclopaedia, a “world brain”, i.e. what is effectively realized today in collaborative efforts such as Wikipedia and Khan Academy (Hapke 1999, 2012). For the visionary Ostwald, it was a way of liberating “mental work” from “mechanical labour” and providing a metaphorical link to his philosophy of “energism” (Holt 1970). The international language movement and associated initiatives like The Bridge were, however, effectively dismantled by World War 1, and they never regained their strength after the war. A temporary strengthening of national scientific languages appeared in the context of ideological projects such as the Stalinist “Soviet Science” and the “Aryan Physics” or “Deutsche Physik” of Nazi Germany, but these projects collapsed together with the political regimes that had fostered them (Gordin 2015).

8 Translation of science into national languages such as French, German, Spanish, Russian and Chinese is still an influential factor for the knowledge available within local scientific communities, but the more equitable distribution of scientific languages that existed at the beginning of the last century seems to have broken down after World War 1, specifically with regard to scientific German and scientific French (Gordin 2015). As pointed out by Gordin, the transition to scientific English was eased in mathematics and natural sciences by the role of formal languages and notational systems, but as exemplified by the French mathematician Henri Poincaré in a talk he gave in Göttingen in 1909, the role of a formal language within scientific languages could be described as a prosthetic “crutch”. At the time, German and French were still scientific languages with approximately the same amount of publications as English, and Poincaré had given a series of lectures in German. For his last lecture, however, he wanted to address more conceptual issues, and for this he had to revert to his mother tongue: “Today I have to speak French, and I must apologize for it. It is true that in my earlier lectures I expressed myself in German, in very bad German: to speak foreign languages, you see, is to want to walk while one is lame; it is necessary to have crutches; my crutches were until now mathematical formulas, and you could not imagine what a support they are for an orator who does not feel himself very firm. In this evening’s lecture, I do not want to use formulas, I am without crutches, and that is why I must speak French.” (Poincaré 1909, cit. in Gordin 2015: 13).

3. The different perspectives of kinetics and thermodynamics

The different approaches of Gibbs and van’t Hoff to physical chemistry are sometimes portrayed as a matter of personal style (Deltete and Thorsell 1996), but this is a distortion and simplification of the issue. Although they both contributed to chemical thermodynamics in a form which is recognizable in current textbooks, the actual conceptual content is divergent. This is not just a simple issue of terminology and nomenclature, even though some interesting points can be made at a purely lexical level of analysis.



3.1. Perspectives as indicated on a lexical level

To demonstrate the difference in perspective we will first indicate how this is manifested at a purely lexical level of scientific English. Figure 2 below presents a visualization of normalized word frequencies for the most frequently used technical terms in two key texts: the collection of papers by Gibbs on chemical thermodynamics (Gibbs 1906) and van’t Hoff’s Studies in Chemical Dynamics (van’t Hoff 1896). The network indicates the 25 most frequently used technical concepts in the Gibbs papers, and the 25 most frequently used technical concepts in the van’t Hoff monograph. One significant point is that only seven of these are shared (as highly frequent): temperature, pressure, heat, value, constant, equation and system. Another observation is that the left-side concepts of Gibbs tend to be more abstract and mathematical terms (e.g. surface, point, phase, state, entropy, part) whereas the right-side concepts of van’t Hoff tend to be pragmatically related to measurements (e.g. concentration, solution, velocity) or chemical substances (e.g. water, salt, oxygen).

Figure 2. Network visualization⁹ of word frequencies in (Gibbs 1906) and (van’t Hoff 1896).
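The counting procedure behind Figure 2 (word counts, normalization within each document, merging of singular and plural forms, filtering of ordinary words, and selection of the most frequent terms) can be illustrated with a minimal sketch. This is not the TextSTAT and Gephi workflow used for the figure. The stop-word list, the naive plural rule, the function names and the two sample sentences are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the actual filter for "ordinary words" is not specified.
STOP_WORDS = {"the", "of", "a", "an", "and", "is", "in", "to", "at", "with", "are"}

def singularize(word):
    # Naive merge of singular and plural forms ("salts" -> "salt"); an assumption.
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def normalized_frequencies(text):
    # Relative frequency of each remaining term within one document.
    tokens = re.findall(r"[a-z]+", text.lower())
    words = [singularize(w) for w in tokens if w not in STOP_WORDS]
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def top_terms(freqs, n):
    # The n most frequent terms; ties are broken alphabetically.
    ranked = sorted(freqs.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:n]]

def shared_terms(freqs_a, freqs_b, n):
    # Terms that appear among the top n of both documents.
    return sorted(set(top_terms(freqs_a, n)) & set(top_terms(freqs_b, n)))

# Invented sample sentences standing in for the two source texts.
text_a = "The entropy of the system at constant pressure. Entropy and phase."
text_b = "Salts in solution: the concentration of a salt at constant pressure."

freqs_a = normalized_frequencies(text_a)
freqs_b = normalized_frequencies(text_b)
print(shared_terms(freqs_a, freqs_b, 4))
```

Applied to full digital copies of the two texts with n set to 25, the same comparison would yield the kind of overlap reported above, although the exact terms depend on the stop-word filter chosen.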


9 The method used requires three levels of measurement, data manipulation and visualization. Word counts are performed on digital copies of the texts using the open source software TextSTAT (TextSTAT, Version 2.9, 2014, Matthias Hüning), and normalized word frequencies are calculated in order to arrive at relative frequencies within the individual documents. Singular and plural forms were combined in the count (e.g. “salt” stands for “salt” and “salts”). Ordinary words in English were filtered from the sample, and an arbitrary number of top level frequency words were selected (25 from each text). These word lists were then used to produce node and edge files to import to the Gephi visualization software (Gephi 0.8.2, 2012). Word frequencies are approximately represented as line thickness of the edges. The selected layout algorithm is Fruchterman-Reingold.

The distribution and frequency of lexical units in technical English is just to be taken as an indication of more conceptual differences in the organization and presentation of topics. The status of physical chemistry in research and education is a key issue in the philosophy of chemistry, and the potential “derivation” of chemistry from physics is a relevant problem (Brakel 2000). Today, the frontline of research would be in quantum chemistry, involving questions such as: can the observed chemical properties of the elements of the periodic table be derived from quantum physics? (Scerri 1994).¹⁰ In the present context of reaction kinetics and chemical thermodynamics, the issue is not reductionism, however, but the difference in perspective of the two approaches as conceptual constructions.

The didactic problem of the different perspectives is mainly that they are not addressed explicitly. When students first learn about physical chemistry (in high school), they are introduced to reaction kinetics as a discipline based on empirical observations of reaction rates; they learn about the “mass action law”, and eventually they might also derive the integrated rate laws and become familiar with their mathematical and graphical representations. Thermodynamics of chemical reactions is introduced at more advanced levels of high school chemistry. For students who choose to study chemistry, biochemistry, pharmacy, molecular biology or chemical engineering at university, the order of presentation of these topics will be drastically reversed: courses and textbooks on physical chemistry present topics deductively starting from the abstract principles of thermodynamics and statistical mechanics, i.e. from the point of view of Gibbs, and reaction kinetics appear as an application of thermodynamics.¹¹ It is well documented that this creates conceptual problems for students. The understanding of chemical equilibrium can be arrived at through the mass action law (through kinetics) as well as through thermodynamics, but these two approaches can fall apart conceptually in the student’s reasoning (Van Driel and Gräber 2002).

There is an argument to be made about the possible meta-cognitive function of the history of science: real difficulties of conceptual progress and even the “resistance to change” demonstrated in the history of science should be used constructively in teaching. Rather than the idealized deductive presentation of scientific theories common in textbooks, key episodes from the history of science provide an authentic window into conceptual difficulties: because they “form an integral part of the history of science…[they] therefore constitute an acceptable form of scientific literacy”. We should take advantage of these historical episodes of resistance to conceptual change “in order to stimulate students’ intellectual curiosity and make them more conscientious of their own misconceptions that result in resistance to change” (Campanario 2002). Cognitive semiotics is important here because problems of scientific literacy and conceptual change have to be reconstructed as didactically relevant through a linguistic and conceptual understanding of science, and the “science content” should neither be reduced to a collection of facts and methods, nor to a technical language and rhetoric, but should include an understanding of theories and models constructed at multiple levels of meaning using multiple forms of representation.

Whereas the order of presentation of a topic can be seen as a purely didactic problem (cf. the order of kinetics and thermodynamics in learning physical chemistry), a number of other problems can only be understood adequately by taking into account the analogies and metaphorical constructions used within the discipline, as well as the construal of perspective characteristic of different approaches such as thermodynamics and kinetics. Accordingly, it is important not to reduce this issue to a rhetoric of science, although it is clear that science does have its own rhetoric as indicated in the functional linguistics of Halliday. Scientific language will tend to have a rhetoric of objectivity as expressed through the use of passive voice, nominalisations and impersonal descriptions (Livnat 2010), but beyond rhetoric we need to understand how different analogies and other types of conceptualisation will construe and perspectivise the conceptual content differently for students. In the following we analyse a specific example.

10 This issue was raised originally by Paul Dirac (1902–1984) in 1929 in a famous paper on “Quantum Mechanics of Many-Electron Systems”, when he optimistically claimed that chemistry could – at least in principle – be derived from quantum physics. The reservation of Dirac was mainly that the known mathematical physics might be computationally too difficult: “the difficulty is only that the exact application of these laws leads to equations much too complicated to be soluble” (Simões, 2002).

11 Even an author like Keith Laidler, who is engaged in the history of his discipline (cf. Laidler 1985), follows this deductive approach in his textbook on physical chemistry (Laidler et al. 2003): the basic principles of chemical kinetics are introduced only after 360 pages of chemical thermodynamics. This is scientifically correct, of course, although it might be a didactic advantage to approach the topic more inductively (e.g. from experimental observations) and interwoven with the historical development (which retroactively anchored the discipline in thermodynamics and quantum chemistry). We should also be aware that there are disciplinary differences within thermodynamics as presented in e.g. physics, physical chemistry and mechanical engineering (Christiansen and Rump 2008).

3.2. Perspective as grammatical construal: the concept of activation energy barrier

In explaining chemical reactions at the molecular level and in explaining the action of enzymes in “speeding up” reactions, the concept of an activation energy barrier is used in physical chemistry. It rests on thermodynamics as applied to chemistry. The activation energy is a “barrier” that needs to be “overcome” for a chemical reaction to take place, and the action of enzymes is to lower this barrier. From the point of view of conceptual metaphor theory (Lakoff 1987; Lakoff and Johnson 1999), the concept must rely on a metaphorical conceptual structure, specifically the barrier metaphor and the conceptualization of a process of “overcoming” this barrier. Furthermore, the notions of “barrier” and “overcoming” can be profiled in different ways (Langacker 1987; Croft and Cruse 2004): we can profile “the barrier” in a situation as influencing the events that can take place there, or we can profile the process of “overcoming”.

The concept of an activation energy barrier was introduced around 1889 by the Swedish chemist Svante Arrhenius in thermochemistry. The fundamental idea is that a certain amount of energy has to be present in a chemical system for a reaction to take place. Each specific chemical reaction will have its own minimum energy level for it to proceed. Described in this way, we can recognize that there is a potential energy in a chemical system in analogy with the potential energy of a mechanical system. A stone on a mountain side or an upright brick on a table will have a potential mechanical energy that can be released, i.e. the stone might roll down and the upright brick might fall over. These changes do not happen by themselves, but require some additional energy (e.g. a wind force “pushing” the stone and “overcoming” the friction), and this is what is implied by the barrier concept.

In chemistry teaching, the potential barrier will often be described metaphorically as an energy “hill” that reactants must “climb”, and this metaphor will be supported by a graph of potential energy mapped against the “reaction coordinate” – meaning the different steps of a chemical reaction. These “reaction coordinate graphs” are usually not explained very well, and in particular it is not made explicit that they should be read as explanatory diagrams rather than graphs representing measurable quantities.
The reaction coordinate is not a simple representation of time, as students might assume, but the “progress of a reaction” involving changes in molecular distances and angles as well as time (Keeler and Wothers 2003).

Figure 3. A mechanical analogy to explain the concept of an activation energy barrier as seen in a textbook on general chemistry (Holtzclaw, Nebergall and Robinson 1984)



This confusion concerning how to understand the reaction coordinate diagram as a type of representation is only one part of the problem. If we look at how the activation energy barrier is explained and illustrated in textbooks, a conceptual problem in the construal of perspective as implied by the analogies will appear. In Figure 3, the mechanical analogy of a boulder is used, where a human figure pushing the boulder uphill indicates the energy barrier that must be overcome for a chemical reaction to take place. This analogy is useful for grasping the basic ideas of a barrier and of potential energy, but there is a problem. It exemplifies a didactic transposition of content knowledge from research into simplified teachable content as analysed in the French tradition of didactics of mathematics (Bosch and Gascón 2006). The problem of perspectivization in the analogy is that the represented human figure pushing a boulder uphill invites students to imagine the energy barrier from a first-person perspective involving a singular object, as if the barrier were to be understood as a physical barrier to a chemical reaction for a single molecule.¹² This is misleading, however, because the argument for an energy barrier should not be situated at the level of singular objects (molecules). It is a statistical argument involving a variety of energy levels available in a large collection of molecules, where the velocities of individual molecules may or may not be sufficient for breaking chemical bonds through molecular collisions. This was made clear in the collision theory of chemical reactions proposed by William Lewis (1885–1956) and other chemists in the early 20th century. On average, a larger number of molecules will participate in a reaction if, for example, the temperature of the chemical system is increased. The construal of perspective¹³ in the analogy has consequences for the conceptual understanding of students, as demonstrated in the episodic example discussed below.
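The statistical reading of the barrier can be made concrete with a small numerical sketch. It uses the Boltzmann factor exp(−Ea/RT) as a rough estimate of the share of molecular encounters that carry at least the activation energy. The activation energy and the two temperatures below are hypothetical values chosen for illustration, and the function name is an assumption, not something taken from the textbooks discussed.

```python
import math

R = 8.314  # gas constant in J/(mol*K)

def fraction_above_barrier(activation_energy, temperature):
    """Boltzmann factor exp(-Ea / (R*T)): a simple estimate of the share of
    molecular encounters with at least the activation energy (J/mol, K)."""
    return math.exp(-activation_energy / (R * temperature))

EA = 50_000.0  # hypothetical activation energy in J/mol

for temperature in (298.0, 308.0):
    share = fraction_above_barrier(EA, temperature)
    print(f"T = {temperature:.0f} K: share above barrier = {share:.2e}")
```

With these assumed values the share is very small at both temperatures, yet it nearly doubles when the temperature is raised by 10 K. The barrier is never "climbed" by the collection as a whole; a changing proportion of molecules simply has enough energy to react.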
This example concerns the use of the activation energy barrier concept in understanding the action of enzymes. Enzymes take part in chemical reactions by temporarily binding molecules of a specific substrate to particular binding sites within their own molecular structure. Like all catalysts, enzymes are not consumed or altered by the chemical processes they enhance, and significantly they do not change the chemical equilibrium of the processes but only influence the rate of reactions. A simplified schema for enzymatic reactions can be given in two steps as shown in Figure 4.

12 In some contexts there is even a reference to the myth of Sisyphus and his never ending effort of pushing a stone uphill, see for instance the http://chemwiki.ucdavis.edu on the Arrhenius law and activation energies.

13 In the theory of Cognitive Grammar, the concept of construal refers to the ability of human speakers to construe the same basic situation in many different ways. That is, the linguistic expression of a conceived situation can be structured and modified by the focus on particular aspects of the situation, by the level of detail with which it is described, and by the perspective from which it is viewed (Langacker 1987). Perspective in turn includes a number of phenomena such as viewpoint, deixis and the construal of objectivity and subjectivity.



Figure 4. A simplified schema for the two steps of an enzymatic reaction

The two steps of an enzymatic reaction are the binding of a substrate (S) to an enzyme (E), forming an intermediate complex ES, and the catalytic step, which produces the product (P) and releases the enzyme. Understanding the mechanism of catalytic and enzymatic reactions involves an argument from thermodynamics about “free energy” usually accompanied by a reaction coordinate diagram as shown in Figure 5. The concept of free energy was proposed by Gibbs and by Helmholtz in two different versions (for different constraints on volume, temperature and pressure), but with a similar intention of explaining why reactions happen according to thermodynamics and avoiding the elusive concept of chemical “affinity”. Van’t Hoff followed Helmholtz in considering the free energy as the potential of a chemical system to do work: the free energy (not “wasted” as heat, that is as an increase in entropy) is the maximal work realized in a reaction (at a fixed temperature and volume). Chemical equilibrium can then be reconceptualized as the dynamic state where this “work of affinity” is minimal (Kragh 2001). The perspective of Gibbs on free energy, on the other hand, was on the development of entropy of a system under constant pressure. In the diagram in Figure 5, ΔG‡ denotes the change in “free energy” (G for “Gibbs energy”) for the two transition states corresponding to the energy barrier with and without catalysis. The positive free energy describes the energy barrier, whereas the negative free energy describes the difference (ΔG) of energy released in the reaction (notice that ΔG is not changed by catalysis). In a biochemistry course at the University of Copenhagen¹⁴ we observed that some students wondered why a given enzyme-enhanced reaction occurs at all considering that the energy barrier is not “removed” but only “lowered” by the action of enzymes.
Commenting on the reaction coordinate diagram, one student asked: “Why does a reaction happen, when the activation energy barrier is only lowered?” This reveals a misconception about the nature of the barrier as an absolute barrier that can be “removed” rather than a statistical conception, although the question itself is meaningful and the full answer quite complex, because the concepts of “free energy” and “reaction coordinate” themselves are inherently quite complex (Keeler & Wothers 2003).

14 This observation is from a practical session in a course (Spring 2013) on protein chemistry and enzymology taught by Karen Skriver, Department of Biology, as a part of the educational programme in Molecular Biomedicine.
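One way to approach the student's question is to put numbers on the two quantities in the diagram. The sketch below assumes a simple exponential dependence of the rate constant on the barrier height, with the same pre-exponential factor with and without the enzyme, and the standard relation K = exp(−ΔG/RT) for the equilibrium constant. All energy values and function names are hypothetical and only serve to show that lowering the barrier changes the rate by a large factor while leaving the equilibrium untouched.

```python
import math

R = 8.314   # gas constant in J/(mol*K)
T = 298.0   # assumed temperature in K

def rate_enhancement(barrier_uncatalysed, barrier_catalysed, temperature=T):
    """Factor by which the rate constant grows when the barrier is lowered,
    assuming an identical pre-exponential factor in both cases."""
    lowering = barrier_uncatalysed - barrier_catalysed
    return math.exp(lowering / (R * temperature))

def equilibrium_constant(delta_g, temperature=T):
    """K = exp(-dG / (R*T)); depends only on the overall free energy change."""
    return math.exp(-delta_g / (R * temperature))

# Hypothetical values in J/mol, chosen for illustration only.
BARRIER_UNCAT = 80_000.0
BARRIER_CAT = 50_000.0
DELTA_G = -20_000.0

print(f"rate enhancement: {rate_enhancement(BARRIER_UNCAT, BARRIER_CAT):.1e}")
print(f"equilibrium constant (unchanged by catalysis): {equilibrium_constant(DELTA_G):.1e}")
```

The barrier heights never enter the second function. In this simplified picture, that is why a catalyst speeds up a reaction without shifting its equilibrium: a lowered barrier means that a much larger share of molecules can react per unit time, not that a previously impossible reaction becomes possible.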

Cognitive Semiotics of Science


Figure 5. “Reaction coordinate” diagram used to explain the action of enzymes with regard to the lowering of the activation energy barrier, simplified after Nelson and Cox (2008)

As indicated above, the conceptual mistake can be understood as a consequence of the construal operations generating the perspective implied by the analogy. In cognitive grammar, three types of construal operations related to perspective can be listed: viewpoint, deixis, and construal of subjectivity and objectivity (Langacker 1987; Croft and Cruse 2004). The mechanical analogy in Figure 3 affords a visualization of the scene as a first-person perspective on the action of pushing a stone or boulder uphill. Transposed to the target domain of the analogy, this involves an imaginary viewpoint of a single molecule and an objectivity constructed around the uphill action as a physical barrier blocking a chemical reaction.¹⁵ The problem is not the first-person perspective as such, but the construction of an imagined subject-object relation around a singular object. Conceptualizing and visualizing a multitude of molecules with different velocities constitutes a real cognitive difficulty as long as we attempt a didactic transposition based on the analogical object representation. However, this is also where we can see the superior approach of Gibbs and why the representation of energy surfaces is significant in his version of chemical thermodynamics, as in Figure 2 where “surface” is the most frequent technical term in Gibbs (1906).

15 Another cognitive semantic operation might be implicated here: the schematization of the described situation in terms of force-dynamics (Talmy 2000; Croft and Cruse 2004). The problem would in this case be the interpretation of the causal type of the barrier as a static “blocking” rather than a dynamic “letting”.

In the paper “A Method of Geometrical Representation of the Thermodynamic Properties of Substances by Means of Surfaces” (1873) a powerful visualization is provided in the form of “thermodynamic surfaces” representing the relations of energy, entropy and volume in a chemical system, and in such a visualization maximal and minimal energy can be seen without reference to object representations at a micro level. The shift from a local viewpoint on activation energy to the global viewpoint of a mathematical abstraction imposed a cost of “readability”: interpreting the thermodynamic surfaces of Gibbs requires a considerable effort on the part of the student. What is overlooked in the misleading mechanical analogy is the probabilistic nature of the argument at the molecular level: lowering the energy barrier will allow more molecules to have sufficient energy to overcome the activation energy barrier required to break molecular bonds and form new bonds.
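The probabilistic reading of the activation barrier can be made explicit with a short worked relation. What follows is a minimal sketch and is not taken from the chapter: it assumes a Boltzmann factor as a rough estimate of the fraction of molecules with enough energy to cross the barrier, and an Arrhenius-type rate law whose pre-exponential factor is unchanged by catalysis. The symbols f, k, A, R and T, the subscripts "cat" and "uncat", and the illustrative figure of 20 kJ/mol are assumptions introduced here for illustration only.

```latex
% Rough fraction of molecules with at least the barrier energy
% (Boltzmann estimate; an assumption, not the chapter's own formula)
f \approx e^{-\Delta G^{\ddagger}/RT}

% Arrhenius-type rate constants without and with catalysis,
% assuming the same pre-exponential factor A in both cases
k_{\text{uncat}} = A\, e^{-\Delta G^{\ddagger}_{\text{uncat}}/RT},
\qquad
k_{\text{cat}} = A\, e^{-\Delta G^{\ddagger}_{\text{cat}}/RT}

% Ratio of the two rates: only the reduction of the barrier matters
\frac{k_{\text{cat}}}{k_{\text{uncat}}}
  = e^{\left(\Delta G^{\ddagger}_{\text{uncat}} - \Delta G^{\ddagger}_{\text{cat}}\right)/RT}

% Illustrative numbers (hypothetical): barrier lowered by 20 kJ/mol at T = 298 K,
% with R = 8.314 J mol^{-1} K^{-1}, so that RT \approx 2.48 kJ/mol
\frac{k_{\text{cat}}}{k_{\text{uncat}}} \approx e^{20/2.48} \approx 3 \times 10^{3}
```

On this reading the barrier is never "removed": at any temperature some fraction of the molecules already has enough energy to react, and lowering the barrier enlarges that fraction, while ∆G between reactants and products stays the same.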

4. Conclusion

Future research will have to investigate the nature of different literacy issues in science education and the interwoven conceptual issues in the history and didactics of science. In physics education, the recurrent conceptual problems of students are no longer seen as a sign of motivated and stable misconceptions, but rather as a sign of a conceptual struggle of disambiguation. In other words, students will try to figure out what the ontology of physical concepts could be across the multitude of descriptions, equations, metaphors, analogies, visualizations and experiments they encounter through the disciplinary discourse and practice of physics (Brookes and Etkina 2009). It will accordingly be important not to reduce the role of representations to a question of “language” in the narrow sense of terminology, or a question of a “rhetoric” of science. In summary, a number of significant issues for future research can be stated as follows:

• We need a semiotics of science rather than a purely linguistic approach because of the multimodal nature of representational forms used by instruments, models and theories in scientific practices (Giere 2006; Baird 2004),¹⁶ and because of the importance of multiple representations in science learning (Ainsworth 2006). Ronald N. Giere’s proposal of a “scientific perspectivism” is important, but the foundation of scientific instruments and models as distributed cognitive systems has not been worked out in detail (Brown 2009). Scientific perspectivism can be seen as a “quasi-semiotic” theory in the way it stresses representational models and distributed cognition, but this relation to semiotics should be explicitly elaborated.
• We need a cognitive semiotics of science because of the transdisciplinary nature of key problems (Zlatev 2015a). Issues such as scientific literacy, ontological disambiguation, representational forms and scientific instrumentation have to be constructed beyond individual disciplinary approaches, that is, beyond approaches from psychology, linguistics and philosophy of science. A cognitive semiotics of science will accordingly not be a philosophical approach to science and education such as the proposed “edusemiotics” (Semetsky 2010), but an empirical investigation of the individual sciences from the point of view of meaning, signification and experience.
• Furthermore, it is expected that a cognitive semiotics of science will have didactic consequences for how science should be transposed into teachable content within educational practices. Specifically, we should stress the unexploited meta-cognitive potential of the history of science by bringing historical difficulties of conceptual progress into the teaching of science, as well as the potential of explicitly addressing the requirements of multiple representational forms for science learning.
• In a broader perspective a cognitive semiotics of science should be concerned with conceptual problems in the discovery, articulation and understanding of scientific concepts, theories, models, experiments and simulations. This seems to be closely related to the “cognitive history” of science pursued by Nancy Nersessian in her micro-studies of conceptual modelling in the development of electromagnetism (Nersessian 2008), as well as to the macro-studies of changing “epistemic virtues” of scientific objectivity by Lorraine Daston and Peter Galison (Daston and Galison 2007). Significantly, these approaches to the history of science rely on the study of representational forms such as diagrams and images, but without an explicit semiotic reflection to support it. From the point of view of cognitive semiotics, many recent science studies seem to be articulated as quasi-semiotic studies.

16 This is also stressed in the multimodal approach of social semiotics (Kress 2010; Kress, Jewitt, Ogborn, and Tsatsarelis 2014), but as mentioned earlier the conception of semiotics is made problematic by the idiosyncratic nature of systemic functional linguistics and the misrepresentation of foundational works in semiotics.

Cognitive semiotics promises to be an interdisciplinary approach combining aspects of semiotics, philosophy of science, phenomenology and the cognitive sciences (Sonesson 2012; Zlatev 2015a), and this is needed to explore the interwoven problems in the history and didactics of science, as we have attempted to demonstrate here for a few aspects of physical chemistry. Beyond the lexical and rhetorical aspects of scientific discourse, we should exploit the new focus on an integration of first-, second-, and third-person perspectives in cognitive semiotics (Zlatev 2012, 2015a) in explicating the construal operations in scientific discourse, and how scientific content is perspectivized and made available for transformations between different types of representational forms (including mathematics) and across different types of scientific practices including instrument measurements, computations and visualizations. Although the disciplinary focus in the present discussion has been on natural science and the case of physical chemistry, this approach should be extended to include domains such as medicine and engineering. The relation of cognitive semiotics of science to the disciplinary domains of the humanities and the social sciences is an important issue that will, however, require a separate investigation.

Part II. Semiotic Development and Evolution

Lorraine McCune

Chapter 7
Meaning, Consciousness, and the Onset of Language

1. Introduction

Cognitive semiotic theory, as a systematic study of meaning-making, offers a rich opportunity for considering children’s gradual development from early consciousness of perceptual and movement experiences, to representational consciousness and language. The present paper develops a cognitive semiotic theory of the transition into language based on empirical findings interpreted through both cognitive (Piaget 1962; Werner and Kaplan 1963) and Peircean theory. For the latter I rely on the work of colleagues who have made strong inroads in the process of integrating developmental and semiotic approaches (e.g. Daddesio 1995; Sonesson 2007; Zlatev 2009a; 2013; Lenninger 2012). Semiotics can be defined as (a) the study of signs, which is the historically more common description or, more broadly, (b) the study of all meanings. Under (a) there is variation in what constitutes a sign, but only sign-based meanings are included in semiotics. Under (b), favoured by many in cognitive semiotics, all experience of meaning (even sensation) can be considered semiotic (i.e., meaningful), but only some special kinds of meanings are signs. From a developmental perspective, equating semiotics with meaning, while allowing signs a developmental course of emergence, is most useful and will be adopted here. The field of cognitive semiotics requires a developmental perspective in order to track the time and growth trajectory of children’s development of meaning. This perspective is represented by a number of pioneers in the field including Lenninger (2012), Andrén (2010) and Zlatev (2009a, 2013). My work has centred on the cognitive basis of children’s transition into language, and initially focused on the underlying development of the capacity for mental representation as observed in play.
Since both language and representational play are symbolic modes, I hypothesized a temporal relationship between developments across these modalities (McCune-Nicolich 1981a). Children’s expressions in both play and language are interpreted from a phenomenological perspective (Sonesson 2007). That is, for example, in identifying a play act at a particular level, evidence of the child’s state of mind is sought in the accompanying behaviour and attitudes (cf. Bloom 1993; Searle 1992; McCune 2008). While research demonstrated that a given level of language did not appear until the same time as, or later than, the analogous representational play behaviour, the temporal distance between these developments indicated that additional behavioural developments, beyond mental representation, contributed to the children’s transition to linguistic productivity (McCune 1995). In fact, the missing element turned out to be vocal resources suited to embodying representational meanings in vocal expressions, a necessity identified by Sonesson (2007). Influenced by Thelen (1989), I proposed a dynamic system of underlying vocal and representational variables that, together, might predict this transition in individual children (McCune 1992, 2008). In this chapter I bring these ideas under the cognitive semiotic umbrella. Cognitive semiotics has already begun to incorporate developmental ideas. Lenninger’s (2012) analyses of sign development in children’s understanding of pictures, and Daddesio’s (1995) integration of semiotic study with cognitive development provide important background for the consideration of children’s semiotic development. Daddesio limited his analysis of cognition underlying symbolic development to its onset at the end of the sensorimotor period (Piaget 1954), shifting to consideration of language without further attention to underlying cognitive changes supporting linguistic development. The present chapter additionally addresses Piaget’s (1962) analysis of symbolic developments beyond the sensorimotor period. These developments, potentiating language, according to Piaget, are highly relevant for the study of cognitive developmental semiotics. Sonesson’s (1989, 2007, 2010) analyses of Peirce provided background for both Lenninger and the present paper. Zlatev has identified the importance of mimetic processes in semiotic development, following from Donald’s (1991) evolutionary proposals, and developed a hierarchy of bodily mimesis based on Piaget’s (1962) analysis of emerging imitation and initial representation in the sensorimotor period (Zlatev 2008b, 2013). This work facilitated the development of a more complete stage model incorporating the broader perspective where Piaget considers later semiotic development through the lens of representational play (Zlatev and McCune 2014).
A primary contribution of the present chapter is the integration of previously neglected psychological work on the development of mental representation in play and language (Piaget 1962; Werner and Kaplan 1963; McCune 2008) with cognitive semiotics.

2. Organization and concepts

Piaget’s cognitive theory (Section 3) provides a frame spanning birth to adolescence, and so offers a framework for addressing semiotic development. Within the theory, initial meaning (Section 3.1) is experienced through perception and action, without the infant having access to mental representation. Piaget (1962) illustrated the transition to mental representation showing a stage-like sequence of representational play activities culminating in differentiated symbolic expression (Section 3.2). In a complementary analysis Werner and Kaplan (1963) addressed the transition to mental representation through vocalization and gesture, the former culminating in differentiated symbolic language (Section 4). Section 5 addresses the embodiment of symbols in vocal expression, including the roles of both originally autonomic laryngeal vocalizations and the source of early words in children’s pre-word consonant production. Terminology varies across the theorists whose work is addressed here, so I begin with some theoretical and definitional distinctions.



Prior to adoption by cognitive science, the terms representation and mental representation were understood as describing conscious mental states. Contemporary cognitive science employs the term representation to describe information stored in the brain or utilized in computer simulation. I use these terms to refer to mental states with the potential for contemplation beyond present perceptual reality. Representational states are distinguished from perceptual mental states: the latter are limited to meanings and experiences of present reality. In the view expressed here, sensorimotor and perceptual meanings develop during the first year of life. These initial meanings, considered broadly as semiotic, provide the foundation for the representational development evident between one and two years of age in both play and language (McCune 2008). In semiotics, the use of the term sign usually involves one entity standing for another in a one-way relationship; the term symbol is reserved for conventional signs, such as those of a language or mathematics (cf. Sonesson 2007). A representational intention is the purposeful use of a sign/meaning pairing. Developmentally the relationship between sign and referent varies based on the extent of differentiation or distancing between them (Werner and Kaplan 1963; Muller, Yeung and Hutchison 2013). Werner and Kaplan (1963), to accomplish their detailed analysis of symbolic development, invented their own terminology. So what for many might be a “sign/meaning relationship” is, in their terminology, a relationship between a symbolic vehicle and a referent, while the term sign or signal is used for non-representational material that cues an animal or human to action or avoidance (ibid: 14). The child’s capacity for building symbolic vehicle/referent relationships develops and differentiation between the two is gradual, culminating in relationships between conventional symbolic vehicles (words) and their meanings. It is only at this most differentiated level of symbolic development that word meanings include syntactic roles in utterances allowing their grammatical combination. Peirce’s well-known division of signs into icon, index, and symbol is described as, in its pure form, having the following relationships between sign and referent: icon by similarity, index by contiguity, symbol by convention. Real-world sign relationships (according to Peirce) are not likely to be purely of one type (Jakobson 1965; Sonesson 1989).¹ Any real-world sign-referent relationship is likely to be a hybrid of two or all three of the basic types.

1 This may explain why it can be hard for those of us in psychology, unfamiliar with the nuances of semiotic theory, to deal well with these terms.

Developmental cognitive semiotics has the goal of encompassing all forms of meaning (as the study of semiotics in general does). So the child’s sense of meaning prior to the development of sign relationships is also of interest. Both Piaget (1962) and Werner and Kaplan (1963) proposed that the capacity for learning to use conventional symbols, such as the words of a language, is built on the earlier development of sensorimotor and perceptual knowledge, and of personal symbols. Piaget simply terms these “symbols”, reserving the term “sign” for conventional symbols. So Piaget’s sign is (more or less) a Peircean symbol. At first glance the difference between them may appear to be one of terminology, but the Piagetian personal symbol offers an important developmental distinction, also respected by Werner and Kaplan, and not found in Peirce, as Peirce preferred to consider symbols without reference to concrete minds in the stage of development that might instantiate them (Daddesio 1995). The term index also seems to have somewhat different meanings for Peirce (cf. Sonesson 2007) and for Piaget (1954; 1962). For Piaget, the index serves to bring to mind the mental object of contemplation or action. The perceptual experience of a part of a hidden object, for example, facilitates the action of uncovering the object. For Piaget there is neither mental differentiation between the part and the whole object, nor confusion between the two. Rather, perception of the part brings to mind (leads to a representation of) the whole. At an earlier developmental point, the sight of a part of a covered object does not lead to expectation of finding the object as a whole. For Peirce, indices, as all signs, necessarily involve several independent instances. Using the Piagetian criterion of differentiation, Sonesson (2007) suggests that the two notions of index can be made compatible, if we take Piaget to be concerned with the “indexical ground”, or “indexicality”, which is a prerequisite for something to be an index, but not a sign in itself before the differentiation occasioned by the sign function. Piaget (1962) and Werner and Kaplan (1963) provide parallel analyses of how personal symbols initially emerge, following earlier capacity for indexical relationships: Piaget through play, Werner and Kaplan through vocalization and gesture, both influenced by imitation of persons and object motion in the environment. These developments culminate in the use of conventional symbols: that is, those recognized as symbols in Peirce’s view.
In what follows I use the terms personal symbol and conventional symbol (or simply symbol) to preserve the developmental distinction and respect the Peircean terminology.

3. Piaget’s cognitive theory

Piaget’s cognitive theory includes development from birth through to late adolescence (e.g. Piaget and Inhelder 1969), and provides a psychological foundation against which to consider semiotic development. Piaget (1962) describes the development of the symbolic ability as initially derived from imitation, and culminating in the differentiation of symbol from symbolized in advanced representational play. However, Piaget’s primary interest was not in semiotics, but in the logical operations that, in his view, underlay adult scientific thought. Consequently, his major division into stages is based on developmental change in “operations”. From this perspective, pre-sign meanings encompass a developing consciousness of stability in contrast with change, in particular changes that are reversible, such as motion toward or away from the child, or movement in a vertical direction, up or down (McCune-Nicolich 1981b; McCune 2006, 2008). During the first year or so of life, children become able to anticipate the effects of actions on objects in these reversible dimensions. So, important pre-sign meanings include the consciousness that change is imminent and a simultaneous sense of the anticipated outcome. Early words refer to events that are reversible over time or space (e.g. up, out), as well as to names of objects and people (McCune 2006). The three major periods of cognitive development proposed by Piaget are termed (1) sensorimotor, (2) concrete operations, and (3) formal operations. The critical element in operations is reversibility, which characterizes all three levels. Perhaps the simplest example of operations that comes to mind is the process of addition and subtraction. An amount can be increased by adding elements. One can reverse this operation of addition by one of subtraction: remove the same number of elements and the total will be as it was before any operations were begun.² Simple addition and subtraction are considered to be concrete operations because they can be easily demonstrated with real-world, concrete items. The systematic control of variables and their influence typical of the scientific method is considered to involve formal operations because the operations involved occur in the abstract, but this level is also characterized by reversibility. Reversibility is first evident in the sensorimotor period when the child is able to use motor action to go in one direction, then return to the starting point. Children of one to two years of age are fascinated by the possibility of reversibility as witnessed in their play involving dumping objects from a container and refilling it, building a block tower and knocking it down, or simply stepping up and down on a single step. These activities no doubt contribute to their internal sense of meaning regarding physical reversibility.

2 One can see the analogy of this “operation” with the earlier pre-sign meaning of reversible transition in motion from up to down.

In Piaget’s theory semiotic development is what allows the child to progress from the sensorimotor period, which culminates in operations restricted to reversing literal actions or movements, to eventual concrete and formal operations. The sensorimotor and concrete operational levels are considered analogous with respect to operations, but are separated by the long period from about two years of age (when mental representation becomes initially available) until eight years. This developmental period is required to internalize operations and build the content knowledge, or figurative knowledge (see Section 3.1) that is also required for concrete operations. The further capacity for abstraction develops between 8 or 9 years of age and later adolescence, supporting the formal operations. These developments no doubt are integrated with physiological maturation of brain and other neurological structures. Lenninger (2012) refers to “Piaget’s four-stage model of cognitive development” (p. 119 et seq.) in an analysis emphasizing semiotics, including the period of initial development of mental representation, that is symbolic development (between sensorimotor and concrete operations), as a developmental stage. For Piaget, stage change relates to operations only, but the period of initial symbolic development is of equal importance because without the capacity for internal mental representation no operations beyond sensorimotor action would be possible. As more advanced operations develop, they both utilize and contribute to semiotic development.

3.1. Meaning without mental representation

In describing the course of sensorimotor development, Piaget was most interested in the operative characteristics, those that eventually lead to logical operations. However, during the same age range of birth to 18–24 months the child also begins developing figurative knowledge: that is, an understanding of the qualities of real world entities, qualities such as texture, colour, weight, and shape. Ruff (1984) showed that as early as 9 months of age children identify differences in shape and colour by their actions on objects. They also show an understanding of size differences as their grasp is adjusted to the size of the target object during reach. This clearly shows pre-sign cognitive resources (meanings) that will later facilitate representational development. The culmination of the sensorimotor period is defined as the onset of mental representation (Piaget 1962). Following Sartre (1962 [1948]) and many others, I understand mental representation as the ability to have consciously in mind some entity or situation that is not present to the senses. Both theory and research demonstrate a transition from a time when such mental representation is not available to the child to a time when it is. Stage 6 of Object Permanence requires retrieval of an object from a hiding place under conditions where the child does not directly witness the hiding. Therefore, the child must keep in mind the location of the object while the object itself is not in view. But at what point can the child recall the specific object that was hidden? Testing representation (memory) for a specific object involves showing the child a small object, hiding it in the hand or a container, and depositing it, out of view of the child, under a cloth or screen (Ramsay and Campos 1981).
To determine whether the children mentally represented a specific object while it was hidden, Ramsay and Campos used a “surprise” task, where the infant found a different object from the one that had been hidden. The infants’ reaction to the novel object demonstrated whether or not they had kept in mind (i.e. represented) the recently hidden specific object. Children (all 11 months old) who had entered Stage 6 (as previously shown in a traditional Stage 6 hiding task without an object switch) showed surprise and continued searching when the “wrong” object appeared when they lifted the cloth. In contrast, those who had only achieved Stage 5 happily retrieved and played with the substituted object. Hakke and Somerville (1985) demonstrated the trajectory of development beyond initial entry into Stage 6 in a more complex hiding task. On the figurative side, exploration of objects, and observation of their use, culminate in knowledge of specific objects and their social roles. At the transition to mental representation, children recognize typical household objects by action. For example, they touch a comb to their hair, a cup to their lips. This sometimes occurs as early as 9 or 10 months of age for some children with some objects. These actions are considered the first level of “representational play” because the object elicits actions that “represent” the real function of the object. They are considered “play” because they are done for their own sake, and for the child’s own interest. At the same time, they express early meanings, pre-sign, yet semiotic in the broader sense. The action of using a comb to mime combing the hair shows comprehension of this meaning, but the act showing the meaning is the act of combing itself, performed out of context. Hence, from a semiotic perspective, sign and referent are very closely aligned. That is, touching the comb to the hair is a sign representing combing the hair, an act using a highly similar action. Still, the child may be unaware of the representational correspondence. Hence these acts are transitional, as the child’s internal sense of meaning is unknown. Further semiotic growth, the development of explicit mental representation, recognized by observable evidence as described in the following, is an essential component of language acquisition, but it is not the sole influence.

3.2. Development of representational meaning in play

The development of mental representation involves continual processes of differentiation and increasing capacity for the integration of meaning into more abstract structures (Muller et al. 2013). According to Werner and Kaplan (1963), this process is prompted and accompanied by psychological differentiation from the caregiver, and a major motive for communicative development is the maintenance of emotional closeness in the face of the differentiation experienced. Observation of play allows analysis of this differentiation as it proceeds. McCune (1993, 2008, 2010) offers detailed analyses of this process encompassing both play actions at differing levels and potential parallels with self-other differentiation. In play, the earliest differentiation is between an act occurring instrumentally (such as really drinking water) and the expression of a similar action, out of its real context, when prompted by a familiar object. This act is representational from the observer perspective. The child does not show awareness of the representational relationship between the played act and its real counterpart when these acts are first observed. Pre-symbolic play (Level 1) described above is typically shown between 8 and 11 months of age, quickly followed by Levels 2 and 3, which do seem to exhibit conscious awareness of their representational nature on the part of the child. While at Level 1 children do express an internalized sense of meaning by their actions when they encounter real objects, familiar from their daily activities, out of context, their actions are brief and show a serious demeanour. The act is prompted by the object, and is minimally differentiated from the real activity. These acts mark a transition toward mental representation, but are limited by reliance on perception and directly related action.
The same act, performed at Level 2, appears more exaggerated and “playful”, often including sound effects and looks to mother. When putting a cup to the lips at Level 1, the child re-plays a typical action in reduced form. When “drinking sounds” are added, it seems the child may be making a mental comparison between the real and simulated act. A look to mother with a smile seems to inquire whether she recognizes the meaning as well. The action is nearly the same as Level 1, but a shift toward the representational in consciousness seems to arise. At Level 3, the child moves beyond his or her own action to include others’ typical activity, or make an “other” the focus of the action. When the child extends the cup (Level 3) to give mother a drink, the action embodying the meaning differs crucially from his own simulated action: he extends the cup to mother, a different gesture from putting it to his own mouth, adjusted to the representational situation. While these developments may seem small, they testify to different internal consciousness underlying these acts and a differentiation of the representation of action (the play) from what it represents (the real action). Empirical studies (McCune 1995, 2008) have shown that these developments occur fairly closely in time between 8 and 12 months of age for most children. The term “pretend” often used informally seems applicable to Level 3, but not earlier. Level 4, the combinatorial pretend, shows greater variability, and is typically attained at 13 to 18 months of age, but often later. Children’s play combinations may begin as simply as the child pretending to drink, then offering the empty cup to mother or doll: a combination of a self-pretend act that perhaps suggests the subsequent other-pretend act. Often a single pretend scheme such as drinking or grooming is played out with all available animates, child herself, mother, dolls, and experimenter, as if the child’s goal is to express and experience a given meaning through varied but similar acts. With development, however, broad elaboration continuing for several minutes can ensue. A dress-up sequence may include donning adult shoes from the toy set, putting the toy purse over an arm, and waving “Bye-bye”, simulating a shopping trip.
When play acts are combined in these ways it seems there is an ongoing mental stream of meaningful consciousness, supported by the played acts. The inclusion of an act in varied combinations of action and recipient shows growth in both the differentiation of symbols expressed in play and their potential integration among themselves. It is at this level that McCune (1995; 2008) identified referential words. Level 5: hierarchical pretend exhibits further differentiation between the play act and the meaning it represents as well as even greater variability in the age of attainment (typically between 18–24 months of age). A significant shift is observed at Level 5, when the child gives evidence of internally “planning” an act prior to execution. Such play suggests that the child is now capable of an internal symbolic or representational consciousness, rather than being dependent on enactment with objects to support internal meaning. The representational intention can now arise without external perceptual support, with internal plans guiding external action, fulfilling Sartre’s definition of imaginal consciousness. The hierarchical aspect of this behaviour rests on this prior designation. In object substitution, a prior implicit internal designation guides play, e.g. “For the purposes of this activity, this block is a telephone”. Prior designation, or planning, can sometimes be identified by the child announcing a play intention, by performing preparatory actions indicating a plan, or by searching for objects needed to accomplish the activity. When a doll is treated as a centre of its own action (e.g. placing the


cup in the doll’s hand rather than to her mouth for pretend drinking) the implicit internal designation of “doll = animate agent” provides the hierarchical structure for the ensuing activities. Once a play theme is established, at Level 4 or 5, one act can suggest another and elaborate scenarios can ensue. This is the first point at which a differentiated sign/meaning relationship that would allow the development of conventional symbols (words) capable of entering into grammatical linguistic structures seems plausible. However, iconicity remains in force throughout all representational play acts, as the child’s actions mime those she or others have performed with practical goals in the past and the play acts are personal symbols, as the child chooses the actions or objects that become signs of an internal meaning. Objects that resemble the real objects are always a first choice for play action. The Appendix details Piaget’s original analysis of play levels and those operationalized by Nicolich (1997), providing background for McCune-­Nicolich’s (1981) theoretical analysis of play/language relationships, and empirical study by McCune (1995). Where in this sequence does the sign/referent relation fulfil the criterion for representation of the ability to have consciously in mind some entity or situation not present to the senses? This capacity develops gradually. While clearly absent at Level 1, and clearly present at Level 5, intermediate levels can be difficult to read. Differentiation between the sign and what it represents is a gradual process (McCune 1993, 2010).

4. Embodying symbols in the vocal medium

A complementary approach to Piaget’s analysis of representational development in play is that of Werner and Kaplan (1963) on language as detailed in their book, Symbol Formation. Sonesson (2007) recognized that a symbol requires “some kind of physical substratum in order to exist”, that is, it must be “reified into an object publicly accessible to all” and separate from the minds that instantiate it in order to function meaningfully in a conventional system. Werner and Kaplan demonstrate this process with an example including the origin of a word form in a “natural” vocalization and its shaping through consonant development in interaction with the ambient language, as summarized below. The following two sections, based on empirical study, describe (4.1) communicative use of a “natural” vocalization that may be of universal importance, and (4.2) separate evidence of the effects on the transition to referential words of learning consonants through babbling. Werner and Kaplan examined the formation of symbols broadly, but their detailed developmental proposal seeks to address the manner in which children develop the capacity to form the sentences of their language.3

3 Recall that Werner and Kaplan use the term symbolic vehicle for signs in sign/referent relationships in general, while specifying various levels of differentiation.

Their thesis is that human beings


have a natural resonance with objects and people that surround them, prompting an internal process of dynamic schematizing. The vocalizations and bodily movements they describe also fulfil the definition of bodily mimesis (Zlatev 2008a, 2013), and such internal dynamic schematizing can be understood as a process potentially underlying mimetic functions. Over the course of the first two years of life, observable activities and, no doubt, underlying neurological processes bring about the ability to create symbols both in play action as above, and in vocalization. Initial pre-symbolic meanings become gradually differentiated from their close practical relationships with entities and actions in the world (McCune 1993; 2008). The culmination of the differentiation process is the integration of conventional verbal symbols (words) into combinations exhibiting the grammar of their ambient language (Roy, Copley and McCune 2015). At this point, symbol and referent are differentiated, as these symbols (words in sentences) include, along with their somewhat directly accessible definitions, the abstract grammatical properties that allow integration in sentences. Werner and Kaplan traced the emergence of clearly referential words (those recognizing classes of elements with the same vocal symbol) along a pathway beginning with sounds produced automatically in certain circumstances (such as the sounds experienced while eating) and culminating in sufficient differentiation to allow grammatical combinations. The initial sounds exemplify pre-symbolic meanings: that is, the sound and meaning are unified and experienced as such by the child. Werner and Kaplan report that both Hildegard Leopold and a child studied by Lewis (1936) derived their initial vocables, as Werner and Kaplan term these initial meaningful vocalizations, from the sounds accompanying food ingestion (Hildegard: [m]; Ament’s niece, studied by Lewis: [mammam]).
The vocable initially “referred to a large sphere of events related to food-getting and food eating” (Werner and Kaplan 1963: 111). In this example, rather than being highly restricted, the word’s application is diffuse, yet context-embedded. In theory, the child first experiences the sounds generated while ingesting food, then begins to be reminded of these sounds in contexts related to eating, for example seeing a desirable food. This leads to production somewhat separate from the actual eating activity that originally generated the sounds. For Ament’s niece the vocable mammam began as a context-limited word related to eating and food, but was gradually differentiated in both form and meaning through interaction with the ambient language. Context-limited words are similar in their symbolic status to the self-pretend acts of Level 2 described above. That is, differentiation between symbolic vehicle and referent is partial. Werner and Kaplan proposed that children form their initial symbols in the process of gradually developing the ability to represent (i.e., consciously experience) situations internally, in the absence of perceptual support. The steps in representational play mentioned above also demonstrate this process. Grounding in bodily experience is seen as critical. An external symbol, such as a word form, is constructed in relation to the underlying meaning that it comes to express. As additional words are learned, the reference of the initial vocable becomes restricted. Ament’s niece initially (at 354 days of age) used [mammam] in reference to her mother and


sister as well as bread, cakes, and cooked dishes. By 597 days of age clearly referential words were produced: [mammam] was delimited to cooked dishes, while bread and cakes were [brodi] and her mother was [mama], her sister [desi]. Werner and Kaplan, as Piaget does regarding play, attribute these developments to an underlying process of differentiation. As the child differentiates producing the sound while eating from producing it in the absence of actual ingestion, ongoing experience of the language accompanying various events of importance to the child facilitates the development of additional meanings in relation to adult words. The sounds of objects are also co-opted as early vocal signifiers (e.g. the tick-tock of a clock coming to refer to all sorts of watches and clocks and similar-appearing objects). As new meanings come to be used, the original vocalization becomes restricted in its meaning and use. Then, having established the potential for meaningful vocalization outside the natural source context, the child begins to differentiate the wider variety of meanings expressed with various vocal forms produced by speakers in her environment. That is, due to the developmental process of differentiation, the child now recognizes sign/meaning correspondence in conventional words presented by adults in typical situations. Werner and Kaplan did not consider the form-meaning link to be arbitrary, as word-form and meaning are co-constructed through dynamic schematizing. Words become conventional symbols as they partake of the ambient language and its grammatical properties. However, word-form (symbolic vehicle) and meaning remain related at a neurological level throughout this developmental process and beyond.
Dynamic schematizing is defined as the underlying process that allows the differentiation of varied aspects of meaning and form, for example, supporting the transition from use of the original form mammam across varied entities related to eating/food, to the development of separate forms in relation to each of these. At the same time, specific forms become more integrated with specific meanings, and can reach hierarchical integration among themselves when produced in sentences. While from the beginning all of these examples suggest the sign function, a greater level of differentiation is needed for both the personal symbol of Piaget and [conventional] symbols recognized by both Piaget and Peirce. Vihman and McCune (1994) distinguished context-limited expressions, such as the initial mammam, from context-flexible expressions, exemplified by mammam when it refers to a variety of cooked dishes. The latter, termed referential words by McCune (2008), show form/content differentiation by their application across entities or situations sharing relevant content properties. As referential words become differentiated, each becomes more fully integrated with its internal meaning, and differentiated from context. According to Werner and Kaplan, children’s single words must “shift function”, losing their close connection with context and specific communicative goals to take on syntagmatic roles in the sentences of the ambient language. Children experience the words of the language from adults, but then construct form and meaning through their own internal processes. Because of the joint developmental history of sound and meaning, hearing a word instantiates an internal contentful state. For a child who has achieved the


capacity for referential language, various external circumstances related to that same state for the child (but perhaps not for adults) may call the word to mind, presumably its phonetic potential as well as its meaning, leading to production. The fact that children generalize their productions beyond those expected by adults testifies to an internal constructive process. It is this level of differentiation that allows access to the idealized external system of symbols constituting the ambient language (Sonesson 2007).

4.1. From natural meaning to personal symbol: The role of laryngeal vocalization

Laryngeal vocalizations termed grunts function communicatively in all extant primate species (Marler and Tenaza, 1977). Adult humans also produce laryngeal vocalizations in conversational contexts to indicate continued attention (e.g. Clancy, Thompson, Suzuki and Tao 1996) and to request clarification of a message (Dingemanse, Torreira and Enfield 2013). McCune (1992, 2008) found communicative use of the grunt vocalization was predictive of children’s onset of referential language. McCune, Vihman, Roug-Hellichius, Delery, and Gogate (1996), who analyzed data from five of the participants of the McCune studies, traced the occurrence of this vocalization beginning at 9 months of age and found that grunts first co-occurred with physical effort, then with focused attention, before shifting to communicative use at the transition to referential language. This reflects a process of discovery that can occur across infants, and across generations as well as across species in evolutionary time without external modelling or instruction (Tinbergen 1952; McCune 1999). In seeking a theoretical explanation for the sequence effort, attention, and communication, we determined that autonomic grunts result when reflex closure of the larynx, engaged to enhance oxygenation, is suddenly released. This reflexive process, evident across mammalian species, tends to increase oxygenation to the blood and restore homeostasis, and/or facilitate ongoing effortful activity (e.g. Remmers 1973). While grunts of effort occur from birth, it is not until late in the first year or the beginning of the second year that children begin to apply force purposively in relation to a goal (e.g. attempting to force a puzzle piece into the wrong space or insert an object into too small an aperture).
Grunts accompanying such effortful activities, in a child with a dawning mental representation ability, offer the opportunity to signify the internal sense of effort/resistance, just as the sounds of eating can be formed to signify eating-related situations. However, grunts of effort differ in the breadth of contexts of their occurrence, potentiating a connection across situations where the child experiences somewhat varied intentional states, all unified by the sense of effort and/or attention. Focused attention, that is, the sustained visual fixation of a display, occurs as early as 6 months of age, and McCune et al. (1996) recognized attention grunts at 9 months, the earliest age analyzed, in their sample. One can consider a grunt accompanying effort as an indexicality (Sonesson 2007), as it is the epiphenomenon of a physiological process with potential for use in a


sign relationship. Grunts accompanying effort reflect the same biological processes as those accompanying physiological stress. They occurred accompanying movements that required energy, such as shifting from a crawling position to standing or the reverse. An infant conscious of effort would experience vocal correspondence between this autonomic vocalization and the internal sense of effort. Focused attention is, by definition, an intentional state, providing the opportunity for sign/meaning construction once the child has the capacity for mental representation. Communicative use of this vocalization signals the construction of a personal symbol, based on the indexicality existing in the physiological process. Communicative grunts differed in form from effort grunts, which continued to occur, but with reduced frequency, during the same time period where communicative grunts were used. The effort grunts, across the five children studied, utilized the central vowels that require least shaping of the oral tract, suggesting involuntary production. In contrast, communicative grunts exhibited form/meaning synergies with words, such that children who had begun context-limited words produced communicative grunts utilizing the vowels most prominent in their early words. This shaping of the vocalization is evidence that communicative grunts are constructed as symbolic, and based on their indexical history in relation to biological processes. When grunts began to be used communicatively there was no doubt of their voluntary character and specific meaning, the latter varying with context of use. If the desired response was not forthcoming the grunt was repeated with increasing intensity. In one case, of a 16-month-old extending a toy bottle for his mother to open, his grunts finally graded into crying, a more emotional aspect of laryngeal activation, before she complied with his request.
Another example demonstrates the specificity of meaning intended in some communicative grunt episodes. The child, who showed referential language comprehension, but not production, wanted his mother to name something for him. He was pointing at a toy truck, so she first said truck. This led to more intense grunts and points. She then touched and named parts of the truck until she finally said steering wheel, and he gave an expression of satisfaction. I propose that for each of the children studied the communicative grunt became a personal symbol, where the referent is the child’s current intentional mental content. That is, the child seeks to convey his current internal mental state with each production, so semantic content varies from occasion to occasion. Thus the symbolic vehicle, the communicative grunt, is differentiated from its meaning, but it is not conventional. The relationship between communicative grunt and meaning is not one of similarity or contiguity, but it is fleeting and personal. The vocalization thus symbolizes the internal state the child has in mind in the moment, including both the sense of meaning (which varies freely from one situation to the next) and the communicative goal. In their analysis of diary findings, Werner and Kaplan identify vocalizations similar to communicative grunts as “call sounds” which tend to occur at the transition to conventional language. Similarly, McCune et al. (1996) found that, for children who were phonetically prepared (see the following section), the shift to referential words


and a sharp increase in word production occurred within one session of productive communicative grunt use, while for those lacking phonetic skill, referential comprehension was observed. Children producing this simple vocalization implicitly recognize that a self-produced sound can reference a thought. Experiencing this communicative goal state and producing the simple communicative grunt may come to prompt more frequent vocalization in relation to meaningful internal states, perhaps providing the basis for acquiring more conventional symbol/meaning relations from the ambient language. This shift to conventional symbols of the ambient language requires further phonetic ability, the subject of the next section. In learning referential words, the child generalizes initially, based on her own associations, so these words are also at least partially personal symbols, but with more stable meanings that are approaching the conventional. It is only when words enter into sentences that their full conventional use is exploited.

4.2. Phonetic Resources in the transition to language: Vocal Motor Schemes

Upper vocal tract activity is essential to human language as laryngeal activity is insufficient for differentiating a broad variety of meanings. Babbling is a known resource for language development, typically beginning between 6 and 9 months of age as the child differentiates and stabilizes jaw movements (Davis and MacNeilage 1990). In order to produce a specific word, it is essential that the child is able to form a given sound “at will”, rapidly orienting the upper tract articulators to the necessary positions in relation to the expelled airstream. McCune and Vihman (1987, 2001) sought to determine the timing of children’s ability to form specific consonants. They devised the Vocal Motor Scheme (VMS), as a measure of vocal production skill, assessed by frequent and consistent use of one or more specific supraglottal consonants. The McCune and Vihman term Vocal Motor Scheme has its origin in Piagetian sensorimotor cognition, where skill with a particular movement is termed a “scheme” (e.g. Piaget and Inhelder 1969). For example, Thelen, Corbetta and Spencer (1996) demonstrated that 6-month-old children’s successive reaches toward an object showed random variation in trajectory, while by eight months each child showed a relatively consistent trajectory in repeated reaches, achieving a “reaching scheme” which could vary with reference to distance and target characteristics. Analogously, repeated accurate production of the motor action yielding a given supraglottal consonant, or some other vocal target, is considered a Vocal Motor Scheme, as the consonant varies in its production by its phonetic role in the utterance. Children who made the transition to referential word production by 16 months, the final month of the study, all showed VMS-level competence with at least two supraglottal consonants by the time of that transition.
Of the referential words used by each child at both 15 and 16 months, on average 90 % incorporated each child’s specific VMS repertoire (McCune and Vihman 2001). Vihman’s continuing studies have more fully established the value of this variable as summarized by


Vihman, De Paolis, and Kerin-­Portnoy (2009). While communicative grunts utilize natural meaning in the construction of symbolic correspondences, vocal motor schemes derived from babbling potentiate the development of the infinite variety of sound/meaning correspondences that constitute human speech.

5. Conclusions: From pre-symbolic meaning to referential language

McCune (1992; 2008) found that children made the transition to referential language production (conventional symbols) when the following conditions were in place: (1) mental representation had reached the level shown in play by combining pretend acts, (2) communicative intent had come to be realized by the production of the “natural” vocalization defined as a communicative grunt, and (3) phonetic development had reached a critical point (defined in early talkers by identification of at least two VMS). The children studied all showed communicative gestures earlier than communicative grunts (Lennon 1984), and gesture development may have provided the communicative foundation for shifting from grunts of effort and attention to communicative use. Children lacking the phonetic skill assessed by two VMS but exhibiting the requisite communicative and representational skills showed referential comprehension by gesture, in the absence of word production. They used communicative grunts in combination with gesture to solicit assistance and share meanings. The variables identified above are all measures of underlying abilities and so might be assessed in other ways. The vocal variables I have emphasized in this chapter, VMS and communicative grunts, share the property that each facilitates some aspect of language development over time. In addition, they both contribute “in the moment” to facilitating communicative production. This dual behavioural and developmental role is typical of variables within a dynamic system. This chapter has provided a cognitive semiotic account of children’s development of meaning, from the time before their first actions show any cultural knowledge up to their capacity for expressing meaning in the conventional words of their language. The several variables contributing to the transition, as is typical in a dynamic system, need not become evident in a particular order.
Importantly, the reviewed studies showed that vocalizations in the behavioural repertoire for several months were not co-­opted for communicative use and word production until mental representation, as identified in play, became available. Through entering into the field of cognitive semiotics I have gained a new perspective on previous empirical findings and their theoretical interpretation. I can only hope that the information I offer here can have a reciprocal effect in the cognitive semiotics community.


Appendix

Sequence of representational play levels according to McCune (1995) with examples, aligned with developmental stages according to Piaget (1962), discussed in Section 3.2.

Level 1: Pre-Symbolic Schemes
Piaget (1962): Sensorimotor Period, prior to Stage 6 (8–11 months)
McCune (1995): The child shows understanding of object use or meaning by brief recognitory gestures.
• No pretending
• Properties of the present object are the stimulus for action
• Child appears serious rather than playful
Examples:
• The child picks up a comb, touches it to his hair, drops it.
• The child picks up a toy telephone receiver, puts it to his ear, sets it aside.
• The child gives the toy mop a swish on the floor.

Level 2: Self-Pretend (Auto-Symbolic Schemes)
Piaget (1962): Sensorimotor Period, Stage 6 (11–13 months)
McCune (1995): Child pretends at self-related activities while showing by elaborations such as sound effects, affect, and gesture, an awareness of the pretend aspects of the behavior.
Examples:
• The child simulates drinking from an empty toy baby bottle.
• The child eats from an empty spoon.
• The child closes her eyes, pretending to sleep.

Level 3: Single Representational Play Schemes
Piaget (1962): Symbolic Stage I; Type I A (Assimilative); Type I B (Accommodative; Imitative) (12–13 months)
McCune (1995):
A. Including other actors or receivers of action, such as doll or mother.
B. Pretending at activities of other people and objects such as dogs, trucks, trains, etc.
Examples:
• Child feeds mother or doll (A)
• Child grooms mother or doll (A)
• Child pretends to mop floor (B)
• Child pretends to read book (B)
• Child moves toy car with appropriate sounds of a vehicle (B)

Level 4: Combinatorial Pretend
Piaget (1962): Piaget does not distinguish single acts from simple multiple act combinations. (13–18 months)
McCune (1995):
4.1 Single Scheme Combinations
4.2 Multi-scheme Combinations
Examples:
• Child combs own, then mother’s hair. (Single Scheme)
• Child stirs in pot, feeds doll, pours food into dish. (Multi Scheme)

Level 5.1: Hierarchical Pretend
Piaget (1962): Type II A; Type II B. Piaget distinguishes (A) the assimilative case where the child identifies one object with another from (B) the accommodative or imitative case where the child identifies her own body with some other object or person (18–24 months)
McCune (1995): An internal plan or designation is the basis for the pretend act. Child exhibits double knowledge: real and pretend. Evidence is of three types: (1) child engages in verbalization, search or other preparation; (2) one object is substituted for another with evidence that the child is aware of the multiple meanings expressed; (3) a doll is equated with a living being, treated as if it could act independently.
Examples:
• Child picks up a toy screwdriver, says “toothbrush” and makes motions of toothbrushing.
• Child picks up comb and doll, sets comb aside, removes doll’s hat (preparation) then combs doll’s hair.
• Child places spoon by doll’s hand.

Level 5.2: Hierarchical Combinations
Piaget (1962): Type III A (18–24 months)
McCune (1995): Any combinations including an element qualifying as Level 5 are included here.
Examples:
• Child picks up the doll, says “baby”, then feeds the doll and covers it with a cloth.
• Child puts play foods in a pot, stirs them. She dips the spoon in the pot, says “hot”, blows on spoon then offers it to mother. She waits, says “more”, and offers it again.

Mutsumi Imai

Chapter 8 The “Symbol Grounding Problem” Reinterpreted from the Perspective of Language Acquisition

1. Introduction

In a seminal paper Harnad (1990) defined the so-called symbol grounding problem, referring to the well-known “Chinese Room Argument” posited by Searle (1980).

Suppose you had to learn Chinese as a second language and the only source of information you had was a Chinese/Chinese dictionary. The trip through the dictionary would amount to a merry-go-round, passing endlessly from one meaningless symbol or symbol-string (the definientes) to another (the definienda), never coming to a halt on what anything meant. […] How can you ever get off the symbol/symbol merry-go-round? How is symbol meaning to be grounded in something other than just more meaningless symbols? This is the symbol grounding problem. (Harnad 1990: 339–340)

Although Harnad (1990) raised this as a challenge for artificial intelligence employing the “classical” physical symbol systems approach, this problem may also be seen as one faced by young children learning their first language, who have to learn thousands of words to build up their lexicon. At the earliest stages, when their vocabulary is still small, it is not possible for adults to use language to explicitly teach children new words, since they would not be able to understand verbal explanations of word meanings. So children have to find the elementary semantic categories by themselves. Together with Cangelosi, Harnad imagines how language could have evolved on earth from the view-point of a Martian anthropologist. First beings would acquire “an entry-­level set of categories the honest way, like everyone else, but then assign them arbitrary names” (Cangelosi and Harnad 2001: 139). Then as a next step: “Once the entry-­level categories had accompanying names, the whole world of combinatory possibilities opened up and a lively trade in new categories could begin probably more in the spirit of barter than theft” (ibid: 139). Harnad assumed that the elementary level categories can be easily and almost automatically learned by trial and error because such basic-­level categories are already cut out by the world with natural gaps between them (Rosch and Mervis 1975; Rosch 1978). In this sense, Harnad may not have considered how the first set of symbols arises to be a part of the symbol grounding problem. Harnad further assumed that, once these categories are formed, the language learner’s task is just to learn the associations between the learned categories and arbitrary strings of


sounds (“names”). However, these assumptions are far too simplistic, when considering how young children acquire meanings of words and build up a vocabulary that contains thousands of words. First, before they can map words to elementary level categories, infants have to realize that the speech sounds they hear are associated with particular things or other elements of the events they observe. In other words, to start learning the meaning of individual words, they need to understand that the speech sounds can be segmented into units (i.e., words) and that each word denotes a concept (Piaget 1962). How infants gain this referential insight is not yet fully clear (Asano, Imai, Kita, Kitajo, Okada and Thierry 2015; McCune and Zlatev 2015). In order to learn the meanings of individual words, infants need to first establish the indexical associations (Bates, Benigni, Bretherton, Camaioni and Volterra 1979) between words and their referents. This is not an easy task for infants who have just begun word learning (e.g. Stager and Werker 1997; Imai, Miyazaki, Yeung, Hidaka, Kantartzis, Okada and Kita 2015). But even when infants have succeeded at that task, this is not sufficient to acquire the meaning of the word (Bates et al. 1979). And since adults cannot teach it to infants using language, infants need to infer it on their own. However, inductive generalization from a single word-referent association can go in any number of directions. That is to say, there are an infinite number of meanings that can be generalized from a single association, unless the possibilities are constrained in some way. It is now known that the way infants generalize words when they just start word learning indicates that basic-level (or elementary-level) categories do not emerge naturally, in contrast to what was assumed by Harnad and Rosch. Children often overgeneralize a word to form a chain-structured category (Bowerman 1978).
For example, a child who has learned that moon refers both to a full moon and to a crescent moon may extend the word not only to other round things (e.g. a round clock, a watch, a grapefruit) and to other crescent shaped things (e.g. a horn, a croissant) but also to things that are yellow or shiny (e.g. shiny leaves). In other words, in order to successfully map an object name to a basic-­level category, children need to realize that they should not generalize the word based on just any salient perceptual property (such as color, texture, or size); instead, they need to pay attention to shape similarity and ignore other perceptual properties (e.g. Imai, Gentner and Uchida 1994; Landau, Smith and Jones 1988). Many developmental psychologists have studied how young children use strategies such as shape bias to get around this problem when inferring the meaning of a word (e.g. Carey 1982; Clark 1993; Hollich, Hirsh-­Pasek and Golinkoff 2000; Gleitman and Gleitman 1994; Golinkoff, Hirsh-­Pasek, Bailey, and Wegner 1992; Imai, Gentner and Uchida 1994; Imai and Haryu 2001; Imai and Gentner 1997; Smith 1995; Tomasello 1997). However, this problem is just a part of the story. The lexicon is not merely an assembly of words each standing on its own; rather it is a complexly structured system, in which words are contrasted with one another along multiple dimensions at multiple levels (cf. Saussure 1916). In order to learn words so that they can use them according to the conventions of the adult speakers of the ambient language
community, children need to know how a particular word differs from the other words that surround it in the same semantic domain. Take color words, for example. We cannot say that a child “has acquired” the meaning of the word red if all he knows is that red is the color of apples and hence cannot apply this word to other variants of red, such as the color of fresh blood, red bricks, red autumn leaves, and red roses. Languages differ widely in the way in which they divide the continuous visible spectrum by color names (e.g. Berlin and Kay 1969; Cook, Kay and Regier 2005; Roberson, Davies, Corbett, and Vandervyver 2005), and boundaries for referents of color words are not available in the environment. Even among languages that seem to have highly comparable color vocabulary (i.e., the same number of words and the same focal colors for the corresponding categories), the boundaries can vary considerably. For example, although English and Japanese both have the color terms brown / cha-­iro and orange / daidai-­iro, the range covered by orange is much wider than daidai-­iro, and consequently, the range covered by cha-­iro is broader than the range brown refers to. A natural consequence of this is that children cannot acquire a color name from individual episodes of word-­to-world mapping alone; English-­reared children cannot be said to have acquired the meaning of the word red, unless they have learned how red contrasts with pink, orange, purple and other color names surrounding red, and where the boundaries are delineated between red and these neighboring words (Saji, Asano, Ohishi and Imai 2015). A riddle arises here. To understand the meaning of a particular color word, the learner seems to need to know other color names surrounding that word. 
More concretely, he or she cannot acquire the meaning of a single color word without knowing the structure of the color lexicon as a whole: how many words there are and which section of the visible color spectrum each word covers. Then, theoretically, children can never acquire a color word, because they do not already know the other color words. How can children get off this “merry-go-round” (cf. Harnad 1990) to acquire the color lexicon? This problem is not limited to the domain of color names, of course. Cross-linguistic diversity is more the rule than the exception for the majority of semantic domains, including names of household containers, color, causality, mental states, motion, direction, and spatial relations (see chapters in Gumperz and Levinson 1996; Gentner and Goldin-Meadow 2003; Wolff and Malt 2010 for detailed illustrations; also see Kay, Berlin and Merrifield 1997; Malt, Sloman, Gennari, Shi and Wang 1999; Wierzbicka 1999; Majid, Boster, and Bowerman 2008; Saji, Imai, Saalbach, Zhang, Shu and Okada 2011; Malt, Gennari, Imai, Ameel, Saji and Majid 2014). The diversity arises because language is highly selective in what elements of experience it encodes in words, and there are many possible ways to map between words and the world (Wolff and Malt 2010; Malt, Gennari, Imai, Ameel, Saji and Majid 2014). This means that it is largely up to each language and culture to choose how and on what basis to divide a given conceptual domain to form the categories to be named.
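The dependence of a color word's range on its neighbors can be made concrete with a toy sketch. It is an illustration only, not a model proposed in the literature cited above: it assumes that hue is a single number on a 0–360 circle, that each word has one focal hue, and that a hue is named by the word whose focal hue is nearest. The focal values and the two lexicons are hypothetical.

```python
def circular_distance(a, b):
    """Distance between two hues on a 0-360 colour circle."""
    d = abs(a - b) % 360
    return min(d, 360 - d)


def name_hue(hue, lexicon):
    """Name a hue with the word whose (hypothetical) focal hue is nearest."""
    return min(lexicon, key=lambda word: circular_distance(hue, lexicon[word]))


def extension(word, lexicon):
    """All integer hues that the given lexicon assigns to `word`."""
    return [h for h in range(360) if name_hue(h, lexicon) == word]


# Two hypothetical lexicons: the focal hue of "red" is identical in both.
small_lexicon = {"red": 0, "yellow": 60, "blue": 240}
large_lexicon = {"red": 0, "orange": 30, "yellow": 60, "blue": 240, "purple": 290}

# The same hue receives different names depending on the neighbouring words.
print(name_hue(20, small_lexicon), name_hue(20, large_lexicon))

# The range of "red" shrinks when "orange" and "purple" enter the lexicon.
print(len(extension("red", small_lexicon)), len(extension("red", large_lexicon)))
```

Even with the focal hue of red held constant, its boundaries cannot be computed from red alone; they are fixed only once the neighboring words are known, which is the circularity described above.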


In fact, finding the conceptual domain to allocate an incoming new word may not be obvious, because languages differ not only in how a given conceptual domain is divided, but also in how conceptual domains are formed from sensory experiences. For example, Goddard and Wierzbicka (2013) point out that the concept of color, which seems nothing other than universal to speakers of English, French, German, Japanese, Chinese (and many other languages) is in fact not universal across languages in the world. What we see on the surface of an object is a conjunction of multiple properties such as patterns, visual textures, shininess, shape etc. When we (i.e., speakers of a language with color names) talk about color, we abstract color by segregating it from these other perceptible visual properties. Thus, when we consider the diversity in the lexical systems, it is clear that children cannot acquire the meanings of words by simply finding a way of hooking sensory or perceptual experience to some entry level words. To be able to use a word as adults do, children need to discover the range of the category that the given word covers. This requires finding the boundaries among a cluster of words belonging to the same lexical domain, which further means that children need to discover the lexical domain itself (Saji et al. 2011). Thinking this way, the real problem (for children, at least) for understanding the mechanism of lexical acquisition is to find out how children learn the meaning of a word without knowing the semantic domain that the word belongs to, as well as without knowing the words surrounding that word. 
To recapitulate, in order to “ground” not just a symbol but a system of symbols, children need to go through at least the following achievements (not necessarily in this chronological order): (1) to find out that words refer to concepts; (2) to find the referents of particular words in the particular context in which they are said; (3) to find the semantic domain to which the incoming words belong; (4) to generalize each word beyond the referent originally associated with the word in light of the similar words already existing in the lexicon and, if necessary, modify the meanings of the existing words; and (5) to acquire an adult-­like representation of the domain as a whole. Developmental psychologists have been engaged in uncovering the detailed process by which each of the above achievements takes place. But an equally important challenge for researchers is to uncover the bootstrapping mechanism through which children construct a vastly complex linguistic system, starting with virtually no words and no knowledge about the structure of the lexicon. In this chapter, I extend and reformulate the original “symbol grounding problem” (Harnad 1990) to address the problems children need to solve in the process of lexical acquisition, which include symbol emergence, embodiment, and construction of a system of symbols. For this goal, I review three lines of my work, which I believe are crucially relevant to this reformulated version of the problem. The first line of work concerns the question of how infants first hook sensory/ perceptual experiences onto language and realize that speech sounds refer and have meanings. The second documents how young children infer the meanings of new words, how they integrate these words into the vocabulary and how
they restructure the meanings of already learned words to construct the complex system of lexicon. The third part explores the roots of the cognitive function that makes flexible and complex inference possible. At the end, I revisit the original symbol grounding problem, asking whether AI, with current computer hardware and learning algorithms, can solve it.

2. How children come to realize that speech sounds are symbols

During the first year of life, infants start to map speech sounds onto meanings, and in subsequent years they acquire a great number of words to build up a lexicon. However, for children, at the beginning speech sounds may be no different from other sounds in the environment, sounds which they hear while simultaneously observing rich visual scenes. How do children come to know that the sounds people make with their mouths are words, and that they refer to objects, actions or properties? One possibility is that a biologically endowed ability to realize cross-modal mapping, particularly between auditory and visual percepts, scaffolds language learning in human infants (Maurer, Pathman, and Mondloch 2006; Imai and Kita 2014). Human infants can already map information in different modalities in the way adults would. For example, they can map size and numerosity (Lourenco and Longo 2010), and acoustic properties of speech and non-speech sounds onto properties of visually presented objects (Yeung and Werker 2009; Walker, Bremner, Mason, Spring, Mattock, Slater and Johnson 2010). Importantly, this cross-modal mapping ability is likely to be a heritage from our non-human ancestors, as chimpanzees can map auditory pitch and luminance (Ludwig, Adachi, and Matsuzawa 2011). Consistent with this idea, 4-month-old infants appear to sense intrinsic correspondences between speech sounds and certain features of visual input (Ozturk, Krehm, and Vouloumanos 2013; Peña, Mehler, and Nespor 2011). The question is whether infants’ ability for cross-modal mapping is linked to language processing. We investigated this question with Japanese 11-month-olds by measuring their brain activity using electroencephalograms (EEG) (Asano et al. 2015). In each trial, infants were presented with a picture of a shape (either spikey or rounded) followed by a novel word (kipi or moma, see Figure 1).
The word-shape pairs were either sound-symbolically matching or mismatching, according to Köhler’s original proposal (Köhler 1947). The recorded EEGs were analyzed to explore (a) whether 11-month-old infants detect sound symbolism through cross-modal perceptual binding; (b) how different regions of the infant brain communicate while sound-symbolically matching and mismatching words are processed; and (c) whether infants at this age process novel sound-shape pairs over and above simple perceptual cross-modal binding and respond to sound-symbolically mismatching words as a form of semantic anomaly.


Figure 1. Examples of the rounded shapes and the spikey shapes used in Asano et al.’s study

We first analyzed amplitude changes, especially in the gamma band, to investigate whether infants process sound symbolism perceptually within local networks that are responsible for cross-modal perceptual integration. Previous studies have demonstrated that gamma-band activity is related to unimodal perceptual binding both in adults (Tallon-Baudry, Bertrand, Delpuech, and Pernier 1996) and in infants (Csibra, Davis, Spratling, and Johnson 2000). Given these results, gamma-band activity might be related to perceptual binding across different modalities as well. As expected, in our study of 11-month-old infants, we found that amplitude change increased more for sound-symbolically matched sound-shape pairs than for sound-symbolically mismatched pairs in the gamma band and in an early time window (1–300 ms), indicating that sound symbolism is indeed processed as perceptual binding. We further analyzed EEG signals to see how different areas of the brain communicated. The phase synchronization of neural oscillations increased, in comparison to the baseline period, significantly more in the mismatch condition than in the match condition. This effect was most pronounced over left-hemisphere electrodes during the time window (301–600 ms) in which the N400 effect was detected in the Event Related Potential (ERP).¹ We hypothesized that a modulation of N400 amplitude for sound-symbolically mismatching sound-shape pairs would indicate that infants with very little vocabulary assume a sound-symbolic correspondence between word sound and shape, and consider sound-shape mismatches to be anomalies at a conceptual/semantic level (Asano et al. 2015). The time course of large-scale synchronization suggests that cross-modal binding was achieved quickly in the match condition. In contrast, sustained effort was required in the mismatch condition, and left-lateralized structures seemed to be involved. The beta-band increase in phase synchronization in the mismatch condition suggests that the neural network identified in adults during semantic processing is already recruited for processing novel words at the age of 11 months.

1 The N400 effect is an ERP modulation known to be sensitive to semantic integration processes in adults (Kutas and Federmeier 2011), as well as in infants (Friedrich and Friederici 2011; Parise and Csibra 2012).

During the same time window in which the infants’ brains showed a differential style of communications across different brain regions, the N400 component was significantly larger for sound symbolically mismatching than for matching pairs in the analysis of ERP. Combined with the results from the amplitude change in the earlier time window, the large-­scale posterior-­anterior synchrony observed in the beta band over the left hemisphere in the N400 time window, and the N400 modulation in ERP suggest that the infants detected anomaly as the meaning of a word when they encountered the sound-­shape mismatch. To summarize, the results of Asano et al. (2015) suggest that 11-month-­old infants who had not actively started word meaning acquisition could spontaneously detect the inherent similarity between speech sounds and visual referents in their attempt to bind visual shapes and sounds, first perceptually and then conceptually. However, this study did not provide direct evidence that sound symbolism actually helped infants in the context of novel word learning. As discussed earlier, in order for children to acquire the meaning of a word, they need to start establishing an association between a word form (sound) and its referent. To examine whether sound symbolism indeed helps infants learn meanings of novel words, we tested 14-month-­old infants in a word learning task using their looking pattern as the measure (Imai et al. 2015). The infants were repeatedly presented with two word-­shape pairs. For half of the infants, the word and the shape sound-­symbolically matched; for the other half, they mismatched. After the infants had been habituated, they heard either kipi or moma and saw the two shapes side by side. 
Infants looked at the correct referent (i.e., the shape that had been associated with the word during habituation) faster and longer when they had been trained on the sound-­symbolically matching word-­referent pairs than when they had been trained on the sound-­symbolically mismatching pairs. To initiate language learning, infants need to realize that each set of speech sound denotes a concept and has a meaning. The two studies reviewed here (Asano et al. 2015; Imai et al. 2015) together suggest that infants’ ability to detect a non-­arbitrary iconic relation between vision and audition may help them gain this insight.

3. Fast mapping of word meanings

As we saw in the previous section, sound symbolism could help infants hook words to their referents. However, most words do not have a readily detectable correspondence between sound and meaning. Hence, children eventually have to learn to infer the meanings of words when help from sound symbolism is not readily available, and they have to do so in relation to other words they have already learned. Numerous developmental studies have shown that children extract a constellation of cues from previous word learning experience and use that knowledge to infer the meanings of novel words. For example, children know that nouns map to categories of objects and that object names are likely to be generalized by shape similarity, but not by size, texture, or color (Imai, Gentner, and Uchida 1994; Landau,
Smith and Jones 1988). When they hear a novel noun in association with a novel object, children assume that the word refers to the entirety of the object (Markman 1989), and spontaneously generalize it to other objects of similar shape (Imai and Haryu 2001). At the same time, by 2 years of age, children also know that the generalize-by-shape rule does not apply to names of substances (Imai and Gentner 1997; Imai and Mazuka 2007). My collaborators and I have demonstrated that young children can combine these sources of knowledge flexibly in a variety of situations to come up with the most plausible interpretation of a novel word. Imai and Haryu (2001) demonstrated that 2-year-old children can shift their interpretation of a novel name flexibly. When children knew the basic-level name (e.g. cup), and the named object was an artifact (a paper cup with a particular design), they interpreted the novel label as referring to a subordinate category (that is, a particular kind of cup). However, when they heard a novel label together with a familiar animal (a penguin), they interpreted the label as the personal name of the animal. Importantly, when they did not know the basic-level name of the named object, they mapped the newly taught label to the basic-level category. Although it is more difficult to extract strong constraints for inferring the meaning of a new verb, children do their best to make as plausible an inference as possible for verbs as well, by combining whatever sources of knowledge are available to them. For example, they assume that an action word appearing with only a single object word (i.e., an intransitive verb) tends to refer to people’s spontaneous movement (e.g. walking, running, etc.), while an action word co-occurring with two object names (i.e., a transitive verb) tends to refer to an action people perform on an object to affect the state or the nature of the object (e.g. Fisher and Song 2006; Naigles 1990).
However, knowing the difference between transitive and intransitive verbs does not narrow down the possible verb meaning all that strongly, as there are numerous possible ways to generalize a single instance of action within the categories of caused action or spontaneous action. Previous research has indeed shown that preschool children in general have difficulty in generalizing a novel verb. In fact, it has been reported that, across English, Chinese and Japanese, 3-year-old children could not even generalize a novel verb to the very same action when the agent (or the theme object, or the instrument of the action) in the original scene was replaced with a new one (Imai, Haryu and Okada 2005; Imai, Li, Haryu, Okada, Hirsh-Pasek, Golinkoff and Shigematsu 2008). Thus, children need cues to further narrow down the possible verb meanings within transitive or intransitive verbs. We found that sound symbolism could be one such cue. We demonstrated that sound symbolism could help Japanese- and English-speaking 3-year-olds find the semantic invariance for a newly taught verb and help them generalize it (Imai, Kita, Nagumo and Okada 2008; Kantartzis, Imai and Kita 2011). Children were assigned to one of three conditions and were taught novel verbs while observing a person walking in different manners. In the experimental condition, a novel
verb (which had been created by modifying an existing Japanese mimetic word, i.e., a sound symbolic word) was paired with a manner of walking that matched sound-­symbolically. For example, to describe the action of walking quickly with small steps, the novel mimetic choka-­choka was created from the existing Japanese mimetic choko-­choko, and presented as a verb (choka-­choka-shiteru in Japanese, doing choka-­choka in English). Consistent with previous studies (Golinkoff, Jacquet, Hirsh-­Pasek, and Nandakumar 1996; Imai et al. 2005, 2008), Japanese and English 3-year-­olds both failed to generalize a newly taught verb to the identical action performed by a different actor when the sound of the verb did not have sound symbolic correspondence to the action it denoted. However, when the novel verb sound-­symbolically matched the action, not only Japanese 3-year-­olds but also English-­reared 3-year-­olds who were not familiar with the sound symbolic system of Japanese mimetics were able to use this cue to generalize the verb to a new event.

4. Constructing the lexical systems by combining fast and slow learning

As I reviewed in the previous section, young children can come up with the most plausible meaning at the moment, and in so doing, they can recruit different sources of cues and coordinate them flexibly. At the same time, the meaning they first assign to a new word is only tentative. As I pointed out earlier, the acquisition of the meaning of any given word requires delineation of the boundaries between it and its neighboring words. Previous studies have shown that delineating boundaries between words and building up an adult-like representation of the entire conceptual domain is not an easy task for children (e.g. Ameel, Malt and Storms 2008). Here, I review a study that investigated how children might construct a complex lexical system of verbs, using a set of verbs denoting various actions of carrying in Mandarin Chinese as a test case (Saji et al. 2011). Mandarin Chinese makes fine distinctions for a variety of actions which are all denoted by carry (or hold) in English. For example, carrying/holding an object on one’s head is denoted by ding (顶), while carrying/holding an object on one’s shoulder is kang (扛). Carrying/holding an object with two arms is denoted by bao (抱), but if the object is held with one arm at the side of the body, the action is called jia (夹). Several verbs like na (拿), ti (提), and lin (拎) denote carrying/holding actions with one hand, and verb choice depends largely on the shape of the hand holding the object. This description may give readers an impression that these verbs are all mutually contrastive with clear gaps among them, but this is not the case. Just like most semantic domains in any language, one part of the semantic space is densely covered by several close synonyms with overlapping boundaries, while other parts of the space are only sparsely covered with clear gaps between verbs, as Figure 2 shows.


Figure 2. The semantic structure of verbs in the carry/hold domain in Chinese

Saji et al. (2011) provided a comprehensive picture of the process Chinese children go through to eventually acquire this complexly structured semantic system. The verbs that tend to be included in children’s early lexicons are those that they hear most frequently, and the words that are frequently used by adults are usually those which are used broadly and polysemously, as is the case with the Chinese verb na (or, in the case of English speakers, verbs such as go, make or run). Children thus start to use these verbs early and to apply them to a broad range of events. When these frequent and broad-covering verbs have close neighbors that cover a relatively narrow range of referents, the more frequently used ones may be overextended to cover actions that adults would denote with less frequent but more specific ones. As children’s lexical knowledge of the domain develops with the inclusion of more verbs in the domain, the boundaries of the originally overextended verb are gradually modified. The processes of fast mapping and continuous adjustment of meaning thereafter provide sufficient grounds to make my point: that the symbol grounding problem for children is not simply a problem of mapping already divided elements of the world to an elementary set of words; rather, it is a problem of constructing a vastly complex system of symbols. It is important, in this process, that children can use words while the system is still under construction, so that they can communicate. To accomplish this goal, children combine fast learning and slow learning. To begin to construct a system, one first has to have elements (i.e., the words). Children thus attempt to build up a sizable vocabulary as fast as possible by learning rough (and
often inaccurate) meanings of words by fast-­mapping. But they are open to the possibility of modifying the meaning of the word they have learned, and in fact, they continue to update their knowledge of word meanings over many years as they learn new (contrasting) words and witness already learned words in new contexts.
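The interplay of overextension and later adjustment can be sketched in a few lines. The sketch is deliberately crude and is not the analysis of Saji et al. (2011): it assumes exactly one adult verb per way of carrying (ignoring the overlapping near-synonyms described above), treats na as the frequent, broadly used fallback, and uses a hypothetical order of acquisition. The event labels and function names are invented for the illustration.

```python
# Simplified adult usage: one verb per way of carrying. The real domain has
# overlapping near-synonyms (e.g. na, ti and lin for one-handed carrying).
ADULT_VERB = {
    "on_head": "ding",
    "on_shoulder": "kang",
    "in_two_arms": "bao",
    "under_one_arm": "jia",
    "in_one_hand": "na",
}


def child_verb(event, known_verbs, broad_verb="na"):
    """Use the adult verb if the child knows it; otherwise fall back on the
    frequent, broadly used verb (overextension)."""
    adult = ADULT_VERB[event]
    return adult if adult in known_verbs else broad_verb


def overextended_events(known_verbs):
    """Events for which the child's verb differs from the adult's."""
    return sorted(e for e in ADULT_VERB if child_verb(e, known_verbs) != ADULT_VERB[e])


# Hypothetical acquisition order, for illustration only.
stages = [{"na"}, {"na", "bao"}, {"na", "bao", "ding", "kang", "jia"}]
for known in stages:
    print(sorted(known), "->", overextended_events(known))
```

At every stage the child can name every event, so communication is possible throughout; what changes as more specific verbs are learned is that the range of the broad verb narrows toward adult usage.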

5. What cognitive functions make system construction possible?

As argued so far, children learn words as symbols by inferring their meanings from observing a single referent or a small number of referents. Developmental psychologists have long characterized the problem of word meaning inference as a “problem of induction” (e.g. Carey 1982; Carey and Bartlett 1978; Markman 1989; cf. Quine 1960, 1969). However, I maintain that this kind of inference should rather be seen as a form of abduction (cf. Peirce 1931–58). Children do make generalizations from an observation of a single word-referent mapping, which can indeed be thought of as inductive generalizations. However, inductive generalizations need to be constrained, because there are too many ways in which a single exemplar can be generalized (Markman 1989). In inferring word meaning, children attempt to come up with the best possible interpretation of the meaning of a new or an old word, given the information available at that time, using whatever knowledge they have (cf. Hollich et al. 2000; Imai 1999). This is just like the process of hypothesis construction in the domain of science, which Peirce characterized as abductive reasoning. Fast-mapping inference of word meanings allows children to learn words rapidly, which bootstraps them into discovering distributional patterns and the hidden structures of language. Using this knowledge, children can further accelerate the learning of new words. If flexible abductive inference is the key to solving the symbol grounding problem, it may also be the key to the evolution of language, in other words, to the question of why human beings, but not other animal species, have language. There have been attempts to teach language to our closest evolutionary relatives, i.e., chimpanzees.
Although they were able to learn the associations for a number of symbol-referent pairs, researchers were not successful in attempts to get them to use the trained symbols spontaneously as a tool for communication (e.g. Premack and Premack 1972; Terrace, Petitto, Sanders, and Bever 1979/1981). In fact, the way the trained chimpanzees understand the symbol-referent pairs seems to be fundamentally different from the way human children understand them. A study that attempted to teach chimpanzees language (Asano, Kojima, Matsuzawa, Kubota and Murofushi 1982) tells us that, in order to learn the meaning of a word, memorizing the association between a sound and an object is necessary but not sufficient. In this study, a female chimpanzee was trained to associate a set of symbols with color chips. During the training, she was shown a color chip and was taught to choose the symbol that designated that color from a list of symbols. After training, the chimpanzee seemed to have learned the color-symbol pairings, as she was able to choose the correct symbol when shown one of the color chips she had been
trained with. However, when the directionality of the contingency was reversed (i.e., when she was meant to choose the correct color for a given symbol), she failed, indicating that she did not understand that the relation between the color and the symbol was intended to be bi-directional. This situation is analogous to the following scenario: a human child has been taught the name of a novel object, say dax, and she can now pronounce dax when that object is shown; nevertheless, she cannot choose the object from an array of several objects when you request that she hand the dax to you. This would be quite startling from our (human) perspective, and it reveals the prerequisite for language learning: for human infants to initiate language learning, the first thing they need to understand is the bi-directional relationship between symbols (i.e., words) and their referents. Note that from one point of view, the chimpanzee was making a logically sound inference: from the premise that the A-then-B rule is true, one cannot deduce that B-then-A is true. The assumption that a bi-directional relation holds between the word form and its referents thus requires logically unsound heuristic reasoning. Humans have a strong tendency to generalize a learned contingency in the reverse direction. For example, having heard someone say “if X happens, I will come,” and then seeing him actually appear, people naturally infer that he came because X happened, although he may have come for another reason. Language learning seems to become possible when children make a commitment to the non-logical assumption that bi-directional relations hold true between symbols and their referents. Importantly, bi-directional inference goes beyond the relation between A and B. Let us think about the relationship among sound, referent, and orthography. We now have three items. A child learns the bi-directional relationship between the sound inu and its referent DOG.
At a different time, she may also learn the bi-directional relationship between the sound inu and the character 犬. Now, the child does not need to make any further effort to learn the relationship between 犬 and DOG. Because she has learned that inu⇒DOG and inu⇒犬, she automatically assumes that DOG⇒inu and 犬⇒inu; furthermore, 犬⇒DOG and DOG⇒犬. In other words, the relation between the character and the referent comes for free. Thus, if one has a bias to assume bi-directionality, learning two one-directional relations (e.g. A→B and A→C) leads to the learning of four other relations (B→A, C→A, B→C, C→B) for free. Obviously, this is an extremely efficient learning mechanism, even though it could sometimes cause errors. Numerous studies with various animal species have shown that non-human animals rarely generalize a learned contingency to the reverse direction (e.g. D’Amato, Salmon, Loukas, and Tomie 1985; Sidman, Rauzin, Lazar, Cunningham, Tailby and Carrigan 1982; see Lionello-DeNolf 2009 for a review), as the above example of the chimpanzee Ai shows. Other researchers tested whether two chimpanzees who had been extensively trained on lexigram-referent associations in both directions (lexigram⇒referent and referent⇒lexigram) could take advantage of this experience to learn bi-directional contingencies for a new set of lexigram-referent pairs (Dugdale and Lowe 2000). These two animals were trained to match a shape to a color, and they were able to learn the associations in the direction in which they were trained. However, when



they were tested with the same shape-color associations in the reverse direction, their performance was at the chance level. Thus, chimpanzees are not likely to generalize the learned contingency in one direction to the other direction, even if they have been trained on contingencies in both directions in other stimulus sets. A question that is extremely important in considering the ontogenesis of language is whether human infants possess the bi-directional reasoning bias prior to language learning. If they do, the presence of a bias to assume a bi-directional relation may be something unique to human beings and we might even be bold enough to maintain that it is because of this bias that human children are able to learn language. To investigate this possibility, in my laboratory we compared 8-month-old human infants and adult chimpanzees with regard to the bi-directional reasoning bias in a non-linguistic context, using the same stimuli and a comparable experimental procedure (the preferential looking paradigm) across the two species (Murai, Miyazaki, Tomonaga, Okada and Imai 2014; Imai, Miyazaki, Murai, Tomonaga and Okada, in preparation). In this study, both species watched ‘A(object)⇒B(motion)’ contingencies until they were habituated. Specifically, one object (a toy dog) appeared on the monitor. It then shrank into a dot and moved in a zigzag path. Likewise, another object (a toy dragon) always moved in a curvy path. After participants (human babies and chimpanzees) were habituated, they received test trials in the reverse direction, i.e., a motion followed by an object. In half of the test trials, the motion-object pairings were kept, e.g. B (zigzag motion) ⇒ A (toy dog); in the other half, the object followed the motion that was not paired with it during familiarization, e.g. B (zigzag motion) ⇒ not-A (toy dragon) (see Figure 3).
We expected that human infants but not chimpanzees would show the bi-directional reasoning bias, detecting an anomaly in the ‘B⇒not-A’ contingencies.

Figure 3. The design of the comparative study testing the bi-directional bias in human infants and chimpanzees



As expected, the human infants showed the bi-directional reasoning bias. At test trials, they looked at the object longer when it followed the motion that was not paired with it during the familiarization (B1⇒A2, B2⇒A1) than when it followed the originally paired motion (B1⇒A1, B2⇒A2). In contrast, when the direction was reversed, the chimpanzees did not show any difference in looking times between the trained association and the switched association. We checked whether the chimpanzees had learned the contingency at all, comparing the looking times for the trained contingencies in the original order (e.g. A1⇒B1, A2⇒B2) and the looking times for the pairs in which the pairing was switched (e.g. A1⇒B2, A2⇒B1). The chimpanzees were able to detect the violation this time. Thus, although they had indeed learned the contingencies during the training, they did not (over-)generalize the learned contingencies to the other direction. These results suggest that the bi-directional reasoning bias is present in human infants before they begin active word learning but that this bias is not present in non-human animals. As discussed, the bi-directional reasoning bias can produce fast and efficient learning. Unlike chimpanzees, who are only able to learn a few hundred lexigram-referent pairs at most, human children learn thousands of words, words that are all connected and integrated into the system of the lexicon. Constructing this lexicon is only possible through a bootstrapping mechanism: a learner starts small, using biologically endowed perceptual abilities such as multi-modal mapping ability, to learn some initial elements, i.e., sound-referent associations. From there, children infer extensions of individual words. At the same time, as soon as a certain number of elements are learned, they begin to construct hypotheses about the nature and structure of the lexicon to make further learning more effective.
I conjecture that the bi-­directional reasoning bias may be crucially related to the root of flexible heuristic thinking in human beings, and hence, to the ontogenesis as well as phylogenesis of language. Heuristic thinking using a variety of sources of knowledge for abductive inference further enables children to advance their knowledge almost limitlessly through a continuous bootstrapping process.
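The “for free” relations in the inu / DOG / 犬 example discussed earlier in this section can be made concrete with a minimal sketch. It assumes, purely for illustration, that the bi-directional bias amounts to reversing every learned pair and then chaining pairs that share an element; the function name `bidirectional_closure` is hypothetical and is not a model proposed in this chapter.

```python
from itertools import product


def bidirectional_closure(learned):
    """Return all directed relations a learner with a bi-directional
    bias would hold, given a set of learned (source, target) pairs.

    Illustrative assumption: the bias is modeled as (1) reversing every
    learned pair and (2) chaining pairs that share a middle element.
    """
    relations = set(learned)
    # Step 1: assume every learned relation also holds in reverse.
    relations |= {(b, a) for a, b in relations}
    # Step 2: chain relations (x -> y, y -> z gives x -> z) until stable.
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(relations), repeat=2):
            if b == c and a != d and (a, d) not in relations:
                relations.add((a, d))
                changed = True
    return relations


# Two relations are actually taught: sound -> referent, sound -> character.
learned = {("inu", "DOG"), ("inu", "犬")}
derived = bidirectional_closure(learned) - learned

for source, target in sorted(derived):
    print(f"{source} => {target}")
```

Under these assumptions the two taught relations yield four further ones (DOG⇒inu, 犬⇒inu, DOG⇒犬, 犬⇒DOG), matching the count given in the text. A learner without the reversal step, as the chimpanzee data suggest, would hold only the two taught relations.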

6. Conclusions

Some developmental scientists, especially those who adopt a connectionist approach (e.g. Rogers and McClelland 2004; Hills, Maouene, Riordan and Smith 2010; McClelland, Botvinick, Noelle, Plaut, Rogers, Seidenberg, and Smith 2010), characterize abductive reasoning as association learning. However, although the associationist view might be able to explain how children could extract statistical patterns in language, it is doubtful whether it can explain the entire bootstrapping process through which children not only ground individual symbols per se but integrate an entire system of symbols within their internal representations. As I have shown in this chapter, one should specify at least the following three points in re-thinking “the symbol grounding problem”: (1) how children make reasonably constrained abductive inferences about the meanings of words; (2) how they discover subsystems



of language and how they integrate these subsystems into a vastly complex system of the lexicon; and (3) what cognitive functions make this possible. To recap, Harnad raised the symbol grounding problem to criticize the so-called “symbolic AI” approach, i.e. the view of intelligence as the manipulation of physical symbols (e.g. Newell 1974), and expressed high expectations concerning the connectionist approach. During the past two decades, machine learning technologies have advanced to such an extent that machine intelligence has now surpassed human performance in a number of domains (e.g. board games such as chess, shogi, Othello; and search engines such as Google) and people are even concerned about the “singularity problem” (Kurzweil 2006). However, it is not likely that non-symbolic approaches with current technologies such as deep learning (e.g. Le 2012) can deal with the symbol grounding problem. Such approaches can certainly excavate hidden structure by mining a huge amount of data that humans can never handle. They can discover new knowledge, as long as the domain (the problem space) is defined a priori (McCarthy and Hayes 1969; Dennett 1987). The question is whether machines can ever discover a domain or create a new domain by themselves, as human children do so naturally. Currently, researchers in natural language processing work in specific subsystems of language such as speech recognition, speech production, morphology learning, structure building of sentences, semantic ambiguity resolution, interpretation of word meanings, etc. However, children do not learn language separately within a prefixed domain. They need to learn rules for various aspects of language, including phonology, grammar, semantics, and pragmatics. Parents do not tell children in advance that they need to find rules for this or that aspect of language.
Children discover the domains, learn elements and relations among elements within the domains, and spontaneously integrate different domains by abduction. Thus, the reformulated symbol grounding problem for AI now is how a machine can discover domains when it is placed in an unbounded and unlimited environment as we are, and how it can build up an integrated system of knowledge that continuously changes and grows as it discovers new information in the world, all on its own. Whether that is possible at all, or whether conscious human minds are needed for this feat, as argued within cognitive semiotics (e.g. Zlatev 2002), remains to be seen.

Acknowledgements

This research was supported by MEXT KAKENHI (#15300088, #22243043, Grant-in-Aid for Scientific Research on Innovative Areas #23120003). I would like to thank the reviewers and Jordan Zlatev for their insightful comments on earlier versions of the manuscript.

Keith E. Nelson

Chapter 9
Key Roles of Found Symbolic Objects in Hominin Physical and Cultural Evolution: The Found Symbol Hypothesis

1. Introduction

A fundamental, intriguing, but as yet unanswered evolutionary question is when any species first began to use intentional symbols. The definition of a full symbol adopted here is that one object or action or object-plus-action is used intentionally to represent a referent that is distinct from the symbol. In this chapter, these full symbols are distinguished from artefacts and materials that are non-utilitarian and often termed “symbolic” in function in a broad sense, such as red or black ochre blocks or beads with apparent decorative value (Bouzouggar et al. 2007; D’Errico, Henshilwood, Vanhaeren, and Van Niekerk 2005; Vanhaeren et al. 2006; Marean et al. 2007; Marean 2011). Further, full symbols refer to categories of referent instances, as when a nut that resembles a fish is used to refer symbolically to a diverse range of fish. On the other hand, there is no requirement for these symbols to be “arbitrary”, and as we will see, their reference is typically based on iconicity.

Figure 1. Portable contemporary, potential symbols for “faces” all within hard black walnut shells opened by squirrels and ground squirrels who ate the walnut meat and left varied shapes through chewing on the shells.



In this chapter I propose the novel hypothesis that the first significant use of full symbols referring to specific inanimate objects, animate beings, actions, scenes, reactions, or events by any species occurred when members of the species began identifying and sharing found symbols. Figures 1 and 2 provide examples of natural objects which could readily serve as found symbols, and multiple additional examples occur in later sections of the chapter.

Figure 2. Portable contemporary, potential symbol for “vulva,” Cowrie (Cypraea) shell with natural opening.

Once individuals experienced shared symbolic meaning for a pattern found in nature they were likely to then notice many other such patterns in nature. Shells would have patterns which some clan members would perceive as partially resembling faces or bodies of monkeys or people. Rocks and fossils similarly would have patterns which some clan members would perceive as partially resembling vulvas, penises, lions, snakes, trees, faces, rivers, hands, skulls, deer, fish, and so on. The key points are that a species that had never made any symbols would discover, in naturally available patterns in nature, examples that would evoke in their minds reference to a conspecific, a tree, a lion, a breast, a common prey, whatever, and that once they drew the attention of others to the symbol/referent relationship the shared symbolic reference would then occur. This shared referencing would require achieving an understanding equivalent to “this is not literally a lion, but in some ways it is shaped like a lion, so when we give our joint attention to this object we will think of lions.” The clarity of this shared meaning could be deepened in some instances by pointing to a lion and then pointing to the “lion object” and nodding, presuming shared meaning of these gestures. In this manner hominin



clans who had never made a symbol of any kind, and whose skills in signed or spoken language or visual image creation were absent or very limited, were able to use natural patterns as the gateway to symbolic communication using a variety of externally available patterns. This way of entering into symbolic activity would almost certainly require less sophisticated brains and culture than creating symbolic art through careful planning and tool use. By this account, the creation of explicit full symbols through tool use may have depended upon and followed after a period of hominin use only of found external symbols. If you and I were to reach agreement on “angry face” as the symbolic meaning of the faces/nuts in Figure 1, we would need to have some back and forth negotiation to establish that shared meaning. Some kind of similar negotiation would also be required if two or more early hominins were to reach agreement on the symbolic meaning of a found object. We can speculate on whether early-stage communicative behaviours involving proto-symbols, imitation, pantomime, and gesture were involved in such negotiations. Nevertheless, the mere presence of a shared, transported object found in hominin settlement/activity sites, with no transparent utilitarian value/function, but with a definite iconic (i.e. resemblance-based) relation to other objects/beings/actions, raises the distinct probability that shared, intentional symbolic communication was achieved using the found object as a symbolic vehicle. Bouissac (2004: 1) has highlighted the heavy reliance on inferences for interpreting the functions of prehistoric artefacts:

While anthropologists usually can assess the functions of most [contemporary] artefacts by correlating them with specific, observable behaviours, prehistorian archaeologists must construct hypothetical behaviours which can never be verified.

This caveat applies to all prehistoric objects with potential interpretations as symbolic (Bouissac 2000). In the case of Palaeolithic portable statuettes known as “Venus figures” what is common for most modern commentators is a set of inferences that such figures had value sufficient to warrant their manufacture and transport. As modern authors readily “see” a representation of female human bodies in these figures, they assume that they also had a similar symbolic meaning 20,000 to 40,000 years ago for the local clans. However, it is hard to demonstrate how the statuettes fit into religious, planning, ritual, mate selection, or trading aspects of culture at the time (Sonesson 1994). Note, though, that if these statuettes do represent female human beings the shared interpretation relies upon a modest set of visual features that carry the symbolic interpretation – most either lack heads entirely or incorporate heads with quite limited facial features, but emphasize breasts, hips, and vulvas. Similarly, the oldest known “found symbol” candidate is a pebble (see below) that many modern authors readily see as a representation of a person even though it appears as a body-less face. Again, social negotiation of the particular meaning holding between symbol and referent determines whether an object with only some of the features of a person refers to a whole person, some part/s of a



person, a person with a particular emotion or attitude, the sexiness or fecundity of the person, and so on. This latter point raises a broader one: both in prehistoric artefacts and in modern visual symbolic artefacts there exists a wide range of iconicity levels; successful visual symbols sometimes rely upon a small subset of features with iconic correspondence to features in the referent while other successful visual symbols incorporate most/all of the most salient visual features of the referent. In writing about the possible stages of proto-mimetic, proto-symbolic and symbolic behaviour, theorists such as Sonesson (2006) and Zlatev (2014b) provide valuable discussion of the differences between highly iconic versus weakly iconic (“secondary” iconic) versus purely arbitrary relations between symbol/sign-expression and referent. For the purpose of the present chapter, we use patterns of actual discovered prehistoric objects to determine what varied levels of visual iconicity and symbol/referent relationships may have been part of symbolic visual communication during different time periods between 2 million and 100,000 BP. We also note that some theorists have suggested that before spoken or signed language developed there were extensive “mimetic” social exchanges incorporating pointing, gesturing, pantomime and the like (e.g. Donald 1991; Zlatev 2008a). The use of found symbols in relation to these other social communicative devices would seem complementary throughout hominin evolution, and indeed we suggest that negotiation of the shared meaning of a found symbol often would rely on these devices (cf. Deacon 1997). Once shared symbolic meaning was established, a symbolic seashell or fossil or nut would have special salience in pre-linguistic communication as a way of referring to specific entities in their absence.
In archaeological terms, of course, unworked natural objects serving as symbols can function as a material marker that so far is simply not available in any form for prehistoric instances of spoken language, pointing, or pantomime.

2. Objects with symbolic significance at archaeological sites/layers dating to 200,000 or more years ago

Here we consider just three natural objects which have been interpreted as having symbolic value because in their natural state they bear resemblance to human figures or parts of human figures and because they were transported and valued by early hominins at established sites for hominin activity. Before our first case of this sort, it is essential to state again that if sense is to be made of the archaeological record for symbolic activity it is crucial to use consistent criteria for assigning symbolic values regardless of the age of an artefact/object. Bednarik (2003: 406) drives home this point strongly.

The greatest schism that has developed in our concepts of hominid evolution concerns the antithetical positions of “long-range” and “short-range” theories of the cognitive development of humans. Sometimes called the gradualist and the discontinuist models, they perceive two entirely different paths of non-physical human evolution. The currently still-dominant short-range model, which perceives the use of symbolism, language, and paleoart to be limited to the last quarter of the Late Pleistocene, survives by rejecting every instance of earlier evidence of this kind or by explaining it away as a “running ahead of time.”

Figure 3. The Makapansgat Pebble from southern Africa

The prehistoric object shown in Figure 3 is the Makapansgat jasperite cobble from South Africa, excavated in 1925. It is dated to 2–3 million years Before Present (BP) (Bednarik 1998, 2003, 2008). It had been transported to the cave where it was found, a site occupied at least by Australopithecines but perhaps also by early hominins. Its resemblance to a human face is noted by virtually all commentators on the find. But for the most part it has not been integrated into prominent accounts of hominin evolution, instead being either ignored completely or treated as a strange “curiosity.” This second prehistoric object (Figure 4) is seen by modern commentators as highly similar in appearance to a human penis. Dated at 200,000 to 300,000 yr. BP, it was found near Erfoud, Morocco. This apparent penis symbol is a fossil (cuttlefish) form that had been transported to a hominin site rich with stone tools. The TanTan Venus figurine shown in Figure 5, from about 400,000 yr. BP and from Morocco, is also among the earliest exemplars of objects that may be found symbols (Bednarik 2003). It is a naturally formed stone shape that bears 8 grooves, 5 of which have been emphasized by impact, and it had been coated in bright red ochre. As with the Venus of Willendorf and various other Venus figures from the Upper Palaeolithic there are no facial features. Nevertheless, it readily serves as a potential symbol of a human figure with a head, torso, arms, and legs (Bednarik 2003).



Figure 4. Cuttlefish fossil resembling a human penis

I propose that we treat these examples of probable symbols in the same way that most commentators have treated the Venus of Willendorf, the Venus of Hohle Fels, and other Upper Palaeolithic Venus statuettes: as potentially legitimate exemplars of active, intentional and socially-shared use of visual patterning in a physical object to represent an absent referent. The cognitive and social processes for shared reference-making remain largely the same regardless of whether the origin of a symbol is a found natural object or a crafted pattern in stone, antler, bone, or wood. Viewers of any such symbol need only to make sense of the symbol-to-referent meaning; they do not need to know how an object was procured from nature or created step-by-step by a craftsman.

Figure 5. The Moroccan Tan Tan figurine



Figures 6 and 7 illustrate two additional objects with symbolic value from contemporary natural contexts. Together with all the above examples of found symbolic objects they serve the point that in any local context in which hominins were evolving there would be an abundance of varied natural objects with high potential as symbolic vehicles.

Figure 6. Portable contemporary fossil; potentially a symbol for live “snake”

Figure 7. Non-­portable potential symbol for “horned animal” in partially-­decayed log

This proposal fits within a developmental theoretical framework my colleagues and I have called a Dynamic Tricky Mix (Nelson 1980, 2006; Nelson et al. 2004; Nelson and Arkenberg 2008). Whether the progression occurs in prehistory or in the life of a modern-­day young child, to move from pre-­symbolic thinking to some first forms of symbolic thinking requires the tricky convergence of multiple conditions. At a minimum, these include convergence in short real-­time social discourse episodes, foundational levels of sophistication in (a) visual processing, (b) pattern detection, (c) rapid memory storage and retrieval, (d) pattern-­comparison processes, (e) attentional regulation, (f) joint attention and social engagement. Subsequent sections of this chapter bring in multiple domains of inquiry to establish a strong likelihood



that all the necessary conditions for moving from pre-­symbolic thinking to rapid symbolic thinking are: (1) lacking in contemporary nonhuman primates and in human infants or in early hominins whose brain development had not yet reached a 600 gram size, but (2) present both for early hominins and for children when brain sizes (and accompanying sophistication) exceed 600 grams. Taking all the lines of evidence together within a cognitive semiotics approach, we see then a powerful basis for interpreting the above instances of symbolizing with found natural objects as evidence for sophisticated referential thinking rather than some simpler level of non-­symbolic thinking. Further, the integrated lines of evidence lead to multiple specific predictions about what new exemplars of visual symbolizing in prehistory can be found if research programs systematically and without prejudice explore existing and new archaeological collections of objects.

3. Brain development, brain sizes, and brain organization

Working within this found symbol hypothesis, this chapter brings together for discussion key estimates of brain size in early hominins and brain sizes at successive developmental points for young children and for modern, nonhuman primates; in all these cases there are reasonable and converging measurements on brain sizes across investigations. Together these observations help frame estimates of the probable biological neural foundations that would be necessary for the onset of symbolic behavior, including the social sharing of found visual symbols in early hominins.

3.1. Brain sizes for children, modern primates, and early hominins

Let us first examine estimates of brain size and organization for human children under 27 months of age and for trained nonhuman primates who have shown acquisition of symbols, which they have encountered through planned experimental exposure. The onset of symbol comprehension and production in children typically arises between 8 and 26 months, with wide individual differences in timing. During this same period there are rapid increases in brain size. In some data, the average at 1 month is 332 cc and has increased to 798 cc by 8 months, to 860 cc by 12 months, to 900 cc at 15 months, to 1064 cc by 21 months, and to 1080 cc by 26 months (Schoenemann 2006). It is also of critical importance that for children between 8 and 26 months there are many structural and functional advances in the brain that accompany the increases in brain size. The most extensive evidence on brain development among the bonobo, chimpanzee, and gorilla species serving in experimental work on exposure to symbols comes from the common chimpanzee. Estimated chimpanzee brain size at 8 months is about 253 cc and increases slightly to about 286 cc at 21 months and then to 337 cc at maturity (DeSilva and Lesnik 2006; Schoenemann 2006; Dunsworth, Warrener, Deacon, Ellison and Pontzer 2012). Clearly then, for nonhuman primates



where experimental evidence of symbol acquisition (at quite slow rates compared to children) has been provided by research, brain size is fairly comparable to that of human infants at about 2 months, before even the earliest stages of children’s symbol production or comprehension. Fossil evidence gives some reasonable estimates of brain sizes for early hominin species. Assuming that, as in modern humans, infants in such species reached 80% of their mature brain size by ~21 months, here are the relevant estimates for three species using only fossils dated to less than 500,000 BP: H. erectus, 21 mo. 960 cc and 1200 cc adult; H. heidelbergensis, 21 mo. 1040 cc and 1300 cc adult; H. sapiens neanderthalensis, 21 mo. 1120 cc and 1400 cc adult. In all three of these species, then, the estimated neural readiness (if supportive conditions arose) for learning to use symbols by 21 months of age (or later) as judged by brain size alone is remarkably similar to the ~1000 to 1100 cc typical for extant human infants at 21–26 months. As reviewed below, for such human infants there is strong and convergent evidence that rapid learning of many kinds of symbolic forms and symbol/referent relations is within their capabilities (cf. Imai, this volume). Authors from many backgrounds have stressed that the above changes in the size and functional complexity of the brain both across primate species (although for non-primate species behaviour/brain relations are less clear) and across the months and years of early human development serve to support increasingly greater advances in cognitive complexity (Calvin 1989, 2006; Elman et al. 1996; Deacon 1997; Stiles 2001; Diamond 2002; Paterson, Heim, Friedman, Choudhury and Benasich 2006; Schoenemann 2006; Gao et al. 2009; Herculano-Houzel 2009; Stiles and Jernigan 2010; Sporns 2011; Stringer 2012).
In part, this can be understood in terms of more and more processing units in the brain interacting in far more combinations and at greater speed to support increasing behavioural complexity in, among other behaviours, language and other symbolic modes, social cooperative mindful behaviours, pattern-detection, planning and problem-solving. A related important behavioural development by 1.5 to 2 million years BP, with highly convergent evidence across sites and investigators, is the deployment of sufficiently sophisticated pattern-detection processes to select rock cores which are transported and then struck by selected smaller rocks to produce flakes which are then selected/rejected as adequate for tool use as knives, blades, and scrapers (Donald 1991; Noble and Davidson 1996; Deacon 1997). Going beyond these intentional, selective, and shared behaviours to also intentionally select and transport objects lying about in nature interpreted as shared symbols rather than tools would not necessarily require any new brain capacities. Instead, rapid symbolic advances could rest upon the already-existing visual pattern abstraction and sorting, joint attention, analogic, memory, and social-sharing capacities of early hominin clans. We further propose that 8- to 12-month-old human infants as well as chimpanzees and gorillas have brain readiness sufficient to support the interpretation of certain selected natural objects such as patterns in shells, rocks, fossils, and parts of trees as “found” symbolic representations. For wild chimpanzees and gorillas, it appears that their evolutionary path did not exploit their brain readiness for



communicating through external visual symbols. In contrast, the evolutionary path from nonhuman primates toward early hominins might have been able to exploit such readiness, as we can see with present-­day children in their second year of life.
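The 21-month estimates given in section 3.1 follow directly from the stated assumption that infants reached 80% of adult brain size by about 21 months. The short sketch below reproduces that arithmetic from the adult values quoted in the text; the variable and function names are illustrative only, and the comparison range of 1000 to 1100 cc is the approximate figure the text gives for extant human infants at 21 to 26 months.

```python
# Adult brain-size estimates (cc) as quoted in section 3.1.
ADULT_CC = {
    "H. erectus": 1200,
    "H. heidelbergensis": 1300,
    "H. sapiens neanderthalensis": 1400,
}

# Assumption stated in the text: 80% of adult size is reached by ~21 months.
PERCENT_AT_21_MONTHS = 80


def size_at_21_months(adult_cc, percent=PERCENT_AT_21_MONTHS):
    """Estimated brain size (cc) at ~21 months, using integer arithmetic."""
    return adult_cc * percent // 100


estimates = {species: size_at_21_months(cc) for species, cc in ADULT_CC.items()}

for species, cc in estimates.items():
    print(f"{species}: {cc} cc at ~21 months")
```

The results (960, 1040 and 1120 cc) match the figures in the text and lie close to the roughly 1000 to 1100 cc range reported for present-day children at the same age, which is the basis for the comparison drawn above.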

4. Experimental documentation of symbol learning by young children

In many different experimental studies children between 12 months and 26 months of age have been shown to learn symbols for totally invented concepts (e.g. Blicket, Fiffin, Bloop) as well as for conventionally-named objects, actions, and events (e.g. Rabbit, Cat, Lobster, Fish, Hedgehog, Chase). Furthermore, each of the following kinds of symbols has been acquired by some children in this age range: the conventional signs of signed languages, symbolic gestures, spoken words, written words, photographs, artist illustrations, pictograms, and arbitrary geometric patterns on communication boards (Söderbergh 1977; Nelson 1982, 2006; Nelson, Craven, Xuan, and Arkenberg 2004). The brains of young children at 12 to 26 months are clearly open to the development of successful symbolic communication through a wide range of symbolic vehicles. Consider those children whose first comprehension of certain symbols – such as a photo of a lobster, understood as representing a lobster, or a photo of a blicket understood as representing a novel 3-dimensional object – occurred for photographs (or artist illustrations) when the children reached 15 to 24 months of age. Typically, children also display comprehension of a linguistic symbol/word for the same referent. In these cases, it seems evident that both the spoken word and the two-dimensional photo are serving as “found” symbols whose meaning emerges in social discourse, but whose origin/manufacture is unclear to the infant. Although this line of research has not been integrated well with evolutionary discussions of symbol processes before, it is of high relevance (cf. Zlatev et al. 2013).
In line with other investigators, DeLoache and Ganea (2009) give a clear interpretation of children’s symbolic behaviour under well-specified experimental conditions: “some children as young as 15 months of age interpret pictures symbolically, and children’s appropriate interpretation of the referential nature of pictures increases gradually with age” (2009: 276). Highly controlled experimental procedures with human infants and toddlers are of special importance to the conclusions regarding early symbols and their precursors in behavioural evolution. These procedures reveal whether the infants and toddlers have the capability despite brain immaturity to acquire symbols when given limited, specific sets of learning opportunities. When particular studies have used invented concepts (e.g. Blicket, Fiffin, Bloop, Bandock, Weedle) with records of all contexts/durations of exposure to concepts and their symbolic labels, if a child does learn symbol production and comprehension then the entire experiential base for that learning is well documented. Surprisingly, by 15 months or before many children are able to learn a new concept from a single labelled exemplar (object, picture) in an episode lasting just 5 to 15 minutes and are then able to show knowledge of the concept and its symbolic label when a second or third example is presented for the very first time many days later. Accordingly, children as young as 15 months of age are able to abstract the characteristics of a concept along with its spoken lexical name from a single brief occasion, carry this information in long-term memory across 4 days to 2 weeks, and then appropriately apply the vocabulary item to other, newly-encountered examples (Nelson 1982; Nelson et al. 2004). Some studies only look at learning within one brief session and so do not test long-term memory, but they also are able to demonstrate that children at 15 to 26 months can map a variety of symbols to novel objects/concepts: spoken words, gestures, and pictograms (DeLoache and Ganea 2009; Namy 2009). For some well-documented children, their language growth at 15–26 months is bilingual and equivalent in rate and complexity for two different modes of language, American Sign Language and English (Bonvillian and Folven 1993; Prinz and Prinz 1981).

5. Conclusions and predictions

The incorporation of brain development findings and symbol experimentation studies for human infants and toddlers may be compared to an approach recently taken for studying the possible evolutionary patterns in the transition from dinosaur species to bird species. Heers (2014) has added insights in this regard by studying juvenile rather than adult contemporary birds. Her research demonstrates that the immature feathers of young birds do have adaptive value in facilitating more successful movements in certain spatial contexts even though flight itself is not yet possible, just as it may have occurred for dinosaur species with similar kinds of symmetric simple feathers which later evolved into full asymmetric strong-quilled feathers on birds. Similarly, we here argue that before the immature brains of children are of sufficient size and complexity to support complex combinations of symbols at the level of sentences, the brain capacities are nevertheless sufficient for simpler aspects of symbolic behaviour. Moreover, the evidence (both experimental and naturalistic) suggests that a brain size of about 600 grams is necessary for any such symbolic behaviour and that as infant brain sizes increase successively between 600 and 1100 grams the complexity of symbolic production and comprehension achievable increases in parallel. For early hominins in the period between 2 million and 200,000 yr. BP we would argue that as brain sizes (and complexities) increase successively between 600 and 1100 grams the necessary neural foundations emerge for gradually increasing the complexity of symbolic production and comprehension. This enables possible new documentation in any form of when and where any early hominins actually realized particular levels of symbolic behaviour. The present chapter has reviewed key findings to date, and below generates predictions regarding future discoveries.
In the early stages of hominin physical and cultural evolution, communication with found symbols would not have required any tool use or complex planning leading to the creation of new symbolic objects, and so potentially could have occurred before brain sizes and organization in early hominins were adequate for complex communication sequences in any sign, gesture, speech, art, or multimodal forms. Individuals and groups with greater symbolic communication success would be able to better avoid predator attacks, to better find needed shelter, better find food and water and other resources, and in turn would have higher survival and child-bearing rates. To the extent that greater symbolic success rested upon biological differences in brain size and complexity, then what may have ensued were cycles of co-evolutionary increases in brain size and complexity accompanied by increasing use of external symbols in more and more contexts (Deacon 1997; Bednarik 2006). Across many generations, in certain lineages, the brain size and numbers of neurons and brain organization for certain hominin groups could have reached the level of modern human infants at approximately 15 to 18 months of age without any cultural use of syntactically governed signed language, speech, or gesture (cf. Donald 1991). This line of argument leaves open the possibility stressed by Tomasello (2008) that increases in social cooperation, in understanding mental states and social grounds, and in complex gesturing events were fundamental at some point in the evolutionary path to syntactically governed signed language and speech. Yet this account is not inconsistent with the found symbol hypothesis, according to which found natural symbols also contributed to the origins of complex human language. It remains for new kinds of future evidence to clarify whether using natural objects as symbols preceded, accompanied, or followed the origin of complex, socially-shared gesturing. One view related to these issues is that for the purposes of archaeology, it is convenient to define language as communication using symbols (Noble and Davidson 1996).
Accordingly, one possibility is that communication using found symbols or found symbols combined with symbolic gesturing may have constituted the first such “language.” Furthermore, the found-symbol hypothesis has the heuristic value of generating multiple new predictions. Putting the lines of evidence and cross-disciplinary discussion presented in this chapter together, the following predictions can be made about likely future findings on hominin physical and cultural evolution.

1. Review of objects in museum collections and discovery of new artefacts at archaeological sites will reveal many previously unreported objects of symbolic value which were in no way produced/shaped by hominins, dating to the period between 2 million yr. BP and 100,000 yr. BP. Thus, potential support for the found symbol hypothesis could include new findings from multiple archaeological sites in Africa, Asia, and Europe of portable natural objects which remained unworked by tools but were transported and saved at locations with other evidence of hominin activity. These findings would have a bias in the saved shells, stones, fossils, tree nuts, bones and other objects toward those with particular features/patterns that strongly resemble and thus can symbolically represent real-world referents of multiple kinds. Note that a strong degree of resemblance (iconicity) serving symbolic processes need not rest upon the found object having a great many features: even 2 or 3 key features in combination may clearly serve as reference to a camel, a rabbit, a fish, a bear, and so on, as long as there is the inter-person active negotiation of specific symbolic reference as discussed earlier.
2. The number and types of such symbols identified will increase substantially as time markers advance from 2 million yr. BP toward 100,000 yr. BP. Moreover, the diversity of such symbols will be impressive and will not be confined just to referents of probable high relevance to hominins, such as faces or bodies.
3. At sites where tool-worked, created symbols have been reported between 100,000 yr. BP and 20,000 yr. BP, closer examination of other objects will reveal many previously unreported, unworked objects of symbolic value.
4. At sites dated to 20,000 yr. BP and older, closer examination of many tool-worked objects will reveal many previously unreported “hybrid” objects which incorporate the overall shapes of the objects and/or interior smaller shapes, natural shapes which are also intentionally chosen for their found symbolic values. Such hybrid objects have been reported from many locales during the Upper Palaeolithic period, but they have most often been interpreted as “curiosities” with a likely selection bias against reporting them (cf. Bednarik 1992). A new level of interest in finding and discussing such hybrids follows from the fundamental importance of found symbolic patterns proposed in the present found symbol hypothesis.

Similarly, the present hypothesis and discussion and predictions fit well with certain other “curiosities” that have been reported from multiple continents in the period 400,000 to 2 million years BP: namely, many carefully-crafted “hand axes” which are too large to be functional tools, and deserve closer examination and reflection to uncover which kinds of shared symbolic value they most likely served. To conclude, the growth in the number and variety of found symbols in the period 2 million yr. BP to 100,000 yr. BP would provide a material evidence basis for changes in symbolic behaviour that helped drive the co-evolution of larger more complex brains and increasingly complex symbolic behaviour. This co-evolution during the period 2 million to 100,000 years BP would involve higher survival rates and higher numbers of offspring for those clans who adopted more extensive symbolic behaviours. For example, in many contexts symbolic communication would have led to fewer deaths, more successes in finding food, and more successes in finding shelter and water, and thus genetic selection processes would support advances in brain sophistication and drive even more extensive symbolic behaviours in successive generations. This would help solve the puzzle of why, during the majority of this period, the sophistication of worked tools changed very little, yet substantial and crucial increases in brain size and sophistication occurred. If the predictions outlined above are confirmed, the patterns of found symbols geographically across sites and across time periods would help build an empirically based stage account of gradual changes in the complexity of symbolization. Whatever the outcomes, investigating the availability of found natural symbols throughout prehistory will enhance evolutionary knowledge of hominin cognition and communication. By the same token, the observed empirical patterns together with other new archaeological investigations will allow an evaluation of a central tenet of this chapter: that natural objects which attained (through social negotiation) symbolic value were the first form of explicit and specific symbolic referring to develop among our hominin ancestors.

Acknowledgments

The author thanks the School of Liberal Arts at Penn State University for monetary and other assistance, and the reviewers and Jordan Zlatev for their helpful comments. In addition, he appreciates the rich discussions with Jill Cook of the British Museum, Mark Shriver in Anthropology at Penn State University, and Terrence Deacon in Anthropology at The University of California, Berkeley. Figures 3, 4, and 5 are used with the permission of Robert Bednarik, owner of the copyrights. All other figures copyrighted and used with the permission of Keith E. Nelson.

Francesco Ferretti & Ines Adornetti

Chapter 10
Mindreading, Mind-travelling and the Proto-discursive Origins of Language

1. Introduction1

The idea that syntax is the essence of human language is a conceptual construct that was strongly supported in the 20th century by the fathers of so-called classical cognitive science. According to Chomsky (e.g. 1980, 1986), the devices at the foundation of the language faculty are those that elaborate the constituent structure of sentences, and according to Fodor (1975, 2008), the predicative structure of the sentence reflects the propositional structure of the Language of Thought. Underpinning the primacy of the sentence in these authors’ work is a way to propose a specific conception of language and cognition. In fact, the idea that language competence is a device that analyses the shape (syntax) of symbols regardless of their content and the relationship between the uttered expression and its context is part of a broader conception of how to analyse the study of the mind in classical cognitive science. Action-oriented perspectives (e.g. Barsalou 2008; Clark 1997; Varela et al. 1991) have criticized this standard position of cognitive science, rejecting the view of cognition as computations on amodal symbols, independent of the brain’s sensori-motor systems for the perception of an action. These perspectives, with their anti-cognitivism and attention to bodily experience, have strongly influenced cognitive semiotics (cf. Zlatev 2012). Following these action-oriented perspectives, we propose a model of language strictly tied to the paradigm of embodied cognition. Against the syntax-centred view, in effect, we maintain that the nature of language (i.e. its functioning and origin) needs to be analysed in reference to the human pragmatic capacity to build coherent narratives rather than to the ability to construct syntactically well-formed sentences.
Specifically, our proposal is that narrative abilities are dependent on the ability to (mind)travel in space and in time and that the narrative foundation of human language provides important insights to suggest a proto-­discursive model of the origins of our communication skills.

1 This chapter is the outcome of a collaborative effort between the two authors. For the specific concerns of the Italian Academy, we specify that F. Ferretti wrote sections 4 and 5 and I. Adornetti wrote sections 2 and 3 for the final draft. Both authors wrote the Introduction and the Conclusion.


2. The primacy of microanalysis and sentence

At a general level, language can be analysed along two major dimensions: a within-utterance or microlinguistic dimension and a between-utterances or macrolinguistic dimension (e.g. Kintsch and van Dijk 1978; Davis and Coelho 2004). The microlinguistic dimension, which focuses on intra-sentential functions, assesses how phonological (or graphemic) sequences are organized into morphological strings and words (lexical processing), and how these are inserted into a grammatically well-formed sequence (syntactic processing). The macrolinguistic dimension, which analyses inter-sentential functions, focuses on the ability to select contextually appropriate words and sentences (pragmatic processing) and how sentences or utterances are connected in a flux of speech (or text) that is coherently organized (discourse processing). The theoretical model of language elaborated within the framework of the classical cognitive sciences (Chomsky 1980; Jackendoff 1994; Pinker 1994) is characterized by exclusive attention to the microanalytic dimension. In this tradition, it is taken for granted that the central goal of linguistic production is to generate sentences that at a minimum are structured with a noun phrase and a verb phrase. As a consequence, at the basis of the classical framework are the assumptions that the structure of the internal constituents of a sentence represents the core of language and that the general device at the basis of language is a module specialized in the analysis of the syntactic structures. Emblematic in this regard is the statement by Pickering and colleagues (2001: 1), according to which, “the question of what architectures and mechanisms underlie sentence comprehension, […] illuminate[s] the general nature of human language processing in the context of cognition as a whole”. Evidence from sentence comprehension could be used “to understand the overall nature of language processing” (ibid.).
There are two things to note about this perspective for the purposes of our argument: the first has to do with the model of communication that emerges from the conception of language functioning that is understood as syntactic processing of internal constituents of the sentence; the second concerns the impact that this model has on how discourse is processed. The model of language of orthodox cognitive science is characterized by the implicit (and sometimes explicit) adherence to the so-called “code model of communication” (Shannon and Weaver 1949; for a discussion, cf. Ferretti and Adornetti 2014), a model that Fodor (1975: 106) considers “not just natural but inevitable”. The code model of communication seizes upon the idea that “we have communicated when you have told me what you have in mind and I have understood what you have told me” (Fodor 1975: 109). According to this model, the thought (i.e. the message) is encoded by the speaker in a succession of sounds that the listener decodes in order to share the thought (the message) that the speaker has intended to communicate. In other words, the informational content is entirely encoded in the utterance. Therefore, adhering to the code model means taking a clear stand against pragmatic
theories of communication that are focused on context and the speaker’s intention (cf. Sperber and Wilson 1986, Chapter 1). The second thing to stress for our argument is that from the classical perspective, the production and comprehension of discourse is just a by-­product of the production and comprehension of single sentences. Since discourse is in fact composed of a set of individual sentences, from the perspective of classical cognitive science the analysis of discourse coincides with the analysis of the microlinguistic dimension. From this point of view, producing and understanding discourse is equivalent to producing one sentence after another by means, for example, of the grammatical devices (e.g. the use of pronouns) that provide the links between consecutive utterances. The idea is that the mechanisms that regulate the structure in constituents within the sentence are the same that also regulate the establishment of links between sentences in the external flux of speech. The global level of discourse is attained starting from the analysis of the utterances of single sentences through a sequential process of accumulation of information. In this sense, information processing that underlies narrative abilities has a strong “bottom-­up” character: discourse analysis proceeds incrementally, from the local meanings of sentences to the global meanings of discourse.2 Disputing the priority given to microanalysis, in the following section we propose that the ability to process discourse takes priority over the ability to process sentences. In support of this, we discuss studies from neurolinguistics and neuropsychology showing that 1) the ability to process sentences (i.e. the capacity to construct well-­formed utterances) is not a sufficient condition to communicate efficiently and 2) that it is indeed possible to communicate when the syntactic competence at the basis of sentence construction is disrupted.

2 For discussion and criticism on this point, cf. Cosentino et al. (2013).

3. From microanalysis to macroanalysis: evidence from the study of pathologies of language

Our argument is based on the analysis of a specific property of discourse and narration: coherence. In general, coherence can be defined as the conceptual organizational aspects of discourse at the suprasentential level (Glosser and Deser 1990: 69). Even if Chomsky does not address the issue explicitly, scholars who are inspired by generative linguistics and who are interested in the study of narrative processing maintain that the building of the coherent flow of discourse (the basis of any narrative ability) must be interpreted as a bottom-up process driven by syntactic parser functioning. Kintsch and van Dijk’s (1978) construction-integration model is a good example in this regard. Specifically, the theoretical models that equate language with grammar and linguistic processing with sentence processing explain discourse coherence in terms of the linear relations of cohesion between consecutive sentences (e.g. Halliday and Hasan 1976; Reinhart 1980; for a discussion, cf. Giora 2014).3 In a text, cohesive relations are accomplished through grammatical and lexical elements (Halliday and Hasan 1976). Grammatical cohesion includes elements such as reference (inside and outside the text, respectively endophoric and exophoric reference), substitution, ellipsis and conjunction; lexical cohesion is based on reiteration (e.g. repetition, synonymy) and collocation (i.e. co-occurrence of lexical items). An example of a cohesive text is the sample of discourse shown in (1), in which the sentences (a), (b) and (c) are connected through the use of pronouns (a case of grammatical cohesion):

(1) (a) They managed to catch him. It was an all-out abuse. They abused him, and I don’t think something was done about it. (b) They put him in the toilet, I remember the soldier, I remember, he was a friend of mine, a friend from the company. (c) And he took pride in shoving the kid’s head into the toilet. (Anonymous [Sergeant] 2000, in Giora 2014: 143).

3 Even if Halliday and Hasan (1976) do not equate coherence and cohesion, they are among the scholars that agree that cohesion (a grammatical phenomenon, and for this reason a surface structure phenomenon) both reflects and enables discourse coherence.

For the purpose of our argument, it is important to highlight that from this perspective, cohesion is conceived as a pre-requisite for coherence (see Daneš, 1974, 1987). The basic idea is that for a discourse to be coherent, its sentences must be cohesive. The coherence of a text, in fact, is not a given, but rather a product obtained through cohesive ties. These ties help to ensure the unity of the text and act as signals that the speaker offers to the listener marking the way the listener should follow in order to interpret the verbal utterances in a coherent way. The idea, in other words, is that discourse coherence relies on linguistic elements and capacities. Now, although cohesive relations have an important role in the expression and recognition of coherence, we argue that the cohesion between consecutive sentences is not a necessary condition for narrative coherence. Our claim is rather that cohesion is the superficial expression of a deeper level of coherence that concerns cognition prior to language production (Adornetti 2015). A crucial distinction is between global discourse coherence and local discourse coherence (cf. Glosser and Deser 1990). Local coherence refers to the conceptual links between consecutive sentences or propositions that maintain meaning in a text or discourse. Global coherence refers to the overall conceptual organization of the flux of speech; it refers to the manner in which discourse is organized with respect to an overall goal, plan, theme, or topic. As shown by the text in (1), cohesion contributes to local coherence. But is local coherence a necessary condition for global coherence? Theoretical arguments and empirical evidence suggest that global coherence does not depend on local coherence. Consider the “text” shown in (2).

(2) I bought a Ford. The car in which President Wilson rode down the Champs Élysées was black. Black English has been widely discussed. The discussions between the presidents ended last week. A week has seven days. Every day I feed my cat. Cats have four legs. The cat is on the mat. Mat has three letters (Enqvist 1978: 110–111).

In this text the sentences are connected through the cohesive mechanism of repetition. However, the set of sentences is not perceived as a coherent whole because the sentences do not hang together in any reasonable way. At the same time, it can be argued that this text is just an artificial construction that does not reflect how human beings communicate with each other. But let us consider the conversation produced in an actual communicative situation shown in (3).

(3)
C: I admit this government we’ve got is not doing a good job but the unions are trying to make them sound worse than what they are
T: mm
C: they . they . cos I’m a Tory actually but I do vote . if there’s a . er . a communist bloke there I will vote communist but . it all depends what his principles are but I don’t agree . with the Chinese communism . and the Russian communism
T: right
C: but I believe every . should be equal but . I’m not knocking the royal family because you need them
T: mm
C: and they they they bring people in to see take photos
(from Perkins et al. 1995: 304)

Despite the local sequential links (and at least a degree of local coherence) between trade unions/government, government/Tory, Tory/communist, communism/Chinese/Russian communism, communism/equality, equality/Royal Family, Royal Family/tourist attraction, C shows a form of “topic drift”: he is unable to monitor what has already been talked about or to relate each individual utterance to some overall coherent plan or goal. As mentioned, the text in (3) is produced by a person in an actual communicative situation. Specifically, it is produced by a subject with brain injury. Indeed, neurolinguistic research has shown that in several neurological populations, such as schizophrenic patients, traumatic brain injury subjects and patients with Alzheimer’s disease, there is a dissociation between the abilities that underlie sentence processing (microanalysis) and those that underlie narrative processing (macroanalysis) (e.g. Dijkstra et al. 2004; Glosser and Deser 1990; Davis et al. 1997; Marini et al. 2008; Marini et al. 2014). Specifically, these patients correctly connect sentences by using cohesion ties (grammatical devices), but they are unable to construct and maintain the global coherence of their verbal productions: they cannot relate the individual sentences to a plan or to a more general purpose, and they often introduce material that is irrelevant to the current context in their verbal productions. Because of such deficits in coherence, these patients are unable to communicate
in an effective way, despite the fact that their capacities to construct well-formed sentences are relatively preserved. Interestingly, it has also been shown that aphasic subjects with syntactic deficits and problems with the construction of well-formed sentences did not suffer such a pragmatic deficit and were able to produce coherent discourse (Glosser and Deser 1990). To conclude this part dedicated to language functioning, we underline two points relevant to our argument: First, discourse coherence is an essential property of language because it is a necessary condition for communicating in an efficient way. Second, the construction of global coherence in a narrative is not reducible to cohesion, that is to say, the macrostructure of a narrative discourse cannot be formally derived from the microstructure of the sentence.

4. At the origins of human language

What can be deduced about language origins from the arguments and empirical evidence we presented about language functioning? Having identified discourse coherence as an essential property of human communication and having argued that global coherence cannot be explained in terms of cohesion leads us to a substantial change of perspective regarding the interpretive models that consider the origin of language in reference to the advent of syntax (e.g. Bickerton 1990; Berwick et al. 2013). In sharp contrast with these models, we propose that language has a proto-discursive origin, and that the selective pressures that drive the evolution of language meet the needs of pragmatic concerns before grammatical ones.

4.1. Relevance Theory reconsidered

The pragmatic turn in cognitive science is represented by relevance theory (RT) proposed by Sperber and Wilson (1986, 2002). However, we question whether relevance can be conceived as a principle that can explain discursive coherence and, if so, whether relevance can be understood as a principle capable of explaining both the origin of language and its functioning. As we will show, global coherence is not reducible to the so-called relevance principle. Therefore, our hypothesis is that the answer to both questions is negative. That said, our intent is not to reject the model proposed by Sperber and Wilson. Rather, our proposal may be considered as a revision and an extension of the model of communication proposed by RT. We maintain that the basic idea of RT, that in communication the speaker simply offers evidence of her communicative intention to the listener – what can be called the clues model of communication – is at the same time a correct yet incomplete manner in which to analyse the origin and functioning of language. Before clarifying what we think has to be added to this model, it is necessary to briefly present the main assumptions of RT. Relevance theory has its starting point in the criticism of the code model of communication, which has dominated not only classical cognitive science, but also classical semiotics (cf. Sperber and Wilson 1986). As we said in Section 2, according
to this model, communication is an associative process of encoding-­decoding: information is encoded into a signal, sent along a channel, and then decoded at the other end. However, as outlined by Grice (1968), the code model is unable to account for everyday language use, in which the speaker conveys more than she actually says in the encoded message. Assuming Grice’s distinction between a sentence’s meaning (what is encoded) and a speaker’s meaning (what a speaker intends to convey), RT develops an ostensive-­inferential model of communication according to which the production and comprehension of signals does not involve encoding and decoding of a message, but rather the provision and interpretation of evidence of the speaker’s intentions. It is important to note that evidence of such intentions can be various: not only linguistic utterances are understood as evidence, but also pointing, shrugs, glances, nudges, and other gestures. The listener recognizes all these pieces of communicative evidence as clues and draws inferences about the speaker’s intentions. The goal of RT is to explain how the listener infers the speaker’s meaning on the basis of the evidence provided. The explanation is based on the claim that communicative signals automatically create expectations that guide the listener towards the speaker’s meaning. These expectations are relevance-­based. According to Sperber and Wilson (1986), a stimulus (a sight, a sound, an utterance, a memory) is relevant to an individual when it connects to background information she has available to her in order to yield conclusions that matter to her, known as the positive cognitive effect. However, what makes an input worth processing among other competing stimuli is not just the cognitive effects it achieves. In different circumstances, the same stimulus may be more or less salient, more or less accessible, and the same cognitive effects easier or harder to obtain. 
Indeed, the greater the effort required to perceive, remember, and infer, the less rewarding the input will be to process, and hence less deserving of our attention. In terms of RT, and all other things being equal, the greater the processing effort required, the less relevant the input will be. Thus, relevance may be assessed in terms of cognitive effects and processing effort: (a) all other things being equal, the greater the positive cognitive effects achieved by processing an input, the greater the relevance of the input to the individual at that time; (b) all other things being equal, the greater the processing effort expended, the lower the relevance of the input to the individual at that time (Sperber and Wilson 2004: 252). The clues model of communication proposed by RT is an admirable way to respond to the issue of the origins of language (Origgi and Sperber 2000; Scott-Phillips 2014). Indeed, it overcomes both the difficulties of imagining the early stages of human communication in reference to an overly complex code of expression, and the difficulties of thinking about the birth of human language in reference to a simple expressive code based on signals such as those produced by non-human animals (see Scott-Phillips 2014, 2015). All that is required for the proper functioning of the clues model of communication, in effect, is a cognitive system that allows one to read the speaker’s communicative intention and take advantage of the clues the sender produces. So characterized, the main value of the analysis in terms of
the clues model coincides with the analysis of the cognitive architectures that allow the receiver to infer the content the speaker intends to communicate. Now, a model of communication focused on the role played by the speaker’s intention in production-comprehension processing conforms to the idea that the linguistic processes are driven by a mindreading cognitive system. It is exactly this kind of cognitive system to which Sperber and Wilson make reference in order to explain the transition from animal communication (founded on the code model) to human language (Sperber 2000; Origgi and Sperber 2000; Sperber and Origgi 2010; see also Scott-Phillips 2014). So far, so good. Because of the importance we attribute to the discursive nature of human language, the point to analyse here is the question of whether the clues model and the mindreading system can be considered as sufficient conditions to explain human narrative abilities. The answer Sperber and Wilson give to the question is explicit and peremptory: as relevance is the principle of human communication that can explain any feature of language functioning and origin, even global coherence has to be interpreted in terms of relevance (global coherence is a derivative notion of relevance). However, contrary to Sperber and Wilson’s hypothesis, Giora (1997, 1998) convincingly shows that relevance cannot be the only principle that governs human communication. The relevance principle, in fact “can by no means replace current accounts of discourse coherence since it is neither necessary nor sufficient for text well-formedness” (Giora 1997: 17). A useful example for understanding the distinction between coherence and relevance is the case of situations in which it is possible to distinguish between discourses characterized by different degrees of coherence, as in (4a) and (4b):

(4a) The first time she was married her husband came from Montana. He was the kind that when he was not alone he would look thoughtful. He was the kind that knew that in Montana there are mountains and mountains have snow on them. He had not lived in Montana. He would leave Montana. He had to marry Ida and he was thoughtful (taken from Ida by Gertrude Stein).
(4b) The first time she was married her husband came from Montana. He was the kind who loved to be alone and thoughtful. He was the kind who loved mountains, and wanted to live on them. He loved Montana. But he had to marry Ida and leave Montana (Giora, 1997: 26).

Giora’s view is that the difference in coherence between (4a) and (4b) cannot be explained in reference to the principle of relevance. In fact, while the segments of discourse are both relevant (according to Sperber and Wilson’s definition) “they nevertheless differ drastically in terms of coherence” (Giora, 1997: 26): (4b) is more coherent than (4a). According to Giora (1997: 22), the general conclusion that can be drawn from these considerations is that “coherence is not a derivative notion”. The stance in favour of the explanatory autonomy of coherence is grounded in the idea that the narrative dimension of language relies on the identification of the causal links that regulate the segments of discourse: discourse coherence, which is,

Mindreading, Mind-travelling and Origins of Language


in fact, closely linked to the respect of a well-­formedness criterion.4 In sharp contrast with Giora, Wilson (1998) argues that the characteristics of discourse attributable to well-­formedness are not a concern of RT since RT is a theory of comprehension, while the reference to well-­formedness involves properties not implicated in the (psychological) processes of comprehension. Without entering into the details of the dispute between Giora and Wilson, the question to analyse in order to understand if coherence is reducible to relevance is the question of whether the way in which the segments of discourse are connected together becomes part of production and comprehension processes. In accordance with Giora, we maintain that the ability to order the sentences of discourse in the right sequence represents an essential aspect of discourse coherence and of the processes that govern our “narrative faculty”. Specifically, we maintain that to account for coherence it is necessary to refer to principles other than those proposed by Sperber and Wilson. Two issues have to be stressed in this regard. The first one is related to the centrality attributed to the notion of event in human cognitive experience. As Sinha and Gärdenfors (2014: 76) claim, in effect, “the very structure of language attests to the primacy of the event in human cognition,” considering that “the life world of human experience is made of events, in which selves and other people figure as agents, performing actions directed to other agents and to objects”. The second issue concerns the fact that, as narrative discourse can be interpreted in terms of “the temporal organization of event sequences” (Sinha and Gärdenfors 2014: 72), in order to explain narration we inevitably have to explain the ability to analyse the causal structure of the sequence of events. Discourse coherence seems to be strongly linked to a capacity of this type. Data from the study of linguistic pathologies support our view. 
In a study of the temporal order of discourse in schizophrenics, Ditman and Kuperberg (2007) show that the difficulty these subjects have in maintaining coherence links across sentences is due to the fact that “building a coherent representation of discourse meaning (…) requires the establishment of logical and psychological consistency between the events and propositions described in individual sentences” (ibid.: 992). It is difficult to account for the logical and psychological congruence between events and propositions without referring to the causal relationships between the events narrated in a discourse and the segments of the discourse used in the narration. The emphasis placed by Giora on the issue of well-formedness fits well with the idea that the organization of discourse with regard to the temporal sequence of events plays a decisive role in the inability of schizophrenics to construct a coherent representation of discourse. So much for the issues of conceptual order. Considerations of this type have consequences on the level of cognitive architectures. In the following section, we discuss the systems involved in discourse processing.

4 Giora’s criticism toward the possibility of reducing cohesion to coherence is consistent with the observation that according to Giora the expression well-formedness in this context has a pragmatic (and not syntactic) characterization.



5. Cognitive systems underlying discourse coherence

According to Sperber and Wilson (1986, 2002; see especially Sperber 2000 and Origgi and Sperber 2000), the thesis that relevance is the only explanatory principle of language is strongly connected to the idea that mindreading is the only system at the basis of our communicative skills. The point to be stressed here is that the interpretative models based on mindreading – such as RT – explain only those aspects of language related to the clues model of communication. However, models such as these suffer from a serious difficulty: the exclusive attention paid to the speaker’s intentions leads one to exclude the temporal dimension from discourse processing, and in so doing, to overshadow the narrative foundation of communication. From the point of view of RT, a speaker can communicate, for example, that she doesn’t intend to go to the cinema either with a simple cue or, without altering the nature of her intention, with a long and detailed discourse explaining the reasons for her refusal. In both cases, all that the listener needs in order to understand what the speaker says is to grasp the speaker’s intention that she doesn’t intend to go to the cinema. The speaker’s intention, acting as an “attractor” that guides the interpretative processes, allows the listener to grasp the point (in a literal sense) of what is being said by eliminating any accessory and irrelevant information. The contracted and punctuated nature of communicative intentions – that is, their atemporal character – is a great advantage in terms of cognitive economy. Scott-Phillips (2014) maintains that, in the current research on human communication, the code model and RT are the only two alternatives. As the criticism of the code model made by Sperber and Wilson (1986) is strong, we can say that there is no alternative to RT. However, the possibility of improving upon Sperber and Wilson’s model is an open question that deserves to be examined.
The idea that the expressive clues may function as evidence of the speaker’s communicative intentions is of great importance for a model of the origin and functioning of language. That said, the exclusive reference to the relevance principle (and to mindreading as the unique processing system involved in language functioning) prevents scholars from further analysing properties and processing systems crucial to the study of communication. When one switches from the analysis of communicative exchanges conceived as simple cues – the typical examples in support of RT – to the study of conversational exchanges in the flow of speech, a fact clearly emerges: the understanding of the flow of discourse cannot be reduced to the interpretation of the speaker’s intentions. As we said, the atemporal (punctuated and contracted) nature of communicative intentions is, because of its cognitive economy, a strength of the clues model. Nevertheless, it is also a weakness. The fact that the same communicative intention (e.g., not wanting to go to the cinema) could be expressed by means of a simple cue or by means of a long articulated discourse implies, in effect, the functioning of different processes. Indeed, in an articulated discourse the evidence of the communicative intentions offered by the speaker to the listener is deployed on a temporal level. When language functioning is analysed in reference to such a level, it clearly appears not only that the speaker’s intentions can change in an ongoing conversation, but that the intentions themselves can change because of the reciprocal relationship. Such a reciprocal relationship among intentions, guided by a principle of coherence, represents the thorn in the side of the clues model. From these considerations, it follows that the primary reason why discourse coherence cannot be reduced to relevance is that the processing system that grasps the evidence of the speaker’s intentions cannot account for the temporal dimension of conversation. If such temporal dimension appears to be a necessary condition for the flux of speech, then mindreading cannot be considered the only processing system on which to base the functioning of human communication.

5.1 Temporal navigation

Given the attention we devoted to the temporal sequence of segments of discourse as a constituent element of the narrative foundation of language, a good way to begin our argument is a quote from Chafe (1987), discussed by Wilson in her dispute with Giora. Wilson is right to claim that:

…discourse is best approached in terms of process than structure: “It is more rewarding, I think, to interpret a piece of discourse in terms of cognitive processes dynamically unfolding through time than to analyse it as a static string of words and sentences” (Chafe 1987: 48 quoted in Wilson 1998: 70).

We are completely sympathetic with this perspective. Provided, however, one takes seriously the idea that the processes involved in discourse processing are “dynamically unfolding through time.” Now, in spite of the emphasis reserved by Wilson for the temporal dimension of discourse, in terms of cognitive architectures, RT is not equipped to account for the processing implicated in the temporal plane of flow of speech. To explain processing of this kind, in fact, we have to make reference to Mental Time Travel (MTT), the cognitive device that enables individuals “to mentally project themselves backwards in time to re-­live, or forwards to pre-­live, events” (Suddendorf and Corballis 2007: 299; Corballis 2011; for a neuroscientific review cf. Grondin 2010). Corballis offers an important clue to the fact that a navigation device in time has to be involved in the elaboration of discursive coherence. As he considers that MTT primarily serves to study the ability at the base of the syntactic aspects of language to embed sentences in other sentences (Corballis, 2009: 553; Corballis 2011), he argues that MTT may be related to the human narrative ability (see also Ferretti and Cosentino 2013). Corballis also maintains, quoting Neisser (2008), that remembering is much more like telling a story than playing back a tape or looking at a picture. He states: “the same constructive process that allows us to reconstruct the past and construct possible futures also allows us to invent stories” (Corballis, 2011: 111). Corballis’ analysis clearly indicates a first important move to take in order to extend RT: if the explanation of narrative abilities has to appeal to MTT, then language processing cannot be interpreted in reference to mindreading



alone.5 But there is more: arguing that temporal navigation is involved in the origin of narrative abilities inevitably means referring to the idea that spatial navigation is involved too. There are anatomical and functional reasons for the necessity of the involvement of spatial navigation in temporal navigation. From an anatomical point of view, the close link between space and time representation is well demonstrated by brain structure (Corballis 2013). The discovery of place cells allowed O’Keefe and Nadel (1978) to argue that the hippocampus is the basis of spatial cognition in rodents and is the substratum for episodic memory in humans (Dudchenko 2010; Assmus et al. 2005; Assmus et al. 2003; Oliveri et al. 2009; Parkinson et al. 2014). The neuroanatomical connections between space and time are commonly used to justify the close relationship between space and time also from a functional point of view. According to the proponents of the “spatial representation account”, in fact, as they “occupy an overlapping temporo-spatial representation” (Cai and Connell 2015: 269), space and time cannot be considered as separate entities (Stocker 2014). The paradigm of reference is represented by the idea that the close connection between space and time rests on a profound asymmetry where space is primary. The starting point of the perspectives that refer to the “spatial metaphor” is Lakoff and Johnson’s conceptual metaphor theory (1980; Gibbs 2006). When talking about time, speakers of many languages use spatial metaphors, saying things like the future is in front of us and the past is behind us (or vice versa in some cultures) as a way to conceptually interpret abstract entities in reference to more concrete entities. According to Lakoff and Johnson, the ability to talk about time using space is the surface effect of a deeper phenomenon: the spatial metaphor is actually the product of our ability to think about time by means of space.
Considerable experimental evidence supports the priority of space over time, and therefore an asymmetric interpretation of their relationship (Casasanto and Boroditsky 2008; Merritt et al. 2010). From these considerations, it is possible to argue that the primary source domain in order to analyse the human narrative capacity is the navigation in space.

5.2. Spatial navigation

In effect, even intuitively, spatial navigation represents a good metaphor for thinking about the processes at the foundation of discourse. Gallistel (1990) defines navigation as “the process of determining and maintaining a course or trajectory from one place to another”. The ability to maintain a trajectory is a core component of the process involved in approaching a destination. Indeed, in order to reach the expected destination, one needs to keep to the intended route (such as that calculated from the identification of the azimuth on a topographic map to get from point A to destination B) and overcome geographic obstacles (e.g. cliffs, rivers, or forests).

5 For the role of time in the evolution of language see also Cosentino (2011) and Gärdenfors and Osvath (2010).



What happens in real navigation is never equivalent to the straight path drawn on the map: the actual movement in space requires a continuous realignment of the goal because of the difficulties posed by the harshness of the environment. In a very similar way, the process of discourse construction also relies on the ability to identify a goal (the content that the speaker intends to convey to the listener), to construct a route to this goal and to stay on track. Like navigation in space, the achievement of the communicative goal depends on the continuous realignments implemented by speakers to rebuild the route in the face of continual digressions imposed by the different points of view typical of verbal communication (Ferretti 2014; Ferretti and Adornetti 2011; Ferretti et al. 2013). Building the route and maintaining the right trajectory to the goal is equivalent, in narrative terms, to building and maintaining the global coherence of discourse. The hypothesis (at the foundation of spatial metaphor) that the more abstract knowledge domains are interpretable in terms of more concrete knowledge domains is of great value in order to understand the construction of a coherent discursive flow. The idea that time navigation is grounded on space navigation, in effect, allows a step toward the opportunity to understand the nature of the properties required from the cognitive elaboration of the flow of speech. The description of the temporal organization of event sequence that, as we said, forms the backbone of the narrative, has to be guaranteed, not only on the level of the internal relationship between discourse segments, but even on the level of the external relationship between the narrative plan and the flow of events that represent the core knowledge of individual experiences. 
The temporal relationship between the segments of discourse, in effect, cannot be considered in abstract terms alone: if time represents the key element of the narrative texture of the clues in the expressive speech flow, the spatial metaphor helps make the speech flow congruent with the flow of events narrated (Ferretti 2014). For this reason, spatial and temporal navigation represent the basic metaphor of the discursive nature of human communication. Such arguments lead us to propose that the extension of the clues model of communication must be linked to mindtravelling systems in space and time. That said, what kind of evidence could we offer to justify the involvement of navigational systems in the processing of discourse coherence? Schizophrenic derailment serves as a suitable reference point for an analysis that examines one of the basic building blocks of human narrative capabilities in the ability to stay the course of speech. Disturbances of the speech of schizophrenics are a textbook case of the loss of coherence in discursive abilities (Marini et al., 2008). Although to our knowledge there are no experimental data on the direct causal relationship between navigational systems in space and time and deficits in schizophrenic global coherence derailment, experimental data related to the difficulty in time projection (Peterburs et al. 2013; D’Argembeau et al. 2008) together with data related to the difficulty with space projection (Weniger and Irle 2008) of these individuals seem to support a causal link between the navigation systems



and the construction of the flow of discourse (for a discussion on the relationship between schizophrenia and MTT, cf. Cosentino 2011). The analysis of the narrative foundation of our communication skills is a useful tool to hypothesize the protodiscursive origin of human language. In fact, it is in reference to a perspective of this kind that the intent to extend RT by linking the clues model of communication with the narrative perspective of language shows its explanatory power regarding the issue of the origins of language (Ferretti 2014). If the ability to maintain the route in navigation can be seen as the condition for the construction of the flow of discourse in human communication, we have good reason to think that the clues model (and the mindreading system strictly tied to it) must seek an ally in the navigation systems in space and time. It is only through projections in space and time that the expressive clues produced by our ancestral relatives earn a significant distinction from the signals produced in animal communication. From this order of argument, it follows that the transition from the code model to the clues model is not a sufficient condition to ensure the transition from animal communication to human language: the reasons we used to maintain that RT has to be extended in order to account for the functioning of language are the same reasons that lead us to maintain that RT needs to be extended and integrated in order to also explain the origin of language.

6. Conclusion

In this chapter, we have argued that the narrative foundation of human language is a useful tool to investigate the functioning and the origin of our communication skills. At the basis of our hypothesis is the idea that the production and comprehension of sentences is not a sufficient condition for effective communication and that the primacy usually assigned to the study of sentence grammar must give way to the investigation of discourse pragmatics. A confirmation of our hypothesis is the fact that discourse coherence is a property reducible neither to cohesion nor to relevance. While the relevance principle probably represents a necessary condition for understanding the birth of our communicative skills, it is not a sufficient condition to account for the narrative texture of the flow of speech, and hence it cannot be a sufficient condition to explain the origins of language as a whole. From the point of view of cognitive semiotics, the conclusion to be drawn from these considerations is that discourse processing requires additional devices beyond mindreading, as well as devices very different from those implied in the analysis of the constituents of the sentence. In line with an action-oriented perspective of cognition, we have argued that the basis of the ability to produce and comprehend discourse is located in cognitive-semiotic systems that allow individuals to navigate through space and time. The experimental data from the pathologies of language concerning the processing of global coherence lead us to propose that the metaphor of navigation we have assumed as a key explanation of human narrative abilities is more than a simple metaphor.

Alessandra Chiera

Chapter 11
From Conversation to Language: An Evolutionary Sensory-Motor Account

1. Introduction

This chapter offers a holistic model of language evolution, namely a model in which the specific dimension of conversation, rather than representing a late product of evolutionary history, marks the first stages of human communication. On the basis of this model, it is suggested that a specific pragmatic function characterizing the conversational context, namely alignment (Pickering and Garrod 2004), might have fostered linguistic communication. The concept of alignment refers to the coordination of situation or mental models (Zwaan and Radvansky 1998) with underlying dialogue, and is believed to be achieved by a primitive mechanism. In the same vein, Zlatev and colleagues (Zlatev and Andrén 2009; Zlatev 2013) pinpoint a specific stage in the development of intersubjectivity close to the “proto-conversations” capacities outlined by Trevarthen (1979): proto-mimesis. Such a primary phenomenon based on perception/action systems allows empathetic engagements that could be considered as a first stage in semiotic development (Zlatev 2013); in other words, the social-semiotic nature of language rests on several bodily abilities that are involved in increasingly more complex forms of intersubjectivity (Zlatev 2008a). A methodological claim at the base of this work is that an evolutionary theory of language should be constrained by an empirical cognitive account of linguistic use. To this extent, the question about the evolution of language and the question about the functioning of language are deeply intertwined. Thus, the model of language evolution provided here is strictly tied to an empirically plausible model of language functioning. This methodological premise is deeply connected to the “conceptual-empirical spiral” (Zlatev 2015a) according to which addressing what something is requires addressing how something develops in ontogeny and evolves in the species.
Assuming this basic commitment, the reasons why the context of conversation represents a driving force in semantic elaboration are first discussed. Later, these arguments are linked to the evolutionary scenario, suggesting that the pragmatic processes involved in the elaboration of holistic contextual factors were also implicated in language evolution. In the last section, the focus is on the specific role of conversational alignment, claiming that in our ancestors it could have been supported by action-­perception mechanisms. Combining a top-­down model of language evolution with a bodily grounded account of cognition (Varela et al. 1991; Barsalou 2008), the chapter finally restates the necessity for a proto-­conversational model of language evolution.



2. The sentence-computational approach to language

Within the classical computational theory of the mind introduced by Putnam (1960) and developed by Fodor (1975, 1980), (human) cognition may be defined in relation to computational processes consisting in the manipulation of tokens in a language of thought. Specifically in Fodor’s mechanistic approach, thinking amounts to forming causal relations between “mental symbols”. These so-called symbolic representations show both compositionally semantic and syntactic properties; the supposed advantage of such a mechanistic account is that the “mind/brain”, without directly accessing the semantic properties, can transform the syntactic ones in such a way as to imitate the semantic relations between the contents of those symbols by means of formal inferences. If the mind is a syntax-driven machine of this sort, then mental processes are truth-preserving in virtue of the logical form of the representations (Fodor 1987). In such a model, cognition can be explained by a mechanistic model which emphasizes the functioning of mental processes in abstraction from semantic content and sensory processing. This way of viewing the mind has affected the study of language in corresponding ways. According to so-called classical cognitive science (Marr 1982; Fodor and Pylyshyn 1988), the functioning of language production and comprehension processes merely represents a specific case of a more general theory of mental computation. In particular, language is characterized by certain properties and driven by given processes because its structure mirrors that of the language of thought (Fodor 1975, 2008; Pinker 1994). Thus, if thoughts work on the base of a syntactic form which marks their very core, language – being a means for the expression of thoughts – will show the same features as well. In this sense, communication operates based on inferential mechanisms which make use of the syntactic form inherent in the physical symbol system.
Generative linguistics introduced by Chomsky (1957, 1965, 1980) has fostered such a formal perspective on language in cognitive science. In this tradition, meaning is either neglected, or seen as the product of the syntactic combination of “mental symbols”. This hypothesis is strictly tied to the modularist account of the mind (Fodor 1983) according to which some genetically determined domain-specific structures operate on certain types of information, having access only to those contents which refer to their own mandatory task and are isolated from the rest of cognition. This informational encapsulation makes each process basically autonomous from other processes; from the sensory and motor processes among others (Pylyshyn 1999; Barrett 2005). All that is necessary for the encoding and decoding processes involved in communication is, according to this view, a syntactic parser able to analyze strings of symbols. This perspective has an important implication regarding the nature of meaning: the communicative exchanges are centered on what people actually say when they express a sentence, that is, “the whole proposition and nothing but that proposition” (Katz 1980: 18). Any eventual extra-linguistic components are considered to be (relatively) negligible from a semantic point of view. It all comes down to combining the lexical forms of the words

From Conversation to Language


within a sentence by means of context-­free grammatical rules (Cappelen and Lepore 2005). Consider, for example, the sentence S: Rudolf has a red nose. Once you establish the referent of Rudolf, then the comprehension of S follows automatically, corresponding to the proposition that Rudolf has a red nose. Given these assumptions, it has been argued that semantic and pragmatic phenomena require different accounts: classical semantics deals with meaning interpreted in truth-­conditional terms whereas pragmatics is concerned with how meaning is generated within the wider communicative context. This might not seem particularly controversial; what is more controversial is the classical computational idea that pragmatics is either optional, or negligible. Pragmatic inferences are context-­dependent; that is, they rely on prior discourse information, knowledge of the speaker, general assumptions and similar factors which form the situational knowledge of speaker and listener (Grice 1975; Zwaan and Radvansky 1998). In a computational approach, making reference to such features is problematic, to say the least: “the more we believe context can influence semantic content, the more we will find ourselves at a loss when it comes to explaining how ordinary communication […] is possible” (Szabó 2006). Thus, computational psychology can only succeed in accounting for the constituent structure of language, which rests on the sentence unit whose elaboration is autonomous, fast and automatic as an “instinct” (Pinker 1997). The division between classical semantics and pragmatics is ascribable to the debate between two different models of linguistic interpretation. According to some representatives of compositional semantics (e.g. Fodor 1983; Chierchia and McConnell-­Ginet 2000), meaning is constructed as a two-­step process on the basis of semantic composition, which simply adds the semantic value of the single components in order to build the semantic value of the whole utterance. 
Literal meaning forms the first step and contextual extra-­linguistic information is integrated in sentence meaning only later by means of a slower pragmatic process which, in any case, does not represent the core level of language processing, or meaning. This two-­step model is criticized by those who claim that local and global contextual information is immediately integrated in the interpretative process, providing a one-­step model (Clark 1996; Perry 1997; Kempson 2001). Since the predictions concerning the time course of building non-­linguistic information are clearly opposite, in psycholinguistics the empirical plausibility of these two models is tested regarding the issue of when global situational models are actually integrated. The next section reviews this empirical evidence.

3. Empirical studies of the semantics-pragmatics interface

An increasing number of experimental studies with event-related potential (ERP) methodologies are being conducted to analyze the modulation effects of the semantic interpretative processes on the part of conversational or discourse context (for a review, see Breheny 2011). For instance, Heller and colleagues (2008) investigated



the processing of an unfolding referring expression including a size adjective (e.g., Pick up the big…) within a conversational situation. The listener could see two pairs of size-­contrasting objects from his point of view (e.g., two ducks – one big and one small – and two boxes – one big and one small), however, one of the objects (e.g., the small box) was hidden from the speaker’s perspective. After hearing the word big, the listener anticipated the referent identifying it as the object that was part of the common ground (Clark 1996), by ruling out the possibility that it was related to the box. Hence, the listener quickly (after 300ms) integrated information about the speaker’s perspective into the disambiguation of the adjective. These results provide significant evidence against (a strong, sequential version of) the modular views of language: in those accounts, the fast effects following the onset of the adjective are not considered to affect the initial interpretation. Thus, strict Fodorian modularity is inconsistent with top-­down processing, because such processing constitutes a violation of information encapsulation. Further evidence consistent with this line of research comes from several works conceived by Kim (2014) as well as by van Berkum and collaborators (2003, 2005, 2008). The outcomes of these experiments highlight that some aspects such as the voice, the identity and the appearance of the interlocutor are immediately integrated into the interpretation. That is to say, the message is bound to the extra-­linguistic context in a top-­down fashion, incorporating the conversational framework into the on-­line comprehension. 
This hypothesis is strongly confirmed by neurolinguistic evidence: an incongruity from the interlocutor’s perspective elicits the so-­called N400 effect: a negative-­going voltage occurring 400ms after a semantic violation which underlines a difficulty in integrating a given stimulus into a previous context (Kutas and Federmeier 2011). Interestingly, Kuperberg (2013) has suggested that the semantic processes indexed by the N400 do not merely encode the relation between single concepts that activate a semantic relatedness network and facilitate the prediction of upcoming information, but that they are rather related to the activation of more complex stored multi-­level representations of entire events and states. To this extent, event knowledge instantly affects the processing of meaning with contextual information working in a constraining manner. In addition, Fussell and Kraut (2004) have observed that even on the side of production processes, speakers adjust their expressions tailoring them on the interlocutor’s perspective. In other words, people employ their situation models in order to produce anticipatory responses linked to the whole context of conversation which, therefore, constrains the interpretative process. In light of their experimental evidence, Otten and van Berkum (2007, 2008) pointed out that the observed effects are not attributable to simple lexical priming or, more generally, to associative mechanisms; rather, they are attributed to actual evaluations of content in relation to the information provided by the context. Overall, these interpretations suggest that speakers build a cumulative representation of the global message conveyed by a conversation and that such a representation immediately constrains production and comprehension processes.

In this respect, the two-­step model appears to falter in explaining language processing in real-­time conversation. In opposition to its predictions, the conversational context has a primary role in building meaning and, by means of a constant adjustment process, drives comprehension among speakers (Altmann and Kamide 1999; Nieuwland and van Berkum 2006). These indications suggest that meaning can be defined as a constructive and context-­dependent phenomenon (Gibbs 1994; Clark 1996). As predicted by the single-­step model, there is not a priority encoding of local semantic information but rather the global context conditions the interpretation. From this point of view, bottom-­up and top-­down processes constantly interact, thus closely intertwining semantics and pragmatics (Carston 2008). Within the classical computational framework, the global level of conversation is achieved by combining the sum of single propositions. On the contrary, according to the reviewed empirical evidence, conversation is largely a top-­down phenomenon, in which a situational interpretation at the global meaning level guides local interpretation (Cosentino et al. 2013). If language use implies the construction of a situational interpretation at the global meaning level which has effects on the interpretation of a linguistic expression, then such an expression is not isolated but is strictly tied to the joint actions performed by speakers. In this view, communication is a collaborative process grounded in the interlaced work of speaker and listener, entailing an integrated account of bottom-­up and top-­down perspectives. Why should these results be relevant to the issue of the evolution of language? The idea is that the implications of these studies can be used to make a more general claim: they suggest that there is no stage of linguistic interpretation that is completely independent of contextual information, that is, meaning is immediately contextualized. 
This claim implies that in order to account for language evolution we should focus on how context constrains linguistic interpretation, which in turn implies focusing on the pragmatic abilities of our ancestors. In the following sections, these abilities are analyzed, along with the possible mechanism that underlies them.

4. The evolution of language in a top-down perspective

Over the years a great variety of theories of language evolution have been developed (e.g. Christiansen and Kirby 2003). However, they have largely focused on the analysis of strictly linguistic components while underestimating the pragmatic factors, following the idea that investigating the elaboration of individual phrases is a way to cast light on the nature of the overall linguistic phenomenon (Pickering et al. 2001). Nevertheless, if language is fundamentally conversation rather than a combination of isolated linguistic components, its evolution should be studied in relation to the broader capacities that underlie the ability to elaborate contextual variables and to establish a common ground with the interlocutor. Given such a global, pragmatic view of current language use, context and interpretation must have been even more crucial in the evolutionary history of language, when a conventional, culturally established code was not yet fully formed. The issue to be tackled is how context-embedded factors could have constrained the interpretative processes. Investigating this issue implies looking into a crucial preliminary matter, that is, what initial (linguistic) communication may have looked like. On this point, recent literature on intersubjectivity (e.g. Zlatev et al. 2008; Fusaroli et al. 2014) proposes that linguistic processes must be examined at the interpersonal level, where complex social multimodal interactions display specific dynamic properties. Similarly, Tomasello (2008) argues that the early forms of communication arose from collaboration with others in collective activities and, thus, can be interpreted as joint actions. From this point of view, the problem of explaining how initial communication could work can be rephrased as the problem of explaining how individuals begin to engage in joint actions. This problem can be broken into at least two main sub-problems, one related to the concept of “jointness” and the other to the notion of “action”. Starting with the latter, the problem is to account for how we understand other people’s actions. Once we have answered this question, the second problem to be dealt with is analysing how we realize joint actions. The main point so far has been that contextual information is immediately used in a top-down fashion to interpret the meaning of linguistic expressions. The proposal of the present chapter is that, in order to extract relevant contextual information, individuals rely on the same processes by which they interact with the physical environment, that is, action and perception processes. In support of this claim, Cosentino and colleagues (2014) conducted an experimental study to evaluate the processes involved in the semantic integration of a sentence in a discourse.
They showed that sensory-motor information is recruited during the process of sentence meaning composition to extract “ad hoc affordances”, namely dispositional properties of objects in a given situation that are based on a novel, context-specific function (Barsalou 1993). For example, the researchers built a situation in which an ad hoc affordance for funnel was induced such that a funnel could be used to hang a coat. As ad hoc affordances are contextually induced, their immediate integration during the process of sentence meaning composition suggests that there is no principled temporal or functional precedence of local constraints over global contextual factors. Specifically, the authors link these top-down effects with a mechanism of affordance perception, that is, the mechanism by which an individual detects an opportunity for action suggested by an object or the environment given her particular bodily structure (Gibson 1979). Accordingly, Amoruso and collaborators (2013) pinpointed a close link between sensory-motor cognition and language. In particular, they emphasized that action meaning and language meaning elicit similar N400 modulations. In their view, it is plausible to interpret these outcomes within a grounded account of the N400, as the retrieval of sensory and motor information modulates meaning-related processes indexed by this component. In this respect, meaning is a situated phenomenon embedded in prior experiences with the world and shaped by predictions derived from ongoing contextual information and previous knowledge. Within a motor account of social cognition, it is possible to fill the gap between language and action by assuming that the ability to actively perceive others’ actions as a form of meaningful behavior plays a role in processing language in context. According to this approach, the ability to understand other people’s minds crucially involves the capacity to understand other people’s intentions by observing their actions (Richardson and Dale 2005; Ramenzoni et al. 2008). Starting with the assumption that understanding a perceived action activates a model of the motor intention (Iacoboni et al. 2005; Gallese 2006), building joint activities involves people comprehending others’ actions alongside their own (Buccino et al. 2004). Thus, in some way the speaker can perceive and comprehend his own action and that of the listener in similar ways (Bermudez 2003) and can make reliable predictions (Grush 2004). Gibson (1979: 128) discussed the close relationship between environmental and social affordances: “the other animals afford, above all, a rich and complex set of interactions, sexual, predatory, nurturing, fighting, playing, cooperating, and communicating”. The application of the ecological perspective to the social domain (e.g. Schmidt 2007) is sustained by the idea that social meanings have embodied power that regulates the actions and interactions of human beings. Communication is one of these social activities and, to this extent, can be seen as a sort of extension of bodily actions (Kono 2009); in other words, language comprehension is in some ways similar to event perception. In this view, other people’s intentions could provide a particularly rich source of contextual information, as they are social affordances capable of adjusting the interactions. If this socio-communicative approach is correct, then affordance perception might be a central mechanism, as the same sensory-motor processes that allow one to determine one’s own action possibilities in the environment are also involved in the evaluation of the action possibilities of other people and, thereby, in the understanding of their actions (Cosentino 2014).
Within this general framework, it is interesting to highlight that the emphasis on the affordance perception mechanism could provide a sensory-­motor account of (some) pragmatic abilities involved in language processing. This would also have the advantage of postulating a very basic mechanism of action-­perception coupling as the evolutionary foundation of more sophisticated mind-­reading abilities which may have fostered linguistic communication. In order to make this account a completely satisfying hypothesis on our ancestors’ pragmatic abilities, it is essential to specify which specific pragmatic function was underpinned by affordance perception and in which way it could have fostered linguistic communication. This is the topic of the next section.

5. Pragmatic alignment as the key for communicating

Thus far, we have identified a possible link between action and intention in the affordance perception mechanism. It is worth noting that some assumptions of this chapter are consistent with the inferential model of communication by Sperber and Origgi (2010), according to which linguistic expressions represent a mere hint of the speaker’s meaning. In this approach, at the origin of language, a fragmentary and ambiguous code could possibly be interpreted by drawing inferences concerning the speaker’s intentions. In this respect, communication arises from the joint activity of speaker and listener who are engaged in several coordination problems at many levels (Levinson 2000). The central problem consists of coordinating what the speaker means and what the addressee understands him to mean by trying to converge on shared situational models. That is the so-called process of alignment (Pickering and Garrod 2004). This is the framework wherein initial communication has been defined as a form of joint action that implies not only the coordination of practical activities but also the convergence of meanings. In the Sperber and Origgi account, language evolution can be explained in association with the evolution of a sophisticated mindreading capability, characterized by fast propositional attitude attributions. However, it is unlikely that the ability to attribute propositional attitudes was already in place, particularly as present-day children acquire communicative skills before developing such an ability (Zawidzki 2013). A more parsimonious account is that the construction of a shared mental space involved in building joint communicative actions is based on sensory-motor mechanisms. More specifically, the affordance perception device described in the previous section might have a specific role in leading to the key process of alignment. How does it complete this task? Perceiving affordances represents a crucial step in the process of understanding both one’s own intended actions and the potential actions of the agent with whom one is interacting. Thanks to this process, people can make predictions and estimates of the social consequences of those actions; hence, the subsequently performed actions become social clues which can confirm or disconfirm the agent’s expectations (Cosentino 2014).
From this perspective, the affordance perception mechanism, providing a tool for an immediate comprehension of other people’s intentions, can serve as a basis for simple forms of alignment. Affordance perception permits predictions of others’ intentions that in turn condition the planning of one’s own actions: throughout this adjustment process, the interpretation may be refined moment by moment, driving the convergence between interlocutors towards a conceptual form of alignment. In support of this hypothesis, Linkenauger and colleagues (2012) have recently suggested that some of the social and motor impairments observed in many individuals with Autism Spectrum Disorders (ASD) may be attributable to an impaired mechanism of affordance perception. Whether these impairments are different deficits or have a common origin is very controversial. Despite that, some findings obtained with well-established experimental paradigms provide interesting data in support of the hypothesis that a difficulty in the perception of affordances is the putative mechanism underlying both social and motor impairments. As previously suggested, an impairment in perceiving one’s own affordances may also affect the ability to perceive the action possibilities of other individuals; such a difficulty might be a crucial obstacle in the comprehension of others’ actions, leading to serious limitations in social domains. It is plausible that these limitations may have effects on the ability to align with people, affecting the pragmatic skills essential for communication. Indeed, despite the fact that strictly linguistic alignment has been found to be intact in people with ASD (Slocombe et al. 2013; Hopkins et al. 2015), converging with an interlocutor merely at the linguistic level (for instance, at the syntactic level) does not ensure convergence at the level of situation models. In accordance with a pragmatic inferential account, it is the situational alignment which is crucial for successful communication. Returning to the evolutionary scenario in light of the considerations developed above, it is plausible that the ability to read others’ intentions based on the interpretation of their actions is necessary for a structured code to develop. The development of a shared space paves the way for sophisticated forms of mental attunement (Tomasello 2008) that might have first sustained communication with natural gestures such as pantomime and then acted as an infrastructure on which conventional linguistic communication could rest. To this extent, the simple mechanism of affordance perception can provide an answer to some critical considerations raised by Origgi and Sperber (2010) among others. The problem they address concerns the convincing claim that for coded communication to work, speaker and listener have to share exactly the same code. Any difference between the speaker’s and the listener’s code is likely to cause some errors and, consequently, to compromise the success of communication, thereby becoming counter-adaptive. In evolutionary terms then, in order to be advantageous the code has to undergo modifications which do not structurally modify the preexisting code of expression. Within the present framework, the situation would be very different. The alignment model based on affordance perception does not assume that a code needs to be shared. Instead, the ability to build a shared space allows interlocutors to make predictions about the speaker’s meaning despite fragmentary and ambiguous coding.
A linguistic code can arise exactly “from affordances that are brought forth by active engagement, and which enable further action and interaction” (van Lier 2006: 146). Conversely, once such a poor and fragmentary code emerges, it is likely to play a major role in semantic alignment. Fusaroli and colleagues (2012) used an experimental design to investigate the influence of public language in performing joint tasks. They asked pairs of subjects to cooperate using language in order to carry out a perceptual test. The findings showed that the more the pairs aligned in their task-relevant linguistic behaviors, the higher the level of their task performance. Overall, these indications suggest that there is a close relationship between language and joint actions: language meaning is built within the intersubjective space and this space, in turn, develops through language, in a dialectical manner (Zlatev 2008a). This complex bind represents a privileged key to the origin and evolution of human communication.

6. Conclusions

In this chapter, the evolution of human language was addressed alongside the development of action-based abilities of alignment. The concept of alignment paves the way for a model of language as a joint action that characterizes early communication in proto-conversational terms. Indeed – starting from the idea that the origin of language has to be interpreted from a top-down perspective, that is, with respect to the effects elicited by the situational context on the interpretation of an expression – the core of language does not rest on an isolated system or property but rather is strictly tied to the joint activities that led speakers to coordinate both practical and conceptual actions. This suggests the need to study early communication with regard to broader capacities that determine the ability to elaborate contextual variables and to establish a common ground with other people. In this framework, conversation can be defined as a dynamic interactive exchange, situated within a jointly determined – and constantly evolving – semiotic system. Interestingly, even at present, in coded and well-structured language, conversation keeps similar features in continuity with the first forms of communication. It is proposed that a specific role might have been played by a basic mechanism grounded in the sensory-motor experience of individuals: affordance perception. Within the traditional computational model, the contextually varying top-down influences of the conversational situation are hard to explain. On our account, the mechanism of affordance perception mediates between the top-down aspects of language and the sensory-motor nature of cognition, providing a tool for the definition of key processes such as conversational alignment. It is plausible to conceive an evolutionary scenario where affordance perception started by guiding the coordination of practical activities and then advanced to guiding the alignment of more complex semiotic actions that involved the understanding of conventional and normative elements. Language might have arisen within this context, making us the extraordinary species that we are.

Acknowledgements

This work has been conceived and discussed with Erica Cosentino. I am really grateful to her for her valuable comments. We presented an early version of this paper at IACS 2014 in Lund; we would like to thank the audience of the conference as well as the reviewers of an earlier version of the text for the constructive feedback provided.

Serena Nicchiarelli

Chapter 12 Protolanguage as Formulaic Communicaction

1. Introduction

One of the major debates in evolutionary linguistics is concerned with the nature of early protolanguage and its transformation into modern language. In that respect, two competing models have given rise to a lively debate: the synthetic account (Bickerton 1990, 2010; Tallerman 2007, 2010), in which word-like units are eventually composed into sentences, and the holistic account (Arbib 2005; Wray 1998, 2002), in which sentence units are broken apart into words. The fundamental distinction between the two accounts lies in the initial conditions, namely in the nature and complexity of the meanings associated with basic units of protolanguage (Smith 2008). In Section 2, I introduce these two competing accounts and present a proposal that relies on a notion of protolanguage that is more consistent with the holistic account. However, a protolinguistic code in which every single signal is strictly associated with only one atomic meaning, independently of the context, cannot be assumed to be the evolutionary starting point. The emerging verbal communication could not be useful unless initial linguistic expressions were contextually constrained. Hence, protolinguistic abilities need to be explained with regard to those cognitive systems that allow agents to act in their own environment: “Language can best be understood as a device which refines an already complex system – it is to be explained as a ‘recently’ evolved refinement of an underlying ability to interact with the environment” (Arbib et al. 2014: 62). More specifically, my proposal is that hominin protolanguage was significantly related to structures underlying the capacity to perform and recognize complex communicative goal-oriented actions. In Section 3 I address the issue of a possible pantomimic foundation of protolanguage, in line with the models proposed by Donald (1991) and Arbib (2005).
In order to evaluate the plausibility of this proposal, in Section 4 I present an analysis of formulaic language, in line with the idea that the modern holistic processing strategy represents a legacy of ancestral holistic-protolinguistic ability. The analysis is centered on the description of the functions and properties of formulaic language in actual dialogical interactions, thus highlighting the role of formulaicity in pragmatic tasks (e.g. Gibbs 2007; Kecskes 2014). Finally, I consider clinical data derived from studies on productive language in Alzheimer’s and Parkinson’s diseases in support of the presented hypothesis of a holistic communicaction protolanguage.


2. Proto-derby: Compositionality or holophrasis?

In their introduction to a compilation of articles debating the nature of protolanguage, Arbib and Bickerton (2010: vii) state a rare point of agreement: “Somewhere and somehow, in the 5 to 7 million years since the last common ancestors of humans and the great apes, our relatives got language”. Beyond this, there is little consensus in the field. It is sometimes claimed (e.g. Berwick et al. 2013; Chomsky 2010) that human language capacity arose abruptly, in a single long jump, a saltation. Under this view, language is a uniquely human phenomenon that qualitatively differentiates us from other species, and it exists as a separate ability from all other cognitive capacities. Scholars in this tradition have looked for a “language organ” (a cerebral area dedicated exclusively to linguistic skills), often asserting that certain genes exist for the exclusive purpose of such an organ. An evolutionary scenario in this perspective maintains that modern language arose as the outcome of a fortuitous mutation that equipped Homo with the gift of language. Language, on this view, is clearly species-specific, having no analogs on the planet. A more plausible approach states that language “is not a monolithic specific thing that we just have” (Kenneally 2008: 9), but rather something that we do, based on a multidimensional cognitive process that derives from the joint functioning of different systems in the brain, interconnected with other human capacities and other cognitive processes. According to this view, it is necessary to consider some form of gradualism in the evolutionary process that underlies language evolution (Hurford 2014), explaining how to bridge the gap between the cognition and communication of our non-linguistic ancestors and those that we use today.
The notion of protolanguage is used in language evolution research to refer to a hypothetical intermediary stage in the evolution of language characterized by the absence of grammar; it “helps to bridge the otherwise threatening evolutionary gap between a wholly alingual state and full possession of language as we know it” (Bickerton 1995: 51). But what was protolanguage like? Two different views have become the subject of a lively debate, with the main dispute anchored in divergence concerning the semantic complexity of protolinguistic units. According to the compositional or synthetic account (Bickerton 1990, 1995, 2014; Tallerman 2007, 2010), protolanguage consisted of a simple lexicon without syntax, allowing proto-humans to use a limited set of word-like units with simple atomic meanings, each associated with a basic pre-existing single concept, and each connectable to others in a “slow, clumsy, ad hoc stringing together of symbols” (Bickerton 1990: 81). Under the holistic account (Wray 1998, 2002; Arbib 2005a, 2005b, 2012, 2013), protolanguage corresponded to a system in which individual signals, lacking in internal morphological structure, conveyed entire complex propositions (cf. Smith 2008). In this model, a complete communicative act involved a unitary utterance used to refer to complex and significantly recurrent events. Such a holophrastic strategy consists of the use of a single complex expression, whose components – whether manual or vocal – lacked independent meanings. The basic units of meaning according to the holistic model were represented by complex propositions conveyed by holophrastic signals, each linked to a complete communicative goal-oriented behavior, the meaning of which does not derive from the sum of the parts. In this way, each utterance would have been phonetically arbitrary, without any relation in sound to even those utterances with a similar meaning. In line with Wray (2002), we may consider the holistic strategy for conveying manipulative messages as a legacy of the holistic system observed in the communicative behavior of chimpanzees in the wild (Reiss 1989). On the basis of this parallel, it has been proposed that the holophrastic manipulative gestures of our pre-human ancestors were transformed, over a long period of time, into a phonetically expressed inventory of holistic message vocalizations (Wray 1998). The ancient strategy of using holistic linguistic signals to accomplish some important functions in interactions is still present in the expressions of so-called formulaic language. This is an open inventory of holophrastic units, each associated with a complex context-sensitive meaning that is not derivable from the meanings of subunits. Formulaic language may be considered a living relic of our ancestral protolanguage “because of its holistic nature and also because its functions show a striking correspondence with those for which holistic noise/gesture utterances seem to be used in primates” (McMahon and McMahon 2013: 69). In line with the idea that the holistic structure of language predates its parts, I here suggest on the basis of a “living linguistic fossils” analysis that the earliest protolanguage was in great part holophrastic, and that as it developed through time, it retained holophrastic strategies to accomplish important functions.
It was however not predominantly vocal, but closely related to its previous stage in evolution, conventionalized pantomime, that is to say, the ability to use reduced forms of actions to convey complex aspects of other actions and events. Hence, holistic protolanguage may be termed communicaction: an open inventory of context-­sensitive holophrases, each representing a complex, communicative, goal-­oriented behavior. In the following section, I address the issue of a possible pantomimic foundation of protolanguage.

3. A possible pantomimic foundation of protolanguage

If it is true that there is a continuum between the communication strategies of non-human primates and human protolanguage, as well as between the latter and fully developed language, it is necessary to posit a series of evolutionary stages underlying those modifications:

Could protolanguage have sprung ‘fully-armed’ from the cognitive armamentarium of primates? My answer to this question is a resounding ‘no’. There are important fundamentals missing from the primate mind, without which protolanguage could not emerge; I shall call these the ‘cognitive preconditions’ of protolanguage. (Donald 1999: 140)


According to one specific stage-based evolutionary model (Arbib 2005a, 2005b, 2006, 2012), a “key neural missing link between the abilities of our non-human ancestors […] and the modern human capability for language” involves the mirror-neuron system “with manual gestures rather than a system for vocal communication providing the initial seed for this evolutionary process” (Arbib 2006: 6). On this account, the mirror system for grasping was crucial to the early stages of protolanguage and its extension represents the decisive step that enabled the development of the ability to imitate:

Imitation is seen as evolving via a so-called simple system such as that found in chimpanzees (which allows imitation of complex “object-oriented” sequences but only as the result of extensive practice) to a so-called complex system found in humans (which allows rapid imitation even of complex sequences, under appropriate conditions) which supports pantomime (the artless sketching of an action to indicate either the action itself or something associated with it). (Arbib 2005a: 106)

According to Arbib, the “language-ready brain” is the result of the following steps or stages: (1) grasping, an ability based on cerebral mechanisms that allow animals to interact with objects, and (2) a mirror system for grasping, linked to the comprehension of actions performed by others. Such a system represents the foundation for the development of a simple imitation system for object-directed grasping through much repeated exposure. Thereafter, (3) evolution furnished the hominin line with the emergence of a complex imitation system, namely the ability to recognize another’s performance as a set of familiar actions and then to repeat it. This is a crucial step in the evolution of communication insofar as it paves the way to (4) a manual-based communication system, initially founded on mimesis or pantomime (Donald 1991; Corballis 2011; Zlatev 2008a, 2014b). The transition from the ability of complex imitation to pantomime is granted by a critical neural change: the activation of the mirror-neuron system also in “intransitive” acts, in which the action is not oriented toward (immediately present) objects.1 Similarly, in pantomime the audience needs to infer the meaning of the action from observation of the movement in isolation. Pantomime is typically performed with the intention of getting the observer to think of a specific action or event; it does not require agreed-upon conventions and is holophrastic. However, as pointed out by Arbib (2003: 13), it has its drawbacks:

…it may be quite long, limited, and energetically costly. So, the range of communication can be greatly improved by the development of conventions on the use of gestures that do not directly pantomime anything, but instead are developed by a community to refine and annotate the more obvious forms of pantomime – forms which themselves would became increasingly ritualized with use. The notion, then, is that the manual domain supports the expression of meaning by sequences and interweavings of gestures, with a progression from “natural” to increasingly conventionalized gestures to speed and extend the range of communication within a community.

1 Mirror neurons for grasping in the monkey brain will fire only if the monkey sees both the hand movement and the object to which it is directed. A grasping movement that is not made in the presence of a suitable object, or is not directed toward that object, will not elicit mirror neuron firing (Umilta et al. 2001).

This leads to (5) the emergence of “protosign”, as pantomimes were replaced gradually by more economical, less ambiguous conventionalized gestures. As a result of such conventionalization, gestural descriptions of objects and events were increasingly supported by non-iconic vocalizations. This process, according to Arbib, fostered stage (6), the emergence of holistic protolanguage. This is characterized by communicative goal-oriented acts based on the use of unitary holistic formulations, whose components – both gestural and vocal – lack independent meanings, as pointed out earlier. A significant effect of the holistic strategy is that certain ways of expressing an idea or an event become accepted as the preferred ones in the speech community (Wray 2002). Initially conventionalized pantomime, and – at a later stage – protolinguistic holophrastic vocalizations, may be described in Donald’s terms as “retrievable (autocueing) familiar action-schemata” (Donald 1999: 68). As mentioned in Section 2, evidence for such a scenario is provided by the “living linguistic fossil” of formulaic language. As noted by Wray (1998), the holistic delivery of some complex messages is something that we still use in everyday communication, although obviously, the holistic forms we use today are not direct descendants of the original ones: “What we have inherited is not the forms themselves, but the strategy of using holistic linguistic material to achieve some key interactive functions” (Wray 2002: 115). To evaluate this proposal, in the next section I consider the properties and functions of formulaic language in actual conversational interactions.

4. Fossil in action: formulaic language

Formulaic language is an “umbrella cover term for a number of formulaic linguistic categories” (Schmitt 2010: 64), each of which is a sequence (continuous or discontinuous) of words or other elements. Such a sequence appears to be “prefabricated”, that is stored and retrieved whole from memory at the time of use, rather than being subject to generation or analysis by grammar (Wray 2002: 9). More specifically, formulaic expressions are definable as recurrent conventionalized multi-word lexical items that have a single complex meaning not derivable from constituent parts. Furthermore, they are:

• undoubtedly holistic in nature (e.g. by and large; to go the whole hog);
• grammatically sound but semantically holistic (e.g. to pull someone’s leg; the oldest profession);
• indistinguishable from novel utterances, except that they are preferred over other equally possible formulations (e.g. Would you do me a favour? vs. Please perform an act of kindness for me!).


Considering it as a linguistic action-­based solution to the problem of how to promote our own survival, Wray (2002) suggests that formulaic language is the preferred way of communication when we act in a familiar environment. In the same way as pantomime, holistic utterances represent complex communicative goal-­oriented behaviors. They are context-­sensitive and oriented to stimulate a specific effect on the hearer in order to obtain a benefit, not unlike the socio-­interactional functions observed in ape communication. It is possible to find a relevant analogy between conventionalized pantomime and formulaic language, supporting the view that the evolution of the latter is based on the scaffolding realized by the first. Both are characterized by holophrastic form, a conventionalized complex (displaced) meaning, and an intrinsic association with a goal-­oriented action. Thus, we may define the holophrases of formulaic language as conventionalized vocal gestures. Current views on human language processing diverge from the standard generative model, in which such conventionalized vocal gestures were dismissed as a peripheral, limited set of ‘lexical items’, fixed expressions, “dead” metaphors or mere linguistic ornaments (Gibbs 2007). Formulaicity represents an integral part of the language that eases social interaction, enhances textual coherence, and, quite importantly, reflects fundamental patterns of human thought. A change in view has arisen from studies that reveal extensive incidence and communicative importance for formulaic expressions in actual language use (Pawley 2007; Schiffrin 1987; Wray 2002; Wray and Perkins 2000). Such approaches suggest that formulaic language falls under the pragmatic domain of language, as it consists of commonly used phrases that aid pragmatic aspects of communication (Gibbs 2007; Kecskes 2014; Van Lancker 1987; Van Lancker Sidtis 2006, 2012b; Van Lancker Sidtis et al. 2004, 2009; Wray 2010). 
People rarely talk using literal language exclusively: it is nearly impossible to speak of many human events and abstract ideas without employing idiomatic phrases that convey nonliteral meaning. The traditional view of idioms and related speech formulas considers these phrases as bits and pieces of crystallized language; that is to say speakers must learn these “dead” metaphors and speech gambits by arbitrarily pairing each phrase to some nonliteral meaning without any awareness of why these phrases mean what they do (Chafe 1970; Fraser 1970; Katz 1973). Yet formulaic expressions are not mere linguistic embellishments, aimed to adorn people’s speech style, but are an integral component of the language that facilitates social interaction, improves textual coherence, and, highly contextualized, realizes several organizational functions in on-line conversation. Recent studies reveal extensive incidence and communicative importance of formulaic language in actual language use. By some estimates, formulaicity constitutes a significant proportion of expressive language, with natural conversational speech estimated at 40 % (Van Lancker Sidtis et al. 2004). When engaged in cognitively taxing tasks, speakers typically show a tendency to resort to the holistic processing mode because of the processing advantage it has over the analytic processing mode (Maad 2010). Further, formulaic expressions serve important functions in actual discourse and in conversation. Gibbs (2007) illustrates with the following formulaic expressions and their functions in American English:

• revealing secrets in terms of spilling the beans;
• suddenly dying in terms of kicking the bucket;
• getting angry in terms of blowing your stack;
• taking risks as going out on a limb;
• trading gossip as chewing the fat;
• urging others to take action by saying the early bird catches the worm.

Wray (2002) has shown how familiar expressions are easier for speakers to perform, and easier for listeners to process in reception, thus facilitating the whole interactional process. However, it is not just the memorability of such phrases that represents an aid for communicative interactions (Bowles 2010). Many researchers have pointed out how formulaic expressions play an important role in the pragmatic sphere of actual conversational interactions and in the organization of discourse (Drew and Holt 1995; Gibbs 2007; Wray 2002; Moon 1998; Wray and Perkins 2000). For instance, idioms are excellent ways of signaling topic transition in conversation. Consider the following excerpt from a conversation between a daughter and her mother talking about the death of someone they both knew (adapted from Drew and Holt 1995: 123):

Leslie: The vicar’s warden, anyways, he died suddenly this week, and he was still working.
Mum: Good grace.
Leslie: He was seventy-nine.
Mum: My word.
Leslie: Yes, he was.
Mum: You’ve got real workers down there.
Leslie: He was a, uh. Yes. Indeed, he was a buyer for the only horse hair factory left in England.
Mum: Good grace.
Leslie: He was their buyer. So he had a good innings, didn’t he?
Mum: I should say so. Yes. Marvelous.
Leslie: Anyways, we had a very good evening on Saturday.

When Leslie says he had a good innings (an idiomatic allusion describing a batsman’s successful performance in a cricket match), she not only summarizes the information presented in her prior turn (e.g., he was a buyer for the only horse hair factory left in England), but refers to the whole theme of the conversation up to that point (e.g., that the vicar’s warden was still working at age seventy-nine when he died). Leslie’s metaphoric description of the vicar’s warden’s life as a good innings refers to a more abstract, general idea (e.g. that his life was long and very productive) than if she had simply stated that “he had a good life”. Thus, the idiom acts to thematically summarize the information revealed in the conversation and allows speakers to move on to the next conversational topic. Idioms are especially useful in terminating a topic because of their distinctive manner of characterizing abstract themes in concrete ways (Gibbs 2007: 703).

The following is a summary of the main properties of formulaic sequences in real communicative situations.

(a) It is generally believed that these expressions are stored and processed more efficiently because single memorized units, even though they are composed of a sequence of individual words, can be processed more quickly and easily than the same sequences of words which are generated creatively (Bridges et al. 2013a, 2013b; Pawley 2007; Pawley and Syder 1983; Schiffrin 1987; Van Lancker Sidtis 2012a, 2012b). From this perspective, because of their holistic organization, formulaic sequences represent mental shortcuts both in linguistic production and comprehension, streamlining the conversational process and increasing communicative effectiveness. If, as Sinclair (1991, 2004) notes, actual conversational situations need linguistic production to be partially in accordance with the so-called idiom principle, linked to the natural tendency to reach maximum result with minimum effort, then, in Gibbs’s words, “formulaic language is a means of ensuring physical and social survival of the individual through communication, on the one hand, and a way of avoiding processing overload, on the other” (Gibbs 2007: 702).

(b) Formulaic expressions represent an instrument for manipulating others (Wray and Perkins 2000), prompting listeners more efficiently toward specific behavioral responses, and are mainly oriented toward bringing about changes in the speaker’s environment. In addition, the employment of formulaic expressions increases the sense of membership and collaboration in a linguistic community (Gibbs 2007; Kecskes 2014; Wray 2002).

(c) Language has extensive means of assigning different degrees of salience to the parts into which conceptual content is organized (Talmy 2007). There is a general tendency to direct attention to the whole meaning of an utterance rather than to the meanings of its individual components, as “a speaker’s actual linguistic expression often poorly represents the conceptual complex that he/she had intended to express” (Talmy 2007: 247). This squares nicely with the holistic semantics of formulaic expressions.

(d) Formulaic expressions are linked to precise frames of action, and thus to highly contextualized use conditions and a great degree of familiarity. Due to this, as underlined by Kecskes (2014: 112), formulaic sequences “create shared bases for common ground in coordinating joint communicative action”. Sinclair (1991) ascribes the prevalence of formulaicity in language use to “the recurrence of similar situations in human affairs […] a natural tendency to economy of effort [and…] the exigencies of real-time conversation” (Sinclair 1991: 110).

(e) Formulaic expressions are highly dependent on appropriate conversational context and often serve to move the dialogue forward or to monitor the action (Van Lancker Sidtis and Rallon 2004). Furthermore, they guarantee and increase discursive and conversational fluency, playing an important role in the proper dynamics of turn-taking; they are frequently used at meaningful points of a story, summarizing the main discourse topic, achieving prominence by summarizing preceding talk, and providing a controlling image for what follows (Bowles 2010: 80).

In the next section, I consider the implications of this analysis for findings from pathology.

5. The neurolinguistics of formulaic language

Neural organization and the allocation of speech and language functions have long been attributed exclusively to cortical structures in the left cerebral hemisphere. Research in the last decades has changed this, providing support for subcortical and right-hemisphere involvement in specific aspects of verbal communication, including those reflecting pragmatic competence (Bridges et al. 2013a, 2013b). Importantly, studies on language production in Alzheimer’s disease (henceforth: AD) have shown that individuals with AD retain the ability to produce formulaic language long after other cognitive abilities have deteriorated. What emerges from such types of clinical data is that formulaic language production significantly involves the activity of both right hemisphere-subcortical prefrontal circuitry and the basal ganglia structures. Particularly interesting to the present proposal is the fact that these same cognitive systems are significantly involved in the setting of unaffected aim-oriented movements, the planning of complex behavior, the analysis and the evaluation of contextual cues (Simons et al. 2005: 1781) and non-literal meanings (Gibbs 2007). On the other hand, Lles and colleagues (1988) analyzed the production of formulaic language in the spontaneous speech of individuals with focal basal ganglia stroke and compared it to that of individuals with left- and right-hemisphere stroke, as well as healthy controls. Individuals with basal ganglia stroke had significantly fewer formulaic expressions in their spontaneous speech than individuals with left-hemisphere damage and healthy adults, confirming that the activity of basal ganglia and right-hemisphere is significantly involved in formulaic language production, a finding also confirmed by recent clinical observations (Bridges et al. 2013b; Van Lancker Sidtis et al. 2009).
Cognitive impairment is common in Parkinson’s disease (henceforth: PD), even in the early stages, affecting around 25 % of patients without dementia at the time of the diagnosis (Saldert et al. 2014). In particular, cognitive changes are mainly characterized by disorders of executive functions, by deficits of memory and visual perception (Altmann and Troche 2011), and by a progressive dysfunction of the subcortical systems and the basal ganglia circuitry (Bridges et al. 2013b). Thus, it could be expected that there will be significant changes in the production of formulaic language by patients with PD. Indeed, findings show that people affected by Parkinson’s disease demonstrate impairments in those cognitive areas that are responsible for planning and management of voluntary movement, and at the same time a strongly reduced production of formulaic sequences in spontaneous conversations. As reported by Altmann and Troche (2011), PD affects first the comprehension of non-verbal communication and the identification of non-verbal meaning. Overall, people with PD show significant impairment at the level of discourse (therefore, at the level of complex meaning-making), resulting in reduced informative content, compromised discursive fluency, substantial difficulties identifying main conversational topics, and considerable difficulty in identifying contextual information (Altmann and Troche 2011). In sum, the findings show that the proper functioning of both verbal and non-verbal communicative competence is compromised by impairments in the cognitive systems that support the production of formulaic language. Moreover, in line with our hypothesis, these cognitive systems are the same mechanisms that manage complex goal-oriented action.

6. Conclusions

In this chapter, I supported a version of the holistic protolanguage hypothesis which sees it as (a) emerging from conventionalized pantomime and (b) consisting of conventionalized vocal gestures, i.e. formulaic utterances. The combination of these features is what I refer to as “communic-action”. Evidence for this hypothesis is that such communic-action is still prominent in everyday human interaction. Furthermore, clinical studies show an important dissociation: the linguistic abilities in Alzheimer’s disease, where the basal ganglia and other structures that support practical goal-oriented actions are spared, reveal a significantly higher incidence of formulaic expressions. On the other hand, a pathology that involves dysfunctional basal ganglia, Parkinson’s disease, is associated with significant deficiencies in the use of formulaic language and non-verbal communication, as well as discourse in general. Since formulaic language could be considered as a set of goal-oriented verbal motor gestures, it is likely that it represents a legacy of the ancient holistic processing strategy, supporting the hypothesis of a holistic protolanguage as communicaction. Overall, the present work can be seen as contributing to cognitive semiotics by suggesting how present-day language, “a conventional-normative semiotic system for communication and thought” (Zlatev 2008a), may be rooted in an action-based account, which in line with enactivism sees meaning as arising from the bi-directional relation between the organism and its environment (Thompson 2007).

Part III. Meaning across Media, Modes and Modalities

Cornelia Müller

Chapter 13
From Mimesis to Meaning: A Systematics of Gestural Mimesis for Concrete and Abstract Referential Gestures

1. Introduction

When the hands engage in gesturing, they are generally used as a mimetic medium. Basically all types of gestures involve a transformation from instrumental actions to non-instrumental, communicative ones. Such a transformation is based on gestural mimesis, and calls for non-trivial cognitive capacities, which makes this topic an important issue for cognitive semiotic reflections on (bodily) mimesis with regard to the phylogenetic and ontogenetic development of gestures and language (Andrén 2010; Donald 1991, 1998, 2012; Zlatev 2008a,b, 2014a,b). Miming actions motivates and grounds the meaning of referential, pragmatic and even pointing gestures and involves iconicity and indexicality as cognitive-semiotic processes. Gestural mimesis (in the sense of schematized enactments of bodily actions) extends Zlatev’s concept of mimetic schemas (Zlatev 2005, 2007b, 2014a) to an experience-based cognitive semiotics of gestural meaning. While Zlatev’s notion of mimetic schemas addresses the ontogenetic development of iconic gestures, the concept of gestural mimesis concerns the semiotic motivation of hand-gestures more generally. Note, as an important aside, that Zlatev’s use of the term iconic gestures is not equivalent to the McNeillian (McNeill 1992) notion of iconic gestures. The latter is quite misleading, since it suggests that only gestures with a concrete referent are iconic, whereas metaphoric gestures, e.g., gestures referring to abstract actions and entities, are not. Furthermore, it implies that pragmatic gestures such as the precision grip or the Palm-Up-Open-Hand (Kendon 2004, chapters 12 and 13; Müller 2004) are not iconic. The term “iconic gestures” conflates a semiotic motivation with the specific semantics or pragmatics of gestures.
Therefore, in the following, the terms concrete and abstract referential gestures are used, and they are meant to replace McNeill’s iconic and metaphoric gestures while at the same time maintaining the important semantic distinction he introduced (McNeill 1992). From a cognitive-semiotic perspective, it is fairly obvious that iconicity (and indexicality) are motivating processes of gestures more generally. The concept of gestural mimesis thus not only addresses concrete and abstract referential gestures, it also concerns the historical motivation of conventionalized gestures (for instance, the ring gesture as “sign for love” or as “precision marker”, Müller 2014c). It thus also applies to so-called pragmatic gestures, as for instance, the Palm-Up-Open-Hand (the Open Hand supine in Kendon’s (2004: 264) terms), which derives its core pragmatic meaning as “presenting something as obvious” from the instrumental action of showing some small object on the open palm to somebody (Müller 2004). Zlatev’s concept of mimetic schemas (Zlatev 2005, 2014a) appears then to play a core role in a cognitive semiotics of gestural meaning, because mimetic schemas could in fact provide an important experiential ground for semantic frames that motivate gestural meaning. Distancing himself from Johnson’s concept of image schemas (Johnson 1987), Zlatev has argued that mimetic schemas, being experientially richer “dynamic, concrete and preverbal representations, involving the body image, accessible to consciousness and pre-reflectively shared in a community” (Zlatev 2005: 334), provide a better explanatory basis for an experientially grounded understanding not only of meaning, but of language development and evolution as well: The main theoretical advantage of mimetic schemas compared to image schemas is that they can help explain, almost literally, the “grounding” of both communication and thought through action and imitation, in both evolution and development. (Zlatev 2014: 5)

Studying the mimetic motivation of gestures offers further insights into the actual forms of experiential grounding of meaning. As argued elsewhere (Bressem and Müller 2014b), semantic frames (understood as schematic structures of canonical experiences, cf. Fillmore 1982) can explain the meaning of a particular group of pragmatic gestures, recurrent in form and function (for an overview of recurrent gestures see Ladewig 2014). The group of Away-Gestures (sweeping away, holding away, brushing away, throwing away) derives its shared negative meaning from the experience of removing annoying objects from the body space (specifically the effect of those removing actions). Meaning differences between the members of this group are based on the different types of action and the different objects involved in them. Table 1 shows the schematic scene that can account for the similarities as well as for the differences within the group of away-gestures.

Table 1. The Away Action scheme as an embodied (semantic) frame of experience

Cause of action: Annoying objects, in/or approaching body space.
Action: Manual actions of moving or keeping away annoying objects (sweeping, holding, throwing, brushing away, …).
Effect of action: Cleared body space, the exclusion of annoying objects from body space.

The mimetic schemas involved in the group of away gestures are as-if enactments of mundane instrumental actions: the sweeping away of water from a table, the holding away of a swinging door, the throwing away of an apple core or the brushing aside of crumbs from a sweater. Gestural mimesis implies the enactment of mimetic schemas and involves cognitive-semiotic processes and mimetic techniques that explain how the transition from a full-fledged action with an object to an “acting-as-if” is achieved. Notably, gestural mimesis is not restricted to the mimetic enactment of actions as mimetic schemas of iconic gestures; it also provides semiotic techniques for creating abstract and pragmatic meanings. Gestural mimesis as a capacity and technique could also be argued to underlie the levels of semiotic complexity Andrén postulated for the ontogenetic development of gestures (Andrén 2010; Zlatev 2014a). According to Andrén and Zlatev, for a bodily action to count as a gesture, Level 3 needs to be reached on communicative explicitness (#C3), semiotic complexity (#S3), or both, as shown in Table 2.

Table 2. Andrén’s developmental levels from action to gesture in social interaction (reproduced with permission from Andrén 2010: 68).

Level 3
Communicative explicitness: Explicitly other-oriented action (visible communicative intentionality)
Semiotic complexity: Explicit sign: Expression stands for meaning X (“as if”)

Level 2
Communicative explicitness: Action framed by mutual attunement (ambiguous communicative intentionality)
Semiotic complexity: Typified aspects of action meaning (type/token): Expression counts as doing X (“for real”)

Level 1
Communicative explicitness: Side effect of co-presence (no visible communicative intentionality)
Semiotic complexity: Situation-specific aspects of action meaning: An action in its uniqueness

While gestural mimesis semiotically motivates all types of gestures, this chapter concentrates on a presentation of the concept of gestural mimesis based on the Aristotelian understanding of mimesis in the arts. It offers an illustration of how mimesis motivates concrete as well as abstract referential gestures. It is argued that the meaning of concrete and abstract referential gestures is experientially grounded in different forms of bodily mimesis. In doing so, an important touchstone for an experience-­based cognitive semiotics of referential gestures is provided. By offering a systematics of the experiential grounds of gestural mimesis for concrete and abstract referential gestures, the present chapter also aims to contribute to some of the current evolutionary work on bodily mimesis. It appears that using the hands not only for instrumental but also for communicative purposes demarcates one of the most significant steps not only in ontogenesis but also in phylogenesis of language (Arbib 2005a, 2008a, 2012; Corballis 2002, 2013; Donald, 1991, 1998, 2012; Kendon, 2009; McNeill 2015, Zlatev 2008a,b, 2014b). Even our closest relatives, the nonhuman primates, show a (restricted) capacity to use their hands referentially (Liebal, Call and Tomasello 2004; Liebal, Müller, Pika 2007; Pika and Liebal 2012; Müller 2014a).


The capacity to create and use gestures develops in children along with their cognitive, social and linguistic development (Andrén 2010; Volterra and Erting 1990; Goldin-Meadow 2003; McNeill 1992; Zlatev 2007b, 2014a; Zlatev and Andrén 2009). Zlatev provides a wealth of arguments for a fundamental role of bodily mimesis in children’s language development as well as in language evolution (Zlatev 2014b: 202): “[…] human nature – characterized by a consciousness that is uniquely social and representational – rests on a specific pre-linguistic adaptation: bodily mimesis (Donald 1991, 1998, 2012; Zlatev 2005, 2007a,b, 2008a,b, 2014b).” Zlatev has put forward a definition of bodily mimesis, which implies a developmental hierarchy from proto-mimesis, to dyadic and triadic mimesis, leading into the post-mimetic stages in which language proper takes over. See Table 3.

Table 3. Zlatev’s Mimesis Hierarchy, with incremental features and corresponding cognitive-communicative skills (reproduced with permission from Zlatev 2014b: 207)

Stage: Language
Feature: Semiotic systematicity
Communicative skills: Grammar (= conventional symbolic system); Narrative

Stage: Protolanguage
Feature: Conventionality/normativity
Communicative skills: Two-word utterances; Multimodal constructions

Stage: Triadic mimesis
Feature: Communicative intention
Communicative skills: Declarative pointing; Iconic gestures; Joint attention

Stage: Dyadic mimesis
Feature: Volitional re-enactment
Communicative skills: (Over) imitation; Imperative pointing; Shared attention

Stage: Proto-mimesis
Feature: Mapping exteroception and proprioception
Communicative skills: Emotional contagion; Attentional contagion; Neonatal mirroring; Mutual gaze

While Zlatev and Donald are interested in the cognitive-semiotic macro-process of language development, be it ontogenetically or phylogenetically, the present chapter zooms in on one specific facet of mimesis: the ways in which a particular form of bodily mimesis is used to create meaning from the body. Put differently, this chapter outlines how mimesis motivates the (embodied) semantics of referential gestures, and how mimesis governs the transformation of instrumental hand movements into referential gestures. It is the human capacity for mimesis that makes this transformation possible and the Aristotelian concept of mimesis as conditio sine qua non of human nature sheds light on the semiotic and cognitive principles governing gestural mimesis (Müller 2010a). When the hands are used for communicative purposes, i.e. when actions become gestures, mimesis is what makes this happen. We may use our hands (and other body parts too), to communicate about concrete as well as abstract actions and entities in the world and this is what concrete and abstract referential gestures are about. The mimetic processes that motivate gestures as much as the signs of signed languages1 (etymologically) involve abbreviation, stylization and schematization. Thus there is a significant formal difference between the instrumental and mimed action, something that Goffman and Bateson have identified as a core characteristic distinguishing play and fight in apes (Goffman 1980: 55; Müller and Haferland 1997). Cohen and colleagues have described abbreviation as an inherent characteristic of an iconic sign in a signed language: “To indicate an action the signer generally performs an abbreviated imitation of a characteristic part of it: write, for example, is signed by making such an imitative movement” (Cohen et al. 1977: 17). Now, whereas in a signed language forms of imitation used to refer for instance to a specific action will be limited by convention, (non-conventionalized) gestures allow for variation in how to imitate one and the same action. Writing, for instance, might be “gestured” by imitating the holding of a pen and moving in a horizontal plane from left to right, or by representing the pen with an extended index finger and moving rightwards (in Western European cultures).

Gestural meaning creation appears to go hand in hand with processes of abstraction or schematization and this is what differentiates a mimetic schema from a full-fledged action. But gestural mimesis also involves perspective taking and thus can be considered a form of embodied conceptualization. Mimetic forms thus come to be the embodied touchstone of gestural meaning. Note that meaning is understood here in cognitive-linguistic terms: as conceptual structure and as a form of conceptualizing the world (Langacker 1987, 1991, 1998, 1999, 2008), which – notably – also includes pragmatic meaning. In the following, a systematics of gestural mimesis will be presented, which is based on Aristotle’s concept of mimesis as a fundamental human capacity. Applying Aristotle’s work on mimesis and poetics to gestural mimesis provides the framework for a cognitive-semiotic take on an embodied mimetic semantics of gestures referring to concrete and abstract actions and entities.

1 Although for a long time it was not “politically correct” to make this point, in fact, researchers of signed languages have over the years come to the conclusion that mimesis, here dealt with in terms of iconicity, plays a core role in the motivation of signs (Wundt 1921; Battison 1974; Cohen, Namir, Schlesinger 1977; Mandel 1977; Kendon 1980a,b, 1986, 1988; Taub 2001; Wilcox 2004). Especially, in classifier predicates, iconicity is seen as a major motivating force of their meanings (Cogill-Koez 2000; Emmorey 2003; Perniss 2007, see also Müller 2009).


Cornelia Müller

2. Mimesis and poetics: Aristotle’s systematics of different forms of mimesis

The concept of mimesis has a long-standing tradition in Western philosophical reflection upon the arts. In maintaining critical distance from his teacher Plato, Aristotle assigns mimesis a core role in the arts and considers it a fundamental anthropological condition. According to him, the capacity for, and the pleasure of, mimesis distinguishes the human species from all other species. Human beings are a miming species and, importantly, they enjoy the products of (good) mimesis. Watching a tragedy is thus not only a cognitive but first of all an affective enterprise. Aristotle discusses the question of mimesis prominently in his Poetics (Aristotle 1968) and uses it to discriminate different artistic and literary genres. A variable use of media, objects and modes leads to different mimetic forms, i.e. to different genres and more generally to different forms of art. Table 4 shows an overview of Aristotle’s forms of mimesis.

Table 4. Aristotle’s forms of mimesis in the arts
Forms of Mimesis in the Arts
Media: poetic language, rhythm, melody, music, dance
Objects: ethical and unethical actions of human beings
Modes: acting (the action is being performed) or narrated (a narrator describes actions and events)

Three fundamental aspects distinguish the different forms of mimesis: media, objects and the modes of mimesis. Media concern the means by which something is being mimed: poetic language, rhythm, melody, music and dance. Mimetic objects in an Aristotelian sense (and thus objects of the arts) are the ethical and unethical actions of human beings. The modes of mimesis explicate how mimesis is achieved. The mimetic modes are critical for Aristotle’s distinction between tragedy and epos. While epos (the “older” literary form) is characterized by a narrating mode (a narrator describes actions and events), tragedy follows an acting mode of mimesis: actions are performed rather than described. The objects of mimesis in tragedy are ethically valuable actions, and the mimetic mode of tragedy is the depiction of an action by action, not by epic narration. Moreover, tragedy is characterized by poetic language, temporal rhythm, and prosody or melody. Although Aristotle discusses mimesis in the Poetics primarily in the context of poetic language, and particularly to distinguish tragedy from epos, his account is meant to apply to all mimetic arts, including music and dance. If Aristotle’s analysis is true and mimesis is indeed a basic human capacity, it might help to shed some light on what is probably one of the most fundamental mimetic practices of human beings, namely, gestural mimesis. As mentioned above,

Systematics of Mimesis for Referential Gestures


precursors of gestural mimesis are found in the great apes (Müller 2014a; Zlatev 2008a,b), but only when it comes to miming very basic actions like giving or hitting. The human capacity to use instruments in dealing with their environment in conjunction with an upright posture has led to an explosion of manual practices (Leroi-Gourhan 1980; Donald 1991, 1998, 2012), and it is these practices in the first place that constitute an immense base for the emergence of gestures. Prototypical gestures are hand movements that mime actions and objects. In the following, Aristotle’s systematics is used as inspiration for arriving at a closer understanding of gestural forms of mimesis.

3. Forms of mimesis in gestures: The Aristotelian systematics as framework for an embodied mimetic semantics

Taking Aristotle’s concept of mimesis as a starting point for a systematics of referential gestures offers a possibility to clarify the experiential grounds of gestural meaning. It can even be considered a framework for an embodied mimetic semantics of concrete and abstract referential gestures. Drawing upon Aristotle’s concept of mimesis and its role for the arts, three aspects of mimesis can be distinguished for gestural mimesis as well: the material, the objects of mimesis and the mimetic modes (see Table 5). The material of gestures consists of the different bodily articulators: fingers, hands, arms, shoulders, head, face, eyes, lips, trunk, legs, feet etc.; the objects of gestural mimesis concern actions and entities that a gesture may be used to refer to; the mimetic modes regard the techniques of gesture creation. The objects of gestural mimesis fall into six semantic subcategories. When entities are the objects of gestural mimesis, gestures can depict entities as entities, properties of entities, and the movement and localization of entities. When, on the other hand, the objects of gestural mimesis are actions, gestures may depict actions as actions, aspects of actions, and actions with objects. There are two basic techniques by which gestural mimesis is achieved in hand gestures: acting or representing (Müller 2014a). In the acting mode the hands enact manual actions and movements of the forelimbs; in the representing mode the hands present something other than themselves. By acting as-if opening a window, turning a car key or by moulding the roundness of an event, the hands become gestures based on the acting mode. By taking on the shape of flat, round, or hollow objects, they act as-if they were manual sculptures, often sculptures in motion.2 In the following, it is illustrated how the meaning of concrete and abstract referential gestures is grounded in the media, the objects and the modes of gestural mimesis.

2 Zlatev’s notion of mimetic schemas would only apply to gestural mimesis based on the acting mode. The representing mode primarily uses visual mimesis.



Table 5. The Aristotelian systematics as a framework for an embodied mimetic semantics of referential gestures
Forms of Mimesis in Referential Gestures
Media: Bodily articulators: fingers, hands, arms, shoulders, head, face, eyes, lips, trunk, legs, feet, (…)
Objects: Actions: actions as actions; aspects of actions; actions with objects. Entities: entities as entities; properties of entities; movement and localization of entities.
Modes: Acting: enacting bodily actions and movements. Representing: becoming bodily “sculptures”.

3.1. Media of gestural mimesis

The medium of gestural mimesis is the body as articulator of expressive movements (Müller 2014b). So far very little is known about the use of different bodily articulators, let alone of their interrelation. Figure 1 gives four examples of gestures that involve different bodily articulators and different movement gestalts that emerge from the variable interplay of articulators.

Figure 1. Different bodily articulators’ contribution to gestural expression

In Figure 1a we see an example of somebody depicting the action of pouring from a pitcher. The gesture involves a fist-like hand shape, a bent arm, and an upward and sideways movement of the trunk and the right shoulder. Figure 1b shows a young man looking at his hand as-if reading. The hand represents an open book – it is a



kind of manual sculpture of it – while his head is bent forwards and his gaze is oriented towards the hand. In Figure 1c we see somebody acting as-if his two spread fingers were a pair of scissors (a manual sculpture in motion). This gesture includes the hands and the forearm. Unlike in Figure 1b, the gaze, here directed towards the interlocutor, is not part of the semantics of this gesture but rather serves the pragmatic goal of addressing the recipient of a multimodal utterance. Figure 1d depicts yet a different case of a mimetic bodily medium: here the speaker’s right leg is the foregrounded articulatory body part. The dancer’s leg metaphorically presents a heavy anchor chain. This is a metaphoric body gesture (Müller and Ladewig 2013), as a heavy leg is a prerequisite for the specific style of “walking” in Argentine Tango. It ensures stability and balance while most of the weight rests on the other leg. Argentine Tango is based on this specific walking technique, and students have great difficulty learning it. Now, the dance instructor uses the anchor-chain metaphor to explain how to achieve this balance: the body of the dancer becomes a ship on the ocean and the free leg is a heavy anchor chain hanging into the water. The anchor chain becomes a metaphor for stability in dancing. The full metaphor can be paraphrased as: “Balance (in Argentine Tango) is feeling stable like a ship with a hanging anchor chain”. The hands on his leg have an indexing function: together with the gaze direction, they direct the interlocutors’ gaze. The articulators involved in this gesture are the leg, the arms, head and gaze, and maybe also the bent upper trunk area.3

3.2. Objects of gestural mimesis

As outlined above, objects of gestural mimesis, as far as concrete and abstract referential gestures are concerned, fall into two categorial groups: actions and entities. When considering how actions and entities are gesturally referred to, it transpires that gestures are used to depict actions as actions (for instance, “enacting waving at somebody”), to depict aspects of actions (for instance, “enacting motion of something or somebody”), or to depict actions with objects (for instance, “grasping a window handle”) (see Table 5 for an overview). When gestures are used to refer to entities, three basic forms are distinguished. Gestures may refer to entities as entities, for instance, when the hand becomes an open book, as in the example depicted in Figure 1b. Gestures may also refer to properties of entities, as when moulding a round shape. Or, very often, gestures are used to refer to movement and/or location of entities. This holds for the scissors gesture in Figure 1c. Notably, the entities and actions gesturally depicted may be very concrete, tangible ones, such as the gestural enactment of pouring water from a pitcher (Figure 1a), but they may also be abstract ones, as for instance when a leg stands metaphorically for the weight of an anchor chain (Figure 1d). Four examples from an experiment on gesture usage in metaphorical and non-metaphorical contexts (see Müller 2014a) are meant to illustrate how gestures

3 For a detailed analysis of this example, see Müller and Ladewig (2013).



can depict concrete as well as abstract mimetic objects. So, for instance, the two women in Figure 2 gesturally describe the crashing of something against a wall. In Figure 2a it is a car crash, and in Figure 2b a metaphorical “crash” of an effort to solve something. In both cases, two entities are depicted: one hand represents a dynamically moving entity (movement of entity) crashing into another one, which is depicted by the other hand and represents a static entity (location of entity). The speakers thus establish a manual scenario with one static entity, the wall (location of entity), and one moving entity (movement of entity): a car in Figure 2a and an abstract effort in Figure 2b.

Figure 2. Gestures referring to movement and location of entity: driving something against a wall, in (a) a literal, and (b) a figurative sense

When looking at the vivid gestural depiction of the literal and figurative events, one could be tempted to assume that the event as a whole was presented. However, each gestural depiction comes with a very specific perspective on the action. For example, the gesture in Figure 2a does not show how somebody held a steering wheel and how her body was thrown back by the crash. The same holds for the metaphoric gesture in Figure 2b: we do not see how the whole body was affected by an imagined crash. Rather, the crash is conceptualized from a bird’s-eye view, an observer viewpoint, not from a character viewpoint (McNeill 1992; Stec and Sweetser 2013; Sweetser 2012). In both cases, a flat, laterally oriented hand represents a wall while the other hand represents, in a rather schematic way, the energetic movement of an object crashing into another one. In these gestures the speakers display their specific view on the world; thus they can be characterized as embodied conceptualizations. Figure 3 shows two examples of gestures referring to properties of entities. The two women talk about the roundness of a bench in Figure 3a and of some abstract issue in Figure 3b. In both cases, the hands mould an ephemeral round shape. An interesting difference between the concrete use and the figurative use of the gesture is the gesture space. In Figure 3a the location of the gesture depicts the location



of the round object (the bench), while in Figure 3b the location of the gesture is not semantically loaded; it is performed in the default gesture space. It is important to note that in these examples the mimetic enactments are not used to mime the action, i.e. the reference object is not an action of moulding or touching the surface of an object; rather, the mimed action functions as a technique of representation for the shape of objects (Müller 1998a, b). Thus by acting as-if moulding a round shape, properties are depicted, and often objects are gesturally referred to via those canonical properties. As Mittelberg’s work on metonymy in gestures has pointed out, those semiotic processes involve internal and external metonymy in a Jakobsonian sense (Mittelberg 2014; Mittelberg and Evola 2014; Mittelberg and Waugh 2009, 2014). Here internal metonymy (the as-if grasping hand) as well as external metonymy (the indexing of an external object by as-if touching its surface) are critical aspects of a semiotic chain.

Figure 3. Gestures depicting properties of entities: (a) the literal roundness of a bench and (b) the figurative “roundness” of an abstract issue

To sum up, objects of gestural mimesis can be concrete as well as abstract actions and entities. Note that metaphorical reference and abstract reference are used synonymously here, although, strictly speaking, metaphor may be very concrete, as in the case of the anchor chain metaphor for the heavy tango leg. However, imagining the leg as an anchor chain is certainly a context of use that is more abstract (or less basic, as the Pragglejaz Metaphor Identification Procedure (MIP) would put it; Pragglejaz Group 2007) than referring to the heaviness of an anchor chain in the context of a conversation about sailing ships.

3.3. Modes of gestural mimesis: acting and representing as techniques of gesture creation

Returning to the third dimension of Aristotle’s account of mimesis, it was already mentioned in passing that the modes of mimesis are critical for distinguishing epos and



tragedy. The narrating mode of mimesis (actions and events are described by a narrator) characterizes epos, the acting mode tragedy. Notably, this opposition in perspective taking, i.e. the difference between performing actions and narrating actions, is also present in the two basic modes of gestural mimesis: acting and representing. In the acting mode, the hands enact actions and movements of the bodily articulators. In the representing mode, the body depicts something other than itself; it becomes a manual “sculpture” (see Table 5). This distinction very often goes along with a specific viewpoint: most of the acting gestures come with a character viewpoint, most of the representing gestures with an observer viewpoint (McNeill 1992; Stec and Sweetser 2013; Sweetser 2012). In earlier accounts, I have termed the mimetic modes gestural modes of representation and described them as manual techniques of depiction and as motivating forces for the iconicity of gestures (Müller 1998a, b, 2014a). At the time four modes of representation were distinguished: acting, moulding, drawing and representing. In the acting mode, the hands act as-if performing an everyday action (acting as-if opening a window). In the moulding mode, the hands act as-if touching a surface (as-if moulding the round shape of a picture frame). In the drawing mode, the hands act as-if sketching lines and contours (as-if sketching the contour of a picture frame). In the representing mode, the hands represent an object (as-if the hands were a sculpture of a picture, an open book, etc.). Arguably (Müller 2009, 2010b, 2014a), three of the four modes of representation are actually based on manual actions: the hands act, the hands mould, and the hands draw. The difference between the three is not only that they are different types of manual actions.
While the acting mode captures enactments of everyday actions of all kinds, mostly involving everyday instruments, such as a window handle or a pitcher, moulding gestures are enactments of hands touching surfaces, which is why they very often depict shapes of objects. Thus their mimetic object is not the action, but the ephemeral shape of an object. In drawing gestures the extended index finger is used as-if tracing in sand or mud. Drawing gestures are used to depict shapes of objects too, but often they also outline “object” lines, e.g. the course of a river or the winding path in a garden. Despite those semantic differences, these are all actions of the hands and their meaning is grounded in those schematic actions, corresponding to Zlatev’s mimetic schemas, which is why they are here grouped together under the category of the acting mode of gestural mimesis.4

4 The term representation is used in the sense of Karl Bühler’s concept of Darstellung (‘depiction’) (Bühler 1934; Müller 2013, 2014a). As the English translation of Bühler’s work uses the term “representation”, I have applied the same term. However, it is important to be aware that the German word does not involve the prefix re- that comes with “re-presentation”. Bühler’s theory of representation thus does not imply a re-presentation of an objective outside world. His concept of Darstellung (depiction) is free of any such connotations. So are my earlier accounts of gestural modes of representation. By characterizing them as techniques of constructing conceptualizations of perceived and conceived events in the world, the modes of representation open a door to systematically reconstructing the experiential grounds of a given gesture’s meaning.



Both the acting and the representing mode can be used to refer to concrete as well as to abstract actions and entities, as shown in Figure 4. The acting mode of mimesis may be used to create a gesture with a literal sense (Figure 4a) or a figurative sense (Figure 4b). In Figure 4a the woman is grasping a window handle; in Figure 4b the man grasps imagined houses and moves them from one city to another.

Figure 4. The acting mode of gestural mimesis used to create gestures that refer to (a) the literal opening of a window and (b) a figurative sense of moving houses

The examples in Figure 5 show how the hands become manual sculptures, and represent something other than themselves. In Figure 5a the right hand represents an opened book, in Figure 5b the left hand becomes the iron curtain which used to separate the East and the West during the Cold War. Figure 5. The representing mode of gestural mimesis used to create gestures that refer to (a) a literal (an open book) and (b) a metaphorical sense (the iron curtain dividing East and West)

However, as the collection of examples already has indicated, there is no straightforward match between the mimetic modes and the mimetic objects. How can an



embodied semantics then be grounded in experiences? In fact, the relation between the mimetic modes, or the techniques of gestural representation, and the mimetic objects is somewhat more complicated. The possibilities for the bodily articulators to become gestures for communication are not limited to a simple and straightforward connection between the modes of mimesis and the mimetic objects. Thus gestures based on the acting mode do not necessarily depict actions. On the contrary, the mimetic modes may be employed flexibly to construct different kinds of mimetic objects. The question of what mimetic mode is used is therefore a matter of what perspective the speaker takes. Table 6 gives an overview of the examples discussed above, in relation to the respective mimetic modes employed, and the actual objects or actions referred to.

Table 6. The relation between mimetic modes (acting and representing) and mimetic objects (actions and entities) is flexible.
Mimetic object: Actions
Acting mode: Ex.: “opening window”; Ex.: “moving and placing houses”
Representing mode: Ex.: “driving car against the wall”; Ex.: “driving enterprise ‘against the wall’ ” (figurative sense)
Mimetic object: Entities
Acting mode: Ex.: “roundness of bench”; Ex.: “roundness of abstract issue” (figurative sense); Ex.: “moving and placing houses”
Representing mode: Ex.: “driving car against the wall”; Ex.: “driving enterprise ‘against the wall’ ”; Ex.: “localizing open book”; Ex.: “localizing and moving iron curtain” (figurative sense)
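The flexible relation between mimetic modes and mimetic objects can also be written down as a small cross-classification. The following is only an illustrative sketch in Python, not part of the original analysis: the names TABLE_6, objects_by_mode and cells_for are hypothetical, and the assignment of examples to cells is an assumption based on the discussion of the examples above.

```python
# Hypothetical encoding of Table 6. The cell assignment is an assumption
# derived from the discussion of the examples, not an authoritative coding.
TABLE_6 = {
    ("acting", "actions"): [
        "opening window",
        "moving and placing houses",
    ],
    ("representing", "actions"): [
        "driving car against the wall",
        "driving enterprise 'against the wall'",
    ],
    ("acting", "entities"): [
        "roundness of bench",
        "roundness of abstract issue",
        "moving and placing houses",
    ],
    ("representing", "entities"): [
        "driving car against the wall",
        "driving enterprise 'against the wall'",
        "localizing open book",
        "localizing and moving iron curtain",
    ],
}


def objects_by_mode(table):
    """Collect, for each mimetic mode, the kinds of mimetic objects it depicts."""
    result = {}
    for (mode, obj), examples in table.items():
        if examples:  # only count cells that have at least one example
            result.setdefault(mode, set()).add(obj)
    return result


def cells_for(example, table):
    """Return every (mode, object) cell in which a given example occurs."""
    return sorted(cell for cell, examples in table.items() if example in examples)


if __name__ == "__main__":
    # Each mode is attested with both kinds of objects.
    print(objects_by_mode(TABLE_6))
    # One and the same example may occupy more than one cell.
    print(cells_for("moving and placing houses", TABLE_6))
```

Under this encoding, both modes are attested for actions as well as for entities, and a single example such as “moving and placing houses” appears in more than one cell, which is one way of making the lack of a one-to-one match between mode and object explicit.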

Some actions that speakers refer to gesturally, such as, for instance, the driving of a car against a wall, can apparently be depicted in both modes. Depending on the chosen mode, they construct the event from either an observer or a character viewpoint in the sense of McNeill (1992). Gestures performed from a character viewpoint take the perspective of the actor; an example would be somebody acting as-if holding a steering wheel. A gesture made from an observer viewpoint takes a bird’s-eye view of an event. So, for instance, in the contexts of driving something against the wall described above, both the concrete and the abstract contexts were gestured from an observer viewpoint. A consequence of this independence of mimetic mode and mimetic object is that the specific combination of the two creates a particular perspectivation, a specific form of gestural conceptualization. Thus, the modes of gestural mimesis always involve perspective taking on mimetic objects, be they figurative or literal. The different forms of gestural mimesis and the different possible combinations of media and mode provide variable experiential grounds for the different mimetic objects, i.e. for creating gestural meaning in concrete and abstract referential gestures.



4. Conclusion: Mimesis grounds gestural meaning in experience

Aristotle holds that human beings are a mimetic species. In this chapter it has been argued that this capacity for mimesis appears not only relevant in the evolution of the human species, but is present and observable in the here and now of gestures in use. The capacity for mimesis provides the experiential grounds for the creation and the understanding of gestures. The different forms of mimesis presented in this chapter offer variable semiotic resources for the emergence of meaning, in particular with respect to the meanings of concrete and abstract referential gestures. This experiential motivation of referential gestures opens up the path to an experience-based cognitive semiotics of gestures, which could also be accounted for as an embodied mimetic semantics of gestures. It lends support to Zlatev’s reflections upon mimetic schemas in developmental contexts and indicates that a close analysis of the types and processes of gestural mimesis might enlighten reflections upon the ontogenetic and phylogenetic development of gestures and language. It also underlines Zlatev’s hesitation concerning the explanatory role of image schemas in evolution and development, as these are highly abstract and might be more important when it comes to explaining more abstract linguistic meanings. What we see in gestures is very often a semiotically dense enactment of motor patterns, for which the notion of mimetic schema appears to account more appropriately than the notion of image-schematic structures. However, this highly pertinent issue needs further and deeper discussion from a gesture analysis point of view (cf. Mittelberg 2010). From the point of view of methods for gesture analysis, reconstructing the concept of gestural mimesis offers a key to a differentiated understanding of gestural meaning construction.
It does so because it avoids the pitfalls of projecting linguistic categories onto gestural meaning as well as of reading contextual meaning into a gesture. Instead, starting out from a close description of the mimetic processes involved in a gestural movement proves to be an intersubjectively accountable descriptive methodology for forms of gestural meaning construction. This is the basic idea of a linguistic and form-based gesture analysis that can be summarized as follows (for more detail, see Bressem, Ladewig, Müller 2013). Starting from a close account of the gestural form, a gesture analyst may ask: What are the gesturing hands actually doing? What mimetic mode is employed, and what are the ephemeral shapes, movements and objects that are created? Answering these questions yields a first account of a form-based meaning. Take for instance someone enacting the holding of a steering wheel. It is possible to describe this dimension of the gestural meaning without knowing its context. If then, in a second methodological step, the context (semantic, pragmatic, syntactic, sequential) is taken into consideration, it is possible to further specify this basic meaning of the gesture. For instance, the gesture could be an answer to the question “How are you getting home?”: “By driving!” Or it could be part of a story: “We were driving all night long to LA.” Thus in gesture analysis there are at least two steps from form to meaning in gestures: the first step spells out the motivation of the gesture, the second one explicates the contextualized meaning of



it. On the side of the speaker/hearer, these relate to two different cognitive-semiotic processes: (a) one that ensures sign formation (motivation) and (b) one that specifies local meaning (semantics and pragmatics) (Mittelberg and Waugh 2009, 2014; Müller 2004, 2010b). Starting from an account of the particular type of mimetic mode involved in the creation of a gesture provides a form-based, descriptive ground for a thick description of the gestural meaning in a particular context. It therefore offers intersubjectively accountable descriptions of the particular form of conceptualization involved in the creation of everyday gestures, whether they refer to something concrete and tangible or to abstract ideas. Gestural mimesis opens a path towards an understanding of the embodied roots of gestures referring both to concrete and abstract objects and actions in the world, and it may also shed further light on the specific processes involved in the evolution as well as in the ontogenetic development of languages, be they signed or spoken.

Acknowledgments I would like to thank the artist Mathias Roloff for all the drawings in this chapter ( I am very grateful for the extremely insightful comments and suggestions by the reviewers of this chapter. Moreover, this research could not have been conducted without the generous grant of the Volkswagen Foundation: “Towards a grammar of gesture. Evolution, brain and linguistic structures” (www. and the many wonderful people that have contributed to this research enterprise. I’d like to mention specifically Silva Ladewig and Jana Bressem, Sedinha Teßendorf and Susanne Tag, but also my interdisciplinary governing group Ellen Fricke (Semiotics), Hedda Lausberg (Neurocognition) and Katja Liebal (evolutionary anthropology). This research has been supported by the Russian Science Foundation grant #14–48-00067.

Grigory Kreydlin & Lidia Khesed

Chapter 14
Verbal and Nonverbal Markers of Impolite Behavior in Russian Language and Non-Verbal Code

1. Introduction

In the present chapter we analyze and describe various aspects of impoliteness – a linguistically and semiotically marked1 category that embraces some discourse strategies that cause disharmony in people’s interaction and demonstrate a breach of the existing norms of social behavior. Our aim is to discuss some cognitive and linguistic properties of two classes of Russian lexical sign units. The emphasis is placed on the features encapsulated not only in some Russian language signs (words and word combinations) but also in some lexical signs of the Russian nonverbal corporal semiotic code. By this we mean “gestures” in the wide sense of the word, including meaningful movements of the head, shoulders, hands or feet, sign postures, facial expressions, meaningful bodily movements, glances and other types of corporal signs. The model and mode of description chosen clarifies, expands and elaborates some relevant aspects of more general models of impolite communicative behavior presented in the linguistic and semiotic literature (Brown and Levinson 1987; Nikolaeva 1990; Post 1996; Wierzbicka 1999b; Kastler 2004; Bousfield 2008; Culpeper 2009; Rathmayr 2009). In most previous research the category of impoliteness has been studied either together with or in opposition to the category of politeness. It is only in the last few years that several monographs and a small number of articles have appeared in which impoliteness is regarded as a separate category.2 However, many cognitive and semiotic aspects related to impoliteness and its representation in language and gestures remain unexamined. We mean here the types of knowledge displayed through the syntax and semantics of verbal and non-verbal markers of the category of impoliteness, as well as the exploration and description of general mechanisms and specific instruments of human interaction in impolite dialogues.

1 By semiotically “marked” we mean meaningful in the current communicative situation (cf. Kreydlin 2002).
2 Among the most important works published in Russian in the last decade we can name four books: Formanovskaya (2006, 2010) and Larina (2009, 2013), and several articles: Kronhaus (2001, 2004); Krylova (2006). Among works written in English, Culpeper (1996, 2009), Bousfield (2008) and Locher and Watts (2008) should be noted.



In this chapter we focus on some adjectives as primary means of the categorization of impolite behavior in Russian, and on some emblematic, co-speech gestures.3 The semantic representations of both types of signs involve some significant cognitive facets of impoliteness such as the observer’s position, i.e. the aspect by which the speaker qualifies the current communicative situation. The peculiarities of different models of Russian impolite behavior can be clearly understood if one addresses the semantics and syntax of three Russian adjectives:
• грубый (grúbyj) ‘rude’
• дерзкий (dérzkij) ‘impudent’
• хамский (hámskij) ‘caddish’
Each of these (along with their derivatives) corresponds to and represents a particular type of Russian impolite behavior. The semantic and syntactic descriptions that we give to these words are based on our exploration of texts extracted from the electronic National Corpus of Russian Language (NCRL), as well as on information taken from dictionaries for Russian language and gestures.4

2. Methodology

Our exploratory approach differs in some important ways from previous ones. In particular, we regard impoliteness as an independent category and not a ‘negative pole’ within the category of politeness. For example, we show that to not be polite does not automatically mean “to be impolite” and vice versa. The category of impoliteness is also explored through its representations in both verbal and nonverbal codes, emphasizing the connections between verbal and nonverbal markers of impoliteness. The structure and semantic metalanguage of the explications that are presented in the chapter follow the principles and methodology developed within the Moscow Semantic School and used by Russian non-verbal semioticians as well. The approach to analyzing language signs applied by the Moscow Semantic School was developed by a number of Russian linguists including J. D. Apresyan, I. A. Melchuk, A. K. Zholkovsky and others. The core principles of that approach are an integral description of language and the systematic lexicographic treatment of kindred linguistic phenomena. According to these principles, the full description of the language signs includes not only an analysis of their meaning, but also an

3 The first term belongs to Efron (1941/1972), the second one to Kendon (2004). The two are not synonymous, as not all co-speech gestures are emblematic (cf. McNeill 1992).
4 Here we refer to the explanatory Russian language dictionaries including Dal’ (1867, quotes from 1994 ed.); BAD (1951–1965); Ushakov (1940, quotes from 2000 ed.); Ozhegov (1949, quotes from 2007 ed.); SAD (1983); Kuznetsov (1998); NEDRS (2004); and those of gestures, such as DLRG (2001); Armstrong, Wagner (2007) and Akishina, Kano (2010).

Impolite Behavior in Russian Language and Non-Verbal Code


analysis of their usage and formal structure. This description is based on a unified model and aims to reflect the naïve world view of the native speaker. The meta-­ language for the semantic description of the language signs developed at the Moscow Semantic School, corresponds in some respects to Anna Wierzbicka’s theory of semantic primitives (1999a, 1999b), but differs in the following aspects: it is ethnically specific rather than universal and includes signs which reflect the additional semantic components that lie beyond the primitives. This meta-­language is used within this chapter while categorizing the impoliteness within the sentential form X is impolite to Y in the situation Z.5 In 2001, G. E. Kreydlin, S. A. Grigorieva and N. V. Grigoriev published the Dictionary of Russian Gestures where they demonstrated that the same model can be used to describe non-­verbal sign units. Each vocabulary entry includes the following attributes: • • • • • • • •

physical shape the active and passive somatic object in performing the gesture meaning conditions of usage cultural and historical context alternative nominations (including phraseology) text illustrations and images

The same multimodal and multidisciplinary approach to the study of Russian gestures was presented in the monograph Non-verbal Semiotics (2002) by G. E. Kreydlin. We use it in the present chapter as well. The categorization of impoliteness in Russian language and culture is given through a detailed analysis of its verbal and non-verbal markers and the links between them. This is in line with the general approach of cognitive semiotics, understood as a transdisciplinary matrix focused on the phenomenon of meaning and its reflection in human cultural practices (cf. Zlatev 2012). Following a definition given by Russian semioticians (Selyaev and Rykov 2008; Valkman 2012), cognitive semiotics is a paradigm which unites mental processes, cognition, sign codes (in language or other media) and the realities of a particular culture.

In this chapter, we emphasize that the categories of impoliteness and politeness alike are culturally specific (see Section 6 on ethno-cultural homonymy), reflecting the social norms unique to a given culture (in our case, Russian), the potential for breaking these norms and the sanctions for such a break. Using data on the Russian language, we analyze what is considered impolite and how impolite behavior manifests itself in the corresponding Russian words and gestures. We observe the links between impoliteness and politeness, impoliteness and etiquette, and impoliteness and education or upbringing, namely the process of acquiring social norms. As impoliteness manifests itself in both verbal and non-verbal codes, and as the links between the markers in each system are an important aspect for us, we apply a multimodal approach using various sources of data. These include the NCRL, explanatory dictionaries of Russian and of Russian body language, linguistic works on non-verbal semiotics and semantics, and our own observations of everyday human behavior.

5 The principles of the Moscow Semantic School were introduced in a number of works such as the Explanatory and Combinatorial Dictionary by I. A. Melchuk and A. K. Zholkovsky (1984), Lexical Semantics by Y. D. Apresyan (1995), the New Explanatory Dictionary of Russian Synonyms, ed. by Y. D. Apresyan (NEDRS, 2004), etc. These principles also formed the basis for the National Corpus of Russian Language (henceforth NCRL). Data from the NCRL are used extensively in this chapter as the main source of material for semantic analysis. The semantic description methodology for language signs developed by the Moscow Semantic School was inherited and successfully adapted to the needs of Russian non-verbal semiotics. By this we mean the tradition, methodology and metalanguage developed by a group of Russian linguists specializing in Russian gestures (including T. Akishina, N. Grigoriev, S. Grigorieva, L. Iordanskaya, G. Kreydlin, T. Krylova, T. Larina, S. Paperno, Tumarkin, G. Zytseva and others). Their work continues the tradition of research on non-verbal semiotics developed by specialists such as G. Calbris, M. Johnson, A. Kendon, A. Mehrabian, D. Morris, I. Poggi and A. Sheflen.

3. The category of politeness vs. the category of impoliteness: common and distinctive features

Before explicating the semantics and syntax of the adjectives considered, we shall set out the common and distinctive features of the categories of politeness and impoliteness and show how they are represented in Russian language and gestures. The common features of the two categories are context sensitivity, the estimative character of polite and impolite sign units, and the coordination of these signs with the norms of social behavior and etiquette existing in Russian culture. Examples (1) and (2) illustrate how context sensitivity influences the estimation of human behavior as polite or impolite. In both cases the behavior in question is the same, namely the addressee of a letter remaining silent, yet it is estimated as polite in (1) and as impolite in (2).

(1) В ответ на оскорбительное письмо Пётр вежливо промолчал. (V otvét na oskorbítel’noje pis’mó Petr vézhlivo promolchál). ‘In reply to the offensive letter Peter politely remained silent’.

(2) Я получила от Вас четыре письма. Получив пятое, я поняла, что промолчать будет невежливо. (Yá poluchíla ot vás chetýre pis’má. Poluchív p’átoje, ja pon’alá, chto promolchát’ búdet nevézhlivo). ‘I received four letters from you. After I received the fifth, I realized that it would be impolite to remain silent’ (Stolitsa journal, 1997).



By the estimative character of polite and impolite sign units we mean that a person’s actual behavior is often estimated as polite or impolite, and that the person who makes the estimation may or may not be involved in the actual communicative situation. The semantics and usage of polite and impolite lexical signs must somehow reflect these peculiarities of estimation. In (3), the estimation of the actual behavior is given by the addressee, who is involved in the communication, and in (4) the estimation is given by an observer.

(3) Вы – тот самый кот, что садились в трамвай? – Я, – подтвердил польщённый кот и добавил: – Приятно слышать, что вы так вежливо обращаетесь с котом. (Vý – tót sámyj kót, chto sadílis’ v tramváj? – Yá, – podtverdíl pol’shchónnyj kót i dobávil: – Priyátno slýshat’, chto vý tak vézhlivo obrashcháetes’ s kotóm). ‘But, sir, are you that same cat, sir, who got on the tram? – I am, – the flattered cat confirmed and added: – It’s pleasing to hear you address a cat so politely’ (M. Bulgakov, The Master and Margarita).

(4) Если бы сейчас была дискуссия, – начала женщина, волнуясь и загораясь румянцем, – я бы доказала Петру Александровичу… – Виноват, вы не сию минуту хотите открыть эту дискуссию? – вежливо спросил Филипп Филиппович (Ésli by sejchás bylá diskússija, – nachalá zhénshchina, volnújas’ i zagorájas’ rumyántsem, – já by dokazála Petrú Alexándrovichu… – Vinovát, vý ne sijú minútu hotíte otkrýt’ étu diskússiju? – vézhlivo sprosíl Filípp Filíppovich). ‘If there were a discussion now, – said the woman, flushing hotly, – I would prove to Pyotr Alexandrovich… – I beg your pardon, but do you wish to open the discussion this minute? – inquired Philip Philipovich politely’ (M. Bulgakov, The Heart of a Dog).

Polite and impolite types of behavior are both closely associated with the concept of social norms. The norms of Russian social behavior and the system of Russian etiquette have been analyzed and described in the works of linguists such as A. Bayburin and A. Toporkov, N. Formanovskaya, G. Kreydlin, M. Kronhaus, T. Krylova, T. Larina, E. Morozova, R. Rathmayr and others. These norms vary according to the cultural context, the historical context, the current communicative situation, etc. If these norms are not explicitly fixed in Russian culture or, by contrast, if the behavior of a person in an actual situation satisfies exact and strict norms of communicative behavior, then Russian speakers never call the situation вежливая ‘polite’ or невежливая ‘impolite.’ For example, the behavior of small children cannot be considered polite or impolite, as they do not yet know the norms of social behavior. Another example is military communication, when a soldier addresses a commander. In this case, the norms are strictly fixed (even in written form), so there is simply no option for any other type of behavior.

As for the correlation between politeness, impoliteness and etiquette, these categories do not presuppose each other. One can follow the rules of etiquette without demonstrating any good will or sympathy towards the addressee, and may even try to offend. We may note the Russian phrases холодное рукопожатие ‘cold handshake,’ надменный поклон ‘arrogant bow,’ and высокомерный кивок ‘haughty nod’ (as a welcoming gesture), all of which refer to acts of etiquette.

Further, politeness and impoliteness are usually regarded as the two poles of one pragmatic discourse category called politeness. As noted above, we regard politeness and impoliteness as two separate categories. The reasons for this view are as follows. First, не быть вежливым (ne být’ vézhlivym) ‘not to be polite’ does not mean ‘to be impolite’. Second, if one cannot characterize a person’s actual behavior as impolite, it does not follow that one must call it polite. The situation described in (5) illustrates our position.

(5) Imagine a street in Moscow. An old woman is walking down it. At some distance behind her a young man is walking in the same direction. He overtakes her, pushes her and keeps walking without an apology.

The behavior of the young man is, of course, impolite. He broke a rule of etiquette, according to which he should have apologized to the woman. But if he had overtaken the woman without pushing her, simply because he is younger and walks faster, his behavior could not be called impolite, though it would not be polite either.

4. The basic script of Russian impolite communicative behavior

Taking into account the typical features of Russian impolite communicative behavior, we propose a basic script of the category of impoliteness peculiar to Russian culture. The entry of the script is the predication вести себя невежливо (vestí sebyá nevézhlivo) ‘to be impolite’, which is part of the sentential form X is impolite to Y in the situation Z, where the variables X, Y and Z stand for the arguments of the predicate to be impolite. In formulating the script below, we use the semantic metalanguage developed in the Moscow Semantic School, as outlined in Section 2.

X is impolite to Y in the situation Z = ‘In the situation Z and other similar situations the social norms prescribe that Russians behave in a certain way, and this behavior is considered good; in spite of this, the speaker thinks that person X is behaving differently now; the speaker deems X’s behavior to be bad’.

In the present sentential form, X is a person who renders the estimation of being ‘impolite.’ This can either be a narrator (see (6), where the author calls Nekhludoff impolite for not replying to Kolosoff), or a speaker who is involved in the communication (see (7), where Margarita considers the behavior of Nikolay Ivanovich to be impolite as he does not reply to her).

(6) Нехлюдов, рискуя быть неучтивым, ничего не ответил Колосову и, сев за поданный дымящийся суп, продолжал жевать. (Nekhlúdoff, riskúja byt’ neuchtívym, nichevó ne otvétil Kólosovu i, sev za pódannyj dymyáshyjsya sup, prodolzhál zhevát’). ‘Nekhludoff, at the risk of being impolite, did not answer Kolosoff, and, sitting down in front of the steaming soup, continued to eat’ (L. Tolstoy, The Awakening).

(7) Левою рукою Маргарита провела по виску, поправляя прядь волос, потом сказала сердито: – Это невежливо, Николай Иванович! Всё-таки я дама, в конце концов! Ведь это хамство не отвечать, когда с вами разговаривают! (Lévoju rukóju Margaríta provelá po viskú, popravlyája pryád’ volós, potóm skazála serdíto: – Éto nevézhlivo, Nikoláy Ivánovich! Vsyó-taki ya dáma, v kontsé kontsóv! Ved’ éto hámstvo ne otvechát’, kogdá s vámi razgovárivajut!). ‘Margarita passed her left hand over her temple, straightening a strand of hair, and then said crossly: That is impolite, Nikolai Ivanovich! I’m still a woman after all! It’s boorish not to reply when someone is talking to you’ (M. Bulgakov, The Master and Margarita).

Y is a person to whom this estimation is addressed. This can be an addressee involved in the actual communication, as in (7), a third party, as in (8), or the speaker himself, as in (9), which means that X and Y can indicate the same person.

(8) Ваш сын нагрубил преподавателю английского на уроке. За что и получил двойку. (Vásh syn nagrubíl prepodavátelju anglíjskogo na uróke. Za chto i poluchíl dvójku). ‘Your son was rude to his English teacher in class. That is why he got a bad grade’.

(9) «Неучтиво, но не могу писать. Всё равно увижусь с ней нынче», – подумал Нехлюдов и пошел одеваться. (Neuchtívo, no ne mogú pisát’. Vsyo ravnó uvízhus’ s nej nýnche – podúmal Nekhlúdoff i poshól odevátsya). ‘It is impolite, but I cannot write. But I will see her today – thought Nekhludoff, and started to dress himself’ (L. Tolstoy, The Awakening).

This script of impolite behavior can be seen as a semantic invariant of all Russian verbal and non-verbal lexical signs belonging to the field of impoliteness. Thus, the semantic propositions within the script constitute the nucleus of the semantics of the words грубый (grúbyj) ‘rude,’ дерзкий (dérzkij) ‘impudent’ and хамский (hámskij) ‘caddish’, which demonstrates the isomorphism between the semantics of the words and the categorization of Russian impolite communicative behavior.⁶ The semantics of these words is further analyzed in the following section.

5. The semantics of the Russian words грубый, дерзкий and хамский

According to some Russian explanatory dictionaries,⁷ the word грубый ‘rude’ has several meanings that share two semantic components:
a) ‘insufficient correspondence of the object, event or situation described to the existing norms’
b) ‘negative estimation attributed to the object, event or situation described’.
Within the semantic and pragmatic fields of impoliteness, component (a) should be reinterpreted as ‘insufficient correspondence to the norms of polite etiquette behavior (as they are determined in Russian culture and etiquette)’. For example, Russians call a person грубый if he or she usually behaves (or is behaving) contrary to the norms of behavioral etiquette. One of the syntactic functions of the Russian adjective грубый is that of a modifier. It modifies both nouns referring to people and nouns referring to verbal or non-verbal acts or their features, e.g. грубое обращение (grúboje obrashchénije) ‘rude communication’, грубые сцены (grúbyje scény) ‘rude scenes’, грубые жесты (grúbyje zhésty) ‘rude gestures’.

The adjective дерзкий ‘impudent’ corresponds to another type of Russian impolite behavior. It is polysemous, and in one of its meanings it is close to the words смелый (smélyj) ‘brave’ and храбрый (hrábryj) ‘courageous’: all these words denote positive characteristics of a person or of the person’s particular actions.⁸ For example, the Russian collocation дерзкое поведение can be translated into English as ‘brave behavior’, and it denotes behavior that breaks established cultural stereotypes. In other words, дерзкое поведение is a non-standard type of human behavior, which may be evaluated positively, as in (10).

(10) В 1990 году трое журналистов «Алтайской правды» совершили весьма дерзкий по тем временам поступок: основали собственную независимую газету. (V 1990 godú tróje zhurnalístov «Altájskoy právdy» sovershíli ves’má dérzkij po tém vremenám postúpok: osnováli sóbstvennuju nezavísimuju gazétu). ‘In 1990 three journalists from “Altayskaya Pravda” did something very daring for those times: they founded their own independent newspaper’ (

6 In other words, specific features of different types of Russian impolite behavior (rude, impudent, caddish) are reflected in the semantics of their standard Russian nominations грубый (grúbyj), дерзкий (dérzkij), хамский (hámskij).
7 Ushakov (1940, quotes from 2000, vol. 1, p. 628); Ozhegov (1949, quotes from 1963: 142); Kuznetsov (1998: 230).
8 The semantic features of the word дерзкий are taken from Ushakov (1940, quotes from 2000, vol. 1, p. 694); Ozhegov (1949, quotes from 1963: 156); SAD (1983, quotes from 1999 electronic version); Kuznetsov (1998: 252–253).

Here the estimation дерзкий ‘brave’ ascribed to the journalists’ action is motivated by the fact that in 1990 there were no independent newspapers in Russia, so founding a new democratic paper was a brave act. But the word дерзкий also has another meaning, which is opposed to ‘brave’, and in this meaning it is synonymous with грубый ‘rude’ and невежливый ‘impolite’. Its explication includes the proposition ‘violate a border between polite and impolite types of behavior’. Such a violation of borders is estimated as breaking the social norms of politeness, as in (11). The word дерзко ‘impudently’ means that one of the participants in the dialogue violates the social distance between himself and the interlocutor.

(11) – Что вам надо? – дерзко спросил я. – Ты, слышь, помалкивай, – ответил тот же голос. (– Chtó vam nádo? – dérzko sprosíl yá. – Ty, slýsh, pomálkivay, – otvétil tót zhe gólos). ‘– What do you want? – I asked impudently. – Shut up and listen – replied the same voice’ (E. Proshkin, The Mechanics of Eternity).

This kind of violation of social norms is typical of young Russians communicating with their elders. The Russian verb дерзить (derzít’) ‘to be impudent’ and most collocations with this verb, such as дерзить старшему (derzít’ stárshemu) ‘to be impudent with an elder’, дерзить родителям (derzít’ rodítelyam) ‘to be impudent with parents’ and дерзить учителям (derzít’ uchitelyám) ‘to be impudent with teachers’, are examples of this sort of impolite behavior.

As for the adjective хамский (hámskij) ‘caddish’, it has a clear etymology in Russian. Хам ‘Ham’ is the biblical name of one of Noah’s sons, who was cursed for disrespecting his father (Bible, Genesis 9:20). The phrase хамское поведение (hámskoje povedénije) ‘caddish behavior’ has an evident association with the etymology of the word and with the biblical story. Hence, the component ‘disrespect’ is an evident semantic focus of the word хамский. It denotes a person who is arrogant and very rude, and whose behavior can be called offensive to others. Хамское поведение is perceived quite negatively and is rejected. This is why people who behave хамски ‘caddishly’ usually experience social and personal ostracism. A person who behaves caddishly sets his or her personality above the personalities of others and thereby humiliates them; in fact, the person mocks or sneers at the interlocutor and demonstrates his or her superiority.



6. Russian non-verbal signs of impoliteness

Each of the three types of impolite behavior denoted by the adjectives mentioned above has its own non-verbal sign markers, or impolite gestures.⁹ There are two classes of Russian impolite gestures. One of them consists of signs that are estimated as impolite in all contexts of their usage. They are stylistically marked in all the lexicographic dictionaries, or “gestionaries” in the terms of the Italian semiotician and linguist Isabella Poggi (2003). These are all the Russian gestures of (a) sexual insult, such as the raised middle finger, (b) expressions of the stupidity of a person, e.g. twisting a finger at one’s temple, and (c) gestures and facial expressions which belong to the class of Russian non-verbal mockery signs (sticking out one’s tongue, making a face and others).

The other class of Russian impolite gestures is formed by meaningful corporeal signs that can be deemed either impolite or not impolite depending on several factors. One of these is the ethno-cultural factor. One culture may estimate a gesture as polite while another culture estimates it as impolite. This is a case of the widespread phenomenon of intercultural homonymy (Kreydlin 2002: 135). The OK gesture of Anglo-American origin has been borrowed into Russian non-verbal culture. In both cultures this is not an impolite gesture. But in some Mediterranean cultures, for example, there is a gesture that has a form identical to that of the OK gesture but quite a different meaning. In performing the gesture, a person belonging to one of these cultures demonstrates that he or she considers the male addressee homosexual, in a pejorative sense (cf. Morris 2010).

The second factor that influences the estimation of a particular gesture is the historical one. For example, in the periods before and after October 1917 the views on which gestures are polite and which are impolite were different.
The courtesy and refined manners of nobles were estimated differently before and after the historical events that took place in Russia in 1917. Namely, after October 1917 they came to be perceived as signs of bourgeois views and as components of non-verbal models of vulgar behavior. In other words, gestures and other non-verbal signs that had not previously been taken as impolite came to be considered impolite. For example, in early Soviet times the gestures of kissing a woman’s hand, helping a woman with her coat or giving up one’s seat to a woman on public transport were considered vulgar and impolite.

One more factor that influences the perception of a gesture as polite, neutral or impolite is the particular manner in which the gesture is performed. Dobrova (2011) singles out and describes several types of Russian handshakes, such as dominant handshakes, inactive or slack handshakes, and strong handshakes performed with both hands. These types of handshakes are all impolite. For example, in performing an inactive handshake when meeting a person, the gesturer gives the addressee not the whole hand but only the ends of the fingers. In this way, the gesturer demonstrates his or her disrespect to the addressee and thus behaves impolitely.

The difference between gestures that are impolite in general and gestures that are not impolite in themselves, but whose usage can be considered impolite, can be illustrated with pointing. Knowing the situations in which a particular pointing gesture may be considered impolite constitutes a significant part of social education in Russia. For example, Russian children are taught not to point at the addressee when standing close to him or her. Pointing in this case is considered impolite. Parents often reprimand their children: Не показывай пальцем – это невежливо, lit. ‘Don’t point your finger – it is impolite’. The Russian phrase is clearly elliptical because it does not indicate which finger is used, though the index finger is meant. Children and adults are allowed to point with the little finger or with the thumb, and neither of these pointing gestures is regarded as impolite. Using the index finger to point at close distance is considered impolite because, by performing the gesture, the person breaks the communicative distance between himself or herself and the addressee without permission, thus invading the addressee’s private space.

Another group of impolite gestures consists of what we may term the gestures of bosses. In some situations, a gesture that is not impolite in itself is interpreted as showing off the gesturer’s superiority over the addressee. Examples are sitting solemnly, sitting with the hands on the nape of the neck or sprawling in a chair, among others. In such situations this mode of non-verbal behavior is always estimated as impolite. A further group of impolite gestures is formed by gender-marked meaningful corporeal signs.

9 This class of Russian non-verbal signs was described in detail by Kreydlin (2002); Kreydlin and Morozova (2003); Morozova (2006); Morris (2010).
Sometimes the behavior of a man communicating with a woman is considered indecent, vulgar and offensive, and thus impolite. For example, a man slapping a woman on the buttocks, hugging her or pinching her: these types of non-verbal behavior are estimated as sexual harassment. The Russian verbs expressing these types of non-verbal behavior range from приставать (pristavát’) approx. ‘to put the make on’, which denotes a weak form of harassment, to домогаться (domogátsa) ‘to importune’, which denotes a strong form of harassment. This kind of behavior is always judged by society as highly indecent and impolite, because demonstrating sexual desire for a woman in public is forbidden.

7. Conclusions

In the present chapter we have observed some key verbal and non-verbal manifestations of the category of impoliteness in the contemporary Russian language and in the Russian non-verbal code. We described the category of impoliteness and showed that the estimation of human behavior as polite or impolite is highly dependent on the situations in which the particular type of behavior is displayed. The estimation is relevant only for those situations where standards of behavioral etiquette are defined.



We presented explications of the meanings of the Russian words that form the core of the lexical field of impoliteness. Using the semantic representations of the words described in our chapter, we identified basic types of impolite behavior, such as rude, impudent or caddish. Alongside Russian words and word combinations, we examined some corporeal signs particular to the Russian non-verbal code, using the methodology of the Dictionary of the Language of Russian Gestures (DLRG 2001). We described the common features of impolite gestures and distinguished subclasses of gestures according to their semantics and usage.

The variety of signs which people use to show disrespect, negligence and boredom and to express their negative emotions shows that impoliteness as a characteristic of social behavior should be studied in depth not only by linguists and semioticians, but also by sociologists and psychologists. Verbal and non-verbal manifestations of impoliteness are closely connected with each other, and a more detailed analysis of these connections could be the object of separate research devoted to the problem of multimodality in Russian. The analysis of the Russian lexemes невежливый, грубый, дерзкий and хамский reveals the aspects of meaning which help to distinguish between various types of impolite behavior. The observation of several classes of impolite gestures, such as gestures of bosses (marking social superiority), vulgar gestures (which signal an intrusion into the interlocutor’s private space) or discourteous gestures (which are the result of a low level of education), connects the category of impoliteness as an abstract concept with its physical representation in day-to-day human behavior.
Being a subjective category, impoliteness is strongly connected with emotional processes: impolite behavior often involves physical or emotional offence and provokes negative reactions (irritation, breaking off contact), judgement or sanctions toward a person. Being a culturally marked phenomenon, impoliteness discloses cultural stereotypes and norms and the reaction of society to the violation of these norms. We can conclude by pointing out that research on impoliteness links the “physical, emotional and mental spheres of the human being” (in the terminology of Apresyan 1995). Further, it links the sign units of various codes and stimulates the cognition of the social world through humanly created sign systems. Therefore, it is clearly a relevant topic for cognitive semiotics.

Jamin Pelkey

Chapter 15
Symmetrical Reasoning in Language and Culture: On Ritual Knots and Embodied Cognition

1. Introduction

What are the origins of symmetrical reasoning? This is a salient question. Not only is it focal to the research introduced in this paper, it also happens to be the final question posed to the closing plenary lecturer, rounding out a recent high-profile gathering on cognitive semiotics.¹ Before addressing the question or its response, it will be helpful to pose (and answer) a prior question: What is symmetrical reasoning?

Technically speaking, symmetrical reasoning appears to be the inverse reciprocal blending of a sign vehicle and a semiotic object in some process of heuristic learning relevant to the ends of a given (human) interpretant. To illustrate, consider Helen Keller’s famous awakening to linguistic communication. Her breakthrough was not merely a realization that the sensation of water on her skin was related to the arbitrary hand-signals of her caregiver (a unidirectional mapping) but, more importantly, that the caregiver’s hand-signals could also be generalized to evoke water, even when water was not physically required or present to the senses.² The latter mapping is dynamic and ambidirectional, involving inverse antisymmetries, or “chiasmus” patterning, that can be formally rendered water : signal :: signal’ : water’. In short, Keller’s famous realization is not so much an awakening to the existence of symbols as it is an awakening to the reflexive, symmetrical potential of symbolic activity for modeling possible worlds.

In what follows I suggest that such modeling activities are manifest not only in patterned processes of speech and writing but also in explicitly symmetrical designs found in human material cultures around the world. The discussion zeroes in on symbolic knots or intertwining lattice designs, using findings from cultural anthropology, archaeology and history. Insights from Peircean semiotics are used to frame the discussion, and cognitive semiotic perspectives are applied to suggest vital links between these patterns and the later embodied phenomenology of Maurice Merleau-Ponty. The chapter ends by outlining a preliminary agenda for cognitive testing of related hypotheses.

1 The gathering is the First Conference of the International Association for Cognitive Semiotics, held at Lund University 25–27 September 2014. The verbatim phrasing of the question was less formal: i.e., “Where does symmetrical reasoning come from?”
2 As Imai’s research shows (Imai 2014; Murai et al. 2014), the former relation can be taught by conditioning, whereas the latter cannot.



First, it will be helpful to return to the salient question introduced above, a question posed by Jordan Zlatev to Mutsumi Imai, the capstone plenary speaker of IACS 2014. Imai’s response to the question was simply, and wisely, stated: “I don’t know.” Indeed, it may be some time before we can speak with confidence or agreement on the origins and nature of symmetrical reasoning and related phenomena. In the meantime, what we do know, thanks to Imai’s ground-breaking work (Imai 2014; Murai et al. 2014; Imai, this volume), is first that symmetrical reasoning is a key cognitive function underlying language acquisition and second that this function, or bias, is apparently unique to human beings. As with the peculiarities of any species, we can also be certain that such capabilities will necessarily be related to the body type of the organism in question since, in the words of Deely (2009: 81), “every Umwelt is a species-specific objective world shared across species lines only to the degree that the bodily type of the organisms involved permits.” With this in mind, we turn to the patterns in question.

2. Anthropological and semiotic context

Digging deep into the sediment of a coastal cave in South Africa, a team of archaeologists recently uncovered the earliest known example of human symbolic expression (Henshilwood et al. 2009, 2011; d’Errico et al. 2009). Engraved on a fragment of red ochre are multiple crisscrossing X figures, incidentally forming rhombus (or “diamond”) shapes between them (Figure 1a). The motivation, or meaning, behind the pattern is unclear. It might even be tempting to assume that such questions lead only to fruitless speculation; but from a Peircean semiotic perspective this response would be suspect since it constitutes an overt violation of the consequent premise to Peirce’s first rule of logic (1898/1998, EP2: 48): viz., “do not block the way of inquiry.” Henshilwood and d’Errico (2011) insist that the pattern is symbolic only in a strictly abstract sense, as “a sign that has no natural connection or resemblance to its referent” (2014: 89). How warranted is this assumption? The pattern may have no natural correlate in the external visual world, but I wish to argue that it may indeed have experiential correlates in the Lifeworld, the embodied cognitive realm of phenomenological and psychological experience. Fortunately, the Blombos inscription is not an isolated instance of this design.

Figure 1. Basic lattice networks featuring figure-ground (X, ◊) oppositions from widely diverse cultures: Detail from a) Blombos red ochre incisions, c. 71,000 BCE, South Africa (Henshilwood et al. 2009); b) Navaho rug design, c. 1890, Southwestern US (Washburn and Crowe 1988); c) Traditional Yombe mat weaving pattern, Lower Congo (Gerdes 2004a).



The lattice motif appears and reappears across cultures and millennia (see Figures 1 and 2) as one of a highly selective set of salient patterns (Lewis-Williams and Dowson 1988; Froese et al. 2013a). According to Froese et al. (2013a), this poses “a kind of universal selection bias” which calls out for explanation “but which has so far remained mysterious” (2013a: 208). From a semiotic perspective, the systematic relations shared between such designs function at the level of the Iconic Legisign, or diagram type. Although each token (or “Iconic Sinsign”) of the pattern varies slightly, the sheer number of analogous repetitions congeals via habit in the memory as a gestalt or generalized icon. More importantly, given that such patterns arise independently around the world and across millennia, in spite of their absence in the natural world, they may be identified as Indexical Legisigns, or unwitting “symptoms”, presenting a stubborn problem in need of diagnosis – posing a semiotic riddle to all who will pay attention. In short, the origins of these patterns, and human motivations for producing and reproducing them, remain unclear.

Figure 2. Overlapping lattice networks featuring figure-ground blends from widely diverse cultures. Detail from a) Han Dynasty tomb carving, c. 100 BCE, Sichuan, China (Dye 1937); b) Intarsia knitting design, 17th century, Argyll, Scotland; c) Mbukushu traditional beaded apron pattern, Botswana (Washburn and Crowe 1988); d) Bora traditional twill-plaited basket weave, Peruvian Amazon (Gerdes 2004b).

Even less clear (and less remarked upon) than the recurring crosshatching lattice network patterns featured in Figure 1 are the reasons why lattice repetitions of X and the rhombus geometries they incidentally produce, as “figure and ground”, are so often brought together to overlap or blend. Once again, this can be observed in cultural designs that span the globe, throughout history. A sampling from diverse societies originating on four continents is illustrated in Figure 2. While such patterns are often repeated on two-dimensional surfaces such as cloth, basketwork or windows, they are just as commonly isolated as self-enclosed networks, featuring a single path that reverses onto itself repeatedly to produce symmetrical gestalts or symbolic knots such as those illustrated in Figure 3.



Figure 3. Intertwining lattice networks from widely diverse cultures: a) Endless knot, contemporary rendering of an ancient symbol from Lama Buddhism, Tibet; b) Traditional Jokwe cosmology network traced on the ground, Angola (Zaslavsky 1999); c) Celtic knot embellishment, contemporary rendering of a medieval pattern found in the Book of Kells, early 9th century, Ireland; d) Bavarian printer’s ornament, 1569, Germany.

Naturally, just as it is possible for physical symptoms to deceive the pathologist, it is possible for generic indices to deceive the semiotician. Indexical Legisigns can only be tested against, and interpreted in light of, other signs, preferably including an immense network of other indices that can together be admitted as a collateral index or “assemblage of symptoms” (Peirce 1903: 223). Observed similarities are crucial, but no more crucial than critical inquiry. To understand the nature of the patterns under consideration here, then, we must first explore four natural/cultural possibilities that might be proposed to explain away their widespread persistence: (1) long-distance cultural contacts, (2) cross-generational transmission prior to (and following) distant migration, (3) material affordances, (4) inherent geometric constraints. Perhaps none of these four factors can be conclusively ruled out as conditioning correlates; but, as we will see, none of them ultimately solves the riddle these patterns pose either. I propose that of these four possibilities, the first three are the least satisfactory for guiding us to answers. With this in mind, I address the first three here and save the fourth for the conclusion.

While it is possible that these basic designs are the result of a single innovation that spread and was modified via cultural contact and cross-generational transmission, the complex nature of human cultural evolution around the globe, and across millennia, makes such an account implausible. This point may become more vivid when considering the isolated cultural practices and geographic situations in which most of the designs in Figures 1–3 are historically rooted. While it is plausible to hypothesize that the Bavarian printer’s ornament in Figure 3d may have been inspired by early Celtic designs, for instance, any such link between ancient Tibet (Figure 3a) and medieval Ireland (Figure 3c) would beggar belief, shifting the burden of proof to anyone proposing such a claim. More importantly, however, mimesis alone cannot fully account for any aspect of cultural transmission and evolution (Castro and Toro 2004). As will become clear below, we must also ask what makes such patterns worth reproducing. It is at this crux that the persistence of the pattern emerges as a riddle or Indexical Legisign.



As for material affordances, it is important to note that the materials used to produce and reproduce these patterns are as diverse as human material culture itself. Even in the small sample discussed in this chapter, we find such designs being generated on everything from ochre, dirt, paper and granite to woven rugs and plaited mats. While it may have been more natural early in the evolution of culture to produce angular designs on certain surfaces such as ochre (Figure 1a), rounded designs would have been just as easily produced on other media, such as dirt on the ground (Figure 3b).

Interestingly, the most widely discussed possibility for the emergence of these patterns across cultures relies on appeals to trance-induced states stemming from some combination of social crisis, physiological trauma, ritual ceremony and the use of narcotics. At least one source (Lewis-Williams and Dowson 1988) frames the origin of these patterns as a by-product of trances induced by class struggle, as lone individuals worked out their internal social strivings in the recesses of caves. Lewis-Williams and Dowson propose that regular geometric entoptics known as “Turing patterns” (Turing 1952)³ were first perceived by these striving loners due to dream states induced by their isolation. Others argue that this fails as an explanation since we need to account for both the biological/physiological origins and “the cross-culturally shared value of these specific kinds of geometric patterns” (Froese et al. 2013a: 200) that would make them worth expressing at all, much less reproducing. In short, Froese (2013; Froese et al. 2013a) agrees that these symmetrical designs are rooted in Turing patterns and form constants (Klüver 1966), perceived by some during altered states of consciousness, since lattice patterns and intertwining symmetries are often described in such instances.
However, Froese further argues that such patterns are reproduced only because the states that induce them tend to be experienced as significant or meaningful. While this account is plausible, and likely to play a role in the solution of the riddle posed above, not everyone who reproduces lattice designs such as those in Figures 1–3 has had a “form constant” experience. Furthermore, the origins and nature of form constant entoptics are themselves unclear. What is more, neither symmetry creation nor symmetry perception is merely (or even primarily) a visual phenomenon (Bateson 1958; Carter 2010; Nöth 1998; Hodgson 2011). Even the recognition of visual symmetry is now known to involve far more than a simple imprint on the retinal wall that is then reprocessed at the back of the brain in the visual cortex. Instead, complex feedback loops are involved that would otherwise seem to be unrelated to the visual experience at all (Williams et al. 2008; Froese et al. 2013a; Treder 2010; Rhodes et al. 2005; Poirier and Wilson 2010). Three key elements involved are movement, memory and an obligatory mapping between two and three dimensions. These non-visual components, and others, coordinate with vision in the human perception and production of visual patterns in space.

3 Although this is the same A. M. Turing who invented the Turing machine and what became known as the “Turing Test” (see Parthemore, this volume), these are not directly related to the pattern description.

The enigma grows further following applications of “plane pattern analysis”, the leading contemporary method in anthropology for the study of cultural symmetries (Washburn and Crowe 1988, 2004; Washburn et al. 2010; Washburn 2011). This practice approaches such patterns in terms of bilateral (left-right) contrasts rooted in mathematical group theory, now the predominant approach to symmetry across the disciplines. Using the form “p” as an instance of a basic shape for illustration, a given design can be described in terms of translation (pp), rotation (pd) and reflection (pq) and then classified according to a core set of relationship classes, including finite designs, one-dimensional patterns and two-dimensional patterns, that are useful for tracking cultural differences, influences, dating periods and design changes, among other uses. Figure 3a is reproduced in Figure 4 to illustrate. The Tibetan “endless knot” is a finite design featuring basic dihedral symmetry (class d4), which combines both left-right reflection and rotation relative to the centre of the left-right axis.

Figure 4. Finite design involving basic dihedral symmetry (d4, reflection plus rotation)
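The combination of reflection and rotation that defines a finite d4 design can be made concrete with a small computational sketch. The following is only an illustration under stated assumptions, not the classification procedure of plane pattern analysis itself: a motif is modelled as a set of integer grid points, and the function names (`rotate90`, `reflect_lr`, `d4_images`, `has_d4_symmetry`) are hypothetical labels introduced here. An asymmetric “p”-like motif is copied under the four quarter-turns and their left-right mirror images, and the union of the eight copies is then checked for invariance under every one of those operations.

```python
def rotate90(points):
    """Rotate a set of (x, y) points a quarter-turn counter-clockwise about the origin."""
    return frozenset((-y, x) for x, y in points)


def reflect_lr(points):
    """Mirror a set of (x, y) points across the vertical (left-right) axis."""
    return frozenset((-x, y) for x, y in points)


def d4_images(points):
    """Return the eight images of a point set: four rotations, each with its mirror image."""
    images = []
    current = frozenset(points)
    for _ in range(4):
        images.append(current)              # rotation by 0, 90, 180, 270 degrees
        images.append(reflect_lr(current))  # the same rotation followed by reflection
        current = rotate90(current)
    return images


def has_d4_symmetry(points):
    """True if the point set is unchanged by every rotation and reflection above."""
    pts = frozenset(points)
    return all(image == pts for image in d4_images(pts))


# A small asymmetric motif, standing in for the letter "p" (an assumed toy example).
motif = {(1, 0), (2, 0), (2, 1)}

# A finite design built by overlaying all eight images of the motif.
design = frozenset().union(*d4_images(motif))

print(has_d4_symmetry(motif))   # False: the motif alone is asymmetric
print(has_d4_symmetry(design))  # True: the overlaid design has reflection plus rotation
```

The sketch treats a design purely as a flat arrangement of points, which is exactly the limitation of a planar, group-theoretic description: nothing in it registers the over-and-under movement of an intertwining path.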

As some of Washburn and Crowe’s collaborators (Grünbaum 2004; Frame 2004) have pointed out, however, this approach to the symmetries of material culture has strict limitations, being unable to account for designs that express intertwining movements (such as those in Figure 3, cf. intertwining spirals and helix patterns) or designs not based on bilateral (left-right) contrasts. Thus, while acknowledging the value of related contemporary work in anthropology, I wish to suggest that this axis of contrast is not sufficient to account for the meaning (and persistence) of various cultural symmetries, prominently including the symmetry group illustrated in Figures 1–3. This decision is further supported by a recent cultural-historical reconstruction carried out by Hon and Goldstein (2008), who show that an explicit, conceptual awareness of left-right (sagittal/bilateral) symmetry is absent in ancient and medieval worldviews. Rather, the ancient sense of “summetria” involves an awareness of part-whole proportionality, or harmony between parts and wholes, especially involving vertical (i.e., transverse) relations that hold between upper and lower parts of some structure. In fact, conceptual awareness of bilateral symmetry constitutes a revolution in Western thought, not clearly articulated until the end of the 18th century!⁴ Ironically, Western thinkers are now so well aware of bilateral symmetry that transverse (upper-lower) symmetries may be difficult to comprehend (see Turner 1991; Tyler 1995; Norrman 1999). These findings bring us back to the lived human body as indispensable for solving the riddle of the persistence of the lattice networks and ritual knots of Figures 1–3.

3. Proposed embodied grounding

Research in enaction theory (Varela et al. 1991; Varela 1997; Stewart et al. 2010; Froese et al. 2013b), cognitive semiotics (Sebeok and Danesi 2000; Ziemke, Zlatev and Frank 2007; Stjernfelt 2007) and cognitive semantics (Johnson 1987; Lakoff and Johnson 1999; Hampe 2005; Gibbs 2005b; Steen 2011) has shown that we not only experience our bodies in movement but also project the feeling of our bodies onto other people, things and events (see Amorim 2006; Gallagher 2007; Gibbs 2008; Engel 2008; Doyle 2013 for on-going empirical confirmation). When we do so with any degree of conceptual awareness, however, we are now more likely to imagine ourselves, other individuals, and phenomena in general as split down the middle rather than harmoniously interrelated, a split that is by no means obvious in many traditional societies (see Levinson and Brown 1994; Danziger and Pederson 1998; Danziger 2011).⁵ As a result, with very few exceptions, present applications of embodiment theory to symmetrical patterning stop at the level of bilateral contrasts, ignoring (or sometimes disparaging) other modes of symmetrical experience (see e.g., Turner 1991; Norrman 1999; Humphrey 2004; Ewins 2004).

Working toward solving the riddle of symmetrical lattice patterning described above will require a more integrated approach. Traditional, implicit awareness of proportional, transverse relationships (which provide analogic grounding) will need to be blended with modern conceptual awareness of left-right relationships (which provide analytic grounding). This is unlikely to happen if we neglect to consider the profoundly important role embodied feeling must play in any given abstract model – a point illustrated by O’Neill (2008) from the perspective of interactive media design, by Sheets-Johnstone (2011) from evolutionary, developmental and kinesthetic perspectives, by Merrell (2010) from topological and semiotic perspectives, by Gibbs (2005a) from the perspective of cognitive science and by Johnson (2007) from the perspective of pragmatist philosophy.

4 According to Hon and Goldstein (2008) the concept finds its first full articulation in the work of Adrien-Marie Legendre in 1794. The important distinction to grasp here is the difference between tacit and explicit conceptual awareness. Although it is abundantly clear that bilateral symmetries and lateralized antisymmetries have played a profound role in shaping human cognition and cultural evolution for more than one million years, our ability to analyze and discuss such phenomena directly is the result of a very recent conceptual revolution.

5 This reversal has broad implications, the most troubling of which are highlighted in the work of neuropsychiatrist and cultural historian Iain McGilchrist (2009).

All five thinkers draw on the work of French philosopher Maurice Merleau-Ponty (1945, 1960), who devoted his life to the problem. By his death in 1961, he had settled on the figure of the intertwining “chiasm” (the X figure) as the best way of coming to terms with the pervasively integrative experiences of embodiment. This he expressed using explicitly symmetrical patterns in his own linguistic collocations and sentence structures: “the body sensed and the body sentient are as the obverse and the reverse, or again, as two segments of one sole circular course which goes above from left to right and below from right to left, but which is but one sole movement in its two phases” (1960: 169, emphasis mine). The intertwining designs sampled in Figure 3 model this movement in ways that are themselves reflections of embodied experience. With these connections in mind, we are at least justified in claiming that human experiences and projections of embodied symmetry involve much more than visual sensation and have implications that range far beyond visual patterning.

Anthropologist Peter Roe (2004: 275) reports that, among the Amazonian Shipibo, woven symmetrical patterns (much like those in Figure 3) function as “design therapy” since, for the Shipibo, “the order of the universe is invoked to symmetrically re-pattern human spirituality as part of a return to a larger condition of homeostatic primal union, therefore health.” Naturally, the sense of wellness that accompanies the creation of such patterns (cf. Jung 1955: 388–390) is partly derived from the aesthetic pleasures of art and imagination more generally (see Freedberg 2007; Dutton 2009; Bertamini et al. 2013b); nevertheless, contemporary research affirms the distinct pleasure of discovering or creating symmetry gestalts (Thagard and Stewart 2011; Makin et al. 2012; Muth 2013; Bertamini et al. 2013a, 2013b). It is unlikely that intertwining symmetrical designs such as those in Figures 1–3 are exceptional in these regards.

A more pertinent question, then, becomes whether or not the production and reproduction of these designs has benefits that range beyond the mere satisfaction of pattern completion or biologically selected preferences for symmetrical forms over asymmetrical alternatives. To what degree might the production and reproduction of these patterns be a mode of linguistic modeling, or symmetrical reasoning? Can retracing such patterns serve to enhance linguistic modeling in other domains? Cognitive testing is needed to answer such questions.

4. Proposed methodology for cognitive trials

The research described above summarizes one aspect of the groundwork being prepared for further intensive inquiry into the origins, logic and implications of criss-crossing, overlapping and intertwining symmetrical patterns featured in languages and cultures around the world (see Pelkey 2011, 2013a-d, 2014 for earlier groundwork). At this formative stage the research has two objectives: (1) to establish more robust foundations for on-going critical inquiry into the origins and significance of these patterns, and (2) to begin testing for cognitive benefits that may result from physically re-tracing visual exemplars of these patterns drawn from traditional cultures around the world. Progress toward both objectives should help explain why such designs emerge independently across cultures and millennia, in spite of their absence in the natural world.

Cognitive testing is needed for the exploration of meaningful benefits that may result from manually or visually retracing intertwining lattice patterns, including possibilities for enhanced fluid cognition, analogic/linguistic pattern solving, general attitudes toward ambiguity and paradox, and potential increases in symmetrical reasoning bias. Should cognitive testing prove conclusive in any of these regards, such results would not only help explain the grand riddle outlined above but might also provide an additional tool for mindfulness-based therapies.

Figure 5. Web app prototype based on Figure 3d: Screen capture taken midway through tracing cycle

Tracing exercises are to be carried out via a digital interface: an interactive web app in which the individual patterns shown in Figure 3 are rendered as vector graphics, as illustrated in Figure 5. The first four prototype designs are now complete. The app is cursor driven and multi-layered, allowing interested individuals to retrace the geometric design by hand (using a touchscreen interface) or visually (using eye-tracking software). The pattern display is programmed to change in sync with the motion of the user’s finger, stylus or eye movement as the pattern unfolds during the act of tracing. The current design concept is programmed to transition from a light grey line through successive shades of grey following each layer of tracing, culminating in a black line. The app is designed to be integrated with an information-rich website providing further background on both the origins of the individual patterns and the broader project to which they belong.

The relative effectiveness of each design, in terms of potential therapeutic and cognitive benefits, is to be tested against the others, and against various controls, in a series of cognitive trials. The results are expected to facilitate both objectives of the research outlined above and, in the process, potentially to help explain some of the reasons for the independent persistence of such patterns across cultures. Cognitive training breakthroughs are often hailed with unmitigated enthusiasm (see e.g., Doidge 2007); but, as others are now reporting (e.g., Rabipour and Raz 2012; Harrison et al. 2013), cognitive training exercises have mixed results. Most computer-design interfaces that attempt to enhance “fluid intelligence” do so by testing working memory with binary puzzles and strategic decisions. Since such practices are now known to be ineffective (Redick et al. 2013; Harrison et al. 2013), pattern tracing exercises currently being planned are to be tested through such methods as interview questionnaires (probing attitude and affect), paradigm-completion exercises and solutions to novel scenarios (probing creativity and imagination). Experimental design is planned following a thorough literature review of related testing methods and best practices. Results are to be analyzed both quantitatively and qualitatively. Since human visual processing is much better adapted to the recognition and reproduction of symmetrical patterns than manual processing is (Humphrey 2004), the results of manual tracing exercises are expected to differ significantly from the results of eye-tracking tracings of the same patterns.
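The layered display behaviour described for the tracing app, a line that darkens from light grey to black with each completed layer of tracing, can be sketched in a few lines. This is a minimal illustration under assumptions and not the prototype's actual code: the function name `grey_for_layer`, the linear spacing of shades, the number of layers and the starting brightness of 200 (on a 0–255 scale) are all hypothetical choices made here for the sake of the example.

```python
def grey_for_layer(layer, total_layers, lightest=200):
    """Return a hex colour for a tracing layer.

    Layer 0 is drawn in light grey (brightness `lightest` on a 0-255 scale,
    an assumed starting value) and the final layer is drawn in black.
    Shades in between are spaced linearly, which is also an assumption.
    """
    if total_layers < 2:
        raise ValueError("at least two layers are needed for a transition")
    if not 0 <= layer < total_layers:
        raise ValueError("layer index out of range")
    # Interpolate brightness from `lightest` down to 0 (black).
    level = round(lightest * (1 - layer / (total_layers - 1)))
    return "#{0:02x}{0:02x}{0:02x}".format(level)


# Colours for a hypothetical five-layer tracing cycle.
for layer in range(5):
    print(layer, grey_for_layer(layer, 5))
```

In a trial setting, the number of layers and the spacing of shades would themselves be design parameters worth holding constant across the patterns being compared.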

5. Conclusions

Returning to the riddle at the heart of this study, we are now prepared to consider an earlier critical proposal. Could it be that the inherent constraints of geometric space might explain (away) the phenomenon, making the patterns introduced above more likely to emerge and re-emerge than others? This, in fact, is a conditioning factor that can serve to enhance the embodied cognitive account suggested in this chapter. A pivotal distinction to consider in this connection, however – one that keeps this account from being merely reductive – is the recognition that geometry is not itself a set of disembodied abstractions. Two-dimensional Euclidean mappings in particular are widely claimed to be manifestations of embodied human relations, both experienced and remembered (Johnson 1987; O’Keefe 1993; VanLier 2003). When such designs come to be reflected not only visually but also in patterns of speech and social organization, we find potential common ground shared between symmetrical visual designs and symmetrical reasoning.

Although relationships shared between linguistic symmetries and cultural dynamics are only beginning to be explored, the results of early explorations suggest that careful reflection on intertwining relationships implied by cultural and linguistic symmetries can help make sense of structures, practices and transitions that would otherwise seem illogical or contradictory (see Douglas 2007; Strecker and Tyler 2009; Pelkey 2013a-d; Wiseman and Paul 2014; Strecker 2011, 2014; Paul 2014; Tyler 2014). In considering explanations for persistent symmetrical structures found across languages, John Haiman argues that only a “creative aesthetic drive” (2008: 47) could explain the on-going reproduction of such patterns across time and space by widely diverse generations of speakers. As we have seen, the same explanation also seems necessary to account for the intertwining gestalts described above. In addition to helping us understand the origins of symmetrical reasoning, then, further research at this intersection may eventually shed light on the origins of imagination and the sources of conceptual blending (Fauconnier and Turner 2002; Gall 2010; Abrahamson 2009).

As Lewis-Williams and Dowson (1988) suggest, the deliberate mimesis of such designs is by no means a novel idea. In fact, there appears to be something of a precedent for it in pre-historic cave art:

Although there is no room for communal activity, the vast number of marks on the clay wall of this “sanctuary” suggests … that it has been repeatedly touched … [possibly by] novices who, having entered the caverns on a vision quest, reached out to the existing depictions to absorb their power and to trace their own visual percepts … (Lewis-Williams and Dowson 1988: 214).

Given the connections sketched out above, we may at least admit the possibility that the prehistoric impulse in question is rooted in a more general capacity for embodied modeling from which symmetrical reasoning and other distinctive features of language also originate. With this in mind, we may well expect that further inquiry in this vein will provide answers to the question with which the chapter began: “What are the origins of symmetrical reasoning?” Not only does cognitive semiotics provide a natural atmosphere for asking such questions, it also opens up a fitting habitat for finding answers.

Acknowledgements

This research is supported by the Social Sciences and Humanities Research Council of Canada: SSHRC-IDG #430–2015-01226 “Steps to a Grammar of Embodied Symmetry.” Seed funding was provided by a Social Sciences and Humanities Research Council of Canada Institutional Grant (SIG), a Ryerson University Faculty of Arts New Initiatives Award (NIA) and a Travel Grant from the Ryerson University Dean of Arts. My appreciation goes to two anonymous reviewers, the editors and Stéphanie Walsh Matthews for their helpful comments on earlier versions of this paper. The digital web app prototypes discussed were engineered by Rami Matloob.

Štěpán Pudlák

Chapter 16 Cognitive Semiotics of Mental Disorders, with Focus on Hallucinations

1. Introduction

In this chapter I introduce the potential of a cognitive semiotic approach for the study of mental disorders. The vast field of research of mental disorders is approached from several scientific perspectives. Each of these perspectives has its benefits as well as its limitations. My aim is to integrate psychiatric accounts of schizophrenia with semiotics as a methodology for the analysis of signs and signification. Such transdisciplinarity is one of the key features of cognitive semiotics (Zlatev 2012: 14). Psychiatric approaches to the symptoms of mental disorders are usually heuristic, which means that they presuppose an understanding of what these symptoms are. Semiotics, on the other hand, offers theoretical concepts through which we can analyse symptoms from a different perspective. We can then, for example, compare hallucinations to other mental phenomena. In this chapter I shall argue that hallucinations are specific on the level of indexicality of signs and that their indexicality is the same as in the case of sensory perceptions.

Section 2 outlines the current status of semiotic theories of mental disorders. Section 3 then focuses on the discourse of psychiatry and specifically on how the symptoms of schizophrenia are described. In Sections 4 and 5, a cognitive semiotic approach informed by Peircean semiotics is presented and applied to hallucinations. I conclude by briefly comparing my cognitive semiotic analysis of hallucinations with neuropsychiatric hypotheses.

The analysis of hallucinations should be a part of a wider semiotic study of mental disorders. This approach doesn’t aim to be an exclusive interpretation of mental disorders, but rather hopes to enrich the interdisciplinary study of mental disorders with new viewpoints, concepts and hypotheses. If successful, the cognitive semiotics of mental disorders may suggest theories to be validated by neurobiology, as well as new clinical methods for practical psychiatry or psychological treatment.

2. The current status of theories of the semiotics of mental disorders

Semiotics can assess mental disorders from several different points of view. It is often used as a methodology for studying media content, e.g. the way the news represents mental disorders (see Allen and Nairn 1997; Aarva and Tampere 2006).



Although there are studies that bring together semiotics and psychoanalysis (see Muller 2000), I will not include these perspectives in this study and instead focus on a semiotic analysis of symptoms. There are only a few studies that focus on mental disorders and schizophrenia from the perspective of semiotics. In 1986, James Harrod published an article arguing that schizophrenia can be understood as a “semiotic disorder” (Harrod 1986), prompting discussion on whether it is primarily such, a “thought disorder” or a “linguistic disorder”. This is a discussion that I will not contribute to in this chapter. In contrast, my aim is to approach schizophrenia from the point of view of semiotics rather than argue as to whether it is a “semiotic disorder” or not. Among more recent studies we can point to Chernygovskaya’s paper on the semiotics of olfaction and hearing (Chernygovskaya 2004) and to the paper by Elvevag, Wynn and Covington, “Meaningful confusions and confusing meanings in communication in schizophrenia” (Elvevag et al. 2011), a study of a schizophrenic patient’s depiction of his own concept of a time machine. Similarly to my approach, the authors propose to analyze the specifics of the delusional expressions of a patient. In conclusion, they state:

[…] the assertion of conjectures as if they were on the same level as established scientific presuppositions is arguably similar mechanistically to delusional thinking in that a distinctive thing about the delusion is not that it is false, but that the deluded person does not question it. In this case, the resulting prose is incoherent science. (Elvevag et al. 2011: 464)

What is common to most of these studies is that they use semiotics more as a background than as an actual approach or methodology. My aim is explicitly to use semiotic terminology and theories for a more straightforward analysis of the cognitive symptoms of mental disorders, rather in the manner of Gallagher (2007), on which I comment later. First, I would like to present a brief account of the psychiatric perspective on hallucinations.

3. The psychiatric perspective

Mental disorders are studied in a number of branches of science and from different perspectives, yet the nature of many mental disorders – and the nature of schizophrenia in particular – still remains something of a mystery. This fact underlines the tediousness of many scientific efforts to understand what happens in the “black box” of the mind and what sort of relationship there is between the mind and the body. Psychiatry does not focus on a deeper understanding of what distinguishes schizophrenia symptoms from other mental processes. The symptoms are described and understood in a rather general sense, relying on a heuristic understanding of what hallucinations, delusions, disorganized speech etc. are. The Diagnostic and Statistical Manual of Mental Disorders (fifth edition; henceforth, DSM-V) states:

Auditory hallucinations are usually experienced as voices, whether familiar or unfamiliar, that are perceived as distinct from the individual’s own thoughts. (American Psychiatric Association 2013: 87)

In this brief excerpt from the DSM-V we can see how the description of hallucinations relies on heuristic understanding. The key part of the description is that hallucinations are voices that are “[…] perceived as distinct from the individual’s own thoughts” (ibid). Yet how distinct are these voices from an individual’s own thoughts? How are they perceived? And where is the boundary between what is a hallucination and what is, let us say, a “vivid imagination”? In this chapter, I will focus on one prominent symptom in its most common form: auditory hallucinations (or ‘hearing voices’), as mentioned in the quotation above. The auditory modality accounts for 80 % of all hallucinations in schizophrenia (Frith 1992: 68). A final, complete study of schizophrenia should, of course, also take into account other modalities.

4. The cognitive semiotic approach
Why is cognitive semiotics a suitable theoretical background for the study of mental disorders? The explanation of symptoms of mental disorders requires different methodological perspectives. They can be studied from the third-person perspective (as in psychiatry or neurobiology), but they also require first- and second-person perspectives (Zlatev 2012: 14–15). The first-person perspective is necessary to describe and analyze mental signs and processes which have a fundamentally subjective aspect, such as perceptual experiences, memories or imaginations. The second-person perspective is then necessary to approach phenomena of mental disorders which are not experienced by “normal” persons, but are partly accessible by empathy and imaginative projections, as Zlatev (2012) puts it. Cognitive semiotics, as a new field of transdisciplinary research, offers such an integrative methodology. Zlatev describes this combination of different phenomenological levels as methodological triangulation (Zlatev 2012: 14). For further research on mental disorders it would be essential to integrate these methods in order to avoid the methodological reduction that is often found in both psychiatry and philosophy.

A cognitive semiotic approach to hallucinations is used by Shaun Gallagher (2007). Gallagher distinguishes between a sense of ownership and a sense of agency. The former describes an experience in which something is happening to me, to my body or thoughts (e.g. “I am thinking.”). The latter refers to an experience in which something is caused by me, i.e. I am the agent of a certain movement or thought. When I move my hand, I have both: a sense of ownership (it is my hand that moves) and a sense of agency (it is me who is moving my hand). When someone pushes me, I have a sense of ownership as it is my body that moves, but not a sense of agency as I wasn’t the one who caused the movement (Gallagher 2007: 34–35).
Hallucinations can be understood in these terms. Patients with hallucinations report that the “voices” are theirs (sense of ownership), but that something or someone else is causing them (no sense of agency). But what is even more important for Gallagher is the distinction between a first-order experience, which is a phenomenal, pre-reflective experience, and a higher-order cognition, which is reflective and metarepresentational (Gallagher 2007: 33). Gallagher argues for bottom-up accounts of hallucinatory experience against top-down accounts. The latter explain hallucinations as misinterpretations of “normal” experiences. For Gallagher, hallucinations are caused by impairment to the first-order experience rather than to higher-order reflective interpretation, and he shows that this perspective is coherent both with patients’ descriptions of their hallucinations and with neurobiological research. This approach is fruitful, but I will try to integrate it with Peirce’s semiotic theory, and use it as a methodological background for the study of hallucinations as one of the symptoms of schizophrenia. Peirce’s semiotics offers a coherent theory of signs and their features. Although the theory has major metaphysical and epistemological dimensions, the most beneficial for this study is the semiotic dimension offering sign classifications and accounts of aspects of signs. The major aim is to analyze mental processes that are described as symptoms of mental disorders in psychiatric discourse and compare these symptoms to other mental processes. We have accounts of these symptoms based on psychiatric case studies, but how can we analyze them? The key question here is: what is specific about a hallucination and how is it different from other mental phenomena? How can we distinguish hallucinations from products of the imagination, memories and so forth? I suggest that mental processes can be understood and analyzed as signs and processes of signification, and that the methodology for such study can be based on Peircean semiotics.
For Peirce, thinking is the interpretation of signs with other signs, although in Peirce’s “mature semeiotic”, signs are also “outside” the mind and interpretation is not necessarily human (Short 2007: 289). Through semiotic concepts we can analyze the specificity of mental processes manifested by schizophrenics in much more detail. The psychiatric discourse focuses mainly on the content and form of hallucinations, but also includes some semiotically relevant elements, as can be seen in the following accounts.

The hallucinations consisted of a voice speaking in a clear, commanding tone with a dominant manner and was of a man not known to the patient. It was experienced by him as being outside himself although he did not believe that the voice was other than an hallucination. (Glaister 1985: 214)

The most common hallucinatory experiences are voices talking to the patient or among themselves. On many occasions, the voice, which can be identified as male or female, is not associated with anyone known by the patient. The voice is usually clear, objective, and definite, and experienced as coming from the outside. (Weinberger and Harrison 2011: 11)

The semiotic study of schizophrenia should aim to analyze symptoms and their specificity from a perspective different to the one provided by psychiatric discourse. Where the psychiatric discourse focuses on what a voice in a hallucination is saying and how it says it (as in the quotations above), semiotics is able to distinguish between different levels of a sign and the relationships between the elements of a sign. As I show in the following section, what a voice in a hallucination is saying is only one of the levels that can be analyzed.

5. The indexicality of hallucinations
From a semiotic perspective, there are different levels on which a hallucination can be understood. Generally speaking, one “thing” (e.g. a shoe, a word, a gesture) can act as different types of signs. For example, words in a novel, which separately act as symbols, can also act as icons, i.e. as a literary image (Short 2007: 216). Thus we can distinguish between different levels on which something may act as a sign, as I will show in the case of hallucinations. Let us now focus on an example of auditory hallucination: a voice saying “The DVD player is on.” We can understand this whole utterance as a sign. This sign refers to a DVD player and its state of being on. In the psychiatric discourse, this would be considered the “content of the hallucination” (American Psychiatric Association 2013). This content is usually specific to each hallucinating patient and depends on the context of his or her own experience (although how it depends on the context is unclear). Hallucinations can refer to many different objects or states, e.g. items from the patient’s home (“Take the pen from the table.”), general political ideas (“The president is a bad person.”) or fictional objects (“Talk to the unicorn.”). Further, they do not necessarily have to be expressed in language, as in cases of visual hallucinations or incomprehensible auditory hallucinations. Thus, what is specific to a hallucination is not the content, but yet another layer of its character as a sign. In Peircean terminology, the utterance “The DVD player is on.” is a Dicent Symbol (CP 2.262), or, in other words, a proposition. It has a subject and a predicate, which are related (cf. Stjernfeldt 2014). As it is a complex type of sign, it necessarily involves less complex types of signs.
A Dicent Symbol is in part a Rhematic Indexical Legisign (CP 2.262), for it “[…] requires each instance of it to be really affected by its Object in such a manner as merely to draw attention to that Object.” (CP 2.259). And as a Rhematic Indexical Legisign of a kind, it also acts as a Rhematic Indexical Sinsign (CP 2.259) of a kind. Let us now focus on the aspect of a hallucination as a Rhematic Indexical Sinsign. Peirce states that “A Rhematic Indexical Sinsign [e.g. a spontaneous cry] is any object of direct experience so far as it directs attention to an Object by which its presence is caused.” (CP 2.256). In this case, the object of a hallucination in the sense of a Rhematic Indexical Sinsign is the voice itself, as it is in the case of a spontaneous cry. When someone on a street shouts, the shout acts as a sign in that it draws our attention to the person who shouts. Thus, whichever class of signs a hallucination belongs to, what is specific to it as a hallucination is its character as a Rhematic Indexical Sinsign and its indexicality in this respect. The indexical aspect of a hallucination draws attention to the “utterer”, which can in this case be a specific person or an unidentified voice. Indexicality, in Peirce’s sense of the word, is the connection of an experience to something else, external. Short interprets Peirce as follows:

This idea of indexicality required a reconception of the category that in the ‘New List’ was known as ‘relation’. That reconception, under the new rubric, ‘2ndness’, was provided in Peirce’s phenomenological reinterpretation of the categories, beginning in 1885 (W5:235 ff.). In it, he identified phenomenological 2ndness as a two-sided awareness of inseparable polarities, for example, force and resistance, action and reaction. To feel oneself acting, one must feel a resistance thereto; to feel a resistance, one must feel oneself exerting a force. Thus we experience something as being other than (‘external to’) our experience of it; both are particular. This experience is directional; hence, spatio-temporal; self and other are by it experienced as located relative to one another. In experience, 2ndness is itself indexical and accounts for our ability to think about particulars, albeit thinking imports general terms. (Short 2007: 49–50)

The “external” aspect, to which the experience is connected, does not necessarily belong to “external reality”. It is external to the experience. Memory is external to the experience of a memory itself. If we recall a certain memory, we exert a force to recall it. The indexicality of a sign is thus its connection to a spatio-temporal instance, i.e. something particular. It is directional in its nature. Thus a shout is indexical in the sense that it directs our attention to the person who shouts and a remembered voice is indexical in the sense that it directs our attention to memory. The indexicality of a sign is its directedness towards its object when the object is a specific, spatio-temporal instance. Nevertheless, even in the case of symbols, whose character is that of a general law, an instance of such a symbol is always spatio-temporal and so it is indexical in a certain manner (CP 2.262). It is this indexical aspect of a hallucination as a sign that is being interpreted as coming from the “outside”. Yet the specificity of a hallucination is not solely in this interpretation. The indexicality of the “outside” or of “reality”, which focuses attention on an “outside object/utterer”, is common to hallucinations and sensory perceptions. In this aspect, a hallucination differs from imaginations or memories, which have a different indexicality, for they focus attention on something internal (“I’m imagining. I’m remembering.”). In acute states of schizophrenia, a patient is usually not aware that he or she is hallucinating. Yet in later stages of the disease, a patient may be aware of this.

When the hallucinations first appear, the patient is quite convinced of the external reality of these experiences and may seek help in finding out who is transmitting the messages and how this is achieved. […] In the later stages of the illness the patient will learn that the “voices” are not real. Nevertheless, the experiences still have the quality of “real” perceptions. (Frith 1992: 68)

In this aspect, my description based on Peirce is coherent with the account given by Gallagher (2007): hallucinations are not “misinterpretations” of higher-order cognition, to use Gallagher’s words. However, I disagree with Gallagher that the character of hallucinations can be explained by a disruption of “[s]ome neurological component responsible for the differentiation between self and the other[…]” (Gallagher 2007: 44), which is essential for a sense of self-agency. For example, imagining and recalling something from memory may be voluntary as well as involuntary. I can recall something as a result of my “intention” to recall it, but it’s not uncommon to recall something without the sense that I wanted to recall it (e.g. by an association to something that I see). In the latter case, I do not have a sense of self-agency, and yet still understand it as something “internal”. But then, how does a hallucination differ from a perception? What is different is the object of a hallucination. In psychiatric discourse, a hallucination doesn’t have an appropriate external stimulus (Frith 1992: 68). There has to be some other person to tell the patient that the perception he or she has is not external; otherwise, the patient cannot tell whether he or she is hallucinating. So, does it require a second- or third-person perspective (feedback given to the patient regarding the “reality” of the hallucination) to distinguish a hallucination? In other words, are hallucinations definable only when some person other than the hallucinating patient is involved? It is important to emphasize that for Peirce the object of a sign does not necessarily have to be a “real” object. Peirce’s Object may be perceptible, imaginable or even unimaginable (CP 2.230). The key feature of an Object of a Sign is that the Sign represents the Object. We have seen that in the case of a proposition (or Dicent Symbol) it has to be on some level also a Rhematic Indexical Sinsign.
But the Object of a proposition on the level of a Dicent Symbol is different from the Object represented by the same sign on the level of a Rhematic Indexical Sinsign. For example, when someone on a street tells me: “Watch out for the car!” the Object of this sign as a Dicent Symbol is the car (even though it doesn’t have to “really” be there), but the Object of the same sign as a Rhematic Indexical Sinsign is the utterer of the sentence. In order to understand what might be the Object of a hallucination as a Rhematic Indexical Sinsign, we have to consider Peirce’s distinction between an Immediate Object and a Dynamical Object: “[…] it is necessary to distinguish the Immediate Object, or the Object as the Sign represents it, from the Dynamical Object, or really efficient but not immediately present Object.” (CP 8.343) According to Peirce, the important difference here is that the Immediate Object is an object of one instance of experience, whereas the Dynamical Object may be an Object of several instances of experience (CP 8.183). When I catch a glimpse of a landscape outside a window, the Immediate Object would be that of a landscape. When I look more closely, I can see that it is but a realistic painting on a wall. The Dynamical Object, which is the painting, was given differently in different instances of my experience. The Dynamical Object was the same, yet the Immediate Objects in those instances of experience were different. I have corrected my interpretation because it was the same Dynamical Object that was the object of different signs (Short 2007: 192).


A hallucination differs from a perception because its Dynamical Object is internal rather than external. But that does not mean that it always requires a second- or third-person perspective to distinguish hallucinations from perceptions. When a third person (a psychiatrist, a family member etc.) tells the patient that the object to which his or her hallucination is referring is not “real” or “external”, the reference is to the Dynamical Object of the sign/hallucination. But it is from the first-person perspective that the reference is comprehended by the patient. Thus the patient can differentiate between hallucinations and perceptions even without someone else telling him or her that he or she is hallucinating. For example, if a patient in the later stages of the illness hears voices, looks around the room and sees that nobody is in the room, he or she can be aware that the Dynamical Object of the previous sign (hearing voices) is internal rather than external. Of course, in many instances the patient has no other reference to the Dynamical Object of his or her hallucination than those with the indexicality of perceptions, so he or she cannot refer to the object as internal. At the same time, hallucinations differ from inner mental phenomena such as memory or imagination in their indexicality. This “distortion of indexicality” is what makes hallucinations abnormal. If the indexicality of a given sign of the object was not distorted, the object would be experienced as coming from “inside” (whether as a memory, a product of the imagination or inner speech). That also means that hallucinations cannot be understood merely as a mismatch of internal and external experiences. If hallucinations were caused by a distortion of a cognitive function that distinguishes between internal and external experiences, hallucinations would relate to many different objects in many different situations. But that is not the case.
Hallucinations are usually connected to specific objects, even though there can be many of them and they might not be specific (Weinberger and Harrison 2011). A patient may for example consistently hear a voice of a giant hedgehog calling his name, but he or she never hears a voice of any other animal or person. The patient has a whole variety of inner experiences, he or she is able to remember and imagine, yet no other inner experience is mistaken for a perception but those that are related to the giant hedgehog.

6. Conclusions
We may conclude that hallucinations are characterised by a distortion of indexicality and so at the core of schizophrenia there must be a distortion of what gives different mental phenomena their indexicality. Whether hallucinations are closer to memories or imagination (see Aleman 2001: 72), they are internal processes that become signs given to consciousness. That suggests that there is a “neurological component” in the sense of Gallagher (2007) where internal and external inputs (sensory information, memory, imagination) are integrated as signs given to consciousness, and where the indexical character of a sensory modality is wrongly associated with internal mental phenomena.

We may briefly compare this understanding of the nature of hallucinations to neuropsychological theories of hallucinations. One of these theories suggests that hallucinations “arise from biases in reality monitoring” (Aleman 2001: 84). Another suggests that “hallucinations are extreme vivid manifestations of mental imagery” (ibid). The former is coherent with my analysis. The latter supposes that there is some scale of vividness (ibid: 87) of imagination. It would mean that the hallucinating patient may experience the same content (e.g. voices) both with the indexical character of sensory perception (external) and with that of imagination (internal). However, that is not the case. The indexical character of such hallucinatory content does not “slip” from one kind of indexicality to another. The advantage of analysing hallucinations as Rhematic Indexical Sinsigns is also that the analysis depends on neither the content nor the modality of hallucinations. Frith summarizes possible theories of hallucinations, of which the most plausible are considered to be “output theories” (Frith 1992: 71–73). These understand hallucinations as an inner voice that is misperceived as coming from somewhere else. Although the majority of hallucinations are auditory (and specifically “hearing voices”), 20 % of hallucinations are in different modalities. What is common to hallucinations regardless of their modality is that they shift attention to the object of the sign, in other words, their indexicality.

In sum, this chapter has shown some of the benefits of a cognitive semiotic approach in the study of mental disorders, including a framework for the integration of semiotic analysis of mental phenomena with psychiatric and neurocognitive descriptions, with a focus on first-person experience. As demonstrated in the case of hallucinations, such an approach can provide the thorough characterization that psychiatric discourse often omits.

Acknowledgements
The work was supported by grant SVV-2015-26024002.

Gisela Bruche-­Schulz

Chapter 17
Pictorial Responses and Projected Realities: On an Elicitation Procedure and its Ramifications

1. Introduction
The pictorial responses to a narrative text that are focused on in this chapter raise questions concerning their pictorial format, their semiotic status, and their underlying reality-creating “acts of experience” (Thompson 2007: 268). Such acts of experience open vistas on scenes and scenarios of a narrative that is continued into the readers’ own situation. The responses take the text that is read as contextual input,1 and exhibit the readers’ “empathic perspective-taking” (de Waal 2008: 285) with the events and the characters of a narrative world by filling it with detail from the readers’ own contexts and perspectives. Thus, the readers’ abductive creations carry “the spiral of semiosis” (Deely 2001: 39) into newly projected, pictorially framed, realities. As for the notion of “reality”, I refer to Schütz (1962 [1945]). Schütz mentions, with reference to Bergson, that “our conscious life shows an indefinite number of different planes, ranging from the plane of action on one extreme to the plane of dream at the other” (1962 [1945]: 212). He remarks that he prefers to talk about “finite provinces of meaning upon each of which we may bestow the accent of reality” (ibid: 230). Such bestowing of the accent of reality on a province of meaning is accomplished by directing attention to it. When complying with the investigator’s request to jot down what comes to mind, the readers apply their attention to this task by using pen and paper. At the same time their attention is directed towards a world of “imagination”, while staying within the task-oriented “world of working” (Schütz 1962 [1945]: 227) when producing material sketches, notes, and scribbles. This doubly directed attention arises from an “intersection” of the “flowing modes of givenness” (Husserl 1999: 312), and is explained by Schütz as follows.
In and by our bodily movements we perform the transition from our durée to the spatial or cosmic time, and our working actions partake of both. In simultaneity we experience the working action as a series of events in outer and in inner time, unifying both dimensions into a single flux which shall be called the vivid present. The vivid present originates, therefore, in an intersection of durée and cosmic time. (Schütz 1962 [1945]: 216)

1 The text which is read is an excerpt from Saint-Exupéry’s Le Petit Prince (1943), chapter XXV.


In this view, the material act of using pen and paper when complying with the investigator’s request happens simultaneously in the intersection of inner time (durée) and cosmic time.2 However, attention is focused (and backgrounded) differently, and different sides of its directedness are applied when responses are designed in the vivid present which originates in the intersection of durée and cosmic time. When the respondents choose their options of jotting down what comes to mind they simultaneously engage in acts of unfettered imagination in inner time, but these remain, by the design of the elicitation events, “in the world of working connected by communicative acts of working with the Other” (Schütz 1962 [1945]: 258). The elicitation events thus provide a working time frame, so to speak, within cosmic time that is experienced in terms of world time.

In Section 2, I address the question of how acts of imagination need to rely on remembering and experiential concepts. I first sketch out briefly how the term schema is used for addressing what is remembered and projected onto current experience. I then turn to the question of how the conceptual grounding of remembered and projected experience is accounted for from the point of view of the basic experience of force gestalts (“balance – equilibrium”, and “blockage – disequilibrium”). In Section 3, I provide a description of the data that emerged from the elicitation procedure, and in Section 4 provide illustrations and analyses of increasingly abstract pictorial responses, before concluding by pointing out the experiential, first-person and second-person nature of the methodology used.

2. Remembering and meaning
The readers in my study produce pictorial responses, i.e. picture signs, but not pictures in the everyday sense (to be hung on the walls of a living room, a gallery, carried in a pocket, or the like).3 These picture signs were not meant to be produced as items for public viewing. The request “to jot down what comes to mind” makes them into products of the moment, starting from a text and merging with a current moment’s ever-continuing amalgamation of current with remembered experience. In this section, the question of how remembering feeds into this process is approached by looking at the notion of schema.

2 Schütz makes use of Husserl’s perspective on the “double intentionality in the stream of consciousness” (Husserl 1999: 218).
3 It might even happen that one of the readers enlarges a sample item of the pictorial responses one day, frames it, and hangs it on a wall, thus turning it into an object of fond admiration or a different form of appreciation. However, the original purpose – for which it is used in this chapter – was to respond to an investigator’s request to “jot down what came to mind”. The pictorial responses were produced as one-off processual creations, not meant to outlive the life cycle of the paper onto which they were jotted.

2.1. The notion of schema
The notion of “schema” has been used in diverse traditions of thought, each focusing on a particular point of reference in relation to the notion of remembering. “Schema” became a key concept in Piaget’s work on child development (Piaget 1954), developing from there into a key term of constructivist learning theories. A further influential approach began with the work of the neurologist Henry Head (1911), who used the notion of “postural schema” for describing how a body organizes incoming sensory impulses which ultimately rise into conscious awareness. The notion of “body schema” is still in use today as a notion that explains “a complex background of embodied processes, [i.e.] a body-schema system involving visual, proprioceptive, and vestibular information” (Gallagher 2005: 76). From a different vantage point, Zlatev’s work makes use of the concept of “mimetic schemas”, which develop along stages of intersubjectivity during child development (2007, 2013, 2014a). Bartlett, to whom I turn below, used a way of looking at the matter that seems to apply to my present case. He collected a vast amount of empirical data on “forgetting”, i.e. paying special attention to the items chosen to be stored in memory.

Every recognisable (postural) change enters into consciousness already charged with its relation to something that has gone before, just as on a taximeter the distance is presented to us already transformed into shillings and pence. […] For this combined standard, against which all subsequent changes of posture are measured before they enter consciousness, we propose the word “schema”. (Bartlett 1995 [1932]: 199)

Particular detail is backgrounded with different degrees of intensity. What is foregrounded, i.e. “transformed into shillings and pence”, involves the transformation of previous experience.
Being interested in the detail that was prone to be forgotten, Bartlett traced “forgotten” detail by comparing memories that were reported by his subjects at certain intervals, over periods of months and years. Stories, for instance, became much shorter and “more coherent” (1995 [1932]: 125) over time. He suggests that it is a person’s sense of coherence and personal dispositions (“temperaments, interests, and attitudes” 1995 [1932]: 32) that influence the kinds of (visual and verbal) detail that are remembered or forgotten. The readers in my study were not asked to remember, but to jot down what came “to mind”. If that which comes “to mind” is then captured in a pictorial response, performed with pen and paper, the outcome is line drawings. The resulting design features capture the transformation of remembered experience as occurrences in an intersubjectively shared spatiotemporal world that possesses “tensional, linear, areal, and projectional qualities” (Sheets-Johnstone 2012: 37). In other words, their pictorial reflections preserve the experience of a shared world and, with that, an experience of a “somatosensory modality” (Damasio 1999: 318). In sum, it is suggested that the pictorial responses always convey experiential content that includes the experience of forces as its conceptual core. Between the meaning impulse received from the text and the pictorial response that is produced lies a common conceptual ground.4

2.2. Line drawings and conceptual ground
The present chapter deals with line drawings. The notion of a common conceptual ground between what is depicted in the line drawings and the experiential input, differently labelled, is the core of Arnheim’s explanation of how force-gestaltist concepts “take shape” (1969: 120–133), and of Johnson’s reception of Arnheim (1973) (Johnson 1987: 76–87). What looks like a technical notion, useful for the description of line drawing, is more than that. The experience of force underlies not only the production and understanding of pictorial productions, but is literally the driving “force” of social life.5 Social forces may be experienced as physical forces and, like physical forces, be applied to matter and mind. Even a seemingly open space is never “imagined” as empty. Ego and alter-ego are omnipresent. The condition of intersubjectivity overwhelms any potential emptiness. Line drawings thus move in spatial and social space simultaneously. Lines create shapes, and “all shapes are experienced as patterns of forces and are relevant only as patterns of forces” (Arnheim 1969: 276). Johnson illustrates such patterns of forces by diagrammatic, i.e. schematic, line figurations (Johnson 1987: 76–78, 86–87). He elaborates the subdivisions of the structural components that may lead to balance, equilibrium, or the disturbance thereof. They are paths, links, cycles, scales, centre-periphery, etc. (ibid: 113–126). He explains that these force gestalts are “tied to gestalt features of our experience of physical force and barriers [that yield] important inferential patterns based on FORCE schemata” (ibid: 63). Line figurations may depict the conceptual core of force gestalts, or schemata, with a few strokes. The readers thus, unknowingly, frequently express a requisite meaning impact by sketchy line drawings when jotting down what comes “to mind” in a time frame of ten minutes.

4 The term “ground” is taken from Peirce, who explains: “[W]e cannot comprehend an agreement of two things, except as an agreement in some respect, and this respect is […] a pure abstraction. Such a pure abstraction, reference to which constitutes a quality or general attribute, may be termed a ground.” (Peirce 1992 [1867]: 4).
5 Searle (1995) introduces his book by writing, “How does a mental reality, a world of consciousness, intentionality, and other mental phenomena, fit into a world consisting entirely of physical particles in fields of force? This book extends the investigation to social reality: … money, property, marriage, governments, elections, football games, cocktail parties and law courts…”. (1995: xi)

Pictorial Responses and Projected Realities

3. The data
Five different groups of students were given the same one-page-long excerpt of Saint-Exupéry’s Le Petit Prince, each time in a different language (English, Written Chinese, German, Russian, Turkish), on five separate occasions. The excerpt consisted of the first page of chapter XXV. The student-respondents were asked to “underline the words for which something came to mind” ad lib, and to “jot down what came to mind in the margin of (the one-page-long) text” ad lib. Each group was given ten minutes for the task. Other than this time limit, there were no special constraints.6 Across the five languages, the students proved sensitive to underlying semantic concepts such as affirmed or negated assertions, questioning senses, boundedness or unboundedness of events, and the states of balance and equilibrium. These underlying semantic concepts are all to be explained as expressing force gestalts (Johnson 1987: Chapter 5). Affirmed assertions do not show a blockage, negated assertions do. Questions open up moving lines or scales that wait to be stopped. Unbounded events move on and on, with no end in sight. Balance and equilibrium can be depicted by compositions that use mappings of “symmetrical force vectors meeting at a point onto a curved surface” (Johnson 1987: 86). That these observations are not random or irrelevant is shown by the fact that the students’ responses are not random. The response numbers correlate, in all five language groups, with the textual segments containing such semantic concepts (Bruche-Schulz 2014, 2013). It is true that it is only the sum of the responses (written and pictorial ones) that contributes to such correlations. I take it that this does not invalidate the fact that there exist correlations. The particularity of the pictorial responses consists in the immediate visibility of force patterns in the diagrammatic line compositions. I suggest that the students, when responding in either written or pictorial format, responded to a “meaning” at a level not accessed by conscious awareness.
It is the meaning that is preserved in the semantics of the languages, and stored in the memory of a motive force that generates states of equilibrium and blockages thereof (e.g. by affirming vs. negating an assertion, by an ongoing “unbounded” vs. a bounded event). There was no mention of the possibility to respond by drawings. Drawings and pictographs were produced all the same, either solely, or in connection with words that situated a depicted space in some context. The page that was given to the five groups of student-­respondents consisted only of a text, an excerpt from the beginning of chapter XXV without the accompanying drawings by Saint-­Exupéry.7

6 I came across this particular design variation of the ethnographic method of self-report by chance (Bruche-Schulz 2014, 2013). Different from the ethnographic method in anthropology which uses interviews for self-reports, I cut short the time given to ten minutes. In addition, I did not ask questions, but presented a text to let the respondents freely associate when reporting on what came “to mind”.
7 The title page of Le Petit Prince has as its subtitle “Written and Drawn by Antoine de Saint-Exupéry”. This subtitle is shown in all translations, and Saint-Exupéry’s drawings are included in all translations. The text which was read during the elicitation events did not show any of these drawings by Saint-Exupéry.


Gisela Bruche-­Schulz

The student-­respondents came from different countries, and lived different everyday lives. Given the fact that the instructions did not impose any constraint on the content of what was expected to be “jotted down”, naturally, their pictorial responses showed considerable differences in terms of motifs and preferred topics. In a similar vein, the preference for either the pictorial or the written format seemed to be due to personal dispositions, i.e. not to correlate with a specific parameter. In Table 1 the total of all response instances, be they in written or pictorial format, is mentioned first, e.g. 179 (159). The value in parentheses refers to those responses that were clearly aligned with the text. The number in front of the parentheses includes also UAV-­responses (unaffirmed-­view responses): responses for which no alignment with specific words in the text (by arrow, underlining, circling) was shown.8 Next, the percentages of responses in pictorial formats are shown. Close to half of the respondents (40 % and 41 %) used pictorial formats in the groups reading the one-­page-long text in English and in Turkish.9 In the other groups, the numbers of pictorializers were lower (25 % in the Russian and German groups, and 17.6 % in the Chinese group). Table  1 also indicates the number of “pictorializers” within their respective language groups. In the Chinese group, there are only three persons who jot down responses in pictorial formats, in the German and the Russian groups there are 4 who do so. In the Turkish and the English groups there are 7 and 8 pictorializers respectively. The group average of pictorializer-­persons is the lowest in the Russian group and the highest in the English group. When it comes to the peak value per person, student P17E (P[erson17]) of the English group produces the highest peak value. 
The lowest peak-value output of 4 pictorials per person is found both in the German and the Russian group, produced by P9G (P[erson9]), and P9R respectively. As can be seen from the peak values per individual pictorializer, only a small number of respondents contribute to the high numbers of pictorial response items. Higher numbers of pictorial items per person occur mostly when readers use predominantly sketchy pictographs. The sheer distribution of the numbers shows different preferences for either written or pictorial responses.

8 In this chapter, UAV responses are included in the presentation, since – unlike in my previous attempts (Bruche-Schulz 2014, 2013) – the focus is not on the quantitative relations between the underlined (also circled, or bracketed) words signalling a semantic concept and the resulting response numbers.
9 I will use the shorthand form of ‘English, Chinese, German, Russian, Turkish group(s)’. The number of students in each group ranged from 16–20. The ‘English group’ consisted of Cantonese-English bilinguals who were students of English Language and Literature at a university in Hong Kong. The students of the ‘Chinese group’ were in their final year of a secondary school in Hong Kong studying Remedial English. They were speakers of Cantonese (reading the Written Chinese) and were preparing for university entry, or other further training. In the ‘German group’, native Germans and four students from other European countries were taking part in a class on semantics at a Berlin university (taught in German). The Russian and the Turkish groups consisted of students of bilingual secondary schools in Berlin, Germany, studying their last year at these schools and preparing for university entry.



Table 1. Responses to the one-page text from Le Petit Prince
Numbers of respondents / Total of written and pictorial responses

Group                                     E[nglish]          CH[inese]          G[erman]           R[ussian]          T[urkish]
Numbers of respondents                    20 P[articipants]  17 P[art]          16 P[art]          16 P[art]          17 P[art]
Total responses; number in parentheses:   179 (159)          150 (147)          135 (129)          139 (122)          167 (153)
  only those aligned with text
Number of “pictorializers”                8 (of 20)          3 (of 17)          4 (of 16)          4 (of 16)          7 (of 17)
Pictorial responses within the total      47 / 26.2 %        12 / 7 %           6 / 4.4 %          10 / 2.8 %         22 / 4.2 %
  number of responses
Peak value of pictorial response items    P17E: 18           P16CH: 7           P9G: 4             P9R: 4             P14T: 16
  per individual “pictorializer”
Lowest number of pictorial response items
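
The shares of pictorializers quoted in the running text (40 %, 41 %, 25 % and 17.6 %) can be recomputed from the counts in Table 1. The following Python sketch only illustrates that arithmetic; the dictionary layout and the helper name `pictorializer_share` are assumptions introduced here and are not part of the original study.

```python
# Counts copied from Table 1: (pictorializers, participants) per language group.
groups = {
    "English": (8, 20),
    "Chinese": (3, 17),
    "German": (4, 16),
    "Russian": (4, 16),
    "Turkish": (7, 17),
}


def pictorializer_share(pictorializers, participants):
    """Return the percentage of participants who used pictorial formats."""
    return 100.0 * pictorializers / participants


# Hypothetical helper output: share per group, rounded to one decimal place.
shares = {
    name: round(pictorializer_share(p, n), 1)
    for name, (p, n) in groups.items()
}

for name, share in shares.items():
    print(f"{name}: {share} %")
```

Rounded to one decimal place, the sketch yields 40.0 % (English), 41.2 % (Turkish), 25.0 % (German and Russian) and 17.6 % (Chinese), which agrees with the values given in the text.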






Differences of time and place, as well as of the language into which the French original was translated, might have played a role during the elicitation events, and perhaps also the relation between investigator and students across the groups. The English and the Turkish groups welcomed me with a “let’s-have-fun” mood. The other groups behaved in a less spontaneous manner.10 As for the types of the responses, they can be identified according to the degree of figurative detail. The student-respondents chose: (a) line drawings showing scenes and scenarios with figurative detail, either visibly populated by living beings or meant for them, (b) abstracted imagistic detail applied to motion in space, with increasingly reduced figurativity and (c) pictographs, i.e. highly conventionalized and abstracted pictorial symbols. In all three response types, the pictorial gist may be complemented by written words that introduce a specific discursive quality. In the following section, I first comment on how I see the readers’ responses as situated in “provinces of meaning” (Schütz 1962 [1945]), and then present examples of the three types of responses (a–c).

10 There were thus some differences concerning place and time, as well as the mood towards the elicitation events. Altogether, however, the fact that the same text, read by five different groups at five different places and times and in five different languages, yielded comparable quantitative values (distributions) should mean something, and I try to explain this here.



4. Provinces of meaning, accents of reality, and response types
When the readers produce their responses as drawn on paper with a pencil or ballpoint pen, they aim at the world of the “pragmatic motive” (Schütz 1962 [1945]: 234). By doing so, they follow the request to produce a jotting. When they focus their attention on the thing experienced as a meaning relevant to themselves, they enter the plane of imagination upon request rather than upon their own initial impulse. Thus, when complying with the investigator’s request to jot down what comes to mind, the readers apply their attention to the task by using pen and paper. At the same time – while producing material sketches, notes, and scribbles – their attention is directed towards a world of “imagination”, while staying within the task-oriented “world of working” (Schütz 1962 [1945]: 227). Again, at the same time, the request was framed as a playful event, outside the usual teaching situation11, i.e. not bound up with the usual student-teacher relation. They, thus, may even be compared to children “who play together in their make-believe world, […] indulge with Others in the same ritual [since still] connected by communicative acts of working with the Other” (1962 [1945]: 258). From Schütz’s viewpoint, the readers thus remain in the all-encompassing world of everyday life where imagination precedes the projecting, or planning, of some action, comparable to a scientist who does what she is doing and likes it, i.e. emotionally confirms her work as being meaningful and fulfilling. All these drawings use primarily schematic (diagrammatic) modes of presentation. Only the drawings shown in Figures 1–4 make use of figurative elements in various degrees of intensity and purity. While these diagrammatic pictorial creations mirror the textual input, they themselves may motivate textual output. Viewers may talk about picture signs.
In other words, there may result “discursive effects of […] the relationships exhibited in diagrams” (Pietarinen 2014: 118). Just as the words of the text which was read have an effect in that they “excite in the mind of the receiver familiar images […] quite detached from the original circumstances of their first occurrence” (Nöth 2008: 93), the structural parallelism of the resulting diagrammatic design of the pictorial responses may be translated forward into typified discursive signals of new contextualizations. In the analysis, the drawings are shown and complemented with (projected) discursive event types as aligned with the reality status of “everyday life”12: (a) conversation: impacting on socio-cohesive relations, (b) analytical commentary: meant to help the pragmatic goal of shaping an environment for a purpose, and (c) evaluation of state of affairs: assessing the chance of human intervention.

11 The elicitation events took place before a lecture or teaching session started.
12 “[W]ith respect to the paramount reality of everyday life […] this reality seems to us to be the natural one, and we are not ready to abandon our attitude toward it without having experienced a specific shock which compels us to break through the limits of this ‘finite’ province of meaning and to shift the accent of reality to another one.” (Schütz 1962 [1945]: 231). – The discursive event types of conversation, analytical commentary, and evaluation belong to the stock of frequently experienced and expected discourse types.

4.1. Scenes and scenarios: figurative and diagrammatic components
Figures 1–4 show spatial frames, each with a different expressive quality. In Figure 1, a seemingly lonely view of the spatial frame of a scene is depicted. S[egment 9], in which the word Sahara is underlined as the word for which something comes to mind, reads, together with s[8]: s[8] “The well that we had come to s[9] was not like the wells of the Sahara.” The simple curvy lines depict the sand of the desert, and another type of upward line shows the air that rises because of the heat. These simple lines “give visible shape to patterns of forces” (Arnheim 1969: 135). The rising lines show air in upward motion activated by the force of the heat, and the wavy horizontal lines indicate the desert sand as a ground that gives way to forces. There are, then, forces that shape motion in space and effect sensorially experienced processes.

Figure 1. P13E at s[9] “was not like the wells of the Sahara”. Discursive styles: a) Conversation: Don’t go there in summer. It will be unbearable. b) Analytic description: The artist managed to give a rich impression of a difficult environment with only a few strokes. c) Evaluation: Adversity of forces can probably not be removed by the counter-force of human action.

Three single-word expressions that accompany this line drawing strengthen its conceptual import. The force of the heat with the resulting types of sensory experience is captured in the words “hot – thirsty – silent”. No specific entity is referred to as being “thirsty”. P[erson]13E belongs to the English-language group who jotted down their responses in Hong Kong. Every Hongkonger knows (from intense episodic memory) that heat makes all mammalian bodies thirsty, and that heat disturbs the bodily equilibrium. Human bodies are not displayed by figurative means, but are invoked by the verbal commentary.

In Figure 2, it is also s[9] which is in focus. At s[8–9], the text reads: (s[8] “Vardığımız kuyu s[9] çöl kuyularına benzemiyordu” – “s[8] The well that we had come to s[9] was not like the wells of the Sahara”). P[erson3] of the T[urkish] group (P3T) sketches out a space that, this time, includes a self in an explicit manner (the larger stick figure), together with the “little prince” – the smaller stick figure. She indicates the larger stick figure as being herself (ben “I”) in Turkish. The smaller stick figure, standing under a tree (!), represents the little prince (“kleiner Prinz”, the wording here is German). P3T engages in pretend-play when expressing wonder and delight about the existence of the one tree in the desert that is created by herself. She adds a bracket to the sketch of the tree, and explains (in German): “in der Wüste nur ein Baum” “in the desert only one tree”. In the process of delineating this imagistic perception (effected by a line drawing!), the “tree” comes to symbolize an affordance which is “working as” a shelter against the burning heat.

Figure 2. P3T at s[9] “çöl kuyularına benzemiyordu”. Discursive styles: a) Conversation: Now there is shade. You can go there and have some shelter. b) Analytic description: The artist managed to give a wealth of information with only a few strokes. c) Evaluation: Adversity of forces can be removed by counter-force (of human action).

The line drawing projects a scenario that is meant to alleviate the effects of the extreme desert climate. The reader-respondent creates a tree to rescue the situation, i.e. for restoring a state of equilibrium. P3T thus gives the narrative sequence a new turn. She plays on the sense of “care” when creating a tree-scenario that does not exist in a desert, and involves herself as the larger creator stick-figure, thus engaging in symbolic play (known from children’s play and works of art).

Next, in Figure 3, P8E designs a space in which motion is happening. At s[2] (“[Men] set out on their way in express trains”), the swishing sound of a fast-moving train is depicted by lines running in parallel beneath and on top of a locomotive.13 At s[5] (“Then they rush about, ….”), three stick figures, shown with legs moving and arms outstretched, “rush about” in the fast motion of the experience of being in a “hurry”. The concept of motion in space is thus recorded and exemplified by two different sources, the “speed” of the train and the humanly formed “hurry”. An intense emotive element permeates the sensory experience of rushing (panic, i.e. disturbed equilibrium).

Figure 3. P8E at S[1–5], express trains and hurried people. Discursive styles: a) Conversation: Express trains are really speedy. Just listen to the swishing sound, and look at the people there. b) Analytic commentary: High speed can be felt to be threatening. c) Evaluation: Picture signs are good since intense experience can be read off from the schematic of their combined figurative-diagrammatic design.

13 The lines in parallel belong to the inventory of schematic design features; the locomotive, with windows and wheels and a requisite shape, represents a figurative feature.

In Figure 4, spatial space is to be read off the intersubjective space denoted by a stick figure that is placed in quotation marks as raising her arms in despair. The quotation marks are addressed to an audience. The written commentary “Where? to do what? a lost life” complements and intensifies an emotive state; here, the genuine, pretend-played and quotable feeling of despair. Like P3T in Figure 2, P13E engages in symbolic play that rehearses both the skill of giving an expression to an affective state (despair), and the quotative means of referring to an ever-present audience. The quotation marks are placed around the stick figure, not around the words. Thus, an abstracted line drawing presents a human being that is heard speaking in the world she inhabits.

Figure 4. P13E at s[1–4]. “Where? to do what? a lost life.” Discursive styles: a) Call for help: A body is in disarray. Someone suffers! b) Analytic observation: A human being may use quotative language when referring to a context other than the present one. c) Evaluation: Oppressive force should be understood as a call for help and should be taken seriously.



Figures 1–4 can be said to reflect abductions that generate imagistic creations with various degrees of schematicity. The readers seem to engage in efforts to seriously apply themselves to the task to jot down “what comes to mind” at the moment of reading. The task was put to them in the context of the world of the “pragmatic motive” that governs everyday life (Schütz 1962 [1945]: 208). Since all respondents jotted down their responses in this context, they should be seen working in the same “reality”, as defined by Schütz.

4.2. Dutifully obliging or fun and play
Force gestalts are the epitome of Peircean “diagrammatic” structure, abstracted into the depiction of the most radically condensed experience of bodies in motion and space. They may come “to mind” since they are easy-to-find equivalents of gestural elements in prelinguistic interaction (Sereno 2014). This concerns their condition of possibility, so to speak. In order to bring the responses discussed in this chapter under the umbrella of a semiotic concept, Schütz’s reflection on the “reality of the world of daily life” may be applied. For the drawings in Figures 5–9, the argument can still stand as it is; however, Figures 10–11 raise questions. It was stated in the introduction that pictorial responses testify to some form of interaction with the world. Firstly, readers interact with the person who presents a narrative text to them and, in addition, readers use the text for generating imagistic creations. In the following, it will have to be asked whether this statement can stand as it is. In section 4.2.1, the drawings reveal degrees of abstracted structure, with an increasingly prominent diagrammatic core. In section 4.2.2 we will ask what it means when heavily conventionalized picture signs are used by the respondents.

4.2.1. Singular icons, diagrammatic analogies, and engagement with “reality”
The response items reflect a momentary envisioned context, born from an engagement with the imagistic inspirations that “come to mind”. Although they choose different styles, the readers apply a specific “attentional attitude [and an] accent of reality” to their choice (Schütz 1962 [1945]: 230). When pictorials exhibit less and less detail, abstraction is at work that scales down the pictorials to a minimal diagrammatic schematic. In Figures 6 and 8 below, degrees of abstraction are observed along a continuum of more or less picture-like, or more or less abstracted pointers to a scene or a detail of a scenario. While Figure 4 presents a stick figure whose address to an audience is helped by the accompanying words (“Where? To do what? a lost life!”), Figures 6–8 together invoke a context by closer reference to adjoined text passages. In Figure 6, the sketch of a “bucket” looks like a typified and lonely presentation of a bucket (container, and containment schema).



Figure 6. P1E at S[40], pictograph or drawing (bucket)

When put in the context of the accompanying line drawing (Figure 7), it may connect with a scenario. Two human figures are walking under the stars, one of them wearing a crown. The bucket tells of the water that was finally found in the desert (S[egment 40] of the text reads, “I raised the bucket to his lips.”). Two figures, dressed in robes, one of them with a crown, are walking under the stars. The presence of the bucket filled with water on the physical page that was read “feels good” since water was finally found in the text printed on that physical page (state of equilibrium). Words from the text world serve to elaborate the sensory presence of darkness, and the “grass” on the ground radiates light. Motion, space, and sensory feeling belong to an “I” who perceives intrinsic spatial properties. Figures 6 and 7 make use of a mixed figurative and diagrammatic design as well as of words that elaborate sketchy elements like stars and blades of grass, similar to what is found in Figures 1–4.

Figure 7. P1E at S[40–45], pictograph and scenario. Discursive styles: a) Conversation: The story is about a figure with a crown, light and darkness, and water. Are you interested? b) Analytic commentary: There are problems (thirst, darkness), but also solutions to these problems, namely, a bucket with water, and the light of the stars in the darkness. c) Evaluation: Forces of bodily imbalance (thirst) and of the disabling force of darkness are overcome by counterforces that restore harmony.



The line drawing at S[40–45] above presents a scenario with an adverse force (darkness). The restoring counterforces of togetherness, as well as of achievement (finally being able to drink water in the desert), restore balance and equilibrium. In Figures 8, 9, and 10, response items are shown that exhibit a further reduction of detail. In Figure 8, P19E underlines S[34] and provides a description of the sunlight that shimmers in the still trembling water. The water in the bucket that reflects the sunlight is presented by wavy dot-like lines. This abstracted style of pictorializing still follows the text quite closely, and there is still a literal, visibly iconized connection between the source and the receiver of the sunlight.

Figure 8. Pictograph and scenario. Discursive styles: a) Conversational telling: The sun shines, and the sun rays are reflected in the water that fills the bucket. b) Analytic commentary: Mammalian bodies are in disarray when deprived of water. A bucket full of water signals a solution. c) Evaluative commentary: Positive forces are at work, promising equilibrium to bodies which need water and daylight.

The dot-­like lines convey a diagrammatic semblance with water when filling the entire bucket; the shape of the bucket is sketched out, complete with its handle, and the rays of the sun as streaky lines. The three separate parts of the sketch in S[34] (Figure 8) together invoke a spatial space and the result of a particular kind of motion, the transfer of heat that is to be read off the reflection of sunrays in the water. Both figurative (sensory effects of sun rays) and diagrammatic line elements are used in this drawing. The sketch of Figure 9 demonstrates how the very same motif within the same framework of perception (transpiring from S[34] “sunlight shimmer in the still trembling water”) is undergoing a process of abstraction. The sketch seems to be freed from an explicit spatial context. It is shrunk into a symbol. As above in Figure 8, the force of heat that impacts on water is captured. But now, the source of the heat is no longer shown. The sun, the source of the heat, is shrunk into a dot on the surface of the water that fills the bucket. A “shrunk dot on the surface” when understood in connection to the sun reflects an analogy, and “[a]ny analogy is a diagram because it represents a parallelism between the structures of two conceptual domains” (Nöth 2008: 88).



Figure 9. P7E, UAV at S[34]. Discursive styles: a) Conversation: It looks as if there is a face that expresses frustration. Officially, it depicts how sun rays are reflected in the water that fills some round thing. b) Analytic commentary: Mammalian bodies are in disarray when deprived of water. c) Evaluation: Sunlight is shown to shimmer in the water, as is normally experienced when the sun shines.

In sum, it looks as if diagrammatic structure is everywhere when readers jot down what comes “to mind”. After all, conceptual analogies are already given when the meaning of a text is translated into an impact on minds. What happens if such translations are channelled through heavily conventionalized symbols?

4.2.2. Analogies and schematics: Parallelisms between conceptual structures
In Figure 10, P[erson8] of the German group underlines the word “Kreis” (“circle”) at S[egment 5]: (Nachher regen sie sich auf und drehen sich im Kreis… “Then they […] get excited and turn round and round…”). She then jots down in the margin the unadorned shape of a circle, of the size shown by the letters of the written words. The question arises whether or not the degree to which the circle is reduced to a naked shape carries a message by itself. Technically, there is then again, as in Figure 9, an analogy which connects the structure of two conceptual domains.

Figure 10. P8G at S[5], Pictograph (circle). Discursive styles: a) Conversational telling: People turn round and round. They circle around something. b) Analytic commentary: The circle describes an analogy. c) Evaluative commentary: A circle is a beautiful symbol of harmony. / A circle is a circle. But so what?

From Figure 1 to Figure 9, the drawings have been shown and complemented with (projected) discursive event types as aligned with the reality status of “everyday life”. The evaluative commentary for Figure 10 seems to give away some difficulty. Does the degree to which the “circle” is reduced to a naked shape allow us to read off some kind of mockery or boredom? Obviously, the question of whether there is an underlying feeling that motivates the production of the readers’ pictorial responses is, for the analyst, a question of judgement concerning the felt impact of the “halos” of things (Schütz 1962 [1945]: 224). In Peirce’s words, “[I]n formulating the judgement [we perceive the image of our judgement and] we are, in the very act of formulation, aware of a certain quality of feeling” (1998 [1903]: 247). A last example demonstrates how the pictorial creations express the qualitative aspects of a feeling. Figure 11 below covers a whole page. P17E seems to engage in just having fun. I count altogether twenty items of pictographic depictions that P17E jots down, obviously with great enjoyment and fervour. All pictorial creations consist of abstracted line images that come across as sketchy drawings with degrees of a more or less pictorial influence.

Figure 11. Peak number of pictographs (P[erson17] of the English group)



5. Concluding remarks
In this chapter, I focused on pictorial responses to a narrative text. The inclination to respond either pictorially or verbally differs among the respondents. Among the five groups of reader-respondents, the English-language group provided the largest number of pictorial responses. Pictorials activate a visual scene which is not placed in some past, present, or future, while linguistic discourse is more or less tied to temporally anchored states and events. Both pictorial and written responses are the products of “working […] in outer and in inner time, unifying both dimensions into a single flux which shall be called the vivid present” (Schütz 1962 [1945]: 216). The narrative text itself already presents scenes and scenarios whose conceptual cores consist of gestaltist patterns of experience. This is not immediately visible, but the distribution of the response numbers across the segments of the text mirrors the force-gestaltist (diagrammatic) aspectual semantics of the narrative text (Bruche-Schulz 2014). Even if the respondents are not consciously aware of this, their thoughts and feelings appear to be shaped by it, and the pictorial creations express this type of hidden awareness through line drawings directly. When the reader-respondents engage in abductions that generate imagistic creations, two styles emerge. The first seems to be born from a joyful, but also serious engagement with the abductive inspirations that “come to mind”. These pictorial creations are composed of lines and non-conventionalized figurative elements. The second style is marked by more and more conventionalized motifs. Such near-pictographic creations, especially when used in large numbers, seem to express a mood of merriment and play. Singular instances seem to express irony. The present author and analyst of the readers’ responses approaches an understanding, i.e. a judgment on their meaning, on the basis of the quality of a feeling as well.
Such feeling is needed to convince us of an existing reality. In Schütz’s words, when the attentional attitude towards the “world of everyday life [achieves the] suspension of doubt” (Schütz 1962 [1945]: 230), a perceived givenness can be affirmed as real, confirmed by the process that needs “the emotional interpretant” (Peirce 1998 [1907]: 410) among others. The key notions of this analysis are therefore quality of a feeling, experience of reality, and force-­gestaltist diagrammatic patterns.

The sources of the excerpt

Saint-Exupéry, Antoine de. 2002 [1943]. Le Petit Prince. Paris: Editions Gallimard.
– 1998. Der Kleine Prinz. [Leitgeb, Grete; Leitgeb, Josef, trans. Translation first published in 1950.] Düsseldorf: Karl Rauch Verlag, 78–80.
– 1999. 小王子 – Le Petit Prince – The Little Prince. [張譯 ‘Zhang Yi’, trans.] Taipeh: 希代 ‘Xi Dai’ Publishers, 260–262.
– 2002. The Little Prince. [Woods, Katherine, trans. Translation first published in 1945.] London: Egmont Books, 76–77.


Gisela Bruche-­Schulz

– 2006a. Küçük Prens. [Avunç, Yaşar, trans. Translation first published in 1987.] Istanbul: Mavibulut Yayıncılık, 79–80.
– 2006b. Malen’kij Princ. [Gal’, Nora, trans., foreword.] Moscow: Detskaya Literatura, 99–101.

Peter Coppin, Ambrose Li & Michael Carnevale

Chapter 18 Iconic Properties are Lost when Translating Visual Graphics to Text for Accessibility

1. Introduction

Imagine a blind or low-vision individual who needs to access a graphic representation, for example a financial chart (Figure 1a). Unlike a sighted individual, who can see the actual chart, what the blind or low-vision individual accesses, usually aurally, is often its text description. Both individuals are accessing a representation of rising and falling stock prices over time. However, whereas the sighted individual sees words and undulating shapes, all the screen reader user hears are words (Figure 1d). Such is the state of the art in accessible graphics: Many blind and low-vision individuals depend on approaches like the Web Content Accessibility Guidelines (WCAG) to provide them with text descriptions – translations of the graphics into text which these individuals can then access via a screen reader, usually aurally via text-to-speech.

Text descriptions are essentially interpretations meant to convey the meaning intended by the author of the graphic representation. However, if a text description can fully convey the meaning of the graphic image, then why did the author create it in the first place? In the foregoing scenario, parts of the chart, predominantly numerical values and labels (Figure 1c), are already text and will naturally carry over to the text description. However, we argue that the rest of the chart (Figure 1b) is not and cannot be adequately¹ translated into text because these parts that are originally conveyed via shapes are fundamentally different from the parts originally conveyed via text. Shimojima (1999) calls this fundamental difference the “graphic-linguistic distinction,” while a semiotician might see this as Lessing’s dichotomy of “pictures” versus “literature” (cf. Sonesson 1988); in either case, one might predict the parts originally conveyed via shapes to be “lost in translation.”

Suppose a designer wants to improve the state of the art and design a system of accessible graphics that minimizes what is lost in translation. Let us call this our target design problem. Because the state of the art is often aural, we claim that a sound-based solution is appropriate given our target design problem.

¹ We are adapting Toury’s (as cited by Palumbo 2009: 8) definition of adequacy here, so by adequately we mean being able to subscribe to the norms of the source culture (of visual graphics) and to express the relations expressed in the source culture. This definition is consistent with the understanding that text descriptions should convey the author’s intent.



Figure 1. Deconstruction of a financial chart. The original chart (a) comprises parts conveyed via shapes (b) and parts conveyed via text (c), but in a text description (d) parts originally conveyed via shapes are also conveyed via text (e). Adapted from “Web Accessibility Best Practices: Graphs” by Campus Information Technologies and Educational Services (CITES) and Disability Resources and Educational Services (DRES), University of Illinois at Urbana/Champaign. Copyright 2005 by University of Illinois at Urbana/Champaign.

Although touch (cf. Kennedy 1993) might be a more natural fit than sound, if the state of the art is primarily aural, a sound-based solution is probably more immediately applicable and aligns better with our target design problem. Also, although screen readers can deliver text descriptions tactilely as braille, refreshable braille displays are expensive in comparison to audio hardware. Furthermore, although touch screens might be tactile, Klatzky, Giudice, Bennett, and Loomis (2014) have shown that conveying information tactilely via touch screens suffers from challenges related to haptic perception. Touch screens are also not as ubiquitous and inexpensive as audio hardware.

Iconic Properties of Visual Graphics


Now, to solve our target design problem we will first need to conceptualize what is actually lost, which means we need to identify distinct and common properties of graphics relative to text. Theoretically, then, our true goal is not our target design problem, but the conceptualization of what is lost during the translation from visual graphics to text descriptions – in other words, a theoretical model. Practically, however, this model can then inform the design of new approaches for conveying properties of graphically represented shapes in non-visual perception modes, which in our target design problem would be sound. While we will not discuss specific plans to test the model, since the translation of graphics to text is a translation process, the concept of “back-translation” (cf. Palumbo 2009: 14) can be adapted to serve as the basis of experiments to test and challenge the model itself.

Our conceptualization of what is lost will be based on the science of perception and cognition, but expressed in terms compatible with semiotics. Although our destination is cognitive semiotics, we will begin, in Section 2, in an area of computer science where research and practice relevant to our target design problem is transpiring, but which, in our view, lacks a means of describing what is lost. By reviewing several leading accounts of the “graphic-linguistic distinction” from the field of diagrammatic reasoning, what will emerge is that charts and graphs designed for visual perceivers are mostly composed of items labelled via text, but located in relation to other labels through visually perceived spatial, geometric, and topological relationships. Section 2 will also discuss how spatial properties of sound could be recruited to convey spatial, geometric, or topological relations among labelled items that are currently lost in translation.
Synthesizing the various accounts of the graphic-linguistic distinction results in a pattern that suggests a more effective means of translating what is lost, but the terms for conveying spatial-topological-geometric properties are unwieldy compared to corresponding terms in semiotics. Recruiting well-established terms and concepts from semiotics will partially replace these unwieldy phrases. At this point, diagrams can then be described as iconically conveyed relations among symbolically conveyed items or objects. However, what principles could guide a designer in distinguishing between symbolic and iconic properties of a graphic image? Also, how could a designer identify appropriate mappings from iconic properties of visual graphics to those of sound to convey the same relations?

To answer these two questions, we introduce Coppin’s (2014) perceptual-cognitive model, which distinguishes between what he calls the pictorial and symbolic properties of graphics, and where he conceptualizes diagrams as pictorial relations among symbolic objects. Pictorial and symbolic properties in this model will be shown to be respectively akin to iconic and symbolic properties from semiotics. With this synthesis in place, the final section applies the synthesized model to the target design problem to precisely identify what is lost in translation and offers a solution to inform translation.



2. The graphic-linguistic distinction: Implications for sonic interface design

Let us now examine the “graphic-linguistic distinction.” Out of seven candidate distinctions that Shimojima (1999) identifies, we discuss four here in relation to how they extend the idea of this distinction into sound. Because Shimojima’s discussion assumes that “generally, pictures, images, and diagrams are graphical representations” (ibid: 313), any characterization of graphical representations here applies also to diagrams.

2.1. 2D vs sequential

The first distinction comes from Larkin and Simon (1987), who define a diagrammatic representation as a “data structure in which information is indexed by two-dimensional location” and a sentential representation as “a data structure in which elements appear in a single sequence” (ibid: 68). By definition, a text description (Figure 1d) in its visual form is therefore sentential, as text is arranged as a linear sequence of marks, whereas the original chart (Figure 1a) is diagrammatic because financial values are indicated via marks indexed to a 2D grid. Larkin and Simon also state that diagrammatic representations “preserve explicitly the information about the topographical and geometric relations among the components of the problem” (Figure 2, upper right) whereas sentential representations do not. Thus, the spatial relations among the labelled marks enable one to visually perceive the contour of the line or the relative positions of marks scattered across the 2D surface, thereby inferring values and trends that are not explicitly conveyed via labels (cf. Barwise and Etchemendy 1995).

Figure 2. Diagrammatic versus sentential representations via sonic and visual perceptual modes

Because Larkin and Simon’s definitions do not require the information in the data structures to be visual, text descriptions in their text-to-speech form are already by definition sentential. Now note that forming a 2D space in the sonic domain will require two properties of sound that can be independently manipulated and perceived. Identifying such sonic properties should in principle enable designers to construct sonic external representations that are diagrammatic (as defined by Larkin and Simon), which, when used to translate visually perceived diagrammatic structures, should enable the conveying of topographical and geometric relations that cannot be conveyed via text-to-speech translations. We can imagine, for example, how in Figure 2 (lower right) a blind or low-vision user could press arrow keys to move an “audio cursor” along the x-axis of an imagined 2D space to perceive the contours of the graph.
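The contrast between the two data structures can be made concrete with a small sketch. The Python below is only an illustration under our own assumptions, not something taken from Larkin and Simon: the price values, the variable names `sentential` and `diagrammatic`, and the helper functions `higher_than` and `contour` are all hypothetical. It stores the same invented data once as a single sequence of tokens and once indexed by two-dimensional location, and shows that spatial relations can be read directly off the second structure.

```python
# Hypothetical closing prices; the values are invented for illustration.
prices = [("Mon", 10.0), ("Tue", 12.5), ("Wed", 11.0), ("Thu", 14.0)]

# Sentential: elements appear in a single sequence (here, a list of words).
sentential = " ".join(f"{day} {price}" for day, price in prices).split()

# Diagrammatic: each labelled mark is indexed by a 2D location (x, y).
diagrammatic = {(x, price): day for x, (day, price) in enumerate(prices)}

def higher_than(diagram, a, b):
    """Read a spatial relation off the 2D index: is mark a above mark b?"""
    location = {label: xy for xy, label in diagram.items()}
    return location[a][1] > location[b][1]

def contour(diagram):
    """Rising (+1), falling (-1), or flat (0) between neighbouring marks."""
    ys = [y for _, y in sorted(diagram)]
    return [(b > a) - (b < a) for a, b in zip(ys, ys[1:])]
```

In this sketch the relation "Thu is higher than Mon" and the rise-fall-rise contour are never stated as labels; they follow from the locations alone, whereas the token sequence would have to be parsed and compared value by value to recover them.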

2.2. Relation symbols and object symbols

The second distinction to examine comes from Russell (1923), who proposes the following:

There is, however, a complication about language as a method of representing a system, namely, that words which mean relations are not themselves relations, but just as substantial or unsubstantial as other words. In this respect a map, for instance, is superior to language, since the fact that one place is to the west of another is represented by the fact that the corresponding place on the map is to the left of the other; that is to say, a relation is represented by a relation. (ibid: 90)

For example, a financial chart (Figure 1a) conveys higher and lower monetary values via marks at higher and lower elevations, respectively. This convention enables the visually perceived spatial relationships between the marks to represent relationships among monetary values over time. To consider how relations can be conveyed sonically, consider two tones A and B in the foregoing example, where A has a lower pitch than B (Figure 3). Their sonic relation is then their perceptible difference in pitch. Now if Tone A denotes a stock price at an earlier point in time, and Tone B a stock price at a later point in time, then the perceptible difference between their pitches can convey the difference in price over time. Moving the sonic cursor from left to right would then result in a higher pitch, conveying the relationship between the stock price over time via a sonic relation.

Figure 3. By scrubbing a “sonic cursor” along an axis, audiences could access sonically conveyed relations through changes in pitch and via stereo.
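One way such a mapping might be computed is sketched below in Python. This is a minimal sketch under stated assumptions rather than a description of any existing system: the function names `to_frequency`, `to_pan`, and `scrub`, the 220–880 Hz range, and the choice of a logarithmic frequency scale are all our own illustrative choices. Price is mapped to pitch and cursor position to stereo pan, so that the relation between two prices is carried by the relation between two tones.

```python
def to_frequency(value, lo, hi, f_min=220.0, f_max=880.0):
    """Map a value in [lo, hi] to a frequency in hertz on a log scale.

    Assumes hi > lo. The frequency range is a hypothetical choice.
    """
    t = (value - lo) / (hi - lo)
    return f_min * (f_max / f_min) ** t

def to_pan(index, count):
    """Map a cursor position to stereo pan: -1.0 far left, +1.0 far right.

    Assumes count >= 2.
    """
    return 2.0 * index / (count - 1) - 1.0

def scrub(prices):
    """Return one (frequency, pan) pair per cursor position, left to right."""
    lo, hi = min(prices), max(prices)
    return [(to_frequency(p, lo, hi), to_pan(i, len(prices)))
            for i, p in enumerate(prices)]
```

For a hypothetical series such as `scrub([10.0, 12.0, 14.0])`, the earlier, lower price yields a lower tone on the left and the later, higher price a higher tone on the right. Actually rendering the tones would require an audio library, which this sketch deliberately leaves out.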



2.3. Analogue vs digital

We next explore the analogue-versus-digital distinction, most commonly associated with Goodman (1968). Analogue systems are dense throughout, where a system being dense refers to the ability to place a new element between any two elements in a representation ad infinitum, and any new element added can change the meaning of the representation. Digital systems, in contrast, are discontinuous and differentiated throughout. Goodman claims that pictorial representations are also defined by their degree of repleteness, which refers to the number of possible perceptual features that can vary before changing the representation’s meaning. Thus pictures are analogue and more replete, diagrams are also analogue but less replete, but linguistic systems are partially digital. In other words, an analogue representation can have an infinite number of variations within a representational space that can carry unique meanings. If a representation is digital, however, the number of meanings is limited.

Using the analogue instead of the digital properties of visually perceived graphics appears to require two interrelated capabilities: (a) lower-level capabilities to perceptually process features from an environment or external representation, and (b) higher-level capabilities to recognize the linguistic categories of the perceptually processed features. By lower-level capabilities we here mean “the aspect of perception-action that is most closely coupled with the proximal stimuli and sensations that impinge upon a viewer” (Coppin 2014: 50), which is akin to perceptual meaning as proposed by Mandler (2004: 59–91), and by higher-level capabilities we mean capabilities that are thought to utilize these lower-level capabilities to form the basis for conceptual understanding and language (Mandler 2004: 76).

Take the example of discerning the values on a financial chart. This action requires perceptually processing the light reflected from the chart to observe lines in relation to textual labels. Discerning the same values aurally would require the same set of interrelated capabilities: lower-level capabilities to process varying frequencies, timbre, and so on, as well as higher-level capabilities to recognize the linguistic meanings of the sounds. According to Mandler (2006: 2), “perceptual categorization happens automatically and does not require conscious attention.” Now contrast this to the current text-to-speech approach, which exploits only the digital properties of language. We hypothesize that designers might be able to produce more effective translations sonically by exploiting the automatic perceptual categorization of analogue properties of sound to make use of perceptual features that are lost in the translation to language.
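The difference between the two kinds of rendering can be illustrated with a short Python sketch. It is a simplification made under our own assumptions, not Goodman's formal account: the three-word vocabulary, the thresholds, the frequency range, and the function names `verbal_label` and `pitch` are hypothetical. A small differentiated vocabulary collapses nearby prices into one category, whereas a continuous pitch mapping keeps every difference, including that of a value placed between two others.

```python
def verbal_label(price, lo, hi):
    """Digital-style rendering: a small, differentiated set of categories.

    Assumes hi > lo. The three labels and their thresholds are hypothetical.
    """
    t = (price - lo) / (hi - lo)
    if t < 1 / 3:
        return "low"
    if t < 2 / 3:
        return "medium"
    return "high"

def pitch(price, lo, hi, f_min=220.0, f_max=880.0):
    """Analogue-style rendering: a continuous mapping from price to hertz.

    Any price lying between two others receives a frequency between theirs.
    """
    t = (price - lo) / (hi - lo)
    return f_min * (f_max / f_min) ** t
```

With a hypothetical price range of 10 to 14, prices of 10.1 and 10.2 both come out as "low" in the verbal rendering, while their pitches differ and a price of 10.15 falls between them. Whether a listener can actually perceive such a small pitch difference is an empirical question that the sketch does not address.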

2.4. Intrinsic vs extrinsic constraints

Last to examine is the “intrinsic versus extrinsic constraints” distinction (Shimojima 1999: 328), where “representations obeying inherent constraints” are considered graphical (ibid: 332). However, let us examine what he quoted from Barwise and Etchemendy (1990: 22):



Diagrams are physical situations. They must be, since we can see them. As such, they obey their own set of constraints . . . By choosing a representational scheme appropriately, so that the constraints on the diagrams have a good match with the constraints on the described situation, the diagram can generate a lot of information that the user never need infer. Rather, the user can simply read off facts from the diagram as needed. This situation is in stark contrast to sentential inference, where even the most trivial consequence needs to be inferred explicitly.

To illustrate that “diagrams are physical situations,” consider how the fact that a diagram conveys topological and geometric information through visual perception enables the illustration in Figure 4a (Section 3) to support many descriptions of the relationships it shows. Each description, however, conveys a different story about what is shown visually and therefore affords different inferences. Barwise and Etchemendy (1990: 22) thus note that a diagram can show “countless facts” (by which they mean one can construct multiple sentences from a diagram). When Barwise and Etchemendy (1990: 22) refer to diagrams as “physical situations,” they are referring to properties (and affordances) of diagrams that can interact with a human perception system. Designers seeking to extend the affordances of visual diagrams to the sonic domain are challenged to identify properties or dimensions of sound that similarly make use of “physical situations” (that interact with human perception) to enable multiple stories to be described about the relationships that are shown sonically. Extending the example in Figure 3, the hybrid stereo–varying frequency interface should enable one to “hear the shape” of a contour. If text-to-speech labels are indexed to the contour, then the user should be able to form multiple sentences about the geometric and/or topological relations among the labelled elements.

2.5. Summarizing extensions of the graphic-linguistic distinction into the sonic domain

Extending these classic graphic-linguistic distinctions thus suggests the following design opportunities when designing sonic versions of visual charts and graphs:

a) The 2D versus sequential distinction suggests the need to identify perceptually distinguishable spatial properties of sound in order to afford the communication of spatial, geometric, or topological information.
b) The analogue versus digital distinction suggests that analogue properties of sound, such as frequency, timbre, stereo, and echo could convey analogue properties of visual graphs.
c) The relation symbols versus object symbols distinction suggests that analogue and spatial properties of sound noted previously could be recruited to map numerical values to perceptual dimensions.
d) The intrinsic versus extrinsic constraints distinction suggests the need to identify “physical situations” that naturally emerge via the human perceptual processing of sound so that “countless facts” (Barwise and Etchemendy 1990: 22) can be inferred from those sonically conveyed physical situations.

2.6. Recasting the graphic-linguistic distinction in semiotics terms

Although Shimojima is dissecting the characteristics of graphical and linguistic properties from the perspective of diagrammatic reasoning, semiotics has long been tackling this very problem. Semiotics can therefore provide us with a developed terminology and further insights regarding the differences between picture and language. One tradition that can inform us comes from Lessing, who in Laokoon points out that while the link between the signs of language and their objects is conventional, pictures employ qualities of expression that carry similarities to their object. Scholars including Bayer, Wellbery, and Sonesson further developed Lessing’s distinction and helped define the properties and communicative limits of language versus picture.

Another relevant tradition is Peirce’s, especially his idea of the icon. Although iconicity is usually defined in terms of similarity to its object, Stjernfelt (2000) points out that Peirce has actually provided what Stjernfelt calls an operational definition or criterion for iconicity: “by the direct observation of it other truths concerning its object can be discovered than those which suffice to determine its construction” (Syllabus 2.279 as cited by Stjernfelt, 2000). Our target design problem then can be characterized as creating, in the sonic domain, accessible graphics that are iconic in character. A key point here concerning Peirce’s operational criterion is that it does not privilege vision; iconicity is thus in principle also relevant to audition, tactition, and other sense modalities.

Peirce’s icon is one member of his trichotomy of icon, index, and symbol. An index “points” to its object on the basis of spatio-temporal contiguity (e.g., smoke to mean fire), whereas a symbol’s relation to its object is conventional (e.g., words and their meanings). These three kinds are “ideal”, meaning that all empirical signs will be combinations of them, in various proportions (cf. Jakobson 1965). In relating to Shimojima’s graphic-linguistic distinction, graphics then correspond most naturally with icons, while linguistic representations correspond to symbols. In particular:

1. “Spatial, geometric, or topological information” of a visual or sonic information display is akin to iconic properties of a visual or sonic display, and
2. “Intrinsic constraints” (physical situations, as stated in Barwise and Etchemendy, 1990: 22) of a visual or sonic information display are akin to iconic properties of a visual or sonic display.

Peirce also developed a notion of diagram that has been interpreted by Stjernfelt (2000). Section 3.4 compares this semiotic understanding to the perceptual-cognitive notion of diagram that we are proposing.



3. A provisional model

We now introduce a basic model of perception and action based on Coppin (2014), with the goal of integrating these ideas into a coherent system to inform design and aid analysis.

3.1. Perception-action cycles and predictions

Suppose an individual reaches for and grasps an object, such as a cup on a table (Figure 4a). Reflected light from the cup (Figure 4d) and its surrounding environment is picked up by retinal detectors and perceptually processed to inform a reaching and grasping action with fingers and hand positioned to grasp both the proximal and distal sides of the cup (Figures 4b–c). This perception-action cycle (cf. Gibson 1986) comprises two relevant interrelated aspects. First, because the proximal side is visible, reflected light from the proximal side of the cup is picked up and processed by sensory receptors. Second, because the distal side is invisible, the other aspect is the capability to anticipate, predict, or simulate (cf. Barsalou 2009) the curvature of the distal side of the cup to inform a hand–finger orientation that is sufficient to grip the unseen distal surface (Freyd 1992; Anstis, Verstraten and Mather 1998; Goldstone 1998; Hockema 2004).

Figure 4. Information pickup versus simulation

To the extent that individuals predict or simulate an author’s intended meaning when they recognize features of an external representation, prediction or simulation capabilities can enable individuals to use external graphic representations (Coppin 2014). For example, based on prior experience in Western culture, an individual looking at a religious painting in a European church might predict that a flying white dove is intended to have a religious significance. Similarly, an individual reading the written word cat might predict that the author intended to convey the conceptual category of cat (see Coppin 2014, Chapter 3). This has been termed the “conventionalized” account of representation (Kulvicki 2010).



Although this conventionalized account is uncontroversial for describing how individuals infer meanings intended by authors of written graphics such as text, applying the conventionalized account to picture perception is controversial (Gibson 1960, 1971, 1978; Goodman 1968; Kennedy 1974; Kulvicki 2010; Coppin 2011, 2014). Many researchers have, instead, claimed that picture perception recruits unlearned, innate, or inherited biologically grounded capabilities to perceive and react in environments composed of occluded surfaces and edges (Gibson 1960, 1971, 1978; Kennedy 1974). Rather than describing picture perception capabilities in terms of innateness, Coppin (2014) describes how pictures make use of capabilities that inherently develop when learning to perceive and react within a physical environment composed of surfaces and edges.

3.2. The anatomy of perception-action

Recall that grasping the cup requires capabilities to pick up reflected light and to simulate the distal side. Memory traces of past perception-actions are the resources from which simulations are constructed. Simulation involves many of the same neural systems used during perception (Kosslyn, Ganis, and Thompson 2001). If I simulate (imagine) a jet flying from left to right, I use many of the same processes and systems that I would use to perceive an actual jet. In the cup example, when I perceive the cup, I also inform potential action (reaching for and grasping the proximal and distal sides of the cup). Thus, perception and simulation are integrated aspects of perception-action within a physical environment.

3.2.1. Pictorial properties of graphics (and comparison to the semiotic notion of iconic)

Let us now apply this simple model to external graphic representation. Suppose a viewer looks at the European painting described previously. Ambient light is reflected from the painted surface to produce optic arrays that are picked up by retinal detectors. The optic arrays are processed by lower-level perceptual categories and lower-level simulators to enable perception of a dove – a depicted object other than the painted surface. Even if the viewer has never seen a bird (or does not have a conceptual category for one), she can still perceptually process the spherical shape of the head, the cone-like shape of the beak, the hemispherical shape of the eyes, and so on, because she has spent a lifetime developing the capabilities to pick up and perceptually process the kinds of optic arrays that the artist has artificially produced via markings. These are the pictorial properties of a graphic representation.

Pictorial properties make use of lower-level perceptual categorization capabilities and simulators developed to perceive and react within environments composed of occluded surfaces and edges (geometric and topological relationships of an environment). Thus, pictorial properties inherently convey geometric and topological relationships associated with Larkin and Simon’s (1987) diagrammatic definition. Pictorial properties are also defined as that which is picked up at the sensory surface when light reflects from a graphic. Thus, pictorial properties can be processed and interpreted for identification under multiple conceptual categories. In other words, pictorial properties are clearly on the analogue side of the analogue–digital spectrum. Finally, pictorial properties of graphics are clearly physical situations: The marks configured by the artist are precisely what produces the perceptual structure that the viewer picks up.

Iconicity is often described in terms of similarity to the represented object. However, as previously discussed, iconic representations also have an internal logic or set of rules. This is akin to Coppin’s (2014) perceptual-cognitive approach to the pictorial-symbolic distinction because, in that account, graphic representations are composed of both pictorial and symbolic properties, but to varying degrees. Returning to the dove example: During the perceptual processing of the optic arrays from the painted surface, lower-level simulators enable the perceptual processing of the depiction as a 3D shape. These are the same capabilities that enable perceptual processing of the proximal side in the cup example to engender simulations of the distal side of the cup. In other words, pictorial properties include an “internal logic” or “set of rules” (the simulations of the distal side of the cup or bird). This integrates Coppin’s perceptual-cognitive conceptualization of pictorial properties with the notion of iconicity from semiotics. To prevent confusion, this integrated notion of iconicity will be referred to as iconic´.²

² In mathematics, the ′ (prime) symbol is often suffixed to an existing symbol to denote a related concept. We are borrowing the symbol here to suggest that although the integrated concept is closely related to iconicity, the two might not be identical.

3.2.2. Symbolic properties of graphics

In Coppin’s model, pictorial properties make use of lower-level capabilities and simulators developed to perceptually process concrete structures configured by an author (for example, by marking a surface). In contrast, symbolic properties make use of higher-level capabilities and simulators that are thought to utilize these lower-level capabilities to form the basis for conceptual understanding and language (cf. Mandler 2004: 76). If externalized language is temporal (cf. Sonesson 1988: 91) and temporality implies sequentiality, symbolic properties then correspond more closely to Larkin and Simon’s (1987) sentential definition. Symbolic properties may also be associated with the simulations intended by an author that fall under the author’s intended conceptual categories, thus placing symbolic properties on the digital side of the analogue–digital spectrum. Finally, symbolic properties of graphics are the inverse of physical situations: They are less easily mapped back to what could be picked up from the physical world through lower-level perceptual categorization capabilities and simulators.

Note that in Coppin’s model, a perceptual system can still pick up the pictorial properties of writing from an unfamiliar language, but the individual may be unable to produce simulations necessary for the symbolic properties (simulations of the author’s probable intended meaning; see Coppin, 2014, Chapter 3). Because symbolic properties make use of higher-level simulators at the convergence of sensory modes and are more amodal, Barsalou (1999: 578) defines them as “inherently nonperceptual” (we interpret this to mean not easily mapped back to what a perceptual system could pick up in the world); these simulations are more likely to fall under conceptual categories that do not correspond to what could be perceptually processed from the physical world. Instead, symbolic properties typically relate to their objects through socio-cultural conventions that derive their information relative to their objects through “memory traces created during cultural learning” (Coppin 2014: 64). This coincides with how symbols in semiotics are also understood in terms of relating to their objects through convention. Coppin’s perceptual-cognitive conceptualization of symbolic properties is therefore quite consistent with the notion of symbolicity from semiotics. However, to prevent confusion, this integrated notion of symbolicity will be referred to as symbolic´.

3.3. Model extended to sound (and cross-modal representation)

During perception-action, when memory traces of an Item A become encoded (the cup in the previous example), Item A is encoded within a context with other items (Barsalou 2009), such as the table (Item B) that the cup was sitting on, my memories of how the cup felt when I grasped it (Item C), how it tasted (Item D), and how it sounded (Item E). Thus, as lower-level perceptual simulators and perceptual categories for vision develop over a lifetime, they develop in networks with other lower-level simulators and perceptual categories for hearing, touching, and seeing. This results in what is known as cross-modal correspondence. For example, in our physical environment, smaller objects vibrate at higher frequencies and larger objects at lower frequencies, so individuals developing in such an environment might thus “naturally” associate smaller objects with higher-frequency sounds and larger objects with lower-frequency sounds (Spence 2011). Such natural associations could explain non-arbitrary conventions that have emerged for representing sound that go far into human history, such as music notation systems (Figure 5), where higher-pitched sounds are represented by marks at higher elevations. Recent psychophysical research supports this claim (Parise, Knorre, and Ernst 2014).

Iconic Properties of Visual Graphics


Figure 5. Music notations use a convention where marks at higher elevations represent higher-frequency sounds. Note. Bach, J. S. (n.d.). Suite in G minor, BWV 995 [Lute score]. Retrieved January 15, 2015 from http://commons.wikimedia.org/wiki/File:Bachlut1.png

Let us now consider how iconic´ properties of sound can be conceptualized as the aural equivalents of the iconic´ properties of visual graphics. Similar to how light reflected from a marked surface of a graphic representation, picked up by retinal detectors, was conceptualized as the iconic´ properties of visual graphics, we may conceptualize iconic´ properties of sound as the sound vibrations propagated from an object picked up by the sensory receptors of the ear and perceptually processed as objects in an environment. Similar to how iconic´ properties of visual graphics are transduced into nerve signals and processed by lower-­level perceptual categorization and simulation capabilities that developed over a lifespan to enable perception of occluded surfaces and edges, sound vibrations may be transduced into nerve signals and processed by lower-­level perceptual categories and lower-­level simulators that developed over a lifespan to enable perception-­action within physical environments with topological and geometric properties. Thus, similar to how a visual designer can configure a marked surface of a graphic representation to produce iconic´ properties of visual graphics that make use of lower-­level perceptual categories and simulators to enable perceptual processing of surfaces and edges that are other than a marked surface, a sound designer could configure an audio device to produce iconic´ properties of sound that make use of lower-­level sound perceptual categories and simulators that enable perception-­action within a physical world with topological and geometric properties, to enable an individual to perceptually process topological and geometric relationships (perhaps via the Doppler effect, stereo, or echo) that are other than the device that is producing the sounds (“sonic pictures”).


Peter Coppin, Ambrose Li & Michael Carnevale

Let us next recruit the iconic´ and symbolic´ definitions to more carefully conceptualize sonic versions of visual charts and graphs.

3.4. Applying the extended model to the graphic-linguistic distinction

Applying these iconic´ and symbolic´ definitions to extend Larkin and Simon’s diagrammatic and sentential definitions requires replacing elements with symbols´ (as in Section 2.6) and sequentiality with symbolized´ relations.

Table 1. Diagrams are composed of iconically´ represented relations among symbolically´ represented objects and sentences convey symbolically´ represented relations among symbolically´ represented objects.

            Relations                    Objects
Diagram     Iconically´ represented      Symbolically´ represented
Sentence    Symbolically´ represented    Symbolically´ represented

In a visual or sonic diagram, the relations among symbols´ are conveyed iconically´ (see Table 1) because they are picked up by sensory receptors via light/sounds reflected from a surface, and what is meaningful about what is perceived is the perceptually processed relations among the symbols´. In contrast, sentential relations are conveyed symbolically´ (see Table 1) because, although iconic´ properties of visual graphics are picked up by sensory receptors via light reflected from a surface, what is meaningful about what is perceptually processed is the author’s intended simulation (meaning) and the conceptual category that the intended simulation is intended to fall under.

Let us now compare this chapter’s notion of diagram developed thus far with Peirce’s definition. In Peirce’s system, diagrams are a type of icon, and are defined by their “skeleton-like sketch of relations between parts” (Stjernfelt 2000: 363). The notion of diagram developed in this chapter thus far, where relations among objects are conveyed via an iconic´ perceptual feature (e.g., visuospatial location, pitch, tactile size, etc.), while objects are denoted symbolically´, is potentially synergistic with the Peirce-Stjernfelt account because iconically´ conveyed relations could be seen as akin to the aforementioned skeleton-like sketch of relations among parts. The word “diagrammatic” could thus be a term that refers to iconically´ conveyed relations among parts.

3.5. Applying the model to an example design problem

Let us now return to the WCAG text description example from Figure 1 in order to demonstrate how this model aids understanding and can inform design. The problem with the text description (Figure 1d) is that all content is conveyed symbolically´ (via text-to-speech), whereas the original visual graphic conveys much of the content via iconic´ properties of visual graphics. If a designer aims to present the chart sonically, how can the designer decide which aspects should be conveyed via symbolic´ properties (text-to-speech) and which aspects should be conveyed via iconic´ properties of sound (spatial sound)? Recall the pictorial and iconic´ definitions, where pictorial and iconic´ properties are predicted to afford the communication of concrete structures more effectively than symbolic´-linguistic properties, and an aspect of a graphic representation can be identified as more concrete if it produces a perceptual structure that corresponds to what could be picked up and perceptually processed from a physical environment. The shape contour (Figure 1b) is thus primarily pictorial-iconic, and is therefore more appropriate for translation using iconic´ properties of spatial sound. To determine the aspects of the graphic representation that should be conveyed via text-to-speech, recall the symbolic´ definition, where symbolic´ properties are predicted to afford the communication of concepts more effectively than pictorial or iconic´ properties, and a concept can be identified as less easily mapped back to a structure that could be picked up and perceptually processed from a physical environment. The numbers that label increments on the x- and y-axes are thus more symbolic´ because they cannot be mapped back to a perceptual structure that could be picked up from a physical environment.
Figure 6 shows a spark line visualization of financial cost over time3 (top left) and the corresponding spectral display of its sonic translation (top right). A spark line could be considered a concept (e.g., the value of a commodity) mapped to a concrete structure (e.g., a mountain’s varying elevation from left to right). A text description of the mountain’s shape would convey a string of conceptual categories that could refer to many possible concrete mountain shapes. However, sound frequencies that rise and fall with the elevation (spatial structure) of the drawn mountain (e.g., if panning from left to right) could translate the iconically´ conveyed spatial structure of the mountain more accurately. The figure shows two additions to the varying sound frequency that we postulate might aid shape comprehension: A baseline tone at middle C, to represent the graph’s x-­axis; and white noise that has passed through a bandpass filter with cut-­offs at the varying frequency and the baseline tone, to represent the graph’s shading.
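The mapping described above, in which sound frequency rises and falls with the elevation of the drawn line over a fixed baseline tone, can be illustrated with a small sketch. This is not the authors' prototype: the function names, the linear value-to-frequency mapping, the upper frequency limit, and the simple left-to-right amplitude panning are all assumptions made for illustration, and the band-pass-filtered noise that represents the shading is left out.

```python
import math

MIDDLE_C_HZ = 261.63  # baseline tone standing in for the graph's x-axis


def values_to_frequencies(values, top_hz=1046.5):
    """Linearly map data values onto [MIDDLE_C_HZ, top_hz] (assumed mapping)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat series
    return [MIDDLE_C_HZ + (v - lo) / span * (top_hz - MIDDLE_C_HZ) for v in values]


def sonify(values, seconds_per_point=0.1, sample_rate=8000):
    """Return (left, right) samples in [-1, 1], panning from left to right."""
    freqs = values_to_frequencies(values)
    samples_per_point = int(seconds_per_point * sample_rate)
    samples = []
    phase_line = 0.0  # phase of the varying tone that traces the line
    phase_base = 0.0  # phase of the constant baseline tone
    for i, freq in enumerate(freqs):
        # 0.0 = fully left, 1.0 = fully right
        pan = i / (len(freqs) - 1) if len(freqs) > 1 else 0.5
        for _ in range(samples_per_point):
            phase_line += 2 * math.pi * freq / sample_rate
            phase_base += 2 * math.pi * MIDDLE_C_HZ / sample_rate
            tone = 0.5 * math.sin(phase_line) + 0.5 * math.sin(phase_base)
            samples.append((tone * (1 - pan), tone * pan))
    return samples
```

Under these assumptions, a call such as `sonify([3.0, 4.5, 4.0, 6.2])` yields stereo samples that could be written to an audio file. A mapping that is linear in musical pitch rather than in hertz would be an equally plausible design choice.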

3 For a movie that demonstrates this prototype, please go to http://perceptualartifacts.org/agi/research/sonification/


Figure 6. Spectral translations (right) of visuals (left)

4. Conclusion

In this chapter we have proposed a provisional model to underpin the various accounts of the graphic-linguistic and iconic-symbolic distinctions described in the diagrammatic reasoning literature. The model allows us to extend the graphic-linguistic and iconic-symbolic distinctions into aural and cross-modal domains. In the process, we have attempted to integrate various definitions of key notions from cognitive science and semiotics, making our project an exercise in cognitive semiotics (cf. Zlatev 2015a). The model distinguishes between two interrelated capabilities: Lower-level perceptual capabilities to pick up and perceptually process concrete structures of an environment, and higher-level capabilities to process and interpret how perceptually processed structures fall under more abstract conceptual categories. The model conceptualizes iconic´ properties of both visual graphics and sound as what is picked up by sensory receptors and processed by lower-level perceptual categories and simulators that develop over a lifespan to enable individuals to perceptually process occluded surfaces and edges (topographical and geometric relationships) of a physical environment, thus enabling the perceptual processing of geometrical and topographical relationships that are other than the surface of the external representation.

The model thus predicts iconic´ properties to afford the communication of concrete structures more effectively than symbolic´ or linguistic properties. Also, symbolic´ properties are thought to emerge when an individual perceptually processes iconic´ properties of an external representation that an author intentionally configures to cause the perceiver to have a simulation that falls under the author’s intended conceptual category. The model thus predicts symbolic´ properties to afford the communication of conceptual categories more effectively than iconic´ properties. By reverse engineering the classic graphic-linguistic distinction to more fundamental perceptual principles, we have introduced a means to understand how the distinction applies to sonic representations. The proposed model streamlines definitions that distinguish diagrammatic from sentential structures: Text-sentences comprise symbolically´ represented relations among symbolically´ represented objects, whereas diagrams are iconically´ represented relations among symbolically´ represented objects. A sonic diagram is thus conceptualized as iconically´ conveyed sonic relations among objects linguistically (symbolically´) conveyed via text-to-speech. This proposed model enables researchers and designers to generate testable predictions for converting visual graphics into non-visual perceptual modes (other than the text-to-speech approach proposed by WCAG, which ignores the iconic properties of graphics).

Acknowledgements

This research was supported in part by grants from the Centre for Innovation in Data-Driven Design and the Graphics Animation and New Media Centre for Excellence. We would like to thank research assistant Damon Pfaff, Dr. David Steinman, the reviewers and the editors Piotr Konderak, Göran Sonesson, and Jordan Zlatev. The reviewers’ and editors’ detailed feedback has been invaluable in making this chapter a reality.

Part IV: Language, blends and metaphors

Todd Oakley

Chapter 19 Deonstemic Modals in Legal Discourse: The Cognitive Semiotics of Layered Actions

1. Cognitive semiotics and institutional discourse

Many approaches within cognitive semiotics presume that linguistic meaning is constituted by human interaction and intersubjectivity (e.g. Zlatev, Racine, Sinha and Itkonen 2008); therefore, models of language cannot be theorized absent some non-trivial usage-based account of linguistic structure, which arises from the social agency of participants as they cope with their elaborate symbolic milieus. If cognitive semiotics names a paradigmatic shift in the cognitive sciences toward language as one of many semiotic modes of fluent coping, and if part of our coping is decidedly related to institutions and their operations, then attention needs to be paid to language as an institutional phenomenon. Much of human meaning making in the West takes place in and among written documents, many of which play a determining role in the construction of social reality (cf. Searle 1995). On this occasion I seek to look at institutional language through the lens of modality, a pervasive feature of grammars and lexicons for expressing the speaker’s subjectivity in relation to events, actions, and states, especially as they pertain to other subjectivities. Standard linguistic studies of modality proceed as if it is simply a type of linguistic category and then investigate such forms and their syntactic distributions within and across clausal boundaries. Traditional linguistic methods have produced useful insights and generalizations upon which the present approach builds; however, this methodological decision hides the important role of institutional context in the shaping of meaningful communication. Sometimes the categorical intuitions of linguistic analysis break down in the face of inherent ambiguity.
The linguist then deploys a usual tactic of appealing to context for resolving an ambiguous usage, the implication being that such ambiguities are mere limiting cases of little relevance to the generalizing claims being made. Such tactics are entirely understandable, indeed are necessary, given their aims and purposes. However, what if the aim is to understand how language operates in specific institutional formations with complex histories of issuance, accumulation, interpretation, and citation, as is the case in cognitive semiotics? The cognitive semiotics of language outlined in this chapter focuses attention on its use in rigid institutional settings (De Jaegher 2013)1; specifically, the rigid institution of the Supreme Court of the United States of America (hereafter SCOTUS). The history of SCOTUS is that of a branch of American government that, before the Civil War, was considered the weakest of the three branches but that, largely as a result of the slowly gaining momentum of judicial review established in the case Marbury vs. Madison (1803), has come to be without argument the strongest branch of the federal government. To exemplify this change, consider the contrasting dispositions of presidents Andrew Jackson (1829–1837) and Dwight D. Eisenhower (1953–1961). Jackson felt perfectly entitled to ignore SCOTUS’s injunction against the United States “Indian Removal Act” in Worcester vs. Georgia (1832) while Eisenhower felt compelled to enforce their decision in Brown vs. Board (1954) ending racial segregation, even though it was politically the last thing he wanted to do. Clinton (1989) provides an historical account of the Supreme Court’s arrogation of judicial review, the powers to review and declare unconstitutional statutes and orders originating from the executive and legislative branches of federal and state governments, as the sociological equivalent of incremental increases of the Court’s power to push the recalcitrant and block the zealot; his narrative recounts the morphing of the Court from an institutional featherweight contender to heavyweight champion. A study of the English modal verb must in the rigid institutional setting of SCOTUS offers us a test case for cognitive semiotic research on language as an institutional practice. After a brief discussion of modal verb types, I proceed to discuss an inherently ambiguous instance of must that led me to posit the new, institutionally specific, category of the deonstemic. I then present the findings of a corpus analysis of must, must not, and mustn’t in 33 landmark majority opinions of the Court. I identify 6 discourse layers in SCOTUS in relation to this newly identified mode.

1 For De Jaegher, “rigid” means patriarchal, rule-based, hierarchical, and non-democratic institutions, which she contrasts with fluid, democratic institutions.
On the basis of this model, I proceed to show that the institutionally specific deonstemic mode arises when a statement’s meaning divides attention between the impositional world-to-mind fit of the justices (layers 1 and 2) and the descriptive mind-to-world fit constraints on their powers (layers 4–6).2

2 The notion of “direction of fit” between mind and world derives from the work of Searle (1995, 2004), and is utilized in the model presented in Section 5.3.

2. Linguistic modality: A tripartite model

The general approach to modality taken in this study closely follows the one outlined by Langacker (1987, 2008) and developed by Sweetser (1990: 49–75), and Talmy (2000: 440–452). I will also draw on the diachronic work of Bybee, Perkins and Pagliuca (1994: 175–242) for additional background. Cognitive Grammar (henceforth CG) takes as part of its portfolio the task of formalizing the “groundedness” of modality as a notional category, aligning it with the dialogical approaches of Bakhtin (1986) and Voloshinov (1986), and later those of Linnell (2005) and DuBois (2007), even though Langacker himself does not characterize CG as inherently dialogical. In CG, grounding “indicates a speech event, its participants (speaker and hearer), their interaction, and the immediate circumstances (notably the time and place of speaking)” (Langacker 2008: 259). Grounding is a critical element of CG, for it establishes the relationship between nominal referents and the status of events, actions, or states with respect to spatial and temporal reality. Grounding elements bridge the gap between isolated lexemes and their instantiation, as full expressions require grammatical operators specifying the status of the lexeme vis-à-vis the ground. According to CG, there are nominal grounding elements (e.g., a, the, this, that, some, every, each, all, none, no, etc.) and clausal grounding elements (e.g., -s, -ed, -ing, will, have to, should, etc.). Nominal grounding elements direct the hearer’s attention to the intended discourse referent, while clausal grounding elements direct attention to the status of the situation (the profiled relationship) with respect to “the speaker’s current conception of reality” (ibid: 259). Broadly speaking, the immediate forms of English modal verbs (may, can, will, shall, and must) and their displaced relatives (might, could, would, should) function as clausal grounding elements that derive their function from a force dynamic (Talmy 2000, see below) and future-oriented tendency toward action; their “potency,” as Langacker calls it, “inheres in the ground” (Langacker 2008: 305) and serves as the means of distinguishing their senses as either root/deontic, enunciatory, or epistemic. Of the modals, must is both the most semantically forceful and formally distinctive, in so far as it lacks a corresponding displaced relative (e.g., “you may/might (be able to) go to the county fair”; “you must/Ø go to the fair”). As a formal outlier, must gives rise, in certain circumstances, to ambiguous deontic/epistemic meanings, as I shall exemplify momentarily.
Generally, modal verbs express ability, intention, obligation, and permission as well as certainty, desire, necessity, probability, and possibility (for an overview, see Bybee et al. 1994). These general meanings are distributed over the various deontic, enunciatory, and epistemic usages. Let us review by way of examples the distinct modal usages of must.

2.1. Deontic (root) modality

Talmy (2000) offers a unified semantics of modal verbs in the form of force dynamics. The force associated with must is that of a stronger Antagonist (ANT) to a weaker Agonist (AGO), either in pushing the recalcitrant or blocking the zealot, as exemplified in (1) and (2).

(1) You must be home before dinner.
(2) A junior faculty member must not take on extensive administrative duties until after the pretenure review.

In both examples, an unspecified ANT forces a weaker AGO (you/junior faculty member), whose intrinsic force tendency is either toward inaction (1) or action (2). As Talmy notes, such force dynamic tendencies can be applied to basic (Aristotelian) physics, intra-psychological states, and socio-psychological situations, thus providing the “ancestral” embodied basis of the three modalities outlined in this section.

2.2. Enunciatory modality

Sweetser (1990: 69–73) refers to this type as “speech act” modality where the modality applies to either root or epistemic (discussed below) domains, but additionally, to the discourse ground. The frame of conversational interaction is invoked to organize the notion of compulsion: force dynamics is understood as illocutionary force. The notion here is that the speaker is divided into a weaker AGO who does not want to say X and a stronger ANT portion that is compelled to do so, often by ethical responsibility or objective/external circumstances. I, however, prefer the term enunciatory, to emphasize the special case of “putting onstage” the act of speaking. The designation “speech act” as a special kind of modal is too imprecise, since all deontic and epistemic modals carry illocutionary forces of either imposing or describing a state of affairs, making them speech acts. Enunciatory modals are, thus, special metalinguistic instances that dramatize the participants as speakers, which, if I am reading Sweetser correctly, is the primary motivation behind her analysis. Sentences (3) and (4) epitomize this type.

(3) Regrettably, I must insist that you plagiarized your term paper.
(4) AIG must never refer to Credit Default Swaps as “insurance” in any of its internal documents.

In (4), the dramatic business focuses on the proper “baptismal” practices of naming something in official documents.

2.3. Epistemic modality

Epistemic modality grounds the speech event in terms of the speaker’s reason and evidential dispositions, with must eliciting “inferred certainty” (Bybee et al. 1994: 179), as illustrated in (5) and (6).

(5) He must be home, for his car is parked in the driveway.
(6) She must have been devastated when the doctor gave her that diagnosis!

In (5), the car’s being in the driveway is a sufficiently compelling ANT for the speaker’s inferred certainty, and in (6), the speaker is forced by the compelling nature of the situation to conclude something about the mental/emotional state of the referent. In each case, a situation or piece of evidence is sufficiently powerful to compel a conclusion.

2.4. Ambiguous cases and the “necessity test”

As most linguists working with modals attest, there are many instances of inherently ambiguous interpretations where modals are either interpreted deontically or epistemically, as in (7). This utterance can either mean that the referred person is obliged to be there (deontic) or that the speaker is certain he is there (epistemic), depending on local context; the sentence either imposes or describes, but not both (cf. Sweetser 1990: 73). Consider now a real-world example (8), taken from the final sentence of the landmark SCOTUS opinion, McCulloch vs. Maryland (1819).

(7) He must be in his office.
(8) Such a tax must be unconstitutional.

In contrast to (7), this sentence can be read both as obliging future agents and agencies to rescind these tax laws, thus imposing a future state of affairs, and expressing certainty about an existing set of laws, thus describing a present state of affairs. As I shall argue in greater detail, such sentences gain their meaning and modal force from being both deontic and epistemic. One way to highlight the nature of this ambiguity is to apply a necessity substitution test to examples (1–7), in which we substitute must with a grammatically appropriate form of need. The frame semantics of need exclusively profiles the core element of REQUIREMENT, whereas must profiles REQUIREMENT and POSSIBILITY in equal measure (Baker et al. 1998), thereby revealing the subtle but significant differences in the meaning of must, as shown in examples (1´–3´; 5´–7´).

(1´) You need to be home before dinner.
(2´) ?A junior faculty member need not take on extensive administrative duties until after the pretenure review.
(3´) Regrettably, I need to insist that you have plagiarized your term paper.
(5´) *He needs to be home, for his car is parked in the driveway.
(6´) *She needs to have been devastated when the doctor gave her that diagnosis!
(7´) He needs to be in his office.

With the auxiliary need, sentences (1´), (2´), (3´), and (7´) pass the necessity substitution test, but sentences (5´) and (6´) do not.3 The epistemically pure cases of (5) and (6) include some kind of explicit reason for their epistemic nature, which ambiguous cases lack. It is precisely this lack of a reference to the agent judging the situation that renders (5´) and (6´) problematic. Thus, (5´) and (6´) can be rephrased as follows.

(5´´) (It is logically necessary for me to consider) him to be at home (for his car is parked in the driveway).
(6´´) (It is logically necessary for me to consider) her to have been devastated (as a result of the doctor’s diagnosis).

3 (2´) releases the subject from an obligation rather than imposing an injunction on him/her, thereby presuming a contrary state of affairs from (2). Nevertheless, the focus of attention is on the deontic rather than epistemic power(s) of the speaker.

Hence, epistemic modality is explained as attributing reasons that are more often than not made explicit, without, however, suggesting that the speaker has any ability to affect the situation. The semantics of need highlight a potential deontic power of the speaker to impose an obligation on the addressee. Cases like (7), on the other hand, are ambiguous precisely because the reasons are left unarticulated, inviting the audience to infer the reason, which is either based on evidence or on deontic powers of the speaker. The use of the infinitive copula be, however, harks back to prior evidence explicitly articulated in the discourse, which is why such examples are “pure” instances of epistemic modality: their reasons need to be made transparent, regardless of the speaker’s status.

Now, what about the proposed deonstemic modality? The necessity test highlights both the epistemic and deontic dimensions of the same utterance, as shown in (8) and (8´). In the framework of Speech Act Theory (Searle 1969), court statements have declarative force (because of the Court’s deontic powers), and therefore they have a double direction of fit and causation: mind-to-world and world-to-mind. This being the case, the Court’s judgment that something needs to be considered as X causes it to be of type X, giving rise to (8´´) and (8´´´).

(8) Such a tax must be unconstitutional.
(8´) (It is logically necessary for the Court to conclude) such a tax to be unconstitutional.
(8´´) (By this statement, you Americans need to recognize) such a tax to be unconstitutional.
(8´´´) (In America) Such a tax is unconstitutional.

Notice that in this analysis the epistemic force of the statement is blended with the deontic powers derived from the Court. And the ontology described by (8´´´) depends on the social ontology that the Court’s (rigid) deontology can create. In a non-­trivial sense, they are deonstemic.

3. Cognitive semiotics and the dynamics of rigid institutions

I wish to take this occasion to offer an initial study outlining the prospects and programs of a cognitive semiotics of institutional discourse. That cognitive semiotics has as one of its aims to account for the relation of language to its environment of use provides leverage for a better understanding of language as an institutional phenomenon, particularly written texts originating from institutions with rigid protocols for their composition and dissemination. Such texts fit the prototype of discourse described by the rhetorician Lloyd Bitzer in his famous essay, “The Rhetorical Situation” (1968),4 and extend the range of linguist Herbert Clark’s notion of layering (1996: 353–384). It is to these two notions that I now turn.

3.1. Rhetorical situations

Lloyd Bitzer provides us with a basic heuristic for thinking about language as situated discourse. According to Bitzer, all public discourse occurs in situations comprising three basic elements: exigence, audience, and constraints. For a situation to be rhetorical it must (a) address some seemingly imperfect state of affairs (exigence) that “invites utterance” (Bitzer 1968: 2) from a single person to a key demographic, to an entire nation and beyond. For example, a SCOTUS opinion addresses first a disagreement between two parties, the settlement of which has broad implications for the general polity. Secondly, (b) it must be addressed to an audience, or the audience must be invoked. That is, all rhetorical actions are directed at someone; SCOTUS opinions seem to have multiple and specific constituencies, not all of which are equally influential or relevant. Finally, (c) it must be issued from some place, at some time, and under specific conditions, the most obvious of which being the language used. These are the constraints. One of the constraints of a SCOTUS opinion is that it is the “final word” on a case; that is, the only body that can revisit the case is the Court itself. The fact that these opinions are permanently held, publicly available, and citable are key constraints. Exigence, audience, and constraints serve a useful purpose in guiding an analysis of the multiple layers of discourse that comprise SCOTUS opinions.

3.2. Layering

“People sometimes appear to say one thing when they are actually doing something quite different,” observes Clark in the opening sentence of the final chapter of Using Language (1996: 353). This observation leads Clark to propose that the discourses of fiction and other forms of pretense consist of multiple layers of joint action and events, such that two children can be simultaneously digging in a back yard in the present time (layer 1) and “prospecting for gold in the Dakota Territories circa 1876”, layer 2 (Clark 1996: 354–360). This joint pretense scenario has two layers with distinct roles (prospectors, such as Wild Bill Hickok and Calamity Jane) and values (a particular boy and girl), audiences or ratified participants (e.g., the boy can be the audience of the girl in layer 1; Wild Bill can be the audience of Calamity Jane in layer 2), bystanders (e.g., the boy’s mother calling him to dinner in layer 1, or the mother could “play along” and call Wild Bill to dinner, in which case the command issued to Wild Bill in layer 2 imposes an obligation on the boy in layer 1), constraints (e.g., language use in both layers is English but with contrasting semantic, referential, and prosodic features endemic to layer 1 or layer 2), and joint actions (e.g., digging up the flower bed in layer 1, while prospecting for gold in layer 2). Clark’s (1996: 353–386) account of layering focuses almost exclusively on situations of pretense in play, fiction, drama, irony and sarcasm, teasing, and other rhetorical phenomena, but it does not cover rigid institutional discourses; his account leaves out explicit mention of audience/addressee types. A proper account of the deonstemic modality comes sharply into focus when we examine these texts as multilayered artifacts, with each layer potentiating actions of different addressees. The upshot of my argument is that layering is essential to all textual ecosystems, and built into the legal discourse of legal systems such as SCOTUS.

4 Among rhetorical theorists, Bitzer’s “objectivist” account of the rhetorical situation has received explicit criticism, particularly by Richard Vatz in “The Myth of the Rhetorical Situation” (1973). Vatz argues that “exigence” is largely a creation of the speaker. Scott Consigny offers something of a Hegelian synthesis in his response “Rhetoric and its Situations” (1974) by suggesting that rhetorical situations are objective facts that ensue from prior rhetorical situations. None of these critiques, however, challenge the heuristic value of Bitzer’s original formulation, which is precisely the purpose for which it is employed here. My own view on the nature of the rhetorical situation aligns with Consigny’s.
This argument is, in part, based on the intuition that a statement that is ambiguously deontic (imposing) or epistemic (descriptive) is actually imposing at one layer of discourse and descriptive at another; other contextual factors conspire to determine the attentional salience of any illocutionary force of the utterance in question. Layering is particularly critical for the interpretation of legal documents, with SCOTUS opinions being perhaps the most elaborate, as they are the most influential. Primary participants are to imagine events, actions, etc. at higher layers, all the while appreciating why the authors and actors at the primary level created them. In the case of SCOTUS, one always has to appreciate the deontic power to perform declarative speech acts, even in cases where those speech acts are themselves ambiguous. The ensuing analysis can be regarded as a meshing of Bitzer’s elements of the rhetorical situation with Clark’s account of layering and is illustrated in the next section.

4. Case study: 33 influential SCOTUS opinions

The present corpus of the most influential Supreme Court opinions is gathered from the Constitution Society’s site Landmark Supreme Court Decisions (Roland, n.d.). According to the site’s compiler, Jon Roland, their influence is measured by (1) the number of citations as precedent in other court opinions (from SCOTUS and the lower courts) and (2) the number of citations in articles in law journals, law reviews, and law school textbooks. My analysis found 893 total instances of must (including negations) using Voyant Document Tools. I then proceeded to classify each instance as either deontic, enunciatory, epistemic, or deonstemic.
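The tally of must can be approximated outside Voyant with a few lines of code. What follows is a minimal sketch, not the procedure actually used for this study: the function name count_must, the regular expression, and the split into plain and negated forms are illustrative assumptions, and a tool such as Voyant may tokenize differently, so totals need not match exactly.

```python
import re
from collections import Counter

# Hypothetical pattern: "must", "mustn't", or "must not" (case-insensitive).
# The trailing word boundary keeps words such as "mustard" out of the count.
MUST = re.compile(r"\bmust(?:n['’]t)?\b(\s+not\b)?", re.IGNORECASE)


def count_must(text: str) -> Counter:
    """Count plain and negated instances of 'must' in a text."""
    counts = Counter()
    for match in MUST.finditer(text):
        form = match.group(0).lower()
        negated = "n't" in form or "n’t" in form or match.group(1) is not None
        counts["negated" if negated else "plain"] += 1
    counts["total"] = counts["plain"] + counts["negated"]
    return counts


if __name__ == "__main__":
    sample = "We must decide. It must not stand. One mustn't guess."
    print(count_must(sample))
```

Run over the full text of each opinion, a count of this kind yields the pool of instances that are then classified by hand.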

Deonstemic Modals in Legal Discourse


I outline the classification criteria for each modal type in the sections below. To illustrate each type, I present 8 examples, 1 from Marbury vs. Madison (1803), a decision establishing the doctrine of judicial review, and 7 from Furman vs. Georgia (1972), a decision placing a moratorium on the death penalty as cruel and unusual punishment and eliminating the death penalty in cases of rape. (The first part of this decision was overturned in Gregg vs. Georgia, 1976.) This choice is merely one of convenience, a result of culling examples from a specific part of the master spreadsheet.

4.1. Deontic must

Instances of the deontic modality collocate with the imperative mood and with active verbs of compliance, as in (9). Equally often, deontic must collocates with be + past participle as grounding elements of a statement of prudence, as exemplified in (10).

(9) It must observe a fastidious regard for limitations on its own power, and this precludes the Court’s giving effect to its own notions of what is wise or politic.
(10) But the proper exercise of that constitutional obligation in the cases before us today must be founded on a full recognition of the several considerations set forth above.
Furman vs. Georgia, 408 U.S. 238 (1972)

4.2. Enunciatory must

Enunciatory instances collocate with speech-act verbs and phrases, such as to note and admit.

(11) It must be noted that any equal protection claim is totally distinct from the Eighth Amendment question to which our grant of certiorari was limited in these cases.
(12) I must also admit that I am confused as to the point that my Brother POWELL seeks to make regarding the underprivileged members of our society.
Furman vs. Georgia, 408 U.S. 238 (1972)

4.3. Epistemic must

Instances of epistemic uses keep the focus of attention on reasonableness or its opposite; linguistically, the modal collocates with the present perfect verb phrase, as exemplified in (13). Equally often, however, the modal collocates with simple present and a verb of deliberation, such as conclude in (14).

(13) The validity of his appointment must have been determined by judicial authority.
Marbury vs. Madison, 5 U.S. 137 (1803)
(14) One must conclude, contrary to petitioners’ submission, that the indicators most likely to reflect the public’s view – legislative bodies, state referenda and the juries which have the actual responsibility – do not support the contention that evolving standards of decency require total abolition of capital punishment.
Furman vs. Georgia, 408 U.S. 238 (1972)

4.4. Deonstemic must

With the deonstemic modality, the attention oscillates between reason and obligation, with reason being the foundation of obligation. Instances like (15) collocate with the infinitive be. Equally often, however, deonstemic usages are embedded in conditionals as in (16).

(15) He is condemned to painful as well as hard labor. What painful labor may mean we have no exact measure. It must be something more than hard labor. It may be hard labor pressed to the point of pain.
Furman vs. Georgia, 408 U.S. 238 (1972)
(16) But if an innocent man has been found guilty, he must then depend on the good faith of the prosecutor’s office to help him establish his innocence.
Furman vs. Georgia, 408 U.S. 238 (1972)

In (15) and (16) reason and obligation are two sides of the same coin. With (15), the emphasis tends toward the epistemic, but only as a means of setting up a pragmatic scale for deciding whether some punishment is cruel and unusual. With (16), the emphasis tends toward the deontic, with the apodosis expressing an obligatory condition for the protasis. We can spot this differential emphasis with the application of the necessity test in (15´) and (16´). Sentence (16´) has a much stronger deontic tincture than does (15´).

(15´) He is condemned to painful as well as hard labor. What painful labor may mean we have no exact measure. It needs to be something more than hard labor. It may be hard labor pressed to the point of pain.
(16´) But if an innocent man has been found guilty, he then needs to depend on the good faith of the prosecutor’s office to help him establish his innocence.
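The collocational cues described in 4.1 to 4.4 can be turned into a rough first-pass sorter. This is only a sketch under stated assumptions: the classification reported here was done by hand, the word lists and the order of the rules are hypothetical, and borderline cases such as (15) and (16) would still call for the necessity test and a human reading. The function presort and the sets SPEECH_ACT and DELIBERATION are names introduced purely for illustration.

```python
import re

# Hypothetical cue lists; a real study would extend and validate them.
SPEECH_ACT = {"note", "noted", "admit", "admitted", "say", "said",
              "add", "added", "confess", "concede", "conceded"}
DELIBERATION = {"conclude", "infer", "suppose", "assume"}


def presort(sentence: str) -> str:
    """Suggest a provisional modal type for the first 'must' in a sentence."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    if "must" not in tokens:
        return "none"
    i = tokens.index("must")
    window = tokens[i + 1:i + 4]  # up to three words following the modal

    # 4.2: speech-act verbs ("must be noted", "must also admit")
    if any(tok in SPEECH_ACT for tok in window):
        return "enunciatory"
    # 4.3: perfect ("must have been ...") or a verb of deliberation
    if window[:1] == ["have"] or any(tok in DELIBERATION for tok in window):
        return "epistemic"
    # 4.4: modal embedded in a conditional
    if "if" in tokens[:i]:
        return "deonstemic"
    # 4.1 versus 4.4: "must be" + participle is deontic, bare "must be" is not
    if window[:1] == ["be"]:
        following = window[1] if len(window) > 1 else ""
        return "deontic" if following.endswith("ed") else "deonstemic"
    # 4.1: default case, an active verb of compliance
    return "deontic"


if __name__ == "__main__":
    # Shortened version of example (11)
    print(presort("It must be noted that any equal protection claim is distinct."))
```

The order of the checks matters: the speech-act test has to precede the be + participle test, otherwise must be noted in (11) would be sorted with must be founded in (10).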

4.5. Quantitative analysis of must

Table 3 presents the aggregate breakdown of instances of must by type, with their corresponding percentages. The largest percentage of instances falls within the deontic category, with epistemic coming in a distant second. Instances of deonstemic modality finish a close third, with enunciatory usages finishing way behind the pack. It appears then that the deonstemic is a significant linguistic phenomenon of SCOTUS decisions. Understanding the operative layers of discourse in SCOTUS opinions will be aided by examining the constituent layers in detail.



Table 3. Quantitative Analysis of must in SCOTUS Opinions

Type          Percentage
Deontic       61.2 %
Epistemic     20.3 %
Deonstemic    15.25 %
Enunciatory   3.25 %

5. Discourse layers in SCOTUS opinions

Before proceeding to the case study, it is perhaps appropriate to explain how the six layers of discourse outlined below might plausibly be regarded as a cognitive model.

5.1. Units of analysis in a cognitive ecology

Clark’s examples of layered discourse comport reasonably well with traditional theories of cognitive science that take the individual brain and body as their unit of analysis. Both participants in the backyard Dakota drama had the requisite mental skills that allowed them to imagine and appreciate the differences between layers of pretense and reality, and we can safely presume that this joint action did not tax working memory and attention to a degree requiring elaborate symbolic scaffolding.5 The six-layer model used to capture institutional language practices of SCOTUS runs into trouble if the unit of analysis remains inside the skull, or even if it is extended only from the brain to the (non-neural) body, for it is difficult to imagine how typically developed individual minds could develop such elaborate modes of thinking without extensive external scaffolding. Over the last few decades, cognitive science itself has been changing, such that the role of human cognition cannot be confined simply to brains with bodies. Under this view (variously referred to as distributed, embedded, embodied, or enactive cognitive science), human cognition is irreducibly ecological. Much of what we call thinking is distributed over the brain, the (non-neural) body, and an environment consisting of objects, tools, texts, other individuals, and institutional structures (Varela, Thompson, and Rosch 1991; Clark and Chalmers 1998; Hurley 1998; Noë 2004; Wheeler 2005; Thompson 2007; Chemero 2009; Hutchins 2010). It is notable that such theories find inspiration from some titrated mix of phenomenology (Husserl [1900] 1970; Heidegger [1927] 1962; Merleau-Ponty [1945] 1962), ecological psychology (Bateson 1972; Gibson 1979) and cultural-historical activity theory (Vygotsky 1978; Wertsch 1985; Lave 1988), aligning them with cognitive semiotics (cf. Zlatev 2015a). The present model of discourse fits within this “cognitive ecology” tradition, wherein the symbolic environment is seen as scaffolding much of our thinking, communication, and action. Take away the elaborate documentary modes of permanent script, and the entire enterprise fails to function as a unit of cognition. The multiplex layers of discourse manifest in any given SCOTUS opinion need not be the property of any single justice (although something like them might very well have been internalized by experienced practitioners), for the model specifies the social ontological properties of a textual ecosystem that constitutes jurisprudential reasoning. Once in place, the model at the very least offers the heuristic value of revealing the way language works to distribute attention and mental resources to different facets of jurisprudence.

5 Although traditional theories tend to underestimate the role the built environment plays in scaffolding the “internal” mental operations that make such pretend play possible.

5.2. A jurisprudential deontology

Consistent with the cultural-historical dimensions of cognitive ecology, a close examination of the precise notion of American jurisprudence is propaedeutic to a proper understanding of the six-layer discourse model presented in the following section. At base, the Court possesses its powers only to the extent that it embodies the twin principles of prudence and constitutionality. In the present context, the principle of prudence means that justices should be as cautious and conservative with their decision making as practically possible. Relatedly, the principle of constitutionality entrains justices to follow precedents of prior decisions (known in legal lingo as stare decisis); the degree to which justices follow this principle is a matter of great contention. Historical circumstance and evidence can override these principles, such as when the Court overturned Plessy vs. Ferguson (1896) in the Brown vs. Board (1954) decision, effectively delegitimizing the principle of “separate but equal” and ending de jure racial segregation in public schools. Cases have holdings that are regarded as legally binding and provide constraints or signals to other courts, legislators, and executives about the creation of law, of legislation, and the implementation of policy.

5.3. SCOTUS: Six Layers of discourse

A summary view of the six layers of SCOTUS discourse in Figure 4 shows the bottom constituting the primary layer, and each successive layer supervening on the previous layer. Thus, events represented in the sixth layer are only relevant in relation to the structure and function of the lower layers. Higher layers of discourse pertain to temporal and spatial displacement from the genius loci of the Court and its proceedings, as represented by the “deictic arrow” on the right of Figure 4.



Figure 4. Six Layers of SCOTUS opinions

5.3.1 Layer 1: decision (rights). When an opinion is published, the press focuses on its implications for future legislation and policy by executives. Thus, legislators and executives are the primary ratified audience, for they are the ones who can effect change or are to be compelled to do so. Private interests may also be immediately affected, but this is only to the extent that they are acting in anticipation of compelling force from legislators or from an elected executive (i.e., the President of the United States, governors, cabinet level secretaries and agencies, law enforcement officials, and so on). Interested bystanders include attorneys and other legal authorities, many of whom will serve as ratified parties in future legal, legislative, policy, and academic contests. This layer represents the naked power of the Court to mold social reality in a world-to-mind direction of fit (Searle 2004: 118), such that the world comes to resemble the minds of the justices. Temporally, layer 1 operates according to the present and future; interpersonally, joint actions at this layer involve sitting Associate Justices and the Chief Justice; linguistically, constructions operating unambiguously at this layer include such commonplaces as the court rules that this law violates the Equal Protection Clause; this court; and no state can impose such a tax.

5.3.2 Layer 2: rationale (responsibility). Layer 2 focuses attention on the prudential principle. The responsibility of the justices is to be reasonable and rational, for the reputation of the Court is of abiding concern to all justices but especially for the Chief Justice, whose name brands the Court during his tenure (e.g., the Warren Court after Justice Earl Warren; the Roberts Court after the sitting chief justice, John Roberts). There is a sense that other decisions, being final, need to be given due consideration and influence.
The decision cannot come out of the blue, and the ratified audience can potentially resist if the Court is seen as unreasonable and overly cavalier. The latest ruling on the Affordable Care Act (a.k.a. Obamacare) in National Federation of Independent Business et al. vs. Sebelius, Secretary of Health and Human Services, et al. (2012) is interpreted as Chief Justice Roberts (the vote
that tipped the decision in Obama’s favor) recognizing that such a decision, coming in the wake of several conservative decisions that have broken with precedent, is likely to cast the Court as a rogue institution, ultimately undermining its authority in the long run. In short, layer 1 brings with it the discourse expectation that “what we say goes”, while layer 2 highlights the fact that the power of the justices in layer 1 is not absolute. Actors in layer 1 appreciate the obligation to reasonability specified in layer 2 and are often entrained to imagine the consequences of abusing power. At layer 2, the principal participants are the justices, but the ratified participants include the general public and future general publics; thus explicit reference to the Constitution, to the principle of stare decisis, and to other standards of reason is of strategic importance at this layer. Temporally, the concern is preserving as much continuity with the past as is morally and practically prudent, while at the same time focusing attention on the long-term impact of the decision. Thus the temporal scope is similar to layer 1, but perhaps with an extended scope of the distant future of the Court. Linguistically, stock phrases like We must always remember that… are used as admonishments to abide by certain rational and jurisprudential principles. At layer 2, the discourse tends to focus on the possibility that the “final word” of the Court may not be its “last words,” or that its “final words” can always become “unjust actions,” as happened in the infamous Dred Scott vs. Sandford (1857), regarded as the worst opinion in the history of the Court and held liable for precipitating the Civil War.

5.3.3 Layer 3: enunciatory (address and assertion). Justices argue with one another. This layer focuses on justices as personalities who differ, sometimes very stridently, with the opinions of their contemporaries and predecessors.
This layer dramatizes the compelling nature of certain speech acts, which are often performed reluctantly. Deliberation and disagreements are as uncomfortable and face-threatening as they are obligatory. Enunciatory uses of must ameliorate the face-threatening dimensions of agonistic discourse. These instances can be regarded as illocutionary force indicating devices (cf. Brown and Levinson 1987). At this layer, the primary participants are the justices themselves, each of whom is speaking either for himself or herself or for the Court. Linguistically, phrases such as I must admit/disagree/demur, etc., or It must be admitted that function as pre-disagreements. The second example, which appears about a half-dozen times in the corpus, fits with Sweetser’s (1990: 73) contention that enunciatory modality keeps close company with epistemic modality.

5.3.4 Layer 4: epistemic. Justices are human beings, subject to the same powers of reason as others, and they often must appeal to doxa (common opinion). Epistemic modals that are unambiguous trade on this notion of inferred certainty common to us all, especially in topics associated with common sense opinion. We take what is certain as a basis for deliberating about topics of inherent uncertainty. This layer legitimizes layers 1 and 2 by way of a fortiori reasoning; if the justice is acting reasonably by doxastic standards, then we must be confident that he or she is being reasonable according to jurisprudential standards. To be sure, this is a social ontological expectation, and virtual sparring over the justices’ own motives and competencies has become something just shy of a contact sport
among American citizens and pundits. Skepticism aside, layer 4 trades on the general appeal of reason and rationality as a basis for drawing conclusions. If a justice is certain of something, we presume he or she is certain for a good reason and thus is compelled by reason to draw a conclusion. Linguistically, stock epistemic constructions, such as I must conclude and It must have been the case, with a temporal focus either on the present or on the present perfect, operate most naturally at this layer.

5.3.5 Layer 5: deontic (order). Justices are also obliged to take courses of action that meet standards of reasonability and rationality in the order of adjudication. Such standards are often dictated by the order of adjudication followed by the lower courts. Cases are complex with many dimensions and parts. The order of decision cannot be seen as arbitrary but must flow from the logic of the case. Linguistically, this layer is epitomized by such commonplace phraseology as We must now proceed to decide the next question, and The court must subsequently decide that question only after dispensing with this one. Temporally, this layer focuses attention on the here-and-now of the opinion and the immediate future, exemplifying a genre-specific type of discourse deixis.

5.3.6 Layer 6: narrative. The very reason for the Court’s existence is the fact that some event or set of events happened in the past. There is a plaintiff and defendant, two parties in conflict. In most instances (except in the very small range of case types in which the Court has original jurisdiction, such as treaties), cases before SCOTUS were adjudicated by lower courts with original jurisdiction and then subsequently by the lower appellate courts. With rare exception, plaintiffs and defendants have sought remedy in other jurisdictions, with this as the final contest.
Thus, William Henry Furman can seek remedy from the State of Georgia and the United States government can establish a National Bank free from excise taxation by the State of Maryland. Linguistically, SCOTUS opinions are replete with narrative of past events pertaining to the case and of past proceedings in the lower courts. In fact, for many opinions a majority of textual “real estate” is given over to description of past events, actions, and states of affairs, all of which provide the basis for the Court’s verdict. Layer 6 is perhaps the most voluminous, but narrative particulars are rarely a substantial part of the quoted material in other SCOTUS opinions. (Quoted material from Marbury vs. Madison (1803), the most cited opinion, consists of a mere six sentences, none of which pertains to the case itself; Oakley and Tobin 2014.)

5.4. Back to McCulloch vs. Maryland (1819)

How can we apply the model to the final statement from Chief Justice John Marshall’s opinion ruling against Maryland’s tax on a national bank from (8), repeated as (17)? Here, the emphasis tends toward the deontic, given that it is the final pronouncement of the decision, but the pronouncement is packaged in a clause that
gives considerable weight to the epistemic layer (hence the capital marking of layer 1 and the highlighted marking of layers 2 and 4 in Figure 5).

(17) This tax must be unconstitutional.

Figure 5. Six Layers of Discourse in McCulloch vs. Maryland (1819)

Imagine the alternative construals in (17a) and (17b). With (17a), we get a very different implication, with attention focusing heavily on layer 4 and (perhaps) layer 2. The illocutionary force of this utterance would be epistemic and descriptive, as if the Court were merely categorizing past laws with little regard for subsequent law (see Figure 6).

(17a) This tax must have been unconstitutional.
(17b) This tax needs to have been constitutional.

Applying the necessity test to (17a) yields the awkward (but still grammatical) construal of (17b) (which requires a change to the final adjective to make it coherent). While (17a) focuses attention on layer 4, (17b) refocuses attention back to layer 1, constituting an authoritative rebuke of a past legislative deed. This version might even lead addressees to expect the court to issue a specific remedy, such as “But the tax is clearly repugnant to the Constitution;6 thus, the State of Maryland must never have enacted such a law in the first place.”

6 The phrase “repugnant to the Constitution” is a favorite of John Marshall, Chief Justice and author of this opinion, and appears in many of his most famous opinions.



Figure 6. SCOTUS Layers applied to example (17a)

We may thus conclude that the notion of a deonstemic emerges from the focal attention to layers 1 and 2, two layers not fully present (or not in the same way) in other governmental institutions in the United States. What is more, this is an institution with a rich textual archive constantly in play, but its texts are subject to rigid procedures for their composition and dissemination. Thus, an appreciation of the meaning of modal verbs, particularly the most forceful of modal verbs, must, helps us understand the dynamics of force and counterforce at the interface of semiotics and the social world. The pragmatic function of deonstemic modality is to package a pronouncement that conforms to the epistemological expectations at layer 2, so as to ameliorate the patina of the bald exercise of power. One must both appreciate the forcefulness of the pronouncement and appreciate the reasonableness and prudence thereof, even though, at base (layer 1), the social ontology of SCOTUS boils down to “because we say so”!

6. Conclusion

From a linguistic perspective, Sweetser (1990) correctly concludes that “modals can be used either to impose or to describe (report) in both the content and speech-act domains but can only describe in the epistemic domain” (ibid: 73). From a cognitive semiotic perspective, Sweetser’s conclusion is not fully adequate. True, epistemics describe while deontics impose, but deonstemics both impose and describe, and such instances only become apparent in specific types of rigid institutions. An analysis of 33 of the most influential Supreme Court opinions suggests that there is a language-specific category of deonstemic modality for must whose purpose is to blend our obligation with reason, and that such “modal blends” are critical for understanding how certain institutions build and maintain trust and predictability among
the population(s) they govern. In American jurisprudence, the power to impose derives in part from a view of the Court as a deliberative and reasonable body, meaning that its decisions have to be seen as constrained by something other than naked power.7 The implications of this study for cognitive semiotics are both theoretical and methodological. Theoretically, this study highlights the need for cognitive semiotics to attend carefully and systemically to the fields of rhetoric and persuasion as a tradition of inquiry that can provide substantial theoretical framing for the study of cognition in context, a basal stipulation of cognitive ecology. This chapter emphasizes the need for a disciplined application of the very idea of “rhetorical situation” as a confluence of exigence, audience, and constraint to the study of documents as parts of a symbolic ecosystem capable of defining and constraining thought and action over time and among different agents. SCOTUS opinions are just one manifestation of such documentary systems. Methodologically, this study highlights the need for additional patient investigations of the “symbolic output” and uses of similarly rigid institutions. For instance, a prima facie investigation of the United States Senate Permanent Subcommittee on Investigations’ 646-page official report, Wall Street and the Financial Crisis: Anatomy of a Financial Collapse, reveals no instances of deonstemic must in the entire document, suggesting that this construction may be specific to jurisprudential institutions (and perhaps it is sui generis to common law (i.e., Anglo-American) institutions in contrast to both civil and religious legal institutions). On the other hand, a cursory examination of should hints at the possibility of a different form of deonstemic usage with less compulsory but no less normative effects. But I must leave this question for another occasion.

Acknowledgements

In addition to the two anonymous reviewers, the author thanks the following colleagues for their thoughtful comments and advice: Mihailo Antović, Line Brandt, Per Aage Brandt, Peer Bundgaard, Seana Coulson, Anders Hougaard, Vladimir Figir, Esther Pascual, Frederik Stjernfeldt, Eve Sweetser, Vera Tobin, Mark Turner, Đorđe Vidanović, Jordan Zlatev, and Svend Østergaard.

7 The Court’s reputation as a reasonable body is never uniform, and its reputation has suffered in recent years, but not so much so that the other branches of government, national or local, can feel confident in explicitly refusing to obey its edicts, as Andrew Jackson did in 1832.

Vlado Sušac

Chapter 20
Commutation of Cognitive Source Domains as a Semiotic Tool for Paradigmatic Analysis

1. Introduction

The theoretical postulate of this chapter is based on an attempt to unify the traditional structuralist approach in semiotics and the cognitive theory of conceptual metaphor. The former is based on the definition of sign as a connection between the signifier and the signified, realized through the richness of syntagmatic and paradigmatic relations and possibilities within the message (Saussure 1916). The latter represents a relatively new theory of metaphor, which at the end of the last century transferred this seemingly trivial linguistic phenomenon from the field of stylistics into the area of cognition and human ways of thinking (Lakoff and Johnson 1980, 1999). My aim is to test the possibility of replacing certain dominant metaphorical concepts, primarily in political discourse, with alternative ones. It does not mean giving preference to one type of conceptualization over another (e.g. viewing institutions or political organizations as PLANTS instead of MACHINES), but examining the conditions and mechanisms under which such exchanges are possible, if possible at all. A precondition for such a possibility is the formal overlapping of the target metaphorical domains as the first step of analysis. Smaller or more significant differences in connotative meanings may be further evaluated by placing lexical realizations deriving from such an alternative metaphorical concept into the original context. Consequently, the contextualization of signs and their related concepts inevitably introduces corpus analysis as an additional method of research, which implies the usage of formerly published papers and classifications of political discourse according to its source and target metaphorical domains.
Due to the complex structure of such an approach and in order to facilitate the understanding of the theoretical assumptions it relies on, the argumentation is structured as follows. Section 2 summarizes the basics of the paradigmatic analysis with an emphasis on commutation tests and pertaining semantic implications. Section 3 presents the theory of conceptual metaphors, primarily developed within cognitive linguistics and later applied to cognitive semiotics through the shift from verbal to pictorial signifiers, conceptually structured in a similar way. In doing so, special attention is paid to metaphorical interactions, i.e. the possibility of diversification and multivalency of conceptual domains, which opens the possibility of alternating various tenors and vehicles. This is followed by the presentation of the research corpus in Section 4, which provides the basis for the conceptual classification of
metaphorical expressions in political discourse, with particular emphasis on the examples of diversification of source domains. The final step is the transition from a formal to a contextual level of analysis by separating the examples of “conceptual synonymy”, where various source domains overlap in their meanings, i.e. lead to the same target domain in the context given. This, of course, is possible only with the syntactic adaptation of lexical items to the replacing concept. This method aims to explain what conditions have to be satisfied so that the paradigmatic replacement of metaphorical source domain can be successful and whether such a change of metaphorical images may have additional connotative and even ideological implications, as suggested by Goatly (2007).1

2. Paradigmatic relations

Semiotic analysis as we know it today has had its own course of development, scattered in many directions from purely structural to social semiotics. Still, we cannot avoid the fact that its origins, at least in the European tradition, stem from the Swiss linguist Ferdinand de Saussure, who suggested semiotics as a general theory applicable to all sign systems. It was exactly the concept of system that Saussure gave not only to linguistics, on which he based the majority of his theoretical assumptions, but also to other scientific disciplines. Structuralism in itself involves two fundamental types of relationships between signs, which make the sign itself a system of differences. These are syntagmatic and associative relations (Saussure 1959: 123), the latter prevailingly known today as paradigmatic under the influence of the Prague School (Vachek 1966). The former are also called relations in praesentia because they include the basic characteristic of the combination of elements present in the statement, while the latter are called relations in absentia because they refer to the possibilities of a selection of elements from the system and their incorporation into the syntagm. Although this is a logocentric approach, since Saussure recognizes such relations as the essence of langue (language as a system of signs), some semioticians have argued that it is applicable to other sign systems (cf. Barthes 1967; Eco 1976). Syntagmatic relations are mostly recognized and observed through spatiotemporal combinations, either in narrative procedures in film and literature or in the relations of the peripheral and central in images or in any other possible combinations applicable to the selected media, which could have an effect on a certain shift in meaning (Chandler 2007).
It is necessary to note that syntagmatic relations do not necessarily have universal characteristics since many of them are culturally conditioned, e.g. reading direction from left to right or vice versa, or from the top down, etc. (Kress and Van Leeuwen 1996). The paradigmatic relations are much more complex because the vertical axis of selection includes significantly more choices. Structural semiotics has borrowed from linguistics two fundamental procedures of paradigmatic analysis, namely, binary oppositions and the commutation test. In terms of methodology, these procedures allow the examination of the impact on the meaning by replacing one signifier with another. The paradigmatic axis of selection is handled by a very wide range of possible and alternative signifiers with respect to gender, age, class, race, ethnicity, and so on. The commutation test, however, is not limited to paradigmatic analysis but may also be applicable to syntagmatic relations through the procedures of addition and deletion of existing elements in the semiotically observed text (Chandler 2007: 90).2 Particularly interesting is the relation between marked and unmarked forms as part of paradigmatic analysis (Jakobson and Waugh 1979: 92). Its purely linguistic application has spread to semiotics, whereby it is taken into account that unmarked forms are the most common and dominant and therefore less visible as an expression of dominant values within a particular culture. This is largely present in the semiotic analysis of pictorial advertisements, where, for example, we can test the connotations produced by replacing a female with a male person signifier, young with old, etc. Especially provoking are the cases when an unmarked sign is replaced with a marked sign, given the fact that the former is less noticeable and therefore reveals the dominant values of a particular culture.

1 Ideology as a term, given its vagueness, can have numerous interpretations and any attempt to elaborate them would exceed the limits of a single article and the purpose of this work. Therefore, the term is used in its rather broad sense corresponding to the definitions offered by Van Dijk (1998: 8) as “the basis of the social representations shared by members of a group” and by Hodge and Kress (1988: 6) as “a level of social meaning with distinctive functions, orientations and content for a social class or a group”.
In other words, very often by questioning such forms it is possible to reveal particular myths within the observed community that consider such signs unquestionable and thus “natural”, which leads us to the most concise definition of myth as “naturalization of the cultural” (Barthes 1991: 128). Since, as already explained, the foundation of structural semiotic analysis is based on replacing one signifier with another, and even one type of media with another, which verifies the impact on the meaning of such alternative forms, it is legitimate to ask whether the same procedure can be applied to the replacement of one metaphorical vehicle or one metaphorical concept with another that leads to an approximately similar meaning. The starting point for this procedure we find within the framework of Conceptual Metaphor Theory (CMT), which is applicable to both linguistic and visual signs, where the latter can also be subjected to various commutation tests or binary oppositions (male – female, black – white, etc.). This process is particularly known in the field of advertising where visual metaphors frequently replace or supplement the verbal ones by applying the same process of mapping from source to target domains (cf. Forceville 1996). The basic features of CMT are presented in the next section, where we argue that the mentioned process of paradigmatic replacement of one metaphorical element with another is not merely a formal operation, as its implications can also be found on the cognitive level. The thesis of linguistic relativity and the influence of language on thought gains extra significance with the theory of conceptual metaphor because it deals with a different conceptualization of reality, attributed by Goatly (2007) to different ideological codes.3

2 Sonesson (2009: 44) notes that commutation as a procedure in linguistic analysis can be seen as a special case of the operation of ideation in Husserlian phenomenology.

3. Conceptual metaphor theory

Metaphor as a linguistic phenomenon was known far back in ancient times as a rhetorical device of good speaking (ars bene dicendi). Later, it was even considered a mere and unnecessary language ornament or an obstacle to the clear understanding of the message. A revolutionary shift occurred at the end of the last century when metaphor began to be recognized as an omnipresent and natural cognitive phenomenon used since the beginning of humankind in order to conceptualize reality. In this way, conceptual metaphor has become the focus of much research in cognitive linguistics, and its application is increasingly found in cognitive semiotics, where the analysis is transferred to other sign systems. Lakoff and Johnson (1980), later followed by many others, have argued that we are not only surrounded by metaphors in their ubiquity, but we also live by them even when we are not aware of them.4 The first distinction in the new approach is the one between conceptual and linguistic metaphors. Conceptual metaphors are viewed as a natural part of human thought, and linguistic metaphors as a natural or intrinsic part of human language (ibid: 247). Conceptual metaphor is explained as “understanding and experiencing one idea in terms of another” (ibid: 5). This process, known as mapping, occurs between the source domain of our mainly bodily experience and usually, but not necessarily, the more abstract target domain, e.g.:

(1) Flames of passion surged through their veins.
(2) Her inextinguishable love will never disappear.
(3) His speeches fuelled the fires of protest.

3 Linguistic relativity is a commonly used term that refers to the “weaker” form of the so-called Sapir-Whorf hypothesis. There is growing consensus nowadays that language does not determine, but can have an impact on, thoughts and cognition (cf. Casasanto 2008).
4 For other references see the special issue on CMT in Fusaroli and Morgagni, eds. (2013). For the overview and response to some critical views of CMT (Zinken, Rakova, Gevaert, etc.) concerning methodology, direction of analysis, schematicity of metaphor, embodiment, and the relationship between metaphor and culture, see Kövecses (2008). Further useful insight into the CMT criticism is provided by Gibbs (2013).

Here we find three examples of linguistic metaphors identified through their key lexical items as metaphorical vehicles. The concept that connects these lexical items is the idea of HEAT, which in this case is understood as the source domain that will, according to the original Greek meaning of metaphor, transfer to us their common concept or target domain of EMOTION. Therefore, EMOTION IS HEAT is the label of the conceptual metaphor which consists of the metaphorical mapping occurring between a source and a target domain. This rather simplified example still offers clear evidence of how metaphors are structured and how deeply they are rooted in our everyday language and our conceptualization of the world. It also implies one of the principal questions asked here: can the paradigmatic selection of metaphors that we use have an impact on the way we understand the reality that surrounds us? If it can, how does it work? In order to answer such questions, some interactive metaphor relations have to be explained. The study of metaphor has shown that a metaphorical meaning is not limited by the relation of mapping established between exclusively one source and one target domain, but instead we deal with many complex relations resulting from the interaction of several different target and source domains. In this respect, the most challenging inter-relations are those of multivalency and diversification (Goatly 2007: 12). Multivalency is described as the case when the same source domain is applied to various target domains, as in examples (4) and (5) in Croatian.

(4) Dubrovnik je vrhunska destinacija (GOOD IS HIGH)
‘Dubrovnik is a top destination’
(5) Cijene nekretnina u Dubrovniku su otišle u nebo (MORE IS HIGH)
‘The prices of property in Dubrovnik are sky high’

Since these different targets GOOD and MORE share a multivalent source, they may become associated into an equation MORE = GOOD. This concept, deeply rooted in the Western economy and culture, reinforces patterns of excessive wealth accumulation and consumption as part of the society ethic. In the opposite sense, it is worth remembering the different religious attitudes that promote economic modesty as a moral imperative (ibid: 168).5 Another pair of multivalent sources is given in examples (6–7). This might suggest the equation CHANGE = SUCCESS, which, according to Goatly (ibid: 173), is a pattern of consumer behaviour widely recognized in Western culture, especially in terms of buying the latest and most fashionable products even if we don’t need them, inducing us to frequently change clothes, cars, mobile phones, etc.

(6) Attitudes to women shifted in the 19th century. (CHANGE IS PATH)
(7) She is intelligent and hardworking; I’m sure she’ll go a long way. (SUCCESS IS PATH)

5 Of course, this equation explains only one particular aspect of culture concerning the acquisition of wealth, while in many other fields of experience it would not be appropriate. Example (5) can also be relativised when viewed from the position of the seller or the buyer of property.



For the objective of this chapter, more relevant is the case of diversification, the opposite to multivalency, where different source domains lead to one target domain, as shown in (8–10; 11–13).

(8) He is one of the few clean politicians.
(9) It was an immaculate performance.
(10) She had a spotless reputation.
(11) I’m your number one fan.
(12) Saving children always takes priority over adults.
(13) She’s one of the foremost paediatric experts.



These examples of diversification are randomly selected from Andrew Goatly’s Metalude database of metaphors accessible online and developed as a project of contrastive analysis of English and Chinese conceptual systems. At first glance we can realize that although two groups of metaphorical phrases share the same target domain of GOOD, it is hardly possible to switch the lexical item from different source domains and use them in a different context. Still, in the afterword to Goatly’s book Washing the Brain – Metaphor and Hidden Ideology (Goatly 2007), Zoltan Kövecses suggests that “we need to uncover these ideologically-­loaded metaphors and look for alternative ones”. At the time, Kövecses had already been aware of the phenomenon of conceptual diversification, particularly having in mind the differences in conceptualizations between various cultures (Kövecses 2005: 83). However, if such alternative conceptualizations are indeed possible, it is worth examining the conditions that satisfy such a possibility. After all, the examples from (8–10) cannot be exchanged with the examples from (11–13) in the registered contexts (e.g. ??“I am your clean fan”), although they may formally share the same target domain GOOD.
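The two relations described in this section can be restated as simple groupings over (target, source) pairs. What follows is only a minimal Python sketch and not part of the Metalude software: the function names are hypothetical, the pairs are the root analogies quoted above, and GOOD IS CLEAN is an assumed label for the mapping behind examples (8–10), which the text itself does not name.

```python
from collections import defaultdict

# Root analogies written as (target, source) pairs.
# ("GOOD", "CLEAN") is an ASSUMED label for examples (8-10);
# the other pairs are quoted in the chapter.
ROOT_ANALOGIES = [
    ("EMOTION", "HEAT"),
    ("GOOD", "HIGH"),
    ("MORE", "HIGH"),
    ("CHANGE", "PATH"),
    ("SUCCESS", "PATH"),
    ("GOOD", "CLEAN"),
]


def group_pairs(pairs):
    """Index the mappings by source domain and by target domain."""
    targets_by_source = defaultdict(set)
    sources_by_target = defaultdict(set)
    for target, source in pairs:
        targets_by_source[source].add(target)
        sources_by_target[target].add(source)
    return targets_by_source, sources_by_target


def multivalent_sources(pairs):
    """Sources applied to more than one target (multivalency)."""
    targets_by_source, _ = group_pairs(pairs)
    return {s: sorted(t) for s, t in targets_by_source.items() if len(t) > 1}


def diversified_targets(pairs):
    """Targets reached from more than one source (diversification)."""
    _, sources_by_target = group_pairs(pairs)
    return {t: sorted(s) for t, s in sources_by_target.items() if len(s) > 1}


print(multivalent_sources(ROOT_ANALOGIES))
# {'HIGH': ['GOOD', 'MORE'], 'PATH': ['CHANGE', 'SUCCESS']}
print(diversified_targets(ROOT_ANALOGIES))
# {'GOOD': ['CLEAN', 'HIGH']}
```

Such a grouping captures only the formal side of diversification: it says which source domains share a target, not whether their lexical items can actually replace one another in a given context.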

4. Method and analysis

A reliable database to identify a sufficient number of conceptual pairs that share the same target domain was found in the aforementioned Metalude database created under the supervision of Prof. Andrew Goatly in the Department of English at Lingnan University in Hong Kong. The data are corpus based and mostly retrieved from English dictionaries (Collins COBUILD, CIDE, OED, etc.) and newspapers in English. It is a rather useful device as it offers the classification of metaphorical expressions according to the lexical item, or the vehicle in Richards’ (1932) terms, the metaphorical mapping or root analogy as named by Goatly and, last but not least, an example taken from the corpus, which provides the basis for checking the context in which the metaphorical expression appears (see Table 1).



Table 1. An example of classification from the Metalude database of metaphors. Source: Metalude: Metaphor at Lingnan University, Department of English, with the permission of A. Goatly.

[Table 1, surviving cells only: ROOT ANALOGY (column header); “burning strongly”; “we had a flaming argument about religion”]

The database itself offers various possibilities of corpus analysis and firm foundations for either semasiological or onomasiological classification of metaphors or, in other words, from expression to concept (semasiology) or from concept to expression (onomasiology). The search engine supports several kinds of query: one can type in lexical items, which are grouped under all the source domains they appear in, or insert a complete “root analogy”, which leads to a whole list of language metaphors that correspond to the given mapping. It is also possible to search separately for either source or target domains and the software will offer a list of the corresponding mappings for the given domain, with all the language metaphors they are derived from. The first step in our research was to check the existence of diversification cases (one target and multiple source domains) in the database; we managed to identify a list of conceptual pairs that share the same target domain, as shown in Table 2 based on English examples.

Table 2. A list of conceptual diversification mappings extracted from the Metalude database. Source: Metalude: Metaphor at Lingnan University, Department of English, with the permission of A. Goatly.
ACTIVITY IS AGRICULTURE








































































Further, we needed to choose the most appropriate conceptual pairs from the list in order to test the possibilities of alternating two or more source domains for the same target domain. We decided to choose and test the most frequent conceptual pairs in the political discourse assuming that such a corpus is highly ideologically motivated and therefore fits into the aforementioned idea of Kövecses’ concerning “ideologically-­motivated metaphors”. For that matter we used previous research on political discourse in Croatia carried out through the corpus analysis of four election campaigns registered in the Internet editions of three daily newspapers in Croatia.6 The objective of the research (Sušac 2007) was bidirectional. Beside the aforementioned test of the most dominant metaphorical concepts in political campaigns, it also aimed to contrast the conceptual systems of English and Croatian. It transpired that all the English conceptual pairs presented in the Metalude database can also be found in the Croatian language, but not necessarily with the equivalent lexical items.7 Therefore, the English database could be combined with examples from the Croatian corpus in order to check for cases of metaphor diversification. The corpus was strictly reduced to all the quoted speeches and utterances of politicians involved in the campaign and excluded journalists’ comments. It spanned a period of five years in the previous decade and more than one thousand extracted examples have shown that there is evidence of significant inclination of conservative and liberal parties towards different types of metaphors. It turned out that two domains FIGHTING and PATH significantly prevail over other concepts. 
Further analysis showed that the former seems to be visibly preferred by the leading conservative party HDZ (Croatian Democratic Union), unlike the concept of PATH preferred by the leading social-democrat party SDP (Social Democratic Party). The latter is especially emphasized by the fact that the leading conservative party dominantly outnumbers social-democrats in its overall number of metaphors used in the election campaigns, and that the metaphors of PATH used by the social-democrats are the only ones that outnumber any other type of metaphors used by the conservatives.8 The next step was to identify which target domains from the Metalude database are shared by the previously mentioned most dominant source domains of FIGHTING and PATH in the political discourse of Croatia. It turned out that the target domain ACTIVITY was the only case of diversification. This is a rather general notion, but within the corpus of analysis of political discourse it can be specified as POLITICAL ACTIVITY, with all its objectives and instruments. Further analysis was focused on the language metaphors or the lexical items deriving from such concepts and being used in particular contexts. The objective was to discover whether certain lexical items belonging to one source domain would fit into a different metaphorical context that shares the same target domain or the idea conveyed through the metaphor. It was found that such alterations were possible in a rather limited number of cases that can be considered cases of contextual diversification (Table 3). Here, the lexical items share the metaphorical meaning, unlike the cases where the same target domain is just formally shared by two different source domains, but with a mismatch in the metaphorical meaning of lexical items. Such cases of formal diversification are shown in Table 4. Here, although ACTIVITY is the common target domain, the metaphorical meaning is significantly different (reaction vs. development) and therefore it would be impossible to switch lexical items from different source domains.

6 The complete research is available as an unpublished doctoral thesis (Sušac 2007) and partly published in Sušac (2010).
7 The Metalude project itself was launched with the idea of contrasting the conceptual systems of English and Chinese for the purpose of easier language learning.

Table 3. Contextual diversification of metaphorical source domains.
Source: Metalude: Metaphor at Lingnan University, Department of English, with the permission of A. Goatly.

Root analogy | Lexical item | Literal meaning | PS | Metaphorical meaning | Example
ACTIVITY IS FIGHTING | bash on | Hit hard | Vi+adv | Continue doing something difficult or boring | I bashed on with my marking past midnight
ACTIVITY IS PATH | keep on trucking | Transporting goods by lorry | Idi(cl) | Continue with what you are doing | If you can’t get a job don’t give up applying, just keep on trucking

8 By coincidence or not, the previous name of the Social Democratic Party was the Party of Democratic Change, and considering the frequently present multivalency example CHANGE IS PATH, a possible pattern may be found that will support the assumption of ideologically grounded metaphors (Sušac 2010: 563)



Table 4. Formal diversification of metaphorical source domains. Source: Metalude: Metaphor at Lingnan University, Department of English, with the permission of A. Goatly.

Root analogy | Lexical item | Literal meaning | PS | Metaphorical meaning | Example
ACTIVITY IS FIGHTING | backlash | Strike with a whip on the back | | Strong reaction to events in society | There’s been a right-wing backlash against the welfare state in the last 20 years.
ACTIVITY IS PATH | milestone | Roadside stone or post that indicates distance | | Important stage in something’s development | 9/11 seemed like a milestone on the road to war.

In the case of contextual diversification (Table 3), it is possible to replace the lexical item bash on, which belongs to the source domain of FIGHTING, with the lexical item keep on trucking from the source domain of PATH. The reason why it is possible is given in the column of metaphorical meaning, with emphasis on the idea of continuation shared by the two domains. This idea is definitely due to the grammatical construction VERB + on present in both cases, conveying the aspect of an action in progress. However, its metaphorical meaning is owed not to the preposition but the verbs bashing and trucking belonging to two different conceptual domains. It is also worth noting that the syntax of any language would rarely allow diversification by simply replacing a lexical item from one domain with the lexical item from another domain without some kind of syntagmatic adaptation. Of course, paradigmatic selection of other lexical items is also possible as long as they contain or convey the idea shared by the two source domains. It is clear that such “conceptual synonymy” is not absolute, as there is no absolute synonymy in language either, but the example clearly shows that the commutation of source domains is possible, with all further semiotic connotations this may lead to. The overlapping in metaphorical meaning within contextual diversification is possible because of the metaphorical ground shared by the two domains in the particular context, which, as shown in Table 4, is not always the case. The notion of ground belongs to the basic theory of metaphors introduced by I. A. Richards (1971), where there is a distinction between tenor as the object that the metaphoric word or phrase refers to, vehicle as the metaphoric word or phrase and finally ground as the quality that one refers to when using a particular vehicle in relation to the tenor. 
This rather rhetorical approach reduced to lexical items can be easily transferred into the cognitive sphere of concepts, where the vehicles lead to source and the tenors to target domains. Ground corresponds to the metaphorical meaning and in case of diversification to those parts of qualities shared by two or more source domains and one target domain, which in the example above is the idea of continuation. Graphically, this may be represented as in Figure 1.

Figure 1. A graphical representation of the metaphorical ground shared by one target (T) and two source (S) domains
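The distinction between contextual and formal diversification can be restated as a comparison of grounds. The following is a minimal Python sketch under stated assumptions: the Entry type and the classify function are hypothetical, the one-word ground labels are assumed glosses of the metaphorical meanings given in Tables 3 and 4, and ACTIVITY IS FIGHTING is taken to be the root analogy of bash on and backlash, as the discussion above implies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    target: str        # target domain (tenor side)
    source: str        # source domain (vehicle side)
    lexical_item: str  # the metaphorical vehicle
    ground: str        # ASSUMED one-word gloss of the metaphorical meaning


ENTRIES = [
    Entry("ACTIVITY", "FIGHTING", "bash on", "continuation"),
    Entry("ACTIVITY", "PATH", "keep on trucking", "continuation"),
    Entry("ACTIVITY", "FIGHTING", "backlash", "reaction"),
    Entry("ACTIVITY", "PATH", "milestone", "development"),
]


def classify(a, b):
    """Label a pair of entries by the kind of diversification it shows."""
    # Diversification needs one target reached from two different sources.
    if a.target != b.target or a.source == b.source:
        return "no diversification"
    # A shared ground makes the vehicles candidates for commutation.
    return "contextual" if a.ground == b.ground else "formal"


print(classify(ENTRIES[0], ENTRIES[1]))  # contextual
print(classify(ENTRIES[2], ENTRIES[3]))  # formal
```

Reducing a ground to a single label is a strong simplification: in practice the shared ground has to be judged in context, and a successful replacement usually also requires syntagmatic adaptation of the lexical item.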

The corpus offers a number of cases with other conceptual grounds shared by the observed source domains of FIGHTING and PATH. They are: difficulty, give up/end/finish, start/begin, change, achieve, doubt/confusion, etc. These shared grounds (as much as conceptual pairs) are present in both languages, English and Croatian, and therefore it was possible to apply the results from the Croatian corpus to the database for English. This was done not just for the purpose of text clarity and the accessibility of the Metalude database, but to show that this kind of procedure can be applicable to other languages as well. These cases largely confirm the initial assumption that commutation of metaphorical source domains is possible as a semiotic tool for paradigmatic analysis when, of course, certain preconditions are satisfied. In semiotic terms, it implies a principal question about the effect or consequence of such alterations on the signifieds. In political discourse we can ask the same question: what is accomplished by replacing the dominant source domain of PATH with FIGHTING (and vice versa)? Let us consider the following examples in Croatian, and their English translations.

(14) Dobili smo najteže bitke u razvoju naše zemlje.
‘We have won the severest battles in the development of our country.’
(15) Prošli smo najteže prepreke na našem putu u EU.
‘We have overcome the most difficult obstacles on our way to the EU.’

Obviously, example (14) belongs to the conceptual mapping POLITICAL ACTIVITY IS FIGHTING and (15) fits into the category POLITICAL ACTIVITY IS PATH, so formally, we can recognize the case of diversification. The possibility for contextual diversification, or the possibility of commutation of conceptual domains and their lexical items in the same context, is not that visible at first sight, but we may find their common ground if we recall another conceptual pair from the Metalude database, DEVELOPING/SUCCEEDING IS MOVING FORWARD. It is possible because our concept of FIGHTING includes two basic aspects of defending and attacking, where the first is rather static and the second is prevailingly dynamic, implying the idea of movement shared with the concept of PATH. Therefore, we can easily switch the metaphorical vehicles as in (16–17).

(16) Prošli smo najteže prepreke u razvoju naše zemlje.
‘We have overcome the most difficult obstacles in the development of our country.’
(17) Dobili smo najteže bitke na našem putu u EU.
‘We have won the severest battles on our way to the EU.’

Clearly, the intended meaning remains the same in (14) and (16), and in (15) and (17), but the construals clearly differ. In different contexts if we identify political activity with a particular goal or target, we can visualize it as a goal / target to reach or a goal / target to shoot at. Similarly, if we consider the conceptual pairs ORGANIZATION IS MACHINE and ORGANIZATION IS PLANT, both mappings definitely share some conceptual grounds – the idea of growth and development in the first place. So a particular company can be in neutral (gear) or simply hibernating, it can be greased or watered, etc. depending on which source domain we prefer. A question that still remains to be answered is whether our choice between machines and plants is a reflection of our inner world and values or simply a habit influenced by dominant discourse within a society. The choice of metaphors in cases of diversification has consequences for cognitive structuring and ideology, and also for syntactic relations. Given space one could provide a system-diagram of the choices of vehicle available for conceptualizing any particular entity, and try to explore the cognitive and grammatical consequences of the choice (Goatly 1997: 66). This could allow one to address additional questions: can we attribute any kind of positive or negative ideological evaluation to our choice, as proposed by Kövecses, and insist on “conceptual correctness”?9 Are there clear-cut answers concerning consequences or is it all about opening different doors to never-ending semiosis? Whatever the stance taken, it is worth remembering that metaphors imply verbal images in the cases where the process of reification is involved (concrete for abstract) and that in communication “a picture is worth a thousand words”. What can be visualized or concretized can presumably be more easily memorized and become part of our “cognitive unconscious” (Lakoff and Johnson 1999).

9 By this term we refer to Kövecses’ pleading for alternative metaphorical concepts, bringing to mind the “political correctness” campaign and language prescriptivism inspired partly by the idea of our thoughts affected (not determined) by language (Cameron 1995; Ravitch 2003).

5. Conclusions

The main objective of the research described in this chapter was to investigate whether the commutation of signifiers, as a research procedure that primarily belongs to structural semiotics, can be applied to the field of conceptual metaphor. Diversification, as one of many interplays of metaphor, was shown to be applicable by extending the commutation test from simple signifiers to whole metaphorical source domains. The research corpus of political discourse was appropriate for identifying two kinds of diversification, formal and contextual, where only the latter provides a good basis for paradigmatic analysis. Contextual diversification can take place only when metaphorical realizations deriving from two source domains can be re-contextualized, but retain the generally intended meaning, as in (14+16) and (15+17). For this, the two domains must share a conceptual ground. The question of consequences of such alterations, whether they are ideologically grounded or merely a matter of style, still remains open. However, the results of the political discourse corpus analysis are rather indicative, given the clearly shown preference of opposed political groups towards different conceptualizations of their political objectives. In conclusion, even if we do not fully support the idea of the need for “conceptual correctness” so that we may get rid of “ideologically loaded metaphors”, it is still worth knowing that there is the possibility of choice in creating different metaphorical construals for the same communicative purpose, at least for the sake of style.

Maíra Avelar

Chapter 21
The Emergence of Multimodal Metaphors in Brazilian Political-electoral Debates

1. Introduction

As there is a lack of studies on multimodality applied to the political domain, this study analyzes the metaphors that gradually emerged in the political-electoral debates between Brazilian presidential candidates that took place in 2010 and 2014, taking into consideration two semiotic resources: speech, belonging to the auditory modality, and gestures, belonging to the visual modality. To analyze the selected scenes from the debates, an adapted version of a cognitive semiotic model (Brandt and Brandt 2005) was chosen, as it encompasses the enunciative scenario, allowing for a dynamic analysis of the metaphors used by the candidates and, most importantly, allowing for a detailed and integrated analysis of the gestural content together with the linguistic content. Based on Turner’s (2007) assumptions, the main hypothesis that guides this research is that the more entrenched the metaphorical expression in our conceptual system is, the harder it is to recognize such an expression as metaphorical. When approaching the difference between literal and metaphorical meaning, Turner states that these are not “cognitive operations [that are] fundamentally different” (Turner 2007: 1). Rather, the judgment of an expression as literal or metaphorical is intimately related to the degree of productive entrenchment – or, in Müller’s (2009) terminology, to the degree of conventionality – of a conceptual connection. Thus, the higher the degree of productive entrenchment/conventionality, the lower the chance that an expression will be judged as metaphorical. Regarding the meaning of political discourse and, more specifically, the argument developed in a political-electoral debate, it is relevant to highlight the importance of compression processes (Fauconnier and Turner 2002).
One of the central benefits of such processes is their capacity to provide compressions of diffuse event ranges to the human level. According to Conceptual Blending Theory (Fauconnier and Turner 2002), mental spaces and the connections among them are established because compression (and decompression) can provide a global view and a human-­level understanding of conceptual relations. Abstract concepts such as space, time, cause, effect, identity, and change emerge through blending. These are called vital relations, maximized and intensified by compression. Taking the political debates into account, extremely complex political matters (related to the conditions of public administration life, work, economy



etc.) would have to be ostensibly explained to an extremely heterogeneous public, requiring a great amount of time. Compression allows these complex matters to be metaphorically mapped/projected from more primitive levels of our own personal experience. Further theoretical background to the study is presented in Section 2, contemplating well-­known theories such as Conceptual Metaphor Theory and Conceptual Blending Theory, as well as Multimodal Semiotic Blending (MSB): an adaptation of Brandt and Brandt’s Semiotic Model (Miranda and Mendes 2014). As this adaptation highlights the dynamicity of multimodal metaphorical emergence in the enunciative scene, it requires analytical categories such as gesture excursion, metaphoricity in verbal-­gestural compounds, and compression in speech turns. Subsequently, the methods for collecting, selecting, and coding the corpus are presented and we analyze four samples, two from the 2010 debate and two from the 2014 debate. In each one of these, the gestures are photographed, described, and analyzed, followed by the application of the MSB model. Finally, we discuss the findings as well as their relevance for cognitive semiotics.

Multimodal Metaphors in Brazilian Political Debates

2. Theoretical Background: Metaphor theories and multimodality

In this section, we present our adapted version of Brandt and Brandt’s (2005) model, called the Multimodal Semiotic Blending (MSB) model (Miranda and Mendes 2014), in the context of better-known theories of metaphor. We suggest that our model allows a more detailed discussion of gesture analyses, bringing in categories such as multimodal metaphoricity in speech and gesture compounds (Müller and Cienki 2009), gesture excursion (Kendon 2004), and compression in speech turns (Hougaard and Hougaard 2008).

As is well known, Conceptual Metaphor Theory (CMT) asserts that “metaphor is pervasive in everyday life, not just in language, but in thought and in action. Our ordinary conceptual system, in terms of which we both think and act, is fundamentally metaphorical in nature” (Lakoff and Johnson 1980: 3). Further, conceptual metaphors are crucial for the construction of abstract concepts, as their target domain is anchored in the inferential patterns made from sensorimotor experience (Johnson 2007). Inference is, therefore, “the heart of metaphor” (Lakoff and Johnson 1980: 245). Each conceptual metaphor consists of the systematic mapping of entities and relations from a sensorimotor source domain to a more abstract target domain. There are many examples that illustrate how metaphorical mapping works. For instance, the conceptual metaphor intentional activities are journeys, underlying example (1), presents a conceptual mapping from the source domain of spatial physical movement, or a journey, to the target domain of intentional activities. This conceptual mapping could be illustrated as shown in Figure 1.

(1) We still have a long way to go before finishing this paper.

Figure 1. Schema of the Conceptual Metaphor intentional activities are journeys (adapted from Johnson 2007: 177)

Source domain → Target domain
Starting point A → Initial state
Ending point B → Final state
[…] → Goal to be reached
Movement from A to B → Transition from A to B
Obstacles to the movement → Difficulties to reach the goal
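As a rough illustration only, the correspondences in Figure 1 can be written down as a one-way lookup table. The sketch below is a hypothetical Python encoding, not part of CMT or of Johnson’s (2007) schema: the names JOURNEY_TO_ACTIVITY and project are invented here, and only those pairs whose source and target labels are both legible in the figure are included.

```python
from typing import Optional

# Hypothetical encoding of the Figure 1 correspondences (source -> target).
# The dictionary name and the function name are illustrative assumptions.
JOURNEY_TO_ACTIVITY = {
    "starting point A": "initial state",
    "ending point B": "final state",
    "movement from A to B": "transition from A to B",
    "obstacles to the movement": "difficulties to reach the goal",
}


def project(source_element: str) -> Optional[str]:
    """Return the target-domain counterpart of a source-domain element.

    The lookup is deliberately one-way: target-domain labels are not keys,
    which mirrors the source-to-target directionality of the mapping.
    """
    return JOURNEY_TO_ACTIVITY.get(source_element)


if __name__ == "__main__":
    print(project("obstacles to the movement"))
    print(project("final state"))  # a target label has no projection
```

Keeping the table one-directional is a design choice made for this sketch; it is one simple way to express that the journey domain structures the activity domain and not the reverse.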

The schema clarifies the directionality of the metaphor from source to target domain, and can be interpreted as the understanding of one concept (the more abstract) in terms of another (the more concrete). This means that concrete experiences can be extended, through conceptual metaphors, to characterize non-spatial experiences such as “falling into a depression” (cf. Rohrer 2007).

While CMT is a very powerful and influential theory, two main criticisms have been raised against it, among others by authors who work on multimodality (e.g. Müller and Cienki 2009): (a) the proposed metaphorical mapping is too static; (b) examples are created by the analysts, instead of being collected in real contexts of interaction. Such factors prompted the development of a theoretical framework that brings more dynamicity to metaphorical mappings: Conceptual Blending Theory (Fauconnier and Turner 2002). Conceptual blending (or integration) is assumed to be a basic mental operation that leads to global insights and conceptual compressions that are useful to memory, as it organizes diffuse ranges of meaning. It is a dynamic cognitive process with a set of constitutive principles: (i) a partial cross-space mapping that links counterparts in the input mental spaces; (ii) a generic mental space that maps onto each of the input spaces and contains the elements they have in common; (iii) the blended space; and (iv) a selective projection from the input spaces to the blended space. The essence of the blending operation is the construction of a partial correspondence between the two input spaces, elements of which are selectively projected into the blended space (see Figure 2).


Figure 2. Generic schema of the Conceptual Blending operation (see Fauconnier and Turner 2002: 56)

Blending products that are interpreted as literal in one domain can be interpreted as metaphorical in another. As pointed out in the Introduction, this depends on the degree of productive entrenchment, or conventionality, of the structure (Turner 2007). For example, (2), used by a politician to undermine his opponent’s argument, draws on the conventional metaphor knowing is seeing, which is so strongly entrenched that the expression is not (typically) recognized as metaphorical. On the other hand, in utterance (3), winning ticket, conventionally related to the lottery domain, emerges as a novel metaphor related to the oil reserves domain, and is easily recognized as such (Miranda and Mendes 2014). Consequently, metaphors are seen as emerging from a gradual process: expressions are interpreted as more or less metaphorical depending on their degree of productive entrenchment, or conventionality.

(2) As everyone can see, she couldn’t prove her point.
(3) He is going to hand over the winning ticket to foreign private companies.

An extension of this model was proposed by Brandt and Brandt (2005), which makes it possible to analyze how subjects, in situated interactions, perform blending by projecting an architecture of spaces (cognitive frames) on top of a semiotic base space that can be understood as the participants’ shared representation of the situation of communication. This semiotic base space is fundamental for meaning construction. Some modifications have been made to the architecture of the spaces originally presented by Brandt and Brandt (2005), mainly in the semiotic base space, which was reformulated as a grounded space (Oakley 2009). Finally, to better suit our analysis of multimodal data, we have proposed that the presentation space should be unfolded into two kinds of signifiers: (a) gestural and prosodic resources and (b) linguistic resources (Miranda and Mendes 2014).

In the final modified version (Figure 3), the architecture of the spaces consists of:
(i) a grounded semiotic space, which unfolds into three spheres: semiosis as an instance of language acts performed by the interlocutors; the communication situation, in which the participants of the interaction are found; and the wider phenomenological world, accessible to our experience;
(ii) input spaces: the presentation space (textual instance), which unfolds into two dimensions, gestural/prosodic resources and linguistic resources, and the reference space (object instance);
(iii) a virtual space (blending), projected from the scenario element (frame) selection of the last two spaces; and
(iv) a relevance space, which guides the emerging meaning of the virtual space.
This architecture of spaces is called Multimodal Semiotic Blending (MSB).

Figure 3. Multimodal Semiotic Blending (MSB)

As can be noted in the description as well as in the diagram, the presentation space, a single space in the original model (Brandt and Brandt 2005), unfolds into two: one related to the “here-and-now” of the enunciation and its material properties, and the other related to the linguistic resources, that is, the lexical properties and syntagmatic relations used by the participants to build the utterance. Consequently, the presentation space results from the merging of both what is said and how it is said by the participants of the enunciative scene. Highlighting the prosodic and gestural features provides a relevant tool for analyzing the specificities of these bodily resources and their relation to the linguistic content, as can be seen in the applications of the model described in Section 4.

Before we proceed, we should note that, rather than speaking of “conceptual metaphors” in terms of (static) mapping across domains, the concept of metaphoricity highlights the emergence of metaphor in interaction: metaphorical elaborations can be triggered in several modalities and successively over time. The analysis of syntactic, semantic, prosodic, and gestural contexts reveals that metaphoricity is a dynamic property, as will be shown in the empirical study described in the remaining parts of this chapter.

3. Methodology

3.1. Analytical categories

To analyze the gradual emergence of metaphors in a political debate, we selected the following specific categories related to multimodality.

3.1.1. Gesture excursion

A gesture performance includes three phases in which a movement excursion is made, also called a Gesture Unit (GU) or an “excursion succession” (Kendon 2004: 110).
• Preparation: an optional phase, in which the limbs move from a relaxed or resting position;
• Stroke: a mandatory phase, in which the gestural expression is performed, showing clear dynamic movements that require the focus of both effort and energy. In this phase, considered the gesture’s peak, the hands tend to describe forms and complete movement patterns. It may be followed by a brief stop in the movement, in which the limbs are kept still before relaxing and returning to the initial position: the “post-stroke hold” (Kendon 2004: 112). The combination of stroke and post-stroke hold can be considered a “gestural phrase”, as this is the part that conveys meaning or gestural expression;
• Retraction: an optional phase, in which there is a movement retraction toward the initial relaxed or resting position.

The GU can be defined as the complete excursion of the movements, which starts (preparation) and ends (retraction) with the relaxing of the limbs and reaches its peak in the stroke. It is important to note that the GU can include one or more Gesture Phrases (GP): these are identified when there is a sequence of gestural strokes. Thus, the GP encompasses the preparation phase and the stroke phase, as well as hold phases (pre- and post-stroke) between the stroke sequences and the retraction.
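
For readers who annotate gesture data computationally, the phase structure described above can be sketched as a small data model. The Python below is an illustrative assumption, not a coding scheme proposed by Kendon (2004) or used in this study: the class names GesturePhrase and GestureUnit, their fields, and the is_well_formed check are all hypothetical. The sample values are taken from the captions of Sequence #2 in Section 4.2.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GesturePhrase:
    """One Gesture Phrase (GP): a mandatory stroke plus optional phases."""
    stroke: str                         # description of the stroke (mandatory)
    speech: str = ""                    # verbal content co-occurring with the stroke
    has_preparation: bool = False       # optional preparation phase
    has_post_stroke_hold: bool = False  # optional hold after the stroke


@dataclass
class GestureUnit:
    """One Gesture Unit (GU): the whole excursion, from rest back to rest."""
    phrases: List[GesturePhrase] = field(default_factory=list)
    has_retraction: bool = False        # optional return to the resting position

    def is_well_formed(self) -> bool:
        # A GU needs at least one GP, and every GP needs a described stroke.
        return bool(self.phrases) and all(p.stroke.strip() for p in self.phrases)


# Hypothetical annotation of Sequence #2 (two GPs within one GU).
sequence_2 = GestureUnit(
    phrases=[
        GesturePhrase(
            stroke="Wide circular gesture of the forearms",
            speech="So she comes and creates fantasies",
        ),
        GesturePhrase(
            stroke="Wide circular gesture of the forearms",
            speech="makes foam, makes myths",
        ),
    ]
)
```

In this sketch the stroke is the only field without a default, which is one way to reflect that it is the only mandatory phase; retraction is stored on the unit rather than on the phrase because it closes the whole excursion.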

3.1.2. Multimodal metaphoricity in verbal-gestural compounds

In the context of multimodal metaphoricity, it is possible to describe two patterns of relation between gestures and speech (Müller and Cienki 2009: 307):
• The same source and the same target can be found in different modalities. In these cases, the gesture embodies the source domain of the verbal metaphorical expression, indicating that the metaphoricity of that expression was activated or was in the foreground of the speaker’s attention.
• Source and target can also be found in different modalities. In these cases, we find a gestural metaphorical expression, together with a target that is verbalized in a non-metaphorical manner.

3.1.3. Compression in speech turns

As stated in the Introduction, compression (and decompression) can provide a human-level understanding of conceptual vital relations, such as time-space, cause-effect, and identity-change. When we take into consideration compression in interactions (Hougaard and Hougaard 2008) and, more specifically, in political debates, metaphors work as a mechanism for compressing the argumentation expressed in the speech turn.

3.2. Material

The material analyzed includes scenes from two second-round debates broadcast in 2010 and 2014 by Record, a Brazilian TV channel, and collected from YouTube. The generic structure of both debates consists of questions (45 seconds in length), answers (2 minutes in length), replies (1 minute in length), and responses (1 minute in length). We focus on four examples: two answers from Dilma Rousseff, from the PT (Workers’ Party), one from the 2010 debate and one from the 2014 debate; one answer from José Serra, from the PSDB (Brazilian Social Democracy Party), from the 2010 debate; and one reply from Aécio Neves, also from the PSDB, from the 2014 debate.

What the chosen sequences have in common is the fact that controversial issues were raised by the candidates, especially when addressing the proposals made by their opponents or by their opponents’ parties. In the four sequences, the candidates denounce contradictory moral stances assumed by the opposition. The controversial issues raised in the 2010 debate demonstrated, at least in the discourse, the ideological differences between the PT (a left-wing party) and the PSDB (a right-wing party). On the other hand, in the 2014 debate, Aécio, from the PSDB, presents himself as a candidate from the opposition and declares that he is not the candidate of a political party, but the candidate that embodies change. Dilma, at this time the incumbent, reinforces the social changes promoted by the PT and clearly asks for the voters’ support.


To describe the four sequences according to Kendon’s (2004) description of the gesture excursion (see 3.1.1), we photographed all gesture strokes in each gesture excursion that contained a verbal-gestural metaphor. In addition, we have highlighted in bold the verbal content that is depicted by the gesture stroke. We use captions both to show this correlation between the gesture stroke and the verbal content and to describe the movement made by the hands and the forearms.

4. Analysis

4.1. Dilma’s speech from the 2010 debate (Sequence #1)

The first sample was extracted from an answer given by Dilma about public security. The candidate discusses the new oil reserves, and only at the end discusses public security. Following our coding methods, the gestural sequence is described and analyzed as shown in Figure 4.¹

Figure 4. Gestures and descriptions of Dilma’s speech from 2010 (Sequence #1)

I can’t conceive that someone would want to take a winning ticket.
Closed hands, next to each other, with folded fingers, positioned on the right side of the body.

There are two tickets, one isn’t a winning ticket
Vertically opened left hand, with palm turned upwards and stretched fingers.

and the other is
Vertically opened right hand, with palm turned upwards and stretched fingers.

Hand over the winning ticket, from which we can win our passport for the future, to foreign private companies
Closed right hand, folded fingers and half-stretched arm to the front.

In this sample, it is possible to identify a Gesture Unit (GU) made up of four Gesture Phrases (GP): when referring to pré-sal (the new deep-sea oil reserves), Dilma depicts them as a bilhete premiado (‘winning ticket’) and stages the first gesture of pegar o bilhete premiado (‘taking the winning ticket’). Through the use of this metaphor, the candidate compares the recently discovered deep-sea oil reserves to a winning ticket, explaining to the viewer that tem dois bilhetes: um num é premiado e o outro é (‘there are two tickets: one isn’t a winning ticket and the other is’), gesturally staging two objects that occupy opposite positions in space, which correspond to the second and third gestures. She also uses the verbal-gestural metaphor entregar o bilhete premiado (‘hand over the winning ticket’), which corresponds to the privatization of the deep-sea oil reserves. The gesture that accompanies entregar (‘hand over’) is the fourth one. Generally speaking, there is a target domain, Oil Reserves, which is verbally and gesturally presented by the source domain Winning ticket. Applying the Multimodal Semiotic Blending model, this can be analyzed as shown in Figure 5.

1 Eu não posso conceber que alguém queira pegar um bilhete premiado. Tem dois bilhetes: um num é premiado e o outro é. Entregar o bilhete premiado, do qual nós podemos ter o nosso passaporte pro futuro, pra empresas privadas estrangeiras. “I can’t conceive that someone would want to take a winning ticket. There are two tickets, one isn’t a winning ticket, and the other is. Hand over the winning ticket, from which we can win our passport for the future, to foreign private companies.”


Figure 5. The Multimodal Semiotic Blending model, applied to Dilma’s speech from 2010 (Sequence #1)

The enunciative scene (in this and all other samples) is socioculturally situated in the political-electoral discursive field, in which, from the point of view of argumentative relevance, the candidates use strategies to undermine their opponent’s credibility. In this specific communicative situation, the political-electoral debate, Dilma draws on the fact that the PSDB, Serra’s party, is known for its privatizations. Anticipating her opponent’s strategy, the candidate foresees that he will privatize the deep-sea oil reserves and accuses him of this. When referring to pré-sal (‘the new deep-sea oil reserves’), Dilma metaphorically presents it, both verbally and gesturally, as a bilhete premiado (‘winning ticket’). It is possible to identify a strongly metaphorical process, compressed by some vital relations: analogy [petroleum = winning ticket], identity-change [old oil reserves are not winning tickets → new deep-sea oil reserves are], space-time [right = future vs. left = past], cause-effect [Serra’s government → privatization of the deep-sea oil reserves], and so forth.

4.2. Serra’s sequence from the 2010 debate (Sequence #2)

The following sequence corresponds to a sample of Serra’s speech, also from the 2010 debate, which is a rejoinder to Dilma’s accusations about the privatization that he would carry out, should he be elected president. Serra takes the stance of an honest candidate and accuses his opponent of making up lies regarding his political life, aimed at fooling the voters. The example is shown in Figure 6.²

Figure 6. Gestures and descriptions of Serra’s speech from 2010 (Sequence #2)

So she comes and creates fantasies
Wide circular gesture of the forearms, which open repeatedly in circles

makes foam, makes myths
Wide circular gesture of the forearms, which open repeatedly in circles

In this sequence of two gestures performed by Serra, the GU consists of two GPs: while performing a wide circle, the candidate verbally stages some variants of the conventional metaphor lying is making up stories and, finally, the novel metaphor lying is making foam. The use of wide and repetitive gestures works as a mechanism of directing the listener’s attention: it is impossible not to notice the performed gestures. In sum, the target domain lying is verbally staged by the variants of the source domain making up stories, and gesturally staged by the source domain increasing. Thus, the same target domain is presented by different source domains. Applying the Multimodal Semiotic Blending model produces the analysis shown in Figure 7.

2 Então ela vem e fica criando fantasias, fazendo espuma, fazendo mitos com o único propósito de enganar as pessoas do ponto de vista eleitoral. “So she comes and creates fantasies, makes foam, makes myths, with the sole purpose of deceiving the people from an electoral point of view.”


Figure 7. Multimodal Semiotic Blending model, applied to Serra’s speech from 2010 (Sequence #2)

In the communicative situation referred to, the political-electoral debate, Serra accuses Dilma of being untruthful and, consequently, accuses the PT of being corrupt. From the point of view of the discursive strategies mobilized by the participants, Serra uses lying to the voters as a reference, presenting it with the metaphorical expression She… makes foam linked to wide circular gestures. The mobilization of all of these resources also results in perlocutionary effects of provocation and disqualification, targeting the opponent. In addition, the sequence of the presented metaphors promotes a compression of the speech turn, already indicated in the previous phrase: I am saying all of this to say that… Some vital relations can be identified, such as: analogy [lying = making foam], identity [Dilma’s government = corrupt government], and part-whole [Dilma’s lies = PT’s lies].

4.3. Dilma’s speech from the 2014 debate (Sequence #3)

The next sequence, from the 2014 debate, is a response from Dilma about the issue of public banks. She accuses Aécio and his future Minister of the Economy of terrorism, because, according to her, he forecast the end of public banks. The example is represented in Figure 8.³

Figure 8. Gestures and descriptions of Dilma’s speech from 2014 (Sequence #3)

Me, if I were a Banco do Brasil, Caixa, and BNDES (public banks) employee, I
Left index finger pointing down to the right

would be with three fleas
Left index finger pointing to the ear

behind the ear
Repeated gesture with the left index finger pointing up and down to the front

In this sample, in which the GU consists of two GPs, a new metaphor is used: the conventional metaphor being suspicious is having a flea behind the ear is amplified to having three fleas behind the ear. The first gesture points to the ear and the last one, which consists of a repetition, corresponds to a gesture aimed at the television viewer that says “Beware!” There is also a compression of the speech turn, in which the candidate tries to draw the television viewer’s attention to the risk of the privatization of public banks. Thus, while the source domain having three fleas behind the ear is verbally-gesturally staged by the first GP (and the speech that occurs with it), the second GP not only complements the first one, but also stages the target domain Being aware. Applying the Multimodal Semiotic Blending model, we reach the analysis in Figure 9.

3 Eu, se fosse funcionário do Banco do Brasil, da Caixa e do BNDES, eu (1) ficaria com três pulgas (2) atrás da orelha (3). “Me, if I were a Banco do Brasil, Caixa, and BNDES (public banks) employee, I (1) would be with three fleas (2) behind the ear (3).” Note: We use a literal translation, to better clarify the use of the gestures. A similar expression in English would be: “to smell a rat” (three rats).

Figure 9. Multimodal Semiotic Blending applied to Dilma’s speech from 2014 (Sequence #3)

In this political-electoral debate, Dilma uses a classic argumentative structure (If P, then Q) and uses a metaphor to compose that structure: If I was from the public bank staff, then I would have three fleas behind the ear, to denounce the fact that the PSDB, Aécio’s party, is known for its privatizations. Based on the statements made by Aécio’s future Minister of the Economy, Dilma predicts the opponent’s strategy, foreseeing that he and his government will privatize or destroy the public banks, and accuses him of this. When issuing her warning that, according to her, the public bank staff should be aware of the imminent risks, Dilma verbally-gesturally stages the metaphor Being very suspicious is being with three fleas behind the ear, and depicts it by pointing to her ear and, afterwards, pointing up and down in the television viewer’s direction. It is possible to identify some vital relations that compress the metaphorical process, such as: analogy [suspicion = fleas behind the ears], cause-effect [Aécio’s government → privatization and destruction of public banks], and identity [PSDB = privatization party].

4.4. Aécio’s sequence from the 2014 debate (Sequence #4)

The following sequence, also from the 2014 debate, is a reply from Aécio concerning Petrobras. At first, he is ironic about Dilma’s answer that Petrobras is a very solid company. Next, he denounces the enterprise’s mismanagement and corruption, as demonstrated below, and finally he presents measures for Petrobras’ professionalization. The example is shown and analyzed in Figure 10.⁴

Figure 10. Gestures and descriptions of Aécio’s speech from 2014 (Sequence #4)

The company (Petrobras) left the economics section
Descending gesture, with both hands, palms facing each other, on the left side of the body

to be published in the police report pages
Repeated descending gesture, with both hands, palms facing each other, on the right side of the body

4 Aécio: Ela (a Petrobras) deixou as páginas econômicas (1) para frequentar as páginas policiais (2). “The company (Petrobras) left the economics section (1) to be published in the police report pages (2).”


This sample can be considered as illustrating the metaphor time is space, unfolded in the specific gestural metaphors The past is on the left and The present is on the right. As indicated by Calbris (2008) and Casasanto and Jasmin (2010), in many cases past situations are located on the left and present/future situations on the right, reproducing the direction of writing in Western societies. Aécio draws the television viewer’s attention to Dilma’s terrible economic management, comparing Petrobras’ good development in the past to the corruption within the company in the present. The repetition of the second gesture is a mechanism to highlight the corruption problem. Using the MSB model, the analysis in Figure 11 can be given.

Figure 11. Multimodal Semiotic Blending model, applied to Aécio’s speech from 2014 (Sequence #4)

In this communicative situation, the political-electoral debate, Aécio denounces Petrobras’ precarious situation, based on the press accusations that the company was targeted for mismanagement and corruption. From the point of view of the discursive strategies mobilized by the participants, Aécio metaphorically describes Petrobras’ situation as going from x to (the crime section). Through the use of this metaphor, the candidate compares the good place where Petrobras used to be with the bad place it is now. He depicts this comparison through a spatial opposition (left = past vs. right = present). It is possible to identify some vital relations that compress the metaphorical process, such as: analogy [petroleum = winning ticket], space-time [right = future vs. left = past], cause-effect [Dilma’s government → mismanagement and corruption problems], and change [Petrobras was in the economics section → Petrobras is in the police report pages].

5. Discussion and conclusions

As a political-electoral debate is subject to communicational rules that restrict the candidates’ argumentation in time and form, its discursive process is strategically performed through multimodal metaphors that compress the argumentation, producing experiential gestalten. These make it more efficient in interactional terms. Therefore, in this context, metaphors work as compression mechanisms of the argumentation expressed in speech turns.

As stated in Section 2, the hypothesis was that of an inverse correlation between degree of conventionality and degree of compression in the metaphors analyzed: the more conventional the metaphors are, the lower their degree of compression and the lower their degree of metaphoricity. Conversely, the less conventional they are, the higher their degree of compression. As shown in the analyses in Section 4, leaving x and going to (#4) is very conventional and hence less compressed. By contrast, making foam (#2) and being a winning ticket (#1) are on the novel and highly compressed end of the continuum. Somewhat intermediate is the case of three fleas behind the ear (#3). Regarding the metaphorical compressions promoted by the vital relations, (re)presentation and analogy could be seen to be pervasive in all the samples. Space-Time and Identity-Change are used to compare the parties’ policies and mostly to undermine the opponent’s party and credibility.

In this chapter, we have discussed increasingly complex models of the use of metaphor in communicative exchanges, from the original CMT to our MSB model. Using the latter, the analysis of the multimodal metaphors employed by the candidates in our material enabled us to discuss strategies for conveying particular values, and for achieving specific perlocutionary effects.
The performed analysis enabled us to validate claims made in both cognitive linguistics and cognitive semiotics that our conceptual system is broader than our linguistic system. Further, the analysis of the interrelation between the verbal and the gestural factors allows us to overcome the criticism of circular reasoning in earlier metaphor theories: “Verbal metaphorical expressions are evidence of conceptual metaphors …. We know that because we see conceptual metaphors expressed in language” (Cienki 2008: 16). Regarding our model of Multimodal Semiotic Blending, we believe that this extension of earlier models allows adequate analysis of multimodal resources, such as gestures and prosody, in which the enunciation is systematically anchored (Auchlin 2013), and indicates how the represented shared meaning is constructed. In other words, the unfolding of the Presentation Space allows for the integration of the abstract conceptual representation together with the sensorimotor experience of the enunciation’s material characteristics. Consequently, it allows us to show in detail how each modality works, how they interact, and jointly lead to emergent metaphors. We maintain that the MSB could therefore serve as a useful tool within cognitive semiotics.

Marco Bagli

Chapter 22 “A Light in the Darkness”: Making Sense of Spatial and Lightness Perception

1. Introduction

This chapter reveals and confirms the implicit association between space and lightness by looking at the correlation between these two domains in two narratives (Demian, by Hermann Hesse, and The Rocky Horror Picture Show, henceforth TRHPS, by Jim Sharman). In both texts there is a strong association between the two domains, used to create meaning in the unfolding of the plot. Space is described in terms of lightness, and lightness is given a spatial connotation, following a seemingly recurring pattern where the spatial semantic categories of in and out are associated with the perceptual categories of light and dark. Both kinds of association are possible, as both are experientially motivated: the implicit association of in with light and out with dark and, vice versa, the association of in with dark and out with light. For instance, the first association (in with light and out with dark) is motivated by an experience such as a lit room at night, when it is light inside and dark outside. It is not uncommon, though, for the opposite situation to be activated (in with dark and out with light). This can be the case in a daytime experience of looking out of a room into the sunlight, or in a tunnel, where we are in the dark and can see the opening of light at the end.

The impetus for the present research stems from an apparent lack of understanding of the linguistic interplay between space and lightness perception (see Sandford 2011 for an exception), even though the topic of spatial conceptualization has been thoroughly discussed. In the following, I provide the theoretical background for my findings (Section 2), before proceeding to a discussion of the texts analyzed (Section 3) and the Implicit Association Test (Section 4), which offers a conceptual explanation for the literary mechanism employed. In Section 4.3 I provide a discussion of the results, and conclusions are given in Section 5.

2. Theoretical background

The theoretical concepts I describe here, to facilitate an understanding of the research paradigm this study follows, include spatial and lightness understanding through image schemas, conceptual metaphors, and blending theory. Spatial cognition plays a fundamental role in our everyday lives, not only in our navigation through space, but also in our thinking and reasoning. Looking at spatial semantics may unravel the conceptual structure at its base (see Zlatev 2007c; Evans and Chilton 2010).


In order to understand the spatial relations that are conceptualized through metaphor, it is important to describe the role of image schemas and the definition of image schema that I adopt. Johnson (1987: 2) characterizes an image schema as “a dynamic pattern that functions somewhat like the abstract structure of an image, and thereby connects up a vast range of different experiences that manifest this same recurring structure”. Image schemas serve the purpose of structuring our experiences into meaningful units, but also function as supportive structures for our thought and language (Oakley 2007). For example, consider the following excerpt, taken from Johnson’s book The Body in the Mind, as an example of occurrences (of linguistic manifestations) of the container image schema:

You wake out of a deep sleep and peer out from beneath the covers into your room. You gradually emerge out of your stupor, pull yourself out from under the covers, climb into your robe, stretch out your limbs, and walk in a daze out of the bedroom and into the bathroom. (Johnson 1987: 30)

This is only a limited selection of the many ways in which the container image schema helps structure our everyday experience of the world. It is applied not only to physical containers, like the bedroom or the bathroom, but also to metaphorical ones, like sleep. This image schema is just one of the many identified in the cognitive linguistic literature (others are: paths, links, forces, balance, up-down, front-back, part-whole, center-periphery, etc.). Although the original concept of image schema, as illustrated above, is mainly rooted in bodily motion and physical interaction, Forceville and Renckens (2013) also consider the light/dark pairing as an image schema. Lightness perception is a basic experience, and the contrast light/dark is common in our everyday lives. We take light and dark for granted, and for this reason we may overlook their importance in metaphor realization. In actual utterances, though, it is clear how this perception motivates language, as demonstrated by the pervasiveness of conceptual metaphors involving light/dark as their source domain. Moreover, light, which also causes shadow, gives us vital information in our visual interpretation of space.

As is well known, one of the principal tenets of Conceptual Metaphor Theory (CMT) is that metaphor is not only a tool used in rhetoric (as previously thought), but a cognitive device that structures our cognition of the world around us: “the essence of metaphor is understanding and experiencing one kind of thing in terms of another” (Lakoff and Johnson 1980: 5). Thus, for example, the domain of love is structured in English with the help of the source domain of journey (Lakoff and Johnson 1980: 44–5):

(1) love is a journey
a) Look how far we’ve come.
b) We’re at a crossroads.
c) We’ll just have to go our separate ways.
d) We can’t turn back now.
e) I don’t think this relationship is going anywhere.

Making Sense of Space and Light


f) It’s been a long, bumpy road.
g) This relationship is a dead-end street.
h) We’re just spinning our wheels.

A conceptual domain may be defined as the “conceptual representation, or knowledge, of any coherent segment of experience” (Kövecses 2010: 324). Conceptual metaphors (henceforth CM), however, do not use all the knowledge stored in a single domain. Only some elements are selected, which map knowledge from the source domain onto the target domain. This set of relationships is called a source-target mapping and is exemplified in Table 1, with reference to example (1).

Table 1. Source-target mapping for the love is a journey metaphor (Evans and Green 2006: 295)

Source: journey               Target: love
[…]                           love relationship
[…]                           events in the relationship
distance covered              progress made
obstacles encountered         difficulties experienced
decisions about direction     choices about what to do
destination of the journey    goals of the relationship

A considerable amount of research has been produced in the CMT framework, accounting for many metaphorical realizations, not only in language, but also in visual arts, and other aspects of our everyday lives. Some linguistic examples of conceptual metaphor regarding the source domains of light and seeing are: intelligence is a light source (He is very bright; He can always shed light on the problem); hope is light (She has bright hopes); impediments to awareness are impediments to seeing (I was in the dark for a long time). The opposition of light/dark is also used in conceptual metaphors that have a moral connotation: thus giving rise to goodness is light (Look on the bright side!) and badness is darkness (The dark side of the force).1 However, the notion of CM is not sufficient in accounting for more complex examples such as that given in (2), following Grady, Oakley and Coulson (1999: 103). (2) This surgeon is a butcher.

1 All of the CMs and examples of their linguistic realization have been taken from the Master Metaphor List, available online at metaphor/METAPHORLIST.pdf.



A mapping from a source domain (butcher) onto a target domain (surgeon) cannot account for the negative assessment in (2), which can be considered as an emergent structure in our understanding of the sentence. There is no element in either source or target domain that actually has a negative value on its own; yet (2) is understood as being derogatory towards the surgeon. Blending Theory (henceforth BT), as proposed by Fauconnier and Turner (2002), is a model that can account for such more complex cases of metaphorical understanding. In BT the basic units of cognitive structure are mental spaces, not conceptual domains (as in CMT). The two differ in that mental spaces are “a short term construct informed by the more general and more stable knowledge structures associated with a particular domain” (Grady, Oakley and Coulson 1999: 102). In BT two mental spaces, referred to as input spaces, are connected via cross-space mappings. The third element in BT is the generic space: it maps knowledge onto each of the inputs and contains knowledge shared by the two input spaces, thus enabling the mappings to take place. The fourth space is the blended space: it is the result of the blending process, and it contains an emergent structure, i.e. knowledge contained in neither of the inputs. Figure 1 contains a simplified, schematic representation of BT as applied to (2).

Figure 1. The surgeon as butcher blending model (adapted from Evans and Green 2006: 406)



The first input space is the surgeon space, while the second is the butcher space. The two share the general, abstract knowledge of “agent” (a surgeon in Input 1, and a butcher in 2), the knowledge of “undergoer” (a person in Input 1, an animal in 2), the knowledge of “work space” (“the operating room” and “the abattoir”). Both have a goal (“healing” in one space, and “severing flesh” in the other), and a means (“surgery” and “butchery”). This abstract knowledge is shared in the generic space. In the blended space though we have an emergent structure, namely “incompetence”, as a result of the mismatch between the goal (“healing”) and the means (“butchery”) performed by a surgeon. In this chapter, BT is used to describe what happens in the dominant conceptualization of light/dark and in/out. Before explaining the model, I discuss two texts that express this conceptual construal: the conceptualization of an experience communicated by a speaker for the understanding of a hearer (Croft and Cruse 2004).
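The surgeon-as-butcher network described above can be sketched as a small data structure. This is only an illustrative sketch under stated assumptions: the role labels follow the prose description of Figure 1, while the function names `generic_space` and `blend` are hypothetical, and the choice of which elements are projected into the blend is hard-coded rather than derived. The emergent evaluation (“incompetence”) is not computed; the sketch merely exposes the goal/means mismatch from which it arises.

```python
# Illustrative sketch only; role labels follow the description of Figure 1.
SURGEON = {
    "agent": "surgeon",
    "undergoer": "person",
    "work space": "operating room",
    "goal": "healing",
    "means": "surgery",
}
BUTCHER = {
    "agent": "butcher",
    "undergoer": "animal",
    "work space": "abattoir",
    "goal": "severing flesh",
    "means": "butchery",
}


def generic_space(input1, input2):
    """Return the abstract roles shared by both input spaces."""
    return sorted(set(input1) & set(input2))


def blend(input1, input2, from_input2=("means",)):
    """Assumed selective projection: keep input1, take the listed roles from input2."""
    blended = dict(input1)
    for role in from_input2:
        blended[role] = input2[role]
    return blended


blended = blend(SURGEON, BUTCHER)
# The pairing of a healing goal with butchery as means is the mismatch
# that the chapter identifies as the source of the emergent structure.
mismatch = (blended["goal"], blended["means"])
print(generic_space(SURGEON, BUTCHER))
print(mismatch)
```

The generic space is modelled here simply as the set of role names common to both inputs, which is a deliberate simplification of the notion used in BT.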

3. Analysis of the texts 3.1. Demian Demian is a Bildungsroman, first published in 1919, written by the German writer Hermann Hesse. It is the story of a young boy, Emil Sinclair, and of the people and experiences that contribute to his growth into a man. From the beginning of the novel, Hesse makes a clear-­cut distinction between two different, separate worlds that coexist in the book and through which the main character moves. These two worlds are defined in terms of Lightness and Darkness, as illustrated in example (3). The moral connotation of this Manichean division of the world into two halves according to the religious, Christian cultural background becomes evident in (4). (3) Two worlds intermingled there; from two opposite poles came the day and the night. (p. 3) (4) What Demian had just said about God and the Devil, about the godly official world and the hushed-­up devilish world, was precisely my own idea, my own myth: the idea of the two worlds, or half-­worlds, one of light and one of darkness. (p. 49)

The two worlds are not only characterized by Lightness or Darkness, but also identified in terms of spatial relations. Emil’s parental home in (5) and (6) is depicted as a source of light, in that it shines radiance and clarity. (5) One world was the parental home, but actually it was even narrower – in truth it contained only my parents. On the whole I knew this world well: its name was Mother and Father, it was love and strict rules, education and example. What belonged to this world was gently shining radiance, clarity, and cleanliness; quiet friendly conversation; washed hands, clean clothes, good behaviour. (p. 3, my emphasis)



(6) Back home! Oh, the good, the blessed return to our house, to brightness and peace! (p. 8, my emphasis)

Conversely, the dark world (7) is seen as surrounding the house and being everything that is not light. (7) All these beautiful, horrible, wild, cruel things existed all around – in the next street over, in the house next door. Policemen and beggars ran around, drunks beat their wives, gaggles of girls poured out of the factories after work, old women could cast a spell on you and make you sick, bands of robbers were living in the forest, arsonists were being caught by the country police – this powerful second world welled up everywhere, its scent was everywhere, except in our rooms where Mother and Father were. (p. 4, my emphasis)

As the plot develops, Emil seems to have forgotten this clear distinction of the two worlds, especially after having met Demian, who seems a sort of “dark doppelganger” of the main character. However, after a night out with his friends, Emil is hungover and describes his feelings as follows in (8) and (9): (8) Between headache, nausea, and unspeakable thirst, an image rose up in my soul that I had not seen for a long time: I saw my parents’ house, my hometown, Father and Mother, my sisters, the garden; I saw my quiet, comfortable bedroom, the school and the market square, saw Demian and our confirmation classes – all of it flooded with bright light, radiant, all of it wonderful, godly and pure, and I now knew that everything, everything had still belonged to me the day before, just a few hours ago had been waiting for me to return, but now, only now in this moment, it had sunk forever under the waves, was cursed, was no longer mine. It had thrown me out and now looked upon me with disgust! (p. 58, my emphasis) (9) Once again I belonged entirely to the dark world – to the devil – and in that world I was considered a splendid fellow. (p. 58–9, my emphasis)

Emil’s description of his feelings towards his family is made particularly vivid by the number of metaphors and images based on the light/dark and in/out opposition, which are consistent with the metaphor and image schemas of the whole text: light has a positive value and is portrayed as radiating from the inside of a house into a scary, dangerous and dark world outside of it.

3.2. The Rocky Horror Picture Show

The second narrative is The Rocky Horror Picture Show. It first appeared on screen in 1975, and its popularity has grown constantly since then, granting it the status of a cult movie. Moreover, the original musical has been staged in many theatres worldwide since 1973. It tells the story of Brad and Janet, a newly engaged couple, who are driving through a dark wood to reach a friend’s house. Unfortunately, their car breaks down during a storm in the middle of the journey. While they are wandering around looking for help, they end up at Dr. Frankenstein’s castle, where



a flamboyant transvestite scientist gives life to a muscular, beefy creature named Rocky Horror. The little accident during the route also reflects on their relationship, as if it were a metaphorical journey. Although the references to Lightness and Darkness are pervasive throughout the whole film and the lyrics of the songs, I concentrate on just one particular scene at the beginning of the movie. It is the scene in which Brad and Janet come across Dr. Frankenstein’s castle for the first time.2 It is late at night, and the couple is caught in an ominous storm. As if it were not enough, a tire blows out, so the couple abandon their car in the middle of the forest and start walking toward the nearby castle. As soon as they enter the garden of the estate, Janet starts singing the lyrics in (10). (10) In the velvet darkness, Of the blackest night, Burning bright, There’s a guiding star, No matter what or who you are.

There’s a light (Over at the Frankenstein place) There’s a light (Burning in the fireplace) There’s a light, light. In the darkness of everybody’s life.

As soon as the actress reunites with her companion and as they start singing the verse “There’s a light” the castle appears on the screen. The interior of the building is illuminated, as opposed to the exterior that is clearly dark. Light therefore seems to shine from the inside of the castle. The director gives importance to this detail by zooming in towards the dome. Lightness is depicted as coming from inside the building, thus superimposing a container image schema onto the light image schema.

3.3. Blended space

In both texts there is a similar understanding of Lightness. In Demian, the parents’ house imposes a container image schema onto the light image schema, while in TRHPS the castle serves this purpose. The association between the two buildings3 and Light is relevant to the understanding of the plot via elaboration of a blended space, as illustrated in Figure 2. The first input space is the building space, which clearly has a spatial dimension: it is structured on a container image schema, and therefore has an in/out orientation. Both in (5) and (6) and in (10), the inside of the building is depicted as being illuminated. The second input space is represented by the light/dark input space. It is not characterized per se by a spatial dimension, but it provides important information in our understanding of space, like shape or distance. It can be considered as an image schema on its own (see Section 2), and its main feature is that of illuminating. The generic space contains image schemas, the understanding of spatial orientation and the perception of lightness. When in the book (see ex. 3–9) and in the film and related song (ex. 10) the two input spaces (the building space and the light/dark input space) are associated with one another, they are conceptually compressed, thus creating the blended space the inside of the building is good. The emergent structure in this blend is the positive value attributed to the interior of the buildings. In both narratives the main characters (and the audience) understand the interior of the buildings as having a positive value: in Demian (see examples 8–9) Emil describes himself as “thrown out” into “the dark world – to the devil” from the inside of his parents’ house, where everything was “flooded with bright light, radiant, […], wonderful, godly and pure”. Similarly, in TRHPS, Brad and Janet are persuaded to enter the house notwithstanding the danger (made explicit by a sign on the gate) by virtue of the light shining from inside the dome, as suggested by the song they sing. The blend created is a double-scope network (Fauconnier and Turner 2002), in that its inherent conceptual structure is built with characteristics of both input spaces.

Figure 2. The interior of the building is good blending model

2 This corresponds to scene 24 in TRHPS script available online at http://www.imsdb.com/scripts/Rocky-Horror-Picture-Show,-The.html.
3 For the aim of the study I consider both the parents’ house and Dr. Frankenstein’s castle under the same superordinate label of “building”. The Blending Model proposed here aims to account for metaphorical realizations in both texts.



Moreover, in TRHPS, the emergent structure is used to create irony. The two characters (and the audience) are led to expect the inside of the castle to be “good” and to be a safe shelter for the night during the storm. This is due to the elaboration of the blend that occurs in the understanding of this scene. However, as soon as they enter, this expectation is overturned, as anticipated by the narrator’s rhetorical question in (11).

(11) And so – after braving the inclement weather, and some not too little time – it seemed that fortune had smiled on Brad and Janet and that they had found the assistance that their plight required – or had they? – There was certainly something about this house (to which a flat tyre and a wet night had brought them) that made the both of them uneasy – but, if they were to reach their destination that night, they would have to ignore such feelings and take advantage of whatever help was offered.

The director deliberately violates our metaphorical understanding of the scene to create irony and a surprising effect around which the whole film revolves. Eventually, the two characters will experience self-­emancipation from the stereotypes and difficulties they had at the beginning of the film. This change can be considered as a sophisticated way in which the director restores the “violated” emergent structure of the blended space.

3.4. Discussion Both the house in Demian and the castle in TRHPS are depicted as radiating “light” or “goodness” in the respective narratives. This quality of the two buildings is used to create meaning in the unfolding of the two plots, via elaboration of the blended space that is represented in Figure 2. The conceptual metaphor that motivates the emergent structure is the good is light metaphor (Forceville and Renckens 2013). It may be regarded as a primary metaphor, as it expresses the correlation of the two basic domains in experience (Grady 1997). In many cultures, day and night represent good and evil respectively, in a symbolism that “probably goes as far back as the history of man” (Arnheim 1974 [1954]: 324). This positive value attached to light makes biological sense, in that sight is restricted in the dark, making one more vulnerable to potential threats and dangers (Forceville and Renckens 2013). In this sense, the construal of light as good and dark as bad may be seen as universally motivated, whether or not this motivation is conventionalized in a particular language/culture or not (see Zlatev 2011). Hence, the Judeo-­Christian tradition (which functions as a background to the texts analyzed) can be seen as one possible cultural elaboration of this primary metaphor, which dictates the emergent structure in the blend. This of course does not mean that the domains of good and bad are universally conceptualised in terms of light and dark nor, vice versa, that the domains of light and dark structure only the domains of good and bad. Crucially, the emergent structure in the blend allows for the understanding of the interior of the two buildings as having a positive connotation. Light is given a



spatial connotation as it emerges from the interior of the buildings from which it shines, while the spatial orientation of in/out is given a moral connotation, i.e. in has a positive value, whereas out has a negative one. These observations and analyses are consistent with the results described in the following section.

4. The Implicit Association Test

The Implicit Association Test (henceforth IAT) (Greenwald, McGhee and Schwartz 1998) is a testing procedure for assessing implicit attitudes towards pairs of targets in memory. Since its introduction, its relevance and usage have increased steadily, not only in social psychology (for which it was originally conceived), but also in other fields, such as cognitive semantics (Sandford 2011, forthcoming), as a tool to understand the implicit association strength between pairs of semantic categories. In the original study, the test assessed participants’ implicit attitudes towards the categories of Flowers and Insects by having them associate these with the categories of Pleasant and Unpleasant. The overall, schematic structure of such an IAT is presented in Table 2.

Table 2. Flower-Insects IAT overview (adapted from Greenwald et al. 1998: 1465).

Block    Left key assignment (E)    Right key assignment (I)
1        flower                     insect
2        pleasant                   unpleasant
3        flower, pleasant           insect, unpleasant
4        flower, pleasant           insect, unpleasant
5        unpleasant                 pleasant
6        flower, unpleasant         insect, pleasant
7        flower, unpleasant         insect, pleasant
In block 1, the participants are asked to categorize each stimulus presented on the screen by hitting the corresponding response-key (usually E as the left response-key and I as the right response-key). Block 2 is similar to block 1, but the other two categories are practiced. In blocks 3 and 4, the two sets of categories are combined: thus Flower and Pleasant are associated with the left response-key, while Insect and Unpleasant are associated with the right response-key. In block 5 the categories of block 2 switch response-keys (i.e. Unpleasant is assigned to the left response-key and Pleasant to the right response-key).



Blocks 6 and 7 are similar to block 3 and 4, but the Pleasant and Unpleasant categories switch response keys. Blocks 3, 4 and 6, 7 are called critical blocks, because the two categories are overlapping and here is where the discrimination takes place. Furthermore, they are divided into compatible critical block, when the categories sharing the same response-­key are implicitly associated (according to the researcher’s hypothesis), and incompatible critical block when the categories sharing the same response-­key are not implicitly associated. The choice of which association is considered compatible or incompatible is determined by the research hypothesis. Blocks 1, 2 and 5 go under the name of practice blocks, because they serve to familiarize the participants with the testing procedure. It goes without saying that the stimuli in each category need to be entrenched in the category itself to represent an easy and automatic choice. The participant must not spend too much time trying to categorize each stimulus; otherwise the results are biased by reasoning, and would not provide an actual implicit attitude. In fact, the distinctive feature of this testing procedure is its capacity to reveal implicit attitudes, beyond explicit awareness. Thus, results may sometimes seem unexpected and surprising to the participant, who may not consciously think that they have the sort of attitude that emerges from the test results. One such case is represented by the results of the IATs that deal with social prejudice, such as the black American vs. white American IAT. This experiment was motivated by results in previous tests, in which there were “automatic expressions of race-­related stereotypes and attitudes that are consciously disavowed by the subjects who display them” (Greenwald et al. 1998: 1473). 
Results of the test showed a clear preference towards the category of white American that was not supported by consciously elicited data that had been supplied through self-­report questionnaire. The IAT thus unmasked a subtle form of evaluative difference towards the two categories. The basic idea underlying the test is that the easier a mental task is, the quicker its performance. Time is therefore an essential component: performance in the critical blocks is assumed to be quicker when the two categories sharing the same response-­key are also implicitly associated. By calculating the time it took each participant to perform each critical block, it becomes clear which association between categories is stronger.
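The timing logic just described can be sketched in a few lines. This is a minimal illustration, not the procedure used in any of the studies cited: the latencies are invented, and the function name `faster_pairing` is hypothetical. It simply compares mean response times in the compatible and incompatible critical blocks.

```python
from statistics import mean


def faster_pairing(compatible_ms, incompatible_ms):
    """Report which critical-block pairing was performed more quickly on average."""
    compatible_mean = mean(compatible_ms)
    incompatible_mean = mean(incompatible_ms)
    if compatible_mean < incompatible_mean:
        return "compatible"
    if incompatible_mean < compatible_mean:
        return "incompatible"
    return "no difference"


# Hypothetical response latencies in milliseconds, invented for illustration.
compatible_ms = [620, 580, 640, 600]
incompatible_ms = [810, 790, 850, 770]

print(mean(compatible_ms), mean(incompatible_ms))
print(faster_pairing(compatible_ms, incompatible_ms))
```

On the reasoning given above, a quicker mean performance in the compatible blocks would be read as a stronger implicit association between the categories that share a response-key in those blocks.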

4.1. The in-out / light-dark IAT

Each of the categories for the IAT developed for this study was composed of eight lexical items, all of which were selected through corpus analysis of dictionaries, and with the aid of some pre-tests that I had conducted to verify that each item was clearly attributable to its category. Each item functioned as a stimulus in the IAT. The categories were composed as illustrated in Table 3. The container image schema structured the prototypical meanings of many of the stimuli representing the spatial categories of in/out (internal, entering, including, joining; external, exiting, excluding, removing). The remaining stimuli (absorbing, focusing, collecting, beginning; extending, completing, distributing, ending) were selected by consulting the Particles Index in the Phrasal Verbs Dictionary (Collins’ Cobuild Phrasal Verbs Dictionary, Particles Index, 2011 [2002]: 20–4, 35–41). In this index, the authors provide a list of meanings that prepositions can give to a verb root when combined in a phrasal verb construction. So, for example, the preposition “in” can give the meaning of joining in a phrasal verb like “to get in”. Therefore, joining was chosen as a stimulus representing the category in. The stimuli for the categories of Light and Dark were taken from Sandford (2011), where the author investigates the linguistic categorial association of light-dark and near-far. This choice allows for a direct comparison between the construal of near-far and in-out with the lightness categories. Some of the stimuli share a direct reference to light and dark (brightness, luminosity, day; shadow, obscurity, night); others were chosen according to the mappings of the CM knowing is seeing (knowledge, clarity; ignorance, secrecy); and the remaining stimuli (red, yellow, white; green, blue, black) are the basic color terms representing the bipolar warm-light and cool-dark hue dimension.

Table 3. The categories for the in-out / light-dark IAT

in            out             light         dark
internal      external        brightness    shadow
entering      exiting         luminosity    obscurity
including     excluding       day           night
joining       removing        knowledge     ignorance
absorbing     extending       clarity       secrecy
focusing      completing      red           green
collecting    distributing    yellow        blue
beginning     ending          white         black
The test was carried out at the Umbra Institute in Perugia (Italy), between November and December 2012. Thirty-one native speakers of English, with an average age of 21, participated. Participants were reassured about the anonymity and the nature of the test, which is not a pass-or-fail test. The test was carried out on a laptop in an artificially lit room. A block in which the categories of in and light shared the same response-key (and likewise out and dark) was considered compatible. Conversely, a block was considered incompatible when the categories of in and dark shared the same response-key (and likewise out and light).



4.2. Results and discussion

The revised scoring algorithm of Greenwald et al. (2003) was used for the analysis. According to this scoring procedure, the IAT results proved to be consistent with the initial hypothesis: the category of in was strongly associated with that of light, for the group as a whole and for all individual participants. Furthermore, 71% of the participants associated the category of in with that of light with a strong automatic preference.4 The association that emerged from the analysis of the texts in Section 3 thus seems to be supported by the results of the IAT. This association suggests an intrinsic positive value of the spatial relation in as opposed to out. This preference for the association of in with light is coherent with CMs such as the visual field is a container (Lakoff and Johnson 1980: 30), knowing is seeing and ignorance is darkness. If we see something, this is because it is in our visual field. But for us as human beings, to see something requires light. Only when our visual field is illuminated are we able to perceive what is in it, thus correlating the experience of being in the visual field with that of light. Conversely, what is not in our visual field, or is out of it, is unknown, thus allowing the structuring of the domain of ignorance in terms of darkness. However striking these initial results are, further tests should be conducted to confirm this dominant default association.
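How a D value relates to the preference labels reported in footnote 4 can be illustrated with a short sketch. It rests on several assumptions: `d_score` is a simplified, D-like measure (the difference between the mean latencies of the incompatible and compatible blocks, divided by the standard deviation of all latencies in both blocks) and omits the additional steps of the revised scoring algorithm of Greenwald et al. (2003); the latencies are invented; the function names are hypothetical; and the thresholds are applied to the magnitude of D with lower bounds treated as inclusive, which the footnote does not specify.

```python
from statistics import mean, stdev


def d_score(compatible_ms, incompatible_ms):
    """Simplified D-like measure: mean latency difference over the pooled SD."""
    pooled_sd = stdev(list(compatible_ms) + list(incompatible_ms))
    return (mean(incompatible_ms) - mean(compatible_ms)) / pooled_sd


def preference_label(d):
    """Map a D value onto the labels of footnote 4 (boundary handling assumed)."""
    size = abs(d)
    if size < 0.15:
        return "little to no"
    if size < 0.35:
        return "slight"
    if size < 0.65:
        return "moderate"
    return "strong"


# Hypothetical response latencies in milliseconds, invented for illustration.
compatible_ms = [620, 580, 640, 600]
incompatible_ms = [810, 790, 850, 770]

d = d_score(compatible_ms, incompatible_ms)
print(round(d, 2), preference_label(d))
```

A positive value in this sketch means that the compatible pairing (in with light, out with dark) was performed more quickly than the incompatible one.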

4 According to the IAT script, D values lower than 0.15 indicate “little to no” preference; values between 0.15 and 0.35 indicate “a slight” preference; values between 0.35 and 0.65 indicate “a moderate” preference; values higher than or equal to 0.65 indicate “a strong” preference.

5. Summary and conclusions

This chapter discussed the interplay between concepts of space and light. I have proposed an analysis of two different texts (Demian by Hermann Hesse and The Rocky Horror Picture Show by Jim Sharman) in which the association of light with the interior of two buildings (a house in Demian, and a castle in TRHPS) is exploited in the unfolding of the plot. In both cases, the creation of meaning is granted via the elaboration of a blending process that results in a blended space (see Section 3, Figure 2) where a positive value is given to the interior of the buildings. In Section 4, I presented the results of an Implicit Association Test developed to assess the implicit association that occurs between the spatial categories of in and out and the perceptual categories of light and dark. While the results were elicited within a specific group of English speakers, the unanimity and the consistency with which the participants associated in with light were striking. All of the participants associated the category of in with that of light, thus suggesting the psychological reality of the association observed in the texts. Further tests should be conducted to assess whether this association holds in different cultures and languages. Similar results would be consistent with the hypothesis that the motivation for this association is to be found in our embodied cognition and perception. The nature of human beings does not allow us to see in the dark or during the night, thus exposing us to potential threats and perils. This experience can potentially be at the basis of the positive value attributed to light, as suggested by Forceville and Renckens (2013). The status of being “in” something, and therefore of being contained, can also be seen as a protected status against potential risks and dangers coming from outside. This experience can give in a positive value. These shared positive values appear to be the basis for the correlation between in and light. Of course, one can imagine a reverse situation in which in has a negative value, because the container is a bad thing, and out of it has a positive value, e.g. in trouble, out of jail. Still, the results of the test revealed a default association of in with light (and out with dark), with in therefore being more likely to have a positive value, at least for English-speaking participants. The creation of meaning by human beings is a highly dynamic and complex process that involves multiple steps and devices, regulated by imagination:

Imagination is at work, sometimes invisibly, in even the most mundane construction of meaning, and its fundamental cognitive operations are the same across radically different phenomena, from the apparently most creative to the most commonplace. These operations are characteristics of the human species. Though taken for granted by human beings, they are extraordinary by any other standard. (Fauconnier and Turner 2002: 89)

The cognitive devices employed to construe the meaning described in this chapter are image schemas, conceptual metaphor, and conceptual blending. The account of their interaction presented in this chapter may help us understand how we construe the meaning of lightness in relation to space. In its topic and methodology, the study represents a bridge between different disciplines (cognitive linguistics, literary studies, film studies, and semiotics), and thus engages in one kind of cognitive semiotics.

Acknowledgments

Firstly, this chapter could not have been written without the participants in the Implicit Association Test and the staff at the Umbra Institute, who have always been very helpful and cooperative. Secondly, I would like to thank the reviewers and Jordan Zlatev for their illuminating comments on the first draft of the chapter. Lastly, I wish to express my gratitude to Prof. Jodi Sandford for her invaluable help and guidance throughout these years of working together.

Katherine O’Doherty Jensen

Chapter 23 Performative Metaphor in Cultural Practices

I cannot hear what you are saying – your actions speak too loud.
Chinese proverb

1. Introduction I draw attention to a cognitive operation that has been overlooked, but nevertheless has pervasive functions in the constitution of human cultures. This operation concerns the capacity to discern and align gradient differences between entities, such that quite different entities – including affects, actions, events and material objects – are treated as analogues of each other. It mediates the performance of signifying practices and the communication of un-­verbalized meanings in many spheres of everyday life. Natural languages express and mediate discernments of categorial distinctions between one kind of entity and another. They also mediate the shared appreciation of gradient distinctions, that is to say, “more-­or-less” differences – often expressed by modifiers such as very, extremely, almost, including gradable adjectives such as younger, higher, closer. Categories are of course also distinguished non-­verbally. Just as a gesture can signify “good-­bye”, so manners of dress can signify gender difference, while one composition of foods signifies “breakfast” and quite a different composition signifies “dinner”. Sometimes the point is made that categorial distinctions tend towards ambiguity when communicated non-­verbally, compared with the relative precision of language. The point under consideration here, however, is that non-­verbal communication differs from language insofar as it centrally concerns the display, expression and discernment of gradient rather than categorial distinctions. Moreover, the shared appreciation of gradient distinctions can become entrenched in ways of doing things that make sense to practitioners, although they may find it difficult or impossible to put into words exactly what such practices mean. 
While the constitutive role of language in human culture is widely acknowledged, the shared appreciation, normative governance and cultural entrenchment of gradient differences, as expressed by humans in non-­discursive practices, are not. My objective is to redress this state of affairs. The point in question is that while non-­verbal communicative actions such as waving, bowing, hugging and kissing, are different categories of social interaction, the difference between one instantiation and another of the same kind is
quantitative and gradient in character. Even between the same interactants, a hug given in one social context is likely to be closer, more long-­lasting or more intense than that of another. Such gradient differences are discerned as being meaningful, although such meaning tends to remain implicit and is always deniable insofar as it is not put into words. Nevertheless, any given hug is located at some point on gradient continua of relative proximity, duration and intensity; it is more or less close – distant, long – short, and high – low on a scale of intensity. Familiarity with cultural norms can guide us as to whether a hug is expected in a given social context, but the challenge of appropriate performance always remains. This is the challenge of propriety, and it has been succinctly described in gradient terms and in negative form as concerning the discernment of precisely “how far is not going too far?” (Mayol 1998: 21) Appropriate performance, according to folk descriptions, “hits the mark”; it is “fitting” or “spot on”. It is never “way out”, “over the top” or “wide of the mark”. My topic concerns the cognitive operation whereby this discernment is accomplished by social actors and interactants in everyday life. Section 2 introduces the concept of performative metaphor – the objective of which, as seen from a first-­person perspective, is to act appropriately in a given social context by discerning which variant of a given non-­discursive cultural practice fulfils that intention. The concept developed here has its roots in earlier accounts of conceptual metaphor theory (Lakoff and Johnson 1980; Lakoff 1987; Lakoff and Turner 1989; Fauconnier and Turner 2002). While several media of analogical communication can be at issue in performative metaphors, including facial expression, voice tone, body movement and gesture, exemplification in the following sections is mainly focused upon uses of material objects as the medium of communication. 
Food practices serve as an excellent case in point, not only because they involve the use of objects that are often consumed or deployed in the interactive form of shared meals, but also because they are well illuminated by empirical social research. These practices also serve as an interesting point of departure for understanding non-­discursive (mimetic) practices of our species since – in one form or another – they have preceded our capacity for speech (Donald 1991). The following sections illustrate the relevance of analysing cultural practices from the perspective of a cognitive theory of metaphor that focuses upon what we do, rather than what we say. A case study is presented regarding a puzzling issue that has emerged from anthropological and sociological studies of food. It concerns differences between the eating practices of men and women and a corresponding tendency on the part of both to perceive some foods and dishes as being “masculine” and others as “feminine”. The ways in which social scientists have accounted for these differences hitherto are then compared with an account from the perspective of performative metaphor theory. I conclude by indicating some implications of this account for the field of cognitive semiotics.

2. Performative metaphor

Metaphorical thinking is at issue in what we do. Lakoff and Johnson (1980) re-vitalised this notion with the claim that conceptual metaphor is pervasive in everyday life, not just in language but in thought and action, and they concluded their argument with the suggestion that some of the metaphors we live by are partially preserved in everyday ritual activities. Psychologists have introduced terms such as enactive metaphor (Kirmayer 1992) and metaphorical act (Ogden 2002) to refer to idiosyncratic meanings as presented in gestures, and more recently much research has been devoted to examining the ways in which conceptual metaphors are instantiated in gestures and multimodal discourse (Cienki and Müller 2008). Gesture, posture, ritual activities and uses of material objects have long been conceived by anthropologists as being metaphorical in character. Douglas devoted decades of research to the observation and analysis of food practices, concluding that the way in which the British eat is as formally structured as a Bach sonata (Douglas and Nicod 1974). She explicated the structure underlying these and other cultural practices, and remained convinced that metaphorical thinking (“this code-breaking, jigsaw-puzzle solving activity of the human mind”) must be at play in what she termed the cognitive underpinnings of these practices (Douglas and Isherwood 1996 [1972]: viii). Her view was that the exact mechanisms of metaphor, comparison and social grading of events and food which make for cultural competence had not yet been established (Douglas 1984). Others have focused instead on performance in their analyses of metaphors at play in everyday interactions and social dramas (Fernandez 1986; Turner 1987), arguing, for example, that ritual is the acting out of metaphorical assertions (Fernandez 1986).
The concept of performative metaphor introduced here, however, seeks to account for the character of metaphorical thinking in what we do, quite independently of what we say. Conceptual metaphor theory (CMT) (Lakoff and Johnson 1980; Lakoff 1987), followed by conceptual blending theory (BT) (Fauconnier and Turner 2002), shifted the focus away from the idea that metaphor is a matter of talking about something in terms of something else, and towards the idea that it concerns conceptualising something in terms of something else. In this light, accounting for the character of conceptual metaphors was seen as contributing to a theory of human cognition, while language practices merely provided an important source of evidence. The account of performative metaphor presented here takes this line of thought a step further with the suggestion that metaphorical thinking is also expressed in non-discursive practices, in which something is treated in terms of something else. It follows that these cultural practices also provide important sources of evidence with regard to our cognitive operations. The type of performative metaphor under consideration here can be defined as comprising a voluntary, intentional act mediated by the discernment of the grade of one entity on a gradient continuum as corresponding to the grade of a different entity on another gradient continuum, such that one entity is treated in terms of another in practice. This is a cognitive operation each of us performs many times
in the course of the day. We do so, for example, when we peruse the contents of our implicitly graded wardrobe (ordinary – special) with a view to choosing items of clothing that correspond to the relative importance of events (unimportant – important) to be attended that day. It is precisely because a hug signifies a level of affection (little – much), the relative closeness of a relationship (distant – close) and perhaps the length of time that has elapsed since last performed (short – long) that its performance here and now can disappoint or delight, as well as deceive. An example of performative metaphor in a more formal setting is the normatively governed ritual we perform as members of an audience when a concert, opera or play reaches its conclusion. Given the gradient characteristics of the noise made by clapping – its relative level of volume (soft – loud) and its (short – long) duration – this can be performed by each member of an audience in a manner that signifies at which point on a gradient scale of relative appreciation or enthusiasm (low – high) his or her response belongs. The meaning of thunderous and long applause or barely audible and brief applause, or points between those extremes, is readily understood by all participants in a given social context as representing the audience’s level of enthusiasm and does not usually present any difficulties of interpretation. Common to both CMT and BT is the idea that metaphorical statements “make sense” when points of similarity are discerned between the elements in two conceptual domains. While Lakoff and Johnson sought to account for the process whereby metaphorical conceptualisations become entrenched in language use in ways that speakers perceive to be apt expressions of meaning, Fauconnier and Turner sought to provide a general account of meaning constructions – including metaphorical meanings. 
CMT remained closer to the more traditional accounts of metaphor in linguistic theory by conceiving the generation of meaning as being unidirectional, that is to say, as comprising the projection of image structure, conceptual structure and inferences from a “source” domain to a “target” domain. They departed from the traditional view by proposing that conceptual metaphors not only inform our way of thinking, but also underlie our manner of expressing ourselves – even when we do not put them into words in a direct fashion. For example, we do not say IDEAS ARE FOODS (Lakoff and Johnson 1980: 46–7). Nevertheless, common speech patterns clearly indicate that we frequently do employ images and concepts drawn from the source domain of food and project these onto the target domain of ideas when we want to express something about the character of the latter domain. Thus, we “plant the seeds” of one idea and “give birth” to another. Ideas “crop up”, “grow” and may prove “fruitful”. We academics work within “fields”, and while some of our students “devour” our ideas others find them “hard to swallow” or difficult to “digest”. It transpires that images and concepts drawn from the entire cycle of food production, distribution, consumption and waste disposal can in fact be identified in the ways we talk about ideas in practice (O’Doherty Jensen 2003). Evidence of this kind lends credence to the view that the main function of conceptual metaphors is that of facilitating the process of conceptualisation. That is to say, we use the images and concepts of a domain that is familiar to us in order to characterise a domain that is more abstract or less familiar.

In BT, the distinction between conceptual domains as constituting respectively the “source” and “target” of conceptualisation is seen as one kind of meaning construction among others. Drawing on mental space theory (Fauconnier 1994, 1997), each such domain is described instead as comprising a number of elements (images, concepts) as well as the relations between these elements. When a cluster of this kind is activated in our minds, it is said to occupy a mental space, which can then serve as an “input” to an on-­going construction of meaning. The operation whereby two or more such inputs are compared, and counterpoint connections between their contents are discerned, is termed cross-­space mapping. According to this account, meanings are constructed within a conceptual integration network, which – apart from input spaces – includes two more mental spaces, termed respectively the generic and blended space. The generic space contains a conceptual or image structure that is common to the input spaces and to the blended space. The blended space is one in which some elements from at least two inputs are combined and often further developed. The contents of this space are referred to as a blend, which is said to have an emergent structure insofar as these contents are not found in either input space and can be further developed. A relatively new term such as same-­sex marriage serves to exemplify the character of a double-­scope blend, in which images, conceptual structure and inferences are selectively projected from two inputs (Fauconnier and Turner 2002: 134, 269–71). Cross-­space mapping identifies elements that are common to two inputs concerning marriage and same-­sex partnership respectively (partners, love, sex, living together), and these elements comprise generic and blended spaces. 
Selective projection then recruits additional structure from each input to the blended space (legal status from the marriage input and same-sex couple from the other input) yielding a concept of same-sex marriage. Inferences within a conceptual integration network of this kind do not follow a unidirectional path from source to target. Moreover, the same-sex marriage blend is a new construction that changes the traditional concept of marriage as a term that formerly only referred to the legal status of a sexual relationship between a man and a woman. While metaphorical meaning is not at issue in this example, the BT framework as compared with CMT is seen as facilitating more differentiated analyses of the ways in which metaphorical meanings are constructed (Grady, Oakley and Coulson 1999). The cognitive operation whereby quite different entities are compared and counterpoint connections between these entities are discerned also occurs in performative metaphor. The latter differs from CMT and BT, however, in that its function is that of discerning appropriate action, not that of facilitating conceptualisation and apt verbal expression. Since the entities compared in performative metaphors can refer to perceptual or conceptual contents (how loud, how much, how important), they are more adequately termed inputs rather than conceptual domains. Moreover, the mapping process between inputs in performative metaphor is bi-directional, which is to say, it does not proceed from a source to a target. Thus, the inputs to a performative metaphor concern the discernment of at least two entities as being located respectively at particular points on gradient scales, and mapping compares these locations. The “output”, so to speak, in the form of acting
one way rather than another is discerned as appropriate when these locations correspond to each other and as being inappropriate when they do not. For example, the ritual of applauding as seen from a first-­person perspective is one in which the relative volume and duration of the noise made by clapping hands together is compared with a felt level of appreciation and enthusiasm. The performance of clapping is discerned as appropriate and makes sense to the practitioner and to his or her interactants precisely in the measure that its volume and duration are perceived as corresponding to a felt level of enthusiasm. The meaning of applause can thus be described in the language of BT as comprising a blend. That meaning is not found in either input, as such. Had the level of appreciation been lower, we would expect the volume of clapping to be correspondingly lower – unless, of course, some practitioners are aiming to deceive. The ritual of applauding is a cross-­cultural convention, easily learned at an early point in our socialization, whereas the performance of clapping rests upon the here-­and-now discernment of each member of a given audience. The intervals on such gradient continua or scales are related to each other by reason of contiguity, while mapping between one such continuum and another rests upon the discernment of relative quantitative similarity. This is the operation described elsewhere as the discernment of symmetrical analogy in which mapping reveals a structural similarity between A and B that obtains in both directions simultaneously (Itkonen 2005). It can thus be said that a discernment of appropriate performance, when mediated by performative metaphor, always has the same generic form: the grade of A is the grade of B. Likewise, any discernment of disparity between the grade of A and the grade of B underlies the appraisal of a given performance as being inappropriate. 
(In such performances, A does not “hit the mark of” B, and cannot therefore be deemed as “fitting”.) Thus, enthusiastic members of an audience, who are thunderously applauding the leading soprano as she takes her bows on the first night, might well deem it inappropriate if the management of the opera house were to present her with a small posy of violets as a token of their appreciation. The reason being, that a small posy of humble violets occupies a location that is too low on a scale of possible floral tributes to adequately convey a high level of appreciation. It is to be noted that the floral tribute functions here as the signifier of a grade of affect. As such, it signifies a gradient meaning, not to be confused with the semantic content of any accompanying words such as congratulations (get well soon, heartfelt sympathy or something else). Given the ubiquity of analogy in cognitive operations, we may well wonder whether all actions mediated by discernments of symmetrical analogy are instances of performative metaphor. States of affect and the vocalisations of babies are analogously related, and the mother running to console her crying baby has doubtless discerned just how distressed her baby is. But there is no good reason to categorize this response as an instance of performative metaphor. Metaphoricity is itself a scalar concept in which meanings are more metaphorical in the measure that the entities compared are also dissimilar in relatively many respects. In the following
sections of this chapter I shall be concerned with cultural practices that are largely conventionalised and in which dissimilar entities – material objects, events, activities and people – are treated as analogues of each other. A wider range of entities can be at issue in performative metaphors including affects, kinaesthetic feelings, sensory experiences, vocalisations, facial expressions, body movements and gestures. The inputs at issue may be categorised or named, but not necessarily so. In all cases, however, only entities ranged on gradient continua are in question. A gradient perspective is imposed upon any given set of categories by ranking exemplars according to one or more of their properties, such as relative size or importance. Thus it can be said that the intervals on gradient continua are ranged in the space between gradable antonyms (loud – soft, small – large, little – much, unimportant – important). However, natural languages do not commonly categorise or name these intervals, except when we number them. For this reason, it is often difficult to put into words what it is that has occurred when the grade of A is discerned as corresponding to the grade of B. An action is tacitly understood, a given option feels right, the proper thing to do is recognized by oneself and others, but we may well be at a loss to explain why this is the case. Performative metaphors are not explicit, declarative or reflective, but operate in the sphere of practical understanding in everyday life.

3. Food practices

The consumption of food activates the senses of taste, smell and touch, each of which in contrast to the modalities of vision and sound yields experiences that are sequential and continuous, that is to say, non-discrete. The properties of a pleasurably sweet or salty taste and those of fragrance, warmth and texture are matters of degree, such that these properties tend to be construed as belonging at some point on a gradient scale. Scalarity, as pointed out by Popova (2005), dominates perceptual experience in the so-called “lower” sense modalities of taste, smell and touch, and is intrinsically normative in that it entails evaluation of instantiations of graded properties. The consumption of any particular foods, dishes or meals is thus experienced as being located at some point on a good – bad gradient, and as being more or less pleasant – unpleasant, satisfying – unsatisfying. Appraisal of gustatory experience concerns the discernment of just how good or pleasant or filling a particular item is. In this light, it is not surprising that foods are commonly rank-ordered within the daily cycle of eating events. The food practices of one region differ from those of another on many points that are deemed crucial by practitioners. This observation underlies the view commonly held by anthropologists and sociologists that food cultures, like languages, are unique. They not only differ between regions, but also to a greater or lesser extent between tribes within a region or population groups within a larger society. For this reason, distinctive food practices can signify nationality, religious affiliation, ethnicity, social class, age or gender (cf. Bourdieu 1984; Warde 1997). If we are to understand the cultural controls upon signifying practices of this kind, it is held,
we must be able to interpret a given set of practices within its specific cultural context. For the present purpose, however, I shall focus upon common features of food cultures that obtain cross-culturally. On this point, fewer observations are recorded in the literature. The following are features that obtain cross-culturally (Twigg 1984):
• hot, cooked foods tend to be prized more than cold or raw foods
• animal fats and vegetable oils are prized more than water as a cooking medium
• animal products are prized more than vegetable products
• cereal products occupy a position at the lowest point of the culinary scale

Food products within these superordinate categories are also ranked. Meat, for example, is generally prized more than other animal products, such as eggs and dairy products, just as fruits are prized more than vegetables, leaf vegetables more than roots and tubers, and some cereals are prized more than others. Red meats generally rank above the white meats of poultry or fish, and within these categories some variants, cuts and forms of processing are ranked more highly than others. These appraisals are reflected in the analogical medium of food prices and in the successive phases of global demand for specific kinds of food that follow upon rising standards of living over time (Grigg 1999). They are also reflected in the ways in which food products are mapped onto primary, secondary and tertiary meals and courses within a given food culture (O’Doherty Jensen 2002, 2003, 2009). Briefly, cereal products are generally deemed appropriate to the lowest-ranking eating event within the meal system (breakfast). Cooked hot foods are deemed appropriate to the highest-ranking meal of the day (dinner), meat being the usual centrepiece of the main course of this meal, while cereal products are often excluded as the staple ingredient of this course. Cereals, in the form of bread or pasta for instance, are rarely excluded from a first course of the primary meal or from the secondary meal of the day (lunch/supper), and their accompaniments are drawn from products at the mid-point or higher locations on the culinary scale (vegetables, eggs, cheese, fish, meat). The culinary status of any meal is thus raised by the introduction of products that occupy a higher position on the culinary scale, by the inclusion of more courses or the addition of cooked dishes. The inclusion of fruit or alcohol, two products that are consumed for the most part outside the meal system, also has the function of raising the culinary status of courses or meals in which they are included.
The manner in which food culture is constituted by aligning gradient scales and attuning them to each other becomes clearer by considering ways in which grades of meals are mapped onto grades of participants, the use of more or less household resources of time, money and skill, as well as grades of events. A shared meal is an occasion of joint attention, intention and action, in which the notion of having something in common takes the concrete form of consuming a portion of the same food. While the daily meal cycle serves to distinguish times of day (morning, noon
and evening), the highest-ranking meal is accorded the most household resources and is consumed at the time of day when most household members can be assembled together. There are many with whom a meal is never or rarely shared. The tramp is given food or drink at the door; neighbours, colleagues, acquaintances and friends meet for coffee, drinks or lunch; but a shared dinner tends to carry the inference or prospect of a closer friendship or intimacy. Everyday meals are distinguished from those of special occasions, which are constituted by including more people, using more resources, including more courses, and selecting items of food and drink that occupy locations on the culinary scale that are deemed appropriate to the grade of both the occasion and its participants. The “dinner party” concerns the consumption of material objects, but cognitively, constitutively and effectively it is an event within the “world mediated by meaning” (Lonergan 1972). It is an institutional fact (Searle 1995), insofar as the practices that canonically and normatively count as a dinner party are performed by its participants (cf. Sinha and Rodriguez 2008). Appropriate performance in this as in other social contexts rests on the cognitive operations whereby gradient scales are attuned to each other by social actors, such that the grade of food and drink, the setting, the status and attire of participants, as well as the duration of the event are all perceived as “fitting”.

4. Gendered food practices

Traditionally, food practices are gendered. This has been apparent in three social contexts: the segregation of men and women in the work of commercial production and distribution, the division of labour in domestic settings, and the strictly gendered character of eating practices in societies in which food consumption is regulated by taboo. Food practices in each of these settings have served to distinguish the roles of men in comparison to women. These days, however, we are not too surprised to find a woman working behind the counter in a butcher’s shop, but we are still likely to tell the hostess how lucky she is when we are told that it is the host who is to be complimented on the production of a wonderful meal. Some of us adhere to a particular diet, but we tend to claim that this is due to health reasons or to environmental, nutritional, religious or moral convictions. It is not because we recognise food taboos with reference to gender. Nevertheless, significant differences between the food consumption of men and women have been observed by nutritionists, epidemiologists, anthropologists and sociologists during the past 50 years. These differences concern the composition of diets, meals, dishes, foods and beverages, not the quantities of energy consumed. That which is most widely noted concerns men’s preference for and more frequent consumption of meat, in comparison to women’s consumption of vegetables (Andersson 1980; Bourdieu 1984; Charles and Kerr 1988; Fiddes 1991; Jansson 1993; Adams 1994; Fürst 1995; Lupton 1996; Bove, Sobal and Rauschenbach 2003; Sobal 2005; Rozin, Hormes, Faith and Wansink 2012). Significantly more women than men avoid meat altogether (Beardsworth and Keil 2002; Maurer 2002). Likewise, several
studies document men’s distaste for too much “greenery” and for “sissy” or “wimpy” dishes in which vegetables or grains dominate (cf. Sobal 2005). These trends prompted Lupton to conduct focus groups in which men and women were presented with pictures of a range of different menu options and asked to identify which were “masculine” and which “feminine” (Lupton 1996). It was found that while both men and women readily agreed in their discernments of “masculine” versus “feminine” dishes, they were unable to agree on any reasons why this was the case and some maintained that neither the questions nor the answers could be taken seriously. Although Lupton adhered to the view that there is a metaphorical relation between femininity and vegetables, and between masculinity and meat, her study did not illuminate how such discernments are made by social actors. A review of more than 300 social scientific studies undertaken in Australia, Japan, North America and Western Europe clarified that a wider range of foods is implicated in gendered preferences and practices (O’Doherty Jensen and Holm 1999). Women express preferences for fruit as well as vegetables, and for poultry, fish, a variety of sour-­milk products and sweet items, while men have a pronounced preference for alcohol, especially beer and spirits, and for all meat products, especially red meat. Women enjoy “light” meals and regard salads, soups and omelettes as options that can suitably comprise a dinner for their personal consumption, whereas men prefer a “solid” meal, such as meat, potatoes and gravy. Moreover, deviations from these patterns are negatively sanctioned in many settings. Gendered norms in Japan, for example, are reported as being particularly marked in regard to sweet foods, when consumed by men, and alcohol, when consumed by women (Loveday and Chiba 1985). 
The same food products are also at issue in the norms governing traditional domestic etiquette in Western societies whereby the woman should cut the cake (and serve tea or coffee) and the man should offer alcoholic drinks (and carve the meat). It is likely that some trends noted in the 1980s and 1990s are no longer as marked as they once were. There are clear indications for instance, that drinking habits among Western women are beginning to resemble those of men, just as their smoking habits developed in earlier decades. However, there is an exception to these general trends that has been observed since the early 1980s. Men who have higher levels of education and correspondingly higher incomes tend to have eating practices that resemble those of women rather more closely than they resemble the eating practices of other men (Murcott 1983; Prättälä, Berg and Puska 1992; Sweeting, Anderson and West 1994). I will return to this issue presently.

5. Accounting for gendered food practices

Accounts of the manner in which foods and beverages function as signifiers of gender broadly speaking follow one of two paths. The first approach suggests that points of resemblance are discerned between the characteristics of a food group on the one hand and those of men or women on the other. For example, the characteristic strength, aggression and virility of animals, as compared to the colourful,
decorative and delicate characteristics of vegetables, are thought to underlie their association with masculinity and femininity, respectively (Fiddes 1991; Adams 1994; Fürst 1995; Lupton 1996; Sobal 2005). It has been suggested that women’s preference for sweet items and dairy products could be attributed to the common properties of sweetness, softness and whiteness, as exemplified by the qualities of a bridal dress (Lupton 1996), or that soft, reddish fruits are prohibited foods for men among the Hua of Papua New Guinea because they are perceived as being vaginal, feminizing entities (Meigs 1984). Some of these suggestions are plausible, such that it could be a worthwhile endeavour to pursue their analysis within the conceptual metaphor framework by seeking to identify counterpoint connections between categories and their properties. Others lack plausibility or are not supported by available evidence. It does not seem likely for instance that the adoption of vegetarianism by feminists can be attributed to their conception or image of women as being delicate, colourful creatures who serve a decorative function. The second approach to explaining gendered food practices focuses upon the social function of these practices, which, it is held, is that of maintaining gendered status positions. In this view, foods are accorded the same status as their consumers, and men maintain their relatively higher status by avoiding the products typically consumed by women or children (Barthes 1975; Bourdieu 1984; Charles and Kerr 1988; Adams 1994). The strength of this argument lies in being able to attribute a causal force to social structure in accounting for the mechanism whereby given preferences and practices are reproduced. The fact that men’s preferences are acknowledged in the provision of family meals, for example, is attributed to the differential social status of family members (Charles and Kerr 1988). 
However, this approach exhibits some weaknesses often associated with functionalist explanations. It provides no account of why or how particular preferences and practices are generated or whether they might be subject to change. Why women or children should develop the particular food preferences they do is a fact for which no account is offered. Why men who have higher levels of education and correspondingly larger incomes should be the population group among men whose eating habits most closely resemble those of women is rendered an anomalous fact. Why the same foods should function in a manner that signifies gender in quite disparate societies (structural-functional systems) presents itself as a further puzzle to be solved.

6. Gendered food practices as performative metaphor

If gendered food practices are mediated by performative metaphor, the form of the account is at least straightforward: both men and women discern a structural similarity between the relative culinary status of different foods on the one hand and the relative social status of men and women on the other. Unlike the first approach noted above, this account rests on the premise that discernments of structural similarity, not substantive similarity, are at issue. Unlike the second approach, it does not rest on the assumption that social structure or consumption practices are reproduced in the measure that social actors internalise given norms and roles and act accordingly. In contrast, this account is based on the premise that norms and roles will tend to be reproduced in the measure that particular practices continue to “make sense” to social actors.

The manner in which grades of food (high – low) are mapped onto grades of meals (high – low), and grades of meals are mapped onto grades of events (important – unimportant) as well as grades of people (close – distant) has been sketched in Section 3 above. These patterns obtain cross-culturally and ground the assumption that both men and women are familiar with the relative culinary status accorded to different foods and beverages in their everyday lives. Whether both men and women are also likely to be familiar with a pattern whereby a higher social status is commonly accorded to men and a lower status to women calls for further comment.

A number of contributions to the field of gender studies in recent decades describe local settings in which men play “feminine” roles, women adopt “masculine” roles, or in which both men and women are accorded multiple roles influenced as much by social class, race or age as by gender. The concepts of “multiple masculinities” and “multiple femininities” have been introduced, and the question has arisen as to whether gender should be conceived as referring to a binary opposition or to a continuum with feminine and masculine poles (Boydston 2009). These studies might well be considered as undermining the view that women (still) have a lower social status than men. On this issue, cross-cultural data are published annually in Global Gender Gap Reports, which illuminate differential status by calculating national ratios of women to men within the areas of health, educational attainment, economic participation and political empowerment.
Among the 142 countries included in the most recent report (Hausmann, Tyson, Bekhouche and Zahidi 2014), representing more than 90 % of the world’s population, it is found that the health gap with respect to life expectancy has now been closed in 35 countries, the educational gap with respect to literacy and enrolment has been closed in 25 countries, while no country has yet closed the gender gaps with respect to economic participation or political empowerment. Women remain significantly underrepresented within the higher echelons of commercial, public, legal and political organizations. It is found that while the gender gaps with respect to rights, responsibilities and opportunities are slowly closing, significant differences between countries remain. These data do not illuminate the extent to which relative levels of gender equality differ between population groups within the same country. On this point, it has been consistently found that educational level is the decisive factor in the development of more equitable attitudes and practices among men, and is also decisive with respect to the development of women’s expectations regarding the division of unpaid domestic labour within households (Lorber 1994; Kitterod and Lappegard 2012; Levtov, Barker and Contreras-Urbina 2012).

Against this background, I am prepared to submit that the theory of performative metaphor does yield an account of the data concerning gendered food practices that is more coherent than previous accounts. The metaphor at issue, GRADES OF FOOD ARE GRADES OF PEOPLE, is one in which foods, beverages and dishes that have a high status on the culinary scale are discerned as those appropriately consumed by those who have a higher social status, men, while those located at lower points on the culinary scale are appropriately consumed by women. Men’s food preferences and practices do have the common characteristic just noted. In each case, they occupy locations on the culinary scale that are, so to speak, a step higher than those of women. Thus, men’s preferences concern: the main course of the primary meal of the day (typically meat, vegetables and potatoes) as compared to other courses (fish, soup, salads, fruit, sweet items); the centrepiece of the main course (meat) as compared to its side dishes (vegetables, cereals/staple foods); meat products in general as compared to other animal products (eggs and dairy products), and alcohol as compared to non-alcoholic beverages. In this way, distinctive, gende