How the Brain Got Language – Towards a New Road Map 9027207623, 9789027207623

How did humans evolve biologically so that our brains and social interactions could support language processes, and how


English Pages 401 [403] Year 2020




Table of contents
Introducing the Volume: “How the brain got language: Towards a new road map” • Michael A. Arbib
An Old Road Map to Draw Upon
Computational challenges of evolving the language-ready brain: 1. From manual action to protosign • Michael A. Arbib
Computational challenges of evolving the language-ready brain: 2. Building towards neurolinguistics • Michael A. Arbib
Starting from the Macaque
Reflections on the differential organization of mirror neuron systems for hand and mouth and their role in the evolution of communication in primates • Gino Coudé and Pier Francesco Ferrari
Plasticity, innateness, and the path to language in the primate brain: Comparing macaque, chimpanzee and human circuitry for visuomotor integration • Erin Hecht
Voice, gesture and working memory in the emergence of speech • Francisco Aboitiz
Bringing in Emotion
Relating the evolution of Music-Readiness and Language-Readiness within the context of comparative neuroprimatology • Uwe Seifert
Why do we want to talk? Evolution of neural substrates of emotion and social cognition • Katerina Semendeferi
Mind the gap – moving beyond the dichotomy between intentional gestures and emotional facial and vocal signals of nonhuman primates • Katja Liebal and Linda Oña
Turn-taking and Prosociality
From sharing food to sharing information: Cooperative breeding and language evolution • Judith Burkart, Eloisa Guerreiro Martins, Fabia Miss and Yvonne Zürcher
Social manipulation, turn-taking and cooperation in apes: Implications for the evolution of language-based interaction in humans • Federico Rossano
Language origins: Fitness consequences, platform of trust, cooperation, and turn-taking • Sławomir Wacewicz and Przemysław Żywiczyński
Imitation, Pantomime and Development
The evolutionary roots of human imitation, action understanding and symbols • Masako Myowa-Yamakoshi
Pantomime and imitation in great apes: Implications for reconstructing the evolution of language • Anne E. Russon
From action to spoken and signed language through gesture: Some basic developmental issues for a discussion on the evolution of the human language-ready brain • Virginia Volterra, Olga Capirci, Pasquale Rinaldi and Laura Sparaci
Praxis, symbol and language: Developmental, ecological and linguistic issues • Chris Sinha
Action, Tool Making and Language
Archaeology and the evolutionary neuroscience of language: The technological pedagogy hypothesis • Dietrich Stout
Tracing the evolutionary trajectory of verbal working memory with neuro-archaeology • Shelby S. Putt and Sobanawartiny Wijeakumar
From actions to events: Communicating through language and gesture • James Pustejovsky
Meaning and Grammar Emerging
From evolutionarily conserved frontal regions for sequence processing to human innovations for syntax • Benjamin Wilson and Christopher I. Petkov
The evolution of enhanced conceptual complexity and of Broca’s area: Language preadaptations • P. Thomas Schoenemann
Mental travels and the cognitive basis of language • Michael C. Corballis
The Road Map
The comparative neuroprimatology 2018 (CNP-2018) road map for research on How the Brain Got Language • Michael A. Arbib, Francisco Aboitiz, Judith M. Burkart, Michael Corballis, Gino Coudé, Erin Hecht, Katja Liebal, Masako Myowa-Yamakoshi, James Pustejovsky, Shelby Putt, Federico Rossano, Anne E. Russon, P. Thomas Schoenemann, Uwe Seifert, Katerina Semendeferi, Chris Sinha, Dietrich Stout, Virginia Volterra, Sławomir Wacewicz and Benjamin Wilson



How the Brain Got Language – Towards a New Road Map

Benjamins Current Topics
ISSN 1874-0081

Special issues of established journals tend to circulate within the orbit of the subscribers of those journals. For the Benjamins Current Topics series a number of special issues of various journals have been selected containing salient topics of research with the aim of finding new audiences for topically interesting material, bringing such material to a wider readership in book format. For an overview of all books published in this series, please see

Volume 112
How the Brain Got Language – Towards a New Road Map
Edited by Michael A. Arbib
These materials were previously published in Interaction Studies 19:1/2 (2018).

How the Brain Got Language – Towards a New Road Map Edited by

Michael A. Arbib University of California at San Diego, La Jolla

John Benjamins Publishing Company Amsterdam / Philadelphia



The paper used in this publication meets the minimum requirements of the American National Standard for Information Sciences – Permanence of Paper for Printed Library Materials, ANSI Z39.48-1984.

DOI 10.1075/bct.112
Cataloging-in-Publication Data available from Library of Congress: LCCN 2020023476 (print) / 2020023477 (e-book)
ISBN 978 90 272 0762 3 (Hb)
ISBN 978 90 272 6067 3 (e-book)

© 2020 – John Benjamins B.V. No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher. John Benjamins Publishing Company ·





Introducing the Volume “How the brain got language: Towards a new road map” Michael A. Arbib

This volume is based on presentations and discussion at a workshop entitled “How the Brain Got Language: Towards a New Road Map.” Unifying themes include the comparative study of brain, behavior and communication in monkeys, apes and humans, and an EvoDevoSocio framework for approaching biological and cultural evolution within a shared perspective. The final article of the volume builds on the previous papers to present “The Comparative Neuroprimatology 2018 (CNP-2018) Road Map for Research on How the Brain Got Language.”

Comparative Neuroprimatology and the EvoDevoSocio Perspective

The Workshop “How the Brain Got Language: Towards a New Road Map” was held in La Jolla, California, on August 29–31, 2017 (Michael A. Arbib, Organizer). Drafts of 21 papers were prepared prior to the Workshop which itself was organized to combine short presentations of key ideas in the papers with lengthy discussions integrating across multiple themes. The versions of those papers published here reflect feedback from the Workshop and subsequent external reviews. Each starts with an introduction that makes clear the aspects of language (or precursors thereof) whose evolution will be the target of the article, and outlines the methods to be used. The final section is titled “Toward a New Road Map” and presents “via points” (some more speculative than others) for the new map, along with prescriptions for future research to address them. These set the stage for “The Comparative Neuroprimatology 2018 (CNP-2018) Road Map for Research on How the Brain Got Language” that is published as the final article in the volume.

There are many approaches to the study of language evolution, but the Workshop title makes explicit the concern with the brain mechanisms that support language. Moreover, we adopted the framework of “comparative neuroprimatology” – assessing relevant data and theories concerning the brains, behaviors and communication systems of monkeys, apes and humans to raise hypotheses about LCA-m (our last common ancestor with monkeys) and LCA-c (our last common ancestor with chimpanzees) as a basis for investigating the biological and cultural evolution of the human language-ready brain. The term “language-ready brain” includes the hypothesis that the earliest Homo sapiens had protolanguage but not language and that it required considerable cultural evolution before full-fledged languages emerged. More generally, our concern with comparative neuroprimatology was informed by an EvoDevoSocio approach – the view that biological evolution defines developmental systems that can both shape and be shaped by cultural evolution, the dynamic emergence of patterns of social interaction.

With this, here are the titles of the 21 papers in this volume that precede the final paper presenting the CNP-2018 Road Map, arranged into seven themes. In what follows, names of Workshop participants (and, thus, co-authors of the new road map) occur in boldface; names of the co-authors of their papers do not. Since many papers cut across more than one theme, the ordering of papers here is one among many, and readers may find alternate paths that better match their own interests.

An old road map to draw upon

The term “new road map” presupposes an “old road map.” This is provided by the book How the Brain Got Language (Arbib, 2012) which develops the so-called Mirror System Hypothesis (MSH) integrating more than a decade and a half of research building on the sketch of “Language Within Our Grasp” (Arbib & Rizzolatti, 1997; Rizzolatti & Arbib, 1998). However, MSH is not a fixed dogma but, rather, an evolving system to be updated as new data and theory become available, and so an explicit charge to Workshop participants was that they should not privilege MSH in any way. Thus, a paper in this issue may show how to develop some aspect of MSH, offer an alternative, or ignore MSH entirely in exploring aspects of biological or cultural evolution missing from the old road map.

At the heart of MSH is the observation that human language may be signed as well as spoken, and it pays special attention to the relation between manual action in general and the role of the hands in gesture. It charts a path via complex imitation and pantomime to protolanguage (with manual protosigns providing the scaffolding for the emergence of vocal control and protospeech). It then hypothesizes that these abilities constituted the capabilities of the language-ready brain that enable Homo sapiens to then make the transition to languages via cultural evolution.

Two papers summarize aspects of how MSH approaches the changes from LCA-m via LCA-c to H. sapiens, and the primarily cultural evolution whereby language-ready brains came to support language. Moreover, they emphasize the need to understand the processes whereby the brain supports diverse capabilities – thus the shared attention to “computational challenges.”

Michael A. Arbib: Computational challenges of evolving the language-ready brain: 1. From manual action to protosign
Michael A. Arbib: Computational challenges of evolving the language-ready brain: 2. Building towards neurolinguistics

Starting from the macaque

Although MSH involved far more of the brain than just mirror neurons, the “classic” mirror system for grasping played an important role in grounding parity in that hypothesis. Coudé shows that oro-facial mirror neurons have a distinctive linkage to other brain regions, especially emotion-related regions of the limbic system, and thus need more attention than MSH offers.

Gino Coudé and Pier Francesco Ferrari: Reflections on the differential organization of mirror neuron systems for hand and mouth and their role in the evolution of communication in primates

He suggests that the overlap of the oro-facial and manual regions of F5 may support the linkage of gesture with the limbic system (see “Bringing in Emotion,” below). Hecht provides an overall framework comparing connectivity in macaque, chimpanzee and human brains:

Erin Hecht: Plasticity, innateness, and the path to language in the primate brain: Comparing macaque, chimpanzee and human circuitry for visuomotor integration

while Aboitiz focuses on comparing “language-relevant pathways” in macaque and human with an emphasis on the role of working memory in speech:

Francisco Aboitiz: Voice, gesture and working memory in the emergence of speech

Bringing in emotion

Offering another comparative perspective – that between language and music – Seifert assesses to what extent music-readiness (more attuned to emotional expression?) and language-readiness (more attuned to propositional content?) may be related:

Uwe Seifert: Relating the evolution of Music-Readiness and Language-Readiness within the context of comparative neuroprimatology




Then, where Coudé offered a bridge from orofacial mirror neurons to the limbic system in the macaque, Semendeferi offers a comparative neuroanatomy that brings emotion into the road map, reminding us of the role of motivation in communication:

Katerina Semendeferi: Why do we want to talk? Evolution of neural substrates of emotion and social cognition

Liebal further develops the theme of emotion and social cognition by asking us to reconsider the claim that ape gestures are intentional whereas monkey vocalizations are solely emotional.

Katja Liebal and Linda Oña: Mind the gap – moving beyond the dichotomy between intentional gestures and emotional facial and vocal signals of nonhuman primates

Turn-taking and prosociality

Whereas many efforts towards characterizing LCA-m focus on macaque, Burkart reminds us that there are properties of “language-readiness” that appear particularly prevalent in callitrichids (e.g., marmosets) among present-day monkeys. She suggests that cooperative breeding may have played a key role in language evolution, in particular by providing motivational preconditions (readiness to share information, platform of trust, see also Wacewicz).

Judith Burkart, E.M. Guerreiro Martins, F. Miss, & Y. Zürcher: From sharing food to sharing information: Cooperative breeding and language evolution

Rossano then complements this material with comparative study of primate social manipulation and cooperation

Federico Rossano: Social manipulation, turn-taking and cooperation in apes: Implications for the evolution of language-based interaction in humans

while assessing those properties that distinguish human conversation from callitrichid turn-taking. Finally, Wacewicz offers an evolutionary account of language that emphasizes trust, cooperation, and turn-taking.

Sławomir Wacewicz and Przemysław Żywiczyński: Language origins: Fitness consequences, platform of trust, cooperation, and turn-taking

Imitation, pantomime and development

Myowa compares chimpanzees and young children to assess the MSH hypothesis that LCA-c had only “simple” imitation whereas H. sapiens has “complex” imitation.

Masako Myowa: The evolutionary roots of human imitation, action understanding and symbols

MSH hypothesizes that pantomime was a crucial bridge to protolanguage that built on complex imitation after LCA-c. However, Russon analyzes imitation in orangutans and offers evidence that they have some form of pantomime.

Anne E. Russon: Pantomime and imitation in great apes: Implications for reconstructing the evolution of language

A key theme of MSH is that there is an evolutionary progression from manual action via gesture to protolanguage. Volterra traces a related progression in child development, raising interesting questions about the relation between phylogeny and ontogeny.

Virginia Volterra, Olga Capirci, Pasquale Rinaldi and Laura Sparaci: From action to spoken and signed language through gesture: some basic developmental issues for a discussion on the evolution of the human language-ready brain

Sinha complements the notion of a language-ready brain with that of a symbol-ready brain, distinguishing signals from symbols, and offers a general EvoDevoSocio perspective that highlights the role of biological and cultural coevolution, with particular emphasis on the evolution of infancy.

Chris Sinha: Praxis, symbol and language: developmental, ecological and linguistic issues

Action, tool making, and language

Stout and Putt both employ a method of “neuro-archeology” to explore how the brain may have changed from early Homo to Homo sapiens: Teach modern humans to make stone tools found by archeologists; see what parts and connections of the brain are “exercised” by learning the ancient skill; hypothesize that this may reveal brain regions relevant not only to language-readiness but to skilled action more generally. Stout explores the hypothesis of a domain-general capacity for multi-component behavior sequencing shared by praxic behaviors and language.

Dietrich Stout: Archaeology and the evolutionary neuroscience of language: the technological pedagogy hypothesis

Putt focuses on working memory, providing a link back to Aboitiz.

Shelby Putt and Sobanawartiny Wijeakumar: Tracing the evolutionary trajectory of verbal working memory with neuro-archaeology




Pustejovsky turns from neuro-archeology to computational analysis to further probe the relation between action, perception and language:

James Pustejovsky: From actions to events: Communicating through language and gesture

Meaning and grammar emerging

Wilson and Schoenemann offer complementary perspectives. Wilson explores possible commonalities between brain mechanisms for sequence processing in macaques and use of artificial grammars to assess cognitive processes that might underpin aspects of human syntax, while Schoenemann stresses the link between grammar and concepts, inspired in part by study of the (proto)human fossil record.

Benjamin Wilson and Christopher I. Petkov: From evolutionarily conserved frontal regions for sequence processing to human innovations for syntax
P. Thomas Schoenemann: The evolution of enhanced conceptual complexity and of Broca’s area: Language preadaptations

Finally, Corballis reminds us that, whatever the importance for primates of communication and social interaction in the here-and-now, a key property of language is displacement, the ability for “mental travel” to communicate about other times and places, both remembered and imagined.

Michael Corballis: Mental travels and the cognitive basis of language

Acknowledgements

My thanks to all the participants, co-authors and observers for all they have contributed to the Workshop and this volume.

Funding

This Workshop was supported in part by NSF Grant No. BCS-1343544 “INSPIRE Track 1: Action, Vision and Language, and their Brain Mechanisms in Evolutionary Relationship” (M.A. Arbib, Principal Investigator). A grant from the University of California at San Diego made possible the participation of graduate students and post-docs as observers.

References

Arbib, M. A. (2012). How the Brain Got Language: The Mirror System Hypothesis. New York & Oxford: Oxford University Press.
Arbib, M. A., & Rizzolatti, G. (1997). Neural expectations: A possible evolutionary path from manual skills to language. Communication and Cognition, 29, 393–424.
Rizzolatti, G., & Arbib, M. A. (1998). Language within our grasp. Trends in Neurosciences, 21(5), 188–194.

An Old Road Map to Draw Upon

Computational challenges of evolving the language-ready brain
1. From manual action to protosign

Michael A. Arbib

University of California at San Diego

Computational modeling of the macaque brain grounds hypotheses on the brain of LCA-m (the last common ancestor of monkey and human). Elaborations thereof provide a brain model for LCA-c (c for chimpanzee). The Mirror System Hypothesis charts further steps via imitation and pantomime to protosign and protolanguage on the path to a "language-ready brain" in Homo sapiens, with the path to speech being indirect. The material poses new challenges for both experimentation and modeling.

Keywords: action pattern reorganization, computational comparative neuroprimatology, imitation, language evolution, mirror system hypothesis, mirror systems, modeling cerebellum, speech evolution

1. The Mirror System Hypothesis (MSH) introduced

To have language is, in part, to be able to exploit an open lexicon and a powerful grammar to communicate and comprehend new meanings of increasing complexity. Since humans can learn a signed language as readily as a spoken language if raised in the appropriate milieu, I stress that “having language” is not synonymous with “having speech.” This paper espouses an EvoDevoSocio approach to language evolution, positing that what evolved (Evo) was a language-ready brain – not a brain with an innate mechanism encoding a universal grammar (Arbib, 2007) but rather one enabling a child to acquire language (Devo), but only if raised in a milieu in which language is already present, something which, it is claimed, required tens of millennia of cultural evolution after the emergence of Homo sapiens (Socio). Biology and culture shape each other, with the structure and function of an adult’s brain reflecting the social and physical interactions in which that person has engaged throughout a lifetime; and those interactions may in turn shape the culture.


To succeed, language must have the parity property: the meaning of an utterance intended by the speaker or signer will usually (though not always) be understood by the recipient. The F5 premotor brain region in macaque has mirror neurons which appear to support a parity property for manual action – specific neurons fire both when the monkey performs specific actions and when it observes similar actions performed by another. Several data suggested the relevance of mirror neurons to language:

– Macaque F5 is homologous to Area 44 in human Broca’s area, a crucial component of the human brain’s language system.
– Human brain imaging showed activation for both grasping and observation of grasping in or near Broca’s area.
– Broca’s area plays a similar role for spoken and signed languages (Poizner et al., 1987).

Such observations motivated the Mirror System Hypothesis (MSH) (Arbib & Rizzolatti, 1997): the evolutionary basis for language parity is provided by the mirror system for grasping, rooting speech in communication based on production and recognition of manual gestures. In more detail, we hypothesize that evolutionary elaboration of mechanisms for execution, recognition and imitation of manual skills in time supported manual gesture, ad hoc pantomime and, eventually, protosign to lay the scaffolding for protospeech and the emergence of the language-ready brain. For MSH, the capacity for vocal learning and control that distinguishes humans from other primates still plays a role, but it is secondary (but see Section 7). Much subsequent work has been devoted to “evolving” MSH (see Arbib, 2016, for a recent overview).

MSH hypothesizes mechanisms for behavior and communication in LCA-m and LCA-c (last common ancestor of humans with macaques and chimpanzees, respectively). These ground (and can be modified by) hypotheses on the path from LCA-m via LCA-c to Homo sapiens.
Since data on LCA-m and LCA-c are virtually non-existent, we resort to comparative (neuro)primatology – comparing brain, behavior and communication across extant species (macaques and other monkeys, chimpanzees and other apes, and modern humans) and their precursors as revealed by archeological data.

2. Introducing computational comparative neuroprimatology

This article has “Computational Challenges” in the title because too many experimentalists and field workers fail to think through the processing challenges of the behaviors or brain scans they observe. Mathematical or computational models probe these challenges in more detail. The methodology that will advance the field asks, for each study, what the overall behavior of interest is, and then seeks to develop a hypothesis (call it H1) on the interacting processes (e.g., neural networks or schemas/schema instances) and the information flow between them that might support the behavior. H1 may be consistent with the data under review, but if it is not – or if important new data emerge that H1 does not explain – then one must develop H2 with schemas and interactions that better match overall performance or brain activity … and/or (crucial for us) are more conformable with evolutionary hypotheses based on, e.g., comparative neuroprimatology. And thus the cycle of experiment (or field work) and modeling continues.

We advocate a computational comparative neuroprimatology. We need to understand what computations occur within each brain region and how the interactions between brain regions orchestrate them to yield behavior, and how interactions between two or more agents underpin language acquisition and cultural change. However, whereas for monkeys we have many single-cell recordings that constrain models of biologically structured neural networks, no such data exist for apes or humans. This has led to the following strategy:

1. For monkeys, build detailed models of interacting neural networks and use computer simulation to explore the relation between overt behavior and neural activity (Section 3).
2. Compare behavior in macaques and apes to chart key changes in the behavioral repertoire, and posit changes in an LCA-m model that can explain the extended behavior, offering this as a hypothesis for LCA-c (Section 4).
3. For humans, we may repeat the process to offer models of neural circuitry averaged over the activity of various simulated circuits to make predictions to be tested against ERP and fMRI (Arbib et al., 2000; Barrès et al., 2013).
Alternatively, modeling may start from the human data (which may also include lesion data, or data on neurological disorders) and seek to build models directly either at the level of neural networks or interacting schemas (see, e.g., Arbib, 2016; Arbib, 2018; and Cooper, 2016). This paper focuses on “monkey-based” brain modeling on the evolutionary path to protolanguage. A different style is required to trace the path from protolanguage to neurolinguistics (Part 2, Arbib, 2018). Meanwhile, many studies ignore the brain to conduct simulations of how various aspects of language might emerge in a computational milieu (Cangelosi & Parisi, 2002). We have only the beginnings of an integrated framework for EvoDevoSocio modeling that links brain, behavior and communication.




3. Setting a baseline for LCA-m

In this section, we introduce three models of the macaque brain, while leaving details to the original papers. Extended review of alternative models would be valuable especially if we could evaluate their relevance to our evolutionary investigation, but is outside the scope of this short paper.

3.1 The FARS (Fagg-Arbib-Rizzolatti-Sakata) model

The FARS Model (Fagg & Arbib, 1998) explains how the brain may use visual information to guide the hand in grasping an object. It was based in part on macaque neurophysiology on neural correlates of “motor schemas for manual actions” in premotor area F5 (e.g., Rizzolatti et al., 1988) and “grasp affordances” (e.g., Taira et al., 1990) – visual cues as to graspability – in parietal area AIP (anterior intraparietal sulcus). Visual input travels by two pathways:

1. A dorsal (“how”) path via AIP extracts information on affordances to yield parameters for detailed motor control of each action.
2. A ventral (“what”) path wherein object recognition provides input that the prefrontal cortex can combine with working memory to plan a sequence of actions while the dorsal path routes the appropriate affordances to the motor cortex to control the current action.

Notably, the data at that time focused on conditions where the monkey had only one affordance available for the given trial; the model further addressed having to make decisions between multiple affordances. This dorsal-ventral distinction plays a crucial role in charting the path to language processing in Part 2.

3.2 Modeling mirror systems in action recognition

The Mirror Neuron System model (MNS; Oztop & Arbib, 2002) offers a Devo view of mirror neurons. Rather than positing an innate repertoire, it suggests how mirror neurons for manual actions might emerge during observation of one’s own actions.
In Figure 1, the external diagonals correspond to the dorsal path of the FARS model for converting an affordance into a grasp and a complementary path for controlling the arm to bring the hand to the desired position. Since we emphasize learning, we distinguish “potential” mirror neurons (before learning) from actual mirror neurons (after their properties are defined by the learning process). These receive both (i) an efference copy of the code for some grasps and (ii) input from circuitry that monitors the trajectory of the hand in a reference frame centered on the chosen affordance. The efference copy acts as a “training signal” for the neurons it activates – the learning process strengthens synapses that encode trajectories like those for the current grasp. As learning progresses, the synaptic drive from (ii) will eventually be enough to activate the emerging mirror neurons relevant to that grasp even if input (i) is absent. Since observation of another individual’s action may evoke the same affordance-centered input pattern (ii) as for self-execution, these neurons thus become mirror neurons.

MNS demonstrated how, as learning progresses, recognition of the grasp may occur earlier and earlier in the trajectory – though such anticipation will be a function of how precisely the trajectory is represented in the brain, which in turn is a function of attention as well as neural encoding. A crucial aspect of the model, then, was to suggest that mirror neurons may have evolved first to monitor self-actions (see the ACQ model next) – matching intended action to observed trajectory – with their role in the observation of others (which is most emphasized in the literature) being an exaptation of this capability. Our 2002 hypothesis, that F5 mirror neurons of the macaque are sensitive to the sight of the monkey’s own hand during object grasping, was confirmed by Maranesi et al. (2015).
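The learning scheme just described can be illustrated with a toy sketch. It is not the MNS implementation of Oztop & Arbib (2002), which used a richer hand-state code and network; the token trajectories, the per-token delta rule, and the constants RATE and THRESHOLD are all assumptions made for illustration. Each "potential" mirror neuron accumulates drive from the observed trajectory (input ii), and during self-execution the efference copy (input i) serves as the training target.

```python
# Hypothetical token sequences standing in for affordance-centered hand states.
# Both grasps start from the same "open" hand and then diverge.
TRAJECTORIES = {
    "precision": ["open", "p1", "p2", "p3", "p4"],
    "power": ["open", "w1", "w2", "w3", "w4"],
}
GRASPS = list(TRAJECTORIES)
RATE = 0.1        # learning rate (assumed value)
THRESHOLD = 1.5   # firing threshold on accumulated drive (assumed value)


def train(pairs):
    """Self-execute each grasp `pairs` times, alternating between grasps.

    For every hand-state token seen during execution, each neuron's weight
    for that token moves toward the efference copy: 1 for the neuron coding
    the executed grasp, 0 for the other neuron.
    """
    weights = {g: {} for g in GRASPS}
    for _ in range(pairs):
        for executed in GRASPS:
            for token in TRAJECTORIES[executed]:
                for neuron in GRASPS:
                    efference = 1.0 if neuron == executed else 0.0  # input (i)
                    w = weights[neuron].get(token, 0.0)
                    weights[neuron][token] = w + RATE * (efference - w)
    return weights


def recognition_step(weights, neuron, observed):
    """Return the first step at which trajectory input (ii) alone fires `neuron`.

    No efference copy is used here, so the same call covers observation of
    one's own hand and of another individual's hand. Returns None if the
    neuron never reaches threshold.
    """
    drive = 0.0
    for step, token in enumerate(observed):
        drive += weights[neuron].get(token, 0.0)
        if drive >= THRESHOLD:
            return step
    return None


for pairs in (2, 5, 30):
    w = train(pairs)
    print(pairs, recognition_step(w, "precision", TRAJECTORIES["precision"]))
```

With these assumed settings the "precision" unit is silent after very little practice and, with more practice, fires at progressively earlier steps of an observed precision grasp, while the "power" unit stays silent for that trajectory. The shared "open" token never suffices on its own, which is one simple way to express the point that anticipation depends on how precisely the trajectory is represented.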

Figure 1.  The MNS Model of Learning in the Mirror Neuron System. Note that, whatever properties mirror neurons have, they only have by dint of being part of a larger system. See text for details.
(Diagram labels, as recoverable from the figure: Visual Cortex; Object features, cIPS; Object affordance extraction, AIP; Object affordance-hand state association; PF; Hand shape recognition & Hand motion detection, STS; Hand-object spatial relation analysis, PG; Object location, MIP/LIP/VIP; Motor program (Grasp); F5 canonical; Motor program (Reach), F4; Action recognition (Mirror neurons); F5 mirror; Motor execution.)



3.3 Flexible action patterns and their rapid reorganization

Alstermark et al. (1981) demonstrated lesions of axons leaving the spinal cord that impair grasping but not reaching in cats. They taught cats to reach into a glass tube projecting horizontally from the wall and grasp a piece of food, which the cat then brings to its mouth. After just a few trials, a lesioned cat would not try to grasp but would simply bat the food from the tube and then grasp it from the floor with its jaws. How could this new skill emerge so quickly?

The Augmented Competitive Queueing Model (ACQ; Bonaiuto & Arbib, 2010) explains such phenomena (and we argue that it applies to the monkey and LCA-m, even though it was inspired by cat data) by having the mirror system monitor self-actions, as emphasized in the previous section. Recall that mirror neurons can be activated either by efference copy of a motor command or by observing a hand-to-object trajectory associated with the grasp. The key to ACQ is that when an intended action is unsuccessful, it may appear similar to an unintended action – and then the mirror neurons for the apparent action can serve a “what did I just do?” function. Thus, when the lesioned cat tries to grasp the food and inadvertently knocks it out of the tube, the mirror system can recognize that it looks like a “batting” action already in the cat’s repertoire. ACQ makes two evaluations for each action:

Desirability depends on the current task or goal. Each time the action is performed, a measure of “expected reinforcement” is updated. This will be positive if the action leads “soon enough” to achievement of the goal, but will be greater the shorter the time required to reach that goal.

Executability depends on the availability of affordances (can the action be carried out now?) and the probability of the action’s success.
At each time step, the priority of available actions is set by combining executability and desirability – the highest priority action will then be executed (or, since failure is possible, its execution will be attempted). Each time an action is performed successfully, its desirability is updated while executability may be left as is or increased. However, when the action is unsuccessful, executability of the intended action is reduced while desirability of the apparent action is adjusted. This explains the rapid change of behavior in Alstermark’s lesioned cat. Since the grasp keeps failing, its executability is decreased (but its desirability is unchanged) whereas the desirability of batting increases each time it is used. Consequently, in only a few trials the priority of batting comes to exceed that of grasping, and the cat has a new plan of behavior implicit in altered desirability and executability of its actions.
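The update scheme above can be made concrete with a minimal sketch of the lesioned-cat scenario. This is not the Bonaiuto & Arbib (2010) implementation: the product rule for combining desirability and executability, the deterministic choice of the top-priority action, the initial values, and the parameters `rate` and `decay` are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Action:
    desirability: float   # expected reinforcement for the current goal
    executability: float  # estimated chance that the action can succeed now

    @property
    def priority(self):
        # Assumed combination rule: a simple product of the two evaluations.
        return self.desirability * self.executability


def run_lesioned_cat(trials=6, rate=0.3, decay=0.4):
    """Simulate a cat whose grasp always fails and looks like batting."""
    actions = {"grasp": Action(1.0, 1.0), "bat": Action(0.2, 1.0)}
    chosen = []
    for _ in range(trials):
        # Attempt the highest-priority action.
        intended = max(actions, key=lambda name: actions[name].priority)
        chosen.append(intended)
        if intended == "grasp":
            # Lesion: the grasp fails, and the mirror system reports that
            # the movement looked like batting ("what did I just do?").
            apparent, success = "bat", False
        else:
            apparent, success = "bat", True
        reward = 1.0  # assumed: the food reaches the floor and is eaten
        if not success:
            # Failure lowers executability of the intended action only;
            # its desirability is left unchanged.
            actions[intended].executability *= 1.0 - decay
        # Desirability of the apparent action moves toward the reward.
        act = actions[apparent]
        act.desirability += rate * (reward - act.desirability)
    return chosen, actions


chosen, final = run_lesioned_cat()
print(chosen)
```

Under these assumed values the simulated cat attempts the grasp on the first two trials and bats thereafter, because the falling executability of grasping and the rising desirability of batting cross within a few trials. Different initial values or rates would shift the crossover point, so the trial count here should not be read as a fit to the Alstermark data.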

The model assumes that cats have mirror neurons for brachio-manual actions. This has yet to be tested. However, the suggestion is again that mirror neurons arose first for monitoring of self-actions and that this functionality is widespread.

4. An LCA-c innovation built on LCA-m mechanisms

MSH does not claim that having mirror neurons for hand movements suffices for language. Rather, it claims that the ability to recognize dexterous manual actions provided a stepping stone for LCA-c to develop novel communicative gestures, and that further steps were needed en route to the language-ready brain.

Consider ape gesture as a stand-in for LCA-c. Where some have argued that all ape gestures are simply extracted from an innate repertoire (Hobaiter & Byrne, 2011), others have focused on specific gestures observed in ape populations and suggested a role for social learning. Tomasello & Call (1997) proposed ontogenetic ritualization (OR) as a means whereby (some) ape gestures could emerge:

i. A performs praxic behavior X and individual B consistently reacts by doing Y.
ii. Subsequently, B anticipates A’s overall performance of X by starting to perform Y before A completes X.
iii. Eventually, A anticipates B’s anticipation, producing a ritualized form XR of X to elicit Y.

Halina et al. (2013) offer examples. Hobaiter & Byrne suggest that if a gesture is in frequent use in an ape group it must be innate, and that OR can only generate idiosyncratic gestures. However, the model below suggests that if a dyadic behavior is common for whatever reason, then one may expect its ritualization to be common, too.

Arbib et al. (2014) developed a model based on FARS and MNS. It introduces dyadic brain modeling – we simultaneously model the brains of two interacting apes. The architecture of each brain is the same but the initial states are different and thus the learning differs in the brains. The emergence of beckoning provides an example.
The child’s distal goal is getting mother to hug him. In the initial episodes, the child’s motivation is to be hugged while the mother’s motivation is elsewhere until the child tugs on her arm and pulls her closer. She recognizes this and responds with a hug. Over subsequent episodes, two different mechanisms come into play. Thanks to the ability of the mother’s MNS to learn to recognize an action earlier and earlier in its trajectory, she comes to recognize the child’s request, and thus respond, before its completion. Similarly, the child’s MNS allows the child to recognize earlier that his request is being granted, and to terminate his trajectory accordingly. The model shows how the child makes the transition from intending the full request and stopping it, to simply making the initial prefix of that request as the intended action. This transition is from a transitive action (whose goal is set by the affordances of an object – or mother’s arm) to an intransitive action.

Why do apes but not monkeys make this transition? Arbib et al. hypothesize that the ape can make greater use of proprioceptive information in setting a goal than the monkey can, so that an intermediate position of the arm can be recognized as a desired end state.

But if this verbal explanation is convincing, why specify the details to the point where the model can be simulated on a computer? One answer is that once implemented, we can vary parameters within the model on different simulation runs to establish parameter ranges in which the model does or does not yield the behavior that the verbal discussion rendered so plausible. Similarly, we can see the effects of adding or removing pathways between regions of the model. Hence, if we find that some settings yield behavior more akin to that of monkeys while others yield behavior more typical of apes, then we have a prediction in precise form for what might be a critical evolutionary change in brain organization.

Alas, empirical tools to establish parameter values are lacking. However, Diffusion Tensor Imaging (DTI) by Hecht et al. (2012) does speak to differences in MNS connectivity in macaque, chimpanzee and human. Among the trends seen are changes in the ventral vs. dorsal visual routes to frontal cortex which may support increased processing of visual movement details, and changes in connectivity between the parietal mirror region and inferotemporal cortex which may better support social learning of object-related actions. Of course, such connectivity data must be complemented by data on the neural circuits that are connected.

5. Varieties of imitation

Imitation comes in many forms (Byrne & Russon, 1998).
The general definition of imitation relevant here is “the ability to use observation of others achieving a desirable goal to develop a means of achieving that goal based on the method exhibited by the performer.” Byrne (2003) argued that apes acquire new skills through “imitation as behaviour parsing,” in which the observer comes to recognize that a few subgoals are key to successful performance – but then acquires the action to get from one subgoal to the next through a lengthy process of trial and error. This is simple imitation that, MSH claims, was present in LCA-c, but not LCA-m. MSH then claims that our ancestors post-LCA-c acquired a crucial blend of complex action recognition as well as imitation – this still “parses” the behavior but adds attention to the motion as well as the goal of subactions, with the consequent ability to achieve a first approximation to that motion without trial and error. A repertoire of “tweaks” may be used to help adjust a motion, but with trial and error still available to hone a moderately successful skill with repeated practice.


MSH posits that, like various monkey species, LCA-m had an innate vocal call system, but lacked the ability – posited for LCA-c – to acquire novel manual gestures. Section 4 showed how LCA-c brains might support OR. Strikingly, though, OR does not involve imitation. Evidence remains sparse for transmission of gesture through imitation, but I predict that evidence will eventually be found.

Space does not permit an extended treatment of imitation models. Instead, a few observations. A common mistake is to think that having mirror neurons (or action recognition more broadly) is enough to be able to imitate. This is not so (Oztop et al., 2006; Oztop et al., 2013). Here are two issues:

1. If the action required to achieve a subgoal is already in the observer’s repertoire and can be recognized as such and if the recognition can be used to guide action, then imitation may proceed quite swiftly. Otherwise some means (trial-and-error or not) must be found to acquire that action. Here we see the need for a “reverse MNS”: Recognizing that the action being performed by another reaches a desirable goal, learn features that enable you to recognize it, and (if feasible) use those features to aid you in adding the action to your own repertoire.
2. Being able to acquire a single action does not guarantee one can develop a “program” and working memory that can link various actions to the subgoals and keep track of what subgoals have been achieved in the current behavior. Here a “reverse ACQ” might be relevant, coupling systems in cerebral cortex and basal ganglia.

Thus, human evolution may have complemented improved skills in mastering novel actions with an increased capacity to master hierarchical plans of increasing complexity. Note the utility of complex imitation for language learning and complex action recognition for language use once this can be applied to words-as-articulatory actions – a late exaptation of a system that evolved (MSH claims) to support praxis.
Animal behaviors may be highly complex – for example, novel spatial arrangements of prey, predator and barriers as well as motivational state can yield an endless variety of trajectories in frogs (Cobas & Arbib, 1992). However, this seems qualitatively different from the on-line flexibility of conversation, where each utterance may (even though it often does not) express novel meanings.

6. From imitation to pantomime

We have seen that MSH claims that, post-LCA-c, intertwining changes in brain and body and in social interaction yielded a capacity for complex action recognition and imitation (CAR&IM), driven primarily by adaptive pressure for increased efficacy of transfer of manual skills. But it claims that this paved the way for a new form of communication, ad hoc pantomime. Rejecting a previous definition (Arbib, 2012, pp. 218–219), I suggest the following: A social group “has” pantomime if it has both the brain capability and the social conventions such that dyads (X,Y) of the community can freely engage in the following sort of exchange:

– X performs an intransitive action P that resembles an action B which might occur within a context C to achieve goal G – and does so with the intention that observer Y will “get the message” concerning some aspect of C or G;
– Y recognizes that P does indeed resemble B and, knowing that action B might occur within a context C’ with goal G’, infers that the message is some aspect of C or G.

Crucially, MSH requires that these pantomimes can be freely invented to bridge gaps in communication when previously available means fail and hypothesizes that such pantomime appeared post-CAR&IM (and thus post-LCA-c, but see Russon, 2018).

The catch is that pantomime may be unsuccessful, or succeed only after much further effort. This, MSH claims, provided the selective pressure for social and biological evolution that yielded protosign, in which pantomimes are ritualized in a community to provide low-energy gestures with reduced ambiguity. Indeed, variants may arise to distinguish key interpretations – e.g., a pantomime for “bird flying” might differentiate into protosigns for “bird” and for “flying.”

Is a pantomime-ready brain also a protosign-ready brain? Perhaps not. There are brain lesions in users of modern sign languages that impair language use while leaving intact the capacity for pantomime (Corina et al., 1992; Marshall et al., 2004) – suggesting that ad hoc use of pantomime is neurally different from access to a symbol within a (proto)sign system.
MSH then argues that even a limited use of protosign for communication creates an adaptive pressure for the emergence of a capacity to use gestures in other modalities, and that protospeech emerges through invasion of the vocal apparatus by collaterals from the protosign system. (See Arbib, 2012 for some details.) Capabilities for protosign and protospeech then emerge in an expanding spiral (Arbib, 2005): the path to speech is indirect. (Or is it? See Section 7 and Aboitiz, 2018.)

The final MSH claim is that once the capacities for complex imitation and protolanguage were in place in early Homo sapiens, the emergence of language – an open lexicon, a grammar supporting a rich compositional semantics, and a phonology – was primarily the result of social innovation and dissemination (Arbib, 2012, Chapter 10). Nonetheless, Baldwinian evolution may have “tweaked” the biological substrate for acquiring or using these language features, such as increasing control over vocal articulators and increasing capacity for symbolic working memory. Brain mechanisms for the production and comprehension of language utterances will be a central concern of Part 2 (Arbib, 2018).

7. Is the path to speech indirect?

7.1 Some macaque premotor neurons may control vocalization

Whereas the involvement of medial cortex in monkeys in the conditioning of innate calls is widely accepted (Jürgens, 2002), Coudé et al. (2011) found neurons in the ventral premotor cortex that activate during the conditioned vocalizations they studied. Fogassi et al. (2013) suggest that these neurons constitute a primitive neural substrate of a cortical center for the voluntary control of vocalization. But does this promote the hypothesis of a direct route from LCA-m vocalization to speech in which non-homologous regions are implicated, or demonstrate a restricted path on which evolution post-LCA-c could enable protosign mechanisms to scaffold protospeech? Note that orofacial control does not imply vocal control. In support of the latter view: (a) nonhuman primates can master novel manual skills but not novel vocal skills; and (b) pantomime offers relatively direct access to a wider range of meanings than does sound symbolism, thus offering a clearer path to an open semantics via protosign than directly through protospeech.

7.2 Case study: The role of the cerebellum in prism adaptation

Rather than offer a model directly bearing on the path (direct or indirect) to speech, I want to turn to a model of the role of the cerebellum in prism adaptation (Arbib et al., 1995). The point made here is that computational neuroscience can contribute interpretive tools of general utility, and that lessons learned from modeling in one domain may in due course illuminate another.

Our challenge was to develop a model of the role of cerebellum and related brain structures that could explain the data of Martin et al. (1996) on adaptation of throwing to a target while wearing prisms that shift the visual input laterally. The data contained two surprises.

a. In many cases, someone who had adapted to wearing the prisms during repeated throws underarm showed little or no adaptation when, with prisms still on, she started throwing overarm. We explained this by linking different microcomplexes – each a patch of cerebellar cortex linked to a patch of cerebellar nuclei and circuitry in cerebral cortex – to different types of throwing. We showed that the degree of overlap between microcomplexes for underarm and overarm throws could explain the degree of transfer between prism adaptation for the different throws.



b. After hundreds of blocks of trials, each involving adaptation and readaptation to the prisms, the (very dedicated) subject eventually reached a stage at which no adaptation was required when the prisms were donned or doffed. The basic model rested on the fact that hand areas in cerebellum and cortex are richly endowed with fibers encoding eye position, whereas there is no reason for evolution to have favored fibers encoding prism on/off. Thus, to complete our model, we hypothesized that a neutral mix of fibers from cerebral cortex was available to the relevant cerebellar microcomplexes, and thus a very sparse subset could convey features that might correlate with prism on/off even though neither evolution nor experience had previously selected for them.

The model worked as follows: Because many fibers encoded eye position, learning could rapidly adjust enough synapses to adaptively change cerebellar modulation of the arm-throw circuitry. However, because the prism on/off-related fibers were so sparse, the chance of their being modified adaptively was very small, and thus the number of trials for their adaptation to become effective was very large.

The suggestion, then, is that the drive from ventral premotor cortex to vocal control discovered by Coudé et al. is akin to the prism on/off fibers of the model – evidence of sparse random connections rather than an evolved capability. Nonetheless, for creatures for whom greater vocal control became adaptive, this random group could become the target of Baldwinian evolution to foster the emergence of protospeech on the scaffolding of protosign. (This suggestion offers a new challenge for MSH modeling.)

8. Towards a new road map

The main focus of this paper has been to show how the analysis of macaque brain models can deepen the understanding of the path from LCA-m via LCA-c to H. sapiens as hypothesized by an “old” road map, that of MSH (Arbib, 2012; 2016). Nonetheless, it has touched on several issues beyond MSH.
Here is a slightly augmented list:

1. Further neurophysiology is required to assess the prevalence of mirror neurons for different classes of actions in different species.
2. MSH is based on the hypothesis that macaque F5 is homologous to Broca’s area. However, Belmalih et al. (2009) offer a more subtle parcellation of relevant brain areas in macaque; while Ferrari et al. (2017) provide new data distinguishing mirror neuron networks in the manual and orofacial pathways (and see Coudé & Ferrari, 2018), linked to sensorimotor and limbic regions, respectively. This points to further exploration of the overlap between manual and orofacial networks “both within and beyond the mirror” to underpin analysis of (1a) the motivation to communicate and (1b) the linkage between vocal and manual communication.
3. The brains of macaques and humans support diverse systems for working memory and for sequence learning and recall. More care is required to tease apart these subclasses before one can carefully elaborate their evolutionary relationship.
4. It would be useful to provide comparative models of modulation of innate calls in monkeys, ontogenetic ritualization and possible adaptation of an innate gestural repertoire in apes, and phoneme acquisition in human children.
5. As we further chart the evolution of the language-ready brain, we must seek to understand evolution of linked cerebro-cerebellar systems and bring other regions such as basal ganglia and hippocampus into play.

Acknowledgements

My thanks to the reviewers for their constructive comments.

Funding

This research was supported in part by NSF Grant No. BCS-1343544 “INSPIRE Track 1: Action, Vision and Language, and their Brain Mechanisms in Evolutionary Relationship” (M.A. Arbib, Principal Investigator). The paper was prepared for a workshop funded by the same grant.

References

Aboitiz, F. (2018). Voice, gesture and working memory in the emergence of speech. Interaction Studies, 19(1–2), 70–85. https://
Alstermark, B., Lundberg, A., Norrsell, U. & Sybirska, E. (1981). Integration in descending motor pathways controlling the forelimb in the cat: 9. Differential behavioural defects after spinal cord lesions interrupting defined pathways from higher centres to motoneurones. Experimental Brain Research, 42, 299–318. https://
Arbib, M. A. (2005). Interweaving Protosign and Protospeech: Further Developments Beyond the Mirror. Interaction Studies: Social Behavior and Communication in Biological and Artificial Systems, 6, 145–171. https://
Arbib, M. A. (2007). How New Languages Emerge (Review of D. Lightfoot, 2006, How New Languages Emerge, Cambridge University Press). Linguist List, 18–432, Thu Feb 08 2007, http://
Arbib, M. A. (2012). How the Brain Got Language: The Mirror System Hypothesis. New York & Oxford: Oxford University Press. https://
Arbib, M. A. (2016). Towards a Computational Comparative Neuroprimatology: Framing the Language-Ready Brain. Physics of Life Reviews, 16, 1–54. https://
Arbib, M. A. (2018). Computational Challenges of evolving the language-ready brain: 2. Building towards neurolinguistics. Interaction Studies, 19(1–2), 22–37. https://
Arbib, M. A., Billard, A., Iacoboni, M. & Oztop, E. (2000). Synthetic brain imaging: grasping, mirror neurons and imitation. Neural Networks, 13(8–9), 975–997. https://00070-8
Arbib, M. A., Ganesh, V. & Gasser, B. (2014). Dyadic Brain Modeling, Ontogenetic Ritualization of Gesture in Apes, and the Contributions of Primate Mirror Neuron Systems. Phil Trans Roy Soc B, 369 (1644), 20130414. https://
Arbib, M. A. & Rizzolatti, G. (1997). Neural expectations: a possible evolutionary path from manual skills to language. Communication and Cognition, 29, 393–424.
Arbib, M. A., Schweighofer, N. & Thach, W. T. (1995). Modeling the Cerebellum: From Adaptation to Coordination. In Glencross, D. J. & Piek, J. P. (eds), Motor Control and Sensory-Motor Integration: Issues and Directions. Amsterdam: North-Holland Elsevier Science, 11–36. https://80005-1
Barrès, V., Simons, A. & Arbib, M. A. (2013). Synthetic event-related potentials: A computational bridge between neurolinguistic models and experiments. Neural Networks, 37, 66–92. https://
Belmalih, A., Borra, E., Contini, M., Gerbella, M., Rozzi, S. & Luppino, G. (2009). Multimodal architectonic subdivision of the rostral part (area F5) of the macaque ventral premotor cortex. The Journal of Comparative Neurology, 512(2), 183–217. https://
Bonaiuto, J. J. & Arbib, M. A. (2010). Extending the mirror neuron system model, II: what did I just do? A new role for mirror neurons. Biological cybernetics, 102(4), 341–59. https://
Byrne, R. W. (2003). Imitation as behaviour parsing. Philos Trans R Soc Lond B Biol Sci, 358(1431), 529–36. https://
Byrne, R. W. & Russon, A. E. (1998). Learning by imitation: a hierarchical approach. Behav Brain Sci, 21(5), 667–84; discussion 684–721. https://
Cangelosi, A. & Parisi, D. (eds) (2002). Simulating the Evolution of Language. London: Springer. https://
Cobas, A. & Arbib, M. (1992). Prey-catching and predator-avoidance in frog and toad: defining the schemas. J Theor Biol, 157(3), 271–304. https://80612-5
Cooper, R. P. (2016). Schema Theory and Neuropsychology. In Arbib, M. A. (ed), From Neuron to Cognition via Computational Neuroscience. Cambridge, MA: The MIT Press.
Corina, D. P., Poizner, H., Bellugi, U., Feinberg, T., Dowd, D. & O’Grady-Batch, L. (1992). Dissociation between linguistic and nonlinguistic gestural systems: a case for compositionality. Brain and Language, 43(3), 414–47. https://90110-Z
Coudé, G. & Ferrari, P. F. (2018). Reflections on the organization of the cortical motor system and its role in the evolution of communication in primates. Interaction Studies, 19(1–2), 38–53. https://
Coudé, G., Ferrari, P. F., Rodà, F., Maranesi, M., Borelli, E., Veroni, V., Monti, F., Rozzi, S. & Fogassi, L. (2011). Neurons Controlling Voluntary Vocalization in the Macaque Ventral Premotor Cortex. PLoS One, 6(11), e26822. https://
Fagg, A. H. & Arbib, M. A. (1998). Modeling parietal-premotor interactions in primate control of grasping. Neural Netw, 11(7–8), 1277–1303. https://00047-1
Ferrari, P. F., Gerbella, M., Coudé, G. & Rozzi, S. (2017). Two different mirror neuron networks: The sensorimotor (hand) and limbic (face) pathways. Neuroscience. https://
Fogassi, L., Coudé, G. & Ferrari, P. F. (2013). The extended features of mirror neurons and the voluntary control of vocalization in the pathway to language. Language and Cognition, 5, 145–155. https://
Halina, M., Rossano, F. & Tomasello, M. (2013). The ontogenetic ritualization of bonobo gestures. Animal cognition, 16(4), 653–666. https://
Hecht, E. E., Gutman, D. A., Preuss, T. M., Sanchez, M. M., Parr, L. A. & Rilling, J. K. (2012). Process Versus Product in Social Learning: Comparative Diffusion Tensor Imaging of Neural Systems for Action Execution–Observation Matching in Macaques, Chimpanzees, and Humans. Cerebral Cortex, 23(5), 1014–24. https://
Hobaiter, C. & Byrne, R. W. (2011). The gestural repertoire of the wild chimpanzee. Animal cognition, 14, 745–767. https://
Jürgens, U. (2002). Neural pathways underlying vocal control. Neuroscience and Biobehavioral Reviews, 26(2), 235–258. https://00068-9
Maranesi, M., Livi, A. & Bonini, L. (2015). Processing of Own Hand Visual Feedback during Object Grasping in Ventral Premotor Mirror Neurons. The Journal of Neuroscience, 35(34), 11824–11829. https://
Marshall, J., Atkinson, J., Smulovitch, E., Thacker, A. & Woll, B. (2004). Aphasia in a user of British Sign Language: Dissociation between sign and gesture. Cognitive Neuropsychology, 21, 537–554. https://
Martin, T. A., Keating, J. G., Goodkin, H. P., Bastian, A. J. & Thach, W. T. (1996). Throwing while looking through prisms. II. Specificity and storage of multiple gaze-throw calibrations. Brain, 119, (Pt 4), 1199–211. https://
Oztop, E. & Arbib, M. A. (2002). Schema design and implementation of the grasp-related mirror neuron system. Biol Cybern, 87(2), 116–40. https://
Oztop, E., Kawato, M. & Arbib, M. (2006). Mirror neurons and imitation: a computationally guided review. Neural Netw, 19(3), 254–71. https://
Oztop, E., Kawato, M. & Arbib, M. A. (2013). Mirror neurons: Functions, mechanisms and models. Neuroscience Letters, 540, 43–55. https://
Poizner, H., Klima, E. S. & Bellugi, U. (1987). What the hands reveal about the brain. Cambridge, MA: MIT Press.
Rizzolatti, G., Camarda, R., Fogassi, L., Gentilucci, M., Luppino, G. & Matelli, M. (1988). Functional organization of inferior area 6 in the macaque monkey. II. Area F5 and the control of distal movements. Exp. Brain Res., 71, 491–507. https://
Russon, A. (2018). Pantomime and imitation in great apes: Implications for reconstructing the evolution of language. Interaction studies, 19(1–2), 200–215. https://
Taira, M., Mine, S., Georgopoulos, A. P., Murata, A. & Sakata, H. (1990). Parietal cortex neurons of the monkey related to the visual guidance of hand movement. Experimental Brain Research, 83, 29–36. https://
Tomasello, M. & Call, J. (1997). Primate Cognition. New York: Oxford University Press.


Computational challenges of evolving the language-ready brain: 2. Building towards neurolinguistics

Michael A. Arbib

University of California at San Diego

A theory of evolving the language-ready brain requires a theory of what it is that evolved. We offer the TCG (Template Construction Grammar) model of comprehension and production of utterances to exhibit hypotheses on how utterances may link to “what language is about.” A key subsystem of TCG is the SemRep system for semantic representation of a visual scene. We offer an account of how it may have evolved as an expansion of the ventral pathway supporting the planning of manual actions, complemented by a dorsal pathway for articulation. The Mirror System Hypothesis (MSH) claims that early Homo sapiens had protolanguage but not language and that cultural evolution yielded the social structures within which children could indeed acquire language. The article poses the challenge of understanding how a brain system could be innately specified that could develop into a TCG-like form, posing a range of questions for future research.

Keywords: Mirror System Hypothesis (MSH), protolanguage, language-ready brain, SemRep, template construction grammar (TCG), holophrases, fractionation, constructions, neurocomputational principles, schema theory

1. Introduction

Our EvoDevoSocio challenge is to understand how brain and body evolved to support social and practical interactions which could exploit the brain’s neural plasticity to develop cultures that included the use of language. Specifically,

1. What may have been the brain, behavior and social interaction of LCA-m (the last common ancestor, LCA, of monkeys and humans), and what evolutionary stages were crucial for the transition to LCA-c (our LCA with chimpanzees)?
2. What may have been the brain, behavior and social interactions of LCA-c, and what evolutionary stages were crucial for the transition to early Homo sapiens?

https://‍ © 2020 John Benjamins Publishing Company

Evolving language: From protolanguage to language

3. What may have been the “initial state” for early Homo sapiens?
4. How did cultural evolution yield the transition to language?
5. What social interaction allows a child to acquire language, and what innate brain mechanisms support that acquisition?
6. How do modern human brains support language use?

The separation between (3) and (4) makes explicit the hypothesis that early Homo sapiens had a language-ready brain (LRB) but did not have language: i.e., the brain did not evolve to possess, innately, a specific language acquisition device or a “universal grammar” (Baker, 2001; Lightfoot, 2006). The further claim is that it took many tens of millennia of cultural evolution for H. sapiens to develop language-rich cultures, but that few or no brain-related changes in the genome were necessary for this transition. Consider the reading-ready brain (Colagé, 2016): the brain of a literate person has distinctive brain regions adapted for reading, e.g., the visual word form area (Dehaene & Cohen, 2011), even though literacy is “recent.” (See Behrmann & Plaut, 2013, for related modeling.)

Part 1 (Arbib, 2018) addressed (1)–(3). More specifically, it introduced the Mirror System Hypothesis (MSH) which shows how, under adaptive pressures on behavior and social interaction, the human brain may have evolved to provide the following “initial state” for (3): Early humans had manual and vocal dexterity and the relevant perceptual skills in vision and audition, they had an ability for complex action recognition and imitation in both domains, and they could communicate by ad hoc pantomime and by protolanguage (an open collection of protowords, conventionalized within a group and which could be expressed manually [protosign], vocally [protospeech] and in multi-modal form, but with little or no grammar). The present article addresses (6), with a brief note on (5), to frame discussion of (4). While relatively self-contained, it does enrich the Part 1 discussion.

2. The Template Construction Grammar (TCG) model for how the human brain may support language production and comprehension

Where Part 1 worked forward from LCA-m, here we work back from hypotheses on how modern human brains support language. Alas, while there are many interesting ERP and fMRI data, they are fragmentary, and the leading integrative perspectives (e.g., Bornkessel-Schlesewsky & Schlesewsky, 2013; Friederici, 2011; Hagoort, 2013) tend to ignore language production, lack a convincing integration of syntax and semantics, and disagree with each other. But if we cannot agree on how the human brain supports language, how can we agree on how it became language-ready? Arbib et al. (2014b) discussed why construction grammars, which integrate form and meaning in each construction, may better ground
neurolinguistic models than those which separate syntax and semantics, and assessed four computational approaches to construction grammar. Here, I focus on the Template Construction Grammar (TCG) model – not because it is “better” but rather because I can be more explicit about how it may serve as a target for “what evolved.” Elsewhere (Arbib, 2016; 2017) I presented key notions concerning the TCG model, stressing that it links linguistic processing to “what the utterance is about.” (Computational details are presented by Barrès & Arbib, 2018a; b; see https:// for the code.) But first, we need some background from schema theory and its relation to other styles of modeling.

2.1 Modeling using schema theory

Here we continue the emphasis of Section 2 of Part 1 on the importance of addressing “computational challenges.” The primary data on the human brain are from neuropsychology (correlating changes in behavior with brain lesions or dysfunction), brain imaging (especially fMRI), and event-related potentials (ERPs), whereas in some animals we have data on activity of some individual neurons. Thus, one modeling strategy is to combine the latter data with evolutionary hypotheses to form models of human neural circuitry and then apply appropriate averaging on neural network simulations to make testable predictions for human brain data. One may also use connectionist neural models to model behavior directly, so long as the model supports testing against some form of human data. Moreover, models of language acquisition and cultural evolution must incorporate learning, e.g., via some form of neural plasticity, as in Hebbian learning and reinforcement learning (e.g., Chapters 6 and 10, respectively, of Arbib & Bonaiuto, 2016). Dyadic brain modeling (see Part 1, and Arbib et al., 2014a) extends brain modeling from analysis of a single brain to exploration of emergent forms of behavior when two animals interact – simulating the changes in each brain during these interactions.
However, there is a rich literature using AI models of interacting agents to study aspects of cultural evolution without attention to neuroscience data (Cangelosi & Parisi, 2002; Kirby, 2000; Kirby et al., 2008; Steels, 2011). Future research may use results from this work to frame specific efforts in neurobiological modeling or to redo the experiments using only agents whose computational processes emerge from those we posit to be “innate” to the human brain.

In the TCG model, we employ a form of schema theory (Arbib, 2012, pp. 10–27, provides an exposition explicitly related to MSH modeling) which introduces perceptual schemas as basic functional units of perception and motor schemas for motor control, and other “cognitive” schemas as well. Schemas have activity levels and so may compete or cooperate (each trying to reduce or raise the other’s
activity level, respectively). An activated perceptual schema may carry not only a classification (e.g., “this is an apple”) but also parameters that link it to other active schemas – e.g., a perceptual schema might pass information on size and position to a motor schema for grasping. The most active perceptual schemas that emerge from this process of cooperative computation then provide the context for activating related motor schemas. A key addition to the theory is the notion of schema instance: Consider the VISIONS system for visual scene interpretation (Draper et al., 1989) when assessed for its implications for brain modeling (Arbib & Caplan, 1979): To interpret an outdoor scene, we might need to employ a perceptual schema for “tree” and use it (if it out-competes other schemas) not only to determine that a tree is present but to capture its location, size and other properties. But if we see a scene with multiple trees of interest, we cannot reapply the schema without losing those details. Thus, we must instead require the spawning of instances of the schema, one for each notable tree, to capture full perception of the scene. Frustratingly, we still lack a good brain-related model of schema instantiation. One can train a biologically plausible neural network for, e.g., a perceptual schema to take an encoding of a region of a scene and provide as output a neural code for a confidence level that the scene contains an object of a particular type plus encoding of related parameters. But how does one create multiple instances? Can one set up an additional copy of a schema as and when needed? Or is a single copy switched to different regions as attention dictates, with “instances” being linkages in working memory of each activation to the related parameters (Arbib & Liaw, 1995)? 
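Setting aside the question of neural implementation, the bookkeeping that schema instances require can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the VISIONS implementation and not a brain model: the class `SchemaInstance`, the functions `relax` and `interpret`, the link weights and the linear update rule are all hypothetical choices made for this example. It shows two instances of the same “tree” schema holding their own parameters while a rival “bush” reading of one region is suppressed by competition.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaInstance:
    """One instance of a perceptual schema, bound to one region of the scene."""
    schema: str      # which schema was instantiated, e.g. "tree"
    region: str      # label of the image region this instance interprets
    activity: float  # confidence-like activity level, kept in [0, 1]
    params: dict = field(default_factory=dict)  # e.g. location, size


def relax(instances, links, rate=0.2, steps=20):
    """Let linked instances raise (w > 0) or lower (w < 0) each other's activity."""
    for _ in range(steps):
        delta = [0.0] * len(instances)
        for (i, j), w in links.items():
            delta[j] += rate * w * instances[i].activity
            delta[i] += rate * w * instances[j].activity
        # Apply all changes at once, clamping activity to [0, 1].
        for inst, d in zip(instances, delta):
            inst.activity = min(1.0, max(0.0, inst.activity + d))
    return instances


def interpret(instances, threshold=0.5):
    """Read off the instances that survive the competition."""
    return [inst for inst in instances if inst.activity > threshold]


scene = [
    SchemaInstance("tree", "r1", 0.6, {"x": 40, "height": 120}),
    SchemaInstance("bush", "r1", 0.5, {"x": 40, "height": 30}),   # rival reading of r1
    SchemaInstance("tree", "r2", 0.7, {"x": 210, "height": 95}),  # second tree instance
    SchemaInstance("sky", "r3", 0.6),
]
links = {
    (0, 1): -1.0,  # competition: two schemas claim the same region
    (0, 3): 0.5,   # cooperation (assumed weight): trees are plausible under sky
    (2, 3): 0.5,
}
winners = interpret(relax(scene, links))
```

Each surviving “tree” instance keeps its own location and size, which is the point of instantiation; the sketch is silent on how a brain might realize either the copying or the working-memory linkage.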
In either case, I know of no neurobiologically verified implementation and thus pursue much of my cognitive modeling in terms of networks of schemas and instances, with these building atop neurobiologically grounded models of action control and recognition.

2.2 A model of language production for visual scene description

A key challenge is to link language mechanisms to brain mechanisms for “what language is about.” The TCG approach is based on the “aboutness” provided by the description of visual scenes (Arbib & Lee, 2008; 2009). Visual scene description is not the be-all and end-all of language but exemplifies how world and language may be linked. The same methodology could and should be extended to look at the interpretation of questions, commands and more within a given context. The SemRep/TCG production model (Figure 1) incorporates Visual Working Memory (WM) and Long-Term Memory (LTM) based on the VISIONS system briefly described above. Low-level processes segment the visual scene; various perceptual schemas (World Knowledge) are instantiated and compete and cooperate


[Figure 1: block diagram. Labels recoverable from the diagram: Linguistic WM; Construction Application Network; Schema Network; Grammatical WM (construction instances); Grammatical Knowledge (abstract, higher-level constructions); Construction Set; Utterance output; World Knowledge; Lexical Constructions; Event, Object and Action Schemas; Semantic WM (SemRep); Visual WM (Scene Interpretation Network); Requests to low-level processes; Schema instances; Schema-labeled regions; Interpreted scene; Image with labels; Visual input.]

Figure 1. The structure of the SemRep/TCG model of scene description. Schema-like constructions provide the schema instances that compete and cooperate to associate an utterance with the SemRep (semantic representation) of a visual scene.

in Visual WM until a group of highly activated schemas provides the interpretation of the scene. Such a network of schema instances may contain a rich set of information, e.g., object affordances, that is not germane to scene description. A key innovation, going beyond VISIONS, was to introduce the notion of a SemRep (Arbib & Lee, 2008) as a graph-like “semantic representation” of the visual scene, abstracting away from the assemblage of schema instances in visual WM those details that will not be described. The current SemRep is encoded in Semantic WM, which bridges between visual scene recognition and the language system. As an evolutionary parallel to the visual system, the language system again combines a WM and an LTM: a Linguistic WM and a Linguistic LTM (Grammatical Knowledge), where the LTM now constitutes a set of constructions. The Linguistic WM holds a hierarchical covering of the current SemRep by iterated applications of instances of constructions from Long-Term Memory. Lexical construction instances apply directly to the SemRep, associating nodes or subgraphs with words or idioms. Higher-level construction instances apply to an already partially completed construction assemblage. As in VISIONS, construction instances compete and cooperate until an assemblage reaches threshold for an utterance to be read off. Crucially, readout may vary, so that it may yield well-formed sentences or, often enough, fragmentary utterances. VISIONS allows Visual WM to request more
data from low-level visual processes; similarly, our model allows the SemRep to be updated by requesting information from the vision system when completion of an utterance requires further attention to the visual scene.

2.3 A model of language comprehension for visual scene description

Barrès & Lee (2014) proposed a conceptual extension of TCG to a model of language comprehension structured to explain data on agrammatic aphasics in sentence-picture matching tasks, in which the patient is asked to decide whether a heard sentence matches a visual scene. Caramazza & Zurif (1976) showed that agrammatic aphasics may be impaired not only in production but also in their capacity to make use of syntactic cues during language comprehension. Such aphasics performed no better than chance for sentences such as “The tiger that the lion is chasing is fat,” but their performance was restored when world knowledge cues constrained the sentence interpretation, as in “The apple that the boy is eating is red.” In the TCG-based model of language comprehension (Figure 2), the utterance feeds into two interacting routes. The grammatical (G) route updates the SemRep indirectly through the creation of a construction schema assemblage in grammatical working memory; the heavy semantic (HS) route can generate SemRep nodes directly for content words but is not sensitive to grammatical cues. Further, the heavy semantic route incorporates the possibility for the Semantic WM to query


[Figure 2: block diagram. Labels recoverable from the diagram: Linguistic WM; Grammatical WM (construction instances); Abstract (higher-level) Constructions; Grammatical Knowledge; Construction Set; Lexical Constructions; word perception; Utterance input; World Knowledge; Event, Object and Action Schemas; World Knowledge WM; Schema-labeled regions; Requests to low-level processes; Interpreted scene; Semantic WM; Visual WM (Scene Interpretation Network); Visual input. Several components are marked (G).]

Figure 2. TCG as employed in a two-route model of language comprehension. The G (Grammar) route exploits constructions to associate a SemRep with an utterance; the Heavy Semantics (HS) route seeks to exploit world knowledge triggered by words in the utterance to build a SemRep. In general, the two routes can cooperate, but in agrammatic aphasia, the grammar route is damaged or unavailable.

the world knowledge WM to generate hypotheses about the plausible relations between SemRep nodes. The Semantic WM becomes the locus of cooperative competition in which the construction assemblage and the world knowledge hypotheses compete and cooperate to update the SemRep. When a message is conveyed within a visual or discourse context, processes within the schema assemblage may further cooperate with that process: the Visual WM remains a source of input for the SemRep as in the production model. (See Arbib, 2016, Section 4.3; 2017, for further details.)

Note that many models of sentence comprehension stress syntax as completely abstracted from semantics, with the heart of comprehension being construction of a syntactic tree for the input message. TCG is based on the tenet that the goal of comprehension is to extract the meaning of the message as captured by the SemRep.

3. An evolutionary framework for language-ready pathways and processes

In this section, we consider how to envisage the emergence of SemRep from an MSH perspective while bearing in mind that biological evolution yielded a language-ready brain rather than one in which subsystems specific to language are genetically prespecified. The starting point is the observation that macaques and humans have two visual pathways related to the control of hand movements.

3.1 SemRep in LCA-m

– The ventral pathway (from primary visual cortex via inferotemporal lobe to prefrontal cortex, which can influence premotor cortex) relates observing objects and their relations to the planning of manual action.
– The dorsal pathway (from primary visual cortex via parietal lobe to premotor cortex) attends to affordances, details such as shape and location relevant to converting intended actions into patterns of motor control.

A model of this due to Fagg & Arbib (1998) is described briefly in Part 1.
We here refine MSH to assert a non-linguistic role for a SemRep-like system, call it SemRep-pr (pr for praxis) already present in LCA-m. Specifically, we assert that the ventral visual pathway in LCA-m can be decomposed thus (it includes return pathways):

Ventral: V1 → Visual scene analysis → SemRep-pr, key aspects of the scene relevant to action planning → A pattern of transitive manual action, i.e., acting on objects.

And, as above, this works in cooperation with the dorsal visual pathway, namely:

Dorsal: V1 → Extraction of affordances for attended objects → Motor parameters for execution of actions selected by the action plan.

Caveats: (i) While we focus here on visual input concerning the state of the world, our actions (and thus SemRep) may depend on multi-modal cues, including those from audition and touch. (ii) Monkey calls are elicited by visual and/or auditory perception (e.g., the vervet leopard-specific alarm call may be elicited by recognizing the presence of a leopard or hearing a conspecific utter the call). It is recognized that, to the extent that cortex can affect such calls, they depend on medial areas rather than the lateral areas common to manual action and language use. It is thus the evolution of these lateral areas on which we focus. The evolving linkage of the medial and lateral systems is a topic for complementary research – see my response (Arbib, 2013) to the commentaries of Aboitiz (2013) and Fogassi et al. (2013).

3.2 SemRep in LCA-c

While innate alarm calls may initiate actions beneficial to others, much ape communication involves “instrumental” gestures intended to get the observer to meet some need of the gesturer. The relevant schema assemblage includes not only aspects of the gesturer’s immediate environment and a need/goal which is to be met, but also a recognition that the other could help fill that need. Thus, the “SemRep” that links to manual action in LCA-m must serve not only planning of one’s own actions but planning of a gesture that expresses the desire that the action be performed. We thus posit that two innovations occur in LCA-c (while preserving, and possibly extending, the above mechanisms from LCA-m):

Ventral: A manual gestural system can recognize a situation as the basis for emitting a communicative gesture.
SemRep-pr is now extended to support a very limited set of communicative actions (which may in some cases have arisen through the ritualization of praxic actions), but the aim of these gestures remains instrumental.

Dorsal: In general, these gestures are intransitive, and require increased proprioception to enable a gesture to terminate successfully “in mid-air,” as distinct from tactile feedback from an object after it has been grasped by visual guidance (Arbib et al., 2014a).

3.3 SemRep in the language-ready brain

Then, for the transition to the language-ready brain, i.e., the brain of early Homo sapiens that supports protolanguage but little or no grammar, we add the following:

Ventral: As pantomime and protospeech evolve, SemRep-pr expands to support explicit pantomimes and then protosigns which may request an instrumental response to achieve a praxic goal but may now share information about the current environment or even signal aspects outside the environment. With this, evolution has achieved SemRep-c (c for communication) as a spin-off of SemRep-pr and as a precursor for the SemRep posited by TCG to be the heart of scene description and thus of further feats of open-ended communication.

Dorsal: MSH hypothesizes that protosign provided an adaptive context for vocal control, to fold vocal gestures into the manual communicative stream. A further hypothesis, then, is that – in an enculturated modern human – the ventral stream has connections from SemRep-c to motor systems for articulatory control of manual and vocal (and other) communicative signals, but it is the role of the dorsal system to convert an intended word into the detailed specification of how it is to be articulated.

3.4 Implications

Modern neurolinguistics studies at least four major sets of connections from posterior to anterior brain (see Aboitiz, 2012, for a figure and references): the arcuate fasciculus; the superior longitudinal fasciculus; the middle longitudinal fasciculus; and the extreme capsule. Somewhat similar circuits have been described for the monkey. Rilling et al. (2008) used comparative diffusion tensor imaging to chart the connectivity of the arcuate fasciculus and its homologs in macaque, chimpanzee and human. For many authors, the comparatively greater enlargement of the (dorsal) arcuate fasciculus relative to overall brain size in the comparison monkey → chimpanzee → human has suggested that it is the dorsal stream that is crucial to the emergence of language. MSH offers a different interpretation (and much more work needs to be done to assess the merits of each hypothesis).
First, asserting that the dorsal pathway is crucial to language does not address the fact that it is important in monkeys and chimpanzees, whereas MSH does. Moreover, MSH suggests that the major increase in the dorsal pathway is related to an increase in articulatory skill in both production and perception, while holding that the ventral pathway is crucial for “what language is about” – for linking word sequences via possibly hierarchical processing to “semantic representations.” In the TCG model of scene description (Figure 1), the readout simply provides a string of words – it does not model the process whereby these words are articulated, whether in speech or in sign. It may thus be construed as a model associated with the ventral pathway. Similarly, in the TCG model of comprehension (Figure 2), the words are already abstracted from the sensory domain (whether

visual or auditory). We thus posit that TCG is a model of the ventral stream for sentence processing, complementing dorsal processing of the articulatory form of words whether for production or comprehension. Compare the ventral-dorsal model of speech perception, in which “a ventral (auditory) stream processes speech signals for comprehension, and a dorsal (auditory) stream maps acoustic speech signals to frontal lobe articulatory networks” (Hickok & Poeppel, 2007, from the abstract). Elsewhere (Arbib, 2017), I suggested a path for integrating the TCG model with a conceptual model of sentence comprehension (Bornkessel-Schlesewsky & Schlesewsky, 2013) based on the auditory ventral and dorsal pathways. An important research challenge is to resolve disagreements over the distribution of effort between dorsal and ventral pathways in each modality. But MSH claims that the early Homo sapiens brain could support protolanguage but not language – i.e., it did not provide mechanisms specifically evolved to support grammar. As the next section shows, MSH hypothesizes that mechanisms supporting complex action recognition and imitation in the manual domain could be exapted for grammatically structured language to emerge through cultural evolution.

4. Complex action recognition and imitation support the transition to language

The “initial state” for early Homo sapiens is thought by many to include a capacity for protolanguage as distinct from language, with cultural evolution yielding the transition to humans with language-rich cultures. The compositional view (Bickerton, 1995) hypothesizes that Homo erectus communicated by a protolanguage in which a communicative act comprised a few “words” denoting objects and actions, but without syntactic structure.
In this view, the “protowords” were so akin in meaning to nouns and verbs that languages evolved from protolanguages just by “adding syntax.” The holophrastic view (which I share with Wray, 1998) holds that in much of protolanguage, a complete communicative act involved a “holophrase” whose parts had no independent meaning yet could signal an event or action or object that was either frequent and noteworthy, or very important even if rare. However, in addition to uttering a protoword to get a fellow human to, for example, aid in the hunt, a “speaker” might pantomime the trajectory that he plans to take or wants the “hearer” to take – we see here the precursor of cospeech and cosign gestures, and the way in which, after due cultural evolution, they will remain complementary to language rather than being a part of the conventionalized system. But how could cultural evolution have functioned to get from a limited lexicon of protowords (on MSH’s holophrastic view) to an expanding grammar and lexicon of language which can support the production and comprehension of novel



utterances? The key is the availability of complex action recognition, the ability to observe a complex action and see it as a composite of subactions – see Chapter 7 of Arbib (2012) and Part 1 for further details, and Myowa (2018) for more on the comparison of imitation skills in chimpanzee and human. In some cases, one can readily perceive “pieces” which are familiar actions. But in expanding one’s repertoire further, one must determine novel pieces that may serve as candidates for interpretation. We follow Wray in arguing that “protowords” were fractionated to yield pieces that originally may not correspond to subactions – but which, once stabilized, could yield words for constituents of their original meaning. We further argue that as protowords were fractionated, constructions developed to arrange the words to reconstitute those original meanings and many more besides. An ahistorical example: The only common element between conventionalized pantomimes for open door and close door might be a handle-turning motion, yet as the only common element it might become fractionated out as the symbol for door. Thereafter, a general capacity for complex action analysis invites one to attend to the remainders as new, complementary protowords for open and close. The similarity between door + open and door + close opens the possibility of generalization to the construction door + X where the slot X can be filled by the protoword for any action one might employ with a door, or the construction Y + open where the slot Y can be filled with a protoword for anything that might be opened. (Note how specific slot fillers are at this stage  – we are a long way from general categories like noun and verb.) 
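The door example can be made concrete with a small sketch. The Python below is an illustration under loud assumptions, not a model proposed in the text: holophrases are encoded as pairs of a form (a tuple of gesture components) and a meaning (a set of semantic elements), and the gesture labels, the function `fractionate` and the class `Construction` are hypothetical names invented here. It shows the shared handle-turning piece being split off as a symbol for door, the remainders becoming protowords for open and close, and an item-specific construction door + X whose slot accepts only explicitly listed fillers.

```python
from dataclasses import dataclass, field

# A holophrase pairs an unanalysed form (a tuple of gesture components)
# with a holistic meaning (a set of semantic elements). All labels are invented.
OPEN_DOOR = (("turn-handle", "swing-away"), frozenset({"DOOR", "OPEN"}))
CLOSE_DOOR = (("turn-handle", "swing-toward"), frozenset({"DOOR", "CLOSE"}))


def fractionate(h1, h2):
    """Split two holophrases into a shared piece and two remainders.

    Returns a dict mapping form pieces to meanings, or None when the
    holophrases share no form component or no meaning element.
    """
    (form1, meaning1), (form2, meaning2) = h1, h2
    shared_form = tuple(g for g in form1 if g in form2)
    shared_meaning = meaning1 & meaning2
    if not shared_form or not shared_meaning:
        return None
    rest1 = tuple(g for g in form1 if g not in shared_form)
    rest2 = tuple(g for g in form2 if g not in shared_form)
    return {
        shared_form: shared_meaning,
        rest1: meaning1 - shared_meaning,
        rest2: meaning2 - shared_meaning,
    }


@dataclass
class Construction:
    """Item-based construction such as door + X, with an explicit filler list."""
    pivot_form: tuple
    pivot_meaning: frozenset
    fillers: dict = field(default_factory=dict)  # filler form -> filler meaning

    def produce(self, filler_form):
        """Combine the pivot with one licensed slot filler."""
        meaning = self.pivot_meaning | self.fillers[filler_form]
        return self.pivot_form + filler_form, meaning


lexicon = fractionate(OPEN_DOOR, CLOSE_DOOR)
door = ("turn-handle",)
door_x = Construction(door, lexicon[door],
                      {f: m for f, m in lexicon.items() if f != door})
# A later protoword can fill the slot, giving an utterance no holophrase covered.
door_x.fillers[("rap-knuckles",)] = frozenset({"KNOCK"})
form, meaning = door_x.produce(("rap-knuckles",))
```

Listing the fillers one by one, rather than typing the slot as “any action,” mirrors the point that slot fillers at this stage are highly specific and far from general categories.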
We also see how, even at this primitive stage, this emerging protolanguage (there is no hard-and-fast line between complexifying protolanguages and simple languages) invites generalizations that might otherwise never have been made – the very existence of Y + open invites one to consider operations on non-doors that are like opening a door and yet (consider opening one’s mouth) are nonetheless very different. The power of metaphor seems to be an unavoidable concomitant of the recombination of symbols in novel constructions to express meanings that were not covered in the prior repertoire of holophrases. A process like the fractionation of holophrases would have yielded, piecemeal, the phonology of a (proto)‍language, i.e., the regularization of the articulatory components into a limited set of meaningless elements like phonemes or syllables in speech. There can be an intermediate stage in which many protowords were at least, in part, “nonphonological” (Sandler et al., 2011) but in time more and more protowords would be reduced to “phonological form.”

5. Towards a new road map

We have offered some comments that make the evolution of SemRep plausible, but we need an explicit account for how planning mechanisms from LCA-m might have “bifurcated” in evolution to include planning for communication, and not just for praxis, on the basis of the ventral schema assemblage (and linked representations). A further challenge is to assess the TCG model, and then modify or replace it accordingly:

– We need the TCG model to be refined so that it can be tested against neurolinguistic data (ERP, fMRI and effects of lesions and disease) and improved accordingly.
– We need to provide an evolutionary sequence of models that can be calibrated against the data of Rilling et al. (2008), Hecht et al. (2015) and others.
– Currently, the production and comprehension models are separate in TCG. They need to be integrated to offer explicit roles for mirror systems and tested against relevant data (e.g., Menenti et al., 2011; Segaert et al., 2012). And then we need dyadic brain modeling to assess what is required for the model to support conversation.

A strong claim of MSH is that a brain that evolved to support protolanguage and mechanisms for complex action recognition was language-ready – i.e., it could support language for someone raised in a language-rich context. Arbib (2012, Chapter 11) presents work (Hill, 1983) showing how the child acquires language not through direct mastery of an adult grammar but by extracting fragments of the adult speech stream and fractionating them to derive constructions which support more effective communication and which are built upon over the years to better and better approximate the adult grammar. Hill’s model is related to the competition model (MacWhinney, 1987; 2005; 2014) and Chang’s (2015) integrated model of acquisition and production in English and Japanese. But none of these link to neurobiology.
An important challenge is to develop models of brain mechanisms that could have supported both (i) the initial emergence of languages from protolanguages perhaps 50 to 100 thousand years ago, and (ii) the modern child’s acquisition of the language of the community in which they are raised. But note the very real difference between “inventing” language as a new phenomenon and acquiring a language that already exists. Perhaps most crucially, the approach here rejects the Chomskian “just add Merge” view of language evolution (Berwick & Chomsky, 2016; Bickerton, 2009) in which a single breakthrough yields the transition from protolanguage to language. Construction grammar provides a framework in which, yes, the ability to form new constructions is crucial – but it supports a process of cultural evolution



whereby the transition via complexifying protolanguages to simple and then more complex languages may occur across the span of tens of millennia. The claim is that constructions emerge initially through decomposition of holophrases (rather than composition of elements in a prior lexicon) by processes that exploited brain mechanisms for recognizing and performing complex actions. Patterns and sequencing associated with such actions can be decomposed in terms of subactions – with the possibility that what the observer sees as a constituent action may not have been an actual subaction of the observed performance. A crucial computational challenge in building on and/or challenging MSH is to model the crucial role of praxis in underwriting the emergence of protolanguage and thence language. But let’s conclude on a conciliatory note: MSH holds that a protoword “could signal an event or action or object that was either frequent and noteworthy, or very important even if rare.” We might regard protowords for actions or objects as “Bickertonian.” Thus, while we stress the co-emergence of constructions via fractionation of holophrases for “events,” we need not require that all the “slot fillers” emerged by fractionation of protowords – some may indeed simply be protowords themselves, which thus graduate to the status of words as they are “captured” by the early grammar. How the Brain Got Language (Arbib, 2012) addresses some of the ways in which the study of emerging sign languages (Chapter 12) and grammaticalization, pidgins and creoles (Chapter 13) provide insight into how systems with simple lexicons and “constructicons” may complexify (but in some instances simplify – contrast Latin and English) through cultural evolution. The computational challenge is then both to model such processes at the schema level and to provide developmental models to show how a language-ready brain could indeed support those processes.

Acknowledgements

My thanks to the reviewers for their constructive comments.

Funding

This research was supported in part by NSF Grant No. BCS-1343544, “INSPIRE Track 1: Action, Vision and Language, and their Brain Mechanisms in Evolutionary Relationship” (M. A. Arbib, Principal Investigator). The paper was prepared for a workshop funded by the same grant.

References Aboitiz, F. (2012). Gestures, vocalizations and memory in language origins. Frontiers in Evolutionary Neuroscience, 4(2).  https://‍ Aboitiz, F. (2013). How did vocal behavior “take over” the gestural communication system? Language and Cognition, 5, 167–176.  https://‍

Evolving language: From protolanguage to language

Arbib, M. A. (2012). How the Brain Got Language: The Mirror System Hypothesis. New York & Oxford: Oxford University Press. https://
Arbib, M. A. (2013). Complex Imitation and the Language-Ready Brain. Language and Cognition, 5(2–3), 273–312. https://
Arbib, M. A. (2016). Towards a Computational Comparative Neuroprimatology: Framing the Language-Ready Brain. Physics of Life Reviews, 16, 1–54. https://
Arbib, M. A. (2017). Dorsal and ventral streams in the evolution of the language-ready brain: Linking language to the world. Journal of Neurolinguistics, 43, Part B, 228–253. https://
Arbib, M. A. (2018). Computational challenges of evolving the language-ready brain: 1. From Manual Action to Protosign. Interaction Studies, 19(1–2), 7–21. https://
Arbib, M. A. & Bonaiuto, J. J. (eds) (2016). From Neuron to Cognition via Computational Neuroscience. Cambridge, MA: The MIT Press.
Arbib, M. A. & Caplan, D. (1979). Neurolinguistics must be Computational. Behavioral and Brain Sciences, 2, 449–483. https://
Arbib, M. A., Ganesh, V. & Gasser, B. (2014a). Dyadic Brain Modeling, Ontogenetic Ritualization of Gesture in Apes, and the Contributions of Primate Mirror Neuron Systems. Phil Trans Roy Soc B, 369(1644), 20130414. https://
Arbib, M. A., Gasser, B. & Barrès, V. (2014b). Language is handy but is it embodied? Neuropsychologia, 55, 57–70. https://
Arbib, M. A. & Lee, J. Y. (2008). Describing visual scenes: Towards a neurolinguistics based on construction grammar. Brain Research, 1225, 146–162. https://
Arbib, M. A. & Lee, J. Y. (2009). Template Construction Grammar and the Description of Visual Scenes. The Neurobiology of Language Conference, Chicago.
Arbib, M. A. & Liaw, J.-S. (1995). Sensorimotor Transformations in the Worlds of Frogs and Robots. Artificial Intelligence, 72, 53–79. https://00055-6
Baker, M. (2001). The Atoms of Language: The Mind’s Hidden Rules of Grammar. New York: Basic Books.
Barrès, V. & Arbib, M. A. (2018a). From Gaze Patterns to Utterances: Modeling the Dynamics of Visual Scene Description. Cognitive Science, In preparation.
Barrès, V. & Arbib, M. A. (2018b). SALVIA: A Neuro-Cognitive Model of Normal and Agrammatic Language Comprehension. Brain and Language, In preparation.
Barrès, V. & Lee, J. Y. (2014). Template Construction Grammar: From Visual Scene Description to Language Comprehension and Agrammatism. Neuroinformatics, 12(1), 181–208. https://
Behrmann, M. & Plaut, D. C. (2013). Distributed circuits, not circumscribed centers, mediate visual recognition. Trends in Cognitive Sciences, 17(5), 210–219. https://
Berwick, R. C. & Chomsky, N. (2016). Why only us: Language and Evolution. Cambridge, MA: The MIT Press.
Bickerton, D. (1995). Language and Human Behavior. Seattle: University of Washington Press.
Bickerton, D. (2009). Adam’s Tongue. How Humans Made Language, How Language Made Humans. New York: Hill & Wang.



Bornkessel-Schlesewsky, I. & Schlesewsky, M. (2013). Reconciling time, space and function: A new dorsal–ventral stream model of sentence comprehension. Brain Lang, 125(1), 60–76. https://
Cangelosi, A. & Parisi, D. (eds) (2002). Simulating the Evolution of Language. London: Springer. https://
Caramazza, A. & Zurif, E. B. (1976). Dissociation of algorithmic and heuristic processes in language comprehension: Evidence from aphasia. Brain and Language, 3(4), 572–582. https://90048-1
Chang, F. (2015). The role of learning in theories of English and Japanese sentence processing, in Nakayama, M. (ed), Handbook of Japanese psycholinguistics. Boston: De Gruyter Mouton, 353–385. https://
Colagé, I. (2016). The Cultural Evolution of Language and Brain: Comment on “Towards a computational comparative neuroprimatology: Framing the language-ready brain” by M.A. Arbib. Physics of Life Reviews, 16, 61–62. https://
Dehaene, S. & Cohen, L. (2011). The unique role of the visual word form area in reading. TRENDS in Cognitive Sciences, 15(6), 254–262. https://
Draper, B. A., Collins, R. T., Brolio, J., Hanson, A. R. & Riseman, E. M. (1989). The schema system. International Journal of Computer Vision, 2, 209–250. https://
Fagg, A. H. & Arbib, M. A. (1998). Modeling parietal-premotor interactions in primate control of grasping. Neural Netw, 11(7–8), 1277–1303. https://00047-1
Fogassi, L., Coudé, G. & Ferrari, P. F. (2013). The extended features of mirror neurons and the voluntary control of vocalization in the pathway to language. Language and Cognition, 5, 145–155. https://
Friederici, A. D. (2011). The brain basis of language processing: from structure to function. Physiological Reviews, 91(4), 1357–1392. https://
Hagoort, P. (2013). MUC (Memory, Unification, Control) and beyond. Frontiers in Psychology, 4. https://
Hecht, E. E., Gutman, D. A., Bradley, B. A., Preuss, T. M. & Stout, D. (2015). Virtual dissection and comparative connectivity of the superior longitudinal fasciculus in chimpanzees and humans. Neuroimage, 108, 124–137. https://
Hickok, G. & Poeppel, D. (2007). The cortical organization of speech processing. Nat Rev Neurosci, 8(5), 393–402. https://
Hill, J. C. (1983). A computational model of language acquisition in the two-year-old. Cognition and Brain Theory, 6, 287–317.
Kirby, S. (2000). Syntax without natural selection: How compositionality emerges from vocabulary in a population of learners, in Knight, C., Studdert-Kennedy, M. & Hurford, J. R. (eds), The evolutionary emergence of language. Cambridge: Cambridge University Press, 99–119. https://
Kirby, S., Cornish, H. & Smith, K. (2008). Cumulative cultural evolution in the laboratory: An experimental approach to the origins of structure in human language. Proceedings of the National Academy of Sciences, 105(31), 10681–10686. https://
Lightfoot, D. W. (2006). How New Languages Emerge. Cambridge: Cambridge University Press. https://


MacWhinney, B. (1987). The Competition Model, in MacWhinney, B. (ed), Mechanisms of language acquisition. Hillsdale, NJ: Lawrence Erlbaum, 249–308.
MacWhinney, B. (2005). A unified model of language development, in Kroll, J. F. & Groot, A. M. B. D. (eds), Handbook of bilingualism: Psycholinguistic approaches. Oxford: Oxford University Press, 49–67.
MacWhinney, B. (2014). Item-based patterns in early syntactic development, in Herbst, T., Schmid, H.-J. & Faulhaber, S. (eds), Constructions Collocations Patterns. Walter de Gruyter, 33–69.
Menenti, L., Gierhan, S. M. E., Segaert, K. & Hagoort, P. (2011). Shared Language: Overlap and Segregation of the Neuronal Infrastructure for Speaking and Listening Revealed by Functional MRI. Psychological Science, 22(9), 1173–1182. https://
Myowa, M. (2018). The Evolutionary Roots of Human Imitation, Action Representation, and Word Learning. Interaction Studies, 19(1–2), 183–199.
Rilling, J. K., Glasser, M. F., Preuss, T. M., Ma, X., Zhao, T., Hu, X. & Behrens, T. E. (2008). The evolution of the arcuate fasciculus revealed with comparative DTI. Nature Neuroscience, 11(4), 426–428. https://
Sandler, W., Aronoff, M., Meir, I. & Padden, C. (2011). The gradual emergence of phonological form in a new language. Natural Language & Linguistic Theory. https://
Segaert, K., Menenti, L., Weber, K., Petersson, K. M. & Hagoort, P. (2012). Shared Syntax in Language Production and Language Comprehension – An fMRI Study. Cerebral Cortex, 22(7), 1662–1670. https://
Steels, L. (2011). Modeling the cultural evolution of language. Physics of Life Reviews, 8(4), 339–356. https://
Wray, A. (1998). Protolanguage as a holistic system for social interaction. Language & Communication, 18, 47–67. https://00033-5


Starting from the Macaque

Reflections on the differential organization of mirror neuron systems for hand and mouth and their role in the evolution of communication in primates

Gino Coudé and Pier Francesco Ferrari

Institut des Sciences Cognitives Marc Jeannerod – CNRS

It is now generally accepted that the motor system is not purely dedicated to the control of behavior, but also has cognitive functions. Mirror neurons have provided a new perspective on how sensory information regarding others’ actions and gestures is coupled with the internal cortical motor representation of them. This coupling allows an individual to enrich his interpretation of the social world through the activation of his own motor representations. Such mechanisms have been highly preserved in evolution as they are present in humans, apes and monkeys. Recent neuroanatomical data showed that there are two different connectivity patterns in mirror neuron networks in the macaque: one is concerned with sensorimotor transformation in relation to reaching and hand grasping within the traditional parietal-premotor circuits; the second one is linked to the mouth/face motor control and the new data show that it is connected with limbic structures. The mouth mirror sector seems to be wired not only for ingestive behaviors but also for orofacial communicative gestures and vocalizations. Notably, the hand and mouth mirror networks partially overlap, suggesting the importance of hand-mouth synergies not only for sensorimotor transformation, but also for communicative purposes in order to better convey and control social signals.

Keywords: mirror neuron, communicative gesture, premotor cortex, limbic system, macaque

https:// © 2020 John Benjamins Publishing Company

Introduction

The motor system has always been referred to as a complex network evolved to control behavior. Although this view is still partly true, it has been challenged by the discovery of mirror neurons (MNs) in macaques (di Pellegrino et al., 1992; Gallese et al., 1996). MNs, in fact, provide a motor template through which the perception of others’ behavior activates in the observer the same motor representations used during the execution of the behavior. Due to this additional property, our understanding of the motor system has expanded from that of a purely controlling machine to that of a system heavily involved in social cognition. The striking properties of MNs also led to the hypothesis that they might be involved in some aspects of language or communication (Rizzolatti and Arbib, 1998). In the present review, we first briefly describe the neuroanatomical connections of the mirror system, complementing the well-charted understanding of the manual mirror system by emphasizing that its relatively overlooked component, the mouth mirror system, has a dramatically different pattern of connectivity with other brain regions, one that brings in the limbic system. Secondly, we describe neuroethological studies that link the mouth sector of the premotor cortex to facial mimicry and vocalization. Finally, we speculate that in primates, hand and mouth motor synergies have been at the core of complex forms of communication.

Mirroring others’ actions and gestures through the motor system

One of the most striking features of the motor system is its involvement in decoding others’ behavior through a mechanism in which the sensory (either visual or acoustic) description of an action is translated into a motor format. By exploiting the rich neuroanatomical network sustaining sensorimotor transformations at the service of pragmatic functions (i.e. reaching and grasping an object), the motor system has expanded its functions within a social domain. The discovery of MNs therefore represents an important landmark in our comprehension of the mechanisms and functions of the motor system because it has provided a new perspective on how sensory information regarding others’ actions and gestures is coupled with the internal cortical motor representation of them. This coupling allows the creation of a matching mechanism enabling individuals to enrich their interpretation of the social world through the activation of their own motor representations, which have been built in the course of phylogeny and ontogeny.

Hand and mouth: Two different mirror networks

MNs discharge when a subject either actually performs a motor act or simply observes the same act being performed by someone else. In other words, the observation of an action triggers in the observer’s brain a representation of that action in a motor format (Rizzolatti and Sinigaglia, 2016). One of the key features of MNs is that they generalize the visual stimulus that triggers the neuron’s response from one’s own actions to others’ actions (Tramacere et al., 2016; Maranesi et al., 2017).



The likely starting point from which this generalization emerged is embedded in the properties of the motor system, which in the course of an arm movement is capable of visually tracking one’s own hand as it reaches the target (Oztop and Arbib, 2002; Maranesi et al., 2017). Such visuo-motor coupling, necessary for grasping objects, has been exploited during evolution in order to track others’ grasping actions. The literature about MNs is abundant and the connectivity pattern of hand-related mirror neurons has been amply described elsewhere (see Rizzolatti et al., 2014; Borra et al., 2017); we will therefore not review it further here beyond presenting the top half of Figure 1. In this section, we will instead emphasize the anatomical connections of the mouth mirror sector that lies in the lateral sector of F5c (lateral to the hand sector with an overlapping hand-mouth area) and in the bordering dorsal opercular (DO) cortex. Mouth MNs form a category of neurons that mirror ingestive and communicative mouth actions. In fact, a small percentage of mouth MNs respond to communicative gestures such as lip-smacking, a typical macaque affiliative gesture (Ferrari et al., 2003). Recent neuroanatomical data (Ferrari et al., 2017) have shown that the connectivity pattern of the cortical sector of mouth MNs is different from the one subserving hand MNs. We thus proposed that the mirror mechanism is composed of, and supported by, at least two different anatomical pathways (Figure 1). As mentioned above, one of these pathways is concerned with sensorimotor transformation in relation to reaching and hand grasping within the traditional parietal-premotor circuits. The other is linked to mouth/face motor control and is connected with limbic structures involved in facial expression and processing of emotions as well as processing of reward. Note that there is a significant overlap between the hand and mouth representations within the ventral PMv.

Processing reward and social context

Unlike the hand regions, the mouth mirror sector has strong connections with the anterior cingulate cortex (ACC), the orbitofrontal cortex, the anterior insula, and the basolateral amygdala. The projections to the anterior cingulate cortex are likely targeting a region involved in emotional processing of information in relation to reward value or to the relevance of behaviors linked to outcomes (Hayden et al., 2010; Cai and Padoa-Schioppa, 2012). Interestingly, neuroimaging studies showed that the observation and imitation of facial expressions of emotions activates a region of the anterior cingulate cortex (Carr et al., 2003; Singer et al., 2004). The link of the mouth mirror sector with the anterior cingulate cortex could therefore be important for processing information regarding food value in order to program and select ingestive activities. It can also be exploited within the social communication domain, since there is a link between the ACC and basic affiliative and communicative behaviors (see Apps et al., 2016), as well as with cost-benefit decisions made within a social context (Hillman and Bilkey, 2012). The mouth mirror sector is also connected with the orbitofrontal cortex, known to be involved in coding reward value obtained in a social context (Azzi et al., 2012). These connections show that the mouth mirror system has access to information related to ingestion and food rewards as well as to information of a social nature.

[Figure 1 image: two panels, “Hand mirror neuron network” and “Mouth mirror neuron network”]

Figure 1. Summary of the hand and the mouth mirror neuron networks. The connectivity of the hand mirror neuron network is based on the description of neural tracer injections placed in the dorsal part of area F5 in which the hand mirror neurons were found. Specifically, the connections indicated are based on those observed in previous works. The connectivity of the mouth mirror neuron network is based on the description of neural tracer injections placed in the ventral part of area F5 and in the opercular areas GrFO and DO in which the mouth mirror neurons were found. Adapted from Ferrari et al. (2017).

Mouth mirror access to visual information does not occur via the parietal cortex

The hand mirror sector has been shown to have direct anatomical connections to parietal regions AIP and PFG (Rozzi et al., 2006; Gerbella et al., 2011). It has been proposed that these parietal areas transmit visual information regarding hand grasping movement and constitute a hub for the visual input coming from the superior temporal sulcus (STS) (Nelissen and Vanduffel, 2011). The mouth mirror sector, by contrast, is connected with parietal areas (Bruni et al., 2017) mostly related to somatosensory and motor representations of the face and of the mouth. We can hypothesize that the visual information serving orofacial or communicative behaviors comes through pathways involving the ventral prefrontal cortex (areas 12 and 46), the insula and the amygdala. The ventrolateral prefrontal cortex is connected with the mouth mirror sector (Ferrari et al., 2017), and with the temporal sectors that contribute to the coding of biological motion and facial expressions, and to the processing of complex stimuli linked to vocalizations (Barbas, 1988; Romanski, 2012; Gerbella et al., 2013). It is also possible that the information related to facial expressions and/or their emotional content could reach the mouth mirror sector via its connections with the orbitofrontal cortex, the insula or the amygdala.

Facial gestural communication and the face mirror network

The mouth mirror sector sends projections to the facial nucleus for the innervation of the lower part of the face, likely involved in motor control of the mimic facial muscles (Morecraft et al., 2001), a group of muscles in the region of the face that make the physical expression of emotions possible through their movements. As mentioned above, the mouth sector has strong connections with limbic structures known to be involved in encoding emotional facial expressions and processing reward and motivation: the anterior cingulate cortex, the anterior and mid-dorsal insula, the orbitofrontal cortex and the basolateral amygdala. This connectivity pattern is probably reflected in the fact that mouth mirror neurons fire during intransitive (communicative) actions in monkeys, while hand mirror neurons do not. Interestingly, the mid-dorsal insula is a structure known to provoke affiliative facial expressions when electrically stimulated (Caruana et al., 2011; Jezzini et al., 2012). The mouth sector also has connections with the ventral sector of the putamen, a region that is part of a circuit related to motivated behavior and that is involved in foraging behavior (Tremblay et al., 2015). These anatomical data are in agreement with neurophysiological evidence showing that in the F5/opercular mouth MN region there are neurons responding to facial communicative gestures (e.g. lip-smacking) (Ferrari et al., 2017). From a functional perspective, it is possible that the lateral sector of the F5/opercular region exerts motor control over the facial mimic muscles for communicative purposes. The capacity to activate motor programs corresponding to those observed suggests that a mirror mechanism could underpin some behavioral phenomena that require a prompt and matched response to a communicative signal sent by a conspecific. Such mimicry phenomena have been documented in nonhuman primates and they are highly relevant in affiliative contexts where individuals coordinate face-to-face exchanges, either during play behaviors or during mother-infant affective communication (Ferrari et al., 2006; Mancini et al., 2013). Facial mimicry (see example in Figure 2) is a common behavioral phenomenon that consists in a form of affective dyadic exchange in which the affective state of one individual facilitates the activation of a similar motor program in the receiver. It also activates the bodily/autonomic responses that are associated with it.

Figure 2.  Example of facial mimicry during play in two Gelada Baboons (Photo by Pier Francesco Ferrari)



Other forms of emotional contagion (Hatfield et al., 1993) have been described in animals (emotion recognition, emotion contagion, and emotion priming), even though the neural mechanisms responsible for them are still unknown (Decety and Jackson, 2006; Singer, 2006; Lamm et al., 2011; Walter, 2011). Although these forms of emotional contagion are common and their very basic responses probably do not require complex cognitive processes, they may nevertheless constitute the building blocks of rudimentary forms of empathy. In fact, many scholars believe that MNs, or at least a mirroring mechanism, can account for some rudimentary forms of empathy, like facial mimicry (Preston and de Waal, 2002; de Vignemont and Singer, 2006). Mouth MNs and the mirroring of facial mimicry are probably at the basis of the capacity to become emotionally attuned with another individual. According to Preston and de Waal (2002), empathy is in fact a multilayer phenomenon which has at its core some mechanisms coupling action and perception. Empathy can thus be defined as the ability to understand and share the internal states of others. Several scholars agree that it is a complex, multidimensional phenomenon that includes a number of functional processes, including emotion recognition, emotion contagion, and emotion priming (see Decety and Lamm, 2006), as well as the abilities to react to the internal states of others, and to distinguish between one’s own and others’ internal states (see Tomova et al., 2014). Empathy can take various forms along a spectrum. At one end of this spectrum, mimicry and emotional contagion appear to be shared by several mammalian species, like primates, mice, pigs and dogs (see Tramacere and Ferrari, 2016). At the other end of this spectrum, higher forms of empathy such as cognitive empathy rely on a conscious, deliberative process through which inferences can be made about others’ bodily and affective states, beliefs, and intentions, often referred to as “mentalizing” (Keysers and Fadiga, 2008; Zaki and Ochsner, 2012).

Hand mouth synergies

There is a significant overlap between the hand and mouth representations within the ventral PMv (Maranesi et al., 2012). Neurons discharging for mouth or hand actions or gestures are often intermingled. Such motor organization, with overlapping hand and mouth motor representations, has been previously described (McGuinness et al., 1980; Huang et al., 1988). Interestingly, electrically evoked complex movements have also been reported in the monkey (Graziano et al., 2002; Kaas et al., 2013), and they often involve coordinated movements of the hand and of the mouth. These ethologically relevant movements seem to reflect synergistic responses aimed at optimizing behaviors that are relevant for survival. The somatotopic and functional organization of the motor cortex facilitates the recruitment of the cortical motor commands involved in the control of facial muscles when combined movements of hand and mouth are required (Graziano et al., 2005; Desmurget et al., 2014). The influence of the hand on mouth responses is also supported by numerous human kinematic studies by Gentilucci and colleagues showing that the movement of the hand during grasping affects the simultaneous kinematics of the mouth during different motor tasks. Another series of investigations showed that the grasping of objects of different sizes influences the motor command for mouth opening (Gentilucci and Campione, 2011). Hand and mouth integration could also be achieved through thalamic relays. Thalamic projections to the mouth mirror sector derive from the anterior nuclei associated with sensory-motor functions (VA; X), but also from more posterior nuclei such as MD. This nucleus is also connected with the prefrontal areas 46v and 12, which contain neurons responsive during the execution of hand and mouth actions (Simone et al., 2015) and during action observation (Simone et al., 2017). Areas 46v and 12 are, in turn, linked with the mouth mirror sector. This trans-thalamic interplay could modulate the efficacy of direct inputs from one cortical area to another (Sherman, 2007; Saalmann and Kastner, 2011). It could likely reflect the integration of hand and mouth motor synergies in coordinated motor sequences that are required during foraging behaviors. This suggests that some aspects of the motor control of the hand and of the mouth are neurophysiologically coordinated in order to support synergies during hand-mouth interaction, for instance when the monkey grasps food and brings it to the mouth (Gentilucci et al., 1988; Ferrari et al., 2003).

Hand mouth synergies for gestural communication

The link between the hand and the mouth has been hypothesized to be integrated also within the communication domain. Several gestures in primates can involve the oro-facial and/or the brachio-manual system in conjunction with body postures. Facial gestures often involve face-to-face exchanges, involuntary acts and autonomic responses (Ferrari et al., 2006). Some of these gestures have been extensively studied by comparative investigations that could reconstruct, with reliable approximation, their possible relatedness and origin among the different species (van Hooff, 1967). Regarding brachio-manual gestural communication, apes use such gestures in a richer and more elaborate way than monkeys (Call and Tomasello, 2007). In the last ten years there has been an increasing body of research, in part stimulated by the idea that brachio-manual gestures have probably played a role in language evolution (Arbib et al., 2008; Liebal and Call, 2012). Apes, for example, are able to use several types of gestures, often in combination, to request food (Leavens et al., 2004, 2005; Gómez, 2007). In captivity, chimpanzees and also some monkeys point to request food or objects and, in the case of chimpanzees, they are sensitive to the attentional state of the human experimenter when they point (Leavens et al., 2004b). Although they do not gesture to share information or to inform others, it has been pointed out that they might use brachio-manual gestures in many flexible ways. Under human rearing conditions some apes have been reported to use declarative gestures, thus showing the potential to expand their cognitive and contextual use of the communicative gesture (Lyn et al., 2011). There are several lines of converging evidence from neuroscience, ethology and developmental psychology that many of the gestures displayed by nonhuman primates began their existence as actions devoid of a communicative function (Halina et al., 2013; Arbib et al., 2014). Over time, gestures became co-opted and transformed into communicative devices that accomplished similar functions (Fogassi and Ferrari, 2012; Liebal and Call, 2012). Interestingly, a recent brain imaging study in deaf signers found that Broca’s area, part of the human MN system, is activated during both the observation and execution of hand sign language (Okada et al., 2016). This and other data seem to converge in indicating that an MN system for speech and hand gesture exploits a common brain network (Gentilucci and Corballis, 2006) in which the coupling of sensory and motor information is instrumental in facilitating an efficient signal exchange between the signaler and the recipient. We propose that MNs followed two different evolutionary trends: hand guidance in space and gestural or vocal communication. However, hand-mouth synergies must have been exploited at a communicative level for better conveying and controlling the transmitted information (Gentilucci and Corballis, 2006). This kind of transition can be seen at a behavioral level in apes, but the corresponding neurophysiological data are missing. Vocalization might have had a late beginning in evolution, but some of its rudiments are present in extant monkeys.

Towards a new road map

Neuroanatomical and ethological data indicate that the motor system is not purely dedicated to the control of behavior, but also plays a role in cognitive functions that are especially relevant in social contexts, complementing systems “beyond the mirror.” The mirror network neuroanatomical data are also relevant at an evolutionary level. The mirror system hypothesis (MSH) was elaborated in its core elements from the knowledge available at the time on the main properties of hand mirror neurons and on the related hand mirror circuits (i.e. AIP-PFG-F5). It traces an evolutionary path of the role of mirror neurons within larger systems “beyond the mirror” to provide a path via increasingly complex imitation and pantomime to protosign, with even simple protosign providing support for the emergence of protospeech. Systems beyond the mirror evolve to provide meaning that complements the control and perception of articulation.


The mirror neuron system itself evolved. In this paper, we reviewed its connectivity and posited that there were at least two mirror networks in LCA-m. The fact that more than one mirror neuron network existed in LCA-m might indicate that they were shaped through different evolutionary pathways (albeit with important overlap), each with an independent natural history due to unique selective pressures. Unlike the hand MN sector, the mouth MN sector has a set of connections with brain regions that are part of the limbic system and that are involved in emotion and reward processing. This suggests that the mirror neuron circuitry changed, or perhaps underwent coordinated evolutionary modifications, together with neural systems “beyond the mirror”, such as the limbic system. Increasing social complexity favored individuals that are attuned to the emotional states of their peers, especially in species, like monkeys and apes, where parental care is particularly long and is foundational for the emotional and cognitive development of the infant. We can speculate that a mouth MN network with access to the limbic system has important implications for the evolutionary processes that linked emotional communication with facial gestures. A second important aspect to consider is how vocalizations could have been integrated into such a complex communication system. For a long time, monkeys’ vocalizations were considered to be outside volitional control and were therefore investigated as a system independent from other forms of communication, in terms of both mechanical/anatomical and neural control. The discovery in the PMv of neurons that are activated during conditioned vocalization (Coudé et al., 2011) challenged this view. First, it indicates that volitional control of vocalization might have emerged in anatomical areas overlapping with cortical regions involved in hand and mouth motor control. Whether such anatomical convergence of different effectors had an impact on the potential synergies between vocal control and gestures remains an intriguing hypothesis worth investigating. Second, it suggests a timescale for the emergence of vocal control such that some evolutionary pressure must have come into play well before the use of protosigns (i.e. communication based on conventionalized manual gestures) developed. This does not mean that the capability of monkeys to control vocalization “promotes” voice as a direct route to language. In our view, the vocalization control circuitry in the monkey (LCA-m) was a building block for what would later become the controlled utterances of protospeech. This means that a circuit at least partly dedicated to voluntary vocal control, and emerging from early evolutionary pressures, had opened a restricted path for more complex forms of vocalization. This restricted path, taking advantage of some features of the PMv – among which the overlapping cortical representation of hand, mouth and larynx, and the presence of motor and mirror neurons coding goals independently of the effector used – has probably contributed to making vocal signals


48 Gino Coudé and Pier Francesco Ferrari

“suitable” for further evolutionary changes, in which protosign mechanisms scaffold protospeech. Describing the expanding evolutionary spiral involving protosign and protospeech, where “mechanisms evolved to support one become available to support the other”, Arbib (2016) writes: “the mechanisms that evolved to support protosign extended collaterals to yield the control of the vocal apparatus that supported an increasingly precise control of vocalization needed to support speech” (see also Arbib’s interesting suggestion on how this vocal control could have evolved, this volume). The neurophysiological and behavioral monkey/ape data allow us to hypothesize that the passage from protosign to protospeech was possible only because the cortical circuitry had, a long time ago (in LCA-m), started a process of shaping vocal control. This process would eventually make vocal control amenable to being scaffolded by protosign and to later becoming protospeech.

Acknowledgements

This paper is dedicated to the memory of Maurizio Gentilucci, an outstanding and rigorous scientist, who greatly contributed to our understanding of mouth-hand motor synergies and their implications for gestural communication.

Funding

This research was supported in part by the Division of Intramural Research, NICHD, and NIH P01 HD064653. The paper was prepared for a workshop funded by NSF Grant No. BCS-1343544 “INSPIRE Track 1: Action, Vision and Language, and their Brain Mechanisms in Evolutionary Relationship” (M.A. Arbib, Principal Investigator).

References

Apps, M. A. J., Rushworth, M. F. S., Chang, S. W. C. (2016). The Anterior Cingulate Gyrus and Social Cognition: Tracking the Motivation of Others. Neuron 90, 692–707. https://‍ Arbib, M. A. (2016). Towards a Computational Comparative Neuroprimatology: Framing the language-ready brain. Phys Life Rev 16, 1–54. https://‍ Arbib, M. A., Liebal, K., Pika, S. (2008). Primate Vocalization, Gesture, and the Evolution of Human Language. Curr Anthropol 49, 1053–1076 Available at: http://‍ Arbib, M., Ganesh, V., Gasser, B. (2014). Dyadic brain modelling, mirror systems and the ontogenetic ritualization of ape gesture. Philos Trans R Soc B Biol Sci 369, 20130414–20130414 Available at: http://‍ Azzi, J. C. B., Sirigu, A., Duhamel, J. -R. (2012). Modulation of value representation by social context in the primate orbitofrontal cortex. Proc Natl Acad Sci 109, 2126–2131. https://‍


Barbas, H. (1988). Anatomic organization of basoventral and mediodorsal visual recipient prefrontal regions in the rhesus monkey. J Comp Neurol 276, 313–342. https://‍ Borra, A. E., Gerbella, M., Rozzi, S., Luppino, G. (2017). The macaque lateral grasping network: a neural substrate for generating purposeful hand actions. Neurosci Biobehav Rev 75, 65–90. https://‍ Bruni, S., Gerbella, M., Bonini, L., Borra, E., Coudé, G., Francesco, P., Fogassi, L., Maranesi, M., Rodà, F., Simone, L., Ugolotti, F., Rozzi, S. (2017). Cortical and subcortical connections of parietal and premotor nodes of the monkey hand mirror neuron network. Brain Struct Funct 0:0.  https://‍ Cai, X., Padoa-Schioppa, C. (2012). Neuronal encoding of subjective value in dorsal and ventral anterior cingulate cortex. J Neurosci 32, 3791–3808. https://‍ Call, J., Tomasello, M. (2007). The gestural communication of apes and monkeys. (Erlbaum, L., ed). Mahwah, NJ. Carr, L., Iacoboni, M., Dubeau, M. -C., Mazziotta, J. C., Lenzi, G. L. (2003). Neural mechanisms of empathy in humans: a relay from neural systems for imitation to limbic areas. Proc Natl Acad Sci U S A 100, 5497–5502.  https://‍ Caruana, F., Jezzini, A., Sbriscia-Fioretti, B., Rizzolatti, G., Gallese, V. (2011). Emotional and social behaviors elicited by electrical stimulation of the insula in the macaque monkey. Curr Biol 21, 195–199.  https://‍ Coudé, G., Ferrari, P. F., Rodà, F., Maranesi, M., Borelli, E., Veroni, V., Monti, F., Rozzi, S., Fogassi, L. (2011). Neurons controlling voluntary vocalization in the macaque ventral premotor cortex. PLoS One 6, 1–10.  https://‍ de Vignemont, F., Singer, T. (2006). The empathic brain: how, when and why? Trends Cogn Sci 10, 435–441.  https://‍ Decety, J., Jackson, P. L. (2006). A Social-Neuroscience Perspective on Empathy. Curr Dir Psychol Sci 15, 54–58 Available at: http://‍ rence&D=psyc5&NEWS=N&AN=2006-06699-002. Decety, J., Lamm, C. (2006). Human Empathy Through the Lens of Social Neuroscience. 
Sci World J 6, 1146–1163 Available at: http://‍ abs/. Desmurget, M., Richard, N., Harquel, S., Baraduc, P., Szathmari, A., Mottolese, C., Sirigu, A. (2014). Neural representations of ethologically relevant hand/mouth synergies in the human precentral gyrus. Proc Natl Acad Sci U S A 111, 5718–5722 Available at: http://‍www. di Pellegrino, G., Fadiga, L., Fogassi, L., Gallese, V., Rizzolatti, G. (1992). Understanding motor events: a neurophysiological study. Exp brain Res 91, 176–180. https://‍ Ferrari, P. F., Gallese, V., Rizzolatti, G., Fogassi, L. (2003). Mirror neurons responding to the observation of ingestive and communicative mouth actions in the monkey ventral premotor cortex. Eur J Neurosci 17, 1703–1714. https://‍ [Accessed June 27, 2016]. Ferrari, P. F., Visalberghi, E., Paukner, A., Fogassi, L., Ruggiero, A., Suomi, S. (2006). Neonatal Imitation in Rhesus Macaques. PLoS Biol 4, e302. https://‍


Ferrari, P. F., Gerbella, M., Coudé, G., Rozzi, S. (2017). Two different mirror neuron networks: the sensorimotor (hand) and limbic (face) pathways. Neuroscience 358, 300–315. https://‍ http://‍ article/pii/S0306452217304578 Fogassi, L., Ferrari, P. (2012). Cortical Motor Organization, Mirror Neurons, and Embodied Language: An Evolutionary Perspective. Biolinguistics:308–337 Available at: http://‍www. Gallese, V., Fadiga, L., Fogassi, L., Rizzolatti, G. (1996). Action recognition in the premotor cortex. Brain 119 (Pt 2), 593–609. https://‍ Gentilucci, M., Campione, G. C. (2011). Do postures of distal effectors affect the control of actions of other distal effectors? Evidence for a system of interactions between hand and mouth. PLoS One 6. https://‍ Gentilucci, M., Corballis, M. C. (2006). From manual gesture to speech: a gradual transition. Neurosci Biobehav Rev 30, 949–960 Available at: http://‍ pubmed/16620983 [Accessed August 21, 2014]. Gentilucci, M., Fogassi, L., Luppino, G., Matelli, M., Camarda, R., Rizzolatti, G. (1988). Functional organization of inferior area 6 in the macaque monkey – I. Somatotopy and the control of proximal movements. Exp Brain Res 71, 475–490. https://‍ Gerbella, M., Belmalih, A., Borra, E., Rozzi, S., Luppino, G. (2011). Cortical connections of the anterior (F5a) subdivision of the macaque ventral premotor area F5. Brain Struct Funct 216, 43–65. https://‍ Gerbella, M., Borra, E., Tonelli, S., Rozzi, S., Luppino, G. (2013). Connectional heterogeneity of the ventral part of the macaque area 46. Cereb Cortex 23, 967–987. https://‍ Gómez, J. C. (2007). Pointing behaviors in apes and human infants: A balanced interpretation. Child Dev 78, 729–734. https://‍ Graziano, M. S. A., Aflalo, T. N. S., Cooke, D. F. (2005). Arm movements evoked by electrical stimulation in the motor cortex of monkeys. J Neurophysiol 94, 4209–4223 Available at: http://‍ [Accessed August 27, 2016]. Graziano, M. S.
A., Taylor, C. S. R., Moore, T. (2002). Complex movements evoked by microstimulation of precentral cortex. Neuron 34, 841–851. https://‍‍00698-0 Halina, M., Rossano, F., Tomasello, M. (2013). The ontogenetic ritualization of bonobo gestures. Anim Cogn 16, 653–666.  https://‍ Hatfield, E., Cacioppo, J., Rapson, R. (1993). Emotional Contagion. Curr Dir Psychol Sci 2, 96–99.  https://‍ Hayden, B. Y., Smith, D. V., Platt, M. L. (2010). Cognitive control signals in posterior cingulate cortex. Front Hum Neurosci 4, 223 Available at: http://‍ Hillman, K. L., Bilkey, D. K. (2012). Neural encoding of competitive effort in the anterior cingulate cortex. Nat Neurosci 15, 1290–1297 Available at: http://‍ pubmed/22885851%5Cnhttp://‍ Huang, C. S., Sirisko, M. A., Hiraba, H., Murray, G. M., Sessle, B. J. (1988). Organization of the primate face motor cortex as revealed by intracortical microstimulation and electrophysiological identification of afferent inputs and corticobulbar projections. J Neurophysiol 59, 796–818 Available at: http://‍ [Accessed August 27, 2016].


Jezzini, A., Caruana, F., Stoianov, I., Gallese, V., Rizzolatti, G. (2012). Functional organization of the insula and inner perisylvian regions. Proc Natl Acad Sci U S A 109, 10077–10082 Available at: http://‍‍ Kaas, J. H., Gharbawie, O. A., Stepniewska, I. (2013). Cortical networks for ethologically relevant behaviors in primates. Am J Primatol 75, 407–414. https://‍ Keysers, C., Fadiga, L. (2008). The mirror neuron system: new frontiers. Soc Neurosci 3, 193–198. https://‍ Lamm, C., Decety, J., Singer, T. (2011). Meta-analytic evidence for common and distinct neural networks associated with directly experienced pain and empathy for pain. Neuroimage 54, 2492–2502 Available at: http://‍ S1053811910013066. Leavens, D. A., Hopkins, W. D., Bard, K. A. (2005). Understanding the Point of Chimpanzee Pointing. Epigenesis and Ecological Validity. Curr Dir Psychol Sci 14, 185–189. https://‍ Leavens, D. A., Hopkins, W. D., Thomas, R. K. (2004). Referential communication by chimpanzees (Pan troglodytes). J Comp Psychol 118, 48–57 Available at: http://‍www.ncbi.nlm.nih. gov/pubmed/15008672 [Accessed July 31, 2014]. Leavens, DA, Russell, JL, Hopkins, WD. (2004b) Intentionality as Measured in the Persistence and Elaboration of Communication by Chimpanzees (Pan troglodytes). Child Dev 76:291–306.  https://‍ Liebal, K., Call, J. (2012). The origins of non-human primates’ manual gestures. Philos Trans R Soc B Biol Sci 367, 118–128.  https://‍ Lyn, H., Greenfield, P. M., Savage-Rumbaugh, S., Gillespie-Lynch, K., Hopkins, W. D. (2011). Nonhuman primates do declare! A comparison of declarative symbol and gesture use in two children, two bonobos, and a chimpanzee. Lang Commun 31, 63–74. https://‍ Mancini, G., Ferrari, P. F., Palagi, E. (2013) In Play We Trust. Rapid Facial Mimicry Predicts the Duration of Playful Interactions in Geladas. PLoS One 8, 2–6. https://‍ Maranesi, M., Livi, A., Bonini, L. (2017). 
Spatial and viewpoint selectivity for others’ observed actions in monkey ventral premotor mirror neurons. Sci Rep:1–19. https://‍ Maranesi, M., Rodà, F., Bonini, L., Rozzi, S., Ferrari, P. F., Fogassi, L., Coudé, G. (2012). Anatomo-functional organization of the ventral primary motor and premotor cortex in the macaque monkey. Eur J Neurosci 36, 3376–3387. https://‍ McGuinness, E., Sivertsen, D., Allman, J. M. (1980). Organization of the face representation in macaque motor cortex. J Comp Neurol 193, 591–608 Available at: http://‍www.ncbi.nlm.nih. gov/pubmed/7440784 [Accessed August 27, 2016]. Morecraft, R. J., Louie, J. L., Herrick, J. L., Stilwell-Morecraft, K. S. (2001). Cortical innervation of the facial nucleus in the non-human primate: A new interpretation of the effects of stroke and related subtotal brain trauma on the muscles of facial expression. Brain 124, 176–208. https://‍ Nelissen, K., Vanduffel, W. (2011). Grasping-related functional magnetic resonance imaging brain responses in the macaque monkey. J Neurosci 31, 8220–8229 Available at: http://‍www.



bstract. Okada, K., Rogalsky, C., O’Grady, L., Hanaumi, L., Bellugi, U., Corina, D., Hickok, G. (2016). An fMRI study of perception and action in deaf signers. Neuropsychologia 82, 179–188. https://‍ Oztop, E., Arbib, M. A. (2002). Schema design and implementation of the grasp-related mirror neuron system. Biol Cybern 87, 116–140. https://‍ Preston, S. D., Waal, F. B. M. De. (2002). Empathy: Its ultimate and proximate bases. Behavioral and Brain Sciences, 25(1), 1–71. Rizzolatti, G., Arbib, M. A. (1998). Language within our grasp. Trends Neurosci 21, 188–195. https://‍‍01260-0 Rizzolatti, G., Cattaneo, L., Fabbri-Destro, M., Rozzi, S. (2014). Cortical mechanisms underlying the organization of goal-directed actions and mirror neuron-based action understanding. Physiol Rev 94, 655–706. https://‍ Rizzolatti, G., Sinigaglia, C. (2016). The mirror mechanism: a basic principle of brain function. Nat Rev Neurosci 17, 757–765 Available at: http://‍ nrn.2016.135%5Cn http://‍ Romanski, L. M. (2012). Integration of faces and vocalizations in ventral prefrontal cortex: implications for the evolution of audiovisual speech. Proc Natl Acad Sci U S A 109 Suppl:10717–10724 Available at: http://‍ d=3386875&tool=pmcentrez&rendertype=abstract [Accessed July 26, 2014]. https://‍ Rozzi, S., Calzavara, R., Belmalih, A., Borra, E., Gregoriou, G. G., Matelli, M., Luppino, G. (2006). Cortical connections of the inferior parietal cortical convexity of the macaque monkey. Cereb Cortex 16, 1389–1417. https://‍ Saalmann, Y. B., Kastner, S. (2011). Cognitive and Perceptual Functions of the Visual Thalamus. Neuron 71, 209–223. https://‍ Sherman, S. M. (2007). The thalamus is more than just a relay. Curr Opin Neurobiol 17, 417–422. https://‍ Simone, L., Bimbi, M., Rodà, F., Fogassi, L., Rozzi, S. (2017). Action observation activates neurons of the monkey ventrolateral prefrontal cortex. Sci Rep 7, 44378 Available at: http://‍www. Simone, L., Rozzi, S., Bimbi, M., Fogassi, L. (2015). Movement-related activity during goal-directed hand actions in the monkey ventrolateral prefrontal cortex. Eur J Neurosci 42, 2882–2894. https://‍ Singer, T. (2006). The neuronal basis and ontogeny of empathy and mind reading: Review of literature and implications for future research. Neurosci Biobehav Rev 30, 855–863. https://‍ Singer, T., Seymour, B., O’Doherty, J., Kaube, H., Dolan, R. J., Frith, C. D. (2004). Empathy for pain involves the affective but not sensory components of pain. Science (80-) 303, 1157–1162. https://‍ Tomova, L., Von Dawans, B., Heinrichs, M., Silani, G., Lamm, C. (2014). Is stress affecting our ability to tune into others? Evidence for gender differences in the effects of stress on self-other distinction. Psychoneuroendocrinology 43, 95–104. https://‍


Tramacere, A., Ferrari, P. F. (2016). Faces in the mirror, from the neuroscience of mimicry to the emergence of mentalizing. J Anthropol Sci 94, 113–126. Tramacere, A., Pievani, T., Ferrari, P. F. (2016). Mirror neurons in the tree of life: mosaic evolution, plasticity and exaptation of sensorimotor matching responses. Biol Rev:0–0. Tremblay, L., Worbe, Y., Thobois, S., Sgambato-Faure, V., Féger, J. (2015). Selective dysfunction of basal ganglia subterritories: From movement to behavioral disorders. Mov Disord 30, 1155–1170.  https://‍ van Hooff, J. R. R. A. M. (1967). The facial displays of catarrhine monkeys and apes. In: Primate Ethology, Weidenfeld. (Morris, D., ed), pp 7–68. London. Walter, H. (2012). Social Cognitive Neuroscience of Empathy – Concepts, circuits and genes. Emotion Review, 4(1), 9–17. Available at: http://‍​10.1177/​ 1754073911421379. Zaki, J., Ochsner, K. N. (2012). The neuroscience of empathy: progress, pitfalls and promise. Nat Neurosci 15, 675–680.  https://‍


Plasticity, innateness, and the path to language in the primate brain
Comparing macaque, chimpanzee and human circuitry for visuomotor integration

Erin Hecht

Georgia State University

Many researchers consider language to be definitionally unique to humans. However, increasing evidence suggests that language emerged via a series of adaptations to neural systems supporting earlier capacities for visuomotor integration and manual action. This paper reviews comparative neuroscience evidence for the evolutionary progression of these adaptations. An outstanding question is how to mechanistically explain the emergence of new capacities from pre-existing circuitry. One possibility is that human brains may have undergone selection for greater plasticity, reducing the extent to which brain organization is hard-wired and increasing the extent to which it is shaped by socially transmitted, learned behaviors. Mutations that made these new abilities easier or faster to learn would have undergone positive selection, and over time, the neural changes once associated with individual neural plasticity would tend to become heritable, innate, and fixed. Clearly, though, language is not entirely “innate”; it does not emerge without the requisite environmental input and experience. Thus, a mechanistic explanation for the evolution of language must address the inherent trade-off between the evolutionary pressure for underlying neural systems to be flexible and sensitive to environmental input vs. the tendency over time for continually adaptive behaviors to become reliably expressed in an early-emerging, canalized, less flexible manner.

Keywords: action perception, tool use, white matter, diffusion tensor imaging, superior longitudinal fasciculus, dorsal stream, evolution, chimpanzees, macaques

https://‍ © 2020 John Benjamins Publishing Company


Introduction: Comparative neuroscience, exaptation, and language

Early attempts to study the evolution of human-specific abilities like language focused on adaptations that exist in humans but not other animals. For example, Brodmann’s famous cytoarchitectonic maps – which he produced in both humans and other species – show an area 45 only in humans, reflecting the notion of human-unique anatomy underlying human-unique cognitive function. Notably, area 45 and 44 homologues have now been established in non-human anthropoid primates (e.g., Petrides, 2005; Schenker et al., 2008). Paralleling this anatomical perspective, earlier behavioral/cognitive perspectives on language evolution focused on innate, “hard-wired” abilities – e.g., Chomsky’s “universal grammar.” In contrast, current perspectives are oriented more toward continuity, asking how human-unique functions were derived or exapted from pre-existing functions relying on pre-existing structures. Increasingly, research indicates that the evolution of vision-for-action circuitry is at the root of a suite of interrelated human specializations that all rely on capacities for complex social learning and cumulative culture, including language. This paper reviews comparative neuroscience evidence on the evolutionary timecourse of these adaptations, and considers theoretical explanations for how new functions can emerge from pre-existing circuits. We cannot directly observe our own history, but we can extrapolate it via comparisons with our extant primate relatives. Humans’ closest living relatives are chimpanzees; our ancestors and theirs diverged about 6–7 million years ago (Goodman et al., 1998). Humans are more distantly related to Old World monkeys, such as macaques, with our last common ancestor existing about 25–32 million years ago (Goodman et al., 1998). Comparative neuroscience draws conclusions about human evolution in the following way.
If a trait exists in multiple extant primate species, it is assumed to have existed in their last common ancestor. Conversely, if a trait exists in one group of related species but not a more distantly related outgroup, it is assumed to have emerged after their divergence. This approach allows for the extrapolation of the evolutionary history of brain adaptations.
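The inference rule just described can be written out as a small procedure. The sketch below is only an illustration under simplifying assumptions: it uses a three-species tree (macaque, chimpanzee, human), borrows the chapter's LCA-m and LCA-c labels for the two ancestral nodes, and treats each trait as simply present or absent. The function name `infer_origin` and the example trait patterns are hypothetical, not taken from any published analysis.

```python
# Nested clades, ordered from most to least inclusive. The labels follow the
# chapter's LCA-m / LCA-c naming; the three-species tree is an assumption.
CLADES = [
    ("LCA-m", frozenset({"macaque", "chimpanzee", "human"})),
    ("LCA-c", frozenset({"chimpanzee", "human"})),
]


def infer_origin(species_with_trait):
    """Return the earliest ancestor (or lineage) to which a trait is attributed."""
    present = frozenset(species_with_trait)
    for ancestor, members in CLADES:
        # Trait shared by every sampled member of the clade: assume it was
        # already present in that clade's last common ancestor.
        if members <= present:
            return ancestor
    if present == frozenset({"human"}):
        # Present in humans only, absent in both outgroups: assume it
        # emerged after the chimpanzee-human divergence.
        return "human lineage (after LCA-c)"
    # Any other pattern (e.g. macaque and human but not chimpanzee) would
    # require a secondary loss or convergent evolution.
    return "ambiguous"


if __name__ == "__main__":
    # Hypothetical presence/absence patterns, for illustration only.
    examples = {
        "trait seen in all three species": {"macaque", "chimpanzee", "human"},
        "trait seen in apes but not macaques": {"chimpanzee", "human"},
        "trait seen in humans only": {"human"},
    }
    for trait, species in examples.items():
        print(f"{trait}: {infer_origin(species)}")
```

The rule deliberately returns "ambiguous" for patterns that simple outgroup comparison cannot resolve, which mirrors the fact that the approach rests on an assumption of parsimony rather than direct observation of ancestral brains.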

LCA-m: Early primate adaptations for the visual control of action

Primates share a distinctive elaboration of cortical machinery for visuo-manual integration that is perhaps their quintessential brain adaptation. Early primates were diurnal, arboreal animals who made a living by foraging for insects and fruit in the fine terminal branches of trees (Sussman et al., 2013). Success in this niche was supported by the emergence of ventral premotor cortex, which allowed for the integration of visual input with new, higher-order control of sequences of actions, and area MT (or V5), a specialized retinotopic motion-processing region. PMv




and MT are present in all primates (Kaas, 2012); thus, the basic action-processing adaptations that later became exapted for social and cultural learning were in place at or near the phylogenetic root of our clade. Given that Old World monkeys, New World monkeys, and great apes (including humans) all show evidence of a mirror system (although note that direct electrophysiological observation is limited to macaques and humans, whereas chimpanzee evidence comes from neuroimaging studies (Hecht et al., 2013)), it is likely that mirror neurons were also present in our earliest ancestors, and may develop spontaneously across phyla via general Hebbian learning mechanisms in cells with access to both motor and sensory information. Additionally, both New World monkeys (capuchins) and Old World monkeys (macaques), like humans, can recognize when they are being imitated and show preferences for individuals who imitate them (Chartrand & Bargh, 1999; Paukner et al., 2009; Sclafani, et al., 2015), indicating that some degree of awareness about the correspondence between one’s own and others’ actions and a subsequent link to affective or motivational processing may have also been present very early in primate evolution. From the emergence of MT and PMv in early primates, further neural adaptations evolved, as evidenced by the presence of these features in the brains of extant anthropoids. Visual processing of motion expanded from MT into the dorsal visual stream, a network of linked regions extending from extrastriate occipital cortex into posterior parietal cortex (Goodale and Milner, 1992). The dorsal stream processes “how” observed events unfold and is involved in the on-line control of action. Its functions are dissociable from, but interconnected with, those of the ventral stream, which extends from extrastriate cortex into the lateral and inferior temporal lobe. 
In contrast to the dorsal stream, the ventral stream processes “what” is observed in the periphery, including the recognition of objects, individuals, and body parts. Both streams are present in modern macaques, chimpanzees, and humans. The ventral visual stream has clearly undergone important evolutionary change, such as the emergence of semantic cortex and specialized modules for face processing. However, we argue that multiple, successive adaptations to the dorsal stream were especially important for the evolution of behavioral products of complex social learning and cumulative culture, including language and tool use (E. Hecht, 2016).

LCA-c: Hominid dorsal stream adaptations for social transmission of learned skills Several adaptations for the social transmission of learned skills appear to have occurred after hominids (humans and other great apes) diverged from monkeys. While primates in general are skilled social learners, there are species differences


in what kinds of behaviors are socially transmitted. Chimpanzees and orangutans, like humans, spontaneously and flexibly use tools in the wild, and tool use skills are transmitted socially (Gruber et al., 2012; Inoue-Nakamura & Matsuzawa, 1997). Gorillas also show skilled, hierarchically-complex, socially transmitted object manipulation abilities (e.g., leaf folding, Byrne et al., 2011). Bonobos have not yet been observed to typically use tools in the wild, but are capable of doing so in a laboratory context without training (Roffman et al., 2015). In contrast, monkeys have not been found to show clade-wide endemic capacities for tool use, although important exceptions do exist, as discussed later. The fact that some monkeys do use tools suggests that the neural precursors for tool use could be endemic in anthropoids, but may only evolve into a fully functional species-typical behavior given specific selection pressure. However, the abundance of tool use and gestural communication in great apes, compared with the clearly reduced complexity of these behaviors in monkeys, suggests that the neural mechanisms involved in tool use and gestural communication may have mainly become elaborated after hominids diverged from Old World monkeys, before modern hominid species diverged from each other. There are also species differences in which aspects of observed behaviors have been shown to be socially transmitted. A broad, simplified distinction can be made between emulation, or behaving in a way that results in reproducing the outcome of an observed action (even though the specific behavioral sequence might be different), versus imitation, or additionally copying the specific methods used to achieve the result (Whiten et al., 2009). Monkeys, to date, are not known to imitate, or may do so only in specialized, limited contexts (Visalberghi & Fragaszy, 2002). However, chimpanzees can imitate in certain circumstances, namely when the causal relationship between an action’s movements and its result is not perceptible (Horner & Whiten, 2005). Chimpanzees also show limited but measurable success at reproducing arbitrary movements (Hayes & Hayes, 1952) and are capable of miming goal-directed actions in the absence of objects or actual goals (Marshall-Pescini & Whiten, 2008). This suggests that the capacity for imitation may have been present in the brains of early hominids. This wide variation in the capacity for imitation has stimulated a quest not only to better characterize the behavioral variation but also to identify its neuroanatomical correlates. Following the divergence of hominids (apes and humans) from monkeys, there appears to have been a shift in the general distribution of white matter connections within long-range circuitry for performing and observing action. In macaques, ventral-stream temporal regions involved in the perceptual processing of objects and biological motion project mainly to inferior frontal cortex, following a ventral route through the inferior longitudinal fasciculus and extreme/external capsules; a relatively small proportion of the network connectivity travels




dorsally through inferior parietal cortex (Hecht et al., 2013; Petrides & Pandya, 2002, 2009). In chimpanzees, though, this dorsal route through the middle and superior longitudinal fasciculi into frontal cortex became more pronounced, and in humans, these dorsal connections are even more robust (Hecht et al., 2013). These comparisons used diffusion tensor imaging data, which does not image white matter at the cellular level, and it is not yet fully understood what cellular variables may affect this type of quantification. Still, though, it seems that in monkeys, most of the information that inferior frontal cortex receives about observed events comes from the ventral visual stream, whereas in apes, inferior frontal cortex receives a relatively greater input from the dorsal visual stream. This progression of structural differences parallels a progression of functional differences: ventrolateral prefrontal responses to observed objects are greater in macaques than in humans (Denys et al., 2004). Similarly, ventrolateral prefrontal responses to observed object-directed grasping are greater in chimpanzees than humans (Denys et al., 2004). Given that prefrontal cortex is generally engaged with higher-order representations of actions and visual scenes, whereas earlier visual regions contribute feature-level processing, this may reflect a general trend toward increased processing of bottom-up perceptual details of observed actions, as opposed to primarily top-down cognitive representations (Hecht et al., 2013). Additionally, new functional regions emerged in inferior parietal cortex after humans’ and chimpanzees’ last common ancestor with macaques. For example, 3D form-from-motion stimuli activate the intraparietal sulcus in humans but not macaques (Vanduffel et al., 2002). Similarly, observed tool use activates the anterior supramarginal gyrus in humans but not in macaques (Peeters et al., 2009). 
We do not know how chimpanzee inferior parietal cortex might respond to these types of stimuli because the relevant experiments have not been performed. These evolved functional adaptations in parietal cortex are likely supported by underlying structural differences. Macaques show little or no connectivity between the anterior supramarginal gyrus and inferotemporal object processing cortex (Rozzi et al., 2006; Zhong & Rockland, 2003), whereas diffusion tensor imaging studies in humans and chimpanzees indicate that these connections are readily measurable (Hecht et  al., 2013). We have postulated that these new connections may allow for integration between feature-based object processing in inferotemporal cortex and kinematic-spatiotemporal processing in parietal cortex (Hecht et al., 2013), a function that may be important for both individual and social learning of manual action, potentially including gesture and/or tool use. This shift in the distribution of structural connectivity may also confer different response properties to the mirror system. In macaques, frontal mirror neurons seem primarily responsive to transitive actions for which the object toward which the actions are directed is visible or has very recently been visible (Umiltà


et al., 2001) and have been reported to respond not at all (Rizzolatti et al., 1996) or very little (Kraskov et al., 2009) to observed movements which lack physical goals on objects (intransitive actions). In contrast, when chimpanzees observe others’ actions, these are mapped onto nearly identical voxels as the chimp would use to produce those same movements itself, regardless of whether they produce a physical result on an object (Hecht et al., 2013). Humans also show highly specific mapping of intransitive action onto one’s own motor system (Kraskov et al., 2009). This suggests that the neural capacity to simulate not only the goals of others’ actions, but also the individual component movements, evolved before humans and chimpanzees diverged – potentially coincident with the capacity for imitation and the perceptual comprehension of non-object-directed manual actions, although it seems clear that humans far out-perform other apes in this domain, as discussed in the next section.

Human-specific adaptations: Integrating cognitive control and action sequencing with high-fidelity representations of action details

It appears that the evolutionary trend toward increased bottom-up processing of actions’ perceptual details continued not only past the monkey-ape divergence but also past the chimpanzee-human divergence. Whereas chimpanzees are capable of imitation but behaviorally biased toward emulation, humans show a strong inclination toward imitation, even extending to over-imitation, or reproduction of action details that are not causally related to achieving the end goal (Whiten et al., 2009). Performing actions in a recognizably similar way to particular individuals or groups clearly plays an important socio-communicative role in human interaction; humans spontaneously and subconsciously imitate behaviors like body posture and speech patterns in a way that reflects social status (Chartrand & Bargh, 1999). In addition to greater attention to the details of others’ actions, humans also show greater attention to the movement details of their own actions. For example, chimpanzees find it difficult to differentiate their own cursor from one controlled by the computer, if both are achieving the same end goal (Kaneko & Tomonaga, 2012). Several neural adaptations may underlie this continued shift. For example, during the simple, passive observation of object-directed reach-to-grasp actions, most of the regional cerebral glucose metabolism in the chimpanzee brain occurs in prefrontal cortex, whereas human brains show a more distributed pattern of energy expenditure across occipital, temporal, parietal, premotor, and prefrontal cortex; chimpanzees show significantly greater activity in ventrolateral prefrontal cortex, while humans show significantly greater activity in inferior parietal, inferotemporal, and ventral premotor cortex (Hecht et al., 2013). In this respect


60 Erin Hecht

chimpanzees are similar to macaques, which show greater glucose metabolism in F5 than PF/PFG during observed grasping (Raos et al., 2004, 2007), and increased prefrontal and reduced parietal activation compared to humans during the perception of actions and objects (Denys et al., 2004). Thus, the macaque and chimpanzee patterns of activation likely represent the ancestral primate condition. In contrast, humans’ increased parietal and occipitotemporal activations during action observation are echoed by meta-analyses of over 100 fMRI and PET studies (Caspers et al., 2010; Molenberghs et al., 2009). This appears to represent greater functional investment in bottom-up perceptual representations incorporating greater kinematic and spatiotemporal details about the internal components of observed actions (Hecht et al., 2013). Accurate representation of these kinematic and spatiotemporal details is likely essential for flexible integration between individual learning and social acquisition of complex action sequences.

White matter circuitry has also undergone further adaptation after the chimpanzee-human divergence. The third branch of the superior longitudinal fasciculus (SLFIII), which links anterior inferior parietal cortex with ventral premotor cortex in monkeys, extends into more anterior regions of the inferior frontal gyrus in humans, particularly in the right hemisphere (Hecht et al., 2015). Notably, in macaques, SLFIII’s projections from area PF terminate in ventral premotor cortex and do not reach prefrontal cortex (Petrides & Pandya, 2002, 2009). While chimpanzee SLFIII does show an observable extension into ventrolateral prefrontal cortex, connections with premotor cortex are far stronger, and SLFIII is not right-lateralized at the population level (Hecht et al., 2015).
Ventrolateral prefrontal cortex, where SLFIII makes its anterior termination, is activated during tasks that require cognitive control, task switching, recursion, and sequencing, functions that are likely essential for the evolution of complex, hierarchically-structured instrumental behavior, including language. Notably, language is typically left-lateralized in the brain. We found anterior extension of human SLFIII in both hemispheres, but it was most marked in the right hemisphere; this asymmetry and its potential relationship to the lateralization of language is an issue that needs additional research. Interestingly, in chimpanzees, prefrontal extension of right SLFIII is also associated with visual self-recognition. Not all chimpanzees can recognize their own reflection in a mirror, and there is a visible extension in the anterior aspect of right SLFIII projections from chimpanzees who do not recognize themselves in a mirror, to those who show ambiguous behavioral evidence, to those that clearly do (Hecht et al., 2015). Moreover, this same feature – right SLFIII’s projection into anterior inferior frontal gyrus – shows structural change during the acquisition of Paleolithic stone tool use skills in modern humans trained to make these tools (Hecht et al., 2015), and the gray matter that is reached by this projection is activated by Acheulean, but not Oldowan, toolmaking (Hecht et al., 2015; Stout et al.,

Plasticity, innateness, and the path to language in the primate brain

2011; Stout et al., 2008). Together, these results strongly implicate the extension of SLFIII white matter into right anterior inferior frontal gyrus in the emergence of human-like visuomotor perceptual integration and action.

Thus, to summarize, comparative evidence on primate brain evolution points toward repeated waves of adaptation to the fronto-temporal-parietal action-perception circuitry. The ancestral primate state included early adaptations for visuomotor integration; apes evolved additional adaptations to the dorsal visual stream, likely related to the elaboration of behavioral capacities for imitation and manual gesture; and finally, this trend continued after humans diverged from other apes, with our ancestors evolving further perceptual sensitivities and white matter connections related to integration of bottom-up perceptual action details with higher-order, hierarchically-organized top-down cognitive processes including sequencing and recursion (Figure 1).

Figure 1 panel text:

Modern macaques: Action observation circuitry heavily skewed toward ventral visual stream (“what” processing); Broca’s area homolog has no direct connections with parietal area PF (involved in somatomotor processing for hand and mouth). Functional brain responses to observed action are more prefrontally-focused than in humans; mirror system largely driven by object goals, with little or no response to observed intransitive action.

Modern chimpanzees: Action observation circuitry shows elaboration of the dorsal visual stream (“how” processing); prefrontal extension of SLFIII brings input from PF into Broca’s area homolog. Functional brain responses to observed action occur mainly in prefrontal cortex; mirror responses to intransitive action are very similar to mirror responses to transitive action.

Modern humans: Action observation circuitry shows further elaboration of dorsal visual stream (“how” processing); further prefrontal extension of SLFIII into Broca’s area, especially in the right hemisphere. Functional brain responses to observed action include substantial activation in occipitotemporal, parietal, and premotor cortex, in addition to prefrontal cortex; mirror responses to intransitive action are very similar to mirror responses to transitive action.

Figure 1.  Schematic diagram of differences between extant primate species in the structure and function of brain circuitry for observing and producing action, and hypothesized selective forces in our shared evolutionary history




The chicken or the egg: Continuity, divergence, and the environmental context for change in brain-behavior evolution

The evolution of these circuits likely represents a cyclic interchange between selection pressures and neural changes, where existing neural features became exapted for new functions, which then supported the further exaptation of this circuitry for additional new functions. The chicken must predate the egg in brain-behavior evolution – newly adaptive behavioral and cognitive abilities can’t emerge without the pre-existence of a neural architecture that can support them. But given that we are considering the evolution of new abilities, this neural architecture must have been previously supporting some other perhaps related function. What evolutionary mechanism mediated the exaptation of pre-existing neural adaptations for new functions? We argue that the emergence of new, complex, socially-learned behaviors on an evolutionary timescale is closely tied to adaptations for increased learning and neural plasticity on the timescale of an individual lifespan, an old idea (e.g., Bogin, 1997) which has recently gained a body of new experimental neuroscientific support, discussed below.

Flexibility and environmental sensitivity

A framework for the evolution of increased neural mechanisms for learning and plasticity is offered by Buckner and Krienen’s “tethering hypothesis” (Buckner & Krienen, 2013). According to this model, in early mammals, whose cortex mainly consisted of primary sensory and motor regions, chemical signaling gradients constrained cortical networks to a rigid, canonical organization. In contrast, in humans’ evolutionary history, massive expansion of the cortical mantle “untethered” large regions from the constraints of signaling gradients, resulting in the emergence of distributed association networks with more flexible and plastic patterns of long-range connectivity. We argue that these distributed, plastic association networks underlie a set of intertwined capacities that together have enabled human technological culture to evolve so rapidly: our ability to socially transmit, and incrementally improve upon, learned behavioral skills; our use of language and other forms of symbolic representation; and our proficiency for tool use and tool-making. These capacities all involve similar (but non-identical), overlapping networks in lateral frontal, temporal, and parietal cortex (reviewed in Stout & Chaminade, 2012), and we and many other researchers have considered it likely that some or all of these functions coevolved (e.g., Arbib, 2012; Fitch et al., 2010; Greenfield, 1991; Hopkins et al., 2007; Pulvermuller & Fadiga, 2010; van Schaik et al., 1999). In particular, studies by our group and other collaborators have found multiple lines of evidence suggesting human adaptation in these networks.


Some additional compelling recent data is consistent with this idea. Gomez-Robles et al. (2015) compared the heritability of cortical morphology in chimpanzees and humans that had known kinship relationships. Morphology was less heritable in humans, and notably, this effect was most pronounced in association areas. Buckner et al. (2013) have produced a map of individual variability in human functional connectivity, which reflects patterns of co-activation between various brain regions; again, this is greatest in association regions. It seems likely that individual differences in actual anatomical connectivity could underlie this functional and morphological variation, and indeed Gomez-Robles et al. (2013, 2015) postulate that their results may be related to underlying changes in neural circuitry. Additional support for the tethering hypothesis can be found in the high degree of individual variation in human brain organization. This contrasts with the brains of most other vertebrate species, which are quite similar across individuals, especially in primary cortical regions and in subcortical regions involved in the production of species-specific behaviors (e.g., Finlay et al., 2011). In humans, considerable individual neuroanatomical variability occurs in our species’ greatly expanded association cortex. For example, humans show high individual variation in the location, extent, and internal organization of classical language regions (Anwander et al., 2007; Galaburda et al., 1991), and in the gray matter density, topography, and functional organization of posterior parietal association regions (Frey et al., 2005; Kanai et al., 2011; Ryan et al., 2006). The extent of individual variability in association regions appears to be greater in humans than in chimpanzees, as indicated by a recent comparative study on cortical morphology (Gomez-Robles et al., 2014).

Specificity and innateness

There is a key evolutionary implication of this relaxed genetic constraint: given that human brain organization has become less pre-ordained by developmental programs, it may therefore be more responsive to the input of individuals’ experiences with the physical, social, and cultural environment, providing a physiological mechanism for plasticity underlying the acquisition of learned skills. Selection for increased plasticity may have occurred because it maximizes the impact of learning on shaping these circuits. Consistent with this idea, human neocortex is characterized by a prolonged myelination period (Miller et al., 2012). During human development, association regions expand nearly twice as much as other regions (Hill et al., 2010) and also myelinate impressively late – into the second and third decade of life (Buckner & Krienen, 2013; Flechsig, 1920; Yakovlev & Lecours, 1966). Interestingly, comparisons with macaques suggest that this pattern of developmental expansion is mirrored by the pattern of evolutionary expansion,



perhaps because it is adaptive for recently-evolved regions to mature more slowly, to increase the influence of early experience on those regions (Hill et al., 2010). Together, these results point toward a role for increased plasticity in human brain evolution, allowing for increased flexibility and sensitivity to environmental input in the acquisition of learned behaviors like language.

Opposed to this idea of reduced innateness in human brain organization is the idea that, given constant environmental selection pressure, behaviors that are tightly tied to survival will tend over time to become earlier-developing and more automatic, with increasingly reliable and invariable emergence in every individual. Mutations that lead the learned behavior to be easier or faster to acquire will tend to be favored. This phenomenon is termed the Baldwin Effect (Baldwin, 1896; Osborn, 1896; Weber & Depew, 2003; Bateson, 2004). It describes a mechanism by which pre-existing brain anatomy can become coopted for learned skills – i.e., by which learned behaviors can become (at least somewhat) innate. Importantly, the Baldwin Effect can only occur if the environment favoring the learned behaviors is relatively constant; socially-transmitted culture can provide some aspects of environmental stability while also providing a mechanism for continued change. Thus, the Baldwin Effect describes a process by which biological evolution can co-occur with, and be driven by, cultural evolution; we and others have proposed that the Baldwin Effect played a role in the evolution of neural circuits for learned, socially-transmitted skills, including language and complex tool use (Hecht et al., 2015). Clearly, though, language is not entirely “innate;” i.e., it does not emerge without the requisite environmental input and experience.
Thus, a mechanistic explanation for the evolution of language must go beyond identifying the circuits that have changed and address the inherent trade-off between the evolutionary pressure for underlying neural systems to be flexible and sensitive to environmental input vs. the tendency for adaptive behaviors to become more innate over time. On an evolutionary timescale, how are these opposing forces balanced, and what are the selective contexts that tip the balance toward one or the other? And on a mechanistic level, how are these changes mediated? We propose that these are important questions for future research on language evolution.

Toward a new road map

In conclusion, the comparative research reviewed here points toward some key transitions relevant to the evolution of what eventually became language circuitry. In general, the ideas outlined here agree with the MSH in the hypothesis that successive waves of adaptation to frontoparietal vision-for-action and action-perception circuitry were crucial for the evolution of language. The current evidence points


toward (1) the elaboration of the dorsal visual stream, including the emergence of new areas, new functional sensitivities, and increasing elaboration of white matter circuitry; (2) the elaboration and emergence of cognitive and behavioral capacities thought to be supported by the dorsal visual stream, and by integration of dorsal- and ventral-stream visual processing with hierarchical representations; and (3) an increase in plasticity in human association circuits, facilitating the learned acquisition of socially transmitted skills, including tool use, gesture, and language. Important targets for future research include mechanisms mediating the trade-off between evolutionary trends toward increasing innateness and increasing plasticity, and the physiological and anatomical mechanisms which linked this evolving vision-for-manual-action circuitry with vocal and auditory circuitry in spoken language.

Funding

The research described here was supported in part by NSF 1631563, Wenner-Gren Foundation 8054, and NIH NRSA F31MH086179-01. The paper was prepared for a workshop funded by NSF BCS-1343544 (M.A. Arbib, Principal Investigator).

References

Anwander, A., Tittgemeyer, M., von Cramon, D. Y., Friederici, A. D., & Knosche, T. R. (2007). Connectivity-Based Parcellation of Broca’s Area. Cereb Cortex, 17(4), 816–825. https://
Arbib, M. (2012). How the Brain Got Language. Oxford University Press. https://
Baldwin, J. Mark. (1896). A New Factor in Evolution. The American Naturalist, 30(354), 441–451. https://
Bateson, P. (2004). The active role of behaviour in evolution. Biol Philos, 19(2), 283–298. https://
Bogin, B. (1997). Evolutionary Hypotheses for Human Childhood. Yearbook of Physical Anthropology, 40, 63–89. https://1096-8644(1997)25+3.0.CO;2-8
Buckner, R. L., & Krienen, F. M. (2013). The evolution of distributed association networks in the human brain. Trends Cogn Sci, 17(12), 648–665. https://
Byrne, R. W., Hobaiter, C., & Klailova, M. (2011). Local traditions in gorilla manual skill: evidence for observational learning of behavioral organization. Anim Cogn, 14(5), 683–693. https://
Caspers, S., Zilles, K., Laird, A. R., & Eickhoff, S. B. (2010). ALE meta-analysis of action observation and imitation in the human brain. Neuroimage, 50(3), 1148–1167. https://
Chartrand, T. L., & Bargh, J. A. (1999). The chameleon effect: the perception-behavior link and social interaction. J Pers Soc Psychol, 76(6), 893–910. https://


Denys, K., Vanduffel, W., Fize, D., Nelissen, K., Sawamura, H., Georgieva, S., … Orban, G. A. (2004). Visual activation in prefrontal cortex is stronger in monkeys than in humans. J Cogn Neurosci, 16(9), 1505–1516. https://
Finlay, B. L., Hinz, F., & Darlington, R. B. (2011). Mapping behavioural evolution onto brain evolution: the strategic roles of conserved organization in individuals and species. Philos Trans R Soc Lond B Biol Sci, 366(1574), 2111–2123. https://
Fitch, W. T., Huber, L., & Bugnyar, T. (2010). Social cognition and the evolution of language: constructing cognitive phylogenies. Neuron, 65(6), 795–814. https://
Flechsig, P. E. (1920). Anatomie des menschlichen Gehirns und Ruckenmarks auf myelogenetischer Grundlage. G. Thieme (in German).
Frey, S. H., Vinton, D., Norlund, R., & Grafton, S. T. (2005). Cortical topography of human anterior intraparietal cortex active during visually guided grasping. Brain Res Cogn Brain Res, 23(2–3), 397–405. https://
Galaburda, A. M., Rosen, G. D., & Sherman, G. F. (1990). Individual variability in cortical organization: its relationship to brain laterality and implications to function. Neuropsychologia, 28(6), 529–546. https://90032-J
Gomez-Robles, A., Hopkins, W. D., Schapiro, S. J., & Sherwood, C. C. (2015). Relaxed genetic control of cortical organization in human brains compared with chimpanzees. Proc Natl Acad Sci U S A, 112(48), 14799–14804. https://
Gomez-Robles, A., Hopkins, W. D., & Sherwood, C. C. (2013). Increased morphological asymmetry, evolvability and plasticity in human brain evolution. Proc Biol Sci, 280(1761), 20130575. https://
Gomez-Robles, A., Hopkins, W. D., & Sherwood, C. C. (2014). Modular structure facilitates mosaic evolution of the brain in chimpanzees and humans. Nat Commun, 5, 4469. https://
Goodman, M., Porter, C. A., Czelusniak, J., Page, S. L., Schneider, H., Shoshani, J., … Groves, C. P. (1998). Toward a phylogenetic classification of Primates based on DNA evidence complemented by fossil evidence. Mol Phylogenet Evol, 9(3), 585–598. https://
Greenfield, P. M. (1991). Language, tools, and brain: the development and evolution of hierarchically organized sequential behavior. Behav. Brain Sci., 14, 531–595. https://
Gruber, T., Singleton, I., & van Schaik, C. (2012). Sumatran orangutans differ in their cultural knowledge but not in their cognitive abilities. Curr Biol, 22(23), 2231–2235. https://
Hayes, K. J., & Hayes, C. (1952). Imitation in a home-raised chimpanzee. J Comp Physiol Psychol, 45(5), 450–459. https://
Hecht, E. (2016). Adaptations to vision-for-action in primate brain evolution: Comment on “Towards a Computational Comparative Neuroprimatology: Framing the language-ready brain” by Michael A. Arbib. Phys Life Rev, 16, 74–76. https://


Hecht, E. E., Gutman, D. A., Bradley, B. A., Preuss, T. M., & Stout, D. (2015). Virtual dissection and comparative connectivity of the superior longitudinal fasciculus in chimpanzees and humans. Neuroimage, 108, 124–137. https://
Hecht, E. E., Gutman, D. A., Khreisheh, N., Taylor, S. V., Kilner, J., Faisal, A. A., … Stout, D. (2015). Acquisition of Paleolithic toolmaking abilities involves structural remodeling to inferior frontoparietal regions. Brain Struct Funct, 220(4), 2315–2331. https://
Hecht, E. E., Gutman, D. A., Preuss, T. M., Sanchez, M. M., Parr, L. A., & Rilling, J. K. (2013). Process versus product in social learning: comparative diffusion tensor imaging of neural systems for action execution-observation matching in macaques, chimpanzees, and humans. Cereb Cortex, 23(5), 1014–1024. https://
Hecht, E. E., Murphy, L. E., Gutman, D. A., Votaw, J. R., Schuster, D. M., Preuss, T. M., … Parr, L. A. (2013). Differences in neural activation for object-directed grasping in chimpanzees and humans. J Neurosci, 33(35), 14117–14134. https://
Hill, J., Inder, T., Neil, J., Dierker, D., Harwell, J., & Van Essen, D. (2010). Similar patterns of cortical expansion during human development and evolution. Proc Natl Acad Sci U S A, 107(29), 13135–13140. https://
Hopkins, W. D., Russell, J. L., & Cantalupo, C. (2007). Neuroanatomical correlates of handedness for tool use in chimpanzees (Pan troglodytes): implication for theories on the evolution of language. Psychol Sci, 18(11), 971–977. https://
Horner, V., & Whiten, A. (2005). Causal knowledge and imitation/emulation switching in chimpanzees (Pan troglodytes) and children (Homo sapiens). Anim Cogn, 8(3), 164–181. https://
Inoue-Nakamura, N., & Matsuzawa, T. (1997). Development of stone tool use by wild chimpanzees (Pan troglodytes). J Comp Psychol, 111(2), 159–173. https://
Kaas, J. H. (2012). The evolution of neocortex in primates. Prog Brain Res, 195, 91–102. https://
Kanai, R., Dong, M. Y., Bahrami, B., & Rees, G. (2011). Distractibility in daily life is reflected in the structure and function of human parietal cortex. J Neurosci, 31(18), 6620–6626. https://
Kaneko, T., & Tomonaga, M. (2012). Relative contributions of goal representation and kinematic information to self-monitoring by chimpanzees and humans. Cognition, 125(2), 168–178. https://
Human-specific transcriptional networks in the brain. Neuron, 75(4), 601–617. https://
Kraskov, A., Dancause, N., Quallo, M. M., Shepherd, S., & Lemon, R. N. (2009). Corticospinal neurons in macaque ventral premotor cortex with mirror properties: a potential mechanism for action suppression? Neuron, 64(6), 922–930. https://
Marshall-Pescini, S., & Whiten, A. (2008). Chimpanzees (Pan troglodytes) and the question of cumulative culture: an experimental approach. Anim Cogn, 11(3), 449–456. https://


Miller, D. J., Duka, T., Stimpson, C. D., Schapiro, S. J., Baze, W. B., McArthur, M. J., … Sherwood, C. C. (2012). Prolonged myelination in human neocortical evolution. Proc Natl Acad Sci U S A, 109(41), 16480–16485. https://
Molenberghs, P., Cunnington, R., & Mattingley, J. B. (2009). Is the mirror neuron system involved in imitation? A short review and meta-analysis. Neurosci Biobehav Rev, 33(7), 975–980. https://
Ojemann, G. A. (1991). Cortical organization of language. J Neurosci, 11(8), 2281–2287. https://
Osborn, H. F. (1896). A mode of evolution requiring neither natural selection nor the inheritance of acquired characters. Transactions of the New York Academy of Sciences, 15, 141–148.
Paukner, A., Suomi, S. J., Visalberghi, E., & Ferrari, P. F. (2009). Capuchin monkeys display affiliation toward humans who imitate them. Science, 325(5942), 880–883. https://
Peeters, R., Simone, L., Nelissen, K., Fabbri-Destro, M., Vanduffel, W., Rizzolatti, G., & Orban, G. A. (2009). The representation of tool use in humans and monkeys: common and uniquely human features. J Neurosci, 29(37), 11523–11539. https://
Petrides, M. (2005). Lateral prefrontal cortex: architectonic and functional organization. Philos Trans R Soc Lond B Biol Sci, 360(1456), 781–795. https://
Petrides, M., & Pandya, D. N. (2002). Comparative cytoarchitectonic analysis of the human and the macaque ventrolateral prefrontal cortex and corticocortical connection patterns in the monkey. Eur J Neurosci, 16(2), 291–310. https://
Petrides, M., & Pandya, D. N. (2009). Distinct parietal and temporal pathways to the homologues of Broca’s area in the monkey. PLoS Biol, 7(8), e1000170. https://
Preuss, T. M., Caceres, M., Oldham, M. C., & Geschwind, D. H. (2004). Human brain evolution: insights from microarrays. Nat Rev Genet, 5(11), 850–860. https://
Pulvermuller, F., & Fadiga, L. (2010). Active perception: sensorimotor circuits as a cortical basis for language. Nat Rev Neurosci, 11(5), 351–360. https://
Raos, V., Evangeliou, M. N., & Savaki, H. E. (2004). Observation of action: grasping with the mind’s hand. Neuroimage, 23(1), 193–201. https://
Raos, V., Evangeliou, M. N., & Savaki, H. E. (2007). Mental simulation of action in the service of action perception. J Neurosci, 27(46), 12675–12683. https://
Rizzolatti, G., Fadiga, L., Gallese, V., & Fogassi, L. (1996). Premotor cortex and the recognition of motor actions. Brain Res Cogn Brain Res, 3(2), 131–141. https://00038-0
Roffman, I., Savage-Rumbaugh, S., Rubert-Pugh, E., Stadler, A., Ronen, A., & Nevo, E. (2015). Preparation and use of varied natural tools for extractive foraging by bonobos (Pan Paniscus). Am J Phys Anthropol, 158(1), 78–91. https://
Rozzi, S., Calzavara, R., Belmalih, A., Borra, E., Gregoriou, G. G., Matelli, M., & Luppino, G. (2006). Cortical connections of the inferior parietal cortical convexity of the macaque monkey. Cereb Cortex, 16(10), 1389–1417. https://


Ryan, S., Bonilha, L., & Jackson, S. R. (2006). Individual variation in the location of the parietal eye fields: a TMS study. Exp Brain Res, 173(3), 389–394. https://
Schenker, N. M., Buxhoeveden, D. P., Blackmon, W. L., Amunts, K., Zilles, K., & Semendeferi, K. (2008). A comparative quantitative analysis of cytoarchitecture and minicolumnar organization in Broca’s area in humans and great apes. J Comp Neurol, 510(1), 117–128. https://
Sclafani, V., Paukner, A., Suomi, S. J., & Ferrari, P. F. (2015). Imitation promotes affiliation in infant macaques at risk for impaired social behaviors. Dev Sci, 18(4), 614–621. https://
Stout, D., & Chaminade, T. (2012). Stone tools, language and the brain in human evolution. Philos Trans R Soc Lond B Biol Sci, 367(1585), 75–87. https://
Stout, D., Passingham, R., Frith, C., Apel, J., & Chaminade, T. (2011). Technology, expertise and social cognition in human evolution. Eur J Neurosci, 33(7), 1328–1338. https://
Stout, D., Toth, N., Schick, K., & Chaminade, T. (2008). Neural correlates of Early Stone Age toolmaking: technology, language and cognition in human evolution. Philos Trans R Soc Lond B Biol Sci, 363(1499), 1939–1949. https://
Sussman, R. W., Tab Rasmussen, D., & Raven, P. H. (2013). Rethinking primate origins again. Am J Primatol, 75(2), 95–106. https://
Umiltà, M. A., Kohler, E., Gallese, V., Fogassi, L., Fadiga, L., Keysers, C., & Rizzolatti, G. (2001). I know what you are doing. A neurophysiological study. Neuron, 31(1), 155–165. https://00337-3
van Schaik, C. P., Deaner, R. O., & Merrill, M. Y. (1999). The conditions for tool use in primates: implications for the evolution of material culture. J Hum Evol, 36(6), 719–741. https://
Vanduffel, W., Fize, D., Peuskens, H., Denys, K., Sunaert, S., Todd, J. T., & Orban, G. A. (2002). Extracting 3D from motion: differences in human and monkey intraparietal cortex. Science, 298(5592), 413–415. https://
Visalberghi, E., & Fragaszy, D. M. (2002). “Do Monkeys Ape?” Ten Years After. In C. N. K. Dautenhahn (Ed.), Imitation in animals and artefacts (pp. 471–499). Cambridge, MA: MIT Press.
Weber, Bruce H., & Depew, David J. (Eds.). (2003). Evolution and learning: the Baldwin effect reconsidered. Cambridge, Mass.: MIT Press.
Whiten, A., McGuigan, N., Marshall-Pescini, S., & Hopper, L. M. (2009). Emulation, imitation, overimitation and the scope of culture for child and chimpanzee. Philos Trans R Soc Lond B Biol Sci, 364(1528), 2417–2428. https://
Yakovlev, P. I., & Lecours, A. R. (1966). The myelinogenic cycles of regional maturation of the brain. In A. Minkovski (Ed.), Regional Development of the Brain in Early Life (pp. 3–70). Oxford, UK: Blackwell.
Zhong, Y. M., & Rockland, K. S. (2003). Inferior parietal lobule projections to anterior inferotemporal cortex (area TE) in macaque monkey. Cereb Cortex, 13(5), 527–540. https://

Voice, gesture and working memory in the emergence of speech Francisco Aboitiz

Pontificia Universidad Católica de Chile

Language and speech depend on a relatively well defined neural circuitry, located predominantly in the left hemisphere. In this article, I discuss the origin of the speech circuit in early humans, as an expansion of an auditory-vocal articulatory network that took place after the last common ancestor with the chimpanzee. I will attempt to converge this perspective with aspects of the Mirror System Hypothesis, particularly those related to the emergence of a meaningful grammar in human communication. Basically, the strengthening of auditory-vocal connectivity via the arcuate fasciculus and related tracts generated an expansion of working memory capacity for vocalizations, which was key for learning complex utterances. This process was concomitant with the development of a robust interface with visual working memory, both in the dorsal and ventral streams of auditory and visual processing. This enabled the bidirectional translation of sequential codes into hierarchical visual representations, through the development of a multimodal interface between both systems.

Keywords: arcuate fasciculus, working memory, laryngeal cortex, Broca’s area, vocal learning

Introduction

In this paper, I will discuss the emergence of speech after the last common ancestor of chimpanzees and humans, and the origin of the modern language circuits building from an expansion of a preexisting network controlling auditory-mediated vocal articulation. The enhanced network also increased auditory-vocal working memory capacity, which became beneficial for learning complex vocal utterances of social significance. In this process, the auditory-vocal network progressively established associative interactions with visuomotor networks, contributing to the transmission of meaningful messages. This view emphasizes a coevolution of auditory-vocal and visuomotor circuits from very early stages in early humans, featuring an

https://‍ © 2020 John Benjamins Publishing Company


interaction between vocal and visual working memory networks. This perspective downplays the causal role of gesture in Arbib’s Mirror System Hypothesis, but I agree that interaction between “linguistic” and visual working memory networks (for what language is about) is key for establishing a Template Construction Grammar (TCG) that links linguistic processing to meaning (Arbib, 2016).

Anatomy of the speech circuit

To begin, it is necessary to describe the language circuits in some detail. Recent brain imaging studies have unveiled a complex organization of the language network, mostly located in the left hemisphere but including components of the right hemisphere, with many cortical and subcortical regions connected to a core language circuit (Tremblay and Dick, 2016) (see Figure 1). The language network connects Broca’s area in the ventrolateral prefrontal cortex (areas 44 and 45), involved in language production (but also comprehension), and Wernicke’s area in the posterior auditory cortex, involved in language perception (but also some production processes). A main tract subserving this connection is the arcuate fasciculus (AF), which runs around the posterior end of the Sylvian fissure. In addition, there is an indirect route between these areas via the inferior parietal lobe, which is connected to Broca’s area via the superior longitudinal fasciculus (SLF), and to Wernicke’s area via the posterior segment of the middle longitudinal fasciculus (MLF, also known as the posterior segment of the AF). Together, the direct and indirect routes are referred to as the dorsal pathway. In addition to these projections, there is a ventral pathway (conveyed by the anterior aspect of the MLF and the inferior fronto-occipital fasciculus) running along the superior temporal lobe and connecting to anterior Broca’s area (area 45) via the extreme capsule (Aboitiz and García, 1997; Aboitiz, 2012; Petrides, 2014; Catani and Bambini, 2014; Tremblay and Dick, 2016). Functional connectivity, as opposed to anatomical connectivity, is assessed through correlations in activity fluctuations of different brain regions. According to this measure, posterior auditory cortex correlates best with area 44 via the AF (dorsal pathway), while area 45 is connected to both the dorsal and the ventral pathways (Friederici, 2011; Petrides, 2014).
Area 45 is a point of convergence for a widespread functional and anatomical network encompassing the superior temporal sulcus (STS), the anterior temporal lobe and the posterior inferior parietal lobe (Petrides, 2014). The latter circuit partly overlaps with the mirror neuron circuitry for action observation in the monkey (Nelissen et al., 2011), and may contribute to cross-talk with the visuomotor system.
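The notion of functional connectivity used above, a correlation between activity fluctuations in two regions, can be illustrated with a minimal sketch. It is not the analysis pipeline of any of the cited studies. The signals are synthetic, and the names `pearson`, `simulate_regions`, `region_a`, `region_b`, `region_c` and the `coupling` parameter are hypothetical choices made only for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def simulate_regions(n_samples=500, coupling=0.8, seed=0):
    """Generate three synthetic activity time series (an assumption, not data).

    region_a and region_b share a common fluctuation, so they should be
    functionally connected; region_c is independent noise.
    """
    rng = random.Random(seed)
    shared = [rng.gauss(0, 1) for _ in range(n_samples)]
    region_a = [s + rng.gauss(0, 0.5) for s in shared]
    region_b = [coupling * s + rng.gauss(0, 0.5) for s in shared]
    region_c = [rng.gauss(0, 1) for _ in range(n_samples)]
    return region_a, region_b, region_c


region_a, region_b, region_c = simulate_regions()
print("coupled regions:  ", round(pearson(region_a, region_b), 2))
print("uncoupled regions:", round(pearson(region_a, region_c), 2))
```

In this toy setting the coupled pair yields a high correlation and the independent pair a correlation near zero. A correlation of this kind says nothing about which tract, if any, carries the shared signal, which is why the text distinguishes functional from anatomical connectivity.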




Figure 1. Anatomical connectivity of the language-related regions in the human (A) and homologues in the monkey (B). The dorsal pathway includes the arcuate fasciculus (AF) and the superior longitudinal fasciculus (SLF). The posterior segment of the middle longitudinal fasciculus (MLF) connects posterior auditory regions with inferior parietal areas. Both the AF and the posterior MLF are proposed to expand in human evolution. The ventral pathway runs along the anterior MLF in the superior temporal gyrus (STG) and connects to prefrontal cortex via the extreme capsule (EC). Also shown are the ventral and dorsal visual pathways, originating in the primary visual area (V1). The small segmented arrow below the AF depicts connections between the laryngeal motor cortex and inferior parietal regions. A, auditory cortex; FEF, frontal eye fields (dorsal areas 6 and 8); ILF, inferior longitudinal fasciculus; STS, superior temporal sulcus; UF, uncinate fasciculus. PF, PFG and PG are distinct areas of the inferior parietal lobe, and Tpt is a posterior auditory area.

Grammar and semantics: Imaging studies

Functional and tractographic imaging studies have concluded that complex syntactic processing preferentially activates dorsal and posterior ventrolateral prefrontal cortex (including area 44), connected via the dorsal pathway to posterior auditory areas. Furthermore, structural integrity of the AF correlates with syntactic processing capacity (Yamamoto and Sakai, 2016; Skeide and Friederici, 2016). This is probably related to constraints in sequential or phonological processing that may be handled by the dorsal pathway. On the other hand, lexico-semantic processing has been reported to activate predominantly regions connected to the ventral pathway, including the anterior temporal lobe and anterior Broca’s area (area 45) (Friederici, 2011; Petrides, 2014). Nonetheless, owing to the subtraction methodology used in functional imaging, these studies have probably downplayed extended networks that operate in both processes, with syntactic networks making use of the ventral pathway, and semantic networks recruiting extended regions of the brain, including motor systems. Thus, studies in adults suggest that syntactic processing also activates areas of the ventral pathway, including the left STS and area 45 in the frontal lobe. In left-hemisphere damaged patients, structural alteration of either the AF (dorsal pathway) or the extreme capsule (ventral pathway) resulted in syntactic impairment (Tyler et al., 2010; Griffiths et al., 2013). The same group also found that integrity of anterior Broca’s area and the posterior middle temporal gyrus, both connected to the ventral pathway, was required for correct syntactic performance in neurological patients (Tyler et al., 2011). Likewise, Brennan et al. (2012) observed that during listening to a natural story, syntactic structure building across phrases was correlated with activity in the anterior temporal lobe, also part of the ventral pathway.
Additional studies have shown that semantic networks include extended areas of the brain, including medial limbic regions, prefrontal cortex and inferior parietal and temporal lobes (Binder and Desai, 2011). A region that seems critical for semantic processing is the STS, which lies on the border between the auditory-driven superior temporal lobe and the visually driven middle and inferior temporal lobes. The STS is a multimodal region that participates in language processing, biological motion, theory of mind, and face and voice processing (Beauchamp, 2015). These functions segregate antero-posteriorly, in about the same order as above, with the anterior region being more connected with the ventral pathway, and the posterior region more connected with the dorsal pathway and the inferior parietal lobe (Erickson et al., 2017).



Working memory

Another process that is relevant for language is verbal, or phonological, working memory. This is a transient, limited-capacity memory system that keeps information online, to be used in the near future (Baddeley, 2007, 2012). Working memory relies on the activation of sensorimotor loops that maintain information online via persistent neuronal activity, and on executive processes that manipulate this activated information. Considering the former, the phonological loop can be defined as an auditory-vocal circuit that transiently keeps articulatory or phonological information, and is tapped by the non-word repetition task and the digit span task, which assess the capacity to transiently keep novel, nonsense phonological combinations online (Baddeley, 2007). The online maintenance of auditory traces is fundamental for the manipulation of these items into novel combinations by executive mechanisms. Recent studies have pointed to area Spt in the posterior auditory cortex as a candidate for an auditory-vocal interface that supports the maintenance of linguistic items in working memory (Hickok et al., 2011). Area Spt is defined by functional activations, and is believed to overlap with cytoarchitectonic area Tpt, which has been considered to represent the core of Wernicke’s area (Tremblay and Dick, 2016) (see Figure 1). Additional studies indicate that the inferior parietal lobe, via the dorsal pathway, contributes attentional resources and selects motor articulatory programs to stabilize phonological working memory (Aboitiz, 2012, 2017; Rauschecker, 2012). Natural speech processing takes place at a series of interacting levels, from phonological to lexical to syntactic, where combinations of lower-order units are recognized according to a contextual template provided by syntactic rules or by semantics/pragmatics, as depicted in Arbib and Caplan’s discussion of the HEARSAY speech processing algorithm (Arbib and Caplan, 1979).
It is important to make clear that “verbal” working memory can be separated into different overlapping networks involved in phonological, syntactic and semantic processes (Caplan and Waters, 1999), each supported by the respective neural networks described above. In natural speech processing, persistent activity in each of these networks maintains phonological, semantic and syntactic templates online while these are cross-checked against one another to achieve comprehension and to manipulate the linguistic items at different levels. Although some studies have attempted to separate syntactic processing from working memory mechanisms (Makuuchi and Friederici, 2013), perhaps more interesting in these findings is the strong overlap and functional connectivity observed between both circuits, which grew stronger with increasing grammatical complexity. Yet the original function of phonological working memory may have been not just to process complex language, but to learn it. Baddeley showed in the 1990s that phonological working memory, as assessed by the non-word repetition task, is associated with vocabulary learning in children (Baddeley, 2007).

From monkey to human

The organization of the language network largely parallels the cortical auditory network shared with the macaque, which is also divided into dorsal and ventral pathways that interact closely (Rauschecker, 2017). The dorsal component is involved in sound localization and performs time-dependent analyses of the stimulus, while the ventral pathway is related to stimulus identification (Romanski et al., 1999; Romanski, 2007; Rauschecker, 2012; Plakke and Romanski, 2016). The dorsal pathway not only performs sensory functions, but also contributes to selecting motor programs on the basis of sensory information, such as orienting movements toward sound sources (Hickok, 2017). Furthermore, all these pathways are bidirectional, and prefrontal and motor areas exert a strong top-down influence on auditory areas by providing a corollary discharge of the motor program. This permits anticipation of the perception of the executed action (in this case, vocalizations), correction of errors and fine-tuning of subsequent motor programs, a process called predictive coding (Rauschecker, 2012). In humans, top-down influences from the ventrolateral prefrontal cortex and other regions carrying mixed motor, semantic and syntactic information run back to posterior auditory cortex, modulating early stages of speech perception and contributing to stabilize short-term auditory traces during verbal working memory (Rauschecker, 2012; Skeide and Friederici, 2016; Okada et al., 2018). A stronger case of predictive coding, and one of more relevance to communication, is predicting what other speakers will say, especially during a conversation (Stivers et al., 2009).
The subdivision into dorsal and ventral auditory processing streams parallels the well-known organization of the visual system, which contains a dorsal spatial-movement pathway that serves to coordinate actions along the superior parietal and frontal lobes, and a ventral pathway along the inferior temporal lobe and ventral-dorsolateral prefrontal cortex involved in identification of objects and faces (Goldman-Rakic, 1995) (Figure 1). Interestingly, the ventral visual pathway, traveling along the inferior temporal lobe, projects to anterior Broca’s area, partly overlapping with the termination of the auditory ventral pathway (Aboitiz and García, 1997; Romanski, 2007). This region has been found to activate during toolmaking behavior in humans, and is possibly a place for overlap between visual and verbal working memory systems (Putt et al., 2017). Humans and monkeys display largely similar networks of auditory-prefrontal connectivity (Catani and Bambini, 2014). However, tractographic analyses revealed a gradual expansion of the AF from monkey to chimpanzee to human (Figure 1), partly due to disproportionate expansion of the temporoparietal junction and ventrolateral prefrontal cortex in primate evolution (Rilling et al., 2008, 2012; Petrides, 2014; Catani and Bambini, 2014; Aboitiz, 2017). Compared to chimpanzees, the human AF shows increased development of the connectivity between STS and area 44 (see above) (Rilling et al., 2012; Rilling, 2014), which can be related to increasing auditory-vocal connectivity. This anatomical evidence is supported by the finding of weaker functional connectivity between auditory and ventrolateral prefrontal regions in the macaque than in the human (Neubert et al., 2014). An intriguing observation is the relative expansion of the dorsal pathway from monkey to chimpanzee. As tractographic data do not have the resolution to specifically track auditory projections in the AF, it is not clear to what extent this tract conveys auditory or other kinds of information, a question that would be very important to address experimentally. One possibility is that it serves orofacial control, which is well developed in apes and is extremely relevant for consonant production in modern speech (Lameira et al., 2014). Recall that face processing is partly represented in the posterior STS, a target of the AF (Beauchamp, 2015). This could represent a dorsal projection to the mouth premotor cortex conveying visual or auditory information (Coudé and Ferrari, this volume). Another possibility is that it relates to other functions of the STS such as action processing, which might fit the Mirror System Hypothesis. Finally, it is also possible that some threshold of auditory-vocal functional connectivity was required for the establishment of a sufficiently robust articulatory system, a threshold that has not been reached in apes.

Speech origins

Consistent with the discussion above, my colleagues and I have proposed that in human evolution, the structural/functional increase of temporoparietal-prefrontal connectivity via the AF and neighboring tracts of the dorsal pathway was critical for the development of a circuit that strengthened auditory-vocal working memory (Aboitiz and García, 1997; Aboitiz, 2012). Increasing working memory capacity was key for learning complex vocal utterances, which gave rise to a primitive phonology. Furthermore, monkeys are strongly limited in auditory recognition and in auditory long- and short-term memory, the latter owing to instability of the auditory traces, which might be more robust in humans thanks to a top-down influence via the dorsal pathway (Colombo et al., 1990; Scott et al., 2012, 2014). Monkeys’ auditory memories of conspecific voices have been proposed to rely on ventral pathway mechanisms and to be in fact visual memories (of monkey faces) on which the auditory component is supported (Fritz et al., 2005, 2016). Likewise, tractographic data reveal that the development of the human AF, but not of the ventral pathway, covaries with phonological working memory, verbal fluency and sentence comprehension (Yeatman et al., 2011; Skeide and Friederici, 2016; Schomers et al., 2017). We have proposed that strengthening of auditory-vocal working memory and the associated capacity to learn vocal sequences were of selective benefit for individual and group recognition in the context of an early culture based on tight social cooperation, child rearing and tool making (García et al., 2014; Aboitiz, 2017; see also Wilson and Petkov, this issue). At least in birds, cooperative breeding is related to vocalization complexity, suggesting that social behavior by itself may be a selective force for vocal complexity (Leighton, 2017). In particular, this innovation may have been important to support a ventral pathway circuit that gave rise to a primitive lexical or pre-lexical communication system, as Baddeley showed for vocabulary learning in children (Baddeley, 2007). However, additional mechanisms may have been required for the development of a sophisticated phonology, a meaningful lexicon and finally a hierarchical grammatical structure. Increasing auditory-vocal working memory certainly supported these acquisitions, but distinct neural processes may have been involved in their development.

Descending control systems

Another innovation in the emergence of the speech circuit is the voluntary control of the larynx, which is necessary, but perhaps not sufficient, for vocal learning capacity. In all primates studied, two different vocal control systems have been described: a non-volitional one encompassing limbic components and multisynaptic descending projections to the brainstem, and a voluntary one comprising descending axons from the motor cortex representing the larynx and orofacial musculature, which is connected with the language circuits described above (Hage and Nieder, 2016; see also Coudé and Ferrari, this issue). Considering the second system, it has been claimed that among primates, only humans have a strong, direct descending cortical projection from motor and premotor laryngeal areas to the nucleus ambiguus of the brainstem, which controls many of the muscles involved in vocalization. In non-human primates, these descending projections are indirect, reaching the reticular formation, where they synapse on interneurons that in turn innervate the nucleus ambiguus (Coudé et al., 2011). Furthermore, connectivity from the laryngeal motor cortex to inferior parietal cortex is sevenfold more robust in humans than in macaques, suggesting a much more elaborate sensory-vocal connectivity than in other primates (Figure 1, small segmented arrow below the AF) (Kumar et al., 2016). Hickok recently proposed that through this pathway, area Spt provides polysynaptic auditory feedback to laryngeal motor control, modulating prosodic output among other things (Hickok, 2017). This evidence parallels the findings of an expansion of the AF in humans relative to other primates, although Kumar et al. (2016) reported no significant differences in connectivity between the laryngeal cortex and the STG between monkey and human. Nonetheless, the expansion of a pathway to prefrontal areas via the AF or neighboring tracts probably increased auditory-vocal working memory capacity, dramatically increasing learned vocal production and probably contributing to the skilful management of articulatory processes, thus enabling the production of novel utterances (Schomers et al., 2017). The Mirror System Hypothesis proposes that in human evolution, the direct connection from the laryngeal motor cortex to the nucleus ambiguus was somehow facilitated by the development of the corticospinal component controlling hand movements (Rizzolatti and Arbib, 1998; Arbib, 2016). Other possibilities are that the direct descending projection to the laryngeal motoneurons appeared de novo, as it did in songbirds, which display a direct descending control over the song musculature (Hage and Nieder, 2016), or that it arose as a simple consequence of cortical expansion, which increases the number of descending cortical axons (Herculano-Houzel et al., 2016). Still another alternative is that these connections are present in the newborn non-human primate and retract during postnatal development, as happens in the hand corticospinal system of non-primates (Gu et al., 2017). If this is the case, maintenance of these projections until adulthood by plastic or genetic mechanisms might not be a difficult developmental step.

Hand control and the mirror neuron system

An additional element contributing to language origins was hand dexterity, provided, as noted above, by a direct descending control over finger motor neurons from the motor cortex that is typical of primates. Grasping behavior, on which tool making relies substantially, depends on complex neural networks including the direct descending projection from motor and premotor cortices to motor neurons innervating arm and hand muscles, and parieto-frontal networks for visuomotor control in which mirror neurons participate (Arbib, 2012, Chapters 4 and 5). Arbib (2012) further asserts that during evolution from the last common ancestor with monkeys to humans, dexterity developed to support both the use of learned gestures for communication and an increasing range of manual skills requiring complex parieto-frontal dynamics, with increased ability for imitation supporting the learning of novel skills. This condition led both to complex gestural communication and to the elaboration of a culture based on tool making. With this claim, he and other proponents of the Mirror System Hypothesis place hand gesturing and complex imitative capacity at the basis of language evolution.

More precisely, Arbib and others have proposed the Mirror System Hypothesis to account for the emergence of language from ancestral hand-grasping mechanisms, in which grammar and semantics evolved deeply intertwined in an action-oriented computational device (Arbib, 2012, 2017). Very briefly, he proposes a scenario in which the last common ancestor of humans and apes had voluntary control of hands but not of voice, and in which language emerged from a rudimentary pantomime system, acquired through the faculty for imitating complex actions, partly provided by the mirror neuron system. This was followed by a "protosign" stage, i.e. an open repertoire of meaningful manual signs. The mirror system for grasping provided the scaffolding for the emergence of voluntary vocal control, giving rise to "protospeech", which coevolved with protosign in an open spiral until (via cultural rather than biological evolution) speech gained preponderance. Thus, a lexical, or prelexical, multimodal system (involving gestures and vocalizations) was at the basis of the emergence of grammatical rules, whose acquisition required cultural evolution. Our original proposal that the development of the AF and related tracts in early humans facilitated the expansion of auditory-vocal working memory and the origin of speech is not incompatible with Arbib’s hypothesis (Aboitiz, 2017). For example, it also emphasizes the development of complex imitation capacity as a requisite for speech, and the early emergence of a lexical/prelexical system. Yet I favor a scenario of coevolution between vocal and manual sensorimotor networks from very early in human evolution, rather than auditory-vocal plasticity strictly deriving from a hand-grasping system (Aboitiz, 2017).

Template construction grammar

Below I discuss a possible relation between the amplification of working memory capacity and the elaboration of a grammatically organized language, building on Michael Arbib’s model of Template Construction Grammar (TCG) in the context of the Mirror System Hypothesis. I will suggest some anatomical background for Arbib’s TCG, which in my view fits the evidence of connectivity discussed in this paper and may provide a useful empirical test for it. TCG provides a rich account of how the visual and the language systems interact to generate a meaningful grammatical system. Arbib (2012, 2017) emphasizes the linkage of visual mechanisms for assessing a scene with language use based on a grammar adequate to support description of visual scenes. (The language model operates at the word level, and is agnostic as to whether words are expressed via speech or sign.) The grammar has an intrinsic hierarchical organization that identifies agents, attributes and additional items. While TCG postulates purely ventral mechanisms (and is thus in conflict with those results that suggest the dorsal path is essential for complex grammatical processing), Arbib does accept a role for the visual dorsal stream in executing and recognizing the parametric details of actions, and for recognizing and executing the phonological details of speech. In this extended perspective, the frontal cortex then integrates/evaluates the output of both streams and generates top-down control over both pathways. Candidates for such a function may be area 45 and the dorsolateral prefrontal cortex, connected with the ventral and dorsal auditory and visual pathways. Likewise, the interactions between the dorsal and the ventral pathways for actor computations may be mediated by connectivity between the STS and the inferior parietal lobe (particularly, areas PF and PFG; Figure 1). According to Arbib, so-called content words are linked to conceptual schemas of agents, objects, and attributes (possibly mediated by the STS) whereas function words tend to be bundled up within constructions that explicate relations between the elements that the function words describe. Describing a scene or an action relies on a TCG, where a scene interpretation network is generated in visual working memory (STS and ventral stream), is then translated into a semantic representation of the event (again, possibly mediated by the STS), and is finally translated into words by the language system in Broca’s region via a grammatical working memory (see Arbib, 2017, for further details). In summary, Arbib’s model fits the scheme outlined above of a core articulatory dorsal circuit providing phonological processing, surrounded by a multimodal network that couples this system with visuomotor circuits. Area 45 is particularly relevant for this connectivity, as it is connected with the ventral and dorsal auditory and visual pathways.
Yet, in my opinion, while the basic binding mechanisms (the so-called Merge function) that are critical for grammatical processing may overlap with widespread semantic networks, the articulatory processes of the dorsal pathway may help keep these items in phonological working memory, contributing to their organization into a hierarchical sentence. In this context, a recent report has shown increasing oscillatory activity in the left hemisphere during sentence processing, presumably related to increasing working memory load, which quickly drops as the sentence is formed (Nelson et al., 2017).

Towards a new road map

According to the above discussion, an important issue to address in the next road map is which mechanisms were involved in the origin of speech itself. In this regard, the development of auditory-vocal and orofacial circuitry in non-human primates has been downplayed by the Mirror System Hypothesis, although there are notable exceptions (Coudé et al., 2011; Coudé and Ferrari, this volume). Concerning neuroanatomy, studies on the development of descending connectivity of the laryngeal cortex in non-human primates will be instructive for hypotheses on the origin of human vocal control. Furthermore, the functions of the AF in apes need further investigation. Does this tract convey auditory, face or body movement information, or all of them? This information would help decide whether the speech network arose from a preexisting auditory-vocal circuitry or was co-opted from another sensorimotor domain, for example hand control. Another important challenge for the new road map is to provide a neurobiological grounding for the bidirectional translation between vocal, sequential signals and visuospatial or multimodal representations depicting objects, actions and events. As noted, brain regions like area 45, the STS and inferior parietal areas are especially good candidates for such functions. Among other topics, further studies should focus on the linguistic properties of areas processing biological motion and action planning in the temporal and parietal areas and their connectivity with auditory-vocal circuits, as well as on semantic and action processing in the ventrolateral prefrontal cortex.

Acknowledgements

Thanks to Isabel Guerrero for illustrations.

Funding

This research was supported in part by FONDECYT Grant 1160258. The paper was prepared for a workshop funded by NSF Grant No. BCS-1343544 “INSPIRE Track 1: Action, Vision and Language, and their Brain Mechanisms in Evolutionary Relationship” (M.A. Arbib, Principal Investigator).

References

Aboitiz, F. (2012). Gestures, vocalizations, and memory in language origins. Frontiers in Evolutionary Neuroscience 4, 2.
Aboitiz, F. (2017). A Brain for Speech. A View from Evolutionary Neuroanatomy. London: Palgrave Macmillan.
Aboitiz, F., & García, R. (1997). The evolutionary origin of the language areas in the human brain. A neuroanatomical perspective. Brain Research Reviews 25, 381–396.
Arbib, M. A. (2012). How The Brain Got Language. The Mirror System Hypothesis. Oxford: Oxford University Press.



Arbib, M. A. (2016). Toward the Language-Ready Brain: Biological Evolution and Primate Comparisons. Psychonomic Bulletin & Review 24, 142–150.
Arbib, M. A. (2017). Towards a computational comparative neuroprimatology: framing the language-ready brain. Physics of Life Reviews 16, 1–54.
Arbib, M. A., & Caplan, D. (1979). Neurolinguistics must be computational. Behavioral and Brain Sciences 2, 449–483.
Baddeley, A. (2007). Working Memory, Thought and Action. Oxford: Oxford University Press.
Baddeley, A. (2012). Working memory: theories, models and controversies. Annual Review of Psychology 63, 1–29.
Beauchamp, M. S. (2015). The social mysteries of the superior temporal sulcus. Trends in Cognitive Sciences 19, 489–490.
Binder, J. R., & Desai, R. H. (2011). The neurobiology of semantic memory. Trends in Cognitive Sciences 15, 527–536.
Brennan, J., Nir, Y., Hasson, U., Malach, R., Heeger, D. J., & Pylkkänen, L. (2012). Syntactic structure building in the anterior temporal lobe during natural story listening. Brain and Language 120, 163–173.
Caplan, D., & Waters, G. S. (1999). Verbal working memory and sentence comprehension. Behavioral and Brain Sciences 22, 77–94.
Catani, M., & Bambini, V. (2014). A model for Social Communication And Language Evolution and Development (SCALED). Current Opinion in Neurobiology 28, 165–171.
Colombo, M., D’Amato, M. R., Rodman, H. R., & Gross, C. G. (1990). Auditory association cortex lesions impair auditory short-term memory in monkeys. Science 247, 336–338.
Coudé, G., Ferrari, P. F., Rodà, F., Maranesi, M., Borelli, E., Veroni, V., Monti, F., Rozzi, S., & Fogassi, L. (2011). Neurons controlling voluntary vocalization in the macaque ventral premotor cortex. PLoS One 6, e26822.
Coudé, G., & Ferrari, P. F. Reflections on the organization of the cortical motor system and its role in the evolution of communication in primates (this volume).
Erickson, L. C., Rauschecker, J. P., & Turkeltaub, P. E. (2017). Meta-analytic connectivity modeling of the human superior temporal sulcus. Brain Structure and Function 222, 267–285.
Friederici, A. D. (2011). The brain basis of language processing: from structure to function. Physiological Reviews 91, 1357–1392.
Fritz, J. B., Elhilali, M., & Shamma, S. A. (2005). Differential dynamic plasticity of A1 receptive fields during multiple spectral tasks. Journal of Neuroscience 25, 7623–7635.
Fritz, J. B., Malloy, M., Mishkin, M., & Saunders, R. C. (2016). Monkey’s short-term auditory memory nearly abolished by combined removal of the rostral superior temporal gyrus and rhinal cortices. Brain Research 1640, 289–98.
García, R. R., Zamorano, F., & Aboitiz, F. (2014). From imitation to meaning: circuit plasticity and the acquisition of a conventionalized semantics. Frontiers in Human Neuroscience 8, 605.
Goldman-Rakic, P. S. (1995). Cellular basis of working memory. Neuron 14, 477–485.

Griffiths, J. D., Marslen-Wilson, W. D., Stamatakis, E. A., & Tyler, L. K. (2013). Functional organization of the neural language system: dorsal and ventral pathways are critical for syntax. Cerebral Cortex 23, 139–147.
Gu, Z., Kalambogias, J., Yoshioka, S., Han, W., Li, Z., Kawasawa, Y. I., Pochareddy, S., Li, Z., Liu, F., Xu, X., Wijeratne, H. R. S., Ueno, M., Blatz, E., Salomone, J., Kumanogoh, A., Rasin, M. R., Gebelein, B., Weirauch, M. T., Sestan, N., Martin, J. H., & Yoshida, Y. (2017). Control of species-dependent cortico-motoneuronal connections underlying manual dexterity. Science 357, 400–404.
Hage, S. R., & Nieder, A. (2016). Dual Neural Network Model for the Evolution of Speech and Language. Trends in Neurosciences 39, 813–829.
Herculano-Houzel, S., Kaas, J. H., & de Oliveira-Souza, R. (2016). Corticalization of motor control in humans is a consequence of brain scaling in primate evolution. Journal of Comparative Neurology 524, 448–455.
Hickok, G. (2017). A cortical circuit for voluntary laryngeal control: Implications for the evolution of language. Psychonomic Bulletin & Review 24, 56–63.
Hickok, G., Houde, J., & Rong, F. (2011). Sensorimotor integration in speech processing: computational basis and neural organization. Neuron 69, 407–422.
Kumar, V., Croxson, P. L., & Simonyan, K. (2016). Structural organization of the laryngeal motor cortical network and its implication for evolution of speech production. Journal of Neuroscience 36, 4170–4181.
Lameira, A. R., Maddieson, I., & Zuberbühler, K. (2014). Primate feedstock for the evolution of consonants. Trends in Cognitive Sciences 18, 60–62.
Leighton, G. M. (2017). Cooperative breeding influences the number and type of vocalizations in avian lineages. Proceedings of the Royal Society B 284 (1868), 20171508.
Makuuchi, M., & Friederici, A. D. (2013). Hierarchical functional connectivity between the core language system and the working memory system. Cortex 49, 2416–2423.
Nelissen, K., Borra, E., Gerbella, M., Rozzi, S., Luppino, G., Vanduffel, W., Rizzolatti, G., & Orban, G. A. (2011). Action observation circuits in the macaque monkey cortex. Journal of Neuroscience 31, 3743–3756.
Nelson, M. J., El Karoui, I., Giber, K., Yang, X., Cohen, L., Koopman, H., Cash, S. S., Naccache, L., Hale, J. T., Pallier, C., & Dehaene, S. (2017). Neurophysiological dynamics of phrase-structure building during sentence processing. Proceedings of the National Academy of Sciences U.S.A. 114, E3669–E3678.
Neubert, F. X., Mars, R. B., Thomas, A. G., Sallet, J., & Rushworth, M. F. (2014). Comparison of human ventral frontal cortex areas for cognitive control and language with areas in monkey frontal cortex. Neuron 81, 700–713.
Okada, K., Matchin, W., & Hickok, G. (2018). Neural evidence for predictive coding in auditory cortex during speech production. Psychonomic Bulletin & Review 25, 423–430.
Petrides, M. (2014). Neuroanatomy of Language Regions of the Human Brain. New York: Academic Press.
Plakke, B., & Romanski, L. M. (2016). Neural circuits in auditory and audiovisual memory. Brain Research 1640, 278–288.



Putt, S. S., Wijeakumar, S., Franciscus, R. G., & Spencer, J. P. (2017). The functional brain networks that underlie Early Stone Age tool manufacture. Nature Human Behaviour 1, 1–8.
Rauschecker, J. P. (2012). Ventral and dorsal streams in the evolution of speech and language. Frontiers in Evolutionary Neuroscience 4, 7.
Rauschecker, J. P. (2017). Where, When, and How: Are they all sensorimotor? Towards a unified view of the dorsal pathway in vision and audition. Cortex [Epub ahead of print].
Rilling, J. K. (2014). Comparative primate neurobiology and the evolution of brain language systems. Current Opinion in Neurobiology 28, 10–14.
Rilling, J. K., Glasser, M. F., Jbabdi, S., Andersson, J., & Preuss, T. M. (2012). Continuity, divergence, and the evolution of brain language pathways. Frontiers in Evolutionary Neuroscience 3, 11.
Rilling, J. K., Glasser, M. F., Preuss, T. M., Ma, X., Zhao, T., Hu, X., & Behrens, T. E. (2008). The evolution of the arcuate fasciculus revealed with comparative DTI. Nature Neuroscience 11, 426–428.
Rizzolatti, G., & Arbib, M. A. (1998). Language within our grasp. Trends in Neurosciences 21, 188–194.
Romanski, L. M., Tian, B., Fritz, J., Mishkin, M., Goldman-Rakic, P. S., & Rauschecker, J. P. (1999). Dual streams of auditory afferents target multiple domains in the primate prefrontal cortex. Nature Neuroscience 2, 1131–1136.
Romanski, L. M. (2007). Representation and integration of auditory and visual stimuli in the primate ventral lateral prefrontal cortex. Cerebral Cortex 17, Suppl 1, i61–i69.
Schomers, M. R., Garagnani, M., & Pulvermüller, F. (2017). Neurocomputational Consequences of Evolutionary Connectivity Changes in Perisylvian Language Cortex. Journal of Neuroscience 37, 3045–3055.
Scott, B. H., Mishkin, M., & Yin, P. (2012). Monkeys have a limited form of short-term memory in audition. Proceedings of the National Academy of Sciences U.S.A. 109, 12237–12241.
Scott, B. H., Mishkin, M., & Yin, P. (2014). Neural correlates of auditory short-term memory in rostral superior temporal cortex. Current Biology 24, 2767–2775.
Stivers, T., Enfield, N. J., Brown, P., Englert, C., Hayashi, M., Heinemann, T., Hoymann, G., Rossano, F., de Ruiter, J. P., Yoon, K. E., & Levinson, S. C. (2009). Universals and cultural variation in turn-taking in conversation. Proceedings of the National Academy of Sciences U.S.A. 106, 10587–10592.
Skeide, M. A., & Friederici, A. D. (2016). The ontogeny of the cortical language network. Nature Reviews Neuroscience 17, 323–332.
Tremblay, P., & Dick, A. S. (2016). Broca and Wernicke are dead, or moving past the classic model of language neurobiology. Brain and Language 162, 60–71.
Tyler, L. K., Shafto, M. A., Randall, B., Wright, P., Marslen-Wilson, W. D., & Stamatakis, E. A. (2010). Preserving syntactic processing across the adult life span: the modulation of the frontotemporal language system in the context of age-related atrophy. Cerebral Cortex 20, 352–364.

Tyler, L. K., Marslen-Wilson, W. D., Randall, B., Wright, P., Devereux, B. J., Zhuang, J., Papoutsi, M., & Stamatakis, E. A. (2011). Left inferior frontal cortex and syntax: function, structure and behaviour in patients with left hemisphere damage. Brain 134, 415–431.
Wilson, B., & Petkov, P. I. From evolutionarily conserved frontal regions for sequence processing to human innovations in syntax (this volume).
Yamamoto, K., & Sakai, K. L. (2016). The dorsal rather than ventral pathway better reflects individual syntactic abilities in second language. Frontiers in Human Neuroscience 10, 295.
Yeatman, J. D., Dougherty, R. F., Rykhlevskaia, E., Sherbondy, A. J., Deutsch, G. K., Wandell, B. A., & Ben-Shachar, M. (2011). Anatomical properties of the arcuate fasciculus predict phonological and reading skills in children. Journal of Cognitive Neuroscience 23, 3304–3317.


Bringing in Emotion

Relating the evolution of Music-Readiness and Language-Readiness within the context of comparative neuroprimatology

Uwe Seifert

University of Cologne

Language- and music-readiness are shown to be related within comparative neuroprimatology by elaborating three hypotheses concerning music-readiness (MR): the (musicological) rhythm-first hypothesis (MR-1), the combinatoriality hypothesis (MR-2), and the socio-affect-cohesion hypothesis (MR-3). MR-1 states that rhythm evolutionarily precedes melody and tonality. MR-2 states that complex imitation and fractionation within the expanding spiral of the mirror system/complex imitation hypothesis (MS/CIH) lead to the combinatorial capacities of rhythm necessary for building up a musical lexicon and complex structures; and rhythm, in connection with repetition and variation, scaffolds both musical form and content. MR-3 states that music’s main evolutionary function is to self-induce affective states in individuals to cope with distress; rhythm, in particular isochrony, provides a temporal framework to support movement synchronization, inducing shared affective states in group members, which in turn enhances group cohesion. This document reviews current behavioural and neurocognitive research relevant to the comparative neuroprimatology of music-readiness. It further proposes to extend MS/CIH to the evolving relationship between the language-ready and the music-ready brain, by comparing “affective rhythm” and prosody – i.e. by comparatively approaching the language- and music-emotion link in neuroprimatology.

Keywords: mirror system hypothesis, comparative neuroprimatology, language-readiness, music-readiness, musical emotion, musical rhythm processing, prosody, shared emotion, cohesion, musical beat, music-ready brain, language-ready brain

© 2020 John Benjamins Publishing Company

Music-Readiness and Language-Readiness in neuroprimatology

1. Introduction

A basic issue in probing the evolution of the language-ready brain is the extent to which it was a response to adaptive pressures for increased communication and more detailed information-sharing, and to what extent it exapted brain mechanisms developed for other purposes, such as increasingly sophisticated manual skills. In this paper, we explore the relation between music-readiness (MR) and language-readiness, noting that language and music share some neuronal substrates for the processing of structural hierarchies and temporal sequences (e.g. Broca’s area and the basal ganglia), and that language supports the direct expression of propositional content, whereas music does not.

In the last 20 years, interest has risen in the evolution and neurocognition of music, in which the dominant research topic has been the music-language relationship, in particular “overlapping” neuronal structures and mechanisms shared in the processing of language and music (Jantzen, Large, & Magne, 2016). However, recent research by Norman-Haignere, Kanwisher, & McDermott (2015) reveals distinct cortical pathways in the non-primary auditory cortex for pitch in music and speech. Moreover, research on the neurocognition of music indicates that isochronous beats and tonal encoding of pitch are specific to music.

Research since 2000 on the evolution of music has focused mainly on the adaptive value of music, by referring to behavioral studies from comparative psychology and animal cognition. Different hypotheses concerning music’s adaptive function have been proposed, ranging from coalition signaling, social bonding, and cohesion to signaling mate quality. Only recently has evolutionary research shifted to the question of phylogeny and music’s design features, with emphasis on birdsong and (artificial) grammar (learning) as research paradigms (for music, see Honing, 2018; for language, with primate and non-primate approaches, see Scharff, Friederici, & Petrides, 2013; Wilson this volume). In addition, research on emotion through cognitive music psychology is now beginning to integrate evolutionary and neuroscientific ideas. Most of this research lacks a neuroethological perspective, and is not related to the mirror system/complex imitation hypothesis (MS/CIH; Arbib, 2016). One of the weakest points of MS/CIH is its relation to affective processing, in particular to prosodic processing of speech (Ackermann, Hage, & Ziegler, 2014).

Thus, the challenge is to relate research on language-readiness and music-readiness (MR); put research on music-readiness into the comparative neuroprimatological framework; investigate the relationship of the emotion-language and emotion-music linkage; and point out paths for further investigations on language and music within this framework and the 2018 road map. We present three hypotheses:



1. MR-1: the musicological “rhythm-first hypothesis.” Rhythm precedes melody (song/singing).
2. MR-2: the “combinatoriality hypothesis.” At the protolanguage stage of MS/CIH, imitation and fractionation provided the necessary combinatorial properties and discreteness of sounds for proto-music; this then led to musical lexicons with musical pitches and scales for melody and harmony.
3. MR-3: the “socio-affect-cohesion hypothesis.” MR-3 indicates a possible social extension of the MS/CIH by taking into account affective processes. It involves research on prosody and vocal control, comparing studies on the emotional effect of rhythmic structures and motor control. This hypothesis combines music’s roles in the affective states of individuals and in the shared affective states of group members. Referring to the individual, it states that “pleasurable” repetitive sound structures contribute to self-induced individual distress reduction. Referring to the group, it states that synchronisation of movements via musical rhythm induces shared affective states, particularly to cope with distress among group members, and thereby enhances group cohesion.

MR-1 and MR-2 mutually support each other; MR-3 complements MR-1 and MR-2 by introducing rhythmic entrainment in group behavior, and extends them by introducing emotions and communication of emotions as a research topic for language- and music-readiness. As far as is currently possible, this paper relates these hypotheses to behavioral and neurocognitive findings relevant to comparative neuroprimatology. We first distinguish between comparative neuroprimatology and MS/CIH. The three hypotheses are then concurrently explored in terms of developing links to comparative neuroprimatology, by reviewing relevant current behavioral and neurocognitive music studies.

Before proceeding, it must be noted that key terms in this article – “music”, “emotion”, “rhythm”, “entrainment”, and “cohesion” – are not strictly defined.
An intuitive understanding of the ideas will be sufficient for the present goal; indeed, rigorous discussion would digress from the article’s topic. The term “music” is used in a very broad sense, encompassing the vast range of musical systems, styles, performances, and pieces in human history and diverse cultures. “Affect” and “emotion” are used interchangeably as umbrella terms for reward, desire, pleasure, motivation, value, etc. Emotions are approached as both an internal mechanism involved in behavioral organization and an external means of communication (Arbib & Fellous, 2004). “Rhythm” is understood as a specific organization of temporal flux or a pattern of movement in time (Powers, 2003; see Toussaint, 2013 for different rhythm definitions). It is used both broadly, as a component alongside melody and harmony, and narrowly to indicate meter and tempo. “Beat” is best understood as the musical phenomenon that may elicit, for example, temporally regular clapping or tapping as a motor response. “Entrainment”, “coordination”, and “synchronization” are used synonymously; these refer to alignments of movements and sounds, as in group dances. “Cohesion” encompasses bonding, attachment, and cooperation.

2. Comparative neuroprimatology and MS/CIH

Comparative neuroprimatology is “the comparative study of action-oriented perception, communication and language in monkeys, apes and humans” (Arbib, 2016). The quotation may equally be applied to research on the music-ready brain, if one substitutes “music” for “language.” Parity, complex imitation, and the homology of F5 and Broca’s area are core ideas of MS/CIH, as a neuroprimatological and gesture-based account in which biological evolution of language-readiness is rooted in action and perception for both praxis and communication. That an utterance has almost the same meaning for sender and receiver is referred to as “parity”. MS/CIH grounds parity in communicative interaction via mirror systems for articulatory actions, although further mechanisms must evolve beyond the mirror for the hearer to grasp the lexical or propositional content of the sender’s hierarchically structured utterance, or their intention to act. Complex imitation – psychologically, an observational-learning operation biologically based in part on mirror mechanisms – is the key operation that leads to protolanguages.

Briefly: MS/CIH states that, in language evolution, action observation and the (complex) recognition of action patterns as a basis for imitating related patterns led, under conceptual guidance, to protosign scaffolding protospeech; and, via pantomime, paved the way to open-ended protolexical (semantic) systems – protolanguages – and to the potential for novel combinatorial systems of the human language-ready brain (Arbib, 2016). Via fractionation of holophrastic protolanguages, these then became languages with syntax, recursion, and compositionality.
MS/CIH explains why language evolved as a multimodal manual/facial/vocal system, by placing its roots in the primate manual control system rather than the core primate call system. Music, too, can be viewed as an evolutionary multimodal phenomenon. Our challenge is to define the core capabilities of the music-ready brain; cultural evolution can then yield diverse “musics” from that basis.

In sum: Comparative neuroprimatology is a broad enterprise, in which complex imitation, fractionation and homology are at the center of the MS/CIH approach to language-readiness. Crucially for the present paper, further mechanisms seem to be required to bring the sender’s affective-motivational states into a social situation. To develop a comparative neuroprimatology of music-readiness, and to compare the language-ready brain with the music-ready brain, we must find candidates for the structural and functional characteristics of music, taking affective-motivational phenomena and the social situation into account.

3. The music-readiness hypotheses, comparative neuroprimatology, and MS/CIH

3.1 MR-1, the rhythm-first hypothesis

Musicology provides evidence and arguments to support the plausibility of MR-1, the rhythm-first hypothesis. Distinguishing between rhythm, melody, harmony, and form, musicology has two main thought traditions concerning music evolution (Wallaschek, 1891). One, following Darwin, states that music is rooted in singing and emotion; this can be characterized as “song first.” The other posits that music developed in strong connection with dance through rhythm, e.g. stamping feet, clapping hands, smacking lips, and other body movements. This thought tradition (which, regarding motor control, clearly seems more applicable to primates than to birds or cetaceans) relates the evolution of music to ritual, and can be characterized as “rhythm first.”

Some comparative musicologists argue that the tonal encoding of pitch – the root of melody – is much more a cultural phenomenon than rhythm is, that tonal relationships are not as stable as rhythmic phenomena, and that, therefore, melody evolved later than rhythmic processing. Evidence for this also comes from developmental psychology: Harmony perception and harmonic understanding develop late in ontogeny, and are heavily dependent on culture. Further support for the thesis comes from the evolution of musical instruments (Montagu, 2017). Interestingly, Lawergren (1988) refers to hand-song as a “musical” intermediary between gestural and vocal communication, because it combines gestural language with physiologically and neurologically simple vocal behavior; he points out that hand-song as an early musical instrument is implied by gesture-based theories.
The idea that music evolved from bodily movements with voice playing, at first, a secondary role, is consistent with the idea that protolanguage emerged from gesture, and that the path to speech is indirect. In accordance with MS/CIH, the complexity of singing, like speech, is rooted in the neural mechanisms for vocal control, and therefore tonal encoding of pitch. This implies that the neuronal capacity for melody processing needed longer to evolve than that for rhythm processing. To summarize: MR-1 is in accordance with one musicological thought tradition; evidence and arguments are provided by comparative musicology, the evolution of musical instruments, and developmental psychology.

3.2 MR-2, the combinatoriality hypothesis

MR-2, the combinatoriality hypothesis, links language- and music-readiness. It states that, via complex imitation and protosign (and/or fractionation), MS/CIH solves the riddle of combinatoriality in music, i.e. the capacity to build and then combine simple discrete entities (first (percussive) sound events, then musical tones) into more complex ones with a linear order: first cyclic rhythmic (sound) patterns (in drumming, stamping, etc.), then melody (with musical tones in song). Combinatoriality and discreteness, in general, are as necessary to achieve flexibility in combining abstract sound structures in rhythmic and melodic processing (termed “generativity” in Merker, 2015) as in combining words of a language.

The key issue here is that, according to MS/CIH, complex action recognition and imitation provide the basis for fractionation, and that fractionation in turn supports two processes: (i) the segmentation of holophrases into semantically meaningful elements (forming the lexicon) and constructions (forming the grammar), and (ii) the segmentation of holophrases and the emergent words into meaningless units (phonemes and syllables, creating phonology). MS/CIH claims that this process leads to an intrusion of neural structures of the communication system based on F5/Broca’s area into neural systems involved in vocalization. This explains why – if one accepts the MS/CIH claim that the path to speech is indirect – vocal control of singing, and therefore tonal encoding of pitch, evolved later in music evolution. According to MR-2, the process that led to protolanguages in hominid evolution also provides the basis for the combinatoriality of the music-ready brain, and supports complex structure building in the rhythmic and (later) melodic domain.
We postulate that neural mechanisms for beat processing, particularly isochronicity, developed prior to tonal pitch encoding – but that both are key capacities for music-readiness. We note that the biological and cultural evolution of music took place in mutual dependency with the evolution of dance movements (Wang, 2015), and proto-music formed an interdependent process of fractionation and recombination. The development of specific musical lexicons and syntactical structures, i.e. specific musical scales and forms, modes, tonality, polyphony, harmonic relationships, etc., took millennia of cultural evolution.

3.3 MR-3, the socio-affect-cohesion hypothesis

If neural capacities for rhythm processing preceded melody processing (MR-1); combinatoriality in music can be traced back to the same mechanisms proposed by MS/CIH for language-readiness (MR-2); music is more strongly related to emotion than language is (Seifert et al., 2013); and, as indicated by ethnology, music is group-oriented and embedded in socio-cultural situations (e.g. Lewis, 2013); then we need to elaborate on MR-1 and MR-2. This is done through MR-3, the socio-affect-cohesion hypothesis. MR-3 states that, for an individual, music functions as a means of affective self-regulation; in particular, it functions as a means of fear reduction, or, more generally, distress reduction, through neural processes generating pleasurable affective states. However, the role of emotion in social interaction may be most critical when exploring the evolution of music-readiness; collectively, musical rhythm may function to induce similar shared affective states in connection with individual and collective distress reduction, as well as supporting interaction synchrony in members of a group to foster coordination and cohesion.

Support for MR-3 comes from both psychological research and the neurosciences of music. A review of the psychological functions of music in listening (Schäfer et al., 2013) identified arousal and mood regulation, self-awareness, and social relatedness as the “Big Three of music listening,” and views them as candidates for evolutionary functions of music. Further support comes from the cognitive neuroscience of musical emotions and the psychoneuroendocrinology of music. For example, the hippocampus is involved, through music-induced positive emotions, in stress reduction, social attachment and social cohesion (Koelsch, 2014). In addition, the beta-endorphin level, an indicator of situational stress, is reduced by some music, in which case it correlates with a reduction of blood pressure, worry and anxiety (Schaefer, 2017). There is also increasing evidence that changes in certain neurotransmitter levels, which are related to social affiliation, cohesion, and attachment, are induced by music (Tarr, Launay, & Dunbar, 2014).
What the neurosciences of music currently know about the neural processing of rhythm, its importance for structure building, and the relation of rhythmic structure building to musical emotions must be related to comparative neuroprimatology, and to research on an emotion-language linkage in MS/CIH. Moreover, it should be explored how these findings on musical rhythm and affective-motivational states in the individual relate to shared affects in groups.

Vuust and Kringelbach (2010) consider the importance of rhythmic structure building in musical form and musical emotions. They focus on musical expectancy, a psychological operation for musical emotions corresponding to the neural mechanism and the evolution of brain structures, proposed by Juslin et al. (2010). Vuust and Kringelbach point out that affective expectation depends on the timing structure of the music; and that meter, i.e., the repetitive cyclic hierarchical patterns of strong and weak beats, provides predictive structures by setting up a “framework for interpreting and remembering music” underlying all other musical expectancy structures (rhythm, melody, and harmony). They find that the mechanisms described by Juslin et al. “act on top of the general principal of musical anticipation and may help to identify how music can influence the reward system”; they also note that keeping a rhythm involves the motor system, the cerebellum, premotor areas, and basal ganglia.

Trost, Labbé, and Grandjean (2017) review and investigate the role of rhythmic entrainment as an affect induction mechanism. They distinguish between perceptual, autonomic physiological, motor, and social entrainment as forms of rhythmic entrainment. “Perceptual entrainment” refers to the perceptual representation of periodic rhythmical patterns; “social entrainment,” to the synchronization of behavior with conspecifics; “motor entrainment,” to sensorimotor synchronization with beat and meter; and “autonomic physiological entrainment,” to the adaptation of physiological rhythms towards tempo. Their work suggests that different forms of rhythmic entrainment all induce affect.

Further empirical research on musical affect can start from research on “affective rhythm” in connection with rhythmic entrainment, given that, according to Vuust and Kringelbach, rhythm processing (beat, meter; for repetition, see Margulis, 2013) involves the most basic mechanisms for musical structure building, i.e. musical form, and affective musical experience. Therefore, rhythm is a solid starting point to deal with affect in establishing a comparative neuroprimatology of music-readiness.

To date, there are few neuroscientific findings on rhythm processing in non-human primates directly relevant to a comparative neuroprimatology of music. Only two approaches (Merchant & Honing, 2014; Patel, 2014) in the cognitive neuroscience of music investigate non-human primates’ rhythm processing capacities. Both focus on Rauschecker’s auditory dual-stream model (Rauschecker & Scott, 2016) and the basal ganglia in beat processing. Merchant and Honing (2014) develop a gradual audiomotor evolution hypothesis (GAE) with focus on the cortico-basal-ganglia-thalamocortical circuit.
Patel’s (2014) action simulation for auditory prediction hypothesis (ASAP) is concerned with the superior longitudinal fascicle, branch II, and a premotor-basal ganglia-auditory network. In contrast to MR-1, MR-2 and GAE, ASAP posits that “the capacity to synchronize with a musical beat resulted from changes in brain structure driven by the evolution of complex vocal learning” (Patel, 2014). W. Tecumseh Fitch (2015) is currently involved in research on the biological origins of rhythm processing from a broader biological perspective, without referring to specific neuroscientific findings (but see Merchant et al., 2015). Interestingly, the volume on the evolution of rhythm cognition edited by Ravignani, Honing, & Kotz (2017) contains only one behavioural (and no neurocognitive) study on non-human primates directly relevant to comparative neuroprimatology.

In sum: Behavioral and neuroscientific research on (affective) rhythm processing in non-human primates that might inform a comparative neuroprimatology of music-readiness is still in its infancy. Generally, current neurocognitive research on “rhythm” as relevant to comparative neuroprimatology focuses on beat perception, and adopts Rauschecker’s dual-stream model of auditory processing by focusing on the dorsal stream including the basal ganglia. Notably, this research does not deal with emotion processing and the ventral stream. However, Rauschecker (2013) provides some information for speech processing: The anterior-ventral stream and a medial prefrontal network are involved in affective processing. In addition, Sammler et al. (2015) have investigated the role of the dorsal and ventral stream for prosody, and found some hemispheric asymmetry concerning prosodic processing. In contrast to MR-1, MR-2 and GAE, ASAP regards vocal learning as the cause of changes in brain structure that then supported beat induction and synchronisation.

As noted in the introduction, MS/CIH lacks an emotion-language linkage; it is not concerned with prosody, which is often taken as a starting point for comparative research on affect in music and speech. Comparing prosody and music, and rhythm in particular, it must be noted that music research distinguishes between rhythms rooted in body movements and those rooted in speech (Powers, 2003). Bodily rooted rhythms are “fundamentally accentual,” whereas speech-oriented rhythms are more varied. Accentual rhythm, which contrasts loud and soft events, is often bodily-based, but can be speech-based; this is opposed to speech-based durational rhythm, in which the contrast of event duration is relevant. Depending on the language, one or the other form of rhythm is realized in speech. In our view, accentual rhythm is evolutionarily more fundamental than durational rhythm, which, we propose, developed later with speech. Affective prosody in speech might therefore be considered different from affect in melody and rhythm processing – thus positing different dissociable processes in the music- and language-ready brain. As not much is known about this today, we rely on preliminary clues.
Armony and LeDoux (2010) focus on the amygdala and distinguish between emotional prosody and music. The amygdala responds to innate and species-specific vocalizations, but there is not much evidence that the amygdala is involved in processing emotional prosody, and the role of the amygdala in music processing is inconclusive. Armony and LeDoux suggest a comparison of the neural networks involved in the processing of prosody and music. Their suggestion and preliminary distinction between emotional prosody and music may indicate a difference in emotional processing in language and music; but further research is needed. The most important recent contribution to the integration of neuroscientific findings on prosody and musical emotion was made by Frühholz, Trost, and Kotz (2016). They propose the first neurocognitive model integrating research results on affective voices and musical emotions. This model indicates that the nucleus accumbens – in particular the ventral tegmental area, hippocampus, and orbitofrontal cortex – are involved in processing musical emotions. The ventral tegmental

Music-Readiness and Language-Readiness in neuroprimatology

area is involved in reward processing; the hippocampus is relevant to social cohesion; the orbitofrontal cortex is associated with processing secondary or learned values. This suggests the importance of musical emotion for social and cultural processes; it also supports our view that different neural networks are implicated in music and prosodic processing. Taking these findings and the musicological concept of bodily based accentual rhythm into account, prosody and affective rhythm might be postulated to have had a common origin in the mouth mirror-neuron system (Coudé this volume; Ferrari et al., 2017) and its associated neural networks for motivation and emotion processing, which, after further extensions, modifications and changes, resulted in different neural networks in language and music for affective processing. In general, this suggests that research on affective-motivational processing in monkeys and chimpanzees on the cortical and subcortical level should be included in comparative neuroprimatology research on the language-ready and music-ready brain (Semendeferi this volume); and that, concerning research on prosody, MS/CIH will benefit from integrating comparative neuroprimatology of music-readiness into its framework for comparative research. Thus far, MR-3 has been explored mainly from a proximate perspective of neurocognition of rhythm and affective rhythm in the individual, to find links towards developing a comparative neuroprimatology of music-readiness, and as a starting point for investigations of the emotion-language linkage of the language-ready brain. Some existing neuroscientific findings support the view that music is more group-oriented than language (e.g. Schulkin & Raglan, 2014). These findings on affective-emotional processing in individuals can be extended to group phenomena in an evolutionary context by referring to group displays, in which participants simultaneously perform the same acts.
Maynard Smith and Harper (2003) describe a group display lasting hours, with outbursts of calls and drumming, exhibited by chimpanzees of the Budongo Forest in Uganda. Noting the excitement and the change in the “psychological” state of the participants, they interpret these displays as self-induction of a psychological state conducive to further cooperation. This can be seen as a kind of ‘musical’ parity condition, albeit without referring to intentional-conceptual or propositional meaning as in the parity condition of MS/CIH. MR-3 captures and expresses this idea. Merker (2000) also refers to the same display, but draws attention to synchronous chorusing, discussing this capacity in relation to LCA-c (the last common ancestor of humans and chimpanzees). He notes the role of vocalization in group displays for the evolution of language and music. Here, current research on turn-taking in macaques and indris (see also Burkart’s paper this volume on callitrichids, e.g., marmosets) is relevant to research on music-readiness.


96 Uwe Seifert

Rhythmic capacities and drumming behavior of non-human primates have also recently gained interest. For instance, Remedios, Logothetis, and Kayser (2009) show that macaque monkeys’ drumming serves as a multi-modal signal of social dominance for the group. Notably, this is interpreted as a socially relevant communicative behavior. Kirschner and Tomasello (2010) note a disposition in many nonhuman primates to create percussive sounds by slapping and stomping on resonating objects. This seems to be in accordance with the idea that the motor control of rhythmic behaviour might be at music’s origin and important in social interaction.

4. Towards a new road map

Comparative neuroprimatology of music-readiness views music as a multimodal phenomenon which evolved from non-human primates’ (social) multimodal interaction with action-call-facial-expression units between dyads, group members, and groups. The current focus on drumming in comparative research on musical rhythm shifts to research on multimodal action-call-facial units in social interaction, and takes turn-taking and other movements, e.g. percussive actions such as hand clapping, into account as well. We posit that, to carry out protomusic, the music-ready brain must have been able to process isochrony, beat induction and tonal encoding of pitch. The transition from protomusic to music led to extended musical lexicons (scales, modes), conventionalized musical meaning, and forms. In parallel, there was an evolution of (musical) emotions. Three hypotheses (MR-1 “rhythm first”, MR-2 “combinatoriality”, and MR-3 “socio-affect-cohesion”) have been discussed and plausibly demonstrated as relating language- and music-readiness, and linking music-readiness to comparative neuroprimatology. How might these hypotheses contribute to the new road map?
Music-readiness generally complements MS/CIH’s multimodal approach to language evolution with an action-call-facial-expression-unit approach to primate communication; by focusing on the communication of desires, i.e. sharing emotions, instead of the communication of states of affairs, i.e. sharing ideas; by elucidating the role of motivation and emotion in the evolution of communication and coordination in dyads and groups; and by comparing the evolution of neural networks for motivation-emotion processing in communicative behaviour, and, for speech, prosody and rhythm processing, from the perspective of music evolution. MR-1 is opposed to the vocal learning approach in language evolution (Aboitiz this volume; Killin, 2017; Merker, 2015). Research guided by this hypothesis (and MR-2) will contribute to answering the most essential question of gestural approaches to language evolution and, a fortiori, language-readiness: how the vocal communication system replaced the gestural system. It further contributes to the


controversial claim that protosign scaffolds protospeech (Arbib, 2016) by challenging or complementing the vocal learning approach to speech evolution, in directing attention to the biological evolution of the neural networks involved in rhythmic behavior in social interaction, and the interaction of language, music, and dance in cultural evolution. The implication of MR-1 is that neural networks controlling isochronous beat processing and rhythmic synchronization were the driving force for neural changes that led to an extended vocal learning capacity in nonhuman primates. In addition, if vocal and gestural capacities evolved together in mutual dependency, then research on the music-ready brain helps tease apart their specific relations. After the music-ready brain – the neural capacity for tonal encoding of pitch, the motor control of isochronous movements and the capacity to synchronize to a (proto-musical) beat – had evolved, music making and dance supported the evolution of neural networks for extended vocal learning and control (singing and speech) within the expanding spiral. Here, the evolution of spoken language and (human) song, and their relationship, come into focus. MR-2 concerns a core question about the relationship of language- and music-readiness: To what extent do complex imitation and fractionation of language-readiness account for the combinatorial properties of music-readiness? It also provides, in connection with MR-1, the key to the riddle of how the vocal system took over. MR-3 addresses a weakness of language-readiness – the language-emotion-motivation linkage – and bridges the gap from the individual in dyadic communication to group behavior. For MS/CIH, the evolution of the interaction of cortical and subcortical networks in speech (in particular prosody) and language processing, and the role of neuromodulation, come into focus, and are shaped by comparing neural networks for motivation and emotion processing in language and music evolution.
This is necessary to explain the social aspects of language, i.e. language as a social phenomenon and its function for, e.g., bonding, coordination, cooperation and group cohesion in social interaction. A starting point for future research might be to look at how the networks in which hand and mouth mirror neurons are embedded changed (Coudé this volume; Ferrari et al., 2017). Because the mouth mirror neurons are connected to the limbic “emotion” system, it is necessary to take into account the evolution of the neural networks for motivation and emotion processing, i.e. the evolutionary changes of subcortical networks and their connections to cortical networks. This article focuses on an internal view of emotion by looking at neural substrates of prosody and rhythm. MR-1, MR-2 and MR-3 imply that neural networks for rhythm and prosody processing are not the same. This could mean that emotional and interactional prosody in social interaction for joint group activities might differ from dyadic conversational turn-taking. This in turn would have implications for research strategies on turn-taking



and alignment dealing with the external view of emotion. Moreover, MR-1, MR-2 and MR-3 predict that the basic timing mechanisms of conversational turn-taking emerged from the evolution of neural substrates for rhythm processing. In general, by challenging the traditional view of prosody and song, MR-1, MR-2 and MR-3 contribute to recent research on emotional and interactional prosody (Filippi, 2016; Cross, 2014), and the interaction of protospeech, protosign, and protomusic in the evolution of language and music (Brown, 2017). Research on affective-emotional processing in language and music brings subcortical areas, rhythm, and emotional and interactional prosody into focus. By integrating emotional and motivational aspects, i.e. research on prosody, MS/CIH will contribute to illuminating why humans developed a “Mitteilungsbedürfnis”, a desire to share thoughts and emotions through talking. Concerning prosocial behavior, music, and language, a comparative neuroprimatology might also benefit from research on motivation-emotion networks, looking at Williams syndrome from a neuroscientific and behavioural perspective, and using it as a bridge to link and extend research on language- and music-readiness to molecular genetic investigations (Semendeferi this volume; Bellugi et al., 1999). MR-1, MR-2 and MR-3 are concerned with both biological and cultural evolution and address questions in the three main areas of primate phylogeny most relevant to comparative neuroprimatology, i.e. LCA-c, LCA-m and the hominin line (Homo and Australopithecus). To test and evaluate these hypotheses, more behavioural as well as neuroscientific studies on key “musical” capacities (isochronicity, beat induction, repetition, entrainment, and tonal encoding of pitch) of non-human primates (especially chimpanzees and macaques) in social interaction are needed. Notably, as the current state of neurocognitive technology and methodology, e.g.
mobile EEG and hyperscanning, is in its infancy, it is difficult or impossible to get relevant neuroscientific data on interaction within groups. Therefore, more observational and experimental field studies concerning the ‘meaning’ of multimodal action-call-facial-expression units in group displays of non-human primates are needed (Liebal this volume), as are more data about turn-taking within group interactions; research from neuroarchaeology on tool making and use, and transmission of percussive technologies, should also come into play (see Stout and Putt this volume). In sum: The inclusion of research on music-readiness in the road map will complement and extend research on language-readiness; specifically, it will help answer the central questions of gestural theories of language origins, illuminate the role of prosody in language evolution, and bridge the gap from individual or dyadic communicative behavior to group behavior; and, as a consequence, draw out the role of culturally driven co-evolution in language- and music-readiness.


Acknowledgements The author thanks Michael Arbib, the reviewers, Rebekka Gold, and the other workshop participants.

Funding The paper was prepared for a workshop funded by NSF Grant No. BCS-1343544 “INSPIRE Track 1: Action, Vision and Language, and their Brain Mechanisms in Evolutionary Relationship,” (M.A. Arbib, Principal Investigator). This research was supported in part by a travel grant from the University of Cologne.

References

Ackermann, H., Hage, S. R., & Ziegler, W. (2014). Brain mechanisms of acoustic communication in humans and nonhuman primates: An evolutionary perspective. Behavioral and Brain Sciences, 37, 529–604. https://
Arbib, M. A. (2016). Towards a Computational Comparative Neuroprimatology: Framing the language-ready brain. Physics of Life Reviews, 16, 1–54. https://
Arbib, M. A., & Fellous, J.-M. (2004). Emotions: from brain to robot. Trends in Cognitive Sciences, 8(12), 554–561. https://
Armony, J. L., & LeDoux, J. E. (2010). Emotional responses to auditory stimuli. In A. Rees & A. R. Palmer (Eds.), The Oxford Handbook of Auditory Science: The Auditory Brain, Vol. 2 (pp. 479–505). Oxford: Oxford University Press.
Bellugi, U., Lichtenberger, L., Mills, D., Galaburda, A., & Korenberg, J. R. (1999). Bridging cognition, the brain and molecular genetics: evidence from Williams syndrome. Trends in Neurosciences, 22(5), 197–207. https://01397-1
Brown, S. (2017). A Joint Prosodic Origin of Language and Music. Frontiers in Psychology, 8(1894). https://
Cross, I. (2014). Music and communication in music psychology. Psychology of Music, 42(6), 809–819. https://
Ferrari, P. F., Gerbella, M., Coudé, G., & Rozzi, S. (2017). Two different mirror neuron networks: The sensorimotor (hand) and limbic (face) pathways. Neuroscience, 358, 300–315. https://
Filippi, P. (2016). Emotional and Interactional Prosody across Animal Communication Systems: A Comparative Approach to the Emergence of Language. Frontiers in Psychology, 7(1393). https://
Fitch, W. T. (2015). The Biology and Evolution of Musical Rhythm: An Update. In I. Toivonen, P. Csúri, & E. van der Zee (Eds.), Structures in the Mind: Essays on Language, Music, and Cognition in Honor of Ray Jackendoff (pp. 293–323). Cambridge, MA: The MIT Press.
Frühholz, S., Trost, W., & Kotz, S. A. (2016). The sound of emotions – Towards a unifying neural network perspective of affective sound processing. Neuroscience & Biobehavioral Reviews, 68, 96–110. https://
Honing, H. (Ed.) (2018). The Origins of Musicality. Cambridge, MA: The MIT Press.
Jantzen, M. G., Large, E. W., & Magne, C. (Eds.). (2016). Overlap of Neural Systems for Processing Language and Music. s. l.: Frontiers in Psychology / Frontiers in Neuroscience. https://
Juslin, P. J., Liljeström, S., Västfjäll, D., & Lundquist, L.-O. (2010). How does music evoke emotions? Exploring the underlying mechanisms. In P. Juslin & J. A. Sloboda (Eds.), Music and Emotion: Theory, Research, Applications (pp. 605–642). Oxford: Oxford University Press.
Killin, A. (2017). Where did language come from? Connecting sign, song, and speech in hominin evolution. Biology & Philosophy. https://
Kirschner, S., & Tomasello, M. (2010). Joint music making promotes prosocial behavior in 4-year-old children. Evolution and Human Behavior, 31, 354–364. https://
Koelsch, S. (2014). Brain correlates of music-evoked emotions. Nature Reviews Neuroscience, 15(3), 170–180. https://
Lawergren, B. (1988). The Origin of Musical Instruments and Sounds. Anthropos, 83(1/3), 31–45.
Lewis, J. (2013). A cross-cultural perspective on the significance of music and dance to culture and society. In M. A. Arbib (Ed.), Language, Music, and the Brain: A Mysterious Relationship (pp. 45–65). Cambridge, MA: The MIT Press.
Margulis, E. H. (2013). Repetition and Emotive Communication in Music Versus Speech. Frontiers in Psychology, 4, 167. https://
Maynard Smith, J., & Harper, D. (2003). Animal Signals. Oxford: Oxford University Press.
Merchant, H., Grahn, J., Trainor, L., Rohrmeier, M., & Fitch, W. T. (2015). Finding the beat: a neural perspective across humans and non-human primates. Philosophical Transactions of the Royal Society B: Biological Sciences, 370, 20140093. https://
Merchant, H., & Honing, H. (2014). Are non-human primates capable of rhythmic entrainment? Evidence for the gradual audiomotor evolution hypothesis. Frontiers in Neuroscience, 7(274). https://
Merker, B. (2015). Seven Theses on the Biology of Music and Language. Signata, 6, 195–215. https://
Merker, B. H. (2000). The birth of music in synchronous chorusing at the hominid-chimpanzee split. Paper presented at the International Conference on Music Perception and Cognition 2000.
Montagu, J. (2017). How Music and Instruments Began: A Brief Overview of the Origin and Entire Development of Music, from Its Earliest Stages. Frontiers in Sociology, 2, 8. https://
Norman-Haignere, S., Kanwisher, N. G., & McDermott, J. H. (2015). Distinct Cortical Pathways for Music and Speech Revealed by Hypothesis-Free Voxel Decomposition. Neuron, 88, 1281–1296. https://
Patel, A. D. (2014). The Evolutionary Biology of Musical Rhythm: Was Darwin Wrong? PLOS Biology, 12(3), e1001821. https://
Powers, H. (2003). Rhythm. In D. M. Randel (Ed.), The Harvard Dictionary of Music. Fourth Edition (pp. 723–729). Cambridge, MA: Belknap Press.
Rauschecker, J. P. (2013). Brain networks for the encoding of emotions in communication sounds of human and nonhuman primates. In E. Altenmüller, S. Schmidt, & E. Zimmermann (Eds.), Evolution of Emotional Communication: From Sounds in Nonhuman Mammals to Speech and Music in Man (pp. 49–62). Oxford: Oxford University Press. https://
Rauschecker, J. P., & Scott, S. K. (2016). Pathways and Streams in the Auditory Cortex: An Update on How Work in Nonhuman Primates has Contributed to Our Understanding of Human Speech Processing. In G. Hickok & S. L. Small (Eds.), Neurobiology of Language (pp. 287–298). San Diego: Academic Press. https://
Ravignani, A., Honing, H., & Kotz, S. A. (Eds.). (2017). The Evolution of Rhythm Cognition: Timing in Music and Speech. s. l.: Frontiers in Human Neuroscience. https://
Remedios, R., Logothetis, N. K., & Kayser, C. (2009). Monkey drumming reveals common networks for perceiving vocal and nonvocal communication sounds. PNAS, 106(2), 1810–1815.
Sammler, D., Grosbras, M.-H., Anwander, A., Bestelmeyer, P. E. G., & Belin, P. (2015). Dorsal and Ventral Pathways for Prosody. Current Biology, 25, 3079–3085. https://
Schaefer, H.-E. (2017). Music-Evoked Emotions – Current Studies. Frontiers in Neuroscience, 11(600). https://
Schäfer, T., Sedlmaier, P., Städtler, C., & Huron, D. (2013). The Psychological Functions of Music Listening. Frontiers in Psychology, 4(511), 1–33.
Scharff, C., Friederici, A. D., & Petrides, M. (Eds.). (2013). Neurobiology of Human Language and Its Evolution: Primate and Non-Primate Perspectives. s. l.: Frontiers in Evolutionary Neuroscience. https://
Schulkin, J., & Raglan, G. B. (2014). The evolution of music and human social capability. Frontiers in Neuroscience, 8, 292. https://
Seifert, U., Verschure, P. F. M. J., Arbib, M. A., Cohen, A. J., Fogassi, L., Fritz, T., … Scherer, K. (2013). Semantics of Internal and External Worlds. In M. A. Arbib (Ed.), Language, Music, and the Brain: A Mysterious Relationship, Strüngmann Forum Reports, vol. 10 (pp. 203–229). Cambridge, MA: MIT Press.
Tarr, B., Launay, J., & Dunbar, R. I. M. (2014). Music and social bonding: “self-other” merging and neurohormonal mechanisms. Frontiers in Psychology, 5, 1096. https://
Toussaint, G. T. (2013). The Geometry of Musical Rhythm: What Makes a “Good” Rhythm Good? Boca Raton: CRC Press.
Trost, W. J., Labbé, C., & Grandjean, D. (2017). Rhythmic entrainment as a musical affect induction mechanism. Neuropsychologia, 96, 96–110. https://
Vuust, P., & Kringelbach, M. L. (2010). The Pleasure of Music. In M. L. Kringelbach & K. C. Berridge (Eds.), Pleasures of the Brain (pp. 255–269). Oxford: Oxford University Press.
Wang, T. (2015). A hypothesis on the biological origins and social evolution of music and dance. Frontiers in Neuroscience, 9(30), 1–10.
Wallaschek, R. (1891). On the Origin of Music. Mind, 16(63), 375–386. https://

Why do we want to talk? Evolution of neural substrates of emotion and social cognition

Katerina Semendeferi

University of California, San Diego

Cognitive and emotional processes are now known to be intertwined and thus the limbic system that underlies emotions is important for human brain evolution, including the evolution of circuits supporting language. The neural substrates of limbic functions, like motivation, attention, inhibition, evaluation, detection of emotional stimuli and others, have changed over time. Even though no new, added structures are present in the human brain compared to nonhuman primates, evolution tweaks existing structural systems with possible functional implications. Empirical comparative neuroanatomical evidence is presented here in support of such changes in the limbic system, including the amygdala and the orbitofrontal cortex. Given their possible functional significance, these alterations may further enable and enhance human interest and motivation to communicate beyond what is seen in other primates living in complex social groups. The argument here is that even though emotion processing is likely needed for increased social complexity independent of language, the reason why humans want to talk may be related in part to the enhancement of socioemotional processes resulting from the reorganization and rewiring of underlying neural systems, some of which are interconnected with the language areas. Neurodevelopmental disorders in humans affecting both language and sociability fuel such arguments.

Keywords: limbic system, ventromedial prefrontal cortex, orbitofrontal cortex, amygdala, striatum, motivation, inhibition, Williams syndrome

Introduction

What is behind the strong drive to interact and communicate with others and how has this strong desire influenced the evolution of human gestural communication and language? A small but increasing number of studies targets areas in the human

https://‍ © 2020 John Benjamins Publishing Company


and nonhuman ape and other primate brains that are part of neural systems involved in limbic functions. Even though the contribution of this line of work is not to offer direct insights on whether language evolution is largely driven by a desire for increased gestural and symbolic communication specifically in the human line, it provides critical evidence of derived changes to the brain after the last common ancestor with the chimpanzee (LCA-c) that would seem to be important for enhancing communication systems. The information provided here also draws attention to the fact that emotions, motivation and intentionality need to be taken into consideration in reconstructions of language evolution given that the underlying neural systems demonstrate features that are species specific. The field of neuropsychology has been yielding increasing evidence that language in humans depends on a set of neural systems that go beyond the classical language areas and that these neural territories exhibit individual variation. There is also evidence that regions of the brain that underlie linguistic function are involved in aspects of emotional processing. After LCA-c, the brain increased threefold in overall size (Falk, 2016), and the relative size of some of its anatomical components increased while that of others decreased. Subtle structural modifications accompanied changes in size, pointing to possible alterations in neural circuitry in both cortical and subcortical regions, such as changes in the number, size, and distribution of cells within them (Semendeferi et al., 2010) and the resulting changes in connectivity (see for example Aboitiz, Hecht, this volume). In human evolution, these changes in brain size and organization were potentially critical for systems that underlie cognitive functions, including aspects of language and emotions.
Even though neural structures involved in limbic functions, like motivation, attention, inhibition, evaluation, memory, detection of emotional stimuli, modulation of subsequent behavioral responses and others, were viewed historically as conserved in evolution (Barger et al., 2014), there is now increasing evidence to the contrary. Such evidence is relevant to earlier arguments favoring an intimate link between the evolution of cognitive functions, including language, and emotions (Armstrong, 1990). Reports from comparative human and nonhuman primate brain studies demonstrate significant changes in neural limbic systems including cortical and subcortical structures (Figure 1). Specific changes at the cellular and architectonic levels in multiple neural systems, including brain regions like the orbitofrontal cortex, frontal pole, amygdala and anterior cingulate that are linked intimately to systems involved in limbic functions, among others, need to be taken into consideration in models attempting to reconstruct human language evolution. Studies of neurodevelopmental disorders offer additional insights into the interplay between language and emotions (Järvinen et al., 2013). These include disorders with wide genetic etiologies, like autism spectrum disorder (ASD), but


Figure 1. Limbic structures shown on the mesial surface of a human brain (modified from Barger et al., 2014). Labeled structures: ventromedial frontal cortex, striatum, caudal orbital frontal cortex, septal nuclei, temporal polar cortex, amygdala, pyriform and entorhinal cortex, cingulate cortex, anterior thalamus, mammillary bodies, hippocampus. Some of these structures have been studied comparatively in humans, apes and other primates and are discussed in the text. See also Figure 2.

also disorders with more focused genetic underpinnings like Williams syndrome (WS), which is caused by a hemizygous deletion of about 26–28 genes on chromosome 7. Individuals with WS exhibit excessive sociability accompanied by a relative proficiency in expressive language and increased interest in music (see Seifert this volume on music and language), while other significant intellectual and nonverbal cognitive dimensions are impaired (Bellugi et al., 1999; Järvinen et al., 2013). What is of additional interest in the context of language evolution in the case of WS is that the affected chromosomal region is also known to have undergone a number of genomic changes in recent human evolution (Antonell et al., 2005). I argue here that species-specific brain alterations at the neuronal level support the idea that language-relevant functional changes on the path from LCA-c to Homo sapiens involve neural phenotypic changes not only in the language area homologues themselves, like Broca’s and Wernicke’s, but, equally important, in the ways humans handle emotions and navigate their social environment. This makes the neural substrates of emotional and linguistic capacities, including areas involved in face and mouth motor control that are intimately linked to limbic structures (Ferrari et al., 2017; Coudé & Ferrari, 2018), relevant to understanding language. Information from neurodevelopmental disorders that exhibit alterations in linguistic and emotional capabilities (Bellugi et al., 2007) as well as the underlying neural circuitry, discussed briefly below, can further inform reconstructions of


language evolution in LCA-c and Homo sapiens. The classical language areas in the context of evolution are addressed extensively by other papers in this workshop (see for example Aboitiz, Schoenemann, Hecht, Coudé) drawing on primary neuroprimatological data and related models of cognitive evolution (Arbib). This paper goes beyond the classical language areas and into neural structures connected with and supporting communication through limbic functions. The argument here is that the brain of Homo sapiens is more vulnerable to alterations, whether enhancements or disruptions, than the brains of apes and other primates, and that these alterations occur in neural systems controlling language and emotions more than in other parts of the brain. These alterations are the outcome of increases in absolute size that “carried along” changes in organization compared to smaller brains, like that of LCA-c, and also of epigenetic factors related to the effects of social environment, protoculture and prolonged development that were already in effect before the appearance of modern humans. The fronto-limbic circuitry, more so than the dorsolateral prefrontal cortex, primary motor, visual and sensory cortex and the main language areas in the frontal lobe, is changed in Homo sapiens, and the changes were not driven solely by brain size increase but by additional selection operating on a reorganized brain in a complex social environment.

Gestural communication, language and limbic neural substrates in human and nonhuman primates

Research on brain lesions, as well as neural disorders and imaging of control subjects, demonstrates the critical contribution of emotion processing in complex cognition (Damasio, 1994; LeDoux, 1996). It is now well accepted that emotions influence elements of cognition like attention, working memory and cognitive control, blurring the distinction between cognitive and emotional processes (Okon-Singer et al., 2015; Pessoa, 2013). Emotions constitute internal states controlled by the central nervous system and triggered by specific stimuli that are extrinsic or intrinsic to the organism (Anderson and Adolphs, 2014). Some features of emotions are involved in social communication, with animals signaling states that predict behaviors (Adolphs, 2017; Arbib and Fellous, 2004). Studies of language in humans reveal coactivation of sensory, motor and limbic networks pointing to a convergence between processing of semantic and emotional content across modalities of communication (Belyk et al., 2017), but that is not to claim that limbic changes and links between emotion and cognition are related to increased communication of emotional states per se or to speaking emotionally versus discussing emotionally charged events. Studies on macaque monkeys provide increasing evidence that limbic cortical and subcortical areas are recruited in motor activity and are interconnected


with areas in the lateral and ventromedial frontal cortex, including face and mouth motor areas (Ferrari et al., 2017). Such connections presumably contribute to the emotional and motivational component of decision making for actions. Overall, motor cortices in primates have a close association with the limbic system, including the anterior cingulate, where complete motor representations can be found that receive massive inputs from all parts of the limbic lobe including the basolateral complex of the amygdala (Heimer and Van Hoesen, 2006). Activation of connections between the temporal sensory association cortex, amygdala, and orbitofrontal cortex accompanies emotional arousal, while decisions for action in emotional situations may ultimately be directed from lateral prefrontal cortex, which is connected to the orbitofrontal cortex in a layer-specific manner (Barbas, 2015). The ventral premotor and precentral areas have ties with areas in the insula and the operculum, enabling fine tuning of facial expressions, gesturing and mirroring (Morecraft et al., 2015) that allow motor behavior to become meaningful. Ties with the anterior cingulate and insula support emotional and motivational aspects of motor function. These observations from the macaque monkey have implications for understanding possible precursors of phonemic and emotional components of language and communication in humans (Morecraft et al., 2015), given their presence in primate species separated from humans by more than twenty million years. Even though limbic structures have attracted less interest in primate comparative studies in the past, there is now increasing neuroanatomical information on species-specific comparisons between humans, apes and other primates on structures (Lew and Semendeferi, 2017) that are intimately connected with language areas and other higher cognitive integration regions.
Other than the olfactory bulbs, most if not all structures are bigger in absolute terms in the human brain compared to other primates, so one of the questions becomes which structures have changed in size as expected and which ones changed less or more than expected relative to the rest of the brain in human evolution. Posing this question does not take away from the functional importance of absolute size, but instead draws attention to parameters and structures that may be functionally significant in a bigger brain that has been reorganized (see for example Deaner et al., 2007, but also Krubitzer and Kahn, 2003 for perspectives on issues of brain size and organization in relation to function). The next question is what “size” represents, namely what has been modified in those structures (numbers of neurons, extent of dendritic branching, etc.) and which of these changes have some functional relevance. Limbic regions do appear to have changed, not only in absolute terms but also in relative size, after the last common ancestor with the chimpanzees (Lew and Semendeferi, 2017). These structures clearly underlie fundamental functions that involve aspects of detection, monitoring, motivation and emotions involved


in social behavior and language, many of which are also identified as selectively affected in some neurodevelopmental disorders.

Detection of the changing social environment and behavioral responses

The amygdala is critical in both the detection of the changing social environment and the behavioral response within the social group, and is thus a key component in social cognition (Adolphs, 2009). Primate studies have found a positive correlation between the volume of parts of the amygdala and social play frequency across species, while in humans amygdala volume has been positively correlated with social network size (Bickart et al., 2011). Even though some of the structures the amygdala is connected to, like the temporal cortex, have undergone expansion in human evolution (Semendeferi et al., 2010), the amygdala had been considered to be relatively conserved across mammals. While several features of the structure do appear to be phylogenetically retained and shared with non-human mammals, recent studies provide evidence that the amygdala has changed in human evolution after LCA-c (Barger et al., 2007; 2012). The amygdala is composed of several distinct nuclei. Three of them, the lateral, basal, and accessory basal nuclei, are most strongly implicated in social and emotional behaviors. The lateral nucleus is the primary entry point for sensory input from the thalamus and temporal cortex into the amygdala, while the basal and accessory basal nuclei receive intrinsic input from the lateral nucleus, as well as input from the orbitofrontal cortex and the anterior cingulate cortex (Stefanacci and Amaral, 2000). Volumetric comparison of several amygdaloid nuclei in humans and non-human apes reveals that the lateral nucleus in humans is larger than expected for an ape brain of human size (Figure 2), and is the largest nucleus of the basolateral division, while the basal nucleus is largest in all other apes (Barger et al., 2007). 
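The phrase "larger than expected for an ape brain of human size" refers to an allometric comparison. The sketch below illustrates one common way such an expectation can be computed: fit a log-log regression of structure volume on brain volume across non-human apes, extrapolate the fitted line to a human-sized brain, and express the observed human value as a percent deviation from that prediction. It is an illustration only and does not reproduce the analysis of Barger et al. (2007); the function names are hypothetical and every number is a synthetic placeholder, not a measurement.

```python
import math

def fit_loglog(brain_volumes, structure_volumes):
    """Least-squares line through log10(structure) vs log10(brain).

    Returns (slope, intercept) of the fitted allometric line.
    """
    lx = [math.log10(v) for v in brain_volumes]
    ly = [math.log10(v) for v in structure_volumes]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((x - mean_x) ** 2 for x in lx)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def expected_volume(brain_volume, slope, intercept):
    """Structure volume predicted by the fitted line for a given brain volume."""
    return 10 ** (intercept + slope * math.log10(brain_volume))

def percent_deviation(observed, expected):
    """Signed deviation of the observed value from the prediction, in percent."""
    return 100.0 * (observed - expected) / expected

# Synthetic placeholder data (NOT real measurements): the "ape" values lie
# exactly on the power law structure = 0.01 * brain ** 0.9, arbitrary units.
ape_brain = [100.0, 200.0, 400.0]
ape_structure = [0.01 * b ** 0.9 for b in ape_brain]

slope, intercept = fit_loglog(ape_brain, ape_structure)

human_brain = 1300.0                       # hypothetical brain volume
predicted = expected_volume(human_brain, slope, intercept)
observed = 1.3 * predicted                 # hypothetical value 30% above the ape trend
deviation = percent_deviation(observed, predicted)
print(f"slope={slope:.2f}, deviation={deviation:+.1f}%")
```

In this framing a positive deviation corresponds to an "increase" and a negative one to a "decrease" in the sense used in Figure 2, independent of the absolute size difference between human and ape brains.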
Furthermore, the lateral nucleus contains the greatest number of neurons in the human amygdala, while the basal nucleus contains the greatest number of neurons in the amygdala of all other apes (Barger et al., 2012). These findings suggest that while many features of the amygdala as a whole may be conserved, subtle specializations of the structure, which is directly involved in the detection and processing of social stimuli and connected with cortical regions involved with language processing (the primary target of connections arriving from the higher order visual and auditory processing areas of the temporal lobe), occurred over the course of human evolution and may have been selected for as an adaptation to an increasingly nuanced and complex social environment. While it is premature to speculate what exactly changed in emotion and social cognition in relation to language due to the increase in the lateral nucleus, this is one of the most solid pieces of direct







[Figure 2 diagram labels. Amygdala, lateral nucleus: volume increase, neuronal numbers increase. Amygdala, basal and central nucleus: volume decrease, neuronal numbers decrease. Orbitofrontal cortex: volume increase; BA13 decrease, BA13 increase. Anterior cingulate: density of spindle neurons increase, density of calretinin pyramidal neurons increase. Anterior insula: volume increase; density of CA1 pyramidal. Thalamus, anterior nucleus: volume increase, neuronal numbers increase. Striatum: volume decrease. Hippocampus: volume increase, density of CA1 pyramidal neurons decrease.]

Figure 2. Neuroanatomical specializations of neural substrates of emotion and social cognition in humans compared to great apes reflect evolutionary changes after the LCA with the apes. “Increase” or “decrease” refers to differences relative to what would be expected for an ape brain of human size. It does not refer to absolute differences, as the human brain is three times larger than that of an ape. It also does not refer to encephalization, which is a measure of total brain to body size relationships and is not addressed here. Specific functional attributes of these structures are discussed in the text.

evidence we have for increased emphasis in the neural regions linking language comprehension and the evaluation of the emotional salience of cortically derived sensory input (Barger et al., 2014) in the human brain after LCA-c. Abnormalities of amygdala structure, like volume, neuron number, neuron density, and volume of individual nuclei are a common feature of neurodevelopmental disorders in humans (Schumann and Amaral, 2006; Lew et al., 2017), and are associated with severe deficits in the social domain demonstrating that even slight changes to the amygdala are detrimental to one’s ability to navigate the human social environment. Neuroimaging and postmortem histological studies of ASD have found significant alterations to the amygdala, including an increase in amygdala volume in childhood that is no longer present in adolescence, and reduced neuron numbers in the lateral nucleus of the amygdala across age groups when compared to typically developing controls (Schumann and Amaral, 2006).


In WS, neuroimaging studies have demonstrated enlargement of the amygdala in adolescence and adulthood, and reduced activation of the amygdala in response to negative social stimuli. In contrast to ASD, microscopic investigations in WS revealed an increase in neuron numbers in the lateral nucleus of the amygdala (Lew et al., 2017). Behavioral studies suggest that individuals with WS and ASD both struggle with determining the saliency of conspecific stimuli in social interactions; reduced activation of the amygdala is observed in response to social stimuli in WS, while the opposite is the case in ASD. We hypothesized that these differences may be related to a dysfunction of developmental events and proposed a few possible genetic candidates in the WS deletion (Lew et al., 2017), including the transcription factor WBSCR14, which regulates tissue-specific gene expression controlling neurogenesis; Gtf2i, involved in the regulation of several genes that are critical to embryonic neural development; FZD9, involved in the timing of cell division and apoptosis; and PSD-95, which has been demonstrated to play a role in differential cellular morphology in the basolateral nuclei, but not the central nucleus, of the amygdala in PSD-95 mouse knockouts. The alterations observed in the lateral nucleus of the amygdala in both ASD and WS support the idea that quantitative differences in neural systems involved in social cognition and language have functional implications and can provide insights into the changed relationship between genotype and complex behaviors in modern humans after LCA-c.

Motivation, evaluation of error, modulation

The cingulate cortex integrates neural circuits critical for motivation and evaluation of error, and modulates cognitive, endocrine, and visceral responses to stimuli (Wicker et al., 2003). The anterior part of the cingulate has specific connections with the frontal cortex and the amygdala and arguments have been made in favor of its role in intentional communication and translation of intentions into actions with overlapping functions including affective, cognitive and motor components (Benga, 2005). Specifically, the ventral part of the anterior cingulate is connected to the basolateral amygdala nuclei and the hypothalamus, forming a neural system involved in appraisal and expression of limbic functions. Other parts of the anterior cingulate located dorsally have strong connections with the dorsolateral prefrontal cortex, and are more involved in regulation of emotional behavior (Etkin et al., 2011). Furthermore, voluntary control over the initiation and suppression of vocal utterances, in contrast to completely innate vocal reactions such as pain shrieking, relies on the mediofrontal cortex, including anterior cingulate (Arbib, 2012). But the precise way in which these areas link to language areas in the frontal cortex in humans is not clear, given that the above information comes from studies


on animal models (see below for discussion of the importance of techniques and homologies). Still, it has been argued that the evolution of vocal speech involved a shift in control from anterior cingulate to Broca’s area in order to include vocal elements in intentional communication (Benga, 2005). Morphologically, at least two types of neurons have been identified in human anterior cingulate and anterior insula that are of potential interest to cognitive evolution. One of these cell types is the spindle neuron (Von Economo, 1929), present in layer Vb of the anterior cingulate and the anterior insula in humans and other apes (Nimchinsky et al., 1999), as well as other mammals including macaques (Evrard et al., 2012). The function of spindle neurons is not known, but it is assumed to be related to socio-emotional intelligence because of their presence in neural areas associated with emotion and cognition in highly intelligent mammals with complex social structures (Lew and Semendeferi, 2017). In humans and other apes their density increases with decreasing phylogenetic distance to humans, and the volume of the spindle cell soma correlates positively with relative brain size, such that humans have the greatest number and volume of spindle neurons, followed by bonobos and chimpanzees (Allman et al., 2002). Spindle neurons are also selectively targeted in individuals with frontotemporal dementia, a degenerative brain disease that causes deterioration of social and emotional self-awareness, moral reasoning, empathy, and theory of mind (Seeley et al., 2006), further suggesting a role for these neurons in adaptive human social behavior. Another group of neurons, calretinin-containing pyramidal projection neurons, is present in the anterior cingulate in human and non-human apes only. They are thought to be evidence for adaptation of the limbic system after LCA-m related to articulated language and its emotional implications (Hof et al., 2001).

Feelings, body and mind integration, and empathic theory of mind

Damasio (1994) argued, based on evidence from studies of humans suffering from neurocognitive disorders, that bodily sensations integrate with emotions in the insula, contributing to the creation of feelings. Information from systems related to autonomic regulation, perception, emotion and cognition is integrated in functionally distinct regions of the insula that subserve sensorimotor, olfactogustatory, and cognitive and socio-emotional tasks in its posterior, central and anterior regions respectively. The anterior insula in particular is a site of integration of sensory, autonomic, emotional and cognitive processing (Craig, 2009). These functionally distinct areas are also reflected in the cytoarchitectonic organization of the insula: the posterior insula contains granular neocortex, which transitions to an intermediate zone of dysgranular cortex, while the anterior insula is agranular and more limbic-like. The anterior insula has significant


connectivity to other limbic regions, including the amygdala, temporal pole, orbitofrontal cortex, and anterior cingulate (Mesulam and Mufson, 1982) and provides a link between the mirror neuron system and emotion-processing that enables empathic theory of mind, including for example experiencing and observing disgust (Iacoboni and Dapretto, 2006). There are some species-specific differences in insular volume, including a relative increase in the human fronto-insula in the left hemisphere compared to chimpanzees (Lew and Semendeferi, 2017). Nevertheless, there is no correlation between the volume of the subdivisions (relative to whole brain size) and social group size, nor are there deviations from the overall scaling pattern in primates (Bauernfeind et al., 2013). With respect to the internal organization of the insula, spindle neurons are present in the fronto-insula in humans and several other species of primates and mammals. Density in the fronto-insula is highest in humans followed by gorillas, which is unlike what is seen in the anterior cingulate, where bonobos and chimpanzees have the highest density of these neurons after humans (Allman et al., 2010). One can argue for or against the significance of the density of these neurons in the great apes in light of variation in their social group size and complexity. Demonstrating a relationship between structure and function of neural features is complex, which is why neurodevelopmental disorders can fill in some of the gaps (see also below). Also, as mentioned above, the possible relevance of spindle neurons to social cognition is so far based on the fact that they are found in related brain regions. Either way, the differences in the distribution of these neurons between anterior cingulate and anterior insula in great apes with complex social interaction serve as a good reminder that information from all great ape species is valuable for reconstructing human brain evolution. 
The human anterior and mediodorsal thalamic nuclei are connected with other limbic structures like the mammillary bodies and the cingulate as well as the amygdala and the temporal cortex respectively. They are functionally linked to learning, memory, mediation of stress and anxiety and goal-directed behaviors. They were found to have more neurons than expected for an ape brain of human size (Armstrong, 1980). In contrast, neuron numbers of other thalamic nuclei implicated in sensory and motor function were as expected for an ape brain of human size (Armstrong, 1980). These findings suggest structural reorganization of the thalamus during hominoid evolution, with a greater emphasis on limbic roles of the thalamus. There is also evidence of microstructural differences in human brain evolution, including areas in the frontal lobe. One of them, BA 13, is a distinct limbic cortical region that makes up the core of the posterior orbitofrontal region and exhibits species-specific cytoarchitectonic changes in humans (Semendeferi et al., 1998). The frontal pole, BA10, while not limbic cortex as defined by connectivity


criteria of the limbic system, shares connections with limbic posterior orbitofrontal cortex, and demonstrates differences in humans, including relative expansion of the region after LCA-c as well as cellular differences related to increased cortico-cortical connectivity evidenced by decreased neuronal body density and increased arborization in the upper layers of the cortex (Semendeferi et al., 2001; 2010; Bianchi et al., 2013). There is also comparative information regarding neuronal populations in the cortex (Hanson et al., 2014; Hrvoj et al., 2017), in particular pyramidal neurons, which form the basic unit of cortical microcircuitry (DeFelipe et al., 2002). These neurons differ considerably in development and evolution (Hanson et al., 2014). Aspects of their dendritic morphology emerge and stabilize at different ages, with humans displaying a delayed maturation compared to other primates (Petanjek et al., 2008, 2011; Bianchi et al., 2013). In chimpanzee development, dendritic branching in the prefrontal cortex is reduced compared to areas processing primary information, but increases after the juvenile period. In humans, by contrast, growth continues into young adulthood (Bianchi et al., 2013; Petanjek et al., 2008). Additional time for neuronal growth also allows prolonged time to acquire learning skills and social behaviors toward the acquisition of cultural tools necessary for successful survival. Pyramidal neurons also display differences in neurodevelopmental disorders compared to control subjects, and these differences are specific to particular disorders, including ASD (Hutsler and Zhang, 2010) and WS (Hrvoj et al., 2017; Chailangkarn et al., 2016). Differences in neuronal body density in WS frontal cortex (Horton Lew et al., 2017) may be in part reflective of such differences in pyramidal neuron arborization, indicative of an increased emphasis on neuron to neuron communication in specific parts of the sociocognitive circuitry.

Emotion, social cognition and language evolution

The question of how preexisting neural substrates for emotional processing may have been adapted to also serve language is central to the theme of this volume. There is a large body of literature supporting the involvement of the frontal lobe, including specifically the posterior orbital and ventromedial regions and their intimate connections with other limbic structures, in emotional and sociocognitive functions (Adolphs, 2001) based on connectivity studies in macaques and neuroimaging studies in humans. In order to be meaningful, motor activity involved in facial expression, gestures and mirroring needs to be tied to feedback from areas that can process emotional aspects of communication and language (Morecraft et al., 2015). One region of convergence of language and emotions is suggested to be in the inferior frontal gyrus, adapted to serve language in humans by acquiring


semantic functions while serving mostly limbic functions in the evolutionary past (Belyk et al., 2017). Neurodevelopmental disorders that affect language and emotions involve alterations in the brain including the frontal lobe and limbic structures. For example, in Williams syndrome, individuals have reduced cortical surface area, with some areas affected more than others, such as the orbitofrontal cortex, superior parietal cortex, Sylvian fissure and temporal poles as well as the putamen and nucleus accumbens (Chun et al., 2017). Studies of the microanatomical structure reveal alterations in the underlying neural circuitry, including decreased dendritic branching of neurons in the prefrontal cortex (Hrvoj et al., 2017), increased neuron numbers in the lateral nucleus of the amygdala (Lew et al., 2017) and a decreased neuron to glia ratio in the caudate nucleus of the striatum (Hanson et al., 2017). Some of the affected structures are relevant to language and socioemotional processing, while others underlie other aspects of the WS behavioral phenotype, like visuo-spatial cognition. We began investigating the link between cognition, the brain and molecular genetics in humans with WS (Chailangkarn et al., 2016) through the use of neural progenitor cells and cortical neurons derived from induced pluripotent stem cells, along with cortical neurons obtained from postmortem brain tissue of WS and typically developing individuals. The study revealed that in WS neural progenitor cells have an increased doubling time and apoptosis and that this cellular phenotype points to a single gene candidate, frizzled 9 (FZD9). At the neuronal stage, some WS cortical neurons have increased arborization patterns and altered network connectivity that accompany the overall decreased cortical surface in individuals with this syndrome. These studies begin to address a series of questions, including some of the following. 
What are the specific cortical and subcortical language and limbic regions affected in WS? Are these regions different in typical WS (full genetic deletion) than in atypical WS (partial deletions in the same set of WS genes)? How do those compare to what is seen in ASD? How do the language and limbic areas of the typically developing human brain differ from those of the great apes and other primates, especially those living in complex social groups, like for example the baboons? One of the fundamental principles in evolution is that increases in body size are accompanied by increases in brain size and that such increases result in grade shifts in the brain-to-body ratio (encephalization), with some taxa, like the primates, being more encephalized than others. Increases in brain size carry along adjustments and structural changes to accommodate the larger numbers of neurons and increased connectivity in ways that maintain a functional brain (Striedter, 2004). At the same time selective pressures continue to be in place and shape the brain of each species in response to ecological adaptations that are species


specific. Brains of similar size have, for example, very different distributions of cortical areas, with echolocating ghost bats having large auditory cortex versus the highly visual short tail opossum having large visual areas, etc. (Krubitzer and Kahn, 2003), and hominids and other primates are no exception. So even though the same basic Bauplan (structural design) with the same building blocks is in place, a larger brain is not an enlarged version of a smaller brain, but a brain that has been tweaked in response to, at least, these two forces at play. It is not unreasonable to suggest that the same principles were in action in human language and socio-cognitive evolution. LCA-m had a larger brain than prosimians, accompanied by neural reorganization, increased social complexity and increased capability for communication. For LCA-c, in addition to larger brains, the evidence points to changes in hominoid (human and ape) brain organization favoring increased capabilities related to larger frontal lobes and related cognitive functions, including symbolic and protolinguistic capabilities, possibly more so than limbic system changes based on the evidence we have so far. The changes in the limbic system seem to set Homo sapiens apart from great apes, more so than the great apes from other primates. It is also not unreasonable to subscribe to the idea that protolanguage and basic socioemotional changes were present in hominids prior to Homo sapiens (see Arbib), supported merely on the basis of “collateral effects” of increased absolute brain size, followed by species-specific changes in the neural circuitry in response to selection. Early Homo was more encephalized and had larger brains than the LCA-c, and their brains had to be reorganized as they got bigger simply in order to stay functional. 
What was different in early hominids is that their larger reorganized brain was a primate brain, not a cetacean brain or an elephant brain, as it was a product of evolutionary grade shifts and reorganizational events specific to primates, animals that are generalists, behaviorally flexible and with long protracted developmental life histories. Their brain had increased neural computational power and was a “different” brain precisely because of the large size and also because some of the reorganization related to the change in posture and locomotion, all of which provided a novel neural scaffold for selective and epigenetic forces. In summary, I have argued here that the much larger, fundamentally-primate-in-its-organization brain of the highly encephalized early Homo sapiens reached an unprecedented threshold that combined increased neural power in a reorganized scaffold with many novelties in the fronto-temporo-limbic circuitry, thus enabling full-fledged language not only because of the symbolic capacity but also because the neural substrates of emotions changed. This slowly developing, very plastic brain became more prone to selective pressures related to increased social complexity in Homo sapiens as an adaptation to a new environment enabled by culture in addition to biology.


Towards a new road map

Studies in neuropsychology that demonstrate that emotions and language are intimately linked need to be combined with connectivity studies in nonhuman primates showing links between motor and limbic areas (Heimer and Van Hoesen, 2006). In the large brains of modern humans, neural loops involved in sociocognitive, language and emotional functions are affected in syndromes like WS and ASD. Structures like the amygdala, striatum and ventromedial and orbitofrontal cortex (and as a result their connections as well) differ among the descendants of LCA-c (great apes versus modern humans) and within modern humans (control, ASD, WS). Relevant here are studies of neurodevelopmental disorders like WS, where aspects of language as well as socioemotional processing and the related neural substrates are affected in tandem with the desire to communicate with others and the related limbic neural regions being enhanced (Jarvinen et al., 2013). Current imaging technologies applied directly on large cohorts of humans (as opposed to model animals) demonstrate links between linguistic, emotional and other behaviors and brain regions, but do not provide insights into the actual neurobiology of the regions identified in the images. These technologies need to be developed further to increase resolution on the neural regions and their connections. The same applies to all studies involving great apes, which are further limited by the challenge of imaging tools having been developed for human-sized brains, not ape brains. For neuroprimatology, it is important that such technologies become available for animals of various brain sizes, and that they are noninvasive on par with those applied to humans. Histological analysis of postmortem tissue has been critical for identifying what exactly in the human brain is recruited to produce language and emotional functions, but such studies are limited by small sample sizes in both human and comparative studies and by the fact that they are extremely labor intensive. 
To make progress on the question of the evolution of language and emotions, larger scale efforts are needed to encourage brain tissue donation and harvesting in humans and other primates. Longitudinal studies during the lifetime of individuals, whether human or ape, linking behavioral observations, repeated noninvasive imaging, genetic analysis, and eventually postmortem histological analysis of the brain, will be necessary even if only possible for smaller cohorts. Study of disorders, like Williams syndrome or autism, can lead the way on how language and emotions intersect, and allow for powerful connections between genes, brains and behaviors. Efforts in this direction need to be expanded and supported. Neuroprimatology has a lot to gain from such work and can follow the paradigms set in human studies, as opposed to experimental, invasive animal model


studies performed mostly on rats, rhesus monkeys or other animals, selected foremost in neuroscience laboratories for practical reasons and convenience. Identification of language areas and their interconnectivity or connectivity with other cortical and subcortical structures in apes or model animals involves the additional challenge of ensuring that what is compared is indeed homologous (see Arbib & Bota, 2003; Aboitiz, Schoenemann and others, this volume). Given the long list of criteria needed to identify homologies in the nervous system (Striedter, 2004), homologies are best defined in the context of the specific experiment and the technique used. Attention to the issue of homology is needed from those working on all of the above. Additional adaptive neural reorganizational events in areas of the limbic system that are selectively interconnected with language areas in the frontal and temporal lobes allowed for increased emphasis on attention, motivation and inhibition, which are crucial in a complex social environment. As discussed above, progress needs to be made in the currently available imaging and histological techniques to improve our ability to directly and accurately identify and compare changes in homologous neural regions and their connectivity. Selective pressures, like increased social group size and complexity, favoring the appearance of full-fledged language in large-brained hominids need to be investigated in reconstructions of hominid fossil sites.

Acknowledgements

I thank the Editor, reviewers and participants of the workshop for their constructive feedback on the ideas presented here. I also thank all the students, past and present, in my laboratory who produced some of the valuable empirical comparative data discussed here.

Funding

This research was supported in part by NIMH R56MH109587, R03MH103697 and Kavli Institute for Brain and Mind, UC San Diego. The paper is the outcome of a workshop funded by NSF Grant No. BCS-1343544 “INSPIRE Track 1: Action, Vision and Language, and their Brain Mechanisms in Evolutionary Relationship” (M.A. Arbib, Principal Investigator).

References

Adolphs, R. (2001). The neurobiology of social cognition. Current Opinion in Neurobiology, 11(2), 231–239.
Adolphs, R. (2009). The Social Brain: Neural Basis of Social Knowledge. Annu Rev Psychol, 60, 693–716.
Adolphs, R. (2017). How should neuroscience study emotions? By distinguishing emotion states, concepts, and experiences. Social Cognitive and Affective Neuroscience, 24–31.


Allman, J., Hakeem, A., Watson, K. (2002). Two phylogenetic specializations in the human brain. Neuroscientist 8 (4), 335–346.
Allman, J. M., Tetreault, N. A., Hakeem, A. Y., et al. (2010). The von Economo neurons in frontoinsular and anterior cingulate cortex in great apes and humans. Brain Struct. Funct. 214 (5–6), 495–517.
Anderson, D. J., & Adolphs, R. (2014). A Framework for Studying Emotions across Species. Cell, 157(1), 187–200.
Antonell, A., de Luis, O., Domingo-Roura, X., Pérez-Jurado, L. A. (2005). Evolutionary mechanisms shaping the genomic structure of the Williams-Beuren syndrome chromosomal region at human 7q11.23. Genome Res 15, 1179–1188.
Arbib, M. A. (2012). How the Brain Got Language: The Mirror System Hypothesis. New York & Oxford: Oxford University Press.
Arbib, M. A., & Bota, M. (2003). Language Evolution: Neural Homologies and Neuroinformatics. Neural Networks, 16, 1237–1260.
Arbib, M. A., & Fellous, J. M. (2004). Emotions: from brain to robot. Trends Cogn Sci, 8(12), 554–561.
Armstrong, E. (1980). A quantitative comparison of the hominoid thalamus: II. Limbic Nuclei anterior Principalis and Lateralis nucleus. Am. J. Phys. Anthropol. 52 (3), 43–54.
Armstrong, E. (1990). The limbic system and culture: an allometric analysis of the neocortex and limbic nuclei. Hum. Nat. 2, 117–136.
Barbas, H. (2015). General Cortical and Special Prefrontal Connections: Principles from Structure to Function. (Edited by: Hyman, S. E.) Annual Review of Neuroscience 38, 269–289.
Barger, N., Stefanacci, L., Semendeferi, K. (2007). A comparative volumetric analysis of the amygdaloid complex and basolateral division in the human and ape brain. Am. J. Phys. Anthropol. 134, 392–403.
Barger, N., Stefanacci, L., Schumann, C. M., et al. (2012). Neuronal populations in the basolateral nuclei of the amygdala are differentially increased in humans compared with apes: a stereological study. J. Comp. Neurol. 520 (13), 3035–3054.
Barger, N., Hanson, K. L., Teffer, K., Schenker-Ahmed, N. M., Semendeferi, K. (2014). Evidence for evolutionary specialization in human limbic structures. Front. Hum. Neurosci. 8, 1–17.
Bauernfeind, A. L., de Sousa, A. A., Avasthi, T., et al. (2013). A volumetric comparison of the insular cortex and its subregions in primates. J. Hum. Evol. 64 (4), 263–279.
Bellugi, U., Järvinen-Pasley, A., Doyle, T., Reilly, J., & Korenberg, J. (2007). Affect, social behavior and brain in Williams syndrome. Current Directions in Psychological Science, 5, 197–208.
Bellugi, U., Lichtenberger, L., Mills, D., Galaburda, A., Korenberg, J. R. (1999). Bridging cognition, the brain and molecular genetics: evidence from Williams syndrome. Trends Neurosci 22, 197–207.
Belyk, M., Brown, S., Lim, J., et al. (2017). Convergence of semantics and emotional expression within the IFG pars orbitalis. Neuroimage 156, 240–248.
Benga, O. (2005). Intentional communication and the anterior cingulate cortex. Interaction Studies, 6, 201–221.

Bianchi, S., Stimpson, C. D., Bauernfeind, A. L., Schapiro, S. J., Wallace, B. B., McArthur, M. M., Bronson, E., Hopkins, W. D., Semendeferi, K., Jacobs, B., Hof, P. R. and Sherwood, C. C. (2013). Dendritic morphology of pyramidal neurons in the chimpanzee neocortex: regional specializations and comparison to humans. Cerebral Cortex 23(10), 2429–2436.
Bickart, K. C., Wright, C. I., Dautoff, R. J., Dickerson, B. C., Barrett, L. F. (2011). Amygdala volume and social network size in humans. Nat. Neurosci. 14 (2), 163–164.
Chailangkarn, T., Trujillo, C. A., Freitas, B. C., Hrvoj-Mihic, B., Herai, R. H., Yu, D. X., Timothy T. Brown, Maria C. Marchetto, Cedric Bardy, Lauren McHenry, Lisa Stefanacci, Anna Järvinen, Yvonne M. Searcy, Michelle DeWitt, Wenny Wong, Philip Lai, M. Colin Ard, Kari L. Hanson, Sarah Romero, Bob Jacobs, Anders M. Dale, Li Dai, Julie R. Korenberg, Fred H. Gage, Ursula Bellugi, Eric Halgren, Katerina Semendeferi & Alysson R. Muotri. (2016). A human neurodevelopmental model for Williams syndrome. Nature 536, 338–343.
Chun, C. F., T. T. Brown, Hauke Bartsch, Joshua M. Kuperman, Donald J. Hagler Jr., Andrew Schork, Yvonne Searcy, Ursula Bellugi, Eric Halgren, Anders M. Dale. (2017). Williams syndrome-specific neuroanatomical profile and its associations with behavioral features. NeuroImage: Clinical 15, 343–347.
Coudé, G., & Ferrari, P. F. (2018). Reflections on the organization of the cortical motor system and its role in the evolution of communication in primates. Interaction Studies.
Craig, A. D. (2009). How do you feel now? The anterior insula and human awareness. Nat. Rev. Neurosci. 10, 59–70.
Damasio, A. R. (1994). Descartes’ Error. Grosset/Putnam, New York.
Deaner, R. O., Isler, K., Burkart, J., & van Schaik, C. (2007). Overall brain size, and not encephalization quotient, best predicts cognitive ability across non-human primates. Brain, Behavior and Evolution, 70(2), 115–24.
DeFelipe, J., Alonso-Nanclares, L., Arellano, J. I. (2002). Microstructure of the neocortex: comparative aspects. J Neurocytol 31, 299–316.
Etkin, A., Egner, T., Kalisch, R. (2011). Emotional processing in anterior cingulate and medial prefrontal cortex. Trends Cogn. Sci. 15 (2), 85–93.
Evrard, H. C., Forro, T., Logothetis, N. K. (2012). Von Economo neurons in the anterior insula of the macaque monkey. Neuron 74 (3), 482–489.
Falk, D. (2016). Evolution of brain and culture: the neurological and cognitive journey from Australopithecus to Albert Einstein. J Anthropological Sciences 94, 99–111.
Ferrari, P. F., Gerbella, M., Coudé, G., & Rozzi, S. (2017). Two different mirror neuron networks: The sensorimotor (hand) and limbic (face) pathways. Neuroscience 358, 300–315.
Hanson, K. L., Branka Hrvoj-Mihic and Katerina Semendeferi. (2014). A Dual Comparative Approach: Integrating Lines of Evidence from Human Evolutionary Neuroanatomy and Neurodevelopmental Disorders. Brain Behav Evol 84, 135–155.
Hanson, K. L., Lew, C. H., Hrvoj-Mihic, B., Groeniger, K. M., Halgren, E., Bellugi, U. and K. Semendeferi. (2017). Increased glia density in the caudate nucleus in Williams syndrome: implications for frontostriatal dysfunction in autism. Developmental Neurobiology, published online.


Heimer, L. and Van Hoesen, G. W. (2006). The limbic lobe and its output channels: implications for emotional functions and adaptive behavior. Neuroscience & Biobehavioral Reviews 30(2), 126–147.
Hof, P. R., Nimchinsky, E. A., Perl, D. P., Erwin, J. M. (2001). An unusual population of pyramidal neurons in the anterior cingulate cortex of hominids contains the calcium-binding protein calretinin. Neurosci. Lett. 307 (3), 139–142.
Horton Lew, C., C. Brown, U. Bellugi, and K. Semendeferi. (2017). Neuron density is decreased in the prefrontal cortex in Williams syndrome. Autism Research 10, 99–112.
Hrvoj-Mihic, B., Hanson, K. L., Lew, C. H., et al. (2017). Basal Dendritic Morphology of Cortical Pyramidal Neurons in Williams Syndrome: Prefrontal Cortex and Beyond. Frontiers in Neuroscience 11, 419.
Hutsler, J. J., Zhang, H. (2010). Increased dendritic spine densities on cortical projection neurons in autism spectrum disorders. Brain Res 1309, 83–94.
Iacoboni, M., Dapretto, M. (2006). The mirror neuron system and the consequences of its dysfunction. Nat. Rev. Neurosci. 7 (12), 942–951.
Jarvinen, A., Korenberg, J., Bellugi, U. (2013). The social phenotype of Williams syndrome. Current Opinion in Neurobiology 23, 1–9.
Krubitzer, L., Kahn, D. M. (2003). Nature versus nurture revisited: an old idea with a new twist. Progress in Neurobiology 70, 33–52.
LeDoux, J. (1996). The Emotional Brain. Simon & Schuster.
Lew, C. H., Semendeferi, K. (2017). Evolutionary Specializations of the Human Limbic System. In: Kaas, J. (ed.), Evolution of Nervous Systems 2e. vol. 4, pp. 277–291. Oxford: Elsevier.
Lew, C. H., Groeniger, K. M., Bellugi, U., Stefanacci, L., Schumann, C. M., K. Semendeferi. (2017). A postmortem stereological study of the amygdala in Williams syndrome. Brain Structure and Function, published online.
Mesulam, M. M., Mufson, E. J. (1982). Insula of the old world monkey III: efferent cortical output and comments on function. J. Comp. Neurol. 212 (1), 38–52.
Morecraft, R. J., K. S. Stilwell-Morecraft, J. Ge, P. B. Cipolloni, D. N. Pandya. (2015). Cytoarchitecture and cortical connections of the anterior insula and adjacent frontal motor fields in the rhesus monkey. Brain Research Bulletin 119, 52–72.
Nimchinsky, E. A., Gilissen, E., Allman, J. M., Perl, D. P., Erwin, J. M., Hof, P. R. (1999). A neuronal morphologic type unique to humans and great apes. Proc. Natl. Acad. Sci. U.S.A. 96 (9), 5268–5273.
Okon-Singer, H., Hendler, T., Pessoa, L., Shackman, A. (2015). The neurobiology of emotion-cognition interactions: fundamental questions and strategies for future research. Frontiers in Human Neuroscience, 9, 58.
Pessoa, L. (2013). The Cognitive-Emotional Brain: From Interactions to Integration. Cambridge, MA: MIT Press.
Petanjek, Z., Judas, M., Kostovic, I., Uylings, H. B. M. (2008). Lifespan alterations of basal dendritic trees of pyramidal neurons in the human prefrontal cortex: a layer-specific pattern. Cereb Cortex 18, 915–929.

Petanjek, Z., Judas, M., Simic, G., Rasina, M. R., Uylings, H. B. M., Rakic, P., Kostovic, I. (2011). Extraordinary neoteny of synaptic spines in the human prefrontal cortex. Proc Natl Acad Sci USA 108, 13281–13286.
Schumann, C. M., Amaral, D. G. (2006). Stereological analysis of amygdala neuron number in autism. J. Neurosci. 26 (29), 7674–7679.
Seeley, W. W., Carlin, D. A., Allman, J. M., et al. (2006). Early frontotemporal dementia targets neurons unique to apes and humans. Ann. Neurol. 60 (6), 660–667.
Semendeferi, K., Armstrong, E., Schleicher, A., Zilles, K., Van Hoesen, G. W. (1998). Limbic frontal cortex in hominoids: a comparative study of area 13. Am. J. Phys. Anthropol. 106 (2), 129–155.
Semendeferi, K., Armstrong, E., Schleicher, A., Zilles, K., Van Hoesen, G. W. (2001). Prefrontal cortex in humans and apes: A comparative study of Area 10. Am. J. Phys. Anthropol. 114, 224–241.
Semendeferi, K., Barger, N., Schenker, N. (2010). Brain reorganization in humans and apes. In: D. Broadfield, M. Yuan, N. Toth, and K. Schick (Eds), Human Brain Evolving. Stone Age Institute Press (4th volume). David Brown Book Company and Oxbow Books, pp. 119–155.
Stefanacci, L., Amaral, D. G. (2000). Topographic organization of cortical inputs to the lateral nucleus of the macaque monkey amygdala: a retrograde tracing study. J. Comp. Neurol. 421, 52–79.
Striedter, G. F. (2004). Principles of Brain Evolution. Sinauer Associates.
Von Economo, C. (1929). The Cytoarchitectonics of the Human Cerebral Cortex. Oxford University Press, Oxford, UK.
Wicker, B., Keysers, C., Plailly, J., Royet, J. P., Gallese, V., & Rizzolatti, G. (2003). Both of us disgusted in My insula: the common neural basis of seeing and feeling disgust. Neuron, 40(3), 655–664.

Mind the gap – moving beyond the dichotomy between intentional gestures and emotional facial and vocal signals of nonhuman primates
Katja Liebal and Linda Oña
Freie Universität Berlin

Despite the variety of theories suggesting how human language might have evolved, very few consider the potential role of emotions in such scenarios. The few existing theories jointly highlight that gaining control over the production of emotional communication was crucial for establishing and maintaining larger social groups. This in turn resulted in the development of more complex social emotions and the corresponding sophisticated socio-cognitive skills to understand others’ communicative behavior, providing the grounds for language to emerge. Importantly, these theories propose that the ability to control emotional communication is a uniquely human trait, an assumption that we will challenge. Taking a comparative approach, we discuss recent findings from behavioral and neurobiological studies of our closest relatives, the nonhuman primates, on the extent of control over their gestural, facial and vocal signals. This overview shows that research foci differ drastically across these modalities, which further reinforces the traditional dichotomy between emotional, involuntary facial and vocal expressions and intentionally, voluntarily produced gestures. Based on this brief overview, we point to gaps of knowledge in primate communication research and suggest how investigating emotional expressions in our closest relatives might enrich the road map towards the evolution of human language.

Keywords: language evolution, emotional control, intentional, gesture, facial expression, orofacial movements, vocalization, primates

Background

Language enables us to express a variety of emotions, either by communicating them directly (e.g., “I am afraid of giving talks”) or as multimodal concomitants of spoken language (e.g., a trembling or high-pitched voice while giving a talk).

© 2020 John Benjamins Publishing Company


However, despite the variety of theories on how language might have evolved, the potential role of emotions in such scenarios is only rarely considered (Koelsch et al., 2015). The few existing theories jointly highlight the importance of gaining control over emotional communication, and its significance for the evolution of uniquely human sociality and cognition as prerequisites for the emergence of language. After introducing two theories on language evolution, we elaborate on how they are supported by empirical evidence from comparative research on nonhuman primates (hereafter: primates). As intentional production is a key characteristic of human language, we aim to identify precursors to this ability by discussing recent findings from behavioral and neurobiological studies on the extent of primates’ control over their different communicative means. While these findings suggest that primates other than humans also have control over the production of at least some of their emotional expressions, they also point to the current dichotomy in primate communication research, which differentiates between emotional, involuntary facial and vocal expressions and intentionally, voluntarily produced gestures.

Scenarios of language evolution and the role of emotions

In the following, we will introduce two theories that propose different relationships between language and emotions. While Turner (1996) suggested that emotional control was an essential first step in the emergence of language, Jablonka and colleagues (2012) claimed that language emerged first, which was necessary to gain control over emotional communication. At this stage, we are merely reporting their main, partly very speculative claims, without evaluating existing evidence to confirm or reject their assumptions. Turner (1996) proposed that emotions played a critical part in the evolution of human sociality, which then prepared the grounds for the emergence of language. Thus, the uniquely human rich repertoire of complex emotions, together with the capacity for language, has been suggested to be an incidental by-product of other selection forces (Maryanski & Turner, 1992). Turner (1996) rejected the notion of humans as an “innately social species” and suggested that our hominoid ancestors were forced to live in more cohesive groups, because of the increased predation risk when their environment changed from woods to more open savanna. In this situation, emotions functioned as a crucial “compensatory mechanism” for the rather low sociality of our ancestors, by fostering cohesive group structures enabling bonding and “tie formation” between group members. Along those lines, Spoor and Kelly (2004) argue that “more primitive, automatic” processes like emotional contagion played an essential role in establishing and maintaining social cohesion, as this enabled the development of bonds among group members (also non-kin), and the coordination of group activities by creating “shared affects” (achieved by


mimicking others’ emotional behavior, such as vocalizations or facial expressions), which in turn enabled cooperation and “…to work together in the pursuit of shared desired outcomes” (Spoor & Kelly, 2004, p. 401). Others, however, argue that the evolution of complex group structures resulted in specific demands regarding the involuntary communication of emotional states, as this might be disadvantageous in some situations (Cosmides & Tooby, 2000). Therefore, gaining control over emotional expressions, which are largely unconnected to the corresponding emotional state, was a crucial step in human evolution (Fridlund, 1994). The need to live in more cohesive groups represented the selection pressure to evolve a more varied emotional repertoire, together with the ability to control the expression of emotions, to maintain group cohesion, but also to not attract predators by “noisy emotional responses” and to hunt effectively (Turner, 1996). On a neurobiological level, this resulted in the evolution of cortical “integration areas” (as reliance on emotional communication required the capacity to understand others’ emotional expressions, which in turn required the efficient integration of different types of sensory information), and connections between these cortical areas and the limbic system (which enabled the increasing control over emotional expressions). These social and neurobiological developments prepared the grounds for language to evolve, as increasingly complex hominid groups required “more enhanced communication” (Turner, 1996). Jablonka and colleagues (2012) suggested a different scenario and proposed that language was crucial for the development of emotional control, specifically for inhibiting certain emotions. This process of “self-domestication” supported the emergence of a rich, uniquely human set of social emotions, which played an important role in maintaining social groups and in regulating cooperative activities, like hunting. 
Increased control of emotions was also a prerequisite for the development of different social and technological practices, such as alloparenting and tool-making, which are considered important for the evolution of human cognition (Jablonka et al., 2012). Thus, as suggested by Turner (1996), emotional control together with a set of uniquely human social emotions was necessary for maintaining social groups. However, unlike Turner (1996), Jablonka and colleagues emphasize that language was necessary for gaining control over emotions, since “…it could be advantageous for the signs themselves (unlike iconic representational signs, such as onomatopoeic words) not to carry any inherent emotional baggage” (Jablonka et al., 2012, p. 2157). Scenarios proposing the co-evolution of language and emotional communication further suggest that language not only supported the development of new emotions and the sharing of these emotional experiences with others (Jablonka et al., 2012), but also played an important role in disambiguating emotional signals (Feldman Barrett, Lindquist, & Gendron, 2007).


Despite their different approaches to the role of emotions in language evolution, both scenarios point to the importance of gaining control over emotions to develop more varied repertoires of emotions, particularly social emotions, to establish or maintain cohesive group structures. Both scenarios claim that these skills (controlled and rich emotional communication, in addition to the capacity for language) are uniquely human characteristics, separating humans from other primates (Jablonka et al., 2012; Turner, 1996). However, primates’ abilities to control their communication and specifically their emotional expressions were not discussed in these theoretical accounts of language evolution. We therefore aim to provide an overview of primates’ abilities to control the production of their different communicative means, including gestures, facial expressions, and vocalizations, by discussing both behavioral and neuroscientific studies. First, however, we will briefly introduce the comparative approach to language evolution to better understand why research foci differ drastically in primate communication research across the gestural, facial, and vocal modalities.

Comparative approaches to language evolution

Many scholars have argued that it is highly unlikely that a trait as complex as language evolved from scratch in the human lineage only (Pinker & Bloom, 1990), and that “building blocks” to language were already present in our shared common ancestor (Arbib, 2005). Proponents of this continuity approach suggest that a comparative approach addressing the communicative abilities of other species, particularly primates, is useful to identify such potential precursors to human language (Boe, Fagot, Perrier, & Schwartz, 2018). However, which aspects of primate communication are studied as potential building blocks for language heavily depends on which communicative modality is considered the most promising candidate for the origin of human language (Slocombe, Waller, & Liebal, 2011). Thus, comparative researchers assuming a vocal origin are mostly interested in functionally referential vocalizations. These calls are produced in specific contexts (e.g., predation), in response to a specific stimulus (the “referent”, e.g., a predator), and receivers of these calls show a specific, stimulus-independent response, as they also respond in the absence of the eliciting stimulus (e.g., the predator) (Evans, 1997). Vocal researchers investigate whether primates use their calls to refer to specific events or objects in their environment, and whether they combine them into meaningful sequences (Arnold & Zuberbühler, 2006). As they focus on the calls’ influence on the recipient’s behavior, it is of less importance whether such vocalizations are voluntarily produced. In contrast, scholars supporting a gestural origin focus on the signaler’s behavior. They investigate whether primates have control over the production of their gestures and


whether signalers adjust their gesture use to the recipient’s behavior and the social context. Much attention is currently paid to the question of how gestures are acquired over a lifetime (Liebal, Schneider & Lembeck, 2018), as in contrast to vocalizations, new gestures can be incorporated into individual repertoires. Related to this, researchers study the variability of gestural repertoires within and between different groups to investigate if and how gestures are transmitted to the next generation (Call & Tomasello, 2007). Finally, orofacial movements receive increasing attention as they connect the motor-visual and vocal-auditory modalities and may thus represent the evolutionary link between gesture and language. Researchers interested in orofacial movements therefore investigate the rhythmic patterns of mouth movements, the integration of their visual and auditory components, and the extent of control over their production (Bergman, 2013).

Emotional and intentional communication in nonhuman primates

In the following, we will discuss the extent of control over signal production separately for each modality, by evaluating behavioral and neuroscientific studies on primate communication. The term “intentional” is defined as voluntarily produced and purposeful, goal-directed behavior (Benga, 2005), while “emotional” is used to refer to involuntary, more reflexive, automatic actions, although we will show that this term is conceptualized very differently across studies.

Facial expressions

Facial expressions are often referred to as important means of conveying emotional information (Darwin, 1889/1998). Note, however, that “emotional” can mean different things, which are not mutually exclusive. First, it can refer to the fact that at least in humans, facial expressions are often linked to specific emotional states (Ekman, 1992). Second, “emotional” is used in the sense that signalers have little to no control over the production of a facial movement. In the following, we will address both issues. In primates, it is largely unknown whether recipients associate specific emotional states with conspecifics’ facial expressions (Parr, Waller, & Fugate, 2005), although it has been shown, for example, that crested macaques can predict the outcomes of social interactions based on others’ facial expressions (Waller, Whitehouse, & Micheletta, 2016). For the human observer, it might even be misleading to infer specific emotional states from primates’ facial expressions. Thus, even if primate facial expressions share similar structural properties with those of humans, they may have different functions across species. Therefore, researchers often deliberately avoid assigning specific emotional states to primates’ facial


expressions. Instead, they use modified versions of the Facial Action Coding System (FACS) (e.g., chimpFACS: Vick, Waller, Parr, Smith Pasqualini, & Bard, 2007; gibbonFACS: Waller, Lembeck, Kuchenbuch, Burrows, & Liebal, 2012) originally developed for humans (Ekman & Friesen, 1978) to categorize facial movements based on a set of minimal observable criteria (“action units”) caused by specific muscle contractions. As this coding method provides very detailed information about structural properties of primate facial movements, it is increasingly used to investigate whether primates modify them depending on the social context, which would indicate that they have some control over their production. For example, Waller and colleagues (2016) showed that orangutans adjust their facial expressions to the recipient’s attentional state, as variants of their ‘play face’ were “more complex” when the recipient was visually attending. Scheider and colleagues (2016) found that facial expressions of different small ape species lasted longer when they were used in social contexts than in non-social contexts. Although this might indicate that orangutans and small apes are able to control at least some of their facial expressions, both studies acknowledge that lower-level explanations are possible, such as increased levels of arousal when individuals are facing each other, causing facial expressions to last longer than in non-facing situations. Interestingly, there are rare reports that both monkeys and apes manually conceal their involuntary facial expressions. For example, Thunström and colleagues (2014) observed a male Barbary macaque who repeatedly covered his face when showing a ‘play face’ or when he screamed. Tanner and Byrne (1993) described a gorilla who covered her ‘play face’ with her hand to hide it, as she had no control over the production of this facial expression itself. 
This may suggest that primates are aware of the possible consequences that showing a facial expression might have, as they manually compensate for not being able to inhibit their facial movements; however, it is important to highlight that the function of such “face-covering” is not well understood. To complicate issues even further, some scholars distinguish between two types of facial expressions that seem to differ regarding the signaler’s extent of control over their production. Orofacial movements, which may co-occur with vocalizations, are often referred to as intentionally produced signals, as compared to “spontaneous emotional expressions”, shown in contexts of high arousal, such as play- or pout-faces. While this is not a clear-cut distinction, the identification of two separate neuroanatomical routes seems to support the notion of two “types” of facial movements, which differ in the extent of volitional control. Spontaneous emotional facial movements are mediated by the facial nucleus in the pons, while different parts of the motor cortex are involved in the production of voluntary facial movements (Müri, 2016; Parr et al., 2005). Interestingly, the facial nucleus of


great apes and humans is larger than expected based on phylogenetic regression (Sherwood et al., 2005), indicating the greater differentiation of facial muscles, which possibly reflects greater “emotional control” over the corresponding facial movements in these species (Parr et al., 2005). Furthermore, as in humans, the primary orofacial motor cortex of great apes is enriched with neurofilament protein, possibly indicating more voluntary control over the production of orofacial movements (Sherwood, Holloway, Erwin, & Hof, 2004). While these findings seem to suggest that both humans and apes differ from monkeys in their increased control over facial movements, specifically orofacial movements, there is growing evidence that monkeys also have control over their orofacial movements (Ferrari, Gerbella, Coudé, & Rozzi, 2017). Together these findings show that, in both monkeys and great apes, different brain areas, including cortical structures, are involved in the production of some of their facial movements, and that this ability was most likely already present in the shared common ancestor of monkeys and humans (LCA-m).

Vocalizations

Primate vocalizations, like facial expressions, are often referred to as emotional signals, as specific call types seem to be linked to social contexts characterized by specific emotional states (for example, in bonobos, ‘threat barks’ occur during aggressive attacks, ‘pant grunts’ in submissive greetings, and ‘pant-laughing’ in positive social interactions; de Waal, 1988). In addition to this information about the emotional valence of an interaction, many vocalizations provide semantic information about specific referents (e.g., predator species or food types), and are therefore categorized as functionally referential signals. Regarding the extent of control of such signals, it seems that both emotional and semantic information is not intentionally provided by the calling individual, but rather extracted by the recipient (Seyfarth & Cheney, 2003). Furthermore, vocal repertoires seem to be innate and species-specific, as there is little variability between individual repertoires and only limited ability to learn new vocalizations (Owren, Dieter, Seyfarth, & Cheney, 1992). This notion of vocalizations as innate, emotional and involuntarily produced signals was supported by earlier neurobiological studies reporting that primate vocal production is largely mediated by several motor nuclei in the pons and the reticular formation in the medulla, with no direct connections to cortical motor areas (Jürgens, 2002). Nonetheless, recent behavioral and neurobiological studies with both monkeys and great apes challenge this traditional perspective (Gavrilov, Hage, & Nieder, 2017; Ghazanfar & Eliades, 2014). First, Schel et al. (2013) showed that chimpanzees’ use of alarm calls, when they encounter a snake, meets several criteria of intentional production, as they are directed towards specific, uninformed


individuals and are no longer produced once all individuals know about the predator’s presence. This indicates that at least in chimpanzees, alarm calls are not just emotional responses upon the discovery of predators, but that these apes have control over the production of these calls (see also Crockford, Wittig, Mundry, & Zuberbühler, 2012). Second, considering neurobiological substrates, different studies found evidence for cortical control in the vocal production of rhesus monkeys, as neurons in the ventral premotor cortex were active when they produced a conditioned coo-call, but not when they uttered these calls spontaneously, suggesting that they were at least partly capable of controlling their larynx (Coudé et al., 2011). Note that the monkeys did not acquire a new vocalization; instead, the rate of their species-specific coo-calls was increased by training. Third, there is evidence that chimpanzees can acquire novel sounds, such as ‘raspberries’ or ‘whistles’, suggesting that they have some control over their supra-laryngeal vocal tract (Zuberbühler, 2015). Interestingly, in chimpanzees, these learned attention-getting sounds seem to be processed in different brain regions than species-specific vocalizations. These brain structures include the inferior frontal gyrus (homologous to Broca’s area in humans), which is also involved in the production of manual pointing gestures (Taglialatela, Russell, Schaeffer, & Hopkins, 2008). This shows that it is important to differentiate vocalizations (e.g., ‘screams’) from sounds (e.g., ‘raspberries’), since they engage different anatomical structures and might therefore differ in the extent of voluntary control of their production. 
Furthermore, as suggested by Gruber and Grandjean (2017), even if vocalizations convey emotional information, this does not necessarily mean that callers have no control over their production, as some studies highlight the flexible use of at least some vocalizations across different social contexts (Clay, Archbold, & Zuberbühler, 2015).

Gestures
Primate gestures include signals of different sensory modalities, such as distant visual gestures (e.g., ‘extend arm’), tactile gestures involving physical contact between the interacting individuals (e.g., ‘slap’), and auditory gestures, which produce a sound without engaging the vocal folds (e.g., ‘hand clap’). Gestures are defined as purposeful behaviors, used to achieve different social goals by changing the recipients’ behavior (Call & Tomasello, 2007). Gestures are therefore, by definition, intentional signals. Different criteria, adopted from research into gesture use in pre-linguistic children (Bates, Benigni, Bretherton, Camaioni, & Volterra, 1979), are applied to identify intentional gesture use in primates. For example, gestures are only produced in the presence of an audience, and their use is adjusted to the attentional state and/or response of the recipient, indicating their potential for flexible use (Leavens, Russell, & Hopkins, 2005). Although there is little agreement over which and how many of these criteria are necessary and sufficient to define a gesture (Liebal, Waller, Burrows, & Slocombe, 2013), this definition explicitly excludes “…inflexible expressions of […] emotional states….” (Liebal, Pika, & Tomasello, 2004). One exception is the study by Roberts and Roberts (2016), which considers the emotional states of both senders and recipients in its analysis of chimpanzee gestures. They argue that the intensity of arousal has an impact on the behavioral coordination between the interacting individuals: in contexts of high arousal, “high-intensity communication” in the form of tactile and auditory gestures is used more frequently between individuals with weaker bonds, as it requires the recipient’s immediate response with limited opportunities for negotiation. In contrast, low-arousal signals, which are mostly visual gestures, contain less information about the signaler’s emotional state. They require more contextual information to respond appropriately and are therefore preferentially used within dyads with close bonds, as this makes it easier to obtain such information (Roberts & Roberts, 2016). Regarding neurobiological substrates of gestures, most research has focused on the question of whether the brain areas mediating gesture production in primates are homologous to human brain regions involved in language processing (e.g., Broca’s area), and whether the production of manual gestures is lateralized as found in humans. Much of this research has been inspired by the Mirror System Hypothesis (Arbib, 2012), which was developed based on the observation that neurons in the F5 region of the macaque brain are active both when monkeys observe manual actions or communicative mouth movements and when they perform these behaviors themselves (Di Pellegrino, Fadiga, Fogassi, Gallese, & Rizzolatti, 1992; Ferrari, Gallese, Rizzolatti, & Fogassi, 2003).
Interestingly, the F5 region of macaques and Broca’s area in humans are homologous structures. As Broca’s area is not only involved in language production, but also in the execution of arm movements, these findings seem to support theories of a gestural origin of human language and suggest that its neurobiological foundations were already present in the last common ancestor of humans and monkeys (Rizzolatti & Arbib, 1998). Further support is provided by studies with great apes, who easily learn to point in interactions with human experimenters. Chimpanzees, who preferentially point with their right hand, have a larger inferior frontal gyrus in their left hemisphere (Taglialatela, Cantalupo, & Hopkins, 2006), and parts of this brain area (together with other cortical and subcortical structures) are activated during their production of pointing gestures (Taglialatela et al., 2008). This finding seems to confirm the close evolutionary link between manual actions and language production, as chimpanzees’ inferior frontal gyrus and humans’ Broca’s area are homologous structures, and because the lateralization of chimpanzee pointing gestures suggests that this lateralized manual communication system was most likely present in the last common ancestor of chimpanzees and humans.

How can comparative research on emotions contribute to theories of language evolution?
Theories that integrate emotions into scenarios of language evolution highlight that emotional control was an essential step in the emergence of language and claim that this capacity is uniquely human. We aimed at challenging this assumption by considering the communicative abilities of other primates, with a focus on the extent of control over their gestural, vocal and facial signals. As intentional production is one of the key features of human language, evidence for voluntary control over their different communicative means would indicate that precursors to this ability are shared with other primates. We showed, first, that research foci differ drastically across modalities, as intentional and flexible production is frequently studied in the gestural modality, while vocal and facial research considers these aspects only to a very limited extent. This often results in the notion of facial expressions and vocalizations as “emotional” signals, although there is increasing behavioral and neurobiological evidence that challenges the traditional distinction between intentional and emotional signal types. Second, it became obvious that across studies, the term “emotional” is used very differently. Although vocalizations and facial expressions are described as “emotional signals”, researchers avoid attributing specific emotions to these signals and use “emotional” as an equivalent either to high arousal or to involuntary, automatically produced behavior. In other words, the exact nature of the emotional component in primate communication remains unclear. Therefore, in this final section, we will provide some suggestions for how to tackle these issues and point out how this will contribute towards a new road map of language evolution.
We suggest that, to enrich the current version of the road map, it is important to consider whether the control over emotional communication – evident in the inhibition or modification of emotional expressions – is unique to humans, or whether this capacity is shared with other primates and thus was already present in the last common ancestor of humans and great apes (LCA-c), or even earlier in the common ancestor of humans and monkeys (LCA-m). Our overview indicated that there is evidence for the intentional production of at least some vocalizations and maybe also facial expressions in chimpanzees and potentially also in monkeys; however, studies are difficult to compare, as they differ in their use of the term “emotional” and in the criteria used to identify intentional use, and because they usually focus on one modality only. Therefore, to investigate whether primates have control over their communicative means, it is important to systematically compare different species of monkeys and great apes to understand when this capacity for intentional production emerged over evolutionary time. Furthermore, one could argue that “intentional control of communication” is a different issue from “intentional control over emotional communication”. As emotional control represents a crucial prerequisite for maintaining larger groups in evolutionary scenarios of language evolution, it could be tested whether those primate species living in groups with more complex social structures show a more varied repertoire of emotional expressions (as suggested by Dobson, 2009) and are also able to inhibit the expression of certain emotions if it might be disadvantageous for them (Wilson, Hauser, & Wrangham, 2007) compared to those species with less complex social structures. First, however, comparative researchers need to define explicitly what they mean when using the term “emotional” in the context of studying communication. For example, some have suggested that facial expressions function as “metacommunicative” signals (Bateson, 1955), specifically in the play context, to make sure that gestures like ‘hit’ are indeed perceived as playful and not as aggressive behavior (Bekoff & Allen, 1997). In such instances, the facial expression represents an emotional expression and not an intended signal (Chevalier-Skolnikoff, 1994). Along the same lines, it has been suggested that emotional signals are honest signals, because in contrast to intended communication, they are involuntary signals and therefore cannot be faked (Dezecache, Mercier, & Scott-Phillips, 2013). Thus, while intentional signals may offer the possibility to withhold information or to use a “wrong” signal to deceive recipients, the co-occurring, spontaneous facial expression will incidentally inform the recipient about the true, underlying intention of this communicative attempt.
One step towards investigating this issue is to study combinations of different modalities to investigate whether the use of and response to a combination of an intentional gesture with an emotional expression (facial or vocal) is different from when this gesture is produced in isolation. This will not enable us to determine the underlying emotional state; however, we may get answers to the question of whether emotional expressions add information to the gesture or even change its original meaning (Oña et al., in prep). Finally, both Turner (1996) and Jablonka et al. (2012) mentioned many other aspects that might be important for the evolution of language, which could not be addressed in more detail in this paper. These include, for example, rich emotional repertoires of more complex social emotions (e.g., gratitude and pride), but also an increased “social awareness” to efficiently identify others’ emotional expressions, in addition to social and technical practices, such as allo-parenting and toolmaking, which might have contributed to the evolution of language in different ways. Future studies need to systematically compare across primate species to investigate the exact nature of the relationships between social and communicative complexity, to determine which of these different aspects are unique to humans, and which are shared with other species, in order to develop more comprehensive theories of language evolution that explicitly integrate emotions in such scenarios.

Funding
This research was supported by the Excellence Initiative of the German Research Foundation and the ERC project “The Grammar of the Body: Revealing the Foundations of Compositionality in Human Language (GRAMBY, 340140)”, directed by Wendy Sandler, University of Haifa, Israel. The paper was prepared for a workshop funded by NSF Grant No. BCS-1343544 “INSPIRE Track 1: Action, Vision and Language, and their Brain Mechanisms in Evolutionary Relationship”.

References
Arbib, M. A. (2005). From monkey-like action recognition to human language: An evolutionary framework for neurolinguistics. Behavioral and Brain Sciences, 28(2), 105–124.
Arbib, M. A. (2012). How the Brain Got Language: The Mirror System Hypothesis (Vol. 16). New York & Oxford: Oxford University Press.
Arnold, K., & Zuberbühler, K. (2006). Language evolution: Semantic combinations in primate calls. Nature, 441(7091), 303.
Bates, E., Benigni, L., Bretherton, I., Camaioni, L., & Volterra, V. (1979). The Emergence of Symbols: Cognition and Communication in Infancy. New York: Academic Press.
Bateson, G. (1955). A theory of play and fantasy. Psychiatric Research Reports, 2(39), 39–51.
Bekoff, M., & Allen, C. (1997). Intentional communication and social play: How animals negotiate and agree to play. In M. Bekoff & J. A. Byers (Eds.), Animal play: Evolutionary, comparative and ecological perspectives (pp. 97–114). Cambridge, New York: Cambridge University Press.
Benga, O. (2005). Intentional communication and the anterior cingulate cortex. Interaction Studies, 6(2), 201–221.
Bergman, T. J. (2013). Speech-like vocalized lip-smacking in geladas. Current Biology, 23(7), R268–R269.
Boe, L., Fagot, J., Perrier, P., & Schwartz, J.-L. (2018). Origins of human language: Continuities and discontinuities with nonhuman primates. Frankfurt/Main: Peter Lang.
Call, J., & Tomasello, M. (Eds.). (2007). The gestural communication of apes and monkeys. New York: Lawrence Erlbaum Associates.
Chevalier-Skolnikoff, S. (1994). The primate play face: A possible key to the determinants and evolution of play. Rice University Studies, 60(3), 9–29.
Clay, Z., Archbold, J., & Zuberbühler, K. (2015). Functional flexibility in wild bonobo vocal behaviour. PeerJ, 3, e1124.
Cosmides, L., & Tooby, J. (2000). Evolutionary psychology and the emotions. In M. Lewis & J. M. Haviland-Jones (Eds.), Handbook of Emotions (pp. 91–115). New York: The Guilford Press.
Coudé, G., Ferrari, P. F., Rodà, F., Maranesi, M., Borelli, E., Veroni, V., … Fogassi, L. (2011). Neurons controlling voluntary vocalization in the macaque ventral premotor cortex. PLoS ONE, 6(11), e26822.
Crockford, C., Wittig, R. M., Mundry, R., & Zuberbühler, K. (2012). Wild chimpanzees inform ignorant group members of danger. Current Biology, 22(2), 142–146.
Darwin, C. (1889/1998). The Expression of Emotion in Man and Animals (3rd ed.). London: Harper Collins.
de Waal, F. B. M. (1988). The communicative repertoire of captive bonobos (Pan paniscus) compared to that of chimpanzees. Behaviour, 106(3), 183–251.
Dezecache, G., Mercier, H., & Scott-Phillips, T. C. (2013). An evolutionary approach to emotional communication. Journal of Pragmatics, 59, 221–233.
Di Pellegrino, G., Fadiga, L., Fogassi, L., Gallese, V., & Rizzolatti, G. (1992). Understanding motor events: a neurophysiological study. Experimental Brain Research, 91(1), 176–180.
Dobson, S. D. (2009). Socioecological correlates of facial mobility in nonhuman anthropoids. American Journal of Physical Anthropology, 139(3), 413–420.
Ekman, P. (1992). An argument for basic emotions. Cognition and Emotion, 6, 169–200.
Ekman, P., & Friesen, W. V. (1978). Facial Action Coding System. Palo Alto, CA: Consulting Psychology Press.
Evans, C. S. (1997). Referential signals. In D. H. Owings, M. D. Beecher, & N. S. Thompson (Eds.), Perspectives in Ethology (Vol. 12: Communication, pp. 99–143). New York & London: Plenum Press.
Feldman Barrett, L. F., Lindquist, K. A., & Gendron, M. (2007). Language as context for the perception of emotion. Trends in Cognitive Sciences, 11(8), 327–332.
Ferrari, P. F., Gallese, V., Rizzolatti, G., & Fogassi, L. (2003). Mirror neurons responding to the observation of ingestive and communicative mouth actions in the monkey ventral premotor cortex. European Journal of Neuroscience, 17(8), 1703–1714.
Ferrari, P. F., Gerbella, M., Coudé, G., & Rozzi, S. (2017). Two different mirror neuron networks: The sensorimotor (hand) and limbic (face) pathways. Neuroscience, 358, 300–315.
Fridlund, A. (1994). Human Facial Expression: An Evolutionary View. San Diego: Academic Press.
Gavrilov, N., Hage, S. R., & Nieder, A. (2017). Functional specialization of the primate frontal lobe during cognitive control of vocalizations. Cell Reports, 21(9), 2393–2406.
Ghazanfar, A. A., & Eliades, S. J. (2014). The neurobiology of primate vocal communication. Current Opinion in Neurobiology, 28, 128–135.
Gruber, T., & Grandjean, D. (2017). A comparative neurological approach to emotional expressions in primate vocalizations. Neuroscience & Biobehavioral Reviews, 73, 182–190.

Jablonka, E., Ginsburg, S., & Dor, D. (2012). The co-evolution of language and emotions. Philosophical Transactions of the Royal Society B: Biological Sciences, 367, 2152–2159.
Jürgens, U. (2002). Neural pathways underlying vocal control. Neuroscience & Biobehavioral Reviews, 26(2), 235–258.
Koelsch, S., Jacobs, A. M., Menninghaus, W., Liebal, K., Klann-Delius, G., von Scheve, C., & Gebauer, G. (2015). The quartet theory of human emotions: an integrative and neurofunctional model. Physics of Life Reviews, 13, 1–27.
Leavens, D. A., Russell, J. L., & Hopkins, W. D. (2005). Intentionality as measured in the persistence and elaboration of communication by chimpanzees (Pan troglodytes). Child Development, 76(1), 291–306.
Liebal, K., Pika, S., & Tomasello, M. (2004). Social communication in siamangs (Symphalangus syndactylus): Use of gestures and facial expressions. Primates, 45(1), 41–57.
Liebal, K., Schneider, C., & Errson-Lembeck, M. (2018). How primates acquire their gestures: evaluating current theories and evidence. Animal Cognition, 1–14.
Liebal, K., Waller, B. M., Burrows, A. M., & Slocombe, K. E. (2013). Primate Communication: A Multimodal Approach. Cambridge: Cambridge University Press.
Maryanski, A., & Turner, J. H. (1992). The Social Cage: Human Nature and the Evolution of Society. Stanford University Press.
Müri, R. M. (2016). Cortical control of facial expression. Journal of Comparative Neurology, 524(8), 1578–1585.
Oña, L., Sandler, W., & Liebal, K. (in prep). Compositionality in chimpanzee communication?
Owren, M. J., Dieter, J. A., Seyfarth, R. M., & Cheney, D. L. (1992). Evidence of limited modification in the vocalizations of cross-fostered rhesus (Macaca mulatta) and Japanese (M. fuscata) macaques. Developmental Psychobiology, 26(7), 257–270.
Parr, L., Waller, B. M., & Fugate, J. (2005). Emotional communication in primates: Implications for neurobiology. Current Opinion in Neurobiology, 15(6), 716–720.
Pinker, S., & Bloom, P. (1990). Natural selection and natural language. Behavioral and Brain Sciences, 13(4), 707–784.
Rizzolatti, G., & Arbib, M. A. (1998). Language within our grasp. Trends in Neuroscience, 21(5), 188–194.
Roberts, A. I., & Roberts, S. G. B. (2016). Wild chimpanzees modify modality of gestures according to the strength of social bonds and personal network size. Scientific Reports, 6, 33864.
Scheider, L., Waller, B. M., Oña, L., Burrows, A. M., & Liebal, K. (2016). Social use of facial expressions in hylobatids. PLoS ONE, 11(3), e0151733.
Schel, A. M., Townsend, S. W., Machanda, Z., Zuberbühler, K., & Slocombe, K. E. (2013). Chimpanzee alarm call production meets key criteria for intentionality. PLoS ONE, 8(10), e76674.
Seyfarth, R. M., & Cheney, D. L. (2003). Meaning and emotion in animal vocalizations. Annals of the New York Academy of Sciences, 1000 (Emotions inside out: 130 Years after Darwin’s “The expression of the emotions in man and animals”), 32–55.
Sherwood, C. C., Hof, P. R., Holloway, R. L., Semendeferi, K., Gannon, P. J., Frahm, H. D., & Zilles, K. (2005). Evolution of the brainstem orofacial motor system in primates: a comparative study of trigeminal, facial, and hypoglossal nuclei. Journal of Human Evolution, 48(1), 45–84.
Sherwood, C. C., Holloway, R. L., Erwin, J. M., & Hof, P. R. (2004). Cortical orofacial motor representation in Old World monkeys, great apes, and humans. Brain, Behavior and Evolution, 63(2), 82–106.
Slocombe, K. E., Waller, B. M., & Liebal, K. (2011). The language void: The need for multimodality in primate communication research. Animal Behaviour, 81(5), 919–924.
Spoor, J. R., & Kelly, J. R. (2004). The evolutionary significance of affect in groups: Communication and group bonding. Group Processes & Intergroup Relations, 7(4), 398–412.
Taglialatela, J. P., Cantalupo, C., & Hopkins, W. D. (2006). Gesture handedness predicts asymmetry in the chimpanzee inferior frontal gyrus. Neuroreport, 17(9), 923–927.
Taglialatela, J. P., Russell, J. L., Schaeffer, J. A., & Hopkins, W. D. (2008). Communicative signaling activates ‘Broca’s’ homolog in chimpanzees. Current Biology, 18(5), 343–348.
Tanner, J., & Byrne, R. (1993). Concealing facial evidence of mood: Perspective-taking in a captive gorilla? Primates, 34(4), 451–457.
Thunström, M., Kuchenbuch, P., & Young, C. (2014). Concealing of facial expressions by a wild Barbary macaque (Macaca sylvanus). Primates, 55(3), 369–375.
Turner, J. H. (1996). The evolution of emotions in humans: A Darwinian–Durkheimian analysis. Journal for the Theory of Social Behaviour, 26(1), 1–33.
Vick, S. J., Waller, B. M., Parr, L. A., Smith Pasqualini, M. C., & Bard, K. A. (2007). A cross-species comparison of facial morphology and movement in humans and chimpanzees using the facial action coding system (FACS). Journal of Nonverbal Behavior, 31(1), 1–20.
Waller, B. M., Caeiro, C. C., & Davila-Ross, M. (2016). Orangutans modify facial displays depending on recipient attention. PeerJ, e827.
Waller, B. M., Lembeck, M., Kuchenbuch, P., Burrows, A. M., & Liebal, K. (2012). GibbonFACS: A muscle-based facial movement coding system for hylobatids. International Journal of Primatology, 33(4), 809–821.
Waller, B. M., Whitehouse, J., & Micheletta, J. (2016). Macaques can predict social outcomes from facial expressions. Animal Cognition, 19(5), 1031–1036.
Wilson, M., Hauser, M., & Wrangham, R. (2007). Chimpanzees (Pan troglodytes) modify grouping and vocal behaviour in response to location-specific risk. Behaviour, 144(12), 1621–1653.
Zuberbühler, K. (2015). Linguistic capacity of non-human animals. WIREs Cognitive Sciences, 6, 313–321.

Turn-taking and Prosociality

From sharing food to sharing information: Cooperative breeding and language evolution
Judith Burkart, Eloisa Guerreiro Martins, Fabia Miss and Yvonne Zürcher
University of Zurich

Language is a cognitively demanding human trait, but it is also a fundamentally cooperative enterprise that rests on the motivation to share information. Great apes possess many of the cognitive prerequisites for language, but largely lack the motivation to share information. Callitrichids (including marmosets and tamarins) are highly vocal monkeys that are more distantly related to humans than great apes are, but like humans, they are cooperative breeders and all group members help raise offspring. Among primates, this rearing system is correlated with proactive prosociality, which can be expressed as motivation to share information. We therefore propose that the unique coincidence of these two components in humans set the stage for language evolution: the cognitive component was inherited from our great ape-like ancestors, and the motivational one was added convergently as a result of cooperative breeding. We evaluate this scenario based on a review of callitrichid vocal communication and further show that callitrichids possess many of the mechanistic elements emphasized by the mirror system hypothesis of language evolution. We end by highlighting how more systematic phylogenetic comparisons will enable us to further promote our understanding of the role of cooperative breeding during language evolution.
Keywords: primates, prosociality, marmosets, cooperative breeding, convergence, shared ancestry, information sharing

© 2020 John Benjamins Publishing Company

1. Introduction
The communicative abilities of extant primates can crucially inform our understanding of language evolution. In the predominant approach, researchers identify elements of language in primates that are more or less closely related to humans, to infer whether these elements were likely present in the corresponding last common ancestor. The big-brained great apes, our closest relatives, appear endowed with many of the cognitive prerequisites for language (as perhaps most evident in language-trained apes: Tomasello, 2017), which therefore most likely were already present in the last common ancestor of humans and other great apes. Other elements of language seem largely lacking in the great apes, perhaps most fundamentally the motivation to share information which is beneficial to others rather than themselves (see also Wacewicz, Zywiczynski & Chiera, this volume). For instance, even language-trained great apes use their communicative skills almost exclusively imperatively (Tomasello, 2008, 2017); in other words, they mostly lack the motivation to share information (Fitch, 2005). Elements of language that are absent in our closest relatives cannot be explained through shared ancestry. They may be uniquely present in humans, or else present in less closely related species. In the latter case, an evolutionary approach that is complementary to identifying shared ancestry becomes possible, i.e. to ask whether these traits may be the result of convergent evolution, and whether their presence in some species but not in others is linked to specific social or ecological factors. One factor that has played a convergent role during human evolution is cooperative breeding, which has considerable explanatory power for understanding numerous features of our life history, demography, and cognitive endowment (Burkart, Hrdy, & van Schaik, 2009; Hrdy, 2005b, 2009). Cooperative breeding refers to a social system in which not only the parents provide care for the offspring (Solomon & French, 1997). Among primates, callitrichid monkeys (i.e. marmosets, tamarins, and callimicos) and humans are the only species known to show such a social system. In callitrichids, all group members cooperate in raising offspring by carrying and later provisioning the immatures (Digby, Ferrari, & Saltzman, 2007), and they frequently cooperate also in a variety of other activities including territory defense, vigilance, anti-predator behavior or food harvesting (Garber, 1997).
The infants are continuously carried during the first weeks of life, which requires high levels of coordination among all group members (Snowdon, 2001). During provisioning, group members regularly engage in proactive food sharing, i.e. unsolicited sharing initiated by the possessor. This is common in all human societies and callitrichid monkeys, yet virtually absent in all great ape and most monkey species. In these other species, if food is shared at all, it is mostly shared passively (tolerated taking) or in response to begging and requests and thus initiated by the potential recipient, rather than by the possessor and his or her motivation to share (Brown, Almond, & van Bergen, 2004; Jaeggi, Burkart, & van Schaik, 2010; Melis & Warneken, 2016). Systematic comparative evidence from 24 groups of 15 primate species indicates that the prosocial motivation to share food is linked to cooperative breeding in primates (Burkart et al., 2014; see also Horn et al. 2016 for a similar pattern in corvids). This prosocial motivational predisposition is also reflected in other cooperative interactions. During cooperative problem solving, for instance, callitrichid monkeys continue to contribute to the task even if for some time, they do not receive a reward for cooperating, whereas independently breeding primates such as chimpanzees, orangutans or capuchin monkeys quickly decrease their cooperative contributions (Snowdon & Cronin, 2007). Thus, cooperation per se is not unique to callitrichid monkeys, but it is more frequent and more often based on prosocial rather than individualistic and selfish motives compared to independently breeding primates. The working hypothesis put forward in this paper is that human language evolution was enabled, on the one hand, because our hominin ancestors had inherited from their great ape ancestors many of the cognitive prerequisites for language. On the other hand, they were also equipped with the prosocial motivational component, and this is better understood as a consequence of cooperative breeding that evolved in our hominin ancestors but in none of the other extant great apes. Thus, these two components per se are not unique to humans, but the coincidence of both components in the same species is, and may explain why language evolved in the human lineage, rather than in any other (ape) species. The cooperatively breeding callitrichid monkeys give us the opportunity to investigate the consequences of cooperative breeding and a more prosocial attitude on communicative complexity per se (Borjon & Ghazanfar, 2014; Burkart & van Schaik, 2016; Hrdy, 2005a; Snowdon, 2001; Zuberbühler, 2011). They are small New World primates who shared a last common ancestor with humans more than 37–54 million years ago, and they lack the big and powerful brains of great apes. The goal of this article is to evaluate the hypothesis that there may be a link between cooperative breeding and communicative complexity, by first reviewing callitrichid vocal communication and examining how this evidence fits with specific potential pathways through which cooperative breeding may be linked to communicative complexity.
We will then turn to the mirror system hypothesis and show that callitrichids indeed possess many of the mechanistic elements proposed by this hypothesis, and finally propose how more controlled phylogenetic approaches will help us to systematically test the link between cooperative breeding and language evolution.

2. Callitrichid vocal communication
The communicative system of callitrichids appears unusual among nonhuman primates (Rukstalis, Fite, & French, 2003; Snowdon, 2013). They are highly voluble monkeys that vocalize almost constantly (Eliades & Miller, 2016), have large vocal repertoires for nonhuman primates (Agamaite, Chang, Osmanski, & Wang, 2015; Campbell & Snowdon, 2007; Cleveland & Snowdon, 1982; Masataka, 1982; McComb & Semple, 2005), and frequently produce a variety of call combinations (Agamaite et al., 2015; Bezerra & Souto, 2008; Cleveland & Snowdon, 1982). The structure of callitrichid vocalizations encodes information about group, sex, and individual identity, and recipients can discriminate at least the latter (Rukstalis & French, 2005; Weiss, Garibaldi, & Hauser, 2001). Individuals sometimes engage in cooperative turn-taking where partners flexibly adjust the timing of their vocalizations to each other (Takahashi, Narayanan, & Ghazanfar, 2013), both in dyadic and polyadic situations (Snowdon & Cleveland, 1984). Turn-taking occurs when individuals are separated, and they start calling back and forth with the other group members (using phee-calls; for qualitative differences between callitrichid and human turn-taking, see Wacewicz et al., this volume). Artificial playbacks of interfering noise during turn-taking exchanges suggest considerable vocal control over the timing (Roy et al., 2011). When noise was played back at predictable intervals, the monkeys would time their calls such that the first call would occur in the first silent interval and the answer only after the next bout of noise in the subsequent silent interval. Alternatively, call and answer were emitted with a shortened latency to fully fit within a predictable period of silence. Some of the calls of callitrichids are functionally referential, referring to predators (Cäsar & Zuberbühler, 2012; Kirchhof & Hammerschmidt, 2006) and also to food (Kitzmann & Caine, 2009). Food calls occur in several primate species and can have various functions (Clay, Smith, & Blumstein, 2012). They can be emitted selfishly and indicate ownership of a specific food source. Capuchin monkeys, for instance, are more likely to emit food calls when others are present, and less likely to be approached by others when emitting food calls (Gros-Louis, 2004; Pollick, Gouzoules, & de Waal, 2005). Food calls can also function to attract others to a big, sharable food source like a fruiting tree in order to reduce predation rather than to share food.
In callitrichids, finally, food calls can function to attract others in order to offer food to them: during proactive sharing, food possessors first emit food offering calls and then wait with food in their outstretched hands for immatures to come and take it (Brown et al., 2004). Accordingly, adult callitrichids are more likely to call when others are absent rather than present (Caine, Addington, & Windfelder, 1995; Vitale, Zanzoni, Queyras, & Chiarotti, 2003). Callitrichid food calls are to some extent independent of the caller’s own feeding motivation, because adults are more likely to call if immatures are present in the group (Guerreiro Martins, Moura, Finkenwirth, & Burkart, in rev.), and when immatures are unable to obtain food independently from a puzzle box that only the adult can open (Guerreiro Martins & Burkart, 2013; Moura, Nunes, & Langguth, 2010).

As in other nonhuman primates, the vocabulary in callitrichids is fixed, and no novel vocalizations are acquired via vocal learning. Nevertheless, some flexibility appears to be present. The acoustic structure of vocalizations differs considerably between populations (de la Torre & Snowdon, 2009; Zürcher & Burkart, 2017), and

140 Judith Burkart, Eloisa Guerreiro Martins, Fabia Miss and Yvonne Zürcher

translocation experiments show that these differences are indeed the result of vocal production learning rather than environmental or genetic differences (Zürcher & Burkart, in prep). This kind of vocal accommodation, i.e. changes in the structure of a given vocalization in response to social factors, has been reported in several other primate species too and serves to indicate social closeness (Ruch, Zürcher, & Burkart, 2018). Social influences on vocal development during ontogeny, however, seem particularly prevalent in callitrichids (Snowdon, 2017b). For instance, immatures lacking adult vocal feedback because they had been socially deprived (Gultekin & Hage, 2017) or deafened (Roupe, Pistorio, & Wang, 2003) appeared unable to develop proper adult vocal repertoires and were less likely to use certain call combinations.

Callitrichid infants babble, which to our knowledge has not been described in any other primate species except humans. Babbling bouts are noisy, can last a minute or more, and consist of strings of elements of calls from the adult repertoire (Elowson, Snowdon, & Lazaro-Perea, 1998b; Pistorio, Vintch, & Wang, 2006). During babbling bouts, adults are more likely to interact with the infants, and infants who babble more produce well-formed adult calls earlier during ontogeny (Elowson, Snowdon, & Lazaro-Perea, 1998a; Snowdon & Elowson, 2001; Takahashi, Fenley, & Ghazanfar, 2016; Takahashi et al., 2015). These findings are complemented by experimental evidence confirming that contingent parental feedback speeds up vocal development in common marmosets (Takahashi et al., 2016, 2017).
In fact, some instances of parental feedback may even satisfy the criteria for teaching (according to the functional definition by Caro & Hauser, 1992), for instance when during turn-taking, parents add an extra break when infants get the timing wrong and respond too quickly, or when infants respond with the wrong call and parents interrupt them with the correct answer, i.e. a phee-call (Chow, Mitchell, & Miller, 2015; Takahashi et al., 2016).

3. Cooperative breeding and vocal complexity?

There are at least three mutually non-exclusive ways in which cooperative breeding in primates may facilitate the emergence of more diverse and more sophisticated forms of communication (Table 1). First and most importantly, the readiness to share food may extend toward a willingness to share information as well. Cooperative breeding may thus have favored the evolution of human language by adding a key element, i.e. the motivation to share information (Fitch, 2005; Grice, 1975; Noble, 2000; Tomasello, 2008), to the cognitive endowment of the last common ancestor that we shared with the other great apes. In callitrichids, information donation is apparent in food offering calls that function to attract others to a food item (Brown et al., 2004), but also when parents correct immatures in


turn-taking sequences where immatures get the timing wrong or choose the wrong call type to answer (Chow et al., 2015). Another form of information donation is teaching. For callitrichids, more evidence consistent with teaching is available compared to independently breeding primates, where teaching appears virtually absent (Kline, 2014). In callitrichids, both in the wild and in captivity, adults have repeatedly been shown to change their behavior in the presence of naïve immatures in a way that is beneficial to skill acquisition in cotton-top tamarins (Humle & Snowdon, 2008; Snowdon & Roskos, 2017), lion tamarins (Rapaport, 2011) and common marmosets (Dell’Mour, Range, & Huber, 2009; Chow, Mitchell, & Miller, 2015; Takahashi et al., 2016; Guerreiro Martins & Burkart, in prep.). The only context in which information donation appears to occur in a wider range of species is alarm calling; alarm calls may easily evolve because small costs can have enormous inclusive fitness benefits, but they may also be driven by direct fitness benefits (predator deterrence: Shelley & Blumstein, 2004) and therefore not represent a case of genuine information donation.

Second, since cooperative breeders routinely engage in a variety of cooperative activities on a daily basis, they may experience more contexts in which they have to coordinate and negotiate with other group members (Snowdon, 2001). For instance, individuals have to coordinate who will engage in infant carrying and who in vigilance or even group defense, because these activities are usually not performed simultaneously. Likewise, the timing of transfers from one caregiver to the next has to be coordinated. An obvious way to meet such increased needs for coordination and social monitoring in arboreal species with limited visibility is through the acoustic channel. The high prevalence of vocalizations emitted in callitrichid groups and their large vocal repertoires are consistent with this pathway.
Intriguingly, birds appear to fit this pattern too since cooperatively breeding species have larger vocal repertoires compared to independently breeding ones (Leighton, 2017). Vocal turn-taking (Table 1), finally, is also consistent with this pathway, because it likely scaffolds behavioral coordination by facilitating the monitoring of group members.

Third, selective pressure may also be on immatures in cooperative breeders (Chisholm, 2003; Hawkes, 2014; Hrdy, 2009; Tomasello & Gonzalez-Cabrera, 2017; Zuberbühler, 2011), because they arguably have to engage caregivers who are not their mothers. Furthermore, they may even have to engage their own mothers because maternal investment tends to be conditional in cooperative breeders, and a mother may only properly care for her offspring if she perceives the availability of sufficient allomaternal support (Bardi, Petto, & Lee‐Parritz, 2001; Hrdy, 2016). Immatures in cooperative breeders may therefore be particularly selected for attracting caregivers, which may have facilitated the emergence of socio-cognitive skills, including vocal behavior that advertises the immatures’ neediness


Table 1. Potential, non-mutually exclusive pathways that may link extensive cooperative breeding (reliance on allomaternal care) and vocal complexity

Pathway: Propensity to share food extends to propensity to also share information
Elements of callitrichid vocal communication consistent with the pathway:
– Food offering calls function to share information that is beneficial for the recipient [1]
– Adults are more likely to give food offering calls and share food when immatures lack the skills to obtain food independently [2–4]
– Potential teaching by adult marmosets during vocal development of immatures [5–7]
– Potential teaching in instrumental contexts [3, 8–10]

Pathway: Increased need for coordination & monitoring
Elements of callitrichid vocal communication consistent with the pathway:
– Large repertoires and volubility [11–14; see also 15 for birds]
– Turn-taking as cooperative vocal exchanges [16] likely to facilitate coordination and mutual monitoring
– Turn-taking also follows conversational patterns in polyadic situations [17]

Pathway: Engaging caregivers
Elements of callitrichid vocal communication consistent with the pathway:
– Callitrichid infants engage in babbling behavior [18, 19]
– During babbling, caregivers are more likely to interact with infants [20, 21]

Notes.
1. Brown et al., 2004
2. Moura et al., 2010
3. Humle & Snowdon, 2008
4. Guerreiro Martins et al., 2013
5. Chow et al., 2015
6. Takahashi et al., 2016
7. Takahashi et al., 2017
8. Rapaport, 2011
9. Guerreiro Martins & Burkart, in prep
10. Snowdon & Roskos, 2017
11. McComb & Semple, 2005
12. Cleveland & Snowdon, 1982
13. Campbell & Snowdon, 2007
14. Masataka, 1982
15. Leighton, 2017
16. Takahashi et al., 2013
17. Snowdon & Cleveland, 1984
18. Elowson et al., 1998a, b
19. Pistorio et al., 2006
20. Snowdon, 2017c
21. Snowdon & Elowson, 2001

(Chisholm, 2003; Hawkes, 2014; Hrdy, 2009; Tomasello & Gonzalez-Cabrera, 2017; Zuberbühler, 2011). The conspicuous babbling behavior of callitrichids may well fulfil this function. Immature callitrichids are very small and vulnerable to predation, and it is thus unexpected that they frequently vocalize loudly, as during babbling, rather than behaving more cryptically. An alternative, but likewise intriguing, possibility is that babbling is a simple by-product of large repertoires of which the fine acoustic structure has to be learned during ontogeny (see above).

Our first review of data thus suggests that these three non-mutually exclusive pathways for a potential link between cooperative breeding and communicative complexity in primates are plausible. However, to fully evaluate them, more systematic data for a broad range of primates are critical.


4. Callitrichid communication and the mirror system hypothesis

The mirror system hypothesis as an evolving framework (Arbib, 2012) provides an account of the emergence of language at the mechanistic level and stresses the importance of mirror neurons and imitation, intentionality, and brain coupling. In the following section, we summarize studies suggesting that callitrichids in fact possess several of these mechanistic elements.

First, mirror neurons are not unique to Old World primates but have also been demonstrated in common marmosets (Suzuki et al., 2015). Furthermore, marmosets engage in so-called true imitation, defined as the faithful copying of a novel technique with a high degree of matching of the precise actions between the model individual and the observer (Voelkl & Huber, 2007). This is unusual among nonhuman primates (Snowdon, 2017a), but note that this kind of imitation appears restricted to single actions, whereas the exact copying of entire action sequences may well be absent in callitrichids, perhaps due to limitations in working memory (which tends to be correlated with brain size in primates: Burkart, Schubiger, & van Schaik, 2017; see also Aboitiz & Putt, this volume).

Second, social learning not only appears pervasive among callitrichids (Snowdon, 2017a) but it also relies on intention attribution. Specifically, common marmosets have been shown to only engage in social learning from intentional agents, i.e. agents that they perceive to behave in a goal-directed way (Burkart, Kupferberg, Glasauer, & van Schaik, 2012; Kupferberg, Glasauer, & Burkart, 2013). In habituation-dishabituation experiments, marmosets were first shown to perceive the behavior of approaching one of two objects as goal-directed. They did so if the behavior was performed by a conspecific, a human actor, and to a lesser extent by a robot, but not if performed by a black box.
Rather, if performed by the black box, the behavior was encoded with regard to its physical properties (movement trajectory). Immediately afterwards, the subjects copied the choice of the agent, and interacted longer with this target object compared to the other object, but only if they had previously perceived the agent to behave in a goal-directed way. These results show that even simple forms of social learning, such as stimulus enhancement, rely on goal-attribution in marmosets. Finally, marmosets co-represent each other’s actions when jointly engaged in a task (Joint Simon effect: Sebanz, Knoblich, & Prinz, 2003). In the individual Simon task, subjects have to react to a stimulus with a specific response while a second, conflicting stimulus prompts a response that is incompatible with the correct answer. In a situation where subjects can fully ignore the conflicting stimulus because it would only be relevant for a partner who is jointly engaged in the same task, humans nevertheless do not ignore this stimulus. This suggests that human subjects not only represent their own task and actions, but also their partner’s,


which is supported by neuroimaging studies (Wen & Hsieh, 2015). This Joint Simon effect appears rather late in human ontogeny and has been linked to Theory of Mind development (Milward, Kita, & Apperly, 2016). Recently, a joint Simon effect and thus action co-representation has been demonstrated in common marmosets (Miss & Burkart, 2018). These results are consistent with their ease of coordinating activities, including vocal exchanges.

Taken together, callitrichid monkeys possess mirror neurons and show high-fidelity copying of behavior, have been shown to engage in social learning only if they perceive a behavior as intentional, and co-represent each other’s behaviors in a joint task. The presence of these mechanistic elements stressed by the mirror system hypothesis suggests some fundamental parallels between human and callitrichid communication.

5. Towards a new roadmap

We suggest that the cooperative breeding perspective on language evolution complements the MSH by focusing on a critical gap in the old road map, i.e. the origin of the motivation to share information (see also Wacewicz et al.’s platform of trust, this issue), and by providing a working hypothesis for why it was the cooperatively breeding humans, rather than any other ape species, that developed language. The complexity of callitrichid vocal communication reviewed in this paper suggests that cooperative breeding indeed facilitates the evolution of more complex communication systems, and thus, in the case of humans, of language. We have highlighted three potential pathways that may link cooperative breeding and vocal complexity: first and most importantly, the readiness to share food may extend to a readiness to share information; second, cooperative breeders may face more contexts in which it is vital to coordinate with and monitor group members; and third, immatures may need to engage caregivers and use vocal communication to do so.
These pathways indeed appear to foster vocal complexity in callitrichids, but perhaps not communicative complexity more generally. Since the MSH posits that the path to speech is indirect, this may raise a challenge for this hypothesis. To further explore this issue, it is thus crucial to also study callitrichid communication in the broad sense and from a multi-modal perspective, e.g. including gestural communication (see also Liebal et al., this issue).

Whereas the proposed pathways find considerable empirical support, we also highlight that more systematic evidence is necessary. First, the details of callitrichid communication complexity need to be further delineated, e.g. combinatoriality, the role of working memory (see also Aboitiz & Putt, this issue), differences and similarities in vocal control and turn-taking between humans and callitrichids (see also Wacewicz et al., this issue), and non-vocal communication (see also


Liebal et al., this issue). Second and most crucially, more comparative evidence from a broader range of primates and other lineages is required to further evaluate the impact of cooperative breeding on communicative complexity. Such approaches should be based on broad phylogenetic comparisons (as done, for instance, in MacLean et al., 2014 or Burkart et al., 2014), or targeted contrasts (MacLean et al., 2012; Burkart & van Schaik, 2010). This has recently been achieved for birds, where comparative evidence from a large number of bird species revealed that cooperatively breeding species have larger vocal repertoires compared to independently breeding species (Leighton, 2017).

Together, the evidence reviewed in this paper suggests convergent evolution of communicative complexity in callitrichids and humans, both cooperative breeders. The enhanced cooperative and prosocial attitudes associated with cooperative breeding may thus give rise to the motivation to share not only food, but also information, which is largely lacking in great apes. The combination of this motivation to share information with strong cognitive abilities (as was the case in the human ancestors, but not in callitrichid monkeys) could then set the stage for language evolution in our lineage.

Funding

This research was supported by SNF Grants 310030-13083 and 31003A-172979 (PI: J. Burkart). The paper was prepared for a workshop funded by NSF Grant No. BCS-1343544 “INSPIRE Track 1: Action, Vision and Language, and their Brain Mechanisms in Evolutionary Relationship” (PI: M.A. Arbib).

References

Agamaite, J. A., Chang, C.-J., Osmanski, M. S., & Wang, X. (2015). A quantitative acoustic analysis of the vocal repertoire of the common marmoset (Callithrix jacchus). The Journal of the Acoustical Society of America, 138(5), 2906–2928. https://
Arbib, M. A. (2012). How the brain got language: The mirror system hypothesis (Vol. 16). OUP USA. https://
Bardi, M., Petto, A. J., & Lee‐Parritz, D. E. (2001). Parental failure in captive cotton‐top tamarins (Saguinus oedipus). American Journal of Primatology, 54(3), 159–169. https://
Bezerra, B. M., & Souto, A. (2008). Structure and usage of the vocal repertoire of Callithrix jacchus. International Journal of Primatology, 29, 671–701. https://
Borjon, J. I., & Ghazanfar, A. A. (2014). Convergent evolution of vocal cooperation without convergent evolution of brain size. Brain, Behavior and Evolution, 84(2), 93–102. https://
Brown, G. R., Almond, R. E. A., & van Bergen, Y. (2004). Begging, stealing and offering: food transfer in non-human primates. Advances in the Study of Behaviour, 34, 265–295. https://34007-6

Burkart, J., Kupferberg, A., Glasauer, S., & van Schaik, C. (2012). Even simple forms of social learning rely on intention attribution in marmoset monkeys (Callithrix jacchus). Journal of Comparative Psychology, 126(2), 129. https://
Burkart, J. M., Allon, O., Amici, F., Fichtel, C., Finkenwirth, C., Heschl, A., … van Schaik, C. P. (2014). The evolutionary origin of human hyper-cooperation. Nature Communications, 5, 4747.
Burkart, J. M., Hrdy, S. B., & van Schaik, C. P. (2009). Cooperative breeding and human cognitive evolution. Evolutionary Anthropology, 18, 175–186. https://
Burkart, J. M., Schubiger, M. N., & van Schaik, C. P. (2017). The evolution of general intelligence. Behavioral and Brain Sciences, 1–65.
Burkart, J. M., & van Schaik, C. P. (2010). Cognitive consequences of cooperative breeding in primates. Animal Cognition, 13, 1–19. https://
Burkart, J. M., & van Schaik, C. P. (2016). The cooperative breeding perspective helps pinning down when uniquely human evolutionary processes are necessary. Behavioral and Brain Sciences, 39, e34.
Caine, N. G., Addington, R. L., & Windfelder, T. L. (1995). Factors affecting the rates of food calls given by red-bellied tamarins. Animal Behaviour, 50(1), 53–60. https://
Campbell, M., & Snowdon, C. T. (2007). Vocal response of captive-reared Saguinus oedipus during mobbing. International Journal of Primatology, 28(2), 257–270. https://
Caro, T. M., & Hauser, M. D. (1992). Is there teaching in nonhuman animals? Quarterly Review of Biology, 67, 151–172. https://
Cäsar, C., & Zuberbühler, K. (2012). Referential alarm calling behaviour in New World primates. Current Zoology, 58(5), 680–697. https://
Chisholm, J. S. (2003). Uncertainty, contingency, and attachment: A life history theory of Theory of Mind. In K. Sterelny & J. Fitness (Eds.), From mating to mentality. Evaluating Evolutionary Psychology (pp. 125–154). New York: Psychology Press.
Chow, C. P., Mitchell, J. F., & Miller, C. T. (2015). Vocal turn-taking in a non-human primate is learned during ontogeny. Proc. R. Soc. B, 282(1807), 20150069. https://
Clay, Z., Smith, C. L., & Blumstein, D. T. (2012). Food-associated vocalizations in mammals and birds: what do these calls really mean? Animal Behaviour, 83(2), 323–330. https://
Cleveland, J., & Snowdon, C. T. (1982). The complex vocal repertoire of the adult cotton-top tamarin (Saguinus oedipus oedipus). Zeitschrift für Tierpsychologie, 58, 231–270. https://
de la Torre, S., & Snowdon, C. T. (2009). Dialects in pygmy marmosets? Population variation in call structure. American Journal of Primatology, 71(4), 333–342. https://
Dell’Mour, V., Range, F., & Huber, L. (2009). Social learning and mother’s behavior in manipulative tasks in infant marmosets. American Journal of Primatology, 71(6), 503–509. https://
Digby, L. J., Ferrari, S. F., & Saltzman, W. (2007). Callitrichines: The role of competition in cooperatively breeding species. In C. J. Campbell, A. Fuentes, K. C. MacKinnon, M. A. Panger, & S. K. Bearder (Eds.), Primates in perspective (pp. 85–105). New York: Oxford University Press.


Eliades, S. J., & Miller, C. T. (2016). Marmoset vocal communication: Behavior and neurobiology. Developmental neurobiology.
Elowson, A. M., Snowdon, C. T., & Lazaro-Perea, C. (1998a). ‘Babbling’ and social context in infant monkeys: parallels to human infants. Trends in Cognitive Sciences, 35–43.
Elowson, A. M., Snowdon, C. T., & Lazaro-Perea, C. (1998b). Infant ‘babbling’ in a nonhuman primate: complex sequences of vocal behavior. Behaviour, 135, 643–664. https://
Fitch, W. T. (2005). The evolution of language: a comparative review. Biology and Philosophy, 20, 193–230. https://
Garber, P. A. (1997). One for all and breeding for one: cooperation and competition as a tamarin reproductive strategy. Evolutionary Anthropology: Issues, News, and Reviews, 5(6), 187–199. https://1520-6505(1997)5:63.0.CO;2-A
Grice, H. P. (1975). Logic and conversation. In P. Cole (Ed.), Syntax and Semantics, Vol 3 (pp. 41–58). New York: Academic Press.
Gros-Louis, J. (2004). The function of food-associated calls in white-faced capuchin monkeys, Cebus capucinus, from the perspective of the signaller. Animal Behaviour, 67(3), 431–440. https://
Guerreiro Martins, E. M., & Burkart, J. M. (2013). Common marmosets preferentially share difficult to obtain food items. Folia Primatologica, 84(3–5), 281–282.
Guerreiro Martins, E. M., & Burkart, J. M. (in prep.). Teaching in marmosets: contingent on age or skill levels of immatures?
Guerreiro Martins, E. M., Moura, A. C., Finkenwirth, C., & Burkart, J. M. (subm). Food sharing in three species of callitrichid monkeys (Callithrix jacchus, Leontopithecus chrysomelas & Saguinus midas): Individual differences and interspecific variation. Animal Behaviour.
Gultekin, Y. B., & Hage, S. R. (2017). Limiting parental feedback disrupts vocal development in marmoset monkeys. Nature Communications, 8. https://
Hawkes, K. (2014). Primate sociality to human cooperation. Human Nature, 25(1), 28–48. https://
Horn, L., Scheer, C., Bugnyar, T., & Massen, J. J. (2016). Proactive prosociality in a cooperatively breeding corvid, the azure-winged magpie (Cyanopica cyana). Biology Letters, 12(10), 20160649. https://
Hrdy, S. (2005a). Comes the child before the man: how cooperative breeding and prolonged postweaning dependence shaped human potentials. In B. Hewlett & M. Lamb (Eds.), Hunter gatherer childhood (pp. 65–91). Piscataway: Transactions.
Hrdy, S. (2005b). Evolutionary context of human development: The cooperative breeding model. In C. S. Carter, L. Ahnert, K. E. Grossmann, S. B. Hrdy, M. E. Lamb, S. W. Porges, & N. Sachser (Eds.), Attachment and Bonding: A New Synthesis; From the 92nd Dahlem Workshop Report (pp. 9–32). MIT Press.
Hrdy, S. (2009). Mothers & Others: The Evolutionary Origins of Mutual Understanding. Cambridge: Harvard University Press.
Hrdy, S. B. (2016). Variable postpartum responsiveness among humans and other primates with “cooperative breeding”: A comparative and evolutionary perspective. Hormones and Behavior, 77, 272–283. https://
Humle, T., & Snowdon, C. T. (2008). Socially biased learning in the acquisition of a complex foraging task in juvenile cottontop tamarins (Saguinus oedipus). Animal Behaviour, 27(1), 267–277. https://

Jaeggi, A., Burkart, J. M., & van Schaik, C. P. (2010). On the psychology of cooperation in humans and other primates: The natural history of food sharing and experimental evidence of prosociality. PhilTransB 12(365), 2723–2735.
Kirchhof, J., & Hammerschmidt, K. (2006). Functionally referential alarm calls in tamarins (Saguinus fuscicollis and Saguinus mystax) – evidence from playback experiments. Ethology, 112(4), 346–354. https://
Kitzmann, C. D., & Caine, N. G. (2009). Marmoset (Callithrix geoffroyi) Food‐Associated Calls are Functionally Referential. Ethology, 115(5), 439–448. https://
Kline, M. A. (2014). How to learn about teaching: An evolutionary framework for the study of teaching behavior in humans and other animals. Behavioral and Brain Sciences, 1–70. https://
Kupferberg, A., Glasauer, S., & Burkart, J. M. (2013). Do robots have goals? How agent cues influence action understanding in non-human primates. Behavioural Brain Research. https://
Leighton, G. M. (2017). Cooperative breeding influences the number and type of vocalizations in avian lineages. Proc. R. Soc. B, 284, 20171508. https://
MacLean, E. L., Hare, B., Nunn, C. L., Addessi, E., Amici, F., Anderson, R. C., … Barnard, A. M. (2014). The evolution of self-control. PNAS, 111(20), E2140–E2148. https://
MacLean, E. L., Matthews, L. J., Hare, B., Nunn, C. L., Anderson, R. C., Aureli, F., … Emery, N. J. (2012). How does cognition evolve? Phylogenetic comparative psychology. Animal Cognition, 15(2), 223–238. https://
Masataka, N. (1982). A field study on the vocalizations of Goeldi’s monkeys (Callimico goeldii). Primates, 23, 206–219. https://
McComb, K., & Semple, S. (2005). Coevolution of vocal communication and sociality in primates. Biology Letters, 1(4), 381–385. https://
Melis, A. P., & Warneken, F. (2016). The psychology of cooperation: Insights from chimpanzees and children. Evolutionary Anthropology, 25(6), 297–305. https://
Milward, S. J., Kita, S., & Apperly, I. A. (2016). Individual differences in children’s co-representation of self and other in joint action. Child Development.
Miss, F. M., & Burkart, J. M. (2018). Corepresentation During Joint Action in Marmoset Monkeys (Callithrix jacchus). Psychological Science. https://
Moura, A. C., Nunes, H. G., & Langguth, A. (2010). Food Sharing in Lion Tamarins (Leontopithecus chrysomelas): Does Foraging Difficulty Affect Investment in Young by Breeders and Helpers? International Journal of Primatology, 31(5), 848–862. https://
Noble, J. (2000). Cooperation, competition and the evolution of prelinguistic communication. The Emergence of Language, 40–61. https://
Pistorio, A. L., Vintch, B., & Wang, X. (2006). Acoustic analysis of vocal development in a New World primate, the common marmoset (Callithrix jacchus). J Acoust Soc Am, 120(3), 1655–1670. https://
Pollick, A. S., Gouzoules, H., & de Waal, F. B. M. (2005). Audience effects on food calls in captive brown capuchin monkeys, Cebus apella. Animal Behaviour, 70, 1273–1281. https://


Rapaport, L. G. (2011). Progressive parenting behavior in wild golden lion tamarins. Behavioral Ecology. https://
Roupe, S. L., Pistorio, A., & Wang, X. (2003). Vocal plasticity induced by auditory deprivation in the common marmoset. Paper presented at the Society for Neuroscience, New Orleans, Nov. 11.
Roy, S., Miller, C. T., Gottsch, D., & Wang, X. (2011). Vocal control by the common marmoset in the presence of interfering noise. The Journal of Experimental Biology, 214(21), 3619–3629. https://
Ruch, H., Zürcher, Y., & Burkart, J. M. (2018). The function of vocal accommodation in humans and other primates. Biological Reviews, 93(2), 996–1013. https://
Rukstalis, M., Fite, J. E., & French, J. A. (2003). Social Change Affects Vocal Structure in a Callitrichid Primate (Callithrix kuhlii). Ethology, 109(4), 327–340. https://
Rukstalis, M., & French, J. A. (2005). Vocal buffering of the stress response: exposure to conspecific vocalizations moderates urinary cortisol excretion in isolated marmosets. Hormones and Behavior, 47(1), 1–7. https://
Shelley, E. L., & Blumstein, D. T. (2004). The evolution of vocal alarm communication in rodents. Behavioral Ecology, 16(1), 169–77. https://
Sebanz, N., Knoblich, G., & Prinz, W. (2003). Representing others’ actions: just like one’s own? Cognition, 88(3), B11–B21. https://00043-X
Snowdon, C. T. (2013). Language parallels in New World primates. In Animal Models of Speech and Language Disorders (pp. 241–261). New York: Springer. https://
Snowdon, C. T. (2001). Social processes in communication and cognition in callitrichid monkeys: a review. Animal Cognition, 4, 247–257. https://
Snowdon, C. T. (2017a). Cultural Phenomena in Cooperatively Breeding Primates. In J. M. Causadias, E. H. Telzer, & N. A. Gonzales (Eds.), The Handbook of Culture and Biology. John Wiley & Sons. https://
Snowdon, C. T. (2017b). Learning from monkey “talk”. Science, 355(6330), 1120–1122. https://
Snowdon, C. T. (2017c). Vocal Communication in Family-Living and Pair-Bonded Primates. In Primate Hearing and Communication (pp. 141–174). Springer. https://
Snowdon, C. T., & Cleveland, J. (1984). “Conversations” among pygmy marmosets. American Journal of Primatology, 7, 15–20. https://
Snowdon, C. T., & Cronin, K. A. (2007). Cooperative breeders do cooperate. Behavioural Processes, 76(2), 138–141. https://
Snowdon, C. T., & Elowson, A. M. (2001). ‘Babbling’ in pygmy marmosets: development after infancy. Behaviour, 138(10), 1235–1248. https://
Snowdon, C. T., & Roskos, T. R. (2017). Stick-weaving: Innovative behavior in tamarins (Saguinus oedipus). Journal of Comparative Psychology, 131(2), 174. https://
Solomon, N. G., & French, J. A. (1997). Cooperative Breeding in Mammals. New York: Cambridge University Press.
Suzuki, W., Banno, T., Miyakawa, N., Abe, H., Goda, N., & Ichinohe, N. (2015). Mirror neurons in a new world monkey, common marmoset. Frontiers in neuroscience, 9. https://

Takahashi, D. Y., Fenley, A. R., & Ghazanfar, A. A. (2016). Early development of turn-taking with parents shapes vocal acoustics in infant marmoset monkeys. Phil. Trans. B, 371(1693), 20150370. https://
Takahashi, D. Y., Fenley, A. R., Teramoto, Y., Narayanan, D. Z., Borjon, J. I., Holmes, P., & Ghazanfar, A. A. (2015). The developmental dynamics of marmoset monkey vocal production. Science, 349(6249), 734–738. https://
Takahashi, D. Y., Liao, D. A., & Ghazanfar, A. A. (2017). Vocal Learning via Social Reinforcement by Infant Marmoset Monkeys. Current Biology. https://
Takahashi, D. Y., Narayanan, D. Z., & Ghazanfar, A. A. (2013). Coupled oscillator dynamics of vocal turn-taking in monkeys. Current Biology, 23(21), 2162–2168. https://
Tomasello, M. (2017). What did we learn from the ape language studies? In: Hare, B. & Yamamoto, S. (eds.) Bonobos: Unique in Mind, Brain, and Behavior. Oxford: Oxford University Press, 95–105.
Tomasello, M. (2008). Origins of human communication. Cambridge MA: MIT Press.
Tomasello, M., & Gonzalez-Cabrera, I. (2017). The role of ontogeny in the evolution of human cooperation. Human Nature, 1–15.
Vitale, A., Zanzoni, M., Queyras, A., & Chiarotti, F. (2003). Degree of social contact affects the emission of food calls in the common marmoset (Callithrix jacchus). American Journal of Primatology, 59, 21–28. https://
Voelkl, B., & Huber, L. (2007). Imitation as faithful copying of a novel technique in marmoset monkeys. PLoS ONE, 2(7), e611. https://
Weiss, D. J., Garibaldi, B. T., & Hauser, M. D. (2001). The production and perception of long calls by cotton-top tamarins (Saguinus oedipus): acoustic analyses and playback experiments. Journal of Comparative Psychology, 115(3), 258. https://
Wen, T., & Hsieh, S. (2015). Neuroimaging of the joint Simon effect with believed biological and non-biological co-actors. Frontiers in human neuroscience, 9. https://
Zuberbühler, K. (2011). Cooperative breeding and the evolution of vocal flexibility. In M. Tallerman & K. Gibson (Eds.), The Oxford Handbook of Language Evolution.
Zürcher, Y., & Burkart, J. M. (2017). Evidence for dialects in three captive populations of common marmosets (Callithrix jacchus). International Journal of Primatology, 38(4), 780–793. https://
Zürcher, Y., & Burkart, J. M. (in prep). Translocation experiments provide evidence for vocal accommodation learning in marmosets.

Social manipulation, turn-taking and cooperation in apes: Implications for the evolution of language-based interaction in humans

Federico Rossano

University of California, San Diego

This paper argues that focusing on how communicative signals might emerge, and on how the capacity to interpret them might develop, does not yet explain what type of motivation is required to actually deal with those signals. Without the consistent production of appropriate responses to communicative signals, there would be no point in producing any signal. If language is a tool to accomplish things with others, we need to understand what would lead to cooperation. The first step consists in avoiding the blind belief that all cooperation requires some prosocial attitude. A great deal of cooperation can occur while each participant in the interaction is selfishly attempting to maximize their own benefits or minimize damaging consequences. I describe how different types of turn-taking can be achieved via different levels of cognitive complexity and how interpretive turn-taking requires a great deal of cognitive abilities that great apes possess. Finally, I provide empirical evidence of social manipulation in non-human primates. Given our awareness of the occurrence of social manipulation during cooperation among human adults, it seems necessary to reconsider to what degree human communication and language evolution require unique prosocial motivations.

Keywords: communication, cooperation, language evolution, prosociality, social interaction, social manipulation

© 2020 John Benjamins Publishing Company

Imagine walking down the street towards some destination and being stopped by a stranger asking for directions. Most people would stop and provide the information requested, ultimately helping the requester find their way. While this appears to be normal everyday behavior, we should ponder for a second how special it is. First of all, instead of running away from a stranger approaching us (or responding
aggressively towards them), we stop and listen to their needs. This, by itself, is already remarkable. Human infants start to show a preference for ingroup over outgroup individuals when they are as young as 10 months old (Kinzler et al., 2007). Chimpanzees, among humans’ closest living relatives, fear, attack, kill, and even cannibalize outgroup chimpanzees (Wilson et al., 2014). Yet humans can become friendly with, and even act altruistically towards, individuals from a different social group. In other words, while we certainly prefer ingroup individuals, we are not quite as xenophobic as chimpanzees. Second, even though the person who is asking us for directions is delaying the completion of our own project (getting to our own destination), we stop and help (helping comes at a cost for us, albeit a minimal one). Third, we do not have any clear tie to this person and we will likely never see her again, yet we help her (we do not do it because of direct reciprocity; Trivers, 1971). Fourth, knowledge of where things are located and of the fastest way to get there can provide a strategic advantage for an individual, yet we are willing to share it with a stranger (we act prosocially, i.e. we engage in a behavior intended to benefit another individual). I am interested in why the addressed recipient would ever stop and provide the information requested by a stranger.
We can hypothesize a few (non-exhaustive) possible motivators: (1) fear (the stranger might be scary); (2) empathy (the stranger needs help and we feel for her); (3) coercion (the stranger might force us to help her); (4) social reputation (we might think that we are being observed and want to protect our reputation as cooperative individuals); (5) prosocial attitude (we are naturally prosocial and therefore would spontaneously help people in need); (6) normative expectations (we follow the rule that we should respond to questions, no matter who asks them); (7) a Kantian categorical imperative (we believe in a moral code that transcends the individual and defines us as a species, and we abide by it). While one might think that finding the motivation(s) for our cooperative behavior with strangers is a trivial exercise, I argue that understanding what leads a recipient to respond to any communicative signal is key to understanding language evolution and human communicative abilities. Most importantly, the question of motivation will have to be addressed from an evolutionary perspective. We know, for example, that human beings are particularly cooperative in conversation, in that almost 90% of questions in ordinary conversation obtain some form of response (Stivers et al., 2009). Non-human great apes, on the other hand, are reluctant to respond to requests and begging gestures (chimpanzees and orangutans grant requests for food approximately one third of the time; see, e.g., Silk et al., 2013; Rossano & Liebal, 2014). Moreover, great apes are mostly competitive, as opposed to cooperative (Tomasello, 2009), and they do not seem to engage in shared intentionality as humans do (Tomasello et al., 2005). So how would we
account for the transition from likely highly xenophobic, mostly uncooperative ancestors to modern humans? According to many scholars, the fundamental difference between humans and non-human great apes lies in our increased prosociality. Four theories of how altruism could evolve can be summarized as follows:
1. The Big Mistake Hypothesis (humans lived in small groups mostly surrounded by kin for a long time, so increased prosociality developed in that setting and we continue to be prosocial now that we are surrounded by non-kin; Burnham & Johnson, 2005);
2. The Cultural Group Selection Hypothesis (groups with more altruists ultimately outcompeted groups with more selfish individuals, but the selection for altruistic behaviors would be cultural and not genetic; Richerson & Boyd, 2005);
3. The Interdependence Hypothesis (at some point humans became obligate cooperative foragers and hunters, and this created selection pressure, first for cooperative individuals and then for altruistic behaviors, for survival and procreation; Tomasello et al., 2012);
4. The Cooperative Breeding Hypothesis (humans, like some other primates but unlike the other great apes, are cooperative breeders, and the capacity and motivation to cooperate to raise offspring could have generalized to other domains and social relationships; Hrdy, 2011; Burkart, Hrdy & Van Schaik, 2009).
While humans do appear more prosocial than other great apes (and probably other primates in general), I believe that the emphasis on altruism and prosociality has prevented us from seeing how much successful cooperation is possible via social manipulation. Similarly, many developmental studies testing children’s prosociality lack proper controls to satisfactorily rule out selfish motivations for children’s cooperative behavior. In this paper, I first review some findings about the structure of human communication (in particular turn-taking, sequence organization and action formation).
Then I report on how it integrates with current approaches to animal communication, which are divided between a communication-as-manipulation perspective (Dawkins & Krebs, 1978; Owren et al., 2010) and a communication-as-transfer-of-information perspective (Smith, 1977; Seyfarth & Cheney, 2003). I will further present recent empirical evidence from orangutans and chimpanzees showing how successful cooperation can be achieved without altruistic or prosocial motivations, and will outline how it integrates with the perspective on the importance of social manipulation and mind-reading for communication developed by Krebs and Dawkins (1984). Finally, I will extract suggestions for factoring these findings into a new road map for the evolution of the language-ready brain.

Cooperation and human communication

Cooperative activities do not have to be things such as building a house together or draining a meadow (à la Hume). Simply engaging in a conversation is a cooperative act, and we do so every day, multiple times a day and for an extensive amount of time. The key importance of cooperation among participants for the success of communication during social interaction has a long scholarly tradition in the social sciences (see, e.g., Grice, 1975; Bruner, 1975). Human communication (and children’s capacity to acquire a language’s lexicon and grammar) rests crucially on (i) our ability to successfully cooperate with each other and (ii) the implicit assumption that our interlocutors are trying to be cooperative while communicating with us, i.e., that deception and manipulation are the exception, not the default. These assumptions are meant to facilitate inferential processes while processing communicative signals, including the relevance of potential deviations from what is otherwise expected (e.g. if, in response to a suggestion to go out for a walk, the recipient responds with “it’s raining”). The key problem here is how people make sense of each other’s signals and know how to respond properly. Conversation analysts (Sacks, Schegloff & Jefferson, 1974; Schegloff, 2007) have labeled this problem (i.e., producing and comprehending communicative moves) the action formation problem (from the speaker’s point of view) and the action recognition/ascription problem (from the recipient’s point of view). It concerns the resources that participants deploy in social interaction to make their actions (e.g. requests, offers, invitations, complaints, sanctions, compliments) recognizable and intelligible to co-participants so that they can obtain what they wish (Schegloff, 2007; Levinson, 2013). The way we design our communicative signals has to take into account how recipients will perceive, interpret and likely respond to those signals.
The cooperative assumption, as outlined for example in the Cooperative Principle by Grice (1975), facilitates this process by assuming that our interlocutors are being cooperative in designing their communications for us. Recent research on epistemic vigilance in children shows that only by age 4 do they seem to understand that others might be actively trying to deceive them and that they should therefore stop trusting them (see Mascaro & Sperber, 2009). In general, the low price that we place on most of our talk during ordinary conversation makes us underestimate how cooperative human communication is. It also makes most contemporary scholars forget that most animal communication systems do not require quite the level of cooperation that is observed in human communication. I already mentioned that almost 90% of questions in ordinary conversation obtain some form of response (Stivers et al., 2009). Moreover, in face-to-face interaction, the average time it takes to transition between a question and a response is only about 0.2 seconds (Stivers et al., 2009; Levinson, 2016).

Recently, psycholinguists have paid special attention to the timing between utterances (turn-taking) (see, e.g., Garrod & Pickering, 2004; Christiansen & Chater, 2016; Levinson, 2016). We now know that participants need to start planning their responsive turn before the previous turn is complete (see Levinson, 2016). In other words, the inferential processes at work in social interaction must be extensive. On the other hand, researchers interested in animal communication have begun making claims about the similarity between human turn-taking and turn-taking in the vocalizations of non-human primates (Chow et al., 2015; see also Burkart, this volume) and of different species of birds (see, e.g., Henry et al., 2015). While this research brings forward important information about the possible precursors to human turn-taking in conversation, it is important to emphasize that the study of turn-taking should not focus only on auditory signals but should also include visible actions/gestures. Moreover, there are at least four types of turn-taking systems easily identifiable across the animal kingdom (see Figure 1). The first type, ‘mimetic’, can be found in multiple species and basically occurs whenever two animals produce the same vocalization while interacting (e.g. a pant-hoot for chimpanzees or vervet alarm calls) but minimize overlap. This type of turn-taking does not require any special inferential process and can be elicited (primed) by the production of the first vocalization. The second type, ‘alternating’, can also be found in several species (see, e.g., marmosets; Burkart, this volume) but especially in birds. It occurs, for example, whenever animals take turns in producing their own song. This is cognitively more complex in that it requires the production of a vocalization different from the one perceived, while still requiring the capacity to minimize overlap and produce (mostly) non-overlapping vocalizations.
It is an open question to what degree alternation is a necessary precursor of the interpretative stage. The third type, ‘simple interpretative’, has been clearly documented both in humans (Schegloff, 1992) and in non-human great apes (see, e.g., Rossano, 2019; Rossano, 2013; Hobaiter & Byrne, 2011). This occurs when animal 1 produces signal A, which is not responded to adequately by animal 2 (signal B), and this consequently leads animal 1 to pursue or repair the initial signal by producing a modified version of it (signal A1). This usually results in animal 2 next providing a different response (signal C). Here multiple signals are not only produced in alternation, but signal A makes relevant a specific type of response, and in case of misunderstanding or an inapposite response, a signaler can flexibly modify the signal to solicit the appropriate response. The fourth type, ‘complex interpretative’, goes beyond the capacity to repair misunderstandings or pursue missing responses and rather reflects the capacity to achieve interactional projects that take multiple turns and require extensive turn exchanges. Humans indeed can converse for hours. This type of turn-taking can unfold via a long sequence of questions and answers (4a), as when ordering food for delivery or booking a flight over the phone, or via the exchange of long tellings (4b), as when friends are catching up after the holidays. In the latter case, especially in a multi-party situation, the following speakers might self-select and volunteer tellings that retain some similarity to the story just told and convey the recipient’s understanding of what has just been told (on second stories, see Sacks, 1992). Notably, the production of these stories and their congruity with the previous story are less normatively constrained than in question-answer sequences.

Figure 1. Different types of turn-taking in the animal kingdom. Different colors represent different signalers/speakers and letters identify signals (i.e. A is different from B, which is different from C, etc.). Panels:
1. MIMETIC: Same signal, different signalers/speakers
2. ALTERNATING: Different signal, different signalers/speakers
3. SIMPLE INTERPRETATIVE: A signal makes relevant specific responsive signals. If they are not produced next, a signaler might pursue a target response further
4a. COMPLEX INTERPRETATIVE: extended question-answer sequences
4b. COMPLEX INTERPRETATIVE: sequences of stories by multiple speakers

Differently from the mimetic and alternating types, in interpretative turn-taking systems individuals must deal with an ever-changing environment in which every new utterance is simultaneously “context shaped” and “context renewing” (Heritage, 1984). There is currently neither a systematic comparison nor an evolutionary theory accounting for the transition from one type of turn-taking system to another. Similarly, we do not know how widespread each type is in the animal kingdom. If we adopt a cognitive perspective in our approach to animal communication, producing an interpretative type of turn-taking is significantly more cognitively demanding than the other two types and is most likely rare. Indeed, the sequentiality of the turns is also meaningful.
To understand what an animal is trying to communicate, it is not sufficient to identify the composition of the signal; one must also identify its position in a sequence of signals within an interactional bout (Schegloff, 2007).
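The difference between the first three types can be made concrete with a small sketch. The Python below is an illustrative toy and not a method from the conversation-analytic or primatological literature: the `Turn` coding, the convention that a trailing digit marks a modified repeat (so that "A1" is a repair of "A"), and the function names `base_signal` and `classify_bout` are all assumptions introduced here. Its only purpose is to show why position in a sequence matters: a bout is labelled by the order and authorship of its signals, not by the signals alone. The fourth type is left out, since multi-turn interactional projects cannot be read off this kind of surface coding.

```python
from typing import List, Tuple

# Hypothetical coding of a bout: (signaler id, signal label).
# Assumption: a trailing digit marks a modified repeat, so "A1" repairs "A".
Turn = Tuple[str, str]


def base_signal(label: str) -> str:
    """Strip the repair index, mapping 'A1' back to 'A'."""
    return label.rstrip("0123456789")


def classify_bout(turns: List[Turn]) -> str:
    """Label a bout as one of the first three turn-taking types (toy rule)."""
    signalers = {who for who, _ in turns}
    if len(turns) < 2 or len(signalers) < 2:
        return "no turn-taking"

    labels = {label for _, label in turns}
    if len(labels) == 1:
        # Same signal, different signalers.
        return "mimetic"

    # Repair: the same signaler later produces a modified version of one of
    # its own earlier signals, i.e. it pursues a response it did not get.
    for i, (who, label) in enumerate(turns):
        for later_who, later_label in turns[i + 1:]:
            if (later_who == who and later_label != label
                    and base_signal(later_label) == base_signal(label)):
                return "simple interpretative"

    # Different signals, different signalers, no pursuit of a response.
    return "alternating"


print(classify_bout([("chimp1", "A"), ("chimp2", "A")]))
print(classify_bout([("bird1", "A"), ("bird2", "B"),
                     ("bird1", "A"), ("bird2", "B")]))
print(classify_bout([("ape1", "A"), ("ape2", "B"),
                     ("ape1", "A1"), ("ape2", "C")]))
```

The last example mirrors the A, B, A1, C sequence described for the simple interpretative type; under the stated coding assumptions it is the reappearance of a modified A from the same signaler, after B, that separates it from mere alternation.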

Animal communication, manipulation vs. information

The study of animal communication is divided between two main perspectives: the Manipulation and the Information perspective. The former focuses mostly on how natural selection has favored signals designed to influence others, and holds that communication is fundamentally competitive in nature (Dawkins & Krebs, 1978; Owren et al., 2010). Scholars who adopt this perspective mostly reject the idea that signals convey information (in particular semantic meaning; see Krebs & Dawkins, 1984). In contrast, the Information perspective relies heavily on Shannon and Weaver’s (1949) model of communication and on Wiener’s (1948) cybernetic theory of communication and feedback control. The idea is that communication consists in encoding and decoding information via the production and perception of signals. Signals encode information and the main task of the perceiver is to decode the signal. The Information perspective took hold with the advent of the cognitive revolution and the use of computational metaphors to describe animal communication. Smith (1977) famously suggested that a one-to-one meaning-signal mapping was too simplistic and that perceivers must instead invest effort to interpret the signals they are exposed to. Around the same time, the discovery that vervet monkey alarm calls might have clear semantic meanings (Seyfarth et al., 1980) lent further strength to a perspective that believed in the possibility of developing species-typical lexicons, especially for non-human primates. In general, ethologists and behavioral ecologists have long tried to avoid the use of cognitive terminology to make sense of animal communication, basically avoiding terms such as intentionality or theory of mind. This has led to a fascinating side effect: when ethologists talk about animal communication, there are really only two types: human communication vs. the communication of all other animal species.
While the idea of a single framework for all animal species except humans might seem appealing on the surface, it blinds researchers to the fact that the cognitive abilities and social life of a chimpanzee do not resemble those of moths, ants, spiders, etc. For example, we know that great apes recognize each other’s goals by inferring them from others’ behaviors. Great apes know what other individuals can see from their position (Hare et al., 2000) and know what others know (Hare et al., 2001). Furthermore, a recent study has shown that great apes can also pass the implicit version of the false-belief task (Krupenye et al., 2016). In other words, great apes exhibit all the basic skills related to what has been called Theory of Mind (Premack & Woodruff,
1978). In what way is this relevant for our theories about language evolution? And most importantly, how do these abilities map onto a cooperative vs. a competitive view on animal (and human) communication?

Social manipulation, mind-reading, ontogenetic ritualization

Revising their original chapter on information vs. manipulation in animal signals (Dawkins & Krebs, 1978), Krebs and Dawkins (1984) connect the ability to manipulate objects to the ability to manipulate the behavior of other individuals and, most importantly, add the capacity to mind-read others (i.e. recognizing others’ intentions and desires and anticipating how they will behave when confronted with different contextual situations) as key for the development of successful animal communication. They write: “Mind-reading and manipulation are not isolated phenomena. They are intimately locked together in evolutionary arms races and feedback loops. Mindreading is a prerequisite for the evolution of manipulation. Manipulation evolves as an evolutionary response to mind-reading. Mind-reading and manipulation coevolve, and signals are the results of this coevolution.” (1984: 283) They also add that it is not surprising that communicative signals are usually derived from activities and behaviors that were not communicative to begin with. Indeed, this is an easy way to deal with the issue of recognizability of the signal. If one can read others’ intentions or recognize some basic behavioral pattern and its consequences, then just seeing the beginning of that sequence should allow one to project what is going to happen next and lead to a reliable response. While at the time of their writing they were talking about phylogenetic ritualization (Tinbergen, 1952; Huxley, 1966), it is now clear that this same process can be observed in non-human primates within their own life span.
Tomasello and colleagues (1994) proposed that the learning process involved in great ape gestural development was ontogenetic ritualization, in which individuals learn their gestures in the context of regularly occurring dyadic interactions such that parts of fully functional social behaviors become ritualized (see Caselli et al., 2012 for similar claims concerning human children). Recent work by Rossano and colleagues (Halina, Rossano & Tomasello, 2013; Rossano, under review) has not only provided evidence for this process in the gestural development of young bonobos, but also shown how fast this process can be, contrary to general expectations (as fast as 2 or 3 weeks, according to Rossano, under review). Arbib and colleagues have further shown that this process is computationally plausible once dyadic brain modeling and mirror neurons are taken into account (Arbib et al., 2014), and roboticists have shown that it is easier for robots to converge on a shared communicative repertoire by relying on ontogenetic ritualization than on imitation (Spranger & Steels, 2014).

One interesting suggestion posited by Krebs and Dawkins (1984) is that communicative signals might develop both in a competitive and in a more cooperative (e.g. among kin) context, and that one way to tease apart what led to the development of a signal is to focus on whether the signal is amplified, loud and redundant (probably a competitive origin) or rather reduced (probably a cooperative origin). As they phrase it: “the evolution of cooperative signaling should lead not to loud, exaggerated, repetitive, conspicuous signals, but to cost-minimizing conspiratorial whispers.” (Krebs & Dawkins, 1984: 319). Following Krebs and Dawkins’ model, gestures developed through ontogenetic ritualization should develop in cooperative situations, and indeed current evidence suggests that mother-infant interaction and play situations between juveniles are the main contexts in which ritualization of gestures has been documented. But what about other contexts? How are signals learned or developed between non-kin? It is clear that ontogenetic ritualization is not sufficient by itself to develop the kind of lexicon typical of human languages, nor would it be ideal for one-off interactions with strangers. Are signals between non-kin part of their biologically predisposed repertoire (see Byrne et al., 2017 for a recent review) or rather the byproduct of other processes? And are they produced to exchange information or to manipulate others? The answer most likely lies in the middle, and some of my ongoing research suggests that manipulation (and not just information transfer) might be equally at play in social interaction between great apes. The goal here is to outline some of these findings to revise our understanding of what was likely in place with LCA-a (apes).
First, from a longitudinal project on gesture development in non-human great apes, we are finding compelling evidence for imitation of non-communicative behavior between juveniles and adult females with whom they spend a large amount of time. While these juveniles do not imitate much of their mother’s behavior, once they begin spending more time away from her and interacting more with others (especially non-kin), they can be observed picking up idiosyncratic behaviors of their adult partners (e.g. rolling sideways to move from one location to another rather than walking, but only when the non-kin adult female is co-present) and reproducing them while being observed by these partners (Rossano, in preparation). These behaviors resemble what Blakemore (2012) recently documented in her studies on the social brain of human adolescents, and they fit nicely with the homophily theory of cultural evolution put forward by Haun and Over (2015), in that it suggests that juveniles would imitate other adults not just in their communication but also in their non-communicative behavior to facilitate their social integration and increase social interconnectedness. This is similar to the claim that the goal of imitation in human children is not just to acquire successful techniques but also to behave like a group member and therefore learn to fit in (see Over &
Carpenter, 2012 and all research on over-imitation in preschoolers, e.g. Nielsen & Blank, 2011). Second, from a longitudinal food sharing project in orangutans conducted at Leipzig Zoo over 10 sessions each year in 2010, 2012, 2014 and 2016 (see Rossano & Liebal, 2014; Liebal & Rossano, 2017; and Kaufhold & Rossano, in preparation), we found that requesting is not only very rarely used to access food controlled by another individual (less than 5% of the time in each yearly instantiation of the food sharing task) but is also far less effective than stealing (approximately 33% effective vs. 80% for stealing). The question is therefore not why apes steal but rather why they would ever request. A closer look at the data shows that the individuals requesting are either adult females towards the male or, less frequently, juveniles towards their own mothers at a time when they have begun spending more time away from them. It thus appears that requesting is a social test of the strength of a relationship, and that the granting of the request is a measure of that strength. Instead of looking at requesting as a display of deference or as evidence of an orientation towards property norms of any kind, we can see it as a manipulative test used to decide how to behave next: granting of a request does not lead to the end of harassment or of requesting but rather to a further request (see Kaufhold & Rossano, in preparation). Third, in a series of studies aimed at testing how orangutans, the least gregarious genus of great apes, interact with each other when presented with cooperation problems, we observed clear evidence of social manipulation and social tool use, both between mothers and infants and between adult individuals (see Völter, Rossano, & Call, 2015; 2016). In a first study we presented three orangutan mother-offspring dyads with different situations in which we placed a food reward outside their enclosure.
Crucially, the offspring (all juveniles were of similar age; mean age = 4 years), unlike their mothers, could reach the food reward due to their smaller body size. All orangutan mothers not only stole the food from their offspring once the offspring had obtained it, but they even proactively manipulated their offspring’s actions. The orangutan mothers actively coerced their offspring into retrieving the food by carrying them to the locations where they could access the food, pushing their hands and bodies toward the food, and pulling them back once they had grabbed it. Crucially, this social tool use depended on the offspring’s willingness to complete the required actions (i.e. grabbing the food). That is to say, orangutan mothers could coerce their infants into performing only parts of the solution; they could not force them to grab the food. Their actions, therefore, resembled physical tool use (e.g. using a stick to rake in an out-of-reach reward) but could not be reduced to it, because they had to take the self-controlled actions of the social tool into account. Next, we presented three adult orangutans with a cooperative problem-solving apparatus. One individual
received the tool but only another individual in the adjacent compartment of the enclosure could insert it into the apparatus. We found that the orangutan females passed the tool spontaneously and reliably to each other and maintained cooperation even when they knew they would not receive food in some trials. Yet when we presented the individual holding the tool with a nonsocial alternative (i.e. an apparatus that allows the individual to obtain food only for herself, without cooperating with conspecifics), orangutans passed the tool on to their partner only if they could obtain a higher-value food reward by cooperating with the partner. Passing of the tool dropped from 100% to 35% overall and, most importantly, to only 9% when the food that they could access by themselves was equal to what they could get by collaborating with the others (note that in the latter case, cooperation would lead the others to also get a reward). In other words, the apparent altruism between orangutans observed in the initial study turned out to be simply the consequence of the lack of alternatives. Once an alternative with an equal payoff was provided, all concerns for others’ welfare disappeared and the behavior changed dramatically.
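The pattern reported above is what a purely self-interested decision rule would produce. The sketch below is a hypothetical illustration and not the analysis used in the original studies: the function name `passes_tool`, the numeric reward values, and the strict-inequality rule are all assumptions made here. The partner's payoff is accepted as an argument but deliberately ignored, to make explicit that concern for the other individual plays no role in the rule.

```python
from typing import Optional


def passes_tool(coop_reward: float,
                solo_reward: Optional[float],
                partner_reward: float = 0.0) -> bool:
    """Toy decision rule for a purely self-interested actor.

    coop_reward:    what the actor gets by passing the tool to the partner
    solo_reward:    what the nonsocial apparatus yields, or None if absent
    partner_reward: what the partner gets; intentionally unused
    """
    if solo_reward is None:
        # No nonsocial alternative: passing is the only route to food.
        return True
    # With an alternative, pass only for a strictly better own payoff.
    return coop_reward > solo_reward


# Hypothetical reward values in arbitrary units.
print(passes_tool(1.0, None))                     # no nonsocial alternative
print(passes_tool(2.0, 1.0))                      # cooperation pays more
print(passes_tool(1.0, 1.0, partner_reward=1.0))  # equal payoff for the actor
```

Under these assumptions the rule passes the tool in the first two cases and withholds it when the actor's own payoffs are equal, whatever the partner stands to gain. The 9% of passes still observed at equal payoffs lies outside what such a deterministic toy can represent.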

Towards a new road map In this paper I have tried to outline some of the open questions concerning some key steps towards language evolution that I believe are either currently neglected or that deserve further attention. In particular I outlined how just focusing on how signals (and participants capacity to interpret them) might develop does not yet explain what type of motivation is required to actually deal with those signals in a way that would be somehow beneficial for both the signaler and the perceiver of the signal. Without the consistent production of appropriate responses to the production of communicative signals, there would be no point in producing any signal. If language is a tool to accomplish things with others, we need to understand what would lead to cooperation and ultimately to the willingness to engage in a conversation. The first step consists in avoiding the common misconception that given the cooperative nature of human communication, language use requires some prosocial attitude. A great deal of cooperation can occur while each participant in the interaction is selfishly attempting to maximize their own benefits or minimizing damaging consequences. Cooperation ultimately requires working together towards a goal and while it requires understanding the complexity of the task and some basic planning abilities (e.g. understanding that the help of another individual is needed to successfully obtain food), it does not require taking into account the benefits for the other individual(s) involved in the process nor being motivated by the other individual’s welfare.


I have claimed that turn-taking can be achieved via different levels of cognitive complexity and argued that interpretive turn-taking requires considerable cognitive abilities that great apes seem to possess, at least to some degree. But the evolutionary roots of human turn-taking should be sought not just in primates’ vocalizations but also in their gestural exchanges, which are significantly more flexible and adaptive. Also, human communication is clearly not just about conveying information nor just about manipulating others; indeed, achieving social affiliation via sharing emotional states (especially displaced ones, like how excited we felt yesterday or how concerned we are about tomorrow) might be another key function of language use (see Semendeferi, this volume). Hence a more complex model that takes into account the interplay of different communicative and social motivations seems necessary. Here I have only provided a glimpse of the current research that is showing plenty of evidence of social manipulation in non-human primates. Most importantly, given our awareness of the occurrence of social manipulation in human adults, it seems necessary to reconsider to what degree cooperation by young children is manipulative. When Krebs and Dawkins presented a theoretical model of communication as manipulation by including mind-reading and ritualization into the picture, there was little empirical evidence to support their claims for nonhuman primates. Now we are beginning to see multiple facets of manipulation and social tool use in great apes, and this capacity should be further studied in other non-human primates and social animals. The last two decades have also seen a growing number of studies suggesting other-regarding preferences in nonhuman primates (that are not limited to kin), in particular in brown capuchin monkeys and chimpanzees.
Long-term characteristics of social relationships (including friendships, dominance, and kinship) appear to have a larger impact on prosociality than short-term interactions, for instance in the form of calculated reciprocity (Amici et al., 2014). However, the robustness of these findings is weakened by many non-replications. Considerable inter-individual and inter-group variance seems to be a contributing factor for the low replication rate, together with methodological differences across studies. Identifying the precise determinants of this high variance in results across studies will be a tedious but critical endeavor in future research. A more thorough investigation of the motivation for cooperation (especially during social interaction and in response to communicative signals) will be a key step towards developing a more compelling model of what led to language evolution and humanlike social interaction and socio-cognitive abilities.


Funding

The paper was prepared for a workshop funded by NSF Grant No. BCS-1343544 “INSPIRE Track 1: Action, Vision and Language, and their Brain Mechanisms in Evolutionary Relationship” (M. A. Arbib, Principal Investigator).

References

Amici, F., Aureli, F., Mundry, R., Amaro, A. S., Barroso, A. M., Ferretti, J., Call, J. (2014a). Calculated reciprocity? A comparative test with six primate species. Primates 55, 447–457. https://
Arbib, M. A., Ganesh, V. & Gasser, B. (2014). Dyadic Brain Modeling, Ontogenetic Ritualization of Gesture in Apes, and the Contributions of Primate Mirror Neuron Systems. Phil Trans Roy Soc B, 369 (1644), 20130414. https://
Blakemore, S.-J. (2012). Development of the social brain in adolescence. Journal of the Royal Society of Medicine, 105(3), 111–116. https://
Bruner, J. S. (1975). The ontogenesis of speech acts. Journal of child language, 2(1), 1–19. https://
Burkart, J. M., Hrdy, S. B., & Van Schaik, C. P. (2009). Cooperative breeding and human cognitive evolution. Evolutionary Anthropology: Issues, News, and Reviews, 18(5), 175–186. https://
Burnham, T. C., & Johnson, D. D. (2005). The biological and evolutionary logic of human cooperation. Analyse & Kritik, 27(1), 113–135. https://
Byrne, R. W., Cartmill, E., Genty, E., Graham, K. E., Hobaiter, C., & Tanner, J. (2017). Great ape gestures: intentional communication with a rich set of innate signals. Animal Cognition, 1–15.
Caselli, M. C., Rinaldi, P., Stefanini, S., & Volterra, V. (2012). Early action and gesture “vocabulary” and its relation with word comprehension and production. Child development, 83(2), 526–542.
Chow, C. P., Mitchell, J. F., & Miller, C. T. (2015). Vocal turn-taking in a non-human primate is learned during ontogeny. In Proc. R. Soc. B (Vol. 282, No. 1807, p. 20150069). The Royal Society.
Christiansen, M. H., & Chater, N. (2016). The Now-or-Never bottleneck: A fundamental constraint on language. Behavioral and Brain Sciences, 39. https://
Dawkins, R., & Krebs, J. R. (1978). Animal signals: information or manipulation. Behavioural ecology: An evolutionary approach, 2, 282–309.
Garrod, S., & Pickering, M. J. (2004). Why is conversation so easy? Trends in cognitive sciences, 8(1), 8–11. https://
Grice, H. P. (1975). Logic and Conversation. In P. Cole & J. L. Morgan (Eds.), Syntax and semantics. Volume 3: Speech Acts (pp. 225–242). New York: Seminar Press.
Halina, M., Rossano, F., & Tomasello, M. (2013). The ontogenetic ritualization of bonobo gestures. Animal cognition, 16(4), 653–666. https://
Hare, B., Call, J., Agnetta, B., & Tomasello, M. (2000). Chimpanzees know what conspecifics do and do not see. Animal Behaviour, 59(4), 771–785. https://

Hare, B., Call, J., & Tomasello, M. (2001). Do chimpanzees know what conspecifics know? Animal behaviour, 61(1), 139–151. https://
Haun, D., & Over, H. (2015). Like me: a homophily-based account of human culture. In Epistemological dimensions of evolutionary psychology (pp. 117–130). Springer New York.
Henry, L., Craig, A. J., Lemasson, A., & Hausberger, M. (2015). Social coordination in animal vocal interactions. Is there any evidence of turn-taking? The starling as an animal model. Frontiers in psychology, 6, 1416.
Heritage, J. (1984). Garfinkel and Ethnomethodology. Cambridge: Polity Press.
Hobaiter, C., & Byrne, R. W. (2011). Serial gesturing by wild chimpanzees: its nature and function for communication. Animal cognition, 14(6), 827–838. https://
Hrdy, S. B. (2011). Mothers and others. Harvard University Press.
Huxley, J. (1966). The ritualization of Behaviour in animals and man. Philosophical Transactions of the Royal Society Vol. 251, 772, 249–269. https://
Kinzler, K. D., Dupoux, E., and Spelke, E. S. (2007). The native language of social cognition. Proc. Natl. Acad. Sci. U. S. A. 104, 12577–80. https://
Krebs, J. R. & Dawkins, R. (1984). Animal signals: mindreading and manipulation. In J. R. Krebs, N. B. Davies (Eds.), Behavioural Ecology: an Evolutionary Approach (2nd edn), Blackwell Scientific Publications, Oxford (1984), pp. 380–402.
Krupenye, C., Kano, F., Hirata, S., Call, J., & Tomasello, M. (2016). Great apes anticipate that other individuals will act according to false beliefs. Science, 354(6308), 110–114. https://
Levinson, S. C. (2013). Action Formation and Ascription. In J. Sidnell & T. Stivers (Eds.), Handbook of conversation analysis (pp. 103–130). Malden, MA: Wiley-Blackwell.
Levinson, S. C. (2016). Turn-taking in human communication–origins and implications for language processing. Trends in cognitive sciences, 20(1), 6–14. https://
Liebal, K., & Rossano, F. (2017). The give and take of food sharing in Sumatran orang-utans, Pongo abelii, and chimpanzees, Pan troglodytes. Animal Behaviour, 133, 91–100. https://
Mascaro, O., & Sperber, D. (2009). The moral, epistemic, and mindreading components of children’s vigilance towards deception. Cognition, 112(3), 367–380. https://
Nielsen, M., & Blank, C. (2011). Imitation in young children: when who gets copied is more important than what gets copied. Developmental psychology, 47(4), 1050. https://
Over, H., & Carpenter, M. (2012). Putting the social into social learning: explaining both selectivity and fidelity in children’s copying behavior. Journal of Comparative Psychology, 126(2), 182. https://
Owren, M. J., Rendall, D., & Ryan, M. J. (2010). Redefining animal signaling: influence versus information in communication. Biology & Philosophy, 25(5), 755–780. https://
Premack, D., & Woodruff, G. (1978). Does the chimpanzee have a theory of mind? The Behavioural and Brain Sciences, 1, 515–526. https://
Richerson, P. J., & Boyd, R. (2005). Not by genes alone. University of Chicago Press.
Rossano, F. (2013). Sequence organization and timing of bonobo mother-infant interactions. Interaction studies, 14(2), 160–189. https://

Rossano, F. (2019). The structure and timing of human versus primate social interaction. In Hagoort, P. (ed.) Human Language: from Genes and Brains to Behavior. MIT Press, 201–219.
Rossano, F., & Liebal, K. (2014). ‘Requests’ and ‘offers’ in orangutans and human infants. Requesting in social interaction, 333–362.
Sacks, H. (1992). Lectures on Conversation: Volume II. Oxford: Blackwell.
Sacks, H., Schegloff, E. A., & Jefferson, G. (1974). A Simplest Systematics for the Organization of Turn-Taking for Conversation. Language, 50, 696–735. https://
Schegloff, E. A. (1992). Repair after next turn: the last structurally provided for place for the defense of intersubjectivity in conversation. American Journal of Sociology, 95(5), 1295–1345. https://
Schegloff, E. A. (2007). Sequence Organization in Interaction: a Primer in Conversation Analysis. Cambridge, England: Cambridge University Press. https://
Seyfarth, R. M., & Cheney, D. L. (2003). Signalers and receivers in animal communication. Annual review of psychology, 54(1), 145–173. https://
Seyfarth, R. M., Cheney, D. L., & Marler, P. (1980). Vervet monkey alarm calls: semantic communication in a free-ranging primate. Animal Behaviour, 28(4), 1070–1094. https://80097-2
Shannon, C. E., & Weaver, W. (1949). The mathematical theory of communication. Urbana, Ill. Univ. Illinois Press, 1, 17.
Silk, J. B., Brosnan, S. F., Henrich, J., Lambeth, S. P., & Shapiro, S. (2013). Chimpanzees share food for many reasons: the role of kinship, reciprocity, social bonds and harassment on food transfers. Animal behaviour, 85(5), 941–947. https://
Smith, W. J. (1977). The behavior of communicating. Harvard University Press.
Spranger, M., & Steels, L. (2014, October). Discovering communication through ontogenetic ritualisation. In Development and Learning and Epigenetic Robotics (ICDL-Epirob), 2014 Joint IEEE International Conferences on (pp. 14–19). IEEE.
Stivers, T., Enfield, N. J., Brown, P., Englert, C., Hayashi, M., Heinemann, T., et al. (2009). Universals and cultural variation in turn-taking in conversation. PNAS, 106(26), 10587–10592. https://
Tinbergen, N. (1952). “Derived” activities; their causation, biological significance, origin, and emancipation during evolution. The Quarterly review of biology, 27(1), 1–32. https://
Tomasello, M. (2009). Why we cooperate. Cambridge, MA: The MIT Press.
Tomasello, M., Call, J., Nagell, K., Olguin, R., & Carpenter, M. (1994). The learning and use of gestural signals by young chimpanzees: A trans-generational study. Primates, 35(2), 137–154. https://
Tomasello, M., Carpenter, M., Call, J., Behne, T., & Moll, H. (2005). Understanding and sharing intentions: The origins of cultural cognition. Behavioral and brain sciences, 28(5), 673–727. https://
Tomasello, M., Melis, A. P., Tennie, C., Wyman, E., Herrmann, E., Gilby, I. C. & Melis, A. (2012). Two key steps in the evolution of human cooperation: the interdependence hypothesis. Current anthropology, 53(6). https://

Trivers, R. L. (1971). The evolution of reciprocal altruism. The Quarterly review of biology, 46(1), 35–57. https://
Uetz, G. W., Roberts, J. A., & Taylor, P. W. (2009). Multimodal communication and mate choice in wolf spiders: female response to multimodal versus unimodal signals. Animal Behaviour, 78(2), 299–305. https://
Völter, C. J., Rossano, F., & Call, J. (2015). From exploitation to cooperation: social tool use in orang-utan mother–offspring dyads. Animal Behaviour, 100, 126–134. https://
Völter, C. J., Rossano, F., & Call, J. (2016). Social manipulation in nonhuman primates: Cognitive and motivational determinants. Neuroscience & Biobehavioral Reviews.
Wiener, N. (1948). Cybernetics: Control and communication in the animal and the machine (p. 194). New York: Wiley.
Wilson, M. L., Boesch, C., Fruth, B., Furuichi, T., Gilby, I. C., Hashimoto, C., Hobaiter, C. L., Hohmann, G., Itoh, N., Koops, K., et al. (2014). Lethal aggression in Pan is better explained by adaptive strategies than human impacts. Nature 513, 414–417. https://

Language origins
Fitness consequences, platform of trust, cooperation, and turn-taking

Sławomir Wacewicz and Przemysław Żywiczyński
Nicolaus Copernicus University, Poland

In this paper, we complement proximate or ‘how’ explanations for the origins of language, broadening our perspective to include fitness-consequences explanations, i.e. ultimate, or ‘why’ explanations. We identify the platform of trust as a fundamental prerequisite for the development of a language-like system of symbolic communication. The platform of trust is a social niche in which cheap but honest communication with non-kin is possible, because messages tend to be trusted as a default. We briefly consider the place of the platform of trust on the road map as laid out in the Mirror System Hypothesis. We then turn to recent research on turn-taking in primates, which has been proposed as a precursor of the cooperative structuring of conversation in humans. We suggest, instead, that human turn-taking, in its full richness that makes it an interesting explanatory target, may only appear in a communicative system that is already founded on a community-wide, cooperative platform of trust.

Keywords: Mirror System Hypothesis, language evolution, language origins, cooperation, turn-taking, conversation, trust, proximate explanations, ultimate explanations

1. Introduction

Human communication is uniquely founded on a “platform of trust” (Wacewicz, 2015): we tend to accept utterances even if no immediate evidence is present to back them up. While deception is always a possibility, it is nevertheless generally contained within manageable limits, and humans typically use language to provide non-kin with cheap but honest information (see Rossano, this issue), which is trusted at least as a default. Although intuitively obvious, this type of behavior is an evolutionary puzzle and its emergence runs counter to the predictions of evolutionary game theory – as we explain in Section 4.

https:// © 2020 John Benjamins Publishing Company


We consider the cooperative platform of trust to be a “deep” design feature of language, i.e., one that is not immediately visible at the surface level as a structural property or a cognitive requirement for language. For this reason, the platform of trust is absent from the standard lists of design features of language (Hockett, 1963; but also cf. Hauser et al., 2002). This “deep design feature” is also conspicuously missing from the Mirror System Hypothesis (MSH; Arbib, 2012, 2016), which is otherwise arguably the most complete account of language origins at the level of implementation mechanisms. In what follows, we depart from the level of implementation and rigorously assess the “fitness consequences” of the evolution of the platform of trust, which allows us to demonstrate its importance to language origins hypotheses, including the MSH. Importantly, the platform of trust is understood in socio-ecological terms, so that, for example, no psychological sense of trust is implied. Our core “platform of trust” argument is an extension of that put forward by Knight (e.g. 1998), and in a larger sense we take it to be a version of a consensus view that is different from previous proposals (e.g. Tomasello, 2008; Zlatev, 2014; see Section 4) mostly in a consistent focus on the fitness-consequences perspective.

2. Communication: A fitness-consequences perspective

Following Tinbergen (1963), behavioral science distinguishes between several levels of analysis which, rather than being mutually exclusive, are all needed for an integrative understanding of an adaptation (whether morphological or behavioral). A majority of contributions in the present issue work from the perspective of phylogenetic continuity, often with a focus on proximate mechanisms (“what” or “how” explanations). This paper is different in having behavioral ecology as its point of departure, and consequently starting from what have been called ultimate (or “why”) explanations, which are “concerned with the fitness consequences of a trait or behavior and whether it is (or is not) selected” (Scott-Phillips et al., 2011: 38) and abstract away from implementation details. For behavioral adaptations, rather than assessing their engineering viability, this perspective focuses on the ecological and especially socio-ecological conditions under which a behavioral strategy can invade a population and become evolutionarily stable. The difference between the “engineering” and Darwinian “fitness consequences” perspectives is particularly important for studying interaction between multiple agents. Communication in non-human animals is one such example. From the genetic “fitness consequences” perspective the details of neurocognitive implementation are irrelevant; instead, what distinguishes communicative from praxic action is the nature of the payoff. In praxic action the payoff is direct, unlike in communicative action, where it depends on the praxic action of another (cf. communication as “a means by which one animal makes use of another animal’s muscle
power”, Krebs and Dawkins, 1984). Successful reaching requires good motor skills, but a successful request requires compliance of the requestee (who typically prefers to tend to their own wellbeing). To be evolutionarily successful in a Darwinian world, individuals use heuristic calculations of costs and benefits to assess what course of action is profitable, including whether it pays to produce honest signals or react to others’ signals.

3. The platform of trust

It is typically assumed that in most animal populations, including primate societies, the default result of these subconscious quasi-economic calculations is negative. Under standard circumstances, the strategy of producing honest informative signals, or reacting to others’ signals, is outcompeted by alternative strategies. In most contexts, communicating “dishonestly” and manipulating the receiver to one’s own advantage will result in greater fitness benefits than communicating “truthfully”, leading natural selection to favor manipulators over “honest” signalers. The presence of manipulators, in turn, puts receivers at risk of losing fitness from “deceit”. As a result, they are selected to consider signals as potentially deceptive – a default that only changes when signal honesty can be reliably ascertained on independent grounds. By this simple logic, first expounded by Krebs and Dawkins (1984), honest communication is not an evolutionarily stable strategy and is not normally expected to evolve. Exceptions abound due to several mechanisms that remove the above honesty constraint by making dishonesty impossible or counterproductive (e.g. Maynard Smith and Harper, 2003; Fitch, 2004; Wacewicz and Zywiczynski, 2012). Main examples include handicaps (inherently high signal cost, see esp. Zahavi and Zahavi 1997), indexes (content strictly dependent on form), kin selection (messages selectively directed to one’s relatives), reciprocity (e.g. exchange of mutually honest signals over a history of repeated interactions), or normative regulations (e.g. external penalty for signaling deceptively). Importantly, none of them explains the emergence of language, which is cheap to produce, arbitrary, not limited to kin or stable dyads, and is itself the medium necessary for formulating explicit codices. From this perspective, the most important and most unusual property of language is large-scale “information donation” (e.g. van Schaik, 2016): communicators share honest messages with biologically unrelated individuals, who, in their turn, mostly accept such messages without demanding evidence. The qualitative difference between human and animal communication is thus flipping the default setting from expecting manipulation to expecting honesty. In other words, language is founded on a global, i.e. community-wide, platform of trust (Wacewicz,
2015).¹ Note that lying, common in human communication, is a case in point rather than a counterexample, because it constitutively requires trust: it is not possible to lie unless receivers are predisposed to trust the message at face value. Information donation does exist in other animals, including primates, as exemplified by food calls and alarm calls. However, such information donation is limited to a small, closed set of narrowly specified contexts, and honesty appears to be externally guaranteed by ecological factors such as minimizing predation risk in food calling, or by kin selection and handicaps in alarm calls (see Maynard Smith and Harper, 2003; Zahavi and Zahavi, 1997). This is in stark contrast with language, where the sharing of information is ubiquitous, pervasive and domain-independent. Indeed, the very powerful human “drive or need to share thoughts and feelings” has been proposed as one of the most distinctive characteristics of our species (Fitch, 2010: 140).

4. Cooperation

In language evolution research, many see the above honesty constraint as the single most important constraint on the emergence of language (e.g. Power, 2014), and the origins of large-scale sharing of honest information as the “central puzzle” of language evolution (Knight, 1998; Fitch, 2010). A common understanding is that this qualifies as a subtype of the general problem of the evolution of cooperation. The cooperative nature of language requires a comment. While there is broad agreement that human language is indeed a “cooperative” communication system, attention to definitions is important (as is generally the case in interdisciplinary research). To different researchers, the cooperative nature of language may manifest itself, for example, in the collaborative design of conversation (Grice, 1975; Clark, 1996), in the rules of collaborative face maintenance (cf. Hurford, 2007: 274), or in the use of symbolic communication for better coordination of multiperson collaborative tasks (Gärdenfors, 2004). Similarly, in the various literatures relevant to language evolution, “cooperation” more generally is used in a loose way close to the fuzzy vernacular meaning, and may simply indicate “collective action” of more than one individual resulting in a net benefit (e.g. Rossano, this issue), or even coordinated joint action in general. In such contexts, we prefer the term collaboration. In contrast, for the reasons explained in Wacewicz et al. (2017) and consistent with the fitness-consequences perspective, we follow here the game-theoretic construal of cooperation.
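The honesty constraint from Section 3, read under the game-theoretic construal adopted here, can be illustrated with a toy model. The following is only a minimal sketch: the payoff numbers, the parameter name `manipulation_gain`, and the use of a Nash-equilibrium check as a simplified stand-in for evolutionary stability are all assumptions made for illustration and are not taken from the literature cited above.

```python
# Toy one-shot signaler/receiver game. All payoff values are hypothetical
# and chosen only to illustrate the logic of the honesty constraint.
SIGNALER = ("honest", "manipulate")
RECEIVER = ("trust", "ignore")


def payoffs(signaler, receiver, manipulation_gain=2.0):
    """Return (signaler_payoff, receiver_payoff) for one interaction.

    Signals are assumed to be cheap: producing one costs nothing,
    so an ignored signal yields zero to both parties.
    """
    if receiver == "ignore":
        return 0.0, 0.0
    if signaler == "honest":
        return 1.0, 1.0
    # A trusted manipulative signal pays the signaler and harms the receiver.
    return manipulation_gain, -1.0


def is_nash(signaler, receiver, manipulation_gain=2.0):
    """True if neither party gains by unilaterally switching strategy."""
    s_pay, r_pay = payoffs(signaler, receiver, manipulation_gain)
    s_ok = all(payoffs(s, receiver, manipulation_gain)[0] <= s_pay for s in SIGNALER)
    r_ok = all(payoffs(signaler, r, manipulation_gain)[1] <= r_pay for r in RECEIVER)
    return s_ok and r_ok


def stable_profiles(manipulation_gain=2.0):
    """List every (signaler, receiver) strategy pair that passes is_nash."""
    return [(s, r) for s in SIGNALER for r in RECEIVER
            if is_nash(s, r, manipulation_gain)]


# Conflicting interests: manipulating a trusting receiver pays more than honesty.
print(stable_profiles(manipulation_gain=2.0))
# Aligned interests (for example, close kin): manipulation pays less than honesty.
print(stable_profiles(manipulation_gain=0.5))
```

Under these assumed payoffs, the honest/trusting pair is not stable when manipulation pays more than honesty, and only the manipulate/ignore pair survives the check. When manipulation pays less than honesty, the honest/trusting pair becomes stable as well, which loosely mirrors the role of mechanisms such as kin selection in removing the honesty constraint.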

1.  See also Tan et al., 2017 for a similar concept of “circle of trust”; and Knight, 1998 for the original argument on trust.


Cooperative behavior is a classic problem in game theory because non-cooperative behavior typically pays off more, i.e. “purely selfish alternatives most often provide superior fitness” (Sachs and Rubenstein, 2007). Information donation – providing others with veridical information – meets the technical definition of cooperation (e.g. West et al., 2011), and as such, information donation is subject to similar game-theoretic constraints as the well-known constraints on evolution and stability of cooperative behavior in general (Axelrod, 1984), most importantly vulnerability to “cheaters” (see also Fitch, 2010: 414–417 for an explanation). Just as large-scale cooperation is the exception rather than the rule in animal behavior more generally, large-scale information donation is similarly the exception rather than the rule in animal communicative behavior.²

4.1 The evolutionary origins of human cooperation

What are the phylogenetic roots of cooperative signaling in humans? One possibility is that “low-level” interactive mechanisms already present in the LCA-a (as inferred from studies on extant apes) – such as proxemic alignment, postural mirroring or the matching of mannerisms – could scale up to yield a level of trust necessary to support a cooperative signaling system. In a recent paper (Wacewicz et al., 2017) we review a number of mechanisms, e.g. building psychological affiliation based on the similarity of subtle nonverbal cues, which could help stabilize the precarious cooperative character of a signaling system. Although evidence shows that such mechanisms indeed work to bolster cooperation, it is highly unlikely that they would alone suffice to install a platform of trust. Note that when such mechanisms are at work in non-human apes, they tend to be limited to strongly bonded dyads, suggesting that they could at best install isolated islands of trust but not a group-wide platform of trust.
2. Communication in honeybees constitutes a well-known example of reliable and systematic information donation. This is consistent with the present account, because cooperative communication, together with other cooperative behaviors within the bee colony, is stabilized against defection by kin selection. Here, selfish alternatives are not superior, because hymenopteran full sisters are very close genetic kin, so any “cheating” would automatically lower their own inclusive genetic fitness (see Krebs and Dawkins, 1984, for discussion).

The alternative is that the platform of trust is a result of deeper and more overarching human cooperative dispositions. From a game-theoretic perspective, cooperatively informing genetically unrelated individuals is not qualitatively different from other manifestations of cooperation, including the more obvious kind of information donation, i.e. teaching, but also active food sharing or joint alloparental care for immature offspring. What they have in common in terms of fitness is that they all contradict the “economicus” stance of maximizing short-term gains. Humans exhibit a large suite of such cooperative behaviors, which has been taken to suggest that these behaviors all stem from a common, underlying cooperative basis, domain-general rather than limited to communication (e.g. Hare, 2017; Knight and Lewis, 2017; Burkart et al., 2009; Burkart et al., 2014; see also Burkart et al., this issue). Importantly, humans demonstrate the above behaviors uniquely among apes, and near-uniquely among primates (see below). Conversely, available ethological (e.g. Mitani, 2009) and experimental (e.g. Tomasello, 2008) evidence suggests that cooperation, while not completely absent, is severely limited in non-human apes: “it is arguably true that cooperation, with kin and nonkin alike, is a hallmark of humankind, setting us apart in a significant way from our closest living relatives” (Mitani, 2009). Consequently, an emerging consensus – largely based on the work of Tomasello and collaborators – is that the foundation for human cooperation is at least partly hardwired as a biological adaptation. The most recent accounts of the evolutionary emergence of cooperation and prosociality in humans (see, e.g., Zlatev, 2014, for review) tend to underscore the role of alloparenting in humans, who are the only cooperative breeders among apes (see esp. Hrdy, 2009; Burkart et al., 2014). The cooperative breeding hypothesis (CBH) rests on the example of callitrichid monkeys (tamarins and marmosets), who are the only other cooperative breeders among primates and also exhibit a suite of cooperative behaviors, including teaching, food sharing, and joint vigilance (Burkart et al., 2009; Burkart et al., this issue), all supported by a proactive prosocial motivation (Jaeggi et al., 2010). Incorporating insights from CBH, Tomasello has updated his earlier Interdependence Hypothesis by which human cooperativeness originally arose from ecological constraints, i.e. obligate collaborative foraging (Tomasello et al., 2012).
The more recent “composite account” (Tomasello and Gonzalez-Cabrera, 2017) additionally explains a suite of prosocial characteristics that appear early in human ontogeny as adaptations for interaction with alloparents.

5. The platform of trust and the Mirror System Hypothesis

As we have seen, the emphasis on fitness points to the platform of trust as a central precondition for language emergence, because only the platform of trust provides a social niche in which language generates net fitness benefits rather than costs. We understand the platform of trust non-psychologically, as a social niche in which large-scale cheap but honest communication is possible because messages tend to be trusted as a default. But on the individual level, it must of course be underpinned by the relevant neuronal, cognitive and behavioral adaptations. The platform of trust is not explicitly identified as an explanatory target in the 2012 and 2016 “roadmaps” (versions of MSH as laid out in Arbib, 2012, 2016).
How and where should it be added? Two sorts of considerations suggest a relatively early (even if gradual) emergence, that is, placing the platform of trust together with the first seven properties of language readiness (Arbib, 2012: 164; 2016: 10), which require a biological anchoring. Firstly, the research reviewed in Section 4 points to biological roots of human cooperation, identifying it as part of our species-specific endowment. Secondly, on the present account even the simplest forms of honest informative communication that qualify as information donation (e.g., pantomime) are subject to the honesty constraint, and therefore the platform of trust must predate any such forms of early communication. As for the proximate-level implementation, the existence of a community-wide platform of trust must translate into relevant motivational dispositions (cf. van Schaik, 2016: 423 “[evidence] strongly suggests that the most important limitation to the evolution of human-like language was indeed the motivation to share information, rather than the cognitive ability to do so”). In line with Section 4.1, we suggest that the relevant cooperative disposition in humans is general rather than specific to the domain of communication: a proactive prosocial motivation, i.e., psychological mechanisms that generate prosocial behavior relatively spontaneously and across a wide variety of contexts rather than being conditional on specific triggering stimuli (Jaeggi et al., 2010). One example where such a motivation component would be required is the ontogenetic ritualization (OR) model that Arbib et al. (2014) provide to exemplify a more general idea of dyadic brain modelling. The OR model transforms a praxic action into a communicative signal over a series of interactions in a Mother-Child ape dyad. From a fitness-consequences perspective, the model works because of the local alignment of interests between the mother and the child: a local “island of trust” emerges.
Although the initial goal-states of the mother and the child are different, they naturally converge on the common goal of bonding, and this common goal determines the desirability values of potentially executable actions. From the present perspective, this model would be difficult to scale up to the community at large. The high-level alignment of interests in the Mother-Child dyad cannot be realistically expected of other dyads, so the local Mother-Child “island of trust” would not extend to form a more global platform of trust. Conflicting interests would lead to conflicting goals, which would influence the desirability values of joint action. What is needed from the implementation perspective is a “prosocial” motivation mechanism that would assign relatively high desirability values to joint action, including joint communicative action, and keep those values high in the absence of immediate payoff and across different dyad compositions within a community. What is needed from a fitness-consequences perspective is a socioecological niche in which such a mechanism would not result in negative fitness.

174 Sławomir Wacewicz and Przemysław Żywiczyński

6. Turn-taking

Recent primatological research identifies an apparent counterexample to our account: “turn-taking” behaviors in non-human primates that putatively exemplify cooperative communication (vocal in monkeys: e.g. Ghazanfar and Takahashi, 2014; Takahashi et al., 2013; gestural in apes: Fröhlich et al., 2016; Fröhlich, 2017). In line with Levinson’s (2006) “interaction engine” hypothesis, this comparative evidence is sometimes used to assert a continuity view, where turn-taking behaviors in nonhuman primates would be a “precursor” of human communicative turn-taking (Rossano, 2013; Fröhlich et al., 2016; Fröhlich, 2017; Levinson, 2016). In our opinion, the distribution of turn-taking behaviors among various primate and non-primate taxa seems to point to convergent evolution rather than evolutionary continuity (also see Burkart et al., this issue). Accordingly, contrary to the continuity view, we argue that a cooperative basis – or the platform of trust – is a necessary requirement for language, including the type of turn-taking characteristic of language. Again, definitions are important. In many such comparative studies, turn-taking, or specifically linguistic turn-taking, is defined rather loosely (see Rossano, this issue, for a similar conclusion). Below, we break down linguistic turn-taking into the features of alternation, synchrony (online timing), conditional relevance and egalitarian role-reversibility, showing that the primate studies mentioned above focus on the aspect of interactional synchrony, without paying due attention to the other properties (see Rossano, this issue).

6.1 Alternation

In the various literatures relevant to language evolution research, some uses of “turn-taking” only depend on the minimal requirement of alternation between two parties; more specifically, interacting parties must execute their appropriate actions in a coordinated fashion. Interestingly, examples of turn-taking given in the classic text by Sacks et al. (1974) – moves in games, terms of political office, traffic at intersections or service of customers at business establishments – fit into this conceptualization. In like fashion, in primatological literature “turn-taking” is sometimes used loosely to denote alternation at a joint activity, such as grooming, or even a competitive activity, such as feeding (e.g. Hare, 2017).

6.2 Synchrony (fast-paced temporal coordination)

Much of the recent spike of interest in linguistic turn-taking results from the temporal dynamics of conversations: specifically, the speed and efficiency with which floor transfers take place. Accordingly, turn-taking results from the interaction

Language origins 175

between two pressures – the pressure to minimize gaps between conversants’ respective turns and the pressure to avoid overlaps between these turns (Sacks et al., 1974). This perspective highlights synchronization, i.e., the reciprocal temporal adjustment of interactional roles (Wacewicz et al., 2017). As such, turn-taking belongs to a large group of behaviors that depend on the abilities “to anticipate, attend and adapt to each other’s actions in real time” (Keller et al., 2014). What has made linguistic turn-taking an interesting research problem is that turn-transitions are extremely fast when compared to how much time it takes to plan for another turn-contribution. Studies employing the measure of Floor Transfer Offset (FTO), calculated as the duration between the end of one turn and the beginning of the next, indicate that FTO values are similar in different languages (e.g. Stivers et al., 2009; cf. Levinson and Torreira, 2015; Roberts et al., 2015). Most turn reactions, irrespective of the context, come within 500 ms of the end of the preceding turn. Levinson and Torreira (2015) stress that this time is impressively short, if we consider that it takes 600 ms to plan for the articulation of a single lexeme (Levelt et al., 1999) and as much as 1500 ms for the articulation of a simple utterance (Griffin and Bock, 2000). Extensive research into turn-taking has shown that its rapidity and precision depend on a combination of lexico-syntactic (e.g. de Ruiter et al., 2006) and prosodic signals (e.g. Couper-Kuhlen and Selting, 1996) as well as visually transmitted cues, such as hand movements (gestures: e.g. Kendon, 2004; adaptors: Zywiczynski et al., 2017). In comparative research, turn-taking has come to be understood primarily in temporal terms as the FTO between a signal from one animal and a signal or behavior from the other animal. Such an approach can be seen in the studies on primate calls, e.g. contact calls in Campbell’s monkeys, squirrel monkeys and Diana monkeys or social-qua-grooming calls in marmosets (e.g. Masataka and Biben, 1987; Takahashi et al., 2013; Chow et al., 2015; see also Fedurek et al., 2015 for the role of lip-smacking, an audio-visual signal which apparently plays a role in coordinating grooming bouts in chimpanzees) and gestural signals (e.g., soliciting carries in bonobos: Rossano, 2013; joint-travel initiations in bonobos and common chimpanzees: Fröhlich et al., 2016). These works attend to the ecological and interpersonal context to determine when such exchanges can legitimately count as initiations and responses. Some authors, controversially, use this for advancing claims that the studied behaviors “resemble cooperative turn-taking sequences in human conversation” (Fröhlich et al., 2016).

6.3 Conditional relevance

In our view, the above conclusions are erroneous because they miss the crucial point for the understanding of linguistic turn-taking, which can be expressed by


the question: Why does linguistic turn-taking constitute an explanatory target? The answer is not that it is so fast, but that it is so fast even though the content needs to be planned: it is no challenge to say just anything fast; the challenge is to say something fast that makes sense. Due to the semantically open-ended nature of linguistic communication, the length and type of successive contributions do not follow a prescribed format – conversants produce turns of varying length and with various functions and meanings. This open-ended character of conversational interaction relates to what Levinson (1983), specifically referring to adjacency pairs, describes as conditional relevance: linguistic turn-taking is possible because conversants are able to interpret each other’s contributions in an online fashion, both their length and type, and adjust their responses accordingly (cf. relevance; Sperber and Wilson, 1986). The computational challenge that makes linguistic turn-taking such a fascinating phenomenon will not arise from time constraints alone: it only emerges when the time constraints of real-time communication are imposed on the transmission of semantically complex, open-ended propositional content. From our perspective, a key insight apparently missing from the turn-taking literature is that what a person says has fitness consequences. In a propositional communication system, each individual utterance has the potential to affect one’s (inclusive) fitness, sometimes in very dramatic ways. Examples include revealing a secret, or displaying one’s own incompetence in public, not to mention a witty retort that can win a presidential debate, or an unwitting insult that can start a clan war. The tight temporal constraints set by the turn-taking requirements of human language magnify the challenge: it is not sufficient for the answer to be relevant; it also needs to be optimized in terms of its social consequences. 
In human language, the computations of the fitness consequences of each potential reply – selected from an open-ended repertoire of propositions – need to be squeezed into the 200–500 ms window of the average turn transition. Conditional relevance is thus crucially important for linguistic turn-taking, and it is absent from the examples of non-human primate turn-taking, which does not present similar challenges of semantic fit. There, the repertoire of possible “moves” available to each interactant at any given point is not only finite but appears to be limited to a small number of options, so the successive contributions are selected from a closed set rather than compiled online. This is a qualitative difference that we trace back to the platform of trust: absent in non-human primates and present in humans. As we have argued, open-ended compositionality requires a global platform of trust. It cannot be implemented in a standard animal communication system, where each individual meaning has to be backed up by costly evidence, because combining such meanings would also exponentially increase the associated costs. Only when the default has changed to “trust unless disconfirmed” can complex communication get off the ground (see Okanoya, 2002; Knight, 1998).
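The Floor Transfer Offset measure introduced in Section 6.2, and the 200–500 ms window mentioned above, can be made concrete with a minimal sketch. It assumes that turns are available as (speaker, start, end) tuples in milliseconds; this layout, the function name `floor_transfer_offsets` and the sample timings are all hypothetical and are not taken from the cited studies. A positive value is a gap between turns and a negative value is an overlap.

```python
from typing import List, Tuple

# Hypothetical data layout: (speaker, start_ms, end_ms)
Turn = Tuple[str, int, int]


def floor_transfer_offsets(turns: List[Turn]) -> List[int]:
    """Return the FTO in ms for each change of speaker.

    FTO = start of the next turn minus end of the preceding turn,
    so a positive value is a gap and a negative value is an overlap.
    """
    ordered = sorted(turns, key=lambda t: t[1])  # order turns by start time
    offsets = []
    for prev, nxt in zip(ordered, ordered[1:]):
        if prev[0] != nxt[0]:  # only count transitions between speakers
            offsets.append(nxt[1] - prev[2])
    return offsets


# Invented timings, for illustration only (not data from any study).
sample = [
    ("A", 0, 1800),
    ("B", 2000, 3500),  # starts 200 ms after A ends (gap)
    ("A", 3400, 5200),  # starts 100 ms before B ends (overlap)
    ("A", 5300, 6000),  # same speaker continues: not a floor transfer
    ("B", 6450, 7000),  # starts 450 ms after A ends (gap)
]
print(floor_transfer_offsets(sample))  # [200, -100, 450]
```

A measure of this kind captures only the timing of exchanges; it says nothing about whether a response is conditionally relevant, which is the point at issue in this section.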


6.4 (Egalitarian) role reversibility

An essential quality of linguistic turn-taking is the egalitarian reversibility of the speaker/hearer roles (cf. Levinson, 2006). In fact, the turn-taking rules, such as the rules for nominating the next speaker (Sacks et al., 1978), testify to the egalitarian character of conversational interaction: since conversation has no rigid ascription of interactional roles (unlike many other forms of communication, e.g. courtship: Wagner, 2011; or predator-prey interaction: Vega-Redondo and Hasson, 1993), conversants must abide by the turn-taking rules to meet the timing requirements, i.e., to minimize gaps between turns and avoid overlaps between them (see 6.2). Conversational interaction is thus both egalitarian and organized, a combination that is seen as an important feature of human social organization (see e.g. Schegloff, 2000).

The criterion of egalitarian reversibility of the speaker/hearer roles is not always met in the coordinative behaviors of non-human primates that have been described as turn-taking. Egalitarian role-reversibility as described above seems to be present in antiphonal (call-and-response) calling in callitrichids (Takahashi et al., 2013), which characteristically are the only cooperative breeders among primates other than humans (cf. Burkart et al., this issue). By contrast, in many cases of ape gestural communication the roles of sender and receiver are predetermined by the relationships between the members of the communicating dyads. For example, the solicitation of carries in bonobo mother-infant dyads is necessarily initiated by the young (Rossano, 2013), while joint-travel sequences in bonobo and common chimpanzee mother-infant dyads tend to be initiated by the mother (Fröhlich et al., 2016).

7. Towards a new road map

How did the platform of trust emerge in the course of human evolution? 
One possibility is that local “islands of trust” between individuals with unusually well-aligned interests, such as mothers and offspring, could somehow scale up to the society at large (e.g. Fitch, 2004, see also Section 5). However, as we discussed in Section 4.1, a more likely scenario is that the platform of trust was itself founded on more general cooperative dispositions that extend beyond communication to permeate human sociality. The origins of human cooperation continue to present a major puzzle, but a number of influential accounts see a role for cooperative breeding (e.g. Hrdy, 2009; Burkart et al., this issue; see also Tomasello and Gonzalez-Cabrera, 2017; and Zlatev, 2014). When did the platform of trust emerge in the course of human evolution? Its absence in extant nonhuman apes strongly suggests it was not present in LCA-c, i.e. the last common ancestor of the genera Homo and Pan. However, if the present argument stands, a global (society-wide) platform of trust is a logically necessary


prerequisite for any large-scale information donation, which includes not only language but also e.g. pedagogy as well as any form of communication not based on costly signals (cf. Sections 4 and 5). The appearance of the first forms of open-ended communication, whether pantomimic as proposed by the MSH (Arbib, 2012, 2016) or otherwise, would only have been possible with the platform of trust firmly in place, because without it, even agents capable of producing and understanding pantomime would have simply chosen to ignore any pantomimic messages, except those backed up by independent evidence. The platform of trust, a socio-ecological niche in which messages tend to be trusted as a default, is an ultimate-level category that on the level of implementation would translate into motivation to share honest information (cf. Section 5). From that perspective, the minimal requirements for successful pantomimic communication extend beyond the cognitive abilities of complex action recognition and imitation and need to be complemented by a socio-cognitive trait of motivation: of the producer (i) to inform the receiver and (ii) to do it truthfully, and of the receiver (i) to trust such messages as non-deceptive and (ii) to respond appropriately. This points to studies on the human motivation to talk, and to be talked to, as perhaps the most promising direction in language evolution research. Why do we want to talk? One perspective is afforded by studying human clinical populations, specifically individuals with Williams Syndrome, a neurodevelopmental disorder that affects motivation for social interaction (Semendeferi, this issue). On the level of brain implementation, oxytocin, a neurochemical often linked to trust (but see Nave et al., 2015), has recently been suggested to be implicated in the social motivation for vocal learning (Theofanopoulou et al., 2017) and in the propensity to share knowledge as well as material resources (de Boer et al., 2017). 
The parallels between the motivation to donate information and to donate other resources such as food are currently being explored in the cooperatively breeding callitrichids (Burkart et al., this issue), showing the relevance of comparative research. Importantly, both the “engineering” and “fitness consequences” levels of analysis need to be considered for a full, integrative understanding of the origins of the uniquely trust-based human communication system.

Acknowledgements

We are grateful to Chris Knight, Paweł Fedurek and Jordan Zlatev for valuable comments. We wish to thank the participants of the August 2017 workshop “How the Brain Got Language” for discussion, and we are particularly grateful to Michael Arbib for extensive and insightful comments on earlier versions of this draft. All remaining errors and omissions are our own.


Funding

This research was supported in part by the Faculty of Languages, Nicolaus Copernicus University, research fund. The paper was prepared for a workshop funded by NSF Grant No. BCS-1343544 “INSPIRE Track 1: Action, Vision and Language, and their Brain Mechanisms in Evolutionary Relationship” (M.A. Arbib, Principal Investigator).

References

Arbib, M. (2012). How the brain got language. Oxford: OUP.
Arbib, M., Ganesh, V., & Gasser, B. (2014). Dyadic brain modelling, mirror systems and the ontogenetic ritualization of ape gesture. Phil. Trans. R. Soc. B, 369(1644), 20130414.
Arbib, M. A. (2016). Towards a computational comparative neuroprimatology: framing the language-ready brain. Physics of life reviews, 16, 1–54.
Axelrod, R. (1984). The Evolution of Cooperation. New York: Basic Books.
Burkart, J. M., Hrdy, S. B., and Van Schaik, C. P. (2009). Cooperative breeding and human cognitive evolution. Evolutionary Anthropology: Issues, News, and Reviews 18(5), 175–186.
Burkart, J. M., Allon, O., Amici, F., Fichtel, C., Finkenwirth, C., Heschl, A., … and Meulman, E. J. (2014). The evolutionary origin of human hyper-cooperation. Nature communications 5, 4747.
Burkart, J. M., Guerreiro Martins, E. M., Miss, F., & Zürcher, Y. (2018). From sharing food to sharing information. Cooperative breeding and language evolution. Interaction Studies, 19(1–2), 136–150.
Chow, C. P., Mitchell, J. F., and Miller, C. T. (2015). Vocal turn-taking in a non-human primate is learned during ontogeny. Proc. R. Soc. Lond. B Biol. Sci. 282(1807), 20150069.
Clark, H. H. (1996). Using language. Cambridge: CUP.
Couper-Kuhlen, E., and Selting, M. (1996). “Towards an interactional perspective on prosody and a prosodic perspective on interaction,” in Prosody in Conversation, eds. E. Couper-Kuhlen and M. Selting (Cambridge: CUP), 11–56.
de Boer, M., Kokal, I., Blokpoel, M., Liu, R., Stolk, A., Roelofs, K., … & Toni, I. (2017). Oxytocin modulates human communication by enhancing cognitive exploration. Psychoneuroendocrinology 86, 64–72.
Fedurek, P., Slocombe, K. E., Hartel, J. A., and Zuberbühler, K. (2015). Chimpanzee lip-smacking facilitates cooperative behavior. Scientific reports, 5.
Fitch, T. (2004). “Kin selection and ‘mother tongues’: a neglected component in language evolution”, in Evolution of communication systems: A comparative approach, eds. D. K. Oller and U. Griebel (Cambridge, MA: MIT Press), 275–296.
Fitch, T. (2010). The Evolution of Language. Cambridge: CUP.
Fröhlich, M., Kuchenbuch, P., Müller, G., Fruth, B., Furuichi, T., Wittig, R. M. and Pika, S. (2016). Unpeeling the layers of language: Bonobos and chimpanzees engage in cooperative turn-taking sequences. Scientific reports, 6, 25887.
Fröhlich, M. (2017). Taking turns across channels: Conversation-analytic tools in animal communication. Neuroscience & Biobehavioral Reviews 80, 201–209.
Gärdenfors, P. (2004). Cooperation and the evolution of symbolic communication. In: Oller, K., Griebel, U. (Eds) The Evolution of Communication Systems. MIT Press, Cambridge, pp. 237–256.
Ghazanfar, A. A. and Takahashi, D. Y. (2014). The evolution of speech: vision, rhythm, cooperation. Trends in Cognitive Sciences 18(10), 543–553.
Grice, H. P. (1975). “Logic and conversation”, in Syntax and Semantics, Speech Acts (vol. 3), eds. P. Cole and J. Morgan (New York: Academic Press), 41–58.
Griffin, Z. M. and Bock, K. (2000). What the eyes say about speaking. Psychological Science 4, 274–279.
Hare, B. (2017). Survival of the friendliest: Homo sapiens evolved via selection for prosociality. Annual review of psychology 68, 155–186.
Hauser, M. D., Chomsky, N. A., and Fitch, T. (2002). The faculty of language: What is it, who has it, and how did it evolve? Science 298, 1569–1579.
Hockett, C. F. (1963). “The Problem of Universals in Language” in J. Greenberg (ed.), Universals of Language. Cambridge, MA: MIT Press, 1–29.
Hrdy, S. (2009). Mothers and others. London: HUP.
Hurford, J. (2007). The origins of meaning. Language in the light of evolution. Oxford: OUP.
Jaeggi, A. V., Burkart, J. M., & Van Schaik, C. P. (2010). On the psychology of cooperation in humans and other primates: combining the natural history and experimental evidence of prosociality. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 365(1553), 2723–2735.
Keller, P. E., Novembre, G. and Hove, M. J. (2014). Rhythm in joint action: psychological and neurophysiological mechanisms for real-time interpersonal coordination. Phil. Trans. R. Soc. B 369(1658), 20130394.
Kendon, A. (2004). Gesture. Visible Action as Utterance. Cambridge: CUP.
Knight, C. (1998). Ritual/speech coevolution: A solution to the problem of deception. In Hurford, J. et al. (Eds.), Approaches to the Evolution of Language, Cambridge: CUP, 68–91.
Knight, C., and Lewis, J. D. (2017). Wild Voices: Mimicry, Reversal, Metaphor, and the Emergence of Language. Current Anthropology, 58(4), 435–453.
Krebs, J. R., and Dawkins, R. (1984). “Animal Signals: Mind-Reading and Manipulation”, in J. R. Krebs and R. Dawkins (Eds.), Behavioral Ecology. Oxford: Blackwell, 380–402.
Levelt, W. M. (1999). “Producing spoken language: a blueprint of the speaker,” in The Neurocognition of Language, eds. C. Brown and P. Hagoort, Oxford: OUP, 83–122.
Levinson, S. (1983). Pragmatics. Cambridge: CUP.
Levinson, S. C. (2006). On the human “interaction engine”. In: Enfield, N. J., Levinson, S. C. (Eds.), Roots of Human Sociality: Culture, Cognition and Interaction. Berg, Oxford, pp. 39–69.
Levinson, S. C. (2016). Turn-taking in human communication, origins, and implications for language processing. Trends Cogn. Sci. 20(1), 6–14.
Levinson, S., and Torreira, F. (2015). Timing in turn-taking and its implications for processing models of language. Frontiers in Psychology 6, 731.
Masataka, N., and Biben, M. (1987). Temporal rules regulating affiliative vocal exchanges of squirrel monkeys. Behavior 101(4), 311–319.
Maynard Smith, J., and Harper, D. (2003). Animal Signals. Oxford: OUP.
Mitani, J. C. (2009). Cooperation and competition in chimpanzees: current understanding and future challenges. Evolutionary Anthropology: Issues, News, and Reviews 18(5), 215–227.
Nave, G., Camerer, C., & McCullough, M. (2015). Does oxytocin increase trust in humans? Perspectives on Psychological Science, 10(6), 772–789.
Okanoya, K. (2002). Sexual display as a syntactical vehicle: the evolution of syntax in birdsong and human language through sexual selection. In: Wray, A. (Ed.), The Transition to Language. Oxford: OUP, 46–63.
Power, C. (2014). “Female philopatry and egalitarianism and conditions for the emergence of intersubjectivity”. In Cartmill, E. et al. (eds.), Proceedings of the 10th Evolang, Singapore: World Scientific, pp. 252–259.
Roberts, S. G., Torreira, F., and Levinson, S. C. (2015). The effects of processing and sequence organization on the timing of turn-taking: a corpus study. Frontiers in Psychology 6, 509.
Rossano, F. (2013). Sequence organization and timing of bonobo mother-infant interactions. Interact. Stud. 14(2), 160–189.
Rossano, F. (this volume). Social manipulation, turn-taking and cooperation in apes: Implications for the evolution of language-based interaction in humans. Interaction Studies, 19(1–2), 151–166.
de Ruiter, J. P., Mitterer, H., and Enfield, N. J. (2006). Projecting the end of a speaker’s turn: A cognitive cornerstone of conversation. Language 82, 515–535.
Sachs, J. L., and Rubenstein, D. R. (2007). The evolution of cooperative breeding; is there cheating? Behavioral Processes 76(2), 131–137.
Sacks, H., Schegloff, E. A. and Jefferson, G. (1974). A simplest systematics for the organization of turn-taking in conversation. Language 50, 696–735.
Sacks, H., Schegloff, E. A. and Jefferson, G. (1978). “A Simplest Systematic for the Organization of Turn-Taking in Conversation,” in Studies in the Organization of Conversational Interaction, ed. J. Schenkein (New York: Academic Press), 7–55.
van Schaik, C. P. (2016). The primate origins of human nature. New York: Wiley.
Schegloff, E. A. (2000). Overlapping Talk and the Organization of Turn-Taking for Conversation. Language in Society 29, 1–63.
Scott-Phillips, T. C., Dickins, T. E., and West, S. A. (2011). Evolutionary theory and the ultimate–proximate distinction in the human behavioral sciences. Perspectives on Psychological Science, 6(1), 38–47.
Semendeferi, K. (this volume). Why do we want to talk? Evolution of neural substrates of emotion and social cognition. Interaction Studies, 19(1–2), 102–120.
Sperber, D., and Wilson, D. (1986). Relevance. Oxford: Blackwell.
Stivers, T., Enfield, N. J., Brown, P., Englert, C., Hayashi, M., Heinemann, T. and Levinson, S. C. (2009). Universals and cultural variation in turn-taking in conversation. Proceedings of the National Academy of Sciences 106(26), 10587–10592.
Takahashi, D. Y., Narayanan, D. Z. and Ghazanfar, A. A. (2013). Coupled oscillator dynamics of vocal turn-taking in monkeys. Curr. Biol. 23(21), 2162–2168.
Tan, J., Ariely, D., and Hare, B. (2017). Bonobos respond prosocially toward members of other groups. Scientific reports, 7(1), 14733.
Theofanopoulou, C., Boeckx, C., and Jarvis, E. D. (2017). A hypothesis on a role of oxytocin in the social mechanisms of speech and vocal learning. Proc. R. Soc. B 284, 20170988.
Tomasello, M. (2008). Origins of Human Communication. Cambridge, MA: MIT Press.
Tomasello, M., Melis, A. P., Tennie, C., Wyman, E., and Herrmann, E. (2012). Two key steps in the evolution of human cooperation. Current Anthropology 53(6), 673–692.
Tomasello, M., & Gonzalez-Cabrera, I. (2017). The role of ontogeny in the evolution of human cooperation. Human Nature, 1–15.
Tinbergen, N. (1963). On aims and methods of ethology. Zeitschrift für Tierpsychologie 20(4), 410–433.
Vega-Redondo, F., and Hasson, O. (1993). A Game-theoretic Model of Predator–Prey Signaling. Journal of Theoretical Biology, 162(3), 309–319.
Wacewicz, S., and Żywiczyński, P. (2012). Human honest signalling and nonverbal communication. Psychology of Language and Communication 16(2), 113–130.
Wacewicz, S. (2015). The shades of social. A discussion of “The social origins of language”, ed. Daniel Dor, Chris Knight and Jerome Lewis. Theoria et Historia Scientiarum, 11, 191–208.
Wacewicz, S., Żywiczyński, P., and Chiera, A. (2017). An evolutionary approach to low-level conversational cooperation. Language Sciences.
Wagner, W. E. (2011). Direct benefits and the evolution of female mating preferences: conceptual problems, potential solutions, and a field cricket. Advances in the Study of Behavior 43, 273–319.
West, S. A., El Mouden, C., and Gardner, A. (2011). Sixteen common misconceptions about the evolution of cooperation in humans. Evolution and Human Behavior 32(4), 231–262.
Zahavi, A., and Zahavi, A. (1997). The Handicap Principle. New York: OUP.
Zlatev, J. (2014). “The co-evolution of human intersubjectivity, morality, and language”, in The Social Origins of Language, eds. D. Dor, C. Knight and J. Lewis (Oxford: OUP), 249–266.
Zywiczynski, P., Orzechowski, S., and Wacewicz, S. (2017). Adaptors and the Turn-Taking Mechanism: The Distribution of Adaptors Relative to Turn Borders in Dyadic Conversation. Interaction Studies 18(2), is.18.2.07zyw

Imitation, Pantomime and Development

The evolutionary roots of human imitation, action understanding and symbols

Masako Myowa-Yamakoshi
Kyoto University

This paper focuses on how human complex imitation and its developmental processes are related to the abilities for action representation, acquisition of symbols, and language. After overviewing the characteristics of imitation in chimpanzees and humans, I propose a model of imitation emphasizing how these two species differ in the ways they process visual-motor information. These differences may in turn contribute to core interspecies differences in higher-order cognitive functions, not only for bodily imitation but for action understanding through complex referential information from faces, sharing symbols, and language. This ‘developmental-comparative’ approach reveals the development of species-specific intelligences, and shows what is shared and not shared between humans and other primates. In doing so, we can obtain a more complete understanding of the emergence of the ‘language-ready brain’ in relation to its biological and evolutionary foundations.

Keywords: imitation, chimpanzees, humans, development, visual-motor information processing, gestures, referential information, faces, symbols

1. Introduction: The evolutionary foundation of human bodily imitation and language

During the 1970s and 80s, great ape language projects were actively conducted as a way to enhance our understanding of whether and how human language is unique. Researchers taught these apes some aspects of human language by using sign language, plastic visual symbols, and lexigrams. Evidence suggests that apes can master aspects of the lexicon, but make little or no progress in mastering grammar as a means of combining ‘words’ into utterances that convey novel meanings. Over the years, there have been both successes and failures in teaching ‘words’. For example, studies have suggested that it may be more difficult for apes to learn verbs (i.e., action or doing words) compared to nouns (i.e., object or basic descriptive words)

https://‍ © 2020 John Benjamins Publishing Company


(e.g., Seidenberg & Petitto, 1979; Terrace, Petitto, Sanders, et al., 1979; Hixson, 1998). Why may this be so? I propose that a biological approach comparing the abilities of humans and chimpanzees (our closest living relatives) to perform bodily imitation will give us important clues to the evolutionary emergence of the ‘language-ready’ brain. More specifically, I propose that one reason for the language differences between humans and other great apes involves their respective visual-motor learning systems, which may underlie bodily imitation. Furthermore, such basic differences may contribute to core differences in higher-order cognitive functions such as action understanding by using referential information (e.g., gaze direction, facial expressions, and vocalizations) and the acquisition of symbols (i.e., proto-symbols).

1.1 Imitation focusing on objects and body movements in humans and chimpanzees

Let us begin with a consideration of how humans and chimpanzees compare in their ability for bodily imitation. According to Myowa-Yamakoshi and Matsuzawa (1999, 2000), three main findings emerged for chimpanzees:

1. Actions involving unfamiliar motor patterns (i.e., those actions not typically observed in free-play periods, such as hitting a lid with a towel) were more difficult to perform than actions involving familiar motor patterns (such as wiping a lid with a towel). Moreover, chimpanzees seldom reproduced demonstrated actions on their first attempt, even when these actions involved motor patterns that they had already acquired.
2. Actions in which an object held in the hand is directed towards an external location, either another stationary object (e.g., covering the ball with a bowl) or one’s own body (e.g., pushing the bowl against one’s arm), were easier to perform than actions that involved manipulating a single object alone (e.g., merely shaking the bowl).
3. Their performance errors were stereotyped and perseverative for each object. For example, they would continue to put a small ball into the bowl, even though they had observed the action of covering the ball with the bowl. In other words, they seemed to fail to account for context and showed inflexible adherence to the end ‘goal’ states of the action based on their past experiences with the objects, emphasizing the final relation between the objects rather than the movements involved.

These findings suggest that chimpanzees, unlike humans, find it difficult to appropriately transform and map the visual information involving the body movements of another individual onto their own corresponding body movements.


Rather, they may focus on the relation between the objects. For example, when a chimpanzee observes someone hitting a nut using a hammer stone, they mainly pay attention to the nut and stone, rather than to the body movements involved in hitting.

[Figure 1 diagram. Labels recovered from the original image: Demonstrator’s action; Input; Visual information of object; Object identification/recognition in environment; “What?” Ventral; Extracting action goal; Visual information of body movements; Hands/Mouth movements; Analysis of scene; “Where/How?” Dorsal; Extracting shape/motor trajectory; Action Representation (Action = Goal + Body Movements); Motor programming; Motor control/command; Imitative action; Simple imitation; Complex imitation.]

Figure 1. A conceptual representation of an information processing model for bodily imitation. The dotted lines indicate differences in visual-motor information processing between humans and chimpanzees. According to the model, humans represent other individuals’ actions by integrating the visual information for both bodily movements and objects (the ventral and dorsal streams). On the other hand, chimpanzees represent others’ actions mainly relying on the information about the object (the ventral stream).

186 Masako Myowa-Yamakoshi

Figure 1 shows a conceptual model of bodily imitation illustrating how humans and chimpanzees may differ in the ways they process such visual-motor information, along with the possible neural bases of these processes. This model assumes that, with respect to visual-motor information processing of manipulated objects, both humans and chimpanzees identify these objects from their own experience. They recognize an object’s visual properties (e.g., color, shape, size) and its function based upon its affordances. However, the chimpanzees’ specific types of imitative error suggest their inflexible adherence to the end goal states of the action based on their past experiences with these objects. In other words, unlike humans, chimpanzees may find it difficult to flexibly extract information about the action goal depending on context.

Furthermore, the two species may differ in their information processing of body movements. As mentioned earlier, actions involving novel motor patterns were more difficult for chimpanzees to perform than those involving familiar motor patterns. Chimpanzees also seldom reproduced demonstrated actions on their first attempt (e.g., throwing a screwdriver), even though these actions involved motor patterns that they had already acquired (i.e., throwing). Moreover, chimpanzees may interpret the demonstrated action on the basis of how familiar objects (e.g., a screwdriver) are related to specific body movements from their past experience (e.g., twisting), irrespective of the demonstrated body movements to be imitated (e.g., hitting or throwing a screwdriver).

These processing differences between humans and chimpanzees may influence subsequent steps for imitation. That is, constraints on the input process cause differences involving action representation, motor programming and motor output, which, as a result, ultimately manifest as the different imitative performances of humans and chimpanzees.
More concretely, humans may represent other individuals’ actions by integrating the visual information for both body movements and objects (involving both the ventral and dorsal streams). On the other hand, chimpanzees may represent others’ actions mainly relying on the information about the object (using information from the ventral stream). This seems to correspond to the distinction I make below between simple imitation (chimpanzees) and complex imitation (humans), which suggests an evolutionary transition from the last common ancestor of chimpanzees and humans (LCA-c) to Homo sapiens (Arbib, 2012). Such a difference ultimately makes it difficult for chimpanzees to share action representations (i.e., proto-symbols: intransitive gestures) faithfully amongst other individuals as humans do.

How does this proposal relate to what is known about the ‘language’ performance of great apes? Several studies have reported that apes can communicate using gestures. Some researchers have insisted that all ape gestures are simply extracted from an innate repertoire (Hobaiter & Byrne, 2011), but a few group-specific intransitive gestures have been observed in ape populations, suggesting a possible role for social learning (e.g., ontogenetic ritualization, Tomasello & Call, 1997, also see Arbib et al., 2014). Before explaining how my model may explain these claims, I would like to consider how chimpanzees may acquire action representations (and how they share them among all members of a given population), by discussing the distinction between simple and complex body imitation (Figure 1, also see Arbib, 2012). In doing so, I will focus on two aspects which may differ between humans and chimpanzees: (a) cultural differences of gestures in wild chimpanzees, and (b) social interaction via shared body sensory experiences with others.

1.2 Cultural differences of gestures in wild chimpanzees

Chimpanzees’ non-genetic behaviors can be divided into two types (Table 1). The first type involves behavior directed towards objects, including using tools to obtain food, for personal hygiene, and so forth. In the wild, chimpanzees display a wide variety of population-specific tool-using and tool-making behavioral traditions (Matsuzawa & Yamakoshi, 1996, McGrew, 1992, Whiten et al., 1999). The second type comprises behaviors directed towards other individuals, which in turn can be divided into two sub-types. One involves behaviors using objects, and the other involves no objects.

Table 1. Examples of each type of non-genetic behaviors observed in wild chimpanzees

                                     Using objects                   No objects
Behaviors directed to objects        Tool use, Tool-making           -
Behaviors directed to individuals    Leaf-clipping, Leaf-grooming    Greeting, Grooming

Regarding the former (i.e., other-directed behaviors using objects), there are two well-known cases suggesting the existence of a few population differences in wild chimpanzees. The first example is ‘leaf-clipping’ observed in the Mahale K group (Nishida, 1980), Bossou (Sugiyama, 1981), and Tai (Boesch, 1995) chimpanzees. During this behavior, a chimpanzee pulls one or more dried leaves through its mouth and strips them, making a distinctive loud noise. Its function is interpreted as attention getting. The second is ‘leaf-grooming’ observed in Gombe (Goodall, 1986): a chimpanzee directs typical grooming motor patterns (e.g., peering, mouthing, and lip smacking) towards randomly picked leaves. It occurs when a lone chimpanzee seems bored. Plooij (1978) explained this leaf-grooming as ‘proto-declarative’, as the leaf is being used to get another’s attention. However, Zanma (2002) reported that Mahale chimpanzees perform leaf-grooming to pick up ectoparasites with their lips during grooming, place them on leaves, and then squash them with their thumbs. It is controversial whether leaf-grooming is really used for communication, as it is unclear whether it serves a social function
or serves simply to manipulate objects. Furthermore, it is important to note that both these other-directed behaviors are inconspicuous or rare, and that individuals use the same behaviors in different contexts (Tomasello & Call, 1997), and that there are distinct individual differences within the groups in which these behaviors occur (McGrew, 1992, Tomasello, Call, Nagell, Olguin, & Carpenter, 1994, Tomasello, & Camaioni, 1997). Contrast this limited use with that of humans, who often use both object-directed and human-directed behaviors from a very early period of life; they even start to put them together in both directions at around 9–10 months after birth (i.e., they use another individual to get an object or they use objects to get another’s attention) (e.g., Bates, Camaioni, & Volterra, 1975). Regarding the use of behavior directed towards others using no objects (which is the second sub-type of behavior directed towards other individuals), wild chimpanzees use gestures in a variety of contexts, such as greeting, grooming, and presenting for communication (Goodall, 1986). However, it seems that wild chimpanzees rarely use body movements as population-specific gestures to communicate with another individual. This suggests that wild chimpanzees’ gestures towards other individuals are either species-typical or simply extracted from an innate repertoire (e.g., Hobaiter & Byrne, 2011). However, at least one exception to this situation has been reported, known as the ‘grooming hand-clasp.’ McGrew and his colleagues (2001) reported that there are two different types of grooming hand-clasp: palm-to-palm and non-palm-to-palm. In the former, two chimpanzees clasp each other’s hands with mutual palmar contact, while in the latter, one or neither individual hand clasps the other and usually the hands are flexed with one limb resting on the other. The palm-to-palm hand-clasp grooming dominated in the Mahale K-group and was not observed in Mahale M-group. 
With regard to social learning processes, the grooming hand-clasp could be explained by the dyad members simultaneously sharing proprioceptive experiences. This is a different learning process from observational, imitative learning. To date, with the exception of hand-clasp grooming, there is little evidence that all members of a given chimpanzee population use population-specific gestures as humans do. I propose that chimpanzees’ limited capacity for complex imitation based on complex action representations may limit their ability to acquire and transmit such common gestures among group members. For a behavior to function as a social signal (i.e., proto-symbol), that signal must have the potential to be both recognized and performed in a similar manner by interacting individuals. Furthermore, for gesturing reciprocity to occur, the receiver must be able to understand the intentions and meaning of the signaler. Given that humans can anticipate and imitate another’s actions by processing body movement information faithfully, this in turn may enable humans to understand and initiate the use of these same gestural signals involving body movement amongst others. Given
that chimpanzees have constraints for processing visual information about body movements (as shown in Figure 1), it may be difficult for them to share and use the same gestures among other individuals as population-specific symbols, especially gestures that involve no manipulated objects to serve as salient cues. For this reason, unlike humans, wild chimpanzees may rarely display widespread population-specific gestures, especially those functioning as verbs (i.e., those symbols indicating the occurrence or performance of an action, the existence of a state or condition, or time-series changes relating to bodily states; also see Rossano in this volume).

1.3 Social learning via shared body sensory experiences with others

Human caregivers actively try to share bodily experiences with their infants by imitating their actions. For example, Pawlby (1977) reported that an imitation episode occurs roughly once a minute in mother-infant face-to-face interactions; indeed, 79% of interactions involve mothers imitating infants. Such interactions involving imitation provide infants with opportunities to experience visual-motor correspondence. These experiences, in turn, allow infants not only to recognize action goals but also to form complex bodily mapping structures. As the infant increases their amount of shared imitation with others, mapping occurs in greater detail. These mapping abilities may not be acquired through sudden and dramatic changes, but are thought to develop gradually through the accumulation of shared proprioception with others based on triadic communication situations in their everyday life, beginning shortly after birth.

Let me discuss this last point in more detail. From approximately four months of age, human parents try to engage their infants in triadic communication contexts involving a ‘self-object-other’ relation.
For instance, after a mother shakes a rattle before her baby, she gives this rattle to the baby to hold and shake. Alternatively, after placing food in her child’s mouth, she says “Mmm, good” and eats her food together with the child. Through this type of interaction, infants co-experience with others the function of objects and how they are manipulated by touching and molding. In such instances, infants not only observe visually the actions of others in front of them, but also experience and share those actions through proprioception (e.g., assisted imitation; Zukow-Goldring, 2012, also see Volterra et al. in this volume). Importantly, such triadic interactions are rarely observed between mother and infant chimpanzee dyads even after the latter reaches 3 years of age (Myowa-Yamakoshi & Tomonaga, 2009). Indeed, to the best of my knowledge, chimpanzee infants have never been observed to intentionally bring an object to show to their mothers or call for their mothers in order to attract their attention towards an object (e.g.,
declarative pointing). Moreover, chimpanzee mothers rarely interact with their children via the use of objects. It seems, therefore, that compared to humans, chimpanzee infants have very little opportunity to learn about an object’s function or usage by experiencing the object together with others (Matsuzawa, Tomonaga, & Tanaka, 2006).

Do simultaneous proprioceptive experiences shared with other individuals, as frequently observed in humans but not chimpanzees, allow the former to acquire complex imitation? Tomasello, Savage-Rumbaugh, & Kruger (1993) seem to give us important clues to answer this question. They suggest that chimpanzees reared in a human-like social environment (i.e., enculturated) might develop more imitative ability than wild, mother-reared chimpanzees. It is therefore possible that chimpanzees’ imitative ability develops flexibly, depending upon extended exposure to the appropriate surrounding rearing environments after birth. If so, it remains unclear whether there are any differences between enculturated and wild, mother-reared chimpanzees in the representations that mediate the relationship between the perception and performance of actions required for imitation (i.e., this plasticity via rearing environment is indicated by the dotted lines shown in Figure 1). Further longitudinal developmental and comparative studies will help to explore the relationship between species-specific biological characteristics and the effects of various postnatal social experiences (e.g., characteristics of caregiving) in the development of imitation. Subsequently, this will help us better understand how the ability for complex imitation based on representing other individuals’ actions (bodily movement + goal) may influence the acquisition of symbols and the sharing process with other individuals.

2. Ontogeny and mechanisms of complex imitation in humans

In contrast to other primates (including chimpanzees), humans learn a broad range of behaviors from other individuals, and do so faithfully through processes involving simple and complex imitation, and human-specific active teaching by caregivers. From approximately their first birthday, humans begin regularly imitating complex actions such as the usage of novel tools or gestures. Interestingly, they even try to imitate the arbitrary and unnecessary details of the observed actions required for obtaining a goal. Such ‘over-imitation’ is proposed to be an adaptive capacity for acquiring a wide variety of novel complex and social-cognitive skills, and may therefore play an important role for intergenerational cultural transmission in human environments (e.g., Lyons, Young, & Keil, 2007, Whiten, McGuigan, Marshall-Pescini, et al., 2009). At around 14 months of age, humans seem to start to develop the capacity to inhibit their tendency for over-imitation and, depending upon the context, appear
to rationally reproduce only those actions allowing for the goal to be obtained. They interpret the actions of others as being goal-directed, and therefore they start to imitate the actions of others in an attempt to efficiently achieve this goal in different forms under different physical contexts. In other words, they imitate goal-directed actions not only through one-to-one bodily mapping, but also by selecting their own body movements in a rational, efficient manner.

Consider the following research. Gergely, Bekkering, and Király (2002) presented one of two actions to 14-month-old infants. For each action, the whole body of the demonstrator performing the action is covered with a blanket. However, the manner in which the blanket covers the demonstrator is different for each action. In the first action, the demonstrator is completely covered with the blanket, including the hands, and is thus unable to use both hands. In the second action, the demonstrator is covered with the blanket, but only up to the wrists, and is thus able to freely use both hands. While covered with a blanket in either manner, demonstrators in both conditions performed the unusual action of ‘pressing with their forehead’ a light box placed on a table that would subsequently light up. After one week, this light box was given to the infants. The results showed that 69% of the infants who watched the action of the demonstrator whose hands were free used their foreheads in the same manner to turn on the light box. However, only 21% of the infants who saw the actions of the demonstrator whose hands were covered by the blanket pressed the box with their foreheads (most used their hands). According to Gergely et al., infants reached the rational conclusion that the demonstrator in the hands-covered condition used their forehead only because they were unable to light the box using their hands.
They surmised that infants therefore chose the most rational action when reproducing the observed action themselves, which was to use their hands in order to achieve the goal of lighting the box. They call this ‘rational imitation’. These findings suggest that human infants can flexibly represent the actions of others depending on different contexts. Although no direct evidence has been provided yet, such action representation seems to go beyond the functions of a ‘monkey-like’ mirror neuron system (i.e., automatically matching processes between action perception and production regardless of context). For example, the high level of intentional inhibition control seen in humans may interact with the mirror neuron system, especially in different contexts (e.g., during rational imitation). If so, researchers should further consider whether and how the development of inhibition control (in addition to automatic imitation) in different species relates to the acquisition of symbols. Moreover, consideration of the distinction between simple and complex imitation using a human ontogenetic perspective will provide additional insights for answering these unsolved questions relating to the emergence of the language-ready brain.

3. Understanding actions by using referential information from faces

There is another important comparative issue for considering the evolution of human language. According to the data provided by great ape language researchers, chimpanzees may find it difficult to acquire symbols not only for verbs, but also for emotional, internal states like ‘happy’, ‘sad’, and ‘beautiful’. Again, I would like to consider the referential (i.e., facial/communicative) information processing in action understanding using both comparative and developmental perspectives.

Faces provide crucial information about individuals’ social lives. By detecting information expressed in the faces of other individuals (e.g., gaze direction and emotional expressions), individuals are able to rapidly identify and respond to dynamic changes in their social and ecological environments (Allison, Puce, & McCarthy, 2010). Whilst research investigating the roles of motor actions (e.g., hand gestures) and facial information has proved valuable for increasing our knowledge about action understanding (e.g., Falck-Ytter, Gredebäck, & von Hofsten, 2006, Cannon & Woodward, 2012), to date the majority of these studies have investigated characteristics of motor actions and facial information independently. It is currently unclear how information from both sources is integrated when understanding actions, despite the fact that this is a common occurrence in daily life.

We investigated the patterns of eye movements for viewing goal-directed actions by comparing human adults and infants, and chimpanzee adults (Myowa-Yamakoshi, Scola, & Hirata, 2012, Myowa-Yamakoshi, Yoshida, & Hirata, 2015). First, we examined predictive eye movements in relation to action goals such as pouring juice into a cup.
Previous studies have demonstrated that both humans and nonhuman primates rationally predict and understand the object-related actions of other individuals (e.g., chimpanzees: Buttelmann, Carpenter, Call, et al., 2007, Myowa-Yamakoshi et al., 2012, rhesus monkeys: Rochat, Serra, Fadiga, et al., 2008). Our results also showed that adult chimpanzees seemed to anticipate an action goal in the same way as human adults. Yet, 8-month-old infants showed no such evidence of goal anticipation. Twelve-month-old infants showed mixed evidence in that strong goal anticipation was not evident, but these infants did show weak predictive tendencies that were statistically comparable to those of human adults and chimpanzees. That is, they do not yet anticipate goal-directedness as fully as human adults and chimpanzees do. One possible framework to help explain these findings is that other individuals’ actions are understood through a direct matching process involving the mirror neuron system, where an observed action (movement + goal) is mapped onto the observer’s own representation of that action (i.e., the direct-matching hypothesis; Ferrari, Bonini, & Fogassi, 2009, Rizzolatti & Craighero, 2004, Rizzolatti & Sinigaglia, 2010). According to this view, the prediction of another’s action goal is closely related to the observer’s own
action repertoire. Our data support this view. Human and chimpanzee adults can perform this action by themselves, and 12-month-old infants can perform similar, albeit simpler, versions of this action (e.g., placing an object located in a container into another container). Younger infants unable to perform the action do not shift their gaze.

Figure 2. Experimental set up and eye movement patterns of a 12-month-old infant (Right) and a chimpanzee (Left) for the stimulus video. Unlike chimpanzees, humans may have a predisposition to integrate facial information from the actor when observing goal-directed actions.

We also examined the general visual scanning patterns during observation of the goal-directed actions as they occurred. Chimpanzees, unlike human infants and adults, paid little attention to other individuals’ faces during the observation of others’ actions (Figure 2; Myowa-Yamakoshi et al., 2012). These findings demonstrate that the two species appear to have a different predisposition to observe goal-directed actions whilst integrating information from other individuals’ faces. Chimpanzees attend to the action goal (i.e., the end state) mainly based on the affordance information of objects (as seen in the ventral path in Figure 1), whereas humans appear to integrate this with the social information available during goal-directed motor actions using cues from demonstrators’ faces (as seen in the dorsal path in Figure 1). Furthermore, we found that chimpanzees rarely changed their scanning patterns, regardless of whether the predicted action goal was achieved
(Myowa-Yamakoshi et al., 2015). In contrast, human adults and 3.5-year-old children (but not 12-month-old infants) attended more to the demonstrator’s face after confirming that the predicted goal was not achieved. In sum, these findings suggest that chimpanzees and humans differ with respect to when and why they refer to faces when encoding an actor’s goal-directed actions.

Why is this so? One possibility is that by scanning the actor’s face, humans make ‘explicit’ or active inferences concerning whether an action is likely to have been performed intentionally. For example, facial expressions of frustration or disappointment might convey that the observed action goal had been unsuccessful or that the outcome was accidental. In implausible and/or ambiguous contexts, we need to explicitly identify others’ intentions by making inferences concerning the mental states of these actors independently from our own mental states.

Let us consider this proposal in more detail. When observing familiar goal-directed actions (e.g., juice being poured into a cup), the mirror neuron system is mainly activated, which enables us to implicitly or automatically infer the goals of the person pouring the juice. However, when we observe implausible or ambiguous actions, such as juice being poured onto a table, an additional mechanism is required that may go beyond the implicit or automatic understanding associated with the activation of the mirror neuron system; this additional mechanism enables us to explicitly explore/understand why other people’s mental states might be different from those of our own (e.g., why did the actor pour juice from a bottle onto the table top rather than the cup?).
This explicit exploring/understanding may require numerous neural elements that are involved in higher-order cognitive functioning related to the self-other distinction, and by which we actively make inferences concerning the mental/emotional states of other agents (i.e., ‘perspective-taking’ and ‘mentalizing’, which are associated with the top-down activation of the mirror neuron system; Amodio & Frith, 2006, Frith & Frith, 2006).

Several studies have emphasized the evolutionary continuity of the mirror system in macaques and humans (e.g., Arbib & Rizzolatti, 1997, Rizzolatti & Arbib, 1998). More specifically, the ‘Augmented Competitive Queueing Model (ACQ)’ (Bonaiuto & Arbib, 2010) proposes that we have separate subsystems to distinguish between an intended and unintended action that crucially depend on the mirror neuron system. However, unlike most accounts of the mirror neuron system, the emphasis here is on one’s own actions. The notion is that when we act, we have both a corollary discharge encoding the intended action and visual input which enables the mirror system to confirm whether the actual performance looks more like the intended action or another, unintended action. One evaluation index is desirability, which depends on the current task or goal. Each time an action is performed, a measure of expected reinforcement is updated for the observed action, whether intended or unintended (i.e., “If I do this, how close will it bring me
to succeeding at the task?”). The other one is executability, which depends on the availability of suitable affordances and the probability of the action’s success (i.e., “Can I do the action now?”), and will be decreased if the mirror neuron system reports that the intended action was not performed successfully. The new idea, suggested by Myowa-Yamakoshi et al. (2012, 2015), is to extend this to observations of an action performed by others. Here, the mirror neuron system, in concert with other systems, may initially represent one behavior (the ‘apparently intended’ action). If the completion of the action preserves that representation, there is no problem. However, if the completion is incongruent with the initial action representation, then humans (unlike chimpanzees) can use the facial expression of the actor to judge whether the first assessment or the second assessment is more in agreement with the actor’s intentions. Further empirical studies are needed in order to confirm this proposal.

4. Toward a new road map

As shown in Figure 1, I have proposed a cognitive model explaining how humans’ and chimpanzees’ capacities for imitation differ due to the ways they process visual-motor information, especially relating to dynamic bodily movements. Furthermore, I have proposed that such fundamental differences observed between these two species might relate to chimpanzees’ apparent limitation in understanding actions by using referential information from faces in different contexts. We can now consider how and why such differences may relate to the difference involving the acquisition of proto-symbols in humans and non-human primates. Consider, as one possible example, the research reported by Ferrari, Gerbella, Coudé et al. (2017) proposing that there are two mirror neuron sectors within the premotor cortex of macaques: the hand sector and the mouth sector.
The hand sector receives visual input and is linked to parietal-premotor circuits; this sector is thought to perform sensorimotor transformations in relation to hands reaching towards and grasping an object. The mouth sector is linked to the laryngeal motor cortex and limbic pathway involved in emotions, communications and reward processing; it is thought to perform actions relating to mouth/face motor control (also see Coudé and Ferrari in this volume). Similarly, Ferrari, Gallese, Rizzolatti, et al. (2003) suggested that monkeys have not only mirror neurons related to hands but also those responding to mouth actions or communicative gestures. However, in relation to the evolution of language, it is still unclear (1) what role these sectors play in the recognition and articulation of proto-symbols, and (2) how they integrate into other neural systems that enable us to understand and produce sentences for social communication.

Another important issue to consider for the evolution of language may be the uniqueness of human social interaction. As mentioned above, simultaneous
proprioceptive experiences are observed early in humans – especially in the context of mother-infant interactions – but not in non-human primates. Although we still do not understand why humans have evolved such a unique communicative characteristic, it seems reasonable to think that it may contribute to learning novel actions faithfully in terms of human ontogeny. Such a unique proprioceptive experience may enable us to understand more complex actions and the internal/emotional states that may underlie them, and furthermore, to share the representation of them among group members as proto-symbols. Such complex information processing for communication might relate to the emergence of the language-ready brain.

Consistent with this assumption, my colleagues and I proposed that chimpanzees and humans differ with respect to when and why they refer to faces when encoding an actor’s goal-directed actions; humans developmentally acquire a predisposition to observe and understand goal-directed actions by integrating the social information available during goal-directed motor actions with cues from actors’ faces, especially when a possible goal is not achieved (Myowa-Yamakoshi et al., 2012, 2015). In ambiguous contexts, humans explicitly identify actors’ intentions by making inferences concerning the mental states of other agents independently from their own mental states. Humans might have evolved different neural elements that are involved in such higher-order cognitive functioning related to the self-other distinction involved in perspective-taking and mentalizing.

What we need to consider next is how the ability for communicating explicitly by exchanging gestural symbols would have transferred into exchanging vocal symbols that are shared with group members. This process might have been a crucial step in the evolution of language (also see Liebal and Oña in this volume).
Cutting-edge technologies will soon allow researchers to employ various tools for investigating such possibilities in both human and non-human primate subjects of all ages in everyday life settings (e.g., recording and modeling brain activities and body functions, including and combining electroencephalographic, behavioral, and physiological measures). Indeed, there has never been a better time than the present for enhancing our understanding of the biological and evolutionary bases – both developmentally and comparatively – of crucial abilities such as imitation, action understanding, sharing symbols, and language.

Acknowledgements

I am grateful to David L. Butler and Michael A. Arbib for their useful comments on an earlier version of this manuscript. I would also like to thank all of the staff, colleagues, parents, infants, and chimpanzees that participated in our studies.

The evolutionary roots of human imitation, action understanding and symbols 197

Funding

The research reported here was supported in part by Grants-in-Aid for Scientific Research from the Japan Society for the Promotion of Science, and the Ministry of Education, Culture, Sports, Science and Technology (20220004, 24300103, 23300103, 24119005, 17H01016 M.M-Y, Principal Investigator) and the Mayekawa Houonkai Foundation (2015–2017, M.M-Y, Principal Investigator). The paper was prepared for a workshop funded by NSF Grant No. BCS-1343544 “INSPIRE Track 1: Action, Vision and Language, and their Brain Mechanisms in Evolutionary Relationship” (M.A. Arbib, Principal Investigator).

References

Allison, T., Puce, A., & McCarthy, G. (2010). Social perception from visual cues: Role of the STS region. Trends Cogn. Sci., 4, 267–278. https://‍‍01501-1
Amodio, D. M., & Frith, C. D. (2006). Meeting of minds: The medial frontal cortex and social cognition. Nature Reviews Neuroscience, 7(4), 268–277. https://‍
Arbib, M. A. (2012). How the Brain Got Language: The Mirror System Hypothesis. New York & Oxford: Oxford University Press. https://‍
Arbib, M. A., Ganesh, V., & Gasser, B. (2014). Dyadic brain modeling, ontogenetic ritualization of gesture in apes, and the contributions of primate mirror neuron systems. Phil. Trans. R. Soc. B, 369(1644), 20130414. https://‍
Arbib, M. A., & Rizzolatti, G. (1997). Neural expectations: a possible evolutionary path from manual skills to language. Commun. Cogn., 29, 393–424.
Bates, E., Camaioni, L., & Volterra, V. (1975). The acquisition of performatives prior to speech. Merrill-Palmer Quarterly, 21, 205–226.
Boesch, C. (1995). Innovation in wild chimpanzees (Pan troglodytes). International Journal of Primatology, 16, 1–16. https://‍
Bonaiuto, J. J., & Arbib, M. A. (2010). Extending the mirror neuron system model, II: what did I just do? A new role for mirror neurons. Biological Cybernetics, 102(4), 341–359. https://‍
Buttelmann, D., Carpenter, M., Call, J., & Tomasello, M. (2007). Enculturated chimpanzees imitate rationally. Dev. Sci., 10, F31–F38. https://‍
Cannon, E. N., & Woodward, A. L. (2012). Infants generate goal-based action predictions. Dev. Sci., 15, 292–298. https://‍
Falck-Ytter, T., Gredebäck, G., & von Hofsten, C. (2006). Infants predict other people’s action goals. Nat. Neurosci., 9, 878–879. https://‍
Ferrari, P. F., Bonini, L., & Fogassi, L. (2009). From monkey mirror neurons to mirror-related behaviours: possible direct and indirect pathways. Phil. Trans. R. Soc. B, 364, 2311–2323. https://‍
Ferrari, P. F., Gallese, V., Rizzolatti, G., & Fogassi, L. (2003). Mirror neurons responding to the observation of ingestive and communicative mouth actions in the monkey ventral premotor cortex. European Journal of Neuroscience, 17, 1703–1714. https://‍

Ferrari, P. F., Gerbella, M., Coudé, G., & Rozzi, S. (2017). Two different mirror neuron networks: The sensorimotor (hand) and limbic (face) pathways. Neuroscience, 358, 300–315. https://‍
Frith, C. D., & Frith, U. (2006). The neural basis of mentalizing. Neuron, 50, 531–534. https://‍
Gergely, G., Bekkering, H., & Király, I. (2002). Rational imitation in preverbal infants. Nature, 415(6873), 755. https://‍
Goodall, J. (1986). The chimpanzees of Gombe. Cambridge, MA: Harvard University Press.
Hixson, M. D. (1998). Ape language research: A review and behavioral perspective. The Analysis of Verbal Behavior, 15, 17–39. https://‍
Hobaiter, C., & Byrne, R. W. (2011). The gestural repertoire of the wild chimpanzee. Animal Cognition, 14, 745–767. https://‍
Lyons, D. E., Young, A. G., & Keil, F. C. (2007). The hidden structure of overimitation. Proceedings of the National Academy of Sciences of the United States of America, 104, 19751–19756. https://‍
Matsuzawa, T., Tomonaga, M., & Tanaka, M. (Eds.) (2006). Cognitive Development in Chimpanzees. Tokyo: Springer-Verlag Tokyo. https://‍
McGrew, W. C. (1992). Chimpanzee material culture: implications for human evolution. New York/Cambridge, UK: Cambridge University Press. https://‍
McGrew, W. C., Marchant, L. F., Scott, S. E., & Tutin, C. E. G. (2001). Intergroup differences in a social custom of wild chimpanzees: The grooming hand-clasp of the Mahale Mountains. Current Anthropology, 42, 148–153. https://‍
Myowa-Yamakoshi, M., & Matsuzawa, T. (1999). Factors influencing imitation of manipulatory actions in chimpanzees (Pan troglodytes). Journal of Comparative Psychology, 113, 128–136. https://‍
Myowa-Yamakoshi, M., & Matsuzawa, T. (2000). Imitation of intentional manipulatory actions in chimpanzees (Pan troglodytes). Journal of Comparative Psychology, 114, 381–391. https://‍
Myowa-Yamakoshi, M., & Tomonaga, M. (2009). Evolutionary origins of social communication. In M. de Haan & M. R. Gunnar (Eds.), Handbook of Developmental Social Neuroscience (pp. 207–221). New York: Guilford Press.
Myowa-Yamakoshi, M., Scola, C., & Hirata, S. (2012). Humans and chimpanzees attend differently to goal-directed actions. Nature Communications, 3, 693. https://‍
Myowa-Yamakoshi, M., Yoshida, C., & Hirata, S. (2015). Humans but not chimpanzees vary face-scanning patterns depending on contexts during action observation. PLoS ONE, 10(11), e0139989. https://‍
Nishida, T. (1980). The leaf-clipping display: A newly discovered expressive gesture in wild chimpanzees. Journal of Human Evolution, 9, 117–128. https://‍‍90068-8
Pawlby, S. (1977). Imitative interaction. In H. R. Schaffer (Ed.), Studies in mother-infant interaction (pp. 203–223). London: Academic Press.
Plooij, F. X. (1978). Some basic traits of language in wild chimpanzees? In A. Lock (Ed.), Action, gesture, and symbol: The emergence of language (pp. 111–132). New York: Academic Press.
Rizzolatti, G., & Arbib, M. A. (1998). Language within our grasp. Trends Neurosci., 21(5), 188–194. https://‍‍01260-0


Rizzolatti, G., & Craighero, L. (2004). The mirror-neuron system. Annu. Rev. Neurosci., 27, 169–192. https://‍
Rizzolatti, G., & Sinigaglia, C. (2010). The functional role of the parieto-frontal mirror circuit: interpretations and misinterpretations. Nat. Rev. Neurosci., 11, 264–274. https://‍
Rochat, M., Serra, E., Fadiga, L., & Gallese, V. (2008). The evolution of social cognition: goal familiarity shapes monkeys’ action understanding. Curr. Biol., 18, 227–232. https://‍
Seidenberg, M. S., & Petitto, L. A. (1979). Signing behavior in apes: A critical review. Cognition, 7, 177–215. https://‍‍90019-2
Sugiyama, Y. (1981). Observations on the population dynamics and behavior of wild chimpanzees at Bossou, Guinea, 1979–1980. Primates, 22, 432–444. https://‍
Terrace, H., Petitto, L., Sanders, R., & Bever, T. (1979). Can an ape create a sentence? Science, 206, 891–902. https://‍
Tomasello, M., & Call, J. (1997). Primate Cognition. New York: Oxford University Press.
Tomasello, M., Call, J., Nagell, K., Olguin, R., & Carpenter, M. (1994). The learning and use of gestural signals by young chimpanzees: A trans-generational study. Primates, 35, 137–154. https://‍
Tomasello, M., & Camaioni, L. (1997). A comparison of gestural communication of apes and human infant. Human Development, 40, 7–24. https://‍
Tomasello, M., Savage-Rumbaugh, S., & Kruger, A. C. (1993). Imitative learning of actions on objects by children, chimpanzees, and enculturated chimpanzees. Child Development, 64, 1688–1705. https://‍
Whiten, A., Goodall, J., McGrew, W. C., Nishida, T., Reynolds, V., Sugiyama, Y., Tutin, C. E. G., Wrangham, R. W., & Boesch, C. (1999). Culture in chimpanzees. Nature, 399, 682–685. https://‍
Whiten, A., McGuigan, N., Marshall-Pescini, S., & Hopper, L. M. (2009). Emulation, imitation, over-imitation and the scope of culture for child and chimpanzee. Philosophical Transactions of the Royal Society of London: B, 364, 2417–2428. https://‍
Zamma, K. (2002). Leaf-grooming by a wild chimpanzee in Mahale. Primates, 43, 87–90. https://‍
Zukow-Goldring, P. (2012). Assisted imitation: first steps in the seed model of language development. Language Sciences, 34(5), 569–582. https://‍

Pantomime and imitation in great apes

Implications for reconstructing the evolution of language

Anne E. Russon

Glendon College of York University

This paper assesses great apes’ abilities for pantomime and action imitation, two communicative abilities proposed as key contributors to language evolution. Modern great apes, the only surviving nonhuman hominids, are important living models of the communicative platform upon which language evolved. This assessment is based on 62 great ape pantomimes identified via data mining plus published reports of great ape action imitation. Most pantomimes were simple, imperative, and scaffolded by partners’ relationship and scripts; some resemble declaratives, some were sequences of several inter-related elements. Imitation research consistently shows great apes perform action imitation at low fidelity, but also that action imitation may not represent a distinct process or function. Discussion focuses on how findings may advance reconstruction of the evolution of language, including what great apes may contribute to understanding ‘primitive’ forms of pantomime and imitation and how to improve their study.

Keywords: pantomime, imitation, great apes, language evolution

This chapter contains additional video and database files which can be found at https://‍

Introduction

This paper assesses evidence on pantomime and action imitation in modern great apes, two abilities considered key steps in the evolution of language in Arbib’s (2016) and other reconstructions that take manual gesture seriously. Arbib’s reconstruction proposes both evolved uniquely in the human lineage, with action imitation an essential precursor to pantomime. Modern great apes (chimpanzees, bonobos, gorillas, orangutans) – henceforth “great apes” – are important to these reconstructions because they, humans, and their ancestral lineages constitute the

https://‍ © 2020 John Benjamins Publishing Company


hominids, a natural evolutionary group within the primates, who share features distinct from other nonhuman primates – enhanced brains among them (Russon & Begun, 2004). Great apes are the only surviving nonhuman hominids, and so the only living models of the last common hominid ancestor (LCA, ca. 13–18 Ma) and of the behavioral and cognitive platform upon which language and other human capacities evolved. Evidence on great ape pantomime and action imitation derives from available researcher reports. For pantomime, I assessed common features and key factors associated with its occurrence and qualities (social relationships, scripts, development, cognition). For action imitation, I assessed factors affecting its fidelity and occurrence. Discussion focuses on how findings can further the reconstruction of language evolution, including what great apes may contribute to understanding basic forms of both abilities and how to study them effectively.

Pantomime

Pantomime, in the sense of acting out messages, has been proposed as an evolutionary precursor to language because it enables open semantics. Actors can communicate anything they can perform, including aspects of objects, actions, events and emotions, based on their existing repertoire or ad hoc improvisation (e.g., Arbib, 2016). Pantomime may also have set the stage for grammar (Corballis, 2017). It is commonly considered unique to hominins, with great apes at best “on the brink”, e.g., capable of only a few iconic gestures (Arbib et al., 2008). Great ape pantomime may be stronger than this view suggests: although it has been little studied systematically, observational reports indicate that it occurs (Douglas & Moscovice, 2015; Russon & Andrews, 2010, 2011). If language evolved gradually via pantomime (Arbib, 2016; Corballis, 2017), pantomime-like behavior in modern great apes merits study as the only living evidence.

I defined pantomime as a form of iconic gesture in which actors intentionally enact their meaning and referents and simulate engagement in activities (McNeill, 2000; Russon & Andrews, 2010). This definition differs from other recent definitions in omitting criteria such as using whole vs. part body actions, using reduced forms of actions, or acting without real objects (e.g., Arbib, 2016; Corballis, 2017; Suddendorf et al., 1999; Tanner, 2008). The rationale is that language evolution scholarship targets open semantics, additional criteria do not affect this capacity (and can be assessed in cases meeting basic criteria), and basic criteria are most plausible in great apes.

I assessed great ape pantomime from an updated version of Russon & Andrews’ (2010, 2011) pantomime database. Pantomime events derive from research on wild great apes (i.e., naturally occurring behavior in native habitat), captives in homes, zoos or laboratories (e.g., imitation, sign language, pretense, deception, cognition), and ex-captives being rehabilitated to free forest life (e.g.,


social learning, cognition). All but two events were reported by researchers or trained research assistants who had long-term experience working and sometimes living with the great apes involved. Most are unusual events observed while studying other topics; very few are from systematic studies of pantomime and several were originally interpreted in other terms (e.g., deception, pretense). The entire database and specifications on its compilation are provided in electronic supplementary materials (ESM, https://‍). The database includes 62 events that qualify as basic pantomime and is undoubtedly incomplete. It approaches the size of Byrne & Whiten’s (1990) first primate tactical deception database generated by data-mining (75 events) and is large enough to suggest patterns. Given the method used, it is not representative of the great ape population. Patterns derived from it, especially quantitative patterns, should be interpreted with caution since variation may reflect sources used and/or research intensity more than great ape capacities. Sampling limits notwithstanding, this relatively small database includes pantomimes from all great ape species, all age/sex classes, and all living conditions. Below are two illustrative examples (numbered per ESM), major pantomime features, and a summary of meanings expressed (Table 1).

#42. Siti, an adolescent rehabilitant orangutan, was husking a wild coconut manually. She skillfully opened one of its three eyes, extracted and ate some of the jelly inside, then stopped before opening the other two eyes, handed her coconut to the technician monitoring her, and waited. Other wild coconuts that had been sliced apart littered the ground nearby, suggesting someone had chopped them open with a machete to help rehabilitants. This technician offered the coconut back to Siti without opening it. She replied by chopping at the coconut with a stick, as if telling him to use his machete to chop her coconut open. Within seconds he complied; she watched and waited without interfering while he did so, then extracted and ate the remaining jelly inside.

#1–#3. Bonobos in 3 wild communities “branch drag” (BD). They obtain a carefully selected “branch” (e.g., small tree, leafy, ca 2m long) then run through the forest dragging it behind. BD is most common in group movement but also occurs in other contexts (excitement, dominance, mild threat); it is performed primarily by adult males and always as a communicative gesture enhancing the actor’s other behavior by adding emphasis or more information. BD associated with group movement, for example, informs group members of movement initiation, movement direction, and directional changes during travel.

Main features of pantomimes in this database are as follows. Actors were 68% orangutans, 15% chimpanzees, 11% bonobos, 6% gorillas; 65% were immatures


(11% infant, 29% juvenile, 24% adolescent), 26% adults, 10% unknown age. Living conditions were 85% human-defined (48% rehabilitation facilities, 37% captivity – 26% language-trained, 11% other) and 15% wild. Partners were 70% humans, but even captives and rehabilitants addressed some pantomimes to conspecifics. Messages were 69% imperatives, 18% possible declaratives, 3% both, 10% unclear. As points of comparison, these pantomimes involved some form of six of the 115 gesture types in Hobaiter & Byrne’s (2011) great ape gesture repertoire (branch drag, bite, position partner, peer, present for grooming, slap object). The imperative-declarative distribution is consistent with evidence that great ape gestures mostly express imperatives, but it verges on inconsistency with the view that they rarely, if ever, express declaratives.

Semantics

Since open semantics is proposed as pantomime’s possible contribution to language evolution, I assessed the range of meanings expressed by great ape pantomimes within common message types (Table 1). Within some message types, each pantomime was distinctive (e.g., share experience – specific experience, do “X” – specific action). Within others, all pantomimes combined standardized main elements with situational specifics (e.g., ask for help – standard “upset” + specific problem; branch drag – standard “drag branch” + specific direction). This suggests great apes produce novel pantomimes improvised ‘on the fly’ (e.g., #42) and generic pantomimes suggestive of social influences (e.g., #1–3). The evidence is clearly very limited and unsystematic, but suggests how great ape pantomimes can open semantic possibilities.

Table 1. Common message types in great ape pantomimes within this database

| Message types | # cases: # variants | Constituent behaviors |
|---|---|---|
| Declarative^a | | |
| no harm intended | | Enact non-threatening behavior within sight of a wary partner without expressing any request^a: (1) ignore^b and remain within sight; (2) ignore and eat something nearby (real or fake); (3) ignore and build several nests and gradually approach; (4) ignore and move away (sometimes faking departure); (5) “groom” partner’s backpack; (6) “fake” groom partner (clean invisible items); (7) peel and give food to partner; (8) gently bite partner (once also signing “will not bite”)^c |
| share experience | 2:2 | Enact an observed or previously experienced salient event in front of an observer, with no indication a request is intended: (1) finger caught in fence mesh (to observing researcher); (2) “doctor” foot to extract a thorn (to observer who doctored) |
| ask for help: scared + where/what problem | | Enact a fake problem/emotion to solicit help, in order to disrupt bullying, disrupt work, or get to play outside: (1) act scared + sign “cat”^c + peer around outside; (2) act scared + make noise + look around outside; (3) act scared + alarm call + move to/look at door; (4) act scared + call + look (target unclear to partners); (5) act toy stuck + call + try-fail |
| request permission for -X- | | Request permission for a specific action by partly enacting that action to an individual who can grant permission: (1) Toss leafy branches into an occupied nest, start to enter the nest, then pause for the occupant’s response to the initiative to enter. (2) A subordinate female bonobo left her dominant female friend to join a male. Soon she returned, displayed a sexually receptive posture to her friend, and waited. The friend watched, then looked away (interpretation: “no objection”). The subordinate immediately rejoined the male. |
| do -X- | | Enact a behavior before a partner (human or conspecific) to ask the partner to perform that behavior. (1) Actions (n = 18): bang, trade, tool chop, hammer crack, open, peel, groom, hip shimmy, remove, manipulate, put sand on, wipe clean, scratch, sex, slap, snip, scrape, saw. (2) Action targets (n = 17): dirt on head, face, flower, fruit, hair, head, helmet, mosquitos, nut, jungle gym, mouth, stem, stick, hair, termite nest |

Notes:
a. I identified declaratives when pantomimes expressed comments (referred to items or events without requesting them) or statements (what the actor was about to do) (Lyn et al., 2011).
b. I treated ‘ignore’ as faking indifference, i.e., faking the actor’s own emotional state.
c. I included signed components in event sketches (“will not bite”, “cat”) but did not include them as pantomime components.

Tools, relationships, scripts

I assessed these features in great ape pantomimes as relevant to the nature and complexity of the messages communicated and conditions favoring their usage.


On Corballis’ (2017) suggestion that pantomime may have set the stage for grammar, construed as “the construction of sequences involving combinations of elements, and perhaps even some recursion as sub-sequences were inserted into the main flow of the story”, I considered pantomimes involving tool use because tool use entails using objects in particular combinations and particular relationships with targets, and pantomiming with tool use can specify an agent, target(s) of action, tool object(s), and the tool-target relationship(s) the agent should enact. Several of these great ape pantomimes showed these features, including examples #42 and #1–3.

Social factors should be prominent in great ape pantomime, since great apes and many other anthropoid primates live in semi-permanent social groups characterized by long-term social relationships (e.g., Mitani et al., 2012). This form of sociality is also plausible as their ancestral condition, since it is common in living anthropoid primates. I assessed great ape pantomimes for participants’ social relationship and any script contributions to the message.

Social relationships, here, are interaction patterns (behaviors, expectations) generated by participants’ interaction history, where interactions are defined by the behaviors an actor directs to a receiver (Hinde, 1976a, b). Hinde distinguished two types of relationships defined by participants’ interaction history: individualized (with a specific partner, e.g., mother and own infant) and generalized (with a partner category, e.g., wild apes – kin, outsider; captive apes – student researchers; humans – cab driver, waiter). In all but three great ape pantomimes (96%), participants had an established social relationship (83% individualized, 10% generalized, 3% both). In one case, an orangutan pantomimed to herself.
Scripts are general event representations derived from and applied to social contexts that specify the set and sequence of elements (e.g., roles, props, actions) linked with recurrent contexts and goals (Nelson, 1981). Great apes establish scripts; extensive experience is probably not needed to build them; script violation characterizes some great ape deceptions (Mitchell, 1999); and deception is closely related to pantomime. Scripts were confirmed or probable contributors to 60% and 15% of great ape pantomime events, respectively; they did not contribute to 5%, probably did not contribute to 3%, and their contribution was unknown for 18%. The prevalence of social relationships and scripts in these great ape pantomimes underlines the importance of participants’ history of shared experiences. These pantomimes typically referred to activities or items already familiar to participants, they often clarified a preceding request to the same partner that had omitted information, and omissions concerned activities or items involved in previous interactions on the same issue. That is, great ape pantomimes frequently concerned familiar scripts associated with participants’ social relationship. The importance of scripts and social relationships in these pantomimes is consistent with the understanding that gestural meaning is context-dependent for


humans and great apes (Russon & Andrews, 2011). This should be especially important for pantomime and other iconic gestures, which are often idiosyncratic and sometimes created on the spot from the actor’s mental content and available props. Take-home messages are (a) great ape pantomimes appear to be highly sensitive to social factors and context; (b) partners’ shared interaction history may play an important, possibly critical role in their expressing and understanding meaning via pantomime (i.e., common ground and contextual cues provide much of the meaning, and could make pantomime rarely necessary); and (c) pantomime is possible with some strangers, and occurred in great apes, perhaps because generalized relationships can develop between partner “categories” (e.g., wild apes – transient or dominant male, captive apes – gullible human students) and their interactions also generate scripts (e.g., how to communicate friendly intent, how to mislead students). Point (b) suggests possible acquisition via ontogenetic ritualization, but only four events suggested the substantial exposure needed during development to generate it (Halina et al., 2013).

Imitation

I assessed one facet of great apes’ abilities for imitative learning (i.e., learning new behavior by seeing it done, hereafter “imitation” for convenience) – action imitation, reproducing relatively detailed, linear specification of sequential actions. Byrne & Russon (1998) distinguished it from program imitation, reproducing a relatively broad description of the subroutine structure and hierarchical layout of a behavioral program. This distinction recalls a long-standing view that imitation serves two functions, interpersonal and instrumental (Yando et al., 1978). Interpersonal imitation, sometimes characterized as impersonation, focuses on matching the model’s style and action details; instrumental imitation, like program imitation, focuses on solving problems, especially causal ones, so programs, subroutines, and their organization are relevant. Both forms of imitation can contribute to other purposes, and action imitation in particular can enhance instrumental effectiveness. Great apes have reportedly used action imitation instrumentally, e.g., how to fan a fire or make a long stick to rake in an out-of-reach boat (e.g., examples AI-4, OI-1 below).

Arbib (2012, 2016) likened action imitation to his complex imitation and proposed overimitation as a key characteristic. Complex imitation combines complex action analysis (recognizing resemblances between another’s performance and an assemblage of familiar actions) with actual imitation (re-enacting the assembled actions in light of the action analysis). Keys are observationally acquired rapid understanding of the overall structure of moderately complex skilled behavior and reducing it to an assemblage of known behavioral components, variants of them,


and/or new variants otherwise acquired. It is central to his proposed evolutionary pathway to language within the human lineage: complex imitation to pantomime to protosign. I reviewed studies of action imitation in great apes, including overimitation, and the model of imitative processes.

I considered three sources of evidence on great apes’ action imitation: “Do what I do” (DWID) games, spontaneous imitation, and overimitation. In DWID studies, a human demonstrates an arbitrary, non-functional action (e.g., touch chin, slap floor) to a subject, instructs the subject to imitate it, and records responses. For great apes, training normally precedes testing in the form of demonstrating a set of training actions and shaping imitative responses using differential food reinforcement. Once subjects reliably reproduce training actions, novel arbitrary actions are demonstrated and subjects’ responses recorded. DWID studies have been conducted with captive chimpanzees, orangutans, and gorillas (Byrne & Tanner, 2006; Call, 2001; Custance et al., 1995; Hayes & Hayes, 1952; Miles et al., 1996; Myowa-Yamakoshi & Matsuzawa, 1999; Tomasello et al., 1993). All concluded that chimpanzees, orangutans, and gorillas imitated some arbitrary, non-functional actions; their copies were typically low in fidelity but researchers could identify them in context.

With respect to great apes’ limitations, DWID findings have limited value for several reasons. First, great apes tested varied in age, learning backgrounds, and/or living/social conditions (infant to adult; human home, language or research laboratory, zoo; human vs. great ape socialized) so test actions may have been unsuitable for some. Some, e.g., may have failed to imitate because the degree of novelty in demonstrated actions exceeded their current capabilities. Vygotsky (1962) argued humans can only learn new behaviors within their “zone of proximal development” (zpd), i.e., at most “slightly” beyond their current capabilities.
This also applies to great apes and their zpd is probably narrower than humans’. The gorilla’s closest matches, for example, were for demonstrations similar to gestures she already used (Byrne & Tanner, 2006). Second, great apes’ low fidelity matches may owe to demonstrations providing too little information to identify which features the demonstrator wanted imitated. Chantek, a language-trained orangutan, repeatedly changed the features he imitated when told his responses were wrong (Miles et al., 1996). For example, his caregiver demonstrated jumping 12 times. Chantek responded differently each time. In sequence, he lifted and stomped his foot; took exaggerated steps toward his caregiver; lifted each leg alternately (right, left, right); three times lifted and dropped both legs while sitting; twice no response; alternately stomped each foot; sat and no response; stomped his right foot repeatedly; lifted and dropped both legs, paused, lifted and dropped his right foot; and lifted and dropped his right


foot then reached back with his arms, pulled himself up to a step behind him, and lifted his feet in the air while moving backward. All his imitations were partial (he never lifted his whole body off the ground) and suggested jumping was impossible for him. His partial imitations show he interpreted the demonstration as requiring feet off the ground, which he replicated repeatedly with variations. Human children also try to figure out which features to imitate (Yu & Kushnir, 2014). Both resemble the gavagai effect in language, and suggest low fidelity matches may not reflect great apes’ abilities when demonstrations were imprecise. Third, some DWID studies used conditions ill-suited to action imitation, notably demonstrators lacking strong social relationships with subjects and training via food rewards. Both could have led subjects to interpret the task as instrumental vs. social, and/or their job as enacting “correct” (reinforced) vs. “matching” (demonstrated) responses: either invalidates these studies as tests of action imitation. Hayes, Miles, and Myowa-Yamakoshi had long-term relationships with their apes plus records of their experiences, so their studies may be less affected by these problems. Observational studies of great apes’ spontaneous behavior in their normal living conditions have offered evidence for action imitation. They typically document the incidence, qualities, and correlates of imitative behaviors. When the researchers undertaking such studies have long-term knowledge of the great apes they study, findings are valuable in two ways. First, the imitations represent great apes’ independent choices of the models and behaviors they imitated, so they served the imitator’s interests and capabilities. 
Second, researchers’ knowledge can span several years prior to the imitations observed, including the behavior’s probable novelty and factors likely to have been important in eliciting it (e.g., social relationships, individual and collective histories). Such researchers are well equipped to detect and interpret communicative and other novel social behaviors in the great apes they study. Below are five examples of spontaneous great ape action imitation.

#AI-1. Viki, a chimpanzee reared in a human family from early infancy, applied lipstick and powder “just as she had seen the act performed” (Hayes & Hayes, 1952, p. 451).

#AI-2. Chantek, an orangutan trained in sign language, curled his eyelashes as he had seen his caregiver do (Miles et al., 1996).

#AI-3. Krom, a female chimpanzee at Burgers’ Zoo, walked in an unusually hunched fashion. Young apes in her group once walked behind her for days, “single file, all with the same pathetic carriage” (de Waal, 1982, p. 80).

#AI-4. Supinah, an adult female rehabilitant orangutan who frequented the forest camp, re-enacted the cooks’ method of starting fires in their outdoor kitchen, including fanning embers using the same tool they used in the same way they used it (fanning side to side vs. up and down). For obvious reasons and

because Supinah harassed the cooks, guards actively kept her away from the outdoor kitchen – especially when cooks were working there – so she could not have learned this by working with them (Russon & Galdikas, 1993, 1995). She often watched from a distance, so action imitation is the most plausible explanation.

#AI-5. In behaviors common or universal in forest-living orangutans, community-specific stylistic features unrelated to ecological or social constraints may derive from action imitation because active teaching is rare in great apes. Possible examples are local styles of “kiss-squeaks” (kiss sounds made by sharply drawing in air through pursed lips) that are unlikely to differ functionally, e.g., lips on inside of palm vs. inside of wrist (Orangutan Cultures, 2017).

Overimitation, reproducing a model’s actions that are causally irrelevant as well as those that are causally relevant, has been used to identify action imitation and assess factors affecting it (Lyons et al., 2007; Taniguchi & Sanefuji, 2017). Some spontaneous great ape imitations show its features; the three examples below involve imitation of causally irrelevant actions in instrumental tasks.

#OI-1. Meinel’s (1995) imitation study demonstrated to zoo orangutans how to obtain a banana from a boat floating out of reach on a moat within their enclosure. She demonstrated joining three short sticks end to end to make one long stick, then using it to hook and pull the boat within reach. Each short stick had a socket at one end, to enable joining them by inserting one’s raw end into another’s socket. Meinel demonstrated joining sticks by insertion plus screwing, as if screwing was essential to joining; it was not. After demonstrating, she gave the orangutans several sets of short sticks. Juara, an adolescent male orangutan who had watched intently, collected three sticks (after other orangutans had pulled their sockets off) and brought them to the moat’s edge near the boat.
Then he ‘joined’ two by butting their ends together and twisting them back and forth against one another (‘screwing’). His twisting constitutes overimitation, because it reproduced demonstrated actions that were causally irrelevant and ineffective in joining the two sticks.

#OI-2. Eating Atuna racemosa leaves (Russon, unpub). Victor, a juvenile rehabilitant orangutan, peered closely while Luna, an adolescent female, ate leaves from an A. racemosa tree. While Luna ate these leaves, Victor made no attempt to eat any although the tree was large and abundant in edible leaves. Immediately after Luna left, Victor took her position and ate leaves from her patch. I consider that this qualifies as imitation because it is the only record of orangutans eating A. racemosa leaves anywhere, and Victor ate these

leaves only after observing Luna do so. It also qualifies as overimitation because Victor ate leaves only from Luna’s ‘patch’ with no evident ecological reason for doing so.

#OI-3. Kiss-squeaks. Within-community variants in example #AI-5 that are potentially non-functional could qualify as overimitation in wild orangutans.

In conclusion, great apes have offered evidence of action imitation and overimitation in experimental and observational studies, their performances share some of the features seen in humans, and they appear to use both spontaneously but perhaps not often in the wild. There are numerous caveats to interpretations. First, many experimental studies of great ape imitation had flaws that undermine the validity of their findings. For example, great apes’ weak performance in imitation experiments has been attributed to their “emulating” (copying outcomes) versus “imitating” (copying actions) (Tomasello & Call, 1997), but this could be due to confounded tests of imitation (e.g., experimental procedures or model behaviors that misled observers about the important features of demonstrated behavior). Participants’ behavior may then represent responses to misleading cues. Second, observational findings are often associated with limited historical information. Third, there are now multiple models of the processes generating imitative learning, and which are valid is unclear. Arbib’s (2016) model overlaps with but deviates from Byrne & Russon’s (1998) model. Some dispute the action-program imitation distinction (e.g., Connolly & Dalgleish, 1989; Lyons et al., 2007; Taniguchi & Sanefuji, 2017; Yu & Kushnir, 2014), arguing they are not distinct processes and effectively proposing single-process models. At the other extreme, imitation has been proposed to constitute a single overarching imitation “faculty” or “system” served by multiple special-purpose mechanisms (e.g., Subiaul et al., 2016).
Aside from the problem of determining which models are correct, few have been assessed empirically in humans let alone in great apes. Notwithstanding, the overall pattern in studies of imitative learning is that great apes are less successful than humans, but more successful than monkeys – notably, in imitating novel motor actions  – although some monkeys have succeeded in other imitative learning tasks (e.g., novel action sequence rules) (Subiaul et al., 2016).

Towards a new road map

Arbib’s road map of the evolution of language would benefit from considering all modern great apes and their evolutionary lineage, the hominids (LCA ca 13–18 Ma), not just chimpanzees (LCA ca 6–7 Ma). Pantomime and action imitation,

two abilities proposed to have been critical in the evolution of language, have been reported in all modern great ape species in some form; by inference, both have hominid origins. If great apes have not (yet) expressed the exact form of these abilities proposed as critical to language evolution, their achievements may represent features that could contribute to reconstructing the evolutionary pathway. Further suggestions for moving forward focus on great ape behavior and cognition, my areas of expertise. Arbib focuses on complex and action imitation, arguing they occur only in the human lineage. Multiple studies have found evidence for action imitation in great apes although their copies are ‘rough’. Available experimental evidence may underestimate great apes’ capacities, because many studies used conditions not conducive to imitative learning. As one example, action imitation for the arbitrary actions typically used may normally require repeated demonstration-copy exchanges before imitators achieve a good match, to isolate which features the demonstrator wants imitated and refine copies to meet expectations. This situation resembles the gavagai effect in language. Further, observational studies suggest that imitative learning in great apes typically occurs between individuals with established positive social relationships; some action imitation experiments did not provide these conditions, and may have failed to elicit imitation in their great ape subjects for this reason. Arbib’s evolutionary model of complex imitation preceding pantomime does not square well with the common view that pantomime enables open semantics, i.e., actors can communicate “anything they can do” including aspects of objects, actions, events and emotions, and with some features of great ape pantomimes. What actors can do spans their entire behavioral repertoire, insofar as they can perform behaviors voluntarily without functional eliciting conditions. 
Mature great apes have very large voluntary behavioral repertoires, providing a rich basis for pantomime. Of the 62 great ape pantomime events I identified, ~60% involved enacting behavior the actor already knew. Second, behaviors in an individual’s repertoire are generated by many mechanisms; imitative learning may be among them, but so are many simpler individual and social learning mechanisms (e.g., trial and error, mental problem solving, stimulus enhancement, ontogenetic ritualization) plus mechanisms generating involuntary responses. In acquiring “aspects of objects, actions, events, or emotions”, imitative learning is unlikely to be essential. Many aspects of objects (e.g., form, shape) may be learned by handling them, and aspects of actions, events and emotions by directly experiencing them. Several great ape pantomimes included faking affective states, all of which the actor had directly experienced (fear, interest, indifference, innocence). Self-imitation is a relatively simple mechanism that could bring their performance under voluntary control.

Accordingly, it seems unlikely that complex imitation preceded pantomime in evolution or is essential to it, or that both evolved within the human lineage. Imitative learning could clearly broaden the behavioral repertoire available to pantomime and probably does in great apes: published evidence shows great apes can and do use imitative learning, and ~30% of these great ape pantomimes used behaviors probably acquired by imitative learning (e.g., machete-like chop a coconut, use tools to open termite nests, ‘doctor’ injured foot). In light of these points, developing better understanding of imitative learning and pantomime in great apes is important to advancing reconstructions of the evolution of language. Current experimental findings on imitative learning in great apes are inconclusive, often because experiments neglected critical contributing factors such as ontogenetic development (notably zpd), cognition, and social relations. Some, e.g., did not assess each ape’s level of understanding or current knowledge, and may have demonstrated novel actions ill-suited to their age or learning capacity or actions they already knew. Others used demonstrators lacking suitable relationships with the apes tested, who may have failed to engage their participation. Reasons for subjects’ successes and failures are therefore methodologically confounded. Studies are needed that are designed to take all these critical factors into account. Newer models of imitation as a specialized psychological faculty or system served by multiple specific systems or mechanisms, e.g., motor vs. cognitive imitation (Subiaul et al., 2016), could also improve the design of empirical studies and interpretations of findings. They might help isolate facets of imitative learning that distinguish humans, great apes, and monkeys, an important basis for reconstructing its evolutionary progression. 
As one example, great apes have shown imitation of hierarchically structured behavioral routines (i.e., program imitation); to date, monkeys have at best shown imitation of sequential order (Subiaul et al., 2016). For pantomime, systematic assessment of its generating processes and great apes’ capacities for them is essential to revising the road map. To that end, construing pantomime in its basic form(s) and as a form of expression rather than as a mental process are two practical starting points. To establish the range of great apes’ pantomime and its governing processes, studies based on factors which the current database suggests are important could help. These include: social relationships, scripts, topics/behaviors important to the actor, communication failures in response to an actor’s simpler messages, and the actor’s current abilities and zpd (i.e., degree and complexity of novelty within their grasp). Findings should help establish the range of great ape pantomime and the mental processes involved, both important bases for distinguishing abilities unique to hominins. Finally, enhanced cognitive capacities often play a central role in reconstructions of human evolution and could well have been pivotal to enhancing

pantomime and imitation. Cognitive abilities underpin an individual’s understanding and representation of demonstrated behavior, both its organizational structure or “program” (e.g., relational and inter-relational components, hierarchies) and its component actions. Enhancement of cognitive capacities would enable more complex and more accurate understandings and representations, in turn enabling more powerful learning abilities, imitation among them, and thereby richer pantomime repertoires.

Funding

This research was supported by NSERC Canada and Indianapolis Zoo. The paper was prepared for a workshop funded by NSF Grant No. BCS-1343544 “INSPIRE Track 1: Action, Vision and Language, and their Brain Mechanisms in Evolutionary Relationship,” (M.A. Arbib, Principal Investigator).

References

Arbib, M. A. (2012). How the Brain Got Language: The Mirror System Hypothesis. New York: Oxford University Press. https://‍
Arbib, M. A. (2016). Towards a computational comparative neuroprimatology: Framing the language-ready brain. Physics of Life Reviews, 16, 1–54. https://‍
Arbib, M. A., Liebal, K., & Pika, S. (2008). Primate vocalization, gesture, and the evolution of human language. Current Anthropology, 49(6), 1053–1076. https://‍
Byrne, R. W. & Russon, A. E. (1998). Learning by imitation: A hierarchical approach. Behavioral and Brain Sciences, 21, 667–721. https://‍
Byrne, R. W. & Tanner, J. E. (2006). Gestural imitation by a gorilla: evidence and nature of the capacity. International Journal of Psychology and Psychological Therapy, 6, 215–231.
Byrne, R. W. & Whiten, A. (1990). Tactical deception in primates: the 1990 database. Primate Report, 27, 1–101.
Call, J. (2001). Body imitation in an enculturated orangutan (Pongo pygmaeus). Cybernetics and Systems, 32, 97–119. https://‍
Connolly, K. & Dalgleish, M. (1989). The emergence of a tool-using skill in infancy. Developmental Psychology, 25(6), 894–912. https://‍
Corballis, M. C. (2017). Precursors to language. Topoi. https://‍
Custance, D., Whiten, A., & Bard, K. A. (1995). Can young chimpanzees (Pan troglodytes) imitate arbitrary actions? Hayes & Hayes (1952) revisited. Behaviour, 132, 837–859. https://‍
De Waal, F. B. M. (1982). Chimpanzee Politics. London: Jonathan Cape.
Douglas, P. H. & Moscovice, L. R. (2015). Pointing and pantomime in wild apes? Female bonobos use referential and iconic gestures to request genito-genital rubbing. Scientific Reports, 5, 13999. https://‍

Halina, M., Rossano, F., & Tomasello, M. (2013). The ontogenetic ritualization of bonobo gestures. Animal Cognition, 16, 653–666. https://‍
Hayes, K. J. & Hayes, C. (1952). Imitation in a home-raised chimpanzee. Journal of Comparative and Physiological Psychology, 46, 450–459. https://‍
Hinde, R. A. (1976a). Interactions, relationships and social structure. Man, 11, 1–17. https://‍
Hinde, R. A. (1976b). On describing relationships. Journal of Child Psychology and Psychiatry, 17, 1–19. https://‍
Hobaiter, C., & Byrne, R. W. (2011). The gestural repertoire of the wild chimpanzee. Animal Cognition. https://‍
Lyn, H., Greenfield, P. M., Savage-Rumbaugh, S., Gillespie-Lynch, K., & Hopkins, W. D. (2011). Nonhuman primates do declare! A comparison of declarative symbol and gesture use in two children, two bonobos, and a chimpanzee. Language and Cognition, 31, 63–74.
Lyons, D. E., Young, A. G., & Keil, F. C. (2007). The hidden structure of overimitation. Proceedings of the National Academy of Sciences of the United States of America, 104, 19751–19756. https://‍
McNeill, D. (ed.) (2000). Language and Gesture. Cambridge: Cambridge University Press. https://‍
Meinel, M. (1995). Eliciting true imitation of object use in captive orangutans. Unpublished BA thesis, York University, Toronto, Canada.
Miles, H. L., Mitchell, R. W. & Harper, S. E. (1996). Simon says: the development of imitation in an enculturated orangutan. In A. E. Russon, K. A. Bard & S. T. Parker (Eds.), Reaching into Thought (pp. 278–299). Cambridge, UK: Cambridge University Press.
Mitani, J. C., Call, J., Kappeler, P. M., Palombit, R. A., & Silk, J. B. (2012). The Evolution of Primate Societies. Chicago: University of Chicago Press. https://‍
Mitchell, R. W. (1999). Deceit and concealment as strategic script violation in great apes and humans. In S. T. Parker, R. W. Mitchell & H. L. Miles (Eds.), The Mentalities of Gorillas and Orangutans: Comparative Perspectives (pp. 295–315). Cambridge, UK: Cambridge University Press. https://‍
Nelson, K. (1981). Social cognition in a script framework. In J. H. Flavell & L. Ross (Eds.), Social Cognitive Development (pp. 97–118). New York: Cambridge University Press.
Orangutan Cultures (2017). Orangutan Network, Dept. Anthropology, Universität Zurich. http://‍ (Sept. 30, 2017)
Russon, A. E. & Andrews, K. A. (2010). Orangutan pantomime: elaborating on the message. Biology Letters, 7(4), 627–630. https://‍
Russon, A. E., & Andrews, K. A. (2011). Pantomime in great apes: evidence and implications. Communicative and Integrative Biology, 4(3), 315–317. https://‍
Russon, A. E. & Begun, D. (eds.) (2004). The Evolution of Thought: Evolutionary Origins of Great Ape Intelligence. Cambridge, UK: Cambridge University Press. https://‍
Russon, A. E. & Galdikas, B. M. F. (1993). Imitation in free-ranging rehabilitant orangutans (Pongo pygmaeus). Journal of Comparative Psychology, 107, 147–161. https://‍

Russon, A. E. & Galdikas, B. M. F. (1995). Constraints on great apes’ imitation: Model and action selectivity in rehabilitant orangutan (Pongo pygmaeus) imitation. Journal of Comparative Psychology, 109(1), 5–17. https://‍
Subiaul, F., Renner, E., & Krajkowski, E. (2016). The comparative study of imitation mechanisms in non-human primates. In S. S. Obhi & E. S. Cross (Eds.), Shared Representations: Sensorimotor Foundations of Social Life (pp. 109–136). Cambridge: Cambridge University Press. https://‍
Suddendorf, T., Fletcher-Flinn, C., & Johnston, L. (1999). Pantomime and theory of mind. The Journal of Genetic Psychology, 160(1), 31–45. https://‍
Taniguchi, Y. & Sanefuji, W. (2017). The boundaries of overimitation in preschool children: Effects of target and tool use on imitation of irrelevant actions. Journal of Experimental Child Psychology, 159, 83–95. https://‍
Tanner, J. E. (2008). Commentary. Current Anthropology, 49(6), 1067–1068.
Tomasello, M., & Call, J. (1997). Primate Cognition. New York: Oxford University Press.
Vygotsky, L. (1962). Thought and Language. MIT Press. https://‍
Yando, R., Seitz, V., & Zigler, E. (1978). Imitation: A Developmental Perspective. Hillsdale NJ: Erlbaum.
Yu, Y. & Kushnir, T. (2014). Social context effects in 2- and 4-year olds’ selective versus faithful imitation. Developmental Psychology, 50(3), 922–933. https://‍

From action to spoken and signed language through gesture
Some basic developmental issues for a discussion on the evolution of the human language-ready brain

Virginia Volterra, Olga Capirci, Pasquale Rinaldi and Laura Sparaci
Institute of Cognitive Sciences and Technologies (ISTC), National Research Council (CNR) of Italy

We review major developmental evidence on the continuity from action to gesture to word and sign in human children, highlighting the important role of caregivers in the development of multimodal communication. In particular, the basic issues considered here and contributing to the current debate on the origins and development of the language-ready brain are: (1) links between early actions, gestures and words and similarities in representational strategies; (2) importance of multimodal communication and the interplay between gestures and spoken words; (3) interconnections between early actions, gestures and signs. The novelty of this report lies in connecting these themes to relevant findings from studies on children between 6 and 36 months of age and in highlighting interesting parallels with studies on ape communicative behavior.

Keywords: child development, communication, multimodality, actions, gestures, words, signs

https://‍ © 2020 John Benjamins Publishing Company

Introduction

We review major developmental evidence on the continuity from action to gesture to both word and sign, reporting how their interrelationship extends beyond early childhood and across cultures. Our aim is to show the importance of pure motor acts (e.g., grasping) in the development of symbolic communication. We propose that underscoring mechanisms involved in language acquisition (spoken or signed) and overcoming difficulties stemming from differing definitions of gestures may contribute to a more careful comparison of human vs. non-human

primate communication. The terminology presented in these studies is not only heterogeneous, but has often changed considerably over the years, reflecting parallel changes in methodology, perspectives and questions addressed. To gain a better understanding of the role of gestures in the progression from actions to words, as well as their role in the evolution of the language-ready brain, we will present evidence from gesture studies conducted in different periods, with different methodologies and influenced by diverse theoretical perspectives. For example, in 1975 we observed how in young children specific goal-oriented actions (e.g., orienting, reaching, grasping) would gradually become separated from their concrete goals (i.e., the attempt to reach a specific object) and assume symbolic functions (Bates, Camaioni & Volterra, 1975). Many years later, novel neurophysiological findings on the link between actions and language and the theory of the mirror neuron system (MNS) led to a new understanding of these data, grounding behavioral observations in neurophysiological evidence (Arbib & Rizzolatti, 1997) and leading us to rename those communicative acts ‘action/gestures’ in order to highlight their links with action schemas. These gestures, which together with the production of the first words mark the emergence of symbolic communication, maintain a continued interrelationship with speech by giving the child the opportunity to combine spoken and gestural elements (from the one-word to the two-word stage). This relation extends beyond childhood, as the integrated gesture-speech system is consolidated in adulthood (McNeill, 1992). Our findings were compatible with Corballis’ theory (2002) on the gestural origin of language, proposing that gesture and speech have co-evolved in complex interrelationships throughout their long and changing partnership.
Moreover, we noted striking similarities between early gestures of hearing children learning to speak and early signs of deaf children learning to sign (Caselli & Volterra, 1994). We will show how the representational strategies found in everyday gestures and those we have studied in sign languages are not only the same, but also grounded in basic embodied motor acts that we acquire in childhood. What emerges is not a clear-cut separation, but a continuity between co-speech gestures produced by hearing children and early signs produced by children exposed to a sign language. For a schematic outline, see Appendix #1. The model of “language” which emerges from our work is best described by stating that human communication transcends the spoken medium, often exploiting embodied forms such as signs and gestures within a multimodal approach. In an elegant and provocative paper Slobin (2008) argued that we cannot expect to find in sign languages the same linguistic categories and processes we find in some spoken languages (i.e., written English). Several studies on different sign languages have begun to consider signs as visible actions or dedicated gestures with linguistic properties (Cuxac, 2000; Liddell & Metzger, 1998). Studying the visible actions of speakers and signers leads to a revision of the traditional

dichotomy between linguistic and enacted (assumed to be “non-linguistic”) and to the development of a new approach to embodied language (Kendon, 2014). In the present paper we will describe different types of gestures and the diversity of mechanisms potentially involved, proposing that some aspects of ontogeny might suggest hypotheses for the evolution of the Road Map, in particular in relation to the link between praxic actions and spoken and signed language acquisition. There is increasing evidence that not all gestures share the same developmental process. Acknowledging that this diversity of mechanisms might also be involved in the ontogeny of nonhuman primate gestural communication could pose an important and fruitful challenge for comparative neuroprimatology.

1. From action to gesture and word

1.1 Links between early motor skills and gestures

Traditional studies on the onset of intentional communication in humans highlighted the importance of considering communication as essentially multimodal by stressing the role of ‘performatives’ prior to speech (Bates et al., 1975). The term ‘performatives’ refers essentially to actions emerging prior to speech, often accompanied by vocalizations and used by young infants to signal different types of basic intentional states. Examples of performatives included: ritualized request, showing off, showing, giving and pointing (see Figure 1). ‘Ritualized requests’ are acts performed by the child to induce specific responses in caregivers to obtain specific objects or actions (e.g., at 8 months children often raise their arms to be picked up or extend an arm towards an object while opening and closing their hand to indicate that they wish to grasp the object). Actions termed ‘showing off’ are funny or unexpected acts that induce laughter or positive reactions in caregivers and may initiate or maintain social interactions (e.g., around 9 months a child may blow raspberries, inducing laughter in caregivers).
Showing acts are usually meant to display the presence of an object/event or specific characteristics of an object/event to caregivers (e.g., by 10 months the child is able to extend her arms toward a caregiver while holding a toy and to open her hand to show it). Giving includes actions that imply the passage of an object from child to caregiver (e.g., by 13 months a child may pick up an object on the other side of the room, cross the room and drop it in the caregiver’s lap). Pointing includes deictic actions that lead others’ attention towards specific objects/events immediately present in the surrounding environment (e.g., between 12 and 13 months a child may point to a cat while looking at the adult) (Bates et al., 1975). All the performatives described above originate from basic motor actions that are already present in the toddler’s motor repertoire. They are mainly

Figure 1. Types of gestures. Labels recoverable from the figure: representational (iconic, conventional, referential, etc.); touching objects; not touching objects; not involving objects/intransitive; involving objects/transitive; ritualized request; from showing off to routines and conventional gestures (e.g., clapping hands, bye, sleep); from actions with objects to gestures without objects and pretend play (e.g., phoning, eating, combing).

attention-grabbers that gradually emerge during development to attract caregiver’s attention towards the child himself or towards objects/events. These gestures are often accompanied by vocalizations, which enhance their main scope as attention-grabbers, providing means of directing caregivers’ attention toward things and events present in the environment, while also building relevant shared experiences. The primary cognitive prerequisite for performative intentions reported in these studies is the ability of tool use, reported to emerge in the Piagetian sensorimotor stage 5 (Piaget, 1945). In other words, when they first appear on the scene of human communication, performatives don’t necessarily require reference to intentional states; rather they constitute cases in which familiar actions performed in specific contexts attract the other’s attention and induce specific reactions (Zukow-Goldring, 2012). These actions, through repetition, may lead to ontogenetic ritualization, a process of mutual anticipation in which particular social behaviors become ritualized to function as intentional communicative signals (Arbib, Ganesh, & Gasser, 2014). Whole-body actions (e.g., lifting the arms to be picked up), initially induced by specific constants in the surrounding environment (i.e., the adult is usually in a higher position with respect to the child), may be repeated up to the point that they become ritualized requests produced to elicit an adult’s response. Interestingly these types of performatives, which are already present in human infants by 8 months, have also been analyzed in apes. In particular, the emergence of gestures implying ritualized request behaviors described in bonobo mother-infant dyads preceding acts of carrying, highlights relevant similarities with human child behaviors (Halina, Rossano and Tomasello, 2013). 
However, ontogenetic ritualization in human infants as well as in non-human primates is a highly debated topic (Liebal, 2016; Marentette & Nicoladis, 2012; Tomasello, 2008). While performative behaviors such as showing off are linked to motor behaviors which involve other body parts (e.g., torso, head, face, arms, hands or legs), other performative forms (i.e., request, show, give, and point) may be traced back to infants’ early motor exploration of objects through different types of grasping. The development of this ability is linked to advances in postural control (e.g., independent sitting) and in reaching skills (for an extensive review see Sparaci & Volterra, 2017). Early performative signals can also be found among other species (see Gretscher et al., 2017, for a review). Similarly, the emergence of pointing may be traced back to early fine-motor actions (for an extensive review of pointing skills see Kita, 2003). While pointing is commonly described as a gesture in which the index finger and arm are extended in the direction of an interesting object, many other forms of pointing exist, such as pointing performed using other body parts (e.g., head and/or eye movements, lip-protruding, etc.) (Enfield, 2001). In this broader sense, pointing gestures have been identified by some authors also in

non-human apes in special circumstances (Leavens, 2004), but other authors have raised some concerns about this evidence (Tomasello, 2006). Most performative gestures described so far have later been called “deictic” as they express only a communicative intent, and their content can only be interpreted by referring to the extralinguistic context. Other performatives (e.g., “raising arms” to be picked up) have also been described in the past as “referential gestures,” as they gradually denote a semantic content which may remain relatively stable across different contexts. What about representational gestures in which the task of representing an object or an action with an object is mainly carried out by the hands? A viable hypothesis is that these gestures are grounded in early actions with objects. In fact, various studies have observed that at the same age at which infants start mastering precision grips (around 12 months), and just before spoken naming onset, short action sequences with objects begin to emerge (Caselli, 1990). These action sequences are usually related to objects’ functions (e.g., using a spoon to eat). Soon afterwards, these same action sequences may be performed in the absence of objects, while maintaining their meaning (e.g., the child could place an empty spoon in his/her mouth as if eating and subsequently reproduce the same handshape and movement used in eating with a spoon with an empty hand) (Capirci, Contaldo, Caselli & Volterra, 2005). In this sense, motor schemas and handshapes exploited by infants in grasping and functional acts may be linked to representational gestures, performed in the absence of an object and denoting a specific referent while remaining relatively stable across different contexts (Sparaci & Volterra, 2017). Summing up, considering deictic and representational gestures allows us to trace back the origins of these forms to early fine- and gross-motor skills exercised by infant-caregiver dyads within specific contexts.
1.2 Early action and gesture “vocabulary” and its relationship to word comprehension and production

The early studies described above identified different types of gestures produced by very young children. In a further longitudinal study of three children followed between 10 and 23 months of age, Capirci and colleagues (2005) found that the ‘practiced meanings’ that infants initially exercise in communicative actions with caregivers are likely to enter their communicative repertoires as representational (i.e., empty-handed) gestures and/or words. These longitudinal findings have been confirmed in a cross-sectional study of a broader sample of 492 Italian infants between 8 and 18 months of age using the Italian MB-CDI assessment tool (Caselli, Rinaldi, Stefanini & Volterra, 2012). The production of action-gestures was strongly correlated with word comprehension, probably because the meanings of these gestures are shared with caregivers who often produce

222 Virginia Volterra, Olga Capirci, Pasquale Rinaldi and Laura Sparaci

the related word before or after the child’s gesture production, reinforcing the link between actions/gestures (A/G) and words. The order of appearance of early A/G has also been analyzed considering gestures’ motor execution, outlining a developmental pattern for different gesture types and introducing the distinction between A/G that do and do not involve object manipulation (for age of A/G appearance see Volterra, Capirci, Caselli, Rinaldi & Sparaci, 2017 and Appendix #2). This distinction was introduced to underline the different origins, times of emergence and further outcomes of these gestures. Gestures not involving objects derive from what we called “showing off behaviors”, performed very early in dyadic child-caregiver interactions (Reddy, 2003), while gestures involving objects derive from object manipulation behaviors (the traditional schemes with objects described by Piaget, 1945). The age of production of actions/gestures and the comprehension/production of words with related meanings have been analyzed in Caselli et al. (2012) (see Appendix #3).

The emergence of intentional and symbolic communication relies on both modalities, and speech and gesture interact in the development of the ability to convey two pieces of information within a single communicative utterance. Various studies (Capirci et al., 2005; Capirci, Iverson, Pizzuto & Volterra, 1996; Capobianco et al., 2017; Goldin-Meadow & Butcher, 2003) have shown that, in all children, cross-modal utterances precede the emergence of two-word utterances (e.g., a child may point towards a chair while saying “mommy” to ask her mother to sit with her; or a child may perform an eating gesture and then point towards a specific food on the table in order to request being fed that food). At this stage a large portion of children’s prototypical nomination and predication structures is expressed via two-element cross-modal utterances. In particular, Capirci et al. (1996, pp. 663–664) showed how children’s supplementary combinations expressed a variety of basic semantic relations similar to those expressed by children in the early two-word stage, as shown in Appendix #4.

Further support for the idea that early communicative gestures may be used as a form of naming comes from recent studies conducted on children between two and three years of age, showing that when children expand and consolidate their spoken vocabulary, gesture production, far from declining, continues to accompany spoken words. Our research group carried out several studies on this topic using a new task assessing vocabulary comprehension and production (Bello, Giannantoni, Pettenati, Stefanini & Caselli, 2012), which showed that children, when requested to produce spoken word labels for pictures, also performed pointing and representational gestures (Stefanini, Bello, Caselli, Iverson, & Volterra, 2009). It is striking that children who are already able to name a picture in speech still often resort to gestures. Representational gestures produced were stylized and

conventionalized versions of manipulative actions (e.g., bringing a hand to the head as if combing, holding the bar of a playground merry-go-round as if spinning it). Evidence of a link between gestures and actions performed directly on objects is provided by examples in which children executed the gesture while holding the picture (e.g., combing themselves with the picture of a comb, spinning the picture of a playground merry-go-round). This suggests that words may not be fully decontextualized yet, and that gesture production may allow children to recreate the action as well as the motor context in which the word was initially acquired. Also, the need to point to the referent depicted in the picture may be understood as an attempt to participate in a communicative interaction based on a joint attention scheme.

Similar results on the production of pointing and representational gestures in the same naming task have been found in Japanese and Canadian children. Results from these studies confirm that motor representations may be needed to support linguistic representations in speech, irrespective of the cultural environment in which the child is raised, even though the rate and the way gestures are produced may be influenced by culture even from early stages in development (Marentette et al., 2016). Analyses of spontaneous gesture production in a naming task also provide empirical support for the idea that gestures and speech share a common conceptual space as well as an activation of hand-mouth motor programs associated with specific objects or actions. The symbolic strategies adopted are also indications of a strong continuity from actions to both spoken and signed language (Pettenati et al., 2012).

2. Representational techniques across elicited pantomime in children, communicative gestures and sign languages

In recent research, considerable efforts have been made to consider together representational techniques in the manual modality as described by different research traditions, for example elicited pantomimes in symbolic development studies, co-speech and silent gesture in child and adult spontaneous communication, and signs in sign language research. Crucial for any attempt to develop a unified taxonomy is to clarify whether, during gesture execution, the body/hands represent real actions in the physical world (i.e., how an action is performed or how an object is held/used) or something else (e.g., the object itself or its size/shape). Table 1 shows some of the different labels used in different studies for four main representational strategies. For example, technique (a), which involves the person’s own body, can depict the movements of an agent, in which case the gesturing body parts engage in a pattern of action that has many kinematic features in common with the action that is being referred to. The origin of this strategy can be traced back to early performatives

Table 1. Labels used for representational techniques or strategies in different traditions

| Technique | In symbolic development studies | In gesture studies on adults and children | In sign language studies | Examples |
|---|---|---|---|---|
| a | | own-body; enactment; mime/pantomime; character viewpoint; action gesture | constructed action; body classifier; person transfer | swimming: paddling the arms/hands in the air; turning: rotating the torso/swiveling |
| b | imagined-object | hand-as-hand; handling; manipulation; action gesture; character viewpoint; function gesture | constructed action; handling classifier; person transfer | fork: bringing a closed fist towards the mouth; combing: fingers wrapped around an imaginary handle of a comb |
| c | body-part-as-object | hand-as-object; observer viewpoint; modeling; action gesture; form gesture | entity/instrument classifier; situation transfer | umbrella: bringing an open hand above the head; fork: extended fingers as tines, move from table to mouth |
| d | | size-and-shape; depiction; delimitation; observer viewpoint | size/shape specifiers; tracing; form transfer | small: finger and thumb held close together; tree: tracing the form of the tree in the air |

involving the whole body (e.g., showing off) described above. In a similar way, in technique b, when actions can be represented using only the hands rather than the entire body, hands can be used to enact how an object is held or manipulated or to show motor acts or grasps associated with object use. The origin of this representational strategy (also termed ‘hand-as-hand’) can be traced back to functional actions with objects (e.g., using a fork to eat). Studies on symbolic development have called these gesture types “imagined-object gestures”. Both techniques a and b, involving one’s own body or only one’s own hand (hand-as-hand), rely on some form of enactment. However, our hands can also be used to represent an object itself, as in technique c. The origin of this representational strategy can be traced back to actions and gestures involving objects during early stages of development. Finally, with technique d, the hand can represent the size and shape of an object (i.e., tracing the contours of an object or the outcome path of a movement). This representational strategy should be kept distinct from the hand-as-object

strategy. In fact, in this case the hand is used to describe the object, but does not physically stand for the object itself. The origin of this representational technique may be found in A/G, irrespective of whether they involve objects or not. As shown in Table 1, these four techniques have been described by different disciplines using different labels.

Results from elicited pantomime studies indicated that three-year-olds, requested to label an object such as “comb” and asked to pretend to use that item, were more likely to produce hand-as-object gestures, while children older than 6 years also produced imagined-object gestures in which they pretended to hold and use an object, depicting its function (e.g., fingers wrapped around an imaginary handle of a comb) (Boyatzis & Watson, 1993).

Iverson, Capirci and Caselli (1994) provided the first detailed analysis of representational techniques present in gestures produced by 16- and 20-month-old Italian children. This study examined children’s use of communicative gestures, showing how they relied on different strategies depicting: (a) characteristics or qualities of an object (e.g., holding a hand high over the head to indicate big, similar to the size-and-shape technique); (b) actions with an object (e.g., combing with or without a comb in the hand to indicate “comb” or “combing”, similar to the hand-as-hand technique); (c) the form or movement of the object itself (e.g., flapping hands and arms for “bird”, similar to the own-body technique). Iverson et al. (1994) found that the proportion of gestures reproducing actions with objects (e.g., the function of the object) or the form or movement of the object itself tended to increase between 16 and 20 months of age. More recent studies have analyzed gesture production in older children and in different tasks (naming and narrative tasks).
Marentette and colleagues (2016) considered cross-cultural differences by comparing the spoken and gestural productions of 2-year-old children in a picture naming task, showing that Italian children growing up in a ‘high gesture culture’ produced twice as many gestures as Canadian children from a ‘low gesture culture’, but that gestures from both groups involved a similar range of the four main representational techniques discussed above. In fact, two-year-olds were as likely to produce gestures depicting function (technique b) as form (technique c). Despite cultural and linguistic differences in the frequency of use of the individual techniques, strategies for depicting information about objects and events make visible different types of embodied practices and suggest a shared cognitive basis, which is recruited by both language and gestural systems. Finally, Capirci, Cristilli, De Angelis, and Graziano (2011), in examining the development of co-speech gestures in four- to ten-year-olds’ narratives, found that the most widely used strategy for all age groups was hand-as-hand, but that older children produced more hand-as-object gestures than younger ones. These authors

suggest that co-speech gestures become more abstract as their form becomes representationally more flexible and that children aim at depicting specific aspects of objects, rather than using their hands as if acting in the physical world. By showing that use of the hand-as-object representational technique increases with age, these results are in clear opposition to the data collected using elicited pantomimes and described above, which showed that only older children tended to use the hand-as-hand strategy. Possibly these contrasting data are due to the small number of items used in tasks eliciting pantomime. In fact, as reported in studies relying on spontaneous gesture production, children tend to use all four representational techniques at younger ages, their performance reflecting responses to particular items or communicative situations, rather than indicating a limited symbolic capacity. Furthermore, it appears that the techniques chosen may depend to a greater extent than formerly recognized on properties of the object or on object affordances (Marentette et al., 2016).

In addition to studies on gestures in young children acquiring a spoken language, these representational techniques have been reported in signed languages. The sign language forms relevant here have often been called ‘classifiers’ and involve specific handshapes and orientations that function as morphemes and also indicate the semantic class to which the referent belongs by denoting some of its salient, perceived or imputed characteristics.
These forms fall into categories which parallel the four representational techniques described above in Table 1: (a) constructed actions or transfer of person (similar to own-body gestures), in which the whole body represents a character and/or his/her actions; (b) handling classifiers, in which the hands represent the manipulation of the object (similar to hand-as-hand gestures); (c) entity classifiers, in which the hands represent the object as a whole or a class of objects (similar to hand-as-object gestures); (d) size-and-shape classifiers (SASSs), in which the hands represent the size or overall shape of an object. It is only recently that sign language researchers have explicitly attempted to connect their linguistic analyses to similar analyses conducted in research on symbolic development. For example, Brentari, Di Renzo, Keane, & Volterra (2014) found that, across two cultures (American and Italian), two signed languages (Italian and American) and two spoken languages (Italian and English), signers and speakers using gestures (both adults and children) were more likely to represent agentive situations (i.e., people acting on objects) using handling strategies (i.e., hand-as-hand) rather than entity strategies (i.e., hand-as-object). Similarly, Padden, Meir, Hwang, Lepic, Seegers and Sampson (2013), considering patterned iconicity in the American Sign Language (ASL) lexicon, showed that the particular distribution of representational techniques can vary depending on context. In other words, adults seem to use different signs in ASL in naming an

object (i.e., hand-as-object strategy) or describing its use (i.e., hand-as-hand strategy). This evidence suggests that similar differences may emerge in children as well. To sum up this rather brief overview of representational strategies used in both gestures and signs, it is clear that all four representational strategies observed in studies on adult gesturing and signing are already present in the representational gestures of hearing children from high as well as low gesture cultures.

3. Similarities between gestures and signs

Initial studies on sign languages tended to focus on the discrete, arbitrary and categorical nature of signs, which made sign languages seem more like spoken languages, but thereby overlooked the pervasive iconic nature of many sign language structures as well as the similarities of signs with co-speech and silent gestures. In these early days, the priority of sign researchers was to demonstrate just how unlike “loose gesturing and pantomime” sign language really was (Kendon, 2014). Only in subsequent years did several researchers working on different signed languages begin to focus on what Cuxac (2000) has termed ‘Highly Iconic Structures’ (HIS), and begin considering signs as visible actions or dedicated gestures with linguistic properties. Cuxac (2000) claimed that all sign languages are grounded upon, and exploit, a basic capacity signers have for iconizing their perceptual/practical experience of the physical world. The iconization processes in signed languages endow them with two ways of signifying: an illustrative way, ‘telling with showing’, and a non-illustrative way, ‘telling without showing’. The operations signers perform when choosing an illustrative intent are defined by Cuxac as “Transfers”, and conceived as traces of cognitive operations whereby signers transfer their conceptualization of the real world into the four-dimensional world of signed discourse (i.e., the three dimensions of space plus the dimension of time).
These transfers are techniques for depicting information about objects, events and their relationships. These techniques are also labeled ‘constructed actions’, ‘classifiers’ and ‘productive forms’ by other researchers, as detailed in Table 1. A particularly striking feature of these highly iconic transfers is that they can be combined with each other or with one lexical unit to encode several different kinds of information about one (or more) referents simultaneously, in a manner that has no parallel in speech. Several authors, setting aside terminological differences, have found widespread use of Highly Iconic Structures in different genres of signed discourse and in different sign languages (e.g., for a comparative study on the signed languages of Italy, the United States and France, see Antinoro Pizzuto, Rossini, Sallandre, & Wilkinson, 2006). While the fundamental and pervasive semiotic dimension of Highly Iconic Structures in sign languages is hardly questionable, they are almost absent in linguistic descriptions of sign languages and often

relegated to a gestural non-linguistic status. The same fate is suffered by depictions in spoken languages that, as recently suggested by Clark (2016), although representing integral parts of everyday utterances, are absent from standard models of language processing. A structuralist/formalist approach to language assumes a representation of language as composed of discrete and listable symbols with strict boundaries and well-defined discrete categories. On the other hand, a usage-based approach, which moves beyond the linguistic vs. gestural dichotomy, leads to a more cognitive view of language in which linguistic units can exhibit variability and gradience (Occhino & Wilcox, 2017).

Some recent studies on gestures in children and adults have adopted many analytic strategies borrowed from sign language research in order to investigate the continuity between gestures and signs. For example, in a study aimed at exploring the motor characteristics of representational gestures produced by Italian hearing children (two to three years old) using the Picture Naming Game (PiNG) task, Pettenati, Stefanini and Volterra (2010) showed interesting similarities and consistencies in the manual parameters produced by individual children requested to label the same visual stimulus. Furthermore, some motor characteristics found in the production of these gestures were also found in the first signs produced by signing toddlers. Gestures and early signs were both produced using similar locations (e.g., the face/head, neutral space) and the same restricted set of six basic handshapes. In particular, the handshapes that appear in the gestures produced by Italian hearing children correspond to those described by Boyes Braem (1981) as part of Stages 1 and 2 in her model of the acquisition of handshapes in ASL.
These findings support the view that the same motoric factors are involved in the production of handshapes for both gestures and signs, and that these similarities could be largely explained by the anatomy and physiology of the hand and arm. Finally, it must be noted that the elements that we classified as gestures in spoken language do not serve the same function as signs/gestures do in sign languages, as also suggested by recent brain studies (Newman et al., 2015).

4. Toward a new road map

In the present paper we have described different types of gestures – deictic and representational – produced by human children in the early stages of language acquisition. We also presented the four basic strategies involved in representational gestures, showing how the same strategies are found not only in children’s communication, but also in sign languages. Furthermore, the ontogenetic relation we have described between praxic actions, gestural communication and signed

and spoken language development may provide relevant suggestions and hypotheses for designing a new road map, as briefly summarized in the four major points below.

1. It appears particularly fruitful to adopt a more differentiated view that considers different types of gestures (i.e., deictic and representational; involving and not involving objects) as possible building blocks of early communicative signals. In the previous road map (Arbib, 2016), performatives or deictic gestures were scarcely described although, according to data from human children, they are the first gestures to emerge. The meaning of this type of gesture is given by the extralinguistic context in which it is performed. As reported by different studies (Gretscher et al., 2017), apes can produce ritualized requests, mainly for food, as well as showing off behaviors to get attention. Apes seem to learn to use declarative pointing only in a human environment, suggesting a role for social responsiveness in the caregiver, lacking in the last common ancestor (LCA-c) as described by Arbib (this volume).

2. Another important aspect is that gestural and vocal modalities are exploited together for communication by human children starting from very early stages of development: hand and mouth synergies are present from the very beginning, including in the production of performative gestures, which are often produced together with vocalizations (Iverson & Thelen, 1999). The strong interrelation between these two modalities, gestural and vocal, is also evident in the case of the child’s first two-element combinations: basic semantic relations are expressed through a combination of deictic and symbolic elements before being expressed by two representational words or signs. Despite the differences between evolution and ontogeny, the transition from gesture-word combinations to two-word combinations could suggest a possible scenario for the transition from protolanguage to language.
In addition, it appears that children are capable of expressing complex multimodal combinations where the distinction between gestures or words representing nouns and verbs is not needed.

3. The distinction between pantomimes and protosigns could be further explored in light of results from developmental studies, especially in relation to the representational strategies described not only in children’s and adults’ gestural communication, but also in sign language descriptions, which give more attention to different kinds of ‘productive’/‘classifier’ forms. Great ape pantomimes as defined by Russon, and the examples provided by Arbib in the case of LCA-m, that is, a form of iconic gesture in which actors intentionally enact their meaning and referents and simulate engagement in activities (Arbib, this volume; Russon & Andrews, 2011), appear to be very similar to behaviors

we have described in the present paper as representational gestures adopting an “own-body” strategy. Early child communication development shows that gestures adopting an own-body strategy can become conventionalized within a community through ontogenetic ritualization. But a key distinction between ontogenetic ritualization in great apes and the transition from pantomimes to “protosigns” in humans is that the former stays within dyads while the latter can migrate across the community. Our hypothesis is that the so-called protosigns usually adopt one of the two symbolic strategies we have described above, in which the hand becomes the object itself or takes the form of the hand performing the action being referred to.

4. Given the link we have described between word comprehension and action/gesture production in infants, it is essential to consider both communicative productions in non-human primates and their comprehension of audible and visual signals in order to better understand the link between actions (transitive and intransitive) and gestural and vocal production. This relationship is a current research topic in human development, but studies investigating ‘comprehension’ of multimodal signals by non-human species are only at a preliminary stage.

Finally, studying the relationship between action, gesture, sign and speech offers a valuable tool for investigating the overarching question of how language emerges from a non-linguistic state. In doing this, the traditional dichotomy between gestures as enacted (gradient, variable, iconic) and signs/words as linguistic (categorical, invariable, arbitrary) should be replaced with a multimodal approach to the study of both spoken and signed languages.

References

Antinoro Pizzuto, E., Rossini, P., Sallandre, M. A. & Wilkinson, E. (2006). Deixis, anaphora and highly iconic structures: Cross-linguistic evidence on American (ASL), French (LSF) and Italian (LIS) signed languages. TISLR9 (pp. 475–495). Editora Arara Azul, Brazil.
Arbib, M. A., Ganesh, V. & Gasser, B. (2014). Dyadic Brain Modeling, Ontogenetic Ritualization of Gesture in Apes, and the Contributions of Primate Mirror Neuron Systems. Phil Trans Roy Soc B, 369 (1644), 20130414. https://
Arbib, M. A. (2016). Towards a Computational Comparative Neuroprimatology: Framing the language-ready brain. Physics of Life Reviews, 16, 1–54. https://
Arbib, M. A. & Rizzolatti, G. (1997). Neural expectations: A possible evolutionary path from manual skills to language. Communication and Cognition, 29, 393–424.
Bates, E., Camaioni, L. & Volterra, V. (1975). The acquisition of performatives prior to speech. Merrill-Palmer Quarterly, 21, 205–226.

Bello, A., Giannantoni, P., Pettenati, P., Stefanini, S. & Caselli, M. C. (2012). Assessing lexicon: validation and developmental data of the Picture Naming Game (PiNG), a new picture naming task for toddlers. International Journal of Language and Communication Disorders, 47, 589–602. https://
Boyatzis, C. J. & Watson, M. W. (1993). Preschool children’s symbolic representation of objects through gestures. Child Development, 64, 729–735. https://
Boyes Braem, P. (1981). Features of the handshape in American Sign Language. Unpublished doctoral dissertation, University of California, San Diego.
Brentari, D., Di Renzo, A., Keane, J. & Volterra, V. (2014). Cognitive, Cultural, and Linguistic Sources of a Handshape Distinction Expressing Agentivity. Topics in Cognitive Science, 1–29. https://
Capirci, O., Contaldo, A., Caselli, M. C. & Volterra, V. (2005). From action to language through gesture: a longitudinal perspective. Gesture, 5, 155–177. https://
Capirci, O., Cristilli, C., De Angelis, V. & Graziano, M. (2011). Learning to use gesture in narratives: developmental trends in formal and semantic gesture competence. In G. Stam & M. Ishino (Eds.), Integrating Gestures (pp. 189–200). Amsterdam/Philadelphia: John Benjamins. https://
Capirci, O., Iverson, J. M., Pizzuto, E. & Volterra, V. (1996). Gestures and words during the transition to two-word speech. Journal of Child Language, 23, 645–673. https://
Capobianco, E., Antinoro Pizzuto, E. & Devescovi, A. (2017). Gesture–speech combinations and early verbal abilities. New longitudinal data during the second year of age. Interaction Studies, 18(1), 55–76. https://
Caselli, M. C. & Volterra, V. (1994). From communication to language in hearing and deaf children. In V. Volterra & C. J. Erting (Eds.), From Gesture to Language in Hearing and Deaf Children (pp. 263–277). 2nd ed., Washington, DC: Gallaudet University Press.
Caselli, M. C. (1990). Communicative gestures and first words. In V. Volterra & C. J. Erting (Eds.), From Gesture to Language in Hearing and Deaf Children (pp. 56–67). Berlin: Springer-Verlag. https://
Caselli, M. C., Rinaldi, P., Stefanini, S. & Volterra, V. (2012). Early Action and Gesture “Vocabulary” and Its Relation With Word Comprehension and Production. Child Development, 83, 526–542. https://
Clark, H. H. (2016). Depicting as a method of communication. Psychological Review, 123(3), 324–347. https://
Corballis, M. C. (2002). From hand to mouth. The origins of language. Princeton: Princeton University Press.
Cuxac, C. (2000). La langue des signes française (LSF): les voies de l’iconicité (No. 15–16). Ophrys.
Enfield, N. J. (2001). ‘Lip-pointing’: A discussion of form and function with reference to data from Laos. Gesture, 1(2), 185–211. https://
Goldin-Meadow, S. & Butcher, C. (2003). Pointing toward two-word speech in children. In S. Kita (Ed.), Pointing: Where language, culture, and cognition meet (pp. 85–107). Mahwah, NJ: Erlbaum.
Gretscher, H., Tempelmann, S., Haun, D. B. M., Liebal, K. & Kaminski, J. (2017). Prelinguistic human infants and great apes show different communicative strategies in a triadic request situation. PLoS ONE, 12(4), e0175227. https://

Halina, M., Rossano, F. & Tomasello, M. (2013). The ontogenetic ritualization of bonobo gestures. Animal Cognition, 16(4), 653–666. https://
Iverson, J. M., Capirci, O. & Caselli, M. C. (1994). From communication to language in two modalities. Cognitive Development, 9, 23–43. https://90018-3
Iverson, J. M. & Thelen, E. (1999). Hand, mouth, and brain: The dynamic emergence of speech and gestures. Journal of Consciousness Studies, 6, 19–40.
Kendon, A. (2014). Semiotic diversity in utterance production and the concept of ‘language’. Philosophical Transactions of the Royal Society B, 369, 20130293. https://
Kita, S. (Ed.) (2003). Pointing: Where language, culture, and cognition meet. Mahwah, NJ: Erlbaum.
Leavens, D. A. (2004). Manual deixis in apes and humans. Interaction Studies, 5(3), 387–408. https://
Liddell, S. K. & Metzger, M. (1998). Gesture in sign language discourse. Journal of Pragmatics, 30, 657–697. https://00061-7
Liebal, K. (2016). The ontogeny of great ape gesture – not a simple story: comment on “Towards a computational comparative neuroprimatology: framing the language-ready brain” by M.A. Arbib. Physics of Life Reviews, 16, 85–87. https://
Marentette, P. & Nicoladis, E. (2012). Does ontogenetic ritualization explain early communicative gestures in human infants? In S. Pika & K. Liebal (Eds.), Developments in Primate Gesture Research (pp. 33–53). John Benjamins Publishing Company. https://
Marentette, P., Pettenati, P., Bello, A. & Volterra, V. (2016). Gesture and symbolic representation in Italian and English-speaking Canadian two-year-olds. Child Development, 87(3), 944–961. https://
McNeill, D. (1992). Hand and mind. What the hands reveal about thought. Chicago: University of Chicago Press.
Newman, A. J., Supalla, T., Fernandez, N., Newport, E. L. & Bavelier, D. (2015). Neural systems supporting linguistic structure, linguistic experience, and symbolic communication in sign language and gesture. Proceedings of the National Academy of Sciences of the United States of America, 112(37), 11684–89. https://
Occhino, C. & Wilcox, S. (2017). Gesture or sign? A categorization problem. Behavioral and Brain Sciences, 40, e66. https://
Padden, C. A., Meir, I., Hwang, S. O., Lepic, R., Seegers, S. & Sampson, T. (2013). Patterned iconicity in sign language lexicons. Gesture, 13(3), 287–308. https://
Pettenati, P., Stefanini, S. & Volterra, V. (2010). Motoric characteristics of representational gestures produced by young children in a naming task. Journal of Child Language, 37(4), 887–911. https://
Piaget, J. (1945). La formation du symbole chez l’enfant: Imitation, jeu et rêve, image et representation. Neuchatel; Paris: Delachaux et Niestlé.
Reddy, V. (2003). On being the object of attention: implications for self–other consciousness. Trends in Cognitive Sciences, 7(9), 397–402. https://00191-8

From action to spoken and signed language through gesture 233

Russon, A. E. & Andrews, K. A. (2011). Pantomime in great apes: evidence and implications. Communicative and Integrative Biology, 4(3), 315–317. https://‍ Pettenati, P., Sekine, K., Congestrì, E. & Volterra, V. (2012). A comparative study on representational gestures in Italian and Japanese children. Journal of Non Verbal Behaviour, 36, 149–164.  https://‍ Slobin, D. (2008). Breaking the Molds: Signed language and the nature of human language. Sign Language Studies, 8(2), 114–130.  https://‍ Sparaci, L. & Volterra, V. (2017). Hands shaping communication: From gestures to signs. In: M. Bertolaso & N. Di Stefano (Eds.), The Hand. Studies in Applied Philosophy, Epistemology and Rational Ethics, vol 38, (pp. 29–54). Springer, Cham. Stefanini, S., Bello, A., Caselli, M. C., Iverson, J. M. & Volterra, V. (2009). Co-Speech gestures in a naming task: Developmental data. Language and Cognitive Processes, 24(2), 168–189. https://‍ Tomasello, M. (2006). Why don’t apes point? In N. J. Nicholadis & S. C. Levinson (Eds.), Roots of human sociality: Culture, Cognition, and Interaction (pp. 506–524). Oxford: Berg. Tomasello, M. (2008). The origin of human communication. Cambridge, MA: MIT Press. Volterra, V., Capirci, O., Caselli, M. C., Rinaldi, P., & Sparaci, L. (2017). Developmental evidence for continuity from action to gesture to sign/word. Language, Interaction, Acquisition, 8 (1), 13–41.  https://‍ Zukow-Goldring, P. (2012). Assisted imitation: First steps in the seed model of language development. Language Sciences, 34, 569–582.  https://‍

Appendix #1 Overview of the continuity from action to gesture and sign. Overview

Actions of infants

Symbolic process

Communicative gestures

Co-speech gestures

Sign languages


Appendix #2 Repertoire of action-gestures produced by Italian infants. Data collected through the parent questionnaire (Gestures and Words, MacArthur-Bates Communicative Development Inventories (MB-CDI))

Action/Gesture

Age (in months) at which the Action/ Gesture is produced by at least 50% of children of the sample

Extends his/her arm upward to signal a wish to be picked up (*)


Drink from a cup or bottle containing liquid


Request object extending his/her hand


Play peekaboo (*)


Waves the hand for ‘bye bye’


Clap hands (*) Dance

     8      9      9



Open and close door and closet


Shows to adult what has in his/her hand





Gives to adult what has in his/her hand




Throw a ball


‘Sleep’ (leaning head on hand or pillow and closing eyes) (*)


‘Read’ (opens book, turns page)


Push toy car or truck


Points to an object or event

Shakes head or finger for ‘no’



Give slap (*)


Comb or brush own hair


Eat with a spoon or fork


Use remote control


Blow kisses



Stir with spoon


Pound with hammer or mallet


Write with a pen, pencil, or marker


Put on hat


Kiss or hug (dolls or stuffed animals)


Play musical instrument



Clean with cloth or duster


Drive car by turning steering wheel


Put key in door or lock


Sweep with broom or mop


Sniff flowers


Blow to indicate something is hot (*)


Put on a shoe or sock


Wipe face or hands with a towel or cloth


Shrugs to indicate ‘all gone’ or ‘where’d it go’


Nods head ‘yes’ (*)

    15     15

Index to cheek for something tasting good



Dig with shovel


Pour pretend liquid from one container to another


Put on a necklace, bracelet, or watch


Brush teeth


Caress (dolls or stuffed animals)


Push in stroller/buggy (dolls or stuffed animals)


Feed with spoon (dolls or stuffed animals)


Index to lips for ‘sch’ (‘be silent’)





Feed with bottle (dolls or stuffed animals)


Rock (dolls or stuffed animals)


Put to bed (dolls or stuffed animals)


Try to put shoe or sock (on dolls or stuffed animals)

> 18

Water plants

> 18

Brush/comb hair (dolls or stuffed animals)

> 18

Pat or burp (dolls or stuffed animals)

> 18

‘Type’ at a typewriter or computer keyboard

> 18

Talk to (dolls or stuffed animals)


> 18

Cover with blanket (dolls or stuffed animals)

> 18

Hold plane and make it ‘fly’

> 18

Wash dishes

> 18

Wipe face or hands (dolls or stuffed animals)

> 18

Note: * Not involving objects

Adapted from: Volterra, V., Capirci, O., Caselli, M.C., Rinaldi, P. & Sparaci, L. (2017). Developmental evidence for continuity from action to gesture to sign/word. Language, Interaction and Acquisition, 8(1), 13–41.

Appendix #3 Age of appearance of Actions/Gestures (A/G) and words with related meaning. The age of appearance is the age at which 50% of children perform the action or comprehend and produce the word.

Age of appearance (in months): age at which 50% of children do it

Actions/Gestures

Corresponding Italian word (and English translation)

Action/ Gesture

Word Comprehension

Word Production






















Shakes head or finger for “no”

No (no)

“Sleep” (leaning head on hand or pillow and closing eyes) *

Nanna (sleep)




Drink from a cup or bottle containing liquid




Play peekaboo

(peekaboo) *

Waves the hand for bye bye *

Clap hands



Ciao (Hallo; Bye bye) Bravo (good boy) Ballare (to dance)


Give slap


Blow kisses

Dare le tottò (give slap) Baciare (to kiss)

Acqua (water)


Eat with a spoon or fork






Comb or brush own hair





Throw a ball


















(Ball)

Push toy car or truck

Brum brum (sound for car)

“Read” (opens book, turns page)

Libro (book)

Use remote control

Televisione (television)

Open and close door

Apre (to open)

Note: *  not involving object manipulation

Adapted from: Caselli, M.C., Rinaldi, P., Stefanini, S., Volterra, V. (2012). Early Action and Gesture “Vocabulary” and Its Relation With Word Comprehension and Production. Child Development, 83, 526–542. doi: 10.1111/j.1467-8624.2011.01727.x

Appendix #4 Semantic relations in crossmodal and unimodal combinations Semantic relations

Crossmodal Gesture-Word combinations

Unimodal spoken two-word combinations


HI + mommy

hi + mommy


EXTEND ARM + other

more + other


ALL_GONE + food

no/all gone + water

Agent + action

POINT(to oven) + burns

mommy + puts

Action + object

POINT(to the toy to be taken away) + takes away

takes it + away


Action or entity + locations

POINT(to toy) + down

go + inside


POINT(to daddy’s cup) + daddy

mommy’s + pen

Attribute + entity

POINT(to balloon) + big

green + that

Note: Gestures are shown as English glosses in capitals. Words are given in lower-case letters.

Praxis, symbol and language
Developmental, ecological and linguistic issues

Chris Sinha

Hunan University

This article focuses on the interweaving of constructive praxis with communication in ontogenesis, in phylogenesis and in biocultural niche evolution (ecogenesis), within an EcoEvoDevoSocio framework. I begin by discussing the nature of symbolization, its evolution from communicative signaling and its elaboration into semantic systems. I distinguish between the symbol-ready and the language-ready brain, leading to a discussion of linguistic conceptualization and its dual grounding in organism and language system. There follows an outline account of the interpenetration in the human biocultural niche-complex of semiosphere and technosphere, mediated by the evolution of the niche of infancy. Symbolization (the foundation of the semiosphere) is by definition normative; the normative character of the technosphere is demonstrated by the interrelations in human development between affordance, action schema and canonical functional object schema. A model of the neuro-computational implementation of dual grounding is proposed.

Keywords: EcoEvoDevo, praxis, protolanguage, symbol grounding, niche construction, language evolution, epigenesis, infancy

1. Introduction

Controversy over language evolution centers on three issues. First, whether language (i) emerged abruptly as an autonomous cognitive system independently of the evolution of communication, vs. (ii) evolved from nonlinguistic communication and cognition; second, whether the defining attribute of language is (i) recursive syntax, vs. (ii) other features (e.g. symbolization, displacement), singly or in combination; third, whether the emergence of language was (i) relatively recent (after sapiens speciation), vs. (ii) relatively old. I refer to option (i) for these three issues, respectively, as autonomy, recursion and recency. One hypothesis favors option (i) for all three

https://‍ © 2020 John Benjamins Publishing Company


issues (Berwick and Chomsky, 2016).¹ Alternatives vary from favoring (ii) in all cases (Dediu and Levinson, 2013); to rejecting autonomy and recency but (largely) favoring recursion (Corballis, 2017); to rejecting autonomy and recursion but accepting recency (Arbib, 2012; Sinha, 2013). This article will adopt the last of these positions, proposing that the symbol-ready brain may have been present in LCA-c; and focusing on the evolutionary processes leading from early symbolic culture to Evolutionarily Modern Languages.

The interweaving of constructive praxis with language, theorized within an EcoEvoDevoSocio framework, is a key theme of this article. EcoEvoDevo (Abouheif et al., 2014; Gilbert et al., 2015) is a synergistic co-evolutionary approach implicating the dynamically unfolding, interactive “triple helix” of ecogenesis (niche construction), phylogenetic evolution and ontogenetic development; the suffix -Socio highlights the fundamentally social matrix within which this co-evolutionary process occurred in relation to language evolution. I begin by discussing the nature of symbolization, its evolution from communicative signaling and its elaboration into semantic systems. I refer throughout to empirical and theoretical work by myself and my colleagues over many years; this overview necessarily sacrifices detail to an emphasis on comprehensiveness and implications for the new Road Map.

2. From communicative signal to representational symbol

Arbib (2015) distinguishes praxic from communicative actions, classifying the latter in terms of modality (visual-enactive pantomime, visual/manual sign and vocal speech); and drawing attention to (a) the semantic compositionality and (b) the duality [signifier/signified] of linguistic signs. I wish to emphasize two equally important aspects of linguistic signs, namely their representational status and their semantic systematicity.
To clarify the former, let’s analyze the difference between communicative signaling and the communicative representation of something. Communicative signals have the effect of changing the behavior of the recipient organism in a direction that is in some way advantageous to the signaler and/or the recipient. Many, but not all, communicative signals are produced by animals in the context of dyadic interactions. Communicative signals may emerge from non-communicative behaviors through a process of ritualization, in which the expression of an emotional-motivational state, or the initiating sequences of a social behavior, become abbreviated and acquire a communicative value (Huxley, 1966; Hauser, 1996; Tomasello and Call, 1997; Sinha, 2004; Halina et al., 2013; Arbib, this volume). Non-dyadic communicative signals, such as the well-known case of vervet monkey alarm calls analyzed by Cheney and Seyfarth (1981), are emitted in “broadcast” mode. Figure 1 represents the causal chain giving rise to a communicative signal in broadcast mode, emitted by an animal in response to some event cuing the signal, and eventuating in an action by a recipient.

1. This has been called the “Prometheus hypothesis”, but could better be named the “Athena hypothesis”, after the Greek myth in which Athena sprang fully-armed from the head of Zeus.

Figure 1. A communicative signal (Environmental Event → Response1/SIGNAL → Response2/ACTION). Adapted from Sinha (2004)
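The causal chain of Figure 1 can be sketched as two independent lookups, which makes the point that follows concrete: nothing in the chain requires either party to represent the other or to attend to the triggering event. This is only an illustrative sketch; the event names, call labels and actions are hypothetical placeholders, not data from Cheney and Seyfarth (1981).

```python
from typing import Dict

# Hypothetical event-to-signal table for the signaler (labels are illustrative only).
SIGNAL_FOR_EVENT: Dict[str, str] = {
    "eagle_overhead": "alarm_call_A",
    "leopard_nearby": "alarm_call_B",
}

# Hypothetical signal-to-action table for the recipient.
ACTION_FOR_SIGNAL: Dict[str, str] = {
    "alarm_call_A": "look_up_and_take_cover",
    "alarm_call_B": "climb_tree",
}


def emit_signal(event: str) -> str:
    """Response1/SIGNAL: triggered by the event alone; the recipient is not modeled."""
    return SIGNAL_FOR_EVENT[event]


def act_on_signal(signal: str) -> str:
    """Response2/ACTION: triggered by the signal alone; event and signaler are not inspected."""
    return ACTION_FOR_SIGNAL[signal]


def broadcast_chain(event: str) -> str:
    """Compose the two responses: environmental event -> signal -> action."""
    return act_on_signal(emit_signal(event))


if __name__ == "__main__":
    print(broadcast_chain("eagle_overhead"))
```

Neither function takes the other participant as an argument, which is the sense in which such a signal can succeed without intentionality or mutual attention.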

Note that neither intentionality, nor mutual attentional orientation by signaler and recipient either to each other or to the triggering environmental event, is necessary for the success of communicative signaling. In other words, signals may be “functionally referential” (Slocombe and Zuberbühler, 2005; Liebal and Oña, this volume) even though they do not fulfil the criteria for true symbolic reference.

The centrality of representation to symbolization, and the difference between this and signaling, was stressed by Karl Bühler (1990 [1934]), on whose “Organon model” of language Figure 2 is based. Here, the symbol is understood by both speaker and hearer to have a representational (“standing for”) relationship to the situation (object, event etc.) to which it refers (Sinha, 1988). It thus presupposes both the intentional orientation of each participant to the other, and their mutual orientation to the referential situation.

Figure 2. Symbolic communication. A modified version of Bühler’s model of language (broken lines represent joint attention). Diagram labels: Referential Situation; Symbol Expression. Reproduced from Sinha (2004: 225)

Sinha (2004: 225–226) hypothesizes the following evolutionary scenario for the phylogenetic emergence of symbolic from signal communication:

1. The receiver comes to pay attention to the sender as the source of communicative signals.
2. The sender comes to pay attention to the receiver as a recipient of communicative signals.
3. The receiver comes to pay attention to the evidential reliability of the sender’s communicative signals as a source of information, by checking what the sender is paying attention to, or doing.
4. The sender comes to pay attention to the receiver’s readiness to reliably act upon the information communicated, by paying attention to what the receiver is paying attention to, or doing.²

The model depicted in Figure 2 is modality-independent, and does not imply a commitment to either “speech first” or “gesture first” evolutionary pathways. It characterizes the symbolic logic of both proto-sign and protolanguage, whose emergence depended upon both a symbol-ready brain and a symbolic niche favoring the conventionalization of symbols. Ape language training experiments suggest that the symbol-ready brain may already have been present in LCA-c.
However, although individual, human-acculturated non-human primates have been shown to have a symbolic capacity (implying in these species a symbol-ready brain), there is no reliable evidence of widespread, entrenched symbolic practices in any non-human species. Apes have cultures (Whiten et al., 1999), and they communicate, but symbols play virtually no role in either culture or communication: chimpanzee and bonobo cultures are overwhelmingly signal-based, not symbolic. This forces the conclusion that the symbolic niche (or proto-semiosphere: Lotman, 1990), was an achievement of the hominin line.

2. Arbib (pc) suggests that the absence/presence of communicative intentionality and/or awareness of the attention of the recipient may be an evolutionary marker of the LCA-m to LCA-c transition.


We have no record of ancestral symbolic communication in vocal or gestural modalities, and we cannot observe in nature what a proto-semiosphere looks like, nor what its other accompanying communicative and praxic features are. The question then is: What material (fossil or artefactual) evidence should we accept as a proxy for the existence of a symbolic niche? Ideally, this question should be approached by setting agreed and consistent criteria, rather than starting with fragments of evidence and deducing the presence or absence of symbolization from them (Stout, this volume). Unfortunately, the evidence is so sparse that it is difficult to avoid the latter.

There is evidence of the engraving by late Homo erectus of geometric designs on shells, around half a million years ago (Joordens et al., 2014). These designs are not definitive proxies for the symbolic niche. However, they seem to implicate a significant convergence of praxis and communication: the cognitive ability to project an imagined figure onto a material substrate, and the motor skills to execute it. It is not far-fetched, then, to attribute symbolic practices to this hominin culture, and even if the evidence does not sustain a definite conclusion that these practices were conventionalized into a proto-semiosphere (including protolanguage), it is consistent with such a conclusion.

Supplementary material. Geometrically engraved shells produced by H. erectus dated at 0.54–0.43 mya. Original caption: The geometric pattern on Pseudodon DUB1006-fL. a, Overview. b, Schematic representation. c, Detail of main engraving area. d, Detail of posterior engravings. Scale bars, 1 cm in a and c; 1 mm in d. Reprinted by permission from Springer Nature. Nature doi:10.1038/nature13962. Homo erectus at Trinil on Java used shells for tool production and engraving. Josephine C. A. Joordens, Francesco d’Errico, Frank P. Wesselingh, Stephen Munro, John de Vos et al. © 2014

3. From symbol to system: The emergence of language

From protolanguage to Evolutionarily Modern Language (EML), and from the symbol-ready brain to the language-ready brain, is as great a step as that from signal to symbol. Proposed neurocognitive pre-requisites for this evolutionary step include enhanced mental time-travel permitting semantic displacement (Corballis, 2017, this volume), with a concomitant elaboration of episodic and working memory (Putt, this volume). But does this sufficiently explain the emergence of languages “with an open lexicon, a grammar supporting a rich compositional semantics, and a phonology” (Arbib this volume)? I suggest that more is at stake, and that we can profitably focus, from a cognitive-functional linguistic perspective, on the elaboration of lexical and constructional resources enabling language users to differentially construe referential situations. Differential construal can be illustrated by Figure-Ground alternations in linguistic conceptualization of identical referential situations, for example “the cup is on the saucer” vs. “the saucer is under the cup”. The step from protolanguage to EML can then be understood as a transition from “basic” symbolic reference, in which symbols stand for objects, events and participants in a more-or-less one-to-one fashion (i.e. as symbolic “labels”); to complex linguistic conceptualization, in which sense (Sinn) is distinguishable from reference (Bedeutung) (Frege, 1892).
So, for example, the same individual may be referred to, from different conceptual perspectives, as “X’s mother” and “Y’s daughter”: the conceptual schematization (sense) varies with the perspectival construal, but the reference remains constant. Construals, then, are perspectival, relational schemas organized as Figure/Ground (or Trajector/Landmark: Langacker, 1987) relations, within an interactively co-constructed common ground (Clark, 1996) or Universe of Discourse (Figure 3).

Figure 3. Linguistic conceptualization as symbolic construal. Diagram labels: Universe of Discourse; Ground (Landmark); Referential Situation; Linguistic Expression; Speaker. Reproduced from Sinha (2004: 231)

The step into flexible construal, and with it EML, required the elaboration of symbolization into semantic systematicity. Languages stabilize meaning in terms of two complementary sets of relationships, each of which contributes to the sense of the linguistic sign. The first, already present in protolanguage, is the relationship between the symbol and the conceptual category that it conventionally stands for (denotation). The second consists in the relationships that different signs contract with each other, otherwise known as semantic value (Saussure, 1966). Semantic value is classically understood as involving two kinds of relationship. The first relationship is paradigmatic: two signs are in paradigmatic opposition if one can be substituted for another in an utterance context (e.g. boy vs. girl). The second relationship is syntagmatic: two signs are in syntagmatic combination if their linear co-occurrence in an utterance is legal. In modern linguistic parlance, we can say that the syntagmatic dimension is that of linguistic construction.³

The two aspects of sense, denotation and semantic value, are the semiotic reflections of the dual grounding of language (Sinha, 1999): embodied grounding (of denotation) in the human organism (brain, body and action) in its ecological context; and discursive grounding (of semantic value) in the structuring and usage of linguistic signs. This dynamic inter-relation between the two different aspects of sense and grounding is depicted in Figure 4.

3. In cognitive and construction grammars linguistic signs can exist at different levels of linguistic organization. This is sometimes interpreted to mean that there is no hard and fast boundary between the lexical and the constructional levels, since (for example) a construction can be inserted into another construction. However, I would argue that there remains a psychologically real, if sometimes fluid, distinction between the lexical semantic network and the syntagmatic combinations of the lexical items.



Figure 4. Sign, sense and the dual grounding of meaning. Diagram labels: SENSE1 Denotation; SENSE2 Semantic Value.
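The two components of sense can be illustrated with a toy data structure. The sketch below is not the author's model: the four-word lexicon, the category labels and the single noun-verb construction are all hypothetical, chosen only to show denotation (sign to category), paradigmatic opposition (substitutability in a slot) and syntagmatic combination (legal linear co-occurrence) side by side.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical toy lexicon: sign -> (slot it can fill, conceptual category it denotes).
LEXICON: Dict[str, Tuple[str, str]] = {
    "boy": ("noun", "YOUNG_MALE_HUMAN"),
    "girl": ("noun", "YOUNG_FEMALE_HUMAN"),
    "runs": ("verb", "RUNNING"),
    "sleeps": ("verb", "SLEEPING"),
}

# Hypothetical inventory of legal linear slot sequences (a stand-in for constructions).
CONSTRUCTIONS: Set[Tuple[str, ...]] = {("noun", "verb")}


def denotation(sign: str) -> str:
    """Denotation: the conceptual category the sign conventionally stands for."""
    return LEXICON[sign][1]


def paradigmatic_alternatives(sign: str) -> List[str]:
    """Signs that can be substituted for `sign` in the same utterance slot."""
    slot = LEXICON[sign][0]
    return sorted(
        other
        for other, (other_slot, _) in LEXICON.items()
        if other_slot == slot and other != sign
    )


def is_syntagmatic(signs: Sequence[str]) -> bool:
    """True if the linear co-occurrence of the signs matches a legal construction."""
    return tuple(LEXICON[s][0] for s in signs) in CONSTRUCTIONS


if __name__ == "__main__":
    print(denotation("boy"))
    print(paradigmatic_alternatives("boy"))
    print(is_syntagmatic(["girl", "runs"]), is_syntagmatic(["runs", "girl"]))
```

In this sketch denotation is a property of a single entry, whereas the other two functions can only be computed over the whole inventory, which is one way of picturing why semantic value is grounded in the sign system and not in the individual sign.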

The emergence of true linguistic conceptualization as depicted in Figures 3 and 4 required both a language-ready brain (involving significant augmentation of the symbol-ready brain – see Section 5); and a symbolic niche capable of sustaining multiple Universes of Discourse, including kinship, ritual, mythic narrative and other intangible socio-cultural domains necessitating differential perspectival construal. This niche is the semiosphere that permits the construction by human groups of true symbolic cultures.

4. Niche construction: Meaning, materiality and human development

Both the language-ready brain, and the semiosphere that unlocked it, were products of a species-unique instance of niche construction (Odling-Smee and Laland, 2009) in which language itself is a biocultural niche (Sinha, 2009). The evolution of the semiosphere was, and is, processually and developmentally interdependent with what I call the technosphere. This is the niche of praxic action, and of the material artefactual supports for praxis and for learning through social interaction. The human biocultural complex is characterized by a unique convergence and material interpenetration of semiosphere and technosphere, mediated by human ontogenetic development (Sinha, 2015a, b). Organisms develop, and the human organism does so in a context that is saturated with interaction and communication. An ecological approach to human development (Bronfenbrenner, 1979) implies that infancy should be conceptualized not only as a stage in ontogenetic development, but also as a developing niche supporting mutual, intersubjective, emotional, communicative and cognitive engagement. My argument over many years (e.g. Sinha, 1988) has been that the niche of infancy co-evolved with human language and cultural transmission. It was in the species-unique communicative and interactive niche of infancy and early childhood that the evolutionary interaction between learning-in-the-semiosphere and learning-in-the-technosphere played itself out, culminating in the language-ready brain.

A key factor in the co-evolution of communicative and praxic action was the enhanced capacity for the micro-level temporal co-ordination and sequencing of both communicative and praxic action, on a time scale that Enfield (2013) terms enchrony (see also Steffensen and Pedersen, 2016). Enchrony characterizes conversational temporality, but is also implicated in both musical and narrative production, and is a fundamental property of infant as well as adult communicative meaning-making long before language (Trevarthen, 2008; Seifert, this volume). Enchronic co-ordination and sequencing are also involved in the planning and execution of component actions whose combination makes up uniquely human complex praxic actions, including imitative actions (Russon, this volume).

5. The ontogenesis of praxic action, imitation and language: Beyond affordance

Fagg and Arbib (1998) propose a model of how the primate brain extracts affordances of objects to select hand-shapes in grasping, thereby constructing an inventory of affordances and their related actions. This is a basic requirement for the development of imitation.
However, affordance-based action-object relational schemas are insufficient to account for human action and cognition. Learning the socially conventional, canonical functions of artifactual objects is also of fundamental importance in human cognitive development. Such knowledge is manifest in action from infancy (including both action imitation, and actions in response to linguistic instructions). Eighteen-month-old English-acquiring infants were often unable to imitate the placement of a small cube on the bottom of an inverted cup (Sinha, 1982). When presented with a flat-bottomed inverted cup (which afforded a support relationship), and shown the action of placing a small cube on the supporting surface, about 25% of the children tested preferred to turn the cup to its canonical, upright orientation and place the cube inside it, a more complex motor action, and one that ignores the immediate affordance in favor of the instantiation of the canonical function. No children showed the opposite response pattern of erroneously turning an upright cup upside-down and placing the cube inside it. When required to change the orientation of the cup to make the imitated cube placement, significantly more children were successful when the end-state was a containment configuration than when it was a support configuration. These children seemed to be locked into a normative apprehension of the cup as a canonical container, which even over-rode the “brute” affordance of the flat surface of the bottom of the inverted cup. Even after this response strategy disappeared in action imitation task performance, it continued to be manifest in language comprehension tasks: for example about 27% of two-year-olds, when asked to place a block “on” an inverted cup, turned it to the upright position and placed the block inside it. These data can be interpreted as showing that objects are cognitively apprehended by infants, from an early age, in terms of their socially-imposed, normative and canonical function.

Further evidence for the role of joint action in establishing canonical object concepts comes from the experimental design used by Freeman et al. (1981), where the object was functionally “ambiguous”, consisting of a set of stacking/nesting cubes. The child was invited by the experimenter to play with the entire set of cubes, and the experimenter set up this pre-test game as either a nesting or a stacking activity. After successfully completing, as joint action, an activity of constructing either a nest of cubes, or a tower of stacked cubes, the experimenter extracted a medium-size cube and a small cube, and conducted either an action imitation task involving the placement of the smaller cube on top of/inside/under the larger cube, or an acting-out language comprehension task with instructions to place the smaller cube “in”, “on” or “under” the larger cube. The results were dramatic.
After playing a nesting game, children’s error patterns showed a response bias similar to that manifested in the same task using cups. In other words, there was a response preference for placing the small cube inside the larger cube. However, this effect was abolished in the stacking condition, in which there was a tendency to preferentially place the smaller cube on top of the larger cube. The stacking/nesting cubes experiment shows that the structuring of the child’s participation in joint action, as much as (and indeed more than) the affordances of the object “in itself”, enables the child, in a process of “guided reinvention” (Lock, 1980), to achieve a co-ordination of action, object representation and language, initially in terms of canonical function. The human developmental mastery of complex praxic action, guided either by imitation of observed behavior or by linguistic representations of required end-states, is both rooted in socioculturally acquired knowledge of the canonical functions of artifacts (Sinha and Jensen de López, 2000; Jensen de López, 2003); and dependent on the ability to “disengage” both from immediate affordance, and from canonical function, in order to re-organize perception-action relations via the “fractionation” of canonical functional schemas.


We have very limited evidence concerning knowledge of canonical object function (or its absence) in non-human primates. Acculturated chimpanzees have learned to use human artifacts according to their canonical functions, but there has been little controlled investigation of the relationship of their object function knowledge to their praxic abilities. Alenka Hribar (MPI-EVA; unpub. pc) conducted a pilot of the “cups” action imitation experiment described above with an adult male non-acculturated chimpanzee, Alex. Inasmuch as Alex produced on-task responses, there was no indication of a canonicality effect, and there was some indication that his responses were guided by immediate affordance. Myowa (this volume) reports, however, that “[chimpanzees] would continue to put a small ball into [a] bowl, even though they observed the actions for covering the ball with the bowl … they seemed to fail to account for context and showed inflexible adherence to the end states (i.e., goal) of the action based on their past experiences with the objects.” This may, then, have been due to response perseveration rather than the learning of canonical cultural conventions. Normativity (not just frequency) is crucial to the understanding of canonical object function, and is one important aspect linking the synergistic evolution of semiosphere and technosphere with the neuro-computational co-ordination of communicative and praxic action.

6. Toward a new road map

Neuro-computational subsystems and their integration

The basic hypothesis is that two neuro-computational subsystems can be distinguished: one organizing the sequential ordering and timing of actions and motor images (both praxic and communicative), involving sequencing, enchrony and (perhaps) recursion; and the other organizing object, inter-object and action-object relational schemas. The first of these subsystems is based in the praxic action co-ordination system, which has been evolutionarily co-opted for implementing both syntagmatic construction grammars and articulatory planning and execution. The second is based in the system for generating pre-conceptual image schemas (Johnson, 1987; Lakoff, 1987), which has been evolutionarily co-opted for implementing the lexical semantic network (based on paradigmatic and hierarchical systemic relations). Figure 5 elaborates this model, in which the dual grounding depicted in Figure 4 is situated in the structural expansion of the two hypothesized subsystems.

[Figure 5. Linguistic conceptualization: Neuro-computational integration of Dual Grounding in the language-ready brain. Diagram labels: Semantic Network; SENSE2 Semantic Value; Action Sequence Generation; Image Schema Generation.]

The integration of the subsystems occurs at two levels. First, at the organismic level (embodied grounding of SENSE1), the sensori-motor action-perception system integrates and re-organizes action and image in terms of inter-object and action-object schemas, building on a canonical-functional core. Fagg & Arbib (1998) propose that in macaques the direct relation between affordances and grasps is mediated by a dorsal stream, whereas a ventral stream with prefrontal cortex chooses between affordances in a task-dependent way. I hypothesize that the integrated modern human pre-conceptual system is governed by an action-perception system with the capacity of reintegrating fractionated canonical object function schemas into imagined initial state → end state action plans (Sinha, 1982). The first subsystem is assumed to be neuro-anatomically configured in the dorsal stream, and the second in the ventral stream and (pre)frontal cortex (Arbib this volume; Wilson and Petkov this volume). I further hypothesize that this

cortical integration system was an evolutionary innovation of the hominin line, and that the co-optation of the two subsystems for language is the fundamental basis of the language-ready brain, beginning at the latest with H. sapiens speciation. In less formal terms, I am reiterating an ancient intuition: that human praxic creativity, imagination and language are closely related. Second, at the inter-organismic and intersubjective level (interactional discursive grounding of SENSE2), participation in systematically governed linguistic practices integrates the neuro-computational representations of the semantic network with the construction grammar inventory and the phonological inventory. This sign-level integration is accompanied by the elaboration of paradigmatic and syntagmatic relations as conceptual structure, including complex, fractionated and flexible event structure representations (Gärdenfors, 2014; Pustejovsky this volume). In sum, dual grounding: (a) constitutes the EvoDevoSocio dynamic underlying the evolution of the language-ready brain; (b) secures the “binding” of individual ontogenetic processes with socio-cultural and symbolic transmission and innovation; (c) potentiates the social process of grammaticalization of proto-semiosphere into language as a multi-level system of mapping from meaning to expression; (d) is the context of the evolution of the capacity for individual language acquisition and development.

EcoEvoDevo-Socio: A synoptic view

It remains a common, but misleading, assumption that symbolization and language are consequences of evolutionary processes impacting on the brain, with no causal evolutionary role being accorded to them. I have argued, on the contrary, that the evolving symbolic niche, or semiosphere, was in fact one of the motors of the co-evolution of brain and language. To the notion of the language-ready brain we must add the notion of a symbol-ready brain that may have been possessed by LCA-c. However, the evidence suggests that while all hominids have evolved signal-based cultures, cultures based on a symbolic niche or (proto-)semiosphere evolved only in the hominin line; this was probably already the case for H. erectus. An important co-evolutionary link between praxic action and communicative action in this phase of the Road Map would have been the regulation of both kinds of actions by socially shared norms and conventions, which are necessary both for the emergence of symbolic repertoires and for the understanding and creation of artifacts governed by canonical functions. The biocultural dynamic of human evolution, from this point onwards, was driven not only by organism-niche co-evolution, but also by triadic niche-organism-niche co-evolution, in which at least three components of the human biocultural complex co-evolved synergistically:

252 Chris Sinha

a. the niche of infancy and childhood, including learning and teaching in the niche;
b. the niche of praxis and its products (the technosphere); and
c. the niche of communication and its mediating signs (the semiosphere).

This process eventuated in the emergence, at the latest in H. sapiens, of the language-ready brain. Just as the symbol-ready brain did not immediately unlock symbolic communication, the language-ready brain did not immediately unlock Evolutionary Modern Languages. Rather, we can hypothesize a continuity of protolanguage types (not necessarily tokens, or languages) from H. erectus through early H. sapiens. What finally unlocked the grammaticalization process that led to Evolutionary Modern Languages (Heine and Kuteva, 2007) was the elaboration of true symbolic cultures, with their demands for perspectivally-based flexible construal, which are the hallmark of what archaeologists have called “behavioral modernity” (Henshilwood and Dubreuil, 2011).

Genetics and epigenetics of the language-ready brain

It is possible, but unlikely, that the language-ready brain was a result of language-specific genetic modifications unique to modern H. sapiens. More likely, language-readiness was a feature of other late hominin species as well as our own. I have argued elsewhere (Sinha, 2004) that epigenetic developmental processes were crucial in the evolution of the niche of infancy, and therefore to human language-readiness. I have also pointed to evidence that the augmented developmental plasticity of our species was not shared to the same degree by any other (extinct) hominin species, but was part of (or even subsequent to) modern human speciation (Sinha, 2015b). We should not exclude the possibility of post-speciation genetic modifications (probably through epigenetically governed genetic assimilation) that further enhanced the adaptation of the modern human brain to language acquisition and development.

Acknowledgements

The author thanks Michael Arbib, the reviewers and the other workshop participants.

Funding

This article is the fruit of a workshop funded by NSF Grant No. BCS-1343544 “INSPIRE Track 1: Action, Vision and Language, and their Brain Mechanisms in Evolutionary Relationship,” (M.A. Arbib, Principal Investigator). The research reported here and the author’s attendance were funded by Hunan University.

References

Abouheif, E., Favé, M.-J., Ibarrarán-Viniegra, A. S., Lesoway, M. P., Rafiqi, A. M., Rajakumar, R. (2014). Eco-evo-devo: the time has come. In Landry, C. R. and Aubin-Horth, N. (Eds.) Ecological Genomics: Ecology and the Evolution of Genes and Genomes. New York: Springer, 107–25. https://
Arbib, M. (2012). How The Brain Got Language: The Mirror System Hypothesis. Oxford: Oxford University Press. https://
Arbib, M. A. (2015). Towards a computational comparative neuroprimatology: framing the language-ready brain. Physics of Life Reviews. https://
Berwick, R. C. and Chomsky, N. (2016). Why Only Us: Language and Evolution. Cambridge, MA: MIT Press.
Bühler, K. (1990 [1934]). Theory of Language: The Representational Function of Language. Amsterdam: John Benjamins. https://
Bronfenbrenner, U. (1979). The Ecology of Human Development. Cambridge, MA: Harvard University Press.
Cheney, D. L. & Seyfarth, R. M. (1981). Selective forces affecting the predator alarm calls of vervet monkeys. Behaviour 76, 25–61. https://
Clark, H. (1996). Using Language. Cambridge: Cambridge University Press. https://
Corballis, M. (2017). Language evolution: A Changing Perspective. Trends in Cognitive Sciences 21, 4. https://
Dediu, D. and Levinson, S. C. (2013). On the antiquity of language: the reinterpretation of Neandertal linguistic capacities and its consequences. Frontiers in Psychology 4, 397. https://
Enfield, N. (2013). Relationship Thinking: Agency, Enchrony, and Human Sociality. New York: Oxford University Press. https://
Fagg, A. H. and Arbib, M. A. (1998). Modelling parietal-premotor interactions in primate control of grasping. Neural Networks 11, 1277–1303. https://00047-1
Freeman, N., Sinha, C. and Condliff, S. (1981). Confrontation and collaboration with young children in language comprehension tasks. In W. P. Robinson (ed.) Communication in Development, 63–88. London: Academic Press. https://
Frege, G. (1892). Über Sinn und Bedeutung. Zeitschrift für Philosophie und Philosophische Kritik 100, 25–50.
Gärdenfors, P. (2014). The Geometry of Meaning: Semantics based on conceptual spaces. Cambridge, MA: MIT Press.
Gilbert, S. F., Bosch, T. and Ledón-Rettig, C. (2015). Eco-Evo-Devo: developmental symbiosis and developmental plasticity as evolutionary agents. Nature Reviews Genetics 16, 611–622. https://
Halina, M., Rossano, F. and Tomasello, M. (2013). The ontogenetic ritualization of bonobo gestures. Animal Cognition 16, 653–666. https://
Hauser, M. D. (1996). The Evolution of Communication. Cambridge, MA: MIT Press.
Heine, B. and Kuteva, T. (2007). The Genesis of Grammar: A reconstruction. Oxford: Oxford University Press.
Henshilwood, C. S. and Dubreuil, B. (2011). The Still Bay and Howieson’s Poort 77–59 ka: Symbolic material culture and the evolution of the mind during the African Middle Stone Age. Current Anthropology 52, 361–402. https://
Huxley, J. (1966). A discussion of ritualization of behaviour in animals and man. Phil. Trans. Roy. Soc. London 251, 273–284.
Jensen de López, K. (2003). Baskets and Body-Parts: a cross-cultural and cross-linguistic investigation of children’s development of spatial cognition and language. PhD dissertation, University of Aarhus.
Johnson, M. (1987). The Body in the Mind: The bodily basis of meaning, imagination and reason. Chicago: University of Chicago Press.
Joordens, J. C., d’Errico, F., [18 others] & Roebroeks, W. (2014). Homo erectus at Trinil on Java used shells for tool production and engraving. Nature. https://
Lakoff, G. (1987). Women, Fire and Dangerous Things: What categories tell us about the mind. Chicago: University of Chicago Press. https://
Langacker, R. W. (1987). Foundations of Cognitive Grammar Vol. 1, Theoretical Prerequisites. Stanford: Stanford University Press.
Lock, A. (1980). The Guided Reinvention of Language. London: Academic Press.
Lotman, Y. (1990). Universe of the mind, in A Semiotic Theory of Culture, transl. A. Shukman. New York: I. B. Tauris and Co. Ltd.
Odling-Smee, F. J. and Laland, K. N. (2009). Cultural niche-construction: evolution’s cradle of language. In R. Botha and C. Knight (eds.) The Prehistory of Language. Oxford: Oxford University Press, 99–112. https://
Saussure, F. de. (1966). Cours de Linguistique Générale. New York: McGraw-Hill.
Sinha, C. (1982). Representational development and the structure of action. In G. Butterworth and P. Light (eds.) Social Cognition: Studies in the Development of Understanding. Brighton: Harvester, pp. 137–162.
Sinha, C. (1988). Language and Representation: A Socio-Naturalistic Approach to Human Development. Hemel Hempstead: Harvester-Wheatsheaf & New York: New York University Press.
Sinha, C. (1999). Grounding, mapping and acts of meaning. In T. Janssen and G. Redeker (eds.) Cognitive Linguistics: Foundations, Scope and Methodology. Berlin: Mouton de Gruyter, pp. 223–255. https://
Sinha, C. (2004). The Evolution of Language: From Signals to Symbols to System. In D. Kimbrough Oller and Ulrike Griebel (eds.) Evolution of Communication Systems: A Comparative Approach. Vienna Series in Theoretical Biology. Cambridge, MA: MIT Press, pp. 217–235.
Sinha, C. (2009). Language as a biocultural niche and social institution. In Vyvyan Evans and Stéphanie Pourcel (eds.) New Directions in Cognitive Linguistics. Amsterdam: John Benjamins, pp. 289–310. https://
Sinha, C. (2013). Niche construction, too, unifies praxis and symbolization. Commentary on Michael Arbib: How the brain got language. Language and Cognition 5, 261–271. https://
Sinha, C. (2015a). Language and other artifacts: socio-cultural dynamics of niche construction. Frontiers in Psychology (Cognitive Science) 6, 1601. https://
Sinha, C. (2015b). Ontogenesis, semiosis and the epigenetic dynamics of biocultural niche construction. Cognitive Development 36, 202–209. https://
Sinha, C. and Jensen de López, K. (2000). Language, culture and the embodiment of spatial cognition. Cognitive Linguistics 11, 17–41.
Slocombe, K. K. E. and Zuberbühler, K. (2005). Functionally referential communication in a chimpanzee. Current Biology 15, 1179–1184. https://
Steffensen, S. V. and Pedersen, S. B. (2016). Temporal dynamics in human interaction. Cybernetics and Human Knowing 21, 80–97.
Tomasello, M. & Call, J. (1997). Primate Cognition. New York: Oxford University Press.
Trevarthen, C. (2008). The musical art of infant conversation: Narrating in the time of sympathetic experience, without rational interpretation, before words. Musicae Scientiae 12 (Suppl. 1), 15–46. https://
Whiten, A., Goodall, J., McGrew, W. C., Nishida, T., Reynolds, V., Sugiyama, Y., Tutin, C. E. G., Wrangham, R. W., Boesch, C. (1999). Cultures and chimpanzees. Nature 399, 682–685. https://

Action, Tool Making and Language

Archaeology and the evolutionary neuroscience of language: The technological pedagogy hypothesis

Dietrich Stout
Emory University

Comparative approaches to language evolution are essential but cannot by themselves resolve the timing and context of evolutionary events since the last common ancestor with chimpanzees. Archaeology can help to fill this gap, but only if properly integrated with evolutionary theory and the ethnographic, ethological, and experimental analogies required to reconstruct the broader social, behavioral, and neurocognitive implications of ancient artifacts. The current contribution elaborates a technological pedagogy hypothesis of language origins by developing the concept of an evolving human technological niche and applying it to investigate two key transitions posited by Arbib’s Mirror System Hypothesis: (1) from complex action recognition and imitation to proto-language, and (2) from proto-language to language.

1. Introduction

Empirical evidence regarding the evolution of the human capacity for language is generally sought through comparative study of extant species. This is essential for identifying recurring evolutionary cause-effect relationships (e.g. Isler & Van Schaik, 2014) and can help reconstruct the characters of shared ancestors (e.g. Arbib, 2016). However, it cannot by itself reveal the timing and context of evolutionary events since the last common ancestor with chimpanzees. Stout and Hecht (2017) thus propose an integrated “evolutionary neuroscience” approach that evaluates comparative evidence of brain and behavioral variation in light of (a) evolutionary and developmental processes, (b) primary archaeological and paleontological evidence of timing and context, and (c) the ethnographic, ethological, and experimental analogies needed to interpret this primary evidence. The current contribution focuses on experimental studies of stone-tool making, including especially structural and functional neuroimaging, and their contribution to

https://‍ © 2020 John Benjamins Publishing Company

reconstructing evolutionary interactions between Paleolithic skill learning and language capacities. The specific aim is to provide added behavioral/contextual detail and chronological constraint for two key transitions described in Arbib’s (2012, 2016) Mirror System Hypothesis (MSH) of language origins: the move from complex action recognition and imitation to proto-language, and subsequently from proto-language to language. Comparative (Isler & Van Schaik, 2014), paleontological (Antón, Potts, & Aiello, 2014), and archaeological (Stout, 2011) evidence converge to suggest that the broader evolutionary context for language evolution was the emergence of a human technological niche increasingly reliant on cooperation, sharing, and the intergenerational reproduction of complex subsistence skills (Stout & Hecht, 2017; Stout & Khreisheh, 2015). We describe this niche as “technological” not to promote a narrow focus on tools, but rather to exploit theoretical connections made by the broader anthropological recognition of technology as simultaneously material, economic, social, and cultural (Ingold, 1997). The critical linkage explored here is from the durable artifacts preserved in the archaeological record, to the dynamic bodily skills that produced and mobilized them, and ultimately to the biological traits, social structures, and cultural understandings that both enabled and arose from these patterned technological practices (Roepstorff, Niewöhner, & Beck, 2010). Evolutionarily, these complex interactions are addressed within what has been termed an “Extended Evolutionary Synthesis” that emphasizes the constructive nature of developmental processes, the potential for reciprocal causation between organism and environment, and the importance of non-genetic inheritance (Laland et al., 2015).

2. The human technological niche

Humans are a highly successful species.
Our demographic potential seems paradoxical in a large-brained primate known for its slow and costly development. A growing consensus finds the solution to this paradox in a human strategy of ‘biocultural reproduction’ (Bogin, Bragg, & Kuzawa, 2014), in which individuals other than the parents donate resources to help support offspring. This allows human mothers to produce costly large-brained children (Isler & Van Schaik, 2014) at a rate far outstripping our smaller-brained relatives. How did Pleistocene humans, in contrast to other apes, reliably produce the surpluses that made such alloparental support possible? Embodied capital theory (Kaplan, Hill, Lancaster, & Hurtado, 2000) proposes that humans have evolved a tightly integrated strategy in which a focus on high-value, difficult-to-acquire food resources provides the surpluses needed to fund growth, survival, and reproduction and is in turn enabled by the increased

longevity and brain size that allow learning of the requisite foraging skills. Cognitive and affective adaptations for cooperative sociality (Hill, Barton, & Hurtado, 2009), which are necessary for biocultural reproduction, simultaneously provide a conducive venue for social learning and teaching. Similarly, durable technological artifacts and reliably recurring activities both provide a context for learning (Stout & Hecht, 2017) and occasion the use and modification of objects in ways that impose new perceptual-motor challenges and serve as a stable focus for the production of more complexly organized and temporally protracted action sequences (Stout, 2013). Integrating these various theoretical strands leads to a picture of a distinctly human way of life reliant on cognitive, affective, and life-history adaptations supporting the intergenerational reproduction and accumulation of technological skills (Stout & Hecht, 2017; Stout & Khreisheh, 2015). The “technological pedagogy” hypothesis (Stout & Chaminade, 2012) elaborated here posits that human entry into this niche exerted interacting pressures on individual skill, social learning, and intentional communication that drove the biocultural evolution of language. One key source of evidence for reconstructing the evolution of the technological niche is the stone artifacts that dominate the Paleolithic (3.3 million – 10,000 years ago) archaeological record. Such tools were key components of hominin subsistence and survival strategies for ~99% of prehistory and helped to shape the course of human evolution. Of course, stone-tool making (“knapping”) is but one of many archaeologically visible Paleolithic skills that might be studied. Other examples include pyrotechnology, hunting and butchery, tool-making in non-lithic materials, and signaling technologies such as pigments and beads (e.g. d’Errico & Stringer, 2011).
Nevertheless, knapping stands out in terms of the quality, quantity, and spatiotemporal range of the record it provides. Furthermore, the existence of relatively well-established methods for the experimental replication of Paleolithic stone toolmaking techniques by modern research subjects facilitates the application of neural- and behavioral-science methods to study these behaviors (Stout & Khreisheh, 2015), including the neuroimaging results discussed below. Finally, ethnographic (Stout, 2002) and comparative (Fragaszy et al., 2013) research on skill acquisition can provide some insight into the broader social context of cooperation, learning, and communication implicated in lithic technology.

3. Stone tools and language evolution

Recent work relating stone tools to language evolution has emphasized the potential co-option (exaptation) of instrumental action recognition and control systems to support communicative actions (Hecht et al., 2014; Stout & Chaminade, 2012). The proposed overlap is the need to rapidly generate (for production) and parse

(for comprehension and learning) complex multi-level behavior sequences, as in the complex action recognition and imitation espoused in the MSH (Arbib, 2018). A prevailing view is that the learning and fluid execution of such skills is accomplished by chunking discrete action elements into abstracted perceptual and/or goal-directed units (Gobet et al., 2001; Kolodny, Edelman, & Lotem, 2015) that can be manipulated at higher levels of organization without exceeding working memory resources. In linguistics, this is consistent with a Construction Grammar approach that understands language structure as built from stored pairings of form and function (Goldberg, 2003). Such constructions range in scale from morphemes to sentence patterns and are learned by generalizing from regularities of linguistic experience. From this perspective, “the ability to acquire and deploy a hierarchy of chunks at different linguistic scales is parallel to the ability to chunk sequences of motor movements, numbers, or chess positions: it is a skill, built up by continual practice.” (Christiansen & Chater, 2016: 114). Importantly, such skills are not simply present or absent but exhibit continuous variation in degrees of competence. This must be taken into account when attempting to specify the timing and relative order for the “appearance” of various instrumental and communicative skills on both evolutionary and developmental timescales. Stone knapping is one evolutionarily-significant and archaeologically-visible behavior that may be approached in this manner.

3.1 Oldowan flake production

Knapping involves the detachment of flakes from a stone core using ballistic strikes with a hand-held hammer to initiate controlled and predictable fracture. Early technologies, such as the Oldowan, focus on the production of these flakes, which are surprisingly effective cutting tools.
Flake production requires precise percussion and core manipulation coupled with perception of subtle affordances defined by complex interactions between the force and location of the strike and the morphology, positioning, and support of the core (Nonaka, Bril, & Rein, 2010), but involves relatively little contingency between successive flake removals and exerts minimal demands for explicit planning (Stout, Hecht, Khreisheh, Bradley, & Chaminade, 2015). Currently, Oldowan tools dated to 2.6 million-years-ago (mya) offer the earliest archaeological evidence of hominin knapping skill development beyond the capacities of other apes (Toth, Schick, & Semaw, 2006), but ongoing research may soon extend this to 3.3 mya (Lewis & Harmand, 2016) or even beyond. Further increases in expressed perceptual-motor skill are evident with the first appearance (1.7 mya) and later elaboration (0.7–0.5 mya) of Acheulean “handaxe” technology (Stout, 2011). Skilled actions like knapping are thought to depend on the use of internal models to predict movements and outcomes in advance (Wolpert, Doya, &

Kawato, 2003). These models employ multiple, interacting levels of representation (e.g. elementary movements, sequences of movements, goals of sequences) to combine action elements and achieve flexible control across variable conditions. For Oldowan knapping, the salient demands are at the level of elementary movements, perhaps comparable to the articulatory/phonological level of speech (or sign) processing. Consistent with this, Oldowan knapping by modern human subjects recruits portions of ventral premotor cortex (Stout & Chaminade, 2007; Stout, Toth, Schick, & Chaminade, 2008) neighboring those involved in speech production and perception (Wilson, Saygin, Sereno, & Iacoboni, 2004). In contrast to the perceptual-motor complexity of the elementary movements, Oldowan action sequencing is relatively simple and invariant: first rotate and inspect the core (repeating until a viable target is identified), then strike the target (repeating with adjusted kinematics until a flake is detached or the target is abandoned as unsuitable, generally < 5 blows), and finally repeat these operations until sufficient flakes have been produced or the raw material is exhausted. This sequence has been variously described as a “flake production chunk” (Stout, 2011), “basic flake unit” (Moore, 2010), or elementary “flake loop” (Gowlett, 1984). It might be considered analogous to a morphological (lexical) or syntactic (phrasal) level linguistic construction, although knapping actions typically unfold at a much slower rate than spoken or signed language (e.g. ~ 5 seconds for a 2–3 action flake chunk vs. ~ 1 second for a 3-word sentence). The simplicity of Oldowan action sequences is reflected in a lack of task-related activation in portions of the inferior frontal gyrus (“Broca’s Area”) commonly implicated in structured sequence processing for linguistic syntax (Schell, Zaccarella, & Friederici, 2017) and hierarchical action organization (Fitch & Martins, 2014; Koechlin & Jubault, 2006).
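The rotate, strike, repeat sequence described above can be written out as a small nested loop. The following Python sketch is only an illustration of that control structure, not a model drawn from the cited studies: the `Core` class, the `flake_chunk` and `knap` functions, the encoding of each core face as a number of required blows, and the cap of four blows (one reading of "generally < 5 blows") are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Illustrative cap only: the text says a target is generally abandoned
# after fewer than five blows.
MAX_BLOWS = 4


@dataclass
class Core:
    """Toy core (hypothetical): each face holds the number of blows needed
    to detach a flake there, or None if inspection finds no viable target."""
    faces: List[Optional[int]]
    index: int = -1  # no face inspected yet

    def exhausted(self) -> bool:
        return self.index + 1 >= len(self.faces)

    def rotate_and_inspect(self) -> Optional[int]:
        self.index += 1
        return self.faces[self.index]


def flake_chunk(core: Core, log: List[str]) -> bool:
    """Run one flake production chunk; return True if a flake detaches."""
    target = None
    # Step 1: rotate and inspect until a viable target is identified.
    while target is None:
        if core.exhausted():
            return False
        log.append("rotate")
        target = core.rotate_and_inspect()
    # Step 2: strike until the flake detaches or the target is abandoned.
    for blow in range(1, MAX_BLOWS + 1):
        log.append("strike")
        if blow >= target:
            return True
    return False


def knap(core: Core, flakes_wanted: int) -> Tuple[int, List[str]]:
    """Step 3: repeat chunks until enough flakes or the core is exhausted."""
    log: List[str] = []
    flakes = 0
    while flakes < flakes_wanted and not core.exhausted():
        if flake_chunk(core, log):
            flakes += 1
    return flakes, log


flakes, log = knap(Core([None, 2, 9, 1]), flakes_wanted=2)
print(flakes, log)
```

In this toy run the face that would need nine blows is abandoned after four strikes, and the loop moves on to the next face. The point of the sketch is that the whole sequence needs only a fixed, shallow loop with two action types, which fits the description of Oldowan sequencing as relatively simple and invariant.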
What construction of even the simplest flake production chunk does require is abstraction from kinematic details to generic action “types” (e.g. rotate, strike) that stand in some consistent relation to the goal: in other words, the creation of a stored pairing of form and function. Just as linguistic construction requires perceptual chunking of sounds into morphemes and words that can be recognized despite variation across instances (e.g. foreign accent: Arbib, 2016), knapping requires perceptual abstraction across variable instances of suitable core morphology and motor gestures to build an “action lexicon”. This may again be related to findings from neuroimaging studies of Oldowan knapping. During object manipulation, internal models are implemented by a frontoparietal circuit in which object/grasp transformations supported by anterior inferior parietal lobe inform grip selection and execution processes in ventral premotor cortex (Fagg & Arbib, 1998). To produce these grasp transformations, anterior parietal cortex integrates kinematic information about task affordances from a parietal lobe dorsal stream of “vision for action” involved in position and motion

processing with semantic information from a temporal lobe ventral stream of “vision for perception” providing the recognition of objects, individuals, and body parts (Milner & Goodale, 1995) necessary for goal-directed action planning. Oldowan-style knapping appears to be especially demanding of this integrated system, specifically including the ventral premotor – anterior inferior parietal circuit and elements of its dorsal (dorsal intraparietal sulcus) and ventral (lateral occipital cortex) stream inputs (Stout & Chaminade, 2007; Stout et al., 2008). Notably, this ventral stream activity (putatively related to object recognition) is only evident following training when it is accompanied by a posterior-medial shift in premotor activation. In the framework being developed here, we might speculate that this reorganization constitutes a neural signature of form-function pairing during action lexicon construction. Comparative neuroprimatology further suggests that, evolutionarily, distinctive human capacities for action lexicon construction have been enabled by dorsal stream additions to pre-existing ventral stream pathways (reviewed by Stout & Hecht, 2017). For example, chimpanzees and humans, but not macaques, display robust connections between anterior inferior parietal cortex and object-processing cortex of the ventral stream (Hecht et al., 2013), perhaps explaining the novel responsiveness of this region to hand-held tools in humans but not macaques (Orban & Caruana, 2014). In humans alone, this system is further augmented by new dorsal and ventral stream connections to superior parietal cortex. This “dorsal-dorsal” pathway is involved in spatial perception (Hecht et al., 2013) and includes a region of dorsal intraparietal cortex that exhibits derived capacities for 3-D object representation in humans and is specifically recruited during Oldowan toolmaking (Stout & Chaminade, 2007).
Hypothetically, such changes in connectivity would have increased the availability of spatial and kinematic details for action perception and execution, enabling the construction of a more detailed and differentiated action lexicon.

3.2 From complex action recognition and imitation to proto-language

Insofar as human pantomime also relies on spatial processing in superior parietal cortex (Emmorey, McCullough, Mehta, Ponto, & Grabowski, 2011) and communicative gestures overlap with object-directed actions in fronto-parietal cortex (Montgomery, Isenberg, & Haxby, 2007), enhanced praxic capacities for action lexicon construction should have been readily generalizable to communicative gesture, perhaps in the context of technological pedagogy. In the MSH framework, such exaptation is posited to enable a proliferation of ad hoc pantomimes that were conventionalized into an expanding lexicon of protosign gestures. The open-ended semantics afforded by protosign then provided the socio-behavioral context in

which an extension to the vocal modality became adaptive. Alternatively, elements of this system could have been more directly co-opted into the vocal modality (Stout & Chaminade, 2012) to support phonological production and comprehension of proto-words (see Aboitiz, 2018 for further discussion). In either case, the neuroarchaeological and comparative neuroprimatological evidence reviewed above indicates the appearance of novel, dorsal stream, praxic adaptations starting at least 2.6 mya. A further issue for this account is that it requires a transition from instrumental actions like tool-making to the production and understanding of intentionally communicative (IC) actions like pantomime. The MSH posits that IC capacity was already present in the last common ancestor of chimpanzees and humans (as implied by the presence of gestural communication in modern apes) and thus that the proposed exaptation of praxic skills for pantomime requires no special explanation (Arbib, 2018). However, many would contend that ape IC is limited to socially-instrumental communication aimed at eliciting behaviors rather than modifying mental states (but see Russon, 2018 for an alternative view), suggesting that some additional hominin mentalizing (“Theory of Mind”) capacity is required to enable the production of pantomimes intended to convey information (Abramova, 2018). If this is accepted, one reaction might simply be to consider the evolution of mentalizing as external to the MSH and thus subject to independent explanation. However, an extended evolutionary-developmental perspective suggests more interesting connections (Stout & Hecht, 2017). One prevailing view is that core aspects of modern human mentalizing rely on internal simulation of others’ actions and are developmentally constructed from the motor resonance properties of the mirror system (Heyes & Frith, 2014). 
Enhancements to this system for action recognition and imitation, like those described above, would thus be expected to additionally facilitate the development of enhanced (e.g. easier, more frequent, more detailed) mentalizing. Furthermore this would be occurring within a novel developmental context created by an evolving technological niche reliant on social skill transmission, cooperation, and coordinated action (Stout & Hecht, 2017). This leads to a technological pedagogy hypothesis (Stout & Chaminade, 2012) for the evolution of elaborated human IC. It is not currently possible to designate a specific date by which human-like IC would have been required to explain the archaeological evidence (for an alternative view, see Gärdenfors & Högberg, 2017), but intentional demonstration, gesture, and linguistic instruction are at least beneficial for learning even Oldowan-like flake production (Morgan et al., 2015) and, by later Acheulean times (0.7–0.5 mya) there is evidence for the transmission of quite complex and demanding techniques (Stout, Apel, Commander, & Roberts, 2014), some of which modern humans have difficulty conveying without explicit verbal instruction (Gärdenfors & Högberg, 2017).


3.3 Acheulean shaping

After about 1.7 mya, a number of technological innovations begin to appear in the archaeological record, including the production of intentionally shaped Early Acheulean tools that archaeologists refer to as “handaxes,” “picks,” and “cleavers” (Stout, 2011). Such shaping requires greater perceptual-motor skill to precisely control stone fracture patterns and more complex action plans that relate individual flake removals to larger design goals such as shaping a pointed tip or thinning (“refining”) the piece. For example, by 0.7–0.5 mya, some “Late” Acheulean handaxes exhibit a level of refinement that requires careful preparation of edges and surfaces preceding flake removals (Stout et al., 2014). This involves a novel action chunk, “platform preparation,” in which (1) the handaxe is flipped over to reveal the face opposite the intended percussion, (2) rapidly repeated low-amplitude, brushing strokes with the hammerstone are used to delicately chip the edge until desired criteria of sharpness, bevel, and placement relative to the midline are met, and (3) the handaxe is flipped back over for the planned thinning flake removal. In other words, the basic flake production chunk described above is elaborated by allowing the simple striking action slot to be filled with a complex preparation & percussion chunk. This compound construction can then be inserted into even larger procedural constructs such as cross-sectional thinning and bifacial edging. This parallels the production of complex linguistic constructions through combination of partially filled constructions at various levels of specification (Goldberg, 2003). As expected by this analogy with language processing, the increased complexity of Acheulean over Oldowan action sequences is associated with activation in more anterior portions of the inferior frontal gyrus. Importantly, however, this activity is right-lateralized, rather than left-lateralized as is typical for many aspects of language processing.
Evidence of right Inferior Frontal Gyrus (rIFG) response to handaxe-making now includes increased functional activation during both actual execution (Putt, Wijeakumar, Franciscus, & Spencer, 2017; Stout et al., 2008) and passive observation (Stout, Passingham, Frith, Apel, & Chaminade, 2011), as well as structural remodeling of underlying white matter in response to handaxe-making training (Hecht et al., 2014). These effects are localized to anterior rIFG (pars triangularis, Brodmann Area 45 [BA45]). In the left hemisphere, BA45 is thought to support semantic processing (Schell et al., 2017) through executive control (retrieval, selection, integration; Lambon Ralph, Jefferies, Patterson, & Rogers, 2016) over temporal lobe lexical representations. We have suggested (Hecht et al., 2014; Hecht, Gutman, Bradley, Preuss, & Stout, 2015) that rIFG involvement in handaxe production reflects its analogous role in action selection during multi-component behavior (Dippel & Beste, 2015), which requires the integration of abstract ventral stream form-and-function constructions with dorsal stream spatial and kinematic details that characterize concrete instances.
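The slot-filling elaboration described above can be illustrated with a small data-structure sketch. This is only an illustration under stated assumptions, not a model proposed in the chapter: the class names, the `flatten` and `depth` helpers, and the `position_core` step are hypothetical, and only the three platform-preparation steps and the idea of filling the striking slot with a compound chunk come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Action:
    """A primitive motor act (a leaf of the hierarchy)."""
    name: str


@dataclass
class Chunk:
    """A named action construction; each step is a primitive or a sub-chunk."""
    name: str
    steps: List[Union[Action, "Chunk"]] = field(default_factory=list)


def flatten(node):
    """Expand a construction into its linear sequence of primitive acts."""
    if isinstance(node, Action):
        return [node.name]
    return [act for step in node.steps for act in flatten(step)]


def depth(node):
    """Number of nested chunk levels above the primitive acts."""
    if isinstance(node, Action):
        return 0
    return 1 + max((depth(step) for step in node.steps), default=0)


def flake_removal(strike_slot):
    """Basic flake production with an open slot for the striking action.

    The 'position_core' step is a hypothetical placeholder for whatever
    precedes the strike; only the slot itself comes from the text.
    """
    return Chunk("flake_removal", [Action("position_core"), strike_slot])


# Simple case: the slot is filled with a single strike.
simple_removal = flake_removal(Action("strike"))

# Platform preparation, following steps (1)-(3) in the text.
platform_preparation = Chunk("platform_preparation", [
    Action("flip_over"),
    Action("chip_edge_with_brushing_strokes"),
    Action("flip_back"),
])

# Elaborated case: the same slot is filled with a compound chunk.
prepared_removal = flake_removal(
    Chunk("preparation_and_percussion", [platform_preparation, Action("strike")])
)

# The compound construction can itself be embedded in a larger construct.
thinning = Chunk("cross_sectional_thinning", [prepared_removal, prepared_removal])

print(flatten(simple_removal))
print(flatten(prepared_removal))
print(depth(simple_removal), depth(prepared_removal), depth(thinning))
```

In this sketch the abstract construction (`flake_removal` with its open slot) stays the same while the filler changes, so the executed sequence grows longer and the hierarchy deeper without any new top-level plan being defined.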


Such integration appears to be facilitated in humans by enhanced dorsal stream inputs to rIFG via an expanded Superior Longitudinal Fasciculus (SLF). This tract, which connects anterior inferior parietal cortex to IFG, is increased in volume, anterior extension, and rightward lateralization in humans vs. chimpanzees (Hecht et al., 2015) and displays transient increases in white matter integrity during handaxe-making skill learning (Hecht et al., 2014). In the left hemisphere, a similarly expanded dorsal pathway connects the middle temporal gyrus and superior temporal sulcus to IFG via the Arcuate Fasciculus (AF) (Rilling, Glasser, Jbabdi, Andersson, & Preuss, 2011). AF appears to play a comparable role to SLF but in the linguistic domain: providing lexical and phonetic/articulatory specifics for integration into more abstracted syntactic constructions. Indeed, the (relatively late) anatomical development (Brauer, Anwander, Perani, & Friederici, 2013) and functional lateralization (Xiao, Friederici, Margulies, & Brauer, 2016) of this pathway in children is associated with improved processing of syntactically-complex sentences. Distinct left- and right-lateralized dorsal stream enhancements, when added to bilateral ventral stream inputs to IFG that are shared with macaques and chimpanzees (Rilling et al., 2011), thus appear to support unique human capacities for language and praxis respectively. Evolutionarily, these left and right pathways might reflect separate, functionally-specific adaptations. Alternatively, they might share a common evolutionary-developmental origin in more generalized, bilateral elaborations to dorsal stream connectivity. The modern adult condition would then be a result of some combination of subsequent phylogenetic change (exaptation and secondary adaptation) and/or plastic developmental accommodation to culturally-evolved linguistic and technological environments (Hecht et al., 2014; Stout & Hecht, 2017).
The latter alternative is consistent with evidence of the late development (Brauer et al., 2013; Xiao et al., 2016) and adult plasticity (Hecht et al., 2014) of these systems, but more work on the genetic and experiential basis of individual- and species-level structural-functional variation is clearly needed to address this question. In either case, these dorsal stream developments would be relevant to the transition from protolanguage to language, which the MSH envisions as involving feedback between evolving protosign and protospeech capacities such that “mechanisms evolved to support one become available to support the other” (Arbib, 2016). In other words, communication at this point would be multi-modal (Levinson & Holler, 2014). To this picture, we can add continued feedback between evolving praxic and communicative skills. In the case of distinct, developmentally canalized left and right pathways, these various evolutionary interactions would occur through the extra-somatic medium of niche construction (Laland et al., 2015). As outlined in Section 2, enhanced praxis and communication enable cooperation and social learning leading to the accumulation of ever more


complicated technologies that provide further pressures for enhanced praxis and communication. The possible existence of more direct functional or developmental linkage between hemispheres, such that selection acting on one pathway would have been likely to produce coordinated changes in the plastic developmental substrates available for the other, suggests a further mechanism of powerful evolutionary feedback (Stout & Hecht, 2017).

3.4 From proto-language to language

Regardless of modality, the MSH transition to language is proposed to proceed through the decomposition of holophrastic “protowords” with unitary meanings into multiple, re-combinable, constituents with discrete meanings (true words). It is the actual use and re-combination of words in a community which then provides opportunities for generalization and statistical learning that, when iterated across generations, lead to the cultural evolution of complexity and regularity through processes such as abstraction and grammaticalization. For example, semantic distinctions between actions and objects will affect patterns of word association in systematic ways that might lead to the generalization of abstract lexical categories like “noun” and “verb.” This process can still be observed in modern language learners, who integrate multiple phonetic, distributional, and semantic cues to infer lexical categories (Christiansen & Chater, 2016). The point is that the fractionation and statistical learning processes required for the MSH transition to language depend on precisely the capacities for integration and manipulation of multi-level phonetic, lexical, and sequential information with semantic world-knowledge that are hypothesized to result from enhanced dorsal and ventral stream confluence in IFG.
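As a rough illustration of how word-association patterns alone could separate action words from object words, the following toy sketch compares words by the contexts in which they occur. It is an assumption-laden simplification: the utterances are invented, the helper names `context_vectors` and `cosine` are hypothetical, and only the distributional cue is modeled, whereas the learners described above also draw on phonetic and semantic cues.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(utterances):
    """Count, for every word, which words occur immediately left and right of it."""
    vectors = defaultdict(Counter)
    for utterance in utterances:
        tokens = ["<s>"] + utterance.split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            vectors[tokens[i]][("left", tokens[i - 1])] += 1
            vectors[tokens[i]][("right", tokens[i + 1])] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy utterances, invented for illustration only.
corpus = [
    "you strike the stone",
    "you strike the flake",
    "you flip the stone",
    "you flip the core",
    "we strike the core",
    "we flip the flake",
]

vectors = context_vectors(corpus)

object_pair = cosine(vectors["stone"], vectors["flake"])   # two object words
action_pair = cosine(vectors["strike"], vectors["flip"])   # two action words
mixed_pair = cosine(vectors["stone"], vectors["strike"])   # object vs. action
print(object_pair, action_pair, mixed_pair)
```

In this toy corpus, words of the same kind share their contexts and words of different kinds do not, so a learner tracking such statistics has a basis for grouping them into proto-categories. Real input is far noisier, which is why the integration of distributional cues with phonetic and semantic information matters for the account in the text.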
This is consistent with the well-documented involvement of IFG in structured sequence processing and statistical learning across domains (Frost, Armstrong, Siegelman, & Christiansen, 2015), and the key role of these capacities in language processing and acquisition (Christiansen & Chater, 2016; Kolodny et al., 2015). Critically, the MSH transition to language is an inherently social process that requires the externalization of lexical representations as physical (perceptual-motor) tokens and the incorporation of these tokens into a culturally evolving system shaped by the demands of efficient communication in a serially-ordered medium (Arbib, 2012). Whereas we might envision a continuum of quantitative enhancements to ventral stream perceptual-abstraction and concept formation capacities across monkeys, apes and humans, the externalization critical to language evolution seems to have been supported by qualitative shifts to the organization, connectivity and/or computational capacities of the dorsal stream for action. Comparative evidence cannot date these human-specific neuroanatomical


developments, but archaeological evidence of increasingly refined handaxes from ~1.7–0.5 mya does provide evidence for the timing and context of the expression of new behavioral capacities for complex socially-transmitted praxis.

4. Conclusion: Towards a new road map

Archaeological data can supplement comparative approaches to language evolution by supplying detail regarding the timing and context of the contingent events that defined a uniquely human evolutionary path since the last common ancestor with chimpanzees. Evidence reviewed here suggests that key features of the language-ready brain (Arbib, 2012, 2016) emerged in the context of an evolving human technological niche that both demanded and enabled the evolution of increasingly effective communication. By 2.6 mya, Oldowan artifacts provide evidence of enhanced action control. Behaviorally, such control could be exapted to support the production and recognition of an expanded repertoire of distinct communicative gestures. Developmentally, it would be expected to facilitate the construction of implicit mentalizing skills from motor resonance properties of the action control system (Heyes & Frith, 2014). Evidence of brain expansion, increased carnivory, and more routine tool production by 2.0 mya (Antón et al., 2014) suggests increasing commitment to a technological niche and correspondingly accelerating pressures for enhanced communication. The behavioral, cognitive, and neuro-evolutionary implications of the Early Acheulean technology that appeared around 1.7 mya have yet to be studied using experimental neuroarchaeology methods, but available evidence indicates that the emergence of Late Acheulean methods by 0.7–0.5 mya marks a clear increment in technological complexity.
Notably, this early Middle Pleistocene (0.78–0.40 mya) time-frame also encompasses some of the fastest increases in encephalization of the past 2 million years (Ruff, Trinkaus, & Holliday, 1997), and a substantial hominin range expansion and habitat diversification including persistence in Eurasia through entire ice age cycles (Stewart & Stringer, 2012). Of special interest here are Late Acheulean demands for hierarchical action construction hypothetically supported by enhanced dorsal stream inputs to IFG. As many have argued, the neural substrates and computational capacities underlying such instrumental action sequencing could have been behaviorally exapted to support the construction of hierarchically-structured communicative action sequences (e.g. Greenfield, 1991). The subsequent cultural evolution of grammatically complex languages would also be supported by the enhanced statistical learning capacities of this system (Arbib, 2016; Christiansen & Chater, 2016; Frost et  al., 2015; Kolodny et  al., 2015). At the same time, the added complexity and non-intuitive nature of Late Acheulean technology would have generated additional selective benefits for enhanced


communication and teaching (Stout et al., 2014), perhaps even requiring the intentional communication of displaced concepts (Gärdenfors & Högberg, 2017). To further test and develop this account, work is needed in at least three key areas. First, to examine the technological pedagogy hypothesis, experimental studies of the acquisition of real-world, evolutionarily relevant skills like stone knapping are needed. The practical challenge for study design is to combine (1) the lengthy learning periods required for such skills with (2) adequate methods for measuring learning behaviors and outcomes and (3) sufficient sample sizes to experimentally manipulate learning conditions (Stout & Khreisheh, 2015). Second, to clarify the actual contribution of various candidate processes (e.g. natural selection, developmental bias, niche construction) in dorsal stream evolution, we need much more information regarding genetic and environmental contributions to the development of individual- and species-level phenotypic variation (Gómez-Robles, Hopkins, Schapiro, & Sherwood, 2015; Strike et al., 2018). Third, to experimentally test the hypothesis that instrumental action and linguistic communication share neurocognitive foundations in a domain-general capacity for multi-component behavior sequencing, it will first be necessary to develop empirical methods for quantifying the structural complexity of praxic behaviors that, unlike language, music and mathematics, do not already have well-developed notational systems and explicit grammars.

Funding

This paper was prepared for a workshop funded by NSF Grant No. BCS-1343544 “INSPIRE Track 1: Action, Vision and Language, and their Brain Mechanisms in Evolutionary Relationship” (M.A. Arbib, Principal Investigator).

References

Aboitiz, Francisco. (2018). Voice, gesture and working memory in the emergence of speech. Interaction Studies.
Abramova, Ekaterina. (2018). The role of pantomime in gestural language evolution, its cognitive bases and an alternative. Journal of Language Evolution.
Antón, Susan C., Potts, Richard, & Aiello, Leslie C. (2014). Evolution of early Homo: An integrated biological perspective. Science, 345(6192), 1236828.
Arbib, Michael A. (2012). How the brain got language: The mirror system hypothesis (Vol. 16). Oxford University Press.
Arbib, Michael A. (2016). Towards a Computational Comparative Neuroprimatology: Framing the language-ready brain. Physics of Life Reviews, 16, 1–54.
Arbib, Michael A. (2018). In support of the role of pantomime in language evolution. Journal of Language Evolution.
Bogin, Barry, Bragg, Jared, & Kuzawa, Christopher. (2014). Humans are not cooperative breeders but practice biocultural reproduction. Annals of human biology, 41(4), 368–380.
Brauer, Jens, Anwander, Alfred, Perani, Daniela, & Friederici, Angela D. (2013). Dorsal and ventral pathways in language development. Brain and language, 127(2), 289–295.
Christiansen, Morten H., & Chater, Nick. (2016). Creating language: integrating evolution, acquisition, and processing. Cambridge, MA: MIT Press.
d’Errico, F., & Stringer, C. B. (2011). Evolution, revolution or saltation scenario for the emergence of modern cultures? Philosophical Transactions of the Royal Society B: Biological Sciences, 366(1567), 1060.
Dippel, Gabriel, & Beste, Christian. (2015). A causal role of the right inferior frontal cortex in implementing strategies for multi-component behaviour. Nature communications, 6.
Emmorey, Karen, McCullough, Stephen, Mehta, Sonya, Ponto, Laura L. B., & Grabowski, Thomas J. (2011). Sign language and pantomime production differentially engage frontal and parietal cortices. Language and Cognitive Processes, 26(7), 878–901.
Fagg, Andrew H., & Arbib, Michael A. (1998). Modeling parietal-premotor interactions in primate control of grasping. Neural Networks, 11(7–8), 1277–1303. https://‍‍00047-1
Fitch, W., & Martins, Mauricio D. (2014). Hierarchical processing in music, language, and action: Lashley revisited. Annals of the New York Academy of Sciences, 1316(1), 87–104.
Fragaszy, Dorothy M., Biro, D., Eshchar, Y., Humle, T., Izar, P., Resende, B., & Visalberghi, E. (2013). The fourth dimension of tool use: temporally enduring artefacts aid primates learning to use tools. Philosophical Transactions of the Royal Society B: Biological Sciences, 368(1630).
Frost, Ram, Armstrong, Blair C., Siegelman, Noam, & Christiansen, Morten H. (2015). Domain generality versus modality specificity: the paradox of statistical learning. Trends in Cognitive Sciences, 19(3), 117–125.
Gärdenfors, Peter, & Högberg, Anders. (2017). The archaeology of teaching and the evolution of Homo docens. Current Anthropology, 58(2), 000–000.
Gobet, Fernand, Lane, Peter C. R., Croker, Steve, Cheng, Peter C. H., Jones, Gary, Oliver, Iain, & Pine, Julian M. (2001). Chunking mechanisms in human learning. Trends in cognitive sciences, 5(6), 236–243. https://‍‍01662-4
Goldberg, Adele E. (2003). Constructions: a new theoretical approach to language. Trends in cognitive sciences, 7(5), 219–224. https://‍‍00080-9
Gómez-Robles, Aida, Hopkins, William D., Schapiro, Steven J., & Sherwood, Chet C. (2015). Relaxed genetic control of cortical organization in human brains compared with chimpanzees. Proceedings of the National Academy of Sciences, 112(48), 14799–14804.
Gowlett, John A. J. (1984). Mental abilities of early man: A look at some hard evidence. In R. Foley (Ed.), Hominid Evolution and Community Ecology (pp. 167–192). New York: Academic Press.
Greenfield, Patricia M. (1991). Language, tools, and brain: The development and evolution of hierarchically organized sequential behavior. Behavioral and Brain Sciences, 14, 531–595.
Hecht, Erin E., Gutman, D. A., Khreisheh, N., Taylor, S. V., Kilner, J., Faisal, A. A., … Stout, D. (2014). Acquisition of Paleolithic toolmaking abilities involves structural remodeling to inferior frontoparietal regions. Brain Structure and Function, 1–17.
Hecht, Erin E., Gutman, David A., Preuss, Todd M., Sanchez, Mar M., Parr, Lisa A., & Rilling, James K. (2013). Process Versus Product in Social Learning: Comparative Diffusion Tensor Imaging of Neural Systems for Action Execution–Observation Matching in Macaques, Chimpanzees, and Humans. Cerebral Cortex, 23(5), 1014–1024.
Hecht, Erin E., Gutman, David A., Bradley, Bruce A., Preuss, Todd M., & Stout, Dietrich. (2015). Virtual dissection and comparative connectivity of the superior longitudinal fasciculus in chimpanzees and humans. NeuroImage, 108, 124–137.
Heyes, Cecilia M., & Frith, Chris D. (2014). The cultural evolution of mind reading. Science, 344(6190), 1243091.
Hill, Kim, Barton, M., & Hurtado, A. M. (2009). The emergence of human uniqueness: Characters underlying behavioral modernity. Evolutionary Anthropology: Issues, News, and Reviews, 18(5), 187–200.
Ingold, Tim. (1997). Eight themes in the anthropology of technology. Social Analysis, 41, 106–138.
Isler, Karin, & Van Schaik, Carel P. (2014). How humans evolved large brains: Comparative evidence. Evolutionary Anthropology: Issues, News, and Reviews, 23(2), 65–75.
Kaplan, Hillard, Hill, Kim, Lancaster, Jane, & Hurtado, A. Magdalena. (2000). A theory of human life history evolution: Diet, intelligence, and longevity. Evolutionary Anthropology: Issues, News, and Reviews, 9(4), 156–185. https://‍‍9:43.0.CO;2-7
Koechlin, Etienne, & Jubault, Thomas. (2006). Broca’s Area and the hierarchical organization of human behavior. Neuron, 50(6), 963–974.
Kolodny, Oren, Edelman, Shimon, & Lotem, Arnon. (2015). Evolution of protolinguistic abilities as a by-product of learning to forage in structured environments. Proc. R. Soc. B, 282(1811), 20150353.
Laland, Kevin N., Uller, Tobias, Feldman, Marcus W., Sterelny, Kim, Müller, Gerd B., Moczek, Armin, … Odling-Smee, John. (2015). The extended evolutionary synthesis: its structure, assumptions and predictions. Proceedings of the Royal Society of London B: Biological Sciences, 282(1813).
Lambon Ralph, Matthew A., Jefferies, Elizabeth, Patterson, Karalyn, & Rogers, Timothy T. (2016). The neural and computational bases of semantic cognition. Nature Reviews Neuroscience.
Levinson, Stephen C., & Holler, Judith. (2014). The origin of human multi-modal communication. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 369(1651).
Lewis, Jason E., & Harmand, Sonia. (2016). An earlier origin for stone tool making: implications for cognitive evolution and the transition to Homo. Phil. Trans. R. Soc. B, 371(1698), 20150233.
Milner, A. David, & Goodale, Melvyn A. (1995). The visual brain in action. Oxford: Oxford University Press.
Montgomery, Kimberly J., Isenberg, Nancy, & Haxby, James V. (2007). Communicative hand gestures and object-directed hand movements activated the mirror neuron system. Social Cognitive and Affective Neuroscience, 2(2), 114–122.
Moore, Mark W. (2010). “Grammars of action” and stone flaking design space. In A. Nowell & I. Davidson (Eds.), Stone tools and the evolution of human cognition (pp. 13–43). Boulder, Colorado: University Press of Colorado.
Morgan, T. J. H., Uomini, N. T., Rendell, L. E., Chouinard-Thuly, L., Street, S. E., Lewis, H. M., … Laland, K. N. (2015). Experimental evidence for the co-evolution of hominin tool-making teaching and language. 6, 6029.
Nonaka, Tetsushi, Bril, Blandine, & Rein, Robert. (2010). How do stone knappers predict and control the outcome of flaking? Implications for understanding early stone tool technology. Journal of Human Evolution, 59(2), 155–167.
Orban, Guy A., & Caruana, Fausto. (2014). The neural basis of human tool use. Frontiers in psychology, 5(310), 12.
Putt, Shelby S., Wijeakumar, Sobanawartiny, Franciscus, Robert G., & Spencer, John P. (2017). The functional brain networks that underlie Early Stone Age tool manufacture. Nature Human Behaviour, 1(6), 0102.
Rilling, James K., Glasser, Matthew F., Jbabdi, Saad, Andersson, Jesper, & Preuss, Todd M. (2011). Continuity, Divergence, and the Evolution of Brain Language Pathways. Frontiers in Evolutionary Neuroscience, 3, 11.
Roepstorff, Andreas, Niewöhner, Jörg, & Beck, Stefan. (2010). Enculturing brains through patterned practices. Neural Networks, 23(8), 1051–1059.
Ruff, Christopher B., Trinkaus, Erik, & Holliday, Trenton W. (1997). Body mass and encephalization in Pleistocene Homo. Nature, 387(6629), 173–176.
Russon, Anne. (2018). Pantomime and imitation in great apes: Implications for reconstructing the evolution of language. Interaction Studies.
Schell, Marianne, Zaccarella, Emiliano, & Friederici, Angela D. (2017). Differential cortical contribution of syntax and semantics: An fMRI study on two-word phrasal processing. Cortex, 96, 105–120.
Stewart, J. R., & Stringer, C. B. (2012). Human Evolution Out of Africa: The Role of Refugia and Climate Change. Science, 335(6074), 1317–1321.
Stout, Dietrich. (2002). Skill and cognition in stone tool production: An ethnographic case study from Irian Jaya. Current Anthropology, 45(3), 693–722.
Stout, Dietrich. (2011). Stone toolmaking and the evolution of human culture and cognition. Philosophical Transactions of the Royal Society B: Biological Sciences, 366(1567), 1050–1059.
Stout, Dietrich. (2013). Neuroscience of technology. In P. J. Richerson & M. Christiansen (Eds.), Cultural Evolution: Society, Technology, Language, and Religion (pp. 157–173). Cambridge, MA: MIT Press.
Stout, Dietrich, Apel, Jan, Commander, Julia, & Roberts, Mark. (2014). Late Acheulean technology and cognition at Boxgrove, UK. Journal of Archaeological Science, 41, 576–590.
Stout, Dietrich, & Chaminade, Thierry. (2007). The evolutionary neuroscience of tool making. Neuropsychologia, 45, 1091–1100.
Stout, Dietrich, & Chaminade, Thierry. (2012). Stone tools, language and the brain in human evolution. Philosophical Transactions of the Royal Society B: Biological Sciences, 367(1585), 75–87.
Stout, Dietrich, & Hecht, Erin E. (2017). Evolutionary neuroscience of cumulative culture. Proceedings of the National Academy of Sciences, 114(30), 7861–7868.
Stout, Dietrich, Hecht, Erin, Khreisheh, Nada, Bradley, Bruce, & Chaminade, Thierry. (2015). Cognitive Demands of Lower Paleolithic Toolmaking. PLoS ONE, 10(4), e0121804.
Stout, Dietrich, & Khreisheh, Nada. (2015). Skill Learning and Human Brain Evolution: An Experimental Approach. Cambridge Archaeological Journal, 25(04), 867–875.
Stout, Dietrich, Passingham, Richard, Frith, Christopher, Apel, Jan, & Chaminade, Thierry. (2011). Technology, expertise and social cognition in human evolution. European Journal of Neuroscience, 33(7), 1328–1338.
Stout, Dietrich, Toth, N., Schick, K., & Chaminade, T. (2008). Neural correlates of Early Stone Age toolmaking: technology, language and cognition in human evolution. Philos Trans R Soc Lond B Biol Sci, 363(1499), 1939–1949.
Strike, Lachlan T., Hansell, Narelle K., Couvy-Duchesne, Baptiste, Thompson, Paul M., de Zubicaray, Greig I., McMahon, Katie L., & Wright, Margaret J. (2018). Genetic Complexity of Cortical Structure: Differences in Genetic and Environmental Factors Influencing Cortical Surface Area and Thickness. Cerebral Cortex.
Toth, Nicholas, Schick, Kathy D., & Semaw, Sileshi. (2006). A comparative study of the stone tool-making skills of Pan, Australopithecus, and Homo sapiens. In N. Toth & K. D. Schick (Eds.), The Oldowan: case studies into the earliest stone age (pp. 155–222). Gosport, IN: Stone Age Institute Press.
Wilson, Stephen M., Saygin, Ayşe Pinar, Sereno, Martin I., & Iacoboni, Marco. (2004). Listening to speech activates motor areas involved in speech production. Nature neuroscience, 7(7), 701–702.
Wolpert, D., Doya, K., & Kawato, M. (2003). A unifying computational framework for motor control and social interaction. Philosophical Transactions of the Royal Society of London B, 358, 593–602.
Xiao, Yaqiong, Friederici, Angela D., Margulies, Daniel S., & Brauer, Jens. (2016). Development of a selective left-hemispheric fronto-temporal network for processing syntactic complexity in language comprehension. Neuropsychologia, 83, 274–282.

Tracing the evolutionary trajectory of verbal working memory with neuro-archaeology

Shelby S. Putt¹,² and Sobanawartiny Wijeakumar³

¹The Stone Age Institute / ²Indiana University / ³University of Stirling

We used optical neuroimaging to explore the extent of functional overlap between working memory (WM) networks involved in language and Early Stone Age toolmaking behaviors. Oldowan tool production activates two verbal WM areas, but the functions of these areas are indistinguishable from general auditory WM, suggesting that the first hominin toolmakers relied on early precursors of verbal WM to make simple flake tools. Early Acheulian toolmaking elicits activity in a region bordering on Broca’s area that is involved in both visual and verbal WM tasks. The sensorimotor and mirror neurons in this area, along with enhancement of general WM capabilities around 1.8 million years ago, may have provided the scaffolding upon which a WM network dedicated to processing exclusively linguistic information could evolve. In the road map going forward, neuro-archaeologists should investigate the trajectory of WM over the course of human evolution to better understand its contribution to language origins.

Keywords: visual working memory, auditory working memory, ventral premotor cortex, protolanguage, stone tools

Introduction

Working memory (WM) is a process that temporarily stores and manipulates representations in relation to one or more goals. Multiple modalities of information are processed in WM, including visual (De Benni et al., 2005), auditory (Kumar et al., 2016), tactile (Fassihi et al., 2014), olfactory (Jönsson et al., 2011), gustatory (Lara et al., 2009), and linguistic (Acheson and MacDonald, 2009) information. There is also brain circuitry dedicated to different subdomains; for example, object- and visuo-spatial WM activate separate ventral and dorsal neural systems, respectively (Courtney et al., 1996). In this paper, we will attempt to trace the evolution of verbal WM in early Homo using a neuro-archaeological approach, and in so doing, we hope that WM may provide the bridge between praxic action and language in an evolutionary context.

© 2020 John Benjamins Publishing Company


The amount of verbal information that the brain can hold and manipulate in order for a person to achieve a goal or solve a problem specifies the capacity of verbal WM. Verbal WM and its corresponding subdomains are critical for language acquisition, subvocal rehearsal, assigning syntactic structure to determine the meaning of an utterance, and remembering information during a conversation (Gathercole and Baddeley, 2014). Without verbal WM, modern language as we know it would not exist. Therefore, at least the base elements of verbal WM needed to be present in Arbib’s (2016) hypothesized “language-ready brain” for fully modern language to develop. At this point, however, very little is known about the evolution of verbal WM because of the lack of direct fossil evidence for cognition and language. The multicomponent model is often used to describe WM as a central executive that acts as a supervisory system over two independent short-term memory buffers that store verbal (phonological loop) and nonverbal (visuo-spatial sketchpad) information and an episodic buffer that temporarily stores and binds multimodal information from subsidiary systems and long-term memory (Baddeley, 2000; Baddeley and Hitch, 1974). How nonlinguistic auditory, tactile, and other forms of sensorimotor information map onto the multicomponent model, however, is “far from clearly established” (Baddeley, 2012, p. 13). For example, some studies suggest a hemispheric dissociation, where the right and left PFC are engaged during visual and verbal WM tasks, respectively, such as while remembering faces versus remembering names over a delay (Rämä et al., 2001; Rothmayr et al., 2007). It is unlikely that all of the listed modalities (visual, auditory, tactile, etc.) map onto the multicomponent model. Therefore, it is not the ideal model for investigating the evolution of WM. 
Rather, one more akin to Goldman-Rakic’s (1996) domain specificity hypothesis might be more appropriate for deciphering an evolutionary account of WM. Under this model, each specialized domain is localized to a different anatomical subdivision and has its own processing and storage mechanisms, which could explain why object-based visual and auditory WM pathways extend from sensory regions to different parts of the frontal cortex, for example (see Kumar et al., 2016; Lehnert and Zimmer, 2008). Under this model, it is possible to explore overlap between two or more WM circuits as a potential indicator of common descent. Neuroimaging and neurophysiological studies confirm that WM in human and nonhuman primates involves parallel, distributed neuronal networks that manage different sensory domains of information (Constantinidis and Procyk, 2004; Schulze et  al., 2010). Were these WM domain networks always separate from each other? Verbal WM, for example, is only found in humans and is therefore a more recent evolutionary development. Did it evolve from one of these preexisting WM networks or does it reflect cultural evolution, providing a new skill to reshape existing WM resources?

274 Shelby S. Putt and Sobanawartiny Wijeakumar

Perhaps neuro-archaeology can shed light on these questions. As some of the only surviving artifacts for early human manual skill and cognition, stone tools are the best option available for scientists to learn about past hominin brain operations at specific points in the past (Stout and Hecht, 2015; Wynn, 1979). In many cases, the exact function of the tools is unknown, but through many replicative studies conducted over the years, it has become increasingly clear how stone tools were made (Whittaker, 1994). Therefore, neuro-archaeological research has focused on the toolmaking process foremost, though there has been some pilot electroencephalography research done on prehistoric tool use (Williams et al., 2014). By using neuroimaging techniques to record the brain activity of modern-day subjects as they replicate the process of making stone tools, a neuro-archaeological approach pinpoints exactly which brain networks are active in modern subjects, which can then be informative about the cognitive features that were likely the most important for completing these toolmaking tasks at different points in the past. The activation of specific neural circuits while carrying out certain prehistoric behaviors need not imply that these neural circuits evolved for the purpose of these behaviors, only that these circuits were likely already in place before these behaviors arose; otherwise, the behaviors in question would have been impossible to perform because of a motor or cognitive limitation. Although many studies have assumed that features or objects are represented independently of each other in WM, recent evidence suggests that these representations are organized in a hierarchically structured fashion (Nie et al., 2017). 
Some researchers propose that the cognitive processes (i.e., WM) involved in combining objects in a hierarchical organization and combining words into sentences are homologous and occur in the same neural structure (Fadiga et al., 2009; Greenfield, 1991). For example, the hierarchical thinking required to form and interpret complex sentences as well as in nonverbal tasks with high WM demands activates the posterior third of the inferior frontal gyrus (BA 44) known as Broca’s area (Fiebach and Schubotz, 2006). Fadiga and colleagues (2009) also suggest that the ventral premotor cortex (vPMC) is tuned to detect and represent abstract, hierarchical structures. The hierarchical sequencing of language and the technological actions involved in stone tool production, specifically Acheulian tool production, are hypothesized to be the result of similar cognitive processes (Mahaney, 2014; Stout et al., 2008). If this is the case, then we should be able to show that stone tool production activates Broca’s area and vPMC, which could be informative about the evolution of verbal WM. If, however, verbal WM and stone tool production are completely unrelated cognitive processes, then it may be difficult to learn anything at all about the evolution of verbal WM by monitoring the brain activity associated with making stone tools.


This brings up three important questions that this paper will address. First, what (if anything) can neuro-archaeology conclude about the evolution of verbal WM? Second, do language and toolmaking rely on the same WM network to any extent, and did they evolve along a single pathway at any point during the course of human evolution? Lastly, what further open questions on how the brain got language is neuro-archaeology primed to address in future studies?

Neuro-archaeological insights into the evolution of working memory

There have been some promising developments made in neuro-archaeology regarding the evolution of WM that have thus far focused solely on Oldowan and Acheulian stone technologies (e.g., Stout et al., 2015; see also, Stout, 2018). These stone industries appeared 2.6 and 1.75 million years ago (mya), respectively (Beyene et al., 2013; Semaw et al., 1997). Oldowan technology involves the expedient method of obtaining a sharp flake tool by striking a core with a hard hammerstone with the knapping gesture (Toth, 1985). Resulting non-standard cores reflect the original shape of the stone (Figures 1a–b). The early Acheulian technology involves a more advanced form of knapping called ‘alternate flaking,’ which is used to thin and shape a stone into a standard handaxe shape (Figures 1c–d). Some researchers claim that the appearance of the Acheulian technocomplex in the archaeological record signifies an increase in cognitive capacity and the introduction of protolanguage (e.g., Arbib, 2011; Shipton, 2010). Here we focus on a recent study that uses functional near-infrared spectroscopy (fNIRS) to investigate the functional brain networks underlying Oldowan and Acheulian tool manufacture (Putt et al., 2017). fNIRS is a neuroimaging technique that measures changes in cortical oxygenated and deoxygenated hemoglobin and produces reconstructed images of localized functional brain activity that can be directly compared to fMRI results (Wijeakumar et al., 2017). Because fNIRS is less influenced by motion artifacts than fMRI, it can be used to measure real-time, localized cortical activity as people make stone tools. After completing seven training sessions, the participants’ oxygenated and deoxygenated hemoglobin cortical levels were measured with fNIRS while they replicated the process of Oldowan and early Acheulian toolmaking. 
Data were collected from alternating 1-min toolmaking blocks and 15-s rest periods during both of these tasks. This experiment assessed differences in brain activity for an Acheulian task as contrasted with an Oldowan task and thereby focused on cognition changes at one point in prehistory around 1.8 mya when early Homo presumably innovated the more complex Acheulian industry. This study found that Acheulian toolmaking involves the guidance and integration of visual and auditory WM representations in the vPMC. The Acheulian


task activates a brain network that is also employed during tasks that are within the skillset of modern humans alone, such as piano-playing (Bangert et al., 2006).

[Figure 1 node labels. Oldowan goal hierarchy: Create sharp flake tool(s); Remove flake(s); Identify angles on core; Determine striking force; Select hammer; Strike stone in correct location; Judge size of core. Acheulian goal hierarchy: Make teardrop-shaped handaxe with sharp edges; Remove square edges; Alternate flake around edge; Flip core; Identify convex and concave areas of core; Thin and shape piece; Remove convexities; Create point; Inhibit flake removal; Remove flake(s); Identify angles on core; Determine striking force; Monitor faults and weaknesses in stone; Avoid thin areas and ends of core; Select hammer; Plan direction and length of flake; Judge size of core; Keep core intact; Prepare platform; Small flake removal; Strike stone in correct location; Position core correctly before striking; Grinding.]

Figure 1. Goal hierarchy and production stages associated with Oldowan flaking (a-b) and early Acheulian handaxe manufacture (c-d). These particular goal hierarchies reflect the thought process of the first author while working toward the overarching goals of making the featured tools. Each goal can only be accomplished if all of its underlying subgoals are also accomplished. The Acheulian production stage (d) demonstrates how easy it is to snap a core if proper attention is not directed to each of the subgoals.
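
The rule stated in the Figure 1 caption, that a goal is accomplished only if all of its underlying subgoals are accomplished, can be illustrated with a small tree structure. The following is a minimal sketch in Python, not the authors' model: the class `Goal`, the helper `build_oldowan`, and in particular the parent/child links between the Oldowan labels are assumptions made for illustration, since the figure's exact nesting is not recoverable from the text.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Goal:
    """A node in a goal hierarchy; a node without subgoals is a concrete action."""
    name: str
    subgoals: List["Goal"] = field(default_factory=list)
    done: bool = False  # only consulted for nodes without subgoals

    def accomplished(self) -> bool:
        # A concrete action is accomplished once it has been carried out;
        # any higher goal is accomplished only if all of its subgoals are.
        if not self.subgoals:
            return self.done
        return all(sub.accomplished() for sub in self.subgoals)

    def leaves(self) -> Iterator["Goal"]:
        # Walk the tree and yield the concrete actions at the bottom.
        if not self.subgoals:
            yield self
        for sub in self.subgoals:
            yield from sub.leaves()


def build_oldowan() -> Goal:
    # Hypothetical nesting of the Oldowan labels from Figure 1.
    return Goal("Create sharp flake tool(s)", [
        Goal("Remove flake(s)", [
            Goal("Judge size of core"),
            Goal("Select hammer"),
            Goal("Identify angles on core"),
            Goal("Determine striking force"),
            Goal("Strike stone in correct location"),
        ]),
    ])


if __name__ == "__main__":
    tree = build_oldowan()
    for action in tree.leaves():
        action.done = True
        print(f"{action.name!r} done -> tool finished: {tree.accomplished()}")
```

In this sketch the top-level goal only reports success after the last remaining action is completed, which mirrors the caption's point that neglecting a single subgoal blocks the overarching goal.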


This is likely because both tasks are complex, involving bimanual coordination, the integration of multiple modes of sensory information, and goal-directed decision-making based on a fixed set of affordances (i.e., number of keys on a piano versus number of angles less than 90° on a core). Oldowan toolmaking, on the other hand, depends on a lateral premotor system that recognizes and assigns significance to external objects based on external visual input. These findings, along with the dearth of complex stone tools prior to 1.75 mya, indicate an expansion in WM capabilities at this time. Putt and colleagues (2017) assume that Acheulian toolmaking relies on a visual WM network because the coordinates associated with the activated vPMC are noted in a visual WM meta-analysis (Wijeakumar et al., 2015). This seems logical, as handaxe production relies on constant visual monitoring of the intermediate steps that must be deduced before one can reach the end goal state(s) (see Figure 1c). However, it is unclear to what extent visual WM areas are involved in stone toolmaking because of the nature of the analysis used. It focused only on the Oldowan-Acheulian contrast, and the results were compared exclusively to the coordinates of known visual WM centers, thus biasing the interpretation toward visual WM. Therefore, the extent to which stone toolmaking recruits other WM networks like verbal WM is unknown. To gain a clearer understanding of the WM networks involved in stone tool production tasks, we present the results of two region-of-interest analyses, which explore the relative activation of known visual and verbal WM centers during stone toolmaking tasks.

Working memory centers activated during stone tool production

Did early Homo succeed at making complex Acheulian tools because of an evolutionary change to their visual WM capacity, allowing them to store and manipulate more information than their primate predecessors? Or was this technical innovation possible because they developed a unique way of thinking in the form of verbal WM? Because of the relative complexity of the Acheulian toolmaking task, having even a proto-verbal WM could have been beneficial to prehistoric toolmakers because they could store and process complex action sequences as simple concepts, thus increasing their understanding of interrelated parts and actions. We collected coordinates of visual and verbal WM regions-of-interest from two meta-analyses, a visual WM meta-analysis that includes delayed match-to-sample and change-detection tasks (Wijeakumar et al., 2015) and a language-processing meta-analysis (Vigneau et al., 2011). With these coordinates, we extracted values representing the level of change in the neural signal in the corresponding brain space of our participants during Oldowan and Acheulian knapping tasks and rest periods. Data were included from 16 participants who learned to knap


without verbal instructions (see Putt et al., 2017 for more information on the methods used to obtain and process neuroimaging data). The knapping values were statistically compared to the rest values using a Wilcoxon signed-rank test to determine if knapping significantly activated these visual and verbal WM areas. Three visual WM regions were identified where the knapping signal is significantly higher than the rest signal, including the left frontal eye field and dorsolateral prefrontal cortex (dlPFC) in both hemispheres (see Figure 2a). The frontal eye field forms part of a dorsal visual attention network (Corbetta and Shulman, 2002) and is only significantly activated during the Oldowan task. This result affirms what was found in the Oldowan-Acheulian contrast. The bilateral activation of dlPFC during the Acheulian task, however, is a novel result. The dlPFC is associated with a wide range of executive control functions, including planning, executing goal-directed behaviors, deductive reasoning, and decision-making (Coutlee and Huettel, 2012; Heekeren et al., 2006; Kaller et al., 2011). It is also one of the more important substrates for visual WM. The differential activation of bilateral dlPFC between the two toolmaking tasks suggests that making an Acheulian handaxe has a more ambiguous goal hierarchy and greater search depth than making Oldowan tools (Kaller et al., 2011), meaning that the sequence of actions needed to make an Acheulian handaxe is much less obvious. Also, Acheulian toolmaking requires mental generation of sequences and evaluation of the interdependency of individual actions, while Oldowan toolmaking is primarily based in visual search (see Figure 1). 
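
The paired comparison of knapping and rest values with a Wilcoxon signed-rank test, as described above, can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' analysis pipeline: the function name `wilcoxon_signed_rank`, the choice of a one-sided exact test, and the sample values are all assumptions, and the numbers are made up rather than taken from the study.

```python
from typing import Sequence, Tuple


def wilcoxon_signed_rank(task: Sequence[float],
                         rest: Sequence[float]) -> Tuple[float, float]:
    """Return (W+, exact one-sided p) for the alternative task > rest."""
    # Paired differences; zero differences carry no sign and are dropped.
    diffs = [t - r for t, r in zip(task, rest) if t != r]
    n = len(diffs)

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    # W+ is the sum of the ranks attached to positive differences.
    w_plus = sum(rank for rank, d in zip(ranks, diffs) if d > 0)

    # Exact null distribution: every rank is positive or negative with
    # probability 1/2. Count sign assignments by their positive-rank sum,
    # using doubled ranks so that tied (half-integer) ranks stay integral.
    doubled = [int(round(2 * rank)) for rank in ranks]
    total = sum(doubled)
    counts = [0] * (total + 1)
    counts[0] = 1
    for rank in doubled:
        for s in range(total, rank - 1, -1):
            counts[s] += counts[s - rank]
    p_value = sum(counts[int(round(2 * w_plus)):]) / 2 ** n
    return w_plus, p_value


if __name__ == "__main__":
    # Made-up signal-change values for eight hypothetical participants.
    knapping = [0.012, 0.009, 0.015, 0.004, 0.011, 0.007, 0.013, 0.002]
    resting = [0.003, 0.005, 0.006, 0.006, 0.002, 0.001, 0.004, 0.003]
    w, p = wilcoxon_signed_rank(knapping, resting)
    print(f"W+ = {w}, one-sided exact p = {p:.4f}")
```

The test is rank-based, so it does not assume normally distributed signal changes, which is one common reason to prefer it over a paired t-test for small samples such as the 16 participants mentioned above.
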
These results further support three claims: (1) Acheulian toolmaking is a more cognitively demanding task than Oldowan toolmaking; (2) complex stone tool manufacture probably relies on a visual WM network; and (3) the appearance of the Acheulian industry in the archaeological record may mark a transition in the visual WM capabilities of early Homo. Of the seven verbal WM regions included in the analysis, there were only two areas significantly activated during the stone toolmaking tasks. These included the left dorsal pars triangularis, which forms the anterior portion of Broca’s area, and the right anterior middle frontal gyrus, which also overlaps with the anterior dorsal part of pars triangularis (see Figure 2b). The signal in the left dorsal pars triangularis is significantly higher than the resting signal for both the Oldowan and Acheulian tasks, while only the Oldowan task signal is higher than the rest signal in the right anterior middle frontal gyrus. The increase in technical complexity with the advent of the Acheulian industry therefore cannot be attributed to the evolution of verbal WM per se. Both of the noted areas are associated with phonological WM functions rather than semantic or sentence-level processing functions (Vigneau et al., 2011). For example, the former is activated in tasks that involve pseudo-word repetition (Warburton et al., 1996), word articulation versus word reading (McGuire et al.,

[Figure 2 panels. (a) Left frontal eye field; Left dorsolateral prefrontal cortex; Right dorsolateral prefrontal cortex. (b) Right anterior middle frontal gyrus (F2ant); Left dorsal pars triangularis (F3td). Vertical axis: % signal change; horizontal axis: task (Oldowan, Acheulian, Rest).]

Figure 2. Active visual WM (a) and verbal WM (b) areas during stone tool production tasks. Red circles represent WM coordinates determined by meta-analyses (Wijeakumar et al., 2015 in the case of visual WM and Vigneau et al., 2011 in the case of verbal WM). Only regions where the signal associated with the stone tool production tasks is significantly higher than the signal associated with rest periods are included. Significant Wilcoxon signed-rank tests where p […]

[…] 4-fold increase in the number of cognitive decisions required to make the earlier Oldowan as compared to later Acheulean technology (which first occurs ~1.7 MYA but becomes fully mature ~0.7 MYA). In addition, increased visuomotor coordination, possible changes in hand shape, improved imitation ability and possibly teaching (see Stout, this volume) would be required. Consistent with this, cranial capacity increases seen during the Acheulean technological tradition are substantial (Figure 1). What might have caused such an increase in conceptual complexity? Humphrey (1984) proposed that increasing intellect was an adaptation to social living, and that tools were made possible by this (when combined with a grasping hand). Consistent with this, brain size in primates is associated with social group size (Dunbar, 2003). Although DeCasien et al. (2017) and Powell et al. (2017) report relative brain size does not correlate with social group size in primates, both datasets show a robust correlation with absolute brain size. Increasing brain size over time predicts increasing social complexity as well as increasing conceptual richness. There are two ways in which social complexity would likely induce conceptual richness. First, social complexity would have spurred the elaboration and

The evolution of enhanced conceptual complexity and of Broca’s area 341

refinement of conceptual understanding. Understanding increasingly subtle patterns of social signals would select for finer conceptualization of social behaviors generally. As social interactions become increasingly complicated, conceptualizing the difference between Machiavellian strategic behavior vs. that of a truly reliable ally would be critical. Individual behaviors themselves might be the same, but subtle differences in context could signal extremely important underlying differences in long-term commitment. At the same time, increasing social complexity would lead to the elaboration of cultural learning generally: More individuals doing more things leads to increasing technological elaboration, thereby selecting for the ability to conceptualize the world in increasingly rich, subtle, and creative ways. Increasing technological complexity also dramatically increases the ways individuals can interact with the world. This suggests a dynamic, positive feedback effect on cultural complexity that would select for increasing neural elaboration underlying conceptualization. Consistent with this, Powell et al. (2009) show it is possible to model dramatic changes in technological innovation (e.g., the transition to the Upper Paleolithic ~45,000 years ago) using demographic changes alone. The evolution of increasing conceptual richness in the context of an increasingly socially-interactive existence would have been a powerful spur to enhanced communication. The positive-feedback nature of this interaction over evolutionary time would have led inevitably to selection for increasingly elaborate modern human language systems.

Sequence processing

Language grammar and syntax rely partly on sequence processing. Since an evolutionary perspective predicts that this circuitry most likely arose through the modification of pre-existing circuits, we should expect that modern language circuits have precursors of some kind in non-human primates. Any particular aspect of language processing likely involves complex interactions of circuits from many different brain areas, of course, and not simply neural circuits localized in one area alone. The focus here will be on a subset of circuits relevant to sequential processing, which depend in part on the cortical region known as Broca’s area. This focus is partly because Broca’s has such a long and important history in neurolinguistic studies, but also because – as a cortical as opposed to subcortical area – it stands a better chance of leaving imprints on fossil endocrania, and thereby suggesting the time-course of aspects of language evolution. Studies of Broca’s aphasia, Parkinson’s disease, and Huntington’s disease indicate the critical involvement of subcortical structures such as the basal ganglia also (Lieberman, 2000). Nevertheless, functional brain imaging studies show Broca’s area plays a role in

342 P. Thomas Schoenemann

grammar (Grodzinsky, 2000; Thompson-Schill et al., 1997), and gray matter loss in Broca’s is correlated with the degree of deficits in syntactic comprehension and production (Wilson et al., 2011). The 5–6 fold difference in size in Broca’s area (Keller et al., 2009; Schenker et al., 2010) between humans and chimpanzees tells us there has been particularly strong selection in hominin evolution on this region. What, then, is the evolutionary history of Broca’s area? Because homologs of Broca’s area have been found in both apes and monkeys (Schenker et al., 2010), the precursors of human language circuits there must have evolved for purposes other than language. The homolog of Broca’s area in monkeys has been shown to contain mirror neuron circuits (Arbib, 2016), to be active during the recognition of species-specific vocal calls (Gil-da-Costa et al., 2006), and to be involved in orofacial motor sequencing (Petrides et al., 2005) and in the active controlled retrieval of visual object and spatial information (Petrides and Pandya, 2009). The chimpanzee homolog of Broca’s area is active during the production of communicative gestures (Taglialatela et al., 2008), consistent with the proposal that mirror neurons in Broca’s area form an important foundation for language (Schenker et al., 2008). Another avenue for obtaining clues about the original function of Broca’s is to ask what non-linguistic functions are still evident in human Broca’s area. These may represent ancient functions of the original Broca’s area circuits in earlier monkeys and apes. Although they may simply represent secondary uses of circuits that evolved specifically for language in humans, this can be ruled out if Broca’s homologs in modern monkeys and apes also have these functions. One intriguing set of studies shows that Broca’s is important for implicit learning of non-linguistic sequential patterns or "rules" (Christiansen et al., 2010; Petersson et al., 2012). 
This suggests an intriguing hypothesis: Circuits in Broca’s originally evolved to ‘extract’ or learn sequential pattern information (predictions about what sequences are likely) from the organism’s environment. In hominins they would have been an attractive substrate for the evolution of syntax (and grammar), as well as the ability to distinguish words based on different patterns of phonemes. Such sequence-pattern-sensitive circuits would be useful for all sorts of reasons, including a connection with mirror neuron mediated action recognition. An action typically involves some number of parts or ‘sub-actions’, which are sequenced in a specific way. For example, the action [grabbing an object] involves first extending the arm towards the object, and then closing the fingers around it. By contrast, the action [punching an object] involves first closing the fingers around themselves (to make a fist), and then extending the arm (rapidly) towards the object. Recognizing these as distinct actions requires being sensitive to the particular sequence of sub-actions that they entail. Being able to quickly differentiate between someone punching vs. grabbing you would be extremely useful in a complex social environment.
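
The grabbing versus punching contrast can be made concrete with a toy recognizer that keys on the order of sub-actions. This is a minimal sketch for illustration only; the sub-action labels (`extend_arm`, `close_fingers`) and the function names are hypothetical, and nothing here is meant as a model of the neural circuitry.

```python
from typing import Optional, Sequence

# Hypothetical shorthand for the two sub-actions named in the text.
# The same two parts appear in both actions; only their order differs.
KNOWN_ACTIONS = {
    ("extend_arm", "close_fingers"): "grab",
    ("close_fingers", "extend_arm"): "punch",
}


def recognize(sub_actions: Sequence[str]) -> Optional[str]:
    """Return the action whose ordered sub-actions match, or None."""
    return KNOWN_ACTIONS.get(tuple(sub_actions))


def same_parts(a: Sequence[str], b: Sequence[str]) -> bool:
    """True when two sequences contain the same sub-actions, ignoring order."""
    return sorted(a) == sorted(b)


if __name__ == "__main__":
    observed_1 = ["extend_arm", "close_fingers"]
    observed_2 = ["close_fingers", "extend_arm"]
    print(same_parts(observed_1, observed_2))  # identical parts
    print(recognize(observed_1), recognize(observed_2))  # different actions
```

An observer that tracked only which sub-actions occurred, and not their order, could not tell the two actions apart, which is the sense in which recognition here depends on sequence sensitivity.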


Sequential patterns of sounds – rather than the individual sounds themselves – are of course the key to differentiating individual words. "Cat", "tack", and "act" are distinguishable not because of the basic sounds they contain, but rather the order in which the sounds appear. Much of grammar involves sensitivity to different orders of words in a sentence. Although not all languages use word order to mark argument structure, there is no language for which word order is truly meaningless (William Wang, personal communication). For all languages, some circuitry is needed that is sensitive to differences in sequential ordering of constituents. Understanding the evolutionary history of sequential processing requires assessment of the degree of overlap (if any) between linguistic vs. non-linguistic sequential processing. Fedorenko et al. (2012) do report somewhat distinct (though highly individually variable) linguistic and nonlinguistic functional regions of Broca’s area, but their tasks did not assess sequential processing particularly well. Differential localization of linguistic vs. non-linguistic processing within Broca’s also does not demonstrate that humans evolved unique language-specific circuitry there. Such differentiation could be entirely developmentally induced. The degree of individual variability found by Fedorenko et al. (2012) makes any suggestion of genetic hard-wiring unlikely. Because of the non-linguistic nature of the sequential processing tasks probed by Christiansen et al. (2010) and Petersson et al. (2012), it is possible to probe whether or not non-human primates also demonstrate implicit learning of sequential patterns, and if so, whether their homolog of Broca’s is also involved. Recent studies with simpler patterns suggest they do (Wilson et al., 2015; Wilson in this volume, 2013), and we are pursuing similar work. 
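
The "cat", "tack", "act" example can be checked mechanically by comparing an order-insensitive description of each word (its inventory of sounds) with an order-sensitive one (its adjacent sound pairs). The sketch below is illustrative only; the rough phoneme transcriptions and all function names are assumptions introduced here.

```python
from typing import Dict, List, Tuple

# Rough phoneme transcriptions, assumed for illustration.
WORDS: Dict[str, Tuple[str, ...]] = {
    "cat": ("k", "æ", "t"),
    "tack": ("t", "æ", "k"),
    "act": ("æ", "k", "t"),
}


def inventory(word: str) -> List[str]:
    """Order-insensitive description: the sorted sounds of a word."""
    return sorted(WORDS[word])


def transitions(word: str) -> List[Tuple[str, str]]:
    """Order-sensitive description: each sound paired with the next one."""
    sounds = WORDS[word]
    return list(zip(sounds, sounds[1:]))


if __name__ == "__main__":
    for word in WORDS:
        print(word, inventory(word), transitions(word))
```

All three words share one inventory, so a system that registered only which sounds occurred would treat them as the same item, whereas their transition lists differ for every pair of words.
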
Evidence of brain evolution in the fossil record provides tantalizing clues about the evolution of Broca’s area (Holloway, 1983). Given that in modern humans left Broca’s is more active during language processing, and that Broca’s tends to be larger on the left than on the right, asymmetries in Broca’s cap of fossil endocasts are of particular interest (Holloway, 1976). While Broca’s cap only partially overlies Broca’s area (Falk, 2014), it does overlie areas intensively involved in language processing (Schoenemann and Holloway, 2016). Holloway et al. (2004) report subjective assessments of whether the left Broca’s cap is larger than the right in the 19 pre-anatomically modern specimens for which both left and right Broca’s caps are preserved. The majority (almost 80%), going back >2 MYA, show the left protruding more than the right, and only two show a clear right bias. Assessments of fossil endocast sulcal/gyral patterns overlaying Broca’s area also suggest changes occurring >2 MYA. The earliest fossil suggesting a difference from the basic chimpanzee pattern in the left inferior frontal (LIF) is an Australopithecus africanus specimen, STS-5 (~2.5 MYA from Sterkfontein, South Africa; Figure 2). This specimen also displays a left-biased Broca’s cap (Holloway et al., 2004).


Even more suggestive is a 1.8 MYA early Homo specimen, KNM-ER 1470. Holloway (1976) noted the endocranial gyral/sulcal impressions in the area overlying the LIF were clearly larger than in earlier hominins. Falk (1983) argued its convolutional detail was more consistent with a human-like Broca’s. The curvature pattern confirms a more modern-human-like LIF (Figure 2). Early Homo ergaster specimens KNM-ER 3733 (1.78 MYA) and 3883 (1.57 MYA) had "true" Broca’s caps (Holloway, 1983), although surface morphology is not described. The Homo erectus specimen KNM-ER 15000 (1.5 MYA) has "inflated" gyri over Broca’s area (Begun and Walker, 1993). The Daka Homo erectus specimen (1.0 MYA) displays strong left Broca’s cap protrusion similar to modern humans. Gilbert et al. (2008) report it lacks convolutional details, but images of the

[Figure 2 panel labels: Pan troglodytes (brain), 318 ml; Homo sapiens (brain), 1471 ml; STS-5, 2.50 MYA, 504 ml; KNM-ER 1470, 1.88 MYA, 769 ml; Saccopastore 1, 0.12 MYA, 1245 ml. Each panel marks the LIF pattern. Curvature color scale: −0.05 to 0.05; scale bar: 1 cm.]

Figure 2. Top row: Chimpanzee brain (average of 3 male and 3 female; Schoenemann, Sheehan, & Glotzer, 2005); Human brain (Grabner et al., 2006). Bottom row: Endocrania of 3 hominin fossils: STS-5, KNM-ER 1470, and Saccopastore 1. Color coding corresponds to the average curvature of the surface at each point, calculated following Avants et al. (2005) and Avants & Gee (2003). An image of the virtual endocast of KNM-ER 1470 without curvature coding is included for comparison. For each image, the LIF area overlying Broca’s area is enclosed by a cyan circle, and a matching circle highlighting the basic pattern for that endocast appears just to its lower left. Typical pattern of the LIF for modern chimp and human are indicated for comparison. Note that the fossil endocast LIF patterns are more similar to the modern human brain


endocast indicate a more complicated morphology than is typical for chimpanzees. The Ngandong (Solo XI) Homo erectus specimen (Indonesia, 0.34 MYA) has a well-developed Broca’s area (Holloway, 1980). Later hominins appear to retain this characteristic. Homo antecessor specimens from Atapuerca, Spain (0.43 MYA), appear to be more modern-human-like in the LIF, judging from published images (Poza-Rey et al., 2017). Curvature analysis of the Saccopastore 1 (0.12 MYA) Neanderthal specimen suggests the same for this group as well (Figure 2). This suggests significant changes in Broca’s area go back at least to early Homo, and possibly earlier. Though tantalizing, what exactly these changes reflect neuroanatomically and linguistically is not clear. Because Broca’s area is known to have non-linguistic functions, it is possible these inferred changes in Broca’s area only reflect increasing tool use, pantomime (or both), and not something about language directly. The archaeological evidence of tool use is much richer than the fossil evidence of brains. Sites that contain only stone tools vastly outnumber those with hominin fossils. What these stone tools may tell us about cognitive evolution, and language in particular, is an area of intense interest (Morgan et al., 2015; Putt et al., 2017; Stout and Chaminade, 2012). Although left hemisphere Broca’s area does not appear to be particularly active in these studies, its role in sequential processing and hierarchical representation of motor actions nevertheless suggests it was relevant to stone tool manufacturing. The archaeological record indicates increasing sophistication of technology over time, and the earliest evidence of stone tool manufacturing approximates when the earliest suggestions of changes in Broca’s area occur in the hominin fossil record. These changes in tool complexity are consistent with both an elaboration of conceptual complexity and enhanced sequential processing ability.

Toward a new road map

The fact that conceptual complexity and sequential processing have been highlighted here should not be taken to mean that these are the only important components to the story. The arguments here are meant to suggest important considerations for ongoing research. Some intriguing directions to pursue include: – How are the homologs of modern human language circuits functioning in nonhuman primates? Investigation should cover not just Broca’s area, but also its connections with posterior areas and the basal ganglia, Wernicke’s area, and mirror neuron circuitry. Which circuits are specifically involved with sequential information? We are currently working with Robert Shumaker, Indianapolis Zoo, to assess implicit learning of nonlinguistic sequential rules


in orangutans. Which of these homologs also process information about hierarchical social relationships (Wilkins and Wakefield, 1995)? Hierarchical relationships make predictions about what sequential patterns are likely in social environments. The relationship between sequential processing and hierarchical processing should be explored fully. Sequential processing in monkeys is known to activate premotor and supplementary motor areas (e.g., Nakajima et al., 2013), but exactly what role Broca’s area might play in monkeys, and how this was elaborated over time, has not been extensively probed. The possible role of increasing social and technological complexity in enhancing the usefulness of the ability to identify and reconstruct sequential patterns (in many domains) deserves particular attention. Is there direct evidence of increasing conceptual richness going from monkeys to apes to humans? One idea: use the oddball paradigm with EEG (Picton, 1992). A smaller difference between oddball and expected stimuli should elicit larger ERPs in larger-brained species (Mark Liberman, personal communication). What connections exist between conceptual richness, sequential processing, and hierarchical processing? How might these covary across primates of varying brain sizes? What is the relevance of sequential processing to mirror neuron activity in nonhuman primates? Does the latter depend on the former? Are mirror neuron circuits used in processing non-human primate communicative/social/gestural behavior? Evidence suggests that language-trained apes do have limited syntactic understanding (e.g., “Rose is gonna chase Kanzi” vs. “Kanzi is going to chase Rose”; Savage-Rumbaugh et al., 1993, p. 95). What circuits are they using for this? Assuming that sensitivity to sequential processing critically underlies grammar and phonology, what then, if anything, has to change in this circuitry to support grammar (syntax and semantics) or phonology? 
Or can a system that extracts sequential patterns from the environment be harnessed for grammar directly? What quantitative evidence can be obtained for changes in fossil endocranial morphology that might be relevant to language? Research on endocranial form suggests the parietal was also important (Bruner et  al., 2016). Exactly how much can we predict about a hominin brain from its skull alone? We are currently obtaining brain and for skull data from the same human and ape subjects to directly assess this. Exactly what cognitive functions are necessary for the transitions between stone tool types documented in the archaeological record, and are these correlated with changes in endocranial morphology? Lastly, further elaboration of dynamic models of the co-evolution of language and neurobiology would be useful (Gong et al., 2014). For example, is it possible

The evolution of enhanced conceptual complexity and of Broca’s area 347

to marry socially-interactive agent-based models with models of brain function, e.g., those outlined in Arbib (2016)? To what extent might syntactic complexity be a cultural evolutionary byproduct of increasing conceptual complexity (Schoenemann, 1999; Smith et al., 2003)? If so, can one directly model the expanding spiral of influences (e.g., Arbib, 2016)? Can the elaboration of language and technology seen in the archaeological record be modelled?

Acknowledgements

I thank Ralph Holloway for insights on the endocranial evidence, for allowing his endocast of KNM-ER 1470 to be scanned, and for his copy of Giorgio Manzi’s virtual endocast of Saccopastore 1. The endocast of STS 5 was derived from CT scans of the original. I also thank the reviewers.

Funding

This research was supported in part by grant 52935 from the Templeton Foundation titled “What Drives Human Cognitive Evolution?” (N. Toth, K. Schick, C. Allen, P. Todd, P.T. Schoenemann, co-Principal Investigators). The paper was prepared for a workshop funded by NSF Grant No. BCS-1343544, “INSPIRE Track 1: Action, Vision and Language, and their Brain Mechanisms in Evolutionary Relationship” (M.A. Arbib, Principal Investigator).

References

Arbib, M. A. (2017). Dorsal and ventral streams in the evolution of the language-ready brain: Linking language to the world. J. Neurolinguistics 43, 228–253.
Arbib, M. A. (2016). Toward the Language-Ready Brain: Biological Evolution and Primate Comparisons. Psychon. Bull. Rev.
Avants, B., Gee, J. (2003). The shape operator for differential analysis of images. Inf Process Med Imaging 18, 101–13.
Avants, B. B., Gee, J. C., Schoenemann, P. T., Monge, J., Lewis, J. E., Holloway, R. L. (2005). A new method for assessing endocast morphology: calculating local curvature from 3D CT images. Am. J. Phys. Anthropol. 126, 67.
Begun, D., Walker, A. (1993). The endocast. Nariokotome Homo Erectus Skelet. 326–358.
Begun, D. R., Kordos, L. (2004). Cranial evidence of the evolution of intelligence in fossil apes, in: Russon, A. E., Begun, D. R. (Eds.), The Evolution of Thought: Evolutionary Origins of Great Ape Intelligence. Cambridge University Press, Cambridge, pp. 260–279.
Bergman, T. J., Beehner, J. C., Cheney, D. L., Seyfarth, R. M. (2003). Hierarchical classification by rank and kinship in baboons. Science 302, 1234–1236.
Berwick, R. C., Friederici, A. D., Chomsky, N., Bolhuis, J. J. (2013). Evolution, brain, and the nature of language. Trends Cogn. Sci. 17, 89–98.
Bock, W. J. (1959). Preadaptation and Multiple Evolutionary Pathways. Evolution 13, 194–211.
Bramão, I., Faísca, L., Forkstam, C., Reis, A., Petersson, K. M. (2010). Cortical brain regions associated with color processing: An FMRI study. Open Neuroimaging J. 4, 164–173.
Bruner, E., Preuss, T. M., Chen, X., Rilling, J. K. (2016). Evidence for expansion of the precuneus in human evolution. Brain Struct. Funct.
Changizi, M. A., Shimojo, S. (2005). Parcellation and area-area connectivity as a function of neocortex size. Brain. Behav. Evol. 66, 88–98.
Christiansen, M. H., Kelly, M. L., Shillcock, R. C., Greenfield, K. (2010). Impaired artificial grammar learning in agrammatism. Cognition 116, 382–393.
DeCasien, A. R., Williams, S. A., Higham, J. P. (2017). Primate brain size is predicted by diet but not sociality. Nat. Ecol. Evol. 1, 0112.
Dunbar, R. I. M. (2003). The Social Brain: Mind, Language, and Society in Evolutionary Perspective. Annu. Rev. Anthropol. 32, 163–81.
Falk, D. (2014). Interpreting sulci on hominin endocasts: old hypotheses and new findings. Front. Hum. Neurosci. 8, 1–11.
Falk, D. (1983). Cerebral cortices of East African early hominids. Science 221, 1072–1074.
Fan, L., Li, H., Zhuo, J., Zhang, Y., Wang, J., Chen, L., Yang, Z., Chu, C., Xie, S., Laird, A. R., Fox, P. T., Eickhoff, S. B., Yu, C., Jiang, T. (2016). The Human Brainnetome Atlas: A New Brain Atlas Based on Connectional Architecture. Cereb. Cortex 26, 3508–3526.
Fedorenko, E., Duncan, J., Kanwisher, N. (2012). Language-Selective and Domain-General Regions Lie Side by Side within Broca’s Area. Curr. Biol. 22, 2059–2062.
Gilbert, W. H., Holloway, R. L., Kubo, D., Kono, R. T., Suwa, G. (2008). Tomographic analysis of the Daka calvaria. Homo Erectus Pleistocene Evid. Middle Awash Ethiop. Univ. Calif. Press Berkeley Los Angel. 329–347.
Gil-da-Costa, R., Martin, A., Lopes, M. A., Munoz, M., Fritz, J. B., Braun, A. R. (2006). Species-specific calls activate homologs of Broca’s and Wernicke’s areas in the macaque. Nat Neurosci 9, 1064–1070.
Gong, T., Shuai, L., Zhang, M. (2014). Modelling language evolution: Examples and predictions. Phys. Life Rev. 11, 280–302.
Grabner, G., Janke, A. L., Budge, M. M., Smith, D., Pruessner, J., Collins, D. L. (2006). Symmetric Atlasing and Model Based Segmentation: An Application to the Hippocampus in Older Adults, in: Larsen, R., Nielsen, M., Sporring, J. (Eds.), Medical Image Computing and Computer-Assisted Intervention – MICCAI 2006: 9th International Conference, Copenhagen, Denmark, October 1–6, 2006. Proceedings, Part II. Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 58–66.
Grodzinsky, Y. (2000). The neurology of syntax: language use without Broca’s area. Behav. Brain Sci. 23, 1–21; discussion 21–71.
Harmand, S., Lewis, J. E., Feibel, C. S., Lepre, C. J., Prat, S., Lenoble, A., Boës, X., Quinn, R. L., Brenet, M., Arroyo, A., Taylor, N., Clément, S., Daver, G., Brugal, J.-P., Leakey, L., Mortlock, R. A., Wright, J. D., Lokorodi, S., Kirwa, C., Kent, D. V., Roche, H. (2015). 3.3-million-year-old stone tools from Lomekwi 3, West Turkana, Kenya. Nature 521, 310–315.
Holloway, R. L. (1983). Human paleontological evidence relevant to language behavior. Hum. Neurobiol. 2, 105–114.
Holloway, R. L. (1980). Indonesian “Solo” (Ngandong) endocranial reconstructions: Some preliminary observations and comparisons with Neanderthal and Homo erectus groups. Am. J. Phys. Anthropol. 53, 285–295.
Holloway, R. L. (1976). Paleoneurological evidence for language origins. Ann. N. Y. Acad. Sci. 280, 330–348.
Holloway, R. L., Broadfield, D. C., Yuan, M. S. (2004). The Human Fossil Record, Volume 3. Brain Endocasts – The Paleoneurological Evidence, The Human Fossil Record. John Wiley & Sons, Hoboken.
Humphrey, N. (1984). The social function of intellect, in: Consciousness Regained. Oxford University Press, Oxford, pp. 14–28.
Isaac, G. L. (1976). Stages of cultural elaboration in the Pleistocene: Possible archaeological indicators of the development of language capabilities. Ann. N. Y. Acad. Sci. 280, 275–288.
Jacob, F. (1977). Evolution and tinkering. Science 196, 1161–1166.
Jerison, H. J. (1985). Animal intelligence as encephalization. Philos. Trans. R. Soc. Lond. Ser. B 308, 21–35.
Keller, S. S., Roberts, N., Hopkins, W. (2009). A comparative magnetic resonance imaging study of the anatomy, variability, and asymmetry of Broca’s area in the human and chimpanzee brain. J Neurosci 29, 14607–16.
Lieberman, P. (2000). Human language and our reptilian brain: the subcortical bases of speech, syntax, and thought, Perspectives in cognitive neuroscience. Harvard University Press, Cambridge, Mass.
Mars, R. B., Sallet, J., Neubert, F.-X., Rushworth, M. F. (2013). Connectivity profiles reveal the relationship between brain areas for social cognition in human and monkey temporoparietal cortex. Proc. Natl. Acad. Sci. 110, 10806–10811.
Mayr, E. (1978). Evolution. Sci. Am. 239, 47–55.
Miller, G. A., Gildea, P. M. (1991). How children learn words, in: Wang, W. S.-Y. (Ed.), The Emergence of Language: Development and Evolution. W. H. Freeman, New York, pp. 150–158.
Morgan, T. J. H., Uomini, N. T., Rendell, L. E., Chouinard-Thuly, L., Street, S. E., Lewis, H. M., Cross, C. P., Evans, C., Kearney, R., de la Torre, I., Whiten, A., Laland, K. N. (2015). Experimental evidence for the co-evolution of hominin tool-making teaching and language. Nat. Commun. 6, 6029.
Nakajima, T., Hosaka, R., Tsuda, I., Tanji, J., Mushiake, H. (2013). Two-Dimensional Representation of Action and Arm-Use Sequences in the Presupplementary and Supplementary Motor Areas. J. Neurosci. 33, 15533–15544.
Petersson, K.-M., Folia, V., Hagoort, P. (2012). What artificial grammar learning reveals about the neurobiology of syntax. Brain Lang. 120, 83–95.
Petrides, M., Cadoret, G., Mackey, S. (2005). Orofacial somatomotor responses in the macaque monkey homologue of Broca’s area. Nature 435, 1235–8.
Petrides, M., Pandya, D. N. (2009). Distinct Parietal and Temporal Pathways to the Homologues of Broca’s Area in the Monkey. PLoS Biol. 7, e1000170.
Picton, T. W. (1992). The P300 wave of the human event-related potential. J. Clin. Neurophysiol. 9, 456–479.
Powell, A., Shennan, S., Thomas, M. G. (2009). Late Pleistocene Demography and the Appearance of Modern Human Behavior. Science 324, 1298–1301.
Powell, L. E., Isler, K., Barton, R. A. (2017). Re-evaluating the link between brain size and behavioural ecology in primates. Proc. R. Soc. B Biol. Sci. 284, 20171765.
Poza-Rey, E. M., Lozano, M., Arsuaga, J. L. (2017). Brain asymmetries and handedness in the specimens from the Sima de los Huesos site (Atapuerca, Spain). Quat. Int. 433, 32–44.
Putt, S. S., Wijeakumar, S., Franciscus, R. G., Spencer, J. P. (2017). The functional brain networks that underlie Early Stone Age tool manufacture. Nat. Hum. Behav.
Savage-Rumbaugh, E. S., Murphy, J., Sevcik, R. A., Brakke, K. E., Williams, S. L., Rumbaugh, D. M. (1993). Language comprehension in ape and child. Monogr. Soc. Res. Child Dev. 58, 1–222.
Schenker, N. M., Buxhoeveden, D. P., Blackmon, W. L., Amunts, K., Zilles, K., Semendeferi, K. (2008). A Comparative Quantitative Analysis of Cytoarchitecture and Minicolumnar Organization in Broca’s Area in Humans and Great Apes. J. Comp. Neurol. 510, 117–128.
Schenker, N. M., Hopkins, W. D., Spocter, M. A., Garrison, A. R., Stimpson, C. D., Erwin, J. M., Hof, P. R., Sherwood, C. C. (2010). Broca’s area homologue in chimpanzees (Pan troglodytes): probabilistic mapping, asymmetry, and comparison to humans. Cereb. Cortex 20, 730–42.
Schoenemann, P. T. (2017). A complex-adaptive-systems approach to the evolution of language and the brain, in: Mufwene, S. S., Coupé, C., Pellegrino, F. (Eds.), Complexity in Language: Developmental and Evolutionary Perspectives, Cambridge Approaches to Language Contact. Cambridge University Press, pp. 67–100.
Schoenemann, P. T. (2013). Hominid Brain Evolution, in: Begun, D. R. (Ed.), A Companion to Paleoanthropology. Wiley-Blackwell, Chichester, UK, pp. 136–164.
Schoenemann, P. T. (2012). Evolution of brain and language, in: Hofman, M. A., Falk, D. (Eds.), Progress in Brain Research. Elsevier, Amsterdam, The Netherlands, pp. 443–459.
Schoenemann, P. T. (1999). Syntax as an emergent characteristic of the evolution of semantic complexity. Minds Mach. 9, 309–346.
Schoenemann, P. T., Holloway, R. L. (2016). Brain function and Broca’s Cap: A meta-analysis of fMRI studies. Am. J. Phys. Anthropol. 159, 283.
Schoenemann, P. T., Sheehan, M. J., Glotzer, L. D. (2005). Prefrontal white matter volume is disproportionately larger in humans than in other primates. Nat. Neurosci. 8, 242–52.
Semaw, S., Rogers, M. J., Quade, J., Renne, P. R., Butler, R. F., Dominguez-Rodrigo, M., Stout, D., Hart, W. S., Pickering, T., Simpson, S. W. (2003). 2.6-Million-year-old stone tools and associated bones from OGS-6 and OGS-7, Gona, Afar, Ethiopia. J Hum Evol 45, 169–77.
Seyfarth, R. M., Cheney, D. L., Marler, P. (1980). Monkey Responses to Three Different Alarm Calls: Evidence of Predator Classification and Semantic Communication. Science 210, 801–803.
Smith, K., Kirby, S., Brighton, H. (2003). Iterated learning: a framework for the emergence of language. Artif. Life 9, 371–86.
Snowdon, C. T. (1990). Language capacities of nonhuman animals. Yearb. Phys. Anthropol. 33, 215–243.
Stephan, H., Frahm, H., Baron, G. (1981). New and revised data on volumes of brain structures in Insectivores and Primates. Folia Primatol. (Basel) 35, 1–29.
Stout, D., Chaminade, T. (2012). Stone tools, language and the brain in human evolution. Philos. Trans. R. Soc. Lond. B Biol. Sci. 367, 75–87.
Taglialatela, J. P., Russell, J. L., Schaeffer, J. A., Hopkins, W. D. (2008). Communicative signaling activates “Broca’s” homolog in chimpanzees. Curr Biol 18, 343–8.
Thompson-Schill, S. L., D’Esposito, M., Aguirre, G. K., Farah, M. J. (1997). Role of left inferior prefrontal cortex in retrieval of semantic knowledge: a reevaluation. Proc Natl Acad Sci U S A 94, 14792–7.
Toth, N., Schick, K. (2009). The Importance of Actualistic Studies in Early Stone Age Research: Some Personal Reflections, in: Schick, K., Toth, N. (Eds.), The Cutting Edge: New Approaches to the Archaeology of Human Origins, Stone Age Institute Publication Series. Stone Age Institute Press, Gosport, IN, pp. 267–344.
Uylings, H. B. M., Van Eden, C. G. (1990). Qualitative and quantitative comparison of the prefrontal cortex in rat and in primates, including humans, in: Uylings, H. B. M., Van Eden, C. G., De Bruin, J. P. C., Corner, M. A., Feenstra, M. G. P. (Eds.), Progress in Brain Research, Vol. 85, Progress in Brain Research. Elsevier Science Publishers, New York, pp. 31–62.
Wilkins, W. K., Wakefield, J. (1995). Brain evolution and neurolinguistic preconditions. Behav. Brain Sci. 18, 161–182.
Wilson, B., Kikuchi, Y., Sun, L., Hunter, D., Dick, F., Smith, K., Thiele, A., Griffiths, T. D., Marslen-Wilson, W. D., Petkov, C. I. (2015). Auditory sequence processing reveals evolutionarily conserved regions of frontal cortex in macaques and humans. Nat. Commun. 6, 8901.
Wilson, B., Slater, H., Kikuchi, Y., Milne, A. E., Marslen-Wilson, W. D., Smith, K., Petkov, C. I. (2013). Auditory Artificial Grammar Learning in Macaque and Marmoset Monkeys. J. Neurosci. 33, 18825–18835.
Wilson, S. M., Galantucci, S., Tartaglia, M. C., Rising, K., Patterson, D. K., Henry, M. L., Ogar, J. M., DeLeon, J., Miller, B. L., Gorno-Tempini, M. L. (2011). Syntactic Processing Depends on Dorsal Language Tracts. Neuron 72, 397–403.

Mental travels and the cognitive basis of language

Michael C. Corballis
University of Auckland

I argue that a critical feature of language that distinguishes it from animal communication is displacement, the means to communicate about the non-present. This implies a capacity for mental travels in time and space, which is the ability to call to mind past episodes, imagine future ones or purely fictitious ones, and locate them in different places. While mental travel in time, in particular, is often considered to be unique to humans, behavioral and neurophysiological evidence suggests that it is evident in some form, at least, in nonhuman animals, including rodents, and may go far back in the evolution of animals that move. The linguistic capacity to share experiences that transcend space and time evolved much more recently, accelerating during the Pleistocene with increasing demands for effective cooperation and long-distance planning. The expansion of mental time travel probably co-evolved with the development of language itself, because the sharing of memories, plans, and stories vastly added to our own individual experiences. From this perspective, the critical period of the Road Map runs from the emergence of the genus Homo around 2 million years ago to the present.

Keywords: displacement, hippocampus, Homo, mental time travel, Pleistocene, theory of mind

Introduction

According to an emerging consensus, one of the critical features of language that distinguishes it from animal communication is displacement, the capacity to communicate about the non-present (e.g., Bickerton, 2014; Corballis, 2013, 2017; Dor, 2015; Gärdenfors & Osvath, 2010). As Hockett (1960) had earlier put it: “Man is apparently almost unique in being able to talk about things that are remote in space or time (or both) from where the talking goes on. This feature – ‘displacement’ – seems to be definitely lacking in the vocal signaling of man’s closest relatives, though it does occur in bee dancing.” More recently, Bickerton (2014) proposed

© 2020 John Benjamins Publishing Company

that it is displacement, rather than the use of arbitrary symbols, that provides “the road into language” (p. 93). The adaptiveness of being able to communicate about non-present events, and so share them, may well have driven the evolution of language itself. Displacement, though, depends on the capacity to think about the non-present without the use of language, a capacity that has been dubbed mental time travel (Suddendorf & Corballis, 1997), and the main contention of this article is that mental time travel has origins that long precede language itself. The quote from Hockett suggests that displacement, and perhaps mental time travel itself, is unique to humans. In contrast, this article is written in the spirit of Darwin’s (1871) maxim that “The difference in mind between man and the higher animals, great as it is, certainly is one of degree and not of kind.” In interpreting evidence from nonhuman species, the emphasis is on continuity rather than on qualitative changes in the course of evolution. This may seem to conflict with the principle of parsimony, whereby the simplest of possible explanations is to be preferred. Explanations are considered ‘simple’ to the extent to which they do not involve higher thought processes, such as language or, as discussed below, mental time travel. The ‘Clever Hans’ debacle is often cited as an example of failure to apply this principle (Sebeok & Rosenthal, 1981). Nevertheless, it is more in keeping with evolutionary theory to suppose that the evolution of cognitive processes, including mental time travel and indeed language itself, is gradual and incremental, and the burden of proof lies in showing this not to be the case, rather than in showing humans to be qualitatively different from other species. 
While this issue may be something of a “red herring” in the context of the present volume, the history of ideas about evolution is beset with claims about human uniqueness that owe more to religious or philosophical conviction than to biological reality, and the ideas expressed here and elsewhere provide something of a necessary corrective.

Mental time travel

Mental time travel includes episodic memories, imagined future scenarios, and even imagined scenarios or narratives that need have no explicit reference to time (“Once upon a time”). Since being in different places must occur at different times, mental time travel traverses space as well as time. As with language itself, there have been strong claims that mental time travel is uniquely human (Bischof-Köhler, 1985; Köhler, 1925; Roberts, 2002; Premack, 2007; Suddendorf & Corballis, 1997, 2007; Penn et al., 2008; Tulving, 2002) – an evolutionary singularity. Such claims, though, are under increasing challenge from evidence of both episodic memory and future thinking in nonhuman animals, including chimpanzees (e.g., Janmaat et al., 2014; Osvath & Karvonen, 2012), magpies (Zinkivskay et al., 2009), meadow voles (Ferkin et al., 2008), pigeons (Gibson et al., 2012), rats

(Babb & Crystal, 2005; Wilson et al., 2013), ravens (Kabadayi & Osvath, 2017), and scrub jays (Clayton et  al., 2003). Although the behavioral evidence is now voluminous and often ingenious, it is sometimes explicable in terms of processes other than mental time travel, such as trial-and-error learning or simple association (Suddendorf & Corballis, 2010). There is also ambiguity as to how episodic memory can be defined behaviorally. One suggestion is that memory for “what,” “where,” and “when” – the WWW criterion (Suddendorf & Corballis, 2007)  – provides a sufficient definition; the scrub jay that behaves as though remembering where and when it cached particular items of food may be deemed to have memory for the caching episode itself (Clayton et al., 2003). This need not mean, though, that the bird mentally re-enacts the episode; it may simply know where the food is cached and how long it has been there. I know where I was born, what was born, and when I was born, but have no episodic memory of the event, suggesting that the WWW criterion is not sufficient. This kind of criticism has led some animal researchers to write of “episodic-like memory,” as though hedging bets as to the precise nature of the memory (e.g., Clayton et al., 2003). Behavioral work also raises the question of whether episodic-like memory in some animals is a “special-purpose” function, without the broad purview of episodic memory as we understand it in ourselves. For instance, much of the evidence from birds comes from the caching of food, where recording of time and place is important, and may or may not apply in other contexts. Reference to future episodes perhaps suggests more personal reference, as when a scrub jay re-caches food items after being watched by another bird, presumably to deny the watching bird the opportunity for theft (Clayton et al., 2003). Even so, re-caching may be a learned strategy, and the bird may have no internal image of a future act of theft. 
A more capricious element seems to be injected by the chimpanzee Santino in the Furuvik Zoo in Sweden, who hides rocks in various locations with the evident intent of later hurling them at visitors to the Zoo (Osvath & Karvonen, 2012). One feature of episodic memory in humans that suggests a more general capacity, but one that is difficult to assess behaviorally, is what Tulving (2002) has called the autonoetic aspect, the sense of replaying a personal experience. When a bird behaves as though it remembers when and where it cached items of food, it is difficult to tell whether it mentally replays the act of caching, or preplays the later act of retrieving them, or indeed whether it can envisage earlier or later activity unconnected with caching itself.

Role of hippocampus and entorhinal cortex

The autonoetic aspect in nonhuman animals lacking expressive language may be better indexed through neurophysiological recording than from behavioral

measures. Aside from language itself, the best “read-out” of mental time travel, whether in humans or rodents, comes from the hippocampus, which is the hub of the episodic memory system in humans. Brain imaging shows the hippocampus to be active both when people recall past events and imagine future ones (Martin et al., 2011), and also when they simply construct imaginary scenes without specific reference to past or future (Hassabis et al., 2007). Prospective memory is also positively correlated with the grey matter volume of the medial temporal lobe, with the strongest correlation in the hippocampus itself (Gordon et al., 2011). Of course, memory does depend on circuits beyond the hippocampus, including the angular gyrus, medial prefrontal cortex, parahippocampal gyrus, and retrosplenial cortex (e.g., Rugg & Vilberg, 2013), but the dramatic loss of the ability to form new memories or even imagine future episodes following bilateral removal of the hippocampus is testimony to its central role (Corkin, 2013; Tulving, 2002; Wearing, 2005). The anterior hippocampus appears to be especially involved in the construction of recent events (e.g., 2 weeks ago), and the posterior portion in recovering more distant events (10 years ago) (Bonnici et al., 2012). Neurophysiological recordings from the rat also suggest hippocampal involvement in mental time travel. The rat hippocampus includes place cells, recording the animal’s location in space (O’Keefe & Nadel, 1978). These cells sometimes also fire in sequence after the rat has been removed from a particular environment, such as a maze. These sequences, known as “sharp-wave ripples” (SWRs), map out trajectories in the environment, as though the animal were “thinking” about its experience. 
The trajectories are sometimes “replays” of trajectories previously taken, sometimes the reverse of those trajectories (Foster & Wilson, 2006), and sometimes trajectories the animal did not take at all, some of which may be anticipations of future trajectories (Pfeiffer & Foster, 2013). Reviewing the evidence, Moser et al. (2015) write that “the replay phenomenon may support ‘mental time travel’ (Suddendorf & Corballis, 2007) through the spatial map, both forward and backward in time (p. 6).” These findings show parallels with the evidence from humans. First is the involvement of the hippocampus itself, known to act not only as a cognitive map to record current location but also for the activation of different maps; in one study, the rat hippocampus constructed eleven different maps of eleven different rooms in which it had been placed (Alme et  al., 2014). Second, SWRs imply both the replay of past episodes and the preplay of future ones, along with imagined trajectories that need bear little relation to either remembered or planned events. Third, they appear to be spontaneous and innovative, and not the products of habit. Of course, they depend also on learned knowledge of the environment, just as our own memories and plans are generally set in familiar territories. These properties are derived from recordings from animals that are stationary and removed from

the spatial environment, just as the human evidence is obtained from stationary people in a brain scanner removed from their actual travels. Of course, these mental travels may still seem limited relative to the human ability to cover extensive regions of space and time, and in the course of evolution hippocampal mapping may well have extended into what has been termed a World Graph (Guazzelli, Corbacho, Bota, & Arbib, 1998; Lieblich & Arbib, 1982). The question of whether this involved a change in kind or in degree remains a challenge for research. There are further parallels. Moser et al. note that hippocampal cells in the rat respond not only to spatial locations, but also to nonspatial features of events, such as odors, tactile inputs, and timing, and these are the same cells that fire like place cells when animals move around in space. They therefore express the location of the animal in combination with information about events that take place or took place there. Miller et al. (2013) report very similar results from single-cell recordings in the human hippocampus (in patients awaiting surgery for epilepsy); when the patients were asked to recall items they had delivered to locations in a virtual-reality environment, recall of each item activated the place cell corresponding to its location. In both rat and human, the cells that combine nonspatial content with remembered locations appear to have the hallmarks of episodic memory. Drawing on both human fMRI evidence and neurophysiological recordings in rodents, Deuker et al. (2016) write of “an event based map of memory space in the hippocampus,” scaled with “the remembered proximity of events in space and time.” Similarly, Maguire et al. (2016), in a critical evaluation of the evidence, suggest that the hippocampus plays the critical role in marshaling details for the construction of scenes, whether in navigation, imagining the future, or scene perception. 
It also underlies what has been termed “boundary extension,” in which scenes are remembered as more extensive than what was available in the sensory input (Intraub & Richardson, 1989). The concordance of hippocampal function between humans and animals, along with the (possibly ambiguous) behavioral studies, prompted me to change my earlier opinion, and argue that mental time travel is not unique to humans, but probably has a long evolutionary history (Corballis, 2013; but see Suddendorf, 2013). Of course mental travels have almost certainly adapted to changing ecological conditions, such as the incorporation of societal elements, and the added complexities of a manufactured environment, especially during the Pleistocene, but there is so far little to refute the Darwinian contention that these are differences in degree rather than in kind. Place cells in the hippocampus are modulated by a wider network of grid cells encoding spatial scale, head-direction cells, and border cells in the neighboring entorhinal cortex, which have been identified through single-cell recordings in pre-operative humans (Jacobs et  al., 2013) as well as in rodents (Hafting et  al.,

2005). The interactions among these cells provide a rapidly changing representation of an animal’s location as it moves around. Since an animal’s trajectory can be traced in sequential firing after the animal has been removed from a maze, the circuit may also mediate representation of events that are not in the present. Based largely on the evidence from rodents, Hasselmo (2009) has developed a model of “mental time travel along encoded trajectories using grid cells.” The model accounts for how lesions of the hippocampus and entorhinal cortex impair not only rats’ performance in tasks that could be solved by retrieval of trajectories, but also human episodic memory. Grid cells operate in modular fashion, allowing for different combinations. Moser et al. (2015) write:

The mechanism would be similar to that of a combination lock in which 10,000 combinations may be generated with only four modules of 10 possible values, or that of an alphabet in which all words of a language can be generated by combining only 30 letters or less. (p. 11)
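
The arithmetic behind the combination-lock analogy can be checked directly: if each of m independent modules takes one of k values, the number of joint states is k to the power m. The following is a minimal sketch of that counting argument only, not a model of grid cells; the function names are hypothetical and chosen for illustration.

```python
from itertools import product


def count_combinations(values_per_module: int, modules: int) -> int:
    """Number of joint states of `modules` independent modules,
    each taking one of `values_per_module` values (k ** m)."""
    return values_per_module ** modules


def enumerate_states(values_per_module: int, modules: int):
    """Yield every joint state as a tuple with one entry per module
    (hypothetical helper, used only to confirm the count by brute force)."""
    return product(range(values_per_module), repeat=modules)


# The lock in the quotation: four modules of 10 possible values each.
lock_states = count_combinations(10, 4)

# Brute-force enumeration agrees with the closed-form count.
assert lock_states == sum(1 for _ in enumerate_states(10, 4))
print(lock_states)  # 10000
```

Adding a fifth module of the same size would multiply the count by ten, which is the sense in which a small number of modules can support a very large number of distinct combinations.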

This suggests a generativity that may well be linked to the generativity of language itself, and even suggests a connection between spatial navigation and the generation of sentences. That is, language may allow us to form unlimited combinations of words, not because language is itself the vehicle for generativity, but because the number of episodes one might bring to mind is itself unlimited. The combinatorial possibilities may well have spiraled with the emergence of manufacture and with differentiation of societal roles, and the suggestion here is that it was these material and social changes that drove the expansion of generative language itself. One aspect of generativity, whether in language or imagination, is its recursive nature; Pinker and Jackendoff (2005) assert that “the only reason language needs to be recursive is because its function is to express recursive thoughts (p. 230).” On this view, then, mental travels in time and space may be said to occur even in the rat, and probably go far back in the evolution of animals that move and need to know where they are, where they have been, and where they might go next. These mental excursions do appear to be generative, and carry associations relating to particular events. Functionally, the process of re-enactment is closer to the autonoetic aspect, the “bringing to mind” of events, than is the WWW criterion suggested for behavioral studies. The generative aspect allows the construction of possible future events, providing choice between alternatives – although the extent to which trajectories implied by hippocampal activity in the rat are truly involved in future planning remains to be tested. Nevertheless, such enactments can be individually adaptive in the absence of any capacity to share them, and may therefore have been established well before the evolution of language, and indeed remain adaptive in humans independently of language. We have many

358 Michael C. Corballis

experiences we do not share, and even cannot share because language is itself limited. Conversely, though, sharing itself can be adaptive because it allows mental travels to be experienced vicariously. One caveat is that “replay” or “preplay” of a rat’s trajectory in a maze may have to do with consolidation rather than with episodic memory (Buzsáki, 1989). Hippocampal activity representing trajectories outside of those actually taken may then function primarily to consolidate the representation of the maze itself, and indeed extend the mental map beyond the area actually traversed, rather than to explicitly bring to mind the experience. Nevertheless, opinion seems to be converging on the view that the enactments serve a double function (Derdikman & Moser, 2010), both supplying episodic information and acting to consolidate memory for an environment. Enacting an episodic memory may influence the memory itself, since it is a constructive process and prone to error (Schacter, 2012). What is consolidated in memory, then, may be the enactment, rather than the original episode.

So to language

The suggestion here is that mental time travel long preceded language itself, and in that sense may be considered part of a “language-ready brain,” although its evolution was not driven by any communicative advantage it might have bestowed. On the other hand, in common with all the authors for this volume, I reject the notion that language is itself fundamentally a mode of thought, which Chomsky (2010) calls “I-language,” and considers unique to humans, with communication a mere by-product. The alternative proposed here is that language is fundamentally a device for communication, and only secondarily an aid to thinking. This view is similar to that of Dor (2015), who defines language as the sharing of experience, which includes our experience of the non-present (e.g., memories and future plans), but extends the notion of displacement more generally to “that which cannot be shown.” That is, we need language not only to share memories, plans, or stories, but sometimes also to explain aspects of the present, such as “this woman in red, right there by the door, is the cousin of my ex-wife” (p. 28). Sharing of episodes experienced in the present can normally be accomplished by joint perception, perhaps augmented by cues to direct attention, such as pointing. Language itself is not required. A crowd may share the experience of a football match with little other than emotional or attentional signaling – although this does not stop sports commentators on television telling us what is happening. But the sharing of mental experience relating to the non-present requires the means to address objects or events not immediately available to the senses. Listening to a radio commentary requires symbols to indicate who is doing what on the field.

Mental travels and the cognitive basis of language 359

Such descriptions can apply across time and space, as you relay what you did last week or might do tomorrow – or even tell of a recent theory of, say, language itself. Language evolved, I suggest, primarily to allow us to share experiences not linked to the present, or that “which cannot be shown.” As Dor (2015) phrased it in the title of his book, it is “the instruction of imagination.” Reference to the non-present requires symbols that can be linked to concepts. These symbols might be iconic, as in pantomimic gestures, or purely arbitrary. They are presumably learned in the present, but can then be used in the absence of their referents to refer to past or future episodes in which they feature. Thus a child learns that the word “doll” applies to that particular object, but can then use the word in describing a past episode involving a doll, or perhaps explain that she wants to play with it. She can also understand when someone else recounts a story about a doll. Using symbols to refer to concepts is a matter of association, apparently not restricted to humans. The bonobo Kanzi could find his way to particular locations from instructions as to distance and direction provided by lexigrams (noniconic visual symbols), even when the locations were novel and out of sight (Menzel et al., 2002). A chimpanzee, Panzee, could select a token to obtain a particular food in the future, even when a more immediately preferred food was available (Beran et al., 2012). Two border collies could retrieve individually named objects from a pool of objects that are out of sight, implying vocabularies of at least 200 in one case (Kaminski et al., 2004) and over 1,000 in another (Pilley & Reid, 2011), and the bonobo Kanzi also has a receptive vocabulary estimated to be around 3,000 (Raffaele, 2006) – a not inconsiderable lexicon. These examples illustrate the ability not only to make symbolic reference, but also to do so in ways that transcend time and space – albeit in limited ways. 
The human lexicon is not only receptive but also productive. A border collie may understand the word “doll” and fetch the appropriate object, but cannot utter the word itself. Yet there is at least a degree of productivity in great apes. Kanzi uses symbols productively by pointing to lexigrams, but these were created by humans. Kanzi also uses gestures, as does the chimpanzee Washoe, and the gorilla Koko is said, perhaps with an element of hyperbole, to have a gestural vocabulary of over a thousand signs referring to objects or actions (Patterson & Gordon, 2001). These symbolic acts come closer to language that is both produced and understood, although still at least partly dependent on human guidance. Nevertheless there are some claims for the spontaneous generation of gestures and mimes. Russon and Andrews (2001) counted eighteen mimes among orangutans in a forest-living enclave in Indonesia, and Hobaiter and Byrne (2011) recorded 4,397 gestures, involving at least 66 different kinds, made by chimpanzees in the Budongo National Park in Uganda.

360 Michael C. Corballis

Neural links

Language and memory have traditionally been treated as separate domains, with distinct neural substrates. Recent evidence suggests they are linked. Direct recording from the hippocampus in human patients showed that theta oscillations, a marker of memory activity, were modulated during sentence processing by linguistic constraints unrelated to memorization (Piai et al., 2016). Studies of patients with hippocampal amnesia also reveal language deficits, suggesting that the hippocampus plays a role in “the flexible use and processing of language” (Duff & Brown-Schmidt, 2012, p. 1). Commenting on these and other findings, Covington and Duff (2016) suggest that the hippocampus should be included as part of the language circuit. In humans, at least, the link between language and memory is implicit in the often-drawn distinction between nondeclarative memory, which includes tacit knowledge and skills, and declarative memory, which includes both episodic and semantic memory (e.g., Squire, 2004). The very notion of declarative memory – or memory that can be declared – recognizes the association with language. The link probably goes beyond memory to mental time travels generally, which in humans are virtually always documented from verbal reports, whether accounts of remembered events, future projections, or simply the telling of stories. The involvement of the hippocampus may also supply the link between the generativity of mental travels and that of language itself. This does not mean that the language and mental time travel circuits are identical. We can travel mentally in time and space without resort to language, and the main theme of this article is that mental time travel goes far back in evolution, and is present in animals without language. The proposal is simply that the hippocampus in humans is part of the circuit allowing our mental travels to be shared through language.

Theory of mind

One further requirement for language is theory of mind – the understanding of what others feel, think, or believe. Whether nonhuman species are capable of theory of mind has been much discussed and disputed since Premack and Woodruff’s (1978) famous question: “Does the chimpanzee have a theory of mind?” As late as 2008 the answer to Premack and Woodruff’s question was still widely disputed. Penn et al. (2008) argued that even chimpanzees, our closest nonhuman relatives, have no theory of mind, describing such attributions as “Darwin’s mistake,” while Call and Tomasello (2008) concluded that 30 years of research showed chimpanzees to have an understanding of the goals, intentions, perceptions, and knowledge of others, but no understanding of others’ beliefs or desires. A critical test, sometimes considered the gold-standard test of theory of mind, is whether the individual shows understanding that another individual has a false belief. Krupenye et al. (2016) recently showed that great apes, including chimpanzees, bonobos and orangutans, look in anticipation of where a human agent will falsely believe an object has been hidden. Such findings raise the possibility that a sophisticated theory of mind evolved before the emergence of our hominin forebears. In any case, theory of mind is probably not all-or-none. It may well have begun with empathy, the ability to read emotion (de Waal, 2012), and progressed into the more complex understanding of others’ beliefs, with varying degrees of recursion (Corballis, 2011). Theory of mind extends our mental travels beyond excursions in time and space, and into the minds of others, giving rise to storytelling through the eyes of other people, real or imaginary (Corballis, 2017). Theory of mind, though, is also critical to the act of communication itself. According to the philosopher Paul Grice (1989), communicative language can only work if the speaker knows what is in the listener’s mind, but also knows that the listener knows this. Here is the rather contorted way in which he put it:

He said that P; he could not have done this unless he thought that Q. He knows (and knows that I know that I know that he knows) that it is necessary to suppose that Q; he has done nothing to stop me thinking, or is at least willing for me to think that Q. [pp. 30–31]

This implies at least third-order recursion (“I know that he knows that I know”), and implies that language involves a good deal more cognitive processing than is implied by the words themselves. Scott-Phillips (2015) described language as underdetermined, and heavily dependent on shared thinking that is not physically expressed. More generally, social understanding may hold part of the key to the evolution of language itself, but again may not have emerged fully-fledged in humans. Based mainly on research with wild baboons, Seyfarth and Cheney (2017) show that coordinated activity in nonhuman primates has some of the structure of language, including discrete elements, open-ended rules, and computational structure, accompanied by vocal calls that by themselves convey relatively little information. When language itself was added, whether vocal or gestural, its basic grammatical features may have been already established, at least in part. Simply imagining a complex social episode involves the sequencing of concepts, perhaps forming the basis of sentence structure, with conventions added to signal relations among concepts, shifts in focus, and specifications of time and place. The cognitive linguist Gilles Fauconnier (2003) makes a similar point:


When we engage in any language activity, we draw unconsciously on vast cognitive and cultural resources, call up models and frames, set up multiple connections, coordinate large arrays of information, and engage in creative mappings, transfers, and elaborations. (p. 540)

Toward a new road map

The main theme of this article is that a key feature of language is displacement, the capacity to communicate about the non-present, or generally about that which cannot be shown. As such, it depends on the ability to think about non-present events, including what has been dubbed mental time travel. The capacity to replay or preplay events probably has a long evolutionary history, perhaps common to animals that move, and need to keep track of where they are, where they have been, and where they might go next. The capacity for mental time travel has almost certainly increased over evolutionary time, although its extent is difficult to measure in nonhuman animals lacking language. Some evidence derives from mime. Great apes use gesture more often to make requests than declarative statements, but there is some evidence for the re-enactment of past episodes (Russon & Andrews, 2001), including one case of an orangutan miming an event that occurred several months before. Chimpanzees and bonobos appear to remember specific scenes in movies, as evidenced by their reaction to a small change in the movie when it is shown a second time, 24 hours later (Kano & Hirata, 2015). Beran (2015) gives similar examples of memory for movies by language-trained chimpanzees, including their use of lexigrams to indicate what happens next in movies shown a second time. Of course these examples pale beside the human ability to remember episodes that occurred years or even decades ago, although the gap between human and ape memory may be less than we imagine. We now know that human memory for events is unreliable, even false (Loftus & Ketcham, 1994; Roediger & McDermott, 1995); episodic memory, in particular, is more an act of construction than a recording of facts, and arguably more a system for planning future acts than for accurate recording of the past (Klein, Robertson & Delton, 2010).
Moreover, the number of episodes we remember is surely tiny relative to the number we experience in life. The Czech novelist Milan Kundera (2002) may have exaggerated only slightly when he wrote that “without much risk of error I could assume that the memory retains no more than a millionth, a hundred-millionth, in short an utterly infinitesimal bit of the lived life” (pp. 122–123). I can remember exactly what I ate for lunch yesterday but nothing of what I ate on the same day last year.


The Pleistocene

A critical period for the expansion of mental time travel was the Pleistocene, when an emergent hunter-gatherer pattern favored a lifestyle extended in both space and time (Gärdenfors, 2014; Gärdenfors & Osvath, 2010). There were long delays between the acquisition and the use of tools, as well as geographical distance between the sources of raw material for tools and killing or butchering sites. Migration itself would have added to demands of time, space, and cooperation. The hunter-gatherer lifestyle involved frequent shifts of camp as resources were depleted, forcing the group to move on to another more abundant region – a pattern still evident in present-day hunter-gatherers (Venkataraman et al., 2017). Migrations also increased in scale, with exoduses of Homo erectus from Africa to Eurasia beginning from about 2 million years ago (Hughes et al., 2007), and waves of migration of Homo sapiens out of Africa from about 120,000 years ago (Timmermann & Friedrich, 2016), eventually inhabiting most of the globe. The hunter-gatherer lifestyle also increased the demand for more effective and detailed communication. Gärdenfors (2014) suggests that communication became more complex in the context of planning for future goals, requiring elaboration from single words to combinatorial structures necessary for cooperative action, and also for the development of “narratives, particularly gossip.” Language itself expanded beyond the individual sentence to narrative, including culturally important means of expression such as story-telling, legend, and myth (Barnard, 2013). Following Bickerton (1990), Gärdenfors suggests that communication developed to the level of protolanguage in Homo erectus, reaching full grammatical structure only in Homo sapiens – although there have been more recent suggestions that the Neandertals and Denisovans were also capable of grammatical language (Dediu & Levinson, 2013; Johansson, 2013).
Pantomimic communication, minimally evident in great apes, probably expanded during the early Pleistocene (Donald, 1991), before conventionalizing and grammaticalizing into modern language – whether signed or spoken. Enhanced communication would also have allowed mental time travels to be shared, so that the memories and plans of others are incorporated into our own, further increasing the demand for mental time travels. In short, communication efficacy and mental time travel co-evolved in an expanding spiral. In terms of the road map, then, the Pleistocene, and especially the interval from early Homo to the late emergence of Homo sapiens, probably assumes special importance in the journey to human cognition and language. The demands of the emergent “cognitive niche” (Tooby & DeVore, 1987) led to a broadening of spatial and temporal awareness, and an increased need for social cooperation that resulted in enhanced theory of mind and capacity for communication. These developments are indexed by a rapid increase in brain size, the emergence of facultative bipedalism, and the gradual but cumulative development of manufactured tools. Tools themselves can tell part of the story. Stout and Chaminade (2012) contrast the Oldowan tool industry with the later Acheulian industry, which emerged in the Pleistocene and shows the first signs of syntactic structure. Brain imaging of present-day individuals making these tools reveals neural overlap with language circuits for Acheulian tools, but not Oldowan ones. Stout and Hecht (2017) further elaborate the extended role of technological innovation in the emergence of the human mind. Although the Pleistocene was witness to expansions of mental time travel, communication, theory of mind, and manufacture, the overall picture is one of continuity rather than saltation. Mental time travel is probably the most ancient of the precursors to human cognition and language, evident in mammals and birds, and perhaps even in insects such as bees. Technology in corvids rivals that in the chimpanzee (Emery & Clayton, 2004; Taylor et al., 2007), and theory of mind and language-like communication can be discerned in apes, albeit controversially. But the Pleistocene itself is something of a black hole, since we have been deprived, through extinctions, of all ancestors since our common ancestry with the chimpanzee, and must rely largely on fossil evidence, including increasingly accurate extraction of ancient DNA. Even so, the picture that is emerging is one of continuous if accelerated change through the Pleistocene itself, rather than some abrupt and seemingly miraculous event that transformed Homo sapiens itself.

Funding

This paper was prepared for a workshop funded by NSF Grant No. BCS-1343544, “INSPIRE Track 1: Action, Vision and Language, and their Brain Mechanisms in Evolutionary Relationship” (M. A. Arbib, Principal Investigator).

References

Alme, C. B., Miao, C., Jezek, K., Treves, A., Moser, E. I. & Moser, M.-B. (2014). Place cells in the hippocampus: Eleven maps for eleven rooms. Proceedings of the National Academy of Sciences (USA), 111, 18428–18435.
Babb, S. J. & Crystal, J. D. (2005). Discrimination of what, when, and where: Implications for episodic-like memory in rats. Learning & Motivation, 36, 177–189.
Barnard, A. (2013). Cognitive and social aspects of language origins. In C. Lefebvre, B. Comrie & H. Cohen (Eds.), New perspectives on the origins of language (pp. 53–71). Amsterdam/Philadelphia: John Benjamins.
Beran, M. J. (2015). Animal memory: chimpanzees anticipate what comes next in short movies. Current Biology, 25, R827–R844.
Beran, M. J., Perdue, B. M., Bramlett, J. L., Menzel, C. R. & Evans, T. A. (2012). Prospective memory in a language-trained chimpanzee (Pan troglodytes). Learning & Motivation, 43, 192–199.
Bickerton, D. (1990). Language and species. Chicago, IL: University of Chicago Press.
Bickerton, D. (2014). More than nature needs: Language, mind, and evolution. Cambridge, MA: Harvard University Press.
Bischof-Köhler, D. (1985). Zur Phylogenese menschlicher Motivation [On the phylogeny of human motivation]. In L. H. Eckensberger & E. D. Lantermann (Eds.), Emotion und Reflexivität (pp. 3–47). Vienna: Urban & Schwarzenberg.
Bonnici, H. M., Chadwick, M. J., Lutti, A., Hassabis, D., Weiskopf, N. & Maguire, E. A. (2012). Detecting representations of recent and remote autobiographical memories in vmPFC and hippocampus. Journal of Neuroscience, 32, 16982–16991.
Buzsáki, G. (1989). Two-stage model of memory trace formation: A role for “noisy” brain states. Neuroscience, 31, 551–570.
Call, J. & Tomasello, M. (2008). Does the chimpanzee have a theory of mind? 30 years later. Trends in Cognitive Sciences, 12, 187–192.
Chomsky, N. (2010). Some simple evo devo theses: How true might they be for language? In R. K. Larson, V. Déprez & H. Yamakido (Eds.), The evolution of human language (pp. 45–62). Cambridge: Cambridge University Press.
Clayton, N. S., Bussey, T. J. & Dickinson, A. (2003). Can animals recall the past and plan for the future? Nature Reviews Neuroscience, 4, 685–691.
Corballis, M. C. (2011). The recursive mind: The origins of human language, thought, and civilization. Princeton, NJ: Princeton University Press.
Corballis, M. C. (2013). Mental time travel: A case for evolutionary continuity. Trends in Cognitive Sciences, 17, 5–6.
Corballis, M. C. (2017). Language evolution: A changing perspective. Trends in Cognitive Sciences, 21, 229–236.
Corkin, S. (2013). Permanent present tense: The man with no memory, and what he taught the world. London: Allen Lane.
Covington, N. V. & Duff, M. C. (2016). Expanding the language network: contributions from the hippocampus. Trends in Cognitive Sciences, 20, 869–870.
Darwin, C. (1871). The descent of man and selection in relation to sex, 2nd edition. London: John Murray.
Dediu, D. & Levinson, S. C. (2013). On the antiquity of language: The reinterpretation of Neandertal linguistic capacities and its consequences. Frontiers in Psychology, 4, article 397.
Derdikman, D. & Moser, M.-B. (2010). A dual role for hippocampal replay. Neuron, 65, 582–584.
Deuker, L., Bellmund, J. L. S., Schröder, T. N. & Doeller, C. F. (2016). An event map of memory space in the hippocampus. eLife, 5, e16534.
De Waal, F. B. M. (2012). The antiquity of empathy. Science, 336, 874–876.
Donald, M. (1991). Origins of the modern mind. Cambridge, MA: Harvard University Press.
Dor, D. (2015). The instruction of imagination: Language as a social communication technology. New York: Oxford University Press.

Duff, M. C. & Brown-Schmidt, S. (2012). The hippocampus and the flexible use and processing of language. Frontiers in Human Neuroscience, 6, article 69.
Emery, N. & Clayton, N. (2004). The mentality of crows: convergent evolution of intelligence in corvids and apes. Science, 306, 1903–1907.
Fauconnier, G. (2003). Cognitive linguistics. In L. Nadel (Ed.), Encyclopedia of cognitive science (pp. 539–543). London: Nature Publishing Group.
Ferkin, M. H., Combs, A., del Barco-Trillo, J., Pierce, A. A. & Franklin, S. (2008). Meadow voles, Microtus pennsylvanicus, have the capacity to recall the “what”, “where”, and “when” of a single past event. Animal Cognition, 11, 147–159.
Foster, D. J. & Wilson, M. A. (2006). Reverse replay of behavioural sequences in hippocampal place cells during the awake state. Nature, 440, 680–683.
Gärdenfors, P. (2014). The evolution of sentential structure. Humana.Mente Journal of Philosophical Studies, 27, 79–97.
Gärdenfors, P. & Osvath, M. (2010). Prospection as a cognitive precursor to symbolic communication. In R. Larson, V. Déprez & H. Yamakido (Eds.), Evolution of language: Biolinguistic approaches (pp. 103–114). Cambridge: Cambridge University Press.
Gibson, B., Wilkinson, M. & Kelly, D. (2012). Let the pigeon drive the bus: pigeons can plan future routes in a room. Animal Cognition, 15, 379–391.
Gordon, B. A., Shelton, J. T., Bugg, J. M., McDaniel, M. A. & Head, D. (2011). Structural correlates of prospective memory. Neuropsychologia, 49, 3795–3800.
Grice, H. P. (1989). Studies in the way of words. Cambridge, MA: Harvard University Press.
Guazzelli, A., Corbacho, F. J., Bota, M. & Arbib, M. A. (1998). Affordances, motivation, and the world graph theory. Adaptive Behavior, 6, 435–471.
Hafting, T., Fyhn, M., Molden, S., Moser, M.-B. & Moser, E. I. (2005). Microstructure of a spatial map in the entorhinal cortex. Nature, 436, 801–806.
Hassabis, D., Kumaran, D. & Maguire, E. A. (2007). Using imagination to understand the neural basis of episodic memory. Journal of Neuroscience, 27, 14365–14374.
Hasselmo, M. E. (2009). A model of episodic memory: Mental time travel along encoded trajectories using grid cells. Neurobiology of Learning & Memory, 92, 559–573.
Hobaiter, C. & Byrne, R. W. (2011). Serial gesturing by wild chimpanzees: Its nature and function for communication. Animal Cognition, 14, 827–838.
Hockett, C. F. (1960). The origin of speech. Scientific American, 203(3), 88–96.
Hughes, J. K., Haywood, A., Mithen, S. J., Sellwood, B. W. & Valdes, P. J. (2007). Investigating early hominin dispersal patterns: developing a framework for climate data integration. Journal of Human Evolution, 53, 465–474.

Intraub, H. & Richardson, M. (1989). Wide-angle memories of close-up scenes. Journal of Experimental Psychology: Learning, Memory & Cognition, 15, 179–187.
Jacobs, J., Weidemann, C. T., Miller, J. F., Solway, A., Burke, J. F., Wei, X.-X., … Kahana, M. J. (2013). Direct recordings of grid-like neuronal activity in human spatial navigation. Nature Neuroscience, 16, 1188–1190.
Janmaat, K. R. L., Polansky, L., Ban, S. D. & Boesch, C. (2014). Wild chimpanzees plan their breakfast time, type, and location. Proceedings of the National Academy of Sciences (USA), 111, 16343–16348.
Johansson, S. (2013). The talking Neandertals: What do fossils, genetics, and archeology say? Biolinguistics, 7, 35–74.
Kabadayi, C. & Osvath, M. (2017). Ravens parallel great apes in flexible planning for tool-use and bartering. Science, 357, 202–204.
Kaminski, J., Call, J. & Fischer, J. (2004). Word learning in the domestic dog: evidence for ‘fast mapping’. Science, 304, 1682–1683.
Kano, F. & Hirata, S. (2015). Great apes make anticipatory looks based on long-term memory of single events. Current Biology, 25, 2513–2517.
Klein, S. B., Robertson, T. E. & Delton, A. W. (2010). Facing the future: Memory as an evolved system for planning future acts. Memory & Cognition, 38, 13–22.
Köhler, W. (1925). The mentality of apes. New York: Routledge & Kegan Paul. (Originally published in German in 1917).
Kundera, M. (2002). Ignorance. New York: HarperCollins.
Lieblich, I. & Arbib, M. A. (1982). Multiple representations of space underlying behavior. The Behavioral & Brain Sciences, 5, 627–659.
Loftus, E. & Ketcham, K. (1994). The myth of repressed memory. New York: St. Martin’s Press.
Maguire, E. A., Intraub, H. & Mullally, S. L. (2016). Scenes, spaces, and memory traces: What does the hippocampus do? The Neuroscientist, 22, 432–439.
Martin, V. C., Schacter, D. L., Corballis, M. C. & Addis, D. R. (2011). A role for the hippocampus in encoding simulations of future events. Proceedings of the National Academy of Sciences (USA), 108, 13858–13863.
Menzel, C. R., Savage-Rumbaugh, S. & Menzel, E. W., Jr. (2002). Bonobo (Pan paniscus) spatial memory and communication in a 20-hectare forest. International Journal of Primatology, 23, 601–619.
Miller, J. F., Neufang, M., Solway, A., Brandt, A., Trippel, M., Mader, I., … Schulze-Bonhage, A. (2013). Neural activity in human hippocampal formation reveals the spatial context of retrieved memories. Science, 342, 1111–1114.
Moser, M.-B., Rowland, D. C. & Moser, E. I. (2015). Place cells, grid cells, and memory. Cold Spring Harbor Perspectives in Biology, 7, a021808.
O’Keefe, J. & Nadel, L. (1978). The hippocampus as a cognitive map. Oxford: Clarendon Press.
Osvath, M. & Karvonen, E. (2012). Spontaneous innovation for future deception in a male chimpanzee. PLoS ONE, 7, e36782.
Patterson, F. G. P. & Gordon, W. (2001). Twenty-seven years of Project Koko and Michael. In B. Galdikas, N. E. Briggs, L. K. Sheeran & J. Goodall (Eds.), All apes great and small, Vol. 1: African Apes (pp. 165–176). New York: Kluwer.

Penn, D. C., Holyoak, K. J. & Povinelli, D. J. (2008). Darwin’s mistake: Explaining the discontinuity between human and nonhuman minds. Behavioral & Brain Sciences, 31, 108–178.
Pfeiffer, B. E. & Foster, D. J. (2013). Hippocampal place-cell sequences depict future paths to remembered goals. Nature, 497, 74–79.
Piai, V., Anderson, K. L., Lin, J. J., Dewar, C., Parvizi, J., Dronkers, N. F. & Knight, R. T. (2016). Direct brain recordings reveal hippocampal rhythm underpinnings of language processing. Proceedings of the National Academy of Sciences (USA), 113, 11366–11371.
Pilley, J. W. & Reid, A. K. (2011). Border collie comprehends object names as verbal referents. Behavioural Processes, 86, 184–195.
Pinker, S. & Jackendoff, R. (2005). The faculty of language: What’s special about it? Cognition, 95, 201–236.
Premack, D. (2007). Human and animal cognition: Continuity and discontinuity. Proceedings of the National Academy of Sciences (USA), 104, 13861–13867.
Premack, D. & Woodruff, G. (1978). Does the chimpanzee have a theory of mind? Behavioral & Brain Sciences, 1, 515–526.
Raffaele, P. (2006). Speaking bonobo. Smithsonian Magazine, November 2006.
Roberts, W. A. (2002). Are animals stuck in time? Psychological Bulletin, 128, 473–489.
Roediger, H. L. & McDermott, K. B. (1995). Creating false memories: Remembering words not presented in lists. Journal of Experimental Psychology: Learning, Memory and Cognition, 21(4), 803–814.
Rugg, M. D. & Vilberg, K. L. (2013). Brain networks underlying episodic memory retrieval. Current Opinion in Neurobiology, 23, 255–260.
Russon, A. & Andrews, K. (2001). Orangutan pantomime: Elaborating the message. Biology Letters, 7, 627–630.
Schacter, D. L. (2012). Adaptive constructive processes and the future of memory. American Psychologist, 67, 603–613.
Scott-Phillips, T. (2015). Speaking our minds: Why human communication is different, and how language evolved to make it special. Basingstoke, UK: Palgrave Macmillan.
Sebeok, T. A. & Rosenthal, R. (Eds.) (1981). The Clever Hans phenomenon: Communication with horses, whales, apes, and people. New York: New York Academy of Sciences.
Seyfarth, R. M. & Cheney, D. L. (2017). Precursors to language: Social cognition and pragmatic inference in primates. Psychonomic Bulletin & Review, 24, 79–84.
Squire, L. R. (2004). Memory systems of the brain: A brief history and current perspective. Neurobiology of Learning and Memory, 82, 171–177.
Stout, D. & Chaminade, T. (2012). Stone tools, language and the brain in human evolution. Philosophical Transactions of the Royal Society of London B, 367, 75–87.
Stout, D. & Hecht, E. E. (2017). Evolutionary neuroscience of cumulative culture. Proceedings of the National Academy of Sciences (USA), 114, 7861–7868.
Suddendorf, T. (2013). Mental time travel: continuities and discontinuities. Trends in Cognitive Sciences, 17, 151–152.
Suddendorf, T. & Corballis, M. C. (1997). Mental time travel and the evolution of the human mind. Genetic, Social, and General Psychology Monographs, 123, 133–167.
Suddendorf, T. & Corballis, M. C. (2007). The evolution of foresight: What is mental time travel, and is it unique to humans? Behavioral and Brain Sciences, 30, 299–351.
Suddendorf, T. & Corballis, M. C. (2010). Behavioural evidence for mental time travel in nonhuman animals. Behavioural Brain Research, 215, 292–298.
Taylor, A. H., Hunt, G. R., Holzhaider, J. C. & Gray, R. D. (2007). Spontaneous metatool use by New Caledonian crows. Current Biology, 17, 1504–1507.
Timmermann, A. & Friedrich, T. (2016). Late Pleistocene climate drivers of early human migration. Nature, 538, 92–95.
Tooby, J. & DeVore, I. (1987). The reconstruction of hominid evolution through strategic modeling. In W. G. Kinzey (Ed.), The evolution of human behavior: Primate models (pp. 183–227). Albany, NY: SUNY Press.
Tulving, E. (2002). Episodic memory: From mind to brain. Annual Review of Psychology, 53, 1–15.
Venkataraman, V. V., Kraft, T. S., Dominy, N. J. & Endicott, K. M. (2017). Hunter-gatherer residential mobility and the marginal value of rainforest patches. Proceedings of the National Academy of Sciences (USA), 114, 3097–3102.
Wearing, D. (2005). Forever today. New York: Doubleday.
Wilson, A. G., Pizzo, M. J. & Crystal, J. D. (2013). Event-based prospective memory in the rat. Current Biology, 23, 1089–1093.
Zinkivskay, A., Nazir, F. & Smulders, T. V. (2009). What–Where–When memory in magpies (Pica pica). Animal Cognition, 12, 119–125.

The Road Map

The comparative neuroprimatology 2018 (CNP-2018) road map for research on How the Brain Got Language

Michael A. Arbib,1 Francisco Aboitiz,2 Judith M. Burkart,3 Michael Corballis,4 Gino Coudé,5 Erin Hecht,6 Katja Liebal,7 Masako Myowa-Yamakoshi,8 James Pustejovsky,9 Shelby Putt,10 Federico Rossano,1 Anne E. Russon,11 P. Thomas Schoenemann,10 Uwe Seifert,12 Katerina Semendeferi,1 Chris Sinha,12 Dietrich Stout,13 Virginia Volterra,14 Sławomir Wacewicz15 and Benjamin Wilson16

1University of California at San Diego / 2Universidad Católica de Chile / 3Universität Zürich / 4University of Auckland / 5Institut des Sciences Cognitives Marc Jeannerod – CNRS / 6Georgia State University / 7Freie Universität Berlin / 8Kyoto University / 9Brandeis University / 10Indiana University / 11Glendon College of York University / 12Hunan University / 13Emory University / 14Italian National Research Council, CNR / 15Nicolaus Copernicus University / 16Newcastle University

We present a new road map for research on “How the Brain Got Language” that adopts an EvoDevoSocio perspective and highlights comparative neuroprimatology – the comparative study of brain, behavior and communication in extant monkeys and great apes – as providing a key grounding for hypotheses on the last common ancestor of humans and monkeys (LCA-m) and chimpanzees (LCA-c) and the processes which guided the evolution LCA-m → LCA-c → protohumans → H. sapiens. Such research constrains and is constrained by analysis of the subsequent, primarily cultural, evolution of H. sapiens which yielded cultures involving the rich use of language.

Keywords: brain evolution, cultural evolution, EvoDevoSocio, language-ready brain, language evolution, neurolinguistics, neuroprimatology, primate communication, protolanguage, social interaction

https://‍ © 2020 John Benjamins Publishing Company


An overall perspective

The present paper presents the Comparative Neuroprimatology 2018 (CNP-2018) Road Map based on the papers of this volume. The comparative neuroprimatology framework for study of language evolution assesses relevant data and theories concerning the brains, behaviors and communication systems of monkeys, apes and humans to raise hypotheses about LCA-m (our last common ancestor with monkeys) and LCA-c (our last common ancestor with chimpanzees and apes more generally) as a basis for investigating the biological and cultural evolution of the human language-ready brain. Four assumptions are shared by the authors of this road map (though several may remain controversial in the language evolution community at large):

1. Our ancestors evolved a capability for protolanguage – which had an open lexicon but little if any syntax – before they developed language. Here, “protolanguage” is being used in the sense of “something intermediate between (i) the communication systems of LCA-c and (ii) language – but which is not itself a language.”

2. The quest is an exercise in EvoDevoSocio – the view that biological evolution defines developmental systems that can both shape and be shaped by cultural evolution, the dynamic emergence of habits of social interaction. We seek to understand how biological evolution yielded brains and bodies (Evo) that could develop (Devo) in a culture that already had (proto)language (Socio) so that children could master the use of that language with the help of caregivers to support (proto)language acquisition. And how did these brains enable humans in interaction to support the (extended and polymorphic) emergence of languages followed in turn by historical language change?

3. The study of language evolution must include brain mechanisms, comparing human brain imaging and lesion data with data on brain mechanisms for “language-related” functions in other species to ground an understanding of what has been conserved and what has been changed in human brain evolution. However, “language-related” functions need not be communicative.

4. Shared mechanisms that support signed as well as spoken languages are crucial. Nonetheless, the importance of spoken language requires us to understand the evolution of human vocal control.

Evolution works by bricolage (tinkering). It does not produce the optimal software on the optimal hardware. Instead, it yields a cultural artifact riddled with historical contingency on a brain whose genetic code reflects selection without conscious design operating on structures (DNA, membrane and cytoplasm) that are far removed from what we might think are the crucial design features of language.


To understand the evolution of the language-ready brain we need to understand mechanisms and processes and their variation across (at least) primates. Just because processes have the same name (e.g., imitation, pantomime, cognition, theory of mind) does not mean that they are implemented by the same circuits across species or even in one brain. To keep citations to a minimum, we place the name of each author in bold italics to refer the reader to their individual papers in this volume (listed in the bibliography, which provides names of any co-authors, with each marked with a * and details of their original publication in Interaction Studies) for detailed references.

Aspects of language to be explained

We start by listing some key properties shared by human languages.

Language is a special form of communication

Lexicon and grammar

A language provides a framework for sharing of meaning in a community by combining words (we use the term to include, e.g., the signs of a signed language), perhaps modifying the words in the process, to express both familiar and novel meanings and to understand (more or less) the novel utterances of others (parity of comprehension and production). It combines an open-ended lexicon with a rich grammar that supports a compositional semantics.

The endless aboutness of language

A human language is a mechanism to support sharing of meaning in a community about physical and mental worlds. Components of this ability include:

Here-and-Now: A commonly shared assumption is that the primary drive in the evolutionary path to language was the value of being able to coordinate current behavior, with joint attention supporting the sharing of perception of the current environment and plans for acting within that environment (Common Ground).

Theory of mind: The ability to talk about the mental (including emotional) states of others; this may rest on an ability, possibly shared to some extent with other species, to infer the mental states of others, and use these to predict behavior.

Displacement: Moving beyond the co-situated context, language builds on capacities for episodic memory, planning and imagination to support the ability to talk about distant events as well as about the past and the possible future, as well as counterfactuals.

Abstraction: Moving from embodied grounding to disembodied abstractions.

Language as a tool for thought versus language as a means of communication: Communicative tools are already tools for thought. Directing the goal/intention of a linguistic expression reflectively can be seen as self-communication, the origin of the alleged “non-communicative” aspect of language.

Social structure and the motivation to converse

A prominent question at the workshop was “Why do we talk?” but a more fruitful question is “Why do we converse?” (i) In few species of nonhuman primates does communication involve a back and forth, whereas the dominant form of nonwritten language is conversation. (ii) The word “talk” overemphasizes the use of speech. The OED definition of conversation includes a quote from Boswell’s Life of Johnson (ed. 2, 1793) that cites Johnson as saying “we had talk enough, but no conversation; there was nothing discussed.” This stresses the importance of the endeavor to develop a shared mental understanding, and this seems to require some aspects of theory of mind.

The motivation to converse is one linkage between emotion and language. Here we can note two further aspects: An utterance may be emotionally charged by the way in which prosody, facial expression and posture are integrated with its production – compare the power of music to sway the emotions. However, language can also express emotions without the speaker or signer being emotionally engaged, with emotion serving as just another domain for “aboutness.”

Action, gesture and language

Human languages are in most cases spoken languages (though their reach is extended by the emergence of writing), but the languages of the Deaf are fully-formed languages which rely on manual signs (supplemented, e.g., by facial expressions) and make no use of voice. When people do speak, their speech is complemented by co-speech gestures of the hands as well as facial expressions. The puzzle is this: Nonhuman primates exhibit very little in the way of vocal control but do exhibit dexterous manual control. Why, then, did vocal control evolve as part of the human brain’s distinctive capabilities, since language could “manage without it”? And how relevant does manual action remain in understanding the brain mechanisms of language?

The notion, in any case, is that the brain mechanisms supporting language can – to a first approximation – be separated from the mechanisms that recognize words in the sensory (auditory or visual) input and generate words in the motor (vocal or manual) output. This does not deny that neural plasticity will differentially restructure the brain dependent on whether language is spoken or signed, just as literacy can restructure it (Dehaene et al., 2010).


Methodologies

At the cost of some redundancy, we precede our specification of the new road map by illustrating a range of methodologies relevant to it.

Neurophysiology and comparative neuroanatomy

At a gross level, neuroanatomy characterizes distinctive brain regions and the pathways connecting them. At a finer level, it may seek to distinguish the cell types of different regions and the patterns of connectivity within and across those regions. Comparative neuroanatomy can thus suggest hypotheses about the evolutionary relationship of brains of LCA-m, LCA-c and modern humans, enriched by suggestions concerning the functions of specific regions.

Aboitiz compares the anatomy of macaque and human brains in seeking to assess how changing connectivity might have supported the emergence of an auditory working memory (WM) that could provide the “phonological loop” for language. The reference point is that macaques have good visual WM but poor auditory WM.

Hecht focuses on connectivity using diffusion tensor imaging (DTI) in living brains to compare pathways engaged in visuomotor integration across macaque, chimpanzee and human, while discussing the importance of changes in mechanisms of plasticity in complementing the “innate ground plan” of each brain. (Unfortunately, new NIH policies may preclude further US studies of DTI for great apes.)

Semendeferi increases the level of detail by staining brains of different primates to reveal changes in neuronal structure that underlie differences in the substructure of different nuclei, especially those related to emotion.

Neurophysiology then enriches the comparative database by looking at the dynamic activity of the brain while the human or animal performs specific tasks. In monkeys, we have data on the fine structure of firing of individual neurons in a circuit. In humans, we have imaging techniques that can follow fine timing with very poor spatial resolution (e.g., EEG) or coarse timing with better spatial resolution (but still in terms of millions of neurons as the unit, e.g., fMRI).

Coudé uses neurophysiology to explore detailed activity of neurons, and especially mirror neurons, in the manual and orofacial regions of region F5 of macaque prefrontal cortex and, crucially, links this to neuroanatomy showing that these two regions are linked to very different subsystems of the macaque brain.

Wilson uses neuroimaging to study sequence processing in the macaque brain to assess what is conserved in the human brain and what additions may have evolved to support syntax.


Behavior, social structure and communication

Both field studies and studies in the lab can provide useful information for our quest even in the absence of neural correlates. Different species differ not only in their behavior and communication but also in the social structure in which these are embedded. Exploring the relationship between social structure and forms of communication may help us better distinguish the evolution of “social support for extensive communication” from the evolution of the general form of language (lexicon, grammar, compositional semantics) on which cultural evolution has played extensive variations – without ignoring the eventual need to explore the interactions between these two evolutionary foci.

Liebal surveys gestures, calls and facial expressions in nonhuman primates to question the view that gestures are intentional whereas facial expressions and vocalizations are emotional. This may accord with a broader theoretical assessment of the linkage between emotion and intentions, and the observation that what distinguishes vervet alarm calls is not emotion (each expresses a fearful situation) but rather the difference between eagle, snake and leopard.

Burkart notes that callitrichid monkeys (e.g., marmosets) appear to exhibit particularly elaborate vocal communication, including vocal turn-taking. She explores the hypothesis that this is linked with cooperative breeding (i.e., infant care shared among group members). Among primates, this rearing system is correlated with proactive prosociality, which can be expressed as a motivation to share information. Since humans are the only cooperative breeders in primates besides callitrichids, cooperative breeding may contribute to understanding why language evolved in our species, rather than in any other primate.

Rossano uses comparative study of social manipulation, turn-taking and cooperation in apes to develop implications for the evolution of language-based interaction in humans – but note the emphasis here on social conditions for such interactions, not the particularities of language as distinct from other forms of social competition and cooperation. Importantly, he further stresses the need for longitudinal studies to explore the emergence of different gestures in apes in comparison with language development in children.

Both imitation and pantomime have figured in discussions of the relation between action and language. Russon analyzes imitation in orangutans and offers evidence that they have some form of pantomime and that its use does not involve imitation. Myowa compares imitation in chimpanzees and young children to assess their different styles of imitation, noting that chimpanzees attend to the hands of the imitatee while children also glance back and forth at facial expression. The key lesson is that the same term may be employed for an ape and a human capacity but that there may be differences that require an evolutionary explanation.


Volterra focuses on humans, revealing a developmental progression from actions via gestures to “words” (whether signed or spoken). Does this support the hypothesis of an evolutionary progression from manual action via gesture to protolanguage? The answer will require a delicate treatment of the relation between phylogeny and ontogeny. Corballis offers a comparative perspective (not limited to primates) of episodic memory, broadly construed, to suggest an evolutionary basis for a key property of language, displacement.

Archeology

Archeology asks what can be learned from the remains of protohumans (australopithecines and predecessors of sapiens in the genus Homo) and early humans and their artefacts. New findings about Neandertal culture are further enriching the database. Here, the primates with whom modern humans are being compared are all extinct hominins rather than extant apes or monkeys.

Schoenemann focuses in part on the sparse set of skulls of Australopithecus and Homo and the somewhat limited inference of relative size of different cortical regions from endocasts of the skulls whose indentations are indicative of gross cortical shape.

Cognitive archeology examines “cultural remains” of the daily lives of our ancestors to hypothesize the cognitive processes involved in their making and use. Stout and Putt carry this further, employing “neuro-archeology” – they teach modern humans to make stone tools of the kind found by archeologists; see what parts and connections of the brain are “exercised” by learning the ancient skill; and hypothesize that their enlargement may have been a step in brain evolution.

High-level theory

Diverse “high-level” theoretical approaches may complement attempts to generate and directly address the data of comparative neuroprimatology. Wacewicz offers an approach more consonant with general evolutionary theory to complement the work of Burkart and Rossano by emphasizing trust, cooperation and turn-taking in language origins. Sinha offers a general EvoDevoSocio perspective that highlights the role of biological and cultural co-evolution, with particular emphasis on the evolution of praxis, symbols and infancy. Seifert exemplifies a broader assessment of culture-readiness by investigating what is and is not shared between music-readiness (more attuned to emotional expression?) and language-readiness (more attuned to propositional content?).


Modeling and mechanism

Pustejovsky probes the relation between action, perception and language, taking a step toward modeling the actual mechanisms that may underlie the use of language.

Arbib 1 (2018a) introduces explicit modeling of biologically plausible neural networks, including frontoparietal interactions in macaque brain for the control of grasping, development of mirror neurons for manual actions, and opportunistic scheduling of sequences of actions. He then suggests how macaque mechanisms could be augmented to supply a hypothetical model of the ape brain adequate to support the emergence of novel gestures through dyadic interaction.

Arbib 2 (2018b) offers a complementary style of modeling, schema theory, that can be applied to other primates but is especially relevant when modeling human capabilities such as visual scene understanding and language use for which data on activity at the neural level is sparse or unavailable. He models the “aboutness” of language in comprehension and production, and develops hypotheses about the evolutionary relation between manual action and language.

Genetics

Genetics lies outside the scope of the present road map, and thus is of high priority for its sequels. Note the distinction between finding genes that act “merely” as markers (these remains are sapiens, those are Neandertal – but even these may be relevant to establishing timelines) and those that can be linked to changing functionality of brain or body. One clear target is the assessment of how different forms of neural plasticity may have evolved to provide circuits with novel capacities for learning.

Road map preliminaries

Using the term “hominin” for genera that emerged after the split from the great apes (australopithecines, Homo), with “hominid” including the great apes as well, we base the evolutionary account on four (probably overlapping) stages. Each subsequent stage raises the question: What is new here, and how did it build on or depart from features of the previous stage?

LCA-m. Database: Monkeys.
LCA-c. Database: Modern great apes (thus Russon’s suggestion that LCA-ga would be a better term).
After LCA-c. Database: Hominin fossil record up to c. 200 Kya.
Modern Homo sapiens. Database: Archeology since c. 200 Kya, historical record and current observation.


We briefly summarize key “landmarks” and “connecting roads” for these stages but do not provide references for the details. Instead, we mark items MSH if they are part of the mirror system hypothesis as set forth in Arbib (2012), and present the name of the author in bold italics if they have discussed this item in this volume. Areas of disagreement help define key challenges for future research. Since the length of the paper is limited, key points are omitted, but we have aimed to provide a firm framework for future elaboration. Since the body of actual and future research on each stage is overwhelming, the meta-challenge is to assess what it is at each stage that may be relevant to understanding “how the brain got language.” For example, if we view speech as the sine qua non for language, we might focus on monkey calls as a prime dataset. If we emphasize that human languages may be signed, then ape gesture may seem equally relevant. But once one looks at manual gesture, one may return to monkeys to study manual action more generally. Similarly, one may look at modern languages in the richness of their aboutness, or one may instead focus simply on the ability to string words into sequences, and then emphasize mechanisms in the monkey brain that support sequential behavior. We espouse a comprehensive framework.

Establishing the “Stages”

What capabilities of brain, behavior and communication should define stages in our road map? We need to avoid being seduced by the metaphor of the evolutionary tree, for we now understand that extant species at one stage may evolve differentially yet continue to cross-breed – and so at each stage we may establish a suite of capabilities that may have been distributed across different species and populations. What evolutionary principles could explain how the human brain might aggregate them? Moreover, primates in human captivity may acquire capacities never seen in the wild. To what extent does that imply they have the brain mechanisms to support that capacity but do not have the capability for the cultural evolution that led to that capacity?

Modern species did not evolve from each other. Thus, one challenge is to study various extant monkeys to extract a shared core (brain, behavior, communication) to define the LCA-m baseline. But what of traits seen in some monkey species that don’t meet our criteria for LCA-m and yet are shared by humans? Perhaps convergent evolution was involved. But if so, we must hypothesize where and how this property re-emerged. Examples: Vocal turn-taking in the marmoset does not seem to qualify as a property of LCA-m. One group of cebus (capuchin) monkeys exhibits tool use (using stones to crack palm nuts); other groups and other species do not. Burkart assesses how the former may provide insights into human social structure; the latter may be relevant for placing (proto)human stone tools (Stout and Putt) in an evolutionary context. Similarly, we need a fuller assessment of what properties of present-day great apes can plausibly be attributed to LCA-c or may offer suggestions for convergent evolution.

In search of precise terminology

Another methodological challenge is that many of the terms that appear in this field have different meanings when applied to different species. Future work must refine the terminology to the point where we can address the questions: Which definition best characterizes the version seen in one species rather than another, and how does this license the version(s) posited for LCA-m, LCA-c and later? For version X, can we establish the properties of the X-ready brain and the cultural conditions (if necessary) that support its expression?

Then, when we note version X posited for one ancestral species and version Y posited for a later ancestral species, we must investigate: Is X a precursor of Y, or was it a terminological “coincidence” that X and Y are refinements of the same term? If we can establish that X is a precursor of Y, is an X-ready brain also a Y-ready brain, with the evolution being primarily cultural? Or is a Y-ready brain different from an X-ready brain, so that biological as well as cultural evolution is involved?

Note that these questions apply more generally. For example, while all would agree that an H. sapiens language-ready brain is a reading-ready brain, some may disagree with the view (held in MSH) that a protolanguage-ready brain is already language-ready.

Here are four of the terms whose refinement is relevant to defining our road map:

Imitation: Inspired in part by Byrne and Russon (1998), MSH distinguished very limited imitation (e.g., effector priming) in LCA-m, “simple” imitation in LCA-c and “complex” imitation in humans, but Myowa adds a new dimension to complex imitation – attention to emotional state as well as the performance of the skill.

Pantomime: MSH defines a form of pantomime that builds on complex action recognition (a prerequisite of complex imitation) and posits that it evolved in the hominin line; Russon reports on pantomimes in orangutans and so posits pantomime as a component of LCA-c (her LCA-ga).

Turn-taking: Burkart assesses turn-taking in callitrichids; Rossano presents three variants of turn-taking, suggesting that the one applicable to human language may not be a descendant of the callitrichid version (see also Wacewicz for a similar view).


Episodic Memory: Corballis offers a wide range of capabilities (e.g., navigation in rats, recall of sites where food is cached by squirrels) as examples of the great ancestral depth of episodic memory; Pustejovsky sees the ability to conceptualize events, extracting them from the embodied flow of experience, with recalling such events as the form of episodic memory underlying much of language use, as unique to humans.

Beyond the primates

Parrots and some other birds have flexible vocal production and imitation, while dogs can acquire a large receptive (not productive) vocabulary for spoken commands; none of these appear to have grammar. Much is to be learned from studying such capabilities and their neural basis, but while such studies will usefully complement work on primates (Petkov & Jarvis, 2012), these lie outside our present purview.

The CNP-2018 road map

Capabilities of LCA-m

MSH: Manual dexterity with a related mirror neuron system supporting action recognition, but no capacity for “real” imitation. Integration of the mirror system with systems “beyond the mirror,” including a visual dorsal “how” pathway and ventral “planning” pathway for the reach-to-grasp system. Serial behavior including opportunistic scheduling of actions.

Coudé argues that more attention must be paid to the oro-facial mirror system. Whereas the manual system is related to parietal-premotor circuits, the orofacial system connects with limbic structures. Exploring the linkage between these two systems could underpin efforts to chart evolution of the linkage between communication and emotions.

Arbib 2 addresses the “aboutness” of language by suggesting that a system for linking visual perception of the current environment to a plan for manual action may be the precursor for a system of semantic representation in the language-ready brain. This notion needs to be assessed in relation to models of sequential behavior that set the baseline for Wilson’s exploration of their relevance to syntax.

LCA-m is posited to have vocal communication (an innate call repertoire) but (almost) no vocal learning and little importance for manual gesture. Where MSH posits (at later stages) that (manual) protosign provided the scaffolding for the evolution of flexible vocal control and learning, Aboitiz (without denying the importance of gesture) argues for a direct road in evolving speech, requiring more careful attention to the auditory system and precursors to vocal control in monkeys. He is particularly concerned with precursors of the form of working memory in humans called the phonological loop. Resolving the debate between the “vocal control first” and “semantics first” hypotheses is a major challenge.

Corballis challenges us to assess what form of episodic memory LCA-m had: was it more than the ability to form a limited cognitive map?

Capabilities of LCA-c LCA-m properties are conserved, but further capacities become available. MSH emphasizes simple imitation, attempting to use familiar manual actions to achieve recognizable goals and the use of gesture to communicate (but not to converse). Arbib 1 offers a model of how some of these could be learned by ontogenetic ritualization without dependence on imitation. For MSH, learnable gestures are on the path to language whereas primate calls are not (recall the debate on whether protosign provided essential scaffolding for the evolution of speech). The use of gestures shows that intentional communication is already established in LCA-c. The acquisition of human-demonstrated “symbols” by enculturated apes shows that LCA-c was symbol-ready, even though LCA-c “cultures” were not symbol-rich. In what sense are these symbols similar to those of humans, with their rich conceptual repertoire? Liebal challenges us to assess vocal calls and manual gestures in monkeys and great apes as a basis for better defining the evolutionary path (changes in brain and culture) that link them, and for grounding a more careful analysis of their links to emotion and intentionality. Semendeferi compares emotion-related structures in different great ape species, assessing what they might offer in defining the LCA-c brain in contrast with the human brain to suggest that an expanded capacity for emotional processing could be linked to language-readiness. Determining the relevant connections and assessing their role in linking emotion and communication remains a crucial challenge. Helping address this will be Coudé’s enrichment of the macaque mirror system database, and Hecht’s use of DTI to compare mirror neuron connectivity in macaque, chimpanzee and human. A further challenge is to relate this to comparative studies of language-related connectivity (Rilling, 2014) as differentially assessed by Aboitiz and Arbib 2. 
Russon presents pantomimes observed in great apes (especially orangutans) to argue for pantomime as a capacity of LCA-c (her LCA-ga). She observes that great ape pantomime does not rest on imitation, whereas MSH posits (next section) that complex action recognition and imitation evolved post LCA-c and prior to pantomime. Do her data invalidate the MSH claim or is this rather a challenge for terminological refinement? In either case, modeling brain mechanisms supporting these forms of pantomime will be crucial to assessing these social functions.

Hominins prior to Homo sapiens

MSH posits a sequence of five stages from LCA-c to language-ready Homo sapiens: The first combines complex action recognition, the ability to attend to the subgoals and some details of the constituent movements of an observed behavior, with complex imitation, the ability to use such recognition to acquire new skills. Second, pantomime emerges, based on complex action recognition (but perhaps not on imitation), supporting the creation of novel pantomimes “on the fly” and the ability of others to recognize them. This opens up semantics beyond the limited range offered by innate vocalizations and ape gestures. Third, frequently used pantomimes become conventionalized within groups to provide protosign. Fourth, early protosign constructed the niche for the emergence of sophisticated vocal learning and control, thus augmenting protosign with protospeech in an expanding spiral. Fifth comes protolanguage, the capacity to recognize classes of events and link them to “protowords,” whether signed or spoken.

Clearly, each claim here offers challenges. We have already mentioned the debates over speech and pantomime. One may add the Corballis-Pustejovsky debate over when event perception became developed enough to support protolanguage, let alone the ability to converse about past and imagined events. Did the latter occur with early protolanguage, or did it await the emergence of language? But even were the above sequence correct, serious problems remain. Here are a few:

Timeline: When did these substages occur? In australopithecines? Did H. habilis or H. erectus or yet uncharted forms of early Homo see the emergence of key innovations? Schoenemann assesses the data from endocasts but these provide weak constraints since we lack insights into what it really takes for a brain to support any of the above capabilities.

Social structure: What social structure was necessary for the success of these innovations? Rossano offers a comparative view of different patterns of social interaction and communication that apes may exhibit. A major challenge then is to assess what combination of these were relevant to the evolution of language, and how they contributed to the “platform of trust” that Wacewicz sees as necessary for the success of (proto)language and the ability of children to acquire it – without denying the capacity of humans (shared with chimpanzees) to steal, and to violate that trust in diverse ways (Byrne & Whiten, 1988).

Culture, more generally: Brains do not fossilize and the evidence from endocasts is limited. We have no record of language before the invention of writing a few thousand years ago. But we do have a profusion of stone tools and other artefacts. Stout and Putt combine instructing modern humans in Oldowan versus Acheulean stone knapping with brain imaging to hypothesize what might have changed in the parieto-frontal system to support these technologies. Stout assesses the pedagogy involved to calibrate forms of imitation and assess the level of (proto)‍language that might have been needed to support training in the relevant skills. For each form of culture, we must assess to what extent its evolution depended on the biological or cultural evolution of (proto)‍language, and to what extent it contributed to it. Seifert explores a possible relationship between the evolution of the language-ready brain and music-ready brain and raises questions as to what may be shared (could prosody be part of the overlap?) and what is distinctive. Here, again, we face the issue of what makes a brain “ready” for a domain of culture, and how cultural evolution may have exploited those resources.

Post-biological evolution in Homo sapiens

MSH holds that early Homo sapiens had protolanguages (diverse “protowords” with little or no grammar) in vocal and manual modalities, but not languages (with a grammar to support compositional semantics) – and that it was cultural evolution that underlay the transition via increasingly complex protolanguages to languages which in turn increased in complexity (there is no sharp boundary) along with increasing complexity of social structure. The ability to form protowords yielded to the ability to freely extend the lexicon and develop diverse constructions to support on-line production and comprehension of utterances which (in a possibly context-dependent way) convey new meanings. Arbib 2 discusses the challenge of assaying the relative plausibility of the Bickertonian version of protolanguage (with the transition to language adding “merge” to a set of words) and the MSH version (with the transition both fractionating protowords to yield constructions and words and building from there). Dubreuil and Henshilwood (2013) alert us to the challenge of placing the transition in the hominin timeline, but our road map holds that the transition was gradual, with no clear break between complex protolanguages and simple languages. Wilson compares monkey and human brains to suggest how sequence processing in LCA-m might have survived as the core of syntax as assayed by the learning of artificial grammars. But how do artificial grammars relate to meaningful conversation? Neurolinguists have no pre-eminent theory of grammar whose operation in the brain they agree to study. The only candidate offered in this volume is Template Construction Grammar (Arbib 2), but there is no reason to expect it to survive as more than a crude approximation. Volterra offers insights into the aboutness of language by investigating the progression from actions to gestures to words in the young child and the emergence of cross-modal (gesture-word) combinations, although more investigation is required about the further development of grammar. Here we return to the crucial DevoSocio challenge of understanding the evolution of brains that not only support the human child’s ability to learn a language but also the caregiver’s ability to assist the process. Stout’s notion of technological pedagogy, linking acquisition of technical skill with imitation and instruction, may prove helpful. Sinha presents three “spheres” that together provide the setting for modern humans: the sphere of infancy and childhood, including learning and teaching; the technosphere of praxis and its products; and the semiosphere of communication and its mediating signs. The preceding pages offer pieces of the road map relevant to these spheres, and set the grand challenge of not only providing each of them with a testable evolutionary scenario rooted in (computational) comparative neuroprimatology but also exploring their mutual dependencies during their evolutionary progressions, both biological and cultural.

Surveying artifacts from the last 100,000 years, one can seek to assess the cognitive capacities required for constructing shelters, for burial practices and for cave art – and then debate whether language was necessary for the development and transmittal of these cultural practices, or whether protolanguage or “mere” imitation would have sufficed.
A thoughtful cautionary note is provided by Dubreuil and Henshilwood (2013), who survey a range of archeological evidence to conclude (p. 257) that:

Language readiness results from a combination of several neurocognitive mechanisms, often independent of one another. The absence of one of these mechanisms may not have prevented the evolution of language, but may have led to the evolution of impoverished forms of language. The most likely scenario, in our view, is that the brain was almost language-ready significantly before Homo sapiens and that the cultural evolution of languages was well underway when the first sapiens evolved. This is not to say, however, that Homo erectus and Homo heidelbergensis were speaking languages totally akin to ours. Limitations in perspective-taking and mind-reading abilities might have prevented some features of modern human languages from evolving, such as metalinguistic awareness, irony, and potentially some complex syntactical structures.

Envoi

The current road map cannot do justice to the richness of research in the diverse disciplines that it touches upon. Somewhat humblingly, one may note that Jon Kaas has recently published the second edition of Evolution of Nervous Systems (Kaas, 2017) in four volumes, with Volume 3, The Nervous Systems of Non-Human Primates, and Volume 4, The Evolution of The Human Brain: Apes and Other Ancestors, providing but a small part of the treasure trove to be exploited in building on the sample provided here. Meanwhile, we invite readers to explore the selected treasures in the 21 preceding papers in this volume. Each concludes with a section “Towards a New Road Map.” Their totality offers far more detail than the Road Map presented here – the one towards which the others are pointing – but the present paper offers a more integrated view than the others can provide. It is our hope that the CNP-2018 Road Map will not be the last, and we would welcome suggestions on how it might be enriched in future editions. Those sent to Michael Arbib ([email protected]) may, after editing and with your permission, be posted on ResearchGate as part of his Project at https:// project/Evolution-of-the-language-ready-brain.

Acknowledgements

Michael Arbib thanks each co-author for their lively contributions to the Workshop and all they have contributed to the development of the CNP-2018 Road Map. Finally, thanks to the student and postdoctoral attendees at the Workshop (whose participation was supported by a grant from UCSD) for their contribution to the discussion, and to the outside reviewers for their careful analysis of the penultimate drafts of the preceding papers.

Funding

Support for the Workshop that provided the basis for the present paper was provided by the National Science Foundation (NSF) under Grant No. BCS-1343544 “INSPIRE Track 1: Action, Vision and Language, and their Brain Mechanisms in Evolutionary Relationship” (Michael A. Arbib, Principal Investigator). This Grant has also supported, in part, recent developments of the Mirror System Hypothesis.

References

*Aboitiz, F. (2018). Voice, gesture and working memory in the emergence of speech. Interaction Studies, 19(1–2), 70–85. https://
Arbib, M. A. (2012). How the Brain Got Language: The Mirror System Hypothesis. New York & Oxford: Oxford University Press. https://
*Arbib, M. A. (2018a). Computational Challenges of evolving the language-ready brain: 1. From Manual Action to Protosign. Interaction Studies, 19(1–2), 7–21. https://
*Arbib, M. A. (2018b). Computational Challenges of evolving the language-ready brain: 2. Building towards neurolinguistics. Interaction Studies, 19(1–2), 22–37. https://
*Burkart, J. M., Guerreiro Martins, E. M., Miss, F., & Zürcher, Y. (2018). From sharing food to sharing information. Cooperative breeding and language evolution. Interaction Studies, 19(1–2), 136–150. https://
Byrne, R. W., & Russon, A. E. (1998). Learning by imitation: a hierarchical approach. Behav Brain Sci, 21(5), 667–684; discussion 684–721. https://
Byrne, R. W., & Whiten, A. (1988). Machiavellian Intelligence: Social expertise and the evolution of intellect in monkeys, apes, and humans (1st ed.). Oxford: Clarendon Press.
*Corballis, M. C. (2018). Mental travels and the cognitive basis of language. Interaction Studies, 19(1–2), 353–370.
*Coudé, G., & Ferrari, P. F. (2018). Reflections on the differential organization of mirror neuron systems for hand and mouth and their role in the evolution of communication in primates. Interaction Studies, 19(1–2), 38–53. https://
Dehaene, S., Pegado, F., Braga, L. W., Ventura, P., Filho, G. N., Jobert, A., … Cohen, L. (2010). How Learning to Read Changes the Cortical Networks for Vision and Language. Science, 330(6009), 1359–1364. https://
Dubreuil, B., & Henshilwood, C. S. (2013). Archeology and the language-ready brain. Language and Cognition, 5(2–3), 251–260. https://
*Hecht, E. E. (2018). Plasticity, innateness, and the path to language in the primate brain: Comparing macaque, chimpanzee and human circuitry for visuomotor integration. Interaction Studies, 19(1–2), 54–69. https://
Kaas, J. (Ed.) (2017). Evolution of Nervous Systems (Second Edition; in 4 volumes): Elsevier.
*Liebal, K., & Oña, L. (2018). Mind the gap – moving beyond the dichotomy between intentional gestures and emotional facial and vocal signals of nonhuman primates. Interaction Studies, 19(1–2), 121–135. https://
*Myowa, M. (2018). The Evolutionary Roots of Human Imitation, Action Understanding and Symbols. Interaction Studies, 19(1–2), 183–199. https://
Petkov, C. I., & Jarvis, E. D. (2012). Birds, primates, and spoken language origins: behavioral phenotypes and neurobiological substrates. Frontiers in Evolutionary Neuroscience, 4, 12. https://
*Pustejovsky, J. (2018). From Actions to Events: Communicating through Language and Gesture. Interaction Studies, 19(1–2), 289–317. https://
*Putt, S., & Wijeakumar, S. (2018). Tracing the evolutionary trajectory of verbal working memory with neuro-archaeology. Interaction Studies, 19(1–2), 272–288. https://
Rilling, J. K. (2014). Comparative primate neurobiology and the evolution of brain language systems. Current Opinion in Neurobiology, 28, 10–14. https://
*Rossano, F. (2018). Social manipulation, turn-taking and cooperation in apes: Implications for the evolution of language-based interaction in humans. Interaction Studies, 19(1–2), 151–166. https://
*Russon, A. (2018). Pantomime and imitation in great apes: Implications for reconstructing the evolution of language. Interaction Studies, 19(1–2), 200–215. https://
*Schoenemann, P. T. (2018). The evolution of enhanced conceptual complexity and of Broca’s area: Language preadaptations. Interaction Studies, 19(1–2), 336–351. https://
*Seifert, U. (2018). Relating the evolution of Music-Readiness and Language-Readiness within the context of comparative neuroprimatology. Interaction Studies, 19(1–2), 86–101. https://
*Semendeferi, K. (2018). Why do we want to talk? Evolution of neural substrates of emotion and social cognition. Interaction Studies, 19(1–2), 102–120. https://
*Sinha, C. (2018). Praxis, symbol and language: developmental, ecological and linguistic issues. Interaction Studies, 19(1–2), 239–255. https://
*Stout, D. (2018). Archaeology and the evolutionary neuroscience of language: the technological pedagogy hypothesis. Interaction Studies, 19(1–2), 256–271. https://
*Volterra, V., Capirci, O., Rinaldi, P., & Sparaci, L. (2018). From action to spoken and signed language through gesture: some basic developmental issues for a discussion on the evolution of the human language-ready brain. Interaction Studies, 19(1–2), 216–238. https://
*Wacewicz, S., & Żywiczyński, P. (2018). Language origins: The platform of trust, cooperation, and turn-taking. Interaction Studies, 19(1–2), 167–182. https://
*Wilson, B., & Petkov, C. I. (2018). From evolutionarily conserved frontal regions for sequence processing to human innovations for syntax. Interaction Studies, 19(1–2), 318–335. https://


Acheulean manufacture, see Acheulian manufacture Acheulian manufacture  60, 259, 262–263, 266, 274–82, 340, 364, 383 ACQ model  11–12, 15, 194 action lexicon  260–261 action understanding  183–184, 192, 196 action–gesture  221, 234–236, 310 AF, see arcuate fasciculus alloparenting  123, 172 see also cooperative breeding American Sign Language, ASL  226, 228 amygdala  40–42, 94, 102–104, 106–109, 111, 113, 115 anterior cingulate cortex, ACC  40, 42, 103–111 anterior intraparietal sulcus, AIP  10–11, 41–42, 46 archaeology  256–265, 275–278, 339, 343–345, 376–377 see also neuro-archaeology arcuate fasciculus, AF  30, 71–73, 76–79, 81, 264, 322, 329 artificial grammar  87, 319–324 ASD, see autism ASL, see American Sign Language attention  11, 14, 25, 45, 59, 74, 103, 105, 116, 126, 128, 185, 187–189, 193, 218, 220, 223, 229, 241–242, 276, 278–279, 292, 298, 358, 372, 379 auditory cortex  71–73, 75, 87, 114 auditory gestures  128–129 auditory perception  29, 70

auditory selective attention  279 auditory signals  155 auditory working memory  76–79, 272–273, 275, 281–283, 327, 374 auditory–vocal circuit  74–75, 80–81 see also dorsal audio–vocal pathway, ventral audio– vocal pathway Augmented Competitive Queueing model, ACQ  12 Australopithecus  98, 343, 376 autism  103, 109, 115 Baldwin effect  16, 18, 64, 290 basal ganglia  15, 19, 87, 93–94, 341, 345 Big Mistake hypothesis  153 bonobo  57, 110, 242, 361–362 brain  108, 110–111 gestures  158, 175, 204 lexigram/sign use  338, 346, 359 mother–infant dyads  177, 220 pantomime  202 stone flaking  283 vocalizations  127 Broca’s area  8, 18, 46, 61, 71, 73, 75, 80, 87, 89, 91, 104, 110, 128–129, 260, 274, 278, 281, 321–322, 324–325, 328, 337–338, 341–346 callitrichids  95, 137 see also marmosets, tamarins see also vocalization in callitrichids and the mirror system hypothesis  143–145 cooperative breeding in  137–138, 172, 178

vocal communication  138–142, 145 vocal turn-taking  177, 375, 379 cerebellum, role in prism adaptation  17 childhood, niche of  252 chimpanzee  8, 14, 30, 32, 45, 55–60, 63, 70, 76, 95, 98, 108, 110–112, 127–130, 138, 152–153, 155, 157, 162, 175, 184–190, 192–196, 200, 202, 207–208, 210, 242, 249, 256, 261, 264, 303, 327, 338–40, 342–345, 353–354, 359–362, 364, 374–375, 381–382 chimpanzee imitation  208 CNP, see comparative neuroprimatology co-gestural speech  291, 297, 302–304, 306–313 see also cospeech gestures cognitive complexity  155–156 cognitive control  59–60, 105 cognitive evolution  105, 110, 114, 319, 282, 345 cognitive map  355, 381 cognitive niche  363 combinatoriality hypothesis  88, 91 comparative neuroprimatology (CNP)  1–2, 8–9, 45, 55, 61, 86–89, 103, 106, 112, 122, 124, 130–131, 137, 145, 174, 175, 178, 192, 260–262, 266, 319, 323–325, 328, 374–376, 381–382 of music  93, 95–96, 98 comparative neuroprimatology 2018 (CNP-2018) road map  370–387

see also towards a new road map competitive queuing, see ACQ model complex action recognition  14 complex imitation  14–16, 31, 79, 89–91, 97, 190, 259–262, 290 ontogeny and mechanisms  190–191 computational comparative neuroprimatology  8–10, 86 construction grammar  245, 249, 251, 259 see also Template Construction Grammar constructions, evolution of  32 conversation  15, 33, 75, 97–98, 142, 152, 154–155, 161, 167, 170, 175–177, 247, 273, 373, 392 cooperation, evolutionary origins of  171–172 see also social cooperation cooperative breeding  77, 137, 140, 142, 144, 145, 153, 172, 177, 375 Cooperative Breeding Hypothesis  153 cooperative computation of schemas  24–26, 28 cooperative nature of language  170, 174, 363 see also turn-taking cooperative signaling  159 cospeech gesture  217, 225–227 see also co-gestural speech, gesture–speech integration Cultural Group Selection Hypothesis  153 deaf signers, see sign language desirability  12, 173, 194 development issues  46–47, 63, 90–91, 103–105, 107–109, 111–115, 140, 142, 144, 153, 158–159, 178, 190–192, 196, 207, 212, 216, 218, 220, 222–226, 229–30, 234–237, 240, 246–248, 251–252, 257, 259, 262, 264–267, 308, 339, 343, 371, 375–377, 384

Diffusion Tensor Imaging, DTI  14 dorsal audio–vocal pathway  30–31, 71–80, 93–94, 264–265 dorsal visuo–manual pathway  10, 14, 28–30, 56, 58, 61, 65, 75, 80, 185–186, 191, 250, 260–267, 329, 380 dyadic brain modeling  13 emotion and music  87–88, 90–92, 94–95 emotion  40, 42, 44, 47, 98, 103–105, 114–115, 121–122, 162, 211, 240, 372, 380 and language evolution  122–124, 130–132, 195–196, 381 evolution of neural substrates  102, 107, 374 neural substrate  105–113 emotional and intentional communication in nonhuman primates  125–129 emotional contagion  44, 122 emotional facial signals  42, 47, 192–195 emotional gestural signals  128–129 emotional prosody  94, 97 emotional vocal signals  127–128 emotion–language linkage  95–97, 373, 110 empathy  44, 110, 152, 361 episodic memory  353–358, 362, 372, 376, 380–381 EvoDevoSocio  7, 9, 22–23, 239–40, 251, 371, 376 Evolution, see EvoDevoSocio, language evolution Evolutionary Modern Language (EML)  240, 244, 252, 265 executability  12, 195 eye movement  192–193, 220 F5, see premotor area F5 face mirror network, see mirror system for mouth faces, see referential information from faces facial gesture  42, 45, 47

see also mirror system for mouth facial mimicry  43–44 FARS (Fagg-Arbib-Rizzolatti-Sakata) model  10, 13 fitness-consequences  141, 167–173, 176, 178 fractionation  32–34, 88–89, 91, 97, 248, 250–251, 265, 290, 313, 383 gelada baboons  43 gestural communication, see gesture gesture  13, 15–16, 29–31, 39–40, 43–47, 61, 78–79, 90, 122, 124–125, 128–131, 152, 155, 158–159, 175, 186–190, 192, 195, 201–203, 206–207, 216–230, 233–238, 261–262, 266, 290–294, 297–306, 308–11, 313, 329–30, 342, 359, 362, 373, 375–378, 380, 382–385 gesture–speech integration  46, 217–218, 222–223, 292–294 grammar  7, 16, 23–24, 27, 31, 33–34, 73, 79, 91, 154, 183, 201, 205, 244–245, 267, 297, 303, 305, 337, 341–343, 346, 372, 375, 380, 383–384 see also artificial grammar, construction grammar, syntax, universal grammar group cohesion, see social cooperation hand and mouth in relationship  39–40 hand mirror system, see mirror system for hand hand–mouth synergies  44–47, 229 for gestural communication  45–46 hippocampus  19, 92, 94–95, 105, 108, 354–357, 360 historical language change  371, 377 see also Evolutionary Modern Language Hominin line (Homo and Australopithecus)  98,

114, 178, 272, 277–278, 337, 344–345, 376–377, 382 Homo antecessor  345 Homo erectus  31, 243, 344–345, 363, 384 Homo ergaster  344 Homo heidelbergensis  384 Homo sapiens  7–8, 16, 22–23, 29, 31, 104–105, 114, 186, 363–364, 383–384 honest communication  167, 169, 172 imitation  14–16, 31, 57–61, 78, 88, 159–160, 247–250, 283, 339–340, 375, 379–384 see also simple imitation, complex imitation assisted  189 in chimpanzees compared to humans  184–188, 195–196 in great apes  200–201, 206–213 in marmosets  97, 143 of facial expressions  40 information donation  140–141, 169–171, 173, 178 innateness  7, 10, 13, 15, 17, 19, 23–24, 29, 55, 63–65, 94, 109, 122, 127, 186, 188, 374, 380, 382 insula  40–42, 106, 108, 110–111 interdependence Hypothesis  153 intransitive action  14 Kanzi’s lexigram/sign use  338, 346, 359 knapping  258–261, 267, 275, 277–278, 383 see also Acheulian, Oldowan language comprehension  27–28 language emergence, see language evolution language evolution  7, 16, 30, 33, 55, 64, 78–79, 89, 96, 98, 103–105, 112, 122–124, 130, 143, 162, 169–70, 172, 201, 203, 211, 239–40, 244, 257, 265–266, 289–90, 308, 319, 323, 325, 330, 337, 339, 341, 371, 382

and cooperative breeding  136, 138, 144–145 and social competition and cooperation  152, 158, 161, 170, 174, 178 emotion and  112, 122, 130–131 language-ready brain  7–8, 13, 19, 23, 28–29, 33–34, 87, 89, 91, 94–98, 184, 191, 196, 216–217, 244, 246–247, 250–252, 266, 273, 289, 303, 308, 313, 358, 371–372, 376, 379, 381–385 see also development issues laryngeal areas and motoneurons  72, 77–78, 81, 128, 195 larynx  47, 77, 128 last common ancestor, see LCA-c, LCA-m LCA-a, see LCA-c, LCA-ga LCA-c, last common ancestor with chimpanzees  8–9, 13–16, 56, 61, 95, 103–105, 107–109, 112, 114–115, 130, 159, 171, 177–178, 186, 229, 240, 242, 251, 290, 339, 371, 374, 377, 379, 381–382 see also LCA-ga LCA-ga, last common ancestor with great apes  201, 210, 377, 379, 381 LCA-m, last common ancestor with monkeys  7, 9–10, 12, 13–15, 55, 61, 110, 114, 127, 130, 229, 290, 339, 371, 374, 377–81, 383 lexicon  7, 16, 77, 91, 154, 157, 159, 183, 226, 244, 359, 371–372, 375 see also action lexicon, music-lexicon evolution of  31, 32, 34, 383 limbic system  18, 39, 47, 73, 77, 97, 102–106, 110–116, 123, 195, 380 lying  170 macaque  8–11, 14, 17–19, 28, 30, 38–46, 55–58, 60–61, 63, 75–77, 95–96, 98, 105–106, 110, 112, 125–126, 129,

194–195, 261, 264, 281, 323–324, 326, 374, 377, 381 macaque mirror neurons  38–46 macaque premotor neurons, relation to vocalization  17 marmosets  13, 137, 140–142 see also callitrichid antiphonal calling in  155, 326 co-representing each other’s actions  143 mirror neurons in  143 sequence processing abilities  323 social learning in  143–144 mental time travel  244, 353–358, 360, 362–364 mentalizing  44, 194, 196, 262, 266 Mirror Neuron System model, MNS  10 mirror neurons  38–39 as evolved first to monitor self-actions  11 for hand  41 for mouth  41, 42 Mirror System Hypothesis, MSH  2, 7–8, 13–18, 23, 28, 30–34, 46, 71, 76, 78, 78–79, 81, 143–144, 168, 178, 172, 257, 259, 261–265, 283, 290, 379–383 Mitteilungsbedürfniss  98 mouth and hand in relationship, see hand and mouth in relationship MR, see music readiness MSH, see Mirror System Hypothesis music  86–98, 104, 234, 247, 267 see also comparative neuroprimatology of music, music-emotion, music-lexicon, music-readiness music-emotion  86–87, 92, 94–95, 97, 373 music–language relationship music-lexicon  86, 88, 91, 96 musicology  90

music-readiness  86–88, 91, 95–96, 376, 383 neuro-archaeology  274–275, 376 neurolinguistics  9, 24, 30–31, 33, 341 new road map, see towards a new road map niche construction  240, 246, 264, 267 Oldowan manufacture  60, 259–263, 266, 275–279, 281–283, 339–340, 364, 383 ontogenetic ritualization (OR)  13, 19, 158–159, 173, 187, 206, 211, 220, 230, 381 opercular cortex  40–41, 43, 106, 281, 321, 324 orangutan brain  98 imitation  207–209, 362 mother–offspring dyads  160 pantomime  202, 205 tool use  161 orbitofrontal cortex  40, 42, 94–95, 102–103, 106–108, 11–113, 115 origin of speech, see speech, origin of pantomime  8, 15–17, 30–32, 79, 173, 178, 223–230, 240, 261–262, 283, 345, 375, 379, 381–382 in great apes  200–207, 210–213 parietal-premotor circuits  40, 380 see also FARS (Fagg-Arbib-Rizzolatti-Sakata) model parity  8 parroty  328, 380 pars opercularis, see opercular cortex pedagogy  178, 256, 258, 261–262, 267, 383–384 see also teaching phonological loop  74–75, 76, 273, 278, 282, 374, 381 phonology  16, 32, 73, 74, 76–77, 80, 91, 244, 251, 260, 262, 328, 346

plasticity  22, 24, 54–55, 57, 62–65, 78–79, 114, 190, 252, 264–265, 373–374, 377 plasticity–innateness tradeoff  54, 65 platform of trust  144, 167–169, 171–174, 176–178, 382 Pleistocene  257, 266, 282, 356, 363–364 pointing  45–46, 128–129, 190, 218–20, 222–223, 229, 234, 237–238, 298, 300–10, 358–359 praxis  15, 28, 33–34, 89, 239–40, 242, 246, 252, 264–266, 376, 384 niche of  252 premotor area F5  8, 10–11, 18, 41, 43, 46, 60, 89, 91, 129, 374 premotor cortex  17–18, 28, 39, 55, 59–60, 76–78, 93, 106, 128, 195, 260–261, 274, 280, 346 prism adaptation  17 prosody  77, 88, 94–98, 175, 373, 383 protolanguage  9, 16, 23, 29, 31–34, 89–91, 114, 229, 242, 244, 252, 264, 275, 283, 294–295, 304, 363, 371, 376, 379, 382 protomusic  97–98 protosign  8, 16–18, 30, 46–48, 79, 89, 91, 97–98, 207, 229–30, 261, 264, 290, 380–382 protospeech  8, 16–18, 23, 30, 46–48, 79, 89, 97–98, 264, 289, 382–384 protosymbols  186, 188, 195–196 protowords  23, 31–32, 34, 265, 303, 382–383 putamen  43, 113 reading-ready brain  23 recursion  60, 61, 89, 205, 239–240, 249, 284, 308, 312, 361 referential information from faces  193–195 rhythm  88, 90, 93–98, 125

rhythm-first hypothesis  88, 90 road map, see towards a new road map schemas  9–10, 23–29, 33–34, 80, 217, 221, 244–245, 247–249, 289, 297, 377 semantic representation, see SemRep semantics  16–17, 23–24, 27–28, 73–74, 79, 201, 203, 211, 244, 261, 297, 301, 305, 313, 346, 372, 375, 381–383 semiosphere  252 SemRep  25–34 in language evolution  28–30 Sequencing  10, 19, 30, 34, 45, 55, 57, 59–61, 77, 87, 124, 141, 143, 153, 156, 157–158, 161, 177, 205, 207, 210, 221, 240, 247, 249–50, 258–60, 263, 265–267, 274, 277–278, 289, 297, 300–305, 309–10, 312, 313, 318–330, 341–342, 355, 361, 374, 377–378, 383 sign(ed) language  7–8, 46, 183, 201, 217, 223–230, 260, 372–373 brain lesions and  16 reduced form used with great apes  201, 208 simple imitation  14, 290 SLFIII, see superior longitudinal fasciculus social context, processing of  40, 42, 125–128, 205, 258 social cooperation  89, 95, 97, 123, 142, 151–154, 158, 160–162, 172–173, 177, 257–258, 262, 375–376 see also cooperative breeding, turn-taking socio-affect-cohesion hypothesis  88, 91 sociocognitive function  112, 114–115, 121, 141, 162, 190 speech  7, 30–33, 59, 70–71, 74–77, 79–81, 87, 223, 260, 295, 305, 373, 378 see also cospeech gesture, gesture–speech integration, gesture–speech integration

compared to music  90–91, 94, 97 origin of  8, 16–17, 79, 90–91, 110, 144, 242, 289, 380–381 spindle neurons  110 stone tools, see Acheulian manufacture, Oldowan manufacture subvocal rehearsal  273 superior longitudinal fasciculus, SLFIII  30, 60–61, 71–72, 264 superior temporal sulcus, STS  42, 72, 264 symbolic capacity  114 symbolic communication  103, 167, 170, 216–217, 222–224, 226, 241–242, 252 symbolic construal  245 symbolic development  223, 230 symbolic niche  242–243, 246, 251 symbolism  17 symbol-ready brain  239–40, 242, 244, 246, 251–252 symbols  16, 32, 62, 183–184, 188–192, 195–196, 228, 239–40, 244, 308, 311, 337–338, 353, 359, 376, 381 synergies, see hand mouth synergies syntax  23–24, 28, 31, 73, 89, 239, 260, 302, 313, 319, 321–322, 324, 329, 341–342, 346, 371, 374, 380, 383 see also grammar phonological  328

tamarins  141 see also callitrichids teaching  140–142, 171–172, 183, 190, 209, 252, 258, 267, 340, 384 see also pedagogy technological niche  252, 257–258, 262, 266 technological pedagogy hypothesis  256, 258, 261–262, 267, 384 technosphere, see technological niche Template Construction Grammar (TCG)  23–24, 79 theory of mind  73, 110–111, 144, 157, 262, 360–361, 363–364, 372–373 towards a new road map  18–19, 33–34, 46–48, 80–81, 96–98, 115–116, 130–132, 161–162, 177–178, 210–213, 228–230, 249–252, 266–267, 283–284 transitive action  14 turn taking  95, 97–98, 139–140, 144, 153, 155–156, 162, 174–177, 376, 379 vocal  141, 149–50, 375, 378 universal grammar  7, 55 ventral audio-vocal pathway  18, 71–73, 75–77, 79–80, 93–94, 264–265, 321 ventral visuo-manual pathway  10, 14, 18, 28–31, 33, 56–60, 65, 185–186, 193, 250, 260–261, 263, 264–265, 380 viewpoint  224

visual control of action  10, 55–56, 70–71, 74 visual scene description  25–27 visuospatial sketchpad  273 vocal calls  15, 196, 342, 361, 381 vocal communication, see vocalization vocal complexity  140, 144–145 vocal control, see vocalization vocal exchanges  326–327 see also turn-taking vocal sequences  322–23 vocalization  8, 16–17–19, 23, 30, 42, 46–48, 70, 74–81, 88–97, 109, 123–128, 130–131, 155, 162, 229, 243, 262, 281, 290, 322, 326–329, 371, 373, 375, 380–383 see also vocal calls conditioned  47 in callitrichids (marmosets)  138–142, 144–145 Wernicke’s area  71, 74, 104, 345 Williams syndrome  98, 104, 109, 113, 115, 178 WM, see working memory working memory (WM)  10, 15, 19, 25–28, 74–81, 105, 143–144, 244, 259, 272–84, 323, 327, 374, 381 see also phonological loop, visuospatial sketchpad WS, see Williams syndrome zone of proximal development  207, 212

How did humans evolve biologically so that our brains and social interactions could support language processes, and how did cultural evolution lead to the invention of languages (signed as well as spoken)? This book addresses these questions through comparative (neuro)primatology – comparative study of brain, behavior and communication in monkeys, apes and humans – and an EvoDevoSocio framework for approaching biological and cultural evolution within a shared perspective. Each chapter provides an authoritative yet accessible review from a different discipline: linguistics (evolutionary, computational and neuro), archeology and neuroarcheology, macaque neurophysiology, comparative neuroanatomy, primate behavior, and developmental studies. These diverse perspectives are unified by having each chapter close with a section on its implications for creating a new road map for multidisciplinary research. These implications include assessment of the pluses and minuses of the Mirror System Hypothesis as an “old” road map. The cumulative road map is then presented in the concluding chapter. Originally published as special issue of Interaction Studies 19:1/2 (2018)


978 90 272 0762 3

John Benjamins Publishing Company