Consensual Illusion: The Mind in Virtual Reality (Cognitive Systems Monographs, 44)
ISBN: 3662637405, 9783662637401

This book is inspired by the contemporary fascination with virtual reality and the growing presence of this type of technology…


English · Pages: 145 [141] · Year: 2021




Table of contents:
Contents
1 A New Kind of Extension
1.1 From Real to Virtually Real
1.2 Extensions
1.3 Being … Where?
1.3.1 Breaks in Presence
1.3.2 Cognitive Penetrability of Visual Perception and Presence
1.3.3 Some Other Factors Affecting Presence
1.3.4 Lost and Inverse
1.4 Physicality Expectations and Pathways to Illusion
1.4.1 Virtual Embodiment
1.4.2 Feeling What is not There
1.5 Consensual Illusion
References
2 Self in Virtual Reality
2.1 The Puzzle of Having a Self
2.2 Disjoint Self
2.2.1 The Minimal Self
2.3 Denaturalized Body
2.3.1 Body Representation: Body Image and Body Schema
2.3.2 A Combined Tools Extension of the Body
2.3.3 Bodily Self-Consciousness
2.3.4 Spatial Perspective Taking and Extracorporeal Experiences
2.3.5 Perceptual Asynchrony and Information Integration
2.4 The Neural Basis of Self-Representation
References
3 Self and the Virtual Other
3.1 Social Cognition
3.1.1 Theory of Mind
3.1.2 Attributing Minds to Non-Human Entities
3.1.3 The We-Mode
3.1.4 Relational Models
3.2 The Social Brain
3.3 Social Interaction in Virtual Environments
3.3.1 Relational Models and We-Mode in Virtual Environments
3.3.2 Interacting with Virtual Characters
3.3.3 Virtual Togetherness
References
4 Virtual Embodiment and Action
4.1 Distinguishing Own Actions from Those of Others
4.2 Action Possibilities
4.3 Vision for Action
4.3.1 Visual Scene and the Feeling of Presence
4.4 Motor Cognition
4.4.1 Hierarchies of the Action System
4.4.2 Intentional Action
4.4.3 Action Representation
4.5 Joint Action
4.5.1 Synchronous Actions: Muscular Bonding and Beyond
4.5.2 Just a Sound
4.6 Extended Embodiment and Action
References
5 Spatial Cognition in Virtual Reality
5.1 Body and the Space Around It
5.1.1 Sense of Place
5.1.2 Processing Strategies for Real versus Virtual Spaces
5.1.3 Frames of Reference
5.2 There… Where the Self Is?
5.3 Body Boundaries and Distance Sensing
5.3.1 Virtual Proxemics
5.4 Spatial Memory and the Hippocampus
5.4.1 Spatial Working Memory in Virtual Reality
References
Index


Cognitive Systems Monographs 44

Vanja Kljajevic

Consensual Illusion: The Mind in Virtual Reality

Cognitive Systems Monographs Volume 44

Series Editors
Rüdiger Dillmann, University of Karlsruhe, Karlsruhe, Germany
Yoshihiko Nakamura, Department of Mechano-Informatics, Tokyo University, Tokyo, Japan
Stefan Schaal, University of Southern California, Los Angeles, CA, USA
David Vernon, University of Skövde, Skövde, Sweden

Advisory Editors
Heinrich H. Bülthoff, MPI for Biological Cybernetics, Tübingen, Germany
Masayuki Inaba, University of Tokyo, Tokyo, Japan
J. A. Scott Kelso, Florida Atlantic University, Boca Raton, FL, USA
Oussama Khatib, Stanford University, Stanford, CA, USA
Yasuo Kuniyoshi, The University of Tokyo, Tokyo, Japan
Hiroshi G. Okuno, Kyoto University, Kyoto, Japan
Helge Ritter, University of Bielefeld, Bielefeld, Germany
Giulio Sandini, University of Genova, Genova, Italy
Bruno Siciliano, University of Naples, Napoli, Italy
Mark Steedman, University of Edinburgh, Edinburgh, UK
Atsuo Takanishi, Waseda University, Tokyo, Japan

The Cognitive Systems Monographs (COSMOS) series publishes new developments and advances in the field of cognitive systems research, rapidly and informally but with high quality. The intent is to bridge cognitive brain science and biology with engineering disciplines. The series covers all technical contents, applications, and multidisciplinary aspects of cognitive systems, such as Bionics, System Analysis, System Modelling, System Design, Human Motion Understanding, Human Activity Understanding, Learning of Behaviour, Man-Machine Interaction, Smart and Cognitive Environments, Human and Computer Vision, Neuroinformatics, Humanoids, Biologically Motivated Systems and Artefacts, Autonomous Systems, Linguistics, Sports Engineering, Computational Intelligence, Biosignal Processing, and Cognitive Materials, as well as the methodologies behind them. Within the scope of the series are monographs, lecture notes, selected contributions from specialized conferences and workshops, and selected Ph.D. theses. Indexed by SCOPUS, DBLP, zbMATH, SCImago.

More information about this series at http://www.springer.com/series/8354

Vanja Kljajevic

Consensual Illusion: The Mind in Virtual Reality

Vanja Kljajevic Norwegian University of Science and Technology Trondheim, Norway

ISSN 1867-4925    ISSN 1867-4933 (electronic)
Cognitive Systems Monographs
ISBN 978-3-662-63740-1    ISBN 978-3-662-63742-5 (eBook)
https://doi.org/10.1007/978-3-662-63742-5

© Springer-Verlag GmbH Germany, part of Springer Nature 2021

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer-Verlag GmbH, DE, part of Springer Nature. The registered company address is: Heidelberger Platz 3, 14197 Berlin, Germany

To Mara, my wonderful mom, for her love and encouragement.


Chapter 1

A New Kind of Extension

The world is not what I think, but what I live through (Merleau-Ponty 1958, p. xviii).

1.1 From Real to Virtually Real

The notion that virtual reality is not experienced as another, radically different type of reality, but as an extension of what is already there as physically real, is based on the assumption that there is a reality-virtuality continuum, with the two positioned at its far ends and separated by mixed reality (Milgram et al. 1995). Physically real environments, which by definition remain free of computer-generated interventions, sit at one end of the continuum; virtual reality environments, being completely computer-generated, sit at the opposite end. Along the continuum we find mixed environments, more specifically augmented reality, in which most of the images are images of real objects, and augmented virtuality, in which most of the images are computer-generated. This concept of display technologies is based on the notion that presence, interactivity and reproduction fidelity are the key factors that determine where on the continuum a specific application lies1 (Gutiérrez et al. 2008).

The term la réalité virtuelle was introduced more than 80 years ago by Antonin Artaud in a work related to theatre (Artaud 1938). Since then, this loose term has been used to refer to alternative ways of generating experiences of the world, not only by means of theatre, but also by means of literature, film, and other forms of art (Rapolyi 2001). It was widely popularized in the 1980s, after Jaron Lanier used it in the context of a new technology aiming to induce multisensory, interactive, immersive experiences in computer-generated environments (Lanier 2017). Even in this more specific context, the term virtual reality remains ambiguous, referring both to virtual reality environments and to virtual reality technology. There are additional meanings that have been ascribed to this term, for instance simulation, interaction, artificiality, immersion, telepresence, full body immersion, and networked communication (Heim 1995). However, none of them on its own captures the nature of virtual reality.

Any attempt to define virtual reality must first clarify the meaning of virtual.2 For instance, the virtual in virtual reality means that something virtual is real "effectively but not formally", so that when a computer graphic makes an entity present to us so effectively that we might just as well have it before us, then the graphic becomes virtually that entity. The graphic then provides a virtual reality (Heim et al. 1995, p. 265).

1 Readers are referred to Gutiérrez et al. (2008) for interesting examples of applications along the reality-virtuality continuum, such as virtualization of the famous Terracotta Soldiers discovered in China in the 1970s, fully immersive virtual tours of archeological sites, and virtualization of ancient Greek theaters, among others, together with an overview of related technical and technological issues, such as the real-time rendering of virtual crowds inside the projected space or interactive sound simulation.
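To make Milgram et al.'s continuum concrete, the following Python sketch classifies environments by the share of their imagery that is computer-generated. The `generated_fraction` attribute and the midpoint threshold are illustrative assumptions, not part of the cited proposal, which defines the continuum only by its endpoints.

```python
from dataclasses import dataclass

@dataclass
class Environment:
    """Toy description of an environment on the reality-virtuality continuum."""
    name: str
    generated_fraction: float  # share of imagery that is computer-generated, 0.0..1.0

def classify(env: Environment) -> str:
    """Place an environment on the continuum; thresholds are illustrative."""
    f = env.generated_fraction
    if f == 0.0:
        return "physical reality"
    if f == 1.0:
        return "virtual reality"
    # Everything in between is mixed reality.
    return "augmented reality" if f < 0.5 else "augmented virtuality"

for env in [Environment("park walk", 0.0),
            Environment("navigation overlay", 0.2),
            Environment("virtual set with a live actor", 0.8),
            Environment("flight simulator scene", 1.0)]:
    print(f"{env.name}: {classify(env)}")
```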

In addition, virtual in virtual reality indicates that in a virtual reality system only the model is virtual, and it is so in the sense in which human culture and knowledge are:

The user really perceives a particular state of the virtual environment. The computer has been actualizing it for the user. Computer systems enable the process of virtualization of a three-dimensional and multi-sensorial model and the process of its actualization. It makes the user feel that this experience is familiar enough to qualify it as reality. (d'Huart 2001, p. 132)

Although there is currently no universally accepted definition of virtual reality, there is a growing understanding across disciplines that virtual environments are computer-generated, interactive and immersive (Burdea and Coiffet 1994; Slater 2009; Gutiérrez et al. 2008; Chalmers 2017). Virtual reality environments differ from mixed environments, which are also interactive and immersive, but only partly computer-generated.

Another controversial term related to virtual reality is virtual world. In the context of computer-generated, interactive and immersive virtual reality environments, a virtual world is the world generated by using virtual reality technology. However, as the early use of the term virtual reality implies (Artaud 1938), virtual worlds are also created by art, which opens the question of whether there are any fundamental differences between these types of virtual worlds. One difference is related to the degree to which these worlds can be shared with others (Chalmers 2017). Virtual worlds created by art, for example when reading a book, watching a movie, or observing a painting, rely on a degree of imagination and thus cannot be fully shared with others. Computer-generated virtual worlds, on the other hand, are much more fully specified and in principle easier to share.

There are, of course, differences among virtual environments depending on fidelity. For example, environments with more spatial cues result in more detailed spatial situation models, which increases participants' motivation to participate in such environments and their attention to the events, objects and characters they encounter there (Hartmann et al. 2015). Environments with impoverished spatial cues apparently require a higher degree of visuo-spatial imagery skill from participants to construct stable spatial situation models.

Furthermore, virtual reality applications have generally been divided into those pertaining to realistic virtual worlds and those pertaining to magical virtual worlds (Mania and Chalmers 2001). The former are simulations of the real world, including for instance flight simulators and other training devices, whereas the latter are environments that defy the laws of physics (e.g. lack of gravity, teleporting to remote objects) and afford superhuman features (e.g. X-ray vision, the ability to walk through walls, flying by one's own volition, psychokinesis). Since the early days of virtual reality, it has been emphasized that one advantage of virtual compared to real environments is that the laws of physics that hold for the real world do not necessarily have to hold for a virtual environment. As Sutherland (1965) pointed out, there is no reason why virtual reality has to obey the rules of physical reality: the true potential of virtual reality lies precisely in not having to obey them.

Philosophers who study virtual worlds typically ascribe them a fictional character, assuming that these worlds do not exist in reality. Put differently, such worlds are fictional, and so are the virtual objects we encounter there as well as the virtual events that take place in such worlds. This view is known as virtual fictionalism (Chalmers 2017). Other scholars deny that virtual worlds are fictional, while still ascribing them the status of being unreal, in the same way in which dream worlds are not real. Yet others argue against virtual fictionalism and for virtual realism. Briefly, virtual realism is the view that virtual objects really exist, i.e. as digital objects on a computer server, and that the virtual properties of virtual objects are real too; virtual events in virtual reality really take place, our experiences in virtual reality are non-illusory, and they are as valuable as non-virtual experiences. Virtual worlds are part of the real world via existing on real computers, but that does not make them less real or less valuable (Chalmers 2017).

2 Briefly, the word virtual originates from the old Latin virtus, which refers to certain powers of human beings. The overtones of power are not present in the Christian meaning of virtue, which was "possessing certain values". A fifteenth-century definition of virtual determines its meaning as being something in effect, essence, or potentiality, but not in actual name or form (Chalmers 2017). In contemporary English, this adjective is generally used to indicate that something is almost the entity it describes, but not as strictly defined. The word virtual has more specific meanings in different scientific disciplines. For instance, in computer science it may refer to a software creation that appears as if it exists, although it does not physically exist as such (e.g. virtual images), or to something that is accessed or stored on a computer (e.g. a virtual library). Lexico (2019) provides the following definitions of virtual in some other disciplines: in optics, virtual may refer to the "points at which rays would meet if produced backwards"; in mechanics, to "infinitesimal displacements of a point in a system"; while in physics it denotes "particles or interactions with extremely short lifetimes and (owing to the uncertainty principle) indefinitely great energies, postulated as intermediates in some processes". Due to the contemporary fascination with virtual reality technologies, the word virtual in everyday use is becoming increasingly associated with the meaning found in computer science, i.e. referring to something that is being simulated, being online, or being present not physically but in a computer-generated environment.

While philosophers argue about it, for most laymen the physical world exists independently of the beings that cognitively represent it.3 In contrast, virtual reality worlds, being computer-generated, presuppose the existence of the physical world and require a thinking, representing mind. Since virtual reality offers a space in which to act, it is a tool that generates the feeling of an extension of the physical world. Importantly, even though a computer-generated space remains a virtual environment, the participant's subjective experience of being and acting in that space, and the feelings and thoughts generated there, are not fictive, but real.

3 We will not delve into philosophical arguments concerning the nature of reality or deal with intriguing interpretations coming for instance from physics, such as the impact of quantum theory on our understanding of physical reality (Penrose 1989), because such topics would lead us too far from the goals of the present book.

1.2 Extensions

The notion of extension as an instrument of perception and cognition is certainly not new. For instance, Merleau-Ponty (1958) argued that a stick in the hand of a blind man becomes an instrument instead of an object of perception, and as such it is an extension of his body. Telescopes and microscopes expand our view of reality so that we can see distant galaxies and observe microorganisms whose existence we could not know of with the naked eye. Due to these extensions, our senses can reach further. Anthropologists use the concept of extension to refer to improvements or specialization of human mental and physical functions by means of external tools:

The computer is an extension of part of the brain, the telephone extends the voice, the wheel extends the legs and feet. Language extends experience in time and space while writing extends language. (Hall 1969, p. 3)

Hall argues that these extensions of the human organism have been elaborated to such an extent that they have taken over, "replacing nature" and creating a new dimension, the cultural dimension. Crucially, the new dimension allows us to create "the total world" in which we live and thus to determine who we will be. On this view, humans and their extensions, from cities to technology to language, are extensively interrelated into one system. The extensions we create are in turn shaping who we are. Given this unity, the argument goes, it would be a mistake to consider extensions on their own, i.e. outside of this tightly interconnected relationship with humans, as it would also be a mistake not to pay enough attention to the extensions we create when considering human beings.

Besides Hall, other anthropologists4 in the 1950s and 1960s argued that human evolution has shifted from the human body to various extensions and that this shift has immensely accelerated the evolutionary process. In other words, using and inventing tools has changed human nature. Marshall McLuhan recognized the potential of media as technological prostheses to reconfigure and change human nature, while Clifford Geertz, another anthropologist, argued around the same time for an externalist philosophy of mind, insisting that thinking is not restricted only to the processes that take place in the head but extends to culture (Geertz 1966; Boden 2006). Echoing such ideas, one contemporary cognitive science view posits that mind is "less and less in the head" (Clark 2003, p. 3) and that cognition relies on the body and its interactions with the environment (Barsalou 2008, 2010).

4 See, for instance, La Barre (1954).

Another interesting concept of extension comes from biology. In 1982 the evolutionary biologist Richard Dawkins introduced the concept of the extended phenotype. The phenotypic effects of a gene are the effects that the gene has on the organism in which it sits; an extended phenotypic effect of a gene is the effect it has on the world outside that organism. The idea is that the tools by which the gene propels itself into the next generation "may reach outside the individual body wall", with examples including artefacts such as beaver dams, bird nests and caddis houses (Dawkins 1989, p. 238). Furthermore, extended phenotypic effects of genes in one organism may be found not only on inanimate objects in the world (e.g. stone in the case of caddis houses), but also on the body of another organism, as in the case of snails with extra-thick shells, where the thickening is due to the presence of a parasite, in which case it is appropriate to think that the parasite's genes are influencing the snails' bodies. The genes can be close to their extended phenotypic effects (e.g. parasites live inside their hosts) or they can act at a distance (a beaver dam is an example of a gene acting at a distance). As Dawkins (1989) puts it succinctly, "It is as if the genes reached outside their 'own' body and manipulated the world outside" (p. 242). Similar to Hall's idea of a new dimension, culture can be viewed as Dawkins's extended phenotype in the sense that it continues along the path of biological evolution.

More recently, cognitive scientists have recognized the impact of bodily interactions with the world on the evolution of cognition (Thelen 2000), arguing for the concept of extended cognition. Some suggested that extended cognition is "a core cognitive process, not an add-on extra" (Clark and Chalmers 1998) and that "a process is not cognitive simply because it happens in a brain" (Hollan et al. 2000, p. 175). They argue that our minds/brains and bodies together with tool-based extensions form extended thinking systems:

It is our natural proclivity for tool-based extension, and profound and repeated self-transformation, that explains how we humans can be so very special while at the same time being not so very different, biologically speaking, from the other animals with whom we share both the planet and our genes. What makes us distinctively human is our capacity to continually restructure and rebuild our own mental circuitry, courtesy of an empowering web of culture, education, technology, and artifacts. (Clark 2003, p. 10)

On this view, the unity between the brain, body and technology expands and alters our psychological processes and our sense of who we are. For instance, Clark argues that we are becoming "human-technology symbionts: thinking and reasoning systems whose minds and selves are spread across biological brain and nonbiological circuitry" (Clark 2003, pp. 3–4). Regardless of whether we agree with this notion or not, the human mind and the extensions we invent coevolve.

Further following the development of the idea that tool-based extension of human mental and physical functionality has accelerated the evolutionary process, we arrive at the concept of the posthuman. As the concept developed, the posthuman future of humanity has been envisioned differently, from a post-biological future, in which intelligent machines outperform humans and take over, to the concept of the posthuman as a symbiotic union between human and intelligent machine. The concept of the biological posthuman is related to human engineering of biological cognition via genetic editing. According to one view, such engineering is ideally performed through iterated embryo selection (Bostrom 2014). The goal is to develop superintelligence, i.e. minds that perform significantly better than the best current human minds across multiple cognitive domains.5 For example, developing new cognitive modules that are comparable in advantage to the way in which language sets humans apart from other species would mean becoming superintelligent. The concept of the posthuman as a symbiotic union between human and intelligent machine assumes that the two components are indistinguishable (Hayles 1996, 1999). Some of the additional assumptions of the posthuman view in this sense are that information pattern has primacy over bodily instantiation, where the body is a "historical accident" and consciousness has a less prominent role than in traditional Western thought (Hayles 1999, pp. 2–3).

Virtual reality technology is an example of a tool that allows us to experience having abilities that surpass those characteristic of human beings, a different sense of corporeality, and more generally a different sense of self. However, this does not mean that the concept of the posthuman is warranted in this context. It would be misleading to assume that researchers generally agree that the human–machine symbiosis or the way in which virtual reality intervenes in our subjectivity represents a radical break from the past. Kendrick (1996), for example, argues that the human–machine symbiosis, or "the technological real", relies on reconstruction of subjectivity by the very technologies that it is supposed to manipulate, and that this type of intervention is not unique to virtual reality. Similarly, Clark (2003) convincingly argues that biotechnological unions are not more radical than the changes that humanity has encountered in the past, for instance with the advent of timekeeping or the use of text. He remarks that even the early proponents of the biotechnological type of symbiosis, including the originators of the term "cyborg", Clynes and Kline, were against the notion that the intervention of new technologies into our subjectivity might transform us into posthumans. New technologies would extend human capacities, for instance by extending the capabilities of the physical body, but they do not render us posthuman, not because they are not transformative, but because it is in our nature to routinely incorporate such physical and cognitive extensions. It is therefore unnecessary, Clark (2003) claims, to evoke any posthuman future, defining ourselves in contrast with, instead of as part of, the world in which we live.

5 Other paths to superintelligence have also been envisioned (e.g. artificial intelligence, whole brain emulation, direct brain-computer interfaces, enhancement of networks linking individual minds into a collective superintelligence), as well as different types: (i) a speed superintelligence is an intellect that is "just like a human mind, but faster"; (ii) a collective superintelligence is a system that includes numerous smaller intellects whose overall performance across general domains outperforms any current cognitive system; and (iii) a quality superintelligence is a system that is at least as fast as and qualitatively much smarter than a human mind (Bostrom 2014, p. 53).

Thus, instead of arguing that the ways in which new technologies, including virtual reality, intervene in our subjectivity make us posthuman, a more appropriate view would be to interpret the effects of various tools or extensions within the context of technological progress and our cultural and cognitive adaptation.

Since its inception, virtual reality has fueled debates on what it is and what it should be. These debates reflect distinct theories of reality, distinct philosophies of representation and communication, and differing views regarding how best to benefit from virtual reality. Furthermore, virtual reality challenges the way we think about the relationship between the mind and the body. It fools the mind into believing that the participant is in an alternative world that is sufficiently believable to be real, although physically it does not exist as such (Ellis 1991; Vince 1998; Brooks 1999; Steinicke 2016). "Leaving" the body in one environment while inhabiting this other environment may introduce a sense of disembodiment. The feeling of disembodiment is expected to arise in situations where digital immersion is not accompanied by appropriate bodily and environmental feedback. In fact, the sense of disembodiment is relatively common in virtual environments that use head-mounted display-based virtual reality systems, because they rarely display a rendering of the participant's real body. It has been demonstrated that the presence of an avatar as the participant's representation in a virtual environment has beneficial effects on the sense of presence and task performance, and that the presence of a self-avatar may affect interaction in shared virtual environments, benefiting cooperative tasks and increasing the level of trust among collaborators, as well as the sense of presence and perceptual judgments (Pan and Steedman 2017). But as emphasized by Hayles (1999), the point is not in "leaving the body behind", but in "extending embodied awareness" in ways that would not be possible without the use of specific technology (p. 291).

Virtual reality allows seeing, hearing, touching and manipulating objects in a computer-generated space as if they were physically present. This affects not only participants' sense of body schema (Sect. 2.3.1), but also their sense of personal location and agency, raising questions such as Where am I? (Dennett 2000; Sanford 2000), Am I where my body is? Am I where the action is? These questions are relevant for virtual reality because they pertain to one of its key features—presence: the feeling of being present elsewhere, that is, in the space generated by the virtual reality system rather than at the location of the physical body. Virtual reality challenges the notion that our sense of presence is bound to our physical selves, because it allows the participant to temporarily incorporate nonbiological components into his/her body schema and perform actions that may surpass the capacities of the real human body or defy the laws of physics. It extends the participant's sense of corporeality beyond the borders of the physical body, and by introducing the concept of a virtual body it challenges the idea that one agent corresponds to one body. Importantly, even when actions performed in a virtual reality environment are beyond the bounds of physical reality, there is a degree of awareness that the world of enactment is enabled by a specific technology and embedded in the physical world via a specific discourse (Slater 2009).
The ability to retain this awareness is considered to characterize all but naïve users of virtual reality (Chalmers 2017). An important implication of this view is that presence allows a degree of awareness about the real world, and yet the virtual space has the power to induce the feeling that the participant is there, in the virtual environment.

1.3 Being … Where?

Presence, which is short for the term telepresence, is typically defined as a subjective feeling of being there, that is, being virtually present in an alternative environment, at a location different from the actual location of the physical body. The concept of presence was introduced by André Bazin in 1951 in the context of film experience, and in the 1970s it was extended to communication phenomena mediated by technology (Lombard and Jones 2015). Due to Minsky's (1980) seminal paper on telepresence, followed by the establishment of the MIT journal Presence: Teleoperators and Virtual Environments in 1992, presence has emerged as a concept crucial for technology-mediated virtual environments. Namely, Minsky's term telepresence referred to remote presence mediated by technology:

To convey the idea of these remote control tools, scientists often use the words 'teleoperator' or 'telefactor'. I prefer to call this 'telepresence', a name suggested by … Patrick Gunkel. Telepresence emphasizes the importance of high-quality sensory feedback and suggests future instruments that will feel and work so much like our own hands that we won't notice any significant difference. … The biggest challenge to developing telepresence is achieving the sense of 'being there'. Can telepresence be a true substitute for the real thing? Will we be able to couple our artificial devices naturally and comfortably to work together with the sensory mechanisms of human organisms? (Minsky 1980)

While most researchers agree that presence is one of the most noticeable psychophysical effects of immersive virtual reality (Waltemate et al. 2018), there is little agreement on how best to define this concept. In fact, there are currently many conceptualizations and definitions of presence, and many different terms are used to refer to the feeling of being there, such as telepresence, co-presence, spatial and social presence, and virtual, immersive, perceived, or subjective presence. For example, presence has been defined as the sense of being located at the place depicted by the virtual displays (Sheridan 1992), and as an all-or-none psychological phenomenon (Slater 2009) associated with the illusion of being present at a location different from the actual location of the physical body (place illusion), which is "a 'response' to a system of a certain level of immersion" (Slater 2003). Others define presence as "the feeling of being located in a perceptible external world around the self" (Waterworth et al. 2015), as "the experience of being engaged by the representations of a virtual world" (Jacobson 2002), and as the "perceptual illusion of nonmediation" (Lombard and Ditton 1997; Lombard and Jones 2015). Slater (2002) has noted that defining presence in terms of the feeling of being there is a category mistake and that the sense of being elsewhere is just one of many contributors to presence. Following this intuition, Riva (2009) defined presence in terms of agency and control: "subjects are 'present' if they are able to enact in an external world their intentions" (p. 159). This definition is consistent with the observation that our sense of where we are located largely depends on the sense of an "action-space" (Clark 2003, p. 94). Similarly, Wirth et al.'s (2007) model of presence formation requires two steps: self-localization in a mediated environment and the perception of possibilities for action in that environment. Given that presence is tied to the notion of space by definition, it is not surprising that it is sometimes defined in terms of what can be done in that space. Anthropologists in the 1960s emphasized this dynamic aspect of our perception of space (e.g. Hall 1969).

The variety of definitions of presence has caused a great deal of confusion regarding what constitutes presence, not only among designers and consumers, but also among scholars, impeding progress in the field (Slater 2003). Thus, tidying up the definitions of presence would be helpful. Since different researchers use the term presence differently, a unifying theory of presence is currently not possible (Waterworth et al. 2015). The International Society for Presence Research (2000) defines presence as

a psychological state or subjective perception in which even though part or all of an individual's current experience is generated by and/or filtered through human-made technology, part or all of the individual's perception fails to accurately acknowledge the role of the technology in the experience. Except in the most extreme cases, the individual can indicate correctly that s/he is using the technology, but at some level and to some degree, her/his perceptions overlook that knowledge and objects, events, entities, and environments are perceived as if the technology was not involved in the experience (www.ispr.info/about-presence-2/aboutpresence/).

For instance, using a sophisticated flight simulator we may, at least for a short while, be completely unaware that our experience of flying an aircraft is actually being mediated by the technology.

When you are present your perceptual, vestibular, proprioceptive, and autonomic nervous systems are activated in a way similar to that of real life in similar situations. Even though you cognitively know that you are not in the real life situation, you will tend to behave as if you were, and have similar thoughts…. (Slater 2003, p. 2)

Thus, when the participant feels present in an immersive virtual environment, the knowledge that the environment is mediated is always there, if not in the person's full awareness, then on its border, ready to enter the focus. It is therefore important to determine the factors that channel the processes related to this awareness and how they lead to shifts between the participant's feeling of being in and being out of the virtual space. We address these issues in the following sections.

1.3.1 Breaks in Presence

When fully immersed in a virtual environment, a participant is subjected to two streams of sensory data, one coming from the physical world and the other from the virtual world displayed by the virtual reality system (Slater et al. 2003). If for some reason the participant stops responding to the data from the virtual environment and instead responds to the sensory data from the physical world, a break in presence will follow. Recording the number and duration of such breaks can help improve presence in virtual environments. Such a method of analysis is much preferred to subjective reports and presence questionnaires. The latter methods of assessment have been critiqued on methodological grounds (Mania and Chalmers 2001; Slater et al. 2003), for instance as being unreliable, because participants report subjective, post-hoc impressions about their virtual reality experiences, some of which may not be accurate, while at the same time they may forget to report important indicators of transition between the environments. Physiological responses signaling such transitions are therefore considered a better basis for measuring presence than introspection-based methods.

Slater (2002) outlines a way of thinking about presence that, instead of relying on an account of experience based on introspection, postulates that presence is a perceptual mechanism that answers the question Where am I? That is, it selects between the hypotheses that a person is in a virtual environment vs. in the real world. It is this switching between alternative environments that makes presence such an interesting research topic: if we receive signals from only one environment, we are present in that environment; but when there are competing signals from different environments, how do we choose the signals on which to base our actions? As an example, a participant who is walking in a virtual room with a precipice in its center—while actually standing in a CAVE-like system or wearing a head-mounted display and feeling the cables, temperature and the physical signals of the actual place—will experience presence if the virtual environment's sensory signals systematically override other sensory signals. In fact, this can be achieved using minimal cues:

At some deep level, our minds do not understand "virtual reality". Hence, only minimal cues are necessary for our presence selection mechanism to respond to a virtual pit, even though we know "really" that there is not one there. The sixth sense is this process of seeing what we expect to see, and it doesn't take much for a virtual reality to convince us: we respond to events in the virtual world much as we would to similar events in the physical world. (Slater 2002, p. 435)
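As a rough illustration of the measurement idea above, here is a minimal Python sketch that counts breaks in presence and their durations from a log of which sensory stream the participant responded to at each sample. The log format, labels, and sampling interval are assumptions made for this example, not a method described in the source.

```python
def summarize_breaks(responses, dt=0.5):
    """responses: sequence of 'virtual' or 'physical' labels sampled every dt seconds.
    Returns the number of breaks in presence and the duration of each break."""
    breaks = []
    current = 0.0
    for label in responses:
        if label == "physical":       # responding to the real-world stream
            current += dt
        elif current > 0.0:           # back in the virtual world: break ended
            breaks.append(current)
            current = 0.0
    if current > 0.0:                 # session ended mid-break
        breaks.append(current)
    return len(breaks), breaks

log = ["virtual"] * 6 + ["physical"] * 3 + ["virtual"] * 4 + ["physical"] * 2
count, durations = summarize_breaks(log)
print(count, durations)  # 2 [1.5, 1.0]
```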

Even though participants have abstract knowledge that they really are located within a CAVE space and not in a virtual room with a precipice, "visual perception overrides this knowledge and the bodily system reacts as if they were in a virtual room: heart rate rises, locomotion is carefully judged, and the subject reports symptoms of anxiety" (Slater 2002, p. 436).

Slater suggests that the presence selection mechanism is grounded in previous work on perception that assumes that perception consists of selection between hypotheses. For instance, the explanation goes, choosing between hypotheses in perceiving the famous Kanizsa triangle typically leads to an optical illusion, i.e. seeing the edges of a white upright triangle between the circles even though such edges are not there. In addition, Slater credits Norton and Stark's (1971) scanpath theory of eye movements for his notion of a presence selection mechanism, more specifically the top-down cognitive model of visual perception that this theory implicates. According to this model, what we perceive in the world depends on what is in our mind's eye. When observing ambiguous figures in which we can recognize two objects (e.g. a duck and a rabbit), our eye movements change according to whether we decide that the figure represents a duck or a rabbit. On this view, a top-down cognitive mechanism, together with the image, determines our eye movements, which suggests that eye movements and perception are not fixed by what is represented. Note, however, that this explanation contains the assumption that cognitive states penetrate visual perception, which is debatable. In fact, the question of whether perception is cognitively penetrable is at the heart of one of the most interesting debates in cognitive science.
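Slater's selection-between-hypotheses view can be caricatured as simple evidence accumulation between two competing hypotheses. The sketch below is an illustrative toy model, assuming a log-odds update rule and unit evidence weights that the source does not specify.

```python
import math

def select_environment(signals, weight=1.0):
    """signals: sequence of +1 (virtual-world) or -1 (physical-world) evidence.
    Returns the winning hypothesis after accumulating log-odds evidence."""
    log_odds = 0.0  # 0 = no preference between the two hypotheses
    for s in signals:
        log_odds += weight * s
    p_virtual = 1.0 / (1.0 + math.exp(-log_odds))
    return ("virtual environment" if p_virtual > 0.5 else "physical world"), p_virtual

# Mostly coherent virtual cues with a few intruding physical cues (e.g. cables):
hypothesis, p = select_environment([+1, +1, +1, -1, +1, -1, +1])
print(hypothesis, round(p, 3))  # virtual environment 0.953
```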

1.3.2 Cognitive Penetrability of Visual Perception and Presence

The issue of whether vision is continuous with cognition, i.e. cognitively penetrable, or discontinuous, distinctive, and independent from it has been debated for a long time.6 One formulation of cognitive penetrability is as follows:

… if a system is cognitively penetrable then the function it computes is sensitive, in a semantically coherent way, to the organism's goals and beliefs, that is, it can be altered in a way that bears some logical relation to what the person knows. (Pylyshyn 1999, p. 343)

Those who argue that vision is continuous with cognition postulate that vision processes are affected by higher cognitive processes. Those who argue against this view postulate encapsulation, or cognitive impenetrability, of certain visual processes, such as early vision. One recent approach to visual perception suggests that vision is impervious to cognitive penetration and that whatever influence cognition may have on sensory stimuli processing, it may have to do not with cognitive states (beliefs, desires, emotions, motivations, intentions, linguistic representations), but with other factors, such as attention:

Perhaps most prominently, shifting patterns of attention can change what we see. Selective attention is obviously closely linked to perception – often serving as a gateway to conscious awareness in the first place, such that we may completely fail to see what we do not attend to (as in inattentional blindness…). Moreover, attention – which is often linked to a 'spotlight' or 'zoom lens' … – can sometimes literally highlight or enhance attended objects, making them appear (relative to unattended objects) clearer … and more finely detailed…. (Firestone and Scholl 2016, p. 13)

According to this view, selectively attending to a different feature of an object is similar to changing what we see when moving our eyes. In other words, what changes is the input to the mechanism of visual perception, whereas visual processing itself remains fixed, that is, unaffected by cognitive states. For instance, when looking at an ambiguous image, its disambiguation does not require penetration of perception by any top-down state (intention, emotion, desire); instead, voluntarily switching between two interpretations (e.g. duck/rabbit), although it depends on peripheral attention, changes only the input to visual processing:

… though one may indeed be 'changing one's assumptions' when the figure switches, this is not actually triggering the switches. Instead, the mechanism is that different image regions are selectively processed over others, because such regions are attended differently in a relatively peripheral way. (Firestone and Scholl 2016, p. 14)

6 See Pylyshyn (1999) for a brief review of the debate, including the views held by von Helmholtz, Bruner and the New Look in Perception, among others.

This view predicts that the presence selection mechanism may not be perceptual, but rather an attention-based mechanism. It is compatible with the idea that presence in a virtual environment is an experience "similar to that of physical reality, together with an imperceptible shifting of focus of consciousness to the proximal stimulus located in that other world" (Sas et al. 2004, p. 53). In other words, switching between experiencing alternative environments—the immediate and the mediated—is not due to a cognitive state penetrating perception, but rather is based on a change in visual input due to a shift in attention. However, despite the evidence supporting this view regarding vision (e.g. Gur 2016) and other domains of perception, such as speech perception (e.g. Cutler and Norris 2016), the view that perception is not penetrable by cognition has been challenged on several grounds. For instance, the model of vision inherent to this view consists of three modules: a sensory/input module, a perception module, and a cognition module (Firestone and Scholl 2016). Yet the neural evidence on the organization of the visual brain does not seem to support such neatly differentiated and entirely encapsulated modules. It indicates instead that the primary visual cortex (V1) is involved not only in an early stage of vision but all the way through, supporting feedforward and feedback activity of the network; that attention effects do not proceed from input to perception but rather cause changes across the visual hierarchy; and that overall interactions among the brain areas involved in vision are so extensive and complex that it is "impossible to neatly separate sensation and perception":

Rather than simply gating or enhancing input, much of the neural hardware responsible for vision flexibly changes its function in complex ways depending on the goals of the observer. (Beck and Clevenger 2016, p. 20)
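A toy sketch of the architecture Firestone and Scholl's view implies: attention selects which region of the image reaches a perception function that itself never changes. The scene encoding and the `perceive`/`attend` functions are invented for illustration; they are not the authors' model.

```python
def perceive(region):
    """A fixed perceptual function: never altered by beliefs or desires."""
    return f"seen: {region}"

def attend(scene, focus):
    """Attention selects which image region reaches perception (changes the input)."""
    return scene[focus]

scene = {"beak-side": "duck-like contour", "ear-side": "rabbit-like contour"}

# Switching interpretations of the ambiguous duck/rabbit figure amounts to
# changing the attended region (the input), not how perception computes.
for focus in ("beak-side", "ear-side"):
    print(perceive(attend(scene, focus)))
```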

While the original notion of encapsulation proposed by Fodor (1983) holds that "at least some of the background information at the subject's disposal is inaccessible to at least some of his perceptual mechanisms" (p. 66), Firestone and Scholl's (2016) notion of encapsulation is stricter, because they claim that the most fundamental distinction in cognitive science is "between seeing and thinking", and that "perception proceeds without any direct, unmediated influence from cognition" (p. 17).

The question of whether perception and cognition are neatly separable or whether the line between them is blurred, reflecting a continuum in information processing capacities, has been vigorously debated in cognitive science. The question has theoretical and experimental value, as well as clinical relevance because of its role in contemporary approaches to the formation and maintenance of delusions (Ross et al. 2016). Other developments in cognitive neuroscience largely support the view that processing in the human brain is distributed and context-sensitive (Hackel et al. 2016), that the brain is a predictive organ, and that information processing is predictive rather than merely integrative (Federmeier 2007; Friston 2012; Hoffmann and Falkenstein 2012; Van Petten and Luka 2012). This evidence comes from research on vision, attention, motor control, motor imagery, action understanding, music, language processing, emotional processing, executive functions, and theory of mind. It suggests that the brain uses the incoming stimuli, which are noisy and partial, to create highly structured models for making predictions, which are then compared against the input.

Regardless, it is worth considering Firestone and Scholl's (2016) approach to visual perception and their contribution to the debate on cognitive penetrability of perception, because their approach opens a path to an explanation of breaks in presence that emphasizes the role of attention in shifting between feeling present in the real and in a virtual environment. Specifically, on the assumption that digitally generated environments may compete for processing resources with participants' immediate environments, an attentional model of presence was developed. According to the model, attention is the most critical component of presence, because it determines whether participants' sensory perception is directed to mediated stimuli or to cues from the immediate environment (Draper et al. 1998; Wirth et al. 2007). While it is obvious that different cognitive resources play a role in our perception of the environment we are in, it has been pointed out that attention on its own is insufficient to evoke the sense of being located at a place different from the immediate environment, and that attentional resources can be exhausted without evoking the sense of presence (Hartmann et al. 2015). However, shifts in attention can explain breaks in presence.
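A minimal sketch of such an attentional model, assuming a single scalar attention share and an arbitrary 0.5 threshold; neither parameter comes from the cited models.

```python
def present(attention_to_mediated: float, threshold: float = 0.5) -> bool:
    """attention_to_mediated: share of attentional resources devoted to the
    virtual (mediated) stimuli, between 0.0 and 1.0."""
    return attention_to_mediated > threshold

# A cable brushing the leg briefly pulls attention to the immediate environment:
timeline = [0.9, 0.85, 0.8, 0.3, 0.4, 0.75, 0.9]
states = ["present" if present(a) else "break" for a in timeline]
print(states)  # ['present', 'present', 'present', 'break', 'break', 'present', 'present']
```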

1.3.3 Some Other Factors Affecting Presence

One widely accepted assumption is that participants in virtual environments process the sensory information using the mental models they use in everyday life (Slater and Usoh 1994). Anthropologists have long observed that people from different cultures belong to different sensory worlds (e.g. some cultures rely more on olfaction and touch than others), applying different sets of culturally molded sensory filters (Hall 1969). Such cross-cultural differences need to be considered in virtual reality design, because a specific sensory input may have different effects on participants who belong to different cultures.

There is also evidence indicating that cognitive style may influence the sense of presence (Sas et al. 2004). The concept of cognitive style originates from Jung's (1971) theory of psychological types and refers to enduring patterns of cognitive behavior. More specifically, cognitive styles are consistent modes of perceiving, remembering, judging, and problem solving that characterize an individual (Messick 1989). They are tightly linked to affective, temperamental and motivational aspects of personality structure. Implicit in these patterns are people's mental models of the world and their processing biases. Evidence from a virtual reality study suggests that people who are more introverted and sensitive often experience a higher degree of presence than extroverts (Sas et al. 2004). The study suggests that, depending on the type of task they want to complete in such environments, introverts may need additional support; since they appear to have a diminished level of spatial awareness, for instance, their performance on spatial tasks would benefit from additional cues, such as extra landmarks. In contrast, extraverted people do not need such support, because they appear to have an increased level of spatial awareness. While more research on the topic is necessary, the findings that cognitive styles affect presence and task performance in virtual reality indicate that the design of virtual environments needs to take into account the fact that people differ with regard to the patterns of cognitive behavior they adopt.

Not only do characteristics of participants' cognitive styles affect their experience of a virtual environment; personality traits of virtual characters may have that effect as well. A recent study investigated how personality traits in virtual characters affect participants' perception of uncanniness (Sect. 3.1.2). The study reported that the perception of personality traits indicating psychopathy, such as a virtual character's lack of a startle response to a scream, was a strong predictor of uncanniness, whereas perception of other negative personality traits not indicative of psychopathy was not (Tinwell et al. 2013). Stein and Ohler (2017) suggest including personality assessment variables as covariates when studying uncanny valley effects. Thus, the personality traits of virtual characters are an important consideration, because the feeling of uneasiness they may induce can affect presence and participants' task performance.

Although individual variation in human cognitive task performance as well as in brain anatomy and plasticity is a well-established fact, it is usually taken for granted that there is a single "typical" human cognitive architecture. This notion has recently been criticized, and it has been suggested that there may be more variability in the "mapping from brain function to mind" than previously recognized and that human cognitive architecture may "differ substantially across individuals" (Adolphs 2016, p. 1153). Relating this notion to presence, individual differences in susceptibility to a specific sensory modality (vision, touch, sound) affect the degree to which that modality may induce presence. Given the elusive nature of presence, one could argue that there may even exist differences in susceptibility to a specific sensory input within a single participant across different contexts. This once again highlights the importance of consistency of incoming sensory data, which has been emphasized as the primary requirement for presence (Slater 2003). Furthermore, not only differences in the processing of incoming sensory input, but also differences in participants' intentions may lead to differences in experiencing presence across participants in immersive virtual environments (Riva 2009).
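Stein and Ohler's covariate suggestion amounts to adding personality variables to the statistical model of presence or uncanniness ratings. A minimal sketch with simulated data; all variable names and numbers here are invented for illustration, not taken from the cited studies.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40
immersion = rng.uniform(0, 1, n)      # system immersion level (predictor of interest)
introversion = rng.uniform(0, 1, n)   # personality covariate
presence = 2.0 + 1.5 * immersion + 0.8 * introversion + rng.normal(0, 0.2, n)

# Ordinary least squares with the personality covariate included:
X = np.column_stack([np.ones(n), immersion, introversion])
coef, *_ = np.linalg.lstsq(X, presence, rcond=None)
print("intercept, immersion, introversion:", np.round(coef, 2))
```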
Another factor that may differentially influence participants' experience of presence and their task performance in virtual reality is gender. The notion that gender may play an important role in cognitive functions has received much attention in the past and continues to provoke debate. The idea that women in general outperform men in certain cognitive tasks, such as verbal learning, whereas men in general outperform women in other tasks, such as math and spatial reasoning, dominated psychological science in the 1960s and 1970s. Although no longer so much in the foreground, the debate on cognitive gender differences continues, generating further evidence on female versus male advantages for cognitive tasks (e.g. Kimura and Clarke 2002). On the other hand, the gender similarity hypothesis posits that males and females are similar in most cognitive abilities, including language and math, with large differences apparently existing in motor abilities, sexual behavior, and aggression (Hyde 2005, 2016). Importantly, the question of whether gender differences exist in experiencing presence and task performance in virtual reality has not been sufficiently addressed so far. Some scholars argue that studies on virtual reality need to control for gender, pointing out differential (female > male) susceptibility to cybersickness and the impact of sex hormones on cognition (Larson et al. 1999; Parsons et al. 2004). For example, supporting the view on gender differences in spatial processing, an early study on mental rotation and spatial rotation in a semi-immersive environment reported that male participants performed considerably better on the traditional paper-and-pencil mental rotation task, but that there were no statistically significant gender differences in participants' performance on the virtual reality spatial rotation task (Larson et al. 1999). The results were replicated in a later study by the same group (Parsons et al. 2004). Importantly, the two tasks differed in only one aspect: unlike the paper-and-pencil version of the test, which required mental rotation, the virtual reality spatial task allowed manual manipulation of the stimuli. Since the two tasks were not equivalent across the two environments, it is difficult to compare the results directly, although the study remains informative regarding gender differences. Coutrot et al. (2018) also found a male advantage in navigation, which was more prominent in a virtual than in a real environment. Clearly, the impact of gender on presence and task performance in virtual environments is an important issue, and it is disappointing that some recent studies that used sophisticated methods such as functional magnetic resonance imaging (fMRI) for assessment of spatial abilities in virtual reality included only male participants (e.g. Mellet et al. 2010). By not including female participants, such studies miss the opportunity to contribute potentially invaluable evidence regarding this largely neglected factor, which may confound cognitive results and lead to a distorted picture of participants' performance in virtual reality and their sense of presence.

1.3.4 Lost and Inverse

Another perspective in defining presence is to determine what it is not. For instance, to define presence negatively, Slater and Usoh (1994) suggest as its key feature the absence of the sense of location, "such that 'no presence' is equated with no locality, the sense of where self is as being always in flux" (p. 130). To illustrate the point, they refer to Oliver Sacks's patient who lost the ability to retain present-day memories as a case of neurological loss of presence. Briefly, this patient would forget with whom he was talking and what the topic of conversation was, losing the context every few moments. "Imagine", write Slater and Usoh, "a VR system that continuously and randomly changed the environment, so that the human participant could form no stable sense of locality, and no relationship with any object: everything being continually in flux. Such an environment would not be presence inducing" (p. 130). This nicely reiterates the danger of breaks in presence, the critical roles of flow and continuity in the feeling of being in one location, and the importance of a stable relationship between the self and the surrounding objects.

Finally, an experience related to presence is inverse presence. While presence pertains to mediated experiences that involve an illusion of being nonmediated, the concept of inverse presence refers to experiences that are not mediated and yet involve an illusion of being mediated (Timmins and Lombard 2005). Experiences of inverse presence include, for instance, the feeling of being within a video game while participating in real-world activities. Such experiences involve a degree of disbelief and are usually associated with descriptions such as "I thought I was watching a movie", "I couldn't believe my eyes", and questions such as "Am I really here?"

The inverse-presence experience allows the person to pretend, at least at some level, that the event is not real and not "really" happening; because the event seems like a mediated experience that is therefore not real, it can serve to distance the person from, and help him or her cope with, the unpleasant reality. (Timmins and Lombard 2005, p. 498)

While one common feature of presence and inverse presence is a distorted picture of the role of the medium in the experience, an important difference is that they appear in opposite contexts (mediated, not mediated) and involve different cognitive mechanisms. Presence requires suspension of disbelief, while inverse presence requires a degree of disbelief. Presence hinges on the ability to shut out the cues to the external environment, whereas inverse presence requires a sustained effort to focus on a selection of cues from the external environment and enhance them along some personally relevant dimension that regulates the way in which an individual experiences the outside world.

1.4 Physicality Expectations and Pathways to Illusion

Efforts to develop virtual reality environments that elicit presence have focused more on issues related to location than on action, revealing a need for a richer physical experience of immersive virtual environments (Giannopoulos et al. 2011). Researchers often point to examples such as hitting a wall, picking up an object, or shaking somebody's hand in virtual environments. To be believable, such actions require a degree of physicality. If the expected physical feedback is lacking, participants "feel nothing" and such actions and situations are not believable. Currently available virtual reality technology does not generally include systems with generalized haptics, and if the participant touches an object at random, they will feel nothing:

The whole aspect of physicality is typically missing from virtual environment experiences—collisions do not typically result in haptic or even auditory sensations. (Slater 2009, p. 3550)
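To see what even minimal physicality involves at the systems level, consider the event wiring that the missing feedback in Slater's example would require. The Python sketch below is illustrative only; the haptics and audio objects and the event fields are hypothetical, not part of any particular VR SDK.

def on_collision(event, haptics, audio):
    # Minimal collision handler: without calls like these, touching a
    # virtual object yields the "feel nothing" experience described above.
    intensity = min(1.0, event.relative_speed / 5.0)  # normalize impact to [0, 1]
    # Brief vibrotactile pulse on the colliding hand, scaled with impact.
    haptics.pulse(hand=event.hand, amplitude=intensity, duration_ms=30)
    # Matching impact sound, spatialized at the contact point.
    audio.play("impact", volume=intensity, position=event.contact_point)

Generalized haptics is hard precisely because this wiring must hold for every surface, texture and grip, not only for scripted collisions.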

In an immersive virtual reality study of interpersonal distance in which participants interacted with two types of virtual characters, embodied agents and avatars, the lack of physical contact between virtual characters was experienced as a major disadvantage of using virtual environments (Bailenson et al. 2003). Receiving haptic feedback when jointly manipulating an object in an immersive virtual environment increases the sense of social presence (Oh et al. 2018). In general, improved haptics would ensure a more realistic experience, regardless of whether a virtual environment is intended for training, therapy, or other activities that involve physical contact. Feeling the texture and weight of objects in a virtual environment, or the strength of a grip and the trajectory of movements while shaking hands with other virtual characters, is just as important as visual and auditory cues (Slater 2009). In an immersive virtual environment real sensory data are replaced by computer-generated data (Slater 2008), yet a certain physicality is necessary. How is this to be achieved?

1.4.1 Virtual Embodiment

Virtual embodiment is the process of identifying with a virtual (computer-generated) body, i.e. an avatar, which is a digital proxy of the participant in a virtual environment. An avatar represents the participant's physical body in a virtual environment so that, although located in a different environment (e.g. a laboratory), the participant acts in the virtual environment by controlling his/her avatar from this other location. Virtual embodiment may also refer to the end point of the process of identifying with a virtual body, whereby the participant experiences an illusion of ownership over the virtual body. This gives the concept of the avatar a unique role in virtual environments. Avatars are often conceptualized as "virtual replicas" or digital alter egos of the participant's physical self. According to one definition, avatars are "the direct extensions of ourselves into the virtual domain"; they are "the digital representations tightly bound to our embodied self, our self perception and our personality" (Waltemate et al. 2018, p. 2). Their function is to enable interaction between the participant and the virtual environment, other virtual characters, and objects encountered in the virtual world. It is not surprising, then, that an avatar's appearance and behavior cause a range of psychophysical effects in the participants they represent as well as in other participants with whose avatars they interact.

The illusion of ownership over a body part or over a whole body has been induced in immersive virtual environments, for instance, by using synchronized visuo-tactile stimulation, as in the rubber hand illusion. This mapping of the participant's body schema onto his/her avatar takes place via afferent or sensory signal correspondences (Bailey et al. 2016). In addition, the mapping can be realized via sensorimotor correspondences between the physical body and the virtual body. Thus, virtual reality affords different ways in which the body ownership illusion may be induced, involving bottom-up factors, which are related to multisensory integration (visual, motor and tactile stimulation), and top-down factors, which are related to conceptual aspects of the experience, i.e. interpretation of the observed virtual body (e.g. its appearance) (Waltemate et al. 2018).

A concept that plays an important role in the understanding of virtual embodiment is embodied or grounded cognition.7 It assumes that cognition is not limited to what is inside the skull, but is grounded in the body and its relationship to the environment:

… the cognitive system utilizes the environment and the body as external information structures that complement internal representations. In turn, internal representations have a situated character, implemented via simulations in the brain's modal systems, making them well suited for interfacing with external structures (Barsalou 2010, p. 717).

Applying this notion to virtual environments, one may wonder how mental representations of objects, events, characters and their interactions in immersive virtual environments are grounded. It emphasizes even more the role of the physical body in virtual reality and the complexity with which this role is projected into a computer-generated space, given that bodily feedback is mediated and thus limited, or sometimes completely lacking, in such environments. An additional level of complexity is involved when virtual bodies offer affordances different from those of the human body, such as an avatar with a tail (Steptoe et al. 2013) or an avatar with three arms (Won et al. 2015). This opens the intriguing question of how far virtual embodiment can go and still allow mapping of virtual objects onto the human body schema. Steptoe et al. (2013) introduced the concept of "extended humanoid avatars" to refer to avatars with fundamentally human form, but with some additional features. In their study, avatars had a movable, long tail, extending 0.5 m beyond each arm. The study involved an immersive CAVE-type virtual environment with 32 participants, whose movements were tracked and whose avatars reflected their movements. One half of the participants reported that the avatar's tail moved in a random and asynchronous way relative to their movements, whereas the other half reported that the tail moved synchronously and that they could control it by hip movements. Moreover, the participants who controlled the avatar's tail movements experienced anxiety when faced with a virtual threat to the tail. This finding is interesting because it indicates that, regardless of the appearance of the avatar's body, these participants were able to experience body ownership, confirming once again the flexibility of the human body schema.

7 The term grounded cognition is sometimes used interchangeably with the term embodied cognition. However, the term embodied cognition overemphasizes the role of the body in grounding, whereas the physical and the social environments are equally important grounding mechanisms (Barsalou 2008, 2010).

Avatars are used to visually represent participants in virtual environments. In shared and mixed reality environments, these are interconnected participants. Importantly, virtual reality affords an unprecedented range of possibilities for extending and radically changing our virtual bodies. In addition to avatars' bodies that differ considerably from the human body matrix and yet are evolutionarily plausible, other types of virtual bodies have also been used to investigate body ownership in virtual reality, including bodies with additional limbs or bodies whose structure has no basis in evolution (Bailey et al. 2016). Although virtual, these bodies are still perceptually real (Steptoe et al. 2013). The sense of ownership over the virtual body and the sense of control over the virtual body's actions have been reported for human-like and other types of avatars. However, regardless of the sense of ownership over the virtual body and the sense of control over its movements, the process of motor prediction differs between a virtual body that has a counterpart in the physical world and one that does not (Steptoe et al. 2013). Consider as an example an avatar with a tail. Although an intention to move the avatar's tail is followed by an efferent signal to the hip muscles, which perform the movement, and an afferent proprioceptive signal is sent to the brain, and although the visual signal confirms the hip movement as well as the presence and the movement of the tail, the tail's movement itself cannot be felt. This is what sets the avatar-tail experience apart from the normal body movement experience and places it closer to a phantom limb experience. While in phantom limb experiences people can sense but cannot see the movement, in the avatar tail experience participants can see but cannot sense the movement. The dominance of visual feedback in the case of an avatar with a tail may attenuate, but cannot compensate for, the lack of feedback on how the tail's movement feels. Steptoe et al.'s observation is significant because it suggests a clear demarcation between virtual embodiment based on human-like and on extended humanoid avatars. The difference has implications for the design of virtual reality applications for treatment and training as well as entertainment. On a theoretical level, it suggests the importance of studying the effects of virtual embodiment. For instance, one relevant question is: Can a movement that is not felt still be believable? Apparently, even if it is not felt, as long as the movement can be seen and is synchronized with the participant's movements, it is believable. And being believable is what counts most in this context, or what makes virtual reality feel real (Brooks 2003). Research evidence converges in indicating that the first-person perspective, visuo-tactile and visuo-motor synchrony, and sensory feedback are critical in inducing virtual embodiment, and that even just observing a virtual body from the first-person perspective leads to virtual embodiment (Slater et al. 2010). Assuming that participants in immersive virtual environments can learn how to control their virtual bodies, Won et al. (2015) investigated embodiment of an avatar designed with affordances for tasks not available to the human body.
One question in the study was whether participants would use the avatar's body in a way that is optimal for accomplishing the task, thereby modifying the body schema, and another was whether they would benefit from the affordances for task completion available to a humanoid avatar body extended by a longer, third arm. While participants were able to adapt relatively quickly (~10 min) to the movement afforded by the new avatar, there were no statistically significant differences in task performance between participants embodying the avatar with the longer third arm and those in the normal condition. The authors argue that it may not be the human/non-human-like appearance of an avatar, but rather the control of action, that is critical for virtual embodiment. This study provides further evidence that there is flexibility in the embodiment of virtual bodies, and that the body can learn to remap movements from the real world onto movements of a virtual body despite the structural differences (a movement-remapping sketch follows below).

To suggest the transformative effect of virtual embodiment, Slater et al. (2010) use the term "body transfer", which emphasizes an aspect of the participant's sense of embodiment, or body ownership over an immersive virtual reality avatar. However, body transfer implies leaving the body behind, whereas the main feature of virtual embodiment is the extension of corporeal awareness to incorporate a virtual body into an already existing body schema. Regardless, they demonstrate that a first-person perspective, in addition to visuo-tactile synchrony, allows body ownership over a virtual body of the opposite sex. Previous studies, which manipulated either perspective or movement or touch, suggested that the first-person perspective and simultaneous touch were associated with ownership over a virtual body; this study, which manipulated all three factors, further indicates that perspective is the predominant explanatory factor for subjective and objective measures of body ownership. What is interesting about this study is that it generated the sense of being in a body that was not only physically different in size from the participants' physical body (smaller), but also of a different sex (female). The authors argue that using bottom-up sensory stimuli, as in their study, which used visual, tactile, vestibular and proprioceptive stimuli, sends a perceptually stronger signal of body ownership than using top-down perceptual signals, such as visuo-tactile integration, as previously argued for the illusion of ownership over a body part (e.g. Botvinick and Cohen 1998). They further argue that ownership over a rubber hand, which is an illusion based on top-down visuo-tactile integration, differs from the whole-body ownership illusion, which arises from a bottom-up perceptual signal, in the sense that the rubber hand illusion indicates only a point of proprioceptive and tactile sensation, and not necessarily body ownership.

Taken together, these studies suggest that participants can extend the sense of body ownership to avatars and learn to control them even when the avatar differs from them in age, gender, race, or physical body size, and even when the avatar's body structurally differs from the physical body. Certain limitations to virtual embodiment, however, apply. For instance, extending the length of an avatar's arm far into extrapersonal space breaks the illusion of arm extension, which is expected given that the human arm's functionality is limited to peripersonal space (e.g. Kilteni et al. 2012). Remarkably, changing the appearance and affordances of a participant's virtual body changes the sense of self (Biocca 1997; Won et al. 2015), and it may affect the participant's behavior not only in virtual reality, but also after they leave the virtual environment.
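The movement remapping at issue in the third-arm study can be sketched in a few lines of Python. The mapping below (two tracked wrist rotations steering the extra limb) is an illustrative assumption introduced here; Won et al. (2015) used their own control scheme, and the point is only that any consistent, learnable mapping from real movement to the extra limb can support control.

import math

def third_arm_target(left_wrist_roll, right_wrist_roll, reach=1.5):
    # Map two tracked wrist rotations (radians) onto the tip position of a
    # third virtual arm with the given reach (meters).
    yaw = left_wrist_roll     # horizontal direction of the third arm
    pitch = right_wrist_roll  # vertical direction of the third arm
    x = reach * math.cos(pitch) * math.sin(yaw)
    y = reach * math.sin(pitch)
    z = reach * math.cos(pitch) * math.cos(yaw)
    return (x, y, z)  # target for the third hand, in body-centered coordinates

print(third_arm_target(0.0, 0.0))  # (0.0, 0.0, 1.5): arm pointing straight ahead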
Consider the Proteus effect, a phenomenon observed in virtual reality participants when they adjust their behavior and attitudes according to what they infer is expected of them, based on the appearance of their avatars (Yee and Bailenson 2009). For instance, choosing an avatar that is taller and more attractive than the participant may affect the participant's performance and interaction with others in the virtual world, with the effect transferring even to the physical world. This transfer of adjusted behavior and attitudes to situations in which participants find themselves after leaving the virtual environment is intriguing and perhaps insufficiently explored so far with regard to some important issues regarding interaction with others (e.g. violence, sexuality). The Proteus effect is grounded in self-perception theory and the phenomenon known as behavioral confirmation. In an influential study on social stereotypes, which investigated the effects of social perception on social interaction, Snyder et al. (1977) put forth the idea of behavioral confirmation. They proposed that an initial perception of a person guides social interaction so that the person in fact behaves in the expected way and fulfills the perceiver's initial impression of him/her. Thus, using our social perceptions as guides in interactions with others, we may constrain their behavior. An often cited example is the behavioral confirmation of the stereotype of physical attractiveness in dyadic social interactions, where the perceiver expects an attractive person to behave in a friendly, outgoing and charming way, and the target person behaves towards the perceiver in a fashion perfectly consistent with those expectations. The power of behavioral confirmation is such that the same person may be led to behave in an opposite, yet again consistent, trait-like manner towards another perceiver whose impressions of the target person are opposite. Thus, choosing an avatar may have consequences beyond participants' expectations.

While choosing an avatar may have consequences not anticipated by the participant, the experience of having a body in an immersive virtual environment is beneficial for interaction tasks, the feeling of presence, perceptual judgment, distance estimation, and spatial awareness (Diersch and Wolbers 2019). Currently available consumer virtual reality systems usually provide only a partial representation of the participant in a virtual environment, and only a few offer a full-body representation. Head-mounted systems without a full-body avatar representation typically result in a sense of disembodiment. Pan and Steed (2017) investigated how a self-avatar affects collaboration in shared virtual environments, manipulating the level of embodiment (no avatar vs. self-avatar) and comparing the results with the participants' face-to-face performance on the task. The main findings of their study are that participants completed the task faster when they cooperated than when they competed in the self-avatar and face-to-face conditions, but not in the no-avatar condition, and that participants' scores for trust formation were higher in the former two conditions than in the latter. These findings indicate the importance of the self-avatar for immersive virtual environments. As pointed out by Pan and Steed (2017), given the benefits of having a full-body self-avatar representation, adding a self-avatar to commercial head-mounted display systems would be an important improvement to participants' experience of shared virtual environments.

Among the factors that promote embodiment effects in virtual environments, Waltemate et al. (2018) investigated how avatar personalization and individualization affect body ownership, presence and emotional response in an immersive virtual environment. The level of immersion was manipulated by using a virtual-mirror metaphor with two setups, a CAVE-type setup and a head-mounted display. A simulated mirror thus reflected participants' avatars, allowing inspection of the avatar's full body as well as its face, regardless of the degree of immersion. Participants' avatars varied in look between a generic hand-modeled version, a generic scanned version, and an individualized scanned version. The authors reported that participants with personalized avatars experienced stronger feelings of body ownership, presence and dominance relative to participants represented by the other two (generic) types of avatars, and that a higher degree of immersion was associated with a greater sense of body ownership, control of the avatar's movements, and the feeling of presence. Thus, immersive virtual reality affords new ways of interacting with an environment and new ways of experiencing one's own body. This realization has led some researchers to investigate the ways in which mental representations of mediated interactions are rooted in the body, and the potential effects of virtual embodiment on participants' behaviors and attitudes in virtual environments and beyond.

1.4.2 Feeling What is not There

In phantom limb experiences people can feel, but cannot see, the movement of the amputated limb (the limb is not there). In virtual environments participants can see, but cannot feel, the movement of a body part of their avatar that is not part of the human body (e.g. a tail, a third arm). Yet participants experience a sense of ownership over such virtual bodies. How are these illusions induced?

Psychologists who study illusions argue that such errors can reveal interesting facts about perception and thus represent a window into the human mind/brain, possibly reflecting our evolutionary development (Hara et al. 2015). Illusions implicating vision, touch and proprioception are of specific interest for research on virtual reality, because they may lead to a better understanding of perception in virtual environments. In addition, illusions involving ownership of body parts as well as whole-body ownership in virtual environments may lead to a better general understanding of bodily self-identification. Evidence on such illusions converges with observations on neurological patients and findings from phantom-limb studies, suggesting that the body representation is not rigidly fixed (Berlucchi and Aglioti 1997).

Although body ownership illusions have become an object of research interest only relatively recently, an early anecdotal account of a limb ownership illusion goes back to 1937 and the work of Tastevin (Slater et al. 2009; Kilteni et al. 2014). A seminal study on the rubber hand illusion (Botvinick and Cohen 1998) showed that synchronized stroking of a participant's hidden hand and a rubber hand that was visible to the participant induced an illusion that the rubber hand had sensed the touch. This amounts to a false perception regarding body ownership. In contrast, asynchronous stroking did not elicit the illusion. Furthermore, the study reported a proprioceptive drift, a tendency to inaccurately localize one's own hand: when participants were required to indicate the position of the real hand after misattributing the rubber hand to the self, their estimates drifted towards the rubber hand.

The rubber hand illusion is elicited by a conflict in multisensory information processing that is resolved in favor of the apparently predominant visual signal over the sense of the hand's position (proprioception). With the onset of the illusion, within about 11–15 s of the rubber hand stroking,8 the hand-centered frame of reference shifts from the real hand, which is hidden, to the rubber hand, which is visible to the participant. Ehrsson et al. (2004) argue that this recalibration of limb position may be a key mechanism enabling the elicitation of the illusion, and that the drift towards the rubber hand happens because the brain recalculates the position of the real hand on the basis of the position of the rubber hand. Furthermore, a threat to the rubber hand provokes anxiety and an urge to withdraw it, precisely as when the real hand is under threat, with brain activity increasing in areas associated with pain and anxiety (insula and anterior cingulate cortex) (Ehrsson et al. 2007). This has been interpreted as neurophysiological evidence that the brain represents the rubber hand as part of the body schema. In contrast to the illusion of ownership over a fake hand, consider somatoparaphrenia, a disorder characterized by the false belief that a real limb does not belong to one's own body (Vallar and Ronchi 2009).

Of note, neurocognitively normal people who experience the rubber hand illusion never report the sense of having three arms (Moseley et al. 2012). This indicates that the body matrix incorporates the rubber hand and temporarily disowns the actual hand. This hypothesis has been confirmed by observations of a drop in temperature in the disowned hand. Importantly, unlike the disowned hand, the opposite hand and the ipsilateral leg did not undergo a change in temperature. According to Moseley et al. (2012), the drop in temperature is due to "the dynamic changes in neural activity across the body matrix", where the body matrix is "a coarse representation of the body and the space around it" (p. 41) (Fig. 1.1). Specifically, visuo-tactile synchronization triggers an increase in neural activation in the representation of the space occupied by the rubber hand and a decrease in neural activation in the representation of the space occupied by the actual hand that is being disowned due to the rubber hand illusion.

In addition to incoming multisensory data and the space around the body, prior knowledge about the human body form also plays a major role in determining body ownership. This is indicated by the fact that, to induce the rubber hand illusion, the fake arm needs to look like a real arm and be oriented and aligned like the real arm (Slater et al. 2009). It also needs to be positioned in the proximity of the real arm, and its length cannot be arbitrary, since it operates in peripersonal space.

8 It has been estimated that the rubber hand illusion can be successfully generated in about 80% of people within 15 s of stimulation, if the rubber hand is placed within about 15–18 cm of the real hand (Lloyd 2007).
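The proprioceptive drift reported in these studies has a simple operational definition: the shift of the participant's localization estimate toward the fake hand after stimulation. The Python helper below is a minimal sketch of that computation; the measure itself is standard in the literature, but the code and its example values are assumptions made here.

def proprioceptive_drift(pre_estimate, post_estimate, real_x, rubber_x):
    # Drift of the felt hand position toward the rubber hand, in cm.
    # pre/post_estimate: where the participant points to localize the hidden
    # hand before and after stroking; real_x and rubber_x are the actual
    # x-positions of the real and rubber hands. Positive = toward the rubber hand.
    toward_rubber = 1.0 if rubber_x > real_x else -1.0
    return (post_estimate - pre_estimate) * toward_rubber

# Example: rubber hand placed 17 cm to the right of the real hand (within the
# 15-18 cm range of footnote 8); the estimate moves 3 cm toward it.
print(proprioceptive_drift(pre_estimate=0.0, post_estimate=3.0,
                           real_x=0.0, rubber_x=17.0))  # 3.0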

Fig. 1.1 The body matrix. Cortical areas supporting the body matrix, including the areas involved in ownership over the rubber hand (grey), in somatotopic representation of the body (blue) and in body-centered spatial representation (green) (Moseley et al. 2012). Reproduced with permission from Elsevier

Ownership over a rubber hand (Botvinick and Cohen 1998) has been replicated in virtual reality using a three-dimensional arm and hand, with a virtual ball for stimulation of the virtual hand (Slater 2008). The virtual arm illusion requires synchronization and unification of information coming from different domains. It can be induced via visuo-tactile synchrony; via visuo-motor synchrony, where there is no tactile stimulation but instead self-induced motor activity such as movements of the fingers and hand; and via a brain-computer interface, where there is no tactile or motor stimulation, but a cued motor imagery paradigm is used instead. Thus, seamless integration of multisensory information is always necessary to induce the illusion. Apart from the virtual limb illusion, a whole virtual body illusion has also been generated in virtual environments (Petkova and Ehrsson 2008; Slater 2008; Slater et al. 2009; Sanchez-Vives et al. 2010), indicating that these types of illusions are easily replicated in virtual reality (Normand et al. 2011). Critical in inducing the virtual hand illusion, for instance, is synchrony between visual and motor actions of the real and virtual hands; in the whole-body illusion, in addition to synchronous visuo-tactile stimulation, a first-person perspective is necessary. These illusions cannot be produced with asynchronous stimulation.
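Since all of these induction methods depend on seamless multisensory integration, the core engineering constraint is temporal coincidence: the seen event and the felt (or self-generated) event must fall within a tight window. A toy sketch in Python, where the 100 ms window is an illustrative assumption rather than a published threshold:

def synchronous(seen_ts, felt_ts, window_s=0.1):
    # Treat a seen touch and a felt touch as one multisensory event if their
    # timestamps (in seconds) coincide within the window. Large offsets
    # correspond to the asynchronous control conditions that abolish the illusion.
    return abs(seen_ts - felt_ts) <= window_s

print(synchronous(10.02, 10.05))  # True: candidate for inducing ownership
print(synchronous(10.02, 10.60))  # False: asynchronous stimulation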

Furthermore, body space has been manipulated in virtual reality to induce illusions of different size or shape of one's body. For instance, the very long arm illusion is characterized by ownership of a virtual arm that is up to three times the length of the person's real arm (Kilteni et al. 2012). While the illusion of a shrinking waist has been demonstrated using fMRI (Ehrsson et al. 2005), the opposite, larger-belly illusion was generated in virtual reality (Normand et al. 2011). Besides bodily features that are susceptible to change in the course of life, such as belly size, some more rigidly fixed features of one's body are also susceptible to the body ownership illusion. For instance, a person's sex has been manipulated in a whole-body ownership illusion, inducing the illusion of having a female body in men (Slater et al. 2010). Here, too, the first-person perspective and synchronous multisensory (visuo-tactile) stimulation were critical in generating the illusion. These findings have important implications for debates revolving around questions such as: Does the form of the human body need to be maintained in virtual reality? If the way the brain represents the body depends both on prior knowledge about the human body and on incoming multisensory information, then a certain balance between the two sources of information must be maintained to incorporate a virtual representation into the participant's body schema. As studies on the virtual hand illusion show, the main features of the actual body are critical in deceiving the perceptual system with a virtual hand. For the illusion to succeed, there needs to be a sense of body continuity, i.e. the visual experience that the virtual hand belongs to the body. Any anatomically implausible alignment between the virtual hand and the real hand or the rest of the body, as well as deviations from the real, such as a virtual arm length that makes it difficult to see the hand's details or texture, affects the illusion (Normand et al. 2011; Kilteni et al. 2012; Perez-Marcos et al. 2012). Thus, expectations of the physicality of the body that characterizes normal human experience extend to virtual environments, but they can be modified to some extent.
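A design-time corollary of these limits is a plausibility check on body manipulations: scale within the range the literature suggests can sustain ownership, and flag manipulations beyond it. The Python function below is hypothetical; only the three-arm-lengths figure comes from the Kilteni et al. (2012) result cited above.

def scaled_arm_plausible(real_arm_m, scale, max_scale=3.0):
    # Return (plausible?, virtual arm length in meters). Kilteni et al. (2012)
    # report ownership for virtual arms up to roughly three times real length;
    # beyond that the arm-extension illusion tends to break.
    virtual_len = real_arm_m * scale
    return scale <= max_scale, virtual_len

print(scaled_arm_plausible(real_arm_m=0.7, scale=2.5))  # (True, 1.75)
print(scaled_arm_plausible(real_arm_m=0.7, scale=4.0))  # (False, ~2.8)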

1.5 Consensual Illusion

Ultimately, virtual reality is about making systems that fool the human senses (Krueger 1994, p. xii).

Virtual reality aims not at simulating reality but producing illusions (Slater 2014, p. 2).

We have all experienced situations in which our senses deceived us. Virtual reality operates on the premise that the mind can be fooled into perceiving computer-generated environments as real. Since the early days of virtual reality, the nature of experience in computer-generated environments has been labeled with various expressions, one of which is consensual hallucination, introduced by William Gibson (Markley 1996). However, virtual reality experiences are perhaps better captured by the expression consensual illusion, because of the clinical implications of the word "hallucination":

A hallucination is a false perception seen by one person, often drugged or mentally ill, in the absence of stimulation, whereas an illusion is a misinterpretation of a stimulus consistently experienced by most normal people (Anstis 1999, p. 385).

Given that illusion is a matter of perception, on the one hand, and that a consensus requires a knowledge-based decision, a sort of cognitive action, on the other, the term consensual illusion is theoretically provocative in the context of virtual reality, implying the view that perception is cognitively penetrable (Sect. 1.3.2). Importantly, the term captures the essence of the experience of being present in an immersive virtual environment, suggesting that even though virtual reality is positioned further down on the reality continuum, it is still in the realm of the psychologically normal, and as such it differs from experiences of "another kind of reality" found in patients with psychiatric conditions.

In addition to distinguishing between hallucinations and illusions, which result from perceptual errors, the term delusion is sometimes reserved for errors in beliefs and reasoning, i.e. for errors of cognition. However, equating delusions with false beliefs is an oversimplification; delusions imply "a significant lack of, or distortion in, understanding of one's situation" (Perring 1999). So while experiences in fully immersive virtual reality environments trigger suspension of disbelief, they do not trigger errors in belief such as delusions, because participants always know that the projected environment is not real. They are aware of its manipulability and that it can change at will (Kreitler 1996). In fact, Slater (2009) characterizes the experience that "what is apparently happening [in a virtual environment] is really happening (even though you know for sure that it is not)" as "plausibility illusion" (p. 3553). In contrast, patients who suffer from delusions related to severe mental disorders, such as schizophrenia,9 do not realize that "what is apparently happening" is not actually happening and that their experiences are the result of distorted thinking and errors in their belief system (Fourneret et al. 2001).

It might seem that the reality continuum redefined in the context of virtual reality technology has something in common with experiences of reality in patients who are further down on the mental health-illness continuum. However, what psychiatric patients who suffer from delusions have in common with participants in virtual reality is only that they experience an alternative, highly believable reality. Yet participants immersed in a virtual environment retain a degree of awareness about the immersion; they consent to participating in it and to experiencing what they encounter there. In psychiatric illnesses such as schizophrenia, patients are not aware of their departure from reality caused by the illness; there is no consent to being mentally ill or to participating in that alternative world. In addition, the experience of being in a technology-mediated environment may be easily disrupted and then easily resumed. Unlike the ease of shifting between being in and out of a virtual reality world, delusion-induced experiences are rigidly locked. This reflects the difference between errors of perception (i.e. the illusion of being there) and errors of cognition (e.g. errors in understanding one's own situation and deviations within the belief system).

Furthermore, with virtual reality technology the illusion of an alternative reality is achieved by using computers to synthesize stimuli for the human senses, which means that the process is carefully controlled and guided by the design of a specific application. It is intended as such, and participants can have certain expectations and predictions about it. Obviously, this is not the case with illnesses such as schizophrenia. Finally, illusions are public and repeatable, because a certain type of stimulus typically results in specific misperceptions or incorrect interpretations, whereas delusions are a private type of derangement (Blackburn 1994); they are not deliberate or controlled by the patient, who is not a consenting participant, a subject, but rather an object of illness.

Thus, virtual reality departs from physical reality because of illusory transformations in virtual environments, including not only the illusion of being elsewhere instead of at the location of one's own physical body, but also illusions related to changes of some aspects of the body and illusory transformations of the whole body. These changes may lead, at least temporarily, to changes in participants' attitudes and behaviors, sometimes extending beyond the virtual to real-life situations. Participants may or may not be aware of such changes. The power of virtual reality lies in evoking such responses, despite the limitations of current virtual reality technology.

9 Schizophrenia is a severe psychiatric disorder characterized by positive and negative symptoms. Positive symptoms include thought disorder, hallucinations and delusions, while negative symptoms include flattening of affect (Jahanshahi and Frith 1998).

References

Adolphs, R.: Human lesion studies in the 21st century. Neuron 90, 1151–1153 (2016)
Anstis, S.: Illusions. In: Wilson, R.A., Keil, F.C. (eds.) The MIT Encyclopedia of the Cognitive Sciences, pp. 385–387. MIT Press, Cambridge (1999)
Bailenson, J.N., Blascovich, J., Beall, A.C., Loomis, J.M.: Interpersonal distance in immersive virtual environments. Pers. Soc. Psychol. Bull. 29, 1–15 (2003)
Bailey, J.O., Bailenson, J.N., Casasanto, D.: When does virtual embodiment change our minds? Presence 25, 222–233 (2016)
Barsalou, L.W.: Grounded cognition. Annu. Rev. Psychol. 59, 617–645 (2008)
Barsalou, L.W.: Grounded cognition: past, present, and future. Top. Cogn. Sci. 2, 716–724 (2010)
Beck, D.M., Clevenger, J.: The folly of boxology. Behav. Brain Sci. 39, 20–21 (2016)
Berlucchi, G., Aglioti, S.: The body in the brain: neural basis of corporeal awareness. Trends Neurosci. 20, 560–564 (1997)
Blackburn, S.: The Oxford Dictionary of Philosophy. Oxford University Press, Oxford (1994)
Boden, M.: Mind as Machine. A History of Cognitive Science. Clarendon Press, Oxford (2006)
Bostrom, N.: Superintelligence. Paths, Dangers, Strategies. Oxford University Press, Oxford (2014)
Botvinick, M., Cohen, J.: Rubber hands 'feel' touch that eyes see. Nature 391, 756 (1998)
Brooks, F.P., Jr.: What's real about virtual reality? IEEE Comput. Graph. Appl. 19, 16–27 (1999). https://doi.org/10.1109/38.799723
Brooks, K.: There is Nothing Virtual About Immersion: Narrative Immersion for VR and Other Interfaces (2003). http://alumni.media.mit.edu/~brooks/storybiz/immersiveNotVirtual.pdf (retrieved March 2017)
Burdea, G., Coiffet, P.: Virtual Reality Technology. Wiley, New York (1994)
Chalmers, D.: The virtual and the real. Disputatio 9(46), 309–352 (2017)
Clark, A.: Natural-Born Cyborgs. Oxford University Press, Oxford (2003)
Clark, A.: Supersizing the Mind. Oxford University Press, Oxford (2008)
Clark, A., Chalmers, D.: The extended mind. Analysis 58, 7–19 (1998)
Coutrot, A., Schmidt, S., Pittman, J., Hong, L., Weiner, J.M., et al.: Virtual navigation tested on a mobile app is predictive of real world navigation performance: preliminary data. BioRxiv (2018)
Cutler, A., Norris, C.: Bottoms up! How top-down pitfalls ensnare speech perception researchers, too. Behav. Brain Sci. 39, 25–26 (2016)
Dawkins, R.: The Selfish Gene. Oxford University Press, Oxford (1989)
Dennett, D.: Where am I? In: Hofstadter, D.R., Dennett, D.C. (eds.) The Mind's I, pp. 217–229. Basic Books Inc., Publishers, New York (2000)
D'Huart, D.M.: From reality to "the real". Using augmented virtual reality for training. In: Riegler, A., Peschl, M.F., Edlinger, G.F., Feigl, W. (eds.) Virtual Reality. Cognitive Foundations, Technological Issues & Philosophical Implications, pp. 129–139. Peter Lang Publishing, Frankfurt (2001)
Diersch, N., Wolbers, T.: The potential of virtual reality for spatial navigation research across the adult life span. J. Exp. Biol. 222, 1–9 (2019)
Draper, J.V., Kaber, D.B., Usher, J.M.: Telepresence. Hum. Factors 40(3), 354–375 (1998)
Ehrsson, H.H., Spence, C., Passingham, R.E.: That's my hand! Activity in premotor cortex reflects feeling of ownership of a limb. Science 305, 875–877 (2004)
Ehrsson, H., Kito, T., Sadato, N., Passingham, R., Naito, E.: Neural substrate of body size: illusory feeling of shrinking of the waist. PLoS Biol. 3, e412 (2005)
Ehrsson, H.H., Wiech, K., Weiskopf, N., Dolan, R.J., Passingham, R.E.: Threatening a rubber hand that you feel is yours elicits a cortical anxiety response. PNAS 104, 9828–9833 (2007)
Ellis, S.R.: Nature and origin of virtual environments. A bibliographic essay. Comput. Syst. Eng. 2, 321–347 (1991)
Federmeier, K.D.: Thinking ahead: the role and roots of prediction in language comprehension. Psychophysiology 44, 491–505 (2007)
Firestone, C., Scholl, B.J.: Cognition does not affect perception: evaluating the evidence for 'top-down' effects. Behav. Brain Sci. 39, 1–77 (2016)
Fodor, J.A.: The Modularity of Mind: An Essay on Faculty Psychology. MIT Press (1983)
Fourneret, P., Franck, N., Slachevsky, A., Jeannerod, M.: Self-monitoring in schizophrenia revisited. NeuroReport 12(6), 1203–1208 (2001)
Friston, K.: Prediction, perception and agency. Int. J. Psychophysiol. 83, 248–252 (2012)
Geertz, C.: The impact of the concept of culture on the concept of man. In: Platt, J.R. (ed.) New Views of the Nature of Man, pp. 93–118. Chicago University Press, Chicago (1966)
Giannopoulos, E., Wang, Z., Peer, A., Buss, M., Slater, M.: Comparison of people's responses to real and virtual handshakes within a virtual environment. Brain Res. Bull. 85, 276–282 (2011)
Gur, M.: The anatomical and physiological properties of the visual cortex argue against cognitive penetration. Behav. Brain Sci. 39, 34–35 (2016)
Gutiérrez, M.A., Vexo, F., Thalmann, D.: Stepping into Virtual Reality. Springer (2008)
Hackel, L.M., Larson, G.M., Bowen, J.D., Mann, T.C., Middlewood, B., Roberts, I.D., et al.: On the neural implausibility of the modular mind: evidence for distributed construction dissolves boundaries between perception, cognition, and emotion. Behav. Brain Sci. 39, 34–36 (2016)
Hall, E.T.: The Hidden Dimension. Doubleday & Company Inc., Garden City (1969)
Hara, M., Pozeg, P., Rognini, G., Higuchi, T., Fukuhara, K.: Voluntary self-touch increases body ownership. Front. Psychol. 6, 1509 (2015). https://doi.org/10.3389/fpsyg.2015.01509
Hartmann, T., Wirth, W., Vorderer, P., Klimmt, C., Schramm, H., Boecking, S.: Spatial presence theory: state of the art and challenges ahead. In: Lombard, M., Biocca, F., Freeman, J., IJsselsteijn, W., Schaevitz, R.J. (eds.) Immersed in Media. Telepresence Theory, Measurement & Technology, pp. 115–135. Springer (2015)
Hayles, K.N.: Boundary disputes: homeostasis, reflexivity, and the foundations of cybernetics. In: Markley, R. (ed.) Virtual Realities and Their Discontents, pp. 11–38. Johns Hopkins University Press, Baltimore (1996)

Hayles, K.N.: How We Became Posthuman. Virtual Bodies in Cybernetics, Literature, and Informatics. University of Chicago Press (1999)
Heim, M.: Crossroads in virtual reality. In: Marchese, F.T. (ed.) Understanding Images. Finding Meaning in Digital Imagery, pp. 265–281. Springer, Berlin (1995)
Hoffmann, S., Falkenstein, M.: Predictive information processing in the brain: errors and response monitoring. Int. J. Psychophysiol. 83, 208–212 (2012)
Hollan, J., Hutchins, E., Kirsch, D.: Distributed cognition: toward a new foundation of human-computer interaction research. ACM Trans. Comput.-Hum. Interact. 7(2), 174–196 (2000)
Hyde, J.S.: The gender similarity hypothesis. Am. Psychol. 60, 581–592 (2005)
Hyde, J.S.: Sex and cognition: gender and cognitive functions. Curr. Opin. Neurobiol. 38, 53–56 (2016)
International Society for Presence Research: The Concept of Presence: Explication Statement (2000). Retrieved on 27.03.2020 from https://smcsites.com/ispr/
Jacobson, D.: On theorizing presence. J. Virt. Environ. 6(1) (2002)
Jahanshahi, M., Frith, C.D.: Willed action and its impairments. Cogn. Neuropsychol. 15, 483–533 (1998)
Jung, C.G.: Psychological types. In: Ress, L., McGuire, W. (eds.) The Collected Works of C.G. Jung, vol. 6. Princeton University Press, Princeton (1971)
Kendrick, M.: Cyberspace and the technological real. In: Markley, R. (ed.) Virtual Realities and Their Discontents, pp. 143–160. Johns Hopkins University Press, Baltimore (1996)
Kilteni, K., Normand, J.-M., Sanchez-Vives, M.V., Slater, M.: Extending body space in immersive virtual reality: a very long arm illusion. PLoS ONE 7, e40867 (2012)
Kilteni, K., Maselli, A., Kording, K.P., Slater, M.: Over my fake body: body ownership illusion for studying the multisensory basis of own body perception. Front. Hum. Neurosci. 9, 141 (2014)
Kimura, D., Clarke, P.G.: Women's advantage on verbal memory is not restricted to concrete words. Psychol. Rep. 91, 1137–1142 (2002)
Krueger, M.W.: Foreword. In: Burdea, G., Coiffet, P.: Virtual Reality Technology, pp. xi–xiii. Wiley, New York (1994)
La Barre, W.: The Human Animal. University of Chicago Press, Chicago (1954)
Lanier, J.: Dawn of the New Everything. A Journey Through Virtual Reality. The Bodley Head, London (2017)
Larson, P., Rizzo, A.A., Buckwalter, J.G., Van Rooyen, A., Kratz, K., Neumann, U., et al.: Gender issues in the use of virtual environments. Cyberpsychol. Behav. 2, 113–123 (1999)
Lexico: Oxford University Press (2019). www.lexico.com
Lloyd, D.: Spatial limits on referred touch to an alien limb may reflect boundaries of visuo-tactile peripersonal space surrounding the hand. Brain Cogn. 64, 104–109 (2007)
Lombard, M., Jones, M.T.: Defining presence. In: Lombard, M., Biocca, F., Freeman, J., IJsselsteijn, W., Schaevitz, R.J. (eds.) Immersed in Media. Telepresence Theory, Measurement & Technology, pp. 13–34. Springer (2015)
Mania, K., Chalmers, A.: The effects of levels of immersion on memory and presence in virtual environments: a reality centered approach. Cyberpsychol. Behav. 4(2), 247–264 (2001)
Markley, R.: Introduction: history, theory and virtual reality. In: Markley, R. (ed.) Virtual Reality and Its Discontents, pp. 1–10. Johns Hopkins University Press, Baltimore (1996)
Mellet, E., Laou, L., Petit, L., Zago, L., Mazoyer, B., Tzourio-Mazoyer, N.: Impact of the virtual reality on the neural representation of an environment. Hum. Brain Mapp. 31, 1065–1075 (2010)
Merleau-Ponty, M.: Phenomenology of Perception. Routledge, London (1958)
Messick, S.: Cognitive style and personality: scanning and orientation toward affect. In: A Persistent Scholar: A Festschrift for Charles M. Solley. Wayne State University Press, Detroit (1989)
Milgram, P., Takemura, H., Utsumi, A., Kishino, F.: Augmented reality: a class of displays on the reality-virtuality continuum. Proc. Int. Soc. Opt. Eng. SPIE 2351, 282–292 (1995)

Minsky, M.: Telepresence. OMNI Magazine, June 1980 (1980)
Moseley, L.G., Gallace, A., Spence, C.: Bodily illusions in health and disease: physiological and clinical perspectives and the concept of a cortical 'body matrix'. Neurosci. Biobehav. Rev. 36, 34–46 (2012)
Normand, J.-M., Giannopoulos, E., Spanlang, B., Slater, M.: Multisensory stimulation can induce an illusion of larger belly size in immersive virtual reality. PLoS ONE 6, e16128 (2011)
Norton, D., Stark, L.: Scanpaths in saccadic eye movements while viewing and recognizing patterns. Vis. Res. 11, 929–942 (1971)
Oh, C.S., Bailenson, J.N., Welch, G.F.: A systematic review of social presence: definitions, antecedents, and implications. Front. Robot. AI 5, 114 (2018)
Pan, Y., Steed, A.: The impact of self-avatars on trust and collaboration in shared virtual environments. PLoS ONE 12, e0189078 (2017)
Parsons, T.D., Larson, P., Kratz, K., Thiebaux, M., Bluestein, B., Buckwalter, G., et al.: Sex differences in mental rotation and spatial rotation in a virtual environment. Neuropsychologia 42, 555–562 (2004)
Penrose, R.: The Emperor's New Mind. Oxford University Press (1989)
Perez-Marcos, D., Sanchez-Vives, M., Slater, M.: Is my hand connected to my body? The impact of body continuity and arm length on the virtual hand illusion. Cogn. Neurodyn. 6, 295–305 (2012)
Perring, C.: Mental illness. In: Zalta, E. (ed.) The Stanford Encyclopedia of Philosophy (Fall Edition) (1999). http://plato.stanford.edu/entries/mental-illness/ (accessed 8/7/02)
Petkova, V., Ehrsson, H.H.: If I were you: perceptual illusion of body swapping. PLoS ONE 3, e3832 (2008)
Pylyshyn, Z.: Is vision continuous with cognition? The case for cognitive impenetrability of visual perception. Behav. Brain Sci. 22, 341–423 (1999)
Rapolyi, L.: Virtuality and plurality. In: Riegler, A., Peschl, M.F., Edlinger, K., Fleck, G. (eds.) Virtual Reality. Cognitive Foundations, Technological Issues and Philosophical Implications, pp. 167–187. Peter Lang Publishing, Frankfurt (2001)
Riva, G.: Is presence a technology issue? Some insights from cognitive sciences. Virtual Real. 13, 159–169 (2009)
Ross, R.M., McKay, R., Coltheart, M., Langdon, R.: Perception, cognition, and delusion. Behav. Brain Sci. 39, 47–48 (2016)
Sanchez-Vives, M.V., Spanlang, B., Frisoli, A., Bergamasco, M., Slater, M.: Virtual hand illusion induced by visuo-motor correlation. PLoS ONE 5, e10381 (2010)
Sanford, D.H.: Where was I? In: Hofstadter, D.R., Dennett, D.C. (eds.) The Mind's I, pp. 232–240. Basic Books Inc., Publishers, New York (2000)
Sas, C., O'Hare, G.M.P., Reilly, R.: Presence and task performance: an approach in the light of cognitive style. Cogn. Tech. Work 6, 53–56 (2004)
Sheridan, T.B.: Musings on telepresence and virtual presence. Presence: Teleoperators Virtual Environ. 1(1), 120–126 (1992)
Slater, M.: Presence and the sixth sense. Presence 11(4), 435–439 (2002)
Slater, M.: A note on presence terminology. Presence Connect 3(3), 1–5 (2003)
Slater, M.: Towards a digital body: the virtual arm illusion. Front. Hum. Neurosci. 2 (2008)
Slater, M.: Place illusion and plausibility can lead to realistic behavior in immersive virtual environments. Philos. Trans. R. Soc. B 364, 3549–3557 (2009)
Slater, M.: Grand challenges in virtual environments. Front. Robot. AI 1, 3 (2014)
Slater, M., Usoh, M.: Body centered interaction in immersive virtual environments. In: Thalmann, M., Thalmann, D. (eds.) Artificial Life and Virtual Reality, pp. 125–148. Wiley, New York (1994)
Slater, M., Brogni, A., Steed, A.: Physiological responses to breaks in presence: a pilot study. In: Presence 2003: The 6th Annual International Presence Workshop, pp. 1–5 (2003)
Slater, M., Perez-Marcos, D., Ehrsson, H.H., Sanchez-Vives, M.V.: Inducing illusory ownership of a virtual body. Front. Neurosci. 3, 214 (2009)
Slater, M., Spanlang, B., Sanchez-Vives, M.V., Blanke, O.: First person experience of body transfer in virtual reality. PLoS ONE 5, e10564 (2010)
Snyder, M., Tanke, E.D., Berscheid, E.: Self-perception and interpersonal behavior: on the self-fulfilling nature of social stereotypes. J. Pers. Soc. Psychol. 35, 656–666 (1977)

Stein, J.-P., Ohler, P.: Venturing into the uncanny valley of mind—the influence of mind attribution on the acceptance of human-like characters in a virtual reality setting. Cognition 160, 43–50 (2017)
Steinicke, F.: Being Really Virtual. Immersive Natives and the Future of Virtual Reality. Springer (2016)
Steptoe, W., Steed, A., Slater, M.: Human tails: ownership and control of extended humanoid avatars. IEEE Trans. Vis. Comput. Graph. 19, 583–590 (2013)
Tinwell, A., Nabi, D.A., Charlton, J.P.: Perception of psychopathy and the uncanny valley in virtual characters. Comput. Hum. Behav. 29, 1617–1625 (2013)
Tonna, M., Marchesi, C., Parmigiani, S.: The biological origins of rituals: an interdisciplinary perspective. Neurosci. Biobehav. Rev. 98, 95–106 (2019)
Vallar, G., Ronchi, R.: Somatoparaphrenia: a body delusion. A review of the neuropsychological literature. Exp. Brain Res. 192, 533–551 (2009)
Van Petten, C., Luka, B.: Prediction during language comprehension: benefits, costs, and ERP components. Int. J. Psychophysiol. 83, 176–190 (2012)
Vince, J.: Essential Virtual Reality. How to Understand the Techniques and Potential of Virtual Reality. Springer, Berlin (1998)
Waltemate, T., Gall, D., Roth, D., Botsch, M., Latoschik, E.: The impact of avatar personalization and immersion on virtual body ownership, presence, and emotional response. IEEE (2018)
Wirth, W., Hartmann, T., Böcking, S., Vorderer, P., Klimmt, C., Schramm, H., et al.: A process model of the formation of spatial presence experiences. Media Psychol. 9, 493–525 (2007)
Won, A.S., Bailenson, J., Lee, J., Lanier, J.: Homuncular flexibility in virtual reality. J. Comput. Mediat. Commun. 20, 241–259 (2015)
Yee, N., Bailenson, J.N., Ducheneaut, N.: The Proteus effect: implications of transformed digital self-representation on online and offline behavior. Commun. Res. 36, 285–312 (2009)

Chapter 2

Self in Virtual Reality

Between the self which analyses perception and the self which perceives, there is always a distance. But in the concrete act of reflection, I abolish this distance… I control in practice the discontinuity of the two selves (Merleau-Ponty 1958, pp. 49–50).

The value of human self is not in some small, precious core, but in its vast, constructed crust (Minsky 1986, p. 41).

2.1 The Puzzle of Having a Self1

How do we know who we are? Recognizing ourselves is not always straightforward in everyday life, and it is even more complex in mediated environments, such as those enabled by virtual reality technology. It is widely assumed that mirror self-recognition is a benchmark of self-recognition. However, a closer inspection reveals that mirror self-recognition is an aspect of self-recognition that is not the most representative of how we normally recognize ourselves in everyday life (van den Bos and Jeannerod 2002; Neisser 1988). With the exception of looking at oneself in a mirror, everyday life actions normally allow us to visually perceive our bodies only partially. In addition, the received visual information is typically combined with tactile and proprioceptive information about our bodies (Knoblich 2002). Taken together, these different types of information form an inter-modal representation of a person’s body, which is sometimes referred to as body image (Gallagher 2000). However, as with any other instance of consciousness and cognition, self-recognition requires an agent—an experiencing, thinking self (Jeannerod 2003; Knoblich et al. 2003).

Having a self has been a puzzling issue for philosophers, psychologists, and neuroscientists for a long time. Beginning with David Hume’s idea that an enduring self does not exist (White 1999), some philosophers claim that “no such things as selves exist in the world” (Metzinger 2003, p. 1) and that “What we often, naively, call ‘the self’ in folk-psychological2 contexts is the phenomenal self, the content of self-consciousness, given in phenomenal experience” (p. 303).

1 An early version of some portions of this chapter appeared as Kljajevic (2009), Am I who I think I am? The self in virtual reality, in the Proceedings of the 11th Virtual Reality International Conference (VRIC ’09), pp. 309–316, Laval, France. I thank Professor Simon Richir and LAVAL VIRTUAL for permission to reproduce this material.
2 The term folk psychology was introduced by Dennett (1987) to refer to the knowledge we use in everyday life to explain the thoughts, feelings and behavior of others.



There are different aspects to the conscious experience of being someone: bodily self-experience, emotional self-consciousness, and cognitive self-reference. From the phenomenological point of view, a person’s experience is dominated by the sense of being an “I”. The neurocognitive mechanisms supporting the sense of self are far from clear (Knoblich and Sebanz 2005; Tsakiris et al. 2007), since the sense of self depends on multiple cognitive components and their interactions. Memory, emotion, perception, attention, and action affect the sense of self. The role of intentional action in self-recognition has been particularly emphasized (Gallagher 2000; Jeannerod 2003; Knoblich 2002; Knoblich et al. 2003; Tsakiris et al. 2007; van den Bos and Jeannerod 2007; among others). For some scholars, this is because of the dual role of the self: it is an “embodied actor as well as an observer” (Neisser 1988, p. 39). Others emphasize that agency is necessary for the development of the self-world dualism (Russell 1995). An individual’s ability to recognize herself as the agent of a specific behavior also depends on her ability to recognize her body as a behaving body (Jeannerod 2003; Lenggenhager et al. 2007). In a way, the body represents the borders that localize the conscious self. For these reasons, it has been claimed that self-recognition requires both awareness of one’s body3 and awareness of one’s actions (van den Bos and Jeannerod 2002). This “spatial unity” of the conscious self and the body is a hallmark of the normal self-recognition experience (Lenggenhager et al. 2007).

With virtual reality technology it is possible to change the primary location of the self from the physical location of the body to another location. Being virtually present elsewhere is enabled by virtual displacement. Understanding how the self functions when it is virtually displaced, and how “multiple simultaneous locations of the self” (Hofstadter 2007) function together, is critical for achieving one of the defining features of virtual environments—presence, that is, the impression of being away from the actual physical location of the body and in a computer-generated environment. Thus, a better understanding of the concept of self in virtual reality has theoretical implications as well as practical relevance for virtual reality design.

3 The idea that the body is fundamental for the self has a long history, including the work of Kant, Nietzsche, Freud, Merleau-Ponty, Edelman, Johnson, and Lakoff, among others.

2.2 Disjoint Self

Philosophers and psychologists have argued for a long time that the self is not a unitary phenomenon. Categorizations of the senses of self range from those that differentiate among the physical, mental, and spiritual selves to more elaborate proposals, such as Neisser’s (1988), which distinguishes among the ecological, interpersonal, extended, private and conceptual aspects of the self.



In Neisser’s categorization, for example, the ecological self is the self-awareness related to a particular activity/action, while the interpersonal self is the self-awareness pertaining to human exchange behaviors, such as communicational and emotional ones. Other categorizations include cognitive, embodied, functional, and narrative selves (e.g. Strawson 1999). Furthermore, Damasio (1999) argues that the kinds of self include the proto-self, which is pre-conscious and relates to the neural patterns that represent current body states, and two conscious types—the core self and the extended/autobiographical self. Clark (2003) also argues against a single, central self, and proposes a “soft self”, a blend of neural, bodily, and technological processes. For Patricia Churchland, the self is

a kind of emulation, constructed by the brain, for integrating and making sense of the inner world of the brain in its relation to the external world, including the other-person-world. Minimally, it has (1) a body component, (2) a “what-I-am-aware-of-now” component, (3) a stable but modifiable background of preferences, habits, skills, temperament, and so forth, and (4) a memory-based autobiographical component. These components are interrelated, but are also, to some extent, dissociable (Churchland 2011, p. 48).

What these categorizations have in common is the assumption that the self is a disjoint concept. The concept of self has been studied from many perspectives, on topics as diverse as the disorder of self-monitoring in schizophrenia and the emergence of self in a robot. These approaches to self have been roughly grouped into those that focus on the so-called minimal self and those that focus on the narrative self (Gallagher 2000). The minimal self is the self as an immediate subject of experience, and the narrative self is the self extended to the past and future, which in Neisser’s (1988) categorization fits with the extended and the conceptual self. The concept of minimal self, which refers to the aspects accessible to immediate self-consciousness and those that involve the sense of action ownership and the sense of agency, is highly relevant for virtual reality. Given the predominance of the user-centered approach to design, it is important to understand how different cognitive processes contribute to participants’ experiences in virtual environments. The need for a better understanding of the first-person immediate experience in virtual reality is made more pronounced by the facts that immersive virtual environments are technology-mediated and that one of the defining features of such environments is presence. Similar to Clark’s concept of the soft self, the self in virtual reality is an extended self that incorporates one’s own projection in the virtual space and can be molded in unique ways by a specific virtual environment. Thus, virtual selves may affect who we are beyond the virtual environments in which they are constructed (Sect. 1.4.1).


2.2.1 The Minimal Self

The concept of minimal self is often defined in terms of the immunity principle, the sense of self-ownership and the sense of self-agency (Gallagher 2000). The immunity principle postulates that when a person uses the pronoun “I”, he/she unmistakably refers to himself/herself. In other words, even though we may occasionally be mistaken regarding whether the body we observe is our own body (e.g. when one is performing the same movements within a group of dancers wearing the same costumes), one can never be mistaken about oneself as a subject of an experience and mistake oneself for another subject (White 1999). Since this ineffable sense of being an “I” is a precondition for the feeling of presence, it is critical that virtual reality environments do not disrupt the participant’s sense of self. Extending the self from this “immediate, non-observational” sense of first-person experience to incorporate one’s own representation in a virtual space may be difficult in virtual environments. For instance, the participant’s proprioceptive feedback may not match the action he performs in a virtual environment. In such cases, the participant needs to consciously try to match the first-person experience with a perceptual experience or some other type of observational and reflective act in order to infer that he is the subject of that specific experience. This directly affects the sense of self and the sense of being and acting in the projected environment.

Two aspects of the minimal sense of self are the sense of self-ownership and the sense of self-agency (Gallagher 2000; Martin 1995). Self-ownership is the aspect of self that knows that it is my body that is experiencing a particular action. Self-agency, on the other hand, is the aspect of the self that attributes the action to me: I am causing it. In everyday life, under normal circumstances that involve voluntary actions, the sense of agency and the sense of ownership coincide. However, in abnormal experiences, such as delusions and hallucinations in schizophrenia, the sense of agency is impaired and the action is ascribed to the other instead of to the self. It is the ability to correctly assign actions to others and the sense of action control that are disturbed in these patients. For example, one delusional patient described her action as follows: “My fingers pick up the pen, but I don’t control them. What they do is nothing to do with me” (Blakemore 2003, p. 647). In depersonalization, on the other hand, individuals have problems related to their overall experience of the self, which is characterized by an erroneous self-attribution of the “I”.

A person immersed in a virtual environment may feel that he or she has multiple simultaneous selves and may not be able to decide which one is the “real” self (Hofstadter 2007). Each of these selves has a first-person perspective experience, and yet they differ with regard to their respective locations, raising the question Where am I? As pointed out by Hofstadter (2007), Gallagher (2000) and others, the whereness does not seem to arise as a question for a mobile robot that has, for example, two computerized guidance systems located elsewhere: our intuition is that a robot is where its body is and not where the control systems are. Thus, we would expect to find the robot’s self within its body.


However, we do not readily assign a self to a robot, probably because it is difficult to think of a robot as a conscious subject of an experience, even though the robot can intelligently respond to a challenging situation via sensors (Strawson 1999). For example, from the agency-ownership perspective, one could claim that even though the robot is definitely moving, it is not causing the movement. Clearly, it does not have a self in the sense in which we assign this concept to human beings. However, there are arguments that the minimal self may emerge even in a robot (Strawson 1999; Gallagher 2000; Hofstadter 2007). One hallmark of the normal sense of self in action is that we are not aware of the action itself, although we know that we are the agent causing the specific action (Frith 2005).

Note that the sense of ownership may not coincide with the sense of agency in virtual environments. For example, a person who is walking in a virtual environment by using a joystick may have the sense of agency (she is causing the avatar’s action of walking on virtual streets) (e.g. Meilinger et al. 2008), but at the same time her body is stationary and she is aware that it is not moving. In other words, she is not experiencing self-ownership. The sense of self-ownership may also be disturbed in virtual environments that employ walking in place. The reason is bodily feedback: walking forward requires the use of a different set of muscles than walking in place. In terms of bodily feedback, walking in place does not produce the sense of forward propulsion, since the movements (i.e. moving a foot up and down) differ from those of moving forward (Hollerbach 2002). Similarly, self-agency and self-ownership in virtual reality may fail to coincide due to a lag that participants feel between a movement (e.g. a head movement or hand motion) and the received visual or other feedback. For example, a mismatch may occur in cases where the participant’s walking direction and a centering motion on a treadmill designed to facilitate turning do not coincide. As another example, picking up and moving objects in a virtual environment by pressing buttons on a handheld device does not result in the same bodily feedback as when we actually pick up and move objects in the physical world. The lack of appropriate bodily feedback may lead to the feeling of a denaturalized body.

2.3 Denaturalized Body

Bodily movements that project actions in virtual reality take place in physically constrained environments. At the same time, virtual reality environments are perceptually mediated and they may lack the type of bodily feedback that typically occurs when a person performs an action in a natural environment. Both CAVE-like displays and head-mounted displays have been used for locomotion interfaces. Unlike CAVE-like displays, virtual environment settings that involve head-mounted displays have limitations with regard to visual feedback on the body, because they do not allow the participant to see his/her position on the locomotion interface (Hollerbach 2002). The lack of visual information on the body’s position, or “the self-vision”, and of bodily feedback on movement and action control contributes to the feeling of a “denaturalized” body in virtual reality.


According to one well-established theoretical framework of motor control, we use internal models to represent our actions and interactions with objects (Wolpert and Flanagan 2001). The forward model of motor control captures the importance of prediction of motor actions (Miall et al. 1993). The model postulates that each of our actions is associated with a specific prediction of the sensory feedback. If the prediction and the received sensory feedback for a specific action do not match, we cannot be sure that we caused that action (Knoblich and Sebanz 2005). For example, a lag between a head movement or hand motion and the feedback on that motor act in a virtual environment cannot be incorporated into the forward model (in everyday life we normally do not experience such delays). The larger the delays, the less accurate the predictions of the forward model, which in turn means that it is less likely that we will experience ourselves as the one causing the action (Blakemore 2003; Knoblich and Sebanz 2005; Sato and Yasuda 2005). Thus, virtual reality may create the feeling of an unreliable internal predictor: a person intends to make a movement and carries out the motor act that is supposed to realize the movement, but the bodily feedback does not verify the movement. Even though the movement (e.g. turning right at the corner of a virtual street) matches the person’s intention to perform exactly that motor act (to turn right), the resulting bodily experience matches neither (the body is stationary).
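The comparator logic of the forward model can be made concrete with a toy simulation. The following Python sketch is only an illustration of the idea, not a model from the cited literature; the function name, the timing values, the noise level, and the decision threshold are all hypothetical.

```python
import numpy as np

def action_feels_mine(feedback_lag_ms, noise_sd=5.0, tolerance_ms=25.0, rng=None):
    """Toy forward-model comparator: compare the predicted arrival time of
    sensory feedback for a motor act with the time it actually arrives.
    A rendering/tracking lag inflates the prediction error, and once the
    error exceeds the comparator's tolerance, the action is no longer
    experienced as self-caused. All numbers are illustrative."""
    rng = rng or np.random.default_rng(0)
    predicted_ms = 100.0                                  # expected sensory delay
    actual_ms = predicted_ms + feedback_lag_ms + rng.normal(0.0, noise_sd)
    prediction_error = abs(actual_ms - predicted_ms)
    return prediction_error < tolerance_ms                # True -> sense of agency

for lag in (0, 20, 80, 250):                              # ms of added VR latency
    print(f"lag {lag:3d} ms -> agency: {action_feels_mine(lag)}")
```

On this toy account, latencies of the magnitude discussed later in this chapter (hundreds of milliseconds) would push every action outside the comparator’s tolerance, which is one way to read the link between lag and the loss of the sense of agency.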

2.3.1 Body Representation: Body Image and Body Schema

Two important concepts related to body representation are body image and body schema. There has been conceptual and terminological confusion of body image and body schema since Henry Head’s work in the 1920s, and it persists in more recent approaches to body representation. The interchangeable use of these terms has led to theoretical confusion and presumably is to blame for many inconsistencies observed in empirical findings on body representation (Gallagher 1995). Briefly, body image is a representation of the body that contains the beliefs, impressions, emotional attitudes, and understanding that one has about one’s own body; it is attention to and awareness of one’s own body (Gallagher 1995). Body image is conscious and evaluative, and it has an intentional status. In contrast, body schema is a postural model or an automatized, subconscious system that monitors and governs body posture and movement; it operates outside of consciousness and thus does not have an intentional status. As pointed out by Campbell (1995), the distinction between body image and body schema is not a distinction between two different types of representations. Rather, it is a distinction between the ways in which body representation is used. If it is used to mediate one’s own perceptions and actions, the body representation is used as a body schema. If it is used to register how it affects others, then the body representation is used as a body image.


Importantly, body schema can be extended to include noncorporeal objects that are associated with the body, such as tools (e.g. a hammer or a bike), prosthetic devices, ornaments, and clothes (Berlucchi and Aglioti 1997). Such extensions are temporary and they are not necessarily reflected in the body image (Gallagher 1995). The inclusion of tools and other noncorporeal objects in the body schema works in such a way that it feels “as if our own effector (e.g. the hand) were elongated to the tip of the tool” (Maravita and Iriki 2004, p. 79). The body schema is constantly being updated based on the rich influx of multisensory information. Without the constant update of the body schema, the body would not be able to act efficiently. The subjective feeling of bodily extension due to the inclusion of noncorporeal objects in the body schema has been associated with activation of the brain regions that support updating of the body schema. Thus, the use of noncorporeal objects, including virtual reality tools, allows a functional extension of the body into space; it extends our body beyond its physical boundaries and expands the space in which we can act (Berlucchi and Aglioti 1997; Maravita and Iriki 2004). However, the neural correlates of virtual tool use in humans are still not clear, and it is also unclear whether similar multisensory mechanisms support the use of physical and virtual tools (Maravita and Iriki 2004). The effects that virtual environments have on body image and body schema have been investigated to some extent, which is particularly evident in research on illusions in virtual reality elicited by manipulating a part of the body or the whole body (Chap. 1).

Various types of inputs, such as proprioceptive, somatosensory and visual, combine to produce the body schema; the body representation critically depends on multisensory integration. To better appreciate the complexity of such integration, consider the structure of the somatosensory system. The system, whose name originates from the Greek word soma (“body”), consists of several subsystems that differ with regard to the type of signal they send to the brain about various aspects of the body: (i) the internal milieu and visceral division is in charge of interoceptive sensing throughout the body; (ii) the vestibular and musculoskeletal division is in charge of proprioceptive or kinesthetic sensing; and (iii) the fine-touch division signals alterations in the skin sensors when we come into contact with an object and investigate its features (e.g. texture, form, weight, temperature, etc.) (Damasio 1999). Thus, the bodily information integration that is necessary for a stable representation of the body is far from trivial.

Physicality expectations regarding one’s own body and the bodily representations of others in virtual environments are important not just for the feeling of presence and copresence, but also for the performance of individual and joint tasks. If the body is not adequately represented, it is difficult to have self-awareness and awareness of other participants in the virtual environment, to interact with them, synchronize joint actions, and work on collaborative tasks. For instance, there are numerous reports on negative responses of participants whose avatars’ bodily space was disturbed. One study reports that participants were “truly upset when their avatars were seen to pass through each other”, which happened because adequate collision detection was lacking (Durlach and Slater 2000, p. 217). Another study, however, reports no such effects during avatars’ collisions (Heldal et al. 2006).
However, even though the strong emotional response observed by Durlach and Slater (2000) was absent in this other study, inadequate bodily representation and lack of acknowledgment of other avatars had a negative effect on participants’ cooperation, as discussed in Sect. 3.3.


Thus, an adequate representation of the body in a virtual space remains an important requirement for virtual environments.

2.3.2 A Combined Tools Extension of the Body

Designing interfaces for locomotion in virtual environments is a big challenge, in particular when they rely on hand-held devices that create the illusion that the environment is in movement while the body is stationary (Leeb et al. 2006; Meilinger et al. 2008). This causes simulation sickness, which negatively affects presence and task performance, reducing the applicability of virtual reality technology. It has been suggested that one way to overcome this problem and design better virtual reality interfaces would be through employing navigation by thought, that is, taking advantage of imagining, instead of carrying out, locomotion movements (Leeb et al. 2006). Drawing on electroencephalogram (EEG)-based brain-computer interface research, which typically involves patients with severe motor or communication deficits, mostly post-stroke, as well as patients in the locked-in state,4 virtual reality researchers have developed paradigms that incorporate these complementary tools. A brain-computer interface allows a direct connection between the brain and a computer, and EEG registers oscillatory activities in the brain.

4 Locked-in syndrome is a condition in which a patient is conscious and cognitively intact, but unable to move due to paralysis of almost all voluntary muscles and unable to communicate verbally. Since vertical eye movements and blinking are spared in cases that are not total locked-in syndrome, some nonverbal communication is possible for patients in this condition via eye movements.

Among the various paradigms based on the combination of EEG and brain-computer interface, one particularly interesting with regard to perception of the body in virtual environments is a recently developed brain-computer interface paradigm for virtual reality control of events that is based on motor imagery. Briefly, thought-based navigation in virtual reality uses a brain-computer interface as input to a highly immersive CAVE-like system (Friedman et al. 2004; Leeb et al. 2006) (Fig. 2.1). A person moves through a virtual environment by imagining movements. The locomotion movements are predefined, and EEG electrodes attached to the scalp record thought-related changes in the signal. Among the different oscillatory activities of the human brain contained in the EEG, the oscillations in the alpha and beta bands are most suitable for differentiating among brain/mental states. The mental processes produced by the participant are recognized by the system as different patterns. For example, movements of the right hand, left hand, foot or tongue elicit changes in the EEG signal over the sensorimotor cortex. The obtained signal is then transformed by the brain-computer interface into a control signal associated with computer commands (Friedman et al. 2004; Leeb et al. 2006).
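To give a concrete flavor of this pipeline, the sketch below implements a deliberately simplified motor-imagery decoder in Python: it compares alpha-band (8–12 Hz) power over two sensorimotor channels and maps the result to a navigation command. This is not the classifier used by Friedman et al. or Leeb et al.; the channel names, sampling rate, and decision rule are illustrative assumptions.

```python
import numpy as np
from scipy.signal import welch

FS = 250  # sampling rate in Hz (illustrative)

def band_power(channel, lo, hi, fs=FS):
    """Mean spectral power of one EEG channel in the [lo, hi] Hz band."""
    freqs, psd = welch(channel, fs=fs, nperseg=fs)
    return psd[(freqs >= lo) & (freqs <= hi)].mean()

def classify_imagery(c3, c4):
    """Crude left/right motor-imagery detector. Imagining a hand movement
    desynchronizes the alpha rhythm over the contralateral sensorimotor
    cortex (roughly channel C3 for the right hand, C4 for the left), so
    the side with lower alpha power indicates the imagined hand."""
    if band_power(c3, 8, 12) < band_power(c4, 8, 12):
        return "right-hand imagery -> e.g. turn right"
    return "left-hand imagery -> e.g. turn left"

# Synthetic one-second epochs standing in for real recordings.
rng = np.random.default_rng(1)
t = np.arange(FS) / FS
c3 = rng.normal(size=FS)                                      # alpha suppressed
c4 = rng.normal(size=FS) + 2.0 * np.sin(2 * np.pi * 10 * t)   # strong 10 Hz alpha
print(classify_imagery(c3, c4))
```

In a real system the classifier would be trained per user and applied to sliding windows of the live EEG stream, with its output translated into the predefined locomotion commands.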



Fig. 2.1 Schematic model of a BCI-VR system (Leeb et al. 2006). Reproduced with permission from the MIT Press

Navigation by thought in virtual reality is an interesting design concept, because it combines virtual reality, as one kind of extension, with the brain-computer interface, another kind of extension (Friedman et al. 2004). When combined, these tools have an enormous potential for application in the domain of health rehabilitation as well as in various domains of the life of healthy people, such as entertainment and art. Brain-computer interface-based virtual reality applications and the notion of using mental activity to perform tasks in virtual environments are an exciting milestone in virtual reality development. As a new type of input device, the brain-computer interface changes the way of interacting with a virtual environment, and at the same time virtual reality technologies are becoming valuable tools in brain-computer interface research (Lotte et al. 2012).

Furthermore, navigation by thought is a plausible concept in neuro-cognitive terms. It has support in neuroimaging evidence from research on human movement. For instance, execution of grasping and manipulation of objects (Binkofski et al. 1999), motor imagery of these movements (Binkofski et al. 2000; Grèzes and Decety 2002), and observation of meaningful (but not meaningless) hand actions (Fadiga and Craighero 2006; Fadiga et al. 2006) activate the same brain area—Brodmann area 44. What is particularly interesting is that this brain area becomes activated not only during actual movements but also by motor imagery in the absence of real movements (Blakemore 2003). Considering these findings, motor imagery as a strategy for navigation in virtual environments seems warranted. Motor imagery is a strategy suitable for the category of navigation tasks in virtual reality, but other brain-computer interface-virtual reality paradigms relying on evoked potentials such as P300 and SSVEP5 are more appropriate for other categories of tasks (e.g. for selection and manipulation of virtual objects).

5 P300 is an event-related potential component with a positive-going amplitude that peaks around 300 ms, and SSVEP is a steady-state visually evoked potential.



Another advantage of using this type of interface in virtual environments is that it has the potential to overcome the problem of motion sickness. Theoretically, this type of interface has a special status, because, in contrast to traditional interfaces, it bypasses peripheral nerves and muscles, raising important questions on embodiment and embodied cognition (Lotte et al. 2012). How does navigation by thought in a virtual environment affect the sense of self-ownership and self-agency? A potential problem is not only lack of self-ownership for the imagined action, but also an impaired sense of self-agency. Namely, one component of agency is anticipation of bodily feedback. If that feedback is not present, the self cannot establish to whom in the environment to attribute the action. One participant in a navigation by thought experiment reported that “it was more like a dream—you move but you do not feel your body physically move”, while another reported that she felt “as if her whole body were rotating” (Friedman et al. 2004). A person’s bodily awareness largely depends on whether the visual cues match the cues on his/her body’s position, and one’s awareness of own actions depends on the movement cues. Thus, due to the lack of bodily feedback for movement, as in the first example above, or due to a mismatch between the cues, as in the second example, the sense of self as well as the spatial self-body unity may be compromised in navigation-by-thought. As a consequence, the participant’s interaction with others and his/her manipulation of virtual objects may be compromised as well, negatively affecting presence and task performance. Other potential problems with navigation by thought include drifting of thoughts due to lack of concentration or sudden shift in attention, which may stop the movement. Similarly, making a mistake when imagining a movement may result in performance of a different movement (e.g. walking backwards instead of forward), loss of balance, or in experiencing discomfort (Friedman et al. 2004; Leeb et al. 2006; Lotte et al. 2012). These mistakes, too, have negative impact on presence and task performance in virtual environments.

2.3.3 Bodily Self-Consciousness

Although research on agency as a constitutive element of bodily self-consciousness is well established, there is also a view that agency is an “enabling” condition of the development of the conscious experience of being a self. On this view, “a passive, multisensory and globalized experience of ‘owning’ a body is sufficient for minimal conscious selfhood” (Blanke and Metzinger 2008, p. 12), where the main components of the core feeling of bodily self-consciousness are: (1) self-identification, meaning that we feel that our body belongs to us, (2) self-location, i.e. the ability to experience ourselves in a physical space (Aspell et al. 2010), and (3) a first-person perspective,6 i.e. the fact that our main perspective on the external world is from within our body (Brugger 2002; Blanke and Metzinger 2008; Aglioti and Candidi 2011).

6 For a distinction between three different readings of “first-person perspective” (weak, strong and cognitive) see Blanke and Metzinger (2008). Roughly, the minimal self-model requires a weak first-person perspective—a system represents itself as a self. More complex models of selfhood require other abilities. The transition from the minimal model of self-consciousness to the strong first-person perspective requires attentional access and makes consciousness subjective, whereas a cognitive first-person perspective requires the capacity of self-reference using self-concepts.



Thus, minimal conscious selfhood is a representation of the body that involves multiple simultaneous processes associated with various body properties. Importantly, if any of the integration processes that lead to the phenomenal self-model is disrupted, and for instance a state is left out of global control, disturbances of bodily self-consciousness will follow. This applies to emotional and cognitive states, extending to the models of conscious selfhood that are more complex than the minimal model (Metzinger 2003). The disturbances may be related to specific body parts or to the entire body. A disrupted sense of spatial unity of self and body leads to a distorted sense of the self’s relation to the body and the surrounding environment. As an example, somatoparaphrenia is a distorted sense of ownership of body parts, which is associated with damage to the right temporo-parietal cortex (Blanke and Metzinger 2008). While some somatoparaphrenic patients misattribute their contralesional hand to another person, others show the opposite pattern, misattributing another person’s hand as their own if it is presented in their contralesional hemispace. Research on body part ownership using virtual reality has shown that healthy individuals also misattribute body parts when incoming multisensory information is incongruous (Lenggenhager et al. 2007; Slater et al. 2008). In a recent study, Lenggenhager et al. (2007) manipulated bodily self-consciousness in a virtual reality environment to determine its impact on the sense of self. They showed that using conflicting visual-somatosensory input to disrupt the spatial self-body unity in virtual reality resulted in mislocalization of the self:

… participants felt as if a virtual body seen in front of them was their own body and mislocalized themselves toward the virtual body, to a position outside their bodily borders (p. 1096).

Apparently, multisensory integration and cognitive processing of bodily information have a critical role in the sense of self. Due to its mediated character, a virtual reality environment may lack an optimal temporal alignment of multisensory information, which may lead to inadequate self-representation. Such undesirable effects on the sense of self may resemble disturbances found in psychiatric or neurological conditions. For instance, patients who have completely lost proprioception apparently experience a bodiless state of self-consciousness, while patients with schizophrenia and those with depersonalization disorders, as in dissociative identity disorder, experience disintegration and depersonalization, which is sometimes associated with multiple phenomenal selves within the same physical body (Metzinger 2004). Furthermore, neurological patients with illusory perceptions of their own body report experiences during which they see a second own body in extracorporeal space. Illusory perceptions of this kind are typically associated with multisensory disintegration (Blanke and Metzinger 2008) and they can be induced in healthy individuals by a multisensory conflict.

Apparently, multisensory integration and cognitive processing of bodily information have a critical role in the sense of self. Due to its mediated character, a virtual reality environment may lack an optimal temporal alignment of multisensory information, which may lead to inadequate self-representation. Such undesirable effects on the sense of self may resemble disturbances found in psychiatric or neurological conditions. For instance, patients who completely lost proprioception apparently experience a bodiless state of self-consciousness, while patients with schizophrenia and those with depersonalization disorders, as in dissociative identity disorder, experience disintegration and depersonalization, which is sometimes associated with multiple phenomenal selves within the same physical body (Metzinger 2004). Furthermore, neurological patients with illusory perceptions of their own body report experiences during which they see a second own body in extracorporeal space. Illusory perceptions of this kind are typically associated with multisensory disintegration (Blanke and Metzinger 2008) and they can be induced in healthy individuals require other abilities. The transition from the minimal model of self-consciousness to the strong first-person perspective requires attentional access and it makes consciousness subjective, whereas a cognitive first-person perspective requires the capacity of self-reference using self-concepts.


Therefore, it is crucial that virtual reality aligns parameters related to multisensory integration in such a way that their alignment does not disrupt participants’ conscious experience of being a self. To better understand the detrimental effects of a disrupted sense of spatial unity of the self and body, which may occur in virtual environments due to technology-related issues, we briefly review abnormal spatial perspective-taking in autoscopic phenomena.

2.3.4 Spatial Perspective Taking and Extracorporeal Experiences

Disembodiment or out-of-body experience is an example of disordered bodily self-consciousness, which together with autoscopic hallucinations and heautoscopy belongs to the autoscopic phenomena (Brugger 2002). Although these phenomena have been known for some time,7 they were largely neglected until relatively recently, their status undecided, hovering between neurobiology/neuroscience and mysticism (Blanke et al. 2013; Blanke and Arzy 2005). The advancement of virtual reality technologies and neuroimaging methods has allowed new experimental approaches to these topics, in which virtual reality is used to experimentally induce such phenomena and neuroimaging to establish where in the brain such experiences are generated.

7 Apparently, the first study on an out-of-body experience elicited by an artificial finger was published in 1937 by Tastevin (Moseley et al. 2012).

In general, autoscopic phenomena (from the Greek autos “self” and skopeo “looking at”) are experiences of “increasing detachment from one’s own body as a point in space on which the observer’s perspective is normally centered and from which the world is observed” (Brugger 2002, p. 180). They are experiences in which a person’s perceived and actual body positions do not match (Blanke et al. 2004). For instance, in an out-of-body experience, the person sees his/her body from a location outside the physical body. This location is typically described as an elevated disembodied location, and the experience indicates an abnormal first-person perspective and abnormal self-identification. Such experiences have been associated with states such as near-death experience, epilepsy, migraine, infarction, infection, and neoplasia as well as schizophrenia, depression, anxiety, and dissociative disorders, occurring also in about 10% of individuals not suffering from neurological or psychiatric disorders (Blanke et al. 2004; Moseley et al. 2012; Heydrich and Blanke 2013). Compared to the out-of-body experience, other autoscopic phenomena have been less studied. To distinguish between autoscopic hallucination, heautoscopy, and out-of-body experience, researchers look at the observer’s first-person perspective, self-location, and self-identification, as the main components of bodily self-consciousness (Aglioti and Candidi 2011; Cardini et al. 2013). The first-person perspective is the perspective from which we perceive the world (Petkova et al. 2011), which in normal circumstances is from within our own body.



An abnormal first-person perspective is the experience of seeing one’s own body and the world from an extrapersonal location, as in out-of-body experience. Next, self-location is the knowledge of the location in space where the person’s experience is (Aspell et al. 2010), i.e. where the person feels that his/her self or center of awareness is. An abnormal self-location would be experiencing the self as being located outside the person’s own physical body. Finally, self-identification relates to body ownership and whether the person identifies with his/her physical body. An abnormal self-identification includes experiences such as identifying with an autoscopic body, which is at a disembodied location, and not with one’s own physical body.

In autoscopic hallucination, the first-person perspective, self-identification and self-location remain normal, and subjects see “an image of their own body in extrapersonal space as if they were looking in a mirror” (Blanke et al. 2004, p. 791). Thus, autoscopic hallucination is considered not a disorder of bodily self-consciousness but a visual hallucination. The next autoscopic phenomenon on the continuum of increasing detachment from one’s own body is heautoscopy, an intermediate form between autoscopic hallucination and out-of-body experience. As in the other two autoscopic phenomena, the person in heautoscopy has the impression of seeing his/her body in extrapersonal space. However, unlike in the other autoscopic phenomena, individuals experiencing heautoscopy often report existing at two locations at the same time—in the physical body and in the autoscopic body—and they may have simultaneous or alternating self-locations and first-person perspectives between the two bodies. In out-of-body experience, the self is located outside the bodily borders.

Using spatial perspective-taking as a criterion for differentiating the autoscopic phenomena, Brugger (2002) pointed out that in autoscopic hallucination the spatial perspective that the person takes is body-centered; thus, the self and the body remain a spatial unity, and the autoscopic body is just a reversal of the person’s body. Therefore, this is a visual hallucination and not a disorder of bodily self-consciousness. However, heautoscopy and out-of-body experience represent bodily self-consciousness disorders, because in these cases the spatial unity of the self and the body is not preserved. In heautoscopy, the person’s perspective may alternate between the physical body and the autoscopic body, whereas in out-of-body experience the person’s perspective is outside the physical body; one’s self is dissociated from the body and located in extracorporeal space. Normally, the self is where the felt body is. Since in autoscopic hallucinations the self remains body-centered and there is no duplication of the self, autoscopic hallucinations are not disorders of self-consciousness. In contrast, in heautoscopy, in addition to two bodies (autoscopic and physical), there are two selves:

With increasing bodily depersonalization, there is an increase in the doppelgänger’s ‘personalization’, that is, the subject may wonder whether it is the body or rather the doppelgänger which contains the real self…. (Brugger 2002, p. 184)

This pathology is even more pronounced further down the autoscopic continuum, in out-of-body experiences: even though there is only one self, it is completely outside the physical body.


The phenomenal dislocation of the self from the physical to the autoscopic body in out-of-body experiences leaves a disembodied, although psychologically unitary, self, which represents a sharp distinction from heautoscopy, where the self may split into two selves.8 Unlike autoscopic hallucination, where the observer’s perspective is body-centered but reverses its original sidedness, in out-of-body experience the observer’s perspective is entirely projected to the reduplicated body, while maintaining its original sidedness.

8 Partial or total detachment from the physical body may be a tool for achieving an emotional-psychological detachment (e.g. “it is my body that suffers, not my self”) and release from an otherwise unbearable reality, as found in severe depression, schizophrenia, or even in healthy people facing extraordinary suffering (Brugger 2002).

The neurology literature suggests that different brain regions are implicated in these distorted experiences of the self-body relationship, associating autoscopic phenomena with differential patterns of brain damage (Heydrich and Blanke 2013; De Ridder et al. 2007). However, autoscopic phenomena also occur in approximately 10% of the neurologically intact population. It remains an important task for future research to determine the mechanisms giving rise to these phenomena in healthy people. Although different from autoscopic phenomena in many ways, virtual reality also challenges the view that the self is where the body is, allowing the self to map onto a body projected in a virtual space (e.g. Pomés and Slater 2013; Bourdin et al. 2017). The idea is that a virtual body becomes incorporated into the body schema, body ownership extends to the virtual body, and self-identification extends to the virtual space. But how can the spatial self-body unity be preserved in virtual environments, where the body extends to a computer-generated representation and the sense of unity critically depends on technology? To create usable virtual environments and applications, virtual reality design seeks to ensure behavioral fidelity of virtual characters and create stable environments that will allow all aspects of behavior, from perception and cognition to unconscious responses of the autonomic nervous system, to function as they do in natural settings (Slater 2003). This goal is largely based on the assumption that multisensory integration of bodily signals associated with projections in virtual environments takes place in the same way as multisensory integration of bodily signals coming from natural environments, which may not be the case.

2.3.5 Perceptual Asynchrony and Information Integration

Considering the reality-virtuality continuity hypothesis (Sect. 1.1), one may wonder how the mediated nature of virtual environments and technology-related lags affect information integration in such settings. Cognitive neuroscientists have investigated how different cognitive processes that are distributed in time or across cognitive agents come to operate together and produce a coherent experience, and how different features of an object that are computed in different brain regions get integrated to represent a single object.



The processes involved operate at a very rapid pace and within a temporal hierarchy. As an example, consider perceptual asynchrony and temporal hierarchy in visual processing. A neuron receives, integrates and propagates a signal in about 5–10 ms, but it takes about 300 ms to complete a visual pattern recognition task (Churchland and Grush 1999). Even though all attributes of a particular visual percept are bound together within approximately 500 ms of the object’s appearance, we perceive its color about 80 ms before its motion; locations are perceived before colors, and colors before orientations (Zeki 2003). Furthermore, binding within attributes is faster than binding between attributes. Assuming that “to perceive something is to be conscious of it” and considering that we become conscious of different attributes at different times, Zeki (2003) introduced a distinction between macro-consciousness, which refers to a bound percept, and micro-consciousness, which refers to consciousness of a single attribute of the percept, postulating also a level of unified consciousness. Each of the levels is characterized by temporal hierarchies. However, even though the participating processes are fast and smooth, macro-consciousness can contain results of false binding (Moutoussis and Zeki 1997; Zeki 2003; Maloney et al. 2012). For instance, in a study by Moutoussis and Zeki (1997), participants tended to misbind the color and the direction of motion, or the color and the orientation of lines. The authors claim that

… the normal brain does not perceive different attributes of the visual scene at the same time, nor is it able to synchronize its different perceptual systems to ‘time 0’. Instead, the brain mis-binds in terms of real time, which is the same thing as saying that it only synchronizes the results of its own operations (p. 1412).
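The logic of misbinding “in terms of real time” can be caricatured in a few lines of Python. The sketch below is a deliberately crude toy, not a model from Zeki or Moutoussis and Zeki; the fixed 80 ms color-motion lag and the pairing rule are illustrative assumptions.

```python
MOTION_LAG_MS = 80  # motion reaches awareness ~80 ms after color (illustrative)

def perceived_pairs(stimuli, period_ms):
    """stimuli: (color, motion) pairs presented every period_ms.
    Binding 'in real time' pairs the color perceived now with the motion
    percept available now, which arose MOTION_LAG_MS earlier; with fast
    alternation this yields misbinding."""
    shift = MOTION_LAG_MS // period_ms   # how many stimuli ago the motion arose
    return [(color, stimuli[max(0, i - shift)][1])
            for i, (color, _) in enumerate(stimuli)]

stimuli = [("red", "up"), ("green", "down"), ("red", "up"), ("green", "down")]
print(perceived_pairs(stimuli, period_ms=200))  # slow alternation: veridical pairs
print(perceived_pairs(stimuli, period_ms=80))   # fast alternation: colors bound to the preceding motion
```

The point of the toy is only that a system which “synchronizes the results of its own operations” will pair attributes belonging to different external moments once stimulation changes faster than its internal lags.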

The separateness of attributes (e.g. color, motion, shape) and their incorrect recombination (e.g. of color and motion) can create an illusion (Wu et al. 2004). As another example of the binding problem, consider perception of space. As we perceive the space around us, the brain constructs not one but several representations of the perceived space, some of which may occur simultaneously (Colby 1998). These representations are constructed using different frames of reference (Chap. 5). The question of how the brain combines the multitude of spatial representations into a unitary percept is another challenging binding-related issue.

The perceptual asynchrony that occurs while the brain processes visual information in natural environments is an example of how human mental time works with regard to perceptual awareness in the real world. Understanding how it functions in virtual reality is more complex, because perception is further mediated by technology, which increases the magnitude of the task the brain has to perform when integrating information. Even though the brain is highly efficient in information integration, as shown by the fact that, regardless of perceptual asynchrony, attributes in early vision processes, for example, are rarely misbound (Maloney et al. 2012), lags in virtual environments, i.e. delays caused by system design and its features, may disrupt perception.


Visual perception is considered to be a more dominant type of input in virtual environments than auditory or proprioceptive input,9 and although other sensory inputs such as haptic and olfactory are still relatively underrepresented in these environments, integration of information coming from multiple sensory sources may not be as seamless as necessary. Furthermore, perception of an object, regardless of the sensory channel, in addition to specialized sensory signals always includes “signals from the adjustments of the body” made to obtain the perception (Damasio 1999, p. 147). How is this accomplished in situations where the physical body is not located in the action space and the action takes place in a depicted, computer-generated environment?

9 Vision is also considered to be more dominant than other types of inputs in the physical world, but see van Beers et al. (2002) for a different opinion regarding proprioception and Driver and Spence (2000) regarding audition dominating visual perception.

Brooks (1999) criticized the design of virtual reality tracking systems in the 1990s for inadequately dealing with system latency and emphasized the need for paying more attention to this problem:

Perceptually, the greatest illusion breaker in 1994 systems was the latency between user motion and its representation to the visual system. Latencies routinely run 250 to 500 ms. Flight simulator experience had shown latencies of greater than 50 ms to be perceptible. In my opinion, end-to-end system latency is still the most serious technical shortcoming of today’s VR systems (p. 18).

Thus, system latency is dangerous in virtual environments because it affects seamless multisensory information integration and the feeling of presence. As a final example, consider visuo-haptic asynchrony in virtual environments. In general, determining which signals appear simultaneously is important for a decision on whether they should be bound together into a single multisensory percept. In virtual reality, different sensory inputs can have different latencies, and the resulting asynchronies can negatively affect the participant’s experience. A recent study investigated a range of visuo-haptic asynchronies that in fact remained unnoticed when touching an object in an immersive virtual environment (Di Luca and Mahnan 2019). The task was to touch the right side of a cube on a virtual table with a stretched-out index finger of the right hand, which was visually represented as well, and to judge whether the view of the contact with the virtual cube and the haptic signal—a vibration at the fingertip upon tapping—were synchronous or asynchronous, and which one appeared first. Remarkably, the asynchrony was not detectable if the haptic feedback appeared less than 50 ms after the view of the contact with the virtual cube; however, when the haptic feedback appeared before the visual, the asynchrony remained unnoticeable only if the visual lag was up to 15 ms.

Here, in addition to multisensory integration (visual-haptic), the participant needs to integrate the feedback from his hand movements in immediate reality (internal representation) with the projected representation of the movement in virtual reality (external representation). A question of interest here is how the internal and the external representations come together to form a coherent experience.
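The asymmetry of these detection thresholds can be summarized in a short sketch. The step-function decision rule below is of course a simplification of what is in reality a graded psychometric function, and the function name and values merely restate the figures reported above.

```python
def asynchrony_noticeable(visual_t_ms, haptic_t_ms):
    """Rough detectability check for a visuo-haptic contact event, based on
    the asymmetric thresholds reported by Di Luca and Mahnan (2019):
    a haptic signal trailing the visual contact by less than 50 ms went
    unnoticed, whereas a haptic signal leading vision was tolerated only
    up to a visual lag of about 15 ms."""
    delta = haptic_t_ms - visual_t_ms          # positive: haptics lag vision
    if delta >= 0:
        return delta >= 50.0                   # haptic-after-visual threshold
    return -delta > 15.0                       # haptic-before-visual threshold

print(asynchrony_noticeable(0, 30))   # False: a 30 ms haptic lag goes unnoticed
print(asynchrony_noticeable(20, 0))   # True: a 20 ms visual lag is noticeable
```

Read as a design heuristic, the asymmetry suggests that, when a scheduling trade-off is unavoidable, it is safer to let the haptic pulse trail the rendered contact than to let it lead.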



Consider Hutchins’ (1995) distributed cognition approach, according to which:

The properties of functional systems that are mediated by external representations differ from those that rely exclusively on internal representations, and may depend on the physical properties of the external representational media. Such factors as the endurance of a representation, the sensory modality via which it is accessed, its vulnerability to disruption, and the competition for modality specific resources may all influence the cognitive properties of such a system (Hutchins 1995, p. 286).

Following Hutchins (1995), we consider this approach explicitly cognitive in that “it is concerned with how information is represented and how representations are transformed and propagated in the performance of tasks” (p. 265). These are distributed systems in the sense in which Hutchins argues that cognition can be distributed across agents, environments, and situations. Wilson (2002) argues on several grounds against the strong distributed cognition view, which posits that a cognitive system in principle cannot consist only of an individual mind. She promotes instead a weaker view of distributed cognition. A cognitive system distributed across a situation (distributed cognition in the strong sense) would be highly transient; its elements and the relations among them would change every time a person enters a new location or interacts with new objects. Such a system would be fairly unstable. On the other hand, a cognitive system consisting of only an individual mind would be a persisting system, with its various components retaining their functions in the system across time (e.g. perception, attention, working memory) while at the same time also being open to its environment (e.g. receiving sensory input continuously). According to the weaker version of the distributed cognition view, “studying the mind-plus-situation is considered to be a promising supplementary avenue of investigation, in addition to studying the mind per se” (Wilson 2002, p. 631). It appears that the latter view of distributed cognition is more suitable for studying cognition in virtual reality.

2.4 The Neural Basis of Self-Representation

Since there is much disagreement regarding whether “a self” exists, and given the lack of consensus on what it consists of among the scholars who argue for some version of this concept, it is not surprising to find disparate views on what constitutes the neural basis of the self. Regarding the bodily self, an interesting proposal was developed in the context of studies on pain and phantom limbs. Briefly, our bodily self is supported by a neuromatrix, an innate, “genetically controlled neural network, modifiable by experience” (Melzack 2001, p. 1379):

The neuromatrix, distributed throughout many areas of the brain, comprises a widespread network of neurons that generates patterns, processes information that flows through it, and ultimately produces the pattern that is felt as a whole body possessing a sense of self. The stream of neurosignature output with constantly varying patterns riding on the main signature pattern produces the feelings of the body-self with constantly changing perceptual and emotional qualities (p. 1380).



The neuromatrix is both stable and plastic at the same time. Its stability comes from the commitment of some brain regions to permanently represent in conscious awareness the body parts from which they receive input, and its plasticity comes from the ability of brain regions that are deprived of their natural inputs to become activated by inputs from other brain regions (Berlucchi and Aglioti 1997). The neuromatrix explains a range of findings. For instance, the ability of neonates to imitate movements is explained in terms of an implicit knowledge of body schema, where the capacity to imitate others implies the ability to form visual representations of observed actions performed by others (Jeannerod and Jacob 2005). The neuromatrix further explains phantom phenomena, such as amputees’ experience of pain in the amputated body parts as well as phantom sensations in children born without one or more limbs. Thus, the brain may have an innate capacity to represent an intact body schema (Berlucchi and Aglioti 1997).

While representations of one’s own body, one’s own actions and their effects on others typically implicate the inferior parietal lobe (Jeannerod and Anquetil 2008), neuroimaging findings suggest that the neural basis of a cohesive self-representation involves two large networks: the mirror neuron system and the default state network (Molnar-Szakacs and Arzy 2009). The mirror neuron system is a fronto-parietal network that has been associated with action performance and with observation of others performing actions. The default mode network has been associated with autobiographical memory, prospection (also known as episodic future thinking), mental time travel, spatial navigation, and reading other people’s minds (Buckner and Carroll 2006). While other evidence converges in suggesting the contribution of these networks to aspects of self-representation, the question of the neural basis of self still hinges on one’s definition of the self.

Another issue in determining the neural basis of self-representation pertains to the level of analysis. Working on visual perception, specifically on his computational theory of vision, David Marr wrote that complex systems require

different kinds of explanation at different levels of description that are linked, at least in principle, into a cohesive whole, even if linking the levels in complete detail is impractical. For the specific case of a system that solves an information-processing problem, there are in addition the twin strands of process and representation (Marr 1982, p. 20).

Briefly, Marr proposed three levels of analysis: (1) the computational level of abstract analysis of a problem, (2) the algorithmic and representational level, which specifies a formal procedure for its solution, and (3) the level of physical implementation, which could be anything from the brain to a computer. This tripartite division was conceptualized in such a way that the higher-level questions were considered to be independent of the lower levels. As a consequence, details of the neuronal architecture were neglected in many early artificial intelligence projects dealing with the modeling of the human mind (Churchland and Grush 1999). Research on virtual reality has largely continued the trend of early artificial intelligence projects in this sense, neglecting the possibility that such a reduced view may lead to a partial or even distorted picture of how the mind functions in virtual environments.

2.4 The Neural Basis of Self-Representation

51

even distorted picture of how the mind functions in virtual environments. The danger for virtual reality design lies in relying on such partial or distorted views. Although Marr warned that a mistaken view of the computational level description of a system would turn attempts to theorize at other levels “confounded by spurious artifactual puzzles”, Dennett (1989) emphasized that Marr had underestimated the extent to which descriptions at the computational level might be misleading. Importantly, the levels in Marr’s tripartite model are not hierarchical and choosing a level of analysis depends on a problem at hand. If, for instance, the question is related to how something is done in the brain, then the right level of analysis is the level of implementation. In that case, we may study neural networks (e.g., Posner et al. 2006) or single-cell recordings (e.g., Quiroga et al. 2005; Rey et al. 2015). Information processing is at the level of computation, which is the least specified of Marr’s levels. Psychologists mostly study phenomena at the representational and algorithmic level (Holyoak 1999). While Marr’s approach has been very influential in cognitive science, the fact is that the nervous system displays not one level of implementation, as postulated by this model, but many structural levels, each having important functional capacities for supporting computation.10 Thus, there is much more interdependence between the levels of computation and implementation than originally postulated by Marr, which is a point that has been emphasized by scholars such as Churchland and Grush (1999). While the question of how the brain represents and computes remains to be fully elucidated, it is important to recognize that complex dynamic systems require multiple levels of analyses and that they cannot be fully explained at a single level of analysis. To contextualize these ideas within a perspective of virtual reality, one has to recognize that participant’s experiences in virtual environments need to be explained at different levels of analysis. For instance, components of bodily self-consciousness— self-location, self-identification and first-person perspective—which are important for the feeling of presence in virtual environments, are supported by the temporoparietal junction together with a bilateral brain network including supplementary motor area and premotor, parietal, occipito-temporal and insular cortices (Ionta et al. 2011, 2014). Self-location and first person perspective are supported by the predominantly right hemisphere network including the right insula and right supplementary motor area, together with bilateral temporo-parietal junction. The relevance of these findings for distorted experiences of own body and bodily illusions in virtual environments is related to multisensory integration. Lesion studies suggest that damage to the temporo-parietal junction has been associated with out-of-body experiences, but not with other autoscopic phenomena. For instance, autoscopic hallucination, a type of visual hallucination in which the self remains body-centered, has been associated with lesion in the right occipital cortex, while heautoscopy, in which one alternates between two bodies (autoscopic and physical) and two selves, has been associated with damage to the left posterior 10 The major levels of organization that are discerned in the nervous system are: the central nervous system, systems, maps, networks, neurons, synapses, and molecules.

52

2 Self in Virtual Reality

insula (De Ridder et al. 2007, p. 1829). In out-of-body experience, which is even further away from the normal experience of the self-body unity because the self is experienced as being completely outside the physical body, the faulty neural mechanism is disruption of multisensory integration of bodily signals (Heydrich and Blanke 2013, p. 791). De Ridder et al. (2007) described a tinnitus patient whose treatment with implanted electrodes in the right hemisphere temporo-parietal junction regularly induced out-of-body experiences: the implanted electrodes interfered with the integration of multisensory bodily information (visual, tactile, proprioceptive, and vestibular). Similarly, illusions regarding a virtual body are triggered by a multisensory conflict in a virtual environment (Petkova et al. 2011), indicating that multisensory integration processes are critical in generating the sense of spatial location of the self and the sense of body ownership and agency in virtual environments. Thus, without postulating a theoretical framework built around the concept of multisensory integration, which allows a multi-level account across the discussed phenomena, these findings would be just sets of random, unrelated data.
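The role that multisensory integration plays in these accounts can be made concrete with the standard maximum-likelihood cue-combination model from the perceptual literature. The sketch below is a minimal illustration of that textbook model, not an implementation from any of the studies cited above; its visual and proprioceptive estimates are invented numbers.

```python
# Minimal sketch of maximum-likelihood cue combination: each sensory
# estimate is weighted by its reliability (the inverse of its variance).
# All numbers are hypothetical.

def combine_cues(estimates, variances):
    """Fuse unimodal estimates into one percept, weighting by reliability."""
    reliabilities = [1.0 / v for v in variances]
    total = sum(reliabilities)
    fused = sum(r * e for r, e in zip(reliabilities, estimates)) / total
    fused_variance = 1.0 / total  # the fused estimate beats either cue alone
    return fused, fused_variance

# Hypothetical visual vs. proprioceptive estimates of hand position (cm).
# Vision is modeled as the more reliable cue, so the fused percept is
# pulled toward the seen (virtual) hand rather than the felt one.
fused, var = combine_cues(estimates=[10.0, 14.0], variances=[1.0, 4.0])
print(f"fused position: {fused:.2f} cm, variance: {var:.2f}")  # 10.80 cm, 0.80
```

On this reading, a virtual body illusion can be seen as the integration process resolving an experimentally induced conflict in favor of the more reliable (here, visual) signal.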

References

Aglioti, S.M., Candidi, M.: Out-of-place bodies, out-of-body selves. Neuron 70, 173–175 (2011)
Aspell, J.E., Lavanchy, T., Lenggenhager, B., Blanke, O.: Seeing the body modulates audiotactile integration. Eur. J. Neurosci. 31, 1868–1873 (2010)
Berlucchi, G., Aglioti, S.: The body in the brain: neural bases of corporeal awareness. Trends Neurosci. 20, 560–564 (1997)
Binkofski, F., Buccino, G., Posse, S., Seitz, R.J., Rizzolatti, G., Freund, H.-J.: A fronto-parietal circuit for object manipulation in man: evidence from an fMRI study. Eur. J. Neurosci. 11, 3276–3286 (1999)
Binkofski, F., Amunts, K., Stephan, K.M., Posse, S., Schormann, T., Freund, H.-J., et al.: Broca's region subserves imagery of motion: a combined cytoarchitectonic and fMRI study. Hum. Brain Mapp. 11, 273–285 (2000)
Blakemore, S.-J.: Deluding the motor system. Conscious. Cogn. 12, 647–655 (2003)
Blanke, O., Landis, T., Spinelli, L., Seeck, M.: Out-of-body experience and autoscopy of neurological origin. Brain 127, 243–258 (2004)
Blanke, O., Arzy, S.: The out-of-body experience: disturbed self-processing at the temporo-parietal junction. Neuroscientist 11, 16–24 (2005)
Blanke, O., Metzinger, T.: Full-body illusions and minimal phenomenal selfhood. Trends Cogn. Sci. 13, 7–13 (2008)
Bourdin, P., Barberia, I., Oliva, R., Slater, M.: A virtual out-of-body experience reduces fear of death. PLoS One 12, e0169343 (2017)
Brooks, F.P.: What's real about virtual reality? IEEE Comput. Graph. Appl. 19(6), 16–27 (1999)
Brugger, P.: Reflective mirrors: perspective taking in autoscopic phenomena. Cogn. Neuropsychiatry 7(3), 179–194 (2002)
Buckner, R.L., Carroll, D.C.: Self-projection and the brain. Trends Cogn. Sci. 11(2), 49–57 (2006)
Campbell, J.: The body image and self-consciousness. In: Bermúdez, J.L., Marcel, A., Eilan, N. (eds.) The Body and the Self, pp. 29–42. MIT Press, Cambridge (1995)
Cardini, F., Haggard, P., Ladavas, E.: Seeing and feeling for self and other: proprioceptive spatial location determines multisensory enhancement of touch. Cognition 127, 84–92 (2013)
Churchland, P.S.: The brain and its self. Proc. Am. Philos. Soc. 155(1), 41–50 (2011)


Churchland, P.S., Grush, R.: Computation and the brain. In: Wilson, R.A., Keil, F.C. (eds.) The MIT Encyclopedia of the Cognitive Sciences, pp. 155–158. MIT Press, Cambridge (1999)
Clark, A.: Natural-Born Cyborgs. Oxford University Press (2003)
Damasio, A.: The Feeling of What Happens: Body and Emotion in the Making of Consciousness. Harcourt Inc., San Diego (1999)
Dennett, D.C.: The Intentional Stance. MIT Press, Cambridge (1987)
Dennett, D.C.: Cognitive ethology: hunting for bargains or a wild goose chase? In: Montefiore, A., Noble, C. (eds.) Goals, No-Goals, and Own Goals, pp. 101–116. Unwin Hyman, London (1989)
De Ridder, D., Van Laere, K., Dupont, P., Menovsky, T., Van de Heyning, P.: Visualizing out-of-body experience in the brain. N. Engl. J. Med. 357(18), 1829–1833 (2007)
Di Luca, M., Mahnan, A.: Perceptual limits of visual-haptic simultaneity in virtual reality interactions. In: 2019 IEEE World Haptics Conference, pp. 67–72 (2019). https://doi.org/10.1109/WHC.2019.8816173
Driver, J., Spence, C.: Multisensory perception: beyond modularity and convergence. Curr. Biol. 10, R731–R735 (2000)
Fadiga, L., Craighero, L.: Hand actions and speech representations in Broca's area. Cortex 42, 486–490 (2006)
Fadiga, L., Craighero, L., Destro, M.F., Finos, L., Cotillon-Williams, N., Smith, A.: Language in shadow. Soc. Neurosci. 1(2), 77–89 (2006)
Friedman, D., Leeb, R., Antley, A., Garau, M., Guger, C., Keinrath, C., et al.: Navigating virtual reality by thought: first steps. In: Proceedings of the 7th Annual International Workshop on Presence, pp. 160–167 (2004)
Frith, C.: The self in action: lessons from delusions of control. Conscious. Cogn. 14, 752–770 (2005)
Gallagher, S.: Body schema and intentionality. In: Bermúdez, J.L., Marcel, A., Eilan, N. (eds.) The Body and the Self, pp. 225–244. MIT Press, Cambridge (1995)
Gallagher, S.: Philosophical conceptions of the self: implications for cognitive science. Trends Cogn. Sci. 4(1), 14–21 (2000)
Grèzes, J., Decety, J.: Does visual perception of object afford action? Neuropsychologia 40, 212–222 (2002)
Heldal, I., Brathe, L., Steed, A., Schroeder, R.: Analyzing fragments of collaboration in distributed immersive virtual environments. In: Schroeder, R., Axelsson, A.-S. (eds.) Avatars at Work and Play: Collaboration and Interaction in Shared Virtual Environments, pp. 97–130. Springer (2006)
Hettinger, L.J.: Illusory self-motion in virtual environments. In: Stanney, K.M. (ed.) Handbook of Virtual Environments: Design, Implementation, and Applications, pp. 471–491. Lawrence Erlbaum Associates, Mahwah (2002)
Heydrich, L., Blanke, O.: Distinct illusory own-body perceptions caused by damage to posterior insula and extrastriate cortex. Brain 136, 790–803 (2013)
Hofstadter, D.: I Am a Strange Loop. Basic Books, New York (2007)
Hollerbach, J.M.: Locomotion interfaces. In: Stanney, K.M. (ed.) Handbook of Virtual Environments: Design, Implementation, and Applications, pp. 239–254. Lawrence Erlbaum Associates, Mahwah (2002)
Holyoak, K.J.: Psychology. In: Wilson, R.A., Keil, F.C. (eds.) The MIT Encyclopedia of the Cognitive Sciences, pp. xl–xlix. MIT Press, Cambridge (1999)
Hutchins, E.: How a cockpit remembers its speed. Cogn. Sci. 19, 265–288 (1995)
Ionta, S., Heydrich, L., Lenggenhager, B., Mouthon, M., Fornari, E., Chapuis, D., Gassert, R., Blanke, O.: Multisensory mechanisms in temporo-parietal cortex support self-location and first-person perspective. Neuron 70, 363–374 (2011)
Ionta, S., Martuzzi, R., Salomon, R., Blanke, O.: The brain network reflecting bodily self-consciousness: a functional connectivity study. Soc. Cogn. Affect. Neurosci. 9(12), 1904–1913 (2014)
Jeannerod, M.: The mechanism of self-recognition in humans. Behav. Brain Res. 142, 1–15 (2003)
Jeannerod, M., Anquetil, T.: Putting oneself in the perspective of the other: a framework for self-other distinction. Soc. Neurosci. 3, 356–367 (2008)


Knoblich, G.: Self-recognition: body and action. Trends Cogn. Sci. 6(11), 447–449 (2002)
Knoblich, G., Elsner, B., Aschersleben, G., Metzinger, T.: Grounding the self in action. Conscious. Cogn. 12, 487–494 (2003)
Knoblich, G., Sebanz, N.: Agency in the face of error. Trends Cogn. Sci. 9(6), 259–261 (2005)
Leeb, R., Keinrath, C., Friedman, D., Guger, C., Scherer, R., et al.: Walking by thinking: the brainwaves are crucial, not the muscles! Presence 15(5), 500–514 (2006)
Lenggenhager, B., Tadi, T., Metzinger, T., Blanke, O.: Video ergo sum: manipulating bodily self-consciousness. Science 317, 1096–1099 (2007)
Lotte, F., Faller, J., Guger, C., Renard, Y., Pfurtscheller, G., et al.: Combining BCI with virtual reality: towards new applications and improved BCI. In: Allison, B.Z., Dunne, S., Leeb, R., Millán, J., Nijholt, A. (eds.) Towards Practical Brain-Computer Interfaces, pp. 197–220. Springer (2012)
Maloney, R.T., Lam, S.K., Clifford, C.W.G.: Color misbinding during motion rivalry. Biol. Lett. 9, 20120899 (2012)
Maravita, A., Iriki, A.: Tools for the body (schema). Trends Cogn. Sci. 8, 79–86 (2004)
Marr, D.: Vision. Freeman, New York (1982)
Martin, M.G.F.: Bodily awareness: a sense of ownership. In: Bermúdez, J.L., Marcel, A., Eilan, N. (eds.) The Body and the Self, pp. 267–289. MIT Press, Cambridge (1995)
Meilinger, T., Knauff, M., Bülthoff, H.H.: Working memory in wayfinding—a dual task experiment in a virtual city. Cogn. Sci. 32, 755–770 (2008)
Melzack, R.: Pain and the neuromatrix in the brain. J. Dent. Educ. 65, 1378–1382 (2001)
Merleau-Ponty, M.: Phenomenology of Perception. Routledge, London (1958)
Metzinger, T.: Being No One: The Self-Model Theory of Subjectivity. MIT Press, Cambridge (2003)
Metzinger, T.: Why are identity disorders interesting for philosophers? In: Schramme, T., Thome, J. (eds.) Philosophy and Psychiatry, pp. 311–325. Walter de Gruyter, Berlin (2004)
Miall, R.C., Weir, D.J., Wolpert, D.M., Stein, J.F.: Is the cerebellum a Smith predictor? J. Motor Behav. 25, 203–216 (1993)
Minsky, M.: The Society of Mind. Simon and Schuster, New York (1986)
Molnar-Szakacs, I., Arzy, S.: Searching for an integrated self-representation. Commun. Integr. Biol. 2(4), 365–367 (2009)
Moseley, L.G., Gallace, A., Spence, C.: Bodily illusions in health and disease: physiological and clinical perspectives and the concept of a cortical 'body matrix'. Neurosci. Biobehav. Rev. 36, 34–46 (2012)
Moutoussis, K., Zeki, S.: Functional segregation and temporal hierarchy of the visual perceptive systems. Proc. Biol. Sci. 264(1387), 1407–1414 (1997)
Neisser, U.: Five kinds of self-knowledge. Philos. Psychol. 1(1), 35–59 (1988)
Normand, J.-M., Giannopoulos, E., Spanlang, B., Slater, M.: Multisensory stimulation can induce an illusion of larger belly size in immersive virtual reality. PLoS One 6(1), e16128 (2011)
Petkova, V.I., Khoshnevis, M., Ehrsson, H.H.: The perspective matters! Multisensory integration in ego-centric reference frames determines full-body ownership. Front. Psychol. 2, 35 (2011)
Pomés, A., Slater, M.: Drift and ownership toward a distant virtual body. Front. Hum. Neurosci. 7, 908 (2013)
Posner, M.I., Sheese, B., Odludas, Y., Tang, Y.: Analyzing and shaping neural networks of attention. Neural Netw. 19, 1422–1429 (2006)
Quian Quiroga, R., Reddy, L., Kreiman, G., Koch, C., Fried, I.: Invariant visual representation by single neurons in the human brain. Nature 435, 1102–1107 (2005)
Rey, H.G., Ison, M.J., Pedreira, C., Valentin, A., Alarcon, G., Selway, R., et al.: Single-cell recordings in the human medial temporal lobe. J. Anat. 227, 394–408 (2015)
Russell, J.: At two with nature: agency and the development of self-world dualism. In: Bermúdez, J.L., Marcel, A., Eilan, N. (eds.) The Body and the Self, pp. 127–151. MIT Press, Cambridge (1995)


Sato, A., Yasuda, A.: Illusion of sense of self-agency: discrepancy between the predicted and actual sensory consequences of actions modulates the sense of self-agency, but not the sense of self-ownership. Cognition 94, 241–255 (2005)
Slater, M.: A note on presence terminology. Presence Connect 3(3), 1–5 (2003)
Slater, M.: Towards a digital body: the virtual arm illusion. Front. Hum. Neurosci. 2 (2008)
Strawson, G.: Self and body. Aristotelian Soc. 73, 307–332 (1999)
Tsakiris, M., Schutz-Bosbach, S., Gallagher, S.: On agency and body-ownership: phenomenological and neurocognitive reflections. Conscious. Cogn. 16, 645–660 (2007)
Uddin, L.Q., Iacoboni, M., Lange, C., Keenan, J.P.: The self and social cognition: the role of cortical midline structures and mirror neurons. Trends Cogn. Sci. 11(4), 153–157 (2007)
van Beers, R.J., Wolpert, D.M., Haggard, P.: When feeling is more important than seeing in sensorimotor adaptation. Curr. Biol. 12, 834–837 (2002)
van den Bos, E., Jeannerod, M.: Sense of body and sense of action both contribute to self-recognition. Cognition 85, 177–187 (2002)
White, S.L.: Self. In: Wilson, R.A., Keil, F.C. (eds.) The MIT Encyclopedia of the Cognitive Sciences, pp. 733–735. MIT Press, Cambridge (1999)
Wilson, M.: Six views of embodied cognition. Psychon. Bull. Rev. 9, 625–636 (2002)
Wolpert, D.M., Flanagan, J.R.: Motor prediction. Curr. Biol. 11(18), R729–R732 (2001)
Wu, D.-A., Kanai, R., Shimojo, S.: Steady-state misbinding of colour and motion. Nature 429, 262 (2004)
Zeki, S.: The disunity of consciousness. Trends Cogn. Sci. 7(5), 214–218 (2003)

Chapter 3

Self and the Virtual Other

To perceive the world is to co-perceive oneself (Gibson 1986, p. 141).

3.1 Social Cognition

In terms of evolutionary significance, the ability to decode social signs ranks high, because it allows animals to select a response (Dunbar 2009). Based on social signals, animals can discern predators from prey or mates, and human beings can navigate the complexities of the social world. Human social life requires sophisticated cognitive abilities, so that a unique balance of competition and cooperation among people is achieved in everyday activities.

Everyday social understanding is based on a range of mechanisms of social cognition. Social cognition includes processes that support our understanding of others and our interaction with them (Gallotti and Frith 2013). Lower-level mechanisms, such as perception of basic emotions (fear, anger, sadness, happiness, disgust, surprise), eye gaze, biological motion, goal-directed action and agency, are common to human beings and other animals. On the other hand, higher-level mechanisms, such as reading other people's minds and interpreting complex emotions (e.g. pride, guilt, embarrassment, resentment), are considered to be unique to human beings (Blakemore and Frith 2004). Some of these mechanisms are present at birth, such as the abilities to recognize faces, to follow the direction of another person's eye gaze, and to engage in joint attention. Others, such as imitation, empathy, deception, and theory of mind, appear at different time points during development.

Social cognition involves the abilities to perceive, categorize, remember, analyze, reason with, and behave towards others (Pelphrey and Carter 2008). As indicated by this broad characterization, social cognition involves a range of mental capacities, although it is most often defined simply as "understanding others" (Lieberman 2007) or "thinking about people" (Fiske 1992; Korman et al. 2015). The term social cognition applies not only to how people perceive, categorize and otherwise cognize individuals, as in person perception, but also to how people interact and relate to groups (Bolender 2010).

Researchers often focus on specific stages in social information processing, suggesting that it begins with social perception. While for some researchers social perception includes "the initial stages of evaluating the emotions, actions, and intentions of others using their gaze direction, body movements, hand gestures, facial expressions, and other biological-motion cues" (Pelphrey and Carter 2008, p. 284), other scholars use the term differently, for example merely for a component of the human visual system specialized for the processing of social cues (Jacob and Jeannerod 2003). In both cases, however, social perception refers to early stages of social information processing.

Studying other people is unique in the sense that, unlike other objects, people have minds and experiences. According to one prominent theory, to understand others we use our knowledge of social rules and norms to mediate a tacit process of theorizing in which we construct our interpretations of other people's behavior. From this perspective, psychological states are theoretical entities; assigning beliefs, desires, intentions and other mental states to others requires a theory of mind. Thus, other people as social objects are much more complex than other entities, and they cannot be regarded just as a different type of object representation on which cognitive processes operate (Korman et al. 2015). While attributing mental states to others is part of normal everyday experience, patients with certain neurological and psychiatric conditions have marked difficulties with reading other minds.

Another broad theme related to social cognition that is highly relevant to virtual reality involves mental processes pertaining to the self. Gibson (1986) argued that our perception of the world involves some degree of self-referentiality, because the self and the environment are inseparable. Self-recognition, self-reflection, self-knowledge, and self-control are some of the self-related topics that have been addressed relatively frequently in recent research on social cognition. However, it may not be so obvious that studying the self fits into the domain of social cognition:

Given that the self feels hermetically sealed off from others, containing private thoughts and feelings, one might wonder why the self is so heavily researched by social psychologists whose main focus is on social interactions and situational pressures. (Lieberman 2007, p. 265)

One reason for considering self-directed mental processes within the domain of social cognition has to do with some early theories from social psychology that postulate a critical role of social feedback in the formation of the self (Lieberman 2007). These theories further hold that inner speech, which people typically experience in everyday life as "speech-for-oneself" (Vygotsky 1986) and which remains preserved to some extent even in brain-damaged stroke patients with language disorders (Kljajevic et al. 2017), serves, via simulated conversations with the internalized other, as a guide for appropriate social behavior (Lieberman 2007). There are, of course, different types of inner speech, but what is relevant here is how and when it is acquired. Note that inner speech develops later than overt speech; in typically developing children this happens around the age of 5 (Vygotsky 1986). Similarly, the ability to visually recognize oneself in a mirror is acquired at about 2 years of age, later than the ability to visually recognize others in a mirror, for instance the baby's own mother, which children typically acquire by the time they are 9 months old. Even the development of the concept of self seems to be conditioned by the development of the concept of other (Buber 1923/1970). These findings are compatible with the proposal that social feedback may play a critical role in the formation of self-related cognitive processes. They may also suggest a more complex picture in which additional developmental requirements are imposed on the involved brain structures before they mature enough to adequately support these processes.1

Human interest in one's own self and others begins early and continues throughout life. In fact, intersubjectivity begins even before birth, while one is inhabiting the body of another person, and in the case of twins, the mother's body is shared with somebody else (Ammaniti and Gallese 2014). Developmental literature suggests that newborn human babies distinguish between human faces and objects, and that they prefer to attend to human faces (Beer and Ochsner 2006). One hallmark of the self is the ability to define and keep boundaries with others, as well as the ability to establish relations with others. In a social context, the self is a social object and so is the other, which implies that, in addition to the processes required for self-perception and self-understanding, the processes involving perception and understanding of others and their interactions are also relevant. Developmentally, both perception of the self and perception of the other are initially egocentric, meaning that a child assumes that others always share her view; during development the child realizes that her own perspective does not necessarily match the perspective of the other. Even beyond the initial egocentric developmental phase, the self is often used as a reference in our efforts to understand others, with beliefs about the self, own intentions and emotions serving as a starting point.

Finally, in addition to perception and understanding of one's own self, perception and understanding of others, and interpersonal interactions, social cognition also includes social knowledge. Social knowledge consists of declarative and procedural components: the former includes everything one can state about the social world (e.g. norms of behavior in certain social situations), and the latter includes various skills and strategies used in social contexts (Beer and Ochsner 2006). For example, driving home from work we choose not to switch from the slow lane with single drivers to the lane reserved for carpools, which would get us home faster, not because we may get caught and have to pay a fine, but mostly because we know that it is a socially inappropriate action. Thus, our behavior reflects our choices as well as the influence of specific social constraints.

1 For a review of brain regions and networks supporting mental processes implicated in social cognition, see for example Lieberman (2007), Uddin et al. (2007), and Yang et al. (2015).


3.1.1 Theory of Mind

Our social experiences are inherently complex. A sharp demarcation between the acting self and others, the discrepancy between our need to act authentically and yet follow social rules at the same time, and the ability to understand the goals of other people's actions and hold in mind someone else's perspective all underscore the concept of mutual understanding (Gallese 2000). This concept implies that the self and others share certain mental states. It assumes the ability to attribute intentions, desires, beliefs and other mental states to the self and to others (Baron-Cohen 1995; Apperly 2011). According to this view, we do not have direct access to other people's mental states, and thus we interpret their behavior based on their actions and their communication, verbal and nonverbal. How critical nonverbal communication is in social interaction is illustrated by the use of gestures, the expression of emotions, and the regulation of turn-taking in conversation; similarly, shifts of posture can express social closeness or distance between the interactants.

Constructing believable virtual characters largely depends on how nonverbal communication is conceptualized and realized. In addition to implementing specific rules for nonverbal behavior, including facial expressions, eye gaze, head motion, body posture, gestures and proxemic patterns, virtual characters' nonverbal responses also need to be specific, situation-appropriate, consistent across situations and characters, and sensitive to cultural preferences.

Successful social interaction requires the ability to perceive and track other people's mental states2, as well as the ability to make inferences about one's own mental states and adjust one's behavior accordingly. Numerous processes provide input information for mentalizing, such as face recognition, gaze following, and goal detection, whereas other processes allow reasoning about that information, for instance inference of specific emotions and prediction of the next action (Korman et al. 2015). Thus, we predict other people's behavior based on our mind-reading ability or theory of mind:

We naturally explain people's behavior on the basis of their minds: their knowledge, their beliefs and their desires, and we know that when there is a conflict between belief and reality it is the person's belief, not the reality, that will determine their behavior. Explaining behavior in this way is called 'having a theory of mind' or 'having an intentional stance' (Frith and Frith 2005, p. 644).
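The logic of the quoted passage, that prediction follows the agent's belief rather than reality, fits in a few lines of code. The sketch below is a hypothetical rendering of a classic false-belief scenario of the Sally-Anne type, written for this discussion rather than taken from any implementation in the literature.

```python
# Sketch of the prediction logic behind false-belief reasoning: an
# observer equipped with a theory of mind predicts behavior from the
# agent's belief state, not from the actual state of the world.

def predict_search_location(agent_belief: str, actual_location: str,
                            observer_has_tom: bool) -> str:
    """Predict where the agent will look for a hidden object."""
    if observer_has_tom:
        return agent_belief   # prediction tracks the (possibly false) belief
    return actual_location    # a belief-blind observer tracks reality instead

# Sally saw the marble in the basket; it was moved to the box in her absence.
sally_belief, reality = "basket", "box"
print(predict_search_location(sally_belief, reality, observer_has_tom=True))   # basket
print(predict_search_location(sally_belief, reality, observer_has_tom=False))  # box
```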

The proponents of theory of mind argue that mindreading enables many important forms of human interaction, from deception, which is associated with false beliefs and which can be used to manipulate the behavior of others, to teaching (Frith and Frith 2005). We need to understand the beliefs of others in order to predict their actions and to explain or manipulate their behavior (Doherty 2009). Many aspects of this ability have been debated for a long time, such as whether it is uniquely human, whether it is innate, and whether it requires language. While some proponents of theory of mind argue for a theory-theory view, according to which attribution of mental states takes place via tacit use of a theory allowing for the construction of interpretations as explanations of other people's behavior, others argue for a simulation view, according to which our ability to infer the mental states of others depends on our ability to put ourselves in their shoes, metaphorically speaking (Blackburn 1994; Frith and Frith 2005).

Importantly, theory of mind is not a monolithic ability (Gallese 2000; Baron-Cohen 1995). It consists of components that are implicated in different mentalizing processes and require more distributed neural support. In an attempt to reconcile the findings on simple ascriptions of mental states with those on more complex, flexible, high-level mindreading, a two-system account of the cognitive basis of mindreading has been proposed (Apperly and Butterfill 2009; Apperly 2011). According to this view, simpler mindreading abilities are supported by cognitively efficient, low-level processing modules, whereas high-level mindreading is achieved via the same inferential processes that are implicated in other types of reasoning. Although the capacity for simple ascription of mental states develops early in humans and is likely present in nonhuman animals, the high-level mindreading capacity unfolds along a longer neurocognitive developmental path in humans, for whom it may be unique. Furthermore, these components may interact with other cognitive processes, primarily with working memory, inhibition and other executive processes. The idea that there exist two theory of mind systems has provoked a debate, mainly questioning the existence of automatic theory of mind processing in adults. There is currently no consensus on how best to describe this ability in humans and whether it is best captured by a one-system, two-system or multiple-system theory of mind approach (Schneider et al. 2017).

It is generally assumed that the ability to understand other people's perspectives depends on the ability to understand one's own perspective. However, findings diverge on whether we automatically track other people's mental states or whether taking another's perspective requires cognitive effort. A recent study has suggested that we may automatically keep track of the differing knowledge states of others, but that deliberately adopting other people's perspectives requires cognitive effort. More specifically, Bradford et al. (2015) investigated whether self- and other-belief-attribution processes that are part of the theory of mind mechanism can be differentiated, and if so, whether they are driven by different aspects of executive functioning. They measured accuracy and response times in 62 cognitively healthy adults in a mental state attribution task, in which mental state attribution required holding two potentially contrasting belief states in mind (e.g. possession or awareness of a false-belief state). The main findings were that response times were significantly longer when questions referred to other people's belief states than when they referred to self-belief states, as well as when the shift in perspective was from the other to the self.

2 The ability to get "inside other people's heads" and share their mental states (their wishes, desires, hopes, beliefs and so on) is known as mentalizing, having a theory of mind (ToM) or intentional stance, or mindreading (see Apperly 2011, for review). Our everyday explanation of the behavior of others in terms of mental states is referred to as folk psychology (Dennett 1981/1987).
Thus, the authors conclude, the self perspective is always processed, regardless of the task demands, whereas the other’s perspective is processed only if explicitly required by the task.


Another intriguing question in research on mindreading is how we arrive at correct inferences, given that many mental states of an agent are potentially relevant in a given social situation (Apperly 2011). In other words, there are many-to-one mappings between the agent's observed action and the goals (intentions, beliefs) that we want to infer; in cognitive science terms, this is a computationally intractable problem (see the sketch at the end of this subsection). On the other hand, people seem to effortlessly arrive at inferences about what others think, feel, or intend to do in typical social situations. Even eighteen-month-old infants appear capable of tracking the intentions of people whose actions they observe (Meltzoff 1995), although other aspects of mindreading that require more involvement of executive functions, such as ascription of false beliefs, develop somewhat later, after executive functions have matured further. Thus, according to this view, even though infants reason about the behavior of others, they do not necessarily reason about their beliefs. Regardless of whether infants are indeed capable of mindreading on some level, or whether they simply track others' behavior, the fact remains that a certain sensitivity to the mental states of others appears very early in typically developing children.

Given the importance of intact social information processing in navigating the complexities of everyday social life, it is not surprising that developmental disorders such as autism and antisocial personality disorder, as well as psychiatric disorders such as schizophrenia, are associated with disturbances of social cognition (Blakemore and Frith 2004; Pelphrey and Carter 2008). For instance, persons on the autism spectrum have marked difficulties with social interaction as well as with self-awareness, which has been explained in terms of a diminished mentalizing ability. Individuals with autism are more interested in objects than in people; they have difficulties recognizing faces and emotional expressions, as well as understanding humor (Kljajevic 2019). As another example, children with antisocial behavior disorder show difficulties detecting emotions as well as considering emotions when making decisions, and adults with antisocial personality disorder show lack of empathy and lack of remorse when they harm others, even though they often have outstanding mentalizing skills. Furthermore, delusions of persecution in patients with schizophrenia are often explained in terms of dysfunctional mentalizing (Blakemore and Frith 2004).

In virtual environments, the intentions, beliefs, desires and other mental states of others may be more difficult to decode, because participants have to infer mental states from cues displayed via digital representations of others, and such representations are known to be insufficiently expressive (Vinayagamoorthy et al. 2006) (Sect. 3.3.3). Adding to this complexity is the technology-mediated, projected nature of social interaction with virtual characters, which raises questions such as: How do we predict the behavior of others in a virtual environment? Do the models on which we typically rely in social interaction in the physical world apply to virtual environments as well?
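One common way to make the many-to-one inference problem concrete, and one that is not part of this book's own proposals, is Bayesian inverse planning: the observer is modeled as scoring each candidate goal by its prior plausibility and by how well it predicts the observed action. The goal names, priors and likelihoods below are invented purely for illustration.

```python
# Illustrative sketch of Bayesian goal inference ("inverse planning").
# Many goals are compatible with a single observed action, so the
# posterior distributes belief across them rather than picking one.
# All numbers are invented.

def infer_goals(priors, likelihoods, action):
    """Return P(goal | action) for each candidate goal via Bayes' rule."""
    unnormalized = {g: priors[g] * likelihoods[g][action] for g in priors}
    total = sum(unnormalized.values())
    return {g: p / total for g, p in unnormalized.items()}

# Observed action: an avatar walks toward the door of a virtual room.
priors = {"leave_room": 0.5, "greet_newcomer": 0.3, "inspect_door": 0.2}
likelihoods = {
    "leave_room":     {"walk_to_door": 0.9},
    "greet_newcomer": {"walk_to_door": 0.8},
    "inspect_door":   {"walk_to_door": 0.4},
}
posterior = infer_goals(priors, likelihoods, "walk_to_door")
for goal, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"{goal}: {p:.2f}")  # leave_room: 0.58, greet_newcomer: 0.31, ...
```

With realistic goal and belief spaces the hypothesis space explodes, which is where the intractability noted above bites; priors and contextual constraints are what keep everyday inference tractable.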


3.1.2 Attributing Minds to Non-Human Entities

Researchers working with complex human-like replicas in robotics and with human-like characters in virtual reality have reported that participants engaging with such entities experience a strong feeling of eeriness. This effect was explained in 1970 by the Japanese robotics engineer Masahiro Mori, who formulated the uncanny valley theory. The theory postulates that the unsettling feelings observed in participants who engage with human-like inanimate entities are due to the objects' human-like appearance and motion patterns. The term uncanny captures the essence of this effect. It was introduced in 1906 by the German psychologist Ernst Anton Jentsch to refer to an eerie sensation associated with doubts about whether an entity is animate or inanimate. Mori applied this concept to the observation that users may like human-like robots only up to a certain point: the liking abruptly stops when these creations' appearance becomes too human-like. Since users find the failed attempt at life-like appearance unnerving, there is a descent in liking, and it is this "descent into eeriness" that Mori calls the uncanny valley (Mori 1970). An example cited by Mori is a robot designed to smile: when the speed of the facial movements producing the smile was cut in half "in an attempt to make the robot bring up a smile more slowly, instead of looking happy, its expression turned creepy", tumbling down into the uncanny valley (Mori 1970, p. 99). Nor is the effect confined to modern robotics and immersive virtual reality: mythology and art have recorded throughout history that attempts to turn inanimate things into life-like, living beings share uncanniness as a common feature.3

According to explanations from evolutionary psychology, uncanny valley effects arise as a protective response of an organism to potential danger to itself and its progeny. Another type of explanation of the aversion to human-like entities is based on the concept of cognitive dissonance. Briefly, to anticipate the behavior of others, we categorize them based on a combination of perceptual cues and previous experiences. When these expectations are not met, cognitive dissonance arises, which manifests as uneasiness, aversion, fear, or disgust. Based on evidence from robotics, MacDorman and Entezari (2015) argue that a general cognitive theory, such as one based on cognitive dissonance, cannot explain the eerie feeling associated with inanimate, human-looking entities. They identify this feeling as a type of biological adaptation to avoid threat, arising from fear and disgust.

Other evidence suggests that uncanny valley effects may be related not to virtual characters' appearance and patterns of motion, but may instead arise from the attribution of specific mental abilities, such as emotions and social cognition, to non-human entities (Stein and Ohler 2017). Briefly, a recent study investigated mind-attribution to human-like virtual characters (human-controlled avatars and computer-controlled agents), manipulating their autonomy (Stein and Ohler 2017). The study reported that autonomous virtual agents that displayed emotions and social cognition were perceived as more eerie than scripted virtual agents, and that scripted human avatars were perceived as more eerie than autonomous human avatars. Based on these findings, the authors suggest that the uncanniness is related to ascribing emotions and a theory of mind to computer systems, which in a way turns machines into social beings. These data are consistent with the notion of an uncanny valley of mind and perhaps with the threat-to-human-distinctiveness hypothesis.

This view is further supported by the observation that the unnerving feeling is evoked regardless of whether it is a non-human entity exhibiting emotions or a human being exhibiting a lack of emotions, which is typical of psychopaths; the uncanny valley is thus applicable to both human-like robots and robotic humans (Gray and Wegner 2012). For instance, one visual marker of psychopathy is a lack of visible reaction in the eye region to situations that normally evoke emotions. A study that investigated upper facial animation in human-like virtual characters suggested that virtual characters failing to show the startle response to a scream were experienced as most uncanny by the participants (Tinwell et al. 2013). More generally, the perception of personality traits indicating psychopathy was a strong predictor of uncanniness, whereas perception of other negative personality traits not indicative of psychopathy was not. This is consistent with the observation that psychopaths generally show a lack of remorse and empathy, and that this lack of emotion is so unexpected to normal human beings that explanations and predictions based on folk psychology fail, and so we just "don't get" them.

The exact circumstances under which attribution of minds to non-human entities causes the eerie sensation are not clear. According to some researchers, the feeling of uncanniness emerges when perceptions of experience (feeling, sensing) are ascribed to human-like robots or creations in virtual reality to which we know such features cannot actually be ascribed, whereas uncanny effects are not associated with cases in which features of other dimensions of mind, for example agency, are ascribed to such entities (Gray et al. 2007; Gray and Wegner 2012). Agency in this context4 is a broadly construed complex of abilities involving self-control, planning, and communication, but also thought, emotion recognition, memory, and morality (Gray et al. 2007). Although it may seem problematic to ascribe some of these features, such as morality and thought, to a nonhuman entity, it is plausible to think of robots as having features like memory and the capacity to communicate or plan (Strawson 1999; Gallagher 2000; Hofstadter 2007) (Sect. 2.2.1). Moreover, even when they do not have a human-like appearance, machines that display emotions and understanding of the emotional experience of others appear to cause anxiety in participants. Considering the early explanations of uncanniness in virtual reality based on virtual characters' appearance and motion patterns, and the newer findings suggesting that attribution of mind to non-human entities causes the eerie effect, a reconciling view proposes that more anthropomorphic representations and high behavioral realism are likely associated with higher expectations regarding social interaction (Fox and Ahn 2015).

3 Consider for example Golem from ancient Jewish mythology, or Frankenstein's monster, a more recent creation from art.

4 According to the authors of this study, experience as a mind dimension consists of capacities for hunger, fear, pain, pleasure, rage, desire, personality, consciousness, pride, embarrassment and joy (Gray et al. 2007).
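Mori drew the valley as a shape rather than a formula, so any equation for it is a toy. The sketch below, with entirely made-up parameters, merely visualizes the "descent into eeriness": affinity rises with human-likeness, collapses just short of full human-likeness, and recovers at the human end.

```python
import math

# Toy affinity curve in the spirit of Mori's uncanny valley: a rising
# trend minus a Gaussian dip just short of full human-likeness.
# The parameters are invented; Mori gave no equation.

def toy_affinity(humanlikeness: float) -> float:
    """Affinity for an entity, with human-likeness scaled to [0, 1]."""
    rising_trend = humanlikeness
    valley = 1.2 * math.exp(-((humanlikeness - 0.85) ** 2) / (2 * 0.05 ** 2))
    return rising_trend - valley

for h in [0.0, 0.3, 0.6, 0.85, 0.95, 1.0]:
    print(f"human-likeness {h:.2f} -> affinity {toy_affinity(h):+.2f}")
# The dip bottoms out near 0.85: almost-human entities score below zero.
```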


3.1.3 The We-Mode

Unlike the traditional approach to social understanding, which is anchored in mindreading and assumes attribution of mental states to others through observation, more recent approaches posit interactionism at the basis of social understanding. The family of views grouped under the term interactionism originated as a reaction to cognitivism and other forms of individualism that tend to focus on the psychological features of single agents even when studying group behavior. Such reduction of group psychology to the psychological features of individual agents has been critiqued as inadequate for addressing important questions of social cognition (e.g. Fiske and Haslam 1996), and arguments have been advanced for the claim that "when agents are poised to interact, they achieve interpersonal awareness through a 'meeting' of minds rather than an endlessly recursive exercise of mindreading" (Gallotti and Frith 2013, p. 2). Thus, the interactionists challenge the traditional approach to social understanding and action, which they view as individualistic, as being concerned only with what an individual brings to interaction while disregarding the emergent properties of the interaction itself. They argue that social understanding cannot be explained in terms of cognitive processes that take place in individual minds, and that the emergent properties of interaction with physical and social objects must be considered. The notion of interactive social cognition hinges on the assumption that key aspects of individual cognitive performance are inherently relational, as also argued by Fiske (1992). In our encounters with the physical world, cognitive activity mostly consists of making sense of things in the world, that is, of attributing meaning and value to them. Making sense of a social environment, however, goes beyond mere physical co-presence and requires participants' understanding of the social scene and their joint actions. Importantly, a joint action is more than the simultaneous contribution of two (or more) people who are doing something together. It involves shared intentions or we-intentions (Searle 1990). The we-mode requires that interacting individuals share a distinct psychological attitude towards the interactive scene as togetherness in intentions, beliefs, and desires (Gallotti and Frith 2013).

Collective intentionality is a predecessor of the we-mode approach to social understanding. This concept, introduced in Searle's (1990) seminal paper, includes shared mental attitudes, such as shared intentions, shared beliefs, and collective emotions, also labeled collective practical intentionality, collective cognitive intentionality, and collective affective intentionality (Schmidt 2008). There are various approaches to collective intentionality, which differ with regard to where collectivity is located: in the content, in a we-mode, or in collective plural subjects. What they all have in common is that they are based on the standard model of intentional states as propositional attitudes. Importantly, collective behavior is not the same as the summation of individual behaviors, and collective intentionality is not reducible to individual intentionality. The reason for this irreducibility of we-intentions to I-intentions is that "the notion of a we-intention, of collective intentionality, implies the notion of cooperation" to achieve the goal, which is not entailed in the notion of I-intentions (Searle 1990, p. 406). However, even though collective intentionality is shared, we-intentions do not imply the existence of a group mind or group consciousness; therefore, Searle claims, there is no mysterious group mind.

Furthermore, according to one view of interaction in the we-mode, the social environment broadens the action possibilities of participating individuals. It enhances their interpersonal understanding through a "meeting of minds", a quality which is not available when merely ascribing mental states to others. But how exactly do interacting individuals acquire these insights into each other's "propensities and dispositions to act" that are not available from the observational perspective?

Co-representing the others' viewpoint on the action scene as a condition for acting jointly modulates the space of mental activity and, therefore, behavior, by providing each agent with access to a set of descriptions and concepts that would be unavailable from the observational, first-person singular or third-person perspective. For example, actions that would not be available to me on my own are added because they are available to someone else in my group. … Human cognition is enriched with resources for cognizing in an irreducibly collective mode that remain latent until individuals become engaged in particular interactive contexts. In this respect, the we-mode is a property of individuals, but, since it manifests during active participation in group behavior, it cannot be understood in purely individualistic terms. (Gallotti and Frith 2013, p. 6)

Thus, the proponents of the we-mode approach do not argue for a contrast between the individual and the social. They argue instead for an approach that builds on individualism, starting from neurocognitive features intrinsic to the individual. Similarly, interactionism is not necessarily an alternative to mindreading, and it is perhaps best viewed as complementing it (Michael 2011). The central claims of the we-mode approach to social understanding are that a group, not its individual members, is the principal agent, and that the we-mode is irreducible to the I-mode. The defining criteria of we-mode reasoning are authoritative group reason, the collectivity condition, and collective commitment (Hakli et al. 2010). Furthermore, two types of we-reasoning have been discerned, we-reasoning in the pro-group I-mode and we-reasoning in the we-mode:

Pro-group I-mode reasoning answers 'What should I do as a private person acting in part for the group?', and we-mode reasoning answers two types of questions: 'What should our group do?' and 'What should I do as a group member as my part of our group's action?' (Hakli et al. 2010, p. 293)

Summing up, interactionism and the we-mode approach to social cognition emphasize the irreducibility of social understanding to the features of the individual mind, collective intentionality, and a group as the main agent. However, the concept of a group agent has been debated, in particular with regard to how to analyze its actions and attitudes. It also remains unclear under which circumstances a group of agents becomes a group agent. Despite the lack of consensus on these important issues, there is agreement that one precondition for identifying with a group is to adopt a we-perspective, which is irreducible. This notion raises interesting questions related not only to joint action, but also to responsibility and ethics.


Adopting a specific conceptualization of social interaction is relevant to virtual environments, because they extend the space in which participants act and interact. An often emphasized observation is that although human beings are intrinsically relational, most immersive virtual reality has been designed for single participants, resulting in socially isolated user experiences. Additionally, research on virtual reality has largely focused on single-user systems (Steinicke 2016). It is only relatively recently that shared virtual environments have become more widely available. Shared virtual environments open important questions regarding social cognition and the ways in which cognition is distributed across participants and environments. Similar to the essential questions addressed by mainstream social cognitive neuroscience, such as What does it mean to be someone? How do we relate to others?, virtual reality raises questions such as What does it mean for a self to have an avatar? How do we navigate the social world of virtual characters? Questions like these deserve more research attention, because the challenges of designing virtual environments become considerably greater when the complexities of social interaction are included.

3.1.4 Relational Models

How do we think about social relations? One prominent theory that deals with this question is the relational models theory (Fiske 1992). It is not a theory of interpersonal relations per se, but a theory of cognition, specifically of our mental representations of social interactions. The theory is based on the assumptions that individualism cannot explain social cognition and that "the relational and the individual are not just distinct levels at which social cognition can be described, but correspond to kinds of schemas that combine and possibly compete in organizing the processing of social information" (Fiske and Haslam 1996, p. 147). Instead of focusing on individuals and objects, the theory explicates how people use relational models to "construct, coordinate, interpret, contest and evaluate social relationships". This means that in everyday social transactions with acquaintances, people think about the social world not in terms of personal, individual attributes, but in relational terms. For instance, it is more important to know that the person who arrived at the party with your boss is his new wife than that she is young, tall, blond, and friendly. Additionally, awareness and monitoring of how others are relating are also important, and people generally care about the implications of their relationships for other relationships; at a third level of complexity, therefore, the theory posits models of combined relationships.

The relational models theory postulates that cognition of social relations is structured by means of a small number of basic mental models, used cross-culturally for social interaction, evaluation and affect (Fiske 1992). Fiske proposed the following four models: communal sharing, authority ranking, equality matching, and market pricing. In communal sharing relationships, members are considered equal in some relevant social way; they focus on what they have in common while disregarding individual identities, and one identifies with the group. The type of interpersonal relations that fits this model can be found in romantic love as well as among people who identify with some group, race or nation. In authority ranking, there is a strict linear order along some hierarchical social dimension; group members must respect ranking and obey the rules that regulate the hierarchy, as found in relationships among people at different military ranks or in relations across generations in traditional societies. Importantly, a rank that is higher up in the hierarchy is treated as better. In equality matching, relationships require equal balancing or one-to-one correspondence, and reciprocity. This type of interaction is found among colleagues and peers who are not particularly close, and it also includes relationships and rules such as those found in competitive sports, regarding for instance equal team size, turn taking, or equal time. The fourth of Fiske's models is market pricing; in relationships of this type, proportionality in social relationships is key, and people interact considering prices, wages, commissions, rents, calculations of efficiency, cost-benefit analysis, and similar concepts that define social value. Such relationships are found among sellers and buyers (Fiske and Haslam 1996). The four basic models of social interaction are typically combined in everyday social exchanges, with a specific model predominating, depending on the aspect of a person's social life.

Finally, there are social interactions characterized by the absence of any relational model. For instance, the theory postulates null interactions, in which the other is treated as if she/he were not a human being or not present (Bolender 2010), and asocial interactions, in which people use others merely as a means to some ulterior end, as found in sociopathy (Fiske 1992). Examples of such interpersonal interactions include interactions between guards and prisoners, or between persons inflicting torture and their victims. A more common example is driving a car on a highway and trying not to collide with other cars, where the other cars are treated merely as objects, completely devoid of human presence, which leads to a null relation between the drivers. In stark contrast to such ways of relating to others is oceanic merging, the sense of "being united in love with everything", which goes beyond fellow humans or living things and extends to "an ecstatic union with the cosmos" (Bolender 2010, p. 109). This model is only rarely implemented in relational cognition, perhaps more frequently by mystics than by other people, although such indiscriminate feelings of empathy can be induced by certain chemicals known as empathogens, such as the psychoactive drug methylenedioxymethamphetamine (MDMA), commonly known as Ecstasy (Bedi et al. 2010).

The relational models theory postulates that the models are tacit, and that they provide different expectations while guiding people in social interactions, helping them to understand social situations (or, put differently, to find meaning in the actions of others) and respond to them (Bolender 2010). This further means that relational qualities and implicit relational schemas cannot be used as explicit strategies in social recall, because they are not directly accessible (Fiske and Haslam 1996).
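For designers of shared virtual environments, the taxonomy lends itself to a simple data structure. The sketch below, whose names and example mappings are invented for this illustration and are not part of Fiske's formal apparatus, shows one hypothetical way to tag interactions in a virtual environment with the four models and the two limiting cases.

```python
from enum import Enum, auto

# Hypothetical encoding of the relational models taxonomy (after Fiske
# 1992), including the limiting cases discussed in the text.

class RelationalModel(Enum):
    COMMUNAL_SHARING = auto()   # members treated as equivalent; group identity
    AUTHORITY_RANKING = auto()  # linear hierarchy; higher rank takes precedence
    EQUALITY_MATCHING = auto()  # one-to-one balance, turn taking, reciprocity
    MARKET_PRICING = auto()     # proportionality; prices, wages, cost-benefit
    NULL = auto()               # other treated as absent (cars on a highway)
    ASOCIAL = auto()            # other used merely as a means to an end

# Invented examples of tagging interactions in a shared virtual environment.
EXAMPLE_TAGS = {
    "guild_members_sharing_loot": RelationalModel.COMMUNAL_SHARING,
    "moderator_muting_a_participant": RelationalModel.AUTHORITY_RANKING,
    "players_alternating_turns": RelationalModel.EQUALITY_MATCHING,
    "trading_virtual_goods": RelationalModel.MARKET_PRICING,
    "avoiding_avatars_in_a_crowd": RelationalModel.NULL,
}

for interaction, model in EXAMPLE_TAGS.items():
    print(f"{interaction}: {model.name}")
```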
An interesting implication of the theory is that difficulties in interpersonal relationships, which traditional psychology addresses in individualistic terms, may benefit more from the relational models approach; it might be worth finding out whether a person has a problem with understanding how certain types of relationships operate and/or with implementing a specific type of relationship. The relational models are thus highly relevant for person perception and social interaction.

How do these models apply to virtual environments? Perceiving and understanding oneself and others in virtual environments requires social cognition. Although the limits of current virtual reality technology constrain participants' experiences of social life in virtual environments, interaction with virtual others should be less like perceiving cars on the highway and trying not to collide with them, i.e. a null relation, and should more readily fit other relational models. Additionally, a type of relation that is highly relevant for virtual reality but is not predicted by the relational models theory (Fiske 1992; Bolender 2010) is relating to inanimate objects in the way one relates to fellow human beings. This type of relation is particularly important in interactions with virtual agents, because these computer-generated characters simulating humans are controlled by the computer. Apart from its bearing on uncanny valley effects, it would be interesting to explore this type of relation in light of the relational models theory and the we-mode approach to social understanding.

3.2 The Social Brain

Does social cognition imply a unique social cognitive module and, if so, does it rely on a specialized brain area or network of areas for neural support? There are arguments in favor of, as well as against, a modular approach to social cognition and the domain-specific view of social cognitive processes. Needless to say, social cognition draws on almost all brain functions, from language and vision to hearing, attention and memory. Given the diversity of the involved processes and the extent of the dispersed brain areas supporting them, it is unlikely that there exists a highly specialized neural system for social cognition. At the same time, social cognition involves some processes that may operate on uniquely social contents, such as self, other, and interpersonal knowledge, which makes the question of the unique contribution of certain brain networks to these processes worth pursuing (Beer and Ochsner 2006; Korman et al. 2015; Wittmann et al. 2016).

More generally, social cognition has been linked to one major distinction between the human brain and the brains of other vertebrates. Namely, the observation that primates have large brains for body size relative to other vertebrates (Jerison 1973) led to the development of the social brain hypothesis. This hypothesis postulates that the significantly bigger and, in terms of energy expenditure,5 more demanding brains
in primates developed due to their complex social lives (Dunbar 2009). Humans, who have the highest social competence, have brains that are about three times bigger than the brains of great apes, their nearest primate relatives. More importantly, unlike in cognitive skills for dealing with the physical world, in certain social cognitive skills, such as social learning, communication, and theory of mind, even children as young as 2.5 years of age outperform chimpanzees and orangutans (Herrmann et al. 2007). The finding that there exists a set of species-specific social cognitive skills in humans that enable participation and knowledge exchange in cultural groups supports the social brain hypothesis as well as the cultural intelligence hypothesis.

5 Compared to the energy consumption of skeletal muscle, brain tissue consumes about 10 times more energy, mostly because of the high energy demands related to replenishing neurotransmitters (Dunbar 2009).

Regardless of which specific hypothesis on the development of the social brain in humans one adopts, the notion that social cognitive abilities depend on specialized brain systems seems to be relatively well accepted (Blakemore and Frith 2004; Pelphrey and Carter 2008). For instance, two large-scale neural networks have been implicated in our understanding of other minds: the mirror neuron system, which comprises the parietal and premotor regions, and the network of areas known as the social brain, which includes the medial prefrontal, temporopolar and temporoparietal cortices and the amygdala (Wilms et al. 2010).

The discovery of mirror neurons, a type of visuomotor neuron first described in the premotor cortex of the monkey brain (Rizzolatti and Craighero 2004), has inspired accounts according to which mirror neurons, and the activity of the motor system during action observation, are the building blocks of our capacity to understand the actions of others. For instance, Ammaniti and Gallese (2014) argue:

We do not necessarily need to metarepresent in propositional format the intentions of others to understand them. Motor goals and intentions are part of the vocabulary being spoken by the motor system. Most of the time we do not explicitly ascribe intentions to others; we simply detect them (p. 14).

While most research to date has focused on the role of the motor system in inferring the goals or intentions of actions, it has recently been shown that there is a strong correlation between an individual's ability to correctly perceive information in an observed action and their execution of that action at the level of kinematics. However, the role of the motor system in action perception and inference, and the role of mirror neurons in action understanding, have been debated. For instance, the claim that we understand an action "because the motor representation of that action is activated in our brain" (Rizzolatti et al. 2001, p. 661) has been taken up by Hickok (2009), who argues that the mirror neuron theory of action understanding fails on several grounds. Furthermore, the role of the mirror neuron system in the understanding of other minds has been criticized as unclear; the role mirror neurons are claimed to play more generally in cognition and behavior has been described as "an overenthusiastic explanatory overextension", resembling the fascination with computer models some fifty years ago, when everything seemed to be explainable in computational terms (Gallagher 2009).
Gallagher (2009) suggests that the activity of the mirror neuron system does not equal an implicit simulation. However, regarding social cognition, it has been claimed that this system may explain how we understand the actions of others via simulation of other people's actions, and even how we grasp other people's minds. Unlike "theory theory", according to which we make inferences about the intentional states of others, such as their beliefs, wishes and desires, by relying on folk psychology, the simulation theory claims that we simulate other minds in our own mind and then project those mental states onto others to explain or predict their behavior. Instead of simulation, Gallagher argues that the mirror neuron system supports intersubjective perception, pointing to evidence from human developmental studies on the ability of infants as young as 9 months to understand the actions of others; this understanding is not the result of their taking an intentional stance, but simply evidence of perceiving the embodied actions of others.

The embodied simulation theory postulates that the basis of intersubjectivity, or of our direct knowledge about others "which comes from within", is intercorporeality. The concept of intercorporeality was developed by the French philosopher Merleau-Ponty, although it originated with Edmund Husserl, the founder of phenomenology, the study of consciousness. Intercorporeality, which is enabled by the body schema, is a bodily connection with others that is preconceptual: it does not require propositional attitudes, while it allows the observer to connect to the sensations, emotions, and actions of others. It is, therefore, more basic than mindreading:

Parallel to the detached third-person sensory description of the observed social stimuli, internal nonpropositional 'representations' in bodily format of the body states associated with actions, emotions, and sensations are evoked in the observer, as if he or she were performing a similar action or experiencing a similar emotion or sensation. (Ammaniti and Gallese 2014, p. 16)

The central concept of the embodied simulation theory is mental state reuse based on representations in a bodily format, instead of the propositional representations associated with theory of mind. Ammaniti and Gallese (2014) emphasize that what makes a mental representation embodied is not its neural instantiation in the brain, but its bodily format, comprising motor, viscero-motor and somatosensory characteristics. Furthermore, the concepts of simulation in the simulation theory of classic theory of mind and in the embodied simulation theory differ in terms of the content on which the process operates: on the one hand, there are symbolic representations as mappings of propositional attitudes (beliefs, desires), and on the other, there are representations in a bodily format. In this theory, the mirror system furnishes the concept of intercorporeality, described also as "a privileged access to the world of the other" (ibid, p. 25), with a neurobiological basis.

This approach to intersubjectivity, which rejects the "solipsistic" perspective of classic cognitive science and much of analytic philosophy, emphasizes that the other is "much more than a different representational system; it becomes a bodily self, like us" (Ammaniti and Gallese 2014, p. 9). There lies a problem for virtual reality: a bodily self in a virtual environment is only a representation of the physical body. Is it then possible to achieve "intentional attunement" with others in virtual reality, that is, to achieve this form of "understanding others from within"? And more generally, given the projected nature of virtual interactants, can mindreading, or a "meeting of minds", or embodied simulation theory explain the nature of self-other dynamics in virtual reality? Research on how we understand other minds in virtual environments has so far relied on the classic approach to mindreading, and findings from a relatively small pool of studies on the topic indicate that attributing minds to nonhuman entities elicits uncanny valley effects (Stein and Ohler 2017). An additional problem is that, in general, we are not very accurate in social information processing, regardless of whether it is self-perception, perception of others, or understanding of social norms (Beer and Ochsner 2006). This may be even more pronounced in virtual reality, given the limited expressiveness of virtual characters, the typically insufficient availability of cues to social contexts, and the general paucity of details on social situations, despite the fact that social situations can affect our behavior in powerful ways. One might argue that, since virtual environments allow participants to locate virtual experiences in corporeal reality (but see Steptoe et al. 2013 for an exception), the embodied simulation approach is perhaps a better way of explaining intersubjectivity in virtual reality than the models based on propositional attitudes. On the other hand, an approach that considers the complementary contributions of mental attribution of representations in bodily format and in propositional format (Gallese and Sinigaglia 2011) might also be optimal in virtual environments.

Further, it is worth noting that cognitive neuroscience distinguishes between internally-focused social processes, which are directed to mental processes in self and others (i.e. self-referential processes and understanding other minds), and externally-focused processes, which are directed to one's own and others' physical and visible features and actions (Lieberman 2007). This distinction has important implications for virtual environments, where the split between internally- and externally-focused processes becomes a polarization between the processes accessible to the immediate and those accessible to the mediated. In terms of the implicated networks, internally-focused processes are supported by the cortical midline structures composing the default mode network, whereas the frontoparietal network supports the representation of the physical, embodied self (self-face, self-voice, self-body), the understanding of other people's actions, and physical mapping between the self and others (Uddin et al. 2007).

Finally, models of human social information processing typically focus on social perception, action observation, and theory of mind as isolated abilities. Attempts to bring these different branches of research together into a unified model began to appear only recently. One such model was built around the finding that social perception, action observation, and theory of mind involve a common area, the posterior superior temporal sulcus, which interconnects functionally with other regions within the respective networks underlying each of these aspects of social information processing (Yang et al. 2015). The proposed role of the posterior superior temporal sulcus in social cognition is temporal integration of the dynamic stimuli from the environment (i.e. integration of audio, visual and somatosensory cues to the behaviors
of others), and temporal predictive coding of that behavior. Here temporal integration should not be confused with time perception as a subjective experience of the duration of events. Rather, temporal integration assumes that incoming information is integrated into a coherent whole so that we can understand and predict how events unfold over time. As the proponents of the integrative model point out, the model is compatible with the dual-system account of implicit and explicit social cognition (Apperly and Butterfill 2009; Apperly 2011). The system of implicit social cognition is efficient in tracking belief-like states, and it corresponds to the systems of social perception and action understanding in the integrative model (Yang et al. 2015). The system of explicit social cognition, which is cognitively demanding and enables reasoning about beliefs, corresponds to the theory of mind system in the integrative model. Thus, social perception and action understanding are relatively effortless, reflexive, and automatic, and as such close to the concept of implicit social cognition, whereas theory of mind is effortful, controlled, and cognitively demanding, and as such close to the concept of explicit social cognition.

In conclusion, despite the disparate views regarding even the basic concepts related to social cognition, the picture that is emerging is that social cognitive processing in virtual reality may require not only additional cognitive resources, but may also engage different mechanisms to process the social world virtually, requiring the support of additional brain areas. The demands of processing social interaction may be higher in virtual environments than in the physical world, because the bodily representation in virtual environments is technology-mediated, i.e. the self is "extended" and often unstable due to presence issues. This instability of the corporeal self, and shifts in the feeling of being in vs. out of the virtual space, could intensify demands for cognitive resources, for example by requiring cognitive reanalysis.
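The notion of temporal predictive coding invoked above can be illustrated with a deliberately minimal toy model: a running prediction of a one-dimensional stream of behavioral cues is corrected by a fraction of each prediction error. The sketch below is illustrative only; the update rule and its parameters are not those of Yang et al. (2015):

    def predictive_trace(signal, learning_rate=0.3):
        """Toy predictive coding over time: maintain a running prediction
        of a one-dimensional stream of cues and update it by a fraction
        of each prediction error."""
        prediction = 0.0
        trace = []
        for observation in signal:
            error = observation - prediction      # prediction error
            prediction += learning_rate * error   # error-driven update
            trace.append((round(prediction, 3), round(error, 3)))
        return trace

    # Errors shrink while the cue stream is stable and spike when the
    # observed behavior changes, i.e. when it violates the prediction.
    print(predictive_trace([1.0, 1.0, 1.0, 2.0, 2.0]))

Even this caricature captures the key property claimed for the posterior superior temporal sulcus: incoming cues are folded into a coherent, continuously updated expectation about how the observed behavior will unfold.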

3.3 Social Interaction in Virtual Environments

Collaborative virtual environments are networked, computer-generated environments in which participants can interact with the environment as well as with each other via avatars. They are typically used for work, in simulation, training, and the treatment of phobias, but also for entertainment and play. While the use of immersive virtual environments for collaborative tasks has been associated with better efficiency and productivity, these environments have also been associated with enhanced negative consequences of their use, such as greater aggressiveness in players of violent video games compared to identical play on a traditional platform (Persky and Blascovich 2006). Thus, immersive virtual environments have the potential to enhance both collaboration and competition among the interactants in virtually shared social situations; striking the right balance between the two is apparently key to relating to others successfully.

The main goals of collaborative virtual environments are to enable efficient and enjoyable collaboration and interaction, and to foster the illusion that the interacting
individuals are in the same space, even though they are at different physical locations (Garau 2006; Ligorio et al. 2008). Immersive virtual environments allow social interaction in a shared, computer-generated 3D context, in which people are represented by avatars, their "digital proxies". Among the main challenges in the development of collaborative virtual environments is the creation of expressive avatars that can sufficiently represent participants' actions and intentions in real time (Bailenson et al. 2001, 2006; Bailenson and Beall 2006; Garau 2006; Vinayagamoorthy et al. 2006; Pan and Steed 2017). The function of avatars in collaborative virtual environments is to help enrich communication and interaction among physically distant participants. Yet that is not always achieved, because of the paucity of expressions typically found in avatars (Vinayagamoorthy et al. 2006). In comparison, consider the richness of the nonverbal cues typically present in everyday interactions, such as eye gaze, facial expressions, gestures, body language, posture, and distance among interlocutors.

Garau (2006) argues that avatars in collaborative virtual environments are currently "extremely limited in their expressive potential" (p. 21), pointing to some major technical limitations regarding the degree of avatar fidelity, both visual and behavioral. Visual fidelity is considered secondary to behavioral fidelity, where behavioral fidelity refers to the extent to which the behavior of avatars and other objects in a virtual environment matches the behavior of what they stand for in the physical world. In cases where prioritizing between high-fidelity motion information and image realism is necessary, the common recommendation is to preserve the former. In addition to technical reasons, researchers have turned to exploring minimal fidelity because they have noticed that social responses arise automatically even when cues are minimal. This is relevant for avatar design, because it indicates that, owing to such automatic social responses, e.g. ascribing sentience to virtual characters, interaction in virtual environments can be based on minimal cues. Exploring the "lower boundaries of fidelity" in experiments with eye gaze and photorealism in humanoid avatars representing people taking part in conversation, Garau (2006) found that the impact of behavioral realism, which is so often emphasized in the literature on avatars, was not independent of appearance; therefore, prioritizing behavioral fidelity over visual fidelity is not necessarily a recipe for the optimal design of expressive avatars in social interaction.

In addition to behavioral fidelity and the visual representation of interactants in an immersive virtual environment, haptic feedback is also an important feature that affects the feeling of social presence. Haptic feedback has been one of the most frequently studied features of immersive virtual environments in relation to joint action as well as the sense of social presence in these environments (Oh et al. 2018). Regardless, haptic feedback appears to be an area that requires much improvement (Slater 2009). Finally, in addition to immersive features which depend on virtual reality technology, other features such as context (e.g. proximity cues) and individual characteristics of participants (e.g. personality) can also affect the feeling of being together with others in virtual environments. This observation has led Oh et al. (2018) to conclude
that it is not recommended to assume that increased social presence is equally desirable in all virtual environments and across all individuals. For example, persons with social anxiety and other individuals who choose social withdrawal over interaction will be less susceptible to social cues and they may prefer less social presence. In such cases, increased social presence would lead to less desirable results.

3.3.1 Relational Models and We-Mode in Virtual Environments

Collaboration in virtual environments using immersive virtual reality systems is complex not only because of the complexities of networked immersive virtual environments, but also because of the intricacies of social processes in dyads or among members of a group. Using immersive projection technology systems, Heldal et al. (2006) studied social interaction in this kind of distributed collaboration, analyzing participants' interaction via the interface technology, their social interaction, and their interaction to complete a task. There were six pairs of participants in the study, connected via two immersive projection technology systems located in different cities. The total time the pairs of participants spent collaborating was 210 min, during which they worked on five tasks: solving a puzzle (a small-scale version of Rubik's cube), learning a landscape and drawing a map afterwards, playing a game (solving a murder mystery), figuring out a proverb from sentence fragments spread across 10 posters, and architectural modeling, which consisted of building a single object using 96 shapes and selecting at least three out of six available colors. Clearly, successfully solving these tasks in a joint effort requires not only a high degree of collaboration, but also a range of cognitive and social skills, including efficient conversation, coordination of actions, awareness of the partner, appropriate social conduct, strategies to reach goals, adjusting movements to each other, proxemic shifts, and so on. The tasks thus engage a range of cognitive abilities, from spatial and language abilities to problem solving, and these were assessed from the perspective of collaboration and joint action in a collaborative virtual environment.

Participants were minimally represented, by simple avatars with a jointed left or right arm; they could not see their own avatars, except for the virtual hand, which was drawn in the position of the actual hand (Heldal et al. 2006). The way participants handled the partner's avatar depended on the phase of collaboration: immediately upon meeting each other, the interactants would discuss the appearance of the partner's avatar, "the colors of their clothes, the sizes of their avatars, the name tags on the top of the other's head and the like", and this was interspersed with seeking real-life information, such as finding out where the partner was based, asking about his/her occupation, or about the current weather in that place. During the problem-solving phase, they treated the avatars only as a point of reference. But at the end, they would treat the partners' avatars as real people, e.g. facing them during conversation, trying to shake each other's hand or to "high five" each other, and waving goodbye.
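Minimal avatars of this kind imply that only a small amount of state needs to be exchanged between the two sites on every frame. The following is a schematic sketch of such an update; the field names and update scheme are illustrative assumptions, not a description of the system actually used in the study:

    from dataclasses import dataclass
    from typing import Callable, Optional, Tuple

    Vec3 = Tuple[float, float, float]

    @dataclass
    class AvatarState:
        """Per-participant state shared across sites on each frame."""
        user_id: str
        head: Vec3  # tracked head position
        hand: Vec3  # the virtual hand is drawn where the real hand is

    def frame_update(local: AvatarState,
                     send: Callable[[AvatarState], None],
                     receive: Callable[[], Optional[AvatarState]]) -> Optional[AvatarState]:
        """Publish local tracker data, then fetch the partner's latest state.
        If nothing arrives (tracker or network dropout), the caller keeps
        rendering the last known pose, so the remote avatar appears frozen."""
        send(local)
        return receive()

The sketch also makes the failure mode discussed below concrete: when tracking or the network stalls, the remote side simply stops receiving updates, and the partner's avatar freezes in its last pose.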


The shift from treating the partner's avatar as an object in the problem-solving phase to treating it as a "real person" afterwards indicates an unstable relation between the partners. Treating the collaborative partner's avatar as an object corresponds to Fiske's (1992) null relation, in which the other is not treated as a human being, or is treated as if it were not present, as suggested by the following observation:

during proper problem solving, they would locomote through each other more often. There were instances for most of the tasks when one went through the other's avatar several times in the course of one minute, or also worked standing inside the other person's avatar for several seconds. In these cases they mentioned the other's avatar only if it disturbed problem solving, for example by blocking each other's view (Heldal et al. 2006, pp. 112–113).

The switch to treating the partner's avatar as a "real person" only in the post-task phase suggests that, during problem solving, attending to the partner as a co-present person was experienced as an obstacle to task performance, which is an unwanted effect in shared virtual environments. The findings of the study further emphasize that, even though the participants were engaged in a collaborative task and were thus expected to cooperate to achieve joint goals, they did not share intentions or adopt a we-perspective, acting instead in the I-mode:

… sometimes they treated the partner's avatar as if it was not there, going through it forward and backward, but at other times they apologized when colliding with it. This observation was true for all tasks and all couples (p. 125).

In terms of the technology limitations affecting social interaction in shared virtual environments, the study suggests that, in addition to ensuring that virtual characters do not intrude on each other's body space and do not collide with the walls of the immersive projection technology systems (as in the study), the interactants should also consider how to act and what options to pursue if communication with the partner is suddenly lost,6 or if longer breaks in the audio connection occur. By analyzing fragments of interaction, Heldal and colleagues have shown that difficulties in interacting with the technology are easily resolved relative to difficulties in social interaction and task solving, further confirming the common view that generating immersive virtual social environments that are realistic enough poses a special challenge to virtual reality design. Overall, these findings indicate a need for more consistency in dealing with avatars, which may lead to an improved sense of presence and better task performance.
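Episodes of the kind quoted above, avatars locomoting through each other, could in principle be quantified directly from the systems' motion logs. A minimal sketch of such a measure, under two simplifying assumptions of mine (positions logged at a fixed rate; each avatar's body approximated by a sphere):

    import math

    def interpenetration_frames(log_a, log_b, body_radius=0.3):
        """Count logged frames in which two avatars' body spheres overlap.
        log_a, log_b: equal-length lists of (x, y, z) positions sampled
        at a fixed rate; body_radius approximates each avatar's body."""
        overlapping = 0
        for pos_a, pos_b in zip(log_a, log_b):
            if math.dist(pos_a, pos_b) < 2 * body_radius:
                overlapping += 1
        return overlapping

A per-task count of this sort would make the inconsistency in dealing with avatars measurable, rather than leaving it to qualitative observation alone.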

6 For example, when the tracker at one of the locations lost synchronization for several seconds, the displayed images froze, and so even though the collaborator's projected hand was stuck in the door and she complained about it, her partner did not notice anything (Heldal et al. 2006, p. 103).


3.3.2 Interacting with Virtual Characters

Immersive virtual reality has been used in the context of exposure therapy for various types of anxiety disorders, such as acrophobia (fear of heights), aerophobia (fear of flying), arachnophobia (fear of spiders), claustrophobia (fear of confined spaces), and agoraphobia (fear of open spaces). Phobias are characterized by extreme fear and panic as a reaction when a phobic person is exposed to, or is anticipating exposure to, the terrifying stimuli. Using virtual environments in phobia treatment is based on the idea that these conditions can be treated by exposing patients to the terrifying stimuli, but in a safe, controlled environment. Inherent to this view is the assumption of the ecological validity of virtual reality therapy, i.e. the transferability of results obtained in a virtual environment to the real world (Beck et al. 2010).

Unlike other types of phobia, social phobia has so far been much less studied using virtual reality. This is mostly due to the excessive computational requirements imposed by generating environments with virtual humans realistic enough to trigger the effects one wants to study. Some aspects of social interaction which were until relatively recently considered to be "beyond the capabilities of today's realtime computer graphics and artificial intelligence", including purposeful interaction through speech recognition, communication of emotions, sentence production, facial expressions or the representation of muscle movements and joints (Sanchez-Vives and Slater 2005, p. 11), are still considered challenging for virtual reality design. However, assuming a certain level of visual and behavioral fidelity of virtual humans, some research suggests that virtual environments can help to tackle social phobia as well.

For instance, an early study that investigated the effectiveness of virtual reality in the treatment of anxiety due to public speaking suggested that virtual therapy had positive results, reducing the level of anxiety in the speakers (Slater et al. 1999). The participants in the study faced an audience of eight avatars, seated in a semicircle in a seminar room. Importantly, the avatars "continuously displayed random autonomous behaviors such as twitches, blinks, and nods designed to foster the illusion of 'life'", and the situation in which some avatars appear "frozen" in inanimate poses was precluded (Slater et al. 1999, p. 2). Eye contact was simulated by allowing the avatars to look at the speaker and move their heads so that they could follow the speaker as he/she moved about the room. The facial animation of the avatars involved six main facial expressions, including yawning and sleeping faces. The avatars could stand up, clap, walk around, and produce sounds of clapping and yawning. Each participant gave a talk in front of a friendly audience, a hostile audience, and an audience that was negative at the beginning but changed to reacting positively as the talk progressed. Despite the limitations related to the small study sample and the lack of verbal feedback, which is typical of real-life attendees of a talk, the authors suggest that virtual characters may be an appropriate audience for human speakers who are being treated for fear of public speaking, because the virtual characters elicited the type of response that the speakers would experience in a real-world setting.
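Computationally, the "random autonomous behaviors" described above amount to a simple stochastic scheduler running for each audience member. A minimal sketch of the idea follows; the behavior set, tick rate and probabilities are illustrative, not the parameters of Slater et al.'s system:

    import random

    IDLE_BEHAVIORS = ["blink", "twitch", "nod", "shift_posture"]

    def audience_step(speaker_position, members, rng=random):
        """One update tick: every avatar keeps gazing at the speaker, and
        occasionally fires a random idle behavior so that no one looks
        'frozen' in an inanimate pose."""
        triggered = []
        for member in members:
            member["gaze_target"] = speaker_position  # heads follow the speaker
            if rng.random() < 0.05:                   # small chance per tick
                triggered.append((member["id"], rng.choice(IDLE_BEHAVIORS)))
        return triggered

    audience = [{"id": i, "gaze_target": None} for i in range(8)]  # eight avatars
    print(audience_step((0.0, 1.7, 2.0), audience))

The design point is that very cheap, low-level randomness suffices to sustain the illusion of "life"; no model of the audience's mental states is required.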


Establishing eye contact with avatars is important, because eye contact influences our perception of other people; it can improve our memory for faces and cooperative behaviors, and increase bodily self-awareness. Some findings indicate that bodily self-awareness in human adults heightens under another person's gaze, and that perceiving a face with a direct gaze may modulate ongoing or subsequent cognitive processes in humans (Baltazar et al. 2014). However, previous findings not involving virtual reality suggest that eye gaze may have differential effects along the animate-inanimate dimension. One would expect the impact of eye contact to be even more pronounced in people suffering from social anxiety and social phobia. But this effect may be lacking when interacting with virtual characters, because the physiological signals that indicate an increase in arousal during eye contact differ for inanimate and animate entities; the effect is there "when a real person is gazing at a participant" (Baltazar et al. 2014, p. 121). Since this effect is not associated with inanimate entities, such as avatars, virtual environments represent a safe space in which to treat fear of public speaking. More research is needed on other possible contributors to this type of excessive fear and on how to attenuate their effects in virtual environments.

A recent study investigated more directly the interaction between participants and virtual humans. The goal of the study was to determine whether virtual reality could be efficient in treating fear of interaction in socially anxious men approached by a person of the opposite sex, in this case a virtual one (Pan et al. 2012). Like Slater et al.'s (1999) study, this study also reports a positive effect of virtual treatment, concluding that personal interaction with virtual characters can help anxious men to overcome a fear of social contact with women. The virtual characters in the study were life-sized; they could make eye contact with the participants and use facial expressions and gestures; they could talk and perform various activities related to the participant. The study was set in the context of a virtual party of five, and the avatar was modeled to be an attractive, forward female, who would gaze towards the participant for a few seconds before approaching him. She would introduce herself and begin a conversation. She would listen carefully, nod and smile, while looking at and leaning towards the participant; she would then enter his personal space at close phase, approaching him to within 0.5 m (Sect. 5.3.1), and suggest meeting up again (Pan et al. 2012). Thus, the virtual character's behavior consisted of speech, body movements, and facial animation. The study reports that the brief interaction of the anxious participants with the avatar was associated with a decrease in their anxiety scores. Despite the delay in the avatar's spoken responses, which were prerecorded so that the experimenter had to choose the most appropriate one from a database to fit each conversation, the anxious men in the study behaved automatically as if the avatar were a real woman. This is in keeping with other findings indicating that even minimal social cues trigger automatic social responses (Garau 2006). Note that this is a reciprocal, one-to-one type of social interaction. On the other hand, unlike their automatic behavior, the participants' emotional responses and the thoughts elicited by the experience were not similar to those that normally arise in such situations in real life.
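The scripted behavior just described reads naturally as a small state machine. The following is a schematic reconstruction: the state names, timings and the conversational distance are my glosses on the description above, and only the 0.5 m close-phase distance comes from the study:

    def next_state(state, seconds_in_state, distance_m):
        """Toy state machine for the scripted approach:
        gazing -> approaching -> conversing -> close_phase -> suggest_meeting."""
        if state == "gazing" and seconds_in_state > 3.0:   # "a few seconds" of gaze
            return "approaching"
        if state == "approaching" and distance_m < 1.2:    # conversational distance (assumed)
            return "conversing"                            # introduce herself, listen, nod, smile
        if state == "conversing" and seconds_in_state > 30.0:
            return "close_phase"                           # move into personal space
        if state == "close_phase" and distance_m <= 0.5:   # close phase, from the study
            return "suggest_meeting"
        return state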
Finally, being observed by other avatars did not have a considerable physiological effect on the anxious men, although, according to the participants' subjective reports, they had experienced some discomfort,
which is consistent with social anxiety. Therefore, the findings of the study indicate a certain influence of direct interaction with a virtual character on the participants. This finding is compatible with the model of social influence which postulates that behavioral fidelity, together with social presence, determines the extent to which a virtual character may influence participants (Blascovich et al. 2002).

Other reports indicate that these are not isolated findings. Despite some limitations, interaction with virtual humans as a controllable and reliable method for the research and treatment of social anxiety has been used in other studies. For instance, a study that investigated the effects of mimicry on socially anxious women found that the participants did not experience mimicry by virtual characters positively (Vrijsen et al. 2010). Typically, interacting with others who mimic a person's behavior, be it tone of voice,7 posture, emotions, facial expressions, or gestures and body movements, is experienced positively by the mimicked person. Unconsciously mimicking others facilitates the interaction and increases the likeability of the mimicker as well as the feeling of affiliation, rapport, interpersonal closeness and social bonding. However, individuals with social anxiety disorders process mimicry by others differently from the typical population, and so they can be expected to experience being mimicked by avatars differently as well. Thus, while the positive effect of mimicry also holds for mimicking avatars in normal circumstances, this is not the case with participants who suffer from social anxiety disorders. More specifically, the participants in the study by Vrijsen et al. (2010) were anxious females, and the avatars that interacted with the participants in a virtual environment were two male characters: one of them mimicked the participant with whom it interacted, and the other mimicked the previous participant. The study did not find a statistically significant difference between the participants' experience of their interaction with the mimicking and the non-mimicking avatar. While this finding is compatible with the notion that socially anxious individuals process mimicry in social interaction differently from the typical population, one issue related to the study design is a possible gender effect. In other words, the question remains whether the findings would have been different had the mimicking avatars been female and had the sample included male participants as well. Regardless, the study supports the notion that interaction with virtual characters may represent a useful tool in addressing social anxiety disorders.

7 For instance, phonetic convergence (or speech alignment/speech accommodation) is a common phenomenon in speech perception, characterized by the perceiver's tendency to imitate certain features of the spoken language produced by others (Rosenblum et al. 2017).
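In mimicry experiments of this kind, the mimicking character is typically driven by replaying the participant's tracked movements back after a short delay, so that the copying is noticeable yet not perceived as instantaneous mirroring. A minimal sketch of that mechanism follows; the delay value and buffering scheme are assumptions of mine, not reported parameters of Vrijsen et al. (2010):

    from collections import deque

    class DelayedMimic:
        """Drive a mimicking avatar by replaying the participant's
        tracked pose after a fixed delay, expressed in frames."""
        def __init__(self, delay_frames=120):  # e.g. ~4 s at 30 fps (assumed)
            self.delay_frames = delay_frames
            self.buffer = deque()

        def step(self, participant_pose):
            """Feed in the current pose; get back the pose to mimic,
            or None while the buffer is still filling."""
            self.buffer.append(participant_pose)
            if len(self.buffer) > self.delay_frames:
                return self.buffer.popleft()
            return None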


One insufficiently researched topic related to social interaction in virtual reality is the role of communication. There are many aspects of language that deserve to be better explored in virtual environments, from issues related to the use of language in complex spatial tasks and navigation to fairly simple ones, such as acknowledging the other's avatar as an equal partner in a joint task (Friedman and Gillies 2005). As an illustration of the need to expand research on language and communication in virtual environments, consider the role of humor in social interaction. Humor allows us to express negative feelings in a positive way, which is critical in social bonding (Wilkins and Eisenbraun 2009). It has positive effects on anxiety states and mood disturbances, so much so that humor and laughter have been used in a variety of treatments. Furthermore, laughter may indicate safety and play behavior (e.g. tickling laughter), and laughing with others strengthens group unity and promotes cooperation, bonding, and affection (Kljajevic 2019). Additionally, laughter may express negative feelings (e.g. contempt), enhance a verbal insult (Otten et al. 2017), or simply convey social status, e.g. dominant, disinhibited laughter in people of higher social status vs. submissive, inhibited laughter in people of lower social status (Oveis et al. 2016). In social anxiety, which is one of the most prevalent psychiatric conditions, fears related to humiliation, criticism and rejection are exaggerated, and so is the fear of being the target of laughter. This fear may even be pathological, as in the type of social phobia known as gelotophobia (Chan 2016). As mentioned at the beginning of this section, designing virtual characters believable enough for relatively realistic social interaction is challenging and computationally costly. Adding figurative language, humor and various types of laughter as tools to nuance meanings in avatars' communication introduces complexities that may push the challenge to the edge of what is currently possible.

3.3.3 Virtual Togetherness

The newcomer, in effect, transforms a solitary individual and himself into a gathering. (Goffman 1956, p. 21)

Some sixty-five years ago, discussing behavior in public places—more specifically, face-to-face interaction and the social norms that regulate the behavior of persons in physical proximity—the sociologist Erving Goffman introduced the concept of co-presence:

Persons must sense that they are close enough to be perceived in whatever they are doing, including their experiencing of others, and close enough to be perceived in this sensing of being perceived. In our walled-in Western society, these conditions are ordinarily expected to obtain throughout the space contained in a room, and to obtain for any and all persons present in the room. (Goffman 1968, p. 17)

In open spaces, as on streets, it is more difficult to determine the exact space within which to consider two or more individuals as co-present. Goffman further wrote that co-presence renders interacting persons "uniquely accessible, available, and subject to one another" (p. 22). Accessibility, availability, and interaction among interactants as equal agents need to be observed in virtual environments as well. Instead, a more relaxed attitude is sometimes found, which permits avatars to disturb each other's bodily space in some way; for instance, virtual characters pass through each other. Avatars that pass through each other are not co-present, since they do not perceive or recognize each other as accessible and available partners. Even though
physicality and appropriate feedback may be limited in virtual environments, co-presence in the above-described sense is still a requirement.

While these insights clearly emphasize the importance of physicality for co-presence, social presence cannot be reduced to physical co-presence. It requires an understanding of the interactive social scene shared by the interactants, their unique contributions as acting individuals and interacting members of a team, interpersonal awareness, as well as their shared or collective we-ness (Searle 1990; Gallotti and Frith 2013) (Sect. 3.1.3). According to an early definition of social presence related to phenomena involving communication mediated by technology (e.g., closed-circuit television), social presence is a subjective quality that depends on "the medium's objective qualities", the perceived "degree of salience of the other person in the interaction and the consequent salience of the interpersonal relationships" (Short et al. 1976, p. 65). Applied to virtual reality, this definition would emphasize the role of immersion, and it would posit the interactants and their interaction on an equal level. Furthermore, as pointed out by Schroeder (2005), social presence in virtual environments is also shaped by an environment's affordances.

In the literature on virtual reality, definitions of social presence often reduce this phenomenon to representational co-presence. For example, according to one such definition, social presence is grounded in the assumption that if avatars as representations of other people reside in a virtual environment, then the environment exists, and furthermore, if one's own presence is acknowledged by others in a virtual environment, then this proves that one exists in that environment (Sadowski and Stanney 2002). Since the purpose of virtual reality is to enable participants to do things and not just to be there, such definitions offer only a partial characterization of social presence in immersive virtual environments, one that neglects the rich, dynamic side of virtual togetherness: social interaction. A more complete characterization of this phenomenon requires more than representational co-localization and the acknowledgment of one's presence by others in the same environment. It requires ways of understanding others, for instance via mindreading or a "meeting of [extended] minds", or, if the view of social understanding as disembodied is abandoned, then via intercorporeality and embodied simulation. In collaborative virtual environments, co-presence requires joint goals, joint actions, and we-intentions or collective intentionality. As illustrated by the example of avatars that pass through each other while performing a joint task in a collaborative immersive virtual environment (where avatars appear to be a nuisance, if not an obstacle, to each other, rather than equal, interacting, accessible and available partners), representational co-localization is not enough for social presence and virtual togetherness.

While the requirement for co-intentionality is naturally imposed in social settings with interacting avatars, digital proxies of humans that are controlled by humans, it is less clear how to reconcile this requirement with virtual settings that involve interactions with virtual agents, which simulate humans but are controlled by a computer system. An intriguing question, evoking some old debates in cognitive science, arises: Can a computer system have intentionality?
Can we speak of a "meeting of minds" and we-intentionality when interacting with virtual agents? It also remains unclear
how to incorporate into virtual character design the cues to social knowledge that are typically triggered in specific social situations in real life.

The concept of shared virtual environments assumes virtual togetherness, that is, the sense of being and acting together with others in the same virtual environment. Virtual togetherness presupposes that human participants have a developed relationship with their own avatars, on which further relationships are built (Durlach and Slater 2000). Importantly, collaborative virtual environments in which participants need to complete a task together require more than representational co-localization and the acknowledgment of one's presence by other virtual characters: they require shared goals, we-intentionality, and joint action.
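Goffman's criterion, being close enough to perceive and to be perceived perceiving, suggests at least a crude operationalization of co-presence for avatars: mutual proximity combined with mutual facing. A toy predicate along these lines (the thresholds and the flat 2D geometry are illustrative assumptions, not an established measure):

    import math

    def co_present(pos_a, yaw_a, pos_b, yaw_b,
                   max_distance=5.0, fov_degrees=120.0):
        """True if the avatars are within range and each falls inside the
        other's horizontal field of view. Positions are (x, z) ground-plane
        coordinates; yaw is in degrees, with 0 pointing along +z."""
        def sees(pos, yaw, target):
            bearing = math.degrees(math.atan2(target[0] - pos[0],
                                              target[1] - pos[1]))
            offset = (bearing - yaw + 180.0) % 360.0 - 180.0  # signed angle
            return abs(offset) <= fov_degrees / 2.0
        close_enough = math.dist(pos_a, pos_b) <= max_distance
        return close_enough and sees(pos_a, yaw_a, pos_b) and sees(pos_b, yaw_b, pos_a)

Of course, such a geometric check captures only the representational side of co-presence; as argued above, the social side, shared goals and we-intentionality, cannot be read off positions and orientations.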

References

Ammaniti, M., Gallese, V.: The birth of intersubjectivity. Psychodynamics, neurobiology and the self. W.W. Norton & Company, New York (2014)
Apperly, I.: Mindreaders. The cognitive basis of "theory of mind". Psychology Press, New York (2011)
Apperly, I.A., Butterfill, S.A.: Do humans have two systems to track beliefs and belief-like states? Psychol. Rev. 116, 953–970 (2009)
Bailenson, J.N., Blascovich, J., Beall, A.C., Loomis, J.M.: Equilibrium theory revisited: mutual gaze and personal space in virtual environments. Presence 10, 583–598 (2001)
Bailenson, J.N., Beall, A.C.: Transformed social interaction: exploring the digital plasticity of avatars. In: Schroeder, R., Axelsson, A.-S. (eds.) Avatars at work and play. Collaboration and interaction in shared virtual environments, pp. 1–16. Springer, Dordrecht (2006)
Bailenson, J.N., Yee, N., Merget, D., Schroeder, R.: The effect of behavioral realism and form realism of real-time avatar faces on verbal disclosure, nonverbal disclosure, emotion recognition, and copresence in dyadic interaction. Presence 15, 359–372 (2006)
Baltazar, M., Hazem, N., Vilarem, E., Beaucousin, V., Picq, J.-L., Conty, L.: Eye contact elicits bodily self-awareness in human adults. Cognition 133, 120–127 (2014)
Beck, L., Wolter, M., Mungard, N.F., Vohn, R., Staedtgen, M., Kuhlen, T., Sturm, W.: Evaluation of spatial processing in virtual reality using functional magnetic resonance imaging (fMRI). Cyberpsychol. Behav. Soc. Netw. 13, 211–215 (2010)
Bedi, G., Hyman, D., de Wit, H.: Is ecstasy an 'empathogen'? Effects of MDMA on prosocial feelings and identification of emotional states in others. Biol. Psychiat. 68, 1134–1140 (2010)
Beer, J.S., Ochsner, K.N.: Social cognition: a multi-level analysis. Brain Res. 1079, 98–105 (2006)
Blakemore, S.-J., Frith, U.: How does the brain deal with the social world? NeuroReport 15, 119–128 (2004)
Blascovich, J., Loomis, J., Beall, A., Swinth, K., Hoyt, C., et al.: Immersive virtual environment technology as a methodological tool for social psychology. Psychol. Inq. 13, 103–124 (2002)
Bolender, J.: The self-organizing social mind. MIT Press, Cambridge, MA (2010)
Bradford, E.E.F., Jentzsch, I., Gomez, J.-C.: From self to social cognition: theory of mind mechanisms and their relation to executive functioning. Cognition 138, 21–34 (2015)
Buber, M.: I and Thou. Charles Scribner's Sons, New York (1923/1970)
Chan, Y.-C.: Neural correlates of deficits in humor appreciation in gelotophobics. Sci. Rep. 6, 34580 (2016)
Dennett, D.C.: The intentional stance. MIT Press, Cambridge (1987)
Doherty, M.: Theory of mind. Psychology Press, New York (2009)
Dunbar, R.I.M.: The social brain hypothesis and its implications for social evolution. Ann. Hum. Biol. 36, 562–572 (2009)


Durlach, N., Slater, M.: Presence in shared virtual environments and virtual togetherness. Presence 9(2), 214–217 (2000)
Fiske, A.P.: The four elementary forms of sociality: framework for a unified theory of social relations. Psychol. Rev. 99, 689–723 (1992)
Fiske, A.P., Haslam, N.: Social cognition is thinking about relationships. Curr. Dir. Psychol. Sci. 5, 143–148 (1996)
Friedman, D., Gillies, M.: Teaching virtual characters how to use body language. In: Panayiotopoulos, T., et al. (eds.) IVA 2005, LNAI 3661, pp. 205–214. Springer-Verlag (2005)
Frith, C., Frith, U.: Theory of mind. Curr. Biol. 15, R644–R645 (2005)
Gallagher, S.: Body schema and intentionality. In: Bermúdez, J.L., Marcel, A., Eilan, N. (eds.) The body and the self, pp. 226–244. MIT Press, Cambridge, MA (1995)
Gallagher, S.: Neural simulation and social cognition. In: Pineda, J.A. (ed.) Mirror neuron systems, pp. 355–371. Humana Press, New York, NY (2009)
Gallese, V.: The acting subject: toward the neural basis of social cognition. In: Metzinger, T. (ed.) Neural correlates of consciousness, pp. 323–333. MIT Press, Cambridge, MA (2000)
Gallese, V., Sinigaglia, C.: How the body in action shapes the self. J. Conscious. Stud. 18, 117–143 (2011)
Gallotti, M., Frith, C.D.: Social cognition in the we-mode. Trends Cogn. Sci. 17, 160–165 (2013)
Garau, M.: Selective fidelity: investigating priorities for the creation of expressive avatars. In: Schroeder, R., Axelsson, A.-S. (eds.) Avatars at work and play. Collaboration and interaction in shared virtual environments, pp. 17–38. Springer, Dordrecht (2006)
Gibson, J.J.: The ecological approach to visual perception. Lawrence Erlbaum Associates, Hillsdale, NJ (1986)
Goffman, E.: The presentation of self in everyday life. University of Edinburgh, Social Sciences Research Centre, Monograph No. 2, Edinburgh (1956)
Gray, K., Wegner, D.M.: Feeling robots and human zombies: mind perception and the uncanny valley. Cognition 125, 125–130 (2012)
Gray, H.M., Gray, K., Wegner, D.M.: Dimensions of mind perception. Science 315, 619 (2007)
Hakli, R., Miller, K., Tuomela, R.: Two kinds of we-reasoning. Econ. Philos. 26, 291–320 (2010)
Hall, E.T.: The hidden dimension. Anchor Books, New York (1969)
Heldal, I., Bråthe, L., Steed, A., Schroeder, R.: Analyzing fragments of collaboration in distributed immersive virtual environments. In: Schroeder, R., Axelsson, A.-S. (eds.) Avatars at work and play. Collaboration and interaction in shared virtual environments, pp. 97–130. Springer, Dordrecht (2006)
Herrmann, E., Call, J., Hernández-Lloreda, M.V., Hare, B., Tomasello, M.: Humans have evolved specialized skills of social cognition: the cultural intelligence hypothesis. Science 317, 1360–1366 (2007)
Hickok, G.: Eight problems for the mirror neuron theory of action understanding in monkeys and humans. J. Cogn. Neurosci. 21, 1229–1243 (2009)
Jerison, H.J.: Evolution of the brain and intelligence. Academic Press, New York (1973)
Kljajevic, V., Ugarte, E., Lopez, C., Balboa, Y.B., Vicente, A.: Inner speech in post-stroke motor aphasia. Cogn. Sci. Soc. Proc. 39, 2432–2437 (2017)
Kljajevic, V.: Neurology of humor. In: Shackelford, T.K., Weekes-Shackelford, V.A. (eds.) Encyclopedia of evolutionary psychological science. Springer Nature (2019). https://doi.org/10.1007/978-3-319-16999-6_3242-1
Korman, J., Voiklis, J., Malle, B.F.: The social life of cognition. Cognition 135, 30–35 (2015)
Lieberman, M.D.: Social cognitive neuroscience: a review of core processes. Annu. Rev. Psychol. 58, 259–289 (2007)
Ligorio, M.B., Cesareni, D., Schwartz, N.: Collaborative virtual environments as means to increase the level of intersubjectivity in a distributed cognition system. J. Res. Technol. Educ. 40(3), 339–357 (2008)
MacDorman, K.F., Entezari, S.: Individual differences predict sensitivity to the uncanny valley. Interact. Stud. 16, 141–172 (2015)


Michael, J.: Interactionism and mindreading. Rev. Philos. Psychol. 2, 559–578 (2011)
Mori, M.: The uncanny valley. Energy 7, 33–35 (1970); reprinted in IEEE Robot. Autom. Mag., June 2012, 98–100
Oh, C.S., Bailenson, J.N., Welch, G.F.: A systematic review of social presence: definitions, antecedents, and implications. Front. Robot. AI 5, 114 (2018)
Otten, M., Mann, L., van Berkum, J.A.A., Jonas, K.J.: No laughing matter: how the presence of laughing witnesses changes the perception of insults. Soc. Neurosci. 12, 182–193 (2017)
Oveis, C., Spectre, A., Smith, P.K., Liu, M.Y., Keltner, D.: Laughter conveys status. J. Exp. Soc. Psychol. 65, 109–115 (2016)
Pan, X., Gillies, M., Barker, C., Clark, D.C., Slater, M.: Socially anxious and confident men interact with a forward virtual woman: an experimental study. PLoS ONE 7(4) (2012)
Pan, Y., Steed, A.: The impact of self-avatars on trust and collaboration in shared virtual environments. PLoS ONE 12 (2017)
Pelphrey, K.A., Carter, E.J.: Brain mechanisms for social perception. Lessons from autism and typical development. Ann. N.Y. Acad. Sci. 1145, 283–299 (2008)
Persky, S., Blascovich, J.: Consequences of playing violent video games in immersive virtual environments. In: Schroeder, R., Axelsson, A.-S. (eds.) Avatars at work and play. Collaboration and interaction in shared virtual environments, pp. 167–186. Springer (2006)
Rizzolatti, G., Craighero, L.: The mirror-neuron system. Annu. Rev. Neurosci. 27, 169–192 (2004)
Rizzolatti, G., Fogassi, L., Gallese, V.: Neurophysiological mechanisms underlying the understanding and imitation of action. Nat. Rev. Neurosci. 2, 661–670 (2001)
Rosenblum, L.D., Dias, J.W., Dorsi, J.: The supramodal brain: implications for auditory perception. J. Cogn. Psychol. 29, 65–87 (2017)
Sadowski, W., Stanney, K.: Presence in virtual environments. In: Stanney, K.M. (ed.) Handbook of virtual environments. Design, implementation, and applications, pp. 791–806. Lawrence Erlbaum Associates, Mahwah, NJ (2002)
Sanchez-Vives, M.V., Slater, M.: From presence to consciousness through virtual reality. Nat. Rev. Neurosci. 6, 332–339 (2005)
Sas, C., O'Hare, G.M.P., Reilly, R.: Presence and task performance: an approach in the light of cognitive style. Cogn. Tech. Work 6, 53–56 (2004)
Schmidt, H.B.: Plural action. Springer, Cham (2008)
Schneider, D., Slaughter, V.P., Dux, P.E.: Current evidence for automatic theory of mind processing in adults. Cognition 162, 27–31 (2017)
Schroeder, R.: Being there together and the future of connected presence. Presence 15, 339–344 (2005)
Searle, J.: Collective intentions and actions. In: Cohen, P.R., Morgan, J., Pollack, M.E. (eds.) Intentions in communication, pp. 401–415. MIT Press, Cambridge, MA (1990)
Short, J., Williams, E., Christie, B.: The social psychology of telecommunications. Wiley, London (1976)
Stein, J.-P., Ohler, P.: Venturing into the uncanny valley of mind—the influence of mind attribution on the acceptance of human-like characters in a virtual reality setting. Cognition 160, 43–50 (2017)
Steinicke, F.: Being really virtual: immersive natives and the future of virtual reality. Springer (2016)
Tinwell, A., Nabi, D.A., Charlton, J.P.: Perception of psychopathy and the uncanny valley in virtual characters. Comput. Hum. Behav. 29, 1617–1625 (2013)
Uddin, L.Q., Iacoboni, M., Lange, C., Keenan, P.: The self and social cognition: the role of cortical midline structures and mirror neurons. Trends Cogn. Sci. 11, 153–157 (2007)
Vinayagamoorthy, V., Gillies, M., Steed, A., Tanguy, E., Pan, X., Loscos, C., et al.: Building expression into virtual characters. Eurographics (2006). https://doi.org/10.2312/egst.20061052
Vrijsen, J.N., Lange, W.-G., Dotsch, R., Wigboldus, D.H.J., Rinck, M.: How do socially anxious women evaluate mimicry? A virtual reality study. Cogn. Emot. 24, 840–847 (2010)
Vygotsky, L.: Thought and language. MIT Press, Cambridge, MA (1986)


Wilkins, J., Eisenbraun, A.J.: Humor theories and the physiological benefits of laughter. Holist. Nurs. Pract. 23, 349–354 (2009)
Wilms, M., Schilbach, L., Pfeiffer, U., Bente, G., Fink, G.R., Vogeley, K.: It's in your eyes—using gaze-contingent stimuli to create truly interactive paradigms for social cognitive and affective neuroscience. Soc. Cogn. Affect. Neurosci. 5, 98–107 (2010)
Wittmann, M.K., Kolling, N., Faber, N.S., Scholl, J., Nelissen, N., Rushworth, M.F.S.: Self-other mergence in the frontal cortex during cooperation and competition. Neuron 91, 482–493 (2016)
Yang, D.Y.-J., Rosenblau, G., Keifer, C., Pelphrey, K.A.: An integrative neural model of social perception, action observation, and theory of mind. Neurosci. Biobehav. Rev. 51, 263–275 (2015)

Chapter 4

Virtual Embodiment and Action

The body becomes the highly polished machine which the ambiguous notion of behavior nearly made us forget (Merleau-Ponty 1958, p. 87).

If we analyze what is at the basis of intersubjectivity, what it is that allows individuals to "see" the social world as meaningful, we discover that agency plays a major role in all these phenomena (Gallese 2000a, p. 330).

4.1 Distinguishing Own Actions from Those of Others

Successful human interaction depends on both verbal and nonverbal communication. As much as it is important for participants in a speech act to extract a message from verbal communication, it is also important that they derive information from nonverbal behavior and predict the actions of other interactants. Another important ability that affects social cognition is the ability to distinguish between self-initiated actions and the actions of other people. In fact, the ability to understand the actions of others and to correctly attribute actions to others is a prerequisite for self-other differentiation (Jeannerod and Anquetil 2008). According to this view, the self-other distinction is accomplished by first "displacing" the self to the location of the other, whose action is being observed, and then by simulating the observed action so that the self can understand what the other is doing. Under normal circumstances, the human motor system efficiently processes the distinction between self-created actions and actions created by others (Howard et al. 2016).

The neural underpinnings of self-other differentiation have been studied using a range of methods, including neuroimaging. For example, an early positron emission tomography (PET) study compared brain activation patterns elicited by a task requiring participants to simulate the action of grasping1 an object from either a first- or third-person perspective. The study reports that the parieto-occipital junction (Brodmann area 19) was specific to the third-person perspective, suggesting that
the activation in this area reflects computation of "the difference in spatial localization between oneself and the other self" and thus represents a cue for the self-other distinction (Jeannerod and Anquetil 2008, p. 365). The inferior parietal lobe, on the other hand, was associated with the representation of the first-person perspective and the self-attribution of one's own actions. This finding is consistent with evidence from studies on neurological disorders indicating an association between lesions in the inferior parietal lobe, mostly in the right hemisphere, and neglect or misattribution of one's own body parts, as found in asomatognosia and somatoparaphrenia.

The self cannot always correctly attribute an action in virtual environments, because the sense of self-agency and self-ownership may be diminished in such environments, which may blur the distinction between the self and the other. Even in non-mediated environments, we may on occasion confuse our own action performance with the performance of others (Wittmann et al. 2016). This confusion appears to be registered in Brodmann area 9, since the same brain area processes information about oneself and others.

When observing an action, we can infer information about that action from its kinematics (i.e. trajectory and velocity), its motor aspect (i.e. the pattern of muscle activity), its goals (e.g. to grasp an object) and its intentions (the overall reason for the action) (Macerollo et al. 2015). These layers appear to be organized hierarchically: the kinematics depends on the motor level, which in turn depends on the goal level, which depends on the intention level. Thus, our inferences about observed actions always depend on the kinematics. This approach is aligned with the notion that during action observation the motor system generates predictions about the kinematics of the observed action (e.g. Friston 2012).

Distinguishing between one's own actions and the actions of others is a critical element of early human development, preceding the development of conceptualization and language competence (Gallese 2000a). For instance, newborns are capable of imitating the movements of others (Berlucchi and Aglioti 1997). This is enabled by an implicit body schema that will eventually develop into the adult body schema, which consists of perceptions and impressions about one's body and how it relates to other bodies, allowing a differentiation between the self and the other. Some evidence suggests that eighteen-month-old children are capable not only of differentiating between intended and actual actions, but also of re-enacting the intended behavior of others. This means that they understand multiple levels of the actions they observe, including the intentions and goals of actions, which posits agency at the core of mentalism (Gallese 2000a) and underlines its role in intersubjectivity (Trevarthen 1999).

Remarkably, social actions emerge even before birth. Studies of twin fetuses that used four-dimensional ultrasonography to investigate social actions at the earliest phases of development have shown that the movements of the fetuses were not only self-directed or directed toward the uterine wall, but were also directed toward the co-twin, with the proportion and duration of the movements directed towards the co-twin increasing from the 14th to the 18th week of gestation (Ammaniti and Gallese 2014). Thus, the aspect of self that Neisser (1988) labeled
the interpersonal self begins developing even before birth (Gallese and Sinigaglia 2011). Taken together, these findings support the view, which will be further discussed in the following sections, that agency plays a major role in subjectivity, intersubjectivity and in making sense of the social world (Gallese 2000a).

4.2 Action Possibilities

The objects which surround my body reflect the possible action of my body on them. (Bergson 1896/1988, p. 21)

How do we perceive other people's actions? One way to approach this question is via the concept of affordances. This concept is relevant to our perception of objects and behaviors, as well as to our own body schema. It was introduced by the ecological psychologist James Jerome Gibson in the late 1970s. Long before Gibson, the philosopher Henri Bergson observed that the objects we perceive reflect possibilities for how we can act on them, and others explicitly claimed that perception involves action. Here is Gibson's (1986) definition of affordances:

The affordances of the environment are what it offers the animal, what it provides or furnishes, either for good or ill. The verb to afford is found in the dictionary, but the noun affordance is not. I have made it up. I mean by it something that refers to both the environment and the animal in a way that no existing term does. It implies the complementarity of the animal and the environment. (Gibson 1986, p. 127)

Unlike the classical approach to perception, according to which perception of objects is based on our ability to discriminate objects' properties and qualities, Gibson's approach emphasizes affordances in the environment. In other words, the combination of qualities unique to an object typically stays in the background and remains unnoticed relative to its affordances: "what the objects afford us is what we normally pay attention to" (Gibson 1986, p. 134). For example, surfaces afford support, air affords breathing, water affords drinking, stairways afford ascent and descent, small objects afford grasping, and so on. Affordances do not change if the observer's needs change; they are always present, always there to be perceived, and when detected they are used without conscious awareness (Gibson et al. 1999). Affordances can be beneficial or harmful (e.g., some ingestible substances afford nutrition, others afford poisoning), or both (the brink of a cliff affords not only walking, but also falling and thus injury).

The concept of affordances is relevant not only to our perception of objects, but also to our perception of behavior: affordances are considered to guide our perception and action. In fact, Gibson (1986) describes the affordances of human behavior as "staggering":

Behavior affords behavior, and the whole subject matter of psychology and of the social sciences can be thought of as an elaboration of this basic fact. Sexual behavior, nurturing behavior, fighting behavior, cooperative behavior, economic behavior, political behavior – all depend on the perceiving of what another person or other persons afford, or sometimes on the misperceiving of it. (p. 135)

The perception of mutual affordances depends on how much and what type of information from the environment is perceived, such as information contained in touch, sound, odor, taste, and light. Thus, affordances indicate action possibilities that depend on object properties in the environment and on the behavior of interacting individuals afforded by a specific situation. Affordances as action possibilities further depend on the possibilities of the body schema. This means that while objects afford actions, the possibilities for action depend on "the possibilities of particular postural models" (Gallagher 1995, p. 234) and the adjustments the body needs to make to obtain the perception (Damasio 1999).

Gallese (2000b) objects that the concept of affordances as defined by Gibson does not completely capture the contribution that an observed action may make to the content of perception. This is because the role Gibson assigned to action is "merely instrumental", equating action and passive movements, whereas only active movements, Gallese argues, "enable us to create representational content". Ferretti (2016) also argues that the direct visual perception view of action possibilities promoted by Gibson implies an anti-representationalist view of affordances, a stance that most neuroscientists and philosophers nowadays find unappealing.

Even if one uncritically accepts the concept of affordances as proposed by Gibson, its applicability to virtual reality is not straightforward. While it is clear that virtual environments may afford actions not afforded by the physical world, the range of affordances in virtual environments is at the same time limited relative to what is afforded by the real world. For example, interacting with a person in close proximity in the real world allows us not only to see and hear them, but also to smell them and sometimes to feel what it is like to touch them (e.g. greeting someone with a handshake or hug), providing at the same time other valuable cues that may guide the interaction (e.g. the strength of the person's hand grip while shaking hands, or the body language indicating that she is uncomfortable with the hug). The scope of such information is much smaller in virtual environments, where multisensory information integration may be further limited by technology issues. So, how are behavioral affordances registered in virtual environments, where participants are represented by virtual characters and the behaviors are not actually acted out by the participants? This remains a challenge for virtual environments, even when virtual humans show a high degree of believability with regard to their appearance and behavior: they are not perceiving, thinking, feeling agents.

According to the traditional view, affordances are static: they are perceived directly, without the mediation of knowledge about the perceived object; the motor system is assumed to respond to them directly, regardless of whether the object is recognized or not. This static view has recently been challenged, based on evidence suggesting that affordances may involve a dynamic component. More specifically, they can be affected by changes in the environment or in the person's perception. Furthermore, affordances have been divided into stable, such as the shape or size of the coffee
mug on my desk, and variable, such as its orientation in space. For example, the mug can be positioned so that its handle points in different directions, or it can be turned upside down. Some affordances are canonical, such as the upright position of houses, and conceptually they are close to stable affordances. Since variable affordances, such as orientation in space, can also lead to stable affordances, it has been proposed that affordances are perhaps best viewed as a continuum (Sakreida et al. 2016).

The human mind appears to process stable and variable affordances differently. The invariant features of an object and other properties related to stable affordances depend on our knowledge about the object and our previous experience with it. Thus, stable affordances engage slow, off-line processing of information based on our knowledge of and previous experience with the particular object. In contrast, variable affordances require adaptation to changes in the properties of an object (e.g. a different orientation in space), which requires fast, online processing of visual information in the actual situation in which the object is encountered (Borghi and Riggio 2015; Sakreida et al. 2016). The two types of affordances are supported by different processing pathways within the same brain network. A recent meta-analysis of published neuroimaging data, involving 44 studies of stable affordances and 27 studies of variable affordances (Sakreida et al. 2016), indicates that the network supporting stable affordances includes the left inferior parietal and frontal cortices, while variable affordances are supported by a network localized more dorsally (Fig. 4.1).

Fig. 4.1 Affordances on the continuum from stable to variable and the processing pathways that support them: the ventro-dorsal pathway (red) supports processing of stable affordances and the dorso-dorsal pathway (green) supports processing of variable affordances (Sakreida et al. 2016). Reproduced with permission from Elsevier
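To make the stable/variable distinction concrete, the following minimal Python sketch separates the two kinds of affordance information in a grasp decision: stable properties are read from a stored object representation (slow, off-line knowledge), while variable properties are sampled from the current scene (fast, online input). All names and values are hypothetical, invented for illustration; the sketch does not reproduce any analysis from Sakreida et al. (2016).

from dataclasses import dataclass

@dataclass
class ObjectKnowledge:          # stable affordances: slow, off-line store
    name: str
    shape: str
    graspable: bool

@dataclass
class SceneSample:              # variable affordances: fast, online input
    handle_orientation_deg: float
    upside_down: bool

def plan_grasp(known: ObjectKnowledge, seen: SceneSample) -> str:
    """Combine stable and variable affordance information into a grasp plan."""
    if not known.graspable:
        return "no grasp afforded"
    if seen.upside_down:
        return f"reorient the {known.name} first"
    return f"grasp the {known.name} handle at {seen.handle_orientation_deg:.0f} degrees"

mug = ObjectKnowledge(name="mug", shape="cylinder", graspable=True)
print(plan_grasp(mug, SceneSample(handle_orientation_deg=135.0, upside_down=False)))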


This approach is firmly grounded in the notion that the visual system has a dual function, supporting not only visual perception or descriptive vision, but also the visual control of action or vision for action (Milner and Goodale 1995). On this view, the concept of visual perception does not denote reflexive phenomena, such as the processing of visual input in the control of pupillary diameter, but rather "a process which allows one to assign meaning and significance to external objects and events" (Milner and Goodale 1995, p. 2). Thus conceptualized, visual perception subserves the recognition and identification of objects and of how they relate in space and time. Crucially, vision for perception and vision for action are supported by different "visual brains", and in normal experience they operate together, seamlessly, to form the visual representation of space.

4.3 Vision for Action

The idea that there exist two distinct visual processing streams was introduced in the 1980s, when Ungerleider and Mishkin (1982) proposed that visual information, which originates in the occipital cortex, is processed along two distinct anatomical pathways. The ventral or "what" stream is critical for the perceptual identification of objects and their visual characteristics (e.g. size, shape, orientation, color). The dorsal or "where" stream is critical for determining the location of objects in visual space. In this original version of the dual-stream model of visual processing, the distinction pertains only to the processing of incoming visual information, or input processing, differentiating between a system for object vision and a system for spatial vision. The two processing streams contribute to a single representation, which, as the end point of the dual-stream processing, serves as a basis for thought and action.

Milner and Goodale's (1995) model of visual processing, on the other hand, focuses less on the input characteristics (e.g. object location vs. intrinsic object features) and more on the output characteristics of the visual processing systems. The model postulates that two distinct visual processing streams have developed to mediate in different ways the visual information about objects and their locations (Milner and Goodale 2008). More specifically, the ventral stream "transforms the visual inputs into perceptual representations that embody the enduring characteristics of objects and their spatial representations" (p. 755). It serves to identify actual as well as possible goal objects and to plan how to deal with them. In contrast, the dorsal stream registers visual metrics for action and transforms them into the employed effector's coordinates; it "mediates the visual control of skilled actions, such as reaching and grasping, directed to objects in the world" (p. 775). Thus, both streams support action, with the ventral stream supporting the planning of action and the dorsal stream supporting the motor programming of the action and the online control of its realization. Milner and Goodale (2008) emphasize that a common misunderstanding of their model is the assumption that the mental representations supporting visual perception serve as a direct basis for action.

Furthermore, the model postulates that visual coding for perception and coding for visuomotor control differ with regard to the frames of reference they use. Visual coding for perception is viewer-independent: the identity of an object is determined regardless of the observer's point of view. The constancies of shape, size, color, lightness and location in human perception allow the enduring characteristics of
objects and their relations to remain stable across different conditions, forming long-term perceptual representations that are used to recognize and identify objects. Unlike visual coding for perception, the visual coding required for visuomotor control depends on the observer. Execution of an action requires a unique set of transformations of the visual array so that each component is executed correctly with regard to the goal object. For example, the action of reaching out and grasping an object requires coordination of movements of the fingers, hands, upper limbs, torso, head and eyes. Such coordination requires complex visual inputs and transformations that differ from those involved in object recognition. Crucially, "to fixate and then reach towards a goal object, it is necessary that the location and motion of that object be specified in egocentric coordinates (that is, coded with respect to the observer)" (Milner and Goodale 1995, p. 41). Additionally, even though the reference system is egocentric, the particular coordinate system will be centered on the retina, head or body depending on the employed effector system (eye, hand or both) (Sect. 5.1.3). Evidence from optic ataxia, an action disorder characterized by normal spatial awareness but a deficit in visually guided movements related to a specific effector, supports the view that in vision for action the egocentric reference system relies on multiple spatial coding systems, each of which controls a specific effector system. The viewer-based coding for the visual control of object-based actions involves instantaneous features that require continuous updating, because they are computed on a moment-to-moment basis from the egocentric coordinates of the object's location in the environment.

According to Milner and Goodale (1995), the key evolutionary force driving the development of two separate processing streams in the primate visual cortex lies precisely in the differences in the transformations of incoming visual information that are required in visual processing. In addition, the refinement of visuomotor control in primates has relevance for the expansion of visual areas in the cerebral cortex; this expansion also reflects the emergence of representational systems relevant for perception and cognition. The phylogenetically ancient dorsal system supporting visually guided actions and the phylogenetically recent ventral system supporting conscious visual awareness differ with regard to the metric information about the size (absolute vs. relative), distance, and geometry of objects, as well as with regard to the frames of reference they rely on and the type of spatial information they use (Briscoe 2009). However, the two streams need to cross-talk. For instance, the ventral stream needs to inform the dorsal stream about which object to act on. While such issues have been addressed to some extent within Milner and Goodale's model, Briscoe (2009) presents evidence suggesting that both conscious visual awareness and vision for action utilize egocentric coordinates. For example, the dorsal stream has access to the stored ventral stream representation after the visual stimulus disappears. While this presents a problem for a model that postulates two separate modules, a model in which both visual streams employ an egocentric frame of reference does not encounter it, because in that case "perception has no problem when it comes to telling action upon which object to act" (Briscoe 2009, p. 443).
Thus, according to this view, visual awareness is not restricted to an egocentric spatial frame, but may inform the
subject about allocentric features of the visual scene, and the ventral stream operates together with the dorsal stream and executive brain areas in supporting visual awareness of egocentric space.

Recent modifications of the dual-stream model of visual processing, developed after Milner and Goodale's influential model, postulate within the dorsal stream a dorso-dorsal and a ventro-dorsal stream. Briefly, the distinction between them is that the dorso-dorsal processing stream supports the online control of action, whereas the ventro-dorsal stream supports object awareness for action (Binkofski and Buxbaum 2013). They operate together with the ventral stream. Considering these modifications, some scholars argue that the dorso-dorsal stream supports fast, non-conscious visual transformations and the feeling of presence, whereas the ventro-dorsal stream supports conscious visual awareness of space, and that the ventral stream supports object identification and classification (Clark 2009). If the dorso-dorsal processing stream supports the feeling of presence and the ventro-dorsal processing stream supports conscious visual awareness, such as awareness of one's actual location, then the model could help to better explain experiences in immersive virtual reality. Importantly, both types of information processing remain within the dorsal stream; the difference is in the type of processes: the automatic, unconscious processes (which enable presence) and conscious visual awareness (e.g. of the actual environment, owing to a shift in visual attention and a break in presence). Regardless, the dual-stream model of visual processing and its recent modifications remain at the heart of current efforts to explain vision. When it comes to immersive virtual reality, the model remains highly relevant because it suggests that vision for action requires that the observed object be specified in egocentric coordinates.

In sum, vision for perception (descriptive vision) differs from vision for action (motion-guiding vision) in that the former refers to experiences involving visual awareness, whereas the latter pertains to fine motor control, which is executed without conscious awareness and for which the relationship between motor output and visual input is critical. Vision for perception involves processing the features of objects, such as color and shape, their structure, and the way and direction in which they move, while vision for action involves processing affordances, for instance, perceiving an object's orientation when reaching for it (Matthen 2010). This type of vision allows us to reach for an object, manipulate objects in near space, locate objects relative to ourselves, and communicate by pointing to objects.

4.3.1 Visual Scene and the Feeling of Presence

An important feature of experiencing a visual scene is the feeling of presence. However, this feeling can only be ascribed to things that the visual system recognizes as real. Thus, the visual system first needs to solve the problem of whether the observed object or event is real (Matthen 2010). A real-life visual scene conveys a certain content to the perceiver, i.e. something descriptive about his surroundings.

This content is conveyed directly, without any need for inference, by means of visual qualia (from qualities: the felt qualities of experiences, what it is like to have an experience, e.g. the feeling of pain or the hearing of a sound; Blackburn 1994) or vision's own medium of expression:

In vision we experience a scene consisting of discrete objects and their properties, including location. In addition, we see a great variety of overlapping objects: not only material objects, but shadows, patches of light, films, stains, vapours, three-dimensional regions of illumination or darkness, reflections, and more. As far as visual properties are concerned, we directly see shapes, motion, and faces… we see visual objects in three-dimensional locations, and … where there are no objects to see, we do not see anything – we do not see unfilled visual field places. (Matthen 2010, p. 14)

Matthen observes an important phenomenological difference between seeing an object in real life and seeing it in a picture: in his terms, real-life object seeing is "actuality committing", but this is not the case with pictorial seeing. Crucially, there is a spatial connection between the viewing subject and real-life objects that is not present in pictorial vision. While the picture itself is a real-life object, the objects depicted in the picture are not. For example, looking at a picture of people at a party, we may recognize their faces and the depicted location, and remember the occasion, but the picture itself does not tell us where they actually are. This is, Matthen claims, because we are spatially disconnected from the depicted people. In contrast, the same scene in the actual world tells us immediately that they are there, in the same space with us, meaning we know where they are relative to us and can assign them egocentric coordinates. Some pictures may deceptively convey a sense of spatial connection with real-life objects, with the objects in them appearing as if they were in the same space as the viewing subject.

We extend this concept to virtual environments, suggesting that the same phenomenological difference that holds between real-life seeing and pictorial seeing may apply to technology-mediated virtual environments. Experiencing a visual scene in actual life contains a feeling of spatial connection between the viewing subject and the viewed objects. The spatial unity between the viewing subject and the viewed object implies that the self and the environment are inseparable, as also postulated by Gibson (1986), among others. The notions of extended self and virtual embodiment in immersive virtual environments in principle allow an extension of the spatial unity between the participant and the objects viewed in the virtual space, where the participant's representation, that is, his avatar, "acts" as a subject in that environment. However, this connection may be lost in unstable environments. That this is not simply a fidelity issue is indicated by evidence suggesting that increased visual realism in virtual environments does not necessarily induce the feeling of presence (Garau 2006). Matthen's concept of the duality of seeing associated with pictorial vision (i.e. we see the picture itself and a depicted scene) is relevant for virtual reality as well, where, on the one hand, there is a computer-generated space in which the participant is supposed to act and, on the other, a displayed 3D scene with objects in it. This concept is similar to shifts in visual attention, which make the participant feel present in a virtual space vs. in the physical world
(e.g. due to a break in presence). In real-life seeing, both vision for perception and vision for action operate together, and objects can be assigned both egocentric and allocentric coordinates. However, we cannot interact with depicted objects; they cannot be located relative to the perceiver and thus cannot be assigned egocentric coordinates. As Matthen puts it, "vision represents depicted objects as not really present" (p. 24). Therefore, as much as the spatial connection between the viewing subject and the viewed object is important for the feeling of presence, so is the viewer's ability to locate objects relative to his/her own body. Viewing an object in the real world allows the assignment of egocentric coordinates to the seen objects (i.e. relative to the perceiver's body), which is important for object manipulation, as well as the assignment of allocentric coordinates (i.e. independent of the perceiver's body). Locating an object in a virtual environment relative to the perceiver's own body requires consideration of the object's location both in egocentric coordinates (i.e. relative to the participant's physical body) and in quasi-egocentric coordinates (i.e. relative to the position of the avatar that the participant assumes; Briscoe (2009) uses this term in his discussion of the perception of depicted space). In an actual visual scene, we may choose how to locate an object, and choosing egocentric coordinates (the book is in front of me) over allocentric coordinates (the book is next to the newspapers) may be of no importance. In virtual environments, where so much depends on object manipulation, motion-guiding vision and thus the ability to locate objects in egocentric coordinates are crucial. Yet this ability may be diminished in unstable virtual environments, where the feeling of spatial unity between the participant and the observed virtual objects may also be lacking. As pointed out by Ferretti (2016), among others, the dorsal visual stream does not differentiate between real and depicted objects, but it is relevant for response selection because it allows us "to detect the action afforded by an object, or in the case of pictures, to understand that there is no possible interaction" (p. 190). Interacting with virtual objects via an avatar allows participants to act on virtual objects' affordances; however, if the illusion of an extended space in which one acts and the illusion of one's own corporeality extending into that space are missing, the observed possibilities for action will be lost.
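The difference between allocentric, egocentric, and quasi-egocentric coordinates can be made concrete with a small worked example. The Python sketch below expresses one and the same allocentric (world-frame) position in egocentric coordinates computed relative to either the participant's physical body or the avatar's pose in the virtual scene. The poses, the yaw-only rotation, and the function name are illustrative assumptions, not a description of any particular VR system.

import numpy as np

def to_egocentric(world_point, observer_pos, observer_yaw):
    """Express an allocentric (world-frame) point in egocentric coordinates:
    translate to the observer's origin, then rotate by the inverse of the
    observer's heading (rotation about the vertical z axis only)."""
    p = np.asarray(world_point, dtype=float) - np.asarray(observer_pos, dtype=float)
    c, s = np.cos(-observer_yaw), np.sin(-observer_yaw)
    rot = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]])
    return rot @ p

book = [2.0, 3.0, 1.0]   # allocentric description: the book's place in the room

# Egocentric coordinates, relative to the participant's physical body.
print(to_egocentric(book, observer_pos=[0.0, 0.0, 0.0], observer_yaw=0.0))

# Quasi-egocentric coordinates, relative to the avatar's pose that the
# participant adopts as his/her viewpoint in the virtual scene.
print(to_egocentric(book, observer_pos=[1.0, 1.0, 0.0], observer_yaw=np.pi / 2))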

4.4 Motor Cognition

An important feature that distinguishes a movement from a motor act is the presence of a goal: a movement is a motor act if a goal is associated with it (Gallese 2000a; Ammaniti and Gallese 2014). Different motor acts are supported by different classes of neurons, sometimes referred to as a motor vocabulary (Gentilucci and Rizzolatti 1990; Rizzolatti et al. 1988). For instance, neurophysiological studies of motor function in the monkey have revealed four classes of
neurons as relevant for specific motor acts, known as grasping, holding, tearing and manipulation neurons. Two classes of grasping-related neurons located in area F5 of the ventral premotor cortex in the macaque brain are the canonical neurons and the mirror neurons.

Canonical neurons respond to the visual presentation of 3D objects in the absence of movement, where objects appear to be represented in relational terms. They contribute to the visuomotor transformation for grasping, but they do not seem to be relevant to the identification of objects (Nelissen et al. 2005). That is, an object is identified and differentiated not in terms of mere physical appearance, but in terms of the effect of interaction with the acting agent (Gallese 2000a). This brings the perceptual and the motor systems closer, allowing for the postulation of a shared code. In other words, canonical neurons code visually perceived objects according to their affordances, supporting grasping actions under visual guidance.

Another class of grasping-related premotor neurons are the mirror neurons. Although mirror neurons have the same motor properties as canonical neurons, they have different visual properties. Mirror neurons become activated by either the execution or the observation of an action. Apparently, it is the action, not the observed object or agent, that activates these neurons. Crucially, the neural pattern in the observer matches the neural pattern elicited in the agent. A subclass of these neurons discharges not only when observing a noisy act, such as breaking a peanut, but also when only the sound is present and the action itself is not visually observed. Thus, according to this view, mirror neurons are the neural basis of an action observation/execution matching system (Gallese 2000a), and as such they are considered to contribute to action understanding, to making inferences about others' motor intentions, and to predictions about the actions of others (Umilta et al. 2001; Oosterhof et al. 2012). The brain region that houses the two types of neurons, F5, is a motor area capable of supporting sensory–motor associations (Rizzolatti et al. 2001).

Neuroimaging studies involving human subjects have revealed increased activation in ventral premotor cortex and anterior parietal cortex when subjects observe and execute actions. Remarkably, the brain regions supporting the mirror mechanism in humans are activated even when the observed hand actions are performed by a robotic arm, and also in people who cannot perform hand grasping due to congenital deficiencies of the upper limbs (Ammaniti and Gallese 2014). These findings have been interpreted as evidence that these areas house the human homologue of the mirror system found in area F5 and parietal areas of the macaque brain. More generally, the mirror mechanism is a neural mechanism for directly mapping observed behaviors (e.g. motor actions) onto the observer's behavior (Gallese and Sinigaglia 2011). We mirror not only the motor goals of others, but also forms of their behavior (Ammaniti and Gallese 2014). For example, empathy is a mirroring mechanism that involves our capacity to share emotions and sensations with others. Neuroimaging studies suggest that perceiving emotions and sensations experienced by others activates some of the brain areas displaying mirror activation, which suggests that "there is a we-centric dimension" in our experience of the affective states of others (Ammaniti and Gallese 2014, p. 15), consistent with the embodied simulation theory (Sect. 3.2). Observing someone's facial expression of
disgust partially activates the portion of the insula that is typically activated by the first-person experience of disgust. Additionally, our facial muscles display a congruent expression to some extent, depending on our empathic capacity (Jeannerod and Jacob 2005; Gallese and Sinigaglia 2011), but merely imitating the expression of an emotion observed in the other does not necessarily induce the experience of that emotion in the observer. Overall, the human mirror system has been implicated in various aspects of social cognition, from action understanding, imitation and theory of mind, to language acquisition and empathy (Rizzolatti and Craighero 2004).

But how does the brain that is mirroring an action performed by someone else know when to attribute a movement to the other and not to the self? Research on cross-modal action coding has largely focused on actions seen as if performed from the first-person perspective. Some evidence indicates that cross-modal visuomotor coding occurs in ventral premotor cortex when actions are seen from the first-person perspective (Kilner et al. 2007, 2009). Recent evidence corroborates the role of the ventral premotor cortex in action-specific cross-modal visuomotor representations only for the first-person perspective, additionally revealing a role of posterior areas in parietal and occipital cortex in cross-modal coding regardless of perspective. These findings indicate that human understanding of actions performed by others involves a more distributed network, implying dissociable neural substrates for actions observed from the first- and third-person perspectives (Oosterhof et al. 2012).

The differentiation of the neural substrates for actions performed by an individual and actions that he/she observes others perform is compatible with the distinction between the first-person and third-person perspectives at the phenomenological level. For instance, a first-person perspective is relevant for self-consciousness. When we observe someone else's performance of an action, the processes involved in observing the execution of our own action are not present, such as the presence of a planned goal before any movement begins, perceived ownership of the effector used to execute the action, coordination of the muscles of the involved effector, and involvement of the visuomotor and proprioceptive neural feedback mechanisms (Oosterhof et al. 2012). Perceiving actions performed by others is typically an experience from a third-person perspective and is associated with mechanisms involved in interpreting the behaviors of others, such as theory of mind (Sect. 3.1.1). Patients with schizophrenia who suffer from delusions of control have a faulty internal representation of action (Blakemore and Frith 2003), with some evidence suggesting that their confusion between self and other is due to an abnormal sensory prediction (Blakemore et al. 2000).

The mirror neuron theory of action understanding continues a long line of motor theories of cognition. Such theories go back to the eighteenth century, when Berkeley (1709) proposed a motor interpretation of depth perception. Motor theories of cognition were dominant in the early twentieth century (e.g. Washburn's motor theory of mental imagery), but their influence diminished throughout the twentieth century, with the exception of Liberman et al.'s (1967; Liberman and Mattingly 1985) very influential motor theory of speech perception.
The remarkable finding that the perceptual and motor systems share a common code has been recognized not only as a major contribution to our knowledge of
the primate cerebral cortex, but has also motivated research on the human intention to act and on goal-oriented movements. However, the mirror neuron theory of action understanding has been controversial. Some researchers have questioned the role of mirror neurons in action understanding, proposing instead that action understanding is the cause, not the consequence, of the discharge of mirror neurons (Brass et al. 2007). Nonetheless, a rich body of evidence suggests that mirror neurons play a role in predicting why an individual is performing an action, and thus they are relevant for social interaction (Ferrari et al. 2015). In addition to understanding other people's actions and intentions, observing their action execution may provide insights into the affective and communicative relations between the agent and the action recipient (Di Cesare et al. 2015).

4.4.1 Hierarchies of the Action System

The concept of a hierarchical structure of the action system posits actions in a motor hierarchy from kinematics to goals and intentions (Macerollo et al. 2015). The hierarchy is applicable to both action execution and action observation. On this view, the visual properties of actions are supported by posterior occipitotemporal areas, action goals and intentions by parietal areas, and motor control by frontal areas (Oosterhof et al. 2012).

Researchers also differentiate between the hierarchical structure of an action that is observable in behavior and the hierarchical structure of the neural processes that support the action (Uithol et al. 2012). While the literature on motor control reserves the term action hierarchy for both, a distinction has recently been made between the action hierarchy and the control hierarchy, where the action hierarchy contains subactions and sub-subactions, whereas the control hierarchy consists of the neural processes that underlie or control the action (Uithol et al. 2012). Usually, the two hierarchies are assumed to match (e.g. Botvinick 2008), but this notion has been challenged based on the observation that the action system does not necessarily need to be hierarchical, even when a task is represented hierarchically (Badre 2008). Furthermore, the two hierarchies differ with regard to their structuring principles. The action hierarchy is based on a part-whole structure, meaning that the elements relate as constituents, comprising units at different levels of the hierarchy (action–subactions–sub-subactions), where the lower levels of the hierarchy provide a more detailed description. The control hierarchy is structured within a causal framework, meaning that the elements relate in a causal way (Uithol et al. 2012). Among the important distinctions between the two hierarchies is that positing causal relations among the elements of a hierarchy (e.g. goals can cause an action) can accommodate affordances, although both hierarchies are limited, and so neither is considered a good candidate for a theory of action representation.

Alternative accounts have been proposed around another structuring principle for a hierarchy, namely temporal extension. These accounts are based on temporal ordering and the use of different time scales for different control processes (Uithol
et al. 2012). Hierarchically higher elements are represented longer, are more stable, and exert more influence over hierarchically lower elements. For instance, goals are represented longer than movements, which explains our ability to organize behavior around goals. Unlike the causal hierarchy, which assumes a direct causal influence (i.e. from goals to actions) and thus posits a directionality of the relations between the constituent elements, a control hierarchy based on temporal extension assumes simultaneous influences of action features at multiple levels (see the sketch below). Implicit in this approach is the notion that the action hierarchy and the control hierarchy need not match, as assumed in the classical cognitive science theories of action hierarchy (e.g. Botvinick 2008).
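As a rough illustration of a temporally extended control hierarchy, the Python sketch below gives each level its own time scale, so that higher levels (intentions, goals) persist longer than, and thereby constrain, lower ones (motor patterns, kinematics). The class names and time-scale values are invented for the example and are not drawn from Uithol et al. (2012).

from dataclasses import dataclass, field
from typing import List

@dataclass
class ControlLevel:
    name: str            # e.g. "intention", "goal", "motor", "kinematics"
    time_scale_s: float  # how long a representation at this level persists
    content: str

@dataclass
class ActionRepresentation:
    levels: List[ControlLevel] = field(default_factory=list)

    def active_levels(self, t: float) -> List[str]:
        """Levels still active t seconds after action onset; higher levels
        outlast lower ones rather than causally preceding them."""
        return [lv.name for lv in self.levels if t < lv.time_scale_s]

reach = ActionRepresentation([
    ControlLevel("intention",  60.0, "have a drink"),
    ControlLevel("goal",       10.0, "grasp the mug"),
    ControlLevel("motor",       1.0, "hand-shaping muscle pattern"),
    ControlLevel("kinematics",  0.2, "current trajectory and velocity"),
])

print(reach.active_levels(0.1))   # all four levels are active simultaneously
print(reach.active_levels(5.0))   # only the goal and the intention remain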

4.4.2 Intentional Action

Intentional behavior, be it self-initiated movement or, say, speech production, is not a unitary phenomenon. According to one model, intentional actions consist of three components: deciding what to do (the what component), when to act (the when component), and whether to perform the action or to inhibit it (the whether component) (Brass and Haggard 2008). This model, known as the what-when-whether model, has predecessors in the model of willed action developed by William James in the nineteenth century and in some more recent models. James's (1890) concept of willed action is based on a distinction between ideo-motor and willed acts, where ideo-motor acts are automatic and effortless movements, while willed actions require attention, conscious effort, and a sort of expressed consent. In addition to attention and conscious awareness, as defined by James, choice, control and intentionality are further characteristics of willed actions (Jahanshahi and Frith 1998).

While the what-when-whether model has attracted much attention among researchers, the evidence on the neural basis of intentional action is disparate, suggesting the involvement of different networks in the three components of intentional action. (We will not review the various proposals regarding the specific brain regions and networks supporting each of the components, because the disparate findings may be due to methodological issues; we refer the reader to the original studies, e.g. Jahanshahi and Frith 1998; Brass and Haggard 2008; Hoffstaedter et al. 2013; Zapparoli et al. 2017, among others.) Given the disparate findings regarding the involved neural circuits (e.g. Brass and Haggard 2008; Zapparoli et al. 2017), a precise demarcation of the involved brain regions remains a task for future research. Among the questions relevant for virtual environments are: Given that avatars are not cognitive agents, how is the intention to act mediated in immersive virtual environments? Which cognitive processes support this mediation, and does a participant's intention to act in a virtual environment involve the same neural underpinnings as her intention to act in the real world?

Any parsimonious model seeks to explain as many data as possible with fewer and fewer hypotheses (Gauch 2012). The what-when-whether model of intentional
action explains a range of data, from data on how intentional actions are processed in the healthy brain to data on the aberrations of intentional behavior found in neurological conditions. For instance, the disorder affecting patients' ability to control the intentional actions of one hand, known as the anarchic hand syndrome, might be explained in terms of the what component, i.e. as an inability to intentionally select the appropriate action; one of the defining features of Parkinson's disease, the difficulty in initiating movements, may be explained in terms of the when component; and disorders such as obsessive-compulsive behavior, Tourette's syndrome, and attention deficit hyperactivity disorder may be explained in terms of the whether component, i.e. as an inability to intentionally inhibit actions (Jahanshahi and Frith 1998; Brass and Haggard 2008).

The what-when-whether model of intentional action may be applicable to actions in virtual environments, if we keep in mind that these actions are projected, and that the intention to act, although projected into a virtual space, remains in the domain of the actual cognitive agent, not of his representation in the virtual space. Put differently, since avatars are not cognitive agents, the participant's intention to act, her decision when to act, and her decision whether to perform or inhibit the action may be compromised by technology-related issues. For instance, an intention to perform a joint action in a virtual space may be impossible to realize if avatars pass through each other. If such movements are not precluded or inhibited, it will be difficult to recognize the other as an equal, collaborating partner. Similarly, the decision on when to act in the virtual environment may be compromised by lags and delays in the system, which may give others a wrong impression of the participant's decision on whether to act.
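A toy sketch can make the point about latency concrete. Assuming a simple representation of an intentional action as a what-when-whether triple, the Python code below shows how a system delay shifts the onset at which the action becomes visible to others, so that observers may misread the agent's when (or even whether) decision. The data structure, function, and latency value are hypothetical.

from dataclasses import dataclass
from typing import Optional

@dataclass
class IntentionalAction:
    what: str        # the selected action, e.g. "wave"
    when_s: float    # intended onset, in seconds from now
    whether: bool    # final decision to execute or to inhibit

def rendered_onset(action: IntentionalAction, system_latency_s: float) -> Optional[float]:
    """Onset at which the action becomes visible to other participants in
    the shared virtual environment; None if the action was inhibited."""
    if not action.whether:
        return None                      # inhibited actions never reach the display
    return action.when_s + system_latency_s

wave = IntentionalAction(what="wave", when_s=0.0, whether=True)
print(rendered_onset(wave, system_latency_s=0.25))   # appears 0.25 s late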

4.4.3 Action Representation

There is currently no consensus regarding the nature of action representation. The ongoing debate pertains to questions such as whether motor representations are supported by the dorsal stream, by the ventral stream, or by both; whether motor representations are consciously accessible or not; whether they represent only action properties, or action properties and goals plus bodily movements, or only action goals; and whether motor representations constitute a single representational mechanism or a multi-componential one (Ferretti 2016).

Among the issues surrounding the notion of motor representation, the role of the dorsal processing stream deserves careful consideration. As discussed in previous sections, the dual-stream model remains the predominant model linking visual perception and action in humans and other mammals. It postulates that the dorsal stream supports the processing of visually guided actions, while the ventral stream supports the processing of visual perception. The original version of the dual-stream model differs with regard to the role of the dorsal stream described above: Ungerleider and Mishkin's (1982) model assumes that the main function of the primate visual system is visual perception, and therefore that the two streams support perceptual awareness, whereas in Milner and Goodale's (1995) model the two streams
go beyond visual perception and include vision for action. The critical concept in the latter model is that of visuomotor transformation, i.e. the automatic transformation of visual information into motor commands. Furthermore, Jeannerod and Jacob (2005) have introduced the terms semantic and pragmatic processing for vision for perception and vision for action, respectively, arguing that Milner and Goodale's model underestimated the complexity of the action representations created in the course of pragmatic processing. Further refinements of the model postulate two distinct sub-streams within the dorsal stream, the dorso-dorsal and the ventro-dorsal stream, ascribing them different functionality (Sect. 4.3). Thus, the role of the dorsal pathway in vision has been viewed differently by different scholars (e.g. Briscoe 2009; Matthen 2010) and has been debated from the early days of the model, with the debate still continuing (Ferretti 2016).

Additionally, there are important differences between visual representations of objects and visual representations of actions pertaining to representation in space. The spatial position of an object can be represented in different coordinates (Sect. 5.1.3), that is, in coordinates relative to the participant's body (egocentric coordinates) or relative to another object's position in space (allocentric coordinates); moreover, different coordinate systems within the egocentric perspective can be assigned to the object, depending on the effector. Vision for action requires that the spatial position of an object be represented in egocentric coordinates, whereas full perceptual awareness of the other visual features of the object requires allocentric coordinates (Jeannerod and Jacob 2005). Importantly, the parietal lobe, which supports the dorsal stream, has been functionally subdivided, with the superior parietal lobe supporting visuomotor processing, the right inferior parietal lobe supporting the perception of spatial relations, and the left inferior parietal lobe supporting representations of visually guided actions. The parietal lobe is also implicated in the generation of motor images, and it may also store the motor representations that support motor imagery (Blakemore and Frith 2003).

Research on motor imagery has emphasized the notion of action simulation (Jeannerod 2003). Namely, in order to perform an action, such as voluntarily touching an object, one needs to move. Performing an action implies that the initiated movement is goal-oriented and that every overt action follows from a covert stage of that action. Jeannerod (2003) suggests that covert actions, like motor images, are also actions, only not executed, arguing that the stage immediately preceding action execution is identical, in mental and neural terms, to action simulation. Another form of covert action that Jeannerod (2003) explains in terms of simulation is the observation of actions executed by others. Since the brain areas activated by self-generated (overt and covert) actions and the areas activated by the observation of actions executed by others partially overlap, implicating for instance motor cortex and premotor cortex, the representations are shared. According to this view, when we observe the behavior of others, our understanding of the observed behavior begins only when the identity between action observation and action execution is established by means of a shared motor representation. The idea that different aspects of action—from the intention to act, to action execution, to action observation—all share the same system, i.e.
the motor system (Jeannerod 1994), has implications for virtual embodiment.


Bodily motor possibilities affect the sense of bodily awareness. Researchers generally agree that our awareness of our own body differs from our awareness of objects in the environment and that our bodily awareness shapes our sense of self-awareness. Often, bodily awareness is defined in terms of proprioceptive awareness. Proprioception provides us with knowledge about our body. Various systems are in charge of providing information about one's own body, such as the sense of movement of body parts relative to the body and the external world. Gallese and Sinigaglia (2011) question the view that our sense of bodily self is primarily determined by proprioception, arguing instead for a view based on more intertwined roles of perception and action.

Philosophers since Aristotle have argued that perception and action are more closely related than appears to be the case (Schellenberg 2007). Gibson's (1986) affordances and Dreyfus's (1992) know-how, influenced by the work of Merleau-Ponty and Heidegger, reflect the idea that the body plays an important role in perception and that perception reflects adaptive action. The sharp distinction between perception and action has also been denied by other scholars (e.g. Neisser 1988), who argue that what one can do shapes what one perceives, and the other way around (Jeannerod 2003). The view that perception might be affected by the perceived possibilities for action, even in the absence of any movement, has been generating much interest (Gallese and Sinigaglia 2011). This view has implications for experiencing ourselves as bodily selves, because when it comes to action possibilities we experience our own body as a body that can perform the action whose potentiality has been detected in the environment. This is what Gallese and Sinigaglia (2011) call bodily power for action, claiming that proprioceptive awareness is insufficient for the sense of bodily awareness, although it can provide helpful updates to it, and that bodily motor possibilities for action determine primary bodily self-awareness. Thus, according to this view, "the motor roots" of our bodily self-awareness and the mirror mechanism that permits the matching of action possibilities between our bodily self and the bodily selves of others are key not only to disentangling one's own self from others, but also to connecting with others in a meaningful way.

The motor possibilities of a virtual body that differs from the normal human body (e.g. one with a tail, or a third arm) may involve additional movements (e.g. tail movement, third-arm functionality) that may not be associated with the motor roots of bodily self-awareness, because such movements are not felt. Thus, the possibilities for action afforded by such virtual bodies are not registered by bodily self-awareness in the same way as the movements afforded by the normal human body.

4.5 Joint Action

Differentiating between self-initiated movements and movements generated by others is related to the process of action monitoring. Cognitively normal individuals typically attribute the role of agent to the self when they execute an action and to
the other when they observe someone else executing an action. The ability to understand the actions of others is critical for everyday social interactions (Salomon et al. 2013). Another important aspect of social interaction is joint action, which occurs when two or more individuals act together to achieve a shared goal. According to another definition, joint action is "any form of social interaction whereby two or more individuals coordinate their actions in space and time to bring about a change in the environment" (Michael et al. 2016, p. 106). Humans appear to like coordinating their actions with others (Tomasello 2009). Participating in a joint action, and even merely observing interpersonal coordination, enhances rapport and trust, willingness to cooperate and, more generally, pro-social behavior (Michael et al. 2016; Miles et al. 2009). Some researchers argue that synchronous movement correlates highly with rapport (Lakens and Stel 2011) and that the temporal organization of action via synchrony is key to successful social exchange (Miles et al. 2009; Valdesolo et al. 2010).

4.5.1 Synchronous Actions: Muscular Bonding and Beyond

Rituals, be they animal behavioral patterns, psychopathological compulsions, or cultural rituals, are performed in search of order, stability, and predictability of the environment (Tonna et al. 2019). It has long been recognized that synchronous cultural rituals are beneficial for improving cooperation within groups. Intentionally or unintentionally, "armies, churches, and communities have all benefited from cultural practices that draw on 'muscular bonding' or physical synchrony, to solidify ties between members" (Wiltermuth and Heath 2009, p. 1). Synchronous rituals produce positive emotions, increase a group's cohesion, attenuate self-other differences, and increase the sense of individual well-being.

The highest sense of belonging to a group is represented by oceanic merging (Bolender 2010). It is the scale that the human mind uses to represent the sense of being one with the universe (Sect. 3.1.4). In more common experiences, such as collective effervescence (a term introduced by Emile Durkheim to refer to crowd emotionality, the positive feeling resulting from a collective ritual; see Xygalatas et al. 2011; Hopkins et al. 2016, for more on the history of the term), the self-other boundaries weaken as one becomes absorbed in the group. Collective effervescence may characterize the joy experienced by rave dancers or the positive emotions experienced by pilgrims at mass gatherings (Hopkins et al. 2016; Wlodarczyk et al. 2020). It occurs not only in situations where two or more individuals coordinate their movements; it may also occur due to empathetic arousal. As an example, Xygalatas et al. (2011) report synchronization of heart rates between the participants in a fire-walking ritual in the Spanish village of San Pedro Manrique and the spectators related to them.


While the muscular bonding of soldiers marching together may not bring the joy experienced by rave dancers, synchrony generally improves cooperation within groups. This effect extends beyond experiences that involve positive emotions, as shown by the cooperation fostered in situations that require personal sacrifice, and in physical rituals such as religious chanting and singing (Wiltermuth and Heath 2009). One explanation is that this happens because of a strengthening of attachment among group members.

The biological mechanism that supports coordinated and rhythmic actions among interacting partners is known as entrainment. Entrainment has been observed across species, from fireflies to frogs and fiddler crabs to humans. An interesting notion is that when synchronization between interacting partners is optimized, the partner is experienced as being "like me": sensorimotor coupling "becomes a form of shared mental representation… and a vehicle to self-other merging" (Fairhurst et al. 2013, p. 2599). However, the social identity accounts of intense positive emotions in crowds reject the notion that individuals lose their sense of self, positing instead a shift of focus from thinking about oneself in terms of personal identity to thinking about oneself as belonging to a group. This shift is then the foundation for the shift in behavior, which is now aligned to the group norms, not to personal beliefs (Hopkins et al. 2016).

Furthermore, coordinated behavior, be it dyadic (e.g. parent-child bonding, intimate relationships) or involving multiple participants, leads to judgments of enhanced social connectedness and rapport. In a recent study, Miles et al. (2009) investigated the effect of the mode of interpersonal synchrony in dyadic interactions on judgments of rapport. The study used visual and auditory cues to coordination between walkers. Even though the interacting individuals moved through a range of intermediate phase relations in interpersonal coordination, they typically ended up in a state of in-phase or anti-phase coordination, that is, with their actions either at equivalent points or at opposite points of the movement cycle. The results of the study indicate that the highest level of rapport is associated with the most stable forms of interpersonal coordination, which are in-phase and anti-phase synchrony. These results hold regardless of whether the modality of presentation of the cues to coordination between the walkers is visual or auditory. Crucially, in-phase movement synchrony has been characterized as the most stable form of spontaneous movement synchronization; it emerges unintentionally, and as such it is the only form that has the potential to blur the self-other boundaries (Lakens and Stel 2011; Launay et al. 2013). The stability of coordination between synchronously interacting partners creates a sense of social connection among them, which enhances the interactants' judgment of rapport (Miles et al. 2009). Coordinated actions increase liking, affiliation, and rapport, which contributes to social cohesion and positively affects the ability to pursue joint goals (Valdesolo et al. 2010). Non-conscious mimicry of the interactants typically increases trust, not only in everyday social interactions, but also in virtual reality involving interactions with virtual characters (Launay et al. 2013). Other advantages of coordinating one's own actions with the actions of another person include the facilitation of learning (Wilson and Knoblich 2005).
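In-phase and anti-phase coordination are usually quantified via the relative phase between two movement signals. The Python sketch below, assuming two simulated rhythmic movement traces and the standard Hilbert-transform estimate of instantaneous phase, classifies the coordination mode from the mean absolute relative phase; the signal parameters and the 90-degree cut-off are illustrative choices, not those of Miles et al. (2009).

import numpy as np
from scipy.signal import hilbert

fs = 100.0                           # sampling rate (Hz)
t = np.arange(0, 10, 1 / fs)         # 10 s of simulated movement
freq = 1.0                           # movement frequency (Hz)

walker_a = np.sin(2 * np.pi * freq * t)
walker_b = np.sin(2 * np.pi * freq * t + np.pi)   # an anti-phase partner

# Instantaneous phase of each signal via the analytic (Hilbert) signal.
phase_a = np.angle(hilbert(walker_a))
phase_b = np.angle(hilbert(walker_b))

# Relative phase, wrapped to (-180, 180] degrees.
rel_phase = np.rad2deg(np.angle(np.exp(1j * (phase_a - phase_b))))
mean_abs = np.mean(np.abs(rel_phase))

# Near 0 degrees = in-phase; near 180 degrees = anti-phase (the two stable modes).
mode = "in-phase" if mean_abs < 90 else "anti-phase"
print(f"mean |relative phase| = {mean_abs:.1f} deg -> {mode}")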


To achieve synchronized movements, two or more individuals need not only to be motivated for a joint action; they also need to be able to predict each other’s behavior, to move at the same time as the other (Launay et al. 2013), and to generate complementary action sequences at the right time, which assumes continuous adjustment of perceptual and motor acts (Valdesolo et al. 2010). Synchronization with others in a specific context improves perceptual sensitivity to other people’s movements, thereby improving joint action performance in other contexts (Valdesolo et al. 2010). The degree of coordination in a joint action also affects observers’ perception of the collaborating partners’ commitment to the joint action, with more coordination enhancing the expectation that the goal of the joint action will be achieved (Michael et al. 2016). Similarly, synchronous actions affect perceived rapport and entitativity judgments (Lakens and Stel 2011). According to Lakens and Stel (2011), movement synchrony serves as a cue to observers for making inferences about the extent to which the observed subjects are a social group. If movement synchrony is perceived as the result of explicit instructions to synchronize, shared feelings of rapport among the group members are perceived to a lesser extent than when the group synchronizes spontaneously. This is compatible with the observation that only nonintentional mimicry enhances social cohesion (Valdesolo et al. 2010). Finally, the effects of movement synchrony on perceived social unity are not necessarily due to the perceptual similarity of synchronized movement rhythms; rather, synchronous movements prompt observers to infer shared psychological states of the synchronized individuals (Lakens and Stel 2011), for instance synchronized emotions (Wlodarczyk et al. 2020). Joint action also requires collective intentionality (Searle 1990; Schmidt 2008; Gallotti and Frith 2013) (Sect. 3.1.3). It is in this sense that action understanding in social interaction and joint action cannot be treated separately from mental states such as intentions and desires.

4.5.2 Just a Sound

Affiliative behavior in virtual environments can be triggered by cues other than synchronized movement. To determine the minimal cues necessary for affiliative behavior, a recent study investigated whether such behavior can be induced when agency is attributed only to a sound (Launay et al. 2013). The study reports such effects regardless of whether subjects were aware that the sounds were produced by computer algorithms rather than by human participants. It suggests that, even in the absence of visual cues regarding the movement of another person, synchronization with a sound alone is sufficient to trigger affiliative behavior and the feeling of trust. In other words, mapping a sound alone onto an action, without visually mapping the actual movements to the action, served as a cue for the participants to ascribe the sounds to the actions of others. They believed that the others were making the same movements as themselves, thereby relating the perceived sounds to their own movements. However, synchrony was related to trust only when the participants’ movement (tapping) was in-phase with the heard sounds.


The finding that agency can be attributed even to a sound in this context is consistent with findings from animal studies, such as the finding that the sound of breaking peanuts, in the absence of visual observation of the action itself, activates a subclass of mirror neurons in the macaque brain (Gallese 2000a); the firing of these specific neurons indicates that the sound alone is mapped onto the action (Sect. 4.4). Using functional magnetic resonance imaging (fMRI) and a sensorimotor synchronization paradigm involving an adaptive pacing signal as a virtual partner, a recent study investigated the brain networks supporting entrainment, that is, the mechanism that allows action synchronization or rhythmic behaviors among interacting partners (Fairhurst et al. 2013). This neuroimaging study suggests that two major brain networks support responses to a partner’s behavioral adaptability: the ventromedial prefrontal cortex supports actions in which it is easy to synchronize with a virtual partner, and the default mode network supports actions in which it is difficult to synchronize with a virtual partner. Importantly, synchronizing actions with an optimally adaptive partner, including a virtual partner, leads to easier coordination of a shared sensorimotor goal, which means that it requires fewer cognitive control resources.

4.6 Extended Embodiment and Action

Agency and action ownership, which have been recognized as important cues to self-recognition and self-consciousness, are highly relevant for virtual embodiment (Sect. 1.4.1). However, even in the physical world they are not always as straightforward as they seem to be. Consider, for instance, a dance troupe in action, where the dancers perform the same movements and wear the same costumes. It is not surprising, then, that agency and action ownership pose a unique challenge to virtual reality design. A recent virtual reality study investigated whether visual attention can use efferent information to enhance self-recognition among several moving avatars (Salomon et al. 2013). Importantly, one of the avatars moved in a way consistent with the participant’s movements (the “self”-avatar), while the other avatars (4 in one condition, 6 in another) made identical but spatially deviated movements relative to the participant’s movements, and therefore served as distractors in the experiment. The main finding of the study is that when the avatar’s movements were controlled by the participant’s active movements (as opposed to the “passive” movement condition, in which the participant’s arm was moved by the experimenter), reaction times for self-recognition were shorter regardless of the number of distractors. The authors refer to this short search time, unaffected by the appearance of other avatars, as “self pop-out”, suggesting an important role of efferent information in self-recognition from motion. This finding is aligned with the notion of forward models as being critical for the sense of agency (Blakemore and Frith 2003; Jeannerod 2003; Wolpert et al. 2003). The basic assumption here is that we use internal models to build representations of our actions and interactions with objects, where the forward model of motor control is one such internal model. The forward model postulates that when we move, an internal representation of the planned motor action is created (the efference copy), whose predicted sensory consequences are then compared against the actual consequences of the movement (afferent sensory inputs). This sensory prediction is a critical component of our awareness of an action, which is directed to intended movements rather than to the movements we actually perform or the sensory signals arising from the moving limb (Blakemore and Frith 2003; Jeannerod 2003; Wolpert et al. 2003). Thus, as illustrated by the “self pop-out” finding in Salomon et al.’s (2013) study and by findings from other studies indicating a lack of feeling for movements made by a part of the avatar’s body that is not part of the human body (e.g. a tail) (Steptoe et al. 2013), the sense of agency and action ownership are crucial for virtual bodies in action. Specifically, they are important for extending the sense of corporeality beyond the physical body into a virtual space, for incorporating virtual tools into the body schema, and for acting on the affordances of the virtual environment, its objects, and the behavior of others in it.
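The comparator at the heart of the forward model can be sketched in a few lines of code. The toy example below is only an illustration of that logic (the function, thresholds and data are invented here and do not reproduce any model from the cited studies): it treats the efference copy as a predicted movement trajectory, compares it with each avatar’s observed trajectory, and attributes the movement to the self only when the prediction error is small, mirroring the logic behind the “self pop-out” effect.

    import numpy as np

    def attribute_agency(predicted_traj, observed_trajs, threshold=0.1):
        """Pick the avatar whose observed movement best matches the
        sensory prediction derived from the efference copy.

        predicted_traj : (T, 2) array of predicted hand positions
        observed_trajs : list of (T, 2) arrays, one per avatar
        threshold      : max mean prediction error accepted as "self"
        """
        errors = [np.mean(np.linalg.norm(obs - predicted_traj, axis=1))
                  for obs in observed_trajs]
        best = int(np.argmin(errors))
        # A small prediction error -> the movement is attributed to the self.
        return best if errors[best] < threshold else None

    rng = np.random.default_rng(0)
    planned = np.cumsum(rng.normal(size=(50, 2)) * 0.1, axis=0)  # efference copy
    self_avatar = planned + rng.normal(size=(50, 2)) * 0.01      # matches prediction
    distractor = planned + np.array([0.5, 0.0])                  # spatially deviated
    print(attribute_agency(planned, [distractor, self_avatar]))  # -> 1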

References

Ammaniti, M., Gallese, V.: The birth of intersubjectivity. Psychodynamics, neurobiology and the Self. W.W. Norton & Company, New York, NY (2014)
Badre, D.: Cognitive control, hierarchy and the rostro-caudal organization of the frontal lobes. Trends Cogn. Sci. 12, 193–200 (2008)
Bergson, H.: Matter and memory. Zone Books, New York (1896/1988)
Berkeley, G.: An essay towards a new theory of vision. Dublin (1709)
Berlucchi, G., Aglioti, S.: The body in the brain: neural bases of corporeal awareness. Trends Neurosci. 20, 560–564 (1997)
Binkofski, F., Buxbaum, L.J.: Two action systems in the human brain. Brain Lang. 127, 222–229 (2013)
Blackburn, S.: The Oxford dictionary of philosophy. Oxford University Press, Oxford (1994)
Blakemore, S.-J., Frith, C.: Self-awareness and action. Curr. Opin. Neurobiol. 23, 219–224 (2003)
Blakemore, S.-J., Wolpert, D., Frith, C.: Why can’t you tickle yourself? NeuroReport 11, 11–16 (2000)
Bolender, J.: The self-organizing social mind. MIT Press, Cambridge, MA (2010)
Borghi, A.M., Riggio, L.: Stable and variable affordances are both automatic and flexible. Front. Hum. Neurosci. 9, 351 (2015)
Botvinick, M.: Hierarchical models of behavior and prefrontal function. Trends Cogn. Sci. 12, 201–208 (2008)
Brass, M., Haggard, P.: The what, when, whether model of intentional action. Neuroscientist 14, 319–325 (2008)
Briscoe, R.: Egocentric spatial representation in action and perception. Philos. Phenomenol. Res. 79, 423–460 (2009)
Clark, A.: Perception, action, and experience: unraveling the golden braid. Neuropsychologia 47, 1460–1468 (2009)
Damasio, A.: The feeling of what happens. Body and emotion in the making of consciousness. Harcourt Inc, San Diego, CA (1999)
Di Cesare, G., Di Dio, C., Marchi, M., Rizzolatti, G.: Expressing our internal states and understanding those of others. PNAS 112, 10331–10335 (2015)
Dreyfus, H.L.: What computers still can’t do: a critique of artificial reason. MIT Press, Cambridge, MA (1992)
Fairhurst, M.T., Janata, P., Keller, P.E.: Being and feeling in sync with an adaptive virtual partner: brain mechanisms underlying dynamic cooperativity. Cereb. Cortex 23, 2592–2600 (2013)
Ferretti, G.: Through the forest of motor representations. Conscious. Cogn. 43, 177–196 (2016)
Friston, K.: Prediction, perception and agency. Int. J. Psychophysiol. 83, 248–252 (2012)
Gallagher, S.: Body schema and intentionality. In: Bermúdez, J.L., Marcel, A., Eilan, N. (eds.) The body and the self, pp. 226–244. MIT Press, Cambridge, MA (1995)
Gallese, V.: The acting subject: toward the neural basis of social cognition. In: Metzinger, T. (ed.) Neural correlates of consciousness, pp. 323–333. MIT Press, Cambridge, MA (2000a)
Gallese, V.: The acting brain: reviewing the neuroscientific evidence. Psycoloquy 11(34) (2000b). http://www.cogsci.soton.ac.uk/psyc/bin/newpsy/11.034
Gallese, V., Sinigaglia, C.: How the body in action shapes the self. J. Conscious. Stud. 18, 117–143 (2011)
Gallotti, M., Frith, C.D.: Social cognition in the we-mode. Trends Cogn. Sci. 17, 160–165 (2013)
Garau, M.: Selective fidelity: investigating priorities for the creation of expressive avatars. In: Schroeder, R., Axelsson, A.-S. (eds.) Avatars at work and play. Collaboration and interaction in shared virtual environments, pp. 17–38. Springer (2006)
Gauch, H.G.: Scientific method in practice. Cambridge University Press, Cambridge (2012)
Gibson, E.J., Adolph, K., Eppler, M.: Affordances. In: Wilson, R.A., Keil, F.C. (eds.) The MIT encyclopedia of the cognitive sciences, pp. 4–6. MIT Press, Cambridge, MA (1999)
Gibson, J.J.: The ecological approach to visual perception. Lawrence Erlbaum Associates, Hillsdale, NJ (1986)
Hoffstaedter, F., Grefkes, C., Zilles, K., Eickhoff, S.B.: The “what” and “when” of self-initiated movements. Cereb. Cortex 23, 520–530 (2013)
Hopkins, N., Reicher, S.D., Khan, S.S., Tewari, S., Srinivasan, N., Stevenson, C.: Explaining effervescence: investigating the relationship between shared social identity and positive experience in crowds. Cogn. Emot. 30, 20–32 (2016)
Howard, E.E., Edwards, G.S., Bayliss, A.P.: Physical and mental effort disrupts the implicit sense of agency. Cognition 157, 114–125 (2016)
Jahanshahi, M., Frith, C.D.: Willed action and its impairments. Cogn. Neuropsychol. 15, 483–533 (1998)
James, W.: The principles of psychology. Holt, New York (1890)
Jeannerod, M.: The representing brain: neural correlates of motor intention and imagery. Behav. Brain Sci. 17, 187–245 (1994)
Jeannerod, M.: The mechanism of self-recognition in humans. Behav. Brain Res. 142, 1–15 (2003)
Jeannerod, M., Anquetil, T.: Putting oneself in the perspective of the other: a framework for self-other distinction. Soc. Neurosci. 3, 356–367 (2008)
Jeannerod, M., Jacob, P.: Visual cognition: a new look at the two-visual systems model. Neuropsychologia 43, 301–312 (2005)
Kilner, J., Neal, A., Weiskopf, N., Friston, K.J., Frith, C.D.: Evidence of mirror neurons in human inferior frontal gyrus. J. Neurosci. 29, 10153–10159 (2009)
Lakens, D., Stel, M.: If they move in sync, they must feel in sync: movement synchrony leads to attributions of rapport and entitativity. Soc. Cogn. 29, 1–14 (2011)
Launay, J., Dean, R.T., Bailes, F.: Synchronization can influence trust following virtual interaction. Exp. Psychol. 60, 53–63 (2013)
Liberman, A.M., Cooper, F.S., Shankweiler, D.P., Studdert-Kennedy, M.: Perception of the speech code. Psychol. Rev. 74, 431–461 (1967)
Liberman, A.M., Mattingly, I.G.: The motor theory of speech perception revised. Cognition 21, 1–36 (1985)
Macerollo, A., Bose, S., Ricciardi, L., Edwards, M.J., Kilner, J.M.: Linking differences in action perception with differences in action execution. SCAN 10, 1121–1127 (2015)
Matthen, M.: Two visual systems and the feeling of presence. In: Gangopadhyay, N., Madary, M., Spicer, F. (eds.) Perception, action, and consciousness: sensorimotor dynamics and two visual systems. Oxford University Press, Oxford (2010)
Merleau-Ponty, M.: Phenomenology of perception. Routledge, London (1958)
Michael, J., Sebanz, N., Knoblich, G.: Observing joint action: coordination creates commitment. Cognition 157, 106–113 (2016)
Miles, L.K., Nind, L.K., Macrae, C.N.: The rhythm of rapport: interpersonal synchrony and social perception. J. Exp. Soc. Psychol. 45, 585–589 (2009)
Milner, A.D., Goodale, M.A.: The visual brain in action. Oxford University Press, Oxford (1995)
Milner, A.D., Goodale, M.A.: Two visual systems re-viewed. Neuropsychologia 46, 774–785 (2008)
Neisser, U.: Five kinds of self-knowledge. Philos. Psychol. 1, 35–59 (1988)
Nelissen, K., Luppino, G., Vanduffel, W., Rizzolatti, G., Orban, G.A.: Observing others: multiple action representation in the frontal lobe. Science 310, 332–336 (2005)
Oosterhof, N., Tipper, S.P., Downing, P.E.: Viewpoint (in)dependence of action representation: an MVPA study. J. Cogn. Neurosci. 24, 975–989 (2012)
Rizzolatti, G., Fogassi, L., Gallese, V.: Neurophysiological mechanisms underlying the understanding and imitation of action. Nat. Rev. Neurosci. 2, 661–670 (2001)
Rizzolatti, G., Craighero, L.: The mirror-neuron system. Annu. Rev. Neurosci. 27, 169–192 (2004)
Sakreida, K., Effnert, I., Thill, S., Menz, M.M., Jirak, D., Eickhoff, C.R., et al.: Affordance processing in segregated parieto-frontal dorsal stream sub-pathways. Neurosci. Biobehav. Rev. 69, 89–112 (2016)
Salomon, R., Lim, M., Kannape, O., Llobera, J., Blanke, O.: “Self pop-out”: agency enhances self-recognition in visual search. Exp. Brain Res. 228, 173–181 (2013)
Schellenberg, S.: Action and self-location in perception. Mind 116, 603–631 (2007)
Searle, J.: Collective intentions and actions. In: Cohen, P.R., Morgan, J., Pollack, M.E. (eds.) Intentions in communication, pp. 401–415. MIT Press, Cambridge, MA (1990)
Steptoe, W., Steed, A., Slater, M.: Human tails: ownership and control of extended humanoid avatars. IEEE Trans. Vis. Comput. Graph. 19, 583–590 (2013)
Tomasello, M.: Origins of human communication. MIT Press, Cambridge, MA (2009)
Tonna, M., Marchesi, C., Parmigiani, S.: The biological origins of rituals: an interdisciplinary perspective. Neurosci. Biobehav. Rev. 98, 95–106 (2019)
Trevarthen, C.: Intersubjectivity. In: Wilson, R.A., Keil, F.C. (eds.) The MIT encyclopedia of the cognitive sciences, pp. 415–419. MIT Press, Cambridge, MA (1999)
Uithol, S., van Rooij, I., Bekkering, H., Haselager, P.: Hierarchies in action and motor control. J. Cogn. Neurosci. 24, 1077–1086 (2012)
Umilta, M.A., Kohler, E., Gallese, V., Fogassi, L., Fadiga, L., Keysers, C., Rizzolatti, G.: I know what you are doing: a neurophysiological study. Neuron 31, 155–165 (2001)
Ungerleider, L.G., Mishkin, M.: Two cortical visual systems. In: Ingle, D.J., Goodale, M.A., Mansfield, R.J.W. (eds.) Analysis of visual behavior, pp. 549–586. MIT Press, Cambridge, MA (1982)
Valdesolo, P., Ouyang, J., DeSteno, D.: The rhythm of joint action: synchrony promotes cooperative ability. J. Exp. Soc. Psychol. 46, 693–695 (2010)
Wilson, M., Knoblich, G.: The case for motor involvement in perceiving conspecifics. Psychol. Bull. 131, 460–473 (2005)
Wiltermuth, S.S., Heath, C.: Synchrony and cooperation. Psychol. Sci. 20, 1–5 (2009)
Wittmann, M.K., Kolling, N., Faber, N.S., Scholl, J., Nelissen, N., Rushworth, M.F.S.: Self-other mergence in the frontal cortex during cooperation and competition. Neuron 91, 482–493 (2016)
Wlodarczyk, A., Zumeta, L., Pizarro, J.J., Bouchat, P., Hatibovic, H., Basabe, N., et al.: Predictive emotional synchrony in collective gatherings: validation of a short scale and proposition of an integrative measure. Front. Psychol. 11, 1721 (2020)
Wolpert, D.M., Doya, K., Kawato, M.: A unifying computational framework for motor control and social interaction. Philos. Trans. R. Soc. Lond. B Biol. Sci. 358, 593–602 (2003)
Xygalatas, D., Konvalinka, I., Roepstorff, A., Bulbulia, J.: Quantifying collective effervescence: heart rate dynamics at a fire-walking ritual. Commun. Integr. Biol. 4, 735–738 (2011)
Zapparoli, L., Seghezzi, S., Paulesu, E.: The what, the when, and the whether of intentional action in the brain: a meta-analytic review. Front. Hum. Neurosci. 11, 238 (2017)

Chapter 5

Spatial Cognition in Virtual Reality

Space not only communicates in the most basic sense, but it also organizes virtually everything in life (Hall 1990, p. viii). What you can do in it determines how you experience a given space (Hall 1969, p. 54).

5.1 Body and the Space Around It

In one way or another, space is implicated in all our behavior. Space not only organizes nearly everything in life, as observed by Hall (1990), but it also demarcates the borders between the self and the other (Jeannerod and Anquetil 2008). Many cognitive processes, from navigation, mental rotation, syllogistic reasoning, and problem solving to language, require spatial information processing (Gunzelmann and Lyon 2007). While a number of theories address various aspects of spatial information processing, for instance, representation of environmental information and formation of cognitive maps, visuospatial working memory, reasoning based on spatial mental models, mental imagery and navigation, other theories focus more on the neural basis of spatial processing and investigate, for example, how the brain builds spatial representations, how specific types of cell populations (e.g. place cells, grid cells) support specific aspects of spatial cognition, and whether the neural mechanisms that support spatial navigation operate across other domains of cognition. There is currently no unified theory that explains, consistently and parsimoniously across the relevant levels of analysis (Sect. 2.4), the way in which human beings perceive and conceptualize space. It is not surprising, then, that some basic spatial processing terms are sometimes used rather loosely. For instance, the term spatial representation has been used with reference to different subsystems, from spatial perception (e.g. object localization, line orientation discrimination, and spatial synthesis) to spatial memory, spatial attention, mental rotation and spatial construction (Atkinson 1999). Despite its profound role in our life, we find it very difficult to come to grips with space, struggling even with the most basic questions, such as: What is space? How is
it represented in the mind? How are multiple spatial representations put together to derive perceptual unity of space? (O’Keefe and Nadel 1978; O’Keefe 1999; Colby 1998; Rowland et al. 2016). By positioning participants and their experiences in a projected environment, virtual reality adds further layers of complexity to our perception and conceptualization of space, emphasizing the intricacies of the way we interact with the world around us.

5.1.1 Sense of Place

For a long time, psychologists were primarily interested in object perception, leaving environment perception to generalizations based on findings on object perception. This approach has been criticized as limited, if for no other reason than because of a crucial distinction between objects and environments:

Objects require subjects—a truism whether one is concerned with the philosophical unity of the subject-object duo, or is thinking more naively of the object as a ‘thing’ which becomes a matter for psychological study only when observed by a subject. In contrast, one cannot be a subject of an environment, one can only be a participant. The very distinction between self and nonself breaks down: the environment surrounds, enfolds, engulfs, and no thing and no one can be isolated and identified as standing outside of, and apart from, it. (Ittelson 1973, pp. 12–13)

Overcoming this major issue of the traditional approach, current environmental psychology generally defines sense of place along three dimensions: (1) physical features of the environment (e.g. natural features, sound, light), (2) activities that the place affords (e.g. movement, manipulation of objects, interaction with others), and (3) associated meanings and emotions (e.g. being reminded of other places, feeling at home) (Turner et al. 2003). Thus, the identity of places is defined in terms of their physical setting, the activities they afford, and their meanings. Some argue that sense of place is like an attitude and, like any other attitude, it has cognitive, affective and conative aspects. While most scholars agree that some generalizations about sense of place are possible, they also maintain that sense of place is an emergent characteristic of the way in which an individual interacts with the environment.

To investigate sense of place and presence in a photo-realistic immersive virtual reality environment, Turner et al. (2003) recreated a glasshouse in a botanical garden. Participants were immersed in a tropical glasshouse via a head-mounted display and viewed a 360-degree panorama of its interior. However, the panorama could not include any moving objects or people, and therefore it did not allow manipulation of objects or interaction with other participants. Participants’ physical movements were also restricted, except that they could turn 360 degrees in either direction as well as look up and down. This virtual glasshouse allowed only a passive viewing experience. The environment included sounds consistent with botanical gardens, such as birdsong and the sound of moving water, which were conveyed over external speakers, but there was no other sensory input.


For this virtual environment, participants noted the lack of realism. In terms of physical setting, the heat, smell and humidity of real-world glasshouses were missing. For instance, one participant noted: “I have been to a botanical garden and some of the most distinct things about that is … the warmth and the water in the air and the smells that make a big impression when you are there and I needed that to make this seem more real” (Turner et al. 2003, p. 8). Similarly, in terms of possible activities in this space, participants were disappointed by the restricted movement: “I think I would have liked to have taken a leaf or taken a walk around…”. Furthermore, compared to the real glasshouse experience, participants made considerably fewer references to meanings; that is, the virtual glasshouse did not trigger associations with other places, and they commented on the artificiality of the environment. Overall, this particular virtual glasshouse was deficient along all three dimensions of sense of place (physical setting, afforded actions, and attached meanings), which negatively affected the participants’ experience.

5.1.2 Processing Strategies for Real versus Virtual Spaces

Since virtual environments are projected spaces, an important question is whether spatial processing is affected by the technology that mediates the virtual experience. This question is important because it is often assumed that spatial skills for real-world spaces can be successfully trained using virtual reality. However, the extent to which spatial skills acquired in a virtual environment are transferable to real-world situations is a matter of debate (Lessels and Ruddle 2005). Performance on spatial tasks in general can be affected by various factors, including limitations of perception, reduced spatial working memory capacity and attention span, among others (Gunzelmann and Lyon 2007). Like any cognitive reanalysis, from reanalysis of garden-path sentences1 in language comprehension to disambiguation of ambiguous images, mentally re-encoding spatial information (e.g. reanalyzing a location relative to a different spatial cue) is cognitively expensive: it is time-consuming and error-prone. Some evidence suggests that completing spatial tasks in virtual environments may require more time than completing comparable tasks in the immediate environment, regardless of whether the virtual environment is large or small. Such findings have been interpreted to suggest that there exist important differences in spatial processing between virtual and real-world spaces. Makany et al. (2006) investigated whether people use the same strategies in free spatial exploration of real and virtual spaces. They found that participants’ behavioral patterns in the physical environment reflected meaningful strategies. However, such strategies were not employed in the virtual environment, which resulted in inefficient navigation. Based on these findings, the authors argue that spatial cognition and behavior may fundamentally differ between real-world tasks and corresponding tasks in virtual environments.

1 An often-cited example of a garden-path sentence is: The horse raced past the barn fell.


Further evidence supporting this view comes from neuroimaging data, which indicate that spatial processing of virtual objects in near space (60 cm) and far space (150 cm) is neurally different from spatial processing of real objects at the same distances. More specifically, Beck et al. (2010) found such differences using a version of the line bisection task, in which both horizontal and vertical 3D objects (a tube, a toilet paper roll) of actual size were presented as floating in a box-shaped room, and participants had to judge whether the object was centered, shifted to the left, or shifted to the right. Importantly, processing of objects in near space is associated with the dorsal visual stream, involving projections from visual areas in the occipital lobe to the posterior parietal region, and it is related to visually guided actions on such objects (Sect. 4.3). On the other hand, processing of objects in far space is associated with the ventral visual stream, involving projections from the primary visual area to the inferior temporal cortex, and it is related to perceptual identification of objects (visual perception or descriptive vision). However, Beck et al. (2010) found that the brain areas supporting the processing of stimuli in the real world do not correspond to those activated by processing in the virtual environment. Specifically, the processing of far-space objects in a virtual environment was supported by brain areas along the dorsal stream, whereas the processing of near-space objects activated brain areas along the ventral stream. These results indicate that, while the far-space virtual objects were processed spatially as 3D objects, this was not the case with near-space virtual objects, which were processed as objects without spatial reference, as if pictorial seeing were involved. Importantly, the objects in near space were not processed as objects that afford manipulation. Such processing of a visual scene is clearly not presence-inducing. The finding that the neural basis of processing near- and far-space objects differs between the real world and a virtual space has important implications for virtual reality applications intended for treatment and rehabilitation of spatial processing deficits. It is also relevant for the training of spatial skills for real-world spaces in virtual environments. Further evidence for the claim that the brain processes spatial information differently in real and virtual environments comes from an electrophysiological study. Bohbot et al. (2017) used invasive brain recordings in humans, specifically intrahippocampal electroencephalography (EEG), to study whether virtual and real movements during navigation induce oscillations peaking within different frequency bands. Findings from previous studies indicate that virtual navigation induces low-frequency oscillations (theta rhythm) in the human hippocampus, peaking around 3.3 Hz, whereas oscillations induced by real-world movement peak around 7–9 Hz. Bohbot et al.’s (2017) findings largely support this view, indicating that the shift to lower frequencies in virtual vs. real movements may reflect the difference in the available bodily feedback.
In other words, movements in a virtual environment (realized, for example, by pressing keys on a keyboard or operating a handheld device) lack the appropriate bodily feedback, that is, motor, vestibular, and proprioceptive input; because these signals are missing, the hippocampal oscillations associated with real-world movements cannot be induced by virtual movements. Another line of evidence suggesting differences in spatial information processing between the physical world and virtual environments comes from research on
animals, which indicates that some hippocampal cells, such as place cells,2 do not necessarily fire to the same extent in virtual reality tasks as they do in corresponding tasks in a real laboratory setting. A recent study reported that as much as 60% of the place cells that were activated in the real world were “silent” in a virtual environment (Donato and Moser 2016). Taken together, these findings indicate that spatial processing may differ in virtual reality relative to the physical world, requiring more time, relying on different cognitive strategies, and differentially activating the associated neural substrate. These differences need to be considered not only when designing virtual environments specifically for training of spatial behavior for real-world purposes, but also when designing more general tasks for virtual environments, which by default require spatial processing.

2 Place cells are a type of cell in the brain that fire during spatial navigation, encoding a subject’s environmental location. The discovery of place cells in the human hippocampus (Ekstrom et al. 2003; Moser et al. 2015) is consistent with the findings from animal studies that reported the existence of place cells in the hippocampus of rats and monkeys. This discovery has led to a finer distinction regarding the neural basis of spatial representations, according to which spatial navigation is supported by the hippocampus, whereas identification of landmarks and spatial scenes is supported by the parahippocampus (Burgess and O’Keefe 2003).
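The frequency-band contrast reported in the navigation studies above lends itself to a simple spectral check. The sketch below is purely illustrative (synthetic signal, arbitrary sampling rate, NumPy and SciPy assumed): it estimates the power spectrum of a hippocampal trace and reports whether the theta peak falls near the low range reported for virtual movement (around 3.3 Hz) or the higher range reported for real-world movement (7–9 Hz).

    import numpy as np
    from scipy.signal import welch

    def theta_peak(trace, fs):
        """Return the peak frequency (Hz) of a trace within 1-12 Hz."""
        freqs, psd = welch(trace, fs=fs, nperseg=4 * fs)
        band = (freqs >= 1) & (freqs <= 12)
        return freqs[band][np.argmax(psd[band])]

    fs = 250  # sampling rate in Hz (illustrative value)
    t = np.arange(0, 20, 1 / fs)
    # Synthetic "virtual navigation" trace: 3.3 Hz theta plus noise.
    trace = np.sin(2 * np.pi * 3.3 * t) + 0.5 * np.random.randn(t.size)
    peak = theta_peak(trace, fs)
    print(f"theta peak at {peak:.1f} Hz:",
          "virtual-movement range" if peak < 5 else "real-movement range")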

5.1.3 Frames of Reference

One concept that is critical for the study of spatial cognition regardless of the discipline is frames of reference. Although the concept itself has been associated with Aristotle and predominated in medieval theories of space, the term frame of reference was introduced by Gestalt theorists of perception in the 1920s, who provided the following definition: “a unit or organization of units that collectively serve to identify a coordinate system with respect to which certain properties of objects, including the phenomenal self, are gauged” (Levinson 1999, p. 126). Spatial coordinates may be extracted from various sensory inputs (visual, auditory, somatosensory). Different sensory signals that convey information on spatial coordinates are integrated in the posterior parietal cortex, where they further combine with proprioceptive information and are then used by different egocentric frames of reference to plan movement of a specific effector (Herweg and Kahana 2018). There exist many frames of reference, reflecting differences between the underlying coordinate systems. Unlike the traditional approach to spatial processing, which postulates that the brain constructs a single map as a representation of the self, objects and actions in an environment, the predominant view now is that the brain constructs multiple spatial representations, some of them even simultaneously (Colby 1998; Colby and Goldberg 1999). These spatial representations may be encoded using different frames of reference. Thus, multiple frames of reference guide our behavior. A spatial reference frame consists of a reference point (the origin) and a reference direction (the axes) (Wang 2012). Considering the reference point, two broad
classes of frames of reference are egocentric and allocentric representations. Egocentric representations are frames in which objects and locations are represented relative to the observer, that is, they are viewer-dependent. In contrast, the class of allocentric representations includes those frames in which objects and locations are represented in frames extrinsic to the observer, i.e. they are viewer-independent. For example, egocentric frames of reference include eye-centered, head-centered and arm-centered coordinates, whereas representations centered on an object or on environment coordinates are allocentric. However, not everyone agrees with such neat classifications of cognitive frames of reference. For instance, Gallese (1999) argues against the strict division into viewer-centered and object-centered reference frames, because in his view such a rigid dichotomy follows from the tendency to consider action and perception as separate domains, which he considers an untenable view.

Considering the reference direction, frames of reference are usually divided into relative, intrinsic, and absolute frames (Levelt 1999; Wang 2012). The relative frame of reference is view-dependent, and it is the typical way of expressing locations and directions in most European languages (e.g. front, back, left, right). The intrinsic frame of reference is a binary, viewpoint-independent relation (e.g. The garden is behind the house). Although this frame of reference is the main secondary frame in European languages, it is the primary frame in some other languages (Haun et al. 2011). Finally, the absolute frame of reference relates a reference object and a landmark using a system of fixed angles (e.g. north, south, east, west) (e.g. The lake is north of the hill). Some languages use exclusively an absolute perspective system3; others use exclusively an intrinsic system; yet other languages use a mix of absolute and intrinsic perspective systems. English uses all three systems, with evidence of interindividual differences regarding which system is preferred as well as intraindividual differences in the use of a specific perspective depending on purpose and context (Levelt 1999).

3 Exclusive use of an absolute perspective system is found in Guugu Yimithirr, an Australian Aboriginal language spoken by the Guugu Yimithirr people of Far North Queensland. Exclusive use of an intrinsic system is found in Mopan, a Mayan language spoken by the Mopan people, an indigenous group of the Mayan people native to regions in Guatemala and Belize.

Cross-cultural research on spatial cognition indicates that there exists variability in preferred spatial strategies among different cultures. Occasionally, this variability has been misguidedly interpreted as indicating differences in the spatial capabilities of different cultures. However, it has been established on several grounds that cross-cultural variability in spatial cognition is not a question of absolute capacity; rather, it reflects a preference for the memory strategy used to encode spatial relations (Haun et al. 2011). Finally, adopting someone else’s spatial perspective is a simple form of inferring that person’s representations of the world, and it could also be regarded as an important step towards inferring other people’s mental states (Kessler et al. 2010).

When it comes to virtual environments, egocentric frames of reference are critical because of their role in object manipulation. In contrast, an allocentric spatial representation cannot guide action (Sect. 4.3) and needs to be transformed into the participant’s egocentric coordinates. For example, Havranek et al. (2012) found that an egocentric perspective (i.e. first-person view) allowed a stronger sense of spatial presence than an exocentric perspective (i.e. third-person view) in a video-game environment, regardless of whether the participants merely observed or played the game. Other evidence also suggests that an egocentric point of view contributes to the participants’ perception of a virtual body as their own body.
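The transformation mentioned above is, at its core, a change of coordinates. As a minimal sketch (assuming a flat two-dimensional layout and a known observer heading; the function and values are illustrative rather than taken from the cited studies), converting an object’s allocentric position into egocentric coordinates amounts to subtracting the observer’s position and rotating the offset by the observer’s heading, yielding “how far ahead” and “how far to the right” the object is, which is the form needed to guide action.

    import math

    def allo_to_ego(obj_xy, observer_xy, heading_rad):
        """Express an allocentric (world-frame) position in egocentric
        coordinates: 'ahead' along the observer's heading and 'right'
        perpendicular to it. heading_rad is measured counterclockwise
        from the world x-axis."""
        dx = obj_xy[0] - observer_xy[0]
        dy = obj_xy[1] - observer_xy[1]
        ahead = dx * math.cos(heading_rad) + dy * math.sin(heading_rad)
        right = dx * math.sin(heading_rad) - dy * math.cos(heading_rad)
        return ahead, right

    # A virtual mug 2 m north of the origin, observer at the origin facing
    # north: the mug is 2 m straight ahead and 0 m to the right.
    print(allo_to_ego((0.0, 2.0), (0.0, 0.0), math.pi / 2))  # -> (2.0, ~0.0)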

5.2 There… Where the Self Is?

Different streams of research evidence converge to indicate that the physical location of the body does not always coincide with mental self-localization (Sect. 2.3.4), as found in out-of-body experiences and in the rubber hand illusion (Wissmath et al. 2011). Virtual reality extends the boundaries of the self beyond the physical body into a virtual space. In a way, it allows a kind of bilocation, because the participant is physically located in one environment, yet she acts and feels as if present in another, virtual environment. It has recently been proposed that the feeling of being physically located in a mediated environment needs to be considered a special type of presence: spatial presence (Hartmann et al. 2015). Spatial presence refers to the “perception or illusion to be located in an environment that is conveyed by some sort of media technology” (Hartmann et al. 2015, p. 116). According to another definition, spatial presence is a sense of being physically situated within a spatial environment portrayed by a medium (e.g. television, virtual reality) (Baumgartner et al. 2006). Current definitions characterize spatial presence as an experience in which participants feel located in a technology-mediated environment, losing a critical distance to the media as well as the awareness that the source of the experience is technology (Hartmann et al. 2015). This subjective experience is an illusion.

Some models of presence explicitly use the term spatial presence and emphasize that it is the most critical component of the participant’s experience of being in a virtual environment. One common thread in the conceptualization of spatial presence in such models is the idea that it is a subjective experience or a state of consciousness4 in which the participant feels physically located in a mediated environment. Apart from this general idea, researchers on spatial presence differ in their understanding of this emergent phenomenon associated with virtual technology (Bystrom et al. 1999). Taking a step back to revisit definitions of presence, one might find it unclear how this type of presence differs from the more general concept of presence (e.g. Sheridan 1992), as defined by the International Society for Presence Research (Sect. 1.3).

4 Although the term consciousness is ambiguous, referring to phenomena such as wakefulness, awareness or knowledge about something, and the ability to introspect and “report one’s mental states”, some scholars reserve this term for the “subjective quality of experience: what it is like to be a cognitive agent” (Chalmers 1996, p. 6). Alternative terms for the same class of phenomena as consciousness in this sense are experience, qualia (short for qualities), phenomenology, phenomenal, subjective experience and what it is like (Blackburn 1994).
Both definitions emphasize perception in technology-mediated immersive environments. Importantly, in addition to inducing the feeling of being located in a virtual environment, virtual reality is also a tool for performing actions in a computer-generated space. Unless participants can perform actions inherent to a specific environment, such as picking a leaf in a botanical garden, they feel a lack of realism, which makes the sense of presence fade away (Turner et al. 2003). Thus, the possibility to do things in a virtual environment is another key feature of presence (Sanchez-Vives and Slater 2005), which requires an optimal mapping between the participant’s actions and the sensory feedback provided by the media system (Hartmann et al. 2015). These two components of presence—self-localization in a computer-generated space and the sense of the possibility to act in that space, together with the credibility of the events that unfold there—have been recognized to differing degrees in different models of presence, and sometimes they are referred to by different terms, for instance as place illusion and plausibility illusion, respectively (Slater 2009). Of note, plausibility illusion in this context also refers to the extent to which the participant feels that he is interacting with actual human beings (Oh et al. 2018). In addition to localizing the self within a virtual space, an influential model of presence posits as a second stage the participant’s effort to stay there in the face of competing incoming information that originates in the physical world and may override the participant’s interpretation of the surroundings as being in the virtual space (Wirth et al. 2007). The effort to maintain the spatial situation model constructed on the basis of the virtual space may dissipate if the input from the physical world outweighs the virtual space cues, leading to breaks in presence. If the system is particularly unstable, participants may repeatedly shift from the sense of being present to feeling out of the virtual space, which affects their task performance. Importantly, unlike visual representations of space such as a picture on a postcard, in which no action can be taken, representations of 3D computer-generated spaces allow participants to act. To take an action, the participant needs to define the space relative to his or her body. This means adopting an egocentric frame of reference. Once the spatial information on the virtual environment is adopted as the primary egocentric frame of reference, the feeling of being located in that space and being able to act there emerges (Hartmann et al. 2015). One might argue that these are quasi-egocentric coordinates (Briscoe 2009), since the participant in fact assumes the position of his representation in the virtual space. Not only must the overall spatial logic of the mediated space and sufficient cueing be aligned with the participant’s sense of being in that space and being able to act there; the environment must additionally support the performance of plausible actions given the participant’s frames of reference. Thus, virtual reality allows localizing the self beyond the body’s boundaries. It allows one to extend the sense of one’s own body into a virtual space and to use one’s representation in the virtual environment as a tool to perform actions in that space.


5.3 Body Boundaries and Distance Sensing

Philosophers have long realized that the body “extends” into its action space and that it is not constrained by its physical boundaries. Unlike the classical cognitive science view, which postulates that the mind is an abstract information processing system, some neuroscientists and cognitive scientists have recently embraced the view that cognition is shaped by the body and bodily interactions with the environment, and that in that sense it extends beyond the skull. The theory of distributed cognition departs from traditional cognitive theories in extending cognition even beyond the individual level: it does not expect all cognitive events “to be encompassed by the skin or skull of an individual” (Hollan et al. 2000, p. 176) and it looks for cognitive processes wherever they occur, not just in the brain, positing that they may be distributed across the members of a social group, in their interaction with the environment, or through time. Apart from these developments, debates continue on how best to explain the phenomenal self and whether the reductionist approach can fully explain it.

Virtual reality affords taking actions in a projected space and thus creates a sense of extended corporeality. The sense of being present and acting in a virtual space, together with the self-transformations that take place there and the subjective conscious awareness associated with these experiences, appears to widen the gap in current understanding of the role of corporeality in cognition, perception and adaptive action. Body representation is one of the most commonly studied topics in virtual reality; other commonly studied topics are navigation, and object selection and manipulation (Lotte et al. 2012). Achieving a goal-directed movement in space requires processing of spatial information regarding the involved objects and our own body, as well as integration of incoming visual, proprioceptive and motor signals. This integration apparently depends on an internal representation of the body, or body schema (Sect. 2.3.1). The body schema updates its status with incoming information from any of the contributing sources of information (Sekiyama 2006). Another way to conceptualize the integration of bodily movements and the space around the body is via the concept of the body matrix (Moseley et al. 2012) (Sect. 1.3.1).

Since we can identify a location from various inputs, such as sight, sound, smell, or touch, space has been described as a “supramodal construct not limited to a specific sensation” (Colby and Goldberg 1999, p. 320). The various aspects of experiencing space, such as visual, tactile, kinesthetic and thermal, may affect social interaction. For instance, thermal space is defined by the emission and detection of the body’s skin temperature, which can also indicate emotional states (e.g. blushing); when the thermal spaces of two (or more) people overlap, allowing perception of the other person by olfaction as well, the emotions of the other can exert a certain chemical influence on the perceiver (Hall 1969). Anthropologists consider these other aspects of space as tightly related to the sense of self, so much so that Hall (1969) differentiates among the visual, kinesthetic, tactile and thermal aspects of self, arguing that a person’s environment determines which of these aspects will develop and which will be repressed. Virtual environments are
still mostly visual spaces, with the other aspects of space-relevant sensory information relatively underrepresented, which limits the scope of their applicability. Nevertheless, there is much to be considered regarding the spatial information that is typically available in virtual environments. As an example, consider a change in the distance separating interlocutors in conversation that is indicated by a shift in voice. Different conversational distances require different types of voice, just as different types of social relationships require different physical distances among the interactants. For North Americans, voice shifts are associated with a range of distances, from a soft whisper (in close, intimate situations) and an audible whisper (at close conversational distance) to a soft voice (neutral), full voice (public distance) and loud voice (e.g. when talking to a group) (Hall 1990). There are also important cross-cultural differences related both to the sense of distance and to the perception of space more generally. Regarding the latter, for example, perception of space in the West includes perception of objects and excludes perception of the spaces between them, whereas in Japan the spaces are perceived as ma, an intervening interval (Hall 1969). As for the former, we turn to proxemics.

5.3.1 Virtual Proxemics

Where people stand in relation to each other signals their relationship, or how they feel toward each other, or both. (Hall 1969, p. 120)

Humans and other species use spacing mechanisms to regulate distance when they interact. For instance, in birds and mammals such distances include a flight distance and a critical distance for encounters with members of different species, as well as social and personal distances for encounters with members of the same species. Writing about the ways in which human beings use distancing in social space to regulate interactions, the American anthropologist Edward T. Hall introduced the term proxemics. He defined proxemics as “the interrelated observations and theories of man’s use of space as a specialized elaboration of culture” (Hall 1969, p. 1). Starting from the idea that communication is the heart of culture, and drawing to some extent on the controversial Sapir-Whorf hypothesis,5 according to which the language we speak shapes the way we think, Hall claimed that people of different cultures inhabit different perceptual worlds, that we are interlocutors with our environments, and that architects, city planners and builders need to consider our proxemic needs.6

5 The Sapir-Whorf hypothesis was proposed by two American anthropological linguists, Edward Sapir (1884–1939) and Benjamin Lee Whorf (1897–1941). Based on their work on American Indian languages, which differ structurally from Indo-European languages, they argued that variations among languages are substantial and unpredictable. The controversial claim is that structural differences among languages encode different world views of their speakers, i.e. we conceptualize the world by means of categories and relations that come from our particular language. For instance, Whorf (1956) compared and contrasted English and Hopi, arguing for two distinct conceptions of time in the two languages, defined by whether cyclic experiences are classed as ordinary objects (English) or as recurrent events (Hopi). Linguistic determinism postulates that language determines thought, and thus speakers of different languages must experience the world differently.

6 As an example, Hall points to Frank Lloyd Wright, one of the most famous twentieth-century architects, whose success he ascribes to Wright’s high sensitivity to individual and cultural differences in people’s experience of space.

Based on his observations of American culture, Hall proposed a classification of interaction distances into four types: intimate, personal, social, and public distance. For each type, he differentiated between a close and a far phase, with clear distinctions not only in the amount of space between the interactants, but also in the type of information about the other that is accessible at a specific distance, the level of voice used in conversation, and body stance. Here we briefly present each of Hall’s four types of distance.

Intimate distance at close phase is the distance of direct involvement with another body, as in lovemaking or wrestling. At this distance, we unmistakably recognize our involvement with another body through sensory inputs from multiple channels. Vocalization that occurs at this distance is less relevant than communication via other sensory channels, and it is mostly involuntary. At such a minimal distance between two bodies, a whisper has the effect of “expanding the distance” (Hall 1969, p. 117). The far phase of intimate distance is the distance of 6–18 inches from another body, at which we can still detect the heat loss or gain from the other person’s body and the odor of their breath. This is the distance of intimate conversation, which imposes a low voice or a whisper. Importantly, American proxemic patterns qualify the use of intimate distance in public as inappropriate; in crowded environments bodies must remain as immobile as possible to preclude accidental touching, and eye contact must be avoided, except for a passing glance. These tactics are meant to exclude intimacy from the intimate spatial relations imposed on strangers. Furthermore, any accidental touch requires immediate withdrawal or, if that is impossible, the muscles must remain tense; it is taboo to relax them and experience the contact differently. However, cultures differ with regard to what constitutes intimate space, and an accidental touch by a stranger might be better tolerated in some cultures than in others.

Personal distance at close phase is a distance between 1.5 and 2.5 feet, at which we can grasp and hold the other person. Verbal communication at this distance is characterized by a soft voice or whisper. Spatially, two people barely have elbow room, but the distance does not allow detection of the heat or odor of the other body. The far phase of personal distance involves distances just outside touching range, between 2.5 and 4 feet. This distance allows a moderate voice level in verbal communication.

Social distance at close phase spans distances of 4–7 feet. It is used for conducting impersonal business, in communication with co-workers and in casual social gatherings. The voice level at this distance is normal. Social distance at far phase includes distances between 7 and 12 feet. At such distances, the fine details of the face are not noticeable, but the details of skin texture, hair and teeth, and the condition of clothes are observable. When conversing at this distance, it is important to keep
eye contact. Failing to do so indicates lack of interest or disrespect and usually ends the conversation. The voice at this distance is louder, to avoid reducing the social distance to a personal distance. Hall furthermore observes that at social distances Americans speak in a lower voice than the Arabs, the Spaniards, the South Asian Indians and the Russians, but in a higher voice than the English upper class, the Southeast Asians and the Japanese. Additionally, social distance at far phase can effectively “insulate or screen people from each other”, but a person less than 10 feet away in a waiting area makes a receptionist feel compelled to talk with that person (Hall 1969, p. 123).

Public distance precludes physical involvement with others, allowing, even at its close phase, which is any distance from 12 to 25 feet, flight reactions if the subject feels threatened. It is associated with a formal style, as in carefully used formal language. The voice at this type of distance from others is loud, yet not at full volume. At this distance, the fine details of another person’s face are not observable. Public distance at far phase is a distance of 25 feet or more. Nonverbal communication is carried out by means of gestures and body stance, because facial expressions and subtle movements cannot be easily perceived at this distance. This type of distance is used by public figures. Since a normal voice cannot reach others at this distance, the full public speaking voice is used, which is louder but also combined with a slowed tempo and more clearly pronounced words (the so-called “frozen style”). This is the distance of public address and theatrical performance, and it is well outside the circle of personal involvement.

Thus, the four types of interaction distance observed by Hall (1969) differ with regard to the type of social transaction, that is, the type of relationship between the interactants, their feelings, and their intentions. Since these are American proxemic patterns, other cultures may be expected to recognize and prioritize other patterns of social interaction, such as family/non-family (e.g. in Spain and Portugal), or the caste/outcast system (e.g. in India). Thus, it is important to recognize spatial needs at the cultural and individual levels, because the spaces in which people have to live and work can confine them, forcing them “into behaviors, relationships, or emotional outlets that are overly stressful” (Hall 1969, p. 129).

Are Hall’s observations on proxemics applicable to virtual environments? Intrusions of personal space, whether they consist of decreasing physical distance from the observer or making eye contact with a stranger who happens to invade that space, have been consistently associated with avoidant behaviors, reports of discomfort, increased anxiety registered through objective measures such as changes in skin conductance, and defensive posture. The proximity effect was found even for 3D images of people as intruders of personal space, and as the viewing distance increased above 0.5 m, the effect decreased (Wilcox et al. 2006). Even sounds alone, when spatially distributed to suggest invasion of personal space in a virtual environment, have the potential to induce a sense of threat and discomfort in participants. For instance, manipulating the distance of characters approaching participants in a virtual environment, Kobayashi et al.
For instance, manipulating the distance of characters approaching participants in a virtual environment, Kobayashi et al. (2015) found that when the participants' personal space was invaded, both psychological and physiological measures reflected an increased level of discomfort. Besides confirming that Hall's observation about the importance of having control over one's own personal space applies to virtual environments, this finding also suggests that the spatial distribution of sounds plays an important role in increasing the sense of presence in such environments.

Furthermore, participants in an immersive virtual environment study kept a greater distance from embodied agents than from avatars when approaching virtual characters in a virtual room, and they kept a larger distance from agents that engaged them in mutual gaze (Bailenson et al. 2003). The latter finding is consistent with Hall's (1969) observation that at close physical distances, people who are not personally or intimately involved typically avoid mutual gaze, thereby excluding intimacy from a close spatial relation. It is also consistent with the intimacy equilibrium model, which postulates that mutual gaze and interpersonal distance are inversely related, as likewise demonstrated in an immersive virtual environment in which humanoid agents modulated not only participants' proxemic behavior but also their memory (Bailenson et al. 2001): when virtual humans approached participants and invaded their personal space, participants moved further away. Taken together, these findings underline the importance of the type of virtual human for virtual proxemics: agents are controlled by preset computer algorithms, whereas avatars are controlled by participants.

It appears, then, that the proxemic patterns observed by Hall also hold in virtual environments. Briefly, to evoke a positive reaction in participants, their personal space should be respected; invading it evokes a negative reaction (Wilcox et al. 2006; Durlach and Slater 2000). However, it has often been noted that participants in immersive virtual environments retain a degree of awareness that they are in a technology-mediated environment. In Bailenson et al.'s (2003) study, for instance, participants remained aware that the virtual characters could not really touch, bump into, or otherwise harm them, despite automatic responses indicative of presence, which led the authors to conclude that "interpersonal interactions in IVE may be fundamentally different than in everyday interaction in the physical world" (p. 14). This means that even though Hall's proxemic patterns apply to virtual environments to some extent (e.g. intrusion of personal space causes discomfort), the patterns observed in physical spaces do not fully coincide with those observed in virtual spaces. Finally, the impact of cultural differences on virtual proxemics is yet to be sufficiently explored, and virtual reality design needs to recognize cross-cultural differences in proxemic patterns when considering how best to create environments, avoiding predefined, generic social space molds as a universal solution.
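Hall's distance classes are concrete enough to be operationalized in a virtual environment. The sketch below, a minimal illustration in Python, maps the distance between a user and a virtual character onto Hall's (1969) zones; the zone boundaries follow Hall, while the function name, the feet-to-metres conversion, and the gaze-aversion note are our own illustrative assumptions rather than an implementation from any of the studies cited above.

```python
FT = 0.3048  # metres per foot

# Upper bounds (in metres) of Hall's (1969) interaction distances.
ZONES = [
    (1.5 * FT, "intimate"),
    (4.0 * FT, "personal"),
    (12.0 * FT, "social"),
    (25.0 * FT, "public (close phase)"),
    (float("inf"), "public (far phase)"),
]

def hall_zone(distance_m: float) -> str:
    """Return the Hall zone for an interpersonal distance given in metres."""
    for upper_bound, label in ZONES:
        if distance_m < upper_bound:
            return label

# A virtual agent might, for example, avert mutual gaze once it enters the
# user's personal zone, consistent with the intimacy equilibrium model.
print(hall_zone(0.4))   # intimate
print(hall_zone(2.0))   # social
print(hall_zone(10.0))  # public (far phase)
```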

5.4 Spatial Memory and the Hippocampus

The hippocampus is a core brain region that supports memory. Traditionally, it has been associated with episodic memory. Ever since the concept of episodic memory was introduced (Tulving 1972), research on this topic has focused on establishing differences between this and other types of memory (Moscovitch et al. 2017). For instance, episodic memory refers to memories of past events or autobiographical episodes, whereas semantic memory refers to our knowledge about the world, including language and lexical knowledge. Together, they are often referred to as declarative memory and contrasted with procedural memory and other forms of memory that, by definition, are not consciously accessible knowledge. However, recent proposals challenge such a strongly dissociative view, reinterpreting old findings within a more interactive view of memory subsystems and presenting new evidence for the role of episodic memory in other cognitive functions, from perception and working memory to language and semantic memory, empathy, problem solving, and decision making (Chadwick et al. 2012; Yonelinas 2013; Renoult et al. 2014; Moscovitch et al. 2017).

The new evidence also extends the hippocampal role beyond episodic memory to other domains. For instance, some findings suggest a hippocampal role in the formation of statistical context memories in a search task (Geyer et al. 2012; Zinchenko et al. 2018), in imagining the future, and in spatial navigation (Maguire and Mullally 2013). The extended role of the hippocampus allows for the notion that even word retrieval, as in a semantic fluency test (a test of the ability to retrieve words according to a semantic criterion, such as category membership; for example, naming as many animals as possible in 1 min, or 1.5 min in some studies), may invoke a recollective episodic process. In the domain of language, research implicates the hippocampus in a range of processes, from word recognition (Bird and Burgess 2008) and syntactic integration (Meyer et al. 2005) to discourse processing (Duff and Brown-Schmidt 2012), generally suggesting an integrative, binding-of-information-across-representations role for the hippocampus.

The hippocampal function has been debated with regard to episodic memory, relational binding, and cognitive map views, among others (Kent et al. 2016). An ongoing debate concerns the role of the hippocampus in consolidated memories (Maguire 2014). Contemporary theories typically assume that new information is temporarily stored in the hippocampus and then transferred to distributed cortical networks for permanent storage (Frankland and Bontempi 2005). However, views differ regarding the hippocampal role in the retrieval of consolidated memories. While some researchers assume that once a memory trace is fully consolidated the hippocampus is no longer necessary for retrieval (e.g. Squire and Alvarez 1995), others argue that the hippocampus is involved in retrieving not only recent but also remote, consolidated memories (e.g. Nadel and Moscovitch 1997). A third approach is anchored in the notion of hippocampal indexing (Teyler and DiScenna 1986): instead of storing details about an event, the hippocampus serves as an index of the neocortical activity patterns associated with memory representations (Yassa and Reagh 2013). This allows the hippocampus to induce the recall of neocortical memory representations.
Since Marr's (1971) seminal work, research on hippocampal computations such as pattern separation and pattern completion has produced a large body of evidence, mostly obtained from animal data and, to a lesser extent, from human data (Horner and Doeller 2017). Pattern separation is the process of encoding overlapping stimuli as distinct memory representations, whereas in pattern completion a preexisting representation is retrieved from a partial or degraded cue. When these computations are studied in animals, task performance can be related directly to neuronal activity at the level of cell populations. Since ethical constraints preclude such studies with human subjects, in human studies the terms pattern separation and pattern completion refer to behavioral discrimination and generalization, respectively (Liu et al. 2016). Importantly, these fundamental computations are supported by different hippocampal regions, with the dentate gyrus/CA3 supporting pattern separation and region CA1 supporting pattern completion (Bakker et al. 2008), although CA3 may support both types of computation (Liu et al. 2016). In contrast to this view, Kirwan et al. (2012) suggest that pattern completion and pattern separation are processes "carried out widely throughout the brain" and that the hippocampus stands out because it executes them more rapidly. Recently, Horner et al. (2015) studied pattern completion of complex "events" consisting of words for locations, people, and objects/animals: although each element type activated a distinct neocortical region that was reinstated at retrieval, memory performance was predicted by hippocampal activity during encoding. These data thus confirm that hippocampal involvement is wider than originally thought.
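As a highly simplified illustration of pattern completion, the sketch below uses a Hopfield-style autoassociative network to recover a stored binary pattern from a degraded cue. This is a classic textbook toy under our own parameter choices, not a model of CA3 circuitry or of the experiments cited above.

```python
import numpy as np

def train(patterns: np.ndarray) -> np.ndarray:
    """Hebbian outer-product learning over rows of +/-1 patterns."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0.0)  # no self-connections
    return W / len(patterns)

def complete(W: np.ndarray, cue: np.ndarray, steps: int = 10) -> np.ndarray:
    """Iterate updates; the state settles toward the closest stored pattern."""
    s = cue.astype(float)
    for _ in range(steps):
        s = np.sign(W @ s)
        s[s == 0] = 1.0  # break ties consistently
    return s

rng = np.random.default_rng(0)
stored = rng.choice([-1.0, 1.0], size=(3, 64))  # three stored "memories"
W = train(stored)

cue = stored[0].copy()
cue[32:] = 1.0  # degrade half of the cue
recalled = complete(W, cue)
print("fraction of units recovered:", np.mean(recalled == stored[0]))  # typically 1.0
```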
The hippocampus plays a key role in spatial cognition (O'Keefe and Nadel 1978), and different hippocampal regions support different functions in this regard. Recent evidence suggests that the posterior hippocampus supports spatial cognition regardless of whether episodic or semantic memory is being retrieved, and that the functional roles of the posterior and the anterior hippocampus can be further differentiated: the former is engaged when we think about spatial details and spatial relations, as in navigation, whereas the latter is preferentially engaged when we think about locations or contexts (Nadel et al. 2012). The hippocampus is considered to be "constantly constructing spatially coherent scenes", with scenes central to this type of information processing (Maguire and Mullally 2013, p. 1180). The hippocampal formation, consisting of the hippocampus and the entorhinal cortex, is largely viewed as supporting memory formation and navigation, as well as the spatial context of retrieved memories (Miller et al. 2013). Like place cells in the hippocampus, which fire when an animal or human is at a certain position in the environment (Bellmund et al. 2018), other functionally specialized cell types also support aspects of spatial processing (Moser et al. 2015). For example, grid cells in the entorhinal cortex encode a coordinate system of the environment, thereby supporting spatial navigation (Rowland et al. 2016). The brain's spatial navigation system contains further cell types, such as head direction cells, goal direction cells (which signal egocentric direction to navigation goals), speed cells, and boundary vector cells (Colby 1998; Bellmund et al. 2018).

Regarding navigation strategies, the experimental literature often distinguishes two broadly defined types: the egocentric strategy, in which we rely on our knowledge of routes and landmarks and the order of their appearance in a sequence leading to the target, and the allocentric strategy, in which we rely on a mental representation of the environment, including the distances and directions of the target and relevant landmarks (Jheng and Pai 2009). Importantly, spatial representations are supported by other brain regions as well, which interact with the medial temporal lobe to support neural coding during orientation and navigation (Herweg and Kahana 2018). For instance, the posterior parietal cortex is crucial for integrating sensory inputs that convey information about spatial coordinates and for translating this information into egocentric frames of reference (Herweg and Kahana 2018), while the medial temporal lobe predominantly supports the formation of allocentric spatial maps. These are not isolated processes, as illustrated, for instance, by the egocentric recall of allocentric information.

It has been suggested that place and grid cells can encode positions in a cognitive space that extends far beyond Euclidean space for navigation (Bellmund et al. 2018). Echoing Edward Tolman's (1948) idea that cognitive maps underlie not only spatial navigation but cognition more generally, as well as some early attempts to explicate this idea (Pick 1999), a specific proposal on a "spatial representational format of cognition" has recently been put forth, arguing that the neural mechanisms supporting spatial navigation are implicated in a wide range of functions across cognitive domains (Bellmund et al. 2018; Bottini and Doeller 2020). Summing up, the hippocampus is involved in a range of cognitive functions, with a prominent role in spatial information processing.
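Computationally, the translation between reference frames mentioned above amounts to a translation followed by a rotation. The 2D sketch below expresses an allocentric (world-frame) landmark position in egocentric (body-centred) coordinates; the conventions used (heading 0 = facing along world +x; egocentric +x = ahead, +y = left) are our own illustrative choices, not taken from the cited studies.

```python
import numpy as np

def allocentric_to_egocentric(landmark_xy, agent_xy, heading_rad):
    """Express a world-frame landmark in body-centred coordinates."""
    dx, dy = np.subtract(landmark_xy, agent_xy)   # translate to the agent
    c, s = np.cos(heading_rad), np.sin(heading_rad)
    # Rotate by the negative of the agent's heading.
    return np.array([c * dx + s * dy, -s * dx + c * dy])

# A landmark 5 m "north" of an agent facing "east" lies directly to its left...
print(allocentric_to_egocentric((0.0, 5.0), (0.0, 0.0), 0.0))    # [0. 5.]
# ...and a landmark 3 m east of an agent facing west lies directly behind it.
print(allocentric_to_egocentric((3.0, 0.0), (0.0, 0.0), np.pi))  # approx. [-3, 0]
```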

5.4.1 Spatial Working Memory in Virtual Reality

Another memory subsystem critical for spatial tasks is working memory. Here we take a closer look at its role in one type of spatial task: wayfinding. Working memory is a memory system that supports the short-term maintenance and manipulation of information. According to an influential model (Baddeley and Hitch 1974; Baddeley 1986; see Baddeley 2003 for an update), working memory consists of three subsystems: the phonological loop, the visuo-spatial sketchpad, and the central executive. The phonological loop supports the maintenance and processing of verbal information; the visuo-spatial sketchpad supports the maintenance and processing of visual and spatial information, apparently playing an important role in spatial orientation and geographical knowledge; and the central executive coordinates the other two subsystems, in addition to selecting reasoning and storage strategies (Baddeley and Hitch 1974; Baddeley 1986, 2003).

In an elegantly designed study, Meilinger et al. (2008) investigated which of these subsystems supports spatial orientation. This question is important because it is not clear whether wayfinding information is represented in a verbal format (as in verbal directions), visually (as an image or snapshot of the environment), or spatially (in terms of more abstract spatial representations, such as the geometric layout of an environment).
To answer this question, they designed an experiment with secondary verbal, spatial, and visual tasks and tested whether these tasks interfered with encoding wayfinding knowledge of a virtual city. The virtual environment was displayed on a 220-degree screen. The participants were 24 cognitively healthy subjects (mean age 24 ± 4 years) who had never been to Tübingen. They had to learn two different routes through Virtual Tübingen while, importantly, a secondary task (verbal, visual, or spatial) disrupted the learning. The secondary verbal task was a lexical decision task in which participants had to decide for each word they heard whether it existed in German (e.g. "Mintag" is not a word in German, but "Montag" is). In the secondary visual task, participants had to imagine a clock with its hands pointing at the time they heard (e.g. "20 past 4") and indicate whether the two hands pointed to the same half of the clock face. The secondary spatial task required participants to indicate the direction from which a sound came: the left, the right, or the front. The three secondary tasks were balanced for difficulty. In the subsequent wayfinding phase there were no secondary tasks, and the participants had to "find and 'virtually walk' the two routes" using a joystick (Meilinger et al. 2008, p. 756).
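To make the secondary visual task concrete, the sketch below renders one plausible reading of the required judgment: whether the hour and minute hands fall in the same half of an imagined clock face. The exact criterion (splitting the dial along the 12–6 axis) and the function name are our assumptions for illustration, not details reported by Meilinger et al. (2008).

```python
def same_half(hour: int, minute: int) -> bool:
    """Do the hour and minute hands point into the same half of the dial?"""
    minute_angle = minute * 6.0                      # 6 degrees per minute
    hour_angle = (hour % 12) * 30.0 + minute * 0.5   # hour hand drifts with minutes
    # Split the dial along the 12-6 axis: 0-180 deg = right half, 180-360 = left.
    return (minute_angle % 360 < 180) == (hour_angle % 360 < 180)

print(same_half(4, 20))   # "20 past 4": both hands on the right half -> True
print(same_half(10, 20))  # minute hand right, hour hand left -> False
```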
The findings indicate that the verbal and the spatial secondary tasks considerably interfered with wayfinding performance, whereas the secondary visual task had only a mild effect. Based on this evidence, the authors conclude that the phonological loop and the spatial component of the visuo-spatial sketchpad are involved in encoding information for spatial orientation. Verbal encoding of environmental information is suggested to contribute to spatial orientation in the form of verbal directions (e.g. "left, then right") generated to support memory; the secondary verbal task inhibited the production of such directions, which worsened wayfinding performance. Similarly, the secondary spatial task, but not the secondary visual task, interfered with wayfinding, indicating that abstract spatial features, such as the geometry of an environment, may be more important for spatial orientation than visual surface features and reliance on pictorial information, i.e. a snapshot or a conventional map of the environment.

This study is interesting because it suggests that multiple working memory subsystems play a role in the encoding and recall of environmental information. Furthermore, drawing on Paivio's theory of dual coding (Paivio 1971, 1986, 1991), according to which verbally presented information is also coded in a visuo-spatial format, the authors propose that "environmental information is encoded not only in a spatial format, but also in a verbal format" (p. 765). While this conclusion holds for route knowledge, other representations of the environment do not necessarily assume a verbal format. A conventional map used in navigation differs from routes in form, despite the overlap in the purpose they serve. A conventional map is a two-dimensional representation of some space; it contains representations of places interconnected by a set of spatial transformation rules. Routes, on the other hand, consist of specific instructions on how to get from point A to point B; a route is "a list of stimulus-response-stimulus commands which lead the rambler from one sight to another…" (O'Keefe and Nadel 1978, p. 81). These instructions have been divided into a guidance type and an orientation or direction type. Guidance is a route instruction that directs us toward a prominent landmark or object (a bridge, mountains, the destination itself) and requires aligning the egocentric body axis with that landmark. The orientation type of instruction aligns the egocentric body axis with some other axis, requiring rotations within the egocentric body space while "maintaining the stimuli in a particular part of that space" (O'Keefe and Nadel 1978, p. 84). While maps are richer in information content than routes, and reading maps is more complicated than following route instructions, maps are more flexible because they allow a wide choice of possible paths between places. Thus, routes by definition involve information in both verbal and spatial formats, but this does not hold for maps.
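The difference in flexibility can be made concrete: a route fixes a single instruction sequence, whereas a map, representable as a weighted graph, supports deriving many alternative paths between the same two places. The place names and distances below are invented for illustration, and the shortest-path search is standard Dijkstra, not a claim about how cognitive maps are actually searched.

```python
import heapq

# Route knowledge: a fixed stimulus-response-stimulus list.
route = ["ahead to the bridge", "left at the church", "ahead to the square"]

# Map-like knowledge: a graph of places with link costs in metres (invented).
graph = {
    "home":   {"bridge": 120, "park": 200},
    "bridge": {"church": 80},
    "park":   {"church": 150, "square": 300},
    "church": {"square": 90},
    "square": {},
}

def shortest_path(graph, start, goal):
    """Dijkstra's algorithm: a map affords a choice among paths; a route does not."""
    frontier, visited = [(0, start, [start])], set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph[node].items():
            heapq.heappush(frontier, (cost + weight, neighbor, path + [neighbor]))
    return None

print(shortest_path(graph, "home", "square"))
# (290, ['home', 'bridge', 'church', 'square'])
```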

There is another, more abstract concept of map relevant in this context: the cognitive map. One common misconception is that a cognitive map is a mental analogue of a conventional, physical map. Discussing Tolman's (1948) concept of the cognitive map, often defined as an allocentric spatial representation that integrates spatial relations among locations (e.g. Wang 2012), O'Keefe and Nadel (1978) point out the following:

The cognitive map is not a picture or image which 'looks like' what it represents; rather it is an information structure from which map-like images can be reconstructed and from which behavior dependent upon place information can be generated. (O'Keefe and Nadel 1978, p. 78)

Cognitive maps are considered a high form of spatial knowledge:

These cognitive maps are known to be Euclidean, sense-preserving representations of the environment, that is, they encode the metric relations (distances and angles) and the sense relation (right versus left). (Gallistel 1999, p. 25)
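Gallistel's sense relation has a direct computational reading: whether a landmark lies to the left or the right of a directed line between two other points reduces to the sign of a 2D cross product. The coordinates below are invented for illustration.

```python
def sense(a, b, c):
    """Is point c to the left or the right of the directed line from a to b?"""
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    if cross > 0:
        return "left"
    return "right" if cross < 0 else "collinear"

print(sense((0, 0), (1, 0), (0.5, 2.0)))   # 'left' of an eastward heading
print(sense((0, 0), (1, 0), (0.5, -2.0)))  # 'right'
```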

Although Tolman defined the concept of the cognitive map while experimenting with rats, and other evidence also indicates that cognitive maps support animal navigation, it has been questioned whether all animals possess this type of spatial knowledge. Unlike cognitive maps, route knowledge is considered a more basic form of spatial knowledge. While cognitive maps are allocentric representations, route knowledge includes both egocentric and allocentric frames of reference. Given this difference in frames of reference, relying on a cognitive map to navigate a specific environment requires translating allocentric into egocentric coordinates; conversely, for route knowledge to generate a cognitive map, information about the distances and angles between objects in the space needs to be added. It thus appears that the nature of routes affords both spatial and verbal encoding of spatial information (as in verbal instructions), whereas an image of some space is more like a conventional map, which relies more rigidly on its canonical encoding format. Both, however, draw on information contained in the cognitive map, which does not include a verbal component, as also indicated by the evidence that animals form cognitive maps.

Overall, perception and conceptualization of space in immersive virtual environments reflect the fact that these virtual experiences are technology-mediated. This is reflected in a range of phenomena: virtual embodiment and the use of quasi-egocentric coordinates, reliance on different cognitive strategies for spatial tasks in the two types of environment, differences in the neural and electrophysiological patterns associated with movement when navigating virtual versus real environments, and, at least to some extent, virtual proxemics. Although virtual environments share many aspects of spatial processing with the real world, some important aspects of experiencing space set them apart.

References

Atkinson, J.: A neurobiological approach to the development of 'where' and 'what' systems for spatial representation in human infants. In: Eilan, N., McCarthy, R., Brewer, B. (eds.) Spatial representation: Problems in philosophy and psychology, pp. 325–339. Oxford University Press, Oxford (1999)
Baddeley, A.D.: Working memory. Oxford University Press, Oxford (1986)
Baddeley, A.D.: Working memory: looking back and looking forward. Nat. Rev. Neurosci. 4, 829–839 (2003)
Baddeley, A.D., Hitch, G.J.: Working memory. In: Bower, G.H. (ed.) The psychology of learning and motivation, vol. VIII, pp. 47–90. Academic Press, New York (1974)
Bailenson, J.N., Blascovich, J., Beall, A.C., Loomis, J.M.: Equilibrium theory revisited: mutual gaze and personal space in virtual environments. Presence 10, 583–598 (2001)
Bailenson, J.N., Blascovich, J., Beall, A.C., Loomis, J.M.: Interpersonal distance in immersive virtual environments. Pers. Soc. Psychol. Bull. 29, 1–15 (2003)
Bakker, A., Kirwan, B.C., Miller, M., Stark, C.E.L.: Pattern separation in the human hippocampal CA3 and dentate gyrus. Science 319, 1640–1642 (2008)
Baumgartner, T., Valko, L., Esslen, M., Jaencke, L.: Neural correlate of spatial presence in an arousing and noninteractive virtual reality: an EEG and psychophysiology study. Cyberpsychol. Behav. 9, 30–45 (2006)
Beck, L., Wolter, M., Mungard, N., Vohn, R., Staedtgen, M., Kuhlen, T., et al.: Evaluation of spatial processing in virtual reality using functional magnetic resonance imaging (fMRI). Cyberpsychol. Behav. Soc. Netw. 13, 211–215 (2010)
Bellmund, J.L.S., Gardenfors, P., Moser, E.I., Doeller, C.F.: Navigating cognition: spatial codes for human thinking. Science 362(6415) (2018). https://doi.org/10.1126/science.aat6766
Bird, C.M., Burgess, N.: The hippocampus supports recognition memory for familiar words but not unfamiliar faces. Curr. Biol. 18, 1932–1936 (2008)
Blackburn, S.: The Oxford Dictionary of Philosophy. Oxford University Press, Oxford (1994)
Bohbot, V.D., Copara, M.S., Gotman, J., Ekstrom, A.D.: Low-frequency theta oscillations in the human hippocampus during real-world and virtual navigation. Nat. Commun. 8(1) (2017)
Bottini, R., Doeller, C.F.: Knowledge across reference frames: cognitive maps and image spaces. Trends Cogn. Sci. 24, 606–619 (2020)
Briscoe, R.: Egocentric spatial representation in action and perception. Philos. Phenomenol. Res. 79(2), 423–460 (2009)
Bystrom, K.E., Barfield, W., Hendrix, C.: A conceptual model of the sense of presence in virtual environments. Presence: Teleoperators and Virtual Environments 8, 241–244 (1999)
Chadwick, M.J., Bonnici, H.M., Maguire, E.A.: Decoding information in the human hippocampus: a user's guide. Neuropsychologia 50, 3107–3121 (2012)
Chalmers, D.J.: The conscious mind: In search of a fundamental theory. Oxford University Press, New York (1996)
Colby, C.L.: Action-oriented spatial reference frames in cortex. Neuron 20, 15–24 (1998)
Colby, C.L., Goldberg, M.E.: Space and attention in parietal cortex. Annu. Rev. Neurosci. 22, 319–349 (1999)
di Pellegrino, G., Làdavas, E.: Peripersonal space in the brain. Neuropsychologia 66, 126–133 (2015)
Donato, F., Moser, E.I.: A world away from reality. Nat. Neurosci. 533, 345 (2016)
Duff, M.C., Brown-Schmidt, S.: The hippocampus and the flexible use and processing of language. Front. Hum. Neurosci. 6, 69 (2012)
Durlach, N., Slater, M.: Presence in shared virtual environments and virtual togetherness. Presence 9(2), 214–217 (2000)
Ekstrom, A.D., Kahana, M.J., Caplan, J., Fields, T.A., Isham, E.A., et al.: Cellular networks underlying human spatial navigation. Nature 425, 184–188 (2003)
Frankland, P.W., Bontempi, B.: The organization of recent and remote memories. Nat. Rev. Neurosci. 6, 119–130 (2005)
Gallese, V.: From grasping to language: mirror neurons and the origin of social communication. In: Towards a Science of Consciousness, pp. 165–178 (1999)
Gallistel, C.R.: Animal navigation. In: The MIT encyclopedia of the cognitive sciences, pp. 24–26. MIT Press, Cambridge, MA (1999)
Geyer, T., Baumgartner, F., Müller, H.J., Pollmann, S.: Medial temporal lobe-dependent repetition suppression and enhancement due to implicit vs. explicit processing of individual repeated search displays. Front. Hum. Neurosci. 6, 272 (2012)
Gunzelmann, G., Lyon, D.R.: Mechanisms for human spatial competence. In: Barkowsky, T., et al. (eds.) Spatial cognition V, LNAI 4387, pp. 288–307. Springer (2007)
Hall, E.T.: The silent language. Anchor Books/Doubleday, New York (1990)
Hall, E.T.: The hidden dimension. Anchor Books, Doubleday & Company, Garden City, New York (1969)
Hartmann, T., Wirth, W., Vorderer, P., Klimmt, C., Schramm, H., Boecking, S.: Spatial presence theory: state of the art and challenges ahead. In: Lombard, M., Biocca, F., Freeman, J., IJsselsteijn, W., Schaevitz, R.J. (eds.) Immersed in media: Telepresence theory, measurement & technology, pp. 115–135. Springer (2015)
Haun, D.B.M., Rapold, C.J., Janzen, G., Levinson, S.C.: Plasticity of human spatial cognition: spatial language and cognition covary across cultures. Cognition 119, 70–80 (2011)
Havranek, M., Langer, N., Cheetham, M., Jäncke, L.: Perspective and agency during video gaming influences spatial presence experience and brain activation patterns. Behav. Brain Funct. 8, 1–13 (2012)
Herweg, N.A., Kahana, M.J.: Spatial representations in the human brain. Front. Hum. Neurosci. 12 (2018)
Hollan, J., Hutchins, E., Kirsh, D.: Distributed cognition: toward a new foundation for human-computer interaction research. ACM Trans. Comput.-Hum. Interact. 7, 174–196 (2000)
Horner, A.J., Doeller, C.F.: Plasticity in hippocampal memories in humans. Curr. Opin. Neurobiol. 43, 102–109 (2017)
Horner, A.J., Bisby, J.A., Bush, D., Lin, W.-J., Burgess, N.: Evidence for holistic episodic recollection via hippocampal pattern completion. Nat. Commun. 6, 1–11 (2015)
Ittelson, W.: Environment perception and contemporary perception theory. In: Ittelson, W. (ed.) Environment and cognition, pp. 141–154. Seminar Press, New York (1973)
Jeannerod, M., Anquetil, T.: Putting oneself in the perspective of the other: a framework for self-other differentiation. Soc. Neurosci. 3, 356–367 (2008)
Jheng, S.-S., Pai, M.-C.: Cognitive map in patients with Alzheimer's disease: a computer-generated arena study. Behav. Brain Res. 200, 42–47 (2009)
Kent, B.A., Hvoslef-Eide, M., Saksida, L.M., Bussey, T.J.: The representational-hierarchical view of pattern separation: not just hippocampus, not just space, not just memory? Neurobiol. Learn. Mem. 129, 99–106 (2016)
Kessler, K., Thomson, L.A.: The embodied nature of spatial perspective taking: embodied transformation versus sensorimotor interference. Cognition 114, 72–88 (2010)
Kirwan, B.C., Hartshorn, A., Stark, S.M., Goodrich-Hunsaker, N.J., Hopkins, R.O., Stark, C.E.L.: Pattern separation deficits following damage to the hippocampus. Neuropsychologia 50, 2408–2414 (2012)
Kobayashi, M., Ueno, K., Ise, S.: The effects of spatialized sounds on the sense of presence in auditory virtual environments: a psychological and physiological study. Presence 24(2), 163–174 (2015)
Lessels, S., Ruddle, R.A.: Movement around real and virtual cluttered environments. Presence: Teleoperators and Virtual Environments 14(5), 580–596 (2005)
Levelt, W.J.M.: Perspective taking and ellipses in spatial descriptions. In: Bloom, P., Peterson, M.A., Nadel, L., Garrett, M. (eds.) Language and space. MIT Press, Cambridge, MA (1999)
Liu, K.Y., Gould, R.L., Coulson, M.C., Ward, E.V., Howard, R.J.: Tests of pattern separation and pattern completion in humans—a systematic review. Hippocampus 26, 705–717 (2016)
Lotte, F., Faller, J., Guger, C., Renard, Y., Pfurtscheller, G., et al.: Combining BCI with virtual reality: towards new applications and improved BCI. In: Allison, B.Z., Dunne, S., Leeb, R., Millán, J., Nijholt, A. (eds.) Towards practical brain-computer interfaces, pp. 197–220. Springer (2012)
Maguire, E.A.: Memory consolidation in humans: new evidence and opportunities. Exp. Physiol. 99, 471–486 (2014)
Maguire, E.A., Mullally, S.L.: The hippocampus: a manifesto for change. J. Exp. Psychol. 142, 1180–1189 (2013)
Makany, T., Dror, I.E., Redhead, E.S.: Spatial strategies in real and virtual environments. Cogn. Process. 7(Suppl. 1), S63 (2006)
Marr, D.: Simple memory: a theory of archicortex. Philos. Trans. R. Soc. Lond. Ser. B: Biol. Sci. 262, 23–81 (1971)
Meilinger, T., Knauff, M., Bülthoff, H.H.: Working memory in wayfinding—a dual task experiment in a virtual city. Cogn. Sci. 32, 755–770 (2008)
Meyer, P., Mecklinger, A., Grunwald, T., Fell, J., Elger, C.E., Friederici, A.: Language processing within the human medial temporal lobe. Hippocampus 15, 451–459 (2005)
Miller, J.F., Neufang, M., Solway, A., Brandt, A., Trippel, M., Mader, I., et al.: Neural activity in human hippocampal formation reveals the spatial context of retrieved memories. Science 342, 1111–1114 (2013)
Moscovitch, M., Cabeza, R., Winocur, G., Nadel, L.: Episodic memory and beyond: the hippocampus and neocortex in transformation. Annu. Rev. Psychol. 67, 105–134 (2017)
Moseley, G.L., Gallace, A., Spence, C.: Bodily illusions in health and disease: physiological and clinical perspectives and the concept of a cortical 'body matrix'. Neurosci. Biobehav. Rev. 36, 34–46 (2012)
Moser, M.-B., Rowland, D.C., Moser, E.I.: Place cells, grid cells, and memory. Cold Spring Harb. Perspect. Biol. 7 (2015)
Nadel, L., Moscovitch, M.: Memory consolidation, retrograde amnesia and the hippocampal complex. Curr. Opin. Neurobiol. 7, 217–227 (1997)
Nadel, L., Hoscheidt, S., Ryan, L.R.: Spatial cognition and the hippocampus: the anterior–posterior axis. J. Cogn. Neurosci. 25, 22–28 (2012)
Normand, J.-M., Giannopoulos, E., Spanlang, B., Slater, M.: Multisensory stimulation can induce an illusion of larger belly size in immersive virtual reality. PLoS ONE 6 (2011)
Oh, C.S., Bailenson, J.N., Welch, G.F.: A systematic review of social presence: definition, antecedents, and implications. Front. Robot. AI 5 (2018)
O'Keefe, J.: Kant and the sea-horse: an essay in the neurophilosophy of space. In: Eilan, N., McCarthy, R., Brewer, B. (eds.) Spatial representation: Problems in philosophy and psychology, pp. 43–64. Oxford University Press, Oxford (1999)
O'Keefe, J., Nadel, L.: The hippocampus as a cognitive map. Oxford University Press, Oxford (1978)
Paivio, A.: Imagery and verbal processes. Holt, Rinehart & Winston, New York (1971)
Paivio, A.: Mental representations: a dual coding approach. Oxford University Press, New York (1986)
Paivio, A.: Dual coding theory: retrospect and current status. Can. J. Psychol. 45, 255–287 (1991)
Pick, H.: Cognitive maps. In: The MIT encyclopedia of the cognitive sciences, pp. 135–137. MIT Press, Cambridge, MA (1999)
Renoult, L., Davidson, P.S.R., Schmitz, E., Park, L., Campbell, K., Moscovitch, M., et al.: Autobiographically significant concepts: more episodic than semantic in nature? An electrophysiological investigation of overlapping types of memory. J. Cogn. Neurosci. 27, 57–72 (2014)
Sanchez-Vives, M.V., Slater, M.: From presence to consciousness through virtual reality. Nat. Rev. Neurosci. 6, 332–339 (2005)
Sekiyama, K.: Dynamic spatial cognition: components, functions, and modifiability of body schema. Jpn. Psychol. Res. 48, 141–157 (2006)
Sheridan, T.B.: Musings on telepresence and virtual presence. Presence 1, 120–126 (1992)
Slater, M.: Place illusion and plausibility can lead to realistic behavior in immersive virtual environments. Philos. Trans. R. Soc. B 364, 3549–3557 (2009)
Squire, L.R., Alvarez, P.: Retrograde amnesia and memory consolidation: a neurobiological perspective. Curr. Opin. Neurobiol. 5, 169–177 (1995)
Teyler, T.J., DiScenna, P.: The hippocampal memory indexing theory. Behav. Neurosci. 100, 147–154 (1986)
Tolman, E.C.: Cognitive maps in rats and men. Psychol. Rev. 55, 189–208 (1948)
Tulving, E.: Episodic and semantic memory. In: Tulving, E., Donaldson, W. (eds.) Organization of memory, pp. 381–403. Academic Press, New York (1972)
Turner, S., Turner, P., Carroll, F., O'Neill, S., Benyon, D., McCall, R., Smyth, M.: Re-creating the botanics: towards a sense of place in virtual environments. In: Proceedings of the Third Conference of the Environmental Psychology in the UK Network, pp. 23–25 (2003)
Wang, R.F.: Theories of spatial representations and reference frames: what can configuration errors tell us? Psychon. Bull. Rev. 19, 575–587 (2012)
Whorf, B.L.: Language, thought, and reality. MIT Press, Cambridge, MA (1956)
Wilcox, L.M., Allison, R.S., Elfassy, S., Grelik, C.: Personal space in virtual reality. ACM Trans. Appl. Percept. 3, 412–428 (2006)
Wirth, W., Hartmann, T., Böcking, S., Vorderer, P., Klimmt, C., Schramm, H., et al.: A process model of the formation of spatial presence experiences. Media Psychol. 9, 493–525 (2007)
Wissmath, B., Weibel, D., Schmutz, J., Mast, F.W.: Being present in more than one place at a time? Patterns of mental self-localization. Conscious. Cogn. 20, 1808–1815 (2011)
Yassa, M.A., Reagh, Z.M.: Competitive trace theory: a role for the hippocampus in contextual interference during retrieval. Front. Behav. Neurosci. 7, 107 (2013)
Yonelinas, A.P.: The hippocampus supports high-resolution binding in the service of perception, working memory and long-term memory. Behav. Brain Res. 254, 34–44 (2013)
Zinchenko, A., Conci, M., Taylor, P.C.J., Müller, H.J., Geyer, T.: Taking attention out of context: frontopolar transcranial magnetic stimulation abolishes the formation of new context memories in visual search. J. Cogn. Neurosci. 31, 442–452 (2018)

Index

A
Action
  action, joint, 39, 65, 66, 74, 75, 81, 101, 103, 104, 106
  action, synchronous, 104, 106
  action system, 99
  action, willed, 100
Affordances, 18–20, 81, 89–91, 94, 96, 97, 99, 103, 108
Agency, 7, 8, 34–37, 42, 52, 57, 64, 88, 89, 106–108
Allocentric
  coordinates, 96, 102, 130
  frames of reference, 130
Augmented reality, 1
Autoscopic hallucination, 44–46, 51
Autoscopic phenomena, 44–46, 51
Avatar, 7, 17–22, 37, 39, 63, 67, 73–82, 95, 96, 100, 101, 107, 108, 125
Awareness
  bodily awareness, 42, 103
  corporeal self-awareness, 35, 39
  self-awareness, 35, 39, 62, 78, 103

B
Behavioral confirmation, 21
Behavioral fidelity, 46, 74, 77, 79
Beliefs, 11, 26, 38, 58–62, 65, 71, 73, 105
Binding, 47, 126
Body
  body image, 33, 38, 39
  body matrix, 19, 23, 24, 121
  body representation, 21, 22, 38, 39, 121
  body schema, 7, 18–20, 23, 25, 38, 39, 46, 50, 71, 88–90, 108, 121
Brain-computer interface, 6, 24, 40, 41

C
CAVE, 10, 18, 22, 37, 40
Central nervous system, 51
Cognition
  social, 57–59, 62, 63, 65–67, 69, 70, 72, 73, 87, 98
Cognitive
  cognitive dissonance, 63
  cognitive map, 113, 126, 128, 130
  cognitive penetrability, 11, 13
  cognitive states, 11, 12, 43
  cognitive style, 13, 14
Co-localization, 81
Co-presence, 8, 39, 65, 80, 81
Cybersickness, 15
Cyborg, 6

D
Delusion, 13, 26, 27, 36, 62, 98
Depersonalization, 36, 43, 45
Disbelief
  disbelief, suspension of, 16, 26
Disembodiment, 7, 21, 44
Distributed cognition, 48, 49, 121

E
Ecstasy, 68
Egocentric
  coordinates, 93–96, 102, 119, 120, 130
  frames of reference, 92, 93, 117, 118, 120, 128, 130
Embodied simulation theory, 71, 97
Embodiment
  virtual embodiment, 17–20, 22, 95, 102, 107, 131

Encapsulation, 11, 12
Entrainment, 105, 107
Episodic memory, 50, 125, 126
Extended phenotype, 5

F
Fidelity
  behavioral, 46, 74, 77, 79
  minimal, 74
  visual, 74
First-person perspective, 19, 42–45, 51, 98
Folk psychology, 34, 60, 64, 71
Forward model, 38, 107, 108
Frames of reference
  allocentric, 118
  egocentric, 92, 93, 117–119

G
Gender, 14, 15, 20, 79
Gender similarity hypothesis, 15
Grounded cognition, 18

H
Hallucination, 25, 26, 36, 44–46, 51
Head-mounted display, 10, 21, 22, 37, 114
Heautoscopy, 44–46, 51
Hippocampus, 116, 125–128
Humor, 62, 79, 80

I
Illusion, 8, 10, 16–18, 20, 22–27, 39, 40, 47, 48, 51, 52, 73, 77, 96, 119
Immersion, 2, 7, 8, 22, 26, 81
Inner speech, 58
Intentionality
  co-intentionality, 81
Intentional states, 65, 71
Interactionism, 65, 66
Intimacy equilibrium model, 125
Inverse presence, 16

J
Joint action, 39, 65, 66, 74, 75, 81, 101, 103, 104, 106
Joint attention, 57

L
Latency
  system latency, 48
Locomotion
  locomotion interfaces, 37

M
Memory
  declarative, 126
  episodic, 50, 125, 126
  semantic, 126, 127
  spatial, 113, 125
  working, 49, 61, 113, 126, 128, 129
Mental models, 13, 14, 67, 113
Mind reading, 50, 57, 58, 60–62, 65, 66, 71, 72, 81
Mirror neurons
  mirror neuron system, 50, 70, 71
Mixed reality, 1, 19
Motion sickness, 42
Motor
  motor acts, 96, 97, 106
  motor control, 13, 38, 94, 99, 108
  motor imagery, 13, 24, 40, 41, 102
  motor representation, 70, 98, 101, 102
  motor theories of cognition, 98
Multisensory integration, 18, 39, 43, 44, 46, 48, 51, 52

N
Navigation
  navigation by thought, 40–42
  navigation strategies, 41, 127
Neuromatrix, 49, 50

O
Orientation, 47, 91, 92, 94, 113, 128, 130
Out-of-body experience, 44–46, 51, 52, 119

P
Pattern completion, 126, 127
Pattern separation, 126, 127
Perception, 4, 9–14, 17, 21–23, 26, 27, 33, 34, 38, 40, 43, 46–50, 57–59, 64, 69–73, 78, 88–94, 96, 98, 101–103, 106, 113–115, 117–122, 126, 130
Perspective systems, 118
Phantom limb experience, 19, 22
Phantom sensation, 50
Photorealism, 74
Place cells, 113, 117, 127
Place illusion, 8, 120
Plausibility illusion, 26, 120
Posthuman, 6, 7
Presence
  presence, breaks, 9, 13, 16, 120
  presence, inverse, 16
  presence selection mechanism, 10–12
Proprioception, 19, 22, 23, 43, 48, 103
Proteus effect, 20, 21
Proxemics, 60, 75, 122–125, 131

R
Realism, 3, 64, 74, 95, 115, 120
Relational models
  relational models theory, 67–69
Rubber hand illusion, 18, 20, 22, 23

S
Schizophrenia, 26, 27, 35, 36, 43, 44, 46, 62, 98
Self
  self-awareness, 35, 39, 62, 78, 103
  self-consciousness, 34, 35, 42–45, 51, 98, 107
  self-identification, 22, 42, 44–46, 51
  self-location, 42, 44, 45, 51
  self, minimal, 35–37
  self, narrative, 35
  self, phenomenal, 34, 43, 117, 121
  self-recognition, 33, 34, 58, 107
  self-referentiality, 58
  self-representation, 43, 49, 50
  self-vision, 37
Simulation
  simulation theory, 71, 97
Simulator
  flight simulator, 3, 9, 48
Social brain
  social brain hypothesis, 69, 70
Social cognition, 57–59, 62, 63, 65–67, 69, 70, 72, 73, 87, 98
Social interaction, 21, 58, 60, 62, 64, 67–69, 73–81, 99, 104–106, 121, 124
Social stereotypes, 21
Space
  space, peripersonal, 20, 23
  space, personal, 78, 124, 125
  space, thermal, 121
Spatial presence, 119
Spatial processing
  spatial situation models, 3, 120
Superintelligence, 6
Synchrony, 19, 20, 24, 104–106

T
Telepresence, 2, 8
Theory of mind, 13, 57–61, 64, 70–73, 98
Third person perspective, 66, 87, 98

U
Uncanny valley, 14, 63, 64, 72
User-centered design, 35

V
Virtual
  virtual body, 7, 17–20, 24, 43, 46, 52, 103, 119
  virtual embodiment, 17–20, 22, 95, 102, 107, 131
  virtual fictionalism, 3
  virtual human, 77–79, 90, 125
  virtual realism, 3
  virtual togetherness, 80–82
  virtual world, 3, 8, 10, 17, 21
Virtual environments
  collaborative, 73–75, 81
  immersive, 9, 14, 16–19, 21, 22, 26, 35, 48, 73–75, 81, 95, 100, 125, 130
Virtualization, 1, 2
Virtual reality
  virtual reality design, 13, 34, 46, 51, 76, 77, 107, 125
  virtual reality technology, 2, 3, 6, 17, 26, 27, 33, 34, 40, 69, 74
Vision
  descriptive vision, 91, 94, 116
  vision for action, 91–94, 96, 101, 102
Visual perception, 10–13, 48, 50, 90–92, 101, 116
Visuo-spatial imagery, 3

W
We-intentions, 65, 66, 81
We-mode, 65, 66, 69, 75
Working
  working memory, spatial, 115, 128