The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics [online version ed.] 0199988692, 9780199988693

Cognitive neuroscience has grown into a rich and complex discipline, some 35 years after the term was coined. Given the

English Pages 621 [1111] Year 2013


Oxford Library of Psychology

Oxford Library of Psychology   The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics Edited by Kevin N. Ochsner and Stephen Kosslyn Print Publication Date: Dec 2013 Subject: Psychology Online Publication Date: Dec 2013

(p. ii)

Oxford Library of Psychology

Editor-in-Chief
Peter E. Nathan

Area Editors:

Clinical Psychology: David H. Barlow
Cognitive Neuroscience: Kevin N. Ochsner and Stephen M. Kosslyn
Cognitive Psychology: Daniel Reisberg
Counseling Psychology: Elizabeth M. Altmaier and Jo-Ida C. Hansen
Developmental Psychology: Philip David Zelazo
Health Psychology: Howard S. Friedman
History of Psychology: David B. Baker
Methods and Measurement: Todd D. Little
Neuropsychology: Kenneth M. Adams
Organizational Psychology: Steve W. J. Kozlowski
Personality and Social Psychology: Kay Deaux and Mark Snyder


(p. iv)

Oxford University Press is a department of the University of Oxford. It furthers the University’s objective of excellence in research, scholarship, and education by publishing worldwide.

Oxford New York
Auckland Cape Town Dar es Salaam Hong Kong Karachi Kuala Lumpur Madrid Melbourne Mexico City Nairobi New Delhi Shanghai Taipei Toronto

With offices in
Argentina Austria Brazil Chile Czech Republic France Greece Guatemala Hungary Italy Japan Poland Portugal Singapore South Korea Switzerland Thailand Turkey Ukraine Vietnam

Oxford is a registered trademark of Oxford University Press in the UK and certain other countries.

Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016

© Oxford University Press 2013

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by license, or under terms agreed with the appropriate reproduction rights organization. Inquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above.

You must not circulate this work in any other form and you must impose this same condition on any acquirer.

Library of Congress Cataloging-in-Publication Data
The Oxford handbook of cognitive neuroscience / edited by Kevin Ochsner, Stephen M. Kosslyn.
volumes cm. -- (Oxford library of psychology)
ISBN 978–0–19–998869–3
1. Cognitive neuroscience--Handbooks, manuals, etc. 2. Neuropsychology--Handbooks, manuals, etc. I. Ochsner, Kevin N. (Kevin Nicholas) II. Kosslyn, Stephen Michael, 1948– III. Title: Handbook of cognitive neuroscience.
QP360.5.O94 2013
612.8'233--dc23
2013026213

9 8 7 6 5 4 3 2 1
Printed in the United States of America on acid-free paper

Oxford Library of Psychology

(p. vi)

(p. vii)

Oxford Library of Psychology

The Oxford Library of Psychology, a landmark series of handbooks, is published by Oxford University Press, one of the world’s oldest and most highly respected publishers, with a tradition of publishing significant books in psychology. The ambitious goal of the Oxford Library of Psychology is nothing less than to span a vibrant, wide-ranging field and, in so doing, to fill a clear market need.

Encompassing a comprehensive set of handbooks, organized hierarchically, the Library incorporates volumes at different levels, each designed to meet a distinct need. At one level is a set of handbooks designed broadly to survey the major subfields of psychology; at another are numerous handbooks that cover important current focal research and scholarly areas of psychology in depth and detail. Planned as a reflection of the dynamism of psychology, the Library will grow and expand as psychology itself develops, thereby highlighting significant new research that will influence the field. Adding to its accessibility and ease of use, the Library will be published in print and electronically.

The Library surveys psychology’s principal subfields with a set of handbooks that capture the current status and future prospects of those major subdisciplines. This initial set includes handbooks of social and personality psychology, clinical psychology, counseling psychology, school psychology, educational psychology, industrial and organizational psychology, cognitive psychology, cognitive neuroscience, methods and measurements, history, neuropsychology, personality assessment, developmental psychology, and more. Each handbook undertakes to review one of psychology’s major subdisciplines with breadth, comprehensiveness, and exemplary scholarship.

In addition to these broadly conceived volumes, the Library also includes a large number of handbooks designed to explore in depth more specialized areas of scholarship and research, such as stress, health and coping, anxiety and related disorders, cognitive development, and child and adolescent assessment. In contrast to the broad coverage of the subfield handbooks, each of these latter volumes focuses on an especially productive, more highly focused line of scholarship and research. Whether at the broadest or most specific level, however, all of the Library handbooks offer synthetic coverage that reviews and evaluates the relevant past and present research and anticipates research in the future. Each handbook in the Library includes introductory and concluding chapters written by its editor or editors to provide a roadmap to the handbook’s table of contents and to offer informed anticipations of significant future developments in that field.

An undertaking of this scope calls for handbook editors and chapter authors who are established scholars in the areas about which they write. Many of the nation’s and world’s most productive and respected psychologists have agreed to edit Library handbooks or write authoritative chapters in their areas of expertise.

For whom has the Oxford Library of Psychology been written? Because of its breadth, depth, and accessibility, the Library serves a diverse audience, including graduate students in psychology and their faculty mentors, scholars, researchers, and practitioners in psychology and related fields. All will find in the Library the information they seek on the subfield or focal area of psychology in which they work or are interested.

Befitting its commitment to accessibility, each handbook includes a comprehensive index, as well as extensive references to help guide research. And because the Library was designed from its inception as an online as well as a print resource, its structure and contents will be readily and rationally searchable online. Further, once the Library is released online, the handbooks will be regularly and thoroughly updated.

In summary, the Oxford Library of Psychology will grow organically to provide a thoroughly informed perspective on the field of psychology, one that reflects both psychology’s dynamism and its increasing interdisciplinarity. Once published electronically, the Library is also destined to become a uniquely valuable interactive tool, with extended search and browsing capabilities.

As you begin to consult this handbook, we sincerely hope you will share our enthusiasm for the more than 500-year tradition of Oxford University Press for excellence, innovation, and quality, as exemplified by the Oxford Library of Psychology.

Peter E. Nathan
Editor-in-Chief
Oxford Library of Psychology

About the Editors

(p. ix)

About the Editors

Kevin N. Ochsner

Kevin N. Ochsner is Associate Professor of Psychology at Columbia University. He graduated summa cum laude from the University of Illinois, where he received his B.A. in Psychology. Ochsner then received an M.A. and Ph.D. in psychology from Harvard University, working in the laboratory of Dr. Daniel Schacter, where he studied emotion and memory. Also at Harvard, he began his postdoctoral training in the lab of Dr. Daniel Gilbert, where he first began integrating social cognitive and neuroscience approaches to emotion-cognition interactions, and along with Matthew Lieberman published the first articles on the emerging field of social cognitive neuroscience. Ochsner later completed his postdoctoral training at Stanford University in the lab of Dr. John Gabrieli, where he conducted some of the first functional neuroimaging studies examining the brain systems supporting cognitive forms of regulation. He is now director of the Social Cognitive Neuroscience Laboratory at Columbia University, where current studies examine the psychological and neural bases of emotion, emotion regulation, empathy, and person perception in both healthy and clinical populations. Ochsner has received various awards for his research and teaching, including the American Psychological Association’s Division 3 New Investigator Award, the Cognitive Neuroscience Society’s Young Investigator Award, and Columbia University’s Lenfest Distinguished Faculty Award.

Stephen M. Kosslyn

Stephen M. Kosslyn is the Founding Dean of the university at the Minerva Project, based in San Francisco. Before that, he served as Director of the Center for Advanced Study in the Behavioral Sciences and Professor of Psychology at Stanford University, and was previously chair of the Department of Psychology, Dean of Social Science, and the John Lindsley Professor of Psychology in Memory of William James at Harvard University. He received a B.A. from UCLA and a Ph.D. from Stanford University, both in psychology. His original graduate training was in cognitive science, which focused on the intersection of cognitive psychology and artificial intelligence; faced with limitations in those approaches, he eventually turned to study the brain. Kosslyn’s research has focused primarily on the nature of visual cognition, visual communication, and individual differences; he has authored or coauthored 14 books and over 300 papers on these topics. Kosslyn has received the American Psychological Association’s Boyd R. McCandless Young Scientist Award, the National Academy of Sciences Initiatives in Research Award, a Cattell Award, a Guggenheim Fellowship, the J-L. Signoret Prize (France), an honorary Doctorate from the University of Caen, an honorary Doctorate from the University of Paris Descartes, an honorary Doctorate from Bern University, and election to Academia Rodinensis pro Remediatione (Switzerland), the Society of Experimental Psychologists, and the American Academy of Arts and Sciences.


Contributors

(p. xi)


Claude Alain

Rotman Research Institute

Baycrest Centre

Toronto, Ontario, Canada

Agnès Alsius

Department of Psychology

Queen’s University

Kingston, Ontario, Canada

George A. Alvarez

Department of Psychology

Harvard University

Cambridge, MA

Stephen R. Arnott

Rotman Research Institute

Baycrest Centre

Toronto, Ontario, Canada

Moshe Bar

Martinos Center for Biomedical Imaging

Massachusetts General Hospital

Harvard Medical School

Charlestown, MA

Bryan L. Benson

Department of Psychology

School of Kinesiology

University of Michigan

Ann Arbor, MI

Damien Biotti

Lyon Neuroscience Research Center

Bron, France

Annabelle Blangero

Lyon Neuroscience Research Center

Bron, France

Sheila E. Blumstein

Department of Cognitive, Linguistic, and Psychological Sciences

Brown Institute for Brain Science

Brown University

Providence, RI

Grégoire Borst

University Paris Descartes

Laboratory for the Psychology of Child Development and Education (CNRS Unit 3521)

Paris, France

Department of Psychology

Harvard University

Cambridge, MA

Nathaniel B. Boyden

Department of Psychology

University of Michigan

Ann Arbor, MI

Andreja Bubic

Martinos Center for Biomedical Imaging

Massachusetts General Hospital

Harvard Medical School

Charlestown, MA

Bradley R. Buchsbaum

Rotman Research Institute

Baycrest Centre

Toronto, Ontario, Canada

Roberto Cabeza

Center for Cognitive Neuroscience

Duke University

Durham, NC

Denise J. Cai

Department of Psychology

University of California, San Diego

La Jolla, CA

Alfonso Caramazza

Department of Psychology

Harvard University

Cambridge, MA

Center for Mind/Brain Sciences

University of Trento

Rovereto, Italy

(p. xii)

Evangelia G. Chrysikou

Department of Psychology

University of Kansas

Lawrence, KS

Jared Danker

Department of Psychology

New York University

New York, NY

Sander Daselaar

Donders Institute for Brain, Cognition, and Behaviour

Radboud University

Nijmegen, Netherlands

Center for Cognitive Neuroscience

Duke University

Durham, NC

Lila Davachi

Center for Neural Science

Department of Psychology

New York University

New York, NY

Mark D’Esposito

Helen Wills Neuroscience Institute

Department of Psychology

University of California

Berkeley, CA

Benjamin J. Dyson

Department of Psychology

Ryerson University

Toronto, Ontario, Canada

Jessica Fish

MRC Cognition and Brain Sciences Unit

Cambridge, UK

Angela D. Friederici

Department of Neuropsychology

Max Planck Institute for Human Cognitive and Brain Sciences

Leipzig, Germany

Melvyn A. Goodale

The Brain and Mind Institute

University of Western Ontario

London, Ontario, Canada

Kalanit Grill-Spector

Department of Psychology and Neuroscience Institute

Stanford University

Stanford, CA

Argye E. Hillis

Departments of Neurology, Physical Medicine and Rehabilitation, and Cognitive Science

Johns Hopkins University

Baltimore, MD

Ray Jackendoff

Center for Cognitive Studies

Tufts University

Medford, MA

Petr Janata

Center for Mind and Brain

Department of Psychology

University of California Davis

Davis, CA

Roni Kahana

Department of Neurobiology

Weizmann Institute of Science

Rehovot, Israel

Stephen M. Kosslyn

Minerva Project

San Francisco, CA

Youngbin Kwak

Neuroscience Program

University of Michigan

Ann Arbor, MI

Bruno Laeng

Department of Psychology

University of Oslo

Oslo, Norway

Ewen MacDonald

Department of Psychology

Queen’s University

Ontario, Canada

Centre for Applied Hearing Research

Department of Electrical Engineering

Technical University of Denmark

Lyngby, Denmark

Bradford Z. Mahon

Departments of Neurosurgery and Brain and Cognitive Sciences

University of Rochester

Rochester, NY

Claudia Männel

Department of Neuropsychology

Max Planck Institute for Human Cognitive and Brain Sciences

Leipzig, Germany

(p. xiii)

Jason B. Mattingley

Queensland Brain Institute

University of Queensland

St. Lucia, Queensland, Australia

Josh H. McDermott

Department of Brain and Cognitive Sciences

Massachusetts Institute of Technology

Cambridge, MA

Kevin Munhall

Department of Psychology

Queen’s University

Kingston, Ontario, Canada

Emily B. Myers

Department of Psychology

Department of Speech, Language, and Hearing Sciences

University of Connecticut

Storrs, CT

Jeffrey Nicol

Department of Psychology

Nipissing University

North Bay, Ontario, Canada

Kevin N. Ochsner

Department of Psychology

Columbia University

New York, NY

Laure Pisella

Lyon Neuroscience Research Center

Bron, France

Gilles Rode

Lyon Neuroscience Research Center

University Lyon

Hospices Civils de Lyon

Hôpital Henry Gabrielle

St. Genis Laval, France

Yves Rossetti

Lyon Neuroscience Research Center

University Lyon

Mouvement et Handicap

Plateforme IFNL-HCL

Hospices Civils de Lyon

Lyon, France

M. Rosario Rueda

Departamento de Psicología Experimental

Centro de Investigación Mente, Cerebro y Comportamiento (CIMCYC)

Universidad de Granada

Granada, Spain

Rachael D. Seidler

Department of Psychology

School of Kinesiology

Neuroscience Program

University of Michigan

Ann Arbor, MI

Noam Sobel

Department of Neurobiology

Weizmann Institute of Science

Rehovot, Israel

Sharon L. Thompson-Schill

Department of Psychology

University of Pennsylvania

Philadelphia, PA

Caroline Tilikete

Lyon Neuroscience Research Center

University Lyon

Hospices Civils de Lyon

Hôpital Neurologique

Lyon, France

Kyrana Tsapkini

Departments of Neurology, and Physical Medicine and Rehabilitation

Johns Hopkins University

Baltimore, MD

Alain Vighetto

Lyon Neuroscience Research Center

University Lyon

Hospices Civils de Lyon

Hôpital Neurologique

Lyon, France

Barbara A. Wilson

Department of Psychology

Institute of Psychiatry

King’s College

London, UK

John T. Wixted

Department of Psychology

University of California, San Diego

La Jolla, CA

Eiling Yee

Basque Center on Cognition, Brain, and Language

San Sebastian, Spain

Josef Zihl

Neuropsychology Unit

Department of Psychology

University of Munich

Max Planck Institute of Psychiatry

Munich, Germany

(p. xiv)

Introduction to The Oxford Handbook of Cognitive Neuroscience: Cognitive Neuroscience—Where Are We Now?

Kevin N. Ochsner and Stephen Kosslyn
Subject: Psychology, Cognitive Neuroscience
DOI: 10.1093/oxfordhb/9780199988693.013.0001

Abstract and Keywords

This two-volume set reviews the current state of the art in cognitive neuroscience. The introductory chapter outlines central elements of the cognitive neuroscience approach and provides a brief overview of the eight sections of the book’s two volumes. Volume 1 is divided into four sections comprising chapters that examine core processes, ways in which they develop across the lifespan, and ways they may break down in special populations. The first section deals with perception and addresses topics such as the abilities to represent and recognize objects and spatial relations and the use of top-down processes in visual perception. The second section focuses on attention and how it relates to action and visual motor control. The third section, on memory, covers topics such as working memory, semantic memory, and episodic memory. Finally, the fourth section, on language, includes chapters on abilities such as speech perception and production, semantics, the capacity for written language, and the distinction between linguistic competence and performance.

Keywords: cognitive neuroscience, perception, attention, language, memory, spatial relations, visual perception, visual motor control, semantics, linguistic competence

On a night in the late 1970s, something important happened in a New York City taxicab: A new scientific field was named. En route to a dinner at the famed Algonquin Hotel, the neuroscientist Michael Gazzaniga and the cognitive psychologist George Miller coined the term “cognitive neuroscience.” This field would go on to change the way we think about the relationship between behavior, mind, and brain.

This is not to say that the field was born on that day. Indeed, as Hermann Ebbinghaus (1910) noted, “Psychology has a long past, but a short history,” and cognitive neuroscience clearly has a rich and complex set of ancestors. Although it is difficult to say exactly when a new scientific discipline came into being, the groundwork for the field had begun to be laid decades before the term was coined. As has been chronicled in detail elsewhere (Gardner, 1985; Posner & DiGirolamo, 2000), as behaviorism gave way to the cognitive revolution, and as computational and neuroscientific approaches to understanding the mind became increasingly popular, researchers in numerous allied fields came to believe that understanding the relationships between behavior and the mind required understanding their relationship to the brain.

This two-volume set reviews the current state of the art in cognitive neuroscience, some 35 years after the field was named. In these intervening years, the field has grown tremendously, so much so that cognitive neuroscience is now less a bounded discipline focused on specific topics and more an approach that permeates psychological and neuroscientific inquiry. As such, no collection of chapters could possibly encompass the entire breadth and depth of cognitive neuroscience. That said, this two-volume set attempts systematically to survey eight core areas of inquiry in cognitive neuroscience, four per volume, in a total of 55 chapters.

As an appetizer to this scientific feast, this introductory chapter offers a quick sketch of some central elements of the cognitive neuroscience approach and a brief overview of the eight sections of the Handbook’s two volumes.

The Cognitive Neuroscience Approach

Among the many factors that gave rise to cognitive neuroscience, we highlight three signal insights. In part, we explicitly highlight these key ideas because they lay bare elements of the cognitive neuroscience approach that have become so commonplace today that their importance may be forgotten even as they implicitly influence the ways research is conducted.

Multiple Levels of Analysis

The first crucial influence on cognitive neuroscience was a set of insights presented in a book by the late British vision scientist David Marr. Published in 1982, the book Vision took an old idea, levels of analysis, and made a strong case that we can only understand visual perception if we integrate descriptions cast at three distinct, but fundamentally interrelated (Kosslyn & Maljkovic, 1990), levels. At the topmost computational level, one describes the problem at hand, such as how one can see edges, derive three-dimensional structure of shapes, and so on; this level characterizes “what” the system does. At the middle algorithm level, one describes how a specific computational problem is solved by a system that includes specific processes that operate on specific representations; this level characterizes “how” the system operates. And at the lowest implementation level, one describes how the representations and processes that constitute the algorithm are instantiated in the brain. All three levels are crucial, and characteristics of the description at each level affect the way we must describe characteristics at the other levels.

This approach proved enormously influential in vision research, and researchers in other domains quickly realized that it could be applied more broadly. This multilevel approach is now the foundation for cognitive neuroscience inquiry more generally, although we often use different terminology to refer to these levels of analysis. For instance, many researchers now talk about the levels of behavior and experience, psychological processes (or information processing mechanisms), and neural systems (Mitchell, 2006; Ochsner, 2007; Ochsner & Lieberman, 2001). But the core idea is still the same as that articulated by Marr: A complete understanding of the ways in which vision, memory, emotion, or any other cognitive or emotional faculty operates necessarily involves connecting descriptions of phenomena across levels of analysis.

The resulting multilevel descriptions have many advantages over the one- or two-level accounts that are typical of traditional approaches in allied disciplines such as cognitive psychology. These advantages include the ability to use both behavioral and brain data in combination, rather than just one or the other taken alone, to draw inferences about psychological processes. In so doing, one constructs theories that are constrained by, must connect to, and must make sense in the context of more types of data than theories that are couched solely at the behavioral or at the behavioral and psychological levels. We return to some of these advantages below.

Use of Multiple Methods

If we are to study human abilities and capacities at multiple levels of analysis, we must necessarily use multiple types of methods to do so. In fact, many methods exist to measure phenomena at each of the levels of analysis, and new measures are continually being invented (Churchland & Sejnowski, 1988).

Today, this observation is taken as a given by many graduate students who study cognitive neuroscience. They take it for granted that we should use studies of patient populations, electrophysiological methods, functional imaging methods, transcranial magnetic stimulation (TMS, which uses magnetic fields to temporarily impair or enhance neural functioning in a specific brain area), and other new techniques as they are developed. But this view wasn’t always the norm. This fact is illustrated nicely by a debate that took place in the early 1990s about whether and how neuroscience data should inform psychological models of cognitive processes.

On one side was the view from cognitive neuropsychology, which centered on the idea that studies of patient populations may be sufficient to understand the structure of cognitive processing (Caramazza, 1992). The claim was that by studying the ways in which behavior changes as a result of the unhappy accidents of nature (e.g., strokes, traumatic brain injuries) that caused lesions of language areas, memory areas, and so on, we can discover the processing modules that constitute the mind. The key assumption here is that researchers can identify direct relationships between behavioral deficits and specific areas of the brain that were damaged.

On the other side of the debate was the view from cognitive neuroscience, which centered on the idea that the more methods used, the better (Kosslyn & Intriligator, 1992). Because every method has its limitations, the more methods researchers could bring to bear, the more likely they were to have a correct picture of how behavior is related to neural functioning. In the case of patient populations, for example, the deficits in behavior might not simply reflect the normal functions of the damaged regions; rather, they could reflect reorganization of function after brain damage or diffuse damage to multiple regions that affects multiple separate functions. If so, then observing patterns of dissociations and associations of abilities following brain damage would not necessarily allow researchers to delineate the structure of cognitive processing. Other methods (such as neuroimaging) would be required to complement studies of brain-damaged patients.

The field quickly adopted the second perspective, drawing on multiple methods when constructing and testing theories of cognitive processing. Researchers realized that they could use multiple methods together in complementary ways: They could use functional imaging methods to describe the network of processes active in the healthy brain when engaged in a particular behavior; they could use lesion methods or TMS to assess the causal relationships between activity in specific brain areas and particular forms of information processing (which in turn give rise to particular types of behavior); they could use electrophysiological methods to study the temporal dynamics of cortical systems as they interactively relate to the behavior of interest. And so on. The cognitive neuroscience approach adopted the idea that no single technique provides all the answers.

That said, there is no denying that some techniques have proved more powerful and generative than others during the past 35 years. In particular, it is difficult to overstate the impact of functional imaging of the healthy intact human brain, first ushered in by positron emission tomography studies in the late 1980s (Petersen et al., 1988) and given a tremendous boost by the advent of, and subsequent boom of, functional magnetic resonance imaging in the early 1990s (Belliveau et al., 1992). The advent of functional imaging is in many ways the single most important contributor to the rise of cognitive neuroscience. Without the ability to study cortical and subcortical brain systems in action in healthy adults, it’s not clear whether cognitive neuroscience would have become the central paradigm that it is today.

We must, however, offer a cautionary note: Functional imaging is by no means the be-all and end-all of cognitive neuroscience techniques. Like any other method, it has its own strengths and weaknesses (which have been described in detail elsewhere, e.g., Poldrack, 2006, 2008, 2011; Van Horn & Poldrack, 2009; Yarkoni et al., 2010). Researchers trained in cognitive neuroscience understand many, if not all, of these limitations, but unfortunately, many outside the field do not. This can cause two problems. The first is that newcomers to the field may improperly use functional imaging in the service of overly simplistic “brain mapping” (e.g., seeking to identify “love spots” in the brain; Fisher et al., 2002) and may commit other inferential errors (Poldrack, 2006). The second, less appreciated problem is that when nonspecialists read about studies of such overly simplistic hypotheses, they may assume that all cognitive neuroscientists traffic in this kind of experimentation and theorizing. As the chapters in these volumes make clear, most cognitive neuroscientists appreciate the strengths and limits of the various techniques they use, and understand that functional imaging is simply one of a number of techniques that allow neuroscience data to constrain theories of psychological processes. In the next section, we turn to exactly this point.

Page 4 of 11

Introduction to The Oxford Handbook of Cognitive Neuroscience: Cognitive Neuroscience—Where Are We Now?

Constraints and Convergence

One implication of using multiple methods to study phenomena at multiple levels of analysis is that we have numerous types of data. These data provide converging evidence for, and constrain the nature of, theories of human cognition, emotion, and behavior. That is, the data must fit together, painting different facets of the same picture (this is what we mean by convergence). And even though each type of data alone does not dictate a particular interpretation, each type helps to narrow the range of possible interpretations (this is what we mean by constraining the nature of theories). Researchers in cognitive neuroscience acknowledge that data always can be interpreted in various ways, but they also rely on the fact that data limit the range of viable interpretations, and the more types of data there are, the more strongly they narrow down the range of possible theories. In this sense, constraints and convergence are the very core of the cognitive neuroscience approach (Ochsner & Kosslyn, 1999).

We note that the principled use of constraining and converging evidence does not privilege evidence couched at any one level of analysis. Brain data are not more important, more real, or more intrinsically valuable than behavioral data, and vice versa. Rather, both kinds of data constrain the range of possible theories of psychological processes, and as such, both are valuable. In addition, both behavioral and brain data can spark changes in theories of psychological processes.

This claim stands in contrast to claims made by those who have argued that brain data can never change, or in any way constrain, a psychological theory. According to this view, brain data are ambiguous without a psychological theory to interpret them (Kihlstrom, 2012). Such arguments fail to appreciate the fact that the goal of cognitive neuroscience is to construct theories couched at all three levels of analysis. Moreover, behavioral and brain data often are dependent variables collected in the same experiments. This is not arbitrary; we have ample evidence that behavior and brain function are intimately related: When the brain is damaged in a particular location, specific behaviors are disrupted, and when a person engages in specific behaviors, specific brain areas are activated. Dependent measures are always what science uses to constrain theorizing, and thus it follows that both behavioral and brain data must constrain our theories of the intervening psychological processes.

This point is so important that we want to illustrate it with two examples. The first begins with classic studies of the amnesic patient known for decades only by his initials, H.M. (Corkin, 2002). After he died, his brain was famously donated to science and dissected live on the Internet in 2009. We now know that his name was Henry. In the early 1950s, Henry suffered from severe epilepsy that could not be treated with medication, which arose because of abnormal neural tissue in his temporal lobes. At the time, he suffered horribly from seizures, and the last remaining course of potential treatment was a neurosurgical operation that removed the tips of Henry’s temporal lobes (and with them, the neural origins of his epileptic seizures).

When Henry awoke after his operation, the epilepsy was gone, but so was his ability to form new memories of events he experienced. Henry was stuck in the eternal present, forevermore awakening each day with his sense of time frozen at the age at which he had the operation. The time horizon for his experience was about two minutes, or the amount of time information could be retained in short-term memory before it required transfer to a longer-term episodic memory store.

To say that the behavioral sequelae of H.M.’s operation were surprising to the scientific community at that time is an understatement. Many psychologists and neuroscientists spent the better part of the next 20 to 30 years reconfiguring their theories of memory in order to accommodate these and subsequent findings. It wasn’t until the early 1990s that the far-reaching theoretical implications of Henry’s amnesia finally became clear (Schacter & Tulving, 1994), when a combination of behavioral, functional imaging, and patient lesion data converged to implicate a multiple-systems account of human memory.

This understanding of H.M.’s deficits was hard won, and emerged only after an extended “memory systems debate” in psychology and neuroscience (Schacter & Tulving, 1994). This debate was between, on the one hand, behavioral and psychological theorists who argued that we have a single memory system (which has multiple processes) and, on the other hand, neuroscience-inspired theorists who argued that we have multiple memory systems (each of which instantiates a particular kind of process or processes). The initial observation of H.M.’s amnesia, combined with decades of subsequent careful experimentation using multiple behavioral and neuroscience techniques, decisively came down on the side of the multiple memory systems theorists. Cognitive processing relies on multiple types of memory, and each uses a distinct set of representations and processes. This was a clear victory for the cognitive neuroscience approach over purely behavioral approaches.

A second example of the utility of combining neuroscientific and behavioral evidence comes from the “imagery debate” (Kosslyn, Thompson, & Ganis, 2006). On one hand, some psychologists and philosophers argued that the pictorial characteristics of visual mental images that are evident to experience are epiphenomenal, like heat produced by a light bulb when someone is reading: something that could be experienced but played no role in accomplishing the function. On the other hand, cognitive neuroscientists argued that visual mental images are analogous to visual percepts in that they use space in a representation to specify space in the world.

This debate went back and forth for many years without resolution, and at one point a mathematical proof was offered that behavioral data alone could never resolve it (Anderson, 1978). The advent of neuroimaging helped bring this debate largely to a close (Kosslyn, Thompson, & Ganis, 2006). A key finding was that the first cortical areas that process visual input during perception are each topographically mapped, such that adjacent locations in the visual world are represented in adjacent locations in the visual cortex. That is, these areas use space on the cortex to represent space in the world. In the early 1990s, researchers showed that visualizing objects typically activates these areas, and increasing the size of a visual mental image activates portions of this cortex that register increasingly larger sizes in perception. Moreover, in the late 1990s researchers showed that temporarily impairing these areas using TMS hampers imagery and perception to the same degree. Hence, these brain-based findings provided clear evidence that visual mental images are, indeed, analogous to visual percepts in that both represent space in the world by using space in a representation.

We have written as if both debates (about memory systems and mental imagery representation) are now definitely closed. But this is a simplification; not everyone is convinced of one or another view. Our crucial point is that the advent of neuroscientific data has shifted the terms of the debate. When only behavioral data were available, in both cases the two alternative positions seemed equally plausible, but after the relevant neuroscientific data were introduced, the burden of proof shifted dramatically to one side, and a clear consensus emerged in the field (e.g., see Reisberg, Pearson, & Kosslyn, 2003).

In the years since these debates, evidence from cognitive neuroscience has constrained theories of a wide range of phenomena. Many such examples are chronicled in this Handbook.

Overview of the Handbook

Cognitive neuroscience in the new millennium is a broad and diverse field, defined by a multileveled integrative approach. To provide a systematic overview of this field, we’ve divided this Handbook into two volumes.

Volume 1

The first volume surveys classic areas of interest in cognitive neuroscience: perception, attention, memory, and language. Twenty years ago when Kevin Ochsner was a graduate student and Stephen Kosslyn was one of his professors, research on these topics formed the backbone of cognitive neuroscience research. And this is still true today, for two reasons.

First, when cognitive neuroscience took off, these were the areas of research within psychology that had the most highly developed behavioral, psychological, and neuropsychological (i.e., brain-damaged patient based) models in place. And in the case of research on perception, attention, and memory, these were topics for which fairly detailed models of the underlying neural circuitry already had been developed on the basis of rodent and nonhuman primate studies. As such, these areas were poised to benefit from the use of brain-based techniques in humans.

Second, research on the representations and processes used in perception, attention, memory, and language in many ways forms a foundation for studying other kinds of complex behaviors, which are the focus of the second volume. This is true both in terms of the findings themselves and in terms of the evidence such findings provided that the cognitive neuroscience approach could be successful.

With this in mind, each of the four sections in Volume 1 includes a selection of chapters that cover core processes and the ways in which they develop across the lifespan and may break down in special populations.

The first section, on perception, includes chapters on the abilities to represent and recognize objects and spatial relations. In addition, this section contains chapters on the use of top-down processes in visual perception and on the ways in which such processes enable us to construct and use mental images. We also include chapters on perceptual abilities that have seen tremendous research growth in the past 5 to 10 years, such as olfaction, audition, and music perception. Finally, there is a chapter on disorders of perception.

The second section, on attention, includes chapters on the abilities to attend to auditory and spatial information as well as on the relationships between attention, action, and visual motor control. These are followed by chapters on the development of attention and its breakdown in various disorders.

The third section, on memory, includes chapters on the abilities to maintain information in working memory as well as semantic memory, episodic memory, and the consolidation process that governs the transfer of information from working to semantic and episodic memory. There is also a chapter on the ability to acquire skills, which depends on different systems than those used in other forms of memory, as well as chapters on changes in memory function with older age and the ways in which memorial processes break down in various disorders.
Finally, the fourth section, on language, includes chapters on abilities such as speech perception and production, the distinction between linguistic competence and performance, semantics, the capacity for written language, and multimodal and developmental aspects of speech perception.

Volume 2

Whereas Volume 1 addresses the classics of cognitive neuroscience, Volume 2 focuses on the “new wave” of research that has developed primarily in the past 10 years. As noted earlier, in many ways the success of these relatively newer research directions builds on the successes of research in the classic domains. Indeed, our knowledge of the systems implicated in perception, attention, memory, and language literally (and in this Handbook) provided the foundation for the work described in Volume 2.

The first section, on emotion, begins with processes involved in interactions between emotion, perception, and attention, as well as the generation and regulation of emotion. This is followed by chapters that provide models for understanding broadly how emotion affects cognition as well as the contribution that bodily sensation and control make to affective and other processes. This section concludes with chapters on genetic and developmental approaches to emotion.

The second section, on self and social cognition, begins with a chapter on the processes that give rise to the fundamental ability to know and understand oneself. This is followed by chapters on increasingly complex abilities involved in perceiving others, starting with the perception of nonverbal cues and perception–action links, and from there ranging to face recognition, impression formation, drawing inferences about others’ mental states, empathy, and social interaction. This section concludes with a chapter on the development of social cognitive abilities.

The third section, on higher cognitive functions, surveys abilities that largely depend on processes in the frontal lobes of the brain, which interact with the kinds of core perceptual, attentional, and memorial processes described in Volume 1. Here, we include chapters on conflict monitoring and cognitive control, the hierarchical control of action, thinking, decision making, categorization, expectancies, numerical cognition, and neuromodulatory influences on higher cognitive abilities.

Finally, in the fourth section, four chapters illustrate how disruptions of the mechanisms of cognition and emotion produce abnormal functioning in clinical populations. This section begins with a chapter on attention deficit-hyperactivity disorder and from there moves to chapters on anxiety, post-traumatic stress disorder, and obsessive-compulsive disorder.

Summary

Before moving from the appetizer to the main course, we offer two last thoughts.

First, we edited this Handbook with the goal of providing a wide-ranging compendium of research on cognitive neuroscience that will be accessible to a broad audience. Toward this end, the chapters included in this Handbook are available online to be downloaded individually. This is the first time that chapters of a Handbook of this sort have been made available in this way, and we hope this facilitates access to and dissemination of some of cognitive neuroscience’s greatest hits.

Second, we hope that, whether you are a student, an advanced researcher, or an interested layperson, this Handbook whets your appetite for learning more about this exciting and growing field. Although reading survey chapters of the sort provided here is an excellent way to become oriented in the field and to start building your knowledge of the topics that interest you most, we encourage you to take your interests to the next level: Delve into the primary research articles cited in these chapters, and perhaps even get involved in doing this sort of research!


References

Anderson, J. R. (1978). Arguments concerning representations for mental imagery. Psychological Review, 85, 249–277.
Belliveau, J. W., Kwong, K. K., Kennedy, D. N., Baker, J. R., Stern, C. E., et al. (1992). Magnetic resonance imaging mapping of brain function: Human visual cortex. Investigative Radiology, 27 (Suppl 2), S59–S65.
Caramazza, A. (1992). Is cognitive neuropsychology possible? Journal of Cognitive Neuroscience, 4, 80–95.
Churchland, P. S., & Sejnowski, T. J. (1988). Perspectives on cognitive neuroscience. Science, 242, 741–745.
Corkin, S. (2002). What’s new with the amnesic patient H.M.? Nature Reviews Neuroscience, 3, 153–160.
Fisher, H. E., Aron, A., Mashek, D., Li, H., & Brown, L. L. (2002). Defining the brain systems of lust, romantic attraction, and attachment. Archives of Sexual Behavior, 31, 413–419.
Gardner, H. (1985). The mind’s new science: A history of the cognitive revolution. New York: Basic Books.
Kihlstrom, J. F. (2012). Social neuroscience: The footprints of Phineas Gage. Social Cognition, 28, 757–782.
Kosslyn, S. M., & Intriligator, J. I. (1992). Is cognitive neuropsychology plausible? The perils of sitting on a one-legged stool. Journal of Cognitive Neuroscience, 4, 96–105.
Kosslyn, S. M., & Maljkovic, V. M. (1990). Marr’s metatheory revisited. Concepts in Neuroscience, 1, 239–251.
Kosslyn, S. M., Thompson, W. L., & Ganis, G. (2006). The case for mental imagery. New York: Oxford University Press.
Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. San Francisco: W. H. Freeman.
Mitchell, J. P. (2006). Mentalizing and Marr: An information processing approach to the study of social cognition. Brain Research, 1079, 66–75.
Ochsner, K. (2007). Social cognitive neuroscience: Historical development, core principles, and future promise. In A. Kruglanski & E. T. Higgins (Eds.), Social psychology: A handbook of basic principles (pp. 39–66). New York: Guilford Press.
Ochsner, K. N., & Kosslyn, S. M. (1999). The cognitive neuroscience approach. In B. M. Bly & D. E. Rumelhart (Eds.), Cognitive science (pp. 319–365). San Diego, CA: Academic Press.
Ochsner, K. N., & Lieberman, M. D. (2001). The emergence of social cognitive neuroscience. American Psychologist, 56, 717–734.
Petersen, S. E., Fox, P. T., Posner, M. I., Mintun, M., & Raichle, M. E. (1988). Positron emission tomographic studies of the cortical anatomy of single-word processing. Nature, 331, 585–589.
Poldrack, R. A. (2006). Can cognitive processes be inferred from neuroimaging data? Trends in Cognitive Sciences, 10, 59–63.
Poldrack, R. A. (2008). The role of fMRI in cognitive neuroscience: Where do we stand? Current Opinion in Neurobiology, 18, 223–227.
Poldrack, R. A. (2011). Inferring mental states from neuroimaging data: From reverse inference to large-scale decoding. Neuron, 72, 692–697.
Posner, M. I., & DiGirolamo, G. J. (2000). Cognitive neuroscience: Origins and promise. Psychological Bulletin, 126, 873–889.
Reisberg, D., Pearson, D. G., & Kosslyn, S. M. (2003). Intuitions and introspections about imagery: The role of imagery experience in shaping an investigator’s theoretical views. Applied Cognitive Psychology, 17, 147–160.
Schacter, D. L., & Tulving, E. (Eds.). (1994). Memory systems 1994. Cambridge, MA: MIT Press.
Van Horn, J. D., & Poldrack, R. A. (2009). Functional MRI at the crossroads. International Journal of Psychophysiology, 73, 3–9.
Yarkoni, T., Poldrack, R. A., Van Essen, D. C., & Wager, T. D. (2010). Cognitive neuroscience 2.0: Building a cumulative science of human brain function. Trends in Cognitive Sciences, 14, 489–496.

Kevin N. Ochsner

Kevin N. Ochsner is a professor in the Department of Psychology at Columbia University in New York, NY.

Stephen Kosslyn

Stephen M. Kosslyn, Center for Advanced Study in the Behavioral Sciences, Stanford University, Stanford, CA


Representation of Objects

Kalanit Grill-Spector
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn
Print Publication Date: Dec 2013
Subject: Psychology, Cognitive Neuroscience
Online Publication Date: Dec 2013
DOI: 10.1093/oxfordhb/9780199988693.013.0002

Abstract and Keywords

Functional magnetic resonance imaging (fMRI) has enabled neuroscientists and psychologists to understand the neural bases of object recognition in humans. This chapter reviews fMRI research that yielded important insights about the nature of object representations in the human brain. Combining fMRI with psychophysics may offer clues about what kind of visual processing is implemented in distinct cortical regions. This chapter explores how fMRI has influenced current understanding of object representations by focusing on two aspects of object representation: how the underlying representations provide for invariant object recognition and how category information is represented in the ventral stream. It first provides a brief introduction of the functional organization of the human ventral stream and a definition of object-selective cortex before describing cue-invariant responses in the lateral occipital complex (LOC), neural bases of invariant object recognition, object and position information in the LOC, and viewpoint sensitivity across the LOC. The chapter concludes by commenting on debates about the nature of functional organization in the human ventral stream.

Keywords: functional magnetic resonance imaging, object recognition, psychophysics, brain, object representation, category information, ventral stream, object-selective cortex, lateral occipital complex, functional organization

Introduction

Humans can effortlessly recognize objects in a fraction of a second despite large variability in the appearance of objects (Thorpe et al., 1996). What are the underlying representations and computations that enable this remarkable human ability? One way to answer these questions is to investigate the neural mechanisms of object recognition in the human brain. With the advent of functional magnetic resonance imaging (fMRI) about 20 years ago, neuroscientists and psychologists began to examine the neural bases of object recognition in humans. fMRI is an attractive method because it is a noninvasive technique that allows multiple measurements of brain activation in the same awake behaving human. Among noninvasive techniques, it provides the best spatial resolution currently available, enabling us to localize cortical activations at a spatial resolution of millimeters (as fine as 1 mm) and at a reasonable time scale (on the order of seconds).

Before the advent of fMRI, knowledge about the function of the ventral stream was based on single-unit electrophysiology measurements in monkeys and on lesion studies. These studies showed that neurons in the monkey inferotemporal (IT) cortex respond to shapes (Fujita et al., 1992) and complex objects such as faces (Desimone et al., 1984), and that lesions to the ventral stream can produce specific deficits in object recognition such as agnosia (inability to recognize objects) and prosopagnosia (inability to recognize faces; see Farah, 1995). However, interpreting lesion data is complicated because lesions are typically diffuse (usually more than one region is damaged), may disrupt both a cortical region and its connectivity, and are not replicable across patients. Therefore, the primary knowledge gained from fMRI research was which cortical sites in the normal human brain are involved in object recognition. The first set of fMRI studies of object and face recognition in humans identified the regions in the human brain that respond selectively to objects and faces (Malach et al., 1995; Kanwisher et al., 1997; Grill-Spector et al., 1998b). Next, a series of studies demonstrated that activation in object- and face-selective regions correlates with success at recognizing objects and faces, respectively, providing striking evidence for the involvement of these regions in recognition (Bar et al., 2001; Grill-Spector et al., 2000, 2004). After researchers determined which regions in cortex are involved in object recognition, their focus shifted to examining the nature of representations and computations that are implemented in these regions to understand how they enable efficient object recognition in humans.
In this chapter, I review fMRI research that provided important knowledge about the nature of object representations in the human brain. For example, one of the fundamental problems in recognition is how to recognize an object across variations in its appearance (invariant object recognition). Understanding how a biological system has solved this problem may give hints for how to build a robust artificial recognition system. Further, fMRI is better suited to measuring object representations than to measuring the temporal sequence of computations en route to object recognition, because the time scale of fMRI measurements is longer than the time scale of the recognition process (the temporal resolution of fMRI is on the order of seconds, whereas object recognition takes about 100 to 250 ms). Nevertheless, combining psychophysics with fMRI may give us some clues to the kinds of visual processing implemented in distinct cortical regions. For example, finding regions whose activation is correlated with success at some tasks, but not others, may suggest the involvement of particular cortical regions in one computation, but not another.

In discussing how fMRI has affected our current understanding of object representations, I focus on results pertaining to two aspects of object representation:

• How do the underlying representations provide for invariant object recognition?
• How is category information represented in the ventral stream?

I have chosen these topics for three reasons: (1) they are central topics in the field of object recognition for which fMRI has substantially advanced our understanding, (2) some findings related to these topics stirred considerable debate (see the later section, Debates about the Nature of Functional Organization in the Human Ventral Stream), and (3) some of the fMRI findings in humans are surprising given prior knowledge from single-unit electrophysiology in monkeys.

In terms of the chapter’s organization, I begin with a brief introduction of the functional organization of the human ventral stream and a definition of object-selective cortex, and then describe research that elucidated the properties of these regions with respect to basic coding principles. I continue with findings related to invariant object recognition, and then end with research and theories regarding category representation and specialization in the human ventral stream.

Functional Organization of the Human Ventral Stream

The first set of fMRI studies on object and face recognition in humans was devoted to identifying the regions in the brain that are object and face selective. Electrophysiology research in monkeys suggested that neurons in higher level regions respond to shapes and objects more than simple stimuli such as lines, edges, and patterns (Desimone et al., 1984; Fujita et al., 1992; Logothetis et al., 1995). Based on these findings, fMRI studies measured brain activation when people viewed pictures of objects, as opposed to when people viewed scrambled objects (i.e., pictures that have the same local information and statistics, but do not contain an object) or texture patterns (e.g., checkerboards, which are robust visual stimuli, but do not elicit a percept of a global form). These studies found a constellation of regions in the lateral occipital cortex termed the lateral occipital complex (LOC), extending from the lateral occipital sulcus, posterior to the medial temporal hMT+ region, ventrally to the occipito-temporal sulcus (OTS) and the fusiform gyrus (Fus), that respond more to objects than to controls. The LOC is located lateral and anterior to early visual areas (Grill-Spector et al., 1998a, 1998b) and is typically divided into two subregions: LO, a region in lateral occipital cortex adjacent and posterior to the hMT+ region; and pFus/OTS, a ventral region overlapping the OTS and the posterior fusiform gyrus (pFus) (Figure 2.1). More recent experiments indicate that the posterior subregion (LO) overlaps a visual field map representation between V3a and hMT+ called LO2 (Sayres & Grill-Spector, 2008).
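The logic of the objects versus scrambled-objects comparison can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the analysis pipeline used in the studies cited above: the voxel names, the response amplitudes, and the cutoff `T_THRESHOLD` are all invented, and a paired t statistic stands in for whatever statistical model a given study actually used.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(condition_a, condition_b):
    """Paired t statistic for one voxel (condition_a minus condition_b)."""
    diffs = [a - b for a, b in zip(condition_a, condition_b)]
    standard_error = stdev(diffs) / sqrt(len(diffs))
    return mean(diffs) / standard_error


def selective_voxels(responses, t_threshold):
    """Return names of voxels whose objects > scrambled t value exceeds the threshold."""
    selected = []
    for name, (objects, scrambled) in responses.items():
        if paired_t(objects, scrambled) > t_threshold:
            selected.append(name)
    return selected


# Synthetic per-block response amplitudes (arbitrary units), invented for illustration.
toy_responses = {
    "voxel_A": ([2.0, 2.2, 1.9, 2.1], [1.0, 1.1, 0.9, 1.2]),  # responds more to objects
    "voxel_B": ([2.0, 1.8, 2.1, 1.9], [2.1, 1.9, 1.9, 2.0]),  # similar response to both
}

T_THRESHOLD = 5.0  # arbitrary illustrative cutoff, not a value from the chapter

print(selective_voxels(toy_responses, T_THRESHOLD))
```

In this toy data set only the first voxel passes the cutoff, which mirrors the idea of labeling a region as object selective when its response to intact objects reliably exceeds its response to scrambled controls. Real analyses typically convert the statistic to a p value and correct for the large number of voxels tested.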


Figure 2.1 Object-, face- and place-selective cortex. (a) Data of one representative subject shown on her partially inflated right hemisphere. Left: lateral view. Right: ventral view. Dark gray: sulci. Light gray: gyri. White lines delineate retinotopic regions. Blue: object-selective regions (objects > scrambled objects), including LO and pFus/OTS ventrally as well as dorsal foci along the intraparietal sulcus (IPS). Red: face-selective regions (faces > objects, body parts, places & words), including two regions in the fusiform gyrus (FFA-1, FFA-2), a region in the inferior occipital gyrus (IOG), and two regions in the posterior superior temporal sulcus (STS). Magenta: overlap between face- and object-selective regions. Green: place-selective regions (places > faces, body parts, objects and words), including the PPA and a dorsal region lateral to the IPS. Yellow: body part selective regions (bodies > other categories). Black: visual word form area (VWFA), words > other categories. All maps thresholded at p < 0.001, voxel level. (b) LO and pFus (but not V1) responses are correlated with recognition performance (Ungerleider et al., 1983; Grill-Spector et al., 2000). To superimpose recognition performance and fMRI signals on the same plot, all values were normalized relative to the maximal response for the 500-ms duration stimulus.



The LOC responds robustly to many kinds of objects and object categories (including novel objects) and is thought to be in the intermediate or high-level stages of the visual hierarchy. Importantly, LOC activations are correlated with subjects’ object recognition performance. High LOC responses correlate with successful object recognition (hits), and low LOC responses correlate with trials in which objects are present, but are not recognized (misses) (see Figure 2.1b). There are also object-selective regions in the dorsal stream (Grill-Spector, 2003; Grill-Spector & Malach, 2004), but these regions do not correlate with object recognition performance (Fang & He, 2005) and may be involved in computations related to visually guided actions toward objects (Culham et al., 2003). However, a comprehensive discussion of the dorsal stream’s role in object perception is beyond the scope of this chapter.

In addition to the LOC, researchers found several ventral regions that show preferential responses to specific object categories. Searching for regions with categorical preference was motivated by reports that suggested that lesions to the ventral stream can produce very specific deficits, such as the inability to recognize faces or the inability to read words, whereas other visual (and recognition) faculties are preserved. By contrasting activations to different kinds of objects, researchers found ventral regions that show higher responses to specific object categories, such as lateral fusiform regions that respond more to animals than tools and medial fusiform regions that respond to tools more than animals (Chao et al., 1999; Martin et al., 1996); a region in the left OTS that responds more strongly to letters than textures (the visual word form area [VWFA]; Cohen et al., 2000); several foci that respond more strongly to faces than other objects (Grill-Spector et al., 2004; Haxby et al., 2000; Hoffman & Haxby, 2000; Kanwisher et al., 1997; Weiner & Grill-Spector, 2012), including the fusiform face areas (FFA-1, FFA-2; Kanwisher et al., 1997; Weiner & Grill-Spector, 2010); regions that respond more strongly to houses and places than faces and objects, including a region in the parahippocampal gyrus, the parahippocampal place area (PPA; Epstein & Kanwisher, 1998); and regions that respond more strongly to body parts than faces and objects, including a region near the MT called the extrastriate body area (EBA; Downing et al., 2001) and a region in the fusiform gyrus, the fusiform body area (FBA; Schwarzlose et al., 2005, or OTS-limbs, Weiner & Grill-Spector, 2011). Nevertheless, many of these object-selective and category-selective regions respond to more than one object category and also respond strongly to object fragments (Grill-Spector et al., 1998b; Lerner et al., 2001, 2008). This suggests that caution is needed when interpreting the nature of the selective responses. It is possible that the underlying representation is of object parts, features, and/or fragments and not of whole objects or object categories.

Findings of category-selective regions in the human brain initiated a fierce debate about the principles of functional organization in the ventral stream. Should one consider only the maximal responses to the preferred category, or do the nonmaximal responses also carry information? How abstract is the information represented in these regions? For example, is category information represented in these regions, or are low-level visual features that are associated with categories represented? I address these questions in detail in the later section, Debates about the Nature of Functional Organization in the Human Ventral Stream.
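The normalization mentioned in the caption of Figure 2.1b, in which recognition performance and fMRI signals are each expressed relative to the value for the 500-ms stimulus so that they can share one axis, can be sketched as follows. This is a minimal illustration with invented numbers: the shorter stimulus durations, the percent-correct values, and the signal amplitudes are all hypothetical, and the Pearson correlation is just one simple way to quantify how closely the two normalized curves track each other.

```python
from math import sqrt


def normalize_to_reference(values_by_duration, reference_duration=500):
    """Divide every value by the value measured at the reference duration (ms)."""
    reference = values_by_duration[reference_duration]
    return {d: v / reference for d, v in values_by_duration.items()}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical values keyed by stimulus duration in ms (not data from the chapter).
percent_correct = {20: 20.0, 40: 45.0, 120: 80.0, 500: 100.0}
region_signal = {20: 0.3, 40: 0.6, 120: 1.0, 500: 1.2}  # percent signal change

norm_performance = normalize_to_reference(percent_correct)
norm_signal = normalize_to_reference(region_signal)

durations = sorted(percent_correct)
r = pearson_r(
    [norm_performance[d] for d in durations],
    [norm_signal[d] for d in durations],
)
print(norm_performance, norm_signal, round(r, 3))
```

After normalization both measures equal 1.0 at 500 ms, so differences in their original units no longer matter and the shapes of the two curves can be compared directly.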

Cue-Invariant Responses in the Lateral Occipital Complex

Although findings of object-selective responses in the human brain were suggestive of the involvement of these regions in processing objects, there are many differences between objects and scrambled objects (or objects and texture patterns). Objects have a shape, surfaces, and contours; they are associated with a meaning and semantic information; and generally are more interesting than texture patterns. Each of these factors may contribute to the higher fMRI response to objects than to controls. Further, differences in low-level visual properties across objects and controls may be driving differences in response amplitudes.

Figure 2.2 Selective responses to objects across multiple visual cues across the lateral occipital complex. Statistical maps of selective responses to objects from luminance, stereo, and motion information in a representative subject. All maps were thresholded at p < 0.005, voxel level, and are shown on the inflated right hemisphere of a representative subject. (a) Luminance objects > scrambled luminance objects. (b) Objects generated from random dot stereograms vs. structureless random dot stereograms (perceived as a cloud of dots). (c) Objects generated from dot motion vs. the same dots moving randomly. Visual meridians are represented by the red (upper), blue (horizontal), and green (lower) lines. White contour: motion-selective region, MT. (Adapted from Vinberg & Grill-Spector, 2008.)

Converging evidence from several studies revealed an important aspect of coding in the LOC: it responds to object shape, not low-level visual features. These studies showed that all LOC subregions (LO and pFus/OTS) respond more strongly when subjects view objects, independently of the type of visual information that defines the object form (Gilaie-Dotan et al., 2002; Grill-Spector et al., 1998a; Kastner et al., 2000; Kourtzi & Kanwisher, 2000, 2001; Vinberg & Grill-Spector, 2008) (Figure 2.2). The LOC responds more strongly to (1) objects defined by luminance compared with luminance textures, (2) objects generated from random dot stereograms compared with structureless random dot stereograms, (3) objects generated from structure from motion relative to random (structureless) motion, and (4) objects generated from textures compared with texture patterns. LOC response to objects is also similar across object format (gray-scale, line drawings, silhouettes), and it responds to objects delineated by both real and illusory contours (Mendola et al., 1999; Stanley & Rubin, 2003). Kourtzi and Kanwisher (2001) also showed that when objects have the same shape but different contours, there is fMRI adaptation (fMRI-A, indicating a common neural substrate), but there is no fMRI-A when the shared contours are identical but the perceived shape is different, suggesting that the LOC responds to global shape rather than local contours (see also Kourtzi et al., 2003; Lerner et al., 2002). Overall, these studies provided fundamental knowledge showing that LOC activation is driven by shape rather than low-level visual information.

More recently, we examined whether LOC response to objects is driven by their global shape or their surfaces and whether LOC subregions are sensitive to border ownership. One open question in object recognition is whether the region in the image that belongs to the object is first segmented from the rest of the image (figure–ground segmentation) and then recognized, or whether knowing the shape of an object aids its segmentation (Nakayama et al., 1995; Peterson & Gibson, 1994a, 1994b). To address these questions, we scanned subjects when they viewed stimuli that were matched for their low-level information but generated different percepts. Conditions included: (1) a flat object in front of a flat background object, (2) a flat surface with a shaped hole (same shape as the object) in front of a flat background, (3) two flat surfaces without shapes, (4) local edges (created by scrambling the object contour) in front of a background, or (5) random dot stimuli with no structure (Vinberg & Grill-Spector, 2008) (Figure 2.3a). Note that conditions 1 and 2 both contain a shape, but only condition 1 contains an object.
We repeated the experiment twice, once with random dots that were presented stereoscopically and once with random dots that moved, to determine whether the pattern of results varied across stereo and motion cues. We found that LOC responses (both LO and pFus/OTS) were higher for objects and shaped holes than for surfaces, local edges, or random stimuli (see Figure 2.3b). We observed these results for both motion and stereo cues. In contrast, LOC responses were not higher for surfaces than for random stimuli and were not higher for local edges than for random stimuli. Thus, adding either local edge information or global surface information does not increase LOC response. However, adding a global shape produces a significant increase in LOC response. These results provide clear evidence that cue-invariant responses in the LOC are driven by object shape, rather than by global surface information or local edge information.

Additional studies revealed that the LOC is also sensitive to border ownership (Appelbaum et al., 2006; Vinberg & Grill-Spector, 2008). Specifically, LO and pFus/OTS responses were higher for objects (shapes presented in the foreground) than for the same shapes when they defined holes in the foreground. Since objects and holes had the same shape, the only difference between the objects and the holes was the border ownership of the contour defining the shape. In the former case, the border belongs to the object, and in the latter case, it belongs to the flat surface in which the hole is punched. Interestingly, this higher response to objects than holes was a unique characteristic of LOC subregions and did not occur in other visual regions (see Figure 2.3). This result suggests that LOC prefers shapes (and contours) when they define the figure region. One implication of this result is that the same cortical machinery determines what the object in the visual input is as well as which region in the visual input is the figure region corresponding to the object.

Neural Bases of Invariant Object Recognition

Figure 2.3 Responses to shape, edges, and surfaces across the ventral stream. (a) Schematic illustration of experimental conditions. Stimuli were generated from either motion or stereo information alone and had no luminance edges or surfaces (except for the screen border, which was present during the entire experiment, including blank baseline blocks). For illustration purposes, darker regions indicate front surfaces. From left to right: Object on the front surface in front of a flat background plane. Shaped hole on the front surface in front of a flat background. Disconnected edges in front of a flat background. Edges were generated by scrambling the shape contours. Surfaces: Two semitransparent flat surfaces at different depths. Random stimuli with no coherent structure, edges, global surfaces, or global shape. Random stimuli had the same relative disparity or depth range as other conditions. See examples of stimuli: jnpstim/. (b) Responses to objects, holes, edges, and global surfaces across the visual ventral processing hierarchy. Responses: mean ± SEM across eight subjects. O: object; H: hole; S: surfaces; E: edges; R: random. Diamonds: significantly different than random at p < 0.05. (Adapted with permission from Vinberg & Grill-Spector, 2008.)

The literature reviewed so far provides accumulating evidence that LOC is involved in processing object form. The next question that one may ask, given the role of the LOC in object perception, is, How does it deal with the variability in objects’ appearance? There are many factors that can affect the appearance of objects. Changes in object appearance can occur as a result of the object being at different locations relative to the observer, which will affect the retinal projection of the object in terms of its size and position. Also, the two-dimensional (2D) projection of a three-dimensional (3D) object on the retina varies considerably owing to changes in its rotation and viewpoint relative to the observer. Other changes in appearance occur because of differential illumination conditions, which affect the object’s color, contrast, and shadowing. Nevertheless, humans are able to recognize objects across large changes in their appearance, which is referred to as invariant object recognition.

A central topic of research in the study of object recognition is understanding how invariant recognition is accomplished. One view suggests that invariant object recognition is accomplished because the underlying neural representations are invariant to the appearance of objects. Thus, there will be similar neural responses even when the appearance of an object changes considerably. One means by which this can be achieved is by extracting from the visual input features or fundamental elements (such as geons; Biederman, 1987) that are relatively insensitive to changes in objects’ appearance. According to one influential model (the recognition by components [RBC] model; Biederman, 1987), objects are represented by a library of geons (that are easy to detect in many viewing conditions) and their spatial relations. Other theories suggest that invariance may be generated through a sequence of computations across a hierarchically organized processing stream in which the level of sensitivity to object transformation decreases from one level of processing to the next. For example, at the lowest level, neurons code local features, and in higher levels of the processing stream, neurons respond to more complex shapes and are less sensitive to changes in position and size (Riesenhuber & Poggio, 1999).

Neuroimaging studies of invariant object recognition found differential sensitivity across the ventral stream to object transformations such as size, position, illumination, and viewpoint. Intermediate regions such as LO show higher sensitivity to image transformations than higher level regions such as pFus/OTS.
Notably, accumulating evidence from many studies suggests that at no point in the ventral stream are neural representations entirely invariant to object transformations. These results support an account in which invariant recognition is supported by a pooled response across neural populations that are sensitive to object transformations. One way in which this can be accomplished is by a neural code that contains independent sensitivity to object information and object transformation (DiCarlo & Cox, 2007). For example, neurons may be sensitive to both object category and object position. As long as the categorical preference is retained across object transformations, invariant object information can be extracted.

Object and Position Information in the Lateral Occipital Complex

Of the object transformations that the recognition system needs to overcome, size and position invariance are thought to be accomplished in part by an increase in the size of neural receptive fields along the visual hierarchy. That is, as one ascends the visual hierarchy, neurons respond to stimuli across a larger part of the visual field. At the same time, a more complex visual stimulus is necessary to elicit significant responses in neurons (e.g., shapes instead of oriented lines). Findings from electrophysiology suggest that even at the highest stages of the visual hierarchy, neurons retain some sensitivity to object location and size, although electrophysiology reports vary significantly about the degree of position sensitivity of IT neurons (DiCarlo & Maunsell, 2003; Op De Beeck & Vogels, 2000; Rolls, 2000). A related issue is whether position sensitivity of neurons in higher visual areas manifests as an orderly, topographic representation of the visual field.

Several studies documented sensitivity to both eccentricity and polar angle in distinct ventral stream regions. Both object-selective and category-selective regions in the ventral stream respond to objects presented at multiple positions and sizes. However, the amplitude of response to an object varies across different retinal positions. The LO, pFus/OTS, and category-selective regions (e.g., FFA, PPA) respond more strongly to objects presented in the contralateral compared with ipsilateral visual field (Grill-Spector et al., 1998b; Hemond et al., 2007; McKyton & Zohary, 2007). Some regions (LO and EBA) also respond more strongly to objects presented in the lower visual field (Sayres & Grill-Spector, 2008; Schwarzlose et al., 2008). Responses also vary with eccentricity: the FFA and the VWFA respond more strongly to centrally presented stimuli, and the PPA responds more strongly to peripherally presented stimuli (Hasson et al., 2002, 2003; Levy et al., 2001; Sayres & Grill-Spector, 2008). Further, more recently, Arcaro and colleagues discovered that the PPA contains two visual field map representations (Arcaro et al., 2009).

Using fMRI-A, my colleagues and I have shown that the pFus/OTS, but not the LO, exhibits some degree of insensitivity to objects’ size and position (Grill-Spector et al., 1999). fMRI-A is a method that allows characterization of the sensitivity of neural representations to stimulus transformations at a subvoxel resolution.
fMRI-A is based on findings from single-unit electrophysiology showing that when objects repeat, there is a stimulus-specific decrease in IT cells’ response to the repeated image, but not to other object images (Miller et al., 1991; Sawamura et al., 2006). Similarly, fMRI signals in higher visual regions show a stimulus-specific reduction (fMRI-A) in response to repetition of identical object images (Grill-Spector et al., 1999, 2006a; Grill-Spector & Malach, 2001). We showed that fMRI-A can be used to test the sensitivity of neural responses to object transformation by adapting cortex with a repeated presentation of an identical stimulus and examining adaptation effects when the stimulus is changed along an object transformation (e.g., changing its position). If the response remains adapted, it indicates that neurons are insensitive to the change. However, if the response returns to the initial level (i.e., recovers from adaptation), it indicates sensitivity to the change (Grill-Spector & Malach, 2001).

Using fMRI-A, we found that repeated presentation of the same face or object at the same position and size produces reduced fMRI activation. This is thought to reflect stimulus-specific neural adaptation. Presenting the same face or object in different positions in the visual field or at different sizes also produces fMRI-A in pFus/OTS and FFA, indicating insensitivity to object size and position in the range we tested (Grill-Spector et al., 1999; see also Vuilleumier et al., 2002). This result is consistent with electrophysiology findings showing that IT neurons that respond similarly to stimuli at different positions in the visual field also show adaptation when the same object is shown in different positions (Lueschow et al., 1994). In contrast, the LO recovered from fMRI-A to images of the same face or object when presented at different sizes or positions. This indicates that the LO is sensitive to object position and size.

Recently, several groups examined the sensitivity of the distributed response across the visual stream to object category and object position (Sayres & Grill-Spector, 2008; Schwarzlose et al., 2008) and also object identity and object position (Eger et al., 2008). These studies used multivoxel pattern analyses (MVPA) and classifier methods developed in machine learning to examine what information is present in the distributed responses across voxels in a cortical region. The distributed response can carry different information from the mean response of a region of interest (ROI) when there is variation across voxel responses.

To examine sensitivity to position information, several studies examined whether distributed response patterns to the same object category (or object exemplar) are the same (or different) when the same stimulus is presented in a different position in the visual field. In MVPA, researchers typically split the data into two independent sets and examine the cross-correlation between the distributed responses to the same (or different) stimulus in the same (or different) position across the two datasets. This gives a measure of the sensitivity of distributed responses to object information and position. When responses are position invariant, there is a high correlation between the distributed responses to the same object category (or exemplar) at different positions. When responses are sensitive to position, there is a low correlation between responses to the same object category (or exemplar) at different positions.

When exemplars from the same object category are shown in the same position in the visual field, LO responses are reliable (or positively correlated).
Surprisingly, showing objects from the same category, but at a different position, significantly reduced the correlation between activation patterns (Figure 2.4, first vs. third bars), and this reduction was larger than changing the object category in the same position (see Figure 2.4, first vs. second bar). Importantly, position and category effects were independent because there were no significant interactions between position and category (all F values < 1.02, all p values > 0.31). Thus, changing both object category and position produced maximal decorrelation between distributed responses (see Figure 2.4, fourth bar).
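The split-half logic described above can be illustrated with a minimal simulation. This is only a sketch under stated assumptions, not the analysis used in the cited studies: the function names (`pearson`, `split_half_correlations`), the number of voxels, and the noise level are hypothetical, and each simulated voxel is assumed to carry an additive category term and a larger additive position term. That assumption is chosen so that the toy output mimics the qualitative pattern reported for the LO, with position changes decorrelating patterns more than category changes and no interaction between the two.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def split_half_correlations(n_voxels=2000, noise_sd=0.5, seed=0):
    """Correlate simulated voxel patterns across two independent data halves."""
    rng = random.Random(seed)
    # Toy assumption: each voxel's response is the sum of a category term and
    # a (larger) position term, plus independent measurement noise.
    category = {c: [rng.gauss(0.0, 0.5) for _ in range(n_voxels)]
                for c in ("faces", "houses")}
    position = {p: [rng.gauss(0.0, 1.0) for _ in range(n_voxels)]
                for p in ("left", "right")}

    def pattern(cat, pos):
        # One noisy measurement of the pattern for a category at a position.
        return [c + p + rng.gauss(0.0, noise_sd)
                for c, p in zip(category[cat], position[pos])]

    first_half = pattern("faces", "left")  # reference pattern from one half
    return {
        "same category, same position":
            pearson(first_half, pattern("faces", "left")),
        "different category, same position":
            pearson(first_half, pattern("houses", "left")),
        "same category, different position":
            pearson(first_half, pattern("faces", "right")),
        "different category, different position":
            pearson(first_half, pattern("houses", "right")),
    }


if __name__ == "__main__":
    for condition, r in split_half_correlations().items():
        print(f"{condition}: r = {r:.2f}")
```

In this toy model a position-invariant region would correspond to setting the position term to zero, in which case the same-category correlation would no longer depend on position.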

Figure 2.4 Mean cross correlations between LO distributed responses across two independent halves of the data for the same or different category at the same or different position in the visual field. Position effects: LO response patterns to the same category were substantially more similar if they were presented at the same position versus different positions (first and third bars, p < 10⁻⁷). Category effects: the mean correlation was higher for same-category response patterns than for different-category response patterns when presented in the same retinotopic position (first two bars; p < 10⁻⁴). Error bars indicate SEM across subjects. (Adapted with permission from Sayres & Grill-Spector, 2008.)

Is the position information in the LO a consequence of an orderly retinotopic map (similar to retinotopic organization in lower visual areas)? By measuring retinotopic maps in the LO using standard traveling wave paradigms (Sayres & Grill-Spector, 2008; Wandell, 1999), we found a continuous mapping of the visual field in the LO in terms of both eccentricity and polar angle. This retinotopic map contained an over-representation of the contralateral and lower visual field (more voxels preferred these visual field positions than ipsilateral and upper visual fields). Although we did not consistently find a single visual field map (a single hemifield or quarterfield representation) in LO, it overlapped the visual map named LO2 (Larsson & Heeger, 2006) and extended inferior to it. This suggests that there is retinotopic information in the LO which explains the position sensitivity found in the MVPA.

A related recent study examined position sensitivity using pattern analysis more broadly across the ventral stream, providing additional evidence for a hierarchical organization across the ventral stream (Schwarzlose et al., 2008). Schwarzlose and colleagues found that distributed responses to a particular object category (faces, body parts, or scenes) were similar across positions in ventral temporal regions (e.g., pFus/OTS and FBA) but changed across positions in occipital regions (e.g., EBA and LO). Thus, accumulating evidence from both fMRI-A and pattern analysis studies suggests a hierarchy of representations in the human ventral stream through which representations become less sensitive to object position as one ascends the visual hierarchy.

Implications for Theories of Object Recognition

It is important to relate imaging results to the concept of position-invariant representations of objects and object categories. What exactly is implied by the term invariance depends on the scientific context. In some instances, this term is taken to reflect a neural representation that is abstracted so as to be independent of viewing conditions. A fully invariant representation, in this meaning of the term, is expected to be completely independent of retinal position information (Biederman & Cooper, 1991). However, in the context of studies of visual cortex, the term is more often considered to be a graded phenomenon, in which neural populations are expected to retain some degree of sensitivity to visual transformations (like position changes) but in which stimulus selectivity is preserved across these transformations (DiCarlo & Cox, 2007; Kobatake & Tanaka, 1994; Rolls & Milward, 2000). In support of this view, a growing literature suggests that maintaining local position information within a distributed neural representation may actually aid invariant recognition in several ways (DiCarlo & Cox, 2007; Dill & Edelman, 2001; Sayres & Grill-Spector, 2008). First, maintaining separable information about position and category may also allow maintaining information about the structural relationships between object parts (Edelman & Intrator, 2000). Second, separable position and object information may provide a robust way to generate position invariance by using a population code. According to this model, objects are represented as manifolds in a high dimensional space spanned by a population of neurons. The separability of position and object information may allow for fast decisions based on linear computations (e.g., linear discriminant functions) to determine the object identity (or category) across positions (see DiCarlo & Cox, 2007). Finally, separable object and position information may allow concurrent localization and recognition of objects, that is, recognizing what the object is and also where it is.
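The idea that separable position and object information permits a linear readout across positions can be sketched with a toy population. Everything below is an illustrative assumption rather than a model taken from the cited work: responses are simulated as an additive sum of a category term and a position term, the discriminant is a simple difference of class means, and all names and parameter values (`held_out_position_accuracy`, 100 neurons, 4 positions) are hypothetical. The sketch trains the discriminant at some positions and tests it at a position it has never seen.

```python
import random


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def simulate_trial(category_term, position_term, noise_sd, rng):
    """One noisy population response with additive category and position terms."""
    return [c + p + rng.gauss(0.0, noise_sd)
            for c, p in zip(category_term, position_term)]


def train_mean_difference(samples):
    """Linear discriminant from class means: returns weights w and bias b."""
    n = len(samples[0][0])
    means = {}
    for label in ("A", "B"):
        vectors = [r for r, lab in samples if lab == label]
        means[label] = [sum(v[i] for v in vectors) / len(vectors)
                        for i in range(n)]
    w = [a - b for a, b in zip(means["A"], means["B"])]
    midpoint = [(a + b) / 2 for a, b in zip(means["A"], means["B"])]
    return w, -dot(w, midpoint)


def held_out_position_accuracy(n_neurons=100, n_positions=4, n_trials=50,
                               noise_sd=1.0, seed=0):
    """Train at all but one position, then classify category at the held-out one."""
    rng = random.Random(seed)
    # Hypothetical tuning: every neuron has its own category and position terms.
    category = {c: [rng.gauss(0.0, 1.0) for _ in range(n_neurons)]
                for c in ("A", "B")}
    position = [[rng.gauss(0.0, 1.0) for _ in range(n_neurons)]
                for _ in range(n_positions)]

    def trials(pos):
        return [(simulate_trial(category[c], position[pos], noise_sd, rng), c)
                for c in ("A", "B") for _ in range(n_trials)]

    train = [s for pos in range(n_positions - 1) for s in trials(pos)]
    w, b = train_mean_difference(train)
    test = trials(n_positions - 1)  # position never seen during training
    correct = sum(("A" if dot(w, r) + b > 0 else "B") == c for r, c in test)
    return correct / len(test)


if __name__ == "__main__":
    print(f"accuracy at held-out position: {held_out_position_accuracy():.2f}")
```

In this sketch no single simulated neuron is position invariant, yet the category can still be read out at a new position, because the position term only shifts the decision variable by an offset that is small relative to the category separation in a large population.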

Evidence for Viewpoint Sensitivity Across the Lateral Occipital Complex

Another source of change in object appearance that merits separate consideration is change across rotation in depth. In contrast to position or size changes, for which invariance may be achieved by a linear transformation, the shape of objects changes with depth rotation. This is because the visual system receives 2D retinal projections of 3D objects. Some theories suggest that view-invariant recognition across object rotations or changes in the observer viewing angle are accomplished by largely view-invariant representations of objects (generalized cylinders, Marr, 1980; the RBC model, Biederman, 1987). That is, the underlying neural representations respond similarly to an object across its views. However, other theories suggest that object representations are view dependent, that is, consist of several 2D views of an object (Bulthoff et al., 1995; Bulthoff & Edelman, 1992; Edelman & Bulthoff, 1992; Poggio & Edelman, 1990; Tarr & Bulthoff, 1995; Ullman, 1989). Invariant object recognition is accomplished by interpolation across these views (Logothetis et al., 1995; Poggio & Edelman, 1990; Ullman, 1989) or by a distributed neural code across view-tuned neurons (Perrett et al., 1998).

Single-unit electrophysiology studies in primates indicate that most neurons in monkey IT cortex are view dependent (Desimone et al., 1984; Logothetis et al., 1995; Perrett, 1996; Vogels & Biederman, 2002; Wang et al., 1996), with a small minority (5–10 percent) of neurons showing view-invariant responses across object rotations (Booth & Rolls, 1998; Logothetis et al., 1995).

In humans, results vary considerably. Short-lagged fMRI-A experiments, in which the test stimulus is presented immediately after the adapting stimulus (Grill-Spector et al., 2006a), suggest that object representations in the lateral occipital complex are view dependent (Fang et al., 2007; Gauthier et al., 2002; Grill-Spector et al., 1999; but see Valyear et al., 2006). However, long-lagged fMRI-A experiments, in which many intervening stimuli occur between the test and adapting stimulus (Grill-Spector et al., 2006a), have provided some evidence for view-invariant representations in the ventral LOC, especially in the left hemisphere (James et al., 2002; Vuilleumier et al., 2002) and the PPA (Epstein et al., 2008). Also, a recent study showed that the distributed LOC responses to objects remained stable across 60-degree rotations (Eger et al., 2008). Presently, there is no consensus across experimental findings in the degree to which ventral stream representations are view dependent or view invariant. These variable results may reflect differences in the neural representations depending on object category and cortical region, or methodological differences across studies (e.g., level of object rotation and fMRI-A paradigm used).
To address these differential findings, in a recent study we used a parametric approach to investigate sensitivity to object rotation and used a computational model to link putative neural tuning and resultant fMRI measurements (Andresen et al., 2009). The parametric approach allows a richer characterization of rotation sensitivity because it measures the degree of sensitivity to rotations rather than characterizing representations as one of two possible alternatives: “invariant” or “not invariant.” We used fMRI-A to measure viewpoint sensitivity as a function of the rotation level for two object categories: animals and vehicles. Overall, we found sensitivity to object rotation in the LOC. However, there were differences across categories and regions. First, there was higher sensitivity to vehicle rotation than animal rotation. Rotations of 60 degrees produced a complete recovery from adaptation for vehicles, but rotations of 120 degrees were necessary to produce recovery from adaptation for animals (Figure 2.5). Second, we found evidence for over-representation of the front view of animals in the right pFus/OTS: its responses to animals were higher for the front view than the back view (compare black and gray circles in Figure 2.5b, right). In addition, fMRI-A effects across rotation varied according to the adapting view (see Figure 2.5b, right). When adapting with the back view of animals, we found recovery from adaptation for rotations of 120 degrees or larger, but when adapting with the front view of animals, there was no significant recovery from adaptation across rotations. One interpretation is that there is less sensitivity to rotation when adapting with front views than back views of animals. However, subjects’ behavioral performance in a discrimination task across object rotations showed that they are equally sensitive to rotations (performance decreases with rotation level) whether rotations are relative to the front or back of an animal (Andresen et al., 2009), suggesting that this interpretation is unlikely. Alternatively, the apparent rotation cross-adaptation may be due to lower responses for back views of animals. That is, the apparent adaptation across rotation from the front view to the back view is driven by lower responses to the back view rather than adaptation across 180-degree rotations.

Figure 2.5 LO and pFus/OTS responses during fMRI-A experiments of rotation sensitivity. Each line represents response after adapting with a front (dashed black) or back (solid gray) view of an object. The nonadapted response is indicated by diamonds (black for front view and gray for back view). The open circles indicate significant adaptation, lower than nonadapted, p < 0.05, paired t-test across subjects. (a) Vehicle data. (b) Animal data. Responses are plotted relative to a blank fixation baseline. Error bars indicate SEM across eight subjects. (Adapted with permission from Andresen, Vinberg, & Grill-Spector, 2009.)

To better characterize the underlying representations and examine which representations may lead to our observed results, we simulated putative neural responses in a voxel and predicted the resultant blood oxygen level dependent (BOLD) responses. In the model, each voxel contains a mixture of neural populations, each tuned to a different object view (Andresen et al., 2009) (Figure 2.6). BOLD responses were modeled to be proportional to the sum of responses across all neural populations. We simulated the BOLD responses in fMRI-A experiments. Results of the simulations indicate that two main parameters affected the pattern of fMRI data: (1) the view tuning width of the neural population and (2) the proportion of neurons in a voxel that prefer a specific object view.

Figure 2.6a shows the response characteristics of a model of a putative voxel containing a mixture of view-dependent neural populations tuned to different object views, in which the distribution of neurons tuned to different object views is uniform. In this model, narrower neural tuning to object view (left) results in recovery from fMRI-A for smaller rotations than wider view tuning (right). Responses to front and back views are identical when there is no adaptation (see Figure 2.6a, diamonds), and the pattern of adaptation as a function of rotation is similar when adapting with the front or back views (see Figure 2.6a). Such a model provides an account of responses to vehicles across object-selective cortex (as measured with fMRI), and for animals in the LO. Thus, this model suggests that the difference between the representation of animals and vehicles in the LO is likely due to narrower population view tuning for vehicles than animals (a tuning width of σ < 40° produces complete recovery from adaptation for rotations larger than 60 degrees, as observed for vehicles).

Figure 2.6b shows simulation results when there is a prevalence of neurons tuned to the front view of objects. This simulation shows higher BOLD responses to frontal views without adaptation (gray vs. black diamonds) and a flatter profile of fMRI-A across rotations when adapting with the front view. These simulation results are consistent with our observations in pFus/OTS and indicate that the differential recovery from adaptation as a function of the adapting animal view may be a consequence of a larger neural population tuned to front views of animals.
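The voxel model described above can be approximated in a few lines. This is a minimal sketch under stated assumptions, not the published implementation: the text specifies six Gaussian-tuned populations spaced 60° apart and a BOLD response proportional to the summed population response, but it does not give the adaptation rule, so the sketch assumes that each population's gain is reduced in proportion to its response to the adapting view (the hypothetical `adapt_strength` parameter). The front-view bias is likewise modeled hypothetically by raising only the weight of the front-tuned population.

```python
import math

PREFERRED_VIEWS = (0, 60, 120, 180, 240, 300)  # degrees; 0 = front, 180 = back
ROTATIONS = (0, 60, 120, 180)                  # rotation away from the adapter


def tuning(preferred, view, sigma):
    """Gaussian view tuning on the viewing circle (width sigma, in degrees)."""
    d = abs(preferred - view) % 360
    d = min(d, 360 - d)
    return math.exp(-d ** 2 / (2 * sigma ** 2))


def bold(view, sigma, weights, adapt_view=None, adapt_strength=0.5):
    """BOLD response modeled as the weighted sum of population responses."""
    total = 0.0
    for preferred, weight in zip(PREFERRED_VIEWS, weights):
        r = tuning(preferred, view, sigma)
        if adapt_view is not None:
            # Assumed rule: gain drops in proportion to the population's
            # response to the adapting view.
            r *= 1.0 - adapt_strength * tuning(preferred, adapt_view, sigma)
        total += weight * r
    return total


def adapted_profile(adapt_view, sigma, weights):
    """Adapted BOLD response at each rotation away from the adapting view."""
    return [bold((adapt_view + rot) % 360, sigma, weights, adapt_view)
            for rot in ROTATIONS]


def recovery_ratio(adapt_view, rotation, sigma, weights):
    """Adapted response divided by non-adapted response (1.0 = full recovery)."""
    view = (adapt_view + rotation) % 360
    return bold(view, sigma, weights, adapt_view) / bold(view, sigma, weights)


if __name__ == "__main__":
    uniform = [1.0] * 6
    front_biased = [1.4, 1.0, 1.0, 1.0, 1.0, 1.0]  # hypothetical front bias
    for sigma in (20, 60):
        print(f"sigma={sigma}: recovery at 60 deg = "
              f"{recovery_ratio(0, 60, sigma, uniform):.2f}")
    print("adapt front:", [round(v, 2) for v in adapted_profile(0, 60, front_biased)])
    print("adapt back: ", [round(v, 2) for v in adapted_profile(180, 60, front_biased)])
```

With uniform weights the sketch reproduces the qualitative behavior described for Figure 2.6a: narrow tuning recovers almost fully at a 60-degree rotation, wide tuning does not, and the front and back adaptation profiles are identical. With the front-biased weights it gives a higher non-adapted response to the front view and a flatter adapted profile when adapting with the front view, as described for Figure 2.6b.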

Implications for Theories of Object Recognition

Figure 2.6 Simulations predicting fMRI responses of putative voxels containing a mixture of view-dependent neural populations. Left: schematic illustration of the view tuning and distribution of neural populations tuned to different views in a voxel. Right: result of model simulations illustrating the predicted fMRI-A data. In all panels, the model includes six Gaussians tuned to specific views around the viewing circle, separated 60° apart. Across columns, the view tuning width varies; across rows, the distribution of neural populations preferring specific views varies. Diamonds, responses without adaptation; black, back view; gray, front view; lines, response after adaptation with a front view (dashed gray line) or back view (solid black line). (a) Mixture of view-dependent neural populations that are equally distributed in a voxel. Narrower tuning (left) shows recovery from fMRI-A for smaller rotations than wider view tuning (right). This model predicts the same pattern of recovery from adaptation when adapting with the front or back view. (b) Mixture of view-dependent neural populations in a voxel with a higher proportion of neurons that prefer the front view. The number on the right indicates the ratio between the percentages of neurons tuned to the front vs. back view. Top row: ratio = 1.2; bottom row: ratio = 1.4. Because there are more neurons tuned to the front view in this model, it predicts higher BOLD responses to frontal views without adaptation (gray vs. black diamonds) and a flatter profile of fMRI-A across rotations when adapting with the front view. (Adapted with permission from Andresen, Vinberg, & Grill-Spector, 2009.)

Overall, recent results provide empirical evidence for view-dependent object representation across human object-selective cortex that is evident both with standard fMRI and fMRI-A measurements. These data provide important empirical constraints for theories of object recognition and highlight the importance of parametric manipulations for capturing neural selectivity to any type of stimulus transformation. These findings also generate new questions. For example, if there is no view-invariant neural representation in the human ventral temporal cortex, how is view-invariant object recognition accomplished? One possibility is that view-invariant recognition is achieved by utilizing a population code across neurons, where each neuron itself is not view invariant, but the responses of the population to views of an object are separable from its responses to views of other objects (Perrett et al., 1998; DiCarlo & Cox, 2007). Thus, the distributed pattern of responses across neurons separates views of one object from views of another object. Another possibility is that downstream neurons in the anterior temporal lobe or prefrontal cortex read out the information from ventral temporal cortex, and these downstream neurons contain view-invariant representations supporting behavior (Freedman et al., 2003, 2008; Quiroga et al., 2005, 2008).

Debates about the Nature of Functional Organization in the Human Ventral Stream

So far, we have considered general computational principles that are required by any object recognition system. Nevertheless, it is possible that some object classes or domains require specialized computations. The rest of this chapter examines functional specialization in the ventral stream that may be linked to these putative “domain-specific” computations.

As illustrated in Figure 2.1, several regions in the ventral stream exhibit higher responses to particular object categories such as places, faces, and body parts compared with other object categories. Findings of category-selective regions initiated a fierce debate about the principles of functional organization in the ventral stream. Are there regions in the cortex that are specialized for any object category? Is there something special about computations relevant to specific categories that generates specialized cortical regions for these computations? That is, perhaps some general processing is applied to all objects, but some computations may be specific to certain domains and may require additional brain resources.

In explaining the pattern of functional selectivity in the ventral stream, four prominent views have emerged. The main debate centers on the question of whether regions that elicit maximal response for a category should be treated as a module for the representation of that category, or whether they are part of a more general object recognition system.

Handful of Category-Specific Modules and a General Purpose Region for Processing All Other Objects

Kanwisher and coworkers (Kanwisher, 2000; Op de Beeck et al., 2008) suggested that the ventral temporal cortex contains a limited number of modules specialized for the recognition of special object categories such as faces (in the FFA), places (in the PPA), and body parts (in the EBA and FBA). The remaining object-selective cortex (LOC), which shows little selectivity for particular object categories, is a general-purpose mechanism for perceiving any kind of visually presented object or shape. The underlying hypothesis is that there are a few “domain-specific modules” that perform computations specific to these classes of stimuli beyond what would be required from a general object recognition system. For example, faces, like other objects, need to be recognized across variations in their appearance (a domain-general process). However, given the importance of face processing for social interactions, there are aspects of face processing that are unique. Specialized face processing may include identifying faces at the individual level (e.g., John vs. Harry), extracting gender information, gaze, expression, and so forth. These unique face-related computations may be implemented in face-selective regions.

Process Maps

Tarr and Gauthier (2000) proposed that object representations are clustered according to the type of processing that is required, rather than according to their visual attributes. It is possible that different levels of processing may require dedicated computations that are performed in localized cortical regions. For example, faces are usually recognized at the individual level (e.g., “Bob Jacobs”), but many objects are typically recognized at the category level (e.g., “a horse”). Following this reasoning, and evidence that objects of expertise activate the FFA more than other objects (Gauthier et al., 1999, 2000), Gauthier, Tarr, and their colleagues have suggested that the FFA is a region for subordinate identification of any object category that is automated by expertise (Gauthier et al., 1999, 2000; Tarr & Gauthier, 2000).

Distributed Object Form Topography

Haxby et al. (2001) posited an “object form topography” in which occipito-temporal cortex contains a topographically organized representation of shape attributes. The representation of an object is reflected by a distinct pattern of response across all ventral cortex, and this distributed activation produces the visual perception. Haxby et al. showed that the activation patterns for eight object categories were replicable, and that the response to a given category could be determined by the distributed pattern of activation across the ventral temporal cortex. Further, they showed that it is possible to predict what object category subjects viewed even when regions that show maximal activation to a particular category (e.g., the FFA) were excluded (Haxby et al., 2001). Thus, this model suggests that the ventral temporal cortex represents object category information in an overlapping and distributed fashion.

This view is appealing for two reasons. First, a distributed code is a combinatorial code that allows representation of a large number of object categories. Given Biederman’s rough estimate that humans can recognize about 30,000 categories (Biederman, 1987), this provides a neural substrate that has the capacity to represent such a large number of categories. Second, this model posited a provocative view that when considering information in the ventral stream, one needs to consider the weak signals as much as the strong signals because both convey useful information.
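The logic of reading out category from a distributed pattern can be illustrated with a small simulation. The sketch below is in the spirit of a split-half correlation analysis but is not the analysis reported by Haxby et al.: the data are synthetic random patterns, and the function name `classify_by_correlation`, the noise level, and the rule used to exclude voxels are illustrative assumptions.

```python
import numpy as np

def classify_by_correlation(train, test, voxel_mask=None):
    """Assign each test pattern to the training pattern it correlates with best.

    train, test: arrays of shape (n_categories, n_voxels), one mean pattern per
    category from two independent halves of the data.
    voxel_mask: optional boolean array selecting the voxels to use.
    """
    if voxel_mask is not None:
        train, test = train[:, voxel_mask], test[:, voxel_mask]
    n = train.shape[0]
    # Rows/columns 0..n-1 are training patterns, n..2n-1 are test patterns.
    corr = np.corrcoef(train, test)[:n, n:]
    return np.argmax(corr, axis=0)

# Synthetic data only: random category "prototypes" plus independent noise.
rng = np.random.default_rng(0)
n_categories, n_voxels = 8, 200
prototypes = rng.normal(size=(n_categories, n_voxels))
half1 = prototypes + 0.5 * rng.normal(size=prototypes.shape)
half2 = prototypes + 0.5 * rng.normal(size=prototypes.shape)

labels = np.arange(n_categories)
acc_all = np.mean(classify_by_correlation(half1, half2) == labels)

# Exclude voxels whose maximal response (in the first half) is to category 0,
# loosely analogous to removing a category's maximally responsive region.
not_max_for_0 = np.argmax(half1, axis=0) != 0
acc_excl = np.mean(classify_by_correlation(half1, half2, not_max_for_0) == labels)
print(acc_all, acc_excl)
```

In this toy setting, category identity remains decodable after the voxels that respond maximally to one category are removed, because the remaining voxels still carry a reproducible pattern. Whether the brain itself reads out such patterns is a separate question.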

Sparsely Distributed Representations of Faces and Body Parts

Recently, using high-resolution fMRI (HR-fMRI), we reported a series of alternating face- and limb-selective activations that are arranged in a consistent spatial organization relative to each other as well as to retinotopic regions and hMT+ (Weiner & Grill-Spector, 2010, 2011, 2013). Specifically, our data illustrate that there is not just one distinct region selective for each category (i.e., a single FFA or FBA) in the ventral temporal cortex, but rather a series of face- and limb-selective clusters that minimally overlap, with a consistent organization relative to one another on a posterior-to-anterior axis on the OTS and fusiform gyrus (FG). Our data also show an interaction between localized cortical clusters and distributed responses across voxels outside these clusters. Our results further illustrate that even in weakly selective voxels outside of these clusters, the distributed responses for faces and limbs are distinct from one another. Nevertheless, there is significantly more face information in the distributed responses in weakly and highly selective voxels compared with nonselective voxels, indicating that these subsets of voxels carry different amounts of information.

These data suggest a fourth model, a sparsely distributed organization in the ventral temporal cortex, which mediates the debate between modular and distributed theories of object representation. Sparsely refers to the presence of several face- and limb-selective clusters with a distinct, minimally overlapping organization, and distributed refers to the presence of information in weakly and nonselective voxels outside of these clusters. This sparsely distributed organization is supported by recent cortical connectivity studies indicating a hybrid modular and distributed organization (Borra et al., 2010; Zangenehpour & Chaudhuri, 2005), as well as theoretical work on sparse distributed networks (Kanerva, 1988).

Presently, there is no consensus in the field about which account best explains ventral stream functional organization.
Much of the debate centers on the degree to which object processing is constrained to discrete modules or involves distributed computations across large stretches of the ventral stream (Op de Beeck et al., 2008). The debate is both about the spatial scale on which computations for object recognition occur and about the fundamental principles that underlie specialization in the ventral stream. On the one hand, domain-specific theories need to address findings of multiple foci that show selectivity. For example, there are multiple foci in the ventral stream that respond more strongly to faces versus objects. Thus, a strong modular account of a single “face module” for face recognition is unlikely. Second, the spatial extent of these putative modules is undetermined, and it is unclear whether each of these category-selective regions corresponds to a visual area. On the other hand, a very distributed and overlapping account of object representation in the ventral stream suffers from the potential problem that in order to resolve category information, the brain may need to read out information present across the entire ventral stream (which is inefficient). Further, the fact that there is information in the distributed response does not mean that the brain uses the information in the same way that an independent classifier does. It is possible that activation in localized regions is more informative for perceptual decisions than the information available across the entire ventral stream (Grill-Spector et al., 2004; Williams et al., 2007). For example, FFA responses predict when subjects recognize faces and birds, but do not predict when subjects recognize houses, guitars, or flowers (Grill-Spector et al., 2004).

The recent sparsely distributed model we proposed attempts to bridge between the extreme modular views and highly distributed and overlapping views of organization of the ventral temporal cortex. One particular appeal of this view is that it is closely tied to the measurements and allows for additional clusters to be incorporated into the model. As scanning resolutions improve for human fMRI studies, the number of clusters is likely to increase, but the alternating nature of face and limb representations is likely to remain in adjacent activations, as also suggested by monkey fMRI (Pinsk et al., 2009).

Open Questions and Future Directions

In sum, neuroimaging research has advanced our understanding of object representations in the human brain. These studies have identified regions involved in object recognition, and have laid fundamental stepping stones in understanding the neural mechanisms underlying invariant object recognition.

However, many questions remain. First, what is the relationship between neural sensitivity to object transformations and behavioral sensitivity to object transformations? Do biases in neural representations produce biases in performance? For example, empirical evidence shows over-representation of the lower visual field in LO. Does this lead to better recognition in the lower than upper visual field? Second, what information does the visual system use to build invariant object representations? Third, what computations are implemented in distinct cortical regions involved in object recognition? Does the “aha” moment in recognition involve a specific response in a particular brain region, or does it involve a distributed response across a large cortical expanse? Combining experimental methods such as fMRI and EEG will provide high spatial and temporal resolution, which is critical to addressing this question. Fourth, why do representations of a few categories such as faces or body parts yield local clustered activations while many other categories (e.g., manmade objects) produce more diffuse and less intense responses across the ventral temporal cortex? Fifth, what is the pattern of connectivity between ventral stream visual regions in the human brain? Although the connectivity in monkey visual cortex has been extensively explored (Moeller et al., 2008; Van Essen et al., 1990), there is little knowledge about connectivity between cortical visual areas in the human ventral stream. This knowledge is necessary for building a model of hierarchical processing in humans and any neural network model of object recognition.

Future directions that combine methodologies, such as psychophysics with fMRI, EEG with fMRI, or diffusion tensor imaging with fMRI, will be instrumental in addressing these fundamental questions.

Acknowledgements

I thank David Andresen, Rory Sayres, Joakim Vinberg, and Kevin Weiner for their contributions to the research summarized in this chapter. This work was supported by NSF grant and NEI grant.


References

Andresen, D. R., Vinberg, J., & Grill-Spector, K. (2009). The representation of object viewpoint in the human visual cortex. NeuroImage, 45, 522–536.
Appelbaum, L. G., Wade, A. R., Vildavski, V. Y., Pettet, M. W., & Norcia, A. M. (2006). Cue-invariant networks for figure and background processing in human visual cortex. Journal of Neuroscience, 26, 11695–11708.
Bar, M., Tootell, R. B., Schacter, D. L., Greve, D. N., Fischl, B., Mendola, J. D., Rosen, B. R., & Dale, A. M. (2001). Cortical mechanisms specific to explicit visual object recognition. Neuron, 29, 529–535.
Biederman, I. (1987). Recognition-by-components: A theory of human image understanding. Psychological Review, 94, 115–147.
Biederman, I., & Cooper, E. E. (1991). Evidence for complete translational and reflectional invariance in visual object priming. Perception, 20, 585–593.
Borra, E., Ichinohe, N., Sato, T., Tanifuji, M., & Rockland, K. S. (2010). Cortical connections to area TE in monkey: Hybrid modular and distributed organization. Cerebral Cortex, 20 (2), 257–270.
Booth, M. C., & Rolls, E. T. (1998). View-invariant representations of familiar objects by neurons in the inferior temporal visual cortex. Cerebral Cortex, 8, 510–523.
Bulthoff, H. H., & Edelman, S. (1992). Psychophysical support for a two-dimensional view interpolation theory of object recognition. Proceedings of the National Academy of Sciences U S A, 89, 60–64.
Bulthoff, H. H., Edelman, S. Y., & Tarr, M. J. (1995). How are three-dimensional objects represented in the brain? Cerebral Cortex, 5, 247–260.
Chao, L. L., Haxby, J. V., & Martin, A. (1999). Attribute-based neural substrates in temporal cortex for perceiving and knowing about objects. Nature Neuroscience, 2, 913–919.
Cohen, L., Dehaene, S., Naccache, L., Lehericy, S., Dehaene-Lambertz, G., Henaff, M. A., & Michel, F. (2000). The visual word form area: Spatial and temporal characterization of an initial stage of reading in normal subjects and posterior split-brain patients. Brain, 123 (2), 291–307.
Culham, J. C., Danckert, S. L., DeSouza, J. F., Gati, J. S., Menon, R. S., & Goodale, M. A. (2003). Visually guided grasping produces fMRI activation in dorsal but not ventral stream brain areas. Experimental Brain Research, 153, 180–189.
Desimone, R., Albright, T. D., Gross, C. G., & Bruce, C. (1984). Stimulus-selective properties of inferior temporal neurons in the macaque. Journal of Neuroscience, 4, 2051–2062.


DiCarlo, J. J., & Cox, D. D. (2007). Untangling invariant object recognition. Trends in Cognitive Science, 11, 333–341.
DiCarlo, J. J., & Maunsell, J. H. (2003). Anterior inferotemporal neurons of monkeys engaged in object recognition can be highly sensitive to object retinal position. Journal of Neurophysiology, 89, 3264–3278.
Dill, M., & Edelman, S. (2001). Imperfect invariance to object translation in the discrimination of complex shapes. Perception, 30, 707–724.
Downing, P. E., Jiang, Y., Shuman, M., & Kanwisher, N. (2001). A cortical area selective for visual processing of the human body. Science, 293, 2470–2473.
Edelman, S., & Bulthoff, H. H. (1992). Orientation dependence in the recognition of familiar and novel views of three-dimensional objects. Vision Research, 32, 2385–2400.
Edelman, S., & Intrator, N. (2000). (Coarse coding of shape fragments) + (retinotopy) approximately = representation of structure. Spatial Vision, 13, 255–264.
Eger, E., Ashburner, J., Haynes, J. D., Dolan, R. J., & Rees, G. (2008). fMRI activity patterns in human LOC carry information about object exemplars within category. Journal of Cognitive Neuroscience, 20, 356–370.
Epstein, R., & Kanwisher, N. (1998). A cortical representation of the local visual environment. Nature, 392, 598–601.
Epstein, R. A., Parker, W. E., & Feiler, A. M. (2008). Two kinds of fMRI repetition suppression? Evidence for dissociable neural mechanisms. Journal of Neurophysiology, 99 (6), 2877–2886.
Fang, F., & He, S. (2005). Cortical responses to invisible objects in the human dorsal and ventral pathways. Nature Neuroscience, 8, 1380–1385.
Farah, M. J. (1995). Visual agnosia. Cambridge, MA: MIT Press.
Freedman, D. J., Riesenhuber, M., Poggio, T., & Miller, E. K. (2001). Categorical representation of visual stimuli in the primate prefrontal cortex. Science, 291, 312–316.
Freedman, D. J., Riesenhuber, M., Poggio, T., & Miller, E. K. (2003). A comparison of primate prefrontal and inferior temporal cortices during visual categorization. Journal of Neuroscience, 23, 5235–5246.
Fujita, I., Tanaka, K., Ito, M., & Cheng, K. (1992). Columns for visual features of objects in monkey inferotemporal cortex. Nature, 360, 343–346.
Gauthier, I., Skudlarski, P., Gore, J. C., & Anderson, A. W. (2000). Expertise for cars and birds recruits brain areas involved in face recognition. Nature Neuroscience, 3, 191–197.


Gauthier, I., Tarr, M. J., Anderson, A. W., Skudlarski, P., & Gore, J. C. (1999). Activation of the middle fusiform “face area” increases with expertise in recognizing novel objects. Nature Neuroscience, 2, 568–573.
Gilaie-Dotan, S., Ullman, S., Kushnir, T., & Malach, R. (2002). Shape-selective stereo processing in human object-related visual areas. Human Brain Mapping, 15, 67–79.
Grill-Spector, K. (2003). The neural basis of object perception. Current Opinion in Neurobiology, 13, 159–166.
Grill-Spector, K., Golarai, G., & Gabrieli, J. (2008). Developmental neuroimaging of the human ventral visual cortex. Trends in Cognitive Science, 12, 152–162.
Grill-Spector, K., Henson, R., & Martin, A. (2006a). Repetition and the brain: Neural models of stimulus-specific effects. Trends in Cognitive Science, 10, 14–23.
Grill-Spector, K., Knouf, N., & Kanwisher, N. (2004). The fusiform face area subserves face perception, not generic within-category identification. Nature Neuroscience, 7, 555–562.
Grill-Spector, K., Kushnir, T., Edelman, S., Avidan, G., Itzchak, Y., & Malach, R. (1999). Differential processing of objects under various viewing conditions in the human lateral occipital complex. Neuron, 24, 187–203.
Grill-Spector, K., Kushnir, T., Edelman, S., Itzchak, Y., & Malach, R. (1998a). Cue-invariant activation in object-related areas of the human occipital lobe. Neuron, 21, 191–202.
Grill-Spector, K., Kushnir, T., Hendler, T., Edelman, S., Itzchak, Y., & Malach, R. (1998b). A sequence of object-processing stages revealed by fMRI in the human occipital lobe. Human Brain Mapping, 6, 316–328.
Grill-Spector, K., Kushnir, T., Hendler, T., & Malach, R. (2000). The dynamics of object-selective activation correlate with recognition performance in humans. Nature Neuroscience, 3, 837–843.
Grill-Spector, K., & Malach, R. (2001). fMR-adaptation: A tool for studying the functional properties of human cortical neurons. Acta Psychologica (Amst), 107, 293–321.
Grill-Spector, K., & Malach, R. (2004). The human visual cortex. Annual Review of Neuroscience, 27, 649–677.
Hasson, U., Harel, M., Levy, I., & Malach, R. (2003). Large-scale mirror-symmetry organization of human occipito-temporal object areas. Neuron, 37, 1027–1041.
Hasson, U., Levy, I., Behrmann, M., Hendler, T., & Malach, R. (2002). Eccentricity bias as an organizing principle for human high-order object areas. Neuron, 34, 479–490.


Haxby, J. V., Gobbini, M. I., Furey, M. L., Ishai, A., Schouten, J. L., & Pietrini, P. (2001). Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293, 2425–2430.
Haxby, J. V., Hoffman, E. A., & Gobbini, M. I. (2000). The distributed human neural system for face perception. Trends in Cognitive Science, 4, 223–233.
Hemond, C. C., Kanwisher, N. G., & Op de Beeck, H. P. (2007). A preference for contralateral stimuli in human object- and face-selective cortex. PLoS ONE, 2, e574.
James, T. W., Humphrey, G. K., Gati, J. S., Menon, R. S., & Goodale, M. A. (2002). Differential effects of viewpoint on object-driven activation in dorsal and ventral streams. Neuron, 35, 793–801.
Johnson, M. H. (2001). Functional brain development in humans. Nature Reviews, Neuroscience, 2, 475–483.
Kanwisher, N. (2000). Domain specificity in face perception. Nature Neuroscience, 3, 759–763.
Kanwisher, N., McDermott, J., & Chun, M. M. (1997). The fusiform face area: A module in human extrastriate cortex specialized for face perception. Journal of Neuroscience, 17, 4302–4311.
Kastner, S., De Weerd, P., & Ungerleider, L. G. (2000). Texture segregation in the human visual cortex: A functional MRI study. Journal of Neurophysiology, 83, 2453–2457.
Kobatake, E., & Tanaka, K. (1994). Neuronal selectivities to complex object features in the ventral visual pathway of the macaque cerebral cortex. Journal of Neurophysiology, 71, 856–867.
Kourtzi, Z., & Kanwisher, N. (2000). Cortical regions involved in perceiving object shape. Journal of Neuroscience, 20, 3310–3318.
Kourtzi, Z., & Kanwisher, N. (2001). Representation of perceived object shape by the human lateral occipital complex. Science, 293, 1506–1509.
Kourtzi, Z., Tolias, A. S., Altmann, C. F., Augath, M., & Logothetis, N. K. (2003). Integration of local features into global shapes: Monkey and human fMRI studies. Neuron, 37, 333–346.
Larsson, J., & Heeger, D. J. (2006). Two retinotopic visual areas in human lateral occipital cortex. Journal of Neuroscience, 26, 13128–13142.
Lerner, Y., Epshtein, B., Ullman, S., & Malach, R. (2008). Class information predicts activation by object fragments in human object areas. Journal of Cognitive Neuroscience, 20, 1189–1206.


Lerner, Y., Hendler, T., Ben-Bashat, D., Harel, M., & Malach, R. (2001). A hierarchical axis of object processing stages in the human visual cortex. Cerebral Cortex, 11, 287–297.
Lerner, Y., Hendler, T., & Malach, R. (2002). Object-completion effects in the human lateral occipital complex. Cerebral Cortex, 12, 163–177.
Levy, I., Hasson, U., Avidan, G., Hendler, T., & Malach, R. (2001). Center-periphery organization of human object areas. Nature Neuroscience, 4, 533–539.
Logothetis, N. K., Pauls, J., & Poggio, T. (1995). Shape representation in the inferior temporal cortex of monkeys. Current Biology, 5, 552–563.
Malach, R., Levy, I., & Hasson, U. (2002). The topography of high-order human object areas. Trends in Cognitive Science, 6, 176–184.
Malach, R., Reppas, J. B., Benson, R. R., Kwong, K. K., Jiang, H., Kennedy, W. A., Ledden, P. J., Brady, T. J., Rosen, B. R., & Tootell, R. B. (1995). Object-related activity revealed by functional magnetic resonance imaging in human occipital cortex. Proceedings of the National Academy of Sciences U S A, 92, 8135–8139.
Marr, D. (1980). Visual information processing: The structure and creation of visual representations. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 290, 199–218.
Martin, A., Wiggs, C. L., Ungerleider, L. G., & Haxby, J. V. (1996). Neural correlates of category-specific knowledge. Nature, 379, 649–652.
McKyton, A., & Zohary, E. (2007). Beyond retinotopic mapping: The spatial representation of objects in the human lateral occipital complex. Cerebral Cortex, 17, 1164–1172.
Mendola, J. D., Dale, A. M., Fischl, B., Liu, A. K., & Tootell, R. B. (1999). The representation of illusory and real contours in human cortical visual areas revealed by functional magnetic resonance imaging. Journal of Neuroscience, 19, 8560–8572.
Miller, E. K., Li, L., & Desimone, R. (1991). A neural mechanism for working and recognition memory in inferior temporal cortex. Science, 254, 1377–1379.
Moeller, S., Freiwald, W. A., & Tsao, D. Y. (2008). Patches with links: A unified system for processing faces in the macaque temporal lobe. Science, 320, 1355–1359.
Nakayama, K., He, Z. J., & Shimojo, S. (1995). Visual surface representation: A critical link between low-level and high-level vision. In S. M. Kosslyn & D. N. Osherson (Eds.), An invitation to cognitive sciences: Visual cognition. Cambridge, MA: MIT Press.
Op de Beeck, H. P., Haushofer, J., & Kanwisher, N. G. (2008). Interpreting fMRI data: Maps, modules and dimensions. Nature Reviews, Neuroscience, 9, 123–135.
Op De Beeck, H., & Vogels, R. (2000). Spatial sensitivity of macaque inferior temporal neurons. Journal of Comparative Neurology, 426, 505–518.

Perrett, D. I. (1996). View-dependent coding in the ventral stream and its consequence for recognition. In R. Camaniti, K. P. Hoffmann, & A. J. Lacquaniti (Eds.), Vision and movement mechanisms in the cerebral cortex (pp. 142–151). Strasbourg: HFSP.
Perrett, D. I., Oram, M. W., & Ashbridge, E. (1998). Evidence accumulation in cell populations responsive to faces: An account of generalisation of recognition without mental transformations. Cognition, 67, 111–145.
Peterson, M. A., & Gibson, B. S. (1994a). Must shape recognition follow figure-ground organization? An assumption in peril. Psychological Science, 5, 253–259.
Peterson, M. A., & Gibson, B. S. (1994b). Object recognition contributions to figure-ground organization: Operations on outlines and subjective contours. Perception and Psychophysics, 56, 551–564.
Pinsk, M. A., Arcaro, M., Weiner, K. S., Kalkus, J. F., Inati, S. J., Gross, C. G., & Kastner, S. (2009). Neural representations of faces and body parts in macaque and human cortex: A comparative fMRI study. Journal of Neurophysiology, (5), 2581–2600.
Poggio, T., & Edelman, S. (1990). A network that learns to recognize three-dimensional objects. Nature, 343, 263–266.
Quiroga, R. Q., Mukamel, R., Isham, E. A., Malach, R., & Fried, I. (2008). Human single-neuron responses at the threshold of conscious recognition. Proceedings of the National Academy of Sciences U S A, 105, 3599–3604.
Quiroga, R. Q., Reddy, L., Kreiman, G., Koch, C., & Fried, I. (2005). Invariant visual representation by single neurons in the human brain. Nature, 435, 1102–1107.
Riesenhuber, M., & Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nature Neuroscience, 2, 1019–1025.
Rolls, E. T. (2000). Functions of the primate temporal lobe cortical visual areas in invariant visual object and face recognition. Neuron, 27, 205–218.
Rolls, E. T., & Milward, T. (2000). A model of invariant object recognition in the visual system: Learning rules, activation functions, lateral inhibition, and information-based performance measures. Neural Computation, 12, 2547–2572.
Sawamura, H., Orban, G. A., & Vogels, R. (2006). Selectivity of neuronal adaptation does not match response selectivity: A single-cell study of the fMRI adaptation paradigm. Neuron, 49, 307–318.
Sayres, R., & Grill-Spector, K. (2008). Relating retinotopic and object-selective responses in human lateral occipital cortex. Journal of Neurophysiology, 100 (1), 249–267.
Schwarzlose, R. F., Baker, C. I., & Kanwisher, N. K. (2005). Separate face and body selectivity on the fusiform gyrus. Journal of Neuroscience, 25, 11055–11059.

Schwarzlose, R. F., Swisher, J. D., Dang, S., & Kanwisher, N. (2008). The distribution of category and location information across object-selective regions in human visual cortex. Proceedings of the National Academy of Sciences U S A, 105, 4447–4452.
Stanley, D. A., & Rubin, N. (2003). fMRI activation in response to illusory contours and salient regions in the human lateral occipital complex. Neuron, 37, 323–331.
Tarr, M. J., & Bulthoff, H. H. (1995). Is human object recognition better described by geon structural descriptions or by multiple views? Comment on Biederman and Gerhardstein (1993). Journal of Experimental Psychology: Human Perception and Performance, 21, 1494–1505.
Tarr, M. J., & Gauthier, I. (2000). FFA: A flexible fusiform area for subordinate-level visual processing automatized by expertise. Nature Neuroscience, 3, 764–769.
Thorpe, S., Fize, D., & Marlot, C. (1996). Speed of processing in the human visual system. Nature, 381, 520–522.
Ullman, S. (1989). Aligning pictorial descriptions: An approach to object recognition. Cognition, 32, 193–254.
Ungerleider, L. G., Mishkin, M., & Macko, K. A. (1983). Object vision and spatial vision: Two cortical pathways. Trends in Neuroscience, 6, 414–417.
Van Essen, D. C., Felleman, D. J., DeYoe, E. A., Olavarria, J., & Knierim, J. (1990). Modular and hierarchical organization of extrastriate visual cortex in the macaque monkey. Cold Spring Harbor Symposia on Quantum Biology, 55, 679–696.
Vinberg, J., & Grill-Spector, K. (2008). Representation of shapes, edges, and surfaces across multiple cues in the human visual cortex. Journal of Neurophysiology, 99, 1380–1393.
Vogels, R., & Biederman, I. (2002). Effects of illumination intensity and direction on object coding in macaque inferior temporal cortex. Cerebral Cortex, 12, 756–766.
Vuilleumier, P., Henson, R. N., Driver, J., & Dolan, R. J. (2002). Multiple levels of visual object constancy revealed by event-related fMRI of repetition priming. Nature Neuroscience, 5, 491–499.
Wandell, B. A. (1999). Computational neuroimaging of human visual cortex. Annual Review of Neuroscience, 22, 145–173.
Wang, G., Tanaka, K., & Tanifuji, M. (1996). Optical imaging of functional organization in the monkey inferotemporal cortex. Science, 272, 1665–1668.

Weiner, K. S., & Grill-Spector, K. (2010). Sparsely-distributed organization of face and limb activations in human ventral temporal cortex. NeuroImage, 52, 1559–1573.


Weiner, K. S., & Grill-Spector, K. (2011). Not one extrastriate body area: Using anatomical landmarks, hMT+, and visual field maps to parcellate limb-selective activations in human lateral occipitotemporal cortex. NeuroImage, 56 (4), 2183–2199.
Weiner, K. S., & Grill-Spector, K. (2013). Neural representations of faces and limbs neighbor in human high-level visual cortex: Evidence for a new organization principle. Psychological Research, 277 (1), 74–97.
Williams, M. A., Dang, S., & Kanwisher, N. G. (2007). Only some spatial patterns of fMRI response are read out in task performance. Nature Neuroscience, 10, 685–686.
Zangenehpour, S., & Chaudhuri, A. (2005). Patchy organization and asymmetric distribution of the neural correlates of face processing in monkey inferotemporal cortex. Current Biology, 15 (11), 993–1005.

Kalanit Grill-Spector

Kalanit Grill-Spector is Associate Professor, Department of Psychology and Neuroscience Institute, Stanford University.


Representation of Spatial Relations

Bruno Laeng
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn
Print Publication Date: Dec 2013
Subject: Psychology, Cognitive Neuroscience
Online Publication Date: Dec 2013
DOI: 10.1093/oxfordhb/9780199988693.013.0003

Abstract and Keywords

Research in cognitive neuroscience in humans and animals has revealed a considerable degree of specialization of the brain for spatial functions, and has also revealed that the brain’s representation of space is separable from its representation of object identities. The current picture is that multiple and parallel frames of reference cooperate to make up a representation of space that allows efficient navigation and action within the surrounding physical environment. As humans, however, we do not simply “act” in space, but we also “know” it and “talk” about it. Hence, the human brain’s spatial representations may involve specifically human and novel patterns of lateralization and of brain areas’ specializations. Pathologies of space perception and spatially directed attention, like spatial neglect, can be explained by the damage to one or several of these maps and frames of reference. The multiple spatial (cognitive) maps used by the human brain clearly cooperate toward flexible representations of spatial relations that are progressively abstract (or categorical) and may be apt to support the human ability to communicate spatial information and understand mathematical concepts. Nevertheless, a representation of space as extended and continuous is also necessary for the control of action and navigation.

Keywords: spatial representations, frames of reference, spatial neglect, lateralization, cognitive maps

Representing the space around our human bodies seems to serve three main functions or goals: to “act,” to “know,” and to “talk.” First of all, humans need to navigate in their environment. Some navigational skills require a very precise adaptation of the movement of the entire body to the presence of other external bodies and obstacles (whether these are static or also mobile). Without an adequate representation of physical distances between external bodies (and between these and oneself), actions like running in a crowd or driving in traffic would be impossible. Humans also possess hands and need to manipulate objects; their highly developed control of fine finger movements makes possible the construction and use of tools. Engaging in complex sequences of manual actions requires the positioning and direction of movements over precise and narrow areas of space (e.g., when typing on the keyboard, opening a lock with a key, making a knot, playing a musical instrument). All these behaviors would be impossible without a fine-grained representation of the spatial distances and actual sizes of the objects and of their position relative to each other and to the body and hand involved in the action.

However, humans do not solely “act” in space. Humans also “cognize” about space. For example, we can think about an object being present in a place although neither the object nor the place is any longer visible (i.e., Piagetian object permanence). We can think about simple spatial schemata or complex mathematical spaces and geometries (Aflalo & Graziano, 2006); the ability to represent space as a continuum may lie at the basis of our understanding of objects’ permanence in space and, therefore, of numerosity (Dehaene, 1997). We can also engage in endless construction of meaning and parse the physical world into classes and concepts; in fact, abstraction and recognition of equivalences between events (i.e., categorization) has obvious advantages, as the cultural evolution of humans demonstrates. Categories can be expressed in symbols, which can be exchanged. All this applies to spatial representations as well. For example, the category “to the left” identifies as equivalent a whole class of positions and can be expressed either in verbal language or with pictorial symbols (e.g., an arrow: ←). Clearly, humans not only “act” in space and “know” space, but they also “talk” about space. The role played by spatial cognition in linguistic function should not be underestimated (e.g., “thinking for speaking”; Slobin, 1996). Thus, a categorical, nonmetric representation of space constitutes an additional spatial ability, one that could lay the groundwork for spatial reference in language.
The present discussion focuses on vision, in part because evolution has devoted to it a large amount of the primate brain’s processing capacity (e.g., in humans, about 4 to 6 billion neurons and 20 percent of the entire cortical area; Maunsell & Newsome, 1987; Wandell et al., 2009), and in part because vision is the most accurate spatial sense in primates and is central to the human representation of space (“to see is to know what is where by looking”; Marr, 1982). Nevertheless, it is important to acknowledge that a representation of space can be obtained in humans through several sensory modalities (e.g., kinesthesia) and that blind people do possess a detailed knowledge of space and can move and act in it as well as talk about it. Some animals possess a representation of navigational space that can be obtained through sensory information that is not available to humans (e.g., a “magnetic” sense; Johnsen & Lohmann, 2005; Ritz, 2009). Honeybees’ “dances” can communicate spatial information that refers to locations where food can be found (Kirchner & Braun, 1994; Menzel et al., 2000).

We begin by discussing (1) the existence of topographic maps in the brain. From these maps, the brain extracts higher order representations of the external world that are dedicated to the basic distinction between (2) “what” is out there versus “where” it is. These two types of information need to be integrated for the control of action, and this yields in turn the representation of (3) “how” an object can be an object of action and “which” specific object among multiple ones is located in a specific place at a given time. However, localization of objects can take place according to multiple and parallel (4) spatial frames of reference; the deployment of spatial attention can also occur along these multiple reference frames, as the pathology of spatial attention or (5) neglect, after brain damage, has clearly revealed. Regarding the brain’s specialization for spatial cognition, humans show a strong degree of (6) cerebral lateralization that may be unique in the animal world. The current evidence indicates a right hemisphere’s specialization for analog (coordinate) spatial information versus a left hemisphere’s specialization for digital (categorical) spatial information. The representation of categorical spatial relations is also relevant for (7) object recognition because an object’s structural description is made in terms of parts and categorical spatial relations between these (i.e., the “where of what”). Finally, we discuss the brain’s representation of large-scale space, as it is used in navigation, or the (8) “cognitive map” of the environment.
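The contrast between coordinate (analog) and categorical (digital) spatial relations can be made concrete with a minimal sketch. The Python below is an illustration only, not a model taken from the literature cited here: the function names, the coordinate convention (x increasing rightward, y increasing upward), and the example positions are all assumptions introduced for the example.

```python
import math

def coordinate_relation(a, b):
    """Metric (analog) relation: exact distance and direction from a to b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    return {"distance": math.hypot(dx, dy),
            "angle_deg": math.degrees(math.atan2(dy, dx))}

def categorical_relation(a, b):
    """Abstract (digital) relation: on which side of a does b lie?"""
    if b[0] < a[0]:
        horizontal = "left of"
    elif b[0] > a[0]:
        horizontal = "right of"
    else:
        horizontal = "aligned with"
    if b[1] < a[1]:
        vertical = "below"
    elif b[1] > a[1]:
        vertical = "above"
    else:
        vertical = "level with"
    return horizontal, vertical

# Hypothetical positions: two targets at very different distances from a reference.
reference = (0.0, 0.0)
near, far = (-1.0, 2.0), (-7.5, 0.4)

for target in (near, far):
    print(categorical_relation(reference, target),
          coordinate_relation(reference, target))
```

Both targets fall into the same category (“left of” and “above” the reference) even though their metric relations to it differ, which is the sense in which a categorical code treats a whole class of positions as equivalent.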

1. The Brain’s Topographic Maps

In the evolutionary history of vision, the primeval function of photoreceptors or “eyes” may have been a raw sense of location (e.g., cardinal directions such as up vs. down based on sunlight; detecting motion paths or distances by use of optic flow; Ings, 2007). However, the ability to focus a detailed “image” with information about wavelengths and spatial frequencies that allow the extraction of colored surfaces and forms requires the evolution of camera-like eyes with a single lens focusing light onto a mosaic of photoreceptors (e.g., the image-forming eye of the squid or the eyes of vertebrates; Lamb et al., 2007). The human retina allows the formation of an image that is very detailed spatially, and this detail seems conserved at the early cortical level in the flow of information from the eye to successive brain areas. The smallest cortical receptive fields processing spatial information in human vision possess receptive field centers hardly wider than the single-cone photoreceptors (Smallman et al., 1996).

The retina provides the initial topographic map for humans, where nearby scene points are represented in the responses of nearby photoreceptors and, in turn, in a matrix of neurons to which they provide their input. Areas of the brain receiving retinal input are also topographically organized in retinotopic “visual field maps” (Tootell, Hadjikhani, et al., 1998; Wandell et al., 2005). These preserve, to some extent, the geometric structure of the retina, which in turn, by the laws of optic refraction, reflects the geometric structure of the external visual world as a planar projection onto a two-dimensional surface (Figure 3.1).


Figure 3.1 Topographic representations in primary visual cortex (V1). Reprinted with permission from Tootell, Hadjikhani, Vanduffel W, et al., 1998. © 1998 National Academy of Sciences, U.S.A.

There is now consensus that the topographic features of cortical and subcortical maps are not incidental, but instead are essential to brain function (Kaas, 1997). A topographically organized structure can depict visual information as “points” organized by their relative locations in space and varying in size, brightness, and color. Points near each other in the represented space are represented by points near each other in the representing substrate; this (internal) space can be used to represent (external) space (Markman, 1999). Thus, one fundamental property of the brain’s representation of space is that the brain uses space on the cortex to represent space in the world (Kosslyn, Thompson, & Ganis, 2006). Specifically, topographic maps can evaluate how input from one set of receptors can be different from that of adjoining sets of receptors. Local connections among neurons that are topographically organized can easily set up center-surround receptive fields and compare adjacent features. Other types of brain organization between units sensitive to adjacent points require more complex arrays and longer connections (Cherniak, 1990), which are metabolically (and evolutionarily) costly and result in increases in neural transmission time. Cortical maps appear to be further arranged in spatial clusters at a coarser scale (Wandell et al., 2005, 2009). This organization allows neural mosaics in different maps that serve similar common computational goals to share resources (e.g., coordinating the timing of neural signals or temporarily storing memories; Wandell et al., 2005). Thus, cortical maps may organize themselves to optimize nearest neighbor relationships (Kohonen, 2001) so that neurons that process similar information are located near each other, minimizing wiring length.
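The point that purely local connections in a topographic map suffice to compare adjacent inputs can be sketched in a few lines. The Python below is a deliberately simplified, one-dimensional illustration and not a physiological model: the function name, the choice of a two-neighbor “surround,” and the input values are assumptions made for the example.

```python
def center_surround(activity):
    """Response of each interior unit: its own input minus the mean of its
    two immediate neighbours (a minimal center-surround comparison)."""
    out = []
    for i in range(1, len(activity) - 1):
        surround = (activity[i - 1] + activity[i + 1]) / 2.0
        out.append(activity[i] - surround)
    return out

# Hypothetical inputs along a row of units laid out in topographic order.
uniform = [5.0] * 8                                   # evenly lit surface
edge = [1.0, 1.0, 1.0, 1.0, 9.0, 9.0, 9.0, 9.0]       # dark-to-bright boundary

print(center_surround(uniform))
print(center_surround(edge))
```

Because neighbors in the array correspond to neighbors in the scene, each unit needs connections only to the units next to it. Under these assumptions the uniform input yields no response anywhere, whereas the step input yields responses only at the two units flanking the boundary.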
A topographical neural design is clearly revealed by the effects of localized cortical damage, resulting in a general loss of visual function restricted to a corresponding region within the visual field (Horton & Hoyt, 1991). Several areas of the human visual cortex, but also brainstem nuclei (e.g., the superior colliculus) and thalamus (e.g., the lateral geniculate nucleus and the pulvinar), are organized into retinotopic maps, which preserve both left–right and top–bottom ordering. Consequently, cells that are close together on the sending surface (the retina) project to regions that are close together on the target surface (Thivierge & Marcus, 2007). Remarkably, the dorsal surface of the human brain, extending from the posterior portion of the intraparietal sulcus forward, contains several maps (400–700 mm²) that are much smaller than the V1 map (4000 mm²; Wandell et al., 2005). The visual field is represented continuously as in V1, but the visual field is split along the vertical meridian so that input to each hemisphere originates from the contralateral visual hemifield. The two halves are thus “seamed” together by the long connections of the corpus callosum.

If early vision’s topography has high resolution, later maps in the hierarchy are progressively less organized topographically (Malach, Levy, & Hasson, 2002). As a result, the image is represented at successive stages with decreasing spatial precision and resolution. In addition, beyond the initial V1 cortical map, retinotopic maps are complex and patchy. Consequently, adjacent points in the visual field are not represented in adjacent regions of the same area in every case (Sereno et al., 1995). However, the topographic organization of external space in the visual cortex is extraordinarily veridical compared with other modalities. For example, the representation of bodily space in primary somatosensory cortex (or the so-called homunculus) clearly violates a smooth and continuous spatial representation of body parts. The face representation is inverted (Servos et al., 1999; Yang et al., 1994), and the facial skin areas are located between those representing the thumb and the lower lip of the mouth (Nguyen et al., 2004).
Also, similarly to the extensive representation of the thumb in somatosensory cortex (in fact larger than that of the skin of the whole face), cortical visual maps magnify or compress distances in some portions of the visual field. The central 15 degrees of vision take up about 70 percent of cortical area, and the central 24 degrees cover 80 percent (Fishman, 1997; Zeki, 1969). In V2, parts of the retina that correspond to the upper half of the visual field are represented separately from parts that respond to the lower half of the visual field. Area MT represents only the binocular field, and V4 only the central 30 to 40 degrees, whereas the parietal areas represent more of the periphery (Gattass et al., 2005; Sereno et al., 2001). Callosal connections in humans allow areas of the inferior parietal cortex and the fusiform gyrus in the temporal lobe to deal with stimuli presented in the ipsilateral visual field (Tootell, Mendola, et al., 1998). These topographic organizations have been revealed by a variety of methods, including clinical studies of patients (Fishman, 1997); animal research (Felleman & Van Essen, 1991; Tootell et al., 1982); and more recently, neuroimaging in healthy humans (Engel et al., 1994, 1997; Sereno et al., 1995; DeYoe et al., 1996; Wandell, 1999) and brain stimulation (Kastner et al., 1998). In some of the retinotopic mapping studies with functional magnetic resonance imaging (fMRI), participants performed working memory tasks or planned eye movements (Silver & Kastner, 2009). These studies revealed the existence of previously unknown topographic maps of visual space in the human parietal (e.g., Sereno & Huang, 2006) and frontal (e.g., Kastner et al., 2007) lobes.


Another clear advantage of a topographical organization of the visual brain would be in guiding ocular movements by maintaining a faithful representation of the position of the target of a saccade (Optican, 2005). In addition, a topographical organization provides explicit and accessible information that represents the external world, beginning with the extraction of boundaries and limits of surfaces of objects and ground (Barlow, 1981). According to Marr (1982): “A representation is a formal system for making explicit certain entities or types of information.” If different properties or features of the physical information are encoded or made explicit at any level of the flow of information, then qualitatively different types of information will be represented.

2. “What” Versus “Where” in the Visual System

The brain can be defined as a largely special-purpose machine in which a “division of labor” between brain areas is the pervasive principle of neural organization. After more than a century of systematic brain research, the idea that functions fractionate into a progressively modular brain structure has achieved axiomatic status (Livingstone & Hubel, 1988; Zeki, 2001). Specifically, a perceptual problem is most easily dealt with by dividing the problem into smaller subproblems, as independent of the others as possible so as not to disrupt each other (Gattass et al., 2005). One way in which the visual brain accomplishes this division of labor is by separating visual information into two streams of processing, namely, a “what” system and a “where” system. It may appear odd that the brain or cognitive system separates visual attributes that in the physical world are conjoined. Disjoining attributes of the same object exacerbates the problem of integrating them (i.e., the so-called binding problem; Treisman, 1996; Revonsuo & Newman, 1999; Seth et al., 2004). However, computer simulations with artificial neural networks have demonstrated that two subsystems can be more efficient than one in computing different mappings of the same input at the same time (Otto et al., 1992; Rueckl, Cave & Kosslyn, 1989). Thus, cognitive neuroscience reveals that the brains of humans and, perhaps, of all other vertebrates process space and forms (bodies) as independent aspects of reality. Accordingly, the visual brain can be divided between “two cortical visual systems” (Ingle, 1967; Mishkin, Ungerleider & Macko, 1983; Schneider, 1967; Ungerleider & Mishkin, 1982). A ventral (mostly temporal cortex in humans) visual stream mediates object identification (“what is it?”), and a dorsal (mostly parietal cortex) visual stream mediates localization of objects in space (“where is it?”). This partition is most clearly documented for the visual modality in our species (and in other primates), although it appears to be equally valid for other sensory processes (e.g., for the auditory system, Alain et al., 2001; Lomber & Malhotra, 2008; Romanski et al., 1999; for touch, Reed et al., 2005). Information processed in the two streams appears to be integrated at a successive stage in the superior temporal lobe (Morel & Bullier, 1990), where integration of the two pathways could recruit ocular movements to the location of a shape and provide some positional information to the temporal areas (Otto et al., 1992). Further integration of shape and position information occurs in


the hippocampus where long-term memories of “configurations” of the life environment (Sutherland & Rudy, 1988) or its topography (O’Keefe & Nadel, 1978) are formed.

Similarly to the classic Newtonian distinction between matter and absolute space, the distinction between what and where processing assumes that (1) objects can be represented by the brain independently from their locations, and (2) locations can be perceived as immaterial points in an immaterial medium, empty of objects. Indeed, at a phenomenological level, space primarily appears as an all-pervading stuff, an incorporeal or “ethereal” receptacle that can be filled by material bodies, or an openness in which matter spreads out. In everyday activities, a region of space is typically identified with reference to “what” is or could be located there (e.g., “the book is on the table”; Casati & Varzi, 1999). However, we also need to represent empty or “negative” space because trajectories and paths toward other objects or places (Collett, 1982) occur in a spatial medium that is free of material obstacles. The visual modality also estimates distance (i.e., the empty portion of space between objects), and vision can represent the future position of a moving object into some unoccupied point in space (Gregory, 2009).

Humans exemplify the division of labor between the two visual systems. Bálint (1909) was probably the first to report a human case in which the perception of visual location was impaired while the visual recognition of an object was relatively spared. Bálint’s patient, who had bilateral inferior parietal lesions, was unable to reach for objects or estimate distances between objects. Holmes and Horax (1919) described similar patients and showed that they had difficulty in judging differences in the lengths of two lines, but could easily judge whether a unitary shape made by connecting the same two lines was a trapezoid and not a rectangle (i.e., when the lines were part of a unitary shape). In general, patients with damage to the parietal lobes lack the ability to judge objects’ positions in space, as shown by their difficulties in reaching, grasping, pointing to, or verbally describing their position and size (De Renzi, 1982). In contrast, “blindsight” patients, who are unaware of the presence of an object, can locate an object in the blind field of vision by pointing to it (Perenin & Jeannerod, 1978; Weiskrantz, 1986). Children with the Williams’ (developmental) syndrome show remarkable sparing of object recognition with severe breakdown of spatial abilities (Landau et al., 2006). In contrast, patients with inferior temporal-occipital lesions, who have difficulty in recognizing the identity of objects and reading (i.e., visual agnosia), typically show unimpaired spatial perception; they can reach and manipulate objects and navigate without bumping into objects (Kinsbourne & Warrington, 1962). Patients with visual agnosia after bilateral damage to the ventral system (Goodale et al., 1991) can also guide a movement in a normal and natural manner toward a vertical opening by inserting a piece of cardboard into it (Figure 3.2), but perform at chance when asked to report either verbally or by adjusting the cardboard to match the orientation of the opening (Milner & Goodale, 2008). It would thus seem that the spatial representations of the dorsal system can effectively guide action but cannot make even simple pattern discriminations.


Figure 3.2 Performance of a patient with extensive damage to the ventral system in perceptually matching a linear gap versus inserting an object into the gap. From Milner & Goodale, 2006. Reprinted with permission from Oxford University Press.

Importantly, several neurobiological investigations in which brain regions of nonhuman primates were selectively damaged have shown a double dissociation between perception of space and of objects (Mishkin et al., 1983). In one condition, monkeys had to choose one of two identical objects (e.g., two square plaques) located closer to a landmark object (a cylinder). In another condition, two different objects (e.g., a cube and a pyramid), each with a different pattern on its surface (e.g., checkerboard vs. stripes), were shown. In both tasks, the monkeys had to reach a target object, but the solution in the first task mainly depended on registering spatial information, whereas in the other, information about shape and pattern was crucial to obtain the reward. A group of monkeys lacked parts of the parietal cortex, whereas another group lacked parts of the temporal cortex. The “parietal” monkeys were significantly more impaired in the spatial task, whereas the “temporal” monkeys were significantly more impaired in the object discrimination task. The results are counterintuitive because one would think that the monkeys with an intact dorsal system (parietal lobe) should be able to discriminate checkerboards and stripes (because these differ in both size and slant of their elements), despite the damage to the ventral system (temporal lobe). The inability of monkeys to do so clearly indicates that spatial representations of their dorsal system are used to guide action, not to discriminate patterns (Kosslyn, 1994; Kosslyn & Koenig, 1992).

In addition, electrical recordings from individual neurons in monkeys’ parietal lobes reveal cells that encode the shape of objects (Sereno & Maunsell, 1998; Taira et al., 1990; Sakata et al., 1997).
However, these pattern (or “what”) representations in the dorsal cortex are clearly action related; their representations of the geometrical properties of shapes (e.g., orientation, size, depth, and motion) are used exclusively when reaching and grasping objects. That is, they represent space in a “pragmatic” sense and without a conceptual content that is reportable (Faillenot et al., 1997; Jeannerod & Jacob, 2005; Valyear et al., 2005).

Neuroimaging studies show shape-selective activations in humans’ dorsal areas (Denys et al., 2004); simply seeing a manipulable human-made object (e.g., a tool like a hammer) evokes changes in neural activity within the human dorsal system (Chao & Martin, 2000). This is consistent with primate studies showing that the parietal lobes contain neurons that encode the shape of objects. Thus, human parietal structures contain motor-relevant information about the shapes of some objects, information that would seem necessary to efficiently control specific actions. These dorsal areas’ shape information could also be used in conjunction with the ventral system and act as an organizing principle for a “category-specific” representation of semantic categories (in this case, for “tools”; Mahon et al., 2007). Remarkably, shape information in the dorsal system per se does not seem to be able to support the conscious perception of object qualities; a profound object agnosia (that includes the recognition of manipulable objects) is observed after temporal lesions. Dissociations of motor-relevant shape information from conscious shape perception have also been documented with normal observers, when actions were directed toward visual illusions (Aglioti et al., 1995; Króliczak et al., 2006). Evidently, the dorsal (parietal) system’s visual processing does not lead to a conscious description (identification) of objects’ shapes (Fang & He, 2005; Johnson & Haggard, 2005).

It is also clear that the dorsal system’s shape sensitivity does not derive from information relayed by the ventral system because monkeys with large temporal lesions and profound object recognition deficits are able to normally grasp small objects (Glickstein et al., 1998) and catch flying insects. Similarly, patient D.F. (Goodale et al., 1991; Milner et al., 1991; Milner & Goodale, 2006) showed preserved visuomotor abilities and could catch a ball in flight (Carey et al., 1996) but could not recognize a drawing of an apple. When asked to make a copy of it, she arranged straight lines into a spatially incoherent squarelike configuration (Servos et al., 1999). D.F.’s drawing deficit indicates that her spared dorsal shape representations cannot be accessed or expressed symbolically, despite being usable as the targets of directed action. The fact that D.F. performed at chance, when reporting the orientation of the opening in the previously described “posting” experiment, does not necessarily indicate that the conscious representation of space is an exclusive function of the ventral system (Milner & Goodale, 2006, 2008). Nor does it indicate that the dorsal system’s representation of space should be best described as a “zombie agent” (Koch, 2004) or as a form of “blindsight without blindness” (Weiskrantz, 1997). In fact, patient D.F. made accurate metric judgments of which elements in an array were nearest and which were farthest; when asked to reproduce an array of irregularly positioned colored dots on a page, her rendition of their relative positions (e.g., what element was left of or below another element) was accurate, although their absolute positioning was deficient (Carey et al., 2006). It seems likely that a perfect reproduction of an array of elements requires comparing the copy to the model array as an overall “gestalt” or perceptual template, a strategy that may depend on the shape perception mechanisms of D.F.’s (damaged) ventral system.

Finally, it would seem that the ventral system in the absence of normal parietal lobes cannot support an entirely normal perception of shapes. Patients with extensive and bilateral parietal lesions (i.e., with Bálint’s syndrome) do not show completely normal object or shape perception; their object recognition is typically limited to one object or just a part of it (Luria, 1963). Remarkably, these patients need an extraordinarily long time to accomplish recognition of even a single object, thus revealing a severe reduction in object processing rate (Duncan et al., 2003). Kahneman, Treisman, and Gibbs (1992) made a distinction between object identification and object perception. They proposed that identification (i.e., the conscious experience of seeing an instance of an object) depends on forming a continuously updated, integrated representation of the shapes and their space–time coordinates. The ventral and dorsal system may each support a form of “phenomenal” consciousness (cf. Baars, 2002; Block, 1996), but they necessitate the functions of the other system in order to generate a conscious representation that is reportable and accessible to other parts of the brain (e.g., the executive areas of the frontal lobes; Lamme, 2003, 2006).

3. “Where,” “How,” or “Which” Systems?

“What” versus “where” functional distinctions have also been identified in specific areas of the frontal lobe of monkeys (Goldman-Rakic, 1987; Passingham, 1985; Rao et al., 1997; Wilson et al., 1993). These areas appear to support working memory (short-term retention) of associations between shape and spatial information. In other words, they encode online information about “what is where?” or “which is it?” (when seeing multiple objects). Prompted by these findings with animals, similar functional distinctions have been described in human patients (e.g., Newcombe & Russell, 1969) as well as in localized brain activations in healthy subjects (e.g., Courtney et al., 1997; Haxby et al., 1991; Smith et al., 1995; Ungerleider & Haxby, 1994; Zeki et al., 1991).

Spatial information (in particular fine-grained spatial information about distances, direction, and size) is clearly essential to action. In this respect, much of the spatial information of the “where” system is actually in the service of movement planning and guidance or of “praxis” (i.e., how to accomplish an action, especially early-movement planning; Andersen & Buneo, 2002). In particular, the posterior parietal cortex of primates performs the function of transforming visual information into a motor plan (Snyder et al., 1997). For example, grasping an object with one hand is a common primate behavior that, according to studies of both monkeys and humans, depends on the spatial functions of the parietal lobe (Castiello, 2005). Indeed, patients with damage to the superior parietal lobe show striking deficits in visually guided grasping (i.e., optic ataxia; Perenin & Vighetto, 1988). Damage to this area may result in difficulties generating visual-motor transformations that are necessary to mold the hand’s action to the shape and size of the object (Jeannerod et al., 1994; Khan et al., 2005), as well as to take into account the position of potential obstacles (Schindler et al., 2004).


Representation of Spatial Relations Neuroimaging studies of healthy participants scanned during reach-to-grasp actions show activity in the posterior parietal cortex (especially when a precision grip is required; Cul­ ham et al., 2003; Gallivan et al., 2009). The superior parietal lobe appears to contain a topographic map that represents memory-driven saccade direction (Sereno et al., 2001) as well as the direction of a pointing movement (Medendorp et al., 2003). The parietal cortex may be tiled with spatiotopic maps, each representing space in the service of a specific action (Culham & Valyear, 2006). It is likely that the computation of motor com­ mands for reaching depends on the simultaneous processing of mutually connected areas of the parietal and frontal lobes, which (p. 35) together provide an integrated coding sys­ tem for the control of reaching (Burnod et al., 1999; Thiebaut de Schotten et al., 2005). Importantly, while lesions to the ventral system invariably impair object recognition, ob­ ject-directed grasping is spared in the same patients (James et al., 2003). The parietal lobes clearly support spatial representations detailed enough to provide the coordinates for precise actions like reaching, grasping, pointing, touching, looking, and avoiding a projectile. Given the spatial precision of both manual action and oculomotor behavior, one would expect that the neural networks of the parietal cortex would include units with the smallest possible spatial tuning (i.e., very small receptive fields; Gross & Mishkin, 1977). By the same reasoning, the temporal cortex may preferentially include units with large spatial tuning because the goal of such a neural network is to represent the presence of a particular object, regardless of its spatial attributes (i.e., show dimen­ sional and translational invariance). 
However, it turns out that both the parietal and temporal cortices contain units with large receptive fields (from 25 to 100 degrees; O’Reilly et al., 1990) that exclude the fovea and can even represent large bilateral regions of the visual field (Motter & Mountcastle, 1981). Therefore, some property of these neural populations other than receptive field size must underlie the ability of the parietal lobes to code positions precisely. A hint is given by computational models (Ballard, 1986; Eurich & Schwegler, 1997; Fahle & Poggio, 1981; Hinton, McClelland, & Rumelhart, 1986; O’Reilly et al., 1990) showing that a population of neurons with large receptive fields, if these overlap appropriately, can be superior to a population of neurons with smaller receptive fields in its ability to pinpoint a location. Crucially, receptive fields of parietal neurons are staggered toward the periphery of the visual field, whereas receptive fields of temporal neurons tend to crowd toward the central, foveal position of the visual field. Consequently, parietal neurons can exploit coarse coding to pinpoint locations, whereas the temporal neurons, which provide less variety in output (i.e., they all respond to stimuli in the fovea), trade off the ability to locate an object with the ability to show invariance to spatial transformations. Spatial attention evokes increased activity in individual parietal cells (Constantinidis & Steinmetz, 2001) and within whole regions of the parietal lobes (Corbetta et al., 1993; 2000). Therefore, focusing attention onto a region of space can also facilitate computations of the relative activation of overlapping receptive fields of cells and thus enhance the ability to precisely localize objects (Tsal & Bareket, 2005; Tsal, Meiran, & Lamy, 1995).
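The coarse-coding argument above can be made concrete with a toy simulation. The sketch below is only an illustration under stated assumptions, not a reconstruction of any of the cited models: the units have Gaussian receptive fields, their centers sit on a regular 10-degree grid, and position is read out as the response-weighted mean of the centers. The function names, field widths, and stimulus position are all hypothetical choices made for the example.

```python
import math

def response(stimulus, center, width):
    """Response of one unit with a Gaussian receptive field (arbitrary units)."""
    return math.exp(-((stimulus - center) ** 2) / (2.0 * width ** 2))

def decode(stimulus, centers, width):
    """Estimate position as the response-weighted mean of the units' centers."""
    rates = [response(stimulus, c, width) for c in centers]
    total = sum(rates)
    if total == 0.0:
        raise ValueError("no unit responded to the stimulus")
    return sum(r * c for r, c in zip(rates, centers)) / total

# Eleven hypothetical units with centers every 10 degrees from -50 to +50.
centers = [float(c) for c in range(-50, 51, 10)]
stimulus = 3.0  # true azimuth in degrees (illustrative value)

# Narrow fields (width 1 degree) barely overlap; broad fields (width 10) overlap heavily.
error_narrow = abs(decode(stimulus, centers, width=1.0) - stimulus)
error_broad = abs(decode(stimulus, centers, width=10.0) - stimulus)

print(f"narrow, nonoverlapping fields: error = {error_narrow:.3f} deg")
print(f"broad, overlapping fields:     error = {error_broad:.3f} deg")
```

With these assumed values, the narrow-field estimate collapses onto the nearest unit's center, so its error is close to the full 3-degree offset, whereas the broad-field estimate lands within a small fraction of a degree of the true position. The precision comes from comparing the graded responses of many overlapping units rather than from the size of any single field.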


Following the common parlance among neuroscientists, who refer to major divisions between neural streams with interrogative pronouns, the “where” system is to a large extent also the “how” system of the brain (Goodale & Milner, 1992), or the “vision-for-action” system (whereas the “what” system has been labeled the “vision-for-perception” system by Milner & Goodale, 1995, 2008). However, spatial representations do much more than guide action; we also “know” space and can perceive it without having an intentional plan or having to perform any action. Although neuroimaging studies show that areas within the human posterior parietal cortex are active when the individual is preparing to act, the same areas are also active during the mere observation of others’ actions (Buccino et al., 2004; Rizzolatti & Craighero, 2004). One possibility is that these areas may be automatically registering the “affordances” of objects that could be relevant to action, if an action were to be performed (Culham & Valyear, 2006). The superior and intraparietal regions of the parietal lobes are particularly engaged with eye movements (Corbetta et al., 1998; Luna et al., 1998), but neuroimaging studies also show that the parietal areas are active when gaze is fixed on a point on the screen and no action is required, while the observer simply attends to small objects moving randomly on the screen (Culham et al., 1998). The larger the number of moving objects that have to be attentively monitored on the screen, the greater is the activation in the parietal lobe (Culham et al., 2001). Moreover, covert attention to spatial positions that are empty of objects (i.e., before they appear in the expected locations) strongly engages mappings in the parietal lobes (Corbetta et al., 2000; Kastner et al., 1999).
Monkeys also show neural population activity within the parietal cortex when they solve visual maze tasks and when mentally following a path, without moving their eyes or performing any action (Crowe et al., 2005). In human neuroimaging studies, a stimulus position judgment (left vs. right) in relation to the body midline mainly activates the superior parietal lobe (Neggers et al., 2006), although pointing to or reaching is not required.

In addition, our cognition of space is also qualitative or “categorical” (Hayward & Tarr, 1995). Such a type of spatial information is too abstract to be useful in fine motor guidance. Yet, neuropsychological evidence clearly indicates that this type of spatial perception and knowledge is also dependent on parietal lobe function (Laeng, 1994). Thus, the superior and inferior regions of the parietal lobes may specialize, respectively, in two visual-spatial functions: vision-for-action versus vision-for-knowledge, or a “pragmatic” versus a “semantic” function that encodes the behavioral relevance or meaning of stimuli (Freedman & Assad, 2006; Jeannerod & Jacob, 2005). The superior parietal lobe (i.e., the dorsal part of the dorsal system) may have a functional role close to the idea of an “agent” directly operating in space, whereas the inferior parietal lobe plays a role closer to that of an “observer” that understands space and registers others’ actions as they evolve in space (Rizzolatti & Matelli, 2003).

According to Milner and Goodale (2006), the polysensory areas of the inferior parietal lobe and superior temporal cortex may have developed in humans as new functional areas and be absent in monkeys. Thus, they can be considered a “third stream” of visual processing. More generally, they may function as a supramodal convergent system between the dorsal and ventral systems that supports forms of spatial cognition that are unique to our species (e.g., use of pictorial maps; Farrell & Robertson, 2000; Semmes et al., 1955). These high-level representational systems in the human parietal lobe may also provide the substrate for the construction and spatial manipulation of mental images (e.g., the three-dimensional representation of shapes and the ability to “mentally rotate”). Indeed, mental rotation of single shapes is vulnerable to lesions of the posterior parietal lobe (in the right hemisphere; Butters et al., 1970) or to its temporary and reversible deactivation after transcranial magnetic stimulation (Harris & Miniussi, 2003) or stimulation with cortically implanted electrodes in epileptic patients (Zacks et al., 2003). Neuroimaging confirms that imagining spatial transformations of shapes (i.e., mental rotation) produces activity in the parietal lobes (e.g., Alivisatos & Petrides, 1997; Carpenter et al., 1999a; Harris et al., 2000; Jordan et al., 2002; Just et al., 2001; Kosslyn, DiGirolamo, et al., 1998). Specifically, it is the right superior parietal cortex that seems most involved in the mental rotation of objects (Parsons, 2003).

Note that space in the visual cortex is represented as two-dimensional space (i.e., as a planar projection of space in the world), but the disparity between the images on the two retinas can be used to reconstruct the objects’ three-dimensional (3D) shape and the depth and global 3D layout of a scene. Neuroimaging in humans and electrical recordings in monkeys both indicate that the posterior parietal lobe is crucial to cortical 3D processing (Naganuma et al., 2005; Tsao et al., 2003). The parietal lobe of monkeys also contains neurons that are selectively sensitive to 3D information from monocular cues such as texture gradients (Tsutsui et al., 2002).
Thus, the human inferior parietal lobe, or Brodmann area 39 (also known as the angular gyrus), would be a key brain structure for our subjective experience of space or for “space awareness.” This area may contribute to forming individual representations of multiple objects by representing the spatial distribution of their contours and boundaries (Robertson et al., 1997). Such a combination of “what” with “where” information would result in selecting “which” objects will be consciously perceived. In sum, the posterior parietal cortex in humans and other primates appears to be the control center for visual-spatial functions and the hub of a widely distributed brain system for the processing of spatial information (Graziano & Gross, 1995; Mountcastle, 1995). This distributed spatial system would include the premotor cortex, putamen, frontal eye fields, superior colliculus, and hippocampus. Parietal lesions, however, would disrupt critical input to this distributed system of spatial representations.

4. Spatial Frames of Reference

In patients with Bálint’s syndrome, bilateral parietal lesions destroy the representation of spatial relations. These patients act as though there is no frame of reference on which to hang the objects of vision (Robertson, 2003). Normally, in order to reach, grasp, touch, look, point toward, or avoid something, we need to compute relative spatial locations between objects and our body (or the body’s gravitational axis) or of some of its parts (e.g., the eyes or the head’s vertical axis). Such computations are in principle possible with the use of various frames of reference.

The initial visual frame of reference is retinotopic. However, this frame represents a view of the world that changes with each eye movement and therefore is of limited use for controlling action. In fact, the primate brain uses multiple frames of reference, which are obtained by integrating information from the other sense modalities with the retinal information. These subsequent frames of reference provide a more stable representation of the visual world (Feldman, 1985). In the parietal lobe, neurons combine information about a stimulus in a particular location with information about the position of the eyes (Andersen & Buneo, 2002; Andersen, Essick, & Siegel, 1985), which is updated across saccades (Heide et al., 1995). Neurons with head-centered receptive fields are also found in regions of the monkey’s parietal lobe (Duhamel et al., 1997). Comparing location on the retina to one internal to the observer’s body is an effective way to compute position within a spatiotopic frame of reference, as also shown by computer simulations (Zipser & Andersen, 1988). In the monkey, parietal neurons can also code spatial relationships as referenced to an object and not necessarily to an absolute position relative to the viewer (Chafee et al., 2005, 2007).

A reference frame can be defined by an origin and its axes. These can be conceived as rigidly attached or fixed onto something (an object or an element of the environment) or someone (e.g., the viewer’s head or the hand). For example, the premotor cortex of monkeys contains neurons that respond to touch, and their receptive fields form a crude map of the body surface (Avillac et al., 2005). These neurons are bimodal in that they also respond to visual stimuli that are adjacent in space to the area of skin they represent (e.g., the face or an arm).
However, these cells’ receptive fields are not retinotopic; instead, when the eyes move, their visual receptive fields remain in register with their respective tactile fields (Gross & Graziano, 1995; Kitada et al., 2006). For example, a bimodal neuron with a facial tactile field responds as if its visual field is an inflated balloon glued to the side of the face. About 20 percent of these bimodal neurons continue their activity after lights are turned off, so as to also code the memory of an object’s location (Graziano, Hu, & Gross, 1997). Apparently, some of these neurons are specialized for withdrawing from an object rather than for reaching it (Graziano & Cooke, 2006). Neurons that integrate several modalities at once have also been found within the premotor cortex of the monkey; trimodal neurons (visual, tactile, and auditory; Figure 3.3) have receptive fields that respond to a sound stimulus located in the space surrounding the head, within roughly 30 cm (Graziano, Reiss, & Gross, 1999). Neuroimaging reveals maximal activity in the human dorsal parieto-occipital sulcus when viewing objects looming near the face (i.e., in a range of 13 to 17 cm), and this neural response decreases proportionally to distance from the face (Quinlan & Culham, 2007). The superior parietal lobe of monkeys might be the substrate for body-centered positional codes for limb movements, where coordinates define the azimuth, elevation, and distance of the hand (Lacquaniti et al., 1995). In other words, premotor and parietal areas can represent visual space near the body in “arm-centered” coordinates (Graziano et al., 1994). Visual space is constructed many times over, attached to different parts of the body for different functions (Graziano & Gross, 1998). Neuroimaging in humans confirms that at least one of the topographic maps of the parietal lobes uses a head-centered coordinate frame (Sereno & Huang, 2006). Thus, a plurality of sensorimotor action spaces may be related to specific effectors that can move independently from the rest of the body (e.g., hand, head, and eye). In these motor-oriented frames of reference, a spatial relationship between two locations can be coded in terms of the movement required to get from one to the other (Paillard, 1991).

Figure 3.3 Frames of reference of neural cells of the macaque centered on a region of the face and extending into a bounded region in near space. Such cells are multimodal and can respond to either visual or auditory stimuli localized within their head-centered receptive field. From Graziano et al., 1999. Reprinted with permission from Nature.

Finally, the underpinning of our sense of direction is gravity, which leads to the perception of “up” versus “down” or of a vertical direction that is clearly distinct from all other directions. This gravitational axis appears as irreversible, whereas front–back and left–right change continuously in our frame of reference simply by our turning around (Clément & Reschke, 2008). The multimodal cortex integrates the afferent signals from the peripheral retina with those from the vestibular organs (Battista & Peters, 2010; Brandt & Dietrich, 1999; Kahane et al., 2003; Waszak, Drewing, & Mausfeld, 2005) so as to provide a sense of the body’s position in relation to the environment.

5. Neglecting Space

Localizing objects according to multiple and parallel spatial frames of reference is also relevant to the manner in which spatial attention is deployed. After brain damage, an attentional deficit, or “neglect” of space, clearly reveals how attention can be allocated within different frames of reference. Neglect is a clinical disorder that is characterized by a failure to notice objects to one side (typically, the left). However, “left” and “right” must be defined with respect to some frame of reference (Beschin et al., 1997; Humphreys & Riddoch, 1994; Pouget & Snyder, 2000), and several aspects of the neglect syndrome are best understood in terms of different and specific frames of reference. That is, an object can be on the left side with respect to the eyes, head, or body, or with respect to some axis placed on the object (e.g., the left side of the person facing the patient). In the latter case, one can consider the left side of an object (e.g., of a ball) as (1) based on a vector originating from the viewer or (2) based on the intrinsic geometry of the object (e.g., the left paw of the cat). These frames of reference can be dissociated by positioning the parts of the body or of the object differently. For example, a patient’s head may turn to the right, but gaze can be positioned far to the left. Thus, another person directly in front of the patient would lie to the left with respect to the patient’s head and to the right with respect to the patient’s eyes. Moreover, the person in front of the patient would have her right hand to the left of the patient’s body, but if she turned 180 degrees, her right hand would then lie to the right of the patient’s body.

Although right-hemisphere damage typically leads to severe neglect (Heilman et al., 1985; Heilman & Van Den Abell, 1980; Vallar et al., 2003), some patients with left-sided lesions tend to neglect the ends of words (i.e., the right side, in European languages), even when the word appears rotated 180 degrees or is written backward or in mirror fashion (Caramazza & Hillis, 1990). Such errors occurring for a type of stimulus (e.g., words) in an object-centered or word-centered frame of reference imply (1) a spatial representation of the object’s parts (e.g., of the letters, from left to right, for words) and (2) that damage can specifically affect how one reference frame is transformed into another. As discussed earlier, the parietal cortex contains neurons sensitive to all combinations of eye position and target location.
Consequently, a variety of reference frame transformations are possible because any function over that input space can be created with appropriate combinations of neurons (Pouget & Sejnowski, 1997; Pouget & Snyder, 2000). That is, sensory information can be recoded into a flexible intermediate representation to facilitate the transformation into a motor command. In fact, regions of the parietal lobes where cells represent space in eye-centered coordinates may not form any single spatial coordinate system but rather carry the raw information necessary for other brain areas to construct other spatial coordinate systems (Andersen & Buneo, 2002; Chafee et al., 2007; Colby & Goldberg, 1999; Graziano & Gross, 1998; Olson, 2001, 2003; Olson & Gettner, 1995).
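The dissociation between eye-, head-, and body-centered frames described in this section can be illustrated with simple arithmetic on angles. The sketch below is a geometric caricature, not a neural model: it treats direction as a single horizontal azimuth (negative values to the left, positive to the right), ignores the small offsets between the origins of the frames, and uses made-up angles for the head and gaze. All function and variable names are hypothetical.

```python
def body_to_head(azimuth_body, head_in_body):
    """Re-express a body-centered azimuth relative to the head's midline."""
    return azimuth_body - head_in_body

def head_to_eye(azimuth_head, eye_in_head):
    """Re-express a head-centered azimuth relative to the line of gaze."""
    return azimuth_head - eye_in_head

def eye_to_head(azimuth_eye, eye_in_head):
    """Recover a head-centered azimuth from retinal azimuth plus eye position."""
    return azimuth_eye + eye_in_head

def side(azimuth):
    """Label an azimuth as lying to the left or right of a frame's midline."""
    if azimuth < 0:
        return "left"
    if azimuth > 0:
        return "right"
    return "center"

# Illustrative (assumed) posture: head turned 30 degrees to the right of the
# body midline, gaze directed 50 degrees to the left of the head's midline.
head_in_body = 30.0
eye_in_head = -50.0

# A person standing directly in front of the patient's body.
person_body = 0.0
person_head = body_to_head(person_body, head_in_body)
person_eye = head_to_eye(person_head, eye_in_head)

print("relative to the body:", side(person_body))
print("relative to the head:", side(person_head))
print("relative to the eyes:", side(person_eye))
```

Under these assumed angles the same person is straight ahead of the body, to the left of the head, and to the right of the eyes, which is the dissociation described in the text. The `eye_to_head` function shows the reverse step: adding eye position to retinal location recovers the head-centered location, the kind of combination of retinal and eye-position signals that the cited work attributes to parietal neurons.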



Figure 3.4 Left, The eight conditions used to probe neglect in multiple reference frames: viewer centered, object centered, and extrapersonal. In condition A, the patient viewed the cubes on a table and “near” his body. In condition B, the patient viewed the cubes on a table and “far” from his body. In condition C, the patient viewed the cubes held by the experimenter while she sat “near” the patient, facing him. In condition D, the patient viewed the cubes held by the experimenter while she sat “far” away and facing the patient. In condition E, the patient viewed the cubes held by the experimenter while she sat “far” and turned her back to the patient. In condition F, the patient viewed the cubes in the “far” mirror while these were positioned on a “near” table. In condition G, the patient viewed the cubes in the “far” mirror while the experimenter facing him held the cubes in her hands. In condition H, the patient viewed the cubes in the “far” mirror while the experimenter turned her back to the patient and held the cubes in her hands. Note that in the last three conditions, the cubes are seen only in the mirror (in extrapersonal space) and not directly (in peripersonal space). Right, Results for conditions D and E, showing a worsening of performance when the target was held in the left hand of the experimenter and in left hemispace compared with the other combinations of object-centered and viewer-centered frames of reference. From Laeng, Brennen, et al., 2002. Reprinted with permission of Elsevier.

According to a notion of multiple coordinate systems, different forms of neglect will manifest depending on the specific portions of parietal or frontal cortex that are damaged. These will reflect a complex mixture of various coordinate frames. Thus, if a lesion of the parietal lobe causes a deficit in a distributed code of locations that can be read out in a variety of reference frames (Andersen & Buneo, 2002), neglect behavior will emerge in the successive visual transformations (Driver & Pouget, 2000). It may also be manifested within an object-centered reference frame (Behrmann & Moscovitch, 1994). Indeed, neglect patients can show various mixtures and dissociations between the reference frames; thus, some patients show both object-centered and viewer-centered neglect (Behrmann & Tipper, 1999), but other patients show neglect in just one of these frames (Hillis & Caramazza, 1991; Tipper & Behrmann, 1996).

For example, Laeng and colleagues (2002) asked a neglect patient to report the colors of two objects (cubes) that could either lie on a table positioned near or far from the patient or be held in the left and right hands of the experimenter. In the latter case, the experimenter either faced the patient or turned backward so that the cubes held in her hands could lie in either the left or right hemispace (Figure 3.4). Thus, the cubes’ position in space was also determined by the experimenter’s body position (i.e., they could be described according to an external body’s object-centered frame). Moreover, by use of a mirror, the cubes could be seen in the mirror far away, although they were “near” the patient’s body, so that the patient actually looked at a “far” location (i.e., the surface of the mirror) to see the physically near object. The experiment confirmed the presence of all forms of neglect. Not only did the patient name the color of a cube seen in his left hemispace more slowly than in his right hemispace, but also latencies increased for a cube held by the experimenter in her left hand and in the patient’s left hemispace (both when the left hand was seen directly or as a mirror reflection). Finally, the patient’s performance was worse for “far” than “near” locations. He neglected cubes located near his body (i.e., within “grasping” space) but seen in the mirror, thus dissociating the direction of gaze (toward extrapersonal space) from the physical location of the object (in peripersonal space).
In most accounts of spatial attention, shifting occurs within coordinate frames that can be defined by a portion (or side) of a spatial plane that is orthogonally transected by some egocentric axis (Bisiach et al., 1985). However, together with the classic form of neglect for stimuli or features to the left of the body (or an object’s) midline, neglect can also occur below or above the horizontal plane or in the lower (Butter et al., 1989) versus the upper visual field (Shelton et al., 1990). In addition, several neglect behaviors would seem to occur in spatial frames of reference that are best defined by vectors (Kinsbourne, 1993) or polar coordinates (Halligan & Marshall, 1995), so that either there is no abrupt boundary for the deficit to occur or the neglected areas are best described by annular regions of space around the patient’s body (e.g., grasping or near, peripersonal, space). Neurological studies have identified patients with more severe neglect for stimuli within near or reaching space than for stimuli confined beyond the peripersonal region in far, extrapersonal, space (Halligan & Marshall, 1991; Laeng et al., 2002) as well as patients with the opposite pattern of deficit (Cowey et al., 1994; Mennemeier et al., 1992). These findings appear consistent with the evidence from neurophysiology studies in monkeys (e.g., Graziano et al., 1994), where spatial position can be defined within a bounded region of space attached to the head or arm. Moreover, a dysfunction within patients’ inferior parietal regions is most likely to result in neglect occurring in an “egocentric” spatial frame of reference (i.e., closely related to action control within personal space), whereas dysfunction within the superior temporal region is most likely to result in “allocentric” neglect occurring in a spatial frame of reference centered on the visible objects in extrapersonal space (Committeri et al., 2004, 2007; Hillis, 2006).

Patients with right parietal lesions also have difficulties exploring “virtual space” (i.e., locating objects within their own mental images). For example, patients with left-sided neglect are unable to describe from their visual memory left-sided buildings in a city scene (“left” being based on their imagined position within a city’s square; Beschin et al., 2000; Bisiach & Luzzatti, 1978). Such patients may also be unable to spell the beginning of words (i.e., unable to read the left side of the word from an imaginary display; Baxter & Warrington, 1983). However, patients with neglect specific to words (or “neglect dyslexia”) after a left-hemisphere lesion can show a spelling deficit for the ends of words (Caramazza & Hillis, 1990).

Neuropsychological findings also have shown that not only lesions of the inferior parietal lobe but also those of the frontal lobe and the temporal-parietal-occipital junction lead to unilateral neglect. Remarkably, damage to the rostral part of the human superior temporal cortex (of the right hemisphere) results in profound spatial neglect (Karnath et al., 2001) in humans and monkeys, characterized by a profound lack of awareness for objects in the left hemispace. Because the homologous area of the left hemisphere is specialized for language in humans, this may have preempted the spatial function of the left superior temporal cortex, causing a right-sided dominance for space-related information (Weintraub & Mesulam, 1987). One possibility is that the right-sided superior temporal cortex plays an integrative role with regard to the ventral and dorsal streams (Karnath, 2001) because the superior temporal gyrus is adjacent to the inferior areas of the dorsal system and receives input from both streams and is therefore a site for multimodal sensory convergence (Seltzer & Pandya, 1978).
However, none of these individual areas should be interpreted as the “seat” of the conscious perception of spatially situated objects. In fact, no cortical area alone may be sufficient for visual awareness (Koch, 2004; Lamme et al., 2000). Most likely, a conscious percept is the expression of a distributed neural network and not of any neural bottleneck. That is, a conscious percept is the gradual product of recurrent and interacting neural activity from several reciprocally interconnected regions and streams (Lamme, 2003, 2006). Nevertheless, the selective injury of a convergence zone, like the superior temporal lobe, could disrupt representations that are necessary (but not sufficient) to spatial awareness.

Interestingly, patients with subcortical lesions and without detectable damage of either temporal or parietal cortex also show neglect symptoms. However, blood perfusion measurements in these patients reveal that the inferior parietal lobe is hypoperfused and therefore dysfunctional (Hillis et al., 2005). Similarly, damage to the temporoparietal junction, an area neighboring both the ventral and dorsal systems, produces abnormal correlation of the resting state signal between left and right inferior parietal lobes, which are not directly damaged; this abnormality correlates with the severity of neglect (Corbetta et al., 2008; He et al., 2007). Therefore, the “functional lesion” underlying neglect may include a more extensive area than what is revealed by structural magnetic resonance, by disrupting underlying association or recurrent circuits (e.g., parietal-frontal pathways; Thiebaut de Schotten et al., 2005).



6. Lateralization of Spatial Representations

Differential functional specializations of the two sides of the brain are already present in early vertebrates (Sovrano et al., 2005; Vallortigara & Rogers, 2005), suggesting that lateralization may be the expression of a strategy of division of labor that evolved millions of years before the appearance of the human species. In several species, the right brain appears to be specialized for vigilance and recognition of novel or surprising stimuli. For example, birds appear more adept at gathering food or catching prey seen with the right eye (i.e., left brain) than with the left eye (i.e., right brain). Such a segregation of functions would seem at first glance not so adaptive because it may put the animal at great risk (by making its behavior predictable to both prey and predators). An evolutionary account that can explain this apparently nonadaptive brain organization is based on the hypothesis that a complementary lateralization makes the animal superior in performing several tasks at the same time (Vallortigara et al., 2001), counteracting the ecological disadvantages of lateral bias. Evidence indicates that birds that are strongly lateralized are more efficient at parallel processing than birds of the same species that are weakly lateralized (Rogers et al., 2004). Thus, a complementary lateral specialization would seem to make the animals apt to attend to two domains simultaneously.

There is a clear analogy between this evolutionarily adaptive division of labor between the vertebrate cerebral hemispheres and the performance of artificial neural networks that segregate processing to multiple, smaller subsystems (Otto et al., 1992; Rueckl, Cave, & Kosslyn, 1989). Most relevant, this principle of division of labor has also been applied to the modularization of function for types of spatial representations.
Specifically, Kosslyn (1987) proposed the existence of two neural subnetworks within the dorsal stream that process qualitatively different types of spatial information. One spatial representation is based on a quantitative parsing of space and therefore closely related to that of spatial information in the service of action. This type of representation is called coordinate (Kosslyn, 1987) because it is derived from representations that provide coordinates for navigating into the environment as well as for performing targeted actions such as reaching, grasping, hitting, throwing, and pointing to something. In contrast, the other hypothesized type of spatial representation, labeled categorical spatial relation, parses space in a qualitative manner. For example, two configurations can be described as “one to the left of the other.” Thus, qualitative spatial relations are based on the perception of spatial categories, where an object (but also an empty place) is assigned to a broad equivalence class of spatial positions (e.g., a briefcase can be “on the floor,” and being “on the floor” is satisfied by being placed on any of the particular tiles that make up the whole floor).

Each of the two proposed separate networks would be complementarily lateralized. Thus, the brain can represent in parallel the same spatial layout in at least two separate manners (Laeng et al., 2003): a right-hemisphere mode that assesses “analog” spatial relations (e.g., the distance between two objects) and a left-hemisphere mode that assesses “digital” spatial relations (e.g., whether two objects are attached to one another or above or below the other). The underlying assumption in the above account is that computing separately the two spatial relations (instead of, e.g., taking the quantitative representation and making it coarser by grouping the finer locations) could result in a more efficient representation of space, where both properties can be attended simultaneously.

Artificial neural network simulations of these spatial judgments provide support for more efficient processing in “split” networks than unitary networks (Jacobs & Kosslyn, 1994; Kosslyn, Chabris, et al., 1992; Kosslyn & Jacobs, 1994). These studies have shown that, when trained to make either digital or analog spatial judgments, the networks encode each relation more effectively if their input comes, respectively, from units with relatively small, nonoverlapping receptive fields (for digital judgments) or from units with relatively large, overlapping receptive fields (for analog judgments; Jacobs & Kosslyn, 1994). Overlap of location detectors would then promote the representation of distance, based on a “coarse coding” strategy (Ballard, 1986; Eurich & Schwegler, 1997; Fahle & Poggio, 1981; Hinton, McClelland, & Rumelhart, 1986). In contrast, the absence of overlap between location detectors benefits the representation of digital or categorical spatial relations, by effectively parsing space.

Consistent with the above computational account, Laeng, Okubo, Saneyoshi, and Michimata (2011) observed that spreading the attention window to encompass an area that includes two objects or narrowing it to encompass an area that includes only one of the objects can modulate the ability to represent each type of spatial relation. In this study, the spatial attention window was manipulated to select regions of differing areas by use of cues of differing sizes that preceded the presentation of pairs of stimuli. The main assumption was that larger cues would encourage a more diffused attention allocation, whereas the small cues would encourage a more focused mode of attention.
When the attention window was large (by cueing an area that included both objects as well as the empty space between them), spatial transformations of distance between two objects were noticed faster than when (p. 42) the attention window was relatively smaller (i.e., when cueing an area that included no more than one of the objects in the pair). Laeng and colleagues concluded that a relatively larger attention window would facilitate the processing of an increased number of overlapping spatial detectors so as to include (and thus “measure”) the empty space in between or the spatial extent of each form (when judging, e.g., size or volume). In contrast, smaller nonoverlapping spatial detectors would facilitate parsing space into discrete bins or regions and, therefore, the processing of categorical spatial transformations; indeed, left–right and above–below transformations were noticed faster in the relatively smaller cueing condition than in the larger (see also Okubo et al., 2010).

The theoretical distinction between analog and digital spatial functions is relatively recent, but early neurological investigations had already noticed that some spatial functions (e.g., distinguishing left from right) are commonly impaired after damage to the posterior parietal cortex of the left hemisphere, whereas impairment of other spatial functions, like judging an object’s orientation or exact position, is typical after damage to the same area in the opposite, right, hemisphere (Luria, 1973). The fact that different forms of spatial dysfunction can occur independently for each hemisphere has been repeatedly confirmed by studies of patients with unilateral lesions (Laeng, 1994, 2006; Palermo et al., 2008) as well as by neuroimaging studies of normal individuals (Baciu et al., 1999; Kosslyn et al., 1998; Slotnick & Moo, 2006; Trojano et al., 2002). Complementary results have been obtained with behavioral methods that use the lateralized (and tachistoscopic) presentation of visual stimuli to normal participants (e.g., Banich & Federmeier, 1999; Bruyer et al., 1997; Bullens & Postma, 2008; Hellige & Michimata, 1989; Kosslyn, 1987; Kosslyn et al., 1989, 1995; Laeng et al., 1997; Laeng & Peters, 1995; Roth & Hellige, 1998; Rybash & Hoyer, 1992).

Laeng (1994, 2006) showed a double dissociation between failures to notice changes in categorical spatial relations and changes in coordinate spatial relations. A group of patients with unilateral damage to the right hemisphere had difficulty noticing a change in distance or angle between two figures of animals presented successively. The same patients had less difficulty noticing a change of relative orientation (e.g., left vs. right or above vs. below) between the same animals. In contrast, the patients with left-hemisphere damage had considerably less difficulty noticing that the distance between the two animals had either increased or decreased. In another study (Laeng, 2006), similar groups of patients with unilateral lesions made corresponding errors in spatial construction tasks from memory (e.g., building patterns made of matchsticks; relocating figures of animals on a cardboard). Distortions in reproducing the angle between two elements and inaccuracies in relocating the objects to their original positions were more common after damage to the right hemisphere (see also Kessels et al., 2002), whereas mirror reversals of elements of a pattern were more common after damage to the left hemisphere. A study by Palermo and colleagues (2008) showed that patients with damage confined to the left hemisphere had difficulty visually imaging whether a dot shown in a specific position would fall inside or outside of a previously seen circle.
These patients were relatively better at visually imaging whether a dot shown in a specific position would be nearer to or farther from the circle’s circumference than another dot previously seen together with the same circle. The opposite pattern of deficit was observed in the patients with right-hemisphere damage. Another study with patients by Amorapanth, Widick, and Chatterjee (2010) showed that lesions to a network of areas in the left hemisphere resulted in more severe impairment in judging categorical spatial relations (i.e., the above–below relations between pairs of objects) than lesions to homologous areas of the right hemisphere. Also in this study, the reverse pattern of impairment was observed for coordinate spatial processing, where right-hemisphere damage produced more severe deficits than left-hemisphere damage.
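
The computational contrast drawn earlier between large, overlapping receptive fields (coarse coding, suited to distance) and small, nonoverlapping fields (suited to categories) can be made concrete with a toy one-dimensional example. The sketch below is only an illustration under stated assumptions and is not the network of Jacobs and Kosslyn (1994): the Gaussian tuning curves, the centre-of-mass readout, the unit layout, and all function names are hypothetical choices.

```python
import math

# Hypothetical layout: 9 broad units spaced 0.25 apart, 4 narrow bins on [0, 1).
CENTERS = [-0.5 + 0.25 * i for i in range(9)]
SIGMA = 0.25
N_BINS = 4


def population_code(x, centers=CENTERS, sigma=SIGMA):
    """Activations of units with large, overlapping Gaussian receptive fields."""
    return [math.exp(-((x - c) ** 2) / (2 * sigma ** 2)) for c in centers]


def decode_population(activations, centers=CENTERS):
    """Read out position as the activation-weighted mean of the unit centres."""
    total = sum(activations)
    return sum(a * c for a, c in zip(activations, centers)) / total


def bin_index(x, n_bins=N_BINS):
    """Index of the single small, nonoverlapping field that covers x in [0, 1)."""
    return min(int(x * n_bins), n_bins - 1)


def coarse_distance(x1, x2):
    """'Analog' judgment: distance recovered from the overlapping code."""
    p1 = decode_population(population_code(x1))
    p2 = decode_population(population_code(x2))
    return abs(p2 - p1)


def binned_relation(x1, x2):
    """'Digital' judgment: which discrete region each point falls in."""
    b1, b2 = bin_index(x1), bin_index(x2)
    if b1 == b2:
        return "same"  # metric detail inside a bin is lost
    return "left" if b1 < b2 else "right"


print(coarse_distance(0.30, 0.45))   # close to the true distance of 0.15
print(binned_relation(0.30, 0.45))   # both points fall in one bin
print(binned_relation(0.40, 0.55))   # first point is in a bin to the left
```

In this toy setting the overlapping code recovers a graded distance even between two points that the nonoverlapping code cannot tell apart, whereas the nonoverlapping code yields a discrete left/right answer with no readout step at all.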

Figure 3.5 Spatial memories for “coordinate” relations showed increased activity in the right hemisphere’s prefrontal cortex, whereas memories for “categorical” relations showed increased activity in the left hemisphere’s prefrontal cortex. From Slotnick & Moo, 2006. Reprinted with permission from Elsevier.

The above evidence with patients is generally consistent with that from studies with healthy participants, in particular studies using the lateralized tachistoscopic method. In these studies, the relative advantages in speed of response to stimuli presented either to the left or right of fixation indicated superiority of the right hemisphere (i.e., left visual field) for analog judgments and of the left hemisphere (i.e., right visual field) for digital judgments. However, in studies with healthy subjects, the lateral differences appear to be small (i.e., on the order of a few tens of milliseconds according to a meta-analysis; Laeng et al., 2003). Nevertheless, small effect sizes identified with such a noninvasive method are actually greater than effect sizes in percent blood oxygenation observed with fMRI. Most important, both behavioral effects can predict very dramatic outcomes after damage to the same region or side of the brain. Another method, whereby the same cortical sites can be temporarily and reversibly deactivated (i.e., transcranial magnetic stimulation [TMS]), (p. 43) provides converging evidence. Left-sided stimulation can effectively mimic the deficit in categorical perception after left-hemisphere damage, whereas right-sided stimulation mimics the deficit in coordinate space perception after right-hemisphere damage (Slotnick et al., 2001; Trojano et al., 2006).

A common finding from studies using methods with localizing power (e.g., neuroimaging, TMS, and selected patients) is that both parietal lobes play a key role in supporting the perception of spatial relations (e.g., Amorapanth et al., 2010; Baciu et al., 1999; Kosslyn et al., 1998; Laeng et al., 2002; Trojano et al., 2006).
Moreover, areas of the left and right prefrontal cortex that receive direct input from ipsilateral parietal areas also show activity when categorical or coordinate spatial information, respectively, is held in memory (Kosslyn, Thompson, et al., 1998; Trojano et al., 2002). In an fMRI study (Slotnick & Moo, 2006), participants viewed in each trial a configuration consisting of a shape and a dot placed at a variable distance from the shape (either “on” or “off” the shape and, in the latter case, either “near” or “far” from the shape). In the subsequent retrieval task, the shape was presented without the dot, and participants responded to queries about the previously seen spatial layout (e.g., either about a categorical spatial relation property: “was the dot ‘on’ or ‘off’ the shape?”; or about a coordinate spatial relation property: “was the dot ‘near’ to or ‘far’ from the shape?”). Spatial memories for coordinate relations were accompanied by increased activity in the right hemisphere’s prefrontal cortex, whereas memories for categorical relations were accompanied by activity in the left hemisphere’s prefrontal cortex (see Figure 3.5).

One should note that the above studies on the perception of categorical and coordinate relations do not typically involve any specific action in space, but instead involve only observational judgments (e.g., noticing or remembering the position of objects in a display). Indeed, a child’s initial cognition of space and of objects’ numerical identity may be entirely based on a purely observational representation of space whereby the child notices that entities preserve their identities and trajectories when they disappear behind other objects and reappear within gaps of empty space (Dehaene & Changeux, 1993; Xu & Carey, 1996). The above findings from neuroscience studies clearly point to a role of the dorsal system in representing spatial information beyond the mere service of action (cf. Milner & Goodale, 2008). Thus, the evidence from categorical and coordinate spatial processing, together with the literature on other spatial transformations or operations (e.g., mental rotations of shapes, visual maze solving), clearly indicates that a parietal-frontal system supports not merely the “act” function but also two other central functions of visual-spatial representations: to “know” and “talk.”

The latter, symbolic function would seem of particular relevance to our species and the only one that we do not share with other living beings (except, perhaps, honeybees; Kirchner & Braun, 1994; Menzel et al., 2000). That is, humans can put into words or verbal propositions (as well as into gestures) any type of (p. 44) spatial relations, whether quantitative (by use of numerical systems and geometric systems specifying angles and eccentricities) or qualitative (by use of prepositions and locutions). However, quantitative propositions may require measurement with tools, whereas establishing qualitative spatial relations between objects would seem to require merely looking at them (Ullman, 1984). If abstract spatial relations between objects in a visual scene can be effortlessly perceived, these representations are particularly apt to be efficiently coded in a propositional manner (e.g., “on top of”). The latter linguistic property would seem to be pervasive in all languages of the world and also to pervade daily conversations. Some “localist” linguists have proposed that the deep semantic structure of language is intrinsically spatial (Cook, 1989). Some cognitive neuroscientists have also suggested that language in our species may have originated precisely from the need to transmit information about the spatial layout of an area from one person to another (O’Keefe, 2003; O’Keefe & Nadel, 1978).

Locative prepositions are often used to refer to different spatial relations in a quick and frugal manner (e.g., “above,” “alongside,” “around,” “behind,” “between,” “inside,” “left,” “on top of,” “opposite,” “south,” “toward,” “underneath”); their grammatical class may exist in all languages (Jackendoff & Landau, 1992; Johnson-Laird, 2005; Kemmerer, 2006; Miller & Johnson-Laird, 1976; Pinker, 2007). Clearly, spatial prepositions embedded in sentences (e.g., the man is “in” the house) can express spatial relations only in a rather abstract manner (compared, for example, with how GPS coordinates can pinpoint space) and can guide actions and navigation only in a very coarse sense (e.g., by narrowing down an area of search). Locative prepositions resemble categorical spatial relations in that they express spatial relationships in terms of sketchy or schematic structural properties of the objects, often ignoring details of spatial metrics (e.g., size, orientation, distance; Talmy, 2000). Nevertheless, the abstract relations of locative prepositions seem extremely useful to our species because they can become the referents of vital communication. Moreover, categorical spatial representations and their verbal expression counterparts may underlie the conceptual structure of several other useful representations (Miller & Johnson-Laird, 1976), like the representations of time and of numerical entities (Hubbard et al., 2005). Indeed, categorical spatial representations could provide the basic mental scaffolding for semantics (Cook, 1989; Jeannerod & Jacob, 2005), metaphors (Lakoff & Johnson, 1999), and reasoning in general (Goel et al., 1998; Johnson-Laird, 2005; Pinker, 1990).

O’Keefe (1996, 2003) has proposed that the primary function of locative prepositions is to identify a set of spatial vectors between places. The neural substrate supporting such function would consist of a specific class of neurons or “place cells” within the right hippocampus and of cerebral structures interconnected with the hippocampus. Specifically, a combination of the receptive fields of several place cells would define boundaries of regions in space that effectively constitute the referential meaning of a preposition. For example, the preposition “below” would identify a “place field” with its center on the vertical direction vector from a reference object.
The width of such a place field would typically be larger than the width of the reference object but would taper with distance so as to form a tear-drop-shaped region attached to the bottom surface of the reference object (see also Carlson et al., 2003; Hayward & Tarr, 1995).

Cognitive neuroscience studies have found neuroanatomical correlates of locative prepositions within the left inferior prefrontal and left inferior parietal regions (Friederici, 1982; Tranel & Kemmerer, 2004). Consistently, neuroimaging studies have found that naming spatial relationships with prepositions activated the same regions in healthy subjects (Carpenter et al., 1999b; Damasio et al., 2001). Similar results have been found with speakers of sign language (Emmorey et al., 2002). Kemmerer and Tranel (2000) found a double dissociation between linguistic representations and perceptual representations; that is, some patients had difficulties using locative prepositions but not making perceptual judgments, and other patients had the opposite problem. Laeng (1994) also noticed that patients who made errors in a matching-to-sample task with pictures differing in their categorical spatial relations were nonetheless able to follow the instructions of the Token Test (where the comprehension of locative prepositions is necessary; De Renzi & Vignolo, 1962). These findings indicate that the encoding of categorical spatial relations (and their loss after left-hemisphere damage) cannot be easily reduced to the mediation of semantic or verbal codes. In fact, the evidence suggests that, although perceptual representations may be crucial for establishing the meaning of locative prepositions (Hayward & Tarr, 1995), once these are learned, they can be supported and interpreted within the semantic network and also selectively disrupted by (p. 45) brain damage. The conceptual representation of locative prepositions also appears to be separated from other linguistic representations (e.g., action verbs, despite several of these verbs sharing with prepositions the conceptual domain of space) because brain damage can dissociate the meanings of these terms (Kemmerer & Tranel, 2003).

Although the existence of a division of labor for analog versus digital spatial relations in humans is now clearly established, whether an analogous lateralization of brain function exists in nonhuman species remains unclear (Vauclair et al., 2006). Importantly, lateralization is under the influence of “opportunistic” processes of brain development that optimize the interaction of different subsystems within the cerebral architecture (Jacobs, 1997, 1999). Thus, in our species, local interactions with linguistic and semantic networks may play a key role in the manner in which the spatial system is organized. That is, biasing categorical spatial representations within a left hemisphere’s substrate by “yoking” them with linguistic processes may facilitate a joint operation between perception, language, and thought (Jacobs & Kosslyn, 1994; Kosslyn, 1987).
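
O’Keefe’s proposal discussed above, in which “below” picks out a tear-drop-shaped place field hanging from the bottom surface of a reference object, can be illustrated with a graded applicability function. The following is a minimal sketch under stated assumptions and not O’Keefe’s own model: the Gaussian lateral profile, the linear taper, and the parameters `object_width` and `max_depth` are hypothetical.

```python
import math


def below_applicability(dx, dy, object_width=1.0, max_depth=5.0):
    """Toy graded score in [0, 1] for 'the point is below the reference object'.

    dx, dy are the offsets of the point from the centre of the object's
    bottom surface (dy < 0 means lower down). object_width and max_depth
    are hypothetical parameters chosen only for illustration.
    """
    depth = -dy
    if depth <= 0 or depth >= max_depth:
        # Level with or above the bottom surface, or beyond the field's reach.
        return 0.0
    # Lateral spread starts at the object's width (so the field is wider than
    # the object near its surface) and tapers linearly with depth.
    sigma = object_width * (1.0 - depth / max_depth)
    return math.exp(-(dx ** 2) / (2 * sigma ** 2))


# Points on the vertical axis score highest; off-axis points lose
# applicability faster the farther down they are.
for point in [(0.0, -1.0), (1.0, -1.0), (1.0, -4.0), (0.0, 1.0)]:
    print(point, below_applicability(*point))
```

The region where this score stays high is broad just under the object and narrows toward the vertical axis with increasing depth, which is the qualitative shape described in the text.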

7. The “Where” of “What”: Spatial Information Within the Object

The conceptual distinctions of categorical and coordinate spatial relations also have strong similarities to different types of geometries (e.g., “topological” versus “Euclidean” or “affine” geometries). For example, inside–outside judgments are topological judgments. Piaget and Inhelder (1956) considered topological judgments as basic and suggested that children learn topological spatial concepts earlier than other types of spatial concepts, such as projective and “Euclidean-like” geometry; however, even infants are sensitive to metric qualities (Liben, 2009).

Research in neuroscience shows that topological judgments are accomplished by the parietal lobe (also in rats; Goodrich-Hunsaker et al., 2008); in humans, these judgments have a robust left-hemisphere advantage (Wang et al., 2007). As originally reasoned by Franco and Sperry (1977), given that we can represent multiple geometries (e.g., Euclidean, affine, projective, topological) and the right hemisphere’s spatial abilities are superior to those of the left (the prevalent view in the 1970s), the right hemisphere should match shapes by their geometrical properties better than the left hemisphere. They tested this idea with a group of (commissurotomized) “split-brain” patients in an intermodal (vision and touch) task. Five geometrical forms of the same type were examined visually, while one hand searched behind a curtain for one shape among three with the matching geometry. As expected, the split-brain patients’ left-hand performance was clearly superior to their right-hand performance. This geometrical discrimination task required the perception of fine spatial properties of shapes (e.g., differences in slant and gradient of surfaces, angular values, presence of concavities or holes).
Thus, the superior performance of the left hand, which is controlled by the right hemisphere, reflects the use of the right hemisphere’s coordinate spatial relations system in solving a shape discrimination task that crucially depends on the fine metrics of the forms. Later investigations on split-brain patients showed that the left hand outperforms the right hand also when copying drawings from memory or in rearranging blocks of the WAIS-R Block Design Test (LeDoux, Wilson, & Gazzaniga, 1977). Specifically, LeDoux and Gazzaniga (1978) proposed that the right hemisphere possesses a perceptual capacity that is specifically dedicated to the analysis of space in the service of the organization of action or movement planning, which they called a manipulospatial subsystem. Again, a left-hand (right-hemisphere) superiority in these patients’ constructions is consistent with a view that rearranging multiple items that are identical in shape (and share colors) may require a coordinate representation of the matrix of the design or array (Laeng, 2006).

A shape is intuitively a geometrical entity that occupies a volume of space. As such, a shape is nothing but the spatial arrangement of the points in space occupied by it. However, many real-world objects can be parsed into component elements or simpler shapes, and many objects differ in the locations of similar or identical constitutive elements (Biederman, 1987). Intuitively, it would seem that a multipart object (e.g., a bicycle) is nothing but the spatial relations among its parts. According to several accounts, an object can be represented as a structural description (Marr, 1982) or a representation of connections between parts (e.g., geons, in Biederman’s, 1987, recognition by components model). In these computational models, an object’s connections are conceived as abstract spatial specifications of how an object’s parts are put together. The resulting representations can differentially describe whole classes of similar objects (e.g., cups versus buckets). In this case, abstract, categorical, spatial relations (Hayward & Tarr, 1995; Hummel & Biederman, 1992) (p. 46) could provide the spatial ingredient of such structural descriptions.
Indeed, Kosslyn (1987) proposed that the left dorsal system’s categorical spatial representations can play a role in object recognition, by representing spatial relations among the object’s parts. In this account, shape properties are stored in a visual memory system within the inferior temporal lobe (Tanaka et al., 1991) as a nontopographical “population code” that ignores locations, whereas the dorsal system registers locations in a topographic map that ignores shape (e.g., the object can be represented here simply as a point and its location specified relative to other points or indices). Such a map of indices or spatial tokens could then represent the locations of objects in a scene or of parts of objects in space and form an “object map” (Kosslyn et al., 2006) or “skeletal image” (Kosslyn, 1980). This information can be used to reconstruct the image by positioning (back-propagating) each part representation in its correct location within the high-detail topographic maps of the occipital lobes. When reconstituting a mental image or, generally, in recollection (O’Regan & Noë, 2001), the set of locations retrieved from the “object map” could also be specified by the relation of parts to eye position during learning (Laeng & Teodorescu, 2002).

Based on the above account, one would expect that lesions of the dorsal system (in particular, of the left hemisphere) would result in object recognition problems. However, as already discussed, patients with parietal lesions do not present the dramatic object recognition deficits of patients with temporal lesions. Patients with unilateral lesions localized in the parietal lobe often appear to lack knowledge of the spatial orientation of objects, yet they appear to achieve normal object recognition (Turnbull et al., 1995, 1997). For example, they may fail to recognize an object’s correct orientation or, when drawing from memory, they rotate shapes by 90 or 180 degrees. Most remarkably, patients with bilateral parietal lesions (i.e., with Bálint’s syndrome and simultanagnosia), despite being catastrophically impaired in their perception of spatial relations between separate objects, can recognize an individual object (albeit very slowly; Duncan et al., 2003) on the basis of its shape. Hence, we face something of a paradox: Object identity depends on spatial representations among parts (i.e., within-object relations), but damage to the dorsal spatial systems does not seem to affect object recognition (Farah, 1990). It may appear that representing spatial relations “within” and “without” shapes depends on different perceptual mechanisms.

However, there exists evidence that lesions in the dorsal system can cause specific types of object recognition deficits (e.g., Warrington, 1982; Warrington & Taylor, 1973). First of all, patients with Bálint’s syndrome do not have entirely normal object perception (Duncan et al., 2003; Friedman-Hill et al., 1995; Robertson et al., 1997). Specifically, patient R.M., with bilateral parieto-occipital lesions, is unable to judge either relative or absolute visual locations. Concomitantly, he makes mistakes in combining the colors and shapes of separate objects or the shape of an object with the size of another (i.e., the patient commits several “illusory conjunctions”). Thus, an inadequate spatial representation or loss of spatial awareness of the features of forms, due to damage to parietal areas, appears to underlie both the deficit in spatial judgment and that of binding shape features.
According to feature integration theory (Treisman, 1988), perceptual representations of two separate objects currently in view require integration of information in the dorsal and ventral system, so that each object’s specific combination of features in their proper locations can be obtained.

Additional evidence that spatial information plays a role in shape recognition derives from a study with lateralized stimuli (Laeng, Shah & Kosslyn, 1999). This study revealed a short-lived advantage for the left hemisphere (i.e., for stimuli presented tachistoscopically to the right visual field) in the recognition of pictures of contorted poses of animals. It was reasoned that nonrigid multipart objects (typically animal bodies but also some artifacts, e.g., a bicycle) can take a number of contortions that, combined with an unusual perspective, are likely to be novel or rarely experienced by the observer. In such cases, the visual system may opt to operate in a different mode from the usual matching of stored representations (i.e., bottom-up matching of global templates) and initiate a hypothesis-testing procedure (i.e., a top-down search for connected parts and a serial matching of these to stored structural descriptions). In the latter case, the retrieval of categorical spatial information (i.e., a hypothesized dorsal and left hemisphere’s function) seems to be crucial for recognition. Abstract spatial information about the connectivity of the object’s parts would facilitate the formation of a perceptual hypothesis and verifying it by matching visible parts to the object’s memorized spatial configuration. In other words, an “object map” in the dorsal system specifies the spatial relations among the part representations of the complex pattern represented by the ventral system (Kosslyn, Ganis, & Thompson, 2006).

Figure 3.6 Stimuli used in an object recognition task with patients with unilateral posterior lesions. Patients with damage to the left hemisphere had greater difficulties with the noncanonical views of the nonrigid objects (animals) than those with damage to the right hemisphere, whereas those with damage to the right hemisphere had relatively greater difficulties with the noncanonical views of rigid objects. Reprinted with permission from Laeng et al., 2000.

(p. 47) A subsequent study (Laeng et al., 2000) of patients with unilateral posterior damage (mainly affecting the parietal lobe) confirmed that patients with left-hemisphere damage had greater difficulties in recognizing the contorted bodies of animals (i.e., the same images used in Laeng et al.’s, 1999, study) than those with right-hemisphere damage (Figure 3.6). However, left-hemisphere damage resulted in less difficulty than right-hemisphere damage when recognizing pictures of the same animals seen in conventional poses but from noncanonical (unusual) views as well as when recognizing rigid objects (artifacts) from noncanonical views. As originally shown in studies by Warrington (1982; Warrington & Taylor, 1973), patients with right parietal lesions showed difficulties in the recognition or matching of objects when viewed at unconventional perspectives or in the presence of strong shadows. According to Marr (1982), these findings suggested that the patients’ difficulties reflect the inability to transform or align an internal spatial frame of reference centered on the object’s intrinsic coordinates (i.e., its axes of elongation) to match the perceived image.

To conclude, the dorsal system plays a role in object recognition but as an optional resource (Warrington & James, 1988) by cooperating with the ventral system during challenging visual situations (e.g., novel contortions of flexible objects or very unconventional views or difficult shape-from-shadows discriminations; Warrington & James, 1986) or when making fine judgments about the shapes of objects that differ by subtle variations in size or orientation (Aguirre & D’Esposito, 1997; Faillenot et al., 1997). In ordinary circumstances, different from these “visual problem solving” (p. 48) situations (Farah, 1990), a spatial analysis provided by the dorsal system seems neither necessary nor sufficient to achieve object recognition.

8. Cognitive Maps

As thinking agents, we accumulate in our lifetime a spatial understanding of our surrounding physical world. Also, we can remember and think about spatial relations either in the immediate, visible, physical environment or in the invisible environments of a large geographic scale. We can also manipulate virtual objects in a virtual space and imagined geometry (Aflalo & Graziano, 2008). As a communicative species, we can transfer knowledge about physical space to others through symbolic systems like language and geographical maps (Liben, 2009). Finally, as a social species, we tend to organize space into territories and safety zones, and to develop a sense of personal place.

A great deal of our daily behavior must be based on spatial decisions and choices between routes, paths, and trajectories. A type of spatial representation, called the cognitive map, appears to be concerned with the knowledge of large-scale space (Cheng, 1986; Kosslyn et al., 1974; Kuipers, 1978; Tolman, 1948; Wolbers & Hegarty, 2010). A distinction can be made between (1) a map-like representation, consisting of a spatial frame external to the navigating organism (this representation is made by the overall geometric shape of the environment [survey knowledge] and/or a set of spatial relationships between locales [landmarks and place]); and (2) an internal spatial frame that is based on egocentric cues generated by self-motion (route knowledge) and vestibular information (Shelton & McNamara, 2001). Cognitive maps may be based on categorical spatial information (often referred to as topological; e.g., Poucet, 1993), which affords a coarse representation of the connectivity of space and its overall arrangement, combined with coordinate (metric) information (e.g., information about angles and distances) of the large-scale environment.
Navigation (via path integration or dead reckoning or via the more flexible map-like representation) and environmental knowledge can be disrupted by damage to a variety of brain regions. Parietal lesions result in difficulties when navigating in immediate space (DiMattia & Kesner, 1988; Stark et al., 1996) and can degrade topographic knowledge of the environment (Newcombe & Russell, 1969; Takahashi et al., 1997). However, areas supporting cognitive map representations in several vertebrate species appear to involve portions of the hippocampus and surrounding areas (Wilson & McNaughton, 1993). As originally revealed by single-cell recording studies in rats (O’Keefe, 1976; O’Keefe & Nadel, 1978) and later in primates (O’Keefe et al., 1998; Ludvig et al., 2004) and also humans (Ekstrom et al., 2003), some hippocampal cells can provide a spatial map-like representation within a reference frame fixed onto the external environment. For example, some of these cells have visual receptive fields that do not move with the position of the animal or with changes in viewpoint but instead fire whenever the animal (e.g., the monkey; Rolls et al., 1989; Fyhn et al., 2004) is in a certain place in the local environment. Thus, these cells can play an important functional role as part of a navigational system (Lenck-Santini et al., 2001). Reactivation of place cells has also been observed during sleep episodes in rats (Wilson & McNaughton, 1994), which can be interpreted as an offline consolidation process of spatial memories. Reactivation of whole past sequences of place cell activity has been recorded in rats during maze navigation whenever they stop at a turning point (Foster & Wilson, 2006); in some cases, place cell discharges can indicate future locations along the path (Johnson & Redish, 2007) before the animals choose between alternative trajectories. Another type of cell (Figure 3.7) has been found in the rat entorhinal cortex (adjacent to the hippocampus). These cells present tessellating firing fields or “grids” (Hafting et al., 2005; Solstad et al., 2008) that could provide the elements of a spatial map based on path integration (Kjelstrup et al., 2008; Moser et al., 2008) and thus complement the function of the place cells.
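
Path integration (dead reckoning), mentioned above as one basis for navigation, amounts to accumulating self-motion signals into a running position estimate from which a vector back to the starting point can be read off. The sketch below shows only this arithmetic, under the assumption that each movement is available as a heading and a distance; it is not a model of grid cells or place cells, and the function names are hypothetical.

```python
import math


def integrate_path(steps, start=(0.0, 0.0)):
    """Accumulate (heading_radians, distance) self-motion steps into a position."""
    x, y = start
    for heading, distance in steps:
        x += distance * math.cos(heading)
        y += distance * math.sin(heading)
    return x, y


def homing_vector(position, start=(0.0, 0.0)):
    """Distance and bearing (radians) from the current position back to start."""
    dx = start[0] - position[0]
    dy = start[1] - position[1]
    return math.hypot(dx, dy), math.atan2(dy, dx)


# Hypothetical outbound journey: 3 units east, then 4 units north.
outbound = [(0.0, 3.0), (math.pi / 2, 4.0)]
position = integrate_path(outbound)
distance_home, bearing_home = homing_vector(position)
print(position, distance_home, bearing_home)
```

Because the estimate depends only on the accumulated steps, no landmark or external reference frame is needed; the cost is that any error in a step is carried forward into every later estimate, which is one reason a map-like representation is described in the text as more flexible.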

Figure 3.7 Neural firing of “place cells” and “grid cells” of rats while navigating in their cage environment. Reprinted with permission from Moser et al., 2008.
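The path integration (dead reckoning) computation referred to above can be made concrete with a minimal sketch. The Python below is not a model of entorhinal or hippocampal circuitry; it only shows the arithmetic of updating a position estimate from self-motion signals and reading out a homing vector. The function names, the (turn, distance) step format, and the noise-free inputs are all assumptions made for illustration.

```python
import math


def integrate_path(steps, start=(0.0, 0.0), heading=0.0):
    """Accumulate position from self-motion alone (dead reckoning).

    steps: iterable of (turn, distance) pairs; turn is in radians,
    counterclockwise positive, applied before moving `distance`.
    Returns the estimated (x, y, heading). Hypothetical helper.
    """
    x, y = start
    for turn, distance in steps:
        heading += turn
        x += distance * math.cos(heading)
        y += distance * math.sin(heading)
    return x, y, heading


def homing_vector(x, y, heading, home=(0.0, 0.0)):
    """Distance and egocentric bearing from the current estimate to `home`.

    The bearing is relative to the current heading and wrapped to
    [-pi, pi); positive values mean the goal lies to the left.
    """
    dx, dy = home[0] - x, home[1] - y
    distance = math.hypot(dx, dy)
    bearing = math.atan2(dy, dx) - heading
    bearing = (bearing + math.pi) % (2 * math.pi) - math.pi
    return distance, bearing


# Outbound path: 3 units straight ahead, then a 90-degree left turn and 4 units.
x, y, heading = integrate_path([(0.0, 3.0), (math.pi / 2, 4.0)])
distance_home, bearing_home = homing_vector(x, y, heading)
```

In this sketch the estimate depends only on the sequence of turns and distances, with no landmark input. That is why errors in real path integration accumulate over time, and why a complementary landmark-based map is useful.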

Ekstrom and colleagues (2003) recorded directly from hippocampal and parahippocampal cells of epileptic patients undergoing neurosurgery. The patients played a videogame (a taxi-driving game in which a player navigates within a virtual city) while neural activity was recorded simultaneously from multiple cells. A significant proportion of the recorded cells showed spiking properties identical to those of place cells already described for the rat’s hippocampus. Other cells were instead view responsive. They responded to the view of a specific landmark (e.g., the picture of a particular building) and were relatively more common in the patients’ parahippocampal region. Thus, these findings support an account of the human hippocampus as computing a flexible map-like representation of space by combining visual and spatial elements with a coarser representation of salient scenes, views, and landmarks formed in the parahippocampal region. In addition, neuroimaging studies in humans revealed activity in the hippocampal region during navigational memory tasks (e.g., in taxi drivers recalling routes; Grön et al., 2000; Maguire et al., 1997; Wolbers et al., 2007). Lesion studies of animals and neurological cases have demonstrated deficits after temporal lesions that include the hippocampus (Barrash et al., 2000; Kessels et al., 2001; Maguire et al., 1996). However, the hippocampus and the entorhinal cortex may not constitute necessary spatial structures for humans and for all types of navigational abilities; patients with lesions in these areas can maintain a path in mind and point to, or estimate the distance from, a starting point by keeping track of a reference location while moving (Shrager et al., 2008).

In rats, the parietal cortex is also clearly involved in the processing of spatial information (Save & Poucet, 2000) and constitutes another important structure for navigation (Nitz, 2006; Rogers & Kesner, 2006). One hypothesis is that the parietal cortex is involved in combining visual-spatial information and self-motion information so that egocentrically acquired information can be relayed to the hippocampus to generate and update an allocentric representation of space. Based on the role of the human dorsal system in the computation of both categorical and coordinate types of spatial representations (Laeng et al., 2003), one would expect a strong interaction between processing in the hippocampal formation and in the posterior parietal cortex. The human parietal cortex could provide both coordinate (distance and angle) and categorical information (boundary conditions, connectivity, and topological information; Poucet, 1993) to the hippocampus. In turn, the hippocampus could combine the above spatial information with spatial scenes encoded by the parahippocampal area and ventral areas specialized for landscape object recognition (e.g., recognition of a specific building; Aguirre et al., 1998). In addition, language-based spatial information (Hermer & Spelke, 1996; Hermer-Vazquez et al., 2001) could play an active role for this navigational system.

Neuroimaging studies with humans revealed activity in the parahippocampal cortex when healthy participants passively viewed an environment or large-scale scenes (Epstein & Kanwisher, 1998), including an empty room, as well as during navigational tasks (e.g., in virtual environments; Aguirre et al., 1996; Maguire et al., 1998, 1999).
Patients with damage in this area show problems in scene recognition and route learning (Aguirre & D’Esposito, 1999; Epstein et al., 2001). Subsequent research with patients and monkeys has clarified that the parahippocampal cortex is involved in memorizing objects’ locations within a large-scale scene or a room’s geometry (Bohbot et al., 1998; Malkova & Mishkin, 2003), more than in supporting navigation or place knowledge (Burgess & O’Keefe, 2003). One proposal is that, when remembering a place or scene, the parietal cortex, based on reciprocal connections, can also translate an allocentric (North, South, East, West) parahippocampal representation into an egocentric (left, right, ahead, behind) representation (Burgess, 2008). By this account, neglect in scene imagery (e.g., the Milan square neglect experiment of Bisiach & Luzzatti, 1978) after parietal lesions would result from an intact ventral allocentric representation of space (i.e., the whole square) along with damage to the parietal egocentric representation.
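The proposed translation from an allocentric to an egocentric representation amounts, geometrically, to a change of reference frame. The following Python sketch shows only that geometry, under stated assumptions: a flat two-dimensional map, a heading measured counterclockwise from the East axis, and a hypothetical function name. It makes no claim about how parietal circuits actually carry out the computation.

```python
import math


def allocentric_to_egocentric(landmark, observer, heading):
    """Re-express a map location relative to an observer.

    landmark, observer: (east, north) coordinates on a shared map.
    heading: observer's facing direction in radians, counterclockwise
    from East (so pi/2 means facing North).
    Returns (ahead, left): positive `ahead` is in front of the observer,
    positive `left` is on the observer's left. Hypothetical helper.
    """
    dx = landmark[0] - observer[0]
    dy = landmark[1] - observer[1]
    ahead = dx * math.cos(heading) + dy * math.sin(heading)
    left = -dx * math.sin(heading) + dy * math.cos(heading)
    return ahead, left


# One building on the west side of a square, imagined from two opposite ends.
building = (-1.0, 0.0)
from_south = allocentric_to_egocentric(building, (0.0, -1.0), math.pi / 2)
from_north = allocentric_to_egocentric(building, (0.0, 1.0), -math.pi / 2)
```

The same allocentric location falls on the observer's left when the square is imagined from its south end and on the right when imagined from the north end. This is the pattern the account above relies on: with an intact allocentric map and a damaged egocentric readout, the items omitted from a description change with the imagined viewpoint.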

Conclusion

As humans, we “act” in space, “know” space, and “talk” about space, three functions that together would seem to require the whole human brain in order to be accomplished. Indeed, research on the human brain’s representation of spatial relations includes rather different traditions and theoretical backgrounds, which taken together provide us with a complex and rich picture of our cognition of space. Neuroscience has revealed (1) the existence of topographic maps in the brain or, in other words, the brain’s representation of space by the spatial organization of the brain itself. The visual world is then represented by two higher order, representational streams of the brain, anatomically located ventrally and dorsally, that make the basic distinction between (2) “what” is in the world and “where” it is. However, these two forms of information need to be integrated in other representations that specify (3) “how” an object can be acted on and “which” object is the current target. Additionally, the brain localizes objects according to multiple and parallel (4) spatial frames of reference that are also relevant to the manner in which spatial attention is deployed. After brain damage, attentional deficits (5) or neglect clearly reveal the relevance of allocating attention along different frames of reference. Although many of the reviewed functions are shared among humans and other animals, humans show a strong degree of (6) cerebral lateralization for spatial cognition, and the current evidence indicates complementary hemispheric specializations for digital (categorical) and for analog (coordinate) spatial information. The representation of categorical spatial relations is also relevant for (7) object recognition by specifying the spatial arrangement of parts within an object (i.e., the “where of what”). Humans, like other animals, can also represent space in the very large scale, a (8) cognitive map of the external environment, which is useful for navigation.

The most striking finding of cognitive neuroscience is the considerable degree of functional specialization of the brain’s areas. Interestingly, the discovery that the visual brain separates visual information into two streams of processing (“what” versus “where”) does particular justice to Kant’s classic concept of space as a separate mode of knowledge (Moser et al., 2008).
In the Critique of Pure Reason, space was defined as what is left when one ignores all the attributes of a shape: “If we remove from our empirical concept of a body, one by one, every feature in it which is empirical, the color, the hardness or softness, the weight, even the impenetrability, there still remains the space which the body (now entirely vanished) occupied, and this cannot be removed” (Kant, 1787; 2008, p. 377).

Author Note

I am grateful for comments and suggestions on drafts of the chapter to Charlie Butter, Michael Peters, and Peter Svenonius. Please address correspondence to Bruno Laeng, Ph.D., Department of Psychology, University of Oslo, 1094 Blindern, 0317 Oslo, Norway; e-mail: [email protected]

References

Aflalo, T. N., & Graziano, M. S. A. (2006). Possible origins of the complex topographic organization of motor cortex: Reduction of a multidimensional space onto a two-dimensional array. Journal of Neuroscience, 26, 6288–6297.


Aflalo, T. N., & Graziano, M. S. A. (2008). Four-dimensional spatial reasoning in humans. Journal of Experimental Psychology: Human Perception and Performance, 34, 1066–1077.
Aglioti, S., Goodale, M. A., & DeSouza, J. F. X. (1995). Size-contrast illusions deceive the eye but not the hand. Current Biology, 5, 679–685.
Aguirre, G. K., & D’Esposito, M. (1997). Environmental knowledge is subserved by separable dorsal/ventral neural areas. Journal of Neuroscience, 17, 2512–2518.
Aguirre, G. K., & D’Esposito, M. (1999). Topographical disorientation: A synthesis and taxonomy. Brain, 122, 1613–1628.
Aguirre, G. K., Zarahn, E., & D’Esposito, M. (1998). An area within human ventral cortex sensitive to “building” stimuli: Evidence and implications. Neuron, 17, 373–383.
Alain, C., Arnott, S. R., Hevenor, S., Graham, S., & Grady, C. L. (2001). “What” and “where” in the human auditory system. Proceedings of the National Academy of Sciences U S A, 98, 12301–12306.
Alivisatos, B., & Petrides, M. (1997). Functional activation of the human brain during mental rotation. Neuropsychologia, 35, 111–118.
Amorapanth, P. X., Widick, P., & Chatterjee, A. (2010). The neural basis for spatial relations. Journal of Cognitive Neuroscience, 22, 1739–1753.
Andersen, R. A., & Buneo, C. A. (2002). Intentional maps in posterior parietal cortex. Annual Review of Neuroscience, 25, 189–220.
Andersen, R. A., Essick, G. K., & Siegel, R. M. (1985). The encoding of spatial location by posterior parietal neurons. Science, 230, 456–458.
Avillac, M., Deneve, S., Olivier, E., Pouget, A., & Duhamel, J. R. (2005). Reference frames for representing visual and tactile locations in parietal cortex. Nature Neuroscience, 8, 941–949.
Baars, B. J. (2002). The conscious access hypothesis: Origins and recent evidence. Trends in Cognitive Sciences, 6, 47–52.
Baciu, M., Koenig, O., Vernier, M.-P., Bedoin, N., Rubin, C., & Segebarth, C. (1999). Categorical and coordinate spatial relations: fMRI evidence for hemispheric specialization. Neuroreport, 10, 1373–1378.
Bálint, R. (1909). Seelenlähmung des “Schauens”, optische Ataxie, räumliche Störung der Aufmerksamkeit. European Neurology, 25 (1), 51–66, 67–81.
Ballard, D. H. (1986). Cortical connections and parallel processing: Structure and function. Behavioral and Brain Sciences, 9, 67–120.


Banich, M. T., & Federmeier, K. D. (1999). Categorical and metric spatial processes distinguished by task demands and practice. Journal of Cognitive Neuroscience, 11 (2), 153–166.
Barlow, H. (1981). Critical limiting factors in the design of the eye and visual cortex. Proceedings of the Royal Society of London, Biological Sciences, 212, 1–34.
Barrash, J., Damasio, H., Adolphs, R., & Tranel, D. (2000). The neuroanatomical correlates of route learning impairment. Neuropsychologia, 38, 820–836.
Battista, C., & Peters, M. (2010). Ecological aspects of mental rotation around the vertical and horizontal axis. Learning and Individual Differences, 31 (2), 110–113.
Baxter, D. M., & Warrington, E. K. (1983). Neglect dysgraphia. Journal of Neurology, Neurosurgery, and Psychiatry, 46, 1073–1078.
Behrmann, M., & Moscovitch, M. (1994). Object-centered neglect in patients with unilateral neglect: Effects of left-right coordinates of objects. Journal of Cognitive Neuroscience, 6, 1–16.
Behrmann, M., & Tipper, S. P. (1999). Attention accesses multiple reference frames: Evidence from visual neglect. Journal of Experimental Psychology: Human Perception and Performance, 25, 83–101.
Beschin, N., Basso, A., & Della Sala, S. (2000). Perceiving left and imagining right: Dissociation in neglect. Cortex, 36, 401–414.
Beschin, N., Cubelli, R., Della Sala, S., & Spinazzola, L. (1997). Left of what? The role of egocentric coordinates in neglect. Journal of Neurology, Neurosurgery, and Psychiatry, 63, 483–489.
Biederman, I. (1987). Recognition-by-components: A theory of human image understanding. Psychological Review, 94, 115–147.
Bisiach, E., Capitani, E., & Porta, E. (1985). Two basic properties of space representation in the brain: Evidence from unilateral neglect. Journal of Neurology, Neurosurgery, and Psychiatry, 48, 141–144.
Bisiach, E., & Luzzatti, C. (1978). Unilateral neglect of representational space. Cortex, 14, 129–133.
Block, N. (1996). How can we find the neural correlate of consciousness. Trends in Neurosciences, 19, 456–459.
Bohbot, V. D., Kalina, M., Stepankova, K., Spackova, N., Petrides, M., & Nadel, L. (1998). Spatial memory deficits in patients with lesions to the right hippocampal and the right parahippocampal cortex. Neuropsychologia, 36, 1217–1238.
Brandt, T., & Dietrich, M. (1999). The vestibular cortex: Its locations, functions and disorders. Annals of the New York Academy of Sciences, 871, 293–312.

Bruyer, R., Scailquin, J. C., & Coibon, P. (1997). Dissociation between categorical and coordinate spatial computations: Modulation by cerebral hemispheres, task properties, mode of response, and age. Brain and Cognition, 33, 245–277.
Buccino, G., Lui, F., Canessa, N., Patteri, I., Lagravinese, G., Benuzzi, F., Porro, C. A., & Rizzolatti, G. (2004). Neural circuits involved in the recognition of actions performed by nonconspecifics: An fMRI study. Journal of Cognitive Neuroscience, 16, 114–126.
Bullens, J., & Postma, A. (2008). The development of categorical and coordinate spatial relations. Cognitive Development, 23, 38–47.
Burgess, N. (2008). Spatial cognition and the brain. Annals of the New York Academy of Sciences, 1124, 77–97.
Burgess, N., & O’Keefe, J. (2003). Neural representations in human spatial memory. Trends in Cognitive Sciences, 7, 517–519.
Burnod, Y., Baraduc, P., Battaglia-Mayer, A., Guigon, E., Koechlin, E., Ferraina, S., Lacquaniti, F., & Caminiti, R. (1999). Parieto-frontal coding of reaching: An integrated framework. Experimental Brain Research, 129, 325–346.
Butter, C. M., Evans, J., Kirsh, N., & Kewman, D. (1989). Altitudinal neglect following traumatic brain injury: A case report. Cortex, 25, 135–146.
Butters, N., Barton, M., & Brody, B. A. (1970). Role of the right parietal lobe in the mediation of cross-modal associations and reversible operations in space. Cortex, 6, 174–190.
Caramazza, A., & Hillis, A. E. (1990). Spatial representation of words in the brain implied by the studies of a unilateral neglect patient. Nature, 346, 267–269.
Carey, D. P., Dijkerman, H. C., Murphy, K. J., Goodale, M. A., & Milner, A. D. (2006). Pointing to places and spaces in a patient with visual form agnosia. Neuropsychologia, 44, 1584–1594.
Carey, D. P., Harvey, M., & Milner, A. D. (1996). Visuomotor sensitivity for shape and orientation in a patient with visual form agnosia. Neuropsychologia, 3, 329–337.
Carlson, L., Regier, T., & Covey, E. (2003). Defining spatial relations: Reconciling axis and vector representations. In E. van der Zee & J. Slack (Eds.), Representing direction in language and space (pp. 111–131). Oxford, UK: Oxford University Press.
Carpenter, P. A., Just, M. A., Keller, T. A., Eddy, W., & Thulborn, K. (1999a). Graded functional activation in the visuospatial system with the amount of task demand. Journal of Cognitive Neuroscience, 11, 9–24.
Carpenter, P. A., Just, M. A., Keller, T. A., Eddy, W., & Thulborn, K. (1999b). Time-course of fMRI activation in language and spatial networks during sentence comprehension. NeuroImage, 10, 216–224.

Casati, R., & Varzi, A. (1999). Parts and places: The structures of spatial representation. Boston: MIT Press.
Castiello, U. (2005). The neuroscience of grasping. Nature Reviews Neuroscience, 6, 726–736.
Cavanagh, P. (1998). Attention: Exporting vision to the mind. In S. Saida & P. Cavanagh (Eds.), Selection and integration of visual information (pp. 3–11). Tsukuba, Japan: STA & NIBH-T.
Chafee, M. V., Averbeck, B. B., & Crowe, D. A. (2007). Representing spatial relationships in posterior parietal cortex: Single neurons code object-referenced position. Cerebral Cortex, 17, 2914–2932.
Chafee, M. V., Crowe, D. A., Averbeck, B. B., & Georgopoulos, A. P. (2005). Neural correlates of spatial judgement during object construction in parietal cortex. Cerebral Cortex, 15, 1393–1413.
Chao, L. L., & Martin, A. (2000). Representation of manipulable man-made objects in the dorsal stream. NeuroImage, 12, 478–484.
Cheng, K. (1986). A purely geometric module in the rat’s spatial representation. Cognition, 23, 149–178.
Cherniak, C. (1990). The bounded brain: Toward a quantitative neuroanatomy. Journal of Cognitive Neuroscience, 2, 58–68.
Clément, G., & Reschke, M. F. (2008). Neuroscience in space. New York: Springer.
Colby, C. L., & Goldberg, M. E. (1999). Space and attention in parietal cortex. Annual Review of Neuroscience, 23, 319–349.
Collett, T. (1982). Do toads plan routes? A study of the detour behaviour of Bufo Viridis. Journal of Comparative Physiology, 146, 261–271.
Committeri, G., Galati, G., Paradis, A. L., Pizzamiglio, L., Berthoz, A., & LeBihan, D. (2004). Reference frames for spatial cognition: Different brain areas are involved in viewer-, object-, and landmark-centered judgments about object location. Journal of Cognitive Neuroscience, 16, 1517–1535.
Committeri, G., Pitzalis, S., Galati, G., Patria, F., Pelle, G., Sabatini, U., Castriota-Scanderbeg, A., Piccardi, L., Guariglia, C., & Pizzamiglio, L. (2007). Neural bases of personal and extrapersonal neglect in humans. Brain, 130, 431–441.
Constantinidis, C., & Steinmetz, M. A. (2001). Neuronal responses in Area 7a to multiple-stimulus displays: I. Neurons encode the location of the salient stimulus. Cerebral Cortex, 11, 581–591.
Cook, W. A. (1989). Case grammar theory. Washington, DC: Georgetown University Press.

Corbetta, M., Akbudak, E., Conturo, T. E., Snyder, A. Z., Ollinger, J. M., Drury, H. A., Linenweber, M. R., Petersen, S. E., Raichle, M. E., Van Essen, D. C., & Shulman, G. L. (1998). A common network of functional areas for attention and eye movements. Neuron, 21, 761–773.
Corbetta, M., Kincade, J. M., Ollinger, J. M., McAvoy, M. P., & Shulman, G. L. (2000). Voluntary orienting is dissociated from target detection in human posterior parietal cortex. Nature Neuroscience, 3, 292–297.
Corbetta, M., Miezin, F. M., Shulman, G. L., & Petersen, S. E. (1993). A PET study of visuospatial attention. Journal of Neuroscience, 13, 1202–1226.
Corbetta, M., Patel, G., & Shulman, G. L. (2008). The reorienting system of the human brain: From environment to theory of mind. Neuron, 58, 306–324.
Courtney, S. M., Ungerleider, L. G., Keil, K., & Haxby, J. V. (1997). Transient and sustained activity in a distributed neural system for human working memory. Nature, 386, 608–611.
Cowey, A., Small, M., & Ellis, S. (1994). Left visuo-spatial neglect can be worse in far than in near space. Neuropsychologia, 32, 1059–1066.
Crowe, D. A., Averbeck, B. B., Chafee, M. V., & Georgopoulos, A. P. (2005). Dynamics of parietal neural activity during spatial cognitive processing. Neuron, 47, 885–891.
Culham, J. C., Brandt, S. A., Cavanagh, P., Kanwisher, N. G., Dale, A. M., & Tootell, R. B. H. (1998). Cortical fMRI activation produced by attentive tracking of moving targets. Journal of Neurophysiology, 80, 2657–2670.
Culham, J. C., Cavanagh, P., & Kanwisher, N. G. (2001). Attention response functions: Characterizing brain areas using fMRI activation during parametric variations of attentional load. Neuron, 32, 737–745.
Culham, J. C., Danckert, S. L., DeSouza, J. F., Gati, J. S., Menon, R. S., & Goodale, M. A. (2003). Visually guided grasping produces fMRI activation in dorsal but not ventral stream brain areas. Experimental Brain Research, 153 (2), 180–189.
Culham, J. C., & Valyear, K. F. (2006). Human parietal cortex in action. Current Opinion in Neurobiology, 16 (2), 205–212.
Damasio, H., Grabowski, T. J., Tranel, D., Ponto, L. L. B., Hichwa, R. D., & Damasio, A. R. (2001). Neural correlates of naming actions and of naming spatial relations. NeuroImage, 13, 1053–1064.
Dehaene, S. (1997). The number sense. Oxford, UK: Oxford University Press.
Dehaene, S., & Changeux, J.-P. (1993). Development of elementary numerical abilities: A neuronal model. Journal of Cognitive Neuroscience, 5, 390–407.


Denys, K., Vanduffel, W., Fize, D., Nelissen, K., Peuskens, H., Van Essen, D., & Orban, G. A. (2004). The processing of visual shape in the cerebral cortex of human and nonhuman primates: A functional magnetic resonance imaging study. Journal of Neuroscience, 24, 2551–2565.
De Renzi, E. (1982). Disorders of space exploration and cognition. New York: John Wiley & Sons.
De Renzi, E., & Vignolo, L. (1962). The Token Test: A sensitive test to detect receptive disturbances in aphasics. Brain, 85, 665–678.
DeYoe, E. A., Carman, G. J., Bandettini, P., Glickman, S., Wieser, J., Cox, R., Miller, D., & Neitz, J. (1996). Mapping striate and extrastriate visual areas in human cerebral cortex. Proceedings of the National Academy of Sciences U S A, 93, 2382–2386.
DiMattia, B. V., & Kesner, R. P. (1988). Role of the posterior parietal association cortex in the processing of spatial event information. Behavioral Neuroscience, 102, 397–403.
Driver, J., & Pouget, A. (2000). Object-centered visual neglect, or relative egocentric neglect? Journal of Cognitive Neuroscience, 12 (3), 542–545.
Duhamel, J.-R., Bremmer, F., BenHamed, S., & Graf, W. (1997). Spatial invariance of visual receptive fields in parietal cortex neurons. Nature, 389, 845–848.
Duncan, J., Bundesen, C., Olson, A., Humphreys, G., Ward, R., Kyllingsbæk, S., van Raamsdonk, M., Rorden, R., & Chavda, S. (2003). Attentional functions in dorsal and ventral simultanagnosia. Cognitive Neuropsychology, 20, 675–701.
Ekstrom, A. D., Kahana, M. J., Caplan, J. B., Fields, T. A., Isham, E. A., Newman, E. L., & Fried, I. (2003). Cellular networks underlying human spatial navigation. Nature, 425, 184–187.
Emmorey, K., Damasio, H., McCullough, S., Grabowski, T., Ponto, L., Hichwa, R., & Bellugi, U. (2002). Neural systems underlying spatial language in American Sign Language. NeuroImage, 17, 812–824.
Engel, S. A., Glover, G. H., & Wandell, B. A. (1997). Retinotopic organization in human visual cortex and the spatial precision of functional MRI. Cerebral Cortex, 7, 181–192.
Engel, S. A., Rumelhart, D. E., Wandell, B. A., Lee, A. T., Glover, G. H., Chichilnisky, E. J., & Shadlen, M. N. (1994). fMRI of human visual cortex. Nature, 369, 525.
Epstein, R., DeYoe, E. A., Press, D. Z., Rosen, A. C., & Kanwisher, N. (2001). Neuropsychological evidence for a topographical learning mechanism in parahippocampal cortex. Cognitive Neuropsychology, 18, 481–508.
Epstein, R., & Kanwisher, N. (1998). A cortical representation of the local visual environment. Nature, 392 (6676), 598–601.

Eurich, C. W., & Schwegler, H. (1997). Coarse coding: Calculation of the resolution achieved by a population of large receptive field neurons. Biological Cybernetics, 76, 357–363.
Fahle, M., & Poggio, T. (1981). Visual hyperacuity: Spatiotemporal interpolation in human vision. Philosophical Transactions of the Royal Society of London: Series B, 213, 451–477.
Faillenot, I., Toni, I., Decety, J., Grégoire, M.-C., & Jeannerod, M. (1997). Visual pathways for object-oriented action and object recognition: Functional anatomy with PET. Cerebral Cortex, 7, 77–85.
Fang, F., & He, S. (2005). Cortical responses to invisible objects in the human dorsal and ventral pathways. Nature Neuroscience, 8, 1380–1385.
Farah, M. J. (1990). Visual agnosia: Disorders of object recognition and what they tell us about normal vision. Cambridge, MA: MIT Press.
Farrell, M. J., & Robertson, I. H. (2000). The automatic updating of egocentric spatial relationships and its impairment due to right posterior cortical lesions. Neuropsychologia, 38, 585–595.
Feldman, J. (1985). Four frames suffice: A provisional model of vision and space. Behavioral and Brain Sciences, 8, 265–289.
Felleman, D. J., & Van Essen, D. C. (1991). Distributed hierarchical processing in primate cerebral cortex. Cerebral Cortex, 1, 1–47.
Fishman, R. S. (1997). Gordon Holmes, the cortical retina, and the wounds of war. Documenta Ophthalmologica, 93, 9–28.
Foster, D. J., & Wilson, M. A. (2006). Reverse replay of behavioural sequences in hippocampal place cells during the awake state. Nature, 440, 680–683.
Franco, L., & Sperry, R. W. (1977). Hemisphere lateralization for cognitive processing of geometry. Neuropsychologia, 75, 107–114.
Freedman, D. J., & Assad, J. A. (2006). Experience-dependent representation of visual categories in parietal cortex. Nature, 443, 85–88.
Friederici, A. D. (1982). Syntactic and semantic processes in aphasic deficits: The availability of prepositions. Brain and Language, 15, 249–258.
Friedman-Hill, S. R., Robertson, L. C., & Treisman, A. (1995). Parietal contributions to visual feature binding: Evidence from a patient with bilateral lesions. Science, 269, 853–855.
Fyhn, M., Molden, S., Witter, M. P., Moser, E. I., & Moser, M.-B. (2004). Spatial representation in the entorhinal cortex. Science, 305, 1258–1264.

Gallivan, J. P., Cavina-Pratesi, C., & Culham, J. C. (2009). Is that within reach? fMRI reveals that the human superior parieto-occipital cortex (SPOC) encodes objects reachable by the hand. Journal of Neuroscience, 29, 4381–4391.
Gattass, R., Nascimento-Silva, S., Soares, J. G. M., Lima, B., Jansen, A. K., Diogo, A. C. M., Farias, M. F., Marcondes, M., Botelho, E. P., Mariani, O. S., Azzi, J., & Fiorani, M. (2005). Cortical visual areas in monkeys: Location, topography, connections, columns, plasticity and cortical dynamics. Philosophical Transactions of the Royal Society, B, 360, 709–731.
Glickstein, M., Buchbinder, S., & May, J. L. (1998). Visual control of the arm, the wrist and the fingers: Pathways through the brain. Neuropsychologia, 36, 981–1001.
Goel, V., Gold, B., Kapur, S., & Houle, S. (1998). Neuroanatomical correlates of human reasoning. Journal of Cognitive Neuroscience, 10, 293–302.
Goldman-Rakic, P. S. (1987). Circuitry of primate prefrontal cortex and regulation of behavior by representational memory. Handbook of Physiology, 5, 373–417.
Goodale, M. A., & Milner, A. D. (1992). Separate visual pathways for perception and action. Trends in Neurosciences, 15, 20–25.
Goodale, M. A., Milner, A. D., Jakobson, L. S., & Carey, D. P. (1991). A neurological dissociation between perceiving objects and grasping them. Nature, 349, 154–156.
Goodrich-Hunsaker, N. J., Howard, B. P., Hunsaker, M. R., & Kesner, R. P. (2008). Human topological task adapted for rats: Spatial information processes of the parietal cortex. Neurobiology of Learning and Memory, 90, 389–394.
Graziano, M. S. A., & Cooke, D. F. (2006). Parieto-frontal interactions, personal space, and defensive behavior. Neuropsychologia, 44, 845–859.
Graziano, M. S. A., & Gross, C. G. (1995). Multiple representations of space in the brain. The Neuroscientist, 1, 43–50.
Graziano, M. S. A., & Gross, C. G. (1998). Spatial maps for the control of movement. Current Opinion in Neurobiology, 8, 195–201.
Graziano, M. S. A., Hu, X. T., & Gross, C. G. (1997). Coding the locations of objects in the dark. Science, 277, 239–241.
Graziano, M. S. A., Reiss, L. A. J., & Gross, C. G. (1999). A neuronal representation of the location of nearby sounds. Nature, 397, 428–430.
Graziano, M. S. A., Yap, G. S., & Gross, C. G. (1994). Coding of visual space by pre-motor neurons. Science, 226, 1054–1057.
Gregory, R. (2009). Seeing through illusions. Oxford, UK: Oxford University Press.


Grön, G., Wunderlich, A. P., Spitzer, M., Tomczak, R., & Riepe, M. W. (2000). Brain activation during human navigation: Gender-different neural networks as a substrate of performance. Nature Neuroscience, 3, 404–408.
Gross, C. G., & Graziano, M. S. (1995). Multiple representations of space in the brain. Neuroscientist, 1, 43–50.
Gross, C. G., & Mishkin, M. (1977). The neural basis of stimulus equivalence across retinal translation. In S. Harnad, R. Doty, J. Jaynes, L. Goldstein, and G. Krauthamer (Eds.), Lateralization in the nervous system (pp. 109–122). New York: Academic Press.
Hafting, T., Fyhn, M., Molden, S., Moser, M.-B., & Moser, E. I. (2005). Microstructure of a spatial map in the entorhinal cortex. Nature, 436, 801–806.
Halligan, P. W., & Marshall, J. C. (1991). Left neglect for near but not far space in man. Nature, 350, 498–500.
Halligan, P. W., & Marshall, J. C. (1995). Lateral and radial neglect as a function of spatial position: A case study. Neuropsychologia, 33, 1697–1702.
Harris, I. M., Egan, G. F., Sonkkila, C., Tochon-Danguy, H. J., Paxinos, G., & Watson, J. D. (2000). Selective right parietal lobe activation during mental rotation: A parametric PET study. Brain, 123, 65–73.
Harris, I. M., & Miniussi, C. (2003). Parietal lobe contribution to mental rotation demonstrated with rTMS. Journal of Cognitive Neuroscience, 15, 315–323.
Haxby, J. V., Grady, C. L., Horwitz, B., Ungerleider, L. G., Mishkin, M., Carson, R. E., et al. (1991). Dissociation of object and spatial visual processing pathways in human extrastriate cortex. Proceedings of the National Academy of Sciences U S A, 88, 1621–1625.
Hayward, W. G., & Tarr, M. J. (1995). Spatial language and spatial representation. Cognition, 55, 39–84.
He, B. J., Snyder, A. Z., Vincent, J. L., Epstein, A., Shulman, G. L., & Corbetta, M. (2007). Breakdown of functional connectivity in frontoparietal networks underlies behavioral deficits in spatial neglect. Neuron, 53, 905–918.
Heide, W., Blankenburg, M., Zimmermann, E., & Kompf, D. (1995). Cortical control of double-step saccades: Implications for spatial orientation. Annals of Neurology, 38, 739–748.
Heilman, K. M., Bowers, D., Coslett, H. B., Whelan, H., & Watson, R. T. (1985). Directional hypokinesia: Prolonged reaction times for leftward movements in patients with right hemisphere lesions and neglect. Neurology, Cleveland, 35, 855–859.
Heilman, K. M., & Van Den Abell, T. (1980). Right hemisphere dominance for attention: The mechanism underlying hemispheric asymmetries of inattention (neglect). Neurology, 30, 327–330.

Hellige, J. B., & Michimata, C. (1989). Categorization versus distance: Hemispheric differences for processing spatial information. Memory & Cognition, 17, 770–776.
Hermer, L., & Spelke, E. (1996). Modularity and development: The case of spatial reorientation. Cognition, 61, 195–232.
Hermer-Vazquez, L., Moffet, A., & Munkholm, P. (2001). Language, space, and the development of cognitive flexibility in humans: The case of two spatial memory tasks. Cognition, 79, 263–299.
Hillis, A. E. (2006). Neurobiology of unilateral spatial neglect. Neuroscientist, 12, 153–163.
Hillis, A. E., & Caramazza, A. (1991). Spatially-specific deficit to stimulus-centered letter shape representations in a case of “neglect dyslexia.” Neuropsychologia, 29, 1223–1240.
Hillis, A. E., Newhart, M., Heidler, J., Barker, P. B., & Degaonkar, M. (2005). Anatomy of spatial attention: Insights from perfusion imaging and hemispatial neglect in acute stroke. Journal of Neuroscience, 25, 3161–3167.
Hinton, G. E., McClelland, J. L., & Rumelhart, D. E. (1986). Distributed representations. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Volume 1: Foundations (pp. 77–109). Cambridge, MA: MIT Press.
Holmes, G., & Horax, G. (1919). Disturbances of spatial orientation and visual attention, with loss of stereoscopic vision. Archives of Neurology and Psychiatry, 1, 385–407.
Horton, J. C., & Hoyt, W. F. (1991). The representation of the visual field in human striate cortex: A revision of the classic Holmes map. Archives of Ophthalmology, 109, 816–824.
Hubbard, E. M., Piazza, M., Pinel, P., & Dehaene, S. (2005). Interactions between number and space in parietal cortex. Nature Reviews Neuroscience, 6, 435–448.
Hummel, J. E., & Biederman, I. (1992). Dynamic binding in a neural network for shape recognition. Psychological Review, 99, 480–517.
Humphreys, G. W., & Riddoch, M. J. (1994). Attention to within-object and between-object spatial representation: Multiple sites for visual selection. Cognitive Neuropsychology, 11, 207–241.
Ingle, D. J. (1967). Two visual mechanisms underlying the behaviour of fish. Psychologische Forschung, 31, 44–51.
Ings, S. (2007). A natural history of seeing. New York: Norton & Company.
Jackendoff, R., & Landau, B. (1992). Spatial language and spatial cognition. In R. Jackendoff (Ed.), Languages of the mind: Essays on mental representation (pp. 99–124). Cambridge, MA: MIT Press.

Jacobs, R. A. (1997). Nature, nurture, and the development of functional specializations: A computational approach. Psychonomic Bulletin & Review, 4, 299–309.
Jacobs, R. A. (1999). Computational studies of the development of functionally specialized modules. Trends in Cognitive Sciences, 3, 31–38.
Jacobs, R. A., & Kosslyn, S. M. (1994). Encoding shape and spatial relations: The role of receptive field size in coordinating complementary representations. Cognitive Science, 18, 361–386.
James, T. W., Culham, J. C., Humphrey, G. K., Milner, A. D., & Goodale, M. A. (2003). Ventral occipital lesions impair object recognition but not object-directed grasping: An fMRI study. Brain, 126, 2463–2475.
Jeannerod, M., Decety, J., & Michel, F. (1994). Impairment of grasping movements following a bilateral posterior parietal lesion. Neuropsychologia, 32, 369–380.
Jeannerod, M., & Jacob, P. (2005). Visual cognition: A new look at the two-visual systems model. Neuropsychologia, 43, 301–312.
Johnsen, S., & Lohmann, K. J. (2005). The physics and neurobiology of magnetoreception. Nature Reviews Neuroscience, 6, 703–712.
Johnson, H., & Haggard, P. (2005). Motor awareness without perceptual awareness. Neuropsychologia, 43, 227–237.
Johnson, A., & Redish, A. D. (2007). Neural ensembles in CA3 transiently encode paths forward of the animal at a decision point. Journal of Neuroscience, 27, 12176–12189.
Johnson-Laird, P. N. (2005). Mental models in thought. In K. Holyoak & R. J. Sternberg (Eds.), The Cambridge handbook of thinking and reasoning (pp. 179–212). Cambridge, UK: Cambridge University Press.
Jordan, K., Wustenberg, T., Heinze, H. J., Peters, M., & Jancke, L. (2002). Women and men exhibit different cortical activation patterns during mental rotation tasks. Neuropsychologia, 40, 2397–2408.
Just, M. A., Carpenter, P. A., Maguire, M., Diwadkar, V., & McMains, S. (2001). Mental rotation of objects retrieved from memory: A functional MRI study of spatial processing. Journal of Experimental Psychology: General, 130, 493–504.
Kaas, J. H. (1997). Topographic maps are fundamental to sensory processing. Brain Research Bulletin, 44, 107–112.
Kahane, P., Hoffman, D., Minotti, L., & Berthoz, A. (2003). Reappraisal of the human vestibular cortex by cortical electrical stimulation study. Annals of Neurology, 54, 615–624.

Kahneman, D., Treisman, A., & Gibbs, B. (1992). The reviewing of object files: Object-specific integration of information. Cognitive Psychology, 24, 175–219.
Kant, I. (1781). Kritik der reinen Vernunft (translation: Critique of Pure Reason, 2008, Penguin Classics).
Karnath, H. O. (2001). New insights into the functions of the superior temporal cortex. Nature Reviews Neuroscience, 2, 568–576.
Karnath, H. O., Ferber, S., & Himmelbach, M. (2001). Spatial awareness is a function of the temporal not the posterior parietal lobe. Nature, 411, 950–953.
Khan, A. Z., Pisella, L., Vighetto, A., Cotton, F., Luauté, J., Boisson, D., Salemme, R., Crawford, J. D., & Rossetti, Y. (2005). Optic ataxia errors depend on remapped, not viewed, target location. Nature Neuroscience, 8, 418–420.
Kastner, S., Demner, I., & Ziemann, U. (1998). Transient visual field defects induced by transcranial magnetic stimulation. Experimental Brain Research, 118, 19–26.
Kastner, S., DeSimone, K., Konen, C. S., Szczepanski, S. M., Weiner, K. S., & Schneider, K. A. (2007). Topographic maps in human frontal cortex revealed in memory-guided saccade and spatial working-memory tasks. Journal of Neurophysiology, 97, 3494–3507.

Kastner, S., Pinsk, M. A., De Weerd, P., Desimone, R., & Ungerleider, L. G. (1999). Increased activity in human visual cortex during directed attention in the absence of visual stimulation. Neuron, 22, 751–761.
Kemmerer, D. (2006). The semantics of space: Integrating linguistic typology and cognitive neuroscience. Neuropsychologia, 44, 1607–1621.
Kemmerer, D., & Tranel, D. (2000). A double dissociation between linguistic and perceptual representations of spatial relationships. Cognitive Neuropsychology, 17, 393–414.
Kemmerer, D., & Tranel, D. (2003). A double dissociation between the meanings of action verbs and locative prepositions. Neurocase, 9, 421–435.
Kessels, R. P. C., de Haan, E. H. F., Kappelle, L. J., & Postma, A. (2001). Varieties of human spatial memory: A meta-analysis on the effects of hippocampal lesions. Brain Research Reviews, 35, 295–303.
Kessels, R. P. C., Kappelle, L. J., de Haan, E. H. F., & Postma, A. (2002). Lateralization of spatial-memory processes: evidence on spatial span, maze learning, and memory for object locations. Neuropsychologia, 40, 1465–1473.
Kirchner, W. H., & Braun, U. (1994). Dancing honey bees indicate the location of food sources using path integration rather than cognitive maps. Animal Behaviour, 48, 1437–1441.

Kinsbourne, M. (1993). Orientational bias model of unilateral neglect: Evidence from attentional gradients within hemispace. In I. H. Robertson & J. C. Marshall (Eds.), Unilateral neglect: Clinical and experimental studies (pp. 63–86). Hillsdale, NJ: Erlbaum.
Kinsbourne, M., & Warrington, E. K. (1962). A disorder of simultaneous form perception. Brain, 85, 461–486.
Kitada, R., Kito, T., Saito, D. N., Kochiyama, T., Matsamura, M., Sadato, N., & Lederman, S. J. (2006). Multisensory activation of the intraparietal area when classifying grating orientation: A functional magnetic resonance imaging study. Journal of Neuroscience, 26, 7491–7501.
Kjelstrup, K. B., Solstad, T., Brun, V. H., Hafting, T., Leutgeb, S., Witter, M. P., Moser, E. I., & Moser, M.-B. (2008). Finite scales of spatial representation in the hippocampus. Science, 321, 140–143.
Koch, C. (2004). The quest for consciousness: A neurobiological approach. Englewood, CO: Roberts and Company.
Kohonen, T. (2001). Self-organizing maps. Berlin: Springer.
Kosslyn, S. M. (1987). Seeing and imagining in the cerebral hemispheres: A computational approach. Psychological Review, 94 (2), 148–175.
Kosslyn, S. M. (1980). Image and mind. Cambridge, MA: Harvard University Press.
Kosslyn, S. M. (1994). Image and brain: The resolution of the imagery debate. Cambridge, MA: MIT Press.
Kosslyn, S. M., Chabris, C. F., Marsolek, C. J., & Koenig, O. (1992). Categorical versus coordinate spatial relations: Computational analyses and computer simulations. Journal of Experimental Psychology: Human Perception and Performance, 18 (2), 562–577.
Kosslyn, S. M., DiGirolamo, G. J., Thompson, W. L., & Alpert, N. M. (1998). Mental rotation of objects versus hands: neural mechanisms revealed by positron emission tomography. Psychophysiology, 35, 151–161.
Kosslyn, S. M., & Jacobs, R. A. (1994). Encoding shape and spatial relations: A simple mechanism for coordinating complementary representations. In V. Honavar & L. M. Uhr (Eds.), Artificial intelligence and neural networks: Steps toward principled integration (pp. 373–385). Boston: Academic Press.
Kosslyn, S. M., & Koenig, O. (1992). Wet mind: The new cognitive neuroscience. New York: Free Press.
Kosslyn, S. M., Koenig, O., Barrett, A., Cave, C. B., Tang, J., & Gabrieli, J. D. E. (1989). Evidence for two types of spatial representations: Hemispheric specialization for categorical and coordinate relations. Journal of Experimental Psychology: Human Perception and Performance, 15 (4), 723–735.
Kosslyn, S. M., Maljkovic, V., Hamilton, S. E., Horwitz, G., & Thompson, W. L. (1995). Two types of image generation: Evidence for left and right hemisphere processes. Neuropsychologia, 33 (11), 1485–1510.
Kosslyn, S. M., Pick, H. L., & Fariello, G. R. (1974). Cognitive maps in children and men. Child Development, 45, 707–716.
Kosslyn, S. M., Thompson, W. L., & Ganis, G. (2006). The case for mental imagery. New York: Oxford University Press.
Kosslyn, S. M., Thompson, W. L., Gitelman, D. R., & Alpert, N. M. (1998). Neural systems that encode categorical versus coordinate spatial relations: PET investigations. Psychobiology, 26 (4), 333–347.
Króliczak, G., Heard, P., Goodale, M. A., & Gregory, R. L. (2006). Dissociation of perception and action unmasked by the hollow-face illusion. Brain Research, 1080, 9–16.
Kuipers, B. (1978). Modeling spatial knowledge. Cognitive Science, 2, 129–153.
Lacquaniti, F., Guigon, E., Bianchi, L., Ferraina, S., & Caminiti, R. (1995). Representing spatial information for limb movement: The role of area 5 in the monkey. Cerebral Cortex, 5, 391–409.
Laeng, B. (1994). Lateralization of categorical and coordinate spatial functions: A study of unilateral stroke patients. Journal of Cognitive Neuroscience, 6 (3), 189–203.
Laeng, B. (2006). Constructional apraxia after left or right unilateral stroke. Neuropsychologia, 44, 1519–1523.
Laeng, B., Brennen, T., Johannessen, K., Holmen, K., & Elvestad, R. (2002). Multiple reference frames in neglect? An investigation of the object-centred frame and the dissociation between “near” and “far” from the body by use of a mirror. Cortex, 38, 511–528.
Laeng, B., Carlesimo, G. A., Caltagirone, C., Capasso, R., & Miceli, G. (2000). Rigid and non-rigid objects in canonical and non-canonical views: Effects of unilateral stroke on object identification. Cognitive Neuropsychology, 19, 697–720.
Laeng, B., Chabris, C. F., & Kosslyn, S. M. (2003). Asymmetries in encoding spatial relations. In K. Hugdahl and R. Davidson (Eds.), The asymmetrical brain (pp. 303–339). Cambridge, MA: MIT Press.
Laeng, B., Okubo, M., Saneyoshi, A., & Michimata, C. (2011). Processing spatial relations with different apertures of attention. Cognitive Science, 35, 297–329.
Laeng, B., & Peters, M. (1995). Cerebral lateralization for the processing of spatial coordinates and categories in left- and right-handers. Neuropsychologia, 33, 421–439.

Laeng, B., Peters, M., & McCabe, B. (1997). Memory for locations within regions: Spatial biases and visual hemifield differences. Memory & Cognition, 26, 97–107.
Laeng, B., Shah, J., & Kosslyn, S. M. (1999). Identifying objects in conventional and contorted poses: Contributions of hemisphere-specific mechanisms. Cognition, 70 (1), 53–85.
Laeng, B., & Teodorescu, D.-S. (2002). Eye scanpaths during visual imagery reenact those of perception of the same visual scene. Cognitive Science, 26, 207–231.

Lakoff, G., & Johnson, M. (1999). Philosophy in the flesh. Chicago: University of Chicago Press.
Lamb, T. D., Collin, S. P., & Pugh, E. N. (2007). Evolution of the vertebrate eye: Opsins, photoreceptors, retina and eye cup. Nature Reviews Neuroscience, 8, 960–975.
Lamme, V. A. F. (2003). Why visual attention and awareness are different. Trends in Cognitive Sciences, 7, 12–18.
Lamme, V. A. F. (2006). Towards a true neural stance on consciousness. Trends in Cognitive Sciences, 10, 494–501.
Lamme, V. A. F., & Roelfsema, P. R. (2000). The distinct modes of vision offered by feedforward and recurrent processing. Trends in Neurosciences, 23, 571–579.
Lamme, V. A. F., Super, H., Landman, R., Roelfsema, P. R., & Spekreijse, H. (2000). The role of primary visual cortex (V1) in visual awareness. Vision Research, 40, 1507–1521.
Landau, B., Hoffman, J. E., & Kurz, N. (2006). Object recognition with severe spatial deficits in Williams syndrome: Sparing and breakdown. Cognition, 100, 483–510.
LeDoux, J. E., & Gazzaniga, M. S. (1978). The integrated mind. New York: Plenum.
LeDoux, J. E., Wilson, D. H., & Gazzaniga, M. S. (1977). Manipulospatial aspects of cerebral lateralization: Clues to origin of lateralization. Neuropsychologia, 15, 743–750.
Lenck-Santini, P.-P., Save, E., & Poucet, B. (2001). Evidence for a relationship between place-cell spatial firing and spatial memory performance. Hippocampus, 11, 337–390.
Liben, L. S. (2009). The road to understanding maps. Current Directions in Psychological Science, 18, 310–315.
Livingstone, M. S., & Hubel, D. H. (1988). Segregation of form, color, movement, and depth: Anatomy, physiology, and perception. Science, 240, 740–749.
Lomber, S. G., & Malhotra, S. (2008). Double dissociation of “what” and “where” processing in auditory cortex. Nature Neuroscience, 11, 609–616.

Ludvig, N., Tang, H. M., Gohil, B. C., & Botero, J. M. (2004). Detecting location-specific neuronal firing rate increases in the hippocampus of freely moving monkeys. Brain Research, 1014, 97–109.
Luna, B., Thulborn, K. R., Strojwas, M. H., McCurtain, B. J., Berman, R. A., Genovese, C. R., et al. (1998). Dorsal cortical regions subserving visually guided saccades in humans: an fMRI study. Cerebral Cortex, 8 (1), 40–47.
Luria, A. R. (1963). Restoration of function after brain injury. Pergamon Press.
Luria, A. R. (1973). The working brain: An introduction to neuropsychology. New York: Basic Books.
Maguire, E. A., Burgess, N., & O’Keefe, J. (1999). Human spatial navigation: cognitive maps, sexual dimorphism. Current Opinion in Neurobiology, 9, 171–177.
Maguire, E. A., Burke, T., Phillips, J., & Staunton, H. (1996). Topographical disorientation following unilateral temporal lobe lesions in humans. Neuropsychologia, 34, 993–1001.
Maguire, E. A., Frackowiak, R. S., & Frith, C. D. (1997). Recalling routes around London: Activation of the right hippocampus in taxi drivers. Journal of Neuroscience, 17, 7103–7110.
Maguire, E. A., Frith, C. D., Burgess, N., Donnett, J. G., & O’Keefe, J. (1998). Knowing where things are: Parahippocampal involvement in encoding object relations in virtual large-scale space. Journal of Cognitive Neuroscience, 10, 61–76.
Mahon, B. Z., Milleville, S. C., Negri, G. A. L., Rumiati, R. I., Caramazza, A., & Martin, A. (2007). Action-related properties shape object representations in the ventral stream. Neuron, 55, 507–520.
Malach, R., Levy, I., & Hasson, U. (2002). The topography of high-order human object areas. Trends in Cognitive Sciences, 6, 176–184.
Malkova, L., & Mishkin, M. (2003). One-trial memory for object-place associations after separate lesions of hippocampus and posterior parahippocampal region in the monkey. Journal of Neuroscience, 23 (5), 1956–1965.
Markman, A. B. (1999). Knowledge representation. Mahwah, NJ: Psychology Press.
Marr, D. (1982). Vision. San Francisco: Freeman and Company.
Maunsell, J. H. R., & Newsome, W. T. (1987). Visual processing in the monkey extrastriate cortex. Annual Review of Neuroscience, 10, 363–401.
Medendorp, W. P., Goltz, H. C., Vilis, T., & Crawford, J. D. (2003). Gaze-centered updating of visual space in human parietal cortex. Journal of Neuroscience, 23, 6209–6214.

Mennemeier, M., Wertman, E., & Heilman, K. M. (1992). Neglect of near peripersonal space. Brain, 115, 37–50.
Menzel, R., Brandt, R., Gumbert, A., Komischke, B., & Kunze, J. (2000). Two spatial memories for honeybee navigation. Proceedings of the Royal Society of London, Biological Sciences, 267, 961–968.
Miller, G. A., & Johnson-Laird, P. N. (1976). Language and perception. Cambridge, MA: Harvard University Press.
Milner, A. D., & Goodale, M. A. (1995). The visual brain in action. Oxford: Oxford University Press.
Milner, A. D., & Goodale, M. A. (2006). The visual brain in action. New York: Oxford University Press.
Milner, A. D., & Goodale, M. A. (2008). Two visual systems re-viewed. Neuropsychologia, 46, 774–785.
Milner, A. D., Perrett, D. I., Johnston, R. S., Benson, P. J., Jordan, T. R., Heeley, D. W., et al. (1991). Perception and action in “visual form agnosia.” Brain, 114, 405–428.
Mishkin, M., Ungerleider, L. G., & Macko, K. A. (1983). Object vision and spatial vision: Two cortical pathways. Trends in Neurosciences, 6, 414–417.
Morel, A., & Bullier, J. (1990). Anatomical segregation of two cortical visual pathways in the macaque monkey. Visual Neuroscience, 4, 555–578.
Moser, E. I., Kropff, E., & Moser, M.-B. (2008). Place cells, grid cells and the brain’s spatial representation system. Annual Review of Neuroscience, 31, 69–89.
Motter, B. C., & Mountcastle, V. B. (1981). The functional properties of light-sensitive neurons of the posterior parietal cortex studied in waking monkeys: Foveal sparing and opponent vector organization. Journal of Neuroscience, 1, 3–26.
Mountcastle, V. B. (1995). The parietal system and some higher brain functions. Cerebral Cortex, 5, 377–390.
Naganuma, T., Nose, I., Inoue, K., Takemoto, A., Katsuyama, N., & Taira, M. (2005). Information processing of geometrical features of a surface based on binocular disparity cues: An fMRI study. Neuroscience Research, 51, 147–155.
Neggers, S. F. W., Van der Lubbe, R. H. J., Ramsey, N. F., & Postma, A. (2006). Interactions between ego- and allocentric neuronal representations of space. NeuroImage, 31, 320–331.
Newcombe, F., & Russell, W. R. (1969). Dissociated visual perceptual and spatial deficits in focal lesions of the right hemisphere. Journal of Neurology, Neurosurgery, and Psychiatry, 32, 73–81.

Nguyen, B. T., Tran, T. D., Hoshiyama, M., Inui, K., & Kakigi, R. (2004). Face representation in the human primary somatosensory cortex. Neuroscience Research, 50, 227–232.

Nitz, D. A. (2006). Tracking route progression in the posterior parietal cortex. Neuron, 49, 747–756.
O’Keefe, J. (1976). Place units in the hippocampus of the freely moving rat. Experimental Neurology, 51, 78–109.
O’Keefe, J. (1996). The spatial prepositions in English, vector grammar, and the cognitive map theory. In P. Bloom, M. A. Peterson, L. Nadel, & M. F. Garrett (Eds.), Language and space (pp. 277–316). Cambridge, MA: The MIT Press.
O’Keefe, J. (2003). Vector grammar, places, and the functional role of spatial prepositions in English. In E. van der Zee & J. Slack (Eds.), Representing direction in language and space (pp. 69–85). Oxford, UK: Oxford University Press.
O’Keefe, J., Burgess, N., Donnett, J. G., Jeffery, J. K., & Maguire, E. A. (1998). Place cells, navigational accuracy, and the human hippocampus. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 353, 1333–1340.
O’Keefe, J., & Nadel, L. (1978). The hippocampus as a cognitive map. Oxford, UK: Clarendon Press.
Okubo, M., Laeng, B., Saneyoshi, A., & Michimata, C. (2010). Exogenous attention differentially modulates the processing of categorical and coordinate spatial relations. Acta Psychologica, 135, 1–11.
Olson, C. R. (2001). Object-based vision and attention in primates. Current Opinion in Neurobiology, 11, 171–179.
Olson, C. R. (2003). Brain representations of object-centered space in monkeys and humans. Annual Review of Neuroscience, 26, 331–354.
Olson, C. R., & Gettner, S. N. (1995). Object-centered direction selectivity in the macaque supplementary eye field. Science, 269, 985–988.
Optican, L. M. (2005). Sensorimotor transformation for visually guided saccades. Annals of the New York Academy of Sciences, 1039, 132–148.
O’Regan, J. K., & Noë, A. (2001). A sensorimotor account of vision and visual consciousness. Behavioral and Brain Sciences, 24, 939–1011.
O’Reilly, R. C., Kosslyn, S. M., Marsolek, C. J., & Chabris, C. F. (1990). Receptive field characteristics that allow parietal lobe neurons to encode spatial properties of visual input: A computational analysis. Journal of Cognitive Neuroscience, 2, 141–155.

Otto, I., Grandguillaume, P., Boutkhil, L., & Guigon, E. (1992). Direct and indirect cooperation between temporal and parietal networks for invariant visual recognition. Journal of Cognitive Neuroscience, 4, 35–57.
Paillard, J. (1991). Motor and representational framing of space. In J. Paillard (Ed.), Brain and space (pp. 163–182). Oxford, UK: Oxford University Press.
Palermo, L., Bureca, I., Matano, A., & Guariglia, C. (2008). Hemispheric contribution to categorical and coordinate representational processes: A study on brain-damaged patients. Neuropsychologia, 46, 2802–2807.
Parsons, L. M. (2003). Superior parietal cortices and varieties of mental rotation. Trends in Cognitive Sciences, 7, 515–517.
Perenin, M.-T., & Vighetto, A. (1988). Optic ataxia: A specific disruption in visuomotor mechanisms. Brain, 111, 643–674.
Piaget, J., & Inhelder, B. (1956). The child’s conception of space. London: Routledge & Kegan Paul.
Perenin, M. T., & Jeannerod, M. (1978). Visual function within the hemianopic field following early cerebral hemidecortication in man. I. Spatial localization. Neuropsychologia, 16, 1–13.
Pinker, S. (1990). A theory of graph comprehension. In R. Freedle (Ed.), Artificial intelligence and the future of testing (pp. 73–126). Hillsdale, NJ: Erlbaum.
Pinker, S. (2007). The stuff of thought: Language as a window into human nature. New York: Penguin Books.
Poucet, B. (1993). Spatial cognitive maps in animals: New hypotheses on their structure and neural mechanisms. Psychological Review, 100, 163–182.
Pouget, A., & Sejnowski, T. J. (1997). A new view of hemineglect based on the response properties of parietal neurones. Philosophical Transactions of the Royal Society, Series B, Biological Sciences, 352, 1449–1459.
Pouget, A., & Snyder, L. H. (2000). Computational approaches to sensorimotor transformations. Nature Neuroscience, 3, 1192–1198.
Quinlan, D. J., & Culham, J. C. (2007). fMRI reveals a preference for near viewing in the human parietal-occipital cortex. NeuroImage, 36, 167–187.
Rao, S. C., Rainer, G., & Miller, E. K. (1997). Integration of what and where in the primate prefrontal cortex. Science, 276, 821–824.
Reed, C. L., Klatzky, R. L., & Halgren, E. (2005). What vs. where in touch: An fMRI study. NeuroImage, 25, 718–726.

Revonsuo, A., & Newman, J. (1999). Binding and consciousness. Consciousness and Cognition, 8, 123–127.
Ritz, T. (2009). Magnetic sense in animal navigation. In L. Squire (Ed.), Encyclopedia of neuroscience (pp. 251–257). Elsevier: Network Version.
Rizzolatti, G., & Craighero, L. (2004). The mirror-neuron system. Annual Review of Neuroscience, 27, 169–192.
Rizzolatti, G., & Matelli, M. (2003). Two different streams form the dorsal visual system: Anatomy and functions. Experimental Brain Research, 153, 146–157.
Robertson, L. C. (2003). Binding, spatial attention and perceptual awareness. Nature Reviews Neuroscience, 4, 93–102.
Robertson, L. C., Treisman, A., Friedman-Hill, S., & Grabowecky, M. (1997). The interaction of spatial and object pathways: Evidence from Bálint’s syndrome. Journal of Cognitive Neuroscience, 9, 295–317.
Rogers, J. L., & Kesner, R. P. (2006). Lesions of the dorsal hippocampus or parietal cortex differentially affect spatial information processing. Behavioral Neuroscience, 120, 852–860.
Rogers, L. J., Zucca, P., & Vallortigara, G. (2004). Advantage of having a lateralized brain. Proceedings of the Royal Society of London B (Suppl.): Biology Letters, 271, 420–422.
Rolls, E. T., Miyashita, Y., Cahusac, P. M. B., Kesner, R. P., Niki, H., Feigenbaum, J., et al. (1989). Hippocampal neurons in the monkey with activity related to the place in which a stimulus is shown. Journal of Neuroscience, 9, 1835–1845.
Romanski, L. M., Tian, B., Fritz, J., Mishkin, M., Goldman-Rakic, P. S., & Rauschecker, J. P. (1999). Dual streams of auditory afferents target multiple domains in the primate prefrontal cortex. Nature Neuroscience, 2, 1131–1136.
Roth, E. C., & Hellige, J. B. (1998). Spatial processing and hemispheric asymmetry: Contributions of the transient/magnocellular visual system. Journal of Cognitive Neuroscience, 10, 472–484.
Rueckl, J. G., Cave, K. R., & Kosslyn, S. M. (1989). Why are ‘what’ and ‘where’ processed by separate visual systems? A computational investigation. Journal of Cognitive Neuroscience, 1 (2), 171–186.
Rybash, J. M., & Hoyer, W. J. (1992). Hemispheric specialization for categorical and coordinate spatial representations: A reappraisal. Memory & Cognition, 20 (3), 271–276.

Sakata, H., Taira, M., Kusunoki, M., Murata, A., & Tanaka, Y. (1997). The parietal association cortex in depth perception and visual control of hand action. Trends in Neurosciences, 20, 350–357.

Save, E., & Poucet, B. (2000). Hippocampal-parietal cortical interactions in spatial cognition. Hippocampus, 10, 491–499.
Schindler, I., Rice, N. J., McIntosh, R. D., Rossetti, Y., Vighetto, A., & Milner, A. D. (2004). Automatic avoidance of obstacles is a dorsal stream function: Evidence from optic ataxia. Nature Neuroscience, 7, 779–784.
Schneider, G. E. (1967). Contrasting visuomotor functions of tectum and cortex in the golden hamster. Psychologische Forschung, 30, 52–62.
Seltzer, B., & Pandya, D. N. (1978). Afferent cortical connections and architectonics of the superior temporal sulcus and surrounding cortex in the rhesus monkey. Brain Research, 149, 1–24.
Semmes, J., Weinstein, S., Ghent, L., & Teuber, H. L. (1955). Spatial orientation in man after cerebral injury: I. Analyses by locus of lesion. Journal of Psychology, 39, 227–244.
Sereno, A. B., & Maunsell, J. H. R. (1998). Shape selectivities in primate lateral intraparietal cortex. Nature, 395, 500–503.
Sereno, M. I., Dale, A. M., Reppas, J. B., Kwong, K. K., Belliveau, J. W., et al. (1995). Borders of multiple visual areas in humans revealed by functional magnetic resonance imaging. Science, 268, 889–893.
Sereno, M. I., & Huang, R.-S. (2006). A human parietal face area contains head-centered visual and tactile maps. Nature Neuroscience, 9, 1337–1343.
Sereno, M. I., Pitzalis, S., & Martinez, A. (2001). Mapping of contralateral space in retinotopic coordinates by a parietal cortical area in humans. Science, 294, 1350–1354.
Servos, P., Engel, S. A., Gati, J., & Menon, R. (1999). fMRI evidence for an inverted face representation in human somatosensory cortex. Neuroreport, 10, 1393–1395.
Seth, A. K., McKinstry, J. L., Edelman, G. M., & Krichmar, J. L. (2004). Visual binding through reentrant connectivity and dynamic synchronization in a brain-based device. Cerebral Cortex, 14, 1185–1199.
Shelton, P. A., Bowers, D., & Heilman, K. M. (1990). Peripersonal and vertical neglect. Brain, 113, 191–205.
Shelton, A. L., & McNamara, T. P. (2001). Systems of spatial reference in human memory. Cognitive Psychology, 43, 274–310.
Shrager, Y., Kirwan, C. B., & Squire, L. R. (2008). Neural basis of the cognitive map: Path integration does not require hippocampus or entorhinal cortex. Proceedings of the National Academy of Sciences U S A, 105, 12034–12038.
Silver, M., & Kastner, S. (2009). Topographic maps in human frontal and parietal cortex. Trends in Cognitive Sciences, 11, 488–495.

Slobin, D. I. (1996). From “thought and language” to “thinking for speaking.” In J. J. Gumperz & S. C. Levinson (Eds.), Rethinking linguistic relativity (pp. 70–96). Cambridge, UK: Cambridge University Press.
Slotnick, S. D., & Moo, L. R. (2006). Prefrontal cortex hemispheric specialization for categorical and coordinate visual spatial memory. Neuropsychologia, 44, 1560–1568.
Slotnick, S. D., Moo, L., Tesoro, M. A., & Hart, J. (2001). Hemispheric asymmetry in categorical versus coordinate visuospatial processing revealed by temporary cortical deactivation. Journal of Cognitive Neuroscience, 13, 1088–1096.
Smallman, H. S., MacLeod, D. I. A., He, S., & Kentridge, R. W. (1996). Fine grain of the neural representation of human spatial vision. Journal of Neuroscience, 16 (5), 1852–1859.
Smith, E. E., Jonides, J., Koeppe, R. A., Awh, E., Schumacher, E., & Minoshima, S. (1995). Spatial vs. object working memory: PET investigations. Journal of Cognitive Neuroscience, 7, 337–358.
Snyder, L. H., Batista, A. P., & Andersen, R. A. (1997). Coding of intention in the posterior parietal cortex. Nature, 386, 167–170.
Solstad, T., Boccara, C. N., Kropff, E., Moser, M.-B., & Moser, E. I. (2008). Representation of geometric borders in the entorhinal cortex. Science, 322, 1865–1868.
Sovrano, V. A., Dadda, M., & Bisazza, A. (2005). Lateralized fish perform better than nonlateralized fish in spatial reorientation tasks. Behavioural Brain Research, 163, 122–127.
Stark, M., Coslett, H. B., & Saffran, E. M. (1996). Impairment of an egocentric map of locations: Implications for perception and action. Cognitive Neuropsychology, 13, 481–523.
Sutherland, R. J., & Rudy, J. W. (1988). Configural association theory: The role of the hippocampal formation in learning, memory and amnesia. Psychobiology, 16, 157–163.
Tanaka, K., Saito, H., Fukada, Y., & Moriya, M. (1991). Coding visual images of objects in the inferotemporal cortex of the macaque monkey. Journal of Neurophysiology, 66, 170–189.
Taira, M., Mine, S., Georgopoulos, A. P., Murata, A., & Sakata, H. (1990). Parietal cortex neurons of the monkey related to the visual guidance of hand movement. Experimental Brain Research, 83, 29–36.
Takahashi, N., Kawamura, M., Shiota, J., Kasahata, N., & Hirayama, K. (1997). Pure topographic disorientation due to right retrosplenial lesion. Neurology, 49, 464–469.
Talmy, L. (2000). Toward a cognitive semantics. Cambridge, MA: MIT Press.

Thiebaut de Schotten, M., Urbanski, M., Duffau, H., Volle, E., Levy, R., Dubois, B., & Bartolomeo, P. (2005). Direct evidence for a parietal-frontal pathway subserving spatial awareness in humans. Science, 309, 2226–2228.
Thivierge, J.-P., & Marcus, G. (2007). The topographic brain: From neural connectivity to cognition. Trends in Neurosciences, 30, 251–259.
Tipper, S. P., & Behrmann, M. (1996). Object-centered not scene based visual neglect. Journal of Experimental Psychology: Human Perception and Performance, 22, 1261–1278.
Tolman, E. C. (1948). Cognitive maps in rats and men. Psychological Review, 55, 189–208.
Tootell, R. B., Hadjikhani, N., Hall, E. K., Marrett, S., Vanduffel, W., Vaughan, J. T., & Dale, A. M. (1998). The retinotopy of visual spatial attention. Neuron, 21, 1409–1422.
Tootell, R. B., Hadjikhani, N., Vanduffel, W., Liu, A. K., Mendola, J. D., Sereno, M. I., & Dale, A. M. (1998). Functional analysis of primary visual cortex (V1) in humans. Proceedings of the National Academy of Sciences U S A, 95, 811–817.
Tootell, R. B., Mendola, J. D., Hadjikhani, N., Liu, A. K., & Dale, A. M. (1998). The representation of the ipsilateral visual field in human cerebral cortex. Proceedings of the National Academy of Sciences U S A, 95, 818–824.
Tootell, R. B., Silverman, M. S., Switkes, E., & DeValois, R. L. (1982). Deoxyglucose analysis of retinotopic organization in primate striate cortex. Science, 218, 902–904.
Tranel, D., & Kemmerer, D. (2004). Neuroanatomical correlates of locative prepositions. Cognitive Neuropsychology, 21, 719–749.
Treisman, A. (1996). The binding problem. Current Opinion in Neurobiology, 6, 171–178.
Treisman, A. (1988). The perception of features and objects. In R. D. Wright (Ed.), Visual attention (pp. 26–54). New York: Oxford University Press.

Trojano, L., Conson, M., Maffei, R., & Grossi, D. (2006). Categorical and coordinate spatial processing in the imagery domain investigated by rTMS. Neuropsychologia, 44, 1569–1574.
Trojano, L., Grossi, D., Linden, D. E. J., Formisano, E., Goebel, R., & Cirillo, S. (2002). Coordinate and categorical judgements in spatial imagery: An fMRI study. Neuropsychologia, 40, 1666–1674.
Tsal, Y., & Bareket, T. (2005). Localization judgments under various levels of attention. Psychonomic Bulletin & Review, 12 (3), 559–566.


Tsal, Y., Meiran, N., & Lamy, D. (1995). Toward a resolution theory of visual attention. Visual Cognition, 2, 313–330.
Tsao, D. Y., Vanduffel, W., Sasaki, Y., Fize, D., Knutsen, T. A., Mandeville, J. B., Wald, L. L., Dale, A. M., Rosen, B. R., Van Essen, D. C., Livingstone, M. S., Orban, G. A., & Tootell, R. B. H. (2003). Stereopsis activates V3A and caudal intraparietal areas in macaques and humans. Neuron, 31, 555–568.
Tsutsui, K.-I., Sakata, H., Naganuma, T., & Taira, M. (2002). Neural correlates for perception of 3D surface orientation from texture gradients. Science, 298, 409–412.
Turnbull, O. H., Beschin, N., & Della Sala, S. (1997). Agnosia for object orientation: Implications for theories of object recognition. Neuropsychologia, 35, 153–163.
Turnbull, O. H., Laws, K. R., & McCarthy, R. A. (1995). Object recognition without knowledge of object orientation. Cortex, 31, 387–395.
Ullman, S. (1984). Visual routines. Cognition, 18, 97–159.
Ungerleider, L. G., & Haxby, J. V. (1994). “What” and “where” in the human brain. Current Opinion in Neurobiology, 4, 157–165.
Ungerleider, L. G., & Mishkin, M. (1982). Two cortical visual systems. In D. J. Ingle, M. A. Goodale, & R. J. Mansfield (Eds.), Analysis of visual behavior (pp. 549–586). Cambridge, MA: MIT Press.
Vallar, G., Bottini, G., & Paulesu, E. (2003). Neglect syndromes: the role of the parietal cortex. Advances in Neurology, 93, 293–319.
Vallortigara, G., Cozzutti, C., Tommasi, L., & Rogers, L. J. (2001). How birds use their eyes: Opposite left-right specialisation for the lateral and frontal visual hemifield in the domestic chick. Current Biology, 11, 29–33.
Vallortigara, G., & Rogers, L. J. (2005). Survival with an asymmetrical brain: advantages and disadvantages of cerebral lateralization. Behavioral and Brain Sciences, 28, 575–589.
Valyear, K. F., Culham, J. C., Sharif, N., Westwood, D., & Goodale, M. A. (2005). A double dissociation between sensitivity to changes in object identity and object orientation in the ventral and dorsal visual streams: A human fMRI study. Neuropsychologia, 44, 218–228.
Vauclair, J., Yamazaki, Y., & Güntürkün, O. (2006). The study of hemispheric specialization for categorical and coordinate spatial relations in animals. Neuropsychologia, 44, 1524–1534.
Wandell, B. A. (1999). Computational neuroimaging of human visual cortex. Annual Review of Neuroscience, 22, 145–173.
Wandell, B. A., Brewer, A. A., & Dougherty, R. F. (2005). Visual field map clusters in human cortex. Philosophical Transactions of the Royal Society, B, 360, 693–707.

Wandell, B. A., Dumoulin, S. O., & Brewer, A. A. (2009). Visual cortex in humans. In L. Squire (Ed.), Encyclopedia of neuroscience (pp. 251–257). Elsevier: Network Version.
Wang, B., Zhou, T. G., Zhuo, Y., & Chen, L. (2007). Global topological dominance in the left hemisphere. Proceedings of the National Academy of Sciences U S A, 104, 21014–21019.
Warrington, E. K. (1982). Neuropsychological Studies of Object Recognition. Philosophical Transactions of the Royal Society of London. Series B, Biological, 298, 15–33.
Warrington, E. K., & James, A. M. (1986). Visual object recognition in patients with right-hemisphere lesions: Axes or features. Perception, 15, 355–366.
Warrington, E. K., & James, A. M. (1988). Visual apperceptive agnosia: A clinico-anatomical study of three cases. Cortex, 24, 13–32.
Warrington, E. K., & Taylor, A. M. (1973). The contribution of the right parietal lobe to object recognition. Cortex, 9, 152–164.
Waszak, F., Drewing, K., & Mausfeld, R. (2005). Viewer-external frames of reference in the mental transformation of 3-D objects. Perception & Psychophysics, 67, 1269–1279.
Weintraub, S., & Mesulam, M.-M. (1987). Right cerebral dominance in spatial attention: Further evidence based on ipsilateral neglect. Archives of Neurology, 44, 621–625.
Weiskrantz, L. (1986). Blindsight: A Case Study and Implications. Oxford: Oxford University Press.
Weiskrantz, L. (1997). Consciousness lost and found: A neuropsychological exploration. Oxford, UK: Oxford University Press.
Wilson, F. A., Scalaidhe, S. P., & Goldman-Rakic, P. S. (1993). Dissociation of object and spatial processing domains in primate prefrontal cortex. Science, 260, 1955–1958.
Wilson, M. A., & McNaughton, B. L. (1993). Dynamics of the hippocampal ensemble code for space. Science, 261, 1055–1058.
Wilson, M. A., & McNaughton, B. L. (1994). Reactivation of the hippocampal ensemble memories during sleep. Science, 265, 676–679.
Wolbers, T., & Hegarty, M. (2010). What determines our navigational abilities? Trends in Cognitive Sciences, 14 (3), 138–146.
Wolbers, T., Wiener, J. M., Mallot, H. A., & Büchel, C. (2007). Differential recruitment of the hippocampus, medial prefrontal cortex, and the human motion complex during path integration in humans. Journal of Neuroscience, 27, 9408–9416.
Xu, F., & Carey, S. (1996). Infants’ metaphysics: The case of numerical identity. Cognitive Psychology, 30, 111–153.

Yang, T. T., Gallen, C. C., Schwartz, B. J., & Bloom, F. E. (1994). Noninvasive somatosensory homunculus mapping in humans by using a large-array biomagnetometer. Proceedings of the National Academy of Sciences U S A, 90, 3098–3102.
Zacks, J. M., Gilliam, F., & Ojemann, J. G. (2003). Selective disturbance of mental rotation by cortical stimulation. Neuropsychologia, 41, 1659–1667.
Zeki, S. (1969). Representation of central visual fields in prestriate cortex of monkey. Brain Research, 14, 271–291.
Zeki, S. (2001). Localization and globalization in conscious vision. Annual Review of Neuroscience, 24, 57–86.
Zeki, S., Watson, J. D. G., Lueck, C. J., Friston, K. J., Kennard, C., & Frackowiak, R. S. J. (1991). A direct demonstration of functional specialization in human visual cortex. Journal of Neuroscience, 11, 641–649.
Zipser, D., & Andersen, R. A. (1988). A back-propagation programmed network that simulates response properties of a subset of posterior parietal neurons. Nature, 331, 679–684.

Bruno Laeng

Bruno Laeng is professor of cognitive neuropsychology at the University of Oslo.


Top-Down Effects in Visual Perception

Moshe Bar and Andreja Bubic
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn
Print Publication Date: Dec 2013
Subject: Psychology, Cognitive Neuroscience
Online Publication Date: Dec 2013
DOI: 10.1093/oxfordhb/9780199988693.013.0004

Abstract and Keywords

The traditional belief that our perception is determined by sensory input is an illusion. Empirical findings and theoretical ideas from recent years indicate that our experience, memory, expectations, goals, and desires can substantially affect the appearance of the visual information that is in front of us. Indeed, our perception is equally shaped by the incoming bottom-up information that captures the objective appearance of the world surrounding us, as by our previous knowledge and personal characteristics that constitute different sources of top-down perceptual influences. The present chapter focuses on such feedback effects in visual perception, and describes how the interplay between prefrontal and posterior visual cortex underlies the mechanisms through which top-down predictions guide visual processing. One major conclusion is that perception and cognition are more interactive than typically thought, and any artificial boundary between them may be misleading.

Keywords: top-down, feedback, predictions, expectations, perception, visual cortex, prefrontal cortex

Introduction

The importance of top-down effects in visual perception is widely acknowledged by now. There is nothing speculative in recognizing the fact that our perception is influenced by our previous experiences, expectations, emotions, or motivation. Research in cognitive neuroscience has revealed how such factors systematically influence the processing of signals that originate from our retina or other sensory organs. Nevertheless, even most textbooks still view percepts as objective reflections of the external world. According to this view, when placed in a particular environment, we all see, in principle, the same things, but may differ from one another with respect to postperceptual, or higher level, cognitive processes. Thus, perception is seen as exclusively determined by the external sensory, so-called bottom-up, inputs. And, although the importance of the constant exchange between incoming sensory data and existing knowledge used for postulating hypotheses regarding sensations was occasionally recognized during the early days of cognitive psychology (MacKay, 1956), only rarely have top-down effects been considered to be as important as bottom-up factors (Gregory, 1980; Koffka, 1935).

Although increased emphasis has been placed on top-down effects in perception within modern cognitive neuroscience theories and approaches, it is important to note that these are mostly treated as external, modulatory effects. This view incorporates an implicit and rather strong assumption according to which top-down effects represent secondary phenomena whose occurrence is not mandatory, but rather is contingent on environmental conditions such as the level of visibility, or ambiguity. As such, they are even discussed as exclusively attentional or higher-level executive effects that may influence and bias visual perception, without necessarily constituting one of its inherent, core elements.

This chapter reviews a more comprehensive outlook on visual perception. In this account, interpreting the visual world cannot be accomplished by relying solely on bottom-up information, but rather it emerges from the integration of external information with preexisting knowledge. Several sources of such knowledge that trigger different types of top-down biases and may modulate and guide visual processing are described here. First, we discuss how our brain, when presented with a visual object, employs a proactive strategy of testing multiple hypotheses regarding its identity. Another source of early top-down facilitatory bias derived from the presented object concerns its potential emotional salience, which is also extracted during the early phases of visual processing. At the same time, a comparable associative and proactive strategy is used for formulating and testing hypotheses based on the context in which the object is presented and on other items likely to accompany it. Finally, factors outside the currently presented input, such as task demands, previously encountered events, or the behavioral context, may be used as additional sources of top-down influences in visual recognition. In conclusion, although triggered and constrained by the incoming bottom-up signals, visual perception is strongly guided and shaped by top-down mechanisms whose origin, temporal dynamics, and neural basis are reviewed and synthesized here.

Understanding the Visual World

The common view of visual perception posits that the flow of visual information starts once the visual information is acquired by the eye, and continues as it is transmitted further, to the primary and then to higher order visual cortices. This hierarchical view is incomplete because it ignores the influences that our preexisting knowledge and various dispositions have on the way we process and understand visual information.

In our everyday life, we are rarely presented with clearly and fully perceptible visual information, and we rarely approach our environment without certain predictions. Even highly familiar objects are constantly encountered in different circumstances that greatly vary in terms of lighting, occlusions, and other parameters. Naturally, all of these variations limit our ability to structure and interpret the presented scene in a meaningful manner without relying on previous experience. Therefore, outside the laboratory, successful recognition has to rely not only on the immediately available visual input but also on different sources of related preexisting information represented at higher processing levels (Bullier, 2001; Gilbert & Sigman, 2007; Kveraga et al., 2007b), which trigger the so-called top-down effects in visual perception. Consequently, to understand visual recognition, we need to consider differential, as well as integrative, effects of bottom-up and top-down processes, and their respective neural bases.

Before reviewing specific instances in which top-down mechanisms are of prominent relevance, it is important to mention that the phrase “top-down modulation” has typically been used in several different ways in the literature. Engel et al. (2001) summarize four variants of its use: anatomical (equating top-down influences with the feedback activity in a processing hierarchy), cognitivist (equating top-down influences with hypothesis- or expectation-driven processing), perceptual or gestaltist (equating top-down influences with contextual modulation of perceptual items), and dynamicist (equating top-down influences with the enslavement of local processing elements by large-scale neuronal dynamics). Although separable, these variants are on some occasions partly overlapping (e.g., it may be hard to clearly separate cognitivist and perceptual definitions) or mutually complementary (e.g., cognitivist influences may be mediated by anatomical or dynamicist processing mechanisms). A more general definition might specify top-down influences as instances in which complex types of information represented at higher processing stages influence simpler processing occurring at earlier stages (Gilbert & Sigman, 2007). However, even this definition might be problematic in some instances because the term “complexity of information” can prove to be hard to specify, or even to apply in some instances of top-down modulations. Nevertheless, bringing different definitions and processing levels together, it is possible to suggest, in principle, that bottom-up information flow is supported by feedforward connections that transfer information from lower to higher level regions, in contrast to feedback connections that transmit the signals originating from the higher level areas downstream within the processing hierarchy. Feedforward connections are typically considered to be those that originate from supragranular layers and terminate in and around the fourth cortical layer, in contrast to feedback connections that originate in infragranular and end in agranular layers (Felleman & Van Essen, 1991; Friston, 2005; Maunsell & Van Essen, 1983; Mumford, 1992; Shipp, 2005). Furthermore, feedforward connections are relatively focused and tend to project to a smaller number of mostly neighboring regions in the processing hierarchy. In contrast, feedback connections are more diffuse because they innervate many regions and form wider connection patterns within them (Shipp & Zeki, 1989; Ungerleider & Desimone, 1986). Consequently, feedforward connections carry the main excitatory input to cortical neurons and are considered as driving connections, unlike the feedback ones that typically have modulatory effects (Buchel & Friston, 1997; Girard & Bullier, 1989; Salin & Bullier, 1995; Sherman & Guillery, 2004). For example, feedback connections potentiate responses of low-order areas by nonlinear mechanisms such as gain control (Bullier et al., 2001), increase the synchronization in lower order areas (Munk et al., 1995), and contribute to attentional processing (Gilbert et al., 2000). Mumford (1992) has argued that feedback and feedforward connections have to be considered of equal importance, emphasizing that most cognitive processes reflect a balanced exchange of information between pairs of brain regions that are often hierarchically asymmetrical. However, equal importance does not imply equal functionality: as suggested by Lamme et al. (1998), feedforward connections may be fast, but they are not necessarily linked with perceptual experience. Instead, attentive vision and visual awareness, which may in everyday life introspectively be linked to a conscious perceptual experience, arise from recurrent processing within the hierarchy (Lamme & Roelfsema, 2000).

As described in more detail later in this chapter, a similar conceptualization of feedforward and feedback connections has been advocated by a number of predictive approaches to cortical processing, all of which emphasize that efficient perception arises from a balanced exchange of bottom-up and top-down signals. Within such recurrent pathways, feedback connections are suggested to trigger “templates,” namely expected reconstructions of sensory signals that can be compared with the input being received from lower-level areas by the feedforward projections. According to one theory, the residual, or the difference between the template and incoming input, is calculated and transmitted to higher level areas (Mumford, 1992). Different models propose different ways to handle the discrepancy between top-down hypotheses and bottom-up input, but they mostly agree that a comparison between ascending and descending information is needed to facilitate convergence.

In conclusion, our understanding of visual recognition strongly requires that we characterize the sources and dynamics of top-down effects in sensory processing. First, we need to distinguish between different types of cognitive biases in vision, some of which may become available before the presentation of a specific stimulus, whereas others are triggered by its immediate appearance. Specifically, although in some situations we may be able to use prior knowledge to generate more or less specific predictions regarding the upcoming stimulation, in other contexts hypotheses regarding the identity or other features of incoming stimuli can only be generated after the object is presented. In either case, visual perception does not simply reflect a buildup of independently processed stimulus features (e.g., shape, color, or edge information), which are first integrated into a recognizable image and thereafter complemented with other, already existing information about that particular object. Instead, links between the presented input and preexisting representations are created before, or at the initial moment of, object presentation and continuously refined thereafter until the two sufficiently overlap and the object is successfully recognized.

Relevance of Prediction in Simple Visual Recognition

Even the simplest, most impoverished contexts of visual recognition rely on an integration of bottom-up and top-down processing mechanisms. It has been proposed that this type of recognition is predictive in nature, in the sense that predictions are initialized after stimulus presentation based on certain features of the presented input (e.g., global object shape), which are rapidly processed and used for facilitating the processing of other object features (Bar, 2003, 2007; Kveraga et al., 2007b). One proposal is that perception, even when single objects are presented in isolation, progresses through several phases that constitute an activate-predict-confirm perception cycle (Enns & Lleras, 2008). Explicitly or implicitly, this view is in accordance with predictive theories of cortical processing (Bar, 2003; Friston, 2005; Grossberg, 1980; Mumford, 1992; Rao & Ballard, 1999; Ullman, 1995) that were developed in attempts to elucidate the mechanisms of iterative recurrent cortical processing underlying successful cognition. Their advancement was strongly motivated by the increase in knowledge regarding the structural and functional properties of feedback and feedforward connections described earlier, as well as the posited distinctions between forward and inverse models in computer vision (Ballard et al., 1983; Friston, 2005; Kawato et al., 1993). Based on these developments, predictive theories suggested that different cortical systems share a common mechanism of constant formulation and communication of expectations and other top-down biases from higher to lower level cortical areas, which thereby become primed for the anticipated events. This allows the input that arrives at these presensitized areas to be compared and integrated with the postulated expectations, perhaps through specific synchrony patterns visible across different levels of the hierarchy (Engel et al., 2001; Ghuman et al., 2008; Kveraga et al., 2007b; von Stein et al., 2000; von Stein & Sarnthein, 2000). Such comparison and integration of top-down and bottom-up information has been posited to rely on iterative error-minimization mechanisms (Friston, 2005; Grossberg, 1980; Kveraga et al., 2007b; Mumford, 1992; Ullman, 1995) that support successful cognitive functioning.

With respect to visual processing, this means that an object can be recognized once a match between the prepostulated hypothesis and sensory representation is reached, such that no significant difference exists between the two. As mentioned before, this implies that feedforward pathways carry error signals, or information regarding the residual discrepancy between predicted and actual events (Friston, 2005; Mumford, 1992; Rao & Ballard, 1999). This suggestion is highly important because it posits a privileged status for errors of prediction in cortical processing. These events require more pronounced and prolonged analysis because they typically signal inadequacy of the preexisting knowledge for efficient functioning. Consequently, in addition to providing a powerful signal for novelty detection, discrepancy signals often trigger a reevaluation of current knowledge, new learning, or a change in behavior (Corbetta et al., 2002; Escera et al., 2000; Schultz & Dickinson, 2000; Winkler & Czigler, 1998). In contrast, events that are predicted correctly typically carry little informational value (because of successful learning, we expected these to occur all along) and are therefore processed in a more efficient manner (i.e., faster and more accurately) than the unexpected or wrongly predicted ones. Although the described conceptualization of the general dynamics of iterative recurrent processing across different levels of cortical hierarchies is shared by most predictive theories of neural processing, these theories differ with respect to their conceptualizations of more specific features of such processing (e.g., the level of abstraction of the top-down mediated templates or the existence of information exchange outside neighboring levels of cortical hierarchies; cf. Bar, 2003; Kveraga et al., 2007b; Mumford, 1992).
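The general logic of iterative error minimization can be illustrated with a toy computation. The Python sketch below is only an illustration under strong simplifying assumptions and is not the implementation of any of the models cited above: a single higher level holds a two-value hypothesis, a fixed and purely hypothetical weight matrix turns that hypothesis into a template of the expected input, and the residual between template and input is passed forward to revise the hypothesis. All function names, weights, and input values are invented for the example.

```python
# Toy illustration of iterative error minimization between two levels.
# All names and numbers are hypothetical; this is not a model from the chapter.

def predict(weights, hypothesis):
    """Feedback: turn the higher-level hypothesis into an expected input (template)."""
    return [sum(w * h for w, h in zip(row, hypothesis)) for row in weights]


def residual(signal, template):
    """Feedforward: the part of the input that the template fails to explain."""
    return [s - t for s, t in zip(signal, template)]


def settle(signal, weights, steps=200, rate=0.1):
    """Refine the hypothesis until the template accounts for the input."""
    hypothesis = [0.0] * len(weights[0])
    errors = []
    for _ in range(steps):
        err = residual(signal, predict(weights, hypothesis))
        errors.append(sum(e * e for e in err))
        # Each hypothesis unit moves in the direction that reduces the residual
        # (gradient descent on the summed squared error).
        for j in range(len(hypothesis)):
            hypothesis[j] += rate * sum(weights[i][j] * err[i] for i in range(len(err)))
    return hypothesis, errors


# Assumed generative weights: unit 0 drives inputs 0 and 1, unit 1 drives inputs 2 and 3.
WEIGHTS = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
SIGNAL = [2.0, 2.0, -1.0, -1.0]

hypothesis, errors = settle(SIGNAL, WEIGHTS)
print("hypothesis:", [round(h, 3) for h in hypothesis])
print("first and last squared error:", errors[0], errors[-1])
```

In this sketch the residual shrinks as the hypothesis comes to match the input, which mirrors the idea that correctly predicted events leave little error to transmit, whereas a mismatch keeps driving revision of the hypothesis.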


One of the most influential models that highlight the relevance of recurrent connections and top-down feedback in visual processing is the “global-to-local” integrated model of visual processing of Bullier (2001). This model builds on our knowledge of the anatomy and functionality of the visual system, especially the differences between magnocellular and parvocellular visual pathways that carry the visual information from the retina to the brain. Specifically, it takes into account findings showing that the magnocellular and parvocellular pathways are separated over the first few processing stages and that, following stimulus presentation, area V1 receives activation from lateral geniculate nucleus magnocellular neurons around 20 ms earlier than from parvocellular neurons. This faster activation of the M-channel (characterized by high contrast but poor chromatic sensitivity, larger receptive fields, and lower spatial sampling rate), together with the high degree of myelination, could account for the early activation of the dorsal visual processing stream after visual stimulation, which enables the generation of fast feedback connections from higher to lower areas (V1 and V2) at exactly the time when feedforward activation from the P-channel arrives (Bullier, 2001; Kaplan, 2004; Kaplan & Shapley, 1986; Merigan & Maunsell, 1993; Schiller & Malpeli, 1978; Goodale & Milner, 1992). This view is very different not only from the classic theories that emphasize the importance of feedforward connections but also from the more “traditional” account of feedback connections stating that, regardless of the overall importance of recurrent connections, the first sweep of activity through the hierarchy of (both dorsal and ventral) visual areas is still primarily determined by the pattern of feedforward connections (Thorpe & Imbert, 1989). The integrated model of visual processing treats V1 and V2 areas as general-purpose representators that integrate computations from higher levels, allowing global information to influence the processing of more detailed information. V1 could be a place for perceptual integration that reunites information returned from different higher level areas by feedback connections after being divided during the first activity sweep for a very simple reason: This cortical area still has a high-resolution map of almost all relevant feature information. This view resonates well with recent theories of visual processing and awareness, such as Zeki’s theory of visual consciousness (Zeki & Bartels, 1999), Hochstein and Ahissar’s theory of perceptual processing and learning (Hochstein & Ahissar, 2002), or Lamme’s (2006) views on consciousness.

Another model that suggests a similar division of labor and offers a predictive view of visual processing in object recognition was put forth by Bar and colleagues (Bar, 2003, 2007; Bar et al., 2006; Kveraga et al., 2007a). This model identifies the information content that triggers top-down hypotheses and characterizes the exact cortical regions that bias visual processing by formulating and communicating those top-down predictions to lower-level cortical areas. It also explains how top-down information modulates bottom-up processing in situations in which single objects are presented in isolation and in which prediction is driven by a subset of an object’s own properties that facilitate the recognition of the object itself as well as other objects it typically appears with. Specifically, this top-down facilitation model posits that visual recognition progresses from an initial stage aimed at determining what an object resembles to a later stage geared toward specifying its fine details. For this to occur, the top-down facilitation model critically assumes that different features of the presented stimulus are processed at different processing stages. First, the coarse, global information regarding the object shape is rapidly extracted and used for activating in memory existing representations that most resemble the global properties of the given object to be recognized. These include all objects that share the rudimentary properties of the presented object and look highly similar if viewed in blurred or decontextualized circumstances (e.g., a mushroom, desk lamp, and umbrella). Although an object cannot be identified with full certainty based on coarse stimulus outline, such rapidly extracted information is still highly useful for basic-level recognition of resemblance, creating the so-called analogies (Bar, 2007). Generally, analogies allow the formulation of links between the presented input and relevant preexisting representations in memory, which may be based on different types of similarity between the two (e.g., perceptual, semantic, functional, or conceptual). In the context of visual recognition, analogies are based on global perceptual similarity between the input object and objects in memory, which allows the brain to generate multiple hypotheses or guesses regarding the object’s most likely identity.

For these analogies and initial guesses to be useful, they need to be triggered early during visual processing, while they can still bias the slower incoming bottom-up input. Thus, information regarding the coarse stimulus properties has to be processed first, before the finer object features. Indeed, it has been suggested that such information is rapidly extracted and conveyed using low spatial frequencies of the visual input through the magnocellular (M) pathway, which is ideal for transmitting coarse information regarding the general object outlines at higher velocity compared with other pathways (Merigan & Maunsell, 1993; Schiller & Malpeli, 1978). This information is transmitted to the orbitofrontal cortex (OFC) (Bar, 2003; Bar et al., 2006), a polymodal region implicated primarily in the processing of rewards and affect (Barbas, 2000; Carmichael & Price, 1995; Cavada et al., 2000; Kringelbach & Rolls, 2004), as well as supporting some aspects of visual processing (Bar et al., 2006; Freedman et al., 2003; Frey & Petrides, 2000; Meunier et al., 1997; Rolls et al., 2005; Thorpe et al., 1983). In the present framework, the OFC is suggested to generate predictions regarding the object’s identity by activating all representations that share the global appearance with the presented image by relying on rapidly analyzed low spatial frequencies (Bar, 2003, 2007). Once fed back into the ventral processing stream, these predictions interact with the slower incoming bottom-up information and facilitate the recognition of the presented object. Experimental findings corroborate the hypothesis regarding the relevance of the M-pathway for transmitting coarse visual information (Bar, 2003; Bar et al., 2006; Kveraga et al., 2007b), as well as the importance of OFC activity, and the interaction between the OFC and inferior temporal cortex, for recognizing visual objects (Bar et al., 2001, 2006). It has also been suggested that low spatial frequency information is used for generating predictions regarding other objects and events that are likely to be encountered in the same context (Bar, 2004; Oliva & Torralba, 2007). As also suggested by Ganis and Kosslyn (2007), the associative nature of our long-term memory plays a crucial role in matching the presented input with preexisting representations and all associated information relevant for object identification. Finally, as suggested by Barrett and Bar (2009), different portions of the OFC are also implicated in mediating affective predictions by supporting a representation of objects’ emotional salience that constitutes an inherent part of visual recognition. The exact dynamics of affective predictions, and their interaction with other types of top-down biases in visual recognition, remain to be elucidated.
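The claim that objects sharing a coarse outline look alike once fine detail is removed can be made concrete with a small numerical illustration. The Python sketch below is a minimal example under stated assumptions and is not a model of the M-pathway or of the OFC: it treats two one-dimensional luminance profiles as hypothetical "objects" that share the same outline but carry different fine detail, and it uses a simple moving average as a stand-in for low spatial frequency filtering. The profiles, the filter, and all names are invented for the example.

```python
# Toy illustration: low-pass filtering makes objects with a shared outline look alike.
# The profiles and the moving-average filter are assumptions made for this example.

def low_pass(profile, radius):
    """Moving average: keeps slow (coarse) variation and attenuates fine detail."""
    smoothed = []
    for i in range(len(profile)):
        window = profile[max(0, i - radius): i + radius + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def distance(a, b):
    """Euclidean distance between two luminance profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


# Shared coarse outline: dark, bright, dark.
OUTLINE = [0.0] * 4 + [1.0] * 8 + [0.0] * 4

# Two hypothetical objects: same outline, different fine texture.
OBJECT_A = [v + (0.3 if i % 2 == 0 else -0.3) for i, v in enumerate(OUTLINE)]
OBJECT_B = [v + (0.3 if i % 4 < 2 else -0.3) for i, v in enumerate(OUTLINE)]

raw_gap = distance(OBJECT_A, OBJECT_B)
coarse_gap = distance(low_pass(OBJECT_A, 2), low_pass(OBJECT_B, 2))
print("distance with fine detail:", round(raw_gap, 3))
print("distance after low-pass filtering:", round(coarse_gap, 3))
```

After filtering, the two profiles lie much closer together than the originals, so a coarse representation alone would activate both as candidate matches. In the terms used above, this is the stage at which several hypotheses about identity remain open, to be resolved later by the finer detail.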

Contextual Top-Down Effects In the previous sections, we described how the processing of single objects might be facil­ itated by using rapidly processed global shape information for postulating hypotheses re­ garding their identity. However, in the real world, objects are rarely presented in isola­ tion. Instead, they are embedded in particular environments and surrounded by other ob­ jects that are typically not random, but rather are mutually related in that they typically share the same context. Such relations that are based on frequent co-occurrence of ob­ jects within the same spatial or temporal context may be referred to as contextual associ­ ations. Importantly, objects that are typically found in the same environment can share many qualitatively different types of relations. For example, within one particular context (e.g., a kitchen), it may be possible to find objects that are semantically or categorically related in the sense that they belong to the same semantic category (e.g., kitchen appli­ ances such as a dishwasher and a refrigerator), as well as those that are typically used to­ gether, thus sharing a mutual functional relation (e.g., a frying pan and oil). However, the existence of such relations is not crucial for defining contextual associates because some may only have the environment that they typically coinhabit in common (e.g., a shower curtain and a toilet brush). Furthermore, some categorically or contextually related ob­ jects may be perceptually similar (e.g., an orange and a peach) or dissimilar (e.g., an ap­ ple and a banana). In addition, various contextual associates may share spatial relations of different flexibility, for example, such that a bathroom mirror is typically placed above the sink, whereas a microwave and a refrigerator may be encountered in different rela­ tional positions within the kitchen. 
Finally, although some object pairs may share only one relation (e.g., categorical: a cat and a goat; or contextual: a towel and sunscreen), others may be related in more than one way (e.g., a mouse and a cat are related both contextually and categorically, whereas a cat and a dog are related contextually, categorically, and perceptually). Although the number of different association types shared by the two objects may be of some relevance, it is mainly the frequency and consistency of their mutual co-occurrence that determines the strength of their associative connections (Bar, 2004; Biederman, 1981; Hutchison, 2003; Spence & Owens, 1990).

Such associative strength is important in that it provides information that our brain can use for accurate predictions. Our nervous system is extremely efficient in capturing the statistics in natural visual scenes (Fiser & Aslin, 2001; Torralba & Oliva, 2003), as well as learning the repeated spatial contingencies of even arbitrarily distributed abstract stimuli (Chun & Jiang, 1999). In addition to being sensitive and successful in learning contextual relations, our brain is also highly efficient in utilizing this knowledge for facilitating its own processing. In accordance with the general notion of a proactively predictive brain (Bar, 2009), it has repeatedly been demonstrated that the learned contextual information is constantly used for increasing the efficiency of visual search and recognition (Bar, 2004; Biederman et al., 1982; Davenport & Potter, 2004; Torralba et al., 2006). Specifically, objects presented in familiar backgrounds, especially if they appear in the expected spatial configuration, are processed faster and more accurately than those presented in noncongruent settings. Furthermore, the contextual and semantic redundancy provided by the presentation of several related objects encountered in such settings allows the observer to resolve the uncertainties and ambiguities of individual objects more efficiently. Such context-based predictions are extremely useful because they save processing resources and reduce the need for exerting mental effort while dealing with predictable aspects of our surroundings. They allow us to allocate attention toward relevant environmental aspects more efficiently, and they are very useful for guiding our actions and behavior.

However, to understand the mechanisms underlying contextual top-down facilitation effects, it is important to illustrate how the overall meaning, or the gist, of a complex image can be processed fast enough to become useful for guiding the processing of individual objects presented within it (Bar, 2004; Oliva, 2005; Oliva & Torralba, 2001). Studies that have addressed this issue have demonstrated that extracting the gist of a scene mainly relies on low spatial frequencies present in the image. This is not surprising because, as described earlier, such frequencies can be rapidly analyzed within the M-pathway, which in this case allows them to aid the rapid classification of the presented context (Bar, 2004; Oliva & Torralba, 2001; Schyns & Oliva, 1994).
Specifically, information regarding the global scene features can proactively be used for activating context frames, namely the representations of objects and relations that are common to that specific context (Bar, 2004; Bar & Ullman, 1996). Similar to the ideas of schemata (Hock et al., 1978), scripts (Shank, 1975), and frames (Minsky, 1975) from the 1970s and 1980s, such context frames are suggested to aid the processing of individual objects presented in the scene. At this point, it is important to mention that context frames should not be understood as static images that are activated in an all-or-none manner, but rather as dynamic entities that are processed gradually. In this view, a prototypical spatial template of the global structure of a familiar context is activated first, and is then filled with more instance-specific details until it develops into an episodic context frame that includes specific information regarding an individual instantiation of the given context. Overall, there is ample evidence that demonstrates our brain’s efficiency in extracting the gist of even highly complex visual images, allowing it to identify individual objects typically encountered in such settings. This, however, represents only one level of predictive, contextual top-down modulations.

On another level, it is possible that individual objects that are typically encountered together in a certain context are also able to mutually facilitate each other’s processing. In other words, although it has long been known that a kitchen environment helps in recognizing a refrigerator that it typically contains, it was less clear whether seeing an image of that refrigerator in isolation could automatically invoke the image of the contextual setting in which it is typically embedded. If so, it would be plausible to expect that presenting objects typically associated with a certain context in isolation could facilitate subsequent recognition of their typical contextual associates, even when these are presented outside the shared setting. This hypothesis has been addressed and substantiated in a series of studies conducted by Bar and colleagues (Aminoff et al., 2007; Bar et al., 2008a, 2008b; Bar & Aminoff, 2003; Fenske et al., 2006) that indicated the relevance of the parahippocampal cortex (PHC), the retrosplenial complex (RSC), and the medial prefrontal cortex (MPFC) for contextual processing. In addition to identifying the regions relevant for contextual relations, they also revealed a greater sensitivity of the PHC to associations with greater physical specificity, in contrast to RSC areas that seem to represent contextual associations in a more abstract manner (Bar et al., 2008b). Similar to the neighboring OFC’s role in generating predictions related to object identity, the involvement of the neighboring part of the MPFC was suggested to reflect the formulation of predictions based on familiar contextual associations, both visual-spatial and more abstract (Bar & Aminoff, 2003; Bar et al., 2008a, 2008b). These findings are of high relevance because they illustrate how the organization of the brain as a whole may automatically and parsimoniously support the strongly associative nature of our cognitive mind (Figure 4.1).

Before moving on to other types of top-down modulations, it is important to highlight that the described contextual effects in visual recognition represent only one type of contextual phenomena in vision. As described, they mostly address the functional or even semantic context in which the objects appear. On a somewhat more basic level than this, simpler stimulus contextual effects also need to be mentioned as a form of modulatory influence in vision.
For example, contextual effects in simple figure–ground segregation of objects are visible in situations in which the response of a neuron becomes affected by the global characteristics of the contour defining the object that is outside the neuron’s receptive field (Li et al., 2006). Generally, it is hard to determine whether such influences should be considered as top-down because some of them might be mediated solely by local connections intrinsic to the area involved in processing a certain feature. Thus, strictly speaking, they might not adhere to the previous definitions of top-down modifications as those reflecting influences from higher level processing stages (Gilbert & Sigman, 2007). However, given the relevance of these stimulus contextual influences on many elements of visual processing, including contour integration, scene segmentation, color constancy, and motion processing (Gilbert & Sigman, 2007), their relevance for intact visual perception has to be acknowledged. In a sense, even gestalt rules of perceptual grouping may be seen as similar examples of modulations in vision because they summarize how contours are perceptually linked as a consequence of their geometric relationships (Todorović, 2007). These rules most clearly indicate how, indeed, “the whole is greater than the sum of its parts,” because they show how our perception of each stimulus strongly depends on the contextual setting in which it is embedded. A very interesting feature of gestalt rules, and information integration in general, is that the knowledge regarding the grouping process itself does not easily modify the percept. In the example of the Müller-Lyer visual illusion, knowing that the two lines are of equal length does not necessarily make the illusion go away (Bruno & Franz, 2009).


Figure 4.1 Parallel to the systematic bottom-up progression of image details that are mainly mediated by high spatial frequency (HSF) information along the ventral visual pathway, rapid projections of coarse low spatial frequencies (LSFs) trigger the generation of hypotheses or “initial guesses” regarding the exact object identity and the context within which it typically appears. Both of these types of predictions are validated and refined with the gradual arrival of HSFs (Bar, 2004). IT, inferior temporal cortex; MPFC, medial prefrontal cortex; OFC, orbital frontal cortex; RSC, retrosplenial cortex; PHC, parahippocampal cortex. Modified, with permission, from Bar (2009).

Interfunctional Nature of Top-Down Modulations

In the previous section, a “purely perceptual” aspect of top-down modulations was introduced because the theoretical proposals and experimental findings described mostly focused on the top-down modulations that are triggered by the presentation of a single stimulus. However, top-down influences in visual perception include a much wider array of phenomena. Specifically, in many cases, the top-down-mediated preparation begins before the actual presentation of the stimulus and is triggered by factors such as instruction (Carlsson et al., 2000) or a specific task cue (Simmons et al., 2004). These types of influences may be considered contextual, in the sense that they reflect the behavioral context of visual processing that is related to the perceptual task at hand (Watanabe et al., 1998). In this case, the participant may choose to focus on a certain aspect of a stimulus that is expected to be of relevance in the future, thus triggering a modulatory effect on the feedforward analysis of the stimulus once it appears. This is similar to the way prior knowledge regarding the likelihood of the spatial location or other features of the objects expected to appear in the near future influences our perception (Bar, 2004; Biederman, 1972, 1981; Biederman et al., 1973; Driver & Baylis, 1998; Scholl, 2001). Typically, knowing what to expect allows one to attend to the location or other features of the expected stimulus. Influences related not only to spatial location but also to object features, objects or categories, temporal context, or other perceptual groups could be considered different subtypes of top-down influences (Gilbert & Sigman, 2007).

In addition, prior presentation of events that provide clues regarding the identity of the incoming stimulation may also influence visual processing. Within this context, one specific form of the potential influence of previous experience on current visual processing involves priming (Schacter et al., 2004; Tulving & Schacter, 1990; Wiggs & Martin, 1998). Events that occur before target presentation and that influence its processing may include those that are semantically or contextually long-term related to the target event (Kveraga et al., 2007b), as well as stimuli that had become associated with the target stimulus through short-term learning within the same (Schubotz & von Cramon, 2001, 2002) or a different (Widmann et al., 2007) modality. It has been suggested that, in some cases, and especially in the auditory domain, prediction regarding the forthcoming stimuli can be realized within the sensory system itself (Näätänen et al., 2007). In other cases, however, these predictive sensory phenomena may be related to the computations of the motor domain. Specifically, expectations regarding the incoming stimulation may be formulated based on an “efference copy” elicited by the motor system in situations in which a person’s own actions trigger such stimulation (Sperry, 1950). Von Holst, Mittelstaedt, and Sperry in the 1950s provided the first experimental evidence demonstrating the importance of motor-to-sensory feedback in controlling behavior (Bays & Wolpert, 2008; Wolpert & Flanagan, 2001).
This motivated an increased interest in the so-called internal model framework that can now be considered a prevailing, widely accepted view of action generation (Miall & Wolpert, 1996; Wolpert & Kawato, 1998). According to this framework, not only is there a prediction for sensory events that may be considered consequences of one’s own behavior (Blakemore et al., 1998), but the same system may be utilized for anticipating sensory events that are strongly associated with other sensory events co-occurring on a short time scale (Schubotz, 2007). Before moving away from the motor system, it is important to mention one more, somewhat different view that also emphasizes a strong link between perceptual and motor behavior. Specifically, according to Gross and colleagues (1999), when discussing perceptual processing, a strong focus should be placed on sensorimotor anticipation because it allows one to directly characterize a visual scene in categories of behavior. Thus, perception is not simply predictive but also is “predictive with purpose” and may thus be described as behavior based (Gross et al., 1999) or, as suggested by Arbib (1972), action oriented.

Up to now, top-down modulatory influences have been recognized in a multitude of different contexts and circumstances. However, one that is possibly the most obvious has not been addressed. Specifically, we have not discussed in any detail the relevance and necessity of top-down modulations in situations in which we can, from our daily experience, explicitly appreciate the aid of prior knowledge for processing stimuli at hand. This mostly concerns the situations in which the visual input is impoverished or ambiguous, and in which we almost consciously rely on top-down information for closing the percept. For example, contextual information is crucial in recovering visual scene properties lost because of blur or superimposition in the visual image (Albright & Stoner, 2002). In situations in which an object is ambiguous and may be interpreted in more than one fashion, a prior template that may guide recognition is even more relevant than when viewing a clear percept (Cavanagh, 1991). Examples of such stimuli include two faces or a vase versus one face behind a vase (Costall, 1980), or as shown in Figure 4.2, a face of a young versus an old woman or a man playing a saxophone seen in silhouette versus a woman’s face in shadow (Shepard, 1990).

Figure 4.2 Ambiguous figures demonstrate how our experience of the visual world is not determined solely by bottom-up input: example of the figure of a young woman versus an old woman, and a woman’s face versus a man playing the saxophone.

Generally, in the previous sections, a wide array of top-down phenomena have been mentioned, and some were described in more detail. From this description, it became clear how difficult it is to categorize these effects in relation to other cognitive functions, the most important of which is attention. Specifically, not only is it hard to clearly distinguish between attentional and other forms of top-down phenomena, but also, given that the same effect is sometimes discussed as an attentional, and sometimes as a perceptual, phenomenon, a clear separation may not be possible. This might not even be necessary in all situations because, for most practical purposes, the mechanisms underlying such phenomena and the effects they introduce may be considered the same. When discussing top-down influences of selective attention, one typically refers to the processing guided by hypotheses or expectations and the influence of prior knowledge or other personal features on stimulus selection (Engel et al., 2001), which is consistent with the way top-down influences have been addressed in this chapter. Even aspects of visual perception as mentioned here might, at least in part, be categorized under anticipatory attention that involves a preparation for the upcoming stimulus and improves the speed and precision of stimulus processing (Posner & Dehaene, 1994; Brunia, 1999). Not surprisingly, then, instead of always categorizing a certain effect into one functional “box,” Gazzaley (2010) uses the general term “top-down modulation,” which includes changes in sensory cortical activity associated with relevant and irrelevant information that stand at the crossroads of perception, memory, and attention. Similarly, a general conceptualization of top-down influences as those that reflect an interaction of goals, action plans, working memory, and selective attention is suggested by Engel et al. (2001). In an attempt to better differentiate basic types of top-down influences in perception, Ganis and Kosslyn (2007) suggested a crucial distinction between strategic top-down processes, which are under voluntary control, and involuntary, reflexive types of top-down modulation. In a somewhat different view, Summerfield and Egner (2009) argued that biases in visual processing include attentional mechanisms that prioritize processing based on motivational relevance and expectations that bias processing based on prior experience. Despite these initial attempts to create a clear taxonomy of top-down influences in perceptual processing, considerable disagreement in this area remains to be settled.

On a more anatomical and physiological side, it has been suggested that top-down modulations should not be regarded as an intrinsic property of individual sensory areas, but instead as a phenomenon realized through long-range connections between distant brain regions (Gazzaley, 2010). In this context, a lot of work has been invested in clearly defining the two types of areas involved in such interactive processing: sites and sources of biases. Specifically, sites relate to those areas in which the analysis of afferent signals takes place, whereas sources include those that provide the biasing information that modulates processing (Frith & Dolan, 1997).
Although some of the previously mentioned theories of visual perception that emphasize the role of feedback connections (Grossberg, 1980; Mumford, 1992; Ullman, 1995) often consider the immediate external stimulation to constitute the source of feedback information, others consider modulatory “bias” signals to be triggered by the system itself based on prior knowledge (Engel et al., 2001). The sources of such signals most often include the prefrontal, but also parietal and temporal, cortices as well as the limbic system, depending on the specific type of information influencing processing (Beck & Kastner, 2009; Engel et al., 2001; Hopfinger et al., 2000; Miller & Cohen, 2001). All of them aid and guide visual processing by providing relevant information necessary for the evaluation and interpretation of the incoming stimulus.

An important thing to keep in mind, however, is that defining top-down modulations in terms of long-range connections may be limiting for recognizing some influences that have a modulatory role in perception and contribute to our experience of an intact percept. In this conceptualization, it is not quite clear how distant two regions have to be in order for their interaction to be considered a “top-down” phenomenon and whether some short-range and local connections that modulate our perception may also be considered “top-down.” It is not clear what should be more relevant for defining top-down modulations, the distance between regions or the existence of recurrent processing between separate units that allows us to use prior experiences and knowledge for modulating the feedforward sweep of processing.
This modulation may, as suggested in the attentional theory of Kastner and Ungerleider (2000), be realized through an increase of the neural response to the relevant or attended stimulus and an attenuation of the irrelevant stimulus before or after its presentation, a claim that has been experimentally corroborated (Desimone & Duncan, 1995; Pessoa et al., 2003; Reynolds & Chelazzi, 2004). Furthermore, as suggested by Dehaene et al. (1998), such top-down attentional amplification, or an increase in activity related to the relevant stimulus, is also relevant as the mechanism that allows the stimulus to be made available to consciousness.

Concluding Remarks

Visual recognition was depicted here as a process that reflects the integration of two equally relevant streams of information. One, the bottom-up stream, captures the input present in the visual image itself and conveyed through the senses. The other, a top-down stream, contributes information based on prior experiences, current personal and behavioral sets, and future expectations, and it modifies the processing of the presented stimulus. In this conceptualization, perception may be considered a process of integrating the incoming input with our preexisting knowledge, one that operates in all contexts, even in very controlled and simple viewing circumstances.

In this chapter, it has been argued that top-down effects in visual recognition may be of different complexity and may be triggered by the stimulus itself or information external to the presented stimulus such as the task instruction, copies of motor commands, or prior presentation of events informative for the current stimulation. These various types of top-down influences originate from different cortical sources, reflecting the fact that they are triggered by different types of information. Regardless of the differences in their respective origins, different types of top-down biases may nevertheless result in similar local effects, or rely on similar mechanisms for the communication between lower level visual areas and higher level regions to enable the described effects. In that sense, understanding the principles of one type of top-down facilitation effect may provide important clues regarding the general mechanisms of triggering and integrating top-down and bottom-up information that underlie successful visual recognition.

In conclusion, the body of research synthesized here demonstrates the richness, complexity, and importance of top-down effects in visual perception and illustrates some of the fundamental mechanisms underlying these.
Critically, it is suggested that top-down effects should not be considered a secondary phenomenon that only occurs in special, extreme settings. Instead, the wide, heterogeneous set of such biases suggests an ever-present and complex informational processing stream that, together with the bottom-up stream, constitutes the core of visual processing. As suggested by Gregory (1980), perception can then be defined as a dynamic search for the best interpretation of the sensory data, a claim that highlights both the active and the constructive nature of visual perception. Consequently, visual recognition itself should be considered a proactive, predictive, and dynamic process of integrating different sources of information, the success of which is determined by their mutual resonance and correspondence and by our ability to learn from the past in order to predict the future.


Author Note

Work on this chapter was supported by NIH grant R01EY019477-01, NSF grant 0842947, and DARPA grant N10AP20036.

References

Albright, T. D., & Stoner, G. R. (2002). Contextual influences on visual processing. Annual Review of Neuroscience, 25, 339–379.
Aminoff, E., Gronau, N., & Bar, M. (2007). The parahippocampal cortex mediates spatial and nonspatial associations. Cerebral Cortex, 17, 1493–1503.
Arbib, M. (1972). The metaphorical brain: An introduction to cybernetics as artificial intelligence and brain theory. New York: Wiley Interscience.
Ballard, D. H., Hinton, G. E., & Sejnowski, T. J. (1983). Parallel visual computation. Nature, 306, 21–26.
Bar, M. (2009). The proactive brain: Memory for predictions. Theme issue: Predictions in the brain: Using our past to generate a future (M. Bar, Ed.), Philosophical Transactions of the Royal Society, Series B, Biological Sciences, 364, 1235–1243.
Bar, M. (2007). The proactive brain: Using analogies and associations to generate predictions. Trends in Cognitive Sciences, 11 (7), 280–289.
Bar, M. (2004). Visual objects in context. Nature Reviews, Neuroscience, 5 (8), 617–629.
Bar, M. (2003). A cortical mechanism for triggering top-down facilitation in visual object recognition. Journal of Cognitive Neuroscience, 15, 600–609.
Bar, M., & Aminoff, E. (2003). Cortical analysis of visual context. Neuron, 38 (2), 347–358.
Bar, M., Aminoff, E., & Ishai, A. (2008a). Famous faces activate contextual associations in the parahippocampal cortex. Cerebral Cortex, 18 (6), 1233–1238.
Bar, M., Aminoff, E., & Schacter, D. L. (2008b). Scenes unseen: The parahippocampal cortex intrinsically subserves contextual associations, not scenes or places per se. Journal of Neuroscience, 28, 8539–8544.
Bar, M., Kassam, K. S., Ghuman, A. S., Boshyan, J., Schmid, A. M., Dale, A. M., Hämäläinen, M. S., Marinkovic, K., Schacter, D. L., Rosen, B. R., & Halgren, E. (2006). Top-down facilitation of visual recognition. Proceedings of the National Academy of Sciences U S A, 103 (2), 449–454.
Bar, M., Tootell, R., Schacter, D., Greve, D., Fischl, B., Mendola, J., Rosen, B. R., & Dale, A. M. (2001). Cortical mechanisms of explicit visual object recognition. Neuron, 29 (2), 529–535.
Bar, M., & Ullman, S. (1996). Spatial context in recognition. Perception, 25 (3), 343–352.
Barbas, H. (2000). Connections underlying the synthesis of cognition, memory, and emotion in primate prefrontal cortices. Brain Research Bulletin, 52 (5), 319–330.
Barrett, L. F., & Bar, M. (2009). See it with feeling: Affective predictions during object perception. Theme issue: Predictions in the brain: Using our past to generate a future (M. Bar, Ed.), Philosophical Transactions of the Royal Society, Series B, Biological Sciences, 364, 1325–1334.
Bays, P. M., & Wolpert, D. M. (2008). Predictive attenuation in the perception of touch. In P. Haggard, Y. Rossetti, & M. Kawato (Eds.), Sensorimotor foundations of higher cognition: Attention and performance XXII (pp. 339–359). New York: Oxford University Press.
Beck, D. M., & Kastner, S. (2009). Top-down and bottom-up mechanisms in biasing competition in the human brain. Vision Research, 49 (10), 1154–1165.
Biederman, I. (1981). On the semantics of a glance at a scene. In M. Kubovy & J. R. Pomerantz (Eds.), Perceptual organization (pp. 213–263). Hillsdale, NJ: Erlbaum.
Biederman, I. (1972). Perceiving real-world scenes. Science, 177, 77–80.
Biederman, I., Glass, A. L., & Stacy, W. (1973). Searching for objects in real-world scenes. Journal of Experimental Psychology, 97, 22–27.
Biederman, I., Mezzanotte, R. J., & Rabinowitz, J. C. (1982). Scene perception: Detecting and judging objects undergoing relational violations. Cognitive Psychology, 14 (2), 143–177.

Blakemore, S. J., Rees, G., & Frith, C. D. (1998). How do we predict the consequences of our actions? A functional imaging study. Neuropsychologia, 36 (6), 521–529.
Brunia, C. H. M. (1999). Neural aspects of anticipatory behavior. Acta Psychologica, 101, 213–352.
Bruno, N., & Franz, V. H. (2009). When is grasping affected by the Müller-Lyer illusion? A quantitative review. Neuropsychologia, 47 (6), 1421–1433.
Buchel, C., & Friston, K. J. (1997). Modulation of connectivity in visual pathways by attention: Cortical interactions evaluated with structural equation modeling and fMRI. Cerebral Cortex, 7 (8), 768–778.
Bullier, J. (2001). Integrated model of visual processing. Brain Research Reviews, 36, 96–107.
Carlsson, K., Petrovic, P., Skare, S., Petersson, K. M., & Ingvar, M. (2000). Tickling anticipations: Neural processing in anticipation of a sensory stimulus. Journal of Cognitive Neuroscience, 12, 691–703.
Carmichael, S. T., & Price, J. L. (1995). Limbic connections of the orbital and medial prefrontal cortex in macaque monkeys. Journal of Comparative Neurology, 363 (4), 615–641.
Cavada, C., Company, T., Tejedor, J., Cruz-Rizzolo, R. J., & Reinoso-Suarez, F. (2000). The anatomical connections of the macaque monkey orbitofrontal cortex: A review. Cerebral Cortex, 10 (3), 220–242.
Cavanagh, P. (1991). What’s up in top-down processing? In A. Gorea (Ed.), Representations of vision: Trends and tacit assumptions in vision research (pp. 295–304). Cambridge, UK: Cambridge University Press.
Chun, M. M., & Jiang, Y. (1999). Top-down attentional guidance based on implicit learning of visual covariation. Psychological Science, 10, 360–365.
Corbetta, M., Kincade, J. M., & Shulman, G. L. (2002). Neural systems for visual orienting and their relationships to spatial working memory. Journal of Cognitive Neuroscience, 14, 508–523.
Costall, A. (1980). The three faces of Edgar Rubin. Perception, 9, 115.
Davenport, J. L., & Potter, M. C. (2004). Scene consistency in object and background perception. Psychological Science, 15 (8), 559–564.
Dehaene, S., Kerszberg, M., & Changeux, J. P. (1998). A neuronal model of a global workspace in effortful cognitive tasks. Proceedings of the National Academy of Sciences U S A, 95, 14529–14534.
Desimone, R., & Duncan, J. (1995). Neural mechanisms of selective visual attention. Annual Review of Neuroscience, 18, 193–222.
Driver, J., & Baylis, G. C. (1998). Attention and visual object segmentation. In R. Parasuraman (Ed.), The attentive brain (pp. 299–325). Cambridge, MA: MIT Press.
Engel, A. K., Fries, P., & Singer, W. (2001). Dynamic predictions: Oscillations and synchrony in top-down processing. Nature Reviews, Neuroscience, 2 (10), 704–716.
Enns, J. T., & Lleras, A. (2008). What’s next? New evidence for prediction in human vision. Trends in Cognitive Sciences, 12, 327–333.
Escera, C., Alho, K., Schröger, E., & Winkler, I. (2000). Involuntary attention and distractibility as evaluated with event-related brain potentials. Audiology and Neurootology, 5, 151–166.
Felleman, D. J., & Van Essen, D. C. (1991). Distributed hierarchical processing in primate visual cortex. Cerebral Cortex, 1, 1–47.
Fenske, M. J., Aminoff, E., Gronau, N., & Bar, M. (2006). Top-down facilitation of visual object recognition: Object-based and context-based contributions. Progress in Brain Research, 155, 3–21.

Fiser, J., & Aslin, R. N. (2001). Unsupervised statistical learning of higher-order spatial structures from visual scenes. Psychological Science, 12, 499–504.
Freedman, D. J., Riesenhuber, M., Poggio, T., & Miller, E. K. (2003). A comparison of primate prefrontal and inferior temporal cortices during visual categorization. Journal of Neuroscience, 23 (12), 5235–5246.
Frey, S., & Petrides, M. (2000). Orbitofrontal cortex: A key prefrontal region for encoding information. Proceedings of the National Academy of Sciences U S A, 97, 8723–8727.
Frith, C., & Dolan, R. J. (1997). Brain mechanisms associated with top-down processes in perception. Philosophical Transactions of the Royal Society, Series B, Biological Sciences, 352 (1358), 1221–1230.
Friston, K. (2005). A theory of cortical responses. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 360 (1456), 815–836.
Ganis, G., & Kosslyn, S. M. (2007). Multiple mechanisms of top-down processing in vision. In S. Funahashi (Ed.), Representation and brain (pp. 21–45). Tokyo: Springer-Verlag.
Gazzaley, A. (2010). Top-down modulation: The crossroads of perception, attention and memory. Proceedings of SPIE-IS&T Electronic Imaging, SPIE Vol. 7527, 75270A.
Ghuman, A., Bar, M., Dobbins, I. G., & Schnyer, D. (2008). The effects of priming on frontal-temporal communication. Proceedings of the National Academy of Sciences U S A, 105 (24), 8405–8409.
Gross, H., Heinze, A., Seiler, T., & Stephan, V. (1999). Generative character of perception: A neural architecture for sensorimotor anticipation. Neural Networks, 12 (7–8), 1101–1129.
Gilbert, C., Ito, M., Kapadia, M., & Westheimer, G. (2000). Interactions between attention, context and learning in primary visual cortex. Vision Research, 40 (10–12), 1217–1226.
Gilbert, C. D., & Sigman, M. (2007). Brain states: Top-down influences in sensory processing. Neuron, 54, 677–696.
Girard, P., & Bullier, J. (1989). Visual activity in area V2 during reversible inactivation of area 17 in the macaque monkey. Journal of Neurophysiology, 62 (6), 1287–1302.
Goodale, M. A., & Milner, A. D. (1992). Separate visual pathways for perception and action. Trends in Neurosciences, 15 (1), 20–25.
Gregory, R. L. (1980). Perceptions as hypotheses. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 290, 181–197.
Grossberg, S. (1980). How does a brain build a cognitive code? Psychological Review, 87 (1), 1–51.

Top-Down Effects in Visual Perception Hochstein, S., & Ahissar, M. (2002). Review view from the top: Hierarchies and reverse hierarchies in the visual system. Neuron, 36, 791–804. Hock, H. S., Romanski, L., Galie, A., & Williams, C. S. (1978). Real-world schemata and scene recognition in adults and children. Memory and Cognition, 6, 423–431. Hopfinger, J. B., Buonocore, M. H., & Mangun, G. R. (2000). The neural mechanisms of top-down attentional control. Nature Neuroscience, 3 (3), 284–291. Hutchison, K. A. (2003). Is semantic priming due to association strength or feature overlap? A microanalytic review. Psychonomic Bulletin & Review, 10 (4), 785–813. (p. 72)

Kaplan, E. (2004). The M, P, and K pathways of the primate visual system. In L. M. Chalu­ pa & J.S. Werner (Eds.), The visual neuroscience (pp. 481–494). Cambridge, MA: MIT Press. Kaplan, E., & Shapley, R. M. (1986). The primate retina contains two types of ganglion cells, with high and low contrast sensitivity. Proceedings of the National Academy of Sciences U S A, 83 (8), 2755–2757. Kastner, S., & Ungerleider, L.G. (2000). Mechanisms of visual attention in the human cor­ tex. Annual Review of Neuroscience, 23, 315–341. Kawato, M., Hayakawa, H., & Inui, T. (1993). A forward-inverse optics model of reciprocal connections between visual cortical areas. Network, 4, 415–422. Koffka, K. (1935). The principles of gestalt psychology. New York: Harcourt, Brace, & World. Kringelbach, M. L., & Rolls, E. T. (2004). The functional neuroanatomy of the human or­ bitofrontal cortex: Evidence from neuroimaging and neuropsychology. Progress in Neuro­ biology, 72 (5), 341–372. Kveraga, K., Boshyan, J., & Bar, M. (2007a). Magnocellular projections as the trigger of top-down facilitation in recognition. Journal of Neuroscience, 27 (48), 13232–13240. Kveraga, K., Ghuman, A. S., & Bar, M. (2007b). Top-down predictions in the cognitive brain. Brain and Cognition, 65, 145–168. Lamme, V. A. F. (2006). Towards a true neural stance on consciousness. Trends in Cogni­ tive Sciences, 10 (11), 494–501. Lamme, V. A. F., & Roelfsema, P. R. (2000). The distinct modes of vision offered by feed­ forward and recurrent processing. Trends in Neurosciences, 23, 571–579. Lamme, V. A. F., Super, H., & Spekreijse, H. (1998). Feedforward, horizontal, and feed­ back processing in the visual cortex. Current Opinion in Neurobiology, 8, 529–535.


Li, W., Piech, V., & Gilbert, C. D. (2006). Contour saliency in primary visual cortex. Neuron, 50, 951–962.
MacKay, D. (1956). Towards an information-flow model of human behaviour. British Journal of Psychiatry, 43, 30–43.
Maunsell, J. H. R., & Van Essen, D. C. (1983). Functional properties of neurons in the middle temporal visual area of the macaque monkey. II. Binocular interactions and the sensitivity to binocular disparity. Journal of Neurophysiology, 49, 1148–1167.
Merigan, W. H., & Maunsell, J. H. (1993). How parallel are the primate visual pathways? Annual Review of Neuroscience, 16, 369–402.
Meunier, M., Bachevalier, J., & Mishkin, M. (1997). Effects of orbital frontal and anterior cingulate lesions on object and spatial memory in rhesus monkeys. Neuropsychologia, 35, 999–1015.
Miall, R. C., & Wolpert, D. M. (1996). Forward models for physiological motor control. Neural Networks, 9 (8), 1265–1279.
Miller, E. K., & Cohen, J. D. (2001). An integrative theory of prefrontal cortex function. Annual Review of Neuroscience, 24, 167–202.
Minsky, M. (1975). A framework for representing knowledge. In P. H. Winston (Ed.), The psychology of computer vision (pp. 163–189). New York: McGraw-Hill.
Mumford, D. (1992). On the computational architecture of the neocortex. I. The role of cortico-cortical loops. Biological Cybernetics, 66 (3), 241–251.
Munk, M. H., Nowak, L. G., Nelson, J. I., & Bullier, J. (1995). Structural basis of cortical synchronization. II. Effects of cortical lesions. Journal of Neurophysiology, 74 (6), 2401–2414.
Näätänen, R., Paavilainen, P., Rinne, T., & Alho, K. (2007). The mismatch negativity (MMN) in basic research of central auditory processing: A review. Clinical Neurophysiology, 118, 2544–2590.
Oliva, A. (2005). Gist of the scene. In L. Itti, G. Rees, & J. K. Tsotsos (Eds.), Encyclopedia of neurobiology of attention (pp. 251–256). San Diego, CA: Elsevier.
Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42, 145–175.
Oliva, A., & Torralba, A. (2007). The role of context in object recognition. Trends in Cognitive Sciences, 11 (12), 520–527.
Pessoa, L., Kastner, S., & Ungerleider, L. G. (2003). Neuroimaging studies of attention: From modulation of sensory processing to top-down control. Journal of Neuroscience, 23 (10), 3990–3998.

Posner, M. I., & Dehaene, S. (1994). Attentional networks. Trends in Neurosciences, 17, 75–79.
Rao, R. P., & Ballard, D. H. (1999). Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2, 79–87.
Reynolds, J. H., & Chelazzi, L. (2004). Attentional modulation of visual processing. Annual Review of Neuroscience, 27, 611–647.
Rolls, E. T., Browning, A. S., Inoue, K., & Hernadi, I. (2005). Novel visual stimuli activate a population of neurons in the primate orbitofrontal cortex. Neurobiology of Learning and Memory, 84 (2), 111–123.
Salin, P. A., & Bullier, J. (1995). Corticocortical connections in the visual system: Structure and function. Physiological Reviews, 75 (1), 107–154.
Schacter, D. L., Dobbins, I. G., & Schnyer, D. M. (2004). Specificity of priming: A cognitive neuroscience perspective. Nature Reviews Neuroscience, 5 (11), 853–862.
Schank, R. C. (1975). Conceptual information processing. New York: Elsevier Science Ltd.
Schiller, P. H., & Malpeli, J. O. (1978). Functional specificity of lateral geniculate nucleus laminae of the rhesus monkey. Journal of Neurophysiology, 41, 788–797.
Schyns, P. G., & Oliva, A. (1994). From blobs to boundary edges: Evidence for time- and spatial-dependent scene recognition. Psychological Science, 5 (4), 195–200.
Scholl, B. J. (2001). Objects and attention: The state of the art. Cognition, 80 (1–2), 1–46.
Schubotz, R. I. (2007). Prediction of external events with our motor system: Towards a new framework. Trends in Cognitive Sciences, 11, 211–218.
Schubotz, R. I., & von Cramon, D. Y. (2002). Dynamic patterns make the premotor cortex interested in objects: Influence of stimulus and task revealed by fMRI. Brain Research: Cognitive Brain Research, 14, 357–369.
Schubotz, R. I., & von Cramon, D. Y. (2001). Functional organization of the lateral premotor cortex: fMRI reveals different regions activated by anticipation of object properties, location and speed. Brain Research: Cognitive Brain Research, 11 (1), 97–112.
Schultz, W., & Dickinson, A. (2000). Neuronal coding of prediction errors. Annual Review of Neuroscience, 23, 473–500.
Shepard, R. (1990). Mind sights. New York: W. H. Freeman.
Sherman, S. M., & Guillery, R. W. (2004). The visual relays in the thalamus. In L. M. Chalupa & J. S. Werner (Eds.), The visual neuroscience (pp. 565–592). Cambridge, MA: MIT Press.

Shipp, S. (2005). The importance of being agranular: A comparative account of visual and motor cortex. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences, 360, 797–814.
Shipp, S., & Zeki, S. (1989). The organization of connections between areas V5 and V2 in macaque monkey visual cortex. European Journal of Neuroscience, 1 (4), 333–354.
Simmons, A., Matthews, S. C., Stein, M. B., & Paulus, M. P. (2004). Anticipation of emotionally aversive visual stimuli activates right insula. Neuroreport, 15, 2261–2265.
Spence, D. P., & Owens, K. C. (1990). Lexical co-occurrence and association strength. Journal of Psycholinguistic Research, 19, 317–330.
Sperry, R. (1950). Neural basis of the spontaneous optokinetic response produced by visual inversion. Journal of Comparative and Physiological Psychology, 43, 482–489.
Summerfield, C., & Egner, T. (2009). Expectation (and attention) in visual cognition. Trends in Cognitive Sciences, 13 (9), 403–408.
Thorpe, S., & Imbert, M. (1989). Biological constraints on connectionist modeling. In R. Pfeifer, Z. Schreter, F. Fogelman-Soulié, & L. Steels (Eds.), Connectionism in perspective (pp. 63–93). Amsterdam: Elsevier.
Thorpe, S. J., Rolls, E. T., & Maddison, S. (1983). Neuronal activity in the orbitofrontal cortex of the behaving monkey. Experimental Brain Research, 49, 93–115.
Todorović, D. (2007). W. Metzger: Laws of seeing. Gestalt Theory, 28, 176–180.
Torralba, A., & Oliva, A. (2003). Statistics of natural image categories. Network, 14 (3), 391–412.
Torralba, A., Oliva, A., Castelhano, M. S., & Henderson, J. M. (2006). Contextual guidance of attention in natural scenes: The role of global features on object search. Psychological Review, 113, 766–786.
Tulving, E., & Schacter, D. L. (1990). Priming and human memory systems. Science, 247, 301–306.
Ullman, S. (1995). Sequence seeking and counter streams: A computational model for bidirectional information flow in the visual cortex. Cerebral Cortex, 1, 1–11.
Ungerleider, L. G., & Desimone, R. (1986). Cortical connections of visual area MT in the macaque. Journal of Comparative Neurology, 248 (2), 190–222.
von Stein, A., Chiang, C., Konig, P., & Lorincz, A. (2000). Top-down processing mediated by interareal synchronization. Proceedings of the National Academy of Sciences U S A, 97 (26), 14748–14753.


von Stein, A., & Sarnthein, J. (2000). Different frequencies for different scales of cortical integration: From local gamma to long range alpha/theta synchronization. International Journal of Psychophysiology, 38 (3), 301–313.
Watanabe, T., Harner, A. M., Miyauchi, S., Sasaki, Y., Nielsen, M., Palomo, D., & Mukai, I. (1998). Task-dependent influences of attention on the activation of human primary visual cortex. Proceedings of the National Academy of Sciences U S A, 95, 11489–11492.
Widmann, A., Gruber, T., Kujala, T., Tervaniemi, M., & Schroger, E. (2007). Binding symbols and sounds: Evidence from event-related oscillatory gamma-band activity. Cerebral Cortex, 17, 2696–2702.
Wiggs, C. L., & Martin, A. (1998). Properties and mechanisms of visual priming. Current Opinion in Neurobiology, 8, 227–233.
Winkler, I., & Czigler, I. (1998). Mismatch negativity: Deviance detection or the maintenance of the “standard.” Neuroreport, 9, 3809–3813.
Wolpert, D. M., & Flanagan, J. R. (2001). Motor prediction. Current Biology, 11 (18), R729–R732.
Wolpert, D. M., & Kawato, M. (1998). Multiple paired forward and inverse models for motor control. Neural Networks, 11 (7–8), 1317–1329.
Zeki, S., & Bartels, A. (1999). Toward a theory of visual consciousness. Consciousness and Cognition, 8, 225–259.

Moshe Bar

Moshe Bar is a neuroscientist, director of the Gonda Multidisciplinary Brain Research Center at Bar-Ilan University, associate professor in psychiatry and radiology at Harvard Medical School, and associate professor in psychiatry and neuroscience at Massachusetts General Hospital. He directs the Cognitive Neuroscience Laboratory at the Athinoula A. Martinos Center for Biomedical Imaging.

Andreja Bubic

Andreja Bubic, Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Harvard Medical School, Charlestown, MA


Neural Underpinning of Object Mental Imagery, Spatial Imagery, and Motor Imagery

Grégoire Borst
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn
Print Publication Date: Dec 2013
Subject: Psychology, Cognitive Neuroscience
Online Publication Date: Dec 2013
DOI: 10.1093/oxfordhb/9780199988693.013.0005

Abstract and Keywords

Mental imagery is one of the cognitive functions that has received a great deal of attention over the past 40 years from both philosophers and cognitive psychologists. Recently, researchers started to use neuroimaging techniques to tackle fundamental properties of mental images, such as their depictive nature, which was fiercely debated for almost 30 years. Results from neuroimaging, brain-damaged patient, and transcranial magnetic stimulation studies converged in showing that visual, spatial, and motor mental imagery rely on the same basic brain mechanisms used, respectively, in visual perception, spatial cognition, and motor control. Thus, neuroimaging and lesion studies have proved critical in resolving the imagery debate between depictive and propositionalist theorists. Partly because of the controversy that surrounded the nature of mental images, the neural bases of mental imagery are probably more closely defined than those of any other higher cognitive function.

Keywords: visual mental imagery, spatial mental imagery, motor mental imagery, visual perception, neuroimaging, brain-damaged patients, transcranial magnetic stimulation

When we think of the best way to load luggage in the trunk of a car, of the fastest route from point A to point B, or of the easiest way to assemble bookshelves, we generally rely on our ability to simulate those events by visualizing them instead of actually performing them. When we do so, we experience “seeing with the mind’s eye,” which is the hallmark of a specific type of representation processed by our brain, namely, visual mental images. According to Kosslyn, Thompson, and Ganis (2006), mental images are representations that are similar to those created in the initial phase of perception but that do not require external stimulation to be created. In addition, those representations preserve the perceptible properties of the stimuli they represent.



Early Studies of Mental Imagery and the Imagery Debate

Visual mental imagery has a very specific place in the study of human mental activity. In fact, dating back to early theories of mental activity, Greek philosophers such as Plato proposed that memory might be analogous to a wax tablet into which our perceptions and thoughts stamp images of themselves, as a signet ring stamps impressions in wax. According to this view, seeing with the mind’s eye is considered a phenomenon closely related to perceptual activities. Thus, the idea of an analogy between mental imagery and perception is not new. However, because of the inherently private and introspective nature of mental imagery, garnering objective empirical evidence of the nature of these representations has been a great challenge for psychology researchers. The introspective nature of imagery led behaviorists (who championed the idea that psychology should focus on observable stimuli and the responses to these stimuli) such as Watson (1913) to deny the existence of mental images by asserting that thinking consisted solely of subtle movements of the vocal apparatus. Behaviorism has had a long-lasting negative impact on the legitimacy of studying mental imagery. In fact, neither the cognitive revolution of the 1950s, during which the human mind started to be conceived of as akin to computer software, nor the first results of Paivio (1971) showing that mental imagery improves the memorization of words were sufficient to legitimize the study of mental imagery.

The revival of mental imagery was driven not only by empirical evidence that mental imagery was a key part of memory, problem solving, and creativity but also by the type of questions and methodologies researchers used. Researchers shifted away from phenomenological questions and introspective methods and started to focus on refining the understanding of the nature of the representations involved in mental imagery and of the cognitive processes that interpret those representations. The innovative idea was to use chronometric data as a “mental tape measure” of the cognitive processes that interpret mental images in order to characterize the properties of the underlying representations and processes. One of the most influential studies that helped mental imagery regain its respectability was conducted by Shepard and Metzler (1971). In their mental rotation paradigm, participants viewed two three-dimensional (3D) objects with several arms, each consisting of small cubes, and decided whether the two objects had the same shape, regardless of differences in their orientations. The key finding was that response times increased linearly with increasing angular disparity between the two objects. The results demonstrated for the first time that people mentally rotated one of the objects into congruence with the orientation of the other object. Other paradigms, such as the image scanning paradigm (e.g., Finke & Pinker, 1982; Kosslyn, Ball, & Reiser, 1978), allowed researchers to characterize not only the properties of the cognitive processes at play in visual mental imagery but also the nature of visual mental images. Critically, the data from these experiments suggested that visual mental images are depictive representations. By depictive, researchers mean that (1) each part of the representation corresponds to a part of the represented object, such that (2) the distances between representations of the parts (in a representational space) correspond to the distances between the parts on the object itself (see Kosslyn et al., 2006).

However, not all researchers interpreted behavioral results in mental imagery studies as evidence that mental images are depictive. For example, Pylyshyn (1973, 2002, 2003a, 2003b, 2007) proposed a propositional account of mental imagery. According to this view, results obtained in mental imagery experiments can be best explained by the fact that participants rely not on visuo-spatial mental images, but instead on descriptive representations (the sort of representations that underlie language). Pylyshyn (1981) championed the idea that the conscious experience of visualizing an object is purely epiphenomenal, as is the power light on an electronic device: the light does not play a functional role in the way the device works. Thus, it became evident that behavioral data would not be sufficient to resolve the mental imagery debate between propositional and depictive researchers. In fact, Anderson (1978) demonstrated that any behavioral data collected in a visual mental imagery experiment could be explained equally well by inferring that depictive representations were processed or that a set of propositions was processed.

As cognitive neuroscience started to elucidate the neural underpinnings of visual perception and of a number of higher cognitive functions, it became evident that neuroimaging data could resolve the imagery debate initiated in the 1970s. The rationale for using neuroimaging to characterize the nature of visual mental images followed directly on the heels of the functional and structural equivalence documented in behavioral studies between visual mental imagery and visual perception (see Kosslyn, 1980). Researchers reasoned that if visual mental imagery relies on representations and cognitive processes similar to those involved during visual perception, then visual mental imagery should rely on the same brain areas that support visual perception (Kosslyn, 1994).

In this chapter we report results collected in positron emission tomography (PET), functional magnetic resonance imaging (fMRI), transcranial magnetic stimulation (TMS), and brain lesion studies, which converged in showing that visual mental imagery relies on the same brain areas as those engaged when one perceives the world or initiates an action. The different methods serve different purposes. For example, fMRI allows researchers to monitor the whole brain at work with good spatial resolution by contrasting the mean blood oxygen level–dependent (BOLD) signal in a baseline condition with the BOLD signal in an experimental condition. However, fMRI provides information on the correlations between the brain areas activated and the tasks performed but not on the causal relations between the two. In contrast, brain-damaged patient and TMS studies can provide such causal relations. In fact, if performance in a particular task is selectively impaired following a virtual lesion (TMS) or an actual brain lesion, the affected brain area plays a causal role in the cognitive processes and representations engaged in that particular task. However, researchers need to rely on previous fMRI or PET studies to decide which specific brain areas to target with TMS or which patients to include in their study. Thus, a comprehensive view of the neural underpinning of any cognitive function requires taking into account data from all of these approaches.

In this chapter, we first discuss and review the results of studies that document an overlap of the neural circuitry in the early visual cortex between visual mental imagery and visual perception. Then, we present studies that specifically look at the neural underpinning of shape-based mental imagery and spatial mental imagery. Finally, we report studies on the neural bases of motor imagery and how they overlap with those recruited when one initiates an action.
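
The linear relation between response time and angular disparity reported by Shepard and Metzler (1971) can be illustrated with a simple least-squares fit. The following is a minimal Python sketch under stated assumptions: the data points are invented for illustration (they are not values from the original study), and the function name `fit_line` is hypothetical. Under a mental rotation account, the slope of the fitted line is the time needed per degree of rotation, so its inverse is an estimated rotation rate.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data, invented for illustration only:
# angular disparity between the two objects (degrees) and
# mean response time for "same shape" judgments (seconds).
angles_deg = [0, 40, 80, 120, 160]
response_times_s = [1.0, 1.8, 2.6, 3.4, 4.2]

slope, intercept = fit_line(angles_deg, response_times_s)

# Seconds per degree, inverted to give degrees per second.
rotation_rate_deg_per_s = 1.0 / slope

print(f"slope: {slope:.3f} s/deg, intercept: {intercept:.2f} s")
print(f"estimated rotation rate: {rotation_rate_deg_per_s:.0f} deg/s")
```

With real data the fit would not be perfect, and the intercept would absorb the time needed for encoding, comparison, and response rather than for rotation itself.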

Visual Mental Imagery and the Early Visual Areas

The early visual cortex comprises Brodmann areas 17 and 18, which receive input from the retina. These visual areas are retinotopically organized: Two objects located close to each other in a visual scene activate neurons in areas 17 and 18 that are relatively close to each other (Sereno et al., 1995). Thus, visual space is represented topographically in the visual cortex using two dimensions: eccentricity and polar angle. “Eccentricity” is the distance from the fovea (i.e., the high-resolution central region of the visual field) of a point projected on the retina. Crucially, the farther away a point is located from the fovea, the more anterior the activation observed in the early visual cortex. Building on the way eccentricity is represented on the cortex, Kosslyn and collaborators (1993) used PET to study whether area 17 was recruited during visual mental imagery of letters. In their task, participants visualized letters, maintained the mental images of the letters for 4 seconds, and then were asked to make a judgment about a visual property of the letters, such as whether the letters possess a straight line. Blood flow was monitored through PET. The hypothesis was that if visual mental images were depictive and recruited topographically organized areas of the visual cortex, then when participants were asked to visualize letters as large as possible (while being entirely visible), the activation of area 17 should be more anterior than when participants visualized letters as small as possible (while remaining visible). The results were consistent with this hypothesis: Small visualized letters activated posterior regions of area 17, whereas large visualized letters recruited more anterior regions of area 17. Kosslyn, Thompson, Kim, and Alpert (1995) replicated the results in a PET study in which participants visualized line drawings of objects previously memorized in boxes of three different sizes. These two studies used a neuroimaging technique with limited spatial resolution, which led some to raise doubts about these results.

However, similar findings were reported when fMRI was used, a technique that provides better spatial resolution of the brain areas activated. For example, Klein, Paradis, Poline, Kosslyn, and Lebihan (2000) in an event-related fMRI study documented an activation of area 17 that started 2 seconds after the auditory cue prompting participants to form a mental image, peaked around 4 to 6 seconds, and dropped off after 8 seconds or so. In a follow-up experiment, Klein et al. (2004) demonstrated that the orientation in which a bowtie-shaped stimulus was visualized modulated the activation of the visual cortex. The activation elicited by visualizing the bowtie vertically or horizontally matched the cortical representations of the vertical and horizontal meridians. Moreover, in an fMRI study, Slotnick, Thompson, and Kosslyn (2005) found that the retinotopic maps produced by the visual presentation of rotating and flickering checkerboard wedges were similar to the ones produced when rotating and flickering checkerboard wedges were visualized. To some extent, the imagery maps were more similar to the perception maps than were the maps produced in an attention-based condition. Finally, Thirion and colleagues (2006) adopted an “inverse retinotopy” approach to infer the content of visual images from the brain activations observed. Participants were asked in a perceptual condition to fixate rotating Gabor patches and in the mental imagery condition to visualize one of the six Gabor patches rotating right or left of a fixation point. The authors were able to predict accurately the stimuli participants had seen and, to a certain degree, the stimuli participants had visualized. Crucially, most of the voxels leading to a correct prediction of the stimuli visualized or presented visually were located in areas 17 and 18.
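
The statement that activation moves anteriorly as eccentricity increases can be made concrete with a simple model of cortical magnification. The sketch below is illustrative only: it assumes an inverse-linear magnification function, M(E) = k / (E + e0), and the default parameter values (`k_mm`, `e0_deg`) are placeholders of a plausible order of magnitude, not values taken from this chapter. Integrating M(E) from 0 to E gives the distance along the cortex between the foveal representation (located posteriorly) and the representation of a point at eccentricity E. The function name and the image sizes are hypothetical.

```python
import math


def cortical_distance_mm(eccentricity_deg, k_mm=17.3, e0_deg=0.75):
    """Distance (mm) from the foveal representation in early visual cortex.

    Assumes magnification M(E) = k_mm / (E + e0_deg) in mm per degree;
    its integral from 0 to E is k_mm * ln(1 + E / e0_deg).
    Parameter defaults are illustrative assumptions.
    """
    if eccentricity_deg < 0:
        raise ValueError("eccentricity must be non-negative")
    return k_mm * math.log(1.0 + eccentricity_deg / e0_deg)


# Hypothetical outer edges of a small and a large visualized letter,
# expressed as eccentricity in degrees of visual angle.
small_image_edge_deg = 0.5
large_image_edge_deg = 8.0

small_extent_mm = cortical_distance_mm(small_image_edge_deg)
large_extent_mm = cortical_distance_mm(large_image_edge_deg)

print(f"small image reaches {small_extent_mm:.1f} mm from the foveal representation")
print(f"large image reaches {large_extent_mm:.1f} mm from the foveal representation")
```

Under this model the mapping is monotonic: an image that extends to greater eccentricities engages cortex farther from the foveal representation, that is, more anteriorly.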

Figure 5.1 Set of stimuli (left) and mean response times for each participant (numbered 1 to 5) in the two experimental conditions (perception vs. mental imagery) as a function of the repetitive transcranial magnetic stimulation (rTMS) condition (real vs. sham).

Taken together, results from fMRI and PET studies converge in showing that visual mental imagery activates the early visual areas and that the spatial structure of the activations elicited by the mental imagery task is accounted for by standard retinotopic mapping. Nevertheless, the question remained as to whether activation of the early visual areas plays any functional role in visual imagery. To address this question, Kosslyn et al. (1999) designed a task in which participants first memorized four patterns of black and white stripes (which varied in the length, width, orientation, and spacing of the stripes; Figure 5.1) in four quadrants, and then were asked to visualize two of the patterns and to compare them on a given visual attribute (such as the orientation of the stripes). The same participants performed the task in a perceptual condition in which their judgments were based on patterns of stripes displayed on the screen. In both conditions, before participants compared two patterns of stripes, repetitive TMS (rTMS) was delivered to the early visual cortex, which had been shown with PET to be activated. In rTMS studies, a coil is used to deliver low-frequency magnetic pulses, which decrease cortical excitability for several minutes in the cortical area targeted (see Siebner et al., 2000). This technique has the advantage that the disruption is reversible and lasts for only a few minutes. In addition, because the disruption is transient, there are no compensatory phenomena as with real lesions. When stimulation was delivered to the posterior occipital lobe (real rTMS condition), participants required more time to compare two patterns of stripes than when stimulation was delivered away from the brain (sham rTMS condition). The effect of real rTMS (as denoted by the difference between the sham and real stimulations; see Figure 5.1) was similar in visual mental imagery and visual perception, which makes sense if area 17 is critical for both.

Sparing et al. (2002) used another TMS approach to determine whether visual mental imagery modulates cortical excitability. The rationale of their approach was to use the phosphene threshold (PT; i.e., the minimum TMS intensity that evokes phosphenes) to determine the cortical excitability of the primary visual areas of the brain. Single-pulse TMS was delivered to the primary visual cortex to produce phosphenes in the right lower quadrant of the visual field. Concurrently, participants performed either a visual mental imagery task or an auditory control task. For each participant, the PT was determined by increasing TMS intensity on each trial until participants reported experiencing phosphenes. Visual mental imagery decreased the PT compared with the baseline condition, whereas the auditory task had no effect on the PT. The results indicate that visual mental imagery enhances cortical excitability in the visual cortex, which supports the functional role of the primary visual cortex in visual mental imagery.
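
The threshold procedure described above (raising TMS intensity trial by trial until a phosphene is reported) can be sketched as an ascending search. This is a simplified illustration, not the protocol of Sparing et al. (2002): the function names, the starting intensity, the step size, and the observers' thresholds are all hypothetical, intensities are expressed in arbitrary percent-of-maximum units, and the simulated observer is deterministic, whereas real reports are variable and thresholds are usually estimated over repeated trials.

```python
def find_phosphene_threshold(reports_phosphene, start=30, step=2, maximum=100):
    """Ascending search: return the first intensity at which a phosphene
    is reported, or None if none is reported up to `maximum`."""
    intensity = start
    while intensity <= maximum:
        if reports_phosphene(intensity):
            return intensity
        intensity += step
    return None


def make_observer(true_threshold):
    """Hypothetical deterministic observer: reports a phosphene whenever
    the stimulation intensity reaches the observer's true threshold."""
    return lambda intensity: intensity >= true_threshold


# Invented thresholds: higher cortical excitability during imagery is
# modeled as a lower true threshold than in the baseline condition.
baseline_pt = find_phosphene_threshold(make_observer(61))
imagery_pt = find_phosphene_threshold(make_observer(55))

print(f"baseline PT: {baseline_pt}, imagery PT: {imagery_pt}")
```

A lower measured PT in the imagery condition than at baseline is the pattern that would indicate enhanced excitability; the step size limits how precisely the threshold can be located.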
Consistent with the role of area 17 in visual mental imagery, the apparent horizontal size of the visual mental images of a patient who had the occipital lobe surgically removed in one hemisphere was half the apparent size of mental images in normal participants (Farah, Soso, & Dasheiff, 1992).

However, not all studies reported a functional role of area 17 in visual mental imagery. In fact, neuropsychological studies offered compelling evidence that cortically blind patients could have spared visual mental imagery abilities (see Bartolomeo, 2002, 2008). Anton (1899) and Goldenberg, Müllbacher, and Nowak (1995) reported cortically blind patients who seemed to be able to form visual mental images. In addition, Chatterjee and Southwood (1995) reported two patients with cortical blindness resulting from medial occipital lesions who showed no impairment of their capacity to imagine object forms, such as capital letters or common animals. These two patients could also draw a set of common objects from memory.

Finally, Kosslyn and Thompson (2003) reviewed more than 50 neuroimaging studies (fMRI, PET, and single-photon emission computed tomography, or SPECT) and found that in nearly half, there was no activation of the early visual cortex. A meta-analysis of the neuroimaging literature on visual mental imagery revealed that three factors accounted for the probability of activation in area 17. Sensitivity of the technique is one of the factors: 19 of 27 fMRI studies reported activation in area 17, compared with only 2 of 9 SPECT studies. The degree of detail of the visual mental images that need to be generated is also important, with high-resolution mental images more likely to elicit activation in area 17. Finally, the type of judgment accounts for the probability of activation in area 17: If a spatial judgment is required, activation in area 17 is less likely. Thus, activation in area 17 most likely reflects the computation needed to generate the visual images, at least for certain types of mental images, such as high-resolution, shape-based mental images.
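
The study counts given above for technique sensitivity can be turned into proportions with a few lines of arithmetic. The snippet below is only a worked calculation on the two counts stated in the text (19 of 27 fMRI studies, 2 of 9 SPECT studies); the variable names are illustrative, and this is not the meta-analytic method used by Kosslyn and Thompson (2003).

```python
# (studies reporting area 17 activation, total studies), from the text.
studies = {
    "fMRI": (19, 27),
    "SPECT": (2, 9),
}

# Proportion of studies reporting activation, per technique.
proportions = {
    technique: activated / total
    for technique, (activated, total) in studies.items()
}

for technique, proportion in proportions.items():
    activated, total = studies[technique]
    print(f"{technique}: {activated}/{total} = {proportion:.0%} report area 17 activation")
```

Roughly 70% of the fMRI studies, against roughly 22% of the SPECT studies, reported activation in area 17, which is the contrast behind the sensitivity factor.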

Visual Mental Imagery and Higher Visual Areas

The overlap of the brain areas engaged by visual perception and mental imagery was studied not only in early visual areas but also in higher visual areas. The visual system is organized hierarchically, with early visual cortical areas (areas 17 and 18) located on the lowest level (see Felleman & Van Essen, 1991). Brain lesion and neuroimaging studies document that the visual system is then organized in two parallel streams with different functions (e.g., Goodale & Milner, 1992; Haxby et al., 1991; Ungerleider & Mishkin, 1982). The ventral stream (running from the occipital lobes down to the inferior temporal lobes) is specialized in processing object properties of percepts (such as shape, color, and texture), whereas the dorsal stream (running from the occipital lobes up to the posterior parietal lobes) is specialized in processing spatial properties of percepts (such as orientation and location) and action (but see Borst, Thompson, & Kosslyn, 2011, for a discussion). A critical finding is that parallel deficits occur in visual mental imagery (e.g., Levine, Warach, & Farah, 1985): Damage to the ventral stream disrupts the ability to visualize the shape of objects (such as the shape of a stop sign), whereas damage to the dorsal stream disrupts the ability to create a spatial mental image (such as the locations of landmarks on a map).

In the next section, we review neuroimaging studies and studies of brain-damaged patients showing that shape-based mental imagery (including mental images of faces) and visual perception engage most of the same higher visual areas in the ventral stream and that spatial mental imagery and spatial vision recruit most of the same areas in the dorsal stream.

Ventral Stream, Shape-Based Mental Imagery and Color Imagery

Brain imaging and neuropsychological data document a spatial segregation of visual object representations in the higher visual areas. For example, Kanwisher and Yovel (2006) demonstrated that the lateral fusiform gyrus responds more strongly to pictures of faces than to other categories of objects, whereas the medial fusiform gyrus and the parahippocampal gyri respond selectively to pictures of buildings (Downing, Chan, Peelen, Dodds, & Kanwisher, 2006).

To demonstrate the similarity between the cognitive processes and representations in vision and visual mental imagery, researchers investigated whether the spatial segregation of visual objects in the ventral stream can be found during shape-based mental imagery. Following this logic, O’Craven and Kanwisher (2000) asked a group of participants either to recognize pictures of familiar faces and buildings or to visualize those pictures in an fMRI study. In the perceptual condition, a direct comparison of activation elicited by the two types of stimuli (buildings and faces) revealed a clear segregation within the ventrotemporal cortex, with activation found in the fusiform face area (FFA) for faces and in the parahippocampal place area (PPA) for buildings. In the visual mental imagery condition, the same pattern was observed but with weaker activation and smaller patches of cortex activated. Crucially, there was no hint of activation of the PPA when participants visualized faces, nor of the FFA when they visualized buildings. The similarity between vision and mental imagery in the higher visual areas was further demonstrated by the fact that more than 84 percent of the voxels activated in the mental imagery condition were activated in the perceptual condition.

These results were replicated in another fMRI study (Ishai, Ungerleider, & Haxby, 2000). In this study, participants were asked either to view passively pictures of three object categories (i.e., faces, houses, and chairs), to view scrambled versions of these pictures (perceptual control condition), to visualize the pictures while looking at a gray background, or to stare passively at the gray background (mental imagery control condition). When the activation elicited by the three object categories was compared in the perceptual condition, after removing the activation in the respective control condition, different regions in the ventral stream showed differential responses to faces (FFA), houses (PPA), and chairs (inferior temporal gyrus). Fifteen percent of the voxels in these three ventral stream regions showed a similar pattern of activation in the mental imagery condition. Mental images of the three categories of objects elicited additional activation in the parietal and the frontal regions that was not found in the perceptual condition.
In a follow-up study, Ishai, Haxby, and Ungerleider (2002) studied the activation elicited by famous faces either presented visually or visualized. In the mental imagery condition, participants studied pictures of half of the famous faces before the experiment. For the other half of the trials, participants had to rely on their long-term memory to generate the mental images of the faces. In the mental imagery and perceptual conditions, the FFA (lateral fusiform gyrus) was activated, and 25 percent of the voxels activated in the mental imagery condition were within regions recruited during the perceptual condition. The authors found that activation within the FFA was stronger for faces studied before the experiment than for faces generated on the basis of information stored in long-term memory. In addition, given that visual attention did not modulate the activity recorded in higher visual areas, Ishai and colleagues argued that attention and mental imagery are dissociated to some degree.

Finally, although mental imagery and perception recruit the same category-selective areas in the ventral stream, these areas are activated predominantly through bottom-up inputs during perception and through top-down inputs during mental imagery. In fact, a new analysis of the data reported by Ishai et al. (2000) revealed that the functional connectivity of ventral stream areas was stronger with the early visual areas in visual perception, whereas during visual mental imagery, stronger functional connections were found between the higher visual areas and the frontal and parietal areas (Mechelli, Price, Friston, & Ishai, 2004).

A recent fMRI study further supported the similarity of the brain areas recruited in the ventral stream during visual mental imagery and visual perception (Stokes, Thompson, Cusack, & Duncan, 2009). In this study, participants were asked to visualize an “X” or an “O” based on an auditory cue, or to view passively the two capital letters displayed on a computer screen. During both conditions (i.e., visual mental imagery and visual perception), the visual cortex was significantly activated. Above-baseline activation was recorded in the calcarine sulcus, cuneus, and lingual gyrus, and it extended to the fusiform and middle temporal gyri. In addition, in both conditions, a multivoxel pattern analysis restricted to the anterior and posterior regions of the lateral occipital cortex (LOC) revealed that different populations of neurons code for the two types of stimuli (“X” and “O”). Critically, a perceptual classifier trained on patterns of activation elicited by the perceptual presentation of the stimuli was able to predict the type of visual images generated in the mental imagery condition. The data indicate that mental imagery and visual perception activate shared content-specific representations in high-level visual areas, including the LOC.

Brain lesion studies generally present converging evidence that mental imagery and perception rely on the same cortical areas in the ventral stream (see Ganis, Thompson, Mast, & Kosslyn, 2003). The logic underlying the brain lesion studies is that if visual mental imagery and perception engage the same visual areas, then the same pattern of impairment should be observed in the two functions. Moreover, given that different visual mental images (i.e., houses vs.
faces) selectively elicit activation in different areas of the ventral stream, an impairment in one domain of mental imagery (color or shape) should be accompanied by a parallel deficit in this specific domain in visual perception. In fact, patients with impairment in face recognition (i.e., prosopagnosia) are impaired in their ability to visualize faces (Shuttleworth, Syring, & Allen, 1982; Young, Humphreys, Riddoch, Hellawell, & De Haan, 1994). In a single case study, a selective deficit in identifying animals was accompanied by a similar deficit in describing animals or drawing them from memory (Sartori & Job, 1988). In addition, as revealed by an early review of the literature (Farah, 1984), object agnosia was generally associated with a deficit in the ability to visualize objects. Even finer parallel deficits can be observed in the ventral stream. For example, Farah, Hammond, Mehta, and Ratcliff (1989) reported the case of a prosopagnosic patient with a specific deficit in his ability to visualize living things (such as animals or faces) but not in his ability to visualize nonliving things. In addition, some brain-damaged patients cannot distinguish colors perceptually and present similar deficits in color imagery (e.g., Rizzo, Smith, Pokorony, & Damasio, 1993). Critically, patients with color perception deficits have good general mental imagery abilities but are specifically impaired in color mental imagery tasks (e.g., Riddoch & Humphreys, 1987).

However, not all neuropsychological findings report parallel deficits in mental imagery and perception. Some patients were reported to have altered perception but preserved imagery (e.g., Bartolomeo et al., 1998; Behrmann, Moscovitch, & Winocur, 1994; Servos & Goodale, 1995). For example, Behrmann et al. (1994) reported the case of C.K., a brain-damaged patient with a left homonymous hemianopia and a possible thinning of the occipital cortex (as revealed by PET and MRI scans) who was severely impaired at recognizing objects but who had no apparent deficit in shape-based mental imagery. In fact, C.K. could draw objects with considerable detail from memory and could use information derived from visual images in a variety of tasks. Yet he could not identify objects presented visually, even those he drew from memory. A similar dissociation between perceptual impairments and relatively spared mental imagery abilities was observed in Madame D. (Bartolomeo et al., 1998). Following bilateral brain lesions to the extrastriate visual areas (i.e., Brodmann areas 18 and 19 bilaterally and 37 in the right hemisphere), Madame D. developed severe alexia, agnosia, prosopagnosia, and achromatopsia. Her ability to recognize objects presented visually was severely impaired except for very simple shapes like geometric figures. In contrast, she could draw objects from memory, but she could not identify them. She performed well on an object mental imagery test. Her impairment was not restricted to shape processing. In fact, she could not discriminate between colors, match colors, or point to the correct color. In sharp contrast, she presented no deficit in color imagery, being able, for example, to determine which of two objects had a darker hue when presented with a pair of object names.

In some instances, studies reported the reverse pattern of dissociation, with relatively normal perception associated with deficits in visual mental imagery (e.g., Goldenberg, 1992; Guariglia, Padovani, Pantano, & Pizzamiglio, 1993; Jacobson, Pearson, & Robertson, 2008; Moro, Berlucchi, Lerch, Tomaiuolo, & Aglioti, 2008).
For example, of two patients who performed a battery of mental imagery tests in several sensory domains (visual, tactile, auditory, gustatory, olfactory, and motor), one showed a pure visual imagery deficit and the other showed visual and tactile imagery deficits. Critically, the two patients had no apparent perceptual, language, or memory deficits (Moro et al., 2008). Lesions were located in the middle and inferior temporal gyri of the left hemisphere in one patient and in the temporo-occipital area and the left medial and superior parietal lobe in the other patient.

The fact that some brain-damaged patients can present spared mental imagery with a deficit in visual perception, or spared visual perception with a deficit in mental imagery, could reveal a double dissociation between shape- and color-based imagery and visual perception. In fact, visualizing an object relies on top-down processes that are not always necessary to perceive this object, whereas perceiving an object relies on bottom-up organizational processes not required to visualize it (e.g., Ganis, Thompson, & Kosslyn, 2004; Kosslyn, 1994). This double dissociation is supported by the fact that not all of the same brain areas are activated during visual mental imagery and visual perception (Ganis et al., 2004; Kosslyn, Thompson, & Alpert, 1997). In an attempt to quantify the similarity between visual mental imagery and visual perception, Ganis et al. (2004), in an fMRI study, asked participants to judge visual properties of objects (such as whether the object was taller than wide) based either on a visual mental image of that object or on a picture of that object presented visually. Across the entire brain, the amount of overlap of the brain regions activated during visual mental imagery and visual perception reached 90 percent. The amount of overlap in activation was smaller in the occipital and temporal lobes than in the frontal and parietal lobes, which suggests that perception relies in part on bottom-up organizational processes that are not used as extensively during mental imagery. However, visual imagery elicited activation in regions that were a subset of the regions activated during the perceptual condition.
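Overlap figures of the kind discussed above depend on which condition supplies the denominator. The following Python sketch illustrates one simple way such a percentage could be computed; the function `overlap_fraction` and the voxel indices are hypothetical and are not taken from any of the studies cited.

```python
def overlap_fraction(reference_voxels, other_voxels):
    """Fraction of the reference condition's active voxels also active in the other."""
    reference = set(reference_voxels)
    if not reference:
        raise ValueError("the reference condition has no active voxels")
    return len(reference & set(other_voxels)) / len(reference)


# Hypothetical indices of suprathreshold voxels, for illustration only.
imagery_active = [3, 4, 5, 8, 9]
perception_active = [1, 2, 3, 4, 5, 6, 7, 8]

# Share of imagery voxels that were also active during perception.
imagery_in_perception = overlap_fraction(imagery_active, perception_active)
# Share of perception voxels that were also active during imagery.
perception_in_imagery = overlap_fraction(perception_active, imagery_active)

print(imagery_in_perception, perception_in_imagery)
```

In this toy example, 4 of the 5 imagery voxels fall within the perception set, but only 4 of the 8 perception voxels fall within the imagery set. The measure is asymmetric, so percentages computed with different denominators are not directly comparable.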

Dorsal Stream and Spatial Mental Imagery

In the same way that researchers have studied brain areas in the ventral stream involved in shape- and color-based mental imagery, researchers have identified brain areas recruited during spatial mental imagery in the dorsal stream. A number of neuroimaging studies used a well-understood mental imagery phenomenon to investigate the brain areas engaged during spatial mental imagery, namely, the image scanning paradigm.

In the image scanning paradigm, participants first learn a map of an island with a number of landmarks, then they mentally scan the distance between each pair of landmarks after hearing the names of a pair of landmarks (e.g., Denis & Cocude, 1989; Kosslyn et al., 1978). The landmarks are positioned in such a way that distances between each pair of landmarks are different. The classic finding is a linear increase of response times with increasing distance between landmarks (see Denis & Kosslyn, 1999). The linear relationship between distance and scanning times suggests that spatial images incorporate the metric properties of the objects they represent, which constitutes some of the evidence that spatial images depict information. In a PET study, Mellet, Tzourio, Denis, and Mazoyer (1995) investigated the neural basis of image scanning. After learning the map of a circular island, participants were asked either to scan between each landmark on a map presented visually in clockwise or counterclockwise direction or to scan a mental image of the same map in the same way. When compared with a rest condition, both conditions elicited brain activation in the bilateral superior external occipital regions and in the left internal parietal region (precuneus). However, primary visual areas were activated only in the perceptual condition.
fMRI studies provided further evidence that spatial processing of spatial images and spatial processing of the same material presented visually share the same brain areas in the dorsal stream (e.g., Trojano et al., 2000, 2004). For example, Trojano et al. (2000) asked participants to visualize two analogue clock faces and then to decide on which of them the clock hands formed the greater angle. In the perceptual task, the task of the participants was identical, but the two clock faces were presented visually. When compared with a control condition (i.e., participants judged which of the two times was numerically greater), the mental imagery condition elicited activation in the posterior parietal cortex and several frontal regions. In both conditions, brain activation was found in the intraparietal sulcus (IPS). Critically, when the two conditions (imagery and perception) were directly contrasted, the activity in the IPS was no longer observed. The neuroimaging data suggest that the IPS supports spatial processing of mental images and of visual percepts. In a follow-up study using the clock-face mental imagery task in an event-related fMRI study, Formisano et al. (2002) found similar activation of the posterior parietal cortex with a peak of activation in the IPS 2 seconds after the auditory presentation of the hours to visualize.

Interestingly, the frontoparietal network at play during spatial imagery is not restricted to the processing of static spatial representations. In fact, Kaas, Weigelt, Roebroeck, Kohler, and Muckli (2010) used fMRI to study the brain areas recruited when participants imagined objects in movement. In the motion imagery task, participants were asked to visualize a blue ball moving back and forth within either the upper right corner or the lower left corner of a computer screen. Participants imagined the motion of the ball at different speeds, adjusted as a function of the duration of an auditory cue. To determine whether participants visualized the ball at the correct speed, participants were required, upon hearing a specific auditory cue, to decide which of two visual targets was closer to the imagined blue ball. The motion imagery task elicited activation in a parietofrontal network comprising bilaterally the superior and inferior parietal lobules (areas 7 and 40) and the superior frontal gyrus (area 6), in addition to activation in the left middle occipital gyrus and hMT/V5+. Finally, in V1, V2, and V3, a negative BOLD response was found. Kaas and colleagues argue that this negative BOLD signal might reflect an inhibition of these areas to prevent visual inputs from interfering with motion imagery in higher visual areas such as hMT/V5+.

The recruitment of the dorsal route for spatial imagery is not restricted to the modality in which information is presented. Mellet et al. (2002) found similar activation in a parietofrontal network (i.e., intraparietal sulcus, presupplementary motor area, and superior frontal sulcus) when participants mentally scanned an environment described verbally or an environment learned visually.
Activation of similar brain areas in the dorsal route is also observed when participants generate spatial images of cubes assembled on the basis of verbal information (Mellet et al., 1996). In addition, neuroimaging studies on blind participants suggest that representations and cognitive processes in spatial imagery are not visuo-spatial. For example, during a spatial mental imagery task, the superior occipital (area 19), the precuneus, and the superior parietal lobes (area 7) were activated in the same way in sighted and early blind participants (Vanlierde, de Volder, Wanet-Defalque, & Veraart, 2003). The task required participants to generate a pattern in a 6 × 6 grid by filling in cells based on verbal instructions. Once they generated the mental image of the pattern, participants judged the symmetry of this pattern. The fact that vision is not necessary to form and to process spatial images was further demonstrated in an rTMS study. Aleman et al. (2002) found that participants required more time to determine whether a cross presented visually “fell” on the uppercase letter they visualized in a real rTMS condition (compared with a sham rTMS condition) only when repetitive pulses were delivered to the posterior parietal cortex (P4 position) but not when they were delivered to the early visual cortex (Oz position).

The functional role of the dorsal route in spatial mental imagery is supported by results collected on brain-damaged patients (e.g., Farah et al., 1988; Levine et al., 1985; Luzzatti, Vecchi, Agazzi, Cesa-Bianchi, & Vergani, 1998; Morton & Morris, 1995). For example, Morton and Morris (1995) reported a patient called M.G. with a left parieto-occipital lesion who was selectively impaired in visuo-spatial processing. M.G. had no deficit in face recognition and visual memory tests nor in an image inspection task. In contrast, she was impaired not only on a mental rotation task but also on an image scanning task. She could learn the map of the island and indicate the correct positions of the landmarks, but she was not able to mentally scan the distance between the landmarks. She presented a similar deficit when asked to scan the contour of block letters.

Motor Imagery

In the previous sections, we presented evidence that visual mental imagery and spatial imagery rely on the same brain areas as the ones engaged during vision and spatial vision, respectively. Given that motor imagery occurs when a movement is mentally simulated, motor imagery should recruit brain areas involved in physical movement. And in fact there is a growing body of evidence that motor areas are activated during motor imagery. In the next section, we review evidence that motor imagery engages the same brain areas as the ones recruited during a physical movement, including in some instances the primary motor cortex, and that motor imagery is one of the strategies used to transform mental images.

Motor Imagery and Physical Movement

Decety and Jeannerod (1995) demonstrated that if one is asked to mentally walk from point A to point B, the time to complete this “mental travel” is similar to the time one would take to walk that distance. This mental travel effect (i.e., similarity of the time to imagine an action and the time to perform that action) constitutes strong evidence that motor imagery is crucial to simulating actual physical movements. Motor imagery is a particular type of mental imagery and differs from visual imagery (and to a certain extent from spatial imagery). In fact, a number of studies have documented that visual mental imagery and motor imagery rely on distinct mechanisms and brain areas (Tomasino, Borroni, Isaja, & Rumiati, 2005; Wraga, Shepard, Church, Iniati, & Kosslyn, 2005; Wraga, Thompson, Alpert, & Kosslyn, 2003). A single-cell recording study of the motor strip of monkeys first demonstrated that motor imagery relies partially on areas of the cortex that carry out motor control: Neurons in the motor cortex fired in sequence depending on their orientation tuning while monkeys were planning to move a lever along a specific arc (Georgopoulos, Lurito, Petrides, Schwartz, & Massey, 1989). Crucially, the neurons fired when the animals were preparing to move their arms, not actually moving them.

To study motor imagery in humans, researchers often used mental rotation paradigms. In the seminal mental rotation paradigm designed by Shepard and Metzler (1971), a pair of 3D objects with several arms (each consisting of small cubes) is presented visually (Figure 5.2). The task of the participants is to decide whether the two objects have the same shape, regardless of differences in their orientation. The key finding is that the time to make this judgment increases linearly as the angular disparity between the two objects increases (i.e., mental rotation effect). Subsequent studies showed that the mental rotation effect is found with alphanumerical stimuli (e.g., Cooper & Shepard, 1973; Koriat & Norman, 1985), two-dimensional line drawings of letter-like asymmetrical characters (e.g., Tarr & Pinker, 1989), and pictures of common objects (e.g., Jolicoeur, 1985).
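The mental rotation effect described above is usually summarized by the slope of a straight line relating response time to angular disparity. As a minimal sketch, assuming an ordinary least-squares fit, the slope can be estimated as follows in Python. The function `fit_line` and the data values are hypothetical and were constructed for illustration; they are not results from any study cited here.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: angular disparity (degrees) and mean response time (ms).
angles = [0, 40, 80, 120, 160]
response_times = [1000, 1800, 2600, 3400, 4200]

slope, intercept = fit_line(angles, response_times)
print(f"{slope:.1f} ms per degree, intercept {intercept:.0f} ms")
```

Under this reading, the slope estimates the additional time needed per degree of rotation, and the intercept estimates the time taken by processes that do not depend on angular disparity, such as encoding the stimuli and responding.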

Figure 5.2 Example of a pair of Shepard and Metzler-like three-dimensional objects with (a) identical and (b) different shapes with a 50-degree rotation of the object on the right.

Richter et al. (2000), in an fMRI study, found that mental rotation of Shepard and Metzler stimuli elicited activation in the superior parietal lobes bilaterally, the supplementary motor cortex, and the left primary motor cortex. Results from a hand mental rotation study provided additional evidence that motor processes were involved during image transformation (Parsons et al., 1995). Pictures of hands were presented in the right or left visual field with different orientations, and participants determined whether each picture depicted a left or right hand. Parsons and colleagues reasoned that the motor cortex would be recruited if participants mentally rotated their own hand in congruence with the orientation of the stimulus presented to make their judgment. Bilateral activation was found in the supplementary motor cortex, and critically, activation in the prefrontal and the insular premotor areas occurred in the hemisphere contralateral to the stimulus handedness. Activation was not restricted to brain areas that implement motor functions; significant activation was also reported in the frontal and parietal lobes as well as in area 17.

According to Decety (1996), image rotation occurs because we anticipate what we would see if we manipulated an object, which implies that motor areas are recruited during mental rotation regardless of the category of objects rotated. Kosslyn, DiGirolamo, Thompson, and Alpert (1998), in a PET study, directly tested this assumption by asking participants to mentally rotate either inanimate 3D armed objects or pictures of hands. In both conditions, the two objects (or the two hands) were presented with different angular disparities, and participants judged whether the two objects (or hands) were identical. To determine the brain areas specifically activated during mental rotation, each experimental condition was compared with a baseline condition in which the two objects (or hands) were presented in the same orientation. The researchers found activation in the primary motor cortex (area M1), premotor cortex, and posterior parietal lobe when participants rotated hands. In contrast, none of the frontal motor areas was activated when participants mentally rotated inanimate objects. The findings suggest that there are at least two ways objects in images can be rotated: one that relies heavily on motor processes, and one that does not. However, the type of stimuli rotated might not predict when the motor cortex is recruited. In fact, Cohen et al. (1996), in an fMRI study, found that motor areas were activated in half of the participants in a mental rotation task using 3D armed objects similar to those used in the Kosslyn et al. (1998) study.

Strategies in Mental Rotation Tasks

The fact that mental rotation of inanimate objects elicits activation in frontal motor areas in some participants but not others suggests that there might be more than one strategy to rotate this type of object. Kosslyn, Thompson, Wraga, and Alpert (2001) tested whether, in a mental rotation task of 3D armed objects, participants could imagine the rotation of objects in two different ways: as if an external force (such as a motor) was rotating the objects (i.e., external action condition), or as if the objects were being physically manipulated (i.e., internal action condition). Participants received different sets of instructions and practice procedures to prompt them to use one of the two strategies (external action vs. internal action). In the practice of the external action condition, a wooden model of a typical Shepard and Metzler object was rotated by an electric motor. In contrast, in the internal action condition, participants rotated the wooden model physically. The object used during practice was not used on the experimental trials. On each new set of trials, participants were instructed to mentally rotate the object in exactly the same way the wooden model was rotated in the preceding practice session. The crucial finding was that area M1 was activated when participants mentally rotated the object on the internal action trials but not on the external action trials. However, posterior parietal and secondary motor areas were recruited in both conditions. The results have two implications: First, mental rotation in general (independently of the type of stimuli) can be achieved by imagining the physical manipulation of the object. Second, participants can adopt one or the other strategy voluntarily regardless of their cognitive styles or cognitive abilities.

However, the previous study left open the question of whether one can spontaneously use a motor strategy to perform a mental rotation task of inanimate objects. Wraga et al. (2003) addressed this issue in a PET study. In their experiment, participants performed either a mental rotation task of pictures of hands (similar to the one used by Kosslyn et al., 1998) and then a Shepard and Metzler rotation task, or two Shepard and Metzler tasks. The authors reasoned that for the group that started with the mental rotation task of hands, motor processes involved in the hand rotation task would covertly transfer to the Shepard and Metzler task. In fact, when the brain activation in the two groups of participants was compared in the second mental rotation task (Shepard and Metzler task in both groups), activation in the motor areas (areas 6 and M1) was found only in the group that performed a hand rotation task before the Shepard and Metzler task (Figure 5.3). The results demonstrate that motor processes can be used spontaneously to mentally rotate objects that are not body parts.


Functional Role of Area M1

Figure 5.3 Brain activations observed in the internal action minus the external action conditions.

The studies we reviewed suggest that area M1 plays a role in the mental transformation of objects. However, none addressed whether M1 plays a functional role in mental transformation and more specifically in mental rotation of objects. To address this issue, Ganis, Keenan, Kosslyn, and Pascual-Leone (2000) administered single-pulse TMS to the left primary motor cortex of participants while they performed mental rotations of line drawings of hands or feet presented in their right visual field. Single-pulse TMS was administered at different time intervals from the stimulus onset (400 or 650 ms) to determine when primary motor areas are recruited during mental rotation. In addition, to test whether mental rotation of body parts is achieved by imagining the movement of the corresponding part of the body, single-pulse TMS was delivered specifically to the hand area of M1. Participants required more time and made more errors when a single-pulse TMS was delivered to M1, when the single-pulse TMS was delivered 650 ms rather than 400 ms after stimulus onset, and when participants mentally rotated hands rather than feet. Within the limits of the spatial resolution of the TMS methodology, the results suggest that M1 is required to perform mental rotation of body parts by mapping the movement onto one’s own body part but only after the visual and spatial relations of the stimuli have been encoded.

Tomasino et al. (2005) reported converging data supporting the functional role of M1 in mental rotation by using a mental rotation task of hands in a TMS study. However, the data are not sufficient to claim that the computations are actually taking place in M1. It is possible that M1 relays information computed elsewhere in the brain (such as in the posterior parietal cortex). And in fact, Sirigu, Duhamel, Cohen, Pillon, Dubois, and Agid (1996) demonstrated that the parietal cortex, not the motor cortex, is critical to generate mental movement representations. Patients with lesions restricted to the parietal cortex showed a deficit in predicting the time necessary to perform specific finger movements, whereas no such deficit was reported for a patient with lesions restricted to M1.

Conclusion

Some remain dubious that mental imagery can be functionally meaningful and can constitute a topic of research in its own right. However, by moving away from a purely introspective approach to mental imagery and embracing more objective approaches, notably neuroimaging, researchers have collected evidence that mental images are depictive representations interpreted by cognitive processes at play in other systems, such as the perceptual and motor systems. In fact, we hope that this review of the literature has made clear that there is little evidence against the ideas that most of the neural processes underlying perception are also used in visual mental imagery and that motor imagery can recruit the motor system in much the same way that physical action does.

Researchers now rely on what is known of the organization of the perceptual and motor systems, and of the key features of the neural mechanisms in those systems, to refine the characterization of the cognitive mechanisms at play in the mental imagery system. The encouraging note is that each new characterization of the perceptual and motor systems brings a chance to better understand the neural mechanisms at play in mental imagery. Finally, with the ongoing development of more elaborate neuroimaging techniques and analyses of the BOLD signal, mental imagery researchers have an increasing set of tools at their disposal to resolve complicated questions about mental imagery.

A number of questions remain to be answered in order to achieve a full understanding of the neural mechanisms underlying shape, color, spatial, and motor imagery. For example, although much evidence points toward an overlap of perceptual and visual mental imagery processes in high-level visual cortices (temporal and parietal lobes), evidence remains mixed at this point concerning the role of lower level processes in visual mental imagery. Indeed, we need to understand the circumstances under which the early visual cortex is recruited during mental imagery. Another problem that warrants further investigation is the neural basis of the individual differences observed in mental imagery abilities. As a prerequisite, we need to develop objective methods to measure individual differences in those abilities.

References

Aleman, A., Schutter, D. J. L. G., Ramsey, N. F., van Honk, J., Kessels, R. P. C., Hoogduin, J. M., Postma, A., Kahn, R. S., & de Haan, E. H. F. (2002). Functional anatomy of top-down visuospatial processing in the human brain: Evidence from rTMS. Cognitive Brain Research, 14, 300–302.

Anderson, J. R. (1978). Arguments concerning representations for mental imagery. Psychological Review, 85, 249–277.

Anton, G. (1899). Über die Selbstwahrnehmungen der Herderkrankungen des Gehirns durch den Kranken bei Rindenblindheit. Archiv für Psychiatrie und Nervenkrankheiten, 32, 86–127.

Bartolomeo, P. (2002). The relationship between visual perception and visual mental imagery: A reappraisal of the neuropsychological evidence. Cortex, 38, 357–378.

Bartolomeo, P. (2008). The neural correlates of visual mental imagery: An ongoing debate. Cortex, 44, 107–108.

Bartolomeo, P., Bachoud-Levi, A. C., De Gelder, B., Denes, G., Dalla Barba, G., Brugieres, P., et al. (1998). Multiple-domain dissociation between impaired visual perception and preserved mental imagery in a patient with bilateral extrastriate lesions. Neuropsychologia, 36, 239–249.

Behrmann, M., Moscovitch, M., & Winocur, G. (1994). Intact visual imagery and impaired visual perception in a patient with visual agnosia. Journal of Experimental Psychology: Human Perception and Performance, 20, 1068–1087.

Borst, G., Thompson, W. L., & Kosslyn, S. M. (2011). Understanding the dorsal and ventral systems of the cortex: Beyond dichotomies. American Psychologist, 66, 624–632.

Chatterjee, A., & Southwood, M. H. (1995). Cortical blindness and visual imagery. Neurology, 45.

Cohen, M. S., Kosslyn, S. M., Breiter, H. C., DiGirolamo, G. J., Thompson, W. L., Bookheimer, S. Y., Belliveau, J. W., & Rosen, B. R. (1996). Changes in cortical activity during mental rotation: A mapping study using functional MRI. Brain, 119, 89–100.

Cooper, L. A., & Shepard, R. N. (1973). Chronometric studies of the rotation of mental images. In W. G. Chase (Ed.), Visual information processing (pp. 75–176). New York: Academic Press.

Decety, J. (1996). Neural representation for action. Reviews in the Neurosciences, 7, 285–297.

Decety, J., & Jeannerod, M. (1995). Mentally simulated movements in virtual reality: Does Fitts’s law hold in motor imagery? Behavioral Brain Research, 72, 127–134.

Denis, M., & Cocude, M. (1989). Scanning visual images generated from verbal descriptions. European Journal of Cognitive Psychology, 1, 293–307.

Denis, M., & Kosslyn, S. M. (1999). Scanning visual mental images: A window on the mind. Current Psychology of Cognition, 18, 409–465.

Downing, P. E., Chan, A. W., Peelen, M. V., Dodds, C. M., & Kanwisher, N. (2006). Domain specificity in visual cortex. Cerebral Cortex, 16, 1453–1461.


Farah, M. J. (1984). The neurological basis of mental imagery: A componential analysis. Cognition, 18, 245–272.

Farah, M. J., Hammond, K. M., Mehta, Z., & Ratcliff, G. (1989). Category-specificity and modality-specificity in semantic memory. Neuropsychologia, 27, 193–200.

Farah, M. J., Soso, M. J., & Dasheiff, R. M. (1992). Visual angle of the mind’s eye before and after unilateral occipital lobectomy. Journal of Experimental Psychology: Human Perception and Performance, 18, 241–246.

Felleman, D. J., & Van Essen, D. C. (1991). Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex, 1, 1–47.

Finke, R. A., & Pinker, S. (1982). Spontaneous imagery scanning in mental extrapolation. Journal of Experimental Psychology: Learning, Memory and Cognition, 8, 142–147.

Formisano, E., Linden, D. E. J., Di Salle, F., Trojano, L., Esposito, F., Sack, A. T., Grossi, D., Zanella, F. E., & Goebel, R. (2002). Tracking the mind’s image in the brain I: Time-resolved fMRI during visuospatial mental imagery. Neuron, 35, 185–194.

Ganis, G., Keenan, J. P., Kosslyn, S. M., & Pascual-Leone, A. (2000). Transcranial magnetic stimulation of primary motor cortex affects mental rotation. Cerebral Cortex, 10, 175–180.

Ganis, G., Thompson, W. L., & Kosslyn, S. M. (2004). Brain areas underlying visual mental imagery and visual perception: An fMRI study. Brain Research: Cognitive Brain Research, 20, 226–241.

Ganis, G., Thompson, W. L., Mast, F. W., & Kosslyn, S. M. (2003). Visual imagery in cerebral visual dysfunction. Neurologic Clinics, 21, 631–646.

Georgopoulos, A. P., Lurito, J. T., Petrides, M., Schwartz, A. B., & Massey, J. T. (1989). Mental rotation of the neuronal population vector. Science, 243, 234–236.

Goldenberg, G. (1992). Loss of visual imagery and loss of visual knowledge: A case study. Neuropsychologia, 30, 1081–1099.

Goldenberg, G., Müllbacher, W., & Nowak, A. (1995). Imagery without perception: A case study of anosognosia for cortical blindness. Neuropsychologia, 33, 1373–1382.

Goodale, M. A., & Milner, A. D. (1992). Separate visual pathways for perception and action. Trends in Neurosciences, 15, 20–25.

Guariglia, C., Padovani, A., Pantano, P., & Pizzamiglio, L. (1993). Unilateral neglect restricted to visual imagery. Nature, 364, 235–237.

Haxby, J. V., Grady, C. L., Horwitz, B., Ungerleider, L. G., Mishkin, M., Carson, R. E., Herscovitch, P., Schapiro, M. B., & Rapoport, S. I. (1991). Dissociation of object and spatial visual processing pathways in human extrastriate cortex. Proceedings of the National Academy of Sciences USA, 88, 1621–1625.

Ishai, A., Haxby, J. V., & Ungerleider, L. G. (2002). Visual imagery of famous faces: Effects of memory and attention revealed by fMRI. NeuroImage, 17, 1729–1741.

Ishai, A., Ungerleider, L. G., & Haxby, J. V. (2000). Distributed neural systems for the generation of visual images. Neuron, 28, 979–990.

Jacobson, L. S., Pearson, P. M., & Robertson, B. (2008). Hue-specific color memory impairment in an individual with intact color perception and color naming. Neuropsychologia, 46, 22–36.

Jolicoeur, P. (1985). The time to name disoriented natural objects. Memory and Cognition, 13, 289–303.

Kaas, A., Weigelt, S., Roebroeck, A., Kohler, A., & Muckli, L. (2010). Imagery of a moving object: The role of occipital cortex and human MT/V5+. NeuroImage, 49, 794–804.

Kanwisher, N., & Yovel, G. (2006). The fusiform face area: A cortical region specialized for the perception of faces. Philosophical Transactions of the Royal Society of London B, 361, 2109–2128.

Klein, I., Dubois, J., Mangin, J. F., Kherif, F., Flandin, G., Poline, J. B., Denis, M., Kosslyn, S. M., & Le Bihan, D. (2004). Retinotopic organization of visual mental images as revealed by functional magnetic resonance imaging. Brain Research: Cognitive Brain Research, 22, 26–31.

Klein, I., Paradis, A.-L., Poline, J.-B., Kosslyn, S. M., & Le Bihan, D. (2000). Transient activity in human calcarine cortex during visual imagery. Journal of Cognitive Neuroscience, 12, 15–23.

Koriat, A., & Norman, J. (1985). Reading rotated words. Journal of Experimental Psychology: Human Perception and Performance, 11, 490–508.

Kosslyn, S. M. (1980). Image and mind. Cambridge, MA: Harvard University Press.

Kosslyn, S. M. (1994). Image and brain. Cambridge, MA: Harvard University Press.

Kosslyn, S. M., Alpert, N. M., Thompson, W. L., Maljkovic, V., Weise, S. B., Chabris, C., Hamilton, S. E., & Buonanno, F. S. (1993). Visual mental imagery activates topographically organized visual cortex: PET investigations. Journal of Cognitive Neuroscience, 5, 263–287.

Kosslyn, S. M., Ball, T. M., & Reiser, B. J. (1978). Visual images preserve metric spatial information: Evidence from studies of image scanning. Journal of Experimental Psychology: Human Perception and Performance, 4, 47–60.


Kosslyn, S. M., DiGirolamo, G., Thompson, W. L., & Alpert, N. M. (1998). Mental rotation of objects versus hands: Neural mechanisms revealed by positron emission tomography. Psychophysiology, 35, 151–161.

Kosslyn, S. M., Pascual-Leone, A., Felician, O., Camposano, S., Keenan, J. P., Thompson, W. L., Ganis, G., Sukel, K. E., & Alpert, N. M. (April 2, 1999). The role of area 17 in visual imagery: Convergent evidence from PET and rTMS. Science, 284, 167–170.

Kosslyn, S. M., & Thompson, W. L. (2003). When is early visual cortex activated during visual mental imagery? Psychological Bulletin, 129, 723–746.

Kosslyn, S. M., Thompson, W. L., & Alpert, N. M. (1997). Neural systems shared by visual imagery and visual perception: A positron emission tomography study. NeuroImage, 6, 320–334.

Kosslyn, S. M., Thompson, W. L., & Ganis, G. (2006). The case for mental imagery. New York: Oxford University Press.

Kosslyn, S. M., Thompson, W. L., Kim, I. J., & Alpert, N. M. (1995). Topographical representations of mental images in primary visual cortex. Nature, 378, 496–498.

Kosslyn, S. M., Thompson, W. L., Wraga, M., & Alpert, N. M. (2001). Imagining rotation by endogenous versus exogenous forces: Distinct neural mechanisms. NeuroReport, 12, 2519–2525.

Levine, D. N., Warach, J., & Farah, M. J. (1985). Two visual systems in mental imagery: Dissociation of “what” and “where” in imagery disorders due to bilateral posterior cerebral lesions. Neurology, 35, 1010–1018.

Luzzatti, C., Vecchi, T., Agazzi, D., Cesa-Bianchi, M., & Vergani, C. (1998). A neurological dissociation between preserved visual and impaired spatial processing in mental imagery. Cortex, 34, 461–469.

Mechelli, A., Price, C. J., Friston, K. J., & Ishai, A. (2004). Where bottom-up meets top-down: Neuronal interactions during perception and imagery. Cerebral Cortex, 14, 1256–1265.

Mellet, E., Briscogne, S., Crivello, F., Mazoyer, B., Denis, M., & Tzourio-Mazoyer, N. (2002). Neural basis of mental scanning of a topographic representation built from a text. Cerebral Cortex, 12, 1322–1330.

Mellet, E., Tzourio, N., Crivello, F., Joliot, M., Denis, M., & Mazoyer, B. (1996). Functional anatomy of spatial mental imagery generated from verbal instructions. Journal of Neuroscience, 16, 6504–6512.

Mellet, E., Tzourio, N., Denis, M., & Mazoyer, B. (1995). A positron emission tomography study of visual and mental spatial exploration. Journal of Cognitive Neuroscience, 4, 433–445.

Moro, V., Berlucchi, G., Lerch, J., Tomaiuolo, F., & Aglioti, S. M. (2008). Selective deficit of mental visual imagery with intact primary visual cortex and visual perception. Cortex, 44, 109–118.

Morton, N., & Morris, R. G. (1995). Image transformations dissociated from visuo-spatial working memory. Cognitive Neuropsychology, 12, 767–791.

O’Craven, K. M., & Kanwisher, N. (2000). Mental imagery of faces and places activates corresponding stimulus-specific brain regions. Journal of Cognitive Neuroscience, 12, 1013–1023.

Paivio, A. (1971). Imagery and verbal processes. New York: Holt, Rinehart and Winston.

Parsons, L. M., Fox, P. T., Downs, J. H., Glass, T., Hirsch, T. B., Martin, C. C., Jerabek, P. A., & Lancaster, J. L. (1995). Use of implicit motor imagery for visual shape discrimination as revealed by PET. Nature, 375, 54–58.

Pylyshyn, Z. W. (1973). What the mind’s eye tells the mind’s brain: A critique of mental imagery. Psychological Bulletin, 80, 1–24.

Pylyshyn, Z. W. (1981). Psychological explanations and knowledge-dependent processes. Cognition, 10, 267–274.

Pylyshyn, Z. W. (2002). Mental imagery: In search of a theory. Behavioral and Brain Sciences, 25, 157–237.

Pylyshyn, Z. W. (2003a). Return of the mental image: Are there really pictures in the head? Trends in Cognitive Sciences, 7, 113–118.

Pylyshyn, Z. W. (2003b). Seeing and visualizing: It’s not what you think. Cambridge, MA: MIT Press.

Pylyshyn, Z. W. (2007). Things and places: How the mind connects with the world. Cambridge, MA: MIT Press.

Richter, W., Somorjai, R., Summers, R., Jarmasz, M., Tegeler, C., Ugurbil, K., Menon, R., Gati, J. S., Georgopoulos, A. P., & Kim, S.-G. (2000). Motor area activity during mental rotation studied by time-resolved single-trial fMRI. Journal of Cognitive Neuroscience, 12, 310–320.

Riddoch, M. J., & Humphreys, G. W. (1987). A case of integrative visual agnosia. Brain, 110, 1431–1462.

Rizzo, M., Smith, V., Pokorny, J., & Damasio, A. (1993). Color perception profiles in central achromatopsia. Neurology, 43, 995–1001.

Sartori, G., & Job, R. (1988). The oyster with four legs: A neuropsychological study on the interaction of visual and semantic information. Cognitive Neuropsychology, 5, 105–132.

Sereno, M. I., Dale, A. M., Reppas, J. B., Kwong, K. K., Belliveau, J. W., Brady, T. J., Rosen, B. R., & Tootell, R. B. H. (1995). Borders of multiple visual areas in humans revealed by functional magnetic resonance imaging. Science, 268, 889–893.

Servos, P., & Goodale, M. A. (1995). Preserved visual imagery in visual form agnosia. Neuropsychologia, 33 (11), 1383–1394.

Shepard, R. N., & Metzler, J. (1971). Mental rotation of three-dimensional objects. Science, 171, 701–703.

Shuttleworth, E. C., Jr., Syring, V., & Allen, N. (1982). Further observations on the nature of prosopagnosia. Brain and Cognition, 1, 307–322.

Siebner, H. R., Peller, M., Willoch, F., Minoshima, S., Boecker, H., Auer, C., Drzezga, A., Conrad, B., & Bartenstein, P. (2000). Lasting cortical activation after repetitive TMS of the motor cortex: A glucose metabolic study. Neurology, 54, 956–963.

Sirigu, A., Duhamel, J.-R., Cohen, L., Pillon, B., Dubois, B., & Agid, Y. (1996). The mental representation of hand movements after parietal cortex damage. Science, 273 (5281), 1564–1568.

Slotnick, S. D., Thompson, W. L., & Kosslyn, S. M. (2005). Visual mental imagery induces retinotopically organized activation of early visual areas. Cerebral Cortex, 15, 1570–1583.

Sparing, R., Mottaghy, F., Ganis, G., Thompson, W. L., Toepper, R., Kosslyn, S. M., & Pascual-Leone, A. (2002). Visual cortex excitability increases during visual mental imagery: A TMS study in healthy human subjects. Brain Research, 938, 92–97.

Stokes, M., Thompson, R., Cusack, R., & Duncan, J. (2009). Top-down activation of shape-specific population codes in visual cortex during mental imagery. Journal of Neuroscience, 29, 1565–1572.

Tarr, M. J., & Pinker, S. (1989). Mental rotation and orientation-dependence in shape recognition. Cognitive Psychology, 21, 233–282.

Thirion, B., Duchesnay, E., Hubbard, E., Dubois, J., Poline, J.-B., Lebihan, D., & Dehaene, S. (2006). Inverse retinotopy: Inferring the visual content of images from brain activation patterns. NeuroImage, 33, 1104–1116.

Tomasino, B., Borroni, P., Isaja, A., & Rumiati, R. I. (2005). The role of the primary motor cortex in mental rotation: A TMS study. Cognitive Neuropsychology, 22, 348–363.

Trojano, L., Grossi, D., Linden, D. E., Formisano, E., Hacker, H., Zanella, F. E., Goebel, R., & Di Salle, F. (2000). Matching two imagined clocks: The functional anatomy of spatial analysis in the absence of visual stimulation. Cerebral Cortex, 10, 473–481.


Trojano, L., Linden, D. E., Formisano, E., Grossi, D., Sack, A. T., & Di Salle, F. (2004). What clocks tell us about the neural correlates of spatial imagery. European Journal of Cognitive Psychology, 16, 653–672.

Ungerleider, L. G., & Mishkin, M. (1982). Two cortical visual systems. In D. J. Ingle, M. A. Goodale, & R. J. W. Mansfield (Eds.), Analysis of visual behavior (pp. 549–586). Cambridge, MA: MIT Press.

Vanlierde, A., de Volder, A. G., Wanet-Defalque, M. C., & Veraart, C. (2003). Occipito-parietal cortex activation during visuo-spatial imagery in early blind humans. NeuroImage, 19, 698–709.

Watson, J. B. (1913). Psychology as the behaviorist views it. Psychological Review, 20, 158–177.

Wraga, M., Shephard, J. M., Church, J. A., Inati, S., & Kosslyn, S. M. (2005). Imagined rotations of self versus objects: An fMRI study. Neuropsychologia, 43, 1351–1361.

Wraga, M. J., Thompson, W. L., Alpert, N. M., & Kosslyn, S. M. (2003). Implicit transfer of motor strategies in mental rotation. Brain and Cognition, 52, 135–143.

Young, A. W., Humphreys, G. W., Riddoch, M. J., Hellawell, D. J., & de Haan, E. H. (1994). Recognition impairments and face imagery. Neuropsychologia, 32, 693–702.

Grégoire Borst

Grégoire Borst is an assistant professor in developmental psychology and cognitive neuroscience at Paris Descartes University.


Looking at the Nose Through Human Behavior, and at Human Behavior Through the Nose

Roni Kahana and Noam Sobel

The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn
Print Publication Date: Dec 2013
Subject: Psychology, Cognitive Neuroscience
Online Publication Date: Dec 2013
DOI: 10.1093/oxfordhb/9780199988693.013.0006

Abstract and Keywords

Mammalian olfaction is highly stereotyped. It consists of a sensory epithelium in the nose, where odorants are transduced to form neural signals. These neural signals are projected via the olfactory nerve to the olfactory bulb, where they generate spatiotemporal patterns of neural activity subserving odorant discrimination. This information is then projected via the olfactory tract to olfactory cortex, a neural substrate optimized for olfactory object perception. In contrast to popular notions, human olfaction is quite keen. Thus, systematic analysis of human olfactory perception has uncovered fundamental properties of mammalian olfactory processing, and mammalian olfaction explains fundamental properties of human behaviors such as eating, mating, and social interaction, which are all critical for survival.

Keywords: olfactory perception, olfaction, behavior, odorant, olfactory epithelium, olfactory discrimination, piriform cortex, eating, mating, social interaction

Introduction

Even in reviews on olfaction, it is often stated that human behavior and perception are dominated by vision, or that humans are primarily visual creatures. This reflects the consensus in cognitive neuroscience (Zeki & Bartels, 1999). Indeed, if asked which distal sense we would soonest part with, most (current authors included) would select olfaction before audition or vision. Thus, whereas primarily olfactory animals such as rodents are referred to as macrosmatic, humans are considered microsmatic. That said, we trust our nose over our eyes and ears in the two most critical decisions we make: what we eat, and with whom we mate (Figure 6.1).

We review various studies in this respect, yet first we turn to the reader’s clear intuition: Given a beautiful-looking slice of cake that smells of sewage and a mushy-looking shapeless mixture that smells of cinnamon and banana, which do you eat? Given a gorgeous-looking individual who smells like yeast and a profoundly physically unattractive person who smells like sweet spice, with whom do you mate? In both of these key behaviors, humans, like all mammals, are primarily olfactory. With this simple truth in mind, namely, that in our most important decisions we follow our nose, should humans still be considered microsmatic (Stoddart, 1990)?

Functional Neuroanatomy of the Mammalian Olfactory System

Figure 6.1 The primacy of human olfaction. Humans trust olfaction over vision and audition in key behaviors related to survival, such as mate selection and determination of edibility. Courtesy of Gilad Larom.

Figure 6.2 Schematic of the human olfactory system. Odorants are transduced at the olfactory epithelium (1). Receptors of different subtypes (three illustrated, ∼1,000 in mammals) converge via the olfactory nerve onto common glomeruli at the olfactory bulb (2). From here, information is conveyed via the lateral olfactory tract to primary olfactory cortex (3). From here, information is conveyed throughout the brain, most notably to orbitofrontal cortex (5) via a direct and indirect route through the thalamus (4). (From Sela & Sobel, 2010. Reprinted with permission from Springer.)

Before considering the behavioral significance of human olfaction, we first provide a basic overview of olfactory system organization. The mammalian olfactory system follows a rather clear hierarchy, starting with transduction at the olfactory epithelium in the nose, then initial processing subserving odor discrimination in the olfactory bulb, and finally higher order processing related to odor object formation and odor memory in primary olfactory cortex (R. I. Wilson & Mainen, 2006) (Figure 6.2). This organization is bilateral and symmetrical, and although structural connectivity appears largely ipsilateral (left epithelium to left bulb to left cortex) (Powell, Cowan, & Raisman, 1965), functional measurements have implied more contralateral than ipsilateral driving of activity (Cross et al., 2006; McBride & Slotnick, 1997; J. Porter, Anand, Johnson, Khan, & Sobel, 2005; Savic & Gulyas, 2000; D. A. Wilson, 1997). The neural underpinnings of this functional contralaterality remain unclear.

More than One Nose in the Nose

Figure 6.3 More than one nose in the nose. A, The olfactory system in the mouse contains multiple subsystems: the olfactory epithelium (OE), the vomeronasal organ (VNO), the Grueneberg ganglion (GG), and the septal organ (SO). Sensory neurons positioned in the OE, SO, and GG project to the main olfactory bulb (MOB), whereas sensory neurons of the VNO project to the accessory olfactory bulb (AOB). (From Ferrero & Liberles, 2010, originally adapted from Buck, 2000.) B, The human nose is innervated by both olfactory and trigeminal sensory nerve endings. (Modification of illustration by Patrick J. Lynch.)

Odorants are concurrently processed in several neural subsystems beyond the above-described main olfactory system (Breer, Fleischer, & Strotmann, 2006) (Figure 6.3). For example, air-borne molecules are transduced at endings of the trigeminal nerve in the eye, nose, and throat (Hummel, 2000). It is trigeminal activation that provides the cooling sensation associated with odorants such as menthol, or the stinging sensation associated with odorants such as ammonia or onion. In rodents, at least three additional sensing mechanisms have been identified in the nose. These include (1) the septal organ, which consists of a small patch of olfactory receptors that are anterior to the main epithelium (Ma et al., 2003); (2) the Grueneberg organ, which contains small grape-like clusters of receptors at the anterior end of the nasal passage that project to a separate subset of main olfactory bulb targets (Storan & Key, 2006); and (3) the vomeronasal system, or accessory olfactory system (Halpern, 1987; Wysocki & Meredith, 1987). The accessory olfactory system is equipped with a separate bilateral epithelial structure, the vomeronasal organ, or VNO (sometimes also referred to as Jacobson’s organ). The VNO is a pit-shaped structure at the anterior portion of the nasal passage, containing receptors that project to an accessory olfactory bulb, which in turn projects directly to critical components of the limbic system such as the amygdala and hypothalamus (Keverne, 1999; Meredith, 1983) (see Figure 6.3). In rodents, the accessory olfactory system plays a key role in mediating social chemosignaling (Halpern, 1987; Kimchi, Xu, & Dulac, 2007; Wysocki & Meredith, 1987). Whether humans have a septal organ or Grueneberg organ has not been carefully studied, and it is largely held that humans do not have an accessory olfactory system, although this issue remains controversial (Frasnelli, Lundström, Boyle, Katsarkas, & Jones-Gotman; Meredith, 2001; Monti-Bloch, Jennings-White, Dolberg, & Berliner, 1994; Witt & Hummel, 2006). Regardless of this debate, it is clear that the sensation of smell in humans and other mammals is the result of common activation across several neural subsystems (Restrepo, Arellano, Oliva, Schaefer, & Lin, 2004; Spehr et al., 2006). However, before air-borne stimuli are processed, they first must be acquired.

Sniffs: More than a Mechanism for Odorant Sampling

Figure 6.4 Sniffing. Careful visualization of human sniff airflow revealed that although the nostrils are structurally close together, an asymmetry in nasal airflow generates a “functional distance” between the nostrils. A, A PIV laser light sheet was oriented in a coronal plane intersecting the nostrils at their midpoint. B and C, PIV images of particle-laden inspired air stream for two example sniffs. D, A contour plot of velocity magnitude of the inspired air stream into the nose of a subject sniffing at 0.2 Hz. E, Velocity profiles of the right and left naris; abscissa indicates distance from the tip of the nose to the lateral extent of the naris. From Porter et al., 2007. Reprinted with permission from Nature.

Mammalian olfaction starts with a sniff, a critical act of odor sampling. Sniffs are not merely an epiphenomenon of olfaction, but rather are an intricate component of olfactory perception (Kepecs, Uchida, & Mainen, 2006, 2007; Mainland & Sobel, 2006; Schoenfeld & Cleland, 2006). Sniffs are in part a reflexive action (Tomori, Benacka, & Donic, 1998), which is then rapidly modified in accordance with odorant content (Laing, 1983) (Figure 6.4). Humans begin tailoring their sniff according to odorant properties within about 160 ms of sniff onset, reducing sniff magnitude for both intense (B. N. Johnson, Mainland, & Sobel, 2003) and unpleasant (Bensafi et al., 2003) odorants. We have proposed that the mechanism that tailors a sniff to its content is cerebellar (Sobel, Prabhakaran, Hartley, et al., 1998), and cerebellar lesions indeed negate this mechanism (Mainland, Johnson, Khan, Ivry, & Sobel, 2005). Moreover, not only are sniffs the key mechanism for odorant sampling, they also play a key role in the timing and organization of neural representation in the olfactory system. This influence of sniffing on neural representation in olfaction may begin at the earliest phase of olfactory processing because olfactory receptors are also mechanosensitive (Grosmaitre, Santarelli, Tan, Luo, & Ma, 2007), potentially responding to sniffs even without odor. Sniff properties are then reflected in neural activity at both the olfactory bulb (Verhagen, Wesson, Netoff, White, & Wachowiak, 2007) and olfactory cortex (Sobel, Prabhakaran, Desmond, et al., 1998). Indeed, negating sniffs (whether their execution, or only their intention) may underlie in part the pronounced differences in olfactory system neural activity during wake and anesthesia (Rinberg, Koulakov, & Gelperin, 2006). Finally, odor sampling is not only through the nose (orthonasal) but also through the mouth (retronasal): Food odors make their way to the olfactory system by ascending through the posterior nares of the nasopharynx (Figure 6.5). Several lines of evidence have suggested partially overlapping yet partially distinct neural substrates subserving orthonasal and retronasal human olfaction (Bender, Hummel, Negoias, & Small, 2009; Hummel, 2008; Small, Gerber, Mak, & Hummel, 2005).

Olfactory Epithelium: The Site of Odorant Transduction

Figure 6.5 Schematic drawing of the nasal cavity with the lower, middle, and upper turbinates. Airflow in relation to orthonasal (through the nostrils) or retronasal (from the mouth/pharynx to the nasal cavity) olfaction is indicated by arrows, both leading to the olfactory epithelium located just beneath the cribriform plate. From Negoias, Visschers, Boelrijk, & Hummel, 2008. Reprinted with permission from Elsevier.

Once sniffed, an odorant makes its way up the nasal passage, where it crosses a mucous membrane before (p. 92) interacting with olfactory receptors that line the olfactory ep­ ithelium. This step is not inconsequential to the olfactory process. Odorants cross this mucus following the combined action of passive gradients and active transporters, which generate an odorant-specific pattern of dispersion (Moulton, 1976). These so-called sorp­ Page 6 of 41

Looking at the Nose Through Human Behavior, and at Human Behavior Through the Nose tion properties have been hypothesized to play a key role in odorant discrimination, in that they form a sort of chromatographic separation at the nose (Mozell & Jagodowicz, 1973). The later identification of an inordinately large family of specific olfactory receptor types (L. Buck & Axel, 1991; Zhang & Firestein, 2002) shifted the focus of enquiry regard­ ing odor discrimination to that of receptor–ligand interactions, but the chromatographic component of this process has never been negated and likely remains a key aspect of odorant processing. Once an odorant crosses the mucosa, it interacts with olfactory receptors at the sensory end of olfactory receptor neurons. Humans have about 12 million bipolar receptor neu­ rons (Moran, Rowley, Jafek, & Lovell, 1982) that differ from typical neurons in that they constantly regenerate from a basal cell layer throughout the lifespan (Graziadei & Monti Graziadei, 1983). These neurons send their dendritic process to the olfactory epithelial surface, where they form a knob from which five to twenty thin cilia extend into the mu­ cus. These cilia contain the olfactory receptors: 7-transmembrane G-protein–coupled sec­ ond-messenger receptors, where a cascade of events that starts with odorant binding cul­ minates in the opening of cross-membrane cation channels that depolarize the cell (Firestein, 2001; Spehr & Munger, 2009; Zufall, Firestein, & Shepherd, 1994) (Figure 6.6). The mammalian genome contains more than 1,000 such receptor types (L. Buck & Axel, 1991), yet humans functionally express only about 400 of these (Gilad & Lancet, 2003). Typically, each receptor neuron expresses only one receptor type, although recent evidence from Drosophila has suggested that in some cases a single neuron may express two receptor types (Goldman, Van der Goes van Naters, Lessing, Warr, & Carlson, 2005). 
In rodents, receptor types are grouped into four functional expression zones along a dorsoventral epithelial axis, yet are randomly dispersed within each zone (Ressler, Sullivan, & Buck, 1993; Strotmann, Wanner, Krieger, Raming, & Breer, 1992; Vassar, Ngai, & Axel, 1993). Each receptor type is typically responsive to a small subset of odorants (Hallem & Carlson, 2006; Malnic, Hirono, Sato, & Buck, 1999; Saito, Chi, Zhuang, Matsunami, & Mainland, 2009), although some receptors may be responsive to only very few odorants (Keller, Zhuang, Chi, Vosshall, & Matsunami, 2007), and other receptors may be responsive to a very wide range of odorants (Grosmaitre et al., 2009). Despite some alternative hypotheses (Franco, Turin, Mershin, & Skoulakis, 2011), this receptor-to-odorant specificity is widely considered the basis for olfactory coding (Su, Menuz, & Carlson, 2009).


Olfactory Bulb: A Neural Substrate for Odorant Discrimination

Figure 6.6 Receptor events in olfaction. Signal transduction in an olfactory sensory neuron. Binding of an odorant to its cognate odorant receptor (OR) results in the activation of heterotrimeric G protein (Gαolf plus Gβγ). Activated Gαolf in turn activates type III adenylyl cyclase (AC3), leading to the production of cyclic adenosine monophosphate (cAMP) from adenosine triphosphate (ATP). cAMP gates or opens the cyclic nucleotide–gated (CNG) ion channel, leading to the influx of Na+ and Ca2+, depolarizing the cell. This initial depolarization is amplified through the activation of a Ca2+-dependent Cl− channel. In addition, cAMP activates protein kinase A (PKA), which can regulate other intracellular events, including transcription of cAMP-regulated genes. Reprinted with permission from DeMaria & Ngai, 2010.

Whereas receptor types appear randomly dispersed throughout each epithelial subzone, the path from epithelium to bulb via the olfactory nerve entails a unique pattern of convergence that brings together all receptor neurons that express a particular receptor type. These synapse onto one of two common points at the olfactory bulb, termed glomeruli (Mombaerts et al., 1996). Thus, the number of glomeruli is expected to be about double the number of receptor types, and the receptive range of a glomerulus is expected to reflect the receptive range of a given receptor type (Feinstein & Mombaerts, 2004). Within the glomeruli, receptor axons contact dendrites of either mitral or tufted output neurons and periglomerular interneurons. Whereas these rules have been learned mostly from studies in rodents, the human olfactory system may be organized slightly differently; rather than the roughly 750 glomeruli expected (about double the number of expressed receptor types), postmortem studies revealed many thousands of glomeruli in the human olfactory bulb (Maresh, Rodriguez Gil, Whitman, & Greer, 2008).

The stereotyped connectivity from epithelium to bulb generates a spatial representation of receptor types on the olfactory bulb surface. In simple terms, each activated glomerulus reflects the activation of a given receptor type. Thus, the spatiotemporal pattern of bulbar activation is largely considered the basis for olfactory discrimination coding (Firestein, 2001). The common notion is that a given odorant is represented by the particular pattern of glomerular activation in time. Indeed, various methods of recording neural activity at the olfactory bulb have converged to support this notion (Leon & Johnson, 2003; Su et al., 2009; Uchida, Takahashi, Tanifuji, & Mori, 2000) (Figure 6.7). Although it is easy to grasp and convey this notion of a purely spatial substrate where different odors induce different patterns of activation, this is a simplified view because the particular timing of neural activity also clearly plays a role in odor coding at this stage. The role of temporal neural activity patterns in odor coding was primarily uncovered in insects (Laurent, 1997, 1999; Laurent, Wehr, & Davidowitz, 1996) but has been revealed in mammals as well (Bathellier, Buhl, Accolla, & Carleton, 2008; Lagier, Carleton, & Lledo, 2004; Laurent, 2002). Moreover, it is noteworthy that olfactory bulb lesions have a surprisingly limited impact on olfactory discrimination (Slotnick & Schoonover, 1993), and a spatiotemporal bulbar activation code has yet to be linked to meaningful olfactory information within a predictive framework (Mainen, 2006). In other words, a “map of odors” on the olfactory bulb is a helpful concept in understanding the olfactory system, but it is not the whole story.

Primary Olfactory Cortex: A Loosely Defined Structure with Loosely Defined Function

The structural and functional properties of epithelium and bulb are relatively straightforward: The epithelium is the site of transduction, where odorants become neural signals. The bulb is the site of discrimination, where different odors form different spatiotemporal patterns of neural activity. By contrast, the structure and function of primary olfactory cortex remain unclear. In other words, there is no clear agreement as to what constitutes primary olfactory cortex, let alone what it does.

By current definition, primary olfactory cortex consists of all brain regions that receive direct input from the mitral and tufted cell axons of the olfactory bulb (Allison, 1954; Carmichael, Clugnet, & Price, 1994; de Olmos, Hardy, & Heimer, 1978; Haberly, 2001; J. L. Price, 1973, 1987, 1990; Shipley, 1995). These comprise most of the paleocortex, including (by order along the olfactory tract) the anterior olfactory cortex (also referred to as the anterior olfactory nucleus) (Brunjes, Illig, & Meyer, 2005), ventral tenia tecta, anterior hippocampal continuation and indusium griseum, olfactory tubercle, piriform cortex, anterior cortical nucleus of the amygdala, periamygdaloid cortex, and rostral entorhinal cortex (Carmichael et al., 1994) (Figure 6.8).

As can be appreciated by both the sheer area and diversity of cortical real estate that is considered primary olfactory cortex, this definition is far from functional. One cannot assign a single function to “primary olfactory cortex” when primary olfactory cortex is a label legitimately applied to a large proportion of the mammalian brain.
The term primary typically connotes basic functional roles such as early feature extraction, yet as can be expected, a region comprising in part piriform cortex, amygdala, and entorhinal cortex is involved in far more complex sensory processing than mere early feature extraction. With this in mind, several authors have simply shifted the definition by referring to the classic primary olfactory structures as secondary olfactory structures, noting that the definition of mammalian primary olfactory cortex may better fit the olfactory bulb than piriform cortex (Cleland & Sullivan, 2003; Haberly, 2001).

Figure 6.7 Spatial coding at the olfactory bulb. Patterns of rat glomeruli activation (by 2-deoxyglucose uptake) evoked by different odors. Activation is represented as the average z-score pattern for both bulbs of up to four separate rats exposed to each odor. Warmer colors indicate higher uptake. From Johnson, Ong, & Leon, 2010. Copyright © 2009 Wiley-Liss, Inc.

At the same time, there has been a growing tendency to use the term primary olfactory cortex for piriform cortex alone. Piriform cortex, the largest component of primary olfactory cortex in mammals, lies along the olfactory tract at the junction of temporal and frontal lobes and continues onto the dorsomedial aspect of the temporal lobe (see Figure 6.8A and B). Consistent with the latter approach, here we restrict our review of olfactory cortex to the piriform portion of primary olfactory cortex alone.


Piriform Cortex: A Neural Substrate for Olfactory Object Formation

Figure 6.8 Human olfactory cortex. A, Ventral view of the human brain in which the right anterior temporal lobe has been resected in the coronal plane to expose the limbic olfactory areas. B, Afferent output from the olfactory bulb (OB) passes through the lateral olfactory tract (LOT) and projects monosynaptically to numerous regions, including the anterior olfactory nucleus (AON), olfactory tubercle (OTUB), anterior piriform cortex (APC), posterior piriform cortex (PPC), amygdala (AM), and entorhinal cortex (EC). Downstream relays include the hippocampus (HP) and the putative olfactory projection site in the human orbitofrontal cortex (OFColf). C, Schematic representation of the cellular organization of the piriform cortex. Pyramidal neurons are located in cell body layers II and III, and their apical dendrites project to molecular layer I. Layer I is subdivided into a superficial layer (Ia) that contains the sensory afferents from the olfactory bulb (shown in red) and a deeper layer (Ib) that contains the associative inputs from other areas of the primary olfactory cortex and higher order areas (shown in blue). Most of the layer Ia afferents terminate in the APC, whereas most of the layer Ib associative inputs terminate in the posterior piriform cortex (PPC). Reprinted with permission from Gottfried, 2010.

Piriform cortex is a three-layered paleocortex that has been described in detail (Martinez, Blanco, Bullon, & Agudo, 1987). In brief, layer I is subdivided into layer Ia, where afferent fibers from the olfactory bulb terminate, and layer Ib, where association fibers terminate (see Figure 6.8C). Layer II is a compact zone of neuronal cell bodies. Layer III contains neuronal cell bodies at a lower density than layer II and a large number of dendritic and axonal elements. Piriform input is widely distributed, and part of piriform output feeds back into piriform as further distributed input. Moreover, piriform cortex is reciprocally and extensively connected with several high-order areas of the cerebral cortex, including the prefrontal, amygdaloid, perirhinal, and entorhinal cortices (Martinez et al., 1987).

The current understanding of piriform cortex function largely originated from the work of Lew Haberly and colleagues (Haberly, 2001; Haberly & Bower, 1989; Illig & Haberly, 2003). These authors hypothesized that the structural organization of piriform cortex renders it highly suitable to function as a content-addressable memory system, where fragmented input can be used to “neurally reenact” a stored representation. Haberly and colleagues identified or predicted several aspects of piriform organization that render it an ideal substrate for such a system. These predictions have been remarkably borne out in later studies of structure and function. Haberly and colleagues noted that first, associative networks depend on spatially distributed input systems. Several lines of evidence have indeed suggested that the projection from bulb to piriform is spatially distributed. In other words, in contrast to the spatial clustering of responses at the olfactory bulb, this ordering is apparently obliterated in the projection to piriform cortex (Stettler & Axel, 2009). Second, the discriminative power of associative networks relies on positive feedback via interconnections between the processing units that receive the distributed input. Indeed, in piriform cortex, each pyramidal cell makes a small number of synaptic contacts on a large number (>1,000) of other cells in piriform cortex at disparate locations. Axons from individual pyramidal cells also arborize extensively within many neighboring cortical areas, most of which send strong projections back to piriform cortex (D. M. G. Johnson, Illig, Behan, & Haberly, 2000). Third, in associative memory models, individual inputs are typically weak relative to output threshold, a situation that indeed likely occurs in piriform (Barkai & Hasselmo, 1994).
Finally, content-addressable memory systems typically require activity-dependent changes in excitatory synaptic strengths. Again, this pattern has since consistently been demonstrated in piriform cortex, where enhanced olfactory learning capability is accompanied by long-term enhancement of synaptic transmission in both the descending and ascending inputs (Cohen, Reuveni, Barkai, & Maroun, 2008).

In addition to the above materialization of Haberly’s predictions on piriform structure, several studies have similarly borne out his predictions on function. In a series of studies, Don Wilson and colleagues have demonstrated the importance of piriform cortex associative memory-like properties in olfactory pattern formation, completion, and separation from background (Barnes, Hofacer, Zaman, Rennaker, & Wilson, 2008; Kadohisa & Wilson, 2006; Linster, Henry, Kadohisa, & Wilson, 2007; Linster, Menon, Singh, & Wilson, 2009; D. A. Wilson, 2009a, 2009b; D. A. Wilson & Stevenson, 2003). In a critical recent study, these authors taught rats to discriminate between various mixtures, each containing ten monomolecular components (Barnes et al., 2008). They found that rats easily discriminated between a target mixture of ten components (10C) and a second mixture in which only one of the ten components was replaced with a novel component (10CR1). In contrast, rats were poor at discriminating this same target mixture from a mixture where one of the components was deleted (10C-1). The authors concluded that through pattern completion, 10C-1 was “completed” to 10C, yet through pattern separation, 10CR1 was perceived as something new altogether. Critically, the authors accompanied these behavioral studies with electrical recordings from both olfactory bulb and piriform cortex. They found that a shift from 10C to 10C-1 induced a significant decorrelation in the activity of olfactory bulb mitral cell ensembles. In other words, olfactory bulb mitral cell ensembles readily separated these overlapping patterns. In contrast, piriform cortex ensembles showed no significant decorrelation across 10C and 10C-1 mixtures. That is, the piriform ensemble filled in the missing component and responded as if the full 10C mixture were present, consistent with pattern completion. A shift from 10C to 10CR1, however, produced significant cortical ensemble pattern separation. Thus, the ensemble results were consistent with behavior, whereby introduction of a novel component into a complex mixture was relatively easy to detect, whereas removal of a single component was difficult to detect.

Consistent with the above, Jay Gottfried and colleagues have used functional magnetic resonance imaging (fMRI) to investigate piriform activity in humans (Gottfried & Wu, 2009). In an initial study, they uncovered a heterogeneous response profile whereby odorant physicochemical properties were evident in activity patterns measured in anterior piriform cortex, and odorant perceptual properties were associated with activity patterns measured in posterior piriform (Gottfried, Winston, & Dolan, 2006). Because posterior piriform is richer than anterior piriform in the extent of associational connectivity, this finding is consistent with the previously described findings in rodents. Moreover, using multivariate fMRI analysis techniques, they found that odorants with similar perceived quality induced similar patterns of ensemble activity in posterior piriform cortex alone (Howard, Plailly, Grueschow, Haynes, & Gottfried, 2009).
Taken together, these results from both rodents and humans depict piriform cortex as a critical component allowing the olfactory system to deal with an ever-changing olfactory environment, while still allowing stable olfactory object formation and constancy.

Finally, beyond primary olfactory cortex, olfactory information is distributed widely throughout the brain. Whereas other sensory modalities traverse a thalamic relay en route from periphery to primary cortex, in olfaction information reaches primary cortex directly. This is not to say, however, that there is no olfactory thalamus. A recent lesion study has implicated the thalamus in olfactory identification, hedonic processing, and olfactory motor control (Sela et al., 2009), and a recent imaging study has implicated a thalamic role in olfactory attention (Plailly, Howard, Gitelman, & Gottfried, 2008), a finding further supported by lesion studies (Tham, Stevenson, & Miller, 2010). From the thalamus, olfactory information radiates widely, yet most notable among its projection targets is the orbitofrontal cortex, which is largely considered secondary olfactory cortex (J. L. Price, 1990). Both human fMRI studies and single-cell recordings in monkeys suggest that orbitofrontal cortex is critical for coding odor identity (Rolls, Critchley, & Treves, 1996; Tanabe, Iino, Ooshima, & Takagi, 1974) and may further be key for conscious perception of smell (Li et al., 2010).
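
The content-addressable memory account of piriform cortex described in this section can be illustrated with a toy associative network. The following Python sketch uses a textbook Hopfield-style network, which is an assumption chosen for illustration and not a model taken from the studies cited above; the network size, the number of stored patterns, and the function names `train_hopfield` and `recall` are all hypothetical. It stores a few random patterns through recurrent (associational) weights and then presents a degraded version of one pattern as a cue.

```python
import random

def train_hopfield(patterns):
    """Hebbian weights for a list of +1/-1 patterns (no self-connections)."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j] / n
    return weights

def recall(weights, cue, sweeps=5):
    """Update units one at a time until the state stops changing."""
    state = list(cue)
    n = len(state)
    for _ in range(sweeps):
        changed = False
        for i in range(n):
            # Recurrent drive onto unit i from all other units.
            drive = sum(weights[i][j] * state[j] for j in range(n))
            new = 1 if drive >= 0 else -1
            if new != state[i]:
                state[i] = new
                changed = True
        if not changed:
            break
    return state

rng = random.Random(0)
n_units = 100  # hypothetical ensemble size
stored = [[rng.choice((-1, 1)) for _ in range(n_units)] for _ in range(3)]
weights = train_hopfield(stored)

# Degrade the first stored pattern by flipping 10 of its 100 units,
# loosely analogous to presenting a mixture with a missing component.
cue = list(stored[0])
for i in rng.sample(range(n_units), 10):
    cue[i] = -cue[i]

completed = recall(weights, cue)
overlap_before = sum(a == b for a, b in zip(cue, stored[0])) / n_units
overlap_after = sum(a == b for a, b in zip(completed, stored[0])) / n_units
print("overlap with stored pattern before recall:", overlap_before)
print("overlap with stored pattern after recall:", overlap_after)
```

With so few stored patterns relative to the number of units, the recurrent weights should pull the degraded cue back toward the stored pattern, which is the sense in which a fragmented input can reinstate a stored representation. The sketch says nothing about how real piriform ensembles implement this.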


Looking at the Nose Through Human Behavior

As reviewed above, the basic functional architecture of mammalian olfaction is well understood. In the olfactory epithelium, there is a thorough understanding of receptor events that culminate in transduction of odorants into neural signals. In the olfactory bulb, there is a comprehensive view of how such neural signals form spatiotemporal patterns that allow odor discrimination. Finally, in piriform cortex, there is an emerging view of how sparse neural representation enables formation of stable olfactory objects. However, despite this good understanding of olfaction at the genetic, molecular, and cellular levels, we have only a poor understanding of structure–function relations in this system (Mainen, 2006). Put simply, there is not a scientist or perfumer in the world who can look at a novel molecule and predict its odor, or smell a novel smell and predict its structure.

One reason for this state of affairs is that the olfactory stimulus, namely, a chemical, has typically been viewed as it would be by chemists. For example, carbon chain length has been the most consistently studied odorant property, yet there is no clear importance for carbon chain length in mammalian olfactory behavior (Boesveldt, Olsson, & Lundstrom, 2010). Indeed, as elegantly stated by the late Larry Katz at a lecture he gave at the Association for Chemoreception Sciences: “The olfactory system did not evolve to decode the catalogue of Sigma-Aldrich, it evolved to decode the world around us.” In other words, perhaps if we reexamine the olfactory stimulus space from a perceptual rather than a chemical perspective, we may gain important insight into the function of the olfactory system. It is with this notion in mind that we have recently generated an olfactory perceptual metric, and tested its application to perception and neural activity in the olfactory system.
In an effort led by Rehan Khan (Khan et al., 2007), we constructed a perceptual “odor space” using data from Dravnieks’ Atlas of Odor Character Profiles, wherein about 150 experts (perfumers and olfactory scientists) rated (from 0 to 5, reflecting “absent” to “extremely” representative) 160 odorants (144 monomolecular species and 16 mixtures) against each of the 146 verbal descriptors (Dravnieks, 1982, 1985). We applied principal components analysis (PCA), a well-established method for dimension reduction that generates a new set of dimensions (principal components, or PCs) for the profile space in which (1) each successive dimension has the maximal possible variance and (2) all dimensions are uncorrelated. We found that the effective dimensionality of the odor profile space was much smaller than 146, with the first four PCs accounting for 54 percent of the variance (Figure 6.9A). To generate a perceptual odor space, we projected the odorants onto a subspace formed by these first four PCs (Figure 6.9B. A navigable version of this space is available at the odor space link at gy/worg). In a series of experiments, we found that this space formed a valid representation of odorant perception: Simple Euclidean distances in the space predicted both explicit (Figure 6.9C) and implicit (Figure 6.9D) odor similarity. In other words, odorants close in the space smell similar, and odorants far apart in the space smell dissimilar (Khan et al., 2007). Moreover, we found that the primary dimension in the space (PC1) was tightly linked to odorant pleasantness, that is, a continuum ranging from very unpleasant at one end to very pleasant at the other (Haddad et al., 2010, Figure 6.10).

Finding that pleasantness was the primary dimension of human olfactory perception was consistent with many previous efforts. Odorant pleasantness was the primary aspect of odor spontaneously used by subjects in olfactory discrimination tasks (S. S. Schiffman, 1974), and odorant pleasantness was the primary criterion spontaneously used by subjects in order to combine odorants into groups (Berglund, Berglund, Engen, & Ekman, 1973; S. Schiffman, Robinson, & Erickson, 1977). When using large numbers of verbal descriptors in order to describe odorants, pleasantness repeatedly emerged as the primary dimension in multidimensional analyses of the resultant descriptor space (Khan et al., 2007; Moskowitz & Barbe, 1977). Studies with newborns suggested that at least some aspects of olfactory pleasantness are innate (Soussignan, Schaal, Marlier, & Jiang, 1997; Steiner, 1979). For example, neonates’ behavioral markers of disgust (nose wrinkling, upper lip raising) discriminated between vanillin judged as being pleasant and butyric acid judged to be unpleasant by adult raters (Soussignan et al., 1997). Moreover, there is agreement in the assessments of pleasantness by adults and children for various pure odorants (Schmidt & Beauchamp, 1988) and personal odors (Mallet & Schaal, 1998).
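
The construction of a perceptual odor space described above (PCA on an odorant-by-descriptor matrix, projection onto the first four PCs, and Euclidean distances within that subspace) can be sketched in Python with NumPy. The data below are synthetic: the real atlas ratings are not reproduced, and the four latent factors, their scales, and the helper names `pca` and `perceptual_distance` are assumptions made only for illustration. This is not the analysis code of the cited studies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the profile matrix: 160 odorants x 146 descriptors.
# Four latent factors plus noise are an assumption chosen only to give the
# data a low effective dimensionality.
n_odorants, n_descriptors, n_latent = 160, 146, 4
latent = rng.normal(size=(n_odorants, n_latent)) * np.array([4.0, 3.0, 2.0, 1.5])
loadings = rng.normal(size=(n_latent, n_descriptors))
ratings = latent @ loadings + rng.normal(size=(n_odorants, n_descriptors))

def pca(data):
    """Return (scores, components, explained_variance_ratio) via SVD."""
    centered = data - data.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    variance = s ** 2 / (data.shape[0] - 1)
    # Singular values come out in descending order, so successive PCs
    # carry successively less variance and their scores are uncorrelated.
    return u * s, vt, variance / variance.sum()

scores, components, ratio = pca(ratings)
odor_space = scores[:, :4]  # coordinates on the first four PCs

def perceptual_distance(i, j):
    """Euclidean distance between odorants i and j in the 4-PC space."""
    return float(np.linalg.norm(odor_space[i] - odor_space[j]))

print("variance explained by PC1-PC4:", ratio[:4].sum())
print("distance between odorants 0 and 1:", perceptual_distance(0, 1))
```

In a real analysis the distances returned by `perceptual_distance` would be compared with measured similarity ratings or reaction times; here they only show how the quantity is computed.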


Figure 6.9 Olfactory perceptual space. A, The proportion of (descending line) and cumulative (ascending line) variance in perceptual descriptions explained by each of the principal components (PCs). B, The 144 odorants projected into a two-dimensional space made of the first and second PCs. Nine odorants used in experiments depicted in C and D: [acetophenone (AC), amyl acetate (AA), diphenyl oxide (DP), ethyl butyrate (EB), eugenol (EU), guaiacol (GU), heptanal (HP), hexanoic acid (HX), and phenyl ethanol (PEA)]. C, For the nine odorants, the correlation between explicit perceived similarity ratings and PCA-based distance for all pairwise comparisons. Odorants closer in the perceptual space were perceived as more similar. D, Reaction time for correct trials in a forced-choice same–different task using five of the nine odorants. Error bars reflect SE. The reaction time was longer for odorant pairs that were closer in PCA-based space, thus providing an implicit validation of the perceptual space. Reprinted with permission from Khan et al., 2007.

Figure 6.10 Identifying pleasantness as the first PC of perception. A, The five descriptors that flanked each end of PC1 of perception. B, For the nine odorants in Figure 6.9, the correlation between the pairwise difference in pleasantness and the pairwise distance along the first PC. Distance along the first PC was a strong predictor of difference in pleasantness. Reprinted with permission from Khan et al., 2007.

After using PCA to reduce the apparent dimensionality of olfactory perception, we set out to independently apply the same approach to odorant structure. We used structural chemistry software to obtain 1,514 physicochemical descriptors for each of 1,565 odorants. These descriptors were of many types (e.g., atom counts, functional group counts, counts of types of bonds, molecular weights, topological descriptors). We applied PCA to these data and found that much of the variance could be explained by a relatively small number of PCs. The first PC accounted for about 32 percent of the variance, and the first ten accounted for about 70 percent of the variance.

Figure 6.11 Relating physicochemical space to perceptual space. A, The correlation between the first to fourth (descending in the figure) perceptual PCs and each of the first seven physicochemical PCs for the 144 odorants. Error bars reflect the SE from 1,000 bootstrap replicates. The best correlation was between the first PC of perception and the first PC of physicochemical space. This correlation was significantly larger than all other correlations. B, For the 144 odorants, the correlation between their actual first perceptual PC value and the value our model predicted from their physicochemical data. Reprinted with permission from Khan et al., 2007.

Because we separately generated PC spaces for perception and structure, we could then ask whether these two spaces were related in any way. In other words, we tested for a correlation between perceptual PCs and physicochemical PCs. Strikingly, the strongest correlation was between the first perceptual PC and the first physicochemical PC (Figure 6.11A). In other words, there was a privileged relationship between PC1 of perception and PC1 of physicochemical organization. The single best axis for explaining the variance in the physicochemical data was the best predictor of the single best axis for explaining the variance in the perceptual data.

Having established that the physicochemical space is related to the perceptual space, we next built a linear predictive model through a cross-validation procedure that allowed us to predict odor perception from odorant structure (Figure 6.11B). To test the predictive power of our model, we obtained physicochemical parameters for 52 odorants commonly used in olfaction experiments, but not present in the set of 144 used in the model building. We applied our model to the fifty-two new molecules so that for each we had predicted values for the first PC of perceptual space. We found that using these PC values, we could convincingly predict the rank-order of pleasantness of these molecules (Spearman rank correlation, r = 0.72; p = 0.0004), and modestly yet significantly predict their actual pleasantness ratings (r = 0.55; p = 0.004). Moreover, we obtained similar predictive power across three different cultures: urban Americans in California, rural Muslim Arab Israelis, and urban Jewish Israelis (Figure 6.12).
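
The logic of the prediction test described above (fit a linear model under cross-validation, apply it to odorants that were not used for fitting, and score the predictions with a Spearman rank correlation) can be sketched as follows. Everything specific here is an assumption for illustration: the data are synthetic, the use of seven physicochemical PCs as predictors, the eight folds, the noise level, and the function names are hypothetical, and the rank correlation helper assumes no tied values. It is a sketch of the general technique, not the model reported in the cited work.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins (assumptions, not the published data): 144 "training"
# odorants described by 7 physicochemical PC scores, and a perceptual PC1
# that depends linearly on them plus noise.
n_train, n_new, n_pcs = 144, 52, 7
true_weights = np.array([1.0, 0.4, 0.0, 0.2, 0.0, 0.0, 0.1])
chem_train = rng.normal(size=(n_train, n_pcs))
percept_train = chem_train @ true_weights + rng.normal(scale=0.8, size=n_train)

def fit_linear(x, y):
    """Least-squares weights with an intercept column appended."""
    design = np.column_stack([x, np.ones(len(x))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict(coef, x):
    """Apply fitted weights (last entry is the intercept) to new rows."""
    return np.column_stack([x, np.ones(len(x))]) @ coef

def cross_validated_predictions(x, y, n_folds=8):
    """Predict every sample from a model fitted without its fold."""
    order = rng.permutation(len(x))
    held_out = np.empty(len(x))
    for fold in np.array_split(order, n_folds):
        train = np.setdiff1d(order, fold)
        held_out[fold] = predict(fit_linear(x[train], y[train]), x[fold])
    return held_out

def spearman(a, b):
    """Spearman rank correlation (assumes no tied values)."""
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return float(np.corrcoef(rank_a, rank_b)[0, 1])

cv_pred = cross_validated_predictions(chem_train, percept_train)
print("cross-validated Pearson r:", np.corrcoef(cv_pred, percept_train)[0, 1])

# Apply the model fitted on all 144 odorants to 52 unseen odorants.
chem_new = rng.normal(size=(n_new, n_pcs))
percept_new = chem_new @ true_weights + rng.normal(scale=0.8, size=n_new)
pred_new = predict(fit_linear(chem_train, percept_train), chem_new)
print("Spearman r on new odorants:", spearman(pred_new, percept_new))
```

Scoring the held-out odorants with a rank correlation asks only whether the predicted ordering is right, which is a weaker and often more robust claim than predicting the ratings themselves.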

Figure 6.12 Predicting odorant pleasantness across cultures. Twenty-seven odorous molecules not commonly used in olfactory studies, and not previously tested by us, were presented to three cultural groups of naïve subjects: urban Americans (23 subjects), rural Arab Israelis (22 subjects), and urban Jewish Israelis (20 subjects). Reprinted with permission from Khan et al., 2007.

An aspect of these results that has been viewed as challenging by many is that they imply that pleasantness is written into the molecular structure of odorants and is therefore by definition innate. This can be viewed as inconsistent with the high levels of cross-individual and cross-cultural variability in odor perception (Ayabe-Kanamura et al., 1998; Wysocki, Pierce, & Gilbert, 1991). We indeed think that odor pleasantness is hard-wired and innate. Consistent with this, many odors have clear hedonic value despite no previous experience or exposure (Soussignan et al., 1997; Steiner, 1979), and moreover, the metric that links this hedonic value with odorant structure (PC1 of structure) predicts responses across species (Mandairon, Poncelet, Bensafi, & Didier, 2009). Nevertheless, we stress that an innate hard-wired link remains highly susceptible to the influences of learning, experience, and context. For example, no one would dispute that perceived color is innately hard-wired to wavelength. However, a given wavelength can be perceived to have very different colors as a function of context (see striking online demonstrations at ). Moreover, no one would dispute that location in space is reflected in location on the retina in an innate and hard-wired fashion. Nevertheless, context can alter spatial perception, as clearly evident in the Müller-Lyer illusion. Similarly, we argue that odor pleasantness is hard-wired to (PC1 of) odorant structure, yet this link is clearly modified by learning, experience, and context.

Olfaction is often described as multidimensional. It involves a multidimensional stimulus, which is transduced by about 1,000 different receptor types, giving rise to a neural image of similarly high dimensionality. Yet our results suggested that very few dimensions, in fact primarily one, capture a significant portion of the variance in olfactory perception, and critically, this one dimension allows for modest yet accurate predictions of odor perception from odorant structure. With this in mind, in an effort led by Rafi Haddad (Haddad, Khan, et al., 2008; Haddad, Lapid, Harel, & Sobel, 2008; Haddad et al., 2010), we set out to ask whether this reduced dimensionality was reflected in any way in neural activity. We mined all available previously published data sets that reported the neural response in a sizable number of receptor types or glomeruli to a sizable number of odorants. This yielded 12 data sets obtained with either electrical or optical recording methods. Once again, we applied PCA to these data. The first two PCs alone explained about 58 percent of the variance in the neural activity data. Moreover, in nine of the twelve data sets we analyzed, we found a strong correlation between PC1 of neural response space and the summed activity of the sampled population, whether spike rates or optical signal, with r values ranging between 0.73 and 0.98 (all p < 0.001).
Considering that the summed response in the olfactory system of insects was previously identified as strongly predictive of insect approach or withdrawal (Kreher, Mathew, Kim, & Carlson, 2008) (Figure 6.13A), we set out to ask whether PC1 of neural activity in the mammalian olfactory system was similarly related to behavior and perception. One of the data sets we studied was that of Saito et al. (2009), who reported the neural response of ten human neurons and fifty-three mouse neurons in vitro to a set of sixty-two odorants. We asked eighteen human subjects to rate the odorant pleasantness of twenty-six odorants randomly selected from those tested by Saito et al. (2009). The correlation between human receptor PC1 and odorant pleasantness was 0.49 (p < 0.009), and if we added the mouse receptor response, it was 0.71 (p < 0.0001) (Figure 6.13B). To reiterate this critical result, PC1 of odorant-induced neural activity measured in a dish by one group at Duke University in the United States was a significant predictor of odorant pleasantness, as estimated by human subjects tested by a different group at the Weizmann Institute in Israel.

Finally, we also conducted an initial investigation into the second principal component of activity. Given that PC1 of neural activity reflected approach or withdrawal in animals, we speculated that once approached, a second decision to be made regarding an odor is whether it is edible or poisonous. Consistent with this prediction, we found significant correlations between PC2 of neural activity and odorant toxicity in mice and in rats (Figure 6.13C), as well as a significant correlation between toxicity/edibility and PC2 of perception in humans (Figure 6.13D). Similar findings have been obtained by others independently (Zarzo, 2008).
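
The relationship reported above between PC1 of neural response space and the summed activity of the sampled population can be explored with a short Python sketch. The response matrix below is synthetic, and the generating assumption (each odorant has an overall drive shared by all receptor types, scaled by a receptor-specific sensitivity, plus noise) is a hypothetical choice made for illustration, as are the matrix dimensions. The sketch shows how the two quantities are computed and compared; it does not show that recorded data behave this way.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic response matrix (an assumption, not any published data set):
# 60 odorants x 24 receptor types, where each odorant has an overall "drive"
# shared by all receptors, scaled by a receptor-specific sensitivity.
n_odorants, n_receptors = 60, 24
drive = rng.gamma(shape=2.0, scale=1.0, size=(n_odorants, 1))
sensitivity = rng.uniform(0.5, 1.5, size=(1, n_receptors))
noise = rng.normal(scale=0.3, size=(n_odorants, n_receptors))
responses = drive * sensitivity + noise

# PCA via SVD of the column-centered matrix; PC1 scores are one value per odorant.
centered = responses - responses.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)
pc1_scores = u[:, 0] * s[0]

# Summed activity across the sampled population, one value per odorant.
summed_activity = responses.sum(axis=1)

r = np.corrcoef(pc1_scores, summed_activity)[0, 1]
# The sign of a principal component is arbitrary, so report |r|.
print("|r| between PC1 and summed activity:", abs(r))
print("share of variance on PC1:", s[0] ** 2 / np.sum(s ** 2))
```

Under the shared-drive assumption, PC1 lines up with the direction in which all receptors increase together, so its scores track the population sum closely. Whether a real data set has this structure is an empirical question.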
To conclude this section, we found that if one uses the human nose as a window onto olfaction, one obtains a surprisingly simplified picture that explains a significant portion of the variance in both neural activity and perception in this system. This picture relied on a set of simple linear transforms. It suggested that the primary axis of perception was linked to the primary axis of odorant structure, and that both of these were in turn related to the primary axis of neural activity in this system. Moreover, the second axis of perception was linked to the second axis of neural activity. Critically, these transforms allowed for modest but significant predictions of perception, structure, and neural activity across species.
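
The kind of linear analysis described in this section can be sketched in a few lines of Python. The sketch below is only an illustration under stated assumptions: the response matrix is synthetic (random numbers standing in for an odorant-by-receptor data set), and the names `pca_scores` and `pearson_r` are hypothetical helpers, not the published analysis code. It applies PCA to the matrix and then correlates the PC1 score of each odorant with the summed activity of the sampled population.

```python
import numpy as np


def pca_scores(responses):
    """PCA via SVD on an (odorants x units) response matrix.

    Returns the per-odorant scores on each principal component and the
    fraction of variance explained by each component.
    """
    centered = responses - responses.mean(axis=0)  # center each unit (column)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u * s                       # projection of each odorant on each PC
    explained = s**2 / np.sum(s**2)      # variance fraction per PC
    return scores, explained


def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


# Synthetic stand-in data (an assumption, not real recordings): each odorant
# drives all units through a shared factor, plus independent noise.
rng = np.random.default_rng(0)
n_odorants, n_units = 60, 20
drive = rng.normal(size=n_odorants)            # shared per-odorant drive
gains = rng.uniform(0.5, 1.5, size=n_units)    # per-unit sensitivity
noise = 0.3 * rng.normal(size=(n_odorants, n_units))
responses = np.outer(drive, gains) + noise

scores, explained = pca_scores(responses)
pc1 = scores[:, 0]
summed = responses.sum(axis=1)                 # summed population activity

# The sign of a principal component is arbitrary, so report |r|.
r = abs(pearson_r(pc1, summed))
print(f"PC1 explains {explained[0]:.2f} of the variance; |r| with summed activity = {r:.2f}")
```

With real data, `responses` would be replaced by the measured odorant-by-receptor matrix, and the PC1 scores could be correlated with pleasantness ratings in the same way.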

Looking at Human Behavior Through the Nose

Figure 6.13 The principal axes of neural space reflected olfactory behavior and perception. A, Correlation between PC1 of neural population activity and the odor preferences of Drosophila larvae. Every dot represents a single odor. B, Correlation between PC1 of neural space in humans and mice with human odor pleasantness. Every dot represents a single odor. C, Correlation between PC2 of neural population activity and oral toxicity for rats (LD50 values in mg/kg). Every dot represents an odor. D, Correlation between PC2 of human perceptual space and LD50 values of rats. Reprinted with permission from Haddad et al., 2010.

In the previous section, the human nose taught us about the mammalian olfactory system. This was (p. 102) possible because, in contrast to popular notions, the human nose is an astonishingly acute device. This is evident in unusually keen powers of detection and discrimination, which in some cases compete with those of macrosmatic mammals, or with those of sophisticated analytical equipment. These abilities have been detailed within recent reviews (Sela & Sobel, 2010; Shepherd, 2004, 2005; Stevenson, 2010; Yeshurun & Sobel, 2010; Zelano & Sobel, 2005). Here, we highlight key cases in which these keen olfactory abilities clearly influence human behavior.

As noted in the introduction, two aspects of human behavior that are, in our view, macrosmatic, are eating and mating. Notably, both are critical for survival. A third human behavior for which olfactory influences are less clearly apparent, yet in our view are nevertheless critical, is social interaction. It is beyond the scope of this chapter to provide a comprehensive review of human chemosignaling, as recently done elsewhere (Stevenson, 2010). Here, we selectively choose examples to highlight the role of olfaction in human behavior.

Eating

We eat what tastes good (Drewnowski, 1997). Taste, or more accurately flavor, is dominated by smell (Small, Jones-Gotman, Zatorre, Petrides, & Evans, 1997). Hence, things taste good because they smell good (Letarte, 1997). In other words, by determining the palatability and hedonic value of food, olfaction influences the balance of food intake (Rolls, 2006; Saper, Chou, & Elmquist, 2002; Yeomans, 2006). In addition to this very simple basic premise, there are also several direct and indirect lines of evidence that highlight the significance of olfaction in eating behavior. For example, olfaction drives salivation even at subthreshold odor concentrations (Pangborn & Berggren, 1973; Rogers & Hill, 1989). Odors regulate appetite (Rogers & Hill, 1989) and affect the cephalic phase of insulin secretion (W. G. Johnson & Wildman, 1983; Louis-Sylvestre & Le Magnen, 1980) and gastric acid secretion (Feldman & Richardson, 1986).

The interaction between olfaction and eating is bidirectional. Olfaction influences eating, and eating behavior and mechanisms influence olfaction. The nature of this influence, however, remains controversial. For example, whereas some studies suggest that hunger increases olfactory sensitivity to food odors (Guild, 1956; Hammer, 1951; Schneider & Wolf, 1955; Stafford & Welbeck, 2010), others failed to replicate these results (Janowitz & Grossman, 1949; Zilstorff-Pedersen, 1955), or even found the opposite: higher sensitivity in satiety (Albrecht et al., 2009). Hunger and satiety influence not only sensitivity but also hedonics: Odors of foods consumed to satiety become less pleasant (Albrecht et al., 2009; Rolls & Rolls, 1997). This satiety-driven shift in hedonic representation is accompanied by altered brain representation. This was uncovered in an elegant human brain-imaging study in which eating bananas to satiety changed the representation of banana odor in the orbitofrontal cortex (O’Doherty et al., 2000). Also, an odor encoded during inactivation of taste cortex in rats was later remembered as the same only during similar taste-cortex inactivation (Fortis-Santiago, Rodwin, Neseliler, Piette, & Katz, 2009). The mechanism for these shifted representations may be evident at the earliest stages of olfactory processing: Perfusion of the eating-related hormones insulin and leptin onto olfactory receptor neurons in rats significantly increased spontaneous firing frequency in the absence of odors and decreased odorant-induced peak amplitude in response to food odors (Ketterer et al., 2010; Savigner et al., 2009). Therefore, by increasing spontaneous activity but reducing odorant-induced activity of olfactory receptor neurons, elevated levels of insulin and leptin (such as after a meal) may result in decreased global signal-to-noise ratio in the olfactory epithelium (Ketterer et al., 2010; Savigner et al., 2009).
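
The signal-to-noise argument above can be made concrete with a small worked example. The sketch below assumes, for illustration only, that signal-to-noise ratio is defined as the odorant-evoked response divided by the spontaneous firing rate; the source does not give a formula, and all numbers are hypothetical values in arbitrary units rather than measurements.

```python
def signal_to_noise(evoked_peak, spontaneous_rate):
    """Ratio of odorant-evoked response to spontaneous (background) activity.

    This ratio definition is an assumption made for illustration.
    """
    if spontaneous_rate <= 0:
        raise ValueError("spontaneous_rate must be positive")
    return evoked_peak / spontaneous_rate


# Hypothetical values (arbitrary units), chosen only to show the direction
# of the effect: after a meal, spontaneous activity rises and the evoked
# response falls, so the ratio drops on both counts.
snr_fasted = signal_to_noise(evoked_peak=10.0, spontaneous_rate=2.0)
snr_fed = signal_to_noise(evoked_peak=8.0, spontaneous_rate=4.0)

print(f"fasted: {snr_fasted:.1f}, fed: {snr_fed:.1f}")
```

Because the two hormone effects act on the numerator and the denominator in opposite directions, even modest changes in each compound into a larger drop in the ratio.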


The importance of olfaction for human eating behavior is clearly evidenced in cases of olfactory loss. Anosmic patients experience distortions in flavor perception (Bonfils, Avan, Faulcon, & Malinvaud, 2005) and changes in eating behavior (Aschenbrenner et al., 2008). Simulating anosmia in healthy subjects by intranasal lidocaine administration resulted in reduced hunger ratings (Greenway et al., 2007). Nevertheless, the rate of abnormal body mass index among anosmic people is no higher than in the general population (Aschenbrenner et al., 2008).

Several eating disorders, ranging from obesity (Hoover, 2010; Obrebowski, Obrebowska-Karsznia, & Gawlinski, 2000; Richardson, Vander Woude, Sudan, Thompson, & Leopold, 2004; Snyder, Duffy, Chapo, Cobbett, & Bartoshuk, 2003) to anorexia (Fedoroff, Stoner, Andersen, Doty, & Rolls, 1995; Roessner, Bleich, Banaschewski, & Rothenberger, 2005), have been associated with alterations in olfactory perception, and the nutritional challenge associated with aging has been clearly linked to the age-related loss of olfaction (Cain & Gent, 1991; Doty, 1989; (p. 103) S. S. Schiffman, 1997). Accordingly, artificially increasing the odorous properties of foods helps overcome the nutritional challenge in aging (Mathey, Siebelink, de Graaf, & Van Staveren, 2001; S. S. Schiffman & Warwick, 1988; Stevens & Lawless, 1981).

Consistent with the bidirectional influences of olfaction and eating behavior, edibility is clearly a key category in odor perception. It was identified as the second principal axis of perception independently by us (Haddad et al., 2010) and others (Zarzo, 2008). Consistent with edibility as an olfactory category, olfactory responses are stronger (Small et al., 2005) and faster (Boesveldt, Frasnelli, Gordon, & Lundstrom, 2010), and identification is more accurate (Fusari & Ballesteros, 2008), for food over nonfood odors. Moreover, whereas humans are poor at spontaneous odor naming, they are very good at spontaneous rating of odor edibility, even in childhood (de Wijk & Cain, 1994a, 1994b). Indeed, olfactory preferences of neonates are influenced by their mother’s food preferences during pregnancy (Schaal, Marlier, & Soussignan, 2000), suggesting that the powerful link between olfactory preferences and eating behavior is formed at the earliest stages of development.

Mating

When explaining their choice of a sexual partner, some may list physical and personality qualities, whereas others may just explain the choice by a “simple click” or “chemistry.” Is this “click” indeed chemical? Although, as noted, humans tend to underestimate their own olfactory abilities, humans can nevertheless use olfaction to discriminate the genetic makeup of potential mating partners. The human genome includes a region called human leukocyte antigen (HLA), which consists of many genes related to the immune system, in addition to olfactory receptor genes and pseudogenes. Several studies have found that women can use smell to discriminate between men as a function of similarity between their own and the men’s HLA alleles (Eggert, Muller-Ruchholtz, & Ferstl, 1998; Jacob, McClintock, Zelano, & Ober, 2002; Ober et al., 1997; Wedekind & Furi, 1997; Wedekind, Seebeck, Bettens, & Paepke, 1995). The “ideal” smell of genetic makeup remains controversial, yet most evidence suggests that women prefer an odor of a man with HLA alleles not identical to their own, but at the same time not too different (Jacob et al., 2002; T. Roberts & Roiser, 2010). In turn, this preference may be for major histocompatibility complex (MHC) heterozygosity rather than dissimilarity (Thornhill et al., 2003). Olfactory mate preference, however, is plastic. For example, single women preferred odors of MHC-similar men, whereas women in relationships preferred odors of MHC-dissimilar men (S. C. Roberts & Little, 2008). Moreover, olfactory mate preferences are influenced by the menstrual cycle (Gangestad & Cousins, 2001; Havlicek, Roberts, & Flegr, 2005; Little, Jones, & Burriss, 2007; Singh & Bronstad, 2001) (Figure 6.14A) and by hormone-based contraceptives (S. C. Roberts, Gosling, Carter, & Petrie, 2008; Wedekind et al., 1995; Wedekind & Furi, 1997).

Finally, although not directly related to mate selection, the clearest case of chemical communication in humans also has clear potential implications for mating behavior. This is the phenomenon of menstrual synchrony, whereby women who live in close proximity, such as roommates in dorms, synchronize their menstrual cycle over time (McClintock, 1971). This effect is mediated by an odor in sweat. This was verified in a series of studies in which experimenters obtained underarm sweat extracts from donor women during either the ovulatory or follicular menstrual phase. These extracts were then deposited on the upper lips of recipient women, where follicular sweat accelerated ovulation, and ovulatory sweat delayed it (Russell, Switz, & Thompson, 1980; Stern & McClintock, 1998) (Figure 6.14B). Moreover, variation in menstrual timing can be increased by the odor of lactating women (Jacob et al., 2004) or regulated by the odor of male hormones (Cutler et al., 1986; Wysocki & Preti, 2004).

Olfactory influences on mate preferences are not restricted to women. Men can detect an HLA odor different from their own when taken from either men or women odor donors, and rate the similar odor as more pleasant for donors of both sexes (Thornhill et al., 2003; Wedekind & Furi, 1997). In addition, men preferred the scent of common over rare MHC alleles (Thornhill et al., 2003). Moreover, unrelated to HLA similarity, male raters can detect the menstrual phase of female body odor donors. The follicular phase is rated as more pleasant and sexy than the luteal phase (Singh & Bronstad, 2001), an effect that is diminished when the women use hormonal contraceptives (Kuukasjarvi et al., 2004; Thornhill et al., 2003).


Figure 6.14 Human chemosignaling. A, Women’s preference for symmetrical men’s odor as a function of probability of conception, based on actuarial values. Normally ovulating (non-pill-using) women only. Positive regression values reflect increased relative attraction to scent of symmetrical males; r = 0.54, p < 0.005. (From Gangestad & Thornhill, 1998.) B, Change in length of the recipient’s cycle. Cycles were shorter than baseline during exposure to follicular compounds (t = 1.78; p ≤ 0.05, 37 cycles) but longer during exposure to ovulatory compounds (t = 2.7; p ≤ 0.01, 38 cycles). Cycles during exposure to the carrier were not different from baseline (t = 0.05; p ≤ 0.96, 27 cycles). (From Stern & McClintock, 1998.) C, Post-smell testosterone levels (controlling for pre-smell testosterone levels) among men exposed to the odor of a woman close to ovulation, the odor of a woman far from ovulation, or a control odor. Error bars represent standard errors. Reprinted with permission from Miller & Maner, 2010.

These behavioral results are echoed in hormone expression. Men exposed to the scent of an ovulating woman subsequently displayed higher levels of testosterone than did men exposed to the scent of a (p. 104) nonovulating woman or a control scent (Miller & Maner, 2010) (Figure 6.14C). Moreover, a recent study on chemosignals in human tears revealed a host of influences on sexual arousal (Gelstein et al., 2011). Sniffing negative-emotion-related odorless tears obtained from women donors induced reductions in sexual appeal attributed by men to pictures of women’s faces. Sniffing tears also reduced self-rated sexual arousal, reduced physiological measures of arousal, and reduced levels of testosterone. Finally, fMRI revealed that sniffing women’s tears selectively reduced activity in brain substrates of sexual arousal in men (Gelstein et al., 2011) (Figure 6.15).


Social Interaction

Figure 6.15 A chemosignal in human tears. Sniffing odorless emotional tears obtained from women donors altered brain activity in the substrates of arousal in men and significantly lowered levels of salivary testosterone.

Whereas olfactory influences on human eating and mating are intuitively obvious, olfactory cues may play into aspects of human social interaction that have been less commonly associated with smell. Many such types of social chemosignaling have been examined (Meredith, 2001), but here we will detail only one particular case that has received more attention than others, and that is the ability of humans to smell fear. Fear or distress chemosignals are prevalent throughout animal species (Hauser et al., 2008; Pageat & Gaultier, 2003). In an initial study in humans, Chen and Haviland-Jones (2000) collected underarm odors on gauze pads from young women and men after they watched funny or frightening movies. They later asked other women and men to determine by smell which was the odor of people when they were “happy” or “afraid.” Women correctly identified happiness in men and women, and fear in men. Men correctly identified happiness in women and fear in men. A (p. 105) similar result was later obtained in a study that examined women only (Ackerl, Atzmueller, & Grammer, 2002). Moreover, women had improved performance in a cognitive verbal task after smelling fear sweat versus neutral sweat (Chen, Katdare, & Lucas, 2006), and the smell of fearful sweat biased women toward interpreting ambiguous expressions as more fearful, but had no effect when the facial emotion was more discernible (Zhou & Chen, 2009). Moreover, subjects had an increased startle reflex when exposed to anxiety-related sweat versus sports-related sweat (Prehn, Ohrt, Sojka, Ferstl, & Pause, 2006). Finally, imaging studies have revealed dissociable brain representations after smelling anxiety sweat versus sports-related sweat (Prehn-Kristensen et al., 2009). These differences are particularly pronounced in the amygdala, a brain substrate common to olfaction, fear responses, and emotional regulation of behavior (Mujica-Parodi et al., 2009). Taken together, this body of research strongly suggests that humans can discriminate the scent of fear from other body odors, and it is not unlikely that this influences behavior. We think that smelling fear or distress is by no means one of the key roles of human chemical communication, yet we have chosen to detail this particular example of human social chemosignaling because it has received increased experimental attention. We think that chemosignaling in fact plays into many aspects of human social interaction, and uncovering these instances of chemosignaling is a major goal for research in our field.

Final Word

We have described the functional neuroanatomy of the mammalian sense of smell. This system is highly conserved (Ache & Young, 2005), and therefore the human sense of smell is not very different from that of other mammals. With this in mind, just as a deep understanding of human visual psychophysics provided the basis for probing vision neurobiology, we propose that a solid understanding of human olfactory psychophysics is a prerequisite to understanding the neurobiological mechanisms of the sense of smell. Moreover, olfaction significantly influences critical human behaviors directly related to survival, such as eating, mating, and social interaction. Better understanding of these olfactory influences is key, in our view, to a comprehensive picture of human behavior.

References

Ache, B. W., & Young, J. M. (2005). Olfaction: Diverse species, conserved principles. Neuron, 48 (3), 417–430.
Ackerl, K., Atzmueller, M., & Grammer, K. (2002). The scent of fear. Neuroendocrinology Letters, 23 (2), 79–84.
Albrecht, J., Schreder, T., Kleemann, A. M., Schopf, V., Kopietz, R., Anzinger, A., et al. (2009). Olfactory detection thresholds and pleasantness of a food-related and a non-food odour in hunger and satiety. Rhinology, 47 (2), 160–165.
Allison, A. (1954). The secondary olfactory areas in the human brain. Journal of Anatomy, 88, 481–488.
Aschenbrenner, K., Hummel, C., Teszmer, K., Krone, F., Ishimaru, T., Seo, H. S., et al. (2008). The influence of olfactory loss on dietary behaviors. Laryngoscope, 118 (1), 135–144.
Ayabe-Kanamura, S., Schicker, I., Laska, M., Hudson, R., Distel, H., Kobayakawa, T., et al. (1998). Differences in perception of everyday odors: A Japanese-German cross-cultural study. Chemical Senses, 23 (1), 31–38.
Barkai, E., & Hasselmo, M. E. (1994). Modulation of the input/output function of rat piriform cortex pyramidal cells. Journal of Neurophysiology, 72 (2), 644.
Barnes, D. C., Hofacer, R. D., Zaman, A. R., Rennaker, R. L., & Wilson, D. A. (2008). Olfactory perceptual stability and discrimination. Nature Neuroscience, 11 (12), 1378–1380.
Bathellier, B., Buhl, D. L., Accolla, R., & Carleton, A. (2008). Dynamic ensemble odor coding in the mammalian olfactory bulb: Sensory information at different timescales. Neuron, 57 (4), 586–598.
Bender, G., Hummel, T., Negoias, S., & Small, D. M. (2009). Separate signals for orthonasal vs. retronasal perception of food but not nonfood odors. Behavioral Neuroscience, 123 (3), 481–489.
Bensafi, M., Porter, J., Pouliot, S., Mainland, J., Johnson, B., Zelano, C., et al. (2003). Olfactomotor activity during imagery mimics that during perception. Nature Neuroscience, 6 (11), 1142–1144.
Berglund, B., Berglund, U., Engen, T., & Ekman, G. (1973). Multidimensional analysis of 21 odors. Scandinavian Journal of Psychology, 14 (2), 131–137.
Boesveldt, S., Frasnelli, J., Gordon, A. R., & Lundstrom, J. N. (2010). The fish is bad: Negative food odors elicit faster and more accurate reactions than other odors. Biological Psychology, 84 (2), 313–317.
Boesveldt, S., Olsson, M. J., & Lundstrom, J. N. (2010). Carbon chain length and the stimulus problem in olfaction. Behavioral Brain Research, 215 (1), 110–113.
Bonfils, P., Avan, P., Faulcon, P., & Malinvaud, D. (2005). Distorted odorant perception: Analysis of a series of 56 patients with parosmia. Archives of Otolaryngology–Head and Neck Surgery, 131 (2), 107–112.
Breer, H., Fleischer, J., & Strotmann, J. (2006). The sense of smell: Multiple olfactory subsystems. Cellular and Molecular Life Sciences, 63 (13), 1465–1475.
Brunjes, P. C., Illig, K. R., & Meyer, E. A. (2005). A field guide to the anterior olfactory nucleus (cortex). Brain Research, Brain Research Reviews, 50 (2), 305–335.
Buck, L. B. (2000). The molecular architecture of odor and pheromone sensing in mammals. Cell, 100 (6), 611–618.
Buck, L., & Axel, R. (1991). A novel multigene family may encode odorant receptors: A molecular basis for odor recognition. Cell, 65 (1), 175–187.
Cain, W. S., & Gent, J. F. (1991). Olfactory sensitivity: Reliability, generality, and association with aging. Journal of Experimental Psychology: Human Perception and Performance, 17 (2), 382–391.
Carmichael, S. T., Clugnet, M. C., & Price, J. L. (1994). Central olfactory connections in the macaque monkey. Journal of Comparative Neurology, 346 (3), 403–434.

Chen, D., & Haviland-Jones, J. (2000). Human olfactory communication of emotion. Perceptual and Motor Skills, 91 (3 Pt 1), 771.
(p. 106)
Chen, D., Katdare, A., & Lucas, N. (2006). Chemosignals of fear enhance cognitive performance in humans. Chemical Senses, 31 (5), 415.
Cleland, T. A., & Sullivan, R. M. (2003). Central olfactory structures. In R. L. Doty (Ed.), Handbook of olfaction and gustation (2nd ed., pp. 165–180). New York: Marcel Dekker.
Cohen, Y., Reuveni, I., Barkai, E., & Maroun, M. (2008). Olfactory learning-induced long-lasting enhancement of descending and ascending synaptic transmission to the piriform cortex. Journal of Neuroscience, 28 (26), 6664.
Cross, D. J., Flexman, J. A., Anzai, Y., Morrow, T. J., Maravilla, K. R., & Minoshima, S. (2006). In vivo imaging of functional disruption, recovery and alteration in rat olfactory circuitry after lesion. NeuroImage, 32 (3), 1265–1272.
Cutler, W. B., Preti, G., Krieger, A., Huggins, G. R., Garcia, C. R., & Lawley, H. J. (1986). Human axillary secretions influence women’s menstrual cycles: The role of donor extract from men. Hormones and Behavior, 20 (4), 463–473.
de Olmos, J., Hardy, H., & Heimer, L. (1978). The afferent connections of the main and the accessory olfactory bulb formations in the rat: An experimental HRP-study. Journal of Comparative Neurology, 15 (181), 213–244.
de Wijk, R. A., & Cain, W. S. (1994a). Odor identification by name and by edibility: Lifespan development and safety. Human Factors, 36 (1), 182–187.
de Wijk, R. A., & Cain, W. S. (1994b). Odor quality: Discrimination versus free and cued identification. Perception and Psychophysics, 56 (1), 12–18.
DeMaria, S., & Ngai, J. (2010). The cell biology of smell. Journal of Cell Biology, 191 (3), 443–452.
Doty, R. L. (1989). Influence of age and age-related diseases on olfactory function. Annals of the New York Academy of Sciences, 561, 76–86.
Dravnieks, A. (1982). Odor quality: Semantically generated multi-dimensional profiles are stable. Science, 218, 799–801.
Dravnieks, A. (1985). Atlas of odor character profiles. Philadelphia: ASTM Press.
Drewnowski, A. (1997). Taste preferences and food intake. Annual Review of Nutrition, 17 (1), 237–253.
Eggert, F., Muller-Ruchholtz, W., & Ferstl, R. (1998). Olfactory cues associated with the major histocompatibility complex. Genetica, 104 (3), 191–197.

Fedoroff, I. C., Stoner, S. A., Andersen, A. E., Doty, R. L., & Rolls, B. J. (1995). Olfactory dysfunction in anorexia and bulimia nervosa. International Journal of Eating Disorders, 18 (1), 71–77.
Feinstein, P., & Mombaerts, P. (2004). A contextual model for axonal sorting into glomeruli in the mouse olfactory system. Cell, 117 (6), 817–831.
Feldman, M., & Richardson, C. T. (1986). Role of thought, sight, smell, and taste of food in the cephalic phase of gastric acid secretion in humans. Gastroenterology, 90 (2), 428–433.
Ferrero, D. M., & Liberles, S. D. (2010). The secret codes of mammalian scents. Wiley Interdisciplinary Reviews: Systems Biology and Medicine, 2 (1), 23–33.
Firestein, S. (2001). How the olfactory system makes sense of scents. Nature, 413 (6852), 211–218.
Fortis-Santiago, Y., Rodwin, B. A., Neseliler, S., Piette, C. E., & Katz, D. B. (2009). State dependence of olfactory perception as a function of taste cortical inactivation. Nature Neuroscience, 13 (2), 158–159.
Franco, M. I., Turin, L., Mershin, A., & Skoulakis, E. M. (2011). Molecular vibration-sensing component in Drosophila melanogaster olfaction. Proceedings of the National Academy of Sciences U S A, 108 (9), 3797–3802.
Frasnelli, J., Lundstrom, J. N., Boyle, J. A., Katsarkas, A., & Jones-Gotman, M. (2011). The vomeronasal organ is not involved in the perception of endogenous odors. Human Brain Mapping, 32 (3), 450–460.
Fusari, A., & Ballesteros, S. (2008). Identification of odors of edible and nonedible stimuli as affected by age and gender. Behavior Research Methods, 40 (3), 752.
Gangestad, S. W., & Cousins, A. J. (2001). Adaptive design, female mate preferences, and shifts across the menstrual cycle. Annual Review of Sex Research, 12, 145–185.
Gangestad, S. W., & Thornhill, R. (1998). Menstrual cycle variation in women’s preferences for the scent of symmetrical men. Proceedings of the Royal Society of London. B. Biological Sciences, 265 (1399), 927–933.
Gelstein, S., Yeshurun, Y., Rozenkrantz, L., Shushan, S., Frumin, I., Roth, Y., et al. (2011). Human tears contain a chemosignal. Science, 331 (6014), 226–230.
Gilad, Y., & Lancet, D. (2003). Population differences in the human functional olfactory repertoire. Molecular Biology and Evolution, 20 (3), 307–314.
Goldman, A. L., Van der Goes van Naters, W., Lessing, D., Warr, C. G., & Carlson, J. R. (2005). Coexpression of two functional odor receptors in one neuron. Neuron, 45 (5), 661–666.

Gottfried, J. A. (2010). Central mechanisms of odour object perception. Nature Reviews, Neuroscience, 11 (9), 628–641.
Gottfried, J. A., Winston, J. S., & Dolan, R. J. (2006). Dissociable codes of odor quality and odorant structure in human piriform cortex. Neuron, 49 (3), 467–479.
Gottfried, J. A., & Wu, K. N. (2009). Perceptual and neural pliability of odor objects. Annals of the New York Academy of Sciences, 1170, 324–332.
Graziadei, P. P., & Monti Graziadei, A. G. (1983). Regeneration in the olfactory system of vertebrates. American Journal of Otolaryngology, 4 (4), 228–233.
Greenway, F. L., Martin, C. K., Gupta, A. K., Cruickshank, S., Whitehouse, J., DeYoung, L., et al. (2007). Using intranasal lidocaine to reduce food intake. International Journal of Obesity (London), 31 (5), 858–863.
Grosmaitre, X., Fuss, S. H., Lee, A. C., Adipietro, K. A., Matsunami, H., Mombaerts, P., et al. (2009). SR1, a mouse odorant receptor with an unusually broad response profile. Journal of Neuroscience, 29 (46), 14545–14552.
Grosmaitre, X., Santarelli, L. C., Tan, J., Luo, M., & Ma, M. (2007). Dual functions of mammalian olfactory sensory neurons as odor detectors and mechanical sensors. Nature Neuroscience, 10 (3), 348–354.
Guild, A. A. (1956). Olfactory acuity in normal and obese human subjects: Diurnal variations and the effect of d-amphetamine sulphate. Journal of Laryngology and Otology, 70 (7), 408–414.
Haberly, L. B. (2001). Parallel-distributed processing in olfactory cortex: New insights from morphological and physiological analysis of neuronal circuitry. Chemical Senses, 26 (5), 551–576.
Haberly, L. B., & Bower, J. M. (1989). Olfactory cortex: Model circuit for study of associative memory? Trends in Neurosciences, 12 (7), 258–264.
Haddad, R., Khan, R., Takahashi, Y. K., Mori, K., Harel, D., & Sobel, N. (2008). A metric for odorant comparison. Nature Methods, 5 (5), 425–429.
(p. 107)
Haddad, R., Lapid, H., Harel, D., & Sobel, N. (2008). Measuring smells. Current Opinion in Neurobiology, 18 (4), 438–444.
Haddad, R., Weiss, T., Khan, R., Nadler, B., Mandairon, N., Bensafi, M., et al. (2010). Global features of neural activity in the olfactory system form a parallel code that predicts olfactory behavior and perception. Journal of Neuroscience, 30 (27), 9017–9026.
Hallem, E. A., & Carlson, J. R. (2006). Coding of odors by a receptor repertoire. Cell, 125 (1), 143–160.

Halpern, M. (1987). The organization and function of the vomeronasal system. Annual Review of Neuroscience, 10 (1), 325–362.
Hammer, F. J. (1951). The relation of odor, taste and flicker-fusion thresholds to food intake. Journal of Comparative and Physiological Psychology, 44 (5), 403–411.
Hauser, R., Marczak, M., Karaszewski, B., Wiergowski, M., Kaliszan, M., Penkowski, M., et al. (2008). A preliminary study for identifying olfactory markers of fear in the rat. Laboratory Animals (New York), 37 (2), 76–80.
Havlicek, J., Roberts, S. C., & Flegr, J. (2005). Women’s preference for dominant male odour: Effects of menstrual cycle and relationship status. Biology Letters, 1 (3), 256–259.
Hoover, K. C. (2010). Smell with inspiration: The evolutionary significance of olfaction. American Journal of Physical Anthropology, 143 (Suppl 51), 63–74.
Howard, J. D., Plailly, J., Grueschow, M., Haynes, J. D., & Gottfried, J. A. (2009). Odor quality coding and categorization in human posterior piriform cortex. Nature Neuroscience, 12 (7), 932–938.
Hummel, T. (2000). Assessment of intranasal trigeminal function. International Journal of Psychophysiology, 36 (2), 147–155.
Hummel, T. (2008). Retronasal perception of odors. Chemical Biodiversity, 5 (6), 853–861.
Illig, K. R., & Haberly, L. B. (2003). Odor-evoked activity is spatially distributed in piriform cortex. Journal of Comparative Neurology, 457 (4), 361–373.
Jacob, S., McClintock, M. K., Zelano, B., & Ober, C. (2002). Paternally inherited HLA alleles are associated with women’s choice of male odor. Nature Genetics, 30 (2), 175–179.
Jacob, S., Spencer, N. A., Bullivant, S. B., Sellergren, S. A., Mennella, J. A., & McClintock, M. K. (2004). Effects of breastfeeding chemosignals on the human menstrual cycle. Human Reproduction, 19 (2), 422–429.
Janowitz, H. D., & Grossman, M. I. (1949). Gustoolfactory thresholds in relation to appetite and hunger sensations. Journal of Applied Physiology, 2 (4), 217–222.
Johnson, B. A., Ong, J., & Leon, M. (2010). Glomerular activity patterns evoked by natural odor objects in the rat olfactory bulb are related to patterns evoked by major odorant components. Journal of Comparative Neurology, 518 (9), 1542–1555.
Johnson, B. N., Mainland, J. D., & Sobel, N. (2003). Rapid olfactory processing implicates subcortical control of an olfactomotor system. Journal of Neurophysiology, 90 (2), 1084–1094.
Johnson, D. M. G., Illig, K. R., Behan, M., & Haberly, L. B. (2000). New features of connectivity in piriform cortex visualized by intracellular injection of pyramidal cells suggest that “primary” olfactory cortex functions like “association” cortex in other sensory systems. Journal of Neuroscience, 20 (18), 6974.
Johnson, W. G., & Wildman, H. E. (1983). Influence of external and covert food stimuli on insulin secretion in obese and normal persons. Behavioral Neuroscience, 97 (6), 1025–1028.
Kadohisa, M., & Wilson, D. A. (2006). Separate encoding of identity and similarity of complex familiar odors in piriform cortex. Proceedings of the National Academy of Sciences U S A, 103 (41), 15206–15211.
Keller, A., Zhuang, H., Chi, Q., Vosshall, L. B., & Matsunami, H. (2007). Genetic variation in a human odorant receptor alters odour perception. Nature, 449 (7161), 468–472.
Kepecs, A., Uchida, N., & Mainen, Z. F. (2006). The sniff as a unit of olfactory processing. Chemical Senses, 31 (2), 167.
Kepecs, A., Uchida, N., & Mainen, Z. F. (2007). Rapid and precise control of sniffing during olfactory discrimination in rats. Journal of Neurophysiology, 98 (1), 205.
Ketterer, C., Heni, M., Thamer, C., Herzberg-Schafer, S. A., Haring, H. U., & Fritsche, A. (2010). Acute, short-term hyperinsulinemia increases olfactory threshold in healthy subjects. International Journal of Obesity (London), 35 (8), 1135–1138.
Keverne, E. B. (1999). The vomeronasal organ. Science, 286 (5440), 716.
Khan, R., Luk, C., Flinker, A., Aggarwal, A., Lapid, H., Haddad, R., et al. (2007). Predicting odor pleasantness from odorant structure: Pleasantness as a reflection of the physical world. Journal of Neuroscience, 27 (37), 10015–10023.
Kimchi, T., Xu, J., & Dulac, C. (2007). A functional circuit underlying male sexual behaviour in the female mouse brain. Nature, 448 (7157), 1009–1014.
Kreher, S. A., Mathew, D., Kim, J., & Carlson, J. R. (2008). Translation of sensory input into behavioral output via an olfactory system. Neuron, 59 (1), 110–124.
Kuukasjarvi, S., Eriksson, C. J. P., Koskela, E., Mappes, T., Nissinen, K., & Rantala, M. J. (2004). Attractiveness of women’s body odors over the menstrual cycle: The role of oral contraceptives and receiver sex. Behavioral Ecology, 15 (4), 579–584.
Lagier, S., Carleton, A., & Lledo, P. M. (2004). Interplay between local GABAergic interneurons and relay neurons generates gamma oscillations in the rat olfactory bulb. Journal of Neuroscience, 24 (18), 4382.
Laing, D. G. (1983). Natural sniffing gives optimum odour perception for humans. Perception, 12 (2), 99–117.
Laurent, G. (1997). Olfactory processing: Maps, time and codes. Current Opinion in Neurobiology, 7 (4), 547–553.

Looking at the Nose Through Human Behavior, and at Human Behavior Through the Nose Laurent, G. (1999). A systems perspective on early olfactory coding. Science, 286 (5440), 723–728. Laurent, G. (2002). Olfactory network dynamics and the coding of multidimensional sig­ nals. Nature Reviews, Neuroscience, 3 (11), 884–895. Laurent, G., Wehr, M., & Davidowitz, H. (1996). Temporal representations of odors in an olfactory network. Journal of Neuroscience, 16 (12), 3837–3847. Leon, M., & Johnson, B. A. (2003). Olfactory coding in the mammalian olfactory bulb. Brain Research, Brain Research Reviews, 42 (1), 23–32. Letarte, A. (1997). Similarities and differences in affective and cognitive origins of food likings and dislikes* 1. Appetite, 28 (2), 115–129. Li, W., Lopez, L., Osher, J., Howard, J. D., Parrish, T. B., & Gottfried, J. A. (2010). Right or­ bitofrontal cortex mediates conscious olfactory perception. Psychological Science, 21 (10), 1454–1463. Linster, C., Henry, L., Kadohisa, M., & Wilson, D. A. (2007). Synaptic adaptation and odor-background segmentation. Neurobiology of Learning and Memory, 87 (3), 352– 360. (p. 108)

Linster, C., Menon, A. V., Singh, C. Y., & Wilson, D. A. (2009). Odor-specific habituation arises from interaction of afferent synaptic adaptation and intrinsic synaptic potentiation in olfactory cortex. Learning and Memory, 16 (7), 452–459. Little, A. C., Jones, B. C., & Burriss, R. P. (2007). Preferences for masculinity in male bod­ ies change across the menstrual cycle. Hormones and Behavior, 51 (5), 633–639. Louis-Sylvestre, J., & Le Magnen, J. (1980). Palatability and preabsorptive insulin release. Neuroscience and Biobehavioral Reviews, 4 (Suppl 1), 43–46. Ma, M., Grosmaitre, X., Iwema, C. L., Baker, H., Greer, C. A., & Shepherd, G. M. (2003). Olfactory signal transduction in the mouse septal organ. Journal of Neuroscience, 23 (1), 317. Mainen, Z. F. (2006). Behavioral analysis of olfactory coding and computation in rodents. Current Opinion on Neurobiology, 16 (4), 429–434. Mainland, J., Johnson, B. N., Khan, R., Ivry, R. B., & Sobel, N. (2005). Olfactory impair­ ments in patients with unilateral cerebellar lesions are selective to inputs from the con­ tralesion nostril. Journal of Neuroscience, 25 (27), 6362–6371. Mainland, J., & Sobel, N. (2006). The sniff is part of the olfactory percept. Chemical Sens­ es, 31 (2), 181–196. Mallet, P., & Schaal, B. (1998). Rating and recognition of peers’ personal odors by 9-yearold children: an exploratory study. Journal of General Psychology, 125 (1), 47–64. Page 33 of 41

Looking at the Nose Through Human Behavior, and at Human Behavior Through the Nose Malnic, B., Hirono, J., Sato, T., & Buck, L. B. (1999). Combinatorial receptor codes for odors. Cell, 96 (5), 713–723. Mandairon, N., Poncelet, J., Bensafi, M., & Didier, A. (2009). Humans and mice express similar olfactory preferences. PLoS One, 4 (1), e4209. Maresh, A., Rodriguez Gil, D., Whitman, M. C., & Greer, C. A. (2008). Principles of glomerular organization in the human olfactory bulb—implications for odor processing. PLoS One, 3 (7), e2640. Martinez, M. C., Blanco, J., Bullon, M. M., & Agudo, F. J. (1987). Structure of the piriform cortex of the adult rat: A Golgi study. J Hirnforsch, 28 (3), 341–834. Mathey, M. F., Siebelink, E., de Graaf, C., & Van Staveren, W. A. (2001). Flavor enhance­ ment of food improves dietary intake and nutritional status of elderly nursing home resi­ dents. Journals of Gerontology. A. Biological Sciences and Medical Sciences, 56 (4), M200–M205. McBride, S. A., & Slotnick, B. (1997). The olfactory thalamocortical system and odor re­ versal learning examined using an asymmetrical lesion paradigm in rats. Behavioral Neu­ roscience, 111 (6), 1273. McClintock, M. K. (1971). Menstrual synchrony and suppression. Nature, 229 (5282), 244–245. Meredith, M. (1983). Sensory physiology of pheromone communication. In J. G. Vanden­ bergh (Ed.), Pheromones and reproduction in mammals (pp. 200–252). New York: Acade­ mic Press. Meredith, M. (2001). Human vomeronasal organ function: a critical review of best and worst cases. Chemical Senses, 26 (4), 433. Miller, S. L., & Maner, J. K. (2010). Scent of a woman: men’s testosterone responses to ol­ factory ovulation cues. Psychological Sciences, 21 (2), 276–283. Mombaerts, P., Wang, F., Dulac, C., Chao, S. K., Nemes, A., Mendelsohn, M., et al. (1996). Visualizing an olfactory sensorymap. Cell, 87 (4), 675–686. Monti-Bloch, L., Jennings-White, C., Dolberg, D. S., & Berliner, D. L. (1994). The human vomeronasal system. 
Psychoneuroendocrinology, 19 (5–7), 673–686. Moran, D. T., Rowley, J. C., Jafek, B. W., & Lovell, M. A. (1982). The fine-structure of the olfactory mucosa in man. Journal of Neurocytology, 11 (5), 721–746. Moskowitz, H. R., & Barbe, C. D. (1977). Profiling of odor components and their mixtures. Sensory Processes, 1 (3), 212–226. Moulton, D. G. (1976). Spatial patterning of response to odors in peripheral olfactory sys­ tem. Physiological Reviews, 56 (3), 578–593. Page 34 of 41

Looking at the Nose Through Human Behavior, and at Human Behavior Through the Nose Mozell, M. M., & Jagodowicz, M. (1973). Chromatographic separation of odorants by the nose: Retention times measured across in vivo olfactory mucosa. Science, 181 (106), 1247–1249. Mujica-Parodi, L. R., Strey, H. H., Frederick, B., Savoy, R., Cox, D., Botanov, Y., et al. (2009). Chemosensory cues to conspecific emotional stress activate amygdala in humans. PLoS One, 4 (7), 113–123. Negoias, S., Visschers, R., Boelrijk, A., & Hummel, T. (2008). New ways to understand aroma perception. Food Chemistry, 108 (4), 1247–1254. Ober, C., Weitkamp, L. R., Cox, N., Dytch, H., Kostyu, D., & Elias, S. (1997). HLA and mate choice in humans. Am J Hum Genet, 61 (3), 497–504. Obrebowski, A., Obrebowska-Karsznia, Z., & Gawlinski, M. (2000). Smell and taste in chil­ dren with simple obesity. International Journal of Pediatric Otorhinolaryngology, 55 (3), 191–196. O’Doherty, J., Rolls, E. T., Francis, S., Bowtell, R., McGlone, F., Kobal, G., et al. (2000). Sensory-specific satiety-related olfactory activation of the human orbitofrontal cortex. Neuroreport, 11 (4), 893–897. Pageat, P., & Gaultier, E. (2003). Current research in canine and feline pheromones. Vet­ erinary Clinics of North America, Small Animal Practice, 33 (2), 187–211. Pangborn, R. M., & Berggren, B. (1973). Human parotid secretion in response to pleasant and unpleasant odorants. Psychophysiology, 10 (3), 231–237. Plailly, J., Howard, J. D., Gitelman, D. R., & Gottfried, J. A. (2008). Attention to odor modu­ lates thalamocortical connectivity in the human brain. Journal of Neuroscience, 28 (20), 5257–5267. Porter, J., Anand, T., Johnson, B., Khan, R. M., & Sobel, N. (2005). Brain mechanisms for extracting spatial information from smell. Neuron, 47 (4), 581–592. Porter, J., Craven, B., Khan, R. M., Chang, S. J., Kang, I., Judkewitz, B., et al. (2007). Mechanisms of scent-tracking in humans. Nature Neuroscience, 10 (1), 27–29. Powell, T. 
P., Cowan, W. M., & Raisman, G. (1965). The central olfactory connexions. Jour­ nal of Anatomy, 99 (Pt 4), 791. Prehn, A., Ohrt, A., Sojka, B., Ferstl, R., & Pause, B. M. (2006). Chemosensory anxiety sig­ nals augment the startle reflex in humans. Neuroscience Letters, 394 (2), 127–130. Prehn-Kristensen, A., Wiesner, C., Bergmann, T. O., Wolff, S., Jansen, O., Mehdorn, H. M., et al. (2009). Induction of empathy by the smell of anxiety. PLoS One, 4 (6), e5987.

Page 35 of 41

Looking at the Nose Through Human Behavior, and at Human Behavior Through the Nose Price, J. L. (1973). An autoradiographic study of complementary laminar patterns of termination of afferent fibers to the olfactory cortex. Journal of Comparative Neurology, 150, 87–108. (p. 109)

Price, J. L. (1987). The central olfactory and accessory olfactory systems. In T. E. Finger & W. L. Silver (Eds.), Neurobiology of taste and smell (179–203). New York: Wiley. Price, J. L. (1990). Olfactory system. In G. Paxinos (Ed.), Human nervous system (pp. 979– 1001). San Diego: Academic Press. Ressler, K. J., Sullivan, S. L., & Buck, L. B. (1993). A zonal organization of odorant recep­ tor gene expression in the olfactory epithelium. Cell, 73 (3), 597–609. Restrepo, D., Arellano, J., Oliva, A. M., Schaefer, M. L., & Lin, W. (2004). Emerging views on the distinct but related roles of the main and accessory olfactory systems in respon­ siveness to chemosensory signals in mice. Hormones and Behavior, 46 (3), 247–256. Richardson, B. E., Vander Woude, E. A., Sudan, R., Thompson, J. S., & Leopold, D. A. (2004). Altered olfactory acuity in the morbidly obese. Obesity Surgery, 14 (7), 967–969. Rinberg, D., Koulakov, A., & Gelperin, A. (2006). Sparse odor coding in awake behaving mice. Journal of Neuroscience, 26 (34), 8857. Roberts, S. C., Gosling, L. M., Carter, V., & Petrie, M. (2008). MHC-correlated odour pref­ erences in humans and the use of oral contraceptives. Proceedings of the Royal Society of London. B. Biological Sciences, 275 (1652), 2715–2722. Roberts, S. C., & Little, A. C. (2008). Good genes, complementary genes and human mate preferences. Genetica, 132 (3), 309–321. Roberts, T., & Roiser, J. P. (2010). In the nose of the beholder: are olfactory influences on human mate choice driven by variation in immune system genes or sex hormone levels? Experimental Biology and Medicine (Maywood), 235 (11), 1277–1281. Roessner, V., Bleich, S., Banaschewski, T., & Rothenberger, A. (2005). Olfactory deficits in anorexia nervosa. European Archives of Psychiatry and Clinical Neuroscience, 255 (1), 6– 9. Rogers, P. J., & Hill, A. J. (1989). 
Breakdown of dietary restraint following mere exposure to food stimuli: interrelationships between restraint, hunger, salivation, and food intake. Addictive Behavior, 14 (4), 387–397. Rolls, E. T. (2006). Brain mechanisms underlying flavour and appetite. Philosophical Transactions of the Royal Society of London. B. Biological Sciences, 361 (1471), 1123– 1136. Rolls, E. T., Critchley, H. D., & Treves, A. (1996). Representation of olfactory information in the primate orbitofrontal cortex. Journal of Neurophysiology, 75 (5), 1982–1996. Page 36 of 41

Looking at the Nose Through Human Behavior, and at Human Behavior Through the Nose Rolls, E. T., & Rolls, J. H. (1997). Olfactory sensory-specific satiety in humans. Physiology & Behavior, 61 (3), 461–473. Russell, M. J., Switz, G. M., & Thompson, K. (1980). Olfactory influences on the human menstrual cycle. Pharmacology, Biochemisty, and Behavior, 13 (5), 737–738. Saito, H., Chi, Q., Zhuang, H., Matsunami, H., & Mainland, J. D. (2009). Odor coding by a Mammalian receptor repertoire. Science Signal, 2 (60), ra9. Saper, C. B., Chou, T. C., & Elmquist, J. K. (2002). The need to feed: Homeostatic and he­ donic control of eating. Neuron, 36 (2), 199–211. Savic, I., & Gulyas, B. (2000). PET shows that odors are processed both ipsilaterally and contralaterally to the stimulated nostril. Neuroreport, 11 (13), 2861–2866. Savigner, A., Duchamp-Viret, P., Grosmaitre, X., Chaput, M., Garcia, S., Ma, M., et al. (2009). Modulation of spontaneous and odorant-evoked activity of rat olfactory sensory neurons by two anorectic peptides, insulin and leptin. Journal of Neurophysiology, 101 (6), 2898–2906. Schaal, B., Marlier, L., & Soussignan, R. (2000). Human foetuses learn odours from their pregnant mother’s diet. Chemical Senses, 25 (6), 729–737. Schiffman, S. S. (1974). Physicochemical correlates of olfactory quality. Science, 185 (146), 112–117. Schiffman, S. S. (1997). Taste and smell losses in normal aging and disease. Journal of the American Medical Association, 278 (16), 1357–1362. Schiffman, S., Robinson, D. E., & Erickson, R. P. (1977). Multidimensional-scaling of odor­ ants—examination of psychological and physiochemical dimensions. Chemical Senses & Flavour, 2 (3), 375–390. Schiffman, S. S., & Warwick, Z. S. (1988). Flavor enhancement of foods for the elderly can reverse anorexia. Neurobiology of Aging, 9 (1), 24–26. Schmidt, H. J., & Beauchamp, G. K. (1988). Adult-like odor preferences and aversions in three-year-old children. Child Development, 1136–1143. Schneider, R. 
A., & Wolf, S. (1955). Olfactory perception thresholds for citral utilizing a new type olfactorium. Journal of Applied Physiology, 8 (3), 337–342. Schoenfeld, T. A., & Cleland, T. A. (2006). Anatomical contributions to odorant sampling and representation in rodents: zoning in on sniffing behavior. Chemical Senses, 31 (2), 131. Sela, L., Sacher, Y., Serfaty, C., Yeshurun, Y., Soroker, N., & Sobel, N. (2009). Spared and impaired olfactory abilities after thalamic lesions. Journal of Neuroscience, 29 (39), 12059. Page 37 of 41

Looking at the Nose Through Human Behavior, and at Human Behavior Through the Nose Sela, L., & Sobel, N. (2010). Human olfaction: A constant state of change-blindness. Ex­ perimental Brain Research, 1–17. Shepherd, G. M. (2004). The human sense of smell: Are we better than we think? PLoS Bi­ ology, 2 (5), E146. Shepherd, G. M. (2005). Outline of a theory of olfactory processing and its relevance to humans. Chemical Senses, 30, I3–I5. Shipley, M. T. (1995). Olfactory system. In G. Paxinos (Ed.), Rat nervous system (2nd ed., pp. 899–928). San Diego: Academic Press. Singh, D., & Bronstad, P. M. (2001). Female body odour is a potential cue to ovulation. Proceedings of the Royal Society of London. B. Biological Sciences, 268 (1469), 797–801. Slotnick, B. M., & Schoonover, F. W. (1993). Olfactory sensitivity of rats with transection of the lateral olfactory tract. Brain Research, 616 (1–2), 132–137. Small, D. M., Gerber, J. C., Mak, Y. E., & Hummel, T. (2005). Differential neural responses evoked by orthonasal versus retronasal odorant perception in humans. Neuron, 47 (4), 593–605. Small, D. M., Jones-Gotman, M., Zatorre, R. J., Petrides, M., & Evans, A. C. (1997). Flavor processing: more than the sum of its parts. Neuroreport, 8 (18), 3913–3917. Snyder, D., Duffy, V., Chapo, A., Cobbett, L., & Bartoshuk, L. (2003). Childhood taste dam­ age modulates obesity risk: Effects on fat perception and preference. Obesity Research, 11, A147–A147. Sobel, N., Prabhakaran, V., Desmond, J. E., Glover, G. H., Goode, R. L., Sullivan, E. V., et al. (1998). Sniffing and smelling: Separate subsystems in the human olfactory cortex. Na­ ture, 392 (6673), 282–286. Sobel, N., Prabhakaran, V., Hartley, C. A., Desmond, J. E., Zhao, Z., Glover, G. H., et al. (1998). Odorant-induced and sniff-induced activation in the cerebellum of the hu­ man. Journal of Neuroscience, 18 (21), 8990–9001. (p. 110)

Soussignan, R., Schaal, B., Marlier, L., & Jiang, T. (1997). Facial and autonomic responses to biological and artificial olfactory stimuli in human neonates: Re-examining early hedo­ nic discrimination of odors. Physiology & Behavior, 62 (4), 745–758. Spehr, M., & Munger, S. D. (2009). Olfactory receptors: G protein-coupled receptors and beyond. Journal of Neurochemistry, 109 (6), 1570–1583. Spehr, M., Spehr, J., Ukhanov, K., Kelliher, K., Leinders-Zufall, T., & Zufall, F. (2006). Sig­ naling in the chemosensory systems: Parallel processing of social signals by the mam­ malian main and accessory olfactory systems. Cellular and Molecular Life Sciences, 63 (13), 1476–1484. Page 38 of 41

Looking at the Nose Through Human Behavior, and at Human Behavior Through the Nose Stafford, L. D., & Welbeck, K. (2010). High hunger state increases olfactory sensitivity to neutral but not food odors. Chemical Senses, 36 (2), 189–198. Steiner, J. E. (1979). Human facial expressions in response to taste and smell stimulation. Advances in Child Development and Behavior, 13, 257–295. Stern, K., & McClintock, M. K. (1998). Regulation of ovulation by human pheromones. Na­ ture, 392 (6672), 177–179. Stettler, D. D., & Axel, R. (2009). Representations of odor in the piriform cortex. Neuron, 63 (6), 854–864. Stevens, D. A., & Lawless, H. T. (1981). Age-related changes in flavor perception. Appetite, 2 (2), 127–136. Stevenson, R. J. (2010). An initial evaluation of the functions of human olfaction. Chemical Senses, 35 (1), 3. Stoddart, D. M. (1990). The scented ape: The biology and culture of human odour: Cam­ bridge, UK: Cambridge University Press. Storan, M. J., & Key, B. (2006). Septal organ of Gr¸neberg is part of the olfactory system. Journal of Comparative Neurology, 494 (5), 834–844. Strotmann, J., Wanner, I., Krieger, J., Raming, K., & Breer, H. (1992). Expression of odor­ ant receptors in spatially restricted subsets of chemosensory neurons. Neuroreport, 3 (12), 1053–1056. Su, C. Y., Menuz, K., & Carlson, J. R. (2009). Olfactory perception: receptors, cells, and circuits. Cell, 139 (1), 45–59. Tanabe, T., Iino, M., Ooshima, Y., & Takagi, S. F. (1974). Olfactory area in prefrontal lobe. Brain Research, 80 (1), 127–130. Tham, W. W. P., Stevenson, R. J., & Miller, L. A. (2010). The role of the mediodorsal thala­ mic nucleus in human olfaction. Neurocase, 99999 (1), 1–12. Thornhill, R., Gangestad, S. W., Miller, R., Scheyd, G., McCollough, J. K., & Franklin, M. (2003). Major histocompatibility complex genes, symmetry, and body scent attractiveness in men and women. Behavioral Ecology, 14 (5), 668–678. Tomori, Z., Benacka, R., & Donic, V. (1998). 
Mechanisms and clinicophysiological implica­ tions of the sniff- and gasp-like aspiration reflex. Respiration Physiology, 114 (1), 83–98. Uchida, N., Takahashi, Y. K., Tanifuji, M., & Mori, K. (2000). Odor maps in the mammalian olfactory bulb: Domain organization and odorant structural features. Nature Neuro­ science, 3 (10), 1035–1043.

Page 39 of 41

Looking at the Nose Through Human Behavior, and at Human Behavior Through the Nose Vassar, R., Ngai, J., & Axel, R. (1993). Spatial segregation of odorant receptor expression in the mammalian olfactory epithelium. Cell, 74 (2), 309–318. Verhagen, J. V., Wesson, D. W., Netoff, T. I., White, J. A., & Wachowiak, M. (2007). Sniffing controls an adaptive filter of sensory input to the olfactory bulb. Nature Neuroscience, 10 (5), 631–639. Wedekind, C., & Furi, S. (1997). Body odour preferences in men and women: Do they aim for specific MHC combinations or simply heterozygosity? Proceedings of the Royal Soci­ ety of London. B. Biological Sciences, 264 (1387), 1471–1479. Wedekind, C., Seebeck, T., Bettens, F., & Paepke, A. J. (1995). MHC-dependent mate pref­ erences in humans. Proceedings of the Royal Society of London. B. Biological Sciences, 260 (1359), 245–249. Wilson, D. A. (1997). Binaral interactions in the rat piriform cortex. Journal of Neurophys­ iology, 78 (1), 160–169. Wilson, D. A. (2009a). Olfaction as a model system for the neurobiology of mammalian short-term habituation. Neurobiology of Learning and Memory, 92 (2), 199–205. Wilson, D. A. (2009b). Pattern separation and completion in olfaction. Annals of the New York Academy of Sciences, 1170, 306–312. Wilson, D. A., & Stevenson, R. J. (2003). The fundamental role of memory in olfactory per­ ception. Trends in Neurosciences, 26 (5), 243–247. Wilson, R. I., & Mainen, Z. F. (2006). Early events in olfactory processing. Neuroscience, 29 (1), 163. Witt, M., & Hummel, T. (2006). Vomeronasal versus olfactory epithelium: is there a cellu­ lar basis for human vomeronasal perception? International review of cytology, 248, 209– 259. Wysocki, C. J., & Meredith, M. (1987). The vomeronasal system. In T. E. Finger & W. L. Silver (Eds.), Neurobiology of taste and smell (pp. 125–150). New York: John Wiley & Sons. Wysocki, C. J., Pierce, J. D., & Gilbert, A. N. (1991). 
Geographic, cross-cultural, and indi­ vidual variation in human olfaction. In T. V. Getchell (Ed.), Smell and taste in health and disease (pp. 287–314). New York: Raven Press. Wysocki, C. J., & Preti, G. (2004). Facts, fallacies, fears, and frustrations with human pheromones. Anatomical Record. A. Discoveries in Molecular, Cellular, and Evolutionary Biology, 281 (1), 1201–1211. Yeomans, M. R. (2006). Olfactory influences on appetite and satiety in humans. Physiolo­ gy of Behavior, 89 (1), 10–14. Page 40 of 41

Looking at the Nose Through Human Behavior, and at Human Behavior Through the Nose Yeshurun, Y., & Sobel, N. (2010). An odor is not worth a thousand words: from multidi­ mensional odors to unidimensional odor objects. Annual Review of Psychology, 61, 219– 241. Zarzo, M. (2008). Psychologic dimensions in the perception of everyday odors: Pleasant­ ness and edibility. Journal of Sensory Studies, 23 (3), 354–376. Zeki, S., & Bartels, A. (1999). Toward a theory of visual consciousness* 1. Consciousness and Cognition, 8 (2), 225–259. Zelano, C., & Sobel, N. (2005). Humans as an animal model for systems-level organization of olfaction. Neuron, 48 (3), 431–454. Zhang, X., & Firestein, S. (2002). The olfactory receptor gene superfamily of the mouse. Nature Neuroscience, 5 (2), 124–133. Zhou, W., & Chen, D. (2009). Fear-related chemosignals modulate recognition of fear in ambiguous facial expressions. Psychological Science, 20 (2), 177. Zilstorff-Pedersen, K. (1955). Olfactory threshold determinations in relation to food in­ take. Acta Otolaryngologica, 45 (1), 86–90. Zufall, F., Firestein, S., & Shepherd, G. M. (1994). Cyclic nucleotide-gated ion channels and sensory transduction in olfactory receptor neurons. Annual Review of Biophysics and Biomolecular Structure, 23 (1), 577–607.

Roni Kahana, Department of Neurobiology, Weizmann Institute of Science, Rehovot, Israel

Noam Sobel, Department of Neurobiology, Weizmann Institute of Science, Rehovot, Israel


Cognitive Neuroscience of Music

Petr Janata

The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn
Print Publication Date: Dec 2013
Subject: Psychology, Cognitive Neuroscience
Online Publication Date: Dec 2013
DOI: 10.1093/oxfordhb/9780199988693.013.0007

Abstract and Keywords

Humans engage with music in many ways, and music is associated with many aspects of our personal and social lives. Music represents an organization of our auditory environments, and many neural processes must be recruited and coordinated both to perceive and to create musical patterns. Accordingly, our musical experiences depend on the interplay of diverse brain systems underlying perception, cognition, action, and emotion. Compared with the study of other human faculties, the neuroscientific study of music is relatively recent. Paradigms for examining musical functions have been adopted from other domains of neuroscience and also developed de novo. The relationship of music to other cognitive domains, in particular language, has garnered considerable attention. This chapter provides a survey of the experimental approaches taken, and emphasizes consistencies across the various studies that help us understand musical functions within the broader field of cognitive neuroscience.

Keywords: music, auditory environments, cognition, neuroscience

Introduction

Discoveries of ancient bone flutes illustrate that music has been a part of human society for millennia (Adler, 2009). Although the reason why the human brain enables musical capacities is hotly debated (did it evolve like language, or is it an evolutionary epiphenomenon?), the fact that human individuals and societies devote considerable time and resources to incorporating music into their lives is indisputable.

Scientific study of the psychology and neuroscience of music is relatively recent. In terms of human behaviors, music is usually viewed and judged in relation to language. Accordingly, the organization of music in the brain has been viewed by many in terms of modular organization, whereby music-specific processing modules are associated with specialized neural substrates in much the same way that discrete brain areas (e.g., Broca's and Wernicke's areas) have been associated traditionally with language functions (Peretz & Coltheart, 2003). Indeed, neuropsychological case studies in which the loss of language abilities can be dissociated from the loss of musical abilities, and vice versa, clearly support a modular view. To the extent that language has been treated as a domain of cognitive neuroscience that is largely separate from other functional domains, such as perception, memory, attention, action, and emotion, music, too, has been regarded as a cognitive novelty with a unique embodiment in the human brain. In other words, music has been subjected to the same localizationist pressures that have historically pervaded cognitive neuroscience.

Neuroscientific inquiry into musical functions has therefore focused most often on the auditory cortices situated along the superior surface of the temporal lobes. The logic is simple: if music is an auditory phenomenon and the auditory cortex processes auditory information, then music must reside in the auditory cortex.

Although it is true that we mainly listen to music, meaningful engagement with music is much more than a perceptual phenomenon. The most obvious example is that of musical performance, whereby the action systems of the brain must be engaged. However, as described below in detail, underneath overt perception and action lie more subtle aspects of musical engagement: attention, various forms of memory, and covert action such as mental imagery. Finally, music plays on our emotions in various ways. Thus, the most parsimonious view of how music coexists with the other complex behaviors that the human brain supports is not one in which all of the cognitive functions on which music depends are localized to a specific area of the brain, but rather one in which musical experiences are instantiated through the coupling of networks that serve domain-general functions. Following a brief description of key principles underlying the organization of acoustic information into music, this chapter treats the neuroscience of each of those functions in turn.

Building Blocks of Music

In order to discuss the cognitive neuroscience of music, it is necessary to describe briefly some of the basic dimensions on which music is organized. These broad dimensions encompass time, tonality, and timbre. Music from different cultures utilizes these dimensions to varying degrees, and in general it is the variability in the ways that these dimensions are utilized that gives rise to concepts of musical styles and genres that vary within and across cultures. Here I restrict my discussion to Western tonal music.

Time

The patterning of acoustic events in time is a crucial element of music that gives rise to rhythm and meter, properties that are essential in determining how we move along with music. We often identify "the beat" in music by tapping our feet or bobbing our heads with a regular periodicity that seems to fit best with the music, and this capacity for entrainment has strong social implications in terms of coordinated action and shared experience (Pressing, 2002; Wiltermuth & Heath, 2009). Importantly, meter and rhythm provide a temporal scaffold that guides our expectations for when events will occur (Jones, 1976; London, 2004). This scaffold might be thought of in terms of coupled oscillators that focus attention at particular moments in time (Large & Jones, 1999; Large & Palmer, 2002) and thereby influence our perception of other musical attributes such as pitch (Barnes & Jones, 2000; Jones et al., 2002).

Our perception of metric structure, that aspect of music that distinguishes a waltz from a march, is associated with hierarchies of expectations for when events will occur (Jones & Boltz, 1989; Palmer & Krumhansl, 1990). Some temporal locations within a metric structure are more likely to be associated with events ("strong beats"), whereas others are less likely ("weak beats"). The manipulation of when events actually occur relative to weak and strong beat locations is associated with syncopation, a salient feature of many rhythms, which in turn characterize musical styles and shape our tendency to move along with music (Pressing, 2002).
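The idea of a metric hierarchy with strong and weak beats can be made concrete with a small sketch. The Python below is illustrative only: the function names `metric_weights` and `syncopation`, the encoding of a meter as a list of subdivisions, and the scoring rule are all assumptions introduced here, loosely in the spirit of rule-based syncopation measures. They are not a method described in this chapter.

```python
import math


def metric_weights(subdivisions):
    """Return one weight per grid position; higher means metrically stronger.

    `subdivisions` is a hypothetical encoding of a meter: [2, 2] splits a bar
    into two halves and each half into two beats (march-like), whereas [3]
    splits a bar into three beats (waltz-like).
    """
    n = math.prod(subdivisions)
    weights = [0] * n
    span = n
    for divisor in [1] + list(subdivisions):
        span //= divisor
        # Every position that falls on this metric level gains one unit of weight.
        for pos in range(0, n, span):
            weights[pos] += 1
    return weights


def syncopation(onsets, weights):
    """Toy syncopation score (an assumption, not a published measure).

    An event contributes when it sits on a weaker position and the next,
    metrically stronger position is silent. The bar is treated as cyclic.
    """
    n = len(onsets)
    score = 0
    for i in range(n):
        nxt = (i + 1) % n
        if onsets[i] and not onsets[nxt] and weights[nxt] > weights[i]:
            score += weights[nxt] - weights[i]
    return score


march = metric_weights([2, 2])  # [3, 1, 2, 1]
waltz = metric_weights([3])     # [2, 1, 1]
print(march, waltz)
print(syncopation([1, 0, 1, 0], march))  # events only on the stronger positions
print(syncopation([1, 1, 0, 1], march))  # an event on a weak beat precedes a silent stronger beat
```

Under these assumptions, the march-like and waltz-like bars differ only in their weight profiles, and a rhythm scores as syncopated when events land on weak positions while the following stronger positions stay empty.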

Tonality

Tonality refers to notes (pitches) and the relationships among notes. Even though there are many notes that differ in their fundamental frequency (e.g., the frequency associated with each key on a piano keyboard), tonality in Western tonal music is based on twelve pitch classes, with each pitch class associated with a label such as C or C-sharp (Figure 7.1). The organization into twelve pitch classes arises because (1) two sounds that are separated by an octave (a doubling in frequency) are perceptually similar and are given the same note name (e.g., F), and (2) an octave is divided into twelve equal (logarithmic) frequency steps called semitones. When we talk about melody we refer to sequences of notes, and when we talk about harmony, we refer to two or more notes that sound at the same time to produce an interval or chord. Melodies are defined in large part by their contours (the pattern of ups and downs of the notes), such that changes in contour are easier to detect than contour-preserving shifts of isolated notes when the entire melody is transposed to a different key (Dowling, 1978).
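The arithmetic behind octave equivalence, the twelve logarithmic steps, and melodic contour can be sketched in a few lines of Python. The 440 Hz reference for the note A, the sharp-only note names, and the function names are assumptions chosen for illustration. Melodies are represented as plain semitone numbers.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def semitone_frequency(steps_from_ref, ref_hz=440.0):
    """Frequency that lies `steps_from_ref` equal-tempered semitones from the reference."""
    return ref_hz * 2.0 ** (steps_from_ref / 12.0)


def pitch_class(freq_hz, ref_hz=440.0):
    """Name of the nearest pitch class, assuming the reference frequency is an A."""
    steps = round(12.0 * math.log2(freq_hz / ref_hz))
    return PITCH_CLASSES[(steps + 9) % 12]  # A sits at index 9 in PITCH_CLASSES


def contour(notes):
    """Pattern of ups (+1), downs (-1), and repeats (0) between successive notes."""
    return [(b > a) - (b < a) for a, b in zip(notes, notes[1:])]


# Notes an octave apart (a doubling or halving of frequency) share a pitch class.
print(pitch_class(220.0), pitch_class(440.0), pitch_class(880.0))

# Twelve semitone steps amount to exactly one octave.
print(semitone_frequency(12))

# Transposing a melody (adding a constant number of semitones) preserves its contour.
melody = [60, 62, 64, 62, 67]
transposed = [n + 5 for n in melody]
print(contour(melody) == contour(transposed))
```

Because each semitone multiplies frequency by the same factor, pitch class depends only on the step count modulo twelve, which is what the chroma circle in Figure 7.1 depicts.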

Figure 7.1 Tonal relationships. A, The relationship between pitch height and pitch chroma (pitch class) is obtained by placing successively higher pitches on a spiral, such that one full turn of the spiral corresponds to a frequency doubling. This arrangement captures the phenomenon of octave equivalence. When the spiral is viewed from above, the chroma circle emerges. The chroma circle comprises the twelve pitch classes (C, C#, D, etc.) that are used in Western tonal music. B, The seven pitch classes (notes) belonging to the C-major scale are shown in musical notation. The notes on the staff are in approximate alignment with the notes along the pitch spiral. The unfilled symbols above the notes labeled C, D, and E illustrate the additional notes that would be played in conjunction with each root note to form a triad, a type of chord. Roman numerals are used to designate the scale position of each note. The case of the numeral indicates whether the triad formed with that note as a root has a major (uppercase) or minor (lowercase) quality. C, Although seven of the twelve possible pitch classes belong to a key, they are not equally representative of the key. This fact is embodied in the concept of tonal hierarchies (key profiles), which can be derived via psychological methods such as goodness-of-fit ratings of probe tones corresponding to each of the twelve possible pitch classes. Shown in blue is the canonical Krumhansl & Kessler key profile (Krumhansl, 1990). The generality of this profile is illustrated by the red bars, which were obtained from about 150 undergraduate students of varying musical aptitude in the author’s psychology of music class. Each student made a single judgment about each probe tone. Very similar key profiles are obtained when the distributions of notes in pieces of music are generated, suggesting that tonal knowledge embodies the statistics of the music we hear. The seven notes belonging to the key (black note names) are judged to fit better than the five notes that don’t belong to the key (blue note names). The number symbol (#) following a letter indicates a pitch that is raised (sharp) by one semitone, whereas a “b” following a letter indicates a note that is lowered (flat) by one semitone. D, The fact that each major and minor key is associated with a different key profile gives rise to the concept of distance between the different keys. In music theory, the distances are often represented by the circle of fifths for the major (red) and minor (cyan) keys. The circle is so named because working in a clockwise direction, the fifth scale degree, which is the second most stable note in a key (e.g., the note G in C-major), becomes the most stable note (the tonic) of the next key (e.g., G-major). The distances between major and minor keys (pitch probability distributions) are represented most parsimoniously on the surface of a torus. The toroidal representation is arrived at either by multidimensional scaling of subjective distances between the keys or by self-organizing neural networks that are served music that moves through all twenty-four major and minor keys as input. Each location on the torus represents a particular pitch probability distribution. Accordingly, a tally of the notes in a musical segment can be projected onto the toroidal surface and rendered in color to indicate the likelihood that the piece of music is in a particular key at a particular moment in time. Because the notes that make up a piece’s melodies and harmonies change in time, thereby creating variation in the momentary key profiles, the activation on the torus changes dynamically in time.
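The circle of fifths described in the caption can be generated by repeatedly stepping up seven semitones (the distance from a tonic to its fifth scale degree). The sketch below is illustrative only: it uses sharp spellings for all note names and a hypothetical helper, `shared_notes`, to show that neighboring keys on the circle have most of their scale notes in common.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
PERFECT_FIFTH = 7                         # semitones from a tonic up to its fifth
MAJOR_SCALE_STEPS = [0, 2, 4, 5, 7, 9, 11]  # semitone offsets of the major scale


def circle_of_fifths(start=0):
    """Tonics visited when stepping clockwise by fifths from the starting pitch class."""
    return [NOTE_NAMES[(start + PERFECT_FIFTH * step) % 12] for step in range(12)]


def major_scale(tonic):
    """Set of pitch classes (0-11) belonging to the major key on the given tonic."""
    return {(tonic + offset) % 12 for offset in MAJOR_SCALE_STEPS}


def shared_notes(tonic_a, tonic_b):
    """Number of pitch classes two major keys have in common."""
    return len(major_scale(tonic_a) & major_scale(tonic_b))


print(circle_of_fifths())
print(shared_notes(0, 7), shared_notes(0, 6))  # C vs. G major, C vs. F# major
```

Because 7 and 12 share no common factor, the twelve steps visit every pitch class exactly once before returning to the start, and adjacent keys differ by a single scale note.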

Central to the way that tonality works are the concepts of pitch probability distributions and the related notion of key (Krumhansl, 1990; Temperley, 2001, 2007). When we say that a piece of music is in the key of G-major or g-minor, it means that certain notes, such as G or D, will be perceived as fitting well into that key, whereas others, like G-sharp or C-sharp, will sound out of place. Tallying up the notes of many different pieces written in the key of G-major, we would find that G and D are the notes that occur most often, whereas G-sharp and C-sharp would not be very frequent. That is, the key of G-major is associated with a particular probability distribution across the twelve possible pitch classes. The key of D-major is associated with a slightly different probability distribution, albeit one that is closely related to that of G-major in that the note (pitch class) D figures prominently in both. However, the probability distribution for the key of C-sharp major differs markedly from that of G-major. These probability distributions are often referred to as key profiles or tonal hierarchies, and they simultaneously reflect the statistics of music as well as perceptual distances between individual notes and the keys (or tonal contexts) that have been established in the minds of listeners (Krumhansl, 1990; Temperley, 2007). Knowledge of the statistics underlying tonality is presumably acquired through implicit statistical learning mechanisms (Tillmann et al., 2000), as evidenced by the brain’s rapid adaptation to novel tonal systems (Loui et al., 2009b). Conveniently, the perceived, statistical, and music-theoretical distance relationships between keys can be represented geometrically on the surface of a ring (torus) such that keys that have many of their notes in common are positioned close to each other on the toroidal surface (Krumhansl & Kessler, 1982; Krumhansl, 1990). Each location on the toroidal surface is associated with a probability distribution across the twelve pitch classes. If a melody or sequence of chords is played that contains notes in proportions that correspond to the probability distribution for a particular location on the torus, then that region of tonal space is considered activated or primed. If a chord is played whose constituent notes belong to a very different probability distribution (i.e., a distantly situated location on the torus corresponding to a distant key), the chord will sound jarring and out of place. However, if several other chords are now played that are related to the chord that was jarring, the perceived tonal center will shift to that other region of the torus. Therefore, movement of music in tonal space is dynamic and dependent on the melodies and chord progressions that are being played (Janata, 2007; Janata et al., 2002b; Toiviainen, 2007; Toiviainen & Krumhansl, 2003).
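The idea that a tally of notes points to a key can be sketched with a simple correlation-based key finder. This is not the toroidal projection used in the studies cited above. It is the simpler, well-known approach of correlating a pitch-class tally with each of the twenty-four rotated key profiles (often called the Krumhansl-Schmuckler method). The profile values are the Krumhansl & Kessler (1982) probe-tone ratings as they are commonly reproduced, and the short melody is a made-up example.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Krumhansl & Kessler probe-tone profiles, index 0 = tonic (as commonly reproduced).
KK_MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
KK_MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def rotate(profile, tonic):
    """Re-index a tonic-relative profile by absolute pitch class (0 = C)."""
    return [profile[(pc - tonic) % 12] for pc in range(12)]


def tally(pitch_classes):
    """Count how often each of the twelve pitch classes occurs."""
    counts = [0] * 12
    for pc in pitch_classes:
        counts[pc % 12] += 1
    return counts


def best_key(pitch_classes):
    """Return the best-matching key name and the correlation for all 24 keys."""
    counts = tally(pitch_classes)
    scores = {}
    for tonic in range(12):
        scores[NOTE_NAMES[tonic] + " major"] = pearson(counts, rotate(KK_MAJOR, tonic))
        scores[NOTE_NAMES[tonic] + " minor"] = pearson(counts, rotate(KK_MINOR, tonic))
    return max(scores, key=scores.get), scores


# Hypothetical melody emphasizing G (7) and D (2), using only notes of G-major.
melody = [7, 7, 2, 2, 11, 9, 7, 2, 0, 11, 9, 7, 4, 6, 7, 7]
key, scores = best_key(melody)
print(key)
```

The same `pearson` function applied to two rotated profiles gives a rough measure of key distance: the G-major and D-major profiles correlate positively, whereas the G-major and C-sharp major profiles correlate negatively, in line with the relationships described above.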

Timbre

The third broad dimension of music is timbre. Timbre is what distinguishes instruments from each other. Timbre is the spectrotemporal signature of an instrument: a description of how the frequency content of the sound changes in time. If one considers the range of musical sounds that are generated not just by physical instruments but also by electronic synthesizers, human voices, and environmental sounds that are used for musical purposes, the perceptual space underlying timbre is vast. Based on multidimensional scaling analyses of similarity judgments between pairs of sounds, it has been possible to identify candidate dimensions of timbre (Caclin et al., 2005; Grey, 1977; Lakatos, 2000; McAdams et al., 1995). Most consistently identified across studies are dimensions corresponding to attack and spectral centroid. Attack refers to the onset characteristics of the amplitude envelope of the sound (e.g., percussive sounds have a rapid attack, whereas bowed sounds have a slower attack). Centroid refers to the location along the frequency axis where the energy of the spectrum is centered (its amplitude-weighted mean frequency), and is commonly described as the “brightness” of a sound. The use of acoustic features beyond attack and centroid for the purpose of judging the similarities between pairs of sounds is much less consistent and appears to be context dependent (Caclin et al., 2005). For example, variation in spectral fine structure, such as the relative weighting of odd and even frequency components, or time-varying spectral features (spectrotemporal flux) also influence similarity judgments. The fact that timbre cannot be decomposed into a compact set of consistent dimensions that satisfactorily explain the abundance of perceptual variation somewhat complicates the search for the functional representation of timbre in the brain.
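The two most consistently reported timbre dimensions can be illustrated with a small sketch. It assumes the common definition of the spectral centroid as the amplitude-weighted mean frequency and measures attack as the time the amplitude envelope takes to rise from 10% to 90% of its peak. The thresholds, the toy spectra, and the toy envelopes are all assumptions made for illustration, not values from the cited studies.

```python
def spectral_centroid(partials):
    """Amplitude-weighted mean frequency of (frequency_hz, amplitude) pairs."""
    total_amplitude = sum(amplitude for _, amplitude in partials)
    return sum(freq * amplitude for freq, amplitude in partials) / total_amplitude


def attack_time(envelope, sample_rate_hz, low=0.1, high=0.9):
    """Seconds taken by the envelope to rise from low*peak to high*peak."""
    peak = max(envelope)
    first_low = next(i for i, value in enumerate(envelope) if value >= low * peak)
    first_high = next(i for i, value in enumerate(envelope) if value >= high * peak)
    return (first_high - first_low) / sample_rate_hz


# Two toy spectra on the same fundamental: weak vs. strong upper harmonics.
dull = [(220.0, 1.0), (440.0, 0.5), (660.0, 0.25)]
bright = [(220.0, 1.0), (440.0, 1.0), (660.0, 1.0)]

# Two toy amplitude envelopes sampled at 100 Hz.
percussive = [0.0, 1.0, 0.8, 0.6, 0.4, 0.2]   # reaches its peak almost at once
bowed = [step / 10 for step in range(11)]     # rises gradually to its peak

print(spectral_centroid(dull) < spectral_centroid(bright))
print(attack_time(percussive, 100) < attack_time(bowed, 100))
```

Shifting energy toward the upper harmonics raises the centroid, which corresponds to a "brighter" sound, while a gradual rise in the envelope lengthens the attack.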

Perception and Cognition

Tonality

Pitch and Melody

Some of the earliest attempts to understand the neural substrates for music processing focused on the ability of patients, in whom varying amounts of the temporal lobes, and thereby auditory cortical areas, had been removed, to detect alterations in short melodies. A consistent result has been that right temporal lobe (RTL) damage impairs the ability of individuals to detect a variety of changes in melodies, including the starkest type of change in which the altered note violates both the contour of the melody and the key, whereas left temporal lobe damage leads to no or little impairment (Dennis & Hopyan, 2001; Liegeois-Chauvel et al., 1998; Samson & Zatorre, 1988; Warrier & Zatorre, 2004; Zatorre, 1985). However, more difficult discriminations, in which the altered notes preserve the contour and the key of the initial melody, suffer when either hemisphere is damaged. Patients with RTL damage have difficulty judging whether one pitch is higher or lower than the next, a critical ability for determining both the contour and the specific intervals (distances between notes) that define a melody, even though their basic ability to discriminate whether the notes are the same or different remains intact (Johnsrude et al., 2000). Whereas having the context of a familiar melody generally facilitates the ability to detect when a final note is mistuned (Warrier & Zatorre, 2002), RTL damage reduces the amount of that facilitation (Warrier & Zatorre, 2004), indicating that the processing of pitch relationships in the temporal lobes also affects more basic perceptual processes such as intonation judgments. Interestingly, the melody processing functions of the RTL appear to depend in large part on areas that are anterior to the primary auditory cortex, which is situated on Heschl’s gyrus (HG; Johnsrude et al., 2000; Samson & Zatorre, 1988).

The results from the patient studies are supported by a number of functional magnetic resonance imaging (fMRI) studies of pitch and melody processing. A hierarchy of pitch processing is observed in the auditory cortex following a medial to lateral progression. Broadband noise or sounds that have no clear pitch activate medial HG, sounds with distinct pitch produce more activation in the lateral half of HG, and sequences in which the pitch varies, in either tonal or random melodies, generate activity that extends rostrally from HG along the superior temporal gyrus (STG) toward the planum polare, biased toward the right hemisphere (Patterson et al., 2002).

One of the critical features of pitch in music is the distinction between pitch height and pitch chroma (pitch class). The chroma of a pitch is referred to by its note name (e.g., D, D#). Critically, chroma represents a perceptual constancy that allows notes played in different octaves to be identified as the same note. These separable aspects of pitch appear to have partially distinct neural substrates, with preferential processing of pitch height posterior to HG in the planum temporale, and processing of chroma anterolateral to HG in the planum polare (Warren et al., 2003), consistent with the proposed role of anterior STG regions in melody processing (Griffiths et al., 1998; Patterson et al., 2002; Schmithorst & Holland, 2003). The neural representations of individual pitches in melodic sequences are also influenced by the statistical structure of the sequences (Patel & Balaban, 2000, 2004). In these experiments, the neural representations were quantified by examining the quality of the coupling between the amplitude modulations of the tones used to create the sequence and the amplitude modulations in the response recorded above the scalp using magnetoencephalography (MEG). Random sequences elicited little coupling, whereas highly predictable musical scales elicited the strongest coupling. These effects were strongest above temporal and lateral prefrontal sensor sites, consistent with a hypothesis that a frontotemporal circuit supports the processing of melodic structure.

Detecting Wrong Notes and Wrong Chords

The representation and processing of tonal information is the aspect of music that has been most extensively studied using cognitive neuroscience methods. Mirroring the approach taken in much of the rest of cognitive neuroscience, “expectancy violation” paradigms have been the primary approach to establishing the existence of a cognitive schema through which we assimilate pitch information. In other words, how does the brain respond when a target event, usually the terminal note of a melody or chord of a harmonic progression, is unexpected given the preceding musical context?

When scales or melodies end in notes that do not belong to the established key, large positive deflections are evident in event-related potentials (ERPs) recorded at posterior scalp sites, indicating the activation of congruency monitoring and context-updating processes indexed by the P300 or late-positive complex (LPC) components of ERP waveforms (Besson & Faïta, 1995; Besson & Macar, 1987; Paller et al., 1992). These effects are accentuated in subjects with musical training and when the melodies are familiar (Besson & Faïta, 1995; Miranda & Ullman, 2007). Similarly, short sequences of chords that terminate with a chord that is unexpected given the tonal context established by the preceding chords elicit P300 and LPC components (Beisteiner et al., 1999; Carrion & Bly, 2008; Janata, 1995; Koelsch et al., 2000; Patel et al., 1998), even in natural musical contexts (Koelsch & Mulder, 2002). As would be expected given the sensitivity of the P300 to global event probability (Tueting et al., 1970), the magnitude of the posterior positive responses increases as the starkness of the harmonic violation increases (Janata, 1995; Patel et al., 1998).

The appearance of the late posterior positivities depends on overt processing of the target chords by making either a detection or categorization judgment. When explicit judgments about target chords are eliminated, the most prominent deviance-related response is distributed frontally, and typically manifests as an attenuated positivity approximately 200 ms after the onset of a deviant chord. This relative negativity in response to contextually irregular chords was termed an early right anterior negativity (ERAN; Koelsch et al., 2000), although in many subsequent studies, it was found to be distributed bilaterally. The ERAN has been studied extensively and is interesting for two principal reasons.
First, the ERAN and the right anterior-temporal negativity (RATN; Patel et al., 1998) have been interpreted as markers of syntactic processing in music, paralleling the left anterior negativities associated with the processing of syntactic deviants in language (Koelsch et al., 2000; Patel et al., 1998). Localization of the ERAN to Broca’s area using MEG supports such an interpretation (Maess et al., 2001). (The parallels between syntactic processing in music and language are discussed in a later subsection.) Second, the ERAN is regarded as an index of automatic harmonic syntax processing in that it is elicited even when the irregular chords themselves are not task relevant (Koelsch et al., 2000, 2002b). Whether the ERAN is attenuated when attention is oriented away from musical material is a matter of some debate (Koelsch et al., 2002b; Loui et al., 2005).

The ERAN is a robust index of harmonic expectancy violation processing, and it is sensitive to musical training. It is found in children (Koelsch et al., 2003b), and it increases in amplitude with musical training in both adults (Koelsch et al., 2002a) and children (Jentschke & Koelsch, 2009). The amplitude of the ERAN is also sensitive to the probability with which a particular chord occurs at a particular location in a sequence. For example, the ERAN to the same irregular chord function, such as a Neapolitan sixth, is weaker when that chord occurs at a sequence location that is more plausible from a harmonic syntax perspective (Leino et al., 2007). Similarly, using Bach chorales, chords that are part of the original score, but not the most expected from a music-theoretical point of view, elicit an ERAN, in comparison to more expected chords that have been substituted in, but a much weaker ERAN than highly unexpected Neapolitan sixth chords inserted into the same location (Steinbeis et al., 2006).

The automaticity of the ERAN naturally leads to comparisons with the mismatch negativity (MMN), the classic marker of preattentive detection of deviant items in auditory streams (Näätänen, 1992; Näätänen & Winkler, 1999). Given that irregular chords might be regarded as violations of an abstract context established by a sequence of chords, the ERAN could just be a form of “abstract MMN.” The ERAN and MMN occur within a similar latency range, and their frontocentral distributions often make them difficult to distinguish from one another based on their scalp topographies (e.g., Koelsch et al., 2001; Leino et al., 2007). Nonetheless, the ERAN and MMN are dissociable (Koelsch, 2009). For example, an MMN elicited by an acoustically aberrant stimulus, such as a mistuned chord or a change in the frequency of a note (frequency MMN), does not show sensitivity to location within a harmonic context (Koelsch et al., 2001; Leino et al., 2007). Moreover, if the sensory properties of the chord sequences are carefully controlled in terms of repetition priming for specific notes or the relative roughness of target chords and those in the preceding context, an ERAN is elicited by harmonically incongruent chords even if the harmonically incongruent chords are more similar in their sensory characteristics to the penultimate chords than are the harmonically congruent chords (Koelsch et al., 2007).

A number of fMRI studies have contributed to the view that musical syntax is evaluated in the ventrolateral prefrontal cortex (VLPFC), in a region comprising the ventral aspect of the inferior frontal gyrus (IFG), frontal operculum, and anterior insula.
The evaluation of target chords in a harmonic priming task results in bilateral activation of this region, and is greater for harmonically unrelated targets than harmonically related targets (Koelsch et al., 2005b; Tillmann et al., 2003). Similarly, chord sequences that contain modulations (strong shifts in the tonal center toward another key) activate this region in the right hemisphere (Koelsch et al., 2002c). Further evidence that the VLPFC is sensitive to contextual coherence comes from a paradigm in which subjects listened to a variety of 23-second excerpts of familiar and unfamiliar classical music. Each excerpt was rendered incoherent by being chopped up into 250- to 350-ms segments and then reconstituted with random arrangement of the segments. Bilaterally, the inferior frontal cortex and adjoining insula responded more strongly to the normal music, compared with reordered music (Levitin & Menon, 2003).
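The scrambling manipulation described above (cutting an excerpt into 250- to 350-ms segments and reassembling them in random order) can be sketched as follows. This is a minimal illustration, not the procedure used by Levitin and Menon (2003): the uniform choice of segment lengths, the absence of any smoothing at segment boundaries, and the function and parameter names are all assumptions.

```python
import random


def scramble(samples, sample_rate_hz, min_ms=250, max_ms=350, seed=0):
    """Cut a signal into segments of random duration and shuffle their order."""
    rng = random.Random(seed)             # seeded so the result is reproducible
    segments = []
    start = 0
    while start < len(samples):
        duration_ms = rng.uniform(min_ms, max_ms)
        length = max(1, int(sample_rate_hz * duration_ms / 1000))
        segments.append(samples[start:start + length])  # last one may be shorter
        start += length
    rng.shuffle(segments)
    return [sample for segment in segments for sample in segment]


# Stand-in for a 23-second excerpt: sample indices at a toy rate of 1000 Hz.
excerpt = list(range(23 * 1000))
scrambled = scramble(excerpt, sample_rate_hz=1000)

print(len(scrambled) == len(excerpt))        # same duration
print(sorted(scrambled) == excerpt)          # same samples, different order
```

The scrambled version keeps every sample of the original, so its overall spectral content is largely preserved, while the temporal order that carries musical structure is destroyed.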

Tonal Dynamics

Despite their considerable appeal from an experimental standpoint, trial-based expectancy violation paradigms are limited in their utility in investigating the brain dynamics that accompany listening to extended passages of music in which the manipulation of expectancies is typically more nuanced and ongoing. When examined more closely, chord sequences such as those used in the experiments described above do more than establish a particular static tonal context. They actually create patterns of movement within tonal space, the system of major and minor keys that can be represented on a torus. The details of the trajectories depend on the specific chords and the sequence in which they occur (Janata, 2007). Different pieces of music will create different patterns of movement through tonal space, depending on the notes in the melodies and harmonic accompaniments. The time-varying pattern on the toroidal surface can be quantified for any piece of music, and this quantification can then be used to probe the time-varying structure of the fMRI activity recorded while a person listens to the music. This procedure identifies brain areas that are sensitive to the movements of the music through tonal space (Janata, 2005, 2009; Janata et al., 2002b).

The “tonality-tracking” approach has suggested a role of the medial prefrontal cortex (MPFC) in the maintenance of tonal contexts and the integration of tonal contexts with music-evoked autobiographical memories (Janata, 2009). When individuals underwent fMRI scans while listening attentively to an arpeggiated melody that systematically moved (modulated) through all twenty-four major and minor keys over the course of 8 minutes (Janata et al., 2003), the MPFC was the one brain region that was consistently active across three to four repeated sessions within listeners and across listeners, even though consistent tonality tracking responses were observed at the level of individuals in several brain areas (Janata, 2005; Janata et al., 2002b). ERP studies provide converging evidence for a context maintenance interpretation in that a midline negativity with a very frontal focus characterizes both the N5 component, a late negative peak that has been interpreted to reflect contextual integration of harmonically incongruous material (Koelsch et al., 2000; Loui et al., 2009b), and a sustained negative shift in response to modulating sequences (Koelsch et al., 2003a). As discussed below, tonality tracking in the MPFC is germane to understanding how music interacts with functions underlying a person’s sense of self because the MPFC is known to support such functions (Gilbert et al., 2006; Northoff & Bermpohl, 2004; Northoff et al., 2006).

Rhythm and Meter

As in the case of melody perception, the question arises to what extent auditory cortical areas in the temporal lobe are engaged in the processing of musical rhythmic patterns. Studies of patients in whom varying amounts of either the left or right anterior temporal lobes have been removed indicate that the greatest impairment in reproducing rhythmic patterns is found in patients with excisions that encroach on secondary auditory areas in HG in the right hemisphere (Penhune et al., 1999). The deficits are observed when exact durational patterns are to be reproduced, but not when the patterns can be encoded categorically as sequences of long and short intervals.

Given the integral relationship between timing and movement, and the propensity of humans to move along with the beat in the music, neuroimaging experiments of rhythm and meter perception have examined the degree to which motor systems of the brain are engaged alongside auditory areas during passive listening to rhythms or attentive listening while performing a secondary discrimination task (Grahn & Rowe, 2009), listening with the intent to subsequently synchronize with or reproduce the rhythm (Chen et al., 2008a), or listening with the intent to make a same/different comparison with a target rhythm (Grahn & Brett, 2007). Discrimination of metrically simple, complex, and nonmetric rhythms recruits, bilaterally, the auditory cortex, cerebellum, IFG, and a set of premotor areas including the basal ganglia (putamen), pre–supplementary motor area (pSMA) or supplementary motor area (SMA), and dorsal premotor cortex (PMC) (Grahn & Brett, 2007). The putamen has been found to respond more strongly to simple rhythms than complex rhythms, suggesting that its activation is central to the experienced salience of a beat. A subsequent study found stronger activation throughout the basal ganglia in response to beat versus nonbeat rhythms, along with greater coupling of the putamen with the auditory cortex and medial and lateral premotor areas in the beat conditions (Grahn & Rowe, 2009). Interestingly, the response of the putamen increased as external accenting cues weakened, suggesting that activity within the putamen is also shaped by the degree to which listeners generate a subjective beat.

Activity in premotor areas and the cerebellum is differentiated by the degree of engagement with a rhythm (Chen et al., 2008a). The SMA and mid-PMC are active during passive listening to rhythms of varying complexity. Listening to a rhythm with the requirement to subsequently synchronize with that rhythm recruits these regions along with ventral premotor and inferior frontal areas. These regions are then also active when subjects subsequently synchronize their taps with the rhythm. Similar results are obtained in response to short 3-second piano melodies: lateral premotor areas are activated both during listening and during execution of arbitrary key press sequences without auditory feedback (Bangert et al., 2006). Converging evidence for the recruitment of premotor areas during listening to rhythmic structure in music has been obtained through analyses of MEG data in which a measure related to the amplitude envelope of the auditory stimulus is correlated with the same measure applied to the MEG data (Popescu et al., 2004). One study of attentive listening to polyphonic music also found increased activation of premotor areas (pSMA, mid-PMC), although the study did not seek to associate these activations directly with the rhythmic structure in the music (Janata et al., 2002a).

Timbre

Given the strong dependence of music on variation in timbre (instrument sounds), it is surprising that relatively few studies have addressed the representation of timbre in the brain. Similarity judgments of pairs of heard or imagined orchestral instrument sounds drive activity in auditory cortical areas along the posterior half of the STG, around HG and within the planum temporale (Halpern et al., 2004). When activation in response to more complex timbres (sounds consisting of more spectral components, or harmonics, and greater temporal variation in those harmonics) is compared with simpler timbres or pure tones, regions of the STG surrounding primary auditory areas stand out as more active (Menon et al., 2002; Meyer et al., 2006). Processing of attack and spectral centroid cues is more impaired in individuals with right temporal lobe resections than in those with left temporal lobe resections or in normal controls (Samson & Zatorre, 1994).

A number of studies have made explicit use of the multidimensional scaling approach to examine the organization of timbral dimensions in the brain. For example, when based on similarity judgments of isolated tones varying in attack or spectral centroid, the perceptual space of patients with resections of the right temporal lobe is considerably more distorted than is that of normal controls or individuals with left temporal lobe resections (Samson et al., 2002). These impairments are ameliorated to a great extent, but not entirely, when the timbral similarity of eight-note melodies is judged (Samson et al., 2002), although the extent to which melodic perception as opposed to simple timbral reinforcement drives this effect is unclear.

Evidence that timbral variation is assessed at relatively early stages of auditory cortical processing comes from observations that the MMN is similar in response to changes in the emotional connotation of a tone played by a violin, a change in timbre from violin to flute, and changes in pitch (Goydke et al., 2004). Moreover, MMN responses are largely additive when infrequent ignored deviant sounds deviate from standard ignored sounds on multiple timbre dimensions simultaneously, suggesting that timbral dimensions are processed within separate sensory memory channels (Caclin et al., 2006). Also, within the representation of the spectral centroid dimension, the magnitude of the MMN varies linearly with perceptual and featural similarity (Toiviainen et al., 1998).

Although timbre can be considered in terms of underlying feature dimensions, musical sounds nonetheless have a holistic object-like quality to them. Indeed, the perceptual processing of timbre dimensions is not entirely independent (Caclin et al., 2007). The interactivity of timbral dimensions becomes evident when timbral categorization judgments are required and manifests itself mainly in the amplitude of later decision-related components such as the P300 (Caclin et al., 2008). An understanding of how timbral objects are represented and behave within broader brain networks in a task-dependent manner, such as when musical pieces are recognized based on very brief timbral cues (Schellenberg et al., 1999), or emotional distinctions are made (Goydke et al., 2004; Bigand et al., 2005), remains to be elaborated.

Attention

Most of the research described to this point was aimed at understanding the representation of basic musical features and dimensions, and the processing of change along those dimensions, without much regard for the broader and perhaps more domain-general psychological processes that are engaged by music.

Following from the fact that expectancy violation paradigms have been a staple of cognitive neuroscience research on music, considerable information has been collected about attentional capture by deviant musical events. Working from a literature based primarily on visual attention, Corbetta and Shulman (2002) proposed a distinction between a dorsal and a ventral attentional system, whereby the ventral attentional system is engaged by novel or unexpected sensory input while the dorsal attentional system is active during endogenously guided expectations, such as the orientation of attention to a particular spatial location. Overall, the orienting and engagement of attention in musical tasks recruits these attention systems. Monitoring for target musical events and the targets themselves cause activity increases in the ventral system, specifically in the VLPFC in the region of the frontal operculum where the IFG meets the anterior insula (Janata et al., 2002a; Koelsch et al., 2002c; Tillmann et al., 2003). The strongest activation arises when the targets violate harmonic expectations (Maess et al., 2001; Tillmann et al., 2003).

Structural markers in music, such as the boundaries of phrases or the breaks between movements, also cause the ventral and dorsal attentional systems to become engaged (Nan et al., 2008; Sridharan et al., 2007), with the ventral system leading the dorsal system (Sridharan et al., 2007). Attentive listening to excerpts of polyphonic music engages both systems even in the absence of specific targets or boundary markers (Janata et al., 2002a; Satoh et al., 2001). Interestingly, the ventral attentional system is engaged, bilaterally, when (1) an attentive listening task requires target detection in either selective or divided attention conditions, or (2) selective listening is required without target detection, but not during divided/global listening without target detection. When task demands are shifted from target detection to focusing attention on an instrument as though one were trying to memorize the part the instrument is playing, working memory areas in the dorsolateral prefrontal cortex (DLPFC) are recruited bilaterally (Janata et al., 2002a).

Given the integral relationship between attention and timing (Jones, 1976; Large & Jones, 1999), elements of the brain’s attention and action systems interact when attention is focused explicitly on timing judgments, and this interaction is modulated by individual differences in listening style. For example, while frontoparietal attention areas, together with auditory and premotor areas, are engaged overall, greater activation is observed within the ventral attentional system in those individuals who tend to orient their attention toward a longer, rather than a subdivided, beat period (Grahn & McAuley, 2009).

Memory

Music depends on many forms of memory that aid in its perception and production. For example, we form an implicit understanding of tonal structure that allows us to form expectations and detect violations of those expectations. We also store knowledge about musical styles in terms of the harmonic progressions, timbres, and orchestration that characterize them. Beyond the memory for structural aspects of music are memories for specific pieces of music or autobiographical memories that may be inextricably linked to those pieces of music. We also depend on working memory to play music in our minds, either when imagining a familiar song that we have retrieved from long-term memory or when repeating a jingle from a commercial that we just heard. Because linguistic material is often an integral part of music (i.e., the lyrics in songs), the set of memory processes that needs to be considered in association with music necessarily extends to include those associated with language. Two questions that arise are how the different memory systems interact and how they might be characterized in terms of their overlap with memory systems identified using different tasks and sensory modalities.


Working Memory

An early neuroimaging study of musical processes used short eight-note melodies and found that parietal and lateral prefrontal areas were recruited when subjects had to compare the pitch of the first and last notes (Zatorre et al., 1994). This result suggested that there is an overlap of musical working memory with more general working memory systems. Direct evidence that verbal working memory and tonal working memory share the same neural substrates (auditory, lateral prefrontal, and parietal cortices) was obtained in two studies in which verbal and tonal material was presented and rehearsed in separate trials (Hickok et al., 2003) or in which the stimuli were identical but the task instructions emphasized encoding and rehearsal of either the verbal or tonal material (Koelsch et al., 2009). Further studies relevant to musical working memory are discussed below in the section on musical imagery.

Episodic Memory

If we have a melody running through our minds, that is, if we are maintaining a melody in working memory, it is likely the consequence of having heard and memorized the melody at some point in the past. The melody need not even be one that we heard repeatedly during our childhood (remote episodic memory), but could be one that we heard for the first time earlier in the experimental session (recent episodic memory). Neuroimaging experiments have examined both types of episodic memory.

During an incidental encoding phase of an episodic memory experiment, familiarity judgments about 5-second excerpts of melodies (half of which were familiar) with no associated lyrics resulted in activation of medial prefrontal and anterior temporal lobe regions (Platel et al., 2003). The same pattern of activations was observed when a familiarity judgment task about nursery tunes was contrasted against a change detection judgment task using those same melodies (Satoh et al., 2006). Familiarity judgments based on CD recordings (as opposed to synthesized melodies) of instrumental music without lyrics or voice were found to increase activation within the MPFC, but not the anterior temporal lobes (Plailly et al., 2007). Similarly, both increased familiarity and autobiographical salience of popular music excerpts that did contain lyrics resulted in stronger activation of the dorsal medial prefrontal cortex, but not the anterior temporal lobes. In addition to showing a stronger response to familiar and memory-evoking music, the MPFC tracked the trajectories of musical excerpts in tonal space, supporting a hypothesis that tonal contexts are integrated with self-relevant information within this region (Janata, 2009).

One possible reason for the discrepant findings in the anterior temporal lobes is the neuroimaging technique used: the positron emission tomography studies (Platel et al., 2003; Satoh et al., 2006) were not susceptible to signal loss in those regions, as were the fMRI experiments (Janata, 2009; Plailly et al., 2007). Another is the use of complex recorded musical material compared with monophonic melodies. Familiarity judgments about monophonic melodies must be based solely on the pitch and temporal cues of a single melodic line, whereas recordings of orchestral or popular music contain a multitude of timbral, melodic, harmonic, and rhythmic cues that can facilitate familiarity judgments.

Recruitment of the anterior temporal lobes is consistent with the neuropsychological evidence of melody processing and recognition deficits following damage to those areas (Ayotte et al., 2000; Peretz, 1996). Indeed, in a recognition memory test in which patients were first presented with twenty-four unfamiliar folk tune fragments and then made old/new judgments, those with right temporal lobe excisions were mainly impaired on tune recognition, whereas left temporal lobe excisions resulted in impaired recognition of the words (Samson & Zatorre, 1991). When tunes and words were combined, new words paired with old tunes led to impaired tune recognition in both patient groups, suggesting some sort of text–tune integration process involving the temporal lobes of both hemispheres.

In contrast to making judgments about or experiencing the long-term familiarity of musical materials, making judgments about whether a melody (either familiar or unfamiliar) was recently heard is more comparable with typical laboratory episodic memory tasks in which lists of items are memorized. The results from the small number of studies that have examined brain activations during old/new judgments for musical material consistently indicate that different brain areas than those described above are recruited. Moreover, the reported activation patterns are quite heterogeneous, including the right hippocampus (Watanabe et al., 2008), and a large number of prefrontal, parietal, and lateral temporal loci distributed bilaterally (Klostermann et al., 2009; Platel et al., 2003; Watanabe et al., 2008). Most consistent among those is the recruitment of the lateral anterior prefrontal cortex along the middle frontal gyrus (Brodmann area 10) and assorted locations in the precuneus.

Absolute Pitch

The rare ability to accurately generate the note name for a tone played in isolation without an external referent is popularly revered as a highly refined musical ability. Neuroimaging experiments indicate that regions of the left DLPFC that are associated with working memory functions become more active when absolute pitch possessors passively listen to or make melodic interval categorization judgments about pairs of tones relative to musically trained individuals without absolute pitch (Zatorre et al., 1998). The hypothesis that these regions become more active because of the process of associating a note with a label is supported by the observation that nonmusician subjects who are trained to associate chords with arbitrary numbers show activity within this region during that task following training (Bermudez & Zatorre, 2005). Although the classic absolute pitch ability as defined above is rare, the ability to distinguish above chance whether a recording of popular music has been transposed by one or two semitones is common (Schellenberg & Trehub, 2003), although not understood at a neuroscientific level.


Parallels Between Music and Language

Syntax

As noted in the section on the processing of harmonic/tonal structure in music, harmonic incongruities appear to engage brain regions and processes that are similar to those involved in the processing of syntactic relations in language. These observations suggest that music and language may share neural resources that are more generally devoted to syntactic processing (Fedorenko et al., 2009; Patel, 2003). If there is a shared resource, then processing costs or altered ERP signatures should be observed when a person is attending to and making judgments about one domain and syntactic violations occur in the unattended domain. Indeed, when chords are presented synchronously with words, and subjects have to make syntactic or semantic congruity judgments regarding the final word of each sentence, the amplitude of the left anterior negativity (LAN), a marker of linguistic syntax processing, is reduced in response to a syntactically incongruous word when it is accompanied by a harmonically irregular chord compared with when it is accompanied by a harmonically regular chord (Koelsch et al., 2005a). Conversely, the ERAN is reduced in amplitude when an incongruous chord is accompanied by a syntactically incongruous word (Steinbeis & Koelsch, 2008). Interestingly, there is no effect of harmonic (in)congruity on the N400, a marker of semantic incongruity, when words are being judged (Koelsch et al., 2005a). Nor is there an effect of physical auditory incongruities that give rise to an MMN response when sequences of tones, rather than chords, accompany the words. The latter result further supports the separability of processes underlying the ERAN and MMN.

Semantics

The question of whether music conveys meaning is another interesting point of comparison between music and language. Music lacks sufficient specificity to unambiguously convey relationships between objects and concepts, but it can be evocative of scenes and emotions (e.g., Beethoven’s Pastoral Symphony). Even without direct reference to language, music specifies relationships among successive tonal elements (e.g., notes and chords), by virtue of the probability structures that govern music’s movement in tonal space. Less expected transitions create a sense of tension, whereas expected transitions release tension (Lerdahl & Krumhansl, 2007). Similarly, manipulations of timing (e.g., tempo, rhythm, and phrasing) parallel concepts associated with movement.

Evidence of music’s ability to interact with the brain’s semantic systems comes from two elegant studies that make use of simultaneously presenting musical and linguistic material. The first (Koelsch et al., 2004) used short passages of real music to prime semantic contexts and then examined the ERP response to probe words that were either semantically congruous or incongruous with the concept ostensibly primed by the music. The incongruous words elicited an N400, indicating that the musical passage had indeed primed a meaningful concept as intended. The second (Steinbeis & Koelsch, 2008) used simultaneously presented words and chords, along with a moderately demanding dual task in which attention had to be oriented to both the musical and linguistic information, and found that semantically incongruous words affected the processing of irregular (Neapolitan) chords. The semantic incongruities did not affect the ERAN, but rather affected the N500, a late frontal negativity that follows the ERAN in response to incongruous chords and is interpreted as a stage of contextual integration of the anomalous chord (Koelsch et al., 2000).

Action

Although technological advances over the past few decades have made it possible to sequence sounds and produce music without the need for a human to actively generate each sound, music has been and continues to be intimately dependent on the motor systems of the human brain. Musical performance is the obvious context in which the motor systems are engaged, but the spectrum of actions associated with music extends beyond overt playing of an instrument. Still within the realm of overt action are movements that are coordinated with the music, be they complex dance moves or the simple tapping, nodding, or bobbing along with the perceived beat. Beyond that is the realm of covert action, in which no overt movements are detectable, but movements or sensory inputs are imagined. Even expectancy, the formation of mental images of anticipated sensory input, can be viewed as a form of action (Fuster, 2001; Schubotz, 2007).

As described above, listening to rhythms in the absence of overt action drives activity within premotor areas, the basal ganglia, and the cerebellum. Here, we examine musical engagement of the brain’s action system when some form of action, either overt or covert, is required. One of the beautiful things about music is the ability to engage the action system across varying degrees of complexity and still have it be a compelling musical experience, from simple isochronous synchronization with the beat to virtuosic polyrhythmic performance on an instrument. Within other domains of cognitive neuroscience, there is an extensive literature on timing and sequencing behaviors that shows differential engagement of the action systems as a function of complexity (Janata & Grafton, 2003), which is largely echoed in the emerging literature pertaining explicitly to music.

Sensorimotor Coordination

Tapping

Perhaps the simplest form of musical engagement is tapping isochronously with a metronome whereby a sense of meter is imparted through the periodic accentuation of a subset of the pacing events (e.g., accenting every other beat to impart the sense of a march or every third beat to impart the sense of a waltz). In these types of situations, a very simple coupling between the posterior auditory cortex and dorsal premotor areas is observed, in which the strength of the response in these two areas is positively correlated with the strength of the accent that drives the metric salience and the corresponding behavioral manifestation of longer taps to more salient events (Chen et al., 2006). When the synchronization demands increase as simple and complex metric and then nonmetric rhythms are introduced, behavioral variability increases, particularly among nonmusicians. Positively correlated with the increased performance variability is the activity within a more extensive network comprising the pSMA, SMA, ventral PMC, DLPFC, inferior parietal lobule, thalamus, and cerebellum, with a few additional differences between musicians and nonmusicians (Chen et al., 2008b). Thus, premotor areas are coupled with attention and working memory areas as the sensorimotor coupling demands increase.

Basic synchronization with a beat also provides a basis for interpersonal synchronization and musical communication. Simultaneous electroencephalogram (EEG) recordings from guitarists given the task of mentally synchronizing with a metronome and then commencing to play a melody together reveal that EEG activity recorded from electrodes situated above premotor areas becomes synchronized both within and between the performers (Lindenberger et al., 2009). Although the degree of interpersonal synchronization that arises by virtue of shared sensory input is difficult to estimate, such simultaneous recording approaches are bound to shape our understanding of socioemotional aspects of sensorimotor coordination.

Singing

The adjustment of one’s own actions based on sensory feedback is an important part of singing. In its simplest form, the repeated singing/chanting of a single note, in comparison to passive listening to complex tones, recruits auditory cortex, motor cortex, the SMA, and the cerebellum, with possible involvement of the anterior cingulate and basal ganglia (Brown et al., 2004b; Perry et al., 1999; Zarate & Zatorre, 2008). However, as pitch regulation demands are increased through electronic shifting of the produced pitch, additional premotor and attentional control regions such as the pSMA, ventral PMC, basal ganglia, and intraparietal sulcus are recruited across tasks that require the singer to either ignore the shift or try to compensate for it. The exact network of recruited areas depends on the amount of singing experience (Zarate & Zatorre, 2008). In regard to more melodic material, repetition of, or harmonization with, a melody also engages the anterior STG relative to monotonic vocalization (Brown et al., 2004b), although this activation is not seen during the singing of phrases from an aria that is familiar to the subject (Kleber et al., 2007).

Performance

Tasks involving the performance of complex musical sequences, beyond those that are reproduced via the imitation of a prescribed auditory pattern, afford an opportunity to observe the interaction of multiple brain systems. Performance can be externally guided by a musical score, internally guided as in the case of improvisation, or some combination of the two.

Score-Based

One of the first neuroimaging studies of musical functions examined performance of a Bach partita from a score and the simpler task of playing scales contrasted with listening to the corresponding music (Sergent et al., 1992). Aside from auditory, visual, and motor areas recruited by the basic processes of hearing, score reading, and motor execution, parietal regions were engaged, presumably by the visuomotor transformations associated with linking the symbols in the score with a semantic understanding of those symbols as well as associated actions (Bevan et al., 2003; McDonald, 2006; Schön et al., 2001; 2002). In addition, left premotor and IFG areas were engaged, presumably reflecting some of the sequencing complexity associated with the partita. A similar study (Parsons et al., 2005), in which bimanual performance of memorized Bach pieces was compared with bimanual playing of scales, found extensive activation of medial and lateral premotor areas, anterior auditory cortex, and subcortical activations in the thalamus and basal ganglia, presumably driven by the added demands of retrieving and executing complex sequences from memory.

Other studies complicate the interpretation that premotor cortices are driven by greater complexity in the music played. For example, separate manipulation of melodic and rhythmic complexity found some areas that were biased toward processing melodic information (mainly in the STG and calcarine sulcus), whereas others were biased toward processing rhythmic information (left inferior frontal cortex and inferior temporal gyrus), but there was no apparent activation of premotor areas (Bengtsson & Ullen, 2006).

Improvised

Music, like language, is often improvised with the intent of producing a syntactically (and semantically) coherent stream of auditory events. Given a task of continuing an unfamiliar melody or linguistic phrase with an improvised sung melodic or spoken linguistic phrase, a distributed set of brain areas is engaged in common for music and language, including the SMA, motor cortex, putamen, globus pallidus, cerebellum, posterior auditory cortex, and lateral inferior frontal cortex (Brodmann area 44/45), although the extent of activation in area 44/45 is greater for language (Brown et al., 2006). Lateral premotor areas are consistently found to be active during improvisation tasks that involve various degrees of piano performance realism. For instance, when unimanual production of melodies is constrained by a five-key keyboard and instructions that independently vary the amount of melodic or rhythmic freedom that can be exhibited by the subject, activity in mid-dorsal premotor cortex is modulated by complexity along both dimensions (Berkowitz & Ansari, 2008). A similar region is recruited during unimanual production while improvising around a visually presented score, both when the improvised performance must be memorized and when it is improvised freely without memorization (Bengtsson et al., 2007). A more dorsal premotor area is engaged during this type of improvisation also, mirroring effects found in a study of unimanual improvisation in which free improvisation without a score was contrasted with playing a jazz melody from memory (Limb & Braun, 2008). The latter study observed activation within an extensive network encompassing the ventrolateral prefrontal (Brodmann area 44), middle temporal, parietal, and cerebellar areas. Emotion areas in the ventromedial prefrontal cortex were active during improvisation also, providing the first neuroimaging evidence of how motor control areas are coupled with affective areas during a highly naturalistic task. Interestingly, both of the studies in which improvisation was least constrained also found substantial activation in extrastriate visual cortices that could not be attributed to visual input or score reading, suggesting perhaps that visual mental imagery processes accompany improvisation. One must note that in all of these studies, subjects were musicians, often with high levels of training.

Imagery

Music affords an excellent opportunity for examining mental imagery. It is common to sing to oneself or have a song stuck in one’s head, so it would seem that the brain’s sensorimotor system is covertly engaged by this mental pastime. Studies of musical imagery have tended to emphasize either the auditory or the motor components, with an interest in determining the degree to which the primary auditory and motor cortices are engaged.

Auditory Imagery

Activation of auditory association cortices is found using fMRI or PET when subjects sing through a short melody in order to compare the pitch of two notes corresponding to specific words in the lyric (Zatorre et al., 1996), or continue imagining the notes following the opening fragment of a television show theme song (Halpern & Zatorre, 1999). The activation of auditory areas is corroborated by EEG/MEG studies in which responses to an imagined note (Janata, 2001) or expected chord (Janata, 1995; Otsuka et al., 2008) closely resemble auditory evoked potentials with known sources in the auditory cortex (e.g., the N100). One study that used actual CD recordings of instrumental and vocal music found extensive activation of auditory association areas during silent gaps that were inserted into the recordings, with some activation of the primary auditory cortex when the gaps occurred in instrumental music (Kraemer et al., 2005). However, another study that used actual CD recordings to examine anticipatory imagery (the phenomenon of imagining the next track on a familiar album as soon as the current one ends) found no activation of the auditory cortices during the imagery period but extensive activation of a frontal and premotor network (Leaver et al., 2009).

Premotor areas, in particular the SMA, as well as frontal regions associated with memory retrieval, have been activated in most neuroimaging studies of musical imagery that have emphasized the auditory components (Halpern & Zatorre, 1999; Leaver et al., 2009; Zatorre et al., 1996), even under relatively simple conditions that could be regarded as maintenance of items in working memory during same/different comparison judgments of melodies or harmonized melodies lasting 4 to 6 seconds (Brown & Martinez, 2007). It has been argued, however, on the basis of comparing activations in an instrumental timbre imagery task with a visual object imagery task, that the frontal contribution may arise from general imagery task demands (Halpern et al., 2004). Nonetheless, effortful musical imagery tasks, such as those requiring the imagining of newly learned pairings of novel melodies (Leaver et al., 2009), imagery of expressive short phrases from an aria (Kleber et al., 2007), or imagining the sound or actions associated with a Mozart piano sonata when only the other modality is presented (Baumann et al., 2007), appear to be associated with activity in a widespread network of cortical and subcortical areas. This network matches quite well elements of both the ventral and dorsal attentional networks (Corbetta & Shulman, 2002) and the network observed when attentive listening to polyphonic music is contrasted with rest (Janata et al., 2002a).

Motor Imagery

Several studies have focused on motor imagery. In a group of pianists, violinists, and cellists, imagined performance of rehearsed pieces from the classical repertoire recruited frontal and parietal areas bearing resemblance to the dorsal attention network, together with the SMA and subcortical areas and cerebellum (Langheim et al., 2002). Imagining performing the right-hand part of one of Bartok’s Mikrokosmos while reading the score similarly activates the dorsal attentional network along with visual areas and the cerebellum (Meister et al., 2004). Interestingly, the SMA is not activated significantly when the source of the information to be imagined is external rather than internal (i.e., playing from memory), indicating that premotor and parietal elements of the dorsal attentional system coordinate with other brain regions based on the specific demands of the particular imagery task.

Emotion

The relationship between music and emotion is a complex one, and multiple mechanisms have been postulated through which music and the emotion systems of the brain can interact (Juslin & Vastfjall, 2008). Compared with the rather restricted set of paradigms that have been developed for probing the structural representations of music (e.g., tonality), the experimental designs for examining neural correlates of emotion in music are diverse. The precise emotional states that are captured in any given experiment, and their relevance to real music listening experiences, are often difficult to discern when the actual comparisons between experimental conditions are considered carefully. Manipulations have tended to fall into one of two categories: (1) normal music contrasted with the same music rendered dissonant or incoherent, or (2) selection or composition of musical stimuli to fall into discrete affective categories (e.g., happy, sad, fearful). In general, studies have found modulation of activity within limbic system areas of the brain.

When the relative dissonance of a mechanical performance of a piano melody is varied by the dissonance of the accompanying chords, activity in the right parahippocampal gyrus correlates positively with the increases in dissonance and perceived unpleasantness, whereas activity in the right orbitofrontal cortex and subcallosal cingulate increases as the consonance and perceived pleasantness increase (Blood et al., 1999). Similarly, when listening to pleasant dance music spanning a range of mostly classical genres is contrasted with listening to the same excerpts rendered dissonant and displeasing by mixing the original with two copies that have been pitch-shifted by a minor second and a tritone, medial temporal areas (the left parahippocampal gyrus, hippocampus, and bilateral amygdala) respond more strongly to the dissonant music, whereas areas more typically associated with listening to music (the auditory cortex, the left IFG, anterior insula and frontal operculum, and ventral premotor cortex) respond more strongly to the original pleasing versions (Koelsch et al., 2006). The same stimulus materials result in stronger theta activity along anterior midline sites in response to the pleasing music (Sammler et al., 2007). Listening to coherent excerpts of music rather than their temporally scrambled counterparts (Levitin & Menon, 2003) increases activity in parts of the dopaminergic pathway, namely the ventral tegmental area and nucleus accumbens (Menon & Levitin, 2005). These regions interact and are functionally connected to the left IFG, insula, hypothalamus, and orbitofrontal cortex, thus delineating a set of emotion-processing areas of the brain that are activated by music listening experiences that are relatively pleasing.

The results of the above-mentioned studies are somewhat heterogeneous and challenging to interpret because they depend on comparisons of relatively normal (and pleasing) music-listening experiences with highly abnormal (and displeasing) listening experiences, rather than comparing normal pleasing listening experiences with normal displeasing experiences. Nonetheless, modulation of the brain’s emotion circuitry is also observed when the statistical contrasts do not involve distorted materials. Listening to unfamiliar and pleasing popular music compared with silent rest activates the hippocampus, nucleus accumbens, ventromedial prefrontal cortex, right temporal pole, and anterior insula (Brown et al., 2004a). When listening to excerpts of unfamiliar and familiar popular music, activity in the VMPFC increases as the degree of experienced positive affect increases (Janata, 2009).
Somewhat paradoxically, listening to music that elicits chills (goosebumps or shivers down the spine), something that is considered by many to be highly pleasing, reduces activity in the VMPFC (where activity increases tend to be associated with positive emotional responses), and activity in the right amygdala and in the left hippocampus/amygdala also decreases (Blood & Zatorre, 2001). Activity in other brain areas associated with positive emotional responses, such as the ventral striatum and orbitofrontal cortex, increases, along with activity in the insula and premotor areas (SMA and cerebellum).

The amygdala has been of considerable interest, given its general role in the processing of fearful stimuli. Patients with either unilateral or bilateral damage to the amygdala show impaired recognition of scary music and difficulty differentiating peaceful music from sad music (Gosselin et al., 2005, 2007). Right amygdala damage, in particular, leaves patients unable to distinguish intended fear in music from either positive or negative affective intentions (Gosselin et al., 2005). Chord sequences that contain irregular chord functions and elicit activity in the VLPFC are also regarded as less pleasing and elicit activity bilaterally in the amygdala (Koelsch et al., 2008). Violations of syntactic expectations also increase the perceived tension in a piece of music and are associated with changes in electrodermal activity, a measure of emotional arousal (Steinbeis et al., 2006).

Perhaps the most common association between music and emotion is the relationship between affective valence and the mode of the music: The minor mode is consistently associated with sadness, whereas the major mode is associated with happiness. Brain activations associated with mode manipulations are not as consistent across studies, however. In one study (Khalfa et al., 2005), the intended emotions of classical music pieces played on the piano were assessed on a five-point bivalent scale (sad to happy). Relative to major pieces, minor pieces elicited activity in the posterior cingulate and in the medial prefrontal cortex, whereas pieces in the major mode were not associated with any activity increases relative to minor pieces. A similar absence of response for major mode melodies relative to minor mode was observed in a different study in which unfamiliar monophonic melodies were used (Green et al., 2008). However, minor mode melodies elicited activity in the left parahippocampal gyrus and rostral anterior cingulate, indicating engagement of the limbic system, albeit in a unique constellation. A study in which responses to short four-chord sequences that established either major or minor tonalities were compared with repeated chords found bilateral activation of the IFG, irrespective of the mode (Mizuno & Sugishita, 2007). This result was consistent with the role of this region in the evaluation of musical syntax, but inconsistent with the other studies comparing major and minor musical material. Finally, a study using recordings of classical music that could be separated into distinct happy, sad, and neutral categories (Mitterschiffthaler et al., 2007) found that happy and sad excerpts strongly activated the auditory cortex bilaterally relative to neutral music. Responses to happy and sad excerpts (relative to neutral) were differentiated in that happy music elicited activity within the ventral striatum, several sections of the cingulate cortex, and the parahippocampal gyrus, whereas sad music was associated with activity in a region spanning the right hippocampus and amygdala, along with cingulate regions.

Anatomy, Plasticity, and Development

Music provides an excellent arena in which to study the effects of training and expertise on the brain, both in terms of structure and function (Munte et al., 2002), and also to examine structural differences in unique populations, such as those individuals who possess the ability to name pitches in isolation (absolute pitch) or those who have difficulty perceiving melodies (amusics). Anatomical correlates of musical expertise have been observed both in perceptual and motor areas of the brain.

Unsurprisingly, the auditory cortex has been the specific target of several investigations. An early investigation observed a larger planum temporale in the left hemisphere among musicians, although the effect was primarily driven by musicians with absolute pitch (Schlaug et al., 1995). In studies utilizing larger numbers of musically trained and untrained subjects, the volume of HG, where the primary and secondary auditory areas are situated, was found to increase with increasing musical aptitude (Schneider et al., 2002, 2005). The volumetric measures were positively correlated with the strength of the early (19–30 ms) stages of the evoked responses to amplitude-modulated pure tones (Schneider et al., 2002). Within the lateral extent of HG, the volume was positively correlated with the magnitude of a slightly later peak (50 ms post-stimulus) in the waveform elicited by sounds consisting of several harmonics. Remarkably, the hemispheric asymmetry in the volume of this region was indicative of the mode of perceptual processing of these sounds, with larger left-hemisphere volumes reflecting a bias toward processing the implied fundamental frequency of the sounds and larger right-hemisphere volumes indicating a bias toward spectral processing of the sounds (Schneider et al., 2005). In general, the auditory cortex appears to respond more strongly to musical sounds in musicians (Pantev et al., 1998) and as a function of the instrument with which they have had the most experience (Margulis et al., 2009).

Whole brain analyses using techniques such as cortical thickness mapping or voxel-based morphometry (VBM) have also revealed differences between individuals trained on musical instruments and those with no training, although there is considerable variability in the regions identified in the different studies, possibly a consequence of differences in the composition of the samples and the mapping technique used (Bermudez et al., 2009). Certain findings, such as a greater volume in trained pianists of primary motor and somatosensory areas and cerebellar regions responsible for hand and finger movements (Gaser & Schlaug, 2003), are relatively easy to interpret, and they parallel findings of stronger evoked responses in the hand regions of the right hemisphere that control the left (fingering) hand of violinists (Pascual-Leone et al., 1994). Larger volumes in musicians are also observed in lateral prefrontal cortex, both ventrally (Bermudez et al., 2009; Gaser & Schlaug, 2003; Sluming et al., 2002) and dorsally along the middle frontal gyrus (Bermudez et al., 2009). However, one particular type of musical aptitude, absolute pitch, is associated with decreased cortical thickness in dorsolateral frontal cortex in similar areas that are associated with increases in activation in listeners with absolute pitch relative to other musically trained subjects (Zatorre et al., 1998). Another paradox presented by VBM is the finding of greater gray-matter volumes in amusic subjects in some of the same ventrolateral regions that show greater cortical thickness in musicians (Bermudez et al., 2009; Hyde et al., 2007).
The anatomical differences that are observed as a function of musical training are perhaps better placed into a functional context when one observes the effects of short-term training on neural responses. Nonmusicians who were trained over the course of two weeks to play a cadence consisting of broken chords on a piano keyboard exhibited a stronger MMN to deviant notes in similar note patterns compared either with their responses before receiving training or with a group of subjects who received training by listening and making judgments about the sequences performed by the trained group (Lappe et al., 2008). Similarly, nonmusicians who, over the course of 5 days, learned to perform five-note melodies showed greater activation bilaterally in the dorsal IFG (Broca’s area) and lateral premotor areas when listening to the trained melodies compared with listening to comparison melodies on which they had not trained (Lahav et al., 2007). Thus, perceptual responses are stronger following sensorimotor training within the networks that are utilized during the training. When the training involves reading music from a score, medial superior parietal areas also show effects of training (Stewart et al., 2003). Both mental and physical practice of five-finger piano exercises are capable of strengthening finger representations in the motor cortex across a series of days, as measured by reduced transcranial magnetic stimulation (TMS) thresholds for eliciting movements (Pascual-Leone et al., 1995).


Disorders

Musical behaviors, like any other behaviors, are disrupted when the functioning of the neural substrates that support those behaviors is impaired. Neuropsychological investigations provide intriguing insights into component processes in musical behaviors and their likely locations in the brain. Aside from the few studies mentioned in the sections above of groups of patients who underwent brain surgery, there are many case studies documenting the effects of brain insults, typically caused by stroke, on musical functions (Brust, 2003). A synthesis of the findings from these many studies (Stewart et al., 2006) is beyond the scope of this chapter, as is a discussion of the burgeoning topic of using music for neurorehabilitation (Belin et al., 1996; Sarkamo et al., 2008; Thaut, 2005). Here, the discussion of brain disorders in relation to music is restricted to amusia, a music-specific disorder.

Amusia

Amusia, commonly referred to as “tone deafness,” refers to a profound impairment in accurately perceiving melodies. The impairment arises not so much from an inability to discriminate one note in a melody from the next (i.e., to recognize that a different note is being played), but rather from the inability to determine the direction of the pitch change (Ayotte et al., 2002; Foxton et al., 2004). The ability to perceive the direction of pitch change from one note to the next is critical to discerning the contour of the melody, that is, its defining feature. The impairment may be restricted to processing of melodic rather than rhythmic structure (Hyde & Peretz, 2004), although processing of rhythms is impaired when the pitch of individual notes is also changing (Foxton et al., 2006).

Impaired identification of pitch direction, but not basic pitch discrimination, has been observed in patients with right temporal lobe excisions that encroach on auditory cortex in HG (Johnsrude et al., 2000), which is likely to underlie the bias toward the right hemisphere for the processing of melodic information (Warrier & Zatorre, 2004). A diffusion tensor imaging study of amusics and normal controls found that the volume of the superior arcuate fasciculus in the right hemisphere was consistently smaller in the group of amusics than in normal controls (Loui et al., 2009a). The arcuate fasciculus connects the temporal and frontal lobes, specifically the posterior superior and middle temporal gyri and the IFG. VBM results additionally indicate a structural anomaly in a small region of the IFG in amusics (Hyde et al., 2006, 2007).
Taken together, the neuroimaging studies that have implicated the IFG in the processing of musical syntax and temporal structure (Koelsch et al., 2002c, 2005b; Levitin & Menon, 2003; Maess et al., 2001; Patel, 2003; Tillmann et al., 2003), the behavioral and structural imaging data from amusics, and the studies of deficits in melody processing in right temporal lobe lesion patients support a view that the ability to perceive, appreciate, and remember melodies depends in large part on intact functioning of a perception/action circuit in the right hemisphere (Fuster, 2000; Loui et al., 2009a).


Music and the Brain’s Ensemble of Functions

During the past 15 to 20 years, there has been a tremendous increase in the amount of knowledge pertaining to the ways in which the human brain interacts with music. Although it is expeditious for those outside the field to regard music as a tidy circumscribed object or unitary process that is bound to have a concrete representation in the brain, or perhaps conversely a complex cultural phenomenon for which there is no hope of understanding its neural basis, even a modest amount of deeper contemplation reveals music to be a multifaceted phenomenon that is integral to human life. My objective in this chapter was to provide an overview of the variety of musical processes that contribute to musical behaviors and experiences, and of the way that these processes interact with various domain-general brain systems. Clearly, music is a diverse phenomenon, and musical stimuli and musical tasks are capable of reaching almost every part of the brain. Given this complexity, is there any hope for generating process models of musical functions and behaviors that can lead to a grand unified theory of music and the brain?

Figure 7.2 A highly schematized and simplified summary of brain regions involved in different facets of music psychological processes. On the left is a lateral view of the right cerebral hemisphere. A medial view of the right hemisphere is shown on the right. There is no intent to imply lateralization of function in this portrayal. White lettering designates the different lobes. The colored circles correspond to the colored labels in the box below. AG, angular gyrus; DLPFC, dorsolateral prefrontal cortex; HG, Heschl’s gyrus; IFG, inferior frontal gyrus; IPS, intraparietal sulcus; MPFC, medial prefrontal cortex; PMC, premotor cortex; pSMA, pre–supplementary motor area; PT, planum temporale; SMA, supplementary motor area; STG, superior temporal gyrus; VLPFC, ventrolateral prefrontal cortex.

Obligatory components of process models are boxes with arrows between them, in which each box refers to a discrete function, and perhaps an associated brain area. Such models have been proposed with respect to music (Koelsch & Siebel, 2005; Peretz & Coltheart, 2003). Within such models, some component processes are considered music specific, whereas others represent shared processes with other brain functions (e.g., language, emotion). The issue of music specificity, or modularity of musical functions, is of considerable interest, in large part because of its evolutionary implications (Peretz, 2006). The strongest evidence for modularity of musical functions derives from studies of individuals with brain damage in whom specific musical functions are selectively impaired (Peretz, 2006; Stewart et al., 2006). Such specificity in loss of function is remarkable given the usual extent of brain damage. Indeed, inferences about modularity must be tempered when not all possible parallel functions in other domains have been considered. The processing functions of the right lateral temporal lobes provide a nice example. As reviewed in this chapter and elsewhere (Zatorre et al., 2002), the neuropsychological and functional and structural neuroanatomical evidence suggests that the auditory cortex in the right hemisphere is more specialized than the left for the processing of pitch, pitch relationships, and thereby melody. Voice-selective regions of the auditory cortex are highly selective in the right hemisphere (Belin et al., 2000), and in general a large extent of the right lateral temporal lobes appears to be important for the processing of emotional prosody (Ethofer et al., 2009; Ross & Monnot, 2008; Schirmer & Kotz, 2006). Given evidence of parallels between melody and prosody, such as the connotation of sadness by an interval of a minor third in both music and speech (Curtis & Bharucha, 2010), or deficits among amusic individuals in processing speech intonation contours (Patel et al., 2005), it is likely that contour-related music and language functions are highly intertwined in the right temporal lobe.

Perhaps more germane to the question of how the brain processes music is the question of how one engages with the music. Few would doubt that the brain of someone dancing a tango at a club would show more extensive engagement with the music than that of someone incidentally hearing music while choosing what cereal to buy at a supermarket.
It would therefore seem that any given result from the cognitive neuroscience of music literature must be interpreted with regard to the experiential situation of the participants, both in terms of the affective, motivational, and task/goal states they might find themselves in, and in terms of the relationship of the (often abstracted) musical stimuli that they are being asked to interact with to the music they would normally interact with. In this regard, approaches that explicitly relate the dynamic time-varying properties of real musical stimuli to the behaviors and brain responses they engender will become increasingly important.

Figure 7.2 is a highly schematized (and incomplete) summary of sets of brain areas that may be recruited by different musical elements and more general processes that mediate musical experiences. It is not intended to be a process model or to represent the outcome of a quantitative meta-analysis. Rather, it serves mainly to suggest that one goal of research in the cognitive neuroscience of music might be to serve as a model system for understanding the coordination of the many processes that underlie creative goal-directed behavior in humans.

References

Adler, D. S. (2009). Archaeology: The earliest musical tradition. Nature, 460, 695–696.

Ayotte, J., Peretz, I., & Hyde, K. (2002). Congenital amusia: A group study of adults afflicted with a music-specific disorder. Brain, 125, 238–251.

Ayotte, J., Peretz, I., Rousseau, I., Bard, C., & Bojanowski, M. (2000). Patterns of music agnosia associated with middle cerebral artery infarcts. Brain, 123 (Pt 9), 1926–1938.

Bangert, M., Peschel, T., Schlaug, G., Rotte, M., Drescher, D., Hinrichs, H., Heinze, H.-J., & Altenmüller, E. (2006). Shared networks for auditory and motor processing in professional pianists: Evidence from fMRI conjunction. NeuroImage, 30, 917–926.

Barnes, R., & Jones, M. R. (2000). Expectancy, attention, and time. Cognitive Psychology, 41, 254–311.

Baumann, S., Koeneke, S., Schmidt, C. F., Meyer, M., Lutz, K., & Jancke, L. (2007). A network for audio-motor coordination in skilled pianists and non-musicians. Brain Research, 1161, 65–78.

Beisteiner, R., Erdler, M., Mayer, D., Gartus, A., Edward, V., Kaindl, T., Golaszewski, S., Lindinger, G., & Deecke, L. (1999). A marker for differentiation of capabilities for processing of musical harmonies as detected by magnetoencephalography in musicians. Neuroscience Letters, 277, 37–40.

Belin, P., VanEeckhout, P., Zilbovicius, M., Remy, P., Francois, C., Guillaume, S., Chain, F., Rancurel, G., & Samson, Y. (1996). Recovery from nonfluent aphasia after melodic intonation therapy: A PET study. Neurology, 47, 1504–1511.

Belin, P., Zatorre, R. J., Lafaille, P., Ahad, P., & Pike, B. (2000). Voice-selective areas in human auditory cortex. Nature, 403, 309–312.

Bengtsson, S. L., Csikszentmihalyi, M., & Ullen, F. (2007). Cortical regions involved in the generation of musical structures during improvisation in pianists. Journal of Cognitive Neuroscience, 19, 830–842.

Bengtsson, S. L., & Ullen, F. (2006). Dissociation between melodic and rhythmic processing during piano performance from musical scores. NeuroImage, 30, 272–284.

Berkowitz, A. L., & Ansari, D. (2008). Generation of novel motor sequences: The neural correlates of musical improvisation. NeuroImage, 41, 535–543.

Bermudez, P., Lerch, J. P., Evans, A. C., & Zatorre, R. J. (2009). Neuroanatomical correlates of musicianship as revealed by cortical thickness and voxel-based morphometry. Cerebral Cortex, 19, 1583–1596.

Bermudez, P., & Zatorre, R. J. (2005). Conditional associative memory for musical stimuli in nonmusicians: Implications for absolute pitch. Journal of Neuroscience, 25, 7718–7723.

Besson, M., & Faïta, F. (1995). An event-related potential (ERP) study of musical expectancy: Comparison of musicians with nonmusicians. Journal of Experimental Psychology: Human Perception and Performance, 21, 1278–1296.


Besson, M., & Macar, F. (1987). An event-related potential analysis of incongruity in music and other non-linguistic contexts. Psychophysiology, 24, 14–25.

Bevan, A., Robinson, G., Butterworth, B., & Cipolotti, L. (2003). To play “B” but not to say “B”: Selective loss of letter names. Neurocase, 9, 118–128.

Bigand, E., Vieillard, S., Madurell, F., Marozeau, J., & Dacquet, A. (2005). Multidimensional scaling of emotional responses to music: The effect of musical expertise and of the duration of the excerpts. Cognition & Emotion, 19, 1113–1139.

Blood, A. J., & Zatorre, R. J. (2001). Intensely pleasurable responses to music correlate with activity in brain regions implicated in reward and emotion. Proceedings of the National Academy of Sciences USA, 98, 11818–11823.

Blood, A. J., Zatorre, R. J., Bermudez, P., & Evans, A. C. (1999). Emotional responses to pleasant and unpleasant music correlate with activity in paralimbic brain regions. Nature Neuroscience, 2, 382–387.

Brown, S., & Martinez, M. J. (2007). Activation of premotor vocal areas during musical discrimination. Brain and Cognition, 63, 59–69.

Brown, S., Martinez, M. J., & Parsons, L. M. (2004a). Passive music listening spontaneously engages limbic and paralimbic systems. Neuroreport, 15, 2033–2037.

Brown, S., Martinez, M. J., & Parsons, L. M. (2006). Music and language side by side in the brain: A PET study of the generation of melodies and sentences. European Journal of Neuroscience, 23, 2791–2803.

Brown, S., Martinez, M. J., Hodges, D. A., Fox, P. T., & Parsons, L. M. (2004b). The song system of the human brain. Cognitive Brain Research, 20, 363–375.

Brust, J. C. M. (2003). Music and the neurologist: A historical perspective. In I. Peretz & R. J. Zatorre (Eds.), Cognitive neuroscience of music (pp. 181–191). Oxford, UK: Oxford University Press.

Caclin, A., Brattico, E., Tervaniemi, M., Naatanen, R., Morlet, D., Giard, M. H., & McAdams, S. (2006). Separate neural processing of timbre dimensions in auditory sensory memory. Journal of Cognitive Neuroscience, 18, 1959–1972.

Caclin, A., Giard, M.-H., Smith, B. K., & McAdams, S. (2007). Interactive processing of timbre dimensions: A Garner interference study. Brain Research, 1138, 159–170.

Caclin, A., McAdams, S., Smith, B. K., & Giard, M. H. (2008). Interactive processing of timbre dimensions: An exploration with event-related potentials. Journal of Cognitive Neuroscience, 20, 49–64.


Caclin, A., McAdams, S., Smith, B. K., & Winsberg, S. (2005). Acoustic correlates of timbre space dimensions: A confirmatory study using synthetic tones. Journal of the Acoustical Society of America, 118, 471–482.

Carrion, R. E., & Bly, B. M. (2008). The effects of learning on event-related potential correlates of musical expectancy. Psychophysiology, 45, 759–775.

Chen, J. L., Penhune, V. B., & Zatorre, R. J. (2008a). Listening to musical rhythms recruits motor regions of the brain. Cerebral Cortex, 18, 2844–2854.

Chen, J. L., Penhune, V. B., & Zatorre, R. J. (2008b). Moving on time: Brain network for auditory-motor synchronization is modulated by rhythm complexity and musical training. Journal of Cognitive Neuroscience, 20, 226–239.

Chen, J. L., Zatorre, R. J., & Penhune, V. B. (2006). Interactions between auditory and dorsal premotor cortex during synchronization to musical rhythms. NeuroImage, 32, 1771–1781.

Corbetta, M., & Shulman, G. L. (2002). Control of goal-directed and stimulus-driven attention in the brain. Nature Reviews Neuroscience, 3, 201–215.

Curtis, M. E., & Bharucha, J. J. (2010). The minor third communicates sadness in speech, mirroring its use in music. Emotion, 10, 335–348.

Dennis, M., & Hopyan, T. (2001). Rhythm and melody in children and adolescents after left or right temporal lobectomy. Brain and Cognition, 47, 461–469.

Dowling, W. J. (1978). Scale and contour: Two components of a theory of memory for melodies. Psychological Review, 85, 341–354.

Ethofer, T., De Ville, D. V., Scherer, K., & Vuilleumier, P. (2009). Decoding of emotional information in voice-sensitive cortices. Current Biology, 19, 1028–1033.

Fedorenko, E., Patel, A., Casasanto, D., Winawer, J., & Gibson, E. (2009). Structural integration in language and music: Evidence for a shared system. Memory and Cognition, 37, 1–9.

Foxton, J. M., Dean, J. L., Gee, R., Peretz, I., & Griffiths, T. D. (2004). Characterization of deficits in pitch perception underlying “tone deafness.” Brain, 127, 801–810.

Foxton, J. M., Nandy, R. K., & Griffiths, T. D. (2006). Rhythm deficits in “tone deafness.” Brain and Cognition, 62, 24–29.

Fuster, J. M. (2000). Executive frontal functions. Experimental Brain Research, 133, 66–70.

Fuster, J. M. (2001). The prefrontal cortex—an update: Time is of the essence. Neuron, 30, 319–333.

Gaser, C., & Schlaug, G. (2003). Brain structures differ between musicians and non-musicians. Journal of Neuroscience, 23, 9240–9245.

Gilbert, S. J., Spengler, S., Simons, J. S., Steele, J. D., Lawrie, S. M., Frith, C. D., & Burgess, P. W. (2006). Functional specialization within rostral prefrontal cortex (Area 10): A meta-analysis. Journal of Cognitive Neuroscience, 18, 932–948.

Gosselin, N., Peretz, I., Johnsen, E., & Adolphs, R. (2007). Amygdala damage impairs emotion recognition from music. Neuropsychologia, 45, 236–244.

Gosselin, N., Peretz, I., Noulhiane, M., Hasboun, D., Beckett, C., Baulac, M., & Samson, S. (2005). Impaired recognition of scary music following unilateral temporal lobe excision. Brain, 128, 628–640.

Goydke, K. N., Altenmuller, E., Moller, J., & Munte, T. F. (2004). Changes in emotional tone and instrumental timbre are reflected by the mismatch negativity. Cognitive Brain Research, 21, 351–359.

Grahn, J. A., & Brett, M. (2007). Rhythm and beat perception in motor areas of the brain. Journal of Cognitive Neuroscience, 19, 893–906.

Grahn, J. A., & McAuley, J. D. (2009). Neural bases of individual differences in beat perception. NeuroImage, 47, 1894–1903.

Grahn, J. A., & Rowe, J. B. (2009). Feeling the beat: Premotor and striatal interactions in musicians and nonmusicians during beat perception. Journal of Neuroscience, 29, 7540–7548.

Green, A. C., Baerentsen, K. B., Stodkilde-Jorgensen, H., Wallentin, M., Roepstorff, A., & Vuust, P. (2008). Music in minor activates limbic structures: A relationship with dissonance? Neuroreport, 19, 711–715.

Grey, J. M. (1977). Multidimensional perceptual scaling of musical timbres. Journal of the Acoustical Society of America, 61, 1270–1277.

Griffiths, T. D., Buchel, C., Frackowiak, R. S. J., & Patterson, R. D. (1998). Analysis of temporal structure in sound by the human brain. Nature Neuroscience, 1, 422–427.

Halpern, A. R., & Zatorre, R. J. (1999). When that tune runs through your head: A PET investigation of auditory imagery for familiar melodies. Cerebral Cortex, 9, 697–704.

Halpern, A. R., Zatorre, R. J., Bouffard, M., & Johnson, J. A. (2004). Behavioral and neural correlates of perceived and imagined musical timbre. Neuropsychologia, 42, 1281–1292.

Hickok, G., Buchsbaum, B., Humphries, C., & Muftuler, T. (2003). Auditory-motor interaction revealed by fMRI: Speech, music, and working memory in area Spt. Journal of Cognitive Neuroscience, 15, 673–682.


Hyde, K. L., Lerch, J. P., Zatorre, R. J., Griffiths, T. D., Evans, A. C., & Peretz, I. (2007). Cortical thickness in congenital amusia: When less is better than more. Journal of Neuroscience, 27, 13028–13032.

Hyde, K. L., & Peretz, I. (2004). Brains that are out of tune but in time. Psychological Science, 15, 356–360.

Hyde, K. L., Zatorre, R. J., Griffiths, T. D., Lerch, J. P., & Peretz, I. (2006). Morphometry of the amusic brain: A two-site study. Brain, 129, 2562–2570.

Janata, P. (1995). ERP measures assay the degree of expectancy violation of harmonic contexts in music. Journal of Cognitive Neuroscience, 7, 153–164.

Janata, P. (2001). Brain electrical activity evoked by mental formation of auditory expectations and images. Brain Topography, 13, 169–193.

Janata, P. (2005). Brain networks that track musical structure. Annals of the New York Academy of Sciences, 1060, 111–124.

Janata, P. (2007). Navigating tonal space. In W. B. Hewlett, E. Selfridge-Field, & E. Correia (Eds.), Tonal theory for the digital age (pp. 39–50). Stanford, CA: Center for Computer Assisted Research in the Humanities.

Janata, P. (2009). The neural architecture of music-evoked autobiographical memories. Cerebral Cortex, 19, 2579–2594.

Janata, P., Birk, J. L., Tillmann, B., & Bharucha, J. J. (2003). Online detection of tonal popout in modulating contexts. Music Perception, 20, 283–305.

Janata, P., Birk, J. L., Van Horn, J. D., Leman, M., Tillmann, B., & Bharucha, J. J. (2002b). The cortical topography of tonal structures underlying Western music. Science, 298, 2167–2170.

Janata, P., & Grafton, S. T. (2003). Swinging in the brain: Shared neural substrates for behaviors related to sequencing and music. Nature Neuroscience, 6, 682–687.

Janata, P., Tillmann, B., & Bharucha, J. J. (2002a). Listening to polyphonic music recruits domain-general attention and working memory circuits. Cognitive, Affective, & Behavioral Neuroscience, 2, 121–140.

Jentschke, S., & Koelsch, S. (2009). Musical training modulates the development of syntax processing in children. NeuroImage, 47, 735–744.

Johnsrude, I. S., Penhune, V. B., & Zatorre, R. J. (2000). Functional specificity in the right human auditory cortex for perceiving pitch direction. Brain, 123, 155–163.

Jones, M. R. (1976). Time, our lost dimension—toward a new theory of perception, attention, and memory. Psychological Review, 83, 323–355.

Jones, M. R., & Boltz, M. (1989). Dynamic attending and responses to time. Psychological Review, 96, 459–491.

Jones, M. R., Moynihan, H., MacKenzie, N., & Puente, J. (2002). Temporal aspects of stimulus-driven attending in dynamic arrays. Psychological Science, 13, 313–319.

Juslin, P. N., & Vastfjall, D. (2008). Emotional responses to music: The need to consider underlying mechanisms. Behavioral and Brain Sciences, 31, 559–621.

Khalfa, S., Schon, D., Anton, J. L., & Liegeois-Chauvel, C. (2005). Brain regions involved in the recognition of happiness and sadness in music. Neuroreport, 16, 1981–1984.

Kleber, B., Birbaumer, N., Veit, R., Trevorrow, T., & Lotze, M. (2007). Overt and imagined singing of an Italian aria. NeuroImage, 36, 889–900.

Klostermann, E. C., Loui, P., & Shimamura, A. P. (2009). Activation of right parietal cortex during memory retrieval of nonlinguistic auditory stimuli. Cognitive, Affective, & Behavioral Neuroscience, 9, 242–248.

Koelsch, S. (2009). Music-syntactic processing and auditory memory: Similarities and differences between ERAN and MMN. Psychophysiology, 46, 179–190.

Koelsch, S., & Mulder, J. (2002). Electric brain responses to inappropriate harmonies during listening to expressive music. Clinical Neurophysiology, 113, 862–869.

Koelsch, S., & Siebel, W. A. (2005). Towards a neural basis of music perception. Trends in Cognitive Sciences, 9, 578–584.

Koelsch, S., Schmidt, B. H., & Kansok, J. (2002a). Effects of musical expertise on the early right anterior negativity: An event-related brain potential study. Psychophysiology, 39, 657–663.

Koelsch, S., Schroger, E., & Gunter, T. C. (2002b). Music matters: Preattentive musicality of the human brain. Psychophysiology, 39, 38–48.

Koelsch, S., Fritz, T., & Schlaug, G. (2008). Amygdala activity can be modulated by unexpected chord functions during music listening. Neuroreport, 19, 1815–1819.

Koelsch, S., Gunter, T., Friederici, A. D., & Schroger, E. (2000). Brain indices of music processing: “Nonmusicians” are musical. Journal of Cognitive Neuroscience, 12, 520–541.

Koelsch, S., Gunter, T., Schroger, E., & Friederici, A. D. (2003a). Processing tonal modulations: An ERP study. Journal of Cognitive Neuroscience, 15, 1149–1159.

Koelsch, S., Gunter, T. C., Wittfoth, M., & Sammler, D. (2005a). Interaction between syntax processing in language and in music: An ERP study. Journal of Cognitive Neuroscience, 17, 1565–1577.


Koelsch, S., Jentschke, S., Sammler, D., & Mietchen, D. (2007). Untangling syntactic and sensory processing: An ERP study of music perception. Psychophysiology, 44, 476–490.

Koelsch, S., Fritz, T., Schulze, K., Alsop, D., & Schlaug, G. (2005b). Adults and children processing music: An fMRI study. NeuroImage, 25, 1068–1076.

Koelsch, S., Fritz, T., v Cramon, D. Y., Muller, K., & Friederici, A. D. (2006). Investigating emotion with music: An fMRI study. Human Brain Mapping, 27, 239–250.

Koelsch, S., Gunter, T. C., Schroger, E., Tervaniemi, M., Sammler, D., & Friederici, A. D. (2001). Differentiating ERAN and MMN: An ERP study. Neuroreport, 12, 1385–1389.

Koelsch, S., Gunter, T. C., v Cramon, D. Y., Zysset, S., Lohmann, G., & Friederici, A. D. (2002c). Bach speaks: A cortical “language-network” serves the processing of music. NeuroImage, 17, 956–966.

Koelsch, S., Grossmann, T., Gunter, T. C., Hahne, A., Schroger, E., & Friederici, A. D. (2003b). Children processing music: Electric brain responses reveal musical competence and gender differences. Journal of Cognitive Neuroscience, 15, 683–693.

Koelsch, S., Kasper, E., Gunter, T. C., Sammler, D., Schulze, K., & Friederici, A. D. (2004). Music, language, and meaning: Brain signatures of semantic processing. Nature Neuroscience, 7, 302–307.

Koelsch, S., Schulze, K., Sammler, D., Fritz, T., Muller, K., & Gruber, O. (2009). Functional architecture of verbal and tonal working memory: An fMRI study. Human Brain Mapping, 30, 859–873.

Kraemer, D. J. M., Macrae, C. N., Green, A. E., & Kelley, W. M. (2005). Musical imagery: Sound of silence activates auditory cortex. Nature, 434, 158–158.

Krumhansl, C. L. (1990). Cognitive foundations of musical pitch. New York: Oxford University Press.

Krumhansl, C. L., & Kessler, E. J. (1982). Tracing the dynamic changes in perceived tonal organization in a spatial representation of musical keys. Psychological Review, 89, 334–368.

Lahav, A., Saltzman, E., & Schlaug, G. (2007). Action representation of sound: Audiomotor recognition network while listening to newly acquired actions. Journal of Neuroscience, 27, 308–314.

Lakatos, S. (2000). A common perceptual space for harmonic and percussive timbres. Perception and Psychophysics, 62, 1426–1439.

Langheim, F. J. P., Callicott, J. H., Mattay, V. S., Duyn, J. H., & Weinberger, D. R. (2002). Cortical systems associated with covert music rehearsal. NeuroImage, 16, 901–908.


Lappe, C., Herholz, S. C., Trainor, L. J., & Pantev, C. (2008). Cortical plasticity induced by short-term unimodal and multimodal musical training. Journal of Neuroscience, 28, 9632–9639.

Large, E. W., & Jones, M. R. (1999). The dynamics of attending: How people track time-varying events. Psychological Review, 106, 119–159.

Large, E. W., & Palmer, C. (2002). Perceiving temporal regularity in music. Cognitive Science, 26, 1–37.

Leaver, A. M., Van Lare, J., Zielinski, B., Halpern, A. R., & Rauschecker, J. P. (2009). Brain activation during anticipation of sound sequences. Journal of Neuroscience, 29, 2477–2485.

Leino, S., Brattico, E., Tervaniemi, M., & Vuust, P. (2007). Representation of harmony rules in the human brain: Further evidence from event-related potentials. Brain Research, 1142, 169–177.

Lerdahl, F., & Krumhansl, C. L. (2007). Modeling tonal tension. Music Perception, 24, 329–366.

Levitin, D. J., & Menon, V. (2003). Musical structure is processed in “language” areas of the brain: A possible role for Brodmann Area 47 in temporal coherence. NeuroImage, 20, 2142–2152.

Liegeois-Chauvel, C., Peretz, I., Babai, M., Laguitton, V., & Chauvel, P. (1998). Contribution of different cortical areas in the temporal lobes to music processing. Brain, 121, 1853–1867.

Limb, C. J., & Braun, A. R. (2008). Neural substrates of spontaneous musical performance: An fMRI study of jazz improvisation. PLoS One, 3, e1679.

Lindenberger, U., Li, S. C., Gruber, W., & Muller, V. (2009). Brains swinging in concert: Cortical phase synchronization while playing guitar. BMC Neuroscience, 10 (22), 1–12.

London, J. (2004). Hearing in time: Psychological aspects of musical meter. New York: Oxford University Press.

Loui, P., Alsop, D., & Schlaug, G. (2009a). Tone deafness: A new disconnection syndrome? Journal of Neuroscience, 29, 10215–10220.

Loui, P., Grent-’t-Jong, T., Torpey, D., & Woldorff, M. (2005). Effects of attention on the neural processing of harmonic syntax in Western music. Cognitive Brain Research, 25, 678–687.

Loui, P., Wu, E. H., Wessel, D. L., & Knight, R. T. (2009b). A generalized mechanism for perception of pitch patterns. Journal of Neuroscience, 29, 454–459.

Page 35 of 42

Cognitive Neuroscience of Music Maess, B., Koelsch, S., Gunter, T. C., & Friederici, A. D. (2001). Musical syntax is processed in Broca’s area: An MEG study. Nature Neuroscience, 4, 540–545. Margulis, E. H., Mlsna, L. M., Uppunda, A. K., Parrish, T. B., & Wong, P. C. M. (2009). Selective neurophysiologic responses to music in instrumentalists with different listening biographies. Human Brain Mapping, 30, 267–275. McAdams, S., Winsberg, S., Donnadieu, S., Desoete, G., & Krimphoff, J. (1995). Perceptual scaling of synthesized musical timbres: Common dimensions, specificities, and latent sub­ ject classes. Psychological Research, 58, 177–192. McDonald, I. (2006). Musical alexia with recovery: A personal account. Brain, 129, 2554– 2561. Meister, I. G., Krings, T., Foltys, H., Boroojerdi, B., Muller, M., Topper, R., & Thron, A. (2004). Playing piano in the mind: An fMRI study on music imagery and performance in pianists. Cognitive Brain Research, 19, 219–228. Menon, V., & Levitin, D. J. (2005). The rewards of music listening: Response and physio­ logical connectivity of the mesolimbic system. NeuroImage, 28, 175–184. Menon, V., Levitin, D. J., Smith, B. K., Lembke, A., Krasnow, B. D., Glazer, D., Glover, G. H., & McAdams, S. (2002). Neural correlates of timbre change in harmonic sounds. NeuroI­ mage, 17, 1742–1754. Meyer, M., Baumann, S., & Jancke, L. (2006). Electrical brain imaging reveals spatio-tem­ poral dynamics of timbre perception in humans. NeuroImage, 32, 1510–1523. Miranda, R. A., & Ullman, M. T. (2007). Double dissociation between rules and memory in music: An event-related potential study. NeuroImage, 38, 331–345. Mitterschiffthaler, M. T., Fu, C. H. Y., Dalton, J. A., Andrew, C. M., & Williams, S. C. R. (2007). A functional MRI study of happy and sad affective states induced by classical mu­ sic. Human Brain Mapping, 28, 1150–1162. Mizuno, T., & Sugishita, M. (2007). Neural correlates underlying perception of tonality-re­ lated emotional contents. 
Neuroreport, 18, 1651–1655. Munte, T. F., Altenmuller, E., & Jancke, L. (2002). The musician’s brain as a model of neu­ roplasticity. Nature Reviews, Neuroscience, 3, 473–478. Näätänen, R. (1992). Attention and brain function. Hillsdale, NJ: Erlbaum. Näätänen, R., & Winkler, I. (1999). The concept of auditory stimulus representation in cognitive neuroscience. Psychological Bulletin, 125, 826–859. Nan, Y., Knosche, T. R., Zysset, S., & Friedericil, A. D. (2008) Cross-cultural music phrase processing: An fMRI study. Human Brain Mapping, 29, 312–328. Page 36 of 42

Cognitive Neuroscience of Music Northoff, G., & Bermpohl, F. (2004). Cortical midline structures and the self. Trends in Cognitive Sciences, 8, 102–107. Northoff, G., Heinzel, A., Greck, M., Bennpohl, F., Dobrowolny, H., & Panksepp, J. (2006). Self-referential processing in our brain: A meta-analysis of imaging studies on the self. NeuroImage, 31, 440–457. Otsuka, A., Tamaki, Y., & Kuriki, S. (2008). Neuromagnetic responses in silence after mu­ sical chord sequences. Neuroreport, 19, 1637–1641. Paller, K. A., McCarthy, G., & Wood, C. C. (1992). Event-related potentials elicited by de­ viant endings to melodies. Psychophysiology, 29, 202–206. Palmer, C., & Krumhansl, C. L. (1990). Mental representations for musical meter. Journal of Experimental Psychology. Human Perception and Performance, 16, 728–741. Pantev, C., Oostenveld, R., Engelien, A., Ross, B., Roberts, L. E., & Hoke, M. (1998). In­ creased auditory cortical representation in musicians. Nature, 392, 811–814. Parsons, L. M., Sergent, J., Hodges, D. A., & Fox, P. T. (2005). The brain basis of piano per­ formance. Neuropsychologia, 43, 199–215. Pascual-Leone, A., Dang, N., Cohen, L. G., Brasilneto, J. P., Cammarota, A., & Hallett, M. (1995). Modulation of muscle responses evoked by transcranial magnetic stimulation dur­ ing the acquisition of new fine motor-skills. Journal of Neurophysiology, 74, 1037–1045. Pascual-Leone, A., Grafman, J., Hallett, M. (1994), Modulation of cortical motor output maps during development of implicit and explicit knowledge. Science, 263, 1287–1289. Patel, A. D. (2003). Language, music, syntax and the brain. Nature Neuroscience, 6, 674– 681. Patel, A. D., & Balaban, E. (2000). Temporal patterns of human cortical activity reflect tone sequence structure. Nature, 404, 80–84. Patel, A. D., & Balaban, E. (2004). Human auditory cortical dynamics during perception of long acoustic sequences: Phase tracking of carrier frequency by the auditory steady-state response. 
Cerebral Cortex, 14, 35–46. Patel, A. D., Foxton, J. M., & Griffiths, T. D. (2005). Musically tone-deaf individuals have difficulty discriminating intonation contours extracted from speech. Brain and Cognition, 59, 310–313. Patel, A. D., Gibson, E., Ratner, J., Besson, M., & Holcomb, P. J. (1998). Processing syntac­ tic relations in language and music: An event-related potential study. Journal of Cognitive Neuroscience, 10, 717–733. Patterson, R. D., Uppenkamp, S., Johnsrude, I. S., & Griffiths, T. D. (2002). The processing of temporal pitch and melody information in auditory cortex. Neuron, 36, 767–776. Page 37 of 42

Cognitive Neuroscience of Music Penhune, V. B., Zatorre, R. J., & Feindel, W. H. (1999). The role of auditory cortex in reten­ tion of rhythmic patterns as studied in patients with temporal lobe removals including Heschl’s gyrus. Neuropsychologia, 37, 315–331. Peretz, I. (1996). Can we lose memory for music? A case of music agnosia in a nonmusi­ cian. Journal of Cognitive Neuroscience, 8, 481–496. Peretz, I. (2006). The nature of music from a biological perspective. Cognition, 100, 1–32. (p. 133)

Peretz, I., & Coltheart, M. (2003). Modularity of music processing. Nature Neuroscience, 6, 688–691. Perry, D. W., Zatorre, R. J., Petrides, M., Alivisatos, B., Meyer, E., & Evans, A. C. (1999). Localization of cerebral activity during simple singing. Neuroreport, 10, 3979–3984. Plailly, J., Tillmann, B., & Royet, J.-P. (2007). The feeling of familiarity of music and odors: The same neural signature? Cerebral Cortex, 17, 2650–2658. Platel, H., Baron, J. C., Desgranges, B., Bernard, F., & Eustache, F. (2003). Semantic and episodic memory of music are subserved by distinct neural networks. NeuroImage, 20, 244–256. Popescu, M., Otsuka, A., & Ioannides, A. A. (2004). Dynamics of brain activity in motor and frontal cortical areas during music listening: A magnetoencephalographic study. Neu­ roImage, 21, 1622–1638. Pressing, J. (2002). Black Atlantic rhythm: Its computational and transcultural founda­ tions. Music Perception, 19, 285–310. Ross, E. D., & Monnot, M. (2008). Neurology of affective prosody and its functionalanatomic organization in right hemisphere. Brain and Language, 104, 51–74. Sammler, D., Grigutsch, M., Fritz, T., & Koelsch, S. (2007). Music and emotion: Electro­ physiological correlates of the processing of pleasant and unpleasant music. Psychophysi­ ology, 44, 293–304. Samson, S., & Zatorre, R. J. (1988). Melodic and harmonic discrimination following unilat­ eral cerebral excision. Brain and Cognition, 7, 348–360. Samson, S., & Zatorre, R. J. (1991). Recognition memory for text and melody of songs af­ ter unilateral temporal lobe lesion: Evidence for dual encoding. Journal of Experimental Psychology. Learning, Memory, and Cognition, 17, 793–804. Samson, S., & Zatorre, R. J. (1994). Contribution of the right temporal-lobe to musical timbre discrimination. Neuropsychologia, 32, 231–240.

Page 38 of 42

Cognitive Neuroscience of Music Samson, S., Zatorre, R. J., & Ramsay, J. O. (2002). Deficits of musical timbre perception af­ ter unilateral temporal-lobe lesion revealed with multidimensional scaling. Brain, 125, 511–523. Sarkamo, T., Tervaniemi, M., Laitinen, S., Forsblom, A., Soinila, S., Mikkonen, M., Autti, T., Silvennoinen, H. M., Erkkilae, J., Laine, M., Peretz, I., & Hietanen, M. (2008). Music listening enhances cognitive recovery and mood after middle cerebral artery stroke. Brain, 131, 866–876. Satoh, M., Takeda, K., Nagata, K., Hatazawa, J., & Kuzuhara, S. (2001). Activated brain regions in musicians during an ensemble: A PET study. Cognitive Brain Research, 12, 101–108. Satoh, M., Takeda, K., Nagata, K., Shimosegawa, E., & Kuzuhara, S. (2006). Positronemission tomography of brain regions activated by recognition of familiar music. Ameri­ can Journal of Neuroradiology, 27, 1101–1106. Schellenberg, E. G., Iverson, P., & McKinnon, M. C. (1999). Name that tune: Identifying popular recordings from brief excerpts. Psychonomic Bulletin and Review, 6, 641–646. Schellenberg, E. G., & Trehub, S. E. (2003). Good pitch memory is widespread. Psycholog­ ical Science, 14, 262–266. Schirmer, A., & Kotz, S. A. (2006). Beyond the right hemisphere: brain mechanisms medi­ ating vocal emotional processing. Trends in Cognitive Sciences, 10, 24–30. Schlaug, G., Jancke, L., Huang, Y. X., & Steinmetz, H. (1995). In-vivo evidence of structur­ al brain asymmetry in musicians. Science, 267, 699–701. Schmithorst, V. J., & Holland, S. K. (2003). The effect of musical training on music pro­ cessing: A functional magnetic resonance imaging study in humans. Neuroscience Letters, 348, 65–68. Schneider, P., Scherg, M., Dosch, H. G., Specht, H. J., Gutschalk, A., & Rupp, A. (2002). Morphology of Heschl’s gyrus reflects enhanced activation in the auditory cortex of musi­ cians. Nature Neuroscience, 5, 688–694. Schneider, P., Sluming, V., Roberts, N., Scherg, M., Goebel, R., Specht, H. 
J., Dosch, H. G., Bleeck, S., Stippich, C., & Rupp, A. (2005). Structural and functional asymmetry of lateral Heschl’s gyrus reflects pitch perception preference. Nature Neuroscience, 8, 1241–1247. Schön, D., Anton, J. L., Roth, M., & Besson, M. (2002). An fMRI study of music sight-read­ ing. Neuroreport, 13, 2285–2289. Schön, D., Semenza, C., & Denes, G. (2001). Naming of musical notes: A selective deficit in one musical clef. Cortex, 37, 407–421.

Page 39 of 42

Cognitive Neuroscience of Music Schubotz, R. I. (2007). Prediction of external events with our motor system: Towards a new framework. Trends in Cognitive Sciences, 11, 211–218. Sergent, J., Zuck, E., Terriah, S., & Macdonald, B. (1992). Distributed neural network un­ derlying musical sight-reading and keyboard performance. Science, 257, 106–109. Sluming, V., Barrick, T., Howard, M., Cezayirli, E., Mayes, A., & Roberts, N. (2002). Voxelbased morphometry reveals increased gray matter density in Broca’s area in male sym­ phony orchestra musicians. NeuroImage, 17, 1613–1622. Sridharan, D., Levitin, D. J., Chafe, C. H., Berger, J., & Menon, V. (2007). Neural dynamics of event segmentation in music: Converging evidence for dissociable ventral and dorsal networks. Neuron, 55, 521–532. Steinbeis, N., & Koelsch, S. (2008). Shared neural resources between music and language indicate semantic processing of musical tension-resolution patterns. Cerebral Cortex, 18, 1169–1178. Steinbeis, N., Koelsch, S., & Sloboda, J. A. (2006). The role of harmonic expectancy viola­ tions in musical emotions: Evidence from subjective, physiological, and neural responses. Journal of Cognitive Neuroscience, 18, 1380–1393. Stewart, L., Henson, R., Kampe, K., Walsh, V., Turner, R., & Frith, U. (2003). Brain changes after learning to read and play music. Neuroimage, 20 (1), 71–83. doi: http:// Stewart, L., von Kriegstein, K., Warren, J. D., & Griffiths, T. D. (2006). Music and the brain: Disorders of musical listening. Brain, 129, 2533–2553. Temperley, D. (2001). The cognition of basic musical structures. Cambridge, MA: MIT Press. Temperley, D. (2007). Music and probability. Cambridge, MA: MIT Press. Thaut, M. (2005). Rhythm, music, and the brain: Scientific foundations and clinical appli­ cations. New York: Routledge. Tillmann, B., Bharucha, J. J., & Bigand, E. (2000). Implicit learning of tonality: A self-orga­ nizing approach. Psychological Review, 107, 885–913. 
Tillmann, B., Janata, P., & Bharucha, J. J. (2003). Activation of the inferior frontal cortex in musical priming. Cognitive Brain Research, 16, 145–161. (p. 134)

Toiviainen, P. (2007). Visualization of tonal content in the symbolic and audio domains. Computing in Musicology, 15, 187–199. Toiviainen, P., & Krumhansl, C. L. (2003). Measuring and modeling real-time responses to music: The dynamics of tonality induction. Perception, 32, 741–766. Page 40 of 42

Cognitive Neuroscience of Music Toiviainen, P., Tervaniemi, M., Louhivuori, J., Saher, M., Huotilainen, M., & Naatanen, R. (1998). Timbre similarity: Convergence of neural, behavioral, and computational ap­ proaches. Music Perception, 16, 223–241. Tueting, P., Sutton, S., & Zubin, J. (1970). Quantitative evoked potential correlates of the probability of events. Psychophysiology, 7, 385–394. Warren, J. D., Uppenkamp, S., Patterson, R. D., & Griffiths, T. D. (2003). Separating pitch chroma and pitch height in the human brain. Proceedings of the National Academy of Sciences U S A, 100, 10038–10042. Warrier, C. M., & Zatorre, R. J. (2002). Influence of tonal context and timbral variation on perception of pitch. Perception and Psychophysics, 64, 198–207. Warrier, C. M., & Zatorre, R. J. (2004). Right temporal cortex is critical for utilization of melodic contextual cues in a pitch constancy task. Brain, 127, 1616–1625. Watanabe, T., Yagishita, S., & Kikyo, H. (2008). Memory of music: Roles of right hip­ pocampus and left inferior frontal gyrus. NeuroImage, 39, 483–491. Wiltermuth, S. S., & Heath, C. (2009). Synchrony and cooperation. Psychological Science, 20, 1–5. Zarate, J. M., & Zatorre, R. J. (2008). Experience-dependent neural substrates involved in vocal pitch regulation during singing. NeuroImage, 40, 1871–1887. Zatorre, R. J. (1985). Discrimination and recognition of tonal melodies after unilateral cerebral excisions. Neuropsychologia, 23, 31–41. Zatorre, R. J., Belin, P., & Penhune, V. B. (2002). Structure and function of auditory cortex: Music and speech. Trends in Cognitive Sciences, 6, 37–46. Zatorre, R. J., Evans, A. C., & Meyer, E. (1994). Neural mechanisms underlying melodic perception and memory for pitch. Journal of Neuroscience, 14, 1908–1919. Zatorre, R. J., Halpern, A. R., Perry, D. W., Meyer, E., & Evans, A. C. (1996). Hearing in the mind’s ear: A PET investigation of musical imagery and perception. Journal of Cognitive Neuroscience, 8, 29–46. 
Zatorre, R. J., Perry, D. W., Beckett, C. A., Westbury, C. F., & Evans, A. C. (1998). Function­ al anatomy of musical processing in listeners with absolute pitch and relative pitch. Pro­ ceedings of the National Academy of Sciences U S A, 95, 3172–3177.

Petr Janata

Petr Janata is Professor at the University of California, Davis, in the Psychology Department and Center for Mind and Brain.



Audition
Josh H. McDermott
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn
Print Publication Date: Dec 2013
Subject: Psychology, Cognitive Neuroscience
Online Publication Date: Dec 2013
DOI: 10.1093/oxfordhb/9780199988693.013.0008

Abstract and Keywords

Audition is the process by which organisms use sound to derive information about the world. This chapter aims to provide a bird’s-eye view of contemporary audition research, spanning systems and cognitive neuroscience as well as cognitive science. The author provides brief overviews of classic areas of research as well as some central themes and advances from the past ten years. The chapter covers the sound transduction of the cochlea, subcortical and cortical anatomical and functional organization of the auditory system, amplitude modulation and its measurement, adaptive coding and plasticity, the perception of sound sources (with a focus on the classic research areas of location, loudness, and pitch), and auditory scene analysis (including sound segregation, streaming, filling in, and reverberation perception). The chapter concludes with a discussion of where hearing research seems to be headed at present.

Keywords: sound transduction, auditory system anatomy, modulation, adaptation, plasticity, pitch perception, auditory scene analysis, sound segregation, streaming, reverberation

Introduction

From the cry of a baby to the rumble of a thunderclap, many events in the world produce sound. Sound is created when matter in the world vibrates, and takes the form of pressure waves that propagate through the air, containing clues about the environment around us. Audition is the process by which organisms utilize these clues to derive information about the world.

Audition is a crucial sense for most organisms. Humans, in particular, use sound to infer a vast number of important things: what someone said, their emotional state when they said it, and the whereabouts and nature of objects we cannot see, to name but a few. When hearing is impaired (via congenital conditions, noise exposure, or aging), the consequences can be devastating, such that a large industry is devoted to the design of prosthetic hearing devices.


As listeners we are largely unaware of the computations underlying our auditory system’s success, but they represent an impressive feat of engineering. The computational challenges of everyday audition are reflected in the gap between biological and machine hearing systems: machine systems for interpreting sound currently fall far short of human abilities. Understanding the basis of our success in perceiving sound will hopefully help us to replicate it in machine systems and to restore it in biological auditory systems when their function becomes impaired.

The goal of this chapter is to provide a bird’s-eye view of contemporary hearing research. I provide brief overviews of classic areas of research as well as some central themes and advances from the past ten years. The first section describes the sensory transduction of the cochlea. The second section outlines subcortical and cortical functional organization. The third section discusses modulation and its measurement by subcortical and cortical regions of the auditory system, a key research focus of the past few decades. The fourth section describes adaptive coding and plasticity, encompassing the relationship between sensory coding and the environment as well as its adaptation to task demands. The fifth section discusses the perception of sound sources, focusing on location, loudness, and pitch. The sixth section presents an overview of auditory scene analysis. I conclude with a discussion of where hearing research is headed at present. Because other chapters in this handbook are devoted to auditory attention, music, and speech, I will largely avoid these topics.

The Problem

Just by listening, we can routinely apprehend many aspects of the world around us: the size of a room in which we are talking, whether it is windy or raining outside, the speed of someone approaching from behind, or whether the surface someone is walking on is gravel or marble. These abilities are nontrivial because the properties of the world that are of interest to a listener are generally not explicit in the acoustic input: they cannot be easily recognized or discriminated using the sound waveform itself. The brain must process the sound entering the ear to generate representations in which the properties of interest are more evident. One of the main objectives of hearing science is to understand the nature of these transformations and their instantiation in the brain.

Like other senses, audition is further complicated by a second challenge, that of scene analysis. Although listeners are generally interested in the properties of individual objects or events, the ears are rarely presented with the sounds from isolated sources. Instead, the sound signal that reaches the ear is typically a mixture of sounds from different sources. Such situations occur frequently in natural auditory environments, for example, in social settings, where a single speaker of interest may be talking among many others, and in music. From the mixture it receives as input, the brain must derive representations of the individual sound sources of interest, as are needed to understand someone’s speech, recognize a melody, or otherwise guide behavior. Known as the “cocktail party problem” (Cherry, 1953), or “auditory scene analysis” (Bregman, 1990), this problem has analogues in other sensory modalities, but the auditory version presents some uniquely challenging features.

Sound Measurement: The Peripheral Auditory System

The transformation of the raw acoustic input into representations that are useful for behavior is apparently instantiated over many brain areas and stages of neural processing, spanning the cochlea, midbrain, thalamus, and cortex (Figure 8.1). The early stages of this cascade are particularly intricate in the auditory system relative to other sensory systems, with many processing stations occurring before the cortex. The sensory organ of the cochlea is itself a complex multicomponent system, whose investigation remains a considerable challenge: the mechanical nature of the cochlea renders it much more difficult to probe (e.g., with electrodes) than the retina or olfactory epithelium, for instance. Peripheral coding of sound is also unusual relative to that of other senses in its degree of clinical relevance. Unlike vision, for which the most common forms of dysfunction are optical in nature, and can be fixed with glasses, hearing impairment typically involves altered peripheral neural processing, and its treatment has benefited from a detailed understanding of the processes that are altered. Much of hearing research has accordingly been devoted to understanding the nature of the measurements made by the auditory periphery, and they provide a natural starting point for any discussion of how we hear.

Frequency Selectivity and the Cochlea

Hearing begins with the ear, where the sound pressure waveform carried by the air is transduced into action potentials that are sent to the brain via the auditory nerve. Action potentials are a binary code, but what is conveyed to the brain is far from simply a binarized version of the incoming waveform. The transduction process is marked by several distinctive signal transformations, the most obvious of which is produced by frequency tuning.



Figure 8.1 The auditory system. Sound is transduced by the cochlea, processed by an interconnected set of subcortical areas, and then fed into the core regions of auditory cortex.

The coarse details of sound transduction are well understood (Figure 8.2). Sound induces vibrations of the eardrum, which are transmitted via the bones of the middle ear to the cochlea, the sensory organ of the auditory system. The cochlea is a coiled, fluid-filled tube, containing several membranes that extend along its length and vibrate in response to sound. Transduction of this mechanical vibration into an electrical signal occurs in the organ of Corti, a mass of cells attached to the basilar membrane. The organ of Corti in particular contains what are known as hair cells, named for the stereocilia that protrude from them. The inner hair cells are responsible for sound transduction. When the section of membrane on which they lie vibrates, the resulting deformation of the hair cell body opens mechanically gated ion channels, inducing a voltage change within the cell. Neurotransmitter release is triggered by the change in membrane potential, generating action potentials in the auditory nerve fiber that the hair cell synapses with. This electrical signal is carried by the auditory nerve fiber to the brain.

The frequency tuning of the transduction process occurs because different parts of the basilar membrane vibrate in response to different frequencies. This is partly due to mechanical resonances: the thickness and stiffness of the membrane vary along its length, producing a different resonant frequency at each point. However, the mechanical resonances are actively enhanced via a feedback process, believed to be mediated largely by a second set of cells, called the outer hair cells. The outer hair cells abut the inner hair cells on the organ of Corti and serve to alter the basilar membrane vibration rather than transduce it. They expand and contract in response to sound through mechanisms that are only partially understood (Ashmore, 2008; Dallos, 2008; Hudspeth, 2008). Their motion alters the passive mechanics of the basilar membrane, amplifying the response to low-intensity sounds and tightening the frequency tuning of the resonance. The upshot is that high frequencies produce vibrations at the basal end of the cochlea (close to the eardrum), whereas low frequencies produce vibrations at the apical end (far from the eardrum), with frequencies in between stimulating intermediate regions. The auditory nerve fibers that synapse onto individual inner hair cells are thus frequency tuned: they fire action potentials in response to a local range of frequencies, collectively providing the rest of the auditory system with a frequency decomposition of the incoming waveform. As a result of this behavior, the cochlea is often described functionally as a set of bandpass filters, each of which passes frequencies within a particular range and eliminates those outside of it.
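The functional description of the cochlea as a set of bandpass filters can be illustrated with a small sketch. The code below is not a model of cochlear mechanics: it assumes a generic second-order digital band-pass filter (the standard "biquad" form with unit gain at its center frequency), and the channel centers, sampling rate, and quality factor `q` are arbitrary values chosen for illustration. It shows only that a pure tone produces a large output in the channel tuned near its frequency and small outputs elsewhere.

```python
import math

def bandpass(signal, center_hz, fs, q=4.0):
    """Generic second-order band-pass filter with unit gain at center_hz."""
    w0 = 2.0 * math.pi * center_hz / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    # Coefficients normalized by a0; the x[n-1] feedforward term is zero.
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def channel_levels(tone_hz, centers_hz, fs=16000, duration_s=0.1):
    """Output level of each filter channel in response to a pure tone."""
    n = int(fs * duration_s)
    tone = [math.sin(2.0 * math.pi * tone_hz * i / fs) for i in range(n)]
    # Use only the second half of each output so onset transients are ignored.
    return [rms(bandpass(tone, c, fs)[n // 2:]) for c in centers_hz]

CENTERS_HZ = [250.0, 1000.0, 4000.0]  # hypothetical channel centers
levels = channel_levels(1000.0, CENTERS_HZ)
for center, level in zip(CENTERS_HZ, levels):
    print(f"channel at {center:6.0f} Hz: output level {level:.3f}")
```

Reading the three levels side by side gives a crude "frequency decomposition" of the input: the pattern of activity across channels indicates which frequencies are present, which is the sense in which the auditory nerve is said to carry a frequency decomposition of the waveform.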

Figure 8.2 Structure of the peripheral auditory system. Top right, Diagram of ear. The eardrum transmits sound to the cochlea via the middle ear bones (ossicles). Top middle, Inner ear. The semicircular canals abut the cochlea. Sound enters the cochlea via the oval window and causes vibrations along the basilar membrane, which runs through the middle of the cochlea. Top left, Cross section of cochlea. The organ of Corti, containing the hair cells that transduce sound into electrical potentials, sits on top of the basilar membrane. Bottom, Schematic of section of organ of Corti. The shearing that occurs between the basilar and tectorial membranes when they vibrate (in response to sound) causes the hair cell stereocilia to deform. The deformation causes a change in the membrane potential of the inner hair cells, transmitted to the brain via afferent auditory nerve fibers. The outer hair cells, which are three times more numerous than the inner hair cells, serve as a feedback system to alter the basilar membrane motion, tightening its tuning and amplifying the response to low amplitude sounds.



Figure 8.3 Frequency selectivity. A, Threshold tuning curves of auditory nerve fibers from a cat ear, plotting the level that was necessary to evoke a criterion increase in firing rate for a given frequency (Miller, Schilling, et al., 1997). B, The tonotopy of the cochlea. The position along the basilar membrane at which auditory nerve fibers synapse with a hair cell (determined by dye injections) is plotted vs. their best frequency (Liberman, 1982). Both parts of this figure are courtesy of Eric Young, 2010, who replotted data from the original sources.

The frequency decomposition of the cochlea is conceptually similar to the Fourier transform, but differs in the way that the frequency spectrum is decomposed. Whereas the Fourier transform uses linearly spaced frequency bins, each separated by the same number of hertz, the tuning bandwidth of auditory nerve fibers increases with their preferred frequency. This characteristic can be observed in Figure 8.3A, in which the frequency response of a set of auditory nerve fibers is plotted on a logarithmic frequency scale. Although the lowest frequency fibers are broader on a log scale than the high-frequency fibers, in absolute terms their bandwidths are much lower (several hundred hertz instead of several thousand). The distribution of best frequency along the cochlea follows a roughly logarithmic function, apparent in Figure 8.3B, which plots the best frequency of a large set of nerve fibers against the distance along the cochlea of the hair cell that they synapse with. These features of frequency selectivity are present in most biological auditory systems. It is partly for this reason that a log scale is commonly used for frequency.

Cochlear frequency selectivity has a host of perceptual consequences. For instance, our ability to detect a particular frequency is limited largely by the signal-to-noise ratio of the cochlear filter centered on the frequency. There are many treatments of frequency selectivity and perception (Moore, 2003); it is perhaps the most studied aspect of hearing.
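The contrast between linearly spaced Fourier bins and frequency-dependent auditory bandwidths can be made concrete with a short numerical sketch. It relies on an outside assumption: the equivalent rectangular bandwidth (ERB) approximation commonly attributed to Glasberg and Moore (1990), which summarizes human psychophysical filter widths and is not the cat nerve-fiber data of Figure 8.3. The function names and the choice of one channel per octave are likewise illustrative.

```python
def erb_hz(center_hz):
    """ERB approximation (assumed here; not taken from this chapter)."""
    return 24.7 * (4.37 * center_hz / 1000.0 + 1.0)

def log_spaced(low_hz, high_hz, count):
    """Center frequencies separated by a constant ratio, not a constant step."""
    ratio = (high_hz / low_hz) ** (1.0 / (count - 1))
    return [low_hz * ratio ** i for i in range(count)]

centers = log_spaced(125.0, 8000.0, 7)  # one hypothetical channel per octave

# Each row: (center frequency in Hz, bandwidth in Hz, bandwidth / center).
table = [(round(f), round(erb_hz(f), 1), round(erb_hz(f) / f, 3)) for f in centers]
for center, bandwidth, relative in table:
    print(f"{center:5d} Hz  bandwidth {bandwidth:7.1f} Hz  relative {relative:.3f}")
```

Under this approximation the bandwidth in hertz grows steadily with center frequency, while the bandwidth relative to the center frequency shrinks. That matches the qualitative pattern described above: low-frequency channels are narrow in absolute terms but comparatively broad on a logarithmic axis.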


Although the frequency tuning of the cochlea is uncontroversial, the teleological question of why the cochlear transduction process is frequency-tuned remains less settled. How does frequency tuning aid the brain’s task of recovering useful information about the world from its acoustic input? Over the past two decades, a growing number of researchers have endeavored to explain properties of sensory systems as optimal for the task of encoding natural sensory stimuli, initially focusing on coding questions in vision, and using notions of efficiency as the optimality criterion (Field, 1987; Olshausen & Field, 1996). Lewicki and colleagues have applied similar concepts to hearing, using algorithms that derive efficient and sparse representations of sounds (Lewicki, 2002; Smith & Lewicki, 2006), properties believed to be desirable in early sensory representations. They report that for speech, or for combinations of environmental sounds and animal vocalizations, efficient representations for sound look much like the representation produced by auditory nerve fiber responses: sounds are represented with filters whose tuning is localized in frequency. Interestingly, the resulting representations share the dependence of bandwidth on frequency found in biological hearing: bandwidths increase with frequency as they do in the ear. Moreover, representations derived in the same way for “unnatural” sets of sounds, such as samples of white noise, do not exhibit frequency tuning, indicating that the result is at least somewhat specific to the sorts of sounds commonly encountered in the world. These results suggest that frequency tuning provides an efficient means to encode the sounds that were likely of importance when the auditory system evolved, possibly explaining its ubiquitous presence in auditory systems. It remains to be seen whether this framework can explain potential variation in frequency tuning bandwidths across species (humans have recently been claimed to possess narrower tuning than other species; Joris, Bergevin, et al., 2011; Shera, Guinan, et al., 2002), or the broadening of frequency tuning with increasing sound intensity (Rhode, 1978), but it provides one means by which to understand the origins of peripheral auditory processing.

Amplitude Compression

A second salient transformation that occurs in the cochlea is that of amplitude compression, whereby the mechanical response of the cochlea to a soft sound (and thus the neural response as well) is larger than would be expected given the response to a loud sound. The response elicited by a sound is thus not proportional to the sound’s amplitude (as it would be if the response were linear), but rather to a compressive nonlinear function of amplitude. The dynamic range of the response to sound is thus “compressed” relative to the dynamic range of the acoustic input. Whereas the range of audible sounds covers five orders of magnitude, or 100 dB, the range of cochlear response covers only one or two orders of magnitude (Ruggero, Rich, et al., 1997).

Compression appears to serve to map the range of amplitudes that the listener needs to hear (i.e., those commonly encountered in the environment) onto the physical operating range of the cochlea. Without compression, either sounds low in level would be inaudible, or sounds high in level would be indiscriminable (for they would fall outside the range that could elicit a response change). Compression permits very soft sounds to produce a physical response that is (just barely) detectable, while maintaining some discriminability of higher levels.

The compressive nonlinearity is often approximated as a power function with an exponent of 0.3 or so. It is not obvious why the compressive nonlinearity should take the particular form that it does. Many different functions could in principle serve to compress the output response range. It remains to be seen whether compression can be explained in terms of optimizing the encoding of the input, as has been proposed for frequency tuning (but see Escabi, Miller, et al., 2003). Most machine hearing applications also utilize amplitude compression before analyzing sound, however, and it is widely agreed to be useful to amplify low amplitudes relative to large ones when processing sound.

Amplitude compression was first noticed in measurements of the physical vibrations of the basilar membrane (Rhode, 1971; Ruggero, 1992) but is also apparent in auditory nerve fiber responses (Yates, 1990) and is believed to account for a number of perceptual phenomena (Moore & Oxenham, 1998). The effects of compression are related to “cochlear amplification,” in that compression results from response enhancement that is limited to low-intensity sounds. Compression is achieved in part via the outer hair cells, whose motility modifies the motion of the basilar membrane in response to sound (Ruggero & Rich, 1991). Outer hair cell function is frequently altered in hearing impairment, one consequence of which is a loss of the compression that hearing aids attempt to mimic.
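
The power-function approximation mentioned above can be checked with a few lines of arithmetic. The sketch below assumes an exponent of exactly 0.3 and an input spanning exactly five orders of magnitude; the function names are hypothetical and the model ignores everything about the cochlea except the static nonlinearity.

```python
import math


def compress(amplitude: float, exponent: float = 0.3) -> float:
    """Power-law compression: the response grows as amplitude ** exponent."""
    return amplitude ** exponent


def to_db(ratio: float) -> float:
    """Express an amplitude ratio in decibels."""
    return 20.0 * math.log10(ratio)


if __name__ == "__main__":
    soft, loud = 1.0, 1e5  # amplitudes spanning five orders of magnitude
    input_range_db = to_db(loud / soft)
    output_range_db = to_db(compress(loud) / compress(soft))
    print(f"input range:  {input_range_db:.0f} dB")
    print(f"output range: {output_range_db:.0f} dB")
```

With these assumed values a 100 dB input range maps onto a 30 dB output range, that is, about one and a half orders of magnitude, which falls within the one to two orders of magnitude cited for the cochlear response.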

Neural Coding in the Auditory Nerve

Figure 8.4 Phase locking. A, A 200-Hz pure tone stimulus waveform aligned in time with several overlaid traces of an auditory nerve fiber’s response to the tone. Note that the spikes are not uniformly distributed in time, but rather occur at particular phases of the sinusoidal input. B, A measure of phase locking for each of a set of nerve fibers in response to different frequencies. Phase locking decreases at high frequencies. Both parts of this figure are reprinted with permission from the original source: Javel & Mott, 1988.

Although frequency tuning and amplitude compression are at this point uncontroversial and relatively well understood, several other empirical questions about peripheral auditory coding remain unresolved. One important issue involves the means by which the auditory nerve encodes frequency information. As a result of the frequency tuning of the auditory nerve, the spike rate of a nerve fiber contains information about frequency (a large firing rate indicates that the sound input contains frequencies near the center of the range of the fiber’s tuning). Collectively, the firing rates of all nerve fibers could thus be used to estimate the instantaneous spectrum of a sound. However, spike timings also carry frequency information. At least for low frequencies, the spikes that are fired in response to sound do not occur randomly, but rather tend to occur at the peak displacements of the basilar membrane vibration. Because the motion of a particular section of the membrane mirrors the bandpass-filtered sound waveform, the spikes occur at the waveform peaks (Rose, Brugge, et al., 1967). If the input is a single frequency, spikes thus occur at a fixed phase of the frequency cycle (Figure 8.4A). This behavior is known as phase locking and produces spikes at regular intervals corresponding to the period of the frequency (or to integer multiples of it, because a fiber need not fire on every cycle). The spike timings thus carry information that could potentially augment or supersede that conveyed by the rate of firing.

Phase locking degrades in accuracy as frequency is increased (Figure 8.4B) due to limitations in the temporal fidelity of the hair cell membrane potential (Palmer & Russell, 1986) and is believed to be largely absent for frequencies above 4 kHz in most mammals, although there is some variability across species (Johnson, 1980; Palmer & Russell, 1986).

The appeal of phase locking as a code for sound frequency is partly due to features of rate-based frequency selectivity that are unappealing from an engineering standpoint. Although frequency tuning in the auditory system (as measured by auditory nerve spike rates or psychophysical masking experiments) is narrow at low stimulus levels, it broadens considerably as the level is raised (Glasberg & Moore, 1990; Rhode, 1978). Phase locking, by comparison, is robust to sound level: even though a nerve fiber responds to a broad range of frequencies when the level is high, the time intervals between spikes continue to convey frequency-specific information, as the peaks in the bandpass-filtered waveform tend to occur at integer multiples of the periods of the component frequencies.

Our ability to discriminate frequency is impressive, with thresholds on the order of 1 percent (Moore, 1973), and there has been long-standing interest in whether this ability in part depends on fine-grained spike timing information (Heinz, Colburn, et al., 2001). Although phase locking remains uncharacterized in humans because of the unavailability of human auditory nerve recordings, it is presumed to occur in much the same way as in nonhuman auditory systems. Moreover, several psychophysical phenomena are consistent with a role for phase locking in human hearing. For instance, frequency discrimination becomes much poorer for frequencies above 4 kHz (Moore, 1973), roughly the point at which phase locking declines in nonhuman animals. The fundamental frequency of the highest note on a piano is also approximately 4 kHz; this is also the point above which melodic intervals between pure tones (tones containing a single frequency) are much less evident (Attneave & Olson, 1971; Demany & Semal, 1990). These findings provide some circumstantial evidence that phase locking is important for deriving precise estimates of frequency, but definitive evidence remains elusive. It remains possible that the perceptual degradations at high frequencies reflect a lack of experience with such frequencies, or their relative unimportance for typical behavioral judgments, rather than a physiological limitation.

The upper limit of phase locking is also known to decrease markedly at each successive stage of the auditory system (Wallace, Anderson, et al., 2007). By primary auditory cortex, the upper cutoff is in the neighborhood of a few hundred hertz. It would thus seem that the phase locking that occurs robustly in the auditory nerve would need to be rapidly transformed into a spike rate code if it were to benefit processing throughout the auditory system. Adding to the puzzle is the fact that frequency tuning is not thought to be dramatically narrower at higher stages in the auditory system. Such tightening might be expected if the frequency information provided by phase-locked spikes were transformed to yield improved rate-based frequency tuning at subsequent stages (but see Bitterman, Mukamel, et al., 2008).
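
The degree of phase locking discussed in this section can be quantified in several ways. The sketch below uses vector strength, a common synchronization index; the chapter does not say which measure underlies Figure 8.4B, so this choice is an assumption. The simulated fiber is also hypothetical: it fires near the peak of randomly chosen cycles with a fixed Gaussian timing jitter, a crude stand-in for limited temporal fidelity.

```python
import cmath
import math
import random


def vector_strength(spike_times, freq_hz):
    """Length of the mean unit phasor of spike phases (0 = no locking, 1 = perfect locking)."""
    if not spike_times:
        return 0.0
    phasors = [cmath.exp(2j * math.pi * freq_hz * t) for t in spike_times]
    return abs(sum(phasors) / len(phasors))


def simulate_spikes(freq_hz, n_spikes, jitter_s, seed=0):
    """Hypothetical fiber: one spike near the peak of a random cycle, with Gaussian jitter."""
    rng = random.Random(seed)
    period = 1.0 / freq_hz
    return [rng.randrange(1000) * period + rng.gauss(0.0, jitter_s)
            for _ in range(n_spikes)]


if __name__ == "__main__":
    jitter_s = 1e-4  # assumed timing jitter of 0.1 ms, identical at all frequencies
    for f in (200.0, 1000.0, 4000.0):
        spikes = simulate_spikes(f, 2000, jitter_s)
        print(f"{f:6.0f} Hz: vector strength = {vector_strength(spikes, f):.2f}")
```

Because the jitter is fixed in time, it spans a growing fraction of the cycle as frequency rises, so the index falls toward zero at high frequencies. This is one simple way in which a limit on temporal precision could yield the decline in phase locking described above.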

II. Organization of the Auditory System

Subcortical Pathways

The auditory nerve feeds into a cascade of interconnected subcortical regions that lead up to the auditory cortex, as shown in Figure 8.1. The subcortical auditory pathways have complex anatomy, only some of which is depicted in Figure 8.1. In contrast to the subcortical pathways of the visual system, which are often argued to largely preserve the representation generated in the retina, the subcortical auditory areas exhibit a panoply of interesting response properties not found in the auditory nerve, many of which remain active topics of investigation. Several subcortical regions will be referred to in the sections that follow in the context of other types of acoustic measurements or perceptual functions.

Feedback to the Cochlea

Like other sensory systems, the auditory system can be thought of as a processing cascade, extending from the sensory receptors to cortical areas believed to mediate auditory-based decisions. This “feedforward” view of processing underlies much auditory research. As in other systems, however, feedback from later stages to earlier ones is ubiquitous and substantial, and in the auditory system is perhaps even more pronounced than elsewhere in the brain. Unlike the visual system, for instance, the auditory pathways contain feedback extending all the way back to the sensory receptors. The function of much of this feedback remains poorly understood, but one particular set of projections, the cochlear efferent system, has been the subject of much discussion.

Efferent connections to the cochlea originate primarily from the superior olivary nucleus, an area of the brainstem a few synapses removed from the cochlea (see Figure 8.1, although the efferent pathways are not shown). The superior olive is divided into two subregions, medial and lateral, and to first order, these give rise to two efferent projections: one from the medial superior olive to the outer hair cells, called the medial olivocochlear (MOC) efferents, and one from the lateral superior olive to the inner hair cells, called the lateral olivocochlear (LOC) efferents (Elgoyhen & Fuchs, 2010). The MOC efferents have been relatively well studied. Their activation (e.g., by electrical stimulation) is known to reduce the basilar membrane response to low-intensity sounds and to broaden the frequency tuning of the response. This is probably because the MOC efferents inhibit the outer hair cells, which are crucial to amplifying the response to low-intensity sounds and to sharpening frequency tuning.

The MOC efferents may serve a protective function by reducing the response to loud sounds (Rajan, 2000), but their most commonly proposed function is to enhance the response to transient sounds in noise (Guinan, 2006). When the MOC fibers are severed, for instance, performance on tasks involving discrimination of tones in noise is reduced (May & McQuone, 1995). Noise-related MOC effects are proposed to derive from the efferents’ influence on adaptation, which, when induced by background noise, reduces the detectability of transient foreground sounds by decreasing the dynamic range of the auditory nerve’s response. Because MOC activation reduces the response to ongoing sound, adaptation induced by continuous background noise is reduced, thus enhancing the response to transient tones that are too brief to trigger the MOC feedback themselves (Kawase, Delgutte, et al., 1993; Winslow & Sachs, 1987).

Another interesting but controversial proposal is that the MOC efferents play a role in auditory attention. One study, for instance, found that patients whose vestibular nerve (containing the MOC fibers) had been severed were better at detecting unexpected tones after the surgery, suggesting that selective attention had been altered so as to prevent the focusing of resources on expected frequencies (Scharf, Magnan, et al., 1997). See Guinan, 2006, for a recent review of these and other ideas about MOC efferent function.

Less is known about the LOC efferents. One recent study found that destroying the LOC efferents to one ear in mice caused binaural responses to become “unbalanced” (Darrow, Maison, et al., 2006): when sounds were presented binaurally at equal levels, responses from the two ears that were equal under normal conditions were generally not equal following the surgical procedure. The suggestion was that the LOC efferents serve to regulate binaural responses so that interaural intensity differences, crucial to sound localization (see below), can be accurately registered.




Figure 8.5 Tonotopy. Best frequency of voxels in the human auditory cortex, measured with fMRI, plotted on the flattened cortical surface (Humphries, Liebenthal, et al., 2010). Note that the best frequency varies quasi-smoothly over the cortical surface and is suggestive of two maps that are approximately mirror images of each other.

Although many of the functional properties of subcortical and cortical neurons are distinct from what is found in auditory nerve responses, frequency tuning persists. Every subcortical region contains frequency-tuned neurons, and neurons tend to be spatially organized to some extent according to their best frequency, forming “tonotopic” maps. This organization is also evident in the cortex. Many cortical neurons have a preferred frequency, although they are often less responsive to pure tones (relative to sounds with more complex spectra) and often have broader tuning than neurons in peripheral stages (Moshitch, Las, et al., 2006). Cortical frequency maps were one of the first reported findings in single-unit neurophysiology studies of the auditory cortex in animals, and have since been found using functional magnetic resonance imaging (fMRI) in humans (Formisano, Kim, et al., 2003; Humphries, Liebenthal, et al., 2010; Talavage, Sereno, et al., 2004) as well as monkeys (Petkov, Kayser, et al., 2006). Figure 8.5 shows an example of a tonotopic map obtained in a human listener with fMRI. Although never formally quantified, it seems that tonotopy is less robust than the retinotopy found in the visual system (evident, e.g., in recent optical imaging studies; Bandyopadhyay, Shamma, et al., 2010; Rothschild, Nelken, et al., 2010).

Although the presence of some degree of tonotopy in the cortex is beyond question, its functional importance remains unclear. Frequency selectivity is not the end goal of the auditory system, and it does not obviously bear much relevance to behavior, so it is unclear why tonotopy would be a dominant principle of organization throughout the auditory system. It may be that other principles of organization are in fact more prominent but have yet to be discovered. At present, however, tonotopy remains a staple of textbooks and review chapters such as this.

Functional Organization

Largely on grounds of anatomy and connectivity, mammalian auditory cortex is standardly divided into three sets of regions, shown in Figure 8.6: a core region receiving direct input from the thalamus, a “belt” region surrounding it, and a “parabelt” region beyond that (Kaas & Hackett, 2000; Sweet, Dorph-Petersen, et al., 2005). Within these areas, tonotopy is often used to delineate distinct fields (a field is typically considered to contain a single tonotopic map). The core region is divided in this way into areas A1, R (for rostral), and RT (for rostrotemporal) in primates, with A1 and R receiving direct input from the medial geniculate nucleus of the thalamus. There are also multiple belt areas (Petkov, Kayser, et al., 2006), each receiving input from the core areas. Functional imaging reveals many additional areas that respond to sound in the awake primate, including parts of parietal and frontal cortex (Poremba, Saunders, et al., 2003). There are some indications that the three core regions have different properties (Bendor & Wang, 2008), and that stimulus selectivity increases in complexity from the core to surrounding areas (Kikuchi, Horwitz, et al., 2010; Rauschecker & Tian, 2004; Tian & Rauschecker, 2004), suggestive of a hierarchy of processing. However, at present, there is not a single widely accepted framework for auditory cortical organization. Several principles of organization have been proposed with varying degrees of empirical support; here, we review a few of them.



Figure 8.6 Anatomy of auditory cortex. A, Lateral view of macaque cortex. The approximate location of the parabelt region is indicated with dashed orange lines. B, View of the brain from (A) after removal of the overlying parietal cortex. Approximate locations of the core (solid red line), belt (dashed yellow line), and parabelt (dashed orange line) regions are shown. AS, arcuate sulcus; CS, central sulcus; INS, insula; LS, lateral sulcus; STG, superior temporal gyrus; STS, superior temporal sulcus. C, Connectivity between core and belt regions. Solid lines with arrows denote dense connections; dashed lines with arrows denote less dense connections. RT, R, and A1 compose the core; all three subregions receive input from the thalamus. The areas surrounding the core make up the belt, and the two regions outlined with dashed lines make up the parabelt. The core has few direct connections with the parabelt or more distant cortical areas. AL, anterolateral; CL, caudolateral; CM, caudomedial; CPB, caudal parabelt; ML, middle lateral; MM, middle medial; RM, rostromedial; RPB, rostral parabelt; RT, rostrotemporal; RTM, medial rostrotemporal; RTL, lateral rostrotemporal. All parts reprinted from original source: Kaas & Hackett, 2000.

Some of the proposed organizational principles clearly derive inspiration from the visual system. For instance, selectivity for vocalizations and selectivity for spatial location have been found to be partially segregated, each being most pronounced in a different part of the lateral belt (Tian, Reser, et al., 2001; Woods, Lopez, et al., 2006). These regions have thus been proposed to constitute the beginning of ventral “what” and dorsal “where” pathways analogous to those in the visual system, perhaps culminating in the same parts of the prefrontal cortex as the analogous visual pathways (Cohen, Russ, et al., 2009; Romanski, Tian, et al., 1999). Functional imaging results in humans have also been viewed as supportive of this framework (Alain, Arnott, et al., 2001; Warren, Zielinski, et al., 2002). Additional evidence for a “what/where” dissociation comes from a recent study in which sound localization and temporal pattern discrimination in cats were selectively impaired by reversibly deactivating different regions of nonprimary auditory cortex (Lomber & Malhotra, 2008). However, other studies have found less evidence for segregation of tuning properties in early auditory cortex (Bizley, Walker, et al., 2009). Moreover, the properties of the “what” stream remain relatively undefined (Recanzone, 2008); at this point, it has been defined mainly by reduced selectivity to spatial location.

There have been further attempts to extend the characterization of a ventral auditory pathway by testing for specialization for the analysis of particular categories of sounds, analogous to what has been found in the visual system (Kanwisher, 2010). The most widely proposed specialization is for vocalizations. Using functional imaging, regions of the anterior temporal lobe have been identified in both humans (Belin, Zatorre, et al., 2000) and macaques (Petkov, Kayser, et al., 2008) that appear to be somewhat selectively responsive to vocalizations and that could be homologous across species. Evidence for regions selective for other categories is less clear at present (Leaver & Rauschecker, 2010), although see the section below on pitch perception for a discussion of a cortical region putatively involved in pitch processing.

Another proposal is that the left and right auditory cortices are specialized for different aspects of signal processing, with the left optimized for temporal resolution and the right for frequency resolution (Zatorre, Belin, et al., 2002). This idea is motivated by the uncertainty principle of time–frequency analysis, whereby resolution cannot simultaneously be optimized for both time and frequency. The evidence for hemispheric differences comes mainly from functional imaging studies that manipulate spectral and temporal stimulus characteristics (Samson, Zeffiro, et al., 2011; Zatorre & Belin, 2001) and neuropsychology studies that find pitch perception deficits associated with right temporal lesions (Johnsrude, Penhune, et al., 2000; Zatorre, 1985). A related alternative idea is that the two hemispheres are specialized to analyze distinct timescales, with the left hemisphere more responsive to short-scale temporal variation (e.g., tens of milliseconds) and the right hemisphere more responsive to long-scale variation (e.g., hundreds of milliseconds) (Boemio, Fromm, et al., 2005; Poeppel, 2003).
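
The time–frequency uncertainty principle invoked in the hemispheric proposal above can be stated compactly. The form below is the standard signal-processing statement (often called the Gabor limit) and is not taken from this chapter. The symbols are assumptions introduced here: \sigma_t and \sigma_f denote the root-mean-square duration (in seconds) and bandwidth (in hertz) of an analysis window.

```latex
\sigma_t \, \sigma_f \;\ge\; \frac{1}{4\pi}
```

Equality holds only for Gaussian windows. Halving the temporal spread of an analysis window therefore at least doubles the smallest achievable spread in frequency, which is the trade-off the proposal assigns to the two hemispheres.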



III. Sound Measurement: Modulation

Amplitude Modulation and the Envelope

Figure 8.7 Amplitude modulation. A, The output of a bandpass filter (centered at 340 Hz) for a recording of speech, plotted in blue, with its envelope plotted in red. B, Close-up of part of A (corresponding to the black rectangle in A). Note that the filtered sound signal (like the unfiltered signal) fluctuates around zero at a high rate, whereas the envelope is positive-valued and fluctuates more slowly. C, Spectrogram of the same speech signal. The spectrogram is formed from the envelopes (one of which is plotted in A) of a set of filters mimicking the frequency tuning of the cochlea, and is produced by plotting each envelope horizontally in grayscale. D, Power spectra of the filtered speech signal in A and its envelope. Note that the envelope contains power only at low frequencies (modulation frequencies), whereas the filtered signal has power at a restricted range of high frequencies (acoustic frequencies).

The cochlea decomposes the acoustic input into frequency channels, but much of the important information in sound is conveyed by the way that the output of these frequency channels is modulated in amplitude. Consider Figure 8.7A, which displays in blue the output of one such frequency channel for a short segment of a speech signal. The blue waveform oscillates at a rapid rate, but its amplitude waxes and wanes at a much lower rate (evident in the close-up view of Figure 8.7B). This waxing and waning is known as amplitude modulation and is a common feature of many modes of sound production (e.g., vocal articulation). The amplitude is captured by what is known as the envelope of a signal, shown in red for the signal of Figures 8.7A and B. Often, the envelopes of each cochlear channel are stacked vertically and displayed as an image called a spectrogram, providing a depiction of how the sound energy in each frequency channel varies over time (Figure 8.7C). Figure 8.7D shows the spectra of the signal and envelope shown in Figures 8.7A and B. The signal spectrum is bandpass (because it is the output of a bandpass filter), with energy at frequencies in the audible range. The envelope spectrum, in contrast, is low-pass, with most of the power below 10 Hz, corresponding to the slow rate at which the envelope changes. The frequencies that compose the envelope are typically termed modulation frequencies, distinct from the acoustic frequencies that compose the signal that the envelope is derived from.

The information carried by a cochlear channel can thus be viewed as the product of “fine structure” (a waveform that varies rapidly, at a rate close to the center frequency of the channel) and an amplitude envelope that varies more slowly (Rosen, 1992). The envelope and fine structure have a clear relation to common signal processing formulations in which the output of a bandpass filter is viewed as a single sinusoid varying in amplitude and frequency: the envelope describes the amplitude variation, and the fine structure describes the frequency variation. The envelope of a frequency channel is also straightforward to extract from the auditory nerve; it can be obtained by low-pass filtering a spike train (because the amplitude changes reflected in the envelope are relatively slow). Despite the fact that envelope and fine structure are not completely independent (Ghitza, 2001), there has been much interest in the past decade in distinguishing their roles in different aspects of hearing (Smith, Delgutte, et al., 2002) and its impairment (Lorenzi, Gilbert, et al., 2006).

Perhaps surprisingly, the temporal information contained in amplitude envelopes can be sufficient for speech comprehension even when spectral information is severely limited. In a classic paper, Shannon and colleagues isolated the information contained in the amplitude envelopes of speech signals with a stimulus known as noise-vocoded speech (Shannon, Zeng, et al., 1995). Noise-vocoded speech is generated by filtering a speech signal and a noise signal into frequency bands, multiplying the frequency bands of the noise by the envelopes of the speech, and then summing the modified noise bands to synthesize a new sound signal. By using a small number of broad frequency bands, spectral information can be greatly reduced, leaving amplitude variation over time (albeit smeared across a broader than normal range of frequencies) as the primary signal cue. Examples are shown in Figure 8.8 for two, four, and eight bands. Shannon and colleagues found that the resulting stimulus was intelligible even when just a few bands were used (i.e., with much broader frequency tuning than is present in the cochlea), indicating that the temporal modulation of the envelopes contains much information about speech content.
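
The envelope extraction and noise-vocoding steps described above can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the procedure of Shannon and colleagues: idealized brick-wall filters implemented with an FFT stand in for cochlear or vocoder filters, the envelope is taken as the magnitude of the analytic signal, the carrier is Gaussian white noise, and the "speech" is a synthetic amplitude-modulated tone. All function names and band edges are hypothetical.

```python
import cmath
import math
import random


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out


def ifft(spectrum):
    """Inverse FFT via the conjugation identity."""
    n = len(spectrum)
    return [v.conjugate() / n for v in fft([s.conjugate() for s in spectrum])]


def analytic_band(signal, fs, lo_hz, hi_hz):
    """Complex band-limited signal: real part = band-passed waveform, magnitude = envelope.

    Assumption: an idealized brick-wall filter stands in for a cochlear filter.
    """
    n = len(signal)
    spectrum = fft(signal)
    kept = [0j] * n
    for k in range(1, n // 2):  # keep positive frequencies inside the band only
        if lo_hz <= k * fs / n < hi_hz:
            kept[k] = 2 * spectrum[k]
    return ifft(kept)


def envelope(signal, fs, lo_hz, hi_hz):
    """Amplitude envelope of one frequency channel."""
    return [abs(z) for z in analytic_band(signal, fs, lo_hz, hi_hz)]


def noise_vocode(speech, fs, band_edges_hz, seed=0):
    """Replace the fine structure in each band with noise, keeping the speech envelopes."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in speech]
    out = [0.0] * len(speech)
    for lo, hi in zip(band_edges_hz[:-1], band_edges_hz[1:]):
        env = envelope(speech, fs, lo, hi)
        carrier = [z.real for z in analytic_band(noise, fs, lo, hi)]
        for i, (e, c) in enumerate(zip(env, carrier)):
            out[i] += e * c  # noise band multiplied by speech envelope, then summed
    return out


if __name__ == "__main__":
    fs, n = 4096, 4096
    # Stand-in for one speech channel: a 340-Hz carrier modulated at 4 Hz.
    toy = [(1 + 0.8 * math.cos(2 * math.pi * 4 * i / fs))
           * math.cos(2 * math.pi * 340 * i / fs) for i in range(n)]
    env = envelope(toy, fs, 300, 380)
    print("envelope min/max:", round(min(env), 3), round(max(env), 3))
    vocoded = noise_vocode(toy, fs, [100, 400, 1000, 2000])
    print("vocoded samples:", len(vocoded))
```

Passing fewer, broader bands to `noise_vocode` discards more spectral detail while preserving the slow amplitude fluctuations, which is the manipulation illustrated in Figure 8.8.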

Modulation Tuning

Motivated by its perceptual importance, amplitude modulation has been proposed to be analyzed by dedicated banks of filters operating on the envelopes of cochlear filter outputs rather than the sound waveform itself (Dau, Kollmeier, et al., 1997). Early evidence for such a notion came from masking and adaptation experiments, which found that the detection of a modulated signal was impaired by a masker or adapting stimulus modulated at a similar frequency (Bacon & Grantham, 1989; Houtgast, 1989; Tansley & Suffield, 1983). There is now considerable evidence from neurophysiology that single neurons in the midbrain, thalamus, and cortex exhibit some degree of tuning to modulation frequency (Depireux, Simon, et al., 2001; Joris, Schreiner, et al., 2004; Miller, Escabi, et al., 2001; Rodriguez, Chen, et al., 2010; Schreiner & Urbas, 1986, 1988; Woolley, Fremouw, et al., 2005), loosely consistent with the idea of a modulation filter bank (Figure 8.9A). Because such filters are typically conceived to operate on the envelope of a particular cochlear channel, they are tuned both in acoustic frequency (courtesy of the cochlea) and modulation frequency.

Neurophysiological studies in nonhuman animals (Schreiner & Urbas, 1986, 1988) and neuroimaging results in humans (Boemio, Fromm, et al., 2005; Giraud, Lorenzi, et al., 2000; Schonwiesner & Zatorre, 2009) have generally found that the auditory cortex responds preferentially to low modulation frequencies (in the range of 4–8 Hz), whereas subcortical structures prefer higher rates (up to 100–200 Hz), with preferred modulation frequency generally decreasing up the auditory pathway. Based on this, it is intriguing to speculate that successive stages of the auditory system might process structure at progressively longer (slower) timescales, analogous to the progressive increase in receptive field size that occurs in the visual system from V1 to inferotemporal cortex (Lerner, Honey, et al., 2011). Within the cortex, however, no hierarchy is clearly evident as of yet, at least in the response to simple patterns of modulation (Boemio, Fromm, et al., 2005; Giraud, Lorenzi, et al., 2000). Moreover, there is considerable variation within each stage of the pathway in the preferred modulation frequency of individual neurons (Miller, Escabi, et al., 2001; Rodriguez, Chen, et al., 2010). There are several reports of topographic organization for modulation frequency in the inferior colliculus, in which a gradient of preferred modulation frequency is observed orthogonal to the tonotopic gradient of preferred acoustic frequency (Baumann, Griffiths, et al., 2011; Langner, Sams, et al., 1997). Whether there is topographic organization in the cortex remains unclear (Nelken, Bizley, et al., 2008).



Figure 8.8 Noise-vocoded speech. A, Spectrogram of a speech utterance, generated as in Figure 8.7C. B–D, Spectrograms of noise-vocoded versions of the utterance from A, generated with eight (B), four (C), or two (D) channels. To generate the noise-vocoded speech, the amplitude envelope of the original speech signal was first measured in each of the frequency bands in B, C, and D. A white noise signal was then filtered into these same bands, and the noise bands were multiplied by the corresponding speech envelopes. These modulated noise bands were then summed to generate a new sound signal. It is visually apparent that the sounds in parts B to D are spectrally coarser versions of the original utterance. Good speech intelligibility is usually obtained with only four channels, indicating that patterns of amplitude modulation can support speech recognition in the absence of fine spectral detail.

Modulation tuning in single neurons is often studied by measuring spectrotemporal receptive fields (STRFs) (Depireux, Simon, et al., 2001), conventionally estimated using techniques such as spike-triggered averaging. To compute an STRF, neuronal responses to a long, stochastically varying stimulus are recorded, after which the stimulus spectrogram segments preceding each spike are averaged to yield the STRF: the stimulus, described in terms of acoustic frequency content over time, that on average preceded a spike. In Figure 8.9B, for instance, the STRF consists of a decrease in power followed by an increase in power in the range of 10 kHz; the neuron would thus be likely to respond well to a rapidly modulated 10 kHz tone, and less so to a tone whose amplitude was constant. This STRF can be viewed as a filter that passes modulations in a certain range of rates, that is, modulation frequencies. Note, however, that it is also tuned in acoustic frequency (the dimension on the y-axis), responding only to modulations of fairly high acoustic frequencies.
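
The spike-triggered averaging procedure described above can be sketched as follows. The stimulus and the neuron are both assumptions made for illustration: the spectrogram is independent Gaussian noise in every time bin and channel, and the hypothetical neuron fires whenever one channel was strongly driven two bins earlier. With real, correlated stimuli, additional correction for stimulus statistics is generally required.

```python
import random


def spike_triggered_average(spectrogram, spikes, n_lags):
    """Average the n_lags spectrogram frames preceding each spike.

    spectrogram[t][f] is stimulus power at time bin t, channel f;
    spikes[t] is the spike count in bin t. Returns sta[lag][f], where
    lag 0 is the bin of the spike and lag k is k bins earlier.
    """
    n_freqs = len(spectrogram[0])
    sta = [[0.0] * n_freqs for _ in range(n_lags)]
    total = 0
    for t in range(n_lags - 1, len(spectrogram)):
        if spikes[t] == 0:
            continue
        total += spikes[t]
        for lag in range(n_lags):
            frame = spectrogram[t - lag]
            for f in range(n_freqs):
                sta[lag][f] += spikes[t] * frame[f]
    if total:
        sta = [[v / total for v in row] for row in sta]
    return sta


if __name__ == "__main__":
    rng = random.Random(1)
    n_bins, n_freqs, n_lags = 5000, 4, 5
    # Stochastic stimulus: independent Gaussian fluctuations in every bin and channel.
    stim = [[rng.gauss(0.0, 1.0) for _ in range(n_freqs)] for _ in range(n_bins)]
    # Hypothetical neuron: fires when channel 1 was strongly driven two bins earlier.
    spikes = [1 if t >= 2 and stim[t - 2][1] > 1.0 else 0 for t in range(n_bins)]
    sta = spike_triggered_average(stim, spikes, n_lags)
    for lag, row in enumerate(sta):
        print(lag, [round(v, 2) for v in row])
```

The recovered average is large only at lag 2 in channel 1, matching the feature that drives the simulated neuron; every other entry hovers near zero.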



Figure 8.9 Modulation tuning. A, Example of temporal modulation tuning curves for neurons in the medial geniculate nucleus of the thalamus (Miller, Escabi, et al., 2002). B, Example of the spectrotemporal receptive field (STRF) from a thalamic neuron (Miller, Escabi, et al., 2002). Note that the modulation in the STRF is predominantly along the temporal dimension, and that this neuron would thus be sensitive primarily to temporal modulation. C, Example of STRFs from cortical neurons (Mesgarani, David, et al., 2008). Note that the STRFs feature spectral modulation in addition to temporal modulation, and as such are selective for more complex acoustic features. Cortical neurons typically have longer latencies than subcortical neurons, but this is not evident in the STRFs, probably because of nonlinearities in the cortical neurons that produce small artifacts in the STRFs (Stephen David, personal communication). Figure parts are taken from the original sources.

The STRF approximates a neuron’s output as a linear function of the cochlear input: the result of convolving the spectrogram of the acoustic input with the STRF. However, it is clear that linear models are inadequate to explain neuronal responses fully (Christianson, Sahani, et al., 2008; Machens, Wehr, et al., 2004; Rotman, Bar Yosef, et al., 2001; Theunissen, Sen, et al., 2000). Understanding the nonlinear contributions is an important direction of future research (Ahrens, Linden, et al., 2008; David, Mesgarani, et al., 2009), as neuronal nonlinearities likely play critical computational roles, but at present much analysis is restricted to linear receptive field estimates. There are established methods for computing STRFs, and they exhibit many interesting properties even though they are clearly not the whole story.

Modulation tuning functions (e.g., those shown in Figure 8.9A) can be obtained via the Fourier transform of the STRF. Temporal modulation tuning is commonly observed, as previously discussed, but some tuning is normally also present for spectral modulation, that is, variation in power that occurs along the frequency axis. Spectral modulation is often evident as well in spectrograms of speech (e.g., Figure 8.7C) and animal vocalizations. Modulation results both from individual frequency components and from formants, the broad spectral peaks that are present for vowel sounds due to vocal tract resonances. Tuning to spectral modulation is generally less pronounced than to amplitude modulation, especially subcortically (Miller, Escabi, et al., 2001), but is an important feature of cortical responses (Barbour & Wang, 2003; Mesgarani, David, et al., 2008). Examples of cortical STRFs with spectral modulation sensitivity are shown in Figure 8.9C.
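As a rough illustration of the linear STRF model described above, the following Python sketch predicts a response by convolving a toy spectrogram with a toy STRF along time and summing over frequency. The function name `predict_response`, the array layout (frequency channel by time bin), and all numerical values are assumptions made for illustration; they are not taken from the studies cited.

```python
def predict_response(spectrogram, strf):
    """Linear STRF prediction (illustrative sketch).

    spectrogram[f][t]: stimulus power in frequency channel f at time bin t.
    strf[f][lag]: weight given to channel f at `lag` time bins in the past.
    Returns one predicted response value per time bin.
    """
    n_freq = len(spectrogram)
    n_time = len(spectrogram[0])
    n_lag = len(strf[0])
    response = []
    for t in range(n_time):
        total = 0.0
        for f in range(n_freq):
            for lag in range(n_lag):
                if t - lag >= 0:  # stimulus before time 0 is treated as silence
                    total += strf[f][lag] * spectrogram[f][t - lag]
        response.append(total)
    return response


# Toy example: two frequency channels, five time bins, two time lags.
toy_spectrogram = [[1, 0, 0, 2, 0],
                   [0, 1, 0, 0, 0]]
toy_strf = [[1.0, -0.5],   # channel 0: excitation now, suppression one bin later
            [0.0, 2.0]]    # channel 1: delayed excitation only
predicted = predict_response(toy_spectrogram, toy_strf)
```

The sketch is purely linear and can produce negative values. A real firing rate cannot be negative, so the nonlinear stages mentioned in the text would be needed on top of it.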


IV. Adaptive Coding and Plasticity

Because the auditory system evolved to enable behavior in natural auditory environments, it is likely to be adapted for the representation of naturally occurring sounds. Natural sounds thus in principle should provide hearing researchers with clues about the structure and function of the auditory system (Attias & Schreiner, 1997). In recent years there has been increasing interest in the use of natural sounds as experimental stimuli and in computational analyses of the relation between auditory representation and the environment. Most of the insights gained thus far from this approach are “postdictive”: they offer explanations of previously observed phenomena rather than revealing previously unforeseen mechanisms. For instance, we described earlier the attempts to explain cochlear frequency selectivity as optimal for encoding natural sounds (Lewicki, 2002; Smith & Lewicki, 2006).

The efficient coding hypothesis has also been proposed to apply to modulation tuning in the inferior colliculus. Modulation tuning bandwidth tends to increase with preferred modulation frequency (Rodriguez, Chen, et al., 2010), as would be predicted if the lowpass modulation spectra of most natural sounds (Attias & Schreiner, 1997; McDermott, Wrobleski, et al., 2011; Singh & Theunissen, 2003) were to be divided into channels conveying equal power. Inferior colliculus neurons have also been found to convey more information about sounds whose amplitude distribution follows that of natural sounds rather than that of white noise (Escabi, Miller, et al., 2003). Along the same lines, studies of STRFs in the bird auditory system indicate that neurons are tuned to the properties of bird song and other natural sounds, maximizing discriminability of behaviorally important sounds (Hsu, Woolley, et al., 2004; Woolley, Fremouw, et al., 2005). Similar arguments have been made about the coding of binaural cues to sound localization (Harper & McAlpine, 2004).
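The equal-power argument for modulation tuning can be made concrete with a small Python sketch. It assumes, purely for illustration, a modulation power spectrum proportional to 1/f between 1 and 256 Hz (the text says only that natural modulation spectra are lowpass) and divides it into four channels of equal power. The function names and the frequency range are hypothetical.

```python
import math


def equal_power_edges(f_low, f_high, n_channels):
    """Band edges that split a 1/f power spectrum into equal-power channels.

    For power density proportional to 1/f, the power between f1 and f2 is
    proportional to log(f2 / f1), so equal-power edges are geometrically spaced.
    """
    ratio = f_high / f_low
    return [f_low * ratio ** (i / n_channels) for i in range(n_channels + 1)]


def band_power(f1, f2):
    """Power of a 1/f spectrum between f1 and f2 (arbitrary units)."""
    return math.log(f2 / f1)


edges = equal_power_edges(1.0, 256.0, 4)          # hypothetical 1-256 Hz range
centers = [math.sqrt(lo * hi) for lo, hi in zip(edges, edges[1:])]
bandwidths = [hi - lo for lo, hi in zip(edges, edges[1:])]
powers = [band_power(lo, hi) for lo, hi in zip(edges, edges[1:])]
```

Under this assumed spectrum the channel edges fall near 1, 4, 16, 64, and 256 Hz, so bandwidth grows with center frequency. This matches the qualitative pattern described above.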
Other strands of research have explored whether the auditory system might further adapt to the environment by changing its coding properties in response to changing environmental statistics, so as to optimally represent the current environment. Following on research showing that the visual system adapts to local contrast statistics (Fairhall, Lewen, et al., 2001), numerous groups have reported evidence for neural adaptation in the auditory system, that is, responses to a fixed stimulus that vary depending on the immediate history of stimulation (Ulanovsky, Las, et al., 2003; Kvale & Schreiner, 2004). In some cases, it can be shown that this adaptation increases information transmission. For instance, the “tuning” of neurons in the inferior colliculus to sound intensity (i.e., the function relating intensity to firing rate) depends on the mean and variance of the local intensity distribution (Dean, Harper, et al., 2005). Qualitatively, the rate–intensity curves shift so that the point of maximum slope (around which neural discrimination of intensity is best) is closer to the most commonly occurring intensity. Quantitatively, this behavior results in increased information transmission about stimulus level.

Some researchers have recently taken things a step further, showing that auditory responses are dependent not just on the stimulus history but also on the task a listener is performing. Fritz and colleagues found that the STRFs measured for neurons in the primary auditory cortex of awake ferrets change depending on whether the animals are performing a task (Fritz, Shamma, et al., 2003), and that the nature of the change depends on the task (Fritz, Elhilali, et al., 2005). For instance, STRF changes serve to accentuate the frequency of a tone being detected, or to enhance discrimination of a target tone from a reference. These changes are mirrored in sound-evoked responses in the prefrontal cortex (Fritz, David, et al., 2010), which may drive the changes that occur in auditory cortex during behavior. In some cases the STRF changes persist long after the animals are finished performing the task, and as such may play a role in sensory memory and perceptual learning.

Perhaps surprisingly, long-term plasticity appears to occur as early as the brainstem, where recent evidence in humans suggests considerable experience-dependent variation across individuals. The data in question derive from an evoked electrical potential known as the auditory brainstem response (ABR) (Skoe & Kraus, 2010).
The ABR is recorded at the scalp but is believed to originate in the brainstem. It often mirrors properties of the stimulus, such that its power spectrum, for instance, often resembles that of the acoustic input. The extent to which the ABR preserves the stimulus can thus be interpreted as a measure of processing integrity. Interestingly, the ABR more accurately tracks stimulus frequency for musicians than for nonmusicians (Wong, Skoe, et al., 2007). This could in principle reflect innate differences in auditory ability that predispose listeners to become musicians or not, but it could also reflect the substantial differences in auditory experience between the two groups. Consistent with the latter notion, 10 hours of training on a pitch discrimination task is sufficient to improve the fidelity of the ABR response to frequency, providing clear evidence of experience-dependent plasticity (Carcagno & Plack, 2011). Aspects of the ABR are also altered in listeners with reading problems (Banai, Hornickel, et al., 2009). This line of research suggests that potentially important individual differences are present at early stages of the auditory system, and that these differences are in part the result of plasticity.

V. Sound Source Perception

Ultimately, we wish to understand not only what acoustic measurements are made by the auditory system, as were characterized in the previous sections, but also how these measurements give rise to perception, that is, what we hear when we listen to sound. Following Helmholtz, we might suppose that the purpose of audition is to infer something about the events in the world that produce sound. We can often identify sound sources with a verbal label, for instance, and realize that we heard a finger snap, a flock of birds, or construction noise. Even if we cannot determine the object that caused the sound, we may nonetheless know something about what happened: that something fell onto a hard floor, or into water (Gaver, 1993). Despite the richness of these aspects of auditory recognition, remarkably little is known about them at present (speech recognition stands alone as an exception), mainly because they are rarely studied (but see Gygi, Kidd, et al., 2004; Lutfi, 2008; McDermott & Simoncelli, 2011).

Perhaps because isolated properties of sounds or their sources are more easily controlled and manipulated, researchers have been more inclined to study the perception of these properties instead. Much research has concentrated in particular on three well-known properties of sound: spatial location, pitch, and loudness. This focus is in some sense unfortunate because auditory perception is much richer than the hegemony of these three attributes in hearing science would indicate. However, their study has nonetheless given rise to fruitful lines of research that have yielded many useful insights about hearing more generally.

Localization

Localization is less precise in hearing than in vision but is nonetheless of great value, because sound enables us to localize objects that we may not be able to see. Human observers can judge the location of a source to within a few degrees if conditions are optimal. The processes by which this occurs are among the best understood in hearing.

Spatial location is not made explicit on the cochlea, which provides a map of frequency rather than of space, and instead must be derived from three primary sources of information. Two of these are binaural, resulting from differences in the acoustic input to the two ears. Due to the difference in path length from the source to the ears, and to the acoustic shadowing effect of the head, sounds to one side of the vertical meridian reach the two ears at different times and with different intensities. These interaural time and level differences vary with direction and thus provide a cue to a sound source’s location. Binaural cues are primarily useful for deriving the location of a sound in the horizontal plane, because changes in elevation do not change interaural time or intensity differences much. To localize sounds in the vertical dimension, or to distinguish sounds coming from in front of the head from those from in back, listeners rely on a third source of information: the filtering of sounds by the body and ears. This filtering is direction specific, such that a spectral analysis can reveal peaks and valleys in the frequency spectrum that are signatures of location in the vertical dimension (Figure 8.10; discussed further below).
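To give a sense of the magnitudes involved in the path-length cue described above, the following Python sketch approximates the interaural time difference for a distant source. It uses Woodworth's classic spherical-head formula, which is not part of this chapter and is introduced here only as an assumption. The head radius and the speed of sound are likewise assumed typical values.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s in air at about 20 degrees C (assumed)
HEAD_RADIUS = 0.0875     # m, a commonly used average adult value (assumed)


def itd_seconds(azimuth_deg, head_radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Approximate ITD for a distant source using Woodworth's spherical-head
    formula: ITD = (r / c) * (theta + sin(theta)), for azimuths from 0
    (straight ahead) to 90 degrees (directly to one side)."""
    theta = math.radians(azimuth_deg)
    return (head_radius / c) * (theta + math.sin(theta))


# ITD in seconds for a few source directions in the horizontal plane.
itd_by_azimuth = {az: itd_seconds(az) for az in (0, 30, 60, 90)}
```

Under these assumptions the ITD is zero straight ahead and rises to roughly two-thirds of a millisecond for a source directly to one side. Near the midline it changes by roughly 9 microseconds per degree. The model ignores frequency dependence and the level cue entirely.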



Figure 8.10 Head-related transfer function (HRTF). Example HRTF for the left ear of one human listener. The gray level represents the amount by which a frequency originating at a particular elevation is attenuated or amplified by the torso, head, and ear of the listener. Sounds are filtered differently depending on their elevation, and the spectrum that is registered by the cochlea thus provides a localization cue. Note that most of the variation in elevation-dependent filtering occurs at high frequencies (above 4 kHz). Figure is reprinted with permission from original source: Zahorik, Bangayan, et al., 2006.

Interaural time differences (ITDs) are typically a fraction of a millisecond, and just-noticeable ITDs (which determine spatial acuity) can be as low as 10 microseconds (Klump & Eady, 1956). This is striking given that neural refractory periods (which determine the minimal interspike interval for a single neuron) are on the order of a millisecond, which one might think would put a limit on the temporal resolution of neural representations. Typical interaural level differences (ILDs) can be as large as 20 dB, with a just-noticeable difference of about 1 dB. ILDs result from the acoustic shadow cast by the head. To first order, ILDs are more pronounced for high frequencies because low frequencies are less affected by the acoustic shadow (because their wavelengths are greater than the dimensions of the head). ITDs, in contrast, support localization most effectively at low frequencies, when the time difference between individual cycles of sinusoidal sound components can be detected via phase-locked spikes from the two ears (phase locking, as we discussed earlier, degrades at high frequencies). That said, ITDs between the envelopes of high-frequency sounds can also produce percepts of localization. The classic “duplex” view that localization is determined by either ILDs or ITDs, depending on the frequency (Rayleigh, 1907), is thus not fully appropriate for realistic natural sounds, which in general produce perceptible ITDs across the spectrum. See Middlebrooks and Green (1991) for a review of much of the classic behavioral work on sound localization.

The binaural cues to sound location are extracted in the superior olive, a subcortical region where inputs from the two ears are combined. In most animals there appears to be an elegant segregation of function, with ITDs being extracted in the medial superior olive (MSO) and ILDs being extracted in the lateral superior olive (LSO).
In both cases, accurate coding of interaural differences is made possible by neural signaling with unusually high temporal precision. This precision is needed to encode both sub-millisecond ITDs and ILDs of brief transient events, for which the inputs from the ears must be aligned in time. Brain structures subsequent to the superior olive largely inherit its ILD and ITD sensitivity. See Yin and Kuwada, 2010, for a recent review of the physiology of binaural localization.

Binaural cues are of little use in distinguishing sounds at different locations on the vertical dimension (relative to the head), or in distinguishing front from back, because interaural time and level differences are largely unaffected by changes across these locations. Instead, listeners rely on spectral cues provided by the filtering of a sound by the torso, head, and ears of a listener. The filtering results from the reflection and absorption of sound by the surfaces of a listener’s body, with sound from different directions producing different patterns of reflection and thus different patterns of filtering. The effect of these interactions on the sound that reaches the eardrum can be described by a linear filter known as the head-related transfer function (HRTF). The overall effect is that of amplifying some frequencies while attenuating others. A broadband sound entering the ear will thus be endowed with peaks and valleys in its frequency spectrum (see Figure 8.10).

Compelling sound localization can be perceived when these peaks and valleys are artificially induced. The effect of the filtering is obviously confounded with the spectrum of the unfiltered sound source, and the brain must make some assumptions about the source spectrum. When these assumptions are violated, as with narrowband sounds whose spectral energy occurs at a peak in the HRTF of a listener, sounds are mislocalized (Middlebrooks, 1992). For broadband sounds, however, HRTF filtering produces signatures that are sufficiently distinct as to support localization in the vertical dimension to within 5 degrees or so in some cases, although some locations are more accurately perceived than others (Makous & Middlebrooks, 1990; Wightman & Kistler, 1989).
The bulk of the filtering occurs in the outer ear (the pinna), the folds of which produce a distinctive pattern of reflections. Because pinna shapes vary across listeners, the HRTF is listener specific as well as location specific, with spectral peaks and valleys that are in different places for different listeners. Listeners appear to learn the HRTFs for their set of ears. When ears are artificially modified with plastic molds that change their shape, localization initially suffers considerably, but over a period of weeks, listeners regain the ability to localize with the modified ears (Hofman, Van Riswick, et al., 1998). Listeners thus learn at least some of the details of their particular HRTF through experience, although sounds can be localized even when the peaks and valleys of the pinna filtering are somewhat blurred (Kulkarni & Colburn, 1998). Moreover, compelling spatialization is often evident even if a generic HRTF is used.

The physiology of HRTF-related cues for localization is not as well characterized as that of binaural cues, but there is evidence that midbrain regions may again be important. Many inferior colliculus neurons, for instance, show tuning to sound elevation (Delgutte, Joris, et al., 1999). The selectivity for elevation presumably derives from tuning to particular spectral patterns (peaks and valleys in the spectrum) that are diagnostic of particular locations (May, Anderson, et al., 2008).


Although the key cues for sound localization are extracted subcortically, lesion studies reveal that the cortex is essential for localizing sound. Ablating auditory cortex typically produces large deficits in localization (Heffner & Heffner, 1990), with unilateral lesions producing deficits specific to locations contralateral to the side of the lesion (Jenkins & Masterton, 1982). Consistent with these findings, tuning to sound location is widespread in auditory cortical neurons, with the preferred location generally positioned in the contralateral hemifield (Middlebrooks, 2000). Topographic representations of space have not been found within individual auditory cortical areas, although one recent report argues that such topography may be evident across multiple areas (Higgins, Storace, et al., 2010).

Pitch

Although the word pitch is often used colloquially to refer to the perception of sound frequency, in hearing research it has a more specific meaning: pitch is the perceptual correlate of periodicity. Vocalizations, instrument sounds, and some machine sounds are all often produced by periodic physical processes. Our vocal cords open and close at regular intervals, producing a series of clicks separated by regular temporal intervals. Instruments produce sounds via strings that oscillate at a fixed rate, or via tubes in which the air vibrates at particular resonant frequencies, to give two examples. Machines frequently feature rotating parts, which often produce sounds at every rotation. In all these cases, the resulting sounds are periodic: the sound pressure waveform consists of a single shape that repeats at a fixed rate (Figure 8.11A). Perceptually, such sounds are heard as having a pitch that can vary from low to high, proportional to the frequency at which the waveform repeats (the fundamental frequency, i.e., the F0). The periodicity is distinct from whether a sound’s frequencies fall in high or low regions of the spectrum, although in practice periodicity and the spectral center of mass are sometimes correlated.

Pitch is important because periodicity is important: the period is often related to properties of the source that are useful to know, such as its size, or tension. Pitch is also used for communicative purposes, varying in speech prosody, for instance, to convey meaning or emotion. Pitch is a centerpiece of music, forming the basis of melody, harmony, and tonality. Listeners also use pitch to track sound sources of interest in auditory scenes.

Many physically different sounds (all those with a particular period) have the same pitch.
Historically, pitch has been a focal point of hearing research because it is an important perceptual property with a nontrivial relationship to the acoustic input, whose mechanistic characterization has been resistant to unambiguous solution. Debates on pitch and related phenomena date back at least to Helmholtz, and continue to occupy many researchers today (Plack, Oxenham, et al., 2005).

One central debate concerns whether pitch is derived from an analysis of frequency or time. Periodic waveforms produce spectra whose frequencies are harmonically related: they form a harmonic series, being integer multiples of the fundamental frequency, whose period is the period of the waveform (Figure 8.11B). Although the fundamental frequency determines the pitch, the fundamental need not be physically present in the spectrum for a sound to have pitch: sounds missing the fundamental frequency but containing other harmonics of the fundamental are still perceived to have the pitch of the fundamental, an effect known as the missing fundamental illusion. What matters for pitch perception is whether the frequencies that are present are harmonically related. Pitch could thus conceivably be detected with harmonic templates applied to an estimate of a sound’s spectrum obtained from the cochlea (Goldstein, 1973; Shamma & Klein, 2000; Terhardt, 1974; Wightman, 1973). Alternatively, periodicity could be assessed in the time domain, for instance via the autocorrelation function (Cariani & Delgutte, 1996; de Cheveigne & Kawahara, 2002; Meddis & Hewitt, 1991). The autocorrelation measures the correlation of a signal with a delayed copy of itself. For a periodic signal that repeats with some period, the autocorrelation exhibits peaks at multiples of the period (Figure 8.11C).
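The time-domain idea can be illustrated with a short Python sketch that estimates the period of a tone lacking its fundamental. This is a minimal autocorrelation demonstration, not an implementation of any of the pitch models cited above. The sampling rate, the 200 Hz fundamental, the choice of harmonics, and the 0.95 peak-picking threshold are all assumptions made for the example.

```python
import math

SAMPLE_RATE = 8000   # Hz (assumed for this example)
F0 = 200.0           # Hz, fundamental that is absent from the signal
N_SAMPLES = 400      # analysis window: ten periods of the 200 Hz fundamental
MAX_LAG = 160        # longest lag examined (corresponds to 50 Hz)
MIN_LAG = 16         # shortest lag examined (corresponds to 500 Hz)

# Harmonics 2, 3, and 4 only: the spectrum has no energy at 200 Hz.
signal = [
    sum(math.sin(2 * math.pi * k * F0 * n / SAMPLE_RATE) for k in (2, 3, 4))
    for n in range(N_SAMPLES + MAX_LAG)
]


def autocorrelation(x, lag, window):
    """Correlation of the first `window` samples with a copy delayed by `lag`,
    normalized so that the value at lag 0 is 1."""
    energy = sum(x[n] * x[n] for n in range(window))
    return sum(x[n] * x[n + lag] for n in range(window)) / energy


acf = {lag: autocorrelation(signal, lag, N_SAMPLES)
       for lag in range(MIN_LAG, MAX_LAG + 1)}

# Peaks recur at every multiple of the period, so take the shortest lag
# whose correlation is close to the largest value found.
peak = max(acf.values())
period_lag = min(lag for lag, r in acf.items() if r >= 0.95 * peak)
estimated_f0 = SAMPLE_RATE / period_lag
```

The shortest strong peak falls at a lag of 40 samples (5 ms), which corresponds to 200 Hz, even though the signal contains no component at that frequency. This is the missing fundamental effect expressed in the time domain.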

Figure 8.11 Periodicity and pitch. Waveform, spectrum, and autocorrelation function for a note played on an oboe. The note shown is the A above middle C, with a fundamental frequency (F0) of 440 Hz. A, Excerpt of waveform. Note that the waveform repeats every 2.27 ms (the period). B, Spectrum. Note the peaks at integer multiples of the F0, characteristic of a periodic sound. In this case, the F0 is physically present, but the second, third, and fourth harmonics actually have higher amplitude. C, Autocorrelation. The correlation coefficient is always 1 at a lag of 0 ms, but because the waveform is periodic, correlations close to 1 are also found at integer multiples of the period (2.27, 4.55, 6.82, and 9.09 ms in this example). Figure reprinted with permission from original source: McDermott & Oxenham, 2008.


Such analyses are in principle functionally equivalent because the power spectrum is related to the autocorrelation via the Fourier transform, and detecting periodicity in one domain versus the other might simply seem a question of implementation. In the context of the auditory system, however, the two concepts diverge, due to information being limited by distinct factors in the two domains. Time-domain models are typically assumed to utilize fine-grained spike timing (i.e., phase locking), with concomitant temporal resolution limits. In contrast, frequency-based models (often known as place models, in reference to the frequency–place mapping that occurs on the basilar membrane) rely on the pattern of excitation along the cochlea, which is limited in resolution by the frequency tuning of the cochlea (Cedolin & Delgutte, 2005). Cochlear frequency selectivity is present in time-domain models of pitch as well, but its role is typically not to estimate the spectrum but simply to restrict an autocorrelation analysis to a narrow frequency band (Bernstein & Oxenham, 2005), which might help improve its robustness in the presence of multiple sound sources. Reviews of the current debates and their historical origins are available elsewhere (de Cheveigne, 2004; Plack & Oxenham, 2005), and we will not discuss them exhaustively here. Suffice it to say that despite being a centerpiece of hearing research for decades, the mechanisms underlying pitch perception remain under debate.

Research on pitch has provided many important insights about hearing even though a conclusive account of pitch remains elusive. One contribution of pitch research has been to reveal the importance of the resolvability of individual frequency components by the cochlea, a principle that has importance in other aspects of hearing as well.
Because the frequency resolution of the cochlea is approximately constant on a logarithmic scale, whereas the components of a harmonic tone are equally spaced on a linear scale (separated by a fixed number of hertz, equal to the fundamental frequency of the tone; Figure 8.12A), multiple high-numbered harmonics fall within a single cochlear filter (Figure 8.12B). Because of the nature of the log scale, this is true regardless of whether the fundamental is low or high. As a result, the excitation pattern induced by a tone on the cochlea (of a human with normal hearing) is believed to contain resolvable peaks for only the first ten or so harmonics (Figure 8.12C).
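The resolvability argument above can be checked numerically with a small Python sketch. It compares the bandwidth of the auditory filter centered on each harmonic with the spacing between harmonics. The bandwidth formula is the equivalent rectangular bandwidth (ERB) approximation of Glasberg and Moore (1990), which this chapter does not give and which is adopted here only as an assumption. The function names are hypothetical.

```python
def erb_hz(center_hz):
    """Equivalent rectangular bandwidth of the auditory filter centered at
    `center_hz`, using the Glasberg and Moore (1990) approximation for
    normal-hearing listeners (an assumption; the chapter gives no formula)."""
    return 24.7 * (4.37 * center_hz / 1000.0 + 1.0)


def harmonics_per_filter(f0_hz, harmonic_number):
    """Filter bandwidth at a harmonic divided by the harmonic spacing (F0).
    Values well below 1 suggest the harmonic is resolved; values above 1
    mean that neighboring harmonics share a filter."""
    return erb_hz(f0_hz * harmonic_number) / f0_hz


# Ratio for selected harmonics of a low (100 Hz) and a high (400 Hz) F0.
table = {
    f0: [round(harmonics_per_filter(f0, n), 2) for n in (1, 5, 10, 20, 35)]
    for f0 in (100.0, 400.0)
}
```

With this approximation the ratio grows with harmonic number at nearly the same rate for both fundamentals. Resolvability therefore depends mainly on harmonic number rather than on absolute frequency. The exact harmonic at which peaks stop being resolved depends on the criterion adopted, so the sketch should not be read as reproducing the "ten or so" figure.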



Figure 8.12 Resolvability. A, Spectrum of a harmonic complex tone composed of thirty-five harmonics of equal amplitude. The fundamental frequency is 100 Hz, which is both the frequency of the lowest component in the spectrum and the amount by which adjacent harmonics are separated. B, Frequency responses of auditory filters, each of which represents a particular point on the cochlea. Note that because a linear frequency scale is used, the filters increase in bandwidth with center frequency, such that many harmonics fall within the passband of the high frequency filters. C, The resulting pattern of excitation along the cochlea in response to the tone in A. The excitation is the amplitude of vibration of the basilar membrane as a function of characteristic frequency (the frequency to which a particular point on the cochlea responds best, i.e., the center frequency of the auditory filter representing the response properties of the cochlea at that point). Note that the first ten or so harmonics produce resolvable peaks in the pattern of excitation, but that higher numbered harmonics do not. The latter are thus said to be “unresolved.” D, The pattern of vibration that would be observed on the basilar membrane at several points along its length. When harmonics are resolved, the vibration is dominated by the harmonic close to the characteristic frequency, and is thus sinusoidal. When harmonics are unresolved, the vibration pattern is more complex, reflecting the multiple harmonics that stimulate the cochlea at those points. Figure reprinted with permission from original source: Plack, 2005.

There is now abundant evidence that resolvability places strong constraints on pitch perception. For instance, the perception of pitch is determined predominantly by low-numbered harmonics (harmonics one to ten or so in the harmonic series), presumably owing to the peripheral resolvability of these harmonics. Moreover, the ability to discriminate pitch is much poorer for tones synthesized with only high-numbered harmonics than for tones containing only low-numbered harmonics, an effect not accounted for simply by the frequency range in which the harmonics occur (Houtsma & Smurzynski, 1990; Shackleton & Carlyon, 1994). This might be taken as evidence that the spatial pattern of excitation, rather than the periodicity that could be derived from the autocorrelation, underlies pitch perception, but variants of autocorrelation-based models have also been proposed to account for the effect of resolvability (Bernstein & Oxenham, 2005). Resolvability has since been demonstrated to constrain sound segregation as well as pitch (Micheyl & Oxenham, 2010); see below.

Just as computational theories of pitch remain a matter of debate, so do its neural correlates. One might expect that neurons at some stage of the auditory system would be tuned to stimulus periodicity, and there is one recent report of this in marmosets (Bendor & Wang, 2005). However, comparable results have yet to be reported in other species (Fishman, Reser, et al., 1998), and some have argued that pitch is encoded by ensembles of neurons with broad tuning rather than single neurons selective for particular fundamental frequencies (Bizley, Walker, et al., 2010). In general, pitch-related responses can be difficult to disentangle from artifactual responses to distortions introduced by the nonlinearities of the cochlea (de Cheveigne, 2010; McAlpine, 2004).

Given the widespread presence of frequency tuning in the auditory system, and the importance of harmonic frequency relations in pitch, sound segregation (Darwin, 1997), and music (McDermott, Lehr, et al., 2010), it is natural to think there might be neurons with multipeaked tuning curves selective for harmonic frequencies.
There are a few isolated reports of such tuning (Kadia & Wang, 2003; Sutter & Schreiner, 1991), but the tuning peaks do not always correspond to harmonic frequencies, and whether they relate to pitch is unclear. At least given how researchers have looked for it thus far, tuning for harmonicity is not as evident in the auditory system as might be expected.

If pitch is analyzed in a particular part of the brain, one might expect the region to respond more to stimuli with pitch than to those lacking it, other things being equal. Such response properties have in fact been reported in regions of auditory cortex identified with functional imaging in humans (Hall, Barrett, et al., 2005; Patterson, Uppenkamp, et al., 2002; Penagos, Melcher, et al., 2004; Schonwiesner & Zatorre, 2008). The regions are typically reported to lie outside primary auditory cortex, and could conceivably be homologous to the region claimed to contain pitch-tuned neurons in marmosets (Bendor & Wang, 2006), although again there is some controversy over whether pitch per se is implicated (Hall & Plack, 2009). See Winter, 2005, and Walker, Bizley, et al., 2010, for recent reviews of the brain basis of pitch.

In many contexts (e.g., the perception of music or speech intonation), it is the changes in pitch over time that matter rather than the absolute value of the F0. For instance, pitch increases or decreases are what capture the identity of a melody or the intention of a speaker. Less is known about how this relative pitch information is represented in the brain, but the right temporal lobe has been argued to be important, in part on the basis of brain-damaged patients with apparently selective deficits in relative pitch (Johnsrude, Penhune, et al., 2000). See McDermott and Oxenham, 2008, for a review of the perceptual and neural basis of relative pitch.

Loudness

Loudness is perhaps the most immediate perceptual property of sound, and has been actively studied for more than 150 years. To first order, loudness is the perceptual correlate of sound intensity. In real-world listening scenarios, loudness exhibits additional influences that suggest it serves to estimate the intensity of a sound source, as opposed to the intensity of the sound entering the ear (which changes with distance and the listening environment). However, loudness models that capture exclusively peripheral processing nonetheless have considerable predictive power.

For a sound with a fixed spectral profile, such as a pure tone or a broadband noise, the relationship between loudness and intensity can be approximated via the classic Stevens power law (Stevens, 1955). However, the relation between loudness and intensity is not as simple as one might imagine. For instance, loudness increases with increasing bandwidth: a sound whose frequencies lie in a broad range will seem louder than a sound whose frequencies lie in a narrow range, even when their physical intensities are equal.

Standard models of loudness thus posit something somewhat more complex than a simple power law of intensity: that loudness is linearly related to the total amount of neural activity elicited by a stimulus at the level of the auditory nerve (ANSI, 2007; Moore & Glasberg, 1996). The effect of bandwidth on loudness is explained via the compression that occurs in the cochlea: loudness is determined by the neural activity summed across nerve fibers, the spikes of which are generated after the output of a particular cochlear location is nonlinearly compressed. Because compression boosts low responses relative to high responses, the sum of several responses to low amplitudes (produced by the several frequency channels stimulated by a broadband sound) is greater than a single response to a high amplitude (produced by a single frequency channel responding to a narrowband sound of equal intensity). Loudness also increases with duration for durations up to half a second or so (Buus, Florentine, et al., 1997), suggesting that it is computed from neural activity integrated over some short window.

The ability to predict perceived loudness is important in many practical situations, and is a central issue in the fitting of hearing aids. Cochlear compression is typically reduced in hearing-impaired listeners, and amplification runs the risk of making sounds uncomfortably loud unless compression is introduced artificially. There has thus been long-standing interest in quantitative models of loudness.

Loudness is also influenced in interesting ways by the apparent distance of a sound source. Because intensity attenuates with distance from a sound source, the intensity of a sound at the ear is determined conjointly by the intensity and distance of the source. At least in some contexts, the auditory system appears to use loudness as a perceptual estimate of a source’s intensity (i.e., the intensity at the point of origin), such that sounds that appear more distant seem louder than those that appear closer but have the same overall intensity. Visual cues to distance have some influence on perceived loudness (Mershon, Desaulniers, et al., 1981), but the cue provided by the amount of reverberation also seems to be important. The more distant a source, the weaker the direct sound from the source to the listener, relative to the reverberant sound that reaches the listener after reflection off surfaces in the environment (see Figure 8.14). This ratio of direct to reverberant sound appears to be used both to judge distance and to calibrate loudness perception (Zahorik & Wightman, 2001), although how the listener estimates this ratio from the sound signal remains unclear at present. Loudness thus appears to function somewhat like size or brightness perception in vision, in which perception is not based exclusively on retinal size or light intensity (Adelson, 2000).
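The bandwidth effect described earlier in this section follows from summing compressed channel responses, and a toy calculation makes the arithmetic concrete. The sketch below is only an illustration: the power-law exponent of 0.3, the ten equal channels, and the function names are assumptions chosen for the example, not parameters of the ANSI or Moore and Glasberg models.

```python
def compressed_response(intensity, exponent=0.3):
    """Compressive (power-law) response of one channel; the exponent is an assumed value."""
    return intensity ** exponent


def toy_loudness(channel_intensities, exponent=0.3):
    """Sum the compressed responses across frequency channels."""
    return sum(compressed_response(i, exponent) for i in channel_intensities)


total_intensity = 1000.0  # arbitrary linear intensity units
n_channels = 10           # assumed number of channels driven by the broadband sound

# Narrowband sound: all of the intensity falls in a single channel.
narrowband = [total_intensity]
# Broadband sound: the same total intensity spread evenly over ten channels.
broadband = [total_intensity / n_channels] * n_channels

narrow_loudness = toy_loudness(narrowband)
broad_loudness = toy_loudness(broadband)

print("narrowband:", narrow_loudness)
print("broadband: ", broad_loudness)
```

Under these assumed values the broadband sum exceeds the narrowband response by a factor of 10^0.7 (roughly 5), even though the two sounds have the same total intensity.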

VI. Auditory Scene Analysis

Thus far we have discussed how the auditory system represents single sounds in isolation, as might be produced by a note played on an instrument, or a word uttered by someone talking. The simplicity of such isolated sounds renders them convenient objects of study, yet in many auditory environments, isolated sounds are not the norm. It is often the case that many things make sound at the same time, causing the ear to receive a mixture of multiple sources as its input. Consider Figure 8.13, which displays spectrograms of a single “target” speaker along with those of the mixtures that result from adding to it the utterances of one, three, and seven additional speakers, as might occur in a social setting. The brain’s task in this case is to take such a mixture as input and recover enough of the content of a target sound source to allow speech comprehension or otherwise support behavior. This is a nontrivial task. In the example of Figure 8.13, for instance, it is apparent that the structure of the target utterance is progressively obscured as more speakers are added to the mixture. Machine systems for recognizing speech suffer dramatically under such conditions, performing well in quiet, but much worse in the presence of multiple speakers (Lippmann, 1997). The presence of competing sounds greatly complicates the computational extraction of just about any sound source property, from pitch (de Cheveigne, 2006) to location. Human listeners, however, parse auditory scenes with a remarkable degree of success. In the example of Figure 8.13, the target remains largely audible to most listeners even in the mixture of eight speakers. This is the classic “cocktail party problem” (Bee & Micheyl, 2008; Bregman, 1990; Bronkhorst, 2000; Carlyon, 2004; Cherry, 1953; Darwin, 1997; McDermott, 2009).

Historically, the “cocktail party problem” has referred to two conceptually distinct problems that in practice are closely related.
The first, known as sound segregation, is the problem of deriving representations of individual sound sources from a mixture of sounds. The second is the task of directing attention to one source among many, as when listening to a particular speaker at a party. These tasks are related because the ability to segregate sounds is probably dependent on attention (Carlyon, Cusack, et al., 2001; Shinn-Cunningham, 2008), although the extent and nature of this dependence remain an active area of study (Macken, Tremblay, et al., 2003). Here, we will focus on the first problem, of sound segregation, which is usually studied under conditions in which listeners pay full attention to a target sound. Al Bregman, a Canadian psychologist, is typically credited with drawing interest to this problem and pioneering its study (Bregman, 1990).

Sound Segregation and Acoustic Grouping Cues

Figure 8.13 The cocktail party problem. Spectrograms of a single “target” utterance (top row), and the same utterance mixed with one, three, and seven additional speech signals from different speakers. The mixtures approximate the signal that would enter the ear if the additional speakers were talking as loud as the target speaker, but were standing twice as far away from the listener (to simulate cocktail party conditions). The grayscale denotes attenuation from the maximum energy level across all of the signals (in dB), such that gray levels can be compared across spectrograms. Spectrograms in the right column are identical to those on the left except for the superimposed color masks. Pixels labeled green are those where the original target speech signal is more than –50 dB but the mixture level is at least 5 dB higher, and thus masks the target speech. Pixels labeled red are those where the target had less than –50 dB and the mixture had more than –50 dB energy. Spectrograms were computed from a filter bank with bandwidths and frequency spacing similar to those in the ear. Each pixel is the rms amplitude of the signal within a frequency band and time window. Figure reprinted with permission from original source: McDermott, 2009.
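The color masks described in the caption can be computed pixel by pixel from the two spectrograms. The following is a minimal sketch, assuming the target and mixture are given as equally sized nested lists of levels in dB relative to the maximum; the function name, the label strings, and the treatment of levels exactly at the –50 dB floor are assumptions made for illustration.

```python
def mask_labels(target_db, mixture_db, floor_db=-50.0, margin_db=5.0):
    """Label each time-frequency pixel following the rules in the Figure 8.13 caption.

    "green": target above the floor, but mixture at least margin_db higher (target masked).
    "red":   target below the floor, while the mixture is above the floor.
    "none":  neither condition holds.
    """
    labels = []
    for target_row, mixture_row in zip(target_db, mixture_db):
        row = []
        for t, m in zip(target_row, mixture_row):
            if t > floor_db and m >= t + margin_db:
                row.append("green")
            elif t < floor_db and m > floor_db:
                row.append("red")
            else:
                row.append("none")
        labels.append(row)
    return labels


# Tiny made-up spectrograms (rows = frequency bands, columns = time windows).
target = [[-20.0, -60.0, -30.0],
          [-70.0, -10.0, -55.0]]
mixture = [[-10.0, -40.0, -29.0],
           [-65.0, -10.0, -30.0]]

labels = mask_labels(target, mixture)
for row in labels:
    print(row)
```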

Sound segregation is a classic example of an ill-posed problem in perception. Many different sets of sounds are physically consistent with the mixture that enters the ear (in that their sum is equal to the mixture), only one of which actually occurred in the world. The auditory system must infer the set of sounds that actually occurred. As in other ill-posed problems, this inference is only possible with the aid of assumptions that constrain the solution. In this case, the assumptions concern the nature of sounds in the world, and are presumably learned from experience with natural sounds (or perhaps hard-wired into the auditory system via evolution).

Grouping cues (i.e., sound properties that dictate whether sound elements are heard as part of the same sound) are examples of these assumptions. For instance, natural sounds that have pitch, such as vocalizations, contain frequencies that are harmonically related, evident as banded structures in the lower half of the spectrogram of the target speaker in Figure 8.13. Harmonically related frequencies are unlikely to occur from the chance alignment of multiple different sounds, and thus when they are present in a mixture, they are likely to be due to the same sound and are generally heard as such (de Cheveigne, McAdams, et al., 1995; Roberts & Brunstrom, 1998). Moreover, a component that is mistuned (in a tone containing otherwise harmonic frequencies) segregates from the rest of the tone (Moore, Glasberg, et al., 1986). Understanding sound segregation requires understanding the acoustic regularities, such as harmonicity, that characterize natural sound sources and that are used by the auditory system.

Perhaps the most important generic acoustic grouping cue is common onset: frequency components that begin and end at the same time are likely to belong to the same sound. Onset differences, when manipulated experimentally, cause frequency components to perceptually segregate from each other (Cutting, 1975; Darwin, 1981).
Interestingly, a component that has an earlier or later onset than the rest of a set of harmonics has reduced influence over the perceived pitch of the entire tone (Darwin & Ciocca, 1992), suggesting that pitch computations operate on frequency components that are deemed likely to belong together, rather than on the raw acoustic input.

Onset may be viewed as a special case of comodulation, that is, amplitude modulation that is common to different spectral regions. In some cases relatively slow comodulation promotes grouping of different spectral components (Hall, Haggard, et al., 1984), although abrupt onsets seem to be most effective. Common offset also promotes grouping but is less effective than common onset (Darwin, 1984), perhaps because abrupt offsets are less common in natural sounds (Cusack & Carlyon, 2004).

Not every intuitively plausible grouping cue produces a robust effect when assessed psychophysically. For instance, frequency modulation (FM) that is shared (“coherent”) across multiple frequency components, as in voiced speech, has been proposed to promote their grouping (Bregman, 1990; McAdams, 1989). However, listeners are poor at discriminating coherent from incoherent FM if the component tones are not harmonically related, indicating that sensitivity to FM coherence may simply be mediated by the deviations from harmonicity that occur when harmonic tones are incoherently modulated (Carlyon, 1991).

One might also think that the task of segregating sounds would be greatly aided by the tendency of distinct sound sources in the world to originate from distinct locations. In practice, spatial cues are indeed of some benefit, for instance, in hearing a target sentence from one direction amid distracting utterances from other directions (Bronkhorst, 2000; Hawley, Litovsky, et al., 2004; Ihlefeld & Shinn-Cunningham, 2008; Kidd, Arbogast, et al., 2005). However, spatial cues are surprisingly ineffective at segregating one frequency component from a group of others (Culling & Summerfield, 1995), especially when pitted against other grouping cues such as onset or harmonicity (Darwin & Hukin, 1997). The benefit of listening to a target with a distinct location (Bronkhorst, 2000; Hawley, Litovsky, et al., 2004; Ihlefeld & Shinn-Cunningham, 2008; Kidd, Arbogast, et al., 2005) may thus be due to the ease with which the target can be attentively tracked over time amid competing sound sources, rather than to a facilitation of auditory grouping per se (Darwin & Hukin, 1999). Moreover, humans are usually able to segregate monaural mixtures of sounds without difficulty, demonstrating that spatial separation is often not necessary for high performance. For instance, much popular music of the twentieth century was released in mono, and yet listeners have no trouble distinguishing many different instruments and voices in any given recording. Spatial cues thus contribute to sound segregation, but their presence or absence does not seem to fundamentally alter the problem.

The weak effect of spatial cues on segregation may reflect their fallibility in complex auditory scenes. Binaural cues can be contaminated when sounds are combined or degraded by reverberation (Brown & Palomaki, 2006) and can even be deceptive, as when caused by echoes (whose direction is generally different from the original sound source). It is possible that the efficacy of different grouping cues in general reflects their reliability in natural conditions. Evaluating this hypothesis will require statistical analysis of natural auditory scenes, an important direction for future research.
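The harmonicity cue discussed in this section can be made concrete with a simple harmonic "sieve" that flags components lying too far from any integer multiple of a candidate fundamental. This is a hedged sketch rather than a model of what the auditory system does: the 3% tolerance, the assumption that the fundamental is already known, and the function name are illustrative choices, not values taken from the studies cited above.

```python
def flag_mistuned(components_hz, f0_hz, tolerance=0.03):
    """Return one boolean per component: True if it deviates from the nearest
    harmonic of f0_hz by more than `tolerance` (as a proportion of that harmonic)."""
    flags = []
    for freq in components_hz:
        harmonic_number = max(1, round(freq / f0_hz))
        nearest_harmonic = harmonic_number * f0_hz
        deviation = abs(freq - nearest_harmonic) / nearest_harmonic
        flags.append(deviation > tolerance)
    return flags


# A 200 Hz complex tone whose third harmonic is mistuned from 600 Hz to 636 Hz.
components = [200.0, 400.0, 636.0, 800.0, 1000.0]
mistuned = flag_mistuned(components, f0_hz=200.0)
print(list(zip(components, mistuned)))
```

In this toy example only the 636 Hz component is flagged, mirroring the observation that a sufficiently mistuned component tends to segregate from an otherwise harmonic tone.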

Sequential Grouping

Because the spectrogram approximates the input that the cochlea provides to the rest of the auditory system, it is common to view the problem of sound segregation as one of deciding how to group the various parts of the spectrogram (Bregman, 1990). However, the brain does not receive an entire spectrogram at once. Rather, the auditory input arrives gradually over time. Many researchers thus distinguish between the problem of simultaneous grouping (determining how the spectral content of a short segment of the auditory input should be segregated) and sequential grouping (determining how the groups from each segment should be linked over time, e.g., to form a speech utterance or a melody) (Bregman, 1990).

Although most of the classic grouping cues (e.g., onset/comodulation, harmonicity, ITD) are quantities that could be measured over short timescales, the boundary between what is simultaneous and what is sequential is unclear for most real-world signals, and it may be more appropriate to view grouping as being influenced by processes operating at multiple timescales rather than two cleanly divided stages of processing. There are, however, contexts in which the bifurcation into simultaneous and sequential grouping stages is natural, as when the auditory input consists of discrete sound elements that do not overlap in time. In such situations interesting differences are sometimes evident between the grouping of simultaneous and sequential elements. For instance, spatial cues, which are relatively weak as a simultaneous cue, have a stronger influence on sequential grouping of tones (Darwin & Hukin, 1997).

Another clear case of sequential processing can be found in the effects of sound repetition. Sounds that occur repeatedly in the acoustic input are detected by the auditory system as repeating, and are inferred to be a single source. Perhaps surprisingly, this is true even when the repeating source is embedded in mixtures with other sounds, and is never presented in isolation (McDermott, Wrobleski, et al., 2011). In such cases the acoustic input itself does not repeat, but the source repetition induces correlations in the input that the auditory system detects and uses to extract the repeating sound. The informativeness of repetition presumably results from the fact that mixtures of multiple sounds tend not to occur repeatedly, such that when a structure does repeat, it is likely to be a single source.

Effects of repetition are also evident in classic results on “informational masking”: masking-like effects on the detectability of a target tone, so called because they cannot be explained in terms of conventional “energetic masking” (in which the response to the target is swamped by a masker that falls within the same peripheral channel). Demonstrations of informational masking typically present a target tone along with other tones that lie outside a “protected region” of the spectrum, such that they are unlikely to stimulate the same filters as the target tone. These “masking” tones nonetheless often elevate the detection threshold for the target, sometimes quite dramatically (Durlach, Mason, et al., 2003; Lutfi, 1992; Neff, 1995; Watson, 1987). The effect is presumably due to impairments in the ability to segregate the target tone from the masker tones, and can be reduced when the target is repeatedly presented (Kidd, Mason, et al., 1994; Kidd, Mason, et al., 2003).

Streaming

One type of sequential segregation effect has particularly captured the imagination of the hearing community and merits special mention. When two pure tones of different frequency are repeatedly presented in alternation, one of two perceptual states is commonly reported by listeners: one in which the two repeated tones are heard as a single “stream” whose pitch varies over time, and one in which two distinct streams are heard, one with the high tones and one with the low tones (Bregman & Campbell, 1971). If the frequency separation between the two tones is small, and if the rate of alternation is slow, one stream is generally heard. When the frequency separation is larger or the rate is faster, two streams tend to be heard, in which case “streaming” is said to occur (van Noorden, 1975).

An interesting hallmark of this phenomenon is that when two streams are perceived, judgments of the temporal order of elements in different streams are impaired (Bregman & Campbell, 1971; Micheyl & Oxenham, 2010). This latter finding provides compelling evidence for a substantive change in the representation underlying the two percepts. Subsequent research has demonstrated that separation along most dimensions of sound can elicit streaming (Moore & Gockel, 2002). The streaming effects in these simple stimuli may be viewed as a variant of grouping by similarity: elements are grouped together when they are similar along some dimension, and segregated when they are sufficiently different, presumably because this similarity reflects the likelihood of having been produced by the same source.

Filling in

Although it is common to view sound segregation as the problem of grouping the spectrogram-like output of the cochlea across frequency and time, this cannot be the whole story, in part because large swaths of a sound’s time–frequency representation are often physically obscured (masked) by other sources and are thus not physically available to be grouped. Masking is evident in the green pixels of Figure 8.13, which represent points where the target source has substantial energy, but where the mixture exceeds it in level. If these points are simply assigned to the target, or omitted from its representation, the target’s level at those points will be misconstrued, and the sound potentially misidentified. To recover an accurate estimate of the target source, it is necessary to infer not just the grouping of the energy in the spectrogram but also the structure of the target source in the places where it is masked.

There is in fact considerable evidence that the auditory system does just this, from experiments investigating the perception of partially masked sounds. For instance, tones that are interrupted by noise bursts are “filled in” by the auditory system, such that they are heard as continuous in conditions in which physical continuity is plausible given the stimulus (Warren, Obusek, et al., 1972). This “continuity effect” occurs only when the interrupting noise bursts are sufficiently intense in the appropriate part of the spectrum to have masked the tone should it have been present continuously. Continuity is also heard for frequency glides (Ciocca & Bregman, 1987; Kluender & Jenison, 1992) as well as oscillating frequency-modulated tones (Carlyon, Micheyl, et al., 2004). The perception of continuity across intermittent maskers was actually first reported for speech signals interrupted by noise bursts (Warren, 1970).
For speech, the effect is often termed phonemic restoration, and likely indicates that knowledge of speech acoustics (and perhaps of other types of sounds as well) influences the inference of the masked portion of sounds. Similar effects occur in the spectral domain: regions of the spectrum are perceptually filled in when evidence indicates they are likely to have been masked, e.g., by a continuous noise source (McDermott & Oxenham, 2008). Filling-in effects in hearing are conceptually similar to completion under and over occluding surfaces in vision, although the ecological constraints provided by masking (involving the relative intensity of two sounds) are distinct from those provided by occlusion (involving the relative depth of two surfaces). Neurophysiological evidence indicates that the representation of tones in primary auditory cortex reflects the perceived continuity, responding as though the tone were continuously present despite being interrupted by noise (Petkov, O’Connor, et al., 2007; Riecke, van Opstal, et al., 2007).



Brain Basis of Sound Segregation

Recent years have seen great interest in how sound segregation is instantiated in the brain. One proposal that has attracted interest is that sounds are heard as segregated when they are represented in non-overlapping neural populations at some stage of the auditory system. This idea derives largely from studies of the pure-tone streaming phenomena described earlier, with the hope that it will extend to more realistic sounds. The notion is that conditions that cause two tones to be represented in distinct neural populations are also those that cause sequences of two tones to be heard as separate streams (Bee & Klump, 2004; Fishman, Arezzo, et al., 2004; Micheyl, Tian, et al., 2005; Pressnitzer, Sayles, et al., 2008). Because of tonotopy, different frequencies are processed in neural populations whose degree of overlap decreases as the frequencies become more separated. Moreover, tones that are more closely spaced in time are more likely to reduce each other’s response (via what is termed suppression), which also reduces overlap between the tone representations: a tone on the outskirts of a neuron’s receptive field might be sufficiently suppressed as to not produce a response at all. These two factors, frequency separation and suppression, predict the two key effects in pure-tone streaming: that streaming should increase when tones are more separated in frequency or are presented more quickly (van Noorden, 1975).

Experiments over the past decade in multiple animal species indicate that pure-tone sequences indeed produce non-overlapping neural responses under conditions in which streaming is perceived by human listeners (Bee & Klump, 2004; Fishman, Arezzo, et al., 2004; Micheyl, Tian, et al., 2005; Pressnitzer, Sayles, et al., 2008). Some of these experiments take advantage of another notable property of streaming: its strong dependence on time.
Specifically, the probability that listeners report two streams increases with time from the beginning of the sequence, an effect termed buildup (Bregman, 1978). Buildup has been linked to neurophysiology via neural adaptation. Because neural responses decrease with stimulus repetition, over time it becomes less likely that two stimuli with distinct properties will both exceed the spiking threshold for the same neuron, such that the neural responses to two tones become increasingly segregated on a timescale consistent with that of perceptual buildup (Micheyl, Tian, et al., 2005; Pressnitzer, Sayles, et al., 2008). For a comprehensive review of these and related studies, see Snyder and Alain, 2007, and Fishman and Steinschneider, 2010.

A curious feature of these studies is that they suggest that streaming is an accidental side effect of what would appear to be general features of the auditory system: tonotopy, suppression, and adaptation. Given that sequential grouping seems likely to be of great adaptive significance (because it affects our ability to recognize sounds), it would seem important for an auditory system to behave close to optimally, that is, for the perception of one or two streams to be related to the likelihood of one or two streams in the world. It is thus striking that the phenomenon is proposed to result from apparently incidental features of processing. Consistent with this viewpoint, a recent study showed that synchronous high- and low-frequency tones produce neural responses that are just as segregated as those for the classic streaming configuration of alternating high and low tones, even though perceptual segregation does not occur when the tones are synchronous (Elhilali, Ma, et al., 2009). This finding indicates that non-overlapping neural responses are not sufficient for perceptual segregation, and that the relative timing of neural responses may be more important. The significance of neural overlap thus remains unclear, and the brain basis of streaming will undoubtedly continue to be debated in the years to come.
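The neural-overlap proposal discussed in this section can be sketched as a toy calculation in which a single neuron tuned to one tone either does or does not respond to the other tone. Everything numerical here is an assumption made for illustration (Gaussian tuning on a semitone axis, a 3-semitone tuning width, a fixed spiking threshold, and a single "suppression" factor standing in for the effects of faster presentation and adaptation); it is not a fitted model, and, as noted above, non-overlap alone does not appear sufficient for perceptual segregation.

```python
import math


def off_frequency_response(separation_semitones, tuning_width=3.0):
    """Response (0 to 1) of a neuron tuned to tone A when tone B is presented,
    assuming Gaussian tuning with an assumed width in semitones."""
    return math.exp(-0.5 * (separation_semitones / tuning_width) ** 2)


def responds_to_both(separation_semitones, suppression=0.0, threshold=0.2):
    """True if the response to the off-frequency tone, scaled down by an assumed
    suppression factor (0 = none, 1 = complete), still exceeds the spiking threshold."""
    response = off_frequency_response(separation_semitones) * (1.0 - suppression)
    return response > threshold


# Small separation: the same neuron responds to both tones (overlapping populations).
print(responds_to_both(1.0))
# Large separation: the off-frequency tone no longer drives the neuron.
print(responds_to_both(9.0))
# Intermediate separation: overlap at a slow rate, none once suppression is strong.
print(responds_to_both(4.0, suppression=0.0), responds_to_both(4.0, suppression=0.6))
```

The sketch reproduces the qualitative pattern attributed to tonotopy and suppression: overlap shrinks as frequency separation grows, and at intermediate separations it disappears once the off-frequency response is sufficiently suppressed.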

Separating Sound Sources from the Environment

Thus far we have mainly discussed how the auditory system segregates the signals from multiple sound sources, but listeners face a second important scene analysis problem. The sound that reaches the ear from a source is almost always altered to some extent by the surrounding environment, and these environmental influences must be separated from those of the source if the source content is to be estimated correctly. Typically the sound produced by a source reflects off multiple surfaces on its way to the ears, such that the ears receive some sound directly from the source, but also many reflected versions (Figure 8.14). These reflected versions (echoes) are delayed because their path to the ear is lengthened, but generally they also have altered frequency spectra because reflective surfaces absorb some frequencies more than others. Because each reflection can be well described with a linear filter applied to the source signal, the signal reaching the ear, which is the sum of the direct sound along with all the reflections, can be described simply as the result of applying a single composite linear filter to the source (Gardner, 1998). Significant filtering of this sort occurs in almost every natural listening situation, such that sound produced in anechoic conditions (in which all surfaces are minimally reflective) sounds noticeably strange and unnatural.

Listeners are often interested in the properties of sound sources, and one might think of the environmental effects as a nuisance that should simply be discounted. However, environmental filtering imbues the acoustic input with useful information, for instance about the size of a room where sound is produced and the distance of the source from the listener. It is thus more appropriate to think of separating source and environment, at least to some extent, rather than simply recovering the source.
Reverberation is commonly used in music production, for instance, to create a sense of space or to give a different feel to particular instruments or voices.

The loudness constancy phenomena discussed earlier are one example of the brain inferring the properties of the sound source as separate from those of the environment, but there are many others. One of the most interesting involves the treatment of echoes in sound localization. The echoes that are common in most natural environments pose a problem for localization because they generally come from directions other than that of the source (Figure 8.14B). The auditory system appears to solve this problem by perceptually fusing similar impulsive sounds that occur within a brief interval of each other (on the order of 10 ms or so), and using the sound that occurs first to determine the perceived location. This precedence effect, so called because of the dominance of the sound that occurs first, was described and named by Hans Wallach (Wallach, Newman, et al., 1949), one of the great gestalt psychologists, and has since been the subject of a large and interesting literature. For instance, the maximal delay at which echoes are perceptually suppressed increases as two pairs of sounds are repeatedly presented (Freyman, Clifton, et al., 1991), presumably because the repetition provides evidence that the second sound is indeed an echo of the first, rather than being due to a distinct source (in which case it would not occur at a consistent delay following the first sound). Moreover, reversing the order of presentation can cause an abrupt breakdown of the effect, such that two sounds are heard rather than one, each with a different location. See Litovsky, Colburn, et al., 1999, for a review.

Figure 8.14 Reverberation. A, Impulse response for a classroom. This is the sound waveform recorded in this room in response to a click (impulse) produced at a particular location in the room. The top arrow indicates the impulse that reaches the microphone directly from the source (that thus arrives first). The lower arrow indicates one of the subsequent reflections, i.e., echoes. After the early reflections, a gradually decaying reverberation tail is evident (cut off at 250 ms for clarity). The sound signal resulting from an arbitrary source could be produced by convolving the sound from the source with this impulse response. B, Schematic diagram of the sound reflections that contribute to the signal that reaches a listener’s ears in a typical room. The brown box in the upper right corner depicts the speaker producing sound. The green lines depict the path taken by the direct sound to the listener’s ears. Blue and red lines depict sound reaching the ears after one and two reflections, respectively. Sound reaching the ear after more than two reflections is not shown. Part B is reprinted with permission from Culling & Akeroyd, 2010.
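The convolution described in the caption, and the point made in the text that the direct sound plus its reflections amounts to one composite linear filter, can be checked numerically. The sketch below uses a made-up six-sample impulse response (a direct path and two attenuated echoes) and a made-up three-sample source; these values, the function names, and the choice to treat only the first sample as the direct sound are assumptions for illustration, not measurements from the classroom in Figure 8.14.

```python
import math


def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def direct_to_reverberant_db(impulse_response, direct_samples=1):
    """Energy ratio (in dB) of the direct sound to everything arriving after it."""
    direct = sum(h * h for h in impulse_response[:direct_samples])
    reverberant = sum(h * h for h in impulse_response[direct_samples:])
    return 10.0 * math.log10(direct / reverberant)


# Hypothetical propagation paths, each written as its own linear filter.
direct_path = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
first_echo = [0.0, 0.0, 0.0, 0.5, 0.0, 0.0]    # delayed and attenuated
second_echo = [0.0, 0.0, 0.0, 0.0, 0.0, 0.25]  # later and weaker

# Composite room filter: the sample-wise sum of the individual paths.
room = [d + e1 + e2 for d, e1, e2 in zip(direct_path, first_echo, second_echo)]

source = [1.0, -1.0, 0.5]

# Signal at the ear, computed with the single composite filter...
at_ear = convolve(source, room)
# ...and computed path by path, then summed.
summed_paths = [
    a + b + c
    for a, b, c in zip(
        convolve(source, direct_path),
        convolve(source, first_echo),
        convolve(source, second_echo),
    )
]

print(at_ear)
print(at_ear == summed_paths)
print(direct_to_reverberant_db(room))
```

The two computations agree because convolution is linear. The last line computes a direct-to-reverberant ratio from the known impulse response; a listener has no such access to the impulse response, which is why the text notes that how the ratio is estimated from the sound signal remains unclear.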


Reverberation poses a problem for sound recognition in addition to localization because different environments alter the sound from a source in different ways. Large amounts of reverberation (with prominent echoes at very long delays), as are present in some large auditoriums, can in fact greatly reduce the intelligibility of speech. Moderate amounts of reverberation, however, as are present most of the time, typically have minimal effect on our ability to recognize speech and other sounds. Recent work indicates that part of our robustness to reverberation derives from a process that adapts to the history of echo stimulation. In reverberant conditions, the intelligibility of a speech utterance has been found to be higher when preceded by another utterance than when not, an effect that does not occur in anechoic conditions (Brandewie & Zahorik, 2010). Such results, like those of the precedence effect, are consistent with the idea that listeners construct a model of the environment’s contribution to the acoustic input and use it to partially discount the environment when judging properties of a source. Analogous effects have been found with nonspeech sounds. When listeners hear instrument sounds preceded by speech or music that has been passed through a filter that “colors” the spectrum, the instrument sound is identified differently, as though listeners internalize the filter, assume it to be an environmental effect, and discount it to some extent when identifying the sound (Stilp, Alexander, et al., 2010).

VII. Current and Future Directions

Hearing science is one of the oldest areas of psychology and neuroscience, with a strong research tradition dating back over 100 years, yet there remain many important open questions. Although research on each of the senses need not be expected to proceed according to a single fixed trajectory, the contrast between hearing and vision nonetheless provides useful reminders of what remains poorly understood in audition. The classic methods of psychophysics were initially developed largely within hearing research, and were then borrowed by vision scientists to explore sensory encoding processes in vision. But while vision science quickly embraced perceptual and cognitive questions, hearing science remained more focused on the periphery. This can be explained in part by the challenge of understanding the cochlea, the considerable complexity of the early auditory system, and the clinical importance of peripheral audition. However, the focus on the periphery has left many central aspects of audition underexplored, and recent trends in hearing research reflect a shift toward the study of these neglected mid- and high-level questions.

One important set of questions concerns the interface of audition with the rest of cognition, via attention and memory. Attention research ironically also flourished in hearing early on (with Cherry’s [1953] classic dichotic listening studies), but then largely moved to the visual domain. Recent years have seen renewed interest (see chapter 11 in this volume), but there remain many open questions. Much is still unclear about what is represented about sound in the absence of attention, about how and what auditory attention selects, and about the role of attention in perceptual organization.


Another promising research area involves working memory. Auditory short-term memory may have some striking differences with its visual counterpart (Demany, Trost, et al., 2008) and appears closely linked to auditory scene analysis (Conway, Cowan, et al., 2001). Studies of these topics in audition also hold promise for informing us more generally about the structure of cognition: the similarities and differences with respect to visual cognition will reveal much about whether attention and memory mechanisms are domain general (perhaps exploiting central resources) or specific to particular sensory systems.

Interactions between audition and the other senses are also attracting increased interest. Information from other sensory systems likely plays a crucial role in hearing given that sound on its own often provides ambiguous information. The sounds produced by rain and applause, for instance, can in some cases be quite similar, such that multisensory integration (using visual, somatosensory, or olfactory input) may help to correctly recognize the sound source. Cross-modal interactions in localization (Alais & Burr, 2004) are similarly powerful. Understanding cross-modal effects within the auditory system (Bizley, Nodal, et al., 2007; Ghazanfar, 2009; Kayser, Petkov, et al., 2008) and their role in behavior will be a significant direction of research going forward.

In addition to the uncharted territory in perception and cognition, there remain important open questions about peripheral processing. Some of these unresolved issues, such as the mechanisms of outer hair cell function, have great importance for understanding hearing impairment. Others may dovetail with higher level function. For instance, the role of efferent connections to the cochlea is still uncertain, with some hypothesizing a role in attention or segregation (Guinan, 2006).
The role of phase locking in frequency encoding and pitch perception is another basic issue that remains controversial and that has widespread relevance to mid-level audition.

As audition continues to evolve as a field, I believe useful guidance will come from a computational analysis of the inference problems the auditory system must solve (Marr, 1982). This necessitates thinking about the behavioral demands of real-world listening situations, as well as the constraints imposed by the way that information about the world is encoded in a sound signal. Many of these issues are becoming newly accessible with recent advances in computational power and signal processing techniques.

For instance, one of the most important tasks a listener must perform with sound is surely that of recognition: determining what it was in the world that caused a sound, be it a particular type of object or a type of event, such as something falling on the floor (Gaver, 1993; Lutfi, 2008). Recognition is computationally challenging because the same type of occurrence in the world typically produces a different sound waveform each time it occurs. A recognition system must generalize across the variation that occurs within categories, but not the variation that occurs across categories (DiCarlo & Cox, 2007). Realizing this computational problem allows us to ask how the auditory system solves it. One place where these issues have been explored to some extent is speech perception (Holt & Lotto, 2010). The ideas explored there, about how listeners achieve invariance across different speakers and infer the state of the vocal apparatus along with the accompanying intentions of the speaker, could perhaps be extended to audition more generally (Rosenblum, 2004).

The inference problems of audition can also be better appreciated by examining real-world sound signals, and formal analysis of these signals seems likely to yield valuable clues. As discussed in previous sections, statistical analysis of natural sounds has been a staple of recent computational auditory neuroscience (Harper & McAlpine, 2004; Rodriguez, Chen, et al., 2010; Smith & Lewicki, 2006), where natural sound statistics have been used to explain the mechanisms observed in the peripheral auditory system. However, sound analysis seems likely to provide insight into mid- and high-level auditory problems as well. For instance, the acoustic grouping cues used in sound segregation are almost surely rooted to some extent in natural sound statistics, and examining such statistics could reveal unexpected cues. Similarly, because sound recognition must generalize across the variability that occurs within sounds produced by a particular type of source, examining this variability in natural sounds may provide clues to how the auditory system achieves the appropriate invariance in this domain.

The study of real-world auditory competence will also necessitate measuring auditory abilities and physiological responses with more realistic sound signals. The tones and noises that have been the staple of classical psychoacoustics and auditory physiology have many uses, but also have little in common with many everyday sounds. One challenge of working with realistic signals is that actual recordings of real-world sounds are often uncontrolled, and typically introduce confounds associated with their familiarity. Methods of synthesizing novel sounds with naturalistic properties (Cavaco & Lewicki, 2007; McDermott, Wrobleski, et al., 2011; McDermott & Simoncelli, 2011) are thus likely to be useful experimental tools. Simulations of realistic auditory environments are also increasingly within reach, with methods for generating three-dimensional auditory scenes (Wightman & Kistler, 1989; Zahorik, 2009) being used in studies of sound localization and speech perception in realistic conditions.

We must also consider more realistic auditory behaviors. Hearing does not normally occur while we are seated in a quiet room, listening over headphones, and paying full attention to the acoustic stimulus, but rather in the context of everyday activities in which sound is a means to some other goal. The need to respect this complexity while maintaining sufficient control over experimental conditions presents a challenge, but not one that is insurmountable. For instance, neurophysiology experiments involving naturalistic behavior are becoming more common, with preparations being developed that will permit recordings from freely moving animals engaged in vocalization (Eliades & Wang, 2008) or locomotion and, ultimately, perhaps in a real-world cocktail party.
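The recognition problem raised earlier in this section, generalizing across variation within a category but not across categories (DiCarlo & Cox, 2007), can be made concrete with a toy calculation. The Python sketch below is an illustration only and is not a method taken from the work cited here: the function name `separability`, the two scalar "features," and every numerical value are hypothetical. It assumes that each sound exemplar has already been reduced to a single number, and it compares the variance of the category means with the average variance inside each category.

```python
from statistics import mean, pvariance


def separability(features_by_category):
    """Across-category variance divided by mean within-category variance.

    `features_by_category` maps a category label to a list of scalar
    feature values, one per sound exemplar (hypothetical data).
    """
    category_means = [mean(values) for values in features_by_category.values()]
    within = mean(pvariance(values) for values in features_by_category.values())
    across = pvariance(category_means)
    return across / within


# Made-up feature values (arbitrary units) for repeated exemplars of two sources.
# Feature A is stable within a category and differs between categories.
feature_a = {"rain": [1.0, 1.2, 0.8, 1.0], "applause": [3.0, 3.2, 2.8, 3.0]}
# Feature B varies far more within each category than between categories.
feature_b = {"rain": [1.0, 3.0, 0.5, 2.5], "applause": [1.5, 3.5, 1.0, 3.0]}

print(separability(feature_a))
print(separability(feature_b))
```

With these made-up numbers, feature A gives a ratio of about 50 and feature B a ratio well below 1. A representation suited to recognition would, on this simplified view, favor measurements like feature A. Real sounds would of course require many features and far richer statistics than a single variance ratio.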

Author Note

I thank Garner Hoyt von Trapp, Sam Norman-Haignere, Michael Schemitsch, and Sara Steele for helpful comments on earlier drafts of this chapter, the authors who kindly allowed me to reproduce their figures (acknowledged individually in the figure captions), and the Howard Hughes Medical Institute for support.

References

Adelson, E. H. (2000). Lightness perception and lightness illusions. In M. S. Gazzaniga (Ed.), The new cognitive neurosciences (2nd ed., pp. 339–351). Cambridge, MA: MIT Press.
Ahrens, M. B., Linden, J. F., et al. (2008). Influences in auditory cortical responses modeled with multilinear spectrotemporal methods. Journal of Neuroscience, 28 (8), 1929–1942.
Alain, C., Arnott, S. R., et al. (2001). “What” and “where” in the human auditory system. Proceedings of the National Academy of Sciences U S A, 98, 12301–12306.
Alais, D., & Burr, D. E. (2004). The ventriloquist effect results from near-optimal bimodal integration. Current Biology, 14, 257–262.
ANSI (2007). American national standard procedure for the computation of loudness of steady sounds. ANSI, S3–4.
Ashmore, J. (2008). Cochlear outer hair cell motility. Physiological Reviews, 88, 173–210.
Attias, H., & Schreiner, C. E. (1997). Temporal low-order statistics of natural sounds. In M. Mozer, M. Jordan, & T. Petsche (Eds.), Advances in neural information processing systems (Vol. 9). Cambridge, MA: MIT Press.
Attneave, F., & Olson, R. K. (1971). Pitch as a medium: A new approach to psychophysical scaling. American Journal of Psychology, 84 (2), 147–166.
Bacon, S. P., & Grantham, D. W. (1989). Modulation masking: Effects of modulation frequency, depth, and phase. Journal of the Acoustical Society of America, 85, 2575–2580.
Banai, K., Hornickel, J., et al. (2009). Reading and subcortical auditory function. Cerebral Cortex, 19 (11), 2699–2707.
Bandyopadhyay, S., Shamma, S. A., et al. (2010). Dichotomy of functional organization in the mouse auditory cortex. Nature Neuroscience, 13 (3), 361–368.
Barbour, D. L., & Wang, X. (2003). Contrast tuning in auditory cortex. Science, 299, 1073–1075.

Baumann, S., Griffiths, T. D., et al. (2011). Orthogonal representation of sound dimensions in the primate midbrain. Nature Neuroscience, 14 (4), 423–425.
Bee, M. A., & Klump, G. M. (2004). Primitive auditory stream segregation: A neurophysiological study in the songbird forebrain. Journal of Neurophysiology, 92, 1088–1104.
Bee, M. A., & Micheyl, C. (2008). The cocktail party problem: What is it? How can it be solved? And why should animal behaviorists study it? Journal of Comparative Psychology, 122 (3), 235–251.
Belin, P., Zatorre, R. J., et al. (2000). Voice-selective areas in human auditory cortex. Nature, 403, 309–312.
Bendor, D., & Wang, X. (2005). The neuronal representation of pitch in primate auditory cortex. Nature, 436, 1161–1165.
Bendor, D., & Wang, X. (2006). Cortical representations of pitch in monkeys and humans. Current Opinion in Neurobiology, 16, 391–399.
Bendor, D., & Wang, X. (2008). Neural response properties of primary, rostral, and rostrotemporal core fields in the auditory cortex of marmoset monkeys. Journal of Neurophysiology, 100 (2), 888–906.
Bernstein, J. G. W., & Oxenham, A. J. (2005). An autocorrelation model with place dependence to account for the effect of harmonic number on fundamental frequency discrimination. Journal of the Acoustical Society of America, 117 (6), 3816–3831.
Bitterman, Y., Mukamel, R., et al. (2008). Ultra-fine frequency tuning revealed in single neurons of human auditory cortex. Nature, 451 (7175), 197–201.
Bizley, J. K., Nodal, F. R., et al. (2007). Physiological and anatomical evidence for multisensory interactions in auditory cortex. Cerebral Cortex, 17, 2172–2189.
Bizley, J. K., Walker, K. M. M., et al. (2009). Interdependent encoding of pitch, timbre, and spatial location in auditory cortex. Journal of Neuroscience, 29 (7), 2064–2075.
Bizley, J. K., Walker, K. M. M., et al. (2010). Neural ensemble codes for stimulus periodicity in auditory cortex. Journal of Neuroscience, 30 (14), 5078–5091.
Boemio, A., Fromm, S., et al. (2005). Hierarchical and asymmetric temporal sensitivity in human auditory cortices. Nature Neuroscience, 8, 389–395.
Brandewie, E., & Zahorik, P. (2010). Prior listening in rooms improves speech intelligibility. Journal of the Acoustical Society of America, 128, 291–299.
Bregman, A. S. (1978). Auditory streaming is cumulative. Journal of Experimental Psychology: Human Perception and Performance, 4, 380–387.

Bregman, A. S. (1990). Auditory scene analysis: The perceptual organization of sound. Cambridge, MA: MIT Press.
Bregman, A. S., & Campbell, J. (1971). Primary auditory stream segregation and perception of order in rapid sequences of tones. Journal of Experimental Psychology, 89, 244–249.
Bronkhorst, A. W. (2000). The cocktail party phenomenon: A review of research on speech intelligibility in multiple-talker conditions. Acustica, 86, 117–128.

Brown, G. J., & Palomaki, K. J. (2006). Reverberation. In D. Wang & G. J. Brown (Eds.), Computational auditory scene analysis: Principles, algorithms, and applications (pp. 209–250). Hoboken, NJ: John Wiley & Sons.
Buus, S., Florentine, M., et al. (1997). Temporal integration of loudness, loudness discrimination, and the form of the loudness function. Journal of the Acoustical Society of America, 101, 669–680.
Carcagno, S., & Plack, C. J. (2011). Subcortical plasticity following perceptual learning in a pitch discrimination task. Journal of the Association for Research in Otolaryngology, 12, 89–100.
Cariani, P. A., & Delgutte, B. (1996). Neural correlates of the pitch of complex tones. I. Pitch and pitch salience. Journal of Neurophysiology, 76, 1698–1716.
Carlyon, R. P. (1991). Discriminating between coherent and incoherent frequency modulation of complex tones. Journal of the Acoustical Society of America, 89, 329–340.
Carlyon, R. P. (2004). How the brain separates sounds. Trends in Cognitive Sciences, 8 (10), 465–471.
Carlyon, R. P., Cusack, R., et al. (2001). Effects of attention and unilateral neglect on auditory stream segregation. Journal of Experimental Psychology: Human Perception and Performance, 27 (1), 115–127.
Carlyon, R. P., Micheyl, C., et al. (2004). Auditory processing of real and illusory changes in frequency modulation (FM) phase. Journal of the Acoustical Society of America, 116 (6), 3629–3639.
Cavaco, S., & Lewicki, M. S. (2007). Statistical modeling of intrinsic structures in impact sounds. Journal of the Acoustical Society of America, 121 (6), 3558–3568.
Cedolin, L., & Delgutte, B. (2005). Pitch of complex tones: Rate-place and interspike interval representations in the auditory nerve. Journal of Neurophysiology, 94, 347–362.
Cherry, E. C. (1953). Some experiments on the recognition of speech, with one and two ears. Journal of the Acoustical Society of America, 25 (5), 975–979.

Christianson, G. B., Sahani, M., et al. (2008). The consequences of response nonlinearities for interpretation of spectrotemporal receptive fields. Journal of Neuroscience, 28 (2), 446–455.
Ciocca, V., & Bregman, A. S. (1987). Perceived continuity of gliding and steady-state tones through interrupting noise. Perception & Psychophysics, 42, 476–484.
Cohen, Y. E., Russ, B. E., et al. (2009). A functional role for the ventrolateral prefrontal cortex in non-spatial auditory cognition. Proceedings of the National Academy of Sciences U S A, 106, 20045–20050.
Conway, A. R., Cowan, A. N., et al. (2001). The cocktail party phenomenon revisited: The importance of working memory capacity. Psychonomic Bulletin & Review, 8, 331–335.
Culling, J. F., & Akeroyd, M. A. (2010). Spatial hearing. In C. J. Plack (Ed.), The Oxford handbook of auditory science: Hearing (Vol. 3, pp. 123–144). Oxford, UK: Oxford University Press.
Culling, J. F., & Summerfield, Q. (1995). Perceptual separation of concurrent speech sounds: Absence of across-frequency grouping by common interaural delay. Journal of the Acoustical Society of America, 98 (2), 785–797.
Cusack, R., & Carlyon, R. P. (2004). Auditory perceptual organization inside and outside the laboratory. In J. G. Neuhoff (Ed.), Ecological psychoacoustics (pp. 15–84). San Diego: Elsevier Academic Press.
Cutting, J. E. (1975). Aspects of phonological fusion. Journal of Experimental Psychology: Human Perception and Performance, 104, 105–120.
Dallos, P. (2008). Cochlear amplification, outer hair cells and prestin. Current Opinion in Neurobiology, 18, 370–376.
Darrow, K. N., Maison, S. F., et al. (2006). Cochlear efferent feedback balances interaural sensitivity. Nature Neuroscience, 9 (12), 1474–1476.
Darwin, C. (1984). Perceiving vowels in the presence of another sound: Constraints on formant perception. Journal of the Acoustical Society of America, 76 (6), 1636–1647.
Darwin, C. J. (1981). Perceptual grouping of speech components different in fundamental frequency and onset-time. Quarterly Journal of Experimental Psychology, 33A, 185–207.
Darwin, C. J. (1997). Auditory grouping. Trends in Cognitive Sciences, 1, 327–333.
Darwin, C. J., & Ciocca, V. (1992). Grouping in pitch perception: Effects of onset asynchrony and ear of presentation of a mistuned component. Journal of the Acoustical Society of America, 91, 3381–3390.

Darwin, C. J., & Hukin, R. W. (1997). Perceptual segregation of a harmonic from a vowel by interaural time difference and frequency proximity. Journal of the Acoustical Society of America, 102 (4), 2316–2324.
Darwin, C. J., & Hukin, R. W. (1999). Auditory objects of attention: The role of interaural time differences. Journal of Experimental Psychology: Human Perception and Performance, 25 (3), 617–629.
Dau, T., Kollmeier, B., et al. (1997). Modeling auditory processing of amplitude modulation. I. Detection and masking with narrow-band carriers. Journal of the Acoustical Society of America, 102 (5), 2892–2905.
David, S. V., Mesgarani, N., et al. (2009). Rapid synaptic depression explains nonlinear modulation of spectro-temporal tuning in primary auditory cortex by natural stimuli. Journal of Neuroscience, 29 (11), 3374–3386.
de Cheveigne, A. (2005). Pitch perception models. In C. J. Plack & A. J. Oxenham (Eds.), Pitch (pp. 169–233). New York: Springer-Verlag.
de Cheveigne, A. (2006). Multiple F0 estimation. In D. Wang & G. J. Brown (Eds.), Computational auditory scene analysis: Principles, algorithms, and applications (pp. 45–80). Hoboken, NJ: John Wiley & Sons.
de Cheveigne, A. (2010). Pitch perception. In C. J. Plack (Ed.), The Oxford handbook of auditory science: Hearing (Vol. 3, pp. 71–104). New York: Oxford University Press.
de Cheveigne, A., & Kawahara, H. (2002). YIN, a fundamental frequency estimator for speech and music. Journal of the Acoustical Society of America, 111, 1917–1930.
de Cheveigne, A., McAdams, S., et al. (1995). Identification of concurrent harmonic and inharmonic vowels: A test of the theory of harmonic cancellation and enhancement. Journal of the Acoustical Society of America, 97 (6), 3736–3748.
Dean, I., Harper, N. S., et al. (2005). Neural population coding of sound level adapts to stimulus statistics. Nature Neuroscience, 8 (12), 1684–1689.
Delgutte, B., Joris, P. X., et al. (1999). Receptive fields and binaural interactions for virtual-space stimuli in the cat inferior colliculus. Journal of Neurophysiology, 81, 2833–2851.
Demany, L., & Semal, C. (1990). The upper limit of “musical” pitch. Music Perception, 8, 165–176.

Demany, L., Trost, W., et al. (2008). Auditory change detection: Simple sounds are not memorized better than complex sounds. Psychological Science, 19, 85–91.
Depireux, D. A., Simon, J. Z., et al. (2001). Spectro-temporal response field characterization with dynamic ripples in ferret primary auditory cortex. Journal of Neurophysiology, 85 (3), 1220–1234.

DiCarlo, J. J., & Cox, D. D. (2007). Untangling invariant object recognition. Trends in Cognitive Sciences, 11, 333–341.
Durlach, N. I., Mason, C. R., et al. (2003). Note on informational masking. Journal of the Acoustical Society of America, 113 (6), 2984–2987.
Elgoyhen, A. B., & Fuchs, P. A. (2010). Efferent innervation and function. In P. A. Fuchs (Ed.), The Oxford handbook of auditory science: The ear (pp. 283–306). Oxford, UK: Oxford University Press.
Elhilali, M., Ma, L., et al. (2009). Temporal coherence in the perceptual organization and cortical representation of auditory scenes. Neuron, 61, 317–329.
Eliades, S. J., & Wang, X. (2008). Neural substrates of vocalization feedback monitoring in primate auditory cortex. Nature, 453, 1102–1106.
Escabi, M. A., Miller, L. M., et al. (2003). Naturalistic auditory contrast improves spectrotemporal coding in the cat inferior colliculus. Journal of Neuroscience, 23, 11489–11504.
Fairhall, A. L., Lewen, G. D., et al. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.
Field, D. J. (1987). Relations between the statistics of natural images and the response profiles of cortical cells. Journal of the Optical Society of America A, 4, 2379–2394.
Fishman, Y. I., Arezzo, J. C., et al. (2004). Auditory stream segregation in monkey auditory cortex: Effects of frequency separation, presentation rate, and tone duration. Journal of the Acoustical Society of America, 116, 1656–1670.
Fishman, Y. I., Reser, D. H., et al. (1998). Pitch vs. spectral encoding of harmonic complex tones in primary auditory cortex of the awake monkey. Brain Research, 786, 18–30.
Fishman, Y. I., & Steinschneider, M. (2010). Formation of auditory streams. In A. Rees & A. R. Palmer (Eds.), The Oxford handbook of auditory science: The auditory brain (pp. 215–245). Oxford, UK: Oxford University Press.
Formisano, E., Kim, D., et al. (2003). Mirror-symmetric tonotopic maps in human primary auditory cortex. Neuron, 40 (4), 859–869.
Freyman, R. L., Clifton, R. K., et al. (1991). Dynamic processes in the precedence effect. Journal of the Acoustical Society of America, 90, 874–884.
Fritz, J. B., David, S. V., et al. (2010). Adaptive, behaviorally gated, persistent encoding of task-relevant auditory information in ferret frontal cortex. Nature Neuroscience, 13 (8), 1011–1019.
Fritz, J. B., Elhilali, M., et al. (2005). Differential dynamic plasticity of A1 receptive fields during multiple spectral tasks. Journal of Neuroscience, 25 (33), 7623–7635.

Fritz, J. B., Shamma, S. A., et al. (2003). Rapid task-related plasticity of spectrotemporal receptive fields in primary auditory cortex. Nature Neuroscience, 6, 1216–1223.
Gardner, W. G. (1998). Reverberation algorithms. In M. Kahrs & K. Brandenburg (Eds.), Applications of digital signal processing to audio and acoustics (pp. 85–131). Norwell, MA: Kluwer Academic.
Gaver, W. W. (1993). What in the world do we hear? An ecological approach to auditory source perception. Ecological Psychology, 5 (1), 1–29.
Ghazanfar, A. A. (2009). The multisensory roles for auditory cortex in primate vocal communication. Hearing Research, 258, 113–120.
Ghitza, O. (2001). On the upper cutoff frequency of the auditory critical-band envelope detectors in the context of speech perception. Journal of the Acoustical Society of America, 110 (3), 1628–1640.
Giraud, A., Lorenzi, C., et al. (2000). Representation of the temporal envelope of sounds in the human brain. Journal of Neurophysiology, 84 (3), 1588–1598.
Glasberg, B. R., & Moore, B. C. J. (1990). Derivation of auditory filter shapes from notched-noise data. Hearing Research, 47, 103–138.
Goldstein, J. L. (1973). An optimum processor theory for the central formation of the pitch of complex tones. Journal of the Acoustical Society of America, 54, 1496–1516.
Guinan, J. J. (2006). Olivocochlear efferents: Anatomy, physiology, function, and the measurement of efferent effects in humans. Ear and Hearing, 27 (6), 589–607.
Gygi, B., Kidd, G. R., et al. (2004). Spectral-temporal factors in the identification of environmental sounds. Journal of the Acoustical Society of America, 115 (3), 1252–1265.
Hall, D. A., & Plack, C. J. (2009). Pitch processing sites in the human auditory brain. Cerebral Cortex, 19 (3), 576–585.
Hall, D. A., Barrett, D. J. K., Akeroyd, M. A., & Summerfield, A. Q. (2005). Cortical representations of temporal structure in sound. Journal of Neurophysiology, 94 (11), 3181–3191.
Hall, J. W., Haggard, M. P., et al. (1984). Detection in noise by spectro-temporal pattern analysis. Journal of the Acoustical Society of America, 76, 50–56.
Harper, N. S., & McAlpine, D. (2004). Optimal neural population coding of an auditory spatial cue. Nature, 430, 682–686.
Hawley, M. L., Litovsky, R. Y., et al. (2004). The benefit of binaural hearing in a cocktail party: Effect of location and type of interferer. Journal of the Acoustical Society of America, 115 (2), 833–843.

Heffner, H. E., & Heffner, R. S. (1990). Effect of bilateral auditory cortex lesions on sound localization in Japanese macaques. Journal of Neurophysiology, 64 (3), 915–931.
Heinz, M. G., Colburn, H. S., et al. (2001). Evaluating auditory performance limits: I. One-parameter discrimination using a computational model for the auditory nerve. Neural Computation, 13, 2273–2316.
Higgins, N. C., Storace, D. A., et al. (2010). Specialization of binaural responses in ventral auditory cortices. Journal of Neuroscience, 30 (43), 14522–14532.
Hofman, P. M., Van Riswick, J. G. A., et al. (1998). Relearning sound localization with new ears. Nature Neuroscience, 1 (5), 417–421.
Holt, L. L., & Lotto, A. J. (2010). Speech perception as categorization. Attention, Perception, and Psychophysics, 72 (5), 1218–1227.
Houtgast, T. (1989). Frequency selectivity in amplitude-modulation detection. Journal of the Acoustical Society of America, 85, 1676–1680.
Houtsma, A. J. M., & Smurzynski, J. (1990). Pitch identification and discrimination for complex tones with many harmonics. Journal of the Acoustical Society of America, 87 (1), 304–310.
Hsu, A., Woolley, S. M., et al. (2004). Modulation power and phase spectrum of natural sounds enhance neural encoding performed by single auditory neurons. Journal of Neuroscience, 24, 9201–9211.

Hudspeth, A. J. (2008). Making an effort to listen: Mechanical amplification in the ear. Neuron, 59 (4), 530–545.
Humphries, C., Liebenthal, E., et al. (2010). Tonotopic organization of human auditory cortex. NeuroImage, 50 (3), 1202–1211.
Ihlefeld, A., & Shinn-Cunningham, B. (2008). Spatial release from energetic and informational masking in a divided speech identification task. Journal of the Acoustical Society of America, 123 (6), 4380–4392.
Javel, E., & Mott, J. B. (1988). Physiological and psychophysical correlates of temporal processes in hearing. Hearing Research, 34, 275–294.
Jenkins, W. M., & Masterton, R. G. (1982). Sound localization: Effects of unilateral lesions in central auditory system. Journal of Neurophysiology, 47, 987–1016.
Johnson, D. H. (1980). The relationship between spike rate and synchrony in responses of auditory-nerve fibers to single tones. Journal of the Acoustical Society of America, 68, 1115–1122.
Johnsrude, I. S., Penhune, V. B., et al. (2000). Functional specificity in the right human auditory cortex for perceiving pitch direction. Brain, 123 (1), 155–163.

Joris, P. X., Bergevin, C., et al. (2011). Frequency selectivity in Old-World monkeys corroborates sharp cochlear tuning in humans. Proceedings of the National Academy of Sciences U S A, 108 (42), 17516–17520.
Joris, P. X., Schreiner, C. E., et al. (2004). Neural processing of amplitude-modulated sounds. Physiological Reviews, 84, 541–577.
Kaas, J. H., & Hackett, T. A. (2000). Subdivisions of auditory cortex and processing streams in primates. Proceedings of the National Academy of Sciences U S A, 97, 11793–11799.
Kadia, S. C., & Wang, X. (2003). Spectral integration in A1 of awake primates: Neurons with single and multipeaked tuning characteristics. Journal of Neurophysiology, 89 (3), 1603–1622.
Kanwisher, N. (2010). Functional specificity in the human brain: A window into the functional architecture of the mind. Proceedings of the National Academy of Sciences U S A, 107, 11163–11170.
Kawase, T., Delgutte, B., et al. (1993). Anti-masking effects of the olivocochlear reflex. II. Enhancement of auditory-nerve response to masked tones. Journal of Neurophysiology, 70, 2533–2549.
Kayser, C., Petkov, C. I., et al. (2008). Visual modulation of neurons in auditory cortex. Cerebral Cortex, 18 (7), 1560–1574.
Kidd, G., Arbogast, T. L., et al. (2005). The advantage of knowing where to listen. Journal of the Acoustical Society of America, 118 (6), 3804–3815.
Kidd, G., Mason, C. R., et al. (1994). Reducing informational masking by sound segregation. Journal of the Acoustical Society of America, 95 (6), 3475–3480.
Kidd, G., Mason, C. R., et al. (2003). Multiple bursts, multiple looks, and stream coherence in the release from informational masking. Journal of the Acoustical Society of America, 114 (5), 2835–2845.
Kikuchi, Y., Horwitz, B., et al. (2010). Hierarchical auditory processing directed rostrally along the monkey’s supratemporal plane. Journal of Neuroscience, 30 (39), 13021–13030.
Kluender, K. R., & Jenison, R. L. (1992). Effects of glide slope, noise intensity, and noise duration on the extrapolation of FM glides through noise. Perception & Psychophysics, 51, 231–238.
Klump, R. G., & Eady, H. R. (1956). Some measurements of interaural time difference thresholds. Journal of the Acoustical Society of America, 28, 859–860.
Kulkarni, A., & Colburn, H. S. (1998). Role of spectral detail in sound-source localization. Nature, 396, 747–749.

Kvale, M., & Schreiner, C. E. (2004). Short-term adaptation of auditory receptive fields to dynamic stimuli. Journal of Neurophysiology, 91, 604–612.
Langner, G., Sams, M., et al. (1997). Frequency and periodicity are represented in orthogonal maps in the human auditory cortex: Evidence from magnetoencephalography. Journal of Comparative Physiology, 181, 665–676.
Leaver, A. M., & Rauschecker, J. P. (2010). Cortical representation of natural complex sounds: Effects of acoustic features and auditory object category. Journal of Neuroscience, 30 (22), 7604–7612.
Lerner, Y., Honey, C. J., et al. (2011). Topographic mapping of a hierarchy of temporal receptive windows using a narrated story. Journal of Neuroscience, 31 (8), 2906–2915.
Lewicki, M. S. (2002). Efficient coding of natural sounds. Nature Neuroscience, 5 (4), 356–363.
Liberman, M. C. (1982). The cochlear frequency map for the cat: Labeling auditory-nerve fibers of known characteristic frequency. Journal of the Acoustical Society of America, 72, 1441–1449.
Lippmann, R. P. (1997). Speech recognition by machines and humans. Speech Communication, 22, 1–16.
Litovsky, R. Y., Colburn, H. S., et al. (1999). The precedence effect. Journal of the Acoustical Society of America, 106, 1633–1654.
Lomber, S. G., & Malhotra, S. (2008). Double dissociation of “what” and “where” processing in auditory cortex. Nature Neuroscience, 11 (5), 609–616.
Lorenzi, C., Gilbert, G., et al. (2006). Speech perception problems of the hearing impaired reflect inability to use temporal fine structure. Proceedings of the National Academy of Sciences U S A, 103, 18866–18869.
Lutfi, R. A. (1992). Informational processing of complex sounds. III. Interference. Journal of the Acoustical Society of America, 91, 3391–3400.
Lutfi, R. A. (2008). Human sound source identification. In W. A. Yost & A. N. Popper (Eds.), Springer handbook of auditory research: Auditory perception of sound sources (pp. 13–42). New York: Springer-Verlag.
Machens, C. K., Wehr, M. S., et al. (2004). Linearity of cortical receptive fields measured with natural sounds. Journal of Neuroscience, 24, 1089–1100.
Macken, W. J., Tremblay, S., et al. (2003). Does auditory streaming require attention? Evidence from attentional selectivity in short-term memory. Journal of Experimental Psychology: Human Perception and Performance, 29, 43–51.

Makous, J. C., & Middlebrooks, J. C. (1990). Two-dimensional sound localization by human listeners. Journal of the Acoustical Society of America, 87, 2188–2200.
Marr, D. C. (1982). Vision: A computational investigation into the human representation and processing of visual information. New York: Freeman.
May, B. J., Anderson, M., et al. (2008). The role of broadband inhibition in the rate representation of spectral cues for sound localization in the inferior colliculus. Hearing Research, 238, 77–93.

May, B. J., & McQuone, S. J. (1995). Effects of bilateral olivocochlear lesions on pure-tone discrimination in cats. Auditory Neuroscience, 1, 385–400.
McAdams, S. (1989). Segregation of concurrent sounds. I. Effects of frequency modulation coherence. Journal of the Acoustical Society of America, 86, 2148–2159.
McAlpine, D. (2004). Neural sensitivity to periodicity in the inferior colliculus: Evidence for the role of cochlear distortions. Journal of Neurophysiology, 92, 1295–1311.
McDermott, J. H. (2009). The cocktail party problem. Current Biology, 19, R1024–R1027.
McDermott, J. H., Lehr, A. J., et al. (2010). Individual differences reveal the basis of consonance. Current Biology, 20, 1035–1041.
McDermott, J. H., & Oxenham, A. J. (2008a). Music perception, pitch, and the auditory system. Current Opinion in Neurobiology, 18, 452–463.
McDermott, J. H., & Oxenham, A. J. (2008b). Spectral completion of partially masked sounds. Proceedings of the National Academy of Sciences U S A, 105 (15), 5939–5944.
McDermott, J. H., & Simoncelli, E. P. (2011). Sound texture perception via statistics of the auditory periphery: Evidence from sound synthesis. Neuron, 71, 926–940.
McDermott, J. H., Wrobleski, D., et al. (2011). Recovering sound sources from embedded repetition. Proceedings of the National Academy of Sciences U S A, 108 (3), 1188–1193.
Meddis, R., & Hewitt, M. J. (1991). Virtual pitch and phase sensitivity of a computer model of the auditory periphery: Pitch identification. Journal of the Acoustical Society of America, 89, 2866–2882.
Mershon, D. H., Desaulniers, D. H., et al. (1981). Perceived loudness and visually-determined auditory distance. Perception, 10, 531–543.
Mesgarani, N., David, S. V., et al. (2008). Phoneme representation and classification in primary auditory cortex. Journal of the Acoustical Society of America, 123 (2), 899–909.
Micheyl, C., & Oxenham, A. J. (2010). Objective and subjective psychophysical measures of auditory stream integration and segregation. Journal of the Association for Research in Otolaryngology, 11 (4), 709–724.

Micheyl, C., & Oxenham, A. J. (2010). Pitch, harmonicity and concurrent sound segregation: Psychoacoustical and neurophysiological findings. Hearing Research, 266, 36–51.
Micheyl, C., Tian, B., et al. (2005). Perceptual organization of tone sequences in the auditory cortex of awake macaques. Neuron, 48, 139–148.
Middlebrooks, J. C. (1992). Narrow-band sound localization related to external ear acoustics. Journal of the Acoustical Society of America, 92 (5), 2607–2624.
Middlebrooks, J. C. (2000). Cortical representations of auditory space. In M. S. Gazzaniga (Ed.), The new cognitive neurosciences (2nd ed., pp. 425–436). Cambridge, MA: MIT Press.
Middlebrooks, J. C., & Green, D. M. (1991). Sound localization by human listeners. Annual Review of Psychology, 42, 135–159.
Miller, L. M., Escabi, M. A., et al. (2001). Spectrotemporal receptive fields in the lemniscal auditory thalamus and cortex. Journal of Neurophysiology, 87, 516–527.
Miller, L. M., Escabi, M. A., et al. (2002). Spectrotemporal receptive fields in the lemniscal auditory thalamus and cortex. Journal of Neurophysiology, 87, 516–527.
Miller, R. L., Schilling, J. R., et al. (1997). Effects of acoustic trauma on the representation of the vowel /e/ in cat auditory nerve fibers. Journal of the Acoustical Society of America, 101 (6), 3602–3616.
Moore, B. C. J. (1973). Frequency difference limens for short-duration tones. Journal of the Acoustical Society of America, 54, 610–619.
Moore, B. C. J. (2003). An introduction to the psychology of hearing. San Diego, CA: Academic Press.
Moore, B. C., & Glasberg, B. R. (1996). A revision of Zwicker’s loudness model. Acta Acustica, 82 (2), 335–345.
Moore, B. C. J., Glasberg, B. R., et al. (1986). Thresholds for hearing mistuned partials as separate tones in harmonic complexes. Journal of the Acoustical Society of America, 80, 479–483.
Moore, B. C. J., & Gockel, H. (2002). Factors influencing sequential stream segregation. Acta Acustica, 88, 320–332.
Moore, B. C. J., & Oxenham, A. J. (1998). Psychoacoustic consequences of compression in the peripheral auditory system. Psychological Review, 105 (1), 108–124.
Moshitch, D., Las, L., et al. (2006). Responses of neurons in primary auditory cortex (A1) to pure tones in the halothane-anesthetized cat. Journal of Neurophysiology, 95 (6), 3756–3769.

Page 55 of 62

Audition Neff, D. L. (1995). Signal properties that reduce masking by simultaneous, random-fre­ quency maskers. Journal of the Acoustical Society of America, 98, 1909–1920. Nelken, I., Bizley, J. K., et al. (2008). Responses of auditory cortex to complex stimuli: Functional organization revealed using intrinsic optical signals. Journal of Neurophysiolo­ gy, 99 (4), 1928–1941. Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381, 607–609. Palmer, A. R., & Russell, I. J. (1986). Phase-locking in the cochlear nerve of the guinea-pig and its relation to the receptor potential of inner hair-cells. Hearing Research, 24, 1–15. Patterson, R. D., Uppenkamp, S., et al. (2002). The processing of temporal pitch and melody information in auditory cortex. Neuron, 36 (4), 767–776. Penagos, H., Melcher, J. R., et al. (2004). A neural representation of pitch salience in non­ primary human auditory cortex revealed with functional magnetic resonance imaging. Journal of Neuroscience, 24 (30), 6810–6815. Petkov, C. I., Kayser, C., et al. (2006). Functional imaging reveals numerous fields in the monkey auditory cortex. PLoS Biology, 4 (7), 1213–1226. Petkov, C. I., Kayser, C., et al. (2008). A voice region in the monkey brain. Nature Neuro­ science, 11, 367–374. Petkov, C. I., O’Connor, K. N., et al. (2007). Encoding of illusory continuity in primary au­ ditory cortex. Neuron, 54, 153–165. Plack, C. J. (2005). The sense of hearing. New Jersey, Lawrence Erlbaum. Plack, C. J., & Oxenham, A. J. (2005). The psychophysics of pitch. In C. J. Plack, A. J. Oxen­ ham, R. R. Fay, & A. J. Popper (Eds.), Pitch: Neural coding and perception (pp. 7–55). New York: Springer-Verlag. Plack, C. J., Oxenham, A. J., et al. (Eds.) (2005). Pitch: Neural coding and perception. Springer Handbook of Auditory Research. New York: Springer-Verlag. Poeppel, D. (2003). 
The analysis of speech in different temporal integration windows: Cerebral lateralization as “asymmetric sampling in time.” Speech Communication, 41, 245–255. (p. 169)

Poremba, A., Saunders, R. C., et al. (2003). Functional mapping of the primate au­

ditory system. Science, 299, 568–572. Pressnitzer, D., Sayles, M., et al. (2008). Perceptual organization of sound begins in the auditory periphery. Current Biology, 18, 1124–1128.

Page 56 of 62

Audition Rajan, R. (2000). Centrifugal pathways protect hearing sensitivity at the cochlea in noisy environments that exacerbate the damage induced by loud sound. Journal of Neuro­ science, 20, 6684–6693. Rauschecker, J. P., & Tian, B. (2004). Processing of band-passed noise in the lateral audi­ tory belt cortex of the rhesus monkey. Journal of Neurophysiology, 91, 2578–2589. Rayleigh, L. (1907). On our perception of sound direction. Philosophical Magazine, 3, 456–464. Recanzone, G. H. (2008). Representation of con-specific vocalizations in the core and belt areas of the auditory cortex in the alert macaque monkey. Journal of Neuroscience, 28 (49), 13184–13193. Rhode, W. S. (1971). Observations of the vibration of the basilar membrane in squirrel monkeys using the Mossbauer technique. Journal of the Acoustical Society of America, 49, 1218–1231. Rhode, W. S. (1978). Some observations on cochlear mechanics. Journal of the Acoustical Society of America, 64, 158–176. Riecke, L., van Opstal, J., et al. (2007). Hearing illusory sounds in noise: Sensoryperceptual transformations in primary auditory cortex. Journal of Neuroscience, 27 (46), 12684–12689. (p. 170)

Roberts, B., & Brunstrom, J. M. (1998). Perceptual segregation and pitch shifts of mis­ tuned components in harmonic complexes and in regular inharmonic complexes. Journal of the Acoustical Society of America, 104 (4), 2326–2338. Rodriguez, F. A., Chen, C., et al. (2010). Neural modulation tuning characteristics scale to efficiently encode natural sound statistics. Journal of Neuroscience, 30, 15969–15980. Romanski, L. M., Tian, B., et al. (1999). Dual streams of auditory afferents target multiple domains in the primate prefrontal cortex. Nature Neuroscience, 2 (12), 1131–1136. Rose, J. E., Brugge, J. F., et al. (1967). Phase-locked response to low-frequency tones in single auditory nerve fibers of the squirrel monkey. Journal of Neurophysiology, 30, 769– 793. Rosen, S. (1992). Temporal information in speech: Acoustic, auditory and linguistic as­ pects. Philosophical Transactions of the Royal Society, London, Series B, 336, 367–373. Rosenblum, L. D. (2004). Perceiving articulatory events: Lessons for an ecological psy­ choacoustics. In J. G. Neuhoff (Ed.), Ecological psychoacoustics (pp.: 219–248). San Diego, CA: Elsevier Academic Press. Rothschild, G., Nelken, I., et al. (2010). Functional organization and population dynamics in the mouse primary auditory cortex. Nature Neuroscience, 13 (3), 353–360. Page 57 of 62

Audition Rotman, Y., Bar Yosef, O., et al. (2001). Relating cluster and population responses to nat­ ural sounds and tonal stimuli in cat primary auditory cortex. Hearing Research, 152, 110– 127. Ruggero, M. A. (1992). Responses to sound of the basilar membrane of the mammalian cochlea. Current Opinion in Neurobiology, 2, 449–456. Ruggero, M. A., & Rich, N. C. (1991). Furosemide alters organ of Corti mechanics: Evi­ dence for feedback of outer hair cells upon the basilar membrane. Journal of Neuro­ science, 11, 1057–1067. Ruggero, M. A., Rich, N. C., et al. (1997). Basilar-membrane responses to tones at the base of the chinchilla cochlea. Journal of the Acoustical Society of America, 101, 2151– 2163. Samson, F., Zeffiro, T. A., et al. (2011). Stimulus complexity and categorical effects in hu­ man auditory cortex: an Activation Likelihood Estimation meta-analysis. Frontiers in Psy­ chology, 1, 1–23. Scharf, B., Magnan, J., et al. (1997). On the role of the olivocochlear bundle in hearing: 16 Case studies. Hearing Research, 102, 101–122. Schonwiesner, M., & Zatorre, R. J. (2008). Depth electrode recordings show double disso­ ciation between pitch processing in lateral Heschl’s gyrus. Experimental Brain Research, 187, 97–105. Schonwiesner, M., & Zatorre, R. J. (2009). Spectro-temporal modulation transfer function of single voxels in the human auditory cortex measured with high-resolution fMRI. Pro­ ceedings of the National Academy of Sciences U S A, 106 (34), 14611–14616. Schreiner, C. E., & Urbas, J. V. (1986). Representation of amplitude modulation in the au­ ditory cortex of the cat. I. Anterior auditory field. Hearing Research, 21, 227–241. Schreiner, C. E., & Urbas, J. V. (1988). Representation of amplitude modulation in the au­ ditory cortex of the cat. II. Comparison between cortical fields. Hearing Research, 32, 49– 64. Shackleton, T. M., & Carlyon, R. P. (1994). 
The role of resolved and unresolved harmonics in pitch perception and frequency modulation discrimination. Journal of the Acoustical So­ ciety of America, 95 (6), 3529–3540. Shamma, S. A., & Klein, D. (2000). The case of the missing pitch templates: How harmon­ ic templates emerge in the early auditory system. Journal of the Acoustical Society of America, 107, 2631–2644. Shannon, R. V., Zeng, F. G., et al. (1995). Speech recognition with primarily temporal cues. Science, 270 (5234), 303–304.

Page 58 of 62

Audition Shera, C. A., Guinan, J. J., et al. (2002). Revised estimates of human cochlear tuning from otoacoustic and behavioral measurements. Proceedings of the National Academy of Sciences U S A, 99 (5), 3318–3323. Shinn-Cunningham, B. G. (2008). Object-based auditory and visual attention. Trends in Cognitive Sciences, 12 (5), 182–186. Singh, N. C., & Theunissen, F. E. (2003). Modulation spectra of natural sounds and etho­ logical theories of auditory processing. Journal of the Acoustical Society of America, 114 (6), 3394–33411. Skoe, E., & Kraus, N. (2010). Auditory brainstem response to complex sounds: A tutorial. Ear and Hearing, 31 (3), 302–324. Smith, E. C., & Lewicki, M. S. (2006). Efficient auditory coding. Nature, 439, 978–982. Smith, Z. M., Delgutte, B., et al. (2002). Chimaeric sounds reveal dichotomies in auditory perception. Nature, 416, 87–90. Snyder, J. S., & Alain, C. (2007). Toward a neurophysiological theory of auditory stream segregation. Psychological Bulletin, 133 (5), 780–799. Stevens, S. S. (1955). The measurement of loudness. Journal of the Acoustical Society of America, 27 (5), 815–829. Stilp, C. E., Alexander, J. M., et al. (2010). Auditory color constancy: Calibration to reli­ able spectral properties across nonspeech context and targets. Attention, Perception, and Psychophysics, 72 (2), 470–480. Sutter, M. L., & Schreiner, C. E. (1991). Physiology and topography of neurons with multi­ peaked tuning curves in cat primary auditory cortex. Journal of Neurophysiology, 65, 1207–1226. Sweet, R. A., Dorph-Petersen, K., et al. (2005). Mapping auditory core, lateral belt, and parabelt cortices in the human superior temporal gyrus. Journal of Comparative Neurolo­ gy, 491, 270–289. Talavage, T. M., Sereno, M. I., et al. (2004). Tonotopic organization in human auditory cor­ tex revealed by progressions of frequency sensitivity. Journal of Neurophysiology, 91, 1282–1296. Tansley, B. W., & Suffield, J. B. (1983). 
Time-course of adaptation and recovery of chan­ nels selectively sensitive to frequency and amplitude modulation. Journal of the Acousti­ cal Society of America, 74, 765–775. Terhardt, E. (1974). Pitch, consonance, and harmony. Journal of the Acoustical Society of America, 55, 1061–1069.

Page 59 of 62

Audition Theunissen, F. E., Sen, K., et al. (2000). Spectral-temporal receptive fields of non-linear auditory neurons obtained using natural sounds. Journal of Neuroscience, 20, 2315–2331. Tian, B., & Rauschecker, J. P. (2004). Processing of frequency-modulated sounds in the lat­ eral auditory belt cortex of the rhesus monkey. Journal of Neurophysiology, 92, 2993– 3013. Tian, B., Reser, D., et al. (2001). Functional specialization in rhesus monkey auditory cor­ tex. Science, 292, 290–293. Ulanovsky, N., Las, L., et al. (2003). Processing of low-probability sounds by cortical neu­ rons. Nature Neuroscience, 6 (4), 391–398. van Noorden, L. P. A. S. (1975). Temporal coherence in the perception of tone sequences. Eindhoven, The Netherlands: The Institute of Perception Research, University of Technol­ ogy. Walker, K. M. M., Bizley, J. K., et al. (2010). Cortical encoding of pitch: Recent results and open questions. Hearing Research, 271 (1–2), 74–87. Wallace, M. N., Anderson, L. A., et al. (2007). Phase-locked responses to pure tones in the auditory thalamus. Journal of Neurophysiology, 98 (4), 1941–1952. Wallach, H., Newman, E. B., et al. (1949). The precedence effect in sound localization. American Journal of Psychology, 42, 315–336. Warren, J. D., Zielinski, B. A., et al. (2002). Perception of sound-source motion by the hu­ man brain. Neuron, 34, 139–148. Warren, R. M. (1970). Perceptual restoration of missing speech sounds. Science, 167, 392–393. Warren, R. M., Obusek, C. J., et al. (1972). Auditory induction: perceptual synthesis of ab­ sent sounds. Science, 176, 1149–1151. Watson, C. S. (1987). Uncertainty, informational masking and the capacity of immediate auditory memory. In W. A. Yost & C. S. Watson (Eds.), Auditory processing of complex sounds (pp. 267–277). Hillsdale, NJ: Erlbaum. Wightman, F. (1973). The pattern-transformation model of pitch. Journal of the Acoustical Society of America, 54, 407–416. Wightman, F., & Kistler, D. J. (1989). 
Headphone simulation of free-field listening. II. Psy­ chophysical validation. Journal of the Acoustical Society of America, 85 (2), 868–878. Winslow, R. L., & Sachs, M. B. (1987). Effect of electrical stimulation of the crossed olivo­ cochlear bundle on auditory nerve response to tones in noise. Journal of Neurophysiology, 57 (4), 1002–1021. Page 60 of 62

Audition Winter, I. M. (2005). The neurophysiology of pitch. In C. J. Plack, A. J. Oxenham, R. R. Fay, & A. J. Popper (Eds.), Pitch—Neural coding and perception (pp. 99–146). New York: Springer-Verlag. Wong, P. C. M., Skoe, E., et al. (2007). Musical experience shapes human brainstem en­ coding of linguistic pitch patterns. Nature Neuroscience, 10 (4), 420–422. Woods, T. M., Lopez, S. E., et al. (2006). Effects of stimulus azimuth and intensity on the single-neuron activity in the auditory cortex of the alert macaque monkey. Journal of Neu­ rophysiology, 96 (6), 3323–3337. Woolley, S. M., Fremouw, T. E., et al. (2005). Tuning for spectro-temporal modulations as a mechanism for auditory discrimination of natural sounds. Nature Neuroscience, 8 (10), 1371–1379. Yates, G. K. (1990). Basilar membrane nonlinearity and its influence on auditory nerve rate-intensity functions. Hearing Research, 50, 145–162. Yin, T. C. T., & Kuwada, S. (2010). Binaural localization cues. In A. Rees & A. R. Palmer (Eds.), The Oxford handbook of auditory science: The auditory brain (pp. 271–302). Ox­ ford, UK: Oxford University Press. Young, E. D. (2010). Level and spectrum. In A. Rees & A. R. Palmer (Eds.), The Oxford handbook of auditory science: The auditory brain (pp. 93–124). Oxford, UK: Oxford Uni­ versity Press. Zahorik, P. (2009). Perceptually relevant parameters for virtual listening simulation of small room acoustics. Journal of the Acoustical Society of America, 126, 776–791. Zahorik, P., Bangayan, P., et al. (2006). Perceptual recalibration in human sound localiza­ tion: Learning to remediate front-back reversals. Journal of the Acoustical Society of America, 120 (1), 343–359. Zahorik, P., & Wightman, F. L. (2001). Loudness constancy with varying sound source dis­ tance. Nature Neuroscience, 4 (1), 78–83. Zatorre, R. J. (1985). Discrimination and recognition of tonal melodies after unilateral cerebral excisions. Neuropsychologia, 23 (1), 31–41. Zatorre, R. J., & Belin, P. (2001). 
Spectral and temporal processing in human auditory cor­ tex. Cerebral Cortex, 11, 946–953. Zatorre, R. J., Belin, P., et al. (2002). Structure and function of auditory cortex: Music and speech. Trends in Cognitive Sciences, 6 (1), 37–46.

Josh H. McDermott


Josh H. McDermott, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA


Neural Correlates of the Development of Speech Perception and Comprehension

Angela Friederici and Claudia Männel
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn
Print Publication Date: Dec 2013
Subject: Psychology, Cognitive Neuroscience
Online Publication Date: Dec 2013
DOI: 10.1093/oxfordhb/9780199988693.013.0009

Abstract and Keywords

The development of auditory language perception proceeds from acoustic features via phonological representations to words and their relations in a sentence. Neurophysiological data indicate that infants discriminate acoustic differences relevant for phoneme categories and word stress patterns by the age of 2 and 4 months, respectively. Salient acoustic cues that mark prosodic phrase boundaries (e.g., pauses) are also perceived at about the age of 5 months, and infants learn about the rules according to which phonemes are legally combined (i.e., phonotactics). At the end of their first year of life, children recognize and produce their first words, and electrophysiological studies suggest that they establish brain mechanisms to gain lexical representations similar to those of adults. In their second year of life, children enlarge their lexicon, and electrophysiological data show that 24-month-olds base their processing of semantic relations in sentences on brain mechanisms comparable to those observable in adults. At this age, children are also able to recognize syntactic errors in a sentence, but it takes until 32 months before they display a brain response pattern to syntactic violations similar to that of adults. The development of comprehension of syntactically complex sentences, such as sentences with a noncanonical word order, however, takes several more years before adult-like processes are established.

Keywords: phonemes, word stress, prosody, phonotactics, lexicon, semantics, syntax

Introduction

Language acquisition, with its remarkable speed and high levels of success, remains a mystery. At birth, infants are able to communicate by crying in different ways. From birth on, infants also distinguish the sound of their native language from that of other languages. Following these first language-related steps, there is a fast progression in the development of perceptive and expressive language skills. At about 4 months, babies start to babble, the earliest stage of language production. A mere 12 months after birth, most babies start to speak their first words, and about half a year later, they can even produce short sentences. Finally, at the end of most children’s third year of life, they have acquired at least 500 words and know how to combine them into meaningful utterances. Thus, they have mastered the entry into their native language: They have acquired a complex system with the typical sounds of a language, these sounds are combined in different ways to make up a wide vocabulary, and the vocabulary items are related to each other by means of syntactic rules.

Although developmental research has delivered increasing knowledge about language acquisition (e.g., Clark, 2003; Szagun, 2006), many questions remain. Studying how children acquire language is not easily accomplished because a great deal of learning takes place before the child is able to speak and to show overt responses to what he or she actually perceives. It is a methodological challenge to develop ways to investigate whether infants know a particular word before they can demonstrate this by producing it. The measurement of infants’ brain responses to language input can help to provide information about speech perception abilities early in life, and, moreover, it allows us to describe the neural basis of language perception and comprehension during development.

Measuring Brain Activity in Early Development

There are several neuroimaging methods that enable the measurement of the brain’s reaction to environmental input such as spoken language. The most frequently used measures in infants and young children are event-related brain potentials (ERPs), as registered with electroencephalography (EEG). ERPs reflect the brain’s electrical activity in response to a particular stimulus with an excellent temporal resolution, thus covering the high-speed and temporally sequenced sensory and cognitive processes. Each time-locked average response typically appears as a waveform with several positive or negative peaks at particular latencies after stimulus onset; and each peak, or component, has a characteristic scalp distribution. Although ERPs deliver restricted spatial information about the component’s distribution in two-dimensional maps, reliable source reconstruction from surface data still poses a methodological problem. The polarity (negative/positive inflection of the waveform relative to baseline), the latency, and the scalp distribution of different components allow us to dissociate perceptual and cognitive processes associated with them. Specifically, changes within the dimensions of the ERP can be interpreted as reflecting a slowing down of a particular cognitive process (reflected in the latency), a reduction in the processing demands or efficiency (amplitude of a positivity or negativity), or alterations/maturation of cortical tissue supporting a particular process (topography). For example, ERP measures allow the investigation of infants’ early ability to detect auditory changes and how the timing and appearance of these perceptual processes vary through the first year (Kushnerenko, Ceponiene, Balan, Fellman, & Näätänen, 2002a).

Only recently, magnetoencephalography (MEG) has started to be used for developmental research. MEG measures the magnetic fields associated with the brain’s electrical activity. Accordingly, this method also captures information processing in the brain with a high temporal resolution. In contrast to EEG, however, it also provides reliable spatial information about the localization of the currents responsible for the magnetic field sources. For example, an MEG study with newborns revealed infants’ instant ability to discriminate speech sounds and, moreover, reliably located this process in the auditory cortices (Kujala et al., 2004). Because movement restrictions limit the use of MEG in developmental research, this method has been primarily applied to sleeping newborns (e.g., Kujala et al., 2004; Sambeth, Ruohio, Alku, Fellman, & Huotilainen, 2008). However, the use of additional head-tracking techniques considerably expands its field of application (e.g., Imada et al., 2006).

A third neuroimaging method, functional near-infrared spectroscopy (fNIRS) or optical topography (OT) (Villringer & Chance, 1997), enables the study of cortical hemodynamic responses in infants. This method relies on the spectroscopic determination of changes in hemoglobin concentrations resulting from increased regional blood flow in the cerebral cortex, which can be assessed through the scalp and skull. Changes in light attenuation at different wavelengths greatly depend on the concentration changes in oxygenated and deoxygenated hemoglobin ([oxy-Hb] and [deoxy-Hb]) in the cerebral cortex. Because hemodynamic responses are only slowly evolving, the temporal resolution of this method is low, but its spatial resolution is relatively informative, depending on the number of channels measured (Obrig & Villringer, 2003; Okamoto et al., 2004; Schroeter, Zysset, Wahl, & von Cramon, 2004). To improve the temporal resolution, event-related NIRS paradigms have been suggested (Gratton & Fabiani, 2001).
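The dependence of light attenuation on hemoglobin concentration described above for fNIRS is commonly formalized with the modified Beer-Lambert law, which the chapter itself does not spell out. The following is a minimal sketch under that assumption: attenuation changes measured at two wavelengths are treated as a 2 x 2 linear system and solved for the changes in [oxy-Hb] and [deoxy-Hb]. The function names, the extinction coefficients, and the effective path lengths are illustrative placeholders, not values taken from the chapter or from any instrument.

```python
def delta_attenuation(d_oxy, d_deoxy, eps_oxy, eps_deoxy, path_length):
    """Forward model (modified Beer-Lambert law, assumed here):
    attenuation change at one wavelength caused by concentration
    changes in oxygenated and deoxygenated hemoglobin."""
    return (eps_oxy * d_oxy + eps_deoxy * d_deoxy) * path_length


def solve_concentration_changes(dA1, dA2, eps1, eps2, path1, path2):
    """Recover (d_oxy, d_deoxy) from attenuation changes at two wavelengths.

    eps1, eps2   : (eps_oxy, eps_deoxy) pairs for wavelengths 1 and 2
                   (hypothetical placeholder coefficients)
    path1, path2 : effective optical path lengths at each wavelength
    """
    a, b = eps1[0] * path1, eps1[1] * path1
    c, d = eps2[0] * path2, eps2[1] * path2
    det = a * d - b * c
    if det == 0:
        raise ValueError("the two wavelengths do not separate oxy-Hb from deoxy-Hb")
    # Cramer's rule for the 2 x 2 system
    d_oxy = (dA1 * d - b * dA2) / det
    d_deoxy = (a * dA2 - dA1 * c) / det
    return d_oxy, d_deoxy
```

Two wavelengths are needed because a single attenuation measurement cannot separate the contributions of the two hemoglobin species; the determinant check guards against wavelength pairs whose coefficients are proportional and therefore carry no independent information.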
The limitations of fNIRS are at the same time its advantages: Its spatial resolution exceeds that of EEG and its temporal resolution is comparable or superior to that of fMRI, so fNIRS simultaneously delivers both kinds of information at moderate resolution. Furthermore, in contrast to MEG and fMRI, it is not subject to movement restrictions and thus seems particularly suitable for infant research. For example, fNIRS was used to locate brain responses to vocal and nonvocal sounds and revealed voice-sensitive areas in the infant brain (Grossmann, Oberecker, Koch, & Friederici, 2010). Another advantage of fNIRS is its compatibility with EEG and MEG measures, which deliver complementary information with high temporal resolution (e.g., Grossmann et al., 2008).

Another method that registers the metabolic demand due to neural signaling is functional magnetic resonance imaging (fMRI). Here, the resulting changes in oxygenated hemoglobin are measured as blood-oxygen-level-dependent (BOLD) contrast. The temporal resolution of this method is considerably lower than with EEG/MEG, but its spatial resolution is excellent. Thus, fMRI studies provide information about the localization of sensory and cognitive processes, not only in surface layers of the cortex as with fNIRS, but also in deeper cortical and subcortical structures. So far, this measurement has been primarily applied with infants while they were asleep in the scanner (e.g., Dehaene-Lambertz, Dehaene, & Hertz-Pannier, 2002; Dehaene-Lambertz et al., 2010; Perani et al., 2010). The limited number of developmental fMRI studies may be due to both practical issues and methodological considerations. Movement restrictions during brain scanning make it difficult to work with children in the scanner. Moreover, there is an ongoing discussion whether the BOLD signal in children is comparable to the one in adults and whether the adult models applied so far are appropriate for developmental research (see, e.g., Muzik, Chugani, Juhasz, Shen, & Chugani, 2000; Rivkin et al., 2004; Schapiro et al., 2004). To address the latter problem, recent fMRI studies in infants and young children have used age-specific brain templates (Dehaene-Lambertz, Dehaene, & Hertz-Pannier, 2002) and optimized template construction for developmental populations (Fonov et al., 2011; Wilke, Holland, Altaye, & Gaser, 2008).

The decision to use one of these neuroimaging techniques in developmental research is thus determined by practical matters and, in addition, is highly dependent on the kind of information sought, that is, the neuronal correlates of information processing in their temporal or spatial resolution. Ideally, various methods using their complementary abilities should be combined because results from the respective measures all add up to provide insight into the brain bases of language development in its early stages.
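As a concrete illustration of the time-locked averaging that underlies the ERP measures introduced at the start of this section, the following minimal sketch cuts epochs around stimulus onsets in a single-channel recording, subtracts a prestimulus baseline, averages across epochs, and reads off the latency and amplitude of a peak. It is a simplified sketch, not the pipeline of any study cited here: the function names, the sample-based epoch window, and the mean-baseline correction are assumptions made for illustration, and real analyses add filtering, artifact rejection, and multiple channels.

```python
from statistics import mean


def average_erp(signal, onsets, pre, post):
    """Average time-locked epochs from a single-channel recording.

    signal    : list of samples (e.g., in microvolts)
    onsets    : sample indices of stimulus onsets
    pre, post : number of samples kept before and after each onset
    Returns the baseline-corrected average waveform (pre + post samples).
    """
    epochs = []
    for onset in onsets:
        if onset - pre < 0 or onset + post > len(signal):
            continue  # skip epochs that run off the recording
        epoch = signal[onset - pre:onset + post]
        # Baseline: mean of the prestimulus interval (assumed convention)
        baseline = mean(epoch[:pre]) if pre > 0 else 0.0
        epochs.append([x - baseline for x in epoch])
    if not epochs:
        raise ValueError("no complete epochs in the recording")
    # Average across epochs, sample by sample
    return [mean(samples) for samples in zip(*epochs)]


def peak(erp, pre, sfreq, polarity=1):
    """Latency (s after onset) and amplitude of the largest post-onset
    deflection of the given polarity (+1 positive, -1 negative)."""
    post_onset = erp[pre:]
    idx = max(range(len(post_onset)), key=lambda i: polarity * post_onset[i])
    return idx / sfreq, post_onset[idx]
```

In this sketch, latency and amplitude correspond to two of the ERP dimensions discussed above; topography would require repeating the computation for every electrode.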

Neural Dispositions of Language in the Infant Brain

For about half a century, developmental researchers have used electrophysiological measures to investigate the neural basis of language development (for reviews, see Csibra, Kushnerenko, & Grossmann, 2008; Friederici, 2005; Kuhl, 2004). Methods that allow a better spatial resolution (i.e., MEG, fNIRS, and fMRI) have only recently been applied in language research with infants and young children (for reviews, see Cheour et al., 2004; Gervain et al., 2011; Leach & Holland, 2010; Lloyd-Fox, Blasi, & Elwell, 2010; Minagawa-Kawai, Mori, Hebden, & Dupoux, 2008; Vannest et al., 2009). Although there are only a few studies that have applied the latter methods with infants so far, their findings strongly point to neural dispositions for language in the infant brain.

In an fNIRS study with sleeping newborns, Pena et al. (2003) observed a left hemispheric dominance in temporal areas for normal speech compared with backward speech. In an fMRI experiment with 3-month-olds, Dehaene-Lambertz, Dehaene, and Hertz-Pannier (2002) also found that speech sounds, collapsed across forward and backward speech, compared with silence evoked strong left hemispheric activation in the superior temporal gyrus. This activation included Heschl’s gyrus and extended to the superior temporal sulcus and the temporal pole (Figure 9.1). Activation differences between forward and backward speech were observed in the left angular gyrus and the precuneus. An additional right frontal brain activation occurred only in infants that were awake and was interpreted as reflecting attentional factors.

In a second analysis of the fMRI data from 3-month-olds, Dehaene-Lambertz et al. (2006) found that the temporal sequence of left hemispheric activations in the different brain areas was similar to adult patterns, with the activation in the auditory cortex preceding activation both in the most posterior and anterior parts of the temporal cortex and in Broca’s area. The reported early left hemisphere specialization has been shown to be speech-specific (Dehaene-Lambertz et al., 2010) and, in addition, appears more pronounced for native relative to non-native language input (Minagawa-Kawai et al., 2011). Specifically, 2-month-olds showed stronger fMRI activation in the left posterior temporal lobe in response to language than music stimuli (Dehaene-Lambertz et al., 2010). In an fNIRS study with 4-month-olds, Minagawa-Kawai et al. (2011) reported stronger responses in left temporal regions for speech relative to three nonspeech conditions, with native stimuli revealing stronger lateralization patterns than non-native stimuli. Left hemispheric superior temporal activation has also been reported as a discriminative response to syllables in a recent MEG study with neonates, 6-month-olds, and 12-month-olds (Imada et al., 2006). Interestingly, 6- and 12-month-olds additionally showed activation patterns in inferior frontal regions, whereas newborns did not. Together with the results in 3-month-olds (Dehaene-Lambertz et al., 2006), these findings indicate developmental changes in motor speech areas (Broca’s area) at an age when infants produce their first syllable sequences and words.

Figure 9.1 Neural disposition for language in the infant brain. Brain activation of 3-month-old infants in response to speech sounds. A, Averaged brain activation in response to speech sounds (forward speech and backward speech versus rest). B, left panel, Averaged brain activation in awake infants (forward speech versus backward speech). Right panel, Averaged hemodynamic responses to forward speech and backward speech in awake and asleep infants. L, left hemisphere; R, right hemisphere. Reprinted with permission from Dehaene-Lambertz, Dehaene, & Hertz-Pannier, 2002. Copyright © 2002, American Association for the Advancement of Science.


Thus, it appears that early in development, there is a dominance of the left hemisphere for the processing of speech, and particularly native language stimuli, in which both segmental and suprasegmental information is intact, compared with, for example, backward speech, in which this information is altered. However, the processing of more fine-grained phonological features and language-specific contrasts may lateralize later in development, once children have advanced from general perceptual abilities to the attunement to their native language (e.g., Minagawa-Kawai, Mori, Naoi, & Kojima, 2007).

In adults, processing of sentence-level prosody (i.e., suprasegmental information) has been shown to predominantly recruit brain areas in the right hemisphere (Meyer, Alter, Friederici, Lohmann, & von Cramon, 2002; Meyer, Steinhauer, Alter, Friederici, & von Cramon, 2004). To investigate the brain basis for prosodic processes in infancy, Sambeth et al. (2008) used MEG and presented sleeping newborns with varying degrees of prosodic information. For normal continuous speech and singing, infants showed pronounced bilateral brain responses, which, however, dramatically decreased when infants were presented with filtered low-prosody speech. Similarly, in two fNIRS studies with newborns, Saito and colleagues observed, first, increased bilateral frontal responses to infant-directed compared with adult-directed speech, with the former featuring more pronounced prosodic information (Saito et al., 2007a). Second, infants only showed this frontal activation pattern for speech with normal prosody, but not for speech with flat prosody (Saito et al., 2007b). An fNIRS study with 3-month-olds, which directly compared normal and flattened speech, revealed activation differences in the right temporal-parietal cortex, suggesting a right hemispheric dominance for the processing of sentential prosody (pitch information) similar to the dominance reported in adults (Homae, Watanabe, Nakano, Asakawa, & Taga, 2006). Surprisingly, at 10 months, infants showed another activation pattern, with flattened speech evoking stronger responses than normal speech in right temporal and temporal-parietal regions and bilateral prefrontal regions, which the authors explained by the additional effort to process unfamiliar pitch contours in brain regions specialized for prosodic information processing and attention allocation (Homae, Watanabe, Nakano, Asakawa, & Taga, 2007). Thus, the combined data on infant prosodic processing suggest that infants are sensitive to prosodic information from early on, but that the brain activation develops from a bilateral toward a more right lateralized pattern.

Given these early neuroimaging findings, it seems that the functional neural network on which language is based, with a left-hemispheric dominance for speech over nonspeech and right-hemispheric dominance for prosody (Friederici & Alter, 2004; Hickok & Poeppel, 2007), is, in principle, established during the first 3 months of life. However, it appears that neither the specialization of the language-related areas (e.g., Brauer & Friederici, 2007; Minagawa-Kawai, Mori, Naoi, & Kojima, 2007) nor all of their structural connections are fully developed from early on (Brauer, Anwander, & Friederici, 2011; Dubois et al., 2008).


Developmental Stages in Language Acquisition and Their Associated Neural Correlates

The development of auditory language perception proceeds from acoustic features via phonological representations to the representation of words and their relations in a sentence. From a schematic point of view, two parallel developmental paths can be considered: one proceeding from acoustic cues to phonemes, and then to words and their meanings, and the other proceeding from acoustic cues to prosodic phrases, to syntactic phrases and their relations.

With respect to the first developmental path, neurophysiological data indicate that acoustic differences in phonemes and word stress patterns are detected by the age of 2 to 4 months. At the end of their first year, children recognize and produce their first words, and ERP studies suggest that infants have established brain mechanisms necessary to acquire lexical representations in a similar way to adults, although these are still less specific. In their second year, children enlarge their lexicon, and ERP data show that 24-month-olds process semantic relations between nouns and verbs in sentences. These processes resemble those in adults, indicated by children at this age already displaying an adult-like N400 component reflecting semantic processes.

With respect to the second developmental path, developmental studies show that salient acoustic cues which mark prosodic phrase boundaries (e.g., pauses) are also perceived at about the age of 5 months, although it takes some time before less salient cues (e.g., changes in the pitch contour) can be used to identify prosodic boundaries that divide the speech stream into lexical and syntactic units. Electrophysiological data suggest that the processing of prosodic phrase structure, reflected by the closure positive shift (CPS), evolves with the emerging ability to process syntactic phrase structure.
At the age of 2 years, children are able to recognize syntactic errors in a sentence, reflected by the P600 component. However, the fast automatic syntactic phrase structure building processes, indicated by the early left anterior negativity (ELAN) in addition to the P600, do not occur before the age of 32 months. The ability to comprehend syntactically complex sentences, such as those with noncanonical word order (e.g., passive sentences, object-first sentences), takes a few more years until adult-like processes are established. Diffusion-tensor imaging data suggest that this progression is dependent on the maturation of the fiber bundles that connect the language-relevant brain areas in the inferior frontal gyrus (Broca’s area) and in the superior temporal gyrus (posterior portion).

Figure 9.2 gives an overview of the outlined developmental steps in language acquisition, and the following sections will describe the related empirical evidence in more detail.

From Sounds to Words


Figure 9.2 Developmental stages of language acquisition. Developmental stages are specified in the top row and their associated ERP components in the bottom row. MMN, mismatch negativity. Modified from Friederici, 2005; Männel & Friederici, 2008.

On the path from sounds to words, infants initially start to process phonological information that makes up the actual speech sounds (phonemes) and the rules according to which these sounds are combined (phonotactic rules). Soon, they process prosodic stress patterns of words, which help them to recognize lexical units in the speech stream. These information types are accessed before the processing of word meaning.

Phoneme Characteristics

As one crucial step in language acquisition, infants have to tackle the basic speech sounds of their native language. The smallest sound units of a language, phonemes, contrast with each other, although functionally equivalent. In a given language, a circumscribed set of approximately 40 phonemes can be combined in different ways to form unique words. Thus, the meaning of a word changes when one of its component phonemes is exchanged with another, as in the change from cat to pat.

Electrophysiological studies investigated phoneme discrimination using the mismatch paradigm. In this paradigm, two classes of stimuli are repeatedly presented, with one stimulus occurring relatively frequently (standard) and the other one relatively rarely (deviant). The mismatch negativity (MMN) component is a preattentive electrophysiological response that is evoked by any discriminable change in repetitive auditory stimulation (Näätänen, 1990). Thus, the mismatch response (MMR) in the ERP is the result of the brain’s automatic detection of the deviant among the standards. Several ERP studies have examined phoneme discrimination in infants and reported that the ability to detect acoustic changes in consonant articulation (Dehaene-Lambertz & Dehaene, 1994), consonant duration (Kushnerenko et al., 2001), vowel duration (Leppänen, Pihko, Eklund, & Lyytinen, 1999; Pihko et al., 1999), and vowel type (Cheour et al., 1998) is present between 2 and 4 months of age.

For example, Friederici, Friedrich, and Weber (2002) investigated infants’ ability to discriminate between different vowel lengths in phonemes at the age of 2 months. Infants were presented with two syllables of different duration, /ba:/ (baa) versus /ba/ (ba), in an MMR paradigm. Two separate sessions tested the long syllable /ba:/ (baa) as deviant in a stream of short syllable /ba/ (ba) standards, and short /ba/ (ba) as deviant in a stream of long /ba:/ (baa) standards.
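The logic of the mismatch paradigm can be made concrete with a short sketch. The Python code below is an illustration under stated assumptions, not the procedure of any of the cited studies: the deviant probability of 0.15, the minimum of two standards before each deviant, the toy amplitude values, and the function names are all hypothetical choices. It builds a pseudorandom sequence of standards and deviants and computes a difference wave by subtracting the averaged standard response from the averaged deviant response.

```python
import random


def make_oddball_sequence(n_trials, deviant_prob=0.15, min_gap=2, seed=0):
    """Return a list of 'standard'/'deviant' labels for one oddball block.

    A deviant is only allowed after at least `min_gap` standards, so that,
    with the default setting, deviants stay rare and never occur back to back.
    Both parameters are assumed values chosen for illustration.
    """
    rng = random.Random(seed)
    sequence = []
    standards_in_a_row = 0
    for _ in range(n_trials):
        if standards_in_a_row >= min_gap and rng.random() < deviant_prob:
            sequence.append("deviant")
            standards_in_a_row = 0
        else:
            sequence.append("standard")
            standards_in_a_row += 1
    return sequence


def average_epochs(epochs):
    """Average equally long epochs sample by sample (one value per time point)."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]


def difference_wave(epochs, labels):
    """Averaged deviant response minus averaged standard response."""
    deviant = average_epochs([e for e, l in zip(epochs, labels) if l == "deviant"])
    standard = average_epochs([e for e, l in zip(epochs, labels) if l == "standard"])
    return [d - s for d, s in zip(deviant, standard)]


# Toy data: four two-sample epochs (invented amplitudes in microvolts).
toy_labels = ["standard", "deviant", "standard", "deviant"]
toy_epochs = [[0.0, 1.0], [1.0, 2.0], [0.0, 1.0], [3.0, 4.0]]
toy_difference = difference_wave(toy_epochs, toy_labels)  # [2.0, 2.0]
```

In this toy case the difference wave is positive at both time points, which corresponds to a positive MMR; a negative difference would correspond to the adult-like MMN.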

Figure 9.3 Syllable discrimination. ERP responses of 2-month-old infants to standard and deviant syllables and difference waves (deviant-standard) for the long syllable /ba:/ and the short syllable /ba/ in an auditory oddball paradigm. Modified from Friederici, Friedrich, & Weber, 2002.

In Figure 9.3, the ERP difference waves display a positivity with a frontal maximum at about 500 ms post-syllable onset for deviant processing. However, this positivity was only present for the deviancy detection of the long syllable in a stream of short syllables but not vice versa, which can be explained by the greater perceptual saliency of a larger element in the context of smaller elements. In adults, the same experimental setting evoked a pronounced negative deflection at about 200 ms post-stimulus onset in the difference wave, the typical MMN response to acoustically deviating stimuli. Interestingly, in infants, the response varied depending on their state of alertness; children who were in quiet sleep during the experiment showed only a positivity, while children who were awake showed an adult-like MMN in addition to the positivity. From the data, it follows that infants as young as 2 months of age are able to discriminate long syllables from short syllables and that they display a positivity in the ERP as MMR.

Interestingly, language-specific phonemic discrimination is established only later during infants’ development, between the age of 6 and 12 months. Electrophysiological evidence revealed that younger infants aged 6 and 7 months show discrimination of phonemic contrasts that are either relevant or not relevant for their native language, whereas older infants aged 11 and 12 months only display discrimination of the phonemic contrast in their native language (Cheour et al., 1998; Rivera-Gaxiola, Silva-Pereyra, & Kuhl, 2005). Similarly, Minagawa-Kawai, Mori, Naoi, and Kojima (2007) showed in an fNIRS study that infants tune into their native language-specific phoneme contrast at about the age of 6 to 7 months. However, the left dominance of the phoneme-specific response in the temporal regions was observed only in infants aged 13 months and older.
These results suggest that phoneme contrasts are initially processed as acoustic rather than linguistic differences until about 12 months, when left hemisphere regions are recruited as in adults (Minagawa-Kawai, Mori, Furuya, Hayashi, & Sato, 2002).

In infant studies using the mismatch paradigm, the MMR can appear as either a positive or a negative deflection in the ERP. For example, Kushnerenko et al. (2001) presented sleeping newborns with fricatives of different durations and observed negative MMRs, whereas Leppänen et al. (1999) and Pihko et al. (1999) reported positive MMRs in sleeping newborns for syllables with different vowel length. The outcome of ERP responses to auditory change detection seems to be affected by several factors, for example, the infants’ state of alertness (awake or asleep). Furthermore, stimulus discrimination difficulty or saliency seems to have an impact on the discrimination response (Friederici, Friedrich, & Weber, 2002; Morr, Shafer, Kreuzer, & Kurtzberg, 2002). Also, the transition from a positive to a negative MMR has been shown to be an effect of advancing maturation (Kushnerenko et al., 2002b; Morr et al., 2002; Trainor et al., 2003). Despite the differences in the ERP morphology of the detection of phoneme changes, the combined data suggest that infants’ ability to automatically discriminate between different phonemes is present from early on.

Word Stress

Another important phonological feature that infants have to discover and make use of during language acquisition is the rule according to which stress is applied to multisyllabic words. For example, English, like German, is a stress-based language and has a bias toward a stress-initial pattern for bisyllabic words (Cutler & Carter, 1987). French, in contrast, is a syllable-based language that tends to lengthen the word’s last syllable (Nazzi, Iakimova, Bertoncini, Frédonie, & Alcantara, 2006).

Behaviorally, it has been shown that even newborns discriminate differently stressed pseudowords (Sansavini, Bertoncini, & Giovanelli, 1997) and that between 6 and 9 months, infants acquire language-specific knowledge about the stress pattern of possible word candidates (Jusczyk, Cutler, & Redanz, 1993; Höhle, Bijeljac-Babic, Nazzi, Herold, & Weissenborn, 2009; Skoruppa et al., 2009). Interestingly, studies revealed that stress pattern discrimination at 6 months is shaped by language experience, as German-learning but not French-learning infants distinguish between stress-initial and stress-final pseudowords (Höhle et al., 2009). Similarly, at 9 months, Spanish-learning but not French-learning infants show discrimination responses, suggesting that French infants, although they are sensitive to the acoustic differences, do not treat stress as lexically informative (Skoruppa et al., 2009).

Neuroimaging studies that do not require infants’ attention during testing suggest that infants are sensitive to the predominant stress pattern of their target language as early as 4 to 5 months of age (Friederici, Friedrich, & Christophe, 2007; Weber, Hahne, Friedrich, & Friederici, 2004). In an ERP study, Friederici, Friedrich, and Christophe (2007) tested two groups of 4- to 5-month-old German- and French-learning infants for their ability to discriminate between different stress patterns.
In a mismatch paradigm, the standard stimuli were bisyllabic pseudowords with stress on the first syllable (baaba), whereas the deviant stimuli had stress on the second syllable (babaa). The data revealed that both groups are able to discriminate between the two types of stress patterns (Figure 9.4). However, results differed with respect to the amplitude of the MMR: Infants learning German showed a larger effect for the language-nontypical iambic pattern (stress on the second syllable), whereas infants learning French demonstrated a larger effect for the language-nontypical trochaic pattern (stress on the first syllable). These results suggest that the respective nontypical stress pattern is considered deviant both within the experiment (i.e., rare stimulus in the set) and with respect to an individual infant’s native language. This finding, in turn, presupposes that infants have established knowledge about the predominant stress pattern of their target language by the age of 5 months.

As behavioral and electrophysiological developmental studies suggest, early syllable identification and stress pattern discrimination support speech segmentation during later acquisition stages, performed by identifying onset and offset boundaries. Accordingly, in a number of behavioral experiments, Nazzi, Dilley, Jusczyk, Shattuck-Hufnagel, and Jusczyk (2005) demonstrated that both the type of initial phoneme and the stress pattern influence word segmentation from fluent speech, with a preference for the predominant patterns of the infants’ native language. Similarly, infants’ word detection is facilitated when words occur at boundary positions and are thus marked by additional prosodic information (Gout, Christophe, & Morgan, 2004; Seidl & Johnson, 2007). Moreover, several studies that measured children’s later language outcome in lexical-semantic and syntactic domains revealed the predictive value of infants’ early ERP responses to phoneme and stress pattern contrasts (Friedrich & Friederici, 2010; Kuhl et al., 2008; Tsao, Liu, & Kuhl, 2004).

Regarding the development of word segmentation abilities, behavioral studies have demonstrated that at the age of 7.5 months, infants learning English are able to segment bisyllabic words with stress on the first syllable from continuous speech but not those with stress on the second syllable (Jusczyk, Houston, & Newsome, 1999).
Segmentation of stress-initial words was also reported in 9-month-old Dutch-learning infants for both native and nonnative words, which, however, all followed the same language-specific stress pattern rules (Houston, Jusczyk, Kuijpers, Coolen, & Cutler, 2000; Kuijpers, Coolen, Houston, & Cutler, 1998). In contrast, the ability to segment words with stress on the second syllable was only observed at the age of 10.5 months in English-learning infants (Jusczyk, Houston, & Newsome, 1999). For French-learning infants, Nazzi et al. (2006) found developmental differences between 8 and 16 months for the detection of syllables and bisyllabic words in fluent speech. Bisyllabic words as one unit are only detected at the age of 16 months. Although no segmentation effect was found for 8-month-olds, 12-month-olds segmented individual syllables from the speech stream, with more ease in segmenting the second syllable, which is consistent with the rhythmic features of French.


Figure 9.4 Word stress. ERP responses of 4- to 5-month-olds to standard and deviant stress patterns in an auditory oddball paradigm. A, Grand-average ERPs for French infants with the trochaic stress pattern /baaba/ as standard and deviant (upper row) and the iambic stress pattern /babaa/ as standard and deviant (lower row). B, Grand-average ERPs for German infants with the trochaic stress pattern /baaba/ as standard and deviant (upper row) and the iambic stress pattern /babaa/ as standard and deviant (lower row). Reprinted with permission from Friederici, Friedrich, & Christophe, 2007.

Electrophysiological studies on word segmentation revealed word recognition responses for Dutch-learning 7-month-olds in the ERP to previously familiarized words (Kooijman, Johnson, & Cutler, 2008), whereas behavioral studies observed word segmentation for 9-month-olds, but not yet for 7.5-month-olds (Kuijpers, Coolen, Houston, & Cutler, 1998). Interestingly, detection of words in sentences by German-learning infants was observed even at 6 months, when words were particularly emphasized prosodically during familiarization (Männel & Friederici, 2010). For the segmentation of the less familiar final-stress pattern, Dutch-learning 10-month-olds still largely relied on the strong syllable to launch words (Kooijman, Hagoort, & Cutler, 2009). Similarly, relating to the behavioral delay of French-learning infants in bisyllabic word segmentation, Goyet, de Schonen, and Nazzi (2010) found for French 12-month-olds ERP responses to bisyllabic stress-final words that revealed both whole word and syllable segmentation.

Phonotactics

For successful language learning, infants eventually need to acquire the rules according to which phonemes may be combined to form a word in a given language. As infants become more familiar with actual speech sounds, they gain probabilistic knowledge about particular phonotactic rules. This also includes information about which phonemes or phoneme combinations can legitimately appear at word onsets and offsets. If infants acquire this kind of information early on, it can support the detection of lexical units in continuous speech and thus facilitate the learning of new words.

Behaviorally, it has been shown that phonotactic knowledge about word onsets and offsets is present and used for speech segmentation at the age of 9 months, but is not yet present at 6 months of age (Friederici & Wessels, 1993; Jusczyk, Friederici, Wessels, Svenkerud, & Jusczyk, 1993). In ERP studies, the N400 component can serve as an electrophysiological marker for studying phonotactic knowledge by enabling the comparison of brain responses to pseudowords that follow the phonotactic rules of a given language and nonwords that do not. The N400 component is known to indicate lexical (word form) and semantic (meaning) processes and is interpreted to mark the effort to integrate an event into its current or long-term context, with more pronounced N400 amplitudes indicating lexically and semantically unfamiliar or unexpected events (Holcomb, 1993; Kutas & Van Petten, 1994). Regarding phonotactic processing in adults, ERP studies revealed larger N400 amplitudes for pseudowords (i.e., phonotactically legal but nonexistent in the lexicon) than for real words. In contrast, nonwords (i.e., phonotactically illegal words) did not elicit an N400 response (e.g., Bentin, Mouchetant-Rostaing, Giard, Echallier, & Pernier, 1999; Holcomb, 1993; Nobre & McCarthy, 1994). This suggests that pseudowords trigger search processes for possible lexical entries, but this search fails because pseudowords do not exist in the lexicon. Nonwords, in contrast, do not initiate a similar search response because they are not even treated as possible lexical entries as they already violate the phonotactic rules.
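The three-way distinction among real words, pseudowords, and nonwords can be illustrated with a toy classifier. The sketch below is a deliberate simplification and makes several assumptions that do not come from the studies cited: letters stand in for phonemes, the onset and offset inventories are small invented samples rather than a description of any language, and the lexicon contains only three items. A string counts as a word if it is in the lexicon, as a pseudoword if its onset and offset are both legal, and as a nonword otherwise.

```python
# All inventories below are hypothetical toy data, not a real phonology.
VOWELS = set("aeiou")
LEGAL_ONSETS = {"", "b", "bl", "br", "d", "dr", "k", "kl", "kr",
                "p", "pl", "pr", "s", "st", "str", "t", "tr"}
LEGAL_OFFSETS = {"", "k", "lk", "lt", "n", "nd", "nt", "p", "st", "t"}
LEXICON = {"trap", "blast", "stand"}


def split_edges(word):
    """Return the consonants before the first vowel and after the last vowel."""
    first_vowel = next((i for i, ch in enumerate(word) if ch in VOWELS), len(word))
    last_vowel = max((i for i, ch in enumerate(word) if ch in VOWELS), default=-1)
    return word[:first_vowel], word[last_vowel + 1:]


def classify(word):
    """Label a string as 'word', 'pseudoword' (legal), or 'nonword' (illegal)."""
    if word in LEXICON:
        return "word"
    onset, offset = split_edges(word)
    if onset in LEGAL_ONSETS and offset in LEGAL_OFFSETS:
        return "pseudoword"
    return "nonword"


labels = {w: classify(w) for w in ["trap", "blat", "tlap"]}
# trap is in the toy lexicon, blat has a legal onset (bl) and offset (t),
# and tlap starts with tl, which is absent from the toy onset inventory.
```

On this view, only strings that pass the onset and offset check are candidates for a lexical search, which parallels the account given above of why pseudowords, but not nonwords, elicit an N400.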

Figure 9.5 Phonotactic rules. ERP data of 12-month-olds, 19-month-olds, and adults in response to phonotactically illegal nonwords and phonotactically legal pseudowords in a picture–word priming paradigm. Modified from Friedrich & Friederici, 2005a.

In a developmental ERP study, Friedrich and Friederici (2005a) investigated phonotactic knowledge in 12- and 19-month-old toddlers by measuring brain responses to phonotactically legal pseudowords and phonotactically illegal nonwords. In a picture–word priming paradigm, children were presented with simple colored pictures while simultaneously listening to words that either correctly labeled the picture content or were pseudowords or nonwords. The picture content is assumed to initiate lexical-semantic priming, which results in semantic integration difficulties when the respective labels do not match the pictures, reflected in enhanced N400 amplitudes. As Figure 9.5 illustrates, the ERP responses of 19-month-olds are quite similar to the ones observed in adults because they demonstrate more negative responses to phonotactically legal pseudowords than to phonotactically illegal nonwords. Adults show the typical N400 response to pseudowords, starting at about 400 ms after stimulus onset, whereas in 19-month-olds, the negative deflection to pseudowords is sustained longer. In contrast, data of 12-month-olds do not reveal differential ERP responses to pseudowords and nonwords. From these data it follows that, in contrast to 12-month-olds, who do not yet have this ability, 19-month-olds possess some phonotactic knowledge (indicated by an N400-like response) and therefore treat pseudowords, but not nonwords, as likely referents for picture labels. This implies that nonwords that do not follow the language-specific phonotactic rules are not considered word candidates and, from very early on, are excluded from further word learning.

Phonological Familiarity

Infants’ emerging efforts to map sounds onto objects (or pictures of objects) have been captured in an additional ERP effect. An ERP study with 11-month-olds suggested a differential brain response to known compared with unknown words in the form of a negativity at about 200 ms after word onset, which could be viewed as a familiarity effect (Thierry, Vihman, & Roberts, 2003). Using a picture–word priming paradigm with 12- and 14-month-olds, Friedrich and Friederici (2005b) observed an early frontocentral negativity between 100 and 400 ms for auditory word targets that matched the picture compared with nonmatching words. This early effect was interpreted as a familiarity effect reflecting the fulfillment of a phonological (word) expectation after seeing the picture of an object. At this age, infants seem to have some lexical knowledge, but the specific word form referring to a given object may not yet be sharply defined. This interpretation is supported by the finding that 14-month-olds show an ERP difference between known words and phonetically dissimilar nonsense words, but not between known words and phonetically similar words (Mills et al., 2004). The available data thus indicate that phonological information and semantic knowledge interact at about 14 months of age.

Word Meaning

As described above, the adult N400 component reflects the integration of a lexical element into a semantic context (Holcomb, 1993; Kutas & Van Petten, 1994) and can be used as an ERP template against which the ERPs for lexical-semantic processing during early development are compared.

Mills, Coffey-Corina, and Neville (1997) investigated infants’ processing of words whose meaning they knew or did not know. Infants between 13 and 17 months of age showed a bilateral negativity for unknown words, whereas 20-month-olds showed a left-hemispheric negativity, which was interpreted as a developmental change toward a hemispheric specialization for word processing (see also Mills et al., 2004). In a more recent ERP study, Mills and colleagues tested the effect of word experience (training) and vocabulary size (word production) on lexical processing (Mills, Plunkett, Prat, & Schafer, 2005). In this word-learning paradigm, 20-month-olds acquired novel words either paired with a novel object or without an object. After training, the infant ERPs showed a repetition effect indicated by a reduced N200-500 amplitude to familiar and novel unpaired words, whereas an increased bilaterally distributed N200-500 was found for novel paired words. This finding indicates that the N200-500 is linked to word meaning; however, it is not entirely clear whether the N200-500 reflects semantic processes alone or whether phonological familiarity also plays a role. Assuming that this early ERP effect indeed reflects semantic processing, its early onset may be explained by infants’ relatively small vocabularies. A small vocabulary results in a low number of phonologically possible alternative word forms, allowing the brain to respond soon after hearing a word’s first phonemes (see earlier section on phonological familiarity).

A clear semantic-context N400 effect at the word level has been observed for 14- and 19-month-olds, but not yet for 12-month-olds (Friedrich & Friederici, 2004, 2005a, 2005b). The ERP to words in picture contexts showed a central-parietal, bilaterally distributed negative-going wave between 400 and 1400 ms, which was more negative for words that did not match the picture context than for those that did (Figure 9.6). Compared with adults, this N400-like effect reached significance later and lasted longer, which suggests slower lexical-semantic processes in children. There were also small topographic differences of the effect because children showed a stronger involvement of frontal electrode sites than adults. The more frontal distribution could either mean that children’s semantic processes are still more image-based (see frontal distribution in adults for picture instead of word processing; West & Holcomb, 2002) or that children recruit frontal brain regions that, in adults, are associated with attention (Courchesne, 1990) and increased demands on language processing (Brauer & Friederici, 2007). In a recent study, Friedrich and Friederici (2010) found that the emergence of the N400 is not merely age dependent but also related to the infants’ state of behavioral language development. Twelve-month-olds who obtained a high early word production score already displayed an N400 semantic priming effect, whereas infants with lower vocabulary rates did not.
Torkildsen and colleagues examined lexical-semantic processes as indicated by the N400 in 2-year-olds. In the first study, using a picture–word priming paradigm, they found that 20-month-olds showed earlier and larger N400 effects for between-category than within-category violations, pointing to children’s early knowledge about semantic categories (Torkildsen et al., 2006). In the second study, the authors used a unimodal lexical-semantic priming paradigm with semantically related and unrelated word pairs, and demonstrated that 24-month-olds reveal a phonological-lexical familiarity effect for related word pairs and an N400 effect for unrelated word pairs, suggesting that semantic relatedness priming is functional at the end of children’s second year (Torkildsen et al., 2007).

There are few fMRI studies with children investigating lexical-semantic processes at the word level. These fMRI studies suggest that a neural network for the processing of words and their meaning often seen in adults is established by the age of 5 years. For example, one study used a semantic categorization task with 5- to 10-year-old children and observed activation in the left inferior frontal gyrus and the temporal region as well as in the left fusiform gyrus, suggesting a left hemispheric language network similar to that in adults (Balsamo, Xu, & Gaillard, 2006). Another study used a semantic judgment task requiring the evaluation of the semantic relatedness of two auditorily presented words (Chou et al., 2006). During this task, 9- to 15-year-olds showed activation in the temporal gyrus and in the inferior frontal gyri bilaterally. In a recent fMRI language mapping study with 8- to 17-year-old children, de Guibert et al. (2010) applied two auditory lexical-semantic tasks and two visual phonological tasks and observed selective activations in left frontal and temporal regions.

Figure 9.6 Word meaning. ERP data and difference maps (nonmatching–matching) of 12-month-olds, 14-month-olds, 19-month-olds, and adults in response to words matching or not matching the picture content in a picture–word priming paradigm. Modified from Friedrich & Friederici, 2005a, 2005b.
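A match/mismatch effect of the kind shown in Figure 9.6 is often summarized as a difference in mean amplitude within a time window. The following Python sketch shows one simple way to compute such a value, under assumptions that are ours rather than the cited authors': the 400 to 1400 ms default window is borrowed from the description of the infant effect above, while the sample times, the amplitude values, and the function names are invented for illustration.

```python
def mean_amplitude(wave, times_ms, start_ms, end_ms):
    """Mean of the samples whose time stamps fall inside [start_ms, end_ms]."""
    window = [v for v, t in zip(wave, times_ms) if start_ms <= t <= end_ms]
    if not window:
        raise ValueError("no samples inside the requested time window")
    return sum(window) / len(window)


def n400_effect(nonmatching, matching, times_ms, start_ms=400, end_ms=1400):
    """Nonmatching minus matching mean amplitude in the window.

    A negative value means the nonmatching condition was more negative,
    which is the direction expected for an N400-like effect.
    """
    return (mean_amplitude(nonmatching, times_ms, start_ms, end_ms)
            - mean_amplitude(matching, times_ms, start_ms, end_ms))


# Toy averaged ERPs (invented amplitudes in microvolts) sampled every 200 ms.
times_ms = [0, 200, 400, 600, 800]
matching = [0.0, 0.0, -1.0, -1.0, -1.0]
nonmatching = [0.0, 0.0, -3.0, -3.0, -3.0]
effect = n400_effect(nonmatching, matching, times_ms)  # -2.0
```

Shifting `start_ms` and `end_ms` is one way to express the later onset and longer duration of the infant effect relative to the adult N400, although the appropriate window for a given data set is an empirical question.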

In summary, in the developmental ERP studies on semantic processing at the word level we have introduced, two ERP effects have been observed. First, an early negativity in response to picture-matching words has been found even in 12-month-olds and can be interpreted as a phonological familiarity effect. Second, a later central-parietal negativity for nonmatching words has been observed in 14- and 19-month-olds, an effect referred to as infant N400. The occurrence of a phonological familiarity effect across all age groups suggests that not only 14- and 19-month-olds but also 12-month-olds create lexical expectations from picture contents, revealing that they already possess some lexical-semantic knowledge. However, 12-month-olds do not yet display the N400 semantic expectancy violation effect present in 14-month-olds, which indicates that the neural mechanisms of the N400 mature between 12 and 14 months of age. Furthermore, at the end of their second year, toddlers are sensitive to semantic category relations and semantic relatedness of basic-level words. The finding that the N400 at this age still differs in latency and distribution from the adult N400 suggests that the underlying brain systems are still under development. The fact, however, that an N400 effect is present at this age implies that this ERP component is a useful tool to further investigate semantic processing in young children. In this context, developmental fMRI studies have revealed left-hemisphere activation patterns for lexical-semantic processes that resemble those of adults. Direct developmental comparisons, however, suggest age-related activation increases in left inferior frontal regions and left superior temporal regions, indicating greater lexical control and experience-based gain in lexical representations, respectively (Booth et al., 2004; Cao et al., 2010; Schlaggar et al., 2002).


From Sounds to Sentences

On the path from sounds to sentences, prosodic information plays a central role. Sentential prosody is crucial for the acquisition of syntactic structure because different acoustic cues that in combination mark prosodic phrase boundaries often signal syntactic phrase boundaries. The detection and processing of prosodic phrase boundaries thus facilitate the segmentation of linguistically relevant units from continuous speech and provide an easy entry into later lexical and syntactic learning (see Gleitman & Wanner, 1982).

Sentence-Level Prosody

Intonational phrase boundaries (IPBs) mark the largest units in phrasal prosody, roughly corresponding to syntactic clauses, and are characterized by several acoustic cues, namely, preboundary lengthening, pitch change, and pausing (Selkirk, 1984). Behaviorally, it has been shown that adult listeners make use of prosodic boundaries in the interpretation of spoken utterances (e.g., Schafer, Speer, Warren, & White, 2000). Similarly, developmental studies indicate that infants perceive larger linguistic units in continuous speech based on prosodic boundary cues. Although 6-month-old English-learning infants detect clauses in continuous speech, they cannot yet reliably identify syntactic phrases in continuous speech (Nazzi, Kemler Nelson, Jusczyk, & Jusczyk, 2000; Seidl, 2007; Soderstrom, Nelson, & Jusczyk, 2005; Soderstrom, Seidl, Nelson, & Jusczyk, 2003). In contrast, 9-month-olds demonstrate this ability at both clause and phrase level (Soderstrom et al., 2003). Thus, the perception of prosodic cues that, in combination, signal boundaries appears to be essential for the structuring of the incoming speech signal and enables further speech analyses.

In adult ERP studies, the offset of IPBs is associated with a positive-going deflection with a central-parietal distribution, the CPS (Pannekamp, Toepel, Alter, Hahne, & Friederici, 2005; Steinhauer, Alter, & Friederici, 1999). This component has been interpreted as an indicator of the closure of prosodic phrases by IPBs. The CPS has been shown to be not a mere reaction to the acoustically salient pause (lower-level processing), but rather an index for the underlying linguistic process of prosodic structure perception (higher-level processing) because it is still present when the pause is deleted (Steinhauer, Alter, & Friederici, 1999).
To investigate the electrophysiology underlying prosodic processing at early stages of lan­ guage acquisition, a recent ERP study examined 5-month-olds’ ability to process IPBs with and without a boundary pause (Männel & Friederici, 2009). Infants listened to sen­ tences with two different prosodic realizations determined by their particular syntactic structure: sentences containing an IPB (e.g., Tommi verspricht, # Papa zu helfen [Tommi promises to help Papa]), and sentences without an IPB (e.g., Tommi verspricht Papa zu schlafen [Tommi promises Papa to sleep]). In a first experiment, 5-month-olds showed no CPS in response to IPBs; instead, they demonstrated an obligatory ERP response to sen­ tence continuation after the pause. In a second experiment in which the boundary pause had been deleted and only preboundary lengthening and pitch change signaled the IPBs, another group of 5-month-olds did not reveal the obligatory ERP response observed previ­ Page 17 of 36

Neural Correlates of the Development of Speech Perception and Compre­ hension ously. In contrast, adults showed a CPS in addition to obligatory ERP responses indepen­ dent of the presence of the boundary pause (see also Steinhauer, Alter, & Friederici, 1999). The developmental comparison indicates that infants are sensitive to salient acoustic cues such as pauses in the speech input, and that they process speech interrup­ tions at lower perceptual levels. However, they do not yet show higher-level processing of combined prosodic boundary cues, reflected by the CPS. ERP studies in older children examined when, during language learning, the processes associated with the CPS emerge by exploring the relationship between prosodic boundary perception and syntactic knowledge (Männel & Friederici, 2011). ERP studies on the pro­ cessing of phrase structure violations have revealed a developmental shift between children’s second and third year (Oberecker, Friedrich, & Friederici, 2005; Oberecker & Friederici, 2006; see below). Accordingly, children were tested on IPB processing before this developmental phase, at 21 months, and after this phase, at 3 and 6 years of age. As can be seen from Figure 9.7, 21-month-olds do not yet show a positive shift in response to IPBs, although 3- and 6-year-olds do. These results indicate that prosodic structure pro­ cessing, as indicated by the CPS, does not emerge until some knowledge of syntactic phrase structure has been established. The combined ERP findings on prosodic processing in infants and children suggest that during early stages of language acquisition, infants initially rely on salient acoustic as­ pects of prosodic information that are likely contributors to the recognition of prosodic boundaries. 
Children may initially detect prosodic breaks through lower-level processing mechanisms until a degree of syntactic structure knowledge is formed through continued language experience that, in turn, reinforces the ability of children to perceive prosodic phrasing at a cognitive level. The use of prosodic boundary cues for later language learning has been shown in lexical acquisition (Gout, Christophe, & Morgan, 2004; Seidl & Johnson, 2007), and in the acqui­ sition of word order regularities (Höhle, Weissenborn, Schmitz, & Ischebeck, 2001). Thus, from a developmental perspective, the initial analysis and segmentation of larger linguis­ tically relevant units based on prosodic boundary cues seems to be particularly important during language acquisition and likely facilitates bootstrapping into smaller syntactic and lexical units in the speech signal later in children’s development.


Figure 9.7 Sentence-level prosody. ERP data and difference maps (with IPB–without IPB) of 21-month-olds, 3-year-olds, and 6-year-olds in response to sentences with and without intonational phrase boundaries (IPB). Modified from Männel, 2011.

Sentence-Level Semantics

Sentence processing requires not only the identification of linguistic units but also the maintenance of the related information in working memory and the integration of different information over time. To understand the meaning of a sentence, the listener has to possess semantic knowledge about nouns and verbs as well as their respective relationship (for neural correlates of developmental differences between noun and verb processing, see Li, Shu, Liu, & Li, 2006; Mestres-Misse, Rodriguez-Fornells, & Münte, 2010; Tan & Molfese, 2009). To investigate whether children already process word meaning and semantic relations in sentential context, the semantic violation paradigm can be applied with semantically correct and incorrect sentences such as The king was murdered and The honey was murdered, respectively (Friederici, Pfeifer, & Hahne, 1993; Hahne & Friederici, 2002). This paradigm uses the N400 as an index of semantic integration abilities, with larger N400 amplitudes for higher integration efforts of semantically inappropriate words into their context. The semantic expectation of a possible sentence ending, for example, is violated in The honey was murdered because the verb at the end of the sentence (murdered) does not semantically meet the meaning that was set up by the noun in the beginning (honey). In adult ERP studies, an N400 has been found in response to such semantically unexpected sentence endings (Friederici, Pfeifer, & Hahne, 1993; Hahne & Friederici, 2002).

Friedrich and Friederici (2005c) studied the ERP responses to semantically correct and incorrect sentences in 19- and 24-month-old children. Semantically incorrect sentences contained objects that violated the selection restrictions of the preceding verb, as in The cat drinks the ball in contrast to The child rolls the ball. For both age groups, the sentence endings of semantically incorrect sentences evoked N400-like effects in the ERP, with a maximum at central-parietal electrode sites (Figure 9.8). In comparison to the adult data, the negativities in children started at about the same time (i.e., at around 400 ms post-word onset) but were longer lasting. This suggests that semantically unexpected nouns that violate the selection restrictions of the preceding verb also initiate semantic integration processes in children but that these integration efforts are maintained longer than in adults. The developmental ERP data indicate that even at the age of 19 and 24 months, children are able to process semantic relations between words in sentences in a similar manner to adults.

ERP studies on the processing of sentential lexical-semantic information have also reported N400-like responses to semantically incorrect sentences in older children, namely 5- to 15-year-olds (Atchley et al., 2006; Hahne et al., 2004; Holcomb, Coffey, & Neville, 1992). Similarly, Silva-Pereyra and colleagues found that sentence endings that semantically violated the preceding sentence phrases evoked several anteriorly distributed negative peaks in 3- and 4-year-olds, whereas in 30-month-olds, an anterior negativity occurred between 500 and 800 ms after word onset (Silva-Pereyra, Klarman, Lin, & Kuhl, 2005; Silva-Pereyra, Rivera-Gaxiola, & Kuhl, 2005). Although these studies revealed differential responses to semantically incorrect and correct sentences in young children, the distribution of these negativities did not match the usual central-parietal maximum of the N400 seen in adults.

Figure 9.8 Sentence-level lexical-semantic information. ERP data and difference maps (incorrect–correct) of 19-month-olds, 24-month-olds, and adults in response to the sentence endings of semantically correct and incorrect sentences in a semantic violation paradigm. Modified from Friedrich & Friederici, 2005c.

Despite the different effects reported in the ERP studies on sentential semantic processing, the current ERP studies suggest that semantic processes at sentence level, as reflected by an N400-like response, are, in principle, present at the end of children’s second year of life. However, it takes a few more years before the neural network underlying these processes is established in an adult-like manner.

A recent fMRI study investigating the neural network underlying sentence-level semantic processes in 5- to 6-year-old children and adults provides some evidence for the difference between the neural network recruited in children and adults (Brauer & Friederici, 2007). Activation in children was found bilaterally in the superior temporal gyri and in the inferior and middle frontal gyri for the processing of correct sentences and semantically incorrect sentences. Compared with adults, the children’s language network was less lateralized, was less specialized with respect to different aspects of language processing (semantics versus syntax, see also below), and engaged additional areas in the inferior frontal cortex bilaterally. Another fMRI study examined lexical-semantic decisions for semantically congruous and incongruous sentences in older children, aged 7 to 10 years, and adults (Moore-Parks et al., 2010). Overall, the results suggested that by the end of children’s first decade, they employ a similar cortical network in semantic processing as adults, including activation in left inferior frontal, left middle temporal, and bilateral superior temporal gyri. However, results also revealed developmental differences, with adults showing greater activation in the left inferior frontal gyrus, left supramarginal gyrus, and left inferior parietal lobule as well as motor-related regions.

Syntactic Rules

In any language, a well-defined rule system determines the composition of lexical elements, thus giving the sentence its structure. The analysis of syntactic relations between words and phrases is a complicated process, yet children have acquired the basic syntactic rules of their native language by the end of their third year (see Guasti, 2002; Hirsh-Pasek & Golinkoff, 1996; Szagun, 2006). For successful sentence comprehension, two aspects of syntax processing appear to be of particular relevance: first, the structure of each phrase that has to be built on the basis of word category information; and second, the grammatical relationship between the various sentence elements, which has to be established in order to allow the interpretation of who is doing what to whom.

Figure 9.9 Syntactic rules. ERP data of 24-month-olds, 32-month-olds, and adults in response to syntactically correct and incorrect sentences in a syntactic violation paradigm. Modified from Oberecker, Friedrich, & Friederici, 2005.

Adult ERP and fMRI studies have investigated the neural correlates of syntactic processing during sentence comprehension by focusing on two aspects: phrase structure building and the establishment of grammatical relations and thereby the sentence’s interpretation. Studies of the former have used the syntactic violation paradigm (e.g., Atchley et al., 2006; Friederici, Pfeifer, & Hahne, 1993). In this paradigm, syntactically correct and syntactically incorrect sentences are presented, with the latter having morphosyntactic, phrase structure, or tense violations. In the ERP response to syntactically incorrect sentences containing phrase structure violations, two components have been observed. The first is the ELAN, an early anterior negativity, which is interpreted to reflect highly automatic phrase structure building processes (Friederici, Pfeifer, & Hahne, 1993; Hahne & Friederici, 1999). The second is the P600, a later-occurring central-parietal positivity, which is interpreted to indicate processes of syntactic integration (Kaan et al., 2000) and controlled processes of syntactic reanalysis and repair (Friederici, Hahne, & Mecklinger, 1996; Osterhout & Holcomb, 1993). This biphasic ERP pattern in response to phrase structure violations has been observed for both passive and active sentence constructions (Friederici, Pfeifer, & Hahne, 1993; Hahne, Eckstein, & Friederici, 2004; Hahne & Friederici, 1999; Rossi, Gugler, Hahne, & Friederici, 2005).

Several developmental ERP studies have examined at what age children process phrase structure violations and therefore show the syntax-related ERP components ELAN and P600 as observed in adults (Oberecker, Friedrich, & Friederici, 2005; Oberecker & Friederici, 2006). In these experiments, 24- and 32-month-old German children listened to syntactically correct sentences and incorrect sentences that comprised incomplete prepositional phrases. For example, the noun after the preposition was omitted as in *The lion in the ___ roars versus The lion roars. As illustrated in Figure 9.9, the adult data revealed the expected biphasic ERP pattern in response to the sentences containing a phrase structure violation. The ERP responses of 32-month-old children showed a similar ERP pattern, although both components appeared in later time windows than the adult data. Interestingly, 24-month-old children also showed a difference between correct and incorrect sentences; however, in this age group only, a P600 but no ELAN occurred.

Recently, Bernal, Dehaene-Lambertz, Millotte, and Christophe (2010) demonstrated that 24-month-old French children compute syntactic structure when listening to spoken sentences. The authors report an early left-lateralized ERP response for word category violations (i.e., when an expected verb was incorrectly replaced by a noun, or vice versa). Silva-Pereyra and colleagues examined the processing of tense violations in sentences in children between 30 and 48 months (Silva-Pereyra et al., 2005; Silva-Pereyra, Rivera-Gaxiola, & Kuhl, 2005). The ERPs to incorrect sentences revealed a late positivity for the older children and a very late-occurring positivity for the 30-month-olds. In a recent series of ERP experiments, Silva-Pereyra, Conboy, Klarman, and Kuhl (2007) studied syntactic processing in 3-year-olds, using natural sentences and sentences without semantic information (so-called jabberwocky sentences) in which content words are replaced by pseudowords. Children were presented with correct sentences and incorrect sentences containing phrase structure violations. For the natural sentences, children showed two positivities in response to the syntactic violations, whereas for the syntactically incorrect jabberwocky sentences, two negativities were observed. This ERP pattern is certainly different from that in adults, who show an ELAN and a P600 in normal and jabberwocky sentences, with a constant amplitude of the ELAN and a reduced P600 for jabberwocky sentences, in which integration is not necessary (Hahne & Jescheniak, 2001; Yamada & Neville, 2007).


Hahne, Eckstein, and Friederici (2004) investigated the processing of phrase structure violations in syntactically more complex, noncanonical sentences (i.e., passive sentences such as The boy was kissed by the girl). In these sentences, the first noun (the boy) is not the actor, which makes the interpretation more difficult than in active sentences. When a syntactic violation occurred in passive sentences, the ELAN-P600 pattern was evoked in 7- to 13-year-old children. Six-year-olds, however, only displayed a late P600.

The combined ERP results point to developmental differences suggesting that automatic syntactic processes, reflected by the ELAN, are present later during language development than processes reflected by the P600. Moreover, the adult-like ERP pattern is present earlier for active than for passive sentences. This developmental course is in line with behavioral findings indicating that the processing of noncanonical sentences only develops late, after the age of 5 years and, depending on the syntactic structure, only around the age of 7 years (Dittmar, Abbot-Smith, Lieven, & Tomasello, 2008).

The neural network underlying syntactic processes in the developing brain has recently been investigated in an fMRI study with 5- to 6-year-olds using the syntactic violation paradigm (Brauer & Friederici, 2007). Sentences containing a phrase structure violation bilaterally activated the superior temporal gyri and the inferior and middle frontal gyri (similar to correct and semantically incorrect sentences) but, moreover, specifically activated left Broca’s area. Compared with that in adults, this activation pattern was less lateralized, less specific, and more extended. A time course analysis of the perisylvian activation across correct and incorrect sentences also revealed developmental differences. In contrast to that in adults, children’s inferior frontal cortex responded much later than their superior temporal cortex (Figure 9.10). Moreover, in contrast to adults, children displayed a temporal primacy of right-hemispheric over left-hemispheric activation (Brauer, Neumann, & Friederici, 2008), which suggests a strong reliance on right-hemisphere prosodic processes during auditory sentence comprehension in childhood. In a recent fMRI study with 10- to 16-year-old children, Yeatman, Ben-Shachar, Glover, and Feldman (2010) investigated sentence processing by systematically varying syntactic complexity and observed broad activation patterns in frontal, temporal, temporal-parietal, and cingulate regions. Independent of sentence length, syntactically more complex sentences evoked stronger activation in the left temporal-parietal junction and the right superior temporal gyrus. Interestingly, activation changes in frontal regions correlated with vocabulary and syntax perception measures. Thus, individual differences in activation patterns demonstrate that auditory sentence comprehension is based on a dynamic and distributed network that is modulated by age, language skills, and task demands.



Figure 9.10 Temporal organization of cortical activation during auditory sentence comprehension. Brain activation of adults and children in sagittal section (x = −50) and horizontal section (z = 2). Data are masked by random-effects activation maps at z = 2.33 and display a color coding for time-to-peak values in active voxels between 3.0 and 8.0 seconds. The lines indicate the cut for the corresponding section. Note the very late response in the inferior frontal cortex in children and their hemispheric differences in this region. Inserted diagrams demonstrate examples of BOLD responses to sentence comprehension in Broca’s area and in Heschl’s gyrus. Reprinted with permission from Brauer, Neumann, & Friederici, 2008.

The results of the reported behavioral and neuroimaging studies broadly cover phonological/prosodic, semantic, and syntactic aspects of language acquisition during the first years of life. In developmental research, ERPs are well established and often the method of choice; however, MEG, NIRS, and fMRI have recently been adjusted for use in developmental populations. Because the ERP method delivers information about the neural correlates of different aspects of language processing, it is an excellent tool for the investigation of the various developmental stages in language acquisition. More specifically, a particular ERP component, the MMR, which reflects discrimination not only of acoustic but also of phonological features, can thus be used to examine very early stages of language acquisition, even in newborns. A further ERP component that indicates lexical and semantic processes in adults, the N400, has been registered in 14-month-olds, but has not been found in 12-month-olds, and can be used to investigate phonotactic knowledge, word knowledge, and knowledge of lexical-semantic relations between basic-level words and verbs and their arguments in sentences. For the syntactic domain, an adult-like biphasic ERP pattern, the ELAN-P600, is not yet present in 24-month-olds but is in 32-month-old children for the processing of structural dependencies within phrases, thus characterizing the developmental progression of syntax acquisition. Other methods, particularly fMRI, deliver complementary evidence that the neural basis underlying specific aspects of language processing, such as semantics and syntax, is still under development for a few more years before adult-like language processes are achieved.

In summary, neuroimaging methods, in addition to behavioral studies, provide relevant information on various aspects of language processing. Although developmental research is still far from a detailed outline of the exact steps in language acquisition, the use of sophisticated neuroscientific methods with high temporal or spatial resolution allows researchers to study language development from very early on and to gain a more fine-grained picture of the language acquisition process and its neural basis.

References

Atchley, R. A., Rice, M. L., Betz, S. K., Kwasney, K. M., Sereno, J. A., & Jongman, A. (2006). A comparison of semantic and syntactic event related potentials generated by children and adults. Brain and Language, 99, 236–246.

Balsamo, L. M., Xu, B., & Gaillard, W. D. (2006). Language lateralization and the role of the fusiform gyrus in semantic processing in young children. NeuroImage, 31 (3), 1306–1314.

Bentin, S., Mouchetant-Rostaing, Y., Giard, M. H., Echallier, J. F., & Pernier, J. (1999). ERP manifestations of processing printed words at different psycholinguistic levels: Time course and scalp distribution. Journal of Cognitive Neuroscience, 11 (3), 235–260.

Bernal, S., Dehaene-Lambertz, G., Millotte, S., & Christophe, A. (2010). Two-year-olds compute syntactic structure online. Developmental Science, 13 (1), 69–76.

Booth, J. R., Burman, D. D., Meyer, J. R., Gitelman, D. R., Parrish, T. B., & Mesulam, M. M. (2004). Development of brain mechanisms for processing orthographic and phonologic representations. Journal of Cognitive Neuroscience, 16 (7), 1234–1249.

Brauer, J., Anwander, A., & Friederici, A. D. (2011). Neuroanatomical prerequisites for language functions in the maturing brain. Cerebral Cortex, 21, 459–466.

Brauer, J., & Friederici, A. D. (2007). Functional neural networks of semantic and syntactic processes in the developing brain. Journal of Cognitive Neuroscience, 19 (10), 1609–1623.

Brauer, J., Neumann, J., & Friederici, A. D. (2008). Temporal dynamics of perisylvian activation during language processing in children and adults. NeuroImage, 41 (4), 1484–1492.

Cao, F., Khalid, K., Zaveri, R., Bolger, D. J., Bitan, T., & Booth, J. R. (2010). Neural correlates of priming effects in children during spoken word processing with orthographic demands. Brain and Language, 114 (2), 80–89.

Cheour, M., Ceponiene, R., Lehtokoski, A., Luuk, A., Allik, J., Alho, K., et al. (1998). Development of language-specific phoneme representations in the infant brain. Nature Neuroscience, 1, 351–353.

Cheour, M., Imada, T., Taulu, S., Ahonen, A., Salonen, J., & Kuhl, P. K. (2004). Magnetoencephalography is feasible for infant assessment of auditory discrimination. Experimental Neurology, 190, 44–51.

Chou, T. L., Booth, J. R., Burman, D. D., Bitan, T., Bigio, J. D., Lu, D., & Cone, N. E. (2006). Developmental changes in the neural correlates of semantic processing. NeuroImage, 29, 1141–1149.

Clark, E. V. (2003). First language acquisition. Cambridge, MA: Cambridge University Press.

Courchesne, E. (1990). Chronology of postnatal human brain development: Event-related potential, positron emission tomography, myelogenesis, and synaptogenesis studies. In J. W. Rohrbaugh, R. Parasuraman, & R. Johnson (Eds.), Event-related brain potentials: Basic issues and applications (pp. 210–241). New York: Oxford University Press.

Csibra, G., Kushnerenko, E., & Grossmann, T. (2008). Electrophysiological methods in studying infant cognitive development. In C. Nelson & M. Luciana (Eds.), Handbook of developmental cognitive neuroscience, 2nd ed. (pp. 247–262). Cambridge, MA: MIT Press.

Cutler, A., & Carter, D. (1987). The predominance of strong initial syllables in the English vocabulary. Computer Speech and Language, 2, 133–142.

de Guibert, C., Maumet, C., Ferré, J.-C., Jannin, P., Biraben, A., Allaire, C., Barillot, C., & Le Rumeur, E. (2010). FMRI language mapping in children: A panel of language tasks using visual and auditory stimulation without reading or metalinguistic requirements. NeuroImage, 51 (2), 897–909.

Dehaene-Lambertz, G., & Dehaene, S. (1994). Speed and cerebral correlates of syllable discrimination in infants. Nature, 370, 292–295.

Dehaene-Lambertz, G., Dehaene, S., & Hertz-Pannier, L. (2002). Functional neuroimaging of speech perception in infants. Science, 298 (5600), 2013–2015.

Dehaene-Lambertz, G., Hertz-Pannier, L., Dubois, J., Mériaux, S., Roche, A., Sigman, M., et al. (2006). Functional organization of perisylvian activation during presentation of sentences in preverbal infants. Proceedings of the National Academy of Sciences USA, 103, 14240–14245.

Dehaene-Lambertz, G., Montavont, A., Jobert, A., Allirol, L., Dubois, J., Hertz-Pannier, L., & Dehaene, S. (2010). Language or music, mother or Mozart? Structural and environmental influences on infants’ language networks. Brain and Language, 114 (2), 53–65.

Dittmar, M., Abbot-Smith, K., Lieven, E., & Tomasello, M. (2008). Young German children’s early syntactic competence: A preferential looking study. Developmental Science, 11 (4), 575–582.

Dubois, J., Dehaene-Lambertz, G., Perrin, M., Mangin, J.-F., Cointepas, Y., Duchesnay, E., et al. (2008). Asynchrony of the early maturation of white matter bundles in healthy infants: Quantitative landmarks revealed noninvasively by diffusion tensor imaging. Human Brain Mapping, 29 (1), 14–27.

Fonov, V. S., Evans, A. C., Botteron, K., Almli, C. R., McKinstry, R. C., Collins, D. L., et al. (2011). Unbiased average age-appropriate atlases for pediatric studies. NeuroImage, 54, 313–327.

Friederici, A. D. (2005). Neurophysiological markers of early language acquisition: From syllables to sentences. Trends in Cognitive Sciences, 9, 481–488.

Friederici, A. D., & Alter, K. (2004). Lateralization of auditory language functions: A dynamic dual pathway model. Brain and Language, 89, 267–276.

Friederici, A. D., & Wessels, J. M. (1993). Phonotactic knowledge and its use in infant speech perception. Perception and Psychophysics, 54, 287–295.

Friederici, A. D., Friedrich, M., & Christophe, A. (2007). Brain responses in 4-month-old infants are already language specific. Current Biology, 17 (14), 1208–1211.

Friederici, A. D., Friedrich, M., & Weber, C. (2002). Neural manifestation of cognitive and precognitive mismatch detection in early infancy. NeuroReport, 13, 1251–1254.

Friederici, A. D., Hahne, A., & Mecklinger, A. (1996). Temporal structure of syntactic parsing: Early and late event-related brain potential effects. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22, 1219–1248.

Friederici, A. D., Pfeifer, E., & Hahne, A. (1993). Event-related brain potentials during natural speech processing: Effects of semantic, morphological and syntactic violations. Cognitive Brain Research, 1, 183–192.

Friedrich, M., & Friederici, A. D. (2004). N400-like semantic incongruity effect in 19-month-olds: Processing known words in picture contexts. Journal of Cognitive Neuroscience, 16, 1465–1477.

Friedrich, M., & Friederici, A. D. (2005a). Phonotactic knowledge and lexical-semantic priming in one-year-olds: Brain responses to words and nonsense words in picture contexts. Journal of Cognitive Neuroscience, 17 (11), 1785–1802.

Friedrich, M., & Friederici, A. D. (2005b). Lexical priming and semantic integration reflected in the ERP of 14-month-olds. NeuroReport, 16 (6), 653–656.

Friedrich, M., & Friederici, A. D. (2005c). Semantic sentence processing reflected in the event-related potentials of one- and two-year-old children. NeuroReport, 16 (6), 1801–1804.

Friedrich, M., & Friederici, A. D. (2010). Maturing brain mechanisms and developing behavioral language skills. Brain and Language, 114, 66–71.

Gervain, J., Mehler, J., Werker, J. F., Nelson, C. A., Csibra, G., Lloyd-Fox, S., et al. (2011). Near-infrared spectroscopy: A report from the McDonnell infant methodology consortium. Developmental Cognitive Neuroscience, 1 (1), 22–46.

Gleitman, L. R., & Wanner, E. (1982). The state of the state of the art. In E. Wanner & L. Gleitman (Eds.), Language acquisition: The state of the art (pp. 3–48). Cambridge, MA: Cambridge University Press.

Gout, A., Christophe, A., & Morgan, J. L. (2004). Phonological phrase boundaries constrain lexical access. II. Infant data. Journal of Memory and Language, 51, 548–567.

Goyet, L., de Schonen, S., & Nazzi, T. (2010). Words and syllables in fluent speech segmentation by French-learning infants: An ERP study. Brain Research, 1332, 75–89.

Gratton, G., & Fabiani, M. (2001). Shedding light on brain function: The event-related optical signal. Trends in Cognitive Sciences, 5, 357–363.

Grossmann, T., Johnson, M. H., Lloyd-Fox, S., Blasi, A., Deligianni, F., Elwell, C., et al. (2008). Early cortical specialization for face-to-face communication in human infants. Proceedings of the Royal Society B, 275, 2803–2811.

Grossmann, T., Oberecker, R., Koch, S. P., & Friederici, A. D. (2010). Developmental origins of voice processing in the human brain. Neuron, 65, 852–858.

Guasti, M. T. (2002). Language acquisition: The growth of grammar. Cambridge, MA: MIT Press.

Hahne, A., & Friederici, A. D. (1999). Electrophysiological evidence for two steps in syntactic analysis: Early automatic and late controlled processes. Journal of Cognitive Neuroscience, 11, 194–205.

Hahne, A., & Friederici, A. D. (2002). Differential task effects on semantic and syntactic processes as revealed by ERPs. Cognitive Brain Research, 13, 339–356.

Hahne, A., & Jescheniak, J. D. (2001). What’s left if the Jabberwock gets the semantics? An ERP investigation into semantic and syntactic processes during auditory sentence comprehension. Cognitive Brain Research, 11, 199–212.

Hahne, A., Eckstein, K., & Friederici, A. D. (2004). Brain signatures of syntactic and semantic processes during children’s language development. Journal of Cognitive Neuroscience, 16, 1302–1318.

Hickok, G., & Poeppel, D. (2007). The cortical organization of speech processing. Nature Reviews Neuroscience, 8, 393–402.

Hirsh-Pasek, K., & Golinkoff, R. M. (1996). The origins of grammar: Evidence from early language comprehension. Cambridge, MA: MIT Press.

Höhle, B., Bijeljac-Babic, R., Herold, B., Weissenborn, J., & Nazzi, T. (2009). Language specific prosodic preferences during the first half year of life: Evidence from German and French infants. Infant Behavior and Development, 32 (3), 262–274.

Höhle, B., Weissenborn, J., Schmitz, M., & Ischebeck, A. (2001). Discovering word order regularities: The role of prosodic information for early parameter setting. In J. Weissenborn & B. Höhle (Eds.), Approaches to bootstrapping. Phonological, lexical, syntactic and neurophysiological aspects of early language acquisition (Vol. 1, pp. 249–265). Amsterdam: John Benjamins.

Holcomb, P. J. (1993). Semantic priming and stimulus degradation: Implications for the role of the N400 in language processing. Psychophysiology, 30, 47–61.

Holcomb, P. J., Coffey, S. A., & Neville, H. J. (1992). Visual and auditory sentence processing: A developmental analysis using event-related brain potentials. Developmental Neuropsychology, 8, 203–241.

Homae, F., Watanabe, H., Nakano, T., Asakawa, K., & Taga, G. (2006). The right hemisphere of sleeping infant perceives sentential prosody. Neuroscience Research, 54 (4), 276–280.

Homae, F., Watanabe, H., Nakano, T., & Taga, G. (2007). Prosodic processing in the developing brain. Neuroscience Research, 59 (1), 29–39.

Houston, D. M., Jusczyk, P. W., Kuijpers, C., Coolen, R., & Cutler, A. (2000). Cross-language word segmentation by 9-month-olds. Psychonomic Bulletin & Review, 7, 504–509.

Imada, T., Zhang, Y., Cheour, M., Taulu, S., Ahonen, A., & Kuhl, P. K. (2006). Infant speech perception activates Broca’s area: A developmental magnetoencephalography study. NeuroReport, 17, 957–962.

Jusczyk, P. W., Cutler, A., & Redanz, N. J. (1993). Infants’ preference for the predominant stress patterns of English words. Child Development, 64, 675–687.

Jusczyk, P. W., Friederici, A. D., Wessels, J. M. I., Svenkerud, V., & Jusczyk, A. M. (1993). Infants’ sensitivity to the sound patterns of native language words. Journal of Memory and Language, 32, 402–420.

Jusczyk, P. W., Houston, D. M., & Newsome, M. (1999). The beginnings of word segmentation in English-learning infants. Cognitive Psychology, 39 (3–4), 159–207.

Page 29 of 36

Neural Correlates of the Development of Speech Perception and Compre­ hension Kaan, E., Harris, A., Gibson, E., & Holcomb, P. (2000). The P600 as an index of syntactic integration difficulty. Language and Cognitive Processes, 15, 159–201. Kooijman, V., Hagoort, P., & Cutler, A. (2009). Prosodic structure in early word segmenta­ tion: ERP evidence from Dutch ten-month-olds. Infancy, 14, 591–612. Kooijman, V., Johnson, E. K., & Cutler, A. (2008). Reflections on reflections of infant word recognition. In A. D. Friederici & G. Thierry (Eds.), Early language development: Bridging brain and behaviour (TiLAR 5, p. 91–114). Amsterdam: John Benjamins. Kuhl, P. K. (2004). Early language acquisition: Cracking the speech code. Nature Reviews Neuroscience, 5, 831–843. Kuhl, P. K., Conboy, B. T., Coffey-Corina, S., Padden, D., Rivera-Gaxiola, M., & Nelson, T. (2008). Phonetic learning as a pathway to language: New data and native language mag­ net theory expanded (NLM-e). Philosophical Transactions of the Royal Society B, 363, 979–1000. Kuijpers, C. T. L., Coolen, R., Houston, D., Cutler, A. (1998). Using the headturning tech­ nique to explore cross-linguistic performance differences. Advances in Infancy Research, 12, 205–220. Kujala, A., Huotilainen, M., Hotakainen, M., Lennes, M., Parkkonen, L., Fellman, et al. (2004). Speech-sound discrimination in neonates as measured with MEG. NeuroReport, 15 (13), 2089–2092. Kushnerenko, E., Ceponiene, R., Balan, P., Fellman, V., Huotilainen, M., & Näätänen, R. (2002b). Maturation of the auditory event-related potentials during the first year of life. NeuroReport, 13, 47–51. Kushnerenko, E., Ceponiene, R., Balan, P., Fellman, V., & Näätänen, R. (2002a). Matura­ tion of the auditory change detection response in infants: A longitudinal ERP study. Neu­ roReport, 13 (15), 1843–1846. Kushnerenko, E., Cheour, M., Ceponiene, R., Fellman, V., Renlund, M., Soininen, K., et al. (2001). 
Central auditory processing of durational changes in complex speech patterns by newborns: An event-related brain potential study. Developmental Neuropsychology, 19 (1), 83–97. Kutas, M., & van Petten, C. K. (1994). Psycholinguistics electrified: Event-related brain potential investigations. In M. A. Gernsbacher (Ed.), Handbook of psycholinguistics (pp. 83–143). San Diego, CA: Academic Press. Leach, J. L., & Holland, S. K. (2010). Functional MRI in children: Clinical and research ap­ plications. Pediatric Radiology, 40, 31–49.

Page 30 of 36

Neural Correlates of the Development of Speech Perception and Compre­ hension Leppänen, P. H. T., Pikho, E., Eklund, K. M., & Lyytinen, H. (1999). Cortical re­ sponses of infants with and without a genetic risk for dyslexia: II. Group effects. NeuroRe­ port, 10, 969–973. Cambridge, MA: MIT Press. (p. 191)

Lloyd-Fox, S., Blasi, A., & Elwell, C. E. (2010) Illuminating the developing brain: The past, present and future of functional near infrared spectroscopy. Neuroscience and Biobehav­ ioural Reviews, 34 (3), 269–284. Li, X. S., Shu, H., Liu, Y. Y., & Li, P. (2006). Mental representation of verb meaning: Behav­ ioral and electrophysiological evidence. Journal of Cognitive Neuroscience, 18 (10), 1774– 1787. Männel, C., & Friederici, A. D. (2008). Event-related brain potentials as a window to children’s language processing: From syllables to sentences. In I. A. Sekerina, E. M. Fer­ nandez, & H. Clahsen (Eds.), Developmental psycholinguistics: On-line methods in children’s language processing (LALD 44, p. 29–72). Amsterdam: John Benjamins. Männel, C., & Friederici, A. D. (2009). Pauses and intonational phrasing: ERP studies in 5month-old German infants and adults. Journal of Cognitive Neuroscience, 21 (10), 1988– 2006. Männel, C., & Friederici, A. D. (2010). Prosody is the key: ERP studies on word segmenta­ tion in 6- and 12-month-old children. Journal of Cognitive Neuroscience, Supplement, 261. Männel, C., & Friederici, A. D. (2011). Intonational phrase structure processing at differ­ ent stages of syntax acquisition: ERP studies in 2-, 3-, and 6-year-old children. Develop­ mental Science, 14 (4), 786–798. Mestres-Misse, A., Rodriguez-Fornells, A., & Münte, T. F. (2010). Neural differences in the mapping of verb and noun concepts onto novel words. NeuroImage, 49, 2826–2835. Meyer, M., Alter, K., Friederici, A. D., Lohmann, G., & von Cramon, D. Y. (2002). fMRI re­ veals brain regions mediating slow prosodic modulations in spoken sentences. Human Brain Mapping, 17 (2), 73–88. Meyer, M., Steinhauer, K., Alter, K., Friederici, A. D., & von Cramon, D. Y. (2004). Brain activity varies with modulation of dynamic pitch variance in sentence melody. Brain and Language, 89 (2), 277–289. Mills, D. L., Coffey-Corina, S. A., & Neville, H. J. (1997). 
Language comprehension and cerebral specification from 13 to 20 months. Developmental Neuropsychology, 13 (3), 397–445. Mills, D. L., Plunkett, K., Prat, C., & Schafer, G. (2005). Watching the infant brain learn words: Effects of vocabulary size and experience. Cognitive Development, 20, 19–31.

Page 31 of 36

Neural Correlates of the Development of Speech Perception and Compre­ hension Mills, D. L., Prat, C., Zangl, R., Stager, C. L., Neville, H. J., & Werker, J. F. (2004). Lan­ guage experience and the organization of brain activity to phonetically similar words: ERP evidence from 14- and 20-month-olds. Journal of Cognitive Neuroscience, 16 (8), 1452–1464. Minagawa-Kawai, Y., Mori, K., Furuya, I., Hayashi R., & Sato, Y. (2002). Assessing cere­ bral representations of short and long vowel categories by NIRS. NeuroReport, 13, 581– 584. Minagawa-Kawai, Y., Mori, K., Hebden, J., & Dupoux, E. (2008). Optical imaging of in­ fants’ neurocognitive development: Recent advances and perspectives. Developmental Neurobiology, 68 (6), 712–728. Minagawa-Kawai, Y., Mori, K., Naoi, N., & Kojima, S. (2007). Neural attunement process­ es in infants during the acquisition of a language-specific phonemic contrast. Journal of Neuroscience, 27, 315–321. Minagawa-Kawai, Y., van der Lely, H., Ramus, F., Sato, Y., Mazuka, R., & Dupoux, E. (2011). Optical brain imaging reveals general auditory and language-specific processing in early infant development. Cerebral Cortex, 21 (2), 254–261. Moore-Parks, E. N., Burns, E. L., Bazzill, R., Levy, S., Posada, V., & Muller, R. A. (2010). An fMRI study of sentence-embedded lexical-semantic decision in children and adults. Brain and Language, 114 (2), 90–100. Morr, M. L., Shafer, V. L., Kreuzer, J., & Kurtzberg, D. (2002). Maturation of mismatch negativity in infants and pre-school children. Ear and Hearing, 23, 118–136. Muzik, O., Chugani, D. C., Juhasz, C., Shen, C., & Chugani, H. T. (2000). Statistical para­ metric mapping: Assessment of application in children. NeuroImage, 12, 538–549. Näätänen, R. (1990). The role of attention in auditory information processing as revealed by event-related potentials and other brain measures of cognitive function. Behavioral and Brain Sciences, 13, 201–288. Nazzi, T., Dilley, L. C., Jusczyk, A. 
M., Shattuck-Hufnagel, S., & Jusczyk, P. W. (2005). Eng­ lish-learning infants’ segmentation of verbs from fluent speech. Language and Speech, 48, 279–298. Nazzi, T., Iakimova, G., Bertoncini, J., Frédonie, S., & Alcantara, C. (2006). Early segmen­ tation of fluent speech by infants acquiring French: Emerging evidence for crosslinguistic differences. Journal of Memory and Language, 54, 283–299. Nazzi, T., Kemler Nelson, D. G., Jusczyk, P.W., & Jusczyk, A. M. (2000). Six-month-olds’ de­ tection of clauses embedded in continuous speech: Effects of prosodic well-formedness. Infancy, 1 (1), 123–147.

Page 32 of 36

Neural Correlates of the Development of Speech Perception and Compre­ hension Nobre, A. C., & McCarthy, G. (1994). Language-related ERPs: Scalp distributions and modulation by word type and semantic priming. Journal of Cognitive Neuroscience, 6 (33), 233–255. Oberecker, R., & Friederici, A. D. (2006). Syntactic event-related potential components in 24-month-olds’ sentence comprehension. NeuroReport, 17 (10), 1017–1021. Oberecker, R., Friedrich, M., & Friederici, A. D. (2005). Neural correlates of syntactic processing in two-year-olds. Journal of Cognitive Neuroscience, 17, 407–421. Obrig, H., & Villringer, A. (2003). Beyond the visible: Imaging the human brain with light. Journal of Cerebral Blood Flow and Metabolism, 23, 1–18. Okamoto, M., Dan, H., Shimizu, K., Takeo, K., Amita, T. Oda, I., et al. (2004). Multimodal assessment of cortical activation during apple peeling by NIRS and fMRI. NeuroImage, 21, 1275–1288. Osterhout, L., & Holcomb, P. J. (1993). Event-related brain potentials and syntactic anom­ aly: Evidence on anomaly detection during perception of continuous speech. Language and Cognitive Processes, 8, 413–437. Pannekamp, A., Toepel, U., Alter, K., Hahne, A., & Friederici, A. D. (2005). Prosody-driven sentence processing: An event-related brain potential study. Journal of Cognitive Neuro­ science, 17, 407–421. Pena, M., Maki, A., Kovacic, D., Dehaene-Lambertz, G., Koizumi, H., Bouquet, F., et al. (2003). Sounds and silence: An optical topography study of language recognition at birth. Proceedings of the National Academy of Sciences U S A, 100 (20), 11702–11705. Perani, D., Saccuman, M. C., Scifo, P., Spada, D., Andreolli, G., Rovelli, R., Baldoli, C., & Koelsch, S. (2010). Functional specializations for music processing in the human newborn brain. Proceedings of the National Academy of Sciences U S A, 107 (10), 4758–4763. Pihko, E., Leppänen, P. H. T., Eklund, K. M., Cheour, M., Guttorm, T. K., & Lyyti­ nen, H. (1999). 
Cortical responses of infants with and without a genetic risk for dyslexia: I. Age effects. NeuroReport, 10, 901–905. (p. 192)

Rivera-Gaxiola, M., Silva-Pereyra, J., & Kuhl, P. K. (2005). Brain potentials to native- and non-native speech contrasts in seven- and eleven-month-old American infants. Develop­ mental Science, 8, 162–172. Rivkin, M. J., Wolraich, D., Als, H., McAnulty, G., Butler, S., Conneman, N., et al. (2004). Prolonged T*[2] values in newborn versus adult brain: Implications for fMRI studies of newborns. Magnetic Resonance in Medicine, 51 (6), 1287–1291. Rossi, S., Gugler, M. F., Hahne, A., & Friederici, A. D. (2005). When word category infor­ mation encounters morphosyntax: An ERP study. Neuroscience Letters, 384, 228–233.

Page 33 of 36

Neural Correlates of the Development of Speech Perception and Compre­ hension Saito, Y., Aoyama, S., Kondo, T., Fukumoto, R., Konishi, N., Nakamura, K., Kobayashi, M., & Toshima, T. (2007a). Frontal cerebral blood flow change associated with infant-directed speech. Archives of Disease in Childhood. Fetal and Neonatal Edition, 92 (2), F113–F116. Saito, Y., Kondo, T., Aoyama, S., Fukumoto, R., Konishi, N., Nakamura, K., Kobayashi, M., & Toshima, T. (2007b). The function of the frontal lobe in neonates for response to a prosodic voice. Early Human Development, 83 (4), 225–230. Sambeth, A., Ruohio, K., Alku, P., Fellman, V., & Huotilainen, M. (2008). Sleeping new­ borns extract prosody from continuous speech. Clinical Neurophysiology, 119 (2), 332– 341. Sansavini, A., Bertoncini, J., & Giovanelli, G. (1997). Newborns discriminate the rhythm of multisyllabic stressed words. Developmental Psychology, 33 (1), 3–11. Schafer, A. J., Speer, S. R., Warren, P., & White, S. D. (2000). Intonational disambiguation in sentence production and comprehension. Journal of Psycholinguistic Research, 29, 169–182. Schapiro, M. B., Schmithorst, V. J., Wilke, M., Byars Weber, A., Strawsburg, R. H., & Hol­ land, S. K. (2004). BOLD fMRI signal increases with age in selected brain regions in chil­ dren. NeuroReport, 15 (17), 2575–2578. Schlaggar, B. L., Brown, T. T., Lugar, H. L., Visscher, K. M., Miezin, F. M., & Petersen, S. E. (2002). Functional neuroanatomical differences between adults and school-age chil­ dren in the processing of single words. Science, 296, 1476–1479. Schroeter, M. L., Zysset, S., Wahl, M., & von Cramon, D. Y. (2004). Prefrontal activation due to Stroop interference increases during development: An event-related fNIRS study. NeuroImage, 23, 1317–1325. Seidl, A. (2007). Infants’ use and weighting of prosodic cues in clause segmentation. Jour­ nal of Memory and Language, 57, 24–48. Seidl, A., & Johnson, E. K. E. (2007). 
Boundary alignment facilitates 11-month-olds’ seg­ mentation of vowel-initial words from speech. Journal of Child Language, 34, 1–24. Selkirk, E. (1984). Phonology and syntax: The relation between sound and structure. Cambridge, MA: MIT Press. Silva-Pereyra, J., Conboy, B. T., Klarman, L., & Kuhl, P. K. (2007). Grammatical processing without semantics? An event-related brain potential study of preschoolers using jabber­ wocky sentences. Journal of Cognitive Neuroscience, 19 (6), 1–16. Silva-Pereyra, J., Klarman, L., Lin, L. J., & Kuhl, P. K. (2005). Sentence processing in 30month-old children: An event-related potential study. NeuroReport, 16, 645–648.

Page 34 of 36

Neural Correlates of the Development of Speech Perception and Compre­ hension Silva-Pereyra, J., Rivera-Gaxiola, M., & Kuhl, P. K. (2005). An event-related brain potential study of sentence comprehension in preschoolers: Semantic and morphosyntactic pro­ cessing. Cognitive Brain Research, 23, 247–258. Skoruppa, K., Pons, F., Christophe, A., Bosch, L., Dupoux, E., Sebastián-Gallés, N., & Peperkamp, S. (2009). Language-specific stress perception by nine-month-old French and Spanish infants. Developmental Science, 12, 914–919. Soderstrom, M., Nelson, D. G. K., & Jusczyk, P. W. (2005). Six-month-olds recognize claus­ es embedded in different passages of fluent speech. Infant Behavior & Development, 28, 87–94. Soderstrom, M., Seidl, A., Nelson, D. G. K., & Jusczyk, P. W. (2003). The prosodic boot­ strapping of phrases: Evidence from prelinguistic infants. Journal of Memory and Lan­ guage, 49 (2), 249–267. Steinhauer, K., Alter, K., & Friederici, A. D. (1999). Brain potentials indicate immediate use of prosodic cues in natural speech processing. Nature Neuroscience, 2, 191–196. Szagun, G. (2006). Sprachentwicklung beim Kind. Weinheim: Beltz. Tan, A., & Molfese, D. L. (2009). ERP Correlates of noun and verb processing in preschool-age children. Biological Psychology, 8 (1), 46–51. Thierry, G., Vihman, M., & Roberts, M. (2003). Familiar words capture the attention of 11month-olds in less than 250 ms. NeuroReport, 14, 2307–2310. Torkildsen, J. V. K., Sannerud, T., Syversen, G., Thormodsen, R., Simonsen, H. G., Moen, I., et al. (2006). Semantic organization of basic level words in 20-month-olds: An ERP study. Journal of Neurolinguistics, 19, 431–454. Torkildsen, J. V. K., Syversen, G., Simonsen, H. G., Moen, I., Smith, L., & Lindgren, M. (2007). Electrophysiological correlates of auditory semantic priming in 24-month-olds. Journal of Neurolinguistics, 20, 332–351. Trainor, L., Mc Fadden, M., Hodgson, L., Darragh Barlow, J., Matsos, L., & Sonnadara, R. (2003). 
Changes in auditory cortex and the development of mismatch negativity between 2 and 6 months of age. International Journal of Psychophysiology, 51, 5–15. Tsao, F.-M., Liu, H.-M., & Kuhl, P. K. (2004). Speech perception in infancy predicts lan­ guage development in the second year of life: A longitudinal study. Child Development, 75, 1067–1084. Vannest, J., Karunanayaka, P. R., Schmithorst, V. J., Szaflarski, J. P., & Holland, S. K. (2009). Language networks in children: Evidence from functional MRI studies. American Journal of Roentgenology, 192 (5), 1190–1196.

Page 35 of 36

Neural Correlates of the Development of Speech Perception and Compre­ hension Villringer, A., & Chance, B. (1997). Noninvasive optical spectroscopy and imaging of hu­ man brain function. Trends in Neuroscience, 20, 435–442. Weber, C., Hahne, A., Friedrich, M., & Friederici, A. D. (2004). Discrimination of word stress in early infant perception: Electrophysiological evidence. Cognitive Brain Research, 18, 149–161. West, W. C., & Holcomb, P. J. (2002). Event-related potentials during discourse-level se­ mantic integration of complex pictures. Cognitive Brain Research, 13, 363–375. Wilke, M., Holland, S. K., Altaye, M., & Gaser, C. (2008). Template-O-Matic: A toolbox for creating customized pediatric templates. NeuroImage, 41 (3), 903–913. Yamada, Y., & Neville, H. J. (2007). An ERP study of syntactic processing in English and nonsense sentences. Brain Research, 1130, 167–180. Yeatman, J. D., Ben-Shachar, M., Glover, G. H., & Feldman, H. M. (2010). Individual differ­ ences in auditory sentence comprehension in children: An exploratory event-related func­ tional magnetic resonance imaging investigation. Brain & Language, 114 (2), 72–79.

Angela D. Friederici, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany.

Claudia Männel, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany.

Perceptual Disorders

Josef Zihl
The Oxford Handbook of Cognitive Neuroscience, Volume 1: Core Topics
Edited by Kevin N. Ochsner and Stephen Kosslyn
Print Publication Date: Dec 2013
Subject: Psychology, Cognitive Neuroscience
Online Publication Date: Dec 2013
DOI: 10.1093/oxfordhb/9780199988693.013.0010

Abstract and Keywords

Perceptual processes provide the basis for mental representation of the visual, auditory, olfactory, gustatory, somatosensory, and social “worlds” as well as for guiding and controlling cognitive, social, and motor activities. All perceptual systems (i.e., vision, audition, somatosensory perception, smell and taste, and social perception) are segregated functional networks and show a parallel-hierarchical type of organization of information processing and encoding. In pathological conditions such as acquired brain injury, perceptual functions and abilities can be variably affected, ranging from the loss of stimulus detection to impaired recognition. Despite the functional specialization of perceptual systems, association of perceptual deficits within sensory modalities is the rule, and disorders of a single perceptual function or ability are rare. This chapter describes cerebral visual, auditory, somatosensory, olfactory, and gustatory perceptual disorders within a neuropsychological framework. Disorders in social perception are also considered because they represent a genuine category of perceptual impairments.

Keywords: vision, audition, somatosensory perception, smell, taste, social perception, cerebral perceptual disorders

Introduction

Perception “is the process or result of becoming aware of objects, relationships, and events by means of the senses,” which includes activities such as detecting, discriminating, identifying, and recognizing. “These activities enable organisms to organize and interpret the stimuli received into meaningful knowledge” (APA, 2007). Perception is constructed in the brain and involves lower and higher level processes that serve simpler and more complex perceptual abilities such as detection, identification, and recognition (Mather, 2006). The behavioral significance of perception lies not only in the processing of stimuli as a basis for mental representation of the visual, auditory, olfactory, gustatory, somatosensory, and social “worlds” but also in the guidance and control of activities. Thus there exists a reciprocal interaction between perception, cognition, and action. For perceptual activities, attention, memory, and executive functions are crucial prerequisites. They form the basis for focusing on stimuli and maintaining attention during stimulus acquisition and processing, storing percepts as experience and concepts, and controlling input and output activities that allow for an optimal, flexible adaptation to extrinsic and intrinsic challenges.

The aim of this chapter is to describe the effect of pathological conditions, particularly acquired brain injury, on the various abilities in the domains of vision, audition, somatosensory perception, smell and taste, and social perception as well as the behavioral consequences and the significance of these disorders for the understanding of brain organization. Perceptual disorders can result from injury to the afferent sensory pathways and/or to their subcortical and cortical processing and coding stages. Peripheral injury usually causes “lower level” dysfunctions (e.g., threshold elevation or difficulties with stimulus localization and sensory discrimination), whereas central injuries cause “higher level” perceptual dysfunctions (e.g., in the domains of identification and recognition). However, peripheral sensory deficits may also be associated with higher perceptual disorders because the affected sensory functions and their interactions represent a crucial prerequisite for more complex perceptual abilities (i.e., detection and discrimination of stimuli form the basis for identification and recognition).

Vision

Visual perception comprises lower level visual abilities (i.e., the visual field, visual acuity, contrast sensitivity, color and form vision, and stereopsis) and higher level visual abilities, in particular visual identification and recognition. Visual perceptual abilities also form the basis for visually guided behavior, such as oculomotor activities, hand and finger movements, and spatial navigation. From its very beginning, visual neuroscience has been concerned with the analysis of the various visual perceptual deficits and the identification of the location of the underlying brain injury. Early clinical reports had already demonstrated the selective loss of visual abilities after acquired brain injury. These observations suggested a functional specialization of the visual cortex, a concept verified many years later by combined anatomical, electrophysiological, and behavioral evidence (Desimone & Ungerleider, 1989; Grill-Spector & Malach, 2004; Orban, 2008; Zeki, 1993).

The primary visual cortical area (striate cortex, Brodmann area 17, visual area 1, or V1) receives its input from the retina via the lateral geniculate nucleus (LGN) and possesses a highly accurate, topographically organized representation of the retina and thus of the visual field. The central visual field occupies a large proportion of the striate cortex; about half of the cortical surface is devoted to the central 10 degrees of the visual field, which is only 1 percent of the visual field (Tootell, Hadjikhani, Mendola, Marrett, & Dale, 1998). In addition, V1 distributes specific visual signals to the other visual areas that are located in the surrounding cortex (for a review, see Bullier, 2003). This anatomical and functional organization enables the visual brain to deal with the processing of global and local features of visual objects and scenes. The result of processing at distinct levels of complexity at each stage can be flexibly and dynamically integrated into coherent perception (Bartels & Zeki, 1998; Tootell et al., 1998; Zeki, 1993; Zeki & Bartels, 1998). Because of the inhomogeneity of spatial resolution and acuity in the visual field (Anstis, 1974), the field size for processing visual details (e.g., form vision) is much smaller, comprising the inner 9 degrees of the binocular visual field (i.e., macular region; Henderson, 2003). Occipital-parietal, posterior parietal, and prefrontal mechanisms guarantee rapid global context extraction as well as visual spatial working memory (Bar, 2004; Henderson, 2003; Hochstein & Ahissar, 2002).

Ungerleider and Mishkin (1982) have characterized the functional specialization of the visual brain as consisting of two processing streams: The “where” pathway or dorsal route, comprising occipital-parietal visual areas and connections, is specialized in space processing; and the “what” pathway or ventral route, comprising occipital-temporal visual areas and connections, is specialized in object processing. According to Milner and Goodale (2008), information processed in the dorsal pathway is used for the implicit visual guidance of actions, whereas explicit perception is associated with processing in the ventral stream. Because visual perception usually involves both space- and object-based information processing, cooperation and interaction between the two visual streams are required (Goodale & Westwood, 2004). In addition, both routes interact either directly or indirectly via attention involving the inferior parietal cortex (Singh-Curry & Husain, 2009) and working memory involving the prefrontal cortex (Goodale & Westwood, 2004; Oliveri et al., 2001). Eye movements play a crucial role in visual processing and thus in visual perception (for a comprehensive review, see Martinez-Conde, Macknik, & Hubel, 2004). The posterior thalamus and its reciprocal connections with cortical regions in the occipital, parietal, and frontal lobes and with the limbic neocortex form a cortical-subcortical network subserving intentionally guided and externally triggered attention as well as saccadic eye movements that are involved in visual information processing (e.g., Andersson et al., 2007; Dean, Crowley, & Platt, 2004; Himmelbach, Erb, & Karnath, 2006; Nobre, 2001; Olson et al., 2000; Schiller & Tehovnik, 2001, 2005). Complex visual stimuli (e.g., objects and faces) are coded as specific categories in extrastriate regions in the ventral visual pathway (Grill-Spector, 2003; Sigala, 2004; Wierenga et al., 2009). Top-down processes involving the prefrontal cortex facilitate visual object recognition (Bar, 2003), and hippocampal-dependent memory builds the basis for experience-dependent visual scanning (Smith & Squire, 2008). Yet, it is still unclear how the brain eventually codes complex visual stimuli for accurate identification and recognition; it appears, however, that complex visual stimuli are simultaneously represented in two parallel and hierarchically organized processing systems in the ventral and dorsal visual pathways (Konen & Kastner, 2008).

About 30 percent of patients with acquired brain injury suffer from visual disorders (Clarke, 2005; Rowe et al., 2009; Suchoff et al., 2008). Lower level visual functions and abilities (e.g., visual detection and localization, visual acuity, contrast sensitivity, and color discrimination) may be understood as perceptual architecture, whereas higher level, visual-cognitive capacities (e.g., text processing and recognition) also involve learning and memory processes as well as executive functions. Selective visual disorders after brain injury are the exception rather than the rule because small “strategic” lesions are very rare and visual cortical areas are intensely interconnected. Injury to the visual brain, that is, to visual cortical areas and fiber connections, therefore commonly causes an association of visual disorders.
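The overrepresentation of central vision in V1 described in this section can be made concrete with a little arithmetic. The sketch below is illustrative only: it takes the two figures quoted in the text (about half of the striate surface serving the central 10 degrees, which make up about 1 percent of the visual field) at face value and treats both as exact proportions, which is an assumption. The function name `relative_magnification` is hypothetical and does not come from the chapter or from any library.

```python
def relative_magnification(cortex_share: float, field_share: float) -> float:
    """Cortical share per unit of visual field, central relative to peripheral.

    cortex_share: fraction of striate cortex serving the central region.
    field_share:  fraction of the visual field that the central region covers.
    """
    if not (0.0 < cortex_share < 1.0 and 0.0 < field_share < 1.0):
        raise ValueError("shares must lie strictly between 0 and 1")
    # Cortex devoted to each unit of central visual field.
    central_density = cortex_share / field_share
    # Cortex devoted to each unit of the remaining (peripheral) visual field.
    peripheral_density = (1.0 - cortex_share) / (1.0 - field_share)
    return central_density / peripheral_density


# Figures quoted in the text, treated here as exact proportions (an assumption).
ratio = relative_magnification(cortex_share=0.5, field_share=0.01)
print(f"Central field overrepresentation factor: {ratio:.0f}")
```

Under these assumptions, a unit of central visual field receives roughly two orders of magnitude more striate cortex than a unit of peripheral field, which is consistent with the disproportionate functional cost of losing central vision.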

Visual Field A homonymous visual field defect is defined as a restriction of the normal visual field caused by injury to the afferent postchiasmatic visual pathway, that is, an interruption in the flow of visual information between the optic chiasm and the striate cortex. Homony­ mous visual field disorders are characterized by partial or total blindness in correspond­ ing visual field regions of each eye. In the case of unilateral postchiasmatic brain injury, vision may be lost in the left or right hemifield (homonymous left- or right-sided hemi­ anopia), the left or right upper or lower quadrants (homonymous upper or lower quadra­ nopia in the left or right hemifield), or a restricted portion in the parafoveal visual field (paracentral scotoma). The most common type of homonymous visual field disorders is hemianopia (loss of vision in one hemifield), followed by quadranopia (loss of vision in one quadrant) and paracentral scotoma (island of blindness in the parafoveal field region). Vi­ sual field defects are either absolute (complete loss of vision, anopia) or relative (de­ pressed vision, amblyopia, hemiachromatopsia). Homonymous amblyopia typically affects the entire hemifield (hemiamblyopia), and homonymous achromatopsia (i.e., the selective loss of color vision) typically affects one hemifield (hemiachromatopsia) or the upper quadrant. Visual field defects differ with respect to visual field sparing. Foveal sparing refers to sparing of the foveal region (1 degree), macular sparing refers to the preserva­ tion of the macular region (5 degrees), and macular splitting refers to a sparing of less than 5 degrees (for review, see Harrington & Drake, 1990). In the majority of patients (71.5 percent of 876 cases), field sparing does not exceed 5 degrees. As a rule, patients with small visual field sparing are more disabled, especially with regard to reading. 
Stroke represents the most common etiology, but other etiologies such as traumatic brain injury, tumors, multiple sclerosis, and cortical posterior atrophy may also cause homony­ mous visual field disorders (see Zihl, 2011). About 80 percent of patients (n = 157) with unilateral homonymous visual field loss suffer from functional impairments in reading (hemianopic dyslexia) and/or in global perception and overview (Zihl, 2011). Homonymous visual field loss causes a restriction of the field of view, which prevents the rapid extraction of the entire spatial configuration of the visu­ al environment. It therefore impairs the top-down and bottom-up interactions that are re­ quired for efficient guidance of spatial attention and oculomotor activities during scene perception and visual search. Patients with additional injury to the posterior thalamus, the occipital white matter route (i.e., fiber pathways to the dorsal visual route and path­ ways connecting occipital, parietal, temporal, and frontal cortical areas) show disorga­ nized oculomotor scanning behavior (Zihl & Hebel, 1997; Mort & Kennard, 2003). The im­ pairments in global perception and visual scanning shown by these patients are more se­ vere than those resulting from visual field loss alone (Zihl, 1995a). Interestingly, about 20 percent show spontaneous substitution of visual field loss by oculomotor compensation and thus enlargement of the field of view; the percentage is even higher in familiar sur­ roundings because patients can make use of their spatial knowledge of the surroundings Page 4 of 37

Perceptual Disorders (Zihl, 2011). In normal subjects, global visual perception is based on the visual field with­ in which they can simultaneously detect and process visual stimuli. The visual field can be enlarged by eye shifts, which is typically 50 degrees in all directions (Leigh & Zee, 2006). The resulting field of view is thus defined by the extent of the visual field when moving the eyes in global visual perception (see also Pambakian, Mannan, Hodgson, & Kennard, 2004). Reading is impaired in patients with unilateral homonymous field loss and visual field sparing of less than 5 degrees to the left and less than 8 degrees to the right of the fovea. In reading, the visual brain (p. 196) relies on a gestalt-type visual word-form processing, the “reading span.” It is asymmetrical (larger to the right in left-to right-orthographies) and is essential for the guidance of eye movements during text processing (Rayner, 1998). However, insufficient visual field sparing does not appear to be the only factor causing persistent “hemianopic” dyslexia. The extent of brain injury affecting in particular the oc­ cipital white matter seems to be crucial in this regard (Schuett, Heywood, Kentridge, & Zihl, 2008a; Zihl 1995b). That reading is impaired at the pre-semantic visual sensory level is supported by the outcome of treatment procedures involving practice with nontext ma­ terial, which have been found to be as effective as word material in reestablishing eye movement reading patterns and improving reading performance (Schuett, Heywood, Ken­ tridge, & Zihl, 2008b). In the case of bilateral postchiasmatic brain injury, both visual hemifields are affected, re­ sulting in bilateral homonymous hemianopia (“tunnel vision”), bilateral upper or lower hemianopia, bilateral paracentral scotoma, or central scotoma. Patients with bilateral vi­ sual field disorders suffer from similar, but typically more debilitating, visual impairments in global visual perception and reading. 
A central scotoma is a very dramatic form of homonymous visual field loss because foveal vision is either totally lost or depressed (central amblyopia). The reduction or loss of vision in the central part of the visual field is typically associated with a corresponding loss of visual spatial contrast sensitivity, visual acuity, and form, object, and face perception. The loss of foveal vision also causes a loss of the central reference for optimal fixation and of the straight-ahead direction as well as an impairment of the visual-spatial guidance of saccades and hand-motor responses. As a consequence, patients cannot accurately fixate a visual stimulus and shift their gaze from one stimulus to another, scan a scene or a face, and guide their eye movements during scanning and reading. Patients therefore show severe impairments in locating objects, recognizing objects and faces, finding their way in rooms or places, and reading, and often get lost when scanning a word or a scene (Zihl, 2011).

Visual Acuity, Spatial Contrast Sensitivity, and Visual Adaptation

After unilateral postchiasmatic brain injury, visual acuity is usually not significantly reduced, except for cases in which the optic tract is involved (Frisén, 1980). After bilateral postchiasmatic injury, visual acuity can either be normal, gradually diminished, or totally lost (i.e., form vision is no longer possible) (Symonds & MacKenzie, 1957). This reduction in visual acuity cannot be improved by optical correction.

When spatial contrast sensitivity is reduced, patients usually complain of “blurred” or “foggy” vision despite normal visual acuity, accommodation, and convergence (Walsh, 1985). Impairments of contrast sensitivity have been reported in cerebrovascular diseases (Bulens, Meerwaldt, van der Wildt, & Keemink, 1989; Hess, Zihl, Pointer, & Schmid, 1990); after closed head trauma, encephalitis, and hypoxia (Hess et al., 1990); in Parkinson’s disease (Bulens, Meerwaldt, van der Wildt, & Keemink, 1986; Uc et al., 2005); multiple sclerosis (Gal, 2008); and dementia of the Alzheimer type (Jackson & Owsley, 2003). Bulens et al. (1989) have suggested that impairments of contrast sensitivity for high spatial frequencies mainly occur after occipital injury, whereas impairments of sensitivity for lower spatial frequencies occur after temporal or parietal injury. Depending on the severity of the sensitivity loss, patients have difficulties with depth perception, text processing, face perception, and visual recognition. Because reduction in spatial contrast sensitivity is not necessarily associated with reduced visual acuity, assessing visual acuity alone is not sufficient for detecting impaired spatial contrast sensitivity.

Color Vision

Color vision may be lost in the contralateral hemifield (homonymous hemiachromatopsia) or in the upper quadrant after unilateral occipital-temporal brain injury. Because light sensitivity and form vision are not impaired in the affected hemifield, the loss of color vision is selective (e.g., Short & Graff-Radford, 2001). Patients are usually aware of this disorder and report that the corresponding part of the visual environment appears “pale,” in “black and white,” or “like in an old movie.” In the case of cerebral dyschromatopsia, foveal color vision is affected with or without a concomitant loss of color vision in the peripheral visual field (Koh et al., 2008; Rizzo, Smith, Pokorny, & Damasio, 1993). Patients with cerebral dyschromatopsia find it difficult to discriminate fine color hues. Bilateral occipital-temporal injury causes moderate or severe loss of color vision in the entire visual field, which is called cerebral achromatopsia (Bouvier & Engel, 2006; Heywood & Kentridge, 2003; Meadows, 1974); yet, discrimination of grays (Heywood, Wilson, & Cowey, 1987) and even processing of wavelength differences (Heywood & Kentridge, 2003) may be spared. Consequently, discriminating and sorting of colors and associating color stimuli with their names and with particular objects (e.g., yellow and banana; green and grass) are affected. Patients may report that objects and pictures appear “drained of color,” as “dirty brownish” or “reddish,” or as “black and white.”

Cerebral hemiachromatopsia is a rather rare condition. Among 1,020 patients with unilateral homonymous visual field disorders after acquired posterior brain injury, we found thirty cases (2.9 percent) with unilateral hemiachromatopsia and impaired foveal color discrimination; among 130 cases with bilateral occipital injury, sixteen cases (12.3 percent) showed complete cerebral achromatopsia. Partial cerebral achromatopsia may also occur and may be associated with impaired color constancy (Kennard, Lawden, Morland, & Ruddock, 1995). The ventral occipital-temporal cortex is the critical lesion location for color vision deficits (Bouvier & Engel, 2006; Heywood & Kentridge, 2003). Color vision may also be impaired in (mild) hypoxia (Connolly, Barbur, Hosking, & Moorhead, 2008), multiple sclerosis (Moura et al., 2008), Parkinson’s disease (Müller, Woitalla, Peters, Kohla, & Przuntek, 2002), and dementia of the Alzheimer type (Jackson & Owsley, 2003). Furthermore, color hue discrimination accuracy can be considerably reduced in older age (Jackson & Owsley, 2003).

Spatial Vision

Disorders in visual space perception comprise deficits in visual localization, depth perception, and perception of visual spatial axes. Brain injury can differentially affect retinotopic, spatiotopic, egocentric, and allocentric frames of reference. Visual-spatial disorders typically occur after occipital-parietal and posterior parietal injury; a right-hemisphere injury more frequently causes visual spatial impairments (for comprehensive reviews, see Farah, 2003; Karnath & Zihl, 2003; Landis, 2000).

After unilateral brain injury, moderately defective visual spatial localization is typically found in the contralateral hemifield, but may also be present in the foveal visual field (Postma, Sterken, de Vries, & de Haan, 2000), where it is associated with reduced saccadic localization accuracy. Patients with bilateral posterior brain injury, in contrast, show moderate to severe localization inaccuracy in the entire visual field, which typically affects all visually guided activities, including accurately fixating objects, reaching for objects, and reading and writing (Zihl, 2011). Interestingly, patients with parietal lobe injury can show dissociation between spatial perception deficits and pointing errors (Darling, Bartelt, Pizzimenti, & Rizzo, 2008), indicating that inaccurate pointing cannot always be explained in terms of defective localization but may represent a genuine disorder (optic ataxia; see Caminiti et al., 2010).

Impaired monocular and binocular depth perception (astereopsis) has been observed in patients with unilateral and bilateral posterior brain injury, with bilateral injury causing more severe deficits. Defective depth perception may cause difficulties in pictorial depth perception, walking (downstairs), and reaching for objects or handles (Koh et al., 2008; Miller et al., 1999; Turnbull, Driver, & McCarthy, 2004). Impaired distance perception, in particular in the peripersonal space, has mainly been observed after bilateral occipital-parietal injury (Berryhill, Fendrich, & Olson, 2009).

Shifts in the vertical and horizontal axes have been reported particularly in patients with right occipital-parietal injury (Barton, Behrmann, & Black, 1998; Bonan, Leman, Legargasson, Guichard, & Yelnik, 2006). Right-sided posterior parietal injury can also cause ipsilateral and contralateral shifts in the visually perceived trunk median plane (Darling, Pizzimenti, & Rizzo, 2003). Occipital injury more frequently causes contralateral shifts in spatial axes, whereas posterior parietal injury also causes ipsilateral shifts. Barton and Black (1998) suggested that the contralateral midline shift of hemianopic patients is “a consequence of the strategic adaptation of attention into contralateral hemispace after hemianopia” (p. 660), that is, that a change in attentional distribution might cause an abnormal bias in line bisection. In a study of 129 patients with homonymous visual field loss, we found the contralateral midline shift in more than 90 percent of cases. However, the line bisection bias was not associated with efficient oculomotor compensation for the homonymous visual loss. In addition, visual field sparing did not modulate the degree of midline shift. Therefore, the subjective straight-ahead deviation may be explained as a consequence of a systematic, contralesional shift of the egocentric visual midline and may therefore represent a genuine visual-spatial perceptual disorder (Zihl, Sämann, Schenk, Schuett, & Dauner, 2009). This idea is supported by Darling et al. (2003), who reported difficulties in visual perception of the trunk-fixed anterior-posterior axis in patients with left- or right-sided unilateral posterior parietal lesions without visual field defects.

Visual Motion Perception

Processing of direction and speed of visual motion stimuli is a genuine visual ability. However, in order to know how objects move in the world, we must take into account the rotation of our eyes as well as of our head (Bradley, 2004; Snowden & Freeman, 2004). Motion perception also enables recognition of biological movements (Giese & Poggio, 2003) and supports face perception (Roark, Barrett, Spence, Abdi, & O’Toole, 2003). Visual area V5 activity is the most critical basis for generating motion perception (Moutoussis & Zeki, 2008), whereas superior temporal and premotor areas subserve biological motion perception (Saygin, 2007).

The first well-documented case of loss of visual motion perception (cerebral akinetopsia) is L.M. After bilateral temporal-occipital cerebrovascular injury, she completely lost movement vision in all three dimensions, except for detection and direction discrimination of single targets moving at low speed with elevated thresholds. In contrast, all other visual abilities, including the visual field, visual acuity, color vision, stereopsis, and visual recognition, were spared, as was motion perception in the auditory and tactile modalities. Her striking visual-perceptual impairment could not be explained by spatial or temporal processing deficits, impaired contrast sensitivity (Hess, Baker, & Zihl, 1989), or generalized cognitive slowing (Zihl, von Cramon, & Mai, 1983; Zihl, von Cramon, Mai, & Schmid, 1991). L.M. was also unable to search for a moving target among stationary distractor stimuli in a visual display (McLeod, Heywood, Driver, & Zihl, 1989) and could not see biological motion stimuli (McLeod, Dittrich, Driver, Perrett, & Zihl, 1996), including facial movements in speech reading (Campbell, Zihl, Massaro, Munhall, & Cohen, 1997). She could not extract shape from motion and lost apparent motion perception (Rizzo, Nawrot, & Zihl, 1995). Because of her akinetopsia, L.M. was severely handicapped in all activities involving visual motion perception, whereby perception and action were similarly affected (Schenk, Mai, Ditterich, & Zihl, 2000).

Selective impairment of movement vision in terms of threshold elevation for speed and direction has also been reported in the hemifield contralateral to unilateral posterior brain injury for motion types of different complexity, combined and in separation (Billino, Braun, Bohm, Bremmer, & Gegenfurtner, 2009; Blanke, Landis, Mermoud, Spinelli, & Safran, 2003; Braun, Petersen, Schoenle, & Fahle, 1998; Plant, Laxer, Barbaro, Schiffman, & Nakayama, 1993; Vaina, Makris, Kennedy, & Cowey, 1998).


Visual Identification and Visual Recognition

Visual agnosia is the inability to identify, recognize, interpret, or comprehend the meaning of visual stimuli even though basic visual functions (i.e., the visual field, visual acuity, spatial contrast sensitivity, color vision, and form discrimination) are intact or at least sufficiently preserved. Visual agnosia either results from defective visual perception (e.g., synthesis of features; apperceptive visual agnosia) or from the loss of the “bridge” between the visual stimulus and its semantic associations (e.g., label, use, history; associative or semantic visual agnosia). However, objects can be recognized in the auditory and tactile modalities, and the disorder cannot be explained by supramodal cognitive or aphasic deficits (modified after APA, 2007). Lissauer (1890) interpreted apperceptive visual agnosia as “mistaken identity” because incorrectly identified objects share global (e.g., size and shape) and/or local properties (e.g., color, texture, form details) with other objects, which causes visual misidentification. Cases with pure visual agnosia seem to be the exception rather than the rule (Riddoch, Johnston, Bracewell, Boutsen, & Humphreys, 2008). Therefore, a valid and unequivocal differentiation between a “genuine” visual agnosia and secondary impairments in visual identification and recognition resulting from other visual deficits is often difficult, in particular concerning the integration of global and local information (Delvenne, Seron, Coyette, & Rossion, 2004; Thomas & Forde, 2006). In a group of 1,216 patients with acquired injury to the visual brain we have found only seventeen patients (about 1.4 percent) with genuine visual agnosia. Visual agnosia is typically caused by bilateral occipital-temporal injury (Barton, 2008a) but may also occur after left- (Barton, 2008b) or right-sided posterior brain injury (Landis, Regard, Bliestle, & Kleihues, 1988).
There also exist progressive forms of visual agnosia in posterior cortical atrophy and in early stages of dementia (Nakachi et al., 2007; Rainville et al., 2006).

Farah (2000) has proposed a useful classification of visual agnosia according to the type of visual material patients find difficult to identify and recognize. Patients with visual object and form agnosia are unable to visually recognize complex objects or pictures. There exist category-specific types of object agnosia, such as for living and nonliving things (Thomas & Forde, 2006), animals or artifacts (Takarae & Levin, 2001). A particular type of visual object agnosia is visual form agnosia. The most thoroughly documented case of visual form agnosia is D.F. (Milner et al., 1991). After extensive damage to the ventral processing stream due to carbon monoxide poisoning, this patient showed a more or less complete loss of form perception, including form discrimination, despite having a visual resolution capacity of 1.7 minutes of arc. Visually guided activities such as pointing to or grasping for an object, however, were spared (Carey, Dijkerman, Murphy, Goodale, & Milner, 2006; James, Culham, Humphrey, Milner, & Goodale, 2003; McIntosh, Dijkerman, Mon-Williams, & Milner, 2004). D.F. also showed a profound inability to visually recognize objects, places, and faces, indicating a more global rather than selective visual agnosia. Furthermore, D.F.’s visual disorder may also be explained in terms of an allocentric spatial deficit rather than as a perceptual deficit (Schenk, 2006). As Goodale and Westwood (2004) have pointed out, the proposed ventral-dorsal division in visual information processing may not be as exclusive as assumed, and both routes interact at various stages. However, automatic obstacle avoidance was intact in D.F. while correct grasping was possible for simple objects only (McIntosh et al., 2004), suggesting that the “what” pathway plays no essential role in detecting and localizing objects or in the spatial guidance of walking (Rice et al., 2006). Further cases of visual form agnosia after carbon monoxide poisoning have been reported by Heider (2000). Despite preserved visual acuity and only minor visual field defects, patients were severely impaired in shape and form discrimination, whereas the perception of color, motion, and stereoscopic depth was relatively unimpaired. Heider (2000) identified a failure in figure–ground segregation and in grouping single elements of a composite visual scene into a “gestalt” as the main underlying deficit.

Global as well as local processing can be affected after right- and left-sided occipital-temporal injury (Rentschler, Treutwein, & Landis, 1994); yet, typically patients find it more difficult to process global features and integrate them into a whole percept (integrative or simultaneous agnosia; Behrmann & Williams, 2007; Saumier, Arguin, Lefebvre, & Lassonde, 2002; Thomas & Forde, 2006). Consequently, patients are unable to report more than one attribute of a single object (Coslett & Lie, 2008). Encoding the spatial arrangements of parts of an object requires a mechanism that is different from that required for encoding the shape of individual parts, with the former selectively compromised in integrative agnosia (Behrmann, Peterson, Moscovitch, & Suzuki, 2006). Integration of multiple object stimuli into a holistic interpretation seems to depend on the spatial distance of local features and elements (Huberle & Karnath, 2006). Yet, shifting fixation and thus also attention to all elements of an object in a regular manner seems not sufficient to “bind” together the different elements of spatially distributed stimuli (Clavagnier et al., 2006).
The integration of multiple visual elements resulting in a conscious perception of their gestalt seems to rely on bilateral structures in the human lateral and medial inferior parietal cortex (Himmelbach, Erb, Klockgether, Moskau, & Karnath, 2009). An alternative explanation for the impairment in global visual perception is shrinkage of the field of attention and thus perception (Michel & Henaff, 2004), which might be elicited by attentional capture (“radical visual capture”) to single, local elements (Takaiwa, Yoshimura, Abe, & Terai, 2003; Dalrymple, Kingstone, & Barton, 2007). The pathological restriction and rigidity of attention impair the integration of multiple visual elements to a gestalt, but the type of capture depends on the competitive balance between global and local salience. The impaired disengaging of attention causes inability to “unlock” attention from the first object or object element to other objects or elements of objects (Pavese, Coslett, Saffran, & Buxbaum, 2002). Interestingly, facial expressions of emotion are less affected in simultanagnosia, indicating that facial stimuli constitute a specific category of stimuli that attract attention more effectively and are possibly processed before attentional engagement (Pegna, Caldara-Schnetzer, & Khateb, 2008). It has been proposed that differences in local relative to more global visual processing can be explained by different processing modes in the dorsal and medial ventral visual pathways at an extrastriate level; these characteristics can also explain category-specific deficits in visual perception (Riddoch et al., 2008). The dual-route organization of visual information has also been applied to local–global perception.
Difficulties with processing of multiple stimulus elements or features (within-object representation) are often referred to as “ventral” simultanagnosia, and impaired processing of multiple spatial stimuli (between-object representation) as “dorsal” simultanagnosia (Karnath, Ferber, Rorden, & Driver, 2000). Dorsal simultanagnosia is one component of the Bálint-Holmes syndrome, which consists of spatial (and possibly temporal) restriction of the field of visual attention and thus visual processing and perception, impaired visual spatial localization and orientation, and defective depth perception (Moreaud, 2003; Rizzo & Vecera, 2002). In addition, patients with severe Bálint’s syndrome find it extremely difficult to shift their gaze voluntarily or on command (oculomotor apraxia or psychic paralysis of gaze) and are unable to direct movement of an extremity in space under visual guidance (optic or visuomotor ataxia). As a consequence, visually guided oculomotor and hand motor activities, visual-constructive abilities, visual orientation, recognition, and reading are severely impaired (Ghika, Ghika-Schmid, & Bogousslavsky, 1998).

In face agnosia (prosopagnosia), recognition of familiar faces, including one’s own face, is impaired or lost. The difficulties prosopagnosic patients have with visual face recognition also manifest in their oculomotor scan path during inspection of a face; global features such as hair or the forehead, for example, are scanned in much more detail than genuine facial features such as the eyes or nose (Stephan & Caine, 2009). Other prosopagnosic subjects may show partial processing of facial features, such as the mouth region (Bukach, Le Grand, Kaiser, Bub, & Tanaka, 2008).

Topographical agnosia (topographagnosia) or environmental agnosia refers to defective recognition of familiar environments, in reality and on maps and pictures; however, patients may have fewer difficulties in familiar surroundings and with scenes with clear landmarks, and may benefit from semantic information such as street names (Mendez & Cherrier, 2003).

Agnosia for letters (pure alexia) is a form of acquired dyslexia with defective visual recognition of letters and words while auditory recognition of letters and words and writing are intact.
The underlying disorder may have a pre-lexical, visual-perceptual basis because patients can also exhibit difficulties with nonlinguistic stimuli (Mycroft, Behrmann, & Kay, 2009).

Audition

Auditory perception comprises detection, discrimination, identification, and recognition of sounds, voice, music, and speech. The ability to detect and discriminate attributes of sounds improves with practice (Wright & Zhang, 2009) and thus depends on auditory experience. This might explain interindividual differences in auditory performance, in particular recognition expertise and domain specificity concerning, for example, sounds, voices, and music (Chartrand, Peretz, & Belin, 2008). Another factor that crucially modulates auditory perceptual efficiency is selective attention (Shinn-Cunningham & Best, 2008).

The auditory brain possesses tonotopic maps that show rapid task-related changes to subserve distinct functional roles in auditory information processing, such as pitch versus phonetic analysis (Ozaki & Hashimoto, 2007). This task specificity can be viewed as a form of plasticity that is embedded in a context- and cognition-related frame of reference, whereby attention, learning and memory, and mental imagery can modulate processing (Dahmen & King, 2007; Fritz, Elhilali, & Shamma, 2005; Weinberger, 2007; Zatorre, 2007). The auditory cortex forms internal representations of characteristic temporal structures, which may form the basis for sound segmentation, complex auditory object processing, and also multisensory integration (Wang, Lu, Bendor, & Bartlett, 2008). In the discrimination of speech and nonspeech stimuli, which is based on subtle temporal acoustic features, the middle temporal gyrus, the superior temporal sulcus, the posterior part of the inferior frontal gyrus, and the parietal operculum of the left hemisphere are involved (Zaehle, Geiser, Alter, Jancke, & Meyer, 2008). Environmental sounds are mainly processed in the middle temporal gyri in both hemispheres (Lewis et al., 2004), whereas vocal communication sounds are preferentially coded in the insular region (Bamiou, Musiek, & Luxon, 2003). Music perception is understood as a form of communication in which formal codes (i.e., acoustic patterns) and their auditory representations are employed to elicit a variety of perceptual and emotional experiences (Bharucha, Curtis, & Paroo, 2006). Musical stimuli have also been found to activate specific pathways in several brain areas that are associated with emotional behavior, such as insular and cingulate cortices, amygdala, and prefrontal cortex (Boso, Politi, Barale, & Enzo, 2006). For the representation of auditory scenes and categories within past and actual experiences and contexts, the medial and ventrolateral prefrontal cortex appears to play a particular role (Janata, 2005; Russ, Lee, & Cohen, 2007).

The auditory system also possesses a “where” and a “what” subdivision for processing spatial and nonspatial aspects of acoustic stimuli, which allows detection, localization, discrimination, identification, and recognition of auditory information, including vocal communication sounds (speech perception) and music (Kraus & Nicol, 2005; Wang, Wu, & Li, 2008).

Auditory Perceptual Disorders

Unilateral and bilateral injury to left- or right-sided temporal brain structures can affect spatial and temporal auditory processing capacities (Griffiths et al., 1997; Polster & Rose, 1998) and the perception of environmental sounds (Tanaka, Nakano, & Obayashi, 2002), sound movement (Lewald, Peters, Corballis, & Hausmann, 2009), tunes, prosody, and voice (Peretz et al., 1994), and words (pure word deafness) (Shivashankar, Shashikala, Nagaraja, Jayakumar, & Ratnavalli, 2001). Functional dissociation of auditory perceptual deficits, such as preservation of speech perception and environmental sounds but impairment of melody perception (Peretz et al., 1994), impaired speech perception but intact environmental sound perception (Kaga, Shindo, & Tanaka, 1997), and impaired perception of verbal but spared perception of nonverbal stimuli (Shivashankar et al., 2001), suggests a modular architecture similar to that in the visual cortex (Polster & Rose, 1998).

Auditory Agnosia

Auditory agnosia is defined as the impairment or loss of recognition of auditory stimuli in the absence of defective auditory functions and language and cognitive disorders that can (sufficiently) explain the recognition disorder. As in visual agnosia, it may be difficult to validly distinguish between genuine and secondary auditory agnosia. It is impossible to clearly differentiate sensory-perceptual from perceptual-cognitive abilities because both domains are required for auditory recognition. For example, patients with intact processing of steady-state patterns but impaired processing of dynamic acoustic patterns may exhibit verbal auditory agnosia (Wang, Peach, Xu, Schneck, & Manry, 2000) or have (additional) difficulties with auditory spatial localization and auditory motion perception (Clarke, Bellmann, Meuli, Assal, & Steck, 2000). Auditory agnosia for environmental sounds may be associated with impaired processing of meaningful verbal information (Saygin, Dick, Wilson, Dronkers, & Bates, 2003) and impaired recognition of music (Kaga, Shindo, Tanaka, & Haebara, 2000); yet, perception of environmental sound (Shivashankar et al., 2001) and music may also be spared even in the case of generalized auditory agnosia (Mendez, 2001). However, there exist examples of pure agnosia for recognizing particular categories of auditory material, such as environmental sounds (Taniwaki, Tagawa, Sato, & Iino, 2000), speech (pure word deafness) (Engelien et al., 1995; Polster & Rose, 1998), and music perception. Musical timbre perception can be affected after left- or right-sided temporal lobe injury (Samson, Zatorre, & Ramsay, 2002). Agnosia for music (music agnosia, amusia) and agnosia for other auditory categories are frequently associated but can also dissociate; they typically occur after right unilateral and bilateral temporal lobe injury (Vignolo, 2003). Amusia may affect discrimination and recognition of familiar melodies (Ayotte, Peretz, Rousseau, Bard, & Bojanowski, 2000; Sato et al., 2005). However, there is evidence for a less strong hemispheric specificity for music perception because cross-hemisphere and fragmented neural substrates underlie local and global musical information processing at least in the melodic and temporal dimensions (Schuppert, Munte, Wieringa, & Altenmüller, 2000).

Somatosensory Perception

The somatosensory system provides information about object surfaces that are in direct contact with the skin (touch) and about the position and movements of body parts (proprioception and kinesthesis). Somatosensory perception thus includes detection and discrimination of (fine) differences in touch stimulation and haptic perception, that is, the perception of shape, size, and identity (recognition) of objects on the basis of touch and kinesthesis. Shape is an important cue for recognizing objects by touch; edges, curvature, and surface areas are associated with three-dimensional shape (Plaisier, Tiest, & Kappers, 2009). Exploratory motor procedures are directly linked to the extraction of specific shape properties (Valenza et al., 2001). Somatosensory information is processed in anterior, lateral, and posterior parietal cortex, but also in frontal, cingulate, temporal, and insular cortical regions (Porro, Lui, Facchin, Maieron, & Baraldi, 2005).

Somatosensory Perceptual Disorders

Impaired haptic perception of (micro) geometrical properties, which may be associated with a failure to recognize objects, has been reported after injury to the postcentral gyrus, including somatosensory areas SI and SII, and the posterior parietal cortex (Bohlhalter, Fretz, & Weder, 2002; Estanol, Baizabal-Carvallo, & Senties-Madrid, 2008). Difficulties in identifying objects using hand manipulation only have been reported after parietal injury (Tomberg & Desmedt, 1999). Impairment of the perception of stimulus shape (morphagnosia) may result from defective processing of spatial orientation in two- and three-dimensional space (Saetti, De Renzi, & Comper, 1999). Tactile object recognition can be impaired without associated disorders in tactile discrimination and manual shape exploration, indicating the existence of “pure” tactile agnosia (Reed, Caselli, & Farah, 1996).

Body Perception Disorders

Disorders in body perception may affect body form and body actions selectively or in combination (Moro et al., 2008). Patients with injury to the premotor cortex may show agnosia for their body (asomatognosia); that is, they describe parts of their body as missing or as having disappeared from body awareness (Arzy, Overney, Landis, & Blanke, 2006). Macro- and, less frequently, microsomatognosia have been reported as transient and reversible modifications of body representation during migraine aura (Robinson & Podoll, 2000). Asomatognosia either may involve the body as a whole (Beis, Paysant, Bret, Le Chapelain, & Andre, 2007) or may be restricted to finger recognition (“finger agnosia”; Anema et al., 2008). Body misperception may also result in body illusion, a deranged representation of the body concerning its ownership labeled “somatoparaphrenia” (Vallar & Ronchi, 2009). Distorted body perception may also occur in chronic pain (Lotze & Moseley, 2007).

Olfactory and Gustatory Perception

The significance of the sense of smell is still somewhat neglected. This is surprising given that olfactory processing monitors the intake of airborne agents into the human respiratory system and warns of spoiled food, leaking natural gas, polluted air, and smoke. In addition, it determines to a large degree the flavor and palatability of foods and beverages, enhances life quality, and mediates basic elements of human social relationships and communication, such as in mother–child interactions (Doty, 2009). Olfactory perception implies detection, discrimination, identification, and recognition of olfactory stimuli. Olfactory perception shows selective adaptation; the perceived intensity of a smell drops by 50 percent or more after continuous exposure of about 10 minutes, and recovers again after removal of the smell stimulus (Eckman, Berglund, Berglund, & Lindvall, 1967). Continuous exposure to a particular smell, such as cigarette smoke, causes persistent adaptation to that smell on the person and in the environment.

Smell perception involves the caudal orbitofrontal and medial temporal cortices. Olfactory stimuli are processed in primary olfactory (piriform) cortex and also activate the amygdala bilaterally, regardless of valence. In posterior orbitofrontal cortex, processing of pleasant and unpleasant odors is segregated within medial and lateral segments, respectively, indicating functional heterogeneity. Brain regions mediating emotional processing are also differentially activated by odor valence, which provides evidence for a close anatomical coupling between olfactory and emotional processes (Gottfried, Deichmann, Winston, & Dolan, 2002).

Gustation is vital for establishing whether a specific substance is edible and nutritious or poisonous, and for developing preferences for specific foods. According to the well-known taste tetrahedron, four basic taste qualities can be distinguished: sweet, salt, sour, and bitter. A fifth taste quality is umami, a Japanese word for “good taste.” Perceptual taste qualities are based on the pattern of activity across different classes of sensory fibers (i.e., cross-fiber theory; Mather, 2006, p. 44) and distributed cortical processing (Simon, de Araujo, Gutierrez, & Nicolelis, 2006). Taste information is conveyed through the central gustatory pathways to the gustatory cortical area, but is also sent to the reward system and feeding center via the prefrontal cortex, insular cortex, and amygdala (Simon et al., 2006; Yamamoto, 2006). The sensation of eating, or flavor, involves smell and taste as well as interactions between these and other perceptual systems, including temperature, touch, and sight. However, flavor is not a simple summation of different sensations; smell and taste seem to dominate flavor.

Olfactory Perceptual Disorders

Olfactory perception can be impaired in the domains of detection, discrimination, and identification/recognition of smell stimuli. Typically, patients experience hyposmia (decreased sense of smell) or anosmia (loss of sense of smell) (Haxel, Grant, & Mackay-Sim, 2008). However, distinct patterns of olfactory dysfunctions have been reported, indicating differential breakdown in olfactory perception analogous to visual and auditory modalities (Luzzi et al., 2007). Interestingly, a selective inability to recognize favorite foods by smell can also occur despite preserved detection and evaluation of food stimuli as pleasant or familiar (Mendez & Ghajarnia, 2001).

Chronic disorders in olfactory perception and recognition have been reported after (traumatic) brain injury mainly to ventral frontal cortical structures (Fujiwara, Schwartz, Gao, Black, & Levine, 2008; Haxel, Grant, & Mackay-Sim, 2008; Wermer, Donswijk, Greebe, Verweij, & Rinkel, 2007), in (p. 203) Parkinson’s disease and multiple sclerosis, in mesial temporal epilepsy, and in neurodegenerative diseases, including dementia of the Alzheimer type, frontal-temporal dementia, cortical-basal degeneration, and Huntington’s disease (Barrios et al., 2007; Doty, 2009; Jacek, Stevenson, & Miller, 2007; Pardini, Huey, Cavanagh, & Grafman, 2009). It should be mentioned, however, that hyposmia and impaired odor identification can also be found in older age (Wilson, Arnold, Tang, & Bennett, 2006), in particular in subjects with cognitive decline. Presbyosmia has been found in particular after 65 years of age, with no difference between males and females, and with only a weak relationship between self-reports of olfactory function and objective olfactory function (Mackay-Sim, Johnston, Owen, & Burne, 2006). Olfactory perceptual changes have also been reported among subjects receiving chemotherapy (Bernhardson, Tishelman, & Ruthqvist, 2009), in depression (Pollatos et al., 2007), and in anorexia nervosa (Roessner, Bleich, Banashewski, & Rothenburger, 2005).


Gustatory Perceptual Disorders

Gustatory disorders in the form of quantitatively reduced (hypogeusia) or qualitatively changed (dysgeusia) gustation have been reported after subcortical, inferior collicular stroke (Cerrato et al., 2005), after pontine infarction (Landis, Leuchter, San Millan Ruiz, Lacroix, & Landis, 2006), after left insular and opercular stroke (Mathy, Dupuis, Pigeolet, & Jacquerye, 2003), in multiple sclerosis (Combarros, Miro, & Berciano, 1994), and in diabetes mellitus (Stolbova, Hahn, Benes, Andel, & Treslova, 1999). The anteromedial temporal lobe plays an important role in recognizing taste quality because injury to this structure can cause gustatory agnosia (Small, Bernasconi, Sziklas, & Jones-Gutman, 2005). Gustatory perception also decreases with age (>40 years), a decline that is more pronounced in males than in females (Fikentscher, Roseburg, Spinar, & Bruchmuller, 1977). Smell and taste dysfunctions, including impaired detection, discrimination, and identification of foods, have been frequently reported in patients following (minor) stroke in temporal brain structures (Green, McGregor, & King, 2008). Abnormalities in taste and smell have also been reported in patients with Parkinson’s disease (Shah et al., 2009).

Social Perception

Social perception is an individual’s perception of social stimuli (i.e., facial expression, prosody and gestures, and smells), which allows motives, attitudes, or values to be inferred from the social behavior of other individuals. Social perception and social cognition, as well as sensitivity to the social context and social action, belong to particular functional systems in the prefrontal brain (Adolphs, 2003; Adolphs, Tranel, & Damasio, 2003). The amygdala is involved in recognizing facial emotional expressions; the orbitofrontal cortex is important to reward processing; and the insula is involved in representing “affective” states of our own body, such as empathy or pain (Adolphs, 2009). The neural substrates of social perception are characterized by a general pattern of right-hemispheric functional asymmetry (Brancucci, Lucci, Mazzatenta, & Tommasi, 2009). The (right) amygdala is crucially involved in evaluating sad but not happy faces, suggesting that this brain structure plays a specific role in processing negative emotions, such as sadness and fear (Adolphs & Tranel, 2004).

Disorders in Social Perception

Patients with traumatic brain injury may show difficulties with recognizing affective information from the face, voice, bodily movement, and posture (Bornhofen & McDonald, 2008), which may persistently interfere with successful negotiation of social interactions (Ietswaart, Milders, Crawford, Currie, & Scott, 2008). Interestingly, face perception and perception of visual social cues can be affected while the perception of prosody can be relatively spared, indicating a dissociation between visual and auditory social-perceptual abilities (Croker & McDonald, 2005; Green, Turner, & Thompson, 2004; Pell, 1998). Impaired auditory recognition of fear and anger has been reported following bilateral amygdala lesions (Scott et al., 1997). Impairments of social perception, including inaccurate interpretation and evaluation of stimuli signifying reward or punishment in a social context, and failures to translate emotional and social information into task- and context-appropriate action patterns are often observed in subjects with frontal lobe injury. Consequently, patients may demonstrate inadequate social judgments and decision making, social inflexibility, and lack of self-monitoring, particularly in social situations (Rankin, 2007). Difficulties with facial expression perception have also been reported in mood disorders (Venn, Watson, Gallagher, & Young, 2006).

Conclusion and Some Final Comments

The systematic study of individuals with perceptual deficits has substantially contributed to the (p. 204) understanding of the role of perceptual abilities and their underlying sophisticated brain processes, as well as the neural organization of the perceptual modalities. Combined neurobiological, neuroimaging, and neuropsychological evidence supports the view that all perceptual systems are functionally segregated and show a parallel-hierarchical type of organization of information processing and coding. Despite this type of functional organization, pure perceptual disorders are the exception rather than the rule. This somewhat surprising fact can be explained by three main factors: (1) focal brain injury is only rarely restricted to the cortical area in question; (2) the rich, typically reciprocal fiber connections between cortical areas are frequently also affected; and (3) perception may depend on spatiotemporally distributed activity in more than just one cortical area, as is known, for example, in body perception (Berlucchi & Aglioti, 2010). Thus, an association of deficits is more likely to occur. Furthermore, complex perceptual disorders, such as disorders of recognition, may also be caused by impaired lower level perceptual abilities, and it is rather difficult to clearly distinguish between lower and higher level perceptual abilities. In addition, recognition cannot be understood without reference to memory, and it is therefore not surprising that it has been suggested that the brain structures underlying visual memory, in particular in the medial temporal lobe, also possess perceptual functions and can thus be understood as an extension of the ventral visual processing stream (Baxter, 2009; Suzuki, 2009). 
Consequently, rather than trying to map perceptual functions onto more or less separate brain structures, a more comprehensive understanding of perception would benefit from the study of cortical representation of functions crucially involved in defined percepts (Bussey & Saksida, 2007). This also holds true for the perception–action debate, in particular in vision, which is treated as an exploratory activity, that is, a way of acting based on sensorimotor contingencies, as proposed by O’Regan & Noë (2001). According to this approach, the outside visual world serves as its own representation, whereas the experience of seeing occurs as a result of mastering the “governing laws of sensorimotor contingency” and thereby accounts for visual experience and “visual consciousness.” If one applies this approach to the pathology of visual perception, then the question arises as to which visual perceptual disorders would result from the impaired representation of the “outside” visual world, and which from the defective “mastering of the governing laws of sensorimotor contingency.” Would visual perceptual disorders of the first type not be experienced by patients, and thus not represent a disorder and not cause any handicap, because there is no “internal” representation of the outside world in our brains and thus no visual experience? Modulatory effects of producing action on perception such that observers become selectively sensitive to similar or related actions are known from visual imitation learning and social interactions (Schutz-Bosbach & Prinz, 2007), but in both instances, perception of action and, perhaps, motivation to observe and attention directed to the action in question are required. Nevertheless, a more detailed understanding of the bidirectional relationships between perception and action and the underlying neural networks will undoubtedly help us to understand how perception modulates action and vice versa. Possibly, the search for associations and dissociations of perceptions and actions in cases with acquired brain injury in the framework of common functional representations in terms of sensorimotor contingencies represents a helpful approach to studying the reciprocal relationships between perception and action. Accurate visually guided hand actions in the absence of visual perception (Goodale, 2008) and impaired eye–hand coordination and saccadic control in optic ataxia as a consequence of impaired visual-spatial processing (Pisella et al., 2009) are examples of such dissociations and associations. Despite some conceptual concerns and limitations, the dual-route model of visual processing proposed by Milner and Goodale (2008) is still of theoretical and practical value (Clark, 2009). 
An interesting issue is implicit processing of stimuli in the absence of experience or awareness, such as detection, localization, and even discrimination of simple visual stimuli in hemianopia (“blindsight”; Cowey, 2010; Danckert & Rosetti, 2005); discrimination of letters in visual form agnosia (Aglioti, Bricolo, Cantagallo, & Berlucchi, 1999); discrimination of forms in visual agnosia (Kentridge, Heywood, & Milner, 2004; Yang, Wu, & Shen,