Moving Image Theory : Ecological Considerations [1 ed.] 9780809387571, 9780809327461

Blending unconventional film theory with nontraditional psychology to provide a radically different set of critical meth

English Pages 269 Year 2007

Moving Image Theory Ecological Considerations Edited by Joseph D. Anderson and Barbara Fisher Anderson Foreword by David Bordwell

Southern Illinois University Press, Carbondale

Copyright © 2005 by the Board of Trustees, Southern Illinois University
Paperback edition 2007
All rights reserved
Printed in the United States of America
10 09 08 07 4 3 2 1

Library of Congress Cataloging-in-Publication Data
Moving image theory : ecological considerations / edited by Joseph D. Anderson and Barbara Fisher Anderson ; with a foreword by David Bordwell.
p. cm.
Includes bibliographical references and index.
1. Motion pictures—Psychological aspects. 2. Motion picture audiences—Psychology. I. Anderson, Joseph, date. II. Anderson, Barbara Fisher, date.
PN1995.M688 2004
302.23'43—dc22
2004015236
ISBN 0-8093-2599-3 (alk. paper)
ISBN-13: 978-0-8093-2746-1
ISBN-10: 0-8093-2746-5

Printed on recycled paper. The paper used in this publication meets the minimum requirements of American National Standard for Information Sciences—Permanence of Paper for Printed Library Materials, ANSI Z39.48-1992. ∞

To James D. Simmons

Contents

Foreword (David Bordwell)
Preliminary Considerations (Joseph D. Anderson)

Part One. Information Available in Moving Images
1. Perceiving Scenes in Film and in the World (James E. Cutting)
2. The Value of Oriented Geometry for Ecological Psychology and Moving Image Art (Robert E. Shaw and William M. Mace)

Part Two. Perception of Simulated Human Motion
3. Creating Realistic Motion (Jessica K. Hodgins, James F. O’Brien, Nancy S. Pollard, Robert Sumner, Wayne L. Wooten, Gary Yngve, and Victor Zordan)
4. Perceiving Human Motion in Synthesized Images (Joseph D. Anderson and Jessica K. Hodgins)

Part Three. Acoustic Events
5. Background Tracks in Recent Cinema (Charles Eidsvik)
6. Acoustic Specification of Object Properties (Claudia Carello, Jeffrey B. Wagman, and Michael T. Turvey)

Part Four. Information in Facial Expression
7. Three Views of Facial Expression and Its Understanding in the Cinema (Ed S. Tan)
8. Facial Motion as a Cue to Identity (Karen Lander and Vicki Bruce)

Part Five. Coupling of Perception and Emotion
9. Film Lighting and Mood (Torben Grodal)
10. Cinematic Creation of Emotion (Dolf Zillmann)

Part Six. Appeals of Reality-Based Moving Images
11. Documentary’s Peculiar Appeals (Dirk Eitzen)
12. Reality Programming: Evolutionary Models of Film and Television Viewership (William Evans)

Part Seven. Events, Symbols, and Metaphors
13. Through Alice’s Glass: The Creation and Perception of Other Worlds in Movies, Pictures, and Virtual Reality (Sheena Rogers)
14. Metaphors in Movies (John M. Kennedy and Dan L. Chiappe)

List of Contributors
Index

Foreword
David Bordwell

What processes enable us to perceive, comprehend, and respond emotionally to moving pictures? Here, in gross outline, is one answer. As humans we have evolved certain capacities and predispositions, ranging from perceptual ones (biological mechanisms for obtaining information about the world we live in) to social ones (e.g., affinities with and curiosity about other humans). By exercising these capacities and predispositions and by bonding with our conspecifics, we have built a staggeringly sophisticated array of cultural practices—skills, technologies, arts, and institutions. Moving pictures are such a practice. We designed them to mesh with our perceptual and cognitive capacities. What hammers are to hands, movies are to minds: a tool exquisitely shaped to the powers and purposes of human activity. A great deal of movies’ effects—more than many contemporary film theories allow—stem from their impact on our sensory systems. We are prompted to detect movement, shape, color, and sounds, and this is surely one of the transcultural capacities that movies tap. Similarly, films from all nations and times draw upon more cognitive skills, such as categorizing an object as living or nonliving or seeing a face as furious—abilities that, it’s reasonable to think, are part of our evolutionary heritage. And because affective states and counterfactual speculation are of adaptive advantage, it is likely that an artistic medium that permits emotional and imaginative expression would have appeal across cultural boundaries. If we consider culture to be an elaboration of evolutionary processes, there’s no inherent gulf between biology and society in this explanatory framework. True, these elaborations vary historically, yielding (among other things) what we usually call conventions—local practices that seem more artificial and that differ from one society to another.
Yet some conventions are less artificial than others.1 A verbal language takes years to learn and is perhaps the epitome of hard-core conventionality. Other conventions can be picked up fast because they are functionally similar across cultures. Some countries require drivers to stay on the right side of the road, others on the left, but the idea of ordering the traffic flow governs each choice. Still other conventions require only the slightest adjustments of our natural proclivities. In a picture, if the most important element occupies the center of the format, viewers from any culture will probably not be surprised. Centering (manifesting the principle of symmetry) is in some sense a convention of pictorial composition, but it seems to run with the grain of our visual predispositions, taking the line of least resistance. Strategic decentering, on the other hand, may be a convention that requires a little more tutoring.

Films use conventions. In most movies, characters face each other in an odd way: their bodies and faces are conveniently tilted in three-quarter view for the camera. Scenes are cut according to the tactics of continuity editing. We may hear music that does not issue from the locale of the scene, and a dissolve or fade may convey a passage of time. Still, such conventions are mostly of the quickly learned variety. Many of them piggyback on our natural predispositions; others require only slight adjustments. Several amplify and streamline regularities of human interaction, as when movie characters talking to one another stare more fixedly and blink far less than they would in real life. We understand movies fairly easily because in many respects their conventions are easy to learn: they are simplifications of things we already know. Of course, a particular filmmaker may wish to block that easy understanding—to be, as we say, unconventional—but very often will have to tap into other of our capacities and proclivities. If the story is told out of order, then we will need some redundant cues to that design as well, such as Pulp Fiction’s replay of the opening dialogue when the action returns to the diner for its climactic scene. Nevertheless, a great deal of what is conveyed in a movie is conveyed “naturally”—through those perceptual-cognitive-affective universals that are part of our biological inheritance. This view, I believe, is likely to be true. Yet it would be stoutly rejected by most film scholars. The reasons are partly due to certain strongly held opinions within the humanities and partly due to the history of film studies as an academic discipline. My fuller version of the story can be read elsewhere,2 but in brief it goes like this. The framework I just sketched presumes contingent universals of human makeup and experience, but most scholars in the humanities tend to doubt the existence (or the importance) of empirical universals.
Further, the framework hypothesizes causal and functional explanations for social practices. Most humanists, though, prefer interpretation to explanation, and when they do seek explanations, biological causes or functions are usually ruled out as deterministic. The framework I traced also takes rational inquiry, of which science is our most successful exemplar, as the most promising way to explain cultural practices. But academic humanists on the whole mistrust science and, sometimes, rational inquiry more generally. Film academics are on the whole even more suspicious of this framework than their peers in other disciplines, I believe. This is largely because film studies, entering university humanities departments in the late 1960s, became rather quickly attached to certain doctrines. Most of these, such as semiotics and psychoanalytic theory, were deeply antinaturalistic (at least in the versions that became influential). While these particular doctrines have lost their grip, an extreme version of cultural constructivism is at the base of most film studies. Consider just a few premises.
• All personal experiences—identity, concepts, feelings, even perceptions—are socially constructed. (Constructed out of what? That matter is not addressed.)
• Because everything is socially constructed, there is no such thing as a more or less realistic representation; every sign is equally arbitrary. (Can the concept of the arbitrary sign be intelligible without a concept of the non-arbitrary sign? Shouldn’t one then consider that there might be nonarbitrary signs? And why should all signs be equally arbitrary? These questions are not asked.)
• Realism is a myth because no representational system provides total access to some reality out there, if indeed such a thing exists. (Doesn’t this set the bar unreasonably high? A realistic representation need not preserve all aspects of its referent in order to be reliable, as we see in architectural design and forensic photography. But these objections are caricatured as naive realism.)
• Every culture creates its own web of meaning; there may be hybridity when cultures come in contact, but there is no universal human culture. (If every culture is sui generis, how could theorists have grasped enough features of alien cultures to arrive at this generalization? This circularity isn’t considered.)
Despite some claims that the discipline has become more pluralistic since the 1980s, premises like these, invoked ritualistically in the literature and taught by rote and exemplar in courses, have become operational assumptions of most academic film writing. Film studies also got off on the wrong foot methodologically. Instead of framing questions, to which competing theories might have responded in a common concern for enlightenment, film academics embraced a doctrine-driven conception of research. Essentially scholastic in its impulse, it held that certain theorists had revealed core truths, and their gospel could then be applied, in more or less mechanical fashion, to particular movies. First came Mulvey’s gaze theory, then postmodernism, then versions of identity politics, multiculturalism, and modernity theory—none weighed as candidate answers to a puzzle or problem but accepted unskeptically, then used to churn out interpretations of film after film. Film studies remains, in a word, dogmatic. In these circumstances, the appearance of Moving Image Theory: Ecological Considerations can only be welcomed.
The editors have assembled a distinguished cast of empirical researchers and film theorists to explore, within a naturalistic framework, the ways moving images mesh with our minds. Every essay teems with insights and fruitful suggestions for further reflection and experiment, and all point toward ways of reconsidering some of the tenets I’ve already outlined. Take, for example, the very issue of ecological psychology. Most generally, this means treating evolutionary considerations as one constraint on theorizing about the psychology of film. It is one of James J. Gibson’s most long-lasting contributions to have brought evolutionary issues into the study of perception. At this level, any examination of moving image media that reckons evolutionary constraints or tendencies into account—as Torben Grodal, Dolf Zillmann, and Dirk Eitzen do in their accounts here—deserves the name ecological. From the same adaptive perspective, certain candidates for contingent universals can be illuminated by robust, nuanced overviews like that provided by Ed S. Tan’s discussion of facial expression or by John M. Kennedy and Dan Chiappe who (fittingly enough in a paper on metaphor) offer us the image of human cultures as islands linked into an archipelago by an unseen common ground, what used to be called human nature. We also encounter ecological theorizing in a narrower sense, that is, as proceeding from the direct-perception theories Gibson developed. Here we find essays ranging from selective treatment of some ideas as in Karen Lander and Vicki Bruce’s contribution to explorations of Gibson’s system as a whole as in the essays by Claudia Carello et al. and Sheena Rogers. I have no competence to assess these contributions, but they promise
to be as much of value to psychologists as to those of us interested in the psychology of art. Perhaps they will link up in time with the emerging ecological strain in cognitive theory, such as Gerd Gigerenzer’s concept of ecological rationality.3 Encouragingly, all these essays allay any concerns that a Gibsonian view commits one to preferring only realistic art. Nearly every study shows how a realist psychology gives special meaning to artists’ efforts to violate ecological validity, as Robert E. Shaw and William M. Mace present, to defeat our normal responses as well as to build upon them (Zillmann and Grodal again), to create filmmaking traditions that preserve certain invariants and stylize others, for example, James E. Cutting on—what else?—cutting. This is a subtle and supple realism, one that takes veridicality as a bridgehead—biologically, perceptually, cross-culturally—and then shows how conventions might arise out of systematic revisions or rejections of it. The difference between these contributions and most current film theory might boil down to this. Contemporary theory assumes that cinematic communication is almost wholly conventional (and the conventions come from culture); what is not conventional amounts to very little (often called physiology!) and not considered very important. According to the ecological view, cinematic communication relies on a great many nonconventional capacities and processes, and the conventions are correspondingly small in number and easy to learn—riding as many of them do upon just those ecologically constrained processes. It’s appropriate that Gibson developed his perceptual theories out of his work with cinema. As a lieutenant colonel in the Air Force during World War II, he was in charge of testing films that could train pilots. 
In trying to simulate the problems of identifying other aircraft and landing on airstrips, he was led to treat human vision not as a snapshot but as a flowing optic array—oriented to the horizon, displaying texture gradients, and illuminated by light from above.4 Film turned out to be very effective in teaching young pilots what flying looks like, and so Gibson had pretty solid empirical reasons for adopting a realist perceptual psychology. Movies, he understood, could faithfully capture essential features of a life-or-death situation. If we invited today’s postmodern academics to come up with reliable ways to represent airplane maneuvers, I shudder to think what casualties would result. But maybe not, at least once these academics got off the ground. If there are no atheists in foxholes, perhaps there are no culturalists in cockpits.

Notes
1. On pictorial conventions, see Noël Carroll, “The Power of Movies,” Theorizing the Moving Image (Cambridge: Cambridge University Press, 1996), pp. 80–81.
2. See my “Contemporary Film Studies and the Vicissitudes of Grand Theory” in Post-Theory: Reconstructing Film Studies, ed. David Bordwell and Noël Carroll (Madison: University of Wisconsin Press, 1996), pp. 1–36. I try to defend the “moderate constructivism” sketched above in “Convention, Construction, and Cinematic Vision,” ibid., pp. 87–107.
3. See Gerd Gigerenzer, Adaptive Thinking: Rationality in the Real World (New York: Oxford University Press, 2000).
4. See Motion Picture Testing and Research: Army Air Forces Aviation Psychology Program Research Report No. 7, ed. James J. Gibson (Washington, D.C.: U.S. Government Printing Office, 1947), pp. 181–212, 219–30. Edward S. Reed explains the importance of these experiments in his superb James J. Gibson and the Psychology of Perception (New Haven: Yale University Press, 1988), pp. 114–80.

Preliminary Considerations
Joseph D. Anderson

Muddles and misconceptions prevail. We are led to conceive a sort of apparatus inside the head that is similar to the apparatus for making a picture show outside the head. We have been taught that a picture is sent up to the brain and so we conclude that a series of pictures can be sent up to the brain. We all know what a snapshot is, and we know that a film is a series of snapshots. If we are told that a movie presents us with a sequence of retinal snapshots joined by what is called the “persistence of vision,” we believe it. But we are misled.
—James J. Gibson, The Ecological Approach to Visual Perception

A few years ago a student, a very bright Ph.D. candidate, stopped me in the hallway after class and said, “Why do you persist in your ecological approach to film theory when it makes everyone so angry at you?” I was a bit stunned. I opened my mouth, but nothing came out. Then he continued. “Is it just because it makes so damned much sense when it comes to explaining motion pictures?” “Yes,” I heard myself answering, “It makes sense.” This student had come to us with a master’s degree in film studies, and he was familiar with conventional film theory, usually referred to by its adherents as “contemporary film theory,” the residue of a series of theoretical formulations and political positions that fueled a succession of academic fads from the 1970s through the end of the century. The followers of these conventions were presumably the everyone to whom he referred as being angry. Apparently what made the conventionalists so angry was the introduction of literature from the sciences into the discussion of motion pictures. They were categorically against such a thing no matter how much sense it made. At our urging, the student had also taken several courses in psychology and was aware of the controversy between advocates of perceptual psychology based on a tradition extending from Hermann von Helmholtz to the present and those who support the newer, ecological approach based on the work of James J. Gibson. At the heart of their disagreement is a difference in their basic conceptions of what it is to perceive the world. For the followers of Helmholtz,

We are essentially separate from the world of objects and isolated from external physical events, except for neural signals which, somewhat like language, must be learned and read according to various assumptions, which may or may not be appropriate. (Gregory, 1987, p. 309)

Percepts are thus constructed from the raw sense data. Ecological psychologists offer quite a different idea.
They propose that we are part of the environment that shaped us and
that through evolution we have developed the capacity to perceive the world directly. That is, we see the world, not our sense data. They argue that within well-defined boundaries, we see and hear the events of the world around us without mediation; and we presumably share the capacity to do so with other biological creatures who have no capacity for language and less capacity for logic or inference than we. Film theorists who would draw upon the sciences, particularly psychology to help them understand what motion pictures are, how we gain meaning from them, why we seem so attracted to them, and what effects they have on our lives, have gained a greater appreciation of the complexity of perception itself. They have learned that psychologists do not all agree on how we perceive the world nor is there general agreement on how we perceive motion pictures. From the other side, psychologists are thinking that a better understanding of how we perceive the comparatively limited phenomenon of the motion picture might open the way to further insight into how we perceive the world at large. Who would have thought that so much attention would be paid to the seemingly casual and inconsequential act of movie viewing? Much to our regret, the student was so dismayed by the storm of controversy surrounding film theory that after completing his degree, he switched to another field of study. We were unable to convey to him that we are not dismayed. If people are passionate about such matters, that is all the better. Intellectual controversy is good, and open debate is often very productive. Spirited competition can expose error or uncritical adherence to dogma, and in time what was once distant and incomprehensible can become known. Considering the pervasive role of media in our world and in our individual lives, the debate is crucial, and the study of moving images is an important and terribly exciting pursuit. This book is about perceiving mediated images and sounds. 
It will take us beyond conventional film theory and traditional psychology, and it will encompass images and sounds that are currently being produced at the convergence of film, video, and computer technologies. The authors of these essays come from a variety of fields, from computer animation, media research, ecological psychology, philosophy, and film studies, yet they all share an interest in discovering how mediated images and sounds are somehow intensely meaningful and emotionally engaging. The book itself constitutes one of those rare and delightful occasions when scholars from different disciplines come together through mutual interest and general goodwill to add to the common store of knowledge, in this instance to increase our understanding of one of the most pervasive and controversial phenomena of our time—moving images. While the authors of this set of essays are quite knowledgeable concerning other points of view, their commitment in this volume is to an ecological consideration of moving images. To greater and lesser extents, the authors are all building on the foundation laid by Gibson, the founder of ecological psychology. Of course, others could have been asked to elaborate some of the competing theories, but that would have resulted in a much larger, and perhaps unwieldy, volume. Readers interested in seeking information about alternate theories should have no difficulty in finding extensive material on film and other related media from other points of view. For the reader who is new to ecological psychology, perhaps a brief introduction is in order. As its name implies, ecological psychology takes into account that we (like all other animals) are creatures that have evolved in an environment. The environment has provided the things we must have for survival, and we through evolution have developed the capacities to gain the information we need to guide our actions in that environment. 
The environment provides patterned arrays of energies such as are found in reflected light and vibrations of molecules in the air, which specify in their patterns of activity the objects and events of the world. Our senses are attuned to certain of these energies, which we therefore refer to as “visible” light and “audible” sound frequencies. Through our perceptual systems, we can obtain the information about objects and events, and in doing so, we generally are unaware of either the arrays of energies or our own sensory processing. Instead we see and hear the world—its objects and events. Of course, the properties of an environment are not static, and the perceiver does not remain stationary but instead moves through the environment. Therefore, the arrays of energies to which our senses are attuned are constantly changing, yet we are able to perceive a stable world. The objects that are stationary are perceived as stationary, and the objects that are moving are seen as doing so. Moreover, we are able to detect the invariants of a layout despite changes in lighting or our own movement. Invariants are the things that do not change with changes such as lighting or the position of the perceiver. For example, the rectangular table does not change its shape as we dim the dining room lights or walk around it. A major tenet of ecological psychology is that we do not passively catalogue random properties of the world as they are revealed to us through our senses; instead we actively look and listen (and touch, taste, and sniff) for the things that the environment might afford us. Gibson proposed that to perceive the layout of the environment, its arrangements of objects and events, is to perceive what they afford. He coined the word affordance to name that “complementarity of animal and the environment” (Gibson, 1979, p. 127). “Affordances are properties of the environment as they are related to an animal’s capabilities for using them. . . . To perceive an affordance is to detect an environmental property that provides opportunity for action” (E. Gibson & Pick, 2000, pp. 15–16).
An obvious example in the natural world is that the flatness and solidity of the ground specify the affordance of walking upon for a human. A log might afford sitting. It also seems to be possible to attend to whatever information we need and to simply ignore much of the rest. While it is no doubt true that we have access to (perhaps one could say are bombarded with) more information today than ever before in our history as human beings and therefore this capacity is very useful to us, the basic strategy of seeking the information we need to function and ignoring the rest is apparently an ancient one. Even though several different species may physically occupy the same space, the same valley or mountain, each occupies its own ecological niche. The physical objects and actual events that occur in that space are not equally important or equally attended to by all its inhabitants. But, of course, the moving images that we are primarily concerned with in this volume are not those of the natural world but those of motion pictures, television, and computer displays (including video games and virtual reality) and therefore we are not in the area of Gibson’s greatest interest. In the introduction to his 1979 book The Ecological Approach to Visual Perception, he says,

We are concerned with direct perception, not so much with the indirect perception got by using microscopes and telescopes or by photographs and pictures, and still less with the kind of apprehension got by speech and writing. (p. 10)

Yet, in the same work he devotes a chapter to “motion pictures and visual awareness” and frets that there is no term to refer to what he wants to call the “progressive picture” as opposed to an “arrested picture.” “The term motion picture,” he notes, “implies that motion has been added to a still picture” (p. 293), and he is perhaps uncomfortable with
the baggage of cinema; and film seems to exclude television. With a similar interest in being both precise and inclusive, we have arrived at the term moving image, hence the main title of this volume, Moving Image Theory. Gibson said that he was not much concerned with motion pictures, yet he laid a foundation upon which his successors might build a comprehensive theory of progressive pictures, or moving images if you will. His approach to perception seems to shed light upon many of the problems that have puzzled students of the motion picture ever since its inception. The implications of this approach for a theory of moving images are pursued in considerable detail in the essays that make up our current volume, hence the subtitle, Ecological Considerations. The writers of these essays are in a position to avoid a problem that plagued much of the theorizing of the last half of the twentieth century—the construction of elaborate and complex theories with no moorings in reality. The ecological approach is “a theory about everyday perceiving in the world, and it differs greatly from theories that begin with a motionless creature haplessly bombarded by stimuli” (E. Gibson & Pick, 2000, p. 14), or we might add, haplessly sutured into texts or haplessly victimized by dominant ideology (see Bordwell, 1996). Ecological psychology takes into account routine, everyday selection on the part of the perceiver. It is readily apparent that images are different in kind, that they differ in their meanings, and that some are much more interesting or useful or entertaining than others. What information we choose to gain and how we gain it from a plethora of moving images is the common theme of the essays that make up this book. A set of capacities for gaining information from our environment developed during the millions of years of our evolution as living organisms because information guided our actions in that environment. 
The development of capacities for extracting information was possible in evolutionary terms because those individuals with better information were better able to obtain the things they needed from the environment and better prepared to avoid dangers that the environment presented. Today, makers of moving images exploit these capacities, and we willingly engage with the images and sounds offered. But our capacities are not unlimited; they are finite in number and bounded in extent. Exactly what are our capacities for gaining information, precisely how do the makers of motion pictures exploit them, and why do we attend to images and sounds that we know are purposely crafted? The essays in this volume address these questions. Although it becomes very apparent as one reads the essays that the writers are not all of a stripe, they nevertheless in aggregate make a case for an ecological approach to film theory that is much too compelling to ignore. They address fundamental questions that puzzled film scholars throughout the last century and offer answers of such clarity and complexity that an open-minded reader may indeed be surprised and delighted. For example, basic questions surround the status of the image itself. What is a moving image? What is its relationship to the world, to the viewer, to its creator? In a fiction film, what is the relationship of the image to the fictional world and to the profilmic event, the action staged before the camera? The renowned French film theorist André Bazin argued that the photographic image bears the relationship to the thing photographed that a fingerprint bears to the finger that produced it. Just as the ink is mechanically transferred from the finger to the paper, point-by-point the rays of light reflected from an object imprint the features of the object upon the light-sensitive emulsion of the film. For both the fingerprint and the photograph, there is a mechanical link between the image and its referent. According to Bazin, it is this purely mechanical relationship that connects a photographic image to reality in a way that does not exist for other methods of depiction such as painting (Bazin, 1971). Similarly, Siegfried Kracauer argued that “Film . . . is uniquely equipped to record and reveal physical reality and, hence, gravitates toward it” (1960, p. 28). Both Bazin and Kracauer realized that the photographic image had a special relationship to reality, but neither was able to fully resolve the problem of the relationship of the image on the film to the event in the fictional world. In the years following the deaths of Bazin and Kracauer, film scholars gradually abandoned these questions. At first, they proposed that film was essentially symbolic and should be treated as language, where, of course, the connections between referent and symbol are arbitrary. And as this semiotic project failed, they posed political or social questions that avoided confronting the image as image. In fact, in film studies, the word image came to refer not to the pictures presented but to the political positioning of persons, groups, or other entities by their portrayal in a motion picture. An ecological approach to moving images returns us to the basic questions and grounds us in everyday reality. Events are the everyday stuff of reality and the patterns they create in light and air, the stuff of ordinary information. Claudia Carello, Jeffrey B. Wagman, and Michael T. Turvey make the point that events structure energy distributions such as arrays of light or patterns of vibrations in the air that specify the properties (size, shape, mass, momentum, friction, etc.) of the source that generated the energy distributions in the first place. As perceivers, we can access the arrays of light with our eyes and the patterns of vibrations in the air with our ears and thus see and hear the event itself. We thus have direct mechanical access to the actual event.
A camera may record the event by recording the array of light energy, and a microphone may record the patterns of vibrations of air. These in turn may be re-presented to our perceptual system. But the existence of information does not guarantee perception. As Robert E. Shaw and William M. Mace point out, We do not see what is simply in the light to the eye, as the physicist might construe it; rather we see what is functionally specified by the light to a highly evolved visual system—one that has been adaptively designed to fit its environment by evolution and further attuned by experience. We might say that perception is lawful but not rigid. It is in fact highly flexible. The arrays of light and sound structured by an event often contain a great deal more information than we can comprehend at any given moment. This is where selectivity comes into play, and for a variety of reasons we perceive some patterns and not others. This holds for arrays structured by natural events and for arrays structured by the hand of man. It is indeed possible to construct arrays artificially that specify events that exist only in a fictional world, and when presented to our perceptual system, we process these arrays by the same laws of perception as actual events. Carello et al. and James E. Cutting address the problem of a synthesized event, that is, information created not by a natural event in the world but by either staging the event before a camera and microphone or perhaps by generating such information on a computer or, as is often the case, doing some of both. Shaw and Mace observe most succinctly that “the source of the information may not possess the property that the information from that source specifies.” Once an actual physical event is no longer required to produce the array that carries the information, then it is possible to create information for events of any kind, even impossible events.

An ecological theory of moving images has no problem dealing with these synthetic images. It begins with the premise that we have developed a sensory system specifically adapted to gain information from the usual events that occur in the natural world that have in our past been important for our survival as creatures who walk around on the ground. Our senses are constrained in many ways. For example, we cannot directly perceive the very small or the very distant; for these purposes we have invented microscopes and telescopes to allow us to perceive these objects and events indirectly because we could not easily adapt to perceiving them directly. We can, of course, perceive synthetic events in moving images, not because we have adapted to the fast pace or discontinuity of editing but because these images have been specifically tailored by their makers to fall within the boundaries of our perceptual system for perceiving events in the world. Our perception remains lawful and constrained whether we are seeing events in the natural world or impossible events on the movie screen. The events depicted may or may not adhere to the laws governing the natural world. Shaw and Mace observe that “To the extent that such extraordinary circumstances defy rational (lawful) explanation, they serve to increase the mystery, metaphoric depth, and hence expressive power of the indirect perceptual event.” To the extent the events of a movie follow a rational course and adhere to natural constraints, they are made more accessible to a larger audience and tend to gain in credibility. 
The idea that perception of moving images may be lawfully constrained is perhaps frightening to some scholars in the arts and humanities who may not fully understand or appreciate the creativity required for the theory building that is an indispensable part of scientific pursuit; but they should be reassured that within the constraints articulated by ecological psychology, there is plenty of room for discussion about particular relationships between moving images and viewers. A proposal of lawful relationships for the perception of motion pictures invites students of the moving image to engage in substantive debate, to discuss matters of consequence, and to propose explanations that stand a chance of being true.

References

Bazin, André. (1971). What is cinema? (Vol. I). Hugh Gray, Trans. Berkeley: University of California Press.

Bordwell, David. (1996). Contemporary film studies and the vicissitudes of grand theory. In David Bordwell and Noël Carroll (Eds.), Post theory: Reconstructing film studies (pp. 3–36). Madison: University of Wisconsin Press.

Gibson, Eleanor J., and Pick, Anne D. (2000). An ecological approach to perceptual learning and development. New York: Oxford University Press.

Gibson, James J. (1979). The ecological approach to visual perception. Boston: Houghton Mifflin.

Gregory, Richard L. (Ed.). (1987). The Oxford companion to the mind. New York: Oxford University Press.

Kracauer, Siegfried. (1960). Theory of film: The redemption of physical reality. London: Oxford University Press.

Part One. Information Available in Moving Images

The concept of information is central to ecological psychology. It is a precise term that denotes arrays of energy that are patterned and quantifiable; a discussion of the information available in moving images seems an appropriate place to begin.

James E. Cutting in “Perceiving Scenes in Film and in the World” notes that movies are not really very much like the world. Movie space and time are different, and things are arranged differently. In Hollywood-style movies, all elements are arranged to support the narrative. A number of techniques have been developed to minimize our awareness of the way the film is constructed and to maximize our attention to the story. “To go unnoticed,” Cutting says, “these techniques must mesh with the human visual system.” He offers nine sources of information and then demonstrates how each has been manipulated for a desired effect. For example, in Notting Hill (1999) and Rope (1948), occlusion is used to hide the cuts. In Twelve Angry Men (1957), height in the visual field is varied to manipulate objects in space. He notes that in Vertigo (1958), Hitchcock combines a dolly shot with a zoom-out to manipulate relative size and distance. “This procedure keeps the near steps the same size, but dilates the space changing the apparent depth by changes in the relative size of farther objects but not nearer ones.” He also notes that shadows and lighting are used to the advantage of narrative in The Lady Vanishes (1938) “where a handwritten message in the condensation on the interior of a train window is invisible in daylight but appears when the train is in a tunnel.” With regard to space from the viewer’s point of view, Cutting defines three regions— “vista space (that beyond about 30 m) for a pedestrian, action space (from 30 m inward to about 1.5 m), and personal space (closer than about 1.5 m).” He argues that action space is the primary space of the movie.
Vista space may be occasionally employed in a wide shot of a landscape, and more importantly, he claims “that part of being a viewer of the action in a film is contingent on not having things enter one’s personal space.” Cutting also observes that many violations of continuity editing will be tolerated by an audience if the narrative is sufficiently powerful, as in the cutting back and forth between shots with a blue sky and shots with an overcast sky in the boating scene in The Sound of Music (1965). But he also points out that neither filmmakers nor editors nor psychologists know for sure in advance which discontinuities will be accepted by an audience and which will not. He goes on to observe that the 180-degree rule may not be as inviolable as some have thought, and that “such cinematic ‘rules’ are not, as often proposed, like a ‘grammar’ of film.” Recalling that we did not evolve to watch movies, he marvels that they work so well and offers that the reasons for film’s success stem from our biological endowment, how it constrains and does not constrain our cognitive and perceptual systems in dealing with space and time. Robert E. Shaw and William M. Mace in “The Value of Oriented Geometry for Ecological Psychology and Moving Image Art” distinguish between direct and indirect

perception and suggest a modification of standard projective geometry that would allow a more lawful approach to both. They offer that representations can be ecologically valid by conforming to the laws of nature. A set of representations in violation of the laws of nature would be ecologically invalid. “Direct perception has ecological validity,” they write, “because, in principle, it has direct access to confirmatory information while indirect perception may or may not.” We are reminded that “[a]rt is always lacking some degree of ecological validity because the expressive stylistics imposed by the individual artist are unique and defy rational conventions.” But it is possible, they argue, to have a set of conventions which when followed will allow indirect perceptions to conform to direct perceptions. One might consider this to be the case with films that use classical continuity cutting techniques. But what about those films in which artistic expression overrides clarity of narration? These are, according to Shaw and Mace, extraordinary circumstances, where the conventional constraint is suppressed or unavailable. Under such circumstances, the indirect perceptual event can “take on a life of its own.”

Let us pursue an example using the concepts just introduced. A scene may begin with a series of shots that depict an event in a perfectly lawful way where the editing channels the information within the bounds of normal direct perception, and in a sense, our perception of the event represented in the scene is direct. It is ecologically valid. But then a flashforward is presented, thus introducing an ecologically impossible element, and suddenly the representation is made invalid. Now, we may assume that most viewers are aware of the filmic convention of the flashforward and are made aware of the status by the narrative; thus, the flashforward becomes completely comprehensible, and we are able to continue perceiving the scene as ecologically valid.
On the other hand, it is also possible that the narrative suppresses the information about the convention (and that we do not somehow figure it out), and we are left with an incomprehensible event that inexplicably jumps forward in time. This situation may or may not be acceptable to us the audience, and it may or may not have been intended by the filmmaker. But the filmmaker is free to create an ecologically invalid situation if he likes, and he frequently does. A film that comes to mind is The Matrix (1999), in which the ecological status of the image is often in question. Sometimes, the viewer is aware of a shift in status to flashback, flashforward, dream, or hallucination, and sometimes not. That is, sometimes the events are made as though lawful by the use of filmic convention, and sometimes they are not. In the latter case, perhaps a sense of disorientation and unreality is created in the viewer, which presumably is a desirable effect for this film. Of course, the questionable status of most of the images is supported by the narrative, thereby placing the irrational events of the movie in a rational narrative context.

Shaw and Mace observe that projective geometry has for some time been generally thought by both psychologists and artists to be the most descriptive possible mathematical formulation of information that is available to visual perception. But they argue that ordinary projective geometry merely explains how the retinal image is formed and is inadequate to explain how we gain information in actually seeing the world. They offer oriented projective geometry as “a simpler and more accurate description of visual perception.” Such a geometry might prove especially useful to artists creating computer-generated images, perhaps allowing artists both the possibility of creating more-accurate representations and of creating more-controlled violations of natural law.

1. Perceiving Scenes in Film and in the World
James E. Cutting

The real world is spatially and temporally continuous; film is not. We evolved in a continuous world and, regardless of how much we may enjoy them, we emphatically did not evolve to watch movies. Instead, movies evolved, at least in part, to match our cognitive and perceptual dispositions. The result is a curious melange of short shots with instantaneous camera jumps between them, something not at all like the rest of the world around us. Why and how do we accept this?

Part of the answer, I claim, is that we do not necessarily perceive the world according to its physical structure. For example, although we evolved in a Euclidean world, our perceptions of space around us are generally not Euclidean and generally do not need to be (for more discussion, see Cutting & Vishton, 1995; and Cutting, 1997). In addition, although we evolved in a temporally continuous world, our perception of time is not tightly bound to any temporal meter. Thus, there is a considerable plasticity to our perceptual world; it just happens that the world is mostly rigid and evenly flowing.

Part of the success of film can be attributed to the goals of what is sometimes called Hollywood style (see Bordwell, Staiger, & Thompson, 1985).1 Without endorsing any political or social aspects of this genre, one finds that Hollywood style has a main goal that is almost purely cognitive and perceptual—to subordinate all aspects of the presentation of the story to the narrative (e.g., Messaris, 1994; Reisz & Millar, 1968). This means that, generally speaking, all manipulations of the camera, lighting, editing, and sets should be transparent, unnoticed by the filmgoer. To go unnoticed, these techniques must mesh with the human visual system. Finally, to understand why film works so well is to understand much about how we perceive the real world; and to understand how we perceive the world tells us much about how we understand film.
This is, I claim, the fundamental tenet of an ecological approach to cinematic theory. This chapter is about our perception of space (or, better, layout) in the world and in film; and then of how space and time can be cut up to make a film scene. But first let me establish some terminology. Film is made up of shots, each consisting of the continuous run of a camera. For seventy-five years, the maximum standard shot length for 35 mm film has been ten minutes—a thousand feet of film or the running time of one standard reel—although few shots are ever that long. In the production of a typical Hollywood film, shots tend to be much longer in the initial photography, and then in the editing process, each shot is trimmed to a few seconds in length for the final film. The shot is then juxtaposed, without transition, with another shot taken from another point of view. This juxtaposition is called a cut. A scene usually takes place in a single location. Typical scenes are made up of many shots and cuts; and most films, of course, are made up of many scenes. We do not usually speak of real life as being made up of

scenes, but it does no real injustice to speak this way. We walk through life. When our environs are roughly the same, such as when strolling outside across a town square, we could call this a scene; when they change, perhaps when we then enter a building, we could call this a break between scenes, going on to the next. Episodic memory, a central concern of cognitive science, is essentially the memory of scenes from our life.

In film, every shot shows an environment of some kind. This environment has a physical arrangement, or a layout, of objects and people. The projection of this layout is unique to a particular camera position, but as viewers, we pay little attention to this projection. Instead, we focus on the “world behind the screen.” We also view the real world at any given time from a particular position and also generally ignore its particular projection to our eyes, focusing instead on the general three-dimensional layout of the environment. Cinematographers and film directors, of course, pay considerable attention to camera position, crafting the composition of the image. In particular, they manipulate the information available to portray the layout of the scene as they deem best. What are they manipulating? Consider an answer in terms of contemporary and traditional research in the visual sciences.

Revealing Layout (Depth) Through Different Information

To begin, it will be useful to separate nine of the different sources of information (traditionally called depth cues) available to an observer in the real world and then apply these sources to film. Few if any of these sources by themselves imply a metric space (measured in ratios and absolute distances). Although in consort all can contribute to a near-Euclidean representation of space relatively near us under ideal conditions, there is enough leeway for a seasoned cinematographer or a film director to carve out of them more or less what he or she wants us to see.
Consider each in turn applied to the world and then to film.

1. Occlusion occurs when one object partly hides another from view. Cup one hand in the other, and the hand closer to your eyes partially occludes the farther. As an artistic means of conveying depth information, partial occlusion has been found in art since paleolithic times where it is often used alone, with no other information to convey depth. Thus, one can make a reasonable claim that occlusion was the first source of information discovered and used to depict spatial relations in depth. And, of course, it is found in the earliest photographs and films as well. However, occlusion is never more than ordinal information—one can only judge that one object is in front of another but not by how much. Thus, the kind of space that can be built up from occlusion information alone is an affine space—one that can squash, stretch, and shear. Camera position and the layout of clutter in a scene will dictate to the observer (and camera) which objects occlude or partly occlude others. If only occlusion occurs within a shot, a perceiver will not be able to know exactly where two objects are. This gives great power to the cinematographer.

Occlusion is unavoidable in film, so much so we often take it for granted. We should not. It is used very effectively, for example, in a temporal-lapse sequence in Roger Michell’s 1999 film Notting Hill. Between flirtatious episodes with movie star Anna Scott (Julia Roberts), bookseller Will Thacker (Hugh Grant) walks through the market in London’s Notting Hill, being occluded by arcades, stands, and people. The sequence appears continuous, and the camera follows Thacker with a long tracking movement, most of it with the camera’s line of sight at 90° to its motion. Seasons change through a full year during the stroll and track, juxtaposing two types of time—that measured in

seconds with that measured in months. Given that the camera follows Thacker, our attention remains on him even when he is out of sight. Among other things, this demonstrates that objects at different depths but at the same retinal location can be attended to separately, an idea that has received much laboratory focus (see Atchley, Kramer, Andersen, & Theeuwes, 1997, for a review; see also Neisser & Becklen, 1975). Despite appearances, this Notting Hill sequence is not a continuous shot. Manipulating the viewer’s attention, the editor uses occlusion to hide a cut in the shot transitions, which is necessary for the circular movement of the camera in the second part of the sequence, a fine example of following Hollywood style. A similar solution to a technical problem is used in Alfred Hitchcock’s 1948 film Rope, about which more will be said later. 2. Height in the visual field concerns object positions in the field of view, or in the frame. Objects occupying higher positions are generally farther away. This information typically measures relations among the bases of objects in a three-dimensional environment as projected to the eye or camera. Like occlusion, height in the visual field offers only ordinal information, and like occlusion it has been used in pictorial representations since near the beginning of art. Moreover, with photography and film, camera height is often manipulated for specific effect. A high, downward-tilting camera reveals greater differences among the bases of ground-plane objects measured in the picture plane, giving more-articulated information. A low and level camera, on the other hand, diminishes the availability of this information, forcing us to compare object juxtapositions without height information. A high and level camera yields the same kind of distance information as a lower one but goes farther out into space, giving a grander view. 
Relations among objects in terms of height in the frame reciprocally specify the height of the camera and the camera angle with respect to the ground plane. The height of the camera and its angle, in turn, place the perceiver in a subjective position—high often indicating dominance (as with adults looking down at children) and low a more submissive role (as with children to adults; see Messaris, 1994). The first half of Robert Wise’s 1965 film The Sound of Music is largely about the Von Trapp children. It is shot mostly from an eye height slightly less than an adult. The second half of the film, however, is largely about the romance between Maria (Julie Andrews), the governess, and Captain Von Trapp (Christopher Plummer). It is shot mostly from the eye height of an adult. Indeed, viewers are supposed to identify with the children in the first half of the film (Maria is also winning us over as governess) and with the adults in the second. The point here is that the relations among objects, particularly as revealed by height in the picture plane, tell us where our eye is—and thus help tell us whether we, the film audience, are “children” or “adults.” We don’t notice this watching the film; it is a part of Hollywood style. In his 1957 film Twelve Angry Men, Sidney Lumet used systematic differences in camera height across the course of the film, manipulating information (among other things) about height in the visual field (Lumet, 1995). Unlike Wise’s film, dominance and identification are not primary factors here; manipulation of space is. Roughly the first third of the film was shot at a standing eye height that, because most of the actors are sitting at the jurors’ deliberation table, gives ample information about the locations of objects on the table and positions of individuals around it. The second third of the film was shot generally at a sitting eye height. 
This foreshortens the table and makes less clear where things and people are, but we already know this because of the first part of the film, and the lower camera height draws us into the deliberation around the table. The final third of the film was shot just below sitting eye height, removing the plane of the

table almost completely. This deletes the space in front of the individual jurors, isolating them in their deliberations from their locations at the table and thus from each other. But again, we don’t notice this manipulation.

3 and 4. Relative size and relative density concern how big objects are and how many there are as seen by the eye. Pebbles are large and not numerous when seen held in the hand, but they are smaller and more numerous seen on a rocky beach. More technically, relative size is a measure of the angular extent of the retinal (or image) projection of two or more similar objects or textures. It has been used in some rough sense since at least early Greek, if not Egyptian and Persian, art. Unlike occlusion and height in the visual field, relative size has the potential of yielding ratio information. That is, for example, if one sees two similar objects, one of which subtends one half the visual angle of the other, the former will be twice as far away. Technically, relative density concerns the projected number of similar objects or textures per solid visual angle and is what Gibson (1950) meant by the term texture gradient. It works inversely to relative size and is considerably weaker in its perceptual potency (Cutting & Millard, 1984; Cutting & Vishton, 1995). Relative density is a relative latecomer to art; its effects were first seen in the local (not fully coherent) perspective piazzas of the fourteenth century. Its lateness to the armamentarium of depiction is due to the fact that only with the invention and use of linear perspective in Renaissance art are these first four sources of information—occlusion, height, size, and density—coupled in a rigorous fashion. The technology of depicting density differences is the hardest to carry out. Unlike relative size but like the first two sources, relative density provides only ordinal information about depth.
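The ratio claim for relative size can be checked with a little trigonometry. The sketch below is only an illustration, not part of the chapter: it assumes a simple geometry in which an object of physical size s seen frontally at distance d subtends an angle of 2·atan(s / 2d), and the function names and the 5 cm pebble are hypothetical.

```python
import math


def visual_angle_deg(object_size_m: float, distance_m: float) -> float:
    """Angle (degrees) subtended by an object of the given size at the given distance."""
    return math.degrees(2.0 * math.atan(object_size_m / (2.0 * distance_m)))


def relative_distance(angle_a_deg: float, angle_b_deg: float) -> float:
    """How many times farther object A is than object B.

    Assumes both objects have the same physical size (the 'similar objects'
    condition in the text) and inverts the visual-angle formula exactly.
    """
    half_a = math.radians(angle_a_deg) / 2.0
    half_b = math.radians(angle_b_deg) / 2.0
    return math.tan(half_b) / math.tan(half_a)


# Hypothetical example: two 5 cm pebbles, one at 10 m and one at 20 m.
near_angle = visual_angle_deg(0.05, 10.0)
far_angle = visual_angle_deg(0.05, 20.0)

angle_ratio = far_angle / near_angle                      # very nearly one half
distance_ratio = relative_distance(far_angle, near_angle)  # twice as far away

print(f"angle ratio: {angle_ratio:.4f}, distance ratio: {distance_ratio:.4f}")
```

For small angles the subtended angle is almost exactly inversely proportional to distance, which is why halving the visual angle corresponds to doubling the distance. Note that this yields only a ratio; without knowing the pebble's actual size, neither absolute distance can be recovered.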
Computer graphics allow independent manipulation of relative density and relative size, but with a camera in the real world, the two are yoked: As size of texture elements doubles, their density decreases by half. In photography, relative size and relative density are manipulated through the use of lenses (e.g., Swedlund, 1981). Perhaps the most familiar example of issues concerning relative size occurs in portrait photography. Here the photographer typically stands back from the subject and uses a long lens. For 35 mm film, the standard lens has a focal length of 50 mm; a lens with a focal length greater than about 100 mm is considered a long lens, also called a telephoto lens. With a short focal-length lens on the camera, the camera must be placed close to the person being photographed, with the result that the difference between camera-to-nose distance and camera-to-ear distance is great, and the person’s nose appears large. With a long lens on the camera, the camera can be placed farther away from the person being photographed. With the camera farther away, the difference between camera-to-nose and camera-to-ear distances becomes negligible, so the person’s features appear close to their actual sizes. This is also one reason why most shot-reverse-shot sequences in cinematic dialogs are taken with relatively long lenses. They make the actors look better. More on dialogs later. Manipulations of relative size through lenses have other important effects, dilating and compressing space with short and long lenses, respectively. One memorable scene near the end of Mike Nichols’ 1967 film The Graduate has Benjamin Braddock (Dustin Hoffman) running down a sidewalk, his car having broken down, trying to stop the wedding of the young woman he loves. He runs for more than ten seconds directly towards the camera (into a very long lens), with the appearance of getting nowhere. 
This getting-nowhere effect—enhancing the anxiety of the viewer—is conveyed by the fact that the long lens compresses depth. This compression results from decreased differences

in relative size (and relative density) in the sections of sidewalk and the surrounding trees and bushes and also keeping Braddock from growing much in size as he strains to get to the church.

Such spatial compression and dilation effects find themselves useful in many situations. Again in Twelve Angry Men, Lumet shot the first third of his film with relatively short lenses, dilating depth and conveying a wide-angle spaciousness of the deliberation room. He then shifted to more-standard lenses in the next third; and long lenses in the final third, narrowing the field of view and compressing the space around the jurors as the debate progressed, creating more tension. Combined with the progressively lower camera angles, by the end, the ceiling is revealed to be pressing in on the jurors as well. But again, all of this is unnoticed; we follow the narrative, and the lens effects support the narrative.

Perhaps the most striking spatial transformation is attributable to Alfred Hitchcock (Truffaut, 1983) in a sequence that gives eponymic visual force to his 1958 film Vertigo. Hitchcock wished to simulate Scottie Ferguson’s (James Stewart’s) fear of heights during his views down a bell tower’s stairs. The effect is done in a subjective shot (one following an objective shot of Ferguson looking down the stairwell, called point-of-view editing) by combining a dolly in with a zoom out.2 This procedure keeps the near steps the same size but dilates the space, changing the apparent depth by changes in the relative size of farther objects but not nearer ones. The scene has a stomach-churning plastic and deforming character. The bottom of the stairwell rushes away from the viewer, getting deeper and more dangerous. It should be noted that the effectiveness of this dolly/zoom depends on the viewer having some near-metric information about depth. If the visual system’s ability to deal with depth were completely plastic (affine), such effects would not be noticed at all!
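The geometry of the dolly/zoom can be sketched numerically. What follows is a minimal illustration under an idealized pinhole-camera assumption, in which projected size is focal length times object size divided by distance; the function names, focal lengths, and distances are hypothetical and are not taken from the film's production.

```python
def projected_size(focal_mm: float, object_size_m: float, distance_m: float) -> float:
    """Image size (mm) under an idealized pinhole model: f * S / d."""
    return focal_mm * object_size_m / distance_m


def dolly_zoom(focal_mm: float, near_m: float, far_m: float, dolly_in_m: float,
               object_size_m: float = 1.0) -> dict:
    """Dolly the camera in while zooming out so the near object keeps its image size.

    Returns the adjusted focal length and the projected sizes of a near and a
    far object (both of the same physical size) before and after the move.
    """
    new_near = near_m - dolly_in_m
    new_far = far_m - dolly_in_m
    if new_near <= 0:
        raise ValueError("camera would pass through the near object")
    # Shorten the focal length in proportion to the new near distance,
    # so that f / d (and hence the near object's image size) stays constant.
    new_focal = focal_mm * new_near / near_m
    return {
        "new_focal_mm": new_focal,
        "near_before": projected_size(focal_mm, object_size_m, near_m),
        "near_after": projected_size(new_focal, object_size_m, new_near),
        "far_before": projected_size(focal_mm, object_size_m, far_m),
        "far_after": projected_size(new_focal, object_size_m, new_far),
    }


# Hypothetical numbers: near steps 4 m away, stairwell bottom 20 m away,
# a 50 mm lens, and a 2 m dolly toward the stairs.
result = dolly_zoom(focal_mm=50.0, near_m=4.0, far_m=20.0, dolly_in_m=2.0)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the near steps keep the same projected size while the far object's image shrinks, which is the change in relative size that makes the bottom of the stairwell appear to recede.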
Thus, there is a sense in which we notice the effect. This would appear to conflict with the idea of Hollywood style, but it does not. The effect is the key element of the narrative. Ferguson has vertigo, and we, personally, can see that it is an awful and debilitating thing.

5. Aerial perspective refers to the effects of fog, mist, and haze. These create an indistinctness of objects, with distance determined by moisture or pollutants in the atmosphere. Its perceptual effect is a decrease in contrast of the object against the background with increasing distance, converging to the color of the atmosphere. Aerial perspective was systematically discussed and understood by Leonardo da Vinci (Richter, 1883) and has been used in photography and cinematography since their beginnings. Like many other sources, it is ordinal information. Objects are dimmer and less distinct when farther away, but as viewers we don’t know by how much because we really cannot accurately assess the density of the atmosphere.

In photography and film, aerial perspective (particularly as fog) can also be manipulated with lenses. Long lenses bring more of the atmosphere into play among objects in focus and in the field of view. The final scenes of Michael Curtiz’s 1942 film Casablanca use lenses and fog quite effectively. Wondering who, if anybody, will escape the Nazis from this North African city, the audience sees—because of the fog—the airplane at a barely attainable distance behind Rick Blaine (Humphrey Bogart), Ilsa Lund (Ingrid Bergman), and Victor Laszlo (Paul Henreid). This heightens the viewer’s anxiety about possible departure. Perhaps the clearest example of this effect is not with fog but with rain. The bulk of televised baseball games is typically shot with long lenses from behind the catcher or

from center field. These show the pitcher, batter, and catcher occupying what seems to be the same space, nearly on top of one another. This, of course, is the effect of relative size compressing depth discussed earlier. But on nights with a smattering of rain, not enough to stop the game, the images shot with long lenses make the scene look like a veritable downpour. This is not Hollywood style, because one wonders why the umpires do not stop the game. Yet, the downpour is a false impression—more raindrops in depth are compressed into the field of view than are experienced by ballplayers.

6. Accommodation occurs with the change in the shape of the lens of the eye, allowing it to focus on objects near or far while keeping the retinal image sharp. Objects at other distances are blurred. The camera analog to accommodation occurs with dynamic manipulation of focal depth, which can place one object in focus and another out. This information tells the viewer only that the objects are at different depths. By itself, however, it does not even tell depth order. Interestingly, blur first appeared in art about the same time as Impressionism and with late-nineteenth-century photography (Scharf, 1968). Manipulation of clear and blurred regions of an image is also a powerful tool for the cinematographer. It is used to control points of interest in a scene where he or she wants the viewer to look. This is done effectively, for example, in The Graduate when looking over Benjamin Braddock’s shoulder and first focussing on Elaine Robinson (Katharine Ross) in her bedroom, then on Mrs. Robinson (Anne Bancroft) in the more-distant hallway. Only one is in focus at a time. They are thus revealed at different distances, and the narrative’s sequence of outrage passes from one to the other.

7 and 8. Convergence and binocular disparity are two-eyed phenomena. Convergence is measured as the angle between foveal axes of the two eyes.
When the angle is large, the two eyes are canted inward to focus near the nose; when approaching 0°, the two eyes are aligned to focus beyond 10 m (which is, interestingly enough, functionally the same as the horizon). Convergence can be registered and used at close range but not beyond about 2 m. Given that photographic and cinematic images are flat, convergence is uninformative, and given that all of film and much of television is watched from distances greater than 2 m, this source of information is irrelevant.

Binocular disparities are the differences in relative position of sets of objects as projected on the retinas of the two eyes. When disparities are sufficiently small, they yield stereopsis, or the impression of solid space. When disparities are greater than stereopsis will allow, they yield diplopia—or double vision—which is also informative about relative depth. Stereo is also extremely malleable, and just one day of monocular vision can render one temporarily stereoblind (Wallach & Karsh, 1963).

Convergence has never had artistic use, and it is remarkable that stereo has never played an important role in photography or film except as a type of parlor teaser. Despite all predictions at the time and before (see Eisenstein, 1948/1970), few pictures following the 1953 film House of Wax by André de Toth have been made in 3-D. Some theorists suggest that the reason has to do with the relative gimmickry of stereo and the necessity of wearing glasses (Kubovy, 1986). Without denying this factor, I think stereo films fail as an important medium because stereo in the real world enhances noticeable depth differences only nearest to the viewer and, as I will discuss later, this is not a region of space that is important to most filmmakers. Interestingly, Hitchcock’s 1954 film Dial M for Murder was shot in 3-D but is rarely seen in this format; Hitchcock had particular interest in low camera angles in this film, and 3-D worked well to reveal depth differences in near space (Truffaut, 1983, p. 210). However, perhaps unsatisfied with its effects, Hitchcock never used 3-D again.

Convergence and disparities are linked in human vision, but having two eyes can often get in the way of seeing depth in pictures. At the turn of the twentieth century, Carl Zeiss, inventor of cameras and of the planetarium, hoped to gain (another) fortune by selling a device that neutralized both sources for visitors to art museums. Called a synopter, this apparatus contains a series of fully silvered and half-silvered mirrors at 45° angles that superimpose the lines of sight of the two eyes, nullifying disparities and convergence. Reports suggest these devices greatly enhance the visual depth seen in photographs and in paintings (Koenderink, van Doorn, & Kappers, 1994), often even more than stereoscopic displays.

The reason for enhanced depth compared to unencumbered two-eyed viewing seems relatively straightforward. This device cancels certain information about flatness (uniformly graded disparities and vergence on a nearby object) in the scene viewed. It also can remove from view the frame and other context surrounding the picture. The reason the synopter produces an effect of depth better than stereopsis is a bit more complicated. First, with typical stereo material, there is a coulisse effect (objects can appear relatively flat with startling spatial gaps in depth between them). This effect is due to the fact that the two cameras used to take the stereo images are usually considerably more than 6 cm apart, the distance between our eyes. Stereo cameras wider apart than our eyes will tend to minify a scene (they effectively “enlarge” our head), making objects appear proportionately smaller and flatter than they are. Indeed, early parlor stereograms were of European cities, taken with cameras as much as a half-meter apart (eight times normal) or more.
This renders impressions of the cities very much less grand (one-eighth the size), even toylike. Second, zero disparity, which is what the synopter achieves, does not actually take away depth information. Instead, it specifies infinite depth, or at least a depth beyond about 30 m. This would probably be pooled with other sources and enhance the overall depth effect. Most simply, however, moviegoers can achieve a nearly synoptic effect by sitting more than 10 m from the screen.

9. Motion perspective refers to the field of relative motions of objects rigidly attached to a ground plane around a moving observer or camera. It specifically does not refer to the motion of a given object, which was the major early accomplishment in the ontogeny of film.3 Motion perspective occurs best during a dolly (or tracking shot), where near objects and textures move faster than far ones, and their velocity is inversely proportional to their distance from the camera. Thus, objects twice as far move exactly half as fast so long as the camera does not pan. The first uses of motion perspective in film were seen at the end of the nineteenth century (Toulet, 1988), where cameras were mounted on trolleys and trains and their effects presented to appreciative audiences.

Motion perspective is particularly good at generating the impression of self-movement, but it needs to be distinguished from another camera manipulation. In early and later cinema outside the studio, dollies entailed putting a camera in a moving vehicle or, more expensively, the laying down of a track on which the camera rolled. For example, the filming of the background in the chase sequence through the Ewok (redwood) forest near the end of George Lucas’s 1983 film Return of the Jedi used a track and a dollying camera. He also used frame-by-frame photography to enhance the speed and then hand blurring of the periphery of each image to avoid motion-aliasing artifacts. Today, Steadicams (cameras using inertia to avoid the bounciness of hand-held techniques) make the motion-perspective effect easier to attain outside the studio.

The information about motion perspective attained from a dolly should be distinguished from the patterns seen in a zoom. Zooming in, as suggested above, is the continuous adjustment of a variable lens from a relatively short to a relatively long length (the range of 38 mm to 115 mm is common in a 35 mm camera). The optical differences between the two are interesting but in short sequences are generally unnoticed by a film viewer (Hochberg, 1978). Zooming in simply enlarges the focal object, allowing all texture to rush by with equal speed as a function of its image distance from the center of the focal object. No occlusions and disocclusions occur in a zoom. Motion perspective, on the other hand, creates occlusions and disocclusions of far objects by near ones, and the objects and textures rush by at a speed that is inversely proportional to their physical distance from the camera and that depends on their angle from the path of the camera.

Although viewers may not be, filmmakers are quite sensitive to the differences between a dolly and zoom. Dollies are used to indicate observer motion; zooms are typically used for increased attention. Interestingly, the phenomenon of attention as studied within experimental psychology generally supports this idea. Attention is a phenomenon of increased interest on an object, typically in the center of the field of view, coupled with an increased rejection of information in the periphery. Indeed, theories of attention occasionally talk, metaphorically, of zooming in on objects during periods of interest (see Palmer, 1999, for a review of the phenomena of attention).

Shadows and lighting are often added to lists of information contributing to the perception of depth.
However, I believe shadows are used almost exclusively for articulating the shapes of objects, not for conveying object relations in depth around the perceiver. The reason is straightforward—changes in shadows rarely change depth or one’s perception of the shape of an object, whereas changes in relative size, height in the visual field, binocular disparities, and the like almost always do. This is not to underplay the importance of shadows in real life or in cinema. Artistically, it is crucial to play with the identity of objects and individuals in film, and this is often done best through variations in lighting. Consider two particularly striking examples of lighting effects. One occurs in Hitchcock’s 1938 film The Lady Vanishes where a handwritten message in the condensation on the interior of a train window is invisible in daylight but appears when the train is in a tunnel—a key bit of evidence that turns possible hallucination into intrigue. A second occurs throughout Godfrey Reggio’s 1982 film Koyaanisqatsi, a film without dialog or standard plot. Time-lapse photography is used throughout with the camera often remaining in position throughout a full day, recording the passing of events under the change of light.

Phenomenal Spaces in the Real World and in Cinema

On the basis of the differential relative potency of the various sources of information listed above, I have found it convenient to divide egocentric space into three regions—vista space (that beyond about 30 m for a pedestrian), action space (from 30 m inward to about 1.5 m), and personal space (closer than about 1.5 m) (see Cutting & Vishton, 1995; Cutting, 1997). In vista space, the only effective sources of information are the traditional pictorial cues—occlusion, height in the visual field, relative size, relative density, and aerial perspective—all of which are yoked within the technique of linear perspective mastered by Renaissance artists and yoked in camera use as well. Motion perspective for the pedestrian is not particularly effective beyond 30 m, particularly when looking in or near the direction of motion. Similarly, stereo is also not very effective. Vista space can be strikingly portrayed in large trompe l’oeil paintings and in cinema, particularly in wide-screen format. But the typical narrative content of vista space in film is nil. Vista is only backdrop, and older Hollywood movies succeeded well by simply painting vistas on walls and on movable sets.

Action space is circular, on the ground plane around us, and generally closer than about 30 m but beyond arm’s reach. We move quickly within this space, talk within it, and toss things to a friend. More simply, we act within this space. In everyday life, all but the most intimate conversations occur within this space. In the real world, this space appears to be served by a different collection of information sources: three of the five linear-perspective sources (relative density and aerial perspective are usually too weak compared to the others) plus binocular disparity and motion perspective. For film we can omit disparities. Most emphatically, action space is the space of films. Film content almost always takes place between 2 m and 30 m of the camera. As viewers, we like it this way.

The near boundary of action space for the pedestrian is delimited by the emergence of height in the visual field as a strong information source, which also serves to limit this space to the ground plane. Viewing objects from above or below about 1.5 m to 2 m tends to make perception of their layout less certain by weakening the effect of familiar size, a phenomenon by which we can scale the size of surrounding objects by what we know to be the size of a particular object. Many wide-angled paintings and engravings from the eighteenth century (e.g., Canaletto and Piranesi) use an eye height of about 2.5 times normal, which is about the extreme of its utility without loss of object identity.
Bertamini, Yang, and Proffitt (1998) and Dixon, Wraga, Proffitt, and Williams (2000) have shown that one begins to lose the impression of object size when eye heights exceed this value. Interestingly, this is roughly the height typically attainable by raising a camera on a crane, a common device at the end of a film indicating that the film is over. Also, the opening shot and several others in Orson Welles’s 1958 film Touch of Evil use a crane effectively to dodge up and down within this range. Finally, partly because very high camera positions can defeat our sense of true object size, Hitchcock and others were able to use small models to film what would appear to be outdoor scenes. Personal space immediately surrounds the observer’s head, generally within arm’s reach and slightly beyond. Within this region, I claim five sources of information are generally effective (Cutting & Vishton, 1995)—occlusion and relative size from the linear perspective set, plus the reflexive, biologically engrained set of accommodation, binocular disparities, and convergence. Given that the latter two are not attained in standard film, their absence could create a problem. Fortunately, the personal space of the viewer is not often relevant to film. Indeed, I claim that part of being a viewer of the action in a film is contingent on not having things enter one’s personal space.4 This impinges on one’s person, and typically one does not want to be made aware of oneself when watching a movie. If you “lost it at the movies,” to use Pauline Kael’s (1965) felicitous phrase, you did so because you were not made aware of yourself. This is critical to Hollywood style. Thus, whereas in the real world there appear to be three differentiable spaces (vista, action, and personal), in film there appears to be but one (action space). This makes the cinematographer’s job possible. 
He or she doesn’t have to worry too much about the background (indeed, many times sets can be substituted for outdoor scenes) and doesn’t have to worry about the extreme foreground (because it would impinge on the space of the viewer).

How Cuts, Shots, and Narrative Knit Together a Film

Having broached spatial information and its use in cinema, let us turn next to temporal structure and how it interacts with space. It is useful to begin historically. Quite understandably, many early films were shot as theater productions, with an unmoving camera in mid-audience. It was soon discovered, however, that the camera could move and execute close-ups, and that the viewer could still make good sense of the action from different points of view. In addition, with increased demand and the advance of technology, films became longer, and cuts were needed; one simply couldn’t hold enough unexposed film in a magazine to shoot the whole movie (difficulties and expense of multiple takes aside). Early on, different shots were separated by a fading out of the first and then a fading in of the second. Darkness knit the two shots together. Later, dissolves entered the editor’s toolkit, where the fading out of one scene is overlapped with the fading in of another (see Spottiswoode, 1951). Nonetheless, quite early, D. W. Griffith and others discovered that straight cuts were acceptable and not jarring (see Carey, 1982).

Cuts separating shots in the same scene are by far the most common. In contemporary television, within-scene, alternating shots, and reverse shots account for more than 95% of all cuts (Messaris, 1994). Transitions separating shots from different scenes and times, however, often continued to use fades. For example, The Sound of Music in 1965 has straight cuts within scenes but fades when both time and place are changed. More recently, fades have passed out of favor, seeming quaint and unnecessary. A striking straight cut across scenes occurs early in Steven Spielberg’s 1997 film The Lost World. An English family vacations on a remote, tropical island off Mexico.
The daughter strays and plays just off the beach beneath some palms. Small creatures surround her and attack. She screams. The scene then cuts to a yawning Ian Malcolm (Jeff Goldblum) with a palm tree behind him, but it turns out that Malcolm is next to a screeching train in New York City, and the palm tree is on a poster advertisement. Hollywood style is followed because the juxtaposition tells us that Malcolm will be connected to understanding the cause of the girl’s death.

Why is a straight cut perceptually acceptable? This question divides several ways. First, why is it acceptable for one image to displace another taken from the same position in space but with the camera rotated to a new orientation? Second, why is it not acceptable for one image to displace another taken from the same position and orientation? Third, why is it acceptable for one image to displace another taken from a different position with a new camera orientation?

Cuts, saccades, suppression, and the lack of beta motion. With respect to the first question, many conjectures have been made. In a 1965 interview, director John Huston made an intelligent start, establishing himself as perhaps the first ecological film theorist:

All the things we have laboriously learned to do with film, were already part of the physiological and psychological experience of man before film was invented. . . . Move your eyes, quickly, from an object on one side of the room to an object on the other side. In a film you would use a cut. . . . In moving your head from one side . . . to the other, you briefly closed your eyes. (Messaris, 1994, p. 82)

Thus, for Huston, a cut is a surrogate for the real-world combination of saccade and blink.

Our visual world is usually continuous, but saccades and blinks do alter and cut the stream. We can often make ourselves aware of temporal discontinuities that occur during eye blinks, which are usually about a fifth of a second long (Pew & Rosenbaum, 1986). Make a blink longer than 200 milliseconds (ms), and the “dimming” that often occurs becomes quite noticeable. Such dimming has a cause beyond mere lid closure. Some of the effect lies physiologically in the commands to the eyelid muscles. Such dimming occurs in the dark even when an optical fiber delivers light to the retina through the roof of the mouth (Volkmann, Riggs, & Moore, 1980). Despite this, I know of no one who has collected normative data on the co-occurrence of saccades and blinks. This aside, it is quite clear that most of our saccades occur without blinks, so Huston’s conjecture must reduce to one of comparing cuts with saccades.

Cuts are instantaneous, one frame to the next.5 Saccade durations vary, mostly by the extent to which the eye moves, but 40 ms is about average, with a range of 20 to 90 ms (Hallett, 1986). The velocity of eye rotation during a saccade is quite fast with a range of 50° to 500°/second. Given that film screens are seldom seen wider than about 35° and that a full circle is 360°, this is fast indeed. During such movement, one would expect to see blur. We do not. In fact, we see essentially nothing, a fact called saccadic suppression. Causes for this are complex but seem to be a mixture of blocking by two sources—feedback from eye-muscle movements and a particular type of masking, or blotting out of the message. Technically the latter is called metacontrast masking (Matin, 1986). In effect, we are relatively blind to visual information occurring from a few milliseconds before a saccade, almost completely so during the 50 ms or so of a saccade, and tapering off for about another 50 to 100 ms after the saccade is complete (Volkmann, Schick, & Riggs, 1968).
The time course of interruption masking by metacontrast is about the same (without the presence of a saccade duration). Because interruption masking is likely to occur after a film cut, one can assume that we are partially blind to the visual information in the first 100 ms after a cut, about the duration of two frames. This means the editor must be a bit careful; one cannot cut quickly again. Quick cuts within this range are disruptive. They were tried, for example, in Dennis Hopper’s 1969 film Easy Rider. Toward the end of the film, single-frame and longer shots were incrementally cut back and forth between scenes of motorcycle riding and camping. These were jarring, interfered with the narrative, and hence broke with Hollywood style.

However, masking and suppression explain only part of why cuts work. They explain a temporary blindness between shots at the cut line and perhaps the lack of disorientation immediately after the cut, but they do not explain the acceptability of the cut. Why are we able to make sense out of two shots with no transition? Acceptability seems predicated, in part, on the physical differences between shots. Cuts become acceptable only when the general patterns of light in the two shots are sufficiently different (Hochberg & Brooks, 1996). This occurs naturally in fixations before and after a saccade. After rotating our eyes, the backgrounds of what we see are different, the objects focused upon are typically different, the lighting is often different, and few edges and lines as projected on the retina line up across fixations. Thus, we accept a disrupted flow quite naturally; it is a part of our everyday visual world, and this is the heart of Huston’s conjecture.

But there is a caveat. Unlike the perceiver in the real world, the editor composing the film must be careful with the content of successive images.
If edges line up or, worse, almost line up, a certain kind of irrelevant motion can occur, which I will call beta motion.6 This motion occurs in the laboratory and occasionally in neon street signs, where objects can change shape and position. This motion is not cinematic motion and would likely detract from the narrative. A particularly interesting candidate case occurs early in Stanley Kubrick’s 1968 film 2001: A Space Odyssey. Protohumans battle, one side wins, and a leader of the winning group tosses a bone into the air, which the camera follows and which rotates in slow motion. Cut to a spaceship docking at a space station. What is interesting about this cut is that across frames the bone and the spaceship do not line up. In fact, they are at right angles. Surely, the editor must have been tempted to align the orientations of bone and spaceship. Despite the fact that the backgrounds of the scenes are very different (light blue against the bone and black against the spaceship), we can only assume that the editor found it inappropriate to align them; it must have created jarring beta motion.

The avoidance of beta motion is also part of the answer to the second question: Why is it not acceptable for one image to displace another taken from the same position and orientation? Juxtaposed shots taken from the same point of view create what is called a jump cut. Although used occasionally in French New Wave cinema of the mid-twentieth century, the perceptual effects of a jump cut are often very jarring. The reasons for this would seem to be that the commonality of the backgrounds of the two shots across the cut anchors the sameness of what is seen (Hochberg & Brooks, 1996). Within this sameness, the changes in the focal object can often only be made sense of, perceptually, as plastic deformation and size change. Because people and cars and other objects that are the focus of the cinematic narrative cannot spontaneously deform or change size, anything indicating that they do seems weird and detracts from the narrative.

Shot-reverse-shot sequences, the cinematic viewer, and discontinuity. The third question concerns cuts and the movement of the camera to a different position and orientation within the same scene.
This is called a shot-reverse-shot pattern and occurs most often in filmed conversations. Such filming is a technical tour de force with great psychological interest. There are at least two interrelated problems here. First, after an establishing shot that shows two (or more) people in the scene, the camera typically frames each speaker sequentially, alternating position and focus between the two conversants. One person looks left off the screen as if to the other. The other individual looks offscreen right. Cinematic practice has shown that it is best if sight lines (gaze directions of the conversants) line up. The result is as if we (the camera) were a silent third party to the conversation, looking back and forth. Because of the necessity of using long lenses, we (as the camera) cannot simply occupy a position near the conversants—they would have big noses. Thus, and revealing the second problem, we (as the camera) are often looking over the shoulder of one of the conversants, whether or not his or her shoulder is actually in the picture. In this manner, we do not occupy a single third position. How is it that we can tolerate so much subjective jumping around?

My view is that this jumping is a problem only if one assumes that we perceive the world metrically. Conversations are focused on the people, not the backgrounds, and we actually care quite little about the overall coherence of the background of the scene. We are perfectly happy, as viewers, if the camera positions are only roughly consistent with a third position generally between and to one side of the conversants. More simply, we are much more interested in the story than the details of the framework around those enacting the story.

This raises an important concern of filmmakers—continuity (see Anderson, 1996; Bordwell et al., 1985). Among other things, continuity means keeping track of what is in each shot and making sure that the world that is projected on the screen appears coherent.
However, it appears that we, as perceivers, are not that particular about such coherence. Levin and Simons (1997; see also Simons & Levin, 1997), in one of the few laboratory experiments on the perception of film, showed that objects can appear and disappear across shots within the shot-reverse-shot sequence of a conversation, and viewers don’t notice. In fact, one can sometimes switch actors across cuts as well (and actually do this in real life) without observers noticing. Indeed, Luis Buñuel did this with his leading actresses in his 1977 film That Obscure Object of Desire, and viewers often did not notice (see Buñuel, 1983).

However striking these results may seem, this kind of continuity “experiment” is forced by circumstances in some way upon the editors of nearly every film. In The Sound of Music, for example, consider a pivotal dramatic scene. Maria and the Von Trapp children are having fun rowing on a lake behind the house. Captain Von Trapp suddenly arrives home with his betrothed, the Baroness Von Schrader, and the Captain excoriates Maria for a breach of strictness in the children’s upbringing. Maria defends herself, to the point of being fired. The scene demands that it be shot outdoors and that Maria and the children get wet, falling in the water. The clothes, being sewn from colorful drapes, are not easily replaced. Thus, different takes of the same scene could not be shot on the same day. The clothes must dry. The film version of the scene was clearly edited from shots taken on at least two days with quite different weather, one with clear blue sky and one with heavy humidity. The shots cut back and forth between clear and humid days seven times in the course of the argument between Maria and the Captain. Nonetheless, no student to whom I have shown this clip has ever noticed this fact, even after hearing a short lecture about continuity in film.
Clearly, the narrative is sufficiently powerful that it doesn’t matter that the sky behind Captain Von Trapp (taking up as much as half of the surface area of the screen) changes so many times.

The Filmmaker’s Contract with the Viewer

But there are constraints; not everything goes. This idea divides two ways. First, filmmaking demands that continuity be as great as possible during the actual filming process. This is because of the filmmaker’s and editor’s inability to know, in advance, which discontinuities would and would not be noticed—and psychological theorists don’t know either. Unprepared for and noticeable discontinuity could jeopardize Hollywood style and the success of the film.

Second, whereas certain structural aspects of continuity may be violable, thematic aspects cannot. This raises the issue of montage and the oft-described Kuleshov effect (Levaco, 1974; Pudovkin, 1958). The Russian filmmaker V. I. Pudovkin, a student of Kuleshov, made several short movies (each of three shots) using the actor Ivan Mosjukhin. In the first movie, the first shot showed a close-up of the relatively expressionless face of the actor, the second a coffin in which lay a dead woman, and the third another close-up of the actor. In a second film, the first and third shots were the same, but the second was replaced with a bowl of soup. Reports suggest that viewers read the expression on Mosjukhin’s face in the third shot differently in the two short sequences. Such, it is said, is the power of montage. Indeed, Hitchcock embraced this idea (Truffaut, 1983, p. 216) and claims to have used it in his 1954 film Rear Window where, as a temporary invalid, L. B. “Jeff” Jeffries (James Stewart) views the murder of a neighbor across a back courtyard.

Yet there is much less here than actually meets the eye. What this description of montage leaves out is context. The montage will work but only in the context of the longer narrative.
Without that context, every experiment I know that tries to replicate the Kuleshov effect has failed. Hochberg and Brooks explain why:

Despite Eisenstein’s assertion (1949) that two pieces of film of any kind, when placed in juxtaposition, inevitably combine into a new concept of quality, there is no reason to believe that without specific effort at construal by the viewer anything other than a meaningless flight of visual fragments . . . will be perceived. (1996, p. 265)

In other words, the filmmaker must first win over the viewer with the narrative. After the viewer accepts the narrative, the filmmaker has an implicit contract with the viewer to promote the narrative in an appropriate way. All storytellers in all media have such contracts with their audiences (see also Proffitt, 1976; Willats, 1995). At this point, montage is good film practice but only so long as the narrative continues in a satisfactory way. If it does not, the filmmaker has broken the contract, and the perceiver is on his or her own.

One final point about the acceptability of successive shots. Great importance has been given in the psychological and film literature to what is often called the 180° rule (see Carroll, 1980; Hochberg & Brooks, 1996; Kraft, 1987). This rule states that successive shots should not cross the line of sight between two conversants, or cross the line of action, but roughly any camera position within the remaining 180° is fine. Nonetheless, this rule seems violated quite often with little effect. In John Ford’s 1939 film Stagecoach, the opening scene cuts across the line of action (the stagecoach enters from the left, facing right, and we then see it facing in the opposite direction). Later, when more passengers are added, this “error” occurs again in the opposite direction, yet little seems lost. Few students, when shown the film, notice it. More potently, the final scene in Casablanca cuts in violation of this rule during a three-way conversation among Rick, Victor, and Ilsa. Most of the conversation consists of shots cut between Rick (on the right) and Victor (on the left).
Ilsa is between them but closer to Rick. At a critical moment, however, suddenly Rick is on the left and Victor and Ilsa on the right. This is important for the story line, because at this moment, Victor puts the papers of transit in his coat pocket, a gesture that could not be seen from the previous perspective. Moreover, the new placement of the three seals the fact that Ilsa is going with Victor and not staying with Rick. Quickly, the camera positions shift back to Rick on the right and Victor on the left before departure. My experience in showing this sequence in a class is that no one is confused—indeed, no one even notices. This is probably because the positions of all three characters were well established in previous shots.

I agree with Murch (1995, p. 18), who suggests that the 180° rule is less important than often suggested and is subordinate to many other purposes of film. I contend that such cinematic “rules” are not, as often proposed, like a “grammar” of film. In linguistics, violations of grammatical rules render sentences incomprehensible or ambiguous; in film, violation of these “rules” typically does not yield unknowable or uncertain results; one understands the film but is also made aware that something is amiss. Instead of being like grammar, I contend these “rules” are like conversational axioms (Grice, 1957), the basis of a contract about how people behave towards one another in how they conduct a conversation. In film, these are parts of the contract between filmmaker and viewers. As I suggested earlier, the Hollywood-style contract dictates that filmmakers will not let viewers become aware of themselves. Crossing between conversants in the real world would be bad manners, and one would become aware of oneself. Crossing between conversants in film, as one apparently would in any violation of the 180° rule, would be equally rude.7
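The geometry behind the 180° rule can be made concrete. The following is a minimal sketch, not a description of any production tool: it assumes a flat floor plan with hypothetical (x, y) coordinates, uses illustrative function names, and simply checks whether two successive camera positions fall on opposite sides of the line joining two conversants. Treating a camera placed exactly on that line as not crossing is an arbitrary choice made here for simplicity.

```python
def side_of_line(a, b, p):
    """Return +1 if p lies to the left of the directed line a -> b,
    -1 if it lies to the right, and 0 if it lies on the line.

    Points are (x, y) tuples on a hypothetical floor plan.
    """
    # Sign of the 2D cross product (b - a) x (p - a).
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return (cross > 0) - (cross < 0)


def crosses_line_of_action(person_a, person_b, cam_before, cam_after):
    """True if the cut from cam_before to cam_after moves the camera
    across the line joining the two conversants."""
    before = side_of_line(person_a, person_b, cam_before)
    after = side_of_line(person_a, person_b, cam_after)
    # A camera sitting exactly on the line is treated as not crossing
    # (an assumption of this sketch).
    return before != 0 and after != 0 and before != after


if __name__ == "__main__":
    speaker_left, speaker_right = (0.0, 0.0), (2.0, 0.0)
    # Both cameras on the same side of the line: the rule is respected.
    print(crosses_line_of_action(speaker_left, speaker_right,
                                 (0.5, -3.0), (1.5, -2.0)))
    # Cameras on opposite sides of the line: the rule is violated.
    print(crosses_line_of_action(speaker_left, speaker_right,
                                 (0.5, -3.0), (1.5, 2.0)))
```

A check of this kind would flag cuts such as those described above in Stagecoach and Casablanca. It says nothing about whether viewers notice or are disturbed by them, which is the perceptual question at issue.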


Final Notes on Cuts and Time

Two final comments about cuts. First, can a film have no cuts? I know of only two in standard-release cinema, Hitchcock’s 1948 film Rope and Louis Malle’s 1981 film My Dinner with André. The latter actually has a beginning and ending shot outside a restaurant, but the 105 minutes in between is one 16 mm film shot (made with an extremely large film cartridge), occasionally with a gradual zoom in and out, of a dinner conversation. It is a remarkable film, but as a viewer, one is teased and made aware of many things throughout. Rope is different; eighty minutes long, it is composed in 35 mm film as if it were one shot. As suggested earlier, in the context of describing a scene in Notting Hill, Rope is actually shot in ten-minute sections (see Truffaut, 1983). Breaks in the sections but not in the shot are hidden, for example, with slow pans across a person’s back. Nonetheless, the action is continuous (walls and furniture having to be moved for the camera), making the film take place in real time. The camera roves throughout an apartment as a college professor (James Stewart) gradually discovers that two of his former students (Farley Granger and John Dall) have followed the principles espoused in his course to an unexpected extreme, killing another former student. I showed this film to my daughters, and they never noticed that it was filmed in only one shot. Thus, short shots and cuts are not necessary to film; but it is equally unnecessary for films to be a continuous shot.8 Our perceptual and cognitive systems accept either with equal alacrity as long as the narrative carries one’s interest.

Finally, although films have had cuts and shots for a long time, it is clear that in recent cinema their pace is accelerating. Why? Many would blame music videos. The pace of shots and cuts in these three-minute clips can often be breathtaking, although the music overlay is continuous. 
Gleick (1999) would suppose that this pacing effect is a cultural one, due to the acceleration of demands on our time and a decrease in threshold for boredom. Indeed, to a degree, this is almost certainly true.9 However, shot length beyond a second or two is not biologically constrained. Bordwell et al. (1985) found shot lengths in Hollywood cinema between 1930 and 1960 to be between about six and twelve seconds; today it is probably only a bit less. Only the lower limit of shot duration is limited by our perceptual and cognitive systems being able to make sense of shot composition and continuity, and this limit for any relatively sustained visual art form probably has a mean of about one to two seconds. This is still below the pace of most music videos. Thus, I would claim that music videos exploit a heretofore unused perceptual and cognitive niche in cinema construction. I would claim further that our perceptual apparatus and cognitive apparatus have always been able to accept such pacing; it is just that only recently has this ability been tested. Film in general may evolve to have an average shot of slightly shorter duration than at present, but this would be a statistical artifact of mixing relatively long-duration shots (which in filmed conversations and elsewhere are not likely to diminish in length) with more music video-like shots.

Notes

I thank Claudia Lazzaro for listening to near-endless ruminations about film during preparation of this essay; Joseph D. Anderson, Barbara Fisher Anderson, and Dan Levin for comments on a draft of this essay and for sharing their expertise about film; Dennis Proffitt for long-ago discussions of contracts; Michael Kubovy for discussions of Hitchcock; and my children, who for many years forced me to watch many different films with them many times, allowing me to become aware of and interested in Hollywood style.

1. Two important points need to be made. 
First, not all movies made in Hollywood are uniformly in Hollywood style, nor are non-American movies necessarily not made in Hollywood style. The use of the term Hollywood style is intended to evoke the commonality of presentation and narrative found in popular and classic films (see Bordwell et al., 1985). Second, many genres contrast with Hollywood style and for many reasons. Documentaries and television sportscasts have a narrative of sorts, but they differ from Hollywood style in that they typically have very long duration shots and no point-of-view editing. Newscasts also have a kind of narrative, but they differ from Hollywood style by having long shots and by having people look directly into the camera, intentionally engaging the viewer with eye contact as in conversation. Advertisements and political spots differ in that they typically have no real narrative; instead, they have a strong message that their crafters want the viewers to remember. Music videos differ in that many have very quick cuts, a continuous music line, and a sequence of shots that often alternates between the singer(s) and a small plot that uses the song as narration. Television sitcoms differ, having fewer changes of scene, using a generally proscenium set, and inserting canned laughter. Finally, much of the film corpus of Sergei Eisenstein, for example, can be taken as a part of a genre of cinema that is trying strongly to educate the viewer, where the juxtaposition of content across cuts is often intended to elucidate similarities and dissimilarities, forcing the viewer to make judgments about what is seen.

2. By dolly in I mean that the camera physically rolls closer to the object, and by zoom out I mean that the lens length of the camera gets shorter (normally minifying the objects in the image and creating a wider field of view). Surprisingly, Hitchcock and François Truffaut misdescribe the scene as a “track-out combined with a forward zoom” (1983, p. 246). Roland Emmerich and Dean Devlin’s 1998 film Godzilla provides another example, although there are many. 
When Godzilla is about to erupt through the pavement of Manhattan, the camera is on Nick Tatopoulos (Matthew Broderick), and the buildings on the streets of New York convulse around him during the combined dolly in/zoom out. 3. Motion is, of course, the raison d’être of film. One of the first “feature” films was the fifteen-second film by Louis Lumière, L’Arrivée d’un train à La Ciotat (Ciotat being a small town outside Marseille). This 1895 film cost one franc per viewing and was an immediate smash hit, seen by breathless thousands (Toulet, 1988). Accounting for inflation, it is remarkable that viewing this film cost about $685 per hour in early-twenty-first-century dollars. We may bemoan the cost of going to the cinema today, but today’s price is quite reasonable compared to previous times. 4. Let me make two additional points. First, it is also not good Hollywood style to have actors look directly into the camera. If looked at, one becomes self-aware. For example, this is done early in The Sound of Music. Maria, returning late to the convent and being excoriated by the nuns, looks into the camera and shrugs her shoulders. It seems quite amateurish and disruptive. Second, the camera’s extreme close-ups of actors’ faces and other body parts do not necessarily impinge on the viewer’s personal space. Instead, because long lenses are typically used in such scenes, the optics are akin to looking through binoculars, giving one a more immediate look at something that is still rather far away. These shots are not mistaken for being on top of the actor. For example, contrast them with some of the compelling, computer-graphics shots in Disney and Pixar’s 1999 film Toy Story 2. Many are subjective shots from the points of view of toys looking at human beings. The humans, of course, optically loom large—but not as with a telescope but because they really are very close—entering the personal space of the toys. 
But rather than becoming aware of ourselves, this is part of the narrative, showing us what it is like to be a toy.

5. Of course, standard film is usually interrupted seventy-two times per second, each of twenty-four frames three times by an episcotister. This brings the flicker rate above the normal human threshold, which for a bright light is about sixty times per second. Continuity at this scale is achieved by exceeding the temporal resolving capacity of the system. See Anderson (1996, chapter 4) for a good analysis.

6. Beta motion is a kind of apparent motion. There are many kinds of apparent motion and much confusion in the literature. It is sometimes said that film presents apparent motion due to its stroboscopic presentation of frames. But stroboscopic motion is, neurophysiologically, no different from real-world motion and typically entails using many separate and sequential displays; apparent motion is quite different and considerably less compelling (Sperling, 1976). Sometimes this distinction has been called short-range motion (for stroboscopic) and long-range motion (for apparent), but this distinction is sometimes difficult to maintain. I use the technical and historical term beta motion in an attempt to avoid confusion. See Palmer (1999, pp. 471–79) for a good analysis and presentation of the types of apparent motion.

7. Extraordinary blocking and camera gymnastics, as seen in a conversation in Ivan Reitman’s 1988 film Twins between Danny DeVito and Arnold Schwarzenegger, may diminish the appearance of this rudeness.

8. Hitchcock runs into a problem in Rope. Typically, in Hollywood style, when an actor looks offscreen, the next shot is a subjective one, showing us what the actor sees. Resolved to have no cuts, Hitchcock could not do this. Thus, when the professor (James Stewart) finds an extra hat in the closet, he drops his head in thought and turns the inside of the hat towards the camera, which zooms in and shows us the initials of the dead man. This is a break with Hollywood style, because Hitchcock could not use point-of-view editing, and the viewer is denied the subjective shot of what the professor sees. The information had to be conveyed by other means, one that borders on making us aware of ourselves. Hitchcock has often played with the subjective shot that is supposed to follow an offscreen glance. In his 1963 film The Birds, the protagonist (Tippi Hedren) sits on a bench and looks offscreen several times, interleaved with shots of more and more birds arriving on a schoolyard jungle gym. We might have assumed that she was looking at the birds, but later she turns around in horror to see them. However, an establishing shot at the beginning of the sequence showed her facing away. 
See Carroll (1980) and Messaris (1994) for more discussion of this scene. 9. Indeed, pace within a film is important, too, and Lumet (1995) suggests that shorter shots are necessary to build to a climax. In Twelve Angry Men fully half of the cuts come in the final third of the film.
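The pacing measures mentioned in the text and in note 9 (average shot length, and the share of cuts falling in the final third of a film) can be computed from a list of cut times. The following is a minimal sketch under stated assumptions: cut times are given in seconds from the start, the function names are hypothetical, and the sample data are invented for illustration and do not describe any actual film.

```python
def average_shot_length(cut_times, running_time):
    """Mean shot duration in seconds.

    A film with n cuts contains n + 1 shots, so the mean is the
    running time divided by n + 1.
    """
    if running_time <= 0:
        raise ValueError("running_time must be positive")
    return running_time / (len(cut_times) + 1)


def fraction_of_cuts_in_final_third(cut_times, running_time):
    """Share of all cuts that occur in the last third of the running time."""
    if running_time <= 0:
        raise ValueError("running_time must be positive")
    if not cut_times:
        return 0.0
    threshold = running_time * 2 / 3
    late_cuts = sum(1 for t in cut_times if t >= threshold)
    return late_cuts / len(cut_times)


# Invented example: a 90-second sequence whose cutting accelerates.
cuts = [10, 40, 70, 75, 80, 85]
print(average_shot_length(cuts, 90))
print(fraction_of_cuts_in_final_third(cuts, 90))
```

In this invented sequence four of the six cuts fall in the final third, the kind of acceleration toward a climax that note 9 describes.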

References Anderson, J. D. (1996). The reality of illusion. Carbondale: Southern Illinois University Press. Atchley, P., Kramer, A. F., Andersen, G. J., & Theeuwes, J. (1997). Spatial cuing in a stereoscopic display: Evidence for a “depth-aware” attentional focus. Psychonomic Bulletin & Review, 4, 524–29. Bertamini, M., Yang, T. L., & Proffitt, D. R. (1998). Relative size perception at a distance is best done at eye level. Perception & Psychophysics, 60, 673–82. Bordwell, D., Staiger, J., & Thompson, K. (1985). The classical Hollywood cinema: Film style & mode of production to 1960. New York: Columbia University Press. Buñuel, L. (1983). My last sigh. New York: Knopf. Carey, J. (1982). Convention and meaning in film. In S. Thomas (Ed.), Film culture: Explorations of cinema in its social context (pp. 110–25). Metuchen, NJ: Scarecrow Press. Carroll, J. M. (1980). Toward a structural psychology of cinema. The Hague: Mouton. Cutting, J. E. (1997). How the eye measures reality and virtual reality. Behavior Research Methods, Instruments, & Computers, 29, 27–36. Cutting, J. E., & Millard, R. T. (1984). Three gradients and the perception of flat and curved surfaces. Journal of Experimental Psychology: General, 113, 198–216. Cutting, J. E., & Vishton, P. M. (1995). Perceiving layout and knowing distances: The integration, relative potency, and contextual use of different information about depth. In W. Epstein & S. Rogers (Eds.), Perception of space and motion (pp. 69–117). San Diego: Academic Press. Dixon, M. W., Wraga, M., Proffitt, D. R., & Williams, G. C. (2000). Eye height scaling of absolute size in immersive and nonimmersive displays. Journal of Experimental Psychology: Human Perception and Performance, 26, 582–93.

Eisenstein, S. (1970). Film form. New York: Harcourt. (Original work published 1949). ———. (1959). Notes of a film director. Translated. New York: Dover. (Original work published 1948). Gibson, J. J. (1950). Perception of the visual world. Boston: Houghton Mifflin. Gleick, J. (1999). Faster. New York: Pantheon. Grice, H. P. (1957). Meaning. Philosophical Review, 66, 377–88. Hallett, P. E. (1986). Eye movements. In K. R. Boff, L. Kaufman, & J. P. Thomas (Eds.), Handbook of perception and performance (Vol. 1, chap. 10, pp. 1–112). New York: Wiley. Hochberg, J. (1978). Perception (2nd ed.). Englewood Cliffs, NJ: Prentice Hall. Hochberg, J., & Brooks, V. (1996). The perception of motion pictures. In M. P. Friedman & E. C. Carterette (Eds.), Cognitive ecology (pp. 205–92). San Diego: Academic Press. Kael, P. (1965). I lost it at the movies. Boston: Little, Brown. Koenderink, J. J., van Doorn, A., & Kappers, A. M. L. (1994). On so-called paradoxical monocular stereoscopy. Perception, 23, 583–94. Kraft, R. N. (1987). Rules and strategies of visual narratives. Perceptual and Motor Skills, 64, 3–14. Kubovy, M. (1986). The psychology of perspective and Renaissance art. Cambridge, UK: Cambridge University Press. Levaco, R. (Ed. & Trans.). (1974). Kuleshov on film. Berkeley: University of California Press. Levin, D. T., & Simons, D. J. (1997). Failure to detect changes to attended objects in motion pictures. Psychonomic Bulletin & Review, 4, 501–6. Lumet, S. (1995). Making movies. New York: Vintage. Matin, L. (1986). Visual localization and eye movements. In K. R. Boff, L. Kaufman, & J. P. Thomas (Eds.), Handbook of perception and performance (Vol. 1, chap. 20, pp. 1–45). New York: Wiley. Messaris, P. (1994). Visual literacy: Image, mind, & reality. Boulder, CO: Westview Press. Murch, W. (1995). In the blink of an eye. Los Angeles: Silman-James Press. Neisser, U., & Becklen, R. (1975). Selective looking: Attending to visually-specified events. 
Cognitive Psychology, 7, 480–94. Palmer, S. E. (1999). Vision science: From photons to phenomenology. Cambridge, MA: MIT Press. Pew, R. W., & Rosenbaum, D. A. (1986). Human movement control: Computation, representation, and implementation. In R. C. Atkinson, R. J. Herrnstein, G. Lindzey, & R. D. Luce (Eds.), Stevens’ handbook of experimental psychology (pp. 473–509). New York: Wiley. Proffitt, D. R. (1976). Demonstrations to investigate the meaning of everyday experience. (Doctoral dissertation, The State University of Pennsylvania, 1976). University Microfilms No. 76-29, 667. Pudovkin, V. I. (1958). Film technique and film acting. (I. Montagu, Trans.). London: Vision Press. Reisz, K., & Millar, G. (1968). The technique of film editing (2nd ed.) New York: Hastings House. Richter, J. P. (1970). The notebooks of Leonardo da Vinci. New York: Dover. (Original work published 1883). Scharf, A. (1968). Art and photography. London: Penguin Press. Simons, D. J., & Levin, D. T. (1997). Change blindness. Trends in Cognitive Sciences, 1, 261–67. Sperling, G. (1976). Movement perception in computer driven visual displays. Behavioral Research Methods & Instrumentation, 8, 224–30. Spotteswood, R. (1951). Film and its techniques. Berkeley: University of California Press. Swedlund, C. (1981). Photography. Fort Worth: Harcourt Brace Jovanovich College. Toulet, E. (1988). Cinématographie: Invention du siècle. Paris: Découvertes Gallimard. Truffaut, F. (1983). Hitchcock (Rev. ed.). New York: Touchstone. Volkman, F. C., Riggs, L. A., & Moore, R. K. (1980). Eyeblinks and visual suppression. Science, 207, 900 –1. Volkman, F. C., Schick, A. M. L., & Riggs, L. A. (1968). Time course of visual inhibition during voluntary saccades. Journal of the Optical Society of America, 58, 562–69.

Wallach, H., & Karsh, E. B. (1963). The modification of stereoscopic depth-perception. American Journal of Psychology, 76, 429–35. Willats, J. (1995). The draughtsman’s contract: How an artist creates an image. In H. Barlow, C. Blakemore, & M. Weston-Smith (Eds.), Images and understanding (pp. 235–54). Cambridge, UK: Cambridge University Press.

2
The Value of Oriented Geometry for Ecological Psychology and Moving Image Art
Robert E. Shaw and William M. Mace

Scientists and artists share the same environmental habitat (roughly, where they live) but occupy distinct, somewhat intersecting econiches (roughly, how they live). Although evolving within the same natural frame, their arenas of life are so dramatically different—the former tending toward the rational and the latter toward the expressive—that no easy comparison can be made of their methods or content. Yet, they have much in common. For instance, they have both made major contributions to the broadening of our culture of shared experiences. Such experiences are of two kinds: first, those that arise from direct perception of the environment, something all animals have in common; and, second, those that arise vicariously, as second-hand experiences, through indirect perception, or the use of substitutes for the real thing.

Historically, humankind has distinguished itself from other species by its attempt to produce a vision of nature—to produce records of that vision, with various degrees of fidelity and stylistic expression, to be shared and appreciated by others. Where art has pioneered our expressive side through poetry, dramaturgy, painting, sculpture, and music, among other things, science has advanced our rational side through basic research, theory, and technology. Milestones for both science and art were the discovery of various means for reproducing objects and events of general social interest vis à vis drawing, sculpting, painting, writing, printing, the telegraph, the telephone, photography, the phonograph, radio, movies, television, and computers. 
Drawings or paintings of people, landscapes, seascapes, or social events, such as sports, dance, travels, and trials, when framed and hung in a public place, become sources that capture some of the information contained in artists’ once-personal experiences but which can now be shared publicly by many.

Let’s consider more carefully what this act of reproducing might entail. We are so familiar with various forms of reproduction that we scarcely recognize what marvels they really are. Why do they work? There are two fundamental reasons: one having to do with intentionality, the other with causality. First, the very nature of one object, the object of intention, being in some way a reproduction of another object, the object of reference, is that the first refers beyond itself to the second. This is what is meant by the intention of the first being to refer to the second. To refer entails, at least, that when we perceive the first object, something about it formally resembles the reference object, and that thus in our experiencing the first, there is some part that would agree with our experiencing the second if such experiencing should occur. Second, there must be a causal basis for such intentional reference. But the nature of the referential relationship between the two objects is such that the absence of information about their causal connection does not mean absence of information about their intentional connection. Hence (and this is the main point) the intentional entailment that exists between them is not solely dependent on the causal entailment. Because this is so, a certain freedom for expressive variety exists for intentional entailment that is not allowed for rational entailment.

The concept of ecological validity1 is useful for comparing views of the world acquired through direct perception as opposed to indirect perception. To illustrate this concept, consider the following example. A relief map of a landscape may show the lay of the land, the shape of forests, the meander of rivers or wiggle of streams and where their courses take them, relative to mountains and valleys; and, perhaps, it also shows virgin countryside where no houses or roads have yet been built. Later maps may show progressive variations in the topography after erosion has shifted the lay of the land, say, due to a forest being burned and tumultuous runoffs now allowed where before soaking action contained the water. Still later maps may show that houses and other buildings have cropped up since the addition of a major highway and its access roads have made commuting easier. Thus, the series of maps show what was, what is, and over successive differences, what transpired in the periods between cartographical perspectives. The series of maps offers graded records of a natural dynamical perspective unfolding over time, a historical event that can be causally explained by natural processes acting over the time samples.

Now imagine that someone accidentally shuffles the series of maps because their time-tags were lost. The differences between the maps would be out of causal order, with laws of nature appearing to be violated. Burned forests would sprout immediately full-replacement growth, and erosion creases would be inexplicably erased, as houses became dismantled and roads became covered by dirt, rocks, underbrush, and trees. No ecology could change in such a manner. 
Consequently, we would be justified in concluding that the properly ordered series was ecologically valid as a historical event because it conformed to the laws of nature. On the other hand, the improperly ordered one was ecologically invalid because the processes witnessed were unrealistic, being in violation of natural law. Direct perception of the landscape daily, say, by a forest ranger from a tower on a mountain peak or by a pilot whose plane flew daily over the landscape, could confirm the naturally ordered series but not the unnaturally ordered one. Direct perception has ecological validity because, in principle, it has direct access to confirmatory information while indirect perception may or may not. To ensure that indirect perception of the landscape over the series of maps was ecologically valid, that is, that it conformed to direct perception, one would have to supply the missing time-tags or some other means that would leave their temporal order inviolate. One way is to label the maps with numerals and to give instructions that the maps are to be looked at in the order assigned, assuming that all who are allowed access to the maps understand the forward-counting convention. In this way we can see that a convention acts as a constraint that makes indirect perceptions conform to direct perceptions. Such conventions are required because perspectives taken on the world through indirect perception have more uncontrolled degrees of freedom than those taken through direct perception. That is to say, indirect perception allows for ecologically invalid information to be fashioned about an event even though the source-event is always ecologically valid, a fact that can always, in principle, under ordinary circumstances, be validated by direct perception. There are, however, extraordinary circumstances where the conventional constraint is suppressed or unavailable. Under such circumstances, the indirect perceptual event can take on a life of its own. 
To the extent that such extraordinary circumstances defy


rational (lawful) explanation, they serve to increase the mystery, metaphoric depth, and hence expressive power of the indirect perceptual event. This is one important way, perhaps the most important way, by which great arts attain expressive dimensions that surprise, challenge, and entice the viewer. Contrary to what some have argued (Goodman, 1968), art is not in this sense conventional like language but unconventional. Art is always lacking some degree of ecological validity because the expressive stylistics imposed by the individual artist are unique and defy rational conventions which would make the art object an easy read. Of course, there is much that can be rational in art depending on how obvious its representational content; but there are also dimensions of expressive depth to be exploited by defying convention, as is found in the extreme in abstract expressionism— with impressionism falling between these poles of rational content versus expressive style. The main goal of this paper is to show an example of extending the scope of lawfulness of projective geometry and thereby the basis for direct perception, but in doing so we also show a basis for controlled violations of ecological validity—available for use by the artist. To take liberties with the laws, one must know what the laws are and how to violate them skillfully so as to preserve some more-general constraint. Contrasting Theories of Perspectives Traditional theories of visual perspectives have been based on ordinary projective geometries. The technique of central projection is typically adopted without question as being the proper one for optics (e.g., photography), art (e.g., linear perspective), and psychology (e.g., retinal image theory). Here we wish to offer a glimpse of another theory of projective geometry that promises a simpler and more accurate description of visual perception and, perhaps, will have more potential usefulness for photography and art. 
By describing this alternative, oriented projective geometry, we mean to bring underlying geometry into focus as part of what can be tested and modified in the course of our science. Sometimes, it appears that researchers take projective geometry to be given and unmodifiable, leaving hypothesis formation and testing to be about tricks and assumptions for applying the geometry rather than revising the geometry itself. The emphasis of ecological psychology on lawfulness leads us to look to modifying theory as deeply as possible in order to minimize arbitrariness in the hypothesized system. It is not uncommon or unreasonable to regard limits on geometry to indicate limits on lawfulness. If we can extend the reach of geometry, we may justify a broader scope for lawfulness. We will caution that oriented geometry does not have all the properties we ultimately seek in a geometry but that it offers an advance worth making. Projective theories have many practical uses in both art and science—a most important one being to model linear perspective in drawing and graphical computation. A second popular use has been to model the optical projection of objects and scenes observed in the world into the visual system. Traditional theorists treat the optical projection of the retinal image as a putative first stage (p1) in visual processing. A second neurological projection (p2) over the optic nerve tract and past the optic chiasm eventually reaches the visual cortex. And, finally, a third phenomenological projection (p3) takes the cortical information into a visual experience in some way still not fully understood. Under this view, the retinal image is the first and most primitive site containing the visual information to be projected and, perhaps, cortically processed before being experienced. The fact that, geometrically speaking, the retinal image is a two-dimensional object representing three-dimensional objects and scenes has posed a perplexing puzzle for the


traditional perceptual theorist. How can we recover the third dimension from a twodimensional image? This has been called the tridimensionality problem. If it were possible, however, to render the retinal image superfluous as a stage of processing, the main issue would then be how information gets into the visual system, without worrying about the specific properties of the retinal image. We could then move our theoretical concern to the second stage of projection described above without further ado. In traditional psychological terms, we would say that the distal object was the referent rather than the proximal object (i.e., the retinal image). The issue would be not what image is projected but how the information about the object remains invariant under such projection. The optical physics connecting the distal referent to the eye dynamically influences the retinal firing pattern so that the visual pathways project the information experienced with high fidelity in a special sense. If perception is to be a direct (uncorrupted) specification of the world vis à vis information detected and directly experienced, then the medium of the central nervous system inside the body, like the medium of air outside the body, would have to be “transparent” and so pass the properties of the referent invariantly into experience. This transparent projection may be instantiated in many different energetic modes in between the reference object and the intentional object, but this is of no concern to the perceiver (unless the perceiver is a scientist); the perceiver merely sees the world as it is through his visual system, which has been carefully and relevantly tuned by evolution and learning to help him remain adapted to the environment. That is, it has been tuned to yield ecologically valid experiences. Why should the retinal image be noticed in the course of perceiving the world? 
It is well known that our brains are insensate to being touched by probes; why then might not the retina, an extension of the brain, be insensate to the ephemeral touch of dancing photons and their rhythmic image? Is it required that the retina be treated as an image plane? Could this light-sensitive surface at the back of the eye be treated instead as a window? When one looks at the world through a window, there is a flat surface (the window glass) interposed between the observer and the world, but we do not say that the observer is looking at the window in order to look out the window. A window washer needs to look at the window, but ordinary observation through a window does not involve reading an image off the window. Consider looking outside a building through a window that is open versus one that is closed. Are these cases very different from one another? To the extent that window glass is transparent, we do not see it. We see the plane of the window only to the extent that it is not transparent, and what we see when we look at dirt on a window is something about the window itself, not the scene on the other side. A few people have argued that the retinal image, treated as a stage of analysis, is unnecessary or, even worse, a red herring which confuses rather than clarifies our understanding of perception (Gibson, 1966, 1979/1986; Haber, 1983). Let’s consider two of these arguments, followed by a third, which is the primary focus of this chapter. First argument: Retinal image is a red herring. The retinal image is an inverted, smallerscale image of objects in the world, being projected upside down on the back of the eyeball. Yet, we do not experience the world itself as being inverted or small enough to fit into the eyeball. Furthermore, we have learned that by the wearing of inverting-prism goggles, before adaptation occurs, a person’s reaching behavior is disturbed in predictable ways (Kohler, 1964; Dolezal, 1982). 
Thus, it seems that the image can be functionally inverted but not experienced as inverted under ordinary viewing conditions. The stage of interest therefore does not seem to be the first inversive projection (p1) or even the second insensate cortical projection (p2) but the information resultant of composing the first and second projection (p1 × p2) into a third and final experiential projection (p3 = p1 × p2). Thus, the perceptual experience is not an event at the end of the train of three projections, not an effect that magically “pops out” at the end of a causal chain; rather, the experience longitudinally penetrates all three projections, with one foot in the environment and the other in the perceiver, and nothing but transparent physical and neurological media lying between. The resultant projection is over mixed media (air and tissue) that are informationally transparent to the invariant properties of the environment—a distributed experience whose support is over the three projections. Knock out any of the distributed causal supports anywhere along the three projections, however, and the immediate consequence is some kind of blindness. The transparency would be destroyed. Vision may fail because there is no light, or eyes are shut, or when the lenses are clouded by cataracts, or when the humors of the eye are too filled with debris (diabetic hemorrhages), or when fluid pressure compresses the ocular nerve (glaucoma), or when the retinae are detached, or when the ocular tract has lesions or arterial occlusions, or when there is cortical damage, or when one is hit hard on the head, or when one is chronically inattentive or temporarily distracted. If causes of blindness can be distributed at different sites along the causal chain, then so can causes of sightedness. Why restrict experience arbitrarily to any specific location? Hence, the head is more likely in the experience than the experience is in the head, say, at the retinal image or some particular brain state. Because no one has solved the hard problem of where experience is located in the central nervous system (Chalmers, 1996), we may locate it distributively over the field of concern. 
Our experiences join us with the objects experienced because our objects of intention directly specify our object of reference (Hintikka, 1975), so long as our history has appropriately attuned our perceptual systems to the relevant information so that the information is detected by us (Chan & Shaw, 1996). This ecological view of direct perception differs somewhat from that of the Gestaltists’ principle of psychoneural isomorphism, for it incorporates their brain field into an ecological field; their brain field is integrated into a more comprehensive psycho-neuro-physical field that interfaces a functionally defined environment (econiche) with its functionally defined organism (a perceiver-actor). (Note: Experiences that arise from dreaming, imagining, or hallucinating are allowed but simply do not have the referential transparency that direct perceiving and knowing do.)

Second argument: Ganzfeld is experienced as three-dimensional. Assume that the total field of view is entirely filled by an illuminated, white, featureless surface. Such a homogeneous field of light with no visible boundaries is called a Ganzfeld. There are no focusable contrasts for binocular hunting to lock onto stereoscopically or to which the eyes’ lenses can accommodate (i.e., change their shape). Nevertheless, shouldn’t the experience be one of two-dimensionality? If the retina is an image plane, would we not still see the retina in the absence of a projection? A blank canvas is still an object to be seen. If the retina is an image plane to be seen, then shouldn’t it show up as a flat surface if no perspective projections are given? But will a person really see a two-dimensional image of lightness, as if a white surface has been painted on the retina? Or will one see instead a white, featureless surface located at some determinate distance from the point of observation? If so, how far?
Such Ganzfeld experiments have been done (Metzger, 1930; Gibson & Waddell, 1952; Cohen, 1957) but with an outcome that could not be predicted from retinal image theory.

Instead of experiencing a white surface at some indeterminate distance in the so-called frontal plane, people report experiencing a three-dimensional translucent volume of indeterminate depth. Statements are made like: “I am looking into a penetrable white fog that completely surrounds me!” Thus is our most primitive visual experience, as the Gestalt theorists argued, an autochthonous experience of three-dimensional, unbounded openness (Ganzfeld) that arises independently from nowhere. Ecological psychology would explain this experience as something other than a mysterious autochthonous “force.” Like the rest of science, we would look for a sufficient reason for the phenomenon. The basic premise of an ecological theory of perception is that we see what we see because the information from the environmental situation is what it is. In other words, we do not see what is simply in the light to the eye, as the physicist might construe it; rather, we see what is functionally specified by the light to a highly evolved visual system, one that has been adaptively designed to fit its environment by evolution and further attuned by experience. The information contained in a Ganzfeld specifies no surface because there are no focusable features on the surface creating the Ganzfeld. The eyes cannot accommodate to any given distance because no specific distance information is given. A situation in which there is light but no surface information is an insubstantial medium (like fog) quite capable of indeterminate penetration. This explains the first part of the argument needed: namely, how a 3-D object can be represented on a 2-D surface.

Third argument: Ordinary projective geometry does not preserve orientation information. Because traditional perceptual theory depends on the retinal image projection but such theory fails to explain the experience of tridimensionality, we must search for a different theory.
No matter what projective theory is needed for describing the information input for visual experience, it must be based on a different kind of projection than that which describes the retinal image. Even if this were not sufficient to cast doubt on retinal image theory, there is a third even more telling argument having to do with the fact that ordinary projective geometry fails to preserve orientation information. Such a theory fails because the retinal image does not distinguish two distinct kinds of projections that need distinguishing if mischief is to be avoided. This brings us to our third argument, which in many ways is the most important one. Orientability, among other things, allows us to distinguish counter-clockwise rotations from clockwise ones, left from right, top from bottom, and inside from outside. The argument we wish to present is based on whether the topological property of orientability is present or absent in the projective space of interest, whether this be the retinal image treated as a two-dimensional projective space or the dynamical retinal image treated as a three-dimensional projective space. We discuss next this important property of orientability, which is missing from all ordinary projective geometries regardless of their dimensionality. After that, we shall turn to framing a mathematical basis for an ecologically valid projective geometry.

Orientability and Sidedness

Take a few minutes to scrutinize carefully the following figures, and then we shall pose a few telling questions. Look carefully at figure 2.1A. It depicts an ordinary two-sided carton with two cells. There is clearly an outside and an inside. The left-most arrow is inside the left cell; the middle arrow is outside the left cell and inside the right cell; and the right-most arrow is outside the right cell. Compare A with B: B is also a carton with two cells; but how many sides does it have? Notice that the three arrows in B have exactly the same placement as those in A relative to the flat surface of the page but not relative to the surfaces of the depicted cartons. The right-most arrows in A and B point in opposite directions on different sides of their respective cells: In A, the right-most arrow is outside and points backward while the corresponding arrow in B is inside and points forward. If this does not seem remarkable, then compare the middle arrows in each. In A, this arrow is inside the right cell and points outward but has no clear orientation in B: It seems to be outside of both the left and right cells and to point outward and inward, respectively, at the same time! C simplifies the picture so that it is easier to see that the surfaces of the B carton are based on the one-sided Möbius band.

Fig. 2.1. Orientability and sidedness. A is a two-sided surface with orientability, while both B and C are one-sided surfaces without orientability. Can you see why?

To clarify the relationship of orientability to sidedness, consider figure 2.2. To understand this breakdown of the orientability property, we need to understand sidedness, an important topological property that projective geometries usually do not preserve. Later we shall see that two-sidedness is a property that projective geometries must have if they are to distinguish, as pointed out earlier, front from back, inside from outside, left from right, top from bottom, and clockwise from counter-clockwise. To anticipate further: Two-sidedness is the minimal property any geometry must have, whether projective or not, if it is to have a way to handle occlusion information, which is a key informational invariant of a theory of the three-dimensional layout of surfaces in the environment and hence one of the most ubiquitous sources of information for perceiving three-dimensionality. Try this demonstration: As shown in figure 2.2A, glue the corresponding ends of a paper strip together to make a cylindrical band (i.e., a⇔a, b⇔b).
Notice that an ant crawling on the inside circumference of the band would stay on the inside or if crawling on the outside surface would remain on the outside surface. It would have to crawl over an edge to change sides. For this reason, the cylindrical band is called a two-sided, bounded surface.

Fig. 2.2. The two-sided cylindrical band versus the one-sided Möbius band.

Now, as shown in figure 2.2B, take another paper strip and glue the noncorresponding ends together by giving the paper strip a half-twist (i.e., a⇔b, b⇔a). Notice that an ant crawling on this surface, even without crossing over an edge, will nevertheless cover what appears at one moment to be the inside but at another moment the outside. This is why the Möbius band is called a one-sided unbounded surface. If we draw a closed path on the circumference of the cylindrical band, an arrow transported around this path will retain its orientation (see fig. 2.3), but an arrow transported around the corresponding closed curve on the Möbius band will not retain its orientation. The property of orientability is a consequence of the surface being two-sided, while the loss of this property is a consequence of a surface being one-sided.
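The paper-strip demonstration can also be checked numerically. What follows is a minimal sketch, not part of the original demonstration: it assumes a band whose centre line is the unit circle and whose across-the-strip direction turns through a chosen number of half-twists per circuit. The parametrization and the function names are illustrative assumptions. The sketch carries a surface normal once around the centre line and reports whether it returns to itself (+1) or to its opposite (-1).

```python
import math


def cross(a, b):
    """Cross product of two 3-D vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Dot product of two 3-D vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))


def surface_normal(theta, half_twists):
    """Unit normal on the band's centre line at angle theta.

    half_twists = 0 models the cylindrical band (ends glued a-a, b-b);
    half_twists = 1 models the Moebius band (ends glued a-b, b-a).
    """
    # Angle of the across-the-strip direction, measured from the radial
    # direction; it starts vertical and turns by half_twists * pi per circuit.
    phi = math.pi / 2 + half_twists * theta / 2
    tangent = (-math.sin(theta), math.cos(theta), 0.0)
    across = (math.cos(phi) * math.cos(theta),
              math.cos(phi) * math.sin(theta),
              math.sin(phi))
    return cross(tangent, across)


def orientation_after_circuit(half_twists):
    """+1 if the normal returns unchanged after one circuit, -1 if reversed."""
    start = surface_normal(0.0, half_twists)
    end = surface_normal(2 * math.pi, half_twists)
    return round(dot(start, end))


print("cylindrical band:", orientation_after_circuit(0))
print("Moebius band:    ", orientation_after_circuit(1))
```

With no half-twist the normal comes back unchanged, as on the two-sided cylindrical band; with one half-twist it comes back reversed, which is the loss of orientation described for the Möbius band.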

Fig. 2.3. Preservation or loss of orientability. A, parallel transport around a two-sided surface preserves orientability while B, parallel transport around a one-sided surface does not.

Orientable and Nonorientable Objects

To make clear the way in which projective transformations typically lose orientability information, consider the simple example of rotating a triangle in the plane.

Fig. 2.4. Removing ambiguity from a projected rotation event. Here > specifies order of sequential occurrence (i.e., to the left of on the projective line) and bold letters denote the front range of the projective mapping.

The sequence I, II, III in figure 2.4 denotes a clockwise rotation, while the sequence I, III, II denotes a counter-clockwise rotation. Rotation direction reverses if the back range and front range are interchanged. Because occlusion information for orientability is suppressed, the projected dynamical shadow of any rotating object appears to reverse direction spontaneously. This is explained by the fact that some of the successive relationships on the projective line reverse order (indicated by the arrows). If occlusion or any other information is available to “mark” the front range (indicated by the bold letters), then there is no misidentification of what is in the front range and what is in the back range. Hence, orientability information is preserved under projected rotation. These arguments apply to any objects regardless of shape. Consider another case of loss of orientability: the so-called Necker cube.

Real Cubes, Necker Cubes, and Projection Topology

Assume that we have a cube in 3-D space (fig. 2.5, column I) that is projected onto a 2-D surface (column II-top), thus collapsing the cube’s six faces into a complex with a maximum of seven coplanar polygons (depending on the orientation of the cube in I, the number can be smaller); this 2-D polygonal complex is then topologically transformed into the unit circle (column III-top), preserving the seven regions but not their shapes. Alternatively, the ordinary cube can be projected in two other ways: either onto a one-sided (nonoriented) 2-D representation of a 3-D cube, called the Necker cube (II-middle), or onto a two-sided (oriented) 2-D representation of a 3-D cube (II-bottom). The number of each face is placed in the center of that face. For example, the numeral 5 refers only to the square in which it is centered, and 6 refers only to the square in which it is centered. The one-sided figure (Necker cube representation) is then also topologically transformed into a unit circle in such a way as to preserve the ambiguous orientation of its faces, while the two-sided figure (real cube representation) is so transformed as to preserve the unambiguous orientation of its faces. The ambiguity of column II-middle is shown by arrows leading to both the middle and bottom topological maps in column III. Column II-bottom, on the other hand, is drawn with an arrow only to the column III-bottom topological map. The middle and bottom maps in column III indicate that two faces are projected to each of the seven regions. In column III-middle, the bold numbers (1, 4, and 6) depict the faces that are seen in front. The numbers not in bold (2, 3, and 5) indicate the faces seen through the faces that are in front. The bottom map in column III represents the alternative orientation of the cube with fronts and backs interchanged.

Fig. 2.5. Contrasting perspective theories: nonoriented and oriented projective geometries.

Hence we have a reason for the three different experiences. Namely, they correspond to the three projections:
• I => II-top => III-top: no occlusion information
• I => II-middle => III-middle and III-bottom: occlusion information is ambiguously specified
• I => II-bottom => III-bottom: occlusion information is unambiguously specified

If information is defined as specification of reference-object properties under a projective mapping and if we have three projective mappings that convey information from the environment to the visual system in three different ways, then we should have three different experiences (intentional objects); and of course we do!
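A small numerical sketch may make the ambiguity of the second mapping concrete. It assumes an orthographic (parallel) projection that simply drops the depth coordinate and a viewer on the positive-depth side; the tilt angles and the function names are illustrative assumptions and are not taken from figure 2.5. The sketch shows that a tilted cube and its depth-reversed twin cast exactly the same 2-D figure, although a different corner is nearest the viewer in each.

```python
import math

# The eight corners of a cube centred on the origin.
CUBE = [(x, y, z) for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)]


def tilt(points, about_x=0.5, about_y=0.6):
    """Rotate points about the x axis, then the y axis (angles in radians)."""
    tilted = []
    for x, y, z in points:
        y, z = (y * math.cos(about_x) - z * math.sin(about_x),
                y * math.sin(about_x) + z * math.cos(about_x))
        x, z = (x * math.cos(about_y) + z * math.sin(about_y),
                -x * math.sin(about_y) + z * math.cos(about_y))
        tilted.append((x, y, z))
    return tilted


def project(points):
    """Orthographic projection: keep (x, y) and discard depth z."""
    return [(x, y) for x, y, _ in points]


def depth_reversed(points):
    """Interchange the front and back ranges by flipping the sign of depth."""
    return [(x, y, -z) for x, y, z in points]


def nearest_corner(points):
    """Index of the corner closest to a viewer on the positive-z side."""
    return max(range(len(points)), key=lambda i: points[i][2])


cube = tilt(CUBE)
twin = depth_reversed(cube)

print("same 2-D figure:", project(cube) == project(twin))
print("nearest corner, cube:", nearest_corner(cube))
print("nearest corner, twin:", nearest_corner(twin))
```

Because the projection discards the sign of depth, nothing in the 2-D figure can say which of the two solids produced it; information that marks the front range is what removes the ambiguity.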

Just as in the argument that the Ganzfeld is to be explained by the projective mapping being a faithful specification of three-dimensionality, we now argue analogously for the Necker cube. It is an ambiguous figure, not because some creative cognitive magic takes place but because its projective mapping, like the Möbius band, is a forgetful specification, leaving behind occlusion information. This contrasts sharply with the faithful projection of the occlusion information in the case of the unambiguous representation of the cube. Because the latter projective mapping is the most faithful in specifying invariantly the properties of the reference object, it provides the most ecologically valid experience, although it should be understood that all three experiences are accounted for by direct perception. Namely, you see what you see because the information is what it is. The specification is of just those properties experienced. Nothing need be cognitively constructed, remembered, or inferred; that is, no autochthonous “forces” need be postulated to account for the alternative experiences. More would need to be said to explain the selection of one of these alternatives at a given time, but this would be a selection among justified alternatives, not a creation or a construction. Next, we must discover why the orientation-specific information is lost in our experience of the Necker cube.

Oriented Projective Geometry

Ordinary projective space, like the Möbius band and the Necker cube, is one-sided, as shown in figure 2.6. The spherical model of this geometry represents the fact that the projections of a point on the back of the sphere and of a point on its front both have the same image in the Euclidean (projective) plane, represented here as an infinite disk.
(Note: The circumference of the disk actually lies at infinity where the angle of projection reaches 180 degrees, i.e., lying in the xy-plane, and completely covers the plane with images of points from the sphere.) All of the projected points, regardless of the hemisphere to which they belong, cover the projective plane in the usual way without any designation of where they originated. The loss of orientability is due to this failure of the projective mapping to preserve the distinction between the front and back range, collapsing both into positive values of the dimension of depth w. This loss of orientability is represented by the fact that relationships (e.g., the arrows) invert when the projective angle passes through the points at infinity.

Fig. 2.6. The spherical model for ordinary projective geometry.

To keep the front and back ranges distinguished, traditional computational geometries use the line at infinity as a reference. This means we would have to exclude certain “degenerate” cases, such as line segments with one end on that reference line. But this move is not a real solution to the orientability problem in ordinary projective geometry because it is tantamount to a return to Euclidean geometry and hence to a geometry without a natural theory of perspective. Graphics programmers use many tricks to distinguish the front and back ranges: among them normalized signing, ray tracing, and a negative weight-clipping rule. These are ad hoc provisos rather than a systematic change in the basis of the geometry itself. For this reason, in traditional perspective theories, occlusion information is not principled. There is a better way of keeping the orientability information intact. Oriented projective geometry introduces a principled way to distinguish the front and back ranges. (We follow, in part, Jorge Stolfi’s 1991 book in this presentation, which we highly recommend for those who are mathematically inclined.)
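The contrast between the two geometries can be sketched in a few lines. The sketch assumes that a point of the spherical model is given by coordinates (x, y, w), that its ordinary image is obtained by dividing through by the depth coordinate w, and that an oriented image simply keeps the sign of w as a tag for the front or back range. These conventions and the function names are illustrative assumptions, loosely in the spirit of Stolfi's treatment rather than a transcription of it.

```python
def antipode(point):
    """The diametrically opposite point on the sphere."""
    x, y, w = point
    return (-x, -y, -w)


def ordinary_image(point):
    """Ordinary projective image: the sign of the depth w is forgotten."""
    x, y, w = point
    if w == 0:
        raise ValueError("points with w == 0 lie at infinity")
    return (x / w, y / w)


def oriented_image(point):
    """Oriented image: the same plane position plus a front/back tag."""
    x, y, w = point
    if w == 0:
        raise ValueError("points with w == 0 lie at infinity")
    side = 1 if w > 0 else -1  # +1 marks the front range, -1 the back range
    return ((x / w, y / w), side)


front = (0.3, -0.2, 0.5)
back = antipode(front)

print("ordinary images coincide:", ordinary_image(front) == ordinary_image(back))
print("oriented image of front point:", oriented_image(front))
print("oriented image of back point: ", oriented_image(back))
```

In the ordinary mapping the two antipodal points are indistinguishable once projected; in the oriented mapping they land on the same plane position but carry opposite tags, which is all that is needed to keep the two ranges apart.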

Fig. 2.7. Oriented projections with duomorphic projections.

In figure 2.7, we assign a dual range, +w and –w, to represent the front and back ranges of the spherical model, respectively (the front range and the back range are shown with the opposite dimension suppressed). In figure 2.8, the projective plane is no longer without thickness but is a manifold (surface) of infinitesimal (ε) thickness. Hence, every point on the “thick” plane is a double point, with each member of the pair being marked by either +w or –w, depending on whether it occupies the front or back range. Also, the line at infinity is no longer needed as a reference line.

Fig. 2.8. The spherical model for oriented projective geometry.

Let’s take note of a few of the technical concepts needed to describe the new projective geometry. Here we get a duomorphic or double covering of the projective manifold, that is, a covering by double points. A double point is not just two coincidental points but is also a neighborhood defined by a duomorphism. A duomorphism comprises two distinguishable functions, such as a pair of dual projective transformations, with distinct ranges lying within the same topos, a concept from category theory referring to an infinitesimal region containing both points and a rule for distinguishing them in terms of their origin rather than their destinations. Here what looks initially like a two-to-one mapping from domain to range is actually a pair of dual mappings, or duomorphisms. Recall that an isomorphism is a mapping that is one-to-one and onto, while a duomorphism is a kind of isomorphism that is only reflexive and symmetrical but not transitive as are other isomorphisms, such as equivalence and identity. Finally, we spoke of a “thick” two-sided surface, or manifold. But how thick is infinitesimal thickness? An infinitesimal number is a number that is greater than zero but smaller than any positive real number and belongs to the hyperreal domain consisting of both the real points and the infinitesimals nested among them. (See J. L. Bell, 1998, for a lucid introduction.) In the next section, we apply this model to discuss some key issues of perceptual theory.

A Plethora of Double-points

The “depth” seen at an occluding edge of a surface (fig. 2.11) involves a scission effect, just as does a surface seen through a semi-transparent surface (fig. 2.9A). A scission effect is where a single projection carries information for more than one surface. Clearly, all the points along the line defined by an occluding edge qualify, as do all the points seen through a semi-transparent surface. In each case, a point c on the projection surface involves at least a pair of other points, a and b: One point a is seen to lie either in front of or behind another point b (see fig. 2.10). Although the separation and order of the surfaces in depth is nearly always clearly specified in occlusion, only the separation of the surfaces in depth, not the order, is clearly specified in transparency. In transparency their order usually appears indeterminate. The indeterminate order of separation can be understood as the failure to break parity.

Convexity: The Missing Ingredient?
Two unanswered questions deserve attention: If projective mapping can convey surface separation information but not determinate order, then by what information is determinate order specified? Of course, the answer is that whatever information conveys two-sidedness ipso facto specifies ordered separation in depth. But this does not say precisely what such information is. To appreciate the nature of scission effects, it will be instructive to pay careful attention to another property that often accompanies orientability and is defined only if orientability is: namely, convexity. After defining this new concept, we will try to show how it may be the missing piece of the puzzle of depth perception and that it introduces order into the scission effects, regardless of how they are achieved. Please study the displays in figure 2.9 for a moment.

Fig. 2.9. Transparency and convex sets.

Fig. 2.10. Occlusion information specifies double-points.

One experience typically specified by figure 2.9A is of a semi-transparent disk covering a surface with two contrastive regions. A certain law of psychophysics (Talbot’s law) has been shown to account for a broad class of transparent depth phenomena so long as certain initial conditions are satisfied (Metelli, 1974; Anderson, 1997). One condition is that light contrast values must be present; the second is that the light contrast values be in a certain order. In figure 2.9A, both conditions are met, and a transparent depth is experienced, while in B, they are not, and no such experience arises. This constitutes a general law of ecological physics in that it systematically links information conditions with a specific experience.

Note the chord across the open disk at figure 2.9C and also across the closed disk at figure 2.9D. Think of the circular area in C as indicating empty space. This makes C a nonconvex figure and D a convex one for the following reason. A convex set is one such that if the endpoints, a and b, of a chord lie in the set, then so must any point, c, lying between them. Clearly, then, the open disk of C is nonconvex while the closed disk of D is convex. We can use the convex set property to clarify the figural condition so as to distinguish the two spurious cases from the ecologically valid one, as illustrated.

An ecologically valid display for transparent depth must satisfy the following geometric and optical information conditions. First, the light contrast conditions must have the values dictated by Talbot’s law. Second, the light contrast values must be arranged properly, such that
• they have values in the back range that do not belong to the front range, and
• they specify a convex set, which has values in the front range that do not belong to the back range.

If these conditions are met, then there is information for a scission effect that could only originate from a source with an ordered separation of surfaces. Hence, the information would have the fidelity required to qualify as a direct specification of an ecologically valid experience. Circularity is avoided in defining these transparent depth conditions in that the scission effect needed for surface separation and order is assimilated to the new oriented projective geometry as a consequence of two-sidedness, something that would not be possible in the old projective geometry.

Dynamical Occlusion as Displaced Accretion-Deletion Fronts

If the visual system is to distinguish between the front and the back range of an environmental projection, then there must be information for the order of separation of the surfaces specified through the optical projection.
Ecological optics, as opposed to traditional optical physics, has accepted the task of discovering such information sources that account for our most important and most salient experiences of the environment, an environment shared by all life forms and within which they must organize and direct their behaviors in adaptive fashions. Some of the most ubiquitous and most important information is that for specifying the occlusion of one surface by another surface as seen by a perceiver. Occlusion information through interpositioning, however, is not the only means for specifying the order of surface separation. Nonoccluding surfaces may be ordered in depth if they occupy different positions of an optical texture gradient or if they move at different rates toward or away from the perceiver. Because we cannot survey all such cases here, let’s consider dynamical occlusion as our last example.

It has been well established that ordered depth effects are specified by accretion and deletion of texture, even in the absence of occluding real surfaces (Kaplan, 1969). Imagine that on a computer screen or in a movie, one sees a randomly textured pattern that completely covers the screen (fig. 2.11). Then suddenly a small rectangular section of the random texture is seen to emerge (at t1) from the background camouflage, move in a straight line for one-third of the screen width (t2-t3), and finally stop, merging back into the camouflage of the background (t4). This merging into the background shows that the edges exist only while accretion and deletion are under way; they do not exist in the static image.

Fig. 2.11. Accretion and deletion of texture specifies object motion.

In the real-world case in which one surface dynamically occludes another, the leading and the trailing edges of the surface in front will define moving fronts of accretion and deletion. The moving accretion and deletion front is the information that optically specifies an edge with depth. However, for this to be unambiguous, the texture between the accretion and deletion edges must be preserved. A case of more-pure accretion and deletion without preservation of internal texture shows us that accretion-deletion alone is not sufficient to unambiguously specify ordering. Suppose that one creates a case in which background texture is deleted while foreground texture is accreted, and that is all. This would define the leading edge of a possible occluding surface, albeit an odd one. Deleting foreground texture and accreting background texture define its trailing edge (see fig. 2.12). For the case depicted, the texture trapped between these nonadjacent fronts defines the occluding surface as a convex set with values in the front range.
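The texture-preserving case can be imitated in a toy one-dimensional display. The following is a minimal sketch under stated assumptions: the screen is a row of 40 cells, the textures are random binary values drawn from fixed seeds, and the moving patch is 8 cells wide and steps one cell to the right. All of these sizes and the function names are illustrative choices and do not describe Kaplan's actual displays.

```python
import random


def make_texture(length, seed):
    """A row of random binary texture elements, reproducible via the seed."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(length)]


def render(background, patch, position):
    """One static frame: the patch texture laid over the background."""
    frame = list(background)
    frame[position:position + len(patch)] = patch
    return frame


def changed_cells(frame_a, frame_b):
    """Screen locations whose texture value differs between two frames."""
    return [i for i, (a, b) in enumerate(zip(frame_a, frame_b)) if a != b]


background = make_texture(40, seed=1)
patch = make_texture(8, seed=2)

before = render(background, patch, position=10)  # patch covers cells 10..17
after = render(background, patch, position=11)   # patch covers cells 11..18

# All optical disturbance lies between the trailing edge (cell 10, where
# background texture is accreted) and the leading edge (cell 18, where
# background texture is deleted); the rest of the screen is untouched.
print("changed cells:", changed_cells(before, after))

# The texture trapped between the two fronts is preserved: it is the same
# pattern, displaced by one cell.
print("internal texture preserved:", after[11:19] == before[10:18])
```

Either frame taken alone is just a row of random values with no visible edge; the edges exist only in the change from one frame to the next, and the displaced but otherwise intact pattern between them corresponds to the preserved internal texture that the unambiguous case requires.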

Fig. 2.12. Illustrating accretion and deletion fronts.

In figure 2.12, regions of background texture (A, B, C, D, E) get replaced by regions of foreground texture (u, v, w, x, y). The sub-regions A, . . . , E and the sub-regions u, . . . , y are static regions on the surface of the screen. Texture replacement is defined by accretion of new texture in the place of existing texture, which is correspondingly deleted. Where the accreting and deleting take place, changes occur: there are optical disturbances, but no texture is actually transported over locations. In this case, however, the display may reverse so that what was the occluding convex set in the front range becomes an occluded surface in the back range, and the occluding surface is now not a convex set. To make the possibilities clear, we depict three cases that might be experienced from the same display (fig. 2.13A, B, C; these cases are meant to be read as fig. 2.12, but in fig. 2.13, the small rectangular regions are offset for illustrative purposes). Case A is the accretion-deletion fronts just as before; but these may be experienced as either case B or case C. In case B, the occluding surface is seen as convex while in case C, the occluding surface is seen as nonconvex, that is, as a small moving aperture through which is seen the background texture.

Fig. 2.13. Ambiguous occlusion.

The dynamical display is therefore reversible, like the Necker cube. In figure 2.13C, Rule 1, if the leading edge accretes behind itself, and the trailing edge deletes in front of itself, then the occluding set is convex; while in figure 2.13B, Rule 2, if the leading edge accretes in front of itself, and the trailing edge deletes behind itself, then the occluding set is nonconvex. These relative directions of accretion and deletion, therefore, partition the total display into those regions that belong to the front range and are thus occluding and those regions that belong to the back range and are thus occluded. Rules 1 and 2 define a duomorphism and are dual theorems.

Perceptual Fidelity Is a Function of Ecological Validity

In this final section, we return to some of the earlier issues regarding ecological validity. With the differences between ordinary and oriented projective geometry firmly in mind, we now have a principled basis for our argument. From the discussion of transparency, we saw that information may be from a source (e.g., display) that need not have the property experienced. Notice that A in figure 2.9 is a source of information that specifies a transparent depth experience. In constructing the display, however, one may have followed any of the three procedures, although only D represents an actual case of a transparent surface being placed over the contrastive background surface. Hence, the source of the information might not possess the property that the information from that source specifies. There are, of course, other procedures one might have followed, such as painting a picture (intentional object) in the proper way to specify a transparent reference object (source) or writing a graphics program to create such a display.
The question that is nagging is the following: If a display (optic array structure) can be contrived to create information experienced as it having some property X but fails to possess that property, then might we not be fooled about our natural environment? Might perception have low fidelity? The short rebuttal to this question is that our perceptions have fidelity to the degree that the actions we take in accordance with experiences succeed in achieving the intended


goals. There can be no better yardstick for perceptual fidelity than the degree to which information is lawful in helping organisms as perceivers achieve positive outcomes as actors. The ecological fidelity for information detected by successful actors must be high, whether they be simple or complex organisms. From a pragmatic point of view, then, truth in perceiving the world is determined by the value gained or lost. Ecologically valid experiences are simply those that are most lawful in preserving the “right” values. Truth is what truth does! Hence, the issue of information fidelity is neither more nor less than the issue of ecological validity.

Ecological Realism as a Critical Perspectival Realism

Occlusion is a perspectival relationship that only makes sense when the point-of-view is such as to place one surface between the perceiver and another surface. For this reason, what distinguishes the front range from the back range is the place of the perceiver, as the projective surface, within the layout of surfaces in the environment. This makes the problem of occlusion perspective-dependent. Its laws, as formulated in Rules 1 and 2, belong to ecological physics rather than ordinary physics, where laws are intended to be universal and perspective-free rather than general (socially invariant) and perspective-dependent. Perhaps, the notion of ecological realism that perception is direct but does not entail naïve realism is the most difficult principle of ecological psychology to grasp; namely, for example, that the visual world consists of a 3-D optical array of texture where all forms of change are merely optical disturbances in various regions of the array with its nested n-tuple ranges.
We have seen cases where critical realism has come into conflict with common-sense (naïve) realism often in science: most dramatically, perhaps, when Copernicus refused to accept the naïve realism that claimed that the sun orbited the Earth because we “see” the sun rise in the east and set in the west; or when we reject the flat-Earth hypothesis, even though under our limited viewing conditions, the Earth does indeed look flat. These naïve claims are not based on ecologically valid experiences, because we have drawn inferences that go beyond the limited information available. We may only conclude legitimately that the optic-array information samples that we have under our restricted circumstances do not themselves rule out either the geocentric theory or the flat-Earth hypothesis. The information available surely does not affirm them but leaves room for two alternative hypotheses—the heliocentric theory and the round-world hypothesis. This wiggle room for critical realism to assert itself is justified because it allows a change in possible viewing circumstances, say, by taking a ride into outer space. From this broader, unrestricted perspective, we see directly how the local flatness of the Earth gives way to global curvature, and how the Earth moves among its sister planets to circle the sun. As scientists, we must be conservative with our guesswork. For we do not have the ontological luxury of assuming the character of surfaces, objects, and events are as commonsense experience tells us. Rather, we must discover the information that specifies their particular perceptual character by broadening our perspectives. In this way, we accept no cheap ontological conclusions about environmental sources (reference objects) but must work to justify all such claims through sound experimental epistemology to identify the information and restrictions on our circumstances responsible for our ordinary experiences (intentional objects).


We then must work toward a systematic relaxation of those restrictions to reveal the larger truth of ecologically valid experiences. Art helps us do this, especially the special effects created by visionary artists. Artists create such circumstances (displays) that inform us of ways to transcend our world of ordinary experiences and in this sense provide intuitive bases for ecological physicists to study nature more directly. Both the motion and the rectangle are specified by the relative accretion and deletion functions. Thus, there is no denying that what we see in movies, videos, or computer graphics is quite different from what actually happens. To understand the optical information for motion, we must distinguish between what is specified by the optical information, the intentional object of our experience, and what happens at the display that is the source of the information, the reference object of our experience. But one should not think of information as mere appearance and its source as the true reality for they are both equally real. Of course, the information may be presented in such a way that it may specify something other than its source. This follows naturally from the condition that more than one source may display the same information and hence give the same perceptual experience. The basic postulate of ecological psychology, you will recall, is that the source is always specified as well. Just because we do not recognize the source for what it is simply means we may have to sample the information more extensively over many more perspectives before getting it right. “Getting it right,” so to speak, is to elaborate our perspective sampling of the optic array until we have a valid ecological experience—an experience that stands up to all lawful scrutiny. Obviously, both scientists and artists learn to contrive situations that look one way under a given set of perspectives and another way under another set of perspectives. 
Presentation constraints are very important. Special effects technicians, like magicians, know this only too well. The other side of this issue is that special effects that dissimulate their true sources follow from lawful practices that can be understood and reliably reproduced. By a careful study of alternative ways to present the same information, we eventually discover the ecological laws upon which to base our theories of perceptual experience.

Note

1. Many writers have used the phrase ecological validity in an intuitive way that is not especially technical. They are aware, however, that Egon Brunswik (1956) was well known for the concept of ecological validity and frequently cite him. In most cases, it appears that Brunswik is credited as a scholarly courtesy. Some writers apparently did not extend Brunswik the added courtesy of reading him. For examples and a scolding, see Hammond, 1998. For extensive material on Brunswik, see Hammond, 1966. Brunswik’s use of ecological validity was a very specific one, the correlation between a cue (say, retinal size in vision) and an environmental property (real size). Ecological validities could have any value on the 0 to 1 range of correlations. Our usage begins closer to the intuitive usage and then is developed more technically within our version of the ecological program (Shaw, Turvey, & Mace, 1982, p. 209). We respectfully notify our readers that we are not using Brunswik’s concept of ecological validity.

References

Anderson, B. (1997). A theory of illusory lightness and transparency in monocular and binocular images: The role of contour junctions. Perception, 26, 419–54.
Bell, J. L. (1998). A primer of infinitesimal analysis. Cambridge: Cambridge University Press.
Brunswik, E. (1956). Perception and the representative design of psychological experiments. Berkeley: University of California Press.
Chalmers, D. J. (1996). The conscious mind: In search of a fundamental theory. New York: Oxford University Press.
Chan, T. C., & Shaw, R. E. (1996). What is ecological psychology? Psychologia: An International Journal of Psychology in the Orient, 39, 1–16.
Cohen, W. (1957). Spatial and textural characteristics of the Ganzfeld. American Journal of Psychology, 70, 403–10.
Dolezal, H. (1982). Living in a world transformed. New York: Academic Press.
Gibson, J. J. (1986). The ecological approach to visual perception. Hillsdale, NJ: Lawrence Erlbaum. (Original work published 1979).
Gibson, J. J. (1966). The senses considered as perceptual systems. Boston: Houghton Mifflin.
Gibson, J. J., & Waddell, D. (1952). Homogeneous retinal stimulation and visual perception. American Journal of Psychology, 65, 263–70.
Goodman, N. (1968). Languages of art: An approach to a theory of symbols. Indianapolis: Bobbs-Merrill.
Haber, R. N. (1983). The impending demise of the icon: A critique of the concept of iconic storage in visual information processing. Behavioral and Brain Sciences, 6, 1–54.
Hammond, K. (1998). Ecological validity: Then and now. Available: http://www.brunswik.org/notes/essay2.html.
Hammond, K. (Ed.). (1966). The psychology of Egon Brunswik. New York: Holt, Rinehart, & Winston.
Hintikka, J. (1975). The intentions of intentionality and other new models for modalities. Dordrecht: D. Reidel.
Kaplan, G. (1969). Kinetic disruption of optical texture. Perception & Psychophysics, 6, 193–98.
Koffka, K. (1935). The principles of Gestalt psychology. New York: Harcourt, Brace.
Kohler, I. (1964). The formation and transformation of the perceptual world. Psychological Issues, 3 (monograph 12).
Metelli, F. (1974). The perception of transparency. Scientific American, 230, 90–98.
Metzger, W. (1930). Optische Untersuchungen im Ganzfeld II. Psychologische Forschung, 13, 6–29.
Shaw, R. E., Turvey, M. T., & Mace, W. (1982). Ecological psychology: The consequence of a commitment to realism. In W. Weimer & D. Palermo (Eds.), Cognition and the symbolic processes (Vol. 2, pp. 159–226). Hillsdale, NJ: Lawrence Erlbaum.
Stolfi, J. (1991). Oriented projective geometry. Boston, MA: Academic Press.

Part Two
Perception of Simulated Human Motion

By the end of the twentieth century, the film-theory establishment had lost its faith in realism. In the waning decades, film theorists became increasingly entrenched in their belief that reality itself is a construct of language and culture. The digital technologies of the 1990s that made possible the synthetic construction of images seemed to render obsolete any notions of a photographable reality. In the end, neither reality nor motion picture realism could be countenanced, and the party oprichniks set out to purge the field of film studies of ideas from writers such as André Bazin and Siegfried Kracauer, who in the middle of the century had advanced well-articulated theories of motion-picture realism based upon the processes of photography (Bazin, English trans., 1971; Kracauer, 1960). In the academy, both Bazin and Kracauer, and especially Bazin, were ritually ridiculed. Bazin fully appreciated the psychological power of the photographic image, which as he noted is etched on the film by rays of light directly connected to objects in the world. Indeed, the very process of photography results in an inescapable realism.

The photographic image is the object itself, the object freed from the conditions of time and space that govern it. . . . Only a photographic lens can give us the kind of image of the object that is capable of satisfying the deep need man has to substitute for it something more than a mere approximation, a kind of decal or transfer. . . . [F]or photography does not create eternity, as art does, it embalms time, rescuing it simply from its proper corruption. (1971, p. 14)

The basis for Kracauer’s theory was less direct, relying more on appearance and function of the image, but he, too, looked to the affinities of the photographic medium for the recording of reality. “Film, in other words, is uniquely equipped to record and reveal physical reality and, hence, gravitates toward it” (1960, p. 32).

Ironically, it was Christian Metz who wrote perhaps the most eloquent homage to the role of motion in creating the impression of reality.

It is movement (one of the greatest differences, doubtless the greatest, between still photography and the movies) that produces the strong impression of reality. . . . Because movement is never material but is always visual, to reproduce its appearance is to duplicate its reality. (1974, pp. 7, 9)

Ironic, because it was also Metz who with Film Language introduced into film theory a linguistic approach that would lead film studies first into semiotics and then into psychoanalysis and Marxism and beyond, leaving concerns with realism far behind.


By the time digital manipulation of the image became commercially feasible in the 1990s, the relevance of realism to motion pictures was judged by most film scholars to be minimal. This dismissal was defended on both ideological and technical grounds. One writer summed up the first position rather succinctly by offering that rather than reality, “[w]hat the camera in fact grasps is the ‘natural’ world of the dominant ideology” (Nichols, 1991, p. 214). The second argument was that because realist theories had been based on photography, they were invalid for digital images. Thus, the digitization of moving images seemingly left the classical realists with no place to stand. The entire issue of realism might have been put to rest right there had it not been brought to the fore again by the practical requirements of everyday moviemaking. In the 1990s, it became more efficient to create certain film scenes by means of computer animation than by the older photographic techniques. Effects such as the absence of Lieutenant Dan’s legs in Forrest Gump (1994), the presence of a tornado in Twister (1996), the running dinosaurs in Jurassic Park (1993), Tom Cruise being hurled from a helicopter in a tunnel to land on the rear of a passenger train in Mission: Impossible (1996), or all those people scrambling for footing on the deck of a ship that sank eighty-some-odd years ago in Titanic (1997) come from some of the first of the many movies to benefit greatly from computer animation. As these movies illustrate, and as Metz himself observed, “The feeling of credibility, which is so direct, operates on us in films of the unusual and of the marvelous, as well as in those that are ‘realistic.’ Fantastic art is fantastic only as it convinces (otherwise it is merely ridiculous)” (1974, p. 5). Few major filmmakers are creating composited or computer-generated images that are not intended to look realistic. 
To the contrary, they spend a lot of money and employ a lot of highly skilled artists and technicians to generate images that look real. In the last few years, the art of animation has become very sophisticated indeed. In “Creating Realistic Motion,” Jessica K. Hodgins et al. share their considerable knowledge of motion simulation. They have created images of glass shattering and buildings exploding, but even more fascinating is their work on human-motion simulation. Keyframing, motion capture, and simulation are techniques for generating motion, and the latter, they point out, has the advantage of “generalization and interactivity. Simulated motion can easily be computed to produce similar but different motions while maintaining physical realism . . . Real-time simulations also allow the motion of an animated character to be truly interactive, an important property for virtual environments and video games.” They also note that a “disadvantage of simulation as a source of human motion is the expertise required to develop control systems for new behaviors or to adapt existing behaviors to new characters.” Their work is that of continually creating and then evaluating animated sequences, very much like the process E. H. Gombrich calls “making and matching.” Here the gold standard of realism is the photographed image, the filmed record of actual humans in motion, and the ultimate test of effective realism, or the “grand challenge” as Hodgins et al. call it, is a Turing test. In other words, if viewers were presented with a sequence of simulated human motion along with a sequence of photographed human motion and asked to choose the most realistic one, and if they chose one as often as the other, then the simulated sequence could be said to be as realistic as the photographed one. Then, of course, one is led to ask how realistic is the photographed image? Have “the salient elements of the motion” remained? In any event, this is a difficult test to pass. Hodgins et al. 
note that the goal of most animation is storytelling, and they ponder the possibility that perhaps the expressive demands of storytelling may


sometimes require selective violation of “the laws of physics and the biomechanical principles of how people move.” Just how difficult it is to create realistic human figures in motion is explored by Joseph D. Anderson and Jessica K. Hodgins in “Perceiving Human Motion in Synthesized Images.” They start from the premise that special-effects artists are often required to create animated portions of scenes that can be intercut with photographed portions of the same scenes. The problem is that while animation can create information that is very similar to photographic information, there is also information that is dissimilar, information that informs viewers that the picture is constructed rather than photographed. The more work animation artists do, the more they risk leaving tool marks of their trade. What becomes apparent is that the realism that belongs to photography by natural affinity is the hard-won prize of computer animation.

References

Bazin, André. (1971). What is cinema? Vol. 1. (Hugh Gray, Trans.). Berkeley: University of California Press.
Kracauer, Siegfried. (1960). Theory of film: The redemption of physical reality. London: Oxford University Press.
Metz, Christian. (1974). Film language: A semiotics of the cinema (Michael Taylor, Trans.). New York: Oxford University Press.
Nichols, Bill. (1991). Representing Reality. Bloomington: Indiana University Press.

3 Creating Realistic Motion

Jessica K. Hodgins, James F. O’Brien, Nancy S. Pollard, Robert Sumner, Wayne L. Wooten, Gary Yngve, and Victor Zordan

People are skilled at perceiving the subtle details of human motion. We can, for example, often identify friends by the style of their walk when they are too far away to be recognizable otherwise. As a result of this skill, we have high standards for the motion of virtual human actors. If synthesized human motion is to be compelling, the virtual actors must appear realistic and move in a natural fashion. Synthetic human motion is needed for such entertainment applications as computer animation, virtual environments, and video games. We would like to be able to create a Toy Story that starred kids as well as their toys, sports training environments where virtual competitors motivate aspiring sports stars to become better athletes, and video games with appealing and interactive characters. The ability to simulate human motion also has significant scientific applications in full body ergonomics, gait analysis of individuals, and physical rehabilitation. The task of specifying the motion of an animated object to the computer is surprisingly difficult. Even animating a simple object like a bouncing ball can be challenging, in part because people can quickly pick out motion that is unnatural or implausible without necessarily knowing exactly what is wrong. Animation is also time consuming because very subtle details of the motion must be specified in order to convey the personality of a character or the mood of an animation. A number of techniques have been developed for computer animation, but all the available tools involve a trade-off between automation and control. The techniques can be classified into three basic groups: keyframing, motion capture, and simulation. Keyframing allows a fine level of control but does little to automatically ensure the naturalness of the result. 
Motion capture and simulation generate motion in a fairly automatic fashion but offer little control over the fine details of the motion. Keyframing. Borrowing its name from the traditional hand-animation technique, keyframing requires that the animator specify key positions for the objects being animated. The computer then smoothly interpolates to determine the positions for the in-between frames. The characters of Toy Story were animated in this fashion with over seven hundred separate controls for the subtle motions of each main character. The specification of keyframes can be made easier with techniques that aid in the placement of articulated models. For example, if the hand of an animated character must be in a particular location, inverse kinematics allows the computer to calculate appropriate elbow and shoulder angles. While these techniques make animation easier, keyframing still requires that the animator have a detailed understanding of how the animated object should behave over time and have the talent to express that behavior through keyframed configurations. The continued popularity of keyframing comes from the control that it gives the animator over the subtle details of the motion. Motion capture. This technique for generating motion employs magnetic sensors or reflective markers to record the three-dimensional motion of a human performer. The recorded data is then played back through a graphical model to create the animation. Motion capture is a very popular technique because of the relative ease with which many human motions can be recorded. However, a number of problems prevent motion capture from being an ideal solution for all applications. First, accurately measuring the motion of the human body is tricky because sensors or markers attached to skin or clothing shift as the performer moves, creating errors in the recorded data. Furthermore, if the performer and the graphical actor have different dimensions, the animation may have noticeable flaws. For example, if the performer touches a table, the hands of the graphical actor might be suspended in the air or sunk into the model of the table in the animated scene. A number of researchers have addressed this problem by modifying the recorded motion to constrain the hand to touch the table or the feet not to slide on the floor via optimization (Gleicher, 1997; Gleicher, 1998; Lee & Shin, 1999). The technology used for motion capture also makes it difficult to capture some motions. One sensor technology is magnetic, and the performer is restricted to a small workspace that must be free of metal objects. A second technology is optical and measures the location of small reflective markers. Optical systems have a larger field of view but require more processing of the data to extract the information necessary for animation. 
In spite of these difficulties, much of the motion used in commercial animation is generated by using captured data and tweaking the results either by hand or through optimization algorithms to match the size and desired behavior of the graphical character. Simulation. This technique makes use of the laws of physics to generate motion by integrating the equations of motion forward in time. For example, a dynamically simulated human figure is often modeled as a hierarchy of rigid bodies while clothing is modeled as a mass-spring system. Computing the motion of a human figure using dynamic simulation requires a rigid body model of the character with biomechanically realistic parameters (fig. 3.1), the equations of motion for the model, and control algorithms that cause the figure to perform the desired action. For example, a control system for running must contain equations that compute joint torques to swing the leg forward before touchdown and prevent the runner from tripping. We have explored techniques for combining control systems with physically realistic models of humans for such dynamic tasks as running, diving, bicycling, and performing gymnastic vaults and flips (Hodgins, Brogan, & O’Brien, 1995; Wooten & Hodgins, 2000; Hodgins, 2000). Although the behaviors are very different in character, the control algorithms are built from a common toolbox: State machines are used to enforce a correspondence between the phase of the behavior and the active control laws, synergies are used to cause several degrees of freedom to act with a single purpose, limbs without required actions in a particular state are used to reduce disturbances to the system, and the low-level control is performed with proportional-derivative servos. We now illustrate these principles with several examples. Running is a cyclic behavior in which the legs swing fore and aft and provide support for the body in alternation. 
Because the legs perform different functions during the phases of the locomotion cycle, the muscles are used for different control actions at different times in the cycle. For example, when the foot of the simulated runner is on the ground, the ankle, knee, and hip provide support and balance, but during the flight


Fig. 3.1. The controlled degrees of freedom of male and female human models. The female gymnast has fifteen body segments and a total of thirty controlled degrees of freedom. The male runner has seventeen body segments and thirty controlled degrees of freedom.

Fig. 3.2. A state machine is used to determine the control actions that should be active for running given the current state of the system. At lift-off, the legs switch roles, and the state machine repeats. The states correspond to the points of contact on the ground: flight, heel contact, heel and toe contact, and toe contact.

phase, the hip is used to swing the leg forward in preparation for the next touchdown. These distinct phases and the corresponding changes in control actions make a state machine a natural tool for structuring the control algorithm (fig. 3.2). Associated with each state are control actions that compute desired values for all the joints of the simulated human. Often, several limbs are used in a synergistic fashion. For example, in the gymnastics sequence shown in figure 3.3, the kinematics of the arm are used to compute the shoulder, elbow, and wrist angles that will put the vaulter’s hands


on the horse. Where possible, these control actions use the passive behavior of the system to achieve the desired effect. For example, the runner’s knee acts as a passive spring during the first part of stance, compressing to store energy and then extending again. Similarly, the vaulter’s shoulder joints are free when her hands are in contact with the horse, allowing her body to swing up and over the horse.
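The state machine of figure 3.2 can be sketched in a few lines of Python. This is only an illustration and not the authors' implementation: the class and method names, the boolean contact flags, and the short action descriptions are assumptions made for the example. Only the four states, the pairing of states with control actions, and the role switch at lift-off come from the text.

```python
from enum import Enum, auto


class Phase(Enum):
    """Points of contact of the stance foot, as listed for figure 3.2."""
    FLIGHT = auto()
    HEEL_CONTACT = auto()
    HEEL_AND_TOE_CONTACT = auto()
    TOE_CONTACT = auto()


# Hypothetical one-line summaries of the control actions tied to each state.
ACTIONS = {
    Phase.FLIGHT: "hip swings the leg forward for the next touchdown",
    Phase.HEEL_CONTACT: "ankle, knee, and hip provide support and balance",
    Phase.HEEL_AND_TOE_CONTACT: "ankle, knee, and hip provide support and balance",
    Phase.TOE_CONTACT: "ankle, knee, and hip provide support and balance",
}


class RunningStateMachine:
    """Tracks the phase of the running cycle and which leg is the stance leg."""

    def __init__(self, stance_leg="left"):
        self.phase = Phase.FLIGHT
        self.stance_leg = stance_leg

    def step(self, heel_down, toe_down):
        """Advance the state from the stance foot's contact flags."""
        if self.phase is Phase.FLIGHT and heel_down:
            self.phase = Phase.HEEL_CONTACT
        elif self.phase is Phase.HEEL_CONTACT and toe_down:
            self.phase = Phase.HEEL_AND_TOE_CONTACT
        elif self.phase is Phase.HEEL_AND_TOE_CONTACT and not heel_down:
            self.phase = Phase.TOE_CONTACT
        elif self.phase is Phase.TOE_CONTACT and not toe_down:
            # Lift-off: the legs switch roles and the cycle repeats.
            self.phase = Phase.FLIGHT
            self.stance_leg = "right" if self.stance_leg == "left" else "left"
        return self.phase

    def active_action(self):
        """Return the summary of the control action for the current state."""
        return ACTIONS[self.phase]
```

Keeping the actions in a table keyed by state mirrors the idea that the phase of the behavior, and nothing else, decides which control laws are active.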

Fig. 3.3. A comparison of real and simulated runners and vaulters.

In figure 3.1, the directions of the arrows indicate the positive direction of rotation for each degree of freedom. Limbs without specific duties in a particular state are used to reduce disturbances to the system. For example, the runner’s arms swing in opposition to the legs to reduce the yawing of the body. The control laws compute desired values for each joint, and proportional-derivative servos are used to move the joints towards their desired values:

$$\tau = k(\theta_d - \theta) - k_v\dot{\theta}$$

where $\tau$ is the joint torque, $\theta_d$ is the desired joint angle, $\theta$ is the actual joint angle, $\dot{\theta}$ is the joint velocity, and $k$ and $k_v$ are gains for the stiffness and damping of the proportional-derivative servo. This low-level servo serves as a very simple muscle model by controlling the stiffness of each joint. These control systems are designed by hand and require significant understanding of the physics behind the motion. In recent years, the field has seen the development of a number of techniques for automatically generating motion for new behaviors and new creatures. Several researchers have treated the problem of generating motion as an optimization problem (Auslander et al., 1995; Liu, Gortler, & Cohen, 1994; Popovic & Witkin, 1999; van de Panne, Fiume, & Vranesic, 1990; Witkin & Kass, 1988). While automatic techniques would be preferable to hand design, automatic techniques have not yet been developed that can find solutions for systems with the number of controlled degrees of freedom needed for a plausible model of the human body. Although developing a new simulation is not easy, once it has been designed, an animator may use it without a detailed understanding of the underlying algorithms. A simulation provides only high-level control over the motion (bicycle along this path at 9 m/s, for example) and does not allow the animator to easily control subtle details of the motion. 
This problem could be addressed by using simulation to generate the gross motions in an automatic fashion and using keyframing or motion capture for finer motions such as facial expressions or subtle gestures.
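The proportional-derivative servo described above can be sketched for a single joint. This is a minimal illustration under stated assumptions, not the authors' simulator: the joint is a lone rotating body with a made-up moment of inertia, there is no gravity or ground contact, and the gains, time step, and function names are arbitrary choices for the example. It also shows, in the simplest possible form, what integrating the equations of motion forward in time means.

```python
def pd_torque(theta_desired, theta, theta_dot, k, kv):
    """Joint torque from the servo law: stiffness term minus damping term."""
    return k * (theta_desired - theta) - kv * theta_dot


def simulate_joint(theta_desired, theta0=0.0, inertia=1.0,
                   k=100.0, kv=20.0, dt=0.001, steps=5000):
    """Drive one joint toward theta_desired and return (angle, velocity).

    All default values are illustrative assumptions. The loop uses
    semi-implicit Euler integration: velocity is updated from the
    acceleration first, then the angle is updated from the new velocity.
    """
    theta, theta_dot = theta0, 0.0
    for _ in range(steps):
        tau = pd_torque(theta_desired, theta, theta_dot, k, kv)
        theta_dot += (tau / inertia) * dt  # angular acceleration = torque / inertia
        theta += theta_dot * dt
    return theta, theta_dot


if __name__ == "__main__":
    angle, velocity = simulate_joint(theta_desired=1.0)
    print(f"final angle: {angle:.6f} rad, final velocity: {velocity:.6f} rad/s")
```

Raising `k` makes the joint stiffer and quicker to reach its target, while raising `kv` suppresses overshoot, which is the sense in which the servo acts as a very simple muscle model.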


As a source of human motion, simulation has two potential advantages over keyframing and motion capture: generalization and interactivity. Simulated motion can easily be computed to produce similar but different motions while maintaining physical realism (running at 4 m/s rather than 5 m/s, for example). Real-time simulations also allow the motion of an animated character to be truly interactive, an important property for virtual environments and video games in which the actor must respond to changes in the environment and the actions of the user. In contrast, motion capture and keyframing require the use of a precomputed library for such applications perhaps with the addition of real-time adaptations to the scene. One disadvantage of simulation as a source of human motion is the expertise required to develop control systems for new behaviors or to adapt existing behaviors to new characters. We can begin to address these problems by developing a library of behaviors that can be adapted and combined to form new behaviors. For example, we have constructed a set of four parameterized basis behaviors (leaping, tumbling, landing, and balancing) and then combined those behaviors together to form a variety of more complex maneuvers such as flips and dives (fig. 3.4) (Wooten & Hodgins, 2000). Adapting behaviors to new characters is not easy because control systems are tuned for the dynamic properties of a particular model. In general, a system developed for an

Fig. 3.4. Two platform dives.


adult will not work for a child, for example. However, optimization techniques can be used to adapt an existing control system to a new character with significantly different physical properties. For example, we have generalized a control system for a running man to a woman and a three-year-old child (Hodgins & Pollard, 1997). Figure 3.5 shows a comparison of the simulated child’s motion with that of a real child of similar size. Although it is not apparent in the still images, the real child’s motion includes much more variability.

Fig. 3.5. A comparison between a real child and a simulated child running.

Behaviors such as diving, vaulting, and running are dynamic activities where the physics provides many constraints on the motion. For example, angular momentum is conserved during flight, and this physical law limits the ways in which a character in flight can move. These athletic behaviors are also near the limit of human performance. As a result of these constraints, there is less stylistic variation in the way that a good gymnast performs a difficult vault than, for example, in the way that casual athletes run or walk. Because there is less possibility for stylistic variation, these behaviors are easier to program so that the character accomplishes the task in a way that looks natural. This observation has implications for the simulation of simple human motions where style plays a greater role, such as reaching, gesturing, and fidgeting. When the gross characteristics of the motion are not constrained by the dynamics of the system, the task can be completed successfully but in a way that appears unnatural. Additional rules based on observations and measurements of human behavior will be required before it will be easy to program control algorithms for these behaviors.

We have begun to explore ways in which motion capture and simulation can be effectively combined (Zordan & Hodgins, 1999; Hodgins & Popovic, 2000). One example is a simulated table-tennis player whose swings are driven by data interpolated from a set of captured swings (fig. 3.6).

Simulation offers the opportunity to make the animated motion more realistic through secondary, passive elements that move in response to the primary motion of the main actor. Traditional animation contains many examples of secondary motion, and it is thought to be essential for appealing and natural-looking animated characters. We have coupled passive simulations of objects in the environment with active, rigid-body simulations of humans to compute secondary motion for animated figures: a gymnast on a trampoline, a bungee jumper, a gymnast vaulting onto a mat, a runner wearing a T-shirt and sweatpants, a girl wearing a skirt, a basketball going through the hoop (fig. 3.7), and a runner making footprints in the sand (O’Brien, Zordan, & Hodgins, 2000).

58 / JESSICA K. HODGINS

Fig. 3.6. A comparison of human motion and simulated motion for a table-tennis swing. The actor’s motion data was collected at the same time as the live footage and then subsequently used to drive a control system for the upper body of the simulated player. A hand-designed control system for the lower body maintains the balance of the simulated character.

Fig. 3.7. A comparison of real and simulated basketball nets.

Other natural phenomena are also amenable to simulation. We have used a finite-element model to simulate the fractures that result when an object is dropped onto a hard floor (fig. 3.8) (O’Brien & Hodgins, 1999). Similarly, a computational fluid dynamics model can be used to simulate explosions in a modeled city (fig. 3.9) (Yngve, O’Brien, & Hodgins, 2000). In the case of a simulated explosion, the basic physics of the pressure wave provide only some of the information that is required to render the scene in a realistic fashion. We also needed models of fire, smoke, and dust clouds, the physics and rendering parameters of which are far less well understood (fig. 3.9).

Traditional tests such as comparison of real and simulated video footage can also be used to assess the realism of the motion. For example, figure 3.3 shows comparisons between human and simulated runners and vaulters. Biomechanical data for ground reaction forces, flight times, velocity, and step length can also be used to measure how closely the simulated motion resembles human motion.
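As a rough illustration of the biomechanical comparison just described, the following Python sketch computes a signed relative difference between a simulated and a human value for each measure. The function names, the dictionary keys, and all numeric values are hypothetical placeholders chosen for the example; they are not measurements or methods reported in this chapter.

```python
def relative_differences(human, simulated):
    """Signed relative difference, (simulated - human) / human, per shared measure."""
    diffs = {}
    for name, human_value in human.items():
        if name not in simulated:
            continue  # compare only measures present in both data sets
        if human_value == 0:
            raise ValueError(f"cannot normalize {name!r}: human value is zero")
        diffs[name] = (simulated[name] - human_value) / human_value
    return diffs


def largest_mismatch(diffs):
    """Name of the measure whose relative difference has the largest magnitude."""
    return max(diffs, key=lambda name: abs(diffs[name]))


# Hypothetical placeholder values, NOT measurements from the chapter.
human_runner = {"flight_time_s": 0.10, "step_length_m": 1.50, "velocity_m_s": 4.0}
simulated_runner = {"flight_time_s": 0.12, "step_length_m": 1.35, "velocity_m_s": 4.0}

diffs = relative_differences(human_runner, simulated_runner)
for name, value in diffs.items():
    print(f"{name}: {value:+.1%}")
print("largest mismatch:", largest_mismatch(diffs))
```

A per-measure relative difference keeps quantities with different units comparable, which is one simple way to see where a simulated gait departs most from the recorded one.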

CREATING REALISTIC MOTION / 59

Fig. 3.8. A comparison of real and simulated bowls as they are dropped onto a hard surface.

Fig. 3.9. A simulated explosion as it turns a corner in a city model.

The grand challenge in simulating human motion is a Turing test. Is it possible to create motion sequences where the results are as natural as actual human motion, at least when played back through the same graphical model? Our preliminary experiments indicate that the answer to this question is probably “not yet.” Constructing an appropriate test is not easy because we are not yet able to compute or render the motion of skin, muscles, and hair in a way that is sufficiently accurate to allow a side-by-side comparison with video footage. We could, of course, use motion-capture data rendered in the same style as our simulated motion, but then we would merely be asking whether our motion was as good as motion capture, not whether it was as good as human motion. Motion capture itself has characteristic flaws which the test subjects might well be able to detect. Alternatively, we could distill both the filmed and simulated motion via image-processing techniques (hand digitization or filtering processes). This experiment, however, would raise the question of whether the salient elements of the motion remained. Preliminary results make it appear likely that the rendering style affects a subject’s ability to make fine discriminations about the motion (Hodgins, O’Brien, & Tumblin, 1998).

With each of these experiments, our goal has been to create motion with as much visual realism as possible. This goal is appropriate for some applications, such as training, for which realism determines the effectiveness of the system. However, entertainment rather than realism is the goal of most animated motion. We hope that by understanding realism first, we will later find it easier to selectively violate the laws of physics and the biomechanical principles of how people move in order to tell a story or evoke a particular emotion effectively.
Note
The research described in this paper was performed while the authors were at the College of Computing and Graphics, Visualization, and Usability Center at Georgia Institute of Technology. Jessica Hodgins is currently with the School of Computer Science at Carnegie Mellon University. James F. O’Brien is currently with the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley; Nancy S. Pollard is with the School of Computer Science at Carnegie Mellon University; Robert Sumner is with CSAIL at the Massachusetts Institute of Technology; Wayne L. Wooten is with Pixar Animation Studios; Gary Yngve is with the Department of Computer Science and Engineering at the University of Washington; and Victor Zordan is with the Department of Computer Science, University of California, Riverside.

References
Auslander, J., Fukunaga, A., Partovi, H., Christensen, J., Hsu, L., Reiss, P., Shuman, A., Marks, J., & Thomas, J. N. (1995). Further experiences with controller-based automatic motion synthesis for articulated figures. ACM Transactions on Graphics, 14(4), 311–36.
Gleicher, M. (1997, Apr.). Motion editing with spacetime constraints. 1997 Symposium on Interactive 3D Graphics (pp. 139–48).
Gleicher, M. (1998, July). Retargeting motion to new characters. Proceedings of SIGGRAPH 98 (pp. 33–42).
Hodgins, J. K. (2000). Simulation of human running. IEEE International Conference on Robotics and Automation.
Hodgins, J. K., O’Brien, J. F., & Tumblin, J. (1998, Oct.–Dec.). Perception of human motion with different geometric models. IEEE Transactions on Visualization and Computer Graphics, 4(4), 307–16.
Hodgins, J. K., & Pollard, N. S. (1997, Aug.). Adapting simulated behaviors for new characters. Proceedings of SIGGRAPH 97 (pp. 153–62).
Hodgins, J. K., & Popovic, Z. (2000, July). Animating humans by combining simulation and motion capture. Course Notes 33, SIGGRAPH 2000.
Hodgins, J. K., Wooten, W. L., Brogan, D. C., & O’Brien, J. F. (1995, Aug.). Animating human athletics. Proceedings of SIGGRAPH 95 (pp. 71–78).
Lee, J., & Shin, S. Y. (1999, Aug.). A hierarchical approach to interactive motion editing for human-like figures. Proceedings of SIGGRAPH 99 (pp. 39–48).
Liu, Z., Gortler, S. J., & Cohen, M. F. (1994, July). Hierarchical spacetime control. Proceedings of SIGGRAPH 94 (pp. 35–42).
O’Brien, J. F., & Hodgins, J. K. (1999, Aug.). Graphical modeling and animation of brittle fracture. Proceedings of SIGGRAPH 99 (pp. 137–46).
O’Brien, J. F., Zordan, V. B., & Hodgins, J. K. (2000). Combining active and passive simulations for secondary motion. IEEE Computer Graphics & Applications, 20(4), 86–90.
Popovic, Z., & Witkin, A. (1999, Aug.). Physically based motion transformation. Proceedings of SIGGRAPH 99 (pp. 11–20).
Sumner, R., O’Brien, J. F., & Hodgins, J. K. (1999). Animating sand, mud, and snow. Computer Graphics Forum, 18(1), 17–26.
van de Panne, M., Fiume, D., & Vranesic, Z. (1990). Reusable motion synthesis using state-space controllers. Computer Graphics (Proceedings of SIGGRAPH 90), 24(4), 225–34.
Witkin, A., & Kass, M. (1988). Spacetime constraints. Computer Graphics (Proceedings of SIGGRAPH 88), 22(4), 159–68.
Wooten, W. L., & Hodgins, J. K. (1996). Animation of human diving. Computer Graphics Forum, 15(1), 3–14.
Wooten, W. L., & Hodgins, J. K. (2000). Simulating leaping, tumbling, landing, and balancing humans. IEEE International Conference on Robotics and Automation (pp. 656–62).
Yngve, G. D., O’Brien, J. F., & Hodgins, J. K. (2000, July). Animating explosions. Proceedings of SIGGRAPH 2000 (pp. 29–36).
Zordan, V. B., & Hodgins, J. K. (1999, Sept.). Tracking and modifying upper-body human motion data with dynamic simulation. Computer Animation and Simulation ’99 (pp. 13–22).

4 Perceiving Human Motion in Synthesized Images
Joseph D. Anderson and Jessica K. Hodgins

The goal of special-effects animators is to create on the computer portions of an event that can be intercut or composited with other portions of the event that were shot in live action, such that the computer-generated portions of the event are indistinguishable from the live-action portions. To achieve this goal is no small accomplishment, yet most of the feature films we see contain computer animation, and a good portion of that work passes our notice. The artists and technicians who work to create special effects have gotten very good at their jobs, yet they generally proceed by way of “making and matching” (see Gombrich, 1960, p. 29), and our theoretical understanding lags considerably behind their accomplishment.

If the problem is to match computer-generated images with photographed images, then the most obvious question is, what are the differences between the two images? One rudimentary difference is that there are more grains of silver halide in the picture area of the film than there are dots in the matrix of a standard computer image, giving greater picture resolution and a greater range in gray scale to the photograph. But perhaps much more important are those differences that accrue to the two types of images because of the way they are made. The computer-generated image is constructed; the film image is recorded. The question becomes: What are the differences between a constructed image and a recorded image?

To achieve a computer image, objects and events are often modeled mathematically. The resulting image can be realistic because the world we know seems to follow physical laws and therefore to lend itself to mathematical description.
In theory, the relationship between an object in the world and an object on the screen could be just as exact if created by mathematics as if created by the mechanical action of light rays—if it were not for the problem of specificity. When an event occurs in the world, the event itself is unique, and it produces a disturbance in the optic array that is also unique. However, the disturbance in the array is not a picture, not a “copy” of the event. The particular relationship that exists between the event and the corresponding disturbance in the optic array is one of specificity. The information in the optic array specifies the event. As James J. Gibson noted, “There are different kinds of disturbances for different kinds of events” (1979, p. 102). It is this problem of specificity with which the animator must deal. For example, when a camera records the explosion of an automobile, a common event in the movies, the explosion occurs according to the laws of physics, the gases expand and combust in irregular patterns, and the car separates into millions of fragments of unpredictable shape and size that move on complex unpredictable paths. The camera records every detail of the event that can be seen from its point of view, and when the film is later projected on a screen, the viewer finds authenticity in the consistency of

the whole and in the complexity of details. While the computer artist may correctly formulate the physics of the general explosion, for example, the expansion of gases and the hurling of fragments from a central point, there are no models to compute and render the irregular movement of every minute progression of flame and smoke or the size and shape of every fragment of exploded automobile; one must settle for a mathematical approximation; and when it is later projected upon a screen, the viewer can often see that this synthetic construction is a departure from reality. Given the differences between photographed and computer-generated images, how is it possible for an animation artist to create images of an event that can be intercut with photographed portions of the same event such that they are indistinguishable to the viewer? Apart from tricks of the trade such as casting the computer-generated images in shadow or creating a diversion in the audio track or elsewhere in the frame to distract the viewer, the answer has to be in accurately modeling the properties of objects and their motion. It is the motion of a motion picture that distinguishes it from photography and gives it its special capacity to render the living. For, as Gibson observed, “The eye developed to register change and transformation. The retinal image is seldom an arrested image of life. Accordingly, we ought to treat the motion picture as the basic form of depiction and the painting or photograph as a special form of it. . . . Moviemakers are closer to life than picture makers” (1979, p. 182). Cinematographers therefore have only to capture motion from life; animators must derive algorithms for the motions of life and construct representations from them. Early on, Gunnar Johansson, Lynn T. Kozlowski, James E. 
Cutting, and others realized that they could derive mathematical formulae for the movement of the white dots on a black screen that resulted from photographing or videotaping walkers or runners with point-light sources (Johansson, 1973; Cutting & Kozlowski, 1977; Kozlowski & Cutting, 1977, 1978). Using this recorded data to move the dots, they and subsequent researchers were free to synthesize human motion. But how realistic do these synthetic motions appear to viewers? An interesting problem arose when animators attempted to sustain the synthetic motion of human walking or running for more than a few seconds on the screen. The motion of the synthetic figures was apparently too regular. Viewers reported that the running figures looked stiff or unnatural. Bobby Bodenheimer, Anna Shleyfman, and Jessica K. Hodgins tested the possibility of simulating the small irregularities that occur when humans move, which are, of course, captured by a movie camera. They introduced random noise (that is, slight random changes) into their equations for the motions of their figures. The result was that viewers found the motion of running figures with some irregularity to be slightly more natural than those without irregularity (Bodenheimer, Shleyfman, & Hodgins, 1999). Hodgins, James O’Brien, and Jack Tumblin (1998) set out to solve another practical problem, which turns out to have major theoretical implications. When animators begin a project, they often start by sketching out some of the required motions using a simple form such as a traditional stick figure. The practical question that arises is whether the motions initially worked out with a simple figure appear the same to viewers when the figures have been fully developed. As Hodgins et al. 
posed the question, “Does the geometric model used to render an animation affect a viewer’s judgment of the motion, or can a viewer make accurate judgments independent of the geometric model?” The test they devised consisted of forty paired presentations. Twenty of the pairs were stick figures, and twenty were dressed human-like polygonal figures.

Viewers were asked simply to say whether the motion was the same or different for the two figures in the pair. The motion was in fact varied in different amounts between the figures in each pair. The overall result was that viewers were better at recognizing small differences in motion for the polygonal figures than for the stick figures.

Initially the researchers posed three possible explanations for three possible results:

1. If viewers are able to make finer motion discriminations for the stick figures, the explanation might be that “[s]impler models may be easier to comprehend than more complex ones allowing the viewer’s attention to focus more completely on the details of movement rather than the details of the model.”
2. If viewers can make more accurate discriminations for the polygonal figures, it is perhaps because “People have far more experience judging the position and movement of actual human shapes than they do judging abstract representations such as stick figures.”
3. If viewers can detect the differences in motion equally between the two different types of figures, then perhaps it is that “[t]he human visual system may use a displayed image only to maintain the positions of a three-dimensional mental representation. Judgments about the motion may be made from this mental representation rather than directly from the viewed image.” (p. 307)

After collecting and analyzing the data, the researchers concluded that they could not confirm either possibility 1 or 2. They did however conclude that their work “provides experimental evidence to disprove possibility 3 by showing that viewer sensitivities to variations in motion are significantly different for the stick figure model and the polygonal model” (p. 308). Viewers were much better at detecting differences in motion for the fully developed polygonal figure than the stick figure.
Recall that if viewers had been able to detect the differences in motion equally between the two different types of figures, the theoretical explanation would have been that “[t]he human visual system may use a displayed image only to maintain the positions of a three-dimensional mental representation. Judgments about the motion may be made from this mental representation rather than directly from the viewed image.” Now, the fact that differences were found in viewer detection of changes in motion with regard to the two types of figures may not definitively disprove the part of the theory that assumes the construction of mental representations, but these findings cast doubt upon such a theory. There are other possibilities. What if constructing mental representations and using them to make judgments about motion is not the way the perceptual system works at all? What if, instead, through some combination of innate capacity and experience, the perceptual system is capable of directly recognizing human motion even if it is minimally specified as the research by Johansson, Kozlowski, and Cutting suggests? With a complex figure that contains more information (such as the polygonal figure used by Hodgins, Wooten, Brogan, & O’Brien, 1995), might it not be that some of the additional information in the complex figure stands out as inconsistent with human motion? And might this additional information not amplify changes in the motion of the figure that may also have been recognized as inconsistent with natural human motion?
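One conventional way to summarize a same/different task like the one discussed above is a signal-detection sensitivity index, d′. The sketch below is only an illustration of that general technique: it treats the task as a simple yes/no detection problem, which is a simplifying assumption, and it is not presented as the analysis used by Hodgins, O’Brien, and Tumblin. The function name and all response counts are hypothetical.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    A "hit" is answering "different" when the two motions differed; a
    "false alarm" is answering "different" when they were the same.
    Adding 0.5 to each count (a log-linear correction) keeps the rates
    away from 0 and 1, where the z-transform is undefined.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical response counts for one viewer, NOT data from the study.
stick = d_prime(hits=12, misses=8, false_alarms=5, correct_rejections=15)
polygonal = d_prime(hits=16, misses=4, false_alarms=5, correct_rejections=15)
print(f"stick d' = {stick:.2f}, polygonal d' = {polygonal:.2f}")
```

A larger d′ for one rendering style than for another would correspond to the kind of difference in viewer sensitivity that the study reports between the polygonal and stick-figure models.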

To test these notions, we designed the present study in which we showed viewers three running figures: a figure made up of dots, a stick figure, and a polygonal figure (fig. 4.1).

Fig. 4.1. Dot, stick, and full figures.

We told viewers that the running motions for all the figures were mathematically modeled and asked them to order the figures from the most like human motion to the least like human motion. (In the previous experiment by Hodgins, O’Brien, and Tumblin (1998), viewers were asked only to say whether the motion in the paired figures was the same or different. They were not asked to say which motions were the more natural.) We expected that if viewer perception involves the construction of representations, then the motion of the polygonal figure that has the most fully realized detail would be ranked as the most like human motion, the stick figure would be second, and the dot figure with the least information would be last. If, on the other hand, viewers directly recognize human motion even if minimally specified, and any nonhuman information is recognized as such, then viewers would choose the dot figure as most like human motion, because it presents the minimal information needed to specify human running but little information to the contrary. The stick figure would be next, and the motion of the full polygonal figure would be last, because it contains the most information overall, but much of that information is likely inconsistent with real human running.

We presented the videotape containing the figures to 105 viewers who were mostly engineering and business majors in undergraduate speech classes. The images were projected with a video projector onto a fifteen-foot screen in total darkness. The same algorithm was employed for all three figures so that their motion would be exactly the same. (For further detail of how the motion was generated, see Hodgins, Wooten, Brogan, & O’Brien, 1995.) Each figure was presented for ten seconds with five seconds of darkness between each presentation. The figures were presented three times in different order.
Viewers were given an answer sheet that contained three icons of the dot, stick, and full figure simulations with boxes below each icon. The sheet asked for self-report of age, gender, and race, and contained the following instructions: “You will see three computer-generated images of human running. Each image will appear three times. Watch the images, then answer these two questions: Which computer-generated image is closest to the motion of actual human running? Place a ‘1’ in the box under the appropriate icon. Which computer-generated image is the next closest to the motion of actual human running? Place a ‘2’ in the box under the appropriate icon.”

The results were that even though the motion was the same for all figures in all trials, the viewers consistently judged the motion of the figure composed of dots as closest to the motion of actual human running, with the stick figure next and the full polygonal figure least like actual human motion.
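To make the scoring of such answer sheets concrete, here is a minimal Python sketch that completes each ballot (the unmarked figure is taken to be ranked third) and computes a mean rank per figure. The function names and the four example ballots are hypothetical; they are not the study’s data, and the chapter does not say that mean ranks were the statistic actually used.

```python
from collections import defaultdict

FIGURES = ("dot", "stick", "full")


def complete_ballot(marks):
    """Turn an answer sheet into a full ranking.

    `marks` maps a figure name to the number written under its icon
    (1 or 2); the figure left blank is assumed to be ranked 3.
    """
    if set(marks) - set(FIGURES) or sorted(marks.values()) != [1, 2]:
        raise ValueError(f"invalid ballot: {marks!r}")
    ranks = dict(marks)
    for figure in FIGURES:
        ranks.setdefault(figure, 3)
    return ranks


def mean_ranks(ballots):
    """Average rank per figure; a lower value means judged closer to human running."""
    if not ballots:
        raise ValueError("no ballots to score")
    totals = defaultdict(int)
    for marks in ballots:
        for figure, rank in complete_ballot(marks).items():
            totals[figure] += rank
    return {figure: totals[figure] / len(ballots) for figure in FIGURES}


# Hypothetical ballots for illustration only, NOT responses from the 105 viewers.
example_ballots = [
    {"dot": 1, "stick": 2},
    {"dot": 1, "full": 2},
    {"stick": 1, "dot": 2},
    {"dot": 1, "stick": 2},
]
print(mean_ranks(example_ballots))
```

Because every completed ballot assigns the ranks 1, 2, and 3 exactly once, the three mean ranks always sum to 6, which gives a quick sanity check on the tally.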

So why does the motion of the dot figure look so much more like actual human motion than the stick and polygonal figures? There are at least a couple of reasons, we think.

First, there is not much information in a set of moving white dots compared to the information that might be there, yet it has been compellingly demonstrated that there is enough information in the motion of the dots to specify a human figure, the gender of that figure, and perhaps even his or her identity (Johansson, 1973; Cutting & Kozlowski, 1977; Kozlowski & Cutting, 1977, 1978). As viewers, we perceive the information that is present and seldom miss information that might have been there but is not, unless its absence is called to our attention. For example, an editor learns quickly that he is often putting together pieces of action that elide various amounts of time, small and large, and that usually it is possible to do this in a way that seems continuous to the audience. Where it is important for the audience to understand that there is a gap in time, the editor finds a way to make this apparent.

Second, while there is enough information in the array of moving dots to specify human motion, there is little information about anything else. There is little information to tell us that the figure was constructed using dynamic simulation and hand-tuned control algorithms. The stick figure and the polygonal figure give us comparatively much more information, but the additional information is not all about human motion; a good portion of it is about nonhuman shapes and surfaces that move in nonhuman ways.

Computer-generated animations, like painted animations, present to a viewer a synthetic optic array. The array of wavelengths and brightnesses created by the digital artist must be so close to the array recorded by the camera that the audience will not be able to discriminate between the two.
Directly recorded images are not mere symbols; they function more as surrogates for the real world. That is, we encounter the visual arrays of a motion picture as we encounter the visual arrays of the real world. We perceive them as we perceive natural arrays. We have no capacity to do otherwise. Subsequent to the experiencing of an event we are, of course, free to think about what we have seen in any way we like. We may consider it all symbolically if we are so inclined. But the power of the motion picture, the thing that sets it apart from mere symbolic communication, is that it engages our perceptual system directly, and we process the changing array of light before us as we process the natural world. Of course, we know that “it’s only a movie,” but such knowledge shuts down neither our senses nor our emotions. And the greater the realism, the greater the access to our senses.

Put another way, we encounter the motion picture as we encounter the world, and in both, the built-in “assumption” is that things are as they appear to be unless we gain information to the contrary. The “assumption” of realism is, of course, not an assumption at all. The reference is not to an individual’s expectations based on prior experience with the environment. The “assumption” resides at a very basic level; no inference or intellectualization occurs. So, to our perceptual system, things simply are as they appear to be unless we encounter information to the contrary.

The notion of specificity is relevant here, for although no picture of an event exists in the optic array, there is a precise relationship between the event in the world and the disturbance in (or transformation of) the optic array. This relationship is real and can be expressed mathematically. In perception of the natural world, this relationship can usually be taken for granted; that is, the transformation specifies that particular event.
Because that specificity can be taken for granted, we do not ever have to ask when viewing the natural world whether what we see is real,

nor are we inclined to ask this question when watching a movie, where we are presented with an optic array without the physical event that produced it. Motion pictures, or scenes in motion pictures, to the extent that they confirm our most basic perceptions of objects and events from moment to moment, seem real even though we may know that the diegetic world is one of fiction or fantasy. To the extent that they give us information that they are fabricated, crafted, or constructed, they undermine our basic “assumption” of realism and cause us to see them as the constructions that they are, rather than the objects and events they depict.

The dot figure specifies motion with minimal form. Perhaps, as Yves Guiard has suggested, it “captures the soul of human motion.” What filmmakers have known for a long time is confirmed: with regard to realism, more information is sometimes less. The dot figure is the limiting case; no elaboration can improve upon the appearance of the human motion it specifies.

References
Bodenheimer, B., Shleyfman, A., & Hodgins, J. K. (1999, Sept.). The effects of noise on the perception of animated human running. In N. Magnenat-Thalmann & D. Thalmann (Eds.), Computer Animation and Simulation ’99: Eurographics Animation Workshop (pp. 53–63). Wien: Springer-Verlag.
Cutting, J. E., & Kozlowski, L. T. (1977). Recognizing friends by their walk: Gait perception without familiarity cues. Bulletin of the Psychonomic Society, 9(5), 353–56.
Gibson, J. J. (1979). The Ecological Approach to Visual Perception. Boston: Houghton Mifflin.
Gombrich, E. H. (1960). Art and Illusion: A study in the psychology of pictorial representation. Princeton: Princeton University Press.
Guiard, Y. (2001, June 28). Statement made during open discussion following Open Topics Panel C, Eleventh International Conference on Perception and Action, University of Connecticut, Storrs, CT.
Hodgins, J. K., O’Brien, J. F., & Tumblin, J. (1998, Dec.). Perception of human motion with different geometric models. IEEE Transactions on Visualization and Computer Graphics, 4(4), 307–16.
Hodgins, J. K., Wooten, W. L., Brogan, D. C., & O’Brien, J. F. (1995, Aug.). Animating human athletics. Proceedings of SIGGRAPH 95 (pp. 71–78).
Johansson, G. (1973). Visual perception of biological motion and a model for its analysis. Perception and Psychophysics, 14(2), 201–11.
Kozlowski, L. T., & Cutting, J. E. (1977). Recognizing the sex of a walker from a dynamic point-light display. Perception and Psychophysics, 21(6), 575–80.
Kozlowski, L. T., & Cutting, J. E. (1978). Recognizing the gender of walkers from point-lights mounted on ankles: Some second thoughts. Perception and Psychophysics, 23(5), 459.

Part Three
Acoustic Events

A sound-effects editor for a motion picture begins his work by viewing a scene in which people are moving about on the screen, interacting with each other and with objects and structures in their environment. And usually the sounds of their footsteps, their opening and closing of doors, their pouring and drinking of liquids, as well as the roar of their engines, the squealing of their tires, and the blasts of their guns, have been recorded along with the picture. For a movie in which all the sound has been recorded synchronously with the picture, it would seem that the sound-effects editor would have little to do. But nothing could be farther from actual practice, for in contemporary motion pictures, most, if not all, of the incidental sounds are replaced or enhanced. This poses a major problem for our understanding of motion-picture realism, for it would seem self-evident that the sounds of the action recorded at the time of the action are the most realistic possible and that any replacement or enhancement could only result in less realism.

We recall editing a scene in which the protagonist is trapped in a blind alley and eludes his pursuers by climbing a fire escape up the side of a building to the rooftop. In actuality the steel stairway was very rigid and secured firmly to the building. The sounds made by the steel structure as his feet rapidly ascended the steps made it seem solid and safe. We listened to these sounds recorded at the same time as the picture and judged that they would have to be replaced. (This, by the way, is the judgment rendered by almost every sound-effects editor with regard to almost every production-track effect.) But why did we want to change the sounds? Why did they feel wrong? They were the sounds actually made by the climbing of the fire escape. Do they not qualify as realistic?

The sounds that we heard accompanying the actor climbing the fire escape were indeed realistic.
They accurately conveyed the information that the actor was climbing a well-constructed and well-maintained metal stairway and that he was in no danger. In fact, the scene had been rehearsed several times, and the actor with camera and sound crews in tow had each time made the ascent quite safely. But our job was to construct a fictional scene in which the character was in great jeopardy and the outcome of his effort to escape tinged with uncertainty. What was needed was a fire escape that sounded creaky, loosely attached to the supporting building, and dangerous to climb. We obtained some loosely fabricated pieces of metal and by trampling upon them achieved sounds that seemed much better. The foleyed sounds were in fact much better because they were more realistic, not for the profilmic event but for the fictional event. With the new sounds, the scene was no longer of an actor climbing a safe set of stairs but of the character fleeing for his life up a flight of open metal steps so precariously attached to the side of an old building that the outcome of his attempt was in grave doubt. These sounds were entirely appropriate and ultimately realistic for the fictional world of the movie. The insight to be gained is

that the realism of sound effects must be judged not in terms of the actual location or set but against the fictional world of the movie. Charles Eidsvik addresses this issue of differentiating between the sounds of the profilmic event and those of the fictional world of the film in his essay “Background Tracks in Recent Cinema.” And he does so from the point of view of one who creates sounds for motion pictures. He knows that the track must sound real, and what “sounding real” means is at the root of his exploration of the background track. He observes that “[s]ound teams use the functions of background sound in everyday life, but rebuild them to reinforce and give ‘presence’ to the film’s fictional world.” He enumerates the functions of background sound, provides a history of its development, and discusses its practical, ontological, and aesthetic implications. Eidsvik ends his exploration with a call for “collaboration or at least communication between filmmakers and academics” and optimistically observes that in the small community of audiophiles, such cooperation has begun. If, in the pursuit of realism, the sound artist must, and does in practice, contrive and synthesize an effects track that sounds real in the context of the fictional world of the movie, then the question remains as to how it is possible for contrived or synthesized sounds to sound real. This is the question that Claudia Carello, Jeffrey B. Wagman, and Michael T. Turvey address with great clarity and in exquisite detail. For them, “[t]he issue is simply one of what is being synthesized. [James J.] Gibson himself put the issue in terms of synthesizing information.” (Italics ours.) Carello et al. 
note that the ecological approach tends to focus on our perception of “object properties rather than sound properties.” As with vision where we tend to see the objects and events of the world rather than the energy distributions of light, with sound we tend also to hear sources: a bird, a waterfall, a person walking, rather than properties such as frequency, amplitude, or onset and decay. Even then, “perception is not awareness of objects and events per se but awareness of their behavioral relevance.” Because an object’s physical parameters and spatial dimensions determine the acoustic structure it generates when mechanically disturbed, such structure is potentially informative about object properties. Thus it is possible to know the sizes and shapes of all sorts of objects by the sounds they make. For humans, we can tell whether they are going up stairs or down, and we can often tell the gender of a walker from the sound of footsteps. If we apply the principles offered by Carello et al. to motion-picture sound effects, we should not be terribly surprised that although we are presented with only a metal cage in the opening scene of Jurassic Park, we know from the sounds we hear that it contains a live creature that is big and fierce and frightened. Furthermore, we are aware that the creature poses a danger to everyone in the vicinity. We feel the fear. The sounds for the caged animal were synthesized from recordings of the sounds of other animals. They were created by sound artists by trial and error and are in their final form very complex, yet we are able to gain the information that the filmmakers intended. Does the complexity of the sounds make it more difficult for viewers to understand the source of the sounds? Apparently not; Carello et al. 
report: “Our survey of experimental investigations of auditory source perception strongly suggests that listeners are good when they have a lot of acoustical structure, even if that acoustical structure is not readily quantified by scientists.” Yet, we should not conclude that the meanings of sounds are to be found in the ear of the listener. To the contrary, Carello et al. suggest that the perception of object properties by sound alone is reliable, surprisingly accurate, and constrained by dynamical law.

In films, as in the natural world, we normally perceive objects and events both visually and aurally. A breaking bottle generates patterns of disturbance in both light and air, thus creating simultaneous optic structure and acoustic structure. Carello et al. suggest that these reliable patternings may well be indifferent to the medium. The human perceptual system seems to be capable of perceiving the common causal event specified by both. The corollary would seem to be that the simultaneously presented images and sounds of a motion picture specify the putative causal event, the fictional one.

5. Background Tracks in Recent Cinema
Charles Eidsvik

We do not see and hear a film, we hear/see it.
—Walter Murch, 01 October 2000

What we hear affects what we see. Each of the three kinds of movie sound—voice, background sounds, and music—has important functions in our experience of film narratives. Though background sound was used only sparingly until the late 1960s—notable exceptions occur in films such as Alfred Hitchcock’s Rear Window—in the last three decades, new technologies have made it central to how narratives function. Sound editors combine location sound with sound recorded in postproduction to create specific audio identities for each scene; the resulting sound cements the visual and other audio elements of the scene into what feels like a continuous whole that feels real and reinforces the scene’s mood. Background sound, like music, also alters the thresholds of our awareness both of the visual and of larger narrative and emotional processes, altering how we experience visuals and narrative rhythms. Meant to be unnoticed, it is a catalyst for our reactions to nearly every element of our experience of films.

What we know about sound in cinema is not solely the product of academics but also of practitioners. Sound studies is among the few areas in cinema studies in which scholars and artists often turn up at the same conferences, contribute to the same Internet list-serves, and seem to respect one another’s views.1 Among the practitioner-theorists, the most prominent is the sound designer and film editor Walter Murch. Murch has evolved a pragmatic esthetic based on the limits and potentials of aural perception, with close attention to the ways in which sound and sight work together. His theories (and the example of his sound tracks), augmented by the work of close collaborators such as Randy Thom (sound designer and mixer for the Skywalker Sound facility, George Lucas’s postproduction center), provide a basis for approaching sound in the cinema.

Some Basics: The Five Functions of Background Sound

For those unfamiliar with sound usage in cinema, it may help to review basics.
Dialogue and some background sound are recorded during shooting. The film is edited using location sound. Once the film is edited using location materials, additional or replacement dialogue is “looped” in, and background sound and music are added. Filmmakers divide sound into three general categories: voice (usually dialogue); background sound (comprised of location “presence” and specific effects, most of which are created by Foley artists, who are specialists in creating sound effects); and music.

From the beginning of sound movies in the 1920s, clear and comprehensible dialogue has been central to mainstream film esthetics. The scripts that serve as the organizational basis for most film productions are mostly dialogue. If on-location recording is not optimal, voices are rerecorded in postproduction so that every important syllable can be heard clearly. The dialogue, even if spoken in a whisper, is usually boosted to above seventy decibels in the movie theater (about twice as loud as normal business conversation), to make sure everyone can hear it, even in large theaters, which absorb speech more than they do music. Usually stylized and simplified speech, movie dialogue is meant to be noticed. Most other sounds are not.

Extending a tradition from silent film, music tracks cue the audience to emotional or associative responses to the film stories. Unless previously recorded music is used to establish the time period of the film’s action or to provide significant associations, music is specifically composed and recorded for each film. Most is non-diegetic, outside the world of the film story.

Until the 1970s, unless it was needed to serve a clear narrative function—a gunshot or a train locomotive, for example—background sound tended to be suppressed. Over the last thirty years, however, background sound, comprised of location ambience (often called presence) as well as rhythmic effects such as footsteps, has joined music as a manipulator of mood and has come to have numerous additional uses.

Background sound fills at least five distinct functions, four of which are adaptations of the ways in which background sounds function in everyday life. The fifth function is as a metaphoric extension of a character’s consciousness. Sound teams use the functions of background sound in everyday life but rebuild them to reinforce and give presence to the film’s fictional world. In Audio-Vision: Sound on Screen (1994), Michel Chion calls this “rendered sound” (pp. 107–20).

The oldest use of ambience is as an identifier of fictional places.
We identify and remember places by their sounds. The beach, a city street corner, a meadow, a harbor, and suburbia in lawn-mowing season each has typical ambiences. Much background sound on television comes from sound-effects libraries and thus has a generic “I’ve heard this same sound a hundred times” feel to it. Sound teams for feature films usually try to get audio collected specifically for the individual film. Sometimes a location loop from the film set will be part of the ambience, but normal practice is to make the scene sound like it “should” rather than accept how it was during shooting. Production mixers try to get as little ambience as possible into the dialogue tracks—no footsteps, no clocks ticking, no motors running, etc. The sport for location recordists is to try to record dialogue cleanly enough to be used in the final film rather than to have it end up as a guide track for dubbing. Actors’ performances often are better during production than in studio dubbing sessions later. Still, anywhere between 20% and 90% of dialogue will be replaced in postproduction. All effects tend to be done by Foley artists. Ambient tracks from real sounds in real places are mixed to create scene ambience.

Second, and very important to response, is the use of continuous ambience to establish scenes as cohesive units, each with its own presence, its own aural identity. As Tomlinson Holman points out,

If sound remains constant before and after a picture cut, the indication being made to the audience is that while the point of view may have changed, the scene has not shifted—we are in the same space as before. So sound provides a form of continuity or connective tissue for films. In particular, one type of sound represented several ways plays this part: Presence and ambience help to “sell” the continuity of a scene to the audience. (1997, p. xvi)

Because we are particularly sensitive to synchrony and to sharp audio transients, audio overlap between shots, called “greasing the cut” by editors, also helps limit awareness of acoustic differences in the ambiences of adjoining shots.

Third, sound effects can make us think we have seen something that did not happen. In a boxing film such as Raging Bull, for example, there are lots of sounds of bodies being hit. Most film students report seeing the punches connect. But when the film is played back in slow motion, one sees almost no fist-body contact at all: usually the punch misses entirely, with the sound creating the illusion of violent contact. This technique is common in cinema because hearing is faster than foveal vision; we accept hearing as a guide to seeing. (See Chion, 1994, pp. 60–61.)

The fourth function and the most sophisticated (and difficult to achieve in practice) is the use of background tracks to enhance identification with a central character’s situation in or perceptions of the space he or she inhabits. While used frequently in virtual reality simulations and by works made for surround-sound digital video disk (DVD) exhibition, problematic theater acoustics and loudspeaker settings make it a risky part of sound design (see Chion, 1994, pp. 80–85). Still, since the late 1970s, some films have used the suggestion of audio placement as an important element (for example, in the foray into the jungle in Apocalypse Now [1979]). Holman describes how this works:

The sound of the jungle remains present on the screen channels, but also creeps up in the surround channels, enveloping the listener. This use of surround sound creates greater involvement on the part of the listener by breaking the bounds of the rigid screen edges, and brings the audience into the action. Then when the tiger jumps out, the action is much more frightening because we accompanied the characters on their search, rather than observed them from afar. (1997, p.
177)

Especially in thrillers, point-of-view sound is an important technique used for almost every high-tension scene. People normally can place themselves in a room by sound orientation, with the rule of the first wave determining where a sound comes from (Blauert, 1996, p. 410). Point-of-view sound thus works fairly well in restricted locations (the place a murder will occur in Seven or the last interview of Hannibal Lecter by Clarice Starling in a basketball auditorium-like space in Silence of the Lambs). Because background sounds are panned according to where they occur in the frame (assigned to the left or the right, unlike dialogue, which always comes from the center), binaural sound allows this to work even without full surround effects.

The fifth function is expressionistic: If background sounds are unconventionally loud or distorted, the main character must be in an abnormal psychological state. This convention has been around radio and cinema quite a while: One hears it in Orson Welles’s The Lady from Shanghai and in the opening dream sequence in Ingmar Bergman’s Wild Strawberries and Persona as well as in films such as Hitchcock’s Psycho. In one sense, making selected background sounds unusually loud is an analog to concentration or “listening for” specific kinds of sounds while ignoring others, or, conversely, being unable to shut out unwanted noises mentally to the point where the unwanted is all one hears. (See Chion, 1994, pp. 90–94.)

A special but rare situation occurs when background sound replaces music as a mood-invoking element. The best scenes for showing how background sound can replace music are on the island location in the middle section of Robert Zemeckis’s Cast Away. The main character has washed up on an island without amphibians, birds, insects, or mammals. Zemeckis’s sound team, led by Randy Thom, created the moods by sounds put into the film in postproduction. Sounds of individual waves, of ocean sounds, and of wind were constructed specifically to suggest both location and mood. As Thom put it:

What I’m looking for is emotional and dramatic notes that will resonate. Very often some accidental juxtaposition of a certain kind of watery wave, a big impact on a rock and a wave that’s rolling backward down a steep bank of sand will have an effect on you that’s very different from—and might be better than—what you imagined was possible before you went about collecting those sounds. (Axinn, 2001)

To get off the island the character must wait for the wind to change. He’s noticed in the years that he’s been there that there’s one time of year when a strong wind does blow in the opposite direction. We can use the wind in a very musical way to signal, “Here’s the moment that the change is going to happen.” And the intensity of the wind is such that it literally and emotionally launches him in this attempt to get off the island. Further, places have their own sounds: the winds that are associated with the cave are probably more musical than any of the other winds, partly because the wind as it blows across the opening and through the cave is a little like a flute. (Axinn, 2001)

In principle, what Thom did in Cast Away is an extension of the musique concrète created by Pierre Schaeffer and Pierre Henry in the 1950s, except that Thom restricted his sources to wind and water. Musique concrète was a strong influence on Murch when he was an adolescent, playing with early tape recorders, fascinated with what could be done with recorded sound. After studying art history and comparative literature, Murch eventually became a soundman, editor, director, and translator of poetry, with a strong bent for theory. As he explains in “Dense Clarity—Clear Density,” Murch conceives of sound as a sort of spectrum.
On the far left is speech, which Murch characterizes as “encoded.” At the right end of the spectrum, he places music, which “embodies itself.” If knowing the “code” behind a sound is important, as it is for language, it is “encoded” to some degree; thus, as Murch puts it, “Schoenberg is more encoded than Santana.” Murch uses colors to describe his spectrum: violet for encoded; red for embodied; yellow for sounds that are in-between, encoded/embodied. Murch places background sounds into areas of green and orange if they slide toward one end of the spectrum or the other. Location ambience tends to be closer to music (hence, Thom’s success in Cast Away), whereas rhythmic effects such as footsteps, doors closing, and gunshots are closer to speech and often can work metaphorically. (The door closing at the end of The Godfather gives a finality to the dilemma Kay finds herself in.)

Murch’s theories are generally the results of insights discovered while working. For example, while laying in footstep sounds for robots for George Lucas’s first feature, THX 1138, Murch discovered that he did not need to synchronize sounds with steps if there were three or more robots—for more than two, people just heard a “group.” His conclusion: The brain can only recognize synchronicity in two things at a time. In film after film, he has explored other limits and potentials in perception/cognition. In The Godfather, he found that, carefully done, two simultaneous conversations could be grasped by audiences. In the helicopter attack sequence in Apocalypse Now, he discovered that in total, he could have five streams of sound—a sixth would jumble everything. So he removed whatever would overload, with different streams—music, background, and so forth—pulled out depending on the needs of the moment. Though physically the ear hears any number of sounds, five is the limit one can listen to. Murch codified how sound has to be mixed into what he calls the rule of two and a half. We can process only about two-and-one-half sounds in one area of the spectrum at one time. Curiously, background sound can be quite complex without becoming confusing, because it comprises elements that vary from very encoded to highly embodied. For very brief moments, we can handle aural overloads, but fatigue quickly sets in. Further, Murch believes that good sound requires variation from scene to scene as well as from moment to moment, so that the listeners’ ears will stay fresh: Fatigue limits apprehension.

Some History

That it is possible to control sound precisely enough to make background sound work properly is a consequence of two mini-revolutions in sound technology, one in the early 1970s, the other in the late 1980s and 1990s when digital electronics came into common use. From the mid-1920s until the 1940s, sound was recorded on movie film; its signal-to-noise ratio and dynamic range were thirty to forty decibels at best, and every time it was copied (to make edit prints, then to make theatrical copies), quality got worse. Movie projection systems rarely had more than thirty decibels of dynamic range and had an audible, ever-present hiss. At best this would be the equivalent of a cheap VCR playing a video into a cheap television with a bad speaker. Tape recording meant better originals (and copies, up to the optical theatrical print) from the late 1940s on. In the late 1960s, punch-in/punch-out recording equipment was introduced that allowed sound mixes to be done a little at a time rather than in perfectly executed ten-minute reels (as had been the practice).
Dolby noise reduction began being used by music recording studios in 1965, but the film industry ignored the high-fidelity revolution in music recording for home entertainment. The audio revolution in film began when Francis Ford Coppola, George Lucas, and Walter Murch began adapting audiophile equipment for use in American Zoetrope movies in the late 1960s (while making Coppola’s The Rain People). Their sound explorations were continual and included movies as different as The Godfather, The Conversation, and American Graffiti. The Zoetrope team was free to explore audio technology, but films had to be finished the old way, squeezed onto one track (except for 70 mm theaters) with optical sound and one big speaker behind the screen handling all sound. Still, one virtue of new technologies is that they put everything into question—what one can do, what one might dream of doing, what one thinks one is doing. Rapid technological change transformed how film sound would be recorded, filtered, mixed, and reproduced. This transformation showed up in sound tracks, but for audiences to hear the extent of the revolution required changes in film-exhibition technology. Dolby Laboratories was the impetus for this change. Though Dolby began being used in mixing studios in 1971 on A Clockwork Orange, and in a few large releases in 70 mm (e.g., A Star Is Born) as early as 1975, the first release in Dolby Stereo (with an additional low-frequency effects track in 70 mm theaters) was Star Wars in 1977. About half of first-run 35 mm theaters were able to run Dolby Stereo copies of Star Wars in 1977 (Salt, 1992, p. 282). The first full-fledged surround-sound Dolby film was Apocalypse Now in 1979.
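
The dynamic-range figures quoted above for optical sound-on-film (thirty to forty decibels) can be made concrete by converting them to plain ratios. The short Python sketch below is offered only as an illustration and is not drawn from any of the sources cited in this essay; it assumes the standard definitions of the decibel (ten times the base-10 logarithm of a power ratio, twenty times that of an amplitude ratio), and the function names are hypothetical.

```python
def db_to_power_ratio(db: float) -> float:
    """Convert a level difference in decibels to a power ratio."""
    return 10 ** (db / 10)


def db_to_amplitude_ratio(db: float) -> float:
    """Convert a level difference in decibels to an amplitude ratio."""
    return 10 ** (db / 20)


if __name__ == "__main__":
    # 30 and 40 dB are the dynamic-range figures quoted in the text
    # for sound recorded optically on movie film.
    for db in (30, 40):
        power = db_to_power_ratio(db)
        amplitude = db_to_amplitude_ratio(db)
        print(f"{db} dB: power ratio {power:,.0f}, amplitude ratio {amplitude:.1f}")
```

On these definitions, a forty-decibel range means that the loudest usable signal carries about ten thousand times the power of the noise floor, which gives some sense of how little room early optical tracks left for quiet ambience beneath dialogue.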

A number of technological improvements ensued. The THX system (which dealt mostly with specifications for theater equipment and acoustics) arrived in 1983. Dolby SR, with improved signal processing, came out in 1986. Dolby brought out a digital system in 1992 (which has also become the standard for DVD). DTS (digital theater system), which uses a compact-disc system, came out in 1993, as did Sony Dynamic Digital Sound, which uses seven tracks, plus a subwoofer channel—two more in front of the audience than other systems. All use some version of surround sound plus a subwoofer for low noises. All available digital formats, plus analog Dolby Stereo can be fit onto a single film print, so a film will play on whatever sound system a theater happens to own. In practice, acoustical differences between any two theaters affect what we hear more than the particular digital sound system in use.

Limitations on audio are a result of mixers having to ensure that a film will work in older as well as newer theaters, in large as well as small houses. Thus, no one since the 1950s has challenged the convention that dialogue will mostly come from the center speaker. We attribute dialogue to whomever we see speaking, by other visual cues, and by voice recognition, just as we do in everyday life. As Albert S. Bregman puts it:

The tendency to experience a sound as coming from a location at which visual events are occurring with the same temporal pattern (the so-called ventriloquism effect) can be interpreted as a way in which visual evidence about the location of an event can supplement unclear auditory evidence. This is helped by the fact that we attribute sounds to streams, capture specific sounds from streams, and have a bias toward “exclusive allocation” of sounds into specific streams. (1990, p. 652)

As mentioned earlier, dialogue has to be boosted to carry in large theaters, so it is left loud on all prints.
Mixers tend not to use anywhere near the full dynamic range of digital sound because some theater sound systems will distort if pushed beyond their functional limits. Thom tends to be more daring than most, deliberately running dialogue low in Forrest Gump and working with a wide dynamic range. Most mixers, however, are more conservative.

Relatively few theaters have put in equipment to exploit the superior sound of Sony’s 7.1 sound system, in which three speakers evenly spaced behind the screen provide clear spatial orientation for the viewer. Echoing the data of virtual reality and DVD experiments, Sony claims greater audience involvement when sound can help orient the viewer to the visual screen space. (Sound also can give the impression of a sharper, more dynamic picture—a key to the effectiveness of DVD.) But because not enough theaters put in the extra speakers (and special decoding) for Sony dynamic sound, most non–Sony Studio films are only mixed for 5.1 surround. Though filmmakers such as George Lucas have campaigned for theater sound systems even more complex than Sony’s and regard the Sony system as a minimum for effectiveness, the five-speaker-with-subwoofer system from 1992 seems to be the standard we will live with for the next few years. With DVD standardized on the 5.1 Dolby and DTS systems and most profits from films coming from ancillary markets (video, cable, foreign sales, DVD), most producers (and mixers) stay with 5.1.

Why Quality Does and Does Not Matter

How good should audio be? From an audiophile perspective, there are any number of weaknesses in movie sound, not only in terms of spatialization but also in terms of quality. Each of the digital sound systems uses extreme compression in order to get a surround system into the space available on a film print (or even on CDs, as with DTS). The algorithms involved are very lossy—they throw away a lot of data—with much of the compression based on psycho-acoustics. High frequencies, secondary harmonics of fundamental frequencies, and sounds masked by louder sounds in their frequency range are dropped. The results are adequate, so long as you do not listen to compressed sound side-by-side with uncompressed. To my ears (though I have had to judge by listening to the tracks via DVD copies), each of the tracks on most movies is about the quality of an MP3 download from the Web, sometimes a bit better but not much. In a mixing studio, sound has at least five times the potential dynamic range and at least eight times the potential harmonic nuance of what theater systems can handle. When the mixed sound is fit into what theaters can handle, the result is sound that is complicated but not deep or subtle.

Is that important? Perhaps not for most films and most viewers. Arguments that quality matters usually convince cinephiles, who go out of their way to watch movies in 70 mm theaters with well-tuned sound systems. Bigger, brighter, better pictures accompanied by well-designed and well-modulated sound produce a better spectacle and greater perceptual involvement, all other factors being equal. Whether that is very important to cinema is a question at the heart both of the audio revolution and cinema studies. In part, the question is a matter of genre and of sociology. That genres have their own emotional palettes, as Grodal has argued, also has perceptual and technological consequences (1997, pp. 157–277). The major technical advances in theater sound have been forced by blockbuster thrillers that needed it—Star Wars, et al. (It is no accident that films I have mentioned as using background sound well are, with very few exceptions, thrillers.)
The market for thrillers is primarily young people—even younger than the usual movie demographic. This makes sense in terms of Pierre Bourdieu’s data in Distinction: A Social Critique of the Judgement of Taste: Social fractions (whether described by age, economic status, or cultural advantages) vary considerably in their reactions to strong sensations. Male adolescents and people lacking cultural advantage tend to seek thrills. (It is no accident that adolescent males are the primary market for 3-D video games.) The taste for strong thrills is less in social fractions that are older or have more cultural capital (education, social status, and connections). Still, even with young audiences, it is remarkable how many viewers sit toward the back of the theater, where perceptually there will be less stimulus than close to the screen.

Ambiguous relations between perceptual and cognitive involvement are central to the paradox of how we experience movies. In audio as well as visual terms, the convention of movie theaters is that “All seats are alike.” Unlike live theater, we do not pay more for a better vantage point. We assume the role of being at no particular place. That convention allows the camera its freedom and allows cutting fluidly from perspective to perspective without perceptual confusion (provided the filmmaker has followed conventions, such as the 180-degree rule, to prevent confusions). Music plays along with action, again coming from outside the world of the story. Actors we recognize as not the characters they are playing speak dialogue and engage in activities far from anything in ordinary life. We are in a space for perceiving fictions. And yet audiences will not put up with background sound that is less than skillful at masking the artifices, at invoking our aural memories of real space.

Two explanations come to mind. One is Murch’s. He argues in “Dense Clarity, Clear Density” against sound tracks being completely realistic, with all the detail of ordinary perception, in that the mind needs gaps in order to become engaged imaginatively. Listened to closely, few tracks are more than suggestive of the complexity of ordinary scenes carefully perceived. The second explanation seems the more promising, for one may also argue that imagination is sparked not just by gaps but by juxtaposition and contradiction. Movies juxtapose the real and the artificial: That is the genius of the form.

What do we expect to feel real? I think the minimum involves three areas, two of which are aspects of acting. Visually, we expect actors’ facial and bodily expressions to correspond closely to the everyday, especially in reaction shots, which help guide our own reactions. Aurally, we expect the emotion-revealing subtexts and intonations of the actors to feel right. The third aspect involves where the characters are: We expect sound to invoke our sense of real places. In a great many films, additional elements also invoke our experiences of the real. And, as always in the arts, there are exceptions: for example, in broad comedy, almost any call to realism or sincerity is apt to become a joke.

Such exceptions aside, why is mimetic realism necessary in these three areas? We can explain acting and characterization issues in terms of how we read other people, but why background sound? We put up with all sorts of artifices in cinema, why not with sound discontinuities within scenes? One part of the answer is simple: Sharp sound transients distract us, invoking a startle effect if we do not understand their source. Physiologically, unexpected sound transients are an alerting mechanism. Unexpected noises will wake us at night, they will distract attention from whatever we are doing, and they will even put us into fight-or-flight modes in many contexts. Response to sound transients probably is one of the few sound phenomena that is biologically encoded as a signal of danger.
Avoiding transients unmotivated by the screen story is a way of keeping us within the rhythms of the movie. The issue of avoiding sharp transients aside, what is it about those elements we require to seem real, to seem to have a documentary quality—visual acting, voice delivery, and background sound—that is identifiable? I think a central issue is that we react to these in everyday life while in the main unaware that they are what we are responding to, how they work, or even that we are paying attention to them. They are all, to use Murch’s terms, both encoded and embodied. When a film is dubbed for foreign release, only the voices are replaced, and then effort is made to imitate the emotional tone of the original. The argument for not dubbing, for subtitling, is to hang onto the original actors’ voice intonations, the emotionally and rhythmically embodied record not of what they say in English or French or Japanese but what the characters feel, which at least partially transcends specific linguistic encoding. With facial and bodily expressions, particularly in response to other people, something similar is going on: There are differences in what any visual or physical expression means culturally, but at an immediate level, we react to what might be called the music of the faces and bodies, the rich embodiment of very complex feelings. Background sound can be in that sort of category: One does not have to have heard the ocean in ordinary life to react to the sound track in Cast Away. Background sound is a call not just to our own experience but to our curiosity and enjoyment of where we are, of the aural textures of sensory life. Film sound resonates with our sensory lives; to the extent that we bring those lives to the experience of a film, we enrich our experience. The curse of being an academic in cinema studies is the separation between universities and film production. 
Most film scholars have no notion of the differences in how a film works if one background sound is used instead of an alternative. In the final film, there will be no one to point out what has been done or why. Access to the world of

78 / CHARLES EIDSVIK

production and postproduction removes both mystery and finality from the esthetics of cinema. A film never has to be the way it turns out: It is a matter of choices, of opportunities taken and opportunities lost, all in the context of an expensive and time-limited process. The only access to what goes into sound tracks is through those who put them together. If cinema studies is to advance the project of discovering how films work, collaboration or at least communication between filmmakers and academics is essential. In the small community of those concerned with sound, that has at least begun.

Note

1. For some years, the central meeting ground for sound studies and sound practitioners has been Sven E. Carlsson’s web page: http://www.filmsound.org/. The site is linked with and tracks industry and trade journals in film sound such as Mix and Studio Sound as well as academic articles and books on sound.

References

Axinn, Michael. (2001, January 1). Randy Thom creates soundtrack from water, wind, and fire. 15 June 2004. Mix Magazine MixOnline.

0.5 denoting increased sensitivity to signal. B″ is a nonparametric bias index, which varies between -1 and +1: B″ < 0 representing a conservative criterion, B″ = 0 denoting a neutral criterion, and B″ > 0 a liberal criterion (see Macmillan & Creelman, 1990). Recognition performance was significantly higher in the ordered condition in both the hits and A′ analyses (p < .05). From Lander and Bruce (2000); used by permission of Lawrence Erlbaum Associates.
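The caption above reports an A′ analysis without stating how A′ is computed. As a minimal sketch, assuming the commonly used nonparametric computing formula usually attributed to Grier (1971), which may not be the exact procedure followed by Lander and Bruce (2000), A′ can be derived from a hit rate and a false-alarm rate as follows. The function name `a_prime` and its parameter names are illustrative, not taken from the source.

```python
def a_prime(hit_rate: float, fa_rate: float) -> float:
    """Nonparametric sensitivity index A' (assumed standard formula).

    0.5 corresponds to chance; values above 0.5 indicate that
    hits exceed false alarms, i.e. increased sensitivity.
    """
    if not (0.0 <= hit_rate <= 1.0 and 0.0 <= fa_rate <= 1.0):
        raise ValueError("rates must lie between 0 and 1")
    if hit_rate == fa_rate:
        # No discrimination between signal and noise trials.
        return 0.5
    if hit_rate > fa_rate:
        h, f = hit_rate, fa_rate
        return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))
    # Below-chance case: mirror image of the formula above.
    h, f = hit_rate, fa_rate
    return 0.5 - ((f - h) * (1 + f - h)) / (4 * f * (1 - h))


if __name__ == "__main__":
    # Illustrative rates only; these are not data from the chapter.
    print(a_prime(0.8, 0.2))
```

The bias index B″ is deliberately left out of this sketch, because its sign convention differs between authors and the caption states the convention used in the original table.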

FACIAL MOTION AS A CUE TO IDENTITY / 135

Here it is important to note that in both conditions the face appeared to move, but the original dynamic characteristics of the motion were only retained in the ordered condition. Other experiments we have conducted have followed up this finding by comparing the recognition of famous faces from naturally moving clips (as sampled from television and video clips) with speeded-up clips, reversed clips, and static images (Lander & Bruce, 2000, experiment 2). Removing the original dynamic characteristics of the motion, by increasing its tempo or changing its direction, significantly decreased the recognizability of the famous faces, compared to images moving naturally in real time. Results confirmed earlier findings that the beneficial effects of motion were not solely due to the fact that a moving sequence contains more static-based information than a single static image. In this experiment, the same number and selection of images were shown for the same amount of time, across all moving conditions tested. If the beneficial effect was simply a reflection of the number of images, then we would have expected to find no significant differences between these different types of image presentation.

However, while increased static-based information cannot provide a full account of the beneficial effects observed, it may be of some use for recognition. In this experiment, viewing any kind of moving image, whether or not it preserved the original dynamic properties, promoted better recognition than viewing a single static image. This advantage may be due to the additional number of (static) images contained in a moving sequence, revealing subtleties of facial morphology not available from a single static image (Alley, 1999). So, the experiments reported here strongly suggest that the beneficial effect of motion is partly due to the increased amount of static-based information in a moving sequence but that movement also adds dynamic information. 
The usefulness of this dynamic information seems to be linked to the precise dynamic characteristics of the observed motion. Removing the original dynamic characteristics of the motion by increasing the tempo or reversing the direction significantly decreased the recognizability of the famous faces, compared to images moving naturally in real time. This adds to previous findings (see Lander, Christie, & Bruce, 1999) by clearly demonstrating that the exact pattern and tempo of the observed motion are important rather than simply a sense of animation.

There are several possible interpretations for this finding. It may be the case that there is some generalized benefit for viewing a face moving naturally. O’Toole, Roark, and Abdi (2002) refer to this idea as the representation enhancement hypothesis, positing that facial motion contributes to recognition by facilitating the perception of the 3-D structure of the face. Alternatively, each known face may have an associated characteristic motion signature, which acts as an additional cue to identity (termed the supplemental information hypothesis by O’Toole, Roark, & Abdi, 2002). We discuss the theoretical interpretations of our findings in more depth at the end of this chapter. First we address whether motion aids face learning and the importance of walking movements for the recognition of identity.

Learning New Faces

The above section considered the possibility that movement may be important in the recognition of known faces, which involves accessing established face representations stored in long-term memory. We have outlined compelling evidence to suggest that movement provides useful information for familiar face recognition (Bruce & Valentine, 1988; Christie, 1997; Knight & Johnston, 1997), particularly when recognition

136 / KAREN LANDER AND VICKI BRUCE

is problematic. As discussed, the exact theoretical underpinnings of these movement advantages are at present a little unclear. As well as aiding recognition of known faces, movement may also help us build face representations for previously unfamiliar faces. Thus, motion may aid learning of new faces. As new faces are learned, we add to the store of known faces in long-term memory. Research on the importance of motion for face learning suggests that seeing an unfamiliar face move may aid learning although results have been a little inconsistent. Bruce and Valentine (1988) used a standard recognition memory task to investigate the recognition of novel views of faces that had previously been studied in motion or from a still photograph. It was thought that viewing faces in motion may help to build up a superior representation of the three-dimensional structure of the face, thereby allowing better generalization to a novel view (Bruce & Valentine, 1988). In the moving condition, the person was shown looking up and down, speaking, and rubbing one eye, either for ten or twenty seconds. The movement in this experiment was largely affective and nonrigid in nature. Bruce and Valentine also included a multistatic condition whereby the participant was shown five stills from the video or a single still image. As expected, performance was significantly better when the target face was shown for twenty seconds, compared to the ten-second condition. The effect of presentation mode in the study phase was not significant although the trend was in the predicted direction (recognition better for moving faces). The lack of significance for this result was explained by the authors in terms of the variability of the participant population at this task. In a later study, Pike, Kemp, Towell, and Phillips (1997) did find a significant advantage for motion when learning faces. 
In this experiment, learning was done with faces moving, as a series of static images, or as a single static image. Here, the movement was solely rigid in that it comprised a full 360-degree turn on a motorized chair. Recognition in the testing phase involved a yes/no decision using a single slide image taken at a different photographic session or using the video filmed for the learning phase. Images presented in the moving condition were recognized significantly more accurately than those from the multiple or single static condition. Pike et al. (1997) proposed that seeing a face move rigidly in the learning stage leads to a more-robust and more-accurate object-centered mental representation (Marr, 1982) than can be generated from either multiple static images or a single static image. Conversely, Shepherd, Ellis, and Davies (1982) reported no advantage in an eyewitnessing paradigm for studying a moving sequence compared with a single static photograph when participants were later required to select the person studied. Similarly, Christie and Bruce (1998) found no benefit for learning faces in motion on a subsequent recognition task (yes/no decision). In this study, faces were shown either as a moving computer-animated display or as a series of static images, with the amount of static-based information equated across conditions. The movements involved nonrigid transformations (expression changes) or rigid rotations of the whole head (nodding or shaking). At test, participants either saw moving sequences or a single static image.2 While there was no benefit for learning faces in motion, there was a benefit for testing faces in (rigid) motion compared to static images. Similarly, Schiff, Banka, and De Bordes Galdi (1986) found an advantage for testing recognition memory for unfamiliar faces using a moving sequence rather than a static mug-shot photograph. These findings can be compared with those of Knight and Johnston (1997) and Lander et al. 
(1999, 2000), who also found beneficial effects of movement when testing recognition of already familiar faces.

So, the role of movement in building face representations is somewhat unclear. Christie and Bruce (1998) found no benefit for learning faces that were moving either nonrigidly or rigidly (but see Lander & Bruce, 2003). In contrast, Pike et al. (1997) found that learning faces in rigid motion did subsequently help participants recognize the faces more accurately, compared with when they were originally presented either as a single static image or as a series of statics. One possible reason for the discrepant results is that Pike et al.’s (1997) study involved recognizing faces across a change in lighting, while Christie and Bruce (1998) used the same lighting at presentation and test. It may be that rigid motion is particularly useful in compensating for the loss of three-dimensional shape information when lighting changes. Further work is needed to investigate this possibility and evaluate the role of rigid motion in building face representations. More recent work has been investigating the transformations that occur as faces become familiar. There are some rather intriguing differences between the perception of novel faces and familiar ones, and we are investigating how these differences arise as a new person is learned. In particular, the external features (hair, ears, face outline, and chin) of unfamiliar faces seem to be more highly weighted than the internal features (eyes, nose, and mouth), but for familiar faces this balance shifts, with relatively more weight placed on the internal features (Ellis, Shepherd, & Davies, 1979; Young, Hay, McWeeny, Flude, & Ellis, 1985). This project examines how this shift in processing develops over a number of days and investigates whether learning is better under certain circumstances. One factor that we have explored already is whether the external to internal shift arises sooner or more strongly when faces are learned moving rather than in static images. 
However, experiments to date have not found any difference between moving and static learning conditions (Bruce & Henderson, 2000). The moving sequences we have used thus far in our familiarization project have shown both rigid and nonrigid motion. None of the studies detailed in the literature reports any beneficial effects for the recognition of unfamiliar faces shown in nonrigid motion sequences, though some benefits have been shown for rigid motion alone (e.g., Pike et al., 1997). Thus, one task for the future will be to separately examine familiarization from rigidly and nonrigidly moving sequences. However, if benefits of motion are confined to rigid motion, then the effect itself can have little ecological validity, because the faces we are exposed to in everyday life show complex motions combining a variety of nonrigid expressions and speech acts with rigid gestures such as head nodding. Moreover, our experiments on the identification of familiar faces show benefits of sequences showing nonrigid motion as the faces speak and express (Christie, 1997; Knight & Johnston, 1997).

Matching Moving Faces

As well as remembering faces, in certain circumstances we need to be able to compare images to see if they show the same person. For example, if a CCTV image has captured a person robbing a bank, crime investigators may want to compare this footage with images of people who have committed this kind of crime in the past. Is it easier to compare one face with another when they are shown moving? In experiments simulating the comparison of faces seen on CCTV, we have so far shown little or no benefit if the sequences are shown moving rather than as static images. For example, Bruce, Henderson, Greenwood, Hancock, Burton, and Miller (1999) showed participants a series of trials, each of which showed an excerpt from a high-quality video sequence of a male face, alongside an array of ten photographs of male faces. 
Participants were asked to decide which face in the comparison array matched the video image. Performance was no better when the video excerpts showed animated clips, which participants were encouraged to pause and replay, compared with when they showed a single static image which most closely matched the viewpoint depicted in the comparison arrays. One possibility is that the matching of unfamiliar faces may be based largely on what Bruce and Young (1986) termed “pictorial codes,” and that for this task the addition of dynamic and/or three-dimensional information does not help because the comparison images are static ones. However, in a follow-up to this study, undergraduate student Jenny Rarity investigated whether participants were better able to match pairs of video sequences taken on different cameras or pairs of still images and found no difference between these two conditions, that is, no benefit for the moving condition.

Interestingly, Thornton and Kourtzi (2002) did find a small benefit for motion in an immediate sequential matching task in terms of reaction time. However, in line with our studies (see Bruce et al., 1999), no consistent benefits were found in the overall accuracy of these matching decisions. Thornton and Kourtzi presented participants with a dynamic (smiling or frowning) or a static prime face (presentation time 540 ms). Following a short interval (300 ms), a second target face (static image) appeared on the screen. The target face was presented upside-down, and participants were asked to judge if the prime and target faces belonged to the same person or to a different person. Results indicated that same decisions were significantly faster when the prime image was dynamic than when it was static. This was true regardless of whether the prime and target faces showed the same expression (smile-smile/frown-frown) or different expressions (smile-frown/frown-smile). 
Thornton and Kourtzi suggest that motion may be particularly useful during the construction of transient perceptual representations, used to support our ongoing interaction with objects in the physical world. Beneficial effects of motion were only found when the prime and target faces mapped onto the same basic object (i.e., a particular person) and were not found when matching for expression rather than identity. Thus, it seems that in some circumstances, there may be some benefit for matching moving faces although further work is needed to establish the size, consistency, and nature of these effects.

In ongoing research, we are investigating the importance of face motion for patient HJA (described earlier in this chapter). HJA is able to use dynamic cues during expression processing (see Humphreys et al., 1993) and performs normally with dynamic displays of expression. Early results suggest that HJA cannot utilize motion to access identity but is better at matching moving compared with static faces (see Lander, Humphreys, & Bruce, 2004).

Moving Faces and Moving Bodies

Viewing static images of faces does not just eliminate natural motion of the face; it also isolates the face from the rest of the body. When we recognize people in everyday life, however, we rather rarely see their face alone, and movement (gait) as well as other characteristics of the body (shape, clothing) may contribute to person identification. Gait perception has attracted an increasing amount of interest in the last few years, primarily focusing on the treatment of abnormal walking patterns (Wagenaar & van Emmerik, 1994). More recently though, it has been suggested that human gait patterns should be added to the list of biometrics, measures that can be used to reliably distinguish one individual from another (Stevenage, Nixon, & Vince, 1999). If a person walks in front of a CCTV camera, it is not just his or her face that is seen moving but the whole person.

It is easy to consider the qualities of a walk that could be used to identify an individual before the face becomes discernible. These qualities include stride length, bounce, speed, rhythm, body swing, swagger, and any additional characteristics such as limps or injury. Work using point-light displays (see earlier) demonstrates that as with facial movement, gait patterns can be used to make simple categorizations (human/nonhuman judgments, nature of movement, sex of human, etc.). Here it is important to investigate whether gait may also be used to reliably identify an individual walker. Initial results using point-light displays suggested that perceivers could reliably recognize themselves and their friends from dynamic displays (Cutting & Kozlowski, 1977). Furthermore, Beardsworth and Buckner (1981) demonstrated that participants found it easier to recognize themselves from dynamic point-light gait displays compared to the patterns of their friends. This is interesting considering that we rarely see our own walk from a third-person perspective, suggesting that there is some transference from the kinesthetic to the visual modality. These results support the view that gait signatures may be used as a cue to identity.

Stevenage and colleagues (1999) carried out a more controlled investigation of this issue and its applied interest. In experiment one, participants learned to identify six individuals on the basis of their gait under conditions of simulated daylight, simulated dusk, and from point-light displays. Results confirmed that all participants could use gait-based cues to identify individual walkers. The gender of the participant did not affect the difficulty of the learning task, but female walkers were easier to identify than male walkers. Importantly, the difference in lighting had no significant impact on the task difficulty. Stevenage et al. suggest that this may be because participants matched walkers for body shape (height and weight) and clothing. 
Given this, participants may have been forced to base their decisions on gait-based rather than body-based cues. Alternatively, the provision of these additional body-based cues may be of little importance to the discrimination of gait patterns. A second experiment suggested that even under adverse viewing conditions involving a single brief exposure, participants were able to identify a target from a walking identity parade at better than chance probability. This ability emerged regardless of lighting condition and walker gender. Furthermore, participants’ confidence in their target selection was positively correlated with their accuracy. This has important forensic implications for situations when the investigator is unaware who the real walker is.

So, work outlined so far suggests that gait could be used as a reliable means of identifying individuals. This suggestion may become important in the battle against crime, given that criminals seldom think to disguise their walk. Accordingly, in order to automate recognition, computer-vision scientists have been concerned with the individual characteristics of gait that underpin identification. A number of techniques have achieved impressive results, albeit with only a small number of walkers (see Niyogi & Adelson, 1994; Murase & Sakai, 1996).

However, research on CCTV identification suggests that the facial information is vastly more important than that from bodily movement (Burton, Wilson, Cowan, & Bruce, 1999). Burton and colleagues examined the effectiveness of human-face recognition from poor-quality video images. Footage of teaching members of the psychology department at the University of Glasgow was captured using a low-cost security system. The video camera, mounted on the main entrance to the building, captured people entering the building walking towards and past the camera. Video clips of fifteen people were taken, ten

belonging to lecturing staff familiar to student participants. The remaining five clips were of visitors to the department, unfamiliar to the student participants. All clips lasted for three seconds, and participants were asked to try and identify the person shown in the video clip. To investigate the basis of the identity decision, different aspects of the video were selectively disrupted. Video clips were edited to obscure the head, body, or gait of the targets (see fig. 8.5). Gait information was obscured by not displaying all the frames in the moving sequence but instead only showing seven still frames each for an equal period (summing to three seconds). This manipulation destroyed the apparent motion of the video, making it very difficult to perceive the gait of the target.
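The gait-obscuring manipulation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the source says that seven stills were each shown for an equal period summing to three seconds, but it does not say which frames were chosen or what the frame rate was. The evenly spaced selection, the 25 frames-per-second rate in the usage lines, and the function name `hold_stills` are all hypothetical.

```python
def hold_stills(frames, n_stills=7):
    """Replace a moving clip with n_stills held frames of the same total length.

    frames: sequence of frames (any objects) making up the original clip.
    Returns a list of the same length in which each chosen still is
    repeated for its share of the clip, removing apparent motion.
    """
    total = len(frames)
    if n_stills < 1 or total < n_stills:
        raise ValueError("need at least one frame per still")
    held = []
    for i in range(n_stills):
        # Segment boundaries split the clip into near-equal periods;
        # with 75 frames and 7 stills each period is 10 or 11 frames.
        start = (i * total) // n_stills
        end = ((i + 1) * total) // n_stills
        # Assumption: the first frame of each segment serves as the still.
        held.extend([frames[start]] * (end - start))
    return held


if __name__ == "__main__":
    fps = 25                      # assumed frame rate, not given in the source
    clip = list(range(3 * fps))   # frame indices standing in for a 3-second clip
    edited = hold_stills(clip, n_stills=7)
    print(len(edited), sorted(set(edited)))
    print(round(3 / 7, 3), "seconds per still")
```

Each still occupies roughly 3/7 of a second, so the edited clip keeps the original duration while the continuous motion needed to perceive gait is lost.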

Fig. 8.5. Stills from video sequences. From Burton, Wilson, Cowan, & Bruce (1999); used by permission of Blackwell Publishing.

Results indicated that overall accuracy was high, with participants correctly identifying over 90% of unedited familiar targets and correctly rejecting 92% of the unfamiliar stimuli. Obscuring the body and gait produced only a small decrement in recognition performance (average 84% correct). However, obscuring the targets’ heads had a dramatic effect on participants’ abilities to recognize them. This implies that participants were recognizing the targets from their faces, even from these poor-quality images. Burton and colleagues comment on the relatively poor recognition of familiar people from their bodies and gait (head-obscured condition), in which successful recognition dropped to around one in three (33% correct). This figure is low considering that participants knew the video camera was set up in the psychology department and that the people they were likely to see would be local academics. One possible reason for this low identification rate is that perhaps walking towards the camera somehow disguised some aspects of the targets’ gait. However, a later study that captured targets walking left to right replicated previous results, suggesting that obscuring gait only had a small effect on recognition performance (A. M. Burton, personal communication, 1999). Burton et al. (1999) suggest that it is intuitively reasonable to suppose that both gait and body-shape information are used to discriminate among people although little support for this intuition was found in the present study. It seems that although gait can be used as a cue for discriminating among individuals (Stevenage et al., 1999), the face is still the most important cue to identity (Burton et al., 1999).

A number of further experiments on matching CCTV images, using the same low-cost CCTV system at the University of Glasgow, have been conducted (Bruce et al., 2001). 
Participants were either familiar or unfamiliar with the people shown on CCTV, and they were asked to compare CCTV images showing the whole body including the person’s face

with high-quality photographs of faces. Participants had to decide whether the CCTV image and the photograph showed the same person or different people. Accuracy was much higher (and over 90%) for the participants who were familiar with the target people, but there was no advantage shown if the CCTV sequences were shown animated compared with a single- or multiple-stills condition. If patterns of gait were beneficial for the recognition of the familiar people shown, it might be thought that the moving condition would benefit familiar participants, but no such benefit was observed. Moreover, it seems that in this specific situation, where movements of the face arise largely from associated body movements, movement does not appear to benefit recognition of familiar faces either, in contrast to the advantages shown for moving faces that showed nonrigid expressive speech patterns in Lander et al.’s work.

Interpretation and Future Directions

We have outlined evidence to demonstrate the salience of dynamic information for event perception, face-categorization tasks, and, perhaps most surprisingly, face-recognition tasks. Across the range of situations explored, it seems particularly to be nonrigid movement patterns (either of faces generally or of specific faces) that aid the recognition of familiar faces. How might this dynamic information be stored in memory? One possibility is that the motion trace is quite distinct from the static-form–based representation. If this were the case, then dynamic information may feed into the face-identity system either directly or via other aspects of face processing where dynamic information is known to be important, for example, via expression and/or visual speech processing. Alternatively, the stored representations of familiar faces may themselves be dynamic in nature, intrinsically linking specific motion characteristics to specific identities. 
It is difficult to predict the nature of these dynamic representations although we could speculate that they are akin to short movies of the face that capture all the idiosyncratic aspects of that person’s motion. What could it mean for a representation to be dynamic? Techniques formulated by Freyd and Finke offer one way of investigating and thinking about this issue. Finke and Freyd (1985) showed that when the rotation of a visual pattern is implied, participants’ memory of that pattern tends to be displaced forwards in the direction of the implied rotation. The researchers termed this phenomenon representational momentum because of the similarity to physical momentum where an object continues along its motion path due to inertia. In common with physical momentum, the amount of representational momentum is proportional to the implied velocity of the motion (Freyd & Finke, 1985; Finke, Freyd, & Shyi, 1986), whereby the faster the implied motion, the greater the memory distortion. Accordingly, when the implied final velocity of the motion is slowed to zero, no such momentum effects are found (Finke et al., 1986). Freyd (1993) interpreted these findings as evidence that time is inextricably embedded in mental representations and that the underlying representation is dynamic in nature. In line with this suggestion, no momentum effects are found when the shape of the pattern is radically changed from image to image (Kelly & Freyd, 1987), presumably because the viewer attributes the static images as coming from different visual patterns rather than the same pattern moving around. If face representations are dynamic in nature, then we might expect participants to show a forward shift in memory for position of faces. Preliminary work on the issue of representational momentum and faces has been carried out by Thornton (1997). Thornton was interested in whether representational momentum effects can be found when

viewing moving, expressing faces. In a series of experiments, participants viewed a short video sequence of a male face moving from a neutral expression to a smile (frames 0 to 19) and back again. The forward moving sequence then began again, terminating midway as the expression was emerging but not fully established (frames 0 to 9). Participants were asked to retain the final image of the display in mind. Following a 300 ms retention period, a probe face appeared on the screen. Participants were asked to judge if the probe face was the same as the final image of the video clip or was different from that image. Sometimes the probe face was identical to the final image of the video clip (frame 9), and sometimes the probe face was a distracter image displaced forwards (frame 11 or 19) or backwards (frame 7 or 8) in the sequence. If moving faces give rise to dynamic mental representations, then participants should misremember the stopping point of the expression as being more intense than in reality, that is, there should be a forward memory shift (Thornton, 1997).

However, results indicated no such forward shift in memory, providing no evidence for representational momentum. This may, however, have been due to a number of artifactual reasons. One major issue concerned the selection of the distracter probe images. In standard representational momentum displays (for example, using simple dots or arrows), the physical difference between the same and distracter items can be systematically varied. However, when images of faces are selected from video, such precision is not possible. Indeed, the amount of change that occurs two frames ahead of the selected stopping point may be very different from the amount of change two frames behind this point. Thornton addressed this concern by asking participants to carry out a concurrent matching task, designed to determine the relative discriminability of the distracter images. 
There was little evidence to suggest that the results observed were primarily due to the discriminability of the distracter images used. Thornton suggests that the negative memory shifts may be due to memory averaging (see Freyd & Johnson, 1987) or due to a categorical perception effect (see Etcoff & Magee, 1992). If the distracter images were close to an expression boundary, then some discontinuity in the discriminability of the distracter images might arise. However, care was taken to ensure that the distracter images fell well within the expression category (smiling). Further work is needed to investigate representational momentum effects with faces and to determine the nature of the representations needed to make these kinds of matching decisions.

In addition, more research is needed to tease apart different theoretical interpretations of the effects of dynamic information on face recognition. One current line of investigation uses priming techniques (see Bruce, Burton, Hanna, & Mason, 1994; Bruce, Carson, Burton, & Ellis, 2000; Bruce, Dench, & Burton, 1993; Brunas, Young, & Ellis, 1990; Brunas-Wagstaff, Young, & Ellis, 1992; Lander & Bruce, 2004, in press) to try to pin down the locus within the identification system where dynamic properties are represented. As priming methodology and logic are somewhat peripheral to the theme of this volume, we will not take space here to describe this work in any detail. In brief, however, we have found that moving images of faces prime later identification of static images of faces even better than identical copies of the static images themselves and that this effect, like those we reviewed earlier in this chapter, appears to be due to the precise temporal characteristics of the moving sequence rather than the number of images shown or their order. This observation appears consistent with our suggestion that dynamic information is either integral to (cf. 
dynamic representations) or feeds directly into the person identity system rather than being derived indirectly from expression or facial speech analysis systems.

FACIAL MOTION AS A CUE TO IDENTITY / 143

In this chapter, we have described the ways in which motion contributes to face perception and identification. It seems to be particularly the nonrigid, expressive, and communicative movements of the face that contribute to the perception of expression, to lip-reading, and to the identification of familiar faces seen in difficult-to-recognize footage. In contrast, beneficial effects of rigid motion such as head-turning or nodding, which might be expected to help deliver a three-dimensional representation of the face, seem much more elusive. We are still at the stage of speculating about how facial motion is deciphered and represented in memory. In the meantime, techniques for synthesizing motion of artificial faces for movies or avatars proceed apace. For example, FaceWorks Studio™ (“Challenging the task,” 1998) is an easy-to-use facial-animation package which allows any two-dimensional, full-face picture to be converted into a three-dimensional, synthetic face that accurately synchronizes lip movements and expressions to real speech. Thus, it is possible to take a static full-face image and convert it to a moving digital face image. Such techniques potentially provide ways that we can pursue the science of face perception in future years. If it becomes possible realistically to add one person’s facial movement onto another person’s facial form, then we will be able to explore whether beneficial effects of movement are general or specific (Knappmeyer, Thornton, & Bülthoff, 2003). Finally, we note that our experiments have repeatedly revealed that difficult-to-recognize images can become quite easy to recognize when animated. Perhaps the most dramatic demonstration comes from studies reported by Lander, Bruce, and Hill (2001) where we showed that famous faces that were heavily disguised by pixelation or blurring—in the way used to conceal identity in documentary films—were readily recognized when seen in motion.
This is strongly suggestive of the possibility that we do retain in memory some knowledge of characteristic motion of different, known individuals. Impersonators are clearly able to draw on some such knowledge when mimicking their targets, and this in turn suggests that distinctive movements and mannerisms may facilitate celebrity. But as well as becoming recognized as an individual, an actor must also portray a range of different characters. Variation in facial and gestural movements, whether deliberate or a by-product of speech, should help in the depiction of different characters. When Meryl Streep varies her accent to play different roles, this must in turn affect the visual impression created by the talking head and should help to reinforce a distinctive identity from face and voice together. If we study only static images, such possibilities remain hidden from us.

Notes
Preparation of this chapter and some of the research described within it was supported by an Economic and Social Research Council grant, Ref. No: R000 22 29 89, awarded to V. Bruce and K. Lander.
1. Although faces are the most reliable means available to the human eye, fingerprints and iris patterns may prove most useful for automated identity recognition (see Daugman, 1998).
2. Thus, in the test phase, the number of images shown in the moving and static conditions was not equated as only a single image was shown in the static condition.

References
Aldridge, J., & Knupfer, G. (1994). Public safety: Improving the effectiveness of CCTV security systems. Journal of the Forensic Science Society, 34(4), 257–63.

144 / KAREN LANDER AND VICKI BRUCE

Alley, T. R. (1999). Perception of static and dynamic computer displays of facial appearance in applied settings. In L. D. Rosenblum (Chair), Reconsidering the information for faces. Symposium conducted at the Tenth International Conference on Perception and Action (ICPAX). Edinburgh, UK.
Bassili, J. N. (1978). Facial motion in the perception of faces and emotional expression. Journal of Experimental Psychology: Human Perception and Performance, 4, 373–79.
Beardsworth, T., & Buckner, T. (1981). The ability to recognize oneself from a video recording of one’s movements without seeing one’s body. Bulletin of the Psychonomic Society, 18(1), 19–22.
Benoît, C., Abry, C., Cathiard, M. A., Guiard-Marigny, T., & Lallouache, M. T. (1995). Read my lips: Where? How? When? And so . . . what? In B. G. Bardy, R. J. Bootsma, & Y. Guiard (Eds.), Studies in Perception and Action III (pp. 423–26). Mahwah, NJ: Lawrence Erlbaum.
Bertenthal, B. I., & Pinto, J. (1994). Global processing of biological motions. Psychological Science, 5(4), 221–25.
Bertenthal, B. I., Proffitt, D. R., & Kramer, S. J. (1987). Perception of biomechanical motions by infants: Implementation of various processing constraints. Journal of Experimental Psychology: Human Perception and Performance, 4, 577–85.
Bingham, G. P. (1987). Kinematic form and scaling—Further investigations on the visual perception of lifted weight. Journal of Experimental Psychology: Human Perception and Performance, 13(2), 155–77.
Bruce, V. (1994). Stability from variation: The case of face recognition. The M. D. Vernon Memorial Lecture. Quarterly Journal of Experimental Psychology, 47A, 5–28.
Bruce, V., Burton, A. M., Carson, D., Hanna, E., & Mason, O. (1994). Repetition priming of face recognition. Attention and Performance, 15, 179–201.
Bruce, V., Carson, D., Burton, A. M., & Ellis, A. W. (2000). Perceptual priming is not a necessary consequence of semantic classification of pictures. Quarterly Journal of Experimental Psychology, Section A: Human Experimental Psychology, 53(2), 289–323.
Bruce, V., Dench, N., & Burton, M. (1993). Effects of distinctiveness, repetition, and semantic priming on the recognition of face familiarity. Canadian Journal of Experimental Psychology, 47(1), 38–60.
Bruce, V., & Henderson, Z. (2000, July). Getting to know you . . . how we learn new faces. Paper presented at meeting of Experimental Psychology Society, Cambridge, England.
Bruce, V., Henderson, Z., Greenwood, K., Hancock, P. J. B., Burton, A. M., & Miller, P. (1999). Verification of face identities from images captured on video. Journal of Experimental Psychology: Applied, 5(4), 339–60.
Bruce, V., Henderson, Z., Newman, C., & Burton, A. M. (2001). Matching identities of familiar and unfamiliar faces caught on CCTV images. Applied Cognitive Psychology, 15, 445–64.
Bruce, V., & Langton, S. (1994). The use of pigmentation and shading information in recognising the sex and identities of faces. Perception, 23, 803–22.
Bruce, V., & Valentine, T. (1988). When a nod’s as good as a wink. The role of dynamic information in facial recognition. In M. Gruneberg & E. Morris (Eds.), Practical Aspects of Memory: Current Research and Issues (pp. 169–74). New York: John Wiley.
Bruce, V., & Young, A. W. (1986). Understanding face recognition. British Journal of Psychology, 77, 305–27.
Brunas, J., Young, A. W., & Ellis, A. W. (1990). Repetition priming from incomplete faces—evidence for part to whole completion. British Journal of Psychology, 81(1), 43–56.
Brunas-Wagstaff, J., Young, A. W., & Ellis, A. W. (1992). Repetition priming follows spontaneous but not prompted recognition of familiar faces. Quarterly Journal of Experimental Psychology, Section A: Human Experimental Psychology, 44(3), 423–54.
Burton, A. M., Bruce, V., & Hancock, P. J. B. (1999). From pixels to people: A model of familiar face recognition. Cognitive Science, 23, 1–31.

Burton, A. M., Bruce, V., & Johnston, R. A. (1990). Understanding face recognition with an interactive activation model. British Journal of Psychology, 81, 361–80.
Burton, A. M., Wilson, S., Cowan, M., & Bruce, V. (1999). Face recognition in poor quality video: Evidence from security surveillance. Psychological Science, 10, 243–48.
Campbell, R. (1986). The lateralisation of lip-reading: A first look. Brain and Cognition, 5, 1–21.
Challenging the task of facial animation head on. (1998, Sept). Computer Graphics World, 21(9), 19–20.
Christie, F. (1997). Face processing: The role of dynamic information. Unpublished doctoral dissertation, University of Stirling, Scotland.
Christie, F., & Bruce, V. (1998). The role of dynamic information in the recognition of unfamiliar faces. Memory & Cognition, 26(4), 780–90.
Cutting, J. E., & Kozlowski, L. T. (1977). Recognising friends by their walk: Gait perception without familiarity cues. Bulletin of the Psychonomic Society, 9, 353–56.
Daugman, J. (1998). Phenotypic vs. genotypic approaches to face recognition. In H. Wechsler, P. J. Phillips, V. Bruce, F. F. Soulié, & T. S. Huang (Eds.), Face recognition: From theory to applications (pp. 108–23). New York: Springer.
Dittrich, W. H., Troscianko, T., Lea, S. E. G., & Morgan, D. (1996). Perception of emotion from dynamic point-light displays represented in dance. Perception, 25(6), 727–38.
Duchenne, G. B. (1990). The Mechanism of Human Facial Expression. Cambridge: Cambridge University Press. (Original work published 1862 as Mécanisme de la Physionomie Humaine ou Analyse Électro-physiologique de l’Expression des Passions).
Ekman, P. (1982). Emotion and the human face (2nd ed.). Cambridge, England: Cambridge University Press.
Ekman, P., & Friesen, W. V. (1982). Felt, false, and miserable smiles. Journal of Nonverbal Behavior, 6(4), 238–52.
Ekman, P., Friesen, W. V., & Simons, R. C. (1985). Is the startle reaction an emotion? Journal of Personality and Social Psychology, 49(5), 1416–26.
Ellis, H. D., Shepherd, J. W., & Davies, G. M. (1979). Identification of familiar and unfamiliar faces from internal and external features: Some implications for theories of face recognition. Perception, 8, 431–39.
Etcoff, N. L., & Magee, J. J. (1992). Categorical perception of facial expressions. Cognition, 44, 227–40.
Finke, R. A., Freyd, J. J., & Shyi, G. C. W. (1986). Implied velocity and acceleration induce transformations of visual memory. Journal of Experimental Psychology: General, 115, 175–88.
Freyd, J. J. (1993). Five hunches about perceptual processes and dynamic representations. In D. Meyer, & S. Kornblum (Eds.), Attention and performance, XIV: Synergies in experimental psychology, artificial intelligence and cognitive neuroscience—a silver jubilee (pp. 99–120). Cambridge, MA: MIT Press.
Freyd, J. J., & Finke, R. A. (1985). A velocity effect for representational momentum. Bulletin of the Psychonomic Society, 23, 443–46.
Freyd, J. J., & Johnson, J. Q. (1987). Probing the time course of representational momentum. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13(2), 259–68.
Hess, U., & Kleck, R. E. (1990). Differentiating emotion elicited and deliberate facial expressions. European Journal of Social Psychology, 20, 369–85.
———. (1994). The cues decoders use in attempting to differentiate emotion-elicited and posed facial expressions. European Journal of Social Psychology, 24, 367–81.
Hill, H., & Pollick, F. E. (2000). Exaggerating temporal differences enhances recognition of individuals from point light displays. Psychological Science, 11(3), 1–5.
Humphreys, G. W., Donnelly, N., & Riddoch, M. J. (1993). Expression is computed separately from facial identity, and it is computed separately for moving and static faces: Neuropsychological evidence. Neuropsychologia, 31(2), 173–81.

Johansson, G. (1973). Visual perception of biological motion and a model for its analysis. Perception and Psychophysics, 14, 201–11.
Kamachi, M., Bruce, V., Mukaida, S., Gyoba, J., & Yoshikawa, S. (2001). Dynamic properties influence the perception of facial expressions. Perception, 30, 875–87.
Kelly, M. H., & Freyd, J. J. (1987). Explorations of representational momentum. Cognitive Psychology, 19, 369–401.
Kemp, R., Pike, G., White, P., & Musselman, A. (1996). Perception and recognition of normal and negative faces: the role of shape from shading and pigmentation cues. Perception, 25, 37–52.
Kleinke, C. L. (1986). Gaze and eye contact: A research review. Psychological Bulletin, 100(1), 78–100.
Knappmeyer, B., Thornton, I. M., & Bülthoff, H. H. (2003). Facial motion can bias the perception of facial identity. Vision Research, 43, 1921–36.
Knight, B., & Johnston, A. (1997). The role of movement in face recognition. Visual Cognition, 4(3), 265–73.
Kozlowski, L. T., & Cutting, J. E. (1977). Recognizing the sex of a walker from a dynamic-point light display. Perception and Psychophysics, 21, 575–80.
Lander, K., & Bruce, V. (2000). Recognizing famous faces: Exploring the benefits of facial motion. Ecological Psychology, 12, 259–72.
———. (2004, in press). Repetition priming from moving faces. Memory & Cognition, 10, 897–912.
Lander, K., Bruce, V., & Hill, H. (2001). Evaluating the effectiveness of pixelation and blurring on masking the identity of familiar faces. Applied Cognitive Psychology, 15, 101–16.
Lander, K., Christie, F., & Bruce, V. (1999). The role of movement in the recognition of famous faces. Memory & Cognition, 27(6), 974–85.
Lander, K., & Chuang, L. (2004, in press). Why are moving faces easier to recognize? Visual Cognition.
Lander, K., Humphreys, G. W., & Bruce, V. (2004, in press). Exploring the role of motion in prosopagnosia: Recognizing learning and maturing faces. Neurocase.
Langton, S. R. H. (2000). The mutual influence of gaze and head orientation in the analysis of social attention direction. Quarterly Journal of Experimental Psychology, Section A: Human Experimental Psychology, 53(3), 825–45.
Macmillan, N. A., & Creelman, C. D. (1990). Response bias: Characteristics of detection theory, threshold theory, and “nonparametric” indexes. Psychological Bulletin, 107(3), 401–13.
Marr, D. (1982). Vision. San Francisco: W. H. Freeman.
Massaro, D. W. (1987). Speech perception by ear and eye: A paradigm for psychological inquiry. Hillsdale, NJ: Lawrence Erlbaum.
McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices: A new illusion. Nature, 264, 746–48.
Munhall, K. G., & Vatikiotis-Bateson, E. (1998). The moving face during speech communication. In R. Campbell, B. Dodd, & D. Burnham (Eds.), Hearing by eye II: Advances in the psychology of speechreading and auditory-visual speech (pp. 123–39). Hove, UK: Psychology Press.
Murase, H., & Sakai, R. (1996). Moving object recognition in eigenspace representation: Gait analysis and lip reading. Pattern Recognition Letters, 17, 155–62.
Niyogi, S. A., & Adelson, E. H. (1994). Analyzing and recognizing walking figures in XYT. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 469–74). Seattle, WA.
O’Toole, A. J., Roark, D. A., & Abdi, H. (2002). Recognizing moving faces: A psychological and neural framework. Trends in Cognitive Sciences, 6, 261–66.
Parke, F. I., & Waters, K. (1996). Computer facial animation. Wellesley, MA: A. K. Peters.

Pike, G. E., Kemp, R. I., Towell, N. A., & Phillips, K. C. (1997). Recognising moving faces: The relative contribution of motion and perspective view information. Visual Cognition, 4(4), 409–37.
Reisberg, D., McLean, J., & Goldfield, A. (1987). Easy to hear but hard to understand: A lipreading advantage with intact auditory stimuli. In B. Dodd and R. Campbell (Eds.), Hearing by eye: The psychology of lipreading (pp. 97–114). London: Lawrence Erlbaum.
Rosenblum, L. D., & Saldaña, H. M. (1996). An audiovisual test of kinetic primitives for visual speech perception. Journal of Experimental Psychology: Human Perception and Performance, 22(2), 318–31.
———. (1998). Time-varying information for visual speech perception. In R. Campbell, B. Dodd, & D. Burnham (Eds.), Hearing by eye II: Advances in the psychology of speechreading and auditory-visual speech (pp. 61–81). Hove, UK: Psychology Press.
Runeson, S., & Frykholm, G. (1981). Visual perception of lifted weight. Journal of Experimental Psychology: Human Perception and Performance, 7, 733–40.
Schiff, W., Banka, L., & De Bordes Galdi, G. (1986). Recognizing people seen in events via dynamic “mugshots.” American Journal of Psychology, 99, 219–31.
Shepherd, J. W., Ellis, H. D., & Davies, G. M. (1982). Identification evidence: A psychological evaluation (pp. 100–5). Aberdeen, UK: Aberdeen University Press.
Stevenage, S. V., Nixon, M. S., & Vince, K. (1999). Visual analysis of gait as a cue to identity. Applied Cognitive Psychology, 13(6), 513–26.
Sumby, W. H., & Pollack, I. (1954). Visual contribution to speech intelligibility in noise. Journal of the Acoustical Society of America, 26, 212–15.
Thornton, I. M. (1997). The perception of dynamic human faces. Unpublished doctoral dissertation, University of Oregon, Eugene, Oregon.
Thornton, I. M., & Kourtzi, Z. (2002). The perception of dynamic human faces. Perception, 31, 113–32.
Vitkovitch, M., & Barber, P. (1994). Effect of video frame rate on subjects’ ability to shadow one of two competing verbal passages. Journal of Speech and Hearing Research, 37, 1204–11.
Wagenaar, R. C., & van Emmerik, R. E. A. (1994). Dynamics of pathological gait. Human Movement Science, 13(3–4), 441–71.
Walden, B. E., Prosek, R. A., Montgomery, A. A., Scherr, C. K., & Jones, C. J. (1977). Effects of training on the visual recognition of consonants. Journal of Speech and Hearing Research, 20, 130–45.
Weiss, F., Blum, G. S., & Gleberman, L. (1987). Anatomically based measurement of facial expressions in simulated versus hypnotically induced affect. Motivation and Emotion, 11, 67–81.
Young, A. W., Hay, D. C., McWeeny, K. H., Flude, B. M., & Ellis, A. W. (1985). Matching familiar and unfamiliar faces on internal and external features. Perception, 14(6), 737–46.

Part Five
Coupling of Perception and Emotion

In the 1990s, when cognitive film theory (remember that cognitive in the field of film studies broadly denotes an approach to film study that seeks to incorporate the findings and methods of science into the study of film) was just beginning to gain momentum, it was generally thought that such an approach could deal only with conscious and rational responses to film and that emotional or nonrational responses would be addressed far more adequately by psychoanalysis or feminism. The work that has been published since has dispelled that notion. A number of works in cognitive film theory have advanced our understanding of the role of emotion in film viewing. In the essays that follow, Torben Grodal and Dolf Zillmann are working toward incorporating an ecological perspective into the study of film and emotion. Ecological psychology has, of course, focused on perception and action, seldom venturing into the murky waters of emotion. Yet, if we are to do justice to film theory, issues such as the viewer’s engagement with fictional characters, narrative pattern, and narrative comprehension and the appeals of genres like melodrama, horror films, and comedy, where emotional responses are fundamental to our interest, must be addressed. In these two essays, Grodal and Zillmann tackle two difficult subjects: film lighting’s role in the creation of mood and the diversity of emotional responses to fictional dramas. Grodal, while still using the language and some of the concepts of cognitive science, is approaching the ecological concept of affordances in his assertion that “our visual attention is normally intimately linked with our human concerns. What we focus on is what is central to our concerns at a given moment, we cannot separate our interests and our attention.” And his consideration of film lighting is at base more ecological than it might at first appear.
Grodal points to a fundamental concern with the facilitation of information pick-up in the canonical lighting situation. He explains how the lighting techniques adopted by the Hollywood system are designed to provide “an optimum of object information” and are often combined with the presentation of objects or characters from angles that also provide optimum information. And he notes that alternate methods of lighting result in the objects, characters, or scenes being seen as expressive or theatrical—in his words, “perceived as representations under certain contingent lighting conditions.” His analysis of the film viewer’s experience of differently lighted film scenes is based upon a distinction between lighting that is natural (that is, lighting that approximates our experience with the natural world) and lighting that deviates in some way from this norm of naturalness. He is, in other words, asking whether the lighting being employed in a given film scene is ecologically valid and what effect any differences might have on the film viewer’s response to the scene. In our own work, we have found that viewers display a sensitivity to interactions between characters and objects on the screen that deviate from the laws of ecological
dynamics. Viewers respond immediately and decisively to what we have described as small violations of the laws governing the ways people and things move and interact. Just as the scenes that violate ecological dynamics are rated as less realistic in these studies, so, too, as Grodal points out, lighting effects that deviate from the norm are marked as artificial or theatrical and considered expressive of a film’s style or artfulness. At a more general level, Grodal raises an interesting question for ecological psychology. His use of the term affordances, while departing in some ways from its use in ecological psychology, prompts one to ask in what way emotions are involved in the perception of affordances. Is an action the only possible response to a perception, or is an emotional response equally ecologically valid? Is it as integral a part of the perceptual act? Is the perception of an opportunity for an emotional experience fundamentally different from the perception of an opportunity for physical action? In “Cinematic Creation of Emotion,” Zillmann takes an evolutionary if not expressly ecological approach to viewers’ affective responses to moving images, explaining the concept of excitation transfer, its relevance for the structuring of motion pictures, and the moral framework in which one’s affective disposition directs and even overrides basic empathic responses. If one’s perceptual systems are designed for interaction with the environment, it presents a challenge for an ecological approach to explain the film viewer’s engagement with “others’ confrontations with threatening conditions and fortuitous circumstances” instead of merely witnessing the fate of others. Zillmann goes a long way toward placing the film viewer in an ecologically viable position with regard to cinematic narratives by stressing the role played by empathy, “an archaic mechanism that, through the millennia, served emotional contagion and the coordination of action . . . 
[which] ultimately served the preservation of individuals and their species.” Though it is seldom in our contemporary world that this mechanism is called upon to jointly excite members of a group and prepare them for an action such as flight or attack, empathy remains a powerful means of “excitatory contagion” that, as Zillmann points out, plays a central role in our engagement with characters in fictional situations. Furthermore, witnessed events (in the world or in films) are assessed in moral terms, the approval of actions prompting dispositions of liking and caring and the disapproval prompting dispositions of disliking and resentment. To be able to intuit another person’s intentions has always been crucial to our survival. In the distant past, our lives depended upon our ability to accurately judge the moral character of other individuals of our own species. It is no less the case today; we must quickly judge whether we can trust a new acquaintance; we must know whether he will befriend or exploit us. Likewise, we observe the behavior of old acquaintances and continually update our assessments of their characters. The capacities to cope with these problems were developed through evolution, and the manifestations of those capacities are, as we might expect, similar from culture to culture. It is within this framework that judgments of social justice are rendered, judgments that allow for a viewer’s moral sanctioning of a variety of dramatic resolutions. In Zillmann’s words, “empathy functions as a basic default mechanism that, if not opposed and overpowered by affective dispositions that derive from assessments of deservingness, governs emotional reactivity to the observed fate of others. The condemnation of others’ conduct and the resulting disliking, then are indeed prerequisite to joy over their demise as well as to distress over their good fortunes.”

In the consideration of mood, empathy, and moral judgments, no less than in the consideration of perceptual systems, if we are to be open to an ecological approach, the human observer must be placed squarely in his ecological niche, bounded on every side by the biological and psychological capacities developed through evolution. Filmmakers exploit those capacities in order to engage the viewer, and as film theorists, our most adaptive strategy might be to adopt a willingness to be constrained by our knowledge of those capacities in our struggle to understand the viewing of motion pictures.

9
Film Lighting and Mood
Torben Grodal

Lighting is one of the most powerful means of creating effect in films. The research on lighting as a means of creating effect has mainly been pragmatic and based on single-case observations. Different cinematographers have commented on their experiments with different types of lighting (cf. e.g., Mankiewitz, 1986; Schaefer & Salvato, 1984), just as handbooks in film interpretation and film production (cf. Bordwell & Thompson, 1990; Monaco, 1977) have asserted some rules of thumb induced from typical practices and by means of introspection. There may be good reasons for the fact that, compared with narration, for instance, there exists no such thing as a theory of film lighting: The experience of light is a basic one, linked to numerous different situations, and the experience might not be derived from a small set of principles with unambiguous effects. And although the perception of light, including film light, is based on some innate capabilities, many versions of film lighting deviate from those natural conditions for which our visual capacities have been developed. More research has been done on lighting within art (cf. e.g., Arnheim, 1974). The following does not pretend to be able to put forward a general theory of film lighting but will provide an account of some of the dominant metaphoric descriptions of the effects of film lighting and provide some reasons for these effects. Describing the physical or technical layout of a given type of lighting is fairly easy; it is often possible to get descriptions from some of the people arranging the lighting. The problem of the intended effects is a much thornier one. To describe the cognitive effects of lighting—for instance, the way in which a given light enhances or impedes object recognition and object salience—in itself poses a series of problems for description.
Mostly, however, the description of the effects of lighting is aimed at a larger endeavor, namely, to describe the way in which lighting aspectualizes the emotional experience of a given scene, resulting in sad, scary, or euphoric experiences. Although such moods may be analyzed in connection with an overall analysis of a given scene, it still raises the problem of how lighting contributes to mood. When cinematographers want to describe the effects of different types of lighting, they mostly use metaphors. Some of those are tactile (soft versus hard light, warm versus cold colors), others are muscular-kinetic: a given type of light provides a punch or a kick to the image. Such descriptions may not be just metaphoric in a vague sense but indications of ways that the viewer relates to given visual phenomena. To say that the light is soft, and thus also the objects illuminated with the soft light, may simply indicate the experience that the possible contact with the objects is evaluated as soft. To say that an image has a punch may mean that the viewer has some low-level experiences of some qualities in the image that are dynamic and possibly suggest a “hard” interaction. A stone in a film is neither more
nor less soft if it is illuminated with soft light; on a narrative level, it is hard, but it might be suggested that at a general experiential level, the lighting will modify its perceived qualities in the mind of the viewer, for instance, in the form of a general mood that represents the way in which its qualities are suggested, what they seem to afford for the viewer.1 Such very general esthetic experiences linked to lighting must, for reasons that I will explain later, be based in part on innate factors. I will therefore start with a discussion of the extent to which our experience of lighting is innate or molded by cultural factors.

The Hardwired Expressiveness of Underlighting

In the article “The Psychological Foundations of Culture” (Tooby & Cosmides, 1992), John Tooby and Leda Cosmides have shown how the social sciences for the last eighty years have been dominated by a culturalist paradigm. The dominant idea in this paradigm is that all human behavior is a product of culture, that innate specifications and constraints have no part in the creation of human behavior. Tooby and Cosmides provide a powerful criticism of the culturalist paradigm and show how cultural development takes place on the basis of a biological design that supports and enables but also puts some constraints on the cultural development. The humanities in general and film studies in particular have also been dominated by a culturalist paradigm. Numerous books on film theory and film history have claimed that visual perception is a strongly historical product, influenced by ideology, social conditions, and the way in which visual representations in film, photography, and painting mold the visual perception. Not all film theorists, however, have accepted that view. Joseph D.
Anderson (1996), for instance, has shown that a series of conventions in film editing can be explained on the basis of innate features of the human visual system, and I have elsewhere (1997) shown that the way in which Renaissance perspective has been described as an ideological construction is problematic. A strong illustration of the way in which visual experience is based on an innate brain architecture is the way in which underlighting in film (and in real life) is experienced. Underlighting refers to the phenomenon of space and objects being lighted by a source of light that comes from below. Under those natural lighting conditions, which existed during that prehistoric period of time in which our visual system developed, directed light always arrives at a scene from above, from the sun or the moon. Human beings probably got permanent access to cultural sources of light (fires) only after our present visual system was fully developed. Thus, underlighting is a strongly antinaturalistic effect. If our visual system were strongly determined by cultural factors, the use of underlighting would cause a habituation. But this is not the case. Most, if not all, critics agree that underlighting is systematically perceived as providing a strong “unnatural” salience to the underlighted objects and spaces. Many critics would furthermore state that this unnatural salience is interpreted as being negatively toned. Thus, James Monaco states that lighting from below gives it a “lugubrious appearance” (1977, p. 164). Kris Mankiewitz states, “As the saying goes, good people are lit from heaven and the bad people are lit from hell” (1986, p. 133). These clichés are not as obvious in today’s more natural and often softer lighting, yet the angle of light and the composition of light in the frame remain some of the most powerful tools for the creation of mood and for the shaping of an actor’s face. 
That underlighting should always be interpreted as uncanny or negative is easily refuted by analyzing, say, cozy or romantic scenes in which a fireplace provides the light. But, nevertheless, such scenes are also perceived as having a highly deviant salience although the emotional effect is contextualized (labeled) positively.

154 / TORBEN GRODAL

The reason for the unnatural salience of underlighted objects is very straightforward. As shown by, for instance, Vilayanur Ramachandran (1988), the analysis of spatial proportions is—among other factors—based on the shading. The analysis of shape from shading takes place by innate modules that work under the assumptions that light is coming from above and from a single source. Ramachandran produced a drawing of circles of which half of the area was shaded. Those with shadings in the upper half of the circle are perceived as concavities, those with shadings in the lower half are perceived as projections in space. If the paper drawing is turned around (so that those circles that had shadings in the lower half of the circle now had the shading in the upper half, and vice versa) the “holes” changed into “hills,” and the “hills” changed into “holes.” The perception of the form of objects thus takes place by means of a hardwired assumption that directed light comes from above and has a single source. It is therefore difficult to recognize figures with underlighting because the objects will be perceived with strange shadows and thus with strange proportions. Underlighting is therefore an example of a visual effect in which there is a natural norm of lighting, and deviations from this norm will be felt as unfamiliar (and expressive) even if they are motivated by a (cultural) source. Unfamiliarity will cause arousal (salience), but the hedonic valence (good/bad) of the arousal will, as mentioned, be determined contextually. However, many critics assume that underlighting has an uncanny effect, mostly linked to villains, and there might also be a reason for this, namely that it is easier to provide a negative than a positive contextualization for the effects of underlighting. All other things being equal, familiarity is linked to positive, upbeat feelings, unfamiliarity with negative feelings. 
That does not prevent filmmakers from contextualizing underlighting in such a way that it provides positive feelings, fuelled by the emotional salience of the deviating light, say, using a cozy source of light such as candlelight or a fireplace. But the situation is marked as “extraordinary” and as expressing a “mood.”2 The intimate relation between lighting and mood is not something that characterizes only the experience of underlighting. Many of the lighting clichés in cinema (and real life) are used in order to create or enhance moods, from romantic sunset scenes to horror-inspiring fog-clad cityscapes. A preliminary reason for this can be found by considering the difference among feelings, moods, and emotions. Feelings and moods typically express non–object-directed general emotional states. To be depressed, happy, or romantic may often be experienced as general affective dispositions. In contrast, emotions are mostly concomitant with more specified action tendencies and object relations (cf. Frijda, 1986; Grodal, 1997). A person can be in a romantic mood without having a particular liaison in mind whereas being in love implies a specified object and some action tendencies. Moods thus express unfocused dispositions. Darkness reduces object control and enhances passive experiences whether such experiences are positive (for instance, in the context of a romantic encounter linked to a voluntary reduction of control) or negative (as in a horror environment and its forced reduction of control). The conscious or unconscious evaluation of a given type of lighting will thus be felt as mood.

Attention, Highlighting, and Indexing

Under natural conditions, light is a passive condition for seeing whereas visual attention is an active condition for seeing. There are under natural conditions no active ways of controlling the degree to which objects and spaces are lit.
The basic assumption of a viewer is that the lighting conditions are objective aspects of the exterior world not caused by a communicative purpose, for instance, that some agent communicates something

FILM LIGHTING AND MOOD / 155

about the lighted scene. Variations in the lightness (and color) of the different objects and surfaces will influence the attention of the human onlooker. Thus, all other things being equal, a selectively highlighted object will stand out from the less-lighted objects or surfaces and thereby draw attention to the highlighted phenomena. Following Noël Carroll’s terminology (1988), we may call highlighting a special kind of indexing, a way of controlling attention by pointing. The experience of the highlighted phenomenon will be linked to the experience of its source. Natural highlighting is not caused by the intentions of some living creature but by the whims of Mother Nature. Patches of light in a forest and a sudden small hole in the clouds (and selective light in caves) are examples of naturally existing selective lighting. But because lighting typically varies without any connection to active human interests, the highlighted phenomena may either be transformed into a relatively disinterested esthetic experience (a beautifully lit mountaintop seen through a rift in some clouds) or eventually provide cause for a “supernatural” experience (because the onlooker for some reason constructs a metaphysical agency as causing the view). The filmmakers can also index some objects as being worthy of attention by using a culturally produced lighting scheme that expresses the filmmaker’s priorities in directing the viewer’s attention. The filmmaker may provide a selective, directed light at some object within the film frame. The highlighted object will draw the viewer’s attention by its visual prominence. Such a procedure might be an alternative to or a supplement to an indexing, a control of attention, by means of framing (and reframing) as described by Carroll. However, indexing by highlighting stands in a much more problematic relation to our innate visual assumptions than does indexing by framing and reframing. 
Our visual attention is normally intimately linked with our human concerns. What we focus on is what is central to our concerns at a given moment; we cannot separate our interests and our attention. If our concerns and interests change, our attention will automatically refocus to the new center of interest. The standard narrative film follows this link between interests and attention by demanding that a given frame is motivated, is linked to diegetic concerns. As long as the camerawork and cutting follow this rule of motivation, the indexing will strongly facilitate a seemingly seamless and naturalistic experience of the viewing process. Only when the indexing appears to be unrelated to diegetic concerns, as for instance in some art films, will the viewer’s attention be drawn to the camerawork and eventually to the way in which it is intended by some extradiegetic agency (for instance the director’s artistic intentions). In contrast to diegetic indexing by framing, diegetic indexing by highlighting will automatically be felt as artificial, either by a nonconscious extraordinary toning of the experience or as a conscious experience of artfulness, because such intentional highlighting has no natural equivalent. The experience of artfulness may be highly positive or negative, depending on individual viewer preferences and the given execution of the artfulness. Humans have only recently gained control over lighting and thus made it possible to use highlighting for indexical purposes (enhanced by the way in which the development of houses and their doors and windows has increased experiences of selective lighting). Within the arts, it is an even more recent phenomenon; ancient and non-European art use indexical highlighting only sparingly. However, painters of the Renaissance, for instance Rembrandt, experimented with selectively highlighting objects or parts of objects. Theater lighting developed schemes for selective highlighting.
After a short period of time in which filmmakers primarily used natural light, filmmakers started to experiment with sources of artificial light, for instance with the purpose of highlighting. In the era of silent cinema such an indexical use of light was often called Rembrandt lighting or Lasky lighting (cf. Jacobs, 1993). Clearly, this lighting was perceived by filmmakers as well as viewers as being artificial in a descriptive sense, for instance by being characterized as dramatic although the salience of the effect might be weakened by being provided with a naturalistic motivation (light from a window, fireplace, torch light, etc.). But the very artificial linking of narrative concerns with highlighting (the very improbability of a co-occurrence of highlight and dramatic importance) meant that indexing by light was and is experienced as an artificial effect. The artificial experience is further caused by the fact that systems of highlighting typically imply several sources of directed light, something that the eye has no natural ability to interpret. The salience of an artificial system of lighting due to the strong activation by deviation from innate norms can be observed in the phenomenon of glamour lighting described by Kristin Thompson (Bordwell, Staiger, & Thompson, 1985, p. 226). Typical glamour lighting had, besides ambient fill light, two directed sources, a key light and a backlight, which, for instance, might combine a key light source directed at the front of the face and a directed backlight that lighted the outline of the hair. The root of glamour is a French word meaning “to cast a magic spell.” The word, therefore, indicates that highlighting by a two-light-source system is perceived not only as pleasant but also as slightly “magic”; that is, there are no natural ways of interpreting the lighting of the actors, and there is no habituation to this artificial way of lighting.
The glamour effects are rooted in a deviation from innate assumptions, and although we may become accustomed to these lighting effects, they will still be experienced as “marked.”3 Even if indexical highlighting that corresponds to some central human concerns (similar to diegetic concerns) has a low probability under natural circumstances, its chance occurrence may have a powerful effect by its natural rareness. It may even lead to a metaphysical interpretation: the highlighting is intended by some supernatural agency. Thus, some filmmakers have used natural highlighting in order to provide a metaphysical dimension to objects or persons. There is a further reason for providing a highlighted object or person with metaphysical qualities. There is a certain ambiguity in the perception of a selectively lighted object. To the degree that a reasonably general illumination is the basis for our experiences of objects, the highlighted objects may gain some of the characteristics of luminous objects that possess a luminosity that is much higher than their surroundings. This is also noticeable in the use of rim light. Besides serving to isolate the contour from the background, rim light also adds a perhaps subliminally perceived halo to a person. The source of the rim light is invisible, and the effect runs counter to the dominant lighting scheme (the effect of the key light). The subliminal effect of rim light, therefore, is to provide a light radiation whose apparent, if not true, source is the contour of the person. Rudolf Arnheim has discussed in detail how painters have experimented with giving persons symbolic significance by implying that they are sources of light (1974). He furthermore argues that sources of light are not perceived as having a surface because they are without texture. Thus, persons who are highlighted will by this lack of texture derive an ethereal softness. 

Ambient and Directed Light

Light, as such, is, strangely enough, invisible, as pointed out by James J. Gibson (1986). The visual system has developed as a tool for orienting humans and animals in the world, for instance, to support motion in space and to enable object recognition,
object manipulation, etc. The experience of light is therefore linked to this function and will consist of experiences of luminous objects and surfaces. The light is invisible in its trajectory through a transparent medium, and the experience of light comes about only if the light is refracted or reflected from some substance, object, or surface. Some of the blue light may be refracted by the atmosphere and thus cause us to experience a “surface,” the blue sky, or some light will be reflected from objects like human beings, trees, or rocks that cue experiences of these surfaces and objects. Sometimes we feel that we perceive light as such, as when some rays of light hit small particles in the air, but we are actually seeing the reflections from those particles, not the light itself. For evolutionary reasons, light is primarily interesting when it serves as a vehicle for our experience of the physical environment, and vision therefore obtains information that is perceived as immanent features of the objects and surfaces. A prime example of object invariance is commonly labeled color constancy (cf. Zeki, 1993, pp. 230–37). Natural lighting has an enormous variability in its quantitative dimension from strong noon sunlight to a starlit midnight light as well as in its different qualitative dimensions, its spectral composition, or its sources (directed light versus ambient light, for instance). Different viewer positions and different trajectories of light from source to object further create an enormous variability in the perceived shapes and shadings. The human visual system is confronted with the formidable task of extracting some invariant features out of the ever-changing optic array that meets the eyes. To perceive some immanent and permanent features is thus impeded by all those factors that may be perceived as contingent, as derived from special types of lighting and viewing positions. 
We are usually not consciously aware of having extracted an invariant such as color; our awareness may be described as tacit knowledge. When we see a face with strong shadows, our conscious experience is based on its transient appearance although we will have a tacit knowledge about some permanent features of the face that exist below the shadows. We “know” that the face is perceived under special lighting conditions. This knowledge is similar to our tacit knowledge that the face has a backside, the back of the head, even if we do not see it. When we see a landscape at sunset, we have a double experience: We are seeing the landscape and its objects under special lighting conditions, and at the same time, we are somehow aware of some permanent features of the landscape and the objects. The discrepancy between the conscious experience of the transient features and the tacit experience of permanent features is partly represented in consciousness as feelings, for instance as moods. The feelings express the general affordances of the scene under these specific lighting conditions, for instance whether they facilitate or impede interaction. In Carl Dreyer’s Vampyr, the moonlit landscapes impede full object recognition, and the special viewing conditions are represented by a mood that marks the depressed visual orientation along with the diminished capabilities for action and control. A central parameter in the visual experience of objects is the relationship between the directed light and the ambient light. Directed light reaches the object directly from the light source and is then reflected, whereas ambient light is refracted by passing through some transparent material (the atmosphere for instance) or reflected from other surfaces and objects. Whereas directed light radiates from one point, ambient light arrives at a given object from multiple points, from all the surfaces of the environment of the object. 
Overcast weather will create a high ambience because the objects will be lighted with light derived from many points. Fog will also create a high ambience, but because the light will also be reflected after having been reflected by a given object, it will be increasingly difficult to trace the visual information back to the object. Extreme “post-object” ambience will ultimately make object vision impossible; the viewer will only see a uniform array of white light. Thus, the visual perception of objects and surfaces has two vanishing conditions: total darkness and total white ambience. Between those two extremes are the many different lighting configurations that enable vision. Central to human vision is depth perception, that is, seeing objects and spaces as three-dimensional. The perception of three-dimensionality is partly based on processes that are relatively independent of lighting, such as stereopsis (seeing with two eyes), the density of texture elements, and overlap of objects. Other processes are, however, very dependent on lighting in combination with the point of observation relative to a given object. This is, for instance, the case in the evaluation of distance by degree of ambience (remote objects are normally more hazy because of greater atmospheric refraction) and shading. Shading has a profound impact on our experience of objects and thus our aesthetic experience, not least because various shadings may produce radically different perceptions of the same object. To recover shape from shading may, therefore, be extremely difficult. Shading not only provides information but also may often provide “noise” that blocks not only three-dimensionality but also object recognition. Thus, David Marr has stated, “The human visual processor seems to use only coarse shading information, often but not always correctly, which is probably why shading is easily overridden by other cues” (1982, p. 248). One set of variations is linked to the source of the directed light in relation to the observer. Let me illustrate this with an example, the perception of a human face, facing the observer and illuminated by directed light only.
If the light is coming from behind the observer (the camera), there will be a minimum of shading, and the face will look “flat.” If the face is illuminated with side light, that is, light coming from either the left or right side of the object, the face will get strong shadings that will enhance the curves on the chin and make the nose ridge very prominent, but the opposite side of the head will be placed in deep shadow. The shading will enhance a three-dimensional physical appearance but distort the overall perception of the face, which is now very asymmetrical. If the source of light is directly behind the face, the face will be in deep shadow and will only exist as a dark two-dimensional surface, defined by the contour line. Thus, except for front lighting, the main effect of strong directed light is to enhance the discontinuous, dramatic aspect of the face, either by enhancing its physical and sculptural three-dimensionality (in combination with asymmetry) or by enhancing its nonphysical appearance as a two-dimensional silhouette. Furthermore, directed sidelight enhances the dramatic three-dimensionality, but it suppresses the ability of the observer to see the face as a continuous (and soft) surface by giving prominence to those aspects of the face where there is a radical change of curvature, such as the cheekbones or nose ridge. Even details of the skin surface such as pores or scars may gain a dramatic salience. The reflected, ambient light will in several ways result in the opposite effects from those caused by directed light. Because the light waves hit the object from many different points, the ambient light will not create strong shading, and if a given scene is illuminated solely with ambient light, shadows and shading will disappear. Surfaces will not be seen as defined by radical changes but as continuous surfaces. This may provide an eerie feeling. 
If, for instance, the weather is strongly overcast, the lack of shadows may provide the objects with a two-dimensional immateriality whereas other cues (such as overlapping) point to three-dimensionality. It is this natural conflict of depth information such as overlap, etc. versus surface information created by ambience that causes the salience. If ambient light is combined with directed light, the ambient light will partly fill in those areas that are shaded by the directed light, and it thus softens the discontinuous and three-dimensional tendency. A nose ridge may still have some prominence but is now part of the continuous skin surface of the face because the shadows have been softened. (Some of the effects of ambience may be produced by defocusing a given image, thus increasing continuity by blurring the visual information.) The main cinematic way of producing those two lighting elements, directed and ambient light, is by having a key light producing directed light and a fill light to produce ambient light. We may characterize four prominent types of lighting as follows:

1. strongly ambient light that enhances the perception of objects as continuous surfaces experienced with a two-dimensional lack of volume that eventually may result in an experience of immateriality
2. a combination of directed and ambient light that cues three-dimensional volume and physicality but also a perception of a two-dimensional continuous surface
3. a predominantly directed (low-key) light that enhances three-dimensionality (except when it is frontal) and suppresses the experience of continuous surfaces, eventually by blocking perception of parts of, in principle, the visible surfaces by strong attached shadows
4. a strongly directed light without any ambient light that may partly or totally dissolve the viewer’s ability to perceive the objects, including his or her ability to perceive its three-dimensionality (strong backlight, expressionist, or noir scenes with only directed light). The light dissolves the ability to perceive the object as a whole, and the attached or cast shadows may get such dominance that they are perceived as autonomous objects.

Of these four types of lighting, clearly type two provides a kind of information optimum by providing information for surface as well as volume.
So, even if this type of lighting might not be the dominant one, it would certainly be the most probable candidate for producing the “canonical” object experience by providing an optimum of object information. Another way to express this is that the object is perceived as having an absolute, immanent physical existence, contrary to when the viewer experiences the object under certain lighting conditions. Such a perception of an object immanence is a matter of perceiving invariant properties, those things that do not change as lighting and viewing angle change. The capacity to perceive the invariant properties of the object is very practical and enables the viewer to recognize the object under various lighting conditions and to evaluate how deviating lighting causes a deviant appearance. The classical Hollywood cinema’s high-key lighting scheme for interior scenes (and the use of reflectors for outdoor scenes in order to provide ambience to hard sunlight) was aimed at facilitating such a canonical object perception. This scheme was often combined with presenting objects, scenes, and persons by canonical views, that is, seen from an angle that provides an optimum of information (cf. Grodal, 1997, p. 53). But all those representations that are similar to types one, three, and four do not possess such an object immanence; they are perceived as representations under certain contingent lighting conditions. The viewer will to a certain extent compare the appearance under the contingent lighting conditions with some tacit knowledge of the canonical appearance and experience the deviation as feelings. But as the sight is not transformed to its canonical form, the deviant representation will also be taken at face value. If a viewer thinks that a crook looks sinister if part of his face is covered with deep, attached shadows, it means that the viewer does not fully mentally construct the crook as he would have looked under canonical lighting. The viewer somehow accepts the surface appearance as a valid representation, even if he or she knows that there is soft, colored flesh below the dark patches of shadows. However, by thinking that the crook looks sinister, the viewer is implicitly seeing the appearance as deviating, as expressive. We thus have two complementary mental reactions. One reaction is tacitly aimed at constructing a permanent canonical object immanence out of the variations created by lighting, and one reaction consists of perceiving the object under the given and maybe transient lighting conditions. The first reaction is the unmarked one; the second one is separated from the first by the feelings of expressiveness. When seeing a film noir, the viewer perceives the actual distorted scenes and figures, but their deviation from some implicit norm only gets access to consciousness by the feelings of expressiveness. Through contrast, even canonical lighting may acquire expressiveness. In the beginning of Orson Welles’s The Lady from Shanghai, the Welles character sits beside the Rita Hayworth character in a horse carriage. He is lit by hard sidelight that leaves most of his face in shadows whereas she is highlighted in a rather canonical way that provides her face with an angelic expressiveness in contrast to the Welles character. Expressiveness is a broad term that covers the activation of a range of different feelings, and we, therefore, need to establish more-specific links between a given deviating type of lighting and the type of feeling that is activated. Are there more-specific links between certain types of lighting and certain specific expressive feelings, and how specific are those links? I will try to answer these questions by some examples.
Darkness diminishes the intake of visual information and thus diminishes visual control and therefore the ability to act and control. So, at least on a general level, darkness should be concomitant with feelings of deactivation. Deactivation by darkness can be a forced and stressful block of action tendencies, as in horror films, and also a block of moral control, as in Martin Scorsese’s Taxi Driver. It can further be a wished-for deactivation as in a romantic night scene or a peaceful and contemplative sunset scene. So, the specific expressive qualities of darkness can only be determined by its diegetic context, by an analysis of whether the depression of activity cued by darkness is concomitant with a voluntary relaxation or is opposing a wish for control. Similarly, strong backlighting is generally expressive because it reduces three-dimensional objects to an immaterial two-dimensionality, but whether this will be experienced positively as a sublime transfiguration or negatively depends on the diegetic context: An immaterial villain is extra ominous, whereas an immaterial hero gains additional sublime qualities. But in both cases, the perceived two-dimensionality will impede concrete interaction that presupposes solid three-dimensional canonical objects and will thus engender feelings of a deviating (decreased) reality. The tendency to perceive permanent objects has been blocked, and the objects are perceived in their transient, lighting-derived form. The two examples above seem to indicate that some general expressiveness is a functional element of a given type of lighting; darkness or strong backlighting is mostly expressive. But the fine-grained molding of the expressiveness is a product of the specific diegetic contextualizations. The relation among lighting, affordances, and feelings can further be illustrated by looking into the effects of ambient light and directed light.
These two types of light are normally referred to by means of two tactile metaphors, namely soft light and hard light, respectively. These metaphors indicate not only metaphoric tactile dimensions in the visual experience but also, as mentioned earlier, some emotional connotations. Softness is mostly linked to tactile experience of surfaces, often produced by organic surfaces whereas hardness is linked to solid three-dimensional objects, often of a mineral kind. However, all humans know that human faces are predominantly soft, organic tissues, and, thus, if a viewer experiences human faces illuminated by directed light as hard, then the viewer is cued by the lighting conditions, not by his or her knowledge. Some fundamental mechanisms in the online perception override the universal knowledge of the tactile qualities of human skin and human faces. The hardness or softness of a given face or object may be enhanced by context, say, using soft light for a romantic scene or hard light for a thriller. But hard light on a romantic scene would still add an experience of hardness to the scene and characters. Thus, an explanation must also account for some context-independent innate factors in the experience of light and lighting.

Innate Factors in the Experience of Light and Lighting

Some of the learned aspects of the expressive qualities of lighting are derived from the interaction with universal and fundamental experiences, namely, the cycle of lighting caused by the sun and even aspects of the change of weather. Central aspects of the experience of light are linked to our experience of the daily and seasonal temporal flow, eventually modulated by the change of weather. The daily changes in lighting constitute a fundamental experience of a cyclical modulation of affordances and activation, linked to general modulations of mood. In real life, the daily light cycles are relatively slow, but films may dramatically speed up the process and thus provide a strong, focal awareness of the transience of our experience of objects and scenes. (The relation between transience and permanence in the experience of art is analyzed in Grodal, 2000a.)
Furthermore, films may also provide much more powerful synchronization between the characters’ central concerns and lighting. The traditional romantic symbolism of Friedrich W. Murnau’s Sunrise, where the solution of the narrative problems is linked to a sunrise, demonstrates the power of synchronizing narrative schemas with a natural scheme of lighting. Similarly, the expressive impact of silhouetting the Hayworth character in The Lady from Shanghai is derived from using a natural expressive phenomenon in a specific narrative context. Thus, on one hand, lighting relies on a series of experiences that are based on innate capabilities or universal experiential conditions. On the other hand, as also discussed in relation to indexing with light, the way in which filmmakers may synchronize such experiences with central diegetic concerns has a very small natural probability. That the solution of the problems for the Welles character in The Lady from Shanghai takes place just before dawn so that he—in the last sequence—can walk out to a “new day” in San Francisco has indeed a very low natural probability. Many viewers may recognize this by feeling that the scene is theatrical, that is, as an expressive and antirealistic effect. But the cinematic power of the scene has natural and universal roots. Light is the most important medium for gaining information about the world and guiding our interaction with our environment. Our experience of lighting is not a neutral intake of information but is welded together with feelings and moods that in shorthand tell us something about the affordances of a given scene or a given object. The experience of viewing a motion picture will be an emotional one because our feelings and
emotions will make their contributions toward expressing what a given scene affords characters or viewers. Vision and light have a special role in the formation of consciousness. The basic frame for our consciousness, when we are fully awake and conscious, is the continuous visual experience of spaces and objects. For good reasons, this visual frame cannot show us how things actually are but shows us only how things look from a certain angle and with a given lighting. Much of our knowledge about objects and their affordances will not achieve a focal and salient visual form in our consciousness. We do not see what we tacitly know is beneath shadows or is invisible from a certain point of view. Different kinds of tacit knowledge will, therefore, be attached to the conscious visual surface information and gain conscious salience by means of different kinds of feelings, as when we experience some types of ambient light as soft or feel that strong light (within limits) creates upbeat feelings. A central aspect of affordances is the reality status of what is seen, experienced as feelings. These feelings may indicate an expressive deviation from some norms as when underlighting provides a feeling of uncanniness. The experience of light and lighting is a source of knowledge about an objective, exterior world. But the experience is not disinterested. It is deeply intertwined with our concerns and subjective interests. For that reason, lighting is a powerful tool for inducing and changing feelings and moods.

Notes

1. Affordance is a concept used by the psychologist James J. Gibson in order to describe the functional relations between world, perception, and animal or human action.

2. Whether strong salience by unfamiliarity is perceived as positive or negative is also viewer-dependent. It has been shown (Rubin, 1994) that viewers select activating or relaxing films and TV programs according to situational viewer needs: Stressed viewers select relaxing films, and understimulated or “bored” viewers select exciting films.

3. The eventual experience of glamour as “magic” indicates the way in which the effect is transformed into a feeling (linked to a sublime-reality status of the object) linked to some qualities felt as being immanent in the object. This feeling of “magic immanence” is due to the aforementioned difference between attentional processes, which mostly are interpreted as active processes performed by the viewer, and the passive nature of lighting as a phenomenon that is not under subjective control but happens passively and therefore is linked to passive moods and feelings. That does not, however, preclude the viewer from also having an experience of the film as being actively narrated with light.

References

Anderson, J. D. (1996). The Reality of Illusion: An Ecological Approach to Cognitive Film Theory. Carbondale: Southern Illinois University Press.
Arnheim, R. (1974). Art and Visual Perception: A Psychology of the Creative Eye. Berkeley: University of California Press.
Bordwell, D., Staiger, J., & Thompson, K. (1985). The Classical Hollywood Cinema: Film Style and Mode of Production to 1960. London: Routledge.
Bordwell, D., & Thompson, K. (1990). Film Art: An Introduction (3rd ed.). New York: McGraw-Hill.
Carroll, N. (1988). Mystifying Movies: Fads and Fallacies in Contemporary Film Theory. New York: Columbia University Press.
Frijda, N. H. (1986). The Emotions. Cambridge: Cambridge University Press.
Gibson, J. J. (1986). The Ecological Approach to Visual Perception. Hillsdale, NJ: Lawrence Erlbaum.
Grodal, T. (1997). Art Film, the Transient Body, and the Permanent Soul. Aura. Film Studies Journal, 4(3), 33–53.
Grodal, T. (2000a). Moving Pictures: A New Theory of Film Genres, Feelings, and Cognition. Oxford: Clarendon/Oxford University Press.
Grodal, T. (2000b). Subjectivity, Realism, and Narrative Structures in Film. In I. Bondebjerg (Ed.), Moving Images, Culture & the Mind (pp. 87–104). Luton, England: University of Luton Press.
Jacobs, L. (1993). Belasco, DeMille, and the Development of Lasky Lighting. Film History, 5(4), 405–18.
Mankiewitz, K. (1986). Film Lighting: Talks with Hollywood’s Cinematographers and Gaffers. Fireside Books.
Marr, D. (1982). Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. San Francisco: W. H. Freeman.
Monaco, J. (1977). How to Read a Film: The Technology, Language, History, and Theory of Film and Media. New York: Oxford University Press.
Ramachandran, V. S. (1988). Perception of Shape from Shading. Nature, 331, 163–66.
Rubin, A. M. (1994). Media Uses and Effects: A Uses-and-Gratifications Perspective. In J. Bryant & D. Zillmann (Eds.), Media Effects: Advances in Theory and Research (pp. 417–36). Hillsdale, NJ: Lawrence Erlbaum.
Schaefer, D., & Salvato, L. (1984). Masters of Light: Conversations with Contemporary Cinematographers. Berkeley: University of California Press.
Tooby, J., & Cosmides, L. (1992). The Psychological Foundation of Culture. In J. H. Barkow, L. Cosmides, & J. Tooby (Eds.), The Adapted Mind: Evolutionary Psychology and the Generation of Culture. New York: Oxford University Press.
Zeki, S. (1993). A Vision of the Brain. Oxford: Blackwell.

10 Cinematic Creation of Emotion

Dolf Zillmann

The dynamics of emotion that govern responses to actual situations versus to cinematic presentations thereof may be much the same. There is ample research evidence that demonstrates considerable commonality in the mediation of affect by the two formats (Zillmann, 2000a). However, one principal condition exists that sets cinematic storytelling apart from alternative means of relating chains of events, and this condition proves to be pivotal in considering the creation and modification of emotional reactions. The condition in question is simply that cinematic narrative invariably compresses the time course of the happenings that make up a story and then, in delivering the story, imposes reception time (Grodal, 1997).

Emotions evoked in actuality by personal success or failure are usually allowed to run their course. A person, after achieving an important goal, may be ecstatic for minutes and jubilant for hours. Alternatively, a grievous experience may foster despair or sadness that similarly persists for comparatively long periods of time. Mostly for physiological reasons and also as a result of reflection, emotions are not momentary experiences, but cinematic narrative treats them as if they were. As a rule rather than the exception, featured events that instigate emotions are followed by the presentation of other events long before all relevant aspects of the instigated emotions have subsided. Such compression of emotional and nonemotional events has, as we will see, intriguing implications for emotional experience.

It should be noted at this point that the compression of events in cinematic narrative does not necessarily extend to fiction generally. Written prose allows readers to pause when emotionally stirred and to continue only after recovery. All presentational formats that permit the pacing of information intake afford recipients a degree of control over their affective responding.
All formats that dictate the pace of intake, whether concerning fiction or nonfiction, do not. These formats, because they impose reception time, entail unique means of evoking and escalating emotional experience. The paradigm that addresses these means focuses on the transfer of excitation from an initial emotional reaction to subsequent ones, primarily to the immediately following reaction.

Excitation Transfer

Cognitive activity does not sufficiently define emotional experience. It is generally thought that emotions entail a stirring, rousing, and driving component. This component of the emotions has been labeled arousal or excitation, and it has been conceived of in bodily terms. Two-factor theories have suggested an interaction between cognition and arousal, with cognition determining emotions in kind and arousal their experiential intensity and behavioral urgency (Hebb, 1955; Schachter, 1964). This conception has been elaborated in a three-factor theory that more fully accounts for the mostly involuntary evocation of excitatory activities as well as for their waxing and waning over time (Zillmann, 1978). Three-factor theory distinguishes between dispositional, excitatory, and experiential components of emotion. Both the dispositional and excitatory components integrate reflexive response tendencies with reactions acquired through learning, whereas the experiential component involves cognition in the service of behavioral guidance and response correction. Basic emotions, such as specific fears and aggressive impulsions, often defy rationality and are not instigated by reflection. Excitation associated with these emotions is obviously not controlled by contemplation either. Rather, archaic mechanisms mediate these reactions, whether they are made in response to actual situations or to their iconic representations. However, cognitive elaboration can function as a corrective and diminish and shortcut emotions that are recognized as inappropriate and groundless. On occasion, it also can exacerbate and even initiate emotions. This brings us to emotional misreactions that are not recognized as such, misreactions that are regularly and often deliberately created in cinematic presentations. On the well-founded premise that cognitive adaptation to stimulus change is rapid and quasi-instantaneous whereas excitatory adaptation is sluggish and time-consuming, it can be expected that persons will quickly switch cognitively from situation to situation. In contrast, excitation instigated by a first situation will persist through a second one and possibly through yet others (Zillmann, 1996a). It is established beyond doubt that excitation, once triggered, decays rather slowly.
For all practical purposes, it takes at least three minutes, often ten or more minutes, on occasion hours for excitation to return to normal levels. This is for reasons of humoral mediation. Specifically, excitatory reactions are instigated by the release of adrenal hormones (the catecholamines, epinephrine, norepinephrine, and dopamine, in particular) and, to a lesser degree, of gonadal steroids (mostly testosterone) into systemic circulation. The excitatory reactions persist until these agents are metabolized (Zillmann & Zillmann, 1996). Excitation in response to particular stimuli, then, is bound to enter into subsequent experiences. In case of contiguously placed discrete emotions, residual excitation from the first thus will intensify the immediately subsequent emotion, regardless of differences in kind. Moreover, depending on the strength of the initial excitatory reaction and the time separation of emotions elicited at later times, residual excitation may intensify experiences further down the line. This is the principle of excitation transfer.

Before considering the effects of the cinematic chunking of emotion-inducing episodes, let us take a look at common experiences of emotional overreaction in situations of rapid cognitive but sluggish excitatory adjustment to changing conditions. Let us imagine, for a moment, a lady who steps on a snake in the grass of her backyard. Deep-rooted survival mechanisms, organized in the brain’s limbic system (LeDoux, 1993), will be activated and make her jump back and possibly scream. Following this initial reaction, she might find the time to construe her emotional behavior as fear and panic. She might also notice herself shaking and thus realize that she is greatly excited. Let us imagine further that, upon looking once more at the object of her terror, she recognizes that the snake is a rubber dummy, in all probability planted by her mischievous son, who rushes onto the scene laughing his head off. This recognition, a result of instant cognitive adjustment to changing circumstances, proves her initial emotion of fear groundless and invites a new interpretation of her experiential state. If she is annoyed with her son for giving her such a scare, she is likely to become infuriated. But after fully comprehending the prank, she might consider being angry inappropriate and cognitively adjust once more, this time joining in his laughter and appraising her experience as amusement. Throughout this cognitive switching from experiential state to state, the excitatory reaction to the detected danger in the grass persisted to varying degrees. It initially determined the intensity of the fear reaction. The residual excitation from this reaction then intensified the emotion of anger and the experience of amusement. Had the lady acted out her anger, she would have overreacted in spanking her son, but transfer-intensified reacting might also have expressed itself in fits of laughter bordering on the hysterical.

Emotional overreactions of this sort are commonplace. At one time or another, everybody seems to have experienced the extraordinary intensity of frustration after rousing efforts, of joy upon the sudden resolution of nagging annoyances, of gaiety after unfounded apprehensions, or for that matter, of sexual pleasures in making up after acute conflict (Zillmann, 1998a). Irrespective of personal experiences, however, ample research evidence exists that shows the transfer intensification of all so-called active emotions (i.e., emotions associated with increased levels of excitation), of their experiential states as well as of their behavioral manifestations (Zillmann, 1996a). It has been demonstrated experimentally, for instance, that residual excitation from sexual excitement can intensify anger and aggressive behavior, but also altruistic feelings and supportive actions. Moreover, residual excitation from either sexual excitement or disgust has been found to facilitate such diverse emotional experiences as the enjoyment of music, the appreciation of humor, and feelings of sadness.
Residues from feelings of sadness and fear, in turn, have been found to intensify joyous reactions to fortuitous happenings. Frustration has been observed to intensify euphoric as well as angry subsequent feelings. Even nonemotionally induced, hedonically neutral excitation was found to transfer into subsequent states. Specifically, it has been demonstrated that residual excitation from strenuous physical exercise can enhance feelings of anger and aggressive behavior, intensify sexual arousal, promote help-giving, elicit feelings of grandiosity and elation, foster favorable reactions to advertisements, and facilitate sexual attraction. The bottom line is that residual excitation from essentially any emotional reaction is capable of intensifying any other emotional reaction. The degree of intensification depends, of course, on the magnitude of residues prevailing at the time. Cinematic Dramaturgy of Transfer In the empirical exploration of the enjoyment of cinematic presentations, the excitation-transfer paradigm has been employed, primarily, to explain the suspense paradox. Why is it that emotional distress from witnessing protagonists in peril can be converted to joy upon suspense resolution? Moreover, how can it be that greater initial distress fosters more joy in the end (Carroll, 1990)? Research with both children and adults has firmly established this long presumed relationship (Zillmann, 1996b). Excitation transfer is its explanation. Specifically, the distressing experience of suspense is arousing, and residues of this arousal linger through resolution and intensify the experience of relief and euphoria. Again, cognitive adjustment to the changed circumstances featured in the resolution is rapid, whereas excitatory adjustment is drawn out. The more intense the suspense-induced distress, finally, the greater the excitatory residues that come to
energize joyous reactions to the satisfying outcomes of the resolution. Narratives that present seemingly doomed protagonists who struggle against hostile environments and who merely manage to survive perhaps best illustrate the transfer logic. There may be little heroism, if any, in this getting away with dear life. The resolution thus offers little to celebrate and to be jubilant about. Such minimal-heroism resolution can be intensely enjoyed, however, when appropriately preceded by empathic torment. Another narrative domain that has received some attention in these terms is tragedy (de Wied, Zillmann, & Ordman, 1994; Zillmann, 1998b). It has been observed that highly empathic recipients are more distressed by and literally shed more tears about tragic happenings than do their less empathic counterparts. The resolution again offers little that might be celebrated, but those who are particularly distressed by the tragic events take whatever redeeming value there may be in the resolution as a cue for happiness, experience its arousal intensification, and report greater enjoyment of tragic drama overall. Other research has shown the transfer intensification of humor (Zillmann, 2000b). The concept of comic relief obviously focuses on relief. Its cinematic form may well serve this purpose in preventing excessive distress (King Jablonski & Zillmann, 1995; Zillmann, Gibson, Ordman, & Aust, 1994). The concept can also be construed as one that maximizes mirth in response to comic situations. Comic material may be mediocre but is bound to produce strong reactions after tense, arousing scenes. These arousal-enhanced reactions of amusement, especially when overtly expressed in laughter, may have a cumulative effect and result in assessments of greater enjoyment of drama that provides frequent opportunities for comic relief. Most research on the transfer-facilitation of emotions has been conducted independent of cinematic considerations. 
However, demonstrations that, for example, residual arousal from distress can facilitate subsequent sexual excitement or that excitatory residues from fear can intensify feelings of sympathy and support are directly applicable to cinematic dramaturgy. Scenes can be aggregated in ways that maximize emotional reactivity to some and minimize it to others. For instance, arousing violence preceding the display of sexual behavior will intensify reactions to the sexual scenes, and arousing, distressing torture will energize jubilation and applause to the punitive dismemberment of the torturer. Transfer theory projects such facilitation for all scene-evoked affective reactions, provided that the afore-placed scenes arouse and that residues of the arousal outlast these scenes.

In developing a dramaturgy of transfer more formally, the following propositions can be stated.

(a) Arousing scenes from which excitatory residues are to be transferred into subsequent scenes must terminate prior to appreciable dissipation of excitation.

(b) The intensification of affect in response to subsequent scenes is a function of the magnitude of excitation elicited by these scenes plus that of residual excitation from preceding scenes.

(c) Affect facilitation is stronger the more immediate the placement of subsequent scenes.

(d) Affect facilitation is stronger the less drawn out the subsequent scenes.

(e) In case both antecedent and subsequent scenes are strongly arousing, affect facilitation can escalate. The escalation is limited, however, by experiential maxima for excitation. A law of initial values specifies that excitatory contributions from arousing scenes are inversely proportional to the height of prevailing levels of arousal (Wilder, 1957). In other words, as experienced arousal levels increase, successively less excitation from arousing scenes can be added. The law thus renders the aggregation of highly arousing scenes comparatively inefficient for transfer.

(f) The facilitation of affect in response to scenes that are separated from preceding arousing scenes by unarousing scenes is stronger the shorter the time of the separating scenes. Facilitation terminates, of course, with the complete dissipation of residual excitation.

(g) The facilitation of affect in response to subsequent scenes is prevented by delaying their placement until excitatory residues from preceding scenes have completely dissipated.

Fig. 10.1. Principal forms of transfer intensification of emotional responses to contiguous scenes. S-specific excitation refers to excitation in response to the particular scene. Residual excitation refers to portions of excitation that remain from exposure to preceding scenes. See text for further explanation.

The first schema in figure 10.1, proceeding from left to right, shows a highly arousing scene (S1) followed by four unarousing scenes (S2–S5). Residual excitation is presumed to dissipate by one-third from scene to scene. As can be seen, reactions to S2, whatever its contents, will be highly emotional even though this scene does not contribute arousal. Transfer intensification by S1 is successively weaker for the subsequent scenes.

The second schema indicates transfer under the same conditions, except that S2–S5 are now presumed to contribute minor degrees of arousal (one-sixth of S1). These contributions, as can be seen, retain excitation at comparatively high levels for S2–S5. The logic is that of compound interest. S2 combines excitation from its stimuli with residues from S1. S3 also combines excitation from its stimuli with residues from its antecedent, S2, with the S2 residues entailing portions of the residues of S1. All later scenes analogously benefit from the combined residues of their antecedents.

In the third schema, S2 to S5 are presumed to supplement the amount of S1 excitation that is lost to decay (one-third). As can be seen, high levels of excitation can be maintained by such supplementation.

The fourth schema displays excitatory escalation for subsequent scenes by making them contribute one-half the excitation of S1, the highly arousing initial scene. As can be seen, the escalation is negatively accelerated, leveling out eventually.
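The arithmetic behind these schemata can be made explicit. The following Python sketch is offered only as an illustration, not as the model used to draw figure 10.1. It assumes that the excitation prevailing during a scene is simply the residue carried over from the preceding scene plus the scene's own contribution, that the residue is two-thirds of the preceding level (dissipation by one-third per scene), and that S1 is normalized to 1. The function name `excitation_levels`, the `decay` parameter, and the schema labels are hypothetical, and the sketch ignores the ceiling implied by the law of initial values.

```python
from fractions import Fraction


def excitation_levels(contributions, decay=Fraction(1, 3)):
    """Return the excitation prevailing during each scene.

    Assumption (not taken from the figure itself): level = residue +
    contribution, where residue is the previous level reduced by `decay`.
    """
    levels = []
    residue = Fraction(0)  # nothing is carried into the first scene
    for contribution in contributions:
        level = residue + Fraction(contribution)
        levels.append(level)
        residue = level * (1 - decay)  # portion surviving into the next scene
    return levels


S1 = Fraction(1)  # highly arousing initial scene, normalized to 1

# Scene-specific contributions for S1..S5 in the first four schemata.
schemata = {
    "1: unarousing S2-S5": [S1, 0, 0, 0, 0],
    "2: S2-S5 add one-sixth of S1": [S1] + [S1 / 6] * 4,
    "3: S2-S5 replace the lost third": [S1] + [S1 / 3] * 4,
    "4: S2-S5 add one-half of S1": [S1] + [S1 / 2] * 4,
}

for label, contributions in schemata.items():
    levels = excitation_levels(contributions)
    print(label, [str(level) for level in levels])
```

Under these assumptions the first schema yields 1, 2/3, 4/9, 8/27, and 16/81; the third stays at 1 throughout; and the fourth climbs by ever smaller steps toward one and a half times S1, which corresponds to the negatively accelerated escalation. Other patterns of arousing and unarousing scenes can be explored by passing a different list of contributions.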
The last schema illustrates transfer in a contrived possible situation. S2 and S4 are presumed to be unarousing, S3 to be mildly arousing (one-sixth of S1), and S5 to be moderately arousing (one-third of S1). The schema indicates that intermixed nonarousing scenes can be made to appear considerably arousing and that the occasional usage of mildly and moderately arousing scenes can maintain excitation and therefore affect intensity at comparatively high levels. Strategies for the creation of scene compositions with optimal emotional effects can be constructed by applying propositions (a) through (g). Such strategies are also apparent from inspection of the schemata in the figure.

Origins of Excitation

Evolutionary psychology has emphasized the fight-flight reaction (Cannon, 1932). Individuals are thought to continually monitor their environment for danger and to respond with attack or escape when detecting it. A burst of energy is needed to respond in such fashion, and the immediate instigation of sympathetic excitation (i.e., the activating component of the autonomic nervous system) serves this purpose. The emotions of anger and fear, then, are energized in preparation for action. Such action is obviously not called for when responding to cinematic representations of danger. However, because these reactions are organized in archaic brain structures, the amygdala in particular, cinematic scenes of danger, in defiance of rationality, still trigger excitatory reactions (Zillmann, 1998c).


The fight-flight dichotomy was eventually expanded to a response trichotomy that includes sexual preparedness (Zillmann, 1986). Sexuality, serving the preservation of the species rather than self-preservation, is similarly deep-rooted evolutionarily. Sexual activity, mostly organized in the septal structures, also requires energy for bouts of exertion, and this energy is likewise provided by sympathetic excitation. As in case of danger, the cinematic presentation of others’ sexual opportunities and actions still elicits sexual excitedness, notwithstanding that sexual targets for consummatory behavior are not immediately available. Iconic representations of danger and sexual opportunities, then, may be considered stimulus conditions that reliably arouse and that therefore will foster responses that are construed as affective or emotional experiences. It would seem to be a grave error, however, to consider the display of perilous happenings and erotic enticements the only or even the primary condition for the creation of excitation and emotions. Cinematic narratives invariably involve and are built around people and other animated entities. Floods, quakes, and fires but also poisonous snakes, snarling leopards, and murderous villains threaten others—the narratives’ cast of characters. On occasion, these threats are presented as if they were directed at the recipients. If they are presented in this manner, they supplement the display of others in peril rather than define autonomous dramatic narratives. For instance, floodwaters presented as rushing toward viewers or snakes presented as striking at them may prove arousing because they more closely than alternative presentations replicate the stimulus conditions of being personally threatened. Such displays thus may be used to create arousal, but they also may serve to provide recipients with a better appreciation of the dangers facing those who are seen coping with them. 
Cinematic narratives, no doubt, evoke emotions primarily by featuring others’ confrontations with threatening conditions and fortuitous circumstances, as well as by displaying these others’ reactions, including emotional ones, to their demise or to their enrichment as such outcomes materialize. Unless the narrative is interactive and makes the recipients active participants in its flow (Grodal, 2000; Vorderer, 2000), recipients are mere witnesses to the fate of others (Tan, 1996; Zillmann, 1994). Given that status, recipients respond nonetheless with emotions, at times with emotions of extreme intensity, to the fortunes and misfortunes they see others enjoy or suffer. In order to explain such strong emotional involvement of observers, the concept of empathy has been invoked and employed to good effect.

Empathic Evocation of Emotion

Empathy can be construed as an archaic mechanism that, through the millennia, served the social contagion of emotion and thereby the coordination of motivation and action (Hoffman, 1978; Zillmann, 1991a). It ultimately served the preservation of individuals and their species. In a group’s confrontation with danger, for instance, it undoubtedly was adaptive to get jointly excited and thus prepared for vigorous adaptive action. The contagious effect of one individual’s expression of fear could instantly permeate the group, readying all for flight, or the expression of anger and assertive behavior could instantly foster preparedness for concerted resistance and attack. The conditions of life in contemporary times have, of course, deprived empathy of much of its utility. However, as a mechanism of excitatory contagion, empathy has been retained in the paleomammalian structures of the brain (MacLean, 1967). If this were not the case, it would be difficult to explain, for instance, why observers experience distress when seeing a construction worker fall off the scaffolding, hit the ground, and cringe in pain, or for that matter, when watching a movie that shows the protagonist cling with his fingertips to a cliff, apparently about to fall to his death. Common observation and research evidence leave no doubt that people, in responding to the emotions displayed by others in actual situations or in fictional presentations, get routinely engrossed and show emotions, often of considerable depth. Some time ago, Adam Smith, in connection with his theory of moral sentiments (1759/1971), recognized the lack of ulterior benefits from such emotional investment. In his words:

How selfish soever man may be supposed, there are evidently some principles in his nature, which interest him in the fortune of others, and render their happiness necessary to him, though he derives nothing from it except the pleasure of seeing it. (p. 1)

Empathy with others’ experiences and expression of emotion is by no means a necessary response, however. There obviously exist circumstances under which empathic sensitivities diminish or are entirely abandoned and overpowered by alternative response mechanisms. Under these circumstances, those who witness others’ misfortunes are free to take pleasure in these others’ demise. The circumstances in question have been well understood since antiquity. Regarding dramatic narratives, Aristotle articulated them succinctly although in negative form (Aristotle, 1966). Specifically, he found fault with two principal narrative transitions, deeming them utterly unenjoyable. In his Poetica he stipulated that: (1) a good man must not be seen passing from happiness to misery, or (2) a bad man from misery to happiness. By implication, he recommended as joy-producing plots those that feature (1) a good person passing from misery to happiness or (2) a bad person from happiness to misery.
Whether presented in negative or positive form, however, the propositions concerning negatively judged persons indicate the absence of empathic reactions to the projected outcome. Apparently, only good characters warrant empathic concerns. Bad characters do not. Bad characters’ joy from coming to glory cannot be affectively shared. Their joy may prove distressing instead. Analogously, their pains from coming to harm are not to be shared. Those who witness the demise of bad characters are free to applaud it. Aristotle thought it self-evident that the narrative transitions on which he had focused could not foster joy. He simply stated that these transitions would be odious. In discussing tragic plots, however, he articulated his reasons for projecting reactions of displeasure and vexation. Aristotle specifically implicated moral judgment with the mediation of reactions of joy versus revulsion to the resolution of various forms of dramatic narrative. He essentially argued that persons pursuing good causes (i.e., consensually approved causes) are considered good people and that good people are judged deserving of good fortunes. Analogously, persons pursuing bad causes (i.e., consensually condemned causes) are bad people, and bad people are judged deserving of bad fortunes, or, at the very least, undeserving of good fortunes. Outcomes in accord with moral considerations thus can be enjoyed. In contrast, outcomes that violate moral considerations are those thought to squelch enjoyment and to foster irritation and contempt instead. One is inclined, therefore, to expand on Smith’s (1759/1971) reflections about empathy and complete his thought by considering the abandonment of empathy, transitory as this abandonment may be. Such expansion might read:

There are evidently some principles in human nature that make individuals take an interest in the fortunes of others and that, in case good fortunes are judged unwarranted and bad fortunes are deemed just and called for, render these others’ misfortunes and their demise necessary, although onlookers derive nothing from it except the pleasure of seeing it.

Considerations of morality have assumed a central position in drama theory ever since (Carroll, 1990; Tan, 1996), and in the form of moral sanctions, they have entered into the contemporary psychology of drama appreciation as well (Jose & Brewer, 1984; Zillmann & Bryant, 1975). In particular, moral assessments have become an integral, pivotal part of the disposition theory of emotion that has been employed to explain the enjoyment of drama of any kind (Zillmann, 2000a).

Dispositional Mediation of Emotion

The indicated intertwined operation of moral judgment and emotional disposition is outlined in figure 10.2. Witnessed behavior, as can be seen, is assessed in moral terms (i.e., good versus bad, to varying degrees), and such assessment is expected to determine emotional dispositions. The approval of actions and their apparent purpose is thought to prompt dispositions of liking and caring. Their disapproval, in contrast, is thought to prompt dispositions of disliking and resenting. Liking defines protagonists, disliking antagonists. Character development thus is considered a function of moral evaluation. Without such evaluation, dispositions of indifference would prevail, and witnesses to social happenings would show little emotional involvement, if any.

Fig. 10.2. A model of the dispositional mediation of emotion from witnessing the actions and contingent emotional experiences of others. Stages (2) and (7) indicate the involvement of moral considerations in the formation of emotional dispositions, and stages (3) and (4) the resulting emotional dispositions and their influence on anticipatory emotions. Stages (5) and (6) specify emotional reactions to pertinent outcomes, such as gratification or aversion, and to their expressive consequences, such as elation or distress. Feedback loop (c) indicates the influence of formed dispositions on moral judgment, such as amity fostering tolerance and enmity fostering strictness. Loop (b) suggests a similar influence of witnessed outcomes through their impact on dispositions. Loop (a) indicates that the process described in stages (1) through (7) is recursive and can be chained to arbitrary length (i.e., short dramatic plots can be chained within overarching dramatic plots).


Witnesses to socially relevant events in cinematic narratives may be thought of as untiring moral monitors. Their continually rendered verdicts are bound to yield the approval and adoration of some and the disapproval and detestation of others. The interdependence between moral assessment and emotional disposition is further apparent in loop c of the figure, which indicates the possibility of feedback from disposition to judgment. It has been observed that liking invites overly favorable, forgiving assessments, whereas disliking biases in the opposite direction. Emotional dispositions, once firmly established, are thought to foster anticipatory emotions. These anticipatory emotions are either positive or negative, their hedonic valence reversing as a function of morally determined dispositions. As the figure shows, positive dispositions foster hopes for positive, rewarding happenings as well as apprehension about negative, punitive ones. Negative dispositions foster the opposite hopes and apprehensions. If and when the hoped-for or dreaded events materialize, the evoked emotions will be in accord with anticipations. Specifically, hoped-for and morally sanctioned outcomes (i.e., rewarding events for protagonists and punitive events for antagonists) will foster euphoric, joyous reactions whereas dreaded and morally unwarranted outcomes (i.e., rewarding events for antagonists and punitive events for protagonists) will prompt reactions of dysphoria, discontent, disappointment, and contempt. Positive emotional dispositions are known to foster hedonically compatible reactions to events that evoke emotions in witnessed persons. Negative emotional dispositions, in contrast, are those that relax and overwhelm empathic inclinations and that enable witnesses to rejoice in response to resented others’ misfortune and agony. Negative dispositions also get in the way of empathizing with gratified others who are deemed undeserving of such fortune. 
The perception of undeserved gratification can stir intense emotions of righteous indignation. Counterempathic emotional reactions of this kind are obviously the result of moral considerations. Villains are to get their just deserts, and concerns about their welfare would amount to misinvested efforts at emotion control. Villains, moreover, are simply not entitled to good fortunes. Oddly, then, it is morality that liberates observers, allowing them to take pleasure from the punitive torment of others judged to deserve such fate. However, it is morality also that makes for the infuriation and indignation from bearing witness to the benefaction of those deemed utterly undeserving.

These considerations lead to the following predictions of euphoric and dysphoric emotions in response to the resolution of dramatic conflict in cinematic narratives. The classification in moral terms is to highlight the significance of moral assessments in the mediation of the emotions of witnesses.

Justice Conditions

1. Witnessing the victimization of a disliked antagonist at the hands of a liked protagonist fosters delight, the experiential intensity of which increases with (a) the liking of the protagonist, (b) the disliking of the antagonist, and (c) the extent to which the antagonist is deemed deserving of a particular victimization.
2. Witnessing the benefaction of a liked protagonist fosters delight, the experiential intensity of which increases with (a) the liking of the protagonist and (b) the extent to which the protagonist is deemed deserving of a particular benefaction.

Injustice Conditions

3. Witnessing the victimization of a liked protagonist at the hands of a disliked antagonist fosters repugnance, the experiential intensity of which increases with (a) the liking of the protagonist, (b) the disliking of the antagonist, and (c) the extent to which the protagonist is deemed undeserving of a particular victimization.
4. Witnessing the benefaction of a disliked antagonist fosters repugnance, the experiential intensity of which increases with (a) the disliking of the antagonist and (b) the extent to which the antagonist is deemed undeserving of a particular benefaction.

Support for these predictions comes from research on the enjoyment of a variety of dramatic formats. It comes, obviously, from the exploration of drama proper but also from that of specific genres (such as suspenseful narrative or comedy) and genre-like nonfictional exposition (such as sports and the news) (Zillmann & Knobloch, 2000; Carroll, 1990; Zillmann, 1998c). Suffice it here to exemplify the outlined moral-dispositional mechanisms with two selected investigations.

The most-direct demonstrations of the power of moral judgment in the mediation of emotion in response to others’ emotions come from empathy research (Wilson, Cantor, Gordon, & Zillmann, 1986; Zillmann & Cantor, 1977). Schoolchildren were exposed to specially produced films in which either a loved or a hated character was developed and in which this character was either victimized or benefited during resolution. His victimization showed him in excruciating pain, his benefaction in extreme joy. The children’s facial reactions to these final scenes were unobtrusively recorded and then scrutinized. The findings were entirely in line with the specifications of stages 5 and 6 of figure 10.2. Respondents empathically cringed when the beloved character was in pain, and they exhibited joy when he was euphoric. They responded counterempathically, however, to the behavioral displays of the resented character.
They cringed when he jumped for joy, and they expressed pleasure when he was hurt. In the latter condition, he apparently got what he deserved; in the former, the outcome was unjust and hence annoying and detestable.

A parallel investigation with mentally challenged children demonstrated that, when the capacity for moral judgment at the level of equitable retribution is not developed, empathy becomes mechanical. In particular, counterempathic reactivity does not materialize. Such mentally challenged children invariably expressed joy in response to witnessed joy, and they invariably expressed distress in response to witnessed distress. Whether the witnessed emotions were exhibited by a beloved or by a resented character was immaterial. This latter investigation shows compellingly that empathy functions as a basic default mechanism that, if not opposed and overpowered by affective dispositions that derive from assessments of deservingness, governs emotional reactivity to the observed fate of others. The condemnation of others’ conduct and the resulting disliking, then, are indeed prerequisite to joy over their demise as well as to distress over their good fortunes.

Moral Sanctions of Resolution

In dramatic narratives, plots are known to dwell on hostile confrontation and conflict. Conflict is almost always resolved, however, and usually promptly so. Both in minor plots (i.e., minor in terms of duration) and in major plots (i.e., those that span large


portions of narratives, if not their entirety), the parties in conflict are disengaged in ways that are more fortuitous to one party than to others. Resolution may simply consist of the cessation of hostility or endangerment. More likely, it entails glorious victory for one party and humiliating defeat for the other. In emotional terms, resolutions provide at the very least relief from empathic distress. More characteristically, however, fully embellished resolutions, especially those overarching a narrative, evoke emotions of considerable intensity. Depending on dispositions toward the victorious and defeated parties, respondents will experience happiness or sadness—or emotions with affinity to these experiences. But dispositions are not the only factor that influences emotions in response to resolutions. Resolutions must be sanctioned morally to have their intended effect on emotion. Feelings of joy in response to a protagonist’s triumph can be spoiled by his or her actions that are deemed inappropriate, if not deplorable. Analogously, feelings of sadness will suffer impairment if the protagonist’s imperfections, her or his tragic flaw, are too prominent. The emotions evoked by the resolution of conflict in drama are undoubtedly pivotal to the enjoyment of cinematic narratives. Given that, along with the stipulation that these emotions hinge on moral considerations and are readily compromised, closer examination of the concept of moral sanction would seem to be warranted. The assessment of what is morally correct under given circumstances may be a deliberate, reflective process yielding specific verdicts. It may suggest a particular punishment for a particular transgression or indicate a particular reward for a particular accomplishment. Moral sanction is not thought to have such a high degree of specificity. It is not considered to prescribe and demand particular outcomes. 
Rather, moral sanction is conceived of as a readiness to accept, in moral terms, observed outcomes. It may well happen that, on occasion, specific harm, such as torture and death, is deliberately wished upon a brutal villain. But as a rule, expectations of punishment and reward are not specific to particular treatments and outcomes. Moral sanction is characterized, instead, by considerable latitude in accepting punitive or rewarding actions and events. Respondents to drama that features rape, for instance, may in a round-about fashion wish harm upon the rapist but be satisfied when seeing him caught and convicted, when seeing him contract a debilitating disease, or when seeing him crippled by a falling tree. The latitude of retribution is not unlimited, however. The respondents would probably be distressed if the only punitive consequence were that one of the rapist’s victims managed to bloody his nose. The respondents might be similarly distressed when seeing him being castrated and having his arms chopped off. Transgression during conflict and punishment during resolution must be roughly commensurate for the punishment to be morally sanctioned and deemed emotionally satisfying. Punishments that fall outside the latitude of sanction leave the respondents’ sense of justice disturbed, which ultimately diminishes the enjoyment of resolutions. The same applies to accomplishments for which the rewards fall outside the latitude of moral sanction. The exercise of moral sanction is not presumed, however, to involve formal moral systems, such as Kant’s categorical imperative or Bentham’s utilitarian formula (Kant, 1785/1922; Bentham, 1789/1948), and to necessitate derivations from them. Reminiscent of Aristotle’s suggestions, the morality thought to be involved is truly basic in prescribing good fortunes to good people (i.e., people who are good because they do good deeds) and bad fortunes to bad people (i.e., people who are bad because they do bad things). 
If moral judgment is thus conceived of as the not entirely systematic evaluation of situational behavior as being good versus bad or right versus wrong in terms of


idiosyncratic conceptions, we must expect vast differences in assessments. For instance, some will consider the death penalty fair retribution for taking the life of a fellow human; others will consider this penalty a crime against humanity. Some will consider sexual preference a moral entitlement; others will deem specific ones morally indefensible. Some will think it good and right to save the big redwoods in California and Oregon; others will think it good and right to sacrifice a bit of nature for continued income and, perhaps, a better life for workers in the local lumber industry. Some will see fit to honor and defend the national flag because it is thought to signify the political doctrine of equal justice for all; others will be ready to burn the flag because they deem this political doctrine wanting in its administration of social justice. Some will embrace the morality manifest in prevailing social conventions; others will consider these traditions decadent and declare them morally bankrupt, thereby elevating themselves into a moral elite that is called upon to challenge the morality of those deemed morally inferior. Moral judgment is simply not monolithic, as some ethicists would have us believe. As a result, it would seem futile to treat people’s moral sanction of drama as uniform and normative. Recipients bring their idiosyncratic morality to the screen, sanction or condemn witnessed actions and agents in accord with it, and then experience emotions as a result of their assessment. As moral assessments vary, so will the respondents’ emotions. In constructing theories of drama appreciation that involve moral sanction as an essential mechanism, it is imperative to recognize and to make allowances for the diversity of basic morality in strata of the population at large. 
In order to predict more accurately which retributive events foster delight and which repugnance in whom, it will be necessary to stake out existing morality subcultures and to determine the judgmental properties that characterize and distinguish them.

In the face of the indicated profound diversity in moral assessments, common ground should not be overlooked, however. Considering coercive and socially supportive actions, in particular, the members of different subcultures are likely to render similar judgments. Additionally, apprehensions about others being granted access to gratifications that are denied us may be widespread and nearly universal. Such apprehensions might also explain why we can take pleasure from witnessing the punishment and torment of those we think have taken unfair advantage of situations and, hence, have done wrong. Perhaps the overarching theme of enjoyable fictional exposition is conveyed in the projection of social justice in the sense that gratifications have to be earned by all our fellow humans just as we by our own efforts have to earn them—and that none of our fellow humans be exempt from the punitive contingencies that govern our own lives. Violations of this conception of justice will strike us as repugnant whereas exposition within these principles will delight us.

Cinematic Dramaturgy of Good and Evil

Our discussion of the evocation of emotion by cinematic narratives appears to render these narratives unmitigated morality plays. Moral monitoring is thought to foster approval or disapproval of the actions of the characters of plays and thereby yield feelings of sympathy toward the well-behaved protagonists and antipathy toward the ill-behaved antagonists.
Within this good-versus-evil dichotomy, the strength of the affective dispositions, then, is expected to determine the intensity of empathy or counterempathy, of the anticipatory emotions of hope or fear, and of joyous emotions as hoped-for outcomes materialize versus distressing emotions as feared outcomes do. Throughout the


cinematic display of relevant actions, the depth of the recipients’ emotional reactions is clearly a function of the magnitude of dispositional involvement. Poorly developed characters (i.e., characters whose actions and apparent intentions prompt neither applause nor condemnation) will not be engaging. In contrast, the recipients’ emotions are bound to be engaged by characters who do and intend to do things that, for whatever particular moral reason, are deemed supportive, courageous, brave, and simply wonderful, on the one hand, or arrogant, malicious, brutal, and plainly evil, on the other. The more we can love or hate the characters that the narrative develops, the more we shall enjoy outcomes that show those we love triumph over those we hate. If our emotions are sufficiently engaged, we shall applaud the cruelest destruction of evil characters without having moral misgivings about it. We could, after all, morally sanction the brutality involved. It would seem, then, that those cinematic narratives that develop the most admirable protagonists and the most terrifying antagonists, all within the limits of credibility, are likely to evoke the strongest emotions. The greatest dispositional separation between protagonists and antagonists promises the most-intense emotions in response to the resolution of conflict. Joy will be at a maximum as the best of good triumphs over the worst of evil. And should evil get the better of good, as it does in tragic resolutions, the deepest reactions of disappointment, dejection, and sadness can be expected. These observations seem to question the wisdom of developing complex characters— the type of character that is so highly valued by the critics. Indeed, character complexity, as it violates the purity of good or evil, must be considered a detriment to drama that focuses on the evocation of strong emotions. 
What should be recognized, however, is that the evocation of emotion is not the only objective of drama, not even necessarily the most desirable one. Drama may captivate and instigate us cognitively (Zillmann, 1991b). It can be thought-provoking and inspiring. Rather than stir our emotions to the fullest, it may gently touch us. Drama that combines the indicated elements—that is, drama that both touches our heart and intrigues our mind—may well emerge as the genre of superior entertainment value.

References

Aristotle. (1966). De Poetica. (I. Bywater, Trans.). In The works of Aristotle (Vol. 11) (pp. 1447–62). Oxford: Clarendon.
Bentham, J. (1948). An introduction to the principles of morals and legislation. New York: Hafner. (Original work published 1789).
Cannon, W. B. (1932). The wisdom of the body. New York: Norton.
Carroll, N. (1990). The philosophy of horror or the paradoxes of the heart. New York: Routledge.
de Wied, M., Zillmann, D., & Ordman, V. (1994). The role of empathic distress in the enjoyment of cinematic tragedy. Poetics, 23, 91–106.
Grodal, T. (1997). Moving pictures: A new theory of film genres, feelings, and cognition. Oxford, England: Oxford University Press/Clarendon.
Grodal, T. (2000). Video games and the pleasures of control. In D. Zillmann & P. Vorderer (Eds.), Media entertainment: The psychology of its appeal (pp. 197–213). Mahwah, NJ: Erlbaum.
Hebb, D. O. (1955). Drives and the C.N.S. (conceptual nervous system). Psychological Review, 62, 243–54.
Hoffman, M. L. (1978). Toward a theory of empathetic arousal and development. In M. Lewis & L. A. Rosenblum (Eds.), The development of affect (pp. 227–56). New York: Plenum Press.
Jose, P. E., & Brewer, W. F. (1984). Development of story liking: Character identification, suspense, and outcome resolution. Developmental Psychology, 20(5), 911–24.
Kant, I. (1922). Grundlegung zur Metaphysik der Sitten. In Immanuel Kant’s sämtliche Werke (Vol. 5). Leipzig: Inselverlag. (Original work published 1785).
King Jablonski, C., & Zillmann, D. (1995). Humor’s role in the trivialization of violence. Medienpsychologie: Zeitschrift für Individual- und Massenkommunikation, 7(2), 122–33, 162.
LeDoux, J. E. (1993). Emotional networks in the brain. In M. Lewis & J. M. Haviland (Eds.), Handbook of emotions (pp. 109–18). New York: Guilford.
MacLean, P. D. (1967). The brain in relation to empathy and medical education. Journal of Nervous and Mental Disease, 144, 374–82.
Schachter, S. (1964). The interaction of cognitive and physiological determinants of emotional state. In L. Berkowitz (Ed.), Advances in experimental social psychology (Vol. 1, pp. 49–80). New York: Academic Press.
Smith, A. (1971). The theory of moral sentiments. New York: Garland. (Original work published 1759).
Tan, E. S. (1996). Emotion and the structure of narrative film: Film as an emotion machine. Mahwah, NJ: Erlbaum.
Vorderer, P. (2000). Interactive entertainment and beyond. In D. Zillmann & Vorderer (Eds.), Media entertainment: The psychology of its appeal (pp. 21–36). Mahwah, NJ: Erlbaum.
Wilder, J. (1957). The law of initial values in neurology and psychiatry: Facts and problems. Journal of Nervous and Mental Disease, 125, 73–86.
Wilson, B. J., Cantor, J., Gordon, L., & Zillmann, D. (1986). Affective response of nonretarded and retarded children to the emotions of a protagonist. Child Study Journal, 16(2), 77–93.
Zillmann, D. (1978). Attribution and misattribution of excitatory reactions. In J. H. Harvey, W. J. Ickes, & R. F. Kidd (Eds.), New directions in attribution research (Vol. 2, pp. 335–68). Hillsdale, NJ: Erlbaum.
———. (1986). Coition as emotion. In D. Byrne & K. Kelley (Eds.), Alternative approaches to the study of sexual behavior (pp. 173–99). Hillsdale, NJ: Erlbaum.
———. (1991a). Empathy: Affect from bearing witness to the emotions of others. In J. Bryant & Zillmann (Eds.), Responding to the screen: Reception and reaction processes (pp. 135–67). Hillsdale, NJ: Erlbaum.
———. (1991b). The logic of suspense and mystery. In J. Bryant & Zillmann (Eds.), Responding to the screen: Reception and reaction processes (pp. 281–303). Hillsdale, NJ: Erlbaum.
———. (1994). Mechanisms of emotional involvement with drama. Poetics, 23, 33–51.
———. (1996a). Sequential dependencies in emotional experience and behavior. In R. D. Kavanaugh, B. Zimmerberg, & S. Fein (Eds.), Emotion: Interdisciplinary perspectives (pp. 243–72). Mahwah, NJ: Erlbaum.
———. (1996b). The psychology of suspense in dramatic exposition. In P. Vorderer, H. J. Wulff, & M. Friedrichsen (Eds.), Suspense: Conceptualizations, theoretical analyses, and empirical explorations (pp. 199–231). Mahwah, NJ: Erlbaum.
———. (1998a). Connections between sexuality and aggression (2nd ed.). Mahwah, NJ: Erlbaum.
———. (1998b). Does tragic drama have redeeming value? Siegener Periodicum zur Internationalen Empirischen Literaturwissenschaft, 17(1), 4–14.
———. (1998c). The psychology of the appeal of portrayals of violence. In J. H. Goldstein (Ed.), Why we watch: The attractions of violent entertainment (pp. 179–211). New York: Oxford University Press.
———. (2000a). Basal morality in drama appreciation. In I. Bondebjerg (Ed.), Moving images, culture and the mind (pp. 53–63). Luton, England: University of Luton Press.
———. (2000b). Humor and comedy. In Zillmann & P. Vorderer (Eds.), Media entertainment: The psychology of its appeal (pp. 37–57). Mahwah, NJ: Erlbaum.
Zillmann, D., & Bryant, J. (1975). Viewer’s moral sanction of retribution in the appreciation of dramatic presentations. Journal of Experimental Social Psychology, 11, 572–82.
Zillmann, D., & Cantor, J. R. (1977). Affective responses to the emotions of a protagonist. Journal of Experimental Social Psychology, 13, 155–65.
Zillmann, D., Gibson, R., Ordman, V. L., & Aust, C. F. (1994). Effects of upbeat stories in broadcast news. Journal of Broadcasting and Electronic Media, 38(1), 65–78.
Zillmann, D., & Knobloch, S. (2000). Das Nachrichtenschauspiel: Reaktionen auf Ereignisse um Prominente und Interessengruppen in den Nachrichten. In A. Schorr (Ed.), Publikums- und Wirkungsforschung: Ein Reader (pp. 98–123). Stuttgart: Westdeutscher Verlag.
Zillmann, D., & Paulus, P. B. (1993). Spectators: Reactions to sports events and effects on athletic performance. In R. N. Singer, M. Murphey, & L. K. Tennant (Eds.), Handbook of research on sport psychology (pp. 600–19). New York: Macmillan.
Zillmann, D., & Zillmann, M. (1996). Psychoneuroendocrinology of social behavior. In E. T. Higgins & A. W. Kruglanski (Eds.), Social psychology: Handbook of basic principles (pp. 39–71). New York: Guilford Press.

Part Six
Appeals of Reality-Based Moving Images

In the 1970s and 1980s, as if in reaction to the strong case for realism set forth by André Bazin and Siegfried Kracauer at mid-century, film-studies scholars alternately attacked, ignored, and redefined realism until by the last decade of the twentieth century, filmic realism was considered a non-issue in film studies. It had been shorn of both its earlier purported direct link with physical reality and its special credibility with regard to documentation. But suddenly, at the end of the century, the dead issue of realism reappeared on at least three media fronts. Along with the interest occasioned by special effects for Hollywood motion pictures (see chapter 3), there appeared on television a spate of news shows and reality programs such as hospital shows, police shows, and talk shows. And in documentary, compilations of photographs and personal reports such as Ken Burns’ The Civil War drew large audiences.

When the Rodney King incident occurred in 1991, video footage of his being beaten by uniformed policemen shocked the general public, and the special credibility issue was thrust front and center once again. Media scholars “wanted to be as outraged as everyone else,” reports Noël Carroll, but “theoretically they ‘had proven’ antecedently that film and video could never convey truths, but only fictions. In order to negotiate this embarrassment,” Carroll says, they “declared the doctrine of ‘strategic realism.’ This seems to be the notion that if it suits your politics, then you can talk with the vulgar and act as if film images and videotape can be evidentiary, even though you know that, theoretically, this is a pipe dream” (pp. 55–56). But, in spite of the best efforts of some in the film studies establishment to belatedly embalm realism, it was back.
Judging from the essays in this section, one would conclude that in the century’s final decade, realism returned to life not only in practice but in theory as well. Dirk Eitzen asserts in “Documentary’s Peculiar Appeals” that “[s]ignificance is what matters to us at any given moment. It does not originate in the ‘space between subjects’ or the ‘system of signifiers.’ To the contrary, it is grounded in the impact on our body of whatever registers in our senses and our mind.” And in “Reality Programming: Evolutionary Models of Film and Television Viewership,” William Evans surmises that “[i]ndeed, the realism of television and film content is one of the most relevant factors in explaining why people prefer television and film content and why they consume so much of it.” Realism is, of course, a central concept in ecological theories of perception. However, the issue of realism in all its complexity in the visual culture of our day goes beyond the pale of perception to incorporate questions that traditionally belong to the study of cognition, rhetoric, and narrative. The essays included in this chapter grapple with reality-based moving images in that larger context; their authors are film theorists and communication theorists—media philosophers, if you will. They and others like them


have set a major direction in moving image theory toward a discussion of what kinds of capacities viewers have and how the makers of moving images exploit those capacities for good and ill.

References

Carroll, Noël. (1996). Prospects for film theory: A personal assessment. In David Bordwell and Carroll (Eds.), Post-Theory: Reconstructing Film Studies (pp. 37–68). Madison: University of Wisconsin Press.

11
Documentary’s Peculiar Appeals
Dirk Eitzen

One of the most affecting movie sequences I have ever seen is the opening of Robert Gardner’s documentary about India, Forest of Bliss (1985). In it, an extremely emaciated and very bedraggled-looking mongrel is set upon by a pack of more-robust dogs. The lone mongrel tries to run away, but the pack catches it and brings it down. The mongrel whimpers and cowers submissively, but the pack attacks it relentlessly. Finally, the poor brute rolls over on its back in what is pretty obviously a plea for mercy. Still, the other dogs bite and tear at the hapless creature, evidently meaning to kill it.

Even in a fiction film, seeing an event like this would be profoundly disturbing. Seeing it in a documentary, I found it practically unbearable. I was literally nauseated. I wanted to turn away. And yet, because this was a documentary, I felt an even stronger compulsion to watch. Even more than that, I wanted to intervene. I wanted to pick up a rock and throw it at the dogs that were so viciously attacking one of their own kind.

This is an example of the peculiar power of documentaries. Fiction films obviously engender strong emotional responses as well. Compared to fiction films, documentaries tend to be boring and unengaging. And yet, when documentaries do produce strong responses, as this sequence from Forest of Bliss did in me, there is something special, something uniquely compelling and affecting, in their impact. To explain that is the purpose of this essay.

Defining Documentary

In everyday English, the word appeal has two general meanings that are quite distinct. One is “attraction” or “to attract,” as in “the dynamic professor has extraordinary appeal to students.” It denotes the power of arousing a generally pleasurable, sympathetic, or emotionally engaged response in onlookers.
The second meaning is “entreaty” or “to entreat,” as in “the beleaguered professor appealed to his unruly students for cooperation.” This denotes a plea for help or a favor or for special attention and sympathy from others. By and large, the first kind of appeal is something one has; the second kind is something one makes.

Documentary can be defined, in simple, pragmatic terms, as that kind of movie in which the first kind of appeal—the attraction to viewers—is intimately tied to the second—an entreaty to viewers. Take, for example, Ken Burns’s The Civil War (1990)—a kind of recipe text for the whole spate of historical documentaries that now air regularly in the United States on PBS, the History Channel, and Arts and Entertainment. As movies go, The Civil War is pretty dull: an endless series of slow pans and zooms over still photographs and languid shots of empty erstwhile battlefields. There’s no narrative through-line to speak


of, just a voice-over lecture, punctuated by readings from the diaries of soldiers and statesmen and static talking-head interviews with scholars. The film’s images are sometimes sentimental, but they are not spectacular. There are not even reenactments. And yet, critics quite routinely claimed that this documentary has a kind of immediacy and emotional force that no fiction film has. Why? The reason is that, in the program’s constant and explicit reference to historical realities that were (a) self-evidently tragic and (b) already invested with deep significance by most Americans, the show is supposed to request or warrant or even demand a special emotional investment from viewers. In contrast, Ken Burns’ subsequent documentary series for PBS, which applied the same stylistic formula to a history of the radio, was a ratings bust. It made the same appeals, but viewers just didn’t respond. This demonstrates that the main attraction of The Civil War series does not derive from anything intrinsic to the program (in the way that, say, part of the appeal of the fiction film Glory derives from its spectacular action sequences). It derives instead from some special claim upon or entreaty to viewers. In short, the emotional impact of The Civil War stems not from any appeals the film has but from the appeals it makes. The form and style of the program—the sepia stills, the pensive reflections of historians, and so on—are by no means irrelevant, but they are relevant in emotional, rhetorical, and esthetic terms more for the deep emotional significance they imply than for anything they explicitly show or say. This is typical, indeed it is characteristic and distinctive, of the way documentaries work. 
Their power to arouse a pleasurable or engaged response is closely tied to an implied entreaty for special attention and concern.1 It is up to the viewer whether to heed this entreaty (as did the many viewers who said they were moved by The Civil War) or to ignore it (as did the even greater number who checked out, then tuned out the series), or even to take umbrage at it (as did some viewers who found the series biased).2

The Pragmatic Approach

I may choose to sit through all umpteen hours of The Civil War, despite its slow pace and repetitiveness, waiting for those occasional moments that move me. Or I may choose to change the channel to something more entertaining. Or I may choose to watch the show suspiciously and critically, taking special note of its wistful music, its pumped-up melodrama, and the many other devices it deliberately employs to push my emotional buttons. Whichever the case, when I recognize the movie as a documentary, I also recognize that it is making a plea for special consideration from me. That is part and parcel of seeing it as a documentary. This plea is rarely if ever express in documentaries, like “The events in this program really took place and therefore merit your special consideration.” That kind of overt claim is usually reserved for “docudramas” and tabloid TV shows like Cops and Rescue 911. In the case of “genuine” documentaries like The Civil War, the plea for special consideration is implied and tends to be much more pervasive, along the lines of “This is a real photo of real people of days gone by, folks, which is why we’re holding it in front of you for a whole thirty seconds.” Besides being implied, something else typifies the kind of entreaty made by documentaries: It concerns the way the viewer is supposed to “frame” the movie. 
It says, in effect, “This is a special kind of movie that needs to be watched differently than other kinds of movies.” It is an invitation to the viewer to adopt a particular stance or attitude, a particular mode of response. In this respect, it is similar to a wink or a nudge in a conversation that tells my listener, “I’m saying more here than I’m saying, if you get my drift.”

DOCUMENTARY’S PECULIAR APPEALS / 185

The study of this kind of implication in oral and written discourse is called pragmatics. Pragmatics concerns itself primarily with the uses of discourses rather than with their meanings—or, more precisely, pragmatics supposes that any utterance is an act and that, as such, its meaning is always inescapably bound up with particular aims and goals. The aims and goals of specific acts of communication are rooted not only in cultures and interpretive communities but in specific discourse situations. To make sense of even the simplest exchange, like “How are you?” “Oh, I’m fine,” requires some understanding not only of the way these words and phrases generally work in our culture but of the specific situation in which they occur. Does the exchange take place in a doctor’s office? Is it a wife to her husband? Two people passing on the street? In each of these situations, the exchange may have a completely different purpose and effect. The same is true of documentaries. The only way to explain why one person is bored by The Civil War while another, with more or less the same cultural background, finds it deeply moving, is to tease out differences in the goal-oriented acts in which the two viewers make sense of and respond to the movie. In other words, to understand the peculiar workings of documentaries, pragmatics is not only useful, it is essential. The pragmatics approach is essential for understanding not just documentaries but any kind of filmic discourse that can be characterized as a mode, such as melodrama and comedy. What defines comedy, for example, is not primarily a particular form or set of conventions (even though certain forms and conventions do tend to cluster around comedy); it is a particular kind of response. Comedies are supposed to be funny. Comedies invite a particular stance: a playful, nonserious, somewhat distanced stance. It is precisely this sort of phenomenon with which pragmatics deals. 
And yet, remarkably, pragmatics has made scarcely any impact on mainstream film scholarship. One reason is that the discipline of film studies is so rooted in semiotics. Like the study of grammar, out of which it was born, the initial aim of semiotics was to discover the system of language (langue) independently of particular acts of communication (paroles). In other words, semiotics entertained the goal of separating the meanings of language from situational specifics, such as intentions and referents. Today, most film scholars acknowledge the fallacy in this approach (or at least in pushing it to such an extreme). Just the same, there seems to be a lingering bias against the pragmatics approach, against ascribing meanings to implied aims or purposes (as opposed to arbitrary, cultural conventions) and against analyzing actual functions of filmic discourses (as opposed to their virtual “effects”). Yet, again, the pragmatics approach is precisely what is necessary in order to understand the peculiar appeals of documentary.

The Body of “Evidence”

Semiotics has always had particular trouble dealing with the idea of reference. A reference, such as that tree, over there, is, on its face, rooted in a particular context. This seems to confute the central contention of semiotics that all meanings lie in the system of language, quite apart from particular utterances and speech acts. Semiotics claimed to deal with this problem by splitting the referent into an infinite regress of signifiers, each referring to the next but none really referring to the thing itself. In other words, semiotics maintained that the existence of a tree and the significance of the tree (or of pointing to the tree) are not actually related. Semiotically oriented film theory used the same maneuver to deal with the problem of indexicality—the fact that a photographic image is a chemical record of the light reflected or emitted by a thing and therefore bears a trace of its reality. The theory did


not dispute the reality of this trace (just as semiotics did not dispute the reality of trees). It accepted the indexicality of photographic images as a fact. What it claimed, instead, is that this fact and the significance of this fact have nothing to do with each other. Documentary films revolve around exactly the opposite belief: That this fact (the indexicality of the photographic image) has everything to do with the special significance of documentaries (what I have called their peculiar appeals). “Evidence” is the bread and butter of documentaries. As John Corner writes in The Art of Record, “It is on the warrant provided by the integrity of their ‘raw materials’ that most documentaries base their discursive status” (p. 18). Some scholars of documentary have intimated that this so-called warrant is merely a sleight of hand, that documentary is nothing more than a kind of fiction film that pretends to be different.3 From a pragmatic perspective, this is a perverse argument, like maintaining that when I stub my toe, it only seems to hurt. There is no question that so-called evidence can be faked and that even actual evidence can lead to false conclusions. The history of documentary is full of examples of such cases. Nonetheless, when I perceive something to be evidence, its impact as evidence is no less real than the pain I feel when I stub my toe. This is not to say that the impact will always be the same. To the contrary. As the pragmatics perspective maintains and as the responses to The Civil War mentioned earlier demonstrate, the particular impact depends upon the context. This is shaped by the interests of the viewer and by what he or she perceives to be the aims of the discourse. Those are influenced, in turn, by historical and institutional factors, such as changing documentary practices.4 Still, the truly significant difference between documentary and fiction does not lie in some abstract system of language or culture. 
It lies in the differences that I feel; in the very different impact upon me that the two kinds of discourse manifest, not just intellectually, but viscerally; in my glands, nerves, and emotional responses; in my whole orientation towards what I see—in a word, in my body. Return, for a moment, to my example “that tree, over there.” (I am referring, incidentally, to a tall maple outside my office window.) When I look at it, I see, first of all, a thing—a solid thing that I would have to walk around if it was in my way or that I could lean against or climb if I was so inclined. I also see it as a rough and shaggy thing; I can easily imagine the feel of its bark upon my fingers. Whatever symbolic or cultural significance the tree may have, it is first of all something that my body recognizes. When I recognize, in a more cerebral way, that trees like the one I see were used for making the lumber in my house and the paper this book is printed on, the significance of the tree is still inflected by its potential use-value. Even when I conceive the tree as part of a purely symbolic endeavor, as in this thought experiment, the significance of the tree is all tied up with its usefulness to me at the particular moment. It is not something purely “symbolic.” It does not reside in some imaginary “space between subjects.” It is something that I feel. Its “intellectual” significance has quasi-sensory and emotional dimensions that, again, register first of all in my body. This impact is an essential and undeniable part of the significance of “that tree, over there.” The same kind of impact is an essential and undeniable part of the significance of the “evidence” served up by photographic images.

The Body and the World

Significance ultimately comes from the body. This is the truly groundbreaking insight of ecological psychology. It marries ecological psychology to the pragmatics approach


described above. It follows irresistibly from a proper understanding of biological evolution. And it has found strong support in the empirical findings of contemporary cognitive neuroscience. This is not the place to defend the insight—that has been done very admirably elsewhere by philosophers, film theorists, and neuroscientists.5 It is necessary to amplify and clarify the claim a little bit here because it may seem alien and even farfetched to those steeped in contemporary cultural theory. Cultural theorists often describe culture as a “web of meanings” that surrounds (and ensnares) individuals in a society. Those meanings are conceived as floating in a “symbolic space” among people: in conventions, mores, myths, ideologies, and so on. This hypothetical space may be a useful abstraction. To the extent that culture has any kind of tangible impact upon people, however, it must be quite literally embodied, either in the physical world—in such things as buildings, books, and the audible pressure waves in the air generated by speech—or in the representations people make of such physical things, in their heads—whether those manifest themselves as brute perceptions of buildings and books or as abstruse theories of language and culture. We have no direct access to what goes on in other people’s heads. If we did, speech would be unnecessary. We need to give our ideas, goals, feelings, beliefs, and so on concrete form in order for them to be recognized and responded to by other people. Discourse is the production and reception of such concrete forms: facial expressions, gestures, speech, writing, music, and so on. What we call a text is a record, of one kind or another, of such physical changes in the environment. Being more or less permanent fixtures of the physical world themselves, texts have their own concrete forms, which we refer to as books, CDs, movies, videos, and so on. 
When we respond to or interpret a discourse or a text, what we are doing, at bottom, is responding to changes in our physical environment. The reactions, perceptions, emotions, meanings, and so on that they engender do not take place in some imaginary space between subjects. That space is an analytical abstraction; it does not truly exist. Our responses actually take place entirely in our bodies and our brains. The question is, if our responses to texts occur solely in the bodies and brains of individuals, how come so many of them are shared? Indeed, how can communication take place at all? There are two reasons. The first is that our physical environment is shared. It is made up not just of natural things, like ground and trees, but also social interactions, like conversations, and cultural artifacts, like movies. The second reason is that the senses and minds with which our bodies interact with the world are the products of natural selection. We are first of all organisms that move about and depend upon our physical environment to survive. Perceptions and thoughts evolved because, by and large, they deliver useful and reliable information about how to interact with the environment. In other words, they correspond to reality in important ways. If they did not, we would not have them. The easiest way to show this is to consider a creature that has no mind yet obviously interacts with its environment successfully, even to the point of sharing information with others of its kind: the honeybee. A honeybee has so few neurons in its tiny brain that it cannot possibly have anything like an accurate or complete picture of its surroundings. Yet, it has a remarkable ability to detect the scent of sugar molecules in the air and trace them to their source. Natural selection has guaranteed that a honeybee’s representations of its physical world, as rudimentary as they must be, are useful for finding nectar. 
In the same way, natural selection has guaranteed that human mental representations are


by and large correct, not in that they conform to reality in every respect but in that they gather the kind of information about the world that the human organism needs in order to survive. Signs used to communicate must also correspond to reality in important ways. A honeybee’s dance that signals the presence of food not only needs to tell other bees of the colony where to go looking, it has to consistently point to the presence of real food, otherwise the dance will disappear when the colony starves. In the same way, a dog’s growl that signals the intent to bite must really tend to correspond with the intent to bite, not with anything at all or nothing in particular; otherwise there would be no point to it. By the same token, even the abstract symbols used in human language, such as liberty, must really tend to correspond to the same patterns of behavior in society and, by extension, the same constellation of ideas in people’s heads; otherwise they would convey no information and would be useless for communication. In sum, it is in the interest of humans, as it is in the interest of honeybees and dogs, that both the impulses in the brain that register as sensations, perceptions, or meanings of one kind or another and the signals that are used in social interactions are not merely regular and rule-governed but that they have some reliable correspondence to reality.6

Meaning and Matter

Something can reliably correspond to reality and not really matter at all. Recently, I found a map of South Carolina in my car’s glove compartment. I have no idea what it was doing there, because the last time I visited South Carolina must be fifteen years ago. The map had evidently been lying there for years, unused, just junking up my storage space. Much of the stuff that is lying about in our culture and our minds has the same sort of status. It is there, but it can hardly be said to matter. What matters is only what gets used, or recognized, or thought about. 
In other words, it is only what engenders some sort of immediate physical response. Its significance might be said to be this immediate physical response. When I discovered that map in my glove compartment, for example, I thought, “Junk. Throw it away.” At the moment, that was its sole significance. I never dreamed that the map would find its way out of the dustbin into this essay. Now that it has, of course, it produces a different sort of reaction in me. Its significance has changed. Significance is what matters to us at any given moment. It does not originate in the “space between subjects” or the “system of signifiers.” To the contrary, it is grounded in the impact on our body of whatever registers in our senses and our mind. We have senses and minds because we are creatures that act. The first purpose of our senses and minds is not merely to show us things in our environment but to motivate, monitor, and guide our actions. When humans receive visual signals from the environment, for example, it is not a passive affair; it is for the purpose of being able to interact successfully with the environment. It is not just the eyes, optic nerves, and visual cortex that are involved. The whole body is involved. At the level of neurons, thought and memory do not involve retrieving passive clusters of associations; they involve repeating bits and pieces of prior activity. Because of that, representations of the body play a crucial part. When we see a tree, it is not just the cluster of tree detectors in the sensory cortex that are triggered; it is also the network of neurons that are associated with the body’s experience of interacting with trees. Such representations of the body’s responses are crucial, from an evolutionary standpoint, because they serve to orient us toward the things we perceive. The same is true


in thought and memory. Patterns of activity in the higher-level associational cortices are always dispositional. In other words, they represent not just things but also our relationship to those things. That relationship is what we perceive in emotions, for example—feelings of pleasure, pain, attraction, revulsion, anxiety, sorrow, and so on.7 Evolution has bequeathed us such feelings because they push us to behave in ways that enhance our prospects for survival and reproduction. And because of that, such feelings are a fundamental part of our thoughts and perceptions. They cannot be factored out. It is not possible to analyze what meanings are irrespective of what they are for, as though meanings are separate from motivations, and the needs of the body are only indirectly connected to the workings of the mind. Judgments and decisions, motivations and inclinations, dispositions and feelings are built into our thoughts and perceptions. This brings us back, finally, to the question of documentary’s peculiar appeals. As I stated earlier, the truly significant difference between documentary and fiction does not lie in some abstract system of language or culture. It lies in the differences that I feel: in the very different impact upon me that the two kinds of discourse manifest, not just intellectually but viscerally; in my glands, nerves, and emotional responses—in my whole orientation towards what I see—in a word, in my body. The question remains, just what is this difference?

A Peculiar Disposition to Intervene

Recall the intense experience I described of witnessing a dog attack in the documentary Forest of Bliss? Consider, now, the experience of watching a fiction film that is designed to provoke similarly intense responses. Not long ago, I went to see a Hollywood thriller about genetically engineered brainy sharks on the loose. 
Every time a character stepped into a pool of water in a leaking underwater research station, odds were pretty good he was going to lose a limb, at least, to one of these sharks. When I left the movie, I was surprised to discover that my thigh muscles were exhausted from trying to hold my own feet off the theater floor. This is an amusing response because, clearly, there were no sharks under my seat. It may also seem peculiar inasmuch as my physical response—lifting my feet—relates so specifically to the predicament of characters in the fiction. In that regard, my response was no different from many other physical responses we have to fiction films, from laughter at funny scenes to closing our eyes during scary scenes. It does exemplify in a particularly vivid way, however, what cognitive neuroscience has found out about the way imagination works. When we imagine, recall, think about, or make sense of actions and situations, including those we might witness in a movie, we do not process them in some dedicated imagination center, removed from the sites where we process impulses from the real world. There is no such center in the brain. (To suppose otherwise is to suppose that there is some dedicated I in the brain—something like a little homunculus who sees our thoughts and perceptions unfold before it, as though it is a spectator at a movie. This notion has been thoroughly debunked by cognitive science, both on theoretical and empirical grounds.) Instead, when we imagine (or see a movie about) being in a sinking research station, besieged by brainy sharks, we rehearse actions and situations in the self-same neurological networks in the sensory and motor cortex that we would use if we were actually in a sinking research station besieged by brainy sharks.8 Naturally, the outputs of this processing are bundled together downstream with other outputs, including the body’s


implicit awareness that it is comfortably ensconced in a theater seat. This is why the shark attacks in the movie are thrilling and not merely terrifying. So, what is the difference between this kind of response and my response to the dog attack in Forest of Bliss? In one respect, nothing at all. When I see the dog that is being attacked cringe before the fangs of its attackers, I cringe, too—not outwardly, perhaps, but at least in those parts of my brain that know from experience what it feels like to cringe before a physical threat. This triggers a rush of adrenaline. It is partly what accounts for my extreme discomfort. It is also partly why I feel such strong sympathy for the victim (and for underdogs, generally): My physical responses to the idea of being attacked and out of control are so strong that they overwhelm any inclination I may also have to imagine what the attackers are feeling. In short, I cringe sympathetically during this dog attack just as I lift my feet sympathetically during the shark attacks depicted in the fiction film. The two responses are closely parallel. But there is a crucial difference, as well. During the scene from Forest of Bliss, I experienced a very strong impulse to intervene even though I knew that was not possible. I wanted to hurl a stone at the attacking dogs. Indeed, I was upset at the filmmaker for not throwing a stone himself. I felt he was to blame for letting the attack continue. I did not experience anything remotely like this during the shark movie. I lifted my feet with the characters in the movie, as it were, but I had no inclination to lift even a finger for them. When a potential victim flailed at a shark, I was right in there flailing with him, in effect, but there was nothing like the inclination to protect him that I felt in Forest of Bliss. 
When I screamed, in my head, “Stay out of that water, you idiot!” it was in delighted anticipation that the character would ignore my telepathic warning, not actually to stop him. The idea that the filmmaker was somehow to blame for not preventing the shark attack, had it occurred to me, would have seemed patently absurd. To describe what I experienced during Forest of Bliss as an impulse to intervene is not quite accurate. I really had no inclination to throw rocks at the movie screen. What I experienced was a bit more remote: an awareness of an inclination to intervene, had that been possible. In other words, it was an emotion—a recognition of my body’s disposition toward what I was seeing—not an impulse to actually act.9 It is more accurate, therefore, to call it a disposition to intervene rather than an impulse. This disposition is part of the distinctive impact of documentaries. Many documentaries take advantage of it by openly inviting us to take action—to write our senators, change our minds, express our opinions, or at the very least, to pay special attention and feel special concern while watching the film. Of course, there are other documentaries that eschew or even seem to preclude this kind of action. Forest of Bliss is one. It presents a lyrical wash of images of India with next to no explanation. This tends to deny viewers the evaluative foothold that one must have to contemplate intervention. Moreover, the world the film depicts is totally inaccessible: decades past, now, and half a world away. Nevertheless, our interest in the film revolves around seeing the world it depicts as a place where our actions would have made a difference. For this reason, the dog attack that opens the film does not merely arouse anxiety, horror, and sympathy, as it would in a fiction film. It also arouses indignation—the feeling that somebody should have intervened to change the events that are depicted. 
Indignation is a response that is, by and large, peculiar to documentaries. The felt inclination to intervene is a crucial difference between documentary and fiction films. It is this, more than anything else, that accounts for the peculiar emotional force that documentaries can exercise on us, the peculiar power to “move” evidenced in


my strong reaction to the dog attack sequence. Even if, in some circumstances, this is merely a rhetorical effect, the product of filmmakers’ techniques or viewers’ assumptions, it is a difference that really matters, because it results in a really different overall orientation toward what we see on the screen.

The Documentary Mode

On the landscape of emotional experience, the disposition to intervene seems a whole lot less peculiar than the response typically engendered by fiction films: namely, emotional engagement without any disposition to intervene, in the way I experienced the shark movie. We are, after all, social creatures. An inclination to intervene when we see somebody in trouble is part of our makeup. If I were to witness, first-hand, a pack of dogs viciously attacking another dog, I would feel a strong impulse to throw a stone at the attackers. If I saw somebody really being attacked by sharks, I would be far more concerned with his safety than with action-adventure heroics. If I saw a person crying, I would feel a natural inclination to comfort him or her. If I saw any of these things in a documentary, my responses would be much the same. In that respect, our emotional responses to the things we see in documentaries are not so peculiar after all. They are quite similar to the responses we would have to seeing the same things first-hand (albeit from a distance that prevents actual intervention, as though through a window). It is fiction that is the peculiar mode of experience, closely allied with pretending. Gregory Bateson was one of the first social scientists to recognize the importance of evolutionary theory for understanding human communication. Bateson observed two young monkeys at the San Francisco zoo posturing and chasing each other aggressively, as though to bite. 
And yet, Bateson writes, it was clear “even to the human observer, that the sequence [of behavior] as a whole was not combat, and evident to the human observer that to the participant monkeys this was ‘not combat’” (p. 179). This kind of behavior could only occur, Bateson points out, if the monkeys were able to “frame” the posturing and attacks, meta-discursively, as “This is play.” Bateson was mainly interested in demonstrating the intentionality of discourse. The difference between real fighting and pretend fighting, he said, lies not in the actions themselves but in their purpose or aim. It is a matter of conceptual “framing.” That is true, as far as it goes. Such framing, however, is not merely conceptual; it is emotional. It has a profoundly different impact upon and produces a profoundly different disposition in the participants in the discourse. Therein lies its real significance. Pretend fighting (wrestling, for example) can produce the same intense emotions as real fighting—arousal, a desire to dominate, anger, even fear—yet, because the intent to inflict injury is absent or restrained, there is much less likelihood of the participants getting seriously hurt. The consequence of the activity is therefore vastly different. Play and pretending change the consequence of any behavior or situation. When Bateson saw the monkeys fighting in play, he knew that neither really intended to hurt the other. Because of that, it would have been inappropriate for him to intervene. Had he for some reason wished to interact with the monkeys on their own terms, he would have had to “play along,” to acknowledge that their fighting was “just pretend,” intentionally removed from the realm of ordinary consequences, and to respond accordingly. Fiction produces the same kind of response. When I watched that shark movie, I responded as though I was watching a pretend shark attack (as indeed I was). 
This did not diminish the strength of my emotional responses, but it changed their tenor. I recognized that the event did not belong to the realm of ordinary consequences. Nobody was really going to get eaten. That is why I felt no disposition to intervene. It is not coincidental that Bateson’s studies of frames of discourse began by looking at play. There are innumerable other frames he might have chosen: real fighting, for example, or teaching or arguing or flirting. The playing frame is fairly unique, however, in that it removes an interaction from the realm of ordinary consequences. When I am in the teaching frame, for example, I expect my students to learn. Not so if I am merely pretending to teach. In that regard, teaching, arguing, and flirting are all part of the same mode of experience as real fighting. Pretending stands apart. Documentary also belongs to the same mode of experience as “real fighting.” In a shark attack in a documentary, somebody really could get eaten. That is the root of the disposition to intervene I described above. That is also the reason why documentaries are used to teach or argue or persuade. They are the same order of interaction. They all belong to the realm of ordinary consequences. (Conversely, that is why when fiction films teach or argue or persuade, they almost always do it indirectly or covertly by, for example, creating scenario after scenario in which men are violent and women are sex objects.) Fiction is clearly of a different order than other kinds of experience, an order allied with pretending. In contrast, one might say that nonfiction is a mode allied with not pretending. But against the backdrop of all other kinds of experience, nonfiction is not a special mode, any more than teaching, arguing, or flirting is. Our responses to documentaries are simply an extension of our ordinary responses to things that matter to us into the indirect realms of observation (for example, watching monkeys at play) and discourse (for example, talking about monkeys at play). 
Instead of calling nonfiction a mode, therefore, we might just as well say that, like teaching, arguing, and flirting, it is simply a discourse of consequence. Of course, documentaries are movies. They are contrived, just like fiction films. That makes a difference in the impact of the things they show us. A very significant difference.

A Peculiar Disposition to Trust

A documentary can “lie.” Therein lies the difference. If you were to witness, first-hand, a pack of dogs viciously attacking another dog, there would be no question in your mind about the reality status of the event. You would be able to tell, from the context, how consequential or serious the attack was. On the other hand, if such an attack is described to you, as I described it to you at the start of this essay or if it is presented to you in a movie, as in Forest of Bliss, you have no way of ever knowing just how serious the attack actually was. It could have been exaggerated, taken out of context, or even completely fabricated.10 Whenever any actual event or state of affairs is presented through a discourse, regardless of the medium, one ultimately has no alternative than to simply “take it on faith.” The eminent British philosopher of language, H. Paul Grice, observed that there is an awful lot we have to take on faith in any discourse of consequence, be it teaching, arguing, asking for help in changing a tire, or, by extension, watching a documentary. First and foremost, we have to take on faith that there is some purpose or point or at least a rational direction to the discourse. Otherwise, why would we bother? Furthermore, we have to take on faith that the other parties in the discourse are trying to make it worth our while. Otherwise, again, why would we bother? Grice calls this the cooperative principle.


The cooperative principle entails, in turn, certain general rules or maxims that any competent participant in a discourse implicitly understands. For example, it is a maxim that, in conversation, a comment or question about a new topic merits some sort of response. If my wife and I are sitting in the living room, and I say, “Gee, honey, it’s cold in here,” I suppose (and I suppose that my wife supposes) that some response is in order. Accordingly, whatever my wife says, she knows that I am likely to construe it as a response. That is the nature of conversation. Of course, this is not a given. It is a kind of agreement, an implicit social pact that is necessary for conversation to succeed. My wife can, if she chooses, willfully flout the maxim and sabotage the discourse by deliberately responding with some nonsense or by ignoring me altogether. In other words, such maxims are another thing that I simply have to take on faith in any discourse of consequence. One of Grice’s four general maxims is “Do not say what you believe to be false.” Notice that the maxim does not read, “Say only what you believe to be true.” There is a subtle but important difference. Grice’s wording describes precisely what we expect of documentaries. We do not expect documentaries to be true in any absolute sense: We welcome generalizations, opinions, controversy, illustrations, political agendas, melodramatic music, Hollywood-style editing, sound effects, reenactments, and all sorts of other quasi-fictional elements that are undeniably and self-evidently not true. On the other hand, we expect things in documentaries to be what they seem to be: A reenactment should not look like archival footage, an opinion should not be passed off as a fact, dates and figures that are asserted confidently ought to be uncontroversial and correct, and so on. Documentaries are contrived, just like fiction films. They are no different in this regard than any other discourse about reality. 
When I teach or mention the temperature or describe in a scholarly essay the movies I have seen, my discourse is no less artificial, no less fabricated, than when I make up stories. What separates such discourses from fiction is not the extent to which they are contrived or artificial. It is the presumption that they are consequential. Grice’s most important insight is that, precisely because they are contrived, discourses of consequence, such as documentaries, must always revolve around trust. Trust is the essence of his cooperative principle. When I am watching a documentary, I have to trust that it was made for a purpose. I have to trust that part of that purpose is to produce a satisfying or worthwhile response in me. And I have to trust that the filmmakers are not saying what they believe to be false by deliberately distorting or falsifying the apparent consequence of their raw materials. Without these kinds of trust, documentaries would be pointless. Watching them would be a waste of time. Because we know this, implicitly, our trust in documentaries is not just a fact, it is a powerful disposition. The main reason for watching documentaries is because we are disposed to trust them. This disposition is powerfully related to their impact. This explains the peculiar nature of evidence in documentaries: historic photographs, observational footage, first-hand testimonies, and so on. Such evidence does not warrant our trust (even though it does signal that we are involved in a discourse of consequence). To the contrary, it is our trust in the discourse that warrants taking what we see in documentaries as grounds for a special emotional investment. In other words, the evidence in documentaries does not so much testify to the truth of what we see as to the honesty and serious intent of the filmmakers. This explains why,


in The Civil War, as in most other documentaries, materials of vastly different status as actual evidence (archival photographs, artists’ renderings, recreated sound effects, animated maps, contemporary footage of landscapes, expert testimony, apocryphal anecdotes, dramatic readings, and melodramatic music) can be all mixed up, willy-nilly, without in the least troubling or putting off viewers. The credibility and emotional impact of the series depend only upon the impression that the filmmakers are conscientious and trustworthy—that, despite their interest in drama, they do not step beyond the bounds of fact. It does not depend upon the use of actual evidence. Any kind of raw material is suitable, from archival documents to dramatic reenactments, so long as it does not undermine our impression of the reliability of the discourse.11

A Peculiar Sense of Self-Worth

I mentioned earlier, in passing, that indignation is a response that is peculiar to documentaries. When I saw the dog-attack sequence in Forest of Bliss, I was not merely revolted and disposed to intervene, I was indignant. Indignation, besides implying the sense that someone should have stepped in to change things, implies a sense of moral rectitude, of self-righteousness, of superiority. So, ironically, at the same time the dog-attack sequence made me feel awful for the poor brute that suffered, it made me feel pretty good about myself by putting me in a position to pass judgment on what I was seeing. This is just one example of the way documentaries quite routinely make us feel good about ourselves by allowing us to feel superior. The documentary canon can be read as a virtual compendium of cinematic snobbery. 
Superior attitudes range from the cuteness factor of Nanook of the North (Robert Flaherty, 1922) to the political pretensions of the Vertov group, the artiness of the city symphonies, the attitude of moral “uplift” inherent in the whole Grierson tradition, the nationalistic sentiments of Triumph of the Will (Leni Riefenstahl, 1935), the arrogance of voice-of-God documentaries, and the distanced “objectivity” of cinema vérité. A more recent sampling of documentary hauteur includes the insiders’ jokes of The Atomic Cafe (Kevin Rafferty, Jayne Loader, and Pierce Rafferty, 1982), the self-importance of Shoah (Claude Lanzmann, 1985), the high moral dudgeon of the right-to-life film The Silent Scream (1985), the smug irony of Roger & Me (Michael Moore, 1989), the slightly condescending pity of Common Threads: Stories from the Quilt (Robert Epstein and Jeffrey Friedman, 1989), the elitist aura of prestige projected by The Civil War, the self-righteousness of The Panama Deception (Barbara Trent, 1992), the “exclusive” back-room access of The War Room (D. A. Pennebaker and Chris Hegedus, 1993), the ostentatious stylishness of Fast, Cheap, and Out of Control (Errol Morris, 1997), and the camp sensibility of The Eyes of Tammy Faye (Fenton Bailey and Randy Barbato, 2000). This is not intended as criticism; it is just an observation. We all feel a little smug sometimes. Documentaries simply use this as a lever to elevate our interest and concern. The typical venues of documentaries—art-house cinemas, university campuses, and public television, among others—also suggest their prestige value. It is interesting to note, in passing, that the primary audience of PBS documentaries is the most status-conscious segment of American society: well-educated, well-off, white, male, middle-aged professionals.12 In the previous sections, I discussed the special feelings of consequence that documentaries engender. 
Compared to fiction films, documentaries seem both especially urgent and especially honest. Even if this fact makes documentaries seem especially boring in


our entertainment-saturated age, it also invests them with a special status. There appears to be a strong emotional connection between the impression that a particular kind of discourse, like documentary, has special status and the impression that participants in that discourse have special status—that they are in some way privileged or superior. From a pragmatics perspective, again, this makes sense. Consider the example of a “secret”—the prototypical example of a privileged discourse. By excluding some people, it creates a social bond among those who share it—a feeling of intimacy and belonging. It elevates the status of those who share the secret among those who do not. And, for those reasons, it fosters in participants in the secret a special sense of self-worth or social superiority. A similar sense of self-worth is clearly part of the stock-in-trade of documentaries. Although one sees it particularly clearly in the superior attitudes mentioned above, self-worth is also implicit in the appeal to shared sentiments in every social issues documentary and in the claims of education, edification, and enlightenment that are used to promote all kinds of nonfiction films. I speculate that there is also an emotional link between the peculiar sense of self-worth documentaries can engender and the sense of their consequence or “seriousness.” The more “serious” a documentary is supposed to be, on account of either its topic or its approach, the more likely it is that viewers will make moral judgments and the more powerful their emotional investment in these judgments is likely to be. In short, the more superior they are likely to feel. Conversely, the more superior a documentary makes them feel, the more “serious” it may seem to be. (I suspect this is true of other kinds of movies, such as art films, as well.) The sense of superiority that attends some documentaries can backfire, however. 
If the sense of superiority is perceived as arrogance, it can trigger a defensive reaction from viewers or sympathy for those who are looked down upon. The upshot is to undermine the disposition to trust described in the previous section. Reviews of the condescendingly funny documentaries The Atomic Cafe and Roger & Me are rife with examples of this kind of response, but the reaction can be triggered by even the most “serious” of nonfiction films, such as Shoah.13 When we see someone crying in a documentary, the assumption that it is a “real person” (as opposed to an actor) is not merely a belief that we assent to, it is in some vital way “given,” even though it may be false—just like the assumptions that govern vision. Of course, fiction films also produce real physiological responses or “givens.” The difference of documentaries lies in the peculiar emotions they engender. Emotions are the key. The difference between “We’re fighting” and “Let’s pretend we’re fighting” is fundamentally a matter of the body’s disposition, or the way it is inclined to act. The same is true of the difference between “That’s a ‘real person’ crying” and “That’s an actor crying.” Emotions are the way the body’s disposition toward the “givens” of perception impinges upon awareness. Emotions are not the bane of reason but an indispensable prop of reason. They evolved because, by and large, they serve to perpetuate an inclination to act in ways that further the social and physical well-being of the organism. In a word, they are adaptive. I have suggested that three emotions, in particular, shape documentaries’ peculiar appeals. All three relate to specifically social dimensions of our experience. One is a disposition to intervene, even where action is impossible. The second is an inclination


to trust the discourse, to accept the things shown and said in the film at face value. The third is a special sense of privilege and belonging. These hypotheses are just a start, a tentative foundation for further research. A pragmatic or ecological perspective casts the peculiar appeals of documentaries in a new light. Rather than seeing them as fundamentally pernicious and misleading—as false claims to Truth, for example—it shows them to be based upon a natural and in general quite helpful tendency to regard reality, and by extension discourses about reality, as having certain real consequences that fiction does not. To try to flout this tendency, either with theoretical claims that documentaries are nothing more than fiction films in disguise, or with stylistic practices that thwart our efforts to tell which is which, is to undermine any power documentaries may have to really make a difference. Notes 1. By this definition, most of the segments of CBS’s 60 Minutes are documentaries, because they make a point of practically oozing deep social significance in order to attract viewers. On the other hand, CBS’s smash success Survivor, even though it has already made an impact on nonfiction television programming, is not a documentary, because its entertainment value, for the typical viewer, is mainly linked to its game-show elements. The exotic locales and the interpersonal conflicts, however genuine they may be, are just part of the show. MTV’s Real World lies somewhere in between. The situation (an eclectic ensemble of teenagers and twenty-somethings living together under the constant scrutiny of cameras) is obviously engineered to foment little “dramas” that will appeal to audiences in much the same way soap operas do. At the same time, in the implication that “This is the way young people today really are,” the program trades on more than just its intrinsic entertainment value. 
In any case, my definition of documentary is not supposed to delineate a neat body of films. Among film scholars, documentary is generally regarded not as a genre or a form of movie but as a mode of movie—that is, a type of movie primarily concerned with producing a particular kind of response in audiences. Movies that are squarely in this documentary mode, like The Civil War, have many fictional elements. Quasi-documentary television programs, like Real World, quite routinely overturn traditional associations between certain filmmaking techniques, such as cinema vérité, and the documentary mode of response. Even straight-up fiction films, like Titanic, can dip into the documentary mode by resting part of their appeal on, for example, reminders that there was a real ship that really sank. The key question in distinguishing the documentary mode, I have argued elsewhere, is not “What is it?” but “When is it?” See Eitzen, 1995. Although the definition of documentary I have proposed here is novel, its basic premise— that documentary discourse revolves around a special entreaty to viewers—is widely accepted by scholars of documentary. This is essentially what Bill Nichols means when he characterizes documentary as a “discourse of sobriety” in the introduction to his Representing Reality (Bloomington: Indiana University Press, 1991): 3–4. 2. For a survey and analysis of responses to The Civil War, see Eitzen, 1994, pp. 111–30. 3. I use the word intimated, because no one has stated this claim so baldly, but it is clearly the guiding idea behind William Guynn’s 1990 book, A Cinema of Nonfiction. It is also the foundation of certain of the stronger claims of Bill Nichols and others, such as this one from Representing Reality: Documentaries are fictions with plots, characters, situations and events like any other. . . . They [refer] to a “reality” that is a construct, the product of signifying systems, like the documentary film itself. . . . 
The notion of any privileged access to a reality that exists “out there,” beyond us, is an ideological effect. The sooner we recognize this, the better. (p. 107)

4. In this regard, my understanding of the nature of the evidence in documentaries agrees with that of semiotically oriented scholars, such as art historian John Tagg, who writes: The photograph is not a magical “emanation” but a material product of a material apparatus set to work in specific contexts, by specific forces, for more or less defined purposes. . . . That a photograph can come to stand as evidence, for example, rests not on a natural or existential fact, but on a social semiotic process. (p. 4) The first sentence of the above quote is absolutely correct (as is the gist of Tagg’s analysis). The second sentence, however, shows how Tagg minimizes the extent to which the “social semiotic process” by which photographs are deployed and interpreted hinges upon recognition of the “natural and existential fact” that they bear an indexical relationship to the things they record. This is symptomatic of a kind of schizophrenia in much “post-semiotic” cultural theory. On one hand, it insists that there is a strong connection between significance and historical realities; on the other, it wants to deny any connection between significance and physical realities. 5. I refer you, in particular, to three very readable summaries, the first philosophical, the second film-theoretical, and the third neuroscientific: Daniel Dennett, Consciousness Explained; Joseph D. Anderson, The Reality of Illusion: An Ecological Approach to Cognitive Film Theory; and Antonio Damasio, Descartes’ Error. 6. Note that some reliable correspondence to reality does not make a sign true, merely more or less useful. A gopher’s squeak to warn its colony of the presence of a hawk is in no sense a reflection or index of the actual presence of a hawk. It is merely a probable indication. It may be mistaken. Its success in evolutionary terms hinges not upon its being confirmed by reality to be true but upon its not being confirmed by reality to be false. 
A gopher that squeaks and runs for cover at the sight of every passing crow and swallow may have no way of ever knowing that all those crows and swallows are really not hawks but the reality that they are not will tend to favor more discerning gophers. 7. This paragraph is a sketchy summary of one key finding of cognitive neuroscience. For elaboration and support, I refer you to Damasio. 8. Once again, I must steer you to Damasio for details. Incidentally, the title of his book, Descartes’ Error, refers to the supposition that there is some dedicated I in the brain. 9. My conception of emotions as mind’s reflection of the body’s dispositions is similar to the account of N. H. Frijda, in The Emotions. Frijda describes emotions as a kind of judgment that takes control of and guides reasoning in situations that touch on an individual’s concerns. A concern is anything that a person perceives to be of consequence to his or her interests, goals, or personal well-being. This perception is generally automatic—a kind of “given.” For example, because our ability to correctly interpret another person’s emotional state is critical to our social well-being, the perceived emotional state of another person is almost automatically a concern of ours as well. There are two steps to any emotional response, in Frijda’s theory. The first is that a particular situation is flagged as especially relevant, which pushes it more or less irresistibly to the foreground of attention. The second is that deeply ingrained dispositions to particular kinds of action, or “action tendencies,” are more or less automatically initiated. This accords well with Damasio’s neurobiological theory of emotions. Damasio argues that emotions are prior to cognitions. They are temporally prior because the physiological mechanisms that set them in motion occur well before the situations that trigger them arrive in consciousness. They are developmentally prior because they provide the basis for social learning. 
They are functionally prior in that they bias our cognitions from the outset. They are prior in evolutionary terms, because they originate in more-primitive parts of the brain and provide an essential foundation for “higher” reasoning. Emotions are prior to cognitions in all of these ways simply because they are more vital to the survival and reproduction of the organism. The significance of our cognitions, according to both Frijda and Damasio, always follows the trajectory of our emotions.

10. Granted, everything about the way the dog attack is depicted in Forest of Bliss seems to testify to its authenticity: the long takes, the catch-as-catch-can framing, and especially the extreme ferocity of the attack, which seems well beyond the ability of any dog to pretend. Just the same, the very things that appear to testify to the authenticity of a sequence like this one can be used to fake it. The brilliantly contrived fake vérité documentary No Lies (Mitch Block, 1973) is a perfect example (and, because it is so perfect, a profoundly disturbing one). The bottom line is that whenever we see something in a photographic image or sequence, it is stripped of all context other than what the filmmaker has chosen to present. A person can pretend to cry so convincingly that I cannot tell by sight whether the crying is genuine or not. If I am physically present, I can always take a step back to consider the wider context or I can continue to look on indefinitely until I am sure. If I see somebody crying in a documentary, however, I cannot see past the edges of the frame and the end of the shot. The only reason I have to be confident that the crying is genuine is that, if the filmmaker had known otherwise and not let me know, too, it would have been a terrible breach of my trust. (That is why Geraldo Rivera got in hot water, years ago, for inserting reaction shots of himself weeping into an interview that, it was revealed, had been recorded with a single camera.) For a more sustained discussion of the impression that, in contrast to other kinds of discourse, photographic images “do not lie,” and of the film No Lies and its reception, see Eitzen, 1995. 11. 
This requires a slight modification of John Corner’s declaration that I quoted earlier, “It is on the warrant provided by the integrity of their ‘raw materials’ that most documentaries base their discursive status.” Actually, it is on the warrant provided by our trust in their integrity, period, that documentaries base their discursive status. Their “raw materials” can be anything at all, so long as they are deployed in ways that do not undermine this trust. 12. For example, for The Civil War series, ratings indicate that 97% of the audience was white, 54% male; 61% of the males were from 35 to 64 years of age, 56% had family incomes of forty thousand or above, and 37% were professional or management versus 18% skilled or semi-skilled (PBS Research). 13. Of The Atomic Cafe, Stanley Kauffmann writes “What’s most disturbing about [the film] is the feeling of present-day superiority it’s apparently supposed to evoke” (p. 24). David Ansen writes, “[The film] assumes the audience’s foreknowledge and sympathy. . . . At best, it encourages outrage; at worst, smugness” (p. 73). Of Roger & Me, Pauline Kael writes: I had stopped believing what Moore was saying very early; he was just too glib. . . . [The film] uses its leftism as a superior attitude. Members of the audience can laugh at ordinary working people and still feel that they’re taking a politically correct position. (p. 93) One example of the “cheap shot” strategy of Roger & Me, write Gary Crowdus and Carley Cohan, “is [filmmaker Michael] Moore’s pursuit of Miss Michigan, Kay Lani Rae Rafko, to get her views on unemployment in Flint. Did Moore really expect to get some enlightening commentary from the Miss America candidate? His badgering of the beleaguered beauty queen contestant ultimately says more about Michael Moore than it does about Miss America” (p. 28). 
Of Shoah, Paul Coates writes, filmmaker Claude Lanzmann “arrogates a God-like position to himself ” and condescends to the Polish villagers by “slouch[ing] patronizingly against the wall” and asking “none-too-tactful loaded questions” (p. 60). About the same movie, Edward Rogerson writes, “The director’s fascination with his own opinion casts a shadow over the film’s worth as a serious study of the Nazi Holocaust. . . . Lanzmann has not played fair with his audience. More to the point, he has not played fair with his witnesses and their harrowing testimony” (p. 65).

References

Anderson, Joseph D. (1996). The Reality of Illusion: An Ecological Approach to Cognitive Film Theory. Carbondale: Southern Illinois University Press.
Ansen, David. (1982, June 28). Review of Atomic Cafe. Newsweek 99: 73.
Bateson, Gregory. (1972). Steps to an Ecology of Mind. San Francisco: Chandler.
Coates, Paul. (1987, June). A Ghetto in Babel: Revisiting Lanzmann’s Shoah. Encounter 69: 60.
Cohan, Carley, and Gary Crowdus. (1990). Reflections on Roger and Me, Michael Moore, and His Critics. Cineaste 17.4: 25–30.
Corner, John. (1996). The Art of Record: A Critical Introduction to Documentary. Manchester, England: Manchester University Press.
Damasio, Antonio. (1994). Descartes’ Error. New York: Avon Books.
Dennett, Daniel. (1991). Consciousness Explained. Boston: Back Bay Books.
Eitzen, Dirk. (1994). Bringing the Past to Life: The Reception and Rhetoric of Historical Documentaries. Unpublished doctoral dissertation, University of Iowa.
———. (1995). When Is a Documentary? Documentary as a Mode of Reception. Cinema Journal, 35.1: 81–102.
Frijda, N. H. (1986). The Emotions. Cambridge: Cambridge University Press.
Grice, H. Paul. (1975). Logic and Conversation. In Peter Cole & J. Morgan (Eds.), Syntax and Semantics, Vol. III: Speech Acts. New York: Academic. 41–58.
Guynn, William. (1990). A Cinema of Nonfiction. London: Associated University Presses.
Kael, Pauline. (1990, Jan. 8). Melodrama/Cartoon/Mess. New Yorker, 90–93.
Kauffmann, Stanley. (1982, May 19). Review of Atomic Cafe. New Republic 186: 24.
Nichols, Bill. (1991). Representing Reality. Bloomington: Indiana University Press.
PBS Research. “National Audience Report for Summer Quarter 1990.” Alexandria, Virginia.
Rogerson, Edward. (1988, Apr.). Movies and Metaphysics: Steiner, Coates, Shoah. Encounter 70: 65.
Tagg, John. (1988). The Burden of Representation: Essays on Photographies and Histories. Amherst: University of Massachusetts Press.

12 Reality Programming: Evolutionary Models of Film and Television Viewership
William Evans

Reading is not dead, but it is practiced by a steadily decreasing percentage of Americans. The consumption of books, newspapers, and magazines is becoming an elite activity, still practiced by some educated and relatively affluent citizens but increasingly eschewed by others. At the same time, a sizable and steadily growing percentage of the world’s population has become heavy consumers of television and film. Why do people prefer television and film to other media? Why do we consume so much television and film? Many observers of contemporary culture have addressed these questions. Some see our heavy consumption of television and film as a symptom of cultural decay. Some see it as a symptom of capitalism and its need to colonize leisure time and commoditize audience attention. Media historians have deemed these questions too broad to be answered simply, arguing that our turn away from print media to television and film is a result of myriad and complex causes. What unites most observers and scholars of media is the belief that our heavy consumption of television and film is symptomatic, worrisome, and perhaps even a threat to democratic societies. These observers may be correct. Heavy consumption of television and film can have profound negative consequences for individual consumers and for the societies in which television and film become staples. A complete explanation of our heavy consumption of television and film would account for a large number and wide range of interrelated factors, including historical, social, political, economic, psychological, and technological factors. But this essay attempts to offer one answer to the question of why we have become heavy consumers of television and film. It is an answer that has yet to be thoughtfully explored by media theorists even though there exists ample empirical evidence to support the answer. It is a simple answer, but not simplistic. 
It is a parsimonious answer, but it is not a mere caricature of a theory. Rather, it suggests an overarching theoretical framework within which we can make sense of a wide range of television and film content and effects. To wit, I argue that humans have evolved to prefer television and film to print media and that our heavy consumption of television and film is attributable to the fact that these media provide efficient access to people, places, and other highly salient phenomena. I also argue that there is no need to conceptualize the access provided by television and film as mediated access. Rather, we prefer television and film in large part because these media provide access that seems real to us. We typically see and respond to television and film content as if it is real. Indeed, the realism of television and film content is


one of the most relevant factors in explaining why people prefer television and film content and why they consume so much of it. This essay will offer evolutionary and ecological explanations of television and film content and effects. Along the way, it will attempt to answer several questions regarding media content and effects that are currently vexing both media critics and media scholars: Why do people prefer television news over print news? Why is television news so frequently concerned with bad news, gossip, and celebrities? Why is reality programming popular? Why is there so much sex and violence in television and film? Environmental Surveillance and Communication Technology Animals must survey their environment, especially to monitor threats, to secure necessary resources such as food and shelter, and to realize optimum mating and reproduction strategies. Humans are social animals; we are genetically programmed to be social animals. Surveying our environment involves watching others, for threats and opportunities. And there are many other humans we must watch. Human communication abilities and even the human brain itself have evolved (or rather co-evolved) to enable humans to form and maintain adaptable and very large social groups (Deacon, 1997; Dunbar, 1996). Moreover, as a rule, as human groups grow larger, divisions of labor and hierarchies based on status, power, wealth, and sex tend to emerge. Environmental surveillance is especially important for animals that have exploited an ecological niche for social creatures with relatively high intelligence who live in groups that are adaptable, relatively large, and relatively complex. Environmental surveillance has played a crucial role in human evolution. The creation and utilization of communication technology for environmental surveillance is adaptive. 
Individuals who exploit communication technologies may be more likely than those who do not exploit communication technologies to identify threats and opportunities in their environment. Such individuals would be more likely to avoid danger, to accrue resources (e.g., food, wealth), to achieve a relatively high social status, and to realize optimal mating and reproductive opportunities. Humans consume popular media such as newspapers, television, and film in large part because these media facilitate environmental surveillance. These media provide cost-effective surveillance across a wide range of people, places, and phenomena. These media enable us to exceed the surveillance range possible for any one person unaided by media. Humans prefer television and film to other media because television and film provide efficient and direct access, in the most interesting manner, to other people, other places, and highly salient phenomena. The human preference for television and film over print media is perfectly natural, in that humans are hardwired to attend and respond to visual stimuli, especially when visual stimuli include other people and especially when these people are engaging in salient behavior. This is not to say that television and film provide accurate information regarding our environment. In fact, television and film often provide highly inaccurate information regarding our environment. This is not to say that books and newspapers are not interesting. Rather, television and film are relatively more interesting because they provide moving images, a format for information that humans find innately engaging. To say that human utilization of media for environmental surveillance is adaptive and our preference for television and film over other media is innate is not to suggest

202 / WILLIAM EVANS

that our voracious consumption of television and film is good for us. The human preference for sweet and high-fat foods is adaptive in that it encouraged early humans to maximize food resources at a time when food was a scarce resource. But today food is a relatively plentiful resource (at least in most developed nations), and sweet and highfat foods are in especially abundant supply. Our innate, adaptive preferences for these foods now contributes to an epidemic of obesity in the United States. Similarly, while it is adaptive to exploit communication technology and innate to prefer television and film, the most popular television and film content seldom presents an accurate view of our world. Television and film producers exploit our innate preferences, offering us content that is highly salient and extremely realistic but that often presents a misleading account of our environment. The currently popular genre of “reality programming” represents an especially clever—and perhaps especially pernicious—exploitation of our innate appetite for watching others. Let us now consider three concepts—surveillance, access, and interest—that are central in an evolutionary and ecological framework for understanding why humans have become voracious consumers of television and film. Surveillance among Social Animals Harold Lasswell was one of our first and best theoreticians of mass media. Lasswell knew that mass media and mass society were interdependent. In the 1940s, Lasswell posited that mass media served a surveillance function in mass society, allowing people to monitor their environment for salient information (Lasswell, 1948). He suggested that surveillance is a crucial function in the maintenance of social groups. Lasswell noted that animals other than humans also engage in environmental surveillance. And he noted that social animals might develop specialization for surveillance. 
In human groups, we call the surveillance specialists “journalists.” These journalists assist us in the otherwise impossible task of keeping up with salient information across a wide range of people and places. But Lasswell did not elaborate regarding his view of surveillance as an evolutionary adaptation, and few media scholars sensed the important implications of this view. Lasswell said little about television and film per se. He said little about entertainment, because he was concerned primarily with news as surveillance (although he acknowledged that he was making an often arbitrary distinction between news and entertainment—an issue to which we will return). Still, Lasswell was among the first to insist that environmental surveillance is a crucial function of mass media in mass society. There is ample evidence that social animals engage in surveillance of others. For example, many species of birds periodically gather together in large groups, singing incessantly, even though this behavior has no obvious utility in terms of finding food, gaining protection from predators, delimiting territory, or securing mating opportunities. Wynn-Edwards (1978) suggests these gatherings function to provide the flock with an estimate of its size. The birds both listen and contribute to the singing. They adjust their nesting and mating behavior accordingly, producing more offspring when the chirping-and-singing evidence suggests that population is lower than optimum in light of available food resources and producing fewer offspring when the population is greater than the available resources can sustain. At the risk of anthropomorphizing these birds, it can be said that they are conducting a census of sorts, the results of which somehow inform mating behavior.

Among higher primates, we find even more evidence of surveillance. We also find evidence that this surveillance is relatively sophisticated. Apes and monkeys pay close attention to who is doing what to whom in the troop. They pay close attention to the behavior of others, and, aware that others are doing the same, they often attempt to hide their behavior from others when doing so will afford them an advantage.1 Both male and female apes and monkeys routinely monitor the mating behavior of others. They do this to forecast and prevent attempts by others to usurp one’s mate (or mates). They do so to determine which opportunities for mating may become available, as pair bonds seem to be dissolving, as higher status males seem to be losing status and therefore losing their claims on particular females. Both males and females sometimes attempt to optimize mating strategies by concealing illicit liaisons from others (that is, liaisons that would threaten the troop hierarchy and even put the participants at risk of physical aggression if their liaison was discovered). But they also use very public liaisons when doing so reinforces the social hierarchy, affirming one’s (usually high) status in the troop. Surveillance among higher primates is not related only to sex. Apes and monkeys also pay attention to who is sharing food with whom, who is grooming whom, and so forth. In brief, apes and monkeys spend a good deal of their time watching one another, watching for threats and opportunities. Humans are keen and frequent observers of one another. We are especially adept at assessing postures, gestures, eye movements, and other forms of nonverbal communication (Ekman, 1982). We are adept at determining the demeanor of others. We are so adept at discerning the moods and intentions of others we have evolved clever ways of concealing this information from others. 
We may even resort to self-deception regarding our intentions when we would otherwise be unable to conceal our intentions from others (Lockard & Paulhus, 1988). Do humans have a good sense of the size and nature of their social groups? We do, at least in terms of our immediate surroundings. Most people can offer rather accurate estimates of the number, age, sex, and ethnicity of people who reside on their street or who work in their office—people whom they frequently see. Humans will generalize beyond others they frequently see, believing that the next street over is likely to be similar to the street on which they live, that the entire corporation is likely to be similar in terms of demographics to their department (Nisbett & Ross, 1980). This is usually a safe bet, but, of course, we are sometimes wrong. The next street over may be markedly different. The department in which one works may be markedly different than the rest of the corporation. Yet, we readily make inferences regarding populations we have not seen directly, inferences based primarily on the segment of the population with whom we have direct contact. We also readily estimate specific traits manifested in these populations. That is, we stereotype. We attend to far more than the demographics of our world. We readily offer estimates regarding the distribution of beliefs and opinions across our neighbors. And we are often relatively accurate. Political scientists, communication researchers, and public-opinion researchers working in the spiral-of-silence research program have shown that humans have an innate “quasi-statistical sense” of the distribution of beliefs among those around them (Noelle-Neumann, 1993). We even have a sense of the waxing and waning trajectories of particular beliefs. As social animals, we desire to be accepted by others and fear being shunned. Accordingly, we are less likely to publicly reveal our opinions that we believe to be minority opinions. Even if we believe ourselves to be in the majority, we are less likely to reveal opinions that we believe to be losing adherents. We are more likely to change an opinion we hold if we believe the opinion will soon be held by only a small minority of the people around us. Again, we readily offer estimates regarding the distribution of beliefs and opinions across our neighbors, and we are often relatively accurate—so long as we have had direct and frequent contact with our neighbors. But survey research shows that we have become willing to offer estimates regarding the distribution of people, traits, behaviors, and beliefs in our communities, our states, our nations, and even the world. When asked to do so by survey researchers, Americans will estimate, say, the percentage of American adults who oppose abortion or which candidate will be the likely winner in a Presidential election. We will offer estimates of the distribution of opinions on all sorts of topics. Why? What makes us think we can do this? Part of the answer is: mass media. Especially television and film. Television and film provide access to people, places, behaviors, and phenomena that range far beyond our own neighborhoods and workplaces.

Access

Television and film offer us innately interesting, realistic access to people and places we might otherwise never see. As heavy consumers of television and film, citizens in most developed nations now attend to many dozens of people on screen each day. These people on the screen are women and men. They hold a variety of occupations although people who hold higher-status occupations such as lawyers and physicians are more commonly encountered in television and film than they are in real life.
We see both attractive and unattractive people on screen although the people we see in television and film are relatively more powerful and attractive than people we are likely to see off screen, in “real life,” on any given day. The people we see in television and film engage in a variety of activities although we are far more likely to see them engage in violent and sexual activity than we are likely to witness such activities in real life. Moreover, consumers of television and film are not hardwired to distinguish fact from fiction in moving images although we can readily do so if we are so motivated or if the moving image is somehow unfaithful to reality (Anderson, 1996). Viewers typically spend few cognitive resources wondering if the images they see on television and in film are fictional or nonfictional. We will consider later the evidence that access to other people via television and film affects our view of the world regardless of whether this access occurs via programming classified as fictional or nonfictional. But for now it is enough only to note that television and film provide access to a wide variety of people and places. Television and film provide access not only to other people but also to their behavior. Moreover, television and film provide access to behavior that we find salient, perhaps innately so. With his concept of the global village, McLuhan (1964) made it fashionable to view access as one of the most relevant concepts in understanding electronic media. Unfortunately, McLuhan did not rigorously tease out the implications of his important observations regarding the access afforded by electronic media. This task was accomplished two decades later, by Meyrowitz (1985), who argued that the primary power of electronic media is to provide us with access to otherwise inaccessible people—and their otherwise inaccessible thoughts and behaviors. According to Meyrowitz, television and film provide us with access to backstage behavior, a concept he borrows from Goffman (1963). Backstage behavior is behavior that we typically are not allowed to see. Not because it is lurid (although it sometimes is) but because it is behavior that reveals what is not typically revealed to us. In evolutionary terms (which Meyrowitz does not use), backstage access may provide us with an advantage over others who are not permitted backstage access. Backstage access may disadvantage those who would wish to conceal their behavior backstage. To provide some examples, a politician would likely regard as backstage behavior his or her discussions with his or her staff regarding how his or her administration will attempt to manipulate public opinion. Two female acquaintances might regard as backstage behavior their discussions about their relationships with boyfriends. They would likely wish to conceal these discussions from the boyfriends as well as from others who might overhear them. Backstage behavior can quite literally occur backstage, as in the case of theater performers who, before they take the stage, may candidly discuss with one another their doubts and fears regarding their performances, their (usually low) opinion of the audience, and their post-performance plans for a night of revelry.

Access to such behavior was once rare, but television and film routinely take us backstage, offering us compelling and realistic portrayals of behavior we would otherwise be unlikely to witness. These portrayals often take the form of television and film content classified as fictional. For example, the backstage behavior discussed in the preceding paragraph is accessible in Oliver Stone’s Nixon, in Rob Reiner’s When Harry Met Sally, and in Mike Leigh’s Topsy-Turvy. These portrayals may not be accurate, a point to which we will return, but again, they are compelling and realistic.
Television and film provide access to people and behavior that viewers find innately interesting and highly salient.

Interest and Salience

Realistic moving images are interesting. Moving pictures of people, animals, and objects elicit more interest and arousal than still photographs of people, animals, and objects (Detenber & Reeves, 1996). Realistic moving images that offer narrative accounts of human activities are especially interesting. Television and film provide carefully crafted, highly realistic images of humans engaging in interesting activities in interesting situations. Tan (1996) details the myriad ways in which narrative film elicits viewer interest, arguing that interest is an emotion, perhaps even a fundamental state that precedes and primes the viewer for subsequent emotions such as fear, anger, empathy, and so forth. Anderson (1996) suggests that film viewers may have an innate curiosity regarding the characters they encounter in narrative film and the situations these characters may face. It is quite natural, Anderson notes, that narrative films generate viewer curiosity regarding characters’ behaviors, circumstances, and their fate. The rapidly growing literature on interest and emotion in film (e.g., Carroll, 1999; Eitzen, 1999; Grodal, 1999; Plantinga, 1999; Tan & Frijda, 1999) delves into complexities and controversies glossed over or not included in this essay. For the purposes of the argument developed here, we need only note that interest is a basic human emotion that is readily elicited by narrative television and film. Humans are primed to be interested in narrative accounts of human activity, especially when these narrative accounts are offered via realistic moving images. Moreover, television and film narratives typically show human behavior that is highly salient. Obviously, sex is highly salient behavior for humans just as it is for apes and chimpanzees. As an exemplification of conflicts related to power and status, violence is an innately salient feature of television and film narratives. Access to people relatively high in status—and their backstage behavior—is highly salient. Contemporary television and film traffic in portrayals of the rich, powerful, and beautiful. Television and film traffic in narrative accounts of men and women negotiating romantic and sexual relationships and of lower-status protagonists overcoming challenges posed by relatively higher-status antagonists. Television and film provide realistic visual accounts of situations and behavior that humans find innately interesting. In this sense, our preference for television and film over print media is quite natural. Our heavy consumption of television and film results in large part from a perception that these media assist us in surveying our environment; in discovering who believes what and who is sleeping with whom; in gaining access to higher-status individuals; and in discovering dangers and opportunities in our environment.

Surveillance and (Direct) Access via Television and Film

Of course, television and film typically provide inaccurate representations of reality. Yet, television and film often exercise a powerful influence on viewers’ perceptions of reality. There is ample evidence that heavy consumers of television and film come to believe that the world is in fact similar to the world as revealed via television and film (Gerbner, Gross, Morgan, & Signorielli, 1994; Morgan & Shanahan, 1996). This is unfortunate but also understandable. The access to others that is afforded by television and film is in an important sense direct access—one might even say unmediated access—in that viewers typically see and respond to television and film as if television and film show real people, real places, and real behavior (Reeves & Nass, 1996). Consumers can be influenced by news accounts of the distribution of public opinion.
That is, consumer estimates of the distribution of opinions on a topic are often based at least in part on news accounts of public-opinion polls (Noelle-Neumann, 1993). This is not surprising, but there is evidence that consumer estimates regarding the distribution of public opinion are shaped simply by the number of people who divulge their opinions in a news story. Zillmann and Brosius (2000) have found that estimates of the distribution of opinions on various topics are strongly influenced by exemplification in news stories. The authors report that news consumers often seem to conduct an informal census of sorts, attending to the number of people in a particular news story who speak for and against certain propositions or political candidates. Across many experiments dealing with a variety of topics, Zillmann and Brosius find that by manipulating the number of people quoted in a news story as favoring or opposing a topic, they can correspondingly shift subjects’ estimates regarding the distribution of opinion in the real world. This influence is especially powerful in television news accounts, where the number of talking heads who speak for or against any given topic is often highly correlated with viewers’ estimates of the distribution of public opinion regarding that topic. In addition, visual information in television news stories will often trump information presented verbally but not visually. For example, even when television viewers are told via a reporter’s voice-over that an opinion poll reveals a majority of people favor a particular political proposal, viewers who see a majority of talking heads indicate opposition to the proposal will later offer estimates of the distribution of opinion in the real world that correspond more closely to the distribution of opinion manifested across the talking heads than specified in the reporter’s voice-over testimony regarding public opinion data (Brosius & Bathelt, 1994). Apparently, the visual portrayal of a talking human is among the most salient and therefore most influential features of television news.

News programming is not the only type of programming that can influence viewers’ perceptions of reality. Entertainment programming also routinely influences viewers’ perceptions of reality. The cultivation analysis research program has generated more than one hundred studies that show a correlation between consumption of entertainment television and viewer perceptions of reality (Gerbner, Gross, Morgan, & Signorielli, 1994; Morgan & Shanahan, 1996). Heavy consumers of television are more likely than light viewers to believe that the world is similar to the world portrayed on television. For example, heavy consumers of television are more likely than light viewers to overestimate the percentage of the population that works as lawyers, doctors, and police officers, occupations that are overrepresented on television in proportion to their real-life numbers. Heavy consumers of television are more likely than light viewers to overestimate the likelihood that they will fall victim to violent crimes, which are more common on television than in real life. Viewers’ perceptions of the distribution of physical traits in the population are also influenced by television and film. Television and film show us a higher percentage of thin women, muscular men, and generally attractive people of both sexes than we are likely to encounter in real life. Recent research (Cattarin, Thompson, Thomas, & Williams, 2000) suggests that exposure to television and film is associated with unrealistic estimates regarding the percentage of thin women, strong men, and generally attractive people among us. Exposure to these images has been associated with an increased likelihood of body self-image disorders among both female and male viewers (Botta, 1999; Leit, 2001).
There is ample evidence that viewers’ perceptions of reality are shaped in part by television and film, especially viewer perceptions regarding the distribution of people and their behavior, beliefs, and physical traits. But why are viewers influenced thusly by television and film portrayals? It is not because heavy viewers are relatively less intelligent than lighter viewers (although there may exist a negative correlation between intelligence and television and film consumption). It is not because (or not only because) viewers are uncritical although, to be sure, few viewers reflect on the human handiwork required to create compelling television and film narratives or the purposes for which advertisers sponsor such programming. Rather, viewers’ perceptions of reality are shaped by television and film because television and film content seems real to viewers. The people on television and in film seem real, and their activities seem real. Of course, most of the people who appear in television and film are indeed real people, and their behavior is real. These real people and their behavior are captured on videotape or film, usually by production personnel who are highly skilled at keeping their handiwork invisible, ensuring that they exploit or at least do not interfere with viewers’ innate inclination to see television and film images as real.

Since the 1950s, researchers have documented parasocial interaction between television viewers and the people they watch on television (Horton & Wohl, 1956; Levy, 1979; Rafaeli, 1990). Viewers frequently come to feel as if they have a personal relationship of sorts with television actors and television personalities. Parasocial interaction is neither pathological nor rare. It is especially likely to be manifested by viewers who follow a show over several years. Parasocial interaction seems to be cultivated by people on television who make eye contact with the camera or who offer self-disclosures regarding their personal life. Talk-show hosts, news anchors, and soap-opera characters seem especially likely to generate in habitual viewers a feeling that there is a social bond between the viewer and the character or personality. The fact that people on television tend to be attractive and hold (or portray people who hold) high-status positions in the social order may make parasocial interaction more likely, because viewers may find it somehow rewarding to count such people as members of their social group. Of course, viewers seldom profess to have a personal relationship with the television characters they habitually watch. It seems likely that most viewers would deny this, insisting that they understand that the people they watch on television are not among their family, friends, and acquaintances. But research shows that viewers often respond to television characters as they would respond to friends or family members, believing they can gauge day-to-day changes in the thoughts and emotions of talk-show hosts, news anchors, or soap-opera characters they have watched for years, feeling that it is important to them that the talk-show host, news anchor, or soap-opera character be happy (Rafaeli, 1990). Viewers do not believe they interact with these television characters or develop social ties to them, but viewers have nonetheless developed a relationship of sorts. Parasocial interaction is a normal outcome of habitual viewing of people on television who seem real and about whom viewers have come to care. Consumers of television and film may be especially likely to develop bonds with their favorite television and film personalities, but there is growing evidence that viewers of television and film tend to see most people on screen as real people. In a series of cleverly designed experiments, Reeves and Nass (1996) have shown that viewers presume that people on screen are real.
Moreover, viewers do not interpret people on screen as representations of real people. Reeves and Nass suggest it makes little sense to say that viewers interpret the images they see. Rather, viewers simply presume that people on screen are real people. When watching television and film, most viewers do not posit that the person they see before them—captured in video or on film in a manner that does not call attention to its artifice—is an actor behind whom there exists a screenwriter. Scholars of television and film have learned to be critical viewers, to keep in mind the artifice behind people who appear on the screen. Laypersons, too, can and often do attend to the fact that the people on television and in film are actors. But for most viewers, most of the time, people on the screen are simply people. They are not mediated representations of people.

Reality Programming and Media Scholarship

The evidence that many viewers see much of the content of television and film as real has long been available and is steadily growing (see especially Busselle, 2001; Busselle & Greenberg, 2000; Potter, 1988). This ecological view of media processes can be combined with insights provided by evolutionary biology and evolutionary psychology to provide a useful framework with which to investigate some current controversies regarding media processes and effects. Television “reality” shows have become one of the most popular and profitable programming genres. Shows such as Survivor, Temptation Island, Real World, Big Brother, and The Apprentice may seem especially real to viewers because these shows circumvent whatever content features may alert viewers that the images they see before them have been created and manipulated. Reality shows do not employ actors. Reality shows avoid stylized lighting, photographic, and editing techniques. Reality shows are especially adept at convincing viewers that they are watching real people engage in unscripted behavior. Again, this is perhaps the most common—and most natural—viewer response to mainstream television and film; that is, viewers typically do not need to be convinced that they are watching real people. But reality programming effectively exploits this response, using a visual style, programming content, and marketing and advertising campaigns designed to reinforce viewers’ innate sense that they are watching real people, in real places, engaged in real behavior. Of course, if the behavior of people on Survivor and similar shows was not salient, relatively few people would become habitual viewers of these shows. Reality programming shows people immersed in social situations that are highly dynamic, in which a social order has yet to emerge. Reality programming typically shows strangers who have been brought together in part because the producers expect tensions will emerge as the strangers attempt to live together. Reality programming shows people struggling to form and maintain strategic alliances, alliances that will enable them to optimize their position in the social group. To ensure that tensions emerge, producers have invented contests in which participants can earn an increased likelihood of rising to a position of dominance in the group. Participants in these contests often win by demonstrating superior physical or (less frequently) mental skills. These contests are zero-sum games. Losers are increasingly likely to become socially isolated (or even removed from the social group). Alternatively, weaker participants may form alliances with one another to preemptively challenge a seemingly stronger participant.
To ensure that the social dynamics remain tense and evolve quickly enough to move appreciably forward in one-hour increments, producers may institute recurring events in which the group must collectively decide who among them will be forced to leave (e.g., be “voted off the island,” in the parlance of Survivor’s first season on United States television). Romantic relationships and sexual behavior are often manifested in these shows. Indeed, Temptation Island focused on romantic and sexual relationships. Viewers of reality programming see much backstage behavior, being afforded access to individuals and subsets of the group discussing their emotions, beliefs, alliances, and strategic plans. Similarly, television talk shows such as the Jerry Springer Show, the Maury Povich Show, and Montel have enjoyed great success as they provide audience access to candid discussions of sexual behavior and romantic interpersonal dynamics. Accounts of sexual infidelity and deviant sexual behavior are frequent on these shows (Greenberg, Sherry, & Smith, 1997). Viewers are often provided backstage access of sorts, watching people talk about the alleged betrayals and other sins of lovers, spouses, family members, and friends, while the alleged sinners are shown isolated backstage, restricted from access to these candid conversations and self-disclosures. Social tensions, portrayals of romance, and access to backstage behavior and conversation are staples of mainstream television and film content. As social animals, television and film viewers find such content compelling. Viewers gain access to others via television and film, frequently others who are attractive and powerful. This access seems direct and unmediated. This access often seems privileged as viewers are afforded intimate, backstage access. Viewers may develop parasocial bonds with particularly compelling characters. Certainly, many viewers see and respond to people on television and in film as if they were real people.
To the extent that viewers believe that television and film provide environmental surveillance, they may be more likely to habitually consume television and film, more likely to manifest parasocial interaction, and more likely to prefer television and film content that affords access to salient backstage behavior (e.g., reality programming). Of course, the habitual consumption of realistic television and film may be a cause as well as a consequence of the belief that film and television provide environmental surveillance. Like social dynamics, the dynamics of television and film processes and effects are complex (and indeed, television and film processes are an important component of contemporary social dynamics). Admittedly, there is little evidence in the literature that directly supports the predictions offered here regarding the relationship between viewers’ content preferences and their perceptions of television and film as technologies of environmental surveillance, but these are intriguing hypotheses that can be empirically investigated. The notion that television provides environmental surveillance may be especially useful in understanding viewer preferences regarding television news. Critics of journalism bemoan the fact that television newscasts so frequently traffic in bad news, deviance, gossip, and “human interest” angles on the stories reported. This is derided as “tabloid journalism” and sensationalism. However, Shoemaker (1996) offers a convincing evolutionary account of consumer preference for bad news, noting that it is adaptive for humans to attend to bad news rather than good news. News of a threat or danger in one’s environment, Shoemaker reasons, is more salient than news that there exists no danger. It is adaptive to seek out information regarding deviance, Shoemaker notes, because deviance is potentially threatening. It is perhaps maladaptive to expend time and energy attending to news that has no potential negative implications. Exemplification is a common—and very powerful—feature of television news (Zillmann, 1999).
It is quite natural that news accounts should frequently include details regarding how one or more people are personally affected by the topic being covered in a story. These “human interest” or “person on the street” testimonies are disparaged by critics of television journalism who believe that this sort of coverage cultivates emotional rather than intellectual viewer understanding of a topic and distracts viewers from considering the complexities of a topic. These critics are likely correct. But viewers find other people salient, especially when other people offer revealing testimony about themselves and their emotions.

Gossip has long been a common feature of news. For centuries, newspapers have covered the personal intrigues of famous people and the salacious details of the lives of less famous people (Stephens, 1988). Television news continues this tradition. But television enables us to see the people involved, who are often framed in a close-up shot that establishes an intimate distance between viewer and the person on the screen. The people on television often provide intimate self-disclosures or angry denials of alleged deviant behavior. Television captures the subtle facial and eye movements that humans have evolved to find highly salient components of human conversation, because facial and eye movements provide information regarding the honesty of the speaker (Ekman, 1982).

Dunbar (1996) theorizes that gossip has always accounted for a large percentage of the communication exchanged between humans. Dunbar argues that gossip is crucial in the creation and maintenance of human social groups. Individuals with access to exclusive (or at least not widely distributed) information regarding who is doing what to whom may enjoy an advantage over others who do not enjoy this access. 
Individuals who possess salient information and the individuals with whom they share this information may be more likely to rise to and maintain relatively high positions in the social order. Gossip evolved as an important means of environmental surveillance among

REALITY PROGRAMMING / 211

early humans, Dunbar argues. In this sense, gossip is evolutionarily adaptive behavior. Television news succeeds in large part because it provides gossip and does so in a visually realistic manner, providing access to the people involved and providing important information regarding their demeanor, emotions, and veracity. From an evolutionary perspective, it is not surprising that most consumers prefer television news to print news. Moreover, it is not necessarily true that television news does a poor job of informing its viewers. Television news provides a great deal of information about people and their beliefs, emotions, and behavior. Television news provides information that viewers have evolved to find salient in a visual form that viewers have evolved to prefer. Access to others via television news provides a great deal of information, but this information does little to cultivate logical and abstract reasoning about the issues and events that impact viewers’ lives. Access to others via television and film entertainment programming convinces viewers that television and film provide environmental surveillance, but in fact television and film often provide inaccurate accounts of our world. Television and film exploit innate human preferences and capabilities that have been adaptive in the distant past. Today, the consumption of mass media may be adaptive but only for the individual who harnesses the power of the media to provide accurate environmental surveillance. News and information services such as Associated Press, Reuters, and Dow Jones provide valuable resources for readers and viewers who wish to monitor the political and economic environment. Still, entertainment programming is far more popular (and profitable) than news programming, even if entertainment programming provides misleading accounts of our environment and does little to cultivate sophisticated understandings of our complex world. 
Pinker (1997) suggests that consumption of television and film entertainment is a nonadaptive byproduct of our adaptive preferences for rich visual and auditory stimuli. Pinker also briefly develops a point made in this essay, noting that television and film succeed because they show us attractive people engaged in sexual and political intrigue, portrayals that we find innately engaging. Television and film provide viewers with a sense that they enjoy access to high-status individuals and to backstage behavior. Television and film seldom provide viewers with access or information that viewers can use to advance their positions in the social order. Instead, television and film provide viewers with a sense that they already enjoy a privileged place in the social order (Thomas, 1989).

Television and film succeed by convincing viewers that television and film provide a window on the world. However, the view from this window is inaccurate. In a sense, viewers demand inaccurate portrayals when they manifest a preference for access to attractive, powerful people who engage in salient behavior. It is a mathematical impossibility to expect that every television viewer could enjoy intimate real-life access to high-status individuals. There are simply too few rich and beautiful people and too many relatively less affluent and less attractive people. Television ratings data and Hollywood box office returns demonstrate that viewers will pay for access to beautiful people engaged in salient behavior. At the same time, viewers will shun programming that accurately mirrors the real-world dearth of wealthy and very beautiful people. The media and advertising professionals who produce and support mainstream television and film can be blamed for exploiting viewers’ natural preferences. But commercial media interests did not create these viewer preferences. Viewers will prefer programming that mirrors reality only if they can suppress or transcend their desire to gain access to high-status individuals and to backstage behavior. Because this desire is determined primarily by biology, it is not a desire that can be easily suppressed or transcended.

We cannot make sense of the large and steadily growing body of evidence regarding media processes and effects unless we posit that (1) television and film images seem real to viewers and (2) viewers prefer television and film content that provides access to people, places, and behavior that viewers are hardwired to find salient. Attempts to theorize the role of television and film in society must acknowledge that biology compels and constrains viewing preferences. Certainly, many political-economic factors are implicated in media processes and effects. But a full account must begin by acknowledging that humans have evolved to prefer television and film to other media. Our preferences for particular types of television and film content are also attributable primarily to evolutionary forces. Television and film may be leading us astray, providing misleading accounts of our world, but television and film can do so primarily because they exploit human biology and innate human preferences. Evolutionary and ecological approaches would seem to be required if we are to fully understand and ultimately re-engineer media content and consumption patterns.

Note
1. These facts, and the facts regarding higher primates adduced in subsequent paragraphs, are discussed in several rather well-known books. See Cheney & Seyfarth (1990); de Waal (1993); Harcourt & de Waal (1993); Dunbar (1988); and Smuts, Cheney, Seyfarth, Wrangham, & Struhsaker (1987).

References
Anderson, J. D. (1996). The reality of illusion: An ecological approach to cognitive film theory. Carbondale: Southern Illinois University Press.
Botta, R. A. (1999). Television images and adolescent girls’ body image disturbance. Journal of Communication, 49(2), 22–41.
Brosius, H.-B., & Bathelt, A. (1994). The utility of exemplars in persuasive communication. Communication Research, 21, 48–78.
Busselle, R. W. (2001). Television exposure, perceived realism, and exemplar accessibility in the social judgment process. Media Psychology, 3, 43–67.
Busselle, R. W., & Greenberg, B. S. (2000). The nature of television realism judgments: A reevaluation of their conceptualization and measurement. Mass Communication and Society, 3, 249–86.
Carroll, N. (1999). Film, emotion, and genre. Plantinga & Smith (pp. 21–47).
Cattarin, J. A., Thompson, J. K., Thomas, C., & Williams, R. (2000). Body image, mood, and televised images of attractiveness: The role of social comparison. Journal of Social and Clinical Psychology, 19, 220–39.
Cheney, D. L., & Seyfarth, R. M. (1990). How monkeys see the world. Chicago: University of Chicago Press.
Deacon, T. W. (1997). The symbolic species: The co-evolution of language and the brain. New York: Norton.
Detenber, B. H., & Reeves, B. (1996). A bio-informational theory of emotion: Motion and image size effects on viewers. Journal of Communication, 46(3), 66–84.
de Waal, F. B. M. (1998). Chimpanzee politics: Power and sex among the apes. (Rev. ed.). Baltimore: Johns Hopkins University Press.

Dunbar, R. I. M. (1988). Primate social systems. Ithaca, NY: Cornell University Press.
———. (1996). Grooming, gossip, and the evolution of language. Cambridge, MA: Harvard University Press.
Eitzen, D. (1999). The emotional basis of film comedy. Plantinga & Smith (pp. 84–99).
Ekman, P. (Ed.). (1982). Emotion in the human face. (2nd ed.). New York: Cambridge University Press.
Gerbner, G., Gross, M., Morgan, M., & Signorielli, N. (1994). Growing up with television: The cultivation perspective. In J. Bryant & D. Zillmann (Eds.), Media effects: Advances in theory and research (pp. 17–41). Hillsdale, NJ: Lawrence Erlbaum.
Goffman, E. (1963). Behavior in public places: Notes on the social organization of gatherings. New York: Free Press.
Greenberg, B. S., Sherry, J. L., & Smith, S. (1997). Daytime television talk shows: Guests, content, and interactions. Journal of Broadcasting & Electronic Media, 41, 412–19.
Grodal, T. (1999). Emotions, cognitions, and narrative patterns in film. Plantinga & Smith (pp. 127–45).
Harcourt, A., & de Waal, F. B. M. (Eds.). (1993). Coalitions and alliances in humans and other animals. New York: Oxford University Press.
Horton, D., & Wohl, R. R. (1956). Mass communication and para-social interaction: Observation on intimacy at a distance. Psychiatry, 19, 215–29.
Lasswell, H. (1948). The structure and function of communication in society. In L. Bryson (Ed.), The communication of ideas (pp. 54–65). New York: Harper.
Leit, R. A. (2001). The media’s representation of the ideal male body: A cause for muscle dysmorphia? Unpublished doctoral dissertation, American University, Washington, DC.
Levy, M. R. (1979). Watching TV news as para-social interaction. Journal of Broadcasting, 23, 69–80.
Lockard, J. S., & Paulhus, D. L. (Eds.). (1988). Self-deception: An adaptive mechanism? Englewood Cliffs, NJ: Prentice Hall.
McLuhan, M. (1964). Understanding media: The extensions of man. New York: McGraw-Hill.
Meyrowitz, J. (1985). No sense of place: The impact of electronic media on social behavior. New York: Oxford University Press.
Morgan, M., & Shanahan, M. (1996). Two decades of cultivation research: An appraisal and meta-analysis. Communication Yearbook, 20, 1–45.
Nisbett, R. E., & Ross, L. (1980). Human inference: Strategies and shortcomings of social judgment. Englewood Cliffs, NJ: Prentice Hall.
Noelle-Neumann, E. (1993). The spiral of silence: Public opinion, our social skin. 2nd ed. Chicago: University of Chicago Press.
Pinker, S. (1997). How the mind works. New York: Norton.
Plantinga, C. (1999). The scene of empathy and the human face on film. Plantinga & Smith (pp. 239–355).
Plantinga, C., & Smith, G. M. (Eds.). (1999). Passionate views: Film, cognition, and emotion. Baltimore: Johns Hopkins University Press.
Potter, W. J. (1988). Perceived reality in television effects research. Journal of Broadcasting & Electronic Media, 32, 23–41.
Rafaeli, S. (1990). Interacting with media: Para-social interaction and real interaction. In B. D. Ruben & L. A. Lievrouw (Eds.), Mediation, information, and communication: Information and behavior (pp. 125–81). New Brunswick, NJ: Transaction Press.
Reeves, B., & Nass, C. (1996). The media equation: How people treat computers, television, and new media like real people and places. New York: Cambridge University Press.
Shoemaker, P. J. (1996). Hardwired for news: Using biological and cultural evolution to explain the surveillance function. Journal of Communication, 46(3), 32–47.
Smuts, B. B., Cheney, D. L., Seyfarth, R. M., Wrangham, R. W., & Struhsaker, T. T. (Eds.). (1987). Primate societies. Chicago: University of Chicago Press.
Stephens, M. (1988). A history of news: From the drum to the satellite. New York: Viking.

Tan, E. S. (1996). Emotion and the structure of narrative film: Film as an emotion machine. Mahwah, NJ: Lawrence Erlbaum.
Tan, E. S., & Frijda, N. H. (1999). Sentiment in film viewing. Plantinga & Smith (pp. 48–64).
Thomas, S. (1989). Functionalism revised and applied to mass communication study. In B. Dervin, L. Grossberg, L. O’Keffe, & E. Wartella (Eds.), Rethinking communication, volume 2: Paradigm exemplars (pp. 149–60). Newbury Park, CA: Sage.
Wynne-Edwards, V. C. (1978). Intrinsic population control: An introduction. In F. Ebling & D. M. Stoddart (Eds.), Population control by social behavior (pp. 1–22). London: Institute of Biology.
Zillmann, D. (1999). Exemplification theory: Judging the whole by the sum of its parts. Media Psychology, 1, 69–94.
Zillmann, D., & Brosius, H.-B. (2000). Exemplification in communication: The influence of case reports on the perception of issues. Mahwah, NJ: Lawrence Erlbaum.

Part Seven
Events, Symbols, and Metaphors

In The Senses Considered as Perceptual Systems (1966), James J. Gibson wrote:

There is a curious paradox about a picture—it is neither a pure display on the one hand nor a pure deception on the other. The stimulus conveys information for both what it is physically and what it stands for. (235)

When viewing a motion picture, we are constantly in alternation between seeing the scene and seeing the surface. The perceptual alternation between scene and surface constitutes a framing of the motion-picture viewing event, separating the experience of a motion picture from the experience of the real world, and serving as a constant reminder to the viewer that the scene in which he is involved is not the natural world but an image (Anderson, 1996). There are two sets of information, one for the scene and one for the surface, but we do not see them equally, for there is much more information in the scene, the fictional world of the movie, what film theorists call the diegetic world.

Sheena Rogers discusses this phenomenon in her essay, “Through Alice’s Glass: The Creation and Perception of Other Worlds in Movies, Pictures, and Virtual Reality.” The diegetic world is, to use her terms, seen “through Alice’s glass.” It is a world created by the filmmaker that is like ours, yet different. It is a world we as viewers can watch but not touch, become engaged with but not interact with. Rogers offers an ecological theory of meaning in film, beginning with the basic tenets of ecological psychology—a foundation in realism. “As observers,” she writes, “we become immersed in the world of the motion pictures because it shares its natural (non-symbolic) meaning with the real world it depicts.” The central concept is information rather than sensation as the starting point for perception. 
“The ecological concept of information provides a possible explanation for the moviegoer’s sense of immersion in another world.” She reminds us of the crucial role played by movement in the experience of perception—movement of objects, in the world or in moving images, and movement of the perceiver. When perceiving the natural world, our movement increases the information available to us about the world. With a movie it is different. Any movement on the part of the viewer provides information only about the surface on which the images move, the “glass” of the essay’s title. The ecological explanation Rogers develops calls for a redefinition of the role of the filmmaker. In contrast to the dictatorial illusionist of subject-position and post-modernist film theory where a passive viewer is sutured into the text or victimized by imprisonment in the camera’s position and constantly threatened with absorption, an ecological perspective offers a much more positive assessment of both filmmaker and viewer. The viewer is an active explorer, constantly in search of information, and the filmmaker as “auteur of information” takes on the formidable responsibility of providing the information needed to understand the world being portrayed on the screen. This requires,


216 / PART SEVEN

in Rogers’ words, “careful deployment of informative structure under a set of constraints.” The filmmaker’s role according to Rogers is “not to invent new symbols and new languages but to direct the availability of information.” The filmmaker is doing our job of moving through the world, selecting structures for our attention, moving in and through the environment, revealing further information about the (diegetic) world with each camera move. All of this functions on a very basic level, that of the primary or natural meaning of the moving image. As Rogers points out, we still have to follow the narrative and figure out the secondary and intrinsic meanings for ourselves. But we should take care not to dismiss or take too lightly those systems that operate at such fundamental levels. A special property of the ecological concept of information is that it points both to the world and to ourselves. As we move about our world, we gain information not only about the world out there but also about ourselves in relation to it. It is this property, Rogers asserts, “that is responsible for our deep sense of engagement in the life of the movie.” In “Metaphors in Movies,” John M. Kennedy and Dan L. Chiappe move beyond the level of primary meaning to consider what Erwin Panofsky would call secondary or intrinsic meanings, and here, too, they find an ecological perspective valuable. “We contend figurative expression is based on what is literal,” they write. “The theory has realism about the ecological visible world at its core.” The theory of metaphor, they offer, can be applied to both language and images. The theory stresses that the understanding of tropes (figurative or nonliteral expressions) depends upon the recognition of not just common features (on which many theories of metaphor are based) but relevant features, which, like all ecological acts of perception, require taking into consideration the context or surroundings of the event being perceived. 
“Our target,” they write, “is a theory of metaphor and symbol based on literal and realistic representation, using relevant features that might be shared by what is represented and how it is represented.”

The theory of metaphor offered by Kennedy and Chiappe, like Rogers’s ecological explanation of film viewing, stresses the universals in perception and communication. They emphasize the central role played by those capacities with which we have been endowed by evolution and that, as human beings occupying the same ecological niche, we all share and all bring to our experience of moving images.

References
Anderson, J. D. (1996). The reality of illusion: An ecological approach to cognitive film theory. Carbondale: Southern Illinois University Press.
Gibson, J. J. (1966). The senses considered as perceptual systems. Boston: Houghton Mifflin.

13. Through Alice’s Glass: The Creation and Perception of Other Worlds in Movies, Pictures, and Virtual Reality
Sheena Rogers

We take our seats, sit back, and prepare to be enthralled. In the coming “two hours’ traffic of our stage,” we will see people, places, objects, and events. They are familiar—they are ours, from this planet, this life. We understand what we see. But wait! The actor is twenty feet tall. She is flat and trapped on a surface that slopes away to one side. She is in focus, but the landscape surrounding her is not. We see her face, then her back, then her face again. Our viewpoint changes though we have not moved. This is a motion picture. We shift our attention once more, and the other world returns. The people are not giants. They move in depth, they are solid. There is a conversation. This is life.

How to explain this experience—the easy reality of life behind the screen and the undeniable artifice of the sequence of highly crafted images projected there? Film theory has swung from realism through semiology to a postmodernist neglect of this essential question (see Anderson, 1996). The created world is “real” to be simply seen. It is a book to be read, a mental fiction to be constructed. The ecological approach to perceptual psychology supports a new alternative: The motion picture is an artifact, constructed to exploit everyday perceptual processes and to reveal information about the world depicted. As an artifact, the motion picture is not part of the natural world though it is like it. We look but we cannot touch. Past events are re-presented or replayed before us but cannot be reshaped. The filmmaker has created a world through Alice’s glass—like ours, yet different.

Filmmakers and photographers have long sought to create new worlds and preserve moments of old ones. Their chosen media seem especially close to the truth of life compared to the more visibly manufactured media of drawing, painting, and now virtual reality. 
Small slices of real life are caught in silver or preserved in digital memory to be pasted to the paper in our photo albums or strung together in sequences that replay the past over and over. Through narrative and visual trickery, the real objects and events thus captured can seem to show worlds unknown, times and places that no living person has seen. Yet, while knowing there is artifice at work, we cannot shed the feeling that we observe reality. Light made these images, and that light was once in contact with the actual objects and places that we see in the image. Indeed, the philosopher Ken Walton (1984) has written that to observe a photograph of Napoleon is to observe Napoleon himself. (See also Bazin, 1971). By extension then, perception of the picture proceeds exactly as it would if the originating object were itself before us and not its image. No special theory of image or film is needed because pictures, still or moving, are not recognized as having any special qualities that distinguish them from natural objects.


218 / SHEENA ROGERS

Semiotic theory taught us to reject this simple-minded realism. On this view, film communicates its meaning through a system of signs. In film, each shot can be a word, each scene a sentence, and each sequence a paragraph in the language of film (Monaco, 1981, pp. 129–30). Still images and motion pictures need to be read. The image of Napoleon is not Napoleon. It is an artifact, shot-through with conventions that distance it from its 3-D subject. Semioticians concede that there is certainly some degree of nonarbitrary likeness between Napoleon and his image, and, therefore, to some degree the image denotes Napoleon (or at least a man of his appearance, stature, and garb). In addition, however, the choices made by the filmmaker in constructing the scene, shooting the image, and editing the sequence are visible in the image as signs. When these signifiers are arbitrarily related to the signified, they are symbols whose meaning is determined by cultural compact and not by Nature. Thus, the image can connote more than it denotes to an observer who has learned the meaning of the signs (Metz, 1974; Monaco, 1981). This style of theorizing has been hugely popular with artists and theorists alike, perhaps because it seems to free creators from the apparent straitjacket of optical realism. Anything can become “real” and meaningful if we only say it’s so. Indeed, Kepes claimed that it is not just images that are flexible but, in principle, the very processes of vision itself (1944). Artificial image making allows us to unyoke vision from the perspective structure of natural optics. 
If vision is essentially a process of symbol reading that will function in the presence of any well-structured, familiar symbol system, then new laws and new realities can be created: Just as letters of the alphabet can be put together in innumerable ways, so the optical measures and qualities can be brought together in innumerable ways, and each relationship generates a different sensation of space. The variations to be achieved are endless. (Kepes, 1944, p. 23) Thus the artist is given license to create not just the film but the very language by which it is to be understood. Prospects for unbounded creativity aside, the centerpiece of the semiotic argument is the apparent evidence that there is no natural relationship between image and reality. Artifice is a requirement of image making because there is no alternative. To the naïve observer, it seems that, when the shutter is opened, reality is of necessity sucked in through the lens to be caught on the film ready for inspection like so many bugs on flypaper. In fact, all filmmakers and photographers know that reality will change its shape and form depending on the choices they make. Returning to Napoleon’s portrait for a moment, choice of a short focal-length lens will enlarge his nose, positioning him far from the center of the picture will deform his head. Every filmmaking and photography handbook lists guidelines for managing the wayward distortions of pictorial perspective. (Leonardo da Vinci made a similar list for painters; see Richter, 1970.) Camera optics alone do not guarantee a meaningful and natural image. This fact has led some writers to claim that the perspective structure of our photographs, our motion pictures, and, in particular, our paintings is no more natural than any other organized depiction system. 
Nelson Goodman, for example, wrote that “the artist who wants to produce a spatial representation that the present-day Western eye will accept as faithful must defy ‘the laws of geometry’” (1968, p. 12). Perspective has been called a “scientific convention” with no inherent validity (Read, 1956, p. 66) and even an instrument of Western

THROUGH ALICE’S GLASS / 219

capitalist ideology (Baudry, 1974–75) (see also Arnheim, 1954, p. 93; Gombrich, 1960; Panofsky, 1991).

It seems to me that the symbol theorists made a number of serious mistakes. They extended the reach of the symbolic into areas of film meaning where it does not belong. They denied the natural relationship between image and life. They misunderstood perspective structure and overestimated its frailty. They conflated the conventional with the arbitrary, focusing on symbols at the expense of artistic practice. As a result, they falsely ascribed to artists the power to create a system of meaning that properly belongs to nature. In correcting each of these mistakes in turn, we can build an alternative, ecological theory of meaning in film in which the filmmaker is the auteur of meaning and not the producer of pidgin.

The Proper Place of Symbols

There is no doubt that people can extract meaning from symbols. We do it every time we use language. There is also no question that images can contain symbols. The dove in Renaissance painting is always the Holy Spirit. The young man with grapes is always Bacchus. But that is not to say that meaning cannot be perceived directly from an image, that is, without the intervention of a psychological translation process. Erwin Panofsky (1939/1982) distinguished between the natural or primary meaning of images (the fact that the subject is a young man holding grapes, his expression of pleasure) and secondary or conventional meaning (that he is Bacchus). A third stratum provides additional meaning (Panofsky called this “intrinsic” meaning) to an observer able to compare the work with other “documents bearing witness to the political, poetical, religious, philosophical and social tendencies of the personality, period or country under investigation” (1982, p. 39). Panofsky made psychological claims about these levels of meaning. The primary or natural meaning of the image is pre-iconographical. 
It can be detected by anyone who has practical experience of the real world. The secondary or conventional meaning of the image is not available to everyone but requires the application of special knowledge. Secondary meaning is thus intelligible but not “sensible” (p. 27). It is outside the bounds of perception proper. The ultimate level of meaning requires even more in the way of active cognition from the observer, “a mental faculty comparable to that of a diagnostician” (p. 38). The problem with the semiotic tradition is not just that it has neglected the huge reservoir of natural meaning found in still and moving images but that it has actively sought to drain it. Metz, for example, noticed that the units of film are not exactly like words. He could not identify elemental units in film at all. Further, in film, unlike in language, “signifier and the signified are nearly the same” (Monaco, 1981, p. 341). But instead of backing off from the position that the meaningful units of film are signs or that film is a language, Metz substitutes the idea of a short-circuit sign for a full-blown symbol. The short-circuit sign, like the icon in C. S. Peirce and Peter Wollen’s Signs and Meaning in the Cinema, is a sign that carries its meaning through resemblance. Signs, however, even short-circuit ones with a fast track to the truth, still need to be decoded. The ecological approach offers a radical alternative to this view. The objects, surfaces, people, places, and events that constitute our world and the other worlds we create in images are not understood through the decoding of signs or symbols. There is nothing arbitrary about the way light is structured in space and time as it is reflected off these environmental surfaces. Further, the structure forms definable units of meaning. As


observers, we become immersed in the world of the motion picture because it shares its natural (nonsymbolic) meaning with the real world it depicts. Once immersed, we can revel in the profoundly engaging task of decoding secondary meanings and then, or perhaps later with our friends, construct a satisfying edifice of intrinsic meaning from our shared understanding of the wider social and historical context of the work.

The ecological theory of film I advocate here does not deny the existence of symbols in still and moving images. The works of Ingmar Bergman, for one example, would be the poorer without them. Rather, I seek to reclaim the abundant natural meaning of moving images for perception through the concept of information in the optical structure of light and through an appreciation of the filmmaker as artist in this medium.

Life: Seeing the World and Oneself in It

Contrary to the claim of the symbol theorists, the perspective structure of light as it is reflected from the world to enter the light-sensitive chambers of our eyes and cameras is rife with meaning. This richly patterned light bundle, called the optic array, is usually in flux as we and the things in our world go about the daily business of living. Analysis of the spatial and temporal patterns in this optical flow reveals structures that can provide information to observers about objects, surfaces, people, places, and events, about observers themselves in relation to these things, and about the possibilities that exist for observers to interact with them. These structures are not forms or objects but aspects of the geometrical structure of the optic array (Gibson, 1973). To be informative, particular structures must specify, without ambiguity, some state of affairs in the world, with respect to the capabilities of a particular observer. If these structures exist and if we can detect them, we can know the world. 
This is the heart of the ecological approach to visual perception, developed by James J. Gibson in the latter half of the last century (Gibson 1950, 1966, 1979). Information is a core concept in the ecological approach and a powerful one. Ecological psychologists have identified and investigated structures that can inform us about the objects and events that make up our world. (The contributors to Epstein and Rogers’s volume, Perception of Space and Motion, 1995, review much of this literature). Optical flow patterns reveal the three-dimensional structure of objects (for reviews, see Lappin, 1995; Todd, 1995) and people (Johannson, 1973). They inform us about events (for a review, see Proffitt & Kaiser, 1995) and guide actions such as walking, driving, catching, jumping, and climbing (for reviews, see Warren, 1995, and Bruce, Green, & Georgeson, 2003). A central idea in Gibson’s ecological approach to perception is that the function of vision is to control behavior. Vision guides us when we interact with the world. Behavior is controlled by information because information “points two ways” (1979, p. 141)— to the world and to ourselves. It involves “seeing oneself in the world” (1979, p. 225). Information to specify the utilities of the environment is accompanied by information to specify the observer himself . . . to perceive the world is to coperceive oneself. . . . The awareness of the world and of one’s complementary relations to the world are not separable. (1979, p. 141) [italics added] Among the informative structures so far described in the literature, my favorite is one that clearly shows how information connects us to the world around us even as it reveals something of the nature of that world. The optical horizon specifies the height

THROUGH ALICE’S GLASS / 221

of the observer’s eye level in a scene. It is visible in the flowing optic array generated by a moving observer, and it leaves a trace in the perspective structure of a frozen array. The information the horizon supports is captured in motion pictures, in photographs, and in perspective painting.¹ The horizon ratio is a geometric structure that relates the height of objects in the scene to the height of the observer’s own eye. An object that is intersected exactly in half by the horizon, for example, is exactly twice the height of the point of observation whether that point is occupied by an eye or by a camera. One quite literally sees oneself in and relative to the world whether that world is real or exists only in an image (Rogers, 1995, 1996, 2000; Sedgwick, 1973, 1980, 1983).

Image: The World Through Alice’s Glass

The ecological concept of information provides a possible explanation for the moviegoer’s sense of immersion in another world. If information can be captured in moving images, then as observers we can hardly help but see the other world as real, as out there, as involving us. The camera shows us our own view of the other world, and that view shows us where “we” (in this case embodied in the camera) are in relation to it. But remember this is the world through Alice’s glass. It is like ours but different. The chief difference is that the glass itself remains between us. The movie screen is an object in our world, its properties specified through one set of informative structures available in our optic array before, during, and after the movie plays across it. The properties of Alice’s world are specified through another set of informative structures, caught by the camera, and available only when the movie rolls, and we look through the glass. It is hard to resist looking through and into Alice’s world, but if we force ourselves to hold back, we can see the two-dimensional colored patches that float across the surface of the glass.
These floating 2-D objects are now sized relative to our own body and immediate surroundings and are clearly seen as much bigger than ourselves. When looking through the glass, we become embodied in the camera. Its “eye-height” becomes our own; Alice’s world enfolds us. A surprising quality of Alice’s world is that it rarely, if ever, maps directly on to the real world it seems to depict despite our readiness to immerse ourselves and treat the camera’s view as ours, and this new world as real. The problem is that the observer’s eye is very unlikely to be located at the exact spot occupied by the camera relative to the scene when it was shot. And only when the eye is at this center of projection are the geometrical structures of the real scene and the depicted scene the same. When the eye is displaced from this point, and it usually is (there is only one center of projection for any image but many seats in the movie-theater), a new world results, with somewhat differently shaped and disposed objects and surfaces in it. (These distortions are identified and fully described elsewhere in Sedgwick, 1991, and Rogers, 1995.) Under some conditions, the distortions are noticeable. They were certainly noticed immediately by the early developers of pictorial perspective, who quickly figured out solutions. Leonardo’s solutions included restrictions on the location of the center of projection, some of which limit the angle of view of the principal objects (Richter, 1970). The equivalent strategy for filmmakers and photographers is the use of a long focal-length lens. This choice is almost obligatory for portraits where distortions are particularly unacceptable (unless the subject is an unpopular political figure). (See Ascher & Pincus, 1999, p. 83 for an example of handbook advice on this matter.) If the artist wants a close viewpoint and a wide-angle view, distortion is almost inevitable. Uccello struggled


to bring under control a perspective portrait of Sir John Hawkwood mounted on a horse on a pedestal, finally settling on two centers of projection, one for the pedestal and one for the horse and rider (see Kubovy, 1986). The filmmaker has more choices here. A spatial equivalent, though rarely used, would involve two (or more) camera viewpoints in a split-screen or a multiple exposure (David Hockney’s video joiner attempts something like this, though, as he explains in a 1983 interview, not to solve problems of perspective distortion). More likely, the filmmaker would simply show successive viewpoints of parts of the object by moving the camera either in a continuous shot or in an edited montage. It is “fixes” such as these that led Sir Herbert Read (1956) and Nelson Goodman (1968) among others to think that perspective in its raw state was unacceptable to picture viewers and thus “unnatural.” Their mistake was to confuse the conventional with the arbitrary. Conventional practices designed to limit distortions produced by ordinary viewing conditions that do not pin the observer to the center of projection are in no sense arbitrary. Horizon ratios, for example, will be identical in pictures, movies, and the real scenes that gave rise to them just so long as the viewer’s eye is located somewhere on the horizontal plane that extends out of the picture from the horizon. If this constraint is met, the information that specifies the height of objects in the picture relative to the height of the viewer’s eye will also be identical. If the constraint is not met, the horizon ratio will specify taller or shorter objects. (The heights of objects relative to each other will be preserved.) The constraint can be met by positioning movie-theater seats appropriately although errors in perception are likely to be fairly small and inconsequential if seats are not too far above or below the ideal plane.
Indeed, movie-theater design guidelines ensure that even from the front row, side aisle, distortions of this and other sources of information are likely to be too small to matter (see Cutting, 1987). That these distortions are a product of the relation between perspective geometry and the observer’s viewpoint can easily be demonstrated. When the viewer’s eye is forced into the one true viewpoint by means of a peephole or a floor marker, whether in a museum, a church, or in the laboratories of experimental psychologists, no distortions occur. Instead, powerful illusions of solid 3-D space appear, as anyone can attest who has peeped into a perspective cabinet by seventeenth-century Dutch artist van Hoogstraten or peered up at Pozzo’s trompe l’oeil ceiling in the church of St. Ignatius in Rome. (See Leeman, Elffers, & Schuyt’s 1976 book Hidden Images, or Cole’s Perspective, if your travel budget is small.) Distortions appear as a viewer strides away from the floor marker or looks in through the side of the cabinet. The distortions are not there because of putative intrinsic shortcomings in the optical system of projection we call perspective but because the artist chose to break the rules of traditional practice, buying the great illusion at the cost of keeping you in your place (one at a time please). Usually, then, Alice’s world maps on to the real world quite well. Much of the information is preserved. Perspective distortions are minimal and rarely noticed. (See Rogers, 1995, for a review of the experimental literature on the perception of pictorial space.) It is the very success of artists and filmmakers in minimizing the expected distortions that has rendered their labors invisible and their media transparent, most of the time. Sometimes, the sense of immersion in the motion picture is not complete, and we become aware of Alice’s glass between the created world and us. Our noses pressed to the glass, we cannot step in. We look, but we cannot touch.
As we have seen, in the ecological view, perception evolved to support action. But in real life, they are not on a one-way street. Gibson pointed out that it is action that creates the information-loaded


optic flow. When we encounter a novel or confusing object or layout of surfaces, our immediate response is to move, to generate more information. Gibson insisted that “we perceive in order to move and we must move in order to perceive” (1979, p. 223). The claim is that visual perception is an exploratory sense like touch (Gibson, 1962). Motion pictures, however, roll out their action and display their informative wares for our leisured consumption with no need for active involvement on our part beyond the drift of an eyeball. Our inability to explore Alice’s world has two potential drawbacks, each with consequences for perception. First, any movements we make generate informative structures about the surface of Alice’s glass and risk wresting us, however briefly, from the other world behind it. The conflicting optical-flow patterns from our world and Alice’s, the former locked on to our own movements, the latter not, repeatedly activate our subsidiary awareness that we are looking at images. Thus, as we watch a motion picture, our immersion in the action of the created world ebbs and flows. Even at high tide, we are never far from a reminder that Alice’s world exists behind the glass. Only in Woody Allen’s The Purple Rose of Cairo (1985) can Alice break the glass and step through. The second drawback brings us back to the main thesis of my essay: that the filmmaker is auteur of information. Without the opportunity to explore Alice’s world, we may not be able to get all the information we want. We may misunderstand a scene, and we may, of course, be intentionally misled. Usually, though, we are in good hands. The filmmaker is also a perceiver and, lucky for us, has control of the camera. A random walk by a blind, camera-toting robot would produce moving pictures not unlike the first incomprehensible effort made by my very young daughter. Random slices of reality capture very little informative structure, even when set in a temporal sequence. 
We sometimes forget that literally hundreds of decisions go into every shot, some as simple as which way is up. And while ground-down–sky-up may well be a convention of film-making practice, it is surely one grounded in the psychology and ecology of perception and not in a social compact. One of the filmmaker’s tasks is to select, record, display, and guarantee the information we need in order to understand the natural meaning of the movie. Then our job as perceiver is easy. Perception of the movie can proceed effortlessly, using just the same everyday processes of information detection that our species evolved long before we invented images. Of course, you will still have to follow the narrative and figure out the secondary and intrinsic meanings for yourself.

Movies, Pictures, and Virtual Reality

Much of what I have said so far about the perception of motion pictures applies equally to their still kin although a number of differences remain. Both moving and still pictures are projections on to some kind of surface, the glass of my metaphor, beyond which is another world. Both consequently have a dual reality as visible thing in this world and invisible window onto the things in another. Both types of image need to be carefully constructed to avoid distortions when they are not viewed from their center of projection. The practices that artists in one medium have developed to ensure that distortion is minimized can usually be generalized to the others. Neither still nor moving images can be actively explored to generate additional informative structures, and in both cases attempts to do so instead generate optic-flow patterns that only serve to remind us of the objective reality of the picture as a surface. In both cases, the image makers must use their experience with their craft to ensure that necessary information has been captured and that any constraints are met to ensure that the structure retains its meaning. These are important similarities, crucial to a theory of images. The remaining differences serve to highlight the special status of moving images for perception. Still pictures are clearly the impoverished cousins of movies because still pictures lack movement. Lacking movement, their ability to capture natural meaning in the form of information is limited, and the possibilities for ambiguity are much greater. It is relatively easy to build an ecological theory of film but much harder to do the same for painting and photography. The emphasis of the ecological approach to perception has been on the motion of observer and of object. It is a changing array that provides information, so movies seem to be ready surrogates for reality.

The retinal image is seldom an arrested image in life. Accordingly, we ought to treat the motion picture as the basic form of depiction and the painting or photograph as a special form of it. . . . Moviemakers are closer to life than picture makers. (Gibson, 1979, p. 293)

Pointing out the inadequacies of still pictures has long been a popular pastime of ecological psychologists. Indeed, Gibson once denied that they can contain information at all, writing that “frozen structure is a myth” (1979, p. 87). He stopped short of handing the black sheep over to his theoretical enemies, however, and later in the same book tried again to bring them back into the fold. He stood by his earlier claim that “a picture is a surface so treated that a delimited optic array to a point of observation is made available that contains the same kind of information that is found in the optic arrays from an ordinary environment” (1971, p. 31). Ever aware of the paradox in his position, Gibson swung back and forth on the prospect for an ecological theory of the still image. Offering such a theory seemed to undermine his primary project built on the foundation of action and movement.
Yet, the compelling realness that pictures sometimes achieved and their evident immediacy for perception cried out for inclusion in the ecological venture. Perhaps the structures are there in frozen arrays but are not fully revealed until the array is set in motion (Gibson, 1973, p. 45) or perhaps they are there and detected but are simply weaker than those that emerge in transforming arrays (Gibson, 1979, pp. 271, 302). It is not clear what he had in mind in these comments. Is a partly revealed structure one that yields only part of its information, and thus we are misinformed more often than in the movies or in life? Or is it one that is detected sometimes and sometimes not? Is a weaker structure the same thing as a partly revealed one? I have elsewhere offered a different formulation (Rogers, 1995, 2000). Frozen arrays, including pictures, can indeed provide the same kinds of informative structures available in transforming arrays, and these informative structures can be detected using the same kinds of perceptual processes as Gibson proposed. However, there will be fewer structures present. Specifically, there are some things about the world that simply cannot be known from the information available in a frozen array although many things can be. Some informative structures require movement, and these structures are necessarily absent from still pictures. Others do not require movement, and these structures can, in principle, be made available in a still picture. For example, the depth order of parts of a scene can not be specified with motion parallax in a photograph, but it could be specified by another structure (or by several other structures) including the ordinal distance relation (by which objects closer to the horizon are specified as further away). Time-to-contact, on the other hand, useful for ducking missiles and controlling braking, is specified by the optical variable tau, which is not definable in the absence of optic flow. Gibson was right. 
Movies are closer to life than still pictures.
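The claim that tau is undefined without optic flow can be made concrete with a small numerical sketch. It assumes an object of fixed size approaching the eye at constant speed and estimates time-to-contact as the visual angle divided by its rate of expansion. The function names, the object size, and the speeds are hypothetical values chosen only for illustration; they do not come from the literature cited here.

```python
import math


def visual_angle(size: float, distance: float) -> float:
    """Angle (radians) subtended at the eye by an object of a given size."""
    return 2.0 * math.atan(size / (2.0 * distance))


def tau(theta_prev: float, theta_now: float, dt: float) -> float:
    """Estimate time-to-contact from two successive visual angles.

    tau = angle / rate of expansion. With no expansion (a frozen array)
    the rate is zero and no finite time-to-contact is specified.
    """
    rate = (theta_now - theta_prev) / dt
    if rate <= 0.0:
        return math.inf
    return theta_now / rate


def demo() -> tuple[float, float]:
    """Compare the optical estimate with the physical time-to-contact."""
    size, distance, speed, dt = 0.5, 20.0, 10.0, 0.001  # illustrative values
    theta_prev = visual_angle(size, distance + speed * dt)  # one step earlier
    theta_now = visual_angle(size, distance)
    estimate = tau(theta_prev, theta_now, dt)
    actual = distance / speed
    return estimate, actual


if __name__ == "__main__":
    estimate, actual = demo()
    print(f"tau estimate: {estimate:.3f} s, actual: {actual:.3f} s")
```

Note that `tau` never receives the object's size, distance, or speed: it works from the changing angle alone, which is why a single frozen frame, with no change to measure, cannot supply it.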


There is a new entry in this continuum from still pictures, through motion pictures, to reality. If proximity to life is measured by the quantity of informative structures that are available, then closest to life is the new medium of virtual reality (VR). These displays have many of the characteristics of movies with the addition of the missing ecological imperative of action. Here, the observer can direct the camera through the artificial environment to produce optical-flow patterns like those in life. As in life, observers can generate the information they need through various movements. A step to the side produces motion parallax; a step forwards towards an object produces tau. In a movie, we must trust the filmmakers to do this for us, and they might not satisfy our needs. For this reason, VR has the potential to be more informative. VR can also provide additional information through stereo vision (but so can movies like The House of Wax, 1953), and by reaching into our peripheral vision, it can strengthen the feeling of self-motion (but so can ultra-wide and wrap-around film formats). I would argue, however, that VR achieves its special status for perception not through the additional information that it can provide nor through its responsiveness to exploration. The key difference between VR and motion pictures is that informative structures such as the horizon ratio are locked on to the eye of the observer, just as they are in real life (now becoming known as actual reality). The horizon will slide up and down in the scene as you bend and straighten your knees, adjusting the horizon ratios to reflect your new eye-height as it goes. In a sense, VR is the modern, moving version of van Hoogstraten’s perspective cabinet. Here, though, the restricted viewpoint continues to preserve the geometric structure of the optic array even as the observer moves.
Observer movement produces neither distortions of the virtual space nor the conflicting optical patterns described earlier that bounce observers out of Alice’s world and back into their own. In my laboratory, at least, it is this preservation of viewpoint and the yoking of the observer to the scene that seem to be most responsible for the observer’s feeling of immersion in the other world (Rogers, 2003). Ultimately, however, if the array is coming from a picture, still or moving, there are no guarantees that informative structures will have been captured there. Pictures, as artifacts, have great potential to be ambiguous (a potential realized to great effect in the work of M. C. Escher and surrealist René Magritte). Pictures, movies, and VR can mislead us, or they can tell us nothing at all. Thus, it is not enough to claim, as Gibson did, that the “ delimited array [of a movie] is analogous to the temporary field of view of a human observer in a natural environment surrounding the observer” (1979, p. 302). The possibility of information in images is not enough to build an ecological theory of the creation and perception of images: Someone must put the information there. It must be selected, revealed, perhaps emphasized, and certainly guaranteed through the satisfaction of the constraints that allow the structures to hold on to their meaning. This is hard in VR, but it is harder still in movies and hardest of all in still pictures. In this ecological approach to film and image theory, the burden of creating meaning in the movies is shifted away from the viewers and into the capable hands of the filmmakers. At least with respect to what Panofsky has called primary or natural meaning, the filmmakers’ task is not to invent new symbols and new languages but to direct the availability of information. 
Drawing on the ecological approach to the psychology of perception, I have shown that moving pictures and still ones can present optical structures to the eye of the observer that are identical to those available in nature. These structures are based in the


natural geometry of the optic array. Some are available only when the array is set in motion, but others are still there when it stops. Potentially, VR can carry more natural meaning than movies, and movies can carry more than still pictures. When available, particular optical structures inform us about the objects, people, places, and events in the world. A special property of the ecological concept of information is that it also allows us to perceive ourselves in relation to the world. I have argued that it is this property that is responsible for our deep sense of engagement in the life of a movie. Filmmaking enacts the laws of ecological optics, and it is therefore a natural system for creating other worlds. That is not to say that we can leave the chore of creating meaning entirely in the hands of the great Mother. To be maximally effective, filmmaking, like painting and photography, requires careful deployment of informative structure under a set of constraints. It is only under those constraints that elements of the optical structure of paintings, photographs, movies, and virtual reality can be guaranteed to carry the same meaning as those found in the structure of light reflected from real scenes. Traditional artistic practices (which are conventions but not signs) minimize distortion and ambiguity by ensuring that these constraints are met. The world through Alice’s glass is perceived as meaningful and truthful because the motion picture engages the only perceptual processes we have. Perception proceeds through the detection of informative structures, and it does so because of the skill of the filmmaker. The motion picture is an artifact but therein lies its strength. The filmmaker colludes with Nature to create meaning, and we are caught in their thrall.

Note

1. The terrestrial horizon is a little lower than the optical horizon because the planet curves away from us. The optical horizon is the infinite limit of the ground plane. It is always at the eye level of the observer. In perspective painting, it is the locus of the vanishing points of parallel lines on the ground plane that are orthogonal to the picture surface.
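Because the optical horizon always lies at the observer's eye level, the horizon ratio discussed in the main text reduces to simple arithmetic on image positions. The sketch below is an illustration under stated assumptions, not a procedure given in the sources cited: it assumes an object standing on a flat ground plane, with vertical image coordinates measured upward in any consistent unit, and all names are hypothetical.

```python
def height_in_eye_units(base_y: float, top_y: float, horizon_y: float) -> float:
    """Object height as a multiple of the eye (or camera) height.

    The horizon cuts a ground-standing object at eye height, so the ratio of
    the object's full image extent to the extent from its base up to the
    horizon equals object height divided by eye height.
    """
    base_to_horizon = horizon_y - base_y
    if base_to_horizon <= 0:
        raise ValueError("the object's base must lie below the horizon")
    return (top_y - base_y) / base_to_horizon


def metric_height(base_y: float, top_y: float, horizon_y: float,
                  eye_height: float) -> float:
    """Scale the horizon ratio by a known eye height to get a metric height."""
    return eye_height * height_in_eye_units(base_y, top_y, horizon_y)


if __name__ == "__main__":
    # An object bisected by the horizon is twice the eye height.
    print(height_in_eye_units(base_y=0.0, top_y=2.0, horizon_y=1.0))
    # With an assumed eye height of 1.6 m, an object three eye heights tall.
    print(metric_height(base_y=10.0, top_y=40.0, horizon_y=20.0, eye_height=1.6))
```

The first call reproduces the example from the text: an object cut exactly in half by the horizon comes out at twice the height of the point of observation, whether that point is an eye or a camera.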

References

Anderson, J. D. (1996). The reality of illusion: An ecological approach to cognitive film theory. Carbondale: Southern Illinois University Press.
Arnheim, R. (1954). Art and visual perception. Berkeley: University of California Press.
Ascher, S., and Pincus, E. (1999). The filmmaker’s handbook. (Rev. ed.). New York: Plume Books.
Baudry, J-L. (1974–75). Ideological effects of the basic cinematographic apparatus. Film Quarterly, 28(2), 39–47.
Bazin, A. (1971). The myth of total cinema. In H. Gray (Selector & Trans.), What is cinema? (pp. 17–22). 2 vols. Berkeley: University of California Press.
Bruce, V., Green, P. R., and Georgeson, M. A. (2003). Visual perception: Psychology, physiology and ecology. (4th ed.). New York: Psychology Press.
Cole, A. (1992). Perspective: A visual guide to the theory and techniques from the Renaissance to pop art. London: Dorling Kindersley.
Cutting, J. E. (1987). Rigidity in cinema seen from front row, side aisle. Journal of Experimental Psychology: Human Perception and Performance, 13, 323–34.
Epstein, W., and Rogers, S. (Eds.). (1995). Perception of space and motion. San Diego: Academic Press.
Gibson, J. J. (1950). The perception of the visual world. Boston: Houghton Mifflin.
———. (1962). Observations on active touch. Psychological Review, 69(6), 447–90.
———. (1966). The senses considered as perceptual systems. Boston: Houghton Mifflin.
———. (1971). The information available in pictures. Leonardo, 4, 27–35.
———. (1973). On the concept of formless invariants in visual perception. Leonardo, 6, 43–45.
———. (1979). The ecological approach to visual perception. Boston: Houghton Mifflin.
Gombrich, E. H. (1960). Art and illusion: A study in the psychology of pictorial representation. Oxford: Phaidon Press.
Goodman, N. (1968). Languages of art: An approach to a theory of symbols. Indianapolis: Bobbs-Merrill.
Hockney, D. (1983). Interview with M. Bragg (Ed. and Presenter). Portrait of an artist (Vol. 12) (Videocassette). London Weekend Television.
Johannson, G. (1973). Visual perception of biological motion and a model for its analysis. Perception & Psychophysics, 14, 201–11.
Kepes, G. (1944). Language of vision. Chicago: Paul Theobald.
Kubovy, M. (1986). The psychology of perspective and Renaissance art. Cambridge, England: Cambridge University Press.
Lappin, J. (1995). Visible information about structure from motion. Epstein and Rogers (pp. 166–99).
Leeman, F., Elffers, J., and Schuyt, M. (1976). Hidden images. New York: Harry N. Abrams.
Metz, C. (1974). Film language. New York: Oxford University Press.
Monaco, J. (1981). How to read a film: The art, technology, language, history, and theory of film and media. New York: Oxford University Press.
Panofsky, E. (1982). Iconography and iconology. In Meaning in the visual arts (pp. 26–54). Chicago: University of Chicago Press.
———. (1991). Perspective as symbolic form. (C. S. Wood, Trans.). New York: Zone Books. (Original work published 1924–25).
Proffitt, D. R., and Kaiser, M. K. (1995). Perceiving events. Epstein and Rogers (pp. 228–61).
Read, H. (1956). The art of sculpture. Vol. 3. New York: Bollingen Series, 35. Princeton University Press.
Richter, J. P. (Ed.). (1970). The notebooks of Leonardo da Vinci. New York: Dover.
Rogers, S. (1995). Perceiving pictorial space. Epstein and Rogers (pp. 119–63).
———. (1996). The horizon ratio relation as information for relative size in pictures. Perception & Psychophysics, 58, 142–52.
———. (2000). The emerging concept of information. Ecological Psychology, 12, 335–43.
———. (2003). Truth and meaning in pictorial space. In H. Hecht, R. Schwartz, and M. Atherton (Eds.), Looking into pictures: An interdisciplinary approach to pictorial space (pp. 301–20). Cambridge, MA: MIT Press.
Sedgwick, H. A. (1973). The visible horizon: A potential source of visual information for the perception of size and distance. Dissertation Abstracts International, 34, 1301B-2B. (University Microfilms No. 73-22530).
———. (1980). The geometry of spatial layout in pictorial representation. In M. A. Hagen (Ed.), The perception of pictures: Vol. 1. Alberti’s window: The projective model of pictorial information (pp. 33–90). New York: Academic Press.
———. (1983). Environment-centered representation of spatial layout. In J. Beck, B. Hope, and A. Rosenfeld (Eds.), Human and machine vision. New York: Academic Press.
———. (1991). The effects of viewpoint on the virtual space of pictures. In S. R. Ellis (Ed.), Pictorial communication in virtual and real environments (pp. 460–79). New York: Taylor & Francis.
Todd, J. T. (1995). The visual perception of three-dimensional structure from motion. Epstein and Rogers (pp. 202–26).
Walton, K. (1984). Transparent pictures: On the nature of photographic realism. Critical Inquiry, 11, 246–77.
Warren, W. H. Jr. (1995). Self-motion: Visual perception and visual control. Epstein and Rogers (pp. 263–325).
Wollen, P. (1972). Signs and meaning in the cinema. Bloomington: Indiana University Press.

14
Metaphors in Movies
John M. Kennedy and Dan L. Chiappe

In the opening of Memento (2001), a murderer kills his victim with a gunshot. But the scenes run backwards. The blood rises to the wound. The wound heals. The victim gets up. The murderer pockets his gun. The killer and the target separate and retreat their own ways. Time’s arrow is reversed at the start of a movie about a manhunter who can no longer form lasting memories. He has a few notes to guide himself. Plus he swiftly interprets whatever events surround him. One thing he does know for sure. He wants revenge. One of the memories leaking through from his life before his loss is that his wife was killed by a criminal, and the same criminal caused his injury. What lessons could we learn from this clever beginning of Memento? Should we decide they are a knock-down argument against the thesis that movies are realistic? It would be easy to reach this conclusion. Time flies forwards, always. To show it in reverse is a violation of nature, Einstein’s, Newton’s, or Asimov’s. Here, our chief task is to describe the role unrealistic events can play as figurative devices, metaphors, and symbols in movies (Whittock, 1990). Along the way, we will take issue with the idea that each medium of representation (literature, radio, movies, TV, Internet, etc.) is a unique cultural product. We argue that each medium is like an island. Above the waterline, it is its own entity. But below, it is joined with all the others in an archipelago. When they use metaphors, movies rely on what is age-old, rich, and powerful in perception and cognition. This is what makes movie metaphors tick. Memento begins backwards to symbolize the puzzle facing the central character. It is a vivid and imaginative but entirely figurative device because the image is not strictly true for the character: time runs forwards for him, even though it leaves no memories.
The figure also captures the movie’s tactic in presenting the story, because the movie unfolds its story backwards. Though each scene is shown forwards, the order of the scenes is generally backwards (with some overlap of scenes to help keep the order straight). For example, secondary characters in the movie take advantage of our hero, but the first time we see an event, this is not obvious. The disheveled woman we thought had been attacked we later see has self-inflicted damage and is only pretending to be distraught. We take her at face value at first, and only later are we shown early events that reveal the deception being played on our hero. Blood running back to a wound sums up the movie’s clever tack. It is a capsule for the movie, an unrealistic image, a symbol. By the end of the movie, we—the audience— have information about its dual significance: the murder’s place in the story and why it was shown backwards.



Interesting movies are full of clever symbols and metaphoric twists, events that resonate ironically later in the storyline (Currie, 1995). The Unbearable Lightness of Being (1988) has its credit titles streaming over black-and-white pictures of scenes. But the pictures are shown in negative. What is in shadow is bright, and what is illuminated is dark. Much of what occurs in the country and period of the movie’s setting is a reversal of the truth, and the credit title’s background reflects this in a concrete image. When a single camera shot in a movie shows objects, space, and events, it makes use of what we may call very loosely the vocabulary of a picture in a literal fashion (Currie, 1995). Language has recognizable words, and moving pictures have recognizable locations, objects, and movements of the camera’s vantage point (Arnheim, 1992). Some might argue that the vocabulary within a shot is the only part of a movie that is remotely realistic and somewhat culture-free. This argument holds that the story in a movie and the style or technique in which we tell it are popular-cultural matters and often “trite and formulaic” (Markgraf and Pavlik, 1998, p. 275) or in danger of being “hackneyed” (Whittock, 1990, p. 130). The overall logic of a movie is generally a story, and the cutting in the movie, which orders its moments and camera points of view, is its technique. The cultural argument contends both the way in which the moments in a scene are ordered and the order in the overall storyline are cultural products pure and simple. Clear examples of cultural products include anything that violates time and space. Unrealistic ordering of moments, such as reverse time, and unrealistic presentation of individual moments, such as reverse luminance, indicate movies are purely cultural devices, the formula argument proceeds. Simple cuts between camera vantage-points jump across space instantaneously in a way human vision cannot. 
To understand them, we need to be immersed in the vocabulary of movies, not realistic, normal, visual inspection. In The Matrix (1999), characters jump up into the air, arms extended to the side, legs prepared to kick, and then freeze. However, in a bravura display of filmmaking, the camera spins around the character, showing a single frozen moment but with free camera motion. This clever distinctive device has been parodied in various comedies and alludes in each case to The Matrix. The cultural argument insists movies constantly refer to other movies in this fashion. In this vein, consider the use in language of reverse vocabulary. “She is a real winner” twists words to fit what we mean. “Tell me something” means we have no news to offer. As in “His heart is stony,” in which we switch from human to inanimate, we mean something other than what is presented directly. We are using nonliteral language—figurative devices, technically, tropes, or more loosely, many kinds of metaphor. Just as winner and stony are nonliteral devices here, so, too, are the reversals of time and brightness. If we keep following the cultural argument, we have to claim that it is because of deep acculturation that metaphors in language and pictures have meaning for us. We have been taught the kinds of metaphors, and, therefore, we can understand them. The argument that communication in pictures and language is culture-bound can indeed make good use of the irony in winner and the apparent reversal of time’s arrow in a movie. If, like irony, complete reversals of what is realistic are proper fare in a movie, how can we conclude movies respect principles of realism? And how can we expect uninitiated viewers to follow a story if anything goes including, at times, complete reversals of what is true? This argument drives straight to the conclusion that there are no fixed canons in movies.

230 / JOHN M. KENNEDY AND DAN L. CHIAPPE

In like vein, Goodman (1966) noted some theorists conclude pictures look like what we think they represent only because we have become used to their style. A Picasso is not like its sitter? The Picasso of Gertrude Stein is not much like the redoubtable Ms. Stein? No matter. With acculturation, it will eventually be, the argument races ahead, if we take it to its extreme.

Our purpose here is to recognize metaphor in movies while standing firm against the worst excesses of the acculturation argument. We will propose there are strong universals in perception and communication, including universals to do with metaphor. The universals are the sunken land that joins the islands in an archipelago. Our target is a theory of metaphor and symbol based on literal and realistic representation, using relevant features that might be shared by what is represented and how it is represented.

Acculturation

Moulin Rouge (2001) offers “real artificiality,” says its director Baz Luhrmann (Van Meter, 2001). Set in fin-de-siècle Paris, it has references to most decades of the past 150 years, visually and musically. At one point, a frantic can-can is shown in a frozen image. It is blurred, colorful, full of contrast, and evocative of French Impressionism. Later, the screen is full of red and white dots, like a pointillist picture. In between, high contrast, black-and-white images allude to Toulouse-Lautrec. Other images are like paintings by Degas, Manet, and Monet. The artificiality is the artifice of theater, close up, backstage as well as front. The reality includes a frankly exaggerated presentation of Montmartre as colorful like greasepaint, with seductive red dresses of sick prostitutes and the tempting striking green of addictive, poisonous absinthe, set in the midst of grimy, sooty, gray, industrial nineteenth-century Paris. The early can-can Impressionist moment stops time, like energetic color splashed on the audience’s visual field.
The immobile red and white dots tell us about death caught in silence, after our heroine has coughed up blood. The artifice of Montmartre brightly covers the backstage story of venality, frantic hope, and stories within stories, each story breaking through into the others. We see the setting as a lurid one, surrounded by what is dull and oppressive. “Seeing as” is possible because Montmartre is portrayed in a certain way. As Arnheim (1992) noted, objects can be depicted in ways that make them actors, that is, aspects of an object can be selected for emphasis. It is the job of directors to emphasize in ways that are distinctive and informative. Thereby the objects are caricatured in Moulin Rouge.

Moulin Rouge is a whirlwind of dance, song, myth, and allusion. References abound, and each has a significance the movie shows in brief snatches, implicitly, before whisking on, like a dancer who has caught our eye but will not stop for us (Bernsten and Kennedy, 1996). The acculturation argument rightly points out that to understand Moulin Rouge, we need to know about Madonna, Sting, Monroe, Gilbert and Sullivan, Offenbach, Seurat, Renoir, Monet, Degas, Cezanne, dancers and actresses, sluts and lords, ambition and greed. But the argument needs limits. It holds sway for many things, but it needs to respect astonishing evidence to the contrary. The acculturation argument has its merits, because we have to know about a culture to read its banners, sashes, songs, and marches. But knowing about Alanis Morissette, Van Morrison, or Elvis is a peripheral matter compared to the ability to sing and run. At the center, where universals live and are respected, subsea land joins islands. Cultures are just ways to express the universals of perceiving, believing, and communicating, loving and leaving. Acculturation, like nationalism, is a celebration of the parochial. It is in danger of being small-minded and exclusive. It does not celebrate that we all see, have principles, know, and communicate. It only provides some contents for our awareness and forgets the shape of the urn. We may choose water lilies to express ourselves to others, but our ability to express is universal, and the water lilies are the vehicles and content of our ideation.

Undermining the distinction between the vehicle and what is expressed, some hold that “The medium is the message.” Certainly the Internet is more significant than the individual jokes and personals that are rife in its channels, just as TV is more significant than any one of its sitcoms. Each medium is indeed a unique cultural invention. But that does not mean we need follow the fashion and disdain the idea that communication has many universals. Deconstructing texts finds their arguments and plots resting on unspoken assumptions about reality, representation, men, women, and children. Likewise, if we look at images of events positioned in odd orders in movies, we might be drawn to hypothesize that what we are seeing is evident only because we have been given a minor code. Each medium, in this view, has its own code: for CNN, the time is now, and changeable, and for movies, the time is the period in which it is set, and the movie is fixed as released. There is certainly a great deal surrounding our looking (Parks, 2001).

But here we will use metaphors, symbols, and related devices in movies to develop a theory about important universals. We contend figurative expression is based on what is literal. The theory has realism about the ecological visible world at its core. It is allied with mentalism and allows for communication between minds.
The kind of mind in question has purposes and uses particular objects to convey messages, sometimes literally and sometimes using modes that are nonliteral, metaphoric, and symbolic. At heart, we will claim, people distinguish between representations that are intended to be realistic within a movie’s plot line and ones that are intended to be devices. That is, we ask what is truthful and what might mislead but is not intended to mislead. In Crouching Tiger, Hidden Dragon (2000), some of the initial scenes have characters running on walls and leaping over houses, balletically but farfetchedly. As the storyline develops, it becomes clear that the warriors are a special breed, disciplined, trained in magic martial arts. We are invited to wonder at them and to indulge our senses in the delightful, poetic, hyperbolic motions (e.g., running along treetops). When directors play their hand, they treat us as confidantes and offer us grounds indicating how and why they are representing misleadingly. They expect us to detect the misrepresentation and to see how it belongs. It is up to the viewer to seek out key relevant features of the helpful misrepresentation (Sperber and Wilson, 1995).

Tropes

There are many ways to speak our mind figuratively. “Your life is in your hands,” and “Life is like a box of chocolates.” The first is a metaphor, which asserts A is B. The second is a simile, which tells us A is like B. “Play it again, Sam” is a cliché, a saying that has become overused. To call someone pilgrim evokes husky John Wayne and is an allusion. All these and more are tropes, the technical term for forms of nonliteral language. If we define a metaphor as a sentence with the form A is B, then wordless pictures cannot be metaphors, by definition, because they do not contain sentences. If we define each trope in terms of words, no trope can be present in language-free pictures. But there are unrealistic pictures that are meant to convey messages, and so it is useful to

import some of the terms for tropes and apply them to nonlinguistic media while admitting the fit is loose. Richards (1936) defined the key parts of a metaphor as the topic and the vehicle. “Charley is an angel” has a topic (Charley) and a vehicle (angel) for commenting on the topic. All tropes could be defined as word vehicles commenting on topics, with the different kinds of tropes (metaphor, simile, hyperbole, etc.) taking slightly different forms. Richards argued metaphors, like all the tropes, are matters of thought, not just matters of language. This licenses the search for metaphor in any area of cognition. Alas, the notion that tropes are basically thoughts is misleading. Tropes are expressions. To understand an expression, literal or figurative, we need to find the claim being made, that is, the thought being expressed. “Charley is an angel” tells us that Charley is wonderful. That claim is not figurative. It would be a mistake to stop the thought at the figurative expression and take it that Charley is divine. No trope in language has another trope in thought as its base (contra Lakoff, 1993). Rather, the target of comprehension is the thought that the topic has certain features, as does the vehicle (Chiappe and Kennedy, 1999, 2001a). Tropes are figurative expressions of thoughts that are not themselves metaphoric. By analogy, thought can be expressed in English, but thought is not English. The relevant domain of cognition for tropes is broader than language but narrower than all of thought. It is the domain of representation. Expressions are representations. So, too, are pictures, statues—and movies. If any form of representation can provide accurate information and can be misleading (and pictures, statues, and movies can), it can be deliberately misleading, with its recipient expected to detect the intention. That is the core definition of a trope. Therefore, the pictures in movies can involve tropes akin to the tropes in language. 
Regular, static pictures (drawings and paintings) take many forms that are readily described as kin to tropes (Kennedy, 1982). On the editorial page, newspapers proudly display pictures of movers and shakers of our day in compromising postures and locations, none of the pictures realistic. A haul of recent editorial cartoons has some politicians with bags over their heads, others on traffic stop signs, others hollow, and still others seeing no evil. The drawings show objects with features pertinent to the fixes the public figures are in: trying to keep out of sight, trying to halt a tide of criticism, offering no policy, remaining aloof, and avoiding responsibility. The features specify the fixes succinctly and aptly, given knowledge of the affairs in the news. The devices we can use in tropes are legion. Every dimension of cognition can be used precisely and literally or in some inexact fashion deliberately to emphasize the proper state of affairs. Some devices are closely allied with a mode of expression. Others are not. One trope that is highly allied with a mode of expression is the pun. By definition a linguistic matter, a pun involves two similar-sounding words and two referents, with a point, expressed humorously, as in “There is a gulf between me and my man, and it’s pronounced golf.” Movies generally do not fulfill all of these criteria: They do not show two scenes with similar sounds that are both representations, with different referents, and a point being made humorously. But clearly they could come partway. Is that a power lawn mower? We are indeed viewing a garden scene. But wait! The next scene is Walter Matthau asleep indoors on his couch. The sound is his snoring. But, of course, the lawn mower/snore sound is not a representation of another referent as words are. So this is not strictly a pun. Some movie devices, like allusion and caricature, are close to their linguistic cousin tropes. 
Let us consider some of the especially lively, distinctive devices movies use.

The lawn mower/snore example is not close enough to be a pun. But sound is full of information and can be used misleadingly, deliberately. Stoffregen and Bardy (2001) point out we can use relationships in the world available across sensory channels as information. One example in the real environment is the gap in time from visual evidence of a collision to sonic evidence. The forked lightning we see now is accompanied by sound that arrives later. The precise timing difference is accurate information for the distance to the lightning (a perceptible cue that traditional theories about perception of distance that consider vision alone or sound alone will miss). The lawn mower/snore example may not be a pun, but it shows how sound is used in movies nonliterally to herald an event we cannot yet see. Before we lose sight of one scene such as a beach, the sounds of another scene, such as footsteps on a gym floor indoors, are played. Then the visual scene switches to conform to the sounds. This displacement with sound preceding sight is a break in temporal ordering as well as space. The story is unfolding in sound before it unfolds visually. Should theory of the effects of relationships stress neural mechanisms and semantics? Consider first a neural model of combinations of signals. When the sound fits the scene, it often has a vivifying effect. We could say sound or sight alone has minor effects, but sound plus sight multiplies their effects. Sound plus sight equals sound times sight! Mechanical models for multiplier effects are not hard to devise. The neuron that responds to a sound or a related sight fires four times per second to each, say, but fires sixteen times per second, not eight, when both sight and sound are present. 
One explanation is that sight or sound alone sends some excitation to a collector neuron (each firing it eight times per second), but in addition each excites inhibitory neurons that can suppress the collector (each forcing the collector to lower its firing rate by four times per second). Twice eight plus twice minus-four sum to a total of eight units of excitation. When sight and sound work together, sight could suppress the inhibitory neuron in audition to zero, and sound could suppress the inhibitory neuron in vision. Hence the total arriving at the collector neuron is twice eight plus zero inhibition, for a total of sixteen units of excitation per second. Such mechanical models are easily devised for enlivening effects, that is, those things that have to do with energy in a scene. However, few tropes can be so easily modeled mechanically, because information, reference, deception, and purpose are at issue, not energy.

Consider the meaning of light in this respect. Fading light in nature means the sun is going behind an occluder, such as the hills, clouds, or the horizon. The sun coming out brightens the scene. Literally, light is photons, waves, and illumination. Darkness is absence of light. We can model the perceptual effects of light and shadow as increases and decreases in neural activity in vision. But getting information means responding to what the light means. Literally, this means matters of depth, size, surfaces, and color. There are no colored neurons, ones that fire red or blue sparks, which shows that what we have as visual impressions is not contained in the firing in the brain. Information is a matter of patterns distinctive to their sources (Gibson, 1966). The firing of a neuron may be governed by a visual input, but if it fires at sixteen rather than eight times per second, this may model importance or relevance rather than physical intensity.
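The arithmetic of the collector-neuron model described above can be checked with a short sketch. This is only an illustration of the toy model, not a claim about real neurons. The rates of eight (excitation) and four (inhibition) per channel come from the text; the function name `collector_rate`, its parameter names, and the `cross_suppression` switch are our own hypothetical labels, introduced so that the merely additive case (eight) can be compared with the multiplier case (sixteen).

```python
def collector_rate(sight, sound, cross_suppression=True,
                   excitation=8, inhibition=4):
    """Firings per second of the hypothetical collector neuron.

    excitation and inhibition are the per-channel figures used in the
    text; everything else here is an illustrative assumption.
    """
    channels_on = int(sight) + int(sound)

    # Every active channel excites the collector directly.
    total_excitation = excitation * channels_on

    # Every active channel also drives its own inhibitory neuron ...
    inhibitors_on = channels_on

    # ... unless both channels are present and each one silences
    # the inhibitory neuron belonging to the other.
    if cross_suppression and sight and sound:
        inhibitors_on = 0

    return total_excitation - inhibition * inhibitors_on


if __name__ == "__main__":
    print(collector_rate(sight=True, sound=False))   # sight alone
    print(collector_rate(sight=False, sound=True))   # sound alone
    print(collector_rate(sight=True, sound=True,
                         cross_suppression=False))   # simple sum
    print(collector_rate(sight=True, sound=True))    # multiplier effect
```

Under these assumptions each channel alone yields four firings per second, the two together without mutual suppression yield eight, and with mutual suppression they yield sixteen, which matches the figures given in the text. The sketch models only energy, which is exactly why it has nothing to say about information, reference, or purpose.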
Further, metaphorically, light expresses life and darkness, death, neither of which is specified by firing rate. In The Sixth Sense (1999), a boy, Cole Sear, “sees dead people.” The settings for sequences of events are places like cars, outdoors, churches. As a sequence ends, the visual array fades to black. After a brief pause, the array ramps up in brightness, and a

new setting and sequence develop. This stately composition of fades and brightenings is not explained. But we learn at the close of the movie that the central adult male character, Dr. Malcolm Crowe, is himself dead. The fading hearkens us unto death. The dead Crowe has no existence outside of the interactions we see coming into bright being and fading into nothingness. The fading, darkness, and lightening figuratively portray going into and out of the state of death. The kind of physical fades that likely signified nothing more than scene changes when we watched the start of the movie do not change physically but come to signify mortality. Our eye movements provide us with vantage points, fixations, detailed centers, and a vague periphery. We see from our eye-height. We cannot jump instantly from one viewpoint to another miles distant, let alone the other side of a room. In Oliver Stone’s movie melodrama Nixon (1995) with Anthony Hopkins, Richard Nixon’s presidency becomes more and more untenable. Stone periodically shows a picture of the White House (not a clear one but a rather grainy, newsprint-looking one). He tilts the camera so that the White House is at an angle, almost like a ship that is sinking. It is quite unsettling and apt. Movement of a camera is often used to emphasize parts of a story. The camera gymnastics are devices, not plot. Moulin Rouge freeze-frames do not signify the dance stopped. They offer a picture that alludes to artworks. Some camera devices frame the entire plot. Camera viewpoints are high above a scene at the start of many movies. The setting and period of the story are revealed. We do not have to say explicitly (except for effect) “in a galaxy far, far away” or “once upon a time.” The camera may gradually move in from above a town to neighborhoods, then to house and street level, and finally to a room and its occupant. 
Likewise, as a story ends, the camera viewpoint may retreat from the main individuals and their facial expressions to a viewpoint in middle distance with traffic visible, and then to a higher location where a whole bridge and ocean setting are visible, and then perhaps the direction of the camera tilts to take in the sky. Metaphorically, too, we are to draw back, leave the characters to their own devices, and get on with our own concerns. Spatial changes signify story starts and endings. In The Sound of Music (1965), the singer Maria in the Salzkammergut hills is relatively stationary, but the camera swings around her, and the sweeping motions symbolize and emphasize her joy and freedom and the glory all around. Our movements are constrained by nature. We walk and dance within our means. In movies however, liberties can be taken for effect. Whittock (1990) notes hyperbole in Fred Astaire dancing so happily in Royal Wedding (1951) that he ends up dashing across the walls and ceiling. When objects are present in a scene in real life, they neighbor one another. In a picture, however, we can have a collage that would be hard or impossible to arrange physically. But what is juxtaposed in movies is not simply a set of objects that adjoin one another. Representation involves selection and ordering of objects for a purpose. In language, synecdoche is the use of part for a whole, as in “Sail hoy!” meaning a sailing vessel. This is the bread and butter of movies. We see a hand enter the screen and are aware of a powerful presence. A shadow moves across a window, and Dracula is near, we realize. We see successive images of crowds, a clock, steam, parts of iron wheels, a ticket-taker, and a man with a flag, and we know this montage specifies a train station of old. Montage, the quick replacement on screen of one object by another, is part of setting the scene, not someone’s eye movements. 
The artful selection of objects is crucial to being informative about the scene and its import.

Everyday objects take on representational force in movies, precisely because they have been singled out for presentation. On occasion, the objects are highly specific by association with a character or plot. If we refer to a companion as “Mr. Blue-suede-shoes,” this suggests something about his character. In movies, if we start by focusing on an Aston Martin or a martini being shaken not stirred, James Bond comes to mind. A white-maned, honey-blond horse rearing up suggests the Lone Ranger. The Bat Signal tells us about the Dark Knight. Metonymy is the trope used in “blue suede shoes” and the shaken martini. An object associated with the referent stands in place of the referent. Notice however that in language, we mean to refer to the object when we use the associate. We say, “Are you really going to go out with old belly-button ring?” and we do not mean that someone will be squired by a ring, old or new. We mean the lad who sports the ring. But if we see the Bat Signal in the sky, we do not think Batman is up in the clouds. Metonymy is not as readily filmed as synecdoche. Parts being present entail wholes being present, except in unusual circumstances. Things merely associated with Charles Chaplin being evident do not necessarily mean Charley is here. Hence theme music, like the John Williams great white shark theme in Jaws (1975), is useful, to fill a gap in the filmic lexicon and tell us what has joined us.

The term in language for an item that fills a gap in the lexicon is catechresis. “To fall in love” is catechretic because there is no standard, literal verb for becoming in love. Because psychological states are in principle invisible, we need catechretic devices in pictures. We put question marks, light bulbs, radiant lines, smoke, and the like above people’s heads in strip cartoons to show they are puzzled, insightful, delighted, and angry. In cartoon movies, we add little birds swirling around the head of someone who has a concussion.
In regular film, catechretic cartoon devices are out of place, but there is still a need for devices to fill gaps in movie vocabulary. To a great extent, movies rely on excellent actors and explicit scripts to go beyond nonlinguistic sight and sound. More-subtle movies offer less-explicit scripts. Science fiction movies took a leap forward when they stopped explaining new technology in painful lectures within the storyline. “The grapple grundizer works this way.” James Bond movies adroitly played on these boring and unconvincing expositions by using the admirable Q, who demonstrated his cute deadly toys to the bored 007. But to show love, hate, intention, courage, and the like, movies resort to gestures, poses, and tokens of these states because they require these to be on show to motivate plot developments. We have to know who is good (Snow White) and who is horrid (Cruella de Vil), and in regular drama, there has to be a show of kindness and cruelty to make clear who is who. Some tokens of mental states are found fairly frequently. Death often takes the form of a head turn, a stare, mouth open, and a blank look. Love involves a prolonged gaze, with mutual smiling. Evil requires furrowed brows and a set mouth. Intention requires clenched fists and teeth grinding. Sadness requires turning away slowly, with drooping shoulders. Happiness involves skipping and swirling, and for males, whistling. Surprise requires a double take. This is all a bit theatrical. The poses are exaggerated, conventional, pat, trite, and formulaic. In real life, we are not so easy to read. Most mental states never make it to the outside. Most crowds do not have one expression across the many faces, even if the event the folks are watching is attention-getting. The most common expression in a real crowd is probably neutral, but the crowd in a movie is often a chorus, and the same emotional theme is represented across many faces.

To show someone moving swiftly, movies at times resort to catechresis. A man moving swiftly can be shown in slow motion or with colored lines trailing after him (a device borrowed from comic books), or everything in the scene can be turned to lines pointing in the direction of travel (as in warp drive). Notice that fast-moving dots of light might well turn to lines in our visual impressions, but if we are shown the starship Millennium Falcon surrounded by lines, we are being given direct vision of the craft and an observer’s impression of the starfield. To provide for the passage of time, everything in the scene can be speeded up. These are figurative devices precisely if they are not information for slow travel, leaving a wake, a representation of a visual impression one of the characters would have, or accelerated action in the plot. If the viewer is supposed to take it that characters in the scene have these impressions, they are generally literal. In Superman (1978), the caped one used his x-ray vision to look through an obstacle, and we could see what he saw. Our point of view switches from outside the Man of Steel to inside. If he is in the picture while we also see the x-rayed impression of the object, the event is metaphoric. Hendiadys is one by means of two. “I present to you a great writer. I present to you a fine friend. I present to you a defender of the arts.” These could be three people or, Rudi Arnheim, three in one. Hendiadys is a trope, and merely showing double images of an anxious nurse bending over us is not hendiadys because it literally shows what we may be seeing as we wake up from a concussion. But if we see the patient doubled, and it is only the patient who has suffered trauma, then this is imaginative and a visual trope. Anticlimax is part and parcel of comedy. In Monty Python and the Holy Grail (1974), we are promised a monster. We are warned about the monster. We see bones around his lair. The laird of the lair turns out to be a rabbit. 
We are amused and even more amused when this, we soon discover, is a particularly vicious wabbit.

Euphemism is required in movies because we cannot give full exposure to the full dimensions of realistic sex, violence, birth, or death in mainline fare. At crucial moments in romance movies, the screen fades slowly. This is akin to ellipsis: the event carried on to its . . . Movies look away at crucial times and give us the reaction of bystanders or show a scene from nature that has the same emotional significance as the plot development. Hence, we see crows flying off if the scene has supernatural significance or if it is romantic, attractive sunsets or a clear warm starry night with sparks flying up from a campfire. If we are watching a Hitchcock romantic thriller (North by Northwest, 1959), a train may rush into a tunnel.

Meiosis is understatement (a kind of anticlimax but one that heightens the drama). It is rare in movies, but it can be managed. If we see an event of cataclysmic significance from a distance, it diminishes its impact. If the prisoner is toiling under a hot sun, seen from close up his struggles are perfectly plain. From a mile away, with other events in the scene and other people busy about their daily events, the prisoner’s lot is de-emphasized. Strong understatement was present in James Cameron’s Titanic (1997) when a mother told a bedtime story to her children as the ship was slipping down. The motion of the viewpoint back to on high at the end of a movie can be a kind of meiosis, a lessening of the involvement.

Persiflage is irresponsibility—the serious is treated flippantly. It is the basis of farce. The Monty Python movie Life of Brian (1979) was replete with persiflage. At the equivalent of the Sermon on the Mount, two members of the crowd get into an undignified scuffle. Treating the grand with indifference—this is persiflage. It is attention to the rude

and insignificant instead of what is major. Laugh-a-minute movies frequently show scenes that echo parts of dramatic movies but out of place, with no connection to the storyline, in ways that undermine their seriousness. In Naked Gun 2 1/2 (1991), a plane that buzzes our comic hero, echoing Hitchcock, turns out to be a model plane (again, a kind of anticlimax). In this vein, Tim Burton comedies portray death as banal, full of administrivia, and K-Mart shoppers.

Prolepsis is anticipation of the future, nonliterally. It is rare in movies though literal omens of the future are common. At the end of American Beauty (1999), the central character dies of a gunshot, and red bloodstains on the wall remind one of roses, which had appeared in dreams early in the movie. The roses also show up in vases. They are found hither and yon. But each time they have a literal place. They are part of the character’s surroundings, visual impressions, dreams, or vivid wish-fulfilling imaginings of a young girl opening her blouse and rose petals pouring out. They are there for clear, natural reasons. In prolepsis proper, the roses would appear without natural support. An example in a text would be “The joker was dead from the start” when he was actually alive and kicking at the outset. If, in between scene changes, the movie simply presented roses briefly (much as one might do behind credit titles), this would be closer to prolepsis proper. Magnolia (1999) was replete with odd events. For example, frogs dropped from the sky, but these portents of chaos, coincidence, and doom were portrayed as real events. In tropes, they would not be. If “Jimmy has the legs of a frog” is a trope, then he does not. For no natural reason, the soldiers in a scene might appear dead, momentarily. They might have burned clothing for an instant. One might see grieving relatives for a second. The future might be present in their reflections in water. Their shadows may show them maimed.
If the characters continue unawares, this is close to prolepsis proper. The soldiers might speak in ways that are ambiguous, meaning one thing to themselves and quite another to us, the audience, the irony lost on the players but not the onlookers.

In sum, movies can readily distort observable time, shape, and order of events and select aspects of objects and pose groups of objects in ways that are kith and kin to many of the tropes of language. Some tropes are difficult to manage in movies. Many nonliteral devices are common in movies.

Theory

We have sampled some of the rhetorical devices in language and asked if they are in some sense available for movies. Some are used commonly, but others are not. With the examples in mind, let us turn to systematic questions. How are pictorial tropes related to tropes in language? What is the basis of all the tropes?

Some tropes in language may not be distinguishable in pictures. “Man is a wolf” (a metaphor), and “Man is like a wolf” (a simile) only differ in the word like. Both have the same topic (man) and use one vehicle (wolf) to comment on the topic. Pictures do not have the word like to insert and remove. Therefore, the distinction between metaphor and simile is not importable to movies directly. The movie can present a wolf and a man in one scene. But in a picture containing a man and a wolf, does the wolf comment on the man? Is the proper term for the person in the picture man? Is wolf the proper term for the quadruped? The picture is not specific (Gibson, 1966; Kennedy, 1993; Carello and Turvey, 2000). No picture is an exact equivalent for a particular set of words, literal or metaphoric. To the extent that our definitions of terms rely on the presence of


language, movies cannot fulfill the exact definitions. A particular movie sequence shares only a few features with the particular trope in this instance. But since movies are representations, it is fair to describe some as showing referents in a nonliteral fashion. The movie represents and, therefore, has an intention behind it. The intention can be to violate a canon, and the violation is meant to be taken as deliberate and not a feature of the thing represented. That is, parts of movies can be figurative. If the movie provides a trope, some part of the object being shown (e.g., backwards time in Memento) is not the referent of the trope. It is the vehicle. There are features of the object (reversal) the referent (loss of memory) does not have. It is the job of the audience to detect what relevant features are shared by the object on the screen and the referent. Slow speed can mean remarkably swift speed because both are unusual speeds. Time going backwards can mean highly unusual use of periods of time. Roses can mean blood because a bouquet of roses is red, spread like a bloodstain. In short, the trick in discovering the intention in a display is to discover the key relevant features and to discern how they are related to the topic and the vehicle commenting on the topic.

All movie tropes involve a topic or referent, a vehicle on the screen, and relevant features. That all tropes stem from one condition means there is continuity among tropes. They are not independent devices like ploughshares and guitars. They are all manifestations of one criterion: a set of features made relevant. This is what allows new tropes to be understood. The question, given a new figure in which A and B are present, is to find what is relevant. If “Man is a wolf” is pertinent to a combat movie, then, likely, being merciless matters. The crucial factor here is relevant features, not just common features.
“No man is an island” means not-separate-and-sufficient-unto-itself is relevant but the claim is that man and islands do not share what is relevant here. It would be possible to place wolves in a combat movie and aim to make the point that wolves and the central characters were not the same: Men are not wolves. This is a rare kind of theme. Pictures have trouble saying something is not true without recourse to language or to a well-known alternative theme such as suspicions about a political cover-up. However, the search for relevant features of A and B in a message is usually for features in common. Man and wolf share features, we realize, is the import of “Man is a wolf.” The features of the A and the B are not to be fused in this search for what is common. “Man is a wolf ” is misunderstood if we end up thinking he is a werewolf with fur. “Giraffes are skyscrapers” does not mean we should think about long necks with small windows (Vervaeke & Kennedy, 1996). The search does not entail images in our head, even though werewolves and funny giraffes with windows on their outside and elevators in their insides may come to mind. These are flotsam of the process of comprehension and may never occur to the reader unless the images are invited, as here. If we happen to think about our breakfast when we hear “She is a window at Tiffany’s,” that is not relevant. That is, simultaneous excitation of features is not the crucial mechanism at work here. Tropes are not driven by whatever association comes to mind. Because finding relevant features is the key process for all tropes, the comprehension of tropes is very general and does not require tutelage in every possible kind. Indeed, the set of tropes is an open set, allowing for new ones at any time. The sheep named Dolly is a clone, relatively new to this world. It is unlikely the trope “Man is Dolly” has been asserted before. 
But in the context of an argument (or movie) with the theme that artificial things are immoral, someone might assert this as an exaggerated claim that we


depend on artifice and have done so since the dawn of agriculture and tool use. We might disagree with the speaker or be upset at the movie, but we would understand the point. Any dimension of comprehension (new or old) can be used straightforwardly or with an intended violation of its normal criteria to be understood as a trope. Hence, movies can display a wardful of new figurative attempts at novel messages and expect the viewer to have a good go at following them. Indeed, there is some remarkable evidence in favor of the hypothesis that novel tropes can be understood without first initiating the viewer into the significance of each device. Kennedy and Merkas (2000) asked a congenitally totally blind man (EA) to draw wheels in motion: a static wheel, a wheel spinning steadily, a wheel in jerky motion, a wheel in wobbly motion, a wheel spinning too fast to make out, and a wheel with its brakes on. EA said he had never tried to draw the referents before, but he would try. He reported he thought about wheels on a child’s tricycle, moving in various ways. The results are shown in figure 14.1. Kennedy and Merkas tested whether EA’s figures were recognizable to undergraduates. The subjects reliably picked out the referents EA intended for each picture.

Fig. 14.1. From Kennedy and Merkas (2000); © 2000 Psychonomic Society, Inc.

The conclusion that follows from Kennedy and Merkas (2000) and similar studies by Kennedy and Gabias (1985) and Kennedy (1993) is that figurative devices in pictures are interpretable on the basis of common features of the picture and the referent. For example, the wobbly motion of a wheel can be shown by distorting the wheel’s circumference to give it a shape in common with a noncircular path of the motion. The ground can be enlisted and given the shape of the wobbly path of motion, too. Pictures distribute the common features, and subjects pick them out, understanding the point of the nonliteral presentation, the distortions of the central object, and its surrounds. The features chosen as the basis of the bridge from vehicle to topic have to be apt. “Trees are straws” and “Highways are snakes” are not especially apt comparisons and make poor metaphors (Chiappe and Kennedy, 1999, 2000, 2001a, 2001b). The topic and vehicle do not have many suitable features in common. When few common features are evident, we often prefer the simile form. The similes “Trees are like straws” and “Highways are like snakes” are acceptable and comprehensible where the metaphors leave us


blank. The key features “sucking up liquid” and “sometimes highly curved” justify similes but not metaphors. Expressions that are not especially apt are preferred as similes. “Life is a journey” and “Cigarettes are time bombs” make good metaphors. The relevant features seem salient. The features “have a start and a terminus and events along the way” seem highly pertinent to the first and “kill one eventually, after a lag of a duration we could find out” to the second. Expressions that are highly apt are preferred as metaphors. It is the pertinence of the features that underwrites the preference, not a matter of images fusing or the familiarity of a device. Figurative messages like “Life is a journey” are presented in the same form as literal ones, like “Life ends at death.” That is, straightforward information is the standard. Tropes borrow the forms of literal representation and include extras of some kind, so allowances must be made. The kinds of literal representation that tropes rest on are essential for human cognition. For example, we see particulars such as Lassie and Black Bob but classify Lassie and Black Bob as collies. This is cognition at its most fundamental. In form, metaphors are based on just such category claims (Glucksberg and Keysar, 1990). Zebras are mammals, and Fords are cars. Metaphors with this form include “Genes are blueprints” and “Education is a stairway.” Similes are based on similarity claims such as zebras are like horses, and Fords are like Chryslers. We cannot say zebras are like mammals, and Fords are Chryslers. Some tropes, such as prolepsis, hyperbole, and meiosis, are based on dimensions such as time, size, space, number, and continua on which we can place objects. Hyperbole takes us to one extreme (exaggerating) and meiosis to the other (understating). A few depend explicitly on forms of representation, such as catechresis (and puns). Persiflage and anticlimax depend heavily on expectation (as do irony and sarcasm). 
“Seeing as” depends on emphasis of select features a topic has in common with another and uses vision as a model. Making some features relevant for the moment is the business of all the tropes. There is no trope that exists sui generis, without a literal form to rely on. This is why the form of a message does not tell us whether a trope is at issue. Time does flee backwards in the storyline of Superman (1978). Apes do talk in Planet of the Apes (1968), and so does Charlton Heston. If we say, “Freda is my witch,” there is no telling, outside of the rest of the story, whether Freda is a boss, a wife, or truly magic. When we use a trope, there are generally two components to deal with. A is a B. Hence, it could be A affects B, B affects A or A and B are affected equally. Chiappe and Kennedy (2001a) point out when we consider men as wolves, we think of both men and wolves as killers, perhaps, and features such as “strong family groupings for children or cubs” are irrelevant to both. “Crime is a disease” puts one in mind of problems spreading, like epidemics, but also it makes one consider taking responsibility for controlling both crime and disease. There is no fusion of the A and B here. Nor is it simply a matter of putting men in the category of wolves or wolf-like things (dogs, foxes, dingoes, etc.) or things named loosely for the moment as “wolves.” Rather, the same set of features is made relevant to men and wolves; likewise for crime and disease. In The Virgin Spring (1960), Ingmar Bergman’s woods near the family’s house have trees with straight trunks. Further from habitation and closer to danger, the forest trees slant, criss-cross, become entangled. These are form symbols for safe and threatening respectively (Melnick, 2000). Shape vehicles communicate well across language barriers,


from Bergman’s Swedish milieu to many others. Just as EA’s pictorial metaphors are interpretable outside of his native Turkish homeland, Kennedy, Liu, Challis, and Kennedy (2003) find that symbolic uses of simple shapes such as squares and circles communicate across language barriers. A circle is soft, and a square hard, symbolically, to English speakers, Danish speakers, Slovene speakers, and Japanese speakers. A circle is happy, weak, and good, and a square is sad, strong, and evil. Similarly, curved and straight have symbolic functions, and to a lesser extent so, too, have a sphere and a cube (Liu, 1997). In each case, the form and the referent have features in common, claims the theory that figurative expression has a literal base. Circles are unchanging, smooth, and as balls are easily rolled whereas squares have sharp angles, changes in direction of sides, and as blocks resist being pushed along more than balls. We can seek out salient features in the forms and the abstract referents. Sad people are hard to motivate to get moving; strong people can resist pulls. Sharp things can hurt us, like angles and evil. Round things are more comfortable to handle and easier to push.

Movies are largely pictures and chatter. While they can be figurative, most do not use many tropes compared to novels and poetry. A novel may use every single major trope many times in a single chapter, and two hours of reading literature tests our agility with figurative communication more adroitly than many movies. The challenge to directors, from an ecological realist theory friendly with metaphors, is: Can you make a movie that mostly retains the vocabulary of objects, environments, and vantage points and tells a good story but uses the range of standard tropes? We imagine this is possible and would be well worthwhile: Tropes! The Movie! We look forward to lining up.

Filmography

American Beauty (1999) Dir: S. Mendes
Crouching Tiger, Hidden Dragon (2000) Dir: A. Lee
Jaws (1975) Dir: S. Spielberg
Life of Brian (1979) Dir: T. Jones
Matrix, The (1999) Dir: L. Wachowski and A. Wachowski
Memento (2001) Dir: C. Nolan
Magnolia (1999) Dir: P. T. Anderson
Monty Python and the Holy Grail (1974) Dir: T. Gilliam
Moulin Rouge (2001) Dir: B. Luhrmann
Naked Gun 2 1/2 (1991) Dir: D. Zucker
Nixon (1995) Dir: O. Stone
North by Northwest (1959) Dir: A. Hitchcock
Planet of the Apes (1968) Dir: F. J. Schaffner
Royal Wedding (1951) Dir: S. Donen
Sixth Sense, The (1999) Dir: M. N. Shyamalan
Sound of Music, The (1965) Dir: R. Wise
Superman (1978) Dir: R. Donner
Titanic (1997) Dir: J. Cameron
Virgin Spring, The (1960) Dir: I. Bergman

References

Arnheim, R. (1992). To the rescue of art. Berkeley: University of California Press.
Bernsten, D., and Kennedy, J. M. (1996). Unresolved contradictions specifying attitudes—in metaphor, irony, understatement, and tautology. Poetics, 24, 13–29.
Carello, C., and Turvey, M. (2000). Rotational dynamics and dynamic touch. In M. A. Heller (Ed.), Touch, representation, and blindness (pp. 27–66). Oxford: Oxford University Press.
Chiappe, D. L., and Kennedy, J. M. (1999). Aptness predicts preference for metaphors, as well as recall bias. Psychonomic Bulletin and Review, 6, 668–76.
Chiappe, D. L., and Kennedy, J. M. (2000). Are metaphors elliptical similes? Journal of Psycholinguistic Research, 29, 371–98.
Chiappe, D. L., and Kennedy, J. M. (2001a). Literal bases for metaphor and simile. Metaphor and Symbol, 16, 269–76.
Chiappe, D. L., and Kennedy, J. M. (2001b). Metaphor or simile: Apt or conventional? And what changes? Paper presented at the meeting of the Psychonomic Society, November 15–18, 2001, Sarasota, FL.
Currie, G. (1995). Image and mind. Cambridge: Cambridge University Press.
Gibson, J. J. (1966). The senses considered as perceptual systems. Boston: Houghton Mifflin.
Glucksberg, S., and Keysar, B. (1990). Understanding metaphorical comparisons. Psychological Review, 97, 3–18.
Goodman, N. (1966). Languages of art. Indianapolis: Bobbs-Merrill.
Kennedy, J. M. (1982). Metaphor in pictures. Perception, 11, 589–605.
Kennedy, J. M. (1993). Drawing and the blind. New Haven: Yale Press.
Kennedy, J. M., and Gabias, P. (1985). Metaphoric devices in drawings of motion mean the same to the blind and the sighted. Perception, 14, 189–95.
Kennedy, J. M., Liu, C. H., Challis, B. H., and Kennedy, V. (2003). Form symbolism across languages: Danish, Slovene, and Japanese. In C. Zelinsky-Wibbelt (Ed.), Text, Context, Concepts (pp. 221–39). Amsterdam: Mouton de Gruyter.
Kennedy, J. M., and Merkas, C. E. (2000). Depictions of motion devised by a blind person. Psychonomic Bulletin and Review, 7, 700–6.
Lakoff, G. (1993). The contemporary theory of metaphor. In A. Ortony (Ed.), Metaphor and thought (2nd ed.) (pp. 202–51). Cambridge: Cambridge University Press.
Liu, C. H. (1997). Symbols: Circles and spheres represent the same referents. Metaphor and Symbol, 12, 135–47.
Markgraf, S., and Pavlik, L. (1998). “Reel” metaphors for teaching. Metaphor and Symbol, 13, 275–85.
Melnick, B. (2000). Cold hard world/warm soft mummy: The unconscious logic of metaphor. Annual of Psychoanalysis, 28, 225–44.
Parks, T. (Ed.). (2001). Looking at looking. London: Sage.
Richards, I. A. (1936). The philosophy of rhetoric. Oxford: Oxford University Press.
Sperber, D., and Wilson, D. (1995). Relevance: Communication and Cognition. Cambridge, England: Cambridge University Press.
Stoffregen, T. A., and Bardy, B. (2001). On specification and the senses. Behavioral and Brain Sciences, 24, 195–261.
Van Meter, J. (2000, Dec.). The wildest party. Vogue, 306–19.
Vervaeke, J., and Kennedy, J. M. (1996). Metaphors in language and thought: Falsification and multiple meanings. Metaphor and Symbolic Activity, 11, 273–84.
Whittock, T. (1990). Metaphor and film. Cambridge: Cambridge University Press.


Contributors

BARBARA FISHER ANDERSON is the managing director of the Center for Cognitive Studies of the Moving Image and coauthor of “The Case for an Ecological Metatheory” in Post Theory: Reconstructing Film Studies (1996).

JOSEPH D. ANDERSON is the chair of the Department of Mass Communication and Theatre at the University of Central Arkansas. He serves as the director of the Center for Cognitive Studies of the Moving Image and is the author of The Reality of Illusion: An Ecological Approach to Cognitive Film Theory (1996).

DAVID BORDWELL is the Jacques Ledoux Professor of Film Studies Emeritus in the Department of Communication Arts at the University of Wisconsin–Madison. He is the coeditor with Noël Carroll of Post Theory: Reconstructing Film Studies (1996). His books include On the History of Film Style (1998), Planet Hong Kong: Popular Cinema and the Art of Entertainment (2000), and Figures Traced in Light: On Cinematic Staging (2005). With Kristin Thompson he has also written Film Art: An Introduction (seventh ed., 2003) and Film History: An Introduction (second ed., 2002).

VICKI BRUCE is the head of the College of Humanities and Social Science at the University of Edinburgh and honorary professor of psychology at the University of Stirling, where she continues to research face perception and social cognition. Her books include Visual Perception: Physiology, Psychology, and Ecology (with Patrick Green and Mark Georgeson, fourth ed., 2003), Recognising Faces (1988), and In the Eye of the Beholder: The Science of Face Perception (with Andy Young, 1998).

CLAUDIA CARELLO is a professor and the head of the experimental division in the Department of Psychology at the University of Connecticut, where she also serves as the director of the Center for the Ecological Study of Perception and Action.

DAN L. CHIAPPE is an assistant professor of psychology at California State University, Long Beach.
He has published numerous articles on the role of comparison and categorization processes in the comprehension of figures of speech. His other research interests include foundational issues in evolutionary psychology and cognitive and linguistic processes involved in reading disabilities and higher-level reasoning.

JAMES E. CUTTING is a professor of psychology at Cornell University. He is a fellow of the American Psychological Association and of the American Psychological Society. He has two published books and a hundred scientific articles. He is the editor of Psychological Science (2002–2006) and served as editor of the Journal of Experimental Psychology: Human Perception and Performance (1989–1993). In 1993, he was awarded a John Simon Guggenheim Fellowship. He has also been deeply interested in the relation between pictures (as two-dimensional objects) and the three-dimensional natural world. Coupled with his interest in motion, this has led him to cinema as a tool to understand the constraints under which the human visual system evolved.

CHARLES EIDSVIK is a professor in the Department of Drama at the University of Georgia and the author of Cineliteracy (1978). He is currently writing on the cognitive bases of new and interactive media. He is the chair of the dramatic media production area at the University of Georgia.

DIRK EITZEN is an associate professor of film and television at Franklin and Marshall College. He has published theoretical articles and essays on documentary film, film comedy, film reception, and film historiography in Cinema Journal, The Velvet Light Trap, Post Script, and other journals and anthologies. His documentary films and videos have garnered numerous awards, including a Gold Hugo from the Chicago International Film and Video Festival, a blue ribbon from the American Film and Video Festival, and a best-of-festival award at the Columbus Video Festival.

WILLIAM EVANS is the director of the Institute for Communication and Information and a professor in the Department of Telecommunication and Film at the University of Alabama. His research interests include computerized analysis of film and video content, new media, and science and health communication.

TORBEN GRODAL is a professor of film studies at the University of Copenhagen. He is the author of Moving Pictures: A New Theory of Film Genres, Feelings, and Cognition (2000) as well as articles on emotions in film, visual aesthetics, narrative theory, evolutionary film theory, and video-game theory.

JESSICA K. HODGINS is a professor in the Department of Computer Science and the Robotics Institute at Carnegie Mellon University.
In 1994 she received an NSF Young Investigator Award and was awarded a Packard Fellowship. She was SIGGRAPH 2003 papers chairperson and editor-in-chief of ACM Transactions on Graphics from 2000 to 2002.

JOHN M. KENNEDY is a professor of psychology at Toronto University. Born in Belfast, he went to Cornell University on a Fulbright scholarship for a Ph.D. with James J. Gibson. Kennedy taught perception and child development at Harvard University’s Department of Social Relations. At Toronto since 1972, his interests developed from figure-ground and depiction to a theory of pictures, including outline for the blind, cave art, Renaissance perspective, motion depiction, and cross-cultural aspects of metaphor and symbolism. In 2005, he became a fellow of the Royal Society of Canada. More information is available at www.utsc.utoronto.ca/~kennedy/.

KAREN LANDER is a psychology lecturer at the University of Manchester. She currently researches the role of dynamic information on the recognition and learning of faces and has published a number of articles addressing this issue.


WILLIAM M. MACE is a professor in the Department of Psychology at Trinity College, Hartford, Connecticut, and the editor of Ecological Psychology, the journal of the International Society for Ecological Psychology. He also serves, along with Michael T. Turvey and Robert E. Shaw, as the series editor of Resources in Ecological Psychology.

SHEENA ROGERS is a professor of psychology and the head of the Department of Graduate Psychology at James Madison University. Her research interests include visual perception of spatial layout and pictorial representation of three-dimensional space. She is the coeditor of Perception of Space and Motion (1995) and Studies in Perception & Action 7 (2003).

ROBERT E. SHAW is a professor emeritus of psychology at the University of Connecticut where, with Michael Turvey, he helped establish the Center for the Ecological Study of Perception and Action. He is the founding president of the International Society for Ecological Psychology and is on the board of editors for the journals Intelligent Systems and Ecological Psychology. In addition, he serves as a coeditor of the series Resources for Ecological Psychology.

ED S. TAN is a professor of media entertainment at the University of Amsterdam. He is the author of Emotion and the Structure of Narrative Film (1996).

MICHAEL T. TURVEY is a Board of Trustees’ Distinguished Professor of psychology at the University of Connecticut and research scientist, Haskins Laboratories, New Haven. He was designated American Psychological Association’s Distinguished Scientist Lecturer (1998) and is a coeditor of Dexterity and Its Development (1995).

JEFFREY B. WAGMAN is a member of the cognitive and behavioral sciences faculty in the Department of Psychology at Illinois State University. His current research is on perception-action, particularly perception of affordances, and the improvement of perceptual skill with practice.
DOLF ZILLMANN is a Burnum Distinguished Professor Emeritus of information sciences and psychology at the University of Alabama. His monographs are Hostility and Aggression, Connections Between Sexuality and Aggression, and Exemplification in Communication. Among his edited books with chapter contributions are Selective Exposure to Communication, Perspectives on Media Effects, Pornography, Responding to the Screen, Media Effects, and Media Entertainment.
