

Seeing into Screens

Seeing into Screens: Eye Tracking and the Moving Image

Edited by Tessa Dwyer, Claire Perkins, Sean Redmond and Jodi Sita

BLOOMSBURY ACADEMIC
Bloomsbury Publishing Inc
1385 Broadway, New York, NY 10018, USA
50 Bedford Square, London, WC1B 3DP, UK

BLOOMSBURY, BLOOMSBURY ACADEMIC and the Diana logo are trademarks of Bloomsbury Publishing Plc

First published in 2018
Paperback edition first published 2019

Copyright © Tessa Dwyer, Claire Perkins, Sean Redmond, Jodi Sita and contributors, 2018

For legal purposes the Acknowledgments on p. ix constitute an extension of this copyright page.

Cover design: Louise Dugdale
Cover image © Shutterstock

All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage or retrieval system, without prior permission in writing from the publishers.

Bloomsbury Publishing Inc does not have any control over, or responsibility for, any third-party websites referred to or in this book. All internet addresses given in this book were correct at the time of going to press. The author and publisher regret any inconvenience caused if addresses have changed or sites have ceased to exist, but can accept no responsibility for any such changes.

Library of Congress Cataloging-in-Publication Data
Names: Dwyer, Tessa, 1970– editor. | Perkins, Claire (Claire Elizabeth), editor. | Redmond, Sean, 1967– editor. | Sita, Jodi, editor.
Title: Seeing into screens : eye tracking and the moving image / edited by Tessa Dwyer, Claire Perkins, Sean Redmond, Jodi Sita.
Description: New York : Bloomsbury Academic, 2018. | Includes bibliographical references and index.
Identifiers: LCCN 2017049474 (print) | LCCN 2017055761 (ebook) | ISBN 9781501328992 (ePDF) | ISBN 9781501329005 (ePub) | ISBN 9781501329029 (hardback : alk. paper)
Subjects: LCSH: Motion picture audiences. | Eye tracking. | Visual perception.
Classification: LCC PN1995.9.A8 (ebook) | LCC PN1995.9.A8 S34 2018 (print) | DDC 302.23/43–dc23
LC record available at https://lccn.loc.gov/2017049474

ISBN: HB: 978-1-5013-2902-9
PB: 978-1-5013-5492-2
ePDF: 978-1-5013-2899-2
eBook: 978-1-5013-2900-5

Typeset by Deanta Global Publishing Services, Chennai, India

To find out more about our authors and books visit www.bloomsbury.com and sign up for our newsletters.

Tessa: For Rhian, who sees art and science in all things.

Claire: For Huck, and all that he will see with his own eyes.

Sean: ‘You can close your eyes to reality but not to memories.’ – Stanislaw Jerzy Lec. For Josh, Caitlin, Erin, Dylan and Cael: my beautiful memory tree.

Jodi: To those who have supported me to keep looking into, under, around and about things to better understand the world: Scotty, the book editing team, my fellow researchers and my students. Also to my son, Zade, for putting up with me doing it.

CONTENTS

Acknowledgements

Introduction: The Blackest and Whitest of Swans (Tessa Dwyer, Claire Perkins, Sean Redmond and Jodi Sita)

Section 1: Seeing the Eye

1. In Order to See, You Must Look Away: Thinking About the Eye (William Brown)
2. Invisible Rhythms: Tracking Aesthetic Perception in Film and the Visual Arts (Paul Atkinson)
3. The Development of Eye Tracking in Empirical Research on Subtitling and Captioning (Stephen Doherty and Jan-Louis Kruger)
4. Into the Film with Music: Measuring Eyeblinks to Explore the Role of Film Music in Emotional Arousal and Narrative Transportation (Ann-Kristin Wallengren and Alexander Strukelj)
5. Looking at Sound: Sound Design and the Audiovisual Influences on Gaze (Jonathan P. Batten and Tim J. Smith)
6. Passing Time: Eye Tracking Slow Cinema (Tessa Dwyer and Claire Perkins)

Section 2: The Eye Seeing

7. Shaping Abstractions: Eye Tracking Experimental Film (Sean Redmond and Jodi Sita)
8. Audiences as Detectives: Eye Tracking and Problem Solving in Screen Mysteries (Jared Orth)
9. Discordant Faces, Duplicitous Feelings: The Eye’s Affective Lures of Drive (Lauren Henderson)
10. Using Eye Tracking and Raiders of the Lost Ark (1981) to Investigate Stardom (Sarah Thomas, Adam Qureshi and Amy Bell)
11. A Proposed Workflow for the Creation of Integrated Titles Based on Eye-Tracking Data (Wendy Fox)
12. Eye Tracking, Subtitling and Accessible Filmmaking (Pablo Romero-Fresco)

Biographies

Index

ACKNOWLEDGEMENTS

A great thought begins by seeing something differently, with a shift of the mind’s eye.
ALBERT EINSTEIN

Much of the inspiration and impetus for this work came out of the Melbourne-based ‘Eye Tracking the Moving Image Research Group’ and the wonderful and collegial group of scholars in it. We thank you for all your tremendous support. We would also like to thank our excellent research assistant, Isabelle Janecki, for gathering the research data for the chapters written by the editors of this book, and all of the participants who volunteered their time to take part in the many rich studies described in this book – without you this work would not have been possible. Thanks also to Susan, Katie and the wonderful team at Bloomsbury for believing in this project. Lastly, we would like to acknowledge the support of Deakin University and Monash University for the grant awards that allowed us to work on this book.

Introduction: The Blackest and Whitest of Swans

Tessa Dwyer, Claire Perkins, Sean Redmond and Jodi Sita

Prologue

We would like to begin this introduction to Seeing into Screens: Eye Tracking and the Moving Image with a brief reading of the preliminary research we have conducted on the opening six minutes of Black Swan (Darren Aronofsky 2010). The research affords us the opportunity to demonstrate what eye-tracking data reveals and conceals, and enables us to introduce the ‘two sides’ of the collection’s structure: in this sequence, we see the eye engaging with the duplicitous nature of the text, and we see the eye seeing, motivated by the fractured fictions it is presented with.

The opening of the film is split between what the viewer eventually realizes is a dream sequence, in which Nina (Natalie Portman) sees herself as the young princess Odette, cursed and transformed into a swan by Rothbart, and her then waking up and travelling to the New York Ballet Company to practise. We discover that the narrative and aesthetic codes and conventions of Black Swan have a direct impact on viewers’ gaze patterns and fixations: the eyes are led by the continuity principles of commercial filmmaking and the codes and conventions of the psychological thriller. However, we also discover ‘seeing’ emerging in the periphery of the frame, with viewers searching or perhaps questioning their way across the mise-en-scène of the divided film world they are presented with.

Black Swan opens with Nina/Odette at centre frame in a long shot with a single spotlight on her as she slowly pirouettes. This is followed by a series of close-up, tracking shots that follow her feet as they dance. The stage she moves on is bare and black, her white costume acting as a sharp and emotive contrast. What becomes the film’s play with light and dark,
movement and stillness, seeing and not seeing, is immediately signified in these opening shots. The eye-tracking data reveals that most viewers (twelve in this study) closely follow the movement of Nina/Odette, tracking her feet, torso and arms as the scene proceeds. Gazing, then, is closely aligned with the congruent aesthetics of the scene and character identification. And yet, a small number of viewers (one to three at any one time) scan outside the central area of focus that the film is operating from, gazing into ‘blind spots’ or the dark margins of the frame, where something may very well be lurking.

The suspense that the darkness might in fact be hiding someone is of course central to the way that the scene develops: an attributed point-of-view shot establishes that Nina/Odette is being watched and then followed, as Rothbart emerges from the wings to manipulate her dance in a puppet-like way. The music closely anchors the way the scene is choreographed: as the music builds, so too does the pace of the embodied movement and of the camera, which transitions to being handheld and circular as both dancers are captured in a 360-degree turn. The eye-tracking data shows that all the viewers’ eyes both follow Rothbart’s point of view and pick him out as he emerges from the shadows. When he and Nina/Odette are captured together in the same co-proximate space, the data shows that all viewers are looking back and forth, focusing on the characters’ faces as they do so. There is inherent and culturally coded meaning in facial expressions, so we are witnessing viewers looking to ascertain or determine the level of fear and desire wrapped up in this unravelling scene. They are searching for meaning and feeling.

One can provocatively suggest that the data shows not only what the eye sees, but also where, why and how gaze patterns manifest, allowing us to explore the optical and cine-relational determinants of seeing into the screen. This is of course one of the cornerstone tenets of the book you are about to read: a concern with eyes seeing, and with seeing the eyes.

However, the opening to Black Swan, and the eye-tracking results we obtained, are interesting for a number of further reasons. The film builds its psychological narrative out of competing strategies and cues: restricted knowledge is set against foreshadowing, narrative comprehension is undermined by misdirection, reality is set against dream, and mental health is pitched against the psychosis of the divided self. One binding way that the film establishes its quivering narrative is through the constant use of reflections and mirrors. This has consequences for seeing, which the eye-tracking data picks up on.

As the dream sequence ends, we find Nina waking up in her apartment, shot in close-up, followed by a long shot of her stretching in front of a full-length, tri-fold mirror. The eye-tracking data shows us that viewers evenly split their gaze, either looking at two of the three mirrors (one of them does not have her reflection in it), or directly at Nina’s back. The rest of the opening sequence is full of such reflections, offering the viewer a divisive gaze, and providing a set of split screens upon which to focus. While getting ready to leave her apartment, Nina’s mother sees bruising/marks on her back, captured through a mirror that we see them both reflected in; on the subway,
it is her reflection in the train window that the viewer is initially asked to gaze at, and then, through a train window, at what appears to be her ‘identical image’; and in the make-up room, which closes the sequence, the viewer sees three to eight mirrors that both reflect the dancers’ faces and fracture the spatial cues, and further introduce the doppelganger narrative when Lily (Mila Kunis) walks in.

We find that the eye-tracking data captures the divided nature of the screen, often triangulating between reflection, body and foreground, and yet supporting the way a main character typically holds gaze patterns (because of frame centrality, lighting, performance and narrative action). However, in Black Swan, Nina is a divided self; the world we see, and the world we see through her eyes, is not necessarily ‘real’. The eye-tracking data draws attention to these subjective cues, as we find viewers searching and scanning the screen for more than her reflection, for more than a mere apparition.

We think Black Swan speaks – at least metaphorically – to the complexities of eye-tracking data and the work it undertakes which, as the collection will go on to demonstrate, is never ‘black’ or ‘white’. While screen culture is assembled out of codes and conventions, and various approaches in this book utilize the data to draw compelling conclusions about how these affect, shape or determine gaze patterns, other approaches find subjectivity and embodiment engaging with fiction in decidedly active ways.

Finally, we think Black Swan expresses the very complexities of vision: the film plays with perception, using the frame to draw attention to, and perhaps to draw upon, the very processes of peripheral vision. Moreover, Black Swan draws attention to the fallibility of vision, of not seeing things clearly, and of the embodied and affective quality of vision. When we see the eye, and when we see the eye seeing, it is, prosaically speaking, both a black and a white swan that we are comprehending.

Mapping the field

There has been growing interest over the past few decades in whether and how research in the cognitive sciences and neurosciences can contribute to what might be defined as the more interpretive, philosophical and aesthetic concerns of the visual arts, and of film in particular. Research in eye tracking the moving image is therefore part of a larger field, often referred to as psychocinematics, which at its core tries to understand more about what viewers are experiencing and how they interact with what they are watching. This approach has both cognitive and empirical dimensions. Cognitive film theorists David Bordwell and Noël Carroll have both played an important role in championing the use of these more traditionally scientific approaches to studying film (see Bordwell 1989; Carroll 1998 and 2013), as has the film historian Barry Salt (2009). As Tim Smith summarizes, Salt describes a cognitive computation cinematics (CCC) approach that studies
film viewing by ‘[triangulating] our understanding of how we watch films via three traditionally separate approaches: (1) cognitive psychology and associated methods of hypotheses testing; (2) computational methods in audiovisual analysis and computational modelling; and (3) the formal and statistical analysis of film’ (Smith 2012: 2).

To date, functional magnetic resonance imaging (fMRI) has been the most widely used research tool in this area (Cha et al. 2015). More recently, however, electroencephalography (EEG) and eye tracking are also being used to combine an exploration of brain activity with gaze patterns as viewers watch moving image texts. Eye tracking plays a pivotal role within this domain, targeting the visual attention component of screen viewing. Visual attention is a primary cognitive factor involved in watching a film or television programme, and understanding more about how it is employed and how it changes or differs between people, or across the vast range of genres and types of films, is of key interest to screen scholars and cognitive scientists alike. Eye tracking is currently ideally suited to exploring questions around these topics; it allows us to see and measure where on the screen people focus their visual attention, relative to the mise-en-scène, action and elements such as lighting.

Carroll and Seeley (2013: 53) state that ‘moving pictures are constructed from a suite of formal and narrative devices carefully developed to capture, hold, and direct our attention. These devices are tools for developing content by controlling the way information is presented throughout the duration of our engagement with a movie.’ Eye tracking allows scholars to explore viewers’ engagement by examining gaze in relation to narrative, cutting, directing, dialogue and shot length in ways that stand outside the traditional methods of studying these elements within film (and television) studies.

The earliest eye-tracking studies on film were conducted by cognitive scientists without input from film scholars. In 1997, a very early eye-tracking study by Tosi and colleagues recorded ten adults viewing short segments from both fiction and nonfiction films, and found little difference between individuals’ scan paths for fiction films. They reported that, generally, the eyes concentrated on the screen centre when looking at characters, or at objects that were in rapid motion. These results were replicated by Goldstein and colleagues in 2007 (and again by Brasel and Gips in 2008), who also found that, more than half of the time, subjects’ gaze fell within an area that was less than 12 per cent of the total scene.

The earliest reported film studies research in this area (Marchant et al. 2009) used eye tracking as part of an experiment involving 400 people as they viewed two scenes from Alfred Hitchcock’s Vertigo (1958). Viewers were then able to watch a collective summary visualization that resized the parts of the film attended to by groups of viewers according to how long the
participants’ gaze stayed in that location. This provided ‘an overview’ of the collective visual experience. The researchers were then able to use these visualizations to examine how Hitchcock exerts control over his films, using cinematic techniques to manipulate gaze behaviour. The results also suggested that the consistency of relationships within the ‘rules’ of continuity editing had the effect of holding viewers’ gaze on certain areas of the screen.

Today, almost twenty years after these first studies were conducted, numerous findings continue to support the early conclusions, and these phenomena are now referred to as the ‘central bias’ towards the centre of the screen/frame (Bindemann 2010) and ‘attentional synchrony’, where the gaze responds to movement or new objects as they appear on the screen (Smith and Mital 2013). These well-described phenomena are now examined in most film-based eye-tracking studies, and can themselves be used as comparison models, allowing researchers to compare how much attentional synchrony is occurring and how central bias fits and shifts with changes in the types of film stimuli.

An important review of recent eye-tracking research in this area has been provided by Tim Smith (see Smith 2013), a key cognitive psychologist who has also led eye-tracking and film research as part of a special themed issue of Refractory on eye tracking the moving image (Redmond and Batty 2015). Further important contributions to the area of eye-tracking screen research have been made by William Brown (2015), who reflects on eye tracking’s theoretical and applied limitations as well as its future possibilities. Brown extends these ‘so what?’ contentions in a chapter in this book. The ‘so what?’ question has in fact been part of a theoretical and methodological shift in eye-tracking research, led by interdisciplinary teams who draw on emotion, memory and affect to dig deeper into the data, and to add more intricate analytical layers to this path-finding area. As we note in our theatrical prologue, this is one of the ambitions of this book: to draw upon different and divergent approaches to eye tracking in screen research.

Eye-tracking research today results in rich and fascinating discoveries, in which more of the nuances of a film’s palettes and registers are explored, such as sound design, colour, camera movement, performance and stardom, shot length, subtitling, and the subtle differences between screen genres. Now that eye-tracking devices are easier to use and more cost-effective than ever before, and as data visualizations increase in accuracy and reach, researchers are able to set and answer many more varied questions – as we discover in this anthology. In ever more exciting ways, we are seeing into screens.
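
Both phenomena can be given a simple quantitative handle. The sketch below (Python) is an illustrative operationalization, not the published method of any study cited above; the array layout and the dispersion measure used as a synchrony proxy are assumptions for the purposes of the example:

```python
import numpy as np

def gaze_metrics(gaze, screen_w=1920, screen_h=1080):
    """gaze: array of shape (n_viewers, n_frames, 2) holding (x, y)
    gaze coordinates in pixels for each viewer at each frame."""
    centre = np.array([screen_w / 2, screen_h / 2])
    half_diag = np.linalg.norm(centre)  # distance from centre to a corner

    # Central bias proxy: mean distance of all gaze points from the screen
    # centre, normalized so 1 = dead centre and 0 = the corners.
    central_bias = 1 - np.linalg.norm(gaze - centre, axis=-1).mean() / half_diag

    # Attentional synchrony proxy: per-frame dispersion of viewers' gaze
    # around its centroid; low dispersion = viewers clustered on one spot.
    centroids = gaze.mean(axis=0, keepdims=True)           # (1, n_frames, 2)
    dispersion = np.linalg.norm(gaze - centroids, axis=-1).mean(axis=0)
    synchrony = 1 - dispersion.mean() / half_diag

    return central_bias, synchrony
```

Running such a measure over, say, a continuity-edited thriller and a long-take art film would permit exactly the comparison described above: how much attentional synchrony is occurring, and how central bias shifts with the type of film stimuli.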


Seeing into Screens

As the previous overview of the field demonstrates, research and writing on eye tracking the moving image traditionally privileges a focus on outcomes. As a field that emerges, in the first instance, with the goal of scientifically measuring the experience of viewing moving images (Shimamura 2013: 2), it is no surprise that research most commonly proceeds as the interpretation of data produced from individual eye-tracking studies. From here, compelling conclusions on how the mechanics of the text direct viewers’ eye movements are drawn, offering a new empirical perspective on the processes of looking that screen theory has for decades theorized using the frameworks of psychoanalysis, reception studies and phenomenology, among others.

The Eye Tracking the Moving Image Research Group from which this book springs has always regarded this central term of looking in as open a manner as possible (see Redmond and Batty 2015). Eye-tracking technologies enable new understandings of looking as a neurological and cognitive process – the anatomical vision of the ‘meat and bones’ body and brain rather than the metaphysical consciousness. But, as Sean Redmond and Craig Batty have written elsewhere, looking is also always ‘connected to culture, discourse and ideology, where seeing into things is always gendered, classed and raced, amongst other encultured practices and modes of being in the world’ (2015). The group’s work strives to keep this tension between the anatomy and theory of looking in play through all of its research. At an ideological level, this means approaching eye-tracking data – the raw fixations and movements with which screen content is taken in – without ceding to ocularcentrism; that is, without automatically equating seeing with knowing. At a practical level, it means being cautious about understanding gaze behaviour as ‘proof’ of a singular truth that is disconnected from the encultured practices described earlier.

Seeing into Screens: Eye Tracking and the Moving Image demonstrates this idiosyncratic approach in its interdisciplinary structure, where researchers from a variety of fields bring a diverse range of processes to the common method of eye tracking moving images. Spanning the fields of film and screen studies, neuroscience, psychology, linguistics, communications, translation studies and philosophy, our contributors demonstrate the wide spectrum of ways in which this work can be undertaken. In some chapters, quantitative attention to original gaze pattern data is central, while in others this is balanced by a qualitative emphasis on details collected through interviews with viewers. Some researchers form their core conclusions from careful statistical analysis, while others do so through a more intuitive approach to the traces and indices of their raw data. Some work is not drawn from original experiments but extrapolated from existing studies to extend or challenge their conclusions.


In numerous chapters, a reflective, exploratory approach is favoured over a strictly scientific one, with writers primarily contemplating the ‘so what?’ question described previously through attention to the very system of eye tracking and what it can bring to studies of the moving image. In this way, eye tracking is taken not just as a method but as a contemporary topic for screen studies, and the book is able to push into areas and questions not yet explored in any detail in the field. This includes the examination of how issues of aesthetics function in eye-tracking research, and the cultural politics of attending to commercial over avant-garde material. It also includes the contemplation of what this method can bring to screen criticism, right through to the epistemological consequences of eye tracking’s emphasis on vision. By presenting these different processes and angles side by side, we aim to demonstrate how, as suggested earlier, eye-tracking research and the data it produces are never ‘black’ or ‘white’. Rather than wrestling these different and often conflicting approaches into one overarching perspective, we allow their intersections to emerge and be grappled with by readers themselves.

Sides, sections, chapters

Like the sequence from Black Swan described in the prologue, this book has two sides or sections. It starts with a series of chapters that encounter the eye as an object of study and focus, exploring its many varied refractions as well as the visual system in which it operates. It then proceeds, in the second section, to consider the eye as agent, actively searching, detecting and emoting. At points, these two halves overlap, as the seen and seeing eye become indistinguishable, forming two sides of a complex whole.

‘Seeing the Eye’

The first section, ‘Seeing the Eye’, opens with William Brown’s philosophical polemic on eye-tracking politics, entitled ‘In Order to See, You Must Look Away: Thinking About the Eye’. Arguing against a tendency to prioritize eye fixations over saccades, Brown calls for a renewed examination of vision systems. His focus on three films made up almost entirely of still images – Año uña (Jonás Cuarón 2007), La Jetée (Chris Marker 1962) and Je vous salue, Sarajevo (Jean-Luc Godard 1993) – provides him with the means to interrogate and revalue ‘invisible’ moments of necessary blindness that accompany vision, such as blinks, sleep and saccades.

Brown’s arresting thoughts on vision are directly followed by another chapter that invests in saccades and the structural relations between saccades and fixations. In ‘Invisible Rhythms: Tracking Aesthetic Perception in Film and the Visual
Arts’, Paul Atkinson ponders how eye tracking can inform understandings of aesthetics by highlighting micromovements of the eye and variations in attention across different media, comparing film to painting and sculpture. Focusing on the different temporal constraints of these mediums, Atkinson calls for a reconsideration of eye fixations, fixation cycles and saccades within the context of broader gaze patterns and scanpaths that unfold over periods of extended duration. His focus on duration, perceptual variability and aesthetic awareness provides a rich, cross-disciplinary analysis that uncovers both limitations and new possibilities within eye-tracking research.

Atkinson foregrounds themes echoed in later chapters of this book, especially those by Tessa Dwyer and Claire Perkins, Sean Redmond and Jodi Sita, and Lauren Henderson. Engaging with aesthetic forms and modes of aesthetic perception, whether in relation to experimental abstraction, slow film or visual art, these chapters together make a substantial contribution to the growing body of moving-image eye-tracking research focused on non-mainstream, non-Hollywood film. In this way, they extend the scope of eye-tracking research – one of the key objectives of this book. Both Brown’s and Atkinson’s chapters also address another key aim: to provide much-needed space, rarely afforded within empirical testing, for critical and conceptual reflection. It is in this way that the book presents a diverse range of work in the field, deliberately mixing empirical and reflective analyses.

Brown’s and Atkinson’s introductory chapters set a high conceptual bar for the book as a whole. They are complemented by another analytical overview, yet one with an empirical bent. In ‘The Development of Eye Tracking in Empirical Research on Subtitling and Captioning’, Stephen Doherty and Jan-Louis Kruger assess how terms, measures and methodologies are employed across a range of past and present eye-tracking studies into subtitling and captioning – key factors in the global distribution and accessibility of film, television and digital screen media. Tracing intersections with media psychology and cognitive science, Doherty and Kruger focus primarily upon three measures central to this research – visual attention, cognitive load and psychological immersion – advocating for greater terminological and methodological consistency in this increasingly interdisciplinary field.

The final three chapters in this first section all present results from new eye-tracking experiments. Two consider the influence of film sound on the gaze, while the third draws the section to a close by instituting a change of pace and returning to issues of duration, this time in relation to slow cinema. First, Ann-Kristin Wallengren and Alexander Strukelj examine how score and soundtrack affect narrative engagement and absorption in their chapter ‘Into the Film with Music: Measuring Eyeblinks to Explore the Role of Film Music in Emotional Arousal and Narrative Transportation’.
Bolstering pioneering work in the area, Wallengren and Strukelj tested short clips from Ronin (John Frankenheimer 1998), Songs from the Second Floor (Roy Andersson 2000) and the documentary Winged Migration (Jacques Perrin 2001) under three distinct sound conditions. They conclude that congruence between image and sound has a more significant effect on emotional engagement and transportation than the rhythm, pitch and tempo of the music itself.

This foray into sound continues in the following chapter by Jonathan P. Batten and Tim J. Smith. Starting by comparing the perceptual attributes of the auditory and visual systems, ‘Looking at Sound: Sound Design and the Audiovisual Influences on Gaze’ mixes eye-tracking results with pupil dilation data and self-reporting in order to gauge the effect of music, dialogue and sound on eye movements, emotional engagement and arousal. Testing sequences from How to Train Your Dragon (Dean DeBlois and Chris Sanders 2010) and The Conversation (Francis Ford Coppola 1974), Batten and Smith report that differences in affective response do not produce corresponding differences in gaze behaviour.

Pausing for breath, and offering a segue into the second part of the book, the first half concludes with the chapter ‘Passing Time: Eye Tracking Slow Cinema’ by Tessa Dwyer and Claire Perkins. Here, Dwyer and Perkins test the claim that ‘slow’ films, deliberately restrained in pace, tempo and action, enable audiences to see differently. Presenting results from an experiment involving long-take sequences from The Passenger (Michelangelo Antonioni 1975) and Cemetery of Splendour (Apichatpong Weerasethakul 2015), Dwyer and Perkins challenge the common assumption that protracted shot durations lead to more exploratory, less directed modes of viewing. Following their chapter, the book resets, instituting a felt shift in focus.

‘The Eye Seeing’

The anthology’s second stream, on ‘The Eye Seeing’, is introduced by Sean Redmond and Jodi Sita’s ‘Shaping Abstractions: Eye Tracking Experimental Film’. Engaging with the poetic, textured abstractions of Mothlight (Stan Brakhage 1963), La Région Centrale (Michael Snow 1971) and the counterpoint text 2001: A Space Odyssey (Stanley Kubrick 1968), Redmond and Sita effectively reshape eye tracking in response to embodiment theory and phenomenology. Quantitative eye-tracking data is combined with qualitative post-screening questionnaires in order to mine the productive potential of the meaning and knowledge gaps that separate yet enmesh eye fixations, memory and emotion.

Eyes that actively see and seek, emoting, feeling their way and gaining access, remain the focus throughout this section. Jared Orth’s ‘Audiences as Detectives: Eye Tracking and Problem Solving in Screen
Mysteries’ addresses the popularity of puzzle films and the investigatory strategies of visual attention they invoke. For Orth, screen mysteries provide an ideal means of ‘triangulating’ understandings of task-oriented modes of perception and the ways in which comprehension impacts eye movements.

Orth’s exploration of investigative, analytic audiences is followed by Lauren Henderson’s detailed study of audience intersubjectivity and affective meaning in film experiences, entitled ‘Discordant Faces, Duplicitous Feelings: The Eye’s Affective Lures of Drive’. For Henderson, the multiple framed faces and excess of facial gestures in Drive (Nicolas Winding Refn 2011) invite a type of painterly gaze, pulling visual attention while nevertheless frustrating and fracturing foveal fixations. This adds to the film’s affective lure and its ability to communicate beyond language. Productive links emerge here between Henderson’s analysis and Atkinson’s earlier chapter detailing durational differences between film and painting.

Next up is ‘Using Eye Tracking and Raiders of the Lost Ark (1981) to Investigate Stardom’ by Sarah Thomas, Adam Qureshi and Amy Bell. This chapter tracks how the presence of a star performer on screen might complicate the idea that film viewing is primarily driven by bottom-up, exogenous stimuli. Taking Harrison Ford as Indiana Jones in Raiders of the Lost Ark (Steven Spielberg 1981) as a case study, Thomas, Qureshi and Bell consider how certain actors might cause discrepancies in otherwise predictable gaze behaviour. Their experiment design seeks to acknowledge the overlay between star and character, and to compare different types of scenes within the film.

The final two chapters in this book return to the field of subtitling and captioning research; both focus on issues of accessibility, advocating for filmmaking and subtitling that foreground inclusivity for deaf and hard-of-hearing audiences. Wendy Fox’s chapter considers how eye-tracking data can inform changes to subtitle and caption design, layout and integration. In ‘A Proposed Workflow for the Creation of Integrated Titles Based on Eye-Tracking Data’, Fox sets out key steps and considerations for producing subtitles and captions that are responsive to film form and aesthetics as much as to translation. Finally, Pablo Romero-Fresco’s ‘Eye Tracking, Subtitling and Accessible Filmmaking’ concludes the book. As both a filmmaker and a translation professor, Romero-Fresco marries practice and theory, the concrete and the conceptual. Responding to the increasing presence of multilingualism and on-screen text in contemporary film and television, his chapter canvasses the fraught relationship between film and translation in order to effect change, increase accessibility and inclusivity, and foster collaboration. The eye tracking of participants viewing subtitled and dubbed screen media can provide rich material, Romero-Fresco argues, for understanding more about how audiences of all types ultimately see into screens.


References

Bindemann, M. (2010), ‘Scene and Screen Center Bias Early Eye Movements in Scene Viewing’, Vision Research, 50 (23): 2577–87.
Bordwell, D. (1989), ‘A Case for Cognitivism: Cinema and Cognitive Psychology’, Iris, 9: 11–40.
Brasel, S. A. and J. Gips (2008), ‘Points of View: Where Do We Look When We Watch TV?’, Perception, 37 (12): 1890–94.
Brown, W. (2015), ‘Politicizing Eye-Tracking Studies of Film’, Refractory, 25. Available online: http://refractory.unimelb.edu.au/2015/02/07/brown/ (accessed 5 June 2017).
Carroll, N. (1998), Interpreting the Moving Image, Cambridge: Cambridge University Press.
Carroll, N. and W. P. Seeley (2013), ‘Cognitivism, Psychology and Neuroscience: Movies as Attentional Engines’, in A. Shimamura (ed.), Psychocinematics: Exploring Cognition at the Movies, 53–75, New York: Oxford University Press.
Cha, H., W. Chang, Y. Shin, D. Jang and C. Im (2015), ‘EEG-Based Neurocinematics: Challenges and Prospects’, Brain-Computer Interfaces, 2 (4): 186–92.
Goldstein, R. B., R. L. Woods and E. Peli (2007), ‘Where People Look When Watching Movies: Do All Viewers Look at the Same Place?’, Computers in Biology and Medicine, 37 (7): 957–64.
Marchant, P., D. Raybould, T. Renshaw and R. Stevens (2009), ‘Are You Seeing What I’m Seeing? An Eye-Tracking Evaluation of Dynamic Scenes’, Digital Creativity, 20 (3): 153–63.
Redmond, S. and C. Batty (2015), ‘Themed Issue: Eye-Tracking the Moving Image’, Refractory. Available online: http://refractory.unimelb.edu.au/2015/02/06/volume-25-2015/ (accessed 12 May 2017).
Salt, B. (2009), Film Style and Technology: History and Analysis, 3rd edn, Totton, Hampshire: Starword.
Shimamura, A. (ed.) (2013), Psychocinematics: Exploring Cognition at the Movies, New York: Oxford University Press.
Smith, T. J. (2013), ‘Watching You Watch Movies: Using Eye Tracking to Inform Cognitive Film Theory’, in A. Shimamura (ed.), Psychocinematics: Exploring Cognition at the Movies, 165–91, New York: Oxford University Press.
Smith, T. J. and P. K. Mital (2013), ‘Attentional Synchrony and the Influence of Viewing Task on Gaze Behaviour in Static and Dynamic Scenes’, Journal of Vision, 13 (8): 16.
Tosi, V., L. Mecacci and E. Pasquali (1997), ‘Scanning Eye Movements Made When Viewing Film: Preliminary Observations’, International Journal of Neuroscience, 92: 47–52.

SECTION 1

Seeing the Eye

1 In Order to See, You Must Look Away: Thinking About the Eye

William Brown

Eye-tracking studies of film (as well as eye-tracking studies in general) focus on seeing. In this chapter, I shall propose that the human visual system relies not just on seeing, but also on moments of not seeing. That is, I shall argue that moments in which we do not see are not ‘flaws’ in a visual system that otherwise strives towards total vision. Rather, such ‘flaws’ (moments of blindness) are crucial components of vision and of what it means to be human more generally. I shall illustrate this need for temporary blindness by looking at three films that are comprised entirely (or almost entirely) of photographs: Jonás Cuarón’s Año uña/Year of the Nail (Mexico 2007), Chris Marker’s La Jetée (France 1962) and Jean-Luc Godard’s Je vous salue, Sarajevo (France 1993). In films comprised of photographs, we see not continuous action (we do not have a ‘perfect vision’ of events) but wilfully fragmented action (even more so than in films comprised of moving images that involve narrative ellipses). These films demonstrate how what we do not see is perhaps equally as important as what we do see – both in film viewing and in life. In this way, I shall argue that these films make clear how eye tracking overemphasizes vision during eye fixation, thereby underappreciating the importance of the necessary blindness that accompanies vision. As it is cinema that draws out the oversights of eye tracking, I shall reverse the trend of science being used to explain art, providing instead an example of art being used to explain science. In order to do this, let us start with a brief explanation of human vision.


The imperfect eye

Andrew Parker has written about how – as part of the body’s response to atmospheric light – the eye evolved rapidly during, and thus played a key part in, the so-called Cambrian explosion, whereby the diversity of life on earth developed rapidly about 542 million years ago. Although eyes did not necessarily originate with the Cambrian explosion, Parker nonetheless links the development of the eye with predation: being able to see at some distance, rather than smelling at close range or touching as a result of proximity, enabled life forms to develop new ways of hunting and to find new ways of avoiding being hunted. As a result of these new strategies of survival, many new life forms evolved – in effect, a different species evolved for each different survival strategy (Parker 2003).

While Michael F. Land and Dan-Eric Nilsson suggest that eyes have evolved separately in many different species (i.e. not all species with eyes evolved from the same ancestor; see Land and Nilsson 2002), eyes seem nonetheless to have evolved from photosensitive proteins found within cells to photoreceptive cells, which themselves clustered to become the first ‘eyespots’ and then, with the addition of lenses, what we might today refer to as eyes. The point is that while Charles Darwin considered the eye to be ‘complex and perfect’ (1998: 143), the eye has in fact evolved to view the world in a heuristic/evolutionary fashion; thus, even if the contemporary eye is exceptional in its photosensitivity, being able to pick out details at a distance, even in low levels of light, in some senses the eye cannot be ‘perfect’: it is necessarily imperfect in that it does not always work, nor does it take in all the information that surrounds it.

In some senses, to say that the eye is not perfect is obvious: Amos Vogel has pointed out how the eye is sensitive to only about 5 per cent of the light spectrum (2005: 12–19), while the fact that humans cannot see 360 degrees at once would also seem to contradict Darwin. Furthermore, we know that humans regularly fail to spot things in their field of vision (see, for example, work on ‘change blindness’; Simons and Levin 1997) and as a result miss both prey and predators. We might contend in such instances that the eye is working perfectly and that it is the rest of the body and/or brain that is the major contributor to the demise of the human, thereby necessitating a discussion regarding where the eye begins and ends in relation to the brain and the body. However, I wish to bypass such discussions and assert that the eye is not perfect and that it can always get better, not just at taking in information but at taking in a greater range of information. Indeed, the very imperfection of the eye might in some respects explain mankind’s development of optical tools, from glasses to microscopes to telescopes, and so on – tools that supplement the eye both on the level of the individual (contact lenses help me to see ‘better’) and on the level of society (knowing about the micro- and macroscopic worlds potentially benefits all of humanity).


To see or not to see

The human eye works by receiving light reflected from the environment, which, upon striking the rods and cones of the retina, is then turned into brain signals that create vision. The blind demonstrate clearly that vision is not necessary for humans to live respectable lifespans, and numerous blind people from myth and history have been upheld for their notable achievements, including Tiresias from Homer’s Odyssey, John Milton and Jorge Luis Borges. Furthermore, while most humans are not blind (even if their vision is imperfect, including in the sense of not having ‘perfect’, i.e. 20/20, vision), vision itself is predicated on gaps, or moments of blindness, which themselves often seem invisible to humans in that we tend to overlook them. These moments range from the very short to the relatively long, and they include blinking, sleeping and saccades. Each has a different function, but each also involves moments when vision shuts down. My suggestion here is that these moments of blindness are not flaws in our visual system, but structural necessities of it, as we shall see next.

The reasons why we blink are at least twofold. First, we blink in order to protect our eyes from oncoming objects. Secondly, we blink in order to spread moisture from our tear ducts across the surface of the eye. Blinking thus helps to prevent our eyes from going dry, as excessive dryness might cause blindness. There is an important irony here. As we are told in childhood not to stare at the sun, especially when using binoculars or other telescopic tools, the very phenomenon that enables us to see, namely light from the sun, possesses the power to destroy our vision. The eye needs to be kept moist, and in some senses it needs to be kept in the shade (as implied by the use of sunglasses above and beyond their function as a fashion item).

Nonetheless, humans blink more often than is necessary for reasons of ocular lubrication alone, and so a third reason has been proposed for why we do it. Using functional magnetic resonance imaging (fMRI) to measure the brain activity of subjects watching clips from Mr Bean (various directors, UK, 1990–5), Tamami Nakano and colleagues observed that the parts of the brain associated with attention are deactivated when we blink. Nakano et al. (2013) therefore suggest that blinking provides humans with a moment’s respite from the ongoing stream of visual information that we receive, and that this respite allows the brain to process what it has seen in a non-attentive fashion. This means that blinking potentially serves a similar function to sleep, since a major reason why humans sleep is to allow the brain to consolidate information gathered over the course of a given day (Hobson 1995).

If we need to blink and sleep in order to build memories, then what is the purpose of saccades? Saccades are movements carried out by the eye in order to search the visual field; saccades punctuate and are punctuated by fixations, or moments when the eye remains momentarily still. Fixations here
are not just spatial; indeed, we can take in information about what is in our peripheral vision when we are not directly looking at something (so-called covert attention). Rather, fixations are the primary temporal mechanism by which the eye takes in visual information (even if the fixations are only very brief – as when we quickly get the ‘gist’ of a scene based on only a rapid glimpse; see Rayner 1998). In other words, we may see more in our field of vision than that upon which our eye fixates, but we nonetheless only see when our eye fixates, and not when it saccades.

While we can choose to make them, saccades are primarily involuntary. What happens is that the eye performs small, ‘anarchic’ (i.e. random) movements that allow it to attend to slightly different parts of the visual field, which in turn allow humans to create a more accurate and detailed image of their surroundings (keeping an eye out for predators, prey and mates). It is the ‘anarchy’ of the saccade that allows for precision: ‘Visual searching is free-running (“anarchic”) because commanded, ordered deployment of attention is so much slower than anarchic deployment that it is faster overall to make many anarchic attentional deployments than fewer orderly ones’ (Wolfe, Alvarez and Horowitz 2000: 691).

While anarchic, though, the important thing to note for this chapter is that we do not consciously see while we saccade. As Benjamin W. Tatler and Tom Trościanko explain, we can easily verify this by looking at our own eyes in a mirror: ‘Looking from one eye to the next it is not possible to see the eyes moving – they appear always to be stationary’ (2002: 1403). ‘Saccadic suppression’ likely takes place because our visual field would jumble about and we would lose orientation if we saw during eye movement. Fortunately for us, though, our field of vision remains pretty consistent. Akin to sleeping and blinking, then, saccades provide a necessary moment of blindness that allows us to see what is before us during fixations. ‘Blindness’, in this respect, is not a deficiency or a weakness, but a structuring principle of vision itself. Thus, vision is not an issue of seeing or not seeing, but a necessary combination of the two. Without wishing to sound too ‘Zen’: in order to see, you must not see. In order to see, you must look away.
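
Eye trackers themselves draw this fixation/saccade distinction in their raw data: samples in which the eye moves faster than some angular-velocity threshold are labelled saccades, the rest fixations. Below is a minimal sketch of such a velocity-threshold classifier (Python); the 30°/s default and the data layout are illustrative assumptions rather than any vendor’s specification:

```python
import numpy as np

def classify_saccades(x_deg, y_deg, sample_rate_hz, threshold_deg_s=30.0):
    """Label each gaze sample as saccade (True) or fixation (False).

    x_deg, y_deg: gaze position in degrees of visual angle, sampled at
    sample_rate_hz. The ~30 deg/s threshold is a common but illustrative
    default, not a universal standard.
    """
    dt = 1.0 / sample_rate_hz
    # Point-to-point angular velocity between consecutive samples.
    velocity = np.hypot(np.diff(x_deg), np.diff(y_deg)) / dt
    # The first sample has no predecessor; treat it as a fixation sample.
    return np.concatenate([[False], velocity > threshold_deg_s])
```

On such a labelling, the mean of the returned array estimates the share of viewing time spent in the saccadic ‘blindness’ discussed above.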

Persistence of vision

With this (unconventional) understanding of vision in place, we shall soon turn our attention to cinema, including the role that eye tracking plays in our understanding of film. In order to get there, though, an important question must be answered: namely, if we are not conscious of seeing (if perhaps we do not see) while we saccade, then how is it that vision is not fragmented and discontinuous, but continuous across saccades (and even some blinks)? Many scientists have long worked on this issue, but a good way to answer this question can be found through a theory that, oddly
but appropriately enough, has held mistaken sway in film studies for a long time: persistence of vision. Joseph Anderson and Barbara Fisher point out how persistence of vision has often been attributed (erroneously) to the physician Peter Mark Roget, also the inventor of the thesaurus (Anderson and Fisher 1978). Roget noticed that a passing cart, when seen through the vertical slits of a window blind, appeared to be jumping from one static position to another as opposed to moving continuously between each slit, and that the spokes of the cart’s wheels, instead of looking straight, in fact looked curved. While the curvature effect is not illusory (the movement of a rotating line seen through a slit will indeed trace a curve), Roget nonetheless struggled to find a convincing reason for the static appearance of the cart and its wheels (Roget 1824).

Although he does not use the term, Roget suspected that the effect had something to do with afterimages, or the belief that an image from the outside world remains fixed on the eye for a brief period. For Roget, the occlusion of the cart by the slits in the blind, followed by the cart seeming static when visible beyond the slits, confirmed the existence of afterimages, as the experience apparently demonstrates how vision is really a succession of still images (that linger on the eye), and not a perception of movement itself.

Afterimages help to create persistence of vision. However, afterimages do not explain persistence of vision. If they did, then what Roget and other sighted humans would see in this case would be ‘a plethora of images resulting from the tracings scattered about the retina according to each separate fixation of the eye’ (Anderson and Fisher 1978: 6). That is, the cart would bounce around our field of vision as we see static images upon each eye fixation, with each static image then lingering on the eye for the period of a saccade before being replaced by another image, the cart having jumped to a different point within our field of vision as a result of the movement of our eye (and the movement of the cart).

The important point to make here is that persistence of vision does not take place on the retina, that is, via afterimages lingering solely in the eye – a belief that is the kernel of Roget’s error. Rather, as Anderson and Fisher explain, the continuity that we see takes place not on the retina but in the brain as a result of the ‘direct path from the receptors in the eye to the brain’, with the processing of what we see ‘deferred until the signal reaches the brain’ (1978: 7). As Anderson and Fisher (now Anderson and Anderson) describe in a subsequent paper, the continuity of vision has since about 1900 been treated ‘almost without exception as principally a central phenomenon (i.e. occurring in the brain or the central nervous system)’, as opposed to uniquely in the eye (1993: 6).

While we now know that the continuity, or persistence, of vision takes place in the brain, this does not mean that we know how (or even necessarily why) the brain does this. Nonetheless, I wish to suggest that the creation of perceptual continuity by the brain is a probabilistic process that, by virtue
of being correct most of the time (but definitely not all of the time, as we regularly bump into things, and so on), makes for a more efficient system than actually seeing the world and its movement continuously – which might well prove exhausting (we would never be able to blink or sleep; the eye would dry up and go blind; the human might well go insane). In other words, it is not necessary for us literally to see everything; we only see as much as we need, with our brains filling in the gaps, such that we can survive – at least for a long enough period for us to be able to procreate. More than this, it is at times necessary for us not to see everything (we saccade, blink and sleep), with total vision perhaps even being detrimental to our chances of survival. Countering Darwin, then, our vision is not perfect; it is deliberately, ‘perfectly’ imperfect.

The persistence of cinema

While I could have drawn on any number of vision scientists in order to clarify how the persistence of vision takes place in the brain as much as in the eye, it has been useful to use Roget (whose hypotheses have in fact been almost entirely ignored by psychologists of perception) because he has historically had an undue influence on our understanding of cinema. Indeed, Roget’s influence on film studies provides Anderson and Fisher/Anderson’s very motivation for debunking him. First, Roget does not use the term persistence of vision; but more profoundly, he does not really help us to understand cinema because he ‘described a case in which a series of moving points results in the perception of a static image. In cinema a series of static images results in the illusion of motion’ (Anderson and Fisher 1978: 6). The Roget experience cannot account for persistence of vision, or for how we perceive the world – and films – as continuous.

Anderson and Fisher/Anderson suggest that film studies might have benefitted greatly from turning to someone like Hugo Münsterberg as opposed to Roget, as the former’s The Photoplay: A Psychological Study (Münsterberg 1970 [1916]) would have been a far more informative (and more recent) source for understanding cinema and perception more generally. Fortunately, Münsterberg has in recent times become a touchstone in cognitive approaches to film, with Tom Gunning citing him as offering an early rejection of the idea that persistence of vision takes place in the eye (2011: 29–30). Gunning goes on to survey a wide array of literature on the phenomenon before concluding, akin to Anderson and Anderson, that the continuity of perception takes place in the brain.

Importantly, Gunning addresses how the continuous nature of films is not ‘the product of a mental (or physical) processing of still images’ – even if cinema is made up of still images/frames, and even if the eye fixates in between saccades (2011: 35). Rather, cinema (and other moving image devices – perhaps including the eye
itself) ‘do not represent motion; they produce it…they make pictures move…[even if] no one can explain it purely physiologically and the psychological explanations are still debated’ (Gunning 2011: 38–9). That is, we see neither a series of still images, even if that is what cinema at a base level is, nor continuous motion as an external phenomenon, since we only take in information during fixations and not saccades. Rather, we – and by extension cinema – produce continuous motion (in the brain). But while we might think that cinema, via its still images flickering on a screen, and the eye, via its fixations and saccades, are ‘imperfect’ in terms of capturing and representing continuous motion, Gunning would seem to suggest that these moments of blindness are necessary, and that we see not despite, but because of, these limitations. As cinema is made up of still images, we might say that it offers us an illusion of continuous motion. But as the eye saccades and yet we see continuous motion, is ‘the moving image…only an illusion if we assume our eyes are defective’ (Gunning 2011: 40)? When we watch a film, our eyes do not deceive us; rather, cinema points to the fact that to see at all necessarily involves blindness, compensated for probabilistically by the brain. In this way, psychology does not so much help us to understand cinema as cinema (and the lowly film theorist) helps us to understand psychology.

Indeed, while Gunning bases his argument on work by psychologists and physiologists who have studied perception, his argument is not dissimilar to that of Gilles Deleuze, who in Cinema 1: The Movement-Image also suggests that cinema does not involve an image of movement, but that the cinematic image is movement (2005a: 62). Gunning’s omission of Deleuze mirrors the general antipathy felt by cognitive film theorists towards a more ‘continental’ approach to film. However, if in Cinema 1 Deleuze can lead us to a similar point as that reached by Gunning, then the former’s suggestion in Cinema 2: The Time-Image that cinema might also show us, or be, time might be equally useful (2005b). Deleuze succinctly points to how cinema exists equally in its interstices and gaps, or in what we might call its ‘blind spots’ – in moments where little happens and/or in which we cannot tell what is happening – alongside those moments in which we can clearly see what is happening.

That is, just as humans tend to think that vision is continuous and that it does not involve moments of looking away (saccades, blinks), so, too, do they tend to think that cinema is defined by continuous visible action – as is made clear by eye-tracking studies of films, which themselves emphasize, and use as their points of measurement, fixations as opposed to saccades. However, cinema is equally defined by ‘invisible’ forces – as we shall see in relation to films comprised of no continuous visible action at all, being instead made (almost) entirely of still images (and sounds). To emphasize continuous action and vision over blind spots and looking away – or, in eye-tracking terminology, to emphasize only fixations – offers only a partial understanding of vision, which relies on both.

Seeing darkness

Among various other contentious points, Peter Wyeth suggests in The Matter of Vision: Affective Neurobiology and Cinema that ‘in a Hitchcock film, we take in far more information than we do in a “non-classic” film’ (2015: 97). However, I would suggest that, when watching a film, I am always taking in a screen’s worth of information – irrespective of what is actually on that screen (but perhaps not irrespective of screen size). If the screen features a blank wall, I still take in a screen’s worth of information about that wall – even if it is ‘boring’. If the screen features a battle, I still see a screen’s worth of information about that battle – even if it is ‘exciting’. While Wyeth tries to use his claim to validate (his favourite aspects of) classical Hollywood, an Alfred Hitchcock film does not necessarily contain more information than a ‘non-classic’ film. Indeed, as a result of the time that we spend carrying out saccades and the fact that the human field of vision can only take in about 3.8 per cent of the average cinema screen, humans only take in about 3.14 per cent of any given film – regardless of content (see Brown 2015; Smith 2013). That is, we ‘miss’ 96.86 per cent of any given film, be it a Hitchcock or an Andy Warhol. As discussed, this is not necessarily a failing; while we might all become better humans by being more attentive in various ways, there is nonetheless too much information to take in, and structurally we need to take breaks from taking in information in order to create memories. (We would not be able to blink or sleep if we were to see ‘everything’ – and as a result we would potentially remember nothing.)

Wyeth continues by saying that ‘the eye follows movement on the screen. That might be thought of as the first moment of Cinema. Movement is not just a thing in itself, nor merely a warning of possible danger. Movement is fundamental to life, to nature. The one thing that is constant in life is movement, that is change’ (Wyeth 2015: 143). Wyeth also suggests that ‘all is movement. Cinema is movement. Photography is stasis. Stasis is death. Movement is life’ (2015: 34). In many respects, Wyeth is correct – even if he does not address how ‘all’ cannot by definition be movement, as ‘all’ must include photography, which Wyeth labels as ‘non-movement’, and ‘all’ must also include death, which undeniably exists. However, there is a problem that arises in Wyeth’s argument, at the core of which is a belief in humanity’s detachment from reality. Let us consider this issue now, as it serves to synthesize much of what has preceded, while also functioning as a stepping stone to the still-image films mentioned previously.

When we watch (analogue) films, we in fact sit in darkness for some 40 per cent of the film as a result of the black leader that sits in between the film frames (see Doane 2002: 172). This darkness is invisible to us, as is the ‘darkness’ of the saccade, the blink and sleep. In other words, it seems to be inscribed in cinema as in life that in order to see, you must look away.
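The arithmetic behind these two figures can be made explicit. What follows is my own back-of-envelope reconstruction rather than a calculation given by Brown or Smith, and the share of viewing time assigned to saccades is simply inferred from the two published percentages: if foveal vision takes in roughly 3.8 per cent of the screen during fixations, and if saccades – during which no new information is taken in – are assumed to occupy about 17 per cent of viewing time, then

\[
0.038 \times (1 - 0.174) \approx 0.0314,
\]

that is, about 3.14 per cent of any given film is actually taken in, and the remaining 96.86 per cent is ‘missed’, whatever the film’s content.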

‘Missing’ 96.86 per cent of a film. Needing to saccade and blink in order to see. Sitting in darkness for some 40 per cent of a film’s duration. These gaps in our vision are all real, and yet we sense that we experience a continuous flow of movement on the screen. If the brain fills in the gaps that our eyes do not perceive, thereby giving us the impression of continuous experience, then the brain is in part the ‘inventor’ of that continuity (for our eyes only see in a fragmented fashion). If the brain ‘invents’ continuity, then in some respects the brain ‘invents’ time – the smooth, ongoing impression of the world, as opposed to a fragmented, stuttering world. Time is as much perceived in the brain, therefore, as it is an ‘external’ phenomenon. In this way, films that seek to show us time and not simply movement must also show us the processes that take place in order to produce time – hence Deleuze’s shift from the movement-image of action cinema to the time-image with its self-consciousness, its contemplative pacing and its breakdown of the distinction between fantasy and reality: it shows us how time is a mental as much as a real-world construct. Furthermore, if the continuity of time is ‘created’ in the brain, then in some senses the cinema does exactly the same as the brain – synthesizing fragments into a smooth continuity. (The darkness between frames is cinema’s blinks and saccades; its frames are its fixations.) In this way, Deleuze also suggests that ‘the brain is the screen’ (2000). More important for the present argument, though, is that if for Wyeth movement/change is life, then he overlooks not only how our perception of movement is predicated upon moments of stasis (fixations), which Wyeth equates to death, but also that we produce this continuity of life, much like cinema does. To see change without stasis would be to detach two things that are fundamentally conjoined – to detach humans from reality, as if reality were something ‘out there’ from which we could be detached, and not something with which we are entangled and thus in part produce. (We become with the world as the world becomes with us.) Stasis is not the antithesis of movement; it is fundamental to it. Or, in Wyeth’s terms, death is not the antithesis of life; it is also fundamental to it. Thus, cinema again teaches us something about reality. Wyeth’s insistence that cinema is only movement does not serve to help us understand cinema, but rather to declare cinema as only consisting of certain things (in Wyeth’s case, the features of classical Hollywood), to the exclusion of others. And yet, cinema, like (or perhaps as) vision, includes not just movement, but also stasis. Cinema includes stasis not only at its core, in that all (analogue) films at their root are made up of still images, but also in specific instances of films made up of photographs like Año uña, La Jetée and Je vous salue, Sarajevo. For Wyeth, such ‘still’ films cannot be films because of their stasis (an idea also explored by Liv Hausken 2011). And yet, here they are, existing and widely viewed. Not only do these films exist, but they also bring out a potential political/ideological reason for Wyeth to exclude such films
from cinema, and which by extension we can use to critique eye tracking’s emphasis only on fixations, while not really accounting for saccades.

Still films?

Año uña tells the story of Molly (Eireann Harper), an American college student who stays with the family of Diego (Diego Cataño), a teenager who takes a shine to her, in Mexico City. Molly and Diego form a close bond, which Diego then tries to convert into a relationship by visiting Molly in New York. While Molly is partially attracted to Diego, their age difference ultimately puts her off, and the two never get together. The film therefore depicts American–Mexican relationships, with the Mexican wanting to break out of the perceived stasis of his homeland and to accede to the United States, as reflected by his attempted conquest of Molly. The fact that this does not happen suggests some pessimism in the film, although Año uña also ends with Diego overcoming his adolescent desire to be or to become American, and instead being/becoming Mexican.

The film’s story is framed by Diego having an ingrown toenail, which is brought on while playing football. The role of the nail might seem frivolous or coincidental – but in fact it is anything but. The film is titled Año uña and it is about nails for at least two reasons. The first is that the nail is a feature of the human body that contains dead skin, while at the same time being a part of our living body. In other words, the nail suggests that humans carry death with them everywhere, and that (pace Wyeth) ‘all’ is not life, but life and death at the same time. The second is that the title twice features the letter ñ, which is unique to the Spanish language, and which therefore functions as a signifier of Mexico’s difference from the United States. Indeed, Molly tries to learn (with eventual success) a tongue twister that consists almost entirely of words containing ñ; to succeed in pronouncing it quickly would be a sign of her proficiency in Spanish. In other words, the title marries death-in-life with Mexicanness, a marriage that is fitting in a country that is famous for its Day of the Dead festival. Finally, that the film is made up of still as opposed to moving images suggests that in its form, Año uña is also keen to convey a sense of ‘death-stasis in life-movement’.

However, as much as the film reflects upon the nail as death-in-life, and on cinema as a kind of stasis in movement, it also speaks of the ‘life’ of the United States (as the ‘home’ of cinema) in comparison to the ‘death’ of Mexico (a country that does not dominate the global market despite the recent success of Alfonso Cuarón, Guillermo del Toro and Alejandro González Iñárritu). Indeed, the film’s form (still images), provenance (Mexico) and themes (nails, Mexican–American relations) would all suggest an assertion of Mexico as real and valid, even if many Mexicans in the United States live outside the law, and even if the United States seeks to build a wall along its border with Mexico. That is, the film implies that the US perspective of Mexicans as illegals and/or sub-American (to be excluded) overlooks how these ‘subhumans’ (Mexicans are ‘dead’ to Americans) are in fact crucial to the United States’ existence, just as a dead nail is crucial to the life of a human body, and just as cinema is not just movement, but movement and stasis.

Meanwhile, La Jetée tells the story of a man (Davos Hanich) who is chosen to take part in time travel experiments after a nuclear holocaust has sent surviving humans underground in a bid to find either a place on the face of the earth for humans to continue, or a time. Haunted by a memory from his childhood, the man travels back to that past, where he develops a relationship with a woman (Hélène Chatelain). Sent to the future, the man is given the means by which to restore his own civilization, which prompts those who are managing the time travel experiments in the present to plot to kill him, as such a restoration of civilization would threaten their power. The people from the future offer to let the man remain among them, but instead he asks to be sent back to the past. Arriving on an airport jetty, he seeks the woman, only to be shot by an agent from the present who has followed him there. The man realizes that he has been killed in front of his childhood self.

It is fitting that Marker’s film is about time travel: if cinema is not just made up of movement but also of time, and if time is invisible, then Marker uses the photographs in his film to suggest a dimension beyond movement, the invisible dimension that is time itself. (Time is invisible because we cannot see during saccades, just as cinema is made up of still images/frames.) In effect, Marker, like Cuarón, uses photography in film to demonstrate that cinema is not ‘all movement’, but that it is also necessarily time, because as the movement that we perceive is entangled with the movement of the eye (saccades, blinks, sleep), so too is that movement entangled with time. In Deleuzian terms, to present an image of time is to present an image of the stasis or death that accompanies movement. The film uses science fiction to suggest on a political level that the static, the invisible, the discarded and the thrown away (literally, la jetée) are just as important as the visible – they are crucial to its existence. Indeed, La Jetée suggests that death – akin to the dead and alive nails that we all generate – is an inescapable part of life, as the man cannot escape his inexorable fate. It is also fitting, therefore, that in the film’s only moving image, we see the woman’s eyes blinking: eyes not only see, but must also look away. To be seen is as much to be generated as a memory in the brain of the beholder during a blink as it is more literally to be seen during a fixation.

Finally, I shall end with a brief consideration of Je vous salue, Sarajevo (France 1993), a film in which Jean-Luc Godard picks out details from Ron Haviv’s photograph of a soldier from the Serbian Volunteer Guards (also known as Arkan’s Tigers after paramilitary commander Željko ‘Arkan’ Ražnatović) kicking prostrate Bosnian Muslims as fellow Tigers walk past, seemingly oblivious to the brutality before them. In a voiceover, Godard laconically contemplates fear and the role that it plays in what he calls culture, to which he contrasts art; culture is about the rule, while art is about the exception, or what I shall term difference. After nearly two minutes of various details, Godard reveals the picture in its entirety, before cutting to a second still image in black and white of an unidentified woman crouching over what appears to be something like an image viewer, her head hidden from view by her hair. Godard concludes by saying that he has seen so many people live so badly, while also having seen so many people die well.

Godard suggests a similar combination of death-in-life to that found in La Jetée and Año uña. It is not that Godard condones the Tigers’ brutality. On the contrary, to exclude from reality or to deprive of life through murder is what Godard would call the rule, or culture; it manifests most brutally in genocide, and is defined as what it means to lead a bad life. To lead a good life involves not the murder of the other in order to exteriorize death, or to render it an external object; it involves realizing that all who live carry death, darkness, stasis and the invisible within them. In using photographs for his film, Godard suggests that stasis accompanies movement, that the visible requires the invisible, and that to believe that the ‘all’ excludes stasis, the invisible and death is to head down a road to perdition.

Thus, to conclude, we do not see despite saccades, blinks and sleep. We see because of these things. If we are to understand vision and cinema, we must begin to account for the invisible moments that these necessary actions entail, rather than emphasize only fixations, as is the case in most eye-tracking studies, including eye-tracking studies of film. In an era of electric light and digital images and screens, in which the darkness between the analogue frames no longer exists, and in an era of constant surveillance in which all is supposedly brought to light, like the eye that is kept constantly open, we cannot use darkness in order to remember. We will go blind not just from never giving our eyes a rest from the light that our technologies create and transmit, but also from forgetting that not seeing is a necessary component of our experience of time – a forgetting that will cause us to live perpetually in a controlled, static world that, in seeking to exteriorize/remove death, will in fact hasten it.

Bibliography

Anderson, J. and B. Anderson (1993), ‘The Myth of the Persistence of Vision Revisited’, Journal of Film and Video, 45 (1): 3–12.
Anderson, J. and B. Fisher (1978), ‘The Myth of Persistence of Vision’, Journal of the University Film Association, 30 (4): 3–8.
Brown, W. (2015), ‘Politicizing Eye-Tracking Studies of Film’, Refractory: A Journal of Entertainment Media, 25. Available online: http://refractory.unimelb.edu.au/2015/02/07/brown/ (accessed 5 December 2016).
Darwin, C. (1998), The Origin of Species, Ware: Wordsworth.
Deleuze, G. (2000), ‘The Brain is the Screen: An Interview with Gilles Deleuze’, translated by Marie Therese Guirgis, in Gregory Flaxman (ed.), The Brain is the Screen: Deleuze and the Philosophy of Cinema, 365–74, Minneapolis and London: University of Minnesota Press.
Deleuze, G. (2005a), Cinema 1: The Movement-Image, translated by Hugh Tomlinson and Barbara Habberjam, London: Continuum.
Deleuze, G. (2005b), Cinema 2: The Time-Image, translated by Hugh Tomlinson and Robert Galeta, London: Continuum.
Doane, M. A. (2002), The Emergence of Cinematic Time: Modernity, Contingency, the Archive, Cambridge, MA: Harvard University Press.
Gunning, T. (2011), ‘The Play between Still and Moving Images: Nineteenth-Century “Philosophical Toys” and their Discourse’, in Eivind Røssaak (ed.), Between Stillness and Motion: Film, Photography, Algorithms, 27–43, Amsterdam: Amsterdam University Press.
Hobson, J. A. (1995), Sleep, New York: Scientific American Library.
Land, M. F. and D.-E. Nilsson (2002), Animal Eyes, Oxford: Oxford University Press.
Münsterberg, H. (1970), The Photoplay: A Psychological Study, Mineola, NY: Dover Publications.
Nakano, T., M. Kato, Y. Morito, S. Itoi and S. Kitazawa (2013), ‘Blink-Related Momentary Activation of the Default Mode Network While Viewing Videos’, Proceedings of the National Academy of Sciences of the United States of America, 110 (2): 702–706.
Parker, A. (2003), In the Blink of an Eye: How Vision Kick-Started the Big Bang of Evolution, London: The Free Press.
Rayner, K. (1998), ‘Eye Movements in Reading and Information Processing: 20 Years of Research’, Psychological Bulletin, 124 (3): 372–422.
Roget, P. M. (1824), ‘Explanation of an Optical Deception in the Appearance of the Spokes of a Wheel Seen Through Vertical Apertures’, Philosophical Transactions of the Royal Society of London for the Year MDCCCXXV, 1: 131–40.
Simons, D. J. and D. T. Levin (1997), ‘Change Blindness’, Trends in Cognitive Sciences, 1 (7): 261–7.
Smith, T. J. (2013), ‘Watching You Watch Movies: Using Eye Tracking to Inform Cognitive Film Theory’, in Arthur P. Shimamura (ed.), Psychocinematics: Exploring Cognition at the Movies, 165–91, New York: Oxford University Press.
Tatler, B. W. and T. Trościanko (2002), ‘A Rare Glimpse of the Eye in Motion’, Perception, 31: 1403–6.
Vogel, A. (2005), Film as a Subversive Art, London: CT Editions.
Wolfe, J. M., G. A. Alvarez and T. S. Horowitz (2000), ‘Attention is Fast but Volition is Slow: A Random Scan is a Quicker Way to Find Items in a Display than a Systematic Search’, Nature, 406 (6797): 691.
Wyeth, P. (2015), The Matter of Vision: Affective Neurobiology & Cinema, New Barnet: John Libbey.

2

Invisible Rhythms: Tracking Aesthetic Perception in Film and the Visual Arts

Paul Atkinson

Eye tracking has proved particularly valuable in highlighting the micromovements that underpin visual perception and in doing so can provide insight into the fine-grained structures of aesthetic perception in both film and the visual arts. In mapping the movements of the eye, it also provides a platform for understanding how attention varies over definite durations. This is particularly important for the study of aesthetic attention, which not only gauges what is observed, but also how the duration of viewing affects the viewer’s relationship with the visual object. It is not only about seeing something – recognizing qualia and objects that occupy the visual field – but also becoming aware of how the appearance of an object changes over time, an awareness that informs the viewer’s further investigation of the image. The results from eye-tracking studies do not readily reveal this process of becoming aware of visual difference because individual fixations could also be attributed to non-aesthetic forms of viewing, such as recognizing objects and responding to movement. In order to demonstrate the usefulness of eye tracking in the study of aesthetic perception, there has to be some form of evaluation of the relationship between the short duration of eye movements and the longer duration of aesthetic awareness. This relationship can be better understood through a comparison of aesthetic attention in film and the visual arts. Although both film and the visual arts place an emphasis on the visual object, there is a structural difference in how the time of viewing
is managed. In film, viewing time is moderated by a range of devices from the simple movement of bodies within the frame, to the time of the shot and the broader rhythms of editing. By changing the visual field over definite durations that are not controlled by the viewer, these devices also delimit eye movement. The shorter the duration, the more likely it is that the eye reflexively responds to change rather than according with the will of the spectator or signalling conscious deliberation. This can be contrasted with aesthetic attention in the visual arts – in particular sculpture, photography and painting – in which there is no change in the material form of the image and the duration of viewing is not scripted. This lack of a temporal constraint means that there is more time for the eye to attend to the momentary variations in appearance but also to diverge from the demands of utilitarian perception. This chapter proposes that eye tracking could provide some valuable information on the structure of aesthetic attention provided that the results are analysed in terms of clearly specified durations.

Aesthetics and aesthetic perception

It is important to first provide a framework for understanding aesthetics because the term is used quite broadly in both popular and academic milieux to describe a variety of things, including the formal attributes of works, types of social value, judgements of taste and forms of attention. In the field of aesthetics, the differences are manifold and many of the theories are incompatible. In the early German tradition, which continues to play a pivotal role in philosophy, the emphasis was on forms of judgement and perception. Alexander Baumgarten stated that aesthetics concerns a particular type of knowledge that leads to the appreciation of the beautiful (Ducasse 1966: 12–13), whereas for Immanuel Kant (1987), it describes a type of theoretical and contemplative disinterestedness. In more recent works, there has been a greater emphasis on the structural properties of texts as well as their social contexts. Noël Carroll (2010: 159) argues that aesthetics should be primarily concerned with the content of works and how they are placed within a particular medium or genre. However, Jacques Rancière (2013: 13), another prominent recent theorist, refers to a much more expansive notion of aesthetics as the ‘distribution of the sensible’, in which the perception of artworks is fully informed by social and cultural discourses. Many of these arguments operate at a level of complexity that is not readily assessed by tracking the gaze or the movement of the eye, because there are too many layers of possible determination between the aesthetic theory and the perceptual event. In order to narrow the frame of reference, this chapter limits its address to the issue of aesthetic perception, which, in its contrast with normal perception, opens up a field of inquiry between aesthetics and eye tracking. Of course, there are many
debates as to what constitutes aesthetic perception, but in order to further restrict the frame of reference, there will only be a discussion of one of the main strands in which aesthetic perception is distinguished by an attention to the sensuousness of appearances. When separated from the analysis of a specific work, aesthetic perception describes an attentiveness to sensual and perceptual data that is not reducible to general categories, structures of representation or narratives. One of the most succinct definitions comes from Martin Seel, who argues in his Aesthetics of Appearing that aesthetic perception is ‘to apprehend something in the process of its appearing for the sake of its appearing’ (2005: 15). It is about the sensuousness of an object rather than its placement within an epistemology, including an epistemology of perception. This is why, since Kant, there has been an emphasis on the disinterested gaze or, more appropriately, non-utilitarian perception. It is not about simply seeing objects (a woman smiling, Christ sitting at a table with the Apostles, etc.) because this does not distinguish aesthetic perception from normal perception. Instead, there has to be an attentiveness to sensual differences in the perceptual field or work of art, over and above the capacity to recognize objects. Seel (2005: 22) argues that this does not mean that aesthetic attention devolves into mere sensuousness, where all that viewers see is the play of light, colour and form. As in normal perception, the sensual properties can be incorporated into patterns and structures of perceiving; for example, there remains the capacity to see aesthetically when hue and texture are emplaced within a particular visible surface or structure. What distinguishes aesthetic perception is that appearances ‘stand out more or less radically from their conceptually determinable exterior image’ (Seel 2005: 22). There is an attentiveness to the transitory aspects of an object; its ‘repleteness’ and ‘phenomenal individuality’, rather than its conceptual form (Seel 2005: 27). Of course, this presents a challenge for analytical and empirical approaches, including eye tracking, because attention to the repleteness of appearing resists general argument or abstraction. Two people could regard the same painting with particular attention to the way that sensual forms appear, and as such deploy a form of aesthetic perception, but the way that each gaze navigates the work and the visual variations that each viewer notes could be radically different. There is commonality in the general attentiveness of the two viewers to sensual features, but not necessarily a commonality in which features are attended to. Indeed, the very fact that the work can invoke a variety of responses – there is variability in its mode of appearing – attests to its status as an aesthetic object. Because aesthetic perception involves a response to the particularity of appearing, aesthetic perception cannot follow strict perceptual rules and it is just as important to look for variability as it is to establish commonality. In aesthetic analysis, there has to be a balance between the haecceity of things and abstract principles and common structures of representation.

One solution can be found in the notion of concrete universals, which Philip Rawson (2005: 39) argues guide the viewing process by linking particulars to more general forms. Universals are different from abstractions because they are not derived from a formal logic or theory of structuration but nevertheless provide a common framework for experience. Art can be read through quite commonplace universals, such as clearly recognizable shapes, but art often utilizes much more complex universals that require greater effort on the part of the viewer to understand them (Rawson 2005: 39). Eye tracking, insofar as it reveals an underlying structure of attention, could reveal patterns that are universally applicable to aesthetic experience and which delimit rather than determine the play of appearances. Alfred Yarbus (1967: 191) in the seminal Eye Movements and Vision revealed principles that could function as concrete universals. He discovered that the eye does not follow the outlines of visual objects, even though this is often highlighted in art criticism, and in the observation of figures, more attention is given to eyes, ears, mouths or other distinguishing traits. Furthermore, the face is more important as a point of fixation than the body, and figures receive more attention than other objects in a given space (Yarbus 1967: 191). These key organizational principles are certainly important in understanding perception in the visual arts and could be linked to gestalt theories of perception. Rudolf Arnheim (1954) wrote extensively on general perceptual principles in art but, notably, he did not confuse the perception of aesthetic objects with aesthetic perception. These concrete universals might constitute a ground upon which aesthetic perception supervenes but the two should not be confused. A general attentiveness to human and animal figures might be utilitarian rather than aesthetic if it is considered to feature in all types of perception, and as such its relevance to understanding the particularity of appearing is limited. To refer back to Martin Seel (2005: 22), it could describe a ‘conceptually determinable exterior image’ rather than the play of appearances. When it comes to analysing the particularity of aesthetic attention rather than the general principles of perception, eye-tracking studies are usually less conclusive. In an attempt to understand the specificity of aesthetic perception in a gallery context, Susan Heidenreich and Kathleen A. Turano (2011) measured viewer responses using portable eye-trackers in the Baltimore Museum of Art. The small sample of four viewers, equally split between experienced and naïve, were required to view, in situ, representational and abstract paintings across a number of genres (55). The study aimed to track the particular processes through which the viewers inspected the work while allowing for body and head movements. To reduce the level of prejudgment, in the beginning of the experiment, each viewer was led to a fixed position with his or her eyes down and was only informed of the title just before looking up. The participants were given an ‘unlimited period’ in which to view each image and the order of viewing was randomized (2011: 57).

Heidenreich and Turano (2011) did not find any significant differences between viewing abstract and representational artworks. The ‘mean fixation times’ were all under 0.4 seconds, and there were no common patterns of eye movement, irrespective of the narrative or representational content of each work (2011: 62–3). There was one positive result and that was that ‘naïve’ viewers were more likely to use shorter saccades in their navigation of the paintings than experienced viewers (2011: 65). The researchers did not provide an explanation of why this would be the case. It could mean that the experienced viewers are more likely to take into consideration the breadth of the work – the framing, overall composition, and so on – when analysing individual features, but this is only speculation. What is most important here is that the study does not posit a significant difference in how viewers attend to representational and abstract works, a difference that is central to the philosophy of art. For example, Jean-François Lyotard (2012: 52) argues that modernism, in its push towards abstraction, asks the viewer to attend to the plasticity, materiality and opacity of painting. By contrast, earlier representational works place greater emphasis on the image’s readability and the recognition of familiar objects that can be more readily placed within broader cultural narratives. Lyotard is just one of many theorists who have argued that abstract and representational works require quite a different form of aesthetic engagement. Abstract works require greater attention to the surface properties of a painting – such as the mixing of paint, the marks left by the brush, and so on – because these features are not delimited by representation. If eye tracking does not reveal significant differences in attention in such distinct categories, it will be hard pressed to reveal subtle variations of attention in aesthetic perception. In a much more comprehensive study of eye tracking and the visual arts, Massaro et al. (2012: 6) presented the participants with two main genres of painting: nature images and human figure images. Within each of these categories, there were further distinctions between dynamic and static, as well as colour and black and white images. The aim was to understand the role that content plays in the movement of the eye, in particular, how dynamic content (i.e. bodies in motion) would alter the scanpaths and the number of fixations in the perception of art. Interestingly, Massaro et al. (2012: 11) found that in the perception of nature images, there were a greater number of fixations than in images containing human figures, regardless of their degree of dynamism – that is, the degree of implied movement. Furthermore, the images containing human figures had a smaller cluster size but also longer periods of fixation. The authors attribute this to the salience of the figures, which attract attention due to the deployment of familiar top-down perceptual patterns. In the absence of such figures in the nature images, it is ‘low-level’ features that are more likely to draw attention (Massaro et al. 2012: 11). Massaro et al. (2012: 15) conclude that there may be significant differences in the operation of bottom-up and top-down factors when
viewing art images: ‘Content-related top-down processes’ attract attention in works containing human figures but in nature images, ‘bottom-up processes, mediated by elements such as color, complexity and visual dynamism, appear to preferentially affect gazing behavior’ (Massaro et al. 2012: 15). This particular characterization of the difference between bottom-up and top-down processes has limited value in evaluating aesthetic perception. Despite the deployment of a much more complex set of parameters and the use of more accurate technology than Yarbus (1967), the authors have returned to basically the same point – the body and face have a high degree of salience in the investigation of complex images. They serve as concrete universals that unify perceptual detail within clearly understandable wholes – a feature of non-aesthetic perception as well – and are more directly linked to the epistemology rather than the aesthetics of vision. It is a matter of reading the image rather than attending to variations in appearance. The attention to visual detail in the nature images is of some importance, but it is not clear if the movement of the eye is incorporated into a general search for salience – an epistemological effect – or the result of greater attention to the haecceity of appearance. It is difficult to adjudicate on such distinctions unless there is some consideration of the relationship between how the viewer consciously attends to a work and the types of top-down and bottom-up processes that eye tracking reveals. Conscious attention is a top-down process that is quite distinct from the recognition of visual gestalts, and something that can only be understood over longer durations of viewing.
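Since saccade length recurs in these studies as a diagnostic measure, it is worth noting how it is usually quantified. The sketch below is my own illustration, not a procedure drawn from Heidenreich and Turano or Massaro et al.; the fixation coordinates, screen geometry and viewing distance are illustrative assumptions. Amplitude is expressed as the visual angle swept between successive fixations, which makes values comparable across different screens and viewing distances.

```python
# Saccadic amplitude from a fixation sequence: a hypothetical sketch.
# Fixations are (x, y) screen positions in pixels; the geometry values
# are illustrative assumptions, not parameters from the studies above.

import math

def saccade_amplitudes(fixations, screen_width_px=1920,
                       screen_width_cm=52.0, viewing_distance_cm=60.0):
    """Visual angle (degrees) swept by each saccade between fixations."""
    cm_per_px = screen_width_cm / screen_width_px
    amplitudes = []
    for (x1, y1), (x2, y2) in zip(fixations, fixations[1:]):
        dist_cm = math.hypot(x2 - x1, y2 - y1) * cm_per_px
        # Standard visual-angle formula: 2 * atan(size / (2 * distance)).
        amplitudes.append(math.degrees(2 * math.atan2(dist_cm / 2, viewing_distance_cm)))
    return amplitudes

# A hypothetical scanpath: two short exploratory hops followed by one
# long sweep across the canvas -- the kind of contrast reported between
# naive viewers (shorter saccades) and experienced viewers.
scanpath = [(400, 300), (430, 310), (460, 330), (1500, 700)]
print(['%.2f deg' % a for a in saccade_amplitudes(scanpath)])
```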

Aesthetic perception and the time of viewing

One difficulty in correlating eye tracking and aesthetic attention is that the method does not give sufficient consideration to the process of becoming aware in perception. This is understandable because eye tracking provides an empirical foundation for viewing behaviour and researchers do not necessarily want to undermine this by making reference to what viewers think they are looking at. However, awareness is crucial to aesthetic perception because it is not only a matter of an unconscious optical response to an image – in the unwilled and extremely rapid movements of the saccades – but also an awareness of what it means to perceive. Seel (2005: 16) argues that aesthetic perception also comprises apperception in addition to the interest in the ‘indeterminable’ aspects of appearances: ‘This concentration on the momentary appearing of things is always at the same time an attentiveness to the situation of perception of their appearing – and thus reflection on the immediate presence in which this perception is executed’ (Seel 2005: 16). The viewer is aware of a variation in the appearance of an object – it is not merely given – and this forms the basis of aesthetic interest. This aspect of aesthetic perception can only reveal itself over time because there
has to be some tracking of the variability in how things appear (Seel 2005: 28). Lyotard (1997) characterizes this as a form of intrigue in which the utilitarian function of perception is momentarily suspended:

What is beautiful catches the eye, stops the permanent sweeping of the field of vision by the gaze (which is what happens in ordinary sight), visual thought pauses, and this point of suspension is the mark of aesthetic pleasure. It is what is called contemplation. You wait, you linger, you wonder why, how it is that it is pleasing. (Lyotard 1997: 36)

The image might provoke an immediate response – although there is no requirement that it must be immediate – but it is only through continued attention that it becomes aesthetic. Andrew Benjamin, in a broader discussion of painting, argues that in aesthetic perception there is a ‘refusal of immediacy’, insofar as the viewer is distanced from the work as part of a broader process of engagement (2004: 28). But this is not a complete refusal, as the immediate is still present within the perception. Rather, it is a refusal that aesthetic engagement is reducible to a single appearance – the one that is first given or immediate. In these approaches, there is an intermingling of a notion of sensuous immediacy – seeing in a particular here and now – with the recognition that the appearance is a momentary state of the thing. Aesthetic perception emerges out of normal perception over a definite duration to describe a type of awareness in the act of seeing. One might say that aesthetic perception is grounded in the present of appearances, but it is a present that is always integrated with both the past and future. The memory of past appearances – the variability of perception – provides the basis for future perceptions including what the viewer consciously chooses to attend to as well as a general state of awareness. The duration of this awareness could be quite short, such as the time it takes to acknowledge the frame or the flickering of colour within a single shot, or much longer, such as the overall time it takes to peruse a painting. But in either case, there has to be time to acknowledge sensual differences. Eye tracking certainly has a direct relationship to the time of perception because it tracks movement, but does it only reveal general cognitive patterns rather than attentiveness to the play of appearances? In the analysis of aesthetic perception, there must be some acknowledgement of the emergence of top-down processes that are underpinned by awareness and they have to be distinguished from bottom-up processes and predetermined cognitive patterns, in which the eye is provoked to respond. At what point does a saccade indicate an exploratory movement rather than support a fixation or confirm a pattern of viewing? It is possible that these differences could be revealed through an analysis of the overall scanpath, especially over longer durations, because there is a greater opportunity to analyse variation in perceptual interest and, most importantly, intentionality in perception. It is not just a matter of analysing
the areas of greatest fixation, as it is with the heat maps, but understanding the logic of the overall sequence of fixations and how it reflects a variation in interest. The variability of perception over longer durations can, to some degree, be studied in eye tracking, especially when the studies compare naïve and expert viewers. The longer the time of the viewing, the more these differences should become apparent because there is time for the viewers to develop divergent patterns of interest. Harland et al. (2014: 242) conducted an eye-tracking study in which participants were asked to examine the very complex painting, Édouard Manet’s Bar at the Folies-Bergère. This was chosen due to the ambiguity of its mode of address – it is not clear if the woman behind the bar is looking at the viewer or not. They found that there were differences between viewing in the first 1/10 of a second, the initial ‘gist’ and an extended investigation of the work (Harland et al. 2014: 242). In response to an open question on mode of address – to what degree does the viewer feel included or excluded by the work – the expert observers fixated for longer than the novices on the woman’s face (2014: 245). It has already been noted that eye-tracking studies have revealed a general attraction to human figures and faces, because they are often the most salient aspects of an image. But this does not directly explain why experts would have longer fixations. Harland et al. (2014: 245) argue that it is not just the duration of the fixation that is important, but the fact that the expert viewers increased the number of visits to the face from fixations on other figures, in particular reflections in the glass. In the famous painting, the woman stands before a large mirror in which the man she may be addressing can be seen. Unlike the novices, the gaze patterns of expert viewers suggested that they were examining the relationship between figures – a key aspect of the painting’s ambiguity – rather than the figures themselves (Harland et al. 2014: 245–6). This attention to structural relationships in the image cannot be explained by only deploying the distinction between top-down and bottom-up processes, for it is about the relationship between bottom-up processes and conscious top-down awareness. The aesthetic pleasure derived from an ambiguous image is based on an awareness of the multiple states of an object – as with Jastrow’s famous duck–rabbit optical illusion, the viewer remembers that the rabbit was once a duck (Mitchell 1994: 46) – and the time in which such differences can play out. The eye-tracking study cannot disclose at what point there is an awareness of ambiguity, but it does show the unfolding pattern of fixation that underpins the ambiguity. It is not about individual fixations, which could simply indicate when the viewer notices or recognizes an object, but the oscillation of fixations over time. From this point of view, each of the saccades is both a movement towards a point of interest and a movement away from another point of interest. One could say that aesthetic attention unfolds as the total dwell time increases. The longer the duration of aesthetic perception, which is a feature of aesthetic contemplation in the
visual arts, the more these patterns – what the art theorist Philip Rawson calls ‘scanning tracks’ (2005: 71) – become visible. A key concern in the study of aesthetics is assessing the extent to which attention is driven by the sensual properties of works or is directed by intentional and voluntary actions. The philosopher of art Paul Crowther (2010: 39) argues that the phenomenal properties of a work are a necessary condition for aesthetic engagement but are not sufficient to understand visual pleasure or the concept of beauty. This can only be revealed by examining the ‘interplay’ between ‘phenomenal form’ and the viewer’s capacity to engage freely with the artwork. There has to be a level of ‘cognitive freedom’ by which the work encourages active engagement with its ‘perceptual possibilities’ (Crowther 2010: 39). There is a conscious process of working through that is integral to aesthetic engagement and which distinguishes it from simply seeing and sensing. In this debate, there is a temporal break between the sensual immediacy of perception and a broader conscious reflective process in which the phenomenal provides the ground for aesthetic exploration. This is why longer viewing durations are privileged in the aforementioned aesthetic theories. There is an assumption that the mind, assisted by the roving eye, must have time to judge, inspect and contemplate an image – processes that are all associated with awareness. This idea of enduring attention is most directly associated with the plastic arts where the time of viewing is not predetermined, but it is also of central importance to the study of film.
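The distinction drawn from Harland et al. – that it is the pattern of returns, rather than fixation duration alone, that marks expert viewing – can be made concrete. The sketch below is my own illustration, not the procedure of Harland et al.; the fixation data and the ‘face’ and ‘mirror’ areas of interest are hypothetical. It separates total dwell time on a region from the number of distinct visits to it, so that returning to the barmaid’s face after inspecting her reflection counts as a new visit.

```python
# Dwell time versus revisits per area of interest (AOI): a hypothetical
# sketch. Fixations are (x, y, duration_ms); AOIs are named rectangles
# (left, top, right, bottom) supplied by the analyst.

from collections import defaultdict

def aoi_of(x, y, aois):
    """Name of the first AOI containing the point, else None."""
    for name, (left, top, right, bottom) in aois.items():
        if left <= x <= right and top <= y <= bottom:
            return name
    return None

def dwell_and_visits(fixations, aois):
    """A 'visit' is an unbroken run of consecutive fixations inside one
    AOI, so a return after looking elsewhere is counted again -- the
    revisiting behaviour, as distinct from total fixation time."""
    dwell = defaultdict(float)
    visits = defaultdict(int)
    previous = None
    for x, y, duration in fixations:
        current = aoi_of(x, y, aois)
        if current is not None:
            dwell[current] += duration
            if current != previous:
                visits[current] += 1
        previous = current
    return dict(dwell), dict(visits)

# Hypothetical data: three fixations on the face, broken by one on the
# mirror, yield two separate visits to the face.
aois = {'face': (400, 100, 600, 300), 'mirror': (700, 100, 900, 300)}
fixations = [(450, 150, 320), (500, 200, 280), (750, 180, 300), (480, 160, 410)]
dwell, visits = dwell_and_visits(fixations, aois)
print(dwell)   # {'face': 1010.0, 'mirror': 300.0}
print(visits)  # {'face': 2, 'mirror': 1}
```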

The time of response and the time of attentiveness in film

In many respects, this notion of cognitive freedom in the visual arts is not easily mapped onto the complex structure of cinematic attention, where there are many more structural limits placed on the time of aesthetic engagement. The greater the constraints on dwell time, the less time there is to consciously attend to the particularity of appearance. In Art and Fear, Paul Virilio (2003: 85) bemoans the cinematic effect on attention in general, in which contemporary culture is driven by cinema’s ‘speeded-up’ images. This increase in speed is reflected in ‘contemporary art’s shrillness in its bid to be heard without delay – that is, without necessitating attention, without requiring the onlooker’s prolonged reflection and instead going for the conditioned reflex, for a reactionary and simultaneous activity’ (Virilio 2003: 90). Virilio’s comments are hyperbolic and draw too strident an opposition between cinema and the visual arts, but there is, nevertheless, an important point here that is central to evaluating the relationship between eye tracking and aesthetic perception. Cinema’s capacity to produce reflex responses and
condition attention is what eye-tracking studies most readily reveal, and as such eye tracking is not ideally suited to analysing the type of variability that is implicit in aesthetic attention. William Brown (2015) states that one of the problems with eye-tracking studies of film is that they isolate ‘statistically significant and shared responses’ and consequently fail to attend to ‘idiosyncratic’ viewing practices. There is an emphasis on scientific visibility, in the form of ‘attention-grabbing’, over viewer introspection and other ‘invisible’ forms of knowledge – to which could be added an attention to the variability of appearances. There is an implied politics in this method because it privileges those films that are most successful in coordinating the viewer’s gaze, that is, mainstream films (Brown 2015). Certainly, there are many theorists who argue that aesthetic attention is best understood as a resistance to a reflexive formalism and the process of grabbing attention. Laura Mulvey (2006) refers to how new technologies of ‘delay’ allow the viewer to stop, slow down and pause film and provide a foundation for rethinking the nature of cinematic time. The spectator is ‘pensive’ because they are no longer immersed in the narrative fiction, the ‘illusion of movement’ or the naturalized indexicality of the cinematic image (2006: 183–4). Mary Ann Doane (2002: 224) argues that there is much greater contingency, unpredictability and indeterminacy within a shot until the cut closes off its capacity to signify. Extrapolating from this argument, the shorter the duration of a shot, the greater the degree of determination in what the viewer looks at. Consequently, aesthetic perception is much more readily associated with the long take because it gives the viewer sufficient aesthetic freedom to attend to the plenitude of visual detail. Andrey Tarkovsky (1989: 117) warns against over-editing because the spectator’s felt understanding of the temporal continuity of a shot is reduced. In his film Mirror, he only used about 200 shots to allow the viewer greater opportunity to experience the ‘time pressure’ of each shot rather than the ‘conceptual’ rhythm imposed by editing. In other words, the viewers have an awareness of time passing at the same time they attend to what is contained in the shot. Gilles Deleuze (1989: 42–3) argues that cinema can only break its enslavement to the movement-image when time begins to signify independently of the mechanics of the shot. Bazin (1967) also emphasizes the importance of aesthetic engagement that is not fully determined by the mechanics of film editing. Of course, the aesthetics of film can take many forms, but if we follow this particular strand, the question is, can eye tracking provide a platform for analysing aesthetic attention in film that is dependent on cognitive and temporal freedom? Eye movements could reveal some aspects of the modulation of aesthetic attention insofar as they always accompany viewing, and constitute a type of infra-rhythm that operates within the time of the shot. Adrian Dyer and Sarah Pink gesture in this direction when they argue that eye tracking should be linked to how viewers ‘inhabit’ film (2015). They draw inspiration from the
cultural geographer Tim Ingold’s work, and by proxy Merleau-Ponty, on how people occupy movements, tracks and lines. Could a space be found within the scanpaths and time of our eye movements that can be linked to aesthetic attention? One of the barriers to this approach is that eye movements are delimited by the duration of the shots and the salience of the features. Tim J. Smith (2013: 31), in his comprehensive survey of eye tracking and the moving image, states that much of the data refers to exogenous factors describing how the eye moves in response to a particular visual array and sequence of shots. Exogenous factors are more easily determined in eye-tracking studies of film due to the phenomenon of ‘attentional synchrony’, where the eye movements of multiple viewers concord spatially and temporally – something that is more pronounced in the watching of Hollywood films (Smith 2013: 31). This contrasts with the analysis of static images, where there is a greater variance in what observers are looking at, even though patterns may form over time (Smith 2013: 10–12). The variance can be attributed to the greater capacity of viewers of static images, in the absence of temporal constraints on viewing, to determine what they are looking at. In this model, a film can be judged in terms of the efficacy of its directorial cues. The more effective the cues, the clearer the reception of the message and, consequently, Smith (2013: 23) argues that the most effective visual communication involves ‘matching the rate of presentation of information to the spatiotemporal constraints of viewer attention’. The most effective cues are those in which there is a ‘clear flow of attention’, for example, in a successful match on action, the viewer attends to the continuity of the movement rather than the cut (Smith 2013: 17). Eye tracking confirms many of the principles of continuity editing. In an eye-tracking study of a scene from Vertigo, Marchant et al. (2009: 157) found that viewers generally look towards the centre of the screen but that the eye will shift to the left or right depending on the direction in which the camera is moving or the direction of the character’s gaze. Redmond et al. (2015) noted that viewers engaged in ‘joint attention’ with a character’s gaze and identified with character movement and intention in an analysis of an episode of Sherlock. Attention also varied with the type of action, for example, the eye shifting between interlocutors in dialogue. Identification with the character and with the narrative could be seen as endogenous because it is not fully governed by bottom-up visual features, but nevertheless it is still a direct response to on-screen movement that only allows for limited cognitive freedom. Empirically, attentional synchrony and directional cues are important because they support stronger generalizations about the directorial capacity to manage audience attention through mise-en-scène and editing. They are more applicable to the analysis of mainstream film where the viewer is not given sufficient time to develop an idiosyncratic relationship with the film image. However, they are of less value to the study of aesthetic attention because, as Virilio warns, they confirm a stimulus–response approach to film communication.
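Attentional synchrony lends itself to a simple operationalization, and one way to see both what the measure captures and what it leaves out is to compute it. The sketch below is an illustrative measure of my own, not Smith’s formulation: for each frame, it takes every viewer’s gaze point and measures how tightly those points cluster around their centroid, smaller dispersion indicating more synchronous attention. The gaze tracks are hypothetical.

```python
# Per-frame gaze dispersion across viewers: a hypothetical sketch of an
# attentional-synchrony measure. Each track is a list of (x, y) gaze
# points aligned frame-by-frame across viewers.

import math

def gaze_dispersion(tracks):
    """Mean distance of the viewers' gaze points from their centroid,
    one value per frame; lower values = tighter clustering."""
    n_frames = min(len(track) for track in tracks)
    dispersion = []
    for frame in range(n_frames):
        points = [track[frame] for track in tracks]
        cx = sum(x for x, _ in points) / len(points)
        cy = sum(y for _, y in points) / len(points)
        dispersion.append(sum(math.hypot(x - cx, y - cy) for x, y in points) / len(points))
    return dispersion

# Hypothetical tracks for three viewers over three frames: gaze clusters
# on frame 0 (say, just after a cut) and drifts apart as each viewer
# begins to explore the shot.
viewers = [
    [(512, 384), (520, 400), (300, 200)],
    [(515, 380), (560, 390), (700, 500)],
    [(510, 390), (480, 420), (512, 100)],
]
for frame, d in enumerate(gaze_dispersion(viewers)):
    print('frame %d: mean dispersion %.1f px' % (frame, d))
```

Plotted over a whole film, dips in such a curve would mark the moments when gaze converges; note that the measure is blind, by construction, to what any single viewer is doing – which is precisely the idiosyncrasy Brown wants accounted for.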

Film form and cognitive freedom

Up to this point, I have been using the word cue to describe the relationship between eye movement and aspects of film form, but what attentional synchrony reveals is that eye movements are largely dependent on temporal constraints. Film, as a performance medium, does not necessarily force the eye to move but rather limits the time in which the eye can move. This means that any study of eye movements and aesthetic attention should specify the relative durations of visual presentation and the degree to which this constrains the cognitive freedom of viewing. One of the most noted features of eye tracking and the moving image is the overwhelming attention to the centre of the frame, or what is referred to as the ‘central bias’ (Dyer and Pink 2015). This can be contrasted with the greater amplitude of attention in still-image viewing. This ‘central bias’ ostensibly describes what the eye is looking at – the eye fixates on an object in the centre of the screen – but another way of explaining the phenomenon is through reference to the reorientation of the eye after each cut. Tosi et al. (1997: 50) argue that even in dynamic scenes in which there is significant character movement, the eye always repositions itself in the centre, because the ‘mind sets the eyes in such a way as to be able to take in the essential information during the rapid mutation of images’. Each change of scene requires a reflex movement that is not necessarily about seeing something but being in a position to see everything. In the centre of the screen, the eye is best placed to see the whole of the shot in peripheral vision. If the eye attends to one edge of the frame, a good portion of peripheral vision will remain outside the frame. It is not about what is in the shot or a direct response to visual cues, but rather a receptiveness to what could be in the shot. There could be other explanations for the ‘central bias’, but what I am trying to point out here is that the particular durational mechanisms of film can affect how the eye moves irrespective of what fills the screen. Unlike in object motion perception, eye movement in response to rapid scene changes is about limiting possibility rather than attending to the particularity of appearances, and consequently could be contrasted with longer duration exploratory movements. This also means that an eye movement that immediately follows a cut should be placed in a different category of analysis than a movement mid shot or at the end of a long take. Smith notes that when an observer is confronted with a new shot in a film, there is more saccadic movement but also greater attentional synchrony (2013: 22). In the period when the eye is searching for new information, there is also a greater concordance of viewing patterns. This principle could be used to establish common ground between film and static image viewing. Rather than discuss medium differences, eye tracking could focus on common temporal frames such as the period of first fixation. For example, Massaro et al. (2012) argue that in the analysis of still images, like film, the body or face is more likely to be the first point of fixation: ‘The human frame seems
to automatically orient participants towards predetermined attractors, namely the presence of a human figure’ (2012: 12). In the narrow duration of first viewing an image, what attracts the eye may be more important than what the viewer chooses to attend to. However, over longer durations, other factors such as volitional attentiveness, awareness of the variability of appearance or even inattentiveness may come into play. In the latter case, the eye might not seek salient information but rather initiate saccades due to the inability to maintain a fixation. Indeed, over very long takes, the viewers might develop an expectation that the shot should end, in which case they begin to develop an awareness of the shot over and above what it presents (Atkinson 2014). This expectation could be manifest in the variation of the scanpaths, but this would have to be studied specifically in terms of the duration of the shot, a task that has been undertaken for this book by Claire Perkins and Tessa Dwyer in their analysis of slow cinema. Aesthetic attention describes how the viewer develops an attentiveness to the play of appearances over time, and if eye tracking is to have any explanatory value in this area, it must describe the variance in attention over and above habitual responses. It is about understanding how the eye responds to a condition of cognitive possibility. One way of analysing this is to remove some of the visual and auditory cues in a film. In a pilot study on music and eye tracking in film, Miguel Mera and Simone Stumpf (2014) were able to alter gaze patterns and the amplitude of saccades by varying the soundtrack accompanying a scene from The Artist. They found that ‘focusing’ music – music that was created to complement the rhythm of the shots – increased the duration of fixations, but when participants were shown a silent version, there was less clustering of eye movements and a greater ‘fixation spread’. Moreover, distracting music – music that directly contrasted with the mood and rhythm of the scene – decreased the dwell time for individual fixations such that the viewer would continue to look around the image (2014: 14–15). There are many reasons why the eye is more likely to fixate on the most predictable features, such as the movement of the main characters, with a focusing soundtrack. The music could increase cognitive load and occupy attention, such that there is a reduction in the drive to explore the frame. It could confirm movement and narrative cues or perhaps provide a rhythm for dwelling, such that the viewer does not feel that time is passing while fixating. In other words, there is a relaxation of what Tarkovsky (1989) refers to as the ‘time pressure’ in each shot. What is crucial for aesthetics is understanding attention in terms of patterns of variance from normal perception. Are there different fixation distributions because the eye seeks visual difference in the silent version? Does the ‘focusing’ music direct the eye to relevant features or does it simply reduce inattention? Does the distracting music force the eye to move away from a current point of fixation or towards a new viewing area? These questions abound because it is not simply a matter of understanding what is seen but finding reasons for the variations in eye movement.


The difficulty in understanding variation in attention across different media is that in the analysis of still images there is too much variation, whereas in the study of the moving image there are too many structures managing attention. To refer to an earlier point, aesthetic perception requires an attentiveness to perceptual possibility, but this attentiveness can only reveal itself over time. There is first a recognition of the image as an epistemological fact – the initial gist and recognition of what is contained in the image – and then a conscious attentiveness to appearance as a momentary state of the image. There is no absolute point at which an epistemology of seeing evolves into an aesthetics of seeing, but the longer the duration of viewing, the more time there is to attend to perceptual possibility. With regard to the particular time of the shot, except in slow cinema, there is limited time to explore perceptual possibility and attend to the variation in appearance, and it is often only when rules are broken that the viewer is more attentive.

Géry d’Ydewalle and Mark Vanderbeeken (1990) argue that cognitive activity increases in the degree to which film breaks standard editing rules. In an eye-tracking experiment, they found that breaks in the impression of apparent motion, such as jump cuts, reduced the breadth of eye movements, whereas significant spatial disruptions, that is, breaking the 180-degree rule or inverting shots, increased eye movement. They argue that the spatial breaks increase the breadth of the saccades because the viewer attempts to reconfigure the spatial logic of the film (1990: 137). The increase in saccadic amplitude is important because it describes an increase in attention to the formal properties of the shot and the cut – the participants noticed the cut in a way that they did not with continuity editing. The eye increases its activity in response to unassimilable visual difference, and this activity can be correlated with an increase in aesthetic awareness. The viewers are aware of the situation or condition of appearing in addition to their attention to the objects that appear. This can serve as the basis for a temporal comparison of different aspects of aesthetic perception. In the visual arts, awareness develops over an extended period of viewing through the iterability of exploration, whereas in film, except in the long take, it has to be provoked by breaks in filmic rules.

From the point of view of aesthetics, the problem is determining at what point eye movements contribute to awareness or are merely reflexive and involuntary. The tracking of a single saccade does not provide sufficient information because it operates at speeds that are incommensurate with aesthetic conceptions such as ‘time pressure’ or ‘cognitive freedom’. At the level of the saccade (20–50 ms) or the fixation (on average 300 ms), there is little time for conscious aesthetic awareness to emerge. The viewers may be able to recall after the fact what they have seen, but this is not the same as being aesthetically aware of variations in appearance in the time in which they occur.


Indeed, if we correlate this with Benjamin Libet’s study of conscious perception, where there is a delay of about 500 ms before a subject is aware of an external stimulus (2004: 33), it is difficult to conceptualize individual saccades in terms of a process of becoming aware or a process of top-down deliberation. Due to the speed of the movements, François Molnar (1997: 225) argues that eye-tracking studies primarily investigate low-level sensory inputs that are processed without the perceiver’s awareness. This is why Molnar is critical of Alfred Yarbus’s famous studies of complex images (Yarbus 1967) and the claim that saccadic eye movements are driven by the search for ‘semantically rich information’. The problem, he argues, is that there is no time for the observer to see the whole of the image and then choose those areas that are semantically rich. The eye is making choices before the observer is even aware that there is a choice to be made, and for this reason, he proclaims that it is ‘nonsense to talk about the primacy of cognition in exploration’ (1997: 230). The eye movements are reactive and responsive rather than exploratory, which is more of an issue in film studies due to the imposition of temporal frames, such as shot duration, on the time of exploration.

To understand visual awareness and attention is not only about separating endogenous and exogenous factors but about reconsidering each movement within the context of broader patterns of movement and longer time frames. The increase in amplitude in response to a cut might only signify awareness when the pattern is repeated over a significant duration. Examining different types of eye movements might reveal subtle differences between the bottom-up mechanical and physical movements, but as a general principle, the longer the duration of viewing a particular visual scene, the more likely it is that there will be movements that reveal something about awareness, apperception and cognition.

Yarbus’s (1967) study of complex images is often cited as an example of exploratory deliberation, but what is fundamental to the results he obtained is the 1–3 minute duration of the studies. He claims that the main determinant for fixation was ‘essential and useful information’ (1967: 182), but notes that over long durations – the 1–3 minutes he allowed the viewers to peruse an image – the eye does not continue scanning for new information but rather returns to key points of interest: ‘Additional time spent on perception is not used to examine the secondary elements, but to reexamine the most important elements. The impression is created that the perception of a picture is usually composed of a series of “cycles,” each of which has much in common’ (1967: 193). However, what Yarbus does not discuss is the change in the meaning of ‘essential information’ over the cycle of viewing. The fixations associated with the re-examination should have a different value than the first exploratory fixations. There is a variation in seeing, even though the viewer is focusing on the same region, because the viewer is operating with a different level of awareness. The cycle of eye movements attests to this awareness, and this is why it is important to analyse the overall scanpath rather than place undue emphasis on heat maps.


If this logic is applied to film, there should be a correlation between the duration of the cycle of eye movement and the shot, as well as a differentiation of those eye movements that are proximate to the initial cut and those that are evident in the later stages of the cycle. It is about finding patterns of variation over the longer duration – the movement away from the centre or human figures and faces – rather than principles of correspondence, such as attentional synchrony. Admittedly, this approach is better suited to the analysis of the long take because with faster editing, the emphasis shifts to attentional synchrony and low-level sensory inputs.

Time plays a constitutive role in aesthetic perception. If we look at a painting over a longer period, we end up seeing the painting differently because the duration of viewing affects the structure of attention and reveals differences in the object. Over time, aesthetic perception diverges from normal perception in a way that allows for greater attention to the sensual properties of the visual field. Eye tracking can provide valuable information on aesthetic attention because it can be used to measure a cycle of attention composed of a sequence of micromovements. In the form of the scanpath, these movements serve as a diagrammatic accompaniment to the duration of aesthetic engagement, similar to a score that accompanies a musical performance. However, unlike a musical score, eye movements are operating at speeds that sit below the threshold of awareness, and are not easily correlated with one of the key features of aesthetic perception – an attentive awareness to the variation of appearances. The shorter the cycle of movement under investigation, the more difficult it is to talk about a form of conscious attentiveness. This is amplified in film due to restrictions imposed on the viewing process by camera movement, character movement and editing. Editing has the capacity to reset the gaze with each shot, and this means that greater emphasis is placed on ascertaining the ‘gist’ of the image, or responding to change, rather than the much more deliberative practices of cognitive exploration and evaluation. Eye tracking might have some value in revealing aspects of aesthetic attention in film, but only if the temporal structures of film are correlated with scanpath cycles. It is about understanding how eye movements vary over the duration of a filmic practice (shot, pan, zoom, etc.) rather than how they react to the initial change or movement. This would allow space to analyse deviations in perception, which could be linked, albeit tentatively, with the process of becoming aware of visual difference in aesthetic attention.
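The differentiation proposed here could, in principle, be operationalized computationally. The sketch below, which is not drawn from any study cited in this chapter, simply groups fixations by their delay from the preceding cut so that early, reorienting movements can be analysed separately from later, potentially exploratory ones; the data format and the 500 ms window are illustrative assumptions.

```python
# Illustrative sketch: separate fixations that closely follow a cut from those
# made later in the shot, so the two can be analysed as different categories
# of movement. The tuple format and the 500 ms window are assumptions.

def split_by_cut_proximity(fixations, cut_times_ms, early_window_ms=500):
    """fixations: list of (onset_ms, x, y); cut_times_ms: sorted cut onsets."""
    early, late = [], []
    for onset, x, y in fixations:
        preceding = [c for c in cut_times_ms if c <= onset]
        since_cut = onset - preceding[-1] if preceding else None
        if since_cut is not None and since_cut <= early_window_ms:
            early.append((onset, x, y))   # likely reorienting movement
        else:
            late.append((onset, x, y))    # candidate exploratory movement
    return early, late
```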

References

Arnheim, R. (1954), Art and Visual Perception: A Psychology of the Creative Eye, London: Faber and Faber.
Atkinson, P. (2014), ‘Turning Away: Embodied Movement in the Perception of Shot Duration’, Image [&] Narrative, 15 (1): 89–101.


Bazin, A. (1967), What is Cinema?, translated by H. Gray, Berkeley: University of California Press.
Benjamin, A. (2004), Disclosing Spaces: On Painting, Manchester: Clinamen Press.
Brown, W. (2015), ‘Politicizing Eye Tracking Studies of Film’, Refractory: A Journal of Entertainment Media, 25.
Carroll, N. (2010), ‘Aesthetic Experience, Art and Artists’, in R. Shusterman and A. Tomlin (eds), Aesthetic Experience, 145–65, New York and London: Routledge.
Crowther, P. (2010), ‘The Aesthetic: From Experience to Art’, in R. Shusterman and A. Tomlin (eds), Aesthetic Experience, 31–44, New York and London: Routledge.
Deleuze, G. (1989), Cinema 2: The Time-Image, translated by H. Tomlinson and R. Galeta, Minneapolis: University of Minnesota Press.
Doane, M. A. (2002), The Emergence of Cinematic Time: Modernity, Contingency, the Archive, Cambridge, MA and London: Harvard University Press.
D’Ydewalle, G. and M. Vanderbeeken (1990), ‘Perceptual and Cognitive Processing of Editing Rules in Film’, in R. Groner, G. d’Ydewalle and R. Parham (eds), From Eye to Mind: Information Acquisition in Perception, Search, and Reading, 129–40, Amsterdam: Elsevier.
Ducasse, C. J. (1966), The Philosophy of Art, New York: Dover.
Dyer, A. G. and S. Pink (2015), ‘Movement, Attention and Movies: The Possibilities and Limitations of Eye Tracking?’, Refractory: A Journal of Entertainment Media, 25.
Harland, B., J. Gillett, C. M. Mann, J. Kass, H. J. Godwin, S. P. Liversedge and N. Donnelly (2014), ‘Modes of Address in Pictorial Art: An Eye Movement Study of Manet’s Bar at the Folies-Bergère’, Leonardo, 47 (3): 241–7.
Heidenreich, S. M. and K. A. Turano (2011), ‘Where Does One Look When Viewing Artwork in a Museum?’, Empirical Studies of the Arts, 29 (1): 51–72.
Kant, I. (1987), Critique of Judgment, translated by Werner S. Pluhar, Indianapolis: Hackett Pub. Co.
Libet, B. (2004), Mind Time: The Temporal Factor in Consciousness, Cambridge, MA: Harvard University Press.
Lyotard, J. F. (1997), Postmodern Fables, translated by G. Van Den Abbeele, Minneapolis: University of Minnesota Press.
Lyotard, J. F. (2012), Textes dispersés 1: esthétique et théorie de l’art (Miscellaneous Texts 1: Aesthetics and Theory of Art), translated by V. Ionescu, E. Harris and P. Milne, Leuven: Leuven University Press.
Marchant, P., D. Raybould, T. Renshaw and R. Stevens (2009), ‘Are You Seeing What I’m Seeing? An Eye-Tracking Evaluation of Dynamic Scenes’, Digital Creativity, 20 (3): 153–63.
Massaro, D., F. Savazzi, C. Di Dio, D. Freedberg, V. Gallese, G. Gilli and A. Marchetti (2012), ‘When Art Moves the Eyes: A Behavioral and Eye-Tracking Study’, PLoS ONE, 7 (5): 1–16.
Mera, M. and S. Stumpf (2014), ‘Eye-Tracking Film Music’, Music and the Moving Image, 7 (3): 3–23.
Mitchell, W. J. T. (1994), Picture Theory: Essays on Verbal and Visual Representation, Chicago: University of Chicago Press.
Molnar, F. (1997), ‘A Science of Vision for Visual Art’, Leonardo, 45 (1): 225–32.


Mulvey, L. (2006), Death 24x a Second: Stillness and the Moving Image, London: Reaktion Books.
Rancière, J. (2013), Aisthesis: Scenes from the Aesthetic Regime of Art, translated by Z. Paul, London and New York: Verso.
Rawson, P. (2005), Art and Time, Madison and Teaneck: Fairleigh Dickinson University Press.
Redmond, S., J. Sita and K. Vincs (2015), ‘Our Sherlockian Eyes: The Surveillance of Vision’, Refractory: A Journal of Entertainment Media, 25.
Seel, M. (2005), Aesthetics of Appearing, translated by John Farrell, Stanford: Stanford University Press.
Smith, T. J. (2013), ‘Watching You Watch Movies: Using Eye Tracking to Inform Cognitive Film Theory’, in A. Shimamura (ed.), Psychocinematics: Exploring Cognition at the Movies, 1–46, Oxford: Oxford University Press.
Tarkovsky, A. (1989), Sculpting in Time: Reflections on the Cinema, translated by Kitty Hunter-Blair, London: Faber and Faber.
Tosi, V., L. Mecacci and E. Pasquali (1997), ‘Scanning Eye Movements Made When Viewing Film: Primary Observations’, International Journal of Neuroscience, 92 (1–2): 47–52.
Virilio, P. (2003), Art and Fear, translated by Julie Rose, London and New York: Continuum.
Yarbus, A. L. (1967), Eye Movements and Vision, translated by Basil Haigh, New York: Plenum Press.

3 The Development of Eye Tracking in Empirical Research on Subtitling and Captioning

Stephen Doherty and Jan-Louis Kruger

1 Background

The soul, fortunately, has an interpreter – often an unconscious but still a faithful interpreter – in the eye.
CHARLOTTE BRONTË (1847)

Embedded into the audiovisual text, subtitles and captions contain rich textual information that links to the surrounding audio and visual information. The information contained within subtitles and captions can be verbal or nonverbal in nature. While subtitles typically contain visual elements, captions have recently started to contain auditory elements in addition to the traditional visual elements of text, graphics and avatars. (For a recent review of multimodal captioning, see Sasamoto, O’Hagan and Doherty 2016.) Researchers and practitioners alike typically draw upon semiotics – the study of signs and their meanings – to create and understand the complex links between subtitles, captions and their multimodal surroundings.

Film and television present viewers with dynamic, audiovisual texts that make numerous demands on their cognitive capacity.


Viewers have to process auditory and visual sources of information simultaneously, and these sources can contain both verbal and nonverbal information. Unlike the continuous processing of real-life scenes, viewers of film and television have to further interpret a range of audiovisual information that requires them to conduct both deductive and inductive reasoning due to the succession of shots and scenes they see before them. While the multimodal processing of audiovisual information is indeed commonplace in everyday communication (for a review of contemporary models, see Smith, Monaghan and Huettig 2017), viewers’ cognitive resources may face an even greater demand when subtitles or captions are added to this process. In order to follow speech in audiovisual media, the viewer has to engage in a continuous and dynamic strategic reading activity while also processing images and the auditory information of the soundtrack. Moreover, subtitles and captions themselves perform a multimodal function, typically providing textual renderings of film and television dialogue. Captions for deaf and hard-of-hearing viewers also often contain textual renditions of the soundtrack and sound effects. Viewers also process other types of text on-screen, such as signs and letters embedded into images on screen, intertitles and subtitles used to provide scene information.

Empirical research on subtitling and captioning has understandably focused on examining their processing and reception by diverse audiences as part of a rich multimodal experience that spans various genres and formats. Seeking to explore this multimodal processing of subtitles and captions, eye tracking in this field of research originated in the 1980s through the work of d’Ydewalle and colleagues (e.g. d’Ydewalle, Muylle and van Rensbergen 1985; d’Ydewalle, van Rensbergen and Pollet 1987; d’Ydewalle et al. 1991) and has since developed to include a range of different measures in diverse research designs, and to inspire novel avenues of research and application (described in Section 3). There remains, however, a need to consolidate this growing collection of work and identify the progress that has been made and the limitations that are preventing us from moving forward and applying eye-tracking methodologies more widely in the study of the moving image (Section 4). The technologies underpinning and surrounding subtitling and captioning have also evolved substantially over recent decades as part of a wider development of language and translation technologies (see Doherty 2016).

Fundamentally, eye tracking is the study of eye movements using a device that tracks one’s gaze and response to the given stimuli. The application of eye tracking to the study of subtitling and captioning provides unprecedented access to the allocation of visual attention by viewers. This method has enabled researchers to directly observe viewers’ visual attention and make inferences about underlying cognitive processes.


In this chapter, we first focus on describing the individual eye-tracking measures that have been and are being used in this field of research. We then move to critically review the constructs being used and their associated development, usage and methodological limitations. Finally, we draw upon the lessons learnt from the body of research to identify potential solutions and ways in which eye tracking can be developed within subtitling research and practice.

2 Individual eye-tracking measures used in empirical studies of subtitling and captioning

In this section, we detail each of the dominant eye-tracking measures used in subtitling and captioning research. We also identify measures that have not been used extensively and could thus bridge the gap between subjective and more objective measures in a field that is becoming increasingly interdisciplinary and open to mixed-methods approaches. We then tie these measures together by focusing on the constructs they are measuring directly and/or indirectly in order to form a critique of the relevant research and identify our current limitations and future opportunities.

In an attempt to facilitate comprehension, we allocate each measure to a category: primary or secondary. Primary measures are those that are considered to be directly available to the researcher from the raw data set from the eye-tracker’s software, that is, they are unidimensional and do not need to be combined or transformed. These measures are typically reported in raw form using simple counts, medians and averages. Secondary measures are composite measures that require the combination of two or more primary measures and can therefore become bidimensional or multidimensional. A comprehensive technical description of all measures used in eye-tracking research in cognitive science and psychology can be found in Liversedge, Gilchrist and Everling (2011). As research studies into subtitle processing are more numerous than those of caption processing, several measures use terms that contain the word subtitle, but in all cases, they can still be applied to captioning.

A simple example of the distinction between primary and secondary measures can be seen in pupil dilation. As a primary measure, pupil dilation is measured and reported as a raw numerical value, that is, the physical size of the pupil in millimetres as recorded by the eye-tracker. The percentage change in pupil dilation, however, is a secondary measure as it uses the raw numerical values to create percentages from a given baseline. It can thereby measure the size of the pupil relative to the baseline for a given time period or task, for example, to show if it increases or decreases as a result of a stimulus being presented.
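To make the primary/secondary distinction concrete, the following sketch derives a percentage change in pupil dilation (a secondary measure) from raw pupil diameters (a primary measure). The sample values and function name are illustrative only and are not drawn from any study cited in this chapter.

```python
# Illustrative sketch: deriving a secondary measure (percentage change in
# pupil dilation) from a primary measure (raw pupil diameter in millimetres).

def percent_change_from_baseline(samples_mm, baseline_mm):
    """Express each raw pupil-diameter sample relative to a baseline."""
    return [100.0 * (s - baseline_mm) / baseline_mm for s in samples_mm]

# Hypothetical recording: baseline taken before the stimulus appears.
baseline = 3.2                                  # mean baseline diameter (mm)
during_stimulus = [3.2, 3.4, 3.5, 3.6, 3.5]     # raw samples (mm)

print(percent_change_from_baseline(during_stimulus, baseline))
# e.g. [0.0, 6.25, 9.37..., 12.5, 9.37...] -> the secondary measure
```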


Primary measures

Eye movements are typically categorized as belonging to fixations or saccades (see Rayner 1998 for a detailed review). A fixation occurs when the eye remains relatively still in a predefined area of dispersion over a predefined threshold of time, typically 200–300 milliseconds in reading, scene processing and object searching tasks (Rayner 1998). Fixations can be caused by unintentional, bottom-up processing (stimulus-driven) or intentional, top-down processing (viewer-driven). Fixation count is the raw numerical count of fixations. It is also reported as dwell count. Fixation duration, then, is the length of a given fixation or series of fixations reported in milliseconds or seconds. Fixation count and fixation duration are the most widely used primary measures in the literature (Akahori, Hirai, Kawamura and Morishima 2016; Bisson, Van Heuven, Conklin and Tunney 2014; Caffrey 2008a, 2009, 2012; Cambra et al. 2014; D’Ydewalle and De Bruycker 2007; D’Ydewalle et al. 1991; D’Ydewalle, van Rensbergen and Pollet 1987; Fernández, Matamala and Vilaró 2014; Ghia 2012; Hefer 2011; Hefer 2013; Jensema, Danturthi and Burch 2000; Jensema et al. 2000; Krejtz, Szarkowska and Łogińska 2015; Krejtz, Szarkowska and Krejtz 2013; Kruger 2013; Kruger and Steyn 2014; Kruger et al. forthcoming; Kruger, Hefer and Matthew 2013; Künzli and Ehrensberger-Dow 2011; Mäkisalo, Gowases and Pietinen 2013; Perego et al. 2010; Rajendran et al. 2013; Romero-Fresco 2010; Specker 2008; Szarkowska, Krejtz, Klyszejko and Wieczorek 2011; Szarkowska, Krejtz, Pilipczuk, Łukasz and Kruger 2016; Winke, Gass and Sydorenko 2013).

A saccade is the rapid movement of the eye between one fixation and the next. Saccades enable the eyes to move quickly from one point of interest to the next, for example, the words in reading a subtitle, and typically last between 30 and 40 milliseconds for reading and scene perception tasks (Rayner 1998). Saccade count is the raw numerical count of saccades in the given area, and saccade length/duration reports the respective length of each saccade. Saccadic measurements are uncommon in eye-tracking studies of subtitle and caption processing (D’Ydewalle and De Bruycker 2007; Ghia 2012; Kruger and Steyn 2014; Rajendran et al. 2013; Specker 2008; Szarkowska et al. 2011).

Lastly, pupil dilation is the size of the pupil at a given time as measured and reported in millimetres. This can be captured from one or both eyes and is often also reported as pupil diameter (Caffrey 2008b, 2009, 2012; Kruger, Hefer and Matthew 2013). While pupil dilation is an established indicator of cognitive load (see Section 3), it is not widely implemented in research into subtitle and caption processing due to the dynamic nature of the media, whose luminosity and rapid movements can incite a physiological response rather than a cognitive or affective response to the stimuli. (For a discussion of this limitation, see Kruger and Doherty 2016.)
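By way of illustration, fixation count and fixation duration are typically derived from raw gaze samples by an event-detection algorithm. The sketch below implements a minimal dispersion-threshold procedure of the kind common in eye-tracking software; the data format, dispersion threshold and minimum duration are assumptions made for the example, not the settings of any particular eye-tracker or study.

```python
# Minimal dispersion-threshold fixation detection, assuming gaze samples as
# (timestamp_ms, x, y) tuples from a single participant. Thresholds are
# illustrative; real toolkits also handle blinks, noise and sampling rate.

def detect_fixations(samples, max_dispersion=35.0, min_duration_ms=100.0):
    """Group consecutive samples into fixations when they stay within a small
    spatial window (max_dispersion, in pixels) for at least min_duration_ms."""
    fixations = []
    i = 0
    while i < len(samples):
        j = i
        while j + 1 < len(samples):
            window = samples[i:j + 2]
            xs = [s[1] for s in window]
            ys = [s[2] for s in window]
            if (max(xs) - min(xs)) + (max(ys) - min(ys)) > max_dispersion:
                break
            j += 1
        duration = samples[j][0] - samples[i][0]
        if duration >= min_duration_ms:
            xs = [s[1] for s in samples[i:j + 1]]
            ys = [s[2] for s in samples[i:j + 1]]
            fixations.append({"start_ms": samples[i][0],
                              "duration_ms": duration,
                              "x": sum(xs) / len(xs),   # fixation centroid
                              "y": sum(ys) / len(ys)})
        i = j + 1
    return fixations
```

Fixation count is then simply the length of the resulting list, and fixation duration the duration_ms values, which can be summed or averaged as required.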


Secondary measures

Building upon the primary measures, a range of secondary measures is available to investigate numerous aspects of visual attention and cognitive processing in the presence of subtitles and captions.

Gaze time is the combination of the total duration of all fixations and saccades in a given area of interest (AOI). It can consist of multiple visits to the AOI, for example, reading a word and returning to it later, or it can be used for an entire recording, for example, a full movie. It is reported in milliseconds or seconds and is also termed dwell time and visit duration depending on the software being used and the duration of the AOI. It is a commonly used measurement throughout the literature and is typically used alongside fixation count and fixation duration (Akahori et al. 2016; Caffrey 2009, 2012; Cambra et al. 2014; D’Ydewalle and De Bruycker 2007; D’Ydewalle et al. 1991; D’Ydewalle, van Rensbergen and Pollet 1987; Fox 2016; Hefer 2011; Hefer 2013; Jensema, Danturthi and Burch 2000; Jensema et al. 2000; Krejtz, Szarkowska and Łogińska 2015; Krejtz, Szarkowska and Krejtz 2013; Kruger 2013; Kruger, Hefer and Matthew 2013; Künzli and Ehrensberger-Dow 2011; Szarkowska et al. 2011; Szarkowska et al. 2016; Winke, Gass and Sydorenko 2013).

Time to first fixation is the amount of time taken by the viewer to first fixate on the AOI. Its duration is also reported as first fixation duration. Time to first fixation is reported in milliseconds and seconds and is an uncommon measure in the literature (D’Ydewalle and De Bruycker 2007; D’Ydewalle et al. 1991; Fox 2016). This is also the case for first fixation duration (Krejtz, Szarkowska and Łogińska 2015; Krejtz, Szarkowska and Krejtz 2013).

A glance is defined as each movement of the eyes to a particular AOI. It includes the saccade entering the AOI, as well as all fixations in the AOI before the eyes start leaving the AOI. Glance duration is then the sum of the duration of the saccade entering the AOI and all the fixations in the AOI until the last fixation before the eyes leave the AOI. It is reported as a raw number as glance count and is also an uncommon measure (D’Ydewalle and De Bruycker 2007; Hefer 2011; Szarkowska et al. 2011; Szarkowska et al. 2016).

A revisit is defined as a fixation that returns to a previously visited AOI where its preceding fixation was outside of the AOI. It is reported both as a raw numerical count, revisit count, and as a duration in milliseconds, revisit duration (Hefer 2011; Kruger et al. forthcoming).

Sequences of fixations are reported as consecutive fixations and scanpaths, whereby a series of fixations is made in a given direction, or regressive fixations, which denote that the fixations are made in the opposite direction to what is expected for linear reading (for example, left to right is required for English). The latter is also reported as regressions and regressive eye movements.
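Several of these secondary measures can be computed from the same list of detected fixations. The sketch below derives dwell time, time to first fixation and revisit count for a rectangular subtitle AOI, continuing the illustrative fixation format used above. Note two simplifications: it sums fixation durations only, whereas gaze time as defined here also includes saccades, and it ignores the fact that a subtitle AOI is itself time-limited on screen.

```python
# Illustrative AOI measures from a list of fixations, each a dict with
# "start_ms", "duration_ms", "x" and "y" (as produced by the sketch above).

def aoi_measures(fixations, aoi):
    """aoi = (left, top, right, bottom) in screen pixels."""
    left, top, right, bottom = aoi
    in_aoi = [left <= f["x"] <= right and top <= f["y"] <= bottom
              for f in fixations]

    # Dwell time: total duration of all fixations inside the AOI
    # (a simplification of gaze time, which also includes saccades).
    dwell_time = sum(f["duration_ms"]
                     for f, hit in zip(fixations, in_aoi) if hit)

    # Time to first fixation, measured from the onset of the first fixation.
    hits = [f["start_ms"] for f, hit in zip(fixations, in_aoi) if hit]
    ttff = (hits[0] - fixations[0]["start_ms"]) if hits else None

    # Entries: fixations inside the AOI whose predecessor was outside (or that
    # open the recording inside); every entry after the first is a revisit.
    entries = sum(1 for k, hit in enumerate(in_aoi)
                  if hit and (k == 0 or not in_aoi[k - 1]))
    revisits = max(entries - 1, 0)

    return {"dwell_time_ms": dwell_time,
            "time_to_first_fixation_ms": ttff,
            "revisit_count": revisits}
```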


The measurement of these sequences of fixations is widely reported in raw number, duration of time and in the amplitude of the movements (Bisson et al. 2014; D’Ydewalle and De Bruycker 2007; Fox 2016; Ghia 2012; Mäkisalo, Gowases and Pietinen 2013; Perego et al. 2010; Specker 2008).

Skipped subtitles are also widely reported and denote the number of subtitles where no fixation was present in the subtitle AOI. This raw numerical count also enables the counting of unskipped subtitles, also reported as fixated subtitles, that is, where at least one fixation is present in the subtitle AOI (Bisson et al. 2014; Caffrey 2008b, 2009, 2012; D’Ydewalle and De Bruycker 2007; Krejtz, Szarkowska and Krejtz 2013; Szarkowska et al. 2016).

Word fixation probability is a calculation of the likelihood of a fixation landing on a given word, as determined by the number of participants in the study’s sample who fixated on the word, divided by the total number of participants. It is expressed as a percentage or ratio and is also reported as hit count (Caffrey 2008b, 2009, 2012; D’Ydewalle and De Bruycker 2007; Krejtz, Szarkowska and Łogińska 2015; Krejtz, Szarkowska and Krejtz 2013). Similarly, but specifically designed to account for the dynamic nature of subtitles and captions, the reading index for dynamic texts (RIDT) is a formula that calculates the degree to which a particular subtitle was processed based on the number of unique fixations per standard word in a given subtitle, while also taking into account regressions (negative saccades), refixations on the same word and saccade length (Kruger and Steyn 2014; Kruger et al. forthcoming).

Finally, all of these primary and secondary measures can also be expressed using a percentage. Such percentages are used to indicate relativity (e.g. number of words fixated on within a subtitle) and relative change (e.g. more fixations in an experimental condition) and are expressed in percentage points.
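Word fixation probability, as defined above, reduces to a simple proportion across participants; a minimal sketch with invented data follows. The RIDT itself combines several further quantities and is specified in Kruger and Steyn (2014), so it is not reproduced here.

```python
# Word fixation probability: the share of participants who fixated a word at
# least once. The data layout (participant -> set of fixated words) and the
# participant IDs and words are invented for illustration.

fixated_words = {
    "p01": {"the", "detective", "knows"},
    "p02": {"detective", "knows"},
    "p03": {"the", "detective"},
}

def word_fixation_probability(word, data):
    hits = sum(1 for words in data.values() if word in words)
    return hits / len(data)

print(word_fixation_probability("detective", fixated_words))  # 1.0
print(word_fixation_probability("knows", fixated_words))      # 0.666...
```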

3 Constructs in empirical eye-tracking studies of subtitling and captioning

Building upon the previous description of individual eye-tracking measures, this section examines these measures in context and describes how they are used to construct, test and refine more complex constructs that are essential to empirical subtitling and captioning research and indeed relevant to other disciplines, including media and screen studies. These principally concern the constructs of visual attention, cognitive load and psychological immersion. We acknowledge that studies in cognitive science and cognitive psychology use eye tracking to examine bimodal and multimodal processing. While such studies have been foundational in developing eye-tracking methodologies, they have not been included here as they do not study film or subtitles or captions per se; rather, they focus on the cognitive processing of stimuli across different modes.


Visual attention

Our cognitive resources are finite and must continually respond to external stimuli. In the case of subtitling and captioning, the stimuli are both static and dynamic in nature, and information from the stimuli is presented visually and auditorily. The visual aspect of the audiovisual stimuli in subtitles was the primary interest of early researchers in this field as they sought to uncover what viewers were looking at when watching subtitled media and why. The use of eye-trackers in such investigations was a departure from the traditional self-report measures used, which were based on post hoc questionnaires aiming to gain insight into viewers’ thoughts and opinions about what they had just watched. As an online measure, eye tracking allowed researchers unprecedented access into the eyes, and arguably the minds, of the viewers under study.

In determining what viewers were looking at, the construct of visual attention was borrowed from the cognitive and psychological sciences, where it had already been widely developed with and without eye-tracking methodologies. Visual attention refers to ‘the cognitive operations that allow us to efficiently deal with this capacity problem by selecting relevant information and by filtering out irrelevant information’ (McMains and Kastner 2009: 4296). Visual attention can be considered as both top-down and bottom-up processing: top-down processing as viewers can actively decide to allocate their visual attention to a given stimulus at a given time, and bottom-up processing as a stimulus can attract our visual attention in an involuntary way due to its traits (e.g. movement, luminosity, colour, position, etc.). The allocation of visual attention during subtitle and caption processing is a combination of these two processes, and this distinction has implications for research design due to the rich, multimodal and dynamic nature of subtitled and captioned media.

Visual attention remains of primary interest in subtitling and captioning research as it allows researchers to directly observe where viewers are allocating their visual attention, for example, by using fixation count and related measures, and how they are distributing and switching their attention between the subtitles and the other elements on the screen. Of the constructs studied in eye-tracking research in this area, visual attention is the most directly observable and requires the least interference due to the ability of the eye-tracker to directly and overtly observe primary measures such as fixation count and fixation duration, and calculate secondary measures such as time to first fixation, dwell time, and so on.


Central to discussions of visual attention is the eye–mind hypothesis (Just and Carpenter 1980), a foundational aspect of eye tracking which asserts that there is a close relationship between what the eyes are fixating upon and what the mind is engaged with, or what the brain is processing, in that ‘there is no appreciable lag between what is fixated and what is processed’ (Just and Carpenter 1980: 331). While there are numerous reported exceptions to this hypothesis (for discussions, see Hoffman 1998; Anderson, Bothell and Douglass 2004; Irwin 2004; Staub and Rayner 2007), subtitling and captioning research implicitly adopts the eye–mind hypothesis without a critical discussion of its limitations and how they have been addressed or accounted for in the study at hand. Of most relevance here is Irwin (2004), who shows how cognitive processing is not limited to periods during which the eyes are fixating, but also occurs while they are in the rapid movements of saccades. Here too, contextual factors such as word frequency and contextual predictability come into play (see Staub and Rayner 2007). Electroencephalography (discussed in the following paragraphs) may address some limitations of the eye–mind hypothesis in the context of subtitled and captioned media by providing a more direct and potentially holistic account of viewers’ eye-tracking behaviour vis-à-vis cognitive and affective processing.

From numerous eye-tracking studies in the cognitive and psychological sciences, it is widely accepted that our eyes jump from one location to another while we read rather than sustaining a smooth, uninterrupted path (for a review, see Staub and Rayner 2007). This saccadic movement typically lasts between 20 and 40 milliseconds and occurs between stationary periods when the eye is fixating. Fixations typically range from 200 to 300 milliseconds during reading and allow the information to be extracted from the text contained in the subtitle or caption. Fixations are relative to word length, and short words are frequently skipped during a reading task, such as reading for comprehension (Staub and Rayner 2007). Skipped words comprise approximately 20 per cent of the words we encounter, and they are usually identifiable from orthographic and phonological information obtained in the fixation on the previous word (for a review, see Rayner 1998; Drieghe 2008). Our perceptual span while reading is limited to approximately 7–8 characters to the right of the fixation and 2–3 characters to the left of the fixation in Indo-European languages and does not allow us to extract meaningful information from text on the line above or below the word we are reading (e.g. Pollatsek et al. 1993). Given that much of the reading process is automated (see Gunter and Friederici 1999), and the depth of processing is shallow by default (see Bentin, Kutas and Hillyard 1993), unless readers are instructed to read for a specific task that would require a deeper level of processing (e.g. reading for comprehension versus reading for translation in Jakobsen and Jensen 2008), only a limited amount of information can be obtained in each fixation, where contextual factors such as word frequency and contextual predictability come into play (see Staub and Rayner 2007).


The use of eye-movement data to study visual attention can extend beyond subtitle and caption AOIs into the entirety of the visual and auditory experience of all sorts of media – from advertisements to TV shows and feature films. We can, for example, use fixation counts and their duration to investigate face and scene recognition in multimodal environments and to identify the attention allocated to specific characters and their interactions as well as to discrete elements on screen (e.g. for objects or logos of interest, see Kruger 2012). While the study of visual attention in subtitled and captioned media has provided us with invaluable insights into understanding and improving how viewers process such media, the exclusion of audio is a costly and systematic weakness. Auditory attention has not yet been incorporated into subtitling and captioning research; however, recent work in cognitive science is directly applicable (for a review, see McGarrigle et al. 2014), especially when the auditory aspect of subtitle and caption processing is central to the research question.

Cognitive load

Originating from the psychological sciences, cognitive load theory (Plass, Moreno and Brünken 2010) posits that we have a limited working memory and processing capacity. Applied to the context of the multimodal processing of subtitled and captioned media, this has been used to measure the cognitive load of different text presentation styles, speeds and placements (for a review, see Kalyuga 2012). This theoretical framework can be directly applied to our field of research as it brings a well-established inventory of media design principles that have been shown to ease the cognitive processing of multimodal information and reduce problems in processing due to overload and redundancy. Indeed, the combination of the visual and auditory channels has been shown to maximize processing in a variety of scenarios (see Mayer and Moreno 2003; Kruger and Doherty 2016).

As it is a theoretical construct, cognitive load can only be indirectly assessed using eye tracking, with or without self-report and task performance measures, due to its multidimensionality. Based on Mayer and Moreno (2003), this multidimensionality is formalized as: intrinsic load (inherent to the viewer and the task); extraneous load (aspects of the viewer experience that impose cognitive effort); and germane load (the level of cognitive activity that is required for successful viewing to take place). As such, the main application of cognitive load theory in subtitling research should be to determine and reduce the extraneous load in order to avoid cognitive overload for the viewer, thereby optimizing the cognitive capacity to be assigned to germane load, thus ensuring that the viewer can become immersed in the multimodal experience.

While research in subtitling and captioning has widely used the construct of cognitive load, it has yet to fully incorporate the multidimensionality of cognitive load that has since developed in the cognitive and psychological sciences.


Kruger and Doherty (2016) provide a discussion of the applications of cognitive load theory in subtitling research and outline methodologies that can capture its multidimensionality.

Despite being an indirectly observable theoretical construct, the evidence base of cognitive load theory makes it relevant and useful not only to subtitling research, but also to media studies and media psychology more generally. Researchers can use eye tracking to investigate the efficiency of any form or combination of media stimuli, such as scene cuts, graphics, focus and sound, in terms of the cognitive processing demands made on the viewer. Recent work, for example, has begun to compare the cognitive processing demands of unsubtitled film, film with conventional subtitles and film with dynamic subtitles in order to establish the optimal placement and aesthetic characteristics of subtitles to facilitate psychological immersion (see Kruger et al. forthcoming). Manipulating the placement of subtitles (in a process termed ‘integrated titles’ by the authors) can reduce cognitive load. This placement is determined using eye-tracking data to calculate the most effective way to reduce spatial and temporal distance between the subtitle and its surrounding audiovisual elements, thus enabling the viewer to integrate the multimodal information with greater ease than with traditional subtitles positioned at the bottom of the screen.
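The placement algorithm itself is not set out here, but the underlying logic of using gaze data to minimize the distance between a subtitle and the surrounding action might be sketched as follows. The candidate positions, data format and distance criterion are illustrative assumptions, not the authors' published method.

```python
# Hypothetical sketch of gaze-informed subtitle placement: choose, from a few
# candidate regions, the one closest to where viewers actually looked during
# the shot. Candidate anchors and the fixation format are assumptions.

CANDIDATES = {          # (x, y) anchor points in a 1920x1080 frame
    "bottom-centre": (960, 1000),
    "top-centre": (960, 80),
    "left-third": (640, 540),
    "right-third": (1280, 540),
}

def place_subtitle(fixations, candidates=CANDIDATES):
    """Pick the candidate position nearest the centroid of shot fixations."""
    cx = sum(f["x"] for f in fixations) / len(fixations)
    cy = sum(f["y"] for f in fixations) / len(fixations)
    return min(candidates,
               key=lambda name: (candidates[name][0] - cx) ** 2 +
                                (candidates[name][1] - cy) ** 2)
```

A real workflow would also weigh temporal factors (when the subtitle appears relative to cuts) and aesthetic constraints such as occlusion of faces, which this sketch ignores.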

Psychological immersion

Linked to cognitive load theory, a current trend in research in this area is exploring psychological immersion in film and television using eye tracking as part of a mixed-method approach. This approach also showcases the next generation of subtitling research, in which cognitive approaches with experimental methods are coming to the fore in order to triangulate the traditional offline measures of questionnaires and self-report psychometrics with the now established use of online eye-tracking measures. Current research is also beginning to incorporate electroencephalography (EEG) into eye-tracking methodologies in order to provide a more direct measurement of immersion and cognitive load. EEG measures the brain’s electrical responses to dynamic stimuli with high temporal resolution and may bypass some of the limitations of eye tracking.

Within the cognitive and psychological sciences, immersion in mediated environments such as film, television, fiction and virtual reality is typically measured through offline self-report scales on dimensions including presence, transportation, identification, perceived realism and flow, and by using online measures of heart rate, galvanic skin response and pupil dilation.


In parallel to this, cognitive science research has identified the prefrontal cortex as being in control of cortical processing, or executive processing, whereas the posterior parietal cortex has been identified as being active when we engage our imagination, particularly when we imagine ourselves as someone else, as being somewhere else or at another time, in other words, when we become immersed in a story world (Shimamura 2013).

The initial findings of a recent study (Kruger, Soto-Sanfiel and Doherty 2017) suggest that same-language subtitles do not decrease the self-reported psychological immersion (measured using the self-report scales listed previously) of first-language viewers and enable second-language viewers to access the narrative, thereby increasing self-reported immersion among these viewers. While this study does not use eye tracking or EEG in its methodology, the authors propose links to how the same study can be replicated and correlated with online measures. EEG methods have since been used in a follow-up study to measure such effects in watching movies and in listening to music, although the link to immersion has not yet been validated sufficiently (see Kruger et al. 2016).

This recent research on immersion in subtitled media is directly relevant to cognitive media studies and, of course, to the impact that subtitles and captions may have on the viewer’s ability to become immersed in a movie. However, the methodological challenges in using EEG to measure such effects have proven quite significant (as indicated by topics in recent conferences in the field, such as the 2016 TRA&CO Translation and Cognition Symposium at the University of Mainz), and only one study has published evidence to verify the usefulness of EEG in this regard. Kruger et al. (2016) report that the initial validation of a mixed-method approach using EEG found that subtitles significantly increased immersion as measured using a combination of objective (online EEG) and subjective (offline self-reporting, as detailed earlier) measures. If such an approach can be further validated, it may provide an exciting and fruitful avenue for future research into the empirical study of the cognitive and affective processing of media more generally, in and indeed outside of the context of subtitling research.

Researchers and content creators of all media types could harness such an approach to measure a viewer’s immersion in a film, television programme or video game in real time to examine the effect of manipulations to the multimodal environment of the medium (e.g. cinematography, graphics, virtual and augmented reality and 3D) and create immersive and interactive multimodal media that can adapt to the viewers’ cognitive and affective responses. A recent example is the application of this approach to the use of subtitles in educational settings described in Kruger and Doherty (2016), wherein students studying through their first or second language can benefit from subtitled educational content and, with sufficient technical resources, the educational content can adapt to students’ needs in real time thanks to the high temporal resolution of the online measures of eye tracking and EEG.


4 Eyes forward, looking ahead

We have now seen how individual eye-tracking measures have been used in isolation and in combination to measure the constructs of visual attention, cognitive load and psychological immersion in the context of subtitle and caption processing. We now move to discuss the general methodological limitations of these eye-tracking methodologies in this field of research. We argue that these limitations typically lie in the inconsistency of terminology (e.g. gaze time versus dwell time) and operational definitions (e.g. cognitive load versus cognitive effort), limited sampling and participant profiling, limited data analysis and statistical testing, and inconsistent reporting of the technical specifications of eye-trackers and their related software. We believe that these limitations then lead to a lack of standardization in interpreting and reporting results, and an overall lack of adherence to established best practice for using eye tracking as reported in other disciplines where eye tracking has been well established, for example, in the cognitive and psychological sciences from which we have borrowed these eye-tracking measures and constructs.

We neither attempt to argue that eye tracking is a panacea for research into subtitle and caption processing, nor propose that all research designs using eye tracking have the limitations and caveats discussed in the previous sections. Used alone, eye movements cannot reveal, for example, whether a comprehension difficulty in a subtitle leads to a comprehension failure or success, and cannot reveal the thoughts, feelings or opinions of the participant regarding what is being viewed (see Dyer and Pink 2015). There is also the danger that we may infer incorrectly from eye-movement data. We may, for example, infer that more and longer fixations on one word or face denote a comprehension or recognition difficulty. Conversely, we may infer that the word or face is eliciting an affective response or attracting the viewer’s visual attention due to its characteristics. All scenarios may be possible, but it is through careful research design informed by evidence-based theoretical models and best practice in using eye tracking in tandem with other methods that we can reduce the risk of potentially invalid inferences.

We believe that the refinement of the eye-tracking methodologies used in empirical studies of the moving image, and its text, will add significant value and integrity to this growing body of increasingly interdisciplinary research. We propose that this be accomplished by improving research design and publication standards in this field of research.

Improving research design and publication quality


In order to avoid an obfuscation of qualitative and quantitative research designs, it is necessary to include concise and explicit descriptions of the study’s design, as this may affect the assumptions and interpretation of results. The study’s research questions should be made explicit, and previous literature in the relevant disciplines should be used to inform them and to construct hypotheses. By establishing this link, researchers can draw upon previous work and identify the appropriate primary and secondary eye-tracking measures relevant to their research question. This also enables the review of literature to become more concise and meaningful and enables the researcher to differentiate between complex constructs like visual attention, cognitive load and psychological immersion.

Given the challenges of eye tracking as described in the previous sections, it is valuable to employ mixed-method research designs that contain a triangulation of individual methods to overcome the limitations of each individual method. For discussions of this issue in the context of eye tracking, see Saldanha and O’Brien (2014), Batty, Perkins and Sita (2015), Dyer and Pink (2015) and Smith (2015). A practical guide to eye-tracking methodologies not specific to this field of research can be found in Duchowski (2007).

It is critical that we report the level of sampling required for the target population, as this may affect the study’s results if sampling is not sufficiently mapped to the research design and research questions being asked. Profiling participants using relevant and established psychometrics will also add to the descriptive and inferential power of the data, helping to establish language proficiency, working memory capacity, viewing habits and preferences, and immersive tendencies. It is also important to assess the quality of eye-tracking data using holistic methods that combine known parameters of fixation durations with the metrics used in the proprietary eye-tracking software (see Doherty 2012). While the metrics used in the proprietary software are typically well supported by empirical research (see manufacturers’ documentation), they are often left unjustified or are not described by researchers. Poor-quality data may skew results and lead to incorrect findings in individual projects and entire programmes of research.

Moving towards the publication of research findings, we propose that researchers report statistical results, especially effect sizes, in line with the American Psychological Association’s standards (see APA 2009). As eye-tracking methodologies and the concomitant statistical analyses are relatively new to researchers in this field, there are often gaps in understanding between researchers, and indeed between researchers and publication reviewers and editors. The APA standards are internationally accepted as best practice and are the most relevant to this field of research. Further to this, as eye-tracking equipment can vary significantly, it is also important that we are consistent in providing explicit definitions of the thresholds and technical specifications of eye-tracking equipment, such as temporal resolution.
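As a small example of the kind of reporting this entails, an effect size such as Cohen's d can be computed and reported alongside the usual test statistic; the condition labels and values below are invented for illustration.

```python
# Cohen's d for mean fixation duration (ms) in two hypothetical subtitle
# conditions; APA-style reporting would give d alongside the test statistic.
from statistics import mean, stdev

conventional = [231.0, 248.0, 215.0, 262.0, 240.0]
integrated   = [204.0, 219.0, 198.0, 225.0, 210.0]

def cohens_d(a, b):
    na, nb = len(a), len(b)
    pooled_sd = (((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2)
                 / (na + nb - 2)) ** 0.5
    return (mean(a) - mean(b)) / pooled_sd

print(round(cohens_d(conventional, integrated), 2))
```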


This should be coupled with concise descriptions of the materials used in the study to avoid misunderstanding and enable replication studies.

In closing, we believe that the application of eye tracking to the study of subtitling research is an excellent example of the challenges and rewards of conducting interdisciplinary research that combines tightly controlled experimental research with real-world, ecologically valid studies. It is in borrowing eye tracking from the cognitive and psychological sciences and taking the time to employ it effectively in researching subtitle and caption processing that we can add significant value to the study of the moving image by combining the critical lens and creativity of the humanities with the rigour of the sciences, especially in understanding more about visual attention, cognitive load and psychological immersion from an interdisciplinary perspective. In improving the quality and integrity of subtitling research, we can resolve long-standing questions of cognitive processing and reception and show the value of our work to other disciplines, and to industry and public stakeholders, given the growing usage of subtitling and captioning in all forms of media. Moving forward, it is crucial that the evidence behind the study of subtitles and captioning be placed in the best position to influence professional practice and international policy (e.g. quality assessment models for manual and automatic subtitling workflows, professional training, professional best practice, evidence-based standards for same-language subtitles and captioning for the deaf and hard-of-hearing, and encouraging the usage of subtitling and captioning in order to improve accessibility for linguistically and cognitively diverse viewers) as we become increasingly influenced by language and translation technologies.

References

Akahori, W., T. Hirai, S. Kawamura and S. Morishima (2016), ‘Region-of-Interest-Based Subtitle Placement Using Eye-Tracking Data of Multiple Viewers’, Proceedings of the ACM International Conference on Interactive Experiences for TV and Online Video, 123–28, Chicago: Association for Computing Machinery.
American Psychological Association (2009), Publication Manual of the American Psychological Association, 6th Edition, Washington, DC: American Psychological Association (APA).
Anderson, J. R., D. Bothell and S. Douglass (2004), ‘Eye Movements do not Reflect Retrieval Processes: Limits of the Eye-Mind Hypothesis’, Psychological Science, 15 (4): 225–31.
Batty, C., C. Perkins and J. Sita (2015), ‘How We Came to Eye Tracking Animation: A Cross-Disciplinary Approach to Researching the Moving Image’, Refractory: A Journal of Entertainment Media, 25. Available online: http://refractory.unimelb.edu.au/2015/02/06/batty-perkins-sita/ (accessed 15 October 2016).


Bentin, S., M. Kutas and S. A. Hillyard (1993), ‘Electrophysiological Evidence for Task Effects on Semantic Priming in Auditory Word Processing’, Psychophysiology, 30: 161–9.
Bisson, M. J., W. J. Van Heuven, K. Conklin and R. J. Tunney (2014), ‘Processing of Native and Foreign Language Subtitles in Films: An Eye Tracking Study’, Applied Psycholinguistics, 35 (2): 399–418.
Brontë, C. (2000), Jane Eyre, Oxford: Oxford University Press.
Caffrey, C. (2008a), ‘Viewer Perception of Visual Nonverbal Cues in Subtitled TV Anime’, European Journal of English Studies, 12 (2): 163–78.
Caffrey, C. (2008b), ‘Using Pupillometric, Fixation-Based and Subjective Measures to Measure the Processing Effort Experienced When Viewing Subtitled TV Anime with Pop-Up Gloss’, Copenhagen Studies in Language, 36: 125–44.
Caffrey, C. (2009), ‘Relevant Abuse? Investigating the Effects of an Abusive Subtitling Procedure on the Perception of TV Anime Using Eye Tracker and Questionnaire’, PhD diss., Dublin City University.
Caffrey, C. (2012), ‘Using an Eye-Tracking Tool to Measure the Effects of Experimental Subtitling Procedures on Viewer Perception of Subtitled AV Content’, in E. Perego (ed.), Eye Tracking in Audio-Visual Translation, 223–58, Rome: Aracne.
Cambra, C., O. Penacchio, N. Silvestre and A. Leal (2014), ‘Visual Attention to Subtitles When Viewing a Cartoon by Deaf and Hearing Children: An Eye-Tracking Pilot Study’, Perspectives: Studies in Translatology, 22 (4): 607–17.
Doherty, S. (2012), ‘Investigating the Effects of Controlled Language on the Reading and Comprehension of Machine Translated Texts: A Mixed-Methods Approach Using Eye Tracking’, PhD diss., Dublin City University.
Doherty, S. (2016), ‘The Impact of Translation Technologies on the Process and Product of Translation’, International Journal of Communication, 10: 947–69.
Drieghe, D. (2008), ‘Foveal Processing and Word Skipping during Reading’, Psychonomic Bulletin and Review, 15: 856–60.
Duchowski, A. (2007), Eye Tracking Methodology: Theory and Practice, New York: Springer.
Dyer, A. and S. Pink (2015), ‘Movement, Attention and Movies: The Possibilities and Limitations of Eye Tracking?’, Refractory: A Journal of Entertainment Media, 25. Available online: http://refractory.unimelb.edu.au/2015/02/06/dyer-pink/ (accessed 15 October 2016).
D’Ydewalle, G. and J. Van Rensbergen (1989), ‘Developmental Studies of Text-Picture Interactions in the Perception of Animated Cartoons with Text’, Advances in Psychology, 58: 233–48.
D’Ydewalle, G. and I. Gielen (1992), ‘Attention Allocation with Overlapping Sound, Image, and Text’, in K. Rayner (ed.), Eye Movements and Visual Cognition, 415–27, New York: Springer.
D’Ydewalle, G. and M. Van de Poel (1999), ‘Incidental Foreign-Language Acquisition by Children Watching Subtitled Television Programs’, Journal of Psycholinguistic Research, 28 (3): 227–44.
D’Ydewalle, G. and W. De Bruycker (2007), ‘Eye Movements of Children and Adults While Reading Television Subtitles’, European Psychologist, 12 (3): 196–205.


D’Ydewalle, G., P. Muylle and J. Van Rensbergen (1985), ‘Attention Shifts in Partially Redundant Information Situations’, in R. Groner, G. W. McConkie and C. Menz (eds), Eye Movements and Human Information Processing, 375–84, Amsterdam: Elsevier Science.
D’Ydewalle, G., J. Van Rensbergen and J. Pollet (1987), ‘Reading a Message When the Same Message is Available Auditorily in Another Language: The Case of Subtitling’, in J. K. O’Regan and A. Levy-Schoen (eds), Eye Movements from Physiology to Cognition, 313–21, Amsterdam: Elsevier.
D’Ydewalle, G., C. Praet, K. Verfaillie and J. Van Rensbergen (1991), ‘Watching Subtitled Television: Automatic Reading Behavior’, Communication Research, 18 (5): 650–66.
Fernández, A., A. Matamala and A. Vilaró (2014), ‘The Reception of Subtitled Colloquial Language in Catalan: An Eye-Tracking Exploratory Study’, International Journal of Applied Linguistics, 11: 63–80.
Fox, W. (2016), ‘Integrated Titles: An Improved Viewing Experience?’, in S. Hansen-Schirra and S. Grucza (eds), Eye Tracking and Applied Linguistics, 5–30, Berlin: Language Science Press.
Ghia, E. (2012), ‘The Impact of Translation Strategies on Subtitle Reading’, in E. Perego (ed.), Eye Tracking in Audio-Visual Translation, 155–82, Rome: Aracne.
Gunter, T. C. and A. D. Friederici (1999), ‘Concerning the Automaticity of Syntactic Processing’, Psychophysiology, 36 (1): 126–37.
Hefer, E. (2011), ‘Reading Second Language Subtitles: A Case Study of South African Viewers Reading in their Native Language and L2-English’, PhD diss., North-West University, Vanderbijlpark.
Hefer, E. (2013), ‘Reading Second Language Subtitles: A Case Study of Afrikaans Viewers Reading in Afrikaans and English’, Perspectives: Studies in Translatology, 21 (1): 22–41.
Hoffman, J. E. (1998), ‘Visual Attention and Eye Movements’, Attention, 31: 119–53.
Irwin, D. E. (2004), ‘Fixation Location and Fixation Duration as Indices of Cognitive Processing’, in J. Henderson and F. Ferreira (eds), The Interface of Language, Vision, and Action: Eye Movements and the Visual World, 105–34, New York: Psychology Press.
Jakobsen, A. L. and K. T. Jensen (2008), ‘Eye Movement Behaviour Across Four Different Types of Reading Task’, Copenhagen Studies in Language, 36: 103–24.
Jensema, C. J., R. S. Danturthi and R. Burch (2000), ‘Time Spent Viewing Captions on Television Programs’, American Annals of the Deaf, 145 (5): 464–8.
Jensema, C. J., S. El Sharkawy, R. S. Danturthi, R. Burch and D. Hsu (2000), ‘Eye Movement Patterns of Captioned Television Viewers’, American Annals of the Deaf, 145 (3): 275–85.
Just, M. A. and P. A. Carpenter (1980), ‘A Theory of Reading: From Eye Fixations to Comprehension’, Psychological Review, 87 (4): 329–54.
Kalyuga, S. (2012), ‘Instructional Benefits of Spoken Words: A Review of Cognitive Load Factors’, Educational Research Review, 7 (2): 145–59.
Krejtz, I., A. Szarkowska and K. Krejtz (2013), ‘The Effects of Shot Changes on Eye Movements in Subtitling’, Journal of Eye Movement Research, 6 (5): 1–12.
Krejtz, I., A. Szarkowska and M. Łogińska (2015), ‘Reading Function and Content Words in Subtitled Videos’, Journal of Deaf Studies and Deaf Education, 21 (2): 222–32.

62

SEEING INTO SCREENS

Kruger, J. L. (2012), ‘Making Meaning in AVT: Eye Tracking and Viewer Construction of Narrative’, Perspectives: Studies in Translatology, 20 (1): 67–86. Kruger, J. L. (2013), ‘Subtitles in the Classroom: Balancing the Benefits of Dual Coding with the Cost of Increased Cognitive Load’, Journal for Language Teaching, 47 (1): 29–53. Kruger, J. L. (2016), ‘Psycholinguistics and Audio-Visual Translation’, Target. International Journal of Translation Studies, 28 (2): 276–87. Kruger, J. L. (forthcoming), ‘Eye Tracking in Audio-Visual Translation Research’, in L. Perez-Gonzalez (ed.), The Routledge Handbook of Audio-visual Translation Studies, London: Routledge. Kruger, J. L. and F. Steyn (2014), ‘Subtitles and Eye Tracking: Reading and Performance’, Reading Research Quarterly, 49 (1): 105–20. Kruger, J. L. and S. Doherty (2016), ‘Measuring Cognitive Load in the Presence of Educational Video: Towards a Multimodal Methodology’ Australasian Journal of Educational Technology, 32 (6): 19–31. Kruger, J. L., E. Hefer and G. Matthew (2013), ‘Measuring the Impact of Subtitles on Cognitive Load: Eye Tracking and Dynamic Audio-Visual Texts’, Proceedings of the 2013 Conference on Eye Tracking South Africa, 62–6, Cape Town: Association for Computing Machinery. Kruger, J. L., A. Szarkowska and I. Krejtz (2015), ‘Subtitles on the Moving Image: An Overview of Eye Tracking Studies’, Refractory: A Journal of Entertainment Media, 25. Available online: http://refractory.unimelb.edu.au/2015/02/07/ kruger-szarkowska-krejtz/ (accessed 15 October 2016). Kruger, J. L., M. T. Soto-Sanfiel and S. Doherty (2017), ‘Original Language Subtitles: Their Effects on the Native and Foreign Viewer’, Comunicar, 50 (1): 23–32. Kruger, J. L., M. T. Soto-Sanfiel, S. Doherty and R. Ibrahim (2016), ‘Towards a Cognitive Audio-Visual Translatology: Subtitles and Embodied Cognition’, in. R. Muñoz (ed.), Reembedding Translation Process Research, 71–194, London: John Benjamins. Kruger, J. L., S. Doherty, W. Fox and P. de Lissa (forthcoming), ‘Multimodal Measurement of Cognitive Load during Subtitle Processing: Same-Language Subtitles for Foreign Language Viewers’, in I. Lacruz and R. Jääskeläinen (eds), New Directions in Cognitive and Empirical Translation Process Research, London: John Benjamins. Künzli, A. and M. Ehrensberger-Dow (2011), ‘Innovative Subtitling: A Reception Study’, in C. Alvstad, A. Hild and E. Tiselius, (eds), Methods and Strategies of Process Research, 187–200, Amsterdam: John Benjamins. Liversedge, S., I. Gilchrist and S. Everling (2011), The Oxford Handbook of Eye Movements, Oxford: Oxford University Press. Mäkisalo, J. L., T. Gowases and S. Pietinen (2013), ‘Using Eye Tracking to Study the Effect of Badly Synchronized Subtitles on the Gaze Paths of Television Viewers’, New Voices in Translation Studies, 10 (1): 72–86. Mayer, R. E. and R. Moreno (2003), ‘Nine Ways to Reduce Cognitive Load in Multimedia Learning’, Educational Psychologist, 38 (1): 43–52. McGarrigle, R., K. J. Munro, P. Dawes, A. J. Stewart, D. R. Moore, J. G. Barry and S. Amitay (2014), ‘Listening Effort and Fatigue: What Exactly are we Measuring? A British Society of Audiology Cognition in Hearing Special Interest Group White Paper’, International Journal of Audiology, 53: 433–40.

EYE TRACKING IN SUBTITLING AND CAPTIONING RESEARCH

63

McMains, S. and S. Kastner (2009), ‘Visual Attention’, Encyclopaedia of Neuroscience, 4296–4302, Berlin: Springer. Moran, S. (2012), ‘The Effect of Linguistic Variation on Subtitle Reception’, in E. Perego, (ed.), Eye Tracking in Audio-Visual Translation, 183–222, Rome: Aracne. Perego, E. and E. Ghia (2011), ‘Subtitle Consumption According to Eye Tracking Data: An Acquisition Perspective’, in L. Incalcaterra, M. Biscio and M. A. Ní Mhainnín (eds), Audio-Visual Translation: Subtitles and Subtitling: Theory and Practice, 177–96, London: Peter Lang. Perego, E., F. Del Missier, M. Porta and M. Mosconi (2010), ‘The Cognitive Effectiveness of Subtitle Processing’, Media Psychology, 13 (3): 243–72. Plass, J. L., R. Moreno and R. Brünken (2010), Cognitive Load Theory, London: Cambridge University Press. Pollatsek, A., G. E. Raney, L. LaGasse and K. Rayner (1993), ‘The Use of Information Below Fixation in Reading and in Visual Search’, Canadian Journal of Experimental Psychology, 47: 179–200. Rajendran, D. J., A. T. Duchowski, P. Orero, J. Martínez and P. Romero-Fresco (2013), ‘Effects of Text Chunking on Subtitling: A Quantitative and Qualitative Examination’, Perspectives: Studies in Translatology, 21 (1): 5–21. Rayner, K. (1998), ‘Eye Movements in Reading and Information Processing: 20 Years of Research’, Psychological Bulletin, 124 (3): 372–422. Romero-Fresco, P. (2009), ‘More Haste Less Speed: Edited Versus Verbatim Respoken Subtitles’, Vigo International Journal of Applied Linguistics, 6 (1): 109–33. Romero-Fresco, P. (2010), ‘Standing on Quicksand: Hearing Viewers’ Comprehension and Reading Patterns of Respoken Subtitles for the News’, Approaches to Translation Studies, 32: 175-95. Saldanha, G. and S. O’Brien (2014), Research Methodologies in Translation Studies. London: Routledge. Sasamoto, R., M. O’Hagan and S. Doherty (2016), ‘Telop, Affect, and Media Design: A Multimodal Analysis of a Japanese TV Program’, Television and New Media, 19 (3): 1–16. Sherlock Holmes: A Game of Shadows (2011), [Film] Dir. Guy Ritchie, USA: Village Roadshow Pictures/Silver Pictures/Wigram Productions/Lin Pictures. Shimamura, A.P. (2013), Psychocinematics: Exploring Cognition at the Movies, Oxford: Oxford University Press. Smith, T. J. (2015), ‘Read, Watch, Listen: A Commentary on Eye Tracking and Moving Images’, Refractory: A Journal of Entertainment Media, 25. Available online: http://refractory.unimelb.edu.au/2015/02/07/smith/ (accessed 15 October 2016). Smith, A. C., P. Monaghan, and F. Huettig (2017), ‘The Multimodal Nature of Spoken Word Processing in the Visual World: Testing the Predictions of Alternative Models of Multimodal Integration’, Journal of Memory and Language, 93: 276–303. Specker, E. A. (2008), ‘L1/L2 Eye Movement Reading of Closed Captioning: A Multimodal Analysis of Multimodal Use’, PhD diss., University of Arizona, Tucson.

64

SEEING INTO SCREENS

Staub, A. and K. Rayner (2007), ‘Eye Movements and On-Line Comprehension Processes’, in G. Gaskell (ed.), Oxford Handbook of Psycholinguistics, 327–42, Oxford: Oxford University Press. Szarkowska, A., I. Krejtz, Z. Klyszejko and A. Wieczorek (2011), ‘Verbatim, Standard, or Edited? Reading Patterns of Different Captioning Styles Among Deaf, Hard of Hearing, and Hearing Viewers’, American Annals of the Deaf, 156 (4): 363–78. Szarkowska, A., I. Krejtz, O. Pilipczuk, L. Łukasz Dutka and J. L. Kruger (2016), ‘The Effects of Text Editing and Subtitle Presentation Rate on the Comprehension and Reading Patterns of Interlingual and Intralingual Subtitles Among Deaf, Hard of Hearing and Hearing Viewers’, Across Languages and Cultures, 17 (2): 183–204. Winke, P., S. Gass and T. Sydorenko (2013), ‘Factors Influencing the Use of Captions by Foreign Language Learners: An Eye-Tracking Study’, The Modern Language Journal, 97 (1): 254–75.

4 Into the Film with Music: Measuring Eyeblinks to Explore the Role of Film Music in Emotional Arousal and Narrative Transportation

Ann-Kristin Wallengren and Alexander Strukelj

Introduction

In this chapter, we explore whether eyeblinks can be used as a measurement of spectators’ narrative transportation into a cinematic narrative. In our experiment, we show that spectators’ eyeblinks increase when watching film clips together with what might be called congruent music. Increased eyeblinks indicate an emotional arousal that motivates attention, which in turn is a prerequisite for narrative transportation. Hence, our thesis is that a specific interaction between film and music facilitates and increases narrative transportation in cinematic narratives. Indeed, eyeblinks might be an effective method for answering deeper questions about the captivating audiovisual experience that is film. Ever since the first writings on film music almost a hundred years ago (e.g. Erdmann and Becce 1927), score and soundtrack have been seen to strongly
affect the way viewers engage with film narrative. In modern film music theory, informed by psychoanalysis and semiotics (e.g. Gorbman 1987), the idea persists of film music as a transporter into the narrative, although now more in the guise of a hypnotizing function. Claudia Gorbman describes film music as a more or less subconscious part of the cinematic narrative: ‘As we follow a movie’s narrative in the perceptual foreground, music inhabits the shadows of our attention…. Film music is also the hypnotist that lulls us into a hyperreceptive state, in order that we receive and identify with the movie’s fantasy’ (Gorbman 2000: 234). Emotions and absorption into movies in general are issues that have attracted extensive research by cognitivist scholars (Plantinga and Smith 1999). Jeff Smith has addressed the question from the perspective of film music, and he outlines a theory of film music and emotions where, among other things, he discusses the concept of ‘affective congruence’ between film and music and its ability to cause a physiological response in a viewer, thus producing a high degree of emotional engagement (Smith 1999: 160–6). In film music psychology research of recent decades, scholars from the discipline of psychology have maintained the theory of film music’s ability to increase the spectator’s engagement in film. Both Annabel Cohen (2010), the scholar who has done the most extensive research on film music psychology in general, and Siu-Lan Tan have put forward the argument that music has the ‘power to further immerse the viewer by heightening emotions’ (Tan et al. 2010: 273). More recently, research on video and computer game playing has come to a similar conclusion (Nacke and Grimshaw 2011). Music’s ability to increase engagement with a film’s narrative, to direct the spectator’s attention to crucial narrative elements and to evoke or represent feelings is thus regarded as its essential function. No other stylistic parameter is separately ascribed such affective power. The question of engagement, absorption or narrative transportation has, however, turned out to be quite complicated to explore empirically. Only recently has eye tracking been applied to study the influence of music on a film narrative.

Previous research: The context of the experiment

Despite the longevity and centrality of the idea about the power of music to foster and enhance engagement with a film, few experiments and studies have been devoted to the topic, arguably because of methodological obstacles in measuring the phenomenon. Annabel Cohen writes: ‘My review of film music research (Cohen 2000) discussed the assumed contribution of music to the sense of engagement in a film. The discussion was speculative. In spite of the enormous contribution of music to this aspect of film experience, there was
no research on this topic’ (Cohen 2014: 109). Cohen herself has contributed a number of studies to this area, but these are based on post-screening questionnaires, not on eye tracking or other psychophysiological measures. These studies indicate that absorption or engagement (which are the terms used by Cohen) is higher when music accompanies the film clips compared to clips without musical accompaniment (Cohen 2014). Her congruence–association model (CAM) is intended to show the complex relationships between music, sound and visuals, and builds not least on the assumption that certain fundamental relationships between film and music, i.e. congruencies or ‘appropriate music’, help the spectator to form a ‘coherent working narrative’, which promotes absorption into a film’s diegetic world (Cohen 2013: 34–5). There is, however, interesting research conducted in related and contextual areas, approaches that at times coincide. One area is the sizeable body of research that explores interpretation, and, as the editors write in the concluding article of the recent Psychology of Music in Multimedia: ‘The most common line of research in multimedia investigates how sound influences the interpretation of visual images’ (Tan et al. 2013: 395). These studies are in most cases conducted using quantitative methods such as questionnaires, which are arguably more subjective than the data produced by eye-tracking research. Another type of empirical study that is also important as a background for our experimental research is on the emotional or attentive effects of film music, measured by psychophysiological methods. However, research into the area of psychophysiological responses to music is for the most part done on autonomous music; very few studies on film music interaction exist. An early experiment was performed by Julian Thayer and Robert Levenson, presented in their article ‘Effects of Music on Psychophysiological Responses to a Stressful Film’ (1983), where they measured skin conductance, heartbeat and other psychophysiological attributes to determine how different musical compositions for a specific film increased or decreased stressful responses to it. This sparsely populated research area is highlighted by Robert J. Ellis and Robert F. Simons, who express that it is both ‘surprising and unfortunate’ that more studies have not been done using physiological methods exploring the relationship between music and film (Ellis and Simons 2005: 17). Their own experiments in this area measured relationships between film and music, using questionnaires and physiological responses such as skin conductance, facial EMG recordings of muscle activity and heart rate. The physiological responses showed that the interaction between music and film was more complex than seemed to be the case when considering the results from the questionnaire-based research. In an article reporting an eye tracking experiment that explores Eisenstein’s theory about how music might guide the viewer’s attention in the battle scene in Alexander Nevsky (1938), psychologist Tim Smith, known for eye tracking studies on cinematic continuity editing, writes that there is a ‘lack
of empirical evidence of non-diegetic, non-spatialized audiovisual effects on the direction of visual attention’ (2014: 93). In recent years, however, the method of eye tracking has been applied to a few experiments on film music. Such pioneering studies have been conducted by Auer et al. (2012), Song et al. (2013), Mera and Stumpf (2014) and Wallengren and Strukelj (2015). Eye tracking experiments on sound in general have been carried out, for example, by Coutrot et al. (2012, 2014), Shimamura et al. (2015) and Rassell et al. (2016). These studies do not examine engagement and absorption but concentrate on how music can direct our attention and thereby influence how we understand and interpret a film. The experiment by Mera and Stumpf also included a questionnaire asking about what emotions the spectators experienced. Using pupil dilation as a measure, Redmond and Sita (2013) showed in a pilot study that dilation was greater when the participants watched clips with sound, which indicated that they experienced some kind of increased emotional response. Only during the last few years has research on film music’s role in narrative transportation emerged. To our knowledge, to date, there is only one study that directly deals with film music and transportation, although it does not deploy eye tracking but a questionnaire developed especially for measuring narrative transportation. This study by psychologists Kristi Costabile and Amanda Terman, published in 2013, showed unambiguous and significant results: ‘Participants reported greater transportation into the film and greater agreement with film-relevant beliefs when soundtrack was presented, but only when music was congruent with the film’s affective tone’ (Costabile and Terman 2013: 322). A related experiment was performed by Strick et al. (2015), who studied the role of music in transportation during different advertisements. There are a few studies on film narratives and narrative transportation (see van Laer et al. 2014) as well as studies on how eyeblinks can be used as a way of measuring narrative transportation (Nomura et al. 2015), but these do not consider the impact of music. These different works confirm the current interest in the phenomenon of narrative transportation and the mechanisms that lead to this experience. Hence, in summary, we have previous studies that: (1) measure the emotional arousal created by film music using psychophysiological methods; (2) show how film music facilitates narrative transportation using post-screening questionnaires; and (3) show how eyeblinks can be used to measure narrative transportation, although not using music as a stimulus. Our experiment combines all of these different approaches, using blinks as an aspect of eye tracking to measure emotional activity and narrative transportation as influenced by music. In an article from 2005 on how affective pictures influence blink rate, Tracy et al. write that eyeblink measurement, among other methods, is on the rise because eyeblinking is a behaviour that is ‘robust, standardized and directly measurable’ (Tracy et al. 2005: 45). Furthermore, in an evaluation in 2013, Raudonis et al.
showed that eye tracking measures such as pupil dilation, fixations and different viewing patterns can be used to detect and measure emotions quite accurately. In their experiment, they suggested that ‘it is possible to detect the certain emotional state with up to 90 per cent of recognition accuracy’ (Raudonis et al. 2013: 84). Now, before we move on to our experiment, it seems necessary to define some central concepts and results.

Definitions: Congruence and narrative transportation

As we have seen in the previous summary review, the concept of congruence seems vital for narrative transportation or absorption into a film. Both Annabel Cohen’s CAM model and Costabile and Terman’s experiment find that it is congruence that helps the spectator in creating a working narrative and fostering narrative transportation. Congruence is a problematic notion that has haunted much of the writing on film and music relationships, and is often problematized in modern film music theory (Ireland 2015). However, in psychological studies, the concept is still broadly applied, and there are admittedly plentiful occasions in a film narrative where some kind of congruence can be observed. Congruence is understood in the same way by different scholars; however, the terminology can vary. Structural, temporal or formal congruence is when music and visuals coincide in structure, for example, fast music set to fast motions, music that moves upward on a scale to upward visual movements, or music that follows a dramatic structure through, for instance, stingers or dynamics. Also, congruence sometimes coincides with synchronization. Semantic or associative congruence means congruence in meaning or ‘similarity between auditory and visual affective impressions’ (Iwamiya 2013: 141). Musicologist David Ireland, among others, has criticized the way these concepts are used and writes that ‘structural or temporal congruence may connote fit, while semantic or mood congruence may more readily imply notions of appropriateness’ (Ireland 2015: 49). Ireland (2012) argues that these ideas reflect a binary and reductive way of thinking in dichotomies, which does not have the ability to capture the many complexities that characterize the relationship between film and music. Nevertheless, for the experiment under analysis in this chapter, we needed music and film scenes that related to each other in this simple but distinct way in order to obtain basic results upon which to build in further work. The notion of congruence must definitely be problematized, but for the sake of this primary experiment, this simplified use of congruence between film and music was needed. The idea of narrative transportation has been studied as a psychological phenomenon, mostly in connection with written texts, but also increasingly
with advertisements and film. The concept was first used by Gerrig (1993; see van Laer et al. 2014), who in the context of novels used travelling as a metaphor for reading. Narrative transportation is the feeling of being ‘carried away’ by the narrative, or, in more detail, is described ‘as a convergent mental process in which the real world is temporarily left behind in favor of the world created by the narrative’ (Costabile and Terman 2013: 317). The theory was initially developed by psychologists Melanie Green and Timothy Brock in the so-called transportation-imagery model (Green and Brock 2002), and is systematically summed up by Tom van Laer et al. (2014). According to these researchers, narrative transportation is accomplished through two components: empathy with the characters in the narrative, and mental imagery, that is, ‘the story plot activates his or her imagination, which leads him or her to experience suspended reality during story reception’ (van Laer et al. 2014: 799–800). Emotion and attention are vital for narrative transportation to occur, in that ‘the story’s emotional flow, or the series of emotional shifts throughout the piece, can provide the motivating force for continued attention. Further, this attention may help sustain narrative transportation and engagement during the course of a story’ (Nabi and Green 2015: 138). Different studies have explored what elements in a narrative, no matter which type, can increase transportation, and these are identifiable characters, imaginable plot and verisimilitude (van Laer et al. 2014). The more these are fulfilled, the more narrative transportation increases. We will argue that further important elements for transportation are style or formal qualities, which, in the present case, highlight music as a central tool in travelling into and through a film narrative.

Current experiment: Material, method, procedure and results

In our eye tracking experiment, we measured eyeblink frequency, fixation durations, saccadic amplitude and scanpath lengths. Eyeblink frequency is a common measure in eye tracking (Holmqvist et al. 2011). Researchers usually distinguish between three types of blinks: ‘spontaneous, reflexive, and voluntary … Under the conditions of everyday life, a human blinks approximately 15 times per minute’ (Shin et al. 2015: 1). In our experiment, we measured the spontaneous eyeblink, that is, we did not try to trigger eyeblinks, for example, with the use of air puffs that would produce a reflexive blink. The basic question for the experiment was whether different music influences the way that the spectator looks at a film, but above all, whether eyeblinks can actually be used to measure emotional arousal and narrative transportation. The basic method in the experiment was to use different music with the same film scenes or sequences, a standard
commutation that is also the most frequently used method in film music experiments (Tan et al. 2013).
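In remote eye tracking data, blink events are typically identified as short runs of samples during which the pupil is lost. The following minimal sketch in R (the software used for our analyses below) illustrates only the general idea; it is not the procedure of the SMI software used in this experiment, and the data frame, column name and duration thresholds are assumptions for illustration.

    # Count blinks in one clip's raw samples (hypothetical data frame
    # 'samples' with a logical column 'lost' marking samples where the
    # pupil was not detected), recorded at 120 Hz.
    detect_blinks <- function(lost, hz = 120, min_ms = 50, max_ms = 500) {
      runs <- rle(lost)                    # runs of detected/lost samples
      dur_ms <- runs$lengths / hz * 1000   # run durations in milliseconds
      # a blink = a run of lost samples of plausible blink duration
      sum(runs$values & dur_ms >= min_ms & dur_ms <= max_ms)
    }
    # Blinks per minute for a 1.5-minute clip:
    # detect_blinks(samples$lost) / 1.5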

Material

We used three film excerpts as stimuli, systematically chosen according to formal and stylistic principles such as genre, editing, framing, camera movement, movement in picture and people or no people in the scene. Each excerpt was 1.5 minutes long and depicted one demarcated situation without cutting to other scenes or contrasting situations. Because of the shortness of the clips, they only offered a glimpse of a narrative; however, despite the shortness and limited narration, they turned out to be sufficient for our purposes. As spectators strongly react to movement (Mital et al. 2011), we chose an excerpt from Ronin (John Frankenheimer 1998), an action movie with a car chase in city streets, very fast and spatially confusing editing, framing changes and a lot of movement in the shots. As a contrast to this action-loaded scene, we chose a scene from Songs from the Second Floor (Roy Andersson 2000), which was set in a hospital with an old man sitting in a bed with some nurses standing around him. This scene employed a static camera and deep focus, involved no editing, and had no close-ups but only medium long shots, with very little movement in the scene. As the third film of choice, we used one with no human actors – Winged Migration (Jacques Perrin 2001). This scene involved horses and different birds moving around the space, with classical editing (analytical editing and shots/reaction shots) in the iconic surroundings of Monument Valley. Even as a documentary, it is moulded into the form of a narrative, and uses the stylistic devices characteristic of a narrative film. Almost all documentaries use a narrative form, which should not be confused with fiction (Bordwell and Thompson 2012). To analyse these three film scenes, we set up three different sound conditions: the first played with no music, the second played with ‘action music’ and the third played with ‘soft music’. The music was chosen from open (film) music libraries on the internet (see bibliography); thus, the music can be regarded as in some way tested, as the music in the libraries is generically sorted according to musical form, style and expression. These types of music libraries are often used for all kinds of film productions, even rather prestigious ones and for different genres, and have existed in different forms since the 1910s (Erdmann and Becce 1927). The ‘action music’ that we chose has a fast tempo with ostinato strings as base, is supported by drums in fast tempo and is slightly syncopated. The melody is first played by strings, sometimes in dissonance, and after a short while distorted electric guitars are added and the drums play faster and in syncopated rhythm. The
‘soft’ music is played as a piano solo, in slow tempo and regular rhythm with a very plain pastoral melody line and uncomplicated accompaniment. The film excerpts originally had no music, only natural sounds and very sparse dialogue. Therefore, music could be overlaid on the original soundscape and dialogue of the clips. This manipulation of the material might seem methodologically problematic, but is the only choice if commutation is to be used as a guiding method.

Method

Participants and design: Fifty-five native speakers of Swedish (28 females; M = 26.1 years of age) with normal or corrected-to-normal vision took part in the experiment. All participants watched four movie clips: one clip that functioned as a filler and was shown in an unaltered form for all participants, followed by the three clips used in the analysis. The unaltered filler clip was shown to allow the participants to adjust their mind for film viewing, and to familiarize the participants with the experimental procedure. The following three clips were accompanied by one of three sound conditions, namely fast-paced action music, slow piano music, or no music at all. The original, diegetic sound design was, as noted, in place in all conditions. The presentation was counterbalanced between the participants using three separate lists, as sketched below. The participants were naïve to the purpose of the experiment.

Apparatus: Binocular eye movement data was recorded with the smart binocular setting at 120 Hertz, with a RED-m remote video-based eye-tracker from SensoMotoric Instruments (Teltow, Germany). The recordings were conducted in the Digital Classroom at the Humanities Laboratory, Lund University. The distance between the participant and the monitor was approximately 600 millimetres. Stimuli were displayed on a Dell P2210 22ʺ widescreen LCD display at a resolution of 1680 × 1050 pixels (475 × 300 millimetres, approximately 43.2 × 28.1 degrees of visual angle) with a refresh rate of 60 Hertz. SMI iView RED-m (1.0.62) controlled the eye tracking system, while SMI Experiment Center (3.1.116) controlled stimulus presentation, five-point calibration and four-point validation. Calibration was repeated until less than 1.0 degrees of deviation was achieved in both the horizontal and the vertical direction.

Procedure: Participants were greeted and told they would take part in a study evaluating film viewing. They were instructed that they would view four movie clips. They were seated in front of the eye-tracker, and the equipment was calibrated. They watched four clips, starting with the filler clip, followed by the three clips with experimental manipulations. After the eye tracking recordings, participants were told about the research interest of the study, and signed a consent form.
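The counterbalancing can be made concrete with a minimal R sketch; the 3 × 3 Latin square below is our illustrative assumption about how the three lists pair clips with sound conditions, as the chapter does not spell out the assignment.

    # Three lists rotating the clip-sound pairings so that, across the
    # lists, every clip appears in every sound condition exactly once.
    clips  <- c("Ronin", "Songs from the Second Floor", "Winged Migration")
    sounds <- c("no music", "action music", "soft music")
    lists <- lapply(0:2, function(shift) {
      data.frame(clip = clips,
                 sound = sounds[(seq_along(clips) + shift - 1) %% 3 + 1])
    })
    # Participants are then divided evenly across lists[[1]], lists[[2]] and lists[[3]].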

Data analysis and measures: Binocular eye tracking data was transformed into a single average in iView RED-m. The raw data was transformed into fixations and saccades using velocity-based high-speed event detection with default settings (peak velocity threshold = 40°/s; minimum fixation duration = 50 milliseconds) in SMI BeGaze (3.1.152). The data processing, statistical analyses and plots were performed using the statistical software R. All statistical models used mixed-effects regression analyses. Several eye tracking variables were analysed: average fixation durations (the average lengths of a participant’s fixations), saccadic amplitudes (the distance between two fixations), scanpath lengths (the sum of all saccade lengths during one clip) and eyeblinks (the total number of blinks – a measure of cognitive load or arousal).
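In outline, such a mixed-effects analysis might look as follows in R. This is a sketch only: the chapter names R and mixed-effects regression but not the exact model formula or packages, so the lme4/lmerTest call, the data frame blinks and its columns are assumptions for illustration.

    library(lmerTest)  # fits lme4 models and adds Satterthwaite p values
    # Hypothetical data: one row per participant x clip, with columns
    # subject, clip, music and n_blinks (the blink count for that clip).
    m <- lmer(n_blinks ~ clip * music + (1 | subject), data = blinks)
    summary(m)  # fixed-effect Estimates with SE and t, as reported below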

Results

Figure 4.1 shows the mean number of blinks for each clip, depending on its sound condition. The average number of blinks in Songs from the Second Floor in the control condition (no music but with original sound) was high compared to the other two movies. These differences are marginally significant (difference between Songs from the Second Floor and Winged Migration: Estimate = –14.804, SE = 7.580, t = –1.953, p = 0.055; difference between Songs from the Second Floor and Ronin: Estimate = –13.471, SE = 7.393, t = –1.822, p = 0.0727). Furthermore, there were significant effects of background music on the number of blinks. This effect shows up in the chart as the two high bars of Winged Migration with soft music (Estimate = 26.586, SE = 12.565, t = 2.116, p = 0.0384), and Ronin with action music (Estimate = 28.141, SE = 12.565, t = 2.240, p = 0.0288).

FIGURE 4.1 Chart showing number of blinks in different music conditions.
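A chart of this kind can be produced directly from the per-participant blink counts. The following R sketch assumes the same hypothetical blinks data frame as above and is illustrative only, not the plotting code actually used.

    library(ggplot2)
    # Mean number of blinks per clip, grouped by sound condition (cf. Figure 4.1)
    ggplot(blinks, aes(x = clip, y = n_blinks, fill = music)) +
      stat_summary(fun = mean, geom = "bar", position = position_dodge()) +
      labs(x = "Film clip", y = "Mean number of blinks", fill = "Sound condition")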

Discussion of results

In an experiment we conducted in 2013, we obtained significant results, most notably for fixation durations (Wallengren and Strukelj 2015). But in this second experiment, no significant results were achieved for fixation durations, saccadic amplitude or scanpath lengths. However, significant results were found for eyeblinks. The eye tracking results showed significantly more blinks when Winged Migration was viewed with soft music and when Ronin was viewed with action music, as well as results trending towards significance when Songs from the Second Floor was viewed with no added music. As mentioned earlier, we blink at a frequency of around fifteen times per minute in everyday life, and this was also the case in this experiment, with variations from ten to twenty blinks per minute in most of the conditions. When watching Songs from the Second Floor with no music but with its original sound design, Winged Migration with soft music, and Ronin with action music, the blink rate more than doubled. So, if narrative transportation involves congruence, which is indeed the result of the experiment performed by Costabile and Terman (2013), what could be said about our results? To begin with, we saw a result that on the surface implied that we had measured congruence: the bars in the diagram seemed to show that the spectators reacted to congruence. The action-loaded scene from Ronin produced a high frequency of eyeblinks when shown together with the action music. The congruency between the fast-narrated film clip, with rather confusing editing and a lot of movement in the picture, and fast-tempo music with its diverse instruments and slightly syncopated rhythm is straightforward and obvious. The spectators also reacted strongly when watching the film clip from Winged Migration. Here we see birds flying slowly over Monument Valley and horses galloping in a vast landscape. This film excerpt is characterized by classical, invisible editing, with establishing shots alternating with medium long shots and a couple of close shots, accompanied by piano music with a pastoral tone where the simple melody and accompaniment seem to reflect and represent what is going on visually in the scene. Here, too, the congruence between the shots and the music seems rather indisputable. Interestingly, a high eyeblink frequency was evoked when Songs from the Second Floor was watched without any music at all. Nothing really happens in this scene, neither formally nor from the point of view of the plot. Action music set to this scene would probably bewilder the audience and make them start to anticipate another type of narrative coming later, and it seems that the soft music with its pastoral and slightly romantic mood does not relate in a direct manner to the gloomy
hospital scene with its peculiar atmosphere: they seem rather incompatible regarding congruence in its different aspects. Congruence is, as mentioned, a contested notion, but nevertheless, here the results showed a significantly higher frequency of eyeblinks when the music was fitting in these rather simple and direct ways, or when there was no music at all. Accordingly, the spectators seemed to be sensitive to an aesthetic experience where the interaction between film and music was in some way congruent and they reacted physiologically to that. Thus, perhaps this is a way to measure how well film and music interact. But how is this relationship reflected in eyeblinks? Before we enter more deeply into this question, more needs to be said about the non-significant results for the other measures. Do these results indicate that music does not influence the way we watch movies, that is, that music does not have a strong potential to guide our attention? Our eye movements are guided by many things, one of which is motion (Mital et al. 2011), and this makes moving images natural stimuli for eye tracking experiments. However, a problem arises when using commercial movies as stimuli, with recent research arguing that classical narration tends to induce strong attentional synchrony (Loschky et al. 2015; Smith and Mital 2013). In other words, people look at the same things in the same way because of the analytical editing, use of light and colour, placement of the actors, and so on (Bordwell and Thompson 2012). Loschky et al. (2015) even use the term ‘the tyranny of film’ to express this determining relationship. As classically narrated films, both fiction and documentary, are created to strongly guide the viewer’s gaze, this will likely make it more difficult to find eye movement differences caused by music when investigating film viewing. Manipulations need to be strong in order to produce effects, and removing all sound from a film clip has been shown to significantly affect eye movements (e.g. Coutrot et al. 2012). However, we argue that these results cannot be generalized to the effects of film music, as the viewing in the experiment by Coutrot et al. with total diegetic silence is quite far removed from viewing a film in a natural setting. This is why we created our manipulation to resemble artistically/commercially produced film by only adding music to otherwise unaltered film clips. However, this also made it ‘weaker’ from an experimental standpoint. In classical narration, which guides the audience’s eyes with all the formal and stylistic elements available, it seems rather remarkable if music, at least when played continuously and with no special marking, would significantly alter the way we look at the film as measured with eye tracking, even if music can influence the way we understand and experience the cinematic story world. However, more studies on how film music can guide our attention are needed using stimuli that are carefully chosen and set up. With these results at hand, we need to further explore why the spectators reacted with increased eyeblinks when the film clips and music were congruent, and how this relates to questions about eyeblinks and emotion.

Eyeblinks and emotion in film musical contexts

The literature on music psychology agrees about bodily responses to music: ‘It is apparent that there is overwhelming support for the notion that listening to music affects physiological responses. Furthermore, there is general, though by no means unanimous, support for the notion that stimulative and sedative music tend to increase and decrease physiological responses, respectively’ (Hodges 2012: 125). Shin et al. state that ‘spontaneous eyeblinks can reflect cognitive states or emotions in humans [italics by the authors]’ (Shin et al. 2015: 1). It seems vital here to distinguish between different concepts used in connection with ‘emotions’, not least because the different concepts are often used in connection with film music, but also to elucidate the differences between cognitive load and emotional arousal in connection with eyeblinks. Music psychologists Patrik Juslin and John Sloboda observe that ‘researchers have tended to use words such as affect, emotion, feeling, and mood in different ways, which has made communication and integration difficult. Sometimes, researchers have used the same term to refer to different things’ (Juslin and Sloboda 2010: 9). Hence, the word affect is an umbrella term covering all evaluative states in general. Emotion refers to a ‘quite brief but intense affective reaction’ and ‘focus on specific “objects”’, whereas arousal is ‘the physical activation of the autonomic nervous system. Physiological arousal is one of the components of an emotional response’. And mood, so often used in connection with film music, is defined as ‘affective states that are lower in intensity than emotions, that do not have a clear “object”’ (Juslin and Sloboda 2010: 11). In connection with eyeblinks, Shin et al. distinguish between cognitive load and emotional arousal. So, what is the difference? Cognitive or mental load in experimental situations often implies that the persons involved in the experiment get a task to solve, for example, to detect an ‘X’ in a picture. In an article about eye movements and attention, Mulvey and Heubner (2012) argue that increases in mental load lead to decreases in eyeblink rates, but, on the other hand, that emotional activation and stress lead to increases in eyeblink rates. So, when the participants in our experiment look at specific combinations of sound and film, emotions are evoked. These emotions trigger a physiological arousal, which is manifested in higher eyeblink rates. It is important to note that in the literature about the psychology of music, as exemplified by our previous mention of Hodges, Juslin and Sloboda, the discussions almost exclusively centre on physiological responses to autonomous music. But in our case, we have a multimodal narrative, an audiovisual interaction that apparently influences the spectator differently. That music used in film does not necessarily affect the body in the same way as autonomous music is evident in our experiment. Fast autonomous music is supposed to activate our nervous system in a way that would
produce more eyeblinks, in accordance with studies conducted by music psychologists. However, in our experiment, the action music condition affected the spectators only when played together with a scene that had a corresponding fast set of aesthetics including rapid editing. A fast tempo score when played over the other scenes did not have any influence at all according to the eye measurements. Also, the soft and slow music, which is supposed to slow down physiological responses, induced high eyeblink rates when played with a scene that could be deemed as congruent in the way we discussed previously. Hence, certain interactions between music and visuals seem to be the decisive factor, rather than the tempo, rhythm, instrumentation or melody of the music itself. This is a characteristic of multimodal narration. That is, the same music can have different physiological effects depending on the visual context; a physiological response triggered by, for example, action music, can be neutralized by a scene that is narrated slowly and formally. Even more interesting is the no-music condition in our experiment. No music is usually used as a control condition; that is, no results are expected from this condition but it is used as a research baseline. Here, however, musical ‘silence’ played over one of the clips, Songs from the Second Floor, triggered eyeblinks, which clearly demonstrates that the interaction or cooperation between music and film functions in a way that differs from autonomous music. Musical silence functions here in congruence with the film clip in the same way as music does. The relationships between film and music are discussed from a methodological position in the aforementioned article by Ellis and Simons (2005). In their experiment, they used both questionnaires and different physiological measurements and their results differed between these methods. They argue that self-report measures, such as questionnaires, seem to demonstrate an additive and quite simple relationship between film and music (sad pictures with sad music made the experience doubly sad), whereas the physiological measures showed more complex results; that is, the result of the interaction is not always foreseeable and can go beyond the mere ‘addition’ of emotions and experiences: ‘Results indicate a fairly straight-forward, additive relationship in terms of emotion self report. The modulating role of music on physiological reactions to film, however, was more complex. This study corroborates previous evidence regarding the subjective experience of viewing images with music. Physiological evidence, however, suggests that the interactions between music and film not always are predictable’ (Ellis and Simons 2005: 15). Ellis and Simons conclude that physiological measures seem to be more useful in researching the interaction between film and music in that complexity, which is characteristic of film narratives, is put forward. The whole is something different from the sum of its parts. Hence, our results showed that the spectators reacted with emotional arousal to certain interactions between film and music. We will go on to see how these discussions connect to narrative transportation as emotional
arousal is argued to be a prerequisite for engagement, and we will also briefly discuss the viewing consequences that follow from narrative transportation.

Narrative transportation and film music

Nacke and Grimshaw suggest in an article about game playing, sound and music that ‘it is emotion that drives attention and this has an important effect upon both engagement with the game and immersion’ (2011: 266). In a recent article about narrative persuasion, Nabi and Green (2015) highlight the importance of emotion for engagement in the narrative world, and state that this is an understudied area. As mentioned before, some studies show that eyeblinks can be a useful instrument for measuring transportation. In an article from 2015, ‘Emotionally Excited Eyeblink-Rate Variability Predicts an Experience of Transportation into the Narrative World’, the authors look at variable blinks, which means blinks that vary between emotional arousal (increasing blinks) and cognitive load (decreasing blinks), with videotapes of traditional Japanese performances as stimuli (Nomura et al. 2015). They found that emotional excitement is connected to attention, that is, emotional excitement in turn motivates attention. Hence, they concluded that ‘high emotional excitement and high eyeblink variability would predict that audience members experience more transportation’ (Nomura et al. 2015: 2). Note that they were studying the variability of eyeblinks, which implies that the material used must be much longer than is usual in experiments, including ours. The first study that used the theory of transportation to explore the effects of film music was, as mentioned, performed by psychologists Kristi Costabile and Amanda Terman (2013). Only self-report questionnaires were used, with scales measuring the subjective estimation of transportation, and the participants were also asked about how they valued the protagonist and to what degree they agreed with the ideas articulated in the film. They used short films as stimuli and manipulated the music to find out whether congruence had any influence. They found that only when music and film were congruent did music foster both narrative transportation and film-relevant beliefs, that is, the audience found the narrative more persuasive. The capacity of music to facilitate and increase transportation into a narrative is interesting in connection with advertisements. Strick et al. (2015) have studied the role of music during different advertisements. Their results are not entirely relevant to our experiment as they are connected to questions concerning attitudes towards the advertisements, but one result was that music with a strong emotional expression, ‘moving music’ as they call it, increases transportation, which for marketing could be interesting as one effect of transportation is that our critical thoughts decrease.

Our experiment combines these related studies, and it could be the first that uses physiological measurements, namely eyeblinks, to show how music facilitates narrative transportation into a film. In theories on narrative transportation, the elements judged to increase the degree of emotional arousal and transportation are identifiable characters, imaginable plot and verisimilitude. It seems relevant and important that these elements are complemented by the way that the narrative is formally and stylistically constructed. A high eyeblink rate indicates both emotional arousal and narrative transportation, and the two are tightly knitted together and work in interaction. As we have seen, narrative transportation implies that our critical attitude diminishes, and as van Laer et al. also write, our ‘story-consistent beliefs increase’ (van Laer et al. 2014: 805). Hence, if we are not transported, that would imply that we regard the narrative with more critical distance. Does this mean that we could be more easily influenced by what the narrative tells us when we are transported, or that we would keep our critical distance when we are not? Green et al. (2008) assert that narrative transportation involves the audience being more entertained by their experience, but also, concurring with the discussion by Strick et al., they argue that ‘transportation is a key mechanism of narrative persuasion. Transported individuals are more likely to adopt attitudes and beliefs implied by a narrative, even a fictional narrative’ (Green et al. 2008: 514). This could mean that when transported, we are more open to impacts such as advertisement marketing, and perhaps even ideological influences. From these thoughts, it is not such a big step to Eisenstein’s theory of counterpoint juxtaposition, such as the relationship between sound/music and pro-filmic action, which he argued raised the audience’s level of consciousness regarding the societal inequities that he staged in his films (Eisenstein 1949). This theory of counterpoint juxtaposition could hypothetically imply that when music and film are interrelated in a (congruent) way that fosters transportation, viewers accept the narration without reflection. Eisenstein contends that film and music should work in a contrapuntal way (which is a notion as problematic as congruence) that wakes the viewer up and does not leave them ‘hypnotized’ by the cinematic narration, which he connects to ideology and ideological indoctrination. In the extension of our results and similar findings, ideological analysis could be one of the consequences that Kevin Donnelly (2015) refers to when he writes that empirical findings could even bring us back to psychoanalysis, apparatus theory and ideological analysis.

Concluding discussion

Up until now, most empirical experiments on film or film music have revolved around questions about attention and understanding; it has perhaps
seemed a little farfetched that eye tracking in its different forms could be capable of measuring emotional arousal and narrative transportation. In the discussions in this chapter, we have seen how film, music, emotional arousal and narrative transportation are connected and how these connections can be revealed by eye tracking research. There are, however, theoretical problems that need to be addressed in further research. One of them is, again, the question of congruence. The notions of ‘congruence’ and ‘non-congruence’ between music and film rest on the old and tenacious idea that music can relate to the visuals either in a parallel way or in counterpoint, which in turn is a theory that assumes a visual dominance or primacy. Film sound theoretician Michel Chion criticizes the simple dichotomy that underpins the concept of counterpoint, which produces incongruence: it is ‘imprisoning us in a binary logic that has only remotely to do with how cinema works’ (Chion 1994: 38). By adhering to this thinking, ‘more complicated, subtle or abstract mismatches are not always adequately explained’ (Ireland 2012: 100). Incongruence in this discourse is often described as ‘a lack of shared properties within a multimedia relationship’ (Ireland 2015: 49), and what many researchers deem as incongruent, something that is mostly valued as negative, are often actually moments in the audiovisual narrative that are the most memorable, captivating and interesting. For an experimental situation like the one discussed in this chapter, with the use of film excerpts and music with very clear-cut stylistic features, the notions of congruence and incongruence can be useful. In further experiments, however, it would be interesting to use full-length films in their original form and with their original music to find out in a more adequate way the exact relationships between film and music that trigger emotional arousal and/or narrative transportation. Certainly, ‘incongruent’ combinations of film and music will also induce higher emotional arousal and narrative transportation. It seems strange that an aesthetic experience should cease because of what is considered incongruence. Anticipating music, very often used by classical narration, may not seem to be congruent with the actual scene, but could instead be ‘congruent’ with the narrative logic; it might, so to speak, be congruent with something that will occur later in the film. Film has temporal, spatial, dramatic and narrative layers, with music not necessarily on the same layer as the visuals in a given scene. What film and music produce together could be totally different from what they signify in isolation. Such magical moments are elegantly expressed by film music scholar Kay Dickinson: ‘At other times, these raw, previously unrelated elements seem to conjure an alchemical transformation, and a fresh approach to understanding materializes, one that could never have been imagined beforehand’ (Dickinson 2008: 13).

Acknowledgements

We want to express our gratitude to Joost van de Weijer for his work with the statistical results.

References

Auer, K., O. Vitouch, S. Koreimann, G. Pesjak, G. Leitner and M. Hitz (2012), ‘When Music Drives Vision: Influences of Film Music on Viewer’s Eye Movements’, Conference Proceedings of the 12th International Conference on Music Perception and Cognition and the 8th Triennial Conference of the European Society for the Cognitive Sciences of Music, July 23–28: 73–76.
Bordwell, D. and K. Thompson (2012), Film Art: An Introduction, 10th edn, London: McGraw-Hill.
Chion, M. (1994), Audio-Vision: Sound on Screen, translated by C. Gorbman, New York: Columbia University Press.
Cohen, A. J. (2010), ‘Music as a Source of Emotion in Film’, in P. N. Juslin and J. A. Sloboda (eds), Handbook of Music and Emotion: Theory, Research, Applications, 879–908, Oxford: Oxford University Press.
Cohen, A. J. (2013), ‘Congruence-Association Model of Music and Multimedia: Origin and Evolution’, in S-L. Tan, A. J. Cohen, S. D. Lipscomb and R. A. Kendall (eds), The Psychology of Music in Multimedia, 17–48, Oxford: Oxford University Press.
Cohen, A. J. (2014), ‘Film Music from the Perspective of Cognitive Science’, in D. Neumeyer (ed.), The Oxford Handbook of Film Music Studies, 96–131, Oxford: Oxford University Press.
Costabile, K. A. and A. W. Terman (2013), ‘Effects of Film Music on Psychological Transportation and Narrative Persuasion’, Basic and Applied Social Psychology, 35 (3): 316–24.
Coutrot, A., N. Guyader and A. Caplier (2012), ‘Influence of Soundtrack on Eye Movements During Video Exploration’, Journal of Eye Movement Research, 5 (4): 1–10.
Coutrot, A., N. Guyader, G. Ionescu and A. Caplier (2014), ‘Video Viewing: Do Auditory Salient Events Capture Visual Attention?’, Annals of Telecommunications, 69: 89–97.
Dickinson, K. (2008), Off Key: When Film and Music Won’t Work Together, Oxford: Oxford University Press.
Donnelly, K. J. (2015), ‘Accessing the Mind’s Eye and Ear: What Might Lab Experiments Tell Us About Film Music?’, Music and the Moving Image, 8 (2): 25–34.
Eisenstein, S. (1949), Film Form: Essays in Film Theory, edited and translated by Jay Leyda, New York: Harcourt, Brace and Company.
Ellis, R. J. and R. Simons (2005), ‘The Impact of Music on Subjective and Physiological Indices of Emotion While Viewing Films’, Psychomusicology: A Journal of Research in Music Cognition, 19 (1): 15–40.
Erdmann, H. and G. Becce (1927), Allgemeines Handbuch der Film-Musik, Berlin: Schlesinger’sche Buch u. Musikhandlung. Gorbman, C. (1987), Unheard Melodies: Narrative Film Music, Bloomington: Indiana University Press. Gorbman, C. (2000), ‘Scoring the Indian: Music in the Liberal Western’, in G. Born and D. Hesmondhalgh (eds), Western Music and its Others: Difference, Representation, and Appropriation in Music, 234–53, Berkeley: University of California Press. Green, M. C. and T. C. Brock (2002), ‘In the Mind's Eye: Transportation-imagery model of Narrative Persuasion’, in M. C. Green, J. J. Strange and T. C. Brock (eds), Narrative Impact: Social and Cognitive Foundations, 315–41, Mahwah, NJ: Lawrence Erlbaum. Green M. C., S. Kass, J. Carrey, B. Herzig, R. Feeney and J. Sabini (2008), ‘Transportation Across Media: Repeated Exposure to Print and Film’, Media Psychology, 11(4): 512–539. Hodges, D. A. ([2009] 2012), ‘Bodily Responses to Music’, in S. Hallam, I. Cross and M. Thaut (eds), The Oxford Handbook of Music Psychology, 121–31, Oxford: Oxford University Press. Holmqvist, K., M. Nyström, R. Andersson, R. Dewhurst, H. Jarodzka, and J. van de Weijer (2011), Eye Tracking: A Comprehensive Guide to Methods and Measures, Oxford: Oxford University Press. Ireland, D. (2012), ‘”It’s a sin … using Ludwig van like that. He did no harm to anyone, Beethoven just wrote music”: The Role of the Incongruent Soundtrack in the Representation of the Cinematic Criminal’, in C. Gregoriou (ed), Constructing Crime. Discourse and Cultural Representations of Crime and ‘Deviance’, 97–112, Basingstoke: Palgrave Macmillan. Ireland, D. (2015), ‘Deconstructing Incongruence: A Psycho-semiotic Approach toward Difference in the Film-Music Relationship’, Music and the Moving Image, 8(2): 48–57. Iwamaya, S. (2013), ‘Perceived congruence between auditory and visual elements in multimedia’, in S-L. Tan, A. J. Cohen, S. D. Lipscomb and R. A. Kendall (eds), The Psychology of Music in Multimedia, 141–65, Oxford: Oxford University Press. Juslin, P. N. and Sloboda, J. A. (2010), ’Introduction. Aims, Organization, and Terminology’, in P. N. Juslin, and J. A. Sloboda (eds), Handbook of Music and Emotion: Theory, Research, Applications, 3–12, Oxford: Oxford University Press. Loschky, L. C., A. M. Larson, J. P. Magliano and T. J. Smith (2015), ‘What would Jaws do? The Tyranny of Film and the Relationship between Gaze and Higherlevel Narrative Film Comprehension’, PLoS ONE, 10(11): 1–23. http://journals. plos.org/plosone/article?id=10.1371/journal.pone.0142474. Mera, M. and S. Stumpf (2014), ‘Eye-Tracking Film Music’, Music and the Moving Image, 7 (3): 3–23. Mital, P. K., T. J. Smith, R. L. Hill and J. M. Henderson (2011), ‘Clustering of Gaze During Dynamic Scene Viewing is Predicted by Motion’, Cognitive Computation, 3(1): 5–24. Mulvey, F. and M. Heubner (2012), ‘Eye Movements and Attention’, in P. Majaranta, H. Aoki, M. Donegan, D. W. Hansen, J. P. Hansen, A. Hyrskykari


Nabi, R. L. and M. C. Green (2015), 'The Role of a Narrative's Emotional Flow in Promoting Persuasive Outcomes', Media Psychology, 18 (2): 137–62.
Nacke, L. E. and M. Grimshaw (2011), 'Player-Game Interaction Through Affective Sound', in M. Grimshaw (ed), Game Sound Technology and Player Interaction: Concepts and Developments, 264–85, Hershey, PA: Information Science Reference.
Nomura, R., K. Hino, M. Shimazu, Y. Liang and T. Okada (2015), 'Emotionally Excited Eyeblink-rate Variability Predicts an Experience of Transportation into the Narrative World', Frontiers in Psychology, 6: 447. http://doi.org/10.3389/fpsyg.2015.00447.
Plantinga, C. and G. M. Smith, eds (1999), Passionate Views: Film, Cognition, and Emotion, Baltimore and London: The Johns Hopkins University Press.
Rassell, A., S. Redmond, J. Robinson, J. Stadler, D. Verhagen and S. Pink (2016), 'Seeing, Sensing Sound: Eye Tracking Soundscapes in Saving Private Ryan and Monsters, Inc.', in C. D. Reinhard and C. J. Olson (eds), Making Sense of Cinema: Empirical Studies into Film Spectators and Spectatorship, 139–64, New York: Bloomsbury Academic.
Raudonis, V., D. Dervinis and A. Vilkauskas (2013), 'Evaluation of Human Emotion from Eye Motions', International Journal of Advanced Computer Science and Applications, 4 (8): 79–84.
Redmond, S. and J. Sita (2013), 'What Eye Tracking Tells Us about the Way We Watch Films', The Conversation. Available online: https://theconversation.com/what-eye-tracking-tells-us-about-the-way-we-watch-films-19444.
Shimamura, A. P., B. I. Cohn-Sheehy, B. L. Pogue and T. A. Shimamura (2015), 'How Attention is Driven by Film Edits: A Multimodal Experience', Psychology of Aesthetics, Creativity, and the Arts, 9: 417–22.
Shin, Y. S., W-S. Chang, J. Park, C. H. Im, S. I. Lee, I. Y. Kim and D. P. Jang (2015), 'Correlation between Inter-Blink Interval and Episodic Encoding during Movie Watching', PLoS ONE, 10 (11): 1–10. http://dx.doi.org/10.1371/journal.pone.0141242.
Smith, J. (1999), 'Movie Music as Moving Music: Emotion, Cognition, and the Film Score', in C. Plantinga and G. M. Smith (eds), Passionate Views: Film, Cognition, and Emotion, 146–68, Baltimore and London: The Johns Hopkins University Press.
Smith, T. J. (2014), 'Audiovisual Correspondences in Sergei Eisenstein's Alexander Nevsky: A Case Study in Viewer Attention', in T. Nannicelli and P. Taberham (eds), Cognitive Media Theory, 85–106, New York: Routledge.
Smith, T. J. and P. K. Mital (2013), 'Attentional Synchrony and the Influence of Viewing Task on Gaze Behavior in Static and Dynamic Scenes', Journal of Vision, 13 (8:16): 1–24.
Song, G., D. Pellerin and L. Granjon (2013), 'Sounds Influence Gaze Differently in Video', Journal of Eye Movement Research, 6 (4): 1–13.
Strick, M., H. de Bruin, L. de Ruiter and W. Jonkers (2015), 'Striking the Right Chord: Moving Music Increases Psychological Transportation and Behavioral Intentions', Journal of Experimental Psychology: Applied, 21 (1): 57–72.


Tan, S-L., P. Pfordresher and R. Harré (2010), Psychology of Music: From Sound to Significance, Hove and New York: Psychology Press.
Tan, S-L., A. J. Cohen, S. D. Lipscomb and R. A. Kendall (2013), 'Future Research Directions for Music and Sound in Multimedia', in S-L. Tan, A. J. Cohen, S. D. Lipscomb and R. A. Kendall (eds), The Psychology of Music in Multimedia, 391–407, Oxford: Oxford University Press.
Thayer, J. F. and R. Levenson (1983), 'Effects of Music on Psychophysiological Responses to a Stressful Film', Psychomusicology, 3 (1): 44–54.
Tracy, J. A., R. M. McFall and J. E. Steinmetz (2005), 'Effects of Emotional Valence and Arousal Manipulation on Eyeblink Classical Conditioning and Autonomic Measures', Integrative Physiological & Behavioral Science, 40 (1): 45–54.
van Laer, T., K. de Ruyter, L. M. Visconti and M. Wetzels (2014), 'The Extended Transportation-Imagery Model: A Meta-Analysis of the Antecedents and Consequences of Consumers' Narrative Transportation', Journal of Consumer Research, 40 (5): 797–817.
Wallengren, A-K. and A. Strukelj (2015), 'Film Music and Visual Attention: A Pilot Experiment Using Eye-Tracking', Music and the Moving Image, 8 (2): 69–80.

Films

Ronin (1998), [Film] Dir. John Frankenheimer, USA: FGM Entertainments, United Artists.
Songs from the Second Floor (2000), [Film] Dir. Roy Andersson, Sweden: DR, SFI, et al.
Winged Migration (orig. Le Peuple migrateur) (2001), [Film] Dir. Jacques Perrin, France: Bac Films et al.

Music

Resistance Action 2, Hollywood Film Music Orchestra (2012), [Music] Warner/Chappell Production Music.
Song for Hannah by Breathe (2008), [Music] Breathe, Relaxation, Meditation, Yoga, Massage Therapy and Healing Music, Time Machine Records.

5

Looking at Sound: Sound Design and the Audiovisual Influences on Gaze

Jonathan P. Batten and Tim J. Smith

Watching a film extends beyond simply viewing a visual sequence; it is an immersive audiovisual experience that engages both senses (and may invoke others; Sobchack 2000) in order to entertain, inform and transport its audience to narrative worlds. The composer Virgil Thomson, quoted in Copland (1939: 158), conveyed this well: 'The quickest way to a person's brain is through his eye but even in the movies the quickest way to his heart and feelings is still through the ear'. In this chapter, we will investigate how the auditory and visual modalities interact – refining, placing and contextualizing each other in a continuous semantic interplay that conveys the narrative, the scene context and the emotional nuances of the scene. Sound enhances the visual scene as an additive force, providing energy, dialogue, motion and warmth, and grounding the limited visual perspective in a 360-degree aural world that is believed to immerse and guide the viewer through the narrative (Gorbman 1980; Chion 1994; Sonnenschein 2001). In this chapter, we will explore the empirical evidence for how audio influences our experience of narrative film, with a specific focus on whether sound design influences viewer gaze.

Although the early years of cinema did not have synchronized sound, this is not to say that the viewer's percept was devoid of sound. The cinematic world being viewed clearly had sound in which interacting actors could hear each other and the world around them. However, the movie
required the audiences' imagination to 'hear' (Raynaud 2001). Additionally, early cinema screenings were commonly accompanied by an array of audio cues including narrators and live interpreters as well as live music (Elsaesser and Barker 1990). These served a number of purposes: (1) creating continuity between the traditional use of sound design in theatrical performances, which may have shared the bill with an early movie; (2) communicating narrative information; (3) drowning out the whirring mechanical projector; and (4) adding audio energy and emotion to the otherwise ghostly and unnatural-looking silent actions (Gorbman 1980). Since the introduction of the 'talkies', the role of sound in film has developed exponentially, with modern films utilizing complex soundscapes for Dolby 5.1 and 7.1 immersive surround sound that envelops the audience in 360-degree spatialized sound.

The requirements for film sound are vast, so common practice in film production is to divide sound into three distinct stems: dialogue, music and sound effects. The sound-effect stem encompasses diegetic sounds (the sounds of the scene, including foley) and non-diegetic sounds (sounds not attributable to the scene, for example, sounds added for dramatic effect). Both the dialogue and diegetic sound effects are altered to conform to the intended phonic world (the phonic resonance of the visually projected space, for example, by adding reverb and compression). Music is usually non-diegetic and is completely for the benefit of the audience (the characters do not generally hear or interact with it), serving as emotive and narrative emphasis or counterpoint (Gorbman 1980). This chapter will consider how each of these three stems, individually and collectively, influences where and when viewers attend to visual features in Hollywood narrative film.

Prior to investigating how audiovisual influences may alter film viewing behaviour, we must first consider the nature of the two perceptual systems. When comparing the perceptual attributes of the auditory and visual systems, two key features stand out. First, the human field of view is limited to around 130 degrees (where 360 degrees is a full circle around the viewer's head; Henderson 2003), and our ability to perceive high-level detail and colour is further limited to the visual information projected close to the centre of the retina (known as the fovea), with image quality decreasing rapidly with eccentricity, further limiting the useful field of view. To perceive visual events in the world, the eyes must continuously move so that the parts of the scene we are interested in are projected onto this high-resolution part of the retina (on average three times per second for scenes; Rayner 1998). Where the eyes move is subject to constant competition between visually salient image features and task/semantic relevance (Tatler et al. 2011), and this focusing of visual information means that visual events occurring outside of this 'spotlight' are less likely to be processed sufficiently to make it into our conscious awareness (Jensen et al. 2011). As a result, the visual system suffers from severe sensory capacity limitations. In contrast, there is no 'field of view' for audition, as all audible information from our
360-degree surroundings is received by the auditory system. However, for auditory information to be perceived, neural processes are required that inhibit, isolate and group sounds into attributable sources, a process known as auditory scene analysis (Bregman 1990).

The second feature that contrasts the two modalities is the dominance of vision in processing where information is (spatial), and of audition in processing when it occurs (temporal). Both senses have spatio-temporal components, but the difference in emphasis is a direct product of how the sensory information is formed: sound is produced by changes in air pressure over time, whereas visual information is largely a product of the difference in absorption of photons by adjacent parts of the human retina. To identify and attend to a sound source (e.g. a person speaking) requires the binding of a continuous stream of auditory features through time by temporally grouping sounds based on phonic similarities (Bregman 1990). However, perceiving a visual object involves processing the changes in brightness projected spatially across the retina in order to identify edges and bind these together to form an object (Marr 1982). But real-world perception is rarely unimodal, and both auditory and visual information are perceptually bound by their relative spatio-temporal features, something Soviet filmmaker Sergei Eisenstein termed 'the synchronization of the senses' (Eisenstein 1957: 69). In binding information, the perceptual systems utilize the relative strengths of the two modalities to form a coherent and efficient percept of the world. When there is perceptual ambiguity for one of the senses, information from the other is employed, which can produce perceptual illusions, for example, the 'ventriloquism effect' (Thurlow and Jack 1973), where highly simplified, spatially separated audiovisual stimuli are perceived as joined when their presentation is synchronized in time, or the McGurk effect, whereby simultaneous mismatching mouth shapes and syllabic sounds form an integrated but different illusory auditory percept not present in either modality (McGurk and MacDonald 1976). A notable example of an illusory audiovisual percept from film, identified in Chion (1994: 12), is the 'pssssht' door sound used in the early Star Wars films, which gives the viewer a percept of doors closing even though the doors are never seen in motion. The use of sound combined with the abrupt visual cut provides an illusory percept of visual motion that matches the temporal dynamics of the audio.

Beyond the ability to generate audiovisual illusions, the combination of audio and visual information is generally perceptually advantageous. In a psychophysical 'pip and pop' paradigm, Van der Burg et al. (2008) found that participants' identification time for detecting an ambiguous target line within a complex array of lines was significantly reduced (i.e. the line seemed to 'pop' out) if the visual presentation was accompanied by an auditory tone (a 'pip'). This effect provides evidence that the temporal binding of both the audio and visual information is used to efficiently disambiguate visual information, and that this bound representation is perceptually enhanced
(more salient) in a viewer's attention. A fundamental benefit of a bound audiovisual representation is that it can inform the temporal dynamics of attention (a limited resource) through time. An example of this was observed by Escoffier et al. (2010) in a simple visual discrimination task with music. The authors presented visual images both in synchrony with the musical beat and randomly in time. They found that on-beat reaction times were significantly faster than off-beat ones, suggesting an entrainment of visual attention to the music, that is, the use of predictable auditory temporal events (musical pacing) enhanced the predictive dynamics of visual attention through time. These findings are compelling, but were ultimately reached in contrived (somewhat reductive) scenarios that had little auditory or visual complexity. To date, there is little research that extends these psychophysical paradigms to more complex naturalistic scenes, or applies them to film. Were these effects to scale up to the complexity of film, the temporal correspondence of sound to a visual event or object should enhance film viewers' attention to it, increasing the probability of their gaze fixating on it (as has been suggested by sound designers; Sonnenschein 2001). Secondly, the musical rhythm of film scores would influence attention to key visual elements introduced on-beat (and inversely be detrimental to off-beat moments), potentially altering and influencing memory and narrative understanding at these time points, as has been proposed by theorists of classical narrative film scoring (e.g. Gorbman 1980).

A fundamental example of how sound designers believe they can influence viewer attention is the introduction of a sound corresponding to a visual object (Bordwell and Thompson 2013; Murch 2001). Chion (1994) believes that the inclusion of sound has influenced how complex the visual content in film can be. He notes that silent cinema demanded a simpler visual scene, as without synchronized sound, the visual complexity of a scene would overwhelm the viewer, fail to highlight the important details and lead to confusion. Such gaze guidance by sound was also predicted by Sergei Eisenstein (1957) in relation to a sequence in his 1938 film, Alexander Nevsky. Eisenstein believed that the score (composed by Sergei Prokofieff) directed the rise and fall of viewers' attention in synchrony with the rise and fall of the music. A recent empirical test by Smith (2014) found some limited correspondences between Eisenstein's predictions and viewer gaze allocation. However, the overarching influence of Prokofieff's score on where gaze was located was not supported, as viewers' gaze was no different with the music than in silence. Rather, the changes in the music complemented the existing changes in the visual scene across cuts, producing vertical gaze shifts in time with the rise and fall of the music, but no significant association between music and gaze was found within shots. These findings potentially confirm Prokofieff's ability to see the visual patterns of the scene and feel them in his own gaze before expressing them in the musical score.


Eye movement evidence in support of auditory influences on where people look when watching films is limited. When watching edited sequences, the gaze of viewers often clusters around faces, hands and points of motion in the scene, a phenomenon we have termed attentional synchrony (Smith, Levin and Cutting 2012; Smith and Mital 2013; Smith 2013). The attentional synchrony of multiple viewers' eye movements is unsurprising when you consider the tendency in film to frame the salient action centrally (Cutting 2015). A highly effective viewing strategy for watching a film is therefore to simply maintain gaze at the screen centre (Tseng et al. 2009; Le Meur et al. 2007). The frequent central and close framing of action in narrative films, combined with the general tendency for gaze to cluster around these centrally located salient visual features (faces, hands and points of motion), limits the possibility for audio influences to draw attention away from the screen centre and direct it to peripheral screen locations. In fact, the apparent dominance of visual features and shot composition on viewer attention has been empirically shown to be so robust that we have recently referred to it as 'the tyranny of film' (Loschky et al. 2015).

Despite these complexities, there is some evidence that audio can influence dynamic scene viewing. A study by Võ et al. (2012) eye-tracked two groups of participants watching a series of ad hoc interviews (pedestrians on the street) that were accompanied either by synchronized speech with background music or simply by background music. They found that gaze was captured by the faces of people, and when they spoke, people looked at the speakers' mouths. This mouth capture was notably reduced when watching the scene without the speech (music condition). Similar evidence for gaze differences with and without a film's original soundtrack has been presented by Rassell et al. (2015; 2016). In two eye-tracking studies examining viewer gaze behaviour during the Omaha Beach sequence from Saving Private Ryan (Spielberg 1998) and the climactic chase sequence from Monsters, Inc. (Docter, Silverman and Unkrich 2001), the authors reported a qualitative trend towards greater gaze exploration of the screen periphery in the mute conditions compared to the audio conditions, and potentially greater sensitivity to visually salient events in the periphery (such as a foot movement or bright light) in the absence of sound (although none of these differences were statistically significant; see Smith 2015 for further critique). There is also some evidence that the addition of film sound, especially music, can influence the duration of fixations (the periods of relatively stable localization of gaze). Wallengren and Strukelj (2015) identified some evidence of a reduction in fixation duration with the inclusion of film music (although the effect may reverse when the soundtrack includes speech; Rassell et al. 2016), and a study by Coutrot and Guyader (2013) found that the inclusion of film sound increased the attentional synchrony of participants' eye movements and influenced the size of saccades (suggestive of exploratory scene viewing away from the centre). This may be evidence that sound does guide viewers' attention as predicted by Chion
and others. Taken together, this evidence leads us to predict that audible dialogue will capture gaze to the mouth of the speaker, that music may reduce the duration of fixations, and that the addition of audio may generally promote a clustered exploration of the visual scene. In this chapter, we will investigate the influence of audio on viewer gaze via two stylistically very different 'found' experimental case studies: How to Train Your Dragon (DeBlois and Sanders 2010) and the classic Francis Ford Coppola movie The Conversation (1974), which famously features the work of the Oscar-winning sound designer Walter Murch. By using famous case studies of sound design, we aim to demonstrate the relationship between viewer gaze and the three key elements of sound design – music, dialogue and sound effects – as they appear in Hollywood narrative movies, as well as highlight the need for future research on the audiovisual influences on overt attention using more controlled naturalistic stimuli.

How to Train Your Dragon

One of the challenges facing research into how sound design influences viewer attention is the inaccessibility of a professionally produced film's individual sound stems. Studies comparing a soundtrack's presence or absence (see our previous discussion) can identify the overall influence but cannot pinpoint whether individual audio components such as sound objects or music independently influence attention. To overcome this limitation, we exploit a 'found' experiment presented during a SoundWorks Collection interview with the creative team responsible for the animated film How to Train Your Dragon (DeBlois and Sanders 2010). During this interview, a short clip from the film was repeated three times, each time featuring one of the separate sound stems (dialogue, music and sound effects) in isolation. This exemplar of sound design provided an excellent opportunity to extract the final mix of each stem and investigate its influence on eye movement behaviour and affective response.

The 52-second clip, taken from the very beginning of the movie, was viewed by 48 adult participants (36 female and 12 male, aged from 20 to 50 years). Twelve participants were placed in each audio condition (music, dialogue, sound effects and a silent control). Each participant gave informed consent for their eye movements to be recorded (on a Tobii TX300 screen-based eye tracker recording at 300 Hertz, with video resolution of 1920 × 1080 at 24 frames per second), and was tasked to watch the clip knowing that they would later complete a memory test (to encourage close viewing). Following the clip, they rated how the film made them feel on both a nine-point arousal scale and a nine-point happiness scale (Bradley and Lang 1994).

How to Train Your Dragon (2010) is a highly successful DreamWorks Animation film that tells the story of a diminutive and resourceful teenage
Viking (Hiccup) in a land plagued by dragons. The story follows Hiccup, who befriends and trains an intelligent dragon (Toothless), ultimately saving his village and earning the pride of his father (Stoick the Vast, the village chief). The eye-tracked 52-second scene is set in Hiccup's hilltop village, and the plot introduces the different dragons that plague the people while also demonstrating their destructive abilities around the village (setting houses on fire, stealing sheep, destroying defences). The overarching message of the clip is that there is a fight between people equipped with simple weapons and dragons of immense destructive power. The clip ends with a narrated description of the elusive and powerful Night-Fury dragon, who causes the explosive demolition of a large boulder-throwing catapult (containing Stoick the Vast) and is later revealed to be the character Toothless.

In the music condition, participants watched with the associated film music (composed by John Powell), which was formed of percussive drumming and a brass refrain. Two features of the music stand out: first, the use of a pulse-like beat (marching snare sounds and low booming drums) reinforces the visual momentum of the scene both within and across cuts; secondly, the rise and fall of the horn melody evokes awe and suspenseful emotion, and the musical motif calls to mind film scores of battle scenes. The dialogue condition contained not only the speech of the characters, but also the narration (the voice of Hiccup) and all other human vocal noises (murmurs and vocal exertion sounds). With the exception of the silent condition, the dialogue version was the most limited in the amount and variability of sound. The sound effects condition contained a combination of the foley and the sound effects for the actions on screen, for example, low rumbling explosions, impact sounds, animal noises and dragon vocalizations.

The specific sound stems each add different qualities to the film. The music adds an emotion and tempo not found in the other mixes. The additive quality of music as energy and emotion would be predicted to increase enjoyment and arousal ratings for the film (when compared to the silent condition; see Gorbman 1980). The music would also be predicted to increase pupil dilation, which is modulated by arousal state changes and variance in cognitive demand (Hoeks and Levelt 1993), and to decrease fixation durations, as observed in Wallengren and Strukelj (2015). The sound effects condition, containing diegetic sound, would be predicted to guide attention in a more tightly clustered manner than the other auditory conditions, increasing attentional synchrony through time (Coutrot and Guyader 2013; Robinson, Stadler and Rassell 2015; Rassell et al. 2016). Additionally, as the representation of sound objects is believed to capture attention, a clear audiovisual correspondence should capture gaze to the corresponding object. Finally, when characters on screen speak, gaze is predicted to cluster on the mouth more
in the dialogue condition (Coutrot et al. 2012; Võ et al. 2012; Foulsham and Sanderson 2013; Rassell et al. 2016).

As predicted, the participants in the music condition reported a significantly higher happiness level than those in the silent condition (revealed by a statistical t-test comparing the means between conditions; t(22) = 3.02, p < 0.01). There were no other significant differences in the self-report measures between the four conditions, including no difference in arousal (excitement) between music and silence. We found no significant differences in fixation durations between the conditions, nor any trend indicative of a shortening of fixation durations in the music condition (revealed by an analysis of variance; F(3,44) = 0.548, p = 0.652). Furthermore, while pupillary responses were highly sensitive to changes in luminance, as observed in Figure 5.1, there was no support for the prediction that any audio condition significantly altered pupil dilation.

FIGURE 5.1 Normalized pupil variance across conditions with 95 per cent confidence intervals, and a representation of the mean luminance of each frame through time (from black = dark to light).


Analysis of the variance of gaze scanpaths between the groups through time was conducted using the methodology employed in Loschky et al. (2015). This methodology takes the gaze from each frame of the movie and calculates the probability that each gaze point belongs to its own group's 2D spatial distribution (e.g. within the silent condition), as well as calculating the probability between groups (dialogue vs. silent, music vs. silent, sound effects vs. silent). These probabilities are then normalized relative to the referent group's (silent) mean and standard deviation, creating a z-scored gaze similarity value. The silent condition was chosen as the baseline so that we could identify the additive influence of sound. Negative values indicate random or less-than-average clustering; positive values indicate moments of tighter-than-average clustering, and the separation of the lines indicates that gaze in that condition is located in a different part of the screen than in the silent condition (see Loschky et al. 2015 for further details about the method). A shuffled baseline was added as a referent for what randomly distributed gaze would look like (green line in Figure 5.2): shuffling the gaze data from the silent condition and rerunning the gaze similarity analysis on this shuffled data provides a baseline for random (i.e. asynchronous) gaze. In Figure 5.2, the gaze similarity means present a generally tightly clustered distribution of gaze that does not vary notably by auditory condition and is mostly more clustered than would be predicted by chance (except at the moments when the lines intersect with the shuffled baseline).
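To make this computation concrete, below is a minimal sketch of one way the described per-frame, z-scored gaze similarity might be implemented; the array layout, the use of a single 2D Gaussian per frame and all function names are illustrative assumptions on our part, not the published implementation of Loschky et al. (2015).

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaze_similarity(test_gaze, referent_gaze):
    """Per-frame, z-scored gaze similarity in the spirit of the method
    described above.

    test_gaze, referent_gaze: arrays of shape (n_frames, n_viewers, 2)
    holding x/y gaze positions for each film frame (assumed layout).
    Positive values mean the test group's gaze clusters on the referent
    (silent) group's distribution more tightly than that group's own
    average; negative values mean less.
    """
    n_frames = referent_gaze.shape[0]
    similarity = np.empty(n_frames)
    for f in range(n_frames):
        ref = referent_gaze[f]
        # Fit a 2D Gaussian to the referent group's gaze for this frame.
        dist = multivariate_normal(mean=ref.mean(axis=0), cov=np.cov(ref.T))
        # Log-probability of each referent point under its own distribution...
        within = dist.logpdf(ref)
        # ...and of each test-group point under the referent distribution.
        between = dist.logpdf(test_gaze[f])
        # Normalize against the referent group's mean and spread (z-score).
        similarity[f] = (between.mean() - within.mean()) / within.std()
    return similarity

def shuffled_baseline(referent_gaze, seed=0):
    # Shuffle gaze points across frames and viewers to estimate what
    # randomly distributed (asynchronous) gaze would look like.
    rng = np.random.default_rng(seed)
    flat = referent_gaze.reshape(-1, 2)
    shuffled = rng.permutation(flat).reshape(referent_gaze.shape)
    return gaze_similarity(shuffled, referent_gaze)
```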

FIGURE 5.2 Gaze similarity over time from How to Train Your Dragon under four different audio conditions. Upper and lower faded bands around each line indicate 95 per cent confidence intervals. None of the apparent differences between these bands reach statistical significance. Key frames from How to Train Your Dragon (DeBlois and Sanders, 2010; Copyright: DreamWorks Animation) with gaze heat map overlaid for each audio condition are displayed at the bottom.


Each of the significant moments in the plot is attributable to visual events, as the groups tend to peak in unison; for example, at 26 seconds, the cut to a medium shot of Stoick's face produced a tight clustering of gaze on his eyes that did not differ by condition. This is further evidence for the 'tyranny of film' (Loschky et al. 2015), that is, of how the visual editing techniques, lighting and central framing of action led to a reduced exploration of the screen space and centralized the scanpath.

Sound events within the clip were isolated to test whether the audiovisual representation of objects captures gaze. Regions of interest (ROIs) dynamically traced the audiovisual events (e.g. the sheep baaing, the villager dialogue and the sound of a dragon exhaling gas) for comparison between the conditions. No significant influences of audiovisual representation on gaze to these ROIs were observed in the sound effects or dialogue conditions. What is apparent is that the editing and highly mobile virtual camerawork were very effective in holding attention at the screen centre. This drive towards the screen centre, combined with highly salient character motion preceding every diegetic sound effect, meant that the gaze scanpath was very conservative and not influenced by audio changes. These findings mirror prior evidence (Smith 2013; Loschky et al. 2015; Redmond 2015; Smith 2015) that fast-paced, highly composed film sequences from a blockbuster narrative film do not afford the opportunity for the idiosyncratic gaze exploration that would be required to observe audio influences. However, prior studies using slower-paced film clips with more scope for exploration have shown an influence of audio on the spatial distribution of gaze (Coutrot et al. 2012; Võ et al. 2012; Foulsham and Sanderson 2013; Rassell et al. 2016). To provide a greater opportunity for gaze exploration, the next case study uses a classic example of innovative sound design within a single long take, long shot: the opening scene from Francis Ford Coppola's film, The Conversation (1974).
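By way of illustration, the following minimal sketch shows how such a dynamic ROI analysis can be computed from raw gaze samples; the array layouts and function names are our own assumptions for the example, not the tools actually used in the study.

```python
import numpy as np

def roi_hits(gaze_xy, roi_boxes):
    """Flag gaze samples that land inside a dynamic region of interest.

    gaze_xy:   (n_samples, 2) screen positions for one viewer, assumed
               to be time-aligned with the video.
    roi_boxes: (n_samples, 4) per-sample ROI as (x_min, y_min, x_max,
               y_max), e.g. a box hand-traced around the baaing sheep.
    Returns a boolean array marking the samples inside the ROI.
    """
    x, y = gaze_xy[:, 0], gaze_xy[:, 1]
    x0, y0, x1, y1 = roi_boxes.T
    return (x >= x0) & (x <= x1) & (y >= y0) & (y <= y1)

def dwell_time_ms(hits, sample_rate_hz=300):
    # Total time gaze rested inside the ROI, at the 300 Hz sampling
    # rate of the eye tracker used in these case studies.
    return hits.sum() * 1000.0 / sample_rate_hz

def time_to_first_hit_ms(hits, sample_rate_hz=300):
    # Latency from ROI onset to the first sample inside it
    # (np.argmax returns the index of the first True value);
    # None if the viewer never fixated the region.
    if not hits.any():
        return None
    return int(np.argmax(hits)) * 1000.0 / sample_rate_hz
```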

The Conversation

The Conversation (Coppola 1974) is a film about Harry Caul (Gene Hackman), a renowned surveillance operative in San Francisco who wrestles with the moral implications of the information he captures. The sound designer Walter Murch was nominated for a Best Sound Oscar for his work on the film. While the film is a fine example of a 1970s American art film that differs from How to Train Your Dragon in many dimensions, not least of all as an active subversion of classical Hollywood formal technique and narrative style (Elsaesser 1975), our use
of the film here will focus on its famous opening sequence, which will serve as an antithesis to the highly dynamic and rapidly edited sequence used in our previous case study. The opening scene is notable both for its use of a single continuous shot (with a subtle use of zoom) and for its solely diegetic soundtrack: there are no overt non-diegetic sound effects, dialogue or music. The 2-minute and 54-second scene begins with a long wide shot of Union Square, bustling with Christmas shoppers. The sequence slowly pans and zooms, initially not directly framing any particular person or interaction. The square is busy, with a band playing in the bottom-right corner, a mime playfully mimicking passers-by, dogs barking, and a generally scattered crowd of people. It ends with a zoomed-in shot of Harry Caul as he exits the square. The audio from the scene is completely diegetic and (with hindsight) a surveillance recording, interspersed with short periods of incoherent electronic noise as Caul tunes in to objects of interest. The general mix (aside from these moments of distortion) captures footsteps, the band playing, dancing foot-scuffs, handclaps, dogs barking and the hubbub of a busy square. These sounds provide a unique opportunity to isolate and identify gaze differences subject to the visual correspondence with diegetic sounds. The most identifiable (and least competitive) moment is when the sound of a dog barking corresponds with the entrance of a dog from the right of the screen. The barking increases in loudness as the dog enters the screen, reinforcing the audiovisual contract (Chion 1994). A second isolatable moment in the auditory mix is the 16-second period when the mix consists solely of the band playing (increasing in loudness, then fading out with the song's end).

The predictions for the study are as follows. First, the auditory representation of the dog barking will capture attention to the dog's entrance on screen, and those in the audio condition will look at the dog faster than those in the silent condition. Secondly, participants in the audio condition should report feeling both happier and more excited (aroused) than those in the silent condition. The third prediction is that the inclusion of audio will facilitate a more 'guided' visual attention, increasing the clustering of the group's gaze to similar screen locations in time. The fourth prediction is that during the auditory isolation of the band (a noticeable reduction in auditory complexity), the pupil dilation reactions of the two groups – to the music (in audio) and to the visuals alone (in silence) – should differ, indicative of differing interpretations of the scene: the isolation of the music should disambiguate the scene for those in the audio condition.

A total of 48 adults, 36 female and 12 male, aged between 20 and 50 years, watched the first 2 minutes and 54 seconds of the opening sequence of the film. Of these, 24 watched with the corresponding sound (played through headphones), and 24 watched in silence. Eye-tracking hardware and presentation conditions were identical to the previous case study. Each participant was asked to watch the film with the knowledge that a memory
test based on what they had seen would follow (although this test was not administered). After the clip, each participant rated how happy (sad – happy) and how aroused (excited – unexcited) the film made them feel on a scale from one to nine (Bradley and Lang 1994).

As observed qualitatively in the heat map overlay of Figure 5.3, the group who heard the dog barking fixated on the dog significantly faster (mean time from the dog's entrance on screen = 1,316.8 milliseconds) than those in silence (1,527.63 milliseconds; a statistical t-test of the mean times to fixate the dog showed a significant difference, t(35) = –2.114, p = 0.048). Both groups had a similar proportion of participants who looked at the dog. This provides some evidence that auditory information influences visual attention to corresponding objects, although the effect is subtle, mostly due to the general salience of moving objects and the need for movement to generate sound (these audiovisual objects are already visually salient). The effect of audio in this instance is a slightly earlier capture of attention, rather than the clear guidance of attention that is predicted with the inclusion of sound.
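An analysis of this kind might be scripted as in the brief sketch below, which feeds per-viewer time-to-first-fixation latencies (for example, from the time_to_first_hit_ms helper sketched earlier) into an independent-samples t-test; it is a hedged illustration of the reported comparison, not the authors' own code.

```python
import numpy as np
from scipy import stats

def compare_first_fixation_latency(audio_ms, silent_ms):
    """Independent-samples t-test on per-viewer time-to-first-fixation
    latencies (in milliseconds), of the kind reported for the dog event
    (t(35) = -2.114, p = 0.048). None marks viewers who never fixated
    the ROI and is excluded before testing.
    """
    audio = np.array([v for v in audio_ms if v is not None], dtype=float)
    silent = np.array([v for v in silent_ms if v is not None], dtype=float)
    return stats.ttest_ind(audio, silent)
```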

FIGURE 5.3 Gaze distribution heat map for two frames (left and right column) from The Conversation (Coppola, 1974; Copyright: The Directors Company) that highlight the early allocation of gaze to the dog in the audio condition (Top) compared to the silent condition (Bottom). The ‘hot spots’ indicate a clustering of multiple viewers’ eye position.


The self-reported scores for happiness and arousal (Bradley and Lang 1994) support the general prediction that audio would be more exciting and would generally make people feel happier than watching in silence. Those who watched the clip with the audio (M = 3.46, SD = 1.38) reported significantly happier scores than those who watched in silence (M = 4.29, SD = 1.55, t(46) = 1.97, p = 0.03 [one-tailed]). Also, those who watched the clip in silence (M = 6.08, SD = 2.10) reported significantly less excitement than those with the audio (M = 4.61, SD = 2.19, t(46) = 2.36, p = 0.012 [one-tailed]).

As with the How to Train Your Dragon clip, the gaze similarity of the participants was analysed between the two groups. This is visualized in the top panel of Figure 5.4. A shuffled baseline derived from the gaze data in the silent condition was again included as a referent for randomly distributed gaze. Contrary to the prediction of the study, the gaze similarity values were not significantly different between the audio and silent groups (F(1,46) = 1.04, p = 0.3). The silent condition tended to have a slightly more clustered distribution of gaze (e.g. the peaks at 63 and 157 seconds). There is variance over time in the clustering distributions, but the pattern of variances does not indicate an additive auditory influence, as both the silent and audio conditions peak and trough in unison through time, indicating a primary shared influence of visual events. When considering the prior analysis on the preceding effect of the dog bark, this short (

Table 10.3 Time stamp, dwell time and statistical analyses for each clip; category = Classical Style vs Supporting Actors/Characters

[Table 10.3 reports, for clips 7 [1:04:58 – 1:05:23] and 8, the characters' proportional dwell times (M [SD]) for Indiana Jones, Marion, Sallah, Katanga and the Ark, the differences in dwell time (ANOVAs, e.g. F(2, 34) = 706.62, p < 0.01, ηp² = 0.98), post hoc tests (e.g. Indiana/Marion < Sallah/Katanga, p's < 0.01; Ark > Indiana > Sallah, all p's < 0.05) and differences in interest area (IA) time (pairwise percentage differences with 95 per cent confidence intervals and Χ² tests, none reaching significance).]

FIGURE 10.2 Example of one participant's biased gaze direction towards Indiana/Ford in Clip 7.

viewing bias, where central placement motivates gaze direction (the sun); secondly, Indiana/Ford moves differently to the workers, first standing up as they continue to dig down, so gaze direction is also cued this way. This is supported by the fact that the viewers did not fixate immediately on Indiana/Ford, although they did fixate on Indiana/Ford before he put the hat on his head, seemingly unprompted by the visual iconography of the character, before recognizing him as the significant body in the sequence. (This mirrors the effect used in Indiana Jones and the Last Crusade (Steven Spielberg, USA, 1989), where the costume alone is not indicative of the 'authentic' character.) However, once the full 'iconography' was present, the participants' gaze was fixated at length on him, despite the narrative significance of the digging and the centrally framed sun.

In 'En route to the Opening', the camera tracks across a large group in movement, cutting into mid shots of the major characters at various points. The group appears to diminish in narrative importance from front to back, with Belloc/Freeman and Marion/Allen at the front, Toht (Ronald Lacey) in the middle, and anonymous uniformed extras bringing up the rear. Also in uniform, and therefore blending in anonymously, Indiana/Ford sneaks in at the end of the group (and tracking shot) to join them. Unlike in clip 6, his final actions do not render him visible through the distinctive iconography of 'Indiana Jones' – he merely joins the group and then moves away to hide in the camp. The recognizability of character iconography is removed (through the uniformity of costume), and the editing, camera movement, staging and presence of multiple performers complicate the potential for attention, especially in the focus on never-before-seen figures (the soldier extras). The participants spent longer looking at Toht/Lacey and the group of soldiers than at Belloc/Freeman and Marion/Allen (see Table 10.4). In the second
part of the scene, participants spent the same amount of time looking at the soldiers and at Indiana/Ford, and he was not specifically identified in the group. This suggests an inability to recognize the main character/actor due to the identical costuming, framing and positioning of him and the group. What should motivate the final attentional synchrony between viewer and character/actor is the final action of Indiana/Ford, where an exaggerated movement (a furtive look around and a leap off to hide behind a prop) reveals him as 'different' to the group (and therefore as Indiana/Ford in disguise). However, while they do not initially see him, the participants did fixate on Indiana/Ford before he completed this overtly different movement. (This occurred at 6,000 milliseconds into the clip; see Table 10.4 for results.) This links their reactions to clip 6, where Indiana/Ford is not first identified at the point of overt difference (putting on the hat, moving away from the group). Similarly, this fixation stands at odds with the assumption of a central viewing bias, as here attention is directed towards him before he reaches the centre of the frame. In previous examples, the viewers appear to follow the central visual cues that direct attention to what is centrally framed (the sun, the ark, the idol, etc.). In this clip, they appear to be making particular distinctions between (seemingly) identical figures who occupy different spaces. They look for, and locate, Indiana/Ford despite the absence of classical cueing motifs and other obvious visual differences.

Conclusion

Recording the eye movements, timings and durations of gaze of participants through eye tracking begins to reveal the complex and often contradictory ways in which people watch films, sequences and figures, and the different ways they are cued to do this. By concentrating our analysis of results around the central presence of a star figure (in addition to acknowledging Ford's performance of a specific character), we have begun to identify how viewers may self-determine where to look and for how long, and even pre-empt conventional cues, as well as follow the traditional formal stylistic devices associated with the 'success' of the classical cinematic form. Significantly, they appear to be able to do both, often shifting between the two types of spectatorial behaviour over the course of the series of clips shown. In the 'Indiana and Belloc' clip, they tended to follow narrative/speech, lighting and camera focus cues, rather than the scale and framing of the star/character presence. Yet in 'The Government Meeting', this pattern was not adhered to, and in 'Goodbye Sallah', despite being initially cued by speech and movement, the viewers' gaze lingered on a figure who was otherwise sidelined and silenced. Just as Hasson et al. implicitly found in The Good, The Bad and The Ugly, when offered two figures formally styled almost equally in 'Raising the Ark', gaze was still concentrated on the more significant figure of Indiana/Ford.

Table 10.4 Time stamp, dwell time and statistical analyses for each clip; category = Classical Style vs Restriction-then-Recognition of Indiana/Ford

[Table 10.4 reports, for clip 6 [55:47 – 56:02] (interest areas: Indiana Jones, Sun, Workers) and clip 9 [1:36:58 – 1:37:43] (interest areas: Toht, Belloc, Marion, Group, then Indiana Jones vs the anonymous soldiers), the proportional dwell times (M [SD]), differences in dwell time (ANOVA), post hoc tests, differences in interest area (IA) time (pairwise percentage differences with 95 per cent confidence intervals and Χ² tests) and further analyses, including one-sample t-tests of fixation on Indiana at the start of clip 6 (0 ms), when he puts his hat on (5,551 ms) and when he looks back in clip 9 (6,000 ms).]


And in clips where they had to 'find' Indiana/Ford themselves (with an absence of conventional editing/lighting cues), they did so before the iconography of the character was present, perhaps relying more on a recognition of the movement patterns associated with that star or on their own familiarity with the whole text. Our results show a concentration of gaze attention on faces (cf. Wild et al. 2001; Juth et al. 2005; Redmond et al. 2015). However, conceptually speaking, it is virtually impossible to conclusively determine whose face the participants define themselves as looking at in any one moment – that of 'Harrison Ford' or that of 'Indiana Jones'. As star discourse reveals, the process of gaze, attention and identification between star and character is a fluid one, where each identity may merge with or subsume/overwhelm the other, depending on the audience. The logical next step for this study would be to marry the quantitative method of eye tracking with a qualitative participant commentary that articulates the higher cognitive processes at work in this specific context. Although not with this particular question in mind, the qualitative work with focus groups in Martin Barker and Sarah Ralph's (2015) study of audience responses to acting in The Usual Suspects (Bryan Singer, USA, 1995) reveals the distinct dual use of character and actor/star identity, with participants naming both, and with some clearly relying more on the star identity/name than on the character in articulating and interpreting their own responses.

Although in our experiment it remains difficult to completely separate the 'star' from 'the character', our choice of text, individual sequences and investigative stance towards Ford and the other actors on screen begins to frame questions of recording, measuring and interpreting gaze, attentional synchrony and spectatorship from an alternative perspective, one that allows the space of stardom to be acknowledged as a potential stimulus. Using such a classically styled, directorial-led, character-driven text as Raiders of the Lost Ark confirms the influential cues of narrative, dialogue, editing, lighting and other visual elements. But, as our results suggest (and Indiana Jones and the Last Crusade makes explicit), Harrison Ford is another crucial element that guides spectatorial behaviour and interpretation. While more formally obscured in Raiders of the Lost Ark by the prevalence of classical convention, Ford's star status (especially for contemporary viewers) may also direct attention and even challenge the so-called tyranny of film. Extratextual knowledge of stars, performance styles and paratexts could also contribute to our understanding of empirical measurements of spectatorship. We want to suggest that the dissemination of such data should challenge the singular assumptions that we only watch a character on screen and that narrative orientation, rather than star-spectacle or memory, alone guides a viewer's gaze.


Notes

1 The participants were asked to fixate on a central fixation point prior to each clip. Therefore, it is possible that the initial fixations made were in the area where the central fixation point was set. However, as the main dependent variable was interest area dwell time across participants, this is likely to have been averaged out in the analyses. For comparisons of initial fixation, the dynamic interest areas at the start of each clip did not cover the central fixation point.

2 There is a brief cutaway to Satipo (Alfred Molina).

References

Bordwell, D. (2010), 'The Part-Time Cognitivist: A View from Film Studies', Projections, 4 (2): 1–18.
Brown, W. (2015), 'Politicizing Eye Tracking Studies of Film', Refractory, 25. Available online: http://refractory.unimelb.edu.au/2015/02/07/brown/.
Buckland, W. (2006), Directed by Steven Spielberg: Poetics of the Contemporary Hollywood Blockbuster, London and New York: Continuum.
Delorme, A., G. A. Rousselet, M. J-M. Mace and M. Fabre-Thorpe (2004), 'Interaction of Top-Down and Bottom-Up Processing in the Fast Visual Analysis of Natural Scenes', Cognitive Brain Research, 19: 103–13.
Diamond, A. (2013), 'Executive Functions', Annual Review of Psychology, 64: 135–68.
Engel, A. K., P. Fries and W. Singer (2001), 'Dynamic Predictions: Oscillations and Synchrony in Top-Down Processing', Nature Reviews Neuroscience, 2: 704–16.
Gordon, A. E. (2007), Empire of Dreams: The Science Fiction and Fantasy Films of Steven Spielberg, Lanham: Rowman & Littlefield.
Hasson, U., O. Landesman, B. Knappmeyer, I. Vallines, N. Rubin and D. Heeger (2008), 'Neurocinematics: The Neuroscience of Film', Projections, 2 (1): 1–26.
Juth, P., D. Lundqvist, A. Karlsson and A. Ohman (2005), 'Looking for Foes and Friends: Perceptual and Emotional Factors When Finding a Face in the Crowd', Emotion, 5 (4): 379–95.
Levin, D. T. and D. J. Simons (1997), 'Failure to Detect Changes to Attended Objects in Motion Pictures', Psychonomic Bulletin and Review, 4 (4): 501–6.
Loschky, L. C., A. M. Larson, J. P. Magliano and T. J. Smith (2015), 'What Would Jaws Do? The Tyranny of Film and the Relationship between Gaze and Higher-Level Narrative Film Comprehension', PLoS ONE, 10 (11): 1–23.
Maltby, R. (1995), Hollywood Cinema, Oxford and Cambridge, MA: Blackwell.
Marchant, P., D. Raybould, T. Renshaw and R. Steven (2009), 'Are You Seeing What I'm Seeing? An Eye-tracking Evaluation of Dynamic Scenes', Digital Creativity, 20 (3): 153–63.
McDonald, P. (2012), 'Spectacular Acting: On the Exhibitionist Dynamics of Film Star Performance', in J. Sternagel, D. Levitt and D. Mersch (eds), Acting and Performance in Moving Image Culture, 61–71, New Brunswick and London: Transaction.


Morris, N. (2007), The Cinema of Steven Spielberg: Empire of Light, London: Wallflower Press.
Moseley, R. (2002), Growing up with Audrey Hepburn, Manchester: Manchester University Press.
Orpen, V. (2003), Film Editing, London: Wallflower Press.
Plantinga, C. and G. M. Smith (1999), 'Introduction', in C. Plantinga and G. M. Smith (eds), Passionate Views: Film, Cognition and Emotion, 1–17, Baltimore: Johns Hopkins University Press.
Ralph, S. and M. Barker (2015), 'What a Performance! Exploring Audiences' Responses to Film Acting', Participations, 12 (1). Available online: http://www.participations.org/Volume%2012/Issue%201/41.pdf.
Redmond, S. (2016), 'Aesthetic Autoethnography: Dancing with Ian Curtis', in S. Holmes, S. Ralph and S. Redmond, 'Swivelling the Spotlight: Stardom, Celebrity and "Me"', Celebrity Audiences, 110–17, London and New York: Routledge.
Redmond, S., J. Sita and K. Vincs (2015), 'Our Sherlockian Eyes: The Surveillance of Vision', Refractory, 25. Available online: http://refractory.unimelb.edu.au/2015/02/07/redmond-sita-vincs/.
Smith, T. J. (2014), 'Audiovisual Correspondences in Sergei Eisenstein's Alexander Nevsky: A Case Study in Viewer Attention', in P. Taberham and T. Nannicelli (eds), Cognitive Media Theory, 85–105, New York and London: Routledge.
Smith, T. J. (2015), 'Read, Watch, Listen: A Commentary on Eye Tracking and Moving Images', Refractory, 25. Available online: http://refractory.unimelb.edu.au/2015/02/07/smith/.
Smith, T. J., D. Levin and J. Cutting (2012), 'A Window on Reality: Perceiving Edited Moving Images', Current Directions in Psychological Science, 21: 101–6.
Sobchack, V. (2004), 'Thinking through Jim Carrey', in C. Baron, D. Carson and F. P. Tomasulo (eds), More Than a Method: Trends and Traditions in Contemporary Film Performance, 275–96, Detroit: Wayne State University Press.
Stacey, J. (1994), Star Gazing: Hollywood Cinema and Spectatorship, London and New York: Routledge.
Wild, B., M. Erb and M. Bartels (2001), 'Are Emotions Contagious? Evoked Emotions While Viewing Emotionally Expressive Faces: Quality, Quantity, Time Course and Gender Differences', Psychiatry Research, 102 (2): 109–24.

11

A Proposed Workflow for the Creation of Integrated Titles Based on Eye-Tracking Data

Wendy Fox

1 Introduction

Abusive, creative, hybrid and integrated are just some of the many terms that have emerged in past years to describe new kinds of subtitles – titles that appear all over the screen, that imitate or contrast with the film's images, and that follow modern concepts of design and perception in audiovisual translation. Such subtitles are placed not at the bottom of the screen but in close relation to what is currently happening on screen. There are various examples of films incorporating integrated titles in order to translate an additional language within an English-language film, as in John Wick (Stahelski and Leitch 2014). There are also early examples of integrated titles being used to translate an entire film into another target language, both for hearing audiences (see Night Watch/Nochnoy Dozor [Bekmambetov 2004]) and for hearing-impaired audiences (see Notes on Blindness [Spinney and Middleton 2016]). However, there seem to be no clear rules or guidelines governing the creation of these integrated titles. Based on an eye-tracking study that illustrates how integrated titles can enhance image exploration, detail perception and overall entertainment value, a first workflow is proposed and tested in this chapter.


2 Integrated titles: Terms and definitions

When film critics or viewers who are not used to subtitles discuss traditional or conventional subtitles, the subtitles are often seen as 'a blemish on the film screen' (Díaz Cintas and Remael 2007: 82) or an 'intrusion into the visual space of film' (Thompson 2000: 1) with the 'potential to "drown" the images [and] instead of watching images, the audience starts literally to see only texts' (Sinha 2004: 174). These reactions demonstrate the relevance of audience experience for subtitling processes in general. Traditional subtitles can create a negative splitting of attention, distract particularly inexperienced viewers or obscure relevant image areas or elements. Stuart Comer, the curator of film at the Tate Museum in London, identifies one of the main reasons for these shortcomings: 'Subtitling often takes place after the film is completed. It isn't necessarily done by the director, and there is less quality control. That's why it can seem thoughtless' (Rawsthorn 2007).

While it is understandable that, before digital technology, the complex and time-consuming process of adding subtitles to film did not spark much interest in additional design effort, today's processes are much more flexible. However, few filmmakers and producers have adjusted to this not-so-new reality. 'Purely utilitarian' subtitles that keep 'interfering with the beautifully framed and considered shot of the director' remain the norm (Vit 2005). The demand for the cheapest possible solution to language barriers might seem like an understandable explanation – but how much would a slightly higher investment in subtitling actually cost, especially compared with the very high costs of dubbing? Considering Pablo Romero-Fresco's (2013: 202) observation that 'half of the revenue of … both top-grossing and award-winning Hollywood films comes from foreign territories' and the fact that 'translation and accessibility services only account for 0.1–1 per cent of the budget of an average film production' (Lambourne 2012), it can only be in the interest of film producers to take a critical look at the reception of translated versions of their films.

Various terms have emerged over the past years to describe deviations from conventional subtitling. Earlier suggestions such as 'abusive' (Nornes 1999: 18) and 'hybrid' (Díaz Cintas and Muñoz Sánchez 2006: 51) focus on subtitles in the context of Japanese anime, and Nornes's 'abusive subtitles' are often seen as primarily 'making translations linguistically visible' (Foerster 2010: 86). However, researchers such as Foerster (2010) and McClarty (2012) approach the topic from another perspective, namely that of film studies and aesthetics. Foerster (2010: 82) criticizes the 'Code of Good Subtitling' (Ivarsson and Carroll 1998: 158) and its aim of invisibility, promoting 'register and … design for subtitles that never call attention to themselves' based on an interpretation of subtitles 'solely as a means of understanding what is being said on screen'. In contrast, Foerster (2010: 85) defines 'aesthetic subtitling' as a practice that 'draws attention to the subtitles via aesthetic means exploring semiotic possibilities, which include the semantic dimension without being restricted by it', where the subtitles are 'predominantly designed graphically to support or match the aesthetics of the audiovisual text and consequently develop an aesthetic of their own'.

McClarty (2012: 138) follows a more film studies-based approach when she speaks of 'creative subtitles'. Similarly to Foerster, she sees 'subtitling practitioners [as] mere norm-obeying machines' that 'continue to have their hands tied by the constraints of the field and the norms of the profession' while failing to 'acknowledge the insights that could be gained by referring to audiovisual translation's parallel discipline: film studies' (McClarty 2012: 135). Instead of 'abusive', she sees the need for a 'creative' approach that does not simply 'describe a subtitling practice that differs from the norm but … looks outward from its own discipline as well as its own culture' and aims for 'difference rather than sameness' (McClarty 2012: 135, 140).

McClarty (2012) emphasizes that it is not the creativity of a translator or subtitle practitioner that leads to innovative titles as seen in Slumdog Millionaire (Boyle 2008) or La Antena (Sapir 2007) (see Figure 11.1). Rather, it is the 'imagination of film directors and editors' who not only want to provide a translation but also want to create additional comedic or artistic effects (McClarty 2012: 140–2). Thus, subtitles can convey not only content but also sound or speaker location (McClarty 2012: 146ff). The goal should not be a new set of norms but a 'creative response to individual qualities within and between films', created by a 'translator-title designer' who aims for both linguistically and aesthetically pleasing titles (McClarty 2012: 146ff, 149). These should be produced within a film's post-production phase in cooperation with the filmmaker, editors and title designers.

Another term found in recent studies, one more strongly focused on usability and automation, is 'dynamic subtitles' (Armstrong and Brooks 2014; Brown et al. 2015). The concept is based on placing the titles close to the speaker, allowing for easier speaker identification and more time for image exploration. Similarly, studies by Park et al. (2008), Hong et al. (2010) and Hu et al. (2014) focus not only on improved perception but also on the automation of subtitling processes. They use terms such as 'speaker-following subtitles' to refer to automatic speaker recognition systems within the software packages they have developed.

FIGURE 11.1 Slumdog Millionaire (2008) and La Antena (2007).


While these approaches still use the term 'subtitles', as early as 1999 Nornes (1999: 23) put the prefix 'sub' in parentheses 'because they were not always at the bottom of the frame'. As Bayram and Bayraktar (2012: 82) speak of 'text information' presented in 'integrated formats' when combining text and image, I decided to adopt the term 'integrated titles' for a pilot study in 2012 (see Fox 2012) and have retained this term in subsequent studies (see Fox forthcoming), as Armstrong and Brooks (2014) mention the enhancement of subtitles through 'integrating them with the moving image'. Even though this approach could also be deemed 'creative' or 'aesthetic' and the title placement 'dynamic' or 'speaker-following', using the term 'integrated' seems to include both these concepts while emphasizing the relationship between image and title. The various terms that describe these different approaches show the wide range of available possibilities, which defy tight norms, definitions or guidelines.

The analysis in Fox (forthcoming) gives a first overview of the general frequencies and placement of integrated titles in English-language films. Partially integrated titles that follow classic SDH (subtitles for the deaf and hard-of-hearing) guidelines can be found on recent Blu-ray releases such as Jaws (Spielberg 1975/2012), Gone Girl (Fincher 2014) and Ex Machina (Garland 2015). These releases offer titles that are placed under or in-between speakers and noise sources and allow for clearer speaker identification. Titles integrated with the image and not confined to the bottom area of the screen can be found in Man on Fire (Scott 2004), Night Watch, Heroes (Kring 2006–2010), Slumdog Millionaire, Fast Five (Lin 2011, as well as subsequent productions of the franchise), Star Trek Into Darkness (Abrams 2013) and John Wick. A first example of a German production including integrated titles (English to German) is Victoria (Schipper 2015), while Notes on Blindness includes individually placed SDH.

Based on this growing number and frequency of integrated approaches, the eye-tracking study presented in the following section was aimed at testing whether these kinds of titles actually improve the perception and entertainment value of films by allowing the viewer to spend more time looking at relevant image areas instead of the bottom area of the screen, and by improving the visibility and aesthetics of the titles themselves. The results provide the foundation for a proposed workflow for the creation of integrated titles.

3 Reception study

The aim of the reception study was to analyse and discuss the impact of integrated titles on the audience. Reception here is understood both in terms of visual attention and in terms of aesthetic experience. The study compares the eye movements of native English speakers watching an English documentary without any titles with those of German natives watching the same documentary with either integrated titles (INT) or traditional subtitles (TRAD). A review of the literature on eye tracking and film aesthetics allowed me to propose the following hypotheses and test them based on eye-tracking and questionnaire data:

Visual attention (based on eye-tracking data):

Hypothesis 1: The fixation duration of INT is shorter than that of TRAD.
Hypothesis 2: Compared to TRAD participants, INT participants spend more time fixating the scene than fixating the titles, experiencing a positive split attention.1
Hypothesis 3: The gaze patterns of the INT participants more closely resemble those of native-language viewers than do the gaze patterns of the TRAD participants.
Hypothesis 4: The time to first fixation, which can be interpreted as reaction time, of INT participants is longer than that of TRAD participants.

Aesthetic experience (based on questionnaire data):

Hypothesis 5: The integrated titles allow a more detailed perception of the film.
Hypothesis 6: The integrated titles are rated as more aesthetically pleasing.

The experiment was conducted with a Tobii TX300, a video-based remote eye-tracker that uses Purkinje images.2 Its high sampling rate of 300 Hertz allows for precise data, and despite the high resolution of the screen (1920 × 1080 pixels), head movements do not pose a major issue. The system collects data on saccades, fixations, pupil size and eyelid movements. The software used was Tobii Studio (version 3.1.6), including the Tobii Fixation Filter. Collected data included a number of eye movement variables as well as questionnaire results from INT participants.

The film used for this study was the short documentary Joining the Dots by Pablo Romero-Fresco, screened for the first time in 2012.3 The twelve-minute documentary shows an interview with Trevor Franklin, who went blind at the age of sixty. He speaks about his experiences and how he handles being blind. The main topic of the documentary is accessibility for the blind, focusing on television and theatre. The image system, compositions and key elements in the various scenes were defined, and possible placements and designs were discussed with the filmmaker prior to the design of the titles, trialling Romero-Fresco's idea that subtitle creation should be integrated into post-production processes.


Each participant of the study watched the entire film without any prior knowledge of the topic or the goal of the experiment. The German native speakers who watched the version with integrated titles then filled out a questionnaire about information flow and title aesthetics. In total, 45 participants took part in the experiment; 14 of them were native English speakers between the ages of 18 and 45 who study at the Fachbereich Translations-, Sprach- und Kulturwissenschaft at Johannes Gutenberg-Universität and watched the original version (ORIG) of Joining the Dots. As film audiences are usually not homogeneous groups, only native language and eyesight were determined. Each participant claimed to have normal or corrected-to-normal vision. Thirty-one participants were native German speakers who stated that they rely on German subtitles to understand English-language films. As switching on subtitles is a personal decision based on the viewer's self-assessment and not his or her factual knowledge of the foreign language, the actual level of the participants' English was not determined. Of these native German speakers, 15 watched Joining the Dots with traditional subtitles in German (TRAD), while the other 16 participants watched the film with the integrated titles (INT) in German. None of the participants had seen the film before.

The integrated titles aimed to keep relevant, primary areas of the film free from text that might obscure pertinent information, and to place titles as closely as possible to the main focus points indicated by the eye movement recordings of native English speakers. This should facilitate speaker identification and provide an indication of speaking direction (see Figure 11.2). Additionally, a small number of effects were used to further aid comprehension and add levels of artistry. These included slow fade-ins and fade-outs to convey hesitation, emotion and unfinished sentences, as well as depth effects and varied transparency levels.

FIGURE 11.2 Indication of speaking direction in Joining the Dots.

3.1 Results and discussion

The results of the reception study are based on the previously mentioned eye movement measures and the short questionnaire. For the eye-tracking data, the comparisons between INT and TRAD subtitles all reached significant levels of p < 0.001 (see Table 11.1).

Table 11.1 Mean values of the recorded eye-tracking measures

        AFD title   AFD image   TFF      0 values   Cluster fixation   Title fixation
ORIG    –           –           –        –          87.9%              –
TRAD    51.6%       48.4%       0.057s   28.7%      75.3%              88.1%
INT     47.5%       52.5%       0.074s   16.5%      83.3%              98.2%

FIGURE 11.3 Comparison of the viewing behaviour of the ORIG (top), TRAD (bottom left) and INT (bottom right) participants in Joining the Dots.


When comparing the average fixation duration (AFD) between subtitling conditions (see Figure 11.3), integrated titles were found to have an effect on the fixation duration on the title. For INT, the AFD on the title decreased by about 14.4 per cent compared to TRAD. When titles were visible, on average, INT participants focused on the title area 47.5 per cent of the time and TRAD participants 51.6 per cent of the time. Integrated titles seem to motivate viewers to spend more time on image exploration and to return more quickly to the actual focal point in the image. This is also supported by the low number of participants who fixated the title area before the title was faded in: only about 16.5 per cent of all recorded times to first fixation (TFF) of the INT participants were 0, meaning that these participants fixated the area before the title was visible. For the TRAD participants, the percentage of TFFs amounting to 0 was 28.7 per cent.

Based on a random sample of ten titles and subtitles and twenty-three areas defined as relevant on the basis of the native-speaker participants' gaze behaviour, the eye movements of the German-speaking participants (INT and TRAD) were compared to those of the natural focus group (ORIG). To create these twenty-three relevant areas, clusters (accumulated fixations) were automatically created using Tobii Studio (version 3.1.6) and were defined as natural focus points if more than 50 per cent of the ORIG participants fixated on them at least once (see Figure 11.4). As visible from Figure 11.4, the native speakers who watched the original version of the film (ORIG) fixated on average 87.9 per cent of the twenty-three areas, whereas the TRAD participants fixated 75.3 per cent and the INT participants 83.3 per cent of the same or similar areas. Additionally, INT participants fixated a higher percentage of the overall titles.

When examining the TFF between viewing groups, a negative effect on the TFF was found for the INT condition. Including the 0 values, the TFF of INT participants increased by 28.9 per cent compared to TRAD participants. Excluding the 0 values, the increase was about 25.9 per cent, or 18 milliseconds. As the TFF can be interpreted as the reaction time to the appearing title, an increase seems to be a negative effect. However, while a shorter TFF might indicate less cognitive load for the viewer, a longer TFF could result from the extended image exploration demonstrated by INT participants before the first fixation of the title.

In addition to the analysis of the participants' eye movements and attention distribution, the questionnaire data of INT participants showed subjective improvements for the INT condition concerning detail perception and aesthetic experience. All participants with integrated titles were asked to rank a series of statements after watching the film. The evaluation showed an overall positive rating for aesthetic experience and increased information intake, especially when compared to traditional subtitles. This supports the hypothesis that differences in title design and placement are perceived by the audience and that considerate placement can positively impact reception processes and information gain.

FIGURE 11.4 Comparison of the automatically created clusters in Joining the Dots: ORIG (top), TRAD (bottom left), and INT (bottom right).

Participants who were used to traditional subtitles rated integrated titles as an improvement and a feasible alternative that they would like to use in the future.
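The measures discussed above are straightforward to derive from raw fixation data. The following minimal Python sketch illustrates the logic, assuming a simple list-based fixation log and a rectangular title area; the data layout and function names are illustrative assumptions, not the actual Tobii Studio export schema.

```python
# Sketch: computing the attention split and time to first fixation (TFF)
# for one title. Fixations are (start_s, duration_s, x, y); the title area
# is a rectangle in screen pixels. Layout and names are assumptions.

def in_box(x, y, box):
    left, top, right, bottom = box
    return left <= x <= right and top <= y <= bottom

def attention_split(fixations, title_box, onset, offset):
    """Share of fixation time spent on the title area while it is visible."""
    on_title = on_image = 0.0
    for start, dur, x, y in fixations:
        if onset <= start < offset:
            if in_box(x, y, title_box):
                on_title += dur
            else:
                on_image += dur
    total = on_title + on_image
    return on_title / total if total else 0.0

def time_to_first_fixation(fixations, title_box, onset):
    """TFF relative to title onset; 0 if the area was fixated before onset."""
    hits = [start for start, dur, x, y in sorted(fixations)
            if in_box(x, y, title_box) and start + dur > onset]
    if not hits:
        return None                      # the title area was never fixated
    return max(0.0, hits[0] - onset)
```

Averaging these per-title values over all titles and participants yields figures comparable to those in Table 11.1.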

4 A proposed workflow

After the creation of the integrated titles, data analysis and in-depth discussions with filmmakers and subtitle professionals, I formed the impression that developing strict norms or guidelines for integrated titling would not be the right approach. Film is such an individual and artistic medium, and it offers multiple opportunities to work creatively with text elements, typography and image composition. Instead, based on the eye-tracking and questionnaire data, I developed a concept for modular guidelines and recommendations for the creation of integrated titles. The essential steps in my proposed workflow are identified as translation of the dialogue, analysis of the film material, concept creation for placement and layout strategies, and final application. Although the translation process involves corrections and adjustments throughout the whole workflow, a first draft should be completed before taking any further steps. The following chart gives an overview of this proposed workflow for the creation of integrated titles.


CHART 1 Detailed proposed workflow for the creation of integrated titles.
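Read as a pipeline, the chart amounts to four stages with a feedback loop into the translation. The sketch below restates it in Python purely for illustration; every function and data structure is a hypothetical placeholder, not part of any existing subtitling tool.

```python
# Sketch of the proposed workflow as four explicit stages with revision.
# All names and structures are hypothetical placeholders.

def draft_translation(dialogue):
    # 1 Translation: a complete first draft before any placement work;
    #   it remains open to correction throughout the later stages.
    return [{"text": line, "position": None, "layout": None} for line in dialogue]

def analyse(film):
    # 2 Analysis: image composition, scene complexity, typographic
    #   identity and target group.
    return {"primary_areas": film["primary_areas"], "typeface": film["typeface"]}

def develop_concept(analysis):
    # 3 Concept: placement and layout strategies derived from the analysis.
    return {"avoid": analysis["primary_areas"], "typeface": analysis["typeface"]}

def apply_concept(titles, concept):
    # 4 Application: place each title in a secondary area, style it, and
    #   leave room for a final check (legibility, effects, re-translation).
    for title in titles:
        title["position"] = ("secondary area, clear of", concept["avoid"])
        title["layout"] = concept["typeface"]
    return titles

film = {"primary_areas": ["speaker's face"], "typeface": "film's own typeface"}
titles = apply_concept(draft_translation(["Example line."]),
                       develop_concept(analyse(film)))
```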

4.1 Analysis

The analysis of the film material should include a consideration of image composition and filmmaker intentions, an assessment of the complexity of the film and its individual scenes (as this influences placement strategies, effects and layout), and a definition of the film's typographic identity. Additionally, consideration of the target group is important, especially whether it is made up of hearing or deaf audiences and whether it incorporates spoken or sign language, as, again, these factors influence placement and layout concepts. In particular, a differentiation between hearing and deaf or hard-of-hearing audiences needs to be taken into account, as well as differences within hearing-impaired groups, such as being born deaf or having become deaf post-lingually. It can be assumed that hearing-impaired audiences are more interested in additional layout effects such as colour-based indication of speakers. Furthermore, prelingually deaf audiences are likely to demonstrate a slower reading pace (Dyer et al. 2003: 215ff).4

Together with the translation, an analysis of content and situations should take place. The pace of speech acts, the music and sound, and the existence, number and layout of text elements (see Fox 2016) influence the space and time available for the integrated titles, especially as there should be 'a close correlation between film dialogue and subtitle content' (Carroll and Ivarsson 1998). As these aspects are tightly connected to the image, a basic understanding of image composition rules should be present. To understand how a shot works, its format and the way it presents content, familiarity is needed with various aspects of filmmaking, including camera angles, lenses, depth perception and editing (see Fox forthcoming). Only if the image composition and the image systems at work are understood can primary and secondary areas be defined and a good position be found for a title. Hence, a basic understanding of shot composition, emotion and storytelling, rhythm, the use of three-dimensional space, and image systems is required. Such an understanding allows films and individual scenes to be assigned various levels of difficulty concerning the placement of integrated titles (see Fox forthcoming). These levels of difficulty are based on a number of characteristics that can either complicate or facilitate title placement: static versus active scenes, the visibility and number of speakers, the presence of cuts during a speech act, the dominance of either secondary or primary areas, and the difficulty of creating a viable contrast between title and image.

Within the proposed workflow, dialogue translation and image analysis are closely linked and together contribute to the content analysis. The complexity and form of both are determined for the film and its individual scenes based on the following considerations: Do monologues outweigh dialogues? Are there many quick-paced fights or, rather, formal greetings and discussions? Where are the speakers in relation to other focus points? Are there many pre-existing text elements in the image? These considerations enable initial, basic decisions concerning placement and layout. The layout strategies are not only based on content and complexity but should also be preceded by an analysis of the film's typographic identity (see Fox 2016). Typefaces, colours and existing text placement and effects should be considered and incorporated in the layout of the integrated titles, as the overall use of text elements can support or establish a specific tone or atmosphere.
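To make the difficulty assignment concrete, the characteristics above can be treated as a simple checklist; a minimal sketch, assuming equal weighting of the factors (an illustrative choice, not a tested metric):

```python
# Sketch: a crude difficulty score for title placement, counting the
# complicating characteristics named above. Equal weighting is an
# illustrative assumption, not a tested metric.

def placement_difficulty(scene):
    factors = [
        scene["active_scene"],          # active rather than static
        scene["visible_speakers"] > 1,  # several competing speakers
        scene["cut_during_speech"],     # a cut occurs during the speech act
        scene["primary_dominant"],      # primary areas dominate the frame
        scene["low_contrast"],          # viable contrast is hard to achieve
    ]
    return sum(factors)                 # 0 (easy) to 5 (hard)

scene = {"active_scene": True, "visible_speakers": 2,
         "cut_during_speech": False, "primary_dominant": True,
         "low_contrast": False}
print(placement_difficulty(scene))      # -> 3
```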

4.2 Concept

Based on the analysis, a concept for both the placement and the layout of the integrated titles is created. Several requirements of integrated title concepts were developed and discussed in Fox (forthcoming), based not only on translation studies but also on film studies (offering insight into image composition and storytelling), communication design (for definitions of aesthetics and creativity), usability studies (on user experience as well as interface design) and computer science (potential automation processes and software design):

● Intuitiveness (learnability, efficiency and memorability)
● Usefulness
● Suitable translation (e.g. preventing negative acoustic feedback effects)
● Consistency (following comprehensible rules and avoiding unintended irritation, frustration or amusement)
● Readability/legibility
  ❍ Reduced eyestrain (small distance in-between consecutive titles)
  ❍ Close to action and close to speaker (speaker identification)
● Satisfaction, based on the combination of a pleasing layout and a comprehensible design concept:
  ❍ Titles are within the safe area
  ❍ Suitable typeface
  ❍ Legible colour combinations
  ❍ Saturation index < 85 per cent
  ❍ No overlap with the speaker's mouth, other text elements or important activity
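Several of the 'satisfaction' criteria are mechanically checkable. The sketch below expresses three of them as simple predicates; the rectangle conventions and the inputs passed in are illustrative assumptions.

```python
# Sketch: machine-checkable items from the list above. Rectangles are
# (left, top, right, bottom) in pixels; inputs are illustrative assumptions.

def rects_overlap(a, b):
    return not (a[2] <= b[0] or b[2] <= a[0] or a[3] <= b[1] or b[3] <= a[1])

def check_title(title_rect, safe_area, saturation, protected_rects):
    return {
        # title must lie entirely within the safe area
        "within_safe_area": (safe_area[0] <= title_rect[0]
                             and safe_area[1] <= title_rect[1]
                             and title_rect[2] <= safe_area[2]
                             and title_rect[3] <= safe_area[3]),
        # saturation index < 85 per cent
        "saturation_ok": saturation < 0.85,
        # no overlap with mouths, other text elements or important activity
        "no_overlap": not any(rects_overlap(title_rect, r)
                              for r in protected_rects),
    }

print(check_title(title_rect=(100, 800, 600, 880),
                  safe_area=(64, 36, 1856, 1044),
                  saturation=0.60,
                  protected_rects=[(800, 300, 1000, 420)]))
```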

Concerning placement strategies, an extensive analysis of all known cases of integrated titles (see Section 2 and Fox forthcoming) revealed the most frequent and modern basic placement strategies (Table 11.2). These positions can be combined with customized positions based on the film's image composition and atmosphere. While these positions respond to the actual physical positions of speakers on screen, they should additionally be based on a specific concept, one that is ideally developed for each film individually.

Table 11.2 Basic set of positions for integrated titles developed by Fox (forthcoming)

Visible speakers    Placement strategy
Off-screen          Below focus; next to focus; speaking direction
1 speaker           Below speaker; next to speaker; speaking direction
2+ speakers         Below focus/speaker; next to focus/speaker; in-between speakers

Based on the results of the reception study detailed previously and the set of criteria presented here, a number of layout and placement objectives can be derived. These are summarized in the following list and should be considered equally:

● Resemblance of natural focus: In order to recreate the natural viewing patterns of native speakers watching the original version of a film, one objective should be to place titles as close as possible to the main focus areas. For Joining the Dots, these main focus areas could be clearly defined based on the recordings of the eye movements of the native English-speaking participants. With integrated titles being closer to the main focus areas, the viewing behaviour of the German-speaking participants watching Joining the Dots with integrated titles was much closer to that of the English natives (compared to the participants watching Joining the Dots with traditional subtitles).
● Use of secondary areas: If the image composition is taken into account and no primary areas or elements are covered, eye-tracking data could also be used to define secondary areas. Basic knowledge of film studies and communication design principles (see Fox forthcoming) can also help with the analysis and interpretation of film scenes.
● Indication of speaker and speaking direction: The position of a title can be chosen in a way that makes it possible to quickly connect the title to the speaker (e.g. below a speaker). Furthermore, the position of a title can also indicate speech direction or the position of the conversation partner. This also supports a natural focus, for example, in a conversation between two speakers with the titles placed between them.
● Legibility: Good legibility and readability are usually achieved through a strong contrast and an even background. A well-designed title can still be read despite a changing background and should reflect the colour palette of the image rather than contrasting with it (while still creating a strong enough contrast to ensure easy legibility).
● Individual aesthetic and/or typographic concepts: Other concepts might focus on supporting a film's atmosphere, tone or image composition. This can be reflected by the typographic identity, effects or special placement strategies (see e.g. John Wick in Figure 11.5).
● Accessibility: Accessibility can be the reason for the use of integrated titles in a film. This might include previously mentioned objectives such as the indication of speech direction and additional objectives such as noise indication or colour-based speaker identification.

When deciding upon objectives for integrating titles with a specific film, it should be clear from the beginning whether the overall concept is primarily artistic (see e.g. Man on Fire and Slumdog Millionaire) or whether the aim is to improve overall legibility and information intake (as for Joining the Dots).
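Table 11.2 lends itself to a direct lookup; a minimal sketch, in which the strategy strings mirror the table and the selection logic is an assumption:

```python
# Sketch: Table 11.2 as a lookup from the number of visible speakers to
# the basic placement strategies; custom, film-specific positions would
# be merged in afterwards.

BASIC_STRATEGIES = {
    "off-screen":  ["below focus", "next to focus", "speaking direction"],
    "1 speaker":   ["below speaker", "next to speaker", "speaking direction"],
    "2+ speakers": ["below focus/speaker", "next to focus/speaker",
                    "in-between speakers"],
}

def candidate_positions(visible_speakers):
    if visible_speakers == 0:
        return BASIC_STRATEGIES["off-screen"]
    if visible_speakers == 1:
        return BASIC_STRATEGIES["1 speaker"]
    return BASIC_STRATEGIES["2+ speakers"]

print(candidate_positions(2))   # -> ['below focus/speaker', ...]
```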






FIGURE 11.5 Individual typographic concept in John Wick (00:47:32).

Additional objectives might be transportation or immersion, factors that have been shown to actually increase through the inclusion of subtitles or integrated titles in a film.5 Other features that should be considered are typographic and kinetic effects. While these factors were not analysed in the experiment presented here or in previous studies, it is very likely that they play a role, and they present an interesting topic for follow-up studies. Depending on the individual film, its genre and its intentions, a wide range of effects is possible and can be used to support the film's atmosphere but also to increase a film's entertainment value and maybe even its accessibility. The following effects were found in an analysis of integrated titles (see Fox forthcoming) in English films that translate an additional language into the film's main language (e.g. Russian in John Wick):

● Kinetic effects: These are titles that illustrate motion, for example, by moving over the screen or being placed consecutively in a way that indicates motion. Moving titles were visible multiple times in Man on Fire (moving in and out of the frame) and in Nochnoy Dozor (following an object that is thrown towards a person).
● Spatial effects: Some titles are animated to appear as a situated element of the image or mise-en-scène, so that a person walking by will cover the title when passing. This was done in Man on Fire, John Wick, Nochnoy Dozor and Fast Five.
● Repetitive effects: Some titles were repeated before and after cuts in order to emphasize a statement (Man on Fire, Nochnoy Dozor).
● Transformative effects: These effects cover a range of possibilities that seems unlimited given today's tools and software. Man on Fire has titles that disperse and leave focus because the speaker is crying, and Nochnoy Dozor features titles that disperse into a blood trail because the speaker is a vampire on the hunt.
● Display effects: While conventional subtitles are usually displayed uniformly, this can be adjusted. Titles can be faded in and out faster or slower to indicate speech pace or emotions. They can be displayed letter by letter or line by line, and they can blur in or out (see Man on Fire).
● Typographic effects: The layout and especially the typography of a title can also convey information – font size or form (e.g. capitals) can visualize volume, while specific fonts can cause certain associations (Man on Fire, John Wick).
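Many of these display effects can be realized with existing subtitle technology rather than bespoke compositing. The Advanced SubStation Alpha (.ass) format, for instance, supports per-title positioning (\pos) and fades (\fad). A minimal Python sketch generating one such event line follows; the timings, coordinates and wording are invented for illustration.

```python
# Sketch: one positioned, fading title as an Advanced SubStation Alpha
# (.ass) event. \pos sets a pixel position; \fad fades in/out (milliseconds).
# Timings, coordinates and the line of text are invented for illustration.

def ass_event(start, end, x, y, fade_in_ms, fade_out_ms, text):
    tags = rf"{{\pos({x},{y})\fad({fade_in_ms},{fade_out_ms})}}"
    # Event fields: Layer, Start, End, Style, Name, MarginL/R/V, Effect, Text
    return f"Dialogue: 0,{start},{end},Default,,0,0,0,,{tags}{text}"

# A slow fade-in/out of the kind used to convey hesitation or emotion:
print(ass_event("0:01:12.00", "0:01:15.50", 640, 280, 800, 800,
                "Example integrated title"))
```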

4.3 Application

After a decision has been made in regard to title strategies and layout, it is time to place the integrated titles in the film. In order to do this, the first step should be to identify the focus points in the scene based on the analysis of the film and its image composition. This can be achieved by dividing the image that corresponds to a speech act (or sound) into primary and secondary areas. The integrated titles should then be placed in the secondary areas where possible and should not cover relevant, primary areas and elements. Next, the chosen layout features should be checked: Do the titles look as planned and have the intended effect? The contrast should always be strong enough to ensure legibility (unless illegibility is intended). During a final check, the need for additional information or additional effects should be assessed. Additional information might be necessary for accessible titles, and additional effects might be required to support a film's tone and atmosphere.
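The division into primary and secondary areas described here can be bootstrapped from the fixation clusters used earlier in the chapter; a minimal sketch, with the 50-per-cent cluster rule taken from the study and the grid of candidate cells an assumption:

```python
# Sketch: primary areas from native-viewer fixation clusters (a cluster is
# a natural focus point if more than half of the ORIG viewers fixated it),
# then candidate cells that overlap no primary area. Rects are
# (left, top, right, bottom); the candidate grid is an assumption.

def overlaps(a, b):
    return not (a[2] <= b[0] or b[2] <= a[0] or a[3] <= b[1] or b[3] <= a[1])

def primary_areas(clusters, n_viewers):
    """clusters: list of (rect, set of viewer ids that fixated the cluster)."""
    return [rect for rect, viewers in clusters
            if len(viewers) / n_viewers > 0.5]

def secondary_candidates(cells, primaries):
    """Grid cells clear of every primary area = possible title positions."""
    return [c for c in cells if not any(overlaps(c, p) for p in primaries)]
```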

5 Testing and application of the workflow

As concepts like these need to be thoroughly tested, criticized and refined, there is a wide range of possible follow-up studies. The strategies and workflow have so far been tested in the course of a Bachelor thesis, in a research project and in a real-life setting. Notably, the Bachelor thesis by Hevesi (2015) led to adjustments based on the student's feedback. Hevesi (2015) created integrated titles for the English-language short film Carry On Only (Loope 2013) and conducted a small eye-tracking and questionnaire-based study with three native English-speaking participants and eight German participants dependent on subtitles. Based on the presented placement strategies, the workflow and the results from the pilot study (Fox 2012), as well as the eye movements of the English participants, the integrated titles for Carry On Only were developed. The titles followed the film's typographic identity and made use of a number of effects (spatial, typographic and display effects). The eye-tracking data revealed that during the title display, the participants spent more time on the image (on average 37.6 per cent). Participants did not show any unnatural searching behaviour, and the recorded eye movements did not seem stressed compared to those of the TRAD group. In the questionnaire, at least 75 per cent of the participants rated legibility, stress, time for image exploration, detail perception and effects positively. However, most participants did not notice the effects or care much about them. Concerning the workflow, space was added for adjustments and changes to the translation during the process.

For a further study on the reception of integrated titles by Kruger et al. (forthcoming), integrated titles were created for the first thirty minutes of the English blockbuster film Sherlock Holmes – A Game of Shadows (Ritchie 2011). As this is quite an action-packed film, it was decided to keep to the conventional position of subtitles in the bottom-centre area of the frame as long as there was no superior position. The titles indicated speaker position and speaking direction, were placed close to main focus points, and made use of spatial and display effects (see Figure 11.6). No need was observed for any adjustments to the proposed workflow.

FIGURE 11.6 Integrated titles for Sherlock Holmes – A Game of Shadows.

The most recent project that made use of the placement strategies and proposed workflow took place in cooperation with the filmmaking and production team of Notes on Blindness and was supervised by Romero-Fresco, who gave advice concerning placement and accessibility. Feedback came from the filmmakers, producers, subtitle professionals (who created the SDH) and Romero-Fresco. As these integrated titles target a hearing-impaired audience, noise indication and colour-based speaker identification were added. Different colours were assigned to each speaker based on colours present during scenes featuring that person. Another new feature was the additional indication of speaking volume and source (see Figure 11.7).


FIGURE 11.7 Volume and source indication in Notes on Blindness.

6 Conclusion

Integrated titles are here to stay. They are used in more and more film productions and in all areas of subtitling, be it for interlingual, intralingual, partial or full translations. The eye-tracking study detailed in this chapter has demonstrated their positive effects for viewers, and the aesthetic and perception-based implications are considerable. However, not every set of integrated titles can be based on an extensive eye-tracking study, and not every title creator or subtitle professional has the skills and knowledge of basic design aspects needed to create a satisfying set of integrated titles. Therefore, the proposed workflow presented here is expected to provide a basic understanding of the necessary steps and considerations, and to help create titles that not only provide a translation but also become an integral part of the overall film.

Practical implications from this study arise for all areas of audiovisual translation. Film producers should be aware of the effects that traditional subtitles have on perception – especially with top-grossing and award-winning Hollywood films making more profit in translated versions than on the domestic market. The integrated titling strategies and proposed workflow set out in this chapter offer filmmakers opportunities to have their work translated in respectful and artistic ways for specific target audiences. A number of recent integrated title projects already show that this proposed workflow is applicable in real-life settings, as is demonstrated by the integrated titles created for hearing-impaired audiences of Notes on Blindness.


Notes

1 Chandler and Sweller define 'split attention' as the result of the divided attention of a learner due to 'multiple sources of information' (1991: 295), which can be – in the context of film material – transferred to splitting attention between image and title (and sound) as sources of information. More attention towards the image – rather than on the subtitle or title – is considered a positive effect as easier and faster information processing is more likely (cf. Drescher 1997: 151).
2 Cf. http://www.tobiipro.com/product-listing/tobii-pro-tx300/ (accessed 8 September 2016).
3 For further information on Joining the Dots, see http://www.jostrans.org/issue20/int_romero.php (accessed 28 August 2016).
4 Cf. also https://www.valdosta.edu/student/disability/documents/captioning-key.pdf (accessed 8 September 2016).
5 Cf. 'The Impact of Subtitles on Psychological Immersion', Dr Jan-Louis Kruger, Macquarie University, Australia, 'Languages and the Media' conference, Berlin, 6 November 2014.

References

Armstrong, M. and M. Brooks (2014), 'Enhancing Subtitles', paper presented at the TVX2014 Conference, Brussels, 25–27 June. Available online: http://www.bbc.co.uk/rd/blog/2014-10-tvx2014-short-paper-enhancing-subtitles (accessed 1 August 2016).
Bayram, S. and D. M. Bayraktar (2012), 'Using Eye Tracking to Study on Attention and Recall in Multimedia Learning Environments: The Effects of Design in Learning', World Journal on Educational Technology, 4 (2): 81–98.
Brown, A., R. Jones and M. Crabb (2015), 'Dynamic Subtitles: The User Experience', paper presented at the TVX2015 Conference, Brussels, 3–5 June.
Carroll, M. and J. Ivarsson (1998), 'Code of Good Subtitling Practice'. Available online: https://www.esist.org/wp-content/uploads/2016/06/Code-of-Good-Subtitling-Practice.PDF.pdf (accessed 2 May 2017).
Chandler, P. and J. Sweller (1991), 'Cognitive Load Theory and the Format of Instruction', Cognition and Instruction, 8 (4): 293–332.
Díaz Cintas, J. and P. Muñoz Sánchez (2006), 'Fansubs: Audiovisual Translation in an Amateur Environment', The Journal of Specialised Translation, 6: 37–52.
Díaz Cintas, J. and A. Remael (2007), Audiovisual Translation: Subtitling, Manchester: St. Jerome.
Drescher, K. H. (1997), Erinnern und Verstehen von Massenmedien: Empirische Untersuchungen zur Text-Bild-Schere [Remembering and Understanding Mass Media: Empirical Investigation of the Divergence of Texts and Images], PhD thesis, University of Vienna, Vienna: Facultas.
Dyer, A., M. MacSweenay, M. Szerzerbinski, L. Green and R. Campbell (2003), 'Predictors of Reading Delay in Deaf Adolescents: The Relative Contributions of Rapid Automatized Naming Speed and Phonological Awareness and Decoding', Journal of Deaf Studies and Deaf Education, 8 (3): 215–29.
Foerster, A. (2010), 'Towards a Creative Approach in Subtitling: A Case Study', in J. Díaz Cintas, A. Matamala and J. Neves (eds), New Insights into Audiovisual Translation and Media Accessibility, 81–98, Amsterdam: Rodopi.
Fox, W. (2012), 'Integrierte Bildtitel – Eine Alternative zur traditionellen Untertitelung. Am Beispiel der BBC-Serie Being Human' [Integrated Titles – An Alternative to Traditional Subtitling, Based on the BBC Television Series 'Being Human'], MA diss., FTSK Germersheim, Johannes Gutenberg University Mainz, Germany.
Fox, W. (2016), '"Should She Really Be Covered by Her Own Subtitle?" Text Elements in Film and Their Graphical Translation', Translation Spaces, 5 (2): 244–70.
Fox, W. (forthcoming), 'Can Integrated Titles Improve the Viewing Experience? Investigating the Impact of Subtitling on the Reception and Enjoyment of Film Using Eye Tracking and Questionnaire Data', PhD diss., FTSK Germersheim, Johannes Gutenberg University Mainz, Germany.
Hevesi, J. (2015), 'Anwendung und Auswertung der modularen Richtlinien zur Erstellung integrierter Titel nach Fox anhand des Kurzfilms "Carry On Only"' [Application and Analysis of the Modular Guidelines for the Creation of Integrated Titles by Fox Based on the Short Film 'Carry On Only'], BA thesis, FTSK Germersheim, Johannes Gutenberg University Mainz, Germany.
Hong, R., M. Wang, M. Xu, S. Yan and T-S. Chua (2010), 'Dynamic Captioning: Video Accessibility Enhancement for Hearing Impairment', MM '10: Proceedings of the 18th ACM International Conference on Multimedia (25–29 October, Florence, Italy), 421–30, New York: ACM.
Hu, Y., J. Kautz, Y. Yu and W. Wang (2014), 'Speaker-Following Video Subtitles', ACM Transactions on Multimedia Computing, Communications and Applications, 2 (3).
Ivarsson, J. and M. Carroll (1998), Subtitling, Simrishamn: TransEdit.
Kruger, J-L., S. Doherty, W. Fox and P. De Lissa (forthcoming), 'Multimodal Measurement of Cognitive Load during Subtitle Processing', in I. Lacruz and R. Jääskeläinen (eds), New Directions in Cognitive and Empirical Translation Process Research, London: John Benjamins.
Lambourne, A. (2012), 'Climbing the Production Chain', paper presented at the 2012 Languages and the Media Conference, Berlin, 21–23 November.
McClarty, R. (2012), 'Towards a Multidisciplinary Approach in Creative Subtitling', Monographs in Translating and Interpreting (MonTI), 4: 133–55.
Nornes, A. M. (1999), 'For an Abusive Subtitling', Film Quarterly, 52 (3): 17–34.
Park, S-B., K-J. Oh, H-N. Kim and G-S. Jo (2008), 'Automatic Subtitles Localization through Speaker Identification in Multimedia System', IEEE International Workshop on Semantic Computing and Applications.
Rawsthorn, A. (2007), 'The Director Timur Bekmambetov Turns Film Subtitling into an Art', New York Times, 27 May.
Romero-Fresco, P. (2013), 'Accessible Filmmaking: "Joining the Dots" between Audiovisual Translation, Accessibility and Filmmaking', JoSTrans: The Journal of Specialised Translation, 20: 201–23.
Sinha, A. (2004), 'The Use and Abuse of Subtitles', in A. Egoyan and I. Balfour (eds), Subtitles: On the Foreignness of Film, 65–7, Cambridge, MA: MIT Press/Alphabet City.
Thompson, P. (2000), 'Getting Beyond "Read 'em Quick!" Practical Notes on Subtitles and Superimpositions', Chicago Media Works. Available online: http://www.chicagomediaworks.com/2instructworks/3editing_doc/3editing_docsubtitles.html (accessed 10 August 2016).
Vit, A. (2005), 'Subtítulos en Acción (Subtitles in Action)', Speak Up. Available online: http://www.underconsideration.com/speakup/archives/002231.html (accessed 8 October 2016).

Films

Carry On Only [Short]. 2013. Directed by Christopher Loope. USA: no company credits.
Ex Machina. 2015. Directed by Alex Garland. UK: Universal Pictures International, Film4, DNA Films; Universal Pictures International.
Fast Five. 2011. Directed by Justin Lin. USA: Original Film, One Race Films, Dentsu; Universal Pictures.
Gone Girl. 2014. Directed by David Fincher. USA: Twentieth Century Fox Film Corporation, Regency Enterprises, TSG Entertainment; Twentieth Century Fox Home Entertainment.
Heroes [Television Series]. 2006–2010. Created by Tim Kring. USA: Tailwind Productions; Universal Pictures.
Jaws. 1975. Directed by Steven Spielberg. USA: Zanuck/Brown Productions, Universal Pictures; Universal Pictures.
John Wick. 2014. Directed by Chad Stahelski and David Leitch. USA: 87Eleven; Lionsgate Entertainment.
Joining the Dots [Short]. 2012. Directed by Pablo Romero-Fresco. UK: no company credits.
La Antena. 2007. Directed by Esteban Sapir. AR: LadobleA; Capelight Pictures.
Man on Fire. 2004. Directed by Tony Scott. USA/UK: Warner Bros. International; Twentieth Century Fox.
Nochnoy Dozor [Night Watch]. 2004. Directed by Timur Bekmambetov. RUS: Channel One; Twentieth Century Fox.
Notes on Blindness. 2016. Directed by Peter Middleton and James Spinney. UK: Arte France, Creative England, Impact Partners; ARTE.
Sherlock Holmes – A Game of Shadows. 2011. Directed by Guy Ritchie. USA: Warner Bros., Village Roadshow Pictures, Silver Pictures; Warner Home Video.
Slumdog Millionaire. 2008. Directed by Danny Boyle and Loveleen Tandan. UK/FR/USA: Celador Films Ltd.; Twentieth Century Fox.
Star Trek Into Darkness. 2013. Directed by J. J. Abrams. USA: Paramount Pictures, Spyglass Entertainment, Bad Robot; Paramount Pictures.
Victoria. 2015. Directed by Sebastian Schipper. D: MonkeyBoy; Radical Media.

12 Eye Tracking, Subtitling and Accessible Filmmaking

Pablo Romero-Fresco

1 Introduction

Film and translation have not always been divorced from one another. They once lived happily together when the translation of intertitles was carried out as part of the post-production process in silent films, and even more happily throughout the 1930s with the so-called multiple-language versions (Vincendeau 1999), when films were made and remade in two or three languages by the same director and sometimes in up to fourteen languages with a different director for each language version. Yet it was not long until more affordable techniques were found for translation, which resulted in the introduction of dubbing and subtitling as part of the distribution stage from 1940 onwards (Izard 2001). Translation was thus expelled from the filmmaking process, and the divorce between film and translation was consummated. As a result, translators are now among the worst-paid professionals in the filmmaking industry, despite being indispensable for the distribution and success of films in foreign markets and among deaf and blind audiences.

The divide between film and translation has also affected research. Film translation, more often referred to as audiovisual translation (AVT), has traditionally been included within translation studies, which in turn originated in the field of linguistics, away from the more visually oriented area of film studies. However, over the past few years, there have been signs of a growing connection between film and translation. The increasingly common presence of multilingualism in original films and the creative use of on-screen titling in TV series such as Sherlock (Gatiss and Moffat 2010–present) are making translation, and particularly subtitling, more visible in the film and television industry. This presents an opportunity for film (and television) scholars to work together with AVT scholars, or at least to turn their attention to translation and compare the reception of films between original and foreign viewers. Are they watching the same film? Are they watching it differently? How does translation impact on their viewing experience? Are filmmakers aware of what happens in translation, and can they do anything to provide a similar viewing experience for original and foreign viewers should they wish to do so? The present chapter hopes to find answers to some of these questions by exploring how eye-tracking research in film can contribute to translation (see Section 2) and vice versa (see Section 3), and by presenting accessible filmmaking (see Section 4) as a framework that can enable and foster this collaboration in terms of practice, training and eye-tracking-based research.

2 Viewing film without subtitles

The use of eye tracking is rapidly becoming one of the most popular methods of studying film cognition through psychological investigation (Smith 2015). This empirical approach to the study of film seeks to quantify a viewer's experience of a film, analysing different viewing conditions and the way in which cinematic elements impact on where viewers look. This enables the researcher to gain insight into how film works and how it is perceived by the audience, while also serving as 'a test bed for investigating complex aspects of real-world cognition that were often considered beyond the realms of experimentation' (Smith 2015).

The main findings from the analysis of original films so far show that the viewers' gaze is often focused on the central features of a traditionally composed film, with a particular bias towards faces (where eyes are favoured over mouths) and moving objects (Treuting 2006; Smith 2013). These movements are in line with the patterns found for facial and emotional recognition (Ekman and Friesen 1971; Hernandez et al. 2009). Eye movements can be the result of a bottom-up process, that is, a reaction to a stimulus that captures attention automatically without volitional control, or of a top-down process, one that is directed by a voluntary control that focuses attention on something that is relevant to the observer. Film viewing involves both bottom-up (reactive) and top-down (volitional) movements (Subramanian et al. 2014). According to Bordwell (2011), top-down movements in film viewing are caused by the tasks we take on when we watch a film, the most basic of which is often to maintain our interest and to comprehend the story. Our top-down hypotheses about what is going on and what will happen next inform what we look at, and when and how we look at it. Viewers are often so focused on these tasks that they tend to ignore other aspects, which leads to the so-called inattentional blindness that explains why the gorilla is missed in the famous experiment by Chabris and Simons (2010), or why so many continuity and technical inconsistencies go unnoticed by viewers.

However, when watching a film, often what we think are top-down movements are in reality bottom-up ones. This is the illusion of volition (Smith 2011) that makes us believe that we are free to look where we want, when in reality our choice is manipulated by the cinematic tools used by the filmmaker, such as 'sound, especially dialogue; camera movement, which is constantly redirecting our attention; and figure movement, which is a powerful eye-catcher' (Bordwell 2011). Other elements that inform our gaze patterns and fixations are characters' point of view and subjective experiences (Rassell et al. 2015), editing techniques (Smith and Mital 2011) and the layout of the mise-en-scène (Marchant et al. 2009):

All things being equal, these channels of information will usually work in tandem with composition and the human signal patterns at work in a scene. Most films can be thought of as massively redundant systems for drawing our visual attention to certain items in the frame, second by second. (Bordwell 2011)

These top-down and especially bottom-up processes account for the existence of edit blindness (Smith and Henderson 2008), whereby viewers miss many of the cuts in films edited according to the classical rules of continuity editing, and attentional synchrony, when the viewers' gaze is synchronized on the same areas of the screen (Smith and Mital 2013). Furthermore, the viewers' fixations at the beginning of a shot tend to be central and long, whereas there seems to be greater exploration of the screen and less attentional synchrony as the shot duration increases (Mital et al. 2011). Hochberg and Brooks (1978) referred to this as the visual momentum of the image: the pace at which visual information is acquired. However, although there may be greater exploration of the screen as the shot lingers, this is normally limited, with viewers taking in roughly no more than 3.8 per cent of the total screen area during an average-length shot (Smith 2013) and showing a clear tendency to view in detail only restricted parts of the overall image (Dorr et al. 2010). Peripheral processing is at play, but it is 'mostly reserved for selecting future saccade targets, tracking moving targets, and extracting gist about scene category, layout and vague object information' (Smith 2013: 168). What remains to be seen is what happens to inattentional and edit blindness, attentional synchrony and visual momentum in translation, when watching, for example, a film with subtitles. This will be discussed in the following section, once the main findings obtained so far about eye tracking and original films have been reviewed.


Two more eye-tracking-based findings related to the use of sound in film provide interesting food for thought regarding translation for foreign viewers and accessibility for deaf and blind audiences. Firstly, as found by Coutrot et al. (2012), Rassell et al. (2015) and Robinson et al. (2015), the use of sound in film concentrates perceptual attention and triggers longer fixations and larger saccades than the absence of sound. The latter results in higher dispersion, that is, more variability between observers' positions. From the point of view of accessibility, it will be interesting to explore whether deaf and hard-of-hearing viewers, who have limited or no access to the soundtrack, may show a greater degree of dispersion than hearing viewers, or perhaps whether the subtitles will help to mitigate this dispersion. The second sound-related finding that is relevant to translation concerns the mouth bias. As shown by Robinson et al. (2015), fixations on characters' mouths and lip reading in general increase as the listener's linguistic competence in the spoken language decreases. It would be interesting to see if this still applies in subtitled films, where the translation in the subtitles makes up for the viewers' difficulty in understanding the original language. More importantly, aside from decreased linguistic competence, if background noise (Buchan et al. 2007) and poorly synched lips (Smith et al. 2014) trigger mouth bias, does this apply to dubbing, the prevailing translation modality in countries such as Spain, France, Italy and Germany? And if that is the case, and viewers in these countries are pushed to focus more than usual on mouths that, by definition, are not fully synchronized with the sound, does this not have an impact on the dubbing viewers' experience and their degree of immersion and engagement with the film?

The little research carried out on this to date (Romero-Fresco 2016 and Figures 12.1 and 12.2) shows that the opposite may be true. In a study conducted with eighteen native English viewers and twenty-one native Spanish viewers watching an original scene from Casablanca (Curtiz 1942) and its dubbed version in Spanish, respectively, the English viewers devoted 73 per cent of their time to looking at the eyes of the characters and 27 per cent of the time to looking at their mouths, which is in line with what has been found in the literature so far. In contrast, the Spanish viewers devoted 93 per cent of their time to looking at the characters' eyes and only 7 per cent to looking at their mouths. When shown a comparable scene from a Spanish film, the same group of Spanish viewers showed a 74 per cent (eyes) versus 26 per cent (mouth) distribution of attention, thus highlighting the exceptionality of the eye bias found in the dubbed scene.

FIGURE 12.1 Distribution of attention between eyes and mouth by English viewers watching an original clip from Casablanca.

FIGURE 12.2 Distribution of attention between eyes and mouth by Spanish viewers watching a dubbed clip from Casablanca.

The results obtained from questionnaires on comprehension and immersion indicate that the Spanish viewers had no problems understanding the scenes and that they felt as immersed in the fiction as the English participants. A further questionnaire on self-reported distribution of attention shows that the Spanish viewers believed that they were focusing on mouths just as much as on eyes, thus being unaware of their strong eye bias. These findings, corroborated by a further experiment with Italian participants included in Di Giovanni and Romero-Fresco (forthcoming), point to the potential existence of a dubbing effect: an unconscious strategy/eye movement adaptation performed by dubbing viewers to avoid looking at mouths in dubbing. This dubbing effect seems to prevail over the natural and idiosyncratic way in which humans watch real-life scenes and film, and allows viewers to suspend disbelief and be transported into the fictional world by unconsciously focusing on eyes rather than mouths.

This last issue, as well as some of the questions posed earlier about the impact of subtitling on inattentional and edit blindness, attentional synchrony and visual momentum, justifies Smith's view (2015) that translation (and subtitling in particular) leads to differences in eye movement behaviour, thus invalidating the use of eye tracking in translated films 'as a way to measure how the filmmaker intended to shape viewer attention and perception'. However, as the next section aims to show, eye-tracking research on translation has much to offer filmmaking and film studies, and a new and more inclusive collaborative approach to film translation could encourage filmmakers to use subtitles as a tool to direct viewers' attention.
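The attention distributions reported in this section reduce to a simple proportion of AOI-tagged fixation time; a minimal sketch, assuming the fixations have already been tagged with face AOIs upstream:

```python
# Sketch: eyes-versus-mouth attention split from AOI-tagged fixations.
# Mapping raw gaze onto moving eye/mouth regions is assumed to have
# happened upstream; here each fixation is (duration_s, aoi_label).

def eye_mouth_split(fixations):
    eyes = sum(d for d, aoi in fixations if aoi == "eyes")
    mouth = sum(d for d, aoi in fixations if aoi == "mouth")
    total = eyes + mouth
    return (eyes / total, mouth / total) if total else (0.0, 0.0)

# The chapter reports roughly (0.73, 0.27) for English viewers of the
# original Casablanca clip and (0.93, 0.07) for Spanish viewers of the dub.
```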

3 Viewing film with subtitles

Viewers process subtitles through both a voluntary, top-down route and a reactive, bottom-up one. On the one hand, viewers read the subtitles deliberately as soon as the characters speak in order to understand what they are saying. On the other hand, research shows that the appearance of text on screen (in the form of subtitles or another type of on-screen text) draws the viewers' attention to it automatically, whether or not the subtitles are needed (d'Ydewalle and De Bruycker 2007) or even understood (Bisson et al. 2014).

Experienced subtitle viewers seem to watch subtitled programmes almost effortlessly (d'Ydewalle and De Bruycker 2007). They often start by reading the subtitles (Jensema et al. 2000), then shift their visual attention from the subtitles to the images (de Linde and Kay 1999) and modulate their processing strategies according to the type of subtitled programme they are watching (Perego et al. 2010). However, some viewers show less smooth reading patterns, shifting their gaze between the image and the subtitles in what are known as 'deflections' (de Linde and Kay 1999), 'back-and-forth shifts' (d'Ydewalle and De Bruycker 2007) or even regressions. Most viewers spend more time looking at the subtitles than at the images, but the fixations on the images are longer (Perego et al. 2010). In other words, once they have finished reading the subtitles, the viewers of a subtitled film use their remaining time to focus on the key parts of the image (often faces, when they are present) for as long as possible. Thus, they seem to explore the image (even) less than the viewers of the original film without subtitles. In general, the faster the subtitles and the more movement they present (e.g. scrolling subtitles as opposed to subtitles displayed in blocks), the more time is spent reading them and the less time is left to look at the images (Romero-Fresco 2011).

The results obtained in the EU-funded project DTV4ALL, which analysed 71,070 subtitles viewed by 103 deaf, hard-of-hearing and hearing viewers from Poland, Spain, Italy, France and Germany, have made it possible to ascertain the average distribution of viewers' attention between text and image depending on the time that the text is left on screen (Romero-Fresco 2015). Text displayed at a speed of 150 words per minute (wpm) leads to an average distribution of 50 per cent of the time on the subtitles and 50 per cent on the images.


A faster speed of 180 wpm yields an average of 60–65 per cent of the time on the subtitles and 35–40 per cent on the images, whereas 200 wpm only allows 20 per cent of the time to be spent on the images. As for differences in reception across viewer groups, Tables 12.1 and 12.2 show that deaf viewers find the subtitles on the screen more quickly than hearing viewers (perhaps because they are waiting for them, as they rely on them heavily), but take longer to read them, which may be a sign of reading difficulties, a result also found in Conrad (1977), Braverman (1981), Torres Monreal and Santana Hernández (2005), Szarkowska et al. (2011) and Miquel Iriarte (2017). As shown in Table 12.3, despite having less time left to look at the images on the screen, the deaf viewers' visual comprehension is just as good as, and sometimes even better than, that of the hearing viewers. In other words, deaf viewers seem to make up for their sometimes substandard reading skills with particularly good visual perception and comprehension.1

Table 12.1 Average reaction times of the participants in DTV4ALL

All participants: 332 ms (Min: 309; Max: 393)
Hearing: 348 ms
Hard of hearing: 340 ms
Deaf: 309 ms

Table 12.2 Average time spent on subtitles (vs. time spent on images) by the participants in DTV4ALL

All participants: 52.7% (Min: 35.9; Max: 63.2)
Hearing: 48.2%
Hard of hearing: 53.1%
Deaf: 56.7%

Table 12.3 Average comprehension of the participants in DTV4ALL

            All participants (%)   Hearing (%)   Hard of hearing (%)   Deaf (%)
Overall     69.6                   77.2          65.3                  66.2
Subtitles   67.4                   76.25         64.6                  61.4
Images      71                     73.5          66.2                  73.25


The differences between static text and subtitles determine some of the main characteristics of how subtitles are perceived by viewers, and they provide interesting food for thought regarding the questions posed previously about how subtitles impact on inattentional and edit blindness, attentional synchrony and the notion of visual momentum. Unlike much print media and other types of static text, subtitles are read in competition with an image that must be scanned, they interact with different sources of information and they are fleeting (Kruger et al. 2015).

The fact that subtitle viewers must combine image scanning and subtitle reading is likely to have an impact on edit blindness and inattentional blindness. Smith and Henderson (2008) found that viewers may miss approximately one third of the cuts in a classically edited film, even when they have been asked to identify these cuts, which shows the effectiveness of the rules of continuity editing in engaging and immersing the audience in a film as if the fictional story were continuous rather than fractured. Many of the subtitles in a film are likely to be displayed across cuts, especially as average shot lengths shrink in contemporary filmmaking (Bordwell 2006).2 Viewers are thus more likely to be busy reading a subtitle when the shot changes, which would theoretically mean that they are more likely to miss the cut. This has not been empirically tested yet, but if it is true, one could argue that, contrary to the widespread view that subtitles draw viewers out of the fiction, they may actually enhance the viewers' sense of engagement and immersion in the film as a continuous story.

Inattentional blindness may be defined as 'the failure to notice a fully visible, but unexpected object because attention was engaged on another task, event, or object' (Simons 2007: 3244). On the one hand, filmmakers can use this phenomenon to their advantage, deploying the cinematic tools at their disposal to guide the viewers' attention and ensure that they are not distracted by other elements in the shot. On the other hand, if a filmmaker wants the viewers to focus on an unexpected element placed outside the main point of attention, they should be aware that inattentional blindness will make this difficult to achieve. Presumably, viewers watching the same scene with subtitles will be even less likely to find the object, unless the shot is long enough or the object is not entirely unexpected.

The second characteristic of subtitles that distinguishes them from static text – the interaction with different sources of information – triggers complicated viewing patterns including regressions and deflections, as shown in Figure 12.3. This adds a new layer of complexity to the notion of attentional synchrony, as the variables involved in image viewing are now combined with the idiosyncratic reading habits of the spectators. Does this mean that there is less attentional synchrony in subtitled films? And if so, does the aforementioned illusion of volition (Smith 2011) not apply in subtitled films, and are they watched more idiosyncratically than original ones? As mentioned previously, filmmakers often use different cinematic techniques (composition, lighting, blocking, etc.) to trigger bottom-up attentional synchrony and direct the viewers' gaze.


FIGURE 12.3 Scanpaths showing the viewing patterns of a deaf (left) and a hard-of-hearing participant (right) watching a clip from Shrek the Third (Miller and Hui 2007) with Polish subtitles (Szarkowska et al. 2011).

Until now, though, no attention has been paid to how subtitles impact on this. For example, a filmmaker may choose to hold a wide shot with a quick conversation between two main characters for longer than usual so that the viewers can see not only the characters but also explore other important aspects of the mise-en-scène. When subtitled, this quick conversation (spoken, for instance, at 180 wpm) will cause the viewers to spend a great deal of time on the subtitles (around 65 per cent, according to the data in Romero-Fresco 2015), leaving perhaps one third of the time to look at the images (in this case the characters) and probably no time to explore the screen. As a result, the foreign audience is likely to miss essential elements of the film that the original viewers are able to appreciate. So far, there is no evidence that filmmakers are aware of this, which means that the viewers of the subtitled film (who are often greater in number than those of the original film) are not watching the film as intended by the filmmaker.

Finally, the fact that subtitles are fleeting and interact with different sources of information is bound to have an impact on the visual momentum of a film, that is, the pace at which visual information is acquired. No research has been done on this to date, but Smith hypothesizes that when watching a subtitled film, 'the viewer is less likely to exhaust the information in the image because their eyes are busy saccading across the text to acquire the information that would otherwise be presented in parallel to the image via the soundtrack' (Smith 2015). This process may 'increase the cognitive load experienced by viewers of subtitled films and change their affective experience, producing greater arousal and an increased sense of pace' (Smith 2015). Up until now, filmmakers have focused on editing and other elements such as blocking and music to manipulate and explore the sense of pace in film. Yet, in the case of subtitled films (or original films and series that use on-screen titles, such as Sherlock), the sense of pace is also determined by the speed at which subtitles are displayed and read by the viewers, something that has so far been ignored by filmmakers and by the film industry as a whole. Subtitling decisions are therefore crucial to controlling the sense of pace in a film. For instance, the translator may have chosen to subtitle a scene with two-line subtitles, but the filmmaker may decide that the scene requires an increase in pace, for which one-line subtitles (twice as many as two-line subtitles and therefore displayed twice as fast) may be better suited.
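The pacing arithmetic behind that choice can be made concrete. In the minimal sketch below, display time scales with words per subtitle at a fixed reading speed, so splitting two-line subtitles into one-liners doubles the rate at which new titles appear; the 180 wpm figure echoes the reading speeds discussed above, while the function name and word counts are hypothetical illustrations.

```python
# Hedged sketch of the pacing arithmetic (hypothetical example values).
def display_seconds(words, wpm=180):
    """Seconds a subtitle must stay on screen for a viewer reading at wpm."""
    return words / (wpm / 60.0)

two_line = display_seconds(12)   # one 12-word, two-line subtitle: 4.0 s
one_line = display_seconds(6)    # each 6-word, one-line subtitle: 2.0 s
print(two_line, one_line)        # same text, but twice as many subtitle events
```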


These specific characteristics of the perception of subtitles, as well as the fact that subtitle viewers compensate for the back and forth between subtitles and images by making shorter fixations than in regular reading and by skipping high-frequency words while still registering meaning (d'Ydewalle and De Bruycker 2007), lead Dwyer to position subtitle processing closer to visual scanning than to the reading of static text. For her, 'it is more accurate to see subtitling as transforming reading into viewing and text into image, rather than vice versa' (Dwyer 2015). Unfortunately, the current industrialized model of subtitling, relegated to the distribution stage and allowing no contact between translators and the creative team of the film, means that subtitles are dismissed as a necessary evil, often seen as turning viewers into readers while drawing them out of the fictional story. The next section of this chapter proposes an alternative model that supports Dwyer's idea of text as image by creating a space for collaboration between translators and filmmakers, in which the latter can use subtitling as another cinematic tool to influence the viewers' attention and engagement with the film (see chapter by Fox in this book).

4 Eye tracking and accessible filmmaking: Integrated titles

Relegated to the distribution stage as an afterthought in the filmmaking process, translation is normally carried out in a very limited time, for small remuneration and with no access to the creative team of the film. This may be seen as a profitable model for the film industry,3 but more than a decade of research in AVT has shown that it may also have a very negative impact on the quality and reception of translated films. In fact, renowned filmmakers such as Ken Loach are now beginning to denounce this model, as it often results in the alteration of their vision; even more worryingly, filmmakers are frequently unaware that this is happening (de Higes Andino 2014). As a potential way to tackle this problem, accessible filmmaking (Romero-Fresco 2013) attempts to integrate AVT and accessibility into the filmmaking process through collaboration between filmmakers and translators. Put another way, accessible filmmaking involves the consideration, during the filmmaking process (and through collaboration between the translator and the creative team of the film), of some of the aspects that are required to make a film accessible to viewers in other languages and viewers with hearing or visual loss.


Since it was first introduced (Romero-Fresco 2013), this approach has been applied to training, research and professional practice.

Traditionally, film(making) courses have disregarded translation and accessibility issues, while postgraduate programmes in AVT do not normally include film(making). This is beginning to change, as postgraduate courses in filmmaking such as the Film Studies Masters at the University of Malta and the MA in Film Production at the ESCAC (Barcelona), the leading film school in Spain, now include classes on AVT and accessibility. Likewise, AVT courses are beginning to open the door to film-related content, as shown by the MA in Audiovisual Translation at the Universitat Autònoma de Barcelona and especially the MA in Accessibility and Filmmaking at the University of Roehampton, where students learn not only how to make films but also how to make them accessible to viewers in other languages and viewers with hearing and visual loss.

As far as professional practice is concerned, the first examples of accessible filmmaking can be traced back to silent films, when intertitles were produced as part of the post-production process (often supervised by the filmmakers) to create multiple-language versions, so that translation shaped the production process. Since then, industrial subtitling, with translation relegated to the distribution stage, has prevailed, although there is an increasing number of exceptions to this rule. The first may be found in the ethnographic documentaries of the 1960s. Just as the emergence of ethnographic filmmaking helped to give a voice to marginalized communities around the world through the use of subtitles, it also contributed to raising the visibility of translation among a small number of filmmakers and film scholars. Ethnographic filmmakers used subtitles as one of the creative ingredients of the filmmaking process, a 'dramatic component of visual anthropology' (Ruoff 1994) that had to be tackled collaboratively, often by filmmakers and non-professional translators (Lewis 2003) and from the editing stage (Henley 1996). All these elements, as well as these filmmakers' and scholars' attention to the effect of subtitles upon the audience, account for the pioneering role played by ethnographic film studies in the research and practice of accessible filmmaking:

The writing and placing of subtitles involves considerable polishing and fine-tuning, but unlike the ex post-facto subtitling of a feature film, this remains part of the creative process, influencing the pacing and rhythm of the film as well as its intellectual and emotional content. (MacDougall 1998: 168)


In recent years, partly due to the emergence of multilingual films, more and more filmmakers are beginning to engage with translation from the production process and to collaborate with translators, as is the case with John Sayles (Lone Star 1996; Men with Guns 1997), Jim Jarmusch (Mystery Train 1989; Night on Earth 1991), Danny Boyle (Slumdog Millionaire 2008), James Cameron (Avatar 2009) and, most notably, Quentin Tarantino (Inglourious Basterds 2009) and Alejandro González Iñárritu (Babel 2006; The Revenant 2015), both of whom issued translation guidelines to their distributors in order to ensure that their vision for their films was maintained in the target versions (Sanz 2015). However, given the inflexible nature of industrial subtitling, where distributors have the power to decide against the translation wishes of recognized filmmakers such as Ken Loach and Quentin Tarantino (Sanz 2015), independent filmmaking offers an ideal platform for accessible filmmaking to be developed. This is the case with recent independent films that have integrated translation and accessibility from an early stage, such as Michael Chanan's Secret City (2012), Enrica Colusso's Home Sweet Home (2012), Elisa Fuksas's Nina (2012), Alastair Cole's The Colours of the Alphabet (2016) and the Emmy award-winning Notes on Blindness (Spinney and Middleton 2016).

Interestingly, many of these filmmakers, when faced with the need to use subtitles in the original versions of their films or to engage with translation, have opted for creative or integrated titles (Fox 2016 and Figures 12.4 and 12.5), playing with non-standard fonts, display modes, effects or positions in order to fulfil both a linguistic and an aesthetic function in the film (McClarty 2012). In theory, one of the arguments against this type of integrated titles is that viewers may spend too much time looking for them around the screen and that their creative nature may attract too much attention, thus drawing the audience out of the fiction. Eye tracking, in combination with comprehension and preference questionnaires, is a very suitable tool to ascertain whether this fear is founded.

FIGURE 12.4 Integrated titles for Notes on Blindness (Spinney and Middleton 2016): dialogue. Courtesy Archers Mark.


FIGURE 12.5 Integrated titles for Notes on Blindness (Spinney and Middleton 2016): sound effects. Courtesy Archers Mark.

With this purpose in mind, Fox (2016 and Figures 12.6–12.11) analysed the reception of three versions of the twelve-minute film Joining the Dots (Romero-Fresco 2012): its original version with no subtitles, its translation into German with interlingual subtitles produced as an afterthought, and finally its translation into German with integrated titles created in collaboration with the filmmaker as part of the production process. These integrated titles were displayed in different positions within the shots on the basis of an established set of criteria regarding framing and mise-en-scène (Fox forthcoming, 2017). Using a Tobii TX300, Fox analysed the reaction time (time to first fixation), reading time (total visit duration) and general distribution of visual attention between image and subtitle of fourteen native English participants watching the film without subtitles, fifteen native speakers of German with little or no knowledge of English watching the film with traditional subtitles, and sixteen native speakers of German with little or no knowledge of English watching the film with integrated titles. Questionnaires were also used to ascertain whether integrated titles offer advantages over traditional subtitles concerning information absorption and aesthetic experience. The results of the experiment show that while viewers take a little more time to find integrated titles than standard subtitles, the overall reading time is reduced. With integrated titles, the viewers have more time available to watch and explore the images, and they show very similar eye-movement patterns to the viewers of the original film with no subtitles. Additionally, very positive results were obtained regarding information intake and aesthetic experience.
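The three measures used here (reaction time, reading time and attention distribution) are standard outputs of eye-tracking analysis software, but their logic is simple enough to sketch from raw fixation data. The following minimal sketch uses a hypothetical record layout and function names; it illustrates the logic of the measures, not Fox's actual analysis pipeline.

```python
# Hedged sketch (hypothetical data layout; not Fox's actual Tobii pipeline).
# Each fixation record: (onset_ms, duration_ms, in_title_aoi), where
# in_title_aoi is True if the fixation landed inside the title's area of interest.
fixations = [(0, 250, False), (250, 180, True), (430, 300, True), (730, 220, False)]

def time_to_first_fixation(fixes):
    """Reaction time: onset of the first fixation inside the title AOI."""
    return next(onset for onset, _, hit in fixes if hit)

def total_visit_duration(fixes):
    """Reading time: summed duration of all fixations inside the title AOI."""
    return sum(dur for _, dur, hit in fixes if hit)

title_ms = total_visit_duration(fixations)           # 480 ms spent on the title
share = title_ms / sum(d for _, d, _ in fixations)   # ~0.51 of gaze time on text
print(time_to_first_fixation(fixations), title_ms, round(share, 2))
```

Aggregated across participants and conditions, values of this kind yield the group-level comparisons reported above.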


FIGURE 12.6 Eye tracking of Joining the Dots (Romero-Fresco 2012) without subtitles.

FIGURE 12.7 Eye tracking of Joining the Dots (Romero-Fresco 2012) without subtitles.

Another argument levelled against integrated titles, and subtitling in general, is that they draw viewers out of the film. This has recently been analysed by exploring the viewers' sense of presence in combination with eye tracking and other physiological measures. Kruger et al. (2016) compared the reception of an episode of the medical drama series House with and without English subtitles by 143 university students, who completed a 44-item questionnaire on psychological immersion. The results show that, far from drawing the viewers out of the film, the subtitles help them to identify with the characters and increase their sense of transportation, as though they were inside the fictional reality.


FIGURE 12.8 Eye tracking of Joining the Dots (Romero-Fresco 2012) with subtitles.

FIGURE 12.9 Eye tracking of Joining the Dots (Romero-Fresco 2012) with subtitles.

The authors are now complementing these subjective measurements by investigating the neural processing of subtitles using electroencephalography and, more specifically, by assessing the so-called beta coherence between prefrontal and posterior regions of the brain, which can provide objective data as to whether subtitles increase the viewers' sense of immersion in the fictional reality.


FIGURE 12.10 Eye tracking of Joining the Dots (Romero-Fresco 2012) with integrated titles.

FIGURE 12.11 Eye tracking of Joining the Dots (Romero-Fresco 2012) with integrated titles.

As part of the EU-funded project HBB4ALL, Romero-Fresco and Fryer (2016) analysed the preferences, eye movements and sense of immersion of 157 deaf, hard-of-hearing and hearing viewers watching a theatre play with subtitles for the deaf and hard-of-hearing displayed on either LED screens (open subtitles) or tablets (closed subtitles).


The eye-tracking and preference results show a 'horses-for-courses' scenario, where different devices may perform effectively depending on the viewers, the type of play and the venue. As far as immersion is concerned, the results obtained by the three groups of participants were equally high. Although more research needs to be conducted, these findings offer the first statistical evidence that the use of subtitles can enable viewers with hearing loss to be as engaged in the fictional world as hearing participants watching the same play with no captions. Despite the fact that theatre captions are detached from the actors, the viewers still managed to read them as if they were embedded in the performance, perhaps because having little or no hearing enabled the viewers to be more focused, less prone to question the naturalness of the fictional world and less likely to be sidetracked by their own thoughts, irrelevant visual elements or their surroundings.

To date, most eye-tracking studies comparing the reception of original and subtitled films have highlighted the different viewing experiences of original and foreign viewers. In some cases, these differences will be inevitable and not problematic, but what is worrying is that most filmmakers are not even aware of them (de Higes Andino 2014). The analysis of the short film Joining the Dots (Romero-Fresco 2013 and forthcoming) in its version with standard, non-integrated titles includes two illustrative examples of these differences. The first is the use of narration in shots featuring on-screen text, as shown in Figure 12.12. The translated version of this shot would normally require a subtitle to translate the main character's narration and another one, typically in capitals, to translate the content of the on-screen sign. Whereas the original viewers watch a shot with narration and two lines of on-screen text (a fairly common occurrence), the foreign viewers are faced with a cluttered and text-heavy shot that is as difficult to read as it is aesthetically different from what the filmmaker had in mind.

FIGURE 12.12 Sign shown in Joining the Dots along with the main character’s narration.


FIGURE 12.13 Transition shot from Joining the Dots (Romero-Fresco, 2012) with fixations from a viewer watching the film with Spanish subtitles.

The second example consists of the short transition shots accompanied by narration that are used on several occasions in Joining the Dots to change the location of the story. While in the tentative reception study included in Romero-Fresco (2013 and Figure 12.13) 80 per cent of the original viewers watched and were able to recall these shots, none of the foreign viewers viewed or remembered the images, as they were busy reading the subtitles. This may be described as subtitling blindness: a failure to notice or fully appreciate images on the screen because attention is engaged in reading subtitles. The result of this subtitling blindness, when it does occur, is that foreign and deaf viewers are effectively watching a different film, or rather watching the same film so differently that it becomes a different film.

If an accessible filmmaking approach is adopted, translators will have the opportunity to inform filmmakers about the risks of this subtitling blindness and about the impact that certain filming and editing decisions may have on the reception of the subtitles by foreign and deaf audiences (Cole 2015). The filmmakers may not want to alter their films to accommodate these viewers, but at least, now that they are aware of the issue, they can do so if they wish. They can also make informed decisions about issues such as the font (e.g. to give the subtitles a typographical identity that is consistent with the rest of the on-screen text used in the film), the display mode and the position of the subtitles, as well as about the way in which these aspects impact on phenomena such as edit blindness, attentional synchrony and visual momentum for foreign and deaf viewers.


In other words, translation, and more specifically subtitles, become another tool for filmmakers to influence the viewers' attention and engagement with the film and to ensure that the film's original vision is maintained when it reaches foreign, deaf and blind audiences. This new framework contradicts Smith's (2015) belief that translated films invalidate 'the use of eye tracking as a way to measure how the filmmaker intended to shape viewer attention and perception', thus potentially contributing to bridging the gap between film studies and audiovisual translation.
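Such decisions could even be supported by simple diagnostics. The risk of subtitling blindness in a given shot can in principle be screened for with eye-tracking data, as in the minimal sketch below; the coordinates, area of interest and threshold are hypothetical illustrations, not a procedure from the studies discussed here.

```python
# Hedged sketch (hypothetical data; not from the Joining the Dots study):
# flag candidate "subtitling blindness" when a viewer's gaze stays almost
# entirely inside the subtitle area of interest (AOI) for a whole shot.
def subtitle_share(samples, aoi):
    """samples: (x, y) gaze samples for one shot; aoi: (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = aoi
    inside = sum(1 for x, y in samples if x0 <= x <= x1 and y0 <= y <= y1)
    return inside / len(samples)

# A share close to 1.0 in a short transition shot suggests the viewer read
# the subtitle for the entire shot and never explored the image above it.
share = subtitle_share([(620, 650), (700, 660), (760, 655)], (0, 600, 1280, 720))
print(share >= 0.9)  # True: this viewer probably missed the image content
```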

5 Conclusion

The historical divide between film and translation as far as industry, training and research are concerned has had a damaging effect on the working conditions of translators. It has also impacted negatively on the quality of the films received by foreign, deaf and blind audiences, whose experience is often very different from that of the original viewers. However, recent developments indicate that this may be changing. The increasing presence of multilingualism and on-screen text in contemporary original films and TV series is making translation, and especially subtitles, more visible in the filmmaking industry. Filmmakers are being forced to engage with subtitles and often use them with the creativity with which they approach the rest of their work, which is helping to push the boundaries of traditional standard subtitling. In terms of research, this book and the interdisciplinary work carried out by the Eye Tracking and the Moving Image research group and by researchers at the University of Roehampton and the Universidade de Vigo show evidence of a connection between film, translation and accessibility studies.

There is indeed much to be investigated about how the same film is received by original, foreign, deaf and blind audiences, if we can even consider it to be the same film. It would be interesting to analyse, for instance, the impact that different translation modalities have on inattentional and edit blindness, attentional synchrony and visual momentum, as well as the potential occurrence of the aforementioned subtitling blindness. These eye-tracking analyses can be combined with further data on viewers' preferences, comprehension and sense of immersion in the translated film. Accessible filmmaking aims to provide a space for this to happen by fostering collaboration between professionals, trainers and scholars working in film and translation who can treat film translation (and subtitling) as an opportunity rather than a limitation. Needless to say, accessible filmmaking cannot be expected to replace the deep-rooted industrialized model that relegates translation and accessibility to the distribution stage as an afterthought; that model is too firmly established and financially motivated to be changed, at least for now.


Instead, the role of accessible filmmaking is simply to sit on the side, learning from the innovation and creativity found in Sherlock-style creative titling and fan-made translations, and to offer an alternative approach for filmmakers and film scholars who care about foreign and sensory-impaired audiences as much as they do about their original viewers.

Acknowledgement

This research has been conducted within the frameworks and with the support of the Spanish-government-funded projects 'Inclusión Social, Traducción Audiovisual y Comunicación Audiovisual' (FFI2016-76054-P) and 'EU-VOS. Intangible Cultural Heritage. For a European Programme of Subtitling in Non-hegemonic Languages' (Agencia Estatal de Investigación, ref. CSO2016-76014-R), and of the EU-funded projects 'MAP: Media Accessibility Platform for the European Digital Single Market' (COMM/MAD/2016/04) and 'Interlingual Live Subtitling for Access' (2017-1-ES01-KA203-037948).

Notes

1 Further research could look into the extent to which sound, tempo and other factors may have an impact on the rate at which hearing audiences read the subtitles.

2 This applies mostly to narrative and popular Hollywood cinema, but it may be different in slow and experimental cinema, as discussed in Dwyer and Perkins (this book).

3 Almost 60 per cent of the revenue obtained by the top-grossing Hollywood films in the last decade comes from the translated (subtitled or dubbed) or accessible (with subtitles for the deaf or audio description for the blind) versions of those films. However, only between 0.1 per cent and 1 per cent of their budgets is normally devoted to translation and accessibility (Romero-Fresco 2013).

References

Bisson, M.-J., W. Van Heuven, K. Conklin and R. Tunney (2014), 'Processing of Native and Foreign Language Subtitles in Films: An Eye Tracking Study', Applied Psycholinguistics, 35 (2): 399–418.
Bordwell, D. (2006), The Way Hollywood Tells It, Berkeley, CA: UCP.
Bordwell, D. (2011), 'The Eye's Mind', Observations on Film Art, 6 February. Available online: http://www.davidbordwell.net/blog/2011/02/06/the-eyes-mind/ (accessed 1 May 2017).
Braverman, B. (1981), 'Captioning Strategies: A Systematic Research and Development Approach', American Annals of the Deaf, 126 (9): 1031–36.
Buchan, J. N., M. Paré and K. G. Munhall (2007), 'Spatial Statistics of Gaze Fixations during Dynamic Face Processing', Social Neuroscience, 2: 1–13.
Chabris, C. and D. J. Simons (2010), The Invisible Gorilla: And Other Ways Our Intuitions Deceive Us, New York: Crown.
Cole, A. (2015), 'Good Morning, Grade One. Language Ideologies and Multilingualism within Primary Education in Rural Zambia', PhD diss., University of Edinburgh.
Conrad, R. (1977), 'The Reading Ability of Deaf School-Leavers', British Journal of Education Psychology, 47: 138–48.
Coutrot, A., N. Guyader, G. Ionescu and A. Caplier (2012), 'Influence of Soundtrack on Eye Movements During Video Exploration', Journal of Eye Movement Research, 5 (5): 1–10.
de Higes Andino, I. (2014), 'Estudio descriptivo y comparativo de la traducción de filmes multilingües: el caso del cine británico de inmigración contemporáneo', PhD diss., Universitat Jaume I.
de Linde, Z. and N. Kay (1999), The Semiotics of Subtitling, Manchester: St. Jerome Publishing.
Di Giovanni, E. and P. Romero-Fresco (forthcoming), 'Are We All Together Across Languages? An Eye Tracking Study of Original and Dubbed Films', in Irene Ranzato and Serenella Zanotti (eds), Reassessing Dubbing: The Past is Present, Amsterdam: Benjamins.
Dorr, M., T. Martinetz, K. R. Gegenfurtner and E. Barth (2010), 'Variability of Eye Movements When Viewing Dynamic Natural Scenes', Journal of Vision, 10 (28): 1–17.
d'Ydewalle, G. and W. De Bruycker (2007), 'Eye Movements of Children and Adults While Reading Television Subtitles', European Psychologist, 12: 196–205.
Dwyer, T. (2015), 'From Subtitles to SMS: Eye Tracking, Texting and Sherlock', Refractory: A Journal of Entertainment Media, 25. Available online: http://refractory.unimelb.edu.au/2015/02/07/dwyer/ (accessed 1 May 2017).
Ekman, P. and W. V. Friesen (1971), 'Constants Across Cultures in the Face and Emotion', Journal of Personality and Social Psychology, 17 (2): 124–9.
Fox, W. (2012), 'Integrierte Bildtitel – Eine Alternative zur traditionellen Untertitelung. Am Beispiel der BBC-Serie Being Human', MA diss., Johannes Gutenberg University Mainz/Germersheim.
Fox, W. (2016), 'Integrated Titles – An Improved Viewing Experience? A Contrastive Eye Tracking Study on Traditional Subtitles and Integrated Titles for Pablo Romero-Fresco's Joining the Dots', in S. Hansen-Schirra and S. Grucza (eds), Eyetracking and Applied Linguistics, 5–30, Berlin: Language Science Press.
Fox, W. (forthcoming, 2017), 'Placement Strategies for Integrated Titles – Based on an Analysis of Existing Strategies in Commercial Films', in Juan José Martínez-Sierra and Beatriz Cerezo-Merchán (eds), special issue of inTRAlinea: Building Bridges between Film Studies and Translation Studies.
Henley, P. (1996), 'The Promise of Ethnographic Film', lecture given in honour of the late Professor Paul Stirling at the 5th International Festival of Ethnographic Film, the University of Kent, 8 November.
Hernandez, N., A. Metzger, R. Magné, F. Bonnet-Brilhault, S. Roux, C. Barthelemy and J. Martineau (2009), 'Exploration of Core Features of a Human Face by Healthy and Autistic Adults Analyzed by Visual Scanning', Neuropsychologia, 47 (4): 1004–12.
Hochberg, J. and V. Brooks (1978), 'Film Cutting and Visual Momentum', in J. W. Senders, D. F. Fisher and R. A. Monty (eds), Eye Movements and the Higher Psychological Functions, 293–317, Hillsdale, NJ: Lawrence Erlbaum.
Jensema, C., S. E. Sharkawy, R. Sarma Danturthi, R. Burch and D. Hsu (2000), 'Eye Movement Patterns of Captioned Television Viewers', American Annals of the Deaf, 145 (3): 275–85.
Kruger, J.-L., A. Szarkowska and I. Krejtz (2015), 'Subtitles on the Moving Image: An Overview of Eye Tracking Studies', Refractory: A Journal of Entertainment Media, 25. Available online: http://refractory.unimelb.edu.au/2015/02/07/kruger-szarkowska-krejtz/ (accessed 1 May 2017).
Kruger, J.-L., M. T. Soto-Sanfiel, S. Doherty and R. Ibrahim (2016), 'Towards a Cognitive Audiovisual Translatology: Subtitles and Embodied Cognition', in R. Muñoz Martín (ed.), Reembedding Translation Process Research, 171–94, Amsterdam: Benjamins.
Lewis, E. D. (2003), Timothy Asch and Ethnographic Film, London: Routledge.
MacDougall, D. (1998), Transcultural Cinema, Princeton, NJ: Princeton University Press.
Marchant, P., D. Raybould, T. Renshaw and R. Stevens (2009), 'Are You Seeing What I'm Seeing? An Eye-tracking Evaluation of Dynamic Scenes', Digital Creativity, 20 (3): 153–63.
McClarty, R. (2012), 'Towards a Multidisciplinary Approach in Creative Subtitling', in R. Agost, P. Orero and E. Di Giovanni (eds), Monographs in Translating and Interpreting (MonTI), 4: 133–55.
Miquel Iriarte, M. (2017), 'The Reception of Subtitling for the Deaf and Hard of Hearing: Viewers' Hearing and Communication Profile and Speed of Subtitling Exposure', PhD diss., Universitat Autònoma de Barcelona.
Mital, P. K., T. J. Smith, R. Hill and J. M. Henderson (2011), 'Clustering of Gaze during Dynamic Scene Viewing is Predicted by Motion', Cognitive Computation, 3 (1): 5–24.
Perego, E., F. del Missier, M. Porta and M. Mosconi (2010), 'The Cognitive Effectiveness of Subtitle Processing', Media Psychology, 13: 243–72.
Rassell, A., S. Redmond, I. Robinson, J. Stadler, D. Verhagen and S. Pink (2015), 'Seeing, Sensing Sound: Eye Tracking Soundscapes in Saving Private Ryan and Monsters, Inc.', in C. D. Reinhard and C. Olson (eds), Making Sense of Cinema, 139–64, New York: Bloomsbury.
Robinson, J., J. Stadler and A. Rassell (2015), 'Sound and Sight: An Exploratory Look at Saving Private Ryan through the Eye Tracking Lens', Refractory: A Journal of Entertainment Media, 25. Available online: http://refractory.unimelb.edu.au/2015/02/06/robinson-stadler-rassell/ (accessed 1 May 2017).
Romero-Fresco, P. (2011), Subtitling through Speech Recognition: Respeaking, Manchester: Routledge.
Romero-Fresco, P. (2013), 'Accessible Filmmaking: Joining the Dots between Audiovisual Translation, Accessibility and Filmmaking', JoSTrans: The Journal of Specialised Translation, 20: 201–23. Available online: http://www.jostrans.org/issue20/art_romero.php (accessed 1 May 2017).
Romero-Fresco, P. (2015), The Reception of Subtitles for the Deaf and Hard of Hearing in Europe, Bern: Peter Lang.
Romero-Fresco, P. (2016), 'The Dubbing Effect: An Eye-Tracking Study Comparing the Reception of Original and Dubbed Films', paper presented at Linguistic and Cultural Representation in Audiovisual Translation, Sapienza Università di Roma & Università degli Studi di Roma Tre, 11–13 February.
Romero-Fresco, P. (forthcoming), 'Accessible Filmmaking in Documentaries', in Juan José Martínez-Sierra and Beatriz Cerezo-Merchán (eds), special issue of inTRAlinea: Building Bridges between Film Studies and Translation Studies.
Romero-Fresco, P. and L. Fryer (2016), 'The Reception of Automatic Surtitles: Viewers' Preferences, Perception and Presence', paper presented at Unlimited: International Symposium on Accessible Live Events, University of Antwerp, 29 April.
Ruoff, J. (1994), 'On the Trail of the Native's Point of View', CVA Newsletter, 2: 15–18. Available online: https://www.dartmouth.edu/~jruoff/Articles/CVANewsletter.htm (accessed 22 February 2016).
Sanz, E. (2015), 'Beyond Monolingualism: A Descriptive and Multimodal Methodology for the Dubbing of Polyglot Films', PhD diss., University of Edinburgh.
Simons, D. J. (2007), 'Inattentional Blindness', Scholarpedia, 2 (5): 3244.
Smith, T. J. (2011), 'Watching You Watch There Will Be Blood', Observations on Film Art, 14 February. Available online: http://www.davidbordwell.net/blog/2011/02/14/watching-you-watch-there-will-be-blood/ (accessed 1 May 2017).
Smith, T. J. (2013), 'Watching You Watch Movies: Using Eye Tracking to Inform Cognitive Film Theory', in A. P. Shimamura (ed.), Psychocinematics: Exploring Cognition at the Movies, 165–91, New York: Oxford University Press.
Smith, T. J. (2015), 'Read, Watch, Listen: A Commentary on Eye Tracking and Moving Images', Refractory: A Journal of Entertainment Media, 25. Available online: http://refractory.unimelb.edu.au/2015/02/07/smith/ (accessed 1 May 2017).
Smith, T. J. and J. M. Henderson (2008), 'Edit Blindness: The Relationship between Attention and Global Change Blindness in Dynamic Scenes', Journal of Eye Movement Research, 2 (2): 1–17.
Smith, T. J. and P. K. Mital (2011), 'Watching the World Go By: Attentional Prioritization of Social Motion during Dynamic Scene Viewing', Journal of Vision, 11 (11): 478.
Smith, T. J. and P. K. Mital (2013), 'Attentional Synchrony and the Influence of Viewing Task on Gaze Behaviour in Static and Dynamic Scenes', Journal of Vision, 13 (8): 16.
Smith, T. J., J. Batten and R. Bedford (2014), 'Implicit Detection of Asynchronous Audiovisual Speech by Eye Movements', Journal of Vision, 14 (10): 440.
Subramanian, R., D. Shankar, N. Sebe and D. Melcher (2014), 'Emotion Modulates Eye Movement Patterns and Subsequent Memory for the Gist and Details of Movie Scenes', Journal of Vision, 14 (3): 31.
Szarkowska, A., I. Krejtz, Z. Kłyszejko and A. Wieczorek (2011), 'Verbatim, Standard, or Edited? Reading Patterns of Different Captioning Styles Among Deaf, Hard of Hearing, and Hearing Viewers', American Annals of the Deaf, 156 (4): 363–78.
Torres Monreal, S. and R. Santana Hernández (2005), 'Reading Levels of Spanish Deaf Students', American Annals of the Deaf, 150 (4): 379–87.
Treuting, J. (2006), 'Eye Tracking and Cinema: A Study of Film Theory and Visual Perception', Society of Motion Picture and Television Engineers, 115 (1): 31–40.
Vincendeau, G. (1999), 'Hollywood Babel: The Coming of Sound and the Multiple-Language Version', in A. Higson and R. Maltby (eds), "Film Europe" and "Film America": Cinema, Commerce and Cultural Exchange 1920–1939, 207–25, Exeter: University of Exeter Press.

Filmography

Avatar (2009), Dir. James Cameron, USA: Twentieth Century Fox.
Babel (2006), Dir. Alejandro González Iñárritu, USA: Paramount Vantage.
Casablanca (1942), Dir. Michael Curtiz, USA: Warner Bros.
The Colours of the Alphabet (2016), Dir. Alastair Cole, United Kingdom.
Home Sweet Home (2012), Dir. Enrica Colusso, United Kingdom/France.
Inglourious Basterds (2009), Dir. Quentin Tarantino, USA: The Weinstein Company.
Joining the Dots (2012), Dir. Pablo Romero-Fresco, United Kingdom.
Lone Star (1996), Dir. John Sayles, USA: Columbia Pictures.
Men with Guns (1997), Dir. John Sayles, USA: Sony Pictures Classics.
Mystery Train (1989), Dir. Jim Jarmusch, USA: Orion Classics.
Night on Earth (1991), Dir. Jim Jarmusch, USA: Fine Line Features.
Nina (2012), Dir. Elisa Fuksas, Italy.
Notes on Blindness (2016), Dir. Peter Middleton and James Spinney, United Kingdom: Archer's Mark.
The Revenant (2015), Dir. Alejandro González Iñárritu, USA: Twentieth Century Fox.
Secret City (2012), Dir. Michael Chanan, United Kingdom.
Sherlock (2010–present), [TV programme] Created by Mark Gatiss and Steven Moffat, United Kingdom: BBC.
Shrek the Third (2007), Dir. Chris Miller and Raman Hui, USA: Paramount Pictures.
Slumdog Millionaire (2008), Dir. Danny Boyle, United Kingdom: Fox Searchlight Pictures.

BIOGRAPHIES

Paul Atkinson

Paul Atkinson teaches within the School of Media, Film and Journalism (MFJ), Monash University, Australia. His published articles address a range of topics including Bergsonism, time and cinema, modernism, visual aesthetics, affect theory and modern dance. He is currently working on a book and a series of articles that explore how processual theories of time can be used to rethink aesthetics, narrative and performance. This project involves the examination of how theories of rates of change can be deployed in understanding visual and medial differences.

Jonathan P. Batten

Jonathan P. Batten, BSc Hons, MSc, is a PhD student within the CINE (Cognition in Naturalistic Environments) Lab in the Department of Psychological Sciences, Birkbeck, University of London. Under the supervision of Dr Tim Smith and Professor Fred Dick, he studies the influence of sound on when and where vision orients in complex scenes (including film and naturalistic environments), and how this affects perception and memory. He is experienced in quantifying active vision with eye tracking, utilizing psychophysical and behavioural measures to address how sound (music, dialogue, sound effects) can orient attention through time.

Amy Bell

Amy Bell is a BSc (Hons) psychology graduate of Edge Hill University. Her final-year dissertation investigated the differing perceptions of individuals when viewing a performance containing both a character actor and a high-profile star. Amy has now chosen to pursue a career in the National Health Service.


William Brown

William Brown is a senior lecturer in film at the University of Roehampton, London. He is the author of Non-Cinema: Global Digital Filmmaking and the Multitude (Bloomsbury, forthcoming), Supercinema: Film-Philosophy for the Digital Age (Berghahn 2013) and Moving People, Moving Images: Cinema and Trafficking in the New Europe (with Dina Iordanova and Leshu Torchin, St Andrews Film Studies 2010). He is also the co-editor of Deleuze and Film (with David Martin-Jones, Edinburgh University Press 2012). He has published numerous essays in journals and edited collections, and has directed various films, including En Attendant Godard (2009), Circle/Line (2016), Letters to Ariadne (2016) and The Benefit of Doubt (2017).

Stephen Doherty

Stephen Doherty is a senior lecturer in the School of Humanities and Languages based in the Faculty of Arts and Social Sciences at the University of New South Wales in Sydney (UNSW), Australia. His research is broadly based in the interaction between language, cognition and technology. His current work investigates the cognitive aspects of human and machine language processing with a focus on translation and language technologies using a combination of traditional task performance measures, eye tracking, psychometrics and electroencephalography.

Tessa Dwyer

Tessa Dwyer is a lecturer in film and screen studies in the School of Media, Film and Journalism at Monash University, Melbourne. She has published widely on language politics and translation in global screen media, and is the author of Speaking in Subtitles: Revaluing Screen Translation (EUP 2017). Tessa is a member of the interdisciplinary research group Eye Tracking the Moving Image (ETMI) and president of the journal Senses of Cinema (www.sensesofcinema.com).

Wendy Fox

Wendy Fox works at the digital design agency Pixelpublic and has worked as a research assistant and lecturer in audiovisual translation at FTSK Germersheim, Johannes Gutenberg University, Mainz, where she completed her PhD in audiovisual translation. She also has a diploma in communication design from the University of Arts and Design, Karlsruhe. Wendy writes on subtitle processing and subtitle design and has recently published in Translation Spaces (2016) and in the forthcoming anthology New Directions in Cognitive and Empirical Translation Process Research (John Benjamins). Her work connecting subtitling and graphic design gained her the Karl Steinbuch Scholarship of the MFG Innovation Agency for ICT and Media (2013).

Pablo Romero-Fresco

Pablo Romero-Fresco is a Ramón y Cajal grantholder at Universidade de Vigo (Spain) and honorary professor of translation and filmmaking at the University of Roehampton (London, UK). He is the author of the book Subtitling through Speech Recognition: Respeaking (Routledge) and the editor of The Reception of Subtitles for the Deaf and Hard of Hearing in Europe (Peter Lang). He has collaborated with Ofcom in the UK, AiMedia in Australia and CRTC/CAB in Canada, among other institutions, to introduce and improve access to live events for people with hearing loss around the world. He is the leader of the EU-funded Media Accessibility Platform project (COMM/MAD/2016/04) and a member of the research groups IEIT (UVigo) and Transmedia Catalonia (UAB), for which he coordinated the subtitling part of the EU-funded project DTV4ALL. Pablo is also a filmmaker working on accessible filmmaking, which aims to integrate translation and accessibility into the filmmaking process. His first documentary, Joining the Dots (2012), was used by Netflix as well as schools around Europe to raise awareness about audio description.

Laura Henderson

Laura Henderson is a postgraduate student at the University of Melbourne. Her research examines virtual spaces, affective aesthetics and cinematic spectatorship. She has been published in The Conversation, Colloquy and Senses of Cinema.

Jan-Louis Kruger

Jan-Louis Kruger is head of the Department of Linguistics at Macquarie University in Sydney, Australia, and extraordinary professor at North-West University's Vaal Triangle Campus in South Africa. His main research interests include studies on the reception and processing of audiovisual translation products, including aspects such as cognitive load, comprehension, attention allocation and psychological immersion. He is a co-editor of Perspectives: Studies in Translatology. His current research projects investigate cognitive load in the context of educational subtitling with a view to optimizing subtitles as language support in second-language environments.

Jared Orth

Jared Orth is a PhD candidate in the School of Culture and Communication (Screen Studies) at the University of Melbourne. His research looks at film genre, screen culture in escape-room games and textual problem solving. Specifically, his PhD project examines how viewers engage in problem solving while viewing mystery films. He is interested in empirical methods of investigating film and audience, including cinemetrics, eye tracking and experimental research.

Claire Perkins

Claire Perkins is a senior lecturer in film and screen studies in the School of Media, Film and Journalism at Monash University, Melbourne. She is the author of American Smart Cinema (Edinburgh UP 2012) and co-editor of collections including Indie Reframed: Women's Filmmaking and Contemporary American Independent Cinema (Edinburgh UP 2016), Transnational Television Remakes (Routledge 2016), US Independent Film After 1989: Possible Films (Edinburgh UP 2015) and B is for Bad Cinema: Aesthetics, Politics and Cultural Value (SUNY 2014). Her writing on film and television has also appeared in journals including Camera Obscura, Continuum, The Velvet Light Trap, Celebrity Studies and Critical Studies in Television.

Adam Qureshi

Adam Qureshi is a senior lecturer in psychology at Edge Hill University, and has written extensively on cognitive psychology and our understanding of other people's perspectives, thoughts and beliefs. He has published in Cognition, Psychopharmacology, PLoS ONE and the British Journal of General Practice, among other journals.

Sean Redmond

Sean Redmond is the director of Deakin Motion Lab – Centre for Creative Arts Research, and a professor in screen and design at Deakin University, Australia. He has research interests in film and television aesthetics, eye tracking the moving image, film and television genre, film authorship, film sound, and stardom and celebrity. He has published fifteen books, including Liquid Space: Digital Age Science Fiction Film and Television (I.B. Tauris 2017), the AFI Film Reader: Endangering Science Fiction Film (Routledge 2015), Celebrity and the Media (Palgrave Macmillan 2014) and The Cinema of Takeshi Kitano: Flowering Blood (Columbia 2013). With Su Holmes, he edits the journal Celebrity Studies, which was shortlisted for best new academic journal in 2011.

Jodi Sita

Jodi Sita is a senior lecturer in neuroscience and anatomy who uses eye tracking to study human behaviour in areas such as viewing moving images and educational videos, experiencing natural urban landscapes, sports coaching and signature forensics. She is currently working at the Australian Catholic University, in the School of Science, Faculty of Health Sciences, Melbourne, Australia.

Tim J. Smith

Tim J. Smith, BSc Hons, PhD, is a reader/associate professor in the Department of Psychological Sciences, Birkbeck, University of London. He is the head of the CINE (Cognition in Naturalistic Environments) Lab, which studies audiovisual attention, perception and memory in real-world and mediated environments (including film, TV and VR) as well as the impacts of such media on social and cognitive development. He is an expert in active vision and eye tracking and applies empirical cognitive psychology methods to questions of film cognition, publishing his work on the subject in both psychology and film journals.

Sarah Thomas

Sarah Thomas is a lecturer in Film Studies at Aberystwyth University and has written widely on stardom, screen performance, Hollywood and cult cinema, and digital media. She is the author of Peter Lorre – Facemaker: Stardom and Performance in Europe and Hollywood (Berghahn 2012) and James Mason (BFI Palgrave 2017), and co-editor with Kate Egan of Cult Film Stardom: Offbeat Attractions and Processes of Cultification (Palgrave Macmillan 2013).


Alexander Strukelj

Alexander Strukelj is a PhD candidate in English linguistics at the Centre for Languages and Literature, Lund University, Sweden. His current research examines the cognitive aspects of reading, more specifically the monitoring processes involved in the understanding of written text. He is also a research assistant in the EyeLearn project at the Humanities Lab, Lund University, Sweden. The research project focuses on the use of eye tracking in classroom settings, improving multimodal teaching materials and examining the effect of visually degraded text on the reading process.

Ann-Kristin Wallengren

Ann-Kristin Wallengren is a professor in film studies, Lund University, Sweden. She received her PhD in 1998 with En afton på Röda Kvarn: Svensk stumfilm som musikdrama (An Evening at Röda Kvarn: Swedish Silent Film as Music Drama), and has in recent years published Welcome Home Mr Swanson: Swedish Emigrants and Swedishness in Film (Nordic Academic Press 2014) and, together with Erik Hedling, edited Den nya svenska filmen: Kultur, kriminalitet och kakofoni (The New Swedish Cinema: Culture, Criminality, Cacophony, Atlantis 2014). Together with K. J. Donnelly, she has edited a special issue of Music and the Moving Image on the psychology of film music (2015) and the anthology Today's Sounds for Yesterday's Films: Making Music for Silent Cinema (2016). Ann-Kristin Wallengren has written a number of articles on Swedish film and national and cultural identity, ideology and transnationality, as well as on different aspects of film music.

INDEX

NOTE: Page locators in bold refer to tables and in italics refer to figures.

2001 (Kubrick) 9, 130–2
  and fixations 141–5
  impact of film sound on emotional engagement 147–8, 151
  and peripheral gazing 146
  and representational meanings 149
abstraction
  and aesthetic perception 31–2
  centre of screen vs. periphery gazing 134–6, 141–5
  and emotional engagement 145–9
  and gaze behaviour 9, 130–3, 137–41, 151–2
  and representational meanings 149–50
abusive subtitles 215–16
accessible filmmaking 10, 216, 227–8. See also audiovisual translation; subtitling/subtitles
  and eye tracking 244–54
actors. See Ford, Harrison; stardom; supporting actors
advertisements
  and narrative transportation 69–70, 79
  role of music in narrative transportation of 68, 78
aesthetic perception 7–8, 37, 40
  abstraction vs. representational arts 31–2
  films vs. visual arts 28–9
  nature vs. human images 32–3
  notion of 29–31
  organizational principles of 31
  and response time 36–8

  and time 33–6, 43
  variation in 34–6
  variation in, vis-à-vis different media 28–9, 40–2
aesthetics
  of integrated subtitles 219–23
  notion of 29
aesthetic subtitling 216–18
affective congruence 66
afterimages 19
Alexander Nevsky (Eisenstein) 67, 88
Allen, Karen 201–5, 208
ambiguity 87
  and art cinema 176
  in painting 35
  and spectatorship 105
  and viewer suspicion 155–6, 158, 161–2, 164
American Psychological Association (APA) standards 58
Anderson, Joseph 19, 20
anomalous information 164
Año uña (Cuarón) 7, 15, 23–5
La Antena (Sapir) 217
AOI. See area/region of interest
APA standards. See American Psychological Association standards
area/region of interest (AOI) 50, 133–4
  and gaze behaviour 94
  gaze time vis-à-vis film star vs. 199–201
  notion of 134
Arnheim, Rudolf 31
arousal 76


art cinema 175–6, 188 n.3. See also 2001 (Kubrick)
  and modes of viewing practices 176–9
Atkinson, Paul 7–8
Atlas, J. Daniel (movie character: Now You See Me) 110, 116–17
attentional synchrony 38, 43, 88, 104, 162, 181–2, 191–2, 237
  and directional cues 38, 39
  impact of film editing on 113
  impact of film sound on 89–90
  impact of subtitling on 108, 122, 242–3
  notion of 5, 89
  and slow cinema 113–19
audience. See spectatorship
audiovisual illusions 87
audiovisual translation 216, 225, 231, 235–6, 244–6, 252–3
  impact on eye movements 238–9
auditory scene analysis 87
auditory systems 86–7
Auer, Karin 68
autonomous music and blinking 76–7
awareness and aesthetic perception 33–6, 41–2
Balázs, Bela 174, 179
Bar at the Folies-Bergère (Manet) 35
Batten, Jonathan P. 9
Batty, Craig 156, 161, 199
Baumgarten, Alexander 29
Bazin, André 37, 105
Bell, Amy 10
Belloc, René (movie character: Raiders of the Lost Ark) 198–9, 201–5, 208–11
Benjamin, Andrew 34
Black Swan (Aronofsky) 1–3
blindness, moments of 7, 15, 17–18, 21
blinking/blinks 25
  and emotion in film music contexts 76–9
  as measure of narrative transportation 65, 68–9, 70–5, 78

and memory 17 types of 70 variability of 78 body. See also human figure as concrete universal 33 as first point of fixation 39–40 of star performer 194 Bordwell, David 3, 175–6, 188 n.3, 195 brain and continuity of perception 19–20 and time perception 23 Brakhage, Stan 131 Brasel, S. Adam 4 Brick (Johnson) 164 Brody, Marcus (movie character: Raiders of the Lost Ark) 198, 204 Brooks, Virginia 118, 120, 237 Brook, Timothy 70 Brown, William 5, 7, 8, 37, 104 Buckland, Warren 196 CAM. See congruence-association model Carrey, Jim 194 Carroll, Noël 3, 29 Carry on Only (Loope) 229 Casablanca (Curtiz) 238–9 CCC approach. See cognitive computation cinematics approach Cemetery of Splendour (Weerasethakul) 9, 103, 110–12, 115–16, 118, 120–2 central bias 5 and abstract experimental films 130, 133–6, 142–5 and film sound 89 notion of 39 and slow cinema 113, 114, 116–17, 119 and star performer 202, 204–5, 208–9 Chandler, Paul 232 n.1 character(s). See also supporting actors joint attention with gaze of 38 overlay between star and character role 197 subtitles vis-à-vis identification of 220

Chion, Michel 80, 87
classical Hollywood cinema 22–3, 175–6, 191–2, 193, 196, 199
  and gaze behaviour 201–9
close-ups 174, 179, 185
cognitive computation cinematics (CCC) approach 3–4
cognitive control hypothesis 131, 158–9, 165–7
cognitive freedom 36
  and aesthetic perception 36–8
  and film form 39–42
cognitive load 40
  and diegetic sounds 97, 99
  indicator of 49
  and subtitle/caption 8, 54–5, 122
Cohen, Annabel 66–7, 69
comprehension
  and exogenous control 158
  impact of cognitive tasks on 162–4
  impact on eye movements 10, 160–4, 167
  role in directing attention 154–5
concrete universals 31, 33
congruence
  and narrative transportation 9, 69, 74–5, 78, 80
  notion of 69
congruence-association model (CAM) 67, 69
conscious perception 41–2
consecutive fixations 50
continuity editing 38, 166, 237, 242
The Conversation (Coppola) 9, 90, 94–9
Costabile, Kristi 68–9, 78
Coutrot, Antoine 68, 75, 238
covert attention 18, 99, 158–9
creative subtitles 215, 217–18, 246
Crowther, Paul 36
darkness seeing 22–4
deflections 240, 242–3
Deleuze, Gilles 21, 23, 37, 106
Desperate Housewives (TV series: Season 1) 163
dialogue 86, 90, 91. See also audiovisual translation; subtitling/subtitles
  sparse/lack of 175, 180–1, 185
diegetic sounds 86
  impact on gaze 95–9
  impact on scanpath 94
  impact on visual attention 91
Di Giovanni, Elena 238–9
directorial intent/control 5, 197
disinterested gaze 29, 30
dispersion 149–50, 238
distracting music 40
Doane, Mary Ann 37
Doherty, Stephen 8, 54–6
Drive (Refn) 10, 174, 189 n.5
  a gestural film 178, 188
  Hollywood aesthetic and art-house effect 175–6
  intersubjectivity of 186–8
  multiplicity of faces within a frame 179–87
  sparse/lack of dialogue 174–5, 177, 180–1, 185, 187
Driver (movie character: Drive) 180–7
DTV4ALL project 240–1
dubbing effect 238–9
duplicitousness/enigmatic features 177–9, 183–6
dwell count. See fixation count
dwell time. See gaze time
Dwyer, Tessa 9, 244
D'Ydewalle, Géry 41
Dyer, Adrian 37–8
dynamic scene viewing
  and attentional synchrony 116–17, 120
  and eye movements 32, 39, 182
  and fixation duration 111–12, 120–2
  impact of film sound on 89
dynamic subtitles 217, 218
Eastwood, Clint 193–4
edit blindness 237
  impact of subtitles on 241–2
Eisenstein, Sergei 67, 79, 87
electroencephalography (EEG) in subtitling and captioning research 4, 53, 55–6
Elliott, Denholm 204
Ellis, Robert J. 67, 77
embodied synchrony 145–7
embodiment theory 9, 131, 145
emotions/emotional engagement
  and attention 78
  and cognition studies 195
  and diegetic sound 95–9
  and film music 9, 66–8, 76–8, 91
  and film sound 9, 95–9, 147–8
  and memory 9, 150–1
  notion of 76
  and silence 146–7, 174, 184–6
  and stardom 195
empirical-experiential theory 195
endogenous control 38, 42, 157–8, 162, 192
enigmatic smile 177–8
ethnographic filmmaking 245
event boundaries 158, 166–7
event schemata 165
event segmentation 165–7
  notion of 158
  notion of an event 165
event structure perception 165–6
Ex Machina (Garland) 218
exogenous control 42, 157–8, 161, 163, 192
experienced viewers
  abstract vs. representational arts viewing 31–2
  variability in aesthetic perception 35
eye(s)
  eye-mind hypothesis 52–3
  imperfection of 16
eye bias 174, 177–9
  and dubbed films 238–9
eye movements. See also fixation(s); saccade(s)
  bottom-up vs. top-down processing 32–3, 35, 52, 119, 130, 157, 163, 192, 236–7
  impact of comprehension on 10, 160–4, 167
  role of content in 32–3
  short duration of 28–9
  and task setting 158, 162–4
  temporal constraints 39
eye tracking 1–3
  applications of 47, 129–30
  equipment 58–9
  and film critique 188
  methodological limitations of 57
  primary measures of 48–9, 58
  secondary measures of 48, 50–1, 58
Eye Tracking and Moving Image Research Group 6
eye tracking studies on film 15, 236
  challenges of 37
  early studies 4–5
  exogenous factors in 38
  methodological challenges of 157–9
eye tracking studies on visual arts
  aesthetic perception 31–3
face. See also Mona Lisa (painting)
  bias 31, 39–40, 115–16, 181–6, 202, 212, 236
  close-up of 179, 185
  as concrete universal 33
  polyphonic quality of 179–80
  of star performer 201, 205, 212
Fast Five (Lin) 218, 228
film(s)
  audio cues 85–6
  cognitive freedom and film form 39–42
  as movement 22–3, 24–5
  persistence of 20–1
  subjectivity of 174
  and translation 216, 235–6, 253–4
  'the tyranny of film' 75, 89, 94, 118, 162, 192, 212
film editing
  and central bias 113, 114
  and event segmentation 166
  impact on gaze 43
  and slow cinema 104
  and temporal continuity 37
film music 86
  emotive effects of 9, 66–8, 76–8, 91
  impact on gaze patterns 40
  impact on interpretation of visual images 67
  and narrative transportation 8–9, 65–6, 68–9, 70–6, 78–9
  and visual attention 88
film sound
  impact on attentional synchrony 89–90
  impact on dynamic scene viewing 89
  impact on emotions 9, 95–9, 147–8
  impact on viewer attention 90
  role in film 85–6
  and translation concerns 238
first fixation duration
  notion of 50
  and still image viewing 39–40
Fisher, Barbara 19, 20
fixated subtitles 51
fixation(s) 17–18, 23
  abstract vs. representational works 32
  and aesthetic perception 32–3, 35
  centre of screen vs. peripheral areas 134–6
  determinants for 42
  measures of 49
  nature vs. human images 32–3
  notion of 49
  novice vs. expert viewers 35
  sequences of 50–1
  and subtitles/captions 53
fixation count 49, 54
fixation duration 8, 10, 54
  abstract vs. representational art 31–2
  film music research on 73–4
  impact of film music on 89, 91
  impact of musical silence on 92
  integrated subtitles vs. traditional subtitles 221–2
  notion of 49
  and shot length 111–12, 118–20
Flanagan, Matthew 105
focusing music 40
Foerster, Anna 219
Ford, Harrison 10, 192, 197. See also Jones, Indiana
  iconic film roles of 196
Fox, Wendy 10, 218, 225, 247
Freeman, Paul 202–4, 208–9
Fryer, Louise 250–1
gaze/gaze behaviour 6
  and abstract experimental films 130, 141–5
  and attention 157–8
  impact of cognitive tasks on 162–3
  impact of comprehension on 161–2
  impact of diegetic sounds on 95–7
  impact of film editing on 43
  impact of film music on 40
  impact of film sound on 8–9, 89
  impact of stardom on 10, 192–4, 198–209, 212–13
  manipulation of 5, 237
  and narrative cues 156, 204–5, 209–11
  silent vs. sound conditions 92–4, 97–8
gaze time
  and aesthetic perception 35–6
  centre of screen vs. peripheral area 134–6
  notion of 50
  and restriction-then-recognition of star character/performer 205, 208–11
  star performer vs. areas of interest 199–201
  star performer vs. supporting actors 201–7
  and subtitle speed 240–1
germane load 54
Gerrig, Richard 70
gestures 178
  and duplicitousness 178, 180, 183–4
  and emotions 174, 185–6
glance 50
glance count 50
glance duration 50
Goldstein, Robert B. 4
Gone Girl (Fincher) 218
The Good, The Bad and the Ugly (Leone) 193, 205, 209
Gosling, Ryan 180. See also Driver (movie character: Drive)
Green, Melanie 70
Grodal, Torben 176
Gunning, Tom 20–1
Harland, Beth 35
Harris, George 204
Hasson, Uri 193, 205, 209
HBB4ALL project 250–1
heat maps
  attentional synchrony in slow cinema 113–17
  attention on subtitles 115–16
  fixation duration 201
  sound impact on visual attention 95–7
Heidenreich, Susan 31–2
Helmholtz, Hermann von 159
Henderson, John 181–2, 242
Henderson, Lauren 10
Heubner, Michael 76
Hevesi, J. 229
Hitchcock, Alfred 5, 22
hit count 51
Hochberg, Julian E. 118, 120, 237
Hong, Richang 217
Honoré, Carl 121
House (TV series) 248
How to Train Your Dragon (DeBlois & Sanders) 9, 90–4
human figure. See also eye bias; face; mouth bias
  aesthetic perception of 32–3
  attention to features of 161
  as first point of fixation 39–40
  as point of fixation in slow cinema 116–17
  variability in aesthetic perception of 35
Hu, Yongtao 217
Hutson, John 158, 160–3
hybrid subtitles 215–16, 219
Iñárritu, Alejandro González 246
inattentional blindness 40, 237
  impact of subtitles on 241–2
  notion of 242
incongruence 80
independent filmmaking and accessibility 246
industrial subtitling 245–6
Ingold, Tim 37–8
integrated titles 10, 55, 215, 218
  aesthetic experience of 219–20, 222–3
  creation of 223–5
  debate on 246–8
  impact of 218–19
  impact on visual attention 219, 221–2
  legibility of 227, 230
  placement of 225–9
  and spectatorship 218–19, 229–31, 247–50
  types of effects in 228–9
  typography of 227–8
intersubjective cinema 10, 173–4, 186–8
intertitles 245
intrinsic load 54
Ireland, David 69
Irene (movie character: Drive) 180–5, 187
Irwin, David E. 53
James, William 159
Jaws (Spielberg) 218
La Jetée (Marker) 7, 15, 23, 25
Je vous salue, Sarajevo (Godard) 7, 15, 23, 25–6
John Wick (Stahelski) 215, 218, 227–8
Joining the Dots (Romero-Fresco) 219–3, 226–7, 247, 248–52
joint attention 38
Jones, Indiana/Ford, Harrison 10, 192, 196
  actor substitution 197
  gaze time 199–201, 209, 212–13
  gaze time via restriction-then-recognition of 205, 208–11
  gaze time vis-à-vis supporting actors vs. 201–7
Juslin, Patrick 76
Kant, Immanuel 29, 30
Katanga, Simon (movie character: Raiders of the Lost Ark) 204, 206
kinetic subtitles 228–9
Kiss, Miklós 175
Koch, Christof 161
Kruger, Jan-Louis 8, 54–6, 230, 248
Lacey, Ronald 208
Lahnakoski, Juha M. 163
Land, Michael F. 16
Leone, Sergio 193
Levenson, Robert 67
Levin, Daniel T. 196
Libet, Benjamin 41–2
Lim, Song Hwee 105–6
Livingstone, Margaret 174, 177–8, 188
Loach, Ken 244, 246
Loschky, Lester C. 92
Lyotard, Jean-François 32, 34
McClarty, Rebecca 216–17
McDonald, Paul 193
McGurk effect 87
Manet, Édouard 35
Man on Fire (Scott) 218, 227–8
Marchant, Paul 4–5, 38
Massaro, Davide 32–3
Massumi, Brian 174, 178, 187–8
memory
  and cognitive control hypothesis 131, 165
  and emotion vis-à-vis fixation 9
  and nostalgia 150–1
  and sleep/eye blinks 17
Mera, Miguel 68
Mirror (Tarkovsky) 37
Mital, Parag K. 111, 118, 120
modernism 32
Molnar, François 42
Mona Lisa (painting) 174, 177–8, 185, 188
Monsters Inc. (Docter) 89
mood. See also emotions/emotional engagement
  and movement mapping 146–7
  notion of 76
Moseley, Rachel 194
Mothlight (Brakhage) 9, 131–2
  central bias vs. peripheral gaze 134–6
  and emotional engagement 146–8
  and imagery 149–50
  and representational meanings 150–1
mouth bias 89–2, 174, 177–9
  and dubbed films 238–9
Müller, Matthias M. 159
Mulligan, Carey 180
multimodal narration 76–7
Mulvey, Laura 37, 76
Münsterberg, Hugo 20
Murch, Walter 94
mystery films
  and eye tracking research 167
  popularity of 154
  and textual problem solving 155–7
  as task-setting tool 9–10, 158, 162–4
  and viewer suspicion 161–2
Nakano, Tamami 17
narrative cues
  absence of 130
  and gaze direction 156, 204–5, 209–11, 237
narrative transportation
  notion of 69–70
  research on role of film music in 68–6
  role of film music in 8–9, 65–6, 78–9
nature image
  aesthetic perception of 32–3
  and gaze behaviour 138–40
Niebur, Ernst 161
Nilsson, Dan-Eric 16
Nochnoy Dozor (Bekmambetov) 215, 228
non-diegetic sounds 86
novice viewers
  abstract vs. representational arts viewing 31–2
  and variability in aesthetic perception 35
Notes on Blindness (documentary) 218, 230–1, 246
Now You See Me (Leterrier) 103–4, 110–11
  and attentional synchrony 116–17, 120
  and fixation duration 112, 121–2
Orpen, Valerie 193
Orth, Jared 9–10
painterly gaze 10, 177–8
paintings. See also Mona Lisa (painting)
  aesthetic perception of 35, 43
  content of, impact on eye movements 32–3
Park, Seung-Bo 217
Parker, Andrew 16
The Passenger (Antonioni) 9, 103–4, 109
  and attentional synchrony 113–15, 118–19
  and fixation duration 111–12
  and fixation duration vis-à-vis shot length 120
  slowness of 110, 123
perception. See also viewer suspicion
  aesthetics (See aesthetic perception)
  audiovisual 86–7
  event structure perception 165
  impact of film sound on 148
  of importance of target objects 10, 154–5, 157–8, 161–2, 166–7
  of time 23
  variability in 8, 30, 33–6
peripheral gaze 1, 89, 134–6, 141–6, 237
Perkins, Claire 9, 156, 161
phenomenology 9, 131
Phoenix, River 197
Pink, Sarah 37–8
pip and pop paradigm 87
Plantinga, Carl 176
play-fighting 178, 187
Posner cueing task 159
post-subtitles 122
Powell, John 91
predictive inferences 163–4
Prokofieff, Sergei 88
psychocinematics 3, 104–5
psychological immersion in subtitle and caption 8, 55–6, 248–50
pupil diameter 49
pupil dilation 68, 91
  and diegetic sounds 95, 97, 99
  as eye tracking measure 48
  notion of 49
puzzle films. See also mystery films
  and visual attention 9–10, 154–5
Qureshi, Adam 10
Raiders of the Lost Ark (Spielberg) 10, 195–6, 198–9
  and attentional synchrony 199–1
  and classical style cues 201–3
  and classical style cues vs. restriction-then-recognition of star 205, 208–11
  and classical style cues vs. supporting actors 204–7
  and stardom 10, 209, 212
Raiders of the Lost Ark – Indiana Jones and the Last Crusade (Spielberg) 197, 208
Rassel, Andrea 68, 238
Raudonis, Vidas 68–9
Ravenwood, Marion (movie character: Raiders of the Lost Ark) 198–11
Rawson, Philip 31
reading index for dynamic texts (RIDT) 51
Redmond, Sean 9, 38, 68, 176, 195
Refn, Nicolas Winding 10, 174, 177–8, 180–1, 183–6, 188–9 n.4. See also Drive
La Région Centrale (Snow) 9, 131–2, 137–41, 150
  and emotional engagement 147–8
  and imagery 150
  and peripheral gaze 146
regressive fixations 50
representational artworks
  and aesthetic attention 31–2
revisit 50
revisit count 50
revisit duration 50
Rhys-Davies, John 198, 204–5, 209
RIDT. See reading index for dynamic texts
Robinson, Jennifer 238
Rogers, Anna Backman 175
Roget, Peter Mark 19, 20
Romero-Fresco, Pablo 10, 121, 230, 238–9, 250–1. See also Joining the Dots
Ronin (Frankenheimer) 9, 71, 73, 74
Russell, Jane 194
saccade(s) 7–8, 17–18
  impact of film music on 40
  notion of 49
  novice vs. expert viewing of painting 32
saccade count 49
saccade length/duration 49
saccadic amplitude
  film music research on 73, 74
  increase in 41–2
saliency maps 161
Sallah (movie character: Raiders of the Lost Ark) 198–9, 204–7, 209
Salt, Barry 3–4
Saving Private Ryan (Spielberg) 89
scanpaths 50
  impact of film music on 92–3
  impact of film sound on 94
Schoonover, K. 106
screen mysteries. See mystery films
SDH. See subtitles for the deaf and hard-of-hearing
Seel, Martin 30–1, 33–4
self-report scales 55–6
semantic congruence 69
Sherlock (TV series) 38, 154, 236, 243, 254
Sherlock Holmes 154, 164
Sherlock Holmes: A Game of Shadows (Ritchie) 154, 230
Shimamura, Arthur P. 68, 176, 188 n.2
Shin, Young Seok 76
shot length. See also slow cinema
  and art cinema 177
  and fixation duration 111–12, 118–20
  one-shot sequences 105
  and visual momentum 118–19
Shutter Island (Scorsese) 164
silence 77
  impact on fixation duration 92–4
  meaning through 174–5, 180–1, 185, 187
silent cinema 88
Simons, Daniel J. 196
Simons, Robert F. 67, 77
Sita, Jodi 9, 68, 156, 161
skipped subtitles 51, 53
Sloboda, John 76
slow cinema 8, 9
  and attentional synchrony 113–15, 118–19
  and eye tracking 108–9, 120–1, 123
  and fixation duration 111–12, 118–20
  mechanics of 117–18
  notion of 103
  slow effects 109
  spectatorship 105–6, 118
  and subtitling 104, 107–8, 121–2, 123
  understanding of slowness in 120–1
  and viewing style 103–4, 106
Slumdog Millionaire (Boyle & Tandan) 217–18, 227
Smith, Jeff 66
Smith, Tim J. 3–5, 9, 39, 67–8, 111, 118–20, 122, 181–2, 194, 238–9, 242, 243
Sobchack, Vivian 174, 187–8, 194
Song, Guanghan 68
Songs from the Second Floor (Andersson) 9, 71, 73, 74–5, 77
sonic asynchronism 147–8
Soto-Sanfiel, Maria T. 56
sound effects 86, 91. See also film sound
spatial frequency 183–6
spatial subtitles 228
speaker-following subtitles 217
spectatorship
  and art cinema 176–9, 187–8
  and integrated titles 218–19, 229–31, 247–50
  and slow cinema 105–6, 118
  and stardom 10, 192–5, 209, 212–13
  and textual problem solving 154–7, 158, 162–4
Spielberg, Steven 196, 199. See also Raiders of the Lost Ark
split attention 219, 221–2
  notion of 232 n.1
Stacey, Jackie 194
stardom
  actor/star substitution 196–7
  and gaze behaviour 198–211
  overlay between character and 197
  and spectatorship 10, 192–5, 209, 212–13
Star Wars films 87
static image viewing 23
  period of first fixation 39–40
  subtitle vs. static text 241–3
  variance in 38, 181–2
still films 7, 23–6
Strick, Madelijn 68
structural congruence 69
structured films 193
Strukelj, Alexander 8–9, 68, 89
Stumpf, Simone 68
subtitles for the deaf and hard-of-hearing (SDH) 10, 218, 238, 240–1, 250–1
subtitling/subtitles 252–3. See also integrated titles
  automation of 217
  impact on attentional synchrony 108, 122, 242–3
  impact on inattentional and editing blindness 241–2
  impact on visual momentum 243–4
  reading patterns 240
  shortcomings of 219
  and slow cinema 104, 107–8, 121–3
  speed of 240–1
  vs. static text 241–3
subtitling and captioning research 8, 47
  and eye tracking 10, 47–8, 59, 121–2
  eye tracking measures used in 48–51, 58
  multimodal processing of 46–7
  publishing quality 58
  research design 57–8
subtitling blindness 252
supporting actors
  gaze time vis-à-vis star performer vs. 201–7
Sweller, John 232 n.1
Tan, Siu-Lan 66
Tarantino, Quentin 246
Tarkovsky, Andrey 40
Terman, Amanda 68–9, 78
textual problem solving 155–7
Thayer, Julian 67
Thomas, Sarah 10
time 23
  time of response and time of attentiveness in film 36–8
  time of viewing and aesthetic perception 28, 33–6, 43
time pressure 40
time to first fixation 50
Toht, Arnold (movie character: Raiders of the Lost Ark) 208, 210–11
Tosi, Virgilio 4
Touch of Evil (Welles) 160–3
Tracy, Jo Anne 68
translation. See audiovisual translation
Turano, Kathleen A. 31–2
the 'tyranny of film' hypothesis 75, 89, 94, 118, 162, 192, 212
Ullman, Shimon 161
Unema, P. 120
universals 31
unskipped subtitles 51
unstructured films 193–4
Up (Docter & Peterson) 156
Vanderbeeken, Mark 41
Van Der Burg, Erik 87
van Laer, Tom 70
ventriloquism effect 87
Vertigo (Hitchcock) 4–5, 38
Victoria (Schipper) 218
viewer suspicion
  and anomalous information 164
  and attention 158, 161–2
  notion of 155–6
Vilaro, Anna 122
Vinci, Leonardo da 177–8
Vine, Samuel J. 159
Virilio, Paul 36
vision
  fallibility of 7
  persistence of 18–20
visit duration. See gaze time
visual arts. See also paintings
  and aesthetic attention 28–9, 31–3, 35, 43
visual attention 4
  and comprehension 154–5
  eye vs. mouth 238–9
  and film music 88, 90–4
  and gaze 157–62
  and integrated subtitles vs. traditional subtitle processing 219, 221–2
  and mystery films 9–10
  notion of 52
  salience-based 130, 161–2
  and subtitle and caption processing 8, 52–4
  and subtitle vs. image processing 240
  towards thought 158–9, 165–7
  vis-à-vis narrative importance and suspicion 159–62
visual momentum 118–20, 237
  impact of subtitles on 243–4
visual saliency 130, 161–2
Vogel, Amos 16
Võ, Melissa Le-Hoa 89
Wallach, Eli 194
Wallengren, Ann-Kristin 8–9, 68, 89
Wilson, Mark R. 159
Winged Migration (Perrin) 9, 71–2, 73–4
Wood, Greg 159
Wyeth, Peter 22, 23–4
Yarbus, Alfred 31, 33, 42, 158