Critical Multimodal Studies of Popular Discourse 2013012107, 9780415624718, 9780203104286

Studies of multimodality have significantly advanced our understanding of the potential of different semiotic resources—


English Pages [285] Year 2013


Table of contents :
List of Figures
List of Tables
1 From Multimodal to Critical Multimodal Studies through Popular Discourse
PART I Methodological and Theoretical Challenges
2 Revisiting Cinematic Authorship: A Multimodal Approach
3 The Television Title Sequence: A Visual Analysis of Flight of the Conchords
4 The Strategic Use of the Visual Mode in Advertising Metaphors
5 Japanese Street Fashion for Young People: A Multimodal Digital Humanities Approach for Identifying Sociocultural Patterns and Trends
PART II Key Issues in Contemporary Popular Culture
6 Multimodal Constructions of the Nation: How China’s Music-Entertainment Television Has Incorporated Macau into the National Fold
7 A Multimodal Analysis of the Environment Beat in a Music Video
8 Representations of the Institutional ‘Self’ in Web-Based Business News Discourse
9 Selling the ‘Indie Taste’: A Social Semiotic Analysis of frankie Magazine
10 From Popularization to Marketization: The Hypermodal Nucleus in Institutional Science News
PART III New Audienceship and Authorship in Popular Discourse
11 Telling a Different Story: Stance in Verbal-Visual Displays in the News
12 Point of View in Picture Books and Animated Film Adaptations: Informing Critical Multimodal Comprehension and Composition Pedagogy
13 Points of Difference: Intermodal Complementarity and Social Critical Literacy in Children’s Multimodal Texts
14 Bullet Points, New Writing, and the Marketization of Public Discourse: A Critical Multimodal Perspective
15 Toward a Semiotics of Listening


Critical Multimodal Studies of Popular Discourse

Critical Multimodal Studies of Popular Discourse is a ground-breaking collection of interdisciplinary studies that bridge two major traditions in discourse studies: multimodal and critical discourse analysis. Chapters by leading and emerging scholars explore the role that individual semiotic resources and their interaction play in concealing and supporting, or drawing attention to and subverting, social boundaries and political or commercial agendas in contemporary popular culture. The contributions propose or adopt a range of methods (including content analysis, multimodal corpus analysis, data visualization, film stylistics) and exemplify the value of different theoretical and interpretative perspectives (including cognitive theory, digital humanities, cultural and media studies, musicology, social semiotics, and systemic functional theory) for investigating the role multimodal interaction plays in construing issues such as environmentalism, consumerism, and nationalism, and reshaping the ways popular discourses are received and produced. Together, the chapters examine a rich pool of multimodal texts and phenomena—advertising, jazz, American and European cinema, Japanese fashion, American and Chinese entertainment TV, a music video, an Australian women’s magazine, online science and business news, picture books, animations designed for and by children, print and online newspapers, the new writing practices supported by ubiquitous software such as PowerPoint, and listening. Emilia Djonov is a lecturer in multimodality and multiliteracies at the Institute of Early Childhood, Macquarie University, Australia. Sumin Zhao is Chancellor’s Postdoctoral Research Fellow at University of Technology, Sydney, Australia.

Routledge Studies in Multimodality Edited by Kay L. O’Halloran, National University of Singapore

1 New Perspectives on Narrative and Multimodality. Edited by Ruth Page
2 Multimodal Studies: Exploring Issues and Domains. Edited by Kay L. O’Halloran and Bradley A. Smith
3 Multimodality, Cognition, and Experimental Literature. Alison Gibbons
4 Multimodality in Practice: Investigating Theory-in-Practice-through-Methodology. Edited by Sigrid Norris
5 Multimodal Film Analysis: How Films Mean. John Bateman and Karl-Heinrich Schmidt
6 Multimodality and Social Semiosis: Communication, Meaning-Making, and Learning in the Work of Gunther Kress. Edited by Margit Böck and Norbert Pachler
7 Spoken and Written Discourse in Online Interactions: A Multimodal Approach. Maria Grazia Sindoni
8 Critical Multimodal Studies of Popular Discourse. Edited by Emilia Djonov and Sumin Zhao

Critical Multimodal Studies of Popular Discourse Edited by Emilia Djonov and Sumin Zhao

First published 2014 by Routledge
711 Third Avenue, New York, NY 10017

Simultaneously published in the UK by Routledge
2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN

Routledge is an imprint of the Taylor & Francis Group, an Informa business

© 2014 Taylor & Francis

The right of the editors to be identified as the author of the editorial material, and of the authors for their individual chapters, has been asserted in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this book may be reprinted or reproduced or utilized in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data
Critical multimodal studies of popular discourse / Edited by Emilia Djonov & Sumin Zhao.
pages cm. -- (Routledge studies in multimodality ; #8)
Includes bibliographical references and index.
1. Critical discourse analysis. 2. Modality (Linguistics) 3. Communication--Methodology. 4. Discourse analysis--Social aspects. I. Djonov, Emilia, 1975– editor of compilation. II. Zhao, Sumin, 1979– editor of compilation.
P302.C68584 2013
401'.41--dc23
2013012107

ISBN: 978-0-415-62471-8 (hbk)
ISBN: 978-0-203-10428-6 (ebk)

Typeset in Sabon by Apex CoVantage, LLC


Contents

List of Figures
List of Tables
Acknowledgments
1 From Multimodal to Critical Multimodal Studies through Popular Discourse

PART I Methodological and Theoretical Challenges
2 Revisiting Cinematic Authorship: A Multimodal Approach
3 The Television Title Sequence: A Visual Analysis of Flight of the Conchords
4 The Strategic Use of the Visual Mode in Advertising Metaphors
5 Japanese Street Fashion for Young People: A Multimodal Digital Humanities Approach for Identifying Sociocultural Patterns and Trends

PART II Key Issues in Contemporary Popular Culture
6 Multimodal Constructions of the Nation: How China’s Music-Entertainment Television Has Incorporated Macau into the National Fold
LAUREN GORFINKEL
7 A Multimodal Analysis of the Environment Beat in a Music Video
8 Representations of the Institutional ‘Self’ in Web-Based Business News Discourse
9 Selling the ‘Indie Taste’: A Social Semiotic Analysis of frankie Magazine
10 From Popularization to Marketization: The Hypermodal Nucleus in Institutional Science News

PART III New Audienceship and Authorship in Popular Discourse
11 Telling a Different Story: Stance in Verbal-Visual Displays in the News
12 Point of View in Picture Books and Animated Film Adaptations: Informing Critical Multimodal Comprehension and Composition Pedagogy
13 Points of Difference: Intermodal Complementarity and Social Critical Literacy in Children’s Multimodal Texts
14 Bullet Points, New Writing, and the Marketization of Public Discourse: A Critical Multimodal Perspective
15 Toward a Semiotics of Listening

Contributors
Index


Figures

2.1 Analyzing authorship based on a dynamic continuum: (a) an auteur’s film demonstrating some discourse patterns showing a creative style while other patterns conform to established genre patterns; (b) a film lacking specific authorial traits
2.2 The opening sequence of eight shots from Bergman’s Wild Strawberries (1957)
2.3 The filmic discourse relations holding within the first eight shots of Bergman’s Wild Strawberries (1957)
2.4 The filmic cohesive chains established across the first eight shots of Bergman’s Wild Strawberries (1957)
2.5 Unidentifiable discourse and cohesive relations in the opening section of Bergman’s Persona (1966)
2.6 Pattern of cohesive chain in Bergman’s Summer with Monika (1953)
2.7 The dominantly used pattern of cohesive chains found in the beginning sequences of 16 films of the corpus
3.1 Shot 5 in detail
4.1 Billboard for Mazda cars (Netherlands 1992), with paint thrown by a protester
4.2 Advertisement for Nivea nail polish
4.3 Still from Peugeot 308 commercial (2007): Car driver manoeuvres quickly to avoid hitting some hummingbirds
4.4 Stills from various commercials in the Mac campaign (2007–2010) showing PC (left) and Mac (right)
5.1 Feature calculation and back projection into self-organizing map (SOM)
5.2 Section of self-organizing map (SOM) and complete feature space
5.3 Dynamic mapping in real time
5.4 Density of self-organizing map (SOM) and accompanying networks
7.1 Representation of drawing after screen shot 1
7.2 Representation of drawing after screen shot 2
9.1 Frank Bits (frankie, issue 35, pp. 18–19)
10.1 Hypermodal nucleus and Futurity news story page
10.2 Image evoking moods and concepts
11.1 The appraisal system (based on Hood, 2006; Martin & White, 2005)
11.2 Strategies for the expression of attitude (Martin & White, 2005)
11.3 Visual graduation: Force system (Economou, 2009)
11.4 Standout orbital structure
11.5 The online SMH standout and The Age print standout
11.6 Social actor chains in the two standouts: Asylum seekers and Australians
12.1 Meeting the lost thing
12.2 Introducing the lost thing in the movie
12.3 Meeting the lost thing: comparing narration in the book and movie versions
12.4 Remembering the lost thing
12.5 A weird, sad, lost, sort of look
13.1 Example of parallel intermodal complementarity in Mikey’s Miss Muffet retelling
14.1 A cultural studies lecture slide
14.2 “Our Strong Economy” brochure: Front, middle and back pages
14.3 “Our Strong Economy” brochure: Report card
15.1 Attentive, thoughtful listening (Twelve Angry Men, Lumet, 1957)
15.2 Angry, resentful listening (Twelve Angry Men, Lumet, 1957)
15.3 Bill Evans, listening (Steve Schapiro, 1961)


Tables

2.1 Eighteen films by Ingmar Bergman used as data
2.2 Cohesion analysis results for the corpus
3.1 FOC’s TTS
3.2 Balance sub-options
3.3 Example analysis
3.4 The whale as participant
3.5 Changes in salient characters
5.1 Framework for function and systems in clothing semiotics (Owyong, 2009, p. 197)
7.1 Sample of the multimodal transcription of the music video
7.2 Summary of the visual and aural strategies employed for the representation of temporal continuity and spatial unity
8.1 Business news networks’ video webpages
8.2 Semiotic resources for constructing and negotiating the identity of business news networks in online news videos
8.3 Representations of news affiliates and institutional actors in videobites and news videos on popular business news networks
10.1 Logical relationship between the lead and caption
11.1 SMH standout: Visual and verbal components
11.2 The Age print standout
13.1 Shot sequence of the first scenes in Mikey’s retelling of Little Miss Muffet
13.2 Moment of high intermodal convergence in Mikey’s Miss Muffet retelling
13.3 Shot sequence of next scene in Mikey’s retelling of Miss Muffet
13.4 Moment of both intermodal convergence and divergence in Mikey’s Miss Muffet retelling



Acknowledgments

The idea for the volume first emerged in late 2010, two decades since the publication of Kress and Van Leeuwen’s original Reading Images in 1990 and almost 10 years after their Multimodal Discourse appeared in 2001. While the timing is coincidental, we see this volume as an opportunity to further advance the work started by the co-founders of multimodal discourse analysis, Gunther Kress and Theo Van Leeuwen, and thank them for inspiring this project, alongside other luminary figures in multimodal and critical discourse analysis, who are acknowledged in references across the contributions.

Working on this volume, we have been fortunate enough to enjoy the support and cooperation of all the contributors. Their dedication to this project was evident in all its stages, during the collection, peer review, and revision of their chapter proposals and full papers. We would also like to acknowledge the generosity of the peer reviewers involved in this project. We deeply appreciate the expert advice, patience, and understanding of our editors Felisa Salvago-Keyes and Catherine Tung from Routledge, and of Professor Kay O’Halloran, the founder and editor of the Routledge Series in Multimodality, which made working on this project a great collaborative learning experience. Finally, we are indebted to the Faculty of Arts and Social Sciences, University of Technology, Sydney for financial contribution toward the preparation of the manuscript, and to our proofreader, Eddie Hopely, for his meticulousness and efficiency.

We would also like to acknowledge the copyright holders for permission to reprint the following materials:

• Screenshots from news videos on popular business news networks (Society for Cinema and Media Studies)
• Frank Bits (frankie, issue 35, pp. 18–19) (frankie magazine)
• Screenshots from
• Combinations of prominent images, main headlines, captions, and subheadlines that introduced David Marr’s news feature story published in the Sydney Morning Herald’s online edition and in The Age print edition from October 17, 2009, respectively titled “Come hell or high water” and “Fear rides in on rusty boat.” (Fairfax Media)
• Pages from Shaun Tan’s The Lost Thing (Hachette Australia)
• Screenshots from Ruhemann and Tan’s animation The Lost Thing (Passion Pictures Australia)
• A photo of Bill Evans, listening (Steve Schapiro).


1 From Multimodal to Critical Multimodal Studies through Popular Discourse
Emilia Djonov and Sumin Zhao

1. ORIENTATION TO CRITICAL MULTIMODAL STUDIES OF POPULAR DISCOURSE

Over the past two decades or so, two strands of discourse studies have emerged and gained considerable momentum in semiotics and applied linguistics, media and cultural studies, and education. The first, multimodal discourse analysis (MDA), explores the meaning-making potential of different communication modes and media and their actual use and dynamic interaction with each other and with the sociocultural context in which they operate. The second, critical discourse analysis (CDA), is concerned with the relationship between language (and to a lesser extent other modes) and power by studying how communication conceals and legitimizes, or reveals and even subverts, social boundaries, inequality, and political or commercial agendas.

Despite each having a different primary focus, both strands share two fundamental understandings about human communication. One is that human communication is always multimodal. Meaning-making involves selecting from different modes (e.g., written language, sound, gesture, visual design) and media (e.g., face-to-face, print, film) and combining these selections according to the logic of space (e.g., a sculpture), time (e.g., a sound composition), or both (e.g., a film) (Kress, 2010). The other fundamental understanding is that human communication is always social. It is defined by and construes, and over time can be transformed by and transform, its social context.

Besides these shared understandings, both MDA and CDA thrive on disciplinary, theoretical, and methodological diversity, which provides a firm ground on which to explicitly unite their agendas. Critical Multimodal Studies of Popular Discourse is an invitation for MDA scholars, such as us, the editors, to engage in what we would call ‘critical multimodal discourse analysis’ by focusing on popular discourse.
Specifically, contributions to this volume advance the field of multimodal studies by engaging in dialogue with critical social theory in general (e.g., the work of Mikhail Bakhtin, Roland Barthes, Basil Bernstein, Pierre Bourdieu, Michel Foucault, Paulo Freire, Erving Goffman, and the Frankfurt School) and CDA in particular (e.g., as developed in Chouliaraki & Fairclough, 1999; Fairclough, 2001 [1989], 2010 [1995]; Van Dijk, 1993, 2008b; Van Leeuwen, 2008; Wodak & Chilton, 2007; Wodak & Meyer, 2009b [2001]; Wodak & Weiss, 2003). Although this dialogue has been promoted in many individual projects (e.g., Chouliaraki, 2006; Lemke, 2006; Machin, 2004; Machin & Van Leeuwen, 2007; Macken-Horarik, 2003; O’Halloran, 2004a; Van Leeuwen, 2000, 2009), as Van Leeuwen (2013) argues, the critical analysis of multimodal discourse has yet to establish itself as a field with “a clear academic identity of its own [and] its own conferences, journals, edited books and so on” (p. 1). This book is envisaged as a step toward this goal, complementing invitations for critical discourse analysts to engage with multimodality (on ‘multimodal critical discourse analysis’, see the forthcoming special issue of Critical Discourse Studies edited by David Machin, as well as Machin & Mayr, 2012). Popular discourse, as we discuss later, offers a fertile ground on which to take this step and explore how different semiotic resources can be employed to perpetuate or challenge prevailing sociocultural beliefs, stereotypes, and norms.

In this chapter, we offer a snapshot of the history, research directions, and dominant approaches within MDA and CDA, highlighting their shared backgrounds and aspirations and thereby setting the scene for this volume. We then present a brief outline of the following chapters, making explicit the similarities and differences between their analytical methods, the critical social issues they consider, and the data through which these issues are explored.

2. MULTIMODAL DISCOURSE ANALYSIS

As Kress (2010) states, “Multimodality names both a field of work and a domain to be theorized” (p. 54).
The domain of multimodality (the ways different semiotic resources such as language, colour, dress, and gesture are concurrently deployed in communication across a range of overlapping spheres such as politics, art, education, and everyday interactions) has long intrigued scholars, artists, and educators (Kaltenbacher, 2004). It is, therefore, not surprising that as a field of work, multimodality has benefited from insights from a wide array of disciplines, including anthropology; philosophy; psychology; visual, media, and cultural studies; fine art; linguistics; and semiotics, as contributions to Jewitt (2009) and O’Halloran and Smith (2011) attest. The origin of multimodality, or MDA, as a named field of academic research, however, is relatively recent, and is strongly associated with two seminal 1990s studies: Kress and Van Leeuwen’s (1990, 2006 [1996]) Reading Images and O’Toole’s (2011 [1994]) The Language of Displayed Art. Since then, research in MDA has pursued two main interdependent directions. The first involves mapping the history and unique meaning-making potential of individual semiotic resources, such as visual design, sound, action/gesture, space, mathematical symbolism, and typography, while the second concentrates on theorizing and analyzing the interaction between different semiotic resources in multimodal communication in specific social contexts, and motivates the first.

MDA’s rapid expansion, interdisciplinarity, and theoretical and methodological richness make characterizing the field a precarious task. Yet, discernible within it are two main distinct, yet compatible, approaches (the social semiotic and the interactional) and a third, emergent influence: that of cognitive theories of communication.

The social semiotic approach is grounded in Halliday’s (1978) theory of language as a social semiotic, where language is viewed as “one of the semiotic systems that constitute a culture” (p. 2), interpreted “by reference to its place in the social process” (p. 4), and modelled as a resource for making meaning that has evolved, and is organized, in response to the three functions (‘metafunctions’) it serves in society. This view is encapsulated in Halliday’s model of the relationship between text/discourse and social context. In this model, a text is any “social exchange of meanings” (Halliday, 1985, p. 11) or act of communication, and simultaneously serves all three functions: representing patterns of experience and the logical relations among them (ideational meaning), conveying emotions and attitudes and enacting social relations (interpersonal meaning), and interweaving ideational and interpersonal meanings into a cohesive and coherent semantic unit, that is, a text (textual meaning). In Halliday’s (1994 [1985]) systemic functional linguistics (SFL), the resources language offers for making ideational, interpersonal, and textual meaning are represented as systems of choices known as system networks. The focus on meaning, definition of culture as “a set of semiotic systems, a set of systems of meaning” (Halliday, 1985, p. 4), and model of the dynamic relationship between text and context have stimulated social semiotic theory’s expansion beyond language.
The first step in this direction is Hodge and Kress’s (1988) outline for a theory of social semiotics that would support a transdisciplinary dialogue on communication in all its forms and across different institutional contexts and for which “texts and contexts, agents and objects of meaning, social structures and forces and their complex interrelationships together constitute the minimal and irreducible object of semiotic analysis” (p. viii). Multimodal social semiotics has inherited the sociopolitical orientation evident in Hodge and Kress’s proposal, and seeks to theorize the principles (e.g., framing, salience, style, genre) that apply across different modes and their interaction, as well as to address the ways multimodal meaning-making reflects the interests of meaning-makers, their differential access to semiotic resources, and the norms that govern semiotic practices (e.g., rules built into technologies for making meaning, such as office software) (cf. Kress, 2010; Kress & Van Leeuwen, 2001; Van Leeuwen, 2005).

Within social semiotics, there is also a steadily growing group of studies inspired specifically by SFL and known as systemic functional multimodal discourse analysis (SF-MDA) (O’Halloran, 2008). SFL’s influence is recognized in, for instance, Kress and Van Leeuwen’s (2006 [1996]) grammar of visual design, O’Toole’s (2011 [1994]) framework for analyzing displayed art, and Van Leeuwen’s (1999) unified theory of sound, as well as in many studies of multimodal semiosis (e.g., Baldry & Thibault, 2006; Lemke, 2002; Martinec, 2005; O’Halloran, 2004b, 2005; Royce & Bowcher, 2007). In addition to adapting to multimodality SFL concepts and analytical tools (e.g., the metafunctions, system networks, stratification, rank), SF-MDA studies have built on SFL frameworks for the analysis of ‘discourse beyond the clause’ (Martin & Rose, 2007 [2003]), including Appraisal Theory (Martin & White, 2005), which deals with the language of evaluation, and work on genre (most comprehensively covered in Martin & Rose, 2008). This has enabled SF-MDA research to shed light on the relationship between multimodal discourse, knowledge construction, identity, and affiliation (e.g., Bednarek & Martin, 2010; see also Bednarek, Chapter 3; Economou, Chapter 11; Zhang & O’Halloran, Chapter 10, this volume).

Multimodal social semiotics has been widely applied in contexts of education, advancing Halliday’s (1978, p. 5) “concern with language in relation to the process and experience of education” along two related paths. One involves exploring multimodal learning in social interactions in diverse educational settings (e.g., Jewitt, 2006; Jewitt & Kress, 2003; Kress et al., 2005; Kress, Jewitt, Ogborn, & Tsatsarelis, 2001). The other focuses on developing “an educationally accessible functional grammar, that is, a metalanguage that describes meaning in various realms [that] include the textual, the visual as well as the multimodal relations between different meaning-making processes” (New London Group, 1996, p. 77) on the basis of analyses of a rich array of multimodal texts and genres (e.g., Unsworth, 2001, 2008a, 2008b).
By giving students, teachers, and curriculum developers tools they can share for describing, analyzing, and evaluating multimodal meaning-making, these studies have paved the way toward overt multiliteracies pedagogies and equity in education (Callow, 2006; Cope & Kalantzis, 2000; see also Thomas, Chapter 13, and Unsworth, Chapter 12, this volume).

The second dominant approach to MDA, interactional multimodal discourse analysis, has roots in anthropology, conversation analysis, linguistic ethnography, interactional sociolinguistics, and research on nonverbal communication (e.g., the work of Erving Goffman, John Gumperz, Deborah Tannen, Ray Birdwhistell, Gregory Bateson, Albert Scheflen, and Adam Kendon). Best represented in the work of Ron Scollon (2001) and Sigrid Norris (2004), it explores “how a variety of modes are brought into and constitutive of social interaction, identities and relations, with a particular interest in habitus and embodiment” (Jewitt, 2009, p. 33). This approach employs activity theory developed in the Vygotskian tradition and James Wertsch’s notion of ‘mediated action’ for conceptualizing the relationship between semiotic/cultural tools, discourse, and society. Like social semiotic multimodal studies of classroom discourse (cf. Kress et al., 2001; Kress et al., 2005), it combines social semiotic frameworks with ethnographic methods when investigating the role that discourse embodied in concrete semiotic artefacts and situated interactions plays in social change. Its strength lies in producing thick descriptions of social interactions and revealing the relations among social actors in complex institutional practices, such as those in news media (e.g., see Tan, Chapter 8, this volume), and the ways identity is enacted and redefined dynamically in everyday (inter)actions (Norris, 2011).

Contributions to MDA have also been influenced by cognitive approaches concerned with establishing the conditions for effective communication (e.g., McNeill, 2005; Sperber & Wilson, 1995 [1986]). A key proponent of this approach is Charles Forceville, who sees cognitive theory as a means of developing less interpretive methods for multimodal discourse analysis. His own work concentrates on the formal qualities of pictorial and multimodal metaphor in print and TV advertising, comics, cartoons, and film (Forceville, 1996; Forceville & Urios-Aparisi, 2009) as a prerequisite to exposing how this semiotic device is exploited for commercial purposes (Forceville, Chapter 4, this volume). The cognitive approach to MDA has also been applied to the study of multimodal experimental literature (Gibbons, 2011) and the viewing and description of pictures (Holsanova, 2008).

3. CRITICAL DISCOURSE ANALYSIS

CDA, like MDA, is an umbrella term for a heterogeneous group of studies employing diverse theories, methods, and data, yet united by “a common goal: the critique of the hegemonic discourses and genres that effect inequalities, injustices, and oppression in contemporary society” (Van Leeuwen, 2006, p. 290). CDA is a child of Critical Linguistics, a movement that started at the University of East Anglia in the mid-1970s (Fowler, Hodge, Kress, & Trew, 1979; Kress & Hodge, 1979).
Inspired by the ideas of Halliday, Marx, and Whorf, critical linguists took the fundamental step of interpreting grammatical categories as potential traces of ideological mystification, and broke with a tradition in which ways of saying the same thing were seen as mere stylistic variants, or as conventional and meaningless indicators of group membership categories such as class, professional role, and so on. (Van Leeuwen, 2006, pp. 291–292)

The next two decades saw the birth and development of CDA as a “problem-oriented, and therefore necessarily interdisciplinary and eclectic . . . research programme” (Wodak & Meyer, 2009a [2001], pp. 3–4). Events strongly associated with its emergence include the extension of critical linguistics’ sociopolitical orientation to meaning-making in general in Hodge and Kress’s (1988) Social Semiotics and in the journal Social Semiotics founded in 1991, and the publication of a special issue of the journal Discourse and Society in 1993, with papers by Norman Fairclough, Gunther Kress, Teun Van Dijk, Theo Van Leeuwen, and Ruth Wodak and Bernd Matouschek, originally presented at a 1992 meeting in Amsterdam (Van Leeuwen, 2006).

Motivated by social issues, CDA grounds discourse analysis in critical social theory in order to move beyond description and interpretation, and explain and raise awareness of the relationship between discourse and social structures (Chouliaraki & Fairclough, 1999; Fairclough, 2010 [1995]). Of particular interest to CDA are latent ideologies, such as those implicit in everyday conceptual metaphors as theorized by Lakoff and Johnson (1980), and mass media discourses, such as those found in the news and popular culture texts (Van Dijk, 2001; Wodak & Meyer, 2009a [2001]). CDA’s ultimate goal is to contribute to social change by engaging in both “negative critique . . . of how societies produce and perpetuate social wrongs, and positive critique . . . of how people seek to remedy or mitigate them, and identification of further possibilities for righting and mitigating them” (Fairclough, 2010 [1995], p. 7; cf. Martin, 2004). (An example of positive critique is Maier and Cross’s contribution to this volume, Chapter 7.)

Despite the field’s diversity, three dominant approaches can be recognized in CDA: Fairclough’s dialectical-relational, Wodak’s discourse-historical, and Van Dijk’s sociocognitive approach. Also of particular relevance to this volume is the framework developed by Van Leeuwen (2008), a co-founder of both MDA and CDA, due to the strong connections it fosters between these two strands of discourse studies.

Fairclough’s approach (Chouliaraki & Fairclough, 1999; Fairclough, 2001 [1989], 2010 [1995]) has been developed through a focus on neo-capitalism in corporate and public discourse in knowledge-based economies.
It is inspired by a rich array of social theories (including those of Marx, Gramsci, Giddens, Habermas, Althusser, Foucault, and Bourdieu), employs SFL for the analysis of texts, considers relations between and within discourses as well as between discourses and other elements of the social context, and emphasizes the interdependence of discursive and material structures. Wodak’s discourse-historical approach (e.g., Wodak, De Cillia, Reisigl, & Liebhart, 2009 [1999]) has explored racism, anti-Semitism, immigration, and nationalism, and is chiefly influenced by the Frankfurt School. Its central tenet is that CDA must relate texts to their co-texts, immediate situational context, and broader sociocultural and historical contexts, and this requires a strong empirical basis (i.e., analyzing large corpora comprising different genres), interdisciplinary collaboration, and methodological diversity (e.g., corpus linguistics, ethnography, argumentation theory, and rhetoric have all been employed in this approach). Van Dijk (2008b), like Wodak, has investigated racism, xenophobia, and anti-Semitism. Based on cognitive social theory, the distinguishing feature of Van Dijk’s (2008a) approach is the argument that discourse structures cannot be related to social structures directly but only through the “mental model[s] of everyday experience” of individual language users (p. 71).

From Multimodal to Critical Multimodal Studies  7 Van Leeuwen’s (2008) approach, which—alongside the work of Hodge and Kress (Kress & Hodge, 1979; Hodge & Kress, 1988)—is a significant inspiration for this volume, has been developed through explorations of consumerism, immigration, globalization, ideologies of childhood, schooling, gender, and war, and incorporates insights from many branches of critical social theory. Taking as a starting point Bernstein’s (1990) notion of ‘recontextualization’, Van Leeuwen provides a model for analyzing how discourses recontextualize social practices by substituting, deleting, and rearranging the elements of social practices (e.g., social actors, activities, location, time, instruments) and/or by adding evaluations, purposes, or legitimations, and uses SFL for the analysis of verbal discourse. A distinguishing feature of this model is its adaptability to nonverbal and multimodal representations (e.g., Machin & Van Leeuwen, 2007; Van Leeuwen, 2000, 2009; see also Djonov & Van Leeuwen, Chapter 14, Economou, Chapter 11, Maier & Cross, Chapter 7, and Tan, Chapter 8, this volume), and the argument that CDA needs to consider not only what is or is not represented nonverbally or multimodally (e.g., whether ethnic minorities are represented in the media) but also how such representations are constructed. Finally, like MDA, CDA has had a significant impact in education. Fairclough’s (2001 [1989], 2010 [1995]) notion of ‘critical language awareness’ (later called ‘critical discourse awareness’), for example, was extended simultaneously to education and to multimodality in the influential multiliteracies manifesto of the New London Group (1996), of which Fairclough was a founding member alongside MDA’s originator Gunther Kress. 
A key goal of (critical) multiliteracies pedagogies is to help shape social futures by assisting learners in developing conscious awareness of and control over the relation between discourse and specific socio-historical, political, and cultural contexts, and thereby equipping them to successfully participate in and transform social practices (see, for example, Thomas, Chapter 13, and Unsworth, Chapter 12, this volume). As the brief overview presented above suggests, MDA and CDA have much in common: both focus on the use of language and other semiotic modes in social context, both have drawn inspiration from social semiotics and critical theories, both are characterized by interdisciplinarity and theoretical and methodological diversity, and both have been influential in contexts of education.

4.  POPULAR DISCOURSE

As multimodal discourse analysts, we believe that popular discourse provides a unifying ground for efforts to engage in critical multimodal discourse analysis. The contributions to this volume reflect our broad definition of popular discourse, which includes but is not limited to popular culture, and which is restricted neither to traditional concerns about the divide between high/elite and low/mainstream or commercial culture, nor to particular forms of mass entertainment or

communication technologies (cf. Danesi, 2012). For us, the theme of popular discourse encourages investigations of how our semiotic and sociocultural landscapes are reflected in and transformed by popular culture texts (e.g., hit TV shows, music videos, and women’s magazines) and phenomena (e.g., fashion, jazz), by texts designed to popularize particular institutional discourses (e.g., science, business, and politics), and by ubiquitous semiotic practices and technologies (e.g., the use of office software and social media). We also believe that critical multimodal studies of popular discourse have the potential to contribute to both MDA, as they stimulate the development of tools for empirically relating multimodal communication to social power, and CDA, as they respond to Van Leeuwen’s (2013) argument that “racist stereotypes persist in visual rather than verbal texts, and in comic strips, advertisements and other forms of popular culture rather than in more factual and ‘highbrow’ texts” (p. 2) and that “the discourses that need the scrutiny of a critical eye are now overwhelmingly multimodal and mediated by digital systems that take multimodality entirely for granted” (p. 5).

5.  OUTLINE OF SECTIONS AND CHAPTERS

The fourteen contributions to this volume are organized into three sections. The chapters in Part I are oriented primarily toward addressing theoretical or methodological challenges pertinent to critical multimodal discourse analysis. Tseng and Bateman (Chapter 2) offer a new angle on the much-debated theory of cinematic authorship. They present an analysis of cohesion and discourse relations between event segments in a multimodal corpus of the beginning sections of 18 films by Ingmar Bergman, and argue that their framework can be applied in film stylistics to identify the distinctiveness of authors’ works and to contrast patterns of authorially distinctive versus mainstream films.
Bednarek (Chapter 3) presents an analysis of the title sequence (opening credits, intro) of the musical television comedy Flight of the Conchords, which is first contextualized through a basic content analysis, in the spirit of media studies, of 50 title sequences of popular TV shows. The chapter shows how visual choices in the comedy’s opening credits reflect overall social functions of television title sequences (e.g., attracting audiences, providing continuity, or characterizing a series and its genre) and how the identity of the ‘hipster’ is turned into a commodified identity. Forceville (Chapter 4) explains how visual and multimodal metaphors in advertising work, and argues that conceptual metaphor theory provides a critical tool in the analysis and evaluation of advertising, as metaphors present one kind of thing (a ‘target’) in terms of another (a ‘source’) and are therefore ideal instruments for advertisers to make claims about products (the metaphors’ targets) efficiently and implicitly. Taking a digital humanities orientation, Podlasov and O’Halloran (Chapter 5) address the challenge of developing ‘cultural analytic’ methods robust enough to manage

‘big social data’ (Manovich, 2012), as they demonstrate how two data visualization techniques, self-organizing map (SOM) and growing neural gas (GNG), can be combined with multimodal social semiotics to reveal patterns and trends in a large collection of Japanese street fashion photographs. They also consider how automated quantitative data analysis can be productively employed in multimodal semiotics for understanding and interpreting sociocultural patterns and trends, and the fundamental questions this raises for critical discourse analysts. Each contribution in Part II concentrates on a particular issue in contemporary popular culture. Gorfinkel (Chapter 6) examines how Macau has been symbolically ‘returned’ to the People’s Republic of China (PRC) in popular music-entertainment performances on China Central Television (CCTV). Informed by cultural studies and semiotics, the analysis reveals the strategic use of visual, musical, and linguistic modes by party-state television to naturalize a particular national identity discourse. Maier and Cross’s positive critique of Michael Jackson’s music video Earth Song (Chapter 7) explores how environmental concerns are communicated through the complex orchestration of different semiotic modes, specifically through its role in construing time and space. Tan (Chapter 8) combines conversation analysis, discourse analysis, and social semiotics in a multidisciplinary study of the institutional identities that popular business news networks construct in mediated discourse on the Internet. Specifically, the study examines the social roles and relationships constructed by business news networks; the process types, categories, and discursive practices they use; and the issues these may raise for coparticipants in the studio and Internet audiences.
Zhao (Chapter 9) examines how a popular Australian independent women’s magazine titled frankie employs multimodal resources to distinguish itself from mainstream women’s glossies by commodifying anticonsumerist independent (‘indie’) culture. This analysis reveals that the key strategy frankie uses to mask its consumerist nature is allowing multiple ‘consumptions’— those of semiotic artefacts, of culture, and of consumer goods—in a single discursive space. Zhang and O’Halloran (Chapter 10) analyse the newly emerging forms of hypermodal nuclei in an institutional science news website. The nuclei consist of images from image banks and short introductions of research findings, and cater to the information sharing style of social media. By examining the social context that has shaped their emergence, the authors argue that these nuclei are a type of ‘scifotainment’ that has moved science news reports beyond the popularization of science research into the realm of the marketization of universities and researchers. Part III comprises contributions concerned with questions of new audienceship and authorship in popular discourse in general. Economou (Chapter 11) compares two verbal-visual displays that introduced exactly the same feature article on asylum seekers in different Australian broadsheet news sites, one print and one online. An SF-MDA analysis reveals that each display tells a different story and attitudinally positions readers quite differently toward the

issue, and so reflects differences in terms of editorial responsibility in light of pressures to popularize serious news. Unsworth (Chapter 12) explores the highly acclaimed literary picture book The Lost Thing by Australian author/illustrator Shaun Tan and its Oscar-winning animated film adaptation. Comparative analyses of selected segments, dealing with the same story elements in book and movie form, are described in terms of interactive meanings and point of view in images, and the relationship between the images and the language of the narration. These analyses indicate how different interpretive possibilities are constructed for ostensibly the same story elements in the two versions. Such analyses of literary picture books and their popular animated film adaptations provide a powerful basis for critical multimodal literacy pedagogy. Thomas (Chapter 13) focuses on intermodal complementarity as it applies to young children’s multimodal compositions. It shows that a child is able to exhibit deep understandings about how to create powerful moments of affect in narratives through the semiotic resources of image, verbiage, and sound. By discussing these moments as opportunities for resistance to traditional discourses and ideologies, the chapter demonstrates the potential of student-created multimodal narratives for social critical literacy. Djonov and Van Leeuwen (Chapter 14) argue that bullet lists epitomize both new writing practices, which are promoted through ubiquitous software such as Microsoft’s PowerPoint, and the marketization of public discourse, and that a critical multimodal perspective is key to learning how to employ new writing effectively and understanding its role in obscuring and maintaining social divisions. The argument is illustrated with an analysis of the recontextualization of the Australian Treasurer’s 2012–2013 budget speech into a brochure promoting the government’s economic achievements.
In the book’s final chapter (Chapter 15), Van Leeuwen ventures into uncharted territory for MDA and CDA, and introduces an approach to analysing listening as a multimodal semiotic activity. Studying the placement and articulation of ‘listening signs’ in the listening shots that intersperse film monologues and in the accompaniment of modern jazz solos, the chapter shows how listening signs articulate interpretations and evaluations of these monologues and solos and actively contribute to their development. He concludes that critical discourse analysis could use listening analysis to give voice to what otherwise might remain silent dissent.

6.  SUMMARY AND OUTLOOK

As the above overview of MDA and CDA and the outline of chapters suggest, this book captures the state of the art in the two fields that it aims to bring together. The contributions cover a range of visual, verbal, aural, and kinetic semiotic resources and their multimodal interaction; present analyses of diverse multimodal texts and phenomena (feature films, TV comedy series, metaphor in print and TV advertising, Japanese street fashion, Chinese music-entertainment TV, popular music video, popular online business and science

news, women’s magazines, literary picture books and their animated film adaptations, children’s multimodal narrative compositions, the new writing practices embodied in slideshow presentations and government brochures, and listening in film and jazz accompaniment); and offer insights into an array of popular discourse issues (authorship in film, metaphor in advertising, the hipster identity, fashion and culture, national and institutional identity, indie culture, the popularization of science, critical multiliteracies pedagogy, the marketization of popular discourse, and the ‘voice’ of listening). Collectively, the chapters (some more and others less directly) also address some of the key challenges that face both strands of discourse analysis: empirically relating local textual patterns to broader sociocultural questions, exploring how cultural and sociohistorical factors and new technologies transform and are shaped through multimodal communication, extending the focus of MDA and CDA beyond Western culture, and capturing more of the complexity of culture than can be done with the critical analysis of multimodal texts alone. In order to address these challenges, the contributions to this book both reflect and break away from the considerable dominance of social semiotics and SFL in multimodal and critical discourse studies as they embrace various theoretical, methodological, and disciplinary perspectives (multimodal corpus analysis; conceptual metaphor theory; media, cultural and film studies; digital humanities and cultural analytics; musicology and popular music studies; conversation and mediated discourse analysis; typography and graphic and software design; the work of critical social theorists such as Bakhtin, Barthes, Bourdieu, Freire, and others).
We hope that the book’s diversity will attract more researchers to take the path from multimodal to critical multimodal studies and further advance our growing understanding of the role that different semiotic resources and their multimodal interaction play in supporting existing power relations and promoting social change.

REFERENCES Baldry, A., & Thibault, P. (2006). Multimodal transcription and text analysis. London: Equinox. Bednarek, M., & Martin, J. R. (Eds.). (2010). New discourse on language: Functional perspectives on multimodality, identity, and affiliation. London: Continuum. Bernstein, B. (1990). The structuring of pedagogic discourse: Class, code and control (Vol. 4). London: Routledge. Callow, J. (2006). Images, politics and multiliteracies: Using a visual metalanguage. Australian Journal of Language and Literacy, 29(1), 7–23. Chouliaraki, L. (2006). The spectatorship of suffering. London: Sage. Chouliaraki, L., & Fairclough, N. (1999). Discourse in late modernity: Rethinking critical discourse analysis. Edinburgh: Edinburgh University Press. Cope, B., & Kalantzis, M. (Eds.). (2000). Multiliteracies: Literacy learning and the design of social futures. South Yarra, Victoria: Macmillan Publishers Australia.

Danesi, M. (2012). Popular culture: Introductory perspectives. Lanham, MD: Rowman & Littlefield. Fairclough, N. (2001 [1989]). Language and power (2nd ed.). London: Longman. Fairclough, N. (2010 [1995]). Critical discourse analysis: The critical study of language (2nd ed.). Harlow: Longman. Forceville, C. (1996). Pictorial metaphor in advertising. London: Routledge. Forceville, C., & Urios-Aparisi, E. (Eds.). (2009). Multimodal metaphor. Berlin: Mouton de Gruyter. Fowler, R., Hodge, R., Kress, G., & Trew, T. (Eds.). (1979). Language and control. London: Routledge & Kegan Paul. Gibbons, A. (2011). Multimodality, cognition, and experimental literature. London: Routledge. Halliday, M. A. K. (1978). Language as social semiotic. London: Arnold. Halliday, M. A. K. (1985). Part A. In M. A. K. Halliday & R. Hasan (Eds.), Language, context, and text: Aspects of language in a social-semiotic perspective (pp. 1–49). Geelong, Victoria: Deakin University Press. Halliday, M. A. K. (1994 [1985]). An introduction to functional grammar (2nd ed.). London: Arnold. Hodge, R., & Kress, G. (1988). Social semiotics. Cambridge, UK: Polity Press. Holsanova, J. (2008). Discourse, vision, and cognition. Amsterdam: Benjamins. Jewitt, C. (2006). Technology, literacy and learning: A multimodal approach. Oxon: Routledge. Jewitt, C. (Ed.). (2009). The Routledge handbook of multimodal analysis. Oxon: Routledge. Jewitt, C., & Kress, G. (Eds.). (2003). Multimodal literacy. New York, NY: Peter Lang Publishing. Kaltenbacher, M. (2004). Perspectives on multimodality: From the early beginnings to the state of the art. Information Design Journal + Document Design, 12(3), 190–207. Kress, G. (2010). Multimodality: A social semiotic approach to contemporary communication. Oxon: Routledge. Kress, G., & Hodge, R. (1979). Language as ideology. London: Routledge & Kegan Paul. Kress, G., Jewitt, C., Bourne, J., Franks, A., Hardcastle, J., Jones, K., & Reid, E. (2005).
English in urban classrooms: A multimodal perspective on teaching and learning. London: RoutledgeFalmer. Kress, G., Jewitt, C., Ogborn, J., & Tsatsarelis, C. (2001). Multimodal teaching and learning: The rhetorics of the science classroom. London: Continuum. Kress, G., & Van Leeuwen, T. (1990). Reading images. Geelong, Victoria: Deakin University Press. Kress, G., & Van Leeuwen, T. (2001). Multimodal discourse: The modes and media of contemporary communication. London: Arnold. Kress, G., & Van Leeuwen, T. (2006 [1996]). Reading images: The grammar of visual design (2nd ed.). London: Routledge. Lakoff, G., & Johnson, M. (1980). Metaphors we live by. Chicago, IL: University of Chicago Press. Lemke, J. L. (2002). Travels in hypermodality. Visual Communication, 1(3), 299–325. Lemke, J. L. (2006). Towards critical multimedia literacy: Technology, research, and politics. In M. C. McKenna, L. D. Labbo, & D. Reinking (Eds.), International handbook of literacy and technology (Vol. 2, pp. 3–14). Mahwah, NJ: Lawrence Erlbaum Associates. Machin, D. (2004). Building the world’s visual language: The increasing global importance of image banks in corporate media. Visual Communication, 3(3), 316–336.

Machin, D., & Mayr, A. (2012). How to do critical discourse analysis: A multimodal introduction. London: Sage. Machin, D., & Van Leeuwen, T. (2007). Global media discourse. London: Routledge. Macken-Horarik, M. (2003). Working the borders in racist discourse: The challenge of the “Children Overboard Affair” in news media texts. Social Semiotics, 13(3), 284–303. Manovich, L. (2012). Trending: The promises and the challenges of big social data. In M. K. Gold (Ed.), Debates in the digital humanities (pp. 460–475). Minneapolis: The University of Minnesota Press. Martin, J. R. (2004). Positive discourse analysis: solidarity and change. Revista Canaria de Estudios Ingleses, 49 (Special Issue on Discourse Analysis at Work: Recent Perspectives in the Study of Language and Social Practice), 179–200. Martin, J. R., & Rose, D. (2007 [2003]). Working with discourse: Meaning beyond the clause (2nd ed.). London: Continuum. Martin, J. R., & Rose, D. (2008). Genre relations: Mapping culture. London: Equinox. Martin, J. R., & White, P. (2005). The language of evaluation: Appraisal in English. London: Palgrave Macmillan. Martinec, R. (2005). Topics in multimodality. In R. Hasan, C. M. I. M. Matthiessen, & J. Webster (Eds.), Continuing discourse on language: A functional perspective (pp. 157–181). London: Equinox. McNeill, D. (2005). Gesture and thought. Chicago, IL: University of Chicago Press. New London Group. (1996). A pedagogy of multiliteracies: Designing social futures. Harvard Educational Review, 66, 60–92. Norris, S. (2004). Analyzing multimodal interaction: A methodological framework. London: Routledge. Norris, S. (2011). Identity in (inter)action: Introducing multimodal (inter)action analysis. Berlin: Mouton De Gruyter. O’Halloran, K. L. (2004a). Discourses in secondary school mathematics classrooms according to social class and gender. In J. Foley (Ed.), Language, education and discourse: Functional approaches (pp.
191–225). London: Continuum. O’Halloran, K. L. (Ed.). (2004b). Multimodal discourse analysis. London: Continuum. O’Halloran, K. L. (2005). Mathematical discourse: Language, symbolism and visual images. London: Continuum. O’Halloran, K. L. (2008). Systemic functional-multimodal discourse analysis (SFMDA): Constructing ideational meaning using language and visual imagery. Visual Communication, 7(4), 443–475. O’Halloran, K. L., & Smith, B. (Eds.). (2011). Multimodal studies: Exploring issues and domains. London: Routledge. O’Toole, M. (2011 [1994]). The language of displayed art (2nd ed.). London: Routledge. Royce, T., & Bowcher, W. (Eds.). (2006). New directions in the analysis of multimodal discourse. Mahwah, NJ: Lawrence Erlbaum Associates. Scollon, R. (2001). Mediated discourse: The nexus of practice. London: Routledge. Sperber, D., & Wilson, D. (1995 [1986]). Relevance: Communication and cognition (2nd ed.). Oxford: Blackwell. Unsworth, L. (2001). Teaching multiliteracies across the curriculum: Changing contexts of text and image in classroom practice. Buckingham: Open University Press. Unsworth, L. (Ed.). (2008a). Multimodal semiotics: Functional analysis in contexts of education. London: Continuum. Unsworth, L. (Ed.). (2008b). New literacies and the English curriculum. London: Continuum. Van Dijk, T. A. (1993). Principles of critical discourse analysis. Discourse and Society, 4(2), 249–283.

Van Dijk, T. A. (2001). Critical discourse analysis. In D. Schiffrin, D. Tannen, & H. E. Hamilton (Eds.), The handbook of discourse analysis (pp. 352–371). Oxford: Blackwell. Van Dijk, T. A. (2008a). Discourse and context: A socio-cognitive approach. Cambridge: Cambridge University Press. Van Dijk, T. A. (2008b). Discourse and power. Houndmills: Palgrave Macmillan. Van Leeuwen, T. (1999). Speech, music, sound. London: Macmillan Press Ltd. Van Leeuwen, T. (2000). Visual racism. In M. Reisigl & R. Wodak (Eds.), The semiotics of racism: Approaches in critical discourse analysis (pp. 333–350). Vienna: Passagen Verlag. Van Leeuwen, T. (2005). Introducing social semiotics. London: Routledge. Van Leeuwen, T. (2006). Critical discourse analysis. In K. Brown (Ed.), Elsevier encyclopaedia of language and linguistics (2nd ed., Vol. 13, pp. 290–294). Oxford: Elsevier. Van Leeuwen, T. (2008). Discourse and practice: New tools for critical analysis. London: Oxford University Press. Van Leeuwen, T. (2009). The world according to Playmobil. Semiotica, 2009(173), 299–315. doi:10.1515/SEMI.2009.013 Van Leeuwen, T. (2013). Critical analysis of multimodal discourse. In C. Chapelle (Ed.), Encyclopedia of applied linguistics (pp. 1–5). Oxford: Wiley-Blackwell. Wodak, R., & Chilton, P. (Eds.). (2007). New agenda in (critical) discourse analysis. Amsterdam: John Benjamins. Wodak, R., De Cillia, R., Reisigl, M., & Liebhart, K. (2009 [1999]). The discursive construction of national identity (2nd ed.). Edinburgh: Edinburgh University Press. Wodak, R., & Meyer, M. (2009a [2001]). Critical discourse analysis: History, agenda, theory and methodology. In R. Wodak & M. Meyer (Eds.), Methods of critical discourse analysis (2nd ed., pp. 1–33). London: Sage. Wodak, R., & Meyer, M. (Eds.). (2009b [2001]). Methods of critical discourse analysis (2nd ed.). London: Sage. Wodak, R., & Weiss, G. (Eds.). (2003). Critical discourse analysis: Theory and interdisciplinarity.
Basingstoke: Palgrave Macmillan.

Part I

Methodological and Theoretical Challenges



Revisiting Cinematic Authorship: A Multimodal Approach

Chiaoi Tseng and John A. Bateman


1.1  Issues Raised Surrounding Authorship

In film studies and culture at large, there is an area of discussion typically labelled ‘authorship’. In recent decades, the increasingly diverse and complex collaborations in artwork, and particularly in commercial cinema, have raised fundamental issues about the theoretical validity of any straightforward notion of ‘the author’ of a text. Within film theory, discussions of authorship became particularly prominent with the advent of auteur theory, which claimed that particular directors were to be likened to literary authors by virtue of their distinctive styles and vision (cf. discussion and introduction in Livingston, 1997). Whereas the notion of literary authors has certainly been subject to critique, the situation for film is in many ways even more complex. Debates surrounding this concept have been engendered by considerations such as the following.

First, studying authorship seems an overly constrained way of approaching mass culture; the artistic freedom of an author, particularly in cinema, cannot avoid being influenced by cultural, industrial, and institutional factors. That is, individual creativity in the commercial artistic production must be subject to several external influences, such as those imposed by genre and marketing, and thus complete author autonomy cannot exist. Such ‘death of the author’ concerns (cf. Barthes, 1977) and the paradox of what in film is typically termed auteur theory are succinctly expressed by Edward Buscombe: “The conscious will and talent (of the artist) are also in turn the product of those forces that act upon the artist, and it is here that traditional auteur theory most seriously breaks down” (1981, p. 32).

Second, there is a general antipathy toward the ‘single author’ view, which holds “[w]ithin most film industries, [where] the director is considered the single person most responsible for the look and sound of the finished film” (Bordwell & Thompson, 1993, p. 13). This automatic authorial role attributed to the director has come under close scrutiny and may even now be treated as ideologically suspect or politically incorrect. For film there is

certainly a wealth of other candidates for the author role and mentioning the director to the exclusion of other potential contributors might be considered unfair. For example, in some cases, screenwriters are considered the primary creative source (Koch, 2000); this possibility is discussed by LaRocca (2011) with respect to the screenwriter Charlie Kaufman and his particularly distinctive film discourse patterns, themes, and ideology. In other cases, it is the movie stars and their acting styles that are seen as being especially important (Arnheim, 1997, p. 68). In the complex process of cinematic production, therefore, ‘coauthorship’ seems particularly compelling, and many theorists consequently propose retheorizing authorship by accepting that artworks may have multiple authors (Gaut, 1997; Livingston, 1997; Sellors, 2007). However, just who might reasonably be considered to be authors during various stages of the production of a complex work such as a film is still a matter of considerable debate (cf. Bacharach & Tollefsen, 2010; Livingston, 2011; Meskin, 2009).

Drawing on the problems raised for the notion of authorship, our concern in this chapter is to explore a further possibility for reconciling traditional views and recent scepticism by means of a comparative analysis of some selected facets of Ingmar Bergman’s films. Moreover, although we focus our analysis here only on cinema, we hope nevertheless that similar approaches might also shed light on ‘auteurism’ in other cultural and artistic realms.

1.2  Author versus Author’s Filmic Text

Despite the ongoing debate concerning its theoretical status, authorship appears to have more than proved its usefulness as a means of describing creative film styles and innovations. It is still open to question, however, whether auteur analysis should be confined to asserting personal expression as the principal criterion of value. Peter Wollen, particularly in his groundbreaking book Signs and Meaning in the Cinema (1969), and as summarized in Wollen (1981), clearly elucidates his concern with the following perspective:

I think it is important to detach the auteur theory from any suspicion that it simply represents a ‘cult of personality’ or apotheosis of the director. . . . What the auteur theory argues is that any film, certainly a Hollywood film, is a network of different statements, crossing and contradicting each other, elaborated into a final ‘coherent’ version. . . . [B]y a process of comparison with other films, it is possible to decipher, not a coherent message or world-view, but a structure which underlies the film and shapes it, gives it a certain pattern of energy cathexis. It is this structure which auteur analysis disengages from the film. The structure is associated with a single director, not because [he or she] has played the role of artist, . . . but because it is through the force of [his or her] preoccupations that an unconscious, unintended meaning can be decoded in the film. . . . It is wrong . . . to deny any status to individuals

at all. But Fuller or Hawks or Hitchcock, the directors are quite separate from ‘Fuller’ or ‘Hawks’ or ‘Hitchcock’, the structures named after them, and should not be methodologically confused. (pp. 146–147)

This citation sets out two important foundations on which our approach builds. First, Wollen clearly distinguishes flesh and blood authors from particular structural configurations embedded in filmic texts that might encourage reading in terms of an originating ‘author’; texts may contain ‘author-indicating cues’ just as they may contain a range of other cues without entailing real situations of individuals. These textual constructions must therefore be distinguished from text external entities and states of affairs. This distinction opens up a space for potential meaning-making that is essential for film interpretation. As in many branches of media studies, analyses of films often consider the ideologies that those films either conform to or work against (cf. Ryan & Kellner, 1988). Indeed, in earlier approaches, the very fact of film being a projection over which an audience has very little control was considered to be an ideological commitment of its own (cf. Baudry, 1974). The distinctions between textual configurations and actual ‘authors’ thereby avoid any inappropriate linking of film as ‘work’ to, on the one hand, personality and, on the other, to ideology. Wollen’s second point is to emphasize the importance of the process of comparison by which particular patterns in authors’ works can be set against broader social contexts. Consequently, analysis at the textual level (our first foundation) grasps filmic organization, while the process of comparison with other filmic constructions (our second foundation) locates such organizations with respect to broader social currents and trends.
These two analytical principles also directly echo the dynamic, comparative approach to auteur theory proposed by Buscombe:

What is needed now is a theory of the cinema that locates directors in a total situation, rather than one which assumes that their development has only an internal dynamic. . . . Three approaches seem possible. . . . First, there is the examination of the effects of the cinema on society. . . . Second is the effect of society on the cinema; in other words, the operation of ideology, economics, technology, etc. Lastly, and this is in a sense only a sub-section of the preceding category, the effects of film on other films; this would especially involve questions of genre, which only means that some films have a very close relation to other films. (1981, p. 32)

Along similar lines to Wollen and Buscombe, therefore, this paper aims to demonstrate how methods of filmic textual analysis can be used effectively to reflect an author’s creative styles and patterns while nevertheless locating these in social, economic, and institutional contexts. The framework of textual analysis we apply draws on our previous work transferring linguistic discourse methods to film analysis (Bateman & Schmidt, 2012; Tseng, 2013; Wildfeuer, 2013). There we proposed that each

film exhibits dimensions of discourse organization that interact with viewers’ expectations during the process of film narrative construction. These dimensions include features such as a film’s temporal structure of shots and scenes, story events, the organization of events into larger plot structures, orchestration of emotion, and the cohesion of character tracking. We are developing frameworks for analyzing discourse patterns such as these, including methods for analyzing relations between shots and scenes (Bateman, 2007; Bateman & Schmidt, 2012), for formally constructing filmic discourse structure (Wildfeuer, 2013), for constructing characters’ actions and events, and for tracking characters (Tseng, 2013). These linguistically motivated analyses also offer significant insights concerning how viewers’ genre expectations are impacted by the precise textual organization of film elements (Tseng, 2013).

For present purposes, we will use analyses of these discourse dimensions to show that an author’s creative filmic patterns and commercial generic traits can be interrelated as poles along a dynamic continuum rather than as two contradictory extremes of ‘high art’ and ‘popular culture’. This dynamic comparative relation is suggested visually in Figure 2.1(a), where a multilayered analysis of different discourse dimensions shows that an auteur film may enact creative discourse patterns, such as distinctive selections of temporal structure and story events, realizing a personal style, while at the same time in other areas, such as emotion and cohesion, completely follow widely established genre conventions. It is in fact rather common in films generally classified as auteur films that after establishing personal creative styles in one discourse dimension, filmmakers appeal to genre conventions in other discourse dimensions to make their films accessible.
We have discussed this in detail in a previous analysis of Christopher Nolan’s Memento (Tseng & Bateman, 2012), where we show that, despite its creative, nonlinear temporal structure, the film is nevertheless made accessible (and also popular) by virtue of highly cohesive structures of characters and settings. In addition, Bordwell (2012) discusses many neo-noir character traits that are clearly already familiar to viewers.
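The continuum idea can be made concrete with a small sketch. The following Python fragment is an illustration only and not part of the analytical framework: the numeric scale (0.0 for fully genre-conventional, 1.0 for highly creative), the threshold, the class name FilmProfile, and the example values are all assumptions introduced here. Only the four dimension names are taken from the discussion above.

```python
from dataclasses import dataclass
from typing import Dict, List

# Discourse dimensions named in the text; everything else below is hypothetical.
DIMENSIONS = ("temporal structure", "story events", "emotion", "cohesion")


@dataclass
class FilmProfile:
    """One position per discourse dimension on an assumed 0.0-1.0 scale
    (0.0 = fully genre-conventional, 1.0 = highly creative)."""
    label: str
    positions: Dict[str, float]

    def creative_dimensions(self, threshold: float = 0.5) -> List[str]:
        # Dimensions in which the film moves away from genre norms.
        return [d for d in DIMENSIONS if self.positions.get(d, 0.0) > threshold]

    def follows_genre_throughout(self, threshold: float = 0.5) -> bool:
        # True when no dimension departs from established conventions.
        return not self.creative_dimensions(threshold)


# Illustrative values only; they are not measurements from any film.
case_a = FilmProfile(
    "creative in some dimensions, conventional in others",
    {"temporal structure": 0.9, "story events": 0.8, "emotion": 0.1, "cohesion": 0.1},
)
case_b = FilmProfile("conventional in every dimension", {d: 0.1 for d in DIMENSIONS})

print(case_a.creative_dimensions())
print(case_b.follows_genre_throughout())
```

A profile of the first kind corresponds to the mixed case described for auteur films; a profile of the second kind stays within genre conventions in every dimension.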

1.3  Who Is/Are Author(s)?

Although its subdivisions draw on different theories, a general anti-individual movement now holds that films are typically the product of multiple authors. For instance, Gaut (1997) deconstructs three major strategies used by the traditional auteur theory to justify single authorship: (1) the restriction strategy restricts the single author of a film to the person who is taken as having most contributed to the film’s artistic properties; (2) the sufficient control strategy identifies a single author by looking at who has sufficient control over the artwork as a whole; and (3) the construction strategy constructs a single author, who is not the same as the actual person who created the work but “the author’s persona appearing in her work” (Gaut, 1997, pp. 158–159). After examining their analytical validity, Gaut insists that the restriction, sufficient control, and construction strategies actually fail to identify


Figure 2.1  Analysing authorship based on a dynamic continuum: (a) an auteur’s film demonstrating some discourse patterns showing a creative style while other patterns conform to established genre patterns; (b) a film lacking specific authorial traits

a single author because many individuals can contribute to the making of a film, many individuals exhibit control over its artistic properties, and many personae are likely to exhibit themselves in a film. Therefore, a film needs to be examined as made by multiple authors.

Livingston (1997) also takes a critical stand on the nature of ‘author’ and suggests that the status of author should be allocated in a more literary sense, that is, when there has been a significant creative aesthetic contribution. This means that he takes some films to be subject to authority without exhibiting authorship and that certain films can even be ‘authorless’. By distinguishing between genuine coauthors and mere contributors, he argues for a theory of coauthorship based on the theories of collective intentionality developed by Michael E. Bratman. Along similar lines, Sellors (2007) proposes a theory drawing on John Searle’s theory of we-intentions; that is,

among those multiple contributors in the making of a film, only those who share intentions to produce an utterance acquire authorial status. Bacharach and Tollefsen (2010), in contrast, argue against collective intention theories and advocate an alternative approach to coauthorship drawing on concepts of collective action and responsibility developed by Margaret Gilbert.

This paper is in no position to start another philosophical debate about ‘what is an author’ or to show who exactly in each film exercises sufficient control in the production of the work to warrant attribution of the author role. Recent retheorizations of the various roles of authors and the textual realization of those roles in filmic texts offer instead a good starting point for attempting a clarification of authorship drawing directly on multimodal discourse theory. Particularly following Livingston (e.g., 1997, 2009, 2011), it appears a useful strategy to reserve the term ‘author’ for those who exert a positive stylistic influence on the artwork/film. In such cases, at least some of the ‘sliders’ shown in Figure 2.1 will be moved to the left, away from established genre norms. Although we must be very careful here not to fall back on outdated notions of style as ‘deviation’ (cf. the critique of this position given in Halliday, 1971), it seems reasonable to insist that some aesthetic contribution must be present; we will return to this question at the end of the chapter. In films falling entirely within genre conventions, there are still naturally (i.e., ontologically) filmmakers, but they leave little authorial trace: the texts produced resemble the ‘faceless bureaucratic’ text, anonymous with respect to the options taken up.
Films of this kind correspond to Figure 2.1(b), in which a filmic textual construction is created with no specific trait of a particular author but remains within commonly used conventions of genre, style, and subject matter, and which easily meets viewers’ expectations without setting any demanding tasks of narrative construction. Thus, in contrast to the straightforward, common sense idea of the author simply as the originator of a discourse, films (particularly mainstream films) may then sometimes be beneficially seen to be authorless, coming into existence as a product of multiple external collaborative factors: genre conventions, screenwriters’ control, producers’ authority, and so on.

To summarize the discussion so far, typical for any account of film authorship are questions such as: “Does a film have an author?” or “Is the author its director or screenwriter?” Drawing on the issues outlined previously for auteur theory, we can now consider how an authorship analysis can effectively address such questions by appealing more directly to the multimodal text by means of a fine-grained textual analysis of the films in question. We will do this by adopting a filmic textual approach based on a comparative corpus-based study.

1.4  Data and Hypothesis

The data used for the study include the beginnings of the 18 films by Ingmar Bergman listed in Table 2.1. Ingmar Bergman was selected because his auteur status is unlikely to be contested. Throughout his career, Bergman made

Table 2.1  Eighteen films by Ingmar Bergman used as data

1953  Summer with Monika
1957  Wild Strawberries
1957  The Seventh Seal
1960  The Virgin Spring
1961  Through a Glass Darkly
1963  The Silence
1963  Winterlight
1966  Persona
1968  Hour of the Wolf
1968  Shame
1969  The Rite
1969  The Passion of Anna
1972  Cries and Whispers
1973  Scenes from a Marriage
1976  Face to Face
1978  Autumn Sonata
1980  From the Life of the Marionettes
1982  Fanny and Alexander

highly individual films with many reoccurring themes and preoccupations. The distinctiveness of these films is generally considered a prime example of an auteur in action (cf. Livingston, 2009). Moreover, we select the beginnings of the films because beginnings in all films function specifically to establish a hypothesis, to provide first impressions that later developments of the narrative will be measured against (cf. Hartmann, 2009). In psychological terms, the function of the initial portions of a film has been described according to the primacy effect and priming (cf. Luchins & Luchins, 1962) or anchoring bias (cf. Tversky & Kahneman, 1974). A distinctive structuring function has also been theorized in studies of text linguistics, for example by Martin (1992), who develops the notion of ‘macro-themes’ to describe a communicative function that serves the role of signposting, or predicting, the organization of the text following.

Applying our discourse methods to the beginnings of Bergman’s films, we will then specifically investigate the hypotheses that:

1. Regardless of genre and authors, the functions of film beginnings are familiar to viewers from their knowledge of traditional conventions and will consequently be used in a stereotypical fashion.
2. Nevertheless, an organic unity can still be found in Bergman films, and it works to construct particularly Bergmanesque themes as proposed by several film theorists (cf. Kawin, 1978; Livingston, 2009).

2.  FILMIC DISCOURSE ANALYSIS

The discourse patterns that we analyze in this paper are based on two of the discourse properties briefly introduced in Section 1.2 above: filmic cohesion (Tseng, 2008) and filmic discourse relations (Bateman, 2007). Both have been described at length elsewhere and so here we will illustrate them simply


Figure 2.2  The opening sequence of eight shots from Bergman’s Wild Strawberries (1957)

with an example analysis of the first eight shots of the opening sequence from Bergman’s Wild Strawberries (1957), key frames of which are shown in Figure 2.2. The first and second shots (S1 and S2) show an old man, Professor Isak Borg, sitting in his study. His onscreen monologue describes his lonely life. As he continues to talk about his son, a tracking shot (S3a and S3b) follows the characters in the photos mentioned by Borg. He further introduces his mother (S4) and, after a close shot (S5) of Borg, his wife (S6). In shot 7 (S7), the housekeeper enters the room and speaks to Borg on screen. After replying to the housekeeper, he restarts his onscreen monologue (S8) describing his luck in having a good housekeeper.

We exemplify first the method for analyzing filmic discourse relations, which is an extension of the notion of conjunctive relations proposed for verbal language by Martin (1992) and applied to film by Van Leeuwen (1991). Filmic discourse relations characterize relations between film segments in terms of temporality, spatiality, epistemic status and mental state (seen, heard,

imagined, etc.), and audiovisual structural dependence (dependent/hypotactic or independent/paratactic). According to Bateman (2007) and Bateman and Schmidt (2012), relations of this kind operate at a level of discourse patterning, rather than, as often assumed in traditional accounts of filmic ‘montage’, between shots. This then builds on the central systemic-functional notion of stratification and applies this to multimodal discourse and texts. Within models of this kind, description is spread across several levels of abstraction and explanations for patterns at any one level are sought in correlations with patterns at other levels (cf. Martin, 1992). Discourse patterning is known to exhibit rather different properties than, for example, the stratum of grammar, and so is usefully distinguished.

Connections between units at the discourse level are constructed dynamically and defeasibly rather than compositionally, which is crucial for dealing with dynamic media such as film. Whereas a variety of units anchored in filmic form have been adopted in film analysis, such as the shot or frame, moving to the level of discourse allows us to employ a more abstract view that is more suited for higher-level interpretations such as narrative. The analytical unit adopted here is consequently that of the ‘event segment’, a segment corresponding to, or construing, a single ‘unit’ of behaviour or activity. Such segments may then be realized both shot-internally and across shot boundaries as illustrated in detail in Bateman and Schmidt (2012, pp. 154–161). Stratification consequently allows us to more readily bridge abstract interpretations in terms of narrative and social configurations and fine-grained technical features of films.

The spatiotemporal relations holding within the opening eight shots of Wild Strawberries are then as shown in Figure 2.3. The relations between

Figure 2.3  The filmic discourse relations holding within the first eight shots of Bergman’s Wild Strawberries (1957)

units indicated by arrows are all taken from the classification network set out in detail in Bateman (2007) and Bateman and Schmidt (2012) and cover spatial, temporal, and various structural dependency relationships. In the present case, continuation of movement and other filmic technical features, such as sound, lead to the assumption of continuous temporality between many of the units; similarly, technical features allow judgements of spatial relationships such as moving nearer (‘narrowing’), and so forth. Constraints specified in the classification network also allow structural groupings to be deduced; these are indicated in the graphic by square brackets. Thus, in the present case, we can observe that the film includes a variety of structural devices, such as inserts (e.g., S3a–S3b). Importantly, this begins to show how film is much more than a simple linear unfolding of successive shots in precisely the same way that verbal language exhibits richer structures than a simple one-dimensional stream of utterances.

The second method from the stratum of discourse that we employ is filmic cohesion. Rather than analyzing relations between event segments, filmic cohesion examines how characters, objects, and settings in coherent film narratives are presented and tracked throughout a film. These tracks form cohesive chains, which bind together information concerning the salient characters, objects, and settings realized across the semiotic modes in a film. Cohesive ties between each appearance of the characters, objects, and settings provide important cues that guide the viewer along intended paths of interpretation. We exemplify this for the same opening sequence in Figure 2.4. This shows how the cohesive ties between filmic elements are established across the eight shots. The ties as such can be constructed from any audio, visual, or verbal modes in the film and hence each chain, as we can see in Figure 2.4, is cross-modal.
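The bookkeeping behind such chains can be sketched in a few lines of Python. This is a minimal illustration, not the annotation scheme of the cited work: the function names and the (element, shot, mode) record format are assumptions made here, and the entries encode only the realizations of the Professor Borg chain reported for this sequence (visual and verbal in S1, S2, and S8; visual only in S5; verbal only in S3a, S4, S6, and S7).

```python
from collections import defaultdict

# Realizations of the Borg chain by shot, as reported for the opening sequence.
BORG_REALIZATIONS = {
    "S1": ("visual", "verbal"),
    "S2": ("visual", "verbal"),
    "S3a": ("verbal",),
    "S4": ("verbal",),
    "S5": ("visual",),
    "S6": ("verbal",),
    "S7": ("verbal",),
    "S8": ("visual", "verbal"),
}

# Hypothetical flat record format: one (element, shot, mode) triple per appearance.
records = [
    ("Borg", shot, mode)
    for shot, modes in BORG_REALIZATIONS.items()
    for mode in modes
]


def build_chains(appearances):
    """Group appearances into chains: element -> shot -> set of modes."""
    chains = defaultdict(lambda: defaultdict(set))
    for element, shot, mode in appearances:
        chains[element][shot].add(mode)
    return chains


def cross_modal_shots(chain):
    """Shots in which the element is realized in more than one mode."""
    return [shot for shot, modes in chain.items() if len(modes) > 1]


def mono_modal_shots(chain):
    """Shots in which the element is realized in exactly one mode."""
    return [shot for shot, modes in chain.items() if len(modes) == 1]


chains = build_chains(records)
print(cross_modal_shots(chains["Borg"]))  # ['S1', 'S2', 'S8']
print(mono_modal_shots(chains["Borg"]))   # ['S3a', 'S4', 'S5', 'S6', 'S7']
```

Keeping the mode of each appearance, rather than only the shot, is what allows a chain to be inspected for cross-modal realization.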
Reoccurrences of visual and verbal elements are thus tied together, often employing continuity techniques long established in filmmaking practice.

In this example, the two main chains to which almost every shot contributes are the general setting of Borg’s study, which is clearly presented in an establishing shot (S1), and the main character, Professor Borg himself, who also reappears in the sequence cross-modally in both visual and verbal modes (S1, S2, S8) in addition to mono-modal realizations (visual mode in S5, verbal mode in S3a, S4, S6, S7). After Borg mentions ‘other people’ in the first shot, the verbal element of ‘other people’ is divided into the people (son, son’s wife, mother, Borg’s wife, housekeeper) introduced both visually and verbally in the subsequent shots. The pattern of these cross-modal chains shows that the opening sequence has very high cross-modal cohesive harmony (Hasan, 1984; Tseng, 2008); apart from Professor Borg, each of his ‘other people’ is vividly identified in both the visual and verbal mode. As we will see in the next section, except for his experimental puzzle film Persona (1966), many of the opening sequences of Bergman’s films demonstrate a cohesive chain pattern of this kind: very dominant, cross-modal main character chains and one setting


Figure 2.4  The filmic cohesive chains established across the first eight shots of Bergman’s Wild Strawberries (1957); [v] indicates visual elements, other elements are verbal

chain without further transitions between settings. Moreover, as we shall explain in more detail below, these chains in fact foreshadow the principal motifs of Bergman’s films, probing into life’s fundamental topics surrounding ‘self and other people’ and ‘external and internal self’ (cf. Livingston, 2009, pp. 194–195).

3.  EMPIRICAL ANALYSIS OF THE CORPUS

3.1  Analysis of Discourse Relations

We now apply our selected analytical methods to the 18 Bergman films listed previously in order to explore (a) whether there are similar patterns of discourse relations and cohesive chains across these films’ opening sequences, and (b) whether these patterns are specific to the ‘author’.

Within the discourse relation dimension, relations between events were classified according to whether they were temporal, spatial, projective, comparison, or unidentifiable. The results show that 17 of the 18 films exhibit a chronological structure in their opening sections and so the filmic discourse relations are prominently linear and continuous, just as was the case in the analysis of the opening of Wild Strawberries shown in Figure 2.3 above. The one exception is the beginning of Persona (1966), a well-known example of a so-called ‘puzzle film’ (cf. Buckland, 2009). As illustrated in Figure 2.5, here we find an extreme example of unclear filmic discourse relations. The film begins with camera equipment and projectors lighting up and projecting dozens of brief cinematic images between which the discourse relations are unidentifiable. In addition, the cohesive devices also do not make characters, objects, and settings between the images trackable. Each glimpse seems to depict certain visual symbols, but the connections between them are not established by filmic elements that viewers can follow. Similar experimental construction patterns are observable in other puzzle films such as Christoffer Boe’s Reconstruction (2003) or Kieślowski’s Blind Chance (1981).

In summary, except for Persona, constructing discourse relations in the opening segments of Bergman’s films is not a demanding task for the viewer. This can be seen particularly clearly by comparison with the opening segments of films by more recent directors, such as Christopher Nolan’s Following (1998), Memento (2000), and The Prestige (2006); Darren Aronofsky’s Requiem for a Dream (2000) and The Fountain (2006); or Wong Kar-wai’s In the Mood for Love (2000) and 2046 (2004): against these, the continuous temporal and spatial structures of Bergman’s films are quite straightforward.
For further discussion of how these more recent films manipulate their viewers in the opening segments, see Tseng and Bateman (2010, 2012), Tseng (2012), and Wildfeuer (2013).
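The corpus-level result for this dimension amounts to a simple tally, sketched below in Python. The film list follows the 18 entries in the Bergman film references; the labels "chronological" and "unidentifiable" are shorthand introduced here for the two outcomes reported above, and the dictionary layout is an assumption made for illustration.

```python
from collections import Counter

# The 18 films of the corpus, in order of release.
FILMS = [
    "Summer with Monika", "Wild Strawberries", "The Seventh Seal",
    "The Virgin Spring", "Through a Glass Darkly", "The Silence",
    "Winterlight", "Persona", "Hour of the Wolf", "Shame", "The Rite",
    "The Passion of Anna", "Cries and Whispers", "Scenes from a Marriage",
    "Face to Face", "Autumn Sonata", "From the Life of the Marionettes",
    "Fanny and Alexander",
]

# Shorthand labels (an assumption): every opening is chronological
# except Persona, whose discourse relations are reported as unidentifiable.
opening_relations = {film: "chronological" for film in FILMS}
opening_relations["Persona"] = "unidentifiable"

tally = Counter(opening_relations.values())
print(f"{tally['chronological']} of {len(FILMS)} openings are chronological")
```

A finer-grained version would store one label per relation between event segments rather than one per film, but the per-film summary is all that the reported result requires.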

Figure 2.5  Unidentifiable discourse and cohesive relations in the opening section of Bergman’s Persona (1966)


3.2  Analysis of Filmic Cohesive Chains

When we turn to the analysis of cohesion, however, a distinctive pattern used prominently in many Bergman films does start to emerge. A comparison of the specificity of characters, objects, and settings, including both verbal and visual cohesive devices, reveals that, again with the exception of Persona, each film visually, verbally, or cross-modally identifies main characters explicitly in the beginning. Moreover, within these 17 films, only Summer with Monika (1953), the earliest in the sample, has a transition within the establishment of the setting where there is a gradual ‘zooming in’ on progressively more specific settings as the film opens. These transitions begin with several long-shot images of Stockholm harbour and then depict how the principal character, Harry, drives on city streets with heavy traffic, enters a café, and finally encounters Monika. The cohesive chain of this sequence is illustrated in Figure 2.6, in which we can see how each setting

Figure 2.6  Pattern of cohesive chain in Bergman’s Summer with Monika (1953)

and character is gradually introduced. This pattern can often be seen in films beginning with a ‘zooming in’ from broader city images to characters, as in Hitchcock’s The Birds (see Tseng, 2008).

The beginnings of the other 16 films have no setting transitions. That is to say, main characters are seen within one setting where they are clearly introduced and their relationships are revealed. Furthermore, these 16 films vividly elaborate their subject matters in the beginning sequences, thus providing a concrete, thematic hypothesis right at the outset for the viewer to confirm or refute as the film unfolds. The chain pattern established from these 16 films is then as suggested graphically in Figure 2.7. Here we see cohesive chains prominently tracking setting and main characters from the beginning to the end of the opening sequence. Whenever a particular theme is presented, a further theme chain is added to the pattern. These themes are commonly presented verbally by main characters, such as ‘death’ and ‘God’ in The Seventh Seal, The Virgin Spring, and Winterlight; ‘marriage’ in Scenes from a Marriage; and ‘war’ in Shame. When the films deal with characters’ personal difficulties, traits, and relationships, the beginning sequences often open with a clear verbal introduction to those main characters and their relationships, as we previously illustrated for Wild Strawberries and as is also evident in Through a Glass Darkly, Hour of the Wolf, The Rite, The

Figure 2.7  The dominantly used pattern of cohesive chains found in the beginning sequences of 16 films of the corpus

Table 2.2  Cohesion analysis results for the corpus
Column headings: Pattern Conforming to Figure 2.7; Theme = Character; Additional Themes

Summer with Monika: See Figure 2.6
Wild Strawberries: other people; See Figure 2.4
The Seventh Seal
The Virgin Spring
Through a Glass Darkly
The Silence
Winterlight: church, God
Persona: pattern unclear
Hour of the Wolf
Shame
The Rite
The Passion of Anna
Cries and Whispers
Scenes from a Marriage
Face to Face
Autumn Sonata
From the Life of the Marionettes
Fanny and Alexander

Passion of Anna, and Autumn Sonata. Personal traits and relationships can also be foregrounded through performance, as in The Silence, Cries and Whispers, Face to Face, From the Life of the Marionettes, and Fanny and Alexander. The overall results for cohesion and the patterns exhibited are summarized in Table 2.2.

4.  CONCLUSIONS AND DISCUSSION

On the basis of the discourse analysis of the 18 films, our exploratory study shows that there is indeed a prominently used filmic discourse pattern found in the beginning of Bergman’s films: 16 out of 18 films have the same pattern, consisting of chronologically organized event segments presenting main

characters both verbally and visually. The opening sequences thus function to foreshadow narrative thematics, providing a hypothesis against which viewers can measure their interpretation of the film. This function is realized through a web of texture created by multimodal cohesive devices and by discourse relations holding between event segments. The introduction of additional themes appears to serve the function of introducing precisely those subject matters with which the films are concerned. These in turn correspond to the basic Bergman themes as discussed in film studies (cf. Kawin, 1978; Livingston, 2009).

This then lends support to both of our hypotheses above. On the one hand, the beginnings of the films appear very likely to exhibit their major themes as is usually the case for film beginnings and, on the other hand, those themes constructed by the cohesive patterns turn out to be precisely those for which Bergman is well known. Ascertaining to what extent Bergman’s pattern as such resembles or differs from other mainstream films clearly requires further work. For example, a broader study would need to contrast Bergman’s films with other films and also to move beyond the opening sequences in order to examine whether there are specific patterns that depict ‘elaboration’ of these character traits and themes and how these character traits and themes might be different from mainstream films. Our analysis also suggests, however, that despite Bergman’s prominent style of thematic presentation, the beginning portions of his films fulfil conventional communicative functions that the viewer is familiar with and which do not differ from most mainstream films.
To bring out more clearly where Bergman’s contribution might lie, we can usefully apply the notion of stratification introduced previously in order to consider the extent to which the particular authorial presence in these films is not given by any direct ‘violation’ of straightforward genre norms of the kind suggested in Figure 2.1, but rather by regularly instantiating patterns that have recognition value at a higher level of aesthetic abstraction. Thus, while any of the choices made among discourse relations and cohesion might, when considered alone, be within the bounds of genre conventions, their regular selection in concert and in combination with other particular selections may well result in an authorially distinctive set of semiotic options being taken up. The ‘deviation’ from norms constituting the authorial input is then situated within more abstract organizational levels of the film concerning, in this case, the particular subject matters that the films discuss and just how these subject matters are filmically introduced. Again, there is substantially more work to be done here to take this line of investigation further and to probe its sensitivity for revealing possible authorial differences.

In conclusion, therefore, this paper has suggested a new approach to cinematic authorship. It has considered in particular how a complex, stratified framework drawing on the recent development of linguistics-based multimodal theory might support the view that focusing on authorship is a fruitful strategy for investigating film stylistics. Through the analysis of discourse

relations between event segments and cohesion analysis, we began to outline how an organic unity in the author’s work might be recognized by textual analysis and that this, when contrasted across films, might take us further toward the kind of cross-film comparative studies suggested to address the issues of authorship by Wollen and Buscombe several decades ago.

We also see the approach set out in this paper as further bringing together a stratified linguistic approach and film studies. A stratified framework elucidates how the lower-level configurations of visual, verbal, and audio devices construct coherent filmic texts, which then support higher-level descriptions of thematics and genre comparison. Although the sample size of data in this paper is small and more corpus-based analyses are needed, both for non-Bergman films and for segments beyond the opening sequences, we hope nevertheless that this exploratory study has helped suggest how higher-level cultural issues might begin to be addressed on the basis of fine-grained textual analysis.

BERGMAN FILM REFERENCES

Bergman, I., Ekelund, A. (Producer), & Bergman, I. (Writer/Director). (1960). The Virgin Spring [Motion picture]. Sweden: Svensk Filmindustri (SF).
Bergman, I. (Producer), & Bergman, I. (Writer/Director). (1966). Persona [Motion picture]. Sweden: Svensk Filmindustri (SF).
Brick, R. (Producer), & Bergman, I. (Writer/Director). (1978). Autumn Sonata [Motion picture]. United States: New World Pictures.
Carlberg, L. (Producer), & Bergman, I. (Writer/Director). (1968). Hour of the Wolf [Motion picture]. Sweden: Svensk Filmindustri (SF).
Carlberg, L. (Producer), & Bergman, I. (Writer/Director). (1968). Shame [Motion picture]. Sweden: Svensk Filmindustri (SF).
Carlberg, L. (Producer), & Bergman, I. (Writer/Director). (1969). The Rite [Motion picture]. Sweden: Svensk Filmindustri (SF).
Carlberg, L. (Producer), & Bergman, I. (Writer/Director). (1969). The Passion of Anna [Motion picture]. United States: United Artists.
Carlberg, L. (Producer), & Bergman, I. (Writer/Director). (1972). Cries and Whispers [Motion picture]. Sweden: Svensk Filmindustri (SF).
Carlberg, L. (Producer), & Bergman, I. (Writer/Director). (1973). Scenes from a Marriage [Motion picture]. Sweden: Svensk Filmindustri (SF).
Carlberg, L. (Producer), & Bergman, I. (Writer/Director). (1976). Face to Face [Motion picture]. Sweden: Paramount Pictures.
Ekelund, A. (Producer), & Bergman, I. (Writer/Director). (1953). Summer with Monika [Motion picture]. Sweden: Svensk Filmindustri (SF).
Ekelund, A. (Producer), & Bergman, I. (Writer/Director). (1957). Wild Strawberries [Motion picture]. Sweden: Svensk Filmindustri (SF).
Ekelund, A. (Producer), & Bergman, I. (Writer/Director). (1957). The Seventh Seal [Motion picture]. Sweden: Svensk Filmindustri (SF).
Ekelund, A. (Producer), & Bergman, I. (Writer/Director). (1961). Through a Glass Darkly [Motion picture]. Sweden: Svensk Filmindustri (SF).
Ekelund, A. (Producer), & Bergman, I. (Writer/Director). (1963). The Silence [Motion picture]. Sweden: Svensk Filmindustri (SF).
Ekelund, A. (Producer), & Bergman, I. (Writer/Director). (1963). Winterlight [Motion picture]. Sweden: Svensk Filmindustri (SF).
Jörn, D. (Producer), & Bergman, I. (Writer/Director). (1982). Fanny and Alexander [Motion picture]. Sweden: Svensk Filmindustri (SF).
Wendlandt, H., Wendlandt, K., Bergman, I. (Producer), & Bergman, I. (Writer/Director). (1980). From the Life of the Marionettes [Motion picture]. Sweden: Svensk Filmindustri (SF).

REFERENCES

Arnheim, R. (1997). Film essays and criticism. Madison: University of Wisconsin Press.
Bacharach, S., & Tollefsen, D. (2010). We did it: From mere contributors to co-authors. The Journal of Aesthetics and Art Criticism, 68(1), 23–32.
Barthes, R. (1977). Image–music–text (S. Heath, Trans.). London: Fontana Press.
Bateman, J. A. (2007). Towards a grande paradigmatique of film: Christian Metz reloaded. Semiotica, 167(1/4), 13–64.
Bateman, J. A., & Schmidt, K. (2012). Multimodal film analysis: How films mean. London: Routledge.
Baudry, J. L. (1974). Ideological effects of the basic cinematographic apparatus (A. Williams, Trans.). Film Quarterly, 28(2), 39–47.
Bordwell, D. (2012). Nolan vs. Nolan. [Web log post]. Retrieved from http://www.
Bordwell, D., & Thompson, K. (1993). Film art: An introduction (4th ed.). New York: McGraw-Hill.
Buckland, W. (Ed.). (2009). Puzzle films: Complex storytelling in contemporary cinema. Chichester, UK: Wiley-Blackwell.
Buscombe, E. (1981). Ideas of authorship. In J. Caughie (Ed.), Theories of authorship: A reader (pp. 22–34). London: Routledge.
Gaut, B. (1997). Film authorship and collaboration. In R. Allen & M. Smith (Eds.), Film theory and philosophy (pp. 149–172). London: Oxford University Press.
Halliday, M. A. K. (1971). Linguistic function and literary style: An enquiry into the language of William Golding’s “The Inheritors.” In S. Chatman (Ed.), Literary style: A symposium (pp. 330–365). London: Oxford University Press.
Hartmann, B. (2009). Aller Anfang: Zur initialphase des Spielfilms. Marburg: Schüren.
Hasan, R. (1984). Coherence and cohesive harmony. In J. Flood (Ed.), Understanding reading comprehension: Cognition, language, and the structure of prose (pp. 181–219). Newark, DE: International Reading Association.
Kawin, B. F. (1978). Mindscreen: Bergman, Godard, and first-person film. Princeton, NJ: Princeton University Press.
Koch, H. (2000). A playwright looks at the “filmwright.” In J. Hollows, P. Hutchings, & M. Jancovich (Eds.), The film studies reader (pp. 55–58). London: Arnold.
LaRocca, D. (Ed.). (2011). The philosophy of Charlie Kaufman. Lexington: University Press of Kentucky.
Livingston, P. (1997). Cinematic authorship. In R. Allen & M. Smith (Eds.), Film theory and philosophy (pp. 132–148). Oxford: Oxford University Press.
Livingston, P. (2009). Cinema, philosophy and Bergman: On film as philosophy. Oxford: Oxford University Press.
Livingston, P. (2011). Theories of authorship: A reader. The Journal of Aesthetics and Art Criticism, 69(2), 221–225.
Luchins, A. S., & Luchins, E. H. (1962). Primary-recency in communications reflecting attitudes toward segregation. Journal of Social Psychology, 58, 357–369.
Martin, J. R. (1992). English text: System and structure. Amsterdam: Benjamins.
Meskin, A. (2009). Authorship. In P. Livingston & C. Plantinga (Eds.), The Routledge companion to philosophy and film (pp. 12–28). London: Routledge.

Ryan, M., & Kellner, D. (1988). Camera politica: The poetics and ideology of contemporary Hollywood film. Bloomington: Indiana University Press.
Sellors, P. (2007). Collective authorship in film. The Journal of Aesthetics and Art Criticism, 65, 263–271.
Tseng, C. (2008). Cohesive harmony in filmic text. In L. Unsworth (Ed.), Multimodal semiotics: Functional analysis in contexts of education (pp. 87–104). London: Continuum.
Tseng, C. (2012). Audiovisual texture in scene transitions. Semiotica, 192, 123–160.
Tseng, C. (2013). Cohesion in film: Tracking film elements. Basingstoke: Palgrave Macmillan.
Tseng, C., & Bateman, J. A. (2010). Chain and choice in filmic narrative: An analysis of multimodal narrative construction in The Fountain. In C. R. Hoffmann (Ed.), Narrative revisited: Telling a story in the age of new media (pp. 213–244). Amsterdam: John Benjamins.
Tseng, C., & Bateman, J. A. (2012). Multimodal narrative construction in Christopher Nolan’s Memento: A description of method. Journal of Visual Communication, 11(1), 91–119.
Tversky, A., & Kahneman, D. (1974). Judgement under uncertainty: Heuristics and biases. Science, 185, 1124–1131.
Van Leeuwen, T. (1991). Conjunctive structure in documentary film and television. Continuum: Journal of Media and Cultural Studies, 5(1), 76–114.
Wildfeuer, J. (2013). Film Discourse Interpretation. Towards a New Paradigm of Multimodal Film Analysis. London, New York: Routledge.
Wollen, P. (1969). Signs and meaning in the cinema. London: Secker and Warburg in association with the British Film Institute.
Wollen, P. (1981). The auteur theory (extract). In J. Caughie (Ed.), Theories of authorship: A reader (pp. 138–151). London: Routledge.


The Television Title Sequence: A Visual Analysis of Flight of the Conchords1

Monika Bednarek

1. INTRODUCTION

Fictional television series are increasingly becoming recognized as important cultural products with which viewers engage using a variety of media platforms such as television broadcasts, DVD box sets, Internet downloads/streaming, and so on.2 Discourse-oriented studies of such series have only recently started to emerge (e.g. Bednarek, 2010; Piazza, Bednarek, & Rossi, 2011; Richardson, 2010). So far, the main focus of such studies has been on the linguistics of television series and on their characters and narratives. In contrast, this chapter approaches television series from the perspective of critical multimodal analysis and focuses on data that are not part of the televisual narrative per se—the television title sequence (TTS). The television title sequence is a sequence of moving images, typically accompanied by music, which precedes each episode of a television series and is used throughout a season in identical format. I will first introduce key characteristics of the TTS before briefly discussing findings from a quantitative survey of 50 contemporary TTSs. I then move on to a qualitative analysis of the TTS for the musical television comedy Flight of the Conchords (Bobin, Clement, McKenzie, Miller, & Smiley, 2007–2009), which is the main focus of this chapter. I explore this TTS from a multimodal and from a critical perspective, drawing on Machin and Thornborrow (2003) in investigating the discursive practices of a contemporary commercial brand.

2.  THE TELEVISION TITLE SEQUENCE

Although it has been discussed in television studies with respect to issues such as marketing (e.g., Mistry, 2006), general functions (e.g., Burton, 2000, p. 75), or in relation to specific programs (e.g., Bell, 1992; Gripsrud, 1995), the TTS is yet to become the object of comprehensive multimodal analysis. From such studies and adding some of my own observations, the

main characteristics and functions of the TTS can be summarized as follows. The TTS:

• Signals the beginning of a program and/or separates elements such as recaps and scenes from the ‘start proper’ of the television episode
• Creates continuity between different episodes
• Identifies or names a television series (and sometimes its actors/creators)
• Introduces characters, settings, and storylines
• Aims to attract or ‘grab’ viewers
• Establishes a particular emotional mood
• Creates a particular aesthetic
• Predicts the genre of the series, creating genre expectations in viewers
• ‘Types’ or ‘brands’ a particular series through packaging of key features

Clearly, TTSs are multifunctional, but all of their functions are targeted at viewers. This is another example of the audience design of fictional television (Bednarek, 2010, pp. 15–17; Bubel, 2006), which means that features of the audiovisual text are always designed with the target audience in mind. From the point of view of the television professional:

For programme titles [sic] sequences the key thing is to try and get some sense of storyline as well as the key characters across and to try and establish the difference of the show as well. For a 30-second title sequence, a viewer seeing it for the first time should know exactly what that show’s about. Great title sequences will do that in one hit; it should do what it says on the tin. (Mistry, 2006, p. 93)

The functions of the TTS are therefore also closely tied to the capitalist marketplace and its systems of marketing and advertising, as they aim to grab and attract viewers to a particular program and channel (see further section 4.3).

3.  QUANTITATIVE FINDINGS

To explore general characteristics of the TTS, I surveyed 50 contemporary (2000–2010) U.S. American fictional television series from a variety of genres (action, adventure, crime, comedy, drama) for their TTS.
This is part of a larger research project into TTSs, which methodologically proceeds in three steps:

1. A surface-oriented quantitative investigation, or survey of all 50 TTSs: Much like content analysis—a methodology common in media studies (e.g., see Bell & Milic, 2002)—this means coding texts for selected content features, but does not necessarily involve semiotic analysis.

2. A pilot study coding one or a few TTSs using simplified semiotic analysis: This shows the kinds of qualitative insights that can be gained by drawing on semiotics but should be adaptable for later analyzing a large amount of data (in step 3).
3. Based on the insights gained in step 2, all TTSs are analyzed semiotically, allowing a combination of quantitative findings with the uncovering of qualitative insights (see Bell & Milic, 2002).

In this chapter I will describe the results of steps 1 and 2 only, focusing in particular on the kinds of qualitative insights that can be gained by drawing on a simplified semiotic analysis. For reasons of scope, I will not discuss the implications for step 3 in this chapter. Each TTS was first surveyed for the following features, chosen because they provide important general information as well as insights into visual content and style and aspects of sound:3

• Length: How long (in seconds) is the TTS?
• Credits: Does the TTS include credits (the names of actors/creators, etc.) or not?
• Characters: Does the TTS focus on the main characters (i.e., most of its shots feature them)?
• Sound: Does the TTS incorporate any of the following: spoken language (voice-over narration/dialogue), sound(s), a song (with a singer’s voice), music (without lyrics)?
• Style: Are the images used in the TTS ‘realistic’ or not? This relates to whether the TTS uses still/moving shots of characters, processes, events (realistic) or comics, drawings, paintings, etc. (nonrealistic).

Briefly, this survey suggests that the average length of a TTS is 34 seconds, although there is a lot of variation (and thus deviation from the average), with TTSs in the range of 1–20 and 21–40 seconds being very common. The majority of TTSs (29) include full credits, and almost half focus on the show’s key characters, introducing viewers to the main character(s) that they can expect to encounter and engage with throughout the season.
Further, a clear majority of TTSs (31) feature music without lyrics rather than songs or other sounds. Finally, most TTSs (27) use a realistic style. The survey also points to features that need to be analyzed more delicately, such as style—here we could draw, for instance, on Kress and Van Leeuwen’s (2006, pp. 154–174) notions of modality/coding orientation, which refers to sets of reality principles in which texts are coded. Another avenue for further investigation concerns the clustering of features, that is, the simultaneous co-occurrence of features (see Zhao, Chapter 9, this volume, for a different definition of clustering). For instance, if a TTS is short, what other features does it simultaneously exhibit? To give an example of such clustering (see further Bednarek, in press), a particular type of TTS is very short, features no credits, only animates the show’s title, and is accompanied by sound or music without lyrics (e.g., the

TTS for The Vampire Diaries). This is an easily recognizable subtype of the TTS, which is distinguished by a particular clustering of features. A surface-oriented quantitative investigation such as this can be used at the beginning of any project to generally survey the landscape and to provide a quantitative perspective on the data from the start (for additional details on the survey see Bednarek, in press). However, as we will see, a qualitative analysis can provide more in-depth insights.

4. A CASE STUDY IN QUALITATIVE MULTIMODAL ANALYSIS: THE TITLE SEQUENCE OF FLIGHT OF THE CONCHORDS

4.1  Data and Framework

For a qualitative study of a TTS, the musical television comedy Flight of the Conchords (Bobin et al., 2007–2009), henceforth FOC, was chosen for analysis. The series takes place in New York, although the two main characters are from New Zealand. This program was selected because it is very contemporary, it is an example of one of the main television genres (comedy), and it has had considerable critical success (receiving nominations for industry awards such as the Emmys). It also “sparked a cult following” (Kale, 2009, p. 118; Lloyd, 2011, p. 417). The TTS for season 1 was chosen for analysis here, as it would be the audience’s first encounter with the series.4 This TTS is publicly available on YouTube (HBO, 2011a), so readers can watch it online as they read this chapter. Before moving to the discussion, I will briefly summarize descriptions of the main characters (Bret and Jemaine), setting, storylines, and humour of the show from its official website, the Internet Movie Database, Amazon.com, the DVD covers, and key academic publications:

• Characters: “Two New Zealander friends” (IMDB, 2011); “dorky New Zealand rockers” (Brettingham, 2008, p. 37); “Expressing emotion doesn’t seem to come easily to Bret and Jemaine, unless it’s through song”; “these fictional innocents . . . remain curiously unaffected by any problems” (from Bartlett, 2008); “nerdy hipsters” (Kale, 2009); “clueless”, “naïve” (Lloyd, 2011)
• Setting/storylines: “the trials and tribulations of a two man, digi-folk band from New Zealand as they try to make a name for themselves in their adopted home of New York City” (HBO, 2011b); “our heroes contend with unrequited love, bohemian parties, inept criminals and their (single) obsessed fan, breaking into song as they clumsily attempt to break into the New York scene” (Bobin et al., 2007); “live in squalor” (Liebenson, 2011)
• Humour: “droll and deadpan”; “off-center charms”; “surreal lunacy” (Liebenson, 2011); “rather dry humour” (Bartlett, 2008); “quirkiness and awkwardness as its key narrative modes”; “quirkiness hinges on absurd” (Kale, 2009); “self-denigrating”, “hybrid comedy” (Lloyd, 2009); “transnational joking” (Lloyd, 2011)

I will show how these characteristics of FOC get construed in its TTS, with the aim of “typing” a television programme and attracting a particular target audience. First, the TTS as a whole was analyzed for features used in the quantitative survey (length, credits, characters, sound, style) to allow a comparison with the other TTSs surveyed. Then a more qualitative multimodal analysis was undertaken. In order to do so, the TTS was divided into shots. Drawing on research in film and semiotic analysis (Bordwell & Thompson, 2004, p. 218; Huisman, 2005, p. 170; Iedema, 2001, p. 188), a shot is the result of uncut camera action where typically one camera angle is held on an object or a scene. Shots can consist of one or more frames and are joined through cuts or other types of joins/edits like fade-outs, fade-ins, dissolves, or wipes (Bordwell & Thompson, 2004, pp. 218–219). Bateman (2009) defines shots as “perceptually available, visual segmentation units” (p. 146), and crucially, I treat them here as analytic segmentation units rather than semantic ones. A segment or unit of analysis needs to be chosen to allow both replicability of the study and quantification of the analyses (e.g., 7 out of 8 shots). The shot was chosen as a unit of analysis (rather than frames or phases; see Baldry & Thibault, 2006) because it is a unit widely used in film/television and narrative studies, and, as O’Halloran (2004) points out, “the major systems for each metafunction [ideational/interpersonal/textual meaning, as explained below] . . . are operational” (p. 117) in shots. Shots are also of relevance both for television production and editing, as well as for the audience: “as viewers, we perceive a shot as an uninterrupted segment of screen time, space, or graphic configurations” (Bordwell & Thompson, 2004, p. 219). Table 3.1 shows screenshots from the beginning of the eight shots of Flight of the Conchords’ TTS.
In reality, each shot was analyzed as a dynamic whole, including any changes within shots, rather than as a static screenshot as this table may suggest; the shots were watched repeatedly with the help of video software. This will be discussed in more detail below. Each of the eight shots in the TTS was analyzed using a simplified multimodal analysis inspired by social semiotics.5 A social semiotic approach (e.g., Jewitt, 2009; Kress, 2010; O’Halloran, 2004; Tseng & Bateman, 2010) distinguishes three types of meaning made in images (e.g., Caple, 2009; Kress & Van Leeuwen, 2006; Martin, 2001):

• Ideational/representational: construing reality in terms of “who is doing what to whom, where, when, why and how” (Caple, 2009, p. 57)
• Interpersonal or interactive: enacting and negotiating social relationships
• Textual or compositional: organizing the image

Various frameworks have been suggested for analyzing these three types of meaning; in this chapter I will mainly adapt aspects of Martin’s (2001), Kress and Van Leeuwen’s (2006), and Caple’s (2009) frameworks.6 A

Table 3.1  FOC’s TTS*

[Screenshots from the beginning of each of the eight shots, Shot 1 to Shot 8.]

*See below on what happens within shot 5. Note that there is a perceivable cut between shots 6 and 7, that is, shot 7 is not simply a zoom-out.

simplified analysis, rather than an exhaustive analysis of all possible features (as presented in, for example, Baldry & Thibault, 2006; Kress & Van Leeuwen, 2006; or O’Halloran, 2004) was adopted, because my ultimate aim is to analyze a large amount of data in step 3 of the research project, as outlined above. I will briefly discuss each type of meaning in turn, describing the analytical framework adopted.

Ideational/Representational Meaning

Kress and Van Leeuwen (2006) propose to analyze representational meaning in terms of participants, processes, and circumstances and have elaborated an intricate way of analyzing narrative structures. My own adaptation analyzes representational meaning in a more simplified way:

• Represented participants: What participants (human and nonhuman entities involved in processes) are represented in the shot?
• Action processes: What processes are participants engaged in? Is the process dynamic (i.e., showing movement), nondynamic (i.e., not showing any movement), reactional (i.e., showing gaze—looking at an object/person within or outside the frame), or involving speech (i.e., showing lip movement or movements such as head nods)? These process types are partially adapted from Kress and Van Leeuwen (2006) and partially from Baldry and Thibault (2006). I adopt this particular categorization of processes because it is useful for further analysis in terms of genre—for example, do TTSs for action/adventure genres feature primarily dynamic action processes? Or do TTSs for dialogue-based sitcoms feature more speech processes?
• Circumstances: Where do the processes take place? What is their setting?

These features were selected because they are general enough to allow application to a large data set, yet still seem to get at the main narrative elements—who is participating, what are they doing and where are they located?
Interpersonal/Interactive Meaning

Interpersonal or interactive meanings concern ways in which images engage viewers, establishing particular social relationships with them. Closely following Kress and Van Leeuwen (2006) and Martin (2001), the following interpersonal features were analyzed for each shot:

• Contact: Are the characters looking directly at viewers, thereby demanding to engage with them (demand) or not, offering themselves up as entertainment (offer)?
• Affect: Are the characters engaging viewers emotionally or not, by showing (positive or negative) emotional responses?
• Involvement: Are the characters presented using frontal angles and thereby including the viewer into the televisual world (involvement),

or are characters presented using oblique angles and thereby presenting the televisual world as one that excludes the viewer (detachment)?
• Social distance: What shot types are used to present characters? Are they presented in an intimate-personal (very close/close shot), social (medium shot), or impersonal (long shot) relationship with viewers?
• Power: What (vertical) angles are used to present characters? Are they presented as equals (eye-level angle) or as having more (low angle) or less (high angle) power over the viewer?

Textual/Compositional Meaning

Textual or compositional meaning concerns the image itself, specifically the way in which elements are placed within the image (or shot). Drawing on Kress and Van Leeuwen’s (2006) notion of salience and on Caple’s (2009) concept of balance, the following compositional features were analyzed for each shot:

• Salience: Which human character is salient (i.e., attracts the viewer’s attention first in the shot)?
• Balance: How are elements distributed in the shot to achieve balance (or not)? Is there one element that is singled out and focused on (isolating), or are two elements of equal size arranged evenly in the image frame (iterating)? Each of these options includes further sub-options as illustrated in Table 3.2.

Table 3.3 shows an example analysis with respect to representational, orientational, and compositional meaning. Note, again, that shots were analyzed as dynamic wholes, not just as static images as suggested by the screenshots and that each type of meaning was analyzed in turn (e.g., all shots were first analyzed for representational, then for interactive, then for compositional meaning, using separate tables). Before discussing the results for all three types of meaning, one further methodological issue needs to be addressed. The frameworks used in the analysis of the TTS were developed for still, rather than moving images (shots).
According to Van Leeuwen (1996), characteristics of moving images that need to be taken into account include the fact that the unit of analysis is dynamic (a shot, rather than a static image) and that this dynamism includes both motion of the camera and of entities shown in the shot. Shots are also combined into sequences through editing and combined with sound effects. Keeping this in mind, it is nevertheless possible to apply these frameworks, provided that the specific aspects of moving images are considered. For example, any changes that occur within shots (such as a zoom-in) need to be clearly noted in the analysis. In my analysis of the TTS for FOC I have therefore ‘translated’ the dynamic nature of each shot into language, so that each shot analysis notes any changes that occur within the shot using prepositional phrases (e.g., “from very impersonal to impersonal” in Table 3.3) or expressions of time (e.g., “first Bret and Jemaine, then aquarium visitors”). My analysis also

Table 3.2  Balance sub-options

Type of Balance | Definition from Bednarek & Caple (2012, pp. 160–180) | Example
Isolating: centred | “A single element in the centre of (or filling) the image frame” (p. 166). | (TTS Birds of Prey)
Isolating: axial | The element that is singled out and focused on “is shown in relation to other elements in the frame but along the diagonal axis” (p. 167). | (TTS Birds of Prey)
Iterating: dividing: matching | Two elements of equal size are arranged evenly in the image frame, captured doing the same or a similar thing (i.e., matching in their posture). | (TTS Flight of the Conchords)
Iterating: dividing: facing | Two elements of equal size are arranged evenly in the image frame, captured facing each other.† | (TTS Flight of the Conchords)

† While the two elements (Bret and Jemaine) are not of equal size, they are still facing each other and the image is relatively balanced.

considers the interaction of visuals and music, as well as the transitions used (although I do not analyze cohesion or narrative construction; see Tseng & Bateman, 2010; Van Leeuwen, 2005).
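The shot-level coding scheme described in this section can be pictured as a simple record per shot. The following is only a minimal sketch in Python, not the procedure used in the study: the class name ShotAnnotation, its field names, and the helper proportion are hypothetical, and the shot number is a placeholder. The sample values are those of the example analysis in Table 3.3, and list-valued fields are an assumed way of recording changes within a shot in temporal order (e.g., “from very impersonal to impersonal”).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ShotAnnotation:
    """One coded shot (hypothetical structure, for illustration only).

    List-valued fields hold values in temporal order, so a change within
    a shot is recorded as more than one entry.
    """
    shot: int
    participants: List[str]        # represented participants
    processes: List[str]           # dynamic / nondynamic / reactional / speech
    circumstances: str             # setting
    contact: str                   # demand / offer
    affect: str                    # positive / negative / n/a
    involvement: str               # involvement / detachment
    social_distance: List[str]     # intimate-personal / social / impersonal
    power: str                     # equality / viewer power / representation power
    salient: List[str]             # salient character(s)
    balance: str                   # e.g. "iterating: dividing: matching"


def proportion(shots, feature, value):
    """Return (count, total): how many shots show `value` for `feature`."""
    def has(annotation):
        coded = getattr(annotation, feature)
        return value in coded if isinstance(coded, list) else coded == value
    return sum(1 for s in shots if has(s)), len(shots)


# Values taken from the example analysis (Table 3.3); shot=1 is a placeholder,
# since the table as extracted does not state which shot it describes.
example = ShotAnnotation(
    shot=1,
    participants=["Bret", "Jemaine"],
    processes=["dynamic"],
    circumstances="New York, street scene",
    contact="offer",
    affect="n/a",
    involvement="involvement",
    social_distance=["very impersonal", "impersonal"],
    power="equality",
    salient=["Bret", "Jemaine"],
    balance="iterating: dividing: matching",
)

count, total = proportion([example], "contact", "offer")
print(f"{count} out of {total} shots are offers")
```

Coding every shot in this way would support statements of the form “7 out of 8 shots”, which is the kind of quantification the choice of the shot as unit of analysis is meant to allow.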

4.2  The Charming Quirks of Flight of the Conchords

Having introduced the data and framework of analysis, I will move to discussing the findings. First, how does the TTS for FOC compare with other TTSs considering general surface features?


Table 3.3  Example analysis

Feature (options) | Analysis
Represented Participants | Bret, Jemaine
Action Processes | Dynamic (characters are walking along street, with visible movement of both characters)
Circumstances | New York, street scene
Contact: demand/offer | Offer (no eye contact with viewer)
Affect: positive/negative | N/A (facial expressions cannot be seen)
Involvement: involvement/detachment | Involvement (frontal angle used throughout the shot)
Social Distance: intimate-personal/social/impersonal | From very impersonal to impersonal (the characters move toward the camera, but the distance remains impersonal)
Power: equality/viewer power/representation power | Equality (horizontal angle used throughout the shot)
Salience: name of salient character(s) | Bret, Jemaine (both characters are equally salient)
Balance 1: isolating/iterating | Iterating
Balance 2: centred/axial; dividing: matching; dividing: facing | Dividing: matching

In terms of length, this 30-second TTS instantiates a common choice among TTSs. The same can be said with respect to its featuring of credits, since it names actors, producers, and creators. Where it differs from most other TTSs is with respect to the audible whistle in shot 4, as it is rare for TTSs to feature similar types of sounds. At the same time, this TTS has in common with many others that it features music without lyrics, which is composed by the real-life band Flight of the Conchords (Bret McKenzie and Jemaine Clement). Further, FOC’s TTS is an example of a character-driven TTS, which focuses on the main characters throughout, as Bret and Jemaine (the two main characters) are recognizable in almost all of the shots. As in the majority of TTSs, a realistic style is used in FOC to portray the characters and setting. This is perhaps unexpected given the ‘quirky’ and ‘absurd’ nature of the show, which could have been indicated through use of either a nonrealistic style or through a mix of styles. In fact, the DVD covers for FOC combine line drawings (nonrealistic) with photographs (realistic). As we will see later, FOC’s quirkiness is, however, indicated through other techniques in the TTS. In most respects then, the TTS for FOC is fairly typical. However, this does not mean that it does not incorporate innovative aspects as the qualitative analysis will show.

Ideational/Representational Meaning

First, what does an analysis of the represented participants reveal? All but the final shot include both Bret and Jemaine. In so doing, the TTS focuses viewers’ attention on just the two major characters and defines them as the main participants in the show’s storylines. Alternatively, it could have included shots of other major characters in the show, such as the obsessive fan, their friend Dave, or the NZ ambassador (their agent), defining the show more as an ensemble program such as The L Word, Friends, and so forth.
The focus on the two main characters characterizes FOC as a “buddy-duo” show (Medhurst, 2007, in Lloyd, 2011, p. 417) and allows viewers to observe characteristics such as Bret and Jemaine’s looks, clothes, and accessories, which is important for characterization—the idea that they are “nerdy hipsters” is “epitomized by Bret’s skintight jeans and Jemaine’s mutton-chop sideburns and black-rimmed glasses” (Kale, 2009, p. 118). Because Bret and Jemaine are almost always represented together without any further participants, viewers also learn that the show is at least partially about the relationship between them. Further analysis of participants also reveals some unusual nonhuman participants. In shot 3, trees and park benches, and in shot 5, pepper shakers and mugs, are moving along to the rhythm of the theme music. At the end of shot 7, the whale, which is contained in the aquarium, seems to be moving towards the bottom right-hand corner of the camera while a wipe in the background transitions to the next shot (Table 3.4).7 All of these are indications of the show’s “absurd”, “quirky”, “off-center”, and “surreal” aspects, as reflected in “[n]arrative randomness” (Kale, 2009, p. 119) and spontaneous musical “interludes” (Lloyd, 2011, p. 417). The TTS thus predicts narrative and genre features of FOC, creating particular

Table 3.4  The whale as participant

[Four screenshots, in sequence: Whale in aquarium (shot 7); Whale starts moving out of aquarium (shot 7); Whale keeps moving (wipe in background); Wipe almost completed (shot 8).]

expectations about these in viewers and characterizing the show’s key ingredient of ‘quirk’. Second, what does an analysis of the action processes show? Most of the main action processes are dynamic, with many of them interacting in rhythm with the accompanying music. While the images of a TTS generally “fit the music” (Gripsrud, 1995, p. 194), FOC goes beyond this because the participants move with the music. This defines the show as one in which music plays a big part, to the point of intruding into the narrative, as the theme music impacts directly on the represented processes (human and nonhuman participants move along to its rhythm). It also foregrounds the constructedness of the TTS. Alongside processes with absent agents (in shot 2, guitars are thrown into the shot from outside the frame) and odd processes (e.g., Bret timing Jemaine with a stopwatch/whistle), this again contributes to characterizing FOC as quirky. This is also the result of using special effects within shots: in shot 5, things and participants just seem to suddenly appear while the camera angle is held constant (thus categorized as one shot) and the background remains identical (Figure 3.1).

Figure 3.1  Shot 5 in detail

The specific processes that are represented also give viewers an insight into the relationship between the participants (e.g., that they are friends is indicated by the high-five in shot 4; that they are flat mates is indicated by showing them eating different meals in shot 5) and their aspired profession (e.g., playing their guitars). The processes thus also contribute to defining the characters in the televisual narrative.

Third, what does a look at the settings tell us? These include outdoor settings, such as a street, house stairs, and a door/building, and indoor settings, such as rooms in an apartment and an aquarium. On the one hand, these settings locate the narrative (and some viewers may even recognize the building in shot 1 as located in New York). On the other hand, the settings contribute to characterization, in suggesting that the two characters are flat mates (apartment settings) and starving musicians (living in a dingy apartment, playing in an aquarium, advertising with chalk and flyers at the door of a small venue). Thus, the settings work to give audiences a sense of where the narrative is located as well as telling us more about who is involved in the narrative.

Interpersonal/Interactive Meaning

Interpersonally, almost all of the shots are offers, where no direct eye contact is established—that is, the characters are not looking directly into the camera at the viewer. This is typical of naturalistic drama, where the audience is placed “in the role of an unseen observer, a voyeur of what is going on” (Van Leeuwen, 1996, p. 93), and the characters and storylines are offered up as entertainment. In none of the shots do the characters show any discernible affect in their facial expressions. This is all the more foregrounded because of the unusual circumstances in which the characters find themselves (i.e., guitars being thrown into the shot, salt shakers moving).
This lack of facial affect aligns directly with descriptions of the show as “deadpan” and “dry”, with characters who cannot easily express emotion and remain unaffected. This is clearly used to introduce the characters’ personalities. Apart from shot 4, all shots use eye-level vertical and frontal horizontal angles to present characters. Thereby, the viewer is put into a relationship that is equal and involved, with characters presented as easy to relate to and identify with. Although Bret and Jemaine are, “with their odd accents, strange mannerisms and eccentricities, the ultimate assertion of difference” (Kale, 2009, p. 120), FOC can be seen “as an attempt to reverse


this significance and create a world that can be equally shared by stalkers, New Yorkers and New Zealanders” (Kale, 2009, p. 120). As Bartlett (2008) states, “we can easily identify with the small scale of their operations—the boys might almost be us” (p. 155). Finally, social distance varies from impersonal to social with no examples of intimate-personal (close-up) shots. It could be argued that viewers are thereby positioned as audience, rather than as the main characters’ friends. This aligns us, the viewers, with the fictionalized audience for which the show’s characters perform in the series, possibly blurring the distinction between ‘inside’ and ‘outside’ audience.

Textual/Compositional Meaning

Compositionally, Bret and Jemaine are the salient characters in most of the shots. Exceptions are shot 8 (no salient human character) and shots 6 and 7 (Table 3.5). While shot 6 starts out with Bret and Jemaine as salient characters, they soon become part of the background, as visitors to the aquarium start walking across the screen (and later the whale becomes a salient participant, as mentioned previously). Further, these visitors are pictured facing away from Bret and Jemaine, paying them no attention. These representational choices and the change in salience signify the lack of attention to and success for Bret and Jemaine’s band, a “rock band that can only schedule a performance at the local aquarium” (IMDB, 2011) where they are ignored by the visitors. These shots are also interesting in terms of balance (how elements are distributed in the shot to achieve compositional balance; see Table 3.2 for

Table 3.5  Changes in salient characters

Shot 6 (Bret and Jemaine salient)

Shot 6 (Bret and Jemaine nonsalient)

Shot 7 (Bret and Jemaine nonsalient)

50  Monika Bednarek further details). While shot 6 starts out as balanced (iterating: dividing: matching), this shot and the following quickly become ‘unbalanced’, with no discernible ordering of the elements in the frame to achieve ‘rest’ for the eye, as several visual units of information are competing for our attention (see Bednarek & Caple, 2012, p. 173). Again, this very clearly contributes to characterization (of the band and their lack of success). In a sense, it also makes us feel for them: We are confronted with unordered chaos, as are they (rather than being surrounded by a rapt audience). All other shots with Bret and Jemaine can be analyzed as iterating: dividing: matching (shots 1, 2, 3, 5) or iterating: dividing: facing (shot 4). The compositional meaning thus contributes to characterizing the two as very close friends, potentially even one unit (the beginning of shot 1 can arguably be analyzed either as iterating: dividing or as isolating: axial, with the two friends forming a single element that is shown in relation to the red staircase). At the same time, the fact that they are repeatedly shown as matching, with (almost) identical postures again points to the constructedness and quirk of the show, since this is unnatural; in ‘real life’ we do not constantly adopt the same posture as the person next to us.

4.3  The FOC ‘Hipster’ as a Commodified Identity

So far, I have provided a descriptive multimodal analysis of the TTS. However, it is also possible to view this social practice more critically. I will draw on the concept of commodified identity to discuss this further. This refers to the various ways in which identity is commodified, including “through acts of consumption (How do commercial discourses such as advertisements ‘speak’ to us and engage us with their message?)” (Benwell & Stokoe, 2006, p. 165). There are several ways in which commodification comes into play in relation to TTSs. First, the TTS is a clear example of commercial discourse, ‘typing’ or ‘branding’ a particular television series through short and attractive packaging of key features, with the aim of attracting or ‘grabbing’ viewers so that they watch a television series and/or engage with the platform (e.g., the television channel) on which they access that series. As I have argued earlier, the TTS is therefore closely tied to the capitalist marketplace and its systems of marketing and advertising. Choices made in the TTS work in the service of the overall functions of this multimodal sequence and are ultimately based on the desire to maximize profit. The TTS is all about selling a product to consumers:

In the multichannel environment . . . there are hundreds of channels the viewer can choose to go and watch instead of yours, so that first hit they get of your show is absolutely everything. You’ve got to grab them there and then. Both programme title sequences and channel brands are the packaging for your product, and if the packaging isn’t grabbing the viewer then they don’t open that package and have a look. (Mistry, 2006, p. 93)

The analyses above thus show how a particular commercial discourse speaks to viewers and engages them with its message. We can view this critically as an attempt to manipulate audiences into particular acts of consumption. Further, we can discuss more critically how audiences are drawn into these acts of consumption by creating particular identities for them (see also Kress, 2010, pp. 172–173)—identities that they can claim by consuming FOC and its associated products. As the analyses above have shown, key ‘ingredients’ of the TTS are choices that foreground ‘quirkiness’—unusual nonhuman participants, processes with absent agents, odd processes, special effects within shots, compositional choices—and ‘hipsterism’ (constructed through the characters’ hipster looks and behavior). Quirkiness itself is also associated with hipsters, with searches for “hipster quirk” or “quirky hipster” resulting in thousands of Google hits. By consuming FOC, audiences can thereby claim that they are ‘hipsters’ who appreciate its quirkiness. In this way, consumerism becomes a discourse through which we can signify our identities and roles (Machin & Thornborrow, 2003, p. 468), and viewers can distinguish themselves from and bond with others through consumption (Benwell & Stokoe, 2006, p. 176; Van Leeuwen, 2005, p. 145). The identity of the ‘hipster’,8 which has positive associations of creativity, coolness, and ‘indie’ (independent) nonmainstream culture (specifically music) (see Arsel & Thompson, 2010), is thus turned into a commodified identity, or a commodity. By consuming FOC and displaying its associated products (calendars, T-shirts, etc.) the consumers can signify that they are creative, indie, hip, and so forth; it affords membership of a ‘cool’ subculture.
Indeed, from the point of view of marketization, FOC becomes a ‘brand’ (and the TTS itself the ultimate package for that brand), whose ‘brand values’ are in fact explicitly listed by its licensing company as “laconic, funny, hip, musical, indie, self-deprecating” (Rocket Licensing, 2011). A brand is more than a product; it is “a set of representations and values that are not indissolubly tied to a specific product or products” (Machin & Thornborrow, 2003, p. 454). What is sold to the consumer is the values (i.e., quirkiness, hipsterism, indie music, etc.). The interesting thing here is that the hipster is historically antimainstream, countercultural, anticonformist, and anticonsumerist (indie, alternative) but has become another example of the reassimilation of resistant subcultures that get “invested with commodity value” (Benwell & Stokoe, 2006, p. 18).9

5.  CRITICAL READINGS

In this chapter I have used quantitative and qualitative perspectives to offer a first insight into the TTS, showing the various ways in which its key functions can be fulfilled. Further investigation is necessary to identify different subvarieties of TTSs systematically. My focus in this chapter has been on the product (the TTS), but we also need to look into its production processes because these “have a huge influence over the nature of the texts that we analyse” (Machin, 2009, p. 189).

On the other hand, we need to look at the complex ways in which audiences engage with such televisual texts, for example analyzing indie consumers (see Arsel & Thompson, 2010). In this regard, audience studies (Benwell & Stokoe, 2006, pp. 170–171; Briggs, 2010; Burton, 2000, pp. 211–234) would help us investigate how uninitiated viewers perceive the meanings made in the TTS and what kind of expectations it generates in them about the advertised television series. Offering a critical perspective, the analysis above followed Machin and Thornborrow in using multimodal analysis to uncover the “specific discursive practices” (Machin & Thornborrow, 2003, p. 454) of a contemporary commercial brand. My own critical reading in this chapter comprises what I call critical (multimodal) discourse analysis with a small c, rather than Critical Discourse Analysis with a big C. I use the latter (Critical Discourse Analysis) here to refer specifically to those critical analyses of discourse that follow one of the key models/methods associated with CDA (such as Fairclough’s framework, Van Dijk’s social-cognitive model, Wodak’s discourse-historic method; see Richardson, 2007 for an overview). I use the former (critical discourse analysis) to refer to critical analyses of discourse that do not follow such a model. Both small c and big C discourse analysis can be contrasted with more neutral descriptions (discourse analysis) that do not take a critical stance. Apart from sharing this critical perspective, small c and big C discourse analysis also share a concern with systematically analyzing discourse (in its broadest sense), drawing on frameworks from linguistics or semiotics, and using these analyses to support interpretation of how discourse functions in a particular social context.
This is why the term CDA is often used to encompass both. I distinguish what I label small c and big C discourse analysis in this chapter to allow for a more nuanced identification of my own approach here. Finally, I believe that the critical perspective that I have taken in this chapter should be complemented with a ‘positive’ one. As I have argued elsewhere (Bednarek, 2010, p. 223), there are both positive and critical things to say about popular television. The TTS for FOC is a commercial product aimed at attracting consumers, but, at the same time, it is also a creative achievement.

NOTES

1. I am very grateful to Helen Caple, Emilia Djonov, and Sumin Zhao for constructive comments on an earlier version of this chapter.
2. In this chapter I use television series as a cover term for serialized fictional television, including both serials and series. This excludes reality television and other nonfictional serialized content.
3. Of the surveyed 50 television series, only two did not have a TTS—Glee and The Shield.
4. The TTS for season 2 is slightly different, but not radically, and retains the same music as well as some identical shots.
5. “Inspired” is used here deliberately to signify that I do not self-identify as a social semiotician or systemic functional multimodal discourse analyst in terms of adopting completely the theoretical model behind this type of analysis. Rather, I make practical use of some of the tools developed by these schools for analyzing images.
6. It is beyond the scope of this chapter to outline in detail how my own adaptation differs from Martin (2001), Kress and Van Leeuwen (2006), and Caple (2009), and I refer the reader to these publications for further information and background.
7. For viewers, who perceive the TTS as a dynamic, moving sequence, it might seem as if it is the whale that constitutes the wipe and joins the two shots.
8. This is a ‘lifestyle’ identity, a type of identity that is characterized by consumer behaviour, taste, leisure time activities, and attitudes and is used in marketing to classify consumers (Van Leeuwen, 2005, pp. 145–146).
9. See Arsel and Thompson (2010, p. 795ff) for historical analysis of how indie culture became appropriated as the ‘hipster’ marketplace myth.

REFERENCES

Arsel, Z., & Thompson, C. J. (2010). Demythologizing consumption practices: How consumers protect their field-dependent identity investments from devaluing marketplace myths. Journal of Consumer Research, 37(5), 791–806.
Baldry, A., & Thibault, P. J. (2006). Multimodal transcription and text analysis. London: Equinox.
Bartlett, M. (2008). Innocents abroad: Flight of the Conchords. Metro Magazine, 158, 154–155.
Bateman, J. (2009). Film and representation: Making filmic meaning. In W. Wildgen & B. Van Heusden (Eds.), Metarepresentation, self-organization and art (pp. 137–162). Bern: Peter Lang.
Bednarek, M. (2010). The language of fictional television: Drama and identity. London: Continuum.
Bednarek, M. (in press). “And they all look just the same”?—A quantitative survey of television title sequences. Visual Communication.
Bednarek, M., & Caple, H. (2012). News discourse. London: Continuum.
Bell, J. (1992). In search of a discourse on aging: The elderly on television. The Gerontologist, 32(3), 305–311.
Bell, P., & Milic, M. (2002). Goffman’s Gender Advertisements revisited: Combining content analysis with semiotic analysis. Visual Communication, 1(2), 203–222.
Benwell, B., & Stokoe, E. (2006). Discourse and identity. Edinburgh: Edinburgh University Press.
Bobin, J., Clement, J., McKenzie, B., Miller, T., & Smiley, S. (Executive producers). (2007–2009). Flight of the Conchords. [Television series]. United States: HBO.
Bobin, J., Clement, J., McKenzie, B., Miller, T., & Smiley, S. (Producers), & Bobin, J., Clement, J., & Miller, T. (Directors). (2007). Flight of the Conchords: The Complete HBO First Season [DVD]. United States: HBO.
Bordwell, D., & Thompson, K. (2004). Film art: An introduction (7th ed.). Boston: McGraw-Hill.
Brettingham, M. (2008). Flight of the Conchords. The Times Educational Supplement, 4787, M37.
Briggs, M. (2010). Television, audiences and everyday life. Maidenhead: Open University Press/McGraw-Hill.
Bubel, C. (2006). The linguistic construction of character relations in TV drama: Doing friendship in Sex and the City. (Doctoral dissertation). Retrieved from
Burton, G. (2000). Talking television: An introduction to the study of television. New York: Oxford University Press.
Caple, H. (2009). Playing with words and pictures: Intersemiosis in a new genre of news reportage. (Doctoral dissertation). University of Sydney, Sydney, Australia.
Gripsrud, J. (1995). The Dynasty years: Hollywood, television and critical media studies. London: Routledge.
HBO. (2011a). Flight of the Conchords theme song (season 1). [Video]. Retrieved from = 242kR8uSeQ4
HBO. (2011b). Flight of the Conchords. Retrieved from
Huisman, R. (2005). Aspects of narrative in series and serials. In H. Fulton (with R. Huisman, J. Murphet, & A. Dunn), Narrative and media (pp. 153–171). Cambridge: Cambridge University Press.
Iedema, R. (2001). Analysing film and television: A social semiotic account of Hospital: An Unhealthy Business. In T. Van Leeuwen & C. Jewitt (Eds.), Handbook of visual analysis (pp. 183–204). London: Sage.
IMDB. (2011). Plot summary for The Flight of the Conchords. Retrieved from http://
Jewitt, C. (Ed.) (2009). The Routledge handbook of multimodal analysis. London: Routledge.
Kale, N. (2009). “Who likes to rock the party?” Cultural appropriation in Flight of the Conchords. Metro Magazine, 162, 117–120.
Kress, G. (2010). Multimodality: A social semiotic approach to contemporary communication. London: Routledge.
Kress, G., & Van Leeuwen, T. (2006). Reading images: The grammar of visual design. London: Routledge.
Liebenson, D. (2011). Review Flight of the Conchords: The complete first season. Retrieved from
Lloyd, M. (2009). Nerds in the city: Flight of the Conchords makes good television humour. Media International Australia, 131, 57–67.
Lloyd, M. (2011). When Jemaine met Keitha: Flight of the Conchords tackle Australia. Continuum, 25(3), 415–426.
Machin, D. (2009). Multimodality and theories of the visual. In C. Jewitt (Ed.), The Routledge handbook of multimodal analysis (pp. 181–190). London: Routledge.
Machin, D., & Thornborrow, J. (2003). Branding and discourse: The case of Cosmopolitan. Discourse & Society, 14, 453.
Martin, J. R. (2001). Fair trade: Negotiating meaning in multimodal texts. In P. Coppock (Ed.), The semiotics of writing: Transdisciplinary perspectives on the technology of writing (pp. 311–338). Turnhout, Belgium: Brepols.
Mistry, A. (2006, July 2). Branding guide. The Televisual Annual, 93.
O’Halloran, K. L. (2004). Visual semiosis in film. In K. O’Halloran (Ed.), Multimodal discourse analysis (pp. 109–130). London: Continuum.
Piazza, R., Bednarek, M., & Rossi, F. (Eds.). (2011). Telecinematic discourse: Approaches to the language of films and television series. Amsterdam: John Benjamins.
Richardson, J. (2007). Analysing newspapers: An approach from Critical Discourse Analysis. Basingstoke: Palgrave Macmillan.
Richardson, K. (2010). Television dramatic dialogue: A sociolinguistic study. Oxford: Oxford University Press.
Rocket Licensing. (2011). Rocket Licensing: brands: Flight of the Conchords. Retrieved from
Tseng, C., & Bateman, J. A. (2010). Chain and choice in film narrative: An analysis of multimodal narrative construction in The Fountain. In C. Hoffmann (Ed.), Narrative revisited (pp. 213–244). Amsterdam: John Benjamins.
Van Leeuwen, T. (1996). Moving English: The visual language of film. In S. Goodman & D. Graddol (Eds.), Redesigning English: New texts, new identities (pp. 81–105). London: Routledge.
Van Leeuwen, T. (2005). Introducing social semiotics. London: Routledge.


The Strategic Use of the Visual Mode in Advertising Metaphors

Charles Forceville

1. INTRODUCTION

George Lakoff and Mark Johnson (1980) are usually recognized as the founding fathers of what has become known as Conceptual Metaphor Theory (CMT), although Ortony (1979) helped pave the way. Briefly, CMT, rooted in Cognitive Linguistics, sees metaphor—which Lakoff and Johnson define as “understanding and experiencing one kind of thing in terms of another” (1980, p. 5)—as a device that systematically structures the way humans conceptualize abstract and complex phenomena by comprehending them via concrete phenomena (i.e., phenomena that are experienced through the human body and its sensory organs). Examples of such structural metaphors, extensively studied over the past 30 years, are LIFE IS A JOURNEY, TIME IS SPACE, and EMOTIONS ARE PHYSICAL FORCES. (In the CMT paradigm it is customary to signal metaphors’ conceptual level by using small capitals.) However, CMT scholars have gradually begun to acknowledge that the ‘embodied’ basis of metaphors needs to be complemented by cultural dimensions (e.g., Gibbs, 2008; Gibbs & Steen, 1999; Kövecses, 2005). This work is expanding, but CMT’s focus is still predominantly on verbal manifestations of the embodied dimension of conceptual metaphors.

Lakoff and Johnson’s (1980) insistence that metaphors are primarily conceptual has not only spawned a rich body of work on how structural metaphors help shape our thinking, but has also given rise to a different type of research. If metaphor characterizes thought and action, metaphors must pervade nonverbal modes, and combinations of modes, as well. This insight has resulted in research focusing on creative pictorial/visual metaphor (e.g., Forceville, 1996; see also Carroll, 1996; Whittock, 1990) and multimodal metaphor (Forceville, 2006, 2007, 2008; Forceville & Urios-Aparisi, 2009), with the latter also addressing work in gesture studies (e.g., Cienki & Müller, 2008; Mittelberg & Waugh, 2009; Müller, 2008).
A major strand in my own work focuses on this second type of research, that is, on the pictorial and multimodal manifestations of creative rather than structural metaphors, where creative metaphors are original, one-off metaphors (such as Wallace Stevens’ “A poem is a pheasant”) and structural metaphors are lexicalized (conventionalized, dead/dormant) metaphors such as “he passed away” (= DYING IS DEPARTING FROM HERE) and “she attacked my paper” (= ARGUMENT IS WAR). My work is as indebted to Black’s (1979) theory of creative metaphor as it is to Lakoff and Johnson. In modern metaphor theory, Max Black was arguably the first to revalue metaphor as a central instrument of cognition, after philosophers had long ignored the trope. This neglect, according to Black, was due to the misguided idea that only true propositions can contribute to knowledge, and metaphors are typically propositions (if they take a propositional form in the first place) that are literally false. Black moreover made the important point that metaphors do not necessarily capture a preexistent similarity between the two parts of a metaphor (target and source, in modern parlance), but may create that similarity.

The main genre within which I have analyzed such metaphors is commercial advertising. Contemporary advertising is rich in metaphors, and my goal to develop a model for the analysis of pictorial (or visual) metaphor has been well served by the clear purpose of this type of discourse: to sell or promote a product, service, or brand. In most of this work, I have aimed to answer questions pertaining to the formal dimensions of pictorial and multimodal metaphor: What qualities does a certain configuration of visual elements need to possess to qualify as a pictorial metaphor? Can we distinguish subtypes, and if so, which ones, and how can we tell them apart? How, if at all, does pictorial metaphor relate to other visual tropes, such as metonymy? What makes a metaphor multimodal rather than monomodal? What modes or modalities (the two terms are here used interchangeably) can play a role in metaphor creation, and how?
Does the choice of verbalization of a nonverbal or partly verbal metaphor—indispensable in scholarly publications—affect its possible interpretations? Such issues are important, and need to be addressed before it is possible to use insights into how metaphor functions in critical theory. In this chapter I will revisit the genre of advertising, and demonstrate how analyzing the modes used in pictorial and multimodal metaphor can contribute to a critical assessment of product and brand claims, and of the assumptions underlying the derivation of such claims. I will first give the bare bones of the model for analysis, then apply it to show how metaphors steer interpretation, and end with some conclusions and suggestions for widening the scope of this line of research.

2.  PICTORIAL AND MULTIMODAL METAPHOR

Let me begin by introducing and explaining some basic assumptions and terminology. Each metaphor has an underlying A IS B form. Sometimes a verbal metaphor already manifests this A IS B form on the linguistic surface level. In the title of Pat Benatar’s 1980s hit single the verbal “Love is a battlefield” and the conceptual level LOVE IS A BATTLEFIELD coincide, but in the lyrics the phrase “when your heart surrenders” exemplifies the same conceptual metaphor. A metaphor concerns one thing, its target (also known as ‘topic’, ‘tenor’, or ‘primary subject’), about which something is predicated by the source (‘vehicle’, ‘secondary subject’). In Benatar’s single, LOVE is the target and BATTLEFIELD the source. In “when your heart surrenders” the word “heart” belongs to the semantic domain of the target, and “surrenders” to that of the source. In a metaphor, there is in principle never any doubt what is its target, and what is its source. Within CMT the irreversibility of target and source in metaphor is a central tenet (see, e.g., Forceville, 1996, p. 12; Lakoff & Turner, 1989, p. 132). This, to be sure, only means that in a given context target and source cannot be reversed; in other contexts one might come across metaphors in which the terms have shifted slots (in CMT the classic example is “my butcher is a surgeon” versus “my surgeon is a butcher,” both of which may make sense—but not in the same context).

The interpretation of a metaphor consists of deciding what features or connotations of the source can be mapped (Black, 1979, uses the verb ‘projected’) from source onto target. This mapping process involves some fine-tuning. What is literally fighting, shooting, being wounded, surrendering, and so forth in the BATTLEFIELD domain becomes, say, starting a relationship, not giving in to the lover’s desires, trying to hurt the lover emotionally, being emotionally hurt by the lover, admitting to being in love with the lover, and so on, after mapping on the target domain LOVE. Importantly, the mapped features can either be spelled out by the metaphor’s creator or remain implicit. In the former case, the maker of the metaphor makes explicit how he or she intends the metaphor to be, wholly or partly, interpreted.
In the latter case, the metaphor’s audience has more freedom in deciding on what is to be mapped. As long as the feature to be mapped is indeed part and parcel of the semantic domain of the source and the associations adhering to this domain, any such feature can be mapped—provided it is not incommensurate with pragmatic information that would show the mapping to be inappropriate. This has several possible consequences: (a) Addressees may map one or more features that the metaphor’s creator expects or hopes they will map; (b) addressees may map one or more features that the metaphor’s creator had not envisaged, leading either to minor or major miscommunication or to unexpected enrichment; (c) addressees may, for whatever reason, subversively map features that go against the grain of the presumably intended message. Consider the following dialogue between Shrek and Donkey from the movie Shrek (2001), which focuses on the metaphor “ogres are like onions”1:

Shrek: “For your information, there’s a lot more to ogres than people think.”
Donkey: “Example?”
Shrek: “Example? OK . . . Ahhm . . . Ogres are . . . like onions!”
Donkey: “They stink?”
Shrek: “Yes . . . No!”
Donkey: “Oh, they make you cry?”
Shrek: “No!”
Donkey: “Oh, you leave them out in the sun, they get all brown, start sprouting little white hairs.”
Shrek: “No! . . . Layers! Onions have layers! Ogres have layers. Onions have layers. You get it. We both have layers.”
Donkey: “Oooohhh . . . you both have layers? You know, not everybody likes onions . . . Cake! Everybody loves cake!”

In this example, the conceptual metaphor OGRES ARE ONIONS is presented on the verbal level in the ready-made A-is-like-B form “Ogres are like onions,” in which “Ogres” is the target, and “onions” is the source. But what is to be mapped? At first Shrek does not make explicit the mappings he envisages. This gives Donkey the opportunity, probably tongue-in-cheek, to subvert the metaphor by volunteering mappable features of the source domain that are factually correct, but are definitely not what Shrek had in mind when proffering the metaphor—which is that onions have layers. After ‘translation’ to the target domain, the layer-mapping presumably means something like: ‘is complex’ and/or ‘is less simply structured than meets the eye’. Donkey now acknowledges what Shrek means and volunteers a semantic domain that, he thinks, is a better source domain candidate given what Shrek wants to convey: CAKE. In Donkey’s opinion, everybody loves cake, which is therefore a salient feature in the CAKE domain. Moreover, cakes, in his view, presumably have layers—so CAKE is a better source than ONIONS, since it is both universally liked and has layers. (A problem is that having layers is not a very salient property of cakes, and therefore is not so easily evoked by addressees of the metaphor “Ogres are like cakes”.) The point here is that metaphor makers may be unaware of unwanted salient mappings in the source domain. This may either lead to miscommunication or to a deliberate subversion of their own metaphor by the audience, which may then boomerang back to them. Clearly, the sum total of things that interlocutors know, feel, and believe about a semantic domain heavily influences the possible mappings they recruit from it. Chances for misunderstanding a metaphor become greater when it straddles different cultures or communities.
In the examples hitherto discussed, metaphor users may make explicit the link between target and source verbally in the form of the copular ‘is’ (or ‘is like’). Since in verbal metaphors both target and source are presented in the same mode, language, these cases thus belong to the monomodal type. But there is no reason why the elements in a metaphor should be verbal in nature: Both a target and a source can be rendered in other modes. Now what counts as a mode/modality is a question that has spawned much debate, but this has not resulted in much agreement (for discussion, see Elleström, 2010; Forceville, 2006; various authors in Jewitt, 2009; Kress, 2010; Kress & Van Leeuwen, 2001). I will not attempt to resolve this issue here, but take a practical approach and for the purposes of this chapter distinguish the following modes: written language, spoken language, visuals, music, and nonverbal sound.

If this list of modes is accepted, this means, at least in theory, that we can have other types of monomodal metaphor than just the verbal type. We would have to postulate purely pictorial/visual, musical, sonic, or gestural metaphors if we should come across nonverbal discourses in which a perceptual configuration that invites metaphorical construal presents both the target and the source in the same modality. For pictorial metaphor, many specimens have been identified (Forceville, 1996, 2000; Van Mulken, Le Pair, & Forceville, 2010); whether monomodal musical or sonic or gestural metaphors actually occur is a matter for further investigation. But of course these modalities can be used in various permutations: A target can be presented in one modality, a source in another. In order to distinguish multimodal from monomodal metaphors, I have suggested the following definition: “Multimodal metaphors are metaphors whose target and source are each represented exclusively or predominantly in different modes” (Forceville, 2006, p. 384). Again, assessing whether all permutations actually occur requires further research. Nevertheless, as discussed below, certain types are widespread. Another caveat is in order: There is no law that forbids the maker of a metaphor to deploy more than one modality simultaneously to represent a target or source. Hence a target (or a source) can, for instance, be cued both visually and verbally. This means that the monomodal-multimodal metaphor distinction has fluid boundaries.
Finally, modalities play a role not only in the identification of target and source, but also in the cueing of mappable features. This can be done verbally, as Donkey does in the Shrek fragment (“they stink?”), but also in other modalities. In a different situation, say in a cartoon, the onion’s stench could be conveyed by showing an onion with waft lines above it, and somebody sniffing it with a disgusted face or pinching his nose. To summarize, to be recognized as a multimodal metaphor, a phenomenon must meet each of the following four conditions:

1. An identity relation is created between two phenomena that, in the given context, belong to different categories.
2. The phenomena are perceived as being exclusively or predominantly conveyed in different modes.
3. The phenomena are to be understood as target and source, respectively; they are not, in the given context, reversible.
4. At least one characteristic/connotation associated with the source domain can be relevantly mapped onto the target domain; often a cluster of internally related connotations is to be so mapped (adapted from Forceville, 1996, Chapter 6; see also Black, 1979).

3.  THE GENRE OF ADVERTISING

Unsurprisingly, advertisers are enthusiastic users of metaphors. An advertiser, after all, needs to make positive claims about a product (or service, or brand). Consequently, the advertiser must evoke positive attitudes toward, and emotions about, products (Forceville, 1996, p. 104)—and needs to do so very fast, both because of the cost of buying advertising space in newspapers, on TV, in cinema, or on the Internet, and because of the low attention value of ads and commercials. Metaphors are very effective instruments to achieve this goal, evoking the intended types of associations for a product in a space usually not exceeding a page or in a time frame of approximately 30 seconds. Generally, the product is the metaphor’s target, which is coupled with a source domain that has a structured set of precisely those prominent features that will evoke the appropriate qualities of the product—that is, those features the advertiser wants to claim for the product. Often, however, it is not in the advertiser’s interest to spell out these features verbally, since this may result in corny, offensive, or simplistic claims, which are likely to be rejected by audiences. But presenting a source in the visual and/or sonic mode cues the audience to infer the mappable features for itself. This is more subtle—or insidious. As Trevor Pateman aptly observes, “Advertisers get consumers to do their dirty ideological work for them, and keep their own hands clean” (Pateman, 1983, p. 200).

4.  CASE STUDIES

The following examples of nonverbal advertising metaphors have been selected because they show how, by their clever use of the visual modality, the advertisers get us to “do their dirty ideological work for them”.

4.1  Mazda (1992)

An old billboard (1992) for Mazda cars shows a young woman with a bruised eye. The pay-off text is “United colors of Mazda”, an intertextual reference to the controversial United Colors of Benetton campaign of the same period. The bruise around the woman’s eye has various colours, and each colour is connected by a line to a label (“canarian blue”, “classic red”, “neat green”, “elegant beige”, and “space yellow”) and a code. The labels and codes signify the colours of the Mazda 1992 car models. Because the colours are visualized as the bruised areas of the woman’s eye, it is difficult to avoid construing a metaphor that could be verbalized as MAZDA CAR COLOR PROGRAM IS VARIETY OF COLORS IN WOMAN’S BRUISED EYE, where what is to be mapped, presumably, is the precise hues of the woman’s bruised eye onto the Mazda car series. At the time there was considerable protest against this billboard in the Netherlands, as some people complained that it was bad manners to promote a car by showing an abused woman (Figure 4.1 shows the billboard covered

The Strategic Use of the Visual Mode in Advertising Metaphors  61

Figure 4.1  Billboard for Mazda cars (Netherlands 1992), with paint thrown by a protester. Photograph by the author.

in paint thrown by a protester). What is interesting is that of course there is nothing in the billboard that warrants the conclusion that the woman has been abused—she could have run into a door. The people objecting simply supplied information that is commensurate with, but not guaranteed by, the information in the billboard. It is nonetheless probable that Mazda hoped that people would actually be angry on the basis of such inferences (a controversial Dutch Mazda commercial of the same period showed the functioning of the Mazda V6 motor as six midgets jumping quickly up and down to the sound of a motor), since it would aid free publicity for Mazda. The point to be made here is that the choice of modes (visual and verbal) making up the supposed metaphor creates a metaphor “scenario” (Musolff, 2006) that is suggested, but not explicitly communicated, by Mazda. By contrast, a purely verbal variety of the metaphor, say, “Our car color program is inspired by the subtle/idiosyncratic/unusual etc. colors in an abused woman’s bruised eye”, or even “. . . in a woman’s bruised eye”, would have been far more explicit, sounding overtly offensive or ridiculous in a way that the multimodal metaphor deployed here is not.

4.2  Nivea (2010)

The metaphor in the advertisement in Figure 4.2 can be verbalized as FINGERNAIL (TREATED WITH NIVEA NAIL VARNISH) IS TIN OPENER. Even without the anchoring text (“for extra strong extra long nails”), we probably would have guessed that the feature to be mapped is ‘strength’ and perhaps ‘sharpness’, since these are


Figure 4.2  Advertisement for Nivea nail polish

salient features of tin openers, so this would count as a monomodal, pictorial metaphor. While the choice of source domain cues the desired mappable features very well, some people may nonetheless have a problem with it in view of the semantic domains to which target and source belong. The metonymic link between varnished nail and woman is uncontroversial. However, opening tins is an activity primarily done in kitchens. In other words, there is also a metonymic link between tins and kitchens, and so the visuals in this advertisement forge a metonymic link between ‘woman’ and ‘kitchen’, which could be considered as reinforcing stereotypes. This would not have been the case if the metaphor had been, for instance, FINGERNAIL IS SCREWDRIVER (see Forceville, 1996, pp. 149–152, for another controversial gender metaphor, in a motorbike ad aimed at adolescent boys).

4.3  Peugeot (2007)

In this 25-second commercial broadcast on Dutch TV, we see a car that, in slow motion, avoids hitting a number of small birds by a quick steering manoeuvre (Figure 4.3). The male voice-over text comments: “One second. 25 wing-flaps of a hummingbird. Same second. 25 trajectory analyses by the sensors of the Peugeot 308. And for you: total control over the road” (my translation, ChF). At the end of the commercial there is a written pay-off, stating “Safety through control”.2


Figure 4.3  Still from Peugeot 308 commercial (2007): Car driver manoeuvres quickly to avoid hitting some hummingbirds

On the visual level, the message appears to be that the car is technically so well-equipped that it can make a last-second swerve on the road in order not to collide with some hummingbirds in midair. That is, the presence of the birds is motivated by the ‘car-almost-hits-living-creature’ scenario. However, the voice-over text mentions the hummingbirds in another sense, namely as the source domain of a metaphor that has the Peugeot car as its target. This is done subtly by comparing the “25 trajectory analyses” (whatever they may be) of the car to the 25 wing flaps per second of the hummingbird. We are thus invited to construe the pictorial metaphor PEUGEOT CAR IS HUMMINGBIRD. The mapped feature is presented verbally, and should be understood along these lines: Just as a hummingbird flaps its wings an amazing 25 times per second, so a Peugeot can do an amazing 25 trajectory analyses per second. It is to be observed that we are invited, rather than forced, to construe the metaphor. In the first place, the voice-over text does not use a copula to directly establish the relation between the car and the hummingbird. In the second place, there is a quasi-realistic motivation for the visual presence of the hummingbirds, so a metaphorical construal is not necessary to account for their presence. If we accept the invitation to construe the PEUGEOT CAR IS HUMMINGBIRD metaphor, we can in our interpretation map the 25 wing flaps onto the 25 trajectory analyses of the car, with connotations such as ‘admirable’ or ‘incredibly dexterous’. But we need not stop here. Given the genre convention of advertising that a commercial always makes a positive claim for a product, we may consciously or subconsciously map other features or connotations adhering to hummingbird that we might find worthy of mapping, such as cuteness, beauty, or naturalness. It is thus possible that we end up remembering from the commercial that the car is presented as

a natural creature, perhaps environmentally friendly. To what extent such interpretations actually occur can of course only be assessed by experimental research. The points I want to make here are that (a) the hummingbirds are simultaneously part of the quasi-realistic scene, which “naturalizes” (Barthes, 1986, p. 39) their presence, and the source domain of a metaphor that has the car as target; and (b) once we accept the metaphor, we may map yet other (positive) features and connotations on the basis of the visual qualities of the hummingbird and our knowledge of the world. The latter are inferred by us, at our own responsibility; after all, there is nothing in either the textual or the visual modality of the commercial that makes these latter features explicit.

4.4  Mac Campaign (2007–2010)

Personification is one of the most widespread varieties of metaphor. It has the basic pattern OBJECT IS HUMAN BEING/ANIMATE CREATURE. The series of TV commercials promoting the Mac show (always the same) two men who respectively personify a PC and a Mac.3 In each commercial, this is made clear by the way they introduce themselves: “Hello, I’m a Mac . . . and I’m a PC.” In each instalment, a dialogue between the two ensues, which reveals Mac to be superior in one way or another to PC. Here is one of them:

Mac: “Hello, I’m a Mac . . .”
PC: “ . . . and I’m a PC.”
PC: “What are you reading?”
Mac: “Just the Wall Street Journal.”
[PC grabs paper from Mac.]
Mac: “No, no, no . . . PC, you know what?”
PC: “Oh, it’s a review of you.”
Mac: “Ddd . . . don’t read it.”
PC: “It’s from Walt Mossberg, one of the most respected technology experts on the planet. . . . Fairly [?], you’re the finest PC on the market, at any price. Very nice!”
Mac: “Just one man’s opinion.”
PC: “I actually got a great review this morning, too . . . ”
Mac: “Oh? Good for you.”
PC: “ . . . They said I was awesome, and so, we’re the same.”
Mac: “Where was that in?”
PC: “The, ehm . . . awesome . . . awesome Computer Review . . . Weekly . . . Journal. . . . ”


Figure 4.4a–c  Stills from various commercials in the Mac campaign (2007–2010) showing PC (left) and Mac (right).

The verbal modality alone makes the point of Mac’s superiority well enough for the commercial to work on the radio as well. The dialogue suggests that PC is impolite (reinforced by the visual information that he unceremoniously grabs the paper from Mac’s hands) and unrealistic (considering the review in the Computer Review Weekly Journal as on a par with a review by the star technology reviewer of the Wall Street Journal). By contrast Mac is polite and modest (“Just the Wall Street Journal”, “Just one man’s opinion”), and embarrassed that PC might read the glowing review of Mac (“Don’t read it”). But of course the visual appearance of the two men (Figure 4.4a–c) inevitably adds features to the personification in a way that invites further mappings. The PC man (left) in most commercials wears a suit or jacket with tie; the Mac man (right) is much more informally dressed. (As an aside, it is interesting that while Mac always speaks first, it is PC who invariably stands on the left. It is tempting to understand this spatial presentation in terms of the “given-new” distinction proposed by Kress and Van Leeuwen [2006, pp. 179–185].) Moreover, PC is Caucasian, with a conventional pair of glasses and haircut, and he is slightly overweight, while Mac is lithe and athletic, handsome, with a more Latinate appearance, and looks younger than PC. It is difficult to avoid mapping these positive and negative connotations to Mac and PC respectively, even though it is entirely our responsibility to infer them. Moreover, although Mac and PC are personified, the viewer will probably also interpret them metonymically: They are the typical users of the machines, and clearly all of us would rather be a cool Mac type than a sordid has-been PC type of user.

4.5  Coca-Cola

In the examples previously examined, the recurrent theme is that pictorial and multimodal metaphors in advertising often surreptitiously invite addressees to infer meanings that would be considered unacceptable when spelled out verbally, and therefore they deserve critical examination. But metaphors, in all modalities, can of course themselves also be used to make a critical comment, as Aristotle already realized: “And the source of the metaphor should be something beautiful; verbal beauty . . . is in the

sound or in the sense, and ugliness the same” (Aristotle, 1991, p. 225). That is, one can debase a product by turning it into a metaphorical target that is coupled with a source that evokes ‘ugly’ connotations. In a picture (not reproduced here for copyright reasons) found on a blog by Rob Le Pair4, we see a petrol pump in the form of a huge can of Coca-Cola, against the background of a motorway. This undoubtedly suggests a metaphor. I was not able to determine its original provenance, and this allows for some speculation. If the image (or the petrol pump, if it exists as a real object) is designed to promote Coca-Cola, then readers are most likely expected to construe the metaphor COCA-COLA IS PETROL, and interpret it along the lines: “humans need Coca-Cola just as a car needs petrol” (and note that if the petrol station should sell Coca-Cola there is, again, a metonymic link between target and source). In this case, the ad-makers expect benevolent viewers to suppress unwanted connotations such as ‘bad taste’. Since ‘taste’ is such a salient feature in the target domain, however, the subversive “Coca-Cola tastes like petrol” comes to mind fairly easily. It is thus also thinkable that this image was created by somebody critical of Coca-Cola. But even if the image is recognized as promoting Coca-Cola, it is problematic for other reasons as well. After all, somebody critical of consumer society’s dependence on petrol may map connotations not envisaged or intended by the makers, for example, “humans are as dependent on Coca-Cola as cars are on petrol” or even COCA-COLA DRINKERS ARE MACHINES. This example, then, is a reminder that different communities of viewers may routinely activate different mappings in a metaphor, particularly in the absence of context directing them toward the ‘correct’ interpretation.
The interpretation of a metaphor, as of any message, always depends on how a stimulus combines with an addressee’s knowledge, beliefs, and emotions: what Sperber and Wilson (1995, p. 38) call the addressee’s “cognitive environment” (see also Forceville, 1996, forthcoming). Similarly, Max Black (1962, p. 40) invokes the addressee’s “system of associated commonplaces”, pointing out that the Hobbesian “Man is a wolf” would undoubtedly be interpreted completely differently by people who believe that wolves are the incarnations of dead ancestors. Formulated more generally, using metaphors is always a risky business, since addressees can always cue unintended mappings, whether because they happen to misunderstand or because they deliberately subvert the sender’s message (as Donkey does with Shrek’s). For this reason, while metaphor is a trope beloved by advertisers, it is no less cherished by their critics.5

5.  CONCLUDING REMARKS

In this chapter I have aimed to show how an awareness of the functioning of metaphor provides an instrument for pinpointing how certain advertisements suggest dubious claims for their products, services, or brands without actually making them, or introduce or reinforce controversial assumptions

in their audiences. What is crucial is that the ground for these claims resides in the visual modality. This is important, because a fundamental distinction between the verbal and the visual modality is that only language (verbal language, that is) can make propositions, whether literal or metaphorical ones. Because of this property of language, verbal metaphors, particularly if they are given in the ready-made A IS B format, can be questioned, as happens in the dialogue between Shrek and Donkey about “Ogres are like onions”. By contrast, visuals can only show certain clusters or configurations of items; the interpretation of the relations between these items is not steered and constrained by a grammar (only language has a ‘grammar’ in the literal sense), although the intended interpretation of these relations is often much helped by the viewer’s awareness of the genre to which the picture belongs (for more discussion, see Forceville, 1999). The crucial point is that these clusters or configurations can be designed in such a way that the visuals, alone or in combination with other modalities, make sense only or primarily as part of a metaphor or metonymy, while their makers could simply deny that a metaphor was intended.

From a critical perspective, the model for analyzing pictorial and multimodal metaphors presented in this chapter can be used to investigate patterns that may manifest ideological meanings. Given that any patterns in pictorial and multimodal metaphor use are revealing in how advertisers channel our perception of the world, questions such as the following are worth investigating more systematically in corpus research:

• Given a certain source domain (e.g., art, magic, tree, furred animal, petrol pumps), with what different target domains is it linked? (Kövecses, 2010, calls this the “scope” of a metaphor.)
• Conversely, given a certain target (here, product category), with what different source domains is it linked? (Kövecses, 2010, calls this the “range” of metaphor; see Bounegru & Forceville, 2011; Forceville, 2000; Koller, 2009; Van Mulken et al., 2010.)
• What presumably intended metaphorical mappings are cued in nonverbal modalities, and does this choice of modality downplay possibly controversial mappings?

Conducting corpus research into the choice of metaphorical source domain in advertising directed at different audiences (men versus women, young people versus old people, blue collar versus white collar workers, Westerners versus Easterners, etc.) could feed into research on stereotyping (see also Tseng & Bateman, this volume). Finally, the model for analyzing pictorial and multimodal metaphors offered in this chapter constitutes an exhortation to examine which other classic tropes can be used nonverbally to create meaning. In as much as critically analyzing multimodal discourse depends on laying bare implicit meanings, developing an analytical framework for understanding certain

visual and multimodal phenomena as manifestations of tropes such as oxymoron, pictorial grouping, metonymy, or puns (see, e.g., Abed, 1994; Forceville, 2009; Teng, 2009; Teng & Sun, 2002) will be a powerful tool for researchers hoping to unveil ideologically dubious patterns in all areas of multimodal discourse. This holds more specifically for discourses pertaining to persuasive communication (advertising, cartoons, propaganda, documentaries, corporate websites).

ACKNOWLEDGMENT

The author wants to thank Pieter Manders for helping to retrieve the photograph of the Mazda billboard, Agey Benali for alerting the author to the Nivea advertisement, and the editors for their comments on an earlier version of this chapter.

NOTES

1. Strictly speaking this is a simile, but in CMT similes are understood as being conceptually processed in the same way as metaphors.
2. For a French version of the commercial, see, last accessed September 2012.
3. The commercial discussed can be watched, with others in the series, at http://, last accessed September 2012.
4., last accessed September 2012.
5. See, for example, Adbusters,, last accessed September 2012.

REFERENCES

Abed, F. (1994). Visual puns as interactive illustrations: Their effects on recognition memory. Metaphor and Symbolic Activity, 9, 45–60.
Aristotle. (1991). [4th c. BC]. On rhetoric. (G. A. Kennedy, Trans., Ed.). New York: Oxford University Press.
Barthes, R. (1986). The responsibility of forms (R. Howard, Trans.). Oxford: Blackwell. (Original work published 1964.)
Black, M. (1962). Metaphor. In Models and metaphors: Studies in language and philosophy (pp. 25–47). Ithaca, NY: Cornell University Press.
Black, M. (1979). More about metaphor. In A. Ortony (Ed.), Metaphor and thought (pp. 19–43). Cambridge: Cambridge University Press.
Bounegru, L., & Forceville, C. (2011). Metaphors in editorial cartoons representing the global financial crisis. Visual Communication, 10, 209–229.
Carroll, N. (1996). A note on film metaphor. In Theorizing the moving image (pp. 212–223). Cambridge: Cambridge University Press.
Cienki, A., & Müller, C. (Eds.). (2008). Metaphor and gesture. Amsterdam: John Benjamins.
Elleström, L. (Ed.). (2010). Media borders, multimodality and intermediality. Basingstoke: Palgrave Macmillan.
Forceville, C. (1996). Pictorial metaphor in advertising. London: Routledge.
Forceville, C. (1999). Art or ad?: The influence of genre-attribution on the interpretation of images. SPIEL, 18, 279–300.
Forceville, C. (2000). Compasses, beauty queens and other PCs: Pictorial metaphors in computer advertisements. Hermes, Journal of Linguistics, 24, 31–55.
Forceville, C. (2006). Non-verbal and multimodal metaphor in a cognitivist framework: Agendas for research. In G. Kristiansen, M. Achard, R. Dirven, & F. Ruiz de Mendoza Ibáñez (Eds.), Cognitive linguistics: Current applications and future perspectives (pp. 379–402). Berlin: Mouton de Gruyter.
Forceville, C. (2007). Multimodal metaphor in ten Dutch TV commercials. Public Journal of Semiotics, 1, 19–51. Retrieved from–1.swf.
Forceville, C. (2008). Pictorial and multimodal metaphor in commercials. In E. F. McQuarrie & B. J. Phillips (Eds.), Go figure! New directions in advertising rhetoric (pp. 272–310). Armonk, NY: ME Sharpe.
Forceville, C. (2009). Metonymy in visual and audiovisual discourse. In E. Ventola & A. J. Moya-Guijarro (Eds.), The world told and the world shown: Issues in multisemiotics (pp. 56–74). Basingstoke: Palgrave Macmillan.
Forceville, C. (forthcoming). Relevance theory as model for pictorial and multimodal communication. In D. Machin (Ed.), Visual communication. Berlin: Mouton de Gruyter.
Forceville, C., & Urios-Aparisi, E. (Eds.). (2009). Multimodal metaphor. Berlin: Mouton de Gruyter.
Gibbs, R. W., Jr. (Ed.). (2008). The Cambridge handbook of metaphor and thought. Cambridge: Cambridge University Press.
Gibbs, R. W., Jr., & Steen, G. J. (Eds.). (1999). Metaphor in cognitive linguistics. Amsterdam: John Benjamins.
Jewitt, C. (Ed.). (2009). The Routledge handbook of multimodal analysis. London: Routledge.
Koller, V. (2009). Brand images: Multimodal metaphor in corporate branding messages. In C. Forceville & E. Urios-Aparisi (Eds.), Multimodal metaphor (pp. 45–71). Berlin: Mouton de Gruyter.
Kövecses, Z. (2005). Metaphor in culture: Universality and variation. Cambridge: Cambridge University Press.
Kövecses, Z. (2010). Metaphor: A practical introduction (2nd ed.). Oxford: Oxford University Press.
Kress, G. (2010). Multimodality: A social semiotic approach to contemporary communication. London: Routledge.
Kress, G., & Van Leeuwen, T. (2001). Multimodal discourse: The modes and media of contemporary communication. London: Arnold.
Kress, G., & Van Leeuwen, T. (2006). Reading images: The grammar of visual design (2nd ed.). London: Routledge.
Lakoff, G., & Johnson, M. (1980). Metaphors we live by. Chicago, IL: University of Chicago Press.
Lakoff, G., & Turner, M. (1989). More than cool reason: A field guide to poetic metaphor. Chicago, IL: University of Chicago Press.
Mittelberg, I., & Waugh, L. R. (2009). Metonymy first, metaphor second: A cognitive-semiotic approach to multimodal figures of thought in co-speech gesture. In C. Forceville & E. Urios-Aparisi (Eds.), Multimodal metaphor (pp. 329–356). Berlin: Mouton de Gruyter.
Müller, C. (2008). Metaphors dead and alive, sleeping and waking: A dynamic view. Chicago, IL: University of Chicago Press.
Musolff, A. (2006). Metaphor scenarios in public discourse. Metaphor and Symbol, 21, 23–38.
Ortony, A. (Ed.). (1979). Metaphor and thought. Cambridge: Cambridge University Press.
Pateman, T. (1983). How is understanding an advertisement possible? In H. Davis & P. Walton (Eds.), Language, image, media (pp. 187–204). Oxford: Blackwell.
Sperber, D., & Wilson, D. (1995). Relevance: Communication and cognition (2nd ed.). Oxford: Blackwell.
Teng, N. Y. (2009). Image alignment in multimodal metaphor. In C. Forceville & E. Urios-Aparisi (Eds.), Multimodal metaphor (pp. 197–211). Berlin: Mouton de Gruyter.
Teng, N. Y., & Sun, S. (2002). Grouping, simile, and oxymoron in pictures: A design-based cognitive approach. Metaphor and Symbol, 17, 295–316.
Van Mulken, M., Le Pair, R., & Forceville, C. (2010). The impact of perceived complexity, deviation and comprehension on the appreciation of visual metaphor in advertising across three European countries. Journal of Pragmatics, 42, 3418–3430.
Whittock, T. (1990). Metaphor and film. Cambridge: Cambridge University Press.


Japanese Street Fashion for Young People: A Multimodal Digital Humanities Approach for Identifying Sociocultural Patterns and Trends

Alexey Podlasov and Kay L. O’Halloran

1.  INTRODUCTION

Digital technology has changed the ways research is undertaken in universities (Berry, 2012a) and this impact extends to humanities subjects that have traditionally focused on close (pencil-and-paper type) analysis of exemplar texts. Today, there are new computational and visualization techniques permitting far and close readings of massive repositories of cultural data that are now freely available. These advances are at the heart of the ‘digital humanities’ research paradigm (Berry, 2011, 2012b; Hall, 2011), which brings together computer science, humanities, arts, and social science researchers in new institutional structures (e.g., specialized research institutes) designed to support such interdisciplinary collaborations. Digital humanities is currently organized into disparate subfields, ranging from the creation of data archives, automated analysis, and visualization of large data sets of images, texts, and videos, to critical approaches such as software studies, which is informed by cultural and media studies (e.g., Berry, 2011; Fuller, 2008; Hall, 2011; Schreibman, Siemens, & Unsworth, 2005). Nonetheless, the use of computational techniques for revealing sociocultural patterns in large data sets has steadily gained currency, as evidenced by recent grant programs, such as the Digging Into Data Challenge in 2009–2011, administered by the Office of Digital Humanities at the National Endowment for the Humanities in the United States, and sponsored by major research councils in the United States, the United Kingdom, and Canada.1 Within the field of digital humanities, Svensson (2010) identifies a shift from earlier ‘computing humanities’ to ‘multimodal humanities’, which uses computational tools and databases while “leveraging the potential of the visual and aural media”. This paper is an attempt to bring together digital humanities and multimodal analysis for critical readings, both far and close, of popular culture. The approach is outlined below.

Multimodal analysis extends the study of language to the study of other resources (e.g., images, embodied action, and audio resources), which may be combined with language in multimodal texts (e.g., documents, videos, and hypermedia texts), objects (e.g., artworks, clothes, and architecture),

and events (e.g., performances, debates, and so forth). Multimodal research is rapidly evolving in language-related fields of study (e.g., Jewitt, 2009; O’Halloran, 2011) in part due to the need to account for changes brought about by the seamless integration of language, images, videos, and other resources in digital media. Much of this research stems from Kress and Van Leeuwen’s (2006 [1996]) work on visual design and O’Toole’s (2011 [1994]) approach to displayed art in the 1990s. These foundational frameworks are based on Michael Halliday’s social semiotic approach to language (e.g., Halliday, 1978; Halliday & Matthiessen, 2004) and model resources for two- and three-dimensional visual design and displayed art as systems of interrelated choices. From here, multimodal analysis has been extended to investigate the interaction of semiotic resources in different domains such as mathematics, three-dimensional space, hypertext, film, and children’s picture books (see Jewitt, 2009; O’Halloran & Smith, 2011).

In this chapter, we present a multimodal digital humanities approach which combines (a) the automated analysis of photographs and data visualization techniques using self-organizing maps (SOM) and topology learning algorithms, and (b) multimodal social semiotic analysis of images as a methodology for mapping sociocultural patterns and trends in popular culture, in this case for Japanese street fashion. In this way, we demonstrate how different levels of analysis for multimodal texts, context, and culture can be integrated using digital technology and multimodal theory.

2.  FASHION AS A CONCEPT

Fashion, the perceived added value attached to clothing and accessories that extends beyond the material artefact, has received varying definitions and interpretations across different time frames and domains. Nevertheless, there is a common understanding that “the definite essence of fashion is change” (Kawamura, 2005, p. 5).
Different explanations have been offered for ‘fashion as change’: for example, the economic imperative of capitalism, which requires ongoing consumption (Kawamura, 2005); an ambivalence between adornment on the one hand, and modesty on the other (e.g., Davis, 1992); and our affinity and receptiveness to novel things (e.g., Barthes, 1967; Koenig, 1973). Barthes (1967) explains fashion as “the new”:

Fashion doubtless belongs to all the phenomena of neomania which probably appeared in our civilization with the birth of capitalism: in an entirely institutional manner, the new is a purchased value. But in our society, what is new in Fashion seems to have a well-defined anthropological function, one which derives from its ambiguity: simultaneously unpredictable and systematic, regular and unknown [emphasis added]. (p. 300)

Barthes’ description of fashion is compatible with a social semiotic approach to language, image, and other resources, which includes clothes and accessories (Owyong, 2009). That is, semiotic resources are conceptualized as interrelated systems of choices that together constitute the ‘semiosphere’, the semiotic space of culture (Lotman, 2005). Within this semiotic space, there is invariably both stability and change over time. Moreover, clothing and accessory choices deemed ‘fashionable’ are accorded values, both positive and negative. That is, while fashion is valued and sought after, it is simultaneously denigrated due to its futility and artificiality, qualities considered taboo in utilitarian societies (Baudrillard, 1968). However, the “transgressive nature of fashion, to create meanings and forms other than those of economic and social progress” (Slade, 2009, p. 11) evokes a certain fascination, particularly as fashion reorders the past and “the forms that constitute milestones in it” (Slade, 2009, p. 12). The role of fashion in reordering the past makes Japanese street fashion a particularly interesting case study, given that Japanese society and identity have long been grounded in tradition (e.g., Slade, 2009). In this study, street fashion is defined as commercial fashions with “a hard urban edge to them” which are inspired by fashion magazines, but retain “a bottom-up, rather than a top-down, mechanism of gaining popularity” (Keet, 2007, p. 110). Indeed, as we discuss in the next section, Japanese street fashion was a site for the interaction between traditional Japanese culture and outside influences from the West, from which evolved a range of distinctive Japanese street fashion styles.
However, how can fashion trends in Japan (and elsewhere) be analyzed and tracked over time, particularly in a world where fashion trends change quickly, due to the “democratization of fashion in the twenty-first century” enabled by the Internet and other digital technologies (Lynch & Strauss, 2010, p. 1)? Prior to presenting the analytical approach we have developed to address these questions, we provide an overview of Japanese street fashion.

3.  JAPANESE STREET FASHION

Japan was introduced to Western clothes (yofuku in Japanese) during the Meiji restoration in the latter half of the nineteenth century, and the suit, for example, had become standard office attire by the 1920s (Keet, 2007). From there, Japanese street style emerged under the influence of Western consumer culture, in particular American culture, after the Second World War. By the 1960s, the national dress in Japan had become suits and jeans (Keet, 2007; Steele, 2010). From here, young people in Japan forged their own street style in the 1970s, “mixing elements from western and Japanese culture” according to their own ideas about “what looks good” (Steele, 2010, p. 230). In this respect, Steele (2010) explains that deviant styles in Japan differed from the West because they were aligned with “urban self-display” rather than

74  Alexey Podlasov and Kay L. O’Halloran political and social frustration of the hippy and punk movements (p. 230). Moreover, there was a freedom to experiment with Western clothes in ways which did not exist in the West itself, due to the dissociation of Western clothes from social class in Japan. Japan’s attitude to Western clothes has been unfettered by the accompanying rules of class and status that clothes in Europe have been soaking in for hundreds of years (as indeed has Japan’s own system of indigenous dress). This, coupled with the country’s rapid postwar modernization into a hyperconsumerist society, has led to an evolution of yofuku that sometimes looks nothing less than spectacular to the eyes of the Westerner. (Keet, 2007, p. 8) Local and foreign fashion brands flourished in Japan in the 1980s as the economy rapidly grew. In particular, domestic Japanese brands became popular because they were more affordable. After the economy bubble burst in Japan in 1991, consumerism continued but now high fashion labels were sought and youth culture became fragmented into smaller groups. During this time, subcultures lost faith in American culture and developed their own styles; in some cases, the street style resembled a stage costume rather than everyday clothes, where the core experience was “to display oneself in the city” (Steele, 2010, p. 245). Today, Japanese designers have a relatively low international profile, yet Tokyo is the only Asian city that can be considered to be a fashion capital (Keet, 2007). For this reason, Tokyo street fashion has been chosen as the site for this study, particularly as it is seen to be “playing the role of designer: distilling down a plethora of fashion influences from home and abroad to come up with a number of unique looks” (Keet, 2007, p. 8). 
The aim of this study is not to identify the different styles and subcultures to which individuals belong, for example, Lolita (cute), Gyaru (girl-glam), Visual kei (rock-style), Bōsōzoku (manga-style), or Mori girl (doll-like) (Keet, 2007; Steele, 2010). Rather, the aim is to arrange Japanese fashion samples into groups according to what is actually being worn in order to see overall patterns and trends in Tokyo street fashions. This automatic grouping is achieved using various analytical techniques, which include automated image processing, visualization algorithms, and multimodal theory. After introducing the data set, methodology, and some preliminary findings, the potential and limitations of the approach are described. 4.  THE DATA SET The data set consists of 2,249 images with a full frontal view of a person that is the fashion sample. The images are automatically captured from,2 a Japanese street fashion portal run by the Japanese Fashion

Association3, which collects street fashion photographs on a daily basis from different districts of Tokyo. A typical fashion sample found on this website is represented by the image on the left in Figure 5.1(i). The website portal records various metadata along with each image, such as the brand names of the clothes, the hair style, the accessories and pattern tags, and even personalized information about the person’s age, occupation, music preferences, and so forth. The site also provides an interface to filter the collection using the available metadata. This filtering interface enables users to define metadata-based filtering criteria and access subsets of the collection, but it does not provide a holistic overview of all the street fashion photographs. This limitation makes


Figure 5.1  Feature calculation (i) and back projection into self-organizing map (SOM) (ii)

76  Alexey Podlasov and Kay L. O’Halloran it difficult (if not impossible) to answer questions concerning trends in the collection, where ‘trends’ are frequently repeated designs that are similar to each other. For example, we may ask: What are the fashion trends? What designs are sufficiently different from the repeated patterns in order to classify them as outliers? How do properties of the collection change over time? How is one collection different and/or similar to another collection? The collections may not necessarily come from the same source because the entire data set can be arranged according to the metadata; for example, different districts of Tokyo and age groups may be compared. The ability to explore such questions supports the goals of critical multimodal discourse analysis, which aims to understand social practices, in this case fashion trends, and how they are linked to larger social, economic, and political agendas. In view of these considerations, we propose a data-driven methodology for visualization and analysis of large fashion image collections. When proposing the technique we understand that the human analyst cannot be taken out of the loop completely since the domain of fashion is subjective by nature. However, certain computationally accessible properties of the data can be investigated and visualized. Since we know that fashion changes over time, there must be patterns in the collection of fashion images, and these patterns are defined by the visual data itself. 5.  MODELLING THE FASHION DATA Firstly, numerical features of photographs need to be defined in relation to the concept of fashion, and then these values need to be calculated using the raw images in the data set, resulting in the feature vector that expresses the desired properties of the data. 
Although many terms used by fashion analysts are too vague or ambiguous to be numerically represented and identified, there is one feature that is obviously important in the fashion domain and that can be confidently calculated from the visual data: colour. A trivial approach would be to calculate the average colour over the whole image and then use it for further processing. However, the results obtained using such modelling are unlikely to be useful for interpreting the meaning of fashion trends. For this reason, we propose a more sophisticated method for analyzing the data set. The feature calculation process is illustrated in Figure 5.1(i) and explained in detail below:

(a) The source image is cropped to contain only fashion-related pixels. The photograph is cropped to eliminate the influence of the background pixels on the final colour value so that only ‘fashion pixels’ contribute to the calculation. This step is facilitated by the almost perfect alignment of the images in the database (i.e., the images are comparable in terms of the size and position of the person), allowing the region of interest (ROI) to be fixed and the results to be manually corrected.

(b)–(c) The fashion area is separated into top and bottom segments. The photograph is divided into top and bottom segments and the average colours of the ROIs are calculated separately. This step provides further modelling precision, since fashion designs tend to have different top and bottom elements, each of which is likely to be vertically symmetrical. Besides, averaging the top and bottom sections (according to the vertical symmetry of each section) helps overcome errors caused by misalignment of the ROI of different samples.

(d)–(e) The colour values for the top and bottom areas are averaged to form a six-dimensional feature vector. In the final stage, every image of the collection is represented by a six-dimensional vector with values representing the red, green, and blue colour components for the top and bottom parts of the ROI in the images.

6.  THE FASHION MAP

After the feature vectors for all images in the data set have been calculated, appropriate computational algorithms can be applied to produce visualizations of the data. We propose using a self-organizing map (SOM) (Kohonen, 2000) as an algorithm that can be applied to reveal underlying patterns that otherwise would not be detectable. SOM is an artificial neural network widely used for unsupervised modelling of multidimensional data. Like many other machine learning algorithms, it needs to be trained using the available data samples before being applied to the data set. After being trained, the resulting network is used to situate new data samples and visualize the structure of the network itself, making it possible to analyze the properties of the data it represents. SOM organizes the data in the form of a two-dimensional map, which is easy to interpret visually.
Moreover, SOM provides a holistic view of the complete data set and its parts, and can be implemented in an interactive way that facilitates navigation through the data set, as well as provides immediate access to individual samples. In what follows, we briefly describe the principles of SOM training, usage, and interpretation as they are applied in this study (for a different approach, see Kohonen, 2000). We first describe our definition of SOM and the analytical procedure for training the algorithm for the fashion collection data set. The description, although brief, is necessarily mathematical.
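Before the formal definition, the six-dimensional colour feature of Section 5 can be made concrete with a minimal sketch. It is an illustration under stated assumptions, not the authors' implementation: the image is assumed to be a nested list of RGB tuples, the `roi` rectangle and the split at half height are hypothetical choices, and the symmetry-based averaging mentioned in steps (b)–(c) is not reproduced.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]      # (red, green, blue), each 0..255
Image = List[List[Pixel]]         # rows of pixels, top row first


def mean_colour(rows: Image) -> Tuple[float, float, float]:
    """Average the red, green and blue channels over all pixels in `rows`."""
    count = sum(len(row) for row in rows)
    totals = [0, 0, 0]
    for row in rows:
        for pixel in row:
            for channel in range(3):
                totals[channel] += pixel[channel]
    return tuple(total / count for total in totals)


def fashion_feature(image: Image, roi: Tuple[int, int, int, int]) -> List[float]:
    """Return [R_top, G_top, B_top, R_bottom, G_bottom, B_bottom].

    `roi` = (left, upper, right, lower) is a hypothetical fixed region of
    interest; splitting it at half height is an assumption of this sketch.
    """
    left, upper, right, lower = roi
    # Step (a): keep only the 'fashion pixels' inside the fixed ROI.
    cropped = [row[left:right] for row in image[upper:lower]]
    # Steps (b)-(c): separate the ROI into top and bottom segments.
    middle = len(cropped) // 2
    top, bottom = cropped[:middle], cropped[middle:]
    # Steps (d)-(e): average each segment and concatenate the two colours.
    return list(mean_colour(top)) + list(mean_colour(bottom))


# Toy 4x4 image: green background columns around a white top and black bottom.
bg, white, black = (10, 200, 10), (255, 255, 255), (0, 0, 0)
toy = [[bg, white, white, bg],
       [bg, white, white, bg],
       [bg, black, black, bg],
       [bg, black, black, bg]]
feature = fashion_feature(toy, roi=(1, 0, 3, 4))
```

Because the background columns lie outside the ROI, they do not influence the six values, which is the purpose of the cropping step.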

In this study, we define SOM as a two-dimensional grid of $W \times H$ $N$-dimensional vectors. Here, $W$ and $H$ are, respectively, the width and height of the grid and $N$ is the dimensionality of the data. In the following, $N$ is equal to 6 (the average red, green, and blue values for the top and bottom segments of the analyzed images, as described above). The vectors $m_{i,j} = (m^1_{i,j}, \ldots, m^N_{i,j})$ forming the grid are referred to as the model vectors, where $(i, j)$ is the position in the grid, $i = 1, \ldots, W$ is the column number and $j = 1, \ldots, H$ is the row number. Model vectors of the grid serve as abstract models of any data vector of the same dimensionality $N$. Thus, for a given SOM, for any vector $a = (a^1, \ldots, a^N)$ one can find a model vector $m_{k,l}$ such that $d(a, m_{k,l}) \le d(a, m_{i,j})$ for any $i = 1, \ldots, W$, $j = 1, \ldots, H$, where $d$ is a vector distance function. In other words, for any vector $a$ of the same dimensionality $N$ as the SOM model vectors, one can find a model vector on the map that is most similar to $a$ (in terms of distance). This model vector is called the best match model vector for vector $a$, and its position $(k, l)$ is the best match position. In our application we use the Euclidean distance as the distance function $d$. The process of finding a best match model vector is referred to as the mapping of the vector onto the SOM. The underlying idea is that the map models the data set, which is usually complex and big, with a limited number of model vectors arranged as a grid. Model vectors represent clusters of data vectors, and the distance function, which is a measure of ‘similarity’, is used to find the best representative and measure the quality of the representation.

Let us outline the way SOM learns the data set. Assume we have a data set $A = \{a_i\}$, $i = 1, \ldots, K$, which is a set of $K$ $N$-dimensional vectors $a_i = (a^1_i, \ldots, a^N_i)$. This set is called a training set, which in our case is a set of feature vectors produced by the collection of fashion images.
Then we apply the following steps:

1. Initialize the model vectors $m_{i,j} = (m^1_{i,j}, \ldots, m^N_{i,j})$ randomly.
2. For every feature vector $a_i$ find its best match model vector $m_{k,l}$, at position $(k, l)$ in the map.
3. Update the model vector $m_{k,l}$ and the other model vectors within radius $r$ of its position $(k, l)$ according to $m^*_{k,l} = m_{k,l} + \Theta(r)\,\alpha\,(a_i - m_{k,l})$, where $\Theta(r)$, called the neighbourhood function, restrains the update according to distance on the grid, and $\alpha$ is the learning coefficient.
4. Repeat Step 2 and Step 3 for all feature vectors $a_i$ in the training set.
5. Decrease the neighbourhood distance $r$.
6. Decrease the learning coefficient $\alpha$.
7. Randomly rearrange the vectors in the training set $A$.
8. Repeat from Step 2, until $\alpha$ and $r$ converge to zero and training effectively stops.

In general terms, we initialize the SOM model vectors randomly at Step 1, then Step 2 and Step 3 are repeated for all vectors in the training set. At Step 2, we find a best matching model on the map for a given feature vector (i.e., a fashion sample represented as a feature vector). Note that even if the model vectors are randomly initialized (and thus are far from being good models), the best match can always be found. At Step 3 we update the best matching model vector and its neighbours within radius $r$ in the grid to be ‘more like’ the training vector. The degree of this update is controlled by the multiplier $\Theta(r)\alpha$. If $\Theta(r)\alpha = 0$, the value of the model vector will not change. If $0 < \Theta(r)\alpha < 1$, the best match and its neighbours will become ‘more like’ the sample from the training set. The parameters are controlled at Step 5 and Step 6 in such a way that their influence gradually decreases during training from a relatively high value of 0.8 toward 0. Therefore, initially, when the model values are random, Step 3 significantly affects the model vectors by bringing their values closer to the values in the training set. With a decrease in the parameter, the model vectors are less affected by the training values and they stabilize around certain values. The training process results in values of the model vectors such that: (1) the model vectors best represent clusters in the training set, and (2) model vectors with values close in terms of the $N$-dimensional feature space are positioned close to each other on the grid. In the context of this work, we may say that SOM is trained in such a way that similar fashion designs will have their best match positions on the map coinciding or close to each other.

7.  VISUALIZING THE FASHION TREND

The simplest way to visualize SOM is to back-project the training data. This is achieved by finding the best matching position for every training sample and rendering the original image in that position. Figure 5.1(ii) gives an overview of SOM with back-projected training images. One can easily see from the map that similar designs are positioned close to each other.
For example, in Figure 5.1(ii), the top-left corner of the map is occupied by ‘light top-light bottom’ designs (see a close shot in Figure 5.2[i]); ‘dark top-dark bottom’ designs are positioned in the bottom-right corner region; ‘light top-dark bottom’ designs appear in the bottom-left region; and the ‘dark top-light bottom’ designs are positioned in the top-right toward the top-centre of the map. Besides that, there are ‘red top-dark bottom’ clusters in the centre and various clusters of bluish and brownish designs, in addition to areas of less distinguishable samples. This map can be interpreted as an overview of the fashion space and conclusions in terms of general-to-particular relationships can be drawn. For instance, we can conclude that in the given fashion collection, white-black (grey) combinations tend to prevail over colourful designs where colour clusters (if they exist) are small in size; the predominant colour for different clothing items is black; and designs of distinct green are almost absent from the data set.
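The training steps of Section 6 and the back-projection used to produce this map can be sketched as follows. This is a minimal illustration rather than the implementation used in the study: the linear decay schedule, the initial radius, the form of the neighbourhood function `theta`, and the function names are all assumptions; only the Euclidean best match, the update rule of Step 3, and the starting coefficient of 0.8 come from the text.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Grid = List[List[Vector]]   # grid[i][j] is the model vector in column i, row j


def best_match(grid: Grid, a: Sequence[float]) -> Tuple[int, int]:
    """Return the position (k, l) of the model vector nearest to `a` (Euclidean)."""
    cells = [(i, j) for i in range(len(grid)) for j in range(len(grid[0]))]
    return min(cells, key=lambda c: math.dist(a, grid[c[0]][c[1]]))


def train_som(data, width, height, epochs=30, alpha0=0.8, seed=0) -> Grid:
    """Train a width x height map on `data` following Steps 1-8."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Step 1: random initial model vectors in the 0..255 colour range.
    grid = [[[rng.uniform(0, 255) for _ in range(dim)]
             for _ in range(height)] for _ in range(width)]
    samples = [list(a) for a in data]
    radius0 = max(width, height) / 2          # assumed starting radius
    for epoch in range(epochs):               # Step 8: repeat until training stops
        # Steps 5-6: shrink r and alpha (linear decay is an assumption).
        decay = 1 - epoch / epochs
        alpha, radius = alpha0 * decay, radius0 * decay
        rng.shuffle(samples)                  # Step 7: rearrange the training set
        for a in samples:                     # Steps 2-4: one pass over the data
            k, l = best_match(grid, a)
            for i in range(width):
                for j in range(height):
                    d = math.hypot(i - k, j - l)
                    if d > radius:
                        continue              # outside the neighbourhood
                    theta = 1 - d / (radius + 1)   # assumed neighbourhood function
                    m = grid[i][j]
                    for n in range(dim):
                        # Step 3: m* = m + theta * alpha * (a - m)
                        m[n] += theta * alpha * (a[n] - m[n])
    return grid


def back_project(grid: Grid, data) -> List[Tuple[int, int]]:
    """Best match position of every sample, i.e. where its image is rendered."""
    return [best_match(grid, a) for a in data]


# Toy data: 'light top-light bottom' and 'dark top-dark bottom' feature vectors.
light = [240.0, 240.0, 240.0, 235.0, 235.0, 235.0]
dark = [15.0, 15.0, 15.0, 20.0, 20.0, 20.0]
som = train_som([light] * 10 + [dark] * 10, width=4, height=4)
positions = back_project(som, [light, dark])
```

Here `positions` holds one grid cell per sample; rendering each original image at its cell gives a map of the kind shown in Figure 5.1(ii).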



Figure 5.2  Section of self-organizing map (SOM) (i) and complete feature space (ii)

8.  COMPARING COLLECTIONS

The proposed approach makes it possible to compare fashion collections by mapping one collection onto the map that has been trained using another collection. The problem with this approach is that this mapping can be inadequate, since some samples (or even clusters) that are important in one collection may be completely absent in another. This may lead to a situation in which, for example, black designs are compared to a collection where only white samples exist. The best matching white design will be found, but the match will obviously be far from satisfactory. This problem may be solved by training the map using the combined set of samples and then mapping the collections separately. In the feature calculation approach developed in this study, any collection may be compared to what is hypothetically possible for the fashion designs in the feature space; that is, the collection (the actual) may be mapped against the complete fashion space (the potential). In order to do this, the SOM must be trained in relation to all possible feature vectors. Strictly speaking, a six-dimensional feature space in which the vector components are integers varying from 0 to 255 may generate $256^6$ (about $2.8 \times 10^{14}$) unique vectors. This value is too high to complete the training in reasonable time and we propose a way to significantly reduce it. The complete colour space contains many more colours than the human eye can discern. Therefore, we can limit the colour space to the number of colours that the human eye can distinguish and calculate feature vectors for their combinations of two (i.e., for the top and the bottom). If we consider $N$ colours in the palette, then only $N^2$ feature vectors will represent the whole colour space. In our experiment, we generate a colour palette using the semantic colour names defined in the Tcl/Tk programming language.4 This palette consists of 501 colour names and generates a feature space of $501^2 = 251{,}001$ unique feature vectors. We trained the SOM using this generated colour space and then mapped the collection of Japanese fashion images into it.
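The size reduction described here can be checked with a short sketch. The three-colour palette below is a stand-in, not the Tcl/Tk palette itself, and `palette_feature_space` and `empty_cells` are hypothetical helper names; the sketch only illustrates how pairing palette colours for top and bottom yields the potential feature space, and how unoccupied map positions could be listed once a collection has been mapped.

```python
from itertools import product
from typing import List, Sequence, Tuple

Colour = Tuple[int, int, int]


def palette_feature_space(palette: Sequence[Colour]) -> List[List[int]]:
    """Every (top colour, bottom colour) pair as a six-dimensional vector."""
    return [list(top) + list(bottom) for top, bottom in product(palette, repeat=2)]


def empty_cells(positions, width: int, height: int) -> List[Tuple[int, int]]:
    """Map positions to which no sample of the collection was mapped."""
    occupied = set(positions)
    return [(i, j) for i in range(width) for j in range(height)
            if (i, j) not in occupied]


# Each of the six components takes one of the 256 integer values 0..255.
full_space_size = 256 ** 6
# Pairing 501 palette colours for top and bottom gives far fewer vectors.
reduced_space_size = 501 ** 2

# Stand-in palette (assumption): white, black and red only.
toy_palette = [(255, 255, 255), (0, 0, 0), (255, 0, 0)]
toy_space = palette_feature_space(toy_palette)
gaps = empty_cells([(0, 0), (0, 0), (1, 1)], width=2, height=2)
```

With the toy palette, `toy_space` contains nine vectors, one per top and bottom combination, and `gaps` lists the two positions of a 2 × 2 map that received no sample.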
The resulting mapping of the Japanese data set against all possible distinguishable-colour combinations is presented in Figure 5.2(ii). The visualization of the complete colour space in Figure 5.2(ii) reveals that the Japanese fashion collection does not cover the approximated potential feature space. On the contrary, the majority of designs fall into several clusters, with a few outliers sparsely distributed over the map. Large areas of the map are empty, which means there are no samples in the Japanese fashion collection with similar colour properties. This map then provides visual evidence for a designer who wishes to fill a gap in the market, and the model vector of the SOM has the necessary information about the samples of the map one needs to access. We do not visualize these values in Figure 5.2(ii) for the sake of clarity in the image. 9.  MAPPING OVER TIME The approach described above can be used to compare parts of the same collection and to visualize the evolution of the collection over time, since fashion images have a time stamp associated with them and/or may be




Figure 5.3  Dynamic mapping in real time (panels i–iii)

arranged as a temporal sequence. The images of the collection, which are mapped one by one, form frames of a video sequence. A fade-out effect is applied to the rendered images to gradually remove those images that no longer belong to the current time frame. Two sample frames from the resulting video sequence are given in Figure 5.3(i)–(ii), which displays fashion trends from spring 2008 to autumn 2010. The dynamic visualization reveals that these Japanese fashion trends are not sequential transitions from one type of design to another, but develop simultaneously, since individual fashion samples fall into distant places of the map and not into dense clusters. At the same time, the video sequence shows a gradual shift from lighter designs to darker designs, which may be attributed to seasonal changes in Japan and outside fashion influences.

10.  REAL-TIME MAPPING

As pointed out earlier, one can map any arbitrary fashion sample on the trained map, even if this sample has not been used to train the map. In view of this, we have implemented a real-time fashion mapping system, where visitors can map their current outfit into the Japanese street fashion map in real time using a web camera. The principles of the system are outlined in Figure 5.3(iii). At step (a) the camera captures a person; at step (b) the feature vector is calculated in real time and the original camera frame is rendered over the fashion map. This installation visualizes the person’s position in the Japanese street fashion space in real time and interactively, building a live connection between the individual and the fashion space.5

11.  MAPPING THE TOPOLOGY

During the learning process, SOM maps the multidimensional space (the feature space) into a two-dimensional space (the map).
The two-dimensional map provides a visualization that is intuitively easy to interpret due to the strong analogy with geographical maps and the idea that close points on the map tend to be similar to each other. However, a continuous mapping from a multidimensional space into a two-dimensional space is not always (in fact, is rarely) possible. For example, points on the surface of a sphere may be close to each other in three-dimensional space, yet these cannot be mapped onto a flat map so that this closeness is preserved. This is evident in geographical maps, which can distort real distances, and this distortion can be very high (e.g., the polar regions in the standard Mercator projection map). SOM faces similar challenges; such distortions are in fact even harder to predict and locate because the projection error usually only grows when the dimensionality of the original space is increased. For a better interpretation of the fashion space, one needs to visualize the distortions resulting from dimensionality reduction.



Figure 5.4  Density of self-organizing map (SOM) (i) and accompanying networks (ii)

One of the simple ways to achieve this task is to visualize the multidimensional ‘sparseness’ of neighbouring feature vectors of the map. In our implementation, every model vector on the map has eight neighbouring model vectors in the grid, so we can define sparseness as maximal distance between the neighbours. The map and its sparseness are displayed in Figure 5.4(i), where brighter gray values are used for higher and darker gray for lower sparseness values. One can notice that ‘light top-light bottom’ and ‘black top-black bottom’ regions of SOM (i.e., the top-left and the bottom-right corners of the map, respectively) are more uniform because the sparseness values of the corresponding areas are lower and therefore darker. Bright clusters in the centre of the sparseness map indicate that colourful designs, while being close on the map, are more dissimilar, or involve more varied combinations, in terms of the fashion space. 12.  TOPOLOGY AS NETWORKS FOR MULTIMODAL ANALYSIS The previously mentioned approach still does not provide complete understanding of the topology of the original space. For example, now we know that colourful clusters in the centre of the map in Figure 5.4(i), though being

Japanese Street Fashion for Young People  85 close on the map, are not that similar to each other in terms of distance in feature space. Then, which designs in the map are the most similar designs to these clusters? Are there designs that are ‘in between’ two particular clusters? Questions like that can be addressed by using topology-learning algorithms, which as we shall illustrate, provide a bridge to the multimodal analysis of the fashion samples extending beyond colour. The algorithm we propose for visualizing the topology of the Japanese street fashion space is known as Growing Neural Gas (GNG) (Martinetz & Schulten, 1994). This algorithm, like SOM, is a machine-learning algorithm consisting of two stages: training and interpretation. During training, the algorithm tries to approximate the structure of the training set (a set of feature vectors representing fashion samples) as a network of nodes and edges, where nodes approximate clusters in the training set, and edges (connecting lines) are defined between nodes in case there are enough data samples ‘between’ the clusters. The sample is considered to be ‘in between’ two nodes if these nodes are its first and second best matches. Once the network is constructed, it is visualized using any graph-drawing algorithm. The number of nodes is the parameter controlling the precision of the approximation. Note that position of the node on the visualization surface has no meaning since the drawing algorithm defines it only for convenience of viewing—the structure of the network is what matters. The topology learned by GNG is illustrated in Figure 5.4(ii). The network diagram on the left presents the topology approximated using 18 nodes (clusters). The thickness of the node frame corresponds to the number of data points in the cluster and the thickness of the edge corresponds to the number of designs ‘in between’ two clusters or, more precisely, cluster groups. 
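The ‘in between’ criterion that defines edges and their thickness can be illustrated with a small sketch. It is not an implementation of the full topology-learning algorithm, which also places and adds nodes during training; here the node positions are assumed to be given, the function name `edge_counts` is hypothetical, and two-dimensional points stand in for the six-dimensional feature vectors.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def edge_counts(nodes: Sequence[Sequence[float]],
                samples: Sequence[Sequence[float]]) -> Counter:
    """Count, for each pair of nodes, the samples lying 'in between' them.

    A sample is 'in between' two nodes when they are its first and second
    best matches in terms of Euclidean distance.
    """
    counts: Counter = Counter()
    for a in samples:
        ranked = sorted(range(len(nodes)), key=lambda n: math.dist(a, nodes[n]))
        first, second = ranked[0], ranked[1]
        edge: Tuple[int, int] = tuple(sorted((first, second)))
        counts[edge] += 1
    return counts


# Three nodes on a line and three samples between them (toy values).
toy_nodes = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
toy_samples = [(4.0, 0.0), (6.0, 0.0), (14.0, 0.0)]
toy_edges = edge_counts(toy_nodes, toy_samples)
```

In the toy example two samples fall between nodes 0 and 1 and one between nodes 1 and 2, so the first edge would be drawn thicker; no sample has nodes 0 and 2 as its two best matches, so no edge joins them.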
As SOM earlier revealed, GNG demonstrates that the fashion designs are distributed between two main poles—‘light top-light bottom’ and ‘dark top-dark bottom’ designs (i.e., the top-left and the bottom-right regions in Figure 5.4(ii)). As the line thickness illustrates, the majority of the designs in the data set are distributed within these two main clusters. Besides that, one can see, from the thin lines that connect them, that there exists a smooth transition between the two opposite poles of the fashion spectrum. A GNG diagram can thus be used for finding similarities in the fashion space, which are more evident in the network diagram on the right, which is produced using 48 nodes for higher precision; that is, it shows more detailed and more varied paths between the opposing sides of the data set. As can be expected from fashion, the question of what is similar to a particular design has no single answer and a transition from one design to another can be achieved in various ways. Significantly, topological networks, with automatically selected representative samples for clusters of data points, provide an empirical basis for selecting examples of Japanese street fashion for close social semiotic, multimodal analysis. That is, we may see colour in relation to other aspects of the fashion sample using a social semiotic framework for

clothing through the analysis of the nodes in the topology network, as described below. The social semiotic framework for clothes (Owyong, 2009) is organized according to three strands of meaning (see Van Leeuwen, 2005): representational meaning (what is depicted in terms of content), modal meaning (how the clothes engage viewers), and compositional meaning (how the outfit fits together). The systems are organized according to overall attire (the complete outfit), apparel (top and bottom items of clothing), element (details in the clothing), and accessories (head pieces, facial adornments, jewellery, and so forth). In this case, we can analyze the Japanese street fashion samples in the 18-node network in Figure 5.4 in relation to selected systems of meaning at the rank of apparel for upper-body and lower-body items of clothing as displayed in Table 5.1. Representationally, many of the Japanese street fashion styles tend to select for a jacket with an equal distribution of skirts, shorts, pants, and leggings and a range of different tops. Dresses are the most popular single-piece item. Interpersonally, the cut of the top and bottom clothing items is predominantly mid-range (not tight or loose) but many of the outfits have loose-fitting cardigans, scarves, and over-blouses. There is only one case where words appear on an item of clothing. The majority of items

Table 5.1  Framework for function and systems in clothing semiotics (Owyong, 2009, p. 197)
Unit: Apparel
Upper-body articles (shirt, jacket, singlet)
Print (e.g., pinstriped, floral, paisley, tartan)
Lower-body articles (jeans, shorts, skirt)
Stitching (horizontal, vertical, diagonally aligned)
One piece
Fit to body
Material (e.g., fur, silk, denim, leather)
Colour composition
Material; Texture; Neckline; Length; Picture; Words/message; Brand logo/slogan

appear to be cotton, wool, and synthetic blends and leather is confined to jackets, shoes, and belts. The overall look is casual but stylish and smart, which aligns with Tokyo’s reputation of being “home to some of the world’s best-groomed, most stylish women” (Keet, 2007, p. 160). Compositionally, plain colours are most popular and the majority of fashion samples have an item of black clothing, apart from the nodes most distant from the ‘black top-black bottom’ region. Nonetheless, there is a significant number of patterns and prints, which suggests that experimentation is a key aspect of the compositional design of Japanese street fashion. Also, the findings suggest that Japanese street fashion is quite different to international street fashion styles consisting of blue jeans, denim items, and printed T-shirts common to the West, but the precise nature of this difference and the reasons for it would require a much larger study. One disadvantage of the GNG topology visualization is that it represents the network structure with a given degree of precision, and for this reason, outlying (that is, atypical) and less influential patterns are dominated by the bigger clusters, which have visual presence in the diagram. GNG is therefore more suitable for visualizing trends than outliers, in this case, in the Japanese fashion samples.

13.  CONCLUSION

The methodology introduced in this chapter is a preliminary attempt to link digital humanities with multimodal analysis and critical cultural studies. The case study employed to present the methodology focuses on colour; however, fairly obviously, there are many more variables at play in fashion (e.g., the design of the clothes, the fabric, the accessories, and so forth), in addition to the influence of fashion magazines and websites and the designers themselves, which together form a complex sociocultural phenomenon beyond the scope of this study.
Nonetheless, we have demonstrated how the automated analysis of large cultural data sets, augmented with multimodal social semiotic analysis, provides empirical evidence about social practices, in this case for Japanese street fashion. In this study, we have visualized Japanese street fashion trends and shown how fashion collections may be compared, and we have mapped the actual fashion space of a collection against the potential space to indicate possible gaps in the market. We have mapped the fashion space in real time, and provided a topology of the fashion collection in the form of a network diagram to represent clusters of designs in the collection. The nodes in the network provided fashion samples for close multimodal semiotic analysis. Moreover, the approach can be used to investigate fashion and other sociocultural trends as the product of change in ways that would not be possible with qualitative methods. Significantly, the approach developed

88  Alexey Podlasov and Kay L. O’Halloran in the study raises questions for critical discourse analysis, where insights about social practices are based on a close reading of a limited number of texts. As Berry (2012a, p. 5) explains, “digital technology highlights the anomalies generated in a humanities research project” and “leads to the questioning of the assumptions implicit in such research, for example close reading, canon formation, periodization, liberal humanism and so forth”. The major aim of the paper has been to illustrate the potential of digital technology for arts, social science, and humanities research. Indeed, in order to critically explore the massive cultural data sets that are now available, it seems feasible that in the foreseeable future all research projects, regardless of disciplinary field, will require researchers to have some understanding of computing and a degree of coding ability if they want to conduct independent research of sociocultural data; otherwise, they will be limited to existing methodologies, databases, and technologies. Presner (2010) explains further: “we are at the beginning of a shift in standards governing permissible problems, concepts, and explanations, and also in the midst of a transformation of the institutional and conceptual conditions of possibility for the generation, transmission, accessibility, and preservation of knowledge”. As digital humanities researchers (e.g., Berry, 2012b; Fuller, 2008; Manovich, 2008) claim, this naturally includes a critical approach to software itself.

ACKNOWLEDGMENTS

This research was supported by the research grant Mapping Asian Cultures: From Data to Knowledge (No. HSS-0901-P02) (principal investigator: Kay O’Halloran) funded by the National University of Singapore. The authors thank Professor Lev Manovich (University of California San Diego; collaborator for the project) for providing valuable ideas and insights, and suggesting the style-arena website as a data source.

NOTES

1.
2. http:/
3.
4.
5. The interactive installation, Japanese Street Fashion Exhibition, was exhibited at the Fifth International Conference on Multimodality (5ICOM), University of Technology Sydney (UTS), Australia, December 1–3, 2010, by Kay O’Halloran, Alexey Podlasov, and Ong Kian Peng (Multimodal Analysis Lab, Interactive & Digital Media Institute, National University of Singapore) and Lev Manovich (University of California San Diego).

REFERENCES

Barthes, R. (1967). The fashion system (M. Ward & R. Howard, Trans.). New York: Hill and Wang.
Baudrillard, J. (1968). The system of objects (J. Benedict, Trans.). London: Verso.
Berry, D. M. (2011). The computational turn: Thinking about the digital humanities. Culture Machine, 12, 1–22.
Berry, D. M. (2012a). Introduction: Understanding the digital humanities. In D. M. Berry (Ed.), Understanding digital humanities (pp. 1–20). Hampshire: Palgrave Macmillan.
Berry, D. M. (Ed.). (2012b). Understanding digital humanities. Hampshire: Palgrave Macmillan.
Davis, F. (1992). Fashion, culture and identity. Chicago, IL: University of Chicago Press.
Fuller, M. (Ed.). (2008). Software studies: A lexicon. Cambridge, MA: MIT Press.
Hall, G. H. (2011). The digital humanities beyond computing. Culture Machine, 12, 1–11.
Halliday, M. A. K. (1978). Language as social semiotic: The social interpretation of language and meaning. London: Edward Arnold.
Halliday, M. A. K., & Matthiessen, C. M. I. M. (2004). An introduction to functional grammar (3rd ed., revised by C. M. I. M. Matthiessen, Ed.). London: Arnold.
Jewitt, C. (Ed.). (2009). Handbook of multimodal analysis. London: Routledge.
Kawamura, Y. (2005). Fashion-ology: An introduction to fashion studies. Oxford: Berg.
Keet, P. (2007). The Tokyo look book: Stylish to spectacular, goth to gyaru, sidewalk to catwalk. Tokyo: Kodansha International.
Koenig, R. (1973). The restless image: A sociology of fashion (F. Bradley, Trans.). London: George Allen & Unwin Ltd.
Kohonen, T. (2000). Self-organizing maps (3rd ed.). New York: Springer.
Kress, G., & Van Leeuwen, T. (2006 [1996]). Reading images: The grammar of visual design (2nd ed.). London: Routledge.
Lotman, Y. (2005). On the semiosphere. Sign System Studies, 33(1), 201–229.
Lynch, A., & Strauss, M. D. (2010). Changing fashion: A critical introduction to trend analysis and meaning. Oxford: Berg.
Manovich, L. (2008). Software takes command. Retrieved October 17, 2012, from
Martinetz, T., & Schulten, K. (1994). Topology representing networks. Neural Networks, 7, 507–522.
O’Halloran, K. L. (2011). Multimodal discourse analysis. In K. Hyland & B. Paltridge (Eds.), Companion to Discourse Analysis (pp. 120–137). London: Continuum.
O’Halloran, K. L., & Smith, B. A. (2011). Multimodal studies. In K. L. O’Halloran & B. A. Smith (Eds.), Multimodal studies: Exploring issues and domains (pp. 1–13). London: Routledge.
O’Toole, M. (2011 [1994]). The language of displayed art (2nd ed.). London: Routledge.
Owyong, Y. S. M. (2009). Clothing semiotics and the social construction of power relations. Social Semiotics, 19(2), 191–211.
Presner, T. (2010). Digital humanities 2.0: A report on knowledge. Retrieved October 17, 2012, from
Schreibman, S., Siemens, R., & Unsworth, J. (Eds.). (2005). A companion to digital humanities. Malden, MA: Blackwell.
Slade, T. (2009). Japanese fashion: A cultural history. Oxford: Berg.
Steele, V. (2010). Japan fashion now. New Haven, CT: Yale University Press in association with the Fashion Institute of Technology, New York.
Svensson, P. (2010). The landscape of digital humanities. Digital Humanities Quarterly, 4(1). Retrieved from 000080.html
Van Leeuwen, T. (2005). Introducing social semiotics. London: Routledge.

Part II

Key Issues in Contemporary Popular Culture



Multimodal Constructions of the Nation

How China’s Music-Entertainment Television Has Incorporated Macau into the National Fold¹

Lauren Gorfinkel

1. INTRODUCTION: NATION-BUILDING AND MEDIATED DISCOURSE

From a social constructivist perspective, building and maintaining a nation requires constantly constructing and consolidating national symbols. Nation-building involves the discursive reconciliation and demarcation of differences between groups both within and beyond the marked and perceived national boundaries (Duara, 1996). National boundaries and the symbols that mark them are constantly shifting, and it is through mediated performances that nations are able to ‘try on’ various identities and fashion a ‘coherent ethnic narrative’ for themselves (cf. Nelson, 1999, p. 341; Spickard, 2001, p. 93; Woronov, 2007, pp. 654, 666). Language and other forms of expression, such as music/sound, stage design, television editing and production techniques, and dance, all work together to project particular imaginings of the nation. The need to reconcile differences and develop new symbolism is especially evident when a nation-state acquires (or reacquires) new territory. Prime examples in recent times are the ‘returns’ (huigui) of Hong Kong and Macau to the administration of the People’s Republic of China (PRC) in 1997 and 1999, respectively. Both Hong Kong and Macau had been ruled by ‘foreign powers’—Great Britain and Portugal—but were ‘handed back to the motherland’ (huigui zuguo) with great fanfare via symbolic events broadcast by the Chinese media. In the lead-up to and since the handovers, PRC television and other media have played important roles in continuing to make these returns a ‘reality’ for Chinese and international viewers. Hong Kong and Macau, which were formerly outside or foreign to Chinese mainlanders, had to be reimagined (to draw on Anderson, 2006) as part of a unified Chinese entity, and the media has played an important role in this national imagining. But how exactly has Chinese television made the new political unification a reality for viewers nationwide? What symbolism has been used?
How have the semiotic resources of language, visuals, and sounds/music been drawn

on to reconcile, in Macau’s case, hundreds of years of social, political, and ideological difference in order to link it to the rest of China? This chapter offers a particularly pertinent case for understanding the strategic ideological and political use of visual, musical, and linguistic modes by Chinese state television to “naturalize” (Barthes, 1973) a national discourse of reunification through the popular form of music entertainment. It focuses on how Macau, since its return, has been discursively reincorporated into the Chinese nation-state through music-entertainment television programming (for the case of Hong Kong, see Gorfinkel, 2011).

2. CONTEXT: MACAU, CHINESE TELEVISION, AND THE IDEOLOGY OF ENTERTAINMENT

Known as “Aomen” in Mandarin and “Oumun” in Cantonese (literally, “door to the bay”), Macau is located next to China’s Guangdong province. It was officially named the Macau Special Administrative Region (Macau SAR) of the PRC on December 20, 1999, after being formally handed back to Chinese administration by the Portuguese, who had used Macau as an entrepôt for foreign trade since the 1550s.² Like Hong Kong, Macau was incorporated into the PRC under the “one country, two systems” policy (Lam, 2010, p. 660). After over 10 years of unification, Macau’s politics, economy, and society have become strongly integrated into the PRC system such that the policy of “one country, two systems” is actually practiced much less in Macau than in Hong Kong (Lo, 2008, p. 227). Desiring to leave a Portuguese presence in Macau, the Portuguese administration spent a huge amount of money on historical preservation of Macau’s colonial history and Mediterranean-European architecture in the 1980s (Edmonds, 1995, pp. 232–234; Lam, 2010). Macau, home to 450,000 people in 1999, has been framed as a point of encounter between China and the Portuguese-speaking world (Lam, 2010).
Since the handover, the new Chinese administration has overemphasized Macau’s unique historical and cultural character by constantly merging Portuguese and Chinese elements, thus helping to build Macau’s unique brand, particularly for its tourism industry (Lam, 2010). This contrasts with Hong Kong, whose British history is largely under-emphasized by the Chinese authorities (Lam, 2010, p. 656). This chapter focuses on mediated realities about Macau. The programs discussed appeared on CCTV, China’s only ‘national-level’ television network, which is based in Beijing. CCTV has massive reach across the country and the world, with satellite broadcasts and online streaming. Entertainment programming is considered to be vital for attracting and maintaining audiences who now have plenty of choices in accessing a wide number of national and provincial satellite channels (Hong, 1998). Yet, for CCTV, its mandate and role as mouthpiece

of the central authorities requires a particular dedication to promoting party-state ideology, as well as satisfying audience desires for entertaining programs. What we see is a unique combination of entertainment and indoctrination, which Chinese media scholar Wanning Sun (2009, p. 64) has referred to as “indoctritainment”. Music and singing are particularly drawn upon to reflect a sense of happiness, harmony, and unity within the Chinese state as a whole (Gorfinkel, 2011; Gorfinkel, 2012a). Political uses of entertainment are not new in China. Before television became popular in the late 1970s, various forms of popular culture, including song, dance, and drama, were used by the Chinese Communist Party as tools for reaching out to the masses for the purposes of political education (Brady, 2008, p. 201; Holm, 1991, pp. 18–24). While political education focused on socialism in the Mao era, since 1979 the dominant ideology has come to revolve around nationalism (Zhao & Guo, 2005, pp. 530–532), with both ideologies helping to build a sense of legitimacy for the leadership of the Chinese Communist Party in China. As in other countries, television has been recognized by the government for the important role it plays in being able to vividly project symbols of the nation and its culture directly into ordinary people’s homes (Zhao & Guo, 2005, p. 531; Zhu & Berry, 2008, pp. 1–3). As part of a variety of nation-building themes, CCTV plays a special role in helping to define for national (and international) audiences Macau’s place within the Chinese state.

3. ANALYTICAL FRAMEWORK

To understand how the musical, spoken, and visual discourses work together to create a sense of unity and identity for Macau vis-à-vis the PRC state as a whole, I have drawn on theories from multimodal and critical discourse analysis, post-colonialism and post-structuralism, and semiotics.
The critical analysis of multimodal discourse requires a consideration of the various interacting visual, spoken and musical modes in combination with one another, as situated within particular social and political contexts. It also requires delineating what each mode uniquely contributes to this “textual ensemble” (Macken-Horarik, 2003, p. 14; also see Lemke, 2002, p. 303). In the context of party-state television in China, I suggest that a useful way to think about multimodality is that sometimes the modes work together in ways that create relatively clear messages that appear to be “selectively designed to reinforce one another” and help to create a clear “single attitude” (Lemke, 2006, p. 8). At other times, their combinations result in more ambiguous, open, or varied readings. In the context of CCTV, where it is widely known that the party and state oversee the production process, the notion of ‘dominant’ versus ‘alternative’ readings, discussed by scholars like Lemke (2006), is problematic because it assumes that audience members can only position themselves for or against the controlling producers’ desired meanings. However, viewers

may simultaneously see dominant and alternative readings in the same text, and the same television programs could fluctuate in how they attempt to position readers. This textual study refrains from assumptions about the ‘effects’ of the texts on audiences, except to suggest, drawing on Barthes (1973), that whether or not viewers accept the messages of the shows, immediate impressions of patterned intertextual and multimodal constructions may linger in viewers’ minds, even if they are able to see through the ‘myth’ with a more detailed examination over time (pp. 124, 130–131). By myth, I refer to Barthes’s (1973) notion of ideological messages that are made to appear ‘natural’. My focus is on how the texts attempt to construct particular ‘reading positions’ or ‘intentions’ based on preferences for the ways in which sounds and visuals are arranged, ordered, and patterned in selected television moments. These patterns, repetitions, and “excesses” (Barthes, 1973, p. 126) relate to, and help to form, broader sociopolitical discourses and, given the widespread access to television, may be an important source for interpretations of Chinese identity. Drawing on the work of postcolonial scholar Prasenjit Duara (1996), I consider CCTV music-entertainment programs as being engaged in constructions of ‘Chineseness’ that straddle a continuum between presenting fairly ‘hardened’ versions of the Chinese state and its people and fairly ‘soft’ and malleable constructions, where both the hard and soft extremes of production design reflect party-state political positions. The party-state itself is considered an organic and internally contested entity rather than a monolithic being. At the ‘hardened’ extreme, a sense of unity is built where all the modes—language, visuals, music, and so forth—reinforce each other to create a more or less hegemonic message.
For instance, in highly politicized national events, a woman dressed in red (the official Party/patriotic colour), singing in the official dialect (Mandarin), against the backdrop of a fluttering red flag, with lyrics directly asserting the happy unity of China’s 56 official nationalities, gives a clearly readable message of state unity. Even if one disagrees, the intended message in such instances is very clear. At the other end of the spectrum, the modes may seem to pull in different directions leading to messages that are more open to multiple interpretations or that suggest a level of ambiguity. For instance, in one scene people may speak different languages, wear different kinds of clothes, and sing in diverse styles. In these instances, the Chinese collective may hang together more ‘softly’, and in ways that may sometimes appear less clear to the viewer/listener. A softer and more ambiguous image may be intentionally presented in order to win the support of an internationally oriented viewership by positioning China as ‘open’ and not overly dogmatic. The different choices in multimodal constructions may also reflect different purposes, internal conflicts of beliefs, or different artistic or habitual approaches within production teams. To make connections between broader myths of nation-building and the specific uses of musical, verbal, and visual modes, I draw on Barthes’s

notion of “appropriation”, which describes new uses of images and forms that remain “silently rooted” in history (1973, pp. 119–122). This notion is useful for considering the selective use of colonial symbolism in new constructions of Macau. The semiotic theories of Van Leeuwen (1999) and Kress and Van Leeuwen (2002) are also drawn on throughout to help interpret the social meaning of musical, linguistic, and visual signs in the television texts, including, for instance, how they may help to construct feelings of belonging and distancing.

4. THE TELEVISION EVENTS AND PROGRAM HOSTS

In this chapter, I focus on three politically significant televised music-entertainment events that have featured Macau on CCTV-3, CCTV’s comprehensive arts and entertainment channel. Aimed primarily at a domestic audience, CCTV-3, which began broadcasting as a separate channel in 1995, is widely accessible in mainland China via broadcast and satellite television, as well as around the world over the Internet via China Network Television through live streaming and video on demand. The three chosen case studies capture key moments in the history of the Macau–mainland China relationship. The shows are 1) the CCTV Spring Festival Gala in 1999, the year of Macau’s “return” to Chinese administration; 2) the 2008 Happy in China special celebrating Macau, China’s national day, and China’s Olympic year; and 3) a 2009 special celebrating the 10-year anniversary of the handover. A brief background to the annual Spring Festival Gala, the weekly Happy in China program, and the notion of television “specials” is offered before moving on to the case studies. The CCTV Spring Festival Gala (Chunjie lianhuan wanhui), which began in 1984, is a large-scale variety show broadcast on the eve of the Chinese Spring Festival/New Year.
By the late 1980s, this show had a 90 percent share of the national audience during its annual broadcasts and had become a vital part of the New Year’s Eve ritual across China (Zhao & Guo, 2005, p. 525). Since the early 1990s, the CCTV Spring Festival Gala has become increasingly nationalistic, aiming to unite all members of the big Chinese family, whether they live in the mainland, Taiwan, Hong Kong, Macau, or overseas (Zhao & Guo, 2005, p. 531; also see Lu, 2009; Zhao, 1998). The show Happy in China (Huanle Zhongguo Xing, literally, Happy China Roaming) started broadcasting in 2004,³ and is a large-scale outdoor arts program that travels to different cities around China each week. As in the annual CCTV Spring Festival Gala, singing is the focal point, while different styles of popular music performances, spoken dialogue, occasional games, off-stage reportage, and visual spectacle (dance, dress, stage design, on-screen text, etc.) make up the rest of the show. In 2008, the two-hour program was broadcast once a week in the 7:30 p.m. prime-time slot on Friday evenings, with a repeat on the weekend. CCTV

claims it has been CCTV-3’s top-rated program for many years.⁴ Happy in China plays an important role in linking different cities around China into a coherent whole and framing local cultures and customs as part of a greater national culture. The show emphasizes the special role of music entertainment in ‘bringing happiness’ to audiences, as well as a sense of national unity (Happy in China, 2011). Music-entertainment television ‘specials’ (tebie jiemu), like the 2009 special celebrating the 10-year anniversary of the handover of Macau from Portugal to the PRC, are regular features of CCTV-3. Specials are usually one-off or periodic events that celebrate or commemorate key national, social, and political events like China’s National Day (October 1), the Mid-Autumn Festival, and various national anniversaries, including the 60th birthday of the PRC. The above shows are generally hosted by the same select group of Beijing-based CCTV hosts. Their speech style, following Van Leeuwen (1999, pp. 45–46), can be defined as friendly but formal. Using a highly rehearsed style, the hosts play a vital role in verbally linking the various performances and framing them as significant. When programs tour outside of Beijing, local hosts (generally from local television stations) often work alongside CCTV hosts, giving the impression of equality between the ‘national’ and ‘local’ levels.

5. CASE STUDIES

5.1 The 1999 Spring Festival Gala and Macau’s Song of Return

Like many provinces, cities, and towns around China, Macau has a dedicated song about itself that has been used in significant political television moments to link the local area with the national culture. Titled “The Song of Seven Lands—Aomen” (Qizi zhi ge—Aomen), the song presents Macau’s connection to China as a mother-child bond. It was originally written in 1925 as a poem about the seven places of China (Zhonghua)—Macau, Hong Kong, Taiwan, Kowloon, Weihai, Guangzhou Bay, and Luda (Lushun and Dalian)—that had fallen into the hands of foreign imperialists (Du, 2009). In 1998, the words were used for the theme song of a CCTV series called The Years of Macau (Aomen Suiyue) celebrating the return of Macau. The following year, the song appeared in the CCTV Spring Festival Gala, when an ordinary young girl from Macau famously sang the song for the nation from the CCTV stage in Beijing. She shared her feelings about Macau’s unity with the mainland, backed by a choir of men, women, and children.⁵ During one of the most socially and politically important television events of 1999, Rong Yunlin’s clear lyrical delivery, youthful image, and pure, innocent child’s voice signalled to audiences that Macau had a bright future as part of China. A similar image would be replicated during the opening ceremony

of the Beijing 2008 Olympic Games when a young girl stepped out to sing ‘Ode to the Motherland’. In this example, the song lyrics, visual performance style, and musical accompaniment combined to produce a strong (‘hardened’) message that Macau desired to be part of—and was a natural part of—China. The tune began as a lullaby, accompanied by the tinkling of a xylophone, suggestive of childhood and a sense of safety. A choir of over 30 Chinese-looking girls (dressed in ‘modern’ Western-style dresses) and boys (wearing dress pants and yellow buttoned shirts with bow ties) sang happily in unison, nodding their heads as they sang to emphasize the importance of each word. They told the studio and television audiences “Do you know ‘Macau’ isn’t my real name?” and plainly noted that even though “I’ve been away from you for too long,” they (the unmentioned Portuguese colonizers) only took my “body”, not my “soul”. In this process of appropriation, Macau is emptied of its old colonial meaning and reinvested with a new meaning: belonging to China. Its old colonial name “Macau” is called into question and children guide audiences toward accepting “Aomen” as its new, appropriate, and ‘real’ Chinese name and identity. After this verse, 9-year-old Rong Yunlin stepped out of the crowd and started to walk forward, demanding particular attention as she took a leading soloist role. To reinforce the sense of national identity, the musical accompaniment merged modern Western instruments like the piano with Chinese musical instruments like the Chinese zither (guzheng) (also see Zhang, 2010, p. 52). Rong repeated the first stanza as the other children followed behind, repeating particular phrases (e.g., bu shi wo zhen xing/not my real name, muqin/mother, shi wo di routi/is my body) as they stepped forward for emphasis. Van Leeuwen (1999, pp.
35, 74) refers to this call and response style as supportive emulation, whereby the relationship of support for the leader’s message is dramatized. The visual image of children walking forward (like the minority nationality children walking forward with the Chinese flag during the Opening Ceremony of the Beijing 2008 Olympic Games) and the sound of many voices singing the same song together en masse are commonly used in CCTV’s music-entertainment programming as symbols of national unity and strength. The performance gained momentum when a critical mass of voices could be heard as adults (a group of about 30 women who stood behind the children, and a separate group of around 30 men who stood behind the women and who wore ‘modern’ Western-style black suits and red ties) stepped behind the children to back up their words. The pure voices of children and the solemn, authoritative, church-like sanctity of the choral voices of adult men and women helped to “eternalize” (Barthes, 1973, p. 124) Macau’s rightful place as a member of the PRC family. As Van Leeuwen has noted, this style of “social unison” or “monophony”, where all voices sing the same notes, displays what may be seen positively as a sense of solidarity and belonging, and negatively as conformity, disciplining, and lack of individuality. Drawing

on Arnold, he notes that with no one “out of step”, the back-up voices “supply chordal pillars to prop up the dominant voice” and to create a “single feeling”, a form of singing developed during the industrial revolution in Europe that was readily adopted in China when it started to industrialize (Van Leeuwen, 1999, pp. 79, 82). In highly politicized national televised moments, the style continues to symbolize national strength, national achievement, and progress. The choice of a child to lead the collective voice of Macau served to reinforce the song’s lyrics in which Macau is constructed as a child of her mainland “mother”. Metaphorically, she helped to establish the notion that Macau needs China just as a young child needs his or her mother. Her small stature and high-pitched child-like voice positioned Macau as a small but treasured part of a powerful Chinese nation. With some notes sung slightly off-key, along with her innocent-looking face caught on close-up, the genuineness and conviction of her message seemed clear. Singing in Mandarin, the national language, but with a local accent (e.g., “ni kai” instead of the national standard Beijing dialect “li kai” for “depart from”, “dan si” instead of “dan shi” for the word “but”, “kui lai” instead of “hui lai” for “come back”) also added to a sense of rejoining of the local entity with the bigger collective nation. The adults who echoed the child’s words, and who looked ethnically indistinguishable from any other mainland Chinese citizen, indicated that they too were long lost children of the motherland. (Notably, a similar construction of adults being long lost relatives of mainlanders is also forged in CCTV’s depiction of mainland-Taiwan relations.)
In the Spring Festival Gala’s construction of Macau, the men, women, and children unanimously told the television audience that even though they had been separated for “300 years”, they never forgot to whom they really belonged—the “mother in my dreams”. They pleaded that “us children” should be called by their pet name (i.e., Chinese name) “Aomen”, not “Macau” (the colonial name). In increasingly authoritative tones, the young children, female adults, and male adults called out “Mother! Mother!” and “I want to come back”. The performance ended with deep pulsating drums, an increasing density of instrumental backing indicating a merging of the small Macau and large China, and a final emphatic call, rising to the heroic-sounding tonic in unison: “Mother!” (cf. Van Leeuwen, 1999, pp. 39, 106). Sounds of clapping from the studio audience in Beijing merged with the final resonating tones, indicating their support too for the message of national solidarity. The political relationship between Macau and its PRC ‘mother’ was thus made utterly clear for audiences through a combination of sound, images, and very clearly articulated song lyrics that mutually reinforced each other to emphasize the message of unity between Macau and its motherland, mainland China. The former colonial influence was stripped of its significance and the new Chinese identity (or mythology) was constructed, emphasized, and made to appear ‘natural’.

Since 1999, the CCTV Spring Festival Gala performance of “The Song of Seven Lands—Macau” has been regularly replayed on various CCTV music-entertainment programs. For example, a China Music Television episode called “Songs of Memories” (Jiyi de gesheng) that aired in 2008 allowed audiences to recapture the nostalgic moment when Rong Yunlin and her fellow Macau compatriots expressed their yearnings to rejoin the motherland. The text is circulated and intermixed with other texts that symbolize important national moments like Zhang Mingmin’s symbolic representation of the imminent return of Hong Kong in his performance of “My Chinese Heart” during the 1983 CCTV Spring Festival Gala (see Gorfinkel, 2011). Via its constant repetitions in various live and repackaged forms, the song is clearly used to educate mainland Chinese as well as regional and global audiences about the ‘real’ history of Macau, its connection to mainland China, and its reintegration into the PRC.

5.2 Happy in China and the Celebration of National Day in Macau in 2008

Another fairly hardened multimodal construction of Macau on CCTV, one that also offered a few ‘softer’ moments of identity construction, appeared in the music-entertainment program Happy in China—National Day Celebration Special—Charming Macau (Huanle Zhongguo Xing—Guoqingjie tebie jiemu—Meili Aomen). This entertainment special celebrated the fifty-ninth ‘birthday’ of the PRC (i.e., 59 years since Chairman Mao’s declaration of the New China under the leadership of the Chinese Communist Party), and was broadcast in October 2008 as part of China’s National Day celebrations.⁶ Unlike the previous example, this event was staged in Macau—on ‘local’ turf. The program aired soon after the end of the Olympic Games, during which pride in China and the success of the Games were frequently repeated in music-entertainment extravaganzas. Even though Macau itself has only officially been part of the PRC since 1999, it was the number 59 that was emphasized by the host Zhang Lei, who announced that 2008 was the fifty-ninth birthday of the PRC, and this represented 59 years of “struggle/striving” and of “pioneering” a new course for the Chinese nation. She set the national celebratory tone for the program, explaining that “our motherland has experienced great changes”, emphasizing the importance of the Olympics, the “great achievements of 30 years of opening up and reform”, and “feelings of pride”. While Macau was the setting, this program firmly incorporated all of the areas and people to which the PRC lays claim and includes in its Greater China worldview. Zhang called on “all the nationalities of the whole country, Hong Kong and Macau compatriots, Taiwan compatriots and overseas Chinese [to] together celebrate this great day for the motherland”. The verbal, lyrical, and visual discourse worked together to confirm that Macau was a small entity within a big China. During the event, a young

woman from Macau was invited on stage to sing with Sun Nan, who is a famous Chinese pop star. The woman’s spoken words emphasized her ideologically ‘correct’ affiliation: She said that she chose to sing “Red Flag Fluttering” (Hong qi piao piao), one of Sun Nan’s signature songs, in order to express her “passion for our motherland”. The patriotic lyrics merged magnificently with this uplifting pop classic. As they sang together about the “five-starred red flag” whose “name is more important than my life”, a camera panning across the massive audience showed male youths holding up and waving massive red PRC flags, while seated audience members waved the smaller PRC and Macau SAR flags, suggesting the strength and dominance of the motherland over the SAR. Red dominated the indoor setting: Massive red lanterns hung from the ceiling, huge red drums were displayed and played, and dancers were holding red fans, reflecting their passion for the motherland. The inclusion of ordinary people—the ordinariness of the woman singing with Sun Nan was partly exemplified by her off-key, mediocre singing ability in contrast to the professional singers who were mainly from the mainland—implied that their feelings were typical of people in Macau today. After the song, host Zhang Lei reiterated some of the lyrics of the song in her spoken discourse to reinforce the message of national unity. She stressed that the “five-starred red flag flutters in every one of our hearts” and this is what makes “our motherland a big strong family”. Like the CCTV Spring Festival Gala previously discussed, the history of Macau was also performed during the Happy in China program in ways that emphasized the PRC’s sovereignty. However, it also attempted to emphasize more strongly the unique ‘blend’ (jiaorong) of Portuguese and Chinese cultures, mirroring the findings of Lam (2010).
Red and green were the colour themes, reflecting the respective flags of the PRC motherland and Macau SAR. The design of the huge indoor stadium also symbolically blended the Macau and PRC national cultures into ‘one’. Lotus flowers, the emblematic flower of Macau, floated in pools of water below the stage, and a huge ‘lotus and five-star’ design of the Macau flag (the stars adapted from the PRC flag) was seen on the ceiling providing an artistic and political ‘framing’ for wide camera shots of the action on stage and the stadium at large. A group of dancers in their red and green costumes also performed in the colour scheme that reflected the merging of the motherland and Macau. This event also emphasized to a degree Macau’s unique identity in contrast to the rest of China. Following the projection of familiar, symbolic images of Beijing, representing the heart of the motherland (Tiananmen Square, soldiers of the People’s Liberation Army, Chairman Mao’s portrait, and the PRC flag), set to the theme of the Chinese national anthem, audiences were fed images of Macau including the symbolic remaining façade of the ruins of St. Paul’s cathedral—known as dasanba paifang in Chinese. Instead of discussing the (colonial) historical or religious significance of such unique buildings, they were imbued with secular meanings more suitable to

Multimodal Constructions of the Nation   103 current times. An emphatic, deep male voice-over in official Beijing/national dialect spoke over these images of Macau to reframe Macau as a unique “harmonious”, “peaceful”, and “beautiful land”, with China continuing to draw some nourishment from Macau’s colonial history. Introducing the clip, host Zhang Lei also emphasized the blending of cultures over the past 400 years, proudly boasting of Macau’s unique buildings, which included both Chinese and foreign styles. Yet, following Barthes (1973), the colonial associations of Macau’s past “recede”, and are “put at a distance and [made] almost transparent” (pp. 117–127). The facades of dasanba paifang are a prime example of a form that is made ‘empty but present’, ensuring that some features are kept (e.g., for tourism purposes, promotion of China’s modern multicultural vitality), while others (e.g., religion) are dropped. The moment in which the former colonial splendour is hinted at is reappropriated to support a new message of contemporary China’s openness and vitality. Subtly, the significance of China and the success of its contemporary administration are stressed. The ordering of performances in this production also revealed a greater emphasis on the Beijing-centred PRC administration–era than on Macau’s former Portuguese history. A brief reference to Macau’s history under a Western power, for example, was immediately followed by a dazzling routine by an acrobatic troupe from Beijing. Such ordering may be coincidental, but overall Macau is framed in CCTV’s music-entertainment programs as firmly belonging to a revitalized PRC, rather than as a separate entity.

5.3 Celebrating the 10 Years since the Handover and China’s International Vision in 2009 Alongside the dominant focus on Macau as part of a flourishing Chinese party-state, between 2008 and 2010 (the period of data collection) there were also entertainment programs that took an international outlook in their representation of Macau.7 These programs adopted a ‘softer’ style of identity construction in which a greater variety of visual, verbal, and musical modes pull in different directions and have the potential for greater ambiguity of message. One such program was a CCTV ‘special’ celebrating the tenth anniversary since the return of Macau to China in 2009 (Qingzhu Aomen huigui zuguo shi zhounian wenyi wanhui).8 While focusing on Macau as part of a great Chinese nation, featuring youth with red ribbons dancing excitedly and singing “we will create the future of China” (women qu chuangzao huaxia de weilai), it also included elements that suggested a greater interest in the blending of Chinese and foreign cultures. The opening dance featured a sprinkling of Caucasian children alongside Chinese-looking children. An on-stage orchestra combined Western instruments (e.g., timpani, violins, double bass, oboe, trumpet, xylophone, and European metallic flutes) and Chinese ones (e.g., erhu, pipa, Chinese wooden flutes, and Chinese

cymbals). Dancers in various acts wore a mixture of traditional European and Chinese costumes. Alongside two CCTV hosts, the program also included two male hosts, Xia Li’ao (Julio Acconci) and his twin brother Xia Jianlong (Dino Acconci). Better known in the region for performing as a pop duo called Soler, the image of the Macanese-born brothers’ ‘mixed’ racial identity,9 alongside their popular and fashionable identities as pop singers, was co-opted in a way that assisted in the performance of particular national ideologies associated with Macau and China’s global outlook. At one moment during the show, the twins reminisced about growing up in Macau in a highly scripted dialogue in Mandarin during which a few lines of a Cantonese song were inserted (Cantonese is the language spoken by most local Chinese in Macau). During the dialogue, one curiously asked the other “do you think you really know Macau?” to which the other replied “Hey! I am a Macau/Aomen person (wo shi Aomen ren). Of course I understand Macau”. This dialogue seemed to address an uncertainty over whether mixed-race/Eurasian Macanese are valid Chinese (an unofficial website on Baidu lists them as being Chinese citizens), by casting doubt and then affirming their inclusion. Also significant is that the identity they resolutely expressed was a ‘Macau’ identity not a ‘Chinese’ one. Their dialogue seemed to be positioned to build a message of China’s friendly stance toward foreign countries, rather than to promote the (uncomfortable) reality that some Macanese may have ancestors from Western or other Asian nations. The symbolism of mixed-race Macanese appeared to be appropriated in a way that shifted the discourse away from a focus on Macau-China unity toward a China-West friendship. 
The Macanese hosts emphasized that Macau has ‘temples’ (read: Chinese temples) as well as ‘churches’ (read: Western churches), and stressed that people of many nationalities live in Macau, a point reinforced via an image of the audience that included Caucasian and Chinese people. They framed Macau as a socially integrated, trendy, modern city and concluded their dialogue by stating in unison that the real uniqueness of Macau was its “blending of China and the West, [where] old and new reflect each other” (Zhong-Xi ronghe, gujin huiyin). While China and the West are presented as cooperating in a friendly manner, they were still presented as separate entities, and the question of mixed identities and ‘who’ is really ‘Chinese’ in this new era remained somewhat ambiguous. The Macanese hosts’ dialogue was immediately followed by an image of the modern city of Macau with tall buildings and neon lights, symbolic of Macau’s flourishing under Chinese administration. Portuguese music was played but was appropriated to make Portugal seem small, simple, and insignificant. The quaint sound of an accordion formed the backdrop to the performance of a ‘Portuguese folk dance’ (Putaoya tufeng wu), performed by Caucasian and Chinese singers and dancers, wearing simple, traditional Portuguese costumes, playing simple folk instruments (one white woman played

the triangle, with others on an acoustic guitar and accordion), and singing a simple folk song in Portuguese.10 The performance of the quaint Portuguese folk culture in many ways mirrored the folk performances of China’s ethnic minorities on CCTV, which suggest a concern with ‘preserving’ intangible cultural heritage of the exotic cultures within China, which is presented in stark contrast to the vitality and professionalism of the present mainstream national culture (see Gorfinkel, 2012b). Macau is now presented as part of a flourishing Chinese nation that has extended a friendly hand to its former and old-fashioned colonizers. Highlighting the ideological significance of the event, the live performance was attended by Chinese president Hu Jintao, who sat in the front row of the massive audience. Hu appeared with great fanfare at the beginning of the program, and at the end he went on stage again to extend his hand of friendship to key performers. In a scene reminiscent of the Beijing 2008 Olympic Games, Hu was seen on CCTV singing “Ode to the Motherland” (Gechang Zuguo) with his compatriots, this time appearing on the stage with the Chairman of the Macau SAR, as well as all the performers who had appeared on the program and who represented various groups of the Chinese nation through their dress (minority nationality costumes, red dresses) and appearance (e.g., mixed-race Macanese). The Caucasian performers also served to suggest China’s friendship with foreign countries and the attraction of foreigners to Macau and China. Clapping and singing together en masse, no less a song about the “victorious singing voices” (shengli gesheng), the “five-starred red flag” (wuxing hongqi), and the “dear motherland” (qin’ai de zuguo), constructed a powerful message of Chinese national unity that was inclusive of China’s multiethnic population and its special administrative regions like Macau. 
This time, however, the message of unity was coupled with a strong sense that China was a nation with an international outlook. 6. CONCLUSION: POLITICAL IMPLICATIONS OF HARD AND SOFT CONSTRUCTIONS OF MACAU In this article, I have provided examples of how visual, linguistic, and musical semiotic resources in selected CCTV music-entertainment television programs broadcast between 1999 and 2009 have been patterned to ‘naturalize’ Macau’s ‘reintegration’ into the Beijing-centred Chinese nation-state. In the first example, at the time of the handover in 1999, visual, linguistic, and musical modes strongly reinforce a single ‘hardened’ attitude. Even if one disagrees, it is very difficult to miss the message that Macau is to China as a child is to his or her mother. The second example, shortly after the fanfare of the Beijing Olympic Games in 2008, shows how Macau is consistently made to visually, musically, and linguistically ‘blend’ with (and into) the greater PRC-centred Chinese national entity, while Beijing-centred China remains the central, dominant force. The greater number of symbols, including the

presentation of former colonial architecture, increase the potential for diverse readings, but are constrained through CCTV’s reframing and reappropriation of the symbols, which become evidence of China’s unity and strength. Of the three televised events discussed above, the third example, 10 years after the handover in 2009, offers the greatest degree of ambiguity via the greatest incorporation of foreign semiotic elements embodied in the Eurasian identities of the Acconci twins. However, the potential for diverse readings continues to serve the political (though changing) interests expected of the state-run broadcaster. During this period, the programs not only emphasized messages of national unity, but also promoted China’s openness to the world, which is important for building China’s international image. Through popular cultural television performances, in both ‘hardened’ and somewhat ‘softer’ moments, a unified and revitalized image of the Chinese nation-state has been established and naturalized. This has been achieved by emptying the former colonial power of its significance and subtly marking the ongoing necessity of the Beijing-centred PRC administration in order to ensure the bright future of both Macau and China as a whole. NOTES 1. The author would like to thank the reviewers and editors, as well as Wanning Sun and Louise Edwards for helpful suggestions on earlier drafts of this chapter. 2. Portugal and China signed a secret pre-agreement in 1979 during which both sides agreed that Macau was Chinese territory under Portuguese administration. In 1987, China and Portugal signed an agreement on the process for handing over Macau to Chinese administration (Lam, 2010, p. 660). 3. See report at, (April 27, 2010). 4. See report on Happy in China (CNTV website, June 10, 2011), http://ent.cntv.cn/20110610/100420.shtml. 5. See Rong Yunlin’s performance on the CCTV Spring Festival Gala online at = .html (CCTV-3 MTV). 
A CCTV reflection on “Song of Seven Lands—Aomen”, including excerpts from the television series, and Rong Yunlin, now aged 19, is available at Further information on Rong Yunlin and her performance is available at: .com/view/63628.htm 6. See program excerpt Happy in China—National Day Celebration Special— Charming Macau at video/20100423/100586.shtml 7. Lam (2010, p. 672–673) notes that Macau is being used as an economic and symbolic platform to try and build strategic trade relationships with Portugal and African and Latin American states which were former colonies of Portugal, such as Brazil, Angola, Mozambique, Cape Verde, Guinea Bissau, and East Timor (also see Sheh & Law, 2011; Zhou Z., 2010). 8. See clip on celebrating the return of Macau at humanities/aomenshinianwenyiwanhui/classpage/video/20091227/100479 .shtml

9. According to unofficial websites, for example (band), the brothers were born to an Italian father and mother from the Karen ethnic group (Burma) and raised as Roman Catholics in Macau where they attended a local Chinese school. As adults they moved to Italy to pursue their music and returned to Macau in 1999, before achieving success as performers in Hong Kong. They received the Commercial Radio Hong Kong Music Awards 2005 Best Group Newcomer Gold Award and have released popular albums in Cantonese and Mandarin. 10. Translated lyrics appeared in Chinese subtitles at the bottom of the screen explaining that the song was about a boy in the north of Portugal who could not sing and dance. But the girls didn’t care about his money or whether he wrote them letters, as long as he would dance with them.

REFERENCES
Anderson, B. (2006). Imagined communities: Reflections on the origin and spread of nationalism. London: Verso.
Barthes, R. (1973). Mythologies (A. Laver, Trans.). St Albans, UK: Paladin.
Brady, A.M. (2008). Marketing dictatorship: Propaganda and thought work in contemporary China. Lanham, MD: Rowman and Littlefield.
Du, J. (2009, June 9). From “The Song of Five Lands” to “The Song of Seven Lands”. Dahe Daily. Retrieved July 28, 2011, from t20090609_19276579.shtml
Duara, P. (1996). De-constructing the Chinese nation. In J. Unger (Ed.), Chinese nationalism (pp. 31–55). Armonk, NY: M.E. Sharpe.
Edmonds, R. L. (1995). Macau and greater China. In D. Shambaugh (Ed.), Greater China: The next superpower? (pp. 226–254). New York, NY: Clarendon.
Gorfinkel, L. (2011). Ideology and the performance of Chineseness: Hong Kong singers on the CCTV stage. Perfect Beat: The Pacific Journal of Research into Contemporary Music and Popular Culture, 12(2), 107–128.
Gorfinkel, L. (2012a). Promoting a harmonious China through popular music-entertainment television programming. In J. T. H. Lee, L.V. Nedilsky, & S. K. Cheung (Eds.), China’s rise to power: Conceptions of state governance (pp. 71–90). New York, NY: Palgrave-Macmillan.
Gorfinkel, L. (2012b). From transformation to preservation: Music and multi-ethnic unity on television in China. In K. Howard (Ed.), Music as intangible cultural heritage: Policy, ideology, and practice in the preservation of East Asian traditions (pp. 99–112). Farnham, UK: Ashgate.
Happy in China. (2011). Retrieved January 12, 2011, from huanlezhongguoxing/videopage/index.shtml
Holm, D. L. (1991). Art and ideology in revolutionary China. Oxford: Clarendon Press.
Hong, J. (1998). The internationalization of television in China: The evolution of ideology, society, and media since the reform. Westport, CT: Praeger.
Kress, G., & Van Leeuwen, T. (2002). Colour as a semiotic mode: Notes for a grammar of colour. Visual Communication, 1(3), 343–368.
Lam, W. (2010). Promoting hybridity: The politics of the new Macau identity. The China Quarterly, 203 (September), 656–674.
Lemke, J. (2002). Travels in hypermodality. Visual Communication, 1(3), 299–325.
Lemke, J. (2006). Toward critical multimedia literacy: Technology, research, and politics. In M. C. McKenna, L. D. Labbo, R. D. Kieffer, & D. Reinking (Eds.), International handbook of literacy and technology (Vol. 2, pp. 3–14). Mahwah, NJ: Lawrence Erlbaum Associates.

Lo, S. S. H. (2008). The dynamics of Beijing-Hong Kong relations: A model for Taiwan? Hong Kong: Hong Kong University Press.
Lu, X. (2009). Ritual, television, and state ideology: Rereading CCTV’s 2006 spring festival gala. In Y. Zhu & C. Berry (Eds.), TV China (pp. 111–125). Indianapolis: Indiana University Press.
Macken-Horarik, M. (2003). A telling symbiosis in the discourse of hatred: Multimodal new texts about the “children overboard” affair. Australian Review of Applied Linguistics, 26(2), 1–16.
Nelson, L. (1999). Bodies (and spaces) do matter: The limits of performativity. Gender, Place and Culture, 6(4), 331–353.
Spickard, P. (2001). The subject is mixed race: The boom in biracial biography. In D. Parker & M. Song (Eds.), Rethinking “mixed race” (pp. 76–98). London: Pluto.
Sun, W. (2009). Maid in China: Media, mobility, and a new semiotic of power. London: Routledge.
Van Leeuwen, T. (1999). Speech, music, sound. London: Macmillan.
Woronov, T. E. (2007). Performing the nation: China’s children as little red pioneers. Anthropological Quarterly, 80(3), 647–672.
Zhang, Y. [张裕亮] (2010). Mainland China’s popular culture and party state consciousness [中国大陆流行文化与党国意识]. Taipei, Taiwan: Showwe Information.
Zhao, B. (1998). Popular family television and party ideology: The spring festival eve happy gathering. Media, Culture and Society, 20(1), 43–58.
Zhao, Y., & Guo, Z. (2005). Television in China: History, political economy, and ideology. In J. Wasko (Ed.), A companion to television (pp. 521–539). Malden, MA: Blackwell. Retrieved March 14, 2011, from Companion_To_Television.pdf#page = 539
Zhu, Y., & Berry, C. (2008). Introduction. In Y. Zhu & C. Berry (Eds.), TV China (pp. 15–39). Bloomington: Indiana University Press.


A Multimodal Analysis of the Environment Beat in a Music Video Carmen Daniela Maier and Judith Leah Cross

1. INTRODUCTION The global impact of demanding environmental concerns is visible in almost all contexts of contemporary communication and across geographical borders. An increasing range of multimodal texts surface continuously in various media in order to facilitate public understanding of irreversible environmental changes, to educate future generations in ecoliteracy, to promote green or disclose greenwashed corporate images and practices, or to entertain and facilitate appropriate actions as well as responses. Simultaneously, research in environmental communication tries to keep up with this rapid pace by examining environment-focused multimodal texts from the context of journalism (Doyle, 2011; Lester & Cottle, 2009), education (Maier, 2010; Reid, 2007), advertising (Corbett, 2006; Cox, 2010; Hansen & Machin, 2008; Maier, 2011; Moschini, 2007), and popular culture (Brereton, 2004; Meister & Japp, 2002; Starosielski, 2011), to mention only a few relevant areas. Although environment-focused music videos have also proliferated in the last decade, and despite their recognized impact upon younger generations, giving expression as they do to the rhythms and visual associations relevant to youth cultures, music videos that deal with the environmental theme have relatively rarely been the subject of research endeavours. The present chapter intends to draw attention to how the analysis of relevant multimodal texts, such as the music video Earth Song, can contribute to a better understanding of the ways by which communication about environmental issues takes place in the context of popular culture. Our analysis will primarily be focused on how the video takes a critical view of human interaction with the environment by questioning the wisdom of traditional national boundaries and notions of time as linear and irreversible. 
Michael Jackson’s Earth Song is a call to save the planet from the destructive impact that has been wrought upon the earth by humanity and technology. It was recorded in 1995, but never released as a single in the United States, due to events related to perceptions of Michael Jackson’s private life. However, Earth Song won a Grammy nomination in 1997

(Jurin, Roush, & Danter, 2010, p. 132), as well as recognition in the form of the Genesis Award in 1996. According to Grant (1998), it was Jackson’s intention to create a lyrically and melodically simple song, so the whole world, including non-English-speaking audiences, could sing along. Earth Song has a specific synchronization of semiotic modes, orchestrated along four narrative strands and filmed in four different geographic locations across the globe, but presumably occurring at the same time. Each of these strands presents images of deforestation, animal cruelty, pollution, and war, with their disastrous consequences for humanity and Earth. These visual stories, based on shots taken from documentary archives and documentary-like footage filmed in Warwick, New York, the Amazon Rain Forest, Croatia, and Tanzania, are brought together and synchronized with an equally alarming musical accompaniment, insistent lyrics, an iconic presenter, and carefully edited shots of similar actions and gestures performed by the participants. The regular solo appearances of Michael Jackson as the voice of the world are staged against a backdrop of burning forests around New York. The overall spectacular effect is largely achieved through the interplay between its regular musical structure and the chorus-like chant “What about us?” which is coordinated with footage from the four disparate locations of devastation. Earth Song has earned its recognition as a “green anthem” because the broader environmental discourse that underlies it can be found not in the four individual “activity schemas” (Machin, 2010, p. 94), but at the level of the whole video, which reveals Michael Jackson’s critical approach to environmental issues. This chapter illustrates a few of the ways in which this video’s discourse constructs space and time through the interplay of several semiotic modes. 
Our focus on space and time is motivated by the fact that we consider these to be fundamental coordinates of many environmental discourses. As will be shown below, in this particular video, the multimodal representation of space and time carries the critical green message in multiple ways. 2.  APPROACHES TO MUSIC VIDEO ANALYSIS Music videos have been the focus of research work belonging to various traditions in media and cultural studies. McQuail (2000) describes music videos as the first postmodern television service and, therefore, part of a real cultural revolution that has taken place within the mass media, resulting in a new aesthetic where popular music has become a dominant art (p. 114). Since it is not only relevant to consider how music videos are perceived by their audiences, Schwartz and Ratner’s extensive reference guide to the making of music videos (2007) portrays the music video production medium and its practitioners in revealing ways so that readers can see how the industry perceives itself. This insider’s view of the industry shows that it is neither as hedonistic nor as chaotic as many may think.

Content analysis of music videos has focused on specific themes, such as the verbal and nonverbal portrayal of performers (Wallis, 2011), as well as on what is culturally sensitive, that is, gender and race. Reid-Brinkley (2008), for example, has interrogated black women’s Internet-based discussion and negotiation of the negative gender and racist stereotypes portrayed in inflammatory rap music videos. The posted responses provoked by a music video provided a means of analyzing discursive communicative patterns according to the social allegiances these defined. McKee and Pardun’s qualitative study (2003) of how first-year college students “read” music videos seems to support the view that deeper understandings can indeed be gained from what appear to be superficial and sensationalist popular culture music texts. Relational approaches to the analysis of popular music (Moore, 2003) work from yet another viewpoint, attempting to merge the lines distinguishing genre (Mittell, 2000), content, textual, and other analyses. Studying music videos as communication, Gow (1992) categorizes them into six popular formulas according to recurring combinations of form and content. The multilayered discourses and meanings in music videos, however, have rarely been addressed in research or educational texts, although a combinatory approach (of the art and its commerce), as adopted by Vernallis (2004), does begin to analyze critically, steering away from a focus on appearances, sales, and stereotypes. Among multimodality researchers, Martinec has approached a couple of Michael Jackson’s music videos focusing on rhythmic hierarchies at the level of language, music, and action (2000b), and phases and foregrounding (2000a). 3. METHODOLOGICAL FRAMEWORK, DATA, AND TRANSCRIPTION STRATEGIES In this chapter, we also adopt a multimodal approach both to data collection and analysis. 
For the purposes of this analysis, both the consideration that “each mode is partial in relation to the whole of meaning” (Kress & Jewitt, 2003, p. 3) has been important, and also the emphasis on “the temporal and spatial unfolding” (O’Halloran, 2004, p. 109) of these semiotic resources. Based on Van Leeuwen’s (2008) approach to discursive time and space, this chapter explains how semiotic modes interrelate and impact on the discursive construction and critical communication of time and space. By employing Van Leeuwen’s concepts of synchronization and recurrence (2008, p. 81), the present multimodal analysis addresses the issue of time in terms of both what is represented and how it is represented in the environmental discourse of Earth Song. The representation of space is similarly considered in terms of both what is represented and how it is represented through various semiotic resources. As will be demonstrated below, the representation of time and space is crucial for achieving continuity and unity in the music video and thus for its overall effect as a “green anthem”.

Table 7.1  Sample of the multimodal transcription of the music video

Visual Mode: Photographic image (with Time code)
0:00:52  Long frontal shot Amazonians; background destroyed forest
0:00:54  Long shot MJ singing, arms outstretched, facing left; background forest fire
0:00:57  Close up Amazonian woman, eyes closed, facing left
0:00:58  Medium and long shot MJ singing, arms outstretched, swinging from facing left toward the right; background forest fire
0:01:01  Close up shot African boy, eyes open, looking right

Visual Mode: Animation, Special effects
Fade in
Smooth cut, visual rhyme (graphic match)
Smooth cut, visual rhyme (graphic match)

Aural Mode
String instruments enter
This crying Earth, its weeping shores
Increased range in pitch as emotions expand
Aaah . . . Aaah . . .
Phrasing repeated an octave higher; xylophone and percussion tambourine introduced
Transition to chorus
Gradual increase in volume
Piano continuing
Ooh . . . ooh . . .

Multimodal Connections
Synchronization across narrative strands
Recurrence, repetition, out of time
Time signature and beat become more pronounced and insistent; cumulative effect intended to induce dream-like or trance-like effect where time is eternal
Parallel disasters, different peoples across the globe: spatial unity
People face left and right, panoramic sweep of vision
Space is universally shared

The transcription strategies that have been adopted in this chapter for the multimodal analysis are closely related not only to the research interests of the authors, but also to the structural specificity of this music video. Use of the editing software Adobe Premiere, which allows exploration of the juxtapositions of sound, images, movement/dance, and words at frame and shot level, has enabled us to highlight their functions in securing this music video’s continuity. Highlighting continuity has an important role in the analysis of this video due to employment of two different editing strategies that might have had a disrupting effect without being combined with other continuity strategies at the level of the aural mode. We have adopted Bordwell and Thompson’s (2001) definition of frame as “a single image on the strip of film” (p. 431), and their definition of a shot as “one uninterrupted image with a single static or mobile framing” (p. 433). The data have been transcribed in a table that includes both aspects of description and interpretation of data. A selection from the transcription table used can be seen in Table 7.1, which displays information regarding aspects of the visual and aural modes, as well as multimodal connections related to space and time. The rows in the table are coded in varying shades to distinguish each of the four narrative strands, as well as the role of Michael Jackson’s appearances in the video’s structure. For example, all rows describing shots from the Amazonian forest are light grey, while those describing shots from Africa are dark grey. In the first two columns, descriptive elements have been inserted for each shot (measured in one-second intervals) in terms of specific visual (photographic and animated images, as well as special effects and camera movements) and aural (lyrics, music, and other sounds) modes. 
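For readers who wish to keep a comparable transcription in machine-readable form, the row layout of Table 7.1 might be sketched as a small record type. The following Python sketch is illustrative only and is not the authors' Adobe Premiere workflow: the class name `ShotRow`, its field names, and the strand labels ("Amazon", "Michael Jackson", "Africa") are assumptions introduced here to stand in for the table's shading. The time codes and shot descriptions are taken from Table 7.1, while the aural and interpretive fields are left blank.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ShotRow:
    """One row of a shot-by-shot transcription (field names are hypothetical)."""
    time_code: str         # start of the shot, h:mm:ss, as printed in Table 7.1
    strand: str            # assumed label standing in for the row shading
    visual: str            # description of the photographic image
    aural: str = ""        # lyrics, music, other sounds (left blank in this sketch)
    connections: str = ""  # interpretation relating to space and time


def to_seconds(time_code: str) -> int:
    """Convert an h:mm:ss time code to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in time_code.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def group_by_strand(rows):
    """Return the time codes of each strand's shots in chronological order."""
    grouped = defaultdict(list)
    for row in sorted(rows, key=lambda r: to_seconds(r.time_code)):
        grouped[row.strand].append(row.time_code)
    return dict(grouped)


ROWS = [
    ShotRow("0:00:52", "Amazon",
            "Long frontal shot Amazonians; background destroyed forest"),
    ShotRow("0:00:54", "Michael Jackson",
            "Long shot MJ singing, arms outstretched, facing left; "
            "background forest fire"),
    ShotRow("0:00:57", "Amazon",
            "Close up Amazonian woman, eyes closed, facing left"),
    ShotRow("0:00:58", "Michael Jackson",
            "Medium and long shot MJ singing, arms outstretched, swinging "
            "from facing left toward the right; background forest fire"),
    ShotRow("0:01:01", "Africa",
            "Close up shot African boy, eyes open, looking right"),
]

if __name__ == "__main__":
    for strand, codes in group_by_strand(ROWS).items():
        print(strand, codes)
```

Grouping the rows by strand in this way makes the alternation between strands and performer visible at a glance, which is roughly what the shading of the printed table is described as doing.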
The third column focuses on multimodal connections as it presents elements of interpretation specifically oriented to the representation of space and time at the level of each narrative strand. Each row has been shaded differently according to the continuity relations established across the four narrative strands. 4. REPRESENTATION OF TEMPORALITY AND SPATIALITY ACROSS SEMIOTIC MODES As already suggested, in order to identify what kind of discursive meanings are assigned to time and space, both what is represented and how it is represented are taken into account at the level of each narrative strand and at the level of the whole video. We have adopted this approach as we consider that temporality and spatiality are embodied both in the participants and locations of the discourse, while their representation is manipulated at the level of all semiotic modes in order to create a sense of timeless continuity and global unity. When examining one of music videos’ formulas, namely ‘the enhanced performance’ formula, Gow (1992) explains that this type of video “blend[s]

performance and non performance images together in a manner where the musical work of the artist(s) is kept at the forefront of the video” (p. 62). In the case of Earth Song, it is evident that Michael Jackson’s performance is kept at the forefront, but it remains in tune with the nonperformance images during the whole video. Therefore, although Michael Jackson himself appears to embody the critical narrative ‘voice’ and ‘consciousness’ of Earth Song, three of the four narrative strands contain human participants (men, women, children, older people) who also ‘narrate’ their stories via documentary-like footage, the regular tempo of music, and a specific editing strategy. Temporality is embodied in these participants because the choice of human participants suggests time passing in terms of age: Babies, children, and adults appear in the four narrative strands suggesting historical and biological evolution. In the first part of the video, the human participants enact and question stories that simultaneously show how humanity has negatively interacted with and impacted the environment, while in the second part, all destruction is reversed. Murphet (2005) notes how film always appears to “unfold in real time” and hence, she concludes, “the simultaneous tense [that is, the present tense] is the most typical temporal location of filmic narrative voice” (p. 78). This is also the case in Earth Song where, despite the previously mentioned suggestion of time passing, the past is revisited and recurs as if eternally present, emphasizing the key notion of how time is conceptualized in music as “the extended present” (Tagg, 2013, p. 262). The temporal continuity of the parallel narrative strands implies an ongoing cry to align around a common concern for the future of the Earth. 
Similarly, if the choice of human participants suggests temporality, the choice of nonhuman participants, for example, elephants and whales, is more suggestive of spatiality in terms of the natural environment, such as the forest and sea. In general, space is more than setting as symbolic values are added in each narrative strand. We have found several categories of discursive space according to its relation—or lack of relation—to human participants. Visually, ‘human space’ is represented through social space (the Croatian ruins, street, and factory), while ‘nonhuman space’ is represented by nature (forest, sea, and African landscape). Space is also ‘humanized’ aurally through personifications: “crying Earth”, “bleeding Earth”, Earth’s “wounds”, “planet’s womb”, and “weeping shores” in the song’s lyrics. At the same time, the blatant lack of care for the environment makes space ‘unrecognizable’: “Now I don’t know where we are”. The ways in which these multimodal manifestations of space are related to each other have specific discursive consequences, as will be argued later in this chapter.

4.1  Continuity and Unity across Texts and Images

As already mentioned, time initially follows a visual chronological path in each of the narrative strands. However, continuity at the level of the whole music video is visually maintained through the existence of similar actions and gestures in each strand. For example, actions and gestures of human participants such as walking, kneeling, and clinging to trees appear in each narrative strand and are synchronized across strands. This repetition contributes to maintaining the idea of spatial unity throughout the music video, even though the actions take place on four different continents. Furthermore, the same actions are visually reversed in each strand in the second part of the video. Future becomes past, but this transformation has no negative connotations in this discourse because the progression of time causes destruction, while the regression in time suggests healing. Verbally, the same rhetorical question with variations according to the content of each narrative strand strengthens the cyclical patterns as well as the continuity and sense of unity at the level of the whole video: “What about elephants / forest trails / children dying . . . ?” The first two words of this pattern, uttered repeatedly by Michael Jackson, enter in dialogic “sequential interactions” (Van Leeuwen, 1999, p. 212) with the recurrent question of the chorus “What about us?” as they alternate with it in the second half of the video. None of these questions receives an answer, increasing in this way the challenge each one makes, especially when synchronized with the above-mentioned repeated actions and gestures. Time is visually represented through several strategies: (1) fast motion represents time as passing at high speed and thus becoming uncontrollable; (2) alternation of lighter and darker frames, also construing time as becoming unmanageable, combined with cutting on the move, suggesting that time unifies spaces; (3) backward action representing time as passing in reverse, and it seems only then that time becomes controllable and a sense of balance can be restored.
The main discursive implication of these temporal manipulations is that ‘progress’ is out of control and that humanity should be aware of irreversible consequences. Additionally, the use of slow-motion filming foregrounds actions through dilated moments, conveying a sense of being located out-of-time; this effect is also achieved through the use of close ups, which confer temporal stability, while simultaneously plunging the participants into spatial isolation. In addition, the repetition of similar types of shots in all four narrative strands creates visual rhymes that reinforce the cyclical character of time at the level of the whole video and further assist in creating smooth transitions from one story, space, and time to the next. It is this cyclical representation of time that confers hope on the relations of mankind with the environment. Yet, its visual realization is reminiscent of clichéd blockbuster films. As a result, the documentary-like footage can appear contrived or staged, especially due to its being so expertly synchronized with and reinforced by the music, an interaction discussed later in this chapter. The images in Figure 7.1 and Figure 7.2 come from different narrative strands and illustrate how the four strands parallel and emphasize each other visually at the level of the video as a whole. The low-angle images portray a form of global prayer through the similar actions and gestures of the participants. They also help reduce the spatio-temporal distance between the video’s participants and events and the audience. According to Vernallis (2001), “low-angle shots are used more extensively in music videos, partly because they reproduce the relations among audience, performance and stage” (p. 28). This effect is reinforced verbally through the rhetorical question “What about . . . ?”, which is recurrently asked with pleading intonation and appropriate variations. Continuity is also maintained through several other visual strategies: recurrent shots of Michael Jackson in the same location; similar shades of colour throughout the video’s duration; similar types of long shots and similar panoramic camera movements in all four locations, foregrounding the spatial specificity of each setting while linking each of them in the same moment of recurring time. The dominant technique for joining all these types of shots in the video is parallel cutting/montage, “cutting between two or more related actions occurring simultaneously at different locations or occurring at different times” (Konigsberg, 1988, p. 254), which highlights the similarities of actions happening in various places and times. The synchronization of actions across narrative strands achieved through parallel montage suggests unity across space and time and reflects a certain view of the destruction of our world. At the level of each narrative strand, “continuity editing” is used in order “to maintain continuous and clear narrative action” (Bordwell & Thompson, 2001, p. 429). Vernallis found that “music videos avoid continuity editing because such techniques would give the visual track too strong a forward trajectory: the image might seem to overtake the song” (2001, p. 23). However, in the case of this video, the combination of parallel montage and continuity editing ensures a balance between the visual and the musical tracks.

Figure 7.1  Representation of drawing after screen shot 1

Figure 7.2  Representation of drawing after screen shot 2
Thus, a sense of continuity and unity is maintained multimodally at the level of the whole video in order to critically connote the idea of global responsibility for the environment. First of all, this is achieved through the ways in which temporality and spatiality are embodied visually in participants. Then, the four narrative strands are characterized by multimodal similarities involving recurrent actions and rhetorical questions. Finally, repetitions of multimodally constructed space types, including human, nonhuman, humanized/personified, and unrecognizable, contribute to a sense of global unity. The main types of repetitions are given by the visual parallel montage, visual temporal manipulations, echoing of musical phrasings, similarities in space visualization and recurring words, as well as by repetition of phrases in the lyrics. The next part of the chapter discusses some of the repetitions related to music.

4.2  Continuity and Unity through Music

Continuity is also aurally realized and maintained at the level of the whole music video through several strategies. The main strategy is musical time, which follows a consistent ‘beat’, a term often used loosely to refer to combinations of tempo, meter, and rhythm. In this music video, the four distinct narrative strands unfold with a continuity based in the sameness of their tempos, meters, and rhythms. This sameness in beat is reinforced by sameness in musical phrasings, as these too are repeated a set number of times, both within and between narrative strands. For the first few tranquil seconds of the music video, viewers see a forest and hear a faint accompanying sound of a bird twitter, rhythms of nature briefly suggesting the “extended present” (Tagg, 2012, p. 262); these sounds of nature can be a rich source for connotative meanings and here they momentarily evoke a sense of the timeless present. A lone electric piano then introduces the music video’s signature melody for the first minute or so, before strings join in, effectively downplaying any discernible beat, which only becomes apparent when the tambourine enters about one and a half minutes into the clip, together with the chorus, heralding the four-four time signature proper. The percussive tambourine has traditionally been described as having a hypnotic effect partly due to the range of simultaneous effects it can produce: it can be played by hand or stick, mounted or held, shaken or stroked (Hamelman, 2011, pp. 104–108). The sound properties and associations of the tambourine thus contribute to dream-like associations for the space and time in which the rest of this music video plays out. It is at the point when the song’s time signature proper begins that the string instruments contribute to a classic, even predictable, sense of ‘musical story-building’: the instruments continue to repeat each of their phrasings at least three times, before progressing to one octave higher when their pitch range begins to widen.
Following Van Leeuwen (1999), who describes sound qualities such as pitch register, loudness, and tension as having an “experiential meaning potential” where meanings are based on past bodily experiences, an expansive pitch range connotes force (p. 140). The repetition of the A flat minor/D flat major chord pattern in the verse, with its subsequent E flat major resolve, serves to accentuate the melody, while the music’s rising pitch has metaphoric extensions suggestive of both power and prayer. As the musical phrasings reappear in a higher register, the images cut rapidly from fleeting glimpses of a dying elephant and fleeing zebras and giraffes to more lingering close-ups of a deeply concerned child and horrified woman. Although such disquieting images may well be documentary evidence of authentic reactions and real events, they are synchronized with an original musical composition, unable to be categorized as representative of a particular genre, but variously described as a rock ballad with gospel, blues, and operatic elements (Hunter, 1995), and a surreally staged narrator in the form of Michael Jackson. This, alongside the editing techniques described earlier and the multilayered soundtrack, increases the potentially dramatic effect of the complete music clip. The combined effect of a repetitive musical structure with parallel repetitions in the aural and visual modes is discursively multiplied by their synchronization, producing a potent representation and criticism of global tragedies.

By three minutes into the music video, a full band is performing, with drums that include a crashing snare pulsation as well as a bass, emphasizing the theatricality of the music video as a whole. The increased volume of the sound, together with the increase in vocal resonance and pitch range, combine to produce an overall ‘experience’ of augmented effort. Loudness, pitch range, and tempo are musical parameters whose effects function as “cues” (Tagg, 2012, p. 255) most likely indicative of, in this section of the music, ‘healing energy’. The cumulative effect of the unrelenting beat and recurring musical phrasings is a dream-like atmosphere wherein even tragedies can be reversed. The throb of the music finds its visual echo, about halfway into the piece, in the participants’ parallel actions and related movements. Just as ‘elaboration’ restates by providing equivalent information, so do repetitive musical phrasings elaborate on parallel gestures, providing a different perspective on the same information (Halliday in Machin, 2010, p. 192), and thereby multimodally underlining the crisis that the Earth is facing. Alternating minor and major chord patterns, which initially lead from A flat minor, suggest a tendency toward disaster, while the momentary duration of each is also significant. Tempo is, according to Tagg (in Van Leeuwen, 1999, p. 39), “an important parameter in determining the human/biological aspect of an affective relationship to time”, so the brevity of these phrases can be variously interpreted to mean something fleeting, or unstable and even volatile. Moreover, as these brief wave-like phrasings are recycled throughout the music video, their collective effect as short bursts of energy seems to signify disquiet and anxiety.
Still, the conventional four-four time, together with the two basic building blocks of tonal music, the minor and major chords (Kamien, 2008, p. 46), anchors actions, unifying and grounding them as well as suggesting consistency. Meanwhile, the haunting melody assists in the discursive realization of a sense of continuity and also unity. Ultimately, a steady increase in volume and pitch range conveys a sense of urgency and fear that time is running out. Combined with its considerable range in pitch and volume, the final shift to a higher tone in a major key allows the music to evoke a sense of enduring continuity and spatial unity that the visual and verbal modes of Earth Song also communicate. Vernallis (2004) notes how the last beat (number four) of each verse in Earth Song results visually in a tree being felled, a seal being bludgeoned, or a man being killed (p. 189). Despite the representations of these atrocities being separated visually and temporally, the accent of the music constantly connects man and animal with the fate of, but also the hope for, the planet. The strong, sturdy beat of the music does not totally overpower its evocative melody or lyrics, but serves to underscore their energy and hence, the importance of what is being communicated. In other words, although repetition is characteristic of most popular music, it needs to be appreciated within the context of this music video as a whole. In short, the snowballing effect of the timed strike with the disturbing image results in a metaphorical enhancement of the message, namely our global responsibility to respond appropriately to environmental disasters now.

The music’s substantial pitch range, suggestive of a force directed at rousing emotions on a global scale (Machin, 2010, p. 219), is a style reminiscent of national anthems (often referred to as ‘hymns’ in languages other than English), as these tend to be characterized by an “expansive range” (Tagg, 2012, p. 309) and usually include a preponderance of major chords specifically drawn on to stir patriotic emotions. These very same characteristics of anthems are deployed in this music video for a critically different purpose: to rally all peoples of the Earth, rather than individual nations, for the purpose of saving the planet. Applying Van Leeuwen’s system network of timing (1999, pp. 60–65), we can identify each sound element in the gospel-style answer-call vocals as measured and regularized: The usual unvaried response of such vocals is now in the form of a rhetorical question, “What about us?”, that is sung invariably and in unison, articulating a critical issue about the collective fate of the many communities listening across our planet. The regular, unvaried bass signifies a grounded, persistent, “down-to-earth”, but nonetheless, poignant, harmonizing, and consistent theme. The answer-call vocals and echoing of musical phrasings across the globe indicate a sense of community and solidarity, a collective perspective on destruction, a sense of reflection, and recognition of the cyclical nature of time: At the end of the video, past becomes present while sounds once again soften in the eternal present of a natural space shared by a vast range of animals, plants, and people.

The gradual crescendo and amplified pitch range leave the listener in no doubt that the long wail is the climax for the four narrative strands, as it peaks more than halfway through the music video and marks a turning point. Winds change direction, felled trees rise, an elephant and a person come back to life; finally, the soft note of nature concludes the music video. As the music video ends, the events filmed and the music’s denouement come full circle; simultaneously, the volume and pitch decrease while the unrelenting beat of the music concludes one tone higher, on a hopeful F major, indicating that an atmosphere of equilibrium has been almost instantly restored and, consequently, nature as we have always known it reappears and seemingly exists as it had before. The almost symmetrical arrangement of both the music and the images frames the four narrative strands. A summary of this symmetrical or parallel arrangement of modes co-deployed for representing temporal continuity and spatial unity can be seen in Table 7.2.

Table 7.2  Summary of the visual and aural strategies employed for the representation of temporal continuity and spatial unity
Time and Space (columns): Temporal Continuity, Spatial Unity. Modes (rows): Visual mode, Aural mode.

Visual mode, Temporal Continuity:
•  Parallel montage
•  Continuity editing
•  Recurrent temporal manipulations
•  fast/slow motion filming
•  backward filming
•  alternated light and darker frames

Visual mode, Spatial Unity:
Visual synchronization through repetition of:
•  Recurrence of similar types of long & close up shots revealing the participants
•  Recurrence of similar panoramic camera movements revealing the four locations

Aural mode, Temporal Continuity:
•  Answer-call vocals
•  Recurrent rhetorical questions: What about us?
•  Regular four-four time
•  Consistent beat
•  Recurrent musical phrasings
•  Alternating minor and major chords
•  Steady increase in volume and pitch range

Aural mode, Spatial Unity:
Personified space: crying Earth, bleeding Earth, Earth’s wounds
•  Large pitch and volume range
•  Blockbuster-like soundscape
•  Elements of anthem, ballad, blues, gospel, opera, and rock

5. CONCLUSIONS

The critical multimodal analysis of Earth Song presented in this chapter has demonstrated how this music video addresses vital questions about our global responsibility toward the planet’s future by constructing a specific interpretation of time and space through a complex interplay of several semiotic modes. By focusing on how the multimodal construction of discursive time and space at the level of the four narrative strands contributes to a broader critical discourse at the level of the whole video, this analysis complements other explorations of the video, which consistently recognize its explicit condemnation of the destruction humanity has wrought upon the Earth (Hunter, 1995; Stillwater, 2011; Vernallis, 2004). The analysis has revealed that in Earth Song, time is no longer represented as divisible, linear, and irreversible, and space is no longer divisible and unrecognizable, but a shared view of nature is possible. The novelty of this music video lies in the ways in which a meaning of ‘oneness’ is attached to both time and space. This overarching meaning is multimodally achieved by highlighting temporal continuity and spatial unity. Visually unifying four continents through the same recurrent actions and gestures, the video focuses on the central message of its environmental discourse: the need for global involvement. The melody links the various visual narratives of the music video as it reverberates in alternating minor/major patterns. The tone and tempo of the music drive both the lyrics and the images, allowing Earth Song to become a compassionate voice capable of revealing the magnitude and the urgency of environmental problems at the level of the whole planet. However, the discrepancy between the documentary archival shots and documentary-like footage that reveal the problems, together with the impossible solution of time reversion, weakens the intended message by implicitly suggesting a lack of viable solutions. In the light of the present findings, future research could focus on comparing various versions of popular music videos and the social context in which they have been produced, so as to explore their various representations of urgent contemporary issues. With reference to Earth Song, these versions could include the original, as well as a later version that uses still images, but displays the lyrics of the song clearly across the screen, omitting the authentic footage shot across four different geographical locations or any images of Michael Jackson. A third version worthy of consideration is seen in the relatively recent DVD, This Is It (Jackson, Phillips, Ortega, Gongaware, & Ortega, 2009). The version of Earth Song in this DVD begins with a child’s perspective. Significantly, the title of Michael Jackson’s planned final comeback, also given the title of This Is It, refers specifically to this third version of Earth Song. Comparing the impact of various realizations may highlight the contribution of each semiotic choice in constructing meaning, as well as the combined overall effect of each in the context of popular culture.
The popularity of Michael Jackson may well be debatable, but it is hoped that the analysis of this particular music video has shown how a representative text of contemporary popular culture communicates its critical environmental message by defying traditional time and space constraints.

ACKNOWLEDGMENT

The authors would like to extend special thanks to Michael Wurzer for his advice and expertise regarding our analysis of the music and the musical instruments in this paper.

REFERENCES

Bordwell, D., & Thompson, K. (2001). Film art: An introduction. New York, NY: McGraw-Hill.
Brereton, P. (2004). Hollywood utopia: Ecology in contemporary American cinema. Bristol: Intellect.
Corbett, J. (2006). Communicating nature: How we create and understand environmental messages. Washington, D.C.: Island Press.
Cox, R. (2010). Environmental communication and the public sphere. Thousand Oaks, CA: Sage Publications.
Doyle, J. (2011). Acclimatizing nuclear? Climate change, nuclear power and the reframing of risk in the UK news media. The International Communication Gazette, 73(1–2), 107–125.
Gow, J. (1992). Music video as communication: Popular formulas and emerging genres. Journal of Popular Culture, 26(2), 41–70.
Grant, A. (1998). Michael Jackson: Making history. London: Omnibus Press.
Hamelman, S. (2011). The Beatles and the art of the tambourine. Studies in Popular Culture, 33(2), 95–116.
Hansen, A., & Machin, D. (2008). Visually branding the environment: Climate change as a marketing opportunity. Discourse Studies, 10(6), 777–794.
Hunter, J. (1995, August 10). Michael Jackson HIStory. Rolling Stone. Retrieved June 6, 2012, from history_past_present_and_future_book_1
Jackson, M., Phillips, R., Ortega, K., & Gongaware, P. (Producers), & Ortega, K. (Director). (2009). This is it [DVD]. USA: Sony Pictures Entertainment.
Jurin, R., Roush, D., & Danter, J. (2010). Environmental communication. New York, NY: Springer.
Kamien, R. (2008). Music: An appreciation (6th ed.). New York, NY: McGraw-Hill.
Konigsberg, I. (1988). The complete film dictionary. London: Bloomsbury.
Kress, G., & Jewitt, C. (Eds.). (2003). Multimodal literacy. New York, NY: Peter Lang.
Lester, L., & Cottle, S. (2009). Visualizing climate change: Television news and ecological citizenship. International Journal of Communication, 3, 920–936.
Machin, D. (2010). Analysing popular music: Image, sound and text. London: Sage.
Maier, C. (2010). Fostering environmental knowledge and action through online learning resources. Designs for Learning, 3(1–2), 70–83.
Maier, C. (2011). Knowledge communication in green corporate marketing: A multimodal discourse analysis of an Ecomagination video. In K. O’Halloran & B. A. Smith (Eds.), Multimodal representation and knowledge (pp. 153–169). New York: Routledge.
Martinec, R. (2000a). Construction of identity in Michael Jackson’s jam. Social Semiotics, 10(3), 313–329.
Martinec, R. (2000b). Rhythm in multimodal texts. Leonardo, 33(4), 289–297.
McKee, K., & Pardun, C. (2003). Reading the video: A qualitative study of religious images in music videos. Journal of Broadcasting & Electronic Media, 43(1), Winter, 110–122.
McQuail, D. (2000). McQuail’s mass communication theory (4th ed.). London: Sage.
Meister, M., & Japp, P. (2002). Enviropop: Studies in environmental rhetoric and popular culture. London: Præger.
Mittell, J. (2001). A cultural approach to television genre theory. Cinema Journal, 40(3), 3–24.
Moore, A. (Ed.). (2003). Analyzing popular music. Cambridge: Cambridge University Press.
Moschini, I. (2007). Ecomagination: Natural values at work. Textus, XX, 223–242.
Murphet, J. (2005). Narrative time. In H. Fulton, R. Huisman, J. Murphet, & A. Dunn (Eds.), Narrative and media. Cambridge: Cambridge University Press.
O’Halloran, K. (2004). Visual semiosis in film. In K. L. O’Halloran (Ed.), Multimodal discourse analysis: Systemic functional perspectives (pp. 109–131). London: Continuum.
Reid, J. (2007). Literacy and environmental communications: Towards a “pedagogy of responsibility.” Australian Journal of Language and Literacy, 30, 118–133.
Reid-Brinkley, R. (2008). The harsh realities of “acting black.” (Doctoral thesis). Retrieved April 10, 2012, from 12500
Schwartz, L., & Ratner, B. (2007). Making music videos: Everything you need to know from the best in the business. New York, NY: Watson-Guptill.
Starosielski, N. (2011). “Movements that are drawn”: A history of environmental animation from “The Lorax” to “Fern Gully” to “Avatar.” The International Communication Gazette, 73(1–2), 145–163.
Stillwater, W. (2011). Earthsong. M Poetica: Michael Jackson’s art of connection and defiance. Kindle Edition, US: Willa Stillwater.
Tagg, P. (1987). Musicology and the semiotics of popular music. Semiotica, 66(1/3), 279–298.
Tagg, P. (2012). Music’s meanings. New York: Mass Media Music Scholars’ Press.
Van Leeuwen, T. (1999). Speech, music, sound. London: Macmillan.
Van Leeuwen, T. (2008). Discourse and practice: New tools for critical discourse analysis. Oxford: Oxford University Press.
Vernallis, C. (2001). The kindest cut: Functions and meanings of music video. Screen, 42(1), 21–48.
Vernallis, C. (2004). Experiencing music video: Aesthetics and cultural context. New York, NY: Columbia University Press.
Wallis, C. (2011). Performing gender: A content analysis of gender display in music videos. Sex Roles, 64(3–4), 160–172.


Representations of the Institutional ‘Self’ in Web-Based Business News Discourse
Sabine Tan

1. INTRODUCTION

According to Fairclough (1995), much of the ideological work of the news media involves particular ways of representing the world, particular constructions of social identity. Van Dijk (1988) similarly observes that these particular constructions of social identity by the news media are not the result of “a direct or passive operation but rather a socially and ideologically controlled set of constructive strategies” (p. 28). As members of distinct social groups, news organizations are deemed to operate on the basis of sociocultural and institutionally specific ideological values, codes, and constructs that organize all interpretations, representations, social relations, and interactions. Like other news organizations, business news networks, such as Bloomberg, CNBC, and FOX Business (FBN), are likely to have internalized specific “group-based schemata of social participants, groups, institutions, and their structural relationships” (Van Dijk, 1988, p. 25). “[E]ach of these categories”, as Van Dijk (1988) elaborates, “may be further associated with sets of often stereotypical criteria that condition such categorizations, such as prototypical appearance, activities, or social situations of manifestation” (p. 25). Whilst the news industry has always relied on visual communication in its mediation process, media professionals in the digital age deploy “ever more powerful visual means to convey complex situations . . . to readers and audiences” (Machin & Niblock, 2006, p. 25). New media practices and technologies, alongside the increasingly fast-paced nature of business and finance, have influenced the ways in which business news networks present themselves to their audiences. In the twenty-first century, business news networks routinely deploy an array of dynamic, interactive blends of verbiage, images, graphics, audio, and video streams, all of which can be disseminated and accessed through an ever-growing universe of interconnected websites.
The multifaceted and increasingly fast-paced nature of business and financial journalism has in fact become an accepted part of the prevailing cultural landscape (e.g., see Kurtz, 2000; see also Mahar, 2003). Traditional images of the once-staid world of business and finance have been replaced by a never-ending stream of constantly changing images and information, delivered at breathtaking speed by telegenic men and women against a backdrop of noise, colour, and the “hubbub of open outcry trading” (Clark, Thrift, & Tickell, 2004, p. 297; see also Kurtz, 2000). In the digital age, the world of business and finance has turned into a media spectacle “dominated by video clips, the rush and clash of symbols, and the need to entertain minute after minute” (Clark et al., 2004, p. 303), and the once sober and serious field of business and finance has “become a news and entertainment commodity like any other” (Clark et al., 2004, p. 293).

2.  OVERVIEW OF THE STUDY

The study presented in this chapter looks at the multiple ways in which popular business news networks such as Bloomberg, CNBC, and FBN construct their own identities in business news video clips mediated on the Internet. In particular, the study examines how social identities, roles, and relationships are constructed and represented through the multimodal interplay of: (1) static and dynamic moving images in video thumbnails and live or recorded video footage; (2) visio-textual displays of on-screen text in the main visual frame, lower-thirds, studio backdrops, props, and settings; and (3) dialogic representations (e.g., self-identification, delegated naming, third-party reference, etc.). In summary, the study seeks to establish who or what is represented in which mode or medium; what particular process types (e.g., whether social actors are endowed with active or passive discourse roles, how they are named and introduced), categories (e.g., which types of social actors are represented), and discursive practices (e.g., which social actors get to speak of their own accord, or are delegated the right to speak) are drawn upon; and the issues this may raise for coparticipants in the studio and audiences in front of the computer screen.

3.  THE DATA

The data in this study come from a sample of 200 analyzed video clips from a larger corpus of more than 800 videos released on the video webpages of the aforementioned business news networks (see Table 8.1 for illustration) over a 50-hour period in January 2009.1

Table 8.1  Business news networks’ video webpages
Business News Network

Webpage URL Editors Video Picks US TV Clips /tvtoday.html Video Gallery, All Video sub-section 15839796&tabheader=false video page, Latest Video sub-section

4. METHODOLOGIES, THEORIES, AND INTERPRETATIVE FRAMEWORKS Exploring the discourse of hyper-mediated broadcast news—a medium in which visual and verbal elements play a powerful and integral role—“entails much more than examining it as a mode of constructing reality and mediating the world” (Montgomery, 2007, pp. 20–21). As Montgomery (2007) claims, the complexities that are inherent in these multimodal representations of reality are likely to pose challenges to any analytical framework or schema, and their analysis will be difficult to accomplish without resorting to the insights of more than one discipline or research tradition. In order to sufficiently address the complex issues of how business news networks represent themselves to their audiences, this study takes an interdisciplinary approach (see also Fairclough, 2001, p. 121; Meyer, 2001, pp. 29–31; Van Dijk, 2001, 2011), integrating complementary theoretical perspectives and methodologies from the fields of conversation analysis, critical multimodal discourse analysis, and social semiotics. Specifically, the analysis draws on and adapts Clayman and Heritage’s (2002) formulations of the personal and social roles that are constructed for participants in TV news interviews, Montgomery’s (2007) model for the institutionalized discourse roles and participation frameworks for news affiliates, Scollon’s (1998) social interaction model for mediated news discourse, and Van Leeuwen’s (1995, 1996, 2008) frameworks for the representation of social actors and social action in discourse. The remainder of the chapter first describes the types of social actors and modes of representations that business news networks draw upon. It then presents the analysis and discussion of how their respective identities are constructed and negotiated in televisual business news discourse mediated on the Internet.
5.  TYPES OF SOCIAL ACTORS AND MODES OF REPRESENTATION Based on the data analyzed for this study, the following types of social actors are typically represented in online business news videos.

5.1  News Affiliates Access to the specialized discourse domain of business and financial news is generally restricted to professional newsworkers, defined in this study as news affiliates. The term is adapted from Montgomery (2007, 2008) and refers to the discourse participants that represent the news organization in the capacity of anchors/presenters, reporters, correspondents, editors, contributors, panellists, and so on. As Montgomery (2007, pp. 34–35) explains, the institutionalized discourse roles that are assigned to newsworkers by their organizations are not universal, but vary considerably according to the prevailing cultural norms and institutional practices. For this reason, the taxonomical classifications proposed in this article are not based on the hierarchical or professional roles that news networks accord to their news personnel, but rather on the identities and positions that news affiliates construct for themselves—and their coparticipants—in mediated social interactions. Budd, Craig, and Steinman (1999, p. 124) perceive network news as a hierarchical discourse of authority and mediation that is almost exclusively the prerogative of the news affiliate. At the top of this hierarchy is the anchor or presenter, who—as the symbolic centre of the televisual news system—“mediates and frames virtually everything in the program” (Budd et al., 1999, pp. 124–125). In this work, the term anchor/presenter is thus used to define all news affiliates who appear in the position of principal mediators in a news video, regardless of whether they are tasked specifically with news presentation, reporting, hosting, anchoring, interviewing, and so on. All other news affiliates (that is, reporters, correspondents, editors, contributors, panellists, etc.), whose authority to speak on behalf of the news organization is in some way framed, constrained, or otherwise mediated by the anchor/presenter, are characterized as affiliate experts. The term derives from Montgomery (2007) and Machin and Niblock (2006), who have observed the rise of the ‘journalist-as-expert’, that is, specialist reporters, correspondents, editors, or other news affiliates who are called upon as ‘expert’ sources in live ‘two-way’ interview situations, where one broadcast journalist interviews another (see Machin & Niblock, 2006, p. 30).

5.2  Institutional Actors While the constructed identities of social actors pertaining to the news agency or news network are often embodied in human actors, news organizations build their names, brands, and distinctive representational styles (e.g., Coupland, 2001, p. 416; Machin & Niblock, 2006, p. 21; Montgomery, 2007, p. 193) in part around the televisual apparatus. In the present study, representations of a news network’s brand and corporate identity are conceptualized as institutional actors. For example, the identity of a news organization may be expressed in the form of studio backdrops, settings, props, theme music, and sound effects, as well as digitized corporate logos and graphic displays (see also Lauerbach, 2007; Machin, 2007; Schieß, 2007). As Fairclough (1995) points out, such representations involve “visual and aural semiotics as well as language, including the layout of the newsroom, the opening sequence and theme music of the news programme” (p. 93; see also Allan, 1998; Scollon, 1998). In previous studies these multimodal—and often self-reflexive—displays of institutional identity and authority have been variously subsumed within a “studio modality system” (Graddol, 1994, pp. 147–148) or the televisual “authentication process” (Coupland, 2001, p. 416).


5.3  Modes of Representation Scollon (1998), who investigates the various ways in which journalists and newsmakers are given identity and voice in televisual news discourse, offers a detailed and comprehensive account of the multiple resources available for identifying and positioning newsworkers within the sphere of broadcast news. He posits that the identity of news affiliates is negotiated and constructed through multiple discursive frames, which are realized through naming, on-screen characters, video identification, and visual/verbal terms of address (see Scollon, 1998, pp. 210–211). The interaction and convergence of different types of news media, such as television, print, and the Internet, however, has led to significant transformations in the presentational style of broadcast news media in recent decades (e.g., see Montgomery, 2007, p. 184). These convergent transformational processes—where complex presentational styles ‘borrowed’ from other media are incorporated and represented in (and through) another medium—are encapsulated in Bolter and Grusin’s (1998) concept of remediation, which, in turn, is invoked through ‘the twin logics’ of immediacy and hypermediacy. According to Bolter and Grusin (1998), immediacy is being enacted in our “apparently insatiable desire” for ‘live’ television coverage, while hypermediacy can be seen in the “mixing and matching” of media styles, that is, the amalgamation of video streams, split-screen displays, graphics, text, and so forth inside multiple windows within a single television screen (Bolter & Grusin, 1998, pp. 5–6). Business news videos mediated on the Internet draw upon a blend of semiotic resources from both hypertextual and televisual media. 
The videobite—which represents the entry-point for the viewer as well as the analyst to the embedded news video—deploys both pictorial and verbal resources in the form of a thumbnail image, a headline, and often a summary-lead (analogous to the newsbite identified by Knox, 2007, 2009). The various semiotic resources that business news networks draw upon for constructing and negotiating their own identities in the embedded news videos are summarized in Table 8.2. 6.  ANALYSIS AND DISCUSSION

6.1  Representations of News Affiliates As pointed out by Fairclough (1995), in news discourse analysis, “[i]t is not simply the identification of participants that is of analytical interest; a key question is how participants’ identities and relations are constructed” (p. 39). The subsequent analysis and discussion deals with the question of how the identities of news affiliates are constructed in hyper-mediated business news discourse. It focuses on what is foregrounded or backgrounded, what is thematized or unthematized, and what process types and categories are drawn upon to represent social actors and social (inter)actions.

Table 8.2  Semiotic resources for constructing and negotiating the identity of business news networks in online news videos

Videographic Displays
• Dynamic images with soundtrack of live broadcast from the news studio or from the field
• Embedded, prerecorded actuality footage
• Static, photographic images

Animated and Static Graphic Displays of Logographic Symbols
• Corporate logos
• On-screen text (e.g., network name, title of program or program segment)

Visio-Textual Displays of On-Screen Text (On-Screen Captions)
• In main visual frame
• In lower thirds
• As part of compositional framing in split-screen presentations

Aural (Acoustic) Displays
• Phonic images (i.e., a social actor or discourse participant is represented through audio only)
• Sound images (i.e., music, synthetic sound effects, or sound textures)

Verbal (Dialogic) Displays
• Self-identification
• Delegated naming (i.e., discourse participants are identified in verbal discourse, either by being explicitly introduced, or by being addressed)
• Third-party reference (i.e., co-present discourse participants are named in indirect address, for instance “as X mentioned”, “I agree with X”)
• Reporting (i.e., nonpresent news actors are named in verbal discourse)
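The categories in Table 8.2 can be read as a coding scheme for annotating individual clips. The following is a minimal sketch of such a scheme in Python, not the annotation tooling used in the study. The class names, field names, and the two sample records are hypothetical and serve only to show how one social actor may be identified through several display types at once.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Set


class Display(Enum):
    """The five display types listed in Table 8.2."""
    VIDEOGRAPHIC = "videographic display"
    LOGOGRAPHIC = "animated or static graphic display of logographic symbols"
    VISIO_TEXTUAL = "visio-textual display of on-screen text"
    AURAL = "aural (acoustic) display"
    VERBAL = "verbal (dialogic) display"


class ActorType(Enum):
    """Actor categories from Section 5; the identifiers are hypothetical."""
    ANCHOR_PRESENTER = "anchor/presenter"
    AFFILIATE_EXPERT = "affiliate expert"
    INSTITUTIONAL_ACTOR = "institutional actor"


@dataclass(frozen=True)
class Annotation:
    """One observation: an actor identified through one semiotic resource."""
    clip_title: str
    actor: str
    actor_type: ActorType
    display: Display
    resource: str  # free-text label, e.g. "lower thirds"


def displays_for(annotations: List[Annotation], actor: str) -> Set[Display]:
    """Collect every display type through which the given actor is identified."""
    return {a.display for a in annotations if a.actor == actor}


# Illustrative records only; these are not data from the study.
sample = [
    Annotation("Thain to Leave BofA", "Charlie Gasparino",
               ActorType.AFFILIATE_EXPERT, Display.VERBAL, "delegated naming"),
    Annotation("Thain to Leave BofA", "Charlie Gasparino",
               ActorType.AFFILIATE_EXPERT, Display.VISIO_TEXTUAL, "lower thirds"),
]
```

Calling `displays_for(sample, "Charlie Gasparino")` returns the set of display types recorded for that actor, which makes it easy to check whether an identity is constructed in one mode or across several.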

In linguistic discourse, part of the meaning of every clause lies in the part of the message that is chosen as the theme, that is, the element that is placed at the beginning of a clause. According to Halliday (1994), theme is “that with which the clause is concerned”, or the point of departure of the message (p. 37). Answering the question of what constitutes the point of departure in televisual media discourse involves the careful consideration of “[w]hat presences and absences, foregrounding and backgrounding, characterize the text” (Fairclough, 1995, p. 203). Drawing upon Van Leeuwen’s (1995, 1996, 2008) framework for the representation of social actors and social action, we can establish that social actors may either be ‘included’ or ‘excluded’ in discursive representations. If they are excluded, this may be due to acts of suppression or backgrounding, which means that a “social actor may not be mentioned in relation to a given activity”, but may be mentioned elsewhere in the text. In the case of backgrounding, actors are “not so much excluded as de-emphasized, pushed into the background” (Van Leeuwen, 1996, p. 39). Acts of inclusion, on the other hand, are generally concerned with the roles that are accorded to social actors in discourse (Van Leeuwen, 1996, p. 42). The analysis thus seeks to determine whether news affiliates are endowed with active or passive roles, and what the implications of these role-allocations are. For example, there is a significant difference in the ways in which speech (and its contents) is attributed to particular news actors in discourse (Scollon, 1998, pp. 217–219). In CNBC’s and FBN’s videobites, for instance, the speech verb “reports” tends to collocate with news presentation styles that are “shaped and delivered under the constraint of ‘doing facticity’ (sounding objective and unbiased)” (Montgomery, 2007, p. 128) (see Examples 1a–b; see also Table 8.3, Example A, for an illustration of representations of news affiliates and institutional actors in corresponding videobites and news videos).
Example (1)
a. Bank of America’s Ken Lewis holds an emergency meeting with Merrill’s John Thain today, reports CNBC’s Charlie Gasparino. (CNBC Video “Lewis & Thain Meeting”, January 23, 2009)
b. FBN’s Adam Shapiro reports on Bernard L. Madoff Securities CFO Frank DiPascali’s role in the alleged ponzi scandal. (FBN Video “Exclusive Madoff CFO Under Fire”, January 21, 2009)
Other discursive means by which the reporter is constructed in CNBC’s videobites include “has the latest”/“has the details”, which, according to Scollon (1998, pp. 164–165), often imply a “reportorial” function (see Examples 2a–b).

Table 8.3  Representations of news affiliates and institutional actors in videobites and news videos on popular business news networks
Example A: Representation of affiliate expert ‘doing facticity’ on CNBC
Example B: Representation of affiliate expert reporting from the ‘field’ on CNBC
Example C: Representation of affiliate expert in possession of an incriminating document on CNBC
Example D: Foregrounding of actions and reactions in representations of multiparty interactions on CNBC
Example E: Representations of affiliate experts in lower thirds and relational identification with the news network through multiple displays in the main visual frame and studio background on CNBC
Example F: Representations of celebrity news affiliates on CNBC
Example G: Representations of news affiliates divested of hierarchical or professional status on Bloomberg
Example H: Representations of the institutional ‘self’ through animated graphic displays and studio setting on CNBC and Bloomberg

Example (2)
a. CNBC’s Diana Olick has the latest housing data. (CNBC Video “Housing Data Latest”, January 23, 2009)
b. CNBC has obtained one of the pitch books used to lure investors to Bernard Madoff. Scott Cohn has the details. (CNBC Video “Madoff’s Pitch Books”, January 22, 2009)
On CNBC, “has the latest”/“has the details” appears to be reserved for affiliate experts, that is, specialist reporters and correspondents. Such role identifications commonly prefigure affiliated (live two-way) interviews with embedded news reports, often delivered from ‘the field’ or an ‘off-site’ studio location (see also Table 8.3, Example B). Interpreted in terms of Van Leeuwen’s (2008) framework for social actors and social action, the phrases “has the latest”/“has the details” imply a passive (rather than an active) participant role. In addition, on CNBC, the terms “has the latest”/“has the details” may be used to describe the physical act of possession. As shown in Table 8.3, Example C, the news affiliate named in Example 2b is indeed portrayed in possession of the incriminating document referred to in the accompanying headline and summary-lead. Although news affiliates tend to be visually foregrounded in the videobite thumbnail images on CNBC, they are seldom named in the videobite headline and rarely thematized in the videobite summary-lead. Instead, in the majority of CNBC’s videobites, news affiliates are represented as

comitative or accompanying elements in circumstantial clauses, realized by means of the prepositional phrase with, even when they officiate as anchors and presenters in the embedded video clip. What appears to be foregrounded in the videobite summary-leads on CNBC are process types—actions and reactions, such as “discussing”, “breaking down”, “reviewing” (see Examples 3a–c).
Example (3)
a. Discussing Treasury Secretary designate Timothy Geithner’s tax mistakes, with David Kotok, Cumberland Advisors; CNBC’s John Harwood, Steve Liesman, Charlie Gasparino, and Rick Santelli. (CNBC Video “Geithner Testimony Reaction”, January 22, 2009)
b. Breaking down the jobless claims and Housing Starts data, with Jim Iuorio, TJM Institutional Services, and CNBC’s Rick Santelli. (CNBC Video “Economic Data”, January 22, 2009)
c. Reviewing Google’s fourth-quarter earnings, with CNBC’s Jim Goldman; Porter Bibb, Mediatech Capital Partners; Glenn Hutchins, Silver Lake; and CNBC’s Maria Bartiromo. (CNBC Video “Google’s Q4 Results”, January 23, 2009)
These process types in the videobite summary-lead function to index open and dynamic presentational formats in the embedded video clip, where news affiliates are not tasked with simply reporting the news but instead actively participate in the interaction. CNBC and FBN, in particular, favour multiparty panels and round table discussions that are characterized by a high degree of topical freedom, fluid speaker turns, and open-ended discourse frames (see also Tan, 2011, p. 187) where anchors/presenters and other news affiliates engage in live unscripted conversation and debate (see Table 8.3, Example D). Anchors/presenters who facilitate these multiparty interactions on CNBC often do not identify themselves or their network, nor are they identified in the form of on-screen text. Only affiliate experts (i.e., reporters, correspondents, or editors) are named and introduced to the audience by the anchor/presenter in spoken discourse. The process of naming and introducing affiliate experts thus functions to delegate the explicit right to speak to these interlocutors (see also Bell, 1991, p. 196; Scollon, 1998, p. 163). In their simplest form, delegation frames may contain only names or vocatives (calls). Calls are particularly favoured for allocating speech turns to affiliate experts (see Examples 4a–b), and can range from familiar (hypocorisms or diminutives), to informal (given name only), to semi-formal (given name

and surname) invocations, dependent upon an anchor’s individual presentational style.
Example (4)
a. Anchor/Presenter: “John Thain is out at Bank of America. . . . CNBC’s Charlie Gasparino broke the story earlier today. He is here with us back with more details. Charlie.” (CNBC Video “Thain to Leave BofA”, January 23, 2009)
b. Anchor/Presenter: “Google minutes away from its fourth quarter. Silicon Valley Bureau Chief Jim Goldman with the set up. Ad sales obviously at the centre piece of this thing. Jim. What are we gonna get?” (CNBC Video “Google Q4 Preview”, January 23, 2009)
In these interactions, only the identity of the affiliate expert is co-constructed visio-textually through the placement of on-screen text in the lower thirds. On CNBC, on-screen representations often include the affiliate experts’ functional categorizations, whereby they are referred to in terms of their distinctive occupational roles (e.g., “On-Air Editor”; “Silicon Valley Bureau Chief”), as well as relational identification and affiliation with the news network or program through multiple displays of corporate logos, circumstances of time (“live”), and location in the main visual frame and studio background (see also Table 8.3, Example E). According to Scollon (1998, p. 174), the question of identity in news discourse needs to be understood as a matter of not only naming and identifying news actors, but also accounting for their authority. How news affiliates are being positioned in discourse tends to reflect the place they hold within the organizational framework of the news agency. Anchors/presenters and affiliate experts are at the top of the institutional hierarchy and thus represent the ‘public face’ of the news network (see also Budd et al., 1999, p. 124). Foregrounding and promoting their centrality also lends emphasis to the popularity of personality-driven journalism that is prevalent in the United States (e.g., see Kurtz, 2000, p. 125). The act of foregrounding anchors/presenters and affiliate experts in news discourse draws particular attention to the values that are attached to the personal characteristics of ‘front-stage’ news personnel (see also Budd et al., 1999, p. 125; Coupland, 2001, p. 417). Indeed, it is not uncommon for anchors and presenters to become “stars with fans, high salaries, and public personas” in the United States (Budd et al., 1999, p. 124). In many cases, anchors’ names may even be part of their own syndicated TV show or news program (Budd et al., 1999, p. 124; see also Clark et al., 2004; Kurtz, 2000).

This study shows that only news affiliates with ‘celebrity’ status are foregrounded consistently across multiple semiotic modes in the videobite thumbnail, headline, and summary-lead, where they are accorded active participant roles in thematic position, as well as in subordinate clauses in the summary-lead (see also Table 8.3, Example F). Moreover, the proposition placed in thematic position in the videobite headline functions to self-reflexively index the title of the program or program segment that these news affiliates are shown to be hosting in the embedded news video (e.g., “Stop Trading, Listen to Cramer!”; “Maria’s Market Message”). The attention focused on news affiliates with ‘star’ appeal is, in part, attributable to the ‘mighty’ media apparatus that is built around the personalities of certain news workers in the finance media, which has turned some of them into ‘big-name’ celebrities, not just in the realms of business and finance, but in popular media as well (Kurtz, 2000, p. 209; see also Friedman, 2010, for anecdotal evidence). As a result, acclaimed financial journalists like CNBC’s Maria Bartiromo or flamboyant market commentators like Jim Cramer have acquired iconic statuses that are synonymous with the pursuits of their news organization. As Clark and colleagues (2004, p. 293) aptly comment, these celebrity news affiliates do not simply (re)present financial news, they virtually embody it. They are the “commodity that sells the news” (see Budd et al., 1999, p. 124). In effect, as Kurtz observes (2000, pp. 292–293), CNBC and its ‘star’ anchors have acquired the status of a pop culture icon: The network was not only featured in the popular American television drama series The Sopranos (1999–2007), but also was more recently cast in a major ‘supporting role’ in Oliver Stone’s film Wall Street: Money Never Sleeps (2010), with several CNBC ‘star’ anchor personalities—Maria Bartiromo and Jim Cramer amongst them—appearing in so many cameo roles that the film may be seen as “product placement for CNBC” (Friedman, 2010). Nonetheless, not all business news networks foreground the persona of the news affiliate to the same extent. In contrast to the high visibility and authorial freedom that is accorded to news affiliates on CNBC and FBN (e.g., see Tan, 2011, 2012), news affiliates on Bloomberg tend to be excluded from visual representation in the videobite thumbnail and backgrounded in discourse. Bloomberg’s news affiliates are generally divested of their hierarchical or professional status and construed simply as representing or ‘belonging’ to the news organization per se. Visio-textual representations of news affiliates in the lower thirds, for example, are limited to on-screen nomination and relational identification with the news network (e.g., “Laura Lee, Bloomberg News”; see Table 8.3, Example G). Moreover, Bloomberg’s news affiliates are consistently passivated in textual representations in the videobite summary-leads. Interpreted in terms of Van Leeuwen’s (2008) frameworks, the participatory role that is assigned to news affiliates by the verbal group “talks with . . . about” represents an instance of beneficialization, whereby the beneficialized participant (that

is, the news affiliate in this case) is construed as the receiver of information in relation to a verbal process (see Examples 5a–b).
Example (5)
a. Bill Smith, chief executive officer of Smith Asset Management Inc., talks with Bloomberg’s Matt Miller about Citigroup Inc.’s move to make Richard Parsons chairman of its board, replacing Win Bischoff. (Bloomberg Video “Smith Says Parsons to Oversee Dismantling of Citigroup”, January 21, 2009)
b. Marshall Front, chairman of Front Barnett Associates LLC, talks with Bloomberg’s Carol Massar about President John Thain’s agreement to leave Bank of America Corp. (Bloomberg Video “Front Says Thain Had to Take Fall for Bank of America”, January 22, 2009)
Indeed, a more comprehensive analysis of discursive processes has shown that it is not news affiliates but high-ranking expert interviewees from the fields of business and finance who are foregrounded across modes and media in online representations on Bloomberg (see Tan, 2012).
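The surface patterns discussed in this section (the speech verb “reports”, “has the latest”/“has the details”, the comitative “with”, and “talks with . . . about”) could in principle be tagged automatically before closer qualitative reading. The following Python sketch illustrates that idea only. The label names and regular expressions are assumptions made for illustration and are not the coding procedure used in the study; they would need checking against the full corpus.

```python
import re

# Hypothetical surface-pattern labels. The regular expressions are
# illustrative assumptions, not the study's coding scheme. Order matters:
# the first pattern that matches determines the label.
ROLE_PATTERNS = [
    ("talks_with_about", re.compile(r"\btalks with\b.+\babout\b")),
    ("has_the_latest", re.compile(r"\bhas the (?:latest|details)\b")),
    ("reports", re.compile(r"\breports\b")),
    ("comitative_with", re.compile(r",\s+with\b")),
]


def tag_summary_lead(text: str) -> str:
    """Return the label of the first matching pattern, or 'unclassified'."""
    for label, pattern in ROLE_PATTERNS:
        if pattern.search(text):
            return label
    return "unclassified"


# Summary-leads quoted in Examples 1 to 5 of this chapter.
leads = [
    "FBN’s Adam Shapiro reports on Bernard L. Madoff Securities CFO "
    "Frank DiPascali’s role in the alleged ponzi scandal.",
    "CNBC’s Diana Olick has the latest housing data.",
    "Breaking down the jobless claims and Housing Starts data, with "
    "Jim Iuorio, TJM Institutional Services, and CNBC’s Rick Santelli.",
    "Marshall Front, chairman of Front Barnett Associates LLC, talks with "
    "Bloomberg’s Carol Massar about President John Thain’s agreement to "
    "leave Bank of America Corp.",
]

labels = [tag_summary_lead(lead) for lead in leads]
```

A tagger of this kind only sorts summary-leads by wording. Whether a given pattern realizes an active, passive, or beneficialized role remains an interpretative judgment of the kind made in the analysis above.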

6.2  Representations of Institutional Actors Business news programs often begin with, or include, an explicit institutional frame in which anchors/presenters welcome and greet the audience, identify themselves by name, and affirm their affiliation with the news network by stating the title of the program or program segment. Institutional frames are presented in direct visual address to camera and often include circumstantial elements of place and time (see Example 6a). Institutional frames may also include iconic images (logos and operational news values) expressed in the form of animated graphic displays, together with synchronized sound images (see Example 6b).
Example (6)
a. Anchor/Presenter: “All right. Welcome back to Starting Bell, live from New York. I’m Matt Miller, in for Betty Liu on this Thursday.” (Bloomberg Video “Today’s Economic Data: U.S. Initial Jobless Claims Rose 62K to 589K, Housing Starts Fall 15.5% to 550K”, January 22, 2009)
b. [Animated graphic display of operational news value “BREAKING NEWS”; pulsating sound image] Anchor/Presenter: “Welcome to the Call. I’m Charlie Gasparino.” (CNBC Video “John Thain to Leave Bank of America”, January 23, 2009)
This foregrounding of the network’s corporate identity and institutional values, as Montgomery (2007, p. 193) argues, must not be (mis)construed as a representation of corporate branding. Studio systems should not be regarded simply as a form of “backdrop to news presentation”, but rather as “the site from where the news is enunciated in the here and now” (Montgomery, 2007, p. 76; emphasis added; see also Allan, 1998, p. 129). According to Allan (1998), televisual news “provide[s] an up-to-the-minute (now) narrative which, in turn, projects for the viewer a particular place (here) from which she or he may ‘make sense’ of the significance of certain ‘newsworthy’ events” (p. 105). This is usually achieved by the use of deictics “which anchor the articulation of time (‘now’, ‘at this moment’, ‘currently’, ‘as we are speaking’, ‘ongoing’, ‘today’) to that of space (‘here’)” (Allan, 1998, p. 126). On CNBC, the institutional narratives that anchor the news in the ‘here and now’ function to foreground operational news values and ideologies. In addition, they are often explicitly self-reflexive. For instance, the majority of videobites for News Now—a series of short hourly headline news briefs—consistently foreground the program title, logo, and institutional news values in both the headline and thumbnail image (e.g., “The top headlines this hour”; “The hour’s top business headlines”). In addition, thumbnail images that represent the anchor/presenter in direct address to camera tend to portray the anchor in an archetypal institutionalized setting (to a far greater extent than other news affiliates), framed either by multiple displays of the network’s corporate logo or by the remediated display of the program title in the background. 
Institutional identities may also be represented in the form of animated graphic displays, which co-deploy various visual and aural semiotic resources such as dynamic revolving, appearing, and disappearing iconic images and logos, text in large three-dimensional font size and vibrant colours, and a dynamic ‘breaking up’ of the text or logographic image as it fades and disappears (see also Smith, Tan, Podlasov, & O’Halloran, 2011) (for illustration, see Table 8.3, Example H, top row). Often, these dynamic institutional displays are accompanied by pulsating music or sound textures such as buzzing or ‘swooshing’ sounds (see also Van Leeuwen, 1999). While animated graphic displays generally function as transitional devices, they do have a tendency to virtually ‘explode’ onto the screen in extreme close-ups. They thus have the capacity to engage the viewer interpersonally. Moreover, such displays of institutional actors are inherently self-reflexive, drawing attention to the title of the program or program segment. If displayed as part of the studio backdrop or setting, they tend to place emphasis on the traditional medium of reception—the ubiquitous television screen—which often occupies centre stage in these representations (see Table 8.3, Example H, bottom row).
7. CONCLUSION The analysis has shown that televisual business news discourse mediated on the Internet is an inherently hybrid genre that is reflective of the distinctive institutional practices and ideologies that are built around the televisual apparatus by each news network. As Machin and Niblock (2006) observe, the forces of competition may lead news networks to focus increasingly on ‘branding’ to distinguish themselves from other, similar networks (p. 21). According to Budd and colleagues (1999), this is also reflected in the tendency of news genres to become increasingly reflexive by referring more and more to themselves “with programs and ads referring to other programs” and by drawing explicit attention to their own “constructedness” (p. 135; see also Iedema, 2003, p. 38). Consequently, the adopted presentational styles are also bound to be reflective of the networks’ sociocultural and historical values. The fact that business news networks such as CNBC and FBN have their historic roots in mainstream network television may perhaps explain the emphasis that is accorded to the news affiliate, specifically the affiliate expert or ‘star’ anchor, as well as the networks’ proclivity for informalized, conversationalized, and highly charged interpersonal discourse styles, which may appeal to a wider circle of general audiences with varying levels of financial literacy. 
Conversely, the more restrained and self-effacing presentational style favoured by Bloomberg may be intended to appeal to the refined tastes and preferences of a specialist target audience comprised largely of high-net-worth individuals (cf. Pew Research Center, 2010). It needs to be acknowledged, of course, that the positions that are created for audiences in news discourse have always been treated as controversial. Scollon (1998, pp. 185–188), for example, posits that the primary audience of most news discourse is, in fact, other journalists, where viewers are positioned as mere spectators in the journalist’s ‘conversational game’. For Clark and colleagues (2004), business news discourse is enacted through “the rule of theatricality” (p. 293). They argue that newly emergent media practices owe more to entertainment television than to “serious and sober” financial journalism which, they claim, has been transformed into a form of entertainment that is dominated by a cacophony of people, noises, and a constantly changing array of images and information (pp. 297–303). The increasing shift toward conversationalization and informalization has been observed by other media theorists, such as Fairclough (1995, pp. 10–11), who claims that mounting commercial pressures and competition have implications for the ways news media represent themselves to their audiences. For Fairclough (1995), this change in media practices not only reflects an attempt to increase audience appeal, but also affects the form and style of delivery (see Fairclough, 1995, pp. 42–43; see also Clark et al., 2004, p. 293). While some media researchers and theorists view these changing media practices with suspicion, others feel that current developments in personalizing the news “should instead be viewed as a positive contribution to its communicative ethos” (Thornborrow & Montgomery, 2010, pp. 100–101). Moreover, the emergence of network news has produced discursive styles that are more relaxed, open and dynamic, and geared toward greater naturalism, compared to more traditional styles of news presentation (Montgomery, 2007, p. 196). As Clark and colleagues (2004) observe, this tendency toward greater naturalism and popularization can in fact “breathe life” into the otherwise sober world of business and financial journalism (p. 298). As a result, business and financial news will become accessible to a wider circle of general and increasingly diverse audiences.
NOTE 1. The data presented in this article are part of the author’s PhD research (Tan, 2012), undertaken within the Events in the World project at the Multimodal Analysis Lab, Interactive Digital Media Institute (IDMI) at the National University of Singapore. The research was supported by the Interactive Digital Media Program Office (IDMPO) in Singapore under a National Research Foundation (NRF) grant (grant number: NRF2007IDM-IDM002–066).

REFERENCES

Allan, S. (1998). News from NowHere: Televisual news discourse and the construction of hegemony. In A. Bell & P. Garrett (Eds.), Approaches to media discourse (pp. 105–141). Oxford: Blackwell.
Bell, A. (1991). The language of news media. Oxford: Blackwell.
Bolter, J. D., & Grusin, R. (1998). Remediation: Understanding new media. Cambridge, MA: MIT Press.
Budd, M., Craig, S., & Steinman, C. (1999). Consuming environments: Television and commercial culture. New Brunswick, NJ: Rutgers University Press.
Clark, G. L., Thrift, N., & Tickell, A. (2004). Performing finance: The industry, the media and its image. Review of International Political Economy, 11(2), 289–310.
Clayman, S., & Heritage, J. (2002). The news interview: Journalists and public figures on the air. New York, NY: Cambridge University Press.
Coupland, N. (2001). Stylization, authenticity and TV news review. Discourse Studies, 3(4), 413–442.
Fairclough, N. (1995). Media discourse. London: E. Arnold.
Fairclough, N. (2001). Critical discourse analysis as a method in social scientific research. In R. Wodak & M. Meyer (Eds.), Methods of critical discourse analysis (pp. 121–138). London: Sage Publications.

Friedman, J. (2010, September 22). Maria Bartiromo, CNBC star in “Wall Street 2.” Market watch: The Wall Street Journal Digital Network. Retrieved from–09–22
Graddol, D. (1994). The visual accomplishment of factuality. In D. Graddol & O. Boyd-Barrett (Eds.), Media texts, authors and readers: A reader (pp. 136–160). Clevedon: Multilingual Matters in association with The Open University.
Halliday, M. A. K. (1994 [1985]). An introduction to functional grammar. Second Edition. London: Edward Arnold.
Iedema, R. (2003). Multimodality, resemiotization: Extending the analysis of discourse as multi-semiotic practice. Visual Communication, 2(1), 29–57.
Knox, J. (2007). Visual-verbal communication on online newspaper home pages. Visual Communication, 6(1), 19–53.
Knox, J. S. (2009). Punctuating the home page: Image as language in an online newspaper. Discourse & Communication, 3(2), 145–172.
Kurtz, H. (2000). The fortune tellers: Inside Wall Street’s game of money, media, and manipulation. New York, NY: Free Press.
Lauerbach, G. E. (2007). Presenting television election nights in Britain, the United States and Germany: Cross-cultural analyses. In A. Fetzer & G. E. Lauerbach (Eds.), Political discourse in the media: Cross-cultural perspectives (pp. 315–375). Amsterdam: John Benjamins.
Machin, D. (2007). Introduction to multimodal analysis. London: Hodder Arnold.
Machin, D., & Niblock, S. (2006). News production: Theory and practice. Abingdon: Routledge.
Mahar, M. (2003). Bull! A history of the boom, 1982–1999. New York, NY: HarperCollins.
Meyer, M. (2001). Between theory, method, and politics: Positioning of the approaches to CDA. In R. Wodak & M. Meyer (Eds.), Methods of critical discourse analysis (pp. 14–31). London: Sage Publications.
Montgomery, M. (2007). The discourse of broadcast news: A linguistic approach. New York, NY: Routledge.
Montgomery, M. (2008). The discourse of the broadcast news interview. Journalism Studies, 9(2), 260–277.
Pew Research Center. (2010). The state of the news media 2010: An annual report on American journalism. Pew Project for Excellence in Journalism. Retrieved January 3, 2011, from
Schieß, R. (2007). Information meets entertainment: A visual analysis of election night TV programs across cultures. In A. Fetzer & G. E. Lauerbach (Eds.), Political discourse in the media: Cross-cultural perspectives (pp. 275–313). Amsterdam: John Benjamins.
Scollon, R. (1998). Mediated discourse as social interaction: A study of news discourse. London: Longman.
Smith, B. A., Tan, S., Podlasov, A., & O’Halloran, K. L. (2011). Analyzing multimodality in an interactive digital environment: Software as metasemiotic tool. Social Semiotics, 21(3), 359–380.
Tan, S. (2011). Facts, opinions and media spectacle: Exploring representations of business news on the internet. Discourse & Communication, 5(2), 169–194.
Tan, S. (2012). Multimodal approaches to business news discourse mediated on the Internet and television. (PhD thesis). National University of Singapore.
Thornborrow, J., & Montgomery, M. (2010). Special issue on personalization in the broadcast news interview. Discourse & Communication, 4(2), 99–104.
Van Dijk, T. A. (1988). News analysis: Case studies of international and national news in the press. Hillsdale, NJ: Lawrence Erlbaum.

Van Dijk, T. A. (2001). Multidisciplinary CDA: A plea for diversity. In R. Wodak & M. Meyer (Eds.), Methods of critical discourse analysis (pp. 95–120). London: Sage Publications.
Van Dijk, T. A. (2011). Discourse studies: A multidisciplinary introduction. London: Sage.
Van Leeuwen, T. (1995). Representing social action. Discourse & Society, 6(1), 81–106.
Van Leeuwen, T. (1996). The representation of social actors. In C. R. Caldas-Coulthard & M. Coulthard (Eds.), Texts and practices: Readings in critical discourse analysis (pp. 32–70). London: Routledge.
Van Leeuwen, T. (1999). Speech, music, sound. Basingstoke: MacMillan.
Van Leeuwen, T. (2008). Discourse and practice: New tools for critical discourse analysis. New York, NY: Oxford University Press.


Selling the ‘Indie Taste’: A Social Semiotic Analysis of frankie Magazine

Sumin Zhao

1.  THE STORY OF FRANKIE

Between December 2009 and December 2011, the overall circulation of print magazines in Australia dropped by 5.61 percent (ABC, 2012). Bucking this downward trend is an independent (‘indie’) bimonthly women’s fashion and lifestyle magazine titled frankie, founded in a small apartment in suburban Melbourne in 2004 by two young Australian ‘creatives’—online editor Louise Bannister and creative director Lara Burke. With a 13.97 percent increase in its circulation during the reported period, frankie has become one of the fastest growing magazines in Australia. By the end of 2011, the monthly circulation of frankie reached 57,934, outnumbering 100-year-old publications of the same category such as Harper’s Bazaar (54,158) and Vogue (51,103) (ABC, 2012). The social media pages of frankie have attracted an even larger global following, with over 168,000 on Facebook and 52,000 on Twitter.1

In the magazine industry and the mainstream media, the success story of frankie is believed to reflect “an increasingly splintered and niche driven magazine industry” (Gearin, 2010). The ‘niche’ that distinguishes frankie from mainstream women’s fashion and lifestyle magazines lies in its ability to create, in its editor Jo Walker’s words, a mix between “edgy” and “daggy” and “something that feels a bit more genuine, a bit more real, a bit less rushed and mass produced” (Gearin, 2010). frankie claims to advocate for a new type of magazine-reader relation, in which the magazine does not see itself as “big up on high”, but “always on a level playing field with people” (Gearin, 2010). It positions itself as a magazine for “women (and men) looking for a magazine that is smart, funny, sarcastic, friendly, cute, rude, arty, curious and caring as they are”.2 With its resistance to mass production and emphasis on creativity and authenticity, frankie clearly identifies with what is known as the ‘indie’ subculture (cf. Duncombe, 1997; see also Bednarek, this volume). The commercial success of frankie, therefore, relies largely on the effective commodification of indie cultural values and practices, or to use Bourdieu’s (1984) term, the indie taste.

In this chapter, I explore the multimodal discursive strategies frankie employs in creating the ‘indie taste’ and negotiating its readership. In particular, I focus on the visual design, generic, and semantic features of a section in the magazine called “Frank Bits”, which is essentially eight pages of editorial promotions. I endeavour to provide critical insights into the ways in which frankie deals with the classic paradox of the indie subculture—opposing dominant capitalist consumer culture, while producing a distinct cultural taste for its own consumers (Newman, 2009). In highlighting the critical aspect of my analysis, I do not suggest that I am concerned with ‘criticizing’ the practices of frankie. Neither do I intend to provide an analysis of or a solution to the internal contradiction of indie culture. Rather, I am interested in revisiting a central question that still perplexes popular cultural studies, in particular women’s magazine studies, as postulated by Macdonald (1995): “Why is it easier to criticize those media that target us than to explain their fascination?” (p. 11). The fascination of frankie, as I will show, arises from the ways in which it commodifies the indie culture by ‘resolving’ the indie culture paradox at the discursive level, creating a discourse that is inclusive of those who identify with the indie subculture and those who read mainstream glossy women’s magazines.

My analysis in this chapter is informed by two scholarly traditions. On the one hand, it draws on classic critical studies of women’s magazines, in particular those in the poststructuralist tradition (Ballaster, Beetham, Frazer, & Hebron, 1991; Tabolt, 1992, 1995). On the other hand, it employs a range of tools developed in the tradition of Systemic-Functional Multimodal Discourse Analysis (SF-MDA), in particular those for analyzing visual design (Kress & Van Leeuwen, 2006), genre (Bakhtin, 1986; Martin & Rose, 2008), and evaluative language (Martin & White, 2005). The relation between the two research traditions is treated as dialogical in my approach.
The multimodal analysis allows the unpacking of social processes embedded in discursive patterns such as genre or evaluative language, whereas the critical lens on women’s magazines informs how these discursive patterns can be interpreted in relation to their broader sociocultural context.

In the following sections of the chapter, I will explore the sociocultural context of frankie through the lens of women’s magazine studies. I will then discuss the magazine’s key discursive features, with a focus on Frank Bits and the ways in which these discursive strategies allow the magazine to commodify indie culture while maintaining its core identity as an antithesis to the consumerist culture epitomized by mainstream glossy women’s magazines.

2.  FRANKIE THROUGH THE LENS OF WOMEN’S MAGAZINES STUDIES

The pervasiveness of women’s magazines in many contemporary societies across the globe (cf. Machin & Van Leeuwen, 2007) can be easily observed at local news agencies, where these magazines often occupy the largest shelf space. As popular media texts, women’s magazines have been analyzed extensively in gender and cultural studies. In the critical tradition, the study of women’s magazines provides an important path for understanding the gender ideologies prevalent in a given sociohistorical and cultural context. In her seminal work Forever Feminine: Women’s Magazines and the Cult of Femininity, Marjorie Ferguson (1983) argues that:

[women’s magazines] contribute to the wider cultural processes, which define the position of women in a given society at a given point in time. In this exchange with wider social structure, with processes of social change and social continuity, these journals help to shape both a woman’s view of herself and society’s view of her. (p. 1)

Studies of women’s magazines generally fall into three categories (for an extensive literature review, see Gough-Yates, 2003). The first type, which includes most studies, explores the ideological dimensions of magazines through textual analysis (e.g., Friedan, 1965; Macdonald, 1995; Winship, 1987). The second focuses on readership, that is, the ways in which readers consume women’s magazines (e.g., Ballaster et al., 1991; Hermes, 1995), while the third and comparatively rare type explores the production of women’s magazines (e.g., Ferguson, 1983; Gough-Yates, 2003). From the perspective of discourse analysis, the distinction between the analysis of texts, audience, and production made in women’s magazines studies is somewhat simplistic. Texts, textual production, and consumption, as well as the larger social processes in which these discursive practices are situated, do not warrant separate analyses. Rather, they constitute different dimensions of the same analysis (Fairclough, 1995; Van Leeuwen, 2008). Nevertheless, due to limitations of scope and space, my analysis of frankie in this chapter focuses predominantly on the textual level.
Early textual studies of women’s magazines were largely inspired by the second wave of Western feminism in the 1960s (Friedan, 1965). These early accounts were concerned with the ways in which women’s magazines offered ‘unreal’, ‘untruthful’, or ‘distorted’ images of women. These magazines were consequently viewed as a ‘problem’ for women, and were believed to contribute to the reinforcement of gender differences and inequalities in contemporary societies (Friedan, 1965; Tuchman, Daniels, & Benét, 1978). In the decades since the 1960s, there have been several distinctive phases in the critical understanding of women’s magazines in terms of gender ideology, subjectivity, and sexual identity. Studies in the 1970s and 1980s were influenced strongly by two neo-Marxist theorists, Louis Althusser and Antonio Gramsci, leading to a major shift away from criticizing women’s magazines simply for their negative representations of women. The neo-Marxist perspective viewed women’s magazines as a social institution in which ideological struggle took place. These magazines were considered either from an Althusserian view as a site for oppression that “contributed to overall subordination of women’s real identities” (Hermes, 1995, p. 223), or from a Gramscian view as a site where women’s oppression could be challenged and negotiated, rather than merely reinforced (e.g., Winship, 1987).

Since the 1990s, the analysis of women’s magazines (e.g., Ballaster et al., 1991; Tabolt, 1992) has been influenced by developments in postmodernist and poststructuralist theories, which see gender as a discursively construed notion (cf. Butler, 1990) and see women’s magazines as “bearers of particular discourses of femininity” (Ballaster et al., 1991, p. 127). From a poststructuralist perspective, “an individual’s identity is constructed at every moment through subject positions. These positions are taken up by the language user in the enactment of discourse practices and are constantly shifting” (Tabolt, 1995, p. 143). Attention to various historically and culturally specific forms of language use is thus a key feature of poststructuralist analyses of women’s magazines.

My analysis of frankie follows this tradition in women’s magazines studies, with its emphasis on the discursive construal of identity. However, my central concern here is not the construal of gender identity and femininity per se. Rather, I aim to explore the ways in which frankie ‘brands’ itself (Machin & Van Leeuwen, 2007) by creating alternatives, that is, alternative discursive forms and practices to those of mainstream glossy women’s magazines.

Lying at the heart of contemporary glossy women’s magazines since the 1980s is the ideology of ‘new women’ (Gough-Yates, 2003; Winship, 1987), referring to a new target readership of women’s magazines: young, professional, middle-class females in their twenties and thirties, distinguishable from traditional mass-market ‘housewives’ through their broader range of life experiences and “motivational distinctions” (Nixon, 1996, p. 3, cited in Gough-Yates, 2003, p. 2).
New women, according to Winship (1987), are essentially unrealizable ‘superwomen’ and a “commercial appropriation of the cultural space of feminism opened up minus most of the politics” (p. 150). Each glossy women’s magazine attempts to distinguish itself from its competitors by exploiting visual and linguistic patterns consistent with a particular ‘coding orientation’ (i.e., a particular semantic style) (Bernstein, 2000) and thus offers its readers a different but largely internally consistent ideology of femininity (Del-Teso-Craviotto, 2006; Eggins & Iedema, 1997). Nevertheless, the differences are resources used to naturalize ideology and create consumer choice rather than to create genuine diversity in gender ideologies (Eggins & Iedema, 1997).

To compete in the saturated women’s magazine market and attract an audience, therefore, frankie has to create and has indeed successfully created a new ‘coding orientation’ that can offer consumers more than just differences, but alternatives. Its main strategy, I shall argue, is to sideline the gender issue (e.g., the magazine has a gender-neutral name and positions itself as a magazine for “women [and men]”) and to appropriate the discursive practices and aesthetics of indie culture. To create and sell this distinct indie ‘taste’, frankie deploys a wide range of multimodal semiotic resources, which will be discussed in detail in the following sections. The discussion will be based primarily on an analysis of 12 issues (issues 24–35) of frankie, dated between July 2008 and June 2010. A preliminary content analysis (Goffman, 1979) was first conducted, focusing on relatively simple and broad conceptual categories such as basic genres, topics, and types of participants (i.e., interviewees) and objects (i.e., types of products promoted), followed by a qualitative social semiotic analysis of visual design, genre, and evaluative language in selected sections in the magazine, focusing closely on one of them—the Frank Bits.

3.  FRANKIE AS AN ALTERNATIVE: THE MULTIMODAL CONSTRUAL OF INDIE TASTE

To distinguish itself from mainstream glossy magazines, frankie has used a rich array of semiotic resources. The first and most immediate is that of the texture. While mainstream women’s magazines are published on glossy paper, and hence called ‘women’s glossies’, frankie first appeared on semiglossy paper (issues 1–33), and changed to matte paper in April/May 2010 (issue 34). This shift not only further distinguishes frankie from traditional glossies such as Vogue, but also allows it to associate itself with indie women’s magazines from other countries such as indie in Europe and Oh Comely in the United States. Using tactile texture (‘tactile’ as opposed to visually represented texture) as a semiotic resource to create an identity for a magazine is more than a reflection of the changing landscape of the publishing market in which a magazine needs to stand out physically from its competitors on the cramped display shelves. It also showcases the ways in which technological change affects the meaning potential of semiotic resources such as texture, colour, and sound (cf. Kress & Van Leeuwen, 2001).
As digital technologies steadily limit the role of tactile experiences (Djonov & Van Leeuwen, 2011), it becomes, I shall argue, increasingly important for magazines to explore the affordances and semiotic resources that are unique to the print media, such as tactile texture. As the market researcher Michele Levine puts it, “We know that people like the touch and feel of the magazine. . . . So unless iPads actually develop touch and feel and smell, they won’t give the same experience as a magazine” (Gearin, 2010).

A second means frankie employs to distinguish itself is visual design, most notably through visual modality. Visual modality refers to the ways in which images can represent participants naturalistically (i.e., as though they actually exist in this way), or as though they are imaginings, fantasies, caricatures, and so forth (Kress & Van Leeuwen, 2006). Glossies such as Cosmopolitan often adopt a high sensory visual modality (Machin & Thornborrow, 2003), which is characterized by high colour saturation, low colour differentiation (i.e., a limited variety of colour hues), low colour modulation (i.e., limited subtlety in colour transitions), and minimal backgrounds. In contrast, frankie largely adopts a medium to low sensory and more naturalistic visual modality, with medium colour saturation, modulation, and differentiation and more detailed backgrounds, showing people, objects, and their settings in greater detail. Since changing to matte paper, there has been a notable increase in naturalistic visual modality in frankie, with the lowering of colour saturation and modulation in most images. Importantly, while the use of high sensory modality tends to be consistent throughout a women’s glossy, there are stylistic variations within the overall ‘naturalistic’ modality of images in frankie, as vintage, Polaroid, and documentary photographs are also encountered.

While the high sensory modality in women’s glossies has “a connotation of mainstream modernist fashionability” (Machin & Thornborrow, 2003, p. 460), frankie’s increasingly naturalistic modality symbolizes a particular kind of personal and everyday reality that is highly celebrated in indie culture (Newman, 2009). However, by using naturalistic modality in visual design, what frankie ultimately creates is not, as the editor Jo Walker puts it, “something more real” (Gearin, 2010). Rather, it creates a sense of hyperreality (Eco, 1986). For instance, the two most commonly used visual styles in frankie include (1) the ‘imperfect’ everyday look, reproducing the effect of vintage 80s Polaroid photos, and (2) the ‘gritty’ look, reminiscent of the visual aesthetics of documentary photography. Clearly, neither style, in particular the Polaroid, with its often out-of-focus representations, shows how we actually see the world. They are essentially the discursive construal of what frankie or indie culture accepts as reality: the imperfect, everyday, and the gritty as opposed to the highly stylized and “deterritorialized” (Machin & Thornborrow, 2003, p. 460) fantasy world of the mainstream women’s glossies.
Although frankie is immediately distinct from its mainstream competitors in terms of its texture and visual modality, the magazine follows more or less the typical macro-generic structure (Martin & Rose, 2008) of a mainstream women’s magazine, presenting, in a relatively fixed order, basic genres such as letters to the editor, editorial promotions, interview features, fashion editorials, product reviews, witness/personal stories, and recipes. The basic genres in frankie can be categorized into two broad types—those that focus on sharing community experiences (e.g., letters to the editor, witness/personal stories, interviews, etc.) and those that promote products or endorse fashion trends (e.g., editorial promotions, product reviews, fashion editorials, etc.). If we accept that a specific type of macro-genre serves a certain social purpose and realizes a particular set of social processes (Martin & Rose, 2008), it is reasonable to argue that the basic social processes realized in frankie are not fundamentally different from those found in mainstream women’s magazines. Like the glossies (cf. Machin & Thornborrow, 2003; Tabolt, 1992, 1995; Winship, 1987), frankie at its core functions to construe a discursive community for its readers, in which they negotiate their identities by engaging in various social acts centred around consumption (e.g., fashion, beauty products, and cooking).

However, since frankie appropriates the indie culture—a culture that prides itself on being antimainstream and anticonsumerism—to distinguish itself from mainstream glossies, it becomes essential for the magazine to resolve a paradox—that is, to sell products without seemingly promoting any form of ostentatious consumerism. To achieve this, frankie develops a set of strategies that discursively recontextualize consumption, a point I shall elaborate next.

4.  FRANK BITS: SELLING THE INDIE TASTE

In this section, I use Frank Bits as an example of the ways in which frankie discursively recontextualizes consumption while maintaining the indie culture taste that the magazine relies on to distinguish itself from mainstream women’s glossies. Frank Bits, which follows the letters-to-the-editor section, typically consists of eight pages, often presented as four double spreads, of product promotions (Figure 9.1). The section’s title is a pun on the magazine’s name and the word ‘frank’, and signifies simultaneously two indie culture characteristics: creativity and authenticity (i.e., we are honest about what we recommend). The section adopts a distinctive ‘clean’ layout design, evoking the visual aesthetics of modern Scandinavian design. Feature texts on the page are strongly framed (i.e., disconnected from each other) by largely empty white space (for framing, see Kress & Van Leeuwen, 2006), while the image and the verbal text within a feature are weakly framed (e.g., “Shoe the bear” on the bottom-right in Figure 9.1). The feature texts in Frank Bits present three types of promotion: (1) of material products (largely clothing, handmade fashion accessories, and household decorations), (2) of independent musicians/bands (in the forms of interviews), and (3) of social/cultural events (exhibitions, fairs, and concerts).
In the 12 issues (24–35) analyzed in this chapter, for instance, there are in total 341 feature texts in Frank Bits, with 265 product endorsements, 57 interviews of musicians/bands, and 19 advertisements for social and cultural events. The layout design in Frank Bits, however, does not make distinctions between these three different types of promotion. That is, by looking at each feature text without reading it, it is not easy to tell a band interview from a product advertisement. For instance, judging by visual design alone, the feature text “fill in the blanks with” at the top right of Figure 9.1 could be a promotion for a fashion label. This particular design choice serves more than an aesthetic or ergonomic function (e.g., to maintain visual consistency and facilitate legibility). The fact that band interviews are mixed with ‘straight’ advertisements (and there are also separate band/musician interview sections in the magazine) is a first indication of frankie’s attempt to weaken the traditional classificatory boundaries (Bernstein, 2000) between cultural activities and consumption. Simply put, frankie construes culture as consumption and vice versa.

Figure 9.1  Frank Bits (frankie, issue 35, pp. 18–19)

This blurring of boundaries between consumption and culture is best exemplified in ‘straight’ advertisements, where the main purpose of the texts is to promote products such as clothing and fashion accessories. Product promotion texts in Frank Bits share the same basic generic elements, including product, brand, price, and retailer store, which are highlighted in bold in the magazine, as illustrated in Text 1.

Text 1
Holy Crap! How good is this? Pretty up your pushie with a Carrier basket [product] from XXXX [brand] and you’re sure to be the envy of bike-loving folk everywhere. Inspired by intricate Swedish crochet, it comes in white, green and black and will set you back $149 [price] from [retailer].

While all the verbal texts promoting products in Frank Bits include the elements that facilitate the sale of products (i.e., telling readers what to buy and where), not all resemble standard print advertisements as does Text 1. There is extensive use of ‘contextual metaphor’ (cf. Martin & Rose, 2008; for cognitive metaphor in advertising, see Forceville, this volume); that is, a different genre (the source genre) is used to fulfil the function of an advertisement (the target genre). Text 2, for instance, parodies a confession by one romantic partner to another, and presents the four basic generic ad elements as the climax of that confession.

Text 2
“Darling, I’ve got to tell you something.”
“What is that, dear?”
“All these years, I’ve been living a lie.”
“What?!”
“I can’t explain it. But my tiger hand can, in a funny voice with fake roars”.
“Well, thank goodness for that . . . By the way what IS that?”
“It’s an Animal Hands Temporary Tattoos [product], $12.5 [price] from XXXX [brand]. I brought it online at XXXX [retailer].”
“That was surprisingly informative summary. Thank you.”
“I spent the last of our savings on it.”
“Oh.”

Text 3 disguises an advertisement as an interview presented in question-answer format, with the actual product (shoes) not mentioned until the very end.

Text 3
Name? XXXX [brand]
Position? Team manager (skate)
What’s the XXXX store in a few words? Out of L.A. Young, fresh and innovative.
Price point? Entry to high [price]
Where can we get them? Select stores and skate shops. [retailer]
Why should we part with our hard-earned cash for XXXX? Best money ever spent.
Three words to think of when you think of XXXX? Your next shoes! [product]
Complete this sentence: You should wear our shoes when. . . . Chilling out, rocking out and about.

Notably, frankie’s use of contextual metaphor differs from narrative advertising (i.e., when a story is employed to sell a product) (see Cook, 2001 [1992], 39ff, for examples and discussion of advertising as parasite discourse). In narrative advertising, a narrative may serve to create a shared experience and invoke an emotional response to the product (e.g., Padgett & Allen, 1997). This purpose of narrative in advertising is congruent with the social function of conventional narratives in English-speaking culture, which is to share and evaluate experience (e.g., Labov, 1972; Martin & Rose, 2008). What differentiates the two is the type of experience that is being shared: the consumption of a product versus, say, the birth of a child. (Note that information such as pricing and retailer is highly unlikely to appear in the story itself.) In Texts 2 and 3, by contrast, the social purposes of the two genres—the target genre (ads that promote the product and provide purchase information) and the source genre (confessional talk or an interview)—are not congruent. The main social function of confession, for instance, is to regulate morality in Christian culture (e.g., the confession of sin).

At the surface level, the use of contextual metaphor in Frank Bits serves to create a sense of being ‘smart’, ‘funny’, and ‘sarcastic’ as emphasized in frankie’s self-description cited at the beginning of this chapter.
It also helps to create a sense of diversity and sophistication, offsetting the ‘monotonous’ visuals where the images used are predominately decontextualized and conceptual, that is, serving the function of presenting the essence of objects and individuals rather than their involvement in various actions (see Kress & Van Leeuwen, 2006; see examples in Figure 9.1).

Beyond its surface value, contextual metaphor realizes two critical discursive functions in frankie. First, it helps to construe a particular type of readership. To be able to wink at the smartness and the humour of these texts requires that the reader recognize both the source genre (e.g., the confession) used in the contextual metaphor and the incongruence between the social purposes of the target and source genres (e.g., the confession and the advertisement). With access to a variety of generic styles and the ability to understand contextual metaphor, the ‘ideal’ frankie reader is no doubt one that possesses a considerable amount of ‘cultural capital’ (Bourdieu, 1986). Interestingly, however, even without the right cultural capital, a reader still can engage with the primary purpose of these feature texts, since all key purchase information (e.g., brand, price, and retailer) is highlighted in bold. A reader does not have to recognize or agree with frankie’s ‘sense of humour’ to purchase the product.

Second, and more importantly, contextual metaphor helps further blur the line, or weaken the classification boundaries, between consumption and other sociocultural processes. In engaging with these texts, the reader engages simultaneously with two social processes—the consumption of a particular cultural semiotic artefact (i.e., a ‘smart and funny’ text) and the potential consumption of the advertised product. In short, contextual metaphor allows frankie to recontextualize consumerism as the consumption of culture.

Besides using contextual metaphor at the level of genre, Frank Bits employs evaluative language to construe its readership and recontextualize consumption. This is done in two distinctive ways. The first is the proliferation of affect (Martin & White, 2005), the linguistic resources for construing emotions. Affect is private and personal in nature, while the other two resources for construing evaluation in language—appreciation (concerned with the evaluation of things) and judgement (the evaluation of people)—focus on institutionalized feelings. In Frank Bits, institutional evaluation is achieved through affect rather than appreciation or judgement. Text 4 lists several of these examples, where frankie (“we”) promotes the product through positive affect.
(Affect resources are highlighted in bold.)

Text 4
• Ooh—we covet [affect: inclination] the new AW range from XXXX. We covet [affect: inclination] it.
• We are a bit fond of [affect: happiness] these new wooden XXXX from XXXX.
• We are liking it so much [affect: happiness] that we’re telling you it’s back in shops for another round.

Affect is also used to construe the imagined evaluation of the reader-consumer, addressed as “you”.

Text 5
• . . . because just about everything in it makes you want to grin with joy and hug passers-by [affect: happiness].
• Feeling the need to express your love [affect: happiness] for the Victorian capital in pillow form?
• If you fancy [affect: inclination] wearing your intellectual heart on your sleeve. . . .
• But we in Australia shouldn’t despair [affect: unhappiness] because XXXX also runs a website where you can enjoy [affect: happiness] the itty-bitty experience online and order your very own XXXX.

What is particularly interesting is that Frank Bits also frequently evaluates the emotions of the designers/makers of the promoted products (instead of judging their capacity or skills, e.g., ‘talented’), as the examples highlighted in Text 6 suggest.

Text 6
• There’s a lot of love [affect: happiness] going behind the scenes at XXXX. The design trio behind the brand—Kelly XXXX, Scott XXXX and Maya XXXX—are three kinds of happy [affect: happiness] with each other, being siblings (Kelly and Scott), partners (Maya and Scott) and best friends (Kelly and Maya). There’s a lot to love [affect: happiness] in their clothes as well.
• “Once upon a time there was a girl in a house who loved [affect: happiness] to draw. It wasn’t long before these drawings became an obsession and she lived for [token affect: happiness] her creations.” This is the story of Ali J, a talented [judgement: capacity] illustrator. . . .

By focusing on private feelings rather than institutional evaluation, frankie recontextualizes consumer promotion and consumption as an interpersonal exchange of emotions between ‘us’ (frankie), ‘you’ (the reader), and those who produce ‘the things we and you love’. In this way, it creates for the readers a ‘synthetic personalization’ (Fairclough, 1989), by which the readers are caught up in a bogus community whose members bond around consumer products with the “frankie-loves” stamps.

Promoting consumer products, however, is not the sole purpose of Frank Bits. As the earlier discussion of contextual metaphor has shown, frankie always simultaneously promotes intangible, cultural products. One such product is ‘creativity’, that is, the process of creating a (consumer) product rather than the product itself.
One discursive strategy used in Frank Bits to sell creativity, a key value of indie culture, is to construe an elaborate narrative around the design and production processes, as in the second example listed in Text 6. Text 7 is another example in which the creation of a product is represented through a narrative.

Text 7
Jessica Sutton’s toy and accessory label XXXX came to life during a cold Canadian winter, when the Sydneysider was spending time in Northern Alberta. To keep her hands busy, she started turning out sock puppets, vinyl pencils cases and wallets, eventually turning to plush bears and other cutesy critters.

It is particularly interesting to note that creative processes are often construed as fantasy and fairy tales. The example in Text 6, for instance, adopts the classic fairy tale structure “Once upon a time. . . .” In Text 7, on the other hand, the designer “turned out” things such as “sock puppets” and made them into “plush bears” and “cutesy critters”. Since these creation processes are fantasy-oriented and magic-like, the products born out of these processes can in turn create more fantasies. Text 8, for instance, is a text written to promote a necklace in the shape of a wall clock.

Text 8
Is it wrong to admit that we have often fantasized about running away to some snow-bound chalet in Switzerland and surrounding ourselves with beer steins, melty fondue cheeses and cuckoo clocks? Sort of like crazy old cat ladies, but with wood-based time keeping machinery. Anyway, this little doozy from XXXX may save us the bother and expense. It’s called the “cuckoo cuckoo” necklace, it’s made of sterling silver and resin, it’s $X, and it’s less extreme than relocation to Europe. www. (bold added)

A second discursive strategy that allows Frank Bits to promote creativity is a type of intertextuality involving the extensive use of cultural references, such as those highlighted in the examples in Text 9.

Text 9
• Robots are closer than ever to taking over the world, but perhaps they might not wreak quite so much havoc if you have one of their kind imprinted on your purse. This Wee Robots clutch purse from XXXX is hand made from imported fabric and guaranteed to never ever say, “Exterminate! Exterminate!” It XXXX bucks from XXXX.
• It is truth universally acknowledged that the best song of all time to perform drunken interpretative dance is “Babooshka” by Kate Bush. Possibly aided by some kind of swirly, floaty skirt and a semianguished facial expression. This only thing that could make it even bester is a lovely brooch team with the theme and, wouldn’t you know, there’s one already on the go. This here is the Russian Doll Pin from XXXX.
• Squee! How cute is this tea and toast bag from British designer XXXX? We have one (worth about $XX) to give away, so if you want to show off your love for the breakfast of champions, e-mail your details to XXXX with Tea and Toast in the header. All of XXXX’s accessories are handmade in limited editions using eco-friendly materials—and all are just as sweet as this one.

These cultural references create an arbitrary association between the product, the design process, and the culture. Again, such references assume a considerable amount of ‘cultural capital’ on the readers’ part, since they include different genres of both ‘high’ (e.g., Kurt Vonnegut’s novel Breakfast of Champions, Jane Austen’s Pride and Prejudice) and ‘low’ culture (e.g., Doctor Who, Kate Bush’s rock song). Though arbitrary in nature, these cultural references are seamlessly integrated into the production text, which, like the use of contextual metaphor, helps blur the line between the consumption of culture and commercial products.

To sum up, while Frank Bits functions primarily to advertise and promote consumer products, it co-deploys several discursive strategies to blur the line between cultural production/consumption and consumerism and to disguise its promotion of consumer products as the promotion of creativity. These include (1) layout design that does not make a distinction between ‘straight’ advertisement and band interviews, (2) contextual metaphor that disguises advertising as genres that serve unrelated social purposes, (3) frequent use of positive affect to evaluate the products and narrative to represent the creative cultural process, and (4) intertextuality involving cultural references to both ‘high’ and ‘low’ culture. These discursive strategies also allow frankie to construe an ‘ideal’ readership that possesses diverse ‘cultural capital’ expressed as the ability to recognize a wide range of discursive styles and genres.

5.  INTERPRETING THE FASCINATION WITH FRANKIE

So far, I have explored the strategies frankie has employed to create a distinctive ‘indie’ discursive style through the use of texture and visual design. I have also looked in more detail at the multimodal integration of layout, genre (contextual metaphor), and evaluative language in a section named Frank Bits. I shall now return to the question put forward at the beginning of this chapter—how can we explain the fascination with frankie?
From a marketing point of view, this fascination lies in the magazine’s ability to create an alternative ‘coding orientation’—the discursive style of indie culture—to distinguish itself from the mainstream glossies and thus create a ‘niche’ market. The uniqueness of the frankie code is very apparent to the reader since the magazine explores tactile and visible semiotic resources to create its distinctiveness, including matte texture and naturalistic, yet hyperreal, visual modality. frankie’s appeal may also be attributed to the ways it successfully realizes all its self-proclaimed core values through multiple discursive dimensions. The smartness and creativity of frankie, for instance, are realized through genre (e.g., contextual metaphor), through the use of intertextual references to high and low culture, and through linguistic play (e.g., puns).

Most importantly, from a critical perspective, frankie’s appeal lies in its ability to commodify the indie culture without appearing to promote any ostentatious consumption and to cater both to those who identify with the indie taste and to readers of mainstream glossy magazines. The main strategy frankie uses to solve the indie paradox is allowing multiple consumptions within a single discursive space. It thus masks the consumerist nature of the magazine even in Frank Bits, essentially an advertising section. When reading the features in Frank Bits, for instance, the reader is simultaneously consuming the promotion text itself as a semiotic artefact (the ‘cleverness’ of the contextual metaphor), the indie culture through various intertextual references realized both visually (e.g., the naturalistic visual modality that makes reference to documentary photography) and linguistically (e.g., the verbal references to high and low culture), and ultimately and ideally the consumer product it aims to sell. At a discursive level, it is difficult to make a distinction between these different types of consumption. When an ‘ideal’ reader engages with the texts, she or he is simultaneously engaging with two different types of social processes: symbolic or cultural consumption and potentially material consumption.

Yet frankie is not only for the ‘ideal’ reader, that is, those who possess the right type of ‘cultural capital’ and can appreciate the indie “taste”; it is also inclusive of those who cannot. The magazine adopts the familiar macro-generic structure and genres (e.g., fashion editorial, recipe) of mainstream women’s glossies and simple conceptual images (typical in fashion photography) and layout, which help clearly identify the products that are being promoted. Even when complex linguistic play such as contextual metaphor is employed, frankie ensures that key consumer information (e.g., retailer, brand) is printed in bold and easy to find even for readers who do not fully appreciate frankie’s ‘taste’. Thus, consumerist ideology remains at the core of frankie despite its much more overt subscription to anticonsumerist ‘indie’ values. The ultimate fascination of frankie is its ability to promote consumerism.
The frankie story seems to accord with Bourdieu’s (1984) observation of the rise of the ‘new petite bourgeoisie’, creating huge economic power through providing symbolic goods and services. If the ‘indie’ code is easily hijacked and commodified by the ‘petite bourgeoisie’, what does it say about the nature of independent culture (Newman, 2009)? Or, is it possible that there is no alternative cultural space within a late capitalist economy?

ACKNOWLEDGMENT

frankie press has kindly given permission for their design copyright to be used as part of this academic publication. frankie press has no other connection to Critical Multimodal Studies of Popular Discourse.

NOTES

1. As of September 2012:, https://
2. Price is an optional element while the other three are obligatory ones.

REFERENCES

Audit Bureau of Circulations, Ad News online, 10 February, 2012.
Bakhtin, M. M. (1986). Speech genres and other late essays (1st ed.). Austin: University of Texas Press.
Ballaster, R., Beetham, M., Frazer, E., & Hebron, S. (1991). Women’s worlds: Ideology, femininity and the woman’s magazine. London: Macmillan.
Bernstein, B. B. (2000). Pedagogy, symbolic control, and identity: Theory, research, critique (Rev. ed.). Lanham, MD: Rowman & Littlefield Publishers.
Bourdieu, P. (1984). Distinction: A social critique of the judgement of taste (R. Nice, Trans.). London: Routledge & Kegan Paul.
Bourdieu, P. (1986). The forms of capital. In J. G. Richardson (Ed.), Handbook of theory and research for the sociology of education (pp. 241–258). Westport, CT: Greenwood Publishing Group.
Butler, J. (1990). Gender trouble: Feminism and the subversion of identity. New York, NY: Routledge.
Cook, G. (2001 [1992]). The discourse of advertising. London: Routledge.
Del-Teso-Craviotto, M. (2006). Words that matter: Lexical choice and gender ideologies in women’s magazines. Journal of Pragmatics, 38(11), 2003–2021.
Djonov, E., & Van Leeuwen, T. (2011). The semiotics of texture: From tactile to visual. Visual Communication, 10(4), 541–564.
Duncombe, S. (1997). Notes from underground: Zines and the politics of alternative culture. New York, NY: Verso.
Eco, U. (1986). Travels in hyperreality: Essays (W. Weaver, Trans.). San Diego, CA: Harcourt Brace Jovanovich.
Eggins, S., & Iedema, R. (1997). Difference without diversity: Semantic orientation and ideology in competing women’s magazines. In R. Wodak (Ed.), Gender and discourse (pp. 165–196). London: Sage.
Fairclough, N. (1989). Language and power. London: Longman.
Fairclough, N. (1995). Critical discourse analysis: The critical study of language. London: Longman.
Ferguson, M. (1983). Forever feminine: Women’s magazines and the cult of femininity. London: Heinemann Educational.
Friedan, B. (1965). The feminine mystique. Harmondsworth: Penguin.
Gearin, M. (2010, June 9). Homemade magazine bucks the trend. Retrieved from http://–06–09/homemade-magazine-bucks-the-trend/859752
Goffman, E. (1979). Gender advertisements. London: Macmillan.
Gough-Yates, A. (2003). Understanding women’s magazines: Publishing, markets and readerships. London: Routledge.
Hermes, J. (1995). Reading women’s magazines: An analysis of everyday media use. Cambridge: Polity Press.
Kress, G., & Van Leeuwen, T. (2001). Multimodal discourse: The modes and media of contemporary communication. London: Hodder Arnold.
Kress, G., & Van Leeuwen, T. (2006). Reading images: The grammar of visual design (2nd ed.). London: Routledge.
Labov, W. (1972). Language in the inner city: Studies in the Black English vernacular. Philadelphia: University of Pennsylvania Press.
Macdonald, M. (1995). Representing women: Myths of femininity in the popular media. London: E. Arnold.
Machin, D., & Thornborrow, J. (2003). Branding and discourse: The case of Cosmopolitan. Discourse & Society, 14(4), 453–470.
Machin, D., & Van Leeuwen, T. (2007). Global media discourse: A critical introduction. London: Routledge.
Martin, J. R., & Rose, D. (2008). Genre relations: Mapping culture. London: Equinox.
Martin, J. R., & White, P. (2005). The language of evaluation: Appraisal in English. London: Palgrave Macmillan.
Newman, M. Z. (2009). Indie culture: In pursuit of the authentic autonomous alternative. Cinema Journal, 48(3), 16–34.
Padgett, D., & Allen, D. (1997). Communicating experiences: A narrative approach to creating service brand image. Journal of Advertising, 26(4), 49–62.
Talbot, M. (1992). The construction of gender in a teenage magazine. In N. Fairclough (Ed.), Critical language awareness (pp. 174–200). London: Longman.
Talbot, M. (1995). A synthetic sisterhood: False friends in a teenage magazine. In K. Hall & M. Bucholtz (Eds.), Gender articulated: Language and the socially constructed self (pp. 143–165). New York, NY: Routledge.
Tuchman, G., Daniels, A. K., & Benét, J. W. (1978). Hearth and home: Images of women in the mass media. New York, NY: Oxford University Press.
Van Leeuwen, T. (2008). Discourse and practice: New tools for critical discourse analysis. Oxford: Oxford University Press.
Winship, J. (1987). Inside women’s magazines. London: Pandora.

10 From Popularization to Marketization: The Hypermodal Nucleus in Institutional Science News

Yiqiong Zhang and Kay L. O’Halloran

1. INTRODUCTION

As digital technologies rapidly evolve, they change how information is presented, transmitted, and shared, and concomitantly the ways in which semiotic resources (e.g., language, images, and hyperlinks) are used to construe meaning. Although digital media are having a pervasive impact on the way science news is presented (Trench, 2007, 2008), studies of online science news are relatively rare compared to studies of other types of online discourse (e.g., Djonov, 2008; Knox, 2007, 2009a; Lemke, 2002; Tan, 2011; Zhang & O’Halloran, 2012), despite growing concerns about the nature of science journalism today:

Precisely how, and to what extent, the internet is changing the characteristics of science news is deserving of our close attention. Even those who are dismissive of the celebratory claims being made about its potential for creating new spaces for “public engagement with science” need to recognize that it is here to stay, and that it promises to dramatically recast science journalism’s familiar norms and values in unanticipated ways. (Allan, 2009, p. 162)

Changes to the values and norms of science journalism are not only related to technology, but also to social factors and economic imperatives, which are redefining the roles of scientific research institutions. Science communities have, for example, been increasingly integrated into the accelerated trend of ‘academic capitalism’, which is the new global knowledge economy of “institutional and professional market or market-like efforts to secure external moneys” (Slaughter & Rhoades, 2004, p. 8). Driven by this force of ‘marketization’ (Fairclough, 1993), the science community, which used to consider science journalism “a low status activity, unrelated to research work” (Shinn & Whitley, 1985, p. 3), is now actively seeking media coverage for “securing the financial support required to run major research facilities” (Nelkin, 1995, p. 125).

Besides actively seeking the attention of the press, universities are increasingly engaged in communicating directly with the public on the Internet to address their growing publicity demands (Trench, 2009). In doing so, institutional news reports of scientific research, as we shall argue in this chapter, have moved beyond the popularization of science, broadly defined as the re-presenting of scientific knowledge for the general public (Myers, 2003), and into the realm of the marketization of universities and researchers themselves.

To illustrate this fundamental shift in science news reporting, we adopt a critical discourse analysis (CDA) approach (Fairclough, 1993, 2001) to investigate the representational practices of science news on an institutional website (hereafter referred to as Futurity). Futurity is a university consortium website launched in 2009 and revamped in 2010 to aggregate and publish research news from leading universities in the United States, the United Kingdom, Canada, and Australia. It is the first collaborative effort of its type among universities to publicize research projects undertaken in their respective institutions (Brainard, 2009). Since its establishment, the designers of Futurity have used a range of semiotic resources to create ‘hypermodal’ science news reports, which are characterized by the “conflation of multimodality and hypertextuality” (Lemke, 2002, p. 301). In particular, an increasing number of images from image banks such as iStockphoto and Shutterstock are used on the website, forming a ‘hypermodal nucleus’, which includes a headline, a university attribution, a news lead, an image, a caption for the image, and access for social media sharing. This emerging representational practice captures readers’ attention with a simple and immediate impact and allows them to spread the news via social media sites such as Facebook and Twitter.
The nature of hypermodal nuclei in Futurity suggests that science popularization has entered into the domain of mainstream popular culture, where “intelligible, persuasive values are [increasingly] drawn from the entertainment industries” (Sontag, 2001, p. 273).

In what follows, we first introduce the theoretical framework of CDA, the approach we have adopted for analyzing the discourse practices and the broader historical context of social practices within which the consortium of universities is positioned. We then provide a detailed analysis of a hypermodal nucleus to illustrate how meaning is construed with multisemiotic resources in the nucleus (e.g., language, image, and hyperlinks) and discuss how the generic components are shaped by the social context. In the last section, we discuss how hypermodal nuclei in Futurity reflect the larger shift in the role of science popularization today.

2.  CRITICAL DISCOURSE ANALYSIS (CDA)

CDA is an interdisciplinary approach to discourse studies that regards language as social practice and takes particular interest in the relationship between discursive practices and social structure (Fairclough, 1993; Fairclough & Wodak, 1997; Van Dijk, 1993; Wodak & Chilton, 2005; Wodak & Meyer, 2001). CDA extends beyond the description of discourse per se to an explanation of the social factors that account for the production of discourse itself. The approach addresses the instrumental role of discourse in reproducing social norms, and reveals how discourse is socially constitutive (Fairclough, 1993). Adopting a CDA approach to investigate science popularization thus allows us to explore the social norms and structures that underpin how science is conducted and transmitted to the public and the role of popular science discourse in reinforcing these norms and structures.

CDA “views reality as textually and intertextually mediated via verbal and non-verbal language systems, and texts as sites for both the inculcation and contestation of discourses” (Locke, 2004, p. 2). It can be conducted through three dimensions of analysis (Fairclough, 1993). The first dimension is descriptive analysis, which is concerned with the properties of textual elements. The second dimension involves interpretative analysis, in which the content of language and its functional parts are examined to understand and interpret the role language plays in larger social structures. The third dimension is social analysis, which addresses larger cultural, historical, and social discourses surrounding the various interpretations of the texts. The analysis presented in this chapter is guided by this three-dimensional framework.

Because CDA is “not a method, nor a theory that simply can be applied to social problems,” but rather an interdisciplinary approach that is “conducted in, and combined with any approach and subdiscipline in the humanities and the social sciences” (Van Dijk, 2001, p. 96), we need to introduce the specific approaches and tools adopted in our study for analyzing discourse practices and the historical context within which the Futurity website is situated.

2.1  Discourse Practices: Genre and Systemic Functional Linguistics

CDA draws upon different theories of language to connect linguistic analysis to social analysis. In this study, Systemic Functional Linguistics (e.g., Halliday, 1994; Halliday & Matthiessen, 2004) and genre theory (e.g., Martin & Rose, 2007) are applied to studying how texts are shaped by the social functions they serve. This study adopts the systemic functional definition of genre as “a staged, goal-oriented social process” (Martin & Rose, 2007, p. 8). Each genre has its unique structure that enables the text to function as a semantic unit (Eggins & Martin, 1997) and is “shaped by social structures and habituated practices of greater or less stability and persistence” (Kress, 2003, p. 87).

The science news investigated in this study belongs to the genre of hard news story that is grounded in communicative events such as reports and press releases (White, 1997). The hard news story has a ‘nucleus-satellite structure’ (Iedema, Feez, & White, 1994), in which the headline/lead combination functions as a nucleus to give the crux of the story without the need to read on. The nucleus typically highlights the significance and newsworthiness of the story, and the remainder of the story develops with a series of satellites, which relate to the nucleus rather than to each other. The relationship between a satellite and the nucleus can take one of the following forms (White, 1998):

1) Elaboration, where the satellite adds more detailed description or exemplification of the information presented in the nucleus;
2) Contextualization, which places the events or statements from the headline/lead in a temporal, spatial, or social context;
3) Explanation of the causes, consequences, purposes of, or reasons for elements introduced in the nucleus;
4) Appraisal of the event.

This nucleus-satellite structure provides a general framework for the development of news story genres that incorporate visual images (Caple, 2009). In this chapter we propose that the hypermodal nucleus has evolved from the nucleus-satellite structure of traditional print news, and involves the interaction between visual, verbal, and hypertextual resources.

The analysis of discourse practices is also informed by Halliday’s systemic functional theory (Halliday, 1994; Halliday & Matthiessen, 2004), according to which language has the potential to simultaneously realize three types of meaning known as ‘metafunctions’:

• Ideational, concerned with experience and logical relations in the world
• Interpersonal, concerned with the enactment of social relations
• Textual, which organizes the ideational and interpersonal meanings into coherent and cohesive units of meaning

The metafunctions have been adapted to the analysis of nonlinguistic resources (e.g., Kress & Van Leeuwen, 2006; Lemke, 2002; O’Toole, 2011) and of relations across modes in multimodal texts (e.g., Lemke, 1998; Martinec & Salway, 2005; O’Halloran, 1999; Royce, 1998, 2007; Unsworth & Cléirigh, 2009). These studies recognize that semiotic resources are ‘incommensurable’ because they always necessarily present different meanings (Lemke, 2002) and combine in a ‘synergistic’ relationship where “the ability of elements, in the act of combining, [. . .] produces a total effect that is greater than the sum of the individual elements or contributions” (Royce, 1998, p. 27). In exploring the multiplication of meaning in the hypermodal nucleus in Futurity in this chapter, we follow Lemke’s (2002) framework for analyzing hypermodal texts, where the metafunctions are viewed as “the common denominator by which multimodal semiosis makes potential multiplicative hybrid meanings” (Lemke, 2002, p. 304), and the terms presentational, orientational, and organizational correspond respectively to Halliday’s ideational, interpersonal, and textual meaning.

2.2  Social Practices: Popularization and Marketization

According to Fairclough, CDA is an approach with “a strong orientation to historical change: to changing discursive practices and their place within wider processes of social and cultural change” (Fairclough, 1993, p. 137). In this case, the historical changes of science news provide a social context for understanding the discursive practices in Futurity.

Before the 1980s, science popularization tended to be negatively viewed by the science community as being incorrect, oversimplified, or sensationalized and irrelevant to research work (Lewenstein, 2001). Attitudes began to change in the late 1970s, when scientific research started to be privatized (Bauer & Gregory, 2007) and higher education became more involved in the global trend of ‘marketization’ (Fairclough, 1993). Since then, the main source of research funding has been steadily shifting from government and the public sector to the private corporate sector. In order to maintain and expand the scientific enterprise like other corporate entities, institutions and scientists have to ‘sell’ science in the mass media to gain financial support and to maintain a long-term positive public attitude toward their ‘product’, which in this case is the scientific enterprise itself (Nelkin, 1995).

Institutions and scientists became increasingly engaged in attracting press attention and making their work known to the public. Research universities started to employ public relations professionals or media consultants to publicize the work of their scientists and make sure the work was covered prominently, accurately, and favourably in the press. Such publicity is regarded as a means of shaping public attitudes and influencing policymakers’ decisions on research funding. In the words of a public relations officer: “the surest way to capture a share of the funds was to do good research and, almost as important, to talk about it” (Pat McGrady, as cited in Nelkin, 1995, p. 127).

Institutions are increasingly bypassing the mass media outlets and publishing science news on their own websites to ensure rapid and controlled publication of research achievements over the web (Trench, 2007). Futurity was established as a nonprofit site to aggregate research news from university partners and disseminate it directly to the public. The impact of this move is investigated in the following analysis of hypermodal nuclei in Futurity.

3.  HYPERMODAL NUCLEI IN FUTURITY

The hypermodal nucleus is specifically designed to appeal to readers and was first put into use when the Futurity website was revamped in September 2010. The following discussion is based primarily on the analysis of 123 news pages from the October 2011 issues. Some additional examples from other data sets are drawn upon wherever necessary.

3.1  Intersemiotic Relations in a Hypermodal Nucleus

A hypermodal nucleus has six components: the headline, the university attribution, the verbal lead, the image, the caption, and the hyperlinks for social media sharing (as illustrated in Figure 10.1). In relation to the whole news page, the nucleus occupies the top of the page and functions as the ‘head’ in a head-tail page structure; this area is distinguished from the remainder of the page as it presents “the information valued as of the most immediate relevance and importance” (Knox, 2007, p. 38).

Figure 10.1  Hypermodal nucleus and Futurity news story page

The news in Figure 10.1 reports on a mathematical equation for modelling the behaviour of hair in a bundle (in this case, a ponytail), which has applications in the fields of textiles and animation. The image is a back view of a woman holding her hair in a ponytail. The photograph resembles images that are used in fashion magazines, advertisements, or tabloids in order to evoke notions of beauty, femininity, and sexuality. In this case, however, it is used to introduce a mathematical equation about the behaviour of hair in a ponytail. The ponytail in the image is in a relationship of “intersemiotic repetition” (Royce, 1998) with the verbiage in the headline. The blank background of the image does not explain why the woman is holding her ponytail, which creates a relation of “intersemiotic collocation” (Royce, 1998) between the image and the phrase “mystery of ponytails” in the headline. These intersemiotic relations allow the headline and the image to function as an “interpersonal theme” (Martin, 2001), engaging readers and directing them to the rest of the nucleus.

An attribution to the University of Warwick precedes the lead, increasing the credibility of the news and promoting the university at the same time. The lead reads: “The Ponytail Shape Equation explains how hair behaves in a bundle, and may have applications in textiles and animation”. The ponytail in the image becomes the subject studied in the reported research. The lead elaborates the headline in an academic style. For example, the discipline, ‘math’, is revealed by the nominal group “The Ponytail Shape Equation”, comprising a ‘thing/head’ noun (equation) premodified by a ‘classifier’ (ponytail shape) (Halliday, 1994). The equation’s potential applications in textiles and animation are also mentioned in the lead, shifting the news value from the ponytail as a familiar object in the headline and image to the significance of the equation in the lead.

The caption reads: “The researchers say that with a new quantity described in the study—the Rapunzel Number—the new equation can predict the shape of any ponytail”, which evaluates the findings presented in the lead and further suggests the news value of the research event by informing the reader that the equation has the power to predict the shape of any ponytail. The nucleus thus leads the reader from a simple and concrete everyday world into the abstract world of science. Readers who find the news interesting, either due to the amusing image or the power of the mathematical equation, can share it on social networking sites by clicking on the Facebook, Twitter or Google+ buttons in the nucleus. The frequencies of sharing are automatically recorded to provide a new measure for news value—the popularity of the news among readers.

As the example of “Math untangles the mystery of ponytails” suggests, the headline and image function to attract the viewer to the story. The attribution to a university establishes and promotes the reliability of the content. The lead and caption elaborate on the headline and image, and the social media sharing tools facilitate the sharing of the news, with the numbers next to the tools indicating the popularity of the news among social networking users. In the following section, we discuss these generic elements in relation to how they shape and are shaped by the social practice of science popularization.

3.2  Generic Components of the Hypermodal Nucleus

The Headline and Image

A writer from Ragan Communications comments that Futurity “looks more like a snazzy web magazine than a link farm for science abstracts” (Working, 2011). The ‘snazzy’ effect is mainly achieved with the headline and image. Recent studies on visual images in the media have suggested a shift in the functions of images in both print and digital news from a representational focus on portraying reality (Griffin, 2004; Hall, 1973) to an interpersonal focus on engaging readers visually and establishing a bond between the reader and the discourse (Caple & Bednarek, 2010; Knox, 2009b). This is indeed the case with Futurity, where there has been a steady increase in the use of images from image banks since the website’s official launch. In the October 2011 data set, 54 percent (67 out of 123) of the images are from image banks. These images are typically decontextualized, generic, and decorative, and they function more as a symbolic system and as elements of layout design, rather than as a record of reality (Machin, 2004). In the case of Futurity, the images function together with headlines to attract readers’ attention, as the example in Figure 10.1 illustrates.

In terms of their contribution to individual stories, the images play a very limited role in either presenting the research event as news or representing science in a less abstract way. In the nuclei of October 2011, 76 percent (93 out of 123)¹ of the images offered symbolic and/or generic representations of the objects or social phenomena being studied in a manner similar to the image in Figure 10.1. They appear to represent objects or social phenomena “with simple and immediate impact” to be “read quickly and easily,” and to “symbolically support the verbal text, often as a prompt or lead-in for the reader’s eye” (Griffin, 2004, p. 384).

Knox (2009b) suggests that images in online news construe meanings not only within individual news stories, but also over time in an ongoing dialogic interaction between the newspaper and its reader. Similarly, over a longer time period, the presentational meanings of the images in Futurity nuclei are significant for the discourse of science popularization. The stereotypes of science as “an arcane and incomprehensible subject” that is “distanced and lofty” (Nelkin, 1995, p. 15) could be changed with these images of concrete objects and social phenomena.

In terms of orientational meaning, these images “position the reader interpersonally in a way that is not possible with verbal text alone” (Knox, 2009b, p. 180). The majority of the images used in Futurity are close-ups, and they create a perception of physical proximity to the objects they depict, establishing an intimate relationship between these objects and the reader (Kress & Van Leeuwen, 2006). Photos from image banks also “evoke moods and concepts” (Machin, 2004, p. 330) and invoke attitudes toward them (Martin, 2001). This latter function is illustrated in Figure 10.2. The image presents a smiling masculine man holding two tomatoes in front of his eyes. The man, the tomatoes, and the muscles in the image symbolically represent “guys”, “healthy”, and “manly” in the headline, respectively. The smile and the bright tomatoes evoke pleasure, beauty, and glamour, and the muscles could connote modernity and masculinity. The image and the headline create an experience that cannot be achieved with written science news alone. Such aesthetically engaging images turn the science news into what we call ‘scifotainment’, which could possibly build “a committed readership” (Caple, 2008) in the long term. Through scifotainment, the contemporary corporate ideology of ‘positive thinking’ that dominates image banks (Machin, 2004) penetrates science news, and thereby potentially fosters a positive public attitude toward scientific research.

Figure 10.2  Image evoking moods and concepts

In terms of organizational meaning, the images are likely to give maximum salience to the targeted object. The salience typically results from the combined use of a frontal angle, central position, and backgrounding or elimination of contextual elements (see the images in Figure 10.1 and Figure 10.2). With a decontextualized background, organizational choices (e.g., framing, volume, proportion) become salient, thus highlighting the object in the image. The images also function as a design element to connect the headline, the lead, the caption, and the hyperlinks for social media sharing, thereby expanding the nucleus into a hypermodal one.

The University Attribution

The university attribution is a salient promotional element in the nucleus. It is foregrounded in a noticeable position between the headline and lead, in bold typeface and a font size larger than that of the lead. This particular design establishes the credibility of the science news by attributing it to the university where the research is conducted. At the same time, the attribution’s salient position in popular news helps promote the university.

The Lead and the Caption

Typically, a news lead presents the news point and essential news elements (such as who, what, when, where, and why) in one or two sentences (Bell, 1991; Van Dijk, 1988), which would be the research findings in the case of science news. The news body develops the lead with a series of satellites that elaborate, contextualize, explain, or appraise information presented in the lead (White, 1998). In print news, the satellites follow the headline/lead and are considered to have different generic properties from the headline/lead. But in the case of the hypermodal nucleus in Futurity, one of the satellites occupies the image caption, relating the caption directly to the lead. In this way, the caption develops the lead and expands the nucleus into a mini-story that is more than the headline-lead structure characteristic of print news nuclei. Examples of how the caption can expand the hypermodal nucleus through appraisal, elaboration, explanation, and contextualization of the lead are given in Table 10.1.

Table 10.1  Logical relationship between the lead and caption

Example 1: Appraisal
Headline / Image: Nibbled plants grow back stronger
Lead: U. Illinois (US)—A chromosome boost makes some plants come back stronger after they’ve been eaten, researchers say.
Caption: “Finding out that an organism can change its chromosome number under environmental stress was pretty surprising,” says Ken Paige, head of the University of Illinois Department of Animal Biology.

Example 2: Elaboration
Headline / Image: Thoughts of suicide start young
Lead: U. Washington-Seattle (US)—Suicidal thoughts and behaviors seen in teens may begin much earlier in life than previously thought.
Caption: “Young adults who end up having chronic mental health problems show their struggles early,” says James Mazza, lead author and professor of education psychology (Credit: iStockphoto)

Example 3: Explanation
Headline / Image: Bah, humbug! Rich slower to show empathy
Lead: UC Berkeley (US)—Dickens was right on the money with his depiction of Cratchit and Scrooge. Poor people are quicker to show compassion than the rich, a study shows.
Caption: “It’s not that the upper classes are coldhearted,” says social psychologist Jennifer Stellar. “They may just not be as adept at recognizing the cues and signals of suffering because they haven’t had to deal with as many obstacles in their lives.” (Credit: Alice Day, Shutterstock)

Example 4: Contextualization
Headline / Image: How brain reacts to surprise is surprising
Lead: Brown (US)—Primates learn from feedback that surprises them, and in a recent investigation of how that happens, neurosurgeons have learned that neurons in two important structures handle both good and bad surprises.
Caption: “It’s when you encounter something that’s unexpectedly good or bad that you need to change your behavior either to keep doing the thing that’s good or avoid the thing that’s bad. There’s been a lot of debate over how these signals are represented (in the brain),” says Wael Assaad of Brown University. (Credit: Masson/Shutterstock)


The caption in the first example in Table 10.1 appraises findings presented in the lead: “A chromosome boost makes some plants come back stronger after they’ve been eaten” is rephrased in the caption as “an organism can change its chromosome number under environmental stress” and is evaluated as “pretty surprising”. This appraisal emphasizes the newsworthiness of the findings. In the second example, the caption elaborates the lead with a restatement. “Young adults who end up having chronic mental health problems show their struggles early” is restated as “Suicidal thoughts and behaviors seen in teens may begin much earlier in life than previously thought”. The elaboration stresses the significance of the findings by comparing them with existing understandings about suicidal behaviours. The caption in the third example explains the causes of the findings: “Poor people are quicker to show compassion than the rich” because they are more “adept at recognizing cues and signals of suffering”. The explanation of the causes furthers readers’ understanding about the findings presented in the lead. In the fourth example, the study of the phenomenon of surprise reported in the lead is contextualized in the literature in the caption “there’s been a lot of debate over how these signals [that the brain reacts to surprise] are represented”. The contextualization also furthers readers’ understanding about the findings by situating the research event in an ongoing debate.

In the October 2011 issues, 61 percent of the captions in the nuclei elaborate the findings in the lead, 22 percent appraise the findings, while only 17 percent contextualize or explain the findings. The use of elaboration and appraisal highlights the significance of the newly derived scientific findings, while contextualization and explanation contribute to readers’ understanding of the findings. The fact that the majority of the captions are elaborations and appraisals of the lead suggests that the key function of the caption is to promote the research findings in the lead rather than to explain them.

Another function of the caption is to introduce the researchers. In the verbal lead, researchers tend to be either excluded or presented in groups (e.g., ‘researchers’, ‘neurosurgeons’) (Van Leeuwen, 2008). But in the hypermodal nuclei, they can be introduced in the caption as individuals with their names and academic identities (see the texts in italics in Table 10.1). The ‘individualization’ (Van Leeuwen, 2008) of researchers in the caption celebrates the findings as achievements of individual researchers, thereby contributing to their ‘personal brand’ (Lair, Sullivan, & Cheney, 2005) in a competitive academic market.

The Social Media Sharing

The tools for social media sharing are crucial components of the hypermodal nucleus, though they do not contribute directly to its meaning. As shown in Figure 10.1, below the verbal lead and on the left of the image, there are three buttons available for sharing the news on three social networking sites: Twitter, Facebook, and Google+. They facilitate communication from consumer to consumer. The number of times the item is shared is autorecorded and displayed, providing an index of the popularity of the news among readers.

Sharing increases the likelihood of readers’ accessing the science news. According to the Clickstream for Futurity on Alexa the Web Information Company (2012), 25 percent of users visit the social media sites immediately prior to visiting Futurity and 36 percent visit them after leaving Futurity. These statistics do not necessarily suggest that a quarter of the users are directed to Futurity from social media sites and more than one-third of the users share Futurity news through these sites, because users may visit one website after another out of habit. The statistics, nonetheless, suggest that a significant percentage of Futurity users are frequent users of social media.

Social networking sharing may have an unprecedented, if not revolutionary, impact on science popularization because sharing shortens the distance between science and everyday life. When sharing news from Futurity, readers shift from a page of science news into a status page of social networking websites. The reading of science news could possibly shift from the science news to a friend’s post of jokes, and a reader on the social networking site could be directed to the Futurity news through shared posts. Such traversals shift between completely different domains, from a public to a relatively private domain and from science to daily life, in a very short timescale (see Lemke, 2005, on timescales in hypertext traversals). In this way, the traversals create a new context for science that traditional print media cannot provide.

4.  THE HYPERMODAL NUCLEI AND SOCIAL REALITY

The analysis presented in Section 3 has demonstrated that hypermodal nuclei in Futurity are a discourse practice shaped by the technological and institutional context of science popularization. This section focuses on how this discourse practice might transform social reality. The hypermodal nucleus emerged as a type of ‘scifotainment’ by drawing upon resources and practices from the domain of popular culture (e.g., image banks and social media) to create a new presentation of science news for the public. The incorporation of popular culture into science news reporting is significant because “effective [science] communication will necessitate connecting a scientific topic to something the public already values or prioritizes, conveying personal relevance” (Nisbet & Scheufele, 2009, p. 1774). The nucleus is a semiotic construct catering to the information consumption style of the public in the digital era: the combination of an attention-grabbing image, short written texts, and hyperlinks that enable social media sharing is well-suited to the information-based economy where attention is the main currency (Goldhaber, 1997). As Dom Sagolla (2009), one of the creators of Twitter, remarks:

The combination of short and instant message services, status appliances, and social networks has created an audience that both is voracious and has a deficit of attention. We as readers define the short form within the limits of our own attention. Material that makes a reader react and subscribe becomes successful, while other attempts fall by the side. (p. XV)

Making science content readily available for wide distribution is critical in the digital era because around two-thirds of Internet users encounter science news when they go online for other information (Horrigan, 2006). Popularizing science on the Internet “is about more than just adding multimedia to a story—it is about adopting a digital mindset” (Hermida, 2007). The hypermodal nucleus is created with a ‘digital mindset’ as multisemiotic resources transform the print news ‘headline/lead’ nucleus into a hypermodal nucleus for readers to consume and share online, one that notably increases Futurity’s reach on the web according to the Statistics Summary for Futurity on Alexa the Web Information Company². With the nuclei, Futurity is “going broad” (Nisbet & Scheufele, 2009) to make its science news accessible to the general public and thereby boost the reputation of the universities and researchers foregrounded in the nuclei.

Hypermodal nuclei in Futurity have the potential to build positive public attitudes toward the science enterprise, but whether this improves the public’s understandings of science remains an open question. In public understandings of science, there are three kinds of knowledge: the knowledge about essential facts (science as facts); the knowledge of how science works (i.e., the research methods that produce scientific knowledge); and the knowledge of how science is organized as a social practice (i.e., the social structures or institutions of science) (Durant, 1994). The intended mission of popularization is to impart knowledge of science as a social practice. Yet, despite the prominence of university attributions and researchers’ names in Futurity, the nuclei appear to present science primarily as facts and entertainment as they adopt the ‘scan-and-go’ portal mode of presentation (Cooke, 2003, p. 164). The information packed in a hypermodal nucleus is rather limited given its short form, and tends to be presented as newly defined facts. Information about how science works, namely how the findings are obtained and argued, is given in the body of the news. Therefore, learning about this requires complete reading of the news. Such reading, however, is not encouraged by the hypermodal nucleus, which is surrounded by empty space and thereby presented as a text that can be read on its own and is not dependent on the rest of the story. The tools for social media sharing further increase the chance that readers would skip reading beyond the nucleus as they direct readers to their status pages in social networking sites, potentially distracting them from the stories introduced in the hypermodal nuclei. As the nuclei impart few scientific facts, their main potential appears to lie in “evok[ing] positive feelings and attitudes that may lead to subsequent, deeper encounters with science” (Burns, O’Connor, & Stocklmayer, 2003, p. 197). The hypermodal nuclei, therefore, function to present research events more as scientific achievements for establishing the reputation of universities and gaining research funding than as activities for knowledge production and dissemination.

5.  CONCLUSION

The hypermodal science-news nucleus examined in this chapter illustrates how institutions adopt popular culture resources and practices to popularize science. In the cross-institutional science website of Futurity, a hypermodal nucleus is created with an engaging image and a headline to attract attention, an attribution to promote the university, a lead and a caption to describe science findings briefly and promote the researchers and their universities, and tools for social media sharing to increase distribution of the news. The nucleus has evolved from the headline/lead as nucleus in print news to a hypermodal version to meet consumption demands in the digital era. By incorporating popular culture into science news in ways that traditional print media cannot achieve, the hypermodal nucleus may contribute to science popularization in terms of engaging a larger audience and changing the stereotypes of science as being arcane, obscure, and irrelevant to daily life. The representational practices adopted in the nuclei, nonetheless, may cultivate oversimplified views of science. While the effectiveness of hypermodal nuclei in popularizing science is yet to be established, the nuclei in Futurity promote its partner universities as they foreground universities and researchers and celebrate research findings, marketizing research news in the name of science popularization. Researchers caution that while we “use all communication tools at our disposal to connect with hard-to-reach audiences [for science]” (Nisbet & Scheufele, 2009, p. 1774),

it is crucial that the old-fashioned virtues of good journalism—accurate information, multiple sources, context over controversy, and editorial independence—not be lost in the enthusiasm for communicating content in novel ways. (Russell, 2009, p. 1491)

More research is needed to explore whether good science journalism is lost in this enterprise. The critical multimodal analysis of hypermodal science-news nuclei presented in this chapter is a modest step in this direction.

ACKNOWLEDGMENT

This research was supported by the grant NRF2007IDM-IDM002-066 funded by the Interactive Digital Media Program Office (IDMPO) under the National Research Foundation (NRF) in Singapore.

NOTES

1. This percentage includes images from image banks and those provided by universities and researchers.
2. According to Alexa the Web Information Company, the global traffic rank of Futurity improved from #116,636 in May 2011 to #83,996 in April 2012, which means an improvement of more than 32,000 in the traffic ranking within one year. The traffic ranking in May 2011 is the earliest statistic we documented. An earlier ranking was even lower.
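The counts quoted in Section 3.2 and in Note 2 can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python and is not part of the original analysis; it uses only figures stated in the text, and the helper name `percent` and all variable names are illustrative assumptions.

```python
# Arithmetic check of figures quoted in the chapter (October 2011 data set).
# All numbers come from the text; function and variable names are illustrative.

def percent(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)

TOTAL_IMAGES = 123  # images in the October 2011 nuclei

image_bank_share = percent(67, TOTAL_IMAGES)  # images sourced from image banks
symbolic_share = percent(93, TOTAL_IMAGES)    # symbolic and/or generic images

# Caption functions reported for the nuclei, in percent.
caption_shares = {
    "elaboration": 61,
    "appraisal": 22,
    "contextualization or explanation": 17,
}

# Alexa global traffic ranks quoted in Note 2 (a lower number is a better rank).
rank_may_2011 = 116_636
rank_april_2012 = 83_996
rank_gain = rank_may_2011 - rank_april_2012

print(image_bank_share, symbolic_share, sum(caption_shares.values()), rank_gain)
```

The rounded shares agree with the 54 and 76 percent reported in the text, the caption categories account for all captions, and the rank difference is consistent with "more than 32,000".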

REFERENCES

Alexa the Web Information Company. (2012). Clickstream for Futurity. Retrieved April 17, 2012, from
Allan, S. (2009). Making science newsworthy: Exploring the conventions of science journalism. In R. Holliman, E. Whitelegg, E. Scanlon, S. Smidt, & J. Thomas (Eds.), Investigating science communication in the information age: Implications for public engagement and popular media (pp. 149–165). Oxford: Oxford University Press.
Bauer, M. W., & Gregory, J. (2007). From journalism to corporate communication in post-war Britain. In M. W. Bauer & M. Bucchi (Eds.), Journalism, science and society: Science communication between news and public relations (pp. 53–70). New York: Routledge.
Bell, A. (1991). The language of news media. Cambridge, MA: Blackwell.
Brainard, C. (2009, September 17). Is Futurity the future? Columbia Journalism Review. Retrieved from
Burns, T. W., O’Connor, D. J., & Stocklmayer, S. M. (2003). Science communication: A contemporary definition. Public Understanding of Science, 12(2), 183–202.
Caple, H. (2008). Intermodal relations in image nuclear news stories. In L. Unsworth (Ed.), Multimodal semiotics: Functional analysis in contexts of education (pp. 125–138). London: Continuum.
Caple, H. (2009). Multi-semiotic communication in an Australian broadsheet: A new news story genre. In C. Bazerman, A. Bonini, & D. Figueiredo (Eds.), Genre in a changing world (pp. 243–254). Fort Collins, CO: The WAC Clearinghouse.
Caple, H., & Bednarek, M. (2010). Double-take: Unpacking the play in the image-nuclear news story. Visual Communication, 9(2), 211–229.
Cooke, L. (2003). Information acceleration and visual trend in print, television, and web news sources. Technical Communication Quarterly, 12(2), 155–181.
Djonov, E. (2008). Children’s website structure and navigation. In L. Unsworth (Ed.), Multimodal semiotics: Functional analysis in contexts of education (pp. 216–236). London: Continuum.
Durant, J. (1994). What is scientific literacy? European Review, 2(1), 83–89.
Eggins, S., & Martin, J. R. (1997). Genres and registers of discourse. In T. A. Van Dijk (Ed.), Discourse as structure and process (pp. 230–256). London: Sage Publications.
Fairclough, N. (1993). Critical discourse analysis and the marketization of public discourse: The universities. Discourse & Society, 4(2), 133–168.
Fairclough, N. (2001). Critical discourse analysis as a method in social scientific research. In R. Wodak & M. Meyer (Eds.), Methods of critical discourse analysis (pp. 121–138). London: Sage Publications.
Fairclough, N., & Wodak, R. (1997). Critical discourse analysis. In T. A. Van Dijk (Ed.), Discourse studies: A multidisciplinary introduction (Vol. 2, pp. 258–284). London: Sage Publications.
Goldhaber, M. H. (1997). The attention economy and the net. First Monday, 2(4).
Griffin, M. (2004). Picturing America’s “War on Terrorism” in Afghanistan and Iraq: Photographic motifs as news frames. Journalism, 5(4), 381–402.
Hall, S. (1973). The determinations of news photographs. In S. Cohen & J. Young (Eds.), The manufacture of news: Deviance, social problems and the mass media (pp. 176–190). London: Constable.
Halliday, M. A. K. (1994). An introduction to functional grammar (2nd ed.). London: Edward Arnold.
Halliday, M. A. K., & Matthiessen, C. M. I. M. (2004). An introduction to functional grammar (3rd ed.). London: Arnold.
Hermida, A. (2007). Reimagining science journalism. Paper presented at the Future Directions in Science Journalism Conference, University of British Columbia. Retrieved from
Horrigan, J. B. (2006). The Internet as a resource for news and information about science. Retrieved from
Iedema, R., Feez, S., & White, P. R. (1994). Stage two: Media literacy. In A report for Write It Right Literacy in Industry Research Project. Sydney: Disadvantaged Schools Program, N.S.W. Department of School Education.
Knox, J. (2007). Visual-verbal communication on online newspaper home pages. Visual Communication, 6(1), 19–53.
Knox, J. (2009a). Punctuating the home page: Image as language in an online newspaper. Discourse & Communication, 3(2), 145–172.
Knox, J. (2009b). Visual minimalism in hard news: Thumbnail faces on the SMH online home page. Social Semiotics, 19(2), 165–189.
Kress, G. (2003). Literacy in the new media age. London: Routledge.
Kress, G., & Van Leeuwen, T. (2006). Reading images: The grammar of visual design (2nd ed.). London: Routledge.
Lair, D. J., Sullivan, K., & Cheney, G. (2005). Marketization and the recasting of the professional self. Management Communication Quarterly, 18(3), 307–343.
Lemke, J. L. (1998). Multiplying meaning: Visual and verbal semiotics in scientific text. In J. R. Martin & R. Veel (Eds.), Reading science: Critical and functional perspectives on discourses of science (pp. 87–113). London: Routledge.
Lemke, J. L. (2002). Travels in hypermodality. Visual Communication, 1(3), 299–325.
Lemke, J. L. (2005). Multimedia genre and traversals. Folia Linguistica, XXXIX(1–2), 45–56.
Lewenstein, B. V. (2001). Science and the media. In N. J. Smelser & P. B. Baltes (Eds.), International encyclopedia of the social & behavioral sciences (pp. 13654–13657). Oxford: Pergamon.
Locke, T. (2004). Critical discourse analysis. London: Continuum.
Machin, D. (2004). Building the world’s visual language: The increasing global importance of image banks in corporate media. Visual Communication, 3(3), 316–336.
Martin, J. R. (2001). Fair trade: Negotiating meaning in multimodal texts. In P. Coppock (Ed.), The semiotics of writing: Transdisciplinary perspectives on the technology of writing (pp. 311–338). Turnhout: Brepols.
Martin, J. R., & Rose, D. (2007). Working with discourse: Meaning beyond the clause (2nd ed.). London: Continuum.
Martinec, R., & Salway, A. (2005). A system for image-text relations in new (and old) media. Visual Communication, 4(3), 337–371.
Myers, G. (2003). Discourse studies of scientific popularization: Questioning the boundaries. Discourse Studies, 5(2), 265–279.
Nelkin, D. (1995). Selling science: How the press covers science and technology (2nd ed.). New York: W.H. Freeman and Company.
Nisbet, M. C., & Scheufele, D. A. (2009). What’s next for science communication? Promising directions and lingering distractions. American Journal of Botany, 96(10), 1767–1778.
O’Halloran, K. L. (1999). Interdependence, interaction and metaphor in multisemiotic texts. Social Semiotics, 9(3), 317–354.
O’Toole, M. (2011). The language of displayed art (2nd ed.). New York: Routledge.
Royce, T. D. (1998). Synergy on the page: Exploring intersemiotic complementarity in page-based multimodal text. JASFL Occasional Papers, 1(1), 25–50.
Royce, T. D. (2007). Intersemiotic complementarity: A framework for multimodal discourse analysis. In T. D. Royce & W. L. Bowcher (Eds.), New directions in the analysis of multimodal discourse (pp. 63–110). Mahwah, NJ: Lawrence Erlbaum.
Russell, C. (2009). Science journalism goes global. Science, 324, 1491.
Sagolla, D. (2009). 140 Characters: A style guide for the short form. Hoboken, NJ: Wiley.
Shinn, T., & Whitley, R. (1985). Expository science: Forms and functions of popularisation. Dordrecht: D. Reidel Publishing Company.
Slaughter, S., & Rhoades, G. (2004). Academic capitalism and the new economy: Markets, state, and higher education. Baltimore, MD: The Johns Hopkins University Press.
Sontag, S. (2001). Where the stress falls: Essays. New York: Farrar, Straus and Giroux.
Tan, S. (2011). Facts, opinions, and media spectacle: Exploring representations of business news on the Internet. Discourse & Communication, 5(2), 169–194.
Trench, B. (2007). How the Internet changed science journalism. In M. W. Bauer & M. Bucchi (Eds.), Journalism, science and society: Science communication between news and public relations (pp. 133–141). New York: Routledge.
Trench, B. (2008). Internet: Turning science communication inside-out? In M. Bucchi & B. Trench (Eds.), Handbook of public communication of science and technology (pp. 185–198). London: Routledge.
Trench, B. (2009). Science reporting in the electronic embrace of the Internet. In R. Holliman, E. Whitelegg, E. Scanlon, S. Smidt, & J. Thomas (Eds.), Investigating science communication in the information age: Implications for public engagement and popular media (pp. 166–180). Oxford: Oxford University Press.
Unsworth, L., & Cléirigh, C. (2009). Multimodality and reading: The construction of meaning through image-text interaction. In C. Jewitt (Ed.), The Routledge handbook of multimodal analysis (pp. 151–163). London: Routledge.
Van Dijk, T. A. (1988). News as discourse. Hillsdale, NJ: Lawrence Erlbaum Associates.
Van Dijk, T. A. (1993). Principles of critical discourse analysis. Discourse & Society, 4(2), 249–283.
Van Dijk, T. A. (2001). Multidisciplinary CDA: A plea for diversity. In R. Wodak & M. Meyer (Eds.), Methods of critical discourse analysis (pp. 95–120). London: Sage Publications.
Van Leeuwen, T. (2008). Discourse and practice: New tools for critical discourse analysis. New York: Oxford University Press.
White, P. R. (1997). Death, disruption and the moral order: The narrative impulse in mass “hard news” reporting. In F. Christie & J. R. Martin (Eds.), Genres and institutions: Social processes in the workplace and school (pp. 101–133). London: Cassell.
White, P. R. (1998). Telling media tales: The news story as rhetoric. (Doctoral dissertation). University of Sydney, Sydney, Australia.
Wodak, R., & Chilton, P. (Eds.). (2005). A new agenda in critical discourse analysis: Theory, methodology and interdisciplinarity. Amsterdam: John Benjamins.
Wodak, R., & Meyer, M. (Eds.). (2001). Methods of critical discourse analysis. London: Sage Publications.
Working, R. (2011). Universities band together to launch content-rich website. Retrieved from to_launch_contentrich_w_43036.aspx
Zhang, Y., & O’Halloran, K. L. (2012). The gate of the gateway: A hypermodal approach to university homepages. Semiotica, 2012(190), 87–109.

Part III

New Audienceship and Authorship in Popular Discourse

11 Telling a Different Story: Stance in Verbal-Visual Displays in the News
Dorothy Economou

1. INTRODUCTION

A common concern for any news publication today is finding ways of attracting and holding a wider readership base. One way has been by using more and larger images interacting with surrounding text in different ways. However, this move to become more ‘popular’ through the use of visual material, especially photographs, has consequences for the presentation of ‘objective’ stories. It calls for more sophisticated analyses both of the meanings carried by news photos and of their interactions with accompanying verbal text. This chapter demonstrates how the tools of critical discourse analysis can be extended to do this, examining examples of prominent verbal-visual displays that introduce serious news feature stories in print and online broadsheets. Such displays comprise a prominent image or images, a main headline, caption, and subheadline, and can take up most of the front page of a weekly news review section of a print paper. The aim here is to show how meanings in the image and the surrounding verbiage, and in their interaction, attitudinally engage and position readers in respect to the ‘factual’ content.

In this chapter, two different displays are examined, each used to introduce the same 2,400-word news feature story written by Australian political journalist David Marr (2009a, 2009b). These were published on the same day in two different Australian metropolitan news sites owned by Fairfax Publications¹: the online The Sydney Morning Herald (SMH) and the Melbourne print newspaper, The Age. Though both newspapers address a similar audience, with a large proportion of tertiary educated, professional, and affluent readers, and pride themselves on their political pluralism, the SMH is perhaps the more politically centrist and more populist of the two, and The Age more left-leaning.
Both newspapers also pride themselves on greater editorial independence than found in the great majority of Australian newspapers such as The Australian, owned by Rupert Murdoch’s News Limited. The two newspapers have also in the past decade made major content and design changes in their print versions, and invested heavily in their online versions, both moves aimed at increasing sales and attracting a broader readership.

The two texts to be analysed introduce the cover story in what is considered the prestige section of each broadsheet, the weekly news review, which contains longer in-depth news analysis pieces. The readership of the news review has always been a narrower one than the mass readership that is now increasingly targeted overall by each newspaper. The traditional news review reader is considered likely to be knowledgeable on political issues and probably comes to most stories already holding an attitudinal position on the issue explored. In the print broadsheet, much time and news design expertise are devoted to production of the news review front-page display, which is larger and of course more durable than in the online broadsheet.² The smaller, less prominent and more transient online display (some displays are changed over the week) gets less production time, which it must also share with the production of an online photo library component.

The story introduced by the two different displays was published on Saturday, October 17, 2009, and examines the then Australian government policy on ‘boat people’, those who arrive by sea from Southeast Asia seeking asylum. This has been a controversial political issue affecting the outcome of national elections in the recent past, with many Australian voters hostile to such arrivals. The writer, David Marr, is a well-known and respected Australian political analyst and commentator, and knowledgeable local readers would be familiar with his sympathetic stance on asylum seekers and his consistent criticism of the tough policies of both the previous conservative and the present labour governments.
However, in keeping with news analysis, Marr’s stance in the story is one that emerges from substantial evidence presented, rather than being stated as an explicit authorial position in the manner of an opinion piece. The introductory display provides an important orientation for the ensuing story and is even more important when it is read as a substitute for the story. Therefore, when the same story is published with quite different displays, it merits exploration. It is argued in this chapter that a close comparison of the two displays shows that each in fact ‘tells a different story’, and in the process also positions readers differently toward the issue and the events explored in the ensuing feature story. Importantly, the comparison reveals that only one display fully encapsulates Marr’s story and its overall stance. The detailed systematic comparison of two displays also reveals the still under-appreciated complexity of intersemiosis—interaction between words and pictures—in respect to attitudinal meanings and their accumulation across such a display. It remains a question whether differences found in overall stance may be due to differences in editorial position on the issue, audiences targeted, insufficient consideration of the display as a stand-alone text, and/or production constraints in the two contexts.

2.  BACKGROUND, THEORY, AND METHOD

This work follows in the long tradition of critical discourse analysis (CDA) of media texts, whose most well-known proponent is Norman Fairclough (1992, 1995). CDA scholars (Fowler, Hodge, Kress, & Trew, 1979) were the first to demonstrate how patterns of language choices in supposedly factual texts can attitudinally position compliant readers. This analytical approach, using the grammatical tools of Systemic Functional Linguistics (SFL), has been successfully taken up by CDA media analysts in different scholarly traditions to describe and deconstruct ideology and evaluation in print media texts (e.g., Van Dijk, 1988). One use made of SFL categories is to examine the content of a written text. For example, the grammatical roles of Actor and Goal can be discriminated, as in the following:

The police (actor/agent) charged (action) the rioters (goal) repeatedly (manner).
The rioters (actor) rampaged (action) through the town (location).

A verbal text’s content, which can be analysed in this way, is referred to as its Ideational meaning (Halliday & Matthiessen, 2004), and in this chapter similar analytical categories will be applied to the visual image.

The present work draws particularly on SFL critical discourse analysis, which makes use of the full range of SFL analytical tools (Martin, 2004b; Martin & White, 2005; White, 1998, 2006) and adds to the smaller group of studies that deconstruct visual and verbal-visual media texts (Caple, 2008, 2010; Macken-Horarik, 2003a, 2003b, 2004; Martin, 2001, 2004a, 2004b, 2008; Thibault, 2001). As the main aim in this work is to describe evaluative stance, the analysis makes most use of the map of evaluative meanings provided within SFL known as appraisal and presented in Figure 11.1. Originally developed for language in Martin and White (2005), it has been extended for images, in particular, naturalistic photographs, in Economou (2009).
Appraisal describes attitude expressed, as either positive or negative, and of three different kinds: emotions we feel (‘happy/sad’); moral judgements we make of human behaviour (‘cruel/kind’); and how we assess or appreciate material, semiotic, or abstract things (‘beautiful/distorted’). These three kinds of attitude are referred to as affect, judgement, and appreciation, respectively (see Figure 11.1). Appraisal also includes two means of contributing indirectly to attitude: one by grading meanings up or down in different ways, for example, by raising or lowering intensity (‘slightly/extremely’), and another by engaging with external and internal voices/positions in different ways (e.g., ‘experts say so/I think so/it may be so’) or by not acknowledging any voice (‘it is the case’). These are, respectively, the systems of graduation and engagement.
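For readers who code such analyses computationally, the three attitude types and their polarity can be written down as a small data model. The following is only a minimal sketch: the class names, the `Coding` record, and the `profile` helper are hypothetical conveniences, not part of the appraisal framework, and the example wordings are the ones quoted above.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Attitude(Enum):
    AFFECT = "affect"              # emotions we feel ('happy/sad')
    JUDGEMENT = "judgement"        # moral judgements of behaviour ('cruel/kind')
    APPRECIATION = "appreciation"  # assessments of things ('beautiful/distorted')


class Polarity(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"


@dataclass(frozen=True)
class Coding:
    """One hand-coded attitude value (hypothetical record layout)."""
    wording: str
    attitude: Attitude
    polarity: Polarity


def profile(codings):
    """Count codings by (attitude type, polarity)."""
    return Counter((c.attitude, c.polarity) for c in codings)


# The example wordings given in the text, coded by hand.
codings = [
    Coding("happy", Attitude.AFFECT, Polarity.POSITIVE),
    Coding("sad", Attitude.AFFECT, Polarity.NEGATIVE),
    Coding("cruel", Attitude.JUDGEMENT, Polarity.NEGATIVE),
    Coding("kind", Attitude.JUDGEMENT, Polarity.POSITIVE),
    Coding("beautiful", Attitude.APPRECIATION, Polarity.POSITIVE),
    Coding("distorted", Attitude.APPRECIATION, Polarity.NEGATIVE),
]
```

Calling `profile(codings)` returns a count for each combination of attitude type and polarity, which is one simple way of summarising how attitude is distributed in a coded text. Graduation and engagement are deliberately left out of this sketch.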

Figure 11.1  The appraisal system (based on Hood, 2006; Martin & White, 2005)

Appraisal also describes the various strategies by which attitude values may be expressed, here presented in Figure 11.2. These range from the most explicit and unambiguous expression (in words like ‘happy’, ‘bad’, ‘uneven’) to more implicit forms, such as ‘he got up late’, which may or may not be interpreted negatively. Explicit expressions are referred to as ‘inscribed’ and implicit ones as ‘invoked’ or ‘evoked’. Figure 11.2 shows three different ways of evoking attitude, ranging from most to least explicit forms of expression: provoking, flagging, and affording. The most open to interpretation are those ideational meanings that may act as a token of attitude to particular groups. For example, in ‘They voted Labour’ the ideational content may imply negative, positive, or no judgement values depending on the addressee’s political leanings.

Figure 11.2  Strategies for the expression of attitude (Martin & White, 2005)

A common-sense understanding of images is that they have the power to trigger much stronger attitudinal responses than language. One explanation for this belief is that visual ideational meaning is less open to interpretation than verbal, due to the different affordances of the visual semiotic. Even where no emotion is visible on depicted people, the amount of detail offered by a clear, naturalistic photo compared to a verbal proposition means that attitude potentially evoked is far less likely to be ambiguous. Consider a photo taken of a violent act. Strong negative attitudinal responses likely to be triggered in viewers by such a photo are the reason it may be banned as too ‘graphic’ or upsetting for publication, while the news story describing the scene is published. Our analysis therefore considers visual ideational items as ‘provoking’ attitude and thus identifiable as either positive or negative, and in terms of attitude type (affect, judgement, or appreciation).

The analysis considers not only the attitude provoked by what is depicted, but also by how it is depicted. There are many ways that photographers and photo editors can adjust or graduate visual ideation in a naturalistic photo, with the effect of raising or lowering its force or impact. For example, a visual item can be made bigger or brighter, thus further flagging any attitudinal reading of the item. Such choices are described in the visual graduation system of “force”, presented in Figure 11.3, where the main choices are “quantification”, “intensification”, and “repetition”, the last of which is not an option in the verbal system. Visual realizations of each choice are briefly described in the figure.

Finally, the third appraisal system of engagement can also have relevance for visual images. However, of its two options, ‘heteroglossic’ and ‘monoglossic’, only the second, in which neither authorial nor any external voice is acknowledged or discerned, is taken up in both news photos analysed here. Thus, any attitudinal meanings provoked are likely to be seen as emanating from the ‘reality’ captured in the photo and not from the set of technical choices made by its producers (perhaps giving these even greater impact).
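The three force choices named above can be recorded per visual item when coding a photo. The sketch below is illustrative only: the dictionary keys (`foregrounded`, `vivid_colour`, `instances`) are hypothetical coding fields introduced here, and mapping size or foregrounding to quantification and vividness to intensification follows the examples given in this chapter rather than any fixed procedure.

```python
from enum import Enum


class Force(Enum):
    QUANTIFICATION = "quantification"    # e.g. size, extent, proximity, number
    INTENSIFICATION = "intensification"  # e.g. brightness, vividness of colour
    REPETITION = "repetition"            # more than one token of the same kind


def force_choices(item):
    """Return the high-force choices coded for one visual item.

    `item` is a hypothetical dict with optional boolean keys
    'foregrounded' and 'vivid_colour' and an integer key 'instances'.
    """
    choices = set()
    if item.get("foregrounded"):
        choices.add(Force.QUANTIFICATION)
    if item.get("vivid_colour"):
        choices.add(Force.INTENSIFICATION)
    if item.get("instances", 1) > 1:
        choices.add(Force.REPETITION)
    return choices


# A foregrounded item that appears four times in the shot.
example = force_choices({"foregrounded": True, "instances": 4})
```

Here `example` contains quantification and repetition but not intensification; an item with none of the coded properties yields an empty set.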

Figure 11.3  Visual graduation: Force system (Economou, 2009)

The two texts under examination are identified here not only as introductory displays for a long news review story, but also as independent texts likely to be read instead of the story. They are referred to as ‘standout’ texts (Economou, 2006, 2008, 2009, 2010), comprising only prominent images and bold verbiage and identified as a genre (Martin & Rose, 2008) now found across the popular news media. The purpose of a news review standout is identified as a dual one, incorporating both the selling and the telling of a story (or from the producer’s perspective, the ‘retelling’ of the ensuing story). The structure of a standout, as presented in Figure 11.4, is identified as ‘orbital’ (White, 1998) by which is meant that one element functions as a nucleus and the rest as satellites. The standout nucleus is the large image-headline unit, usually placed top and centre, taking up most of the space of the page. As the largest, most prominent, and salient element of the standout, it is designed to pull in even cursory readers of the page, and for those who continue, to be read first.³ It is named the Lure here to indicate its function in the ‘selling’ of the story. Each satellite comprises one of the remaining pieces of verbiage in interaction with the central photo. Each elaborates and depends on lure meanings, but the satellites do not depend on each other and can be read in any order. One satellite is the caption-photo unit, called the Image Anchor, where meanings in the photo are verbally specified and elaborated; another is the sub- or surheadline-photo unit, called the Point, which elaborates on the more indeterminate lure meanings to pin the story down, and thus, also encapsulate the ensuing story. One standout also has a pullout quote from the story, called a Tease (as known in journalism).

Figure 11.4  Standout orbital structure

3.  COMPARISON OF THE TWO STANDOUTS

The two standouts to be analysed are shown in Figure 11.5. The most striking similarity between them is that the prominent photo in each depicts the same generic ‘social actor’ (Van Leeuwen, 1996): asylum seekers wishing to enter Australia, on the deck of a boat we can assume is close to shore. Despite this similarity, each standout overall is found to tell a different story and to construct a different overall evaluative stance. In particular, the standouts differ in respect to whether they position compliant readers to be sympathetic or not toward boat people, and approving or disapproving of Australia’s response to them. The findings from the comparative analysis of the two lures are briefly presented first, pointing out similarities in the stance they construct at an initial glance. Then, in a more comprehensive comparison of the whole standouts, the accumulation of attitudinal meanings across each standout is tracked to show how a quite different overall stance is constructed in each.

Figure 11.5  The online SMH standout and The Age print standout

3.1  The Lure: Photo and Headline

The lure is a verbal-visual unit given careful consideration by producers in respect both to pulling readers in and making sense at a glance. Both lures deploy the following four strategies (on these and other strategies, see Economou, 2012) commonly used to attract and quickly engage a mass readership in standouts:

• Striking design (more so in The Age), most significantly comprising a large, prominent colour photograph and a large, bold headline above it
• An image depicting a recognizable and controversial social actor likely to trigger a strong attitudinal response from readers, and a headline featuring a relatively explicit (inscribed or provoked) attitudinal meaning
• An easily accessible verbal-visual ‘figure’ (Halliday & Matthiessen, 1999) in which one or more missing grammatical elements in the headline is readily identifiable in the photo (in both lures, the social actor, boat people, is available only in the photo, and is more easily retrieved in the SMH figure)
• A relatively ‘uncommitted’ (Martin, 2008) evaluative stance, whereby alternative, even opposing, attitudinal readings of the verbal-visual figure are possible depending on already held reader attitude

These visual-verbal choices indicate that each lure targets a wider readership than the ensuing story, both in terms of making a figure easily accessible to even the most cursory of readers, and most importantly, in allowing readers to apply any attitudinal reading to the figure. In the SMH Lure, the headline “Come hell or high water” (interpretable as ‘against all odds’ or ‘with tenacity’) is missing both a human actor and an action. The actor is depicted (people on a boat close to shore) and the action is implied (they have sailed to our shores/are trying to get in). In “Fear rides in on a rusty boat” in The Age lure, only a human actor is missing, again easily retrievable from the photo.

Most significantly, the available figures in each lure can be attitudinally read in different ways. Thus, it is unlikely that any reader, whatever their stance on the issue of asylum seekers, will be alienated at an initial glance. They can maintain a sympathetic or disapproving stance, as well as an undecided one, in respect to the depicted social actors and action. For example, in the SMH lure, the most likely figure is ‘Boat people are trying to enter Australia come hell or high water’, to which readers, if they wish, can equally well attach ‘and that is the right thing to do’ or ‘and that is the wrong thing to do’. Equally, in The Age lure, perhaps the easiest figure available at a glance⁴ is ‘Frightened boat people are coming to Australia in old, rusty boats’ and can equally well have attached to it, ‘and that is wrong/undesirable’ or ‘understandable/desirable’.

3.2  The Standout: Accumulated Meaning and Overall Stance

This section examines the more elaborated meanings available beyond the lure. It discusses appraisal choices that emerge from closer or repeated viewings of the photo, from reading the surrounding verbiage (the caption, sub-/surheadline, and quote), and from the interaction of the two. The major difference found between the two standouts is that the SMH standout tells much less of the ensuing story and constructs a more covert stance, although one more committed than in the lure. It does not, however, reflect the stance of the story. In contrast, The Age standout tells more of the story, and constructs a more committed and overt stance, one that closely reflects that of the ensuing written story.

The difference between the two standouts is evidenced by a contrast in the social actor chains created across the elements, as presented in Figure 11.6. As the major cohesive device in an orbital text, such chains are not only the means by which ideational items are linked and elaborated across the standout, but also the means by which attitude values associated with ideational items are accumulated, helping to construct an overall stance. In The Age standout, two social actor chains are formed, one for Australians and another, longer chain for asylum seekers. In the SMH standout, only one chain is formed, that for asylum seekers.

The standouts also differ in layout. The Age standout’s more orbital layout⁵ and greater number of satellites encourage more repeated looks at the photo. Each piece of verbiage positioned around or within the photo ensures that its dense visual detail receives closer attention, and thus more visually provoked attitudinal values are accumulated. In contrast, the SMH standout’s more linear structure does not encourage as many re-viewings of the main photo.
It also offers a library/gallery of 12 related photos, which are presented as enlargeable thumbnails in this standout, potentially taking readers’ attention away from the central photo and making it less likely to be viewed closely or repeatedly. (On online news galleries, see further Caple & Knox, 2012.)
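The idea of a social actor chain can be illustrated by grouping hand-coded mentions of each actor across the components of a standout. This is only a rough sketch under stated assumptions: the mention list is coded here from wordings in the SMH standout, the actor labels are this sketch's own, and the rule that a chain needs at least two links (`min_links=2`) is an assumption made for illustration, not a definition taken from the analysis.

```python
from collections import defaultdict

# (component, wording, social actor) triples, coded by hand for illustration.
smh_mentions = [
    ("photo", "four men on a boat", "asylum seekers"),
    ("sign", "Sri Lankan civilians", "asylum seekers"),
    ("caption", "Sri Lankan asylum seekers", "asylum seekers"),
    ("subheadline", "refugees", "asylum seekers"),
    ("subheadline", "politicians of all stripes", "Australians"),
]


def chains(mentions, min_links=2):
    """Group mentions by actor and keep actors with at least `min_links` mentions."""
    grouped = defaultdict(list)
    for component, wording, actor in mentions:
        grouped[actor].append((component, wording))
    return {actor: links for actor, links in grouped.items() if len(links) >= min_links}


smh_chains = chains(smh_mentions)
```

Under these assumptions `smh_chains` holds a single chain, for asylum seekers, with links in the photo, sign, caption, and subheadline; the one mention of politicians does not form a chain. Attitude values attached to each link could then be accumulated along the chain.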

Figure 11.6  Social actor chains in the two standouts: Asylum seekers and Australians

Table 11.1  SMH standout: Visual and verbal components

Standout Components | Verbiage (in italics) and Visual Content
Headline | Come hell or high water
Photo | Boat people: four men arriving in Australia; addressing Australians; protesting?
Signs in photo | WE ARE SRI LANKAN CIVILIANS PL. . . . SA. E . . . . . . . . .
Caption | Sri Lankan asylum seekers engage in a hunger strike after their boat broke down on the way to Christmas Island.
Subheadline | Ever since refugees began reaching Australia by boat, politicians of all stripes have heard the message: don’t let them in.
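The orbital structure of a standout (a Lure nucleus plus satellites that are each read against the central photo) can be made concrete with the content of Table 11.1. The sketch below is a hypothetical encoding, not a tool used in the analysis: the `Standout` class and its field names are assumptions, and the photo is reduced to a short description.

```python
from dataclasses import dataclass, field


@dataclass
class Standout:
    """A nucleus (Lure) plus independent satellites, each paired with the photo."""
    headline: str
    photo: str                                      # short description of the central image
    satellites: dict = field(default_factory=dict)  # satellite name -> verbiage

    def units(self):
        """Return (name, verbiage, photo) triples, nucleus first."""
        nucleus = [("Lure", self.headline, self.photo)]
        orbit = [(name, text, self.photo) for name, text in self.satellites.items()]
        return nucleus + orbit


smh = Standout(
    headline="Come hell or high water",
    photo="four men on a boat, addressing Australians",
    satellites={
        # caption-photo unit
        "Image Anchor": "Sri Lankan asylum seekers engage in a hunger strike "
                        "after their boat broke down on the way to Christmas Island.",
        # subheadline-photo unit
        "Point": "Ever since refugees began reaching Australia by boat, politicians "
                 "of all stripes have heard the message: don't let them in.",
    },
)
```

Every unit returned by `smh.units()` carries the same photo, which reflects the point that each satellite interacts with the central image rather than with the other satellites. The satellites are stored in a dictionary only for convenience; their order carries no meaning, since they can be read in any order.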

4.  SMH ONLINE STANDOUT

The overall stance constructed in the SMH standout targets boat people, the only social actor in the photo and the only one to form a semantic chain across both visual and verbal components, and with whom attitude values are associated across the standout. The other key social actor is realized explicitly only in one instance, “politicians of all stripes”, in the subheadline. For ease of reference, the standout text is presented in Table 11.1, with verbal text (verbiage) in italics and a brief summary of the visual content also provided for the photo.

4.1  Accumulating Meaning in the Image

As the largest and central component, whatever attitudinal work the photo does necessarily dominates in a standout, and like the attitude in the headline, spreads across and affects the entire standout. In addition, as the different pieces of verbiage are read, they encourage more and possibly closer looks at this photo, thus facilitating the accumulation of visual attitudinal meanings (see Martin & White, 2005, on domination and repetition/accumulation in evaluative prosody in language). There are only two ideational items that can serve as tokens of attitude in this photo: handwritten signs and people on a boat. The latter, as a human social actor, has greater potential for provoking all three types of attitude, not only affect and appreciation but also judgement. Viewers’ responses will be those triggered by the single category of boat person depicted here: dark-haired, dark-skinned, strong-looking adult males of South Asian or Middle Eastern appearance. Also relevant are the boat people’s implied actions, ‘trying to get into Australia’ and perhaps ‘protesting’, both done tenaciously, as the headline says.

For those already strongly positioned, the photo can be read either positively or negatively. However, for readers who do not already hold a position on this issue, meanings in the photo may colour and interact with the satellite verbiage to position them negatively toward boat people. A photo of such men trying to get into Australia “come hell or high water” is likely to trigger a much more negative attitude for many Australians than one of women or children or families, particularly for readers undecided on boat people. It can also be argued that these men’s appearance fits the ‘Middle Eastern terrorist’ stereotype made common in the Western media after 9/11, one still conflated with asylum seekers where arguments are made against their acceptance by Australia.⁶ The strength of visual stereotypes in the news has been shown to override verbal information to the contrary,⁷ such as their identification here as Sri Lankan civilians, who are normally not associated with international terrorism. Reinforcing the negative stereotype in this photo are the men’s unsmiling expressions, and shadows created by camera angle and position, which darken their skin and suggest beards where there are none.⁸ In appraisal terms, such a stereotype works by triggering feelings of antipathy and negative assessment of a social actor based on condemnation of behaviour believed to be practised by them, a potent attitudinal package of multiple negative affect, appreciation, and judgement values. If already held, such a stereotype can only be strengthened by the attitude value of tenacity in the headline, and by a depiction that implies the men are protesting.

In terms of visual graduation (Economou, 2009), many choices made in this standout will intensify any antipathy or disapproval that this type of boat person may trigger. One choice made here is visual repetition, in this case the inclusion of four equally salient instances of the same type of man doing the same thing. In the visual appraisal system, repetition is a high force graduation choice realized by the inclusion of more than one ideational token of attitude of the same kind, often achieved by camera angle and/or cropping. In this way, attitude values evoked by a single such token are multiplied. Another way that attitude values provoked by the depicted men are further flagged in this photo is by their foregrounding and the amount of space they take up in the shot. This is another type of high-force choice, that of visual quantification, specifically high proximity and extent.

4.2  Accumulating Meaning across the Whole Standout

The SMH standout includes two relatively explicit negative attitude values in satellite verbiage, which skew the overall stance toward disapproval of asylum seekers for undecided readers. In this way the standout is more attitudinally committed than its lure, though its stance is far from being overt. One significant pattern that contributes to the overall stance is the representation of boat people as agentive actors trying to act upon Australia or Australians in both the photo and all pieces of verbiage. In both standouts, the photos imply boat people are trying to enter Australia and depict them as directly addressing Australians. However, only in the SMH standout is this agency verbally elaborated upon and accumulated in satellite verbiage. Most significantly, in the caption, “asylum seekers engage in a hunger strike” confirms the suggestion made in the photo of boat people ‘protesting’. In terms of associated attitude values, the accumulation of boat people’s agency ensures that the dominant attitude value (judgement: sanction: tenacity) in “Come hell or high water” spreads across the standout to affect reader positioning. Where readers strongly disapprove or approve of the represented action, that is, whether they consider it to be appropriate or not,⁹ its association with tenacity is likely to intensify the already-held attitude, for tenacity is more likely to be judged positively if the action executed with tenacity is approved of. However, where readers are unsure or undecided about Australia giving asylum to boat people, this spread of tenacity to all agentive actions, especially their extreme protest action, is likely to contribute to triggering disapproval.

A related significant choice is that the most explicit attitude value in the standout is negative judgement targeting boat people, implied in the subheadline “don’t let them in”. Though this is the only instance where boat people are not the grammatical actor-agent, the negative imperative clearly implies their agentive action of ‘trying to get in’, as well as negative affect or judgement, interpretable as ‘we don’t want them here’ or ‘it is wrong to let them in’. Though an external source (Australians) is the implied appraiser, just as in an ostensibly objective news story that quotes only one party’s opinions, this quote can affect the attitudinal positioning of undecided readers.
When read in association with the visual meanings, and the verbal iterations of boat people agency in actions carried out with ‘tenacity’, many compliant and not already sympathetic readers would easily align themselves with this unattributed demand.

To sum up, the overall stance constructed in this online standout remains sufficiently uncommitted not to alienate either those who are already strongly sympathetic or those strongly disapproving of boat people. In this way, the inclusiveness displayed by the lure is maintained. However, the somewhat more committed stance constructed by accumulating verbal and visual ideation and both dominating and accumulating evaluative prosodies is more likely to create disapproval in respect to recently arrived boat people for compliant and not already positioned readers. This disapproval, significantly, is at odds with the stance in the ensuing story, but consonant with the attitude of the majority of the Australian population, a fact explicitly presented and supported by statistical evidence in the ensuing story.

5.  THE AGE PRINT STANDOUT

Though the overall stance construed in this standout also focuses on boat people, it targets two social actors and forms a chain for each (see Figure 11.5), one for asylum seekers and one for Australians, with attitude values accumulating both across and between chains. In the asylum-seeker chain, not only are there many categories of people who arrive by boat, but also those who arrive by air. The other chain, for Australians, includes the general population, politicians, and more specifically, individually named members of the government, including the Prime Minister. The standout text is presented in Table 11.2.

Table 11.2  The Age print standout

Standout Component | Verbiage and Visual Content
Headline | Fear rides in on a rusty boat
Photo | Boat people families on crowded old boat with washing hung up. Few men, two working on top deck. Many women looking worried, one holding a baby. Children calling out, smiling.
Verbiage in photo |
Caption | Sri Lankan asylum seekers bound for Australia. Their boat was towed back to Indonesia on Prime Minister Rudd's request.
Sub-/surheadline | ASYLUM SEEKERS: As tiny Christmas Island fills to overflowing with people seeking refuge, Australia’s politicians pander to an empirically unfounded anxiety in the national psyche.
Pullout quote | We process people on the mainland who arrive unauthorized by air. Senator Chris Evans, Immigration Minister

The more overt and committed overall stance in this standout is a result of more explicit verbal attitude values targeting each social actor, and many more visual attitude values provoked by the denser ideation in the photo, which represents much more of the same scene depicted in the SMH standout photo. The visual values overall create sympathy with, and approval of, boat people, and the verbal values create disapproval of the government policy on boat people. Significantly, the interaction between the verbal and visual does more than construct this overall double stance. It also constructs a visual-verbal exposition whereby the positive stance on boat people in the photo provides support for the negative stance on government policy in verbiage.

5.1  Accumulating Meaning in the Image

The depiction of boat people in The Age standout is strikingly different from that in the SMH standout photo. There are more and different kinds of boat people, depicted actions, and circumstantial elements that provoke and inscribe attitude values, all creating positive alignment with boat people. Another significant difference is that both handwritten signs are clearly visible here, as large and prominent as the headline they are placed directly below. Thus, any attitudinal meanings evoked by the signs are not only coloured by the headline attitude, but also interact very closely with visual attitude in every further look at the photo. In particular, the “Plz save our life” sign accumulates by implication both negative attitude values in the headline: the negative affect of fear in association with boat people, and the negative appreciation in rusty boat (in turn evoking judgement: capacity targeting boat people). The direct plea justifies boat people’s fear and expresses in their own voice their neediness, both also visually expressed and implied in many ways. As well as requesting action or help from Australians, this sign works with visual meanings (to be elaborated further below) to trigger sympathy.

The positive stance construed by this photo toward boat people, particularly in any sustained look, will cause tension for viewers who believe Australia should turn boat people away, and is thus likely to alienate these viewers. Undecided viewers, by contrast, will almost certainly be positioned positively toward these people, and if they read on, this stance is strengthened by both the verbiage and by each re-viewing of the photo. Alongside the many negative values associated here with boat people that function to create sympathy, there are also many positive values that ensure approval. Most significantly, due to the more substantial verbiage, greater orbitality, and denser visual meanings in this standout, the stance toward boat people in the photo plays a much bigger part in constructing overall stance than in the SMH standout.

The greatest contrast with the SMH photo is that the boat people depicted here are mostly women and children. Of the few men, two on the top deck are engaged in action that their posture suggests is ‘working on the boat’.
Though we cannot see the men’s faces clearly, most women have visibly worried or serious expressions, in contrast to the children, some smiling as they lean over the rail and call out. There are also nonhuman ideational items that imply and interact with other meanings to contribute to positive alignment with these boat people. The worn, old timber boat beams forming three large vectors horizontally across the photo visually reinforce the headline’s “rusty boat”. The washing line hung with children’s clothes strung up above the women forms another vector across the shot and implies, together with the old beams and the visibly happy, active children, that the adults are looking after the children well in difficult circumstances. The configuration of attitude values in this photo targeting boat people guarantees strong positive alignment even for noncompliant readers. One set of values creates sympathy for suffering people—provoked by depicting worried, afraid, needy parents, particularly mothers, in unfortunate, dangerous circumstances (in terms of appraisal: negative affect: happiness and security + negative judgement: esteem: capacity and normality). Another set of values creates approval for admirable people, provoked by depicting or implying hard-working parents who care for their children well in such circumstances (positive judgement: sanction: propriety and esteem: capacity targeting the parents, and positive affect and capacity, the children).

This powerful attitudinal package is given even more impact by visual graduation choices made by the photographers or photo editors. The more distant view and angle here than in the SMH photo means that not only are both signs and more people clearly visible, but also that a much greater space is left for and around the three nonhuman items (signs, boat beams, and washing line). This ensures their salience and attitudinal effect remains high despite the strong salience of many clearly depicted humans. Both visual repetition and different kinds of visual quantification are used to increase the force of the attitude values evoked by both human and nonhuman items. The attitude evoked by the signs addressing Australians, the weathered beams, and washed garments is flagged by their repetition, size, and/or extent in the shot. For the foregrounded women and children facing us, crowded together on the lower deck, high number and proximity values increase the force of attitude provoked by their depiction. Adding to the repetition and quantification flagging the positive values associated with the children is another high-force choice: the vividness of the pink and red hues of their T-shirts, which visually intensifies these attitudes. The many visual ideational tokens of attitude in this photo, many unable to be processed in an initial glance, mean this photo is well-chosen to do substantial evaluative work in an orbital standout. Each time readers look at the photo, they not only gradually unpack more embedded visual meanings in relation to the accompanying verbiage, but also revisit these meanings each time they look. The next section examines the way in which the stance being built up toward boat people in the photo interacts with the verbiage in each further look.

5.2  Accumulating Meaning across the Standout

Crucial in the interaction with visual attitudinal meanings are ideational patterns developed across the standout in respect to grammatical roles and agency given to each of the two social actors, that is, boat people and the government. In The Age photo, the agency of boat people acting on their children or their boat is associated with positive judgement values. However, in the verbiage, this is countered by their consistent representation as acted upon by the Australian government. Unlike the SMH standout caption, in which “asylum seekers engage” in a protest action, in The Age caption “Their boat was towed back to Indonesia at Prime Minister Rudd’s request”. In an even greater contrast, the agentive social actor, Australia’s then Prime Minister, is introduced and named. At the same time, the dominant negative appreciation value in the headline’s “rusty boat” and the old boat beams in the photo is accumulated verbally in “boat . . . towed back”. In the surheadline, “people seeking refuge” accumulates the agency but also the neediness of the boat people depicted in the photo. More importantly, agency is further accumulated on the part of both Australian politicians and people, and in explicitly negatively judged behaviours.

In “Australian politicians pander to an empirically unfounded anxiety in the national psyche”, both “Australian politicians”, and then, in the more abstract “national psyche”, Australians in general are targeted. The initial negative judgement (sanction: propriety) of politicians in “pander” is greatly strengthened by the negative appreciation of what they pander to. And as the emoter of this “empirically unfounded anxiety”, Australians are also negatively judged. The negative emotion “anxiety” elaborates and accumulates the headline’s dominant fear, though this time with the emoter identified. Thus, what was only an implication in the lure, perhaps not accessible at a glance, is here made clear: Boat people may feel fear but they also trigger fear and anxiety in Australians. Importantly, the depiction of needy boat people families functions as visual evidence of just how empirically unfounded Australians’ fear and anxiety is, in any further look at the photo. Finally, in the pullout quote from the written story, government agentive action directed at asylum seekers continues to accumulate, even though neither the layout nor the content of this quote encourages a look at the photo¹⁰: “We process those who arrive unauthorized by air”, a statement explicitly attributed to the Immigration Minister, refers to a category of asylum seekers not represented in the SMH standout. This statement by a high-status source provides authoritative evidence in support of the explicit negative judgement of Australian politicians and people in the surheadline. It does this by setting up an implicit contrast between how people arriving by air and those arriving by boat are perceived and treated. As with the Christmas Island reference in the caption, this implication may require some knowledge on the part of readers about Australia’s offshore processing policy for boat people.
To sum up, the overall stance of The Age print standout is much more committed and more overt than its lure stance, and also more so than the overall stance of the SMH standout. A striking contrast between the two standouts is The Age standout’s greater complexity. First, it not only constructs a clearly positive alignment with boat people, but also an explicitly negative one with Australians through its two social actor chains. The asylum-seeker chain accumulates both visual and verbal attitude values that trigger sympathy and approval; the Australians chain only accumulates verbal attitude values that trigger disapproval. The two social actors are consistently connected verbally by representing Australian politicians as enacting negatively judged behaviours upon boat people. Finally, the photo and verbiage interact to construct an argument in which visual meanings support and provide evidence for the verbal negative judgement targeting Australians.

6. CONCLUSION

The two standouts analyzed using SFL appraisal systems in this chapter not only tell a different story and construct a different stance, but most importantly, they also do not equally reflect the following story or its stance. The

online SMH standout is at odds with the story. It does not refer at all to that strand of the story that critically examines and condemns the present government’s policy on boat people. Neither does it reflect the lengthy argument that the hostility toward boat people felt by most Australians is based on an irrational fear, even though it quotes this hostile sentiment toward boat people. It also does not construct a sympathetic stance toward boat people as is implied throughout the story, and uses a photo that is likely to position undecided readers negatively toward boat people. By contrast, The Age print standout closely reflects both main strands of the ensuing expository story: one that concludes that the government’s policy of off-shore processing of boat people has failed, and another that all political party positions on the issue are based on securing votes from an electorate who have an irrational fear of boat people. The Age standout not only reflects the story’s overall critical stance on boat people policy, but also its expository aims. Interestingly, it construes a much more overt stance on boat people than does the story, using an attitudinally charged photo to create a strongly sympathetic and approving stance toward them. The dense meanings in the photo, gradually processed by readers of the whole standout, function as evidence in support of a highly critical stance toward Australian people’s and politicians’ response to boat people. Multiple viewings of this evaluatively laden photo provide, in the standout, an emotionally based equivalent of the series of contributing factually based arguments in the ensuing story supporting the same stance. One explanation for these differences is probably different target audiences. The more covert overall stance in the less-complex SMH standout strongly suggests the targeting of a more diverse and less politically involved readership than for The Age.
The more overt, complex, and expository stance in The Age strongly suggests less concern about alienating those holding different views and more attention to encapsulating the story. This more faithful reflection of the story may also be a result of greater time, care, and expertise available in print production, perhaps including consultation with staff writer David Marr.¹¹ Such editorial attention may also be related to the greater durability, size, and prominence of print standouts. However, despite the targeting of a broad audience and greater practical constraints in online production, the SMH standout’s minimal version of Marr’s story, with an overall stance that does not reflect that of the story, is of some concern. If intentional, it suggests an editorial position that differs from Marr’s. If unintentional, it suggests more care might be needed to avoid producing a standout that creates, for less-involved readers, a stance that is at odds with that in the written story. The application of verbal and visual appraisal systems to standouts in this study has revealed a surprising contrast between two texts purportedly doing the same thing. This finding suggests that production and design decisions aimed at making a verbal-visual text more attractive and inclusive, or perhaps based on efficiency, should be considered more carefully

in terms of their semiotic consequences, particularly in a civic journalism context. The finding also clearly highlights the importance of refining our tools for closer, more systematic analysis of evoked evaluative meanings in standout visual-verbal texts, a text type increasingly deployed to “sell” serious news analysis.

NOTES

1. For an analysis of a third display introducing the same Marr story and published in the print SMH on the same day, see Economou (2012).
2. Personal communication with SMH news review editors and news designers (2009).
3. Though one might expect the preferred reading path to order elements according to size, prominence, and salience, audience research shows the small photo caption is often read second.
4. The grammatical role of fear here is not a congruent one, and so may not be processed at a glance. In a more careful reading, however, the headline can be interpreted as ‘boat people bring fear (to Australians)’.
5. In Kress and Van Leeuwen’s (2006) terms, a “Centre-Margin composition”.
6. When an Australian politician publicly expressed this view in October 2009, the then Prime Minister Rudd was forced to respond (Rudd slams, 2009).
7. Experimental research has demonstrated that where black men are depicted, their verbally described actions in an accompanying news story are judged more negatively than when accompanied by photos of white men (Wilkins & Coleman, 2005, pp. 82–91).
8. Similar to (though not electronically manipulated as was) the ‘darkening’ by U.S. newspapers of O. J. Simpson’s skin in photos of his high-profile trial, later ruled to be an illegal and discriminatory practice by U.S. courts (Wilkins & Coleman, 2005).
9. Judgement: sanction: propriety is the attitude value at the core of approval and disapproval.
10. Nor does another piece of verbiage, an inset at the bottom of the page (facts and figures on the law and boat people arrivals), which due to space constraints in this chapter is not analyzed.
11. Though writers do not as a rule participate in standout production, Marr was identified as an exception in personal communication with Fairfax news review editors (2006).

REFERENCES

Caple, H. (2008). Intermodal relations in image nuclear news stories. In L. Unsworth (Ed.), Multimodal semiotics: Functional analysis in contexts of education (pp. 125–138). London: Continuum.
Caple, H. (2010). Doubling-up: Allusion and bonding in multi-semiotic news stories. In M. Bednarek & J. R. Martin (Eds.), New discourse on language: Functional perspectives on multimodality, identity, and affiliation (pp. 111–133). London: Continuum.
Caple, H., & Knox, J. S. (2012). Online news galleries, photojournalism and the photo essay. Visual Communication, 11(2), 207–236.
Economou, D. (2006). The big picture: The role of lead image in print feature stories. In L. Lassen, J. Strunck, & A. Vestergaard (Eds.), Mediating ideology in text and image (pp. 112–234). Amsterdam: John Benjamins.
Economou, D. (2008). Pulling readers in: News photos in Greek and Australian broadsheets. In E. Thomson & P. R. R. White (Eds.), Communicating conflict: Multilingual case studies of the news media (pp. 253–280). London: Continuum.
Economou, D. (2009). Photos in the news: Appraisal analysis of visual semiosis and verbal-visual intersemiosis (Unpublished doctoral dissertation, University of Sydney). eThesis
Economou, D. (2010). Having it both ways? Image and text face off in broadsheet news. In V. Rupar (Ed.), Newspapers and sense making (pp. 175–98). London: Hampton Press.
Economou, D. (2012). Standing out on critical issues: Stance in broadsheet news. In W. Bowcher (Ed.), Multimodal texts from around the world (pp. 246–271). Basingstoke: Palgrave Macmillan.
Fairclough, N. (1992). Discourse and social change. Cambridge: Polity Press.
Fairclough, N. (1995). Media discourse. London: Edward Arnold.
Fowler, R., Hodge, M., Kress, G., & Trew, T. (1979). Language and control. London: Routledge and Kegan Paul.
Halliday, M. A. K., & Matthiessen, C. M. I. M. (1999). Construing experience through language: A language-based approach to cognition. London: Cassell.
Halliday, M. A. K., & Matthiessen, C. M. I. M. (2004). An introduction to functional grammar (3rd ed.). London: Arnold.
Hood, S. (2006). The persuasive power of prosodies: Radiating values in academic writing. Journal of English for Academic Purposes, 5, 37–49.
Macken-Horarik, M. (2003a). The children overboard affair. Australian Review of Applied Linguistics, 26(2), 1–16.
Macken-Horarik, M. (2003b). Working the borders in racist discourse: The challenge of the “children overboard” affair in news media texts. Social Semiotics, 13(3), 283–303.
Macken-Horarik, M. (2004). Interacting with multimodal text: Reflections on image and verbiage in ArtExpress. Visual Communication, 3(1), 5–26.
Marr, D. (2009a, October 17). Come hell or high water. The Sydney Morning Herald. Retrieved from http//
Marr, D. (2009b, October 17). Fear rides in on rusty boat. The Age Newspaper, Insight section, p. 3.
Martin, J. R. (2001). Fair trade: Negotiating meaning in multimodal texts. In P. Coppock (Ed.), The semiotics of writing: Transdisciplinary perspectives on the technology of writing (pp. 311–338). Turnhout: Brepols Publishing.
Martin, J. R. (2004a). Sense and sensibility: Texturing evaluation. In J. Foley (Ed.), Language, education and discourse: Functional approaches (pp. 270–304). London: Continuum.
Martin, J. R. (2004b). Mourning: How we get aligned. Discourse and Society (Special Issue on Discourse around 9/11), 15(2–3), 321–344.
Martin, J. R. (2008). Innocence: Realisation, instantiation and individuation in a Botswanan town. In N. Knight & A. Mahboob (Eds.), Questioning linguistics: Proceedings from the first international free linguistics conference (pp. 27–54). Cambridge: Cambridge Scholars Publishing.
Martin, J. R., & Rose, D. (2008). Genre relations: Mapping culture. London: Equinox.
Martin, J. R., & White, P. R. R. (2005). The language of evaluation: Appraisal in English. Basingstoke and New York: Palgrave.
Rudd slams Tuckey’s ‘terrorist’ asylum seeker comments. (2009, October 22). The Sydney Morning Herald. Retrieved from
Van Dijk, T. (1988). News as discourse. Hillsdale, N.J.: Lawrence Erlbaum Associates.
Van Leeuwen, T. (1996). The representation of social actors. In C. R. Caldas-Coulthard & M. Coulthard (Eds.), Texts and practices: Readings in critical discourse analysis (pp. 32–70). London: Routledge.
White, P. R. R. (1998). Telling media tales: The news story as rhetoric. (Unpublished doctoral dissertation, The University of Sydney).
White, P. R. R. (2006). Evaluative semantics and ideological positioning in journalistic discourse: A new framework for analysis. In L. Lassen, J. Strunck, & A. Vestergaard (Eds.), Mediating ideology in text and image (pp. 37–68). Amsterdam: John Benjamins.
Wilkins, L., & Coleman, R. (2005). The moral media. London: Lawrence Erlbaum Associates.

12 Point of View in Picture Books and Animated Film Adaptations: Informing Critical Multimodal Comprehension and Composition Pedagogy

Len Unsworth

1. INTRODUCTION: LITERARY PICTURE BOOKS AS TRANSMEDIA NARRATIVES

Popular culture for children is increasingly characterized by a digital multimedia dimension. One highly visible example is the adoption, adaptation, and re-versioning of stories from literary picture books to movies, animations, and videogames (Unsworth, 2006; Unsworth, Thomas, Simpson, & Asha, 2005). The experience of literary narrative by today’s children involves taking the “multiplicity of media and versions for granted” (Mackey, 1994, p. 19). Classic picture books such as Where the Wild Things Are (Sendak, 1962), re-versioned as a popular mainstream animated movie (Jonze et al., 2009), have been highly celebrated within broad popular culture. Australian picture book author Shaun Tan’s winning of an Oscar in 2011 for the best animated short film of his book The Lost Thing (2000) initiated a flurry of online responses from children and adults. These phenomena indicate how digital multimedia is merging literary picture book culture with popular culture animated movies. Although discussing children’s literature in terms of print media texts alone “ignores the multimedia expertise of our children” (Mackey, 1994, p. 17), it cannot be simply assumed that experience of multiply versioned stories equips young readers to understand how the interpretive possibilities of story are shaped by the affordances of the different media through which the stories are being experienced. Despite a very significant proportion of young people being highly adept at using digital media for creative expression, research, and social life, they are not necessarily correspondingly adept at understanding how multimedia affordances influence the interpretive possibilities of the texts they are negotiating (Jenkins, 2006; Kellner & Share, 2007; Luce-Kapler, 2007).
Appreciating that no text is ‘innocent’ and understanding how to interrogate or analyze texts to determine how they have been structured to convey a particular evaluative stance, whether explicitly or implicitly, has long been held as a crucial aspect of critical literacy (Gee, 2003; Hood, 2010; Lemke, 2006; Luke, 2000). Such critical interpretive analysis of texts may be facilitated by explicit knowledge of how meanings are made through the structuring of the semiotic resources of language and image.

Burn and Durran (2006), for example, found that early adolescents were able to develop explicit knowledge of the meaning-making resources of moving images and demonstrate strategic application of this in sophisticated commentaries on their reformulated digital movie texts. What is suggested here is that close analysis of picture books as transmedia narratives in the form of animated movies can be an enjoyable basis for explicit teaching of aspects of multimodal narrative techniques such as point of view. This explicit meta-semiotic understanding can help students appreciate the multimodal ‘constructedness’ of story and hence become a resource for critical comprehension and for enhancing students’ own multimodal digital narrative authoring. Following a brief explication of options for the construction of point of view in images in picture books (Painter, Martin, & Unsworth, 2012), the focus in this chapter will be on the story of The Lost Thing in picture book (Tan, 2000) and animated movie formats (Ruhemann & Tan, 2010). A comparison of a selection of corresponding segments of the story in both formats will examine how point of view is constructed. The comparative analysis will then address the interactions of the narration with the image/viewer interactive aspects of the images, their depiction style, and the role of point of view, in affording the interpretive possibilities of the story in each format. Finally, examples of analyses of point of view in other picture books and animated movie adaptations will be mentioned and some early work on point of view in digital animated narrative authoring pedagogy will be briefly noted.

2. IMAGES AND POINT OF VIEW IN PICTURE BOOKS

In verbal narrative a distinction can be made between (1) who is telling the story, the narrator, and (2) from whose point of view, or through whose eyes, we experience the story, which may change as the story progresses. Such changes in ‘focalization’ (Genette, 1980) and the subtlety and sophistication of shifts in point of view are the subject of detailed scholarly enquiry (Huhn, Schmid, & Schonert, 2009), but here we will simply illustrate some basic relationships between ‘who tells’ and ‘who sees’. We will firstly look briefly at shifts in point of view in a light-hearted story, “Unhappily Ever After”, in the collection Quirky Tales by Paul Jennings (1987). In this story a young boy, Albert, receives corporal punishment from his balding old headmaster for allegedly circulating a note (“BALD HEAD BROWN WENT TO TOWN RIDING ON A PONY”). One part of the story describes the headmaster, Mr. Brown, in his rowing boat on the sea:

The sea was flat and mirrored the glassy clouds that beckoned from the horizon. Brown pushed out the small boat and it knifed a furrow through the inky water. He put back the oars and soon he was far out to sea with the shore only a thin line in the distance. (p. 62)

In this segment the unknown external narrator is telling the story and the point of view is that of the narrator who is outside of the story world. But the following segment at the beginning of the story is told from Albert’s point of view:

Albert pulled up his socks and wiped his sweaty hands on the seat of his pants. He did up the top button of his shirt and adjusted his school tie. Then he trudged slowly up the stairs. He was going to get the strap. He knew it, he just knew it. He couldn’t think of one thing he had done wrong but he knew Mr Brown was going to give him the strap anyway. He would find some excuse to whack Albert – he always did. (p. 59)

Here it is the narrator again who is telling the story, but it is being told from Albert’s perspective, from inside his consciousness. Just after this segment, the point of view shifts:

Inside the room Brown heard the knock. He said nothing. Let the little beggar suffer. Let the little smart alec think he was in luck. Let him think no one was in. Brown heard Albert’s soft footsteps going away from the door. “Come in, Jenkins,” he boomed.

In this segment the narrator is telling the story but now it is from Mr. Brown’s point of view. The verbal text in picture books can position the reader to experience the story from such different points of view. Images in picture books can also position the reader to experience the image from an external, unmediated viewpoint, or from a point of view similar to that of one of the characters in the image, or indeed, as if he or she is one of the characters in the image (Painter, 2007; Painter et al., 2012; Unsworth, 2006). Sometimes the points of view constructed by the verbiage and the image are consistent and sometimes they are different.

Painter (2007) identifies three methods by which viewers can be positioned as if they were one of the characters in the image (see also Painter et al., 2012).
The first method is to depict just the part of the body that could be seen by the focalizing character (such as the hands or feet out in front of the unseen body). Since the reader can see only what would be visible to the focalizing character, the reader is positioned as if he or she is the focalizing character, with that character’s point of view (see also Kress & Van Leeuwen, 2006, pp. 143–144). A similar effect is created when only the shadow or partial shadow of the focalizing character is included in such a way that the viewing position for the image could only be that of the character casting the shadow. This method of positioning of the reader to have the point of view of one of the characters is

inscribed in the actual form of the image depiction. The second method of positioning is achieved across a sequence of two images. In the first image, the focalizing character looks out from the page gazing directly at the reader, so it is clear that the character is looking at something, and what he or she is looking at is depicted in the subsequent image. This has the effect of positioning the reader to see the second image from the point of view of the focalizing character (see Painter et al., 2012, for examples of such image sequences from picture books). The third method is achieved by using the angle of viewing across two images. In the first image, the focalizing character is looking at something (or about to see something), but at this point we do not know what. Then the next image depicts the focalized participant, but from the same viewing angle as that depicted for the focalizing character in the previous image. In the second and third methods, the positioning of the reader to have the point of view of one of the characters is inferred from the relationship between the successive images (evoked) rather than being inscribed in the actual form of the image as in the first method. It is also possible for the reader to share a character’s point of view rather than being positioned as the character. The reader’s view subsumes that of the character. The reader sees the character (or part of the character) while also seeing what the character sees from that character’s perspective. This is achieved by having the reader view what is depicted ‘along with’ or ‘over the shoulder’ of the focalizing character.
The ‘over-the-shoulder’ view can be achieved by positioning the reader’s point of view as being from slightly to the rear and to one side of the focalizing character or directly behind the focalizing character, which may be seen as a stronger alignment with the focalizing character’s point of view (Unsworth, 2006, pp. 95–97). The ‘back view’ is briefly mentioned by Kress and Van Leeuwen as “complex and ambivalent” with possible interpretations such as “maximally confronting”, “trust”, and “abandonment” (Kress & Van Leeuwen, 2006, pp. 138–139), but they do not discuss the ‘back view’ in relation to focalization. In children’s literature this is very important in establishing alignment between the reader and the point of view of the focalizing character. For example, in Anthony Browne’s (1983) Gorilla there are four back-view images of the main protagonist, Hannah, alone, four of Hannah and the gorilla, and one of Hannah and her father. While the earlier rear views of Hannah tend to align the reader with Hannah’s perspective on the events of her life, and then those with the gorilla on how she imagines her life should be, the final rear view of Hannah and her father focuses the readers’ view on the togetherness of father and daughter as they walk hand in hand into the future (see Painter et al., 2012, for further discussion and examples of ‘back view’ and focalization). Developing explicit knowledge of the construction of point of view, and how it relates to the ways in which the depiction of images construct a kind of pseudosocial interaction between the viewer and the represented participants, is important in understanding how the text is positioning the reader/viewer to perceive the unfolding story. How the resources of image and

language are used to align the reader/viewer with viewpoints of one or more characters is a significant means of privileging the interpretations that are consistent with the perspectives shown by those characters. Understanding this aspect of the ‘constructedness’ of story is therefore a crucial resource for critical interpretation and response. One way of exploring this ‘constructedness’ is comparing different versions of ostensibly ‘the same’ story.

3. COMPARING POINT OF VIEW IN PICTURE BOOKS AND ANIMATED MOVIE ADAPTATIONS

The Lost Thing (Tan, 2000) is a humorous story about a boy who discovers a bizarre-looking creature while out collecting bottle tops at a beach. Having guessed that it is lost, he tries to find out who owns it or where it belongs, but the problem is met with indifference by everyone else, who barely notice its presence. Each is unhelpful; strangers and parents are all unwilling to entertain this uninvited interruption to day-to-day life. Even his friend is unable to help despite some interest. The boy feels sorry for this hapless creature, and attempts to find out where it belongs. In the book The Lost Thing, there are no images where the gaze of the characters is directed straight out toward the reader, so there is no “contact” between the depicted characters and the reader. Nearly all of the images are long-distance views, with only three images that are middle-distance views, and no close-up views at all, so the social distance between the reader and the depicted characters is generally quite remote. The depiction style used to represent the characters visually can be categorized as ‘minimalist’ as opposed to ‘generic’ or ‘realist’ (Painter et al., 2012; Welch, 2005). In broad terms, the minimalist style for a human character is one that uses circles or ovals for people’s heads, with dots or small circles for eyes, and does not need to maintain accurate facial or body proportions.
Painter and colleagues (2012) suggest that this style is indexical of what they refer to as ‘appreciative’ engagement of the reader, with limited emotional involvement. The generic style they see associated with an ‘empathic’ role for the reader, recognizing something of themselves and others in the characters, and the realist style is considered to support a ‘personalizing’ engagement of the reader, responding to the characters as real individuals. The minimalist images in The Lost Thing construct a reader role of appreciative engagement. The visual point of view in the book is overwhelmingly unmediated observation: viewers are positioned as detached outside observers. This combination of semiotic features does not orient the reader to become involved with characters as individuals and build up personal relationships with them, so, although the verbal dimension of The Lost Thing is a first person narrative, the images construct a reader stance that is predominantly distant and detached. The tension between the alignment of the reader and the character of the boy as the first person narrator, and the distancing of

the images, reflects the treatment of the relationship between the boy and the lost thing, where the boy believes he should be concerned about the abandonment of the lost thing by society, but does not establish any interpersonal closeness with it. Subtle changes in the language and very significant differences in the images from the book to the movie remove this tension.

In the movie, the images construct a pseudosocial relationship with the viewers that is very different from that in the book. In the movie there are many more mid-distance and close-up views of the characters; sometimes the main character of the boy looks directly out at the audience; and the viewer is very frequently positioned to have a point of view along with one of the characters, and on a number of occasions as if he or she were one of the characters. While the story events are almost exactly equivalent in both versions, and the narration varies only moderately, the visual depiction in the movie significantly affects the interpretive possibilities. The movie substantially maintains the minimalist depiction style of the book, but the closer social distance, more frequent contact images, and particularly the mediated point of view ‘along with’ and ‘as’ the depicted characters shift the reader/viewer engagement from appreciative to empathic, and visually accentuate the boy as a focus for interpretive issues concerning difference, conformity, acceptance, and interpersonal outreach.

While an exhaustive comparison of point of view in the book and movie versions cannot be presented in this chapter, to indicate the differences in point of view I will outline examples in the movie where the point of view is mediated as the character of the lost thing or as the character of the boy, and compare these with the corresponding sections of the book.
Then, to show the interpretive impact of the different approach to point of view in the movie, I will briefly compare the scenes from the very beginning of the story, where the boy meets the lost thing, with the scenes from the very end of the story, where the boy reflects on his encounter with the lost thing.

After the boy has taken the lost thing home, in the book there is a one-page depiction of him feeding the lost thing. The image shows him on top of a ladder dropping objects from a box into the raised top ‘lid’ of the lost thing. The image is quite a long view showing the full height of the lost thing and a full view of the boy and the high ladder he is standing on to reach the top of the lost thing. It is also an ‘observe’ image, since there is no gaze from either of the characters directed toward the reader. The text at the bottom of the page reads:

“I hid the thing in our back shed and gave it something to eat, once I found out what it liked. It seemed a bit happier then, even though it was still lost.”

In the movie, the narration maintains the part about hiding the lost thing in the back shed and it seeming happier after eating, but the intervening information about giving it something to eat after finding out what it liked is all rendered, in the movie, through images only, and there is a much more

detailed and comprehensive portrayal of this sequence of events. What is significant is that we see the ladder being positioned against the side of the lost thing, looking down on the ladder, and seeing only the ends of the large front ‘claws’ of the lost thing (as they would be seen by the lost thing when carrying out this action), so the point of view for the movie viewer is ‘as’ the lost thing. In this case point of view is ‘inscribed’, or explicit, because it is constructed directly by what is depicted visually.

In the book, the boy then notices in the newspaper a small advertisement for “The Federal Department of Odds & Ends” which would accommodate “Things that just don’t belong”. The full-page image showing this advertisement appears on the left-hand side of one double-page opening. The text at the bottom of the previous page reads: “I was wondering what to do when a small advertisement on the last page of the paper happened to catch my eye”. On turning the page the reader sees the advertisement in the full-page image of the newspaper. While there is no image of the boy associated with the newspaper, the first person narrative statement on the previous page about an advertisement catching his eye, together with the appearance of the advertisement in the newspaper fully occupying the next page, enables a visual-verbal collaboration that evokes, rather than directly inscribes, the boy’s point of view. There is another such example later in the story when the right-hand side of the double-page spread depicts the boy’s arm and hand about to press a door buzzer and the text reads: “I pressed a buzzer on the wall and this big door opened up”. The subsequent page shows the bizarre characters and happenings inside that door, again strongly evoking the boy’s point of view. These two, and the possible further example discussed later in relation to Figure 12.4, are the only occasions in the book when we visually experience the boy’s point of view.
In the movie, the advertisement is on television. We see the boy sitting in a chair watching television. The image is a mid to close view of the boy’s upper body and head, with his head tilted slightly forward toward the television set, of which we see one rear corner. The angle is slightly oblique, so that he is not quite facing out to the viewer. We see him move his head closer to the television set, and then in the next shot we see the television screen. So from this combination of shots the inferred or ‘evoked’ mediated point of view is that of the boy.

While there are more examples of the point of view being that of the character in the movie, the more frequently occurring point of view is ‘along with’ the boy, where viewers see both the boy and what he is looking at from his perspective. This is frequently achieved through a close-up foreground image of the right side and rear of the boy’s head and shoulder, constructing our point of view as ‘over the shoulder’. In fact, this occurs within the first minute of the story. The boy stoops to pick up a bottle top for his collection and locates the specimen in his collector’s catalogue. As he is bent down looking at his catalogue on the ground, we have a close-up view of the top of his shoulders, the back of his head, and his arms manipulating his specimen over his catalogue. As this occurs, the camera moves in a little closer, as

if to foreshadow very early in the movie the significance of the boy’s point of view. The parallelism in the selection of this point of view when he meets the lost thing at the beginning of the story and when he reflects on their encounter at the end provides a very different orientation to our interpretation of the story than that provided by the unmediated point of view in the depiction of the corresponding segments of the book.

The second double-page spread of The Lost Thing, shown in Figure 12.1, represents the boy meeting the lost thing. The point of view is unmediated, not ‘as’ or ‘along with’ the character. The images are all ‘observe’ images, as no gaze is directed to the reader. The large image is a longish mid view of the boy, showing more than half of his body, but in profile and at a slightly oblique angle. The small images in the right-hand column are also mostly long shots or longish mid views and the social distancing is accentuated by the smallness of the images. The smallness of these images also means that no facial affect is discernible, although curiosity is indicated by posture and gesture. Overall, the reader’s relationship with the characters is quite remote and detached. In each of the four small images in a column on the right-hand side, the lost thing is prominent, and all of these images, with the single line of text under each, appear to be labelled. The first three of these, each in turn, add a label of negative judgement about the lost thing:

“sure didn’t do much. It just sat there, looking out of place.”

Figure 12.1  Meeting the lost thing

The positioning of these negative judgements, like labels underneath the three small images of the lost thing on the right-hand side in the location of ‘new’ information (Kress & Van Leeuwen, 2006), and the social distancing of the reader from the characters, tend to give more emphasis to the unappealing incongruity of the lost thing than to the curiosity of the boy.

In the movie, there is no explicit verbal comment on the boy’s curiosity about the lost thing, but curiosity, surprise, and puzzlement are evoked through facial expression and gesture and the 17 seconds of silent walking around and looking at the lost thing before any utterance of judgement is made. The point of view is unmediated for those 17 seconds, but when the boy stands still in front of the lost thing, the camera shifts to a close-up foreground view of the back and right side of the boy’s head and his right shoulder, positioning us along with the boy’s point of view as he looks up at the lost thing, as indicated in the top right-hand image in Figure 12.2. The lost thing does not have eyes, but the aperture at the top, behind which is what appears to be a fan, creates the impression of it having eyes. This means that the immediately subsequent shot (shown in the bottom-left image of Figure 12.2), which is a ‘contact’ image because the gaze of the boy is upward toward us as the viewers, can be inferred as from the point of view of the lost thing. Hence we are being positioned as the lost thing. The closer social distance, inclusion of ‘contact’, and shifts in point of view mean that the viewer is positioned much more within the story world experience than is the case with the corresponding segment in the book.

The synchronizing of the narrative judgement statements about the lost thing with shots in the movie is indicated in Figure 12.2. “It just sat

Figure 12.2  Introducing the lost thing in the movie

there” is synchronized with an image that could just as much show the lost thing contemplating the boy as the boy contemplating the lost thing. When we hear the continuation of the narrative “with a really weird look about it”, what we are actually looking at is the image of the boy shown in the bottom-left of Figure 12.2. So the synchronizing of the narration of “really weird look” with the image of the boy ‘unsettles’ the taken-for-granted notion that it is the lost thing only that has “a really weird look about it”. And then the narration of “You know, a sad, lost sort of look” is synchronized with the bird’s-eye view of the lost thing and the boy apparently wandering away from it, so the boy appears somewhat lost from this perspective, again unsettling the idea that it is the lost thing only that is ‘lost’. It is as if the image-language interaction in the movie is problematizing the negative judgement of the lost thing, whereas in the book these judgements appear to be quite unequivocal.

This difference is underscored by comparing the narration in the book and the movie. The differences in the narration in the book and movie versions are indicated in Figure 12.3. While the narration is very similar, and the extent of the variation is modest, the differences in the movie version have quite a significant interpretive impact.
Space does not permit detailed discussion of the slight, but impactful, modification and rearrangement of the judgement statements from the book to the movie version, but the combined effect of the move from the simple past to the past continuous (didn’t/wasn’t doing), and the shift of “a really weird look about it” from an attribute in the book to a circumstance of manner in the movie, indicates the judgements in the book as being concerned with characteristic traits of the lost thing, whereas in the movie they are treated as referring more to the lost thing’s current demeanour (Figure 12.3). What is clear is that differences in the image–reader/viewer relations, point of view, and the interaction of these with the language of the narration construct different orientations to interpretation in the book and movie versions of the story.

The powerful potential of the visual construction of point of view to influence the reader/viewer orientation to interpretation can be seen by comparing the corresponding later segments of the book and movie where the character reflects on his encounter with the lost thing. The relevant page from the book is shown in Figure 12.4. The previous page is a very long-distance view of a tram, and although the passengers are very small, one is recognizable as the boy. Hence the image in Figure 12.4 could be regarded as being seen from the boy’s point of view. What he sees out of the corner of his eye that doesn’t quite fit is the orange-coloured creature with the light bulb for a head peering into the red box on the footpath. In the movie, we likewise see this creature through the window as the tram approaches the stop. But when it stops, the camera pulls back and we see the close-up rear view of the head of the boy looking out of the window at the orange creature, so our point of view is ‘along with’ that of the boy.

Figure 12.3  Meeting the lost thing—comparing narration in the book and movie versions


Figure 12.4  Remembering the lost thing

Simultaneous with this view we hear the narration, which is identical in the book and the movie:

“I still think about that lost thing from time to time. Especially when I see something out of the corner of my eye that doesn’t quite fit. You know, something with a weird, sad, lost sort of look.”

Just as the narration comes to the final words “a weird, sad, lost sort of look”, the character turns around to face the camera and looks out, making contact with us as the viewers, as shown in the bottom image in Figure 12.5. At the time these words are narrated we are looking at the boy, who is looking at us, and we can also see through the tram window the orange creature with the light-bulb head, raising the question of which of these characters has the weird, sad, lost sort of look. Not only does the shift in point of view facilitate this provocative synchronization of image and language, but it also precisely parallels the image-language synchronization indicated in Figure 12.2. So, in contrast to the book, in the movie the visual construction of image/viewer relations of social distance, contact, and point of view implicates the boy quite directly in relation to issues of ‘looking out of place’ or not seeming to ‘quite fit’, and appearing to have “a weird, sad, lost sort of look”.

4. PICTURE BOOKS, POPULAR CULTURE, AND POINT OF VIEW IN CRITICAL NEW LITERACIES PEDAGOGY

Multimedia versions of literary picture books in various filmic formats are now increasingly easily accessible either online or as DVDs, facilitating new forms of literary engagement for children and new computer-based approaches to the use of multiple story versions in school English curricula

Figure 12.5  A weird, sad, lost, sort of look

(Jewitt, 2002, 2006; Unsworth, 2006; Unsworth et al., 2005). However, the apparently strengthening nexus between literary picture book culture and popular movie culture brings simultaneously an exceptional opportunity for enhancing the engagement of all children with literary narratives and also a responsibility to develop the capacities of children for critical analytic interpretation of such transmedia narratives and the social, cultural, and personal values they privilege and distribute so broadly and powerfully.

A pedagogic advantage of animated movies of picture books like The Lost Thing is that the meaning-making resources of the animated images are also available to children who are using animation software such as Moviestorm or Muvizu (http://www.muvizu.com) in constructing their own films. For example, the minimalist depiction style in representing the characters in The Lost Thing (Tan, 2000) and The Little Prince (De Saint-Exupery, 2000a), which exists as both an interactive CD-ROM and a film (Donen, 2004), means that systems of meaning-making resources for the representation of facial affect (Painter et al., 2012; Welch, 2005) can be taught and deployed by the children using software that makes these simple variations in facial features possible. The changes in camera positioning and shot choice that construct differences in point of view can also be taught and deployed in much of the readily accessible animation software, facilitating the teaching of systems of options for the construction of point of view (Painter, 2007; Painter et al., 2012). The advantage of learning about these systems through close analysis of transmedia narratives is that they frequently provide alternative perspectives on ostensibly the same story situation, so children are able to develop a critical understanding of the interpretive difference that can result from different semiotic choices.
The intersection of popular culture and traditional literary picture book culture is a significant site for critical multimodal analysis that can certainly inform multimodal comprehension and composition pedagogy and, beyond that, intergenerational and intercultural understanding in personal, social, civic, and political contexts.

REFERENCES

Browne, A. (1983). Gorilla. London: Julia MacRae.
Burn, A., & Durran, J. (2006). Digital anatomies: Analysis as production in media education. In D. Buckingham & R. Willett (Eds.), Digital generations: Children, young people and new media (pp. 273–294). Mahwah, NJ: Lawrence Erlbaum.
De Saint-Exupery, A. (2000). The little prince (R. Howard, Trans.). London: Penguin.
Donen, S. (2004). The Little Prince. US and Canada: Paramount Home Video.
Gee, J. (2003). What computer games have to teach us about learning and literacy. New York, NY: Palgrave Macmillan.
Genette, G. (1980). Narrative discourse: An essay in method (J. E. Lewin, Trans.). Ithaca, NY: Cornell University Press.
Hood, S. (2010). Appraising research: Evaluation in academic writing. Basingstoke: Palgrave Macmillan.
Huhn, P., Schmid, W., & Schonert, J. (Eds.). (2009). Point of view, perspective and focalization: Modelling mediation in narrative. Berlin: Walter de Gruyter.
Jenkins, H. (2006). Confronting the challenges of participatory culture: Media education for the 21st century. Occasional Paper, The MacArthur Foundation. Retrieved from b.2029291/k.97E5/Occasional_Papers.htm
Jennings, P. (1987). Quirky tales. Ringwood, Victoria: Puffin.
Jewitt, C. (2002). The move from page to screen: The multimodal reshaping of school English. Visual Communication, 1(2), 171–196.
Jewitt, C. (2006). Technology, literacy and learning: A multimodal approach. London: Routledge.
Jonze, S. (Writer), Hanks, T., Goetzman, G., Sendak, M., Caris, J., & Landay, V. (Producers). (2009). Where the Wild Things Are [Motion picture]. United States: Warner Brothers.
Kellner, D., & Share, J. (2007). Critical media literacy is not an option. Learning Inquiry, 1, 56–69.
Kress, G., & Van Leeuwen, T. (2006). Reading images: A grammar of visual design (2nd ed.). London: Routledge.
Lemke, J. (2006). Towards critical multimedia literacy: Technology, research and politics. In M. McKenna, L. Labbo, R. Kieffer, & D. Reinking (Eds.), International handbook of literacy and technology (Vol. II, pp. 3–14). Mahwah, NJ: Lawrence Erlbaum.
Luce-Kapler, R. (2007). Radical change and wikis: Teaching new literacies. Journal of Adolescent and Adult Literacy, 51(3), 214–223.
Luke, C. (2000). Cyber-schooling and technological change. In B. Cope & M. Kalantzis (Eds.), Multiliteracies: Literacy learning and the design of social futures (pp. 69–91). Melbourne: Macmillan.
Mackey, M. (1994). The new basics: Learning to read in a multimedia world. English in Education, 28(1), 9–19.
Painter, C. (2007). Children’s picture book narratives: Reading sequences of images. In A. McCabe, M. O’Donnell, & R. Whittaker (Eds.), Advances in language and education (pp. 40–59). London: Continuum.
Painter, C., Martin, J. R., & Unsworth, L. (2012). Reading visual narratives: Interimage analysis of children’s picture books. London: Equinox.
Ruhemann, A., & Tan, S. (Directors). (2010). The Lost Thing [DVD/PAL]. Australia: Madman Entertainment.
Sendak, M. (1962). Where the wild things are. London: The Bodley Head.
Tan, S. (2000). The lost thing. Sydney: Hachette.
Unsworth, L. (2006). e-Literature for children: Enhancing digital literacy learning. London: Routledge/Falmer.
Unsworth, L., Thomas, A., Simpson, A., & Asha, J. (2005). Children’s literature and computer based teaching. London: McGraw-Hill/Open University Press.
Welch, A. (2005). The illustration of facial affect in children’s literature. (Special study report). Department of Linguistics, University of Sydney, Australia.

13 Points of Difference: Intermodal Complementarity and Social Critical Literacy in Children’s Multimodal Texts

Angela Thomas

1. INTRODUCTION

In our recent history, educators have seen a significant shift in the kinds of texts students engage with on a daily basis in and out of school contexts (Thomas, 2007). As curricula such as the new Australian English Curriculum (ACARA, 2010) have changed to acknowledge these shifts, which stem from societal and technological changes, teachers are now required to work with new kinds of texts in classroom contexts. Increasingly, these new kinds of texts are multimodal and include the presence of more than one semiotic mode: alphabetic print, visual, audio, tactile, gestural, and/or spatial representations (Cope & Kalantzis, 2009). Young people who are growing up immersed in multimodal texts need to be taught explicitly how such texts create meaning, so that they may become active and critical consumers and creators themselves, and be well prepared to engage successfully in society into their futures.

Of particular interest in this chapter are two features of children’s participation in authoring new kinds of multimodal texts. The first is exploring the notion of intermodality (Painter & Martin, 2011), that is, the ways semiotic modes work together to make meaning, and to what degree intermodality can be understood and deployed meaningfully by students. The second is social critical literacy (Luke, 2000), which is concerned with teaching students explicit knowledge of how particular intellectual and political power operates within texts and discourses (see also Christie, 1990). In this respect, the chapter explores the potential for students to manipulate texts for their own active ‘position-takings’ (Bourdieu, 1998), for example, the ways that they are able to assert agency within their text production, create resistant or alternative texts, and challenge conceptions about identity, discourse, and society.

2. INTERMODALITY

To explore the concept of intermodality, I draw on systemic functional descriptions of language (Halliday & Matthiessen, 2004) and social semiotic descriptions of the ways images and sounds construct meaning (Kress & Van Leeuwen, 2006;

Van Leeuwen, 1999). According to Unsworth (2001, 2006, 2008), a systemic functional semiotic theory has much to offer teachers in supporting their work with multimodal texts as it provides teachers with an explicit meta-language to discuss how texts work to make meaning. Moreover, this approach provides a detailed account of how the semiotic resources within a multimodal text work together to make meaning, that is, intermodality.

In systemic functional linguistics, language is modelled as a social resource for making meaning, or a social semiotic system, organized into three broad functions that language has evolved to serve, called metafunctions: ideational (how language expresses ideas); interpersonal (how language is used to interact with others); and textual (how language is used to construct cohesive and coherent texts) (Halliday, 1985). Ideational meanings are concerned primarily with the topic or field of the text: the who, what, where, why, when, and how of the text. Interpersonal meanings relate to the tenor or the nature of the relationship between the writer/speaker and the reader/viewer, for example, the way in which language works to construct relative power or emotional positionings between those involved in the communication. Textual meanings relate to the mode of communication and how a text is organized in that mode. These three metafunctions, whilst originally designed to describe language and verbal meaning-making, provide a useful starting point for describing the meaning-making potential and use of all semiotic modes, and are therefore a very powerful resource for teachers and students working with multimodal texts.
Adapting key principles of systemic functional linguistics to the interaction between modes in picture books, Painter and Martin (2011) have outlined an approach to exploring the intermodality between image and text that they term intermodal complementarity and define as “the degree to which each [modality] commits meaning in a particular instance and the extent to which – for each metafunction – that commitment converges with or diverges from that of the other modality” (p. 132). Painter and Martin (2011) elaborate in detail how intermodal complementarity is realized across visual and verbal modalities and demonstrate how meanings in the image and text might commit to the same meaning, which they term convergence, or how they might be oppositional in meaning, which they term divergence. They explain that by examining the meaning potentials of each metafunction, we can understand, for example, that the meaning of sadness might be achieved through both an illustration drawn in blue tones and the selection of certain interpersonal word choices such as the word ‘sad’. When the same meanings in a metafunction (e.g., the interpersonal metafunction, which conveys feelings such as sadness) are present in both image and words, then Painter and Martin describe their interaction as convergent. If the visual and verbal meanings are different, then the intermodal complementarity is divergent.

To provide an example of how this works, we can explore the children’s picture book Michael Rosen’s Sad Book (Rosen & Blake, 2008). This is a poignant true story dealing with the death of the author’s son, Eddie.

Michael Rosen’s sparsely written tale describes his grief at the loss of his son, whilst Quentin Blake’s haunting illustrations reflect both the sadness in the present and the joys of the past between father and child. On the opening page of the book, the text begins: “This is me being sad”. Yet the image is a drawing of Michael Rosen with a large smile on his face, dressed in bright colours, on a sunny yellow background. In terms of the interpersonal meaning of affect (feelings and emotions), the visual facial expression of a smile is positive, yet contrary to this, the word expressing emotion, ‘sad’, is negative. So in this instance, the intermodal divergence constructs an incredibly powerful moment of poignancy, particularly when coupled with the following two sentences, which read: “Maybe you think I am being happy in this picture. Really I am being sad but pretending I’m being happy”. This is then followed by the next page in which the image is a shadowy ink sketch of a small hunched man on a large grey and black wash backdrop of clouds, and the words “Sometimes sad is very big. It’s everywhere. All over me”. On this page the visual and verbal modes converge. The same meanings of affect are reflected through both the bodily stance of the figure and the word ‘sad’. They are also reflected in the choice of a cold colour tone, and scaled up in the ‘tone’ of the repeated circumstances in the words “very big”, “everywhere”, and “all over me”. Both of these pages reflect the grief experienced by the author but the very first page, in which there is an intermodal divergence, sets up a significant affectual, or emotional, impact that serves to shock the reader on first reading.

Intermodal divergence creates one form of what I term a point of difference, which appears to create a moment in a text that has a marked affectual impact, whether it be, as in the example just given, of pathos, or perhaps of humour or irony.
Evoking an emotional response is one means through which authors may enable readers to relate to a narrative’s key themes and issues, in order to lead them to both understand and make sense of the world. So an understanding of how words, images, and other semiotic modes, as well as the interaction between modes, construct emotional meanings is significant, as is an understanding of how authors create ‘points of difference’ through intermodal divergence and so create moments of extreme emotion.

In systemic functional linguistics, interpersonal meanings relating to emotions are described within the umbrella term attitude (see also Economou, this volume). Attitude can be created through the use of some specific tools. In words, the specific attitudinal tool relating to emotion is termed affect. Affect refers to those words which express happiness, security, satisfaction, or the opposite of these and any emotions between. Additionally, affect can be created through more implicit means, such as the repetition of certain words or phrases or the use of figurative language such as metaphors and similes (Martin & White, 2005).

In images, attitude can be created through a range of tools, two of which are affiliation and ambience (Painter, 2008). Affiliation relates to the way in which a character makes contact or not with the viewer. For example, an image in which a character gazes directly at the viewer with a

happy expression constructs a warm and inviting relationship with the viewer, while an image where the character hangs his or her head and makes no eye contact constructs a sense of distance and removal and a potential meaning of sadness. Ambience refers to the choices of colour in an image. For example, an image containing a predominance of cool blue tones can reflect sadness, while an image containing a predominance of vibrant red tones might reflect happiness and excitement.

In music, attitude can be created through tone (Van Leeuwen, 1999). For example, a melody that uses a major key might signify happiness, while a melody in a minor key might signify sadness.

When working with multimodal texts to explore instances of intermodal complementarity that create the points of difference mentioned above, these interpersonal semiotic tools of attitude (affect, affiliation, ambience, and tone) are significant. Moreover, the ways in which intermodal complementarity is manipulated throughout the narrative can work to build and enhance the emotional impact. Thus, good writers can take readers and viewers of any literary text on an emotional journey of highs and lows. In multimodal texts, this is achieved through the ways in which intermodality is deployed within broader techniques the author uses to construct the narrative. One of these narrative techniques is parallelism.

Parallelism is a technique used by authors, filmmakers, musicians, and storytellers of all kinds for purposes such as creating a cohesive text, emphasizing a certain point, creating a rhythm to the narrative, and building to emotional climaxes. Simply put, parallelism is a repetition of a similar structure (grammatical, visual, etc.) in two or more different places within a text.
For example, a movie will have a soundtrack that uses leitmotifs or recurring themes to signal to the viewer that a particular character or object is present, such as the use of the main shark theme in the film Jaws (Spielberg, 1975).

Intermodality can occur within instances of parallelism. For example, one use of parallel intermodal divergence is found in Anthony Browne’s (2008) picture book Gorilla. This story tells the tale of young girl Hannah and her father. From Hannah’s point of view, her father doesn’t have enough time for her. Obsessed with gorillas, Hannah is delighted one evening when her toy gorilla becomes real and whisks her off on a wonderful adventure. One example of parallelism with a point of difference is when, on an early page in the text, Hannah and her father are both sitting at the table eating together, and on a later page, Hannah and the gorilla are sitting at the table eating together. Here the circumstantiation is identical (table, food, eating) but there are a number of stark contrasts between the two. In the first image, the ambience is ‘removed’. According to Painter (2008, p. 92), ambience can be described as ‘removed’ if the image is infused with colour of low differentiation. In the first image, the colours are black, blue, and white, with the exception only of Hannah, who is in bright red. It is sterile, cold, and stark, and has the purpose of representing Hannah’s sense of distance from her father. In contrast, the second image is predominantly coloured in vibrant, red tones, and reflects a high degree of warmth and familiarity (Painter, 2008). According to Painter (2008), this allows the viewer to feel more intimately connected with (as opposed to distanced from) the image, and in this case serves the purpose of representing Hannah’s sense of closeness to the gorilla. In this example, the parallel intermodal divergence positions the reader to understand changes in Hannah’s emotional state throughout the trajectory of the story.

3.  SOCIAL CRITICAL LITERACY

Social critical literacy draws from the early work of Freire (1972, 1995), who articulated the need for allowing people access to literacy practices to reposition themselves within society. Similarly, educators have emphasized the need to provide the disadvantaged with explicit knowledge of how particular intellectual and political power operated within texts and discourses (e.g. Christie, 1990; Martin & Rose, 2008). In previous work (Thomas, 2005, 2007, 2011) I researched the social and discursive practices that young people engage in within online communities and that offer them opportunities to become active citizens in those communities. I have been particularly interested in the kinds of attitudes and thinking about texts displayed by these students, and in how they are able to manipulate texts for their own active ‘position-takings’ (Bourdieu, 1998), that is, the ways that they are able to assert agency within their text production, make resistant or alternative texts, and challenge conceptions about identity, discourse, and society. I have invited them to teach me what they do, how they do it, and why they do it to better understand how I can contribute to educational reform that genuinely reflects critical social literacies.

One critical social literacy practice within popular culture relevant for this paper is that of the remix. As I have noted previously (Thomas, 2011), remix is the term given to appropriating content from existing stories and reorganizing that content to create new texts.
Often the purpose for remixing is for humour, parody, or to point out injustices or questionable values. As Jenkins (2006) noted:

More and more literacy experts are recognizing that enacting, reciting, and appropriating elements from preexisting stories is a valuable and organic part of the process by which children develop cultural literacy. Parents should instead think about their kids’ appropriations as a kind of apprenticeship. They learn by remixing. Indeed, they learn more about the form of expression they remix than if they simply made that expression directly. (p. 177)

Remixes are becoming a popular way for young people to engage in a new kind of social activism. By subverting traditional stories, they are able to engage in political commentary that has the potential to be very empowering at a grassroots level.

An excellent example of this is a remix of scenes from the popular movie franchise Twilight with the TV series Buffy the Vampire Slayer. Buffy features a strong female protagonist who exemplifies the identity of the ‘girl power’ movement in pop culture and literature that became mainstream in the last decades of the twentieth century. For many, the discourses surrounding gender in the newer Twilight saga have seemed a return to traditional notions of femininity and masculinity, with its heroes playing out stereotyped roles of desire and storylines of females being protected or rescued by males. Not content with this, young activist Jonathan McIntosh created a remix, which, in his own words, was a form of resistance to those discourses:

In this re-imagined narrative, Edward Cullen from the Twilight Series meets Buffy the Vampire Slayer. It’s an example of transformative storytelling serving as a pro-feminist visual critique of Edward’s character and generally creepy behavior. Seen through Buffy’s eyes, some of the more sexist gender roles and patriarchal Hollywood themes embedded in the Twilight saga are exposed.

Ultimately this remix is about more than a decisive showdown between the slayer and the sparkly vampire. It also doubles as a metaphor for the ongoing battle between two opposing visions of gender roles in the 21st century. (McIntosh, 2009)

This is one of many instances whereby youth are inserting themselves into the stories of pop culture, and where they are unable to find their place or voice, they recreate the stories in ways that allow them to become the heroes themselves. Such reinventions cause us to stop and reconsider identity, society, culture, and the tensions young people face on a day-to-day basis as they are bombarded with images and expectations that serve to alienate them. Lessig (2008) comments:

There are two goods that remix creates, at least for us, or for our kids, at least now. One is the good of community. The other is education. (pp. 76–77)

An elaboration of Lessig’s good of “education”, I have proposed (Thomas, 2011), is the good of social change.
Remixes are not just quirky, or fun, or a video that might go viral, but rather they have the power to change perspectives, thoughts, and behaviours. The potential of remixes for classroom contexts is what we hoped to tap into throughout the research project described next.

4. REMIX, INTERMODALITY, AND POINTS OF DIFFERENCE IN CHILDREN’S ANIMATED NARRATIVES

This research project¹ incorporated schools from across Australia, in both city and rural areas. Using a 3D animation software program called Kahootz (The Australian Children’s Television Foundation, 2008–2011), teachers were trained in visual literacy, film literacy, and the technological requirements of Kahootz to create machinima (movies using gaming platforms) in 3D worlds (cf. Thomas, 2008). Teachers then worked with their various classes across a range of contexts with a semi-structured plan of work, allowing students time to explore, collaborate, and experiment with their own machinima. The pedagogy provided a range of open-ended tasks that included opportunities to remix traditional tales, and that also allowed students to analyze and evaluate optimal ways to tell multimodal stories. Furthermore, it was structured to provide both meaningful contexts for play and explicit teaching of multimodal design.

The focus on multimodal design was to ensure students gained a critical media literacy (Kellner & Share, 2007) enabling them to create richly layered texts using linguistic, visual, gestural, and aural resources to make meanings. The explicit teaching of both a metalanguage of different semiotics and of the ways a story can be told through the texturing of these semiotics facilitated students’ strategic and aesthetic constructions of multimedia texts. The need for explicit teaching of multimodal design has been emphasized in studies of middle school students’ use of animation and digital video (Burn & Durran, 2006; Burn & Leach, 2004; Burn & Parker, 2003). This work showed that when such design was taught, students made very sophisticated commentaries on their reformulated movie texts (Burn & Durran, 2006). The students’ creative transformations of the uses of software facilitated their development of multimodal design knowledge in an enjoyable manner (Burn & Durran, 2006).

The multimodal texts created by children were 3D animated movies, created with the software tool, Kahootz. These texts can be considered trimodal, as they use three semiotic modes: words/verbiage, image, and sound.
This analysis is designed to address the following three research questions:

• To what degree are significant affectual moments realized semiotically and through intermodal complementarity in children’s multimodal writing?
• What patterns of intermodal complementarity and parallel configurations of intermodal complementarity exist in significant affectual moments within children’s multimodal texts?
• How do these moments reflect critical social literacy?

Due to the complexity of analysis, I have selected one child’s animation and related interview data to showcase in detail. This animation was selected because it exemplified the features of intermodal complementarity and parallel intermodal complementarity, as well as demonstrating the potential for this kind of work for social critical literacy. The child identified herself with the pseudonym Mikey, and was in year 6 (11 years old) at the time. The text she created was a reversioning of the traditional nursery rhyme, “Little Miss Muffet”:

Little Miss Muffet
Sat on a tuffet
Eating her curds and whey.
Along came a spider,
Who sat down beside her,
And frightened Miss Muffet away.

The sequence of shots in Table 13.1 shows the first scenes of Mikey’s story. The moment of interest in this sequence is when we see the spider with his cartoonish love-heart eyes, ballooning out of his head when he spots Miss Muffet and falls in love with her. Additionally, purple and red love hearts explode around him. We hear two kinds of sounds: first the spoken expression “ooh la la” and then a musical line that is a romantically themed selection possible within the program. The musical line has sound qualities that might be characterized as bright, soft, and round, using harps, bells, violins, and a major arpeggio. In assigning a semiotic description of the music, I adapt Van Leeuwen’s (1999) theorization of how music conveys emotion: through the use of melody configuration, dynamics, and rhythm. Melodically, the music selected by Mikey is undulating within a narrow pitch range, medium tempo, and soft timbre. Using Van Leeuwen’s descriptions of melody types, this matches his description of tenderness, a perfect selection for a romantic genre.

In this moment, Mikey creates a high intermodal convergence within the interpersonal metafunction, a convergence of affect across all three modes, as demonstrated in Table 13.2. Here, all three semiotic resources and the affordances of each (image, shot type, animation, words, spoken qualities of words, melody, sound qualities of melody) converge to create maximum intersemiotic resonance interpersonally. In discussing her choices, Mikey stated:

[About the verbiage] Ooh la la is a common expression people use for love . . . well, if somebody wants to make fun of something . . . like if you want to make fun . . . say your sister is going and falling in love with somebody and you don’t know them, you go ooh la la and kind of tease them like cheesy, or cheeky kind of. . . .

[About the image] To make the audience involved . . . maybe the eyes with the spider, like to show them that he is definitely in love . . . I deliberately did that (front on view, eyes staring out) . . . yeah, I kind of learnt it from the unit and somewhere else, kind of a bit of both . . . I saw it in Doctor Who . . . with the werewolf . . . he has like black eyes, like he’s a boy and he has completely black eyes and he’s right on you and he makes your blood run cold and stuff . . . they do heaps of close ups with the eyes in Doctor Who, like red eyes . . . and so I used that here but with the romance to make the audience involved. . . .

[About the music] Because it kind of sounded like romantic kind of, like he was in love, and he felt like he was a butterfly or something

Table 13.1  Shot sequence of the first scenes in Mikey’s retelling of Little Miss Muffet

Image: [shots not reproduced]
Verbiage: Little Miss Muffet / Sat on a tuffet / Eating her curds and whey. / Along came a spider, / “Ooh la la”
Sound: Soft romantic melody

Table 13.2  Moment of high intermodal convergence in Mikey’s Miss Muffet retelling

Interpersonal meanings by mode (image not reproduced)

Affiliation
  Image: Gaze of character directed at reader/viewer

Feeling (affect)
  Image: Visual affect: love heart eyes invoking a romantic facial expression
  Verbiage: “Ooh la la” lexis expressed with emotion
  Sound: Tender musical melody

Feeling (ambience/tone)
  Image: Ambience is vibrant and familiar
  Verbiage: Verbiage expressed with an intimate, romantic tone

Mikey’s justifications for her choices across semiotic resources reveal a strong sense of audience and an understanding of how to communicate meanings multimodally. It is interesting to note that later in the interview, she stated:

I absolutely wanted to inform the audience that this was a romance and the spider was in love

Her use of the intensifier absolutely here reflects the extent to which she consciously and deliberately imbued meanings across all three modalities, and explains why all three have a high degree of convergence to the attitudinal meanings of the text.

Mikey’s next scene (illustrated in Table 13.3) reveals the spider crawling up onto Miss Muffet and giving her a long, loud kiss on the lips. Miss Muffet, horrified, jumps up and shoves the spider off her. The spider lurches backwards onto the ground, and Miss Muffet exclaims, “Gross!” This has the effect of first creating a sense of horror or distaste in the viewer as the spider crawls all over Miss Muffet, and then comedic relief as she shoves the poor creature off her. All meanings in the previous shot demonstrated high commitment and modal convergence related to positive affect: romance and love. Here, the ambience remains the same; however, Miss Muffet’s character and the attitudinal meanings represented in her characterization at this point diverge from the positive, and into the extreme negative, both with the gestural attitude (the shove, head turned to one side as if in disgust) and the verbal attitude (the expression “gross”, a form of indirect negative social sanction expressed toward the spider and his unwanted attentions). Miss Muffet then reflects a strong convergence of two modes (animated gesture and verbal expression) set up against the still-positive romantic ambience of the setting.

This shift from convergent to divergent intermodal complementarity and the combination of both together in the one shot serves to create the humour in the text, and is summarized in Table 13.4. When discussing her choices for making the spider crawl up onto Miss Muffet’s lap and kiss her, Mikey commented:

Just because it was kind of funny and it would get the audience engaged . . . especially on this bit, because you can just feel the spider kind of creeping up on you and people just have shivers sometimes, yeah all of my friends have shivers. . . .

The final shot in Mikey’s retelling shows a close-up of the spider with tears falling from its eyes, accompanied by the sound of a sad melodic refrain. This final shot, when positioned with the one earlier in the sequence, as demonstrated in Figure 13.1, represents a perfect example of a point of difference: not just an example of parallelism, but parallel intermodal complementarity.

Table 13.3  Shot sequence of next scene in Mikey’s retelling of Miss Muffet

Image: [shots not reproduced]
Sound: Sound effect of a long kiss


Table 13.4  Moment of both intermodal convergence and divergence in Mikey’s Miss Muffet retelling

Interpersonal meanings by mode (image not reproduced)

Feeling (judgement)
  Image: Depicted affect / action invokes judgement: shove and head turned to one side
  Verbiage: ‘Gross!’, lexis expressed with disgust

Feeling (ambience/tone)
  Image: Ambience vibrant and familiar
  Verbiage: Verbiage expressed with an intimate, disgusted tone

Figure 13.1  Example of parallel intermodal complementarity in Mikey’s Miss Muffet retelling

When we look at the two shots side by side, the ideational meanings (action, character, circumstantiation) and compositional meanings (prominence) of the two shots are identical, but the point of difference is both the represented affectual meanings (changed facial features) and the tender versus ironic melodic sound effect. The multimodal parallelism creates a strong narrative arc, but the inter-shot divergence of affectual meanings creates a point of difference that constructs the successful humour of the retelling. This humour is a result of the contradictions between the original nursery rhyme in which the spider is cast as the antagonist of the tale, and Mikey’s transformed animation in which the spider is cast as sympathetic and the victim of the tale, to be empathized with and aligned with by the reader.

It seems to me that the interplay of intermodal complementarities within a single shot, within parallel shots, and even between texts (such as an original tale and a transformation of that tale as in the Miss Muffet example) may construct significant affectual meanings, and they may also be particularly revealing of incoherences, ambiguities, contradictions, ironies, and omissions, all defining elements of narrative (Chandler, 2001).

As a first step for teachers working with younger children, the kind of work done by teachers within this research project serves a social agenda of permitting and empowering children to create alternatives to stereotypes and traditions. In Mikey’s reversioning of Miss Muffet she has, through the use of intersemiotic moments of humour, been able to position the spider as the victim of the tale. Opportunities to create resistant texts are an important means for teachers to embed critical literacy into their classrooms, and doing it through humorous reversionings or remixes of traditional tales is an excellent beginning. In the Buffy/Twilight remix introduced earlier in this chapter, the humour actually enables readers to enter into a very serious reconsideration of the represented discourses of gender within the pop cultural phenomenon of Twilight. If children are encouraged to feel free to subvert traditional stereotypes, they are certainly on their way to understanding and practising critical social literacies that will serve them well for their futures.

5. CONCLUSION

This chapter has examined the ways in which one particular student, through creating significant affectual moments in her text, has also created moments that allow for resistance to traditional literary stereotypes. I have demonstrated using Mikey’s text that a primary school child is easily able to exhibit deep understandings about how to effectively create narratives employing three meaning-making resources: image, verbiage, and sound. In particular, the text analysis reveals that a young child has the capacity to use intermodal complementarity and parallel configurations of intermodal complementarity to create points of difference across semiotic resources, in turn producing powerful moments of humour. These moments are also opportunities for resistance to traditional discourses and ideologies, and demonstrate the critical power of student-created multimodal narratives, and the ways in which teachers are able to embed both multimodal and critical social literacy in their work with children.

If our aim as educators is to further advance students’ capacities for the kinds of innovation, invention, transformation, and boundary pushing so necessary for growth and global citizenship, then we need to provide a meaningful pedagogy that enables such transformations. The pedagogy developed in the project described in this chapter offers teachers one such possibility.

NOTE

1. This chapter draws on data from an Australian Research Council (ARC)–funded Linkage Project “Teaching effective 3D authoring in the middle years: Multimedia grammatical design and multimedia authoring pedagogy”, which is a collaboration between the University of New England, the University of Tasmania, and the Australian Children’s Television Foundation. The aims of this ARC project were to provide an account of children’s innovative, transformative, and critical multimodal stories; and to develop a transformative pedagogy for multimodal authoring with the teaching of explicit multimodal metalanguage. At the heart of this project was the recognition that schools and teachers need to find optimum ways to work with multimodal, digital texts in their classrooms, and indeed to reconceptualize literacy in schools to account for multimodality.

REFERENCES

The Australian Children’s Television Foundation. (2008–2011). Kahootz (Version 3) [Software]. Retrieved July 1, 2009, from home%2COrderKahootz.vm?navitem=public%2Fbuy
Australian Curriculum Assessment and Reporting Authority (ACARA). (2012). English. Retrieved October 1, 2012, from English/Curriculum/F-10
Bourdieu, P. (1998). Practical reason. (R. Johnson, Trans.). Oxford: Polity Press.
Browne, A. (2008). Gorilla. Newtown, NSW: Walker Books.
Burn, A., & Durran, J. (2006). Digital anatomies: Analysis as production in media education. In D. Buckingham & R. Willett (Eds.), Digital generations: Children, young people and new media. Mahwah, NJ: Lawrence Erlbaum.
Burn, A., & Leach, J. (2004). ICT and moving image literacy in English. In R. Andrews (Ed.), The impact of ICT on literacy education (pp. 151–179). London: Routledge Falmer.
Burn, A., & Parker, D. (2003). Tiger’s big plan: Multimodality and the moving image. In C. Jewitt & G. Kress (Eds.), Multimodal literacy (pp. 56–72). New York: Peter Lang.
Chandler, D. (2001). Semiotics: The basics. London: Routledge.
Christie, F. (Ed.). (1990). The future of literacy in a changing world. Melbourne: Australian Council of Education Research.
Cope, B., & Kalantzis, M. (2009). A grammar of multimodality. The International Journal of Learning, 16(4), 361–426.
Freire, P. (1972). Pedagogy of the oppressed. Harmondsworth: Penguin.
Freire, P. (1995). Pedagogy of hope: Reliving pedagogy of the oppressed. New York: Continuum.
Halliday, M. A. K. (1985). An introduction to functional grammar. London: Edward Arnold.
Halliday, M. A. K., & Matthiessen, C. M. I. M. (2004). Introduction to functional grammar (3rd ed.). London: Arnold.
Jenkins, H. (2006). Convergence culture. New York, NY: New York University.
Kellner, D., & Share, J. (2007). Critical media literacy is not an option. Learning Inquiry, 1, 56–69.
Kress, G., & Van Leeuwen, T. (2006). Reading images: The grammar of visual design (2nd ed.). London: Routledge.
Lessig, L. (2008). Remix: Making art and commerce thrive in the hybrid economy. New York, NY: Penguin.
Luke, A. (2000). Critical literacy in Australia. Journal of Adolescent and Adult Literacy, 43(5), 448–461.
Martin, J. R., & Rose, D. (2008). Genre relations: Mapping culture. London: Equinox.
Martin, J. R., & White, P. (2005). The language of evaluation: Appraisal in English. London: Palgrave Macmillan.
McIntosh, J. (2009, June 20). Buffy vs Edward (Twilight remixed). Rebellious Pixels. Retrieved October 1, 2012, from
Painter, C. (2008). The role of colour in children’s picture books: Choices in ambience. In L. Unsworth (Ed.), New literacies and the English curriculum (pp. 89–111). London: Continuum.
Painter, C., & Martin, J. R. (2011). Intermodal complementarity: Modelling affordances across image and verbiage in children’s picture books. In F. Yan (Ed.), Studies in functional linguistics and discourse analysis 3 (pp. 132–158). Beijing: Higher Education Press.
Rosen, M., & Blake, Q. (2008). The sad book. London: Walker.
Spielberg, S. (Director). (1975). Jaws [Motion picture]. United States: Universal Pictures.
Thomas, A. (2005). Fictional blogging and the narrative identity of adolescent girls. Paper presented at Blogtalk Downunder Conference, May 2005, Sydney.
Thomas, A. (2007). Youth online: Identity and literacy in the digital age. New York: Peter Lang.
Thomas, A. (2008). Machinima: Composing 3D multimedia narratives. In L. Unsworth (Ed.), New literacies and the English curriculum: Multimodal perspectives. London: Continuum.
Thomas, A. (2011). Developing a transformative digital literacies pedagogy. Nordic Journal of Digital Literacy, 6(1–2), 89–101.
Unsworth, L. (2001). Teaching multiliteracies across the curriculum: Changing contexts of text and image in classroom practice. Buckingham: Open University Press.
Unsworth, L. (2006). e-literature for children: Enhancing digital literacy learning. London: Routledge/Falmer.
Unsworth, L. (2008). Explicating inter-modal meaning-making in media and literary texts: Towards a metalanguage of image/language relations. In A. Burn & C. Durrant (Eds.), Media teaching: Language, audience, production (pp. 48–80). Kent Town, South Australia: Wakefield Press and AATE-NATE.
Van Leeuwen, T. (1999). Speech, music, sound. London: Macmillan.

14 Bullet Points, New Writing, and the Marketization of Public Discourse: A Critical Multimodal Perspective

Emilia Djonov and Theo Van Leeuwen

1. ORIENTATION

A distinguishing feature of contemporary writing, bullet lists are encountered in everyday instructional materials such as recipes, technical manuals, and textbooks, in promotional texts such as print and TV ads, and increasingly not only in corporate but also in academic presentations and even research articles. Their use is also promoted through ubiquitous software such as PowerPoint and Keynote, where bullet lists are the default option for presenting information in the body of a slide.

In this paper we explore bullet points as a case study for understanding and critically evaluating new writing practices. We first consider the semiotic potential of bullet points, with reference to their functions, rules about their use, and actual use in selected examples from a range of different texts, including technical manuals, recipes, advertisements, and slideshow presentations. We then argue that bullet points epitomize new writing practices and the marketization of public discourse. Therefore, a critical multimodal perspective is key to learning how to employ new writing effectively and understanding its role in obscuring and maintaining social divisions. The value of this perspective is illustrated through an analysis of the ways the Australian treasurer’s budget speech for the 2012–2013 financial year is recontextualized in a brochure promoting the government’s economic achievements.

2.  BULLET POINTS

2.1  Definition and Functions

A bullet point is a combination of a graphic symbol (e.g., ●, ◆, ❖, ➣), called bullet, and a verbally presented list item, the point. The bullets serve to improve legibility and visually attract attention to key points, providing the ‘impact’ the bullet metaphor suggests, while at the same time fulfilling an aesthetic and often also symbolic function (e.g., a tick on a to-do list). In software such as PowerPoint and Word, bullets can be designed from clipboard art or other image files, enhancing their potential to represent ideas. For example, in an undergraduate course on children’s literature that one of us taught, a group of students used a picture of bananas as bullets in a presentation about a picture book series by author/illustrator Anthony Browne in which the main character, Willy, is a chimpanzee.

The purpose of a bullet list is to present an unordered series of items (e.g., several precautions to be taken in using a piece of equipment; the ingredients in a recipe) that may or may not be comprehensive (e.g., “tips for using baking soda”). In this respect, bullet lists differ from lists with numbers or letters, which suggest chronological or other priority amongst items, and are used, for instance, when items need to be referred to individually or when the number of items needs to be emphasized (e.g., “Six Ways Our Strong Economy Is Working for Everyday People”). Notably, a bullet list does convey a sense of order due to the alignment and visual similarity of the items that comprise it and the high expectation that list items would be mutually exclusive; it thus contrasts with the more chaotic, “pin-board” presentation of items in some sales catalogues, which functions to invoke a sense of abundance and excitement.

This broad function renders the bullet list a “primary” (Bakhtin, 1986) or “elementary” visual-verbal genre (Lemke, 1998), which, like the table, can be employed across a variety of “secondary” genres (e.g., recipes, advertisements, instructions) and “discourse practices” (Van Leeuwen, 2008a) such as slideshow presentations, government reports, and textbooks.

2.2  Rules about Using Bullet Points

The ubiquity of bullet lists in contemporary writing is perhaps the reason why rules about their use abound in blogs and other websites, seminars on improving presentation skills, and style guides. These rules reflect the fact that bullet lists straddle the boundary between visual design and written language, and that they can be either written to be read alone or written to be accompanied and elaborated on by speech and gesture, for example, in slideshow-supported presentations.

Typographic manuals suggest that nonalphabetic elements such as bullets have to be “in tune with the basic font” (Bringhurst, 2004, p. 75) and draw attention to visual design features such as alignment, symmetry, spacing, and indenting.¹ Style manuals for writers and editors (e.g., Commonwealth of Australia, 2002; The University of Chicago, 2010) consider bullet points a form of punctuation and distinguish between:

(i) lists that present a series of items that are discursively integrated into the main text of a document so that the series follows, and usually completes, a lead-in sentence, and
(ii) stand-alone lists, in which a series follows a simple heading, as is frequently the case in brochures and technical documentation.

A bullet list can present several hierarchically-related series. Each subdivision is then marked by further indentation and, optionally, a different bullet symbol, and contains at least two items. Items within such series should have “consistent, parallel formats” (i.e., have the same grammatical structure and follow the same punctuation or capitalization rules), and “each dot point within [a series] should flow logically and grammatically from the introductory, or ‘lead-in’, material” (Commonwealth of Australia, 2002, p. 141), and may consist of one or more words, of sentence fragments, of complete sentences, or even of whole paragraphs.

Style manuals also warn against overusing bullet lists, as this entails the risk of failing to convey a hierarchy of information and obscuring the logical connections within it. As we shall illustrate in the following section, the risk of overuse is particularly evident in slideshow presentations, where bullet points are the default option for presenting information in the body of a slide, appearing automatically when one starts typing in this area.

In addition to numerous rules of thumb, bullet points (like the software PowerPoint of which they are a key staple) are subject to strongly polarized views. Some see them as an ideal means for getting straight to the heart of the matter and suggest that they should be kept short, presenting only the essential ideas or keywords, and comply with the ‘6 × 6 rule’ (no more than six bullet points of no more than six words per slide). As early as 1956, A. F. ‘Korky’ Kaulakis, the then manager of employee relations at Exxon Research and Engineering, recommended bullet points (known as “Korky dots” at Exxon) as a solution for highlighting key ideas and overcoming “poor organization” and “unattractive graphic appearance” in the company’s technical reports and manuals (Exxon, 1982, p. 19).
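The ‘6 × 6 rule’ mentioned above is mechanical enough to be expressed as a simple check. The following is only an illustrative sketch in Python, not part of any presentation software: the function name `six_by_six_violations`, its parameters, and the sample slide are hypothetical, and counting words by splitting on whitespace is an assumption about what counts as a ‘word’.

```python
def six_by_six_violations(bullets, max_bullets=6, max_words=6):
    """Return a list of ways one slide's bullet points break the '6 x 6 rule'.

    `bullets` is a list of strings, one per bullet point (hypothetical input format).
    Words are counted by splitting on whitespace, which is an assumption.
    """
    problems = []
    # Rule part 1: no more than six bullet points per slide.
    if len(bullets) > max_bullets:
        problems.append(f"{len(bullets)} bullet points (limit {max_bullets})")
    # Rule part 2: no more than six words per bullet point.
    for index, text in enumerate(bullets, start=1):
        word_count = len(text.split())
        if word_count > max_words:
            problems.append(f"bullet {index} has {word_count} words (limit {max_words})")
    return problems


# Hypothetical slide used only to illustrate the check.
slide = [
    "Keep points short",
    "Limit each slide to six bullet points at most",
]
for problem in six_by_six_violations(slide):
    print(problem)
```

A check of this kind captures only the formal side of the rule; it says nothing about whether condensing a text to fit it foregrounds or omits the right information.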
In stark contrast, the information design guru Edward Tufte has argued that the “pitch-style typography of PP [PowerPoint] is hopeless for science and engineering” (Tufte, 2003, p. 11). To support this argument, he discusses three slide presentations that Boeing engineers prepared in January 2003 to help NASA assess the risk that debris, which had hit the Columbia shuttle just after take-off, presented for the shuttle’s return:

These reports provided mixed readings of the threat to the Columbia; the lower-level bullets often mentioned doubts and uncertainties, but the highlighted executive summaries and big-bullet conclusions were quite optimistic. Convinced that the reports indicated no problem rather than uncertain knowledge, high-level NASA officials decided that the Columbia was safe and, furthermore, that no additional investigations were necessary. (Tufte, 2003, p. 9)

Some prescriptive literature on slideshow design advises against the use of bullet points (e.g., Atkinson, 2007 [2005]; Bozarth, 2008), for instance on the grounds that their use could damage a company’s brand (Mitchell, n.d.).

Bullet Points, New Writing, and the Marketization of Public Discourse  235

In sum, bullet points are a ubiquitous feature of contemporary writing, partially because of the ease with which they can be created with non-specialist software such as Word and PowerPoint. They can be viewed as a visual-verbal mini-genre that functions to present an unordered series in which each item is emphasized by a graphic symbol, at the same or similar level of abstraction to the other items, and aligned with and visually similar to them. Discursively they can be integrated in stand-alone writing (e.g., technical manuals) or in writing designed to be spoken to, or accompanied by speech and gestures (e.g., presentation slideshows). Regardless of which rules about bullet points are followed and to what extent, creating a bullet list, as we discuss in Section 4, always involves itemizing and more often than not condensing information, so that some points are foregrounded while others are backgrounded or omitted.

3. NEW WRITING AND THE MARKETIZATION OF PUBLIC DISCOURSE

Bullet points epitomize two dominant trends in today’s semiotic landscape, which, taken together, warrant a critical multimodal perspective. The first is what Kress (2003) terms “writing in the age of the screen” and Van Leeuwen (2006, 2008b, 2010) calls “the new writing”. The second is “the marketization of public discourse” (Fairclough, 1993), which plays an important role in obscuring, naturalizing, and/or perpetuating power relations in consumerist society.

New writing differs from ‘old writing’ in two main aspects. First, it is governed by the logic of space, which is typically associated with images, and consequently blurs the boundary between language and image. In new writing, ideas can be represented through words and/or images, but the cohesive and coherent organization of these ideas depends less on verbal syntax and rhetorical organization and more, and sometimes exclusively, on visual design resources such as alignment and colour coding.
The second difference is that new writing is more tacitly regulated, not through style manuals and explicit teaching, but through rules built into semiotic technologies such as office software, where one’s spelling can be automatically corrected and bullet lists can be automatically aligned, have their first word capitalized, and so on (for an exploration of these rules in PowerPoint’s design and use, see Djonov & Van Leeuwen, 2012).

This change involves “gains” as well as “losses” (Kress, 2005) for communication practices as well as multimodality research. The logical relation between bullet list items aligned with each other, for example, is one where each presents a new point—a relationship defined as “extension” in Halliday’s system for classifying logico-semantic relations in language (Halliday, 1994; Halliday & Matthiessen, 2004)—and each item is subordinate to, or in Halliday’s terms “elaborates”, the same superordinate point, which may or may not be presented through a heading. Although a bullet list’s visual structure allows these relations to be perceived at a glance, it cannot reveal other relations that may also exist within information presented in a list (see further Section 4.2). Loss of clarity is also associated with presenting information in a table (see examples in Van Leeuwen, 2006, 2008b, 2010). Such observations suggest that spatial logic disconnects writing from speech and makes new writing difficult, if not impossible, to read aloud, while at the same time making it more interactive, demanding that readers interpret the presented information depending on their own current knowledge and needs, and more dialogic (Van Leeuwen, 2006, 2008b), as presenters, for example, clarify the relationships among bullet list items in a slideshow for their audience.

Exploring new writing practices may also motivate a critical reexamination of existing models of verbal, nonverbal, and multimodal semiosis. To illustrate, on its own the conjunction and, like a bullet list, makes explicit the presence but often not the meaning of a logical, or conjunctive, relation between two stretches of text. The explicitness of these conjunctive relations should therefore be understood not as a binary option where relations marked by a conjunction are explicit and those that are not are implicit (in the way explicitness is modelled in Halliday & Matthiessen, 2004), but as a scale, for nonverbal, verbal, and multimodal semantic conjunctive relations alike (Djonov, 2005).

New writing also highlights the fluid boundaries between image and writing.² Knox (2009) argues that the thumbnails in online newspapers have shifted away from the full meaning potential of images and visual-verbal relations, and function more as extended graphology, similarly to punctuation in written language and the initial capital in illuminated manuscripts (see also Matthiessen, 2007; Thibault, 2007). The use of images rather than graphic symbols in bullet lists, by contrast, represents a shift in the opposite direction, from writing to image.
Finally, new writing reinforces the need to develop models for critically exploring the interplay between semiotic technology and semiotic practice (Zhao, Djonov, & Van Leeuwen, in press) and the norms that regulate it (Djonov & Van Leeuwen, 2012). The default status of bullet points as a means of presenting information in the body of a slide, for instance, can explain both their dominance in slideshows and cases where they create confusion, for example where the relationships between list items or between a list and other information on the same slide cannot be disambiguated without reference to the speech and/or gestures that accompany the slide (Djonov & Van Leeuwen, in press; Zhao et al., in press).

A survey of 27 presentation slideshows (from 10 corporate presentations and 17 undergraduate university lectures) comprising 469 slides with text in the body of the slide showed that 56 percent of those slides used bullet lists to present all or some of that text (Djonov & Van Leeuwen, in press). It also revealed many instances where bullet lists obscure the semantic relations within the information presented on a slide, due to their visual harmony with other layout features. In one such instance, from a cultural studies lecture on the semiotics of touch, visually similar bullet points present content belonging to different levels of abstraction, as the idea that touch is a universal aspect of embodiment is followed by examples of how touch is viewed in two different cultures:

• Touch is a fundamental aspect of embodiment—defining embodied being
• Anlo-Ewe people of West Africa describe consciousness via the concept, seselelame—touch or feeling in the body (literally: feel-feel-at-flesh-inside)
• Mind/body split of Western ontology devalues the body: consciousness configures as ‘mind’, of the mind/intellect

Another example, from a cultural studies lecture about the Spanish Civil War, is the slide in Figure 14.1. The first point asks a general question “What is war?”, while the following two represent increasingly specific questions about civil wars, from “What is the difference between a civil war and other types of war?” to “What are modern day examples?”. The map, showing current versus concluded civil wars, relates mainly to the last question—a relation that the slide’s layout alone does not reveal. In short, there is a mismatch between the identical indentation of the bullet points and their semantically different content.

Figure 14.1  A cultural studies lecture slide

In each case, lack of clarity on the slide was made up for by the lecturer’s speech and gestures, which clarified the nature of the semantic relationships between the list items. Yet slideshows are increasingly distributed on their own, even when initially designed for a specific presentation (cf. Farkas, 2006; Yates & Orlikowski, 2007). Thus, bullet points that are written to be elaborated on in a presentation and, more generally, new writing that is dialogic are instead treated as non-dialogical and designed to stand alone.

Two characteristics of new writing in general and bullet points in particular—their tendency to promote itemizing and to condense information and the ambiguity this tendency creates—play a key role in the “marketization of public discourse”, the introduction of promotional discourse into an increasingly wide range of social domains—the home, the school, government, and so forth (Fairclough, 1993, 2001). This reflects and caters to the “attention economy” (Goldhaber, 1997) of the information age, whose main commodity is information and whose most valuable currency is attention.

The new writing is the writing of the information age. It privileges the idea of “information”, of morsels of fact that can stand alone and mean what they mean without needing to be included into a larger web of meaning, a higher level of understanding. (Van Leeuwen, 2006, p. 11)

At the same time, the ambiguity resulting from the shift to new writing obscures the power relations that define this economy—relations between those who produce and those who consume information and knowledge, for example—and so demands a critical perspective. Indeed, bullet points turn discourse into information or even barely packaged data, and create what Leech (1966) called “disjunctive language” and saw as typical of advertising.
Such language is exemplified by the use of isolated nominal groups, prepositional phrases, and nonfinite clauses in advertising (e.g., ‘The eye-catching new Lancer coupe’, ‘For especially sensitive skin’, ‘Meeting the needs of today’s toddler’), which, following Halliday’s (1994) “grammar of little texts”, can be classified as minor clauses and are also common in bullet-point grammar (e.g., “Disappearance of the East-West divide” from a bullet list discussed further below). Open to interpretation, such language enables advertisements to appeal to a wider audience, while grammatically backgrounding the possibility of propositions made in advertisements being challenged. By contrast, this possibility is inherent in major, declarative clauses, where the presence of a complete mood element comprising subject + finite (e.g., The new Lancer coupe + is) presents information as open to negation and argument (e.g., a statement such as “The new Lancer coupe is eye-catching” can be countered with “No, it isn’t”).

With such changes [in writing practices]—which may seem superficial—come others, which change not only the deeper meanings of textual forms but also the structures of ideas, of conceptual arrangements, and of the structures of our knowledge. Such seemingly superficial changes are altering the very channels in which we think. Bullet points are, as their name suggests, bullets of information. They are “fired” at us, abrupt and challenging, not meant to be continuous and coherent, not inviting reflection and consideration, not insinuating themselves into our thinking. They are hard and direct, and not to be argued with. (Kress, 2003, pp. 16–17)

Fairclough’s (2001) analysis of a Green Paper on Welfare Reform by UK’s New Labour Government, for example, suggests that bullet lists contribute to making this document directive and nondialogical, a document that serves the hybrid purpose of educating the public and promoting the government’s initiatives, rather than encouraging consultation between the government and the public. Bullet lists, he argues, are simultaneously “pedagogical devices for presenting information in an easily digestible form” (p. 257) and a promotional, both “reader-friendly” and “reader-directive”, means of “pre-structur[ing] reader expectations” (p. 259). Most importantly, they present not arguments but lists of items, thereby setting up a “non-dialogical divide between those who are making all those assertions and those they are addressed at—those who tell and those who are told, those who know and those [who] don’t” (p. 260). Similar observations are made in Strathern’s (2006) discussion of university mission statements.

Our analyses of slideshow-supported lectures, too, reveal the inability of slideshows alone to present coherent arguments and constructions of knowledge (cf. Martin, 2010; Myers, 2000), and their tendency to obscure the nature of pedagogic relations. In a first-year undergraduate European studies lecture on the division between Eastern and Western Europe, for example, the title of a slide reads “Since 1989 . . . ” and the body presents the following two bullet lists, one below the other, without headings and with space between them:

• Disappearance of the East-West divide?
• End of Eastern Europe?
• Reunited Europe? One Europe?

• No common history
• Fears of an invasion by barbarians
• “Yes, you are European, but only of mixed blood” (Giuliano Amato, Italian PM, 2000).

The slide was accompanied by the following speech:

So history lesson finished, we’re going back to 1989 where we started um and we have established that the term and concept of an Eastern Europe versus a Western Europe has existed throughout history, although with different boundaries and different categories for differentiation. So that’s something I really wanted to emphasize at the beginning of this lecture. Um going back to the opening point: the fall of the Berlin Wall, did it mean the end of Eastern Europe? Has Europe now been reunited under the umbrella of democratic states and market economies? Um different debates um arose as a result of the fall of the wall and um one of them um is um addressed in um one of your additional readings, Snyder’s reading about two parts of Europe not really having a common history. Um another one um is the fear of invasion by barbarians, which still seems to be valid in some parts of Western Europe. That’s one of your mandatory readings. Um and to exemplify the third kind of attitude I wanted to quote the Italian Prime Minister in um the year 2000 talking to um the Hungarian, Hungarian I’m pretty sure, “Yes, you are European, but only of mixed blood”, which directly sends you back to the quote from the 18th century perception of the Eastern European people who are really of mixed blood and they only become European if they undergo a process of um um civilization from the west. Now we come to the. . . .

The speech reveals that the first bullet list presents a set of questions raised by the fall of the Berlin Wall in 1989 and the second three views on these questions, which is a distinguishing feature of the “discussion genre” in history (Martin & Rose, 2008). In the speech, the use of internal conjunctive relations (e.g., “going back to the opening point”) and the emphasis on a main thesis (“that the term and concept of an Eastern Europe versus a Western Europe has existed throughout history”) reflect the overall rhetorical, rather than chronological, organization of the lecture as a type of exposition (Martin & Rose, 2008).³ But these elements are not present on the slide, which instead presents the key points the lecturer has planned to elaborate on in the accompanying speech.
Following Bernstein (1990, 2000 [1996]), a lecture can be viewed as a pedagogic discourse consisting of two registers—an instructional register, concerned with educational content/knowledge, and a regulative register, encompassing the pedagogic relationship and the lecture’s goals, pacing, and sequencing (see further Christie, 2000, 2002). In this case the slide, especially when considered on its own, construes only the instructional register, while the speech also instantiates the regulative register, as is evident from its use of direct address and reference to “your mandatory readings”. The trend for lecture slideshows to present mainly educational knowledge, combined with the increasingly common practice of making them accessible to students as stand-alone documents, can be attributed to the increasingly dominant view of universities today as providers of knowledge with students as customers. As this example suggests, the new writing in lecture slideshows could also obscure the power relations between students and their lecturers, who design the curriculum and decide, for example, which readings to include as mandatory and which as additional, as it promotes a disassociation of power from expertise.

4.  A NATIONAL ECONOMIC REPORT IN BULLET POINTS

Drawing on diverse examples of the use of bullet points, we have so far argued that a critical multimodal approach is key to understanding new writing and its role in the marketization of public discourse. We now present a case study—an analysis of the ways segments from an Australian national budget speech have been recast into a bullet list in a brochure promoting the government’s achievements—in order to demonstrate the potential of this approach to expose the contribution of bullet lists to obscuring and maintaining social divisions.

The brochure “Our Strong Economy”, presented in Figure 14.2, consists of four A4 pages—front and back page and a middle, double-page, section. Authorized by the Australian Federal Labor Government, it was distributed in the electoral area of Parramatta, part of metropolitan Sydney in the state of New South Wales, in May 2012, a fortnight after Treasurer Wayne Swan’s 2012–2013 National Budget Speech, and reinforces key messages from that speech (Commonwealth of Australia, 2012). We should add that the Labor Party lost its Parramatta seat in the 2011 State Elections, even though in 2010 it had won (by 50.58 percent) the federal elections in this traditionally working class area.

Visually, the brochure is presented as ‘a package’. The title on the front page and the area labelled “Australia’s Economic Report Card” on the back page are both tilted and resemble respectively a stamp and a customs form stuck on a parcel. A paper string is depicted as if tied around that parcel and the ‘report card’ as if attached to it by sticky tape. The bottom right corner of the top page is ‘turned up’, with the words “Find out . . .” positioned ‘underneath’ it.
The ‘package’ metaphor is also supported by the middle section, which provides information on six initiatives that are presented as the result of the government distributing to ‘everyday people’ the benefits of the mining boom by sharing revenue from a very controversial mining tax proposed in November 2011 and legislated on March 20, 2012.

Figure 14.2  “Our Strong Economy” brochure: Front, middle and back pages

Our analysis of the brochure focuses on the ‘report card’ on the back of the brochure and is based on Van Leeuwen’s (2008a) framework for critical discourse analysis. At its heart is the idea that there is a distinction between social practices and their representation in texts, or discourses, and that “all discourses recontextualize social practices” (p. vii), which is why the same social practice may be subject to different representations, or may attract “a plurality of discourses” (p. 6).⁴ Van Leeuwen theorizes a social practice as a sequence of physical and/or semiotic activities that comprises the following elements: social actors, their activities, and reactions to these activities or to other elements of the social practice; the location(s) and time(s) of the practice; and the grooming, dress, tools, and materials required for it.

Understanding how discourses recontextualize social practices involves relating discourse to social practice by exploring how representations of the social practice transform it through the use of verbal and/or nonverbal resources. Transformations may involve substitution, deletion, and rearrangement of the elements of a social practice, and/or addition of evaluations, purposes, or legitimations. Van Leeuwen (2008a) demonstrates, for instance, how in a text about the first day of school a nominalization transforms the action of a teacher separating children from their parents by representing it as a phenomenon (‘the separation from families’), deleting the teacher, who is a central actor in this activity, and substituting individual children and parents (e.g., Mary and her mother) with aggregate nouns (‘families’) (pp. 17–18).

In the analysis below, we focus on how the report card selects from and transforms components of the budget speech into a bullet list, that is, how one text, or cohesive and coherent semiotic construction, transforms another. This “textual recontextualization” of one semiotic sequence into another is what Iedema (2003) terms resemiotization: “Resemiotization is about how meaning making shifts from context to context, from practice to practice, or from one stage of a practice to the next” (p. 41).

4.1 Deletion

Deletion is arguably the most striking transformation involved in the condensation of the budget speech into a bullet-list-style report card. The speech is just over 3,300 words in length and organized into six sections. The first, “Strong Economy and Fair Australia”, serves as an orientation both to the budget process, informing the audience that “This Budget is about discipline and restraint but also about priorities”, and to the rest of the speech. It is followed by sections titled “Economic and Fiscal Strength”, “Spreading the Benefits of the Boom”, “Building for the Future”, “Balanced Budget”, and the conclusion “The Fair Go” (quoted below), which summarizes the speech by drawing attention to the key outcomes of that process:

Madam Deputy Speaker, this Labor Government believes the tremendous opportunities of the mining boom should be shared fairly with all Australians. Ours is a country where people who work hard should get fairly rewarded, where there’s an optimism that comes with economic and social mobility. In a global economy marked by anxiety and uncertainty, our nation is a beacon of resilience, stability and success. Not just for the strengthening surpluses we will build years ahead of our peers. Not just for growth rates outpacing the major advanced economies over coming years. But for the resilience of our people, and the value we attach to the fair go. And now, amidst great change, new challenges lie ahead. That’s why this Budget supports workers and parents and helps businesses prosper. It’s why we are boosting super and skills; aged care and dental care; and building an insurance scheme for the most vulnerable. All good Labor policies—with one purpose: To create more wealth, and turn our remarkable economic success into a stronger, fairer community as well.

The report card, presented in Figure 14.3, is 203 words long and presents information extracted mainly from the first two sections of the speech. It not only avoids mention of big business, the dependence of Australia’s economy on Asia, and government debt and other complex issues presented in the middle sections of the speech, but leaves out any acknowledgment of the fact that the budget is “about discipline and restraint” and of the contribution of the people, the mining boom, and specific policies to Australia’s strong economy.

Figure 14.3  “Our Strong Economy” brochure: Report card

The information presented in the brochure is in stark contrast with the opening of the Treasurer’s speech:

The four years of surpluses I announce tonight are a powerful endorsement of the strength of our economy, resilience of our people, and success of our policies. . . . This Budget delivers a surplus this coming year, on time, as promised, and surpluses each year after that, strengthening over time. . . . It does these things for a core Labor purpose: To share the tremendous benefits of the mining boom with more Australians.

In short, the brochure depoliticizes the speech, producing a message about the strong economy, which presents the surplus simply as an achievement or a profit, rather than as a result of reducing expenses and introducing controversial policies. While there are admittedly references to the ‘fair go’ and the ‘benefits of the mining boom’ in the brochure’s middle section, the report card no longer presents these as part and parcel of the government’s budget strategy.

In addition to this selective focus, the bullet list encourages deletion at the level of grammar as evident in the headlines, which make use of “the grammar of little texts” (Halliday, 1994, pp. 392–397), similarly to news headlines and advertising copy. Half of these—“fought off the global financial crisis”, “back in surplus, on time, as promised”, and “more than 750,000 jobs created”—have an incomplete or missing mood element (i.e., subject + finite). The rest—“low unemployment”, “low interest rates”, and “a growing economy”—are nominal groups. In relation to the report card’s heading and its function, the latter may be interpreted as serving the grammatical role of attributes as they represent qualities that characterize Australia’s economy as an implied Carrier (see Halliday & Matthiessen, 2004, pp. 244–245, on attributive: possessive clauses). All six headlines, however, share the quality of objectifying Australia’s economic achievements, grammatically presenting them as facts that cannot be contested, and encourage readers to retrieve any social practice elements that are left out of the headlines from the statements below them.
For example, the agent in “fought off the global financial crisis” is revealed to be “The Federal Labour Government”.

4.2 Substitution

Another transformation of central interest for this paper is the substitution of old with new writing: although the budget speech as a written document relies exclusively on verbal cohesive resources, the report card draws on resources for signalling cohesion visually. In terms of systemic functional theory’s framework for analyzing logico-semantic relations (Halliday & Matthiessen, 2004, pp. 395–440), the uniformity and alignment in the visual presentation of bullets and headlines and the list’s position under the heading “Australia’s Economic Report Card” present the list items as related to each other through extension and to the heading through elaboration, each adding another achievement to exemplify Australia’s economic performance. From the perspective of Kress and Van Leeuwen’s (2006 [1996], pp. 79–87) grammar of visual design, the list is a conceptual representation that can be more delicately defined as an overt taxonomy in which each item functions as subordinate to the explicitly mentioned superordinate in the heading. Because the bullets are in the form of circled ticks, the list immediately conveys the message that Australia has ‘ticked’ six measures of successful economic performance. As each tick precedes rather than follows its corresponding ‘report’ item, the report-card genre is visually resemiotized to serve a more promotional purpose.

A closer reading of the report card, however, suggests that its visual structure obscures the complex relationships among the bullet list items. Focusing on the six headlines, “low unemployment” and “more than 750,000 jobs created” are not two distinct achievements; together with “low interest rates” and “back in surplus, on time, as promised”, they are subpoints, examples, of “a growing economy” and of success in “[fighting] off the global financial crisis”, that is, related to those two headings through elaboration, rather than extension. Put simply, the six bullet points do not in fact represent six distinct accomplishments, even though this may be the immediate impression created by the bullet list’s visual structure.

Reading below the list headlines, the first point also attributes the low unemployment and growing economy to the government’s efforts through an enhancement/cause relation: “The Federal Labour Government steered the country through the GFC and as a result hundreds of thousands of jobs were saved and we were virtually the only advanced economy to avoid recession”. The second and fifth points—“there’s no clearer sign of a strong economy than a surplus” and “Delivering a surplus means the Reserve Bank has maximum scope to reduce interest rates as it deems appropriate”—equate the surplus with low interest rates and strong economy as they construe an elaboration relation between “(delivering) a surplus” as a sign (in grammatical terms, Token) of a ‘strong economy’ and potential for reduced interest rates (Value) (on Token/Value relations, see Halliday & Matthiessen, 2004, pp. 230–234).
So, while the visual cohesive relations may be simplistic or even misleading, the value of using a bullet list lies in ensuring that whether looked at accidentally or read in detail, the report card maximizes the chance of promoting the government’s economic performance.

4.3 Rearrangement

The report card also rearranges the statements made in the first two sections of the budget speech. The speech first presents the surplus and then focuses on the creation of jobs. This is followed by references to low unemployment, economic growth, and interest rates, with each section or subsection opening with an assertion of Australia’s economic strength. The report card changes the order of these achievements, starting with the strength of the economy, then announcing in sequence the surplus, low unemployment, creation of jobs, low interest rates, and ending with a statement about economic growth. This order, alongside the visually and verbally realized logico-semantic relations discussed above, allows the list to function as more than a list or even a report card; it makes the verbal text akin to an exposition, starting with a general statement of the strong economy, then introducing the surplus as a “sign” of economic strength and listing some examples of strength before ending with an overall statement of economic strength.

4.4 Addition

Perhaps most pertinent to the argument presented here is the addition of the bullets in the report card. They not only segment the writing into visually salient sections but, because they take the form of ticks, they also instantly orient readers to the positive attitude each section seeks to convey. As the bullet list itemizes Australia’s economic performance, it allows this positive attitude to be repeated over and over. This can be done explicitly, for example by stating that the Reserve Bank having scope to lower interest rates is “good news for Australian families”, or implicitly, by contrasting Australia’s economy with the economies of other advanced nations (as the speech does too) and by presenting figures that increase the sense of factuality of the report card, including some that are not in the speech, “27 million jobs have been lost elsewhere in the world” and “Our economy is over 7% larger than it was before the Global Financial Crisis” (on the role of such resources for evaluative language, see Hood & Martin, 2005). So while the alignment of the bullet list items may obscure the fact that they present ideas at different levels of abstraction (e.g., “a growing economy” is more abstract than “low unemployment” or “low interest rates”), it foregrounds the key purpose of the brochure: promoting a positive attitude toward the government’s economic achievements.

5. CONCLUSION

We have shown that bullet lists demonstrate the multimodal nature of new writing practices and illustrated their contribution to the marketization of public discourse.
Through examples from various contexts and an analysis of the ways in which a particular bullet list has resemiotized parts of the Australian treasurer’s budget speech for the 2012–2013 financial year into a brochure promoting the government’s economic achievements, we have illustrated the value of a critical multimodal approach to understanding how new writing practices such as the use of bullet points are tacitly, albeit not exclusively, regulated by the design of ubiquitous software such as PowerPoint and how they may function to obscure or maintain social divisions. ACKNOWLEDGMENT The paper is part of a larger project, “Towards a Social Theory of Semiotic Technology: Exploring PowerPoint’s Design and its Use in Higher Education and Corporate Settings”, which is supported through an Australian Research Council Discovery Grant.

NOTES

1. Notably, such recommendations treat typography as a resource for presenting information through writing as a linguistic mode. Two alternatives to this view are implicit in the design of software such as Microsoft Word, where bullets and numbering are presented in a separate menu from the one that offers different choices of font type, style, and size, and Microsoft PowerPoint, where changing a slideshow’s design theme alters bullet symbols along with the font, background, and decoration of slides.
2. See Kress (2003) on the similarities and differences between writing and speech as distinct, albeit related, modes.
3. The lecture in fact represents a macro-genre, for embedded within the exposition are a historical recount, a factorial explanation, discussions, and occasionally the initiation-response-feedback structure typical of classroom interactions (Sinclair & Coulthard, 1975).
4. This argument extends to all discourses Basil Bernstein’s (1990) theory of pedagogic discourse—specifically the idea that pedagogic discourse recontextualizes knowledge from the contexts where it is produced to pedagogic contexts where it is reproduced and disseminated, and that this recontextualization involves semantic shifts that maintain the existing social order.

REFERENCES

Atkinson, C. (2007 [2005]). Beyond bullet points: Using PowerPoint 2007 to create presentations that inform, motivate and inspire (2nd ed.). Redmond, WA: Microsoft Press.
Bakhtin, M. M. (1986). Speech genres and other late essays. (C. Emerson & M. Holquist, Eds., V. W. McGee, Trans.). Austin: University of Texas Press.
Bernstein, B. (1990). The structuring of pedagogic discourse: Class, code and control (Vol. 4). London: Routledge.
Bernstein, B. (2000 [1996]). Pedagogy, symbolic control, and identity: Theory, research, critique (Revised ed.). London: Rowman & Littlefield.
Bozarth, J. (2008). Better than bullet points: Creating engaging e-Learning with PowerPoint. San Francisco, CA: John Wiley and Sons.
Bringhurst, R. (2004). The elements of typographic style (Version 3.2). Point Roberts, WA: Hartley & Marks.
Christie, F. (2000). The language of classroom interaction. In L. Unsworth (Ed.), Researching language in schools and functional linguistic perspectives (pp. 184–203). London: Cassell.
Christie, F. (2002). Classroom discourse analysis: A functional perspective. New York: Continuum.
Commonwealth of Australia. (2002). Style manual for authors, editors and printers (6th ed.). Stafford, BC: John Wiley & Sons Australia.
Commonwealth of Australia. (2012). Budget Speech 2012–13. Retrieved from Budget_Speech.pdf
Djonov, E. (2005). Analysing the organisation of information in websites: From hypermedia design to systemic functional hypermedia discourse analysis (Doctoral thesis). University of New South Wales, Sydney. Retrieved from http://
Djonov, E., & Van Leeuwen, T. (2012). Normativity and software: A multimodal social semiotic approach. In S. Norris (Ed.), Multimodality and practice: Investigating theory-in-practice-through-method (pp. 119–137). New York: Routledge.
Djonov, E., & Van Leeuwen, T. (in press). Between the grid and composition: Layout in PowerPoint’s design and use. Semiotica.
Exxon. (1982). How the • got its name. The Record, October, 19.
Fairclough, N. (1993). Critical discourse analysis and the marketization of public discourse: The universities. Discourse and Society, 4(2), 133–169.
Fairclough, N. (2001). The discourse of new labour: Critical discourse analysis. In S. Yates, S. Taylor, & M. Wetherell (Eds.), Discourse as data: A guide for analysis (pp. 229–266). London: Sage.
Farkas, D. K. (2006). Towards a better understanding of PowerPoint deck design. Information Design Journal, 14(2), 162–171.
Goldhaber, M. (1997). The attention economy and the net. First Monday: Peer Reviewed Journal on the Internet, 2(4). Retrieved from http://www.firstmonday.dk/issues/issue2_4/goldhaber
Halliday, M. A. K. (1994). An introduction to functional grammar (2nd ed.). London: Arnold.
Halliday, M. A. K., & Matthiessen, C. M. I. M. (2004). An introduction to functional grammar (3rd ed.). London: Arnold.
Hood, S. E., & Martin, J. R. (2005). Invoking attitude: The play of graduation in appraising discourse. Revista Signos, 38(58), 195–220.
Iedema, R. (2003). Multimodality, resemiotisation: Extending the analysis of discourse as multi-semiotic practice. Visual Communication, 2(1), 29–57.
Knox, J. S. (2009). Punctuating the home page: Image as language in an online newspaper. Discourse & Communication, 3(2), 145–172.
Kress, G. (2003). Literacy in the new media age. London: Routledge.
Kress, G. (2005). Gains and losses: New forms of texts, knowledge, and learning. Computers and Composition, 22(1), 5–22.
Kress, G., & Van Leeuwen, T. (2006 [1996]). Reading images: The grammar of visual design (2nd ed.). London: Routledge.
Leech, G. (1966). English in advertising: A linguistic study of advertising in Great Britain. London: Longman.
Lemke, J. L. (1998). Multiplying meaning: Visual and verbal semiotics in scientific text. In J. R. Martin & R. Veel (Eds.), Reading science: Critical and functional perspectives on discourses of science (pp. 87–113). London: Routledge.
Martin, J. R. (2010). Life as a theme: Pitching vertical discourse in PowerPoint slides. Paper presented at the 5th International Conference on Multimodality, University of Technology, Sydney, Australia, December 1–3, 2010.
Martin, J. R., & Rose, D. (2008). Genre relations: Mapping culture. London: Equinox.
Matthiessen, C. M. I. M. (2007). The multimodal page: A systemic functional exploration. In T. Royce & W. Bowcher (Eds.), New directions in the analysis of multimodal discourse (pp. 1–62). Mahwah, NJ: Lawrence Erlbaum.
Mitchell, O. (n.d.). 5 ways bullet-point slides damage your brand [Web log post]. Retrieved from
Myers, G. (2000). Powerpoints: Technology, lectures, and changing genres. In A. Trosborg (Ed.), Analysing professional genres (pp. 177–191). Amsterdam: Benjamins.
Sinclair, J. M., & Coulthard, M. R. (1975). Towards an analysis of discourse: The English used by teachers and pupils. London: Oxford University Press.
Strathern, M. (2006). Bullet-proofing: A tale from the United Kingdom. In A. Riles (Ed.), Documents: Artifacts of modern knowledge (pp. 181–205). Ann Arbor: The University of Michigan Press.
The University of Chicago. (2010). The Chicago Manual of Style: The essential guide for writers, editors, and publishers (16th ed.). Chicago: University of Chicago Press.
Thibault, P. J. (2007). Writing, graphology and visual semiosis. In T. Royce & W. Bowcher (Eds.), New directions in the analysis of multimodal discourse (pp. 111–146). Mahwah, NJ: Lawrence Erlbaum.
Tufte, E. R. (2003). The cognitive style of PowerPoint (2nd ed.). Cheshire, CT: Graphics Press.
Van Leeuwen, T. (2006). The new writing. Wordrobe: A journal about the future of language, August 2006 (Free Launch Issue), 10–12.
Van Leeuwen, T. (2008a). Discourse and practice: New tools for critical analysis. London: Oxford University Press.
Van Leeuwen, T. (2008b). New forms of writing, new visual competencies. Visual Studies, 23(2), 130–135.
Van Leeuwen, T. (2010). The new writing. Paper presented at the UTSpeaks public lecture series, University of Technology, Sydney, Australia. Transcript available online at
Yates, J., & Orlikowski, W. (2007). The PowerPoint presentation and its corollaries: How genres shape communicative action in organizations. In M. Zachry & C. Thralls (Eds.), Communicative practices in workplaces and the professions: Cultural perspectives on the regulation of discourse and organizations (pp. 67–91). Amityville, NY: Baywood Publishing Company.
Zhao, S., Djonov, E., & Van Leeuwen, T. (in press). Semiotic technology and practice: A multimodal social semiotic approach to PowerPoint. Text & Talk.

15 Toward a Semiotics of Listening Theo van Leeuwen

1. INTRODUCTION

For some time I have wanted to say something about listening—about what listeners do while someone else is speaking. In the analysis of communication, for instance conversation analysis, the emphasis has been on turn-taking. Each participant is now listener, now speaker. But listening—what happens when someone is not the speaker, what listeners do as listeners—is not transcribed. It is absent from most of the accounts of communication we work with in linguistics, semiotics, and multimodality. The emphasis has been on the text, and yes, listeners, or audiences, are often mentioned, but as silent participants. In literary and media studies, this emphasis on the text has been criticized for over thirty years now. In media studies, for instance, text analysis has often been replaced with ‘ethnographic’ audience analysis through interviews and focus groups, following the groundbreaking examples of Morley (1980), Ang (1985), and Katz and Liebes (1986). But the act of listening or watching itself was not studied. In contemporary interactive media, audiences become ‘users’ who are actively selecting what they are watching and listening to, and in many cases responding and taking turns, as if in conversation. But that hardly means the end of listening and reading and viewing. In a variant of the famous 4’33” experiment of John Cage, I propose here to turn the camera around, away from the stage, and toward the listeners, to try and understand what it is they do and ‘say’ as they are listening. Or, better still, to use a split-screen approach, where you can see both and observe how listening and speaking relate to each other. 
This will show, I think, that listening is not only a mental activity or an act of interpretation, but also a semiotic activity in which the listener either actively follows and supports the speaker in the way accompanists follow and support singers or instrumental soloists, or silently critiques the speaker, or silently articulates a counterpoint discourse of a kind that critical discourse analysis so far has not yet been able to analyze.

I will explore this theme by focusing on two examples. The first is the use of listening shots in film. Good film directors and editors have a deep understanding of listening and the relation between speaking and listening, but they express it, not in academic language, but in the language of film. As Gilles Deleuze (1986) has said, “The great directors of the cinema may be compared, in our view, not merely with painters, architects and musicians, but also with thinkers. They think with movement-images and time-images instead of concepts” (p. xiv). More particularly, I will draw on Sidney Lumet’s film Twelve Angry Men, an Oscar-winning drama from 1957 that is entirely set in the narrow confines of the jury room where the ‘twelve angry men’ must decide the fate of an 18-year-old who stands trial for the murder of his father (Lumet, 1957). I will use this film as a discourse about the relation between listening and speaking from which there is much to learn. The second example is accompaniment in modern jazz, which, in the work of the best practitioners, can, similarly, be understood as a discursive practice from which much insight about dialogic interaction can be gleaned.

This essay can only be exploratory. It will open, or reopen, some questions and leave other questions unanswered. But I hope it will spark an interest in the idea of a semiotics of listening in the context of multimodal discourse analysis. It should be remembered, however, that it is not a new idea, and, more specifically, that it was already clearly stated in the work of Bakhtin (1986):

When the listener perceives and understands the meaning (the language meaning) of speech, he simultaneously takes an active, responsive attitude to it. He either agrees or disagrees with it (completely or partially), augments it, applies it, prepares for its execution, and so on. And the listener adopts this responsive attitude for the entire duration of the process of listening and understanding. . . Any understanding is imbued with response and necessarily elicits it in one form or another: the listener becomes the speaker. (p. 68)

Nevertheless, it is an idea that now needs to be renewed and rethought in the context of the new discipline of multimodality. We need to ‘make the listener speak’—multimodally.

2. LISTENING IS MULTIMODAL

Following a Hallidayan analysis, the verb listen is not a ‘mental process verb’ or a ‘verb of perception’ like hear or see or smell. It cannot combine with ‘projected clauses’. You cannot say “I listen that you are making a sound” or “I listen you making a sound” in the way that you can say “I hear that you are making a sound” or “I hear you making a sound”. Listening is a behavioural,

observable process, the kind of process which, as Halliday (1985) has said, is “intermediate between the material and the mental” (p. 128). It can stand on its own. It makes perfect sense to answer the question “What are you doing?” with “I am listening”, without any reference to what is being listened to, whereas we wouldn’t say “I am hearing”. Yet we know that there can be no listening without something or someone being listened to. This behaviour is multimodal. The signs of listening can be, and are, realized in different semiotic modes. They can be realized verbally, by a subset of what is referred to as “reaction signals” (Quirk, Greenbaum, Leech, & Svartvik, 1978, p. 274), signs of agreement or disagreement, and interjections—purely emotive words such as “tsss” or “ooh” or “wow” (or “mmm” in one of its many intonational shades), some using sounds that are not part of the English sound system, such as “ugh”, “whew”, or of course the sounds of laughter or derisive snorting. Echoing or silently mouthing what a speaker has just said is another possibility. But terms such as reaction and interjection negate the simultaneity of speaking and listening, and what I am interested in here is acts of listening that take place while the speaker is speaking. I would therefore rather speak of listening signs in all these cases. Listening can be realized by silent body action—by looking at the speaker, and by the expressions that can go with it, or by the postures that may suggest, for instance, deep concentration, scepticism, or rapture, and which do not necessarily have to be accompanied by looking at the speaker. Listening can be realized musically, through listening signs like rhythmic clapping while the music is going on, or singing along, or as I will show later, through the way accompanists dialogically support what soloists are singing or playing. Finally, listening can also be shared with other listeners. 
We might nudge the person sitting next to us, or a reaction to a melodic turn in a solo may be shared between a pianist and a drummer, for instance.

3. RESPONSE VERSUS ACCOMPANIMENT

It is important to distinguish between ‘response’ and ‘accompaniment’. A response is a separate turn, a distinct move in a sequential alternation between speaking and listening signs. Whether on purpose or not, and whether there is overlap or not, the speaker pauses to leave a slot for listening signs. Musical call and response structures work the same way, as for instance in the Bobby Timmons song:

(soloist)  Every mornin’ has me moanin’
(choir)  Yes Lord
(soloist)  ’Cause of all the trouble I see
(choir)  Yes Lord

The structure of the blues, similarly, leaves distinct slots for responses, whether from the audience or from the accompanying musicians. Traditionally there are three lines, the first a kind of premise, the second reaffirming it, and the third making a point about it, as in this traditional blues song from the time of the Great Depression:

Teacher, teacher, why are you so poor
Teacher, teacher, why are you so poor
When it comes to Unions, you’re an amateur

After each of the three lines there is a slot for the audience to join in with an “oh yeah”, a scream, or a shout or a wail (cf. Keil, 1991). Alternatively, the slots can be filled with responses from an accompanist. But listening can also accompany a monologue or solo performance and happen while the speaker is speaking, or while the singer is singing. This kind of listening plays a key role in the way filmmakers analyze the development of dialogic interaction. In a scene from Twelve Angry Men, half an hour into the film, Henry Fonda has for some time been the only dissenting voice, arguing that there is reasonable doubt when all the other jury members want to convict. He is almost giving up. Then two jury members start listening. And as Fonda comes up with a specific proposal that quells their earlier doubts or worries, their looks of attentive listening speak as clearly as Fonda, and do so while Fonda continues to speak. Only at the very end, when Fonda rests his case, do the other jurors move to distinct responses.

1. CLOSE SHOT FONDA
FONDA
I have a proposition to make to all of you. I am going to call for another vote. I want you eleven men to vote by secret written ballot . . .

2. CLOSE SHOT OTHER JUROR, LISTENING INTENTLY
FONDA (continues, off-screen)
. . . I’ll abstain . . .

3. BACK TO CLOSE SHOT FONDA
FONDA (continues)
. . . Eleven votes for guilty, I won’t stand alone . . .

4. CLOSE SHOT YET ANOTHER JUROR, LISTENING INTENTLY
FONDA (continues, off-screen)
. . . We’ll take a guilty verdict to the judge right now . . .

5. BACK TO CLOSE SHOT FONDA
FONDA (continues)
. . . But if anyone votes not guilty we stay here and talk it out. That’s it. If you want to try it, I’m ready.

6. LONG SHOT, HIGH ANGLE, ALL TWELVE JURORS
They talk at the same time “All right, let’s do it the hard way”, “That sounds fair”, etc.

The distinction between ‘response’ and ‘accompaniment’ is not always clear-cut. The two may be combined or shade into each other. Alan Lomax (1968) has described how ‘call and response’ dialogue may eventually result in full simultaneity: at first “the two parts just touch each other”, but then, “as the excitement of the performance grows”, “the chorus will encroach more and more upon the leader’s time, until at last both are singing without letup in exciting rhythmic relationship to each other” (p. 158). Similar structures occur in advertising jingles (Van Leeuwen, 1999, pp. 73ff). And simultaneity does not only occur in music, it can occur in conversation as well, especially in what Deborah Tannen has memorably called “rapport talk” (1990, p. 190), in which speaking and listening become almost indistinguishable.

4. LISTENING AS EMBODIED INTERPRETATION AND EVALUATION

In the 1960s, a number of American researchers began to engage in the highly detailed analysis of rhythm in everyday interaction. Sixteen-millimetre films were made of conversations, classrooms, and other social events, and analyzed in fine detail. William Condon reportedly spent a year and a half studying four and a half seconds from Gregory Bateson’s film of a family eating dinner (Hall, 1983, p. 166), and Edward Hall (1983) describes a study of children in a school playground where the rhythms of children playing in different parts of the playground were in sync, not only with each other, but also with those of a girl whose “skipping and dancing and twirling” from group to group, as Hall puts it, “orchestrated the movements of the entire playground” (p. 169). He concluded that “individuals are dominated in their behaviour by complex hierarchies of interlocking rhythm, comparable to fundamental themes in a symphonic score” (p. 153), and that rhythm provides the basic building blocks of behaviour and interaction. Erickson (1982) similarly studied rhythmic integration in a dinner conversation, using musical notation with a stave for each participant, concluding that “rhythm seems to be the fundamental glue by which cohesive discourse is maintained in conversation” (p. 65). Radan Martinec (2000) has convincingly shown that hierarchies of interlocking rhythm also exist between the different

modes in film—the movement of the actors, the rhythm of their speech, and the rhythm of music. In the act of listening, the listener’s body rhythms synchronize with those of the speaker or actor or soloist. It is this rhythmic integration between speaking and listening that makes listening an embodied semiotic activity. If listening takes place at all, every one of the listener’s body actions will be synchronized with the rhythms of the speaker or of the music, whether the small and micro movements of eye blinking, the somewhat more salient movements of feet tapping, or the longer period rhythm of body shifts at major transitions in the speaker’s discourse or the musical performance. Listening signs fit in with the rhythm of the speaker’s speech or the soloist’s solo. They accentuate what to the listeners are significant moments in the speaker’s speech or the musical solo and at the same time communicate listeners’ interpretations and evaluations of the words or musical phrases to the speaker or soloist. Analyzing the placement of listening signs therefore reveals what listeners single out as significant and why. The following excerpt from Twelve Angry Men occurs during a break in the jury proceedings as a late afternoon summer storm has started. Here it is Henry Fonda’s acting, rather than the editing, that provides the listening signs. He is talking with the jury foreman, played by Martin Balsam. Balsam tells Fonda that he is a sports coach and reminisces about a particular game and a particular young player.

TWO-SHOT FONDA AND BALSAM, FONDA LEFT FOREGROUND
They stand next to each other, both looking out of the window. Fonda looks pensive, preoccupied

BALSAM
Look at that coming down, will you.

FONDA
Yeah, I guess so

BALSAM
Boy! Look at it go! Reminds me of the storm we had last . . .

Fonda turns briefly to Balsam, with a hint of a smile, beginning to pay attention, then turns back again to look out of the window

BALSAM (continues)
. . . November, something. What a storm we had. Right in the middle of the game . . .

Fonda turns to Balsam again, now paying more attention, and still with a faint smile, as if he is beginning to get an idea.

BALSAM (continues)
We’re behind, seven-six, but we’re just starting to move the ball, off tackle, you know. Slash! Slash! Cut! I’ll never forget that. We had this kid Slattery. A real ox. Wish I had another one like him. Oh, I probably forgot to tell you I am assistant head football coach at the Andrew J. McCorckle High School . . .

Fonda turns to look out of the window again.

BALSAM (continues)
. . . So anyway, we’re moving along nicely . . .

Fonda turns to Balsam again.

BALSAM (continues)
. . . and eh, all of a sudden it starts to come down cats and dogs just like this, whoosh . . .

Fonda turns away again, still with that faint smile.

BALSAM (continues)
. . . right down. Well it was murder . . .

Fonda turns to him, nodding slightly and forming a word with his lips, but we cannot hear what he is saying.

BALSAM (continues)
. . . I swear, I nearly gave up. We couldn’t go nowhere . . .

Fonda turns to look out of the window again, pensively.

These are the moments Fonda turns to Balsam, the moments that are accentuated by his listening signs: “a storm we had . . . ”, “in the middle of a game . . . ”, the moment Balsam remembers the young player and wishes he was still with him, “move along nicely”, and “murder”. Clearly, Fonda is interpreting Balsam’s story in relation to the case. The jurors’ deliberations have been stormy, they are about midway now, and things are moving—he is gradually winning over more members of the jury. The young player finally reminds him of the 18-year-old who stands trial for his life, and of whom we have seen just a single, though very memorable shot, in the beginning of the film. Listening signs may reveal that the listener has not yet formed an interpretation or evaluation, but is nevertheless deeply engrossed in the speaker’s words, as in the listening of the second juror in the excerpt in Figure 15.1. 
Figure 15.1  Attentive, thoughtful listening (Twelve Angry Men, Lumet, 1957)

Figure 15.2  Angry, resentful listening (Twelve Angry Men, Lumet, 1957)

Or they may provide positive or negative feedback, with varying degrees of intensity, ranging, for instance, from the just perceptible nod to the strongly affirmative nod, and this feedback may be cognitive (agreement or disagreement) and/or affective (like or dislike), with all the shades and nuances this allows. It is this that provides the potential of a semiotics of listening for critical discourse analysis. By paying attention to negative listening signs, we can give voice to silent dissent. Following is an extract from Twelve Angry Men with some negative listening signs, as expressed by the jury member who is won over last, played by Lee Cobb as a truly “angry man”, full of prejudice and resentment. As Fonda’s accusations get sharper, so do Cobb’s listening signs (see Figure 15.2),

and the cuts between the shots synchronize exactly with the stresses on the syllables “switch”, “room”, “avenger”, “personally”, “facts”, and “sadist”.

MEDIUM CLOSE SHOT FONDA, THREE OTHER JURORS BEHIND HIM, ALL STARING AT COBB
FONDA
I feel sorry for you. What it must feel like to want to pull the switch . . .

TIGHT CLOSE SHOT COBB, looking angrily at Fonda
FONDA (continues, off-screen)
. . . Ever since you walked into this room . . .

TIGHT CLOSE SHOT FONDA
FONDA (continues)
. . . you have been acting like a self-appointed avenger . . .

TIGHT CLOSE SHOT COBB, fuming
FONDA (continues, off-screen)
. . . You want to see this boy die because you personally . . .

TIGHT CLOSE SHOT FONDA
FONDA (continues)
. . . want it, not because of the facts . . .

TIGHT CLOSE SHOT COBB, clenching his jaws
FONDA (continues, off-screen)
. . . You’re a . . .

TIGHT CLOSE SHOT FONDA
FONDA (continues)
. . . sadist.

TIGHT CLOSE SHOT COBB, hissing and starting to move towards Fonda

MEDIUM CLOSE SHOT COBB AS TWO OTHER JURORS RESTRAIN HIM.
COBB
Kill him, kill him.

MEDIUM CLOSE SHOT FONDA WITH THREE OTHER JURORS STANDING BEHIND HIM AND LOOKING DISAPPROVINGLY AT COBB
FONDA
You don’t really mean to kill me, do you.

Finally, withholding listening signs can be very disturbing. The speaker senses that the listener is resisting, that rhythmic integration is not happening, and that listening signs are not forthcoming. The opening scene of The Godfather (Coppola, 1972) is a good example. Bonasera (Salvatore Corsitto) comes to see Don Corleone (Marlon Brando) to ask him for help in revenging an assault on his daughter. During his long imploring speech, we never get to see or hear any reaction from Corleone, who is nevertheless massively present, seen from behind, out of focus and making Bonasera seem small in comparison to his bulk. As the scene progresses, Bonasera’s discomfort becomes increasingly palpable.

5. MUSICAL ACCOMPANIMENT AS LISTENING

The beginning of modern jazz in the late 1930s saw the increased importance of the solo, taken in turn by different instruments. Part and parcel of this development was a new approach to accompaniment. In the playing of Kenny Clarke, and eventually all modern jazz drummers, drums still stated the rhythm but much more lightly than before, replacing the bass drum with the high-hat cymbal as the main time keeper and adding irregularly placed snare drum accents to “help the soloist”, as Clarke said in an interview (quoted in Russell, 1973, p. 133). Soloists acknowledged the inspiration they received from the drummer’s accompaniment. As trumpeter Freddie Hubbard said:

I usually play off what a drummer plays. Since I’m being the creative force at that moment, solowise, usually I’m trying to lead the drummer, but I will use certain licks I hear him play and play off that. . . . (quoted in Taylor, 1993, p. 199)

Bud Powell similarly developed a piano style in which the piano, rather than playing repetitive rhythmic patterns (the ‘stride’ style) still provided harmonic support but in the form of “very very sparse, very very staccato, and very irregular . . . locked-hands” chords (Anthony Brown, quoted in Meadows, 2003, p. 94). These chords meshed in with the drums to respond to key moments in the solo. They act as listening signs, communicating interpretations of, and reactions to key moments in the solo and so contribute to the dialogic development of the musical performance. Anthony Brown, again:

You’ll hear Charlie Christian play a figure like [sings a phrase] and all of a sudden, Clarke will pick it up and start playing it too, and start driving it, and then start laying in a backbeat, and set off that momentum, so the drummer becomes a very very key element in the shaping of the Bebop sensibility. (quoted in Meadows, 2003, p. 101)

And such listening signs can have as many shades and nuances as the facial expressions of listeners: The drummer’s listening signs draw on a whole vocabulary of sounds to add emotive colour—crisp rimshots, soft sizzles, military-like snare drum shots, short, growling roars, or cymbalic explosions. And the pianist’s (or guitarist’s) listening signs can be soft or loud, clipped or sustained, gentle or brusque, sparse or dense, mellifluous or dissonant, and much more. Consider two examples. In a recording of the Lee Morgan Quintet (Morgan, 1997), Billy Higgins’ snare drum accents during Wayne Shorter’s tenor solo on “Speedball” would make no sense if they were played on their own. They are nothing like the repetitive preset rhythms programmed into modern keyboards or drum machines. They only make sense in relation to Wayne Shorter’s solo, and they closely follow the melody, first emphasizing the key notes in the descending motif with which the solo opens, then the key notes of a rising melody, and finally echoing the climactic high note that follows. Higgins is listening closely, following exactly where the solo is going. Each hit on the snare drum is an emphatic yes, an affirmation of the story Wayne Shorter is telling on his saxophone.

Figure 15.3  Bill Evans, listening (Steve Schapiro, 1961)

In a recording of “Wave” from a CD by the Brazilian singer Rosa Passos (2004), the bass solo, played by Paulo Paulelli, is only accompanied by the piano (Paul Braga), as is customary in modern jazz. In the first part of the solo, the melody stays put, alternating between just two notes, and so creating a certain suspense. The piano gently echoes this, in part in the gaps between the bass solo’s phrases, in part overlapping with them. But when Paulelli suddenly breaks out of this pattern and surges upward, the piano spurs him on with two sharp chords. Braga understands exactly where Paulelli is going with his solo, to the point that it becomes difficult to say whether the sudden surge was first thought of by Paulelli or Braga. Accompanying is an art form. In the best performances accompanists will follow the singer or soloist with the same absorbed attention we saw on the face of the juror in Figure 15.1, listening with intense concentration as well as closely watching the soloist’s hands and body actions.

6. CODA

The essence of accompaniment, and of listening generally, is to forget about yourself and to become immersed in the other. In the same way, and in the same period as Bakhtin declared that “the listener becomes the speaker”, the Jewish theologian Martin Buber (1996 [1923]) described listening as an encounter, an encounter in which the ‘you’ encounters the ‘I’ in a relation that “involves passivity as well as activity, being chosen as well as choosing” and in which “I comes into being through the encounter with you” (p. 13). If listening is interpreted as an encounter, it is a semiotic act in which we become fully focused on the other, on what somebody else is saying, or singing, or playing, rather than our own ‘self-expression’, and in which ‘self-expression’, in turn, is not possible without a dialogue with, and support from, the other. This makes listening sound almost like a spiritual activity, and it can be. 
But that does not mean it exists in a sphere that is beyond analysis and can only be spoken about in mystical and celebratory terms. From the point of view of multimodal discourse analysis, a focus on listening means doing away with the duality between the active speaker and the passive listener, seeing both as engaged in semiotic activity, as producing signs, so that both can therefore be analyzed, understood, and appreciated through the same close attention to the signifier that is, and will always be, the hallmark of good semiotics and good discourse analysis. But listening can also be critical, and listening analysis can therefore also become a form of critical discourse analysis. Paying attention to ‘negative’ listening signs, we can begin to show that silence does not always mean consent, and begin to articulate the meanings listeners make, even when, due to whatever kind of social pressure, they cannot say them out loud.

REFERENCES

Ang, I. (1985). Watching Dallas—Soap opera and the melodramatic imagination. London: Methuen.
Bakhtin, M. M. (1986). Speech genres and other late essays (V. W. McGee, Trans.). Austin: University of Texas Press.
Buber, M. (1996). I and thou. New York, NY: Touchstone Editions.
Coppola, F. F. (Director). (1972). The Godfather [Motion picture]. United States: Paramount Pictures.
Deleuze, G. (1986). Cinema 1: The movement image (H. Tomlinson, Trans.). London: The Athlone Press.
Erickson, F. (1982). Moneytree, lasagna bush, salt and pepper: Social construction of topical cohesion among Italian Americans. In D. Tannen (Ed.), Analyzing discourse: Text and talk (pp. 43–70). Washington, DC: Georgetown University Press.
Hall, E. T. (1983). The dance of life—The other dimension of time. New York, NY: Anchor Press.
Halliday, M. A. K. (1985). An introduction to functional grammar. London: Arnold.
Katz, E., & Liebes, T. (1986). Mutual aid in the decoding of Dallas: Notes from a cross-cultural study. In P. Drummond & R. Paterson (Eds.), Television in transition (pp. 187–198). London: BFI.
Keil, C. (1991). Urban blues. Chicago, IL: University of Chicago Press.
Lomax, A. (1968). Folk song style and culture. New Brunswick, NJ: Transaction Books.
Lumet, S. (Director). (1957). 12 Angry Men [Motion picture]. United States: Metro Goldwyn Mayer.
Martinec, R. (2000). Rhythm in multimodal texts. Leonardo, 33(4), 289–297.
Meadows, E. S. (2003). Bebop to cool—Context, ideology and musical identity. Westport, CT: Praeger.
Morgan, L. (1997). The best of Lee Morgan: The Blue Note years [CD]. New York, NY: Blue Note.
Morley, D. (1980). The nationwide audience: Structure and decoding. London: BFI.
Passos, R. (2004). Amorosa [CD]. Hong Kong: Sony Music Entertainment.
Quirk, R., Greenbaum, S., Leech, G., & Svartvik, J. (1978). A grammar of contemporary English. London: Longman.
Russell, R. (1973). Bird lives!: The high life and hard times of Charlie (Yardbird) Parker. New York, NY: Charterhouse.
Schapiro, S. (1961). Bill Evans, listening [Photograph]. Retrieved from http://www
Tannen, D. (1990). You just don’t understand—Women and men in conversation. London: Virago.
Taylor, A. (1993). Notes and tones: Musician-to-musician interviews. New York, NY: Da Capo Press.
Van Leeuwen, T. (1999). Speech, music, sound. London: Palgrave Macmillan.



Contributors

John A. Bateman is a full professor of applied linguistics at the University of Bremen and has been applying mechanisms of discourse interpretation to film for several years. He obtained his PhD in Artificial Intelligence from the University of Edinburgh in 1986 and has worked in various areas of multimodal, computational, and functional linguistics since the early 1990s. He is currently head of the newly formed Bremen Institute of Transmedial Textuality research (BITT) at the University of Bremen, and is involved with several university and third-party funded projects on the application of linguistic methods to filmic analysis.

Monika Bednarek is senior lecturer in linguistics at the University of Sydney. Her most recent publications include News Discourse with Helen Caple (Continuum, 2012), The Language of Fictional Television (Continuum, 2010), and articles on television dialogue in the International Journal of Corpus Linguistics, Language & Literature, and Multilingua. She is also coeditor of New Discourse on Language: Functional Perspectives on Multimodality, Identity, and Affiliation with Jim Martin (Continuum, 2010) and Telecinematic Discourse: Approaches to the Language of Films and Television Series, with Roberta Piazza and Fabio Rossi (John Benjamins, 2011).

Judith (Judie) Leah Cross is an academic member of the education faculty at the University of Wollongong, as well as head teacher of languages and subject coordinator/lecturer in academic and professional communication at other tertiary institutions. Two of her most recent papers are “Digital Images and the z-Axis”, published in Learning, Media & Technology, and “From the Reflective ePractitioner: A Pilot Model of Teacher Preparation Employing ePortfolio”, available from The International Journal of ePortfolio. Her current research focuses on reflective writing and multimodality, as applied to the blended delivery of training for communication students, TESOL, and overseas-trained teachers.
Emilia Djonov is a lecturer in multiliteracies at the Institute of Early Childhood, Macquarie University, Australia, and an honorary postdoctoral research fellow at the University of Technology, Sydney, where she is researching the interaction between PowerPoint’s design and use in higher education and corporate settings. Her research interests and publications are in the areas of multimodal and hypermedia discourse analysis, visual communication, social semiotics, systemic functional theory, and multiliteracies.

Dorothy Economou is a postgraduate academic writing advisor in the Faculty of Arts at the University of Sydney, and has over the past 15 years also been involved in teaching and research into academic skills, online learning, and media communication. She has also worked in journalism and subtitling in Greek and English. Her 2010 PhD from the University of Sydney presented a new multimodal framework for exploring evaluative stance and key in verbal-visual civic journalism texts on critical social issues from both Australia and Greece.

Charles Forceville works in the Media Studies department at University of Amsterdam. After publishing Pictorial Metaphor in Advertising (Routledge, 1996), his research broadened to multimodal metaphor in various media and genres. He coedited Multimodal Metaphor with Eduardo Urios-Aparisi (Mouton de Gruyter, 2009) and wrote a “Course in Pictorial and Multimodal Metaphor” ( Nowadays considering the structure and rhetoric of multimodal discourse his core business, he aims to be a cognitivist in the humanities (see http://muldisc, drawing on narratology, Relevance Theory, and evolutionary approaches. His teaching and research interests include documentary film, metaphor, animation, comics and cartoons, and advertising.

Lauren Gorfinkel is a lecturer in international communication at Macquarie University in the Department of Media, Music, Communication and Cultural Studies. Her research interests are in televised constructions of national identity, as well as intercultural approaches to language, music, and media education. She completed her PhD at the University of Technology, Sydney, in 2012, with a thesis on the cultural politics of Chinese music-entertainment television.

Carmen Daniela Maier is an associate professor and member of the Knowledge Communication Research Group at the Faculty of Business and Social Sciences, Aarhus University, Denmark. Among her latest publications are “Visual Evaluation in Film Trailers” in Visual Communication and “Communicating Business Greening and Greenwashing in Global Media: A Multimodal Discourse Analysis of CNN’s Greenwashing Video” in The International Communication Gazette. Her current research focuses on the multimodal communication of specialized knowledge, environmental discourses, and corporate videos.

Kay L. O’Halloran is Associate Professor in the School of Education at Curtin University in Perth, Western Australia, and founding Director of the Multimodal Analysis Lab in the Interactive Digital Media Institute at the National University of Singapore. Her areas of research include multimodal analysis, social semiotics, mathematics discourse, and the development of interactive digital media technologies and mathematical and scientific visualization techniques for multimodal and socio-cultural analytics.

Alexey Podlasov is a research fellow at the Multimodal Analysis Lab, National University of Singapore. He graduated with a PhD in Computer Science in 2007, with a main topic in image processing and compression. Since 2008 he has been conducting cross-disciplinary research in close connection with social semioticians focusing on software development, image and video processing, and visualization and analysis techniques.

Sabine Tan is a research associate at the Multimodal Analysis Lab, Interactive Digital Media Institute (IDMI) at National University of Singapore. Her primary research interests include critical multimodal discourse analysis, visual communication, and social semiotics. She is particularly interested in applications of social semiotic theory to the analysis of corporate institutional discourses involving new and traditional media, such as business news mediated on the Internet, corporate television advertisements, corporate web pages, and other organizational multimodal discourse genres.

Angela Thomas is a senior lecturer in English education at the University of Tasmania. She is the coauthor of Children’s Literature and Computer Based Teaching (Open University Press, 2005), and author of Youth Online: Identity and Literacy in the Digital Age (Peter Lang, 2007). Angela’s research focuses on the fusion of literature and digital media; of particular note is her study and development of new media spaces for the deep and immersive exploration of literature. She is currently researching young children’s creative augmented reality storytelling.

Chiaoi Tseng is an associate researcher in linguistics and literary science at Bremen University. Her research interests include film analysis, multimodal discourse, and genre. She completed her PhD in 2009 and is the author of the book Cohesion in Film: Tracking Film Elements (Palgrave, 2013). Dr. Tseng currently works within a project exploring the development of automatic support for high-level narrative analysis of films using image-processing techniques.

Len Unsworth is a professor in English and literacies education at Griffith University in Brisbane, Australia. His book publications include Teaching Children’s Literature with Information and Communication Technologies (McGraw-Hill/Open University Press, 2005, with Angela Thomas, Alyson Simpson, and Jenny Asha), e-Literature for Children and Classroom Literacy Learning (Routledge, 2006), New Literacies and the English Curriculum (Continuum, 2008), Multimodal Semiotics (Continuum, 2008), and Reading Visual Narratives, with Clare Painter and Jim Martin (Equinox, 2012).

Theo van Leeuwen is Emeritus Professor at the University of Technology, Sydney, and Professor of Language and Communication at the University of Southern Denmark. He has published widely on critical discourse analysis, multimodality, and visual semiotics. His books include Reading Images (with Gunther Kress) and Discourse and Practice. He is a founding editor of the journal Visual Communication.

Yiqiong Zhang is a lecturer at Guangdong University of Foreign Studies, China. She obtained her PhD from National University of Singapore with a thesis titled Representing science: A multimodal study of science popularization on institutional and mass media websites. Her research interests include cross-cultural studies and multimodal discourse analysis, particularly with respect to hypermodal representations in websites of different contexts.

Sumin Zhao is a Chancellor’s Postdoctoral Research Fellow at University of Technology, Sydney. Her current project investigates preschool children’s interactions with mobile technologies in home settings by combining multimodal discourse analysis with other research methodologies. She publishes in the research areas of digital literacies, systemic functional linguistic theory, and multimodal discourse analysis.


Index

accompaniment, in music 253–4, 260–1
advertising 60, 151–3
Affect 153, 183, 219; see also Appraisal Theory; visual semiotics
affiliate expert 128, 132–5; see also news affiliates
Affiliation (visual) 219–20; see also Appraisal Theory; visual semiotics
Age, The 181–2, 193–7
Althusser, Louis 6, 145
Ambience, visual 219–20; see also Appraisal Theory; visual semiotics
anchor (presenter) 128; see also news network
animation: animated movies 202–3, 215; animation software 215, 222–3; children’s animated narratives 222–3
Appraisal Theory 168–9, 171, 183–4, 197–8; see also Affect; Affiliation; Ambience; Systemic Functional Theory
Arnheim, Rudolf 18
attention economy 238
Australia’s Economic Report Card 241–7
Auteur Theory 17–20
authorship 17–22, 32–3
Bakhtin, Mikhail 1, 11, 144, 233, 252, 262
Balance (visual) 43–4; see also visual semiotics
Baldry, Anthony 4, 40, 42
Balsam, Martin 256–7
Barthes, Roland 1, 11, 17, 64, 72–3, 94, 96, 99, 103
Bateman, John 19–20, 23, 25, 26, 40, 44, 67
Bateson, Gregory 4, 255
Bergman, Ingmar 22–3, 29–33
Bernstein, Basil 1, 7, 146, 149, 240, 248

Black, Max 56–7, 59, 66
Bloomberg 136–7; see also CNBC; Fox Business News; news
Bordwell, David 17, 20, 40, 113, 117
Bourdieu, Pierre 1, 6, 11, 143, 153, 157, 217, 221
branding 50
Brando, Marlon 260
broadsheets 181; see also Age, The; Sydney Morning Herald, The (SMH)
Browne, Anthony 260–1
Buber, Martin 262
bullet points 232–5; see also Australia’s Economic Report Card; layout; new writing; PowerPoint
Buscombe, Edward 17, 19, 33
Butler, Judith 146
Caple, Helen 40, 43–4, 50, 53, 163, 166, 168, 183, 189
China Central Television (CCTV) 94–6, 97–8
Chinese Communist Party 95, 101
Chineseness 96
Chouliaraki, Lilie 1–2, 6
Christie, Frances 217, 221, 240
Clarke, Kenny 260–1; see also jazz, modern
class (social) 5, 74, 146, 170, 241
CNBC 131–6, 138; see also Bloomberg; Fox Business News; news
Cobb, Lee 258–9
coding orientation 146; see also Bernstein, Basil; modality, visual
cohesion: cohesive chain 26, 29–31; filmic cohesion 26, 31–3; visual cohesion (see new writing)
commodification 50; of business and finance 126, 136; of identity 50–1; of indie cultural values 143

commodity, information as 238; see also attention economy; new writing
Conceptual Metaphor Theory (CMT) 55, 56–8; see also metaphor
consumerism 51, 156–7
content analysis 37, 147
Cope, Bill 4, 217
Coppola, Francis 260
Critical Discourse Analysis: approaches to 6–7; history of 5–6
critical literacy 217, 221–2
cultural analytics see digital humanities
data visualization 72, 76–7, 79, 81, 83–7
Deleuze, Gilles 252
digital humanities 11, 71–2, 87–8
discourse, types of: commercial 50 (see also branding); environmental 110; filmic 23; news 129–130, 135, 139; pedagogic 240, 248; popular 7–8
discourse theories see Critical Discourse Analysis; Multimodal Discourse Analysis
Duara, Presenjit 93, 96
Earth Song 109–110, 121–2; see also Jackson, Michael; music video
Evans, Bill 261; see also jazz, modern
Fairclough, Norman 1–2, 5–7, 52, 125, 127–9, 131, 139–40, 145, 154, 160–4, 183, 235, 238–9
fashion see Japanese Street Fashion
film actor see Balsam, Martin; Brando, Marlon; Cobb, Lee; Fonda, Henry
film director 17–19, 252; see also Auteur Theory; Bergman, Ingmar; Lumet, Sidney; Nolan, Christopher
film form and technique: beginnings 23; continuity 26, 113, 114–15, 119–21; continuity editing 117–18; frame 113; genre conventions/norms 20, 22 (see also film theory: deviation); opening sequences 32; parallel montage 117; shot 20, 25, 40, 43, 113, 252, 259–60
film theory: anchoring bias 23; authorship 21–2 (see also Auteur Theory); deviation, style as 22–32; film beginnings 30–2
Flight of the Conchords 39; see also television series

focalization see narrative technique
Fonda, Henry 254–5, 256–60
Fowler, Roger 5, 183
Fox Business News (FBN) 131; see also Bloomberg; CNBC; news
frankie 143–4, 156; see also women’s magazines
Freire, Paulo 1, 11, 221
Futurity 161; see also news
Gibbs, Raymond 55
Godfather, The 260
Goffman, Erving 1, 4, 147
Gorilla 205, 220; see also picture books
Gramsci, Antonio 6, 145–6
growing neural gas (GNG) 85, 87; see also data visualization
Gumperz, John 4
Hall, Edward 71, 166, 255
Halliday, M. A. K. 3, 5, 22, 72, 119, 131, 162–3, 165–6, 183, 188, 217–18, 235–6, 239, 245–6, 252–3
Happy in China 97–8, 101–3; see also China Central Television (CCTV); music-entertainment television
Hasan, Ruqaiya 26
Higgins, Billie 261
hipster 50–1; see also indie (independent) culture
Hitchcock, Alfred 19, 30
Hodge, Bob 3, 5, 7, 183
Hubbard, Freddie 260; see also jazz, modern
hypermodal 161
identity: Chinese national (see Chineseness); commodified 50–1; corporate 138; gender 146, 222; social 125; see also hipster; indie (independent) culture
Iedema, Rick 40, 139, 146, 162, 243
indie (independent) culture 51, 143–4, 156–7; see also hipster
indoctritainment 95
institutional actor 128, 137–8
Internet see Bloomberg; CNBC; Fox Business News; Futurity
intersemiosis, intersemiotic relations see multimodal relations
Jackson, Michael 110, 114–18; see also music video
Japanese Street Fashion 73–4

jazz, modern 260–2
Jewitt, Carey 2, 4, 40, 59, 72, 111, 214–15
Johnson, Mark 6, 55–6
Kalantzis, Mary 4, 217
Knox, John 129, 160, 165–7, 189, 236
Kövecses, Zoltán 55, 67
Kress, Gunther 1–5, 7, 38, 40, 42–3, 51, 53, 59, 65, 72, 97, 111, 144, 147, 149, 152, 162–3, 167, 183, 204–5, 210, 217, 235, 238–9, 245, 248
Lakoff, George 6, 55–6
Lam, Wai-man 94, 102, 106
layout 149, 189, 236–7
Leech, Geoffrey 238, 253
Lemke, Jay 2, 4, 95, 160–1, 163, 172, 202, 233
listener 251–2, 262
listening: semiotics of 252–3, 262; listening signs 253, 256–60, 260–1
Livingston, Paisley 17–18, 21–3, 27, 32
Lost Thing, The 202, 206, 215
Lumet, Sidney 252, 258
Macau 93–4, 105–6
Machin, David 2, 7, 36, 51–2, 109, 119–20, 125–8, 139, 144, 146, 147–8, 166–8
Macken-Horarik, Mary 2, 95, 183
Manovich, Lev 9, 88
marketization: of science 160–1, 164; of public discourse 235, 238; see also commodification
Martin, James R. 4, 6, 23–5, 40, 42, 53, 144, 148, 151–3, 162, 165, 167, 183–6, 188, 191, 203, 217–19, 221, 240, 247
Martinec, Radan 4, 111, 163, 255–6
mediated realities 94
Memento 20, 28
metaphor: contextual metaphor 151–3; pictorial and multimodal metaphor 59; see also Conceptual Metaphor Theory
Meyer, Michael 2, 5–6, 127, 161
modality, visual 38, 147–8; see also visual semiotics
monophony 99; see also social unison
Montgomery, Martin 127–9, 131, 138, 140
Multimodal Discourse Analysis: approaches to 3–5; history of 2–3
multimodal relations: convergence 218; divergence 218; intermodal complementarity 218; intermodality 217–18; intersemiotic collocation 165; intersemiotic repetition 165; synchronization 111
multimodal transcription 112–13
music: accompaniment 253–4, 260–1; beat 117–18; pitch 118; tone 220
music-entertainment television 94; see also China Central Television (CCTV)
music video 110–11
Myers, Greg 160, 239
myth 96; see also Barthes, Roland
narrative: children’s animated narratives 222–3; in film 26 (see also cohesion, filmic); transmedia narratives 203, 215
narrative technique: focalization 203; parallelism 220; point of view 203, 204–5
nationalism, in China 95; see also Chineseness
Nelkin, Dorothy 160, 164, 167
New London Group 4, 7
news: business news networks 125–6 (see also Bloomberg; CNBC; Fox Business News); genre 139, 186; hypermodal 161; news websites (see Bloomberg; CNBC; Fox Business News; Futurity); refugee news (see broadsheets); science news 160–1, 174
new writing 235–6; see also bullet points
Nolan, Christopher 20, 28
Norris, Sigrid 4, 5
O’Halloran, Kay L. 2, 40, 42, 72, 111, 138, 160, 163
Ortony, Andrew 55
O’Toole, Michael 2, 4, 72, 163
Painter, Clare 203–6, 215, 217–21
Passos, Rosa 262; see also jazz, modern
Persona 26–8; see also Bergman, Ingmar
picture books 202–3; see also Browne, Anthony; Gorilla; Lost Thing, The; Rosen, Michael; Sad Book, The; Tan, Shaun
point of view see narrative technique
Powell, Bud 260; see also jazz, modern
PowerPoint 234–5, 236–8, 239–40
recontextualization 243
remix 221–2

resemiotization 243
rhythm 255–6
Rose, David 4, 144, 148, 151–2, 162, 186, 221, 240
Rosen, Michael 218
Royce, Terry 4, 163, 165
Sad Book, The 218
science popularization 164
scifotainment 168
Scollon, Ron 4, 128–9, 131–2, 134–5, 139
self-organizing map (SOM) 77–9; see also data visualization; growing neural gas (GNG)
Shorter, Wayne 261; see also jazz, modern
social activism 221
social actor 127–8, 131, 187, 189–90, 241
social media 161, 166, 171–2, 173, 174
social practice 87–8, 163–4, 173, 241–3
social unison 99; see also monophony
software: animation software 215, 222–3; office software 235; video editing software 113; see also growing neural gas (GNG); PowerPoint; self-organizing map (SOM)
software studies 71
spatiality 113–14
Sperber, Dan 5, 66
Spring Festival Gala 97, 99–101; see also China Central Television (CCTV); music-entertainment television
Summer with Monika 29–30; see also Bergman, Ingmar
Sydney Morning Herald, The (SMH) 181–2, 191–3; see also Age, The; broadsheets
Systemic Functional Theory: genre 148, 152–3, 162, 233; ideational meaning 183; interpersonal meaning (see Appraisal Theory); logicosemantic relations 235, 245; macro-genre 148; macro-theme 23; metafunctions 3, 40, 42–3, 163, 218; stratification 25
Tagg, Philip 114, 118–20
Tan, Shaun 202
Tannen, Deborah 4, 255
taste 143–4; see also Bourdieu, Pierre
television programs see Happy in China; Spring Festival Gala; 10-year Anniversary of the Handover of Macau Specials; see also China Central Television (CCTV); music-entertainment television
television series 36; see also Flight of the Conchords
television title sequence (TTS) 36–7, 51–2
temporality 113–14; see also spatiality
10-year Anniversary of the Handover of Macau Specials 97–8, 103–5; see also China Central Television (CCTV); music-entertainment television
Thibault, Paul 4, 40, 42, 183, 236
Thompson, Kristin 17, 40, 117
Thornborrow, Joanna 36, 51–2, 140, 147–8
Timmons, Bobby 253–4; see also jazz, modern
Trew, Tony 5, 183
Tufte, Edward 234
Twelve Angry Men 252, 254–5, 256–60
Unsworth, Len 2–4, 71, 163, 202–5, 215, 218
Van Dijk, Teun 2, 5–6, 52, 125, 127, 161–2, 168, 183
Van Leeuwen, Theo 2–3, 5–7, 24, 40, 42, 43–4, 48, 51, 53, 59, 65, 86, 97–100, 111, 115, 118–19, 131, 133, 136, 138, 144–7, 149, 152, 163, 167, 171, 187, 204–5, 210, 217–18, 220, 224, 233, 235–6, 238, 241, 243, 255
Vernallis, Carol 111, 117, 119, 121
videobite 129
visualization see data visualization; growing neural gas (GNG); self-organizing map (SOM)
visual semiotics: Affiliation 219–20; Ambience 219–20; Balance 43–4; layout 149, 189, 236–7; modality 38, 147–8; Salience 43
website data see Bloomberg; CNBC; Fox Business News; Futurity
White, Peter 4, 144, 153, 162–3, 168, 183–6, 191, 219
Wild Strawberries 24–8; see also Bergman, Ingmar
Wodak, Ruth 2, 5–6, 52, 161
Wollen, Peter 18–19, 33
women’s magazines 144–6